Keuntae Park created TAJO-1339:
----------------------------------

             Summary: Incorrect handling of tables with custom delimiter when their data contain '|'
                 Key: TAJO-1339
                 URL: https://issues.apache.org/jira/browse/TAJO-1339
             Project: Tajo
          Issue Type: Bug
            Reporter: Keuntae Park


With the table data
{code}
1;a;1.1
2;a|b;2.4
3;b|c|d;3.2
{code}
and the external table declaration
{code}
create external table delimiter (id int, name text, score float) using csv
with ('csvfile.delimiter'=';') location 'xxx';
{code}
I got the following incorrect result for the query 'select name, score from delimiter':
{code}
name,score
-------------------------------
a,1.1
a,null
b,null
{code}

It looks like the '|' in the name column is being treated as a delimiter.
From inspecting the code, table meta information such as 'csvfile.delimiter' is
honored only by the leaf scan operation; all other operations (including writing
intermediate data and materializing the query result) assume that the delimiter
is DEFAULT_FIELD_DELIMITER, which is '|'.
Hence, if the plan includes a step that writes intermediate data, any '|' in the
data is handled as a delimiter even though it is not one.
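
The mechanism described above can be reproduced outside Tajo with a small standalone simulation. This is only a sketch under stated assumptions, not Tajo code: the function names (parse_rows, write_rows, to_float, select_name_score) are hypothetical, and the single write-then-reread step is an assumed stand-in for the intermediate data stage. Only the sample data, the ';' table delimiter, and the '|' default come from the report.

```python
# Standalone simulation of the reported behaviour; this is NOT Tajo code.
# All function names below are hypothetical and exist only for illustration.

DEFAULT_FIELD_DELIMITER = "|"   # default delimiter named in the report
TABLE_DELIMITER = ";"           # 'csvfile.delimiter' from the table declaration

RAW_LINES = ["1;a;1.1", "2;a|b;2.4", "3;b|c|d;3.2"]


def parse_rows(lines, delimiter):
    """Split each text line into fields using the given delimiter."""
    return [line.split(delimiter) for line in lines]


def write_rows(rows, delimiter):
    """Join each row's fields back into one text line."""
    return [delimiter.join(fields) for fields in rows]


def to_float(text):
    """Convert text to float; return None (null) when it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def select_name_score(lines, intermediate_delimiter):
    # Leaf scan: honours the table's own delimiter, then projects (name, score).
    scanned = parse_rows(lines, TABLE_DELIMITER)
    projected = [[name, score] for _id, name, score in scanned]
    # Assumed intermediate stage: rows are written out and read back
    # using whatever delimiter that stage was given.
    intermediate = write_rows(projected, intermediate_delimiter)
    reread = parse_rows(intermediate, intermediate_delimiter)
    # Only the first two fields are interpreted as (name, score).
    return [(fields[0], to_float(fields[1])) for fields in reread]


print(select_name_score(RAW_LINES, DEFAULT_FIELD_DELIMITER))
# prints [('a', 1.1), ('a', None), ('b', None)]
```

With the default '|' the simulation yields the same rows as the incorrect result above. Passing a delimiter that does not occur in the data, for example select_name_score(RAW_LINES, TABLE_DELIMITER), keeps 'a|b' and 'b|c|d' intact. Whether propagating the table delimiter is the right fix inside Tajo is a separate question that this sketch does not settle.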



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
