Keuntae Park created TAJO-1339:
----------------------------------
Summary: Incorrect handling of tables with custom delimiter when their data contain '|'
Key: TAJO-1339
URL: https://issues.apache.org/jira/browse/TAJO-1339
Project: Tajo
Issue Type: Bug
Reporter: Keuntae Park
With the following table data
{code}
1;a;1.1
2;a|b;2.4
3;b|c|d;3.2
{code}
and the external table declaration
{code}
create external table delimiter (id int, name text, score float) using csv
with ('csvfile.delimiter'=';') location 'xxx';
{code}
, I got the following incorrect result for the query 'select name, score from delimiter':
{code}
name,score
-------------------------------
a,1.1
a,null
b,null
{code}
It looks like the '|' in the name column is recognized as a delimiter.
Inspecting the code, I found that table meta information such as 'csvfile.delimiter'
is only honoured by the leaf scan operation. All other operations (including writing
intermediate data and materializing the query result) assume that the delimiter is
DEFAULT_FIELD_DELIMITER, which is '|'.
Hence, if the plan involves writing intermediate data, any '|' in the data is
treated as a delimiter even though it is not one.
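To make the mechanism concrete, here is a minimal, self-contained Java simulation. It assumes a simplified pipeline: a leaf scan that splits each line on the table's delimiter, a projection of (name, score), and an intermediate text stage that joins fields with an unescaped delimiter and re-splits them, reading an unparsable float as null. The class and method names (DelimiterRoundTrip, runQuery, parseFloatOrNull) are hypothetical and do not correspond to Tajo's actual classes; DEFAULT_FIELD_DELIMITER is only a local stand-in for the constant named above.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DelimiterRoundTrip {
    // Local stand-in for the DEFAULT_FIELD_DELIMITER ('|') mentioned above (hypothetical copy).
    static final String DEFAULT_FIELD_DELIMITER = "|";

    // Lenient float parsing: unparsable text becomes null, like the 'null' scores in the result.
    static Float parseFloatOrNull(String s) {
        try {
            return Float.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Simulated plan: leaf scan with tableDelim, project (name, score),
    // write intermediate text with intermediateDelim, then re-read it with the same delimiter.
    static List<String> runQuery(List<String> rawLines, String tableDelim, String intermediateDelim) {
        List<String> out = new ArrayList<>();
        for (String line : rawLines) {
            // Leaf scan: the only step that honours 'csvfile.delimiter'.
            String[] f = line.split(Pattern.quote(tableDelim), -1);
            String name = f[1];
            String score = f[2];
            // Intermediate data: fields are joined without escaping.
            String intermediate = name + intermediateDelim + score;
            // Downstream operator re-parses the intermediate row.
            String[] g = intermediate.split(Pattern.quote(intermediateDelim), -1);
            out.add(g[0] + "," + parseFloatOrNull(g[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = List.of("1;a;1.1", "2;a|b;2.4", "3;b|c|d;3.2");
        // Intermediate data written with the default '|'.
        System.out.println(runQuery(data, ";", DEFAULT_FIELD_DELIMITER));
        // Intermediate data written with the table's own delimiter.
        System.out.println(runQuery(data, ";", ";"));
    }
}
{code}

With '|' as the intermediate delimiter, the simulated rows come out as a,1.1 / a,null / b,null, the same as the result reported above: "a|b|2.4" re-splits into "a", "b", "2.4", so the score field receives "b". Passing the table's ';' instead keeps a|b and b|c|d intact, which illustrates why carrying the table meta past the leaf scan matters in this example.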