Sean Hsuan-Yi Chu created DRILL-3808:
----------------------------------------

             Summary: When reading TSV files, TextReader does not follow the standard
                 Key: DRILL-3808
                 URL: https://issues.apache.org/jira/browse/DRILL-3808
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Text & CSV
            Reporter: Sean Hsuan-Yi Chu
            Assignee: Sean Hsuan-Yi Chu
            Priority: Critical


According to references [1], [2]:

In .csv, the double quote is a special character because it can optionally 
enclose a text field. In .tsv it is not a special character: it can appear 
anywhere, and when it does, it should be treated as a literal. The TSV format 
specification also does not allow tab or CR/LF characters inside text fields. 
However, Drill treats TSV the same as CSV.

For example, given the data:
{code}
"test"\t"test"
{code}
Running the query: select columns[0], columns[1] from `t.tsv`; Drill returns
{code}
test      test
{code}
However, according to reference [2], the result should be
{code}
"test"      "test"
{code}
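
To make the difference concrete, here is a minimal sketch in Python (not Drill's 
TextReader code) that parses the same line two ways. It relies on the standard 
library csv module; the function names read_csv_style, read_tsv_style and 
split_tsv are hypothetical and exist only for this illustration.

```python
import csv
import io

# One line of tab-separated data, matching the example in this issue.
DATA = '"test"\t"test"\n'


def read_csv_style(text):
    """Tab delimiter, but double quotes still enclose fields (CSV rules).

    This mirrors the behavior reported in this issue: quotes are stripped.
    """
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def read_tsv_style(text):
    """Tab delimiter with quote processing disabled (TSV rules per [2]).

    Double quotes are ordinary characters and are kept in the field value.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def split_tsv(text):
    """Plain split on line breaks and tabs.

    This is sufficient only because the TSV format does not allow tab or
    line-break characters inside a field.
    """
    return [line.split("\t") for line in text.splitlines()]


if __name__ == "__main__":
    print(read_csv_style(DATA))  # quotes removed
    print(read_tsv_style(DATA))  # quotes preserved
    print(split_tsv(DATA))       # quotes preserved
```

The second and third functions keep the double quotes, which is the result 
reference [2] calls for. The first one drops them, as in the output shown above.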

Ideally, Drill should follow the standard; see [2].
[1] CSV - https://tools.ietf.org/html/rfc4180
[2] TSV - http://www.iana.org/assignments/media-types/text/tab-separated-values




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
