Varun Raval created ORC-1031:
--------------------------------

             Summary: No way to escape delimiter in column values
                 Key: ORC-1031
                 URL: https://issues.apache.org/jira/browse/ORC-1031
             Project: ORC
          Issue Type: Bug
          Components: C++
            Reporter: Varun Raval


I am using the C++ CSV to ORC tool to convert a CSV file to an ORC file, and I
could not find a way to escape delimiter characters that appear inside column
values of the table in the CSV file. If a column value contains the delimiter,
the tool treats that character as a column separator, which corrupts the data
written to the ORC file.

In my scenario, every character that could be chosen as the delimiter can also
appear in one of the columns of the CSV file, so switching to a different
delimiter does not solve the problem.

To give more detail about my use case: I have a Hive table with a binary
column, and the corresponding column in my CSV file contains binary data. I am
converting that CSV file to an ORC file with this tool. Since there are no
restrictions on what the binary column can contain, whatever delimiter we use
for the CSV to ORC conversion can end up inside that column.

A sample value of the binary column is shown below:
{code:java}
9Tl���������������~sjc_\[[\^`a`]WPF:."�������������������+Gaw���������������xnf`][Z[\_`a_[TK@4
{code}

If there is a way to escape the delimiter characters in the column values, that 
would be really helpful!
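To make the request concrete, here is a minimal C++ sketch contrasting two ways of splitting a single CSV record. It assumes, based only on the behaviour described above and not on a reading of the csv-import source, that the tool splits each line on every occurrence of the delimiter; the hypothetical `naiveSplit` models that. The hypothetical `quotedSplit` shows one possible escaping convention, RFC 4180-style double-quote quoting, where a quoted field may contain the delimiter and a doubled quote (`""`) stands for one literal quote. Neither function is part of ORC.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the ORC csv-import tool): splits a record
// on every occurrence of `delim`, with no escaping or quoting support.
// This reproduces the failure mode described when a value contains `delim`.
std::vector<std::string> naiveSplit(const std::string& line, char delim) {
  std::vector<std::string> fields;
  std::string current;
  for (char c : line) {
    if (c == delim) {
      fields.push_back(current);
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  fields.push_back(current);
  return fields;
}

// Hypothetical helper: RFC 4180-style splitting. A field wrapped in double
// quotes may contain the delimiter, and a doubled quote ("") inside a quoted
// field stands for one literal quote character. Embedded newlines are not
// handled, because this function only ever sees a single line.
std::vector<std::string> quotedSplit(const std::string& line, char delim) {
  assert(delim != '"' && "the delimiter must differ from the quote character");
  std::vector<std::string> fields;
  std::string current;
  bool inQuotes = false;
  for (std::string::size_type i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (inQuotes) {
      if (c == '"') {
        if (i + 1 < line.size() && line[i + 1] == '"') {
          current.push_back('"');  // "" inside quotes is one literal quote
          ++i;                     // skip the second quote of the pair
        } else {
          inQuotes = false;        // closing quote ends the quoted section
        }
      } else {
        current.push_back(c);      // the delimiter is plain data inside quotes
      }
    } else if (c == '"') {
      inQuotes = true;             // opening quote starts a quoted section
    } else if (c == delim) {
      fields.push_back(current);   // unquoted delimiter ends the field
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  fields.push_back(current);
  return fields;
}
```

For example, `naiveSplit("1,\"a,b\",2", ',')` yields four fields, while `quotedSplit` on the same line yields `1`, `a,b` and `2`. Since raw binary data can also contain newline bytes, a complete implementation would additionally need to keep reading across line breaks while inside quotes, which this single-line sketch does not attempt.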



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
