[ https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553522#comment-15553522 ]

ASF GitHub Bot commented on DRILL-3178:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/593#discussion_r82303401
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java ---
    @@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException {
         final TextInput input = this.input;
         final byte quote = this.quote;
     
    -    ch = input.nextChar();
    +    try {
    +      input.setMonitorForNewLine(false);
    --- End diff --
    
    This seems an overly complex way to do the parsing. Is there any reason we
    want to capture the original newline character rather than the normalized
    one?
    
    If we need to capture the original one, then a cleaner way to do that is
    to keep track of the start and end positions of the current token
    (character), and provide a method to return that block as a string. Then,
    scan for the closing quote, reading characters and special-casing any
    newlines.
    
    If we want to include newlines in quoted strings sometimes, but not other
    times, then the check logic can be a bit more complex.
    
    But the proposed solution of making newlines not be newlines seems a bit
    odd...
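
    For illustration only, the alternative described in the comment above
    (track the start and end of the current token, expose the raw block as a
    string, and scan for the closing quote while special-casing newlines)
    might look roughly like the sketch below. The class QuotedFieldScanner and
    all of its methods are hypothetical names invented for this sketch; it
    works on a plain byte array and is not Drill's TextReader or TextInput.
    It also assumes that a doubled quote ("") is the escape for a literal
    quote, which the comment does not specify.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; not part of Drill. Scans one quoted value in a byte buffer. */
final class QuotedFieldScanner {
    private static final byte QUOTE = '"';

    private final byte[] buf;
    private int pos;               // index of the next byte to read
    private int tokenStart;        // inclusive start of the current token's content
    private int tokenEnd;          // exclusive end of the current token's content
    private int embeddedNewlines;  // line breaks seen inside the quotes

    QuotedFieldScanner(byte[] buf) {
        this.buf = buf;
    }

    /**
     * Expects the opening quote at the current position. Scans to the closing
     * quote, treating newline bytes as ordinary data. Returns false if there is
     * no opening quote or the input ends before a closing quote is found.
     */
    boolean scanQuoted() {
        embeddedNewlines = 0;
        if (pos >= buf.length || buf[pos] != QUOTE) {
            return false;
        }
        pos++;
        tokenStart = pos;
        while (pos < buf.length) {
            byte ch = buf[pos];
            if (ch == QUOTE) {
                // Assumption: "" inside a quoted value is an escaped quote.
                if (pos + 1 < buf.length && buf[pos + 1] == QUOTE) {
                    pos += 2;
                    continue;
                }
                tokenEnd = pos;
                pos++; // consume the closing quote
                return true;
            }
            if (ch == '\n' || ch == '\r') {
                // Special case: count \r\n as a single line break.
                if (ch == '\r' && pos + 1 < buf.length && buf[pos + 1] == '\n') {
                    pos++;
                }
                embeddedNewlines++;
            }
            pos++;
        }
        tokenEnd = pos; // unterminated quoted value
        return false;
    }

    /** The token exactly as it appeared in the input (escapes and newlines untouched). */
    String rawToken() {
        return new String(buf, tokenStart, tokenEnd - tokenStart, StandardCharsets.UTF_8);
    }

    /** The same token with \r\n and lone \r normalized to \n. */
    String normalizedToken() {
        return rawToken().replace("\r\n", "\n").replace('\r', '\n');
    }

    int embeddedNewlines() {
        return embeddedNewlines;
    }
}
```

    Because the raw bytes are kept and only the token boundaries are recorded,
    a caller can choose between rawToken() and normalizedToken() without the
    input layer having to change how it treats newlines.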


> csv reader should allow newlines inside quotes 
> -----------------------------------------------
>
>                 Key: DRILL-3178
>                 URL: https://issues.apache.org/jira/browse/DRILL-3178
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.0.0
>         Environment: Ubuntu Trusty 14.04.2 LTS
>            Reporter: Neal McBurnett
>            Assignee: F Méthot
>             Fix For: Future
>
>         Attachments: drill-3178.patch
>
>
> When reading a csv file which contains newlines within quoted strings, e.g. via
>     select * from dfs.`/tmp/q.csv`;
> Drill 1.0 says:
>     Error: SYSTEM ERROR: com.univocity.parsers.common.TextParsingException: Error processing input: Cannot use newline character within quoted string
> But many tools produce csv files with newlines in quoted strings.  Drill should be able to handle them.
> Workaround: the csvquote program (https://github.com/dbro/csvquote) can encode embedded commas and newlines, and even decode them later if desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)