[ https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553520#comment-15553520 ]

ASF GitHub Bot commented on DRILL-3178:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/593#discussion_r82304834
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java ---
    @@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException {
         final TextInput input = this.input;
         final byte quote = this.quote;
     
    -    ch = input.nextChar();
    +    try {
    +      input.setMonitorForNewLine(false);
    +      ch = input.nextChar();
     
    -    while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
    -      if (ch != quote) {
    -        if (prev == quote) { // unescaped quote detected
    -          if (parseUnescapedQuotes) {
    -            output.append(quote);
    -            output.append(ch);
    -            parseQuotedValue(ch);
    -            break;
    -          } else {
    -            throw new TextParsingException(
    -                context,
    -                "Unescaped quote character '"
    -                    + quote
    -                    + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
    +      while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
    +        if (ch != quote) {
    +          if (prev == quote) { // unescaped quote detected
    +            if (parseUnescapedQuotes) {
    +              output.append(quote);
    +              output.append(ch);
    +              parseQuotedValue(ch);
    +              break;
    +            } else {
    +              throw new TextParsingException(context, "Unescaped quote character '" + quote + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
    +            }
               }
    +          output.append(ch);
    +          prev = ch;
    +        } else if (prev == quoteEscape) {
    +          output.append(quote);
    +          prev = NULL_BYTE;
    +        } else {
    +          prev = ch;
             }
    -        output.append(ch);
    -        prev = ch;
    -      } else if (prev == quoteEscape) {
    -        output.append(quote);
    -        prev = NULL_BYTE;
    -      } else {
    -        prev = ch;
    +        ch = input.nextChar();
           }
    -      ch = input.nextChar();
    +    } finally {
    --- End diff --
    
    I see why it is done in finally. However, as noted above, I'm not sure that 
pushing this kind of flag into the getChar function is the optimal approach...
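A minimal, standalone sketch of the pattern under discussion, assuming that `setMonitorForNewLine(false)` means "do not treat `\n` as a record boundary". `QuotedValueSketch`, `SimpleInput`, and `lineBreaksSeen` are hypothetical names, not Drill's `TextInput`/`TextReader` API. The flag is restored in `finally`, which is presumably what the truncated `finally` block in the diff does; escaped quotes are omitted for brevity.

```java
/**
 * Hypothetical stand-ins for Drill's TextInput/TextReader, showing how a
 * newline-monitoring flag can be switched off for a quoted value and then
 * restored in a finally block.
 */
public class QuotedValueSketch {

  /** Hypothetical input that counts record-ending newlines while monitoring is on. */
  static final class SimpleInput {
    private final String data;
    private int pos = 0;
    private boolean monitorForNewLine = true;
    int lineBreaksSeen = 0;

    SimpleInput(String data) {
      this.data = data;
    }

    void setMonitorForNewLine(boolean monitor) {
      this.monitorForNewLine = monitor;
    }

    boolean isMonitoringNewLine() {
      return monitorForNewLine;
    }

    byte nextChar() {
      if (pos >= data.length()) {
        throw new IllegalStateException("unexpected end of input");
      }
      byte ch = (byte) data.charAt(pos++);
      if (ch == '\n' && monitorForNewLine) {
        lineBreaksSeen++; // counted as a record boundary only while monitoring
      }
      return ch;
    }
  }

  /** Reads a quoted value, assuming the opening quote was already consumed. */
  static String parseQuotedValue(SimpleInput input, byte quote) {
    StringBuilder out = new StringBuilder();
    boolean previous = input.isMonitoringNewLine();
    try {
      input.setMonitorForNewLine(false); // a '\n' inside quotes is field data
      byte ch = input.nextChar();
      while (ch != quote) {
        out.append((char) ch);
        ch = input.nextChar();
      }
      return out.toString();
    } finally {
      // Runs on normal return and when nextChar() throws, so the flag is
      // never left switched off for the rest of the input.
      input.setMonitorForNewLine(previous);
    }
  }

  public static void main(String[] args) {
    SimpleInput input = new SimpleInput("line one\nline two\",next\n");
    String value = parseQuotedValue(input, (byte) '"');
    System.out.println(value.replace("\n", "\\n")); // line one\nline two
    System.out.println(input.lineBreaksSeen);        // 0
    System.out.println(input.isMonitoringNewLine()); // true
  }
}
```

The sketch also makes the reviewer's concern concrete: every `nextChar()` call now checks a flag, and the character source has to carry state that really belongs to the quoted-value parser.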


> csv reader should allow newlines inside quotes 
> -----------------------------------------------
>
>                 Key: DRILL-3178
>                 URL: https://issues.apache.org/jira/browse/DRILL-3178
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.0.0
>         Environment: Ubuntu Trusty 14.04.2 LTS
>            Reporter: Neal McBurnett
>            Assignee: F Méthot
>             Fix For: Future
>
>         Attachments: drill-3178.patch
>
>
> When reading a CSV file that contains newlines within quoted strings, e.g. via
>     select * from dfs.`/tmp/q.csv`;
> Drill 1.0 says:
>     Error: SYSTEM ERROR: com.univocity.parsers.common.TextParsingException: Error processing input: Cannot use newline character within quoted string
> But many tools produce CSV files with newlines in quoted strings. Drill should be able to handle them.
> Workaround: the csvquote program (https://github.com/dbro/csvquote) can 
> encode embedded commas and newlines, and even decode them later if desired.
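The behavior the issue asks for can be sketched independently of Drill: a newline ends a record only when the scanner is outside a quoted field, and `""` inside quotes is an escaped quote (the RFC 4180 convention). `QuotedNewlineCsv` is a hypothetical, deliberately lenient parser for illustration only; it is neither Drill's reader nor the univocity parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, lenient CSV splitter: '\n' ends a record only outside quotes. */
public class QuotedNewlineCsv {

  static List<List<String>> parse(String text) {
    List<List<String>> records = new ArrayList<>();
    List<String> fields = new ArrayList<>();
    StringBuilder field = new StringBuilder();
    boolean inQuotes = false;

    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (inQuotes) {
        if (c == '"') {
          if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
            field.append('"'); // "" inside quotes is an escaped quote
            i++;
          } else {
            inQuotes = false; // closing quote
          }
        } else {
          field.append(c); // newlines and commas are kept as data
        }
      } else if (c == '"') {
        inQuotes = true; // opening quote
      } else if (c == ',') {
        fields.add(field.toString());
        field.setLength(0);
      } else if (c == '\n') {
        fields.add(field.toString()); // end of record
        field.setLength(0);
        records.add(fields);
        fields = new ArrayList<>();
      } else if (c != '\r') { // drop '\r' outside quotes so CRLF files work
        field.append(c);
      }
    }
    if (inQuotes) {
      throw new IllegalArgumentException("unterminated quoted field");
    }
    if (field.length() > 0 || !fields.isEmpty()) {
      fields.add(field.toString()); // last record without a trailing newline
      records.add(fields);
    }
    return records;
  }

  public static void main(String[] args) {
    String csv = "id,comment\n1,\"first line\nsecond line\"\n2,plain\n";
    List<List<String>> rows = parse(csv);
    System.out.println(rows.size()); // 3
    System.out.println(rows.get(1).get(1).replace("\n", "\\n")); // first line\nsecond line
  }
}
```

Here the quote state lives in the parsing loop itself, so the character source never needs to know whether it is inside a quoted value.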



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
