[ https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553522#comment-15553522 ]
ASF GitHub Bot commented on DRILL-3178:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/593#discussion_r82303401

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java ---
    @@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException {
         final TextInput input = this.input;
         final byte quote = this.quote;
    -    ch = input.nextChar();
    +    try {
    +      input.setMonitorForNewLine(false);
    --- End diff --

    Seems an overly complex way to do the parsing. Is there any reason we want to capture the original newline character rather than the normalized one?

    If we need to capture the original one, then a cleaner way to do that is to keep track of the start & end position of the current token (character), and provide a method to return that block as a string. Then, scan for a close quote, reading characters & special-casing any newlines.

    If we want to include newlines in quoted strings sometimes, but not other times, then the check logic can be a bit more complex. But, the proposed solution of making newlines not be newlines seems a bit odd...

> csv reader should allow newlines inside quotes
> ----------------------------------------------
>
> Key: DRILL-3178
> URL: https://issues.apache.org/jira/browse/DRILL-3178
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.0.0
> Environment: Ubuntu Trusty 14.04.2 LTS
> Reporter: Neal McBurnett
> Assignee: F Méthot
> Fix For: Future
>
> Attachments: drill-3178.patch
>
>
> When reading a csv file which contains newlines within quoted strings, e.g. via
> select * from dfs.`/tmp/q.csv`;
> Drill 1.0 says:
> Error: SYSTEM ERROR: com.univocity.parsers.common.TextParsingException:
> Error processing input: Cannot use newline character within quoted string
> But many tools produce csv files with newlines in quoted strings. Drill should be able to handle them.
> Workaround: the csvquote program (https://github.com/dbro/csvquote) can encode embedded commas and newlines, and even decode them later if desired.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
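
For reference, the alternative suggested in the review comment (track the start and end position of the current token, scan for the close quote, and special-case newlines along the way) could look roughly like the sketch below. This is only an illustration under stated assumptions: `QuotedTokenScanner` and all of its members are hypothetical names, the input is a plain `String` rather than Drill's `TextInput`, and a doubled quote is assumed to be the escape for a literal quote. It is not the code in the pull request.

```java
/**
 * Hypothetical sketch of the "track token start/end" idea from the review.
 * Not Drill's TextReader or TextInput; every name here is an assumption.
 */
final class QuotedTokenScanner {
    private final String buf;  // whole input, standing in for the reader's buffer
    private int pos;           // index of the next unread character
    private int tokenStart;    // first character of the current token (inclusive)
    private int tokenEnd;      // one past the last character of the current token
    private int lineCount;     // '\n' characters seen so far, including those inside quotes

    QuotedTokenScanner(String buf) {
        this.buf = buf;
    }

    /**
     * Scans one quoted value. The current position must be on the opening quote.
     * Returns the raw text between the quotes with newlines left exactly as read.
     */
    String parseQuotedValue(char quote) {
        if (pos >= buf.length() || buf.charAt(pos) != quote) {
            throw new IllegalStateException("expected opening quote at " + pos);
        }
        pos++;                 // skip the opening quote
        tokenStart = pos;
        while (pos < buf.length()) {
            char ch = buf.charAt(pos);
            if (ch == quote) {
                // Assumed escape convention: a doubled quote stays inside the token.
                if (pos + 1 < buf.length() && buf.charAt(pos + 1) == quote) {
                    pos += 2;
                    continue;
                }
                tokenEnd = pos;
                pos++;         // consume the closing quote
                return rawToken();
            }
            if (ch == '\n') {
                // Special case: keep line accounting correct, but the newline
                // remains part of the token instead of ending the record.
                lineCount++;
            }
            pos++;
        }
        throw new IllegalStateException("unterminated quoted value starting at " + tokenStart);
    }

    /** Returns the current token exactly as it appeared in the input. */
    String rawToken() {
        return buf.substring(tokenStart, tokenEnd);
    }

    int lineCount() {
        return lineCount;
    }

    int position() {
        return pos;
    }
}
```

With this shape the newline check never has to be switched off: the scanner still sees every newline and counts it, and the caller gets the original characters (for example `\r\n`) by asking for the block between the two recorded positions.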