[GitHub] drill issue #593: DRILL-3178 csv reader should allow newlines inside quotes

2016-10-07 Thread fmethot
Github user fmethot commented on the issue:

https://github.com/apache/drill/pull/593
  
Thanks for the comments so far, See my newer changes,  I suggest that we 
remove the flag and add an extra method instead.
There is no more check for a boolean in nextChar, but instead there is an 
extra method call (readNext()->readNextNoNewLineCheck())



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #593: Add flag for end of line monitoring

2016-09-22 Thread fmethot
Github user fmethot commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r80170855
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextInput.java
 ---
@@ -273,7 +278,7 @@ public final byte nextChar() throws IOException {
 byte byteChar = PlatformDependent.getByte(bStartMinus1 + bufferPtr);
 
 if (bufferPtr >= length) {
-  if (length != -1) {
+  if(length != -1) {
--- End diff --

fixed, thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #593: Add flag for end of line monitoring

2016-09-22 Thread fmethot
Github user fmethot commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r80169931
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java
 ---
@@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws 
IOException {
 final TextInput input = this.input;
 final byte quote = this.quote;
 
-ch = input.nextChar();
+try {
+  input.setMonitorForNewLine(false);
+  ch = input.nextChar();
 
-while (!(prev == quote && (ch == delimiter || ch == newLine || 
isWhite(ch {
-  if (ch != quote) {
-if (prev == quote) { // unescaped quote detected
-  if (parseUnescapedQuotes) {
-output.append(quote);
-output.append(ch);
-parseQuotedValue(ch);
-break;
-  } else {
-throw new TextParsingException(
-context,
-"Unescaped quote character '"
-+ quote
-+ "' inside quoted value of CSV field. To allow 
unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser 
settings. Cannot parse CSV input.");
+  while (!(prev == quote && (ch == delimiter || ch == newLine || 
isWhite(ch {
+if (ch != quote) {
+  if (prev == quote) { // unescaped quote detected
+if (parseUnescapedQuotes) {
+  output.append(quote);
+  output.append(ch);
+  parseQuotedValue(ch);
+  break;
+} else {
+  throw new TextParsingException(context, "Unescaped quote 
character '" + quote + "' inside quoted value of CSV field. To allow unescaped 
quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot 
parse CSV input.");
+}
   }
+  output.append(ch);
+  prev = ch;
+} else if (prev == quoteEscape) {
+  output.append(quote);
+  prev = NULL_BYTE;
+} else {
+  prev = ch;
 }
-output.append(ch);
-prev = ch;
-  } else if (prev == quoteEscape) {
-output.append(quote);
-prev = NULL_BYTE;
-  } else {
-prev = ch;
+ch = input.nextChar();
   }
-  ch = input.nextChar();
+} finally {
--- End diff --

Because the finally block always runs, it is important to always set the 
flag back to false because the input.getChar() is called  from everywhere. 
 - We could remove the finally assuming that when an exception occurs the 
TextReader will just stop doing any parsing and exit. (input.getChar never gets 
called again) Please advice if that's the way we should do.
- In a custom version of the CompliantTextRecordReader that we use 
internally, we are resilient to error within rows, once an exception occurs we 
are able to recover to a next line and keep parsing, the finally clause ensure 
the TextInput is in proper state after failure.
- Only thing I am worried is the extra operations required to handle try 
finally in a method that gets called 100 000s of time per seconds per thread, I 
haven't tested it. Compiler must be doing a good job at optimizing these.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #593: Add flag for end of line monitoring

2016-09-22 Thread fmethot
Github user fmethot commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r80169935
  
--- Diff: exec/java-exec/src/test/resources/store/text/WithQuotedCrLf.tbl 
---
@@ -0,0 +1,6 @@
+"a
+1"|a|a
+a|"a
+2"|a
+a|a|"a
+3"
--- End diff --

This csv contains new lines. The test check for new line with \r\n chars,


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #593: Add flag for end of line monitoring

2016-09-20 Thread fmethot
GitHub user fmethot opened a pull request:

https://github.com/apache/drill/pull/593

Add flag for end of line monitoring

Add a option for disabling for end-of-line character detection.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fmethot/drill-1 drill-3178

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/593.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #593


commit 60ffa1dd45a5c9f9ade23da001cf73201e892713
Author: Francois Methot 
Date:   2016-09-21T00:16:15Z

Add flag for end of line monitoring




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---