Github user fmethot commented on a diff in the pull request:
https://github.com/apache/drill/pull/593#discussion_r80169931
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java
---
@@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException {
final TextInput input = this.input;
final byte quote = this.quote;
- ch = input.nextChar();
+ try {
+ input.setMonitorForNewLine(false);
+ ch = input.nextChar();
- while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
- if (ch != quote) {
- if (prev == quote) { // unescaped quote detected
- if (parseUnescapedQuotes) {
- output.append(quote);
- output.append(ch);
- parseQuotedValue(ch);
- break;
- } else {
- throw new TextParsingException(
- context,
- "Unescaped quote character '"
- + quote
- + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
+ while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
+ if (ch != quote) {
+ if (prev == quote) { // unescaped quote detected
+ if (parseUnescapedQuotes) {
+ output.append(quote);
+ output.append(ch);
+ parseQuotedValue(ch);
+ break;
+ } else {
+ throw new TextParsingException(context, "Unescaped quote character '" + quote + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
+ }
}
+ output.append(ch);
+ prev = ch;
+ } else if (prev == quoteEscape) {
+ output.append(quote);
+ prev = NULL_BYTE;
+ } else {
+ prev = ch;
}
- output.append(ch);
- prev = ch;
- } else if (prev == quoteEscape) {
- output.append(quote);
- prev = NULL_BYTE;
- } else {
- prev = ch;
+ ch = input.nextChar();
}
- ch = input.nextChar();
+ } finally {
--- End diff --
The finally block is there because it is important to always reset the flag,
no matter how the method exits, since input.getChar() is called from everywhere.
- We could remove the finally, assuming that when an exception occurs the
TextReader simply stops parsing and exits (input.getChar() never gets called
again). Please advise if that is the approach we should take.
- In a custom version of the CompliantTextRecordReader that we use
internally, we are resilient to errors within rows: once an exception occurs, we
are able to recover at the next line and keep parsing. The finally clause ensures
the TextInput is in a proper state after a failure.
- The only thing I am worried about is the extra work required to handle
try/finally in a method that gets called hundreds of thousands of times per
second per thread. I haven't tested it, but the compiler must be doing a good
job of optimizing these.
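
To make the second point concrete, here is a minimal, self-contained sketch of
the pattern. It is not the Drill code: SketchInput is a hypothetical stand-in
for TextInput, isMonitorForNewLine() is an assumed getter, and restoring the
previously saved value (rather than a fixed one) is an assumption of this
sketch. It only shows that the flag is restored whether the method returns
normally or throws.

```java
/** Illustrative sketch only; names below are hypothetical, not Drill's API. */
final class QuotedValueSketch {

  /** Hypothetical stand-in for TextInput with a newline-monitoring flag. */
  static final class SketchInput {
    private final String data;
    private int pos = 0;
    private boolean monitorForNewLine = true;

    SketchInput(String data) { this.data = data; }

    void setMonitorForNewLine(boolean value) { monitorForNewLine = value; }

    boolean isMonitorForNewLine() { return monitorForNewLine; }

    /** Returns the next character, or throws if the input is exhausted. */
    char nextChar() {
      if (pos >= data.length()) {
        throw new IllegalStateException("unexpected end of input");
      }
      return data.charAt(pos++);
    }
  }

  /**
   * Reads characters up to the closing quote (the opening quote is assumed
   * to be consumed already). Newline monitoring is switched off while inside
   * the quoted value and restored in the finally block on every exit path.
   */
  static String parseQuotedValue(SketchInput input) {
    boolean previous = input.isMonitorForNewLine(); // assumption: save and restore
    StringBuilder out = new StringBuilder();
    try {
      input.setMonitorForNewLine(false);
      char ch = input.nextChar();
      while (ch != '"') {
        out.append(ch);
        ch = input.nextChar(); // may throw on an unterminated quoted value
      }
      return out.toString();
    } finally {
      input.setMonitorForNewLine(previous); // runs on return and on exception
    }
  }
}
```

A caller that catches the exception and resumes at the next row can then rely
on the input being back in its normal state; without the finally, the flag
would stay off after a failure inside a quoted value.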