Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8936 )
Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners ...................................................................... Patch Set 15: (6 comments) http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc File be/src/exec/hdfs-rcfile-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@91 PS15, Line 91: excceeded typo: exceeded http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@91 PS15, Line 91: 8 million limit" Let's use the constant instead of hardcoding it in the error message so that they can't get out of sync. http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@545 PS15, Line 545: if (max_tuples < 0) { It seems like this is catching the problem too late. How does it get into this invalid state? The problem is in our own code so we shoudl fix it at the source rather than trying to catch it here. http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@588 PS15, Line 588: if (col_end > row_group_end || column.start_offset < 0 || column.buffer_pos < 0 I think we should catch the invalid column metadata when we read it in GetCurrentKeyBuffer(), rather than letting things get into an invalid state and trying to catch it later. http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc@370 PS15, Line 370: if (UNLIKELY(str == NULL || len <= 0)) { We're already checking len here but the problem is that we're trimming whitespace below. I think the right fix is to check if (len <= 0) immediately after stripping leading and trailing whitespace. That will be less error-prone than trying to catch it later http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc@477 PS15, Line 477: if (lazy_len > 0 && ParseFormatTokensByStr(&lazy_ctx)) { Looks like you hit upon a different bug here - I can trigger that DCHECK with another query - select cast(' ' as timestamp); I don't think it should affect release builds, but can you file a separate bug to fix that? I think we should fix that separately and add a specific regression text. -- To view, visit http://gerrit.cloudera.org:8080/8936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 8936 Gerrit-PatchSet: 15 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: anujphadke <apha...@cloudera.com> Gerrit-Comment-Date: Fri, 27 Apr 2018 23:27:44 +0000 Gerrit-HasComments: Yes