Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
......................................................................


Patch Set 15:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@91
PS15, Line 91: excceeded
typo: exceeded


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@91
PS15, Line 91:  8 million limit"
Let's use the constant instead of hardcoding it in the error message so that 
they can't get out of sync.


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@545
PS15, Line 545:     if (max_tuples < 0) {
It seems like this is catching the problem too late. How does it get into this 
invalid state? The problem is in our own code so we shoudl fix it at the source 
rather than trying to catch it here.


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@588
PS15, Line 588:         if (col_end > row_group_end || column.start_offset < 0 
|| column.buffer_pos < 0
I think we should catch the invalid column metadata when we read it in 
GetCurrentKeyBuffer(), rather than letting things get into an invalid state and 
trying to catch it later.


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc@370
PS15, Line 370:   if (UNLIKELY(str == NULL || len <= 0)) {
We're already checking len here but the problem is that we're trimming 
whitespace below. I think the right fix is to check if (len <= 0) immediately 
after stripping leading and trailing whitespace. That will be less error-prone 
than trying to catch it later


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc@477
PS15, Line 477:     if (lazy_len > 0 && ParseFormatTokensByStr(&lazy_ctx)) {
Looks like you hit upon a different bug here - I can trigger that DCHECK with 
another query -

  select cast(' ' as timestamp);

I don't think it should affect release builds, but can you file a separate bug 
to fix that? I think we should fix that separately and add a specific 
regression text.



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 15
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: anujphadke <apha...@cloudera.com>
Gerrit-Comment-Date: Fri, 27 Apr 2018 23:27:44 +0000
Gerrit-HasComments: Yes

Reply via email to