Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
......................................................................


Patch Set 6:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4968/6//COMMIT_MSG
Commit Message:

PS6, Line 12: did't
> typo
Done


PS6, Line 14: date
> year*
Done


http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 526:     parent_->parse_status_ = Status(*msg);
> the claim is we are pulling this out to avoid status construction code, but
Good point, Done.


PS6, Line 598:  
> nit: extraneous space
Done


Line 599:   if (!src->IsValidDate()) {
> UNLIKELY
Done, (also changed the order of the if statement).


http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

PS6, Line 512: 
> was this an incorrect comment?
I saw a comment in the implementation, so I didn't think it was true any more:
"// AssembleCollection() advances child readers, so we don't need to call 
NextLevels()"

I looked at the implementation, but it turns out AssembleCollection() actually 
calls NextLevels(). I'll put the comment back.


PS6, Line 449: void* slot
> shouldn't this be 'tuple'.  but the fact that this compiled probably means 
Yeah, I don't think this is used anywhere, so I'll delete it.


http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 156:     return date_.day_number() >= MIN_DAY_NUMBER && date_.day_number() 
<= MAX_DAY_NUMBER;
> are we sure these are the only invalidate TimestampValue's that may have be
Yes, it turns out there are (but they don't cause a crash). I added validation 
of time_ as well.


http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/bad_parquet_data/README
File testdata/bad_parquet_data/README:

PS6, Line 16: out-of-range-timestamp.parq
> not sure what you mean -- aren't you adding a parquet file to test this? co
I'm adding a file to testdata/data, this readme is in testdata/bad_parquet_data


http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
File 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test:

PS6, Line 7: Parquet file '$NAM
> why is the message repeated?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <laszlo.g...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to