Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4363: Add Parquet timestamp validation ......................................................................
Patch Set 6: (10 comments) http://gerrit.cloudera.org:8080/#/c/4968/6//COMMIT_MSG Commit Message: PS6, Line 12: did't > typo Done PS6, Line 14: date > year* Done http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 526: parent_->parse_status_ = Status(*msg); > the claim is we are pulling this out to avoid status construction code, but Good point, Done. PS6, Line 598: > nit: extraneous space Done Line 599: if (!src->IsValidDate()) { > UNLIKELY Done, (also changed the order of the if statement). http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: PS6, Line 512: > was this an incorrect comment? I saw a comment in the implementation, so I didn't think it was true any more: "// AssembleCollection() advances child readers, so we don't need to call NextLevels()" I looked at the implementation, but it turns out AssembleCollection() actually calls NextLevels(). I'll put the comment back. PS6, Line 449: void* slot > shouldn't this be 'tuple'. but the fact that this compiled probably means Yeah, I don't think this is used anywhere, so I'll delete it. http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 156: return date_.day_number() >= MIN_DAY_NUMBER && date_.day_number() <= MAX_DAY_NUMBER; > are we sure these are the only invalidate TimestampValue's that may have be Yes, it turns out there are (but they don't cause a crash). I added validation of time_ as well. http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/bad_parquet_data/README File testdata/bad_parquet_data/README: PS6, Line 16: out-of-range-timestamp.parq > not sure what you mean -- aren't you adding a parquet file to test this? co I'm adding a file to testdata/data, this readme is in testdata/bad_parquet_data http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test File testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test: PS6, Line 7: Parquet file '$NAM > why is the message repeated? Done -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Laszlo Gaal <laszlo.g...@cloudera.com> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-HasComments: Yes