Dan Hecht has posted comments on this change. Change subject: IMPALA-4363: Add Parquet timestamp validation ......................................................................
Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 473: ErrorMsg msg; this has a non-trivial destructor, so we should probably only construct it after checking NeedsValidation(). Also, why are NeedsValidation() and NeedsConversion() mutually exclusive? Actually, doesn't this break --convert_legacy_hive_parquet_utc_timestamps since we'll never take the conversion path for timestamp now? (Do we not have tests for that?) That is, I think both NeedsConversion() and NeedsValidation() can be true, and the code needs to be adjusted for that. Also consider tucking lines 475 to 480 inside ValidateSlot() and that way the ErrorMsg doesn't have to be passed back and forth, and it'll naturally keep the construction after the NeedsValidation() check. PS7, Line 474: val_ptr you'll also have to be careful that we pass the right pointer here, depending on how you address the comment above. http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 147: /// Verifies that the timestamp date falls into a valid range (years 1400..9999). update this comment to be clear that we also validate time. PS7, Line 156: l doesn't this have to be "ll" (long long)? PS7, Line 160: time_.ticks() < NUMBER_OF_NANOSECONDS_IN_24H I'm slightly worried about how this interacts with leap seconds. Are you sure this is always illegal? If it doesn't crash, what behavior did you see (i.e. how do you know this is the correct bounds)? -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Laszlo Gaal <laszlo.g...@cloudera.com> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-HasComments: Yes