Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files ......................................................................
Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/5400/4/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 75: bool poorly_formatted_file_warned; Presumably you did this to prevent excessive logging, but I think its unnecessary. Removing it won't result in excess logging because error messages are grouped by their error code (eg. in the shell you'll just see one instance of the message followed by '(1 of X similar)', and its better not to clutter up this struct. If you're still worried about excessive logging, you could move the call to LogPoorlyFormattedParquetFileWarning to the end of NextRowGroup outside of the 'while(true)'. In fact, if you remove this, then LogPoorlyFormattedParquetFileWarning isn't really doing much and you could eliminate it and just do the logging inline in NextRowGroup. http://gerrit.cloudera.org:8080/#/c/5400/4/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: PS4, Line 315: Remove spaces, here and below. Sorry, I know we're inconsistent about our Python style, but we try to mostly follow PEP8: https://www.python.org/dev/peps/pep-0008/#indentation http://gerrit.cloudera.org:8080/#/c/5400/4/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS4, Line 319: 'NumScannersWithMisalignedRowGroups' NumScannersWithNoReads PS4, Line 321: """ Move """ to next line. Another place where we're inconsistent with our Python style: https://www.python.org/dev/peps/pep-0008/#documentation-strings PS4, Line 347: num_scanners_misaligned_row_groups num_scanners_with_no_reads PS4, Line 349: excuting executing PS4, Line 351: 'num_scanners_misaligned_row_groups' num_scanners_with_no_reads PS4, Line 352: """ Move to next line. PS4, Line 362: 'NumScannersWithMisalignedRowGroups: NumScannersWithNoReads -- To view, visit http://gerrit.cloudera.org:8080/5400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-HasComments: Yes