Dan Hecht has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet 
files
......................................................................


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5400/8/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

PS8, Line 315: is poorly formatted
this seems a bit strong and could be misread as saying that the file causes an 
error. The Parquet file is valid; it's just not optimally aligned with HDFS 
blocks for performance.

"Parquet file '$0': Row group size doesn't align with HDFS block size, 
potentially resulting in decreased scan performance."

or something like that.


PS8, Line 319: fs.s3a.block.size
this is a global option, though, so it might not be possible to match the size 
of all the files (they may have mismatched row groups).  Also, the person 
executing the query may not be the administrator of the system.

Instead, maybe we can just hint strongly enough at the solution:

"Parquet file '$0': Row group size doesn't match the S3A blocksize 
(fs.s3a.block.size), potentially resulting in decreased scan performance."

or similar.

Also, it may help to include the actual value of fs.s3a.block.size (and 
similarly the HDFS blocksize) in the message to help diagnose the problem.
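
A rough sketch of what a message carrying the actual sizes could look like. 
The template wording, the argument order ($1 = row group size, $2 = block 
size) and the format_message() helper are all assumptions for illustration, 
not the existing error-code machinery:

```python
import re

# Hypothetical template: wording and argument order are assumptions.
ROW_GROUP_MISALIGNED = (
    "Parquet file '$0': Row group size ($1 bytes) doesn't match the S3A "
    "blocksize (fs.s3a.block.size = $2 bytes), potentially resulting in "
    "decreased scan performance."
)


def format_message(template, *args):
    """Substitute $0, $1, ... placeholders with the positional arguments."""
    def substitute(match):
        # match.group(1) holds the digits after '$', so $10 is not
        # confused with $1 followed by a literal 0.
        return str(args[int(match.group(1))])
    return re.sub(r"\$(\d+)", substitute, template)


# Example: a 128MB row group read with a 32MB S3A block size.
message = format_message(
    ROW_GROUP_MISALIGNED, "s3a://bucket/tbl/f.parq", 134217728, 33554432)
```

With both numbers in the text, someone who can't change the global setting 
can at least see how far off the file is.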


http://gerrit.cloudera.org:8080/#/c/5400/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 314:   @SkipIfS3.hdfs_block_size
why not make this test work for S3?


PS8, Line 327: hdfs://localhost:20500
the hard-coded prefix here (and in other places) won't work for S3 (and other 
non-HDFS) test setups. Use filesystem_prefix() instead.
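
Roughly the shape I have in mind, as a self-contained sketch. The 
FILESYSTEM_PREFIX environment variable, the stand-in filesystem_prefix() 
below and the table_location() helper are assumptions for illustration; the 
real helper in the test framework may look different:

```python
import os


def filesystem_prefix(env=None):
    """Stand-in helper: return the configured filesystem prefix, or ''.

    Assumes the prefix comes from a FILESYSTEM_PREFIX environment variable
    (e.g. 's3a://bucket'); an empty prefix means the default filesystem.
    """
    env = os.environ if env is None else env
    return env.get("FILESYSTEM_PREFIX", "")


def table_location(relative_path, env=None):
    """Build a location from the prefix instead of a hard-coded HDFS URI."""
    return filesystem_prefix(env) + "/" + relative_path.lstrip("/")


# Hypothetical path, used only to show the two cases.
s3_path = table_location(
    "/test-warehouse/some_table", {"FILESYSTEM_PREFIX": "s3a://bucket"})
default_path = table_location("/test-warehouse/some_table", {})
```

The expected paths in the test are then built from the prefix, so the same 
assertions can run against HDFS, S3 or any other configured filesystem.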


-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-HasComments: Yes
