Attila Jeges has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................
Patch Set 9:

> > (1 comment)
> >
> > > (2 comments)
> > >
> > > If you want to add the config values to the message, I can take
> > > another look. Otherwise, this looks okay. Did you run the S3
> > > tests to make sure it works?
> >
> > I've tested with S3 today and the
> > 'test_misaligned_parquet_row_groups()' test does not work. This is
> > probably expected.
> >
> > The parquet files are copied to the destination file system with
> > the following command (create-load-data.sh):
> >   hadoop fs -Ddfs.block.size=1048576 -put -f <localsrc> <dst>
> >
> > It sets dfs.block.size to 1MB to make sure that some row groups in
> > the parquet files span across block boundaries and thus the files
> > are "poorly formatted". This doesn't seem to be working with S3. I
> > tried using -Dfs.s3a.block.size=1048576 but it didn't work either.
> >
> > So, probably we should skip the test when the file system is not
> > HDFS. What do you think?
>
> Hmm, yeah I guess we'd have to run this as a custom cluster test so
> that we can set the fs.s3a.block.size hadoop config value for the
> s3a connector to pick up. I'm a bit worried about checking this in
> without any kind of testing on S3. Is there some easy manual
> testing you could at least do (or try doing it as a custom cluster
> test)?
>
> This is also why I'm a bit worried about making this a warning
> rather than just a profile message -- the person running queries may
> not be able to do anything to "fix" the warning. In the case of S3,
> they really need help from the cluster administrator. For that
> (and other reasons), the message is not always actionable, and it
> seems like warnings should always be actionable. What do you
> think?

I agree. I removed the warning message and kept only the counter in the
profile. I also did some manual testing with S3 to make sure that the
'NumScannersWithNoReads' counters are set properly.
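
For anyone following the thread, here is a minimal sketch of what "row groups span across block boundaries" means with the 1MB block size used above. It is only an illustration and not code from the patch: the helper names (spans_block_boundary, count_misaligned) and the sample (offset, length) pairs are hypothetical, and it assumes a row group is simply a contiguous byte range in the file.

```python
# Illustration only: hypothetical helpers, not part of Impala or the patch.
BLOCK_SIZE = 1048576  # 1MB, the dfs.block.size value passed to 'hadoop fs -put'


def spans_block_boundary(offset, length, block_size=BLOCK_SIZE):
    """Return True if the byte range [offset, offset + length) touches
    more than one block of size block_size."""
    if length <= 0:
        return False
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return first_block != last_block


def count_misaligned(row_groups, block_size=BLOCK_SIZE):
    """Count the (offset, length) row groups that cross a block boundary."""
    return sum(
        1 for offset, length in row_groups
        if spans_block_boundary(offset, length, block_size)
    )


# Made-up row group layout: (start offset in bytes, length in bytes).
sample_row_groups = [
    (4, 700000),        # fits entirely in block 0
    (700004, 700000),   # starts in block 0, ends in block 1
    (1400004, 600000),  # fits entirely in block 1
]
misaligned = count_misaligned(sample_row_groups)
```

With this made-up layout only the second row group crosses a boundary. A larger block size on the destination file system would make the same file look well formatted, which fits the observation above that the block size setting is what makes the test files "poorly formatted".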
--
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-HasComments: No