Pooja Nilangekar has posted comments on this change. ( http://gerrit.cloudera.org:8080/11517 )
Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files
......................................................................


Patch Set 3:

> Patch Set 3:
>
> The discussion looks like it's not easy to add the tests on HDFS. Should we
> try S3 then?

Hi Lars,

I have tried testing on S3 and did see a reduction in query time. Querying the tpch_avro.lineitem table, the query timeline showed 8.66s without the patch vs 5.87s with it, and the scanner I/O wait time went down from 2.371s to 0.851s. However, I couldn't find a deterministic counter that can be verified every time the test runs.

Tim suggested adding a counter that tracks the number of scan ranges read by each scan node. I was thinking I'd add that in a separate patch and then use it in the test here. Do you have any other suggestions?

Thanks,
Pooja

-- 
To view, visit http://gerrit.cloudera.org:8080/11517

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Gerrit-Change-Number: 11517
Gerrit-PatchSet: 3
Gerrit-Owner: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 02 Jan 2019 18:44:14 +0000
Gerrit-HasComments: No
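A rough, standalone sketch of the kind of counter Tim suggested: a per-scan-node count of scan ranges read. Every name here (ScanRange, ScanNodeCounters, ReadAllRanges) is a hypothetical placeholder and not Impala's actual RuntimeProfile or scanner API. The counter is atomic on the assumption that several scanner threads may finish ranges concurrently. The point is only that, unlike wall-clock timings such as 8.66s vs 5.87s, a range count is the same on every run and can be asserted exactly.

```cpp
// Hypothetical sketch only: these types are NOT Impala APIs.
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Placeholder for a scan range: a byte range within one file.
struct ScanRange {
  std::string file;
  int64_t offset;
  int64_t length;
};

// Placeholder per-scan-node counters. Atomic because (by assumption)
// multiple scanner threads may report completed ranges at the same time.
class ScanNodeCounters {
 public:
  void OnScanRangeRead() {
    scan_ranges_read_.fetch_add(1, std::memory_order_relaxed);
  }
  int64_t scan_ranges_read() const {
    return scan_ranges_read_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<int64_t> scan_ranges_read_{0};
};

// Simulated scan: "reads" each range and bumps the counter once per range.
void ReadAllRanges(const std::vector<ScanRange>& ranges,
                   ScanNodeCounters* counters) {
  for (const ScanRange& range : ranges) {
    // A real scanner would issue I/O for
    // [range.offset, range.offset + range.length) here.
    (void)range;
    counters->OnScanRangeRead();
  }
}
```

With something like this in place, a test could assert an exact range count for a table with a known file layout instead of comparing timings, which vary from run to run on S3.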