Pooja Nilangekar has posted comments on this change.
( http://gerrit.cloudera.org:8080/11517 )

Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with 
many files
......................................................................


Patch Set 3:

> Patch Set 3:
>
> The discussion looks like it's not easy to add the tests on HDFS. Should we 
> try S3 then?

Hi Lars,

I tried testing on S3 and did see a reduction in query time. Querying the 
tpch_avro.lineitem table, the time in the query timeline dropped from 8.66s to 
5.87s, and the scanner I/O wait time went down from 2.371s to 0.851s.

However, I couldn't find a deterministic counter that can be verified each 
time the test is run. Tim suggested adding a counter that tracks the number of 
scan ranges read by each scan node. I plan to add that counter in a separate 
patch and then use it to test this change.
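
To make the idea concrete, here is a rough sketch of how a test could check 
such a counter once it exists. The counter name ScanRangesRead, the helper 
scan_range_counts, and the sample profile text are all hypothetical 
placeholders of mine; nothing like this is in the patch yet, and the real 
profile layout may differ.

```python
import re

# Hypothetical counter name: an assumption for illustration only.
COUNTER_NAME = "ScanRangesRead"


def scan_range_counts(profile_text, counter_name=COUNTER_NAME):
    """Return the integer value of every `counter_name` line in a text profile.

    Assumes counters appear one per line as "- <name>: <integer>".
    """
    pattern = re.compile(
        r"^\s*-\s*" + re.escape(counter_name) + r":\s*(\d+)", re.MULTILINE
    )
    return [int(value) for value in pattern.findall(profile_text)]


# Made-up profile excerpt, not output from a real query.
SAMPLE_PROFILE = """\
HDFS_SCAN_NODE (id=0):
   - ScanRangesRead: 12
   - ScannerIoWaitTime: 851.000ms
HDFS_SCAN_NODE (id=1):
   - ScanRangesRead: 3
"""

counts = scan_range_counts(SAMPLE_PROFILE)
```

Unlike the timings, a count like this should not vary between runs, so a test 
could assert on an exact expected value per scan node.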

Do you have any other suggestions?

Thanks,
Pooja


--
To view, visit http://gerrit.cloudera.org:8080/11517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Gerrit-Change-Number: 11517
Gerrit-PatchSet: 3
Gerrit-Owner: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 02 Jan 2019 18:44:14 +0000
Gerrit-HasComments: No
