Philip Zeyliger has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11517 )
Change subject: IMPALA-6932: Speed up scans for sequence datasets with many files
......................................................................

IMPALA-6932: Speed up scans for sequence datasets with many files

This change addresses the slow scans of sequence datasets with many
files by enqueueing the scan ranges to the head of the disk IO queue
instead of the tail. This ensures that the data ranges get priority
over headers of other files. Hence it produces results earlier for
limit queries.

Testing:
Added a unit test to verify that the expected elements are dequeued
from the front.
Tested the performance of this patch on S3 to emulate remote reads.
The following query was executed several times:
"SELECT * FROM TPCH_AVRO.LINEITEM LIMIT 1;"
The average timeline difference was 8.66s vs 5.87s. The scanner I/O
wait time went down from 9.85s to 2.37s.
Tested the patch with backend and end-to-end tests.

Single node performance test results:
+----------+--------------------+---------+------------+------------+----------------+
| Workload | File Format        | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+--------------------+---------+------------+------------+----------------+
| TPCH(50) | avro / none / none | 65.62   | -0.38%     | 43.51      | -0.79%         |
+----------+--------------------+---------+------------+------------+----------------+

Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Reviewed-on: http://gerrit.cloudera.org:8080/11517
Reviewed-by: Philip Zeyliger <phi...@cloudera.com>
Tested-by: Philip Zeyliger <phi...@cloudera.com>
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/util/internal-queue-test.cc
M be/src/util/internal-queue.h
13 files changed, 159 insertions(+), 115 deletions(-)

Approvals:
  Philip Zeyliger: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Gerrit-Change-Number: 11517
Gerrit-PatchSet: 7
Gerrit-Owner: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
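
Illustration (not part of the patch): the commit message describes enqueueing data ranges at the head of the disk IO queue so that they are served before the headers of other files. The sketch below shows only that ordering effect on a toy queue. ToyRange, ToyRangeQueue and DrainOrder are hypothetical names invented for this example, and the file names are made up; none of this is the InternalQueue or RequestContext code changed by the patch.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a scan range; not Impala's ScanRange class.
struct ToyRange {
  std::string file;
  bool is_header;  // true for a file-header read, false for a data read
};

// Hypothetical queue that supports insertion at either end.
class ToyRangeQueue {
 public:
  // Adds a range at the head when at_front is true, otherwise at the tail.
  void Enqueue(const ToyRange& range, bool at_front) {
    if (at_front) {
      ranges_.push_front(range);
    } else {
      ranges_.push_back(range);
    }
  }

  // Removes and returns the range at the head. The queue must not be empty.
  ToyRange Dequeue() {
    assert(!ranges_.empty());
    ToyRange next = ranges_.front();
    ranges_.pop_front();
    return next;
  }

  bool empty() const { return ranges_.empty(); }

 private:
  std::deque<ToyRange> ranges_;
};

// Queues header reads for three files, reads the first header, enqueues the
// data range it reveals (at the head or the tail), and returns the order in
// which all ranges end up being dequeued.
std::vector<std::string> DrainOrder(bool data_at_front) {
  ToyRangeQueue queue;
  for (const char* file : {"a.avro", "b.avro", "c.avro"}) {
    queue.Enqueue({file, true}, /*at_front=*/false);
  }
  std::vector<std::string> order;
  ToyRange first = queue.Dequeue();
  order.push_back(first.file + ":header");
  queue.Enqueue({first.file, false}, data_at_front);
  while (!queue.empty()) {
    ToyRange r = queue.Dequeue();
    order.push_back(r.file + (r.is_header ? ":header" : ":data"));
  }
  return order;
}
```

With data_at_front set to true, the data range of the first file is dequeued immediately after its header; with false, it waits behind the headers of the other two files. This is the reason a LIMIT query can return its first rows sooner.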