[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

IMPALA-7543 introduced sub-ranges in scan ranges. These are the smaller parts of a scan range that actually need to be read; the other parts of the scan range can be skipped. Currently, sub-ranges are only used in the Parquet scanner during page filtering.

With sub-ranges, the scan range has a new field, 'bytes_to_read_', which is the sum of the lengths of the sub-ranges. If there are no sub-ranges, 'bytes_to_read_' equals the field 'len_', which is the length of the whole scan range.

In some parts of Impala, ScanRange::len() is used instead of ScanRange::bytes_to_read(). This doesn't cause a bug today because only the Parquet scanner uses sub-ranges, i.e. bytes_to_read() usually equals len(). The Parquet scanner also doesn't hit the bug because it tracks which pages it reads. However, leaving the invocations of len() in place of bytes_to_read() is a potential source of future bugs. Also, the scanners might allocate more memory than needed.

In a couple of places we still need to invoke len(), e.g. when we test scan-range containment (for local splits), or when we test whether a scan range contains the mid-point of a Parquet row group.

Testing:
Added a scanner reservation test.
Ran the exhaustive tests.
Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Reviewed-on: http://gerrit.cloudera.org:8080/14348 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/scanner-context.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/scan-range.cc M testdata/workloads/functional-query/queries/QueryTest/scanner-reservation.test 8 files changed, 24 insertions(+), 8 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
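The relationship between 'len_' and 'bytes_to_read_' described in the commit message above can be sketched roughly as follows. This is a simplified model, not Impala's actual ScanRange class: the names SimpleScanRange and SubRange, the constructor signature, and the choice to compute the sum at construction time are all assumptions made for illustration. Only the two accessor names, len() and bytes_to_read(), and the two field names come from the commit message.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sub-range: a byte offset and a length.
struct SubRange {
  int64_t offset;
  int64_t length;
};

// Simplified, hypothetical model of a scan range (not Impala's real class).
class SimpleScanRange {
 public:
  SimpleScanRange(int64_t offset, int64_t len, std::vector<SubRange> sub_ranges = {})
      : offset_(offset), len_(len), sub_ranges_(std::move(sub_ranges)) {
    if (sub_ranges_.empty()) {
      // No sub-ranges: the whole scan range has to be read.
      bytes_to_read_ = len_;
    } else {
      // With sub-ranges: only the sub-ranges are read, so sum their lengths.
      for (const SubRange& sr : sub_ranges_) bytes_to_read_ += sr.length;
    }
  }

  int64_t offset() const { return offset_; }
  // Length of the whole scan range, including parts that may be skipped.
  int64_t len() const { return len_; }
  // Number of bytes that actually need to be read.
  int64_t bytes_to_read() const { return bytes_to_read_; }

 private:
  int64_t offset_;
  int64_t len_;
  std::vector<SubRange> sub_ranges_;
  int64_t bytes_to_read_ = 0;
};
```

In this model, a 1000-byte range with sub-ranges of 50 and 200 bytes reports len() == 1000 but bytes_to_read() == 250, which is the difference that matters when sizing buffers or reservations.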
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Oct 2019 17:57:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5098/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Oct 2019 13:42:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Oct 2019 13:42:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 15 Oct 2019 17:20:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4783/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 11 Oct 2019 12:24:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 3: In PS3 I removed the check for ParquetRowGroupActualReservation because it is irrelevant to this change and because I got different values in my dev environment than in the test environment. -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 11 Oct 2019 11:29:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14348 to look at the new patch set (#3).

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

IMPALA-7543 introduced sub-ranges in scan ranges. These are the smaller parts of a scan range that actually need to be read; the other parts of the scan range can be skipped. Currently, sub-ranges are only used in the Parquet scanner during page filtering.

With sub-ranges, the scan range has a new field, 'bytes_to_read_', which is the sum of the lengths of the sub-ranges. If there are no sub-ranges, 'bytes_to_read_' equals the field 'len_', which is the length of the whole scan range.

In some parts of Impala, ScanRange::len() is used instead of ScanRange::bytes_to_read(). This doesn't cause a bug today because only the Parquet scanner uses sub-ranges, i.e. bytes_to_read() usually equals len(). The Parquet scanner also doesn't hit the bug because it tracks which pages it reads. However, leaving the invocations of len() in place of bytes_to_read() is a potential source of future bugs. Also, the scanners might allocate more memory than needed.

In a couple of places we still need to invoke len(), e.g. when we test scan-range containment (for local splits), or when we test whether a scan range contains the mid-point of a Parquet row group.

Testing:
Added a scanner reservation test.
Ran the exhaustive tests.
Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/scanner-context.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/scan-range.cc M testdata/workloads/functional-query/queries/QueryTest/scanner-reservation.test 8 files changed, 24 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/14348/3 -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 2: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5045/ -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 Oct 2019 16:29:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5045/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 Oct 2019 12:12:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 Oct 2019 12:12:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 02 Oct 2019 19:58:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14348 ) Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len() .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4700/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 02 Oct 2019 13:27:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

IMPALA-7543 introduced sub-ranges in scan ranges. These are the smaller parts of a scan range that actually need to be read; the other parts of the scan range can be skipped. Currently, sub-ranges are only used in the Parquet scanner during page filtering.

With sub-ranges, the scan range has a new field, 'bytes_to_read_', which is the sum of the lengths of the sub-ranges. If there are no sub-ranges, 'bytes_to_read_' equals the field 'len_', which is the length of the whole scan range.

In some parts of Impala, ScanRange::len() is used instead of ScanRange::bytes_to_read(). This doesn't cause a bug today because only the Parquet scanner uses sub-ranges, i.e. bytes_to_read() usually equals len(). The Parquet scanner also doesn't hit the bug because it tracks which pages it reads. However, leaving the invocations of len() in place of bytes_to_read() is a potential source of future bugs. Also, the scanners might allocate more memory than needed.

In a couple of places we still need to invoke len(), e.g. when we test scan-range containment (for local splits), or when we test whether a scan range contains the mid-point of a Parquet row group.

Testing:
Added a scanner reservation test.
Ran the exhaustive tests.
Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/scanner-context.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/scan-range.cc M testdata/workloads/functional-query/queries/QueryTest/scanner-reservation.test 8 files changed, 24 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/14348/1 -- To view, visit http://gerrit.cloudera.org:8080/14348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c Gerrit-Change-Number: 14348 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
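The commit message names two checks that still need len(): scan-range containment and the Parquet row-group mid-point test. Both ask where a range sits in the file, not how many bytes will be read, so the full extent is the right quantity. A minimal sketch of what such checks might look like follows. ByteRange, Contains, and ContainsMidPoint are hypothetical names, and treating the range as the half-open interval [offset, offset + len) is an assumption, not something the thread states.

```cpp
#include <cstdint>

// Hypothetical plain description of a byte range in a file.
struct ByteRange {
  int64_t offset;
  int64_t len;  // full extent of the range, i.e. what len() would return
};

// Returns true if 'inner' lies entirely within 'outer'.
// Uses the full extent (len), since containment is about file positions.
bool Contains(const ByteRange& outer, const ByteRange& inner) {
  return inner.offset >= outer.offset &&
         inner.offset + inner.len <= outer.offset + outer.len;
}

// Returns true if the mid-point of a row group falls inside 'split'.
// Assumes 'split' covers the half-open interval [offset, offset + len).
bool ContainsMidPoint(const ByteRange& split, int64_t row_group_start,
                      int64_t row_group_len) {
  const int64_t mid = row_group_start + row_group_len / 2;
  return mid >= split.offset && mid < split.offset + split.len;
}
```

Substituting a bytes_to_read()-style value for len in either function would shrink the apparent extent of a range that has sub-ranges and could give wrong answers, which is consistent with the commit message leaving these call sites unchanged.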