[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..

IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

IMPALA-7543 introduced sub-ranges in scan ranges. These are the smaller
parts of a scan range that actually need to be read; the other parts
of the scan range can be skipped. Currently sub-ranges are only used
in the Parquet scanner during page filtering.

With sub-ranges, the scan range has a new field, 'bytes_to_read_', which
is the sum of the lengths of the sub-ranges. If there are no
sub-ranges, 'bytes_to_read_' equals the field 'len_', which is the
length of the whole scan range.
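
The relationship between 'len_' and 'bytes_to_read_' described above can be sketched as follows. This is only an illustration under stated assumptions: SimpleScanRange and SubRange are hypothetical stand-ins, not Impala's actual ScanRange class, and only the accessor names len() and bytes_to_read() are taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sub-range: an offset and a length within the file.
struct SubRange {
  int64_t offset;
  int64_t length;
};

// Simplified model of a scan range (an assumption, not Impala's real class).
class SimpleScanRange {
 public:
  SimpleScanRange(int64_t offset, int64_t len, std::vector<SubRange> sub_ranges = {})
    : offset_(offset), len_(len), sub_ranges_(std::move(sub_ranges)), bytes_to_read_(0) {
    assert(len_ >= 0);
    if (sub_ranges_.empty()) {
      // No sub-ranges: the whole range has to be read.
      bytes_to_read_ = len_;
    } else {
      // With sub-ranges: only the sum of their lengths has to be read.
      for (const SubRange& sr : sub_ranges_) bytes_to_read_ += sr.length;
    }
  }

  int64_t offset() const { return offset_; }
  int64_t len() const { return len_; }
  int64_t bytes_to_read() const { return bytes_to_read_; }

 private:
  int64_t offset_;
  int64_t len_;
  std::vector<SubRange> sub_ranges_;
  int64_t bytes_to_read_;
};
```

In this model, a 1000-byte range with sub-ranges of 100 and 50 bytes reports len() == 1000 but bytes_to_read() == 150, while the same range without sub-ranges reports 1000 for both.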

In some parts of Impala, ScanRange::len() is used instead of
ScanRange::bytes_to_read(). This doesn't cause a bug today because only
the Parquet scanner uses sub-ranges, i.e. bytes_to_read() usually equals
len(). The Parquet scanner also doesn't hit the bug because it tracks
which pages it reads.

However, leaving the invocations of len() in place of bytes_to_read()
is a potential source of future bugs. Also, the scanners might allocate
more memory than needed. In a couple of places we still need to invoke
len(), e.g. when we test scan-range containment (for local splits), or
when we test whether a scan range contains the mid-point of a Parquet
row group.
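
The distinction drawn above (len() for questions about where a range sits in the file, bytes_to_read() for questions about how much I/O or memory is needed) can be sketched as below. RangeView and the three helper functions are hypothetical names introduced for illustration, and the half-open interval convention is an assumption; none of this is the actual Impala code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of a scan range: only the values needed here.
struct RangeView {
  int64_t offset;         // start of the range in the file
  int64_t len;            // length of the whole range
  int64_t bytes_to_read;  // sum of sub-range lengths, or len if there are none
};

// Geometry: does 'outer' fully cover 'inner' in the file? Depends on len.
bool Contains(const RangeView& outer, const RangeView& inner) {
  return outer.offset <= inner.offset
      && inner.offset + inner.len <= outer.offset + outer.len;
}

// Geometry: does 'split' contain the mid-point of a row group? Depends on len.
// The split is treated as the half-open interval [offset, offset + len).
bool ContainsMidPoint(const RangeView& split, int64_t row_group_offset,
    int64_t row_group_len) {
  int64_t mid = row_group_offset + row_group_len / 2;
  return mid >= split.offset && mid < split.offset + split.len;
}

// I/O sizing: how many buffer bytes are needed, capped at 'max_buffer_size'?
// Depends on bytes_to_read, so skipped parts of the range are not paid for.
int64_t BufferBytesNeeded(const RangeView& range, int64_t max_buffer_size) {
  assert(max_buffer_size > 0);
  return std::min(range.bytes_to_read, max_buffer_size);
}
```

Sizing the buffer from len instead would request 400 bytes for a range whose sub-ranges total only 150 bytes, which is the kind of over-allocation the commit message mentions.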

Testing:
Added a scanner reservation test.
Ran the exhaustive tests.

Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Reviewed-on: http://gerrit.cloudera.org:8080/14348
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/scan-range.cc
M testdata/workloads/functional-query/queries/QueryTest/scanner-reservation.test
8 files changed, 24 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/14348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 4: Verified+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 16 Oct 2019 17:57:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5098/ 
DRY_RUN=false


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 16 Oct 2019 13:42:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 4: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 16 Oct 2019 13:42:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 3: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 15 Oct 2019 17:20:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4783/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 11 Oct 2019 12:24:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-11 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 3:

In PS3 I removed the check for ParquetRowGroupActualReservation because it is 
irrelevant to the changes, and I got different values in my dev environment 
than in the test environment.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 11 Oct 2019 11:29:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-11 Thread Zoltan Borok-Nagy (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14348

to look at the new patch set (#3).

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..

IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

IMPALA-7543 introduced sub-ranges in scan ranges. These are the smaller
parts of a scan range that actually need to be read; the other parts
of the scan range can be skipped. Currently sub-ranges are only used
in the Parquet scanner during page filtering.

With sub-ranges, the scan range has a new field, 'bytes_to_read_', which
is the sum of the lengths of the sub-ranges. If there are no
sub-ranges, 'bytes_to_read_' equals the field 'len_', which is the
length of the whole scan range.

In some parts of Impala, ScanRange::len() is used instead of
ScanRange::bytes_to_read(). This doesn't cause a bug today because only
the Parquet scanner uses sub-ranges, i.e. bytes_to_read() usually equals
len(). The Parquet scanner also doesn't hit the bug because it tracks
which pages it reads.

However, leaving the invocations of len() in place of bytes_to_read()
is a potential source of future bugs. Also, the scanners might allocate
more memory than needed. In a couple of places we still need to invoke
len(), e.g. when we test scan-range containment (for local splits), or
when we test whether a scan range contains the mid-point of a Parquet
row group.

Testing:
Added a scanner reservation test.
Ran the exhaustive tests.

Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/scan-range.cc
M testdata/workloads/functional-query/queries/QueryTest/scanner-reservation.test
8 files changed, 24 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/14348/3
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5045/


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 03 Oct 2019 16:29:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5045/ 
DRY_RUN=false


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 03 Oct 2019 12:12:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 2: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 03 Oct 2019 12:12:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 1: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 Oct 2019 19:58:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14348 )

Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4700/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 Oct 2019 13:27:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

2019-10-02 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14348


Change subject: IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of 
len()
..

IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()

IMPALA-7543 introduced sub-ranges in scan ranges. These are the smaller
parts of a scan range that actually need to be read; the other parts
of the scan range can be skipped. Currently sub-ranges are only used
in the Parquet scanner during page filtering.

With sub-ranges, the scan range has a new field, 'bytes_to_read_', which
is the sum of the lengths of the sub-ranges. If there are no
sub-ranges, 'bytes_to_read_' equals the field 'len_', which is the
length of the whole scan range.

In some parts of Impala, ScanRange::len() is used instead of
ScanRange::bytes_to_read(). This doesn't cause a bug today because only
the Parquet scanner uses sub-ranges, i.e. bytes_to_read() usually equals
len(). The Parquet scanner also doesn't hit the bug because it tracks
which pages it reads.

However, leaving the invocations of len() in place of bytes_to_read()
is a potential source of future bugs. Also, the scanners might allocate
more memory than needed. In a couple of places we still need to invoke
len(), e.g. when we test scan-range containment (for local splits), or
when we test whether a scan range contains the mid-point of a Parquet
row group.

Testing:
Added a scanner reservation test.
Ran the exhaustive tests.

Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/scan-range.cc
M testdata/workloads/functional-query/queries/QueryTest/scanner-reservation.test
8 files changed, 24 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/14348/1
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Gerrit-Change-Number: 14348
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy