[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-04-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sat, 02 Apr 2022 03:26:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-04-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..

IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

The current calculation of LastRowIdxInCurrentPage() is incorrect. It
uses the first row index of the next candidate page instead of the next
valid page. Since the next candidate page can be far away from the
current page, the result can point past the last row of the current
page, and skipping rows in the current page could then overflow its
boundary. This patch fixes LastRowIdxInCurrentPage() to use the next
valid page.
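To make the first fix concrete, here is a minimal Python model of the two calculations. It is a sketch, not Impala's C++ code. `PageLocation`, `last_row_idx_buggy()` and `last_row_idx_fixed()` are hypothetical names. The `valid` flag only stands in for whatever check the reader uses to decide that a page location is valid. `candidate_pages` lists the pages that survived page filtering.

```python
from typing import NamedTuple


class PageLocation(NamedTuple):
    """Hypothetical, simplified stand-in for a Parquet offset-index entry."""
    first_row_index: int  # first row of the page within the row group
    valid: bool = True    # assumption: models the reader's page-validity check


def last_row_idx_buggy(pages, candidate_pages, candidate_idx, num_rows):
    """Pre-fix behaviour: bounds the current page by the next *candidate* page."""
    if candidate_idx + 1 < len(candidate_pages):
        return pages[candidate_pages[candidate_idx + 1]].first_row_index - 1
    return num_rows - 1


def last_row_idx_fixed(pages, candidate_pages, candidate_idx, num_rows):
    """Post-fix behaviour: bounds the current page by the next *valid* page,
    whether or not that page survived page filtering."""
    for page in pages[candidate_pages[candidate_idx] + 1:]:
        if page.valid:
            return page.first_row_index - 1
    return num_rows - 1


# Four 100-row pages; page filtering keeps only pages 0 and 3.
pages = [PageLocation(0), PageLocation(100), PageLocation(200), PageLocation(300)]
candidates = [0, 3]
print(last_row_idx_buggy(pages, candidates, 0, 400))  # -> 299
print(last_row_idx_fixed(pages, candidates, 0, 400))  # -> 99
```

With pages 1 and 2 filtered out, the pre-fix variant in this model reports row 299 as the end of page 0. That is 200 rows past the page's real last row (99).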

When skip_row_id is set (>0), SkipRowsInternal() currently expects to
jump to the page containing that row and then skip rows within that
page. However, the target row might not be in any of the candidate
pages. In that case, jumping to the next candidate page already moves
past the target row, so we don't need to skip rows in the current
page.
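The second fix can be modelled the same way. This is again a hypothetical Python sketch, not the actual SkipRowsInternal(). It makes three assumptions:

- Every page is valid.
- `first_rows[i]` is the first row index of page i.
- The goal is to resume reading at `skip_row_id`.

The reader jumps to the first candidate page that ends after the target row. If that page starts after the target, the row was in a filtered-out page and the jump has already passed it. The number of rows to skip inside the page is therefore clamped to zero.

```python
def page_end(first_rows, page, num_rows):
    """Exclusive end row of `page` (every page is assumed valid here)."""
    return first_rows[page + 1] if page + 1 < len(first_rows) else num_rows


def jump_to_row(first_rows, candidate_pages, skip_row_id, num_rows):
    """Return (page, rows_to_skip_in_page), or None if no candidate page is left.

    Hypothetical model: `candidate_pages` is sorted and lists the pages that
    survived page filtering.
    """
    for page in candidate_pages:
        if page_end(first_rows, page, num_rows) > skip_row_id:
            # If the page starts after skip_row_id, the target row was in a
            # filtered-out page: the jump itself already skipped it.
            return page, max(0, skip_row_id - first_rows[page])
    return None


first_rows = [0, 100, 200, 300]  # four 100-row pages
candidates = [0, 3]              # pages 1 and 2 were filtered out
print(jump_to_row(first_rows, candidates, 50, 400))   # -> (0, 50)
print(jump_to_row(first_rows, candidates, 150, 400))  # -> (3, 0)
print(jump_to_row(first_rows, candidates, 350, 400))  # -> (3, 50)
```

In this model, dropping the `max(0, ...)` clamp would make the second call ask to skip 150 - 300 = -150 rows in page 3.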

Tests:
 - Add a test on alltypes_empty_pages to reveal the bug.
 - Add more batch_size values in test_page_index.
 - Pass tests/query_test/test_parquet_stats.py locally.

Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Reviewed-on: http://gerrit.cloudera.org:8080/18372
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test
M tests/query_test/test_parquet_stats.py
7 files changed, 97 insertions(+), 30 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-04-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 5: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 01 Apr 2022 22:56:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-04-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8008/ 
DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 01 Apr 2022 22:56:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-04-01 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 4: Code-Review+2

Thanks for adding more tests!


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 01 Apr 2022 15:23:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10377/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 01 Apr 2022 03:24:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-03-31 Thread Quanlong Huang (Code Review)
Hello Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18372

to look at the new patch set (#4).

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..

IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

The current calculation of LastRowIdxInCurrentPage() is incorrect. It
uses the first row index of the next candidate page instead of the next
valid page. Since the next candidate page can be far away from the
current page, the result can point past the last row of the current
page, and skipping rows in the current page could then overflow its
boundary. This patch fixes LastRowIdxInCurrentPage() to use the next
valid page.

When skip_row_id is set (>0), SkipRowsInternal() currently expects to
jump to the page containing that row and then skip rows within that
page. However, the target row might not be in any of the candidate
pages. In that case, jumping to the next candidate page already moves
past the target row, so we don't need to skip rows in the current
page.

Tests:
 - Add a test on alltypes_empty_pages to reveal the bug.
 - Add more batch_size values in test_page_index.
 - Pass tests/query_test/test_parquet_stats.py locally.

Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test
M tests/query_test/test_parquet_stats.py
7 files changed, 97 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/18372/4
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10375/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 01 Apr 2022 01:52:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-03-31 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 2:

(2 comments)

Thanks for the quick review! I added one more test in PS2.

http://gerrit.cloudera.org:8080/#/c/18372/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18372/1//COMMIT_MSG@12
PS1, Line 12: curre
> nit: current?
Done


http://gerrit.cloudera.org:8080/#/c/18372/1//COMMIT_MSG@18
PS1, Line 18: row
> nit: row?
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 01 Apr 2022 01:33:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-03-31 Thread Quanlong Huang (Code Review)
Hello Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18372

to look at the new patch set (#2).

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..

IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

The current calculation of LastRowIdxInCurrentPage() is incorrect. It
uses the first row index of the next candidate page instead of the next
valid page. Since the next candidate page can be far away from the
current page, the result can point past the last row of the current
page, and skipping rows in the current page could then overflow its
boundary. This patch fixes LastRowIdxInCurrentPage() to use the next
valid page.

When skip_row_id is set (>0), SkipRowsInternal() currently expects to
jump to the page containing that row and then skip rows within that
page. However, the target row might not be in any of the candidate
pages. In that case, jumping to the next candidate page already moves
past the target row, so we don't need to skip rows in the current
page.

Tests:
 - Add a test on alltypes_empty_pages to reveal the bug.
 - Add more batch_size values in test_page_index.
 - Pass tests/query_test/test_parquet_stats.py locally.

Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test
M tests/query_test/test_parquet_stats.py
7 files changed, 92 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/18372/2
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 1: Verified+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 31 Mar 2022 14:55:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-03-31 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 1: Code-Review+2

(2 comments)

Thanks for fixing this so quickly. LGTM!

http://gerrit.cloudera.org:8080/#/c/18372/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18372/1//COMMIT_MSG@12
PS1, Line 12: extra
nit: current?


http://gerrit.cloudera.org:8080/#/c/18372/1//COMMIT_MSG@18
PS1, Line 18: page
nit: row?



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 31 Mar 2022 14:11:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7993/ 
DRY_RUN=true


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 31 Mar 2022 10:20:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18372 )

Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10370/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 31 Mar 2022 10:16:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

2022-03-31 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18372


Change subject: IMPALA-11039: Fix incorrect page jumping in late 
materialization of Parquet
..

IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

The current calculation of LastRowIdxInCurrentPage() is incorrect. It
uses the first row index of the next candidate page instead of the next
valid page. The next candidate page could be far away from the current
page. Thus giving a number larger than the extra page size. Skipping
rows in the current page could overflow the boundary due to this. This
patch fixes LastRowIdxInCurrentPage() to use the next valid page.

When skip_row_id is set (>0), the current approach of
SkipRowsInternal() expects jumping to a page containing this row
and then skipping rows in that page. However, the expected page might
not be in the candidate pages. When we jump to the next candidate page,
the target row could already be skipped. In this case, we don’t need to
skip rows in the current page.

Tests:
 - Add more batch_size values in test_page_index
 - Pass tests/query_test/test_parquet_stats.py locally.

Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M tests/query_test/test_parquet_stats.py
6 files changed, 33 insertions(+), 27 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/18372/1
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Gerrit-Change-Number: 18372
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang