[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Csaba Ringhofer has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader Moved some responsibilities from parquet-column-readers.cc to a new class 'ParquetPageReader': - reading pages from ScanRange - decompress data if needed The main motivation is to make the implementation of V2 data page reading simpler by moving most parts that will differ between V1 and V2 into a class with manageable complexity. TODOs: - The current implementation tries to change the existing logic as little as possible. The interface of ParquetPageReader could be simplified by changing the logic a bit e.g. reorder handling of errors. - The comments could be extended / polished once the interface is more or less final. Testing: - ran parquet related scanner tests Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 --- M be/src/exec/parquet/CMakeLists.txt M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-readers.h A be/src/exec/parquet/parquet-page-reader.cc A be/src/exec/parquet/parquet-page-reader.h 6 files changed, 674 insertions(+), 406 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/13329/5 -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4432/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 29 Aug 2019 13:03:49 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13328 Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader Moved some responsibilities from parquet-column-readers.cc to a new class 'ParquetPageReader': - reading pages from ScanRange - decompress data if needed The main motivation is to make the implementation of V2 data page reading simpler by moving most parts that will differ between V1 and V2 into a class with manageable complexity. TODOs: - The current implementation tries to change the existing logic as little as possible. The interface of ParquetPageReader could be simplified by changing the logic a bit e.g. reorder handling of errors. Testing: - ran parquet related scanner tests Change-Id: I33dc7df71f7f593a6362ad39a6468dd4f7609ca7 --- M be/src/exec/parquet/CMakeLists.txt M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-readers.h A be/src/exec/parquet/parquet-page-reader.cc A be/src/exec/parquet/parquet-page-reader.h 6 files changed, 674 insertions(+), 390 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/13328/1 -- To view, visit http://gerrit.cloudera.org:8080/13328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I33dc7df71f7f593a6362ad39a6468dd4f7609ca7 Gerrit-Change-Number: 13328 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13328 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13328/1/be/src/exec/parquet/parquet-page-reader.cc File be/src/exec/parquet/parquet-page-reader.cc: http://gerrit.cloudera.org:8080/#/c/13328/1/be/src/exec/parquet/parquet-page-reader.cc@351 PS1, Line 351: if (current_page_header_.uncompressed_page_size != current_page_header_.compressed_page_size) line too long (97 > 90) -- To view, visit http://gerrit.cloudera.org:8080/13328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I33dc7df71f7f593a6362ad39a6468dd4f7609ca7 Gerrit-Change-Number: 13328 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 14 May 2019 10:43:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Csaba Ringhofer has abandoned this change. ( http://gerrit.cloudera.org:8080/13328 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Abandoned This was pushed a bit to early (before rebasing and solving conflicts with column index changes). -- To view, visit http://gerrit.cloudera.org:8080/13328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I33dc7df71f7f593a6362ad39a6468dd4f7609ca7 Gerrit-Change-Number: 13328 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13329 Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader Moved some responsibilities from parquet-column-readers.cc to a new class 'ParquetPageReader': - reading pages from ScanRange - decompress data if needed The main motivation is to make the implementation of V2 data page reading simpler by moving most parts that will differ between V1 and V2 into a class with manageable complexity. TODOs: - The current implementation tries to change the existing logic as little as possible. The interface of ParquetPageReader could be simplified by changing the logic a bit e.g. reorder handling of errors. - The comments could be extended / polished once the interface is more or less final. Testing: - ran parquet related scanner tests Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 --- M be/src/exec/parquet/CMakeLists.txt M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-readers.h A be/src/exec/parquet/parquet-page-reader.cc A be/src/exec/parquet/parquet-page-reader.h 6 files changed, 674 insertions(+), 393 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/13329/1 -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/3215/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 14 May 2019 13:49:26 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13329 to look at the new patch set (#2). Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader Moved some responsibilities from parquet-column-readers.cc to a new class 'ParquetPageReader': - reading pages from ScanRange - decompress data if needed The main motivation is to make the implementation of V2 data page reading simpler by moving most parts that will differ between V1 and V2 into a class with manageable complexity. TODOs: - The current implementation tries to change the existing logic as little as possible. The interface of ParquetPageReader could be simplified by changing the logic a bit e.g. reorder handling of errors. - The comments could be extended / polished once the interface is more or less final. Testing: - ran parquet related scanner tests Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 --- M be/src/exec/parquet/CMakeLists.txt M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-readers.h A be/src/exec/parquet/parquet-page-reader.cc A be/src/exec/parquet/parquet-page-reader.h 6 files changed, 674 insertions(+), 393 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/13329/2 -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3216/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 14 May 2019 14:57:31 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4250/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 14 May 2019 15:29:12 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 2: (8 comments) Did a first pass, will take another round tomorrow. http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-column-readers.cc@1124 PS2, Line 1124: if (data_size == 0) { : return CreateDictionaryDecoder(nullptr, 0, &dict_decoder); : } nit: fits single-line http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-column-readers.cc@1178 PS2, Line 1178: if (eos) { : return HandleTooEarlyEos(); : } nit: single line http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-page-reader.h File be/src/exec/parquet/parquet-page-reader.h: http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-page-reader.h@39 PS2, Line 39: /// Moved to implementation to be able to forward declare class in scoped_ptr. nit: Please rephrase, I can't parse this sentence. http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-page-reader.h@83 PS2, Line 83: & nit: output parameters usually have pointer types http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-page-reader.h@91 PS2, Line 91: slot_desc has_slot_desc http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-page-reader.cc File be/src/exec/parquet/parquet-page-reader.cc: http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-page-reader.cc@60 PS2, Line 60: ParquetColumnReader ParquetPageReader http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-page-reader.cc@382 PS2, Line 382: nit: indent http://gerrit.cloudera.org:8080/#/c/13329/2/be/src/exec/parquet/parquet-page-reader.cc@413 PS2, Line 413: nit: indent -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 14 May 2019 17:00:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 2: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4250/ -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 14 May 2019 18:12:58 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4263/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 15 May 2019 11:35:51 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 3: I do not understand the reason behind the failed build. Rebased and started a new build. I will address the comments later. -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 15 May 2019 11:36:35 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4264/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 15 May 2019 13:03:07 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13329 ) Change subject: WIP: IMPALA-6433: Part 1: Extract page reading logic from ParquetColumnReader .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/13329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0289394adcb97a3529313030930c9c5b85aaa12 Gerrit-Change-Number: 13329 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 15 May 2019 18:24:31 + Gerrit-HasComments: No