[Impala-ASF-CR] IMPALA-6311: Skip row groups with predicates on NULL columns
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9140 ) Change subject: IMPALA-6311: Skip row groups with predicates on NULL columns .. Patch Set 2: (4 comments) LGTM, had minor comments http://gerrit.cloudera.org:8080/#/c/9140/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9140/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-6311 it's IMPALA-6113 http://gerrit.cloudera.org:8080/#/c/9140/2/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: http://gerrit.cloudera.org:8080/#/c/9140/2/be/src/exec/parquet-column-stats.h@78 PS2, Line 78: int64_t* Currently it returns a pointer to a member of ColumnChunk. This means that the returned pointer can easily become dangling if the corresponding ColumnChunk object dies. Maybe you could add a comment about it. Or, it could return an int64_t, and -1 would indicate that there are no null_count statistics. http://gerrit.cloudera.org:8080/#/c/9140/2/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/9140/2/be/src/exec/parquet-column-stats.cc@132 PS2, Line 132: const int64_t* ColumnStatsBase::ReadNullCountStat( : const parquet::ColumnChunk& col_chunk) { nit: can fit into one line http://gerrit.cloudera.org:8080/#/c/9140/2/be/src/exec/parquet-column-stats.cc@139 PS2, Line 139: if (stats.__isset.null_count) { : return &stats.null_count; : } nit: can be a one-liner -- To view, visit http://gerrit.cloudera.org:8080/9140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf Gerrit-Change-Number: 9140 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 26 Jan 2018 11:19:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6311: Skip row groups with predicates on NULL columns
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9140 ) Change subject: IMPALA-6311: Skip row groups with predicates on NULL columns .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9140/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/9140/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@463 PS1, Line 463: insert into table_for_null_count_test values (1, NULL), (2, NULL), (3, NULL); > Can we modify these create table statements and move them Hey Anuj, thanks for taking a look! Sure, I can move the create table statements to the .py file. For my benefit, could you please let me know what we gain if we moved these there? Can't I use the $DATABASE variable in the .test file to use the unique_database? -- To view, visit http://gerrit.cloudera.org:8080/9140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf Gerrit-Change-Number: 9140 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 26 Jan 2018 11:11:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6311: Skip row groups with predicates on NULL columns
Hello Lars Volker, Zoltan Borok-Nagy, anujphadke, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9140 to look at the new patch set (#2). Change subject: IMPALA-6311: Skip row groups with predicates on NULL columns .. IMPALA-6311: Skip row groups with predicates on NULL columns Based on the existing Parquet column chunk level statistics null_count, Impala's Parquet scanner is enhanced to skip an entire row group if the null_count statistics indicate that all the values under the predicated column are NULL as we wouldn't get any result rows from that row group anyway. Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test M tests/query_test/test_parquet_stats.py 5 files changed, 58 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/9140/2 -- To view, visit http://gerrit.cloudera.org:8080/9140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf Gerrit-Change-Number: 9140 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-6311: Skip row groups with predicates on NULL columns
anujphadke has posted comments on this change. ( http://gerrit.cloudera.org:8080/9140 ) Change subject: IMPALA-6311: Skip row groups with predicates on NULL columns .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/9140/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/9140/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@463 PS1, Line 463: create table table_for_null_count_test (i int, j int) stored as parquet; Can we modify these create table statements and move them test_parquet_stats.py and create them using a unique database? http://gerrit.cloudera.org:8080/#/c/9140/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@485 PS1, Line 485: create table table_for_null_count_test2 (i int, j int) stored as parquet; same here. -- To view, visit http://gerrit.cloudera.org:8080/9140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf Gerrit-Change-Number: 9140 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Thu, 25 Jan 2018 21:48:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6311: Skip row groups with predicates on NULL columns
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9140 Change subject: IMPALA-6311: Skip row groups with predicates on NULL columns .. IMPALA-6311: Skip row groups with predicates on NULL columns Based on the existing Parquet column chunk level statistics null_count, Impala's Parquet scanner is enhanced to skip an entire row group if the null_count statistics indicate that all the values under the predicated column are NULL as we wouldn't get any result rows from that row group anyway. Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test 4 files changed, 56 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/9140/1 -- To view, visit http://gerrit.cloudera.org:8080/9140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf Gerrit-Change-Number: 9140 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab