[Impala-ASF-CR] IMPALA-6311: Skip row groups with predicates on NULL columns

2018-01-26 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9140 )

Change subject: IMPALA-6311: Skip row groups with predicates on NULL columns
..


Patch Set 2:

(4 comments)

LGTM, had minor comments

http://gerrit.cloudera.org:8080/#/c/9140/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9140/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-6311
it's IMPALA-6113


http://gerrit.cloudera.org:8080/#/c/9140/2/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

http://gerrit.cloudera.org:8080/#/c/9140/2/be/src/exec/parquet-column-stats.h@78
PS2, Line 78: int64_t*
Currently it returns a pointer to a member of ColumnChunk. This means that the 
returned pointer can easily become dangling if the corresponding ColumnChunk 
object dies.
Maybe you could add a comment about it.

Or, it could return an int64_t, and -1 would indicate that there are no 
null_count statistics.


http://gerrit.cloudera.org:8080/#/c/9140/2/be/src/exec/parquet-column-stats.cc
File be/src/exec/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/9140/2/be/src/exec/parquet-column-stats.cc@132
PS2, Line 132: const int64_t* ColumnStatsBase::ReadNullCountStat(
 : const parquet::ColumnChunk& col_chunk) {
nit: can fit into one line


http://gerrit.cloudera.org:8080/#/c/9140/2/be/src/exec/parquet-column-stats.cc@139
PS2, Line 139:   if (stats.__isset.null_count) {
 : return &stats.null_count;
 :   }
nit: can be a one-liner



--
To view, visit http://gerrit.cloudera.org:8080/9140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf
Gerrit-Change-Number: 9140
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 26 Jan 2018 11:19:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6311: Skip row groups with predicates on NULL columns

2018-01-26 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9140 )

Change subject: IMPALA-6311: Skip row groups with predicates on NULL columns
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9140/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9140/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@463
PS1, Line 463: insert into table_for_null_count_test values (1, NULL), (2, 
NULL), (3, NULL);
> Can we modify these create table statements and move them
Hey Anuj, thanks for taking a look!
Sure, I can move the create table statements to the .py file.

For my benefit, could you please let me know what we gain if we moved these 
there? Can't I use the $DATABASE variable in the .test file to use the 
unique_database?



--
To view, visit http://gerrit.cloudera.org:8080/9140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf
Gerrit-Change-Number: 9140
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 26 Jan 2018 11:11:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6311: Skip row groups with predicates on NULL columns

2018-01-26 Thread Gabor Kaszab (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy, anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9140

to look at the new patch set (#2).

Change subject: IMPALA-6311: Skip row groups with predicates on NULL columns
..

IMPALA-6311: Skip row groups with predicates on NULL columns

Based on the existing Parquet column chunk level statistics null_count,
Impala's Parquet scanner is enhanced to skip an entire row group if the
null_count statistics indicate that all the values under the predicated
column are NULL as we wouldn't get any result rows from that row group
anyway.

Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
M tests/query_test/test_parquet_stats.py
5 files changed, 58 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/9140/2
--
To view, visit http://gerrit.cloudera.org:8080/9140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf
Gerrit-Change-Number: 9140
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-6311: Skip row groups with predicates on NULL columns

2018-01-25 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9140 )

Change subject: IMPALA-6311: Skip row groups with predicates on NULL columns
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9140/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9140/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@463
PS1, Line 463: create table table_for_null_count_test (i int, j int) stored as 
parquet;
Can we modify these create table statements and move them
test_parquet_stats.py and create them using a unique database?


http://gerrit.cloudera.org:8080/#/c/9140/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@485
PS1, Line 485: create table table_for_null_count_test2 (i int, j int) stored as 
parquet;
same here.



--
To view, visit http://gerrit.cloudera.org:8080/9140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf
Gerrit-Change-Number: 9140
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Thu, 25 Jan 2018 21:48:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6311: Skip row groups with predicates on NULL columns

2018-01-25 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9140


Change subject: IMPALA-6311: Skip row groups with predicates on NULL columns
..

IMPALA-6311: Skip row groups with predicates on NULL columns

Based on the existing Parquet column chunk level statistics null_count,
Impala's Parquet scanner is enhanced to skip an entire row group if the
null_count statistics indicate that all the values under the predicated
column are NULL as we wouldn't get any result rows from that row group
anyway.

Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
4 files changed, 56 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/9140/1
--
To view, visit http://gerrit.cloudera.org:8080/9140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I141317af0e0df30da8f220b29b0bfba364f40ddf
Gerrit-Change-Number: 9140
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab