Lars Volker has posted comments on this change.

Change subject: IMPALA-5061: Populate null_count in parquet::statistics
......................................................................


Patch Set 4:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/7058/4//COMMIT_MSG
Commit Message:

Line 10: value is encountered by paquet table writer. The value is written
nit typo


http://gerrit.cloudera.org:8080/#/c/7058/4/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 197:     if(page_stats_base_ != nullptr) {
nit: single line


http://gerrit.cloudera.org:8080/#/c/7058/2/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 79:   static bool ReadCountFromThrift(
> If the return type is int64, what would the return value be in case the sca
For frequently called functions it's often better to return a bool instead of 
the more heavy-weight Status. In this case you would probably pass an int64_t* 
to return the value.

Let's remove the method here altogether and re-introduce it in a change that 
handles reading the stats.


http://gerrit.cloudera.org:8080/#/c/7058/4/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

PS4, Line 63: NULL_COUNT
remove


PS4, Line 105: row
column?


PS4, Line 105: Initilizes
typo


PS4, Line 116: NULL
nit: lowercase


http://gerrit.cloudera.org:8080/#/c/7058/4/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

Line 32: inline void ColumnStatsBase::UpdateNullCount() {
It looks like this will fit into one line in the header. If so, please move it 
there.


Line 186:   if (cs->has_values_) {
You could also change the interface of Update() to take a min and max parameter 
and then clean up this method by calling it. You don't have to do it in this 
change though if you think it's too much.


Line 205:   null_count_ += cs->null_count_;
If you add a "int64_t count" parameter to the UpdateNullCount method, you can 
call it from here.


http://gerrit.cloudera.org:8080/#/c/7058/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 427:         ColumnStats('bigint_col', 0, 90, 0),
> Done
Cool :)


http://gerrit.cloudera.org:8080/#/c/7058/4/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 324:       null_count = None
How about moving this to L330, initialize it from stats.null_count and then 
check it's not None?


Line 480:     """Test that we don't write min/max statistics for null columns. 
Ensure null_count 
Please remove the trailing whitespace. You can also set-up git in a way that it 
warns about trailing whitespace during commit time. Even better, you may be 
able to configure your editor to highlight it for you.


-- 
To view, visit http://gerrit.cloudera.org:8080/7058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4c49a63af84c2234f0633be63206cb52eb7e8ebb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to