Pooja Nilangekar has posted comments on this change.

Change subject: IMPALA-5061: Populate null_count in parquet::statistics
......................................................................


Patch Set 4:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/7058/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5061: Populate null_count in parquet::statistics
> Nit: Make the first line of the commit message fit in 80 chars. You can end
Done


http://gerrit.cloudera.org:8080/#/c/7058/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 553:       // We need to get min_value.
> Let's only do the write path in this change and then address the read path 
Sure, I haven't made a call to read the null_count anywhere. Just defined a 
function which could be used in the future.


http://gerrit.cloudera.org:8080/#/c/7058/2/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS2, Line 198: Co
> Do you need the default implementation here? Without it, you could also rem
Done


http://gerrit.cloudera.org:8080/#/c/7058/2/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 79:   static bool ReadCountFromThrift(
> I think it will be easier to return an int64 here. However we should do the
If the return type is int64, what would the return value be in case the scanner 
is reading a parquet file which does not have it's stats populated? Nos sure if 
'-1' is consistent with the rest of the code. The other option would be to 
return a Status object. I was trying to make the function consistent with 
ReadValueFromThrift. What would you suggest?


Line 115: 
> Why do we need has_null_count_ instead of just always counting them? Are th
Done


Line 170:   virtual void Merge(const ColumnStatsBase& other) override;
> Can this go into the base class? It doesn't depend on T.
Done


http://gerrit.cloudera.org:8080/#/c/7058/2/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

Line 50:   DCHECK(dynamic_cast<const ColumnStats<T>*>(&other));
> nit: we use prefix operators, mostly for consistency, but sometimes for per
Done


http://gerrit.cloudera.org:8080/#/c/7058/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 323:       max_value = None
> With the additional checks below, is this line still needed? What is its in
Done


Line 328:         max_value = decode_stats_value(schema, stats.max_value)
> You should have one assignment per line
Done


Line 427:         ColumnStats('bigint_col', 0, 90, 0),
> Why are these None and not 0? I think we should count 0s even if there are 
Done


PS2, Line 607:  
> Replace tabs with spaces, here and elsewhere
Done


Line 620
> The formatting here seems off
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4c49a63af84c2234f0633be63206cb52eb7e8ebb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to