Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet 
Statistics for remaining types
......................................................................


Patch Set 4:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/6563/4/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 375:   scoped_ptr<ColumnStats<T>> page_stats_;
why this change?


http://gerrit.cloudera.org:8080/#/c/6563/4/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 34: /// decode parquet::Statistics into slots.
update


Line 60:   /// Enum to select whether to read minimum or maximum statistics. 
Values to not
"do not"


Line 73:       const ColumnType& col_type, const StatsField& stats_field, void* 
slot);
const& isn't necessary here


Line 82:   /// provide the functionality.
is this useful for anything other than strings/var-len data? if not, make name 
more specific.


Line 83:   virtual void MaterializeValuesToInternalBuffer(){};
) {}


Line 105:   static bool ShouldUseDeprecatedStats(const ColumnType& col_type);
CanUse-? or should as in 'would be preferable'/'there's a moral imperative'?


Line 108: /// This class contains the type-specific behavior to track 
statistics per column.
i had to remind myself again what T was. would be useful to point out in the 
class comment that this is for the types of our in-memory format.


Line 127:   ColumnStats(MemPool* mem_pool, int plain_encoded_value_size)
describe params


Line 135:   /// statistics.
you should point out that it may keep a reference to 'v' until Materialize..() 
got called.


Line 144:   static bool ReadFromThrift(const string& thrift_stats, void* slot);
this doesn't need to be part of the public interface, columstatsbase can be a 
friend.


Line 152:   void EncodeValueToString(const T& v, std::string* out) const;
static


Line 154:   /// Decodes a statistics values from 'buffer' into 'result'.
you don't mention here that this is plain-encoded.

since they both convert between our format and plain, why don't you name them 
uniformly and make clear that that's what happens.

unless that isn't true, but then you should still call it something like 
'..To/FromThrift' for the sake of uniformity.


Line 158:   int64_t BytesNeededInternal(const T& v) const;
why -internal?


Line 161:   static void CopyToBufferInternal(StringBuffer* buffer, StringValue* 
value);
why -internal?


http://gerrit.cloudera.org:8080/#/c/6563/4/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

Line 28: inline bool ColumnStats<T>::ReadFromThrift(const string& thrift_stats, 
void* slot) {
again, you changed the meaning of a variable but not the name.

please pay attention to that yourself.


Line 30:   return DecodeValueFromThrift(thrift_stats, out);
this whole function does almost nothing. is it worth having it (extra level of 
indirection = more obscurity) instead of just plugging that code into the call 
sites? it would still be a single statement.


Line 189: template <typename T>
why do you need T here?


http://gerrit.cloudera.org:8080/#/c/6563/4/testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test:

Line 138: #       widerow table here.
stray line?


http://gerrit.cloudera.org:8080/#/c/6563/4/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 290: select count(*) from functional_parquet.alltypessmall where 
string_col < "1"
use a predicate that will result in filter? (so we can see it also works with <)


http://gerrit.cloudera.org:8080/#/c/6563/4/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 60:     # 'timetuple', the datetime __eq__ function will forward an 
unknown equality check to
'timetuple' will the datetime __eq__ function forward...


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to