Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types ......................................................................
Patch Set 6: (10 comments) getting close http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 543: if (col_idx < col_orders.size()) col_order = &col_orders[col_idx]; what does this mean if col_orders is not empty? (does the spec allow a partial col_orders?) http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 1016: file_metadata_.__isset.column_orders = true; do we have a test that checks the integrity of generated parquet data? if so, we should also check for the existence of this field. http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: Line 104: StringValue* result = reinterpret_cast<StringValue*>(slot); why no padding here? do stats for char columns make sense, given that our char comparisons are broken? http://gerrit.cloudera.org:8080/#/c/6563/4/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 34: /// decode parquet::Statistics into slots. > Done i meant: update the comment to reflect the changes in this patch. Line 108: > Done please state explicitly that T must be one of the *Value classes http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 105: /// 'col_order'. Otherwise, returns false. what if col_order == 0? Line 172: static void CopyToBuffer(StringBuffer* buffer, StringValue* value); move to -base and remove template params in .inl.h? http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.inline.h File be/src/exec/parquet-column-stats.inline.h: Line 123: // The ParquetPlainEncoder interface expects mutable pointers. several function signatures in parquet-common.h look wrong (output args at beginning, missing const, etc.) please leave a todo there to fix it (or simply fix it - the fact that it's wrong is spilling over into your new code, which isn't good). http://gerrit.cloudera.org:8080/#/c/6563/4/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test: Line 290: select count(*) from functional_parquet.alltypessmall where string_col < "1" > I'm not sure I understand your comment. The test in L274 filters all rows u nm http://gerrit.cloudera.org:8080/#/c/6563/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test: Line 418: # Test CHAR type columns. our char implementation is broken, i'm not sure these tests are meaningful. -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes