Lars Volker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet 
Statistics for remaining types
......................................................................


Patch Set 6:

(4 comments)

Please see my inline comments, especially on CHAR support. I'd like to clarify 
on them before pushing a new PS.

http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 543:     if (col_idx < col_orders.size()) col_order = &col_orders[col_idx];
> what does this mean if col_orders is not empty? (does the spec allow a part
The spec is not clear to me, so I assumed that a different writer may write it 
partially. In that case we cannot assume it is empty, but we could assume the 
stats are broken and ignore them for the whole row group. Should we do that? 
Alternatively we could document my assumption in a comment.


http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 1016:   file_metadata_.__isset.column_orders = true;
> do we have a test that checks the integrity of generated parquet data? if s
I added a test to TestHdfsParquetTableWriter. In addition, if these don't get 
set, other stats tests will fail.


http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.cc
File be/src/exec/parquet-column-stats.cc:

Line 104:         StringValue* result = reinterpret_cast<StringValue*>(slot);
> why no padding here?
I assumed that the comparisons on StringValues work on variable length data.

I wasn't aware that our comparisons are broken. Do you have a JIRA? Should we 
just disable support for stats for CHAR columns?


http://gerrit.cloudera.org:8080/#/c/6563/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 418: # Test CHAR type columns.
> our char implementation is broken, i'm not sure these tests are meaningful.
See my comment in the stats.cc file. Should we remove stats support for CHAR 
columns altogether?


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to