Lars Volker has posted comments on this change. Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types ......................................................................
Patch Set 6: (4 comments) Please see my inline comments, especially on CHAR support. I'd like to clarify on them before pushing a new PS. http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 543: if (col_idx < col_orders.size()) col_order = &col_orders[col_idx]; > what does this mean if col_orders is not empty? (does the spec allow a part The spec is not clear to me, so I assumed that a different writer may write it partially. In that case we cannot assume it is empty, but we could assume the stats are broken and ignore them for the whole row group. Should we do that? Alternatively we could document my assumption in a comment. http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 1016: file_metadata_.__isset.column_orders = true; > do we have a test that checks the integrity of generated parquet data? if s I added a test to TestHdfsParquetTableWriter. In addition, if these don't get set, other stats tests will fail. http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: Line 104: StringValue* result = reinterpret_cast<StringValue*>(slot); > why no padding here? I assumed that the comparisons on StringValues work on variable length data. I wasn't aware that our comparisons are broken. Do you have a JIRA? Should we just disable support for stats for CHAR columns? http://gerrit.cloudera.org:8080/#/c/6563/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test: Line 418: # Test CHAR type columns. > our char implementation is broken, i'm not sure these tests are meaningful. See my comment in the stats.cc file. Should we remove stats support for CHAR columns altogether? -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes