Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6563

Change subject: IMPALA-4817: Populate Parquet Statistics for Strings
......................................................................

IMPALA-4817: Populate Parquet Statistics for Strings

This change adds functionality to populate the new parquet::Statistics
fields 'min_value' and 'max_value', which were added in a parquet-format
PR. With this change, Impala will stop populating the deprecated 'min'
and 'max' fields.

Keeping track of StringValue statistics requires some memory management
code to materialize values that reside in memory owned by row batches.
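
The memory management concern can be sketched as follows. This is a
minimal illustration, not the code in this change: StringRef,
StringMinMaxStats, and MaterializeValues() are hypothetical names, and
std::string stands in for whatever buffer the real implementation uses.
The idea is that min/max may point into batch-owned memory while a batch
is processed, and must be copied into owned storage before that memory
goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning (pointer, length) string reference.
struct StringRef {
  const char* ptr;
  size_t len;
};

// Byte-wise lexicographic comparison; shorter string sorts first on a tie.
static int Compare(const StringRef& a, const StringRef& b) {
  size_t n = a.len < b.len ? a.len : b.len;
  int c = (n == 0) ? 0 : std::memcmp(a.ptr, b.ptr, n);
  if (c != 0) return c;
  return (a.len > b.len) - (a.len < b.len);
}

// Hypothetical min/max tracker for string columns.
class StringMinMaxStats {
 public:
  StringMinMaxStats() = default;
  StringMinMaxStats(const StringMinMaxStats&) = delete;
  StringMinMaxStats& operator=(const StringMinMaxStats&) = delete;

  // Records 'v' without copying. 'v' may point into memory that the
  // caller frees or reuses later.
  void Update(const StringRef& v) {
    if (!has_values_ || Compare(v, min_) < 0) { min_ = v; min_owned_ = false; }
    if (!has_values_ || Compare(v, max_) > 0) { max_ = v; max_owned_ = false; }
    has_values_ = true;
  }

  // Copies min/max into owned buffers. Must be called before the memory
  // backing previously seen values is released.
  void MaterializeValues() {
    if (!has_values_) return;
    if (!min_owned_) {
      min_buf_.assign(min_.ptr, min_.len);
      min_ = StringRef{min_buf_.data(), min_buf_.size()};
      min_owned_ = true;
    }
    if (!max_owned_) {
      max_buf_.assign(max_.ptr, max_.len);
      max_ = StringRef{max_buf_.data(), max_buf_.size()};
      max_owned_ = true;
    }
  }

  std::string min() const { assert(has_values_); return std::string(min_.ptr, min_.len); }
  std::string max() const { assert(has_values_); return std::string(max_.ptr, max_.len); }

 private:
  bool has_values_ = false;
  bool min_owned_ = false;
  bool max_owned_ = false;
  StringRef min_{nullptr, 0};
  StringRef max_{nullptr, 0};
  std::string min_buf_;
  std::string max_buf_;
};
```

In this sketch the caller would invoke MaterializeValues() once per
batch, after the last Update() and before the batch is reset, so a copy
is made only when min or max actually changed during that batch.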

The HdfsParquetScanner prefers the new fields when they are populated.
For tables with only the old fields populated, it reads them only if the
column is of a simple numeric type, i.e. boolean, integer, or floating
point.
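
The selection rule described above can be sketched roughly as follows.
The types and the function SelectMinMax() are hypothetical and only
illustrate the decision. The sketch also assumes min and max are either
both set or both unset, which the real code may not require.

```cpp
#include <string>

// Hypothetical coarse classification of a column's type.
enum class ColumnKind { kBoolean, kInteger, kFloatingPoint, kString, kOther };

// Hypothetical flattened view of the statistics found in a file.
struct FileStats {
  bool has_new = false;  // 'min_value' / 'max_value' are populated
  std::string new_min;
  std::string new_max;
  bool has_old = false;  // deprecated 'min' / 'max' are populated
  std::string old_min;
  std::string old_max;
};

static bool IsSimpleNumeric(ColumnKind kind) {
  return kind == ColumnKind::kBoolean || kind == ColumnKind::kInteger ||
         kind == ColumnKind::kFloatingPoint;
}

// Returns true and fills 'min'/'max' if usable statistics exist.
// The new fields win whenever they are present. The deprecated fields
// are used only for simple numeric types.
static bool SelectMinMax(const FileStats& stats, ColumnKind kind,
                         std::string* min, std::string* max) {
  if (stats.has_new) {
    *min = stats.new_min;
    *max = stats.new_max;
    return true;
  }
  if (stats.has_old && IsSimpleNumeric(kind)) {
    *min = stats.old_min;
    *max = stats.old_max;
    return true;
  }
  return false;
}
```

With this rule, a string column in a file that carries only the
deprecated fields yields no usable statistics, while an integer column
in the same file does.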

This change removes the test that compares the Parquet statistics
written by Impala against those written by Hive, since Hive does not
write the new fields.

TODO: This change still needs tests reading statistics from files which
use the old 'min' and 'max' fields, such as those written by Hive. I'll
add these tests in a subsequent patch set.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M common/thrift/parquet.thrift
M tests/query_test/test_insert_parquet.py
9 files changed, 411 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <l...@cloudera.com>