Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3940

to look at the new patch set (#3).

Change subject: IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.
......................................................................

IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.

The Bug: Prior to this patch, a DCHECK was used to verify that the
underlying memory pool for the scratch batch was empty in a count-based
scenario. For IMPALA-3964 (where a count(*) is performed on a nested
collection), if a Parquet column chunk is compressed, each new data page
is decompressed when it is read, and the decompressed data is eventually
placed into the underlying scratch batch memory pool, causing the
aforementioned DCHECK to fail. This was not caught by the test suite
because the TPCH nested Parquet data is not compressed.

The Fix: Removed the erroneous DCHECK and added logic to check whether
any memory remains in the scratch batch's pool; if so, that memory is
transferred to the output batch. Augmented the load_nested.py script to
Snappy-compress each of the tables within the 'tpch_nested_parquet'
database, which is consistent with how the flat TPCH Parquet data set
is stored. Regarding test coverage, a number of existing tests already
perform nested collection counts against the tables in the
'tpch_nested_parquet' database. For uncompressed nested Parquet, the
'test_nested_types.py' test suite leverages the 'ComplexTypesTbl' table
to provide good coverage.
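
A minimal sketch of the failure mode and the fix, for illustration only. It
assumes, as described above, that decompressed page buffers end up in the
scratch batch's pool. SketchPool, SketchPage, CountValues and
FinishCountBatch are hypothetical, simplified stand-ins, not Impala's
MemPool, RowBatch or scanner APIs, and this is not the code in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified memory pool (NOT Impala's MemPool): it only
// tracks the buffers it owns and their total size.
class SketchPool {
 public:
  std::uint8_t* Allocate(std::size_t bytes) {
    chunks_.push_back(std::make_unique<std::uint8_t[]>(bytes));
    total_bytes_ += bytes;
    return chunks_.back().get();
  }

  // Moves ownership of every buffer to 'dst'; this pool ends up empty.
  void TransferAllTo(SketchPool* dst) {
    assert(dst != nullptr && dst != this);
    for (auto& chunk : chunks_) dst->chunks_.push_back(std::move(chunk));
    dst->total_bytes_ += total_bytes_;
    chunks_.clear();
    total_bytes_ = 0;
  }

  bool empty() const { return chunks_.empty(); }
  std::size_t total_bytes() const { return total_bytes_; }

 private:
  std::vector<std::unique_ptr<std::uint8_t[]>> chunks_;
  std::size_t total_bytes_ = 0;
};

// Hypothetical description of one Parquet data page.
struct SketchPage {
  std::size_t num_values;
  std::size_t uncompressed_bytes;
  bool compressed;
};

// Count-only scan (sketch): no values are materialized, but each compressed
// page still needs a decompression buffer, which in this model comes from
// the scratch pool. So "the scratch pool is empty when counting" only holds
// for uncompressed data, which is the assumption the old DCHECK encoded.
std::size_t CountValues(const std::vector<SketchPage>& pages,
                        SketchPool* scratch_pool) {
  std::size_t count = 0;
  for (const SketchPage& page : pages) {
    if (page.compressed) {
      // Decompressed page data would be written into this buffer.
      scratch_pool->Allocate(page.uncompressed_bytes);
    }
    count += page.num_values;
  }
  return count;
}

// The fix in sketch form: instead of asserting that the scratch pool is
// empty, hand whatever it still holds to the output batch's pool.
void FinishCountBatch(SketchPool* scratch_pool, SketchPool* output_pool) {
  if (!scratch_pool->empty()) {
    scratch_pool->TransferAllTo(output_pool);
  }
}
```

With two compressed pages of 4096 and 2048 bytes, CountValues leaves 6144
bytes in the scratch pool, and FinishCountBatch moves them to the output
pool instead of tripping an emptiness check; with uncompressed pages the
scratch pool stays empty. Transferring ownership rather than freeing in
place is the conservative choice in this model, since the buffers are then
released together with the output batch.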

Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
---
M be/src/exec/hdfs-parquet-scanner.cc
M testdata/bin/load_nested.py
2 files changed, 9 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/3940/3
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Christopher Channing <cchann...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Christopher Channing <cchann...@cloudera.com>
Gerrit-Reviewer: Michael Ho
