Zoltan Borok-Nagy has uploaded this change for review.
( http://gerrit.cloudera.org:8080/17071 )


Change subject: IMPALA-10501: Hit DCHECK in parquet-column-readers.cc: 
def_levels_.CacheRemaining() <= num_buffered_values_
......................................................................

IMPALA-10501: Hit DCHECK in parquet-column-readers.cc: 
def_levels_.CacheRemaining() <= num_buffered_values_

ScalarColumnReader::MaterializeValueBatch() had a DCHECK asserting
that 'num_buffered_values_' is greater than or equal to the number
of values cached in the Parquet definition level decoder.

However, the decoder can hold more values than that because literal
runs are stored in groups of 8, i.e. there might be padding zeros
at the end. Also, the decoder doesn't know the exact number of
actual values; it is up to the client of the decoder to keep track
of the number of values.
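
To illustrate the padding, here is a minimal standalone sketch. It is
not the code in parquet-column-readers.cc; the helper names
(DecodedRunLength, PaddingValues, UsableCachedValues) are hypothetical
and only model the arithmetic. It assumes a literal (bit-packed) run
always holds a multiple of 8 values, so a decoder can cache more
values than the page actually contains.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of values a literal (bit-packed) run
// occupies when 'num_values' real values are written into it. The run
// is stored in groups of 8, so the count is rounded up.
inline int64_t DecodedRunLength(int64_t num_values) {
  assert(num_values >= 0);
  const int64_t num_groups = (num_values + 7) / 8;
  return num_groups * 8;
}

// Hypothetical helper: number of trailing padding zeros in that run.
inline int64_t PaddingValues(int64_t num_values) {
  return DecodedRunLength(num_values) - num_values;
}

// Hypothetical client-side clamp: the decoder cannot tell padding from
// real values, so the caller limits what it consumes to the number of
// values it knows are still buffered.
inline int64_t UsableCachedValues(int64_t cache_remaining,
                                  int64_t num_buffered_values) {
  assert(cache_remaining >= 0 && num_buffered_values >= 0);
  return std::min(cache_remaining, num_buffered_values);
}
```

For example, 13 real definition levels occupy two groups, i.e. 16
decoded values, 3 of which are padding. A check of the form
"cached values <= buffered values" can therefore fail on valid data,
whereas clamping on the client side stays correct.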

I removed this wrong assumption from MaterializeValueBatch() and
modified the code accordingly.

Testing
 * before this patch, TestParquetStats::test_page_index was flaky
   because of this issue
 * I tested the fix on a hacked Impala that randomly generated
   skip ranges

Change-Id: Ic071473e7b315300fd5e163225d3e39735f09c4f
---
M be/src/exec/parquet/parquet-column-readers.cc
1 file changed, 6 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17071/1
--
To view, visit http://gerrit.cloudera.org:8080/17071
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic071473e7b315300fd5e163225d3e39735f09c4f
Gerrit-Change-Number: 17071
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
