Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18265 )


Change subject: IMPALA-11147: Min/max filtering crashes on Parquet file that 
contains partition columns
......................................................................

IMPALA-11147: Min/max filtering crashes on Parquet file that contains partition 
columns

Impala crashes on a Parquet file that contains the partition columns.
Data files usually don't contain the partition columns, so Impala doesn't
expect to find such columns in the data files. Unfortunately, min/max
filtering triggers a SEGFAULT when a partition column is present in
the data files.

The crash happens when FindSkipRangesForPagesWithMinMaxFilters() tries to
retrieve the Parquet schema element for a given slot descriptor. When
the slot descriptor refers to a partition column, we usually don't find
a schema element, so we don't try to skip pages.

But when the partition column is present in the data file, the code
tries to calculate the filtered pages for the column. It uses the column
reader object corresponding to the column, but this is NULL for
partition columns, hence we get a SEGFAULT.

The code shouldn't do anything at the page level for partition columns,
as the data in such a column is the same for the whole file and it is
already filtered at a higher level.
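
The guard described above can be illustrated with a small standalone
sketch. This is not the actual patch: ColumnReader, SlotDescriptor, and
ColumnsForPageFiltering() are hypothetical stand-ins, not Impala's real
classes. It only assumes that partition columns have no column reader
(modelled here as a null pointer) and shows that skipping them up front
avoids the null dereference.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not Impala's real classes.
struct ColumnReader {
  int num_pages;
};

struct SlotDescriptor {
  std::string col_name;
  bool is_partition_col;
};

// Returns the names of the columns that are eligible for page-level
// min/max filtering. Partition columns are skipped before any reader is
// looked up, so their (null) reader is never touched.
std::vector<std::string> ColumnsForPageFiltering(
    const std::vector<SlotDescriptor>& slots,
    const std::map<std::string, ColumnReader*>& readers) {
  std::vector<std::string> result;
  for (const SlotDescriptor& slot : slots) {
    // Partition values are constant per file and are already filtered
    // at a higher level, so there is nothing to do at the page level.
    if (slot.is_partition_col) continue;
    auto it = readers.find(slot.col_name);
    // Column is not present in this data file: nothing to skip.
    if (it == readers.end()) continue;
    // Non-partition columns found in the file are assumed to have a reader.
    assert(it->second != nullptr);
    result.push_back(slot.col_name);
  }
  return result;
}
```

With a partition column "year" mapped to a null reader and a regular
column "id" mapped to a real one, only "id" is returned, and the null
reader is never dereferenced.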

Testing:
 * added e2e test

Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M testdata/data/README
A testdata/data/partition_col_in_parquet.parquet
M tests/query_test/test_runtime_filters.py
4 files changed, 35 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/18265/1
--
To view, visit http://gerrit.cloudera.org:8080/18265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I17eff4467da3fd67a21353ba2d52d3bec405acd2
Gerrit-Change-Number: 18265
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
