-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70474/
-----------------------------------------------------------
(Updated May 9, 2019, 7:51 a.m.)
Review request for hive and Peter Vary.
Changes
-------
Fixed the whitespace issue.
Bugs: HIVE-21407
https://issues.apache.org/jira/browse/HIVE-21407
Repository: hive-git
Description
-------
For CHAR columns, the patch extends the predicate that is pushed to Parquet
with an “or” clause, so that the same expression is evaluated against both the
padded and the stripped value.
Example:
Column c has type CHAR(10) and the search expression is c='apple'.
Before the patch, the predicate pushed to Parquet was c='apple ' (the padded
value); after the patch it is (c='apple ' or c='apple').
Since the value 'apple' is stored in Parquet without padding, the predicate
before the patch didn’t return any rows. With the patch, it returns the
correct row.
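The CHAR example above can be sketched in plain Java, without any Hive or
Parquet classes. This is only an illustration of the idea, not the code in
LeafFilterFactory: the class CharPredicateSketch and its methods pad,
stripTrailingSpaces, beforePatch and afterPatch are hypothetical names, and the
stored Parquet values are modelled as plain strings.

```java
import java.util.function.Predicate;

// Hypothetical sketch: models the pushed-down predicate as a Predicate<String>
// over the values stored in Parquet. Not the actual Hive implementation.
final class CharPredicateSketch {

    // Pads the literal with trailing spaces up to the declared CHAR length.
    static String pad(String value, int charLength) {
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < charLength) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Removes trailing spaces only; leading spaces are kept.
    static String stripTrailingSpaces(String value) {
        int end = value.length();
        while (end > 0 && value.charAt(end - 1) == ' ') {
            end--;
        }
        return value.substring(0, end);
    }

    // Before the patch: only the padded value is compared.
    static Predicate<String> beforePatch(String literal, int charLength) {
        String padded = pad(literal, charLength);
        return stored -> stored.equals(padded);
    }

    // After the patch: padded value OR stripped value.
    static Predicate<String> afterPatch(String literal, int charLength) {
        String padded = pad(literal, charLength);
        String stripped = stripTrailingSpaces(literal);
        return stored -> stored.equals(padded) || stored.equals(stripped);
    }

    public static void main(String[] args) {
        String stored = "apple"; // stored in Parquet without padding
        System.out.println("before patch matches: " + beforePatch("apple", 10).test(stored));
        System.out.println("after patch matches:  " + afterPatch("apple", 10).test(stored));
    }
}
```

With the unpadded stored value 'apple', the first predicate does not match and
the second one does, which mirrors the behavior described above.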
Since there is no distinction between CHAR and VARCHAR at the predicate level,
the predicates for VARCHAR columns change as well, so the result set returned
from Parquet can be wider than before.
Example:
A table contains a column c VARCHAR(10), with one row where c='apple' and
another row where c='apple '. If the search expression is c='apple ', both rows
will be returned from Parquet after the patch. This is not a problem, because
Hive applies an additional filter to the rows returned from Parquet, so the
result set returned by Hive will contain only the row with the value 'apple '.
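The two filtering stages in the VARCHAR example can be sketched as follows.
This is a simplified model under stated assumptions: rows are plain strings,
the widened predicate is taken to be "literal as given, or literal with
trailing spaces stripped", and the class VarcharFilterSketch with its methods
parquetStage and hiveStage is hypothetical, not part of Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two filtering stages; not the actual Hive code.
final class VarcharFilterSketch {

    // Removes trailing spaces only.
    static String stripTrailingSpaces(String value) {
        int end = value.length();
        while (end > 0 && value.charAt(end - 1) == ' ') {
            end--;
        }
        return value.substring(0, end);
    }

    // Stage 1 (assumed model of the pushed-down predicate after the patch):
    // keeps rows equal to the literal as given or to its stripped form.
    static List<String> parquetStage(List<String> rows, String literal) {
        String stripped = stripTrailingSpaces(literal);
        List<String> candidates = new ArrayList<>();
        for (String row : rows) {
            if (row.equals(literal) || row.equals(stripped)) {
                candidates.add(row);
            }
        }
        return candidates;
    }

    // Stage 2: the additional filtering on the returned rows, modelled here
    // as an exact comparison with the original literal.
    static List<String> hiveStage(List<String> candidates, String literal) {
        List<String> result = new ArrayList<>();
        for (String row : candidates) {
            if (row.equals(literal)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("apple", "apple ", "pear");
        List<String> fromParquet = parquetStage(rows, "apple ");
        List<String> fromHive = hiveStage(fromParquet, "apple ");
        System.out.println("rows from Parquet: " + fromParquet.size());
        System.out.println("rows from Hive:    " + fromHive.size());
    }
}
```

The first stage returns both 'apple' and 'apple ', while the second stage keeps
only 'apple ', so the wider Parquet result does not change the final answer.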
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java
be4c0d5
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
0210a0a
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
d464046
ql/src/test/queries/clientpositive/parquet_ppd_char.q 4230d8c
ql/src/test/queries/clientpositive/parquet_ppd_char2.q PRE-CREATION
ql/src/test/results/clientpositive/parquet_ppd_char2.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/70474/diff/2/
Changes: https://reviews.apache.org/r/70474/diff/1-2/
Testing
-------
Added a new q test for the PPD for char and varchar types. Also extended
the unit tests for the ParquetFilterPredicateConverter.toFilterPredicate method.
TestParquetRecordReaderWrapper and TestParquetFilterPredicate both test the
same thing: the behavior of the
ParquetFilterPredicateConverter.toFilterPredicate method. It doesn't make sense
to have tests for the same use case in different test classes, so I moved the
test cases from TestParquetRecordReaderWrapper to
TestParquetFilterPredicate.
Thanks,
Marta Kuczora