----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/70474/ -----------------------------------------------------------
(Updated May 9, 2019, 7:51 a.m.) Review request for hive and Peter Vary. Changes ------- Fixed the whitespace issue. Bugs: HIVE-21407 https://issues.apache.org/jira/browse/HIVE-21407 Repository: hive-git Description ------- The idea behind the patch is that for CHAR columns extend the predicate which is pushed to Parquet with an “or” clause which contains the same expression with a padded and a stripped value. Example: column c is a CHAR(10) type and the search expression is c='apple' The predicate which is pushed to Parquet looked like c='apple ' before the patch and it would look like (c='apple ' or c='apple') after the patch. Since the value 'apple' is stored in Parquet without padding, the predicate before the patch didn’t return any rows. With the patch it will return the correct row. Since on predicate level, there is no distinction between CHAR or VARCHAR, the predicates for VARCHARs will be changed as well, so the result set returned from Parquet will be wider than before. Example: A table contains a c VARCHAR(10) column and there is a row where c='apple' and there is an other row where c='apple '. If the search expression is c='apple ', both rows will be returned from Parquet after the patch. But since Hive is doing an additional filtering on the rows returned from Parquet, it won’t be a problem, the result set returned by Hive will contain only the row with the value 'apple '. Diffs (updated) ----- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java be4c0d5 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 0210a0a ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java d464046 ql/src/test/queries/clientpositive/parquet_ppd_char.q 4230d8c ql/src/test/queries/clientpositive/parquet_ppd_char2.q PRE-CREATION ql/src/test/results/clientpositive/parquet_ppd_char2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/70474/diff/2/ Changes: https://reviews.apache.org/r/70474/diff/1-2/ Testing ------- Added new q test for testing the PPD for char and varchar types. Also extended the unit tests for the ParquetFilterPredicateConverter.toFilterPredicate method. The TestParquetRecordReaderWrapper and the TestParquetFilterPredicate are both testing the same thing, the behavior of the ParquetFilterPredicateConverter.toFilterPredicate method. It doesn't make sense to have tests for the same use case in different test classes, so moved the test cases from the TestParquetRecordReaderWrapper to TestParquetFilterPredicate. Thanks, Marta Kuczora