Rajesh Balamohan created HIVE-26013:
---------------------------------------
Summary: Parquet predicate filters are not properly propagated to task configs at runtime
Key: HIVE-26013
URL: https://issues.apache.org/jira/browse/HIVE-26013
Project: Hive
Issue Type: Bug
Reporter: Rajesh Balamohan
Hive's Parquet record reader (ParquetRecordReaderBase) sets the predicate filter in the job config so that the Parquet libraries can read it.
Ref:
[https://github.com/apache/hive/blob/master/ql%2Fsrc%2Fjava%2Forg%2Fapache%2Fhadoop%2Fhive%2Fql%2Fio%2Fparquet%2FParquetRecordReaderBase.java#L188]
{code:java}
ParquetInputFormat.setFilterPredicate(conf, p);
{code}
This internally sets the {color:#FF0000}"parquet.private.read.filter.predicate"{color} variable in the config.
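To illustrate what "sets the variable in the config" means, here is a minimal stdlib-only sketch: the predicate object is encoded into a plain string value stored under the config key named above, and a reader later decodes it from that same config object. Assumptions: java.util.Properties stands in for Hadoop's Configuration, EqPredicate is a hypothetical stand-in for a Parquet FilterPredicate, and storePredicate/loadPredicate are hypothetical helpers, not the parquet-mr API. The plain Java serialization + Base64 encoding is also an assumption for illustration, not necessarily the exact encoding parquet-mr uses.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.Properties;

public class FilterPredicateConfSketch {
    // Key name taken from the issue; everything else in this class is a stand-in.
    static final String FILTER_KEY = "parquet.private.read.filter.predicate";

    // Hypothetical stand-in for a Parquet FilterPredicate such as "column == value".
    static final class EqPredicate implements Serializable {
        private static final long serialVersionUID = 1L;
        final String column;
        final int value;

        EqPredicate(String column, int value) {
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return "eq(" + column + ", " + value + ")";
        }
    }

    // Hypothetical helper: encodes the predicate (assumed: Java serialization + Base64)
    // and stores it as a string value in this specific config object.
    static void storePredicate(Properties conf, EqPredicate p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        conf.setProperty(FILTER_KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Hypothetical helper: returns null when this config object carries no predicate.
    static EqPredicate loadPredicate(Properties conf) throws IOException, ClassNotFoundException {
        String encoded = conf.getProperty(FILTER_KEY);
        if (encoded == null) {
            return null;
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (EqPredicate) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        storePredicate(conf, new EqPredicate("id", 42));
        System.out.println(loadPredicate(conf)); // prints "eq(id, 42)"
    }
}
{code}

The point of the sketch is that the predicate lives only as a value inside one particular config object; a reader holding a different config object gets null and therefore applies no filter.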
Ref:
[https://github.com/apache/parquet-mr/blob/master/parquet-hadoop%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fparquet%2Fhadoop%2FParquetInputFormat.java#L231]
A config value set during the compilation phase is not visible to the tasks at runtime. As a result, the filters are lost and the tasks run with excessive IO, reading data that the predicate would otherwise have filtered out.
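The failure mode described above can be sketched with two separate config objects: one used at compile time and one handed to the tasks. This is a hypothetical model, not Hive's actual code path. It assumes the task config is a snapshot taken before the filter is set (one possible way a compile-time value can fail to reach tasks), uses java.util.Properties in place of Hadoop's Configuration, and encodes the filter as a simple "id > N" threshold string purely for illustration. The rowsRead count is a rough proxy for IO; real Parquet filtering prunes row groups and pages rather than counting rows.

{code:java}
import java.util.Properties;

public class ConfPropagationSketch {
    static final String FILTER_KEY = "parquet.private.read.filter.predicate";

    // Hypothetical task-side reader: applies the filter (assumed form: "id > N")
    // only if the key is present in the config it was given; otherwise reads every row.
    static int rowsRead(Properties taskConf, int[] rows) {
        String value = taskConf.getProperty(FILTER_KEY);
        if (value == null) {
            return rows.length; // no filter visible: full scan
        }
        int min = Integer.parseInt(value);
        int read = 0;
        for (int r : rows) {
            if (r > min) {
                read++;
            }
        }
        return read;
    }

    // Copies the config; later changes to the source are not reflected in the copy.
    static Properties snapshot(Properties src) {
        Properties copy = new Properties();
        copy.putAll(src);
        return copy;
    }

    public static void main(String[] args) {
        int[] rows = new int[100];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = i;
        }

        Properties compileConf = new Properties();
        Properties taskConf = snapshot(compileConf); // config shipped to tasks is captured here

        compileConf.setProperty(FILTER_KEY, "89"); // filter set later, only on the compile-side object

        System.out.println("compile side sees filter: " + (compileConf.getProperty(FILTER_KEY) != null));
        System.out.println("task side sees filter: " + (taskConf.getProperty(FILTER_KEY) != null));
        System.out.println("rows read by task: " + rowsRead(taskConf, rows)); // 100 instead of 10
    }
}
{code}

In this model the compile side sees the filter but the task does not, so the task reads all 100 rows instead of the 10 that match.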
--
This message was sent by Atlassian Jira
(v8.20.1#820001)