Rajesh Balamohan created HIVE-26013:
---------------------------------------

             Summary: Parquet predicate filters are not properly propagated to 
task configs at runtime
                 Key: HIVE-26013
                 URL: https://issues.apache.org/jira/browse/HIVE-26013
             Project: Hive
          Issue Type: Bug
            Reporter: Rajesh Balamohan


Hive's ParquetRecordReaderBase sets the predicate filter in the config so that 
the Parquet libraries can apply it when reading.

Ref: 
[https://github.com/apache/hive/blob/master/ql%2Fsrc%2Fjava%2Forg%2Fapache%2Fhadoop%2Fhive%2Fql%2Fio%2Fparquet%2FParquetRecordReaderBase.java#L188]
{code:java}
 ParquetInputFormat.setFilterPredicate(conf, p);
{code}
This internally sets the {color:#FF0000}"parquet.private.read.filter.predicate"{color} 
variable in the config.

Ref: 
[https://github.com/apache/parquet-mr/blob/master/parquet-hadoop%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fparquet%2Fhadoop%2FParquetInputFormat.java#L231]

A config value set during the compilation phase is not visible to the tasks at 
runtime. As a result, the filters are lost and the tasks run with excessive IO.
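
A minimal sketch of how a filter can get lost this way, under stated assumptions. It does not use Hive or Parquet classes: java.util.Properties stands in for the job config, and snapshotForTask and readerAppliesFilter are hypothetical helpers invented for illustration. The ordering shown (task config copied before the filter is set) is an assumption about one possible cause, not a confirmed diagnosis of this issue. Only the key name comes from the description above.

```java
import java.util.Properties;

public class FilterPropagationSketch {
    // Key named in this issue. The value stored below is a plain-string
    // stand-in, not Parquet's real serialized predicate.
    static final String FILTER_KEY = "parquet.private.read.filter.predicate";

    // Hypothetical stand-in for shipping a config to a task: the task only
    // gets the keys that exist at the moment the copy is taken.
    static Properties snapshotForTask(Properties compileConf) {
        Properties taskConf = new Properties();
        taskConf.putAll(compileConf);
        return taskConf;
    }

    // Hypothetical stand-in for a reader that pushes a filter down only
    // when the key is present in the config it was handed.
    static boolean readerAppliesFilter(Properties conf) {
        return conf.getProperty(FILTER_KEY) != null;
    }

    public static void main(String[] args) {
        Properties compileConf = new Properties();
        compileConf.setProperty("hypothetical.table.name", "t1");

        // Assumed ordering: the task config is copied first...
        Properties taskConf = snapshotForTask(compileConf);

        // ...and the filter is then set on the compile-time config only.
        compileConf.setProperty(FILTER_KEY, "eq(col, 42)");

        System.out.println("compile-time config has filter: "
                + readerAppliesFilter(compileConf));
        System.out.println("task config has filter: "
                + readerAppliesFilter(taskConf));
    }
}
```

In this sketch the compile-time config reports the filter and the task copy does not, so a reader driven by the task config would scan without any pushdown.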



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
