Matt Martin created PARQUET-4:
---------------------------------

             Summary: Use LRU caching for footers in ParquetInputFormat.
                 Key: PARQUET-4
                 URL: https://issues.apache.org/jira/browse/PARQUET-4
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
            Reporter: Matt Martin


The current footer caching approach needs to change because it breaks when the 
same ParquetInputFormat instance is reused to generate splits for different 
input directories. For example, it causes problems in Hive's FetchOperator when 
the FetchOperator operates over more than one partition (side note: as far as I 
could tell, Hive has been reusing InputFormat instances in this way for quite 
some time). How this issue manifests itself in Hive is described in more detail 
here: 
https://groups.google.com/d/msg/parquet-dev/0aXql-3z7vE/Gn5m094V7PMJ

The proposed patch can be found here: 
https://github.com/apache/incubator-parquet-mr/pull/2
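
For readers unfamiliar with the idea, here is a minimal sketch of an LRU cache 
keyed by file path. It is not the code from the pull request above. The class 
name LruFooterCache, the String path key, and the generic value type are all 
assumptions made for illustration. The point is only that entries are looked up 
per file rather than held as per-instance state, so reusing one input format 
instance across directories cannot return footers from an earlier directory, 
and the least recently used entries are evicted once the cache is full.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an LRU cache keyed by file path.
 * Not the parquet-mr implementation; names and types are illustrative.
 */
public class LruFooterCache<V> {
    private final Map<String, V> entries;

    public LruFooterCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most
        // recently accessed, which is what makes this an LRU cache.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Called after each insertion; evict once over capacity.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value for the path, or null if it is absent. */
    public synchronized V get(String path) {
        return entries.get(path);
    }

    /** Caches a value for the path, evicting the eldest entry if needed. */
    public synchronized void put(String path, V footer) {
        entries.put(path, footer);
    }

    public synchronized int size() {
        return entries.size();
    }
}
```

A real footer cache would probably also need to detect that a cached file has 
changed on disk (for example by comparing modification times) before trusting 
an entry. That check is left out of this sketch.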



--
This message was sent by Atlassian JIRA
(v6.2#6252)