[ 
https://issues.apache.org/jira/browse/HIVE-16969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-16969:
----------------------------
    Description: 
For a table with many partition files, 
MapOperator.cloneConfsForNestedColPruning() updates 
hive.io.file.readNestedColumn.paths many times. The resulting large value of 
hive.io.file.readNestedColumn.paths causes poor performance in 
ParquetHiveSerDe.processRawPrunedPaths(). 
Unnecessary paths should therefore not be appended to 
hive.io.file.readNestedColumn.paths.

  was:
For a table with many partition files, 
MapOperator.cloneConfsForNestedColPruning() will update the 
hive.io.file.readNestedColumn.paths many times. The larger value of 
hive.io.file.readNestedColumn.paths will cause the poor performance for 
ParquetHiveSerDe.processRawPrunedPaths(). 
So, the unnecessary paths should be appended to 
hive.io.file.readNestedColumn.paths.


> Improvement performance of MapOperator for Parquet
> --------------------------------------------------
>
>                 Key: HIVE-16969
>                 URL: https://issues.apache.org/jira/browse/HIVE-16969
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Colin Ma
>            Assignee: Colin Ma
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16969.001.patch
>
>
> For a table with many partition files, 
> MapOperator.cloneConfsForNestedColPruning() updates 
> hive.io.file.readNestedColumn.paths many times. The resulting large value of 
> hive.io.file.readNestedColumn.paths causes poor performance in 
> ParquetHiveSerDe.processRawPrunedPaths(). 
> Unnecessary paths should therefore not be appended to 
> hive.io.file.readNestedColumn.paths.
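
The idea in the description can be illustrated with a minimal sketch. It is not taken from HIVE-16969.001.patch: the class NestedPathMerger and the method appendUnique are hypothetical names, and the sketch assumes the property value is a comma-separated list of paths. It only shows how skipping already-present paths keeps the value from growing with the number of partition files.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not Hive code: merges comma-separated nested column
// paths while skipping entries that are already present.
final class NestedPathMerger {

  private NestedPathMerger() {}

  // Returns the union of both comma-separated lists in first-seen order,
  // with duplicates removed.
  static String appendUnique(String existing, String toAppend) {
    Set<String> merged = new LinkedHashSet<>();
    addAll(merged, existing);
    addAll(merged, toAppend);
    return String.join(",", merged);
  }

  // Splits a comma-separated list and adds each non-empty, trimmed entry.
  private static void addAll(Set<String> target, String csv) {
    if (csv == null || csv.isEmpty()) {
      return;
    }
    for (String path : csv.split(",")) {
      String trimmed = path.trim();
      if (!trimmed.isEmpty()) {
        target.add(trimmed);
      }
    }
  }

  public static void main(String[] args) {
    String paths = "";
    // Simulate the same pruned paths arriving once per partition file.
    for (int partition = 0; partition < 1000; partition++) {
      paths = appendUnique(paths, "s.a,s.b");
    }
    // The value stays at two entries instead of growing to two thousand.
    System.out.println(paths);
  }
}
```

With plain string concatenation the same loop would produce a value with 2000 entries, which is the kind of growth the description says slows down ParquetHiveSerDe.processRawPrunedPaths().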



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
