[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778980#comment-13778980 ]
Ashutosh Chauhan commented on HIVE-5298:
----------------------------------------
Can you provide more detail? I think pathToPartInfo actually returns the
partition directory (I think the variable oneFile is misnamed there). If so, it
seems the for loop will have the same number of iterations before and after the
patch. I don't see where the perf advantage comes from.
> AvroSerde performance problem caused by HIVE-3833
> -------------------------------------------------
>
> Key: HIVE-5298
> URL: https://issues.apache.org/jira/browse/HIVE-5298
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 0.11.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-5298.1.patch, HIVE-5298.patch
>
>
> HIVE-3833 fixed the targeted problem and made Hive use partition-level
> metadata to initialize the object inspector. In doing so, however, it goes
> through every file under the table to access the partition metadata, which is
> very inefficient, especially when there are multiple files per partition. This
> causes more problems for AvroSerde because AvroSerde initialization accesses
> the schema, which is located on the file system. As a result, before Hive can
> process any data, it needs to access every file for a table, which can take
> long enough to cause job failure because of lack of job progress.
> An improvement can be made so that partition metadata is accessed only once
> per partition.
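
The "once per partition" idea in the description can be illustrated with a minimal sketch. It does not reproduce the actual Hive code or the attached patch: the class PartitionMetadataCache, the methods loadPartitionMetadata, partitionDirOf and metadataByPartition, and the sample paths are all hypothetical, and the sketch assumes that a file's parent directory identifies its partition. The expensive load (standing in for something like reading an Avro schema from the file system) runs only the first time a partition directory is seen, however many files that partition holds.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Hive's actual classes or the HIVE-5298 patch.
class PartitionMetadataCache {
    // Counts how often the expensive loader runs, to make the saving visible.
    static int loadCount = 0;

    // Stand-in for expensive per-partition initialization
    // (for example, reading a schema from the file system).
    static String loadPartitionMetadata(String partitionDir) {
        loadCount++;
        return "metadata:" + partitionDir;
    }

    // Assumption: the parent directory of a data file is its partition directory.
    static String partitionDirOf(String filePath) {
        int idx = filePath.lastIndexOf('/');
        return idx <= 0 ? "/" : filePath.substring(0, idx);
    }

    // Iterates over every file, but loads metadata only once per distinct partition.
    static Map<String, String> metadataByPartition(List<String> filePaths) {
        Map<String, String> cache = new HashMap<>();
        for (String file : filePaths) {
            cache.computeIfAbsent(partitionDirOf(file),
                    PartitionMetadataCache::loadPartitionMetadata);
        }
        return cache;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "/warehouse/t/ds=1/part-0.avro",
                "/warehouse/t/ds=1/part-1.avro",
                "/warehouse/t/ds=2/part-0.avro");
        Map<String, String> metadata = metadataByPartition(files);
        System.out.println("files=" + files.size()
                + " partitions=" + metadata.size()
                + " loads=" + loadCount);
    }
}
```

Note that the loop still visits every file; what shrinks is the number of expensive loads. If the map being iterated is already keyed by partition directory, as the comment above suggests, a cache like this would not change the iteration count.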