[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan updated HIVE-5298:
-----------------------------------
Status: Open (was: Patch Available)
Canceling the patch until we have a better understanding of the problem.
> AvroSerde performance problem caused by HIVE-3833
> -------------------------------------------------
>
> Key: HIVE-5298
> URL: https://issues.apache.org/jira/browse/HIVE-5298
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 0.11.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Attachments: HIVE-5298.1.patch, HIVE-5298.patch
>
>
> HIVE-3833 fixed the problem it targeted by making Hive use partition-level
> metadata to initialize the object inspector. In doing so, however, it goes
> through every file under the table to access the partition metadata, which is
> very inefficient, especially when a partition contains multiple files. The
> cost is higher for AvroSerde, because AvroSerde initialization reads the
> schema, which is located on the file system. As a result, before Hive can
> process any data, it needs to access every file in the table, which can take
> long enough to cause job failure because of lack of job progress.
> An improvement would be to access partition metadata only once per
> partition.
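
For illustration only, the "once per partition" idea in the description can be
sketched as a small cache keyed by partition directory. This is not Hive code
and not the attached patch: the class PartitionMetadataCache, the fake_loader
function, and the file paths are all hypothetical, and Python is used only to
keep the sketch short.

```python
import posixpath
from typing import Callable, Dict


class PartitionMetadataCache:
    """Loads metadata at most once per partition directory (hypothetical helper)."""

    def __init__(self, loader: Callable[[str], dict]) -> None:
        self._loader = loader              # expensive call, e.g. reading a schema
        self._cache: Dict[str, dict] = {}  # partition directory -> metadata
        self.load_count = 0                # number of times the loader actually ran

    def for_file(self, file_path: str) -> dict:
        # Files in the same partition share a parent directory, so the
        # directory serves as the cache key.
        partition_dir = posixpath.dirname(file_path)
        if partition_dir not in self._cache:
            self._cache[partition_dir] = self._loader(partition_dir)
            self.load_count += 1
        return self._cache[partition_dir]


def fake_loader(partition_dir: str) -> dict:
    # Stand-in for the real metadata/schema lookup (assumption for this sketch).
    return {"partition": partition_dir, "schema": "placeholder-schema"}


# Hypothetical layout: two partitions, three files in total.
files = [
    "/warehouse/t/ds=2013-09-01/part-00000",
    "/warehouse/t/ds=2013-09-01/part-00001",
    "/warehouse/t/ds=2013-09-02/part-00000",
]
cache = PartitionMetadataCache(fake_loader)
metadata = [cache.for_file(f) for f in files]
```

With this layout the loader runs once per partition directory rather than once
per file, so the number of expensive lookups grows with the partition count
instead of the file count.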
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)