Carter Shanklin created HIVE-12682:
--------------------------------------
Summary: Reducers in dynamic partitioning job spend a lot of time running hadoop.conf.Configuration.getOverlay
Key: HIVE-12682
URL: https://issues.apache.org/jira/browse/HIVE-12682
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.1
Reporter: Carter Shanklin
Attachments: reducer.png
I tested this on Hive 1.2.1, but it looks like it still applies to 2.0.
I ran this query:
{code}
create table flights (
…
)
PARTITIONED BY (Year int)
CLUSTERED BY (Month)
SORTED BY (DayofMonth) into 12 buckets
STORED AS ORC
TBLPROPERTIES("orc.bloom.filter.columns"="*")
;
{code}
(Taken from here:
https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)
I profiled just the reduce phase and noticed something odd. The attached graph
shows where time was spent during the reduce phase.
The problem seems to relate to
https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903
/cc [~gopalv]