Dynamic partition insert performance problem
--------------------------------------------
Key: HIVE-2087
URL: https://issues.apache.org/jira/browse/HIVE-2087
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.7.0
Environment: Amazon EMR, S3
Reporter: Q Long
Create an external (S3-backed) table T, partitioned by column P. Populate
table T so that it has a large number of partitions (say 100). Execute a
statement like
    insert overwrite table T partition (p) select * from another_table
Then check the Hive server log: it shows that all existing partitions are
read and loaded before any mapper starts working. This seems excessive, given
that the insert statement may create or overwrite only a very small number of
partitions. Is there some other reason that an insert using dynamic partitions
requires loading the whole table?
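
The reproduction steps above can be sketched in HiveQL roughly as follows.
This is only an illustration under assumptions: the bucket path, the column
names (id, val) and the partition column type are placeholders that do not
come from the report, and the two SET statements assume that dynamic
partitioning has not already been enabled in the session.

```sql
-- Assumption: dynamic partitioning is not yet enabled. Nonstrict mode is
-- needed because no static partition value is supplied.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical schema and S3 location; only T and p come from the report.
CREATE EXTERNAL TABLE T (id INT, val STRING)
PARTITIONED BY (p STRING)
LOCATION 's3://example-bucket/path/to/T/';

-- ... populate T until it has on the order of 100 partitions ...

-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE T PARTITION (p)
SELECT id, val, p FROM another_table;
```

With a table set up this way, the behaviour described above should be visible
in the Hive server log before the first mapper is launched.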
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira