-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50888/
-----------------------------------------------------------
(Updated Aug. 10, 2016, 10:50 a.m.)
Review request for hive and Ashutosh Chauhan.
Changes
-------
In some corner cases, a partition can contain multiple nested directories
(e.g. table/ii=1/jj=15/q=10/r=20/s=30/000000_0 and
table/ii=1/jj=15/q=11/r=22/s=33/000000_0, where ii and jj are the only
partition columns).
{{HiveMetaStoreChecker.getPartitionName}} ends up resolving the partition names
as "ii=1/jj=15/q=11/r=22/s=33" and "ii=1/jj=15/q=10/r=20/s=30".
When msck is run, the metastore (MS) throws a duplicate-partition exception for
ii=1, jj=15. msck then falls back to {{msckAddPartitionsOneByOne}}, which
tries to repair one partition at a time and ignores any exceptions. The job
still completes, but it makes many calls to MS and can be very slow.
I will attach the latest patch in RB.
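To illustrate the name-resolution problem above, here is a minimal sketch (not the code from the patch) that keeps only the leading key=value directories matching the declared partition columns. The class name {{PartitionNameSketch}}, the method {{partitionName}} and its parameters are hypothetical, and the path is assumed to be relative to the table directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionNameSketch {
    /**
     * Hypothetical helper, not a Hive API. Resolves a partition name from a
     * path relative to the table directory, keeping only the leading
     * key=value directories that match the declared partition columns in
     * order. Anything nested deeper is ignored. Returns null if the path
     * does not start with a directory for every partition column.
     */
    public static String partitionName(String relativePath, List<String> partCols) {
        String[] components = relativePath.split("/");
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < partCols.size() && i < components.length; i++) {
            String[] kv = components[i].split("=", 2);
            if (kv.length != 2 || !kv[0].equals(partCols.get(i))) {
                return null; // not a partition directory for this table
            }
            kept.add(components[i]);
        }
        return kept.size() == partCols.size() ? String.join("/", kept) : null;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("ii", "jj");
        // Both nested paths resolve to the same partition name: ii=1/jj=15
        System.out.println(partitionName("ii=1/jj=15/q=10/r=20/s=30/000000_0", cols));
        System.out.println(partitionName("ii=1/jj=15/q=11/r=22/s=33/000000_0", cols));
    }
}
```

With names resolved this way, the two example paths collapse to a single partition, so a caller could de-duplicate them before contacting MS.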
Without Patch:
=============
msck runtime for 10000 partitions in a small cluster: *370 seconds*
With Patch:
===========
msck runtime for 10000 partitions in a small cluster: *62 seconds*
Bugs: HIVE-14462
https://issues.apache.org/jira/browse/HIVE-14462
Repository: hive-git
Description
-------
The metastore already performs all the validations. Many MS calls are made just
before add_partitions to double-check whether the partitions exist. This hurts
performance when a large number of partitions are present.
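As a rough illustration of relying on the metastore's own validation, the sketch below tries a single batch add and only falls back to one-by-one adds if the batch is rejected. {{PartitionStore}}, {{InMemoryStore}} and {{repair}} are hypothetical stand-ins, not Hive APIs, and the all-or-nothing batch behaviour is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AddPartitionsSketch {
    /** Hypothetical stand-in for a metastore client; not a Hive API. */
    public interface PartitionStore {
        void addPartitions(List<String> names) throws Exception; // assumed all-or-nothing
        void addPartition(String name) throws Exception;
    }

    /** In-memory store that validates duplicates itself and counts calls. */
    public static class InMemoryStore implements PartitionStore {
        public final Set<String> partitions = new LinkedHashSet<>();
        public int calls = 0;

        @Override
        public void addPartitions(List<String> names) throws Exception {
            calls++;
            Set<String> staged = new LinkedHashSet<>(partitions);
            for (String n : names) {
                if (!staged.add(n)) {
                    throw new Exception("duplicate partition: " + n);
                }
            }
            partitions.addAll(names); // commit only if the whole batch is valid
        }

        @Override
        public void addPartition(String name) throws Exception {
            calls++;
            if (!partitions.add(name)) {
                throw new Exception("duplicate partition: " + name);
            }
        }
    }

    /** Batch add with no pre-checks; fall back to one-by-one on failure. */
    public static void repair(PartitionStore store, List<String> names) {
        try {
            store.addPartitions(names);
        } catch (Exception batchFailure) {
            for (String n : names) {
                try {
                    store.addPartition(n);
                } catch (Exception ignored) {
                    // the store already rejected this partition; move on
                }
            }
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        repair(store, Arrays.asList("ii=1/jj=15", "ii=1/jj=16"));
        System.out.println(store.partitions + " added in " + store.calls + " call(s)");
    }
}
```

In this sketch a clean batch costs one call, while a batch containing a duplicate costs one failed call plus one call per partition, which is the slow path described under Changes.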
Diffs (updated)
-----
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 38c0eed
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a59b781
  ql/src/java/org/apache/hadoop/hive/ql/metadata/CheckResult.java ec9deeb
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java a164b12
  ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java 5b8ec60
Diff: https://reviews.apache.org/r/50888/diff/
Testing
-------
Thanks,
Rajesh Balamohan