I actually decided to remove one of my two partition columns and make it a
bucketing column instead... the same query completed fully in under 10 minutes
with 92 partitions added. This will suffice for me for now.
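
For anyone following along, the "2 more days" estimate in my earlier message
below can be checked with a quick calculation. This is only a rough sketch in
Python, assuming the metastore keeps adding partitions at a constant rate; the
variable and function names are made up for illustration, and the figures are
the ones quoted in this thread.

```python
# Back-of-the-envelope check of the metastore commit rate, using only the
# figures quoted in this thread.
TOTAL_PARTITIONS = 32000   # "about 32000 partitions generated"
ADDED_SO_FAR = 10830       # partitions registered in the metastore so far
ELAPSED_HOURS = 24         # time since the Hive job itself completed


def remaining_hours(total, added, elapsed_hours):
    """Estimate hours left, assuming the add rate stays constant."""
    rate_per_hour = added / elapsed_hours
    return (total - added) / rate_per_hour


rate = ADDED_SO_FAR / ELAPSED_HOURS      # partitions added per hour
seconds_per_partition = 3600 / rate      # average cost of one partition add
left = remaining_hours(TOTAL_PARTITIONS, ADDED_SO_FAR, ELAPSED_HOURS)

print(f"{rate:.0f} partitions/hour, {seconds_per_partition:.1f} s per partition")
print(f"about {left:.0f} hours ({left / 24:.1f} days) remaining")
```

The point of the per-partition figure is that the cost scales with the number
of partitions, which is why cutting the partition count made such a difference.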

On Thu, Jun 11, 2015 at 2:25 PM, Pradeep Gollakota <pradeep...@gmail.com>
wrote:

> Hmm... did your performance increase with the patch you supplied? I do
> need the partitions in Hive, but I have a separate tool that has the
> ability to add partitions to the metastore and is definitely much faster
> than this. I just checked my job again, the actual Hive job completed 24
> hours ago and has been adding the dynamic partitions to the metastore since
> then and is still not done. According to the metastore, there are only 10830
> partitions added so far... at this pace, it will take approximately 2 more
> days for it to complete.
>
> On Thu, Jun 11, 2015 at 1:18 PM, Slava Markeyev <
> slava.marke...@upsight.com> wrote:
>
>> This is something that a few of us have run into. I think the bottleneck
>> is in partition creation calls to the metastore. My workaround was
>> HIVE-10385, which optionally removed partition creation in the metastore, but
>> this isn't a solution for everyone. If you don't require actual partitions
>> in the table but simply partitioned data in HDFS, give it a shot. It may be
>> worthwhile looking into optimizations for this use case.
>>
>> -Slava
>>
>> On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota <pradeep...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I have a table which is partitioned on two columns (customer, date). I'm
>>> loading some data into the table using a Hive query. The MapReduce job
>>> completed within a few minutes and needs to "commit" the data to the
>>> appropriate partitions. There were about 32000 partitions generated. The
>>> commit phase has been running for almost 16 hours and has not finished yet.
>>> I've been monitoring jmap, and don't believe it's a memory or gc issue.
>>> I've also been looking at jstack and am not sure why it's so slow. I'm not
>>> sure what the problem is, but it seems to be a Hive performance issue when it
>>> comes to "highly partitioned" tables.
>>>
>>> Any thoughts on this issue would be greatly appreciated.
>>>
>>> Thanks in advance,
>>> Pradeep
>>>
>>
>>
>>
>> --
>>
>> Slava Markeyev | Engineering | Upsight
>>
>> Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>
>>
>
>
