Hi All,

I have a table partitioned on two columns (customer, date), and I'm
loading data into it with a Hive query. The MapReduce job completed within
a few minutes, but the data still has to be "committed" to the appropriate
partitions. About 32,000 partitions were generated. The commit phase has
been running for almost 16 hours and has not finished yet.
I've been monitoring with jmap and don't believe it's a memory or GC issue.
I've also been looking at jstack output, but I can't tell why it's so slow.
I'm not sure what the problem is, but it seems to be a Hive performance
issue with "highly partitioned" tables.
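
For a sense of scale, here is a rough back-of-the-envelope sketch of what
those numbers imply per partition. It assumes (unverified) that the
partitions are committed one after another, and it uses the approximate
figures quoted above, so treat the result as a lower bound on the average,
not a measurement.

```python
# Approximate figures from the message ("about 32000", "almost 16 hours").
partitions = 32_000
elapsed_hours = 16

# Assumption: partitions are committed sequentially, one at a time.
elapsed_seconds = elapsed_hours * 3600
seconds_per_partition = elapsed_seconds / partitions

print(f"{seconds_per_partition:.2f} s per partition so far (lower bound)")
```

Since the commit has not finished, the true average per partition would be
higher than this figure.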

Any thoughts on this issue would be greatly appreciated.

Thanks in advance,
Pradeep
