[ https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903664#action_12903664 ]
Ning Zhang commented on HIVE-1602:
----------------------------------

@joydeep, this is intended to be an open-ended discussion about how to tackle partition skew. Combining small partitions into one large partition seems to be a natural way. Maybe the name "list partition" is not so obvious, but I meant mapping a list of values from the DP column to one partition rather than a 1-to-1 mapping. HAR is one option, and we can keep the partition spec as part of the file name so that the actual column is not stored. Another way is to store the partition column value in the data file itself if the partition corresponds to a list of values.

> the user can do a one time analysis of the data (for size distribution on
> different partitioning columns) and then generate the clumping logic manually.

The problem is that there is no way for the user to manually cluster data with different partition column values. For example, if event is a DP column and you find a couple of large partitions event = {'l', 'g'}, and 3 small partitions event = {'s', 'm', 'l'}, how can the user manually cluster event=s, event=m, event=l into one? If there are a lot of these small partitions, it introduces a lot of problems in HDFS, the metastore, and the Hive client side.

> List Partitioning
> -----------------
>
>                 Key: HIVE-1602
>                 URL: https://issues.apache.org/jira/browse/HIVE-1602
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition
> column values. Currently it creates one partition for each distinct DP column
> value. This could result in skew in the created dynamic partitions, in that
> some partitions are large but there could be a large number of small partitions
> as well. This places a burden on HDFS as well as the metastore. A list
> partitioning scheme that aggregates a number of small partitions into one big
> one is preferable for skewed partitions.
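The idea of mapping a list of DP column values to one partition, as discussed in the comment above, can be illustrated with a minimal sketch. This is not Hive code and not a proposed implementation: the function names, the `target_bytes` threshold, and the per-value sizes are all hypothetical, and the greedy grouping is just one simple way to clump small values together.

```python
from typing import Dict, List


def build_list_partitions(sizes: Dict[str, int], target_bytes: int) -> List[List[str]]:
    """Group DP column values into list partitions (hypothetical helper).

    A value whose data already reaches target_bytes keeps a partition of
    its own. Smaller values are packed together greedily until adding the
    next one would push the group past target_bytes.
    """
    partitions: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    # Largest first; ties are broken by value name so the result is deterministic.
    for value, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size >= target_bytes:
            partitions.append([value])
            continue
        if current and current_bytes + size > target_bytes:
            partitions.append(current)
            current, current_bytes = [], 0
        current.append(value)
        current_bytes += size
    if current:
        partitions.append(current)
    return partitions


def value_to_partition(partitions: List[List[str]]) -> Dict[str, int]:
    """Invert the grouping: DP column value -> index of its list partition."""
    return {value: index for index, group in enumerate(partitions) for value in group}


# Assumed sizes in bytes for an `event` DP column: two large values, three small ones.
example_sizes = {"l": 900, "g": 800, "s": 10, "m": 20, "t": 5}
example_partitions = build_list_partitions(example_sizes, target_bytes=500)
example_mapping = value_to_partition(example_partitions)
```

With these assumed sizes, the two large values keep their own partitions and the three small ones share a single list partition, so five distinct values produce three partitions instead of five. Because one partition now holds several values, the value itself has to be recoverable some other way, which is why the comment suggests keeping the partition spec in the file name or storing the column value in the data file.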
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.