[ https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903708#action_12903708 ]
Ning Zhang commented on HIVE-1602:
----------------------------------

I agree this will be a big change, and we are still tossing ideas around here. We don't have a final plan yet. HAR is one idea, and we should definitely try it once HIVE-1467 is done. But as you said, it won't change the number of partitions. Check out some of our tables, which have more than 240 partitions each day. With dynamic partitioning, it is very easy to increase that even more.

Another idea Namit and I were talking about is to map the list of values {'s', 'm', 'l'} to the actual partition location and store this mapping in the metastore. This essentially separates the logical concept of a partition from its physical storage location (HDFS directories). This could be a big change and would break the assumptions of users who rely on the reverse mapping (figuring out the partition from the HDFS directory).

If we decide to go this route, inserting is easy: we get the mapping from the metastore and decide which directory to write to for a given output row. Querying is a little more complicated, as the partition pruning phase needs to figure out which physical directory a partition corresponds to and get the partition column value from the data file itself rather than from the directory name. The overhead, of course, is that the partition column value needs extra storage in the data file. But if we sort on the partitioning column and use RCFile with column-level run-length compression (which we already support), the storage overhead is very small.

> List Partitioning
> -----------------
>
> Key: HIVE-1602
> URL: https://issues.apache.org/jira/browse/HIVE-1602
> Project: Hadoop Hive
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition
> column values. Currently one partition is created for each distinct DP column
> value. This can result in skew in the created dynamic partitions, in that
> some partitions are large but there can be a large number of small partitions
> as well. This puts a burden on HDFS as well as the metastore. A list
> partitioning scheme that aggregates a number of small partitions into one big
> one is preferable for skewed partitions.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
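
For illustration only, the value-list-to-location mapping described in the comment could be sketched as below. This is a toy model in plain Python, not Hive or metastore code: the class `ListPartitionMap`, the helpers `insert_rows` and `query`, the column name `size`, and the `/warehouse/t/part_N` paths are all hypothetical. It assumes each partition value belongs to exactly one directory and that rows keep their partition column value in the data itself, as the comment proposes.

```python
from collections import defaultdict


class ListPartitionMap:
    """Toy stand-in for a metastore mapping: value list -> physical directory."""

    def __init__(self, groups):
        # groups: {directory: iterable of partition values stored there}
        self.value_to_dir = {}
        for directory, values in groups.items():
            for value in values:
                if value in self.value_to_dir:
                    raise ValueError(f"value {value!r} is mapped twice")
                self.value_to_dir[value] = directory

    def directory_for(self, value):
        # Insert path: pick the directory an output row should be written to.
        return self.value_to_dir[value]

    def prune(self, wanted_values):
        # Query path: resolve requested values to the directories to scan.
        return sorted(
            {self.value_to_dir[v] for v in wanted_values if v in self.value_to_dir}
        )


def insert_rows(pmap, rows, storage):
    # The partition column ("size", hypothetical) stays inside the row,
    # because the directory name no longer identifies a single value.
    for row in rows:
        storage[pmap.directory_for(row["size"])].append(row)


def query(pmap, storage, wanted_values):
    wanted = set(wanted_values)
    # A directory may hold several values, so rows are filtered again
    # using the value stored in the data.
    return [
        row
        for directory in pmap.prune(wanted)
        for row in storage[directory]
        if row["size"] in wanted
    ]


pmap = ListPartitionMap({
    "/warehouse/t/part_0": ["s", "m", "l"],  # small partitions merged
    "/warehouse/t/part_1": ["xl"],
})
storage = defaultdict(list)
insert_rows(
    pmap,
    [{"size": "s", "id": 1}, {"size": "xl", "id": 2}, {"size": "m", "id": 3}],
    storage,
)
print(pmap.prune(["m"]))
print(query(pmap, storage, ["m"]))
```

The second filter in `query` is the cost the comment mentions: once several values share a directory, the directory name alone cannot supply the partition column value, so it has to be read from the data.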