[ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838958#action_12838958 ]
He Yongqiang commented on HIVE-1193:
------------------------------------

@Zheng,

>> 1. How do we make sure that the data is bucketed / sorted? By adding an additional map-reduce job?

Yes.

>> 2. What if the user already specified "CLUSTER BY key" in his query?

As with 1, a new job will be added to redistribute the data. If the user specifies a CLUSTER BY column that differs from the table's sort and bucket properties, we should probably let the query fail. Right now, though, that CLUSTER BY is simply ignored.

>> 3. Do we disable merging of small files when we do this?

Yes. We should disable merging whenever enforceBucketing or enforceSorting is enabled.

> ensure sorting properties for a table
> -------------------------------------
>
>                 Key: HIVE-1193
>                 URL: https://issues.apache.org/jira/browse/HIVE-1193
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1193.1.patch
>
>
> If a table is sorted and data is being inserted into it, we currently don't make sure that the data is sorted. That might be useful for some downstream operations.
> This cannot be made the default due to backward compatibility, but an option can be added for it.
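Answers 1 and 3 in the comment above can be illustrated with a minimal Java sketch. It is not Hive's implementation: the class and method names (`EnforceBucketingSketch`, `bucketAndSort`, `shouldMergeSmallFiles`) are hypothetical, and Java's `String.hashCode` stands in for whatever hash function Hive actually uses for bucketing. `bucketAndSort` simulates the extra redistribution stage by routing each key to a bucket with hash-mod-bucket-count and then sorting each bucket. `shouldMergeSmallFiles` encodes the rule that merging is turned off whenever either enforcement option is on; the assumed reason is that merging output files would break the one-file-per-bucket, sorted-file layout the enforcement is meant to guarantee.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Hive source code.
class EnforceBucketingSketch {

    // Simulates the added map-reduce stage: each key is sent to a bucket
    // (think "reducer") by hashing, then each bucket is sorted if requested.
    // String.hashCode is an assumed stand-in for Hive's real bucketing hash.
    static List<List<String>> bucketAndSort(List<String> keys, int numBuckets, boolean sort) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            // Clear the sign bit so negative hash codes still yield a valid bucket index.
            int bucket = (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
            buckets.get(bucket).add(key);
        }
        if (sort) {
            for (List<String> bucket : buckets) {
                Collections.sort(bucket);
            }
        }
        return buckets;
    }

    // Small-file merging stays on only if it was enabled and neither
    // enforceBucketing nor enforceSorting is in effect.
    static boolean shouldMergeSmallFiles(boolean mergeEnabled,
                                         boolean enforceBucketing,
                                         boolean enforceSorting) {
        return mergeEnabled && !enforceBucketing && !enforceSorting;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("d", "a", "c", "b", "e", "a");
        // With 2 buckets: prints [[b, d], [a, a, c, e]]
        System.out.println(bucketAndSort(keys, 2, true));
        // Bucketing is enforced, so merging is disabled: prints false
        System.out.println(shouldMergeSmallFiles(true, true, false));
    }
}
```

Using one bucket per reducer keeps the mapping from bucket number to output file fixed, which is why the sketch treats merging and enforcement as mutually exclusive rather than trying to merge within a bucket.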