[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman updated HIVE-16832:
----------------------------------
    Attachment: HIVE-16832.01.patch

HIVE-16832.01.patch is an incomplete work in progress (WIP).

VectorizedOrcAcidRowBatchReader assumes that ROW__ID.bucketId is constant within each split (and within each bucket file of a delete_delta), which is no longer the case.

SortedDynPartitionOptimizer needs to ensure that data is sorted by (ROW__ID.bucketId % numBuckets) before it is sorted by ROW__ID, so that FileSinkOperator.process() sees all rows for a given bucket equivalence set before moving on to the next equivalence set.

> duplicate ROW__ID possible in multi insert into transactional table
> -------------------------------------------------------------------
>
>                 Key: HIVE-16832
>                 URL: https://issues.apache.org/jira/browse/HIVE-16832
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Critical
>         Attachments: HIVE-16832.01.patch
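A minimal Java sketch of the sort order described in the comment, assuming ROW__ID can be modeled as a (transactionId, bucketId, rowId) triple and taking (bucketId % numBuckets) literally as written (real Hive may encode more than the bucket number in bucketId). RowId, sortForFileSink, isGroupedByBucketSet and BY_ROW_ID are hypothetical names for illustration only; this is not the code of SortedDynPartitionOptimizer or FileSinkOperator. It checks whether each bucket equivalence set arrives as one contiguous run, which is what the sink needs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BucketSortSketch {

    // Hypothetical stand-in for the ROW__ID struct; field names are assumptions.
    static final class RowId {
        final long transactionId;
        final int bucketId;
        final long rowId;

        RowId(long transactionId, int bucketId, long rowId) {
            this.transactionId = transactionId;
            this.bucketId = bucketId;
            this.rowId = rowId;
        }

        @Override
        public String toString() {
            return "(" + transactionId + "," + bucketId + "," + rowId + ")";
        }
    }

    // Plain ROW__ID ordering: transactionId, then bucketId, then rowId.
    static final Comparator<RowId> BY_ROW_ID = Comparator.<RowId>comparingLong(r -> r.transactionId)
            .thenComparingInt(r -> r.bucketId)
            .thenComparingLong(r -> r.rowId);

    // Ordering from the comment: bucket equivalence set first, then ROW__ID.
    static List<RowId> sortForFileSink(List<RowId> rows, int numBuckets) {
        Comparator<RowId> byBucketSet =
                Comparator.comparingInt(r -> Math.floorMod(r.bucketId, numBuckets));
        List<RowId> sorted = new ArrayList<>(rows);
        sorted.sort(byBucketSet.thenComparing(BY_ROW_ID));
        return sorted;
    }

    // True if no bucket equivalence set reappears after the stream has moved past it.
    static boolean isGroupedByBucketSet(List<RowId> rows, int numBuckets) {
        Set<Integer> finished = new HashSet<>();
        Integer current = null;
        for (RowId r : rows) {
            int key = Math.floorMod(r.bucketId, numBuckets);
            if (current == null || key != current) {
                if (finished.contains(key)) {
                    return false; // this set was already closed, so the sink would see it twice
                }
                if (current != null) {
                    finished.add(current);
                }
                current = key;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int numBuckets = 2;
        List<RowId> rows = List.of(
                new RowId(5, 3, 0),  // 3 % 2 = 1
                new RowId(5, 0, 1),  // 0 % 2 = 0
                new RowId(5, 1, 2),  // 1 % 2 = 1
                new RowId(5, 2, 3)); // 2 % 2 = 0

        List<RowId> byRowIdOnly = new ArrayList<>(rows);
        byRowIdOnly.sort(BY_ROW_ID);
        List<RowId> forSink = sortForFileSink(rows, numBuckets);

        System.out.println(byRowIdOnly + " grouped=" + isGroupedByBucketSet(byRowIdOnly, numBuckets));
        System.out.println(forSink + " grouped=" + isGroupedByBucketSet(forSink, numBuckets));
    }
}
```

With numBuckets = 2, sorting by ROW__ID alone visits bucketIds 0, 1, 2, 3, which fall into equivalence sets 0, 1, 0, 1, so set 0 reappears after the sink has moved on to set 1. Putting the modulo key first yields sets 0, 0, 1, 1, and ROW__ID order is still preserved within each set.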