[ https://issues.apache.org/jira/browse/HIVE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508925#comment-16508925 ]
Eugene Koifman commented on HIVE-19838:
---------------------------------------

I think one way {{totalDeleteEventCount}} in {{ColumnizedDeleteEventRegistry}} can end up wrong is that {{DeleteReaderValue}} takes a ValidWriteIdList, which means that {{next()}} may skip an event because it belongs to a transaction that had not yet committed when the current reader locked in its snapshot. In practice, this would require a compaction (at least a minor one) that includes a txn which is still open from the point of view of the reader's txn to complete before the VectorizedOrc reader starts reading, which is possible but not very likely.

Another issue, which I think the current patch eliminates, is this:

{noformat}
if (lastSeenOwid != deleteRecordKey.originalWriteId ||
    lastSeenBucketProperty != deleteRecordKey.bucketProperty) {
  ++distinctOwids;
  lastSeenOwid = deleteRecordKey.originalWriteId;
  lastSeenBucketProperty = deleteRecordKey.bucketProperty;
}
{noformat}

{{distinctOwids}} is also incremented when only {{bucketProperty}} changes, which seems invalid even for bucketed tables.

> simplify & fix ColumnizedDeleteEventRegistry load loop
> ------------------------------------------------------
>
>                 Key: HIVE-19838
>                 URL: https://issues.apache.org/jira/browse/HIVE-19838
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HIVE-19838.01.patch, HIVE-19838.patch
>
>
> Apparently the delete count in ACID stats sometimes doesn't match what the merger
> actually returns.
> It could be due to some deltas having duplicate deletes from parallel queries
> (I guess?) that are being squashed by the merger, or some other reason beyond
> my mortal comprehension.
> The loop assumes the merger will return the exact number of records, so it
> fails with an array index exception. Also, it could actually be done in a single
> loop.
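The single-loop load described in the issue can be sketched roughly as follows. This is a hypothetical illustration, not the attached patch or Hive's real code: {{DeleteKey}} and {{DeleteEventLoader}} are invented names, and the merger is modeled as a plain {{java.util.Iterator}} that yields delete events in sorted order. The point it shows is that the expected count (e.g. the delete count from ACID stats) is used only as a sizing hint, and the loop stops when the merger is exhausted rather than after a precomputed number of records.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical stand-in for a sorted delete event key (not Hive's real class). */
final class DeleteKey {
  final long originalWriteId;
  final int bucketProperty;
  final long rowId;

  DeleteKey(long originalWriteId, int bucketProperty, long rowId) {
    this.originalWriteId = originalWriteId;
    this.bucketProperty = bucketProperty;
    this.rowId = rowId;
  }
}

/** Hypothetical single-pass loader; a sketch of the idea only. */
final class DeleteEventLoader {
  long[] owids = new long[0];
  int[] bucketProperties = new int[0];
  long[] rowIds = new long[0];
  int size;

  /**
   * Copies every event the merger yields in one loop. expectedCount is only a
   * sizing hint: the merger may return fewer events (e.g. duplicates squashed,
   * events outside the reader's ValidWriteIdList skipped) or possibly more.
   */
  void load(Iterator<DeleteKey> merger, int expectedCount) {
    int capacity = Math.max(expectedCount, 1);
    owids = new long[capacity];
    bucketProperties = new int[capacity];
    rowIds = new long[capacity];
    size = 0;
    while (merger.hasNext()) { // trust the merger, not the stats count
      DeleteKey key = merger.next();
      if (size == owids.length) {
        resize(owids.length * 2); // stats under-counted: grow instead of overflowing
      }
      owids[size] = key.originalWriteId;
      bucketProperties[size] = key.bucketProperty;
      rowIds[size] = key.rowId;
      size++;
    }
    if (size < owids.length) {
      resize(size); // stats over-counted: trim to what was actually read
    }
  }

  private void resize(int newLength) {
    owids = Arrays.copyOf(owids, newLength);
    bucketProperties = Arrays.copyOf(bucketProperties, newLength);
    rowIds = Arrays.copyOf(rowIds, newLength);
  }
}
```

Because the loop exits only when {{hasNext()}} returns false, a mismatch between the stats count and what the merger returns only changes how often the arrays are resized; it can no longer cause an array index exception.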