[ https://issues.apache.org/jira/browse/HIVE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509003#comment-16509003 ]

Eugene Koifman commented on HIVE-19838:
---------------------------------------

+1
I left a couple of nits on RB
Ignore my previous comment about distinctOwids. It's a poorly named variable: it 
really counts the number of distinct (writeid, bucketproperty) pairs, and the 
search on CompressedOwid matches this.

Note to self:
For unbucketed tables, if multiple bucket files are all loaded, each file has 
its own reader in the heap, which means that regardless of how delete events are 
spread among files, the heap sorts all of them by (writeid, bucketprop, rowid), 
so ColumnizedDeleteEventRegistry.isDeleted() looks ok.
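The per-file-reader heap behavior above can be sketched as a plain k-way merge. This is a minimal illustration, not Hive's actual classes: the Key record and merge() method are hypothetical stand-ins for the ORC delete-delta readers and the record merger, but they show why output order is global even when events are spread arbitrarily across files.

```java
import java.util.*;

// Hypothetical sketch (identifiers are illustrative, not Hive's actual API):
// one reader per bucket file; a heap merges delete events so they come out
// globally ordered by (writeId, bucketProperty, rowId), no matter how the
// events were distributed across files.
public class DeleteEventMergeSketch {
    // A delete event key, compared lexicographically.
    record Key(long writeId, int bucketProperty, long rowId) implements Comparable<Key> {
        public int compareTo(Key o) {
            int c = Long.compare(writeId, o.writeId);
            if (c == 0) c = Integer.compare(bucketProperty, o.bucketProperty);
            if (c == 0) c = Long.compare(rowId, o.rowId);
            return c;
        }
    }

    // Heap entry: the current head of one file's reader plus the reader itself.
    record HeapEntry(Key key, Iterator<Key> reader) {}

    // Merge per-file iterators (each already sorted) into one sorted stream.
    static List<Key> merge(List<Iterator<Key>> readers) {
        PriorityQueue<HeapEntry> heap = new PriorityQueue<>(Comparator.comparing(HeapEntry::key));
        for (Iterator<Key> r : readers) {
            if (r.hasNext()) heap.add(new HeapEntry(r.next(), r));
        }
        List<Key> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            HeapEntry top = heap.poll();
            out.add(top.key());
            if (top.reader().hasNext()) heap.add(new HeapEntry(top.reader().next(), top.reader()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Delete events for an unbucketed table, spread across two files.
        List<Key> fileA = List.of(new Key(1, 0, 5), new Key(3, 1, 2));
        List<Key> fileB = List.of(new Key(1, 1, 0), new Key(2, 0, 7));
        List<Key> merged = merge(new ArrayList<>(List.of(fileA.iterator(), fileB.iterator())));
        for (Key k : merged) System.out.println(k);
    }
}
```

Because each file's events are already sorted, the heap only ever has to compare one head per reader, so isDeleted()'s binary search over the merged result stays valid.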

> simplify & fix ColumnizedDeleteEventRegistry load loop
> ------------------------------------------------------
>
>                 Key: HIVE-19838
>                 URL: https://issues.apache.org/jira/browse/HIVE-19838
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HIVE-19838.01.patch, HIVE-19838.patch
>
>
> Apparently sometimes the delete count in ACID stats doesn't match what merger 
> actually returns.
> It could be due to some deltas having duplicate deletes from parallel queries 
> (I guess?) that are being squashed by the merger or some other reasons beyond 
> my mortal comprehension.
> The loop assumes the merger will return the exact number of records, so it 
> fails with array index exception. Also, it could actually be done in a single 
> loop.
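
The failure mode in the quoted description can be sketched as follows. This is a hypothetical simplification, not the patch itself: load() stands in for the ColumnizedDeleteEventRegistry load loop, and the iterator stands in for the merger. The point is that the loop must be bounded by the merger's actual output, not by the stats-derived count.

```java
import java.util.*;

// Hypothetical sketch of the fix described above (identifiers are illustrative):
// the old loop allocated arrays from the ACID-stats delete count and indexed
// into them while iterating the merger; if the merger returns fewer records
// than the stats promised (e.g. duplicate deletes squashed), the counts
// disagree. Collecting until the merger is exhausted, in a single pass,
// avoids the array index failure.
public class DeleteLoadLoopSketch {
    static long[] load(Iterator<Long> merger) {
        List<Long> writeIds = new ArrayList<>();
        while (merger.hasNext()) {       // stop when the merger runs out,
            writeIds.add(merger.next()); // regardless of what stats claimed
        }
        long[] out = new long[writeIds.size()];
        for (int i = 0; i < out.length; i++) out[i] = writeIds.get(i);
        return out;
    }

    public static void main(String[] args) {
        // Stats said 5 deletes, but duplicates were squashed: only 3 remain.
        long[] loaded = load(List.of(1L, 2L, 3L).iterator());
        System.out.println(loaded.length); // sized from actual records, no AIOOBE
    }
}
```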



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
