[
https://issues.apache.org/jira/browse/HBASE-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-946:
------------------------
Attachment: 946.patch
I want to apply this to branch and trunk (running unit tests now).
> Row with 55k deletes timesout scanner lease
> -------------------------------------------
>
> Key: HBASE-946
> URL: https://issues.apache.org/jira/browse/HBASE-946
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Blocker
> Fix For: 0.18.1, 0.19.0
>
> Attachments: 946.patch
>
>
> Made a blocker because it was found by Jon Gray (smile)
> So, Jon Gray has a row with 55k deletes in it. When he tries to scan, his
> scanner times out when it gets to this row. The root cause is the mechanism
> we use to make sure a delete in a new store file overshadows an entry at the
> same address in an old file. We accumulate a List of all deletes
> encountered. Before adding a delete to the List, we check whether it is
> already present. This check is what's killing us. One issue is that it does
> a very inefficient check of whether the table is root, but even after fixing
> that inefficiency, and then removing the root check entirely since it is
> redundant, we are still too slow.
> In a chat, Jim K pointed out that the ArrayList membership check is linear.
> Changing the aggregation of deletes to use a HashSet instead makes
> everything run an order of magnitude faster (see the sketch below).
> Also as part of this issue, we need to figure out why compaction is not
> letting go of these deletes.
> Filing this issue against 0.18.1 so it gets into RC2 (after chatting with
> J-D and JK; J-D is seeing the issue too).
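
Below is a minimal sketch of the kind of change described above. It is not the
attached 946.patch; the DeleteKey and DeleteTrackerSketch names are made up for
illustration. The point is only that List.contains is a linear scan, so checking
each new delete against the 55k already accumulated is O(n) per lookup (O(n^2)
overall), while HashSet.contains is roughly constant time once the key type
implements equals and hashCode.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class DeleteTrackerSketch {

  // Illustrative stand-in for a store key (row/column/timestamp).
  static final class DeleteKey {
    final String row;
    final String column;
    final long timestamp;

    DeleteKey(String row, String column, long timestamp) {
      this.row = row;
      this.column = column;
      this.timestamp = timestamp;
    }

    // equals/hashCode are required for HashSet membership checks to work.
    @Override
    public boolean equals(Object o) {
      if (!(o instanceof DeleteKey)) {
        return false;
      }
      DeleteKey other = (DeleteKey) o;
      return timestamp == other.timestamp
          && row.equals(other.row)
          && column.equals(other.column);
    }

    @Override
    public int hashCode() {
      return Objects.hash(row, column, timestamp);
    }
  }

  public static void main(String[] args) {
    final int numDeletes = 55_000;

    // Old approach: accumulate deletes in a List; contains() is a linear scan,
    // so building up 55k deletes costs O(n^2) comparisons overall.
    List<DeleteKey> deleteList = new ArrayList<>();
    long start = System.nanoTime();
    for (int i = 0; i < numDeletes; i++) {
      DeleteKey key = new DeleteKey("row", "col:" + i, i);
      if (!deleteList.contains(key)) {
        deleteList.add(key);
      }
    }
    System.out.printf("ArrayList: %d ms%n",
        (System.nanoTime() - start) / 1_000_000);

    // New approach: a HashSet makes the membership check roughly O(1),
    // so the whole accumulation is roughly linear in the number of deletes.
    Set<DeleteKey> deleteSet = new HashSet<>();
    start = System.nanoTime();
    for (int i = 0; i < numDeletes; i++) {
      deleteSet.add(new DeleteKey("row", "col:" + i, i));
    }
    System.out.printf("HashSet:   %d ms%n",
        (System.nanoTime() - start) / 1_000_000);
  }
}

One caveat with swapping in a HashSet is that iteration order is no longer
insertion order; that only matters if the scanner logic depends on the order
of the accumulated deletes.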
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.