[ https://issues.apache.org/jira/browse/HBASE-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-946:
------------------------

    Attachment: 946.patch

I want to apply this to branch and trunk (Running unit tests now)

> Row with 55k deletes timesout scanner lease
> -------------------------------------------
>
>                 Key: HBASE-946
>                 URL: https://issues.apache.org/jira/browse/HBASE-946
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.18.1, 0.19.0
>
>         Attachments: 946.patch
>
>
> Made a blocker because it was found by Jon Gray (smile)
> So, Jon Gray has a row with 55k deletes in it.  When he tries to scan, his 
> scanner times out when it reaches this row.  The root cause is the 
> mechanism we use to make sure a delete in a new store file overshadows an 
> entry at the same address in an older file.  We accumulate a List of all 
> deletes encountered, and before adding a delete to the List we check 
> whether it is already present.  That membership check is what's killing 
> us.  One issue is that it does a very inefficient check of whether the 
> table is the root table, but even after fixing that inefficiency -- and 
> then removing the root check entirely since it is redundant -- we are 
> still too slow.
> Chatting with Jim K, he pointed out that the ArrayList membership check is 
> linear.  Changing the aggregation of deletes to use a HashSet instead makes 
> everything run an order of magnitude faster (a sketch of the difference 
> follows the quoted description below).
> Also as part of this issue, we need to figure out why compaction is not 
> letting go of these deletes.
> Filing this issue against 0.18.1 so it gets into the RC2 (after chatting w/ 
> J-D and JK -- J-D is seeing the issue also).
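
Below is a minimal, self-contained sketch of the List-versus-Set membership
cost described above.  The String keys and the deleteKey() helper are
illustrative stand-ins, not the actual HBase delete-key type; the point is
only the difference in contains() cost while accumulating deletes.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteTrackingSketch {

  public static void main(String[] args) {
    int deleteCount = 55000;

    // List-based tracking: contains() walks the whole list, so accumulating
    // n deletes costs O(n^2) comparisons overall.
    List<String> deletesAsList = new ArrayList<String>();
    long start = System.nanoTime();
    for (int i = 0; i < deleteCount; i++) {
      String key = deleteKey(i);
      if (!deletesAsList.contains(key)) {   // linear scan on every add
        deletesAsList.add(key);
      }
    }
    long listNanos = System.nanoTime() - start;

    // Set-based tracking: contains() is a hash lookup, so the same
    // accumulation is roughly O(n) overall.
    Set<String> deletesAsSet = new HashSet<String>();
    start = System.nanoTime();
    for (int i = 0; i < deleteCount; i++) {
      deletesAsSet.add(deleteKey(i));       // add() already does the membership check
    }
    long setNanos = System.nanoTime() - start;

    System.out.printf("ArrayList: %d ms, HashSet: %d ms%n",
        listNanos / 1000000, setNanos / 1000000);
  }

  // Stand-in for the per-cell delete key used while scanning.
  private static String deleteKey(int i) {
    return "row/column:" + i + "/" + (1000000L + i);
  }
}
{code}

The swap only pays off if whatever class ends up holding the deletes
implements equals() and hashCode() consistently, so that is worth
double-checking in the patch.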

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
