[ 
https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906908#action_12906908
 ] 

HBase Review Board commented on HBASE-2964:
-------------------------------------------

Message from: "Todd Lipcon" <t...@cloudera.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/798/
-----------------------------------------------------------

Review request for hbase and stack.


Summary
-------

Moves all RPCs outside of the region writeLock. The writeLock is now held only 
long enough to set the 'closing' flag. Once we drop the lock, any waiters will 
see 'closing' upon acquiring the lock and throw NSRE 
(NotServingRegionException).

If we abort the split, the region is reopened as before. Accessors that got 
NSRE in the meantime will retry and eventually come back to the same region.
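
To make the locking change concrete, here is a minimal sketch of the pattern 
described above. It is not the code from the diff: RegionSketch, the metaEdit 
callback, and the local NotServingRegionException stand-in are hypothetical 
names used only for illustration, and the real SplitTransaction does 
considerably more work.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not the actual HBase classes.
class RegionSketch {
    /** Local stand-in for HBase's NotServingRegionException (NSRE). */
    static class NotServingRegionException extends RuntimeException {
        NotServingRegionException(String msg) { super(msg); }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean closing = false; // guarded by lock

    /** Request path: take the read lock, then check the flag. */
    String get(String row) {
        lock.readLock().lock();
        try {
            if (closing) {
                throw new NotServingRegionException("region is closing");
            }
            return "value-for-" + row;
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Split path: the write lock is held only while flipping the flag.
     * metaEdit stands in for the slow RPC to META and runs with no
     * region lock held.
     */
    boolean split(BooleanSupplier metaEdit) {
        setClosing(true);
        boolean ok = metaEdit.getAsBoolean();
        if (!ok) {
            setClosing(false); // abort: reopen the region as before
        }
        return ok;
    }

    private void setClosing(boolean value) {
        lock.writeLock().lock();
        try {
            closing = value;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        RegionSketch region = new RegionSketch();
        System.out.println(region.get("row1"));
        boolean ok = region.split(() -> false); // simulate a failed META edit
        System.out.println("split succeeded: " + ok);
        System.out.println(region.get("row1")); // served again after the abort
    }
}
```

Because handlers only ever wait for the short flag update, they are free to 
serve the split's own RPC even when it targets the same server.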


This addresses bug HBASE-2964.
    http://issues.apache.org/jira/browse/HBASE-2964


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 3507c0d 

Diff: http://review.cloudera.org/r/798/diff


Testing
-------

YCSB testing on my cluster. Before this change it deadlocked within an hour 
due to this bug. With the patch, I ran a 5-hour load test overnight and it 
worked OK.


Thanks,

Todd




> Deadlock when RS tries to RPC to itself inside SplitTransaction
> ---------------------------------------------------------------
>
>                 Key: HBASE-2964
>                 URL: https://issues.apache.org/jira/browse/HBASE-2964
>             Project: HBase
>          Issue Type: Bug
>          Components: ipc, regionserver
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>         Attachments: hbase-2964.txt
>
>
> In testing the 0.89.20100830 rc, I ran into a deadlock with the following 
> situation:
> - All of the IPC Handler threads are blocked on the region lock, which is 
> held by CompactSplitThread.
> - CompactSplitThread is in the process of trying to edit META to create the 
> offline parent. META happens to be on the same server that is executing the 
> split.
> Therefore, the CompactSplitThread is trying to connect back to itself, but 
> all of the handler threads are blocked, so the IPC is never served. Thus, 
> the entire RS gets deadlocked.
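
The situation quoted above can be reproduced in miniature. The following is a 
hedged sketch, not HBase code: a fixed thread pool stands in for the IPC 
handlers, a plain ReentrantLock stands in for the region lock, and 
SelfRpcDeadlockSketch and selfRpcCompletes are hypothetical names. A timeout 
is used so the demo reports the stall instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of handler exhaustion, not the actual HBase classes.
class SelfRpcDeadlockSketch {
    /** Returns true if the self-directed "RPC" is served within the timeout. */
    static boolean selfRpcCompletes(boolean holdLockDuringRpc, long timeoutMs)
            throws Exception {
        int handlerCount = 2;
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        ReentrantLock regionLock = new ReentrantLock();
        regionLock.lock(); // the "split" thread takes the region lock
        boolean locked = true;
        try {
            // Client requests occupy every handler and block on the lock.
            CountDownLatch started = new CountDownLatch(handlerCount);
            for (int i = 0; i < handlerCount; i++) {
                handlers.submit(() -> {
                    started.countDown();
                    regionLock.lockInterruptibly();
                    regionLock.unlock();
                    return null;
                });
            }
            started.await();

            if (!holdLockDuringRpc) {
                regionLock.unlock(); // the fix: drop the lock before the RPC
                locked = false;
            }

            // The "META edit" needs a free handler on this same server.
            Future<String> rpc = handlers.submit(() -> "meta edited");
            try {
                rpc.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // no handler was ever free to serve it
            }
        } finally {
            if (locked) {
                regionLock.unlock();
            }
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("lock held during RPC, served: "
                + selfRpcCompletes(true, 200));
        System.out.println("lock released before RPC, served: "
                + selfRpcCompletes(false, 2000));
    }
}
```

With the lock held, both handlers wait on the split thread while the split 
thread waits on a handler, which is the cycle described in the report.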

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
