[ 
https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917423#action_12917423
 ] 

Nicolas Spiegelberg commented on HBASE-3043:
--------------------------------------------

Pranav's comments: 
1) On WriteState variable privacy: 6 of one, half-dozen of the other.  I made 
sure the WriteState variable was package private.  I was considering adding 
more unit tests that deal with our write state, so I didn't want to write a 
bunch of accessors just for unit tests.  In the unit test case, we don't 
really need to worry about synchronization either.  My thought was to add 
accessor methods if we're going to use it outside of a unit test.  Okay?
2) The missing unlock() could actually have caused some extremely rare 
deadlocks, but only on exit, so probably no one has run across it.  Mainly I 
just wanted to fix poor practice.
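
For illustration only, the practice in question is the usual lock()/try/finally 
pattern.  This is a minimal sketch, not the patch itself: the class name, the 
static lock field (a stand-in for CompactSplitThread.lock), and the helper 
method are all hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

public class UnlockInFinally {
    // Hypothetical stand-in for CompactSplitThread.lock.
    static final ReentrantLock lock = new ReentrantLock();

    // Runs the given work while holding the lock. The finally block
    // guarantees the lock is released even if the work throws.
    static void doGuardedWork(Runnable work) {
        lock.lock();
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        try {
            doGuardedWork(() -> {
                throw new IllegalStateException("simulated failure");
            });
        } catch (IllegalStateException expected) {
            // The failure is expected; the point is what happens to the lock.
        }
        System.out.println("still locked after failure: " + lock.isLocked());
    }
}
```

Without the finally block, an exception thrown while the lock is held would 
leave it held, and any later lock() call from another thread would block.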

Stack's comment:
Your thought is correct.  However, I do need to make a small change that I had 
made internally but lost when I refactored.  This works because of some subtle 
interactions between server.stopRequested(), CompactSplitThread.lock, & 
HRegion.writeState.writesEnabled.  The states that can happen:
1) We get the lock & interrupt compactionQueue.poll().  poll() throws an 
InterruptedException, the handler calls continue, the next while() check 
fails, and the close finishes.
2) We get the lock & interrupt, but the thread is somewhere between the poll() 
and the lock().  [In new patch] CompactSplitThread.run() queries 
stopRequested() immediately after getting the lock(), which skips the 
compact/split code to return to the while() check and ...
3) We don't get the lock.  HRegionServer.run() calls closeAllRegions(), which 
calls HRegion.close(), which sets the writeState.  The compaction sees this 
and throws an InterruptedIOException, which aborts the current compaction, 
goes to the while() check in CompactSplitThread.run() and ...
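
A rough, self-contained sketch of how these three states interact.  This is not 
the HBase code: the class, the static fields, compact(), runLoop(), 
requestStop(), and demo() are hypothetical stand-ins for stopRequested(), 
CompactSplitThread.lock, writeState.writesEnabled, and the run()/close paths 
described above, and the stop path is simplified to clear writesEnabled 
unconditionally.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class CompactionStopSketch {
    static volatile boolean stopRequested;  // stands in for server.stopRequested()
    static volatile boolean writesEnabled;  // stands in for writeState.writesEnabled
    static final ReentrantLock lock = new ReentrantLock();  // CompactSplitThread.lock
    static final BlockingQueue<String> compactionQueue = new LinkedBlockingQueue<>();

    // Hypothetical long compaction (about 5s) that checks writesEnabled as it goes.
    static void compact(String region) throws InterruptedIOException {
        for (int i = 0; i < 1000; i++) {
            if (!writesEnabled) {
                throw new InterruptedIOException("aborted compaction of " + region);
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                throw new InterruptedIOException("interrupted compacting " + region);
            }
        }
    }

    // Simplified analogue of CompactSplitThread.run().
    static void runLoop() {
        while (!stopRequested) {
            String region;
            try {
                region = compactionQueue.poll(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                continue;  // state 1: interrupted in poll(), re-check the while()
            }
            if (region == null) {
                continue;
            }
            lock.lock();
            try {
                if (stopRequested) {
                    continue;  // state 2: stop arrived between poll() and lock()
                }
                compact(region);
            } catch (InterruptedIOException e) {
                // state 3: compaction aborted itself, fall through to the while()
            } finally {
                lock.unlock();
            }
        }
    }

    // Simplified analogue of the region server stop path.
    static void requestStop(Thread worker) {
        stopRequested = true;
        if (lock.tryLock()) {  // worker is not compacting: safe to interrupt
            try {
                worker.interrupt();
            } finally {
                lock.unlock();
            }
        }
        writesEnabled = false;  // what HRegion.close() does via closeAllRegions()
    }

    // Starts a compaction, requests a stop mid-way, reports whether the worker exited.
    static boolean demo() {
        stopRequested = false;
        writesEnabled = true;
        compactionQueue.clear();
        Thread worker = new Thread(CompactionStopSketch::runLoop, "compact-split");
        worker.start();
        compactionQueue.offer("region-a");
        try {
            Thread.sleep(50);  // let the compaction get under way
            requestStop(worker);
            worker.join(2000);
        } catch (InterruptedException e) {
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped promptly: " + demo());
    }
}
```

In this sketch the stop request usually lands in state 3, since the worker 
holds the lock while compacting; states 1 and 2 are reached when the stop 
arrives while the worker is idle in poll() or just past it.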

> 'hbase-daemon.sh stop regionserver' should kill compactions that are in 
> progress
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-3043
>                 URL: https://issues.apache.org/jira/browse/HBASE-3043
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.89.20100621, 0.90.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>             Fix For: 0.89.20100924, 0.90.0
>
>         Attachments: HBASE-3043_0.89.patch, HBASE-3043_0.90.patch
>
>
> During rolling restarts, we'll occasionally get into a situation with our 
> 100-node cluster where a RS stop takes 5-10 minutes.  The problem is that the 
> RS is undergoing a compaction and won't stop until it is complete.  In a stop 
> situation, it would be preferable to preempt the compaction, delete the 
> newly-created compaction file, and try again once the cluster is restarted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
