[
https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917423#action_12917423
]
Nicolas Spiegelberg commented on HBASE-3043:
--------------------------------------------
Pranav's comments:
1) On WriteState variable privacy: 6 of one, half-dozen of the other. I made
sure the WriteState variable was package private. I was considering adding
more unit tests around our write state, and didn't want to write a bunch of
accessors just for those tests. In the unit test case, we don't really need to
worry about synchronization either. My thought was to add accessor methods if
we end up using it outside of a unit test. Okay?
2) The lack of unlock() actually could have caused some extremely rare deadlock
conditions, but only on exit, so probably no one has run across it. Mainly I
wanted to fix poor practice.
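For reference, the unlock() fix in point 2 comes down to the usual try/finally
idiom for java.util.concurrent locks. A minimal sketch, assuming a
ReentrantLock; the names LockedSection and runLocked are hypothetical and are
not taken from the patch:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; not the code from the HBASE-3043 patch.
class LockedSection {
    private final ReentrantLock lock = new ReentrantLock();

    // Runs the work while holding the lock and always releases it,
    // even if the work throws.
    void runLocked(Runnable work) {
        lock.lock();
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }

    // True if any thread currently holds the lock.
    boolean isLocked() {
        return lock.isLocked();
    }
}
```

Without the finally block, an exception thrown by the work would leave the lock
held, and any later lock() call from another thread would block forever.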
Stack's comment:
Your thought is correct. However, I do need to make a small change that I had
made internally but lost when I refactored. This works because of some subtle
interactions between server.stopRequested(), CompactSplitThread.lock, &
HRegion.writeState.writesEnabled. States that can happen:
1) We get the lock & interrupt compactionQueue.poll(). It throws an
InterruptedException, whose handler calls continue, which fails the next
while() check, which finishes the close.
2) We get the lock & interrupt, but the thread is somewhere between the poll()
and the lock(). [In new patch] CompactSplitThread.run() queries
stopRequested() immediately after getting the lock(), which skips the
compact/split code to return to the while() check and ...
3) We don't get the lock. HRegionServer.run() calls closeAllRegions(), which
calls HRegion.close(), which sets the writeState. The compaction sees this and
throws an InterruptedIOException, which aborts the current compaction, goes to
the while() check in CompactSplitThread.run() and ...
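The three states above can be modeled in a small standalone sketch. This is a
simplified illustration only, not the patch itself: the class
CompactionWorkerSketch, its fields, and requestStop() are hypothetical
stand-ins for CompactSplitThread, server.stopRequested(),
HRegion.writeState.writesEnabled, and the HRegionServer shutdown path, and the
"compaction" is just a loop of short sleeps.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of the shutdown handshake; all names are hypothetical.
class CompactionWorkerSketch extends Thread {
    final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
    final ReentrantLock lock = new ReentrantLock();
    volatile boolean stopRequested = false;
    volatile boolean writesEnabled = true;
    volatile int completed = 0; // written only by the worker thread
    volatile int aborted = 0;   // written only by the worker thread

    @Override
    public void run() {
        while (!stopRequested) {
            Integer steps;
            try {
                steps = queue.poll(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                continue; // state 1: interrupted in poll(), re-check the loop
            }
            if (steps == null) {
                continue; // nothing queued yet
            }
            lock.lock();
            try {
                if (stopRequested) {
                    continue; // state 2: stop arrived between poll() and lock()
                }
                compact(steps);
                completed++;
            } catch (InterruptedIOException e) {
                aborted++; // state 3: the region closed mid-compaction
            } finally {
                lock.unlock();
            }
        }
    }

    // Stand-in for a compaction that checks the write state as it goes.
    void compact(int steps) throws InterruptedIOException {
        for (int i = 0; i < steps; i++) {
            if (!writesEnabled) {
                throw new InterruptedIOException("region closing");
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                // ignored in this model; writesEnabled decides when to abort
            }
        }
    }

    // Stopper side: interrupt only if the worker is not mid-compaction,
    // then disable writes (stand-in for closeAllRegions()) and wait.
    void requestStop() throws InterruptedException {
        stopRequested = true;
        if (lock.tryLock()) {
            try {
                interrupt();
            } finally {
                lock.unlock();
            }
        }
        writesEnabled = false;
        join();
    }
}
```

In this model, whichever state the worker is in when requestStop() is called,
it reaches the while() check with stopRequested set and exits without
finishing the queued compaction.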
> 'hbase-daemon.sh stop regionserver' should kill compactions that are in
> progress
> --------------------------------------------------------------------------------
>
> Key: HBASE-3043
> URL: https://issues.apache.org/jira/browse/HBASE-3043
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.89.20100621, 0.90.0
> Reporter: Nicolas Spiegelberg
> Assignee: Nicolas Spiegelberg
> Fix For: 0.89.20100924, 0.90.0
>
> Attachments: HBASE-3043_0.89.patch, HBASE-3043_0.90.patch
>
>
> During rolling restarts, we'll occasionally get into a situation with our
> 100-node cluster where a RS stop takes 5-10 minutes. The problem is that the
> RS is undergoing a compaction and won't stop until it is complete. In a stop
> situation, it would be preferable to preempt the compaction, delete the
> newly-created compaction file, and try again once the cluster is restarted.