-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/798/
-----------------------------------------------------------
(Updated 2010-09-07 13:38:39.968517)
Review request for hbase and stack.
Changes
-------
This version removes the setting of this.parent.lock from SplitTransaction
completely. It's not needed: the parent close, further down, takes out the
write lock itself.
In the past, we had a split lock and a close lock (splitLock and
splitsAndClosesLock). The split lock was held across the whole split: while
daughter regions were calculated, during the close, during the actual split,
and during the update of .META. As part of lock pruning, HBASE-2641 made the
error of using splitsAndClosesLock where splitLock had been used previously
(and even expanded the scope of what splitLock used to cover).
Looking at it, splitLock could have served some purpose in preventing two
threads from contending over a split (splits make objects in the filesystem and
move stuff around), but we don't really need this in current HBase since only
CompactSplitThread runs splits, even in the new master regime where a client
can call splitRegion. Later, when we want to run multiple concurrent split
transactions, we'll need to reexamine this.
Summary
-------
Moves all RPCs outside of the region writeLock. The writeLock is now only held
long enough to set the 'closing' flag. When we drop the lock, any waiters will
see 'closing' upon acquiring the lock, and thus throw NSRE
(NotServingRegionException).
In the case that we abort the split, it will reopen the region as before.
Accessors will have gotten NSRE but will just come back to the same region
eventually.
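For reviewers who want the locking pattern in isolation, here is a minimal
sketch of the idea described above. It is not the code in this patch:
RegionSketch, RegionClosingException, markClosing() and reopen() are
hypothetical names made up for illustration, and the exception is an unchecked
stand-in for NSRE only to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for NSRE; unchecked only to keep the sketch short. */
class RegionClosingException extends RuntimeException {
    RegionClosingException(String msg) { super(msg); }
}

/** Hypothetical, heavily simplified region. Not the real HRegion. */
class RegionSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private boolean closing = false; // guarded by lock

    /** Normal operations share the read lock and check the flag first. */
    String get(String key) {
        lock.readLock().lock();
        try {
            if (closing) throw new RegionClosingException("region is closing");
            return store.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(String key, String value) {
        lock.readLock().lock();
        try {
            if (closing) throw new RegionClosingException("region is closing");
            store.put(key, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Takes the write lock only long enough to flip the flag. Acquiring it
     * waits for in-flight readers; anyone who gets the lock afterwards sees
     * the flag and is rejected. Returns false if already closing.
     */
    boolean markClosing() {
        lock.writeLock().lock();
        try {
            if (closing) return false;
            closing = true;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Abort path: clear the flag so callers that retry succeed again. */
    void reopen() {
        lock.writeLock().lock();
        try {
            closing = false;
        } finally {
            lock.writeLock().unlock();
        }
    }
}

class ClosingFlagDemo {
    public static void main(String[] args) {
        RegionSketch region = new RegionSketch();
        region.put("row1", "v1");
        region.markClosing();
        // Slow work (RPCs, catalog updates) would run here with no lock held.
        try {
            region.get("row1");
        } catch (RegionClosingException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        region.reopen(); // simulate an aborted split
        System.out.println("after reopen: " + region.get("row1"));
    }
}
```

The point of the sketch is that nothing slow ever runs while the write lock is
held, so a stalled RPC cannot block readers that are queued on the lock; they
are turned away quickly and retry.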
This addresses bug HBASE-2964.
http://issues.apache.org/jira/browse/HBASE-2964
Diffs (updated)
-----
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java a692125
src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 3507c0d
src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java a245d97
Diff: http://review.cloudera.org/r/798/diff
Testing
-------
YCSB testing on my cluster: it used to deadlock due to this bug within an
hour. I ran a 5-hour load test overnight and it worked OK.
Thanks,
Todd