----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/179/#review216 -----------------------------------------------------------
Regarding rewriting recovered edits back to the WAL, I'm afraid we may have an issue with the following situation: - RS A writes edits 1-100 to log - RS A fails - RS B starts recovering log, rewrites edits 1-50 - RS B fails while recovering - RS C starts up. It *should* replay 1-100. Instead I fear it might only replay 1-50, or replay 1-100 followed by 1-50, either of which would be incorrect. We should add a test case where the RS fails in the middle of recovering a prior failure. Also, replayRecoveredEdits returns the highest seqid from the logs. Is it possible for storeFiles to have a higher seqid than what's in the log in any situation? We should add some kind of assert perhaps. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <http://review.hbase.org/r/179/#comment973> this returns -1 if there is nothing to replay. Shouldn't it return the maxSeqId + 1 perhaps? Do we have a test which makes a region, writes some data, closes the region, opens the region, writes some data, then recovers logs? src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <http://review.hbase.org/r/179/#comment972> we used to print the sequence ID after the increment, now we print before the increment. Maybe change log to "sequenceid for next edit will be NNN" src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <http://review.hbase.org/r/179/#comment975> add some javadoc - I think important to note that minSeqId is *exclusive* - it's a little unintuitive. Perhaps makes more sense to pass in maxSeqIdFromStoreFiles+1 here, and make minSeqId be "minimum sequence id, inclusive, to apply" src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <http://review.hbase.org/r/179/#comment976> should probably use LOG.error for these src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <http://review.hbase.org/r/179/#comment974> I agree, i don't think the splitter would ever write a partial one. We should treat an IOE here as a real error src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <http://review.hbase.org/r/179/#comment977> when would this happen? deserves a comment I think src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <http://review.hbase.org/r/179/#comment978> can we rename maxSeqIdInLog to currentEditSeqId or something? since that's what we're really talking about here. Also reversing this inequality to: if (currentEditSeqId < storeMaxSeqId>) I think it would read more clearly src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <http://review.hbase.org/r/179/#comment979> I think this will crash when the region contains stores that were the result of an HFOF bulk load (they have no sequence id) src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <http://review.hbase.org/r/179/#comment980> maybe rename to restoreEdit() or something to be clear that this is during restoration only? src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <http://review.hbase.org/r/179/#comment981> possibly more efficient: Collections.singletonMap(kv.getFamily(), Collections.singletonList(kv)) though there may be a comparator issue? src/main/java/org/apache/hadoop/hbase/regionserver/Store.java <http://review.hbase.org/r/179/#comment970> comment is now out of date, right? src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java <http://review.hbase.org/r/179/#comment971> maybe just Preconditions.checkArgument here so it throws if you try to pass a non-DFS? - Todd On 2010-06-14 11:43:30, stack wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://review.hbase.org/r/179/ > ----------------------------------------------------------- > > (Updated 2010-06-14 11:43:30) > > > Review request for hbase. > > > Summary > ------- > > + Moved replay of edits up from Store up into Region. This means we play the > edits once only rather than once per Store. > + Lots of cleanup in the replay of edits code. Uses the new flag introduced > by cosmin – hbase.skip.errors. If set, and exception processing edits, we'll > fail. If false, we'll move the broke edits file aside and keep going. Renamed > the method from doReconstructionLog to replayRecoveredEdits (reconstruction?). > + The main change in this patch is that recovering edits, we now go in via > the HRegion main API doing put and delete so that only one code path, so > replayed edits are added to the WAL, and so a flush will be triggered if we > fill memory. > + In HStore, lots of removed code and comments since no longer does log > replay. Cleanup of maximum seqid. Calculate it instead rather than save as a > data member. Its only used once on HRegion startup. > + Change the HRegion#initialize. It used to take 'initial files' which is a > notion never used (it was a means of putting files in place after a split but > split is done internal to HRegion so can do things in HRegions guts w/o need > of exposing notion of initial files). I removed it and added overload that > takes no args which is the usual way this method is invoked. > + Rename the product of splits, 'recovered.edits' instead of 'oldlogfile.log' > + Added small facility to HBaseTestingUtility for creating different user in > a Configuration so can have more than one Filesystem instance the easier. > + Redid the test TestStoreReconstruction as TestWALReplay. > > > This addresses bug hbase-1025. > http://issues.apache.org/jira/browse/hbase-1025 > > > Diffs > ----- > > src/main/java/org/apache/hadoop/hbase/HConstants.java f5d3e94 > src/main/java/org/apache/hadoop/hbase/HMerge.java 62f3561 > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 06e022c > src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java > bca819e > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 2a0dcee > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 05cf17f > src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java dbb21d4 > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 479c661 > src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 43a8a28 > > src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreReconstruction.java > 4f0417d > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java > 86cf4ea > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java > PRE-CREATION > > Diff: http://review.hbase.org/r/179/diff > > > Testing > ------- > > > Thanks, > > stack > >
