Re: Review Request: hbase-2437 log split code revamp by Cosmin

Todd Lipcon Wed, 26 May 2010 13:35:19 -0700


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1210
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1210>
> >
> >     having looked at this function a few times and been very confused, I'd 
> > love to see a javadoc that explains the entire process - eg what the 
> > "batching" does, what the output looks like, etc
> 
> Cosmin Lehene wrote:
>     the original version of this function determined me to start refactoring 
> in the first place. I'll add the description but if it's still confusing it 
> might need more work.


Oh, it's much much improved! Thanks! I still think a "high level overview" 
would be good to see.


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1237
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1237>
> >
> >     this conf is incorrectly named, right? it's not a number of threads, 
> > but rather a number of logs to read in in each round?
> 
> Cosmin Lehene wrote:
>     perhaps hbase.regionserver.hlog.splitlog.batch.size?

why .regionserver.? I'd say just hbase.hlog.split.batch.size or something


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1238
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1238>
> >
> >     this conf is very vaguely named
> 
> Cosmin Lehene wrote:
>     I know and tried to get a better name when created it. Can you suggest 
> something better? I can't figure a short descriptive enough name
>

if this applies only to hlog splitting, maybe hbase.hlog.split.skip.errors


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1525
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1525>
> >
> >     typo, and we should fix this before committing. we can use the 
> > org.apache.hadoop.util.VersionInfo class for this, or use reflection to 
> > look for the "hflush()" function
> 
> Cosmin Lehene wrote:
>     I'd rather have these differences dealt at the lowest level (writers) and 
> abstracted than spread across code.
>     What do you think?

Which do you mean by "writers" here? I'd support factoring this function out 
into an HdfsUtil class somewhere.


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1745
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1745>
> >
> >     yes, I think so. The RS could have crashed right after opening but 
> > before writing any data, and if the master failed to recover that, then 
> > we'd never recover that region. I say ignore with a WARN
> 
> Cosmin Lehene wrote:
>     more aspects here:
>     I think the reported size will be >0 after recover, even if file has no 
> records. I was asking if we should add logic to check if it's the last log. 
>     EOF for non zero length, non zero records file means file is corrupted.

I agree if it has no records (I think - do we syncfs after writing the 
sequencefile header?). But there's the case where inside SequenceFile we call 
create, but never actually write any bytes. This is still worth recovering.

In general I think a corrupt tail means we should drop that record (incomplete 
written record) but not shut down. This is only true if it's the tail record, 
though.


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1768
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1768>
> >
> >     I think we need to handle EOF specially here too, though OK to leave 
> > this for another JIRA. IIRC one of the FB guys opened this already
> 
> Cosmin Lehene wrote:
>     what's the other JIRA? see my above comments.

Can't find it now... does my above comment make sense?


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1811
> > <http://review.hbase.org/r/74/diff/1/?file=508#file508line1811>
> >
> >     no point to call .sync() here, it just wastes a bunch of IO to write 
> > "sync markers" which we don't make any real use of.
> 
> Cosmin Lehene wrote:
>     sync() used to call syncFs(). It looks like HBASE-2544 changed things a 
> bit, but it doesn't only add the SequenceFile sync marker.
>     
>     I added this after I've seen inconsistent results when running splitLog 
> on bigger hlogs. Try copying a log from the cluster locally and run splitLog 
> from the command line a few times without flushing it after each append. I 
> used to get inconsistent results between runs and calling sync fixed it.
>     
>     There's this "//TODO: test the split of a large (lots of regions > 500 
> file). In my tests it seems without hflush"  in the TestHLogSplit. 
>     
>     We could do some testing to figure out why would log entries be lost when 
> running locally.
>     
>     What would be a better way to flush the writer?

This seems really voodoo.. if anything we're probably masking a real bug by 
doing this. Can you write a unit test which shows this problem (even if it 
takes 30 minutes to run, would be good to have in our arsenal)


> On 2010-05-23 13:42:36, Todd Lipcon wrote:
> > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java, 
> > line 197
> > <http://review.hbase.org/r/74/diff/1/?file=509#file509line197>
> >
> >     this is the case I take issue with - trailing garbage should still 
> > proceed (with warnings) even if we've told it not to skip errors.
> 
> Cosmin Lehene wrote:
>     I'd like to be able to investigate the trailing garbage. I don't think 
> this should ever happen (do you see any scenarios?). If it did we might lose 
> data. We used to fix NameNode edits for fsImage by adding a missing byte to a 
> corrupted entry.
>     
>     I'd like to reflect more on this, maybe see other opinions.

The case where this happens is if you crash in the middle of appending a long 
edit. Consider the case where a single edit might have 1MB of data (large 
rows). We can easily crash in the middle of transferring it, before we call 
sync on the edit. In this case, the client never received an "ack" for the 
write, so we can feel free to throw it away (this isn't data loss, it's correct 
operation).


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/74/#review40
-----------------------------------------------------------


On 2010-05-21 12:11:59, stack wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/74/
> -----------------------------------------------------------
> 
> (Updated 2010-05-21 12:11:59)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This is a version of Cosmin's patch that applies to trunk doctored to work w/ 
> hadoop 0.20.
> 
> 
> This addresses bug hbase-2437.
>     http://issues.apache.org/jira/browse/hbase-2437
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e479eae 
>   src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java 
> PRE-CREATION 
> 
> Diff: http://review.hbase.org/r/74/diff
> 
> 
> Testing
> -------
> 
> All his new tests past.
> 
> 
> Thanks,
> 
> stack
> 
>

Re: Review Request: hbase-2437 log split code revamp by Cosmin

Reply via email to