Re: Review Request: hbase-2437 log split code revamp by Cosmin

Todd Lipcon Sun, 23 May 2010 13:43:05 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/74/#review40
-----------------------------------------------------------



Looks pretty good. If possible it would be awesome to move all of these log 
splitting methods into a new class, but we can do that in a followup.


src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment207>

    why not make these Class<? extends HLog.Reader>?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment208>

    can we use conf.getClass here? That way we'd have the appropriate generic 
types



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment209>

    having looked at this function a few times and been very confused, I'd love 
to see a javadoc that explains the entire process - eg what the "batching" 
does, what the output looks like, etc



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment211>

    this conf is incorrectly named, right? it's not a number of threads, but 
rather a number of logs to read in in each round?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment210>

    this conf is very vaguely named



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment212>

    should add e as a second parameter, so you get the full stack trace in this 
warning



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment213>

    typo, and we should fix this before committing. we can use the 
org.apache.hadoop.util.VersionInfo class for this, or use reflection to look 
for the "hflush()" function



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment214>

    javadoc



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment215>

    we should use a namingthreadfactory here. Consider:
    
    
http://guava-libraries.googlecode.com/svn-history/r13/trunk/javadoc/com/google/common/util/concurrent/NamingThreadFactory.html
    
    (I have some patches coming in which use guava, it's very handy)



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment216>

    we're calling execute(Thread) whereas execute takes a Runnable. By some 
quirk of java, Thread extends Runnable, but this is somewhat strange - we 
should make createNewSplitThread be createNewSplitter or createNewSplitRunnable



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment217>

    if j > 30 (or something), escalate log to info level. Also preferable to 
log the number of seconds, not the iteration.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment218>

    yes, I think so. The RS could have crashed right after opening but before 
writing any data, and if the master failed to recover that, then we'd never 
recover that region. I say ignore with a WARN



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment219>

    see above logic - writer could have crashed after writing only part of the 
sequencefile header, etc, so we should just warn and continue



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment220>

    I think we need to handle EOF specially here too, though OK to leave this 
for another JIRA. IIRC one of the FB guys opened this already



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment221>

    this thread name is not very descriptive



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment222>

    wouldn't it make more sense to just add them in the other order? LinkedList 
is doubly linked, so adding to the beginning or end are both constant time, and 
then we wouldn't have to do this contortion to iterate backwards here. 



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment223>

    we can move this out of the inner loop, right?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment224>

    add comment about what this is doing - presumably the point here is that 
some previous master may have started splitting the logs, then crashed in the 
middle.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment225>

    no point to call .sync() here, it just wastes a bunch of IO to write "sync 
markers" which we don't make any real use of.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment226>

    I'd think we need to fail here. This ties into the comment above about 
getting Futures out of the threadpool and checking any uncaught exceptions



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment227>

    javadoc



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment228>

    the .exists checks are redundant, since HDFS mkdirs acts like mkdir -p, 
we're just wasting RPCs



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/74/#comment229>

    this log message has never been english



src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
<http://review.hbase.org/r/74/#comment230>

    apache license



src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
<http://review.hbase.org/r/74/#comment231>

    this is the case I take issue with - trailing garbage should still proceed 
(with warnings) even if we've told it not to skip errors.


- Todd


On 2010-05-21 12:11:59, stack wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/74/
> -----------------------------------------------------------
> 
> (Updated 2010-05-21 12:11:59)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This is a version of Cosmin's patch that applies to trunk doctored to work w/ 
> hadoop 0.20.
> 
> 
> This addresses bug hbase-2437.
>     http://issues.apache.org/jira/browse/hbase-2437
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e479eae 
>   src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java 
> PRE-CREATION 
> 
> Diff: http://review.hbase.org/r/74/diff
> 
> 
> Testing
> -------
> 
> All his new tests past.
> 
> 
> Thanks,
> 
> stack
> 
>

Re: Review Request: hbase-2437 log split code revamp by Cosmin

Reply via email to