[ 
https://issues.apache.org/jira/browse/HBASE-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922381#action_12922381
 ] 

Hairong Kuang commented on HBASE-3124:
--------------------------------------

It seems to me this unit test has a race condition. 
ZombieNewLogWriterRegionServer creates a file after detecting that t1 split is 
done. However, there is no synchronization to guarantee that the created before 
logs are archived.

The logs also show the problem:
2010-10-18 13:58:46,249 INFO  [main] util.FSUtils(646): Finished lease recover 
attempt for hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.9
2010-10-18 13:58:46,255 DEBUG [main] wal.HLog(1574): Pushed=20 entries from 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.92010-10-18 13:58:46,256 
DEBUG [SplitWriter-0] wal.HLog$1(1597): Split writer thread for region bbb got 
10 to process
2010-10-18 13:58:46,256 DEBUG [SplitWriter-0] wal.HLog$1(1622): Split writer 
thread for region bbb Applied 10 total edits to bbb in 0ms
2010-10-18 13:58:46,256 DEBUG [SplitWriter-1] wal.HLog$1(1597): Split writer 
thread for region ccc got 10 to process
2010-10-18 13:58:46,257 DEBUG [SplitWriter-1] wal.HLog$1(1622): Split writer 
thread for region ccc Applied 10 total edits to ccc in 0ms2010-10-18 
13:58:46,267 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.0 to 
/hbase/hlog.old/hlog.dat.0
2010-10-18 13:58:46,269 INFO  [ZombieNewLogWriterRegionServer] 
wal.SequenceFileLogWriter(105): Using syncFs -- HDFS-200
2010-10-18 13:58:46,271 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.1 to 
/hbase/hlog.old/hlog.dat.1
2010-10-18 13:58:46,274 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.2 to 
/hbase/hlog.old/hlog.dat.2
2010-10-18 13:58:46,276 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.3 to 
/hbase/hlog.old/hlog.dat.3
2010-10-18 13:58:46,279 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.4 to 
/hbase/hlog.old/hlog.dat.4
2010-10-18 13:58:46,281 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.5 to 
/hbase/hlog.old/hlog.dat.5
2010-10-18 13:58:46,284 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.6 to 
/hbase/hlog.old/hlog.dat.6
2010-10-18 13:58:46,287 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.7 to 
/hbase/hlog.old/hlog.dat.7
2010-10-18 13:58:46,290 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.8 to 
/hbase/hlog.old/hlog.dat.8
2010-10-18 13:58:46,292 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.9 to 
/hbase/hlog.old/hlog.dat.9
Juliet file creator: created file /hbase/hlog/hlog.dat..juliet
2010-10-18 13:58:46,314 DEBUG [main] wal.HLog(1348): Closed 
/hbase/t1/1183284561/recovered.edits/0000000000000000001
2010-10-18 13:58:46,336 DEBUG [main] wal.HLog(1348): Closed 
/hbase/t1/1386060762/recovered.edits/0000000000000000001
2010-10-18 13:58:46,337 INFO  [main] wal.HLog(1203): Moving 
/hbase/hlog/hlog.dat..juliet to /hbase/hlog.old/hlog.dat..juliet
2010-10-18 13:58:46,339 DEBUG [main] wal.HLog(1207): Moved 1 log files to 
/hbase/hlog.old
2010-10-18 13:58:46,339 WARN  [main] hdfs.DFSClient(320): File /hbase/hlog is 
beng deleted only through Trash org.apache.hadoop.fs.FsShell.delete because all 
deletes must go through Trash.
Deleted /hbase/hlog
2010-10-18 13:58:46,343 INFO  [main] wal.HLog(1217): hlog file splitting 
completed in 184 millis for /hbase/hlog

> TestHLogSplit#testSplitWillNotTouchLogsIfNewHLogGets occsionally fails with 
> NPE
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-3124
>                 URL: https://issues.apache.org/jira/browse/HBASE-3124
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Hairong Kuang
>
> I am working on an HDFS side patch to reduce the cost of file recovery in 
> HLog#splitLog. when I run HBase unit test, I sometimes see 
> TestHLogSplit#testSplitWillNotTouchLogsIfNewHLogGets fail with the following 
> error.
> java.lang.NullPointerException
>       at 
> org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit.testSplitWillNotTouchLogsIfNewHLogGetsCreatedAfterSplitStarted(TestHLogSplit.java:462)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>       at 
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46)
>       at 
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>       at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>       at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>       at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>       at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to