[ https://issues.apache.org/jira/browse/HBASE-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922381#action_12922381 ]
Hairong Kuang commented on HBASE-3124: -------------------------------------- It seems to me this unit test has a race condition. ZombieNewLogWriterRegionServer creates a file after detecting that t1 split is done. However, there is no synchronization to guarantee that the created before logs are archived. The logs also show the problem: 2010-10-18 13:58:46,249 INFO [main] util.FSUtils(646): Finished lease recover attempt for hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.9 2010-10-18 13:58:46,255 DEBUG [main] wal.HLog(1574): Pushed=20 entries from hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.92010-10-18 13:58:46,256 DEBUG [SplitWriter-0] wal.HLog$1(1597): Split writer thread for region bbb got 10 to process 2010-10-18 13:58:46,256 DEBUG [SplitWriter-0] wal.HLog$1(1622): Split writer thread for region bbb Applied 10 total edits to bbb in 0ms 2010-10-18 13:58:46,256 DEBUG [SplitWriter-1] wal.HLog$1(1597): Split writer thread for region ccc got 10 to process 2010-10-18 13:58:46,257 DEBUG [SplitWriter-1] wal.HLog$1(1622): Split writer thread for region ccc Applied 10 total edits to ccc in 0ms2010-10-18 13:58:46,267 INFO [main] wal.HLog(1666): Archived processed log hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.0 to /hbase/hlog.old/hlog.dat.0 2010-10-18 13:58:46,269 INFO [ZombieNewLogWriterRegionServer] wal.SequenceFileLogWriter(105): Using syncFs -- HDFS-200 2010-10-18 13:58:46,271 INFO [main] wal.HLog(1666): Archived processed log hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.1 to /hbase/hlog.old/hlog.dat.1 2010-10-18 13:58:46,274 INFO [main] wal.HLog(1666): Archived processed log hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.2 to /hbase/hlog.old/hlog.dat.2 2010-10-18 13:58:46,276 INFO [main] wal.HLog(1666): Archived processed log hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.3 to /hbase/hlog.old/hlog.dat.3 2010-10-18 13:58:46,279 INFO [main] wal.HLog(1666): Archived processed log hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.4 to /hbase/hlog.old/hlog.dat.4 2010-10-18 13:58:46,281 INFO [main] wal.HLog(1666): Archived processed log hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.5 to /hbase/hlog.old/hlog.dat.5 2010-10-18 13:58:46,284 INFO [main] wal.HLog(1666): Archived processed log hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.6 to /hbase/hlog.old/hlog.dat.6 2010-10-18 13:58:46,287 INFO [main] wal.HLog(1666): Archived processed log hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.7 to /hbase/hlog.old/hlog.dat.7 2010-10-18 13:58:46,290 INFO [main] wal.HLog(1666): Archived processed log hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.8 to /hbase/hlog.old/hlog.dat.8 2010-10-18 13:58:46,292 INFO [main] wal.HLog(1666): Archived processed log hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.9 to /hbase/hlog.old/hlog.dat.9 Juliet file creator: created file /hbase/hlog/hlog.dat..juliet 2010-10-18 13:58:46,314 DEBUG [main] wal.HLog(1348): Closed /hbase/t1/1183284561/recovered.edits/0000000000000000001 2010-10-18 13:58:46,336 DEBUG [main] wal.HLog(1348): Closed /hbase/t1/1386060762/recovered.edits/0000000000000000001 2010-10-18 13:58:46,337 INFO [main] wal.HLog(1203): Moving /hbase/hlog/hlog.dat..juliet to /hbase/hlog.old/hlog.dat..juliet 2010-10-18 13:58:46,339 DEBUG [main] wal.HLog(1207): Moved 1 log files to /hbase/hlog.old 2010-10-18 13:58:46,339 WARN [main] hdfs.DFSClient(320): File /hbase/hlog is beng deleted only through Trash org.apache.hadoop.fs.FsShell.delete because all deletes must go through Trash. Deleted /hbase/hlog 2010-10-18 13:58:46,343 INFO [main] wal.HLog(1217): hlog file splitting completed in 184 millis for /hbase/hlog > TestHLogSplit#testSplitWillNotTouchLogsIfNewHLogGets occsionally fails with > NPE > ------------------------------------------------------------------------------- > > Key: HBASE-3124 > URL: https://issues.apache.org/jira/browse/HBASE-3124 > Project: HBase > Issue Type: Bug > Reporter: Hairong Kuang > > I am working on an HDFS side patch to reduce the cost of file recovery in > HLog#splitLog. when I run HBase unit test, I sometimes see > TestHLogSplit#testSplitWillNotTouchLogsIfNewHLogGets fail with the following > error. > java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit.testSplitWillNotTouchLogsIfNewHLogGetsCreatedAfterSplitStarted(TestHLogSplit.java:462) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) > at org.junit.runners.ParentRunner.run(ParentRunner.java:236) > at > org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46) > at > org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.