[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhihong Yu updated HBASE-5623:
------------------------------
    Hadoop Flags: Reviewed
          Status: Patch Available  (was: Open)

Patch makes sense.

> Race condition when rolling the HLog and hlogFlush
> --------------------------------------------------
>
>                 Key: HBASE-5623
>                 URL: https://issues.apache.org/jira/browse/HBASE-5623
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.0
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: HBASE-5623_v0.patch
>
>
> When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions:
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
>     at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
>     at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
>     at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
>     at $Proxy1.multi(Unknown Source)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
>     at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
> {code}
> and
> {code}
> java.lang.NullPointerException
>     at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
>     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
>     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
>     at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
>     at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
> {code}
> It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls
> {code}
> logSyncerThread.hlogFlush(this.writer);
> {code}
> without holding the updateLock.
> LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets its out field to null, we get the NPE.
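
The interleaving described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not HBase code and not the attached patch: `SketchWriter`, `SketchLog`, and every method name below are hypothetical stand-ins for `SequenceFile.Writer` and `HLog`. The sketch replays the two steps on a single thread so the outcome is deterministic: a syncer reads the writer reference without the lock, a roll closes that writer, and the later append hits the nulled field. It then shows one possible remedy, which is to read and use the writer while holding the same lock the roll takes.

```java
import java.util.ArrayList;
import java.util.List;

public class RollRaceSketch {

    /** Hypothetical stand-in for SequenceFile.Writer: close() nulls the output. */
    static class SketchWriter {
        private List<String> out = new ArrayList<>();

        void append(String entry) {
            out.add(entry); // throws NullPointerException once close() has run
        }

        int size() {
            return out.size();
        }

        void close() {
            out = null; // mirrors the "out field set to null" behaviour in the report
        }
    }

    /** Hypothetical stand-in for HLog, reduced to the writer and its lock. */
    static class SketchLog {
        private final Object updateLock = new Object();
        private SketchWriter writer = new SketchWriter();

        /** Swap in a new writer and close the old one, holding updateLock. */
        void rollWriter() {
            synchronized (updateLock) {
                SketchWriter old = writer;
                writer = new SketchWriter();
                old.close();
            }
        }

        /** Racy pattern: hands out the current writer without taking updateLock. */
        SketchWriter unsafeGetWriter() {
            return writer;
        }

        /** Assumed remedy: read and use the writer under the lock rollWriter() holds. */
        void lockedFlush(String entry) {
            synchronized (updateLock) {
                writer.append(entry);
            }
        }

        int currentSize() {
            synchronized (updateLock) {
                return writer.size();
            }
        }
    }

    /** Replays the bad interleaving: read reference, roll, then append. */
    static boolean staleFlushThrows() {
        SketchLog log = new SketchLog();
        SketchWriter stale = log.unsafeGetWriter(); // syncer reads this.writer
        log.rollWriter();                           // roller closes that writer
        try {
            stale.append("edit");                   // syncer appends to the closed writer
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    /** Same sequence with the locked flush: the entry lands in the new writer. */
    static boolean lockedFlushLandsInNewWriter() {
        SketchLog log = new SketchLog();
        log.rollWriter();
        log.lockedFlush("edit");
        return log.currentSize() == 1;
    }

    public static void main(String[] args) {
        System.out.println("stale writer throws NPE: " + staleFlushThrows());
        System.out.println("locked flush succeeds:   " + lockedFlushLandsInNewWriter());
    }
}
```

Holding the roll lock for the whole flush serializes syncers against rolls and may cost throughput under many handlers; a real fix might instead take the lock only on a retry path. That trade-off is outside what this sketch models.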