[ https://issues.apache.org/jira/browse/HBASE-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-2861:
-------------------------

    Priority: Critical  (was: Blocker)

Changed from Blocker to Critical. Shouldn't hold up 0.90. Will document this as a known issue with the HDFS-724 suggested workaround.

> regionserver's logsyncer thread hangs on DFSClient$DFSOutputStream.waitForAckedSeqno
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-2861
>                 URL: https://issues.apache.org/jira/browse/HBASE-2861
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: jstack.txt
>
>
> During loads into HBase, we are noticing that a RS is sometimes getting stuck.
> The logSyncer thread:
> {code}
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3367)
> - locked <0x00002aaac7fef748> (a java.util.LinkedList)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3301)
> at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:949)
> {code}
> A lot of other threads are stuck on:
> {code}
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.addToSyncQueue(HLog.java:916)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:936)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:828)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1657)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1425)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1393)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1665)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2326)
> {code}
> Subsequently, trying to disable the table, which in turn attempts to close the region(s), caused internalFlushcache() also to get stuck here:
> {code}
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:974)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:511)
> - locked <0x00002aaab76af670> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:463)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:1468)
> at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1329)
> {code}
> I'll attach the full jstack trace soon.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
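The stack traces above show a single-syncer pattern: writer threads park inside HLog.sync()/addToSyncQueue() on a condition variable until the dedicated logSyncer thread finishes an hflush, so one hung hflush (stuck in waitForAckedSeqno) blocks every writer. The following is a minimal hypothetical Java sketch of that blocking relationship, not HBase's actual HLog code; the class and method names are illustrative only.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the writer/syncer handoff: writers wait on a
// Condition until the syncer signals that the flush completed. If the
// syncer never returns from its flush, writers stay parked (here we use
// a timeout so the sketch can demonstrate the hang without hanging).
public class SyncQueueSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition syncDone = lock.newCondition();
    private boolean flushed = false;

    // Writer side: queue a sync request and wait for the syncer thread.
    // Returns false if the syncer did not complete within timeoutMs.
    public boolean addToSyncQueue(long timeoutMs) {
        lock.lock();
        try {
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (!flushed) {
                if (nanos <= 0L) {
                    return false; // syncer is stuck (e.g. in waitForAckedSeqno)
                }
                try {
                    nanos = syncDone.awaitNanos(nanos);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Syncer side: called after the (possibly hanging) flush returns,
    // releasing every parked writer at once.
    public void markFlushed() {
        lock.lock();
        try {
            flushed = true;
            syncDone.signalAll();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        SyncQueueSketch q = new SyncQueueSketch();
        // Simulate the bug: the flush never completes, so the writer
        // times out instead of returning.
        System.out.println("writer completed sync: " + q.addToSyncQueue(100));
        // Once the flush finally completes, waiters are released.
        q.markFlushed();
        System.out.println("writer completed sync: " + q.addToSyncQueue(100));
    }
}
```

This is also why the later HRegion.close() path deadlocks in the trace: internalFlushcache() needs progress from writers that are themselves parked behind the stuck syncer, so the write lock is never released.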