[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated HDFS-7999:
------------------------------------------
    Labels: 2.6.1-candidate  (was: )

> FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-7999
>                 URL: https://issues.apache.org/jira/browse/HDFS-7999
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: zhouyingchao
>            Assignee: zhouyingchao
>              Labels: 2.6.1-candidate
>             Fix For: 2.7.0
>
>         Attachments: HDFS-7999-001.patch, HDFS-7999-002.patch, HDFS-7999-003.patch
>
>
> I'm using 2.6.0 and noticed that the DN's heartbeats were sometimes delayed
> for a very long time, say more than 100 seconds. I took a jstack twice, and
> it looks like the heartbeat threads are all blocked (at getStorageReports) on
> the dataset lock. That lock is held by a thread that is calling
> createTemporary, which in turn is blocked waiting for an earlier incarnation
> of the writer to exit.
>
> The heartbeat thread stack:
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
>         - waiting to lock <0x00000007b01428c0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
>         - locked <0x00000007b0140ed0> (a java.lang.Object)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
>         at java.lang.Thread.run(Thread.java:662)
>
> The DataXceiver thread that holds the dataset lock:
> "DataXceiver for client at XXXXX" daemon prio=10 tid=0x00007f14041e6480 nid=0x52bc in Object.wait() [0x00007f11d78f7000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Thread.join(Thread.java:1194)
>         - locked <0x00000007a33b85d8> (a org.apache.hadoop.util.Daemon)
>         at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231)
>         - locked <0x00000007b01428c0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:179)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>         at java.lang.Thread.run(Thread.java:662)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
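
The two stack traces show one thread calling Thread.join while it still owns the dataset monitor, so every other thread that needs that monitor waits for as long as the old writer takes to exit. A minimal Java sketch of that pattern, next to one possible alternative (look up the old writer under the lock, join it with the lock released, then retry), might look like the following. All names here (DatasetLockSketch, slowWriter, measure, the 500 ms writer and 100 ms pause) are hypothetical stand-ins chosen for illustration; this is not the HDFS code and is not taken from the attached patches.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class DatasetLockSketch {
    // Hypothetical stand-in for the FsDatasetImpl monitor.
    private final Object datasetLock = new Object();
    private Thread writer; // guarded by datasetLock

    void setWriter(Thread t) {
        synchronized (datasetLock) { writer = t; }
    }

    // Pattern seen in the traces: the join happens while the lock is held.
    void createTemporaryHoldingLock() throws InterruptedException {
        synchronized (datasetLock) {
            stopWriter(writer);
            writer = Thread.currentThread();
        }
    }

    // Alternative sketch: read the writer under the lock, join outside it, retry.
    void createTemporaryReleasingLock() throws InterruptedException {
        while (true) {
            Thread old;
            synchronized (datasetLock) {
                old = writer;
                if (old == null || !old.isAlive() || old == Thread.currentThread()) {
                    writer = Thread.currentThread();
                    return;
                }
            }
            stopWriter(old); // dataset lock is not held here
        }
    }

    static void stopWriter(Thread t) throws InterruptedException {
        if (t != null && t != Thread.currentThread()) {
            t.interrupt();
            t.join();
        }
    }

    // Stand-in for the heartbeat path: how long does it wait for the lock?
    long heartbeatDelayMillis() {
        long start = System.nanoTime();
        synchronized (datasetLock) {
            // A real heartbeat would read storage usage here.
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // A writer that ignores interrupts and keeps running for `millis` ms.
    static Thread slowWriter(long millis) {
        Thread t = new Thread(() -> {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
            while (System.nanoTime() < deadline) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException ignored) {
                    // Deliberately slow to react, like a writer stuck in I/O.
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Runs one scenario and returns the heartbeat's wait for the lock in ms.
    static long measure(boolean holdLock) throws InterruptedException {
        DatasetLockSketch ds = new DatasetLockSketch();
        ds.setWriter(slowWriter(500));
        CountDownLatch started = new CountDownLatch(1);
        Thread xceiver = new Thread(() -> {
            try {
                started.countDown();
                if (holdLock) {
                    ds.createTemporaryHoldingLock();
                } else {
                    ds.createTemporaryReleasingLock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        xceiver.start();
        started.await();
        Thread.sleep(100); // let the xceiver reach its join
        long delay = ds.heartbeatDelayMillis();
        xceiver.join();
        return delay;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("join under lock, heartbeat waited ms: " + measure(true));
        System.out.println("join outside lock, heartbeat waited ms: " + measure(false));
    }
}
```

In the first variant the simulated heartbeat should wait for roughly the remaining lifetime of the old writer; in the second it should get the lock almost immediately. The retry loop is needed in the second variant because the replica state can change while the lock is released, so it has to be re-checked after the join.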