The scan required for each block report is a well-known issue, and it can
be fixed. It has been discussed multiple times (e.g.
https://issues.apache.org/jira/browse/HADOOP-3232?focusedCommentId=12587795#action_12587795).
Earlier, the inline 'du' on datanodes used to cause the same problem, and
it was moved to a separate thread (HADOOP-3232). Block reports can do the
same...
Though 2M blocks on a DN is very large, there is no reason block reports
should break things. Once we fix block reports, something else might
break... but that is a different issue.
Raghu.
Jason Venner wrote:
The problem we are having is that datanodes periodically stall for 10-15
minutes, drop off the active list, and then come back.
What is going on is that a long-running operation is holding the lock on
FSDataset.volumes, and all of the other block service requests stall
behind this lock.
"DataNode: [/data/dfs-video-18/dfs/data]" daemon prio=10 tid=0x4d7ad400
nid=0x7c40 runnable [0x4c698000..0x4c6990d0]
java.lang.Thread.State: RUNNABLE
at java.lang.String.lastIndexOf(String.java:1628)
at java.io.File.getName(File.java:399)
at
org.apache.hadoop.dfs.FSDataset$FSDir.getGenerationStampFromFile(FSDataset.java:148)
at
org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:181)
at
org.apache.hadoop.dfs.FSDataset$FSVolume.getBlockInfo(FSDataset.java:412)
at
org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getBlockInfo(FSDataset.java:511)
- locked <0x551e8d48> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
at org.apache.hadoop.dfs.FSDataset.getBlockReport(FSDataset.java:1053)
at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:708)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2890)
at java.lang.Thread.run(Thread.java:619)
This is basically taking a stat on every HDFS block on the datanode,
which in our case is ~2 million, and can take 10+ minutes (we may be
experiencing problems with our RAID controller but have no visibility
into it). At the OS level the file system seems fine and operations
eventually finish.
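For a rough sense of scale (the per-stat latency is an assumption, not a
measurement): at ~0.3 ms per stat, 2,000,000 stats works out to about 600
seconds, i.e. roughly ten minutes, which is in line with the stalls we see.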
It appears that a couple of different data structures are all being
guarded by the single lock on FSDataset$FSVolumeSet.
Then this happens:
"org.apache.hadoop.dfs.datanode$dataxcei...@1bcee17" daemon prio=10
tid=0x4da8d000 nid=0x7ae4 waiting for monitor entry
[0x459fe000..0x459ff0d0]
java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:473)
- waiting to lock <0x551e8d48> (a
org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:934)
- locked <0x54e550e0> (a org.apache.hadoop.dfs.FSDataset)
at
org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2322)
at
org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1187)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1045)
at java.lang.Thread.run(Thread.java:619)
This thread locks the FSDataset while waiting on the FSVolumeSet object,
and now all of the DataNode operations that synchronize on FSDataset stall
behind it.
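In reduced form the chain looks roughly like this (illustrative classes
only, not the real source):

    // Illustrative reduction of the two stack traces above, not real Hadoop code.
    class VolumeSet {
        // offerService(): holds this monitor for the whole multi-minute scan.
        synchronized void getBlockReport() { /* stats ~2M block files */ }
        // DataXceiver: blocks here until the scan finishes.
        synchronized void getNextVolume() { }
    }

    class Dataset {
        private final VolumeSet volumes = new VolumeSet();

        void blockReport() { volumes.getBlockReport(); }

        // Takes the Dataset monitor first, then waits on VolumeSet, so every
        // other operation that synchronizes on Dataset queues up behind it.
        synchronized void writeToBlock() { volumes.getNextVolume(); }
    }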
----------
Our particular installation doesn't use multiple directories for HDFS,
so a first simple hack for a local fix would be to modify getNextVolume
to just return the single volume and not be synchronized.
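For what it's worth, the shortcut I have in mind looks something like this
(illustrative names only, not a patch against the real FSDataset$FSVolumeSet):

    // Sketch of the single-volume shortcut; not a drop-in patch for FSDataset.
    import java.io.File;
    import java.io.IOException;

    class SingleVolumeSet {
        private final File volume;   // the one configured dfs.data.dir

        SingleVolumeSet(File volume) {
            this.volume = volume;
        }

        // Unsynchronized: with one volume there is no round-robin state to
        // protect, so writers never queue behind a long block report scan.
        File getNextVolume(long blockSize) throws IOException {
            if (volume.getUsableSpace() < blockSize) {
                throw new IOException("Insufficient space on " + volume);
            }
            return volume;
        }
    }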
A richer alternative would be to make the locking on
FSDataset$FSVolumeSet more fine-grained.
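As a sketch of that direction (again illustrative, and it assumes the
monitor mostly protects the round-robin index): keep the volume array
immutable and make only the index atomic, so getNextVolume never shares a
lock with the scan that builds the block report.

    // Sketch: only the round-robin index needs atomicity; the block report
    // scan would read the immutable volumes array with no lock at all.
    import java.io.File;
    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicInteger;

    class FinerGrainedVolumeSet {
        private final File[] volumes;                 // configured data directories
        private final AtomicInteger nextIndex = new AtomicInteger(0);

        FinerGrainedVolumeSet(File[] volumes) {
            this.volumes = volumes;
        }

        // Round-robin without a monitor: the atomic counter is the only shared state.
        File getNextVolume(long blockSize) throws IOException {
            for (int i = 0; i < volumes.length; i++) {
                int idx = (nextIndex.getAndIncrement() & 0x7fffffff) % volumes.length;
                if (volumes[idx].getUsableSpace() >= blockSize) {
                    return volumes[idx];
                }
            }
            throw new IOException("Insufficient space for an additional block");
        }
    }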
Of course, we are also trying to fix the file system performance and DFS
block-loading issues that result in the block report taking so long.
Any suggestions or warnings?
Thanks.