Re: 0.18.1 datanode psuedo deadlock problem

Konstantin Shvachko Fri, 09 Jan 2009 11:47:05 -0800

Hi Jason,

2 million blocks per data-node is not going to work.
There were discussions about it previously, please
check the mail archives.


This means you have a lot of very small files, which
HDFS is not designed to support. A general recommendation
is to group small files into large ones, introducing
some kind of record structure delimiting those small files,
and control it at the application level.
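
As a rough illustration of the grouping idea above, here is a minimal sketch that packs many small payloads into one container using length-prefixed records. The class name SmallFilePacker and the record layout (name, length, bytes) are assumptions made for this example and are not part of Hadoop. It works on in-memory byte arrays to stay self-contained; in practice the container would be written to a single HDFS file, and Hadoop's own SequenceFile is a commonly used container for this purpose.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: packs many small payloads into one
// container so that one large file is stored instead of many tiny ones.
final class SmallFilePacker {

    // Record layout (an assumption of this sketch):
    //   name (writeUTF) | payload length (int) | payload bytes
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> entry : files.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().length);
                out.write(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the records back in the order they were written.
    static Map<String, byte[]> unpack(byte[] container) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(container))) {
            while (in.available() > 0) {
                String name = in.readUTF();
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload);
                files.put(name, payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```

With a layout like this, a few thousand small files become one file of a few blocks, so the datanode has far fewer block files to track.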

Thanks,
--Konstantin


Jason Venner wrote:

The problem we are having is that datanodes periodically stall for 10-15 minutes and drop off the active list and then come back.
What is going on is that a long operation set is holding the lock on FSDataset.volumes, and all of the other block service requests stall behind this lock.
"DataNode: [/data/dfs-video-18/dfs/data]" daemon prio=10 tid=0x4d7ad400 nid=0x7c40 runnable [0x4c698000..0x4c6990d0]
  java.lang.Thread.State: RUNNABLE
   at java.lang.String.lastIndexOf(String.java:1628)
   at java.io.File.getName(File.java:399)
   at org.apache.hadoop.dfs.FSDataset$FSDir.getGenerationStampFromFile(FSDataset.java:148)
   at org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:181)
   at org.apache.hadoop.dfs.FSDataset$FSVolume.getBlockInfo(FSDataset.java:412)
   at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getBlockInfo(FSDataset.java:511)
   - locked <0x551e8d48> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
   at org.apache.hadoop.dfs.FSDataset.getBlockReport(FSDataset.java:1053)
   at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:708)
   at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2890)
   at java.lang.Thread.run(Thread.java:619)
This is basically taking a stat on every HDFS block on the datanode, which in our case is ~2 million, and can take 10+ minutes (we may be experiencing problems with our RAID controller but have no visibility into it). At the OS level the file system seems fine and operations eventually finish.
It appears that a couple of different data structures are being locked with the single object FSDataset$Volume.
Then this happens:
"org.apache.hadoop.dfs.datanode$dataxcei...@1bcee17" daemon prio=10 tid=0x4da8d000 nid=0x7ae4 waiting for monitor entry [0x459fe000..0x459ff0d0]
  java.lang.Thread.State: BLOCKED (on object monitor)
   at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:473)
   - waiting to lock <0x551e8d48> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
   at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:934)
   - locked <0x54e550e0> (a org.apache.hadoop.dfs.FSDataset)
   at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2322)
   at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1187)
   at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1045)
   at java.lang.Thread.run(Thread.java:619)

which locks the FSDataset while waiting on the volume object,
and now all of the datanode operations stall waiting on the FSDataset object.
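
The lock chain described above can be reproduced in isolation. The following is only a sketch: LockChainDemo, the thread names, and the two plain monitor objects are stand-ins invented for this example and are not Hadoop code. One thread holds the "volumes" monitor for a long time (the block report), a second takes the "dataset" monitor and then blocks on "volumes" (the writer), and a third blocks on "dataset" even though it never touches the volume set.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo, not Hadoop code: shows how a long hold on one monitor
// stalls threads that only need a different monitor.
final class LockChainDemo {

    // Returns the observed states of the writer and the bystander thread
    // while the simulated block report still holds the volumes monitor.
    static Thread.State[] observeStall() {
        final Object dataset = new Object(); // stands in for the FSDataset monitor
        final Object volumes = new Object(); // stands in for the FSVolumeSet monitor
        final CountDownLatch reportHoldsLock = new CountDownLatch(1);
        final CountDownLatch releaseReport = new CountDownLatch(1);

        Thread report = new Thread(() -> {
            synchronized (volumes) {
                reportHoldsLock.countDown();
                try {
                    releaseReport.await(); // simulates the long scan of every block
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "block-report");

        Thread writer = new Thread(() -> {
            synchronized (dataset) {
                synchronized (volumes) {
                    // a real writer would pick a volume here
                }
            }
        }, "writer");

        Thread bystander = new Thread(() -> {
            synchronized (dataset) {
                // any other dataset operation
            }
        }, "bystander");

        try {
            report.start();
            reportHoldsLock.await();
            writer.start();
            waitUntilBlocked(writer);    // holds dataset, waits for volumes
            bystander.start();
            waitUntilBlocked(bystander); // waits for dataset
            Thread.State[] states = { writer.getState(), bystander.getState() };
            releaseReport.countDown();   // let everything finish
            report.join();
            writer.join();
            bystander.join();
            return states;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void waitUntilBlocked(Thread t) throws InterruptedException {
        while (t.getState() != Thread.State.BLOCKED) {
            Thread.sleep(1);
        }
    }
}
```

Calling LockChainDemo.observeStall() should report both threads as BLOCKED until the simulated report releases its monitor, which mirrors the two thread dumps above.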
----------
Our particular installation doesn't use multiple directories for HDFS, so a first simple hack for a local fix would be to modify getNextVolume to just return the single volume and not be synchronized.
A richer alternative would be to make the locking more fine grained on FSDataset$FSVolumeSet.
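
One possible shape for finer-grained locking, purely as a sketch: hold the lock only long enough to copy the list of volumes, then do the slow per-volume scan on the copy without any lock held. The class VolumeSetSketch, its methods, and the use of strings as volumes are assumptions for illustration and do not reflect the actual FSDataset$FSVolumeSet code. The trade-off is that a scan may miss a volume added while it is running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

// Hypothetical stand-in for a volume set; names and structure are assumptions.
final class VolumeSetSketch {
    private final Object lock = new Object();
    private final List<String> volumes = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    void addVolume(String volume) {
        synchronized (lock) {
            volumes.add(volume);
        }
    }

    // Short critical section: only the copy happens under the lock.
    private List<String> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(volumes);
        }
    }

    // Round-robin choice that never waits behind a long scan.
    String getNextVolume() {
        List<String> current = snapshot();
        if (current.isEmpty()) {
            throw new IllegalStateException("no volumes configured");
        }
        int index = Math.floorMod(next.getAndIncrement(), current.size());
        return current.get(index);
    }

    // The slow work (for example, a stat of every block file) runs on the
    // snapshot with no lock held.
    int countBlocks(ToIntFunction<String> scanVolume) {
        int total = 0;
        for (String volume : snapshot()) {
            total += scanVolume.applyAsInt(volume);
        }
        return total;
    }
}
```

In this arrangement a writer calling getNextVolume waits only for the brief list copy, not for the whole block report.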
Of course we are also trying to fix the file system performance and dfs block loading that results in the block report taking a long time.
Any suggestions or warnings?

Thanks.
