[ https://issues.apache.org/jira/browse/HADOOP-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476697 ]
Hadoop QA commented on HADOOP-1046:
-----------------------------------
+1, because
http://issues.apache.org/jira/secure/attachment/12352260/fsdataset.patch
applied and successfully tested against trunk revision
http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512924. Results are at
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch
> Datanode should periodically clean up /tmp from partially received (and not
> completed) block files
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1046
> URL: https://issues.apache.org/jira/browse/HADOOP-1046
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.9.2, 0.12.0
> Environment: Cluster of 10 machines, running Hadoop 0.9.2 + Nutch
> Reporter: Andrzej Bialecki
> Fix For: 0.12.0
>
> Attachments: fsdataset.patch
>
>
> The cluster is set up with tasktrackers running on the same machines as
> the datanodes. Tasks create heavy load on local CPU, RAM, and disk I/O.
> In such situations I noticed a lot of the following messages from the
> datanodes:
> 2007-02-15 05:30:53,298 WARN dfs.DataNode - Failed to transfer
> blk_-4590782726923911824 to xxx.xxx.xxx/10.10.16.109:50010
> java.net.SocketException: Connection reset
> ....
> java.io.IOException: Block blk_71053993347675204 has already been started
> (though not completed), and thus cannot be created.
> My reading of the code in DataNode.DataXceiver.writeBlock() and
> FSDataset.writeToBlock() + FSDataset.java:459 suggests the following
> scenario: the temporary files in /tmp that are used to store incomplete
> blocks being transferred are never cleaned up. If the target datanode is
> CPU-starved and drops the connection while creating this temp file, the
> source datanode will attempt to transfer the block again - but a file
> under this name already exists in /tmp, because the target datanode did
> not clean it up when the connection was dropped.
> I also see that this section is unchanged in trunk/.
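> A minimal sketch of this failure mode (illustrative only - the class and
> method names below are hypothetical, not the actual trunk code):
>
>     import java.io.File;
>     import java.io.FileOutputStream;
>     import java.io.IOException;
>     import java.io.OutputStream;
>
>     // Models how a leftover temp file blocks the retried transfer.
>     class TmpBlockWriter {
>         private final File tmpDir;
>
>         TmpBlockWriter(File tmpDir) { this.tmpDir = tmpDir; }
>
>         OutputStream startBlock(long blockId) throws IOException {
>             File tmp = new File(tmpDir, "blk_" + blockId);
>             // If a previous transfer died mid-write, its temp file is
>             // still here, so the retry is rejected even though no write
>             // is actually in progress any more.
>             if (tmp.exists()) {
>                 throw new IOException("Block blk_" + blockId
>                     + " has already been started (though not completed), "
>                     + "and thus cannot be created.");
>             }
>             return new FileOutputStream(tmp);
>         }
>     }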
> The solution would be to check the age of the physical file in the /tmp
> dir, at FSDataset.java:436 - if it is older than a few hours or so, we
> should delete it and proceed as if there were no ongoing create op for
> this block.
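> Sketched below is what that check could look like (again hypothetical
> names, not the attached patch; the threshold value is an assumption,
> since the report only says "a few hours"):
>
>     import java.io.File;
>
>     // Reclaims a stale temp block file before rejecting a create.
>     class StaleTmpCleaner {
>         // "a few hours or so" per the report; 3h is an assumed value.
>         private static final long MAX_TMP_AGE_MS = 3L * 60 * 60 * 1000;
>
>         /** Returns true if the path is now free for a new create. */
>         static boolean reclaimIfStale(File tmp) {
>             if (!tmp.exists()) {
>                 return true;  // nothing in the way; proceed
>             }
>             long age = System.currentTimeMillis() - tmp.lastModified();
>             if (age > MAX_TMP_AGE_MS) {
>                 // Old enough that the original transfer must be dead:
>                 // delete and proceed as if no create op was ongoing.
>                 return tmp.delete();
>             }
>             return false;  // a recent transfer may still be writing
>         }
>     }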
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.