[ https://issues.apache.org/jira/browse/HADOOP-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549515 ]
Doug Cutting commented on HADOOP-2379:
--------------------------------------

This may be improved by HADOOP-2129, which improves the caching of FileStatus in DFS. You might try applying that patch to 0.14.3...

> Distcp setup is slow
> --------------------
>
>                 Key: HADOOP-2379
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2379
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.14.3
>         Environment: from 35 node cluster to 10 node cluster
>            Reporter: Johan Oskarsson
>            Priority: Minor
>
> When starting a distcp the setup phase often takes a very long time. For
> example during the distcp I just ran the setup phase took 15 minutes and the
> actual copy 3 minutes. Could this be improved? Or at least a progress bar
> added so the user doesn't think it stalled.
> I also often see exceptions like this in the setup, but the distcp finishes
> eventually.
> java.io.EOFException
>     at java.io.DataInputStream.readShort(DataInputStream.java:298)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1672)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1744)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
>     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
>     at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:774)
>     at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.setup(CopyFiles.java:351)
>     at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:773)
>     at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:854)
>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187)
>     at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:864)