[ https://issues.apache.org/jira/browse/MAPREDUCE-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728238#action_12728238 ]

Hudson commented on MAPREDUCE-646:
----------------------------------

Integrated in Hadoop-Mapreduce-trunk #15 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/15/])

> distcp should place the file distcp_src_files in distributed cache
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-646
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-646
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.21.0
>
>         Attachments: d_replica_srcfilelist.patch, d_replica_srcfilelist_v1.patch, d_replica_srcfilelist_v2.patch
>
>
> When large number of files are being copied by distcp, accessing distcp_src_files seems to be an issue, as all map tasks would be accessing this file. The error message seen is:
> 09/06/16 10:13:16 INFO mapred.JobClient: Task Id : attempt_200906040559_0110_m_003348_0, Status : FAILED
> java.io.IOException: Could not obtain block: blk_-4229860619941366534_1500174 file=/mapredsystem/hadoop/mapredsystem/distcp_7fiyvq/_distcp_src_files
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1757)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1585)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1712)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>         at org.apache.hadoop.tools.DistCp$CopyInputFormat.getRecordReader(DistCp.java:299)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:336)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> This could be because of HADOOP-6038 and/or HADOOP-4681.
> If distcp places this special file distcp_src_files in distributed cache, that could solve the problem.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.