[ https://issues.apache.org/jira/browse/HDFS-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909821#comment-16909821 ]
hemanthboyina commented on HDFS-14574:
--------------------------------------

[~jojochuang], DistCp already has the preserve-status flags (rbugpc..). If we add an option for replication, will it override the replication preserved by those flags? Any suggestions about this?

> [distcp] Add ability to increase the replication factor for fileList.seq
> ------------------------------------------------------------------------
>
>                 Key: HDFS-14574
>                 URL: https://issues.apache.org/jira/browse/HDFS-14574
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp
>            Reporter: Wei-Chiu Chuang
>            Assignee: hemanthboyina
>            Priority: Major
>
> distcp creates fileList.seq with default replication factor = 3.
> For large clusters running distcp jobs with thousands of mappers,
> 3 replicas of the file listing file are not enough, because DataNodes
> easily reach their maximum number of xceivers.
>
> It looks like we can pass in a distcp option and update the replication
> factor when creating the sequence file writer:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L517-L521]
>
> Like this:
> {code:java}
> return SequenceFile.createWriter(getConf(),
>     SequenceFile.Writer.file(pathToListFile),
>     SequenceFile.Writer.keyClass(Text.class),
>     SequenceFile.Writer.valueClass(CopyListingFileStatus.class),
>     SequenceFile.Writer.compression(SequenceFile.CompressionType.NONE),
>     SequenceFile.Writer.replication((short)100)); // <-- this line
> {code}
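
A minimal sketch of how a configurable replication factor for fileList.seq might be resolved before it is handed to the sequence file writer. This is not the actual patch: the key name `distcp.listing.replication` and the class `ListingReplication` are hypothetical, and a plain `Map` stands in for the Hadoop `Configuration` so the example runs without Hadoop on the classpath. The fallback of 3 is the default mentioned in the description.

```java
import java.util.HashMap;
import java.util.Map;

public class ListingReplication {
    // Hypothetical key name, used for illustration only; not an existing DistCp option.
    public static final String KEY = "distcp.listing.replication";

    // Default replication factor for fileList.seq, as stated in the issue description.
    public static final short DEFAULT_REPLICATION = 3;

    // Returns the replication factor to use for the file listing.
    // Falls back to the default when the value is missing, non-numeric, or < 1,
    // and caps the result at Short.MAX_VALUE because the writer takes a short.
    public static short resolve(Map<String, String> conf) {
        String raw = conf.get(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_REPLICATION;
        }
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_REPLICATION;
        }
        if (value < 1) {
            return DEFAULT_REPLICATION;
        }
        return (short) Math.min(value, Short.MAX_VALUE);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolve(conf));   // nothing configured: default is used
        conf.put(KEY, "100");
        System.out.println(resolve(conf));   // configured value is used
    }
}
```

In the writer call quoted above, the resolved value would replace the hard-coded `(short)100` passed to `SequenceFile.Writer.replication(...)`.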