[ https://issues.apache.org/jira/browse/HADOOP-12017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621944#comment-14621944 ]
Vinayakumar B commented on HADOOP-12017:
----------------------------------------

bq. I understand that you want some way to set the replication of the index files. But why the source file replication factor and the destination index file replication factor have to be the same?

In {{jobfs.setReplication(srcFiles, repl);}}, {{repl}} is used to set the replication of {{srcFiles}}. But {{srcFiles}} is not the actual source files that contain the data. It is just an intermediate list of file statuses, written as a SequenceFile, which is read to generate the MR job splits immediately after the file is created. First, HDFS will not have any time to replicate it. Second, there is no use in increasing its replication, since it is read by the same client, and only once, as part of split generation. Also, {{srcFiles}} is deleted once the job is done.

On the other hand, the actual data files, which are created by the mappers as part files, have the default replication. The proposed patch still does not change this. These need to be changed as well.

So, IMO, the user-specified replication should be used for the resultant archive (both content and indexes), not for the intermediate file.

Also, since the default replication of 10 is not really used, we can change the default to 3 and update the docs as well.

Any thoughts?

> Hadoop archives command should use configurable replication factor when closing
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-12017
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12017
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Zhe Zhang
>            Assignee: Bibin A Chundatt
>         Attachments: 0002-HADOOP-12017.patch, 0003-HADOOP-12017.patch, 0003-HADOOP-12017.patch, 0004-HADOOP-12017.patch
>
>
> {{HadoopArchives#HArchivesReducer#close}} uses hard-coded replication factor.
> It should use {{repl}} instead, which is parsed from command line parameters.
> {code}
>       // try increasing the replication
>       fs.setReplication(index, (short) 5);
>       fs.setReplication(masterIndex, (short) 5);
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
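
To make the suggestion in the comment concrete, here is a minimal sketch of applying one user-specified replication factor to every file of the resultant archive (indexes and part files) rather than to the intermediate {{srcFiles}} list. It is not taken from any of the attached patches. {{ReplicationSetter}} is a hypothetical stand-in for the single {{FileSystem#setReplication(Path, short)}} call, so the example runs without Hadoop on the classpath. The class name, the {{applyReplication}} helper, and the file names passed in {{main}} are illustrative assumptions. The default of 3 reflects the proposal above, not the current code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ArchiveReplicationSketch {

    /** Hypothetical stand-in for FileSystem#setReplication(Path, short). */
    interface ReplicationSetter {
        boolean setReplication(String path, short replication);
    }

    /** Default proposed in the comment above (assumption: not the current code). */
    static final short DEFAULT_REPL = 3;

    /**
     * Applies the same replication factor to every file of the finished
     * archive. The intermediate srcFiles list is deliberately not included,
     * since it is read once by the client and deleted after the job.
     */
    static void applyReplication(ReplicationSetter fs,
                                 List<String> archiveFiles,
                                 short repl) {
        for (String file : archiveFiles) {
            fs.setReplication(file, repl);
        }
    }

    public static void main(String[] args) {
        // Record the calls instead of talking to a real file system.
        Map<String, Short> recorded = new LinkedHashMap<>();
        ReplicationSetter fake = (path, r) -> {
            recorded.put(path, r);
            return true;
        };

        // Use the value from the command line if given, else the default.
        short repl = args.length > 0 ? Short.parseShort(args[0]) : DEFAULT_REPL;

        // Illustrative archive contents: both indexes plus one part file.
        applyReplication(fake,
                Arrays.asList("_index", "_masterindex", "part-0"), repl);
        System.out.println(recorded);
    }
}
```

In the real reducer and mapper code, the equivalent would be to pass the parsed {{repl}} through the job configuration and call {{setReplication}} on the index, master index, and part files once they are closed.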