"Full" replication is a good idea, but I suggest we file it as a new
bug/enhancement.
Actually placing a copy of a file on every node is probably rarely
the right thing to do for "full" replication. One copy per switch
would be my preferred default on our clusters (gigabit switches), and
for .JAR files sqrt(numNodes) is probably the right answer.
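Very roughly, the kind of heuristic I have in mind (all of these names
are made up for illustration; none of them are in the patch):

    // Hypothetical helpers sketching the defaults suggested above.
    public class ReplicationHeuristics {
      /** Widely-read .jar files: roughly sqrt(numNodes) copies. */
      public static int jarReplication(int numNodes) {
        return Math.max(1, (int) Math.ceil(Math.sqrt(numNodes)));
      }
      /** "Full" replication as one copy per switch, not per node. */
      public static int fullReplication(int numSwitches) {
        return Math.max(1, numSwitches);
      }
    }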
e14
On Apr 8, 2006, at 12:16 PM, Bryan Pendleton (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-51?page=comments#action_12373745 ]
Bryan Pendleton commented on HADOOP-51:
---------------------------------------
Great!
A few comments from reading the patch (I haven't tested it yet):
1) The <description> for dfs.replication.min is wrong
2) This is a wider coding-style concern: the idiom
conf.getType("config.value", defaultValue) is good for user-defined
values, but shouldn't the in-code default be skipped for things that
are already defined in hadoop-default.xml? Hard-coding a default takes
away the value of hadoop-default.xml, and it means that changing a
value there might or might not have the desired system-wide effect
(see the first sketch after this list).
3) Wouldn't it be better to log, at a severe level, replication
requests below minReplication or above maxReplication, and just clamp
the replication to the nearest bound? Replication is set per-file by
the application, while min and max are probably set by the
administrator of the hadoop cluster; throwing an IOException causes
failure where degraded performance would be preferable (see the second
sketch after this list).
4) I may be dense, but I didn't see any way to specify that
replication be "full", i.e., one copy per datanode. I got the feeling
this was something that was desired of this functionality (e.g., for
job.jar files, job configs, and lookup data used widely in a job).
Using a short means that, if we ever scale to more than 32k nodes,
there would be no way to specify this manually, and just using
Short.MAX_VALUE means getting a lot of errors about not being able to
replicate as fully as desired.
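To make (2) concrete, a minimal sketch of the two idioms (the property
name is just an example; both calls are on
org.apache.hadoop.conf.Configuration):

    import org.apache.hadoop.conf.Configuration;

    public class ConfigDefaultExample {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hard-coded default: if hadoop-default.xml says 1 and the code
        // says 3, editing hadoop-default.xml may not change behavior.
        int withDefault = conf.getInt("dfs.replication.min", 3);
        // Letting hadoop-default.xml supply the value keeps one source
        // of truth; a missing entry then fails loudly, not silently.
        int fromDefaults = Integer.parseInt(conf.get("dfs.replication.min"));
        System.out.println(withDefault + " " + fromDefaults);
      }
    }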
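And for (3), a minimal sketch of clamping instead of throwing
(minReplication/maxReplication are illustrative parameters here, and
java.util.logging stands in for whatever logger the code actually uses):

    import java.util.logging.Logger;

    class ReplicationClamp {
      private static final Logger LOG =
          Logger.getLogger(ReplicationClamp.class.getName());

      /** Clamp an out-of-range request to the nearest bound and log it,
       *  rather than failing the file creation with an IOException. */
      static short clamp(short requested, short minReplication,
                         short maxReplication) {
        if (requested < minReplication) {
          LOG.severe("Replication " + requested + " below minimum; using "
              + minReplication);
          return minReplication;
        }
        if (requested > maxReplication) {
          LOG.severe("Replication " + requested + " above maximum; using "
              + maxReplication);
          return maxReplication;
        }
        return requested;
      }
    }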
Otherwise, this looks like a wonderful patch!
per-file replication counts
---------------------------
Key: HADOOP-51
URL: http://issues.apache.org/jira/browse/HADOOP-51
Project: Hadoop
Type: New Feature
Components: dfs
Versions: 0.2
Reporter: Doug Cutting
Assignee: Konstantin Shvachko
Fix For: 0.2
Attachments: Replication.patch
It should be possible to specify different replication counts for
different files. Perhaps the desired replication count should be an
option when creating a new file. MapReduce should take advantage of
this feature so that job.xml and job.jar files, which are frequently
accessed by lots of machines, are more highly replicated than large
data files.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the
administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira