Why should this lead to an IOException?  Is it because the pruning of
replicas is asynchronous and the datanodes try to access nonexistent
files?  If so, that seems like a pretty major bug.


On Fri, Aug 16, 2013 at 5:21 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:

> Hi Eric,
>
> Yes, this is intentional.  The job.xml file and the job jar file get read
> from every node running a map or reduce task.  Because of this, using a
> higher than normal replication factor on these files improves locality.
>  More than 3 task slots will have access to local replicas.  These files
> tend to be much smaller than the actual data read by a job, so there tends
> to be little harm done in terms of disk space consumption.
>
> Why not create the file initially with 10 replicas instead of creating it
> with 3 and then dialing up?  I imagine this is so that job submission
> doesn't block on a synchronous write to a long pipeline.  The extra
> replicas aren't necessary for correctness, and a long-running job will get
> the locality benefits in the long term once more replicas are created in
> the background.
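>
> For comparison, here is a sketch of that alternative (hypothetical, not
> the actual code): passing the replication factor straight into create()
> would make every block write wait on the full pipeline:
>
>   FSDataOutputStream out = fs.create(splitFile,
>       new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION),
>       true,                                        // overwrite
>       job.getInt("io.file.buffer.size", 4096),     // buffer size
>       (short) job.getInt("mapred.submit.replication", 10),  // 10-node pipeline
>       fs.getDefaultBlockSize(),                    // default block size
>       null);                                       // no Progressable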
>
> I recommend submitting a new jira describing the problem that you saw.  We
> probably can handle this better, and a jira would be a good place to
> discuss the trade-offs.  A few possibilities (the second is sketched below):
>
> - Log a warning if mapred.submit.replication < dfs.replication.
> - Skip resetting replication if mapred.submit.replication <= dfs.replication.
> - Fail with an error if mapred.submit.replication < dfs.replication.
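>
> A minimal sketch of that second option (illustrative only; assumes a LOG
> field is available in the surrounding class):
>
>   int replication = job.getInt("mapred.submit.replication", 10);
>   short dfsDefault = fs.getDefaultReplication();
>   if (replication > dfsDefault) {
>     fs.setReplication(splitFile, (short) replication);
>   } else {
>     // leave the file at the DFS default; dialing replication *down* only
>     // forces the namenode to prune freshly written replicas
>     LOG.warn("mapred.submit.replication (" + replication
>         + ") <= dfs.replication (" + dfsDefault + "); not resetting.");
>   }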
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
> On Thu, Aug 15, 2013 at 6:21 AM, Sirianni, Eric <eric.siria...@netapp.com> wrote:
>
> > In debugging some replication issues in our HDFS environment, I noticed
> > that the MapReduce framework uses the following algorithm for setting the
> > replication on submitted job files:
> >
> > 1.     Create the file with *default* DFS replication factor (i.e.
> > 'dfs.replication')
> >
> > 2.     Subsequently alter the replication of the file based on the
> > 'mapred.submit.replication' config value
> >
> >   private static FSDataOutputStream createFile(FileSystem fs,
> >       Path splitFile, Configuration job) throws IOException {
> >     // Step 1: create the file with the default DFS replication factor
> >     FSDataOutputStream out = FileSystem.create(fs, splitFile,
> >         new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION));
> >     // Step 2: dial replication up (or down) to mapred.submit.replication
> >     int replication = job.getInt("mapred.submit.replication", 10);
> >     fs.setReplication(splitFile, (short) replication);
> >     writeSplitHeader(out);
> >     return out;
> >   }
> >
> > If I understand correctly, the net functional effect of this approach is
> > that:
> >
> > -       The initial write pipeline is setup with 'dfs.replication' nodes
> > (i.e. 3)
> >
> > -       The namenode triggers additional inter-datanode replications in
> > the background (as it detects the blocks as "under-replicated").
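> >
> > To illustrate the asynchrony (hypothetical snippet): setReplication()
> > only updates the target on the namenode and returns immediately.
> >
> >   fs.setReplication(splitFile, (short) 10);  // returns right away
> >   // reports the new *target* of 10 even though only 3 replicas exist yet;
> >   // the namenode schedules inter-datanode copies in the background
> >   short target = fs.getFileStatus(splitFile).getReplication();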
> >
> > I'm assuming this is intentional?  Alternatively, if
> > mapred.submit.replication were specified on the initial create, the write
> > pipeline would be significantly longer.
> >
> > The reason I noticed is that we had inadvertently specified
> > mapred.submit.replication as *less than* dfs.replication in our
> > configuration, which caused a bunch of excess replica pruning (and
> > ultimately IOExceptions in our datanode logs).
> >
> > Thanks,
> > Eric
> >
> >
>



-- 
Jay Vyas
http://jayunit100.blogspot.com
