Re: Data-local map tasks lower than Launched map tasks even with full replication

Todd Lipcon Fri, 17 Jul 2009 16:22:49 -0700

On Fri, Jul 17, 2009 at 4:16 PM, Seunghwa Kang <s.k...@gatech.edu> wrote:


> I checked with
>
> bin/hadoop fs -stat "%n %r" input/*
>
> part-00000 4
> part-00001 4
> part-00002 4
> part-00003 4
> part-00004 4
> part-00005 4
> part-00006 4
> part-00007 4
>
> and see replication factor is 4.
>
> Also, I set replication factor to 4 in hadoop-site.xml, run stop-all.sh
> and start-all.sh, re-load the data, and re-run the code but still
> getting the same result.
>
> I am searching for hadoop-default.xml and find
>
> <property>
> <name>dfs.balance.bandwidthPerSec</name>
> <value>1048576</value>
> <description>
> Specifies the maximum amount of bandwidth that each datanode
> can utilize for the balancing purpose in term of
> the number of bytes per second.
> </description>
> </property>
>
> 1048576 is 1 GB/s and seems like higher than 1 Gbit/s for my nodes. I am
> going to change this value and see what happens.


That's in bytes per second, so 1048576 is 1 MB/sec, not 1 GB/sec. If anything,
you may want to raise it on a small cluster. Also, this setting only throttles
the DFS balancer, not re-replication of under-replicated blocks, so it
shouldn't matter here.
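
For what it's worth, the unit arithmetic can be checked with plain shell
integer math. This is only a sketch: the variable names are made up for
illustration, and the 1 Gbit/s figure is just the link speed mentioned in the
quoted message.

```shell
# Default dfs.balance.bandwidthPerSec from hadoop-default.xml, in bytes per second
balance_bytes_per_sec=1048576

# Convert to MB/s (taking 1 MB = 1024 * 1024 bytes)
balance_mb_per_sec=$((balance_bytes_per_sec / 1024 / 1024))

# A 1 Gbit/s link in bytes per second (assumed link speed, 1 Gbit = 10^9 bits)
link_bytes_per_sec=$((1000000000 / 8))

# Roughly how many times the link rate exceeds the balancer cap (integer division)
link_to_cap_ratio=$((link_bytes_per_sec / balance_bytes_per_sec))

echo "balancer cap: ${balance_mb_per_sec} MB/s"
echo "1 Gbit/s link: ${link_bytes_per_sec} bytes/s, about ${link_to_cap_ratio}x the cap"
```

So the default cap sits far below what a gigabit link can carry, not above it.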

Run fsck and see what it reports for the count of under-replicated blocks.
Also, if you use the NN web UI to browse to one of these files, it should
tell you which nodes host its blocks.
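
Something along these lines should do it. Treat it as a sketch: HDFS_PATH is a
placeholder I made up, so substitute the absolute HDFS path of your input
directory, and the grep pattern assumes the summary line is labeled
"Under-replicated blocks", which may differ between versions.

```shell
# Placeholder (an assumption): set HDFS_PATH to the absolute HDFS path of your input.
HDFS_PATH="${HDFS_PATH:-/}"

# Only run the checks if the hadoop launcher is present in the current directory.
if [ -x bin/hadoop ]; then
  # Per-file block list, including the datanodes that hold each replica
  bin/hadoop fsck "$HDFS_PATH" -files -blocks -locations

  # Just the summary line with the under-replicated block count
  bin/hadoop fsck "$HDFS_PATH" | grep -i 'under-replicated'
else
  echo "bin/hadoop not found; run this from the Hadoop install directory" >&2
fi
```

If every block lists all four datanodes, replication is not what is holding
the data-local count down.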

Overall, I wouldn't worry about this on a small cluster unless you've seen
on your monitoring graphs that your network is getting saturated. You'll
probably be CPU bound before you're network bound unless you have VERY low
locality and very fast CPUs.

-Todd


>
> On Fri, 2009-07-17 at 16:07 -0700, Ted Dunning wrote:
> >
> > Does [hadoop fsck /] show any under-replicated files/blocks?  You
> > may not have waited long enough after increasing the target replication
> > factor.
> >
> > Another thing to watch out for in a production cluster is the
> > distribution of blocks across nodes.  You should be careful to load data
> > from outside the cluster to ensure random placement of file blocks.  That
> > is critical for getting good locality.  This obviously doesn't apply
> > to your situation with 4 replicas on 4 nodes.
> >
> > Todd's comment about -setrep is also very important to note.
> >
> > On Fri, Jul 17, 2009 at 3:57 PM, Seunghwa Kang <s.k...@gatech.edu>
> > wrote:
> >
> >         Just for test purpose, I increase the replication factor to 4,
> >         and check that input data actually has replication factor of 4
> >         with 'hadoop fs -stat %r%n' but find that the ratio is still
> >         around 80% for 4 nodes.
> >
> >
>
>
