Hi Harsh,

             Thanks for the reply. So if I have a 2048 MB file with 64 MB
block size (32 blocks) with replication 3, then I'll have 96 blocks of the
file on HDFS, with no two similar blocks being on the same datanode. Also,
if I change the dfs.replication property, does it effect files already in
HDFS or is it valid only for new files that will be uploaded into HDFS? Is
there a way to rebalance the cluster based on the new replication factor?

And if I have replication set to 3, do all the 3 disk writes happen
simultaneously or is there some background process which does the
replication? If not, then increasing replication would lead to more writes
and thus reduce performance of any write-intensive job, am I right?

Thanks,
Hari

On Thu, Nov 4, 2010 at 11:59 PM, Harsh J <qwertyman...@gmail.com> wrote:

> Hi,
>
> The following inline reply is from what I know so far.
>
> On Thu, Nov 4, 2010 at 11:24 PM, Hari Sreekumar
> <hsreeku...@clickable.com> wrote:
> > Hi,
> >
> >  I have some pretty basic stuff on replication that I am no very clear
> > about, even after reading the online docs..
> >
> > 1. My understanding is that replication factor of x means any block of
> data
> > in HDFS will be available, given enough time, at x different nodes. I
> have a
> > confusion whether it is x nodes or x locations or x disks? e.g, if I have
> > replication set to 3 on a single node setup with one physical disk, we'll
> > have the same data at 3 locations on the hard drive? What if I have 3
> disks
> > on the node?
>
> A "factor" of 3 means that a block of HDFS data (whose replication
> factor is also set to 3) will be available at 3 nodes. That means, if
> a file is worth 3 blocks, there will be 3x3=9 blocks available in your
> cluster totally for that one file.
>
> Replication is between DataNodes. DataNodes can use a single disk or
> multiple disks, but the concept of replication applies to Node-level,
> not disk-level.
>
> If am right, on a single node, if you have replication as 3, it will
> still create one block cause only one DataNode is available. Perhaps
> it'll create the replicas once new DataNodes are recognized by the
> cluster.
>
> >
> > 2. If I have a 3 node setup with replication set to 1, and I upload into
> it
> > a 3 GB file, it means that 1 GB of the file, approximately, will be
> > available on each node, right?
>
> If your block size is set to 1 GB, then yes.
>
> But know that if you are uploading into the HDFS from a node that has
> a DataNode service on it, it will contain all 3 blocks (an
> optimization). The best way to upload a file to HDFS would be to do so
> from a node that does not run a DataNode, as this will ensure a random
> distribution of blocks. (Correct me if am wrong).
>
> >
> > 3. If I run a mapreduce job on the 3 GB file above, will there be any
> data
> > transfer between the nodes for the map phase? The optimizer will try to
> > assign tasks in a way that each node uses the locally available data, so
> > each Node will run the map function based on locally available data,
> right?
> > In the reduce phase, of course, the map outputs will be shuffled.
>
> Data transfer will occur only for non-local and rack-local map tasks
> (which are usually run when TaskTracker slots are unavailable for maps
> to run locally on them).
>
> >
> > I am asking these questions because in a recent test Map-reduce job I ran
> on
> > a 2.13 GB file (3 node cluster), the job competed in 40 s with
> > replication=3, but took 1 min 45 s with replication=1. What could be the
> > reason for this? Can network latency be a reason? The job is simply an
> > aggregation where map returns IntWritable(1)s and reduce just sums it up.
>
> The time delay here could be related to several things. One could be
> that with replication=1, only one DataNode had your blocks (since you
> uploaded from a node running the service, as pointed above).
>
> But the best way to see why would be to see how many of your map tasks
> on the whole, were local. You can see this on the JobTracker Web UI as
> a counter (or on the CLI, after the job's succeeded).
>
> >
> > Thanks in advance,
> > Hari
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>



-- 
Hari Shankar Sreekumar
Software Development Engineer
Clickable Inc.

Ph: +91-9968307501
Skype: hari.clickable

Reply via email to