Hi Prashant

Others may correct me if I am wrong here..

The client (org.apache.hadoop.hdfs.DFSClient) has knowledge of the block
size and replication factor. In the source code, I see the following in the
DFSClient constructor:

    defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);

    defaultReplication = (short) conf.getInt("dfs.replication", 3);

My understanding is that the client resolves these values in the following
order of precedence:
1. Explicit values (the long-form constructor, when a user provides these
values)
2. Configuration file values (the cluster-level defaults: dfs.block.size
and dfs.replication)
3. Finally, the hardcoded fallbacks (DEFAULT_BLOCK_SIZE and 3)
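To make that chain concrete, here is a rough, self-contained sketch of the
resolution logic (plain Java; a Map stands in for
org.apache.hadoop.conf.Configuration, and resolveBlockSize is my own name
for illustration, not an actual DFSClient method):

```java
import java.util.Map;

// Illustrative sketch only -- mirrors the precedence described above,
// not the actual DFSClient code.
public class BlockSizeResolution {
    // Hardcoded fallback; 64 MB was the default in Hadoop of this era.
    static final long DEFAULT_BLOCK_SIZE = 64L * 1024 * 1024;

    // Steps 2 and 3: cluster config value if present, else the hardcoded default.
    static long defaultBlockSize(Map<String, String> conf) {
        String v = conf.get("dfs.block.size");
        return v != null ? Long.parseLong(v) : DEFAULT_BLOCK_SIZE;
    }

    static short defaultReplication(Map<String, String> conf) {
        String v = conf.get("dfs.replication");
        return v != null ? Short.parseShort(v) : 3;
    }

    // Step 1: a caller-supplied value (the long-form constructor case)
    // overrides everything else; 0 means "not specified" in this sketch.
    static long resolveBlockSize(long requested, Map<String, String> conf) {
        return requested > 0 ? requested : defaultBlockSize(conf);
    }
}
```

So an explicit value always wins, and the config file only matters when the
caller did not specify one.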

Moreover, in org.apache.hadoop.hdfs.protocol.ClientProtocol, the API to
create a file is

    void create(...., short replication, long blocksize);

I presume it means that the client already has knowledge of these values
and passes them to the NameNode when creating a new file.
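A tiny stub can show that flow: the client resolves the values on its side
and the NameNode just records what it is handed. (The interface below is a
cut-down stand-in for the real ClientProtocol, which takes more parameters;
RecordingNameNode is purely my invention for illustration.)

```java
// Cut-down stand-in for org.apache.hadoop.hdfs.protocol.ClientProtocol,
// keeping only the two parameters discussed above.
interface ClientProtocolSketch {
    void create(String src, short replication, long blockSize);
}

public class CreateFlow {
    // Fake NameNode that records what the client sent, so we can observe
    // that the values were chosen client-side.
    static class RecordingNameNode implements ClientProtocolSketch {
        short replication;
        long blockSize;

        public void create(String src, short replication, long blockSize) {
            this.replication = replication;
            this.blockSize = blockSize;
        }
    }

    // The client already knows the resolved values and simply passes them
    // along in the create() RPC.
    static RecordingNameNode createFile(String src, short repl, long blockSize) {
        RecordingNameNode nn = new RecordingNameNode();
        nn.create(src, repl, blockSize);
        return nn;
    }
}
```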

Hope that helps.

thanks
-Maneesh

On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi <prash1...@gmail.com> wrote:

> Thanks Maneesh.
>
> Quick question, does a client really need to know Block size and
> replication factor - A lot of times client has no control over these (set
> at cluster level)
>
> -Prashant Kommireddi
>
> On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges <dejan.men...@gmail.com
> >wrote:
>
> > Hi Maneesh,
> >
> > Thanks a lot for this! Just distributed it over the team and comments are
> > great :)
> >
> > Best regards,
> > Dejan
> >
> > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney <mvarsh...@gmail.com
> > >wrote:
> >
> > > For your reading pleasure!
> > >
> > > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):
> > >
> > >
> >
> https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > >
> > >
> > > Appreciate if you can spare some time to peruse this little experiment
> of
> > > mine to use Comics as a medium to explain computer science topics. This
> > > particular issue explains the protocols and internals of HDFS.
> > >
> > > I am eager to hear your opinions on the usefulness of this visual
> medium
> > to
> > > teach complex protocols and algorithms.
> > >
> > > [My personal motivations: I have always found text descriptions to be
> too
> > > verbose as lot of effort is spent putting the concepts in proper
> > time-space
> > > context (which can be easily avoided in a visual medium); sequence
> > diagrams
> > > are unwieldy for non-trivial protocols, and they do not explain
> concepts;
> > > and finally, animations/videos happen "too fast" and do not offer
> > > self-paced learning experience.]
> > >
> > > All forms of criticisms, comments (and encouragements) welcome :)
> > >
> > > Thanks
> > > Maneesh
> > >
> >
>
