Sure, it's just a question of how readers interpret it.

   1. The client is required to specify the block size and replication
   factor each time it creates a file
   2. The client does not need to worry about them, since an admin has set
   the properties in the default configuration files

A client would not be allowed to override the default configs if they are
set final (well, there are ways around that too, as you suggest, by using
the long form of create(....) :); a sketch follows.
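
For illustration, here is a rough sketch of that long-form create() call
(the path, the values, and the tiny write are all made up, and error
handling is omitted):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();  // loads core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);

    // Long form of create(): the caller passes replication and block size
    // explicitly, so the config-file defaults (final or not) never apply.
    FSDataOutputStream out = fs.create(
        new Path("/tmp/example.txt"),  // hypothetical path
        true,                          // overwrite if the file exists
        4096,                          // io buffer size
        (short) 2,                     // replication factor
        64L * 1024 * 1024);            // block size in bytes (64 MB)
    out.writeUTF("hello hdfs");
    out.close();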

The information is great and helpful. I just want to make sure a beginner
who wants to write a "WordCount" in MapReduce does not have to worry about
specifying block size and replication factor in his code.
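
For completeness, a rough sketch of the short form such beginner code would
use; block size and replication appear nowhere in it (the file name is made
up):

    // (same imports as the sketch above)
    // Short form: user code never mentions block size or replication.
    // DFSClient fills them in from dfs.block.size / dfs.replication,
    // or from its hardcoded defaults if the config is silent.
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/wordcount/input.txt"));
    out.writeUTF("hello world");
    out.close();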

Thanks,
Prashant

On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney <mvarsh...@gmail.com> wrote:

> Hi Prashant
>
> Others may correct me if I am wrong here..
>
> The client (org.apache.hadoop.hdfs.DFSClient) has knowledge of the block
> size and replication factor. In the source code, I see the following in
> the DFSClient constructor:
>
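>    // Read the cluster-level defaults from the configuration, falling
>    // back to hardcoded values when the config files do not set them: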
>    defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
>
>    defaultReplication = (short) conf.getInt("dfs.replication", 3);
>
> My understanding is that the client considers the following chain for the
> values:
> 1. Manual values (the long form constructor; when a user provides these
> values)
> 2. Configuration file values (these are cluster level defaults:
> dfs.block.size and dfs.replication)
> 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
>
> Moreover, in org.apache.hadoop.hdfs.protocol.ClientProtocol, the API to
> create a file is:
>
>    void create(...., short replication, long blocksize);
>
> I presume it means that the client already has knowledge of these values
> and passes them to the NameNode when creating a new file.
>
> Hope that helps.
>
> thanks
> -Maneesh
>
> On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi <prash1...@gmail.com> wrote:
>
> > Thanks Maneesh.
> >
> > Quick question: does a client really need to know the block size and
> > replication factor? A lot of the time the client has no control over
> > these (they are set at the cluster level).
> >
> > -Prashant Kommireddi
> >
> > On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges <dejan.men...@gmail.com> wrote:
> >
> > > Hi Maneesh,
> > >
> > > Thanks a lot for this! Just distributed it to the team, and the
> > > comments are great :)
> > >
> > > Best regards,
> > > Dejan
> > >
> > > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney <mvarsh...@gmail.com> wrote:
> > >
> > > > For your reading pleasure!
> > > >
> > > > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):
> > > >
> > > > https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > > >
> > > >
> > > > Appreciate if you can spare some time to peruse this little experiment
> > > > of mine to use Comics as a medium to explain computer science topics.
> > > > This particular issue explains the protocols and internals of HDFS.
> > > >
> > > > I am eager to hear your opinions on the usefulness of this visual
> > > > medium to teach complex protocols and algorithms.
> > > >
> > > > [My personal motivations: I have always found text descriptions to be
> > > > too verbose, as a lot of effort is spent putting the concepts in proper
> > > > time-space context (which can easily be avoided in a visual medium);
> > > > sequence diagrams are unwieldy for non-trivial protocols, and they do
> > > > not explain concepts; and finally, animations/videos happen "too fast"
> > > > and do not offer a self-paced learning experience.]
> > > >
> > > > All forms of criticism, comments (and encouragement) are welcome :)
> > > >
> > > > Thanks
> > > > Maneesh
> > > >
> > >
> >
>
