Maneesh,

Firstly, I love the comic :)

Secondly, I am inclined to agree with Prashant on this latest point. While one 
code path could take us through the user defining command-line overrides (e.g. 
hadoop fs -D blah -put foo bar), I think it might confuse a person new to 
Hadoop. The most common flow would be using admin-determined values from 
hdfs-site.xml, and the only thing that would need to change is that the 
conversation happens between client and server, not user and client.
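
A concrete form of such a per-command override might look like the following 
(the property names are the standard conf keys quoted later in this thread; 
the values are purely illustrative):

```shell
# Override the cluster defaults for this single upload only;
# 134217728 bytes = 128 MB blocks, 2 replicas (illustrative values).
hadoop fs -D dfs.block.size=134217728 -D dfs.replication=2 -put foo bar
```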

Matt

-----Original Message-----
From: Prashant Kommireddi [mailto:prash1...@gmail.com] 
Sent: Wednesday, November 30, 2011 3:28 PM
To: common-user@hadoop.apache.org
Subject: Re: HDFS Explained as Comics

Sure, it's just a case of how readers interpret it:

   1. The client is required to specify the block size and replication factor
   each time
   2. The client does not need to worry about them, since an admin has set
   the properties in the default configuration files

A client would not be allowed to override the default configs if they are
set final (well, there are ways around that too, as you suggest, by
using create(....) :)
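
For reference, pinning the values down looks roughly like this in
hdfs-site.xml (a sketch; the values are illustrative, and <final>true</final>
is what makes client-side overrides ineffective):

```xml
<!-- hdfs-site.xml: cluster-wide defaults set by the admin -->
<property>
  <name>dfs.block.size</name>
  <value>67108864</value>   <!-- 64 MB; illustrative -->
  <final>true</final>       <!-- reject client-side overrides -->
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <final>true</final>
</property>
```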

The information is great and helpful. I just want to make sure a beginner who
wants to write a "WordCount" in MapReduce does not worry about specifying
block size and replication factor in his code.

Thanks,
Prashant

On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney <mvarsh...@gmail.com>wrote:

> Hi Prashant
>
> Others may correct me if I am wrong here..
>
> The client (org.apache.hadoop.hdfs.DFSClient) knows the block size
> and replication factor. In the source code, I see the following in the
> DFSClient constructor:
>
>    defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
>
>    defaultReplication = (short) conf.getInt("dfs.replication", 3);
>
> My understanding is that the client considers the following chain for the
> values:
> 1. Manual values (the long form constructor; when a user provides these
> values)
> 2. Configuration file values (these are cluster level defaults:
> dfs.block.size and dfs.replication)
> 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
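
The three-step chain above can be sketched as follows (a hypothetical
stand-in, not the real org.apache.hadoop.conf.Configuration class; the class
name, method, and values are illustrative):

```java
import java.util.Properties;

// Sketch of the three-step lookup chain DFSClient effectively follows
// for dfs.block.size. Properties stands in for the loaded conf files.
public class BlockSizeLookup {
    // 3. Hardcoded fallback, mirroring DEFAULT_BLOCK_SIZE (64 MB here).
    static final long DEFAULT_BLOCK_SIZE = 64L * 1024 * 1024;

    // userValue: value from the long-form constructor/create() call, or null.
    // conf: stand-in for the cluster's hdfs-site.xml values.
    static long resolveBlockSize(Long userValue, Properties conf) {
        if (userValue != null) {
            return userValue;                 // 1. caller-supplied value wins
        }
        String fromConf = conf.getProperty("dfs.block.size");
        if (fromConf != null) {
            return Long.parseLong(fromConf);  // 2. cluster-level default
        }
        return DEFAULT_BLOCK_SIZE;            // 3. hardcoded default
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.block.size", "134217728"); // 128 MB in hdfs-site
        System.out.println(resolveBlockSize(null, conf));             // 134217728
        System.out.println(resolveBlockSize(33554432L, conf));        // 33554432
        System.out.println(resolveBlockSize(null, new Properties())); // 67108864
    }
}
```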
>
> Moreover, in the org.apache.hadoop.hdfs.protocol.ClientProtocol the API to
> create a file is
> void create(...., short replication, long blocksize);
>
> I presume it means that the client already has knowledge of these values
> and passes them to the NameNode when creating a new file.
>
> Hope that helps.
>
> thanks
> -Maneesh
>
> On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi <prash1...@gmail.com
> >wrote:
>
> > Thanks Maneesh.
> >
> > Quick question: does a client really need to know the block size and
> > replication factor? A lot of the time the client has no control over
> > these (they are set at the cluster level).
> >
> > -Prashant Kommireddi
> >
> > On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges <dejan.men...@gmail.com
> > >wrote:
> >
> > > Hi Maneesh,
> > >
> > > Thanks a lot for this! Just distributed it over the team and comments
> are
> > > great :)
> > >
> > > Best regards,
> > > Dejan
> > >
> > > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney <mvarsh...@gmail.com
> > > >wrote:
> > >
> > > > For your reading pleasure!
> > > >
> > > > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):
> > > >
> > > > https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > > >
> > > >
> > > > Appreciate if you can spare some time to peruse this little experiment
> > > > of mine to use Comics as a medium to explain computer science topics.
> > > > This particular issue explains the protocols and internals of HDFS.
> > > >
> > > > I am eager to hear your opinions on the usefulness of this visual
> > > > medium to teach complex protocols and algorithms.
> > > >
> > > > [My personal motivations: I have always found text descriptions to be
> > > > too verbose, as a lot of effort is spent putting the concepts in proper
> > > > time-space context (which can be easily avoided in a visual medium);
> > > > sequence diagrams are unwieldy for non-trivial protocols, and they do
> > > > not explain concepts; and finally, animations/videos happen "too fast"
> > > > and do not offer a self-paced learning experience.]
> > > >
> > > > All forms of criticisms, comments (and encouragements) welcome :)
> > > >
> > > > Thanks
> > > > Maneesh
> > > >
> > >
> >
>
