Hi Matthew

I agree with both you and Prashant. The strip needs to explain that these
are default values that can optionally be overridden; I will fix that in
the next iteration.

However, from the 'understanding concepts of HDFS' point of view, I still
think that block size and replication factor are the real strengths of
HDFS, and learners must be exposed to them so that they get to see how
HDFS is significantly different from conventional file systems.
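
To illustrate the "defaults that can be optionally overridden" idea, here
is a rough, self-contained Java sketch of the lookup order (explicit value,
then configuration value, then hardcoded fallback). It does not use the
Hadoop classes: a plain Map stands in for the configuration object, the
names SettingsChainSketch, resolveBlockSize and resolveReplication are made
up for illustration, and the 64 MB fallback is an assumed value. Please
treat it as an illustration only, not as DFSClient's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT Hadoop code: mimics the fallback chain only.
public class SettingsChainSketch {
    // Assumed hardcoded fallbacks (the 64 MB block size is an assumption).
    static final long DEFAULT_BLOCK_SIZE = 64L * 1024 * 1024;
    static final short DEFAULT_REPLICATION = 3;

    // 1. explicit value, 2. "dfs.block.size" from config, 3. hardcoded fallback
    static long resolveBlockSize(Long explicit, Map<String, String> conf) {
        if (explicit != null) {
            return explicit;
        }
        String configured = conf.get("dfs.block.size");
        return configured != null ? Long.parseLong(configured) : DEFAULT_BLOCK_SIZE;
    }

    // 1. explicit value, 2. "dfs.replication" from config, 3. hardcoded fallback
    static short resolveReplication(Short explicit, Map<String, String> conf) {
        if (explicit != null) {
            return explicit;
        }
        String configured = conf.get("dfs.replication");
        return configured != null ? Short.parseShort(configured) : DEFAULT_REPLICATION;
    }

    public static void main(String[] args) {
        // The map plays the role of admin-set values from the site config.
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.replication", "2");

        // No explicit value: the configured replication is used.
        System.out.println(resolveReplication(null, conf));
        // No explicit value and nothing configured: the hardcoded fallback is used.
        System.out.println(resolveBlockSize(null, conf));
        // An explicit value takes precedence over the configuration.
        System.out.println(resolveReplication((short) 1, conf));
    }
}
```

In the common case the caller passes nothing explicit, so the admin-set
values apply, which is the flow Matthew and Prashant describe.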

On a personal note: thanks for the first part of your message :)

-Maneesh


On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000) <
matthew.go...@monsanto.com> wrote:

> Maneesh,
>
> Firstly, I love the comic :)
>
> Secondly, I am inclined to agree with Prashant on this latest point. While
> one code path could take us through the user defining command line
> overrides (e.g. hadoop fs -D blah -put foo bar), I think it might confuse a
> person new to Hadoop. The most common flow would be using admin-determined
> values from hdfs-site, and the only thing that would need to change is that
> the conversation happens between client and server, not between user and
> client.
>
> Matt
>
> -----Original Message-----
> From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> Sent: Wednesday, November 30, 2011 3:28 PM
> To: common-user@hadoop.apache.org
> Subject: Re: HDFS Explained as Comics
>
> Sure, it's just a case of how readers interpret it.
>
>   1. Client is required to specify block size and replication factor each
>   time
>   2. Client does not need to worry about it since an admin has set the
>   properties in default configuration files
>
> A client may not be allowed to override the default configs if they are
> set final (well, there are ways around that too, as you suggest, by
> using create(....)) :)
>
> The information is great and helpful. Just want to make sure a beginner who
> wants to write a "WordCount" in MapReduce does not worry about specifying
> block size and replication factor in their code.
>
> Thanks,
> Prashant
>
> On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney <mvarsh...@gmail.com
> >wrote:
>
> > Hi Prashant
> >
> > Others may correct me if I am wrong here..
> >
> > The client (org.apache.hadoop.hdfs.DFSClient) has knowledge of block size
> > and replication factor. In the source code, I see the following in the
> > DFSClient constructor:
> >
> >    defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
> >
> >    defaultReplication = (short) conf.getInt("dfs.replication", 3);
> >
> > My understanding is that the client considers the following chain for the
> > values:
> > 1. Manual values (the long form constructor; when a user provides these
> > values)
> > 2. Configuration file values (these are cluster level defaults:
> > dfs.block.size and dfs.replication)
> > 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
> >
> > Moreover, in org.apache.hadoop.hdfs.protocol.ClientProtocol the API to
> > create a file is
> > void create(...., short replication, long blocksize);
> >
> > I presume it means that the client already has knowledge of these values
> > and passes them to the NameNode when creating a new file.
> >
> > Hope that helps.
> >
> > thanks
> > -Maneesh
> >
> > On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi <
> prash1...@gmail.com
> > >wrote:
> >
> > > Thanks Maneesh.
> > >
> > > Quick question: does a client really need to know block size and
> > > replication factor? A lot of the time the client has no control over
> > > these (they are set at cluster level).
> > >
> > > -Prashant Kommireddi
> > >
> > > On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges <dejan.men...@gmail.com
> > > >wrote:
> > >
> > > > Hi Maneesh,
> > > >
> > > > Thanks a lot for this! Just distributed it over the team and comments
> > > > are great :)
> > > >
> > > > Best regards,
> > > > Dejan
> > > >
> > > > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney <
> mvarsh...@gmail.com
> > > > >wrote:
> > > >
> > > > > For your reading pleasure!
> > > > >
> > > > > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):
> > > > >
> > > > > https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > > > >
> > > > >
> > > > > I would appreciate it if you could spare some time to peruse this
> > > > > little experiment of mine to use comics as a medium to explain
> > > > > computer science topics. This particular issue explains the protocols
> > > > > and internals of HDFS.
> > > > >
> > > > > I am eager to hear your opinions on the usefulness of this visual
> > > > > medium to teach complex protocols and algorithms.
> > > > >
> > > > > [My personal motivations: I have always found text descriptions to be
> > > > > too verbose, as a lot of effort is spent putting the concepts in proper
> > > > > time-space context (which can be easily avoided in a visual medium);
> > > > > sequence diagrams are unwieldy for non-trivial protocols, and they do
> > > > > not explain concepts; and finally, animations/videos happen "too fast"
> > > > > and do not offer a self-paced learning experience.]
> > > > >
> > > > > All forms of criticisms, comments (and encouragements) welcome :)
> > > > >
> > > > > Thanks
> > > > > Maneesh
> > > > >
> > > >
> > >
> >