Re: HDFS Explained as Comics

2011-12-01 Thread maneesh varshney
Hi Dieter

> Very clear.  The comic format works indeed quite well.
> I never considered comics as a serious ("professional") way to get
> something explained efficiently,
> but this shows people should think twice before they start writing their
> next documentation.
>

Thanks! :)


> one question though: if a DN has a corrupted block, why does the NN only
> remove the bad DN from the block's list, and not the block from the DN list?
>

You are right. This needs to be fixed.


> (also, does it really store the data in 2 separate tables?  This looks to
> me like 2 different views of the same data?)


Actually it's more than two tables... I have personally found the data
structures rather contrived.

In the org.apache.hadoop.hdfs.server.namenode package, information is kept
in multiple places:
- INodeFile, which has a list of blocks for a given file
- FSNamesystem, which has a map of block -> {inode, datanodes}
- BlockInfo, which stores information in a rather strange manner:

/**
 * This array contains triplets of references.
 * For each i-th data-node the block belongs to
 * triplets[3*i] is the reference to the DatanodeDescriptor
 * and triplets[3*i+1] and triplets[3*i+2] are references
 * to the previous and the next blocks, respectively, in the
 * list of blocks belonging to this data-node.
 */
private Object[] triplets;

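To make that layout concrete, here is a sketch of how such an array is indexed
(the accessor names mirror the comment above but are illustrative assumptions,
not necessarily the exact BlockInfo API):

// Illustrative accessors for the triplets layout (names are assumptions):
DatanodeDescriptor getDatanode(int i) {
  return (DatanodeDescriptor) triplets[3*i];   // node holding the i-th replica
}

BlockInfo getPrevious(int i) {
  return (BlockInfo) triplets[3*i + 1];        // previous block on that node
}

BlockInfo getNext(int i) {
  return (BlockInfo) triplets[3*i + 2];        // next block on that node
}

In effect, each block is a node in one doubly-linked list per datanode that
holds a replica of it, which is what makes the structure feel contrived.
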
> On Thu, 1 Dec 2011 08:53:31 +0100
> "Alexander C.H. Lorenz"  wrote:
>
> > Hi all,
> >
> > very cool comic!
> >
> > Thanks,
> >  Alex
> >
> > On Wed, Nov 30, 2011 at 11:58 PM, Abhishek Pratap Singh
> >  > > wrote:
> >
> > > Hi,
> > >
> > > This is indeed a good way to explain, most of the improvement has
> > > already been discussed. waiting for sequel of this comic.
> > >
> > > Regards,
> > > Abhishek
> > >
> > > On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney
> > >  > > >wrote:
> > >
> > > > Hi Matthew
> > > >
> > > > I agree with both you and Prashant. The strip needs to be
> > > > modified to explain that these can be default values that can be
> > > > optionally
> > > overridden
> > > > (which I will fix in the next iteration).
> > > >
> > > > However, from the 'understanding concepts of HDFS' point of view,
> > > > I still think that block size and replication factors are the
> > > > real strengths of HDFS, and the learners must be exposed to them
> > > > so that they get to see
> > > how
> > > > hdfs is significantly different from conventional file systems.
> > > >
> > > > On personal note: thanks for the first part of your message :)
> > > >
> > > > -Maneesh
> > > >
> > > >
> > > > On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000) <
> > > > matthew.go...@monsanto.com> wrote:
> > > >
> > > > > Maneesh,
> > > > >
> > > > > Firstly, I love the comic :)
> > > > >
> > > > > Secondly, I am inclined to agree with Prashant on this latest
> > > > > point.
> > > > While
> > > > > one code path could take us through the user defining command
> > > > > line overrides (e.g. hadoop fs -D blah -put foo bar) I think it
> > > > > might
> > > confuse
> > > > a
> > > > > person new to Hadoop. The most common flow would be using admin
> > > > determined
> > > > > values from hdfs-site and the only thing that would need to
> > > > > change is
> > > > that
> > > > > conversation happening between client / server and not user /
> > > > > client.
> > > > >
> > > > > Matt
> > > > >
> > > > > -Original Message-
> > > > > From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> > > > > Sent: Wednesday, November 30, 2011 3:28 PM
> > > > > To: common-user@hadoop.apache.org
> > > > > Subject: Re: HDFS Explained as Comics
> > > > >
> > > > > Sure, its just a case of how readers interpret it.
> > > > >
> > > > >   1. Client is required to specify block size and replication
> > > > > factor
> > > each
> > > > >   time
> > > > >   2. Client does not need to worry about it since an admin has
> > > > > set the properties in default configuration files

RE: HDFS Explained as Comics

2011-12-01 Thread Ravi teja ch n v

That's indeed a great piece of work, Maneesh... Waiting for the MapReduce comic :)

Regards,
Ravi Teja

From: Dieter Plaetinck [dieter.plaeti...@intec.ugent.be]
Sent: 01 December 2011 15:11:36
To: common-user@hadoop.apache.org
Subject: Re: HDFS Explained as Comics

Very clear.  The comic format works indeed quite well.
I never considered comics as a serious ("professional") way to get something 
explained efficiently,
but this shows people should think twice before they start writing their next 
documentation.

one question though: if a DN has a corrupted block, why does the NN only remove 
the bad DN from the block's list, and not the block from the DN list?
(also, does it really store the data in 2 separate tables?  This looks to me 
like 2 different views of the same data?)

Dieter

On Thu, 1 Dec 2011 08:53:31 +0100
"Alexander C.H. Lorenz"  wrote:

> Hi all,
>
> very cool comic!
>
> Thanks,
>  Alex
>
> On Wed, Nov 30, 2011 at 11:58 PM, Abhishek Pratap Singh
>  > wrote:
>
> > Hi,
> >
> > This is indeed a good way to explain, most of the improvement has
> > already been discussed. waiting for sequel of this comic.
> >
> > Regards,
> > Abhishek
> >
> > On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney
> >  > >wrote:
> >
> > > Hi Matthew
> > >
> > > I agree with both you and Prashant. The strip needs to be
> > > modified to explain that these can be default values that can be
> > > optionally
> > overridden
> > > (which I will fix in the next iteration).
> > >
> > > However, from the 'understanding concepts of HDFS' point of view,
> > > I still think that block size and replication factors are the
> > > real strengths of HDFS, and the learners must be exposed to them
> > > so that they get to see
> > how
> > > hdfs is significantly different from conventional file systems.
> > >
> > > On personal note: thanks for the first part of your message :)
> > >
> > > -Maneesh
> > >
> > >
> > > On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000) <
> > > matthew.go...@monsanto.com> wrote:
> > >
> > > > Maneesh,
> > > >
> > > > Firstly, I love the comic :)
> > > >
> > > > Secondly, I am inclined to agree with Prashant on this latest
> > > > point.
> > > While
> > > > one code path could take us through the user defining command
> > > > line overrides (e.g. hadoop fs -D blah -put foo bar) I think it
> > > > might
> > confuse
> > > a
> > > > person new to Hadoop. The most common flow would be using admin
> > > determined
> > > > values from hdfs-site and the only thing that would need to
> > > > change is
> > > that
> > > > conversation happening between client / server and not user /
> > > > client.
> > > >
> > > > Matt
> > > >
> > > > -Original Message-
> > > > From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> > > > Sent: Wednesday, November 30, 2011 3:28 PM
> > > > To: common-user@hadoop.apache.org
> > > > Subject: Re: HDFS Explained as Comics
> > > >
> > > > Sure, its just a case of how readers interpret it.
> > > >
> > > >   1. Client is required to specify block size and replication
> > > > factor
> > each
> > > >   time
> > > >   2. Client does not need to worry about it since an admin has
> > > > set the properties in default configuration files
> > > >
> > > > A client could not be allowed to override the default configs
> > > > if they
> > are
> > > > set final (well there are ways to go around it as well as you
> > > > suggest
> > by
> > > > using create() :)
> > > >
> > > > The information is great and helpful. Just want to make sure a
> > > > beginner
> > > who
> > > > wants to write a "WordCount" in Mapreduce does not worry about
> > specifying
> > > > block size' and replication factor in his code.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > > > On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney
> > > >  > > > >wrote:
> > > >
> > > > > Hi Prashant
> > > > >
> > > > > Others may correct me if I am wrong here..

Re: HDFS Explained as Comics

2011-12-01 Thread Dieter Plaetinck
Very clear.  The comic format works indeed quite well.
I never considered comics as a serious ("professional") way to get something 
explained efficiently,
but this shows people should think twice before they start writing their next 
documentation.

one question though: if a DN has a corrupted block, why does the NN only remove 
the bad DN from the block's list, and not the block from the DN list?
(also, does it really store the data in 2 separate tables?  This looks to me 
like 2 different views of the same data?)

Dieter

On Thu, 1 Dec 2011 08:53:31 +0100
"Alexander C.H. Lorenz"  wrote:

> Hi all,
> 
> very cool comic!
> 
> Thanks,
>  Alex
> 
> On Wed, Nov 30, 2011 at 11:58 PM, Abhishek Pratap Singh
>  > wrote:
> 
> > Hi,
> >
> > This is indeed a good way to explain, most of the improvement has
> > already been discussed. waiting for sequel of this comic.
> >
> > Regards,
> > Abhishek
> >
> > On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney
> >  > >wrote:
> >
> > > Hi Matthew
> > >
> > > I agree with both you and Prashant. The strip needs to be
> > > modified to explain that these can be default values that can be
> > > optionally
> > overridden
> > > (which I will fix in the next iteration).
> > >
> > > However, from the 'understanding concepts of HDFS' point of view,
> > > I still think that block size and replication factors are the
> > > real strengths of HDFS, and the learners must be exposed to them
> > > so that they get to see
> > how
> > > hdfs is significantly different from conventional file systems.
> > >
> > > On personal note: thanks for the first part of your message :)
> > >
> > > -Maneesh
> > >
> > >
> > > On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000) <
> > > matthew.go...@monsanto.com> wrote:
> > >
> > > > Maneesh,
> > > >
> > > > Firstly, I love the comic :)
> > > >
> > > > Secondly, I am inclined to agree with Prashant on this latest
> > > > point.
> > > While
> > > > one code path could take us through the user defining command
> > > > line overrides (e.g. hadoop fs -D blah -put foo bar) I think it
> > > > might
> > confuse
> > > a
> > > > person new to Hadoop. The most common flow would be using admin
> > > determined
> > > > values from hdfs-site and the only thing that would need to
> > > > change is
> > > that
> > > > conversation happening between client / server and not user /
> > > > client.
> > > >
> > > > Matt
> > > >
> > > > -Original Message-
> > > > From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> > > > Sent: Wednesday, November 30, 2011 3:28 PM
> > > > To: common-user@hadoop.apache.org
> > > > Subject: Re: HDFS Explained as Comics
> > > >
> > > > Sure, its just a case of how readers interpret it.
> > > >
> > > >   1. Client is required to specify block size and replication
> > > > factor
> > each
> > > >   time
> > > >   2. Client does not need to worry about it since an admin has
> > > > set the properties in default configuration files
> > > >
> > > > A client could not be allowed to override the default configs
> > > > if they
> > are
> > > > set final (well there are ways to go around it as well as you
> > > > suggest
> > by
> > > > using create() :)
> > > >
> > > > The information is great and helpful. Just want to make sure a
> > > > beginner
> > > who
> > > > wants to write a "WordCount" in Mapreduce does not worry about
> > specifying
> > > > block size' and replication factor in his code.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > > > On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney
> > > >  > > > >wrote:
> > > >
> > > > > Hi Prashant
> > > > >
> > > > > Others may correct me if I am wrong here..
> > > > >
> > > > > The client (org.apache.hadoop.hdfs.DFSClient) has a knowledge
> > > > > of
> > block
> > > > size
> > > > > and replication factor. In the source code, I see the
> > > 

Re: HDFS Explained as Comics

2011-11-30 Thread Alexander C.H. Lorenz
Hi all,

very cool comic!

Thanks,
 Alex

On Wed, Nov 30, 2011 at 11:58 PM, Abhishek Pratap Singh  wrote:

> Hi,
>
> This is indeed a good way to explain, most of the improvement has already
> been discussed. waiting for sequel of this comic.
>
> Regards,
> Abhishek
>
> On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney  >wrote:
>
> > Hi Matthew
> >
> > I agree with both you and Prashant. The strip needs to be modified to
> > explain that these can be default values that can be optionally
> overridden
> > (which I will fix in the next iteration).
> >
> > However, from the 'understanding concepts of HDFS' point of view, I still
> > think that block size and replication factors are the real strengths of
> > HDFS, and the learners must be exposed to them so that they get to see
> how
> > hdfs is significantly different from conventional file systems.
> >
> > On personal note: thanks for the first part of your message :)
> >
> > -Maneesh
> >
> >
> > On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000) <
> > matthew.go...@monsanto.com> wrote:
> >
> > > Maneesh,
> > >
> > > Firstly, I love the comic :)
> > >
> > > Secondly, I am inclined to agree with Prashant on this latest point.
> > While
> > > one code path could take us through the user defining command line
> > > overrides (e.g. hadoop fs -D blah -put foo bar) I think it might
> confuse
> > a
> > > person new to Hadoop. The most common flow would be using admin
> > determined
> > > values from hdfs-site and the only thing that would need to change is
> > that
> > > conversation happening between client / server and not user / client.
> > >
> > > Matt
> > >
> > > -Original Message-
> > > From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> > > Sent: Wednesday, November 30, 2011 3:28 PM
> > > To: common-user@hadoop.apache.org
> > > Subject: Re: HDFS Explained as Comics
> > >
> > > Sure, its just a case of how readers interpret it.
> > >
> > >   1. Client is required to specify block size and replication factor
> each
> > >   time
> > >   2. Client does not need to worry about it since an admin has set the
> > >   properties in default configuration files
> > >
> > > A client could not be allowed to override the default configs if they
> are
> > > set final (well there are ways to go around it as well as you suggest
> by
> > > using create() :)
> > >
> > > The information is great and helpful. Just want to make sure a beginner
> > who
> > > wants to write a "WordCount" in Mapreduce does not worry about
> specifying
> > > block size' and replication factor in his code.
> > >
> > > Thanks,
> > > Prashant
> > >
> > > On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney  > > >wrote:
> > >
> > > > Hi Prashant
> > > >
> > > > Others may correct me if I am wrong here..
> > > >
> > > > The client (org.apache.hadoop.hdfs.DFSClient) has a knowledge of
> block
> > > size
> > > > and replication factor. In the source code, I see the following in
> the
> > > > DFSClient constructor:
> > > >
> > > >defaultBlockSize = conf.getLong("dfs.block.size",
> > DEFAULT_BLOCK_SIZE);
> > > >
> > > >defaultReplication = (short) conf.getInt("dfs.replication", 3);
> > > >
> > > > My understanding is that the client considers the following chain for
> > the
> > > > values:
> > > > 1. Manual values (the long form constructor; when a user provides
> these
> > > > values)
> > > > 2. Configuration file values (these are cluster level defaults:
> > > > dfs.block.size and dfs.replication)
> > > > 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
> > > >
> > > > Moreover, in the org.apache.hadoop.hdfs.protocool.ClientProtocol the
> > API
> > > to
> > > > create a file is
> > > > void create(, short replication, long blocksize);
> > > >
> > > > I presume it means that the client already has knowledge of these
> > values
> > > > and passes them to the NameNode when creating a new file.
> > > >
> > > > Hope that helps.
> > > >
> > > > thanks
>

Re: HDFS Explained as Comics

2011-11-30 Thread Abhishek Pratap Singh
Hi,

This is indeed a good way to explain; most of the improvements have already
been discussed. Waiting for the sequel of this comic.

Regards,
Abhishek

On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney wrote:

> Hi Matthew
>
> I agree with both you and Prashant. The strip needs to be modified to
> explain that these can be default values that can be optionally overridden
> (which I will fix in the next iteration).
>
> However, from the 'understanding concepts of HDFS' point of view, I still
> think that block size and replication factors are the real strengths of
> HDFS, and the learners must be exposed to them so that they get to see how
> hdfs is significantly different from conventional file systems.
>
> On personal note: thanks for the first part of your message :)
>
> -Maneesh
>
>
> On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000) <
> matthew.go...@monsanto.com> wrote:
>
> > Maneesh,
> >
> > Firstly, I love the comic :)
> >
> > Secondly, I am inclined to agree with Prashant on this latest point.
> While
> > one code path could take us through the user defining command line
> > overrides (e.g. hadoop fs -D blah -put foo bar) I think it might confuse
> a
> > person new to Hadoop. The most common flow would be using admin
> determined
> > values from hdfs-site and the only thing that would need to change is
> that
> > conversation happening between client / server and not user / client.
> >
> > Matt
> >
> > -Original Message-----
> > From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> > Sent: Wednesday, November 30, 2011 3:28 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: HDFS Explained as Comics
> >
> > Sure, its just a case of how readers interpret it.
> >
> >   1. Client is required to specify block size and replication factor each
> >   time
> >   2. Client does not need to worry about it since an admin has set the
> >   properties in default configuration files
> >
> > A client could not be allowed to override the default configs if they are
> > set final (well there are ways to go around it as well as you suggest by
> > using create() :)
> >
> > The information is great and helpful. Just want to make sure a beginner
> who
> > wants to write a "WordCount" in Mapreduce does not worry about specifying
> > block size' and replication factor in his code.
> >
> > Thanks,
> > Prashant
> >
> > On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney  > >wrote:
> >
> > > Hi Prashant
> > >
> > > Others may correct me if I am wrong here..
> > >
> > > The client (org.apache.hadoop.hdfs.DFSClient) has a knowledge of block
> > size
> > > and replication factor. In the source code, I see the following in the
> > > DFSClient constructor:
> > >
> > >defaultBlockSize = conf.getLong("dfs.block.size",
> DEFAULT_BLOCK_SIZE);
> > >
> > >defaultReplication = (short) conf.getInt("dfs.replication", 3);
> > >
> > > My understanding is that the client considers the following chain for
> the
> > > values:
> > > 1. Manual values (the long form constructor; when a user provides these
> > > values)
> > > 2. Configuration file values (these are cluster level defaults:
> > > dfs.block.size and dfs.replication)
> > > 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
> > >
> > > Moreover, in the org.apache.hadoop.hdfs.protocool.ClientProtocol the
> API
> > to
> > > create a file is
> > > void create(, short replication, long blocksize);
> > >
> > > I presume it means that the client already has knowledge of these
> values
> > > and passes them to the NameNode when creating a new file.
> > >
> > > Hope that helps.
> > >
> > > thanks
> > > -Maneesh
> > >
> > > On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi <
> > prash1...@gmail.com
> > > >wrote:
> > >
> > > > Thanks Maneesh.
> > > >
> > > > Quick question, does a client really need to know Block size and
> > > > replication factor - A lot of times client has no control over these
> > (set
> > > > at cluster level)
> > > >
> > > > -Prashant Kommireddi
> > > >
> > > > On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges <
> dejan.men...@gmail.com
> > > > >wrote:
> > > >
> > > > > 

Re: HDFS Explained as Comics

2011-11-30 Thread maneesh varshney
Hi Matthew

I agree with both you and Prashant. The strip needs to be modified to
explain that these can be default values that can be optionally overridden
(which I will fix in the next iteration).

However, from the 'understanding concepts of HDFS' point of view, I still
think that block size and replication factors are the real strengths of
HDFS, and learners must be exposed to them so that they get to see how
HDFS is significantly different from conventional file systems.

On a personal note: thanks for the first part of your message :)

-Maneesh


On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000) <
matthew.go...@monsanto.com> wrote:

> Maneesh,
>
> Firstly, I love the comic :)
>
> Secondly, I am inclined to agree with Prashant on this latest point. While
> one code path could take us through the user defining command line
> overrides (e.g. hadoop fs -D blah -put foo bar) I think it might confuse a
> person new to Hadoop. The most common flow would be using admin determined
> values from hdfs-site and the only thing that would need to change is that
> conversation happening between client / server and not user / client.
>
> Matt
>
> -Original Message-
> From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> Sent: Wednesday, November 30, 2011 3:28 PM
> To: common-user@hadoop.apache.org
> Subject: Re: HDFS Explained as Comics
>
> Sure, its just a case of how readers interpret it.
>
>   1. Client is required to specify block size and replication factor each
>   time
>   2. Client does not need to worry about it since an admin has set the
>   properties in default configuration files
>
> A client could not be allowed to override the default configs if they are
> set final (well there are ways to go around it as well as you suggest by
> using create() :)
>
> The information is great and helpful. Just want to make sure a beginner who
> wants to write a "WordCount" in Mapreduce does not worry about specifying
> block size' and replication factor in his code.
>
> Thanks,
> Prashant
>
> On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney  >wrote:
>
> > Hi Prashant
> >
> > Others may correct me if I am wrong here..
> >
> > The client (org.apache.hadoop.hdfs.DFSClient) has a knowledge of block
> size
> > and replication factor. In the source code, I see the following in the
> > DFSClient constructor:
> >
> >defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
> >
> >defaultReplication = (short) conf.getInt("dfs.replication", 3);
> >
> > My understanding is that the client considers the following chain for the
> > values:
> > 1. Manual values (the long form constructor; when a user provides these
> > values)
> > 2. Configuration file values (these are cluster level defaults:
> > dfs.block.size and dfs.replication)
> > 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
> >
> > Moreover, in the org.apache.hadoop.hdfs.protocool.ClientProtocol the API
> to
> > create a file is
> > void create(, short replication, long blocksize);
> >
> > I presume it means that the client already has knowledge of these values
> > and passes them to the NameNode when creating a new file.
> >
> > Hope that helps.
> >
> > thanks
> > -Maneesh
> >
> > On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi <
> prash1...@gmail.com
> > >wrote:
> >
> > > Thanks Maneesh.
> > >
> > > Quick question, does a client really need to know Block size and
> > > replication factor - A lot of times client has no control over these
> (set
> > > at cluster level)
> > >
> > > -Prashant Kommireddi
> > >
> > > On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges  > > >wrote:
> > >
> > > > Hi Maneesh,
> > > >
> > > > Thanks a lot for this! Just distributed it over the team and comments
> > are
> > > > great :)
> > > >
> > > > Best regards,
> > > > Dejan
> > > >
> > > > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney <
> mvarsh...@gmail.com
> > > > >wrote:
> > > >
> > > > > For your reading pleasure!
> > > > >
> > > > > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB
> > attachments):
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > > > >
> > > 

RE: HDFS Explained as Comics

2011-11-30 Thread GOEKE, MATTHEW (AG/1000)
Maneesh,

Firstly, I love the comic :)

Secondly, I am inclined to agree with Prashant on this latest point. While one
code path could take us through the user defining command line overrides (e.g.
hadoop fs -D blah -put foo bar), I think it might confuse a person new to
Hadoop. The most common flow would be using admin-determined values from
hdfs-site, and the only thing that would need to change is that conversation
happening between client / server and not user / client.
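
For reference, a concrete form of that override could look like the following
(the "blah" above is a placeholder; the property names are the 0.20-era ones
quoted elsewhere in this thread, and the values are examples only):

  hadoop fs -D dfs.block.size=134217728 -D dfs.replication=2 -put foo bar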

Matt

-Original Message-
From: Prashant Kommireddi [mailto:prash1...@gmail.com] 
Sent: Wednesday, November 30, 2011 3:28 PM
To: common-user@hadoop.apache.org
Subject: Re: HDFS Explained as Comics

Sure, its just a case of how readers interpret it.

   1. Client is required to specify block size and replication factor each
   time
   2. Client does not need to worry about it since an admin has set the
   properties in default configuration files

A client could not be allowed to override the default configs if they are
set final (well there are ways to go around it as well as you suggest by
using create() :)

The information is great and helpful. Just want to make sure a beginner who
wants to write a "WordCount" in Mapreduce does not worry about specifying
block size' and replication factor in his code.

Thanks,
Prashant

On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney wrote:

> Hi Prashant
>
> Others may correct me if I am wrong here..
>
> The client (org.apache.hadoop.hdfs.DFSClient) has a knowledge of block size
> and replication factor. In the source code, I see the following in the
> DFSClient constructor:
>
>defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
>
>defaultReplication = (short) conf.getInt("dfs.replication", 3);
>
> My understanding is that the client considers the following chain for the
> values:
> 1. Manual values (the long form constructor; when a user provides these
> values)
> 2. Configuration file values (these are cluster level defaults:
> dfs.block.size and dfs.replication)
> 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
>
> Moreover, in the org.apache.hadoop.hdfs.protocool.ClientProtocol the API to
> create a file is
> void create(, short replication, long blocksize);
>
> I presume it means that the client already has knowledge of these values
> and passes them to the NameNode when creating a new file.
>
> Hope that helps.
>
> thanks
> -Maneesh
>
> On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi  >wrote:
>
> > Thanks Maneesh.
> >
> > Quick question, does a client really need to know Block size and
> > replication factor - A lot of times client has no control over these (set
> > at cluster level)
> >
> > -Prashant Kommireddi
> >
> > On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges  > >wrote:
> >
> > > Hi Maneesh,
> > >
> > > Thanks a lot for this! Just distributed it over the team and comments
> are
> > > great :)
> > >
> > > Best regards,
> > > Dejan
> > >
> > > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney  > > >wrote:
> > >
> > > > For your reading pleasure!
> > > >
> > > > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB
> attachments):
> > > >
> > > >
> > >
> >
> https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > > >
> > > >
> > > > Appreciate if you can spare some time to peruse this little
> experiment
> > of
> > > > mine to use Comics as a medium to explain computer science topics.
> This
> > > > particular issue explains the protocols and internals of HDFS.
> > > >
> > > > I am eager to hear your opinions on the usefulness of this visual
> > medium
> > > to
> > > > teach complex protocols and algorithms.
> > > >
> > > > [My personal motivations: I have always found text descriptions to be
> > too
> > > > verbose as lot of effort is spent putting the concepts in proper
> > > time-space
> > > > context (which can be easily avoided in a visual medium); sequence
> > > diagrams
> > > > are unwieldy for non-trivial protocols, and they do not explain
> > concepts;
> > > > and finally, animations/videos happen "too fast" and do not offer
> > > > self-paced learning experience.]
> > > >
> > > > All forms of criticisms, comments (and encouragements) welcome :)
> > > >
> > > > Thanks
> > > > Maneesh

Re: HDFS Explained as Comics

2011-11-30 Thread Prashant Kommireddi
Sure, it's just a case of how readers interpret it.

   1. Client is required to specify block size and replication factor each
   time
   2. Client does not need to worry about it since an admin has set the
   properties in default configuration files

A client would not be allowed to override the default configs if they are
set final (well, there are ways to get around it as well, as you suggest, by
using create() :)
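
To illustrate the "final" mechanism, here is a sketch of what the admin-side
hdfs-site.xml entry could look like (values are examples only):

  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
    <!-- "final" means client-side overrides of this property are ignored -->
    <final>true</final>
  </property>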

The information is great and helpful. Just want to make sure a beginner who
wants to write a "WordCount" in MapReduce does not worry about specifying
block size and replication factor in his code.

Thanks,
Prashant

On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney wrote:

> Hi Prashant
>
> Others may correct me if I am wrong here..
>
> The client (org.apache.hadoop.hdfs.DFSClient) has a knowledge of block size
> and replication factor. In the source code, I see the following in the
> DFSClient constructor:
>
>defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
>
>defaultReplication = (short) conf.getInt("dfs.replication", 3);
>
> My understanding is that the client considers the following chain for the
> values:
> 1. Manual values (the long form constructor; when a user provides these
> values)
> 2. Configuration file values (these are cluster level defaults:
> dfs.block.size and dfs.replication)
> 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
>
> Moreover, in the org.apache.hadoop.hdfs.protocool.ClientProtocol the API to
> create a file is
> void create(, short replication, long blocksize);
>
> I presume it means that the client already has knowledge of these values
> and passes them to the NameNode when creating a new file.
>
> Hope that helps.
>
> thanks
> -Maneesh
>
> On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi  >wrote:
>
> > Thanks Maneesh.
> >
> > Quick question, does a client really need to know Block size and
> > replication factor - A lot of times client has no control over these (set
> > at cluster level)
> >
> > -Prashant Kommireddi
> >
> > On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges  > >wrote:
> >
> > > Hi Maneesh,
> > >
> > > Thanks a lot for this! Just distributed it over the team and comments
> are
> > > great :)
> > >
> > > Best regards,
> > > Dejan
> > >
> > > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney  > > >wrote:
> > >
> > > > For your reading pleasure!
> > > >
> > > > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB
> attachments):
> > > >
> > > >
> > >
> >
> https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > > >
> > > >
> > > > Appreciate if you can spare some time to peruse this little
> experiment
> > of
> > > > mine to use Comics as a medium to explain computer science topics.
> This
> > > > particular issue explains the protocols and internals of HDFS.
> > > >
> > > > I am eager to hear your opinions on the usefulness of this visual
> > medium
> > > to
> > > > teach complex protocols and algorithms.
> > > >
> > > > [My personal motivations: I have always found text descriptions to be
> > too
> > > > verbose as lot of effort is spent putting the concepts in proper
> > > time-space
> > > > context (which can be easily avoided in a visual medium); sequence
> > > diagrams
> > > > are unwieldy for non-trivial protocols, and they do not explain
> > concepts;
> > > > and finally, animations/videos happen "too fast" and do not offer
> > > > self-paced learning experience.]
> > > >
> > > > All forms of criticisms, comments (and encouragements) welcome :)
> > > >
> > > > Thanks
> > > > Maneesh
> > > >
> > >
> >
>


Re: HDFS Explained as Comics

2011-11-30 Thread maneesh varshney
Hi Prashant

Others may correct me if I am wrong here..

The client (org.apache.hadoop.hdfs.DFSClient) has knowledge of the block size
and replication factor. In the source code, I see the following in the
DFSClient constructor:

defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);

defaultReplication = (short) conf.getInt("dfs.replication", 3);

My understanding is that the client considers the following chain for the
values:
1. Manual values (the long form constructor; when a user provides these
values)
2. Configuration file values (these are cluster level defaults:
dfs.block.size and dfs.replication)
3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
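
A minimal sketch of that chain (resolveBlockSize is a hypothetical helper for
illustration, not actual Hadoop code; conf is an
org.apache.hadoop.conf.Configuration):

// Hypothetical helper showing the precedence order listed above:
static long resolveBlockSize(Long userBlockSize, Configuration conf) {
  if (userBlockSize != null) {
    return userBlockSize;                        // 1. value supplied by the user
  }
  // 2. cluster default from the config files, else
  // 3. the hardcoded fallback (64 MB in this era)
  return conf.getLong("dfs.block.size", 64 * 1024 * 1024);
}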

Moreover, in the org.apache.hadoop.hdfs.protocol.ClientProtocol, the API to
create a file is
void create(..., short replication, long blocksize);

I presume it means that the client already has knowledge of these values
and passes them to the NameNode when creating a new file.

Hope that helps.

thanks
-Maneesh

On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi wrote:

> Thanks Maneesh.
>
> Quick question, does a client really need to know Block size and
> replication factor - A lot of times client has no control over these (set
> at cluster level)
>
> -Prashant Kommireddi
>
> On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges  >wrote:
>
> > Hi Maneesh,
> >
> > Thanks a lot for this! Just distributed it over the team and comments are
> > great :)
> >
> > Best regards,
> > Dejan
> >
> > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney  > >wrote:
> >
> > > For your reading pleasure!
> > >
> > > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):
> > >
> > >
> >
> https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > >
> > >
> > > Appreciate if you can spare some time to peruse this little experiment
> of
> > > mine to use Comics as a medium to explain computer science topics. This
> > > particular issue explains the protocols and internals of HDFS.
> > >
> > > I am eager to hear your opinions on the usefulness of this visual
> medium
> > to
> > > teach complex protocols and algorithms.
> > >
> > > [My personal motivations: I have always found text descriptions to be
> too
> > > verbose as lot of effort is spent putting the concepts in proper
> > time-space
> > > context (which can be easily avoided in a visual medium); sequence
> > diagrams
> > > are unwieldy for non-trivial protocols, and they do not explain
> concepts;
> > > and finally, animations/videos happen "too fast" and do not offer
> > > self-paced learning experience.]
> > >
> > > All forms of criticisms, comments (and encouragements) welcome :)
> > >
> > > Thanks
> > > Maneesh
> > >
> >
>


Re: HDFS Explained as Comics

2011-11-30 Thread Prashant Kommireddi
Thanks Maneesh.

Quick question: does a client really need to know block size and
replication factor? A lot of the time the client has no control over these
(they are set at the cluster level)

-Prashant Kommireddi

On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges wrote:

> Hi Maneesh,
>
> Thanks a lot for this! Just distributed it over the team and comments are
> great :)
>
> Best regards,
> Dejan
>
> On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney  >wrote:
>
> > For your reading pleasure!
> >
> > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):
> >
> >
> https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> >
> >
> > Appreciate if you can spare some time to peruse this little experiment of
> > mine to use Comics as a medium to explain computer science topics. This
> > particular issue explains the protocols and internals of HDFS.
> >
> > I am eager to hear your opinions on the usefulness of this visual medium
> to
> > teach complex protocols and algorithms.
> >
> > [My personal motivations: I have always found text descriptions to be too
> > verbose as lot of effort is spent putting the concepts in proper
> time-space
> > context (which can be easily avoided in a visual medium); sequence
> diagrams
> > are unwieldy for non-trivial protocols, and they do not explain concepts;
> > and finally, animations/videos happen "too fast" and do not offer
> > self-paced learning experience.]
> >
> > All forms of criticisms, comments (and encouragements) welcome :)
> >
> > Thanks
> > Maneesh
> >
>


Re: HDFS Explained as Comics

2011-11-30 Thread Dejan Menges
Hi Maneesh,

Thanks a lot for this! Just distributed it over the team and comments are
great :)

Best regards,
Dejan

On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney wrote:

> For your reading pleasure!
>
> PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):
>
> https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
>
>
> Appreciate if you can spare some time to peruse this little experiment of
> mine to use Comics as a medium to explain computer science topics. This
> particular issue explains the protocols and internals of HDFS.
>
> I am eager to hear your opinions on the usefulness of this visual medium to
> teach complex protocols and algorithms.
>
> [My personal motivations: I have always found text descriptions to be too
> verbose as lot of effort is spent putting the concepts in proper time-space
> context (which can be easily avoided in a visual medium); sequence diagrams
> are unwieldy for non-trivial protocols, and they do not explain concepts;
> and finally, animations/videos happen "too fast" and do not offer
> self-paced learning experience.]
>
> All forms of criticisms, comments (and encouragements) welcome :)
>
> Thanks
> Maneesh
>