Hi Prashant,

Others may correct me if I am wrong here..
The client (org.apache.hadoop.hdfs.DFSClient) has knowledge of the block size and replication factor. In the source code, I see the following in the DFSClient constructor:

    defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
    defaultReplication = (short) conf.getInt("dfs.replication", 3);

My understanding is that the client considers the following chain for the values:

1. Manually supplied values (the long-form constructor, when a user provides these values)
2. Configuration file values (these are cluster-level defaults: dfs.block.size and dfs.replication)
3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)

Moreover, in org.apache.hadoop.hdfs.protocol.ClientProtocol the API to create a file is:

    void create(...., short replication, long blocksize);

I presume this means that the client already has knowledge of these values and passes them to the NameNode when creating a new file.

Hope that helps.

thanks
-Maneesh

On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi <prash1...@gmail.com> wrote:

> Thanks Maneesh.
>
> Quick question, does a client really need to know Block size and
> replication factor - A lot of times client has no control over these (set
> at cluster level)
>
> -Prashant Kommireddi
>
> On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges <dejan.men...@gmail.com> wrote:
>
> > Hi Maneesh,
> >
> > Thanks a lot for this! Just distributed it over the team and comments are
> > great :)
> >
> > Best regards,
> > Dejan
> >
> > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney <mvarsh...@gmail.com> wrote:
> >
> > > For your reading pleasure!
> > >
> > > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):
> > >
> > > https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > >
> > > Appreciate if you can spare some time to peruse this little experiment of
> > > mine to use Comics as a medium to explain computer science topics. This
> > > particular issue explains the protocols and internals of HDFS.
> > >
> > > I am eager to hear your opinions on the usefulness of this visual medium to
> > > teach complex protocols and algorithms.
> > >
> > > [My personal motivations: I have always found text descriptions to be too
> > > verbose as lot of effort is spent putting the concepts in proper time-space
> > > context (which can be easily avoided in a visual medium); sequence diagrams
> > > are unwieldy for non-trivial protocols, and they do not explain concepts;
> > > and finally, animations/videos happen "too fast" and do not offer a
> > > self-paced learning experience.]
> > >
> > > All forms of criticisms, comments (and encouragements) welcome :)
> > >
> > > Thanks
> > > Maneesh
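[Editor's note: the three-tier precedence chain Maneesh describes (user value, then config file, then hardcoded default) can be sketched in Java. This is a minimal illustrative sketch, not Hadoop code: `Conf`, `effectiveBlockSize`, and the 64 MB default are stand-ins, though the `getLong("dfs.block.size", DEFAULT_BLOCK_SIZE)` pattern mirrors the constructor snippet quoted above.]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration class, just enough to
// show the "config value if present, else supplied default" behavior.
class Conf {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    long getLong(String key, long defaultValue) {
        String v = props.get(key);
        return v == null ? defaultValue : Long.parseLong(v);
    }
}

public class BlockSizeResolution {
    // Tier 3: hardcoded fallback (assumed 64 MB here, for illustration only)
    static final long DEFAULT_BLOCK_SIZE = 64L * 1024 * 1024;

    // Tiers 2 and 3: cluster config value if set, else the hardcoded default.
    // Mirrors conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE) from the thread.
    static long defaultBlockSize(Conf conf) {
        return conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
    }

    // Tier 1: an explicit caller-supplied value (e.g. via a long-form
    // constructor or create() overload) wins over everything else.
    static long effectiveBlockSize(Long userValue, Conf conf) {
        return userValue != null ? userValue : defaultBlockSize(conf);
    }

    public static void main(String[] args) {
        Conf conf = new Conf();

        // No config, no user value -> hardcoded default (64 MB)
        System.out.println(effectiveBlockSize(null, conf));

        // Cluster-level config set -> config beats the hardcoded default
        conf.set("dfs.block.size", String.valueOf(128L * 1024 * 1024));
        System.out.println(effectiveBlockSize(null, conf));

        // Explicit user value -> beats both config and hardcoded default
        System.out.println(effectiveBlockSize(32L * 1024 * 1024, conf));
    }
}
```

Under this resolution order, the client always ends up holding concrete values, which is consistent with `create(...., short replication, long blocksize)` passing them to the NameNode.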