Sasha,

Connecting to the NameNode is the proper way to establish the HDFS
connection.  Once that connection is made, the Hadoop client library invoked
by your code goes directly to the DataNodes for the actual reads and writes.
There is no reason for you to communicate with a DataNode yourself, nor is
there any supported way for you to find out where the DataNodes are: the
NameNode hands the client library the block locations, and the transfers
happen silently under the covers.
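
For what it's worth, reading the file back goes through exactly the same
API; nothing in the client code ever names a DataNode. A minimal sketch,
reusing the hypothetical host and path from your example below:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration config = new Configuration();
    config.set("fs.default.name", "hdfs://foo.bar.com:9000/");
    FileSystem fileSystem = FileSystem.get(config);

    // open() asks the NameNode where the blocks live; the returned
    // stream then reads from the DataNodes behind the scenes.
    FSDataInputStream in = fileSystem.open(
        new Path("/tmp/i/am/a/path/to/a/file.name"));
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    System.out.println(reader.readLine());
    reader.close();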

Bill

-----Original Message-----
From: sdo...@gmail.com [mailto:sdo...@gmail.com] On Behalf Of Sasha Dolgy
Sent: Sunday, May 17, 2009 10:55 AM
To: core-user@hadoop.apache.org
Subject: proper method for writing files to hdfs

The following graphic outlines the architecture for HDFS:
http://hadoop.apache.org/core/docs/current/images/hdfsarchitecture.gif

If one writes a client that adds data into HDFS, it looks as though the data
must go in through a DataNode.  From the graphic I understand that the client
doesn't communicate with the NameNode, only with the DataNodes.

In the examples I've seen, and in my own experimentation, I connect to the
HDFS URL via a configuration parameter before creating a file.  Is this the
incorrect way to create files in HDFS?

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Point the client at the NameNode.
    Configuration config = new Configuration();
    config.set("fs.default.name", "hdfs://foo.bar.com:9000/");
    String path = "/tmp/i/am/a/path/to/a/file.name";
    Path hdfsPath = new Path(path);
    FileSystem fileSystem = FileSystem.get(config);
    // create(path, false) fails if the file already exists.
    FSDataOutputStream os = fileSystem.create(hdfsPath, false);
    os.write("something".getBytes());
    os.close();

Should the client instead be connecting to a DataNode to create the file, as
the graphic above seems to indicate?

If connecting to a DataNode directly is possible and recommended, where can I
find more details about that process?

Thanks in advance,
-sasha

-- 
Sasha Dolgy
sasha.do...@gmail.com

