-----Original Message-----
From: Sasha Dolgy [mailto:sdo...@gmail.com]
Sent: Monday, May 18, 2009 9:50 AM
To: core-user@hadoop.apache.org
Subject: Re: proper method for writing files to hdfs
OK, we're on the same page with that. Going back to the original question: in our scenario we are trying to stream data into HDFS, and despite the posts and hints I've been reading it's still a tough nut to crack, which is why I thought (thankfully, I wasn't right) that we were going about this the wrong way. We open up a new file, get the FSDataOutputStream, and start writing and flushing data as concurrent information comes in:

2009-05-17 06:16:50,921 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Changing block file offset of block blk_5834867413110307425_1064 from 2451 to 2048 meta file offset to 23
2009-05-17 06:16:50,921 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 adding seqno 3 to ack queue.
2009-05-17 06:16:50,921 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_5834867413110307425_1064 acking for packet 3
2009-05-17 06:16:51,111 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving one packet for block blk_5834867413110307425_1064 of length 735 seqno 4 offsetInBlock 2048 lastPacketInBlock false
2009-05-17 06:16:51,111 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Changing block file offset of block blk_5834867413110307425_1064 from 2518 to 2048 meta file offset to 23
2009-05-17 06:16:51,111 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 adding seqno 4 to ack queue.
2009-05-17 06:16:51,112 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_5834867413110307425_1064 acking for packet 4
2009-05-17 06:16:51,297 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving one packet for block blk_5834867413110307425_1064 of length 509 seqno 5 offsetInBlock 2560 lastPacketInBlock false
2009-05-17 06:16:51,297 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Changing block file offset of block blk_5834867413110307425_1064 from 2771 to 2560 meta file offset to 27

The file gets bigger and bigger, but it is not committed to HDFS until we close() the stream. We've waited for the block size to go above 64k and even higher, and it never writes itself out to HDFS. I've seen the JIRA bug reports, etc. Has no one done this? Is it bad to stream data into it? How do I force it to flush the data to disk?

The POC is collecting environmental data every moment from multiple sources to monitor temperature in computers / facilities. I suppose I'm just a little frustrated. I can see that Hadoop is brilliant for large sets of data that you already have or are happy to move onto HDFS ...

-sd
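For reference, a minimal sketch of the streaming pattern described above. It assumes a Hadoop release in which FSDataOutputStream exposes sync() (renamed hflush() in later releases) and in which the append/sync behaviour tracked by the JIRAs mentioned in the thread is enabled (e.g. dfs.support.append set to true); the host, port, path, and class name are placeholders, not values taken from the thread:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SensorStreamWriter {
    public static void main(String[] args) throws IOException {
        Configuration config = new Configuration();
        // Only the namenode address is configured; datanode traffic is
        // handled by the Hadoop client code.
        config.set("fs.default.name", "hdfs://foo.bar.com:9000/");
        FileSystem fs = FileSystem.get(config);

        Path hdfsPath = new Path("/tmp/sensor/readings.log");
        FSDataOutputStream os = fs.create(hdfsPath, false);
        try {
            for (int i = 0; i < 100; i++) {
                // One reading per line, as it arrives from the sensors.
                os.write(("reading " + i + "\n").getBytes());
                // Push the buffered bytes out to the datanode pipeline
                // instead of holding everything back until close().
                os.sync();
            }
        } finally {
            // close() is still what finalizes the block and makes the
            // final file length visible on the namenode.
            os.close();
        }
    }
}

Whether sync() actually makes the bytes visible to other readers before close() depends on the Hadoop version in use, which is exactly the behaviour being questioned in this thread.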
> On Mon, May 18, 2009 at 2:09 PM, Bill Habermaas <b...@habermaas.us> wrote:
>> Sasha,
>>
>> Connecting to the namenode is the proper way to establish the HDFS
>> connection. Afterwards, the Hadoop client handler that is called by your
>> code will go directly to the datanodes. There is no reason for you to
>> communicate directly with a datanode, nor is there a way for you to even
>> know where the datanodes are located. That is all done by the Hadoop
>> client code, silently and under the covers, by Hadoop itself.
>>
>> Bill
>>
>> -----Original Message-----
>> From: sdo...@gmail.com [mailto:sdo...@gmail.com] On Behalf Of Sasha Dolgy
>> Sent: Sunday, May 17, 2009 10:55 AM
>> To: core-user@hadoop.apache.org
>> Subject: proper method for writing files to hdfs
>>
>> The following graphic outlines the architecture for HDFS:
>> http://hadoop.apache.org/core/docs/current/images/hdfsarchitecture.gif
>>
>> If one is to write a client that adds data into HDFS, it needs to add it
>> through the Data Node. Now, from the graphic I am to understand that the
>> client doesn't communicate with the NameNode, only with the Data Node.
>>
>> In the examples I've seen and the playing I am doing, I am connecting to
>> the HDFS URL as a configuration parameter before I create a file. Is this
>> the incorrect way to create files in HDFS?
>>
>> Configuration config = new Configuration();
>> config.set("fs.default.name", "hdfs://foo.bar.com:9000/");
>> String path = "/tmp/i/am/a/path/to/a/file.name";
>> Path hdfsPath = new Path(path);
>> FileSystem fileSystem = FileSystem.get(config);
>> FSDataOutputStream os = fileSystem.create(hdfsPath, false);
>> os.write("something".getBytes());
>> os.close();
>>
>> Should the client be connecting to a data node to create the file, as
>> indicated in the graphic above?
>>
>> If connecting to a data node is possible and suggested, where can I find
>> more details about this process?
>>
>> Thanks in advance,
>> -sasha
>>
>> --
>> Sasha Dolgy
>> sasha.do...@gmail.com
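As a companion to the create/write example quoted above, a minimal sketch of reading the same file back through the identical namenode-only configuration. It illustrates Bill's point that the client only ever names fs.default.name and the Hadoop client code routes the actual reads to the right datanodes; the URL and path are the placeholder values from the quoted example, and the class name is made up for illustration:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadBack {
    public static void main(String[] args) throws IOException {
        Configuration config = new Configuration();
        // Same namenode-only configuration as the writer.
        config.set("fs.default.name", "hdfs://foo.bar.com:9000/");
        FileSystem fs = FileSystem.get(config);

        Path hdfsPath = new Path("/tmp/i/am/a/path/to/a/file.name");
        // Length as recorded by the namenode after the writer's close().
        System.out.println("length: " + fs.getFileStatus(hdfsPath).getLen());

        FSDataInputStream in = fs.open(hdfsPath);
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            in.close();
        }
    }
}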