Tom, I also forgot to mention that writing lots of little files can cause issues of its own. HDFS is designed to handle relatively few BIG files. There is some work underway to improve this, but it is still a ways off. So this approach is likely to be very slow and to put a big load on the NameNode if you create lots of small files with it.
--Bobby

On 7/25/11 3:30 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:

Tom,

That assumes that you will never write to the same file from two different mappers or processes. HDFS currently does not support writing to a single file from multiple processes.

--Bobby

On 7/25/11 3:25 PM, "Tom Melendez" <t...@supertom.com> wrote:

Hi Folks,

Just doing a sanity check here. I have a map-only job which produces a filename as the key and data as the value. I want to write the value (data) into a file named by the key (filename), under the path specified when I run the job. The value (data) doesn't need any formatting; I can write it to HDFS without modification.

So, looking at this link (the Output Formats section):

http://developer.yahoo.com/hadoop/tutorial/module5.html

it looks like I want to:

- create a new output format
- override write, telling it not to call writeKey, as I don't want the key written
- add a new getRecordWriter method that uses the key as the filename and calls my output format

Sound reasonable?

Thanks,

Tom

--
===================
Skybox is hiring.
http://www.skyboximaging.com/careers/jobs
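For what it's worth, the outline above could be sketched roughly like this. This is only a sketch, not anything from the tutorial: the class name `KeyAsFilenameOutputFormat` is made up, it assumes Text keys and values and the newer `org.apache.hadoop.mapreduce` API, and it inherits both caveats from this thread (no two tasks may emit the same filename, and every record becomes its own small file on the NameNode). It needs the Hadoop jars on the classpath to compile.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical output format: the key names the file, and only the
// value bytes are written -- the key itself never appears in the output.
public class KeyAsFilenameOutputFormat extends FileOutputFormat<Text, Text> {

    @Override
    public RecordWriter<Text, Text> getRecordWriter(final TaskAttemptContext context)
            throws IOException {
        // Files go under the job's configured output directory.
        final Path outputDir = FileOutputFormat.getOutputPath(context);
        final FileSystem fs = outputDir.getFileSystem(context.getConfiguration());

        return new RecordWriter<Text, Text>() {
            @Override
            public void write(Text key, Text value) throws IOException {
                // One file per record; overwrite=false so a duplicate
                // filename (e.g. from another task) fails loudly instead
                // of silently clobbering data.
                Path file = new Path(outputDir, key.toString());
                FSDataOutputStream out = fs.create(file, false);
                try {
                    out.write(value.getBytes(), 0, value.getLength());
                } finally {
                    out.close();
                }
            }

            @Override
            public void close(TaskAttemptContext ctx) {
                // Each file is closed as soon as it is written.
            }
        };
    }
}
```

You would then set it on the job with `job.setOutputFormatClass(KeyAsFilenameOutputFormat.class)`. Given Bobby's small-files warning, it may be worth considering a SequenceFile or HAR archive instead if the file count gets large.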