Tom,

I also forgot to mention that writing lots of little files could cause issues too.  
HDFS is designed to handle relatively few BIG files.  There is some work to 
improve this, but it is still a ways off.  So it is likely going to be very slow 
and put a big load on the namenode if you are going to create lots of small 
files using this method.

--Bobby


On 7/25/11 3:30 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:

Tom,

That assumes that you will never write to the same file from two different 
mappers or processes.  HDFS currently does not support writing to a single file 
from multiple processes.

--Bobby

On 7/25/11 3:25 PM, "Tom Melendez" <t...@supertom.com> wrote:

Hi Folks,

Just doing a sanity check here.

I have a map-only job that produces a filename as the key and data as the
value.  I want to write the value (data) into a file named by the key
(filename), under the output path specified when I run the job.

The value (data) doesn't need any formatting; I can just write it to
HDFS without modification.

So, looking at this link (the Output Formats section):

http://developer.yahoo.com/hadoop/tutorial/module5.html

Looks like I want to:
- create a new output format
- override write(), telling it not to write the key, since I don't want that in the output
- create a new getRecordWriter method that uses the key as the filename and
calls my output format (rough sketch below)
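
Something along these lines is what I had in mind -- a rough, untested sketch
using the new mapreduce API, assuming Text keys (the filename) and
BytesWritable values (the raw bytes); the class name is just a placeholder:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KeyAsFilenameOutputFormat extends FileOutputFormat<Text, BytesWritable> {

  @Override
  public RecordWriter<Text, BytesWritable> getRecordWriter(TaskAttemptContext context)
      throws IOException {
    // One file per record, named after the key, created directly under the
    // job's output directory.  Only the value bytes get written.
    final Path outputDir = FileOutputFormat.getOutputPath(context);
    final FileSystem fs = outputDir.getFileSystem(context.getConfiguration());

    return new RecordWriter<Text, BytesWritable>() {
      @Override
      public void write(Text key, BytesWritable value) throws IOException {
        Path file = new Path(outputDir, key.toString());
        // overwrite=false so two tasks writing the same key fail loudly
        FSDataOutputStream out = fs.create(file, false);
        try {
          out.write(value.getBytes(), 0, value.getLength());
        } finally {
          out.close();
        }
      }

      @Override
      public void close(TaskAttemptContext context) throws IOException {
        // Nothing is held open between records in this sketch.
      }
    };
  }
}

Each write() just creates a file named after the key and dumps the value
bytes into it.  Note this skips the usual part-file naming and the output
committer's temp-directory handling, so I'd want to double check how that
interacts with speculative execution and task retries.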

Sound reasonable?

Thanks,

Tom

--
===================
Skybox is hiring.
http://www.skyboximaging.com/careers/jobs

