Then you may want to look at the MultipleOutputFormat class; it can do what you need.

On Tue, Jul 29, 2008 at 10:11 PM, Lincoln Ritter
<[EMAIL PROTECTED]> wrote:
> Thanks for the info!
>
>> Not sure what happens if you write NULL as key or value.
>
> Looking at the code, it doesn't seem to really make a difference, and
> the function in question (basically 'collect') looks to be robust to
> null - but I may be missing something!
>
> In my case, I basically want the key to be the output filename, and
> the data in the files to be directly consumable by my app.  Having the
> key show up in the file complicates things on the app side so I'm
> trying to avoid this.  Passing null seems to work for now.
>
>
> -lincoln
>
> --
> lincolnritter.com
>
>
>
>
> On Tue, Jul 29, 2008 at 9:27 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
>> On Thu, Jul 24, 2008 at 12:32 AM, Lincoln Ritter
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Alejandro said:
>>>> Take a look at the MultipleOutputFormat class or MultipleOutputs (in SVN tip)
>>>
>>> I'm muddling through both
>>> http://issues.apache.org/jira/browse/HADOOP-2906 and
>>> http://issues.apache.org/jira/browse/HADOOP-3149 trying to make sense
>>> of these.  I'm a little confused by the way this works but it looks
>>> like I can define a number of named outputs which looks like it
>>> enables different output formats and I can also define some of these
>>> as "multi", meaning that I can write to different "targets" (like
>>> files).  Is this correct?
>>
>> Exactly.
>>
>> ....
>>
>>> A couple of questions:
>>>
>>>  - I needed to pass 'null' to the collect method so as to not write
>>> the key to the file.  These files are meant to be consumable chunks of
>>> content so I want to control exactly what goes into them.  Does this
>>> seem normal or am I missing something?  Is there a downside to passing
>>> null here?
>>
>> Not sure what happens if you write NULL as key or value.
>>
>>>  - What is the 'part-00000' file for?  I have seen this in other
>>> places in the dfs. But it seems extraneous here.  It's not super
>>> critical but if I can make it go away that would be great.
>>
>> This is the standard output of the M/R job: whatever is written to the
>> OutputCollector you get in the reduce() call (or in the map() call
>> when the job has zero reduces).
>>
>>>  - What is the purpose of the '-r-00000' suffix?  Perhaps it is to
>>> help with collisions?
>>
>> Yes, files written from a map have '-m-'; files written from a reduce have '-r-'.
>>
>>> I guess it seems strange that I can't just say
>>> "the output file should be called X" and have an output file called X
>>> appear.
>>
>> Well, you need the map/reduce mask and the task number mask to avoid
>> collisions.
>>
>
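
To make the naming scheme from the thread concrete, here is a minimal sketch of how a per-task file name could be put together. This is not Hadoop's actual code: the class OutputNames and the method partFileName are made up for illustration, and the five-digit zero padding is assumed from the 'part-00000' and '-r-00000' names mentioned above.

```java
import java.util.Locale;

public class OutputNames {

    /**
     * Hypothetical helper (not part of Hadoop): appends '-m-' or '-r-'
     * plus a zero-padded task number to the requested base name, so that
     * two tasks writing the "same" output never produce the same file name.
     */
    public static String partFileName(String baseName, boolean fromReduce, int taskNumber) {
        if (taskNumber < 0) {
            throw new IllegalArgumentException("taskNumber must be >= 0");
        }
        // 'r' for files written from a reduce, 'm' for files written from a map
        String phase = fromReduce ? "r" : "m";
        // Locale.ROOT keeps the digits ASCII regardless of the default locale
        return String.format(Locale.ROOT, "%s-%s-%05d", baseName, phase, taskNumber);
    }

    public static void main(String[] args) {
        System.out.println(partFileName("X", true, 0));   // X-r-00000
        System.out.println(partFileName("X", false, 3));  // X-m-00003
    }
}
```

With several reduce tasks all asked to write "X", each one ends up with its own X-r-NNNNN file, which is why a bare "X" cannot be used as the file name.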
