Then you may want to look at MultipleOutputFormat; it can do what you need.
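For Lincoln's case (the key chooses the file and only the value is written), the idea behind MultipleOutputFormat can be modelled in a few lines. This is a simplified in-memory sketch, not Hadoop code; the class KeyRoutedOutput and its methods are hypothetical. In Hadoop itself, the corresponding hooks on MultipleTextOutputFormat are generateFileNameForKeyValue(key, value, name), which picks the file, and generateActualKey(key, value), which can return null. Whether a null key is then omitted from the line depends on the TextOutputFormat writer in your version, so treat that part as an assumption to check.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of key-routed output; NOT Hadoop code.
class KeyRoutedOutput {
    // Maps each output file name to the text written into it, in first-write order.
    private final Map<String, StringBuilder> files = new LinkedHashMap<>();

    // Plays the role of generateFileNameForKeyValue: the key names the file.
    static String fileNameFor(String key) {
        return key;
    }

    // Appends only the value plus a newline, so the key never appears in the file.
    void collect(String key, String value) {
        files.computeIfAbsent(fileNameFor(key), f -> new StringBuilder())
             .append(value)
             .append('\n');
    }

    // Returns everything written to a file, or "" if nothing was written to it.
    String contentsOf(String file) {
        StringBuilder sb = files.get(file);
        return sb == null ? "" : sb.toString();
    }

    public static void main(String[] args) {
        KeyRoutedOutput out = new KeyRoutedOutput();
        out.collect("chunk-a", "first line of a");
        out.collect("chunk-b", "only line of b");
        out.collect("chunk-a", "second line of a");
        // Prints the two "chunk-a" values, one per line, with no key column.
        System.out.print(out.contentsOf("chunk-a"));
    }
}
```

The file name is derived from the key at write time, so the consuming application only ever sees the values.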
On Tue, Jul 29, 2008 at 10:11 PM, Lincoln Ritter <[EMAIL PROTECTED]> wrote:
> Thanks for the info!
>
>> Not sure what happens if you write NULL as key or value.
>
> Looking at the code, it doesn't seem to really make a difference, and
> the function in question (basically 'collect') looks to be robust to
> null - but I may be missing something!
>
> In my case, I basically want the key to be the output filename, and
> the data in the files to be directly consumable by my app. Having the
> key show up in the file complicates things on the app side, so I'm
> trying to avoid this. Passing null seems to work for now.
>
> -lincoln
>
> --
> lincolnritter.com
>
> On Tue, Jul 29, 2008 at 9:27 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
>> On Thu, Jul 24, 2008 at 12:32 AM, Lincoln Ritter
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Alejandro said:
>>>> Take a look at the MultipleOutputFormat class or MultipleOutputs (in SVN
>>>> tip)
>>>
>>> I'm muddling through both
>>> http://issues.apache.org/jira/browse/HADOOP-2906 and
>>> http://issues.apache.org/jira/browse/HADOOP-3149 trying to make sense
>>> of these. I'm a little confused by the way this works, but it looks
>>> like I can define a number of named outputs, which enables different
>>> output formats, and I can also define some of these as "multi",
>>> meaning that I can write to different "targets" (like files).
>>> Is this correct?
>>
>> Exactly.
>>
>> ....
>>
>>> A couple of questions:
>>>
>>> - I needed to pass 'null' to the collect method so as to not write
>>> the key to the file. These files are meant to be consumable chunks of
>>> content, so I want to control exactly what goes into them. Does this
>>> seem normal, or am I missing something? Is there a downside to passing
>>> null here?
>>
>> Not sure what happens if you write NULL as key or value.
>>
>>> - What is the 'part-00000' file for? I have seen this in other
>>> places in the dfs, but it seems extraneous here. It's not super
>>> critical, but if I can make it go away that would be great.
>>
>> This is the standard output of the M/R job: whatever is written to the
>> OutputCollector you get in the reduce() call (or in the map() call
>> when reduce=0).
>>
>>> - What is the purpose of the '-r-00000' suffix? Perhaps it is to
>>> help with collisions?
>>
>> Yes, files written from a map have '-m-', files written from a reduce have
>> '-r-'.
>>
>>> I guess it seems strange that I can't just say
>>> "the output file should be called X" and have an output file called X
>>> appear.
>>
>> Well, you need the map/reduce mask and the task number mask to avoid
>> collisions.
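Alejandro's point about the masks can be illustrated with a small sketch. The helper below is hypothetical (not Hadoop's implementation) and assumes the pattern seen in this thread: a base name, then '-m-' or '-r-' depending on the task type, then the task number zero-padded to five digits. The job's standard output uses the 'part-00000' form mentioned in the thread instead.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper modelling the "<name>-<m|r>-<task number>" pattern
// discussed in the thread; NOT Hadoop's actual implementation.
class OutputNaming {
    static String fileName(String baseName, boolean fromMap, int taskNumber) {
        return String.format("%s-%s-%05d", baseName, fromMap ? "m" : "r", taskNumber);
    }

    public static void main(String[] args) {
        Set<String> names = new LinkedHashSet<>();
        // Three reduce tasks and two map tasks all write a named output "X".
        for (int t = 0; t < 3; t++) names.add(fileName("X", false, t));
        for (int t = 0; t < 2; t++) names.add(fileName("X", true, t));
        // Five writers, five distinct files, so no task overwrites another's output.
        System.out.println(names);
        // prints [X-r-00000, X-r-00001, X-r-00002, X-m-00000, X-m-00001]
    }
}
```

If every task simply wrote a file called "X", the five writers above would all target one name; the task-type and task-number parts are what keep them apart.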