This seems like a reasonable solution, but I am using Hadoop streaming and my reducer is a Perl script. Is it possible to handle side-effect files in streaming? I haven't found anything that indicates that you can...
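For what it's worth, a streaming reducer doesn't need any special API to write side-effect files: since the reducer sees all values for a key contiguously in sorted order, it can just open a new file whenever the key changes. Here is a minimal sketch (in Python rather than Perl, but the same logic applies); the `key-<key>.txt` filename and the use of the `mapred_work_output_dir` environment variable (streaming's env-var form of `mapred.work.output.dir`) are my assumptions, not anything from the thread:

```python
import os
import sys

def reduce_stream(lines, out_dir):
    """Group sorted "key\\tvalue" reducer input and write one file per key.

    Sketch only: assumes Hadoop streaming's sorted reducer input, and that
    out_dir is the task-specific side-effect directory (hypothetically taken
    from the mapred_work_output_dir environment variable).
    """
    current_key, handle = None, None
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            # Key boundary: close the previous key's file, start a new one.
            if handle:
                handle.close()
            current_key = key
            handle = open(os.path.join(out_dir, "key-%s.txt" % key), "w")
        handle.write(value + "\n")
    if handle:
        handle.close()

if __name__ == "__main__":
    reduce_stream(sys.stdin, os.environ.get("mapred_work_output_dir", "."))
```

The same pattern in Perl would track the previous key in a scalar and reopen a filehandle on each key change.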
Ashish

On Tue, Apr 1, 2008 at 9:13 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> Try opening the desired output file in the reduce method. Make sure that
> the output files are relative to the correct task specific directory (look
> for side-effect files on the wiki).
>
> On 4/1/08 5:57 PM, "Ashish Venugopal" <[EMAIL PROTECTED]> wrote:
>
> > Hi, I am using Hadoop streaming and I am trying to create a MapReduce
> > that will generate output where a single key is found in a single output
> > part file. Does anyone know how to ensure this condition? I want the
> > reduce task (no matter how many are specified) to only receive key-value
> > output from a single key each, process the key-value pairs for this key,
> > write an output part-XXX file, and only then process the next key.
> >
> > Here is the task that I am trying to accomplish:
> >
> > Input: Corpus T (lines of text), Corpus V (each line has 1 word)
> > Output: Each part-XXX should contain the lines of T that contain the
> > word from line XXX in V.
> >
> > Any help/ideas are appreciated.
> >
> > Ashish
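The map side of the original task is straightforward regardless of how the output files are handled: emit one (word, line) pair per vocabulary word that appears in a line of T, so the shuffle groups all of a word's lines at one reducer. A minimal sketch, assuming Corpus V has been shipped to each task as a local file named `vocab.txt` (e.g. via streaming's `-file` option; that filename is my assumption):

```python
import sys

def map_stream(lines, vocab, out):
    """Emit "word\\tline" for every vocab word that occurs in a line of T."""
    for line in lines:
        text = line.rstrip("\n")
        words = set(text.split())  # whitespace tokenization, for illustration
        for word in vocab:
            if word in words:
                # Key = vocab word, value = the matching line of T.
                out.write("%s\t%s\n" % (word, text))

if __name__ == "__main__":
    with open("vocab.txt") as f:
        vocab = [w.strip() for w in f if w.strip()]
    map_stream(sys.stdin, vocab, sys.stdout)
```

Paired with a per-key reducer, each vocabulary word's matching lines end up grouped together; getting exactly one key per part-XXX file would additionally require one reduce task per word or per-key side-effect files as Ted suggests.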