This seems like a reasonable solution, but I am using Hadoop streaming and my reducer is a Perl script. Is it possible to handle side-effect files in streaming? I haven't found anything that indicates that you can...
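For what it's worth, a streaming reducer doesn't need any special API to write side-effect files: since the reducer sees all values for a key contiguously in sorted order, it can just open a new file whenever the key changes. Here is a minimal sketch (in Python rather than Perl, but the same logic applies); the `key-<key>.txt` filename and the use of the `mapred_work_output_dir` environment variable (streaming's env-var form of `mapred.work.output.dir`) are my assumptions, not anything from the thread:

```python
import os
import sys

def reduce_stream(lines, out_dir):
    """Group sorted "key\\tvalue" reducer input and write one file per key.

    Sketch only: assumes Hadoop streaming's sorted reducer input, and that
    out_dir is the task-specific side-effect directory (hypothetically taken
    from the mapred_work_output_dir environment variable).
    """
    current_key, handle = None, None
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            # Key boundary: close the previous key's file, start a new one.
            if handle:
                handle.close()
            current_key = key
            handle = open(os.path.join(out_dir, "key-%s.txt" % key), "w")
        handle.write(value + "\n")
    if handle:
        handle.close()

if __name__ == "__main__":
    reduce_stream(sys.stdin, os.environ.get("mapred_work_output_dir", "."))
```

The same pattern in Perl would track the previous key in a scalar and reopen a filehandle on each key change.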
Ashish

On Tue, Apr 1, 2008 at 9:13 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> Try opening the desired output file in the reduce method. Make sure that
> the output files are relative to the correct task specific directory (look
> for side-effect files on the wiki).
>
> On 4/1/08 5:57 PM, "Ashish Venugopal" <[EMAIL PROTECTED]> wrote:
>
> > Hi, I am using Hadoop streaming and I am trying to create a MapReduce
> > that will generate output where a single key is found in a single output
> > part file. Does anyone know how to ensure this condition? I want the
> > reduce task (no matter how many are specified) to only receive key-value
> > output from a single key each, process the key-value pairs for this key,
> > write an output part-XXX file, and only then process the next key.
> >
> > Here is the task that I am trying to accomplish:
> >
> > Input: Corpus T (lines of text), Corpus V (each line has 1 word)
> > Output: Each part-XXX should contain the lines of T that contain the
> > word from line XXX in V.
> >
> > Any help/ideas are appreciated.
> >
> > Ashish
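The map side of the original task is straightforward regardless of how the output files are handled: emit one (word, line) pair per vocabulary word that appears in a line of T, so the shuffle groups all of a word's lines at one reducer. A minimal sketch, assuming Corpus V has been shipped to each task as a local file named `vocab.txt` (e.g. via streaming's `-file` option; that filename is my assumption):

```python
import sys

def map_stream(lines, vocab, out):
    """Emit "word\\tline" for every vocab word that occurs in a line of T."""
    for line in lines:
        text = line.rstrip("\n")
        words = set(text.split())  # whitespace tokenization, for illustration
        for word in vocab:
            if word in words:
                # Key = vocab word, value = the matching line of T.
                out.write("%s\t%s\n" % (word, text))

if __name__ == "__main__":
    with open("vocab.txt") as f:
        vocab = [w.strip() for w in f if w.strip()]
    map_stream(sys.stdin, vocab, sys.stdout)
```

Paired with a per-key reducer, each vocabulary word's matching lines end up grouped together; getting exactly one key per part-XXX file would additionally require one reduce task per word or per-key side-effect files as Ted suggests.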