You can open your sequence file in the mapper's configure method, write to it
in your map method, and close it in the mapper's close method.
You then end up with one sequence file per map task. I am assuming that
each key/value pair passed to your map somehow represents a single XML file or item.
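
A rough sketch of that open-once, append-per-record, close-once lifecycle is
below. To keep it runnable on its own, plain java.io stands in for Hadoop: the
class name PerTaskFileMapper and the length-prefixed record layout are made up
for illustration and are not Hadoop's SequenceFile format.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a mapper that owns one output file per task.
class PerTaskFileMapper {
    // Plays the role that a SequenceFile.Writer would play in a real job.
    private DataOutputStream out;

    // Called once before any record is seen: open the single output file.
    void configure(Path outputFile) throws IOException {
        out = new DataOutputStream(Files.newOutputStream(outputFile));
    }

    // Called once per input record: append to the already-open file
    // instead of creating a new small file for every XML document.
    void map(String key, String value) throws IOException {
        writeLengthPrefixed(key);
        writeLengthPrefixed(value);
    }

    // Called once after the last record: flush and close the file.
    void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }

    // Assumed record layout for this sketch only: 4-byte length, then UTF-8 bytes.
    private void writeLengthPrefixed(String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }
}
```

In an actual old-API mapper the three methods would be configure(JobConf),
map(...), and close(), and the stream would be a SequenceFile.Writer obtained
from SequenceFile.createWriter, with writer.append(key, value) called in map.
Each task needs its own output file name so that concurrent tasks do not
collide.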

On Wed, Jun 17, 2009 at 7:29 PM, Jothi Padmanabhan <joth...@yahoo-inc.com> wrote:

> You could look at CombineFileInputFormat to generate a single split out of
> several files.
>
> Your partitioner would be able to assign keys to specific reducers, but you
> would not have control over which node a given reduce task will run on.
>
> Jothi
>
>
> On 6/18/09 5:10 AM, "Tarandeep Singh" <tarand...@gmail.com> wrote:
>
> > Hi,
> >
> > Can I restrict the output of mappers running on a node to go to reducer(s)
> > running on the same node?
> >
> > Let me explain why I want to do this:
> >
> > I am converting a huge number of XML files into SequenceFiles. So
> > theoretically I don't even need reducers; mappers would read XML files and
> > output SequenceFiles. But the problem with this approach is that I will end
> > up with a huge number of small output files.
> >
> > To avoid generating a large number of small files, I can use Identity
> > reducers. But by running reducers, I am unnecessarily transferring data over
> > the network. I ran a test using a small subset of my data (~90GB). With a
> > map-only job, my cluster finished the conversion in only 6 minutes. But with
> > a map and Identity reducers job, it takes around 38 minutes.
> >
> > I have to process close to a terabyte of data, so I was thinking of faster
> > alternatives:
> >
> > * Writing a custom OutputFormat
> > * Somehow restricting the output of mappers running on a node to go to
> >   reducers running on the same node. Maybe I can write my own partitioner
> >   (simple), but I am not sure how Hadoop's framework assigns partitions to
> >   reduce tasks.
> >
> > Any pointers?
> >
> > Or is this not possible at all?
> >
> > Thanks,
> > Tarandeep
>
>


-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
