You can open your sequence file in the mapper's configure method, write to it in your map method, and close it in the mapper's close method. You then end up with one sequence file per map task. I am assuming that each key/value pair passed to your map somehow represents a single XML file or item.
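Here is a rough sketch of what I mean, using the old org.apache.hadoop.mapred API. The class name, the Text/Text key and value types (file name and XML content), and the ".seq" file naming are my assumptions for illustration, not something from your job; adjust them to whatever your input format actually delivers.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper: assumes key = XML file name, value = XML content.
public class XmlToSequenceFileMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {

  private SequenceFile.Writer writer;

  @Override
  public void configure(JobConf job) {
    try {
      // Write into the task's work directory so that output from failed
      // or speculative attempts is not promoted to the job output.
      Path dir = FileOutputFormat.getWorkOutputPath(job);
      // Name the file after the task attempt so each map task gets its own file.
      Path file = new Path(dir, job.get("mapred.task.id") + ".seq");
      FileSystem fs = file.getFileSystem(job);
      writer = SequenceFile.createWriter(fs, job, file, Text.class, Text.class);
    } catch (IOException e) {
      // configure() cannot throw checked exceptions, so wrap and rethrow.
      throw new RuntimeException("Could not open sequence file", e);
    }
  }

  @Override
  public void map(Text key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Append directly to the per-task file; nothing goes to the collector.
    writer.append(key, value);
  }

  @Override
  public void close() throws IOException {
    if (writer != null) {
      writer.close();
    }
  }
}
```

You would run this as a map-only job (JobConf.setNumReduceTasks(0)), so nothing is shuffled over the network. The number of output files then equals the number of map tasks, so fewer, larger splits give fewer, larger sequence files.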
On Wed, Jun 17, 2009 at 7:29 PM, Jothi Padmanabhan <joth...@yahoo-inc.com> wrote:

> You could look at CombineFileInputFormat to generate a single split out of
> several files.
>
> Your partitioner would be able to assign keys to specific reducers, but you
> would not have control on which node a given reduce task will run.
>
> Jothi
>
>
> On 6/18/09 5:10 AM, "Tarandeep Singh" <tarand...@gmail.com> wrote:
>
> > Hi,
> >
> > Can I restrict the output of mappers running on a node to go to reducer(s)
> > running on the same node?
> >
> > Let me explain why I want to do this-
> >
> > I am converting huge number of XML files into SequenceFiles. So
> > theoretically I don't even need reducers, mappers would read xml files and
> > output Sequencefiles. But the problem with this approach is I will end up
> > getting huge number of small output files.
> >
> > To avoid generating large number of smaller files, I can Identity reducers.
> > But by running reducers, I am unnecessarily transfering data over network. I
> > ran some test case using a small subset of my data (~90GB). With map only
> > jobs, my cluster finished conversion in only 6 minutes. But with map and
> > Identity reducers job, it takes around 38 minutes.
> >
> > I have to process close to a terabyte of data. So I was thinking of a faster
> > alternatives-
> >
> > * Writing a custom OutputFormat
> > * Somehow restrict output of mappers running on a node to go to reducers
> >   running on the same node. May be I can write my own partitioner (simple) but
> >   not sure how Hadoop's framework assigns partitions to reduce tasks.
> >
> > Any pointers ?
> >
> > Or this is not possible at all ?
> >
> > Thanks,
> > Tarandeep

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals