Yeah, I'm taking the path of chained mappers at this point. If anything, it will 
give me a sorted output in the end. It's just too bad to have to run 3 jobs when 
I'm sure Hadoop provides an elegant way to do it in one. 

----- Original Message -----
From: Lance Norskog
Sent: 08/18/10 10:11 PM
To: [email protected]
Subject: Re: MultiFilterRecordReader

Hadoop has a toolkit called 'map-side joins' which requires sorted input 
tables. org.apache.hadoop.examples.Join.java shows how. Good luck decoding it! 
Could you use chained mapper tasks to sort each input set before using the join 
framework?

On Wed, Aug 18, 2010 at 10:10 AM, y l <[email protected]> wrote:
> Hi,
>
> My first email on the list, and overall pretty new to Hadoop, so I'm
> hoping to find some help with a new task I have to do for work.
> I need to do a join between 2 sets of files. One is a bunch of csv files
> and the other set is sequence files.
>
> I was told MultiFilterRecordReader could help me do the join, but I
> haven't been able to find a good example of where and how to use that
> class to do the join.
> I have found a good example using CompositeInputFormat here:
> http://www.congiu.com/node/5
> But it assumes that the input is sorted, and I can't guarantee that it
> will be, on the csv files at least.
>
> Does anyone know what I need to do with that MultiFilterRecordReader?
> Inherit it on the mapper? I'm a little confused... Please let me know
> if you have any pointers on that one.
>
> Thanks.
>

--
Lance Norskog
[email protected]
