Re: joins in map reduce

2008-06-30 Thread Jason Venner

I have just started to try using the Join operators.

The join I am trying is this:
join is outer(tbl(org.apache.hadoop.mapred.SequenceFileInputFormat,"Input1"),tbl(org.apache.hadoop.mapred.SequenceFileInputFormat,"IndexedTry1"))


but I get this error:
08/06/30 08:55:13 INFO mapred.FileInputFormat: Total input paths to process : 10
Exception in thread "main" java.io.IOException: No input paths specified in input
   at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:115)
   at org.apache.hadoop.mapred.join.Parser$WNode.getSplits(Parser.java:304)
   at org.apache.hadoop.mapred.join.Parser$CNode.getSplits(Parser.java:375)
   at org.apache.hadoop.mapred.join.CompositeInputFormat.getSplits(CompositeInputFormat.java:131)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:544)

I am clearly missing something basic...

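   // CompositeInputFormat performs the map-side join; the tables to join
   // come from the mapred.join.expr property set below.
   conf.setInputFormat(CompositeInputFormat.class);
   conf.setOutputPath( outputDirectory );
   conf.setOutputKeyClass(Text.class);
   conf.setOutputValueClass(Text.class);
   conf.setOutputFormat(MapFileOutputFormat.class);
   conf.setMapperClass( LeftHandJoinMapper.class );
   conf.setReducerClass( IdentityReducer.class );
   // Map-only job: with zero reduce tasks the reducer class is never run.
   conf.setNumReduceTasks(0);

   // Build the join expression once, print it, and store it in the job.
   String joinExpr =
      CompositeInputFormat.compose("outer", SequenceFileInputFormat.class, allTables);
   System.err.println( "join is " + joinExpr );
   conf.set("mapred.join.expr", joinExpr);

   // Note: runJob() is static and creates its own JobClient, so this
   // client instance is not used to submit the job.
   JobClient client = new JobClient();
   client.setConf( conf );

   RunningJob job = JobClient.runJob( conf );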
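The value stored under mapred.join.expr follows a small grammar: a join operator such as outer wrapping one tbl(InputFormat,"path") term per table, with each path in double quotes. As a hedged illustration only, here is a Hadoop-free helper that builds a string of that assumed shape. The class and method (JoinExpr.compose) are hypothetical names, not Hadoop's CompositeInputFormat API.

```java
import java.util.StringJoiner;

// Hypothetical helper, NOT Hadoop's CompositeInputFormat: builds a join
// expression string of the assumed form
//   op(tbl(<InputFormat class>,"<path>"),tbl(<InputFormat class>,"<path>"),...)
class JoinExpr {
    static String compose(String op, String inputFormatClass, String... paths) {
        // The operator wraps the comma-separated list of tables.
        StringJoiner tables = new StringJoiner(",", op + "(", ")");
        for (String path : paths) {
            // Each table names its InputFormat and a double-quoted input path.
            tables.add("tbl(" + inputFormatClass + ",\"" + path + "\")");
        }
        return tables.toString();
    }

    public static void main(String[] args) {
        System.out.println(compose("outer",
            "org.apache.hadoop.mapred.SequenceFileInputFormat",
            "Input1", "IndexedTry1"));
    }
}
```

Printing the expression before submitting, as the code above does, makes it easy to check that every table in the join carries the path you expect.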



Shirley Cohen wrote:

Hi,

How does one do a join operation in map reduce? Is there more than one 
way to do a join? Which way works better and why?


Thanks,

Shirley

--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if 
interested


Re: joins in map reduce

2008-05-22 Thread Ted Dunning


Also, if one source of the join is small enough to fit in memory, you can
build an in-memory table and do the map-side join on unsorted data.
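A minimal sketch of that in-memory (hash) join, in plain Java rather than the Hadoop API: the small side is held in a HashMap and each record of the large, unsorted side is looked up by key as it streams past. The class name InMemoryJoin is hypothetical, and the sketch assumes the small side has at most one value per key and fits in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of an in-memory (hash) join. Plain Java, not the Hadoop API; in a real
// job the small table would typically be loaded once per map task and the
// lookup below would run in map() for each record of the large input.
class InMemoryJoin {
    // Assumes at most one value per key on the small side (hence a Map).
    // Inner join: emits "key\tbigValue\tsmallValue" for each large-side record
    // whose key appears in the small table; output order follows the large input.
    static List<String> join(Map<String, String> smallTable, List<String[]> bigRecords) {
        List<String> out = new ArrayList<>();
        for (String[] rec : bigRecords) {
            String match = smallTable.get(rec[0]);  // hash lookup, no sorting needed
            if (match != null) {
                out.add(rec[0] + "\t" + rec[1] + "\t" + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> users = new HashMap<>();  // the small side
        users.put("u1", "Alice");
        users.put("u2", "Bob");
        List<String[]> events = new ArrayList<>();    // the large, unsorted side
        events.add(new String[] {"u2", "click"});
        events.add(new String[] {"u9", "view"});
        events.add(new String[] {"u1", "buy"});
        for (String line : join(users, events)) {
            System.out.println(line);
        }
    }
}
```

Because the small side is a hash table, the large input needs no sorting or partitioning; the price is that the entire small side must fit in each mapper's memory.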





Re: joins in map reduce

2008-05-21 Thread Owen O'Malley


On May 21, 2008, at 11:16 AM, Shirley Cohen wrote:

How does one do a join operation in map reduce? Is there more than  
one way to do a join? Which way works better and why?


There are a couple of ways, depending on what you need to do. If your  
input data is sorted and partitioned equivalently on the same key,  
you can do a join before the map (aka map-side join). The  
documentation is at:  http://tinyurl.com/5v4rot
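The map-side case works because matching partitions of both inputs arrive sorted on the same key, so they can be merged in one pass. The sketch below shows that merge step on two in-memory lists. It assumes both lists are already sorted by key in String order, performs an inner join only, and is an illustration of the sort-merge technique, not the code of Hadoop's join package; SortedMergeJoin is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a sort-merge inner join over two inputs already sorted by key.
// Each record is {key, value}. Not Hadoop code; illustration only.
class SortedMergeJoin {
    static List<String> join(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            String key = left.get(i)[0];
            int cmp = key.compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++;                      // left key has no partner yet; advance left
            } else if (cmp > 0) {
                j++;                      // right key has no partner yet; advance right
            } else {
                // Find the run of equal keys on each side.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd)[0].equals(key)) iEnd++;
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd)[0].equals(key)) jEnd++;
                // Emit the cross product of the two runs.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(key + "\t" + left.get(a)[1] + "\t" + right.get(b)[1]);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = new ArrayList<>();
        left.add(new String[] {"a", "1"});
        left.add(new String[] {"b", "2"});
        left.add(new String[] {"d", "4"});
        List<String[]> right = new ArrayList<>();
        right.add(new String[] {"b", "x"});
        right.add(new String[] {"d", "z"});
        for (String line : join(left, right)) {
            System.out.println(line);
        }
    }
}
```

Each input is read once, front to back, which is only possible because both sides are sorted on the same key; that is why the map-side join needs equivalently sorted and partitioned inputs.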


If your data is not sorted and partitioned consistently, you need to
do the join in the reduce. There is a library to help at:
http://tinyurl.com/5cz669
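For the reduce-side case, one common approach (assumed here; not necessarily what the linked library does) is to tag each record in the map with the input it came from, let the shuffle group records by key, and pair up the tagged values in the reduce. The plain-Java simulation below stands in for those three phases, with a TreeMap playing the role of the sorted, grouped reduce input. ReduceSideJoin is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulation of a reduce-side inner join; inputs need not be sorted.
// Each record is {key, value}. Not Hadoop code; illustration only.
class ReduceSideJoin {
    static List<String> join(List<String[]> left, List<String[]> right) {
        // "Map" + "shuffle": tag each value with its source ("L" or "R")
        // and group by key; TreeMap keeps keys sorted like reduce input.
        Map<String, List<String[]>> grouped = new TreeMap<>();
        for (String[] r : left) {
            grouped.computeIfAbsent(r[0], k -> new ArrayList<>()).add(new String[] {"L", r[1]});
        }
        for (String[] r : right) {
            grouped.computeIfAbsent(r[0], k -> new ArrayList<>()).add(new String[] {"R", r[1]});
        }

        // "Reduce": per key, split values by tag and emit their cross product.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : grouped.entrySet()) {
            List<String> ls = new ArrayList<>();
            List<String> rs = new ArrayList<>();
            for (String[] tagged : e.getValue()) {
                if (tagged[0].equals("L")) {
                    ls.add(tagged[1]);
                } else {
                    rs.add(tagged[1]);
                }
            }
            for (String l : ls) {
                for (String r : rs) {
                    out.add(e.getKey() + "\t" + l + "\t" + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = new ArrayList<>();
        left.add(new String[] {"d", "4"});
        left.add(new String[] {"b", "2"});
        List<String[]> right = new ArrayList<>();
        right.add(new String[] {"b", "x"});
        right.add(new String[] {"c", "y"});
        for (String line : join(left, right)) {
            System.out.println(line);
        }
    }
}
```

Here the shuffle does the grouping that pre-sorted inputs provide in the map-side case, at the cost of moving every record of both inputs from the maps to the reduces.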


-- Owen