The contrib/data_join framework is different from the map-side join
framework under org.apache.hadoop.mapred.join.
To see what the example is doing in an outer join, generate a few
sample text input files, tab-separated:
join/a.txt:
AAAAAAAA a0
BBBBBBBB a1
CCCCCCCC a2
CCCCCCCC a3
join/b.txt:
AAAAAAAA b0
BBBBBBBB b1
BBBBBBBB b2
BBBBBBBB b3
join/c.txt:
AAAAAAAA c0
BBBBBBBB c1
DDDDDDDD c2
DDDDDDDD c3
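
One way to create these files with real tab separators is sketched here.
It assumes a POSIX shell with printf and a working directory where a
local join/ directory can be created. The commented-out upload step is
only an assumption for setups where the job reads its input from HDFS.

```shell
# Create the three sample inputs; \t is a literal tab between key and value.
mkdir -p join
printf 'AAAAAAAA\ta0\nBBBBBBBB\ta1\nCCCCCCCC\ta2\nCCCCCCCC\ta3\n' > join/a.txt
printf 'AAAAAAAA\tb0\nBBBBBBBB\tb1\nBBBBBBBB\tb2\nBBBBBBBB\tb3\n' > join/b.txt
printf 'AAAAAAAA\tc0\nBBBBBBBB\tc1\nDDDDDDDD\tc2\nDDDDDDDD\tc3\n' > join/c.txt

# Assumption: only needed when the inputs must live in HDFS.
# bin/hadoop fs -put join join
```

Each file is already sorted by key, matching the listings above.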
Run the example with each file as an input:
host$ bin/hadoop jar hadoop-*-examples.jar join \
-inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
-outKey org.apache.hadoop.io.Text \
-joinOp outer \
join/a.txt join/b.txt join/c.txt joinout
Examine the result in joinout/part-00000:
host$ bin/hadoop fs -text joinout/part-00000 | less
AAAAAAAA [a0,b0,c0]
BBBBBBBB [a1,b1,c1]
BBBBBBBB [a1,b2,c1]
BBBBBBBB [a1,b3,c1]
CCCCCCCC [a2,,]
CCCCCCCC [a3,,]
DDDDDDDD [,,c2]
DDDDDDDD [,,c3]
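
To see why the rows come out this way, here is a rough local sketch of
the outer-join semantics in awk. It is not how Hadoop implements the
join. The function name preview_outer_join is made up for illustration,
and it assumes exactly three tab-separated input files. For each key it
emits the cross product of the values from each source, with an empty
slot for any source that lacks the key.

```shell
# Hypothetical helper: local preview of a three-way outer join.
preview_outer_join() {
  awk -F'\t' '
    FNR == 1 { src++ }                    # first line of a file: next source
    {
      seen[$1] = 1                        # remember every key from any source
      cnt[src, $1]++                      # number of values for this key here
      val[src, $1, cnt[src, $1]] = $2
    }
    END {
      for (k in seen) {
        # A source without the key still contributes one (empty) slot.
        na = (cnt[1, k] > 0) ? cnt[1, k] : 1
        nb = (cnt[2, k] > 0) ? cnt[2, k] : 1
        nc = (cnt[3, k] > 0) ? cnt[3, k] : 1
        for (i = 1; i <= na; i++)
          for (j = 1; j <= nb; j++)
            for (l = 1; l <= nc; l++)
              printf "%s\t[%s,%s,%s]\n", k, val[1, k, i], val[2, k, j], val[3, k, l]
      }
    }
  ' "$1" "$2" "$3" | LC_ALL=C sort        # awk key order is unspecified, so sort
}
```

Calling preview_outer_join join/a.txt join/b.txt join/c.txt on the
sample files gives the same eight rows as above: BBBBBBBB appears three
times because b.txt has three values for it, and CCCCCCCC and DDDDDDDD
keep empty slots for the sources that do not contain them.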
-C
On Aug 7, 2008, at 11:39 PM, Wei Wu wrote:
There are some examples in $HADOOPHOME/src/contrib/data_join, which
I hope would help.
Wei
-----Original Message-----
From: John DeTreville [mailto:[EMAIL PROTECTED]
Sent: Friday, August 08, 2008 2:34 AM
To: core-user@hadoop.apache.org
Subject: "Join" example
Hadoop ships with a few example programs. One of these is "join," which
I believe demonstrates map-side joins. I'm finding its usage
instructions a little impenetrable; could anyone send me instructions
that are more like "type this" then "type this" then "type this"?
Thanks in advance.
Cheers,
John