The contrib/data_join framework is different from the map-side join framework, which lives under o.a.h.mapred.join and is what the "join" example demonstrates.

To see what the example does for an outer join, generate a few sample tab-separated text input files:


AAAAAAAA        a0
BBBBBBBB        a1
CCCCCCCC        a2
CCCCCCCC        a3


AAAAAAAA        b0
BBBBBBBB        b1
BBBBBBBB        b2
BBBBBBBB        b3


AAAAAAAA        c0
BBBBBBBB        c1
DDDDDDDD        c2
DDDDDDDD        c3
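
One way to create these files locally is sketched below. The names a.txt, b.txt, and c.txt and the join/ directory are taken from the command line used to run the example; which block of records goes in which file is an assumption based on the a/b/c value prefixes.

```shell
# Local directory matching the join/ path used on the command line.
mkdir -p join

# printf expands \t to a tab and \n to a newline, so each file gets
# four tab-separated key/value records, already sorted by key.
printf 'AAAAAAAA\ta0\nBBBBBBBB\ta1\nCCCCCCCC\ta2\nCCCCCCCC\ta3\n' > join/a.txt
printf 'AAAAAAAA\tb0\nBBBBBBBB\tb1\nBBBBBBBB\tb2\nBBBBBBBB\tb3\n' > join/b.txt
printf 'AAAAAAAA\tc0\nBBBBBBBB\tc1\nDDDDDDDD\tc2\nDDDDDDDD\tc3\n' > join/c.txt
```

If the job reads from HDFS, the directory can then be copied over with something like "bin/hadoop fs -put join join" (assuming the relative destination path resolves to your HDFS home directory).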

Run the example with all three files (here join/a.txt, join/b.txt, and join/c.txt) as inputs:

host$ bin/hadoop jar hadoop-*-examples.jar join \
  -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
  -outKey org.apache.hadoop.io.Text \
  -joinOp outer \
  join/a.txt join/b.txt join/c.txt joinout

Examine the result in joinout/part-00000:

host$ bin/hadoop fs -text joinout/part-00000 | less
AAAAAAAA        [a0,b0,c0]
BBBBBBBB        [a1,b1,c1]
BBBBBBBB        [a1,b2,c1]
BBBBBBBB        [a1,b3,c1]
CCCCCCCC        [a2,,]
CCCCCCCC        [a3,,]
DDDDDDDD        [,,c2]
DDDDDDDD        [,,c3]
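
The shape of that output can be reproduced locally without Hadoop. The following is only an illustrative sketch of the outer-join semantics visible above (per key, the cross product of the values from each source, with an empty slot for any source that lacks the key). It is not how the framework is implemented; outer_join is a hypothetical helper name, and it is hard-wired to exactly three inputs.

```shell
# Hypothetical helper: simulate a three-way outer join over
# tab-separated key/value files, for illustration only.
outer_join() {
  awk -F'\t' '
    FNR == 1 { src++ }               # first line of the next input file
    {
      keys[$1] = 1
      n[$1, src]++                   # number of values for this key in this source
      val[$1, src, n[$1, src]] = $2
    }
    END {
      for (k in keys) {
        # A source that lacks the key still contributes one (empty) slot.
        na = n[k, 1] ? n[k, 1] : 1
        nb = n[k, 2] ? n[k, 2] : 1
        nc = n[k, 3] ? n[k, 3] : 1
        for (i = 1; i <= na; i++)
          for (j = 1; j <= nb; j++)
            for (l = 1; l <= nc; l++)
              printf "%s\t[%s,%s,%s]\n", k, val[k, 1, i], val[k, 2, j], val[k, 3, l]
      }
    }
  ' "$@" | LC_ALL=C sort             # awk iterates keys in arbitrary order
}
```

Calling "outer_join join/a.txt join/b.txt join/c.txt" on the sample files should print the same eight records as the job output above.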


On Aug 7, 2008, at 11:39 PM, Wei Wu wrote:

There are some examples in $HADOOP_HOME/src/contrib/data_join, which I hope
will help.


-----Original Message-----
From: John DeTreville [mailto:[EMAIL PROTECTED]
Sent: Friday, August 08, 2008 2:34 AM
Subject: "Join" example

Hadoop ships with a few example programs. One of these is "join," which
I believe demonstrates map-side joins. I'm finding its usage
instructions a little impenetrable; could anyone send me instructions
that are more like "type this" then "type this" then "type this"?

Thanks in advance.

