The contrib/data_join framework is different from the map-side join framework, which lives under o.a.h.mapred.join (org.apache.hadoop.mapred.join).

To see what the example does in an outer join, generate a few sample tab-separated text input files:

join/a.txt:

AAAAAAAA        a0
BBBBBBBB        a1
CCCCCCCC        a2
CCCCCCCC        a3

join/b.txt:

AAAAAAAA        b0
BBBBBBBB        b1
BBBBBBBB        b2
BBBBBBBB        b3

join/c.txt:

AAAAAAAA        c0
BBBBBBBB        c1
DDDDDDDD        c2
DDDDDDDD        c3
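
If typing literal tabs is awkward, the three files can be generated with a short script. This is only a sketch: the `write_samples` helper and the local `join` directory are names made up here, and it assumes the files are afterwards copied to whatever file system the job reads from (for example with `bin/hadoop fs -put`).

```python
from pathlib import Path

# Sample rows copied from the listings above: (key, value) pairs per file.
SAMPLES = {
    "a.txt": [("AAAAAAAA", "a0"), ("BBBBBBBB", "a1"),
              ("CCCCCCCC", "a2"), ("CCCCCCCC", "a3")],
    "b.txt": [("AAAAAAAA", "b0"), ("BBBBBBBB", "b1"),
              ("BBBBBBBB", "b2"), ("BBBBBBBB", "b3")],
    "c.txt": [("AAAAAAAA", "c0"), ("BBBBBBBB", "c1"),
              ("DDDDDDDD", "c2"), ("DDDDDDDD", "c3")],
}


def write_samples(directory):
    """Write each sample file with one literal tab between key and value."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, rows in SAMPLES.items():
        path = out_dir / name
        # Rows are written in the same (key-sorted) order as the listings.
        path.write_text("".join(f"{key}\t{value}\n" for key, value in rows))
        paths.append(path)
    return paths


# Hypothetical local directory name, chosen to match the paths used above.
write_samples("join")
```

The single tab matters because KeyValueTextInputFormat splits each line into key and value at the first tab.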

Run the example with all three files as inputs:

host$ bin/hadoop jar hadoop-*-examples.jar join \
  -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
  -outKey org.apache.hadoop.io.Text \
  -joinOp outer \
  join/a.txt join/b.txt join/c.txt joinout

Examine the result in joinout/part-00000:

host$ bin/hadoop fs -text joinout/part-00000 | less
AAAAAAAA        [a0,b0,c0]
BBBBBBBB        [a1,b1,c1]
BBBBBBBB        [a1,b2,c1]
BBBBBBBB        [a1,b3,c1]
CCCCCCCC        [a2,,]
CCCCCCCC        [a3,,]
DDDDDDDD        [,,c2]
DDDDDDDD        [,,c3]
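
To make the pattern in that output easier to read, here is a small in-memory model of it. It is not how Hadoop implements the join (the `outer_join` function is invented for illustration); it only reproduces the rows above: for every key, one output row per combination of matching values, with an empty slot for any input that lacks the key.

```python
from itertools import product

# The same rows as join/a.txt, join/b.txt and join/c.txt.
a = [("AAAAAAAA", "a0"), ("BBBBBBBB", "a1"), ("CCCCCCCC", "a2"), ("CCCCCCCC", "a3")]
b = [("AAAAAAAA", "b0"), ("BBBBBBBB", "b1"), ("BBBBBBBB", "b2"), ("BBBBBBBB", "b3")]
c = [("AAAAAAAA", "c0"), ("BBBBBBBB", "c1"), ("DDDDDDDD", "c2"), ("DDDDDDDD", "c3")]


def outer_join(*sources):
    """Illustrative model of the outer join output, not Hadoop's code."""
    # Group each source's values by key, keeping their original order.
    grouped = []
    for rows in sources:
        by_key = {}
        for key, value in rows:
            by_key.setdefault(key, []).append(value)
        grouped.append(by_key)

    result = []
    for key in sorted(set().union(*grouped)):
        # A source with no rows for this key contributes one empty slot.
        columns = [by_key.get(key, [""]) for by_key in grouped]
        for combo in product(*columns):
            result.append((key, "[" + ",".join(combo) + "]"))
    return result


for key, joined in outer_join(a, b, c):
    print(f"{key}\t{joined}")
```

BBBBBBBB appears three times because b.txt has three values for that key, each paired with the single value from a.txt and from c.txt.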

-C

On Aug 7, 2008, at 11:39 PM, Wei Wu wrote:

There are some examples in $HADOOP_HOME/src/contrib/data_join, which I hope
will help.

Wei

-----Original Message-----
From: John DeTreville [mailto:[EMAIL PROTECTED]
Sent: Friday, August 08, 2008 2:34 AM
To: core-user@hadoop.apache.org
Subject: "Join" example

Hadoop ships with a few example programs. One of these is "join," which
I believe demonstrates map-side joins. I'm finding its usage
instructions a little impenetrable; could anyone send me instructions
that are more like "type this" then "type this" then "type this"?

Thanks in advance.

Cheers,
John

