When I try the map-side join example (under Hadoop 0.17.1, running in
standalone mode under Win32), it fails with a NullPointerException:
$ cat One/some.txt
A 1
B 1
C 1
E 1
$ cat Two/some.txt
A 2
B 2
C 2
D 2
$ bin/hadoop jar *examples.jar join \
    -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
    -outKey org.apache.hadoop.io.Text \
    -joinOp outer One/some.txt Two/some.txt output
cygpath: cannot create short name of c:\Documents and Settings\jdd\Desktop\hadoop-0.17.1\logs
08/08/08 15:41:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
Job started: Fri Aug 08 15:41:34 PDT 2008
08/08/08 15:41:34 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
08/08/08 15:41:34 INFO mapred.FileInputFormat: Total input paths to process : 1
08/08/08 15:41:34 INFO mapred.FileInputFormat: Total input paths to process : 1
java.lang.NullPointerException
    at org.apache.hadoop.mapred.KeyValueTextInputFormat.isSplitable(KeyValueTextInputFormat.java:44)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:247)
    at org.apache.hadoop.mapred.join.Parser$WNode.getSplits(Parser.java:305)
    at org.apache.hadoop.mapred.join.Parser$CNode.getSplits(Parser.java:375)
    at org.apache.hadoop.mapred.join.CompositeInputFormat.getSplits(CompositeInputFormat.java:129)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:712)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
    at org.apache.hadoop.examples.Join.run(Join.java:154)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.Join.main(Join.java:163)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
$
I'll look around a little to see what the problem is. The attempt to
initialize the JVM metrics twice also seems suspicious.
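One hypothesis, which I haven't confirmed: the top frame is KeyValueTextInputFormat.isSplitable, and if that method relies on state (such as a compression codec factory) that is only set up in configure(), it will throw exactly this exception whenever the join code creates the input format without ever calling configure(). The self-contained sketch below models that pattern. ConfiguredFormat and FormatFactory are made-up names standing in for the real input format and for a reflective factory, and the suffix map is a stand-in for the codec factory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an input format whose isSplitable() depends on
// state created in configure(); the suffix map plays the role of a codec factory.
class ConfiguredFormat {
    private Map<String, String> codecsBySuffix = null;  // stays null until configure()

    void configure(Map<String, String> conf) {
        codecsBySuffix = new HashMap<>(conf);
    }

    boolean isSplitable(String fileName) {
        String suffix = fileName.substring(fileName.lastIndexOf('.') + 1);
        // Unconditional dereference: throws NullPointerException if configure() never ran.
        return codecsBySuffix.get(suffix) == null;
    }
}

// Hypothetical factory: like a reflective newInstance(cls, conf), it only
// calls configure() when it is handed a non-null configuration.
class FormatFactory {
    static ConfiguredFormat newInstance(Map<String, String> conf) {
        ConfiguredFormat format = new ConfiguredFormat();
        if (conf != null) {
            format.configure(conf);
        }
        return format;
    }
}
```

With this model, FormatFactory.newInstance(null).isSplitable("some.txt") throws a NullPointerException, while the same call on an instance built with a configuration returns normally. If that is what's going on, the fix would be to pass the job configuration through wherever the join code instantiates each input format.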
Here's one other thing I don't understand. Suppose my directory One
contains some number of files, and directory Two contains the same
number of files, with the same names and the same partitioning. If I give
the directory names One and Two to the example program, will it match up
the files by name when performing the join? I haven't found the code that
does that yet, although I imagine that may be what it does.
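To make the question concrete, here is a small sketch of the matching I have in mind: sort each directory's file names and pair them by position, refusing to go on if the counts or names disagree. PairByName and pair are names I made up; this illustrates the question and is not code I found in o.a.h.mapred.join, which may well pair inputs by position without looking at names at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: pairs same-named files from two directory listings.
class PairByName {
    // Sorts both listings and pairs them position by position, failing loudly
    // if the directories differ in file count or in the names at any position.
    static List<String[]> pair(List<String> left, List<String> right) {
        if (left.size() != right.size()) {
            throw new IllegalArgumentException(
                "file counts differ: " + left.size() + " vs " + right.size());
        }
        List<String> l = new ArrayList<>(left);
        List<String> r = new ArrayList<>(right);
        Collections.sort(l);
        Collections.sort(r);
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < l.size(); i++) {
            if (!l.get(i).equals(r.get(i))) {
                throw new IllegalArgumentException(
                    "name mismatch at position " + i + ": " + l.get(i) + " vs " + r.get(i));
            }
            pairs.add(new String[] {l.get(i), r.get(i)});
        }
        return pairs;
    }
}
```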
Cheers,
John
-----Original Message-----
From: Chris Douglas [mailto:[EMAIL PROTECTED]
Sent: Friday, August 08, 2008 1:57 PM
To: core-user@hadoop.apache.org
Subject: Re: "Join" example
The contrib/data_join framework is different from the map-side join
framework, under o.a.h.mapred.join.
To see what the example is doing in an outer join, generate a few
sample text input files, tab-separated:
join/a.txt:
a0
a1
a2
a3
join/b.txt:
b0
b1
b2
b3
join/c.txt:
c0
c1
c2
c3
Run the example with each as an input:
host$ bin/hadoop jar hadoop-*-examples.jar join \
-inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
-outKey org.apache.hadoop.io.Text \
-joinOp outer \
join/a.txt join/b.txt join/c.txt joinout
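As far as I know, the example folds -joinOp and the input paths into a single join expression string of the form op(tbl(format,"path"),...), as described in the CompositeInputFormat javadoc; check that javadoc for your release before relying on the exact syntax. The sketch below only builds such a string. JoinExprSketch and its compose method are illustrative names, not the Hadoop method of the same name.

```java
import java.util.List;

// Hypothetical builder for a composite join expression string.
class JoinExprSketch {
    // Builds op(tbl(format,"path1"),tbl(format,"path2"),...), the textual
    // form assumed here for a composite join expression.
    static String compose(String op, String inputFormatClass, List<String> paths) {
        if (paths.isEmpty()) {
            throw new IllegalArgumentException("a join needs at least one input path");
        }
        StringBuilder sb = new StringBuilder(op).append('(');
        for (int i = 0; i < paths.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("tbl(").append(inputFormatClass)
              .append(",\"").append(paths.get(i)).append("\")");
        }
        return sb.append(')').toString();
    }
}
```

For the command above, calling it with "outer", "org.apache.hadoop.mapred.KeyValueTextInputFormat", and the three paths gives one outer(...) expression containing a tbl(...) term per input, in the order the paths were listed.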
Examine the result in joinout/part-00000:
host$ bin/hadoop fs -text joinout/part-00000 | less
[a0,b0,c0]
[a1,b1,c1]
[a1,b2,c1]
[a1,b3,c1]
[a2,,]
[a3,,]
[,,c2]
[,,c3]
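The result reflects two properties of the outer join: a key missing from some sources still produces a tuple, with empty slots for those sources, and a key that occurs several times in one source (here b1, b2, and b3 share the key of a1 and c1) produces one tuple per combination of values. The in-memory sketch below reproduces that behavior for illustration only; it is not the framework's implementation, which merges sorted, identically partitioned inputs in the map. Because the key column isn't visible in the listings above, any keys you feed it are stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative in-memory outer join; not the o.a.h.mapred.join code.
class OuterJoinSketch {
    // Joins any number of sources, each a list of {key, value} pairs, on key.
    // Returns one "[v1,v2,...]" string per output tuple, in sorted key order.
    static List<String> outerJoin(List<List<String[]>> sources) {
        int n = sources.size();
        // key -> (source index -> values seen for that key), sorted by key
        TreeMap<String, List<List<String>>> grouped = new TreeMap<>();
        for (int s = 0; s < n; s++) {
            for (String[] kv : sources.get(s)) {
                List<List<String>> perSource = grouped.get(kv[0]);
                if (perSource == null) {
                    perSource = new ArrayList<>();
                    for (int i = 0; i < n; i++) {
                        perSource.add(new ArrayList<>());
                    }
                    grouped.put(kv[0], perSource);
                }
                perSource.get(s).add(kv[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (List<List<String>> perSource : grouped.values()) {
            emitCrossProduct(perSource, 0, new String[n], out);
        }
        return out;
    }

    // Emits every combination of one value per source; a source with no
    // value for this key contributes an empty slot instead of dropping the tuple.
    private static void emitCrossProduct(List<List<String>> perSource, int s,
                                         String[] row, List<String> out) {
        if (s == perSource.size()) {
            out.add("[" + String.join(",", row) + "]");
            return;
        }
        List<String> values = perSource.get(s);
        if (values.isEmpty()) {
            row[s] = "";  // outer join: keep the tuple, leave this slot empty
            emitCrossProduct(perSource, s + 1, row, out);
            return;
        }
        for (String v : values) {
            row[s] = v;
            emitCrossProduct(perSource, s + 1, row, out);
        }
    }
}
```

Feeding it hypothetical keys k0 through k5 (a0, b0, c0 under k0; a1, b1, b2, b3, c1 under k1; a2 and a3 under k2 and k3; c2 and c3 under k4 and k5) returns the eight tuples listed above, in the same order.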
-C
On Aug 7, 2008, at 11:39 PM, Wei Wu wrote:
> There are some examples in $HADOOP_HOME/src/contrib/data_join, which
> I hope will help.
>
> Wei
>
> -----Original Message-----
> From: John DeTreville [mailt