RE: "Join" example

2008-08-08 Thread John DeTreville
When I try the map-side join example (under Hadoop 0.17.1, running in
standalone mode under Win32), it attempts to dereference a null pointer.

$ cat One/some.txt
A   1
B   1
C   1
E   1
$ cat Two/some.txt
A   2
B   2
C   2
D   2
$ bin/hadoop jar *examples.jar join \
   -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
   -outKey org.apache.hadoop.io.Text \
   -joinOp outer One/some.txt Two/some.txt output
cygpath: cannot create short name of c:\Documents and Settings\jdd\Desktop\hadoop-0.17.1\logs
08/08/08 15:41:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
Job started: Fri Aug 08 15:41:34 PDT 2008
08/08/08 15:41:34 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
08/08/08 15:41:34 INFO mapred.FileInputFormat: Total input paths to process : 1
08/08/08 15:41:34 INFO mapred.FileInputFormat: Total input paths to process : 1
java.lang.NullPointerException
        at org.apache.hadoop.mapred.KeyValueTextInputFormat.isSplitable(KeyValueTextInputFormat.java:44)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:247)
        at org.apache.hadoop.mapred.join.Parser$WNode.getSplits(Parser.java:305)
        at org.apache.hadoop.mapred.join.Parser$CNode.getSplits(Parser.java:375)
        at org.apache.hadoop.mapred.join.CompositeInputFormat.getSplits(CompositeInputFormat.java:129)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:712)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
        at org.apache.hadoop.examples.Join.run(Join.java:154)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.Join.main(Join.java:163)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
$ 

I'll look around a little to see what the problem is. The attempt to
initialize the JVM metrics twice also seems suspicious.
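
One possible reading of the trace, offered as a guess and not confirmed anywhere in this thread: the null pointer inside isSplitable would be consistent with an input format object that was created but never handed the job settings before getSplits was called on it. The plain-Java sketch below only illustrates that general failure mode. LazyFormat, its codec table, and the configure() method are hypothetical stand-ins, not Hadoop classes or the actual Hadoop source.

```java
import java.util.HashMap;
import java.util.Map;

public class UnconfiguredFormatSketch {

    /** Hypothetical stand-in for an input format that must be configured before use. */
    public static class LazyFormat {
        // Stays null until configure() runs (assumption: built from job settings).
        private Map<String, String> codecsBySuffix;

        public void configure() {
            codecsBySuffix = new HashMap<>();
            codecsBySuffix.put(".gz", "gzip");
        }

        /** A file counts as splittable only if no codec claims its suffix. */
        public boolean isSplitable(String fileName) {
            int dot = fileName.lastIndexOf('.');
            String suffix = dot < 0 ? "" : fileName.substring(dot);
            // Throws NullPointerException if configure() was never called.
            return !codecsBySuffix.containsKey(suffix);
        }
    }

    public static void main(String[] args) {
        LazyFormat configured = new LazyFormat();
        configured.configure();
        System.out.println("configured: " + configured.isSplitable("One/some.txt"));

        LazyFormat raw = new LazyFormat();
        try {
            raw.isSplitable("One/some.txt");
        } catch (NullPointerException e) {
            System.out.println("unconfigured: NullPointerException");
        }
    }
}
```

If this guess is right, the place to look would be wherever the join parser instantiates the per-source input format, to see whether the job configuration is passed along at that point.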

Here's one other thing I don't understand. Suppose my directory One
contains some number of files, and directory Two contains the same
number, named the same and partitioned the same. If I give the directory
names One and Two to the example program, will it match up the files by
name for performing the join? I haven't found the code yet to do that,
although I'm imagining that perhaps that's what it does.

Cheers,
John

-Original Message-
From: Chris Douglas [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 08, 2008 1:57 PM
To: core-user@hadoop.apache.org
Subject: Re: "Join" example

The contrib/data_join framework is different from the map-side join  
framework, under o.a.h.mapred.join.

To see what the example is doing in an outer join, generate a few
sample text input files, tab-separated:

join/a.txt:

a0
a1
a2
a3

join/b.txt:

b0
b1
b2
b3

join/c.txt:

c0
c1
c2
c3

Run the example with each as an input:

host$ bin/hadoop jar hadoop-*-examples.jar join \
   -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
   -outKey org.apache.hadoop.io.Text \
   -joinOp outer \
   join/a.txt join/b.txt join/c.txt joinout

Examine the result in joinout/part-0:

host$ bin/hadoop fs -text joinout/part-0 | less
[a0,b0,c0]
[a1,b1,c1]
[a1,b2,c1]
[a1,b3,c1]
[a2,,]
[a3,,]
[,,c2]
[,,c3]

-C

On Aug 7, 2008, at 11:39 PM, Wei Wu wrote:

> There are some examples in $HADOOPHOME/src/contrib/data_join, which
> I hope would help.
>
> Wei
>
> -Original Message-
> From: John DeTreville [mailt

RE: "Join" example

2008-08-08 Thread John DeTreville
Thanks very much, Chris!

Cheers,
John




Re: "Join" example

2008-08-08 Thread Chris Douglas
The contrib/data_join framework is different from the map-side join  
framework, under o.a.h.mapred.join.


To see what the example is doing in an outer join, generate a few
sample text input files, tab-separated:


join/a.txt:

a0
a1
a2
a3

join/b.txt:

b0
b1
b2
b3

join/c.txt:

c0
c1
c2
c3

Run the example with each as an input:

host$ bin/hadoop jar hadoop-*-examples.jar join \
  -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
  -outKey org.apache.hadoop.io.Text \
  -joinOp outer \
  join/a.txt join/b.txt join/c.txt joinout
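
For orientation, the join op, the input format, and the input paths from a command like this end up combined into one join expression string for the map-side join framework. The sketch below assembles such a string in plain Java. The tbl(<class>,"<path>") shape is an assumption based on the o.a.h.mapred.join package documentation, and JoinExprSketch with its compose method is a hypothetical stand-in written for illustration, not the Hadoop class itself.

```java
public class JoinExprSketch {

    /** Builds op(tbl(format,"path"),...) over the given input paths. */
    public static String compose(String op, String inputFormat, String... paths) {
        StringBuilder sb = new StringBuilder(op).append('(');
        for (int i = 0; i < paths.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            // One tbl(...) term per input source.
            sb.append("tbl(").append(inputFormat)
              .append(",\"").append(paths[i]).append("\")");
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(compose(
            "outer",
            "org.apache.hadoop.mapred.KeyValueTextInputFormat",
            "join/a.txt", "join/b.txt", "join/c.txt"));
    }
}
```

In a real job the string would presumably come from the framework's own helper and be stored in the job configuration; the sketch is only meant to make the command-line arguments easier to map onto the expression.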

Examine the result in joinout/part-0:

host$ bin/hadoop fs -text joinout/part-0 | less
[a0,b0,c0]
[a1,b1,c1]
[a1,b2,c1]
[a1,b3,c1]
[a2,,]
[a3,,]
[,,c2]
[,,c3]
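
The shape of this output can be reproduced with a small in-memory sketch of outer-join semantics: group values by key across the sources, then emit the cross product per key, leaving a slot empty when a source has no value for that key. The key column is not visible in the file listings as archived here, so the keys k0..k5 below are hypothetical, chosen only so that the result lines up with the output shown. OuterJoinSketch is an illustration in plain Java and says nothing about how the framework itself is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

public class OuterJoinSketch {

    /** Outer-joins sources of {key, value} pairs; a missing value becomes an empty slot. */
    public static List<String> outerJoin(List<String[][]> sources) {
        int n = sources.size();
        // key -> one value list per source, with keys kept in sorted order
        TreeMap<String, List<List<String>>> byKey = new TreeMap<>();
        for (int s = 0; s < n; s++) {
            for (String[] kv : sources.get(s)) {
                List<List<String>> slots = byKey.get(kv[0]);
                if (slots == null) {
                    slots = new ArrayList<>();
                    for (int i = 0; i < n; i++) {
                        slots.add(new ArrayList<String>());
                    }
                    byKey.put(kv[0], slots);
                }
                slots.get(s).add(kv[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (List<List<String>> slots : byKey.values()) {
            // Cross product of the per-source values, leftmost source varying slowest.
            List<String> tuples = new ArrayList<>(Collections.singletonList(""));
            for (int s = 0; s < n; s++) {
                List<String> values = slots.get(s).isEmpty()
                    ? Collections.singletonList("") : slots.get(s);
                List<String> next = new ArrayList<>();
                for (String prefix : tuples) {
                    for (String v : values) {
                        next.add(s == 0 ? v : prefix + "," + v);
                    }
                }
                tuples = next;
            }
            for (String t : tuples) {
                out.add("[" + t + "]");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical keys; only the values a0..c3 appear in the listings above.
        String[][] a = {{"k0", "a0"}, {"k1", "a1"}, {"k2", "a2"}, {"k3", "a3"}};
        String[][] b = {{"k0", "b0"}, {"k1", "b1"}, {"k1", "b2"}, {"k1", "b3"}};
        String[][] c = {{"k0", "c0"}, {"k1", "c1"}, {"k4", "c2"}, {"k5", "c3"}};
        for (String row : outerJoin(Arrays.asList(a, b, c))) {
            System.out.println(row);
        }
    }
}
```

With those assumed keys, the sketch prints the same eight rows as the listing above. It holds everything in memory for simplicity, which a map-side join over large sorted inputs would not want to do.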

-C

On Aug 7, 2008, at 11:39 PM, Wei Wu wrote:

> There are some examples in $HADOOPHOME/src/contrib/data_join, which
> I hope would help.
>
> Wei
>
> -Original Message-
> From: John DeTreville [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 08, 2008 2:34 AM
> To: core-user@hadoop.apache.org
> Subject: "Join" example
>
> Hadoop ships with a few example programs. One of these is "join,"
> which I believe demonstrates map-side joins. I'm finding its usage
> instructions a little impenetrable; could anyone send me instructions
> that are more like "type this" then "type this" then "type this"?
>
> Thanks in advance.
>
> Cheers,
> John





RE: "Join" example

2008-08-07 Thread Wei Wu
There are some examples in $HADOOPHOME/src/contrib/data_join, which I hope
would help.

Wei

-Original Message-
From: John DeTreville [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 08, 2008 2:34 AM
To: core-user@hadoop.apache.org
Subject: "Join" example

Hadoop ships with a few example programs. One of these is "join," which
I believe demonstrates map-side joins. I'm finding its usage
instructions a little impenetrable; could anyone send me instructions
that are more like "type this" then "type this" then "type this"?

Thanks in advance.

Cheers,
John