On Apr 16, 2009, at 10:27 AM, Stephen Green wrote:
java.lang.NullPointerException
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.delete(NativeS3FileSystem.java:310)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:83)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
(The line numbers in kmeans.Job are weird because I added logging.)
If the Hadoop on EMR is really 0.18.3, then the null reference here is
the store field in NativeS3FileSystem. But there's another problem:
I deleted the output path before I started the run, so the existence
check should have failed and dfs.delete never should have been
called. I added a bit of logging to the KMeans job and here's what
it says about the output path:
OK, I figured this one out. I gave it the URI s3n://mahout-output/ as
the output directory. This is a problem because the URI parsing code
interprets mahout-output as the host name in the URI, which leaves
nothing but "/" as the path. So around NativeS3FileSystem:319,
pathToKey returns the key "", which apparently indicates the root
directory. The root is always supposed to exist, so the path that was
passed in is used to build a directory entry. That gives a non-null
result from getFileStatus, and therefore a true response from the
exists call, even though I had deleted the output beforehand.
Using a subdirectory on the URI (s3n://mahout-output/kmeans) gets the
key kmeans, which moves things along a little farther.
This is a weird disconnect in the pathToKey code, I think.
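For anyone who wants to see the parsing behavior without a cluster,
here is a minimal sketch using only java.net.URI. The pathToKey method
below is my own hypothetical stand-in (it just strips the leading
slash), not the actual Hadoop code, and the class name UriKeyDemo is
made up for illustration.

```java
import java.net.URI;

public class UriKeyDemo {
    // Hypothetical stand-in for NativeS3FileSystem.pathToKey:
    // assume the key is the URI path with its leading "/" removed.
    public static String pathToKey(URI uri) {
        String path = uri.getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        String[] uris = {
            "s3n://mahout-output/",       // bucket only: path is "/"
            "s3n://mahout-output/kmeans"  // bucket plus subdirectory
        };
        for (String s : uris) {
            URI uri = URI.create(s);
            // The bucket name lands in the host part, not the path.
            System.out.println(s
                + " -> host=" + uri.getHost()
                + " path=" + uri.getPath()
                + " key=\"" + pathToKey(uri) + "\"");
        }
    }
}
```

Under that assumption, the bucket-only URI produces an empty key while
the one with a subdirectory produces the key kmeans, which matches
what I saw.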
Steve
--
Stephen Green // [email protected]
Principal Investigator \\ http://blogs.sun.com/searchguy
Aura Project // Voice: +1 781-442-0926
Sun Microsystems Labs \\ Fax: +1 781-442-1692