On Apr 16, 2009, at 10:27 AM, Stephen Green wrote:

java.lang.NullPointerException
       at org.apache.hadoop.fs.s3native.NativeS3FileSystem.delete(NativeS3FileSystem.java:310)
       at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:83)
       at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:45)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
       at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
       at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

(The line numbers in kmeans.Job are weird because I added logging.)

If the Hadoop on EMR is really 0.18.3, then the null pointer here is the store in the NativeS3FileSystem. But there's another problem: I deleted the output path before I started the run, so the existence check should have failed and dfs.delete never should have been called. I added a bit of logging to the KMeans job and here's what it says about the output path:



OK, I figured this one out. I gave it the URI s3n://mahout-output/ as the output directory. This is a problem because the URI parsing code interprets mahout-output as the host name in the URI, which leaves only "/" as the path. As a result, round about NativeS3FileSystem:319, it gets the key "" from pathToKey, which apparently indicates the root directory. The root is always supposed to exist, so the input path is used to create a directory, which yields the output directory and a non-null result from getFileStatus, and therefore a true response from the exists call.

Using a subdirectory on the URI (s3n://mahout-output/kmeans) gets the key kmeans, which moves things along a little farther.
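
To make the difference between the two URIs concrete, here is a minimal sketch using only java.net.URI. The class name S3nKeySketch and its pathToKey method are hypothetical stand-ins written for illustration. They assume the key is simply the URI path with its leading slash removed, and they are not the Hadoop source.

```java
import java.net.URI;

// Hypothetical sketch for illustration; this is not the Hadoop implementation.
class S3nKeySketch {

    // Assumed simplification of pathToKey: the key is the URI path
    // without its leading "/", so a bare "/" maps to the empty key.
    static String pathToKey(URI uri) {
        String path = uri.getPath();
        if (path == null || path.isEmpty() || path.equals("/")) {
            return "";
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        String[] outputs = {
            "s3n://mahout-output/",       // bucket only
            "s3n://mahout-output/kmeans"  // bucket plus subdirectory
        };
        for (String s : outputs) {
            URI uri = URI.create(s);
            // The bucket name is parsed as the host; only the rest is the path.
            System.out.println(s
                + " -> host=" + uri.getHost()
                + ", path=" + uri.getPath()
                + ", key=\"" + pathToKey(uri) + "\"");
        }
    }
}
```

For both URIs, java.net.URI reports mahout-output as the host. The first leaves a path of "/" and therefore an empty key under this assumption, while the second leaves "/kmeans" and the key kmeans.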

This is a weird disconnect in the pathToKey code, I think.

Steve
--
Stephen Green                      //   [email protected]
Principal Investigator             \\   http://blogs.sun.com/searchguy
Aura Project                       //   Voice: +1 781-442-0926
Sun Microsystems Labs              \\   Fax:   +1 781-442-1692


