Hi Jeff,

On Fri, Jun 10, 2011 at 7:38 PM, Jeff Eastman <jeast...@narus.com> wrote:

> The first run on MapR:

> MAHOUT_LOCAL is set, running locally
[...]
> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, 
> --endPhase=2147483647, 
> --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, 
> --input=mahout-work/reuters-out, --keyPrefix=, 
> --output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: 
> maprfs
>        at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
>        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)

I force seqdirectory to run locally because it is considerably more
efficient to copy its output up to HDFS than to copy the output of the
prior step (extracting Reuters) up to HDFS and then run seqdirectory on
the cluster. When seqdirectory runs locally, we simply call java with
the classpath set up by bin/mahout, and I suspect that classpath does
not include the MapR classes. However, we are still pointing at the
Hadoop configuration, which references a maprfs filesystem. That
configuration is loaded, and because the MapR classes are absent,
maprfs is not recognized as a valid scheme. As a result we hit this
error.
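To make that failure mode concrete, here is a rough pre-flight check
bin/mahout could run before a local job. It is only a sketch: the
function names are hypothetical, it assumes the Hadoop 0.20-style
fs.default.name property in $HADOOP_CONF_DIR/core-site.xml with <name>
and <value> on consecutive lines, and it uses "some classpath entry
contains the scheme name" (e.g. a maprfs jar) as a crude stand-in for
"the FileSystem implementation is available".

```shell
#!/bin/sh
# Hypothetical helper: print the scheme of fs.default.name from
# $1/core-site.xml. Assumes <name> and <value> sit on consecutive lines.
# Prints nothing if the property (or the file) is missing.
default_fs_scheme() {
  sed -n '/<name>fs.default.name<\/name>/{n;s/.*<value>\([^:]*\):.*/\1/p;}' \
    "$1/core-site.xml" 2>/dev/null
}

# Hypothetical helper: warn when the configured scheme is not one stock
# Hadoop handles and no entry of the classpath in $2 mentions it.
# Heuristic only: a matching jar name does not prove the class is there.
check_scheme_on_classpath() {
  scheme=$(default_fs_scheme "$1")
  case "$scheme" in
    ""|file|hdfs)
      echo "ok: scheme '${scheme:-file}' is handled by stock Hadoop"
      return 0 ;;
  esac
  case "$2" in
    *"$scheme"*)
      echo "ok: classpath mentions $scheme"
      return 0 ;;
  esac
  echo "warning: $1 selects '$scheme' but no classpath entry mentions it" >&2
  return 1
}
```

Usage would be something like
`check_scheme_on_classpath "$HADOOP_CONF_DIR" "$CLASSPATH"` just before
bin/mahout launches java with MAHOUT_LOCAL set.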

bin/mahout should (or at least could) pull in the classpath appropriate
to the local Hadoop installation somehow; I'm not sure of the best way
to do this.

> And then, after changing HADOOP_HOME & HADOOP_CONF_DIR to CDH3 on a fresh 
> untar/install of 0.5:
[...]
> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, 
> --endPhase=2147483647, 
> --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, 
> --input=mahout-work/reuters-out, --keyPrefix=, 
> --output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
> Exception in thread "main" java.io.IOException: Call to 
> hadoop1.eng.narus.com/172.31.2.200:8020 failed on local exception: 
> java.io.EOFException
>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)

I wonder if this is a similar case: the Hadoop classes packaged with
Mahout are being used to talk to a CDH3 cluster, and we're running into
protocol incompatibilities. Although MAHOUT_LOCAL is set in this case,
seqdirectory is clearly trying to reach HDFS as part of the filesystem
setup process.

All in all, it seems that rejiggering the classpath in bin/mahout so
that the classes specific to the Hadoop environment it is running in
appear first may be the correct way to resolve this issue.
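A minimal sketch of that reordering, assuming the installed Hadoop's
bin/hadoop supports a "classpath" subcommand that prints its own
classpath (newer Hadoop releases do; I haven't checked the MapR or CDH3
scripts). build_classpath is a hypothetical helper, not something in
bin/mahout today. Since the JVM loads a class from the first classpath
entry that contains it, prepending the cluster's classpath should let
its FileSystem implementations and RPC classes win over the Hadoop jars
bundled with Mahout.

```shell
#!/bin/sh
# Hypothetical helper: put the local Hadoop installation's classpath in
# front of the classpath bin/mahout has assembled so far ($1).
# Assumes "$HADOOP_HOME/bin/hadoop classpath" exists for this Hadoop.
build_classpath() {
  mahout_cp="$1"
  hadoop_cp=""
  if [ -n "$HADOOP_HOME" ] && [ -x "$HADOOP_HOME/bin/hadoop" ]; then
    hadoop_cp=$("$HADOOP_HOME/bin/hadoop" classpath 2>/dev/null)
  fi
  if [ -n "$hadoop_cp" ]; then
    # Cluster-specific jars (e.g. maprfs, CDH3 RPC) come first, so they
    # shadow the Hadoop classes shipped with Mahout.
    printf '%s:%s\n' "$hadoop_cp" "$mahout_cp"
  else
    # No usable Hadoop install: fall back to Mahout's own classpath.
    printf '%s\n' "$mahout_cp"
  fi
}
```

In bin/mahout this would be used roughly as
`CLASSPATH=$(build_classpath "$CLASSPATH")` before invoking java.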

Do I vaguely recall seeing another discussion regarding classpath
order pop up on the list recently?

Jeff, I really appreciate you putting this through its paces on the
wide variety of environments you have access to, thanks!

Drew
