On Apr 14, 2009, at 5:17 PM, Grant Ingersoll wrote:
> I would be concerned about the fact that EMR is using 0.18 and Mahout
> is on 0.19 (which of course raises another concern expressed by Owen
> O'Malley to me at ApacheCon: No one uses 0.19).
Well, I did run Mahout locally on a 0.18.3 install, but that was
writing to and reading from HDFS. I can build a custom mahout-examples
that has the 0.18.3 Hadoop jars (or perhaps no Hadoop jar at all...).
I'm guessing that if EMR is on 0.18.3 and it gets popular, then you're
going to have to deal with that problem.
> I'd say you should try reproducing the problem on the same version
> that Mahout uses.
That'll be a bit tricky in the EMR case, as that's Amazon's business
(ask me about trying to get a 64-bit Solaris AMI on Amazon's version of
Xen...).
> FWIW, any committer on the Mahout project can likely get credits to
> use AWS.
I'm happy to share my limited experience.
Also:
----- Original Message ----
From: Sean Owen <[email protected]>
To: [email protected]
Sent: Tuesday, April 14, 2009 4:19:51 PM
Subject: Re: Mahout on Elastic MapReduce
> This is a fairly uninformed observation, but: the error seems to be
> from Hadoop. It seems to say that it understands hdfs:, but not s3n:,
> and that makes sense to me. Do we expect Hadoop to understand how to
> read from S3? I would expect not. (Though you point to examples that
> seem to overcome this just fine?)
As Otis pointed out, Hadoop can handle S3 in a couple of ways, and the
example that I've been working on seems to be able to read the input
data from an s3n URI with no problem.
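In case a concrete fragment helps, this is roughly the shape of it with
the 0.18-era JobConf API. It's only a sketch, not the actual example
code: the class name, bucket, and paths below are made up, and a real
job would still need its mapper, reducer, and key/value classes set.

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  /** Sketch of a job driver that reads its input from S3 via the s3n filesystem. */
  public class S3InputJob {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(S3InputJob.class);
      conf.setJobName("s3n-input-sketch");

      // Credentials for the s3n filesystem; these could also live in hadoop-site.xml.
      conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
      conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

      // ... set the mapper, reducer, and key/value classes here as usual ...

      // Input is read straight from S3; output goes to the cluster's HDFS.
      FileInputFormat.setInputPaths(conf, new Path("s3n://my-bucket/input"));
      FileOutputFormat.setOutputPath(conf, new Path("/user/me/output"));

      JobClient.runJob(conf);
    }
  }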
> When I have integrated code with stuff stored on S3, I have always had
> to write extra glue code to copy from S3 to a local file system, do
> work, then copy back.
I think you do need to copy from S3 to HDFS, but I think that happens
automagically. (Or does it? My Hadoop ignorance is starting to show!)
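If it turns out the copy doesn't happen for you automatically, doing it
explicitly is pretty simple with the FileSystem API. Again, just a
sketch with made-up paths:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.FileUtil;
  import org.apache.hadoop.fs.Path;

  /** Sketch: copy data from S3 (s3n) into the cluster's default filesystem (HDFS). */
  public class S3ToHdfsCopy {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
      conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

      Path src = new Path("s3n://my-bucket/input"); // hypothetical source
      Path dst = new Path("/user/me/input");        // hypothetical destination

      FileSystem srcFs = src.getFileSystem(conf);
      FileSystem dstFs = dst.getFileSystem(conf); // default FS, i.e. HDFS on a cluster

      // Recursive copy; the 'false' means don't delete the source afterwards.
      FileUtil.copy(srcFs, src, dstFs, dst, false, conf);
    }
  }

I believe bin/hadoop distcp (or even hadoop fs -cp) between an s3n URI
and an HDFS path does the same thing from the command line.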
Steve
--
Stephen Green // [email protected]
Principal Investigator \\ http://blogs.sun.com/searchguy
Aura Project // Voice: +1 781-442-0926
Sun Microsystems Labs \\ Fax: +1 781-442-1692