On Apr 14, 2009, at 5:17 PM, Grant Ingersoll wrote:
> I would be concerned about the fact that EMR is using 0.18 and Mahout
> is on 0.19 (which of course raises another concern expressed by Owen
> O'Malley to me at ApacheCon: No one uses 0.19).
Well, I did run Mahout locally on a 0.18.3 install, but that was
writing to and reading from HDFS. I can build a custom mahout-examples
that has the 0.18.3 Hadoop jars (or perhaps no Hadoop jar at all...).
I'm guessing that if EMR is on 0.18.3 and it gets popular, then you're
going to have to deal with that problem.
> I'd say you should try reproducing the problem on the same version
> that Mahout uses.
That'll be a bit tricky in the EMR case, as that's Amazon's business
(ask me about trying to get a 64-bit Solaris AMI on Amazon's version of
Xen...).
> FWIW, any committer on the Mahout project can likely get credits to
> use AWS.
I'm happy to share my limited experience.
Also:
----- Original Message ----
From: Sean Owen <[email protected]>
To: [email protected]
Sent: Tuesday, April 14, 2009 4:19:51 PM
Subject: Re: Mahout on Elastic MapReduce
> This is a fairly uninformed observation, but: the error seems to be
> from Hadoop. It seems to say that it understands hdfs:, but not s3n:,
> and that makes sense to me. Do we expect Hadoop to understand how to
> read from S3? I would expect not. (Though you point to examples that
> seem to overcome this just fine?)
As Otis pointed out, Hadoop can handle S3 in a couple of ways, and the
example that I've been working on seems to be able to read the input
data from an s3n URI with no problem.
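In case a concrete fragment helps, this is roughly the shape of it with
the 0.18-era JobConf API. It's only a sketch, not the actual example
code: the class name, bucket, and paths below are made up, and a real
job would still need its mapper, reducer, and key/value classes set.

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  /** Sketch of a job driver that reads its input from S3 via the s3n filesystem. */
  public class S3InputJob {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(S3InputJob.class);
      conf.setJobName("s3n-input-sketch");

      // Credentials for the s3n filesystem; these could also live in hadoop-site.xml.
      conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
      conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

      // ... set the mapper, reducer, and key/value classes here as usual ...

      // Input is read straight from S3; output goes to the cluster's HDFS.
      FileInputFormat.setInputPaths(conf, new Path("s3n://my-bucket/input"));
      FileOutputFormat.setOutputPath(conf, new Path("/user/me/output"));

      JobClient.runJob(conf);
    }
  }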
> When I have integrated code with stuff stored on S3, I have always had
> to write extra glue code to copy from S3 to a local file system, do
> work, then copy back.
I think you do need to copy from S3 to HDFS, but I think that happens
automagically. (Or does it? My Hadoop ignorance is starting to show!)
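If it turns out the copy doesn't happen for you automatically, doing it
explicitly is pretty simple with the FileSystem API. Again, just a
sketch with made-up paths:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.FileUtil;
  import org.apache.hadoop.fs.Path;

  /** Sketch: copy data from S3 (s3n) into the cluster's default filesystem (HDFS). */
  public class S3ToHdfsCopy {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
      conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

      Path src = new Path("s3n://my-bucket/input"); // hypothetical source
      Path dst = new Path("/user/me/input");        // hypothetical destination

      FileSystem srcFs = src.getFileSystem(conf);
      FileSystem dstFs = dst.getFileSystem(conf); // default FS, i.e. HDFS on a cluster

      // Recursive copy; the 'false' means don't delete the source afterwards.
      FileUtil.copy(srcFs, src, dstFs, dst, false, conf);
    }
  }

I believe bin/hadoop distcp (or even hadoop fs -cp) between an s3n URI
and an HDFS path does the same thing from the command line.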
Steve
--
Stephen Green // [email protected]
Principal Investigator \\ http://blogs.sun.com/searchguy
Aura Project // Voice: +1 781-442-0926
Sun Microsystems Labs \\ Fax: +1 781-442-1692