Re: Good starting instance for AMI

Ken Krugler Sun, 10 Jan 2010 17:03:45 -0800

I've been using EMR for the public terabyte dataset project.


In general it's worked for me, with the following caveats:

1. Hadoop 0.18.3, which meant I had to re-work some of my code that depended on newer (Hadoop 0.19.x) support.

2. It was kind of painful to get it running initially (setting up the right credentials.json file, etc.)

3. You'll need S3 access, of course, which is another series of hoops to jump through.

4. You really want to run in the mode where you create an EMR job with no steps, then add steps to run - otherwise you can waste a lot of time firing up EMR jobs that fail immediately.

5. For bigger clusters, some of the Hadoop configuration parameters aren't set very well.
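
As a rough illustration of point 4, here is a minimal sketch of the "create a job flow with no steps, then add steps" pattern. It assumes the present-day boto3 Python SDK (not the tooling used in this thread), and the release label, role names, instance type, bucket, and jar path are all placeholder assumptions. The helper names are hypothetical.

```python
def keep_alive_cluster_request(name, instance_type="m5.xlarge", instance_count=3):
    """Build keyword arguments for the boto3 EMR client's run_job_flow call.

    The cluster starts with no steps and stays up, so a failing step does
    not cost a full cluster start-up each time.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumption: any supported release label
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster alive while it has no steps to run.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "Steps": [],  # no steps at creation time
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumption: default role names
        "ServiceRole": "EMR_DefaultRole",
    }


def jar_step(name, jar, args):
    """Describe one Hadoop jar step; on failure the cluster keeps waiting."""
    return {
        "Name": name,
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {"Jar": jar, "Args": list(args)},
    }


def start_then_add_steps(emr, cluster_request, steps):
    """Create the idle cluster first, then attach steps to it.

    `emr` is expected to be an EMR client, e.g. boto3.client("emr").
    """
    cluster_id = emr.run_job_flow(**cluster_request)["JobFlowId"]
    emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return cluster_id
```

If a step fails, the cluster stays up, so a fixed step can be submitted with another add_job_flow_steps call against the same cluster ID. The cluster then has to be terminated explicitly once the work is done.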


-- Ken

On Jan 10, 2010, at 4:21pm, Benson Margulies wrote:

That's what I meant. I haven't tried it yet, so I've got the same
question Jake has.
On Sun, Jan 10, 2010 at 6:27 PM, Jake Mannix <[email protected]> wrote:
You mean Elastic MapReduce (EMR)? Has anyone here had any luck with that
for this or other projects?

 -jake
On Jan 10, 2010 3:21 PM, "Benson Margulies" <[email protected]> wrote:
Stupid question: I thought there was a way to use the cloud as a
hadoop farm directly without having to configure instances.
On Sun, Jan 10, 2010 at 6:18 PM, Sean Owen <[email protected]> wrote:
> I like the Alestic instances...


--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
