I think reserved EC2 instances just give you a better deal price-wise in exchange for an advance payment and, essentially, a contract. I didn't see any mention that reserved instances mean no sharing. If AWS did that, they'd be nothing more than a regular hosting service.
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message ----
> From: Something Something <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Sent: Tue, January 26, 2010 3:49:31 PM
> Subject: Re: Performance of EC2
>
> Wow.. how naive I am to think that I could trust Amazon. Thanks for
> forwarding the links, Patrick. It seems Amazon's reliability has gone
> down considerably over the past few months. (Occasionally my instances
> fail on startup or die in the middle for no apparent reason, and I used
> to think I was doing something dumb!)
>
> But what I don't understand is this... if I *reserve* an instance, then
> I wouldn't be sharing its CPU with anyone, right? The blog seems to
> indicate otherwise.
>
> I guess I will have to look for alternatives to Amazon EC2. Does anyone
> have any recommendations? Thanks again.
>
>
> On Tue, Jan 26, 2010 at 11:44 AM, Patrick Hunt wrote:
>
> > Re "Amazon predictability", did you guys see this recent paper:
> > http://people.csail.mit.edu/tromer/cloudsec/
> >
> > Also some additional background on "noisy neighbor" effects:
> > http://bit.ly/4O7dHx
> > http://bit.ly/8zPvQd
> >
> > Some interesting bits of information in there.
> >
> > Patrick
> >
> >
> > Something Something wrote:
> >
> >> Here are some of the answers:
> >>
> >>> How many concurrent reducers run on each node? Default two?
> >>
> >> I was assuming 2 on each node would be the default. If not, this
> >> could be a problem. Please let me know.
> >>
> >>> I'd suggest you spend a bit of time figuring out where your MR jobs
> >>> are spending their time?
> >>
> >> I agree. Will do some more research :)
> >>
> >>> How much of this overall time is spent in the reduce phase?
> >>
> >> Most of the time is spent in the Reduce phases, because that's where
> >> most of the critical code is.
> >>
> >>> Are inserts to a new table?
> >>
> >> Yes, all inserts will always be in a new table. In fact, I
> >> disable/drop HTables during this process. I'm not using any special
> >> indexes; should I be?
> >>
> >>> I'm a little surprised that all worked on the small instances, that
> >>> your jobs completed.
> >>
> >> But really, shouldn't Amazon guarantee predictability :) After all,
> >> I am paying for these instances.. albeit a small amount!
> >>
> >>> Are you opening a new table inside each task or once up in the
> >>> config?
> >>
> >> I open the HTable in the 'setup' method for each mapper/reducer, and
> >> close the table in the 'cleanup' method.
> >>
> >>> You have to temper the above general rule with the fact that...
> >>
> >> I will try a few combinations.
> >>
> >>> How big is your dataset?
> >>
> >> This one in particular is not big, but the real production ones will
> >> be. Here's approximately how many rows get processed:
> >> Phase 1: 300 rows
> >> Phases 2 thru 8: 100 rows.
> >> (Note: Each phase does complex calculations on the row.)
> >>
> >> Thanks for the help.
> >>
> >>
> >> On Tue, Jan 26, 2010 at 10:36 AM, Jean-Daniel Cryans wrote:
> >>
> >>> How big is your dataset?
> >>>
> >>> J-D
> >>>
> >>> On Tue, Jan 26, 2010 at 8:47 AM, Something Something wrote:
> >>>
> >>>> I have noticed some strange performance numbers on EC2. If someone
> >>>> can give me some hints to improve performance, that would be
> >>>> greatly appreciated.
> >>>> Here are the details:
> >>>>
> >>>> I have a process that runs a series of Jobs under Hadoop 0.20.1 &
> >>>> HBase 0.20.2. I ran the *exact* same process with the following
> >>>> configurations:
> >>>>
> >>>> 1) 1 Master & 4 Workers (*c1.xlarge* instances) & 1 Zookeeper
> >>>> (*c1.medium*) with *8 Reducers* for every Reduce task. The process
> >>>> completed in *849* seconds.
> >>>>
> >>>> 2) 1 Master, 4 Workers & 1 Zookeeper, *ALL m1.small* instances,
> >>>> with *8 Reducers* for every Reduce task. The process completed in
> >>>> *906* seconds.
> >>>>
> >>>> 3) 1 Master, *11* Workers & *3* Zookeepers, *ALL m1.small*
> >>>> instances, with *20 Reducers* for every Reduce task. The process
> >>>> completed in *984* seconds!
> >>>>
> >>>> Two main questions:
> >>>>
> >>>> 1) It's totally surprising that when I have 11 workers with 20
> >>>> Reducers, it runs slower than when I have fewer machines of exactly
> >>>> the same type with fewer reducers.
> >>>> 2) As expected, it runs faster on c1.xlarge, but the performance
> >>>> improvement doesn't justify the high cost difference. I must not be
> >>>> fully utilizing the machines, but I don't know how to change that.
> >>>>
> >>>> Here are some of the performance improvement tricks that I have
> >>>> learned from this mailing list in the past and that I am using:
> >>>>
> >>>> 1) conf.set("hbase.client.scanner.caching", "30"); I have this
> >>>> for all jobs.
> >>>>
> >>>> 2) Using the following code every time I open an HTable:
> >>>>     this.table = new HTable(new HBaseConfiguration(), "tablenameXYZ");
> >>>>     table.setAutoFlush(false);
> >>>>     table.setWriteBufferSize(1024 * 1024 * 12);
> >>>>
> >>>> 3) For every Put I do this:
> >>>>     Put put = new Put(Bytes.toBytes(out));
> >>>>     put.setWriteToWAL(false);
> >>>>
> >>>> 4) Change the no. of Reducers as per the no. of Workers. I believe
> >>>> the formula is: # of workers * 1.75.
> >>>>
> >>>> Any other hints? As always, I greatly appreciate the help. Thanks.
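
P.S. A couple of the write-path tips quoted above are easier to evaluate in
context, so here is a minimal sketch (mine, not from the thread) of the
pattern Something Something describes: open the HTable once in setup(),
buffer writes with auto-flush off, skip the WAL, and flush/close in
cleanup(). Written against the Hadoop 0.20 "mapreduce" API and the HBase
0.20 client; the table name, column family, qualifier, and value are
placeholders:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class BufferedHTableReducer extends Reducer<Text, Text, Text, Text> {

        private HTable table;

        @Override
        protected void setup(Context context) throws IOException {
            // Open the table once per task, not once per row.
            table = new HTable(new HBaseConfiguration(), "tablenameXYZ");
            table.setAutoFlush(false);                  // buffer Puts client-side
            table.setWriteBufferSize(1024 * 1024 * 12); // 12 MB write buffer
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException {
            Put put = new Put(Bytes.toBytes(key.toString()));
            // Skipping the WAL is faster, but rows still in the memstore
            // are lost if a region server dies before flushing.
            put.setWriteToWAL(false);
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                    Bytes.toBytes("value")); // placeholder family/qualifier/value
            table.put(put);
        }

        @Override
        protected void cleanup(Context context) throws IOException {
            table.flushCommits(); // push whatever is still buffered
            table.close();
        }
    }

Note that close() also flushes pending buffered Puts; the explicit
flushCommits() just makes that step visible. And setWriteToWAL(false)
trades durability for speed, so only use it where the job can be re-run.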
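P.P.S. To make tip 4 concrete, here is the same rule of thumb as code
(numWorkers is a placeholder; the Hadoop docs state the guideline as 0.95
or 1.75 * (nodes * reduce slots per node), while the thread quotes the
simpler workers * 1.75 variant):

    import org.apache.hadoop.mapreduce.Job;

    public class ReducerCount {

        // Rule of thumb from the thread: reducers = workers * 1.75,
        // e.g. 4 workers -> 7 reducers, 11 workers -> 19 reducers.
        static int reducersFor(int numWorkers) {
            return (int) (numWorkers * 1.75);
        }

        static void configure(Job job, int numWorkers) {
            job.setNumReduceTasks(reducersFor(numWorkers));
        }
    }

For the runs above that gives 7 reducers for 4 workers and 19 for 11, so
the 8 and 20 actually used were already in the right ballpark.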
