How big is your dataset?

J-D
On Tue, Jan 26, 2010 at 8:47 AM, Something Something <[email protected]> wrote:
> I have noticed some strange performance numbers on EC2. If someone can give
> me some hints to improve performance, that would be greatly appreciated.
> Here are the details:
>
> I have a process that runs a series of jobs under Hadoop 0.20.1 & HBase
> 0.20.2. I ran the *exact* same process with the following configurations:
>
> 1) 1 Master & 4 Workers (*c1.xlarge* instances) & 1 Zookeeper (*c1.medium*)
> with *8 Reducers* for every Reduce task. The process completed in *849*
> seconds.
>
> 2) 1 Master, 4 Workers & 1 Zookeeper, *ALL m1.small* instances, with *8
> Reducers* for every Reduce task. The process completed in *906* seconds.
>
> 3) 1 Master, *11* Workers & *3* Zookeepers, *ALL m1.small* instances, with
> *20 Reducers* for every Reduce task. The process completed in *984* seconds!
>
> Two main questions:
>
> 1) It's totally surprising that with 11 workers and 20 Reducers it runs
> slower than with fewer machines of exactly the same type and fewer
> reducers.
> 2) As expected, it runs faster on c1.xlarge, but the performance improvement
> doesn't justify the much higher cost. I must not be fully utilizing the
> machines, but I don't know how to fix that.
>
> Here are some of the performance-improvement tricks I have learned from
> this mailing list in the past and am using:
>
> 1) conf.set("hbase.client.scanner.caching", "30"); I have this for all
> jobs.
>
> 2) Using the following code every time I open an HTable:
> this.table = new HTable(new HBaseConfiguration(), "tablenameXYZ");
> table.setAutoFlush(false);
> table.setWriteBufferSize(1024 * 1024 * 12);
>
> 3) For every Put I do this:
> Put put = new Put(Bytes.toBytes(out));
> put.setWriteToWAL(false);
>
> 4) Change the number of Reducers as per the number of Workers. I believe
> the formula is: # of workers * 1.75.
>
> Any other hints? As always, I greatly appreciate the help. Thanks.
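Point 4 of the quoted post (reducers = workers * 1.75) can be sketched as below. This is a minimal illustration, not code from the post: the class and method names are made up, and the rounding choice is an assumption — the Hadoop docs phrase the heuristic as 0.95 or 1.75 times (nodes * max reduce slots per node), so per-worker * 1.75 is the simplified form the poster described.

```java
// Illustrative sketch of the "workers * 1.75" reducer heuristic from the
// quoted email. Class/method names are hypothetical; rounding to the
// nearest integer is an assumption, not something the post specifies.
public class ReducerCount {

    // Returns the suggested reduce-task count for a given number of
    // worker nodes, using the 1.75-per-worker rule of thumb.
    static int recommendedReducers(int workers) {
        return (int) Math.round(workers * 1.75);
    }

    public static void main(String[] args) {
        System.out.println(recommendedReducers(4));  // 4 workers -> 7
        System.out.println(recommendedReducers(11)); // 11 workers -> 19
    }
}
```

Note the poster's actual choices (8 reducers for 4 workers, 20 for 11) are close to but not exactly this formula; the full Hadoop guideline multiplies by the number of reduce slots per node, not just the node count.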
