Also, I assume your HDFS is provisioned on locally attached disk (aka instance store) and not on EBS?
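A quick way to check (the property name below is for Hadoop 1.x, and the paths are only hypothetical examples of instance-store mounts, not your actual Whirr layout) is to look at dfs.data.dir in hdfs-site.xml on a DataNode and confirm the directories live on the ephemeral volumes rather than on the EBS-backed root device:

  <!-- hdfs-site.xml on a DataNode; illustrative values only -->
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/hadoop/dfs/data,/mnt2/hadoop/dfs/data</value>
  </property>

Running df -h against those directories should show the instance-store devices rather than the root volume. A compaction-tuning sketch follows at the bottom of this message.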
On Wed, Jan 15, 2014 at 3:26 PM, Andrew Purtell <[email protected]> wrote:

> m1.xlarge is a poorly provisioned instance type, with low PPS at the
> network layer. Can you try a type advertised to have "high" I/O
> performance?
>
>
> On Wed, Jan 15, 2014 at 12:33 PM, Vladimir Rodionov <
> [email protected]> wrote:
>
>> This is something that definitely needs to be solved/fixed/resolved.
>>
>> I am running the YCSB benchmark on AWS EC2 against a small HBase cluster:
>>
>> 5 x m1.xlarge as RegionServers
>> 1 x m1.xlarge for the HBase master and ZooKeeper
>>
>> Whirr 0.8.2 (with many hacks) is used to provision HBase.
>>
>> I am running one YCSB client (100% insert ops) throttled at 5K ops/sec:
>>
>> ./bin/ycsb load hbase -P workloads/load20m -p columnfamily=family -s -threads 10 -target 5000
>>
>> OUTPUT:
>>
>> 1120 sec: 5602339 operations; 4999.7 current ops/sec; [INSERT AverageLatency(us)=225.53]
>> 1130 sec: 5652117 operations; 4969.35 current ops/sec; [INSERT AverageLatency(us)=203.31]
>> 1140 sec: 5665210 operations; 1309.04 current ops/sec; [INSERT AverageLatency(us)=17.13]
>> 1150 sec: 5665210 operations; 0 current ops/sec;
>> 1160 sec: 5665210 operations; 0 current ops/sec;
>> 1170 sec: 5665210 operations; 0 current ops/sec;
>> 1180 sec: 5665210 operations; 0 current ops/sec;
>> 1190 sec: 5665210 operations; 0 current ops/sec;
>>
>> 2014-01-15 15:19:34,139 Thread-2 WARN [HConnectionManager$HConnectionImplementation] Failed all from
>>   region=usertable,user6039,1389811852201.40518862106856d23b883e5d543d0b89.,
>>   hostname=ip-10-45-174-120.ec2.internal, port=60020
>> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to
>>   ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on socket timeout exception:
>>   java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read.
>>   ch : java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
>>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1708)
>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1560)
>>     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:994)
>>     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:850)
>>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:826)
>>     at com.yahoo.ycsb.db.HBaseClient.update(HBaseClient.java:328)
>>     at com.yahoo.ycsb.db.HBaseClient.insert(HBaseClient.java:357)
>>     at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>>     at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>>     at com.yahoo.ycsb.ClientThread.run(Client.java:269)
>> Caused by: java.net.SocketTimeoutException: Call to ip-10-45-174-120.ec2.internal/10.45.174.120:60020
>>   failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while
>>   waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
>>   local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
>>     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1043)
>>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1016)
>>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
>>     at com.sun.proxy.$Proxy5.multi(Unknown Source)
>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1537)
>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1535)
>>     at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:229)
>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1544)
>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1532)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:701)
>>
>>
>> SKIPPED A LOT
>>
>>
>> 1200 sec: 5674180 operations; 896.82 current ops/sec; [INSERT AverageLatency(us)=7506.37]
>> 1210 sec: 6022326 operations; 34811.12 current ops/sec; [INSERT AverageLatency(us)=1998.26]
>> 1220 sec: 6102627 operations; 8018.07 current ops/sec; [INSERT AverageLatency(us)=395.11]
>> 1230 sec: 6152632 operations; 5000 current ops/sec; [INSERT AverageLatency(us)=182.53]
>> 1240 sec: 6202641 operations; 4999.9 current ops/sec; [INSERT AverageLatency(us)=201.76]
>> 1250 sec: 6252642 operations; 4999.6 current ops/sec; [INSERT AverageLatency(us)=190.46]
>> 1260 sec: 6302653 operations; 5000.1 current ops/sec; [INSERT AverageLatency(us)=212.31]
>> 1270 sec: 6352660 operations; 5000.2 current ops/sec; [INSERT AverageLatency(us)=217.77]
>> 1280 sec: 6402731 operations; 5000.1 current ops/sec; [INSERT AverageLatency(us)=195.83]
>> 1290 sec: 6452740 operations; 4999.9 current ops/sec; [INSERT AverageLatency(us)=232.43]
>> 1300 sec: 6502743 operations; 4999.8 current ops/sec; [INSERT AverageLatency(us)=290.52]
>> 1310 sec: 6552755 operations; 5000.2 current ops/sec; [INSERT AverageLatency(us)=259.49]
>>
>>
>> As you can see, there is a total write stall of roughly 60 seconds across the cluster, which I suppose correlates 100% with (minor) compactions starting.
>>
>> MAX_FILESIZE = 5GB
>> Regions of 'usertable': 50
>>
>> I would appreciate any advice on how to get rid of these stalls. 5K ops/sec is quite a moderate load, even for 5 lousy AWS servers. Or is it not?
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [email protected]
>>
>>
>> Confidentiality Notice: The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.
>> If you have received this message in error, please immediately notify the sender and/or [email protected] and delete or destroy any copy of this message and its attachments.
>>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)


--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
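For the stalls themselves, the first things usually checked are the HBase write-blocking thresholds: a region blocks updates when its memstore reaches hbase.hregion.memstore.flush.size times hbase.hregion.memstore.block.multiplier, or when a store accumulates more than hbase.hstore.blockingStoreFiles files while compactions catch up. The snippet below is only a sketch of the kind of hbase-site.xml changes people try in this situation; the values are illustrative starting points, not tested recommendations for this cluster:

  <!-- hbase-site.xml on the RegionServers; illustrative values only -->
  <property>
    <!-- default is 7; updates block once a single store has this many files -->
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>20</value>
  </property>
  <property>
    <!-- default is 2; writes block when a memstore reaches multiplier x flush size -->
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value>
  </property>
  <property>
    <!-- how long a blocked region waits for compactions before unblocking anyway, in ms -->
    <name>hbase.hstore.blockingWaitTime</name>
    <value>30000</value>
  </property>

Raising the blocking thresholds trades longer compaction queues (and more store files to read) for fewer hard stalls, so it only papers over the problem if the underlying disks or network cannot keep up with the sustained write rate.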
