Hi JG,

Speed is now down to 18 rows/sec/table per process.
Here is a regionserver log that is serving two of the regions: http://pastebin.com/Hx5se0hz
Here is the GC log from the same server: http://pastebin.com/ChrRvxCx
Here is the master log: http://pastebin.com/L1Kn66qU

The thrift server logs have nothing in them in the same time period.

Thanks in advance!
-chris

On Apr 28, 2010, at 7:32 PM, Jonathan Gray wrote:

> Hey Chris,
>
> That's a really significant slowdown. I can't think of anything obvious that
> would cause that in your setup.
>
> Any chance of some regionserver and master logs from the time it was going
> slow? Is there any activity in the logs of the regionservers hosting the
> regions of the table being written to?
>
> JG
>
>> -----Original Message-----
>> From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of Chris
>> Tarnas
>> Sent: Wednesday, April 28, 2010 6:27 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: EC2 + Thrift inserts
>>
>> Hello all,
>>
>> First, thanks to all the HBase developers for producing this; it's a
>> great project and I'm glad to be able to use it.
>>
>> I'm looking for some help and hints with insert performance. I'm doing
>> some benchmarking, testing how I can scale up using HBase, not really
>> looking at raw speed. The testing is happening on EC2, using Andrew's
>> scripts (thanks - those were very helpful) to set the nodes up, with a
>> slightly customized version of the default AMIs (I added my application
>> modules). I'm using HBase 0.20.3 and Hadoop 0.20.1. I've looked at the
>> tips in the wiki and it looks like Andrew's scripts are already set up
>> that way.
>>
>> I'm inserting into HBase from a hadoop streaming job that runs perl and
>> uses the thrift gateway. I'm also using the transactional tables, so
>> that alone could be the cause, but from what I can tell I don't think
>> so. LZO compression is also enabled for the column families (much of
>> the data is highly compressible). My cluster has 7 nodes: 5
>> regionservers, 1 master and 1 zookeeper. The regionservers and master
>> are c1.xlarges. Each regionserver node runs the tasktracker that runs
>> the hadoop streaming jobs, and each regionserver also runs its own
>> thrift server. Each mapper that does the load talks to the localhost's
>> thrift server.
>>
>> The row keys are a fixed string + an incremental number, with the byte
>> order then reversed, so runA123 becomes 321Anur. I thought of using
>> murmur hash but was worried about collisions.
>>
>> As I add more insert jobs, each job's throughput goes down. Way down. I
>> went from about 200 rows/sec/table per job with one job to about 24
>> rows/sec/table per job with 25 running jobs. The servers are mostly
>> idle. I'm loading into two tables: one has several indexes and I'm
>> loading into three column families; the other has no indexes and one
>> column family. Both tables currently have only two regions each.
>>
>> The regionserver that serves the indexed table's regions is using the
>> most CPU but is 87% idle. The other servers are all at ~90% idle. There
>> is no IO wait. The perl processes are barely ticking over. Java on the
>> most "loaded" server is using about 50-60% of one CPU.
>>
>> Normally when I do a load in a pseudo-distributed hbase (my development
>> platform), perl's speed is the limiting factor and it uses about 85% of
>> a CPU. In this cluster the perl processes are using only 5-10% of a CPU
>> as they are all waiting on thrift (hbase). When I run only 1 process on
>> the cluster, perl uses much more of a CPU, maybe 70%.
>>
>> Any tips or help in getting the speed/scalability up would be great.
>> Please let me know if you need any other info.
>>
>> As I send this, it looks like the main table has split again and is
>> being served by three regionservers. My performance is going up a bit
>> (now 35 rows/sec/table per process), but it still seems like I'm not
>> using the full potential of even the limited EC2 system: no IO wait and
>> lots of idle CPU.
>>
>> many thanks
>> -chris
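[For readers following the thread, here is a minimal sketch of the key-reversal and per-row insert path Chris describes, in Perl against the Thrift gateway. It is illustrative only: the table name, column family, key prefix, and row count are made up, and the Hbase::Hbase / Hbase::HbaseClient / Hbase::Mutation package names assume the bindings were generated from Hbase.thrift with its default perl namespace; adjust to match your generated code.]

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Thrift Perl runtime plus the client generated from Hbase.thrift.
# Package names assume "namespace perl Hbase" in Hbase.thrift; they may
# differ if your bindings were generated another way.
use Thrift::Socket;
use Thrift::BufferedTransport;
use Thrift::BinaryProtocol;
use Hbase::Hbase;

# Reverse the bytes of "prefix + counter" so sequential inserts spread
# across regions instead of all landing on the newest one:
#   runA123 -> 321Anur
sub make_row_key {
    my ($prefix, $n) = @_;
    return scalar reverse($prefix . $n);
}

# Each mapper talks to the thrift server on localhost (default port 9090),
# matching the setup described above.
my $socket    = Thrift::Socket->new('localhost', 9090);
my $transport = Thrift::BufferedTransport->new($socket);
my $protocol  = Thrift::BinaryProtocol->new($transport);
my $client    = Hbase::HbaseClient->new($protocol);
$transport->open();

# Hypothetical table and column names, for illustration only.
for my $n (1 .. 1000) {
    my $row = make_row_key('runA', $n);
    my $mutation = Hbase::Mutation->new({
        column => 'data:value',          # family:qualifier
        value  => "payload for row $n",
    });
    # One thrift round trip per row; batching rows with mutateRows would
    # cut the per-call overhead considerably.
    $client->mutateRow('mytable', $row, [$mutation]);
}

$transport->close();
```

[As a general note rather than a diagnosis of this particular cluster: batching rows per Thrift call and pre-splitting the tables so writes start out spread over more than two regions are the usual first things to try when per-process throughput flattens out while the servers sit mostly idle.]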