Hi JG,

Speed is now down to 18 rows/sec/table per process.
Here is a regionserver log that is serving two of the regions: http://pastebin.com/Hx5se0hz
Here is the GC log from the same server: http://pastebin.com/ChrRvxCx
Here is the master log: http://pastebin.com/L1Kn66qU

The thrift server logs have nothing in them in the same time period.

Thanks in advance!
-chris

On Apr 28, 2010, at 7:32 PM, Jonathan Gray wrote:

> Hey Chris,
>
> That's a really significant slowdown. I can't think of anything obvious that
> would cause that in your setup.
>
> Any chance of some regionserver and master logs from the time it was going
> slow? Is there any activity in the logs of the regionservers hosting the
> regions of the table being written to?
>
> JG
>
>> -----Original Message-----
>> From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of Chris
>> Tarnas
>> Sent: Wednesday, April 28, 2010 6:27 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: EC2 + Thrift inserts
>>
>> Hello all,
>>
>> First, thanks to all the HBase developers for producing this; it's a
>> great project and I'm glad to be able to use it.
>>
>> I'm looking for some help and hints with insert performance. I'm doing
>> some benchmarking, testing how I can scale up using HBase, not really
>> looking at raw speed. The testing is happening on EC2, using Andrew's
>> scripts (thanks - those were very helpful) to set the nodes up, with a
>> slightly customized version of the default AMIs (I added my application
>> modules). I'm using HBase 0.20.3 and Hadoop 0.20.1. I've looked at the
>> tips in the wiki and it looks like Andrew's scripts are already set up
>> that way.
>>
>> I'm inserting into HBase from a hadoop streaming job that runs perl and
>> uses the thrift gateway. I'm also using the transactional tables, so
>> that alone could be the cause, but from what I can tell I don't think
>> so. LZO compression is also enabled for the column families (much of
>> the data is highly compressible). My cluster has 7 nodes: 5
>> regionservers, 1 master and 1 zookeeper. The regionservers and master
>> are c1.xlarges. Each regionserver node runs the tasktracker that runs
>> the hadoop streaming jobs, and each regionserver also runs its own
>> thrift server. Each mapper that does the load talks to the localhost's
>> thrift server.
>>
>> The row keys are a fixed string + an incremental number, with the byte
>> order then reversed, so runA123 becomes 321Anur. I thought of using
>> murmur hash but was worried about collisions.
>>
>> As I add more insert jobs, each job's throughput goes down. Way down. I
>> went from about 200 rows/sec/table per job with one job to about 24
>> rows/sec/table per job with 25 running jobs. The servers are mostly
>> idle. I'm loading into two tables: one has several indexes and I'm
>> loading into three column families; the other has no indexes and one
>> column family. Both tables currently have only two regions each.
>>
>> The regionserver that serves the indexed table's regions is using the
>> most CPU but is 87% idle. The other servers are all at ~90% idle. There
>> is no IO wait. The perl processes are barely ticking over. Java on the
>> most "loaded" server is using about 50-60% of one CPU.
>>
>> Normally when I do a load in a pseudo-distributed hbase (my development
>> platform), perl's speed is the limiting factor and it uses about 85% of
>> a CPU. In this cluster the perl processes are using only 5-10% of a CPU
>> as they are all waiting on thrift (hbase). When I run only 1 process on
>> the cluster, perl uses much more of a CPU, maybe 70%.
>>
>> Any tips or help in getting the speed/scalability up would be great.
>> Please let me know if you need any other info.
>>
>> As I send this, it looks like the main table has split again and is
>> being served by three regionservers. My performance is going up a bit
>> (now 35 rows/sec/table per process), but it still seems like I'm not
>> using the full potential of even the limited EC2 system: no IO wait and
>> lots of idle CPU.
>>
>> many thanks
>> -chris
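[For readers following the thread, here is a minimal sketch of the key-reversal and per-row insert path Chris describes, in Perl against the Thrift gateway. It is illustrative only: the table name, column family, key prefix, and row count are made up, and the Hbase::Hbase / Hbase::HbaseClient / Hbase::Mutation package names assume the bindings were generated from Hbase.thrift with its default perl namespace; adjust to match your generated code.]

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Thrift Perl runtime plus the client generated from Hbase.thrift.
# Package names assume "namespace perl Hbase" in Hbase.thrift; they may
# differ if your bindings were generated another way.
use Thrift::Socket;
use Thrift::BufferedTransport;
use Thrift::BinaryProtocol;
use Hbase::Hbase;

# Reverse the bytes of "prefix + counter" so sequential inserts spread
# across regions instead of all landing on the newest one:
#   runA123 -> 321Anur
sub make_row_key {
    my ($prefix, $n) = @_;
    return scalar reverse($prefix . $n);
}

# Each mapper talks to the thrift server on localhost (default port 9090),
# matching the setup described above.
my $socket    = Thrift::Socket->new('localhost', 9090);
my $transport = Thrift::BufferedTransport->new($socket);
my $protocol  = Thrift::BinaryProtocol->new($transport);
my $client    = Hbase::HbaseClient->new($protocol);
$transport->open();

# Hypothetical table and column names, for illustration only.
for my $n (1 .. 1000) {
    my $row = make_row_key('runA', $n);
    my $mutation = Hbase::Mutation->new({
        column => 'data:value',          # family:qualifier
        value  => "payload for row $n",
    });
    # One thrift round trip per row; batching rows with mutateRows would
    # cut the per-call overhead considerably.
    $client->mutateRow('mytable', $row, [$mutation]);
}

$transport->close();
```

[As a general note rather than a diagnosis of this particular cluster: batching rows per Thrift call and pre-splitting the tables so writes start out spread over more than two regions are the usual first things to try when per-process throughput flattens out while the servers sit mostly idle.]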