-----Original Message-----
From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of Chris Tarnas
Sent: Wednesday, April 28, 2010 6:27 PM
To: hbase-user@hadoop.apache.org
Subject: EC2 + Thrift inserts
Hello all,
First, thanks to all the HBase developers for producing this; it's a great project and I'm glad to be able to use it.
I'm looking for some help and hints here with insert performance.
I'm doing some benchmarking, testing how I can scale up using HBase, not really looking at raw speed. The testing is happening on EC2, using Andrew's scripts (thanks - those were very helpful) to set the cluster up, with a slightly customized version of the default AMIs (I added my application modules). I'm using HBase 0.20.3 and Hadoop 0.20.1. I've looked at the tips in the Wiki, and it looks like Andrew's scripts are already set up that way.
I'm inserting into HBase from a Hadoop streaming job that runs Perl and uses the Thrift gateway. I'm also using the transactional tables, so that alone could be the cause, but from what I can tell I don't think so. LZO compression is also enabled for the column families (much of the data is highly compressible). My cluster has 7 nodes: 5 regionservers, 1 master, and 1 ZooKeeper node. The regionservers and master are c1.xlarges. Each regionserver also runs a tasktracker that runs the Hadoop streaming jobs, plus its own Thrift server, and each mapper doing the load talks to its localhost's Thrift server.
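For reference, each mapper's insert path looks roughly like this - a minimal sketch assuming the Perl classes generated from Hbase.thrift (with namespace perl Hbase) and a Thrift server on the default port 9090; the table and column names here are placeholders, not my real schema:

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Thrift runtime plus the classes generated from Hbase.thrift.
  use Thrift::Socket;
  use Thrift::BufferedTransport;
  use Thrift::BinaryProtocol;
  use Hbase::Hbase;

  # Each mapper talks to the Thrift server on its own node.
  my $socket    = Thrift::Socket->new('localhost', 9090);
  my $transport = Thrift::BufferedTransport->new($socket);
  my $protocol  = Thrift::BinaryProtocol->new($transport);
  my $client    = Hbase::HbaseClient->new($protocol);
  $transport->open();

  # One Mutation per cell written; 'cf1:col' and 'mytable' are
  # placeholder names.
  my $mutation = Hbase::Mutation->new({
      column => 'cf1:col',
      value  => 'some data',
  });
  $client->mutateRow('mytable', '321Anur', [ $mutation ]);

  $transport->close();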
The row keys are a fixed string plus an incremental number, with the byte order then reversed, so runA123 becomes 321Anur. I thought of using a Murmur hash instead but was worried about collisions.
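Concretely, the key generation is just this (the prefix and counter values are illustrative):

  # Fixed prefix plus an incremental counter, bytes reversed so
  # sequential inserts spread across regions instead of all hitting
  # the same (tail) region.
  my $n   = 123;                          # incremental counter
  my $key = scalar reverse('runA' . $n);  # "runA123" -> "321Anur"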
As I add more insert jobs, each job's throughput goes down. Way down. I went from about 200 rows/sec/table per job with one job to about 24 rows/sec/table per job with 25 running jobs - so aggregate throughput only went from ~200 to ~600 rows/sec/table (25 x 24) despite 25x the processes. The servers are mostly idle. I'm loading into two tables: one has several indexes and I'm loading into three column families; the other has no indexes and one column family. Both tables currently have only two regions each.
The regionserver that serves the indexed table's regions is using the most CPU but is still 87% idle. The other servers are all at ~90% idle. There is no IO wait. The Perl processes are barely ticking over. Java on the most "loaded" server is using about 50-60% of one CPU.
Normally when I do a load on a pseudo-distributed HBase (my development platform), Perl's speed is the limiting factor and each Perl process uses about 85% of a CPU. On this cluster the Perl processes are using only 5-10% of a CPU, as they are all waiting on Thrift (HBase). When I run only 1 process on the cluster, Perl uses much more of a CPU, maybe 70%.
Any tips or help in getting the speed/scalability up would be great.
Please let me know if you need any other info.
As I send this - it looks like the main table has split again and is being served by three regionservers. My performance is going up a bit (now 35 rows/sec/table per process), but it still seems like I'm not using the full potential of even this limited EC2 setup: no IO wait and lots of idle CPU.
many thanks
-chris