On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas <c...@email.com> wrote:
> Thank you, it is nice to get this help.
>
> I definitely understand the overhead of writing the index, although it seems much worse than that overhead alone would indicate. If I understand you correctly, that is because all inserts into an IndexedTable are synchronized on one table? If that were switched to using an HTablePool, would it no longer be as severe a bottleneck (performance is about an order of magnitude better without the indexing)?

They are synchronized per region server, yes, and it _should_ be better with a pool since then you can do parallel inserts. Patching it doesn't seem hard, but maybe I'm missing some finer details since I don't usually work around that code.
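A minimal sketch of what the pool-based patch could look like, assuming the 0.20-era HTablePool API (PooledIndexWriter and writeIndexRow are hypothetical names; the real IndexedRegion index-maintenance logic is elided):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Put;

    // Instead of funnelling every index update through one synchronized
    // HTable per region server, handler threads borrow a table from a
    // shared pool, write, and return it, so index writes run in parallel.
    public class PooledIndexWriter {
        private final HTablePool pool;
        private final String indexTableName;

        public PooledIndexWriter(String indexTableName, int poolSize) {
            this.pool = new HTablePool(new HBaseConfiguration(), poolSize);
            this.indexTableName = indexTableName;
        }

        public void writeIndexRow(byte[] key, byte[] family,
                                  byte[] qualifier, byte[] value)
                throws IOException {
            HTable table = pool.getTable(indexTableName);
            try {
                Put put = new Put(key);
                put.add(family, qualifier, value);
                table.put(put); // callers no longer queue on one monitor
            } finally {
                pool.putTable(table); // return the handle for reuse
            }
        }
    }

With a pool of size N, up to N handler threads can write index puts concurrently instead of serializing on a single HTable instance.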
> I'm also using thrift to connect and am wondering if that itself puts an overall limit on scaling? It does seem that no matter how many more mappers and servers I add, even without indexing, I am capped at about 5k rows/sec total. I'm waiting a bit as the table grows so that it is split across more regionservers, hopefully that will help, but as far as I can tell I am not hitting any CPU or IO constraint during my tests.

I don't understand the "I'm also using thrift" and "how many more mappers" parts - are you using Thrift inside a map? Anyway, more clients won't help, since there's a single mega-serialization of all the inserts to the index table per region server. It's normal not to see any CPU/mem/IO contention since, in this case, it's all about the speed at which you can process a single row insertion. The rest of the threads just wait...

> -chris
>
> On Apr 30, 2010, at 3:00 PM, Jean-Daniel Cryans wrote:
>
>> The contrib packages don't get as much love as core HBase, so they tend to be less performant and/or less reliable and/or less maintained, etc. In this case the issue doesn't seem that bad, since it could just use an HTablePool, but using IndexedTables will definitely be slower than straight inserts since it writes to 2 tables (the main table and the index).
>>
>> J-D
>>
>> On Fri, Apr 30, 2010 at 2:53 PM, Chris Tarnas <c...@email.com> wrote:
>>> It appears that for multiple simultaneous loads, using IndexedTables is probably not the best choice?
>>>
>>> -chris
>>>
>>> On Apr 30, 2010, at 2:39 PM, Jean-Daniel Cryans wrote:
>>>
>>>> Yeah, more handlers won't do it here since there are tons of calls waiting on a single synchronized method. I guess the IndexedRegion should use a pool of HTables instead of a single one in order to improve indexing throughput.
>>>>
>>>> J-D
>>>>
>>>> On Fri, Apr 30, 2010 at 2:26 PM, Chris Tarnas <c...@email.com> wrote:
>>>>> I cranked up the handlers to 300 just in case and ran 40 mappers that loaded data via thrift. Each node runs its own thrift server. I saw an average of 18 rows/sec/mapper, with no node using more than 10% CPU and no IO wait. It seems that no matter how many mappers I throw at it, the total doesn't go much above 700 rows/sec, which seems very, very slow to me.
>>>>>
>>>>> Here is the thread dump from a node:
>>>>>
>>>>> http://pastebin.com/U3eLRdMV
>>>>>
>>>>> I do see quite a bit of waiting and some blocking in there; not sure exactly how to interpret it all, though.
>>>>>
>>>>> thanks for any help!
>>>>> -chris
>>>>>
>>>>> On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote:
>>>>>
>>>>>> One thing to check: at the peak of your load, run jstack on one of the regionservers and look at the handler threads - if all of them are doing something, you might be running into handler contention.
>>>>>>
>>>>>> Ultimately it is IO bound.
>>>>>>
>>>>>> -ryan
>>>>>>
>>>>>> On Thu, Apr 29, 2010 at 9:12 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>> They are all at 100, but none of the regionservers are loaded - most are at less than 20% CPU. Is this all network latency?
>>>>>>>
>>>>>>> -chris
>>>>>>>
>>>>>>> On Apr 29, 2010, at 8:29 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Every insert on an indexed table would require at the very least an RPC to a different regionserver. If the regionservers are busy, your request could wait in the queue for a moment.
>>>>>>>>
>>>>>>>> One param to tune would be the handler thread count. Set it to 100 at least.
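The handler count Ryan refers to is a server-side setting in hbase-site.xml; a minimal sketch with the suggested value, assuming the 0.20-era property name (it takes effect on region server restart):

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>100</value>
    </property>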
>>>>>>>>
>>>>>>>> On Thu, Apr 29, 2010 at 2:16 AM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>>>
>>>>>>>>> I just finished some testing with JDK 1.6 u17 - so far no performance improvement from just changing that. Disabling LZO compression gained a little (up to about 30 rows/sec from 25 rows/sec per thread). Turning off indexes helped the most - that brought me up to 115 rows/sec per thread, 2875 rows/sec total. A single perl/thrift process can load at over 350 rows/sec, so it's not scaling as well as I would have expected, even without the indexes.
>>>>>>>>>
>>>>>>>>> Are the transactional indexes that costly? What is the bottleneck there? CPU utilization and network packets went up when I disabled the indexes, so I don't think those are the bottlenecks for the indexes. I was even able to add another 15 insert processes (total of 40) and only lost about 10% of per-process throughput. I could probably go even higher: none of the nodes are above 60% CPU utilization, and IO wait was at most 3.5%.
>>>>>>>>>
>>>>>>>>> Each rowkey is unique, so there should not be any blocking on the row locks. I'll do more indexed tests tomorrow.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> -chris
>>>>>>>>>
>>>>>>>>> On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote:
>>>>>>>>>
>>>>>>>>>> Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or 17 and you should be good to go. _18 is a botched release if I ever saw one.
>>>>>>>>>>
>>>>>>>>>> -Todd
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Stack,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for looking. I checked the ganglia charts: no server was at more than ~20% CPU utilization at any time during the load test, and swap was never used. Network traffic was light - just running a count through the hbase shell generates much higher use. On the server hosting meta specifically, CPU was at about 15-20%, and IO wait never went above 3% and was usually near 0.
>>>>>>>>>>>
>>>>>>>>>>> The load also died with a thrift timeout on every single node (each node connects to localhost for its thrift server); it looks like a datanode just died and caused every thrift connection to time out - I'll have to up that limit to handle a node death.
>>>>>>>>>>>
>>>>>>>>>>> Checking the logs, this appears in the log of the region server hosting meta - it looks like the dead datanode caused this error:
>>>>>>>>>>>
>>>>>>>>>>> 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_508630839844593817_11180 java.io.IOException: Bad response 1 for block blk_508630839844593817_11180 from datanode 10.195.150.255:50010
>>>>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423)
>>>>>>>>>>>
>>>>>>>>>>> The regionserver log on the dead node, 10.195.150.255, has some more errors in it:
>>>>>>>>>>>
>>>>>>>>>>> http://pastebin.com/EFH9jz0w
>>>>>>>>>>>
>>>>>>>>>>> I found this in the .out file on the datanode:
>>>>>>>>>>>
>>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64)
>>>>>>>>>>> # Problematic frame:
>>>>>>>>>>> # V  [libjvm.so+0x62263c]
>>>>>>>>>>> #
>>>>>>>>>>> # An error report file with more information is saved as:
>>>>>>>>>>> # /usr/local/hadoop-0.20.1/hs_err_pid1364.log
>>>>>>>>>>> #
>>>>>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>>>>>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>>>>>>>>>>> #
>>>>>>>>>>>
>>>>>>>>>>> There is not a single error in the datanode's log, though. Also of note: this happened well into the test, so the node dying caused the load to abort but not the prior poor performance. Looking through the mailing list, it looks like java 1.6.0_18 has a bad rep, so I'll update the AMI (although I'm using the same JVM on other servers in the office without issue, with decent single-node performance, and it never dies...).
>>>>>>>>>>>
>>>>>>>>>>> Thanks for any help!
>>>>>>>>>>> -chris
>>>>>>>>>>>
>>>>>>>>>>> On Apr 28, 2010, at 10:10 PM, Stack wrote:
>>>>>>>>>>>
>>>>>>>>>>>> What is load on the server hosting meta like? Higher than on the others?
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 28, 2010, at 8:42 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi JG,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Speed is now down to 18 rows/sec/table per process.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is a log from a regionserver that is serving two of the regions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://pastebin.com/Hx5se0hz
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the GC log from the same server:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://pastebin.com/ChrRvxCx
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the master log:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://pastebin.com/L1Kn66qU
>>>>>>>>>>>>>
>>>>>>>>>>>>> The thrift server logs have nothing in them for the same time period.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance!
>>>>>>>>>>>>>
>>>>>>>>>>>>> -chris
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 28, 2010, at 7:32 PM, Jonathan Gray wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Chris,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's a really significant slowdown. I can't think of anything obvious that would cause that in your setup.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any chance of some regionserver and master logs from the time it was going slow? Is there any activity in the logs of the regionservers hosting the regions of the table being written to?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> JG
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of Chris Tarnas
>>>>>>>>>>>>>>> Sent: Wednesday, April 28, 2010 6:27 PM
>>>>>>>>>>>>>>> To: hbase-user@hadoop.apache.org
>>>>>>>>>>>>>>> Subject: EC2 + Thrift inserts
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> First, thanks to all the HBase developers for producing this - it's a great project and I'm glad to be able to use it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm looking for some help and hints with insert performance. I'm doing some benchmarking, testing how I can scale up using HBase, not really looking at raw speed. The testing is happening on EC2, using Andrew's scripts (thanks - those were very helpful) to set the nodes up, with a slightly customized version of the default AMIs (I added my application modules). I'm using HBase 0.20.3 and Hadoop 0.20.1. I've looked at the tips in the Wiki, and it looks like Andrew's scripts are already set up that way.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm inserting into HBase from a hadoop streaming job that runs perl and uses the thrift gateway. I'm also using the transactional tables, so that alone could be the cause, but from what I can tell I don't think so. LZO compression is also enabled for the column families (much of the data is highly compressible). My cluster has 7 nodes: 5 regionservers, 1 master, and 1 zookeeper. The regionservers and master are c1.xlarges. Each regionserver node runs the tasktracker that runs the hadoop streaming jobs, and each regionserver also runs its own thrift server. Each mapper that does the load talks to the localhost's thrift server.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The row keys are a fixed string + an incremental number, with the byte order then reversed, so runA123 becomes 321Anur. I thought of using a murmur hash but was worried about collisions.
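A minimal sketch of that key-reversal scheme (a hypothetical Java helper; the actual loader is Perl). Reversing puts the fast-changing digits first, so sequential inserts spread across the keyspace rather than all landing on the region holding the highest keys; the trade-off is that prefix range scans on the original key are no longer possible:

    // Hypothetical illustration of the rowkey scheme described above:
    // a fixed prefix plus an incrementing counter, reversed so the
    // fast-changing digits lead the key.
    public class ReversedKeys {
        static String rowKey(String prefix, long counter) {
            return new StringBuilder(prefix + counter).reverse().toString();
        }

        public static void main(String[] args) {
            System.out.println(rowKey("runA", 123)); // prints "321Anur"
            System.out.println(rowKey("runA", 124)); // prints "421Anur"
        }
    }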
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As I add more insert jobs, each job's throughput goes down. Way down. I went from about 200 rows/sec/table per job with one job to about 24 rows/sec/table per job with 25 running jobs. The servers are mostly idle. I'm loading into two tables: one has several indexes and I'm loading into three column families; the other has no indexes and one column family. Both tables currently have only two regions each.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The regionserver that serves the indexed table's regions is using the most CPU but is 87% idle. The other servers are all at ~90% idle. There is no IO wait. The perl processes are barely ticking over. Java on the most "loaded" server is using about 50-60% of one CPU.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Normally when I load into a pseudo-distributed hbase (my development platform), perl's speed is the limiting factor and it uses about 85% of a CPU. In this cluster the perl processes are using only 5-10% of a CPU, as they are all waiting on thrift (hbase). When I run only 1 process on the cluster, perl uses much more of a CPU, maybe 70%.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any tips or help in getting the speed/scalability up would be great. Please let me know if you need any other info.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As I send this, it looks like the main table has split again and is being served by three regionservers. My performance is going up a bit (now 35 rows/sec/table per process), but it still seems like I'm not using the full potential of even this limited EC2 setup: no IO wait and lots of idle CPU.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> many thanks
>>>>>>>>>>>>>>> -chris
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Todd Lipcon
>>>>>>>>>> Software Engineer, Cloudera