On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas <c...@email.com> wrote:
> Thank you, it is nice to get this help.
>
> I definitely understand the overhead of writing the index, although it seems much worse than that overhead alone would indicate. If I understand you correctly, that is because all inserts into an IndexedTable are synchronized on one table? If that were switched to using an HTablePool, would it no longer be as severe a bottleneck (performance is about an order of magnitude better without the indexing)?

They are synchronized per region server, yes, and it _should_ be better with a pool since then you can do parallel inserts. Patching it doesn't seem hard, but maybe I'm missing some finer details since I don't usually work around that code.
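A minimal sketch of what the pool-based patch could look like, assuming the 0.20-era HTablePool API (PooledIndexWriter and writeIndexRow are hypothetical names; the real IndexedRegion index-maintenance logic is elided):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Put;

    // Instead of funnelling every index update through one synchronized
    // HTable per region server, handler threads borrow a table from a
    // shared pool, write, and return it, so index writes run in parallel.
    public class PooledIndexWriter {
        private final HTablePool pool;
        private final String indexTableName;

        public PooledIndexWriter(String indexTableName, int poolSize) {
            this.pool = new HTablePool(new HBaseConfiguration(), poolSize);
            this.indexTableName = indexTableName;
        }

        public void writeIndexRow(byte[] key, byte[] family,
                                  byte[] qualifier, byte[] value)
                throws IOException {
            HTable table = pool.getTable(indexTableName);
            try {
                Put put = new Put(key);
                put.add(family, qualifier, value);
                table.put(put); // callers no longer queue on one monitor
            } finally {
                pool.putTable(table); // return the handle for reuse
            }
        }
    }

With a pool of size N, up to N handler threads can write index puts concurrently instead of serializing on a single HTable instance.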
> I'm also using thrift to connect and am wondering if that itself puts an overall limit on scaling? It does seem that no matter how many more mappers and servers I add, even without indexing, I am capped at about 5k rows/sec total. I'm waiting a bit as the table grows so that it is split across more regionservers, hopefully that will help, but as far as I can tell I am not hitting any CPU or IO constraint during my tests.

I don't understand the "I'm also using thrift" and "how many more mappers" parts - are you using Thrift inside a map? Anyway, more clients won't help, since there's a single mega-serialization of all the inserts to the index table per region server. It's normal not to see any CPU/mem/IO contention since, in this case, it's all about the speed at which you can process a single row insertion. The rest of the threads just wait...

> -chris
>
> On Apr 30, 2010, at 3:00 PM, Jean-Daniel Cryans wrote:
>
>> The contrib packages don't get as much love as core HBase, so they tend to be less performant and/or less reliable and/or less maintained, etc. In this case the issue doesn't seem that bad, since it could just use an HTablePool, but using IndexedTables will definitely be slower than straight inserts since it writes to 2 tables (the main table and the index).
>>
>> J-D
>>
>> On Fri, Apr 30, 2010 at 2:53 PM, Chris Tarnas <c...@email.com> wrote:
>>> It appears that for multiple simultaneous loads, using IndexedTables is probably not the best choice?
>>>
>>> -chris
>>>
>>> On Apr 30, 2010, at 2:39 PM, Jean-Daniel Cryans wrote:
>>>
>>>> Yeah, more handlers won't do it here since there are tons of calls waiting on a single synchronized method. I guess the IndexedRegion should use a pool of HTables instead of a single one in order to improve indexing throughput.
>>>>
>>>> J-D
>>>>
>>>> On Fri, Apr 30, 2010 at 2:26 PM, Chris Tarnas <c...@email.com> wrote:
>>>>> I cranked up the handlers to 300 just in case and ran 40 mappers that loaded data via thrift. Each node runs its own thrift server. I saw an average of 18 rows/sec/mapper, with no node using more than 10% CPU and no IO wait. It seems that no matter how many mappers I throw at it, the total doesn't go much above 700 rows/sec, which seems very, very slow to me.
>>>>>
>>>>> Here is the thread dump from a node:
>>>>>
>>>>> http://pastebin.com/U3eLRdMV
>>>>>
>>>>> I do see quite a bit of waiting and some blocking in there; not sure exactly how to interpret it all, though.
>>>>>
>>>>> thanks for any help!
>>>>> -chris
>>>>>
>>>>> On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote:
>>>>>
>>>>>> One thing to check: at the peak of your load, run jstack on one of the regionservers and look at the handler threads - if all of them are doing something, you might be running into handler contention.
>>>>>>
>>>>>> Ultimately it is IO bound.
>>>>>>
>>>>>> -ryan
>>>>>>
>>>>>> On Thu, Apr 29, 2010 at 9:12 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>> They are all at 100, but none of the regionservers are loaded - most are at less than 20% CPU. Is this all network latency?
>>>>>>>
>>>>>>> -chris
>>>>>>>
>>>>>>> On Apr 29, 2010, at 8:29 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Every insert on an indexed table would require at the very least an RPC to a different regionserver. If the regionservers are busy, your request could wait in the queue for a moment.
>>>>>>>>
>>>>>>>> One param to tune would be the handler thread count. Set it to 100 at least.
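The handler count Ryan refers to is a server-side setting in hbase-site.xml; a minimal sketch with the suggested value, assuming the 0.20-era property name (it takes effect on region server restart):

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>100</value>
    </property>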
>>>>>>>>
>>>>>>>> On Thu, Apr 29, 2010 at 2:16 AM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>>>
>>>>>>>>> I just finished some testing with JDK 1.6 u17 - so far no performance improvement from just changing that. Disabling LZO compression gained a little (up to about 30 rows/sec from 25 rows/sec per thread). Turning off indexes helped the most - that brought me up to 115 rows/sec per thread, 2875 rows/sec total. A single perl/thrift process can load at over 350 rows/sec, so it's not scaling as well as I would have expected, even without the indexes.
>>>>>>>>>
>>>>>>>>> Are the transactional indexes that costly? What is the bottleneck there? CPU utilization and network packets went up when I disabled the indexes, so I don't think those are the bottlenecks for the indexes. I was even able to add another 15 insert processes (total of 40) and only lost about 10% of per-process throughput. I could probably go even higher: none of the nodes are above 60% CPU utilization, and IO wait was at most 3.5%.
>>>>>>>>>
>>>>>>>>> Each rowkey is unique, so there should not be any blocking on the row locks. I'll do more indexed tests tomorrow.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> -chris
>>>>>>>>>
>>>>>>>>> On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote:
>>>>>>>>>
>>>>>>>>>> Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or 17 and you should be good to go. _18 is a botched release if I ever saw one.
>>>>>>>>>>
>>>>>>>>>> -Todd
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Stack,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for looking. I checked the ganglia charts: no server was at more than ~20% CPU utilization at any time during the load test, and swap was never used. Network traffic was light - just running a count through the hbase shell generates much higher use. On the server hosting meta specifically, CPU was at about 15-20%, and IO wait never went above 3% and was usually near 0.
>>>>>>>>>>>
>>>>>>>>>>> The load also died with a thrift timeout on every single node (each node connects to localhost for its thrift server); it looks like a datanode just died and caused every thrift connection to time out - I'll have to up that limit to handle a node death.
>>>>>>>>>>>
>>>>>>>>>>> Checking the logs, this appears in the log of the region server hosting meta - it looks like the dead datanode caused this error:
>>>>>>>>>>>
>>>>>>>>>>> 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_508630839844593817_11180 java.io.IOException: Bad response 1 for block blk_508630839844593817_11180 from datanode 10.195.150.255:50010
>>>>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423)
>>>>>>>>>>>
>>>>>>>>>>> The regionserver log on the dead node, 10.195.150.255, has some more errors in it:
>>>>>>>>>>>
>>>>>>>>>>> http://pastebin.com/EFH9jz0w
>>>>>>>>>>>
>>>>>>>>>>> I found this in the .out file on the datanode:
>>>>>>>>>>>
>>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64)
>>>>>>>>>>> # Problematic frame:
>>>>>>>>>>> # V  [libjvm.so+0x62263c]
>>>>>>>>>>> #
>>>>>>>>>>> # An error report file with more information is saved as:
>>>>>>>>>>> # /usr/local/hadoop-0.20.1/hs_err_pid1364.log
>>>>>>>>>>> #
>>>>>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>>>>>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>>>>>>>>>>> #
>>>>>>>>>>>
>>>>>>>>>>> There is not a single error in the datanode's log, though. Also of note: this happened well into the test, so the node dying caused the load to abort but not the prior poor performance. Looking through the mailing list, it looks like java 1.6.0_18 has a bad rep, so I'll update the AMI (although I'm using the same JVM on other servers in the office without issue, with decent single-node performance, and it never dies...).
>>>>>>>>>>>
>>>>>>>>>>> Thanks for any help!
>>>>>>>>>>> -chris
>>>>>>>>>>>
>>>>>>>>>>> On Apr 28, 2010, at 10:10 PM, Stack wrote:
>>>>>>>>>>>
>>>>>>>>>>>> What is load on the server hosting meta like? Higher than on the others?
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 28, 2010, at 8:42 PM, Chris Tarnas <c...@email.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi JG,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Speed is now down to 18 rows/sec/table per process.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is a log from a regionserver that is serving two of the regions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://pastebin.com/Hx5se0hz
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the GC log from the same server:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://pastebin.com/ChrRvxCx
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the master log:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://pastebin.com/L1Kn66qU
>>>>>>>>>>>>>
>>>>>>>>>>>>> The thrift server logs have nothing in them for the same time period.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance!
>>>>>>>>>>>>>
>>>>>>>>>>>>> -chris
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 28, 2010, at 7:32 PM, Jonathan Gray wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Chris,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's a really significant slowdown. I can't think of anything obvious that would cause that in your setup.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any chance of some regionserver and master logs from the time it was going slow? Is there any activity in the logs of the regionservers hosting the regions of the table being written to?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> JG
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of Chris Tarnas
>>>>>>>>>>>>>>> Sent: Wednesday, April 28, 2010 6:27 PM
>>>>>>>>>>>>>>> To: hbase-user@hadoop.apache.org
>>>>>>>>>>>>>>> Subject: EC2 + Thrift inserts
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> First, thanks to all the HBase developers for producing this - it's a great project and I'm glad to be able to use it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm looking for some help and hints with insert performance. I'm doing some benchmarking, testing how I can scale up using HBase, not really looking at raw speed. The testing is happening on EC2, using Andrew's scripts (thanks - those were very helpful) to set the nodes up, with a slightly customized version of the default AMIs (I added my application modules). I'm using HBase 0.20.3 and Hadoop 0.20.1. I've looked at the tips in the Wiki, and it looks like Andrew's scripts are already set up that way.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm inserting into HBase from a hadoop streaming job that runs perl and uses the thrift gateway. I'm also using the transactional tables, so that alone could be the cause, but from what I can tell I don't think so. LZO compression is also enabled for the column families (much of the data is highly compressible). My cluster has 7 nodes: 5 regionservers, 1 master, and 1 zookeeper. The regionservers and master are c1.xlarges. Each regionserver node runs the tasktracker that runs the hadoop streaming jobs, and each regionserver also runs its own thrift server. Each mapper that does the load talks to the localhost's thrift server.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The row keys are a fixed string + an incremental number, with the byte order then reversed, so runA123 becomes 321Anur. I thought of using a murmur hash but was worried about collisions.
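A minimal sketch of that key-reversal scheme (a hypothetical Java helper; the actual loader is Perl). Reversing puts the fast-changing digits first, so sequential inserts spread across the keyspace rather than all landing on the region holding the highest keys; the trade-off is that prefix range scans on the original key are no longer possible:

    // Hypothetical illustration of the rowkey scheme described above:
    // a fixed prefix plus an incrementing counter, reversed so the
    // fast-changing digits lead the key.
    public class ReversedKeys {
        static String rowKey(String prefix, long counter) {
            return new StringBuilder(prefix + counter).reverse().toString();
        }

        public static void main(String[] args) {
            System.out.println(rowKey("runA", 123)); // prints "321Anur"
            System.out.println(rowKey("runA", 124)); // prints "421Anur"
        }
    }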
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As I add more insert jobs, each job's throughput goes down. Way down. I went from about 200 rows/sec/table per job with one job to about 24 rows/sec/table per job with 25 running jobs. The servers are mostly idle. I'm loading into two tables: one has several indexes and I'm loading into three column families; the other has no indexes and one column family. Both tables currently have only two regions each.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The regionserver that serves the indexed table's regions is using the most CPU but is 87% idle. The other servers are all at ~90% idle. There is no IO wait. The perl processes are barely ticking over. Java on the most "loaded" server is using about 50-60% of one CPU.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Normally when I load into a pseudo-distributed hbase (my development platform), perl's speed is the limiting factor and it uses about 85% of a CPU. In this cluster the perl processes are using only 5-10% of a CPU, as they are all waiting on thrift (hbase). When I run only 1 process on the cluster, perl uses much more of a CPU, maybe 70%.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any tips or help in getting the speed/scalability up would be great. Please let me know if you need any other info.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As I send this, it looks like the main table has split again and is being served by three regionservers. My performance is going up a bit (now 35 rows/sec/table per process), but it still seems like I'm not using the full potential of even this limited EC2 setup: no IO wait and lots of idle CPU.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> many thanks
>>>>>>>>>>>>>>> -chris
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Todd Lipcon
>>>>>>>>>> Software Engineer, Cloudera