On Thu, Jan 14, 2010 at 5:20 AM, Dmitriy Lyfar <[email protected]> wrote:
> Hi,
>
>> > Speed still the same (about 1K rows per second).
>
>> This seems low for your 6 node cluster.
>>
>> If you look at the servers, are they CPU- or IO-bound in any way?
>>
>> How many clients are you running now?
>
> Now I'm running 1-2 clients in parallel. If I run more, timings grow.
> Also I do not use the namenode as a datanode or as a regionserver. There is
> only namenode/secondarynn/master/zk on that machine.

Understood, but is that because the regionserver+datanode load goes up as you
add more clients? Or are the timeouts caused by something else? (Are the
clients running on the machine that has NN/Master/ZK? If so, could the clients
be taking resources from those servers in a way that slows down the whole
cluster? Is load on that machine high when the clients are running? That kind
of thing.)

>> This is not a new table, right? (I see there is an existing table in your
>> cluster looking at the regionserver log.) It's an existing table of many
>> regions?
>
> Yes. I have 7 test tables. Each client randomly selects the table it will
> use at start.
> After some tests I now have about 800 regions per regionserver and 7 tables.

That's a lot of regions per regionserver. Just FYI.

>> You have upped the handlers in HBase. Have you done the same for the
>> datanodes (in case we are bottlenecking there)?
>
> I've updated this setting for Hadoop also. As I understand it, if something
> is wrong with the number of handles, I will get a TooManyOpenFiles exception
> and the datanode will stop working.

No. Your change to ulimit addresses that issue. Upping the handlers makes it
so requests can get into the server; otherwise they are blocked until a
handler becomes available. If servers are powerful, as yours are, they can
handle more work concurrently than the default handler count would let in.

> All works fine for now. I've attached metrics from one of the datanodes; the
> other nodes show almost the same picture. Please look at the throughput
> graph. It seems odd to me that the node has almost equal inbound and
> outbound traffic (render.png). These pictures were snapped while running two
> clients and then, after a break, one client.

I'll take a look.

>> > Random ints play the role of row keys now (i.e. uniform random
>> > distribution on (0, 100 * 1000)).
>> > What do you think, is 5GB for HBase and 2GB for HDFS enough?
>>
>> Yes, that should be good. Writing, you are not using that memory in the
>> regionserver though; maybe you should go with bigger regions if you have
>> 25k cells. Are you using compression?
>
> Yes, 25Kb is important, but I think in the production system we will have
> 70-80% of 5-10Kb rows, about 20% of 25Kb rows and 10% of > 25Kb rows. I'm
> not using any compression for columns because I was thinking about
> throughput. But I was planning to enable compression once I can achieve
> 80-90 Mb/sec in this test.

Currently we are at what?

>> I took a look at your regionserver log. It's just after an open of the
>> regionserver. I see no activity other than the opening of a few regions.
>> These regions do happen to have a lot of store files, so we're starting up
>> compactions, but that all should be fine. I'd be interested in seeing a log
>> snippet from a regionserver under load.
>
> OK, there are some tests running now which will be interesting, I think;
> I'll provide regionserver logs a bit later.
> Thank you for your help!

Thanks for your patience sticking with it.
St.Ack
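
[Editor's note: not part of the thread. A minimal sketch, using the 0.20-era
HBase client API, of the kind of write load described above: uniform random
integer row keys on (0, 100 * 1000) and ~25Kb values, with client-side write
buffering so puts go out in batches rather than one RPC per row. The table
name "test_table", family "data", and qualifier "blob" are placeholders, not
names from the thread.]

```java
import java.util.Random;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomWriteSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "test_table");
    table.setAutoFlush(false);                 // buffer puts client-side
    table.setWriteBufferSize(8 * 1024 * 1024); // flush in ~8MB batches

    Random rnd = new Random();
    byte[] value = new byte[25 * 1024];        // ~25Kb cell, as in the test
    rnd.nextBytes(value);

    for (int i = 0; i < 100000; i++) {
      int key = rnd.nextInt(100 * 1000);       // uniform keys on (0, 100 * 1000)
      Put put = new Put(Bytes.toBytes(key));
      put.add(Bytes.toBytes("data"), Bytes.toBytes("blob"), value);
      table.put(put);
    }
    table.flushCommits();                      // push any remaining buffered puts
    table.close();
  }
}
```

The handler counts discussed above are server-side settings
(hbase.regionserver.handler.count for HBase, dfs.datanode.handler.count for
the datanodes); the write buffer here only changes how the client batches its
requests.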
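
[Editor's note: likewise not from the thread. A hedged sketch of the "bigger
regions" and compression suggestions, again assuming the 0.20-era admin API:
region size is a table-level setting (max store file size before a split) and
compression is a per-column-family setting. The table name, family name, and
the 1GB / LZO choices are illustrative assumptions, not values from the
thread.]

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateTableSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();

    HTableDescriptor desc = new HTableDescriptor("test_table");
    desc.setMaxFileSize(1024L * 1024 * 1024);  // split regions at ~1GB instead of the default

    HColumnDescriptor family = new HColumnDescriptor("data");
    family.setCompressionType(Compression.Algorithm.LZO); // per-family compression (LZO was common at the time)
    desc.addFamily(family);

    new HBaseAdmin(conf).createTable(desc);
  }
}
```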
