Hi Stack,

After some overnight tests I have a log of one regionserver in debug mode.
I've uploaded it here: http://slil.ru/28491882 (the download begins after a
10 second wait).
But there are some problems I see after these tests: I regularly get the
following exception in the client logs:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server, retryOnlyOne=true, index=0, islastrow=false,
tries=9, numtries=10, i=179, listsize=883,
region=4,\x00\x00F\x16,1263403845332 for region 4,\x00\x00E5,1263403845332,
row '\x00\x00E\xA2', but failed after 10 attempts.
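
For reference, I believe the retry behaviour above is controlled by these
client-side properties in hbase-site.xml (the values below are just what I
understand the defaults to be, shown only to make clear which settings I mean):

  <property>
    <name>hbase.client.retries.number</name>
    <value>10</value>   <!-- matches the "failed after 10 attempts" above -->
  </property>
  <property>
    <name>hbase.client.pause</name>
    <value>1000</value> <!-- pause between retries in ms, I believe -->
  </property>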


But I see that all the servers are online. I can only suppose that sometimes
there is an insufficient number of RPC handlers. I would also like to ask how
replication in Hadoop works. You can see in the pictures from my previous post
that inbound traffic equals outbound traffic for a server under load. Does that
mean that Hadoop replicates each block to another server as we write that block
on the current server? Does replication have any influence on read/write speed
(i.e. is there a case where replication impacts network throughput and
read/write operations become slower)?
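
In case it matters, these are the handler and replication settings I was
referring to (real property names, but the values are just the ones I am
experimenting with, not recommendations):

  <!-- hbase-site.xml on the regionservers -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>50</value>   <!-- raised from the default -->
  </property>

  <!-- hdfs-site.xml on the datanodes -->
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>   <!-- raised from the default -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>    <!-- the replication factor my question is about -->
  </property>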

2010/1/14 Dmitriy Lyfar <[email protected]>

> Hi,
>
> > Speed still the same (about 1K rows per second).
>> >
>>
>> This seems low for your 6 node cluster.
>>
>> If you look at the servers, are they CPU or IO bound in any way?
>>
>> How many clients do you have running now?
>>
>
> Now I'm running 1-2 clients in parallel. If I run more, the timings grow.
> Also, I don't use the namenode as a datanode or a regionserver; that machine
> runs only the namenode/secondarynn/master/zk.
>
>
>>
>> This is not a new table right?  (I see there is an existing table in your
>> cluster looking at the regionserver log.)  It's an existing table of many
>> regions?
>>
>
> Yes. I have 7 test tables. The client randomly selects the table to use at
> the start.
> Now, after some tests, I have about 800 regions per regionserver and 7
> tables.
>
>
>>
>> You have upped the handlers in hbase.  Have you done the same for the
>> datanodes (in case we are bottlenecking there)?
>>
>
> I've updated this setting for Hadoop also. As I understand it, if something
> is wrong with the number of handlers, I will get a TooManyOpenFiles exception
> and the datanode will stop working.
> Everything works fine for now. I've attached metrics from one of the
> datanodes; the other nodes show almost the same picture. Please look at the
> throughput graph: it seems illogical to me that a node has almost equal
> inbound and outbound traffic (render.png). These pictures were snapped while
> running two clients, and then after a break I ran one client.
>
>
>> > Random ints play the role of row keys now (i.e. a uniform random
>> > distribution on (0, 100 * 1000)).
>> > What do you think, is 5GB for hbase and 2GB for hdfs enough?
>>
>> Yes, that should be good.  Writing, you are not using that memory in the
>> regionserver though; maybe you should go with bigger regions if you have
>> 25k cells.  Are you using compression?
>>
>
> Yes, the 25KB rows are important, but I think in the production system we
> will have 70-80% rows of 5-10KB, about 20% of 25KB rows and 10% of rows
> larger than 25KB. I'm not using any compression for columns because I was
> thinking about throughput, but I was planning to enable compression once I
> can achieve 80-90 MB/sec in this test.
>
>
>>
>> I took a look at your regionserver log.  It's just after an open of the
>> regionserver.  I see no activity other than the opening of a few regions.
>>  These regions do happen to have a lot of store files, so we're starting up
>> compactions, but that all should be fine.  I'd be interested in seeing a log
>> snippet from a regionserver under load.
>>
>
> OK, there are some tests running now which I think will be interesting;
> I'll provide the regionserver logs a bit later.
> Thank you for your help!
>
> --
> Regards, Lyfar Dmitriy
>
>


-- 
Regards, Lyfar Dmitriy
mailto: [email protected]
jabber: [email protected]
