On Nov 9, 2010, at 14:58 , Kevin Smith wrote:

> 
> On Nov 9, 2010, at 5:01 AM, Karsten Thygesen wrote:
> 
>> Hi
>> 
>> OK, we will use a larger ringsize next time and will consider a data reload.
>> 
>> Regarding the metrics: the servers are dedicated to Riak and are not used 
>> for anything else. They are new HP servers with 8 cores each and 4x146GB 10K 
>> RPM SAS disks in a concatenated mirror setup. We use Solaris with ZFS as the 
>> filesystem, and I have turned off atime updates on the data partition.
>> 
>> The pool is built as such:
>> 
>> pool: pool01
>> state: ONLINE
>> scrub: scrub completed after 0h0m with 0 errors on Tue Oct 26 21:25:05 2010
>> config:
>> 
>>       NAME          STATE     READ WRITE CKSUM
>>       pool01        ONLINE       0     0     0
>>         mirror-0    ONLINE       0     0     0
>>           c0t0d0s7  ONLINE       0     0     0
>>           c0t1d0s7  ONLINE       0     0     0
>>         mirror-1    ONLINE       0     0     0
>>           c0t2d0    ONLINE       0     0     0
>>           c0t3d0    ONLINE       0     0     0
>> 
>> errors: No known data errors
>> 
>> so it is as fast as possible. 
>> 
>> However - we use the ZFS default recordsize, which is 128 KB - is that 
>> optimal with Bitcask as the backend? It seems rather large, but what is 
>> optimal for Bitcask?
> 
> I don't have much experience tuning Solaris or ZFS for Riak. This is a 
> question best asked of Ryan and I will make sure he sees this.

Thanks!
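In the meantime, a minimal sketch of how one could experiment with a smaller recordsize on a dedicated dataset for the Riak data directory. The dataset name `pool01/riak` and the 16K value are assumptions, not recommendations - the right value depends on typical object size, and recordsize only affects files written after the change:

```shell
# Hypothetical example: carve out a dedicated ZFS dataset for the
# Bitcask data directory and lower its recordsize.
# "pool01/riak" and "16K" are assumptions for illustration only.
zfs create pool01/riak
zfs set recordsize=16K pool01/riak

# Verify the property took effect on the new dataset.
zfs get recordsize pool01/riak
```

Note that an existing Bitcask directory would need its data rewritten (or reloaded) onto the new dataset for the change to apply.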

> 
>> 
>> The cluster is 4 servers with gigabit connections, located in the same 
>> datacenter on the same switch. The loadbalancer is a Zeus ZTM, which does 
>> quite a few HTTP optimizations, including extended reuse of HTTP 
>> connections, and we usually see far better response times going through the 
>> loadbalancer than hitting a node directly.
> 
> Hmmm. Can you share what the performance times are like for direct cluster 
> access?

In this case, there is no measurable difference whether we ask a cluster node 
directly or go through the loadbalancer. The difference is largest when we hit 
it with a lot of small requests, but that is not the case here.

> 
>> 
>> When we run the test, each Riak node is only about 100% CPU loaded (which 
>> on Solaris means it only uses one of the 8 cores). We have seen spikes in 
>> the 160% area, but anything below 800% means we are not CPU bound. So 
>> all in all, the CPU load is between 5 and 10%.
> 
> Can you send me the code you're using for the performance test? I'd like to 
> run the exact code on my test hardware and see if that reveals anything.

Jan, can you please provide the test client?

> 
> Also, low CPU usage might indicate you are IO bound. Do you know if Riak 
> processes are spending much time waiting for IO to complete?
> 

It does not seem so. The servers are not IO bound; there is plenty of network 
capacity and the disks are only around 10% loaded.
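For anyone wanting to double-check this on Solaris, one common way (a diagnostic sketch, not from the original thread) is extended iostat output, where `%b` is percent-busy per device and `wait`/`actv` show queue depths:

```shell
# Solaris: extended per-device IO statistics at 5-second intervals.
# -x = extended stats, -n = descriptive device names.
# Watch the %b (percent busy) and wait/actv (queue) columns;
# sustained high values there would indicate an IO bottleneck.
iostat -xn 5
```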

My largest suspicion is the data model - with a 4-node cluster, a link walk 
that needs to combine around 500-600 documents will take quite some time, but 
we still feel the numbers are very high.

Perhaps we should consider a data model where we collect, say, 100 documents 
in a basket and then only have to link-walk 4-5 baskets to return an answer? 
Tempting, performance-wise, but it makes the data a lot harder to maintain 
afterwards, as we can no longer just use map/reduce and similar techniques on 
the individual documents...
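A quick back-of-the-envelope for the basket idea (550 documents and baskets of 100 are just the round figures from this thread, not measured values):

```shell
# Rough round-trip arithmetic for the proposed basket model.
# Fetching ~550 linked documents one by one costs ~550 object
# fetches; grouped into baskets of 100, it costs ceil(550/100)
# basket fetches instead.
docs=550
basket=100
echo "direct fetches: $docs"
echo "basket fetches: $(( (docs + basket - 1) / basket ))"
```

Which prints 550 direct fetches versus 6 basket fetches - roughly the two-orders-of-magnitude reduction in round trips that makes the idea tempting despite the maintenance cost.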

Karsten

> --Kevin
> 
> 


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
