On Nov 9, 2010, at 14:58 , Kevin Smith wrote:
>
> On Nov 9, 2010, at 5:01 AM, Karsten Thygesen wrote:
>
>> Hi
>>
>> OK, we will use a larger ring size next time and will consider a data reload.
>>
>> Regarding the metrics: the servers are dedicated to Riak and are not used
>> for anything else. They are new HP servers with 8 cores each and 4x146GB 10K
>> RPM SAS disks in a concatenated mirror setup. We run Solaris with ZFS as the
>> filesystem, and I have turned off atime updates on the data partition.
>>
>> The pool is built as follows:
>>
>>   pool: pool01
>>  state: ONLINE
>>  scrub: scrub completed after 0h0m with 0 errors on Tue Oct 26 21:25:05 2010
>> config:
>>
>>         NAME          STATE     READ WRITE CKSUM
>>         pool01        ONLINE       0     0     0
>>           mirror-0    ONLINE       0     0     0
>>             c0t0d0s7  ONLINE       0     0     0
>>             c0t1d0s7  ONLINE       0     0     0
>>           mirror-1    ONLINE       0     0     0
>>             c0t2d0    ONLINE       0     0     0
>>             c0t3d0    ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> so it is as fast as possible.
>>
>> However, we use the ZFS default recordsize, which is 128KB. Is that optimal
>> with bitcask as the backend? It seems rather large, but what is optimal for
>> bitcask?
>
> I don't have much experience tuning Solaris or ZFS for Riak. This is a
> question best asked of Ryan, and I will make sure he sees it.
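[Editor's aside on the recordsize question above: ZFS recordsize is a per-dataset property, so it can be experimented with without rebuilding the pool. A minimal sketch, assuming a hypothetical `pool01/riak` dataset holding the bitcask files; whether a smaller value such as 16K actually helps depends on the typical object size, and recordsize only applies to newly written blocks.]

```shell
# Hypothetical dataset name; adjust to where the Riak data actually lives.
# Existing bitcask files keep their old block size until rewritten.
zfs set recordsize=16K pool01/riak

# Confirm the property took effect on the dataset.
zfs get recordsize pool01/riak
```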
Thanks!

>>
>> The cluster is 4 servers with gigabit connections, located in the same
>> datacenter on the same switch. The loadbalancer is a Zeus ZTM, which does
>> quite a few HTTP optimizations, including extensive reuse of HTTP
>> connections, and we usually see far better response times through the
>> loadbalancer than when hitting a node directly.
>
> Hmmm. Can you share what the performance times are like for direct cluster
> access?

In this case, there is no measurable difference whether we ask a cluster node
directly or go through the loadbalancer. The loadbalancer makes the largest
difference when we hit it with a lot of small requests, but that is not the
case here.

>>
>> When we run the test, each Riak node is only about 100% CPU loaded (which
>> on Solaris means that it only uses one of the 8 cores). We have seen spikes
>> in the 160% area, but everything below 800% is not CPU bound. So all in
>> all, the CPU load is between 5 and 10%.
>
> Can you send me the code you're using for the performance test? I'd like to
> run the exact code on my test hardware and see if that reveals anything.

Jan, can you please provide the test client?

> Also, low CPU usage might indicate you are IO bound. Do you know if the
> Riak processes are spending much time waiting for IO to complete?

It does not seem so. The servers are not IO bound: there is plenty of network
capacity, and the disks are only around 10% loaded.

My largest suspicion is the data model. With a 4-node cluster, a link walk
that needs to combine around 500-600 documents will take quite some time, but
we still feel the numbers are very high. Perhaps we should consider a data
model where we collect, say, 100 documents in a "basket" and then only have
to link-walk 4-5 baskets to return an answer? Tempting, performance-wise, but
it makes the data much harder to maintain afterwards, as we could no longer
just use map/reduce and similar techniques to handle the data...

Karsten

> --Kevin
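[Editor's aside: the fan-out reduction behind the "basket" idea above can be sketched with back-of-the-envelope arithmetic. The numbers come from the thread; the basket scheme itself is only a proposal, not an implemented data model.]

```shell
# Fan-out of the link walk before and after grouping documents into baskets.
docs=600          # upper end of the 500-600 documents mentioned above
per_basket=100    # documents collected per basket, as proposed

# Ceiling division: how many baskets a single link walk would have to fetch.
baskets=$(( (docs + per_basket - 1) / per_basket ))

echo "objects fetched per query, direct link walk: $docs"
echo "objects fetched per query, basket link walk: $baskets"
```

With the proposed numbers this cuts the per-query fetch count by two orders of magnitude, which is why it is tempting; the trade-off, as noted above, is that per-document operations like map/reduce become harder once documents are embedded in baskets.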
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
