Sebastian,

With all requests, Riak will attempt to read from or write to all replicas, regardless of the specified r or w value. The r and w values determine how many reads from or writes to partitions must complete before the operation is considered successful.
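To make the timing concrete, here is a rough Python sketch of that behaviour. It is not Riak code: the function name, the per-replica latencies, and the 60-second timeout default are all assumptions chosen for illustration. It only shows that r changes how long the client waits, not how long the request handler keeps waiting on replicas.

```python
def quorum_timing(latencies, r, timeout=60.0):
    """Return (client_wait, handler_wait) in seconds for one request.

    latencies: hypothetical response time of each replica (vnode).
    r: number of replies needed before the client gets an answer.
    timeout: assumed upper bound on how long the handler waits.
    """
    if not 1 <= r <= len(latencies):
        raise ValueError("r must be between 1 and the number of replicas")
    ordered = sorted(latencies)
    # The client is answered as soon as the r-th fastest replica replies.
    client_wait = min(ordered[r - 1], timeout)
    # The handler keeps waiting for the slowest replica, or the timeout.
    handler_wait = min(ordered[-1], timeout)
    return client_wait, handler_wait


if __name__ == "__main__":
    # One slow replica that is still working through its request queue.
    for r in (1, 2, 3):
        print(r, quorum_timing([0.02, 0.05, 4.0], r))
```

With r=1 the client is answered after the fastest replica replies, while the handler is still tied up until the slowest replica responds or the timeout elapses.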
As a result, the get (read) and put (write) handlers outlive the client request. They will continue to wait for either all vnodes (replicas/partitions) to respond or for the 60-second timeout to elapse. As such, network traffic after a large number of reads with r=1 shouldn't be surprising: the request handlers are still awaiting responses from vnodes that are working through their request queues.

On modest hardware, I've seen Riak clusters perform multiples of 500 ops per second. I'm curious, what are you using to perform your benchmark? Does it perform requests in parallel? Are requests being made to all nodes in the cluster or just one? To find your maximum throughput, you should experiment with various numbers of parallel requests per node.

Hope that helps.

-- 
Ian Plosker <[email protected]>
Developer Advocate
Basho Technologies

On Tuesday, January 24, 2012 at 5:21 AM, Sebastian Gerlach wrote:

> Dear Riak-Users,
>
> we are considering storing a large number (50000000) of binary objects
> (images) in a Riak cluster. Each image has a size of 648 KB. We want to
> store 3 copies of each image.
>
> In this case I need to store 50000000 * 648 KB * 3 = 90.5 TB of data.
> This calculation doesn't include any overhead for reorganisation and
> other stuff.
>
> On the other hand there is the network. I ran some benchmarks on a
> 4-node cluster, each node with a 1 Gbps interface. In addition to the
> benchmarks I've made some calculations.
>
> Some information about the benchmark:
> - I use the same interface for cluster communication and benchmarking.
> - I use the Riak HTTP API interface.
> - time curl -s HTTP://interface:8098/buckets/test-01/keys/[10001-20000].jpg > /dev/null
>
> In theory, a 1 Gbps interface provides 125 MB per second. In my
> calculation I only use 50 percent of the theoretically available
> bandwidth. This fits my benchmarks very well.
>
> I experimented for a while with different values of '{"props":{"r":X}}'.
>
> Calculation “r=2”
> available bandwidth = 62.5 MB per second / (3*648 KB) = 33 requests per
> second per node = 132 requests per second over the cluster.
>
> Calculation “r=1”
> available bandwidth = 62.5 MB per second / (2*648 KB) = 50 requests per
> second per node = 200 requests per second over the cluster.
>
> In this second case I see some strange effects in the network. My send
> and receive queues grow very fast, and after the benchmark finishes
> there is a lot of traffic between the Riak nodes for a while.
>
> Does anyone have experience with data sets like this and can give a few
> hints about a possible setup? The goal is to process at least 500
> requests per second.
>
> Another point in my considerations is the time required for a
> reorganization after a new node is added to the cluster or a node has
> been replaced.
>
> Many thanks for your reply and your attention.
>
> Kind regards
> Sebastian
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
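For anyone checking the numbers in the quoted message, the arithmetic can be reproduced with a short Python sketch. The constants are taken from the message; the function names are made up for illustration, and it assumes 1 MB = 1024 KB. The transfer multipliers (3 for r=2, 2 for r=1) are the original poster's assumption, and since all replicas are read regardless of r, the factor of 2 may be optimistic.

```python
IMAGE_KB = 648           # size of one image, from the quoted message
LINK_MB_PER_S = 125.0    # theoretical throughput of a 1 Gbps interface
USABLE_FRACTION = 0.5    # the poster budgets 50 percent of the link
NODES = 4                # size of the benchmark cluster


def requests_per_second(transfers_per_request):
    """Requests per second per node if each request moves the image
    transfers_per_request times over the node's interface."""
    usable_kb_per_s = LINK_MB_PER_S * USABLE_FRACTION * 1024
    return usable_kb_per_s / (transfers_per_request * IMAGE_KB)


def storage_tib(objects=50_000_000, copies=3):
    """Raw storage in TiB (2**40 bytes), without any overhead."""
    return objects * IMAGE_KB * 1024 * copies / 2**40


if __name__ == "__main__":
    for label, transfers in (("r=2", 3), ("r=1", 2)):
        per_node = requests_per_second(transfers)
        print(label, round(per_node, 1), round(per_node * NODES, 1))
    print("storage (TiB):", round(storage_tib(), 1))
```

Under these assumptions the r=2 case comes out at about 33 requests per second per node, the r=1 case at a little over 49 rather than 50, and the storage figure at about 90.5 when measured in binary units (TiB).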
