Sebastian,

With every request, Riak will attempt to read from or write to all replicas 
regardless of the specified r or w value. The r and w values only determine 
how many of those reads or writes must complete before the operation is 
considered successful.

As a result, the get (read) and put (write) handlers outlive the client 
request. They will continue to wait for either all vnodes (replicas/partitions) 
to respond or for the 60-second timeout to elapse. As such, network traffic 
after a large number of reads with r=1 shouldn't be surprising: the request 
handlers are still awaiting responses from vnodes that are working through 
their request queues.
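
To make the timing concrete, here is a toy model of a single get request. It is only a sketch of the behaviour described above, not Riak's actual code; the function name read_timeline and the latency figures are made up for illustration.

```python
def read_timeline(latencies_s, r, timeout_s=60.0):
    """Return (client_reply_at, handler_done_at) in seconds for one get.

    latencies_s: response time of each replica vnode (one entry per replica).
    r: number of replies needed before the client is answered.
    """
    if not 1 <= r <= len(latencies_s):
        raise ValueError("r must be between 1 and the number of replicas")
    replies = sorted(latencies_s)
    # The client is answered once the r-th fastest replica has replied
    # (or not at all if that takes longer than the timeout).
    client_reply_at = replies[r - 1] if replies[r - 1] <= timeout_s else None
    # The handler keeps waiting for every replica, up to the timeout.
    handler_done_at = min(replies[-1], timeout_s)
    return client_reply_at, handler_done_at


# Three replicas, one of them stuck behind a long request queue.
for r in (1, 2, 3):
    print(r, read_timeline([0.02, 0.05, 4.0], r))
```

With r=1 the client gets its answer after 0.02 s in this model, yet the handler stays busy until the slowest replica answers at 4 s. Lowering r shortens the client's wait but does not reduce the work done by the cluster.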

On modest hardware, I've seen Riak clusters perform multiples of 500 ops per 
second. I'm curious, what are you using to perform your benchmark? Does it 
perform requests in parallel? Are requests being made to all nodes in the 
cluster or just one? To find your maximum throughput, you should experiment 
with various ratios of parallel requests per node.
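
A minimal sketch of such an experiment, using only the Python standard library, might look like the following. The host names riak1..riak4 are placeholders for your own nodes; the port, bucket and key range are taken from your curl command. The helper names (build_urls, fetch, run_benchmark, main) are my own and not part of any Riak client.

```python
import itertools
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_urls(nodes, bucket, keys):
    """Spread keys round-robin over all nodes instead of hitting just one."""
    hosts = itertools.cycle(nodes)
    return [f"http://{next(hosts)}:8098/buckets/{bucket}/keys/{key}"
            for key in keys]


def fetch(url):
    """GET one object and return its size in bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return len(resp.read())


def run_benchmark(urls, workers, fetch=fetch):
    """Fetch all urls with `workers` requests in flight; return requests/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(fetch, urls))
    elapsed = time.perf_counter() - start
    return len(sizes) / elapsed


def main():
    nodes = ["riak1", "riak2", "riak3", "riak4"]  # hypothetical host names
    keys = [f"{i}.jpg" for i in range(10001, 20001)]
    urls = build_urls(nodes, "test-01", keys)
    # Try several levels of parallelism per node and compare throughput.
    for per_node in (1, 2, 4, 8):
        rate = run_benchmark(urls, workers=per_node * len(nodes))
        print(f"{per_node} parallel requests per node: {rate:.0f} requests/s")
```

Calling main() runs the sweep against a live cluster. A single sequential curl keeps only one request in flight at a time, so it tends to measure per-request latency rather than the throughput the cluster can sustain.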

Hope that helps.  

--  
Ian Plosker <[email protected]>
Developer Advocate
Basho Technologies


On Tuesday, January 24, 2012 at 5:21 AM, Sebastian Gerlach wrote:

> Dear Riak-Users,
>  
> we are considering storing a large amount (50000000) of binary data
> (images) in a Riak cluster. Each image has a size of 648 KB. We want to
> store 3 copies of each image.
>  
> In this case I need to store 50000000 * 648 KB * 3 = 90.5 TB of data. This
> calculation doesn't include any overhead for reorganisation and other stuff.
>  
> The other concern is the network. I ran some benchmarks on a 4-node
> cluster, each node with a 1 Gbps interface. In addition to the benchmarks
> I've made some calculations.
>  
> Some information about the benchmark:
> - I use the same interface for cluster communication and benchmarking.
> - I use the Riak HTTP API
> - time curl -s
> HTTP://interface:8098/buckets/test-01/keys/[10001-20000].jpg > /dev/null
>  
> In theory, a 1 Gbps interface provides 125 MB per second. In my
> calculation I only use 50 percent of the theoretically available
> bandwidth. This fits my benchmarks very well.
>  
> I experimented for a while with '{"props":{"r":X}}'.
>  
> Calculation “r=2”
> available bandwidth = 62.5 MB per second / (3*648 KB) = 33 requests per
> second per node = 132 requests per second over the cluster.
>  
> Calculation “r=1”
> available bandwidth = 62.5 MB per second / (2*648 KB) = 50 requests per
> second per node = 200 requests per second over the cluster.
>  
> In this second case I see some strange effects in the network. My send
> and receive queues grow very fast. And after the benchmark finishes,
> there is a lot of traffic between the Riak nodes for a while.
>  
> Does anyone have experience with data sets like this and can give a few
> hints at a possible setup? The goal is to process at least 500
> requests per second.
>  
> Another point in my considerations is the time required for a
> reorganization after a new node is added to the cluster or a node has
> been replaced.
>  
> Many thanks for your reply and your attention.
>  
> Kind regards
> Sebastian
>  
>  
> _______________________________________________
> riak-users mailing list
> [email protected] (mailto:[email protected])
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>  
>  


