Re: riak sizing considerations

Sebastian Gerlach Thu, 26 Jan 2012 03:59:16 -0800

Hi Ian,

thanks for your reply, and thanks a lot for explaining the read process
in Riak. It helps a lot.


For the benchmarks I use 4 servers:

        CPU: 1x Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
        RAM: 16 GB
        NET: 1 GBit interface

I believe that a cluster can perform over 500 requests per second
depending on the key/value size and the N_VAL.

In my case I have N_VAL=3 with the default Riak configuration. I
only play around with the r and w properties of the bucket.

I use curl as the client, against Riak's HTTP API. I ran different
benchmarks, for example one client making requests to one node, or
4 clients making requests to all 4 nodes.
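
A parallel variant of such a benchmark could be sketched in Python as
follows. This is not the curl setup described above, only an illustration:
the function names, the worker count and the node names in the usage note
are hypothetical, while the port and the bucket/key path are taken from the
curl command quoted further down.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Download one object and return the number of bytes received."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


def build_urls(nodes, first_key, last_key):
    """Spread the keys round-robin over the given node addresses."""
    return [
        f"http://{nodes[i % len(nodes)]}:8098/buckets/test-01/keys/{key}.jpg"
        for i, key in enumerate(range(first_key, last_key + 1))
    ]


def run_benchmark(urls, workers, fetch_fn=fetch):
    """Fetch all URLs with `workers` threads; return (requests/sec, MByte/sec)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(fetch_fn, urls))
    elapsed = time.perf_counter() - start
    return len(sizes) / elapsed, sum(sizes) / elapsed / (1024 * 1024)
```

For example, run_benchmark(build_urls(["node1", "node2", "node3", "node4"],
10001, 20000), workers=16) would spread the requests over four nodes with
16 requests in flight. Varying the worker count and the node list gives the
different client/node ratios.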

In all tests CPU and RAM weren't the limit. I/O doesn't seem to be a
problem either. The bottleneck at this time is my network.

With a value size of 648 KB and a 1 GBit interface there is a physical
limit.

In theory a 1 GBit interface can serve 128 MByte/sec. My images have a
size of 648 KByte. If I use Apache or Squid to serve them, I can
theoretically get 128 MByte / 648 KByte = about 200 images/sec. With Riak
the calculation looks like this: 128 MByte / (4 * 648 KByte) = about 50
images/sec. The "4 *" results from N_VAL=3: the coordinator has to read 3
copies and send one.
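
The arithmetic above can be checked with a few lines of Python. This is
only a sketch of the model described in this mail: the function name and
its parameters are made up for illustration, the conversion 1 MByte = 1024
KByte is an assumption, and the factor N_VAL + 1 simply encodes the idea
that the coordinator receives 3 copies and sends one.

```python
def images_per_second(link_mbyte_per_sec, image_kbyte, transfers_per_request=1):
    """Upper bound on images/sec when the network link is the only limit."""
    # Assumption of this sketch: 1 MByte = 1024 KByte.
    link_kbyte_per_sec = link_mbyte_per_sec * 1024
    return link_kbyte_per_sec / (transfers_per_request * image_kbyte)


N_VAL = 3

# Plain web server: each image crosses the link once.
plain = images_per_second(128, 648)

# Riak model from this mail: 3 replica reads in, 1 response out.
riak = images_per_second(128, 648, transfers_per_request=N_VAL + 1)

print(f"plain web server: {plain:.1f} images/sec")
print(f"Riak, N_VAL=3:    {riak:.1f} images/sec")
```

With these inputs the two values come out slightly above the rounded 200
and 50 images/sec.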

In theory I can serve the requests in parallel or one after another; the
limiting factor should be the bandwidth. In my calculations I use only
50% of the available bandwidth.
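
Applying the 50% budget to the same model gives a rough per-node and
per-cluster figure, and an estimate of how many nodes the 500 requests per
second goal would need. This is a sketch under stated assumptions: all
names are hypothetical, 1 MByte is taken as 1024 KByte, and throughput is
assumed to scale linearly with the number of nodes, which a real cluster
may not do.

```python
import math

LINK_MBYTE_PER_SEC = 128    # theoretical link limit used in this mail
IMAGE_KBYTE = 648           # size of one image
N_VAL = 3                   # replicas per object
UTILISATION = 0.5           # only 50% of the bandwidth is counted
NODES = 4                   # size of the benchmark cluster
TARGET_REQ_PER_SEC = 500    # goal from the original question


def per_node_rate(link_mbyte, image_kbyte, n_val, utilisation):
    """Requests/sec one node can coordinate under the bandwidth-only model."""
    usable_kbyte = link_mbyte * 1024 * utilisation
    # The coordinator reads n_val copies and sends one to the client.
    return usable_kbyte / ((n_val + 1) * image_kbyte)


rate = per_node_rate(LINK_MBYTE_PER_SEC, IMAGE_KBYTE, N_VAL, UTILISATION)
cluster_rate = rate * NODES
nodes_needed = math.ceil(TARGET_REQ_PER_SEC / rate)

print(f"per node: {rate:.1f} req/sec, cluster of {NODES}: {cluster_rate:.1f} req/sec")
print(f"nodes needed for {TARGET_REQ_PER_SEC} req/sec: {nodes_needed}")
```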

Best regards
Sebastian

On 01/25/2012 11:23 PM, Ian Plosker wrote:
> Sebastian,
> 
> With all requests, Riak will attempt to read or write to all replicas 
> regardless of the specified r or w value. The r and w values affect how many 
> reads from or writes to partitions must be completed before the operation is 
> considered successful.
> 
> As a result, the get (read) and put (write) handlers outlive the client
> request. They will continue to wait for either all vnodes
> (replicas/partitions) to respond or for the 60-second timeout to elapse.
> As such, network traffic after a large number of reads with r=1 shouldn't
> be surprising: the request handlers are continuing to await responses from
> vnodes that are working through their request queues.
> 
> On modest hardware, I've seen Riak clusters perform multiples of 500 ops per 
> second. I'm curious, what are you using to perform your benchmark? Does it 
> perform requests in parallel? Are requests being made to all nodes in the 
> cluster or just one? To find your maximum throughput, you should experiment 
> with various ratios of parallel requests per node.
> 
> Hope that helps.  
> 
> --  
> Ian Plosker <[email protected]>
> Developer Advocate
> Basho Technologies
> 
> 
> On Tuesday, January 24, 2012 at 5:21 AM, Sebastian Gerlach wrote:
> 
>> Dear Riak-Users,
>>  
>> we are considering storing a large amount (50000000) of binary data
>> (images) in a Riak cluster. Each image has a size of 648 KB. We want to
>> store 3 copies of each image.
>>  
>> In this case I need to store 50000000 * 648 KB * 3 = 90.5 TB of data. This
>> calculation doesn't include any overhead for reorganisation and other stuff.
>>  
>> On the other hand there is the network. I ran some benchmarks on a 4 node
>> cluster, each node with a 1 Gbps interface. In addition to the benchmarks
>> I've made some calculations.
>>  
>> Some information for the benchmark:
>> - I use the same interface for cluster communication and benchmarking.
>> - I use the Riak HTTP API.
>> - time curl -s HTTP://interface:8098/buckets/test-01/keys/[10001-20000].jpg > /dev/null
>>  
>> In theory, a 1 Gbps interface provides 125 MB per second. In my
>> calculation I only use 50 percent of the theoretically available
>> bandwidth. This fits my benchmarks very well.
>>  
>> I experimented for a while with the bucket property '{"props":{"r":X}}'.
>>  
>> Calculation “r=2”
>> available bandwidth = 62.5 MB per second / (3*648 KB) = 33 requests per
>> second per node = 132 requests per second over the cluster.
>>  
>> Calculation “r=1”
>> available bandwidth = 62.5 MB per second / (2*648 KB) = 50 requests per
>> second per node = 200 requests per second over the cluster.
>>  
>> In this second case I see some strange effects in the network. My send
>> and receive queues grow very fast, and after the benchmark finishes
>> there is a lot of traffic between the Riak nodes for a while.
>>  
>> Does anyone have experience with data sets like this and can give a few
>> hints at a possible setup? The goal is to process at least 500
>> requests per second.
>>  
>> Another point in my considerations is the time required for a
>> reorganization after a new node is added to the cluster or a node has
>> been replaced.
>>  
>> Many thanks for your reply and your attention.
>>  
>> Kind regards
>> Sebastian
>>  
>>  
>> _______________________________________________
>> riak-users mailing list
>> [email protected] (mailto:[email protected])
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

