Hi Ian,
Thanks for your reply. First, thanks a lot for explaining the read process
in Riak. It helps a lot.
For the benchmarks I use 4 servers, each with:
CPU: 1x Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
RAM: 16 GB
NET: 1 GBit interface
I believe that a cluster can perform over 500 requests per second
depending on the key/value size and the N_VAL.
In my case I have N_VAL=3 with Riak's default configuration. I
only play around with the r and w properties of the bucket.
I use curl as the client and Riak's HTTP API. I ran different
benchmarks, for example one client making requests to one node, or 4 clients
requesting all 4 nodes.
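A minimal sketch of the "4 clients against 4 nodes" setup: spread the same key range the curl test uses over all nodes round-robin, instead of sending every request to one node. The host names node1..node4 are hypothetical placeholders, and the URL layout simply mirrors the curl command in the original message.

```python
from itertools import cycle


def plan_requests(nodes, bucket, first_key, last_key, port=8098):
    """Assign keys first_key..last_key to the nodes' HTTP endpoints round-robin."""
    node_cycle = cycle(nodes)  # node1, node2, node3, node4, node1, ...
    return [
        f"http://{next(node_cycle)}:{port}/buckets/{bucket}/keys/{key}.jpg"
        for key in range(first_key, last_key + 1)
    ]


# Hypothetical host names; replace them with the real node addresses.
urls = plan_requests(["node1", "node2", "node3", "node4"], "test-01", 10001, 20000)
print(len(urls), urls[0])
```

Each node then coordinates a quarter of the gets, so no single interface has to carry all of the client-facing traffic.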
In all tests, CPU and RAM were not the limit, and I/O doesn't seem to
be a problem either. The bottleneck at this time is my network.
With a value size of 648 KB and a 1 GBit interface, you have a physical
limitation.
In theory a 1 GBit interface can serve about 125 MByte/sec (1 Gbit/s divided
by 8 bits per byte). My images have a size of 648 KByte. If I use Apache or
Squid to serve them, I can theoretically get 125 MByte / 648 KByte ≈ 193
images/sec. With Riak the calculation looks like this:
125 MByte / (4 * 648 KByte) ≈ 48 images/sec. The "4 *" results from
N_VAL=3: the coordinator has to read 3 copies and send one.
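The same comparison written out, assuming decimal units (1 Gbit/s / 8 = 125 MB/s, 1 MB = 1000 KB) and that every copy crossing the coordinator's interface counts against one shared bandwidth budget, as in the calculation above:

```latex
\[
\frac{125\ \mathrm{MB/s}}{648\ \mathrm{KB}} \approx 193\ \text{images/s}
\qquad\text{vs.}\qquad
\frac{125\ \mathrm{MB/s}}{(N_{\mathrm{val}} + 1) \cdot 648\ \mathrm{KB}}
= \frac{125\ \mathrm{MB/s}}{4 \cdot 648\ \mathrm{KB}} \approx 48\ \text{images/s}
\]
```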
In theory I can serve the requests in parallel or sequentially; either way
the limiting factor should be the bandwidth. In my calculations I use only
50% of the available bandwidth.
Best regards
Sebastian
On 01/25/2012 11:23 PM, Ian Plosker wrote:
> Sebastian,
>
> With all requests, Riak will attempt to read or write to all replicas
> regardless of the specified r or w value. The r and w values affect how many
> reads from or writes to partitions must be completed before the operation is
> considered successful.
>
> As a result, the get (read) and put (write) handlers outlive the client
> request. They will continue to wait for either all vnodes
> (replicas/partitions) to respond or for the 60-second timeout to elapse. As
> such, network traffic after a large number of reads with r=1 shouldn't be
> surprising; the request handlers are continuing to await responses from
> vnodes that are working through their request queues.
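A toy model of the behaviour described above, written with Python's asyncio rather than Riak's own Erlang code: the handler asks all N_VAL replicas, answers the client as soon as r of them have replied, and keeps collecting the remaining replies until they arrive or a timeout expires. The replica delays and the short timeouts are made-up illustration values; Riak's timeout is the 60 seconds mentioned above.

```python
import asyncio


async def replica_read(replica, delay):
    # Stand-in for a vnode working through its request queue.
    await asyncio.sleep(delay)
    return f"value-from-replica-{replica}"


async def quorum_get(delays, r, timeout):
    """Return (seconds until client reply, seconds until handler exit, replies seen)."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    # The handler contacts every replica, whatever r is.
    tasks = [asyncio.create_task(replica_read(i, d)) for i, d in enumerate(delays)]
    replies, client_reply_at = [], None
    try:
        for next_reply in asyncio.as_completed(tasks, timeout=timeout):
            replies.append(await next_reply)
            if len(replies) == r:
                client_reply_at = loop.time() - start  # client is answered here
    except asyncio.TimeoutError:
        pass  # replicas that never answered are given up on
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
    return client_reply_at, loop.time() - start, len(replies)


# r=1: the client is answered after ~0.01 s, but the handler lives until ~0.2 s.
print(asyncio.run(quorum_get([0.01, 0.05, 0.2], r=1, timeout=1.0)))
```

With r=1, most of the handler's lifetime (and the traffic from the slower replicas) falls after the client already has its answer, which is consistent with the background traffic seen after a benchmark ends.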
>
> On modest hardware, I've seen Riak clusters perform multiples of 500 ops per
> second. I'm curious, what are you using to perform your benchmark? Does it
> perform requests in parallel? Are requests being made to all nodes in the
> cluster or just one? To find your maximum throughput, you should experiment
> with various numbers of parallel requests per node.
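Following the suggestion to try different numbers of parallel requests, a minimal throughput sweep could look like this. The `fetch` function is pluggable; `http_fetch` uses only the standard library, and the concurrency levels are arbitrary starting points, not recommended values. Threads are sufficient here because the work is network-bound.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_fetch(url):
    """GET one object and return its size in bytes (the body is discarded)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return len(resp.read())


def measure_throughput(fetch, urls, concurrency):
    """Fetch every URL with `concurrency` worker threads; return requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(fetch, urls))  # list() waits for all and re-raises errors
    return len(urls) / (time.perf_counter() - start)


def sweep(fetch, urls, levels=(1, 2, 4, 8, 16, 32)):
    """Map each concurrency level to the measured requests per second."""
    return {c: measure_throughput(fetch, urls, c) for c in levels}
```

Against a live cluster one would call something like `sweep(http_fetch, urls)` with a list of key URLs spread over all nodes; the level at which requests/s stops growing is roughly where the network (or another resource) saturates.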
>
> Hope that helps.
>
> --
> Ian Plosker <[email protected]>
> Developer Advocate
> Basho Technologies
>
>
> On Tuesday, January 24, 2012 at 5:21 AM, Sebastian Gerlach wrote:
>
>> Dear Riak-Users,
>>
>> we are considering storing a large amount (50,000,000) of binary data
>> (images) in a Riak cluster. Each image has a size of 648 KB. We want to
>> store 3 copies of each image.
>>
>> In this case I need to store 50,000,000 * 648 KB * 3 ≈ 90.5 TiB of data.
>> This calculation doesn't include any overhead for reorganisation and other stuff.
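The raw storage figure can be checked quickly. A sketch, assuming 648 KB means 648 KiB (which is what makes the result come out at 90.5 in binary units); the decimal reading is shown for comparison. Like the original estimate, it ignores backend and metadata overhead.

```python
KIB, TIB = 1024, 1024**4  # binary units
KB, TB = 1000, 1000**4    # decimal units


def replicated_bytes(objects, object_bytes, n_val):
    # Raw payload only: no backend, metadata or handoff overhead.
    return objects * object_bytes * n_val


binary = replicated_bytes(50_000_000, 648 * KIB, 3)
decimal = replicated_bytes(50_000_000, 648 * KB, 3)
print(f"{binary / TIB:.1f} TiB")  # 90.5 TiB
print(f"{decimal / TB:.1f} TB")   # 97.2 TB
```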
>>
>> The other concern is the network. I ran some benchmarks on a 4-node
>> cluster, each node with a 1 Gbps interface. In addition to the benchmarks,
>> I've made some calculations.
>>
>> Some information about the benchmark:
>> - I use the same interface for cluster communication and benchmarking.
>> - I use the Riak HTTP API.
>> - time curl -s "HTTP://interface:8098/buckets/test-01/keys/[10001-20000].jpg" > /dev/null
>>
>> In theory, a 1 Gbps interface provides 125 MB per second. In my
>> calculation I only use 50 percent of the theoretically available
>> bandwidth. This fits my benchmark results very well.
>>
>> I experimented for a while with the bucket property '{"props":{"r":X}}'.
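For reference, the r property can be set through the same HTTP API. A minimal sketch using Python's standard library, assuming the /buckets/<bucket>/props endpoint that matches the /buckets/.../keys/... URLs used in the benchmark; the request is only built here, and sending it is left as a commented-out line.

```python
import json
import urllib.request


def bucket_props_request(host, bucket, r, port=8098):
    """Build (but do not send) a PUT that sets the bucket's default r value."""
    url = f"http://{host}:{port}/buckets/{bucket}/props"
    body = json.dumps({"props": {"r": r}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = bucket_props_request("interface", "test-01", r=1)
print(req.get_method(), req.full_url, req.data.decode())
# urllib.request.urlopen(req, timeout=10)  # uncomment to apply it to a live node
```

The same pattern works for w by changing the key inside "props".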
>>
>> Calculation “r=2”
>> 62.5 MB per second (available bandwidth) / (3 * 648 KB) ≈ 33 requests per
>> second per node ≈ 132 requests per second across the cluster.
>>
>> Calculation “r=1”
>> 62.5 MB per second (available bandwidth) / (2 * 648 KB) ≈ 50 requests per
>> second per node ≈ 200 requests per second across the cluster.
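The two calculations above can be parameterised as a small function. This is a sketch of the same back-of-the-envelope model: it assumes each request moves `transfers` full copies of the value through one node's interface (3 for r=2 and 2 for r=1, as above) and, to reproduce the numbers, treats 62.5 MB as 62.5 * 1024 KB. Ian's reply (quoted above) notes that Riak contacts all N_VAL replicas whatever r is, so N_VAL + 1 = 4 transfers is the more conservative input.

```python
KB_PER_MB = 1024  # matches the arithmetic above (62.5 MB = 64000 KB)


def throughput(link_mb_s, utilization, value_kb, transfers, nodes):
    """Return (requests/s per node, requests/s across the cluster).

    Only network bandwidth is modelled; CPU, disk and protocol overhead are ignored.
    """
    usable_kb_s = link_mb_s * utilization * KB_PER_MB
    per_node = usable_kb_s / (transfers * value_kb)
    return per_node, per_node * nodes


for label, transfers in [("r=2", 3), ("r=1", 2), ("all 3 replicas", 4)]:
    per_node, cluster = throughput(125, 0.5, 648, transfers, nodes=4)
    print(f"{label}: {per_node:.1f} req/s per node, {cluster:.0f} req/s cluster")
```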
>>
>> In this second case I see some strange effects in the network: my send
>> and receive queues grow very fast, and after the benchmark finishes there
>> is a lot of traffic between the Riak nodes for a while.
>>
>> Does anyone have experience with data sets like these and can give a few
>> hints on a possible setup? The goal is to process at least 500
>> requests per second.
>>
>> Another point in my considerations is the time required for
>> reorganization after a new node is added to the cluster or a node has
>> been replaced.
>>
>> Many thanks for your reply and your attention.
>>
>> Kind regards
>> Sebastian
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected] (mailto:[email protected])
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com