Hi Sean,

So the mental model I have of the benchmark framework is wrong.

I thought that 'concurrent' set how many requests could be in progress
at any one time (I assumed one Erlang process (worker) per request).
The value I used was a deliberate overestimate of our eventual
production concurrency levels; not particularly scientific and to be
taken with a pinch of salt, but it was consistent across a number of
benchmarking runs.
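To make my assumption concrete, here is a toy sketch (in Python, purely
illustrative; it is *not* how basho_bench is implemented, which is
exactly what I'm asking about):

```python
# Toy sketch of the concurrency model I *assumed* 'concurrent' implies:
# N workers, each issuing one blocking request at a time, so at most N
# requests are ever in flight.  Illustration only, not basho_bench source.
import threading

def worker(issue_request, n_requests):
    for _ in range(n_requests):
        issue_request()  # blocks until the response arrives

def run_bench(concurrent, n_requests, issue_request):
    workers = [threading.Thread(target=worker,
                                args=(issue_request, n_requests))
               for _ in range(concurrent)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Under that model, {concurrent, 50} would mean at most 50 in-flight
requests, regardless of mode.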

With my mental model of what concurrent meant, I just took mode 'max'
to mean 'don't bother rate throttling'.

Anyway, even if I did hammer the server too hard, Bitcask seemed to
handle it just fine!

Regarding the worker processes, does each Erlang process handle more
than one request at a time?

Thanks,

James


On 12 July 2010 23:03, Sean Cribbs <[email protected]> wrote:
> James,
>
> One thing you might try is lowering the concurrency rate. "max" mode at even 
> 5-10 workers is enough to saturate most networks (the most I've ever run has 
> been 15).  Whether that is an accurate representation of your production load 
> is another story entirely, and an "exercise for the reader".
>
> Sean Cribbs <[email protected]>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Jul 11, 2010, at 11:52 PM, James Sadler wrote:
>
>> Hi All,
>>
>> I've been benchmarking Riak using basho_bench on an EC2 m1.large
>> instance, and also locally on my iMac inside VirtualBox, to assess
>> the performance of the Bitcask and Innostore backends.
>>
>> The test configuration looks like this:
>>
>> {mode, max}.
>> {duration, 10}.
>> {concurrent, 50}.
>> {driver, basho_bench_driver_http_raw}.
>> {code_paths, ["deps/stats",
>>              "deps/ibrowse"]}.
>> %% a composite key composed of 4 IDs, each of which is a 16 char hex string,
>> %% specific to our data model.
>> {key_generator, {random_dynamo_style_string, 35000}}.
>> {value_generator, {fixed_bin, 1000}}.
>> {operations, [{get, 1}, {update, 2}]}.
>> {http_raw_ips, ["127.0.0.1"]}.
>>
>> Also, to generate the 'random_dynamo_style_string' keys, I made the
>> following changes to the basho_bench source:
>>
>> diff --git a/src/basho_bench_keygen.erl b/src/basho_bench_keygen.erl
>> index 4849bbe..639e90b 100644
>> --- a/src/basho_bench_keygen.erl
>> +++ b/src/basho_bench_keygen.erl
>> @@ -54,6 +54,13 @@ new({pareto_int, MaxKey}, _Id) ->
>> new({pareto_int_bin, MaxKey}, _Id) ->
>>     Pareto = pareto(trunc(MaxKey * 0.2), ?PARETO_SHAPE),
>>     fun() -> <<(Pareto()):32/native>> end;
>> +new({random_dynamo_style_string, MaxKey}, _Id) ->
>> +    fun() -> lists:concat([
>> +                    get_random_string(16, "0123456789abcdef"), "-",
>> +                    get_random_string(16, "0123456789abcdef"), "-",
>> +                    get_random_string(16, "0123456789abcdef"), "-",
>> +                    get_random_string(16, "0123456789abcdef")])
>> +    end;
>> new(Other, _Id) ->
>>     ?FAIL_MSG("Unsupported key generator requested: ~p\n", [Other]).
>>
>> @@ -74,10 +81,17 @@ dimension({pareto_int, _}) ->
>>     0.0;
>> dimension({pareto_int_bin, _}) ->
>>     0.0;
>> +dimension({random_dynamo_style_string, _MaxKey}) ->
>> +    0.0;
>> dimension(Other) ->
>>     ?FAIL_MSG("Unsupported key generator dimension requested: ~p\n", [Other]).
>>
>> -
>> +get_random_string(Length, AllowedChars) ->
>> +    lists:foldl(fun(_, Acc) ->
>> +                        [lists:nth(random:uniform(length(AllowedChars)),
>> +                                   AllowedChars)]
>> +                            ++ Acc
>> +                end, [], lists:seq(1, Length)).
>>
>>
>> %% ====================================================================
>>
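For readers who don't speak Erlang, the generator above is roughly
equivalent to this Python sketch (my own illustration, not part of
basho_bench):

```python
# Illustration only: the shape of keys produced by the
# random_dynamo_style_string generator above (a Python re-creation,
# not basho_bench code).
import random

def random_dynamo_style_key():
    seg = lambda: ''.join(random.choice('0123456789abcdef')
                          for _ in range(16))
    return '-'.join(seg() for _ in range(4))

print(len(random_dynamo_style_key()))  # 67 chars: 4 x 16 hex + 3 dashes
```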
>>
>> As of now, I've only been running the benchmark against a 'cluster'
>> of a single Riak node.  I have benchmarked the Bitcask and Innostore
>> backends on the latest version of Riak (0.11.0-1344) and Innostore
>> (1.0.0-88) on Ubuntu Lucid, running basho_bench on the same host as
>> the Riak node.
>>
>> The benchmarks are showing very high get and update latencies in the
>> 95th percentile and beyond when using Innostore as the backend.
>> Bitcask performance is much better.
>>
>> While running the benchmarks, I had an iostat process reporting IO
>> every 1 second.  It clearly showed heavy writes during the benchmark,
>> but practically zero reads.  I expect that this was because of the
>> disk cache.  What I found very surprising was that the latencies for
>> innostore gets during the benchmark were very high, even though the
>> disk was not being hit for reads at all.  This was reproducible on
>> both EC2 and on a local VM on my iMac.
>>
>> Observations:
>>
>> ## Innostore backend
>>
>> - Latencies for innostore are high across the board. Even for reads,
>> __when iostat is reporting that no reads are hitting the disk__.
>>
>> - 95th percentile read/write latencies are up to 1500/2000 millis.
>>
>> - 99th percentile reads/writes are up to 2000/4000 millis.
>>
>> - The difference between update and get latency is small.
>>
>> - There are some failed updates/gets (timeouts) in the log file
>> produced by basho_bench.
>>
>> - Throughput with the Innostore backend is 150-200 req/sec.
>>
>> - Mounting the filesystem with noatime doesn't seem to make much of a
>> difference.
>>
>> - Getting values from disk cache has huge latency (iostat reporting no
>> reads on the device).  This is somewhat bizarre.
>>
>> ## Bitcask backend
>>
>> - There are zero errors in the log produced by basho_bench (no
>> timeouts like with Innostore).
>>
>> - Throughput is much higher: 620 req/sec
>>
>> - The 99.9th percentile latencies are 200ms for writes and 100ms for reads.
>>
>> - The 95th percentile latencies are 160ms for writes and 60ms for reads
>>
>> - Mean & median latencies are 115ms for writes and 20ms for reads.
>>
>> Summary charts from basho_bench are attached.
>>
>>
>> NOTE:
>>
>> I haven't included benchmark results from my own local VM.  FWIW, I
>> observed approximately the same characteristics in the EC2 and local
>> benchmarks.
>>
>> In summary, it looks like there are significant performance issues
>> with Innostore in terms of throughput and latency.  Latency is the
>> biggest issue for our ad serving product at Lexer, so it looks like
>> we'll be using Bitcask in production.
>>
>> Hopefully these results will be useful to others.
>>
>> Also, given our large key size of 67 characters, combined with
>> Bitcask's padding and storage layout, how many keys should we be
>> able to manage per node per GB?
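To frame the question, here is my own back-of-envelope estimate for
the in-memory keydir.  The ~40-byte fixed per-key overhead is purely
an assumption on my part (file id, offset, size, timestamp, hash-table
bookkeeping) and should be checked against the Bitcask source:

```python
# Rough capacity estimate for Bitcask's in-memory keydir.
# ASSUMPTION: ~40 bytes of fixed per-key overhead per keydir entry;
# the real figure needs confirming against the Bitcask source.
KEY_LEN = 67          # our composite key: 4 x 16 hex chars + 3 dashes
OVERHEAD = 40         # assumed fixed bytes per keydir entry
GIB = 1024 ** 3

keys_per_gib = GIB // (KEY_LEN + OVERHEAD)
print(keys_per_gib)   # ~10 million keys per GiB of RAM, if the assumption holds
```

So, very roughly, on the order of 10 million keys per GiB of RAM per
node, before any per-node replica overhead.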
>>
>> Looking forward to any comments.
>>
>> Thanks.
>>
>> --
>> James
>> <summary_bitcask_ec2.png><summary_innostore_ec2.png>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>



-- 
James
