Re: Memcached performance numbers

2019-10-07 Thread dormando
The high-end numbers are due to pipelining responses, i.e. ascii multiget,
which reduces the number of syscalls. You can see how the tests were run via
the links to the source code in the blog post.

I was running some pure get tests on a dual 8-core machine yesterday with
memcached pinned to one NUMA node. Without any pipelining it was doing
~1.8m ops/sec. With heavy pipelining it should be much more than that.

In extreme and contrived cases I've gotten the pure get throughput above
50 million keys/sec. So I know that part scales... sets would as well but
nobody really asks for it so I've not focused on it. Latency is probably
not great at that throughput though :)
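
To illustrate why multiget cuts down on syscalls, here is a minimal sketch that assumes the standard memcached ASCII form `get <key> [<key> ...]\r\n`. The function names are hypothetical and are not taken from mc-crusher or from the test code linked in the blog. No socket is opened; the code only builds request bytes to show that N keys can travel in one write instead of N.

```python
def build_single_gets(keys):
    """One ASCII 'get' command per key: one request line per key."""
    return [f"get {k}\r\n".encode("ascii") for k in keys]


def build_multiget(keys):
    """A single ASCII multiget: every key on one request line."""
    return ("get " + " ".join(keys) + "\r\n").encode("ascii")


keys = [f"key{i}" for i in range(4)]
singles = build_single_gets(keys)
multi = build_multiget(keys)

# Sent unbatched, each element of `singles` would typically cost its own
# write/read round trip; `multi` carries the same keys in one request.
print(len(singles), "separate requests vs 1 multiget request")
print(multi)
```

A client could also concatenate several single `get` lines into one write to get a similar effect; the multiget form is simply the variant named above.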

On Mon, 7 Oct 2019, Pradeep Fernando wrote:

> Thanks for the article link. That is some comprehensive benchmarking.
> Compared to the article's numbers, my latency numbers are sane enough. I hit
> ~120 us, while you get similar numbers at the 99th percentile.
>
> However, my throughput numbers seem to be wrong. I hit a throughput
> knee point at 500K ops/sec while yours is around ~6000K.
> That is an order of magnitude difference. Can you please comment on it? :)
>
> thanks
> --Pradeep

Re: Memcached performance numbers

2019-10-07 Thread Pradeep Fernando
Thanks for the article link. That is some comprehensive benchmarking.

Compared to the article's numbers, my latency numbers are sane enough. I hit
~120 us, while you get similar numbers at the 99th percentile.

However, my throughput numbers seem to be wrong. I hit a throughput
knee point at 500K ops/sec while yours is around ~6000K.
That is an order of magnitude difference. Can you please comment on it? :)

thanks
--Pradeep

On Mon, Oct 7, 2019 at 4:46 PM dormando  wrote:

> It'll depend on your hardware/test/etc.
>
> https://memcached.org/blog/persistent-memory/ - a thorough performance
> test with some higher end numbers on both throughput and latency along
> with 50th/90th/95/99/etc percentiles and latency point clouds for each sub
> test. That was a big machine though.
>
> ...and no, I'm not going to ignore your 50/50 ratio :) The ratio changes
> the results too much. People will have to test with what they expect to
> see. If you do 100% gets, it scales linearly with the number of worker
> threads/cores. Anything below 100% gets will slowly scale down to 500k-1m
> ops/s depending on the hardware and size of the objects.

Re: Memcached performance numbers

2019-10-07 Thread dormando
It'll depend on your hardware/test/etc.

https://memcached.org/blog/persistent-memory/ - a thorough performance
test with some higher end numbers on both throughput and latency along
with 50th/90th/95/99/etc percentiles and latency point clouds for each sub
test. That was a big machine though.

...and no, I'm not going to ignore your 50/50 ratio :) The ratio changes
the results too much. People will have to test with what they expect to
see. If you do 100% gets, it scales linearly with the number of worker
threads/cores. Anything below 100% gets will slowly scale down to 500k-1m
ops/s depending on the hardware and size of the objects.

On Mon, 7 Oct 2019, Pradeep Fernando wrote:

> Hi,
> Thanks for the help!
> After a couple of trial-and-error configs, I figured out that the
> 'concurrency' parameter used in memaslap was the culprit.
> In my configs I was using a constant 16 as the concurrency input. Scaling the
> value along with the thread count gave me sane numbers.
>
> The average latency is 120 us for the get/set workload (please ignore my 50/50
> ratio) and throughput maxes out around 500K ops/second.
> Graph attached.
>
> I know that benchmarking numbers are heavily dependent on the setup and
> other things. But are my numbers faithful enough to be quoted as
> memcached single-server numbers? In other words, are these numbers way
> off from typical memcached performance numbers?
>
> thanks
> --Pradeep

Re: Memcached performance numbers

2019-10-07 Thread Pradeep Fernando
Hi,

Thanks for the help!
After a couple of trial-and-error configs, I figured out that the
'concurrency' parameter used in memaslap was the culprit.
In my configs I was using a constant 16 as the concurrency input. Scaling
the value along with the thread count gave me sane numbers.

The average latency is 120 us for the get/set workload (please ignore my 50/50
ratio) and throughput maxes out around 500K ops/second.
Graph attached.
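
One way to sanity-check these figures is Little's law: average requests in flight = throughput x average latency. The sketch below applies it to the numbers reported in this thread. It assumes the reported latency is the mean per-request latency of a closed-loop client and that memaslap's concurrency setting bounds the number of outstanding requests; both are assumptions, not something confirmed in the thread. The function name is hypothetical.

```python
def in_flight(ops_per_sec, avg_latency_us):
    """Little's law: average requests in flight = throughput * latency."""
    return ops_per_sec * avg_latency_us / 1_000_000


# Figures reported in this thread: (ops/sec, average latency in microseconds).
runs = {
    "1 thread, concurrency 16": (190765, 83),
    "2 threads, concurrency 16": (514751, 30),
    "concurrency scaled with threads": (500_000, 120),
}

for name, (tps, lat_us) in runs.items():
    print(f"{name}: ~{in_flight(tps, lat_us):.1f} requests in flight")
```

Under those assumptions, both runs from the first post come out close to 16 requests in flight, which is consistent with the constant concurrency of 16: with a fixed number of outstanding requests, higher throughput has to show up as lower average latency. The 120 us / 500K ops/sec figures correspond to about 60 requests in flight.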

I know that benchmarking numbers are heavily dependent on the setup and
other things. But are my numbers faithful enough to be quoted as
memcached single-server numbers? In other words, are these numbers way
off from typical memcached performance numbers?

thanks
--Pradeep


On Mon, Oct 7, 2019 at 2:42 PM dormando  wrote:

> Hey,
>
> Sorry; I'm not going to have any other major insights :) I'd have to sit
> here playing 20 questions to figure out your test setup. If you're running
> memaslap from another box, that one needs to be cpu pinned as well. If
> it's a VM, the governor/etc might not even matter.
>
> Also I don't use memaslap at all, so I can't attest to it. I use
> https://github.com/memcached/mc-crusher with the external latency sampling
> util it comes with. it's not as easy to use though.

Re: Memcached performance numbers

2019-10-07 Thread dormando
Hey,

Sorry; I'm not going to have any other major insights :) I'd have to sit
here playing 20 questions to figure out your test setup. If you're running
memaslap from another box, that one needs to be cpu pinned as well. If
it's a VM, the governor/etc might not even matter.

Also I don't use memaslap at all, so I can't attest to it. I use
https://github.com/memcached/mc-crusher with the external latency sampling
util it comes with. it's not as easy to use though.
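
As a rough illustration of CPU pinning for a load generator, the sketch below pins the calling process with Python's `os.sched_setaffinity`, which is available on Linux only. On Linux, child processes started afterwards inherit the affinity, so a wrapper like this could launch a benchmark client. The helper name is hypothetical, and which CPU ids to use depends on the machine's topology, which the thread does not specify.

```python
import os


def pin_to_cpus(cpus):
    """Pin the calling process to the given CPU ids (Linux only).

    Returns the resulting affinity set, or None where the call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    # Demo only: re-apply the CPUs this process is already allowed to use,
    # so running the sketch does not change anything.
    allowed = os.sched_getaffinity(0)
    print("affinity:", sorted(pin_to_cpus(allowed)))
else:
    print("CPU affinity calls are not available on this platform")
```

In a real test the set passed to `pin_to_cpus` would be the cores reserved for the client, kept separate from the cores memcached is pinned to.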

On Mon, 7 Oct 2019, Pradeep Fernando wrote:

> Hi Dormando,
> That is great insight!
> However, it did not solve the problem. I disabled turbo, as per your
> instructions.
> I even set the CPU to operate at maximum performance with
> > cpupower frequency-set --governor performance
> (I verified this by monitoring the CPU frequency.)
>
> Still the same unexplained behavior :( Do you have any other suggestions?
>
> thanks
> --Pradeep

Re: Memcached performance numbers

2019-10-07 Thread Pradeep Fernando
Hi Dormando,

That is great insight!
However, it did not solve the problem. I disabled turbo, as per your
instructions.
I even set the CPU to operate at maximum performance with
> cpupower frequency-set --governor performance
(I verified this by monitoring the CPU frequency.)

Still the same unexplained behavior :( Do you have any other suggestions?

thanks
--Pradeep

On Mon, Oct 7, 2019 at 1:08 PM dormando  wrote:

> Hi,
>
> First as an aside; 1/1 get/set ratio is unusual for mc. The gets scale a
> lot better than sets. If you get into testing more "realistic" perf
> numbers make sure to increase the get rate.
>
> You're probably just running into CPU scaling. OS's come with a "battery
> saver" or "ondemand" performance scheduler by default. They also have
> turbo. Once you start loading it up more the CPU will stay in the higher
> frequency states or begin to issue turbo, which will lower the latency.
>
> /usr/bin/echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> cpupower frequency-set -g performance
>
> ... or whatever works for your platform.


-- 
Pradeep Fernando.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/memcached/CAPSEm%3Dbas4jg_8ToMP3VBVE3GMfzk9vHaQ7TMXdmqUww67dFsg%40mail.gmail.com.


Re: Memcached performance numbers

2019-10-07 Thread dormando
Hi,

First as an aside; 1/1 get/set ratio is unusual for mc. The gets scale a
lot better than sets. If you get into testing more "realistic" perf
numbers make sure to increase the get rate.

You're probably just running into CPU scaling. OS's come with a "battery
saver" or "ondemand" performance scheduler by default. They also have
turbo. Once you start loading it up more the CPU will stay in the higher
frequency states or begin to issue turbo, which will lower the latency.

/usr/bin/echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
cpupower frequency-set -g performance

... or whatever works for your platform.
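
To confirm that the settings above took effect, a small sketch like the following can read the values back. It assumes a Linux system that exposes the cpufreq sysfs interface; the `scaling_governor` path for cpu0 is the usual location but may be absent on some platforms or inside VMs, and the `no_turbo` file only exists with the intel_pstate driver. The helper name is hypothetical.

```python
from pathlib import Path

# Assumed sysfs locations; either file may be missing on a given system.
GOVERNOR = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")
NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")


def read_sysfs(path):
    """Return the stripped file contents, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


# None means the file is missing or unreadable on this machine.
print("cpu0 governor:", read_sysfs(GOVERNOR))
print("intel_pstate no_turbo:", read_sysfs(NO_TURBO))
```

After the two commands above, one would expect the governor to read `performance` and `no_turbo` to read `1`, but only cpu0 is checked here.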

On Mon, 7 Oct 2019, Pradeep Fernando wrote:

> Hi Devs,
> I am running memaslap to understand the performance characteristics of
> memcached.
>
> My setup: both memcached and memaslap run on a single NUMA machine.
> memcached is bound to NUMA node 1 and was given 3GB of memory.
> Workload: get/set 0.5/0.5
>
> I increase the number of memaslap threads and observe the throughput and
> latency numbers.
>
> I see an increase in throughput (expected), but latency drops as I increase
> the load.
> The initial average latency is 83 us and it drops to 30 us with number of
> threads = 8. This is unexpected; I expected the latency to go up.
> Am I reading the output wrong?
>
> Apologies if this question does not qualify for this mailing list. If so,
> please direct me to the correct list where I can get help. :)
>
> --Pradeep
>
>
>
> Thread count = 1
>
>
> Total Statistics (11447336 events)
>    Min:        11
>    Max:      1663
>    Avg:        83
>    Geo:     79.83
>    Std:     36.39
>    Log2 Dist:
>        4:       42      594   351733   9527982
>        8:   1551101     7451     7103     1330
>
> cmd_get: 5723671
> cmd_set: 5723681
> get_misses: 0
> written_bytes: 394933167
> read_bytes: 343419948
> object_bytes: 183157792
>
> Run time: 60.0s Ops: 11447352 TPS: 190765 Net_rate: 11.7M/s
>
> Thread count = 2
>
> Total Statistics (30888799 events)
>    Min:        12
>    Max:      2011
>    Avg:        30
>    Geo:     29.68
>    Std:     15.32
>    Log2 Dist:
>        4:   170225   25862674   478    66398
>        8:      154     5017    17493      170
>
> cmd_get: 1504
> cmd_set: 1511
> get_misses: 0
> written_bytes: 1065663678
> read_bytes: 926663772
> object_bytes: 494221152
>
> Run time: 60.0s Ops: 3015 TPS: 514751 Net_rate: 31.7M/s
>