Yeah, gimme a few weeks maybe. Reducing those syscalls is nearly all of the CPU usage: the difference between 1.2m keys/sec and 35m keys/sec on 20 cores in my own tests.
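The syscall arithmetic behind those numbers can be made concrete: answering N pipelined requests with N write(2) calls costs N syscalls, while batching the responses into one flush costs one. A minimal sketch of the batching idea in Python (illustrative only; memcached's actual I/O path is C and more involved):

```python
import socket

def flush_batched(sock, responses):
    """Send many small responses with a single syscall instead of one per response.

    `responses` is a list of bytes objects, one per pipelined request.
    sendmsg() does a scatter-gather send: one syscall for the whole batch,
    without first copying the buffers into one joined blob.
    """
    return sock.sendmsg(responses)

# Usage sketch over a local socket pair:
a, b = socket.socketpair()
responses = [b"VALUE k%d 0 1\r\nv\r\n" % i for i in range(3)] + [b"END\r\n"]
sent = flush_batched(a, responses)
data = b.recv(4096)
assert sent == sum(len(r) for r in responses)
assert data == b"".join(responses)
```

The same effect can be had by appending responses to a connection-local buffer and issuing one write when the batch is complete; sendmsg/writev just avoids the extra copy.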
I did this: https://github.com/memcached/memcached/pull/243 .. which would help batch perf. And this: https://github.com/memcached/memcached/pull/241 .. which should make binprot perf better at nearly undetectable cost to ascii. So, working my way to it.

On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
> Yes, our production traffic all uses binary protocol, even behind our
> on-server proxy that we use. In fact, if you have a way to reduce syscalls
> by batching responses, that would solve another huge pain we have that's
> of our own doing.
>
> Scott Mansfield
> Sr. Software Eng, Consumer Science Eng, EVCache
> M: 352-514-9452 | E: smansfi...@netflix.com
>
> On Wed, Jan 25, 2017 at 11:33 AM, dormando <dorma...@rydia.net> wrote:
> > Okay, so it's the big rollup that gets delayed. Makes sense.
> >
> > You're using binary protocol for everything? That's a major focus of my
> > performance annoyance right now, since every response packet is sent
> > individually. I should have that switched to an option at least pretty
> > soon, which should also help with the time it takes to service them.
> >
> > I'll test both ascii and binprot + the reqs_per_event option to see how
> > bad this is measurably.
> >
> > On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
> > > The client is the EVCache client jar:
> > > https://github.com/netflix/evcache
> > >
> > > When a user calls the batch get function on the client, it spreads
> > > those batch gets out over many servers because it is hashing keys to
> > > different servers. Imagine many of these batch gets happening at the
> > > same time, though, and each server's queue will get a bunch of gets
> > > from a bunch of different user-facing batch gets. It all gets
> > > intermixed.
> > >
> > > These client-side read queues are rather large (10000) and might end
> > > up sending a batch of a few hundred keys at a time.
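The fan-out described above can be sketched as follows. This is a hypothetical Python reduction (`split_batch` is an invented helper; the real EVCache client is Java with far more machinery, and real clients use consistent hashing rather than plain modulo): each key hashes to a server, so every server ends up seeing a slice of every large batch get.

```python
from collections import defaultdict
from zlib import crc32

def split_batch(keys, servers):
    """Group a user-level multiget into per-server batches by hashing each key.

    Plain crc32-modulo stands in for consistent hashing; it is enough to
    show why every server receives an intermixed slice of each large batch.
    """
    batches = defaultdict(list)
    for key in keys:
        server = servers[crc32(key.encode()) % len(servers)]
        batches[server].append(key)
    return batches

servers = ["mc1:11211", "mc2:11211", "mc3:11211"]
batches = split_batch([f"user:{i}" for i in range(300)], servers)
# No key is lost or duplicated, and every server gets part of the batch:
assert sum(len(b) for b in batches.values()) == 300
assert len(batches) == len(servers)
```

With many concurrent user-level batch gets, each server-side queue holds gets belonging to many different callers, which is why one slow connection can stall several unrelated user requests.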
> > > These large batch gets are sent off to the servers as "one"
> > > getq|getq|getq|getq|getq|getq|getq|getq|getq|getq|noop package and
> > > read back in that order. We are reading the responses fairly
> > > efficiently internally, but the batch get call that the user made is
> > > waiting on the data from all of these separate servers to come back
> > > in order to properly respond to the user in a synchronous manner.
> > >
> > > Now on the memcached side, there are many servers all seeing this
> > > same pattern of many large batch gets. Memcached will stop responding
> > > to a connection after 20 requests on the same event and go serve
> > > other connections. If that happens, any user-facing batch call that
> > > is waiting on any getq command still to be serviced on that
> > > connection can be delayed. It doesn't normally end up causing
> > > timeouts, but it does at a low rate.
> > >
> > > Our timeouts for this app in particular are 5 seconds for a single
> > > user-facing batch get call. This client app is fine with higher
> > > latency for higher throughput.
> > >
> > > At this point we have reqs_per_event set to a rather high 300 and it
> > > seems to have solved our problem. I don't think it's causing any more
> > > consternation (for now), but a dynamic setting would have lowered the
> > > operational complexity of the tuning.
> > >
> > > On Wed, Jan 25, 2017 at 11:04 AM, dormando <dorma...@rydia.net> wrote:
> > > > I guess when I say dynamic I mostly mean runtime-settable. Dynamic
> > > > is a little harder, so I tend to do those as a second pass.
> > > >
> > > > You're saying your client had head-of-line blocking for unrelated
> > > > requests? I'm not 100% sure I follow.
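The getq|...|noop framing maps directly onto memcached's binary protocol: each GETQ (opcode 0x09) is a 24-byte request header plus the key and stays quiet on a miss, and the trailing NOOP (opcode 0x0a) forces a response that marks the end of the batch. A sketch of building such a request blob (client side only; `getq_pipeline` is an invented name):

```python
import struct

REQ_MAGIC, OP_GETQ, OP_NOOP = 0x80, 0x09, 0x0A
# 24-byte binary protocol request header: magic, opcode, key length,
# extras length, data type, vbucket id, total body length, opaque, CAS.
HEADER = struct.Struct(">BBHBBHIIQ")

def getq_pipeline(keys):
    """Build one getq|getq|...|noop request blob for a batch get."""
    out = []
    for opaque, key in enumerate(keys):
        k = key.encode()
        out.append(HEADER.pack(REQ_MAGIC, OP_GETQ, len(k), 0, 0, 0,
                               len(k), opaque, 0))
        out.append(k)
    # Terminating NOOP: the server must answer it, so its response
    # tells the client the whole batch has been processed.
    out.append(HEADER.pack(REQ_MAGIC, OP_NOOP, 0, 0, 0, 0, 0, 0, 0))
    return b"".join(out)

blob = getq_pipeline(["a", "bb", "ccc"])
# 3 GETQ headers + 6 key bytes + 1 NOOP header:
assert len(blob) == 4 * 24 + 6
```

Because GETQ suppresses miss responses, the client matches hits back to keys via the opaque field and treats anything unanswered by the time the NOOP response arrives as a miss.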
> > > > Big multiget comes in, multiget gets processed slightly slower
> > > > than normal due to other clients making requests, so requests
> > > > *behind* the multiget time out, or the multiget itself?
> > > >
> > > > How long is your timeout? :P
> > > >
> > > > I'll take a look at it as well and see about raising the limit in
> > > > `-o modern` after some performance tests. The default is from 2006.
> > > >
> > > > thanks!
> > > >
> > > > On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
> > > > > The reqs_per_event setting was causing a client that was doing
> > > > > large batch gets (of a few hundred keys) to see some timeouts.
> > > > > Since memcached delays responding fully until other connections
> > > > > are serviced, and our client waits until the batch is done, we
> > > > > see some client-side timeouts for the users of our client
> > > > > library. Our solution has been to raise the setting during
> > > > > startup, but just as a thought experiment I was asking whether we
> > > > > could have done it dynamically to avoid losing data. At the
> > > > > moment there's quite a lot of machinery involved in changing the
> > > > > setting (deploy, copy data over with our cache warmer, flip
> > > > > traffic, tear down old boxes), and I would rather have left
> > > > > everything as is and adjusted the setting on the fly until our
> > > > > client's problem was resolved.
> > > > >
> > > > > I'm interested in patching this specific setting to be settable,
> > > > > but making it fully dynamic in nature is not something I'd want
> > > > > to tackle. There's a natural tradeoff of latency for other
> > > > > connections versus throughput for the one currently being
> > > > > serviced. I'm not sure it's a good idea to change that
> > > > > dynamically; it might cause unexpected behavior if one bad client
> > > > > sends huge requests.
> > > > >
> > > > > On Tue, Jan 24, 2017 at 11:53 AM, dormando <dorma...@rydia.net> wrote:
> > > > > > Hey,
> > > > > >
> > > > > > Would you mind explaining a bit how you determined the setting
> > > > > > was causing an issue, and what the impact was? The default
> > > > > > there is very old and might be worth a revisit (or some kind of
> > > > > > auto-tuning) as well.
> > > > > >
> > > > > > I've been trending as much as possible toward online
> > > > > > configuration, including the actual memory limit. You can turn
> > > > > > the lru crawler on and off, automoving on and off, manually
> > > > > > move slab pages, etc. I'm hoping to make the LRU algorithm
> > > > > > itself modifiable at runtime.
> > > > > >
> > > > > > So yeah, I'd take a patch :)
> > > > > >
> > > > > > On Mon, 23 Jan 2017, 'Scott Mansfield' via memcached wrote:
> > > > > > > There was a single setting my team was looking at today that
> > > > > > > we wished we could have changed dynamically: the
> > > > > > > reqs_per_event setting. Right now, in order to change it, we
> > > > > > > need to shut down the process and start it again with a
> > > > > > > different -R parameter. I don't see a way to change many of
> > > > > > > the settings, though there are some that are ad hoc
> > > > > > > changeable through some stats commands. I was going to see if
> > > > > > > I could patch memcached to be able to change the
> > > > > > > reqs_per_event setting at runtime, but before doing so I
> > > > > > > wanted to check whether that's something that would be
> > > > > > > amenable. I also didn't want to do something specifically for
> > > > > > > that setting if it would be better to add it as a general
> > > > > > > feature.
> > > > > > >
> > > > > > > I see some pros and cons:
> > > > > > >
> > > > > > > One easy pro is that you can change things at runtime to tune
> > > > > > > performance while not losing all of your data. If client
> > > > > > > request patterns change, the process can react.
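As a sketch of what runtime-settable tuning could look like, the block below invents a `tune reqs_per_event N` admin command over a toy settings table, in Python rather than memcached's C. The command name, bounds, and error strings are all assumptions for illustration, not memcached's actual interface:

```python
class Settings:
    """Toy runtime-tunable settings table (hypothetical, not memcached's)."""

    # name -> allowed (min, max); only one tunable in this sketch
    TUNABLE = {"reqs_per_event": (1, 10000)}

    def __init__(self):
        self.reqs_per_event = 20  # memcached's long-standing -R default

    def handle_command(self, line):
        """Parse 'tune <name> <value>' and apply it if valid."""
        parts = line.split()
        if len(parts) != 3 or parts[0] != "tune":
            return "ERROR"
        name, value = parts[1], parts[2]
        if name not in self.TUNABLE or not value.isdigit():
            return "CLIENT_ERROR bad tunable"
        lo, hi = self.TUNABLE[name]
        if not lo <= int(value) <= hi:
            return "CLIENT_ERROR out of range"
        setattr(self, name, int(value))
        return "OK"

s = Settings()
assert s.handle_command("tune reqs_per_event 300") == "OK"
assert s.reqs_per_event == 300
assert s.handle_command("tune reqs_per_event 0").startswith("CLIENT_ERROR")
```

Since the worker loop re-reads the value on each pass, the new value takes effect on the next event without restarting the process or dropping the cache.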
> > > > > > > A con is that the startup parameters won't necessarily match
> > > > > > > what the process is doing, so they're no longer a reliable
> > > > > > > way to determine memcached's settings; instead you would need
> > > > > > > to connect and issue a stats settings command to read them.
> > > > > > > It also introduces change in places that may previously never
> > > > > > > have seen it, e.g. the reqs_per_event setting is simply read
> > > > > > > at the beginning of the drive_machine loop. It might need
> > > > > > > some kind of synchronization around it now. I don't think it
> > > > > > > necessarily needs it on x86_64, but it might on other
> > > > > > > platforms I'm not familiar with.

--
---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.