Yeah, gimme a few weeks maybe. Reducing those syscalls is nearly all of the CPU usage: the difference between 1.2m keys/sec and 35m keys/sec on 20 cores in my own tests.
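The syscall arithmetic behind those numbers can be made concrete: answering N pipelined requests with N write(2) calls costs N syscalls, while batching the responses into one flush costs one. A minimal sketch of the batching idea in Python (illustrative only; memcached's actual I/O path is C and more involved):

```python
import socket

def flush_batched(sock, responses):
    """Send many small responses with a single syscall instead of one per response.

    `responses` is a list of bytes objects, one per pipelined request.
    sendmsg() does a scatter-gather send: one syscall for the whole batch,
    without first copying the buffers into one joined blob.
    """
    return sock.sendmsg(responses)

# Usage sketch over a local socket pair:
a, b = socket.socketpair()
responses = [b"VALUE k%d 0 1\r\nv\r\n" % i for i in range(3)] + [b"END\r\n"]
sent = flush_batched(a, responses)
data = b.recv(4096)
assert sent == sum(len(r) for r in responses)
assert data == b"".join(responses)
```

The same effect can be had by appending responses to a connection-local buffer and issuing one write when the batch is complete; sendmsg/writev just avoids the extra copy.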
I did this: https://github.com/memcached/memcached/pull/243 .. which would help batch perf. And this: https://github.com/memcached/memcached/pull/241 .. which should make binprot perf better at nearly undetectable cost to ascii. So, working my way to it.

On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
> Yes, our production traffic all uses binary protocol, even behind our
> on-server proxy that we use. In fact, if you have a way to reduce syscalls
> by batching responses, that would solve another huge pain we have that's
> of our own doing.
>
> Scott Mansfield
> Sr. Software Eng, Consumer Science Eng, EVCache
> M: 352-514-9452 | E: smansfi...@netflix.com
>
> On Wed, Jan 25, 2017 at 11:33 AM, dormando <dorma...@rydia.net> wrote:
> > Okay, so it's the big rollup that gets delayed. Makes sense.
> >
> > You're using binary protocol for everything? That's a major focus of my
> > performance annoyance right now, since every response packet is sent
> > individually. I should have that switched to an option at least pretty
> > soon, which should also help with the time it takes to service them.
> >
> > I'll test both ascii and binprot + the reqs_per_event option to see how
> > bad this is measurably.
> >
> > On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
> > > The client is the EVCache client jar:
> > > https://github.com/netflix/evcache
> > >
> > > When a user calls the batch get function on the client, it spreads
> > > those batch gets out over many servers because it is hashing keys to
> > > different servers. Imagine many of these batch gets happening at the
> > > same time, though, and each server's queue will get a bunch of gets
> > > from a bunch of different user-facing batch gets. It all gets
> > > intermixed.
> > >
> > > These client-side read queues are rather large (10000) and might end
> > > up sending a batch of a few hundred keys at a time.
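The fan-out described above can be sketched as follows. This is a hypothetical Python reduction (`split_batch` is an invented helper; the real EVCache client is Java with far more machinery, and real clients use consistent hashing rather than plain modulo): each key hashes to a server, so every server ends up seeing a slice of every large batch get.

```python
from collections import defaultdict
from zlib import crc32

def split_batch(keys, servers):
    """Group a user-level multiget into per-server batches by hashing each key.

    Plain crc32-modulo stands in for consistent hashing; it is enough to
    show why every server receives an intermixed slice of each large batch.
    """
    batches = defaultdict(list)
    for key in keys:
        server = servers[crc32(key.encode()) % len(servers)]
        batches[server].append(key)
    return batches

servers = ["mc1:11211", "mc2:11211", "mc3:11211"]
batches = split_batch([f"user:{i}" for i in range(300)], servers)
# No key is lost or duplicated, and every server gets part of the batch:
assert sum(len(b) for b in batches.values()) == 300
assert len(batches) == len(servers)
```

With many concurrent user-level batch gets, each server-side queue holds gets belonging to many different callers, which is why one slow connection can stall several unrelated user requests.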
> > > These large batch gets are sent off to the servers as "one"
> > > getq|getq|getq|getq|getq|getq|getq|getq|getq|getq|noop package and
> > > read back in that order. We are reading the responses fairly
> > > efficiently internally, but the batch get call that the user made is
> > > waiting on the data from all of these separate servers to come back
> > > in order to properly respond to the user in a synchronous manner.
> > >
> > > Now on the memcached side, there are many servers all seeing this
> > > same pattern of many large batch gets. Memcached will stop responding
> > > to a connection after 20 requests on the same event and go serve
> > > other connections. If that happens, any user-facing batch call that
> > > is waiting on any getq command still to be serviced on that
> > > connection can be delayed. It doesn't normally end up causing
> > > timeouts, but it does at a low rate.
> > >
> > > Our timeouts for this app in particular are 5 seconds for a single
> > > user-facing batch get call. This client app is fine with higher
> > > latency for higher throughput.
> > >
> > > At this point we have reqs_per_event set to a rather high 300 and it
> > > seems to have solved our problem. I don't think it's causing any more
> > > consternation (for now), but a dynamic setting would have lowered the
> > > operational complexity of the tuning.
> > >
> > > On Wed, Jan 25, 2017 at 11:04 AM, dormando <dorma...@rydia.net> wrote:
> > > > I guess when I say dynamic I mostly mean runtime-settable. Dynamic
> > > > is a little harder, so I tend to do those as a second pass.
> > > >
> > > > You're saying your client had head-of-line blocking for unrelated
> > > > requests? I'm not 100% sure I follow.
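The getq|...|noop framing maps directly onto memcached's binary protocol: each GETQ (opcode 0x09) is a 24-byte request header plus the key and stays quiet on a miss, and the trailing NOOP (opcode 0x0a) forces a response that marks the end of the batch. A sketch of building such a request blob (client side only; `getq_pipeline` is an invented name):

```python
import struct

REQ_MAGIC, OP_GETQ, OP_NOOP = 0x80, 0x09, 0x0A
# 24-byte binary protocol request header: magic, opcode, key length,
# extras length, data type, vbucket id, total body length, opaque, CAS.
HEADER = struct.Struct(">BBHBBHIIQ")

def getq_pipeline(keys):
    """Build one getq|getq|...|noop request blob for a batch get."""
    out = []
    for opaque, key in enumerate(keys):
        k = key.encode()
        out.append(HEADER.pack(REQ_MAGIC, OP_GETQ, len(k), 0, 0, 0,
                               len(k), opaque, 0))
        out.append(k)
    # Terminating NOOP: the server must answer it, so its response
    # tells the client the whole batch has been processed.
    out.append(HEADER.pack(REQ_MAGIC, OP_NOOP, 0, 0, 0, 0, 0, 0, 0))
    return b"".join(out)

blob = getq_pipeline(["a", "bb", "ccc"])
# 3 GETQ headers + 6 key bytes + 1 NOOP header:
assert len(blob) == 4 * 24 + 6
```

Because GETQ suppresses miss responses, the client matches hits back to keys via the opaque field and treats anything unanswered by the time the NOOP response arrives as a miss.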
> > > > Big multiget comes in, multiget gets processed slightly slower
> > > > than normal due to other clients making requests, so requests
> > > > *behind* the multiget time out, or the multiget itself?
> > > >
> > > > How long is your timeout? :P
> > > >
> > > > I'll take a look at it as well and see about raising the limit in
> > > > `-o modern` after some performance tests. The default is from 2006.
> > > >
> > > > thanks!
> > > >
> > > > On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
> > > > > The reqs_per_event setting was causing a client that was doing
> > > > > large batch gets (of a few hundred keys) to see some timeouts.
> > > > > Since memcached delays responding fully until other connections
> > > > > are serviced, and our client waits until the batch is done, we
> > > > > see some client-side timeouts for the users of our client
> > > > > library. Our solution has been to raise the setting during
> > > > > startup, but just as a thought experiment I was asking whether we
> > > > > could have done it dynamically to avoid losing data. At the
> > > > > moment there's quite a lot of machinery involved in changing the
> > > > > setting (deploy, copy data over with our cache warmer, flip
> > > > > traffic, tear down old boxes), and I would rather have left
> > > > > everything as is and adjusted the setting on the fly until our
> > > > > client's problem was resolved.
> > > > >
> > > > > I'm interested in patching this specific setting to be settable,
> > > > > but making it fully dynamic in nature is not something I'd want
> > > > > to tackle. There's a natural tradeoff of latency for other
> > > > > connections versus throughput for the one currently being
> > > > > serviced. I'm not sure it's a good idea to change that
> > > > > dynamically; it might cause unexpected behavior if one bad client
> > > > > sends huge requests.
> > > > >
> > > > > On Tue, Jan 24, 2017 at 11:53 AM, dormando <dorma...@rydia.net> wrote:
> > > > > > Hey,
> > > > > >
> > > > > > Would you mind explaining a bit how you determined the setting
> > > > > > was causing an issue, and what the impact was? The default
> > > > > > there is very old and might be worth a revisit (or some kind of
> > > > > > auto-tuning) as well.
> > > > > >
> > > > > > I've been trending as much as possible toward online
> > > > > > configuration, including the actual memory limit. You can turn
> > > > > > the lru crawler on and off, automoving on and off, manually
> > > > > > move slab pages, etc. I'm hoping to make the LRU algorithm
> > > > > > itself modifiable at runtime.
> > > > > >
> > > > > > So yeah, I'd take a patch :)
> > > > > >
> > > > > > On Mon, 23 Jan 2017, 'Scott Mansfield' via memcached wrote:
> > > > > > > There was a single setting my team was looking at today that
> > > > > > > we wished we could have changed dynamically: the
> > > > > > > reqs_per_event setting. Right now, in order to change it, we
> > > > > > > need to shut down the process and start it again with a
> > > > > > > different -R parameter. I don't see a way to change many of
> > > > > > > the settings, though there are some that are ad hoc
> > > > > > > changeable through some stats commands. I was going to see if
> > > > > > > I could patch memcached to be able to change the
> > > > > > > reqs_per_event setting at runtime, but before doing so I
> > > > > > > wanted to check whether that's something that would be
> > > > > > > amenable. I also didn't want to do something specifically for
> > > > > > > that setting if it would be better to add it as a general
> > > > > > > feature.
> > > > > > >
> > > > > > > I see some pros and cons:
> > > > > > >
> > > > > > > One easy pro is that you can change things at runtime to tune
> > > > > > > performance while not losing all of your data. If client
> > > > > > > request patterns change, the process can react.
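As a sketch of what runtime-settable tuning could look like, the block below invents a `tune reqs_per_event N` admin command over a toy settings table, in Python rather than memcached's C. The command name, bounds, and error strings are all assumptions for illustration, not memcached's actual interface:

```python
class Settings:
    """Toy runtime-tunable settings table (hypothetical, not memcached's)."""

    # name -> allowed (min, max); only one tunable in this sketch
    TUNABLE = {"reqs_per_event": (1, 10000)}

    def __init__(self):
        self.reqs_per_event = 20  # memcached's long-standing -R default

    def handle_command(self, line):
        """Parse 'tune <name> <value>' and apply it if valid."""
        parts = line.split()
        if len(parts) != 3 or parts[0] != "tune":
            return "ERROR"
        name, value = parts[1], parts[2]
        if name not in self.TUNABLE or not value.isdigit():
            return "CLIENT_ERROR bad tunable"
        lo, hi = self.TUNABLE[name]
        if not lo <= int(value) <= hi:
            return "CLIENT_ERROR out of range"
        setattr(self, name, int(value))
        return "OK"

s = Settings()
assert s.handle_command("tune reqs_per_event 300") == "OK"
assert s.reqs_per_event == 300
assert s.handle_command("tune reqs_per_event 0").startswith("CLIENT_ERROR")
```

Since the worker loop re-reads the value on each pass, the new value takes effect on the next event without restarting the process or dropping the cache.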
> > > > > > > A con is that the startup parameters won't necessarily match
> > > > > > > what the process is doing, so they're no longer a reliable
> > > > > > > way to determine memcached's settings; instead you would need
> > > > > > > to connect and issue a stats settings command to read them.
> > > > > > > It also introduces change in places that may previously never
> > > > > > > have seen it, e.g. the reqs_per_event setting is simply read
> > > > > > > at the beginning of the drive_machine loop. It might need
> > > > > > > some kind of synchronization around it now. I don't think it
> > > > > > > necessarily needs it on x86_64, but it might on other
> > > > > > > platforms I'm not familiar with.

--
---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.