Okay, so it's the big rollup that gets delayed. Makes sense. You're using the binary protocol for everything? That's a major performance annoyance of mine right now, since every response packet is sent individually. I should have an option to batch those up pretty soon, which should also help with the time it takes to service them.
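To be concrete about what "sent individually" means, here is a rough sketch of the general idea, not memcached's actual transmit code: writing each queued response with its own write() usually produces one packet per response, while handing all of the responses accumulated during one event to writev() lets the kernel coalesce them. The function names are invented for this sketch.

    /* General idea only -- not memcached's transmit path.  Contrasts one
     * write() (and typically one packet) per response with rolling all of
     * an event's pending responses into a single writev().  Partial writes
     * and errors are ignored for brevity. */
    #include <sys/uio.h>
    #include <unistd.h>

    /* One syscall per response: the behavior being complained about. */
    static void flush_each(int fd, const struct iovec *resp, int nresp)
    {
        for (int i = 0; i < nresp; i++)
            (void)write(fd, resp[i].iov_base, resp[i].iov_len);
    }

    /* Batched: one syscall for the whole event's worth of responses. */
    static void flush_batched(int fd, const struct iovec *resp, int nresp)
    {
        (void)writev(fd, resp, nresp);
    }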
I'll test both ascii and binprot with the reqs_per_event option to see how bad this is, measurably.

On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:

> The client is the EVCache client jar: https://github.com/netflix/evcache
>
> When a user calls the batch get function on the client, it spreads those
> batch gets out over many servers because it hashes keys to different
> servers. Imagine many of these batch gets happening at the same time,
> though: each server's queue gets a bunch of gets from a bunch of different
> user-facing batch gets, and it all gets intermixed.
>
> These client-side read queues are rather large (10000) and might end up
> sending a batch of a few hundred keys at a time. These large batch gets are
> sent off to the servers as "one" getq|getq|getq|...|getq|noop package and
> read back in that order. We read the responses fairly efficiently
> internally, but the batch get call the user made is waiting on the data
> from all of these separate servers to come back before it can respond to
> the user synchronously.
>
> Now on the memcached side, there are many servers all seeing this same
> pattern of many large batch gets. Memcached will stop responding to a
> connection after 20 requests on the same event and go serve other
> connections. When that happens, any user-facing batch call that is still
> waiting on a getq command queued on that connection can be delayed. It
> doesn't normally end up causing timeouts, but it does at a low rate.
>
> Our timeout for this app in particular is 5 seconds for a single
> user-facing batch get call. This client app is fine with higher latency in
> exchange for higher throughput.
>
> At this point we have reqs_per_event set to a rather high 300 and it seems
> to have solved our problem. I don't think it's causing any more
> consternation (for now), but having a dynamic setting would have lowered
> the operational complexity of the tuning.
>
> Scott Mansfield
> Sr. Software Eng, Product / Consumer Science Eng / EVCache
> M (mobile): 352-514-9452, E (email): smansfi...@netflix.com
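To make the "getq|getq|...|noop package" above concrete, here is a minimal sketch of how a client can frame such a batch in the binary protocol. It is illustrative C only, not the EVCache client (which is Java), and the frame_req/frame_batch names are made up. Quiet gets (getq, opcode 0x09) only produce a response on a hit, so the trailing noop (0x0a) response is what tells the client the batch is finished.

    /* Illustrative only: frame a pipelined binary-protocol batch get as
     * getq|getq|...|getq|noop.  Buffer sizing and error handling are the
     * caller's problem; helper names are invented for this sketch. */
    #include <stdint.h>
    #include <string.h>

    #define OP_GETQ 0x09
    #define OP_NOOP 0x0a

    /* Append one 24-byte request header (plus the key) to buf. */
    static size_t frame_req(uint8_t *buf, uint8_t opcode,
                            const char *key, size_t keylen)
    {
        memset(buf, 0, 24);            /* extras/vbucket/opaque/cas stay 0 */
        buf[0] = 0x80;                 /* request magic */
        buf[1] = opcode;
        buf[2] = (keylen >> 8) & 0xff; /* key length, network byte order */
        buf[3] = keylen & 0xff;
        buf[10] = (keylen >> 8) & 0xff; /* total body = key only (no extras) */
        buf[11] = keylen & 0xff;
        memcpy(buf + 24, key, keylen);
        return 24 + keylen;
    }

    /* Build the whole "package": one quiet get per key, then a noop whose
     * response marks the end of the batch (getq misses are silent). */
    static size_t frame_batch(uint8_t *buf, const char **keys, int nkeys)
    {
        size_t off = 0;
        for (int i = 0; i < nkeys; i++)
            off += frame_req(buf + off, OP_GETQ, keys[i], strlen(keys[i]));
        off += frame_req(buf + off, OP_NOOP, "", 0);
        return off;                    /* bytes to write as a single send */
    }

With reqs_per_event at its default of 20, the server may park the connection after servicing 20 of those getq frames and go serve other connections, which is the delay being described.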
> On Wed, Jan 25, 2017 at 11:04 AM, dormando <dorma...@rydia.net> wrote:
>
> I guess when I say dynamic I mostly mean runtime-settable. Dynamic is a
> little harder, so I tend to do those as a second pass.
>
> You're saying your client had head-of-line blocking for unrelated
> requests? I'm not 100% sure I follow.
>
> A big multiget comes in, the multiget gets processed slightly slower than
> normal due to other clients making requests, so requests *behind* the
> multiget time out, or the multiget itself?
>
> How long is your timeout? :P
>
> I'll take a look at it as well and see about raising the limit in `-o
> modern` after some performance tests. The default is from 2006.
>
> thanks!
>
> On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
>
> > The reqs_per_event setting was causing a client doing large batch gets
> > (of a few hundred keys) to see some timeouts. Since memcached will delay
> > responding fully until other connections are serviced, and our client
> > will wait until the batch is done, we see some client-side timeouts for
> > the users of our client library. Our solution has been to raise the
> > setting at startup, but as a thought experiment I was asking whether we
> > could have done it dynamically to avoid losing data. At the moment
> > there's quite a lot of machinery involved in changing the setting
> > (deploy, copy data over with our cache warmer, flip traffic, tear down
> > the old boxes), and I would have rather left everything as is and
> > adjusted the setting on the fly until our client's problem was resolved.
> >
> > I'm interested in patching this specific setting to be settable, but
> > making it fully dynamic in nature is not something I'd want to tackle.
> > There's a natural tradeoff of latency for other connections versus
> > throughput for the one currently being serviced, and I'm not sure it's a
> > good idea to change that dynamically. It might cause unexpected behavior
> > if one bad client sends huge requests.
> >
> > On Tue, Jan 24, 2017 at 11:53 AM, dormando <dorma...@rydia.net> wrote:
> >
> > Hey,
> >
> > Would you mind explaining a bit how you determined the setting was
> > causing an issue, and what the impact was? The default there is very old
> > and might be worth a revisit (or some kind of auto-tuning) as well.
> >
> > I've been trending as much as possible toward online configuration,
> > including the actual memory limit. You can turn the lru crawler on and
> > off, turn automoving on and off, manually move slab pages, etc. I'm
> > hoping to make the LRU algorithm itself modifiable at runtime.
> >
> > So yeah, I'd take a patch :)
> >
> > On Mon, 23 Jan 2017, 'Scott Mansfield' via memcached wrote:
> >
> > > There was a single setting my team was looking at today and wished we
> > > could have changed dynamically: the reqs_per_event setting. Right now,
> > > in order to change it we need to shut down the process and start it
> > > again with a different -R parameter. I don't see a way to change many
> > > of the settings, though there are some that are ad-hoc changeable
> > > through some stats commands. I was going to see if I could patch
> > > memcached to be able to change the reqs_per_event setting at runtime,
> > > but before doing so I wanted to check whether that's something that
> > > would be amenable. I also didn't want to do something specifically for
> > > that setting if it would be better to add it as a general feature.
> > >
> > > I see some pros and cons:
> > >
> > > One easy pro is that you can change things at runtime to recover
> > > performance without losing all of your data. If client request
> > > patterns change, the process can react.
> > >
> > > A con is that the startup parameters won't necessarily match what the
> > > process is doing, so they are no longer a useful way to determine the
> > > settings of memcached. Instead you would need to connect and issue a
> > > stats settings command to read them. It also introduces change in
> > > places that may never have seen it before; e.g. the reqs_per_event
> > > setting is simply read at the beginning of the drive_machine loop. It
> > > might need some kind of synchronization around it now instead. I don't
> > > think it necessarily needs it on x86_64, but it might on other
> > > platforms I am not familiar with.
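On the synchronization question in that last message: one way a patch could make reqs_per_event runtime-settable without worrying about torn reads on non-x86_64 platforms is to publish the value through a C11 atomic and load it with relaxed ordering once per event. This is a minimal sketch only, assuming a hypothetical admin command handler; none of these names are memcached's actual code.

    /* Sketch only -- not a patch against memcached.  Makes reqs_per_event
     * safely readable from worker threads after a runtime change; relaxed
     * ordering is enough because any recent value is acceptable and no
     * other data is published along with it. */
    #include <stdatomic.h>

    static _Atomic int reqs_per_event = 20;   /* the old -R default */

    /* Hypothetical handler for an admin command along the lines of
     * "settings reqs_per_event 300" (command name invented here). */
    void set_reqs_per_event(int n)
    {
        atomic_store_explicit(&reqs_per_event, n, memory_order_relaxed);
    }

    /* Read once at the top of the event-handling loop, the same place the
     * plain setting is consulted today (drive_machine). */
    int get_reqs_per_event(void)
    {
        return atomic_load_explicit(&reqs_per_event, memory_order_relaxed);
    }

The startup -R value then just becomes the initial store, which is exactly the pro/con described above: stats settings, not the command line, would show the live value.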