Okay, so it's the big rollup that gets delayed. Makes sense.

You're using the binary protocol for everything? That's a major source of my
performance annoyance right now, since every response packet is sent
individually. I should have that fixed (at least as an option) pretty soon,
which should also help with the time it takes to service the requests.
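
To illustrate what I mean (a rough sketch only, not the actual memcached code
path or necessarily what will land): instead of one write per response,
responses generated during an event would be queued into an iovec array and
flushed with a single writev().

    /* Sketch: coalesce per-event responses into one syscall.
     * Names and sizes here are made up for illustration; a real
     * implementation would also handle partial writes. */
    #include <sys/uio.h>
    #include <unistd.h>

    #define MAX_RESP_IOV 64

    struct resp_batch {
        struct iovec iov[MAX_RESP_IOV];
        int used;
    };

    /* queue a response buffer instead of writing it immediately */
    static void batch_add(struct resp_batch *b, void *buf, size_t len) {
        if (b->used < MAX_RESP_IOV) {
            b->iov[b->used].iov_base = buf;
            b->iov[b->used].iov_len = len;
            b->used++;
        }
    }

    /* flush everything queued for this event with a single writev() */
    static ssize_t batch_flush(int fd, struct resp_batch *b) {
        ssize_t n = writev(fd, b->iov, b->used);
        b->used = 0;
        return n;
    }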

I'll test both ascii and binprot, plus the reqs_per_event option, to get a
measurable sense of how bad this is.

On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:

> The client is the EVCache client jar: https://github.com/netflix/evcache
> When a user calls the batch get function on the client, it will spread those
> batch gets out over many servers because it is hashing keys to different
> servers. Imagine many of these batch gets happening at the same time, though,
> and each server's queue will get a bunch of gets from a bunch of different
> user-facing batch gets. It all gets intermixed.
>
> These client-side read queues are rather large (10000) and might end up
> sending a batch of a few hundred keys at a time. These large batch gets are
> sent off to the servers as "one"
> getq|getq|getq|getq|getq|getq|getq|getq|getq|getq|noop package and read back
> in that order. We are reading the responses fairly efficiently internally,
> but the batch get call that the user made is waiting on the data from all of
> these separate servers to come back in order to properly respond to the user
> in a synchronous manner.
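
For reference, that pipelined batch looks roughly like the following on the
wire in the binary protocol. This is only an illustrative sketch, not the
EVCache client's actual code; a real client would also fill in the opaque
field so that quiet responses can be matched back to their keys, and size the
buffer for 24 * (nkeys + 1) bytes plus the total key bytes.

    /* Sketch: pack keys as getq|getq|...|getq|noop into one buffer. */
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>

    #define MAGIC_REQ 0x80
    #define OP_GETQ   0x09
    #define OP_NOOP   0x0a

    /* write a 24-byte binary protocol request header */
    static size_t put_header(uint8_t *buf, uint8_t opcode, uint16_t keylen) {
        memset(buf, 0, 24);
        buf[0] = MAGIC_REQ;
        buf[1] = opcode;
        uint16_t klen = htons(keylen);
        memcpy(buf + 2, &klen, sizeof(klen));       /* key length */
        uint32_t bodylen = htonl(keylen);           /* getq body is just the key */
        memcpy(buf + 8, &bodylen, sizeof(bodylen)); /* total body length */
        return 24;
    }

    /* the trailing noop forces the server to flush pending quiet responses */
    static size_t pack_batch(uint8_t *buf, const char **keys, int nkeys) {
        size_t off = 0;
        for (int i = 0; i < nkeys; i++) {
            uint16_t klen = (uint16_t)strlen(keys[i]);
            off += put_header(buf + off, OP_GETQ, klen);
            memcpy(buf + off, keys[i], klen);
            off += klen;
        }
        off += put_header(buf + off, OP_NOOP, 0);
        return off;
    }

A getq for a missing key produces no response at all; the noop's response
marks the end of the batch, which is why the client can read the replies back
"in that order".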
>
> Now on the memcached side, there are many servers all seeing this same
> pattern of many large batch gets. Memcached will stop responding to a
> connection after 20 requests on the same event and go serve other
> connections. When that happens, any user-facing batch call that depends on a
> getq still waiting to be serviced on that connection can be delayed. It
> doesn't normally cause timeouts, but it does at a low level.
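
The server-side behaviour being described is, in simplified form, roughly the
following. This is not the actual drive_machine code, and the helper names
below are made up for illustration; it is just the shape of the per-event
request budget.

    #include <stdbool.h>

    extern int reqs_per_event;                 /* the -R value; default is 20 */
    extern bool has_complete_request(int fd);  /* hypothetical helpers */
    extern void process_one_request(int fd);
    extern void rearm_read_event(int fd);

    void service_readable_connection(int fd) {
        int budget = reqs_per_event;
        while (has_complete_request(fd)) {
            if (--budget < 0) {
                /* per-event budget spent: yield back to the event loop so
                 * one busy pipeline cannot starve other connections */
                rearm_read_event(fd);
                return;
            }
            process_one_request(fd);
        }
    }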
>
> Our timeouts for this app in particular are 5 seconds for a single 
> user-facing batch get call. This client app is fine with higher latency for 
> higher throughput.
>
> At this point we have the reqs_per_event set to a rather high 300 and it
> seems to have solved our problem. I don't think it's causing any more
> consternation (for now), but having a dynamic setting would have lowered the
> operational complexity of the tuning.
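
To put rough numbers on that: with the default of 20 and a pipelined batch of
around 300 getq requests, the server yields back to the event loop roughly
300 / 20 = 15 times before the trailing noop is answered; with -R 300 the
whole batch can drain within a single event.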
>
>
> Scott Mansfield
> Product > Consumer Science Eng > EVCache > Sr. Software Eng
> {
>   M: 352-514-9452
>   E: smansfi...@netflix.com
>   K: {M: mobile, E: email, K: key}
> }
>
> On Wed, Jan 25, 2017 at 11:04 AM, dormando <dorma...@rydia.net> wrote:
>       I guess when I say dynamic I mostly mean runtime-settable. Dynamic is a
>       little harder so I tend to do those as a second pass.
>
>       You're saying your client had head-of-line blocking for unrelated
>       requests? I'm not 100% sure I follow.
>
>       Big multiget comes in, multiget gets processed slightly slower than
>       normal due to other clients making requests, so requests *behind* the
>       multiget time out, or the multiget itself?
>
>       How long is your timeout? :P
>
>       I'll take a look at it as well and see about raising the limit in `-o
>       modern` after some performance tests. The default is from 2006.
>
>       thanks!
>
>       On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
>
>       > The reqs_per_event setting was causing a client that was doing large
>       > batch-gets (of a few hundred keys) to see some timeouts. Since
>       > memcached will delay responding fully until other connections are
>       > serviced and our client will wait until the batch is done, we see
>       > some client-side timeouts for the users of our client library. Our
>       > solution has been to up the setting during startup, but just as a
>       > thought experiment I was asking if we could have done it dynamically
>       > to avoid losing data. At the moment there's quite a lot of machinery
>       > to change the setting (deploy, copy data over with our cache warmer,
>       > flip traffic, tear down old boxes) and I would have rather left
>       > everything as is and adjusted the setting on the fly until our
>       > client's problem was resolved.
>       >
>       > I'm interested in patching this specific setting to be settable, but
>       > having it fully dynamic in nature is not something I'd want to
>       > tackle. There's a natural tradeoff of latency for other connections /
>       > throughput for the one that is currently being serviced. I'm not sure
>       > it's a good idea to dynamically change that. It might cause
>       > unexpected behavior if one bad client sends huge requests.
>       >
>       >
>       > Scott Mansfield
>       > Product > Consumer Science Eng > EVCache > Sr. Software Eng
>       > {
>       >   M: 352-514-9452
>       >   E: smansfi...@netflix.com
>       >   K: {M: mobile, E: email, K: key}
>       > }
>       >
>       > On Tue, Jan 24, 2017 at 11:53 AM, dormando <dorma...@rydia.net> wrote:
>       >       Hey,
>       >
>       >       Would you mind explaining a bit how you determined the setting
>       >       was causing an issue, and what the impact was? The default
>       >       there is very old and might be worth a revisit (or some kind of
>       >       auto-tuning) as well.
>       >
>       >       I've been trending as much as possible toward online
>       >       configuration, including the actual memory limit. You can turn
>       >       the lru crawler on and off, automoving on and off, manually
>       >       move slab pages, etc. I'm hoping to make the LRU algorithm
>       >       itself modifiable at runtime.
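
(For reference, those runtime toggles are plain text-protocol commands along
these lines; doc/protocol.txt has the exact syntax, and the class numbers
below are just an example:

    lru_crawler enable
    lru_crawler disable
    slabs automove 1
    slabs reassign 3 10    <- move a page from slab class 3 to class 10
)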
>       >
>       >       So yeah, I'd take a patch :)
>       >
>       >       On Mon, 23 Jan 2017, 'Scott Mansfield' via memcached wrote:
>       >
>       >       > There was a single setting my team was looking at today that
>       >       > we wish we could have changed dynamically: the reqs_per_event
>       >       > setting. Right now, in order to change it we need to shut
>       >       > down the process and start it again with a different -R
>       >       > parameter. I don't see a way to change many of the settings,
>       >       > though there are some that are ad-hoc changeable through some
>       >       > stats commands. I was going to see if I could patch memcached
>       >       > to be able to change the reqs_per_event setting at runtime,
>       >       > but before doing so I wanted to check whether that's
>       >       > something you'd be amenable to. I also didn't want to do
>       >       > something specifically for that setting if it would be better
>       >       > to add it as a general feature.
>       >       >
>       >       > I see some pros and cons:
>       >       >
>       >       > One clear pro is that you can change things at runtime to
>       >       > improve performance without losing all of your data. If
>       >       > client request patterns change, the process can react.
>       >       >
>       >       > A con is that the startup parameters won't necessarily match
>       >       > what the process is doing, so they are no longer a useful way
>       >       > to determine the settings of memcached. Instead you would
>       >       > need to connect and issue a stats settings command to read
>       >       > them. It also introduces change in places that may never have
>       >       > seen it before, e.g. the reqs_per_event setting is simply
>       >       > read at the beginning of the drive_machine loop. It might
>       >       > need some kind of synchronization around it now instead. I
>       >       > don't think it necessarily needs it on x86_64, but it might
>       >       > on other platforms I am not familiar with.
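
On the synchronization question, one option (an assumption on my part, not a
settled design) is to make the setting a C11 relaxed atomic: the read at the
top of the event loop stays cheap, and a runtime update becomes well-defined
on every platform rather than relying on x86_64's memory model.

    #include <stdatomic.h>

    /* settable at runtime without a lock; relaxed ordering is enough since
     * nothing else depends on exactly when a new value becomes visible */
    static _Atomic int reqs_per_event = 20;   /* -R default */

    /* worker thread, at the top of its per-event loop */
    static inline int current_reqs_per_event(void) {
        return atomic_load_explicit(&reqs_per_event, memory_order_relaxed);
    }

    /* whatever thread ends up handling the runtime-settings command */
    static inline void set_reqs_per_event(int n) {
        atomic_store_explicit(&reqs_per_event, n, memory_order_relaxed);
    }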
>       >       >