Re: Dynamic settings

2017-02-09 Thread 'Scott Mansfield' via memcached
I have opened a pull request with a preliminary implementation for a 
settings command: https://github.com/memcached/memcached/pull/255

I took a few liberties, so let me know if anything is out of line.


Re: Dynamic settings

2017-01-25 Thread dormando
Yeah, gimme a few weeks maybe. Those syscalls account for almost all of
the CPU usage; reducing them is the difference between 1.2m keys/sec and
35m keys/sec on 20 cores in my own tests.

I did this:
https://github.com/memcached/memcached/pull/243
.. which would help batch perf.
and this:
https://github.com/memcached/memcached/pull/241
.. which should make binprot perf better at nearly undetectable cost to
ascii.

so, working my way to it.
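
[The syscall batching discussed above (one write per response vs. one
write for many) can be sketched with scatter/gather I/O. This is a
Python illustration of the idea only, not memcached's actual C code
path, and the response strings are made up:]

```python
import socket

def send_batched(sock, responses):
    """Send many response buffers with a single syscall via sendmsg()
    instead of one write() per response (scatter/gather I/O).
    Illustrative only -- memcached's server is C, not Python."""
    return sock.sendmsg(responses)  # kernel gathers all buffers in one call

# Demo over a local socket pair, with made-up ascii-protocol responses.
a, b = socket.socketpair()
responses = [b"VALUE key%d 0 1\r\nx\r\n" % i for i in range(3)]
sent = send_batched(a, responses)
received = b.recv(4096)
```

[Three responses go out in one system call instead of three; at millions
of keys per second that per-response syscall is the dominant cost being
described.]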

Re: Dynamic settings

2017-01-25 Thread 'Scott Mansfield' via memcached
Yes, our production traffic all uses binary protocol, even behind our
on-server proxy that we use. In fact, if you have a way to reduce syscalls
by batching responses, that would solve another huge pain we have that's of
our own doing.


*Scott Mansfield*

Product > Consumer Science Eng > EVCache > Sr. Software Eng
{
  M: 352-514-9452
  E: smansfi...@netflix.com
  K: {M: mobile, E: email, K: key}
}

Re: Dynamic settings

2017-01-25 Thread dormando
Okay, so it's the big rollup that gets delayed. Makes sense.

You're using binary protocol for everything? That's a major focus of my
performance annoyance right now, since every response packet is sent
individually. I should have that switched to an option at least pretty
soon, which should also help with the time it takes to service them.

I'll test both ascii and binprot + the reqs_per_event option to see how bad
this is measurably.


Re: Dynamic settings

2017-01-25 Thread 'Scott Mansfield' via memcached
The client is the EVCache client jar: https://github.com/netflix/evcache

When a user calls the batch get function on the client, it will spread
those batch gets out over many servers because it is hashing keys to
different servers. Imagine many of these batch gets happening at the same
time, though, and each server's queue will get a bunch of gets from a bunch
of different user-facing batch gets. It all gets intermixed. These
client-side read queues are rather large (1) and might end up sending a
batch of a few hundred keys at a time. These large batch gets are sent off
to the servers as "one" getq|getq|getq|getq|getq|getq|getq|getq|getq|getq|noop
package and read back in that order. We are reading the responses fairly
efficiently internally, but the batch get call that the user made is
waiting on the data from all of these separate servers to come back in
order to properly respond to the user in a synchronous manner.
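
[For reference, the memcached binary protocol frames each request with a
24-byte header, so the batch described above is just N quiet getq frames
followed by a noop that marks the end. A minimal Python sketch of
building such a blob -- illustrative, not the EVCache client's actual
code, and the key names are made up:]

```python
import struct

GETQ, NOOP = 0x09, 0x0a  # memcached binary-protocol opcodes
# 24-byte request header: magic, opcode, key length, extras length,
# data type, vbucket, total body length, opaque, cas
HEADER = struct.Struct(">BBHBBHIIQ")

def build_batch(keys):
    """Build one getq|getq|...|noop request blob. getq is "quiet": the
    server sends nothing for misses, and the trailing noop forces a
    response that marks the end of the whole batch."""
    out = bytearray()
    for opaque, key in enumerate(keys):
        k = key.encode()
        out += HEADER.pack(0x80, GETQ, len(k), 0, 0, 0, len(k), opaque, 0)
        out += k
    out += HEADER.pack(0x80, NOOP, 0, 0, 0, 0, 0, len(keys), 0)
    return bytes(out)

blob = build_batch(["user:1", "user:2", "user:3"])
```

[The whole blob is written to one server connection at once, which is
why a pause mid-batch on the server side stalls every user-facing call
still waiting on a getq in that pipeline.]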

Now on the memcached side, there's many servers all doing this same pattern
of many large batch gets. Memcached will stop responding to that connection
after 20 requests on the same event and go serve other connections. If that
happens, any user-facing batch call that is waiting on any getq command
still waiting to be serviced on that connection can be delayed. It doesn't
normally end up causing timeouts but it does at a low level.

Our timeouts for this app in particular are 5 seconds for a single
user-facing batch get call. This client app is fine with higher latency for
higher throughput.

At this point we have the reqs_per_event set to a rather high 300 and it
seems to have solved our problem. I don't think it's causing any more
consternation (for now), but having a dynamic setting would have lowered
the operational complexity of the tuning.


Re: Dynamic settings

2017-01-25 Thread dormando
I guess when I say dynamic I mostly mean runtime-settable. Dynamic is a
little harder so I tend to do those as a second pass.

You're saying your client had head-of-line blocking for unrelated
requests? I'm not 100% sure I follow.

Big multiget comes in, multiget gets processed slightly slower than normal
due to other clients making requests, so requests *behind* the multiget
time out, or the multiget itself?

How long is your timeout? :P

I'll take a look at it as well and see about raising the limit in `-o
modern` after some performance tests. The default is from 2006.

thanks!


Re: Dynamic settings

2017-01-25 Thread 'Scott Mansfield' via memcached
The reqs_per_event setting was causing a client that was doing large
batch-gets (of a few hundred keys) to see some timeouts. Since memcached
will delay responding fully until other connections are serviced and our
client will wait until the batch is done, we see some client-side timeouts
for the users of our client library. Our solution has been to up the
setting during startup, but just as a thought experiment I was asking if we
could have done it dynamically to avoid losing data. At the moment there's
quite a lot of machinery to change the setting (deploy, copy data over with
our cache warmer, flip traffic, tear down old boxes) and I would have
rather left everything as is and adjusted the setting on the fly until our
client's problem was resolved.

I'm interested in patching this specific setting to be settable, but having
it fully dynamic in nature is not something I'd want to tackle. There's a
natural tradeoff of latency for other connections / throughput for the one
that is currently being serviced. I'm not sure it's a good idea to
dynamically change that. It might cause unexpected behavior if one bad
client sends huge requests.
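
[The fairness knob in question can be sketched simply. This is a
hypothetical Python rendering of the pattern -- memcached's real loop is
drive_machine() in C, and the names below are made up -- where
re-reading the limit on each event is what would make it tunable at
runtime:]

```python
reqs_per_event = 20  # memcached's -R default; the thread raised it to 300

def drain(pending, handle):
    """Service at most reqs_per_event queued requests on one connection,
    then yield back to the event loop so other connections get a turn.
    Because the limit is re-read on every event, changing the global
    takes effect without a restart."""
    served = 0
    while pending and served < reqs_per_event:
        handle(pending.pop(0))
        served += 1
    return served  # caller re-arms the event if `pending` is non-empty

# A connection with a 50-request backlog needs three passes at R=20,
# which is the delay the large batch gets were running into.
backlog = list(range(50))
first_pass = drain(backlog, lambda req: None)
```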



On Tue, Jan 24, 2017 at 11:53 AM, dormando wrote:

> Hey,
>
> Would you mind explaining a bit how you determined the setting was causing
> an issue, and what the impact was? The default there is very old and might
> be worth a revisit (or some kind of auto-tuning) as well.
>
> I've been trending as much as possible to online configuration, inlcuding
> the actual memory limit.. You can turn the lru crawler on and off,
> automoving on and off, manually move slab pages, etc. I'm hoping to make
> the LRU algorithm itself modifyable at runtime.
>
> So yeah, I'd take a patch :)
>
> On Mon, 23 Jan 2017, 'Scott Mansfield' via memcached wrote:
>
> > There was a single setting my team was looking at today and wish we
> could have changed dynamically: the
> > reqs_per_event setting. Right now in order to change it we need to shut
> down the process and start it again
> > with a different -R parameter. I don't see a way to change many of the
> settings, though there are some that
> > are ad-hoc changeable through some stats commands. I was going to see if
> I could patch memcached to be able
> > to change the reqs_per_event setting at runtime, but before doing so I
> wanted to check to see if that's
> > something you'd be amenable to. I also didn't want to do something
> specifically for that setting if it was
> > going to be better to add it as a general feature.
> > I see some pros and cons:
> >
> > One easy pro is that you can tune things at runtime to recover
> > performance without losing all of your data. If client request patterns
> > change, the process can react.
> >
> > A con is that the startup parameters won't necessarily match what the
> process is doing, so they are no
> > longer going to be a useful way to determine the settings of memcached.
> Instead you would need to connect
> > and issue a stats settings command to read them. It also introduces
> change in places that may have
> > previously never seen it, e.g. the reqs_per_event setting is simply read
> at the beginning of the
> > drive_machine loop. It might need some kind of synchronization around it
> now instead. I don't think it
> > necessarily needs it on x86_64 but it might on other platforms which I
> am not familiar with.
> >

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Dynamic settings

2017-01-24 Thread dormando
Hey,

Would you mind explaining a bit how you determined the setting was causing
an issue, and what the impact was? The default there is very old and might
be worth a revisit (or some kind of auto-tuning) as well.

I've been trending as much as possible toward online configuration,
including the actual memory limit. You can turn the LRU crawler on and off,
automoving on and off, manually move slab pages, etc. I'm hoping to make
the LRU algorithm itself modifiable at runtime.

So yeah, I'd take a patch :)

On Mon, 23 Jan 2017, 'Scott Mansfield' via memcached wrote:

> There was a single setting my team was looking at today and wish we could 
> have changed dynamically: the
> reqs_per_event setting. Right now in order to change it we need to shut down 
> the process and start it again
> with a different -R parameter. I don't see a way to change many of the 
> settings, though there are some that
> are ad-hoc changeable through some stats commands. I was going to see if I 
> could patch memcached to be able
> to change the reqs_per_event setting at runtime, but before doing so I wanted 
> to check to see if that's
> something you'd be amenable to. I also didn't want to do something
> specifically for that setting if it was
> going to be better to add it as a general feature.
> I see some pros and cons:
>
> One easy pro is that you can tune things at runtime to recover
> performance without losing all of your data. If client request patterns
> change, the process can react.
>
> A con is that the startup parameters won't necessarily match what the process 
> is doing, so they are no
> longer going to be a useful way to determine the settings of memcached. 
> Instead you would need to connect
> and issue a stats settings command to read them. It also introduces change in 
> places that may have
> previously never seen it, e.g. the reqs_per_event setting is simply read at 
> the beginning of the
> drive_machine loop. It might need some kind of synchronization around it now 
> instead. I don't think it
> necessarily needs it on x86_64 but it might on other platforms which I am not 
> familiar with.
>
