Re: Set operations failing on Memcached 1.4.15

dormando Wed, 27 Mar 2013 16:11:33 -0700

There could be a brief spike when the hash_power_level is being increased
(it shuffles everything into a new hash table). That should stop after it
reaches a steady state.
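
For a rough sense of scale, the table sizes can be worked out from the stats quoted below. This is only a sketch: the helper names are made up for illustration, and the 1.5 items-per-bucket expansion trigger is an assumption based on my reading of assoc.c in the 1.4.x sources, so check it against the version you are running.

```python
# Sketch: relate hash_power_level to hash table size and the expansion trigger.
# Helper names are hypothetical; they are not part of memcached.

POINTER_SIZE = 8  # bytes per bucket pointer ("STAT pointer_size 64" = 64-bit build)


def hash_buckets(power: int) -> int:
    """Number of buckets at a given hash_power_level (2 ** power)."""
    return 1 << power


def hash_bytes(power: int, pointer_size: int = POINTER_SIZE) -> int:
    """Memory used by the bucket array alone, as reported in hash_bytes."""
    return hash_buckets(power) * pointer_size


def expansion_threshold(power: int) -> int:
    """Item count above which the table would grow to power + 1.

    ASSUMPTION: expansion starts once items exceed 1.5x the bucket count.
    """
    return hash_buckets(power) * 3 // 2


if __name__ == "__main__":
    curr_items = 61033194  # "STAT curr_items" from the quoted stats
    for power in (25, 26):
        print(power, hash_bytes(power), expansion_threshold(power),
              curr_items > expansion_threshold(power))
```

With the quoted numbers, hash_bytes(26) comes out to 536870912, which matches the reported hash_bytes. Under the assumed threshold, 61033194 items is already past the level-25 trigger and still well short of the level-26 one.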


Can you confirm a few things:

- What exactly are you seeing when a "set fails"? Is it timing out, or are
you getting some specific error back?
- The output of "stats items" and "stats slabs" would also be helpful.
- Does hash_power_level increase after every spike? Do the spikes stop
after a while?
- You can start an instance with -o hashpower=26 or something similar. If
you start an instance that way does it not spike?
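
One way to answer the hash_power_level question is to sample "stats" before and after a spike and compare. A minimal sketch, assuming the instance listens on 127.0.0.1:11211 and speaks the plain text protocol; the function names are hypothetical:

```python
import socket


def parse_stats(raw: str) -> dict:
    """Turn 'STAT name value' lines into a dict of strings."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host: str = "127.0.0.1", port: int = 11211,
                timeout: float = 2.0) -> dict:
    """Send 'stats' over the text protocol and read until the END marker."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", "replace"))


def hash_table_changed(before: dict, after: dict) -> bool:
    """True if the table grew between samples or is expanding right now."""
    grew = int(after["hash_power_level"]) > int(before["hash_power_level"])
    return grew or after.get("hash_is_expanding") == "1"
```

Call fetch_stats() on a timer (say once a minute), keep the previous sample, and log whenever hash_table_changed() returns True. If those log lines line up with the CPU spikes, the hash table expansion is the likely cause.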

Otherwise: no, there are no background threads that do anything like that.
slab automove is a background thread but you don't have it enabled and its
CPU usage is minimal.

On Wed, 27 Mar 2013, Nikhil Garg wrote:

> We run 1.4.2 on our cluster though it goes OOM every few days and hence are 
> trying to migrate to a newer build. Release notes of 1.4.15 specifically say 
> that it fixes some OOM cases, so we tried 1.4.15. We noted that around every 
> 40-60 mins, sets would fail in burst.
> On further digging we found that each such burst corresponds to a spike in 
> user cpu time and a spike in open connections. We found that even if a box 
> isn't serving any production traffic, user cpu still spikes at roughly same 
> frequency (though the spikes are much shorter). During one of the spikes, top
> showed that it was in fact the memcached process that was hogging CPU. This
> behavior wasn't observed on the previous binary. Some differences between the
> old binary and the new binary:
> * old binary used libevent 1.4.2 whereas new one uses libevent 2.0.16
> * old binary was running on Ubuntu 10.04 whereas new one is running on 12.04
>
> Some more details about new binary:
>
> stats
> STAT pid 2727
> STAT uptime 74268
> STAT time 1364419469
> STAT version 1.4.15
> STAT libevent 2.0.16-stable
> STAT pointer_size 64
> STAT rusage_user 16275.537157
> STAT rusage_system 20872.252434
> STAT curr_connections 33
> STAT total_connections 270918
> STAT connection_structures 3712
> STAT reserved_fds 20
> STAT cmd_get 2216135698
> STAT cmd_set 161257323
> STAT cmd_flush 0
> STAT cmd_touch 0
> STAT get_hits 1970822534
> STAT get_misses 245313164
> STAT delete_misses 2615317
> STAT delete_hits 4184410
> STAT incr_misses 366964
> STAT incr_hits 3804454
> STAT decr_misses 0
> STAT decr_hits 0
> STAT cas_misses 21315
> STAT cas_hits 81561392
> STAT cas_badval 1845457
> STAT touch_hits 0
> STAT touch_misses 0
> STAT auth_cmds 0
> STAT auth_errors 0
> STAT bytes_read 114764483242
> STAT bytes_written 396446726727
> STAT limit_maxbytes 29360128000
> STAT accepting_conns 1
> STAT listen_disabled_num 0
> STAT threads 4
> STAT conn_yields 0
> STAT hash_power_level 26
> STAT hash_bytes 536870912
> STAT hash_is_expanding 0
> STAT bytes 10653107460
> STAT curr_items 61033194
> STAT total_items 163080654
> STAT expired_unfetched 57
> STAT evicted_unfetched 0
> STAT evictions 0
> STAT reclaimed 72
> END
>
>
> stats settings
> STAT maxbytes 3590324224
> STAT maxconns 100000
> STAT tcpport 11211
> STAT udpport 11211
> STAT inter 0.0.0.0
> STAT verbosity 0
> STAT oldest 0
> STAT evictions on
> STAT domain_socket NULL
> STAT umask 700
> STAT growth_factor 1.25
> STAT chunk_size 48
> STAT num_threads 4
> STAT num_threads_per_udp 4
> STAT stat_key_prefix :
> STAT detail_enabled no
> STAT reqs_per_event 20
> STAT cas_enabled yes
> STAT tcp_backlog 1024
> STAT binding_protocol auto-negotiate
> STAT auth_enabled_sasl no
> STAT item_size_max 1048576
> STAT maxconns_fast no
> STAT hashpower_init 0
> STAT slab_reassign no
> STAT slab_automove 0
> END
>
> We get around 10K operations per second (get + multi get + set) per server.
>
> root@mc20:~# ps aux | grep mem | grep -v grep
> nobody    2727 49.4 38.5 13935812 13508096 ?   Ssl  00:46 619:36 
> /usr/bin/memcached -m 28000 -p 11211 -u nobody -l 0.0.0.0 -d -c 100000
>
> User cpu spikes every 40-60 minutes:
>
> [user_cpu.gif]
>
>
> Open connections seem to spike at same time:
>
> [open_connections.gif]
>
>
> User cpu graph for a non-production server at similar frequency but much 
> shorter spikes:
>
> [non_prod_user_cpu.gif]
>
>
> I obtained strace during one of the spikes though found nothing suspicious 
> about it. Can provide it, if it is helpful. I also have the output of ls -l 
> /proc/$(pidof memcached)/fd from a spike.
>
> Is there some background thread which does some heavy duty work every some 
> minutes?
>
> --
>  
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  
>
>
