There can be a brief CPU spike while hash_power_level is being increased, because memcached moves every item into a new, larger hash table. That should stop once the table reaches a steady-state size.
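A rough sketch of the sizing arithmetic behind that expansion, in Python. It assumes one 8-byte pointer per bucket (matching STAT pointer_size 64) and an expansion trigger of "items > 1.5 x buckets"; that trigger is my recollection of the 1.4.x source, not something stated in this thread, so treat it as an assumption. The function names are made up for illustration.

```python
# Sketch only: the constants below are assumptions, not values confirmed in this thread.
POINTER_SIZE = 8  # bytes per bucket pointer on a 64-bit build (assumed)


def buckets(hash_power_level: int) -> int:
    """Number of hash buckets at a given power level (2 ** level)."""
    return 1 << hash_power_level


def hash_bytes(hash_power_level: int) -> int:
    """Memory used by the bucket array itself."""
    return buckets(hash_power_level) * POINTER_SIZE


def expansion_threshold(hash_power_level: int) -> int:
    """Item count above which the table is assumed to grow by one level."""
    return buckets(hash_power_level) * 3 // 2


if __name__ == "__main__":
    for level in (25, 26, 27):
        print(level, hash_bytes(level), expansion_threshold(level))
```

Under these assumptions, level 26 gives 536870912 bytes, which agrees with the hash_bytes value in the stats quoted below, and the next expansion would not start until roughly 100 million items.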

Can you confirm a few things:

- What exactly are you seeing when a "set fails"? Is it timing out, or are
  you getting some specific error back?
- The output of stats items / stats slabs would also be helpful.
- Does hash_power_level increase after every spike? Do the spikes stop
  after a while?
- You can start an instance with -o hashpower=26 or something similar. If
  you start an instance that way, does it not spike?

Otherwise: no, there are no background threads that do anything like that.
slab automove is a background thread, but you don't have it enabled and its
CPU usage is minimal.

On Wed, 27 Mar 2013, Nikhil Garg wrote:

> We run 1.4.2 on our cluster though it goes OOM every few days and hence are
> trying to migrate to a newer build. Release notes of 1.4.15 specifically say
> that it fixes some OOM cases, so we tried 1.4.15. We noted that around every
> 40-60 mins, sets would fail in burst.
>
> On further digging we found that each such burst corresponds to a spike in
> user cpu time and a spike in open connections. We found that even if a box
> isn't serving any production traffic, user cpu still spikes at roughly same
> frequency (though spikes are much shorter). During one of the spikes, top
> showed that it was in fact memcached process which was hogging cpu. This
> behavior wasn't observed on previous binary.
>
> Some differences between old binary and new binary:
> * old binary used libevent 1.4.2 whereas new one uses libevent 2.0.16
> * old binary was running on Ubuntu 10.04 whereas new one is running on 12.04
>
> Some more details about new binary:
>
> stats
> STAT pid 2727
> STAT uptime 74268
> STAT time 1364419469
> STAT version 1.4.15
> STAT libevent 2.0.16-stable
> STAT pointer_size 64
> STAT rusage_user 16275.537157
> STAT rusage_system 20872.252434
> STAT curr_connections 33
> STAT total_connections 270918
> STAT connection_structures 3712
> STAT reserved_fds 20
> STAT cmd_get 2216135698
> STAT cmd_set 161257323
> STAT cmd_flush 0
> STAT cmd_touch 0
> STAT get_hits 1970822534
> STAT get_misses 245313164
> STAT delete_misses 2615317
> STAT delete_hits 4184410
> STAT incr_misses 366964
> STAT incr_hits 3804454
> STAT decr_misses 0
> STAT decr_hits 0
> STAT cas_misses 21315
> STAT cas_hits 81561392
> STAT cas_badval 1845457
> STAT touch_hits 0
> STAT touch_misses 0
> STAT auth_cmds 0
> STAT auth_errors 0
> STAT bytes_read 114764483242
> STAT bytes_written 396446726727
> STAT limit_maxbytes 29360128000
> STAT accepting_conns 1
> STAT listen_disabled_num 0
> STAT threads 4
> STAT conn_yields 0
> STAT hash_power_level 26
> STAT hash_bytes 536870912
> STAT hash_is_expanding 0
> STAT bytes 10653107460
> STAT curr_items 61033194
> STAT total_items 163080654
> STAT expired_unfetched 57
> STAT evicted_unfetched 0
> STAT evictions 0
> STAT reclaimed 72
> END
>
> stats settings
> STAT maxbytes 3590324224
> STAT maxconns 100000
> STAT tcpport 11211
> STAT udpport 11211
> STAT inter 0.0.0.0
> STAT verbosity 0
> STAT oldest 0
> STAT evictions on
> STAT domain_socket NULL
> STAT umask 700
> STAT growth_factor 1.25
> STAT chunk_size 48
> STAT num_threads 4
> STAT num_threads_per_udp 4
> STAT stat_key_prefix :
> STAT detail_enabled no
> STAT reqs_per_event 20
> STAT cas_enabled yes
> STAT tcp_backlog 1024
> STAT binding_protocol auto-negotiate
> STAT auth_enabled_sasl no
> STAT item_size_max 1048576
> STAT maxconns_fast no
> STAT hashpower_init 0
> STAT slab_reassign no
> STAT slab_automove 0
> END
>
> We get around 10K operations per second (get + multi get + set) per server.
>
> root@mc20:~# ps aux | grep mem | grep -v grep
> nobody 2727 49.4 38.5 13935812 13508096 ? Ssl 00:46 619:36
> /usr/bin/memcached -m 28000 -p 11211 -u nobody -l 0.0.0.0 -d -c 100000
>
> User cpu spikes every 40-60 minutes:
>
> [user_cpu.gif]
>
> Open connections seem to spike at same time:
>
> [open_connections.gif]
>
> User cpu graph for a non-production server at similar frequency but much
> shorter spikes:
>
> [non_prod_user_cpu.gif]
>
> I obtained strace during one of the spikes though found nothing suspicious
> about it. Can provide it, if it is helpful. I also have the output of ls -l
> /proc/$(pidof memcached)/fd from a spike.
>
> Is there some background thread which does some heavy duty work every few
> minutes?
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
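
One way to answer the hash_power_level question from the top of this reply is to sample `stats` before and after a spike and compare the two snapshots. The following is a minimal Python sketch, not an official tool: the helper names `fetch_stats`, `parse_stats` and `hash_grew` are made up for illustration, and the host and port defaults are assumptions based on the `-p 11211` in the ps output.

```python
import socket


def fetch_stats(host: str = "127.0.0.1", port: int = 11211, timeout: float = 2.0) -> str:
    """Send the text-protocol 'stats' command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        # The reply is a series of STAT lines terminated by END.
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", "replace")


def parse_stats(text: str) -> dict:
    """Turn 'STAT name value' lines into a {name: value} dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hash_grew(before: dict, after: dict) -> bool:
    """True if hash_power_level increased between two snapshots."""
    return int(after["hash_power_level"]) > int(before["hash_power_level"])
```

Usage would be along the lines of taking `before = parse_stats(fetch_stats())` ahead of an expected spike and `after = parse_stats(fetch_stats())` once it has passed, then checking `hash_grew(before, after)`. Looking at `hash_is_expanding` in a snapshot taken during the spike itself may also help.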