Hi,

While we were testing 1.4.16 for production use, 1.4.17 was released and we switched to that version, so there is no memcached-debug output for now.
However, we are still experiencing many crashes and hangs with server 1.4.15. I discovered an interesting fact about this issue. We have been collecting the server's stats output every minute, so for each incident we have a stats snapshot taken at most one minute before the crash or hang. In each and every case, hash_power_level was 23 (hash_power_level=23).

I also found this thread, which seems to describe the very same problem:
https://groups.google.com/forum/#!searchin/memcached/memcached$201.4.15$20high$20load/memcached/oiylwdukSvQ/ZtT9-24dHE0J

I believe the memcached server could not grow its hash table from power level 23 to 24 under load, and that this caused the CPU peaks and hangs.

On Sunday, 15 December 2013 at 04:05:30 UTC+2, Dormando wrote:
>
> Hi,
>
> If you're still seeing issues at all; can you try 1.4.16, and use the
> "memcached-debug" binary instead of the normal one? That should give more
> useful information on the crash.
>
> There should hardly be any lock contention at 5 threads unless you are
> running an older version... The latest code should go past 10 (but not too
> much higher than that).
>
> On Tue, 12 Nov 2013, Doruk Deniz Kutukculer wrote:
>
> > Hi everyone,
> >
> > It's been a while but I thought it might be useful for this discussion to update.
> >
> > So we realised that memcached performance decreases as the number of threads increases past 5.
> > Our theory is that lock contention among the threads is the reason we experienced the problem above. Therefore we changed the number of threads to 4.
> >
> > Which seems to be of some help.
> > After this change, however, we saw an "invalid slab class" error a couple of times. We don't know exactly what caused this problem, but it looks
> > related to memory allocation/eviction, and our servers are approx. 95% full.
> >
> > As a workaround (hopefully) we increased the max-memory size for the memcached process.
> >
> > Can you elaborate on the issue, please?
> > Does it make sense?
> >
> > On Wednesday, 25 September 2013 at 10:35:05 UTC+3, Doruk Deniz Kutukculer wrote:
> > Hi Roberto,
> > Thanks for the response.
> >
> > We checked the dmesg output and saw nothing about our problems. There are no problems logged about TCP in dmesg.
> >
> > On Tuesday, 24 September 2013 at 21:53:43 UTC+3, rspadim wrote:
> > any dmesg output about problems at same time? something about tcp?
> >
> > 2013/9/24 Doruk Deniz Kutukculer <ddkutu...@gmail.com>:
> > > I forgot to mention: There are also CPU peaks at the time of the incidents:
> > >
> > >>     time  %usr  %sys  %wio  %idle
> > >> 02:51:04     0     1     0     99
> > >> 02:53:01     0     1     0     99
> > >> 02:54:01     0     1     0     99
> > >> 02:55:04     1     1     0     99
> > >> 02:56:01     1     1     0     99
> > >> 02:58:01     0     1     0     99
> > >> 03:00:01     1     1     0     99
> > >> 03:02:01    11     1     0     88
> > >> 03:03:01    21     1     0     78
> > >> 03:05:01    30     1     0     70
> > >> 03:07:01    30     1     0     69
> > >> 03:09:00    30     1     0     69
> > >> 03:10:01    31     1     0     68
> > >> 03:11:01    32     1     0     67
> > >> 03:12:01    32     1     0     68
> > >> 03:14:01    33     1     0     66
> > >> 03:15:02    33     1     0     66
> > >> 03:16:03    33     1     0     66
> > >> 03:17:01    35     1     0     64
> > >> 03:18:01    35     1     0     64
> > >> 03:19:04    35     1     0     64
> > >> 03:20:01    36     1     0     63
> > >> 03:22:00    38     1     0     61
> > >> 03:23:01    38     1     0     61
> > >> 03:24:01    38     1     0     61
> > >> 03:26:01    39     1     0     61
> > >> 03:27:01    40     1     0     60
> > >> 03:29:01    40     1     0     59
> > >> 03:30:01    40     1     0     59
> > >> 03:31:01    40     1     0     59
> > >> 03:32:02    40     1     0     60
> > >> 03:33:01    40     1     0     59
> > >> 03:34:08    40     1     0     59
> > >> 03:35:01    40     1     0     60
> > >> 03:36:01    40     1     0     59
> > >> 03:38:01    40     1     0     59
> > >> 03:39:01    40     1     0     59
> > >> 03:40:01    40     1     0     59
> > >> 03:41:01    40     1     0     59
> > >> 03:42:01    40     1     0     59
> > >> 03:43:01    40     1     0     59
> > >> 03:44:01    40     1     0     59
> > >> 03:45:01    40     1     0     59
> > >> 03:47:01    40     1     0     59
> > >> 03:48:01    40     1     0     59
> > >> 03:50:02    40     1     0     59
> > >> 03:52:05    40     1     0     59
> > >> 03:53:05    40     1     0     60
> > >> 03:54:02    40     1     0     59
> > >> 03:55:01    40     1     0     59
> > >> 03:56:01    41     1     0     58
> > >> 03:57:01    41     1     0     58
> > >> 03:58:01    41     1     0     58
> > >> 03:59:01    41     1     0     58
> > >> 04:00:01    41     1     0     57
> > >> 04:01:01    42     2     0     56
> > >> 04:02:01    41     1     0     58
> > >> 04:04:01    41     1     0     58
> > >> 04:05:01    41     1     0     58
> > >> 04:06:01    41     1     0     58
> > >> 04:07:01    41     1     0     58
> > >> 04:08:01    41     1     0     58
> > >> 04:09:01    41     1     0     58
> > >> 04:10:01    41     1     0     57
> > >> 04:12:00    41     1     0     58
> > >> 04:13:01    41     1     0     58
> > >> 04:14:01    33     1     0     66
> > >> 04:15:04     1     1     0     98  => server restarted
> > >> 04:16:01     1     1     0     98
> > >> 04:17:07     1     1     0     99
> >
> > --
> > Roberto Spadim

--
---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
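P.S. For anyone who wants to run the same check on their own per-minute stats snapshots, here is a rough sketch in Python. It is not our actual collection script: the function names parse_stats and hash_table_summary and the sample snapshot (including its curr_items value) are made up for illustration. It assumes each snapshot is the raw text reply of the "stats" command, and that the hash table has 2^hash_power_level buckets, which is how I understand that stat.

```python
def parse_stats(raw: str) -> dict:
    """Parse the raw text reply of memcached's "stats" command into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        # Data lines look like "STAT <name> <value>"; "END" closes the reply.
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hash_table_summary(stats: dict) -> dict:
    """Derive hash table figures from one parsed stats snapshot.

    Assumption: the table has 2**hash_power_level buckets, so growing
    by one level doubles the bucket count.
    """
    level = int(stats["hash_power_level"])
    buckets = 1 << level
    items = int(stats["curr_items"])
    return {
        "hash_power_level": level,
        "buckets": buckets,
        "items_per_bucket": items / buckets,
        "next_level_buckets": buckets * 2,
    }


# Hypothetical snapshot: hash_power_level=23 matches what we saw before each
# incident, but the curr_items value is invented for this example.
SAMPLE = "STAT curr_items 12000000\r\nSTAT hash_power_level 23\r\nEND\r\n"

summary = hash_table_summary(parse_stats(SAMPLE))
print(summary)
```

If hash_power_level stays at 23 across snapshots while items_per_bucket keeps climbing, that would be consistent with the table failing to grow to level 24, although it does not prove it.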