Status: New Owner: ---- Labels: Type-Defect Priority-Medium
New issue 99 by thomchin: Memcached 1.4.2 server segmentation fault http://code.google.com/p/memcached/issues/detail?id=99 Hi guys, This was an issue I had with version 1.4.0 and submitted a report in the group message board. Dustin responded to me: "We've got a new release coming very soon (fixes over 1.4.0). This looks like the kind of thing that needs to be there." I mistook "that needs to be there" as "bug will be fixed in the new releases", but now I think he meant "submitted to the bug tracker". Cutting and pasting from the original posting (http://groups.google.com/group/memcached/browse_thread/thread/778dfe3968a7f74a) [ post 1 ] Hi guys, I have been trying to diagnose a memcached segmentation fault. It is running on a 32 bit Linux 2.6.20 build (Gentoo). The memcached server is being connected through the standard PHP (5.2.6 module to Apache 2) client. After a few reloads of the provided script, the server will crash (increasing the data size seems to accelerate it). The server is started as follows: /usr/bin/memcached -l localhost -vvv It will crash with the following dumped to the terminal: ... <30 new auto-negotiating client connection 30: going from conn_new_cmd to conn_waiting 30: going from conn_waiting to conn_read 30: going from conn_read to conn_parse_cmd 30: Client using the ascii protocol <30 set test 0 10 50526 30: going from conn_parse_cmd to conn_nread
FOUND KEY test 30 STORED
30: going from conn_nread to conn_write 30: going from conn_write to conn_new_cmd 30: going from conn_new_cmd to conn_waiting 30: going from conn_waiting to conn_read 30: going from conn_read to conn_closing <30 connection closed. <30 new auto-negotiating client connection 30: going from conn_new_cmd to conn_waiting 30: going from conn_waiting to conn_read 30: going from conn_read to conn_parse_cmd 30: Client using the ascii protocol <30 get test
FOUND KEY test 30 sending key test 30 END
30: going from conn_parse_cmd to conn_mwrite 30: going from conn_mwrite to conn_new_cmd 30: going from conn_new_cmd to conn_waiting 30: going from conn_waiting to conn_read 30: going from conn_read to conn_closing <30 connection closed. <30 new auto-negotiating client connection 30: going from conn_new_cmd to conn_waiting 30: going from conn_waiting to conn_read 30: going from conn_read to conn_parse_cmd 30: Client using the ascii protocol <30 set test 0 10 50526 30: going from conn_parse_cmd to conn_nread
FOUND KEY test 30 STORED
30: going from conn_nread to conn_write 30: going from conn_write to conn_new_cmd 30: going from conn_new_cmd to conn_waiting 30: going from conn_waiting to conn_read <31 new auto-negotiating client connection 31: going from conn_new_cmd to conn_waiting 31: going from conn_waiting to conn_read 31: going from conn_read to conn_parse_cmd 31: Client using the ascii protocol <31 get test
FOUND KEY test 31 sending key test 31 END
31: going from conn_parse_cmd to conn_mwrite Segmentation fault The PHP code used to produce this is: <?php function _memcache_test($k, $d) { echo "memcache_connect\n"; $mc = memcache_connect('localhost', 11211, 3); echo "memcache_set\n"; memcache_set($mc, $k, $d, 0, 10); echo "memcache_close\n"; memcache_close($mc); echo "memcache_connect\n"; $mc = memcache_connect('localhost', 11211, 3); echo "memcache_get\n"; $t = memcache_get($mc, $k); echo 'strlen(t): ', strlen($t), "\n"; echo "memcache_close\n"; memcache_close($mc); } echo '<pre>'; $sz = 50526; $s = str_pad('', $sz, '0'); echo 'strlen(s): ', strlen($s), "\n"; _memcache_test('test', $s); ?> Has anyone else seen anything like this? Thanks, thom [ post 2 ] Hi again, Okay, I think I may have found the problem. When sendmsg() (in transmit()) returns -1 (EAGAIN - write will block), the update_event() function encounters a segmentation fault when calling event_add(). This is probably because the previous call of event_base_set() passed in an invalid pointer. The code assumes that event.ev_base will contain a valid pointer to the event base assigned to the event by a previous call to either event_base_set() or event_add(). Maybe the behavior pertaining to this internal event member changed? Here is sample code compiled against libevent-1.4.12: #include <sys/types.h> #include <stdio.h> #include <event.h> static void event_handler(const int fd, const short which, void * arg) { } int main() { struct event_base *eb; struct event ev; eb = event_init(); ev.ev_base = NULL; printf("event_base: %p\n", (void *) eb); printf("event.ev_base: %p\n", (void *) ev.ev_base); event_set(&ev, 1, EV_WRITE | EV_PERSIST, event_handler, NULL); event_base_set(eb, &ev); printf("event.ev_base (event_base_set): %p\n", (void *) ev.ev_base); event_add(&ev, 0); printf("event.ev_base (event_add): %p\n", (void *) ev.ev_base); } It generates the following output: event_base: 0x804b008 event.ev_base: (nil) event.ev_base (event_base_set): (nil) event.ev_base (event_add): (nil) (note how the pointer value never changes) The problem seems to have been resolved after applying the following patch: *** memcached-1.4.0/memcached.c Thu Jul 9 13:16:24 2009 --- memcached-1.4.0-patch/memcached.c Wed Jul 29 12:04:04 2009 *************** *** 329,334 **** --- 329,336 ---- } MEMCACHED_CONN_CREATE(c); + c->event_base = base; + c->rbuf = c->wbuf = 0; c->ilist = 0; c->suffixlist = 0; *************** *** 3027,3033 **** static bool update_event(conn *c, const int new_flags) { assert(c != NULL); ! struct event_base *base = c->event.ev_base; if (c->ev_flags == new_flags) return true; if (event_del(&c->event) == -1) return false; --- 3029,3035 ---- static bool update_event(conn *c, const int new_flags) { assert(c != NULL); ! struct event_base *base = c->event_base; if (c->ev_flags == new_flags) return true; if (event_del(&c->event) == -1) return false; diff -crB memcached-1.4.0/memcached.h memcached-1.4.0-patch/ memcached.h *** memcached-1.4.0/memcached.h Thu Jul 9 13:16:24 2009 --- memcached-1.4.0-patch/memcached.h Wed Jul 29 12:04:36 2009 *************** *** 316,321 **** --- 316,322 ---- enum conn_states state; enum bin_substates substate; struct event event; + struct event_base *event_base; short ev_flags; short which; /** which events were just triggered */ Hope this thread helps someone else. Is this the proper channel to report a possible bug to the memcached team? If not, can someone point me to the right channel. Thanks, thom [ post 4 ] On Jul 29, 12:47 pm, Dustin <dsalli...@gmail.com> wrote:
I'm looking at this. I've never heard of an issue here, and I'm not entirely sure how to test for it proactively at the moment, but you seem to have done quite a good job of investigating it. Thanks a lot.
One really dumb way that I was able to reproduce the error every time (in testing for the possible crash cause) was to update_event(c, EV_WRITE | EV_PERSIST) in drive_machine() at the conn_waiting case (before the EV_READ). The current event flags in the connection data type are set to EV_READ | EV_PERSIST, so this call gets past the _redundant request_ check in update_event(). Thanks for looking at this and I am looking forward to hearing your assessment. thom --- Thank you for looking at this (again). I had to patch our build of 1.4.2 to stop it from periodically crashing. thom -- You received this message because you are listed in the owner or CC fields of this issue, or because you starred this issue. You may adjust your issue notification preferences at: http://code.google.com/hosting/settings