Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 99 by thomchin: Memcached 1.4.2 server segmentation fault
http://code.google.com/p/memcached/issues/detail?id=99

Hi guys,

This was an issue I had with version 1.4.0 and submitted a report in the
group message board.  Dustin responded to me:

"We've got a new release coming very soon (fixes over 1.4.0).  This looks
like the kind of thing that needs to be there."

I mistook "that needs to be there" as "bug will be fixed in the new
releases", but now I think he meant "submitted to the bug tracker".

Cutting and pasting from the original posting
(http://groups.google.com/group/memcached/browse_thread/thread/778dfe3968a7f74a)

[ post 1 ]

Hi guys,

I have been trying to diagnose a memcached segmentation fault.  It is
running on a 32 bit Linux 2.6.20 build (Gentoo).  The memcached server
is being connected through the standard PHP (5.2.6 module to Apache 2)
client.  After a few reloads of the provided script, the server will
crash (increasing the data size seems to accelerate it).

The server is started as follows:

/usr/bin/memcached -l localhost -vvv

It will crash with the following dumped to the terminal:

...
<30 new auto-negotiating client connection
30: going from conn_new_cmd to conn_waiting
30: going from conn_waiting to conn_read
30: going from conn_read to conn_parse_cmd
30: Client using the ascii protocol
<30 set test 0 10 50526
30: going from conn_parse_cmd to conn_nread
FOUND KEY test
30 STORED

30: going from conn_nread to conn_write
30: going from conn_write to conn_new_cmd
30: going from conn_new_cmd to conn_waiting
30: going from conn_waiting to conn_read
30: going from conn_read to conn_closing
<30 connection closed.
<30 new auto-negotiating client connection
30: going from conn_new_cmd to conn_waiting
30: going from conn_waiting to conn_read
30: going from conn_read to conn_parse_cmd
30: Client using the ascii protocol
<30 get test
FOUND KEY test
30 sending key test
30 END

30: going from conn_parse_cmd to conn_mwrite
30: going from conn_mwrite to conn_new_cmd
30: going from conn_new_cmd to conn_waiting
30: going from conn_waiting to conn_read
30: going from conn_read to conn_closing
<30 connection closed.
<30 new auto-negotiating client connection
30: going from conn_new_cmd to conn_waiting
30: going from conn_waiting to conn_read
30: going from conn_read to conn_parse_cmd
30: Client using the ascii protocol
<30 set test 0 10 50526
30: going from conn_parse_cmd to conn_nread
FOUND KEY test
30 STORED

30: going from conn_nread to conn_write
30: going from conn_write to conn_new_cmd
30: going from conn_new_cmd to conn_waiting
30: going from conn_waiting to conn_read
<31 new auto-negotiating client connection
31: going from conn_new_cmd to conn_waiting
31: going from conn_waiting to conn_read
31: going from conn_read to conn_parse_cmd
31: Client using the ascii protocol
<31 get test
FOUND KEY test
31 sending key test
31 END

31: going from conn_parse_cmd to conn_mwrite
Segmentation fault

The PHP code used to produce this is:

<?php
function _memcache_test($k, $d)
{
        echo "memcache_connect\n";
        $mc = memcache_connect('localhost', 11211, 3);

        echo "memcache_set\n";
        memcache_set($mc, $k, $d, 0, 10);

        echo "memcache_close\n";
        memcache_close($mc);

        echo "memcache_connect\n";
        $mc = memcache_connect('localhost', 11211, 3);

        echo "memcache_get\n";
        $t = memcache_get($mc, $k);

        echo 'strlen(t): ', strlen($t), "\n";

        echo "memcache_close\n";
        memcache_close($mc);

}

echo '<pre>';

$sz = 50526;

$s = str_pad('', $sz, '0');

echo 'strlen(s): ', strlen($s), "\n";

_memcache_test('test', $s);
?>

Has anyone else seen anything like this?

Thanks,
thom

[ post 2 ]

Hi again,

Okay, I think I may have found the problem.  When sendmsg() (in
transmit()) returns -1 (EAGAIN - write will block), the update_event()
function encounters a segmentation fault when calling event_add().
This is probably because the previous call of event_base_set() passed
in an invalid pointer.  The code assumes that event.ev_base will
contain a valid pointer to the event base assigned to the event by a
previous call to either event_base_set() or event_add().  Maybe the
behavior pertaining to this internal event member changed?  Here is
sample code compiled against libevent-1.4.12:

#include <sys/types.h>
#include <stdio.h>
#include <event.h>

static void event_handler(const int fd, const short which, void * arg)
{ }

int
main()
{
        struct event_base               *eb;
        struct event                    ev;

        eb = event_init();

        ev.ev_base = NULL;

        printf("event_base: %p\n", (void *) eb);
        printf("event.ev_base: %p\n", (void *) ev.ev_base);

        event_set(&ev, 1, EV_WRITE | EV_PERSIST, event_handler, NULL);

        event_base_set(eb, &ev);
        printf("event.ev_base (event_base_set): %p\n", (void *)
ev.ev_base);

        event_add(&ev, 0);
        printf("event.ev_base (event_add): %p\n", (void *)
ev.ev_base);

}

It generates the following output:

event_base: 0x804b008
event.ev_base: (nil)
event.ev_base (event_base_set): (nil)
event.ev_base (event_add): (nil)

(note how the pointer value never changes)

The problem seems to have been resolved after applying the following
patch:

*** memcached-1.4.0/memcached.c Thu Jul  9 13:16:24 2009
--- memcached-1.4.0-patch/memcached.c   Wed Jul 29 12:04:04 2009
***************
*** 329,334 ****
--- 329,336 ----
          }
          MEMCACHED_CONN_CREATE(c);

+         c->event_base = base;
+
          c->rbuf = c->wbuf = 0;
          c->ilist = 0;
          c->suffixlist = 0;
***************
*** 3027,3033 ****
  static bool update_event(conn *c, const int new_flags) {
      assert(c != NULL);

!     struct event_base *base = c->event.ev_base;
      if (c->ev_flags == new_flags)
          return true;
      if (event_del(&c->event) == -1) return false;
--- 3029,3035 ----
  static bool update_event(conn *c, const int new_flags) {
      assert(c != NULL);

!     struct event_base *base = c->event_base;
      if (c->ev_flags == new_flags)
          return true;
      if (event_del(&c->event) == -1) return false;
diff -crB memcached-1.4.0/memcached.h memcached-1.4.0-patch/
memcached.h
*** memcached-1.4.0/memcached.h Thu Jul  9 13:16:24 2009
--- memcached-1.4.0-patch/memcached.h   Wed Jul 29 12:04:36 2009
***************
*** 316,321 ****
--- 316,322 ----
      enum conn_states  state;
      enum bin_substates substate;
      struct event event;
+     struct event_base *event_base;
      short  ev_flags;
      short  which;   /** which events were just triggered */

Hope this thread helps someone else.  Is this the proper channel to
report a possible bug to the memcached team?  If not, can someone
point me to the right channel.

Thanks,
thom

[ post 4 ]

On Jul 29, 12:47 pm, Dustin <dsalli...@gmail.com> wrote:

   I'm looking at this.  I've never heard of an issue here, and I'm not
entirely sure how to test for it proactively at the moment, but you
seem to have done quite a good job of investigating it.  Thanks a lot.

One really dumb way that I was able to reproduce the error every time
(in testing for the possible crash cause) was to update_event(c,
EV_WRITE | EV_PERSIST) in drive_machine() at the conn_waiting case
(before the EV_READ).  The current event flags in the connection data
type are set to EV_READ | EV_PERSIST, so this call gets past the
_redundant request_ check in update_event().

Thanks for looking at this and I am looking forward to hearing your
assessment.

thom

---

Thank you for looking at this (again).  I had to patch our build of 1.4.2
to stop it from periodically crashing.

thom

--
You received this message because you are listed in the owner
or CC fields of this issue, or because you starred this issue.
You may adjust your issue notification preferences at:
http://code.google.com/hosting/settings

Reply via email to