To the memcached community,

I have found a race condition bug in memcached; here are the details and
a patch that fixes the problem.  We would love to hear what this
community thinks of the patch.

Under normal circumstances, each worker thread accesses only its own
event_base, and the same goes for the main thread.  We have found a
specific situation in which a worker thread accesses the main thread's
event_base (main_base), with unexpected results.
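
As a minimal sketch of that intended model (illustrative code, not taken
from the memcached source; the only memcached name used is main_base),
each thread owns exactly one libevent base and never touches another
thread's base:

/* one libevent base per thread, never shared */
#include <event.h>
#include <pthread.h>

static struct event_base *main_base;          /* owned by the main thread */

static void *worker(void *arg) {
    struct event_base *me = event_base_new(); /* owned by this worker only */
    (void)arg;
    /* ... register this worker's connection events on `me` only ... */
    event_base_loop(me, 0);
    return NULL;
}

int main(void) {
    pthread_t tid;
    main_base = event_base_new();
    pthread_create(&tid, NULL, worker, NULL);
    /* ... register the listening sockets on main_base only ... */
    event_base_loop(main_base, 0);            /* only main uses main_base */
    return 0;
}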


[Example Scenario]
 1. throw a lot of clients (well over the connection limit) at memcached
 2. memcached's file descriptor usage reaches the configured maximum
 3. main thread calls accept_new_conns(false) to stop polling sfd
 4. main thread's event_base_loop stops accepting incoming requests
 5. main thread stops accessing main_base at this point
 6. a client disconnects
 7. worker thread calls accept_new_conns(true) to start polling sfd again
 8. accept_new_conns takes a mutex to protect against races on main_base
 9. worker thread starts looping over listen_conn
10. worker thread calls update_event() with the first conn
11. after the first update_event(), the main thread starts polling sfd
    and starts accessing main_base again <- PROBLEM
12. worker thread continues and calls update_event() with the second conn

At this point, the worker thread and the main thread are both accessing
and modifying main_base.

Because of this race, event_count is incorrectly set to zero while an
actual event is still waiting.  The result?  memcached falls straight
through event_base_loop() and quietly shuts the daemon down.
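
Here is the hazardous pattern distilled into a standalone example (this
is not memcached code, it only mimics the situation above): one thread
sits in event_base_loop() on a base while another thread keeps re-arming
an event on that same base, which is effectively what the worker does to
main_base via update_event().  The libevent versions memcached uses do
not make concurrent modification of one base safe, so the base's internal
bookkeeping -- including its event count -- can be corrupted, and the
exact symptom is timing dependent.

/* Standalone illustration of the race, NOT memcached code: the "worker"
 * keeps deleting/re-adding an event on the base that "main" is looping
 * on, just as update_event() re-arms the listen conns on main_base. */
#include <event.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>

static struct event_base *shared_base;  /* plays the role of main_base */
static struct event       listen_ev;    /* plays the role of a listen conn's event */
static int                fds[2];

static void on_read(int fd, short which, void *arg) {
    char buf[1];
    (void)which; (void)arg;
    (void)read(fd, buf, sizeof(buf));
}

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        /* both calls modify shared_base's internal state while the main
         * thread may be inside event_base_loop() on the same base */
        event_del(&listen_ev);
        event_add(&listen_ev, NULL);
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    if (pipe(fds) != 0) return 1;
    shared_base = event_base_new();
    event_set(&listen_ev, fds[0], EV_READ | EV_PERSIST, on_read, NULL);
    event_base_set(shared_base, &listen_ev);
    event_add(&listen_ev, NULL);

    pthread_create(&tid, NULL, worker, NULL);

    /* Like memcached's main thread: just sit in the loop.  If the race
     * corrupts the event count, the loop decides nothing is pending and
     * returns -- the process then exits silently, with no error logged. */
    event_base_loop(shared_base, 0);
    fprintf(stderr, "event_base_loop returned -- daemon would exit here\n");
    return 0;
}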


[Quick Fix]
Configure memcached to listen on a single interface only.

example memcached setting:
> memcached -U 0 -u nobody -p 11222 -t 4 -m 16000 -C -c 1000 -l 192.168.0.1 -v


[Reproducing]
Use the attack script (thanks to mala): http://gist.github.com/522741

With the -l interface restriction: we have seen over 70 hours of stability.
 - yes, you will see "Too many open connections." but that is not the
   issue here

Without the -l interface restriction: memcached quits under the attack script.


Please give us some feedback on the attached patch.  It should fix the
race condition we have experienced.
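
For discussion only -- the sketch below is NOT the attached patch, just
one generic way to keep main_base single-threaded: worker threads never
touch main_base themselves; instead they write a byte to a pipe, and a
handler that runs on the main thread (inside main_base's loop)
re-enables the listening sockets there.

/* Illustration of the single-owner idea; not the attached patch. */
#include <event.h>
#include <unistd.h>

static struct event_base *main_base;
static struct event       wakeup_ev;
static int                wakeup_pipe[2];

/* Runs on the main thread, so it is safe to touch main_base here,
 * e.g. by re-arming the listen events (the do_accept == true path). */
static void wakeup_handler(int fd, short which, void *arg) {
    char buf[1];
    (void)which; (void)arg;
    (void)read(fd, buf, sizeof(buf));
    /* ... re-enable the listening sockets on main_base ... */
}

/* Called from a worker thread instead of touching main_base directly. */
void request_accept_new_conns(void) {
    (void)write(wakeup_pipe[1], "a", 1);
}

int main(void) {
    if (pipe(wakeup_pipe) != 0) return 1;
    main_base = event_base_new();
    event_set(&wakeup_ev, wakeup_pipe[0], EV_READ | EV_PERSIST,
              wakeup_handler, NULL);
    event_base_set(main_base, &wakeup_ev);
    event_add(&wakeup_ev, NULL);
    event_base_loop(main_base, 0);           /* only main touches main_base */
    return 0;
}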

Lastly, we would like to thank the numerous contributors on Twitter who
helped us nail down this problem.  http://togetter.com/li/41702

Shigeki
