Hey memcached folks,

We've encountered an odd production issue with the Java memcached client. It is occurring in a memcached cluster of 30 nodes running server version 1.2.1, with Java client version 1.5.1. I realize neither of these is the latest and greatest version of the respective software, but so far I see nothing in the change logs to indicate that upgrading would solve our problem.

The issue is that when one or more server nodes go down and later come back up, dead connections seem to persist in the client connection pools indefinitely. We know this because the memcached client's SockIOPool class logs the following error on some percentage of incoming requests (where "foo" is just the output of the socket's toString method and "bar" is the host):

++++ socket in avail pool is not connected: foo for host: bar

The errors seem to decrease very slowly over time, but many of them persist after 24 hours, and the only remedy is to restart the application JVMs running the memcached client. Note that while this error is occurring, the memcached node that died and then restarted seems to be receiving a relatively normal volume of traffic.
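To make our expectation concrete, here is a minimal sketch of the behavior we assumed a pool would have: a dead connection found in the available pool is discarded at checkout, so each dead socket can produce the error at most once. This is only an illustration. The names StaleConnectionSketch, Conn, and HostPool are hypothetical and are not the client's actual classes, and we have not confirmed how SockIOPool really handles this case.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StaleConnectionSketch {

    /** Hypothetical stand-in for a pooled socket; not the client's own socket class. */
    interface Conn {
        boolean isConnected();
    }

    /** Hypothetical pool of available connections for a single host. */
    static final class HostPool {
        private final Deque<Conn> avail = new ArrayDeque<>();

        void add(Conn c) {
            avail.addLast(c);
        }

        int size() {
            return avail.size();
        }

        /**
         * Returns the first live connection, or null if none is left.
         * Dead connections met along the way are dropped, never put back.
         */
        Conn checkout() {
            Conn c;
            while ((c = avail.pollFirst()) != null) {
                if (c.isConnected()) {
                    return c;
                }
                // Dead connection: it has already been removed from the pool by pollFirst().
            }
            return null;
        }
    }

    public static void main(String[] args) {
        HostPool pool = new HostPool();
        pool.add(() -> false);   // connection left over from before the node died
        pool.add(() -> false);   // another dead one
        Conn live = () -> true;  // connection opened after the node came back
        pool.add(live);

        Conn got = pool.checkout();
        System.out.println("got live connection: " + (got == live));
        System.out.println("remaining in pool: " + pool.size());
    }
}
```

With a pool that behaves this way, the number of errors would be bounded by the number of dead sockets left behind (at most maxConn per host per client), which is why errors that persist for more than 24 hours surprise us.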

These are the settings we currently use on SockIOPool:

pool.setMinConn(5);
pool.setMaxConn(50);
pool.setInitConn(5);
pool.setSocketTO(1000);
pool.setSocketConnectTO(100);
pool.setFailover(false);
pool.setFailback(true);
pool.setAliveCheck(false);

Any help would be greatly appreciated.

Eli Bingham
Senior Engineer
Pandora Media, Inc.
