> You are right, that the algorithm is somehow broken :) but you are
> missing a point here that makes things even worse:
> 1) Imagine you have cache servers A, B and C
> 2) C goes down
> 3) keys are remapped to A and B
> 4) C comes up
> and here comes the mess :)
> * since you have assigned some keys (k1, k2) from C to A and B, A and
> B might now have the new versions of k1 and k2. Imagine the cache on C
> is still there (it was probably a network or firewall issue), now C is
> serving the old keys k1 and k2 ;)
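
To make the quoted scenario concrete, here is a minimal sketch in Python. It is only a toy model, not the code of any real client: each cache server is a plain dict, and the hypothetical pick() helper chooses a server by taking the key hash modulo the number of servers the client believes are alive. That remapping rule is an assumption made for illustration.

```python
import hashlib

def pick(key, servers):
    """Return the server that `key` maps to among `servers` (toy rule: hash modulo)."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

# Each cache server is modelled as an in-memory dict.
caches = {"A": {}, "B": {}, "C": {}}
all_servers = ["A", "B", "C"]

# Find a key that lives on C while all three servers are up.
key = next(f"k{i}" for i in range(1000) if pick(f"k{i}", all_servers) == "C")

# 1) Everything is up: the value is written to C.
caches[pick(key, all_servers)][key] = "old"

# 2-3) C becomes unreachable but keeps its memory; the key is remapped
#      to A or B and a newer value is written there.
alive = ["A", "B"]
fallback = pick(key, alive)
caches[fallback][key] = "new"

# 4) C comes back with its cache intact, so the key maps to C again
#    and the client reads the old value.
stale_read = caches[pick(key, all_servers)][key]
print(key, "was updated on", fallback, "but C still answers with", repr(stale_read))
```

The newer value is still sitting on the fallback server, but no client will ask for it there once C is back in the list.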

Yes, I know; I have already raised this on the list. With network problems this situation can happen often. Some intermediate layer that keeps the caches consistent could help us, something like mcproxy?

> * I have not read the source you are referring to, but if it does
> stuff like you describe, namely remap keys from A to B and vice versa,
> the mess is bigger. When C went down, you have remapped k3 from A to
> B, so B might now have a newer version of k3. When C comes up, you
> remap again k3 to A, so A will still serve the old version of k3.
>
> Messy messy, most of the clients are broken in this way: they remap
> keys upon server down and don't care that you might serve old stuff
> after server up + remap again.

Last night I spent some time working with downed_server_test.sh, a shell script that shows how many keys are remapped when one server goes down and when one server comes back up (be careful, it has some errors ;)). My other conclusion is that the best, though not perfect, architecture with consistent hashing and memcached is to address the servers by host name, keep one extra "neutral" host as a backup, and use monit and DNS for automatic failover. The failover is automatic only when a host goes down; coming back to the status quo is done by hand, so that the cache memory can be cleared first. Like this:

Hosts in your ketama or other consistent hashing library:

server1.mycompany.com:1121 200
server2.mycompany.com:1121 200
server3.mycompany.com:1121 200

server1, server2 and server3 are mapped to 10.0.0.1, 10.0.0.2 and 10.0.0.3. There is one extra server, 10.0.0.4.

When any server goes down, say 10.0.0.3, monit automatically changes the DNS entry for server3.mycompany.com to 10.0.0.4. This is fundamental to avoid rehashing under a different name!!!

To come back to the status quo you first need to restart the memcached server at 10.0.0.3 to clear its cache, and then change the DNS back by hand ;)

Another solution would be to use some tool that prevents cache inconsistency, but I don't know of any that has been implemented yet.

--
--pau
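
P.S. A rough sketch in Python of why keeping the host names stable matters. This is not ketama itself: the Ring class, the md5-based hash and the 100 points per server are assumptions made for illustration, and the weight column (200) from the host list is ignored. The ring is built from the names only, so dropping a name moves keys, while pointing a name at another IP does not.

```python
import bisect
import hashlib

def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    """Toy consistent-hash ring keyed by server *name*, not by IP."""

    def __init__(self, names, points_per_server=100):
        self._points = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in names
            for i in range(points_per_server)
        )
        self._hashes = [h for h, _ in self._points]

    def lookup(self, key):
        # First point clockwise from the key's position, wrapping around.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]

names = [
    "server1.mycompany.com:1121",
    "server2.mycompany.com:1121",
    "server3.mycompany.com:1121",
]
keys = [f"key{i}" for i in range(10000)]
full = Ring(names)

# Client-side failover: server3 is removed from the ring, so its keys move.
without_3 = Ring(names[:2])
owned_by_3 = sum(1 for k in keys if full.lookup(k) == names[2])
moved_on_drop = sum(1 for k in keys if full.lookup(k) != without_3.lookup(k))

# DNS failover: the name list is unchanged, only the address behind
# server3 changes, so the ring (and every key's owner) stays the same.
dns = {names[0]: "10.0.0.1", names[1]: "10.0.0.2", names[2]: "10.0.0.3"}
dns[names[2]] = "10.0.0.4"  # what monit would do when 10.0.0.3 goes down
after_dns = Ring(names)
moved_with_dns = sum(1 for k in keys if full.lookup(k) != after_dns.lookup(k))

print("keys moved when server3 is dropped:", moved_on_drop)
print("keys moved with DNS failover:", moved_with_dns)
```

With the DNS approach the keys of server3 simply land on the empty backup at 10.0.0.4, so clients see cache misses there instead of old values from another server.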
