> @Les, you make a clear and concise point. thnx. > > In this thread, i'm really keen on exploring a theoretical possibility (that > could become very practical for very large installations): > > -- at what node count (for a given pool) may/could we start to experience > problems related to performance (server, network or even client) > assuming a near perfect hardware/network set-up?
I think the really basic theoretical response is: - If your request will easily fit in the TCP send buffer and immediately transfer out the network card, it's best if it hits a single server. - If your requests are large, you can get lower latency responses by not waiting on the TCP socket. - Then there's some fiddling in the middle. - Each time a client runs "send" that's a syscall, so more do suck, but keep in mind the above tradeoff: A little system cpu time vs waiting for TCP Ack's. In reality it doesn't tend to matter that much. The point of my response to the facebook "multiget hole" is that you can tell clients to group keys to specific or subsets of servers, (like all keys related to a particular user), so you can have a massive pool and still generally avoid contacting all of them on every request. > -- if a memcacached client were to pool say, 2,000 or 20,000 connections > (again, theoretical but not entirely impractical given the rate of > internet growth), wud that not inject enough overhead -- connection or > otherwise -- on the client side to, say, warrant a direct fetch from the > database? in such a case, we wud have established a *theoretical* maximum > number nodes in a pool for that given client in near perfect conditions. The theory depends on your setup, of course: - Accessing the server hash takes no time (it's a hash), calculating it is the time consuming one. We've seen clients misbehave and seriously slow things down by recalculating a consistent hash on every request. So long as you're internally caching the continuum the lookups are free. - Established TCP sockets mostly just waste RAM, but don't generally slow things down. So for a client server, you can calculate the # of memcached instances * number of apache procs or whatever * the amount of memory overhead per TCP socket compared to the amount of RAM in the box and there's your limit. If you're using persistent connections. - If you decide to not use persistent connections, and design your application so satisfying a page read would hit at *most* something like 3 memcached instances, you can go much higher. Tune the servers for TIME_WAIT reuse, higher local ports, etc, which deals with the TCP churn. Connections are established on first use, then reused until the end of the request, so the TCP SYN/ACK cycle for 1-3 (or even more) instances won't add up to much. Pretending you can have an infinite number of servers on the same L2 segment you would likely be limited purely by bandwidth, or the amount of memory required to load the consistent hash for clients. Probably tens of thousands. - Or use UDP, if your data is tiny and you tune the fuck out of it. Typically it doesn't seem to be much faster, but I may get a boost out of it with some new linux syscalls. - Or (Matt/Dustin correct me if I'm wrong) you use a client design like spymemcached. The memcached binary protocol can actually allow many client instances to use the same server connections. Each client stacks commands in the TCP sockets like a queue (you could even theoretically add a few more connections if you block too long waiting for space), then they get responses routed to them off the same socket. This means you can use persistent connections, and generally have one socket per server instance for an entire app server. Many thousands should scale okay. - Remember Moore's law does grow computers very quickly. Maybe not as fast as the internet, but ten years ago you would have 100 megabit 2G ram memcached instances and need an awful lot of them. Right now 10ge is dropping in price, 100G+ RAM servers are more affordable, and the industry is already looking toward higher network rates. So as your company grows, you get opportunities to cut the instance count every few years. > -- also, i wud think the hashing algo wud deteriorate after a given > number of nodes.. admittedly, this number could be very large indeed and > also, i know this is unlikely in probably 99.999% of cases but it wud be > great to factor in the maths behind science. I sorta answered this above. Should put this into a wiki page I guess... -Dormando