Hi Joel,

On Mon, Feb 07, 2011 at 01:10:53PM -0800, Joel Krauska wrote:
> On 2/7/11 1:01 PM, Willy Tarreau wrote:
> >On Mon, Feb 07, 2011 at 09:45:21AM +0100, Bedis 9 wrote:
> >>>Do you have an example of what purpose it would serve ? I'm asking
> >>>because it's not very easy to implement with table-based algorithms,
> >>>since the size of the table is determined by the GCD of all active
> >>>servers' weights. Thus adding a new server will change the size of
> >>>the table.
> >>>
> >>>It's also a feature I've never seen on other products either, which
> >>>makes me doubt its usefulness.
> >>
> >>Hey Willy,
> >>
> >>It's really useful in big organizations, when you have to manage tens
> >>and tens of LBs and you want to ease the management.
> >>An API available remotely allows you to write your own centralized
> >>management tools:
> >>- configuration backup
> >>- configuration edit and push
> >>- collection of statistics
> >>etc...
> >
> >I'm well aware of the usefulness of the API; I was meaning that switching
> >a server's role between active and backup did not seem useful to me ;-)
>
> What was your original intent of the "Backup Server" feature?
The backup server is the server that must be used when everything else
dies. In two-server setups, it sometimes happens that an application does
not support load balancing at all and requires a single active/backup
cluster, so backup servers are perfect for this. Another common usage is
sorry servers, which serve excuse pages when a site is down. Sometimes it
even happens that the backup servers are the old servers being upgraded,
left in place for a short validation period (eg: one week) in case
something bad happens to the new servers (a bug in a network driver
causing them to panic twice a day, ...). That's why in this usage it does
not make much sense to switch them.

> Our organization uses backup servers to assist with new code rollout and
> easy rollback. (I'm not fond of it, just looking to automate it with APIs)

I see what you mean, two of my customers are doing the same.

> example use case:
>
> Four Servers A,B,C,D
> All Running The Same Code Rev
> A & B are primary
> C & D are in backup
>
> The Upgrade:
> C & D upgrade to new code rev.
>
> The Flip: (as quickly as possible to try to stay atomic)
> C & D taken out of Backup State
> A & B put in to Backup State
>
> Sanity Checking Phase:
> Make sure new live roll is performing as expected.
> Hold on to older rev A & B boxes until you feel C & D are solid.
> (typically 15-30 minutes)
> If the new push is terrible, you can revert "The Flip" above.
>
> Steady State:
> Upgrade A & B to match C & D's code revs.
> Now A & B can be thought of as emergency backup for if/when C/D fall over.
>
> Lather, rinse, repeat (swapping AB/CD above)....
>
> Does the above use case make sense to you?

Yes, except that in my experience, once the code has been switched to C/D,
A/B will never be put back in production, because the application is
backwards compatible but not forwards compatible, or simply because it's
not acceptable to present the old site to visitors after the new version
has been advertised and published.
What I'm seeing in the field is a variant of this. A/B keep their nominal
weights while C/D get a weight of zero, which means they will never be
used. C/D are installed with the new code, then tested using cookies
and/or force-persist. Once the application seems to run OK on C/D, they're
either slowly added to the farm (if the application supports it) or
switched in at once simply by swapping the weights, either in the config
or on the stats CLI. This does not break existing sessions and leaves much
more flexibility for the switchover. There is also no risk that the
undesired servers get accidentally used if some hiccup happens on the new
servers.

> Known limitations:
>
> Ideally it would be much easier to downgrade a single system's code.
> (working on that)
>
> Also this clearly doesn't scale well to larger deployments -- 50 active
> and 50 standby. (working on that too.)

The principle of switching weights allows that too. I know some people who
run the new version on one server with a small weight for some time before
the main switch. You can't do that with backup servers. Sometimes they can
even test the application's robustness by exaggerating the weight on one
server and measuring the effects. I must say I really like this way of
doing it, because it's as if they had a mixing table full of
potentiometers while others just have on/off switches. Seeing that in
action is impressive when you deal with hundreds of servers in a single
instance!

Willy
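
PS: for reference, the zero-weight variant described above could be
sketched roughly like this. The backend name, addresses, cookie values and
the QA ACL are hypothetical, and the exact keywords should be checked
against the configuration manual for your HAProxy version:

```
backend app
    balance roundrobin
    cookie SRV insert indirect nocache
    # A/B carry the traffic; C/D run the new code but have weight 0,
    # so they never receive normal traffic.
    server srvA 192.168.0.11:80 cookie a weight 100 check
    server srvB 192.168.0.12:80 cookie b weight 100 check
    server srvC 192.168.0.13:80 cookie c weight 0 check
    server srvD 192.168.0.14:80 cookie d weight 0 check
    # Let testers who present a persistence cookie for C/D reach
    # them anyway, for validation before the switch.
    acl qa_team src 10.0.0.0/8
    force-persist if qa_team

# The switchover itself, on the stats socket (paths/weights are
# examples; "set weight" also accepts a percentage of the
# configured weight):
#   echo "set weight app/srvC 100" | socat stdio /var/run/haproxy.stat
#   echo "set weight app/srvD 100" | socat stdio /var/run/haproxy.stat
#   echo "set weight app/srvA 0"   | socat stdio /var/run/haproxy.stat
#   echo "set weight app/srvB 0"   | socat stdio /var/run/haproxy.stat
```

Reverting the flip is just the same four commands with the weights
exchanged, and existing sessions are not broken in either direction.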