Hi Joel,

On Mon, Feb 07, 2011 at 01:10:53PM -0800, Joel Krauska wrote:
> On 2/7/11 1:01 PM, Willy Tarreau wrote:
> >On Mon, Feb 07, 2011 at 09:45:21AM +0100, Bedis 9 wrote:
> >>>Do you have an example of what purpose it would serve ? I'm asking
> >>>because it's not very easy to implement with table-based algorithms,
> >>>since the size of the table is determined by the GCD of all active
> >>>servers' weights. Thus adding a new server will change the size of
> >>>the table.
> >>>
> >>>It's also a feature I've never seen on other products either, which
> >>>makes me doubt its usefulness.
> >>
> >>
> >>Hey Willy,
> >>
> >>It's really useful in big organizations, when you have to manage tens
> >>and tens of LBs and you want to ease the management.
> >>An API available remotely allows you to write your own centralized
> >>management tools:
> >>- configuration backup
> >>- configuration edit and push
> >>- collection of statistics
> >>etc...
> >
> >I'm well aware of the usefulness of the API, I was meaning that switching
> >a server's role between active and backup did not seem useful to me ;-)
> 
> 
> What was your original intent of the "Backup Server" feature?

The backup server is the server that must be used when everything else
dies. In two-server setups, it often happens that an application does
not support load balancing at all and requires a single active/backup
cluster, so this feature is a perfect fit. Another common usage is
"sorry" servers that serve excuse pages when a site is down. Sometimes
the backup servers are even the old servers being replaced during an
upgrade, left in place for a short validation period (e.g. one week) in
case something bad happens to the new servers (a bug in a network
driver causing them to panic twice a day, ...).

That's why, for this usage, it does not make much sense to switch their
roles.
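As a minimal sketch of that usage (the backend name and addresses here are illustrative, not taken from the thread), an active pair with a sorry server might look like this in an haproxy backend:

```
backend app
    balance roundrobin
    # normal active servers
    server a 10.0.0.1:80 check
    server b 10.0.0.2:80 check
    # only receives traffic once every non-backup server is down
    server sorry 10.0.0.9:80 check backup
```

The "backup" keyword is what keeps the sorry server idle as long as at least one regular server is up.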

> Our organization uses backup servers to assist with new code rollout and 
> easy rollback. (I'm not fond of it, just looking to automate it with APIs)

I see what you mean, two of my customers are doing the same.

> example use case:
> 
> Four Servers A,B,C,D
> All Running The Same Code Rev
> A & B are primary
> C & D are in backup
> 
> The Upgrade:
> C & D upgrade to new code rev.
> 
> The Flip: (as quickly as possible to try to stay atomic)
> C & D taken out of Backup State
> A & B put in to Backup State
> 
> Sanity Checking Phase:
> Make sure new live roll is performing as expected.
> Hold on to older rev A & B boxes until you feel C & D are solid.
> (typically 15-30 minutes)
> If the new push is terrible, you can revert "The Flip" above.
> 
> Steady State:
> Upgrade A & B to match C & D's code revs.
> Now A & B can be thought of as emergency backup for if/when C/D fall over.
> 
> Lather, rinse, repeat (swapping AB/CD above)....
> 
> 
> Does the above use case make sense to you?

Yes, except that in my experience, once the code has been switched to C/D,
A/B will never be put back into production, because the application is
backward compatible but not forward compatible, or simply because it's
not acceptable to present the old site to visitors after the new version
has been advertised and published.

What I'm seeing in the field is a variant of this. A/B have their nominal
weights while C/D have a weight of zero, which means they will never be
used. C/D get installed with the new code. They're tested using cookies
and/or force-persist rules. Once the application seems to run fine on C/D,
they're either slowly added to the farm (if the application supports it)
or switched in at once simply by swapping the weights, either in the
config or on the stats CLI. This does not break existing sessions and
leaves much more flexibility for the switchover. Also, there is no risk
that the undesired servers get accidentally used if some hiccup happens
on the new servers.
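The stats-CLI variant of that switch can be sketched as follows. This assumes a configured stats socket and socat; the backend and server names are made up for the example, and the commands only make sense against a running haproxy instance:

```
# drain traffic from the old servers and enable the new ones;
# established sessions are not broken, only new traffic moves
echo "set weight app/a 0"   | socat stdio /var/run/haproxy.sock
echo "set weight app/b 0"   | socat stdio /var/run/haproxy.sock
echo "set weight app/c 100" | socat stdio /var/run/haproxy.sock
echo "set weight app/d 100" | socat stdio /var/run/haproxy.sock
```

Reverting the rollout is just the same four commands with the weights swapped back.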

> Known limitations:
> 
> Ideally it would be much easier to downgrade a single system's code.
> (working on that)
> 
> Also this clearly doesn't scale well to larger deployments -- 50 active 
> and 50 standby. (working on that too.)

The principle of switching weights allows that too. I know some people
who run a new version on one server with a small weight for some time
before the main switch. You can't do that with backup servers. Sometimes
they even test the application's robustness by exaggerating the weight
on one server and measuring the effects.
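A canary like the one described above can be sketched directly in the config (again with illustrative names and addresses); the new server simply starts with a deliberately small weight:

```
backend app
    balance roundrobin
    server a 10.0.0.1:80 weight 100 check
    # new code rev: roughly 1/21 of the traffic until it's trusted,
    # then its weight is raised via the config or the stats CLI
    server c 10.0.0.3:80 weight 5 check
```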

I must say I really like this way of doing it: it's as if they had a
mixing console full of potentiometers while others just have on/off
switches. Seeing that in action is impressive when you deal with
hundreds of servers in a single instance!

Willy

