On 01/06/07, Narayan Desai <[EMAIL PROTECTED]> wrote:

I also think these sorts of discussions, while quite interesting, put
the cart before the horse a little bit. Luke was talking about
fatfingering configs causing a massive failure. There are two ways to
do this: either causing a failure of core infrastructure in a small
number of places, or causing a per-node failure with a configuration
change or software upgrade. In both cases, a configuration file syntax
error can cause services, persistent or not, to fail. Since this is a
common occurrence, any safety mechanism that we want to be
practical needs to handle this.


Whilst I mostly agree with you, I wanted to point out that configuration
file syntax errors are markedly less likely if you use templating rather
than distributing whole files. Some of the problems you go on to talk about
(valid IP addresses, valid hostnames, well-formed values, and so forth) can
mostly be prevented by suitable, and cheap, source file validation: for
example, regexes on parameter values, and DNS and IP lookups at compile
time. This is the approach of both Quattor and LCFG. The overhead isn't very
great, because the values that the sysadmin changes now fall into a standard
format for which you can write standard validators (e.g. test that this IS a
hostname of a host in my system; test that this is a word with no
punctuation in it; and so on). The templates take care of making sure the
tabs are all in the right places, and whatever other horridness is
necessary.
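
To make the idea concrete, here is a minimal sketch in Python of
validate-then-template. It is not how Quattor or LCFG actually do it; the
host inventory, the parameter names (server, listen, pool) and the output
format are all made up for illustration, and a real system would query DNS
or a site database rather than a hard-coded set.

```python
import ipaddress
import re
from string import Template

# Hypothetical inventory; a real site would consult DNS or a host database.
KNOWN_HOSTS = {"web01", "db01", "ns1"}

WORD_RE = re.compile(r"[A-Za-z0-9]+")


def is_known_host(value):
    """Test that this IS a hostname of a host in my system."""
    return value in KNOWN_HOSTS


def is_plain_word(value):
    """Test that this is a word with no punctuation in it."""
    return WORD_RE.fullmatch(value) is not None


def is_ip_address(value):
    """Test that this parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


# Hypothetical schema: one standard validator per parameter.
SCHEMA = {
    "server": is_known_host,
    "listen": is_ip_address,
    "pool": is_plain_word,
}

# The template owns the file syntax (tabs, newlines, ordering).
TEMPLATE = Template("server\t$server\nlisten\t$listen\npool\t$pool\n")


def compile_config(params):
    """Validate every parameter, then render; raise before anything ships."""
    errors = [
        f"{name}: invalid value {params.get(name)!r}"
        for name, check in SCHEMA.items()
        if not check(str(params.get(name, "")))
    ]
    if errors:
        raise ValueError("; ".join(errors))
    return TEMPLATE.substitute(params)
```

The point of the split is that the sysadmin only ever edits the parameter
values, so a fat-fingered value is rejected at compile time, and the file
syntax itself never passes through human hands.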

...

For this reason, it is not clear to me that this is a practical
near-term solution on Unix systems. As an alternative, we have been
working with behavioral tests, and they provide a much more
immediately useful set of functions. Behavioral tests aren't as
powerful as the analytical ones that Sanjai does, but they sure are a
lot more incremental.


There are problems with behavioural tests as well, of course. Many of the
things that you want to test are inaccessible from shell script probes.
Shell tests also inevitably run after deployment, by which time it may be
too late to undo the damage caused by making the change in the first place,
and the question of what to do after a deployment fails is a hard one.

One conclusion I have come to over the last few years is that system
administrators (in general) don't have time in large enough chunks to
readily pursue non-incremental solutions for problems. There are, of
course, exceptions to this, but in general, this seems to hold. I
think that this has a pretty big impact on how people adopt tools and
techniques. I would worry that any sort of FOL type system would be
hard enough to deploy and use that it would never make it too far out
of the lab (unless it was completely turn-key, which I gather Sanjai's
tools are)


We agree that average sysadmins writing first-order logic is a non-starter.
My concept with cfgw was that someone else (perhaps a single senior
sysadmin, perhaps a tool designer) would write useful predicates which could
then be enabled. This still seems workable to me ("check my systems for this
property, and give me a report") from the perspective of use, but the
technical challenges are a lot harder. Simply put, I don't have time to do
it myself, and no one else seems very interested!
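
As a rough illustration of the usage model only (this is not cfgw, and it
sidesteps the hard technical part entirely), here is a Python sketch in
which someone senior writes the predicates once and everyone else just
enables them by name and reads the report. The host data, the configuration
keys and the predicate names are all hypothetical.

```python
# Hypothetical per-host configuration data, as a tool might collect it.
HOSTS = {
    "web01": {"ntp_servers": ["ns1"], "ssh_permit_root_login": "no"},
    "db01": {"ntp_servers": [], "ssh_permit_root_login": "no"},
    "ns1": {"ntp_servers": ["ns1"], "ssh_permit_root_login": "yes"},
}


def has_ntp_server(cfg):
    """Property: the host has at least one NTP server configured."""
    return bool(cfg.get("ntp_servers"))


def root_login_disabled(cfg):
    """Property: direct root login over SSH is switched off."""
    return cfg.get("ssh_permit_root_login") == "no"


# Predicate library, written once by the predicate author.
PREDICATES = {
    "has_ntp_server": has_ntp_server,
    "root_login_disabled": root_login_disabled,
}


def report(hosts, enabled):
    """Map each enabled predicate name to the sorted list of failing hosts."""
    return {
        name: sorted(
            host for host, cfg in hosts.items() if not PREDICATES[name](cfg)
        )
        for name in enabled
    }


if __name__ == "__main__":
    # "Check my systems for this property, and give me a report."
    for name, failing in report(HOSTS, ["has_ntp_server"]).items():
        print(name, "fails on:", ", ".join(failing) or "no hosts")
```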

Ed
_______________________________________________
lssconf-discuss mailing list
lssconf-discuss@inf.ed.ac.uk
http://lists.inf.ed.ac.uk/mailman/listinfo/lssconf-discuss
