On Thu, 21 Jan 1999, Ricky Beam wrote:

> Just today, I replaced an Ultra10's solaris installation with linux.  For
> some yet-to-be-explained reason, solaris kept leaking drive space and
> leaving "deleted" files in the filesystem with a link count of zero.
> (It's damned hard to deal with ghost files.)  We shall see if linux

We encountered this too, in the context of an NFS bug/problem.  Were the
files NFS served and being written by a linux box?

> lives there any better -- personally, I think Sun should be taken out
> and shot for loading the new sparcs with IDE drives; and what's with
> that upside down case?

Shooting a corporation doesn't do much, but not buying their products,
well, there's something that hurts...

> >Typical uptime for an NT server is generally measured in weeks while
> >the linux/solaris boxes generally stay up until we need to either
> >physically move them or add/remove hardware to the system.  Depending
> 
> Well, having used a wide variety of OSen... the OS is generally as
> stable as the (crappy) applications you make it run.  Everything can
> be crashed -- there are always bugs.  I have a solaris box that used
> to be a RADIUS server that has been up for over a year.  (In fact,
> it's not been rebooted since it was placed in the rack... back in Oct.
> 1997.)

Sure, but some OSen and application sets are a bit more stable than
others.  It is damn hard to crash a linux box from userspace (won't say
that it cannot be done, especially if you try, but it isn't easy).  Is
this because the applications are well written and don't e.g. leak
memory like sieves?  Is it because the kernel is well written and
doesn't leak memory like a sieve?  Is it because the Unix model for
process management and security is pretty damn robust and protects the
integrity of the kernel from nearly any userspace trashing so that an
experienced root person can almost always recover systems function
without a reboot even if a single "nasty" application (e.g. X11,
netscape) or daemon (amd, lpd, whatever) misbehaves?  All of the above?

The point is that a lot of OSen don't manage all these contributors to
stability equally well, or even particularly well.  And then there are
performance issues -- Solaris 2.3 may have been "stable", but it truly
deserved the nickname Slowaris.  And finally there are
management/interface issues.  In order to recover a partly "hung"
system, it really helps if the OS intrinsically and naturally supports a
command line interface to the OS that doesn't depend on the GUI being
up, since the GUI is very likely to be a part of the problem (huge
program, very complex and memory intensive, connected in userspace to
userspace programs that can hang it).  On a linux box, you can often
kill the GUI at the keyboard (Ctrl-Alt-Bksp), pop into a virtual tty
(Alt-Fx) or just login to the system over the network, su to root, kill
the damn X server and all the offending user's processes if it is hung
that deep, and voila, the system is recovered.  Of course some of that
would work on a Solaris box.  An NT box?  Dunno -- my own experiences
with NT suggest that it really wants you to manage via the GUI, but I'm
not expert enough to know if one can rlogin to an NT box and recover
from a GUI crash short of a reboot.  I kind of doubt it -- this is
usually only going to be possible if the GUI isn't a privileged part of
the operating system, and Windows generally is.  
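
For the curious, the "kill all the offending user's processes" step can
be sketched in a few lines.  This is only a rough illustration, not a
tool anybody on the list actually uses: the function names and the
proc_root and dry_run parameters are made up for the example, and it
assumes the Linux convention that each /proc/<pid> directory is owned by
the process's effective user.

```python
import os
import signal


def pids_owned_by(uid, proc_root="/proc"):
    """Return the sorted PIDs whose <proc_root>/<pid> directory is owned by uid."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():          # skip non-process entries such as "self"
            continue
        try:
            owner = os.stat(os.path.join(proc_root, entry)).st_uid
        except OSError:                  # process exited while we were scanning
            continue
        if owner == uid:
            pids.append(int(entry))
    return sorted(pids)


def signal_user_processes(uid, sig=signal.SIGTERM, dry_run=True,
                          proc_root="/proc"):
    """Send sig to every process owned by uid; return the PIDs targeted.

    With dry_run=True (the default) nothing is signalled; the PIDs that
    would have been targeted are simply returned.
    """
    targeted = []
    for pid in pids_owned_by(uid, proc_root):
        if pid == os.getpid():           # never signal this script itself
            continue
        if not dry_run:
            try:
                os.kill(pid, sig)
            except (ProcessLookupError, PermissionError):
                continue                 # already gone, or we are not root
        targeted.append(pid)
    return targeted
```

Run as root from a virtual tty or a remote login, one would presumably
start with the default dry run to see what would be hit, then repeat
with dry_run=False, escalating from SIGTERM to SIGKILL only for
whatever refuses to die.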

I suspect that this is why a lot of NT users end up frustrated.  We've
just heard a number of horror stories of folks who have trivial and
common applications that can and do crash the system by hanging the GUI
or whatever.  Is the system "really" hung?  Could a knowledgeable enough
NT manager recover it without a reboot, losing only perhaps the login
session but not backgrounded jobs?  Don't know.  I'd be interested in
learning, though.  Any NT experts (willing or unwilling;-) out there?
Is NT Unixoid enough to be generally recoverable from userspace trash
that hangs a user's GUI session without a reboot?  Can one install and
manage a network of NT boxes without consoles?  Are userspace system
calls handled well enough to preserve the system against programmer
error?  Enquiring minds like to know...

> >Readers Digest Version:  You want a scalable and robust platform
> >for mission critical file servers?  Avoid NT like the plague.
> 
> If only it were that simple...

The reason that it is not is that way too many managers (pointy haired
or not) go into a decision making process with a "strong predisposition"
toward a particular solution rather than actually considering and
weighing all the alternatives.

I have found that one can OFTEN influence the outcome either before
(easier) or after (harder) the fact by doing a hard-nosed, fully
documented analysis of the technical merits of the alternatives and
offering them up to the managers in question gratis for their
consideration.  If one presents the information in the rough format of a
1-2 page summary (linux is overwhelmingly cost-beneficial and stable and
manageable, NT is expensive, unstable, and requires constant management
intervention) that emphasizes the "bottom line" price/performance
differential followed by a detailed technical report, with references
and benchmarks (preferably locally generated, but sourced ones are OK as
well), it can work wonders.

Remember, the managers themselves are usually responsible to a higher
power -- the bookkeepers and bean counters.  If you produce a report,
put it in writing, and they FAIL to pay attention to it, then if the
doom you predict comes to pass you might well end up with their job, and
they know it.
This is a good reason to look for a job elsewhere if they fail to pay
any attention to it at all -- it isn't worth it to work for blithering
idiots at a time when skilled systems persons are in high demand.

A lot of the time, the managers are smarter and less pointy-haired than you
might expect.  Their predisposition toward e.g. NT is not without a
foundation, after all.  They read NT articles in their business
magazines.  They eat lunch with Microsoft's smooth and professional FE
staff.  They are by nature conservative, and they "know" that they can
make NT work at least as well as all the other groups that are using it
at a predictable (if high) price.  Predictability is good, from their
point of view -- they confuse it with technical stability, but that is a
matter of education.

If you provide them with technical information, well-digested and
flawlessly documented and they ARE reasonably smart, they will at least
read it.  They may still choose to go with NT (conservatism is usually
defensible, they will reason, and if something goes wrong with NT there
will at least be someone other than my own staff to blame).  If they are
truly smart, though, they may also agree to try a small linux pilot on
the side or even to pit trial NT and linux installations against one
another and let you prove your point (just as your 486 experiment above
is rather convincing to anyone with any sense comparing the cost/benefit
of the two paradigms for small webservers).  We know who is very likely
to win such a comparison.  

Don't forget to compare the technical advantages of linux-smp in server
situations (this IS the linux-smp list, right?:-) as this is an area
where IMHO linux overwhelmingly outshines NT.  2.0.x is incredibly
stable and efficient enough for most servers (and more efficient than NT
on a point by point basis anyway).  2.2.x is truly a next-generation
kernel and appears to be stable enough for production, especially if one
has 2.0.x to fall back on should one's particular hardware/application
mix have problems under 2.2 (we do still hear of occasional problems with
2.1.x - 2.2.x on this list, but there are also guys like Bob Hyatt and
many others who get flawless performance under heavy and
SMP-sophisticated loads).

The one place where linux will still lose to NT in an office manager's
mind (with some justification) is on the application front.  If a shop
has adopted a particular Microsoft-based application suite (e.g. Office
97) there are real costs associated with converting to something else
that will run on linux, e.g. Corel or Applixware or StarOffice, however
inexpensive or functional the linux alternative, and I know from
annoying personal experience that MS alters their file formats just fast
enough to make file imports by these products from O97 fail as often as
they succeed.  Sure, it's Microsoft Evil Empire stuff, but it's also
reality and managers quite properly are paid to deal with reality, not
engage in crusades.  

Here again, a small pilot project can often be argued for -- demonstrate
the cost-savings of a linux box running applixware (total software cost
maybe $150) vs NT (any level) plus Office 97 (minimum cost perhaps
$500?) and multiply by fifty or a hundred seats and it really adds up.
But a manager isn't going to believe that e.g. Applixware can replace
Office 97 unless/until he or she sees it in action.  Once the camel's
nose is under the tent (presuming that everybody knows that story:-) it
is much easier and cheaper to come in out of the cold a little bit at a
time...
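
The per-seat arithmetic is trivial, but it is worth putting in front of
a manager explicitly.  A minimal sketch, using only the rough guesses
above (about $150 per seat for linux plus Applixware, perhaps $500 per
seat for NT plus Office 97); the function name is invented for the
example, and real quotes should replace both figures:

```python
def software_savings(seats, linux_cost=150, nt_cost=500):
    """Up-front software savings in dollars for a given number of seats.

    The default per-seat costs are the rough guesses from the message,
    not quoted prices.
    """
    return seats * (nt_cost - linux_cost)


for seats in (50, 100):
    print(f"{seats} seats: ${software_savings(seats)} saved")
```

At those guessed prices the difference is $350 a seat, so $17500 for
fifty seats and $35000 for a hundred, before counting any difference in
management time.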

    rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]



-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]
