On Fri, Sep 22, 2000 at 10:38:25PM +0200, [EMAIL PROTECTED] wrote:
>
> Problem about reading for a couple days is that this implies user's
> job is knowing everything about system administration.  This is
> possible if either user is a consultant or user is a system administrator
> in a big company so there are hundred people around user going with
> the task of making money for the company.  If company is three or four
> persons or if user is a private individual this kind of "learning
> overhead" is unacceptable (no time left for real work).

There is no perfect solution to this problem, and there never will be.
Imagining that filesystem-stability nirvana exists will only cause
frustration.

No matter what filesystem you have, no matter what hardware you have,
and no matter how well-put-together the distribution, unless you've
got a contract with the universe ensuring that nothing untoward will
happen in the vicinity of your machine, there will always be
possibilities for types of filesystem corruption for which the
standard tools will be insufficient and for types of filesystem
corruption for which even the best of gurus will have little if
any success in recovering data.

Some problems can be avoided by good system design, some problems can
be automatically fixed given a well-designed distribution, some
problems will require manual but easier-to-understand intervention,
some problems will require the intervention of gurus, and some
problems could require tens or hundreds of thousands of dollars for
clean-room data recovery.

To approach a problem that is inherently not perfectly solvable and
simply complain about that fact does no good to anyone.  A better
way of looking at things is to see how each player can improve his
situation and the situation of others.  For instance:

Filesystem writers and writers of filesystem recovery tools:

     Make the filesystem better able to deal with types of
     corruption that are reported.  Improve error messages 
     in the recovery tools, perhaps incorporating some amount
     of documentation into the tool itself.

     (A useful tool in my job supporting another Unix is one
     that makes a copy of all the filesystem metadata,
     so that a filesystem can be ftp'd to a guru and the damage
     understood and bugs fixed, without revealing any private
     data other than directory structure and file names and
     permissions.  That might be a useful tool to have for
     ext[23] and Reiserfs and it would provide a more direct
     way for users to present known problems to the programmers.)
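
A rough illustration of the metadata-only idea, as a Python sketch.
This is not the tool mentioned above and it does not read on-disk
filesystem structures the way a real ext2 or Reiserfs dumper would
have to.  It only walks a mounted directory tree and records names,
permissions, owners, and sizes, never file contents.  The function
name collect_metadata is made up for this example.

```python
import json
import os
import stat
import tempfile


def collect_metadata(root):
    """Walk `root` and record names, modes, owners, and sizes, never contents."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat: describe symlinks, do not follow them
            records.append({
                "path": os.path.relpath(path, root),
                "mode": stat.filemode(info.st_mode),
                "uid": info.st_uid,
                "gid": info.st_gid,
                "size": info.st_size,
            })
    return records


if __name__ == "__main__":
    # Small demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "docs"))
        with open(os.path.join(root, "docs", "secret.txt"), "w") as f:
            f.write("private contents that must not leak")
        print(json.dumps(collect_metadata(root), indent=2))
```

The resulting JSON could be mailed to whoever is helping, and it
exposes only the directory structure, file names, and permissions.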

Distribution makers:

     Emphasise stability over speed when making suggestions
     for filesystem types during install.

     Include references to documentation in the root filesystem, (!)
     when startup scripts drop an admin into a shell for running
     fsck manually at boot.

     Use the most conservative hdparm settings.  Offer the user
     a tool to set and test other settings.  (Perhaps some sort
     of almost filesystem regression test that can be done in a
     temporarily-created partition on each hard drive.  I for
     one wouldn't mind leaving my machine on overnight as these
     tests were done.)
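
One possible shape for the simplest piece of such a test, as a Python
sketch: write random data to a scratch file, read it back, and compare
checksums.  The function name write_read_check and its parameters are
invented for this example.  It does not call hdparm, and on a real
system the read may be served from the page cache rather than the
disk, so an actual overnight test would need far more data than RAM
or some way of bypassing the cache.

```python
import hashlib
import os
import tempfile


def write_read_check(directory, blocks=64, block_size=1 << 16):
    """Write random blocks to a scratch file, read them back, compare digests.

    Returns True when the data read matches the data written.
    """
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                block = os.urandom(block_size)
                written.update(block)
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push data toward the disk before reading back
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(block_size), b""):
                read_back.update(block)
        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)  # always clean up the scratch file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print("data intact:", write_read_check(scratch))
```

Run in a loop against a directory on the scratch partition, once per
candidate drive setting, a False result would be a reason to fall
back to the conservative setting.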

     Realize that untrained users will be acting as system
     administrators.  Be sure common failure modes are structured
     such that these users, even if they don't fully understand
     what's going on, can be at least somewhat informed as to
     the basics and how they can get help.

System administrators:

     Keep good backups.  Verify backup integrity.  Be sure you
     have a recovery plan that can work.  Verify backup integrity.
     Be sure you have all the information and media required to
     quickly recover if need be.
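
A minimal Python sketch of what "verify backup integrity" can mean in
practice, assuming the backup is a plain copy of a directory tree that
can be read back from its media.  The names sha256_of, build_manifest,
and verify_backup are made up for this example.  A tape or archive
format would need its own restore step before a comparison like this.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under `root` (by relative path) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def verify_backup(source_root, backup_root):
    """Return sorted relative paths missing from, or different in, the backup."""
    source = build_manifest(source_root)
    backup = build_manifest(backup_root)
    return sorted(p for p, digest in source.items() if backup.get(p) != digest)
```

An empty list from verify_backup means every source file has an
identical copy in the backup.  Anything else names the files to look
at before the backup is needed in anger.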

     Send a running transaction log to another system if a
     day-old-restore of your system isn't good enough.  (I don't
     know very much about those issues, but if you know in advance
     that system downtime would mean your company would lose data
     and order information, then that's smoking-gun proof that
     you need to fix your backup and recovery strategy.) 

Businesses:

     Realize that downtime is extremely expensive.  Be willing
     to allocate time and money to investigating and verifying
     disaster recovery procedures.

Users:

     Realize that you're a system administrator even if you aren't
     interested in being one.  Reading up on the workings of the
     system will improve your chances of recovering it if things
     go wrong.  If there's little that's important to you on the
     system, then it's not such a big deal.  If you're not really
     into computers but yet have the only copy of your life's work
     on the machine and you don't know anyone who could help you
     if things go horribly wrong, you may want to find and print
     out a list of people to call or companies to go to if the
     worst happens.

     Don't be fooled by pretty interfaces: eye candy on the surface
     does not imply a stable filesystem underneath.  Even Macs can
     have filesystem problems that require expert help to recover from.

     (Okay, the "user" part is the hardest to write, and it's the
     most unfair one of all, as it implies more responsibility
     than even I think home users should be expected to shoulder
     very often, even if they're really responsible in fact.)

I would think dividing up the responsibilities in such a way
as the above when thinking through the issue is a better way
to approach the problem than simply throwing one's hands up in
the air.  It will also increase the likelihood of helpful
suggestions (or patches) being sent to the best group.

 -Mark Shewmaker
  [EMAIL PROTECTED]



_______________________________________________
Redhat-devel-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/redhat-devel-list
