On Fri, 22 Sep 2000, Mark Shewmaker wrote:

>> Problem about reading for a couple days is that this implies user's
>> job is knowing everything about system administration.  This is
>> possible if user is a consultant or user is a system administrator
>> in a big company so there are hundred people around user going with
>> the task of making money for the company.  If company is three or four
>> persons or if user is a private individual this kind of "learning
>> overhead" is unacceptable (no time left for real work).
>
>There is no perfect solution to this problem, and there never will be.
>Imagining that filesystem-stability nirvana exists will only cause
>frustration.

Yep, that is a good way of putting it.  ;o)

>No matter what filesystem you have, no matter what hardware you have,
>and no matter how well-put-together the distribution, unless you've
>got a contract with the universe insuring that nothing untoward will
>happen in the vicinity of your machine, there will always be
>possibilities for types of filesystem corruption for which the
>standard tools will be insufficient and for types of filesystem
>corruption for which even the best of gurus will have little if
>any success in recovering data.

Absolutely.


>Some problems can be avoided by good system design, some problems can
>be automatically fixed given a well-designed distribution, some
>problems will require manual but easier-to-understand intervention,
>some problems will require the intervention of gurus, and some
>problems could require tens or hundreds of thousands of dollars for
>clean-room data recovery.

My, you have a good way with words.  ;o)  This is the exact point
I was trying to get across as well, but I think you worded it
better.

Computer problems will always occur, both software and hardware,
due to the nature of things.  No system is 100% fault tolerant,
and as you say, a problem either fixes itself, requires the user
to fix it, or requires paying someone who knows how to do the job.


>To approach a problem that is inherently not perfectly solvable and
>simply complain about that fact does no good to anyone.  A better
>way of looking at things is to see how each player can improve his
>situation and the situation of others.  For instance:
>
>Filesystem writers and writers of filesystem recovery tools:
>
>     Make the filesystem better able to deal with types of
>     corruption that are reported.  Improve error messages 
>     in the recovery tools, perhaps incorporating some amount
>     of documentation into the tool itself.
>
>     (A useful tool for my job in supporting another unix,
>     is a tool that makes a copy of all the filesystem metadata,
>     so that a filesystem can be ftp'd to a guru and the damage
>     understood and bugs fixed, without revealing any private
>     data other than directory structure and file names and
>     permissions.  That might be a useful tool to have for
>     ext[23] and Reiserfs and it would provide a more direct
>     way for users to present known problems to the programmers.)

Yep, but that is the least likely one to happen.  The goals that
people strive for when creating filesystems are usually more
technical goals such as speed, minimizing disk wastage,
minimizing fragmentation, etc.  Even the best designed
filesystem can get foobed pretty good by giving hdparm bad
options, or by using an experimental kernel or experimental
modules, etc.  So we must assume that bad errors in on-disk data
structures can always occur, and no matter how well designed
these tools and filesystems are, there will be times when no
automatic software can handle disk problems without asking
questions.  A lot of disk corruption problems for example have
*NO* right or wrong answer.  Sometimes it can be a choice like
"you're losing data here pal, but I can recover one of two
things, which one do you want?  Number 34234 or number 355211?"

In that case, joe user either guesses, or reformats and starts
over, and is more careful next time (assuming something he/she
did caused the problem).
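A rough user-level sketch of the metadata-only copy Mark describes
above, in Python.  This is an approximation under stated
assumptions: it walks a mounted directory tree rather than dumping
the on-disk structures a real recovery tool would need, and the
names snapshot_metadata and entry_type are made up for
illustration.

```python
import os
import stat


def entry_type(mode):
    """Classify an lstat() mode without touching file contents."""
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISLNK(mode):
        return "link"
    if stat.S_ISREG(mode):
        return "file"
    return "other"


def snapshot_metadata(root):
    """Collect names, types, permissions, owners, and sizes under root.

    File contents are never opened, so the result exposes only the
    directory structure, file names, and permissions.
    """
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat so symlinks are not followed
            records.append({
                "path": os.path.relpath(path, root),
                "type": entry_type(info.st_mode),
                "mode": stat.filemode(info.st_mode),
                "uid": info.st_uid,
                "gid": info.st_gid,
                "size": info.st_size,
            })
    return records
```

The resulting list could be serialized (for example as JSON) and
mailed to whoever is debugging the problem.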


>Distribution makers:
>
>     Emphasise stability over speed when making suggestions
>     for filesystem types during install.
>
>     Include references to documentation in the root filesystem, (!)
>     when startup scripts drop an admin into a shell for running
>     fsck manually at boot.

This could be the beginning of putting a lot of docs there and
cluttering up the root fs.  The problem here is that Linux as it
stands now is built largely around technical perfection and other
technical issues, and ease-of-use, while aimed for and met in
many areas, is a secondary goal for the most part.  The more
"end-user" friendly it is the better, but when that friendliness
results in a messier system, for example like the root dir on a
fresh Win95 install, technical users will jump ship like there is
no tomorrow.  Tutorials and documentation are fantastic, and
wizard type programs might be cool for joe user too, but anything
that forces this sort of "smart" tool on joe sysadmin, or
tech-heads, will be met by fierce opposition.  Fortunately, I
believe that future enhancements will meet end users' needs
without getting in the way of gurus' and techies' needs.


>     Use the most conservative hdparm settings.  Offer the user
>     a tool to set and test other settings.  (Perhaps some sort
>     of almost filesystem regression test that can be done in a
>     temporarily-created partition on each hard drive.  I for
>     one wouldn't mind leaving my machine on overnight as these
>     tests were done.)

Or don't use hdparm at all.  Better yet, improve or replace
hdparm with a tool that has a database of known good, and known
bad hardware combinations and settings.  It could warn a user when
they are choosing potentially bad settings, etc.  That would not
be perfect, but could minimize the number of users who for
example run hdparm and try to tweak it to get 15.6M/s instead of
being happy with 15.2M/s.
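
A minimal sketch of what such a lookup might look like, in Python.
Everything here is an assumption for illustration: the table
layout, the function name check_settings, the setting names, and
especially the hardware entries, which are placeholders and not
real compatibility data.

```python
# Hypothetical entries: the hardware names and verdicts below are
# placeholders for illustration, not real compatibility data.
KNOWN_SETTINGS = {
    ("example-chipset", "example-drive"): {
        ("dma", 1): "good",
        ("unmask_irq", 1): "bad",
    },
}


def check_settings(controller, drive, requested):
    """Return warning strings for the requested settings.

    requested maps a setting name to the value the user wants.
    Anything not listed as known good is reported, so untested
    combinations are flagged as well as known bad ones.
    """
    known = KNOWN_SETTINGS.get((controller, drive), {})
    warnings = []
    for name, value in sorted(requested.items()):
        verdict = known.get((name, value), "untested")
        if verdict == "bad":
            warnings.append(f"{name}={value}: known bad on this hardware")
        elif verdict == "untested":
            warnings.append(f"{name}={value}: untested on this hardware")
    return warnings
```

Treating "untested" as a warning rather than as silently OK is a
deliberate choice in this sketch: it errs on the conservative side
for hardware nobody has reported on.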


>     Realize that untrained users will be acting as system
>     administrators.  Be sure common failure modes are structured
>     such that these users, even if they don't fully understand
>     what's going on, can be at least somewhat informed as to
>     the basics and how they can get help.

That sounds good.  The init scripts could perhaps display larger
messages for stuff like fsck, and other potential problems.  That
would be good indeed.
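
As a rough illustration of the "larger messages" idea, here is a
small Python sketch that frames a few lines of text in a box.  Real
init scripts are shell, so this only shows the shape of the output;
the function name banner, the wording, and the documentation path
are all placeholders.

```python
def banner(lines, width=64):
    """Frame the given lines in a box of asterisks, width columns wide.

    Lines longer than the box are truncated so the frame stays intact.
    """
    border = "*" * width
    inner = width - 4  # room left after "* " and " *"
    body = ["* " + line[:inner].ljust(inner) + " *" for line in lines]
    return "\n".join([border] + body + [border])


# Hypothetical wording; the documentation path is a placeholder.
FSCK_MESSAGE = banner([
    "fsck found errors it could not fix automatically.",
    "You are being dropped to a shell to run fsck by hand.",
    "Help: see the fsck notes under /path/to/docs",
])
```

Printing FSCK_MESSAGE just before the shell prompt appears would
make the situation hard to miss.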


>System administrators:
>
>     Keep good backups.  Verify backup integrity.  Be sure you
>     have a recovery plan that can work.  Verify backup integrity.
>     Be sure you have all the information and media required to
>     quickly recover if need be.
>
>     Send a running transaction log to another system if a
>     day-old-restore of your system isn't good enough.  (I don't
>     know very much about those issues, but if you know in advance
>     that system downtime would mean your company would lose data
>     and order information, then that's smoking-gun proof that
>     you need to fix your backup and recovery strategy.) 

Absolutely good advice.
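
One simple way to check a backup, sketched in Python under the
assumption that the backup is a plain directory tree mirroring the
source (tape or archive backups would need a different approach).
The function names are made up for illustration.  It compares
SHA-256 digests, so it only makes sense right after the backup is
taken, before the source changes again.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_root, backup_root):
    """Compare each file under source_root with its backup copy.

    Returns a sorted list of (relative_path, problem) tuples, where
    problem is "missing" or "differs".  An empty list means every
    source file has an identical copy in the backup.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            source = os.path.join(dirpath, name)
            relative = os.path.relpath(source, source_root)
            backup = os.path.join(backup_root, relative)
            if not os.path.isfile(backup):
                problems.append((relative, "missing"))
            elif file_digest(source) != file_digest(backup):
                problems.append((relative, "differs"))
    return sorted(problems)
```

A check like this does not replace an actual test restore, but it
catches silently truncated or missing files cheaply.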


>Businesses:
>
>     Realize that downtime is extremely expensive.  Be willing
>     to allocate time and money to investigating and verifying
>     disaster recovery procedures.

And also to hiring someone to set up your systems effectively if
you can't do it yourself.  Downtime is $$$, and spending some $$$
to get things done right can save a LOT of $$$.


>Users:
>
>     Realize that you're a system administrator even if you aren't
>     interested in being one.  Reading up on the workings of the
>     system will improve your chances of recovering it if things
>     go wrong.  If there's little that's important to you on the
>     system, then it's not such a big deal.  If you're not really
>     into computers but yet have the only copy of your life's work
>     on the machine and you don't know anyone who could help you
>     if things go horribly wrong, you may want to find and print
>     out a list of people to call or companies to go to if the
>     worst happens.

Yep, and if you really really do not want to learn the system to
that level, you can pay me money to fix it for you when things go
wrong.  ;o)


>     Don't be fooled by pretty interfaces--Eye candy on the surface
>     does not imply a stable filesystem underneath.  Even Macs can
>     have filesystem problems that require expert help to recover from.

Good point.


>     (Okay, the "user" part is the hardest to write, and it's the
>     most unfair one of all, as it implies more responsibility
>     than even I think home users should be expected to shoulder
>     very often, even if they're really responsible in fact.)
>
>I would think dividing up the responsibilities in such a way
>as the above when thinking through the issue is a better way
>to approach the problem than simply throwing one's hands up in
>the air.  It will also increase the likelihood of helpful
>suggestions (or patches) being sent to the best group.

Yep.  I think journalled filesystems will cover the largest
problems people encounter though.  For other "users" the Windows
troubleshooting guide can always be employed to solve problems
too, rather than learning the system:

The Microsoft Windows 3 R Troubleshooting guide.
1) Restart the program
2) Reboot the system
3) Reinstall the system from scratch

Enjoy!
;o)

TTYL

--
         Mike A. Harris  -  Linux advocate  -  Open source advocate
                   Copyright 2000 all rights reserved
                               ----------
[Quote: Linus Torvalds - Aug 27, 2000 - linux-kernel mailing list]
"And I'm right.  I'm always right, but in this case I'm just a bit more
right than I usually am." -- Linus Torvalds


