RE: dump(8) race conditions?

2002-02-08 Thread Duane H. Hesser


On 07-Feb-02 Markus Stumpf wrote:
> We use amanda and dump for backups. Some hosts have rather busy disks
> even during non-prime-time hours when the backup is run.
>
> From time to time amanda reports dump(8) errors like the following:
>
> sendbackup: info end
> |   DUMP: Date of this level 5 dump: Wed Feb  6 01:53:12 2002
> |   DUMP: Date of last level 4 dump: Mon Feb  4 02:31:40 2002
> |   DUMP: Dumping /dev/rda4s1e (/share/turing/disk07) to standard output
> |   DUMP: mapping (Pass I) [regular files]
> |   DUMP: mapping (Pass II) [directories]
> |   DUMP: estimated 2423080 tape blocks.
> |   DUMP: dumping (Pass III) [directories]
> |   DUMP: dumping (Pass IV) [regular files]
> |   DUMP: 14.72% done, finished in 0:28
> |   DUMP: 33.78% done, finished in 0:19
> |   DUMP: 52.84% done, finished in 0:13
> |   DUMP: 71.65% done, finished in 0:07
> ?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921522]: count=3072
> ?   DUMP:   DUMP: read error from /dev/rda4s1e: Invalid argument: [sector -410921522]: count=512
> ?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921532]: count=5120
> ?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -1001057530]: count=1024
> [ ... ]
>
> The first time we saw this we took the machine down to single-user
> mode, unmounted the disk and fsck'd it. No errors were found, and the
> next backups (even level 0) completed without errors.
>
> As we were still suspicious about what might cause these really
> sporadic error messages from different machines and different disks,
> I looked through the source of dump.
>
> If I interpret the code correctly, dump caches directory inode lists.
> Now, if files get removed or shrunk during a dump, after the inode
> information has been cached, dump has a stale cache and tries to
> access blocks that are not, or no longer, allocated, and the result
> is the above errors.
>
> Am I right with my interpretation, or are these really hardware errors?
>

You are essentially correct, and your message is probably a good
reminder for those of us who routinely use dump on active filesystems.

Dump is essentially a two-phase system (mapping, then dumping), and
any activity which modifies inodes between the first phase and the
second is likely to cause problems, either for dump or for restore.
It has always been thus, even as far back as V7 (and probably V6).
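
The window between mapping and dumping described above can be
illustrated with a toy model. This is only a sketch in Python, not
dump's actual code: ToyFS, map_pass, dump_pass and demo are
hypothetical names, and a missing block merely stands in for the
"read error" lines in the log. On a real disk a freed block is usually
still readable, so the real failure mode is more likely a stale or
recycled block pointer than a missing block.

```python
"""Toy model of a map-then-copy backup racing with file system changes."""


class ToyFS:
    """A tiny in-memory 'file system': inodes point at numbered blocks."""

    def __init__(self):
        self.inodes = {}      # inode number -> list of block numbers
        self.blocks = {}      # block number -> contents
        self.next_block = 0   # next unused block number

    def create(self, ino, chunks):
        """Create a file whose contents are the given chunks, one per block."""
        blknos = []
        for chunk in chunks:
            self.blocks[self.next_block] = chunk
            blknos.append(self.next_block)
            self.next_block += 1
        self.inodes[ino] = blknos

    def remove(self, ino):
        """Delete a file and release its blocks."""
        for blkno in self.inodes.pop(ino):
            del self.blocks[blkno]


def map_pass(fs):
    """Mapping phase: remember which blocks each inode owned at this moment."""
    return {ino: list(blknos) for ino, blknos in fs.inodes.items()}


def dump_pass(fs, mapped):
    """Dumping phase: read blocks using the (possibly stale) map."""
    dumped, errors = {}, []
    for ino, blknos in mapped.items():
        data = []
        for blkno in blknos:
            if blkno in fs.blocks:
                data.append(fs.blocks[blkno])
            else:
                errors.append((ino, blkno))  # stands in for a read error
        dumped[ino] = data
    return dumped, errors


def demo():
    """Map two files, remove one between the phases, then dump."""
    fs = ToyFS()
    fs.create(1, ["a0", "a1"])
    fs.create(2, ["b0"])
    mapped = map_pass(fs)
    fs.remove(2)  # the file vanishes after mapping, before dumping
    return dump_pass(fs, mapped)


if __name__ == "__main__":
    dumped, errors = demo()
    print("dumped:", dumped)
    print("errors:", errors)
```

If nothing changes between map_pass and dump_pass, the error list stays
empty, which matches the observation that the same disks dumped cleanly
on later runs.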

Dumps which report errors such as the ones you mention are likely
to cause difficulties on restore.  Sometimes they will be completely
unreadable; sometimes partial or interactive restores will succeed
(for some files).  It is even possible that the dump may be completely
restorable, but with corrupted files.  On the other hand, dumps
which *don't* report errors can still be subtly corrupted.  Elizabeth
Zwicky, in a ten-year-old paper entitled 'Torture Testing Backup
and Archive Programs', discusses a couple of situations where this
can occur.

It is operationally (and sometimes politically) difficult to dump
on unmounted filesystems, so most of us (I think) bite the bullet
and try to dump at times when the subject filesystem is likely to
be quiescent.  It may also be smart to dump more frequently than
otherwise called for, just to increase the odds.

Your message reminds us of the risks we take.

It is worth noting that activity can be occurring on a filesystem
and dump will still succeed, as long as none of that activity alters
inodes significantly between passes.

--
Duane H. Hesser
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



dump(8) race conditions?

2002-02-07 Thread Markus Stumpf

We use amanda and dump for backups. Some hosts have rather busy disks
even during non-prime-time hours when the backup is run.

From time to time amanda reports dump(8) errors like the following:

sendbackup: info end
|   DUMP: Date of this level 5 dump: Wed Feb  6 01:53:12 2002
|   DUMP: Date of last level 4 dump: Mon Feb  4 02:31:40 2002
|   DUMP: Dumping /dev/rda4s1e (/share/turing/disk07) to standard output
|   DUMP: mapping (Pass I) [regular files]
|   DUMP: mapping (Pass II) [directories]
|   DUMP: estimated 2423080 tape blocks.
|   DUMP: dumping (Pass III) [directories]
|   DUMP: dumping (Pass IV) [regular files]
|   DUMP: 14.72% done, finished in 0:28
|   DUMP: 33.78% done, finished in 0:19
|   DUMP: 52.84% done, finished in 0:13
|   DUMP: 71.65% done, finished in 0:07
?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921522]: count=3072
?   DUMP:   DUMP: read error from /dev/rda4s1e: Invalid argument: [sector -410921522]: count=512
?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921532]: count=5120
?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -1001057530]: count=1024
[ ... ]

The first time we saw this we took the machine down to single-user
mode, unmounted the disk and fsck'd it. No errors were found, and the
next backups (even level 0) completed without errors.

As we were still suspicious about what might cause these really
sporadic error messages from different machines and different disks,
I looked through the source of dump.

If I interpret the code correctly, dump caches directory inode lists.
Now, if files get removed or shrunk during a dump, after the inode
information has been cached, dump has a stale cache and tries to
access blocks that are not, or no longer, allocated, and the result
is the above errors.

Am I right with my interpretation, or are these really hardware errors?
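
The negative block numbers in the log fit this reading: no valid disk
address is negative, so the values look more like a bogus block pointer
than a bad sector. A minimal Python sketch, assuming (this is not
confirmed anywhere in the thread) that the block number is held in a
signed 32-bit field, shows which unsigned values would print that way.
The function names are illustrative only.

```python
import struct


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def as_unsigned32(value):
    """Reinterpret a (possibly negative) integer as unsigned 32-bit."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    # Block numbers copied from the error lines in the log above.
    for logged in (-410921522, -410921532, -1001057530):
        print(logged, "->", as_unsigned32(logged))
```

Under that assumption the logged values correspond to unsigned block
numbers in the billions, far beyond the roughly 2.4 million tape blocks
estimated for this dump.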

Thanks,

\Maex

-- 
SpaceNet AG            | Joseph-Dollinger-Bogen 14 | Fon: +49 (89) 32356-0
Research & Development | D-80807 Muenchen          | Fax: +49 (89) 32356-299
The security, stability and reliability of a computer system is reciprocally
 proportional to the amount of vacuity between the ears of the admin




Re: dump(8) race conditions?

2002-02-07 Thread Dan Nelson

In the last episode (Feb 07), Markus Stumpf said:
> We use amanda and dump for backups. Some hosts have rather busy disks
> even during non-prime-time hours when the backup is run.
>
> From time to time amanda reports dump(8) errors like the following:
>
> ?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921522]: count=3072
> ?   DUMP:   DUMP: read error from /dev/rda4s1e: Invalid argument: [sector -410921522]: count=512
> ?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921532]: count=5120
> ?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -1001057530]: count=1024
> [ ... ]

Dump should ideally be run on an unmounted filesystem.  The next best
is to create a snapshot ( /usr/src/sys/ufs/ffs/README.snapshot ) and
dump that.

-- 
Dan Nelson
[EMAIL PROTECTED]




Re: dump(8) race conditions?

2002-02-07 Thread Markus Stumpf

On Thu, Feb 07, 2002 at 11:54:02AM -0600, Dan Nelson wrote:
> Dump should ideally be run on an unmounted filesystem.  The next best
> is to create a snapshot ( /usr/src/sys/ufs/ffs/README.snapshot ) and
> dump that.

True.
But on systems that host e.g. mailservers or webservers it's unacceptable
to disrupt services to umount and back up the system :/

$ uname -a
FreeBSD 4.4-RELEASE
$ more /usr/src/sys/ufs/ffs/README.snapshot
/usr/src/sys/ufs/ffs/README.snapshot: No such file or directory
:-)))

Located it in stable, but the README says:
2) Run dump on the snapshot. You will get a dump that is
   consistent with the filesystem as of the timestamp of the
   snapshot. Note that I have not yet changed dump to set the
   dumpdates file correctly, so do not use this feature in
   production until that fix is made.
:-((

Anyway, I have no problem with the errors per se; I just wanted
to know whether they could result from the race conditions or whether
I had better replace the disks.

\Maex
