Re: amrecover problem: need help/advice

2003-07-28 Thread Freels, James D.

The problem was my SCSI card driver.

Reviewing my kernel configuration files, I found that between 2.4.14 and 2.4.15 (I am now running 2.4.21) I had switched to the new sym53c8xx_2 driver. I recompiled my kernel with the older NCR53c7,8xx driver (there are also ncr53c8xx and sym53c8xx drivers). It won't use the full speed of the card and drives, but it will read my tapes! The strange thing is that the tapes were written fine; the problems were only in reading them. If I have time, I will investigate further and post to the debian-alpha mailing list to see whether similar problems have occurred on other machines.
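A quick way to see which of these 53c8xx-family drivers a kernel was built with is to look at its .config. The function below is a minimal sketch of that check. The CONFIG_ symbol names are an assumption from memory of the 2.4-era drivers/scsi/Config.in (CONFIG_SCSI_SYM53C8XX_2 for sym53c8xx_2, CONFIG_SCSI_NCR53C7xx for NCR53c7,8xx, and so on), so verify them against your own tree; the function name is hypothetical.

```shell
#!/bin/sh
# List which 53c8xx-family SCSI drivers a 2.4 kernel .config builds,
# either into the kernel (=y) or as a module (=m).
# ASSUMPTION: the CONFIG_ symbol names are the 2.4-era ones as I recall
# them from drivers/scsi/Config.in; check them against your own tree.
scsi_53c8xx_drivers() {
    config_file=$1
    grep -E '^CONFIG_SCSI_(NCR53C7xx|NCR53C8XX|SYM53C8XX|SYM53C8XX_2)=(y|m)$' "$config_file" |
        sed 's/^CONFIG_SCSI_//'
}

# Example (path is hypothetical):
#   scsi_53c8xx_drivers /usr/src/linux/.config
# prints SYM53C8XX_2=y when the newer sym53c8xx_2 driver is built in.
```

Disabled drivers appear in .config only as "# ... is not set" comments, so they never match.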

On Sun, 2003-07-20 at 23:23, Gene Heskett wrote:

On Sunday 20 July 2003 18:31, Freels, James D. wrote:
OK. I am learning from this. The number of 1k blocks on the first
stored file on this tape is actually 384 and not 352. I should have
looked at the taper output and not the dumper output. I issue the
following multiple times:

mt rewind
mt -f /dev/tape_norewind fsf 1
dd if=/dev/tape_norewind of=./first_file bs=1k count=384

the number of records read returned is

64
288
384
384
64
64
384

when it should be 384 every time! Does this not smell of a
hardware problem? Must be the SCSI card?

Yes it does on the face of it.

If so, why don't I have other SCSI errors on the hard drives?

Any ideas?

The only additional one I keep stumbling over is that maybe this card
is so optimized for disk usage that it's not quite kosher for a tape
drive.

If you are running the disks on this same card, a second ugly thought
comes to mind. It concerns something that historically seems to be
abused by tape drives more than disk drives: the honoring of the 'SCSI
disconnect'. That is the situation where a command is issued to a
device on the SCSI bus and an immediate disconnect is done, opening up
the bus for use by other programs and such. It is then up to the
target device to notify the host that the command has completed, and
that any data read (or written, for that matter) is ready in the
buffers, to be read at the host's and controller's convenience.

Many tape drives ignore the disconnect message and lock other devices
out of the bus until the command is completed. I'm not saying that it
couldn't happen to a disk, but if it's a true SCSI-2 or better device,
it should honor it.

You might investigate that aspect of it, and see whether there are any
workarounds, starting with the card's own BIOS configuration, available
during the POST of a reboot. If not, the question might be answered by
putting in a different card for the tape drive by itself, so that it
doesn't have to share a bus with disks that in all probability do
honor a disconnect correctly.

Other than those 2 possibilities, I'm fresh out of ideas.

On Sun, 2003-07-20 at 05:00, Gregor Ibic wrote:
 
 try to read the tape (or chunk - fsf) with dd
 dd if=/dev/nst0 of=/ bs=1024 count=
 where NNN is greater than your backup job/disk entry
 and see what happens

 I restored a valuable tape (MTF formatted) by reading it this way and
 putting the pieces together.

 regards,
 gregor

-- 
Cheers, Gene
AMD [EMAIL PROTECTED] 320M
[EMAIL PROTECTED] 512M
99.26% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.




-- 
James D. Freels, Ph.D.
Oak Ridge National Laboratory
[EMAIL PROTECTED]









RE: amrecover problem: need help/advice

2003-07-20 Thread Gregor Ibic



try to read the tape (or chunk - fsf) with dd
dd if=/dev/nst0 of=/ bs=1024 count=
where NNN is greater than your backup job/disk entry
and see what happens

I restored a valuable tape (MTF formatted) by reading it this way and putting the pieces together.

regards,
gregor
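One way to carry out the read-it-in-pieces idea without stopping at the first bad spot is dd's conv=noerror,sync, which keeps reading after a failed read and pads the missing block with zeros. This is a swapped-in technique, not necessarily what Gregor did; whether a given tape driver lets dd continue past an I/O error varies. The function name and the 32k block size (Amanda's usual tape block size) are assumptions.

```shell
#!/bin/sh
# Copy one dump image from SRC (a tape device already positioned with
# "mt fsf", or an ordinary file) to DEST, carrying on past read errors.
#   bs=32k        ASSUMPTION: Amanda's usual 32 KiB tape block size
#   conv=noerror  keep reading after a failed read instead of stopping
#   conv=sync     pad every short or failed read with NULs to a full block,
#                 so data after a bad spot keeps its offset in DEST
# dd's error messages and record counts are saved in DEST.log.
salvage_copy() {
    src=$1
    dest=$2
    dd if="$src" of="$dest" bs=32k conv=noerror,sync 2>"$dest.log"
}

# Pieces read in separate passes can then be joined in order, e.g.
#   cat piece1 piece2 piece3 > whole_image
```

Because conv=sync pads short reads too, a source whose size is not a multiple of 32 KiB comes out slightly longer, with zeros at the end.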



Re: amrecover problem: need help/advice

2003-07-20 Thread Freels, James D.




OK. I can do this. Let me summarize:

I have a tape labeled fea12. Like all the tapes, if all filesystems were backed up, it should contain 40 filesystems plus a front end and a back end.

I issued the command

mt -f /dev/tape_norewind fsf 1

42 times before the end of tape occurred and I received an I/O error message.
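Spacing forward one file at a time until mt fails, as done by hand here, can be scripted so the count comes out directly. A minimal sketch, assuming mt accepts "-f device fsf 1" as used in this thread; the function name and the MT override are hypothetical, the latter added only so the loop can be exercised without a drive.

```shell
#!/bin/sh
# Count the tape files after the current position by spacing forward one
# file at a time until mt fails (normally at the end of recorded data).
# The default device name is the one used in this thread; MT may be set
# to another command, e.g. a stub when testing without a drive.
count_tape_files() {
    device=${1:-/dev/tape_norewind}
    mt_cmd=${MT:-mt}
    n=0
    while "$mt_cmd" -f "$device" fsf 1 2>/dev/null; do
        n=$((n + 1))
    done
    echo "$n"
}

# Example: start from the beginning of the tape, then count.
#   mt -f /dev/tape_norewind rewind
#   count_tape_files /dev/tape_norewind
```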

Then I issued 

mt rewind

Then I issue

amrestore /dev/tape_norewind fea sdd9

where sdd9 is a bogus filesystem name, used to force the tape to go to the end. I get the file 0 "skipping start of tape" message, followed by 40 filesystems found, followed by

amrestore: 41: reached end of tape: date 20030618

Then I issue a mt rewind again.

Finally, I issue

amrestore /dev/tape_norewind

which should restore every file on the tape to the local disk area. Note the tape is written using gnutar.

But, when I do this, I get the following:

fea6::/holding_disk/: amrestore /dev/tape_norewind
amrestore: 0: skipping start of tape: date 20030618 label fea12
amrestore: 1: restoring fea.sda6.20030618.1
amrestore: read error: Input/output error
fea6::/holding_disk/:

which is my problem. Only a small portion (the beginning) of this file is restored. If I issue

fea6::/holding_disk/: tar tvf fea.sda6.20030618.1
drwxr-xr-x root/root 137 2003-02-10 10:38:40 ./
drwxrwx--- fea/apse 59 2000-06-30 00:19:18 ./APSE10/
drwxrwxr-x fea/apse 90 2000-08-16 17:08:31 ./APSE10/comlib/
drwxrwxr-x fea/apse 20 2000-01-14 21:48:19 ./APSE10/comlib/SAVE/
drwxrwxr-x fea/apse 581 2000-08-16 17:12:38 ./APSE10/comslv/
drwxrwxr-x fea/apse 64 2000-08-15 16:36:15 ./APSE10/fileshks/
drwxrwx--- fea/apse 6280 2000-05-04 16:56:24 ./APSE10/matlib/
drwxrwxr-x fea/apse 171 2000-08-16 17:27:43 ./APSE10/param/
drwxrwx--- fea/apse 289 2000-08-11 14:46:34 ./APSE10/prepro/
drwxrwxr-x fea/apse 24 2000-06-01 14:22:15 ./APSE10/prepro/test/
drwxrwxr-x fea/apse 133 2000-06-07 14:51:30 ./APSE10/preprof/
drwxrwxr-x fea/apse 67 2000-06-07 14:52:38 ./APSE10/preprof/NMENU/
drwxrwxr-x fea/apse 66 2000-06-01 13:52:35 ./APSE10/preprof/OMENU/
drwxrwxr-x fea/apse 50 2000-06-01 12:46:41 ./APSE10/preprof/OMENU/test/
drwxrwxr-x fea/apse 166 2000-08-15 16:29:04 ./APSE10/preprof/RMENU/
drwxrwxr-x fea/apse 80 2000-02-22 16:45:48 ./APSE10/preprof/RMENU/models/
drwxrwxr-x fea/apse 31 1998-04-08 12:20:26 ./APSE10/preprof/falloc/
drwxrwxr-x fea/apse 65 2000-05-26 19:51:18 ./APSE10/preprof/findbc/
drwxrwxr-x fea/apse 29 1998-12-15 13:05:10 ./APSE10/preprof/flip/
drwxrwxr-x fea/apse 167 2000-08-14 13:14:25 ./APSE10/preprof/interp/
drwxrwxr-x fea/apse 17 2000-05-26 19:51:39 ./APSE10/preprof/math/
drwxrwxr-x fea/apse 38 1999-01-24 00:08:39 ./APSE10/preprof/matrices/
drwxrwxr-x fea/apse 35 1998-11-20 12:59:27 ./APSE10/preprof/mkmac/
drwxrwxr-x fea/apse 83 2000-08-11 14:49:37 ./APSE10/preprof/reader/
drwxrwxr-x fea/apse 175 1998-12-15 13:21:41 ./APSE10/preprof/refine/
drwxrwxr-x fea/apse 31 2000-05-26 19:52:46 ./APSE10/preprof/rotate/
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
fea6::/holding_disk/:

It seems there is a problem with the SCSI I/O between the tape drive and the hard drive.

Any more good ideas?

On Sat, 2003-07-19 at 13:02, Gene Heskett wrote:

That doesn't sound to me as if it's the drive: since 2 drives cannot
read this tape, it's more than likely a bad tape. I don't think that's
what you wanted to hear, though...

Can you 'mt -f /device fsf nn' where nn is the number of the next file
on the tape?

This is one of the reasons one should have a tapecycle that is at
*least* 2*runspercycle*runtapes. It would be somewhat dated, but
better than nothing, to go back one dumpcycle's worth of tapes in the
sequence and recover from that.




-- 
James D. Freels, Ph.D.
[EMAIL PROTECTED]  or  [EMAIL PROTECTED]
mplayer -cache 100 http://wdvx.microcerv.net/wdvx








RE: amrecover problem: need help/advice

2003-07-20 Thread Freels, James D.




Gregor,

Thanks for responding. I also replied to Gene Heskett about his suggestion, with a little more information.

I would like to try this idea. How can I determine  from the amanda log files? Once I output the data from the tape to the hard drive, how do I convert it back to a .tar file? The amdump/amrestore is using gnutar.

Thanks...

On Sun, 2003-07-20 at 05:00, Gregor Ibic wrote:

 
try to read the tape (or chunk - fsf) with dd
dd if=/dev/nst0 of=/ bs=1024 count=
where NNN is greater than your backup job/disk entry
and see what happens

I restored a valuable tape (MTF formatted) by reading it this way and putting the pieces together.

regards,
gregor
 












RE: amrecover problem: need help/advice

2003-07-20 Thread Freels, James D.




OK. I went to the amdump log file and found that the first file should have 352 1k blocks written to the tape.

Then I issued

mt rewind
mt -f /dev/tape_norewind fsf 1
dd if=/dev/tape_norewind of=./first_file bs=1k count=352

Then the following was received at the console:

dd: reading `/dev/tape_norewind': Input/output error
64+0 records in
64+0 records out

Then, if I run more on ./first_file, the Amanda header at the start of the file gives instructions on how to untar this block of data:

dd if=./first_file bs=32k skip=1 | /bin/tar -tf-

The output to the console looks like:

./
./APSE10/
./APSE10/comlib/
./APSE10/comlib/SAVE/
./APSE10/comslv/
./APSE10/fileshks/
./APSE10/matlib/
./APSE10/param/
./APSE10/prepro/
./APSE10/prepro/test/
./APSE10/preprof/
./APSE10/preprof/NMENU/
./APSE10/preprof/OMENU/
./APSE10/preprof/OMENU/test/
1+0 records in
1+0 records out
./APSE10/preprof/RMENU/
./APSE10/preprof/RMENU/models/
./APSE10/preprof/falloc/
./APSE10/preprof/findbc/
./APSE10/preprof/flip/
./APSE10/preprof/interp/
./APSE10/preprof/math/
./APSE10/preprof/matrices/
./APSE10/preprof/mkmac/
./APSE10/preprof/reader/
./APSE10/preprof/refine/
./APSE10/preprof/rotate/
/bin/tar: Unexpected EOF in archive
/bin/tar: Error is not recoverable: exiting now
fea6::/holding_disk/:

which is about what I get from the output of amrestore. I think the key to the problem is the initial I/O error that I get from the dd command reading from the tape. It is similar to what I get from amrestore. It is as if the data is on the tape, but a SCSI problem is not letting me read it.

One additional bit of info: the server doing all the I/O is an Alpha using a Symbios SCSI card and driver. It has never had this problem before and has restored many a file in the past...
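The dd command printed in the Amanda header also answers the question of how to turn a raw image back into a .tar file: drop the first 32 KiB block (Amanda's header) and what remains is the GNU tar stream. The two functions below sketch that, assuming an uncompressed gnutar dump (a compressed one would need its decompressor in the pipe as well); the function names are hypothetical. A truncated image will still end in tar's "Unexpected EOF" at the point where the data stops.

```shell
#!/bin/sh
# Amanda dump images start with one 32 KiB header block; with an
# uncompressed gnutar dump, everything after it is a plain tar stream.

# Write IMAGE minus its header block to TARFILE.
amanda_image_to_tar() {
    image=$1
    tarfile=$2
    dd if="$image" of="$tarfile" bs=32k skip=1 2>/dev/null
}

# List the members that survive in IMAGE (same pipe as in the header).
list_amanda_image() {
    dd if="$1" bs=32k skip=1 2>/dev/null | tar -tf -
}

# Example, with the file read off tape above:
#   amanda_image_to_tar ./first_file ./first_file.tar
#   tar -tvf ./first_file.tar
```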

On Sun, 2003-07-20 at 17:44, Freels, James D. wrote:

Gregor,

Thanks for responding. I also replied to Gene Heskett about his suggestion, with a little more information.

I would like to try this idea. How can I determine  from the amanda log files? Once I output the data from the tape to the hard drive, how do I convert it back to a .tar file? The amdump/amrestore is using gnutar.

Thanks...


RE: amrecover problem: need help/advice

2003-07-20 Thread Freels, James D.




OK. I am learning from this. The number of 1k blocks on the first stored file on this tape is actually 384 and not 352. I should have looked at the taper output and not the dumper output. I issue the following multiple times:

mt rewind
mt -f /dev/tape_norewind fsf 1
dd if=/dev/tape_norewind of=./first_file bs=1k count=384

the number of records read returned is

64
288
384
384
64
64
384

when it should be 384 every time! Does this not smell of a hardware problem? Must be the SCSI card?
If so, why don't I have other SCSI errors on the hard drives?

Any ideas?
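The repeated read test can be scripted so each pass reports the full-record count from dd's "N+M records in" line; on sound hardware every pass should print the same number. A sketch, assuming GNU dd's summary format and the device name, pass count and 384-record count from this message; rewinding through the no-rewind device is a small change from the commands above, and the function names are hypothetical.

```shell
#!/bin/sh
# Print the full-record count from dd's summary ("64+0 records in" -> 64).
# ASSUMPTION: GNU dd's summary format; error lines are ignored.
records_in() {
    sed -n 's/^\([0-9][0-9]*\)+[0-9][0-9]* records in$/\1/p'
}

# Rewind, skip to the first dump file, read COUNT 1k records, and report
# how many full records dd got; repeat PASSES times.
repeat_read() {
    device=${1:-/dev/tape_norewind}
    passes=${2:-7}
    count=${3:-384}
    i=1
    while [ "$i" -le "$passes" ]; do
        mt -f "$device" rewind
        mt -f "$device" fsf 1
        # Data goes to /dev/null; dd's summary on stderr goes down the pipe.
        dd if="$device" of=/dev/null bs=1k count="$count" 2>&1 | records_in
        i=$((i + 1))
    done
}

# Example:
#   repeat_read /dev/tape_norewind 7 384
```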



Re: amrecover problem: need help/advice

2003-07-19 Thread Gene Heskett
On Saturday 19 July 2003 10:46, Freels, James D. wrote:
Hello I have used amanda for a long time and have used the amrecover
command to recover files/directories in the past without problems.
Recently, I accidentally deleted a large chunk of my home directory
 (you don't really want to know how this happened do you--long
 story). Normally, if something like this happens I do not panic
 because I know that I can recover most, if not all, of the files
 deleted.

The AMANDA log files all indicate backups have been written
 normally. Indeed, I can verify that my home directory and each
 individual file thereof is listed in the index files on the backup
 server.  However, unlike in the past, when I issue the normal

amrecover /dev/tape_norewind machine  disk-device

I get a SCSI-related error message after it finds the file on the
 tape. It searches through the tape, finds the file, and only dumps
 part of the file.  For example, if the file should be 4 GB, it is
 only dumping about 500 MB.  It is as if a buffer is overflowing or
 something like that.

I get no similar messages when creating the tape, only when trying
 to read off the tape.

Any help appreciated.  I am at a loss unless I have had a buggy tape
drive for some time and not realized it.

P.S.  I have also taken the tape to a second compatible drive on an
entirely different machine.  This drive also could not read the
 tape. Not looking good...

That doesn't sound to me as if it's the drive: since 2 drives cannot
read this tape, it's more than likely a bad tape. I don't think that's
what you wanted to hear, though...

Can you 'mt -f /device fsf nn' where nn is the number of the next file
on the tape?

This is one of the reasons one should have a tapecycle that is at
*least* 2*runspercycle*runtapes. It would be somewhat dated, but
better than nothing, to go back one dumpcycle's worth of tapes in the
sequence and recover from that.
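Gene's rule of thumb can be checked against an amanda.conf with a few lines of shell. This is a minimal sketch, assuming each of tapecycle, runspercycle and runtapes is set explicitly, with a positive number as the first word after the keyword; as far as I know Amanda treats runspercycle 0 as "same as dumpcycle", which this sketch does not handle. The function names and the example path are hypothetical.

```shell
#!/bin/sh
# Print the first field after KEY in an amanda.conf-style file
# ("tapecycle 10 tapes" -> 10). Comment lines never match because
# their first field starts with '#'.
conf_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Compare tapecycle with 2 * runspercycle * runtapes; print the result and
# return non-zero when tapecycle is too short.
check_tapecycle() {
    conf=$1
    tapecycle=$(conf_value tapecycle "$conf")
    runspercycle=$(conf_value runspercycle "$conf")
    runtapes=$(conf_value runtapes "$conf")
    needed=$((2 * runspercycle * runtapes))
    if [ "$tapecycle" -ge "$needed" ]; then
        echo "ok: tapecycle $tapecycle >= $needed"
    else
        echo "short: tapecycle $tapecycle < $needed"
        return 1
    fi
}

# Example (path is hypothetical):
#   check_tapecycle /etc/amanda/DailySet1/amanda.conf
```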
