Re: amrecover problem: need help/advice
The problem was my SCSI card driver. Reviewing my kernel configuration files, I found that between 2.4.14 and 2.4.15 (I am now running 2.4.21) I switched to the new sym53c8xx_2 driver. I recompiled my kernel with the older NCR53c7,8xx driver (there are also ncr53c8xx and sym53c8xx drivers). It won't use the full speed of the card and drives, but it will read my tapes!

The strange thing is that it writes the tapes fine, but has problems reading them. If I have time, I will investigate further and post to the debian-alpha mailing list to see whether similar problems have occurred on other machines.

On Sun, 2003-07-20 at 23:23, Gene Heskett wrote:
> On Sunday 20 July 2003 18:31, Freels, James D. wrote:
> >OK. I am learning from this. The number of 1k blocks on the first
> >stored file on this tape is actually 384 and not 352. I should have
> >looked at the taper output and not the dumper output. I issue the
> >following multiple times:
> >
> >mt rewind
> >mt -f=/dev/tape_norewind fsf 1
> >dd if=/dev/tape_norewind of=./first_file bs=1k count=384
> >
> >the number of records read returned is
> >
> >64
> >288
> >384
> >384
> >64
> >64
> >384
> >
> >when it should be 384 every time ! Does this not smell of a
> > hardware problem ? Must be the scsi card ?
>
> Yes it does on the face of it.
>
> >If so, why don't I have other scsi errors on the hard drives ?
> >
> >Any ideas ?
>
> The only additional one I keep stumbling over is that maybe this card
> is so optimized for disk usage that it's not quite kosher for a tape
> drive.
>
> If you are running the disks on this same card, a second ugly thought
> comes to mind, something that historically seems to be abused by tape
> drives more than by disk drives: the honoring of the 'scsi
> disconnect'. That is the situation where a command is issued to a
> device on the scsi bus and an immediate disconnect is done, opening up
> the bus for use by other programs and such. It is then up to the
> device the command was issued to to notify the host that the command
> has been done, and that any data read (or written, for that matter) is
> now ready in the buffers to be read at the host's and controller's
> convenience. Many tape drives ignore the disconnect message and lock
> the bus from use by other devices until the command is completed. I'm
> not saying that it couldn't happen to a disk, but if it's a true
> scsi-2 or better, it should honor it.
>
> You might investigate that aspect of it, and see if there are any
> workarounds, starting with the card's own BIOS configuration available
> during the 'post' of a reboot. If not, then that question might be
> answered by putting in a different card for the tape drive by itself,
> so that it doesn't have to share a bus with disks that in all
> probability do honor a disconnect correctly.
>
> Other than those 2 possibilities, I'm fresh out of ideas.
>
> >On Sun, 2003-07-20 at 05:00, Gregor Ibic wrote:
> >>
> >> try to read the tape (or chunk - fsf) with dd
> >> dd if=/dev/nst0 of=/ bs=1024 count=NNN
> >> where NNN is greater than your backup job/disk entry
> >> and see what happens
> >>
> >> i restored some valuable tape (MTF formatted) reading this way and
> >> putting the pieces together.
> >>
> >> regards,
> >> gregor
>
> --
> Cheers, Gene
> AMD [EMAIL PROTECTED] 320M
> [EMAIL PROTECTED] 512M
> 99.26% setiathome rank, not too shabby for a WV hillbilly
> Yahoo.com attornies please note, additions to this message
> by Gene Heskett are:
> Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

--
James D. Freels, Ph.D.
Oak Ridge National Laboratory
[EMAIL PROTECTED]
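For anyone wanting to check which of these drivers a given kernel build enables, here is a minimal sketch. It assumes the relevant kernel config options all begin with CONFIG_SCSI_NCR53C or CONFIG_SCSI_SYM53C; that naming pattern and the function name enabled_53c_drivers are assumptions, not something stated in the message above.

```shell
#!/bin/sh
# List the enabled NCR/Symbios 53c-family SCSI driver options found in a
# kernel config file.
# Assumption: the option names start with CONFIG_SCSI_NCR53C or
# CONFIG_SCSI_SYM53C (matched case-insensitively).
enabled_53c_drivers() {
    # Lines such as "# CONFIG_FOO is not set" begin with "#", so the
    # leading ^CONFIG anchor skips them.
    grep -E -i '^CONFIG_SCSI_(NCR|SYM)53C[A-Za-z0-9_]*=(y|m)$' "$1"
}

# Example (the path is hypothetical):
#   enabled_53c_drivers /usr/src/linux/.config
```

Running it against the config files of two kernel versions and comparing the output would show whether the driver selection changed between them.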
Re: amrecover problem: need help/advice
On Sunday 20 July 2003 18:31, Freels, James D. wrote:
>OK. I am learning from this. The number of 1k blocks on the first
>stored file on this tape is actually 384 and not 352. I should have
>looked at the taper output and not the dumper output. I issue the
>following multiple times:
>
>mt rewind
>mt -f=/dev/tape_norewind fsf 1
>dd if=/dev/tape_norewind of=./first_file bs=1k count=384
>
>the number of records read returned is
>
>64
>288
>384
>384
>64
>64
>384
>
>when it should be 384 every time ! Does this not smell of a
> hardware problem ? Must be the scsi card ?

Yes it does on the face of it.

>If so, why don't I have other scsi errors on the hard drives ?
>
>Any ideas ?

The only additional one I keep stumbling over is that maybe this card is so optimized for disk usage that it's not quite kosher for a tape drive.

If you are running the disks on this same card, a second ugly thought comes to mind, something that historically seems to be abused by tape drives more than by disk drives: the honoring of the 'scsi disconnect'. That is the situation where a command is issued to a device on the scsi bus and an immediate disconnect is done, opening up the bus for use by other programs and such. It is then up to the device the command was issued to to notify the host that the command has been done, and that any data read (or written, for that matter) is now ready in the buffers to be read at the host's and controller's convenience. Many tape drives ignore the disconnect message and lock the bus from use by other devices until the command is completed. I'm not saying that it couldn't happen to a disk, but if it's a true scsi-2 or better, it should honor it.

You might investigate that aspect of it, and see if there are any workarounds, starting with the card's own BIOS configuration available during the 'post' of a reboot. If not, then that question might be answered by putting in a different card for the tape drive by itself, so that it doesn't have to share a bus with disks that in all probability do honor a disconnect correctly.

Other than those 2 possibilities, I'm fresh out of ideas.

>On Sun, 2003-07-20 at 05:00, Gregor Ibic wrote:
>>
>> try to read the tape (or chunk - fsf) with dd
>> dd if=/dev/nst0 of=/ bs=1024 count=NNN
>> where NNN is greater than your backup job/disk entry
>> and see what happens
>>
>> i restored some valuable tape (MTF formatted) reading this way and
>> putting the pieces together.
>>
>> regards,
>> gregor

--
Cheers, Gene
AMD [EMAIL PROTECTED] 320M
[EMAIL PROTECTED] 512M
99.26% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.
RE: amrecover problem: need help/advice
OK. I am learning from this. The number of 1k blocks on the first stored file on this tape is actually 384 and not 352. I should have looked at the taper output and not the dumper output. I issue the following multiple times:

mt rewind
mt -f=/dev/tape_norewind fsf 1
dd if=/dev/tape_norewind of=./first_file bs=1k count=384

The number of records read returned is

64
288
384
384
64
64
384

when it should be 384 every time! Does this not smell of a hardware problem? Must be the SCSI card? If so, why don't I have other SCSI errors on the hard drives?

Any ideas?

On Sun, 2003-07-20 at 05:00, Gregor Ibic wrote:
> try to read the tape (or chunk - fsf) with dd
> dd if=/dev/nst0 of=/ bs=1024 count=NNN
> where NNN is greater than your backup job/disk entry
> and see what happens
>
> i restored some valuable tape (MTF formatted) reading this way and
> putting the pieces together.
>
> regards,
> gregor

--
James D. Freels, Ph.D.
[EMAIL PROTECTED] or [EMAIL PROTECTED]
mplayer -cache 100 http://wdvx.microcerv.net/wdvx
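The repeated read test described in this message can be scripted so that each attempt prints only the number of full records dd managed to read. This is a minimal sketch, not a tested tape procedure: the function name records_read is made up for illustration, and the source is a parameter so the function can be tried on an ordinary file before pointing it at a tape device.

```shell
#!/bin/sh
# Read up to COUNT records of 1 KiB from SRC, discard the data, and print
# how many full records dd reports having read.
# records_read is a hypothetical helper name, not part of Amanda or mt.
records_read() {
    src=$1
    count=$2
    # dd writes "N+M records in" to stderr; N is the number of full records.
    # LC_ALL=C keeps that message in its untranslated form.
    LC_ALL=C dd if="$src" of=/dev/null bs=1k count="$count" 2>&1 |
        sed -n 's/^\([0-9][0-9]*\)+[0-9][0-9]* records in$/\1/p'
}

# Example loop for the test in the message (device name from the message;
# the tape is repositioned before every attempt):
#   for i in 1 2 3 4 5 6 7; do
#       mt -f /dev/tape_norewind rewind
#       mt -f /dev/tape_norewind fsf 1
#       records_read /dev/tape_norewind 384
#   done
```

A healthy read path should print the same count on every pass; varying counts like those reported above point at the transport rather than at the data on the tape.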
RE: amrecover problem: need help/advice
OK. I went to the amdump log file and found that the first file should have 352 1k blocks written to the tape. Then I issued

mt rewind
mt -f /dev/tape_norewind fsf 1
dd if=/dev/tape_norewind of=./first_file bs=1k count=352

Then the following was received at the console:

dd: reading `/dev/tape_norewind': Input/output error
64+0 records in
64+0 records out

Then if I issue a "more ./first_file", I get instructions on how to untar this block of data, with the following:

dd if=./first_file bs=32k skip=1 | /bin/tar -tf-

The output to the console looks like:

./
./APSE10/
./APSE10/comlib/
./APSE10/comlib/SAVE/
./APSE10/comslv/
./APSE10/fileshks/
./APSE10/matlib/
./APSE10/param/
./APSE10/prepro/
./APSE10/prepro/test/
./APSE10/preprof/
./APSE10/preprof/NMENU/
./APSE10/preprof/OMENU/
./APSE10/preprof/OMENU/test/
1+0 records in
1+0 records out
./APSE10/preprof/RMENU/
./APSE10/preprof/RMENU/models/
./APSE10/preprof/falloc/
./APSE10/preprof/findbc/
./APSE10/preprof/flip/
./APSE10/preprof/interp/
./APSE10/preprof/math/
./APSE10/preprof/matrices/
./APSE10/preprof/mkmac/
./APSE10/preprof/reader/
./APSE10/preprof/refine/
./APSE10/preprof/rotate/
/bin/tar: Unexpected EOF in archive
/bin/tar: Error is not recoverable: exiting now
fea6::/holding_disk/:

which is about what I get from the output of amrestore.

I think the key to the problem is the initial I/O error that I get from the dd statement with the tape as input device. It is similar to what I get from amrestore. It is as if the data is on the tape, but a SCSI problem is not letting me read it from the tape.

One additional bit of info: the server doing all the I/O is an Alpha using a Symbios SCSI card/driver. It has never had this problem before and has restored many a file in the past...

On Sun, 2003-07-20 at 17:44, Freels, James D. wrote:
> Gregor,
>
> Thanks for responding. I also responded back to Gene Heskett with his
> suggestion and a little more information.
>
> I would like to try this idea. How can I determine NNN from the amanda
> log files ? Once I output the data from the tape to the drive, how do
> I convert it back to a .tar file ? The amdump/amrestore is using
> gnutar.
>
> Thanks...
>
> On Sun, 2003-07-20 at 05:00, Gregor Ibic wrote:
> > try to read the tape (or chunk - fsf) with dd
> > dd if=/dev/nst0 of=/ bs=1024 count=NNN
> > where NNN is greater than your backup job/disk entry
> > and see what happens
> >
> > i restored some valuable tape (MTF formatted) reading this way and
> > putting the pieces together.
> >
> > regards,
> > gregor
>
> --
> James D. Freels, Ph.D.
> [EMAIL PROTECTED] or [EMAIL PROTECTED]
> mplayer -cache 100 http://wdvx.microcerv.net/wdvx

--
James D. Freels, Ph.D.
[EMAIL PROTECTED] or [EMAIL PROTECTED]
mplayer -cache 100 http://wdvx.microcerv.net/wdvx
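On the question of turning the raw dd output back into a .tar file: the instructions embedded in the dump (the `dd ... bs=32k skip=1` line quoted in this message) imply that the first 32 KiB block is a text header and the archive starts right after it. A minimal sketch under that assumption follows; show_header and strip_header are hypothetical helper names, not Amanda commands.

```shell
#!/bin/sh
# Print the text header stored in the first 32 KiB block of a dump image,
# dropping the NUL padding that fills the rest of the block.
show_header() {
    dd if="$1" bs=32k count=1 2>/dev/null | tr -d '\000'
}

# Copy everything after the first 32 KiB block into a plain tar file.
strip_header() {
    dd if="$1" of="$2" bs=32k skip=1 2>/dev/null
}

# Example (file names follow the message above; the .tar name is made up):
#   show_header ./first_file
#   strip_header ./first_file ./first_file.tar && tar -tf ./first_file.tar
```

This only removes the header. If the image was cut short by a read error, as in this thread, tar will still stop with "Unexpected EOF in archive" at the point where the data ends.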
RE: amrecover problem: need help/advice
Gregor,

Thanks for responding. I also responded back to Gene Heskett with his suggestion and a little more information.

I would like to try this idea. How can I determine NNN from the amanda log files? Once I output the data from the tape to the drive, how do I convert it back to a .tar file? The amdump/amrestore is using gnutar.

Thanks...

On Sun, 2003-07-20 at 05:00, Gregor Ibic wrote:
> try to read the tape (or chunk - fsf) with dd
> dd if=/dev/nst0 of=/ bs=1024 count=NNN
> where NNN is greater than your backup job/disk entry
> and see what happens
>
> i restored some valuable tape (MTF formatted) reading this way and
> putting the pieces together.
>
> regards,
> gregor

--
James D. Freels, Ph.D.
[EMAIL PROTECTED] or [EMAIL PROTECTED]
mplayer -cache 100 http://wdvx.microcerv.net/wdvx
Re: amrecover problem: need help/advice
OK. I can do this. Let me summarize:

I have a tape labeled "fea12". This tape, like all the tapes, should contain 40 filesystems + front end + back end if all filesystems were backed up. I issued the command

mt -f /dev/tape_norewind fsf 1

42 times before the end of tape occurred and I received an I/O error message. Then I issued

mt rewind

Then I issue

amrestore /dev/tape_norewind fea sdd9

where sdd9 is a bogus filesystem name to force the tape to go to the end. I get a "0 skip header stuff", followed by 40 filesystem finds, then followed by

amrestore: 41: reached end of tape: date 20030618

Then I issue a "mt rewind" again. Finally, I issue

amrestore /dev/tape_norewind

which should restore every file on the tape to the local disk area. Note the tape is written using gnutar. But when I do this, I get the following:

fea6::/holding_disk/: amrestore /dev/tape_norewind
amrestore: 0: skipping start of tape: date 20030618 label fea12
amrestore: 1: restoring fea.sda6.20030618.1
amrestore: read error: Input/output error
fea6::/holding_disk/:

which is my problem. Only a small portion (the beginning) of this file is restored. If I issue

fea6::/holding_disk/: tar tvf fea.sda6.20030618.1
drwxr-xr-x root/root 137 2003-02-10 10:38:40 ./
drwxrwx--- fea/apse 59 2000-06-30 00:19:18 ./APSE10/
drwxrwxr-x fea/apse 90 2000-08-16 17:08:31 ./APSE10/comlib/
drwxrwxr-x fea/apse 20 2000-01-14 21:48:19 ./APSE10/comlib/SAVE/
drwxrwxr-x fea/apse 581 2000-08-16 17:12:38 ./APSE10/comslv/
drwxrwxr-x fea/apse 64 2000-08-15 16:36:15 ./APSE10/fileshks/
drwxrwx--- fea/apse 6280 2000-05-04 16:56:24 ./APSE10/matlib/
drwxrwxr-x fea/apse 171 2000-08-16 17:27:43 ./APSE10/param/
drwxrwx--- fea/apse 289 2000-08-11 14:46:34 ./APSE10/prepro/
drwxrwxr-x fea/apse 24 2000-06-01 14:22:15 ./APSE10/prepro/test/
drwxrwxr-x fea/apse 133 2000-06-07 14:51:30 ./APSE10/preprof/
drwxrwxr-x fea/apse 67 2000-06-07 14:52:38 ./APSE10/preprof/NMENU/
drwxrwxr-x fea/apse 66 2000-06-01 13:52:35 ./APSE10/preprof/OMENU/
drwxrwxr-x fea/apse 50 2000-06-01 12:46:41 ./APSE10/preprof/OMENU/test/
drwxrwxr-x fea/apse 166 2000-08-15 16:29:04 ./APSE10/preprof/RMENU/
drwxrwxr-x fea/apse 80 2000-02-22 16:45:48 ./APSE10/preprof/RMENU/models/
drwxrwxr-x fea/apse 31 1998-04-08 12:20:26 ./APSE10/preprof/falloc/
drwxrwxr-x fea/apse 65 2000-05-26 19:51:18 ./APSE10/preprof/findbc/
drwxrwxr-x fea/apse 29 1998-12-15 13:05:10 ./APSE10/preprof/flip/
drwxrwxr-x fea/apse 167 2000-08-14 13:14:25 ./APSE10/preprof/interp/
drwxrwxr-x fea/apse 17 2000-05-26 19:51:39 ./APSE10/preprof/math/
drwxrwxr-x fea/apse 38 1999-01-24 00:08:39 ./APSE10/preprof/matrices/
drwxrwxr-x fea/apse 35 1998-11-20 12:59:27 ./APSE10/preprof/mkmac/
drwxrwxr-x fea/apse 83 2000-08-11 14:49:37 ./APSE10/preprof/reader/
drwxrwxr-x fea/apse 175 1998-12-15 13:21:41 ./APSE10/preprof/refine/
drwxrwxr-x fea/apse 31 2000-05-26 19:52:46 ./APSE10/preprof/rotate/
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
fea6::/holding_disk/:

It seems there is a problem with the SCSI I/O between the tape drive and the hard drive.

Any more good ideas?

On Sat, 2003-07-19 at 13:02, Gene Heskett wrote:
> That doesn't sound as if it's the drive to me. Since 2 drives cannot
> read this tape, it's more than likely a bad tape. I don't think
> that's what you wanted to hear though...
>
> Can you 'mt -f /device fsf nn' where nn is the number of the next
> file on the tape?
>
> This is one of the reasons one should have a tapecycle that is at
> *least* 2*runspercycle*runtapes. It would be somewhat dated, but
> better than nothing, to go back one dumpcycle's worth of tapes in the
> sequence and recover from that.

--
James D. Freels, Ph.D.
[EMAIL PROTECTED] or [EMAIL PROTECTED]
mplayer -cache 100 http://wdvx.microcerv.net/wdvx
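Stepping through the tape with "fsf 1" until it fails, as described at the top of this message, can be wrapped in a small loop. This is only a sketch: count_tape_files is a made-up name, and the MT variable is an assumption added here so the loop can be dry-run against a stand-in command instead of a real drive.

```shell
#!/bin/sh
# Count how many times "fsf 1" succeeds, starting from the beginning of
# the tape. MT may name a substitute command for a dry run; it defaults
# to the real mt.
count_tape_files() {
    dev=$1
    n=0
    "${MT:-mt}" -f "$dev" rewind || return 1
    # Each successful fsf skips one file; the first failure ends the count.
    while "${MT:-mt}" -f "$dev" fsf 1 2>/dev/null; do
        n=$((n + 1))
    done
    echo "$n"
}

# Example (device name taken from the message above):
#   count_tape_files /dev/tape_norewind
```

The tape is left wherever the last fsf stopped, so a rewind is needed before any restore attempt.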
RE: amrecover problem: need help/advice
try to read the tape (or chunk - fsf) with dd

dd if=/dev/nst0 of=/ bs=1024 count=NNN

where NNN is greater than your backup job/disk entry, and see what happens.

i restored some valuable tape (MTF formatted) reading this way and putting the pieces together.

regards,
gregor
Re: amrecover problem: need help/advice
On Saturday 19 July 2003 10:46, Freels, James D. wrote:
>Hello I have used amanda for a long time and have used the amrecover
>command to recover files/directories in the past without problems.
>Recently, I accidentally deleted a large chunk of my home directory
> (you don't really want to know how this happened do you--long
> story). Normally, if something like this happens I do not panic
> because I know that I can recover most, if not all, of the files
> deleted.
>
>The AMANDA log files all indicate backups have been written
> normally. Indeed, I can verify that my home directory and each
> individual file thereof is listed in the index files on the backup
> server. However, unlike in the past, when I issue the normal
>
>amrecover /dev/tape_norewind machine disk-device
>
>I get a scsi-related error message after it finds the file on the
> tape. It searches through the tape, finds the file, and only dumps
> part of the file. For example, if the file should be 4GB, it is
> only dumping about 500 MB. It is as if a buffer is overflowing or
> something like that.
>
>I get no similar messages when creating the tape, only when trying
> to read off the tape.
>
>Any help appreciated. I am at a loss unless I have had a buggy tape
>drive for some time and not realized it.
>
>P.S. I have also taken the tape to a second compatible drive on an
>entirely different machine. This drive also could not read the
> tape. Not looking good...

That doesn't sound as if it's the drive to me. Since 2 drives cannot read this tape, it's more than likely a bad tape. I don't think that's what you wanted to hear though...

Can you 'mt -f /device fsf nn' where nn is the number of the next file on the tape?

This is one of the reasons one should have a tapecycle that is at *least* 2*runspercycle*runtapes. It would be somewhat dated, but better than nothing, to go back one dumpcycle's worth of tapes in the sequence and recover from that.

--
Cheers, Gene
AMD [EMAIL PROTECTED] 320M
[EMAIL PROTECTED] 512M
99.26% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.
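The tapecycle rule of thumb in this message is simple arithmetic. A small sketch with made-up numbers follows; the values 5 and 1 are hypothetical and not taken from anyone's configuration in this thread.

```shell
#!/bin/sh
# Smallest tapecycle that satisfies the rule
#   tapecycle >= 2 * runspercycle * runtapes
min_tapecycle() {
    runspercycle=$1
    runtapes=$2
    echo $(( 2 * runspercycle * runtapes ))
}

# Hypothetical setup: 5 runs per dump cycle, 1 tape per run.
min_tapecycle 5 1    # prints 10
```

With at least that many tapes in rotation, a full earlier dump cycle is still on tape when one tape in the current cycle turns out to be unreadable.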