Re: running amverify (a 2nd time)

2001-01-18 Thread Denise Ives

amverify is stuck again at the same point where it seemed to hang
yesterday.


The lights on the tape drive are on but not blinking.

Amanda's process activity:

ps -ef | grep amanda
  amanda 29615 29614  2 13:12:42 pts/13   0:26 /usr/local/pkg/amanda-2.4.1p1/sbin/amrestore -h -p /dev/rmt/0cbn
  amanda 21102 21080  0 12:33:55 pts/13   0:00 -bash
  amanda 29620 29614  2 13:12:42 pts/13   0:21 cat
  amanda 29614 21154  0 13:12:42 pts/13   0:00 /bin/sh /usr/local/sbin/amverify daily
  amanda 21154 21102  0 12:33:59 pts/13   0:00 /bin/sh /usr/local/sbin/amverify daily

ps -fu amanda
     UID   PID  PPID  C    STIME TTY      TIME CMD
  amanda 29615 29614  4 13:12:42 pts/13   0:32 /usr/local/pkg/amanda-2.4.1p1/sbin/amrestore -h -p /dev/rmt/0cbn
  amanda 21102 21080  0 12:33:55 pts/13   0:00 -bash
  amanda 29620 29614  3 13:12:42 pts/13   0:27 cat
  amanda 29614 21154  0 13:12:42 pts/13   0:00 /bin/sh /usr/local/sbin/amverify daily
  amanda 21154 21102  0 12:33:59 pts/13   0:00 /bin/sh /usr/local/sbin/amverify daily


Here is where it is stuck:

amanda@sundev1 [amanda] % amverify daily
No tape changer...
Tape device is /dev/rmt/0cbn...
Verify summary to [EMAIL PROTECTED]
Defects file is /tmp/amverify.21154/defects
amverify daily
Thu Jan 18 12:34:01 EST 2001

Using device /dev/rmt/0cbn
Volume daily119, Date 20010113
Skipped admin1.corp.walid.com.sda3.20010113.0 (** Cannot do /sbin/dump dumps)
Skipped admin1.corp.walid.com.sda6.20010113.0 (** Cannot do /sbin/dump dumps)
Checked sundev1.corp.walid.com.c0t0d0s3.20010113.0
Skipped admin1.corp.walid.com.sda2.20010113.0 (** Cannot do /sbin/dump dumps)
Checked sundev1.corp.walid.com.c0t0d0s0.20010113.0
Skipped admin1.corp.walid.com.sda5.20010113.0 (** Cannot do /sbin/dump dumps)
Checked sundev1.corp.walid.com.c0t0d0s7.20010113.0
Reading...

lsof:
We tried to install lsof on our Solaris box two months ago, but it
failed to run after the install; we got a 32/64-bit error message.


More importantly, since amverify is failing, does that imply I will
not be able to amrestore/amrecover the image
sundev1.corp.walid.com.c0t0d0s7.20010113.0 on this tape?
I have not yet tried to use amrestore/amrecover on any of the files
currently on this Amanda tape (daily119).

Thanks again.



-- Forwarded message --
Date: Thu, 18 Jan 2001 00:17:55 -0500
From: John R. Jackson [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
To: Denise Ives [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: running amverify 

> ... is there any way to find out
> if amverify was hung or if it was just taking a long time to do its thing?

Looking for blinking lights on the drive is the first thing that pops to
mind :-).

Next, I'd get a ps listing of what was running, "ps -fu amanda-user".
If you see it sitting on "sleep" repeatedly with no more output, it's
probably hung waiting on the drive to go ready or something like that.
If it's sitting on GNU tar or dd, it's probably skipping through an image
(i.e. doing what it's supposed to).

Next, I'd get "lsof":

  ftp://vic.cc.purdue.edu/pub/tools/unix/lsof/lsof.tar.gz

This will let you see what offset various file descriptors are at (among
a bajillion other things), so you can run it on whatever processes have
the tape open and see if they are moving.
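
The ps check can also be scripted. What follows is a minimal sketch and not part of Amanda: it compares the TIME (accumulated CPU) column for one process across two ps samples. The helper names cpu_seconds and made_progress are made up for illustration, and the parser assumes TIME is printed as mm:ss or hh:mm:ss. Growing CPU time suggests the process is still working; flat CPU time is only a hint that it may be waiting on the drive, since a process blocked on slow I/O also uses little CPU.

```shell
#!/bin/sh
# Hypothetical helpers (not Amanda tools) for comparing two ps samples.

# Convert a ps TIME value (mm:ss or hh:mm:ss) to whole seconds.
cpu_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Succeed (exit status 0) if the second TIME value is larger than the first.
made_progress() {
    before=$(cpu_seconds "$1")
    after=$(cpu_seconds "$2")
    [ "$after" -gt "$before" ]
}

# Example with the amrestore TIME values from the two listings in this thread:
if made_progress 0:26 0:32; then
    echo "CPU time is growing: probably still reading"
else
    echo "CPU time is flat: possibly waiting on the drive"
fi
```

In the two listings earlier in this thread, amrestore went from 0:26 to 0:32 and cat from 0:21 to 0:27, which is consistent with the verify still running.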

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]





Re: running amverify (a 2nd time)

2001-01-18 Thread John R. Jackson

>   amanda 29620 29614  3 13:12:42 pts/13   0:27 cat

This says it was collecting the rest of the file from tape and throwing
it away (which is normal).

> Checked sundev1.corp.walid.com.c0t0d0s7.20010113.0
> Reading...

As does this.

> I guess the image Check on sundev1.corp.walid.com.c0t0d0s7.20010113.0
> was what took so long? thoughts...

What thoughts are you looking for?

How big was the image?  How fast is your tape drive?  Was the image
compressed?  How busy was the system when you were running amverify?
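
Those questions matter because amverify reads each image off the tape in full, so the time per image is roughly its on-tape size divided by the drive's sustained read rate. A rough sketch of that arithmetic follows. The function name estimate_minutes and the example figures (3000 MB at 1024 KB/s) are made up for illustration and are not taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: rough minutes needed to read an image from tape.
# $1 = image size on tape in MB, $2 = sustained drive read rate in KB/s
estimate_minutes() {
    awk -v mb="$1" -v kbs="$2" 'BEGIN { printf "%.0f\n", (mb * 1024 / kbs) / 60 }'
}

# Made-up example: a 3000 MB image on a drive that sustains 1024 KB/s.
estimate_minutes 3000 1024
```

With those made-up figures the estimate is 50 minutes. A busy system or a drive that cannot stream continuously would stretch that further.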

> lsof:
> We tried to get lsof installed on our solaris box 2 months ago but it
> failed to run after the install - we got a 32/64 bit error message.

I assume you then contacted the author (who happens to be my boss) and
he got it fixed for you, right?  That program runs on bajillions of OS's
and versions.  It would be truly amazing if it does not work at your site.

> More importantly - since amverify is failing - does that imply I will
> not be able to amrestore/amrecover the image
> sundev1.corp.walid.com.c0t0d0s7.20010113.0 on this tape?

No.  First, amverify is not failing (as you now know).  However, if it
had failed with an I/O error or some kind of consistency failure from your
dump program, you might or might not still be able to restore at least
some of the files.  Or it could have been a dirty drive, etc.

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]



running amverify

2001-01-17 Thread Denise Ives

amverify was taking forever, so I aborted it. Is there any way to find out
whether amverify was hung or just taking a long time to do its thing?


amverify daily
Wed Jan 17 17:08:13 EST 2001

Using device /dev/rmt/0cbn
Volume daily119, Date 20010113
Skipped admin1.corp.walid.com.sda3.20010113.0 (** Cannot do /sbin/dump dumps)
Skipped admin1.corp.walid.com.sda6.20010113.0 (** Cannot do /sbin/dump dumps)
Checked sundev1.corp.walid.com.c0t0d0s3.20010113.0
Skipped admin1.corp.walid.com.sda2.20010113.0 (** Cannot do /sbin/dump dumps)
Checked sundev1.corp.walid.com.c0t0d0s0.20010113.0
Skipped admin1.corp.walid.com.sda5.20010113.0 (** Cannot do /sbin/dump dumps)
Checked sundev1.corp.walid.com.c0t0d0s7.20010113.0


** results - amanda report **


Subject: daily AMANDA VERIFY REPORT FOR daily119 

Tapes: daily119 
Errors found: 
aborted!

amverify daily
Wed Jan 17 17:08:13 EST 2001

Using device /dev/rmt/0cbn
Volume daily119, Date 20010113
Skipped admin1.corp.walid.com.sda3.20010113.0 (** Cannot do /sbin/dump dumps)
Skipped admin1.corp.walid.com.sda6.20010113.0 (** Cannot do /sbin/dump dumps)
Checked sundev1.corp.walid.com.c0t0d0s3.20010113.0
Skipped admin1.corp.walid.com.sda2.20010113.0 (** Cannot do /sbin/dump dumps)
Checked sundev1.corp.walid.com.c0t0d0s0.20010113.0
Skipped admin1.corp.walid.com.sda5.20010113.0 (** Cannot do /sbin/dump dumps)
Checked sundev1.corp.walid.com.c0t0d0s7.20010113.0
aborted!