Yep.  I've done an audit on EVERY volume in the library.  More than once.

Also, since the primary pool is collocated and the offsite pool is not, it
is frequently impossible to tell WHICH primary volume needs to be restored.

You see the error on the reclaim of the copy pool tapes; since we normally
hit many tapes in a reclaim cycle, you find these errors in the log and
can't really tell which primary pool tape is the culprit.

And when I can tell, the RESTORE VOLUME doesn't always fix it, and I end up
deleting the remaining few files on the copy pool tape, on the assumption
that TSM will pick them up again the next time I do BACKUP STGPOOL.

So far, I'm baffled.

I see maybe 50 files a week with this error, while millions have gone
through reclaim.  And it's only the copy pool tapes that seem to have a
problem, so I haven't been scared enough by it yet to shut down and audit
the DB.  I'm thinking of doing that over the New Year holiday.






-----Original Message-----
From: Bill Colwell [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 13, 2001 5:05 PM
To: [EMAIL PROTECTED]
Subject: Re: Antwort: Re: ANR9999D with a strange message


Have you done 'audit volume fix=yes'?  If these errors occur
during offsite reclaim, then the problem is on the primary tapes.

I had problems like this a long time ago, nothing recent (knock on wood).

I would do a move data on the primary volume until it quit with
the error, then audit the volume with fix=yes, then do move data again
until only bad files were on the tape, then do a restore volume.

I never, ever had to do a discarddata on a primary volume; restore
volume has always worked.
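
For reference, the sequence above can be written down as a short Python
sketch.  It only builds the admin command strings in order so they can be
pasted into an administrative session; it does not talk to the server.
The volume name VOL001 and the function name recovery_commands are made
up for illustration, and the "until it quit" / "until only bad files"
steps still need a person watching the activity log.

```python
def recovery_commands(volume):
    """Return the admin commands for the procedure above, in order.

    `volume` is a hypothetical primary-pool volume name; this function
    only formats strings and never contacts the server.
    """
    return [
        f"move data {volume}",             # run until it quits with the error
        f"audit volume {volume} fix=yes",  # audit the volume, fix=yes
        f"move data {volume}",             # repeat until only bad files remain
        f"restore volume {volume}",        # restore what is left from the copy pool
    ]


if __name__ == "__main__":
    # VOL001 is a placeholder; substitute the real primary volume name.
    for command in recovery_commands("VOL001"):
        print(command)
```

The two move data steps are listed once each, but in practice each may
have to be rerun several times before moving on.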


--
--------------------------
Bill Colwell
C. S. Draper Lab
Cambridge, Ma.
[EMAIL PROTECTED]
--------------------------


In <[EMAIL PROTECTED]>, on 11/13/01 at 05:04 PM,
   "Prather, Wanda" <[EMAIL PROTECTED]> said:

>The problem is here we don't have any idea what is being deleted if we
>delete it.
>These errors occur on RECLAIMS for COPY POOL tapes.  So you may be getting
>errors on stuff that is months old, and the client can't back it up again.
>If I could look at the contents of the tapes that won't reclaim and tell
>whether there is a good copy in the primary pool, or a later backup copy,
>that would be fine.  But running contents on the tape won't tell you that.

>And of course the big issue is WHY are we getting these errors?  They are
>getting scary, as they shouldn't be there at all.  They are not the result
>of physical I/O errors on any media.




>-----Original Message-----
>From: Mark Stapleton [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, November 13, 2001 12:49 PM
>To: [EMAIL PROTECTED]
>Subject: Re: Antwort: Re: ANR9999D with a strange message


>On Mon, 12 Nov 2001 09:53:51 +0100, it was written:
>>RESTORE VOLUME did not fix anything here. We are back at those
>>
>>ANR9999D ssrecons.c(2342): Actual:   Magic=1C9F3202,
>>SrvId=-61862846, SegGroupId=3512872581838733325,
>>SeqNum=805461472, converted=T.
>>
>>messages. It seems that the data were already damaged when the BACKUP
>>STORAGEPOOL command ran.
>>I am wondering now in two ways:
>>
>>2. What should we do next? Do we have other possibilities than to do a
>>DELETE VOLUME xxx DISCARDDATA=YES?

>It's not that bad a deal to delete a volume. When the clients do their
>first backup after the volume deletion, they will just back up the
>missing files again. Your biggest window of vulnerability will be a
>few hours.

>--
>Mark Stapleton ([EMAIL PROTECTED])
