Re: 3494 Volume Stealing

Seay, Paul Wed, 20 Mar 2002 01:33:19 -0800

Allen,

I do not normally boast about my knowledge of something, but I know this
area of this product as well as the engineers do, and probably better when
considering its interaction with host systems.  I have actually run the
traces and I can tell you that the buffers returned from the 3494 during an
inventory function are in error in some cases because of an internal bug in
the LM.  This bug can cause software that does no error checking to get all
balled up and do something wrong.  It is kind of like adding 1 + 1 and
expecting to always get 2.  The way the LM inventory search facility works
(TSM uses this) is it passes back 100 tapes at a time.  The first buffer has
a header on it that has the total count of entries.  The last buffer is what
is left.  Unfortunately, the buffer is not cleared, so there is residual
data in the buffer from previous calls.  The last valid entry in the last
buffer is followed by a null entry.  IBM never tested whether the count is
always right.  In fact, they determined that they have no way to guarantee
the count is correct, because they just read it from a DB2 table on the LM;
they do not count the number of entries they are returning.  So, any
application that does not scan for the null entry can end up picking up
bogus information.
Note that these buffers can contain all tapes for all hosts, not just the
ones for a specific category.  The category is in the records returned.
Usually, the counter is short and you are missing tapes, but I suppose it
could be high and cause the logic to process volumes that are not yours.
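
A minimal sketch of the defensive scan described above, in Python rather
than against the real lmcpd interface.  The Entry record, its field names,
the blank volser standing in for the null entry, and the category values
are all assumptions made for illustration; the real buffer layout is in the
3494 programmers guide.  The point is only that the scan stops at the null
entry and treats the header count as a cross-check, not as the loop bound.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical decoded inventory record (not the real LM layout)."""
    volser: str
    category: int


def collect_entries(
    buffers: Sequence[Sequence[Entry]],
    reported_count: int,
    wanted_category: int,
) -> Tuple[List[str], bool]:
    """Return (volsers in wanted_category, whether the header count matched).

    Entries are read in order until the first null entry; anything after
    it is treated as residual data from an earlier call and ignored.
    """
    seen: List[Entry] = []
    hit_null = False
    for buf in buffers:
        for entry in buf:
            # Assumption: a volser of only blanks/NULs marks the null entry.
            if entry.volser.strip("\x00 ") == "":
                hit_null = True
                break
            seen.append(entry)
        if hit_null:
            break
    # The header count is only compared, never trusted as a loop bound.
    count_matches = len(seen) == reported_count
    # Buffers can hold tapes for all hosts, so filter on the category.
    mine = [e.volser for e in seen if e.category == wanted_category]
    return mine, count_matches
```

A caller that gets count_matches back as False would log the mismatch and
still have the full list of its own volumes, instead of coming up 2 or 4
tapes short or walking into stale entries.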


Just food for thought.  There is no user code in my environment.

In the case I described below, there was only one host attached to this
library and it was non-mixed.  We just kept getting 2 tapes or 4 tapes short
of what was actually in the library.  If you read the 3494 programmers guide
you will find that there is a lot more to this than meets the eye in the
superficial mtlib command.  There is a whole set of C routines to allow you
to code to the lmcpd interface.  In fact, I am in the process of designing
an insert and categorize function for TSM.  Then, I will have some user
code.  It will work like the mainframe: automatically categorize the tapes
and check them in.  The way I am going to do this is to run a script in a
high level language (Perl), which can be recoded easily, that issues the
necessary mtlib and TSM commands.  This will significantly limit the
unsolicited message processing in the coded routine.
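
A rough sketch of the planning half of such a script, written in Python here
purely for illustration.  It assumes the inventory has already been obtained
(for example from mtlib output) as (volser, category) pairs, and it only
builds the TSM command strings as a dry run.  The library name LIB3494, the
function name plan_checkin, and the exact CHECKIN LIBVOLUME options are
assumptions to be checked against the TSM Administrator's Reference for
your level.

```python
from typing import Iterable, List, Tuple

INSERT_CATEGORY = 0xFF00  # category of freshly inserted volumes


def plan_checkin(
    inventory: Iterable[Tuple[str, int]],
    first: str,
    last: str,
    library: str = "LIB3494",  # hypothetical TSM library name
) -> List[str]:
    """Build one explicit-volume checkin command per eligible tape.

    A tape is eligible only if it is in the insert category and its
    volser falls inside this server's own range [first, last].
    Volsers are assumed to be equal-length, so plain string comparison
    orders them correctly.
    """
    commands: List[str] = []
    for volser, category in inventory:
        if category != INSERT_CATEGORY:
            continue  # already owned by somebody, leave it alone
        if not (first <= volser <= last):
            continue  # not one of ours, even though it is unowned
        # Name the volume explicitly instead of relying on SEARCH=YES.
        commands.append(
            f"CHECKIN LIBVOLUME {library} {volser} "
            "STATUS=SCRATCH DEVTYPE=3590"
        )
    return commands
```

Printing the plan before running anything keeps the destructive step
reviewable, and each server only ever names volumes in its own range.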

I know the site that is having this problem with the 4 TSM Servers.  This is
a tight environment.  Another site that has an MVS and TSM environment had
an MVS tape eaten.  Both are working on being able to reproduce this problem
at will.

This may still be a user problem, but I am betting on either the TSM code or
the Library Manager.

What everyone is asking for is security enforced per attaching node: which
tapes, by volume range, that node can act upon.  This was discussed at length
at Share in relation to TSM, because no other libraries have the kind of
smarts the 3494 or ACSLS have to even do this in hardware.  The position of
the customers, including us, is that the 3494 is an enterprise class, high
integrity, high dollar product.  Yes, the functionality is not there in the
LM, but the amount of code to do a table check of valid ranges is small.
Each host has to be identified to the 3494 library, so why not add the valid
ranges and categories too?  The big sell for IBM is that if I could secure a
library at this level, I could share it between many environments, the way
Shark disk LUN masking allows me to.  In other words, consolidate many
libraries into one.  Yes, a host could still mount its own tapes in the wrong
drive, or set the hardware scratch mount category on the wrong drive to its
own; that is its problem, as you say.  These are difficult to solve in the
library design.  But allowing any system to check in an FF00 tape or change
the category of a tape owned by another host is a problem.  By default, IBM
should make it work the way it does now: all systems can do anything to "*".
Then, you can lock it down if you want.
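
To give a feel for how small that table check is, here is a sketch under
stated assumptions.  Nothing like this exists in the LM today; the table
shape, the host names, and the use of shell-style patterns for volser
ranges and categories are all invented for illustration.  A host with no
entry falls back to "*", which is today's behavior.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# Each rule is (volser pattern, category pattern); categories are matched
# as four hex digits, e.g. "FF00".
Rule = Tuple[str, str]

# Default for any host without its own entry: everything is allowed.
DEFAULT_RULES: List[Rule] = [("*", "*")]


def is_allowed(
    table: Dict[str, List[Rule]],
    host: str,
    volser: str,
    category: int,
) -> bool:
    """True if host may act on volser while it sits in category."""
    rules = table.get(host, DEFAULT_RULES)
    cat = f"{category:04X}"
    return any(
        fnmatchcase(volser, vol_pat) and fnmatchcase(cat, cat_pat)
        for vol_pat, cat_pat in rules
    )
```

With a table such as {"tsm1": [("A*", "*")]}, tsm1 could check in A00001
from FF00 but would be refused B00001, while a host that was never locked
down keeps working exactly as it does now.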

I am going to be discussing this issue with 3494 hardware engineering in a
couple of weeks, hopefully.  It has always been an issue for us, but now that
people are having these kinds of problems, I have sturdier ground to stand on
to get IBM to provide the functionality.  And why are they having these
problems?  Because everyone wants the reliability of the 3590 platform,
already has this library installed, and says why not.


-----Original Message-----
From: Allen Barth [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 19, 2002 5:50 PM
To: [EMAIL PROTECTED]
Subject: Re: 3494 Volume Stealing


I stand by the statement that the 3494 volume claiming is working as
designed.

I have a 3494 which for the last 6 years has been used concurrently by:

  multiple non-shared os/390 lpars
  two disparate as/400 systems
  multiple rs/6k servers
  2 TSM systems

Yes, it is up to the 'host software' to maintain category limits.   In
every one of these 'host' environments, the 'host software' is a combination
of system or product software and user written code.  None of these systems
uses a pure search technique; there's always some user code to help each
system 'know' what its valid tapes are.  In some it's just a little harder
to find the user code.  I further use a different volser range for each
platform to aid in more generic user code.  I don't see any way that a
robotic tape server able to hook up to a plethora of platforms and software
could be expected to isolate categories.  Also think of error conditions.
There is no way for the 3494 to move or recategorize tapes.  Those
commands must come from attached hosts.

OK, this is all a learning curve.  Been there did that.   But I think it
works.

my .02

Al Barth




"Seay, Paul" <[EMAIL PROTECTED]>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]> 03/19/02 03:03 PM
Please respond to "ADSM: Dist Stor Manager"


        To:     [EMAIL PROTECTED]
        cc:
        Subject:        Re: 3494 Volume Stealing


Actually, Nick, we think there really is a bug.  I saw something similar once
on NetBackup.  Essentially, the 3494 inventory count got out of step with the
actual number of entries presented in NetBackup's equivalent of a SEARCH=YES
type CHECKIN.  After we ran a full offline inventory of the library the
problem would go away for a week or two and then come back.  Eventually, we
got an LM code level that apparently fixed the corruption problem.  Have not
seen it for a long time.

-----Original Message-----
From: Nicholas Cassimatis [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 19, 2002 10:07 AM
To: [EMAIL PROTECTED]
Subject: Re: 3494 Volume Stealing


This falls under the old "Measure twice, cut once" rule.  If you're sharing
a library and NOT doublechecking yourself, you're asking for trouble. Plain
and simple.  Don't ascribe a "defect" to something that is working at the
level it's designed to.  The nice thing about a shared library is you can
have a pool of "spare" tapes to assign to any server you want to, as needed.

Checkout and checkin of tape can be a destructive process, and shouldn't be
taken too lightly.

Nick Cassimatis
[EMAIL PROTECTED]

Today is the tomorrow of yesterday.




"Orville L. Lantto" <orville.lantto@DTREND.COM>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED].EDU> 03/15/2002 05:43 PM
Please respond to "ADSM: Dist Stor Manager"


        To:     [EMAIL PROTECTED]
        cc:
        Subject:        Re: 3494 Volume Stealing





The volume which was "stolen" was checked in to another TSM server with that
server's scratch category code (verified by mtlib).  Yes, this is very
disturbing!


Orville L. Lantto
Datatrend Technologies, Inc.  (http://www.datatrend.com)
121 Cheshire Lane #700
Minnetonka, MN 55305
Email: [EMAIL PROTECTED]
V: 952-931-1203
F: 952-931-1293
C: 612-770-9166




"Seay, Paul" <[EMAIL PROTECTED]>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]> 03/15/02 03:15 PM
Please respond to "ADSM: Dist Stor Manager"


        To:     [EMAIL PROTECTED]
        cc:
        Subject:        Re: 3494 Volume Stealing


Yes and no.  Once a tape is ejected from the library, when it is reinserted,
it is anybody's game because it does not belong to a specific TSM server. It
is in FF00 status.  So, if you do a checkin command with a range and
search=yes, another TSM Server could get it.  This is why I do checkin
commands with a specific volume id when I check in each tape.

At Share we have asked for a general function to be added to set up an
include table for each TSM server.  This include table would limit which
ranges of tapes are allowed to be picked up by that TSM server instance.

Now, if the tapes are already in the library and assigned a scratch or
private category and the tapes can be stolen, that is a major problem that
support needs to know about.  I have never tried to see if I can cause one
TSM to steal tapes from another TSM server this way.

-----Original Message-----
From: Orville L. Lantto [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 15, 2002 2:51 PM
To: [EMAIL PROTECTED]
Subject: 3494 Volume Stealing


I just tested a problem brought to me by one of my clients.  They have one
3494 library shared by four TSM Servers.  Using 4.2.1 TSM, properly
configured with different 3494 Categories, it is possible for one TSM server
to steal a volume that is checked in to another TSM server.  This behavior
is not exhibited by 3.7.3.

Has anyone seen this?


Orville L. Lantto
Datatrend Technologies, Inc.  (http://www.datatrend.com)
121 Cheshire Lane #700
Minnetonka, MN 55305
Email: [EMAIL PROTECTED]






