Re: 3592 tape read performance

2006-08-01 Thread Richard Sims

Be careful about extrapolating 3590 concepts into other areas...

I have seen no published information to suggest that the 3592 tape
utilizes a Volume Control Region (VCR) as does the 3590.  The design
of the 3592 cartridge borrows from the LTO cartridge design in
incorporating a 4 KB Cartridge Memory (CM) chip for the recording of
various information about the cartridge and media.  3592 tape load
time is improved over 3590 load time in that the 3592 can read its CM
in parallel during the loading process, as opposed to the 3590 having
to position to and digest VCR info before proceeding with the load.
If 3592 tape errors are detected, this information is recorded on the
CM to allow the next drive to learn that the tape is degraded.

Taking a long time to traverse a given area of tape is typically a
manifestation of retries, incited possibly by media defects or
environmental contaminants, as well as other issues previously
mentioned in this thread.  There is obviously no linear scan issue
where a single file is being operated upon.

I would advise taking a comprehensive look at your tape
environment...everything from air quality to drive microcode to brand
media quality.  You might cut open a particularly bad cartridge and
perform a post mortem.  (In doing this on a bad 3590 cartridge I
discovered that the tape was actually rippled as it came from the
manufacturer.)

   Richard Sims

On Jul 31, 2006, at 3:09 PM, Thomas Denier wrote:


-Survoy, Bernard J [EMAIL PROTECTED] wrote: -


Sounds like you have an issue on some volumes with the VCR (volume
control region). This structure is used (among other things) to
support high speed search operations.  If the VCR is invalid or
damaged, the drive will go into a low speed (essentially sequential
scan) until you get to the data location you want (takes forever,
if you look at the drive, it looks like it is reading tape). I know
on Sun/STK enterprise class drives, we have a similar structure
called the MIR (media information record); this structure is
rebuilt during the sequential scan up to the point where you
successfully locate the data.  A subsequent access beyond this
point will rebuild the structure from that point forward.


Just how slow is the linear scan? I am in the process of executing
a 'move data' for a problem volume. The process moved 22 gigabytes
in the first hour or so, and has ostensibly spent almost 5 hours
working on a single 4.7 gigabyte file. How long should I wait for
this one file before I conclude that I have a problem other than
lost VCR information?

What do I do if I decide at some point that the 'move data' is a
lost cause, and want to try 'restore volume' instead? The 'cancel
process' command is cleverly designed to be useless in this sort
of situation; the process will not end until it finishes the current
file. Is there a way to get TSM to stop a data movement process
in mid-file?


Re: 3592 tape read performance

2006-07-31 Thread Rainer Wolf

Hi Thomas,
3592-J1A, TSM server 5.3.3.2 on Solaris 10.
We have had the same thing happen, and have also removed 2 or 3 tapes
from the library.
It was really annoying because a tape could spend 24 hours
reading constantly, but at only a few kBytes per second.

In our case the cause seems to be both the old firmware in our
tape drives and the tapes themselves (here: IBM-labeled, not the Fuji ones).

The tape support technician who checked the drives described
2 things that can go wrong with the tapes:
one is that the built-in brakes in the cartridge may occasionally
malfunction, leading to heavy repositioning work by the drive.
The other is that the tape material is slightly stuck together; that
can happen with brand-new tapes and may disappear once the tape has
been used over its whole length.

The firmware update here was a little complicated, because
at first the drives seemed to be gone.
After the drives were reset, the server system also had to be restarted.
You should also check for the latest tape driver (IBMTape).

Because everything now seems to be fine, we may test the
problem tapes again to see whether they now work better.


Using the latest tape-drive and Solaris OS versions, I am very pleased
that the Unix 'iostat' utility now shows current statistics for the
tape drives too, not only for the disks as before our update.
I recently located one drive that was running a migration process at
nearly 100 % busy with a write speed of only around 5 MB/s over time.
After moving that process to another drive (same data, same destination
tape volume), it ran normally and was 10 times faster. There were no
errors at all; I have just called service.
So you may also want to look at iostat (e.g. 'iostat -x 5') to see
whether your drives show up there as well.

for example that is really no 'problem-output' :-) :
  extended device statistics
device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
IBMtape6   0.0  348.3    0.0  89166.2   0.0   0.6    1.7   0  58
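
The "busy but slow" pattern described above can be checked mechanically. What follows is only a minimal sketch, assuming the Solaris 'iostat -x' column order (device, r/s, w/s, kr/s, kw/s, wait, actv, svc_t, %w, %b). The function names, the two thresholds, and the second sample line (IBMtape3) are hypothetical and not taken from any real system.

```python
# Sketch: flag tape devices in 'iostat -x' output that are busy but slow.
# Assumed column order: device r/s w/s kr/s kw/s wait actv svc_t %w %b

COLUMNS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv", "svc_t", "%w", "%b"]


def parse_iostat_line(line):
    """Split one 'iostat -x' device line into a dict of named fields."""
    fields = line.split()
    stats = {"device": fields[0]}
    for name, value in zip(COLUMNS, fields[1:]):
        stats[name] = float(value)
    return stats


def looks_degraded(stats, busy_pct=90.0, min_mb_per_s=10.0):
    """True when the drive is nearly saturated yet moving little data.

    Both thresholds are illustrative guesses, not vendor figures.
    """
    mb_per_s = (stats["kr/s"] + stats["kw/s"]) / 1024.0
    return stats["%b"] >= busy_pct and mb_per_s < min_mb_per_s


# First line: the healthy sample from the post. Second line: made-up values
# resembling the slow drive described in the text (about 5 MB/s, ~100 % busy).
samples = [
    "IBMtape6 0.0 348.3 0.0 89166.2 0.0 0.6 1.7 0 58",
    "IBMtape3 0.0 20.0 0.0 5120.0 0.0 1.0 50.0 0 99",
]

for line in samples:
    stats = parse_iostat_line(line)
    print(stats["device"], "degraded" if looks_degraded(stats) else "ok")
```

In practice the lines would come from a running 'iostat -x 5' rather than a fixed list, and the thresholds would need tuning to the drive model and the kind of data being written.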


Greetings
Rainer





Thomas Denier schrieb:

We are seeing increasingly frequent problems reading data from 3592
tapes. TSM sometimes spends as much as a couple of hours reading a
single file with a size of a few hundred megabytes. In some cases,
TSM reports a hardware or media error at the end of that time. In
other cases TSM eventually reads the file successfully. In the
latter case there are, as far as we can tell, no error indications
at all: no TSM messages, nothing logged by the OS, and no indicators
on the front panel of the tape drive. In some cases the same tape
volume suffers this type of problem repeatedly. The problems seem
to spread roughly evenly over our whole population of 3592 drives.

We have just removed one 3592 volume from service because of
recurrent read problems, and are about to remove a second volume
from service. We only have about 120 3592 volumes, and losing two
of them within a week is disturbing, to put it mildly. The
possibility that the volumes with non-recurring (so far) problems
will eventually need replacement is even more disturbing.

Our TSM server is at 5.2.6.0, running under mainframe Linux. The
3592 tape drives are all the J1A model.
Does anyone have any suggestions for getting to the bottom of this?





--

Rainer Wolf  eMail:   [EMAIL PROTECTED]
kiz - Abt. Infrastruktur   Tel/Fax:  ++49 731 50-22482/22471
Universitaet Ulm wwweb:http://kiz.uni-ulm.de


Re: 3592 tape read performance

2006-07-31 Thread Thomas Denier
-Survoy, Bernard J [EMAIL PROTECTED] wrote: -

Sounds like you have an issue on some volumes with the VCR (volume
control region). This structure is used (among other things) to
support high speed search operations.  If the VCR is invalid or
damaged, the drive will go into a low speed (essentially sequential
scan) until you get to the data location you want (takes forever,
if you look at the drive, it looks like it is reading tape). I know
on Sun/STK enterprise class drives, we have a similar structure
called the MIR (media information record); this structure is
rebuilt during the sequential scan up to the point where you
successfully locate the data.  A subsequent access beyond this
point will rebuild the structure from that point forward.

Just how slow is the linear scan? I am in the process of executing
a 'move data' for a problem volume. The process moved 22 gigabytes
in the first hour or so, and has ostensibly spent almost 5 hours
working on a single 4.7 gigabyte file. How long should I wait for
this one file before I conclude that I have a problem other than
lost VCR information?

What do I do if I decide at some point that the 'move data' is a
lost cause, and want to try 'restore volume' instead? The 'cancel
process' command is cleverly designed to be useless in this sort
of situation; the process will not end until it finishes the current
file. Is there a way to get TSM to stop a data movement process in mid-file?
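
For scale, the figures quoted in this message can be turned into average transfer rates. This is a back-of-the-envelope sketch only: it assumes decimal units (1 GB = 1000 MB), which the post does not specify, and the helper name is made up for illustration.

```python
# Average throughput implied by the 'move data' figures in the message above.
# Assumption: decimal units (1 GB = 1000 MB); the post does not say.


def mb_per_second(gigabytes, hours):
    """Average transfer rate in MB/s for a given amount of data and time."""
    return gigabytes * 1000.0 / (hours * 3600.0)


first_hour = mb_per_second(22.0, 1.0)    # 22 GB moved in roughly one hour
problem_file = mb_per_second(4.7, 5.0)   # upper bound: file not yet finished

print(f"first hour:   {first_hour:.2f} MB/s")
print(f"problem file: at most {problem_file:.2f} MB/s")
print(f"slowdown:     at least {first_hour / problem_file:.0f}x")
```

Under these assumptions the first hour averaged roughly 6.1 MB/s, while the 4.7 GB file has averaged at most about 0.26 MB/s. Because the file was still unfinished after 5 hours, the real rate is lower still.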


Re: 3592 tape read performance

2006-07-31 Thread Kauffman, Tom
There *is* a way to get TSM to stop data movement mid-file. It's ugly.
And I've had to use it in similar circumstances.

Halt the TSM server.

I still don't understand *why* we can't get a cancel process with force
option.

Tom Kauffman
NIBCO, Inc

-Original Message-
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of
Thomas Denier
Sent: Monday, July 31, 2006 3:09 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: 3592 tape read performance

-Survoy, Bernard J [EMAIL PROTECTED] wrote: -

Sounds like you have an issue on some volumes with the VCR (volume
control region). This structure is used (among other things) to
support high speed search operations.  If the VCR is invalid or
damaged, the drive will go into a low speed (essentially sequential
scan) until you get to the data location you want (takes forever,
if you look at the drive, it looks like it is reading tape). I know
on Sun/STK enterprise class drives, we have a similar structure
called the MIR (media information record); this structure is
rebuilt during the sequential scan up to the point where you
successfully locate the data.  A subsequent access beyond this
point will rebuild the structure from that point forward.

Just how slow is the linear scan? I am in the process of executing
a 'move data' for a problem volume. The process moved 22 gigabytes
in the first hour or so, and has ostensibly spent almost 5 hours
working on a single 4.7 gigabyte file. How long should I wait for
this one file before I conclude that I have a problem other than
lost VCR information?

What do I do if I decide at some point that the 'move data' is a
lost cause, and want to try 'restore volume' instead? The 'cancel
process' command is cleverly designed to be useless in this sort
of situation; the process will not end until it finishes the current
file. Is there a way to get TSM to stop a data movement process
in mid-file?



Re: 3592 tape read performance

2006-07-30 Thread Richard Sims

Thomas -

We have 3592 drives as well, but have not experienced the kind of
read/write problems you are experiencing.  We find the 3592 to be a
superb, reliable tape technology.

Absent from your posting is mention of tape drive cleaning, so that
may be a cause. The 3592 is an extension of 3590 technology, and both
utilize cleaning cartridges to keep the drives performing reliably.
In AIX, there is thorough device error logging (unfortunately absent
in Linux), such that in AIX we can run the basic command 'errpt -R
3592' to look for instances where the drives said they needed
cleaning. This drive declaration is then satisfied by the 3494
Library Manager, where the 3592 drive is in such a library.

You also say that there are no TSM messages about 3592 errors. This
brings into question whether you have TapeAlert turned on, which
would help compensate for the Linux error recording deficiency.

All the usual environmental considerations apply to minimize tape
errors: clean air around the library and drives, clean transport and
careful handling of tapes entering and leaving the library (e.g.,
offsite storage), etc.

   Richard Sims


Re: 3592 tape read performance

2006-07-30 Thread Darby, Mark
In addition to Richard Sims' consistently excellent and insightful
response, one should also ensure that the microcode/firmware levels are
up to date - but "if it ain't broke, don't fix it" also applies.

We have noticed significant behavioral differences between
microcode/firmware levels and have experienced (and resolved) problems
like you mention with microcode/firmware updates.

We have had several media appear to become completely unreadable (in 4-5
independent attempts on different drives) but which, following a
firmware/microcode update, exhibited no subsequent problem.

We have also experienced extreme performance degradation (dropping to
KB/sec rates) when a brand new 3592 drive (a maintenance replacement)
had to be immediately replaced (again) - and this problem was never
explained.


-Original Message-
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of
Thomas Denier
Sent: Friday, July 28, 2006 4:15 PM
To: ADSM-L@VM.MARIST.EDU
Subject: 3592 tape read performance


We are seeing increasingly frequent problems reading data from 3592
tapes. TSM sometimes spends as much as a couple of hours reading a
single file with a size of a few hundred megabytes. In some cases,
TSM reports a hardware or media error at the end of that time. In
other cases TSM eventually reads the file successfully. In the
latter case there are, as far as we can tell, no error indications
at all: no TSM messages, nothing logged by the OS, and no indicators
on the front panel of the tape drive. In some cases the same tape
volume suffers this type of problem repeatedly. The problems seem
to spread roughly evenly over our whole population of 3592 drives.

We have just removed one 3592 volume from service because of
recurrent read problems, and are about to remove a second volume
from service. We only have about 120 3592 volumes, and losing two
of them within a week is disturbing, to put it mildly. The
possibility that the volumes with non-recurring (so far) problems
will eventually need replacement is even more disturbing.

Our TSM server is at 5.2.6.0, running under mainframe Linux. The
3592 tape drives are all the J1A model.
Does anyone have any suggestions for getting to the bottom of this?


3592 tape read performance

2006-07-28 Thread Thomas Denier
We are seeing increasingly frequent problems reading data from 3592
tapes. TSM sometimes spends as much as a couple of hours reading a
single file with a size of a few hundred megabytes. In some cases,
TSM reports a hardware or media error at the end of that time. In
other cases TSM eventually reads the file successfully. In the
latter case there are, as far as we can tell, no error indications
at all: no TSM messages, nothing logged by the OS, and no indicators
on the front panel of the tape drive. In some cases the same tape
volume suffers this type of problem repeatedly. The problems seem
to spread roughly evenly over our whole population of 3592 drives.

We have just removed one 3592 volume from service because of
recurrent read problems, and are about to remove a second volume
from service. We only have about 120 3592 volumes, and losing two
of them within a week is disturbing, to put it mildly. The
possibility that the volumes with non-recurring (so far) problems
will eventually need replacement is even more disturbing.

Our TSM server is at 5.2.6.0, running under mainframe Linux. The
3592 tape drives are all the J1A model.
Does anyone have any suggestions for getting to the bottom of this?