Re: 3592 tape read performance
Be careful about extrapolating 3590 concepts into other areas. I have seen no published information to suggest that the 3592 tape utilizes a Volume Control Region (VCR) as the 3590 does. The design of the 3592 cartridge borrows from the LTO cartridge design in incorporating a 4 KB Cartridge Memory (CM) chip for recording various information about the cartridge and media. 3592 load time is improved over 3590 load time in that the 3592 can read its CM in parallel with the loading process, whereas the 3590 has to position to and digest the VCR information before proceeding with the load. If 3592 tape errors are detected, this information is recorded in the CM so that the next drive can learn that the tape is degraded.

Taking a long time to traverse a given area of tape is typically a manifestation of retries, possibly incited by media defects or environmental contaminants, as well as by other issues previously mentioned in this thread. There is obviously no linear scan issue where a single file is being operated upon.

I would advise taking a comprehensive look at your tape environment: everything from air quality to drive microcode to media brand quality. You might cut open a particularly bad cartridge and perform a post mortem. (In doing this on a bad 3590 cartridge, I discovered that the tape was actually rippled as it came from the manufacturer.)

Richard Sims

(In reply to Thomas Denier's message of Jul 31, 2006, 3:09 PM.)
Re: 3592 tape read performance
Hi Thomas,

We run 3592-J1A drives with TSM server 5.3.3.2 on Solaris 10 and have had the same thing happen; we also removed 2 or 3 tapes from the library. It was really annoying, because a tape could be read constantly for 24 hours at only a few kB per second.

In our case the cause seems to be both the old firmware in the tape drives and the tapes themselves (here: IBM-labeled, not the Fuji ones). The tape support technician who checked the drives described two things that can go wrong with the tapes. One is that the built-in brakes in the cartridges occasionally malfunction, which leads to heavy positioning work by the drive. The other is that the tape material sticks slightly; this can happen with brand-new tapes and may disappear once the tape has been used over its whole length.

The firmware update here was a little complicated, because at first the drives seemed to be gone. After resetting the drives, the server system also had to be restarted. You should also check for the latest tape driver (IBMTape). Because everything now seems to be fine, we may test the problem tapes again to see whether they work better.

With the latest tape driver and Solaris OS version, I am pleased that the Unix 'iostat' utility now shows current statistics for the tape drives too, not only for the disks as before our update. I recently found one drive running a migration process that was constantly nearly 100% busy with a write speed of roughly 5 MB/s the whole time. After moving that process to another drive (same data, same destination tape volume), it ran normally and was 10 times faster. No errors at all; I have just called service. So you may also want to look at iostat (e.g. 'iostat -x 5') to see whether your drives show up there.

For example, this is really no 'problem output' :-) :

                    extended device statistics
    device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
    IBMtape6   0.0  348.3    0.0  89166.2   0.0   0.6    1.7   0  58

Greetings
Rainer

--
Rainer Wolf eMail: [EMAIL PROTECTED]
kiz - Abt. Infrastruktur Tel/Fax: ++49 731 50-22482/22471
Universitaet Ulm wwweb:http://kiz.uni-ulm.de
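The iostat comparison Rainer describes (a drive that is nearly 100% busy yet writes only about 5 MB/s, against a healthy drive at 58% busy) can be sketched in a few lines of Python. This is only an illustration: the ten-column layout is assumed from his sample row, the thresholds are arbitrary guesses and not vendor guidance, and the "IBMtape3" row is made up to mimic the slow drive he mentions.

```python
from typing import NamedTuple


class TapeStat(NamedTuple):
    device: str
    write_mb_s: float  # kw/s column converted from kB/s to MB/s
    busy_pct: float    # %b column


def parse_iostat_row(row: str) -> TapeStat:
    """Parse one device row of 'iostat -x' output.

    Assumes the ten-column layout of Rainer's sample:
    device r/s w/s kr/s kw/s wait actv svc_t %w %b
    """
    fields = row.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}: {row!r}")
    return TapeStat(fields[0], float(fields[4]) / 1024.0, float(fields[9]))


def looks_degraded(stat: TapeStat,
                   min_busy_pct: float = 90.0,
                   max_write_mb_s: float = 10.0) -> bool:
    """Flag a drive that is almost always busy but moves little data.

    Both thresholds are assumptions chosen for illustration only.
    """
    return stat.busy_pct >= min_busy_pct and stat.write_mb_s <= max_write_mb_s


# Row taken from the message above.
good = parse_iostat_row("IBMtape6 0.0 348.3 0.0 89166.2 0.0 0.6 1.7 0 58")
# Hypothetical row resembling the slow drive (about 5 MB/s, 99% busy).
slow = parse_iostat_row("IBMtape3 0.0 20.0 0.0 5120.0 0.0 1.0 50.0 0 99")

for stat in (good, slow):
    status = "check this drive" if looks_degraded(stat) else "ok"
    print(f"{stat.device}: {stat.write_mb_s:.1f} MB/s, {stat.busy_pct:.0f}% busy, {status}")
```

A check like this only makes sense while a write-heavy process such as a migration is running on the drive; for a read workload the kr/s column would be the one to watch.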
Re: 3592 tape read performance
Survoy, Bernard J [EMAIL PROTECTED] wrote:

> Sounds like you have an issue on some volumes with the VCR (volume control region). This structure is used (among other things) to support high speed search operations. If the VCR is invalid or damaged, the drive will go into a low speed mode (essentially a sequential scan) until you get to the data location you want. This takes forever; if you look at the drive, it looks like it is reading tape.
>
> I know on Sun/STK enterprise class drives, we have a similar structure called the MIR (media information record); this structure is rebuilt during the sequential scan up to the point where you successfully locate the data. A subsequent access beyond this point will rebuild the structure from that point forward.

Just how slow is the linear scan? I am in the process of executing a 'move data' for a problem volume. The process moved 22 gigabytes in the first hour or so, and has ostensibly spent almost 5 hours working on a single 4.7 gigabyte file. How long should I wait for this one file before I conclude that I have a problem other than lost VCR information?

What do I do if I decide at some point that the 'move data' is a lost cause, and want to try 'restore volume' instead? The 'cancel process' command is cleverly designed to be useless in this sort of situation; the process will not end until it finishes the current file. Is there a way to get TSM to stop a data movement process in mid-file?
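To put the figures in this message in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes 1 GB = 1000 MB and takes "the first hour or so" as exactly 1 hour and "almost 5 hours" as exactly 5 hours, so the results are rough. Because the 4.7 GB file had not finished when the message was written, the second figure is an upper bound on the average rate for that file.

```python
def mb_per_second(gigabytes: float, hours: float) -> float:
    """Average throughput in MB/s, taking 1 GB as 1000 MB."""
    return gigabytes * 1000.0 / (hours * 3600.0)


# Figures quoted in the message; the durations are rounded assumptions.
first_hour = mb_per_second(22.0, 1.0)   # 22 GB moved in about 1 hour
stuck_file = mb_per_second(4.7, 5.0)    # at most 4.7 GB in about 5 hours

print(f"first hour: {first_hour:.1f} MB/s")
print(f"stuck file: at most {stuck_file:.2f} MB/s")
print(f"slowdown:   at least {first_hour / stuck_file:.0f}x")
```

Under these assumptions the process ran at about 6.1 MB/s in the first hour and at no more than about 0.26 MB/s on the problem file, a slowdown of at least roughly 23 times.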
Re: 3592 tape read performance
There *is* a way to get TSM to stop data movement mid-file. It's ugly, and I've had to use it in similar circumstances: halt the TSM server.

I still don't understand *why* we can't get a 'cancel process' with a force option.

Tom Kauffman
NIBCO, Inc

(In reply to Thomas Denier's message of Monday, July 31, 2006, 3:09 PM.)
Re: 3592 tape read performance
Thomas -

We have 3592 drives as well, but have not experienced the kind of read/write problems you are experiencing. We find the 3592 to be a superb, reliable tape technology.

Absent from your posting is mention of tape drive cleaning, so that may be a cause. The 3592 is an extension of 3590 technology, and both utilize cleaning cartridges to keep the drives performing reliably. In AIX, there is thorough device error logging (unfortunately absent in Linux), such that in AIX we can run the basic command 'errpt -R 3592' to look for instances where the drives said they needed cleaning. This drive declaration is then satisfied by the 3494 Library Manager, where the 3592 drive is in such a library.

You also say that there are no TSM messages about 3592 errors. This brings into question whether you have TapeAlert turned on, which would help compensate for the Linux error recording deficiency.

All the usual environmental considerations apply to minimize tape errors: clean air around the library and drives, clean transport and careful handling of tapes entering and leaving the library (e.g., offsite storage), etc.

Richard Sims
Re: 3592 tape read performance
In addition to Richard Sims' consistently excellent and insightful response, one should also ensure that the microcode/firmware levels are up to date, although "if it ain't broke, don't fix it" applies as well.

We have noticed significant behavioral differences between microcode/firmware levels and have experienced (and resolved) problems like the ones you mention with microcode/firmware updates. We have had several media appear to become completely unreadable (in 4-5 independent attempts on different drives) which, following a firmware/microcode update, exhibited no subsequent problem.

We have also experienced extreme performance degradation (dropping to KB/sec rates) on a brand new 3592 drive (itself a maintenance replacement), which had to be replaced again immediately. This problem was never explained.

(In reply to Thomas Denier's original message of Friday, July 28, 2006, 4:15 PM.)
3592 tape read performance
We are seeing increasingly frequent problems reading data from 3592 tapes. TSM sometimes spends as much as a couple of hours reading a single file with a size of a few hundred megabytes. In some cases, TSM reports a hardware or media error at the end of that time. In other cases TSM eventually reads the file successfully. In the latter case there are, as far as we can tell, no error indications at all: no TSM messages, nothing logged by the OS, and no indicators on the front panel of the tape drive.

In some cases the same tape volume suffers this type of problem repeatedly. The problems seem to be spread roughly evenly over our whole population of 3592 drives. We have just removed one 3592 volume from service because of recurrent read problems, and are about to remove a second volume from service. We only have about 120 3592 volumes, and losing two of them within a week is disturbing, to put it mildly. The possibility that the volumes with non-recurring (so far) problems will eventually need replacement is even more disturbing.

Our TSM server is at 5.2.6.0, running under mainframe Linux. The 3592 tape drives are all the J1A model. Does anyone have any suggestions for getting to the bottom of this?