Re: Intermittent TSM Server - Database stops
Richard,

I heartily agree, and I have convinced management that it's the right way to go. But not all of us can do it. Until IBM starts shipping error-free code, someone has to find the problems.

At 07:41 AM 12/17/2003 -0500, you wrote:
...
>I would strongly urge sites to avoid, if at all possible, implementing
>environments grounded upon base-level software. Where time allows, give
>a new version time to settle down and develop a matured code base before
>adopting it, to avoid instabilities.
>
>   Richard Sims, BU

Fred Johanson
ITSM Administration
NSIT/DCS
University of Chicago
773-702-8464
Re: Intermittent TSM Server - Database stops
...
>TSM Version 5.1.0 is the server level.
>
>Taking the server up to say 5.1.6 may well fix it Richard. Unfortunately, we
>live in a far from perfect world and we have SAN agents running that need to
>go up at the same time. These agents are across a great many nodes. So
>these issues need also to be considered.
...

Stephen - That's what I was afraid of. You are running on a base-level system with NO maintenance fixes. As we've said on the List, you have to expect problems under such implementation choices. I think you now realize that you need to get to a solid maintenance level. Until then, you will have to try to circumvent the various problems that arise.

I don't know that it necessarily follows that other software in your environment would fail to interoperate with a modestly updated TSM server: in many cases, TSM component levels can be a whole version apart and continue to function well. By all means, fully investigate that.

I would strongly urge sites to avoid, if at all possible, implementing environments grounded upon base-level software. Where time allows, give a new version time to settle down and develop a matured code base before adopting it, to avoid instabilities.

   Richard Sims, BU
Re: Intermittent TSM Server - Database stops
Richard, my sincerest apologies to you and the group. I consider myself scolded! Twice!!

TSM Version 5.1.0 is the server level.

Taking the server up to, say, 5.1.6 may well fix it, Richard. Unfortunately, we live in a far from perfect world and we have SAN agents running that need to go up at the same time. These agents are across a great many nodes, so these issues also need to be considered.

Unloaddb failed. This is an undocumented feature in 5.1.0, so the unload and reload is not on.

On closer look, it also seems that simultaneous writes to different storage pools may be causing an issue. (This, we expect, is fixed in 5.2.) Once again we need to consider the SAN agents on the other machines. These need to come up to version 5.2 at the same time, and the nodes need a reboot, if I am not mistaken. This is not a simple task in our environment and will need to be phased in over time.

The upside may well be the phase-in process: we could dedicate a new 5.2 server and move the nodes across as we upgrade each one. What we need to consider is the method of eventually marrying the two machines after all is done!

In the meantime, we'll keep a close eye on database and recovery log consumption etc., until we can move up to Version 5.2. We hope there are no gotchas there. If there are gotchas, then I'd sure appreciate a posting.

Thanks again for the scolding.

Stephen :-)

-----Original Message-----
From: Richard Sims [mailto:[EMAIL PROTECTED]
Sent: 16 December, 2003 9:33 PM
To: [EMAIL PROTECTED]
Subject: Re: Intermittent TSM Server - Database stops

...
>Since then, TSM Server stops at random times. (no messages in the actlog)

The *SM server has traditionally been programmed to produce an error log in its server directory when it crashes. Have a look for such.

It could well be that simply boosting your maintenance level will resolve the issue.

A server with such issues needs to be carefully monitored. In particular, watch Database and Recovery Log consumption as the server operates through the day, and watch for space constraints and anomalies. Your surviving Activity Log from the crash vicinity may record the start of some highly-consumptive process which may need investigation.

   Richard Sims, BU
Re: Intermittent TSM Server - Database stops
...
>Since then, TSM Server stops at random times. (no messages in the actlog)

The *SM server has traditionally been programmed to produce an error log in its server directory when it crashes. Have a look for such.

You are hereby scolded for not including your software level in your posting: it could well be that simply boosting your maintenance level will resolve the issue.

A server with such issues needs to be carefully monitored. In particular, watch Database and Recovery Log consumption as the server operates through the day, and watch for space constraints and anomalies. Your surviving Activity Log from the crash vicinity may record the start of some highly-consumptive process which may need investigation.

   Richard Sims, BU
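The monitoring advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the utilization percentages are taken to be collected by hand or by a site script (for example from the Pct Util figures that QUERY DB and QUERY LOG report), and the function name `check_utilization`, the 80% threshold and the 10-point jump are hypothetical choices, not anything TSM defines.

```python
def check_utilization(samples, warn_pct=80.0, jump_pct=10.0):
    """Flag high or fast-growing utilization.

    samples: list of (label, pct_utilized) tuples in time order.
    warn_pct and jump_pct are hypothetical site-chosen limits.
    Returns a list of human-readable warning strings.
    """
    warnings = []
    previous = None
    for label, pct in samples:
        # Space constraint: utilization at or above the warning threshold.
        if pct >= warn_pct:
            warnings.append(
                f"{label}: {pct:.1f}% utilized (threshold {warn_pct:.1f}%)"
            )
        # Anomaly: a large jump since the previous sample.
        if previous is not None and pct - previous >= jump_pct:
            warnings.append(
                f"{label}: jumped {pct - previous:.1f} points since previous sample"
            )
        previous = pct
    return warnings


if __name__ == "__main__":
    # Made-up recovery log readings through one morning.
    log_samples = [("09:00", 41.0), ("10:00", 43.5), ("11:00", 62.0), ("12:00", 85.2)]
    for line in check_utilization(log_samples):
        print(line)
```

Running the same check separately on database and recovery log readings keeps the two apart, so a sudden jump can be matched against what the Activity Log shows starting around that time.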
Intermittent TSM Server - Database stops
TSM'ers,

This being our first attempt at such a process, here is the plan:

1. Perform a database backup.

2. Copy to another location (another machine):
   a) dsmserv.opt
   b) dsmserv.dsk
   c) volhist (volume history file)
   d) devconfig (device configuration file)

3. Set the server to start in quiet mode. Modify the server dsmserv.opt:
   a) Change the client ports to 15000 (stops clients from logging on).
   b) Set EXPINTERVAL to 0, as this prevents inventory expiration from starting immediately after a server startup.
   c) Add NOMIGRRECL to prevent TSM from starting space reclamation or migration.
   d) Set DISABLESCHEDS to YES (this prevents any TSM schedules from running).

4. Edit the devconfig file to contain a FILE device class on which to store the unloaded database:

      define devclass fileclass devtype=file mountlimit=5 maxcap=5G dir=/tsmtemp

5. From the server directory, run:

      DSMSERV UNLOADDB DEVclass=fileclass

We will then perform the following steps after a successful UNLOADDB. We must ensure that the files to be created for the database and recovery log volumes do not already exist on the system, otherwise the LOADFORMAT process will fail.

6. Create a file called /usr/temp/LOGVOL.TXT and edit it to contain the following:

      "/var/tsm/tsmlog/log01.dsm" 512
      "/var/tsm/tsmlog/log02.dsm" 512
      "/var/tsm/tsmlog/log03.dsm" 512
      "/var/tsm/tsmlog/log04.dsm" 512

7. Create a file called /usr/temp/DBVOL.TXT and edit it to contain the following:

      "/var/tsm/tsmdb1/db01.dsm" 5000
      "/var/tsm/tsmdb1/db02.dsm" 5000
      "/var/tsm/tsmdb1/db03.dsm" 5000
      "/var/tsm/tsmdb1/db04.dsm" 5000

8. From the server directory, issue:

      DSMSERV LOADFORMAT 4 FILE:"/usr/temp/LOGVOL.TXT" 4 FILE:"/usr/temp/DBVOL.TXT"

9. Reload the database:

      DSMSERV LOADDB DEVclass=fileclass VOLumenames="/tsmtemp/","/tsmtemp/volname2","/tsmtemp/volname3","/tsmtemp/volname4"

   If LOADDB reports any errors, then run AUDITDB, e.g.:

      DSMSERV AUDITDB FIX=YES DETAIL=YES FILE=/usr/temp/AUDITDB.TXT

10. After finishing the above, restore the original devconfig.out to its original settings.

11. Start the TSM server from the command line.

12. Perform a full DB backup.

13. HALT the TSM server.

14. Restore the original DSMSERV.OPT file.

15. Define mirror volumes (DB and log volumes, using the dsmfmt command) and define the group of mirrored volumes.

16. Restart the TSM server.

Any comments or gotchas would be greatly appreciated.

Thanks in advance,

Stephen

----- Original Message -----
From: "Pole, Stephen" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, December 16, 2003 11:36 AM
Subject: Re: Intermittent TSM Server - Database stops

> Hi all,
>
> Sorry to trouble all of you.
>
> Here is an event that happened last week during normal operations, while a
> large query was being run on the TSM database.
>
> 12/11/03 01:08:13 ANR2561I Schedule prompter contacting ICMC02RPA
>                   (session 4174) to start a scheduled operation.
> 12/11/03 01:08:32 ANR2017I Administrator OPS issued command: FETCH NEXT 50
> 12/11/03 01:08:37 ANR2958E SQL temporary table storage has been exhausted.
>
> The explanation seems pretty self-explanatory.
>
> The server was restarted and all seemed OK for about 12 hours.
>
> We have extended the database by means of another database volume and
> refrained from performing any major SQL queries etc.
>
> Since then, TSM Server stops at random times. (no messages in the actlog)
> The server just falls and has to be restarted.
>
> The only clue something has happened is in the errpt -a
>
> LABEL:           CORE_DUMP
> IDENTIFIER:      1F0B7B49
>
> Date/Time:       Mon Dec 15 23:33:11 WAUS
> Sequence Number: 40038
> Machine Id:      D62A4C00
> Node Id:         rsfm014
> Class:           S
> Type:            PERM
> Resource Name:   SYSPROC
>
> Description
> SOFTWARE PROGRAM ABNORMALLY TERMINATED
>
> Probable Causes
> SOFTWARE PROGRAM
>
> User Causes
> USER GENERATED SIGNAL
>
> Recommended Actions
> CORRECT THEN RETRY
>
> Failure Causes
> SOFTWARE PROGRAM
>
> Recommended Actions
> RERUN THE APPLICATION PROGRAM
> IF PROBLEM PERSISTS THEN DO THE FOLLOWING
> CONTACT APPROPRIATE SERVICE REPRESENTATIVE
>
> Detail Data
> SIGNAL NUMBER
> 6
> USER'S PROCESS ID:
> 44974
> FILE SYSTEM SERIAL NUMBER
> 2
> INODE NUMBER
> 215060
> PROCESSOR ID
> 0
> PROGRAM NAME
> dsmserv
> ADDITIONAL INFORMATION
> pthread_k A8
> ??
> _p_raise 64
> raise 34
> abort B8
> AbortServ 80
> TrapHandl 13C
> ??
> ??
>
> Symptom Data
> REPORTABLE
> 1
> INTERNAL ERROR
> 0
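For steps 6 and 7 of the plan in this message, and the requirement that the volume files must not already exist before LOADFORMAT, a small helper along these lines might save some typing. It is a sketch under assumptions: the helper names `write_volume_list` and `existing_volumes` are made up for illustration, the log paths are assumed to start with a leading slash like the database paths, and the numbers 512 and 5000 are assumed to be sizes in MB.

```python
from pathlib import Path


def write_volume_list(list_file, volumes):
    """Write one '"path" size' line per volume, as in LOGVOL.TXT / DBVOL.TXT.

    volumes: list of (path, size) tuples; size is assumed to be in MB.
    Returns the lines that were written.
    """
    lines = [f'"{path}" {size}' for path, size in volumes]
    Path(list_file).write_text("\n".join(lines) + "\n")
    return lines


def existing_volumes(volumes):
    """Return the volume paths that already exist on disk."""
    return [path for path, _ in volumes if Path(path).exists()]


# Volume names and sizes as listed in the plan (leading slash assumed for logs).
LOG_VOLUMES = [(f"/var/tsm/tsmlog/log{n:02d}.dsm", 512) for n in range(1, 5)]
DB_VOLUMES = [(f"/var/tsm/tsmdb1/db{n:02d}.dsm", 5000) for n in range(1, 5)]

if __name__ == "__main__":
    # Pre-flight check only; writing the list files is left to the operator.
    leftovers = existing_volumes(LOG_VOLUMES + DB_VOLUMES)
    if leftovers:
        print("Remove or rename before LOADFORMAT:")
        for path in leftovers:
            print("  " + path)
    else:
        print("No old volume files found.")
```

Calling `write_volume_list("/usr/temp/LOGVOL.TXT", LOG_VOLUMES)` and the same for DBVOL.TXT would produce the two list files; the pre-flight check is kept separate so it can be run without touching anything.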
Re: Intermittent TSM Server - Database stops
SQL queries use the TSM log. This happened to us, and we solved it by extending the TSM log size.

Before returning to normal operation, please run the TSM server in console mode; this is the best way to troubleshoot after a TSM server crash.

SN
Re: Intermittent TSM Server - Database stops
Hi all,

Sorry to trouble all of you.

Here is an event that happened last week during normal operations, while a large query was being run on the TSM database.

12/11/03 01:08:13 ANR2561I Schedule prompter contacting ICMC02RPA (session 4174) to start a scheduled operation.
12/11/03 01:08:32 ANR2017I Administrator OPS issued command: FETCH NEXT 50
12/11/03 01:08:37 ANR2958E SQL temporary table storage has been exhausted.

The explanation seems pretty self-explanatory.

The server was restarted and all seemed OK for about 12 hours.

We have extended the database by means of another database volume and refrained from performing any major SQL queries etc.

Since then, TSM Server stops at random times. (no messages in the actlog) The server just falls and has to be restarted.

The only clue something has happened is in the errpt -a

LABEL:           CORE_DUMP
IDENTIFIER:      1F0B7B49

Date/Time:       Mon Dec 15 23:33:11 WAUS
Sequence Number: 40038
Machine Id:      D62A4C00
Node Id:         rsfm014
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

Recommended Actions
CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
44974
FILE SYSTEM SERIAL NUMBER
2
INODE NUMBER
215060
PROCESSOR ID
0
PROGRAM NAME
dsmserv
ADDITIONAL INFORMATION
pthread_k A8
??
_p_raise 64
raise 34
abort B8
AbortServ 80
TrapHandl 13C
??
??

Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/dsmserv SIG/6 FLDS/AbortServ VALU/80

The end result is that TSM has gone very "flakey" and falls over at odd times.

Has anyone encountered this before? If so, what are my options?

We are looking at doing an unloaddb then reload etc. (if I can get my head around the procedure).

Any help would be greatly appreciated.

Thanks in advance, TSM'ers!

Cheers

Stephen Pole
WA Dept of Health
[EMAIL PROTECTED]
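When several of these core dump entries pile up, it can help to pull the key fields out of the errpt -a text rather than read each report by eye. The sketch below is a rough illustration, assuming the usual layout in which each Detail Data field name is followed by its value on the next line; the function name `parse_detail_data` is hypothetical, and the sample text is cut down from the report in this post. Signal 6 is SIGABRT on AIX and Linux, which fits the raise, abort and AbortServ entries in the stack.

```python
import signal


def parse_detail_data(report):
    """Pair selected field names with the value found on the following line.

    report: text of one errpt -a entry (layout assumed as described above).
    Returns a dict keyed by field name without any trailing colon.
    """
    wanted = {"SIGNAL NUMBER", "PROGRAM NAME", "USER'S PROCESS ID:"}
    # Drop blank lines and surrounding whitespace so name/value lines are adjacent.
    lines = [ln.strip() for ln in report.splitlines() if ln.strip()]
    fields = {}
    for name, value in zip(lines, lines[1:]):
        if name in wanted:
            fields[name.rstrip(":")] = value
    return fields


# Cut-down sample taken from the report in this post.
SAMPLE = """\
Detail Data
SIGNAL NUMBER
           6
USER'S PROCESS ID:
       44974
PROGRAM NAME
dsmserv
"""

fields = parse_detail_data(SAMPLE)
# Map the number to a name using the local platform's signal table.
sig = signal.Signals(int(fields["SIGNAL NUMBER"]))
print(fields["PROGRAM NAME"], "terminated by", sig.name)
```

The signal name comes from the machine running the script, so this is best run on the AIX host itself or another Unix-like system where signal 6 has the same meaning.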