Re: Intermittant TSM Server - Database stops

2003-12-17 Thread Fred Johanson
Richard,

I heartily agree and have convinced management that it's the right way to
go.  But we can't all do it.  Until IBM starts shipping error-free code,
someone has to find the problems.

At 07:41 AM 12/17/2003 -0500, you wrote:
...

I would strongly urge sites to avoid, if at all possible, implementing
environments grounded upon base-level software.  Where time allows, give a
new version time to settle down and develop a matured code base before
adopting it, to avoid instabilities.
   Richard Sims, BU

Fred Johanson
ITSM Administration
NSIT/DCS
University of Chicago
773-702-8464


Re: Intermittant TSM Server - Database stops

2003-12-17 Thread Richard Sims
...
>TSM Version 5.1.0 is the server level.
>
>Taking the server up to, say, 5.1.6 may well fix it, Richard. Unfortunately, we
>live in a far from perfect world and we have SAN agents running that need to
>go up at the same time. These agents are spread across a great many nodes, so
>these issues also need to be considered.
...

Stephen - That's what I was afraid of.  You are running on a base-level system
with NO maintenance fixes.  As we've said on the List, you have to
expect problems under such implementation choices.  I think you now realize
that you need to get to a solid maintenance level.  Until then, you will have
to try to circumvent the various problems that arise.  I don't know that it
necessarily follows that other software in your environment would fail to
interoperate with a modestly updated TSM server: in many cases, TSM component
levels can be a whole version apart and continue to function well.  By all
means, fully investigate that.

I would strongly urge sites to avoid, if at all possible, implementing
environments grounded upon base-level software.  Where time allows, give a
new version time to settle down and develop a matured code base before
adopting it, to avoid instabilities.

   Richard Sims, BU


Re: Intermittant TSM Server - Database stops

2003-12-16 Thread Pole, Stephen
Richard, my sincerest apologies to you and the group. I consider myself
scolded! Twice!!

TSM Version 5.1.0 is the server level.

Taking the server up to, say, 5.1.6 may well fix it, Richard. Unfortunately, we
live in a far from perfect world and we have SAN agents running that need to
go up at the same time. These agents are spread across a great many nodes, so
these issues also need to be considered.

UNLOADDB failed. This is an undocumented feature in 5.1.0, so the unload and
reload is not an option.

On closer inspection, it also seems that simultaneous writes to different
storage pools may be causing an issue. (We expect this is fixed in 5.2.)  Once
again, we need to consider the SAN agents on the other machines. These need
to come up to version 5.2 at the same time, and the nodes need a reboot, if
I'm not mistaken. This is not a simple task in our environment and will need
to be phased in over time. The upside may well be the phase-in process itself:
we could dedicate a new 5.2 server and move the nodes across as
we upgrade each one. What we then need to consider is the method of eventually
marrying the two machines after all is done!

In the meantime, we'll keep a close eye on database and
recovery log consumption, etc., until we can move up to Version 5.2.

We hope there are no gotchas there. If there are any, I'd sure
appreciate a posting.

Thanks again for the scolding.

Stephen :-)





-----Original Message-----
From: Richard Sims [mailto:[EMAIL PROTECTED]
Sent: 16 December, 2003 9:33 PM
To: [EMAIL PROTECTED]
Subject: Re: Intermittant TSM Server - Database stops


...
>Since then, TSM Server stops at random times. (no messages in the actlog)

The *SM server has traditionally been programmed to produce an error log in its
server directory when it crashes.  Have a look for such.

It could well be that simply boosting your maintenance level will resolve the
issue.

A server with such issues needs to be carefully monitored.  In particular, watch
Database and Recovery Log consumption as the server operates through the day, and
watch for space constraints and anomalies.  Your surviving Activity Log from the
crash vicinity may record the start of some highly-consumptive process which may
need investigation.

  Richard Sims, BU


Re: Intermittant TSM Server - Database stops

2003-12-16 Thread Richard Sims
...
>Since then, TSM Server stops at random times. (no messages in the actlog)

The *SM server has traditionally been programmed to produce an error log in its
server directory when it crashes.  Have a look for such.

You are hereby scolded for not including your software level in your posting:
it could well be that simply boosting your maintenance level will resolve the
issue.

A server with such issues needs to be carefully monitored.  In particular, watch
Database and Recovery Log consumption as the server operates through the day, and
watch for space constraints and anomalies.  Your surviving Activity Log from the
crash vicinity may record the start of some highly-consumptive process which may
need investigation.

  Richard Sims, BU
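
One way to act on the monitoring advice above is a small threshold check,
sketched here in Python. The function name, the 80% warning threshold, and
the sample readings are all assumptions made for illustration; the
percentages themselves would have to come from however you read the
utilization figures reported by QUERY DB and QUERY LOG.

```python
def check_utilization(samples, warn_pct=80.0):
    """Return warning strings for any resource at or above warn_pct.

    samples maps a resource name to its percent utilization (0-100).
    The 80% default is an arbitrary illustration, not an IBM recommendation.
    """
    warnings = []
    for name, pct in sorted(samples.items()):
        if pct >= warn_pct:
            warnings.append(f"{name} at {pct:.1f}% (threshold {warn_pct:.1f}%)")
    return warnings


# Hypothetical readings, e.g. noted by hand from QUERY DB and QUERY LOG.
readings = {"database": 62.5, "recovery log": 91.0}
for line in check_utilization(readings):
    print("WARNING:", line)
```

Running such a check at intervals through the day, and keeping the results,
gives a record of consumption leading up to any crash.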


" Intermittant TSM Server - Database stops"

2003-12-15 Thread Pole, Stephen
TSM'ers

This being our first time attempting such a process, here is our plan:

1. Perform database backup
2. Copy to another location (other machine)
a) dsmserv.opt
b) dsmserv.dsk
c) volhist (volume history file)
d) devconfig ( device config file)

3. Set the server to start in quiet mode by modifying the server's dsmserv.opt:
a) Change the client ports to 15000 (stops clients from logging on).
b) Set EXPINTERVAL to 0, as this prevents inventory expiration from
starting immediately after a server startup.
c) Add NOMIGRRECL to prevent TSM from starting space reclamation or
migration.
d) Set DISABLESCHEDS to YES (this prevents any TSM schedules from
running).

4. Edit the devconfig file to contain a FILE device class on which to store the
unloaded database:
define devclass fileclass devtype=file mountlimit=5 maxcap=5G
dir=/tsmtemp

5. From Server directory run DSMSERV UNLOADDB DEVclass=fileclass

Then we will perform the following steps after a successful UNLOADDB. This
is to ensure that the files created for the database and recovery log volumes
do not exist on the system; otherwise the LOADFORMAT process will fail.

6. Create a file called /usr/temp/LOGVOL.TXT and edit it to contain the
following:
"var/tsm/tsmlog/log01.dsm" 512
"var/tsm/tsmlog/log02.dsm" 512
"var/tsm/tsmlog/log03.dsm" 512
"var/tsm/tsmlog/log04.dsm" 512

7. Create a file called /usr/temp/DBVOL.TXT and edit it to contain the
following:
"/var/tsm/tsmdb1/db01.dsm" 5000
"/var/tsm/tsmdb1/db02.dsm" 5000
"/var/tsm/tsmdb1/db03.dsm" 5000
"/var/tsm/tsmdb1/db04.dsm" 5000

8. From the server directory, issue:
DSMSERV LOADFORMAT 4 FILE:"/usr/temp/LOGVOL.TXT" 4 FILE:"/usr/temp/DBVOL.TXT"

9. DSMSERV LOADDB DEVclass=fileclass
VOLumenames="/tsmtemp/","/tsmtemp/volname2",
"/tsmtemp/volname3","/tsmtemp/volname4"

If LOADDB throws any errors, then run AUDITDB,
e.g. DSMSERV AUDITDB FIX=YES DETAIL=YES FILE=/usr/temp/AUDITDB.TXT

10. After finishing the above, restore the original devconfig.out to its
original settings.

11. Start the TSM server from the command line.

12. Perform a full DB backup.
13. HALT the TSM server.
14. Restore the original DSMSERV.OPT file.
15. Define mirror volumes (DB and log volumes, using the dsmfmt command).

Define the group of mirrored volumes.

16. Restart the TSM server.

Any comments or gotchas would be greatly appreciated.

Thanks in advance


Stephen
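
Before running LOADFORMAT in a plan like the one above, it may be worth
sanity-checking the two volume list files. The following Python sketch does
so under stated assumptions: the helper names are hypothetical, the file
format is taken to be exactly what steps 6 and 7 show (a quoted path followed
by a size), and the only checks made are the volume count and whether each
path is absolute.

```python
import shlex


def parse_volume_list(text):
    """Parse lines of the form '"<path>" <size>' into (path, size) pairs."""
    volumes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        path, size = shlex.split(line)  # shlex strips the double quotes
        volumes.append((path, int(size)))
    return volumes


def check_volume_list(text, expected_count):
    """Return a list of problems found in a volume list file's contents."""
    volumes = parse_volume_list(text)
    problems = []
    if len(volumes) != expected_count:
        problems.append(f"expected {expected_count} volumes, found {len(volumes)}")
    for path, _size in volumes:
        if not path.startswith("/"):
            problems.append(f"path is not absolute: {path}")
    return problems


# Contents of LOGVOL.TXT exactly as given in step 6 of the plan.
LOG_LIST = '''\
"var/tsm/tsmlog/log01.dsm" 512
"var/tsm/tsmlog/log02.dsm" 512
"var/tsm/tsmlog/log03.dsm" 512
"var/tsm/tsmlog/log04.dsm" 512
'''

for problem in check_volume_list(LOG_LIST, expected_count=4):
    print(problem)
```

Run against the log list as posted, the check flags all four entries because
they lack the leading slash that the database volume paths in step 7 have.
That may simply be a typo in the posting, but it is the kind of thing worth
catching before the server tries to format the volumes.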



Re: Intermittant TSM Server - Database stops

2003-12-15 Thread Sony Priyambodo
SQL queries use the TSM log. This happened to us, and we solved it by extending
the TSM log size. Before returning to normal operation, please run the TSM server
in console mode; this is the best way to troubleshoot after a TSM server crash.

SN



Re: Intermittant TSM Server - Database stops

2003-12-15 Thread Pole, Stephen
Hi all,

Sorry to trouble all of you.

Here is an event that happened last week during normal operations, while a
large query was being run on the TSM database.

12/11/03 01:08:13 ANR2561I Schedule prompter contacting ICMC02RPA (session 4174) to start a scheduled operation.

12/11/03 01:08:32 ANR2017I Administrator OPS issued command: FETCH NEXT 50
12/11/03 01:08:37 ANR2958E SQL temporary table storage has been exhausted.

The explanation seems pretty self-explanatory.

The server was restarted and all seemed OK for about 12 hours.

We have extended the database by adding another database
volume and have refrained from performing any major SQL queries, etc.

Since then, the TSM server stops at random times (no messages in the actlog).
The server just falls over and has to be restarted.

The only clue that something has happened is in the output of errpt -a:

LABEL:           CORE_DUMP
IDENTIFIER:      1F0B7B49

Date/Time:       Mon Dec 15 23:33:11 WAUS
Sequence Number: 40038
Machine Id:      D62A4C00
Node Id:         rsfm014
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

Recommended Actions
CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
   6
USER'S PROCESS ID:
   44974
FILE SYSTEM SERIAL NUMBER
   2
INODE NUMBER
  215060
PROCESSOR ID
   0
PROGRAM NAME
dsmserv
ADDITIONAL INFORMATION
pthread_k A8
??
_p_raise 64
raise 34
abort B8
AbortServ 80
TrapHandl 13C
??
??

Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/dsmserv SIG/6 FLDS/AbortServ VALU/80
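
When the server falls over repeatedly, it can help to reduce each errpt entry
like the one above to a few key fields so that crashes can be compared side by
side. The Python sketch below is one way to do that. The function name and
the choice of fields are assumptions for illustration, and the parser only
handles the layout seen in this posting: "Key: value" fields on one line, and
detail fields such as SIGNAL NUMBER with the value on the following line.

```python
import re


def summarize_errpt_entry(text):
    """Extract a few fields from the text of one `errpt -a` entry."""
    lines = [line.strip() for line in text.splitlines()]
    summary = {}
    for i, line in enumerate(lines):
        match = re.match(r"^(LABEL|Date/Time|Node Id):\s*(.+)$", line)
        if match:
            # Single-line field such as "LABEL:  CORE_DUMP".
            summary[match.group(1)] = match.group(2)
        elif line in ("SIGNAL NUMBER", "PROGRAM NAME") and i + 1 < len(lines):
            # Detail field: the value sits on the next line.
            summary[line] = lines[i + 1]
    return summary


# Excerpt of the entry as posted.
SAMPLE = """\
LABEL:  CORE_DUMP
Date/Time:   Mon Dec 15 23:33:11 WAUS
Node Id: rsfm014
Detail Data
SIGNAL NUMBER
   6
PROGRAM NAME
dsmserv
"""

for key, value in summarize_errpt_entry(SAMPLE).items():
    print(f"{key}: {value}")
```

In this entry the signal number is 6, which is SIGABRT, consistent with the
abort and AbortServ frames in the stack listing: the server appears to have
aborted itself rather than been killed from outside.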

The end result is that TSM has gone very "flakey" and falls over at odd times.

Has anyone encountered this before? If so, what are my options?

We are looking at doing an unloaddb and then a reload, etc. (if I can get my
head around the procedure).

Any help would be greatly appreciated.

Thanks in advance TSM'ers!

Cheers



Stephen Pole
WA Dept of Health

[EMAIL PROTECTED]