Re: hidden flags to EXPIRE INV and CLEANUP EXPTABLE

2006-04-21 Thread Josh-Daniel Davis

Correction:
In the SHOW NODE output for the subkeys, the KEY is the NODE_NAME.  Field 1
is still the node number, and field 2 is the PLATFORM_NAME.

Also, beware of using SHOW NODE on wrong or random pages.

On 06.04.20 at 22:58 [EMAIL PROTECTED] wrote:


Date: Thu, 20 Apr 2006 22:58:34 -0500
From: Josh Davis [EMAIL PROTECTED]
Reply-To: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
To: ADSM-L@VM.MARIST.EDU
Subject: hidden flags to EXPIRE INV and CLEANUP EXPTABLE

UNDOCUMENTED OPTIONS FOR EXPIRE INVENTORY:
There are two undocumented/unsupported options for EXPIRE INV:
BEGINNODEID and ENDNODEID.

These accept a node's decimal node number and can be used to expire
the filespaces of a single node or of a range of nodes.



WHY WOULD YOU EVER WANT TO USE THESE?
EXPIRE INV doesn't check for a filespace lock before it starts
processing a filespace.

As a result, if a node is backing up a filespace when expire inventory
reaches that filespace, expire inventory will wait indefinitely.

When this happens, CANCEL EXPIRATION or CANCEL PROC will register as
Cancel Pending but will hang there until the lock is released.

Officially there's supposed to be a resource timeout, but IBM wasn't
able to give details on how long this is.




HOW TO FIND THE NODE NUMBERS:
Node numbers are sequential, starting at 1, and are in REG_TIME order.
Deletions leave gaps.

The short way would be a SELECT statement.  Supposedly this can be
done, but I couldn't figure out the column name.  IBM doesn't like to
give out information about undocumented/unsupported options, since that
might make them liable to support or defend those options in the future.

The long way is to use SHOW commands.  Use SHOW OBJDIR to find the
btree node for the Nodes table.  This SHOULD be 38.

SHOW NODE 38 (hopefully) will show the top level of the tree.
On average, there are about 11 second-level leaf nodes per first-level
node.

If you do SHOW NODE on each subtree, and save these to a file, you'll
have the raw data for the nodes table.

In the data section, field 1 is the node number in hex, and field 2 is
the node name in all-caps ASCII.
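Once you've saved the raw data, the hex-to-decimal step is easy to script. Below is a minimal Python sketch, assuming you have already pulled (node name, hex node number) pairs out of the SHOW NODE dump by hand. The SHOW NODE layout isn't documented, so the sketch doesn't try to parse it; the function names and sample nodes are hypothetical.

```python
def hex_to_nodeid(field: str) -> int:
    """Convert a hex node-number field such as '0000001A' or '0x1a' to decimal."""
    cleaned = field.strip().replace(" ", "").lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    return int(cleaned, 16)


def build_nodeid_map(records: list[tuple[str, str]]) -> dict[str, int]:
    """Map upper-cased node names to decimal node IDs, ordered by ID.

    `records` is a hypothetical hand-extracted list of (name, hex_id) pairs.
    """
    pairs = [(name.strip().upper(), hex_to_nodeid(hex_id)) for name, hex_id in records]
    return dict(sorted(pairs, key=lambda pair: pair[1]))


if __name__ == "__main__":
    # Hypothetical sample data, not real SHOW NODE output.
    sample = [("mailsrv", "00000014"), ("filesrv01", "0000000A")]
    for name, nid in build_nodeid_map(sample).items():
        print(f"{name} {nid}")
```

The decimal values are what the BEGINNODEID/ENDNODEID flags expect.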



OTHER USES FOR THE NODEID:
This can be used with SHOW LOCKS and SHOW THREADS to find out which
node is holding the lock preventing expire inv from continuing.  From
there, you can kill a session so that expire inv can continue or be
cancelled.

The hex node number can also be converted to decimal so you can run
EXPIRE INV BEGINNODEID=10 ENDNODEID=20 or similar to operate only on a
specific subset of nodes.  This could be used to avoid nodes with
long-running transactions, to quickly expire a huge amount of data that
was just deleted, or to set up scheduled expirations for heavy-expire
nodes.
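Since deletions leave gaps, a range such as 10-20 may include IDs that no node owns. Assuming deleted IDs are not reused, only a node you deliberately want to avoid needs to break a range. Here is a rough Python sketch that turns a list of known node IDs plus a skip list into ranges and EXPIRE INVENTORY command strings. It also assumes that BEGINNODEID and ENDNODEID are both inclusive, which the post doesn't state. The helper names are hypothetical.

```python
def expiration_ranges(node_ids: list[int], skip: set[int]) -> list[tuple[int, int]]:
    """Group known node IDs into (begin, end) ranges that never include a skipped ID.

    Gaps left by deleted nodes are bridged, on the assumption that no node
    owns those IDs.
    """
    ranges = []
    start = end = None
    for nid in sorted(set(node_ids)):
        if nid in skip:
            # Close the current range so the skipped node falls outside it.
            if start is not None:
                ranges.append((start, end))
                start = end = None
            continue
        if start is None:
            start = nid
        end = nid
    if start is not None:
        ranges.append((start, end))
    return ranges


def expire_commands(ranges: list[tuple[int, int]]) -> list[str]:
    """Render each range as an EXPIRE INVENTORY command string."""
    return [f"EXPIRE INVENTORY BEGINNODEID={b} ENDNODEID={e}" for b, e in ranges]


if __name__ == "__main__":
    known = [1, 2, 3, 5, 6, 9, 10]   # 4, 7 and 8 were deleted
    busy = {5}                        # e.g. a node with a long-running backup
    for cmd in expire_commands(expiration_ranges(known, busy)):
        print(cmd)
```

With the sample data, the busy node splits the work into two commands, 1-3 and 6-10.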

These same flags work on CLEANUP EXPTABLE.  Since there is no way to
cancel CLEANUP EXPTABLE, running it on a small subset of nodes can help
if you suspect you're not expiring everything you should be, but don't
want to risk having to shut down TSM to abort it when you're 50 million
objects in and a week past the time you started it.
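A CLEANUP EXPTABLE run can't be interrupted once started. One way to bound how much you're committing to is to cut the node list into small fixed-size batches. Below is a minimal Python sketch of that batching. It again assumes inclusive bounds, and the function name and batch size are hypothetical. The time each batch takes will still depend on how much data those nodes have.

```python
def cleanup_batches(node_ids: list[int], batch_size: int) -> list[str]:
    """Split known node IDs into CLEANUP EXPTABLE commands of batch_size nodes each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = sorted(set(node_ids))
    commands = []
    for i in range(0, len(ids), batch_size):
        chunk = ids[i:i + batch_size]
        # Each command spans the first to the last existing ID in the chunk.
        commands.append(f"CLEANUP EXPTABLE BEGINNODEID={chunk[0]} ENDNODEID={chunk[-1]}")
    return commands


if __name__ == "__main__":
    for cmd in cleanup_batches([1, 2, 3, 5, 8], batch_size=2):
        print(cmd)
```

A batch size of 1 gives the one-node-at-a-time approach.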


WHY I'M SHARING THIS INFO:
I've opened a DCR requesting EXPIRE INVENTORY be given an option to
allow detection and skipping of locked filespaces, and that it should
be implemented without killing expiration or the session/process
holding the filespace lock.

The FITS request number is MR0420061821 if you or anyone wants to be
added to the notify/me-too list for this.

If your sales rep doesn't know where/how to get to FITS, it's on
D03DB004.boulder.ibm.com.  I think it's under m_dir (marketing).

This was way longer than I anticipated, but seemed useful enough to
risk sharing.

--
Josh



Re: hidden flags to EXPIRE INV and CLEANUP EXPTABLE

2006-04-21 Thread Richard Rhodes

 UNDOCUMENTED OPTIONS FOR EXPIRE INVENTORY:
 There are two undocumented/unsupported options for EXPIRE INV;
 BEGINNODEID and ENDNODEID.

 These accept the decimal node number of a node and can be used to
 expire a specific node's filespaces, or a specific range.



 WHY WOULD YOU EVER WANT TO USE THESE?
 EXPIRE INV won't check for filespace lock before parsing a filespace.


We are currently using the CLEANUP EXPTABLE cmd with
the BEGINNODEID/ENDNODEID options . . . . . .  UNDER THE
DIRECTION OF TSM SUPPORT.

We are on TSM V5.3.2.  Back when we were on V5.1 (last year),
we were getting ANR messages that prevented us from deleting
old nodes out of TSM.  We did lots of work with support, but
the fix was to move to V5.3, which we were already planning and did.

As part of migrating to V5.3, we ran the cleanup procedure for Windows
system objects.  After upgrading, we still couldn't delete the nodes
out of TSM.  The solution was to run the CLEANUP EXPTABLE command.
We did this on our test system for each of our production databases.
Once started, this command cannot be stopped without halting the TSM
server.  Running it against a production DB on our test server took
well over 2 weeks for one TSM server, and just about 2 weeks for our
other TSM server.  After running it, we were able to delete the nodes.

OK, now we are told by the support center to run it on
production.  Remember . . . you cannot run expiration while this is
running.  We told them we could not go without expiration for that long,
and we can't just halt our TSM server at any time to stop it!!
The answer was to get the NODEIDs for all the nodes and run CLEANUP
EXPTABLE for one node at a time using BEGINNODEID/ENDNODEID.  We tried
this on one node . . . it worked, but we have well over 500 nodes on
each TSM server.

I've written a script that automatically works through a file of NODEIDs.
After each node is cleaned up, it runs expiration for some amount of
time: the longer the cleanup runs, the longer expiration runs.
I run this script Monday through Friday, cleaning up one node at a time
with some normal expiration between each cleanup.  On the weekend, I put
our normal expiration processing back in place to get a couple of good
full expiration runs.
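Rick's ksh script wasn't posted, so the following is not that script. It's a rough Python sketch of the same loop logic under these assumptions:

- the input is a file of decimal NODEIDs, one per line;
- there is a set of IDs already cleaned up;
- a hypothetical policy expires for `ratio` times the cleanup time, clamped between a floor and a ceiling.

It relies on the DURATION parameter (minutes) of EXPIRE INVENTORY, so check that your server level supports it. The functions only build command strings. Timing the cleanup and sending the commands to the server (for example through the dsmadmc administrative client) are left out.

```python
import math


def pending_node_ids(lines: list[str], done: set[int]) -> list[int]:
    """Return NODEIDs from a one-per-line list that have not been cleaned up yet.

    Blank lines and lines starting with '#' are ignored.
    """
    pending = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        nid = int(text)
        if nid not in done:
            pending.append(nid)
    return pending


def cleanup_command(node_id: int) -> str:
    """CLEANUP EXPTABLE restricted to a single node."""
    return f"CLEANUP EXPTABLE BEGINNODEID={node_id} ENDNODEID={node_id}"


def expiration_command(cleanup_minutes: float, ratio: float = 1.0,
                       floor: int = 30, ceiling: int = 480) -> str:
    """Hypothetical policy: the longer the cleanup ran, the longer expiration runs."""
    minutes = max(floor, min(ceiling, math.ceil(cleanup_minutes * ratio)))
    return f"EXPIRE INVENTORY DURATION={minutes}"


if __name__ == "__main__":
    todo = pending_node_ids(["# nodeids", "12", "13", "", "17"], done={12})
    print(todo)                       # [13, 17]
    print(cleanup_command(todo[0]))
    print(expiration_command(95))     # cleanup took 95 minutes
```

Tracking `done` across runs, for example in a second file, lets the loop pick up where it left off after the weekend break.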

It took several months to work through the first TSM server.  If I
understand the output of the command correctly, it fixed almost 15
million errors on that TSM server.  The 2nd TSM server has been going
through this process since Feb 1st and is finally getting close to
finishing: it's processed 509 out of 526 nodes.

Anyway, this is one reason to use these options.

If anyone has a similar problem, I would be happy to send them this
script (ksh).  It's highly specific to our environment, but could be
adapted easily.

rick




