Re: TSM Client Return Codes - Failed or partial failed

2011-08-31 Thread Richard Sims
One thing you can do is query the ACTLOG for client-sent message ANE4959I 
telling of more than 0 files failed, which may correlate with previous ANE4037E 
messages about files changing during processing, ANE4005E file not found, and 
the like.  (If the same files show up day after day, then it's likely that the 
client administrator is not reviewing logs for issues, in which case the server 
administrator should notify them.)
Similarly, the SUMMARY table contains a FAILED count.

Note that you can also run a query such as
  select NODE_NAME, RESULT from EVENTS where NODE_NAME is not null
to react to the return code from the client operation.

 Richard Sims  at Boston University


Re: TSM Client Return Codes - Failed or partial failed

2011-08-31 Thread Huebschman, George J.
* A client can have failed objects in the Schedule Summary report
without having a failed backup.
The return codes from the client (RC 0, RC 4, RC 8, and RC 12) determine
whether or not a Scheduled backup fails.
A Scheduled backup can FAIL with NO failed files.  That used to drive me
crazy, but that is a short trip for me.
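
As a rough illustration of the point above, here is a minimal Python sketch of how the four return codes are usually read. The one-line meanings are my paraphrase of the general client documentation, and the rule that only RC 12 marks the event as failed is taken from this post; the names RC_MEANING and schedule_failed are hypothetical, not part of any TSM tool.

```python
# Hypothetical helper, for illustration only: interpret a TSM client return code.
RC_MEANING = {
    0: "all operations completed successfully",
    4: "completed, but some files were skipped",
    8: "completed with at least one warning",
    12: "completed with at least one error (other than skipped files)",
}

def schedule_failed(rc: int) -> bool:
    """Simplifying assumption from this thread: only RC 12 fails the event."""
    return rc == 12

for rc in (0, 4, 8, 12):
    state = "FAILED" if schedule_failed(rc) else "not failed"
    print(f"RC {rc:>2}: {RC_MEANING[rc]} -> {state}")
```

Note that the count of failed objects never enters into it, which is why a schedule can fail with no failed files and succeed with several.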


* Messages like these in the Client's dsmerror log should NOT cause RC
12 return codes (FAIL):
date_time ANS4037E Object \\...\\ changed during processing. Object skipped.
date_time ANS4005E Error processing '\\path\to\file.tmp': file not found
date_time ANS1228E Sending of object '\\path\to\file_important_Workbook.xlsx' failed
date_time ANS4987E Error processing '\\path\to\file_important_Workbook.xlsx': the object is in use by another process

A "file... changed during processing" error (ANS4037E) happens
when a file changes between the inventory and the actual backup
attempt.  TSM skips it but does not fail the backup event.
A "file not found" error (ANS4005E) occurs because the client cannot
find a file that was inventoried at the beginning of the backup
event.  The client starts the Incremental backup by checking to see what
new or changed objects exist. It makes a list and then backs them up.
If the file is deleted or moved between the creation of the list and the
backup of the object (as with the .tmp files above), you get a message,
but not a failed backup.
A "file in use" error (ANS4987E) is regarded as a minor issue by
TSM as well. TSM tells you about it, but since the file is not
in a state where a good backup can be taken, it is not a fatal event.
Your TSM Server CopyGroup serialization settings determine if
TSM will try to back it up again.
The CHAngingretries option in the Client's dsm.sys or
dsm.opt file determines how many times the client should retry the file.
You may see Retry messages such as these for objects in that
situation:
08/30/2011 01:27:43 Retry # 1  Normal File-- 2,992,622
\\CIFS_Filer_name\Long\long\path\to\file_Report.pdf [Sent]
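
If you want to separate this kind of noise from messages that deserve a closer look, something along these lines may help. It is only a sketch in Python: the set of "skipped object" message IDs is just the four listed above, and the classify function and its labels are hypothetical. Anything not in the set is flagged for review rather than declared fatal.

```python
import re

# Message IDs this post describes as skipped-object messages that should not,
# by themselves, cause RC 12. The set is an assumption, not an exhaustive list.
SKIPPED_OBJECT_IDS = {"ANS4037E", "ANS4005E", "ANS1228E", "ANS4987E"}

# Client message IDs look like ANS + 4 digits + a severity letter.
MSG_ID = re.compile(r"\b(ANS\d{4}[EWIS])\b")

def classify(line: str) -> str:
    """Label one dsmerror.log line as 'skipped-object', 'review', or 'other'."""
    match = MSG_ID.search(line)
    if match is None:
        return "other"
    return "skipped-object" if match.group(1) in SKIPPED_OBJECT_IDS else "review"

sample = [
    r"date_time ANS4037E Object \\...\\ changed during processing. Object skipped.",
    r"date_time ANS4005E Error processing '\\path\to\file.tmp': file not found",
    "08/22/2011 15:43:02 ANS1512E Scheduled event '6PM-DAILY-INCR' failed. Return code = 12",
]
for line in sample:
    print(classify(line), "|", line)
```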

* Actual FAILure of the scheduled backup event is often from an
inability to access a path or (Client) domain, from inability to perform
a specific option in an options file (such as an include statement, or a
pre-scheduled command), or from permissions errors:
08/22/2011 15:34:30 ANS4013E Error processing '\\Path\to\some_file:
invalid file handle

08/22/2011 15:43:02 ANS1512E Scheduled event '6PM-DAILY-INCR' failed.
Return code = 12

08/30/2011 01:27:43 ANS4007E Error processing
'\\rrstore11a\assetmgmt\Users\ADoyle\IEfavs\Links\Customize Links.url':
access to the object is denied
These ANS4007E messages are a pain for me.  Often they indicate
that the Client is running from a profile with insufficient permissions
to access the file.  In my case these are files on a CIFS share.  The
filer, CIFS, VSCAN, and Virus software are not working together well.

On Windows clients if you are backing up SystemState and there are VSS
errors...welcome to the club.  They will fail your Scheduled backup
event.  Looking at one of mine that I see failed for VSS/SystemState
errors, there is no mention of the VSS error/failure in the TSM Server
Actlog.

* You can look for these message codes in the TSM Server ACTLOG to
determine Failed (and Missed) backups:
***MISSED/FAILED
FAILED: q ac begint=-24 msg=2579
MISSED: q ac begint=-24 msg=2578
You can also do them as selects.

Checking for this with a TSM Server query actlog msg=4959 will NOT
tell you if the backup failed:
08/30/2011 20:58:23  ANE4959I (Session: 4580433, Node:
Some_Client_Name)  Total
  number of objects failed:   4
(SESSION: 4580433)
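
If you want those counters in a usable form, a small Python sketch like the following can pull them out of the wrapped ACTLOG output. It assumes the output is wrapped exactly as in the sample shown here; the parse_summary function and the SUMMARY pattern are hypothetical and have only been checked against this one layout.

```python
import re

# Matches one client summary message after the console wrapping is undone.
SUMMARY = re.compile(
    r"(ANE\d{4}I) \(Session: (\d+), Node: (\S+?)\) "
    r"Total number of objects ([a-z ]+):\s*([\d,]+)"
)

def parse_summary(actlog_text: str) -> dict:
    """Return {(session, node): {counter_name: value}} from wrapped ACTLOG text."""
    flat = " ".join(actlog_text.split())  # collapse line wraps and runs of spaces
    result = {}
    for _msg, session, node, counter, value in SUMMARY.findall(flat):
        key = (int(session), node)
        result.setdefault(key, {})[counter] = int(value.replace(",", ""))
    return result

sample = """\
08/30/2011 20:58:23  ANE4959I (Session: 4580433, Node:
Some_Client_Name)  Total
  number of objects failed:   4
(SESSION: 4580433)
"""
counts = parse_summary(sample)
print(counts)
```

A nonzero "failed" counter from this parse is only a count of objects. Whether the scheduled event itself failed still has to come from message 2579 or from the event result.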
For example, in the summary information reported to the TSM Server by
the Client (and logged in the Actlog) there are 15 objects failed, but
the backup was successful:
08/30/2011 18:27:02  ANE4952I (Session: 4579578, Node:
SOME_CLIENT_NAME)  Total
  number of objects inspected:   44,415
(SESSION: 4579578)
08/30/2011 18:27:02  ANE4954I (Session: 4579578, Node:
SOME_CLIENT_NAME)  Total
  number of objects backed up:3,360
(SESSION: 4579578)
08/30/2011 18:27:02  ANE4958I (Session: 4579578, Node:
SOME_CLIENT_NAME)  Total
  number of objects updated:  0
(SESSION: 4579578)
08/30/2011 18:27:02  ANE4960I (Session: 4579578, Node:
SOME_CLIENT_NAME)  Total
  number of objects rebound:  0
(SESSION: 4579578)
08/30/2011 18:27:02  ANE4957I (Session: 4579578, Node:
SOME_CLIENT_NAME)  Total
  number of objects deleted:  0
(SESSION: 4579578)
08/30/2011 18:27:02  ANE4970I (Session: 4579578, Node:
SOME_CLIENT_NAME)  Total
  number of objects expired:  7
(SESSION: 4579578)

08/30/2011 18:27:02  ANE4959I (Session: 4579578,