Dear TSMville, Hanging Backup Invalidates TSM Journal
o - TSM Client 5.1.5.0, journaling engine, backing up c. 12,000,000 files with a daily addition/update of around 80,000. When it works = 40 minutes. When it doesn't = 13 hours...
o - TSM Server 5.1.6.2 WinNT

A couple of nights ago, a journal backup hung and just stayed around on the TSM server in IdleW without anyone noticing. The next day's backup began and (I'm guessing from here on) it couldn't get access to the TSM journal, so it reverted to a looooong normal incremental backup.

I subsequently spotted this, killed off the two IdleW sessions and kicked off a new backup on the journal client. However, it failed to do a journal backup and started a normal incremental again...

Looking in the dsmerror.log, I spy a 'NpOpen: Named pipe error connecting to server WaitOnPipe failed. > NpOpen: call failed with return code:121 pipe name //./pipe.jnl'. I understand that this named pipe is opened at the start of a journal backup as the b/a client attempts to connect to the journal daemon. The return code 121 (if it is the Windows system error code, that is ERROR_SEM_TIMEOUT, i.e. the wait on the pipe timed out) suggests that the connect failed, and possibly that the tsmjbbd.exe process wasn't up and running.

I look at Task Manager, and it is running, but consuming a 'healthy' 263,632K of memory. Observing its behaviour through Task Manager's useful extra columns, I see it is still doing some work under 'I/O Other', but nothing under 'I/O Writes' or 'I/O Reads'. Is this suspect?

I'm guessing either that the journal became invalidated somewhere down the line during the hung backup, or that the subsequent backup attempt failed because the old TSM backup still had a lock on it. The tsmjbbd.exe process is still present, and there is nothing from these dates in the jbberror.log. Any ideas what may be going on here? I seem to get around 6 or 7 days of JBB backups before it starts to break and I have to hand-hold it to get it up again...
In terms of automatically monitoring this, sticking a Tivoli process monitor on tsmjbbd.exe to make sure the process is running is only useful up to a point (i.e. it wouldn't have spotted the above), so it looks as though I'm going to have to trawl the stdout of our backup logs to make sure that 'using journal for x$' is present. Any ideas where else I should be looking - perhaps in (what we've called) the jbberror.log for 'Journal will be restarted for FS x'?

So, questions are:

o - Any ideas what might be behind the above? A dead/alive tsmjbbd.exe, and if so, how?
o - tsmjbbd.exe - how big should it be in 'healthy' usage? Is 263MB a bit excessive?
o - Any ideas about the best way to monitor JBB backups (preferably using Tivoli, e.g. ITM, logfile adapters etc.)?

Quite a lot there - sorry!

Rgds,
David McClelland
Global Management Systems
Reuters
85 Fleet Street, London EC4P 4AJ
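
P.S. To make the log-trawling idea above concrete, here is a minimal sketch of the kind of check I have in mind. It is only an illustration under assumptions: the function name and parameters are hypothetical, the marker text is simply the 'using journal for' string quoted above (the exact wording and case in a real backup log may differ, so the match is case-insensitive and the marker is configurable), and it assumes the backup's stdout has been captured to a text file.

```python
from typing import Iterable, List


def filesystems_missing_journal(log_lines: Iterable[str],
                                filesystems: Iterable[str],
                                marker: str = "using journal for") -> List[str]:
    """Return the filesystems for which no journal marker line was found.

    `marker` is an assumption based on the message quoted in the post;
    adjust it to whatever the backup log really prints.
    """
    marker_lc = marker.lower()
    # Keep only the (lower-cased) lines that contain the marker text.
    journal_lines = [line.lower() for line in log_lines
                     if marker_lc in line.lower()]

    missing = []
    for fs in filesystems:
        fs_lc = fs.lower()
        # The filesystem name must appear after the marker on some line.
        found = any(fs_lc in line.split(marker_lc, 1)[1]
                    for line in journal_lines)
        if not found:
            missing.append(fs)
    return missing
```

It could be fed the lines of last night's captured backup output together with the list of journalled filesystems (e.g. `["x$"]`), raising an alert if the returned list is non-empty. The same pattern, with a different marker, would cover watching the jbberror.log for 'Journal will be restarted for FS'.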