Greetings,

I have been through the archives for help with this one, but I still don't have an answer. I support a TSM 5.4.3.0 server running on AIX 5.3 ML9. It uses an EMC Disk Library (EDL) for virtual tape, configured as 64 LTO1 tape drives. This server is the library master for both AIX and Windows LAN-free clients running the 5.4.2.0 LAN-free storage agent.
We came in yesterday and found five virtual tapes mounted, but in "Retry Dismount Failure" state:

ANR8380I LTO volume V50135 is mounted R/O in drive EPC-LTO1-025 (/dev/epc-lto1-025), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50128 is mounted R/O in drive EPC-LTO1-040 (/dev/epc-lto1-040), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50097 is mounted R/O in drive EPC-LTO1-044 (/dev/epc-lto1-044), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50129 is mounted R/O in drive EPC-LTO1-047 (/dev/epc-lto1-047), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50317 is mounted R/O in drive EPC-LTO1-006 (/dev/epc-lto1-006), status: RETRY DISMOUNT FAILURE.

They have been in this state for over 24 hours now, and we can't clear them.

We can tell that this problem is caused by confusion between the library master and one of the LAN-free agents. My guess is that the LAN-free agent thinks it is finished with the drives, but that message never reaches the TSM server. Later, the TSM server's timeout logic tries to reclaim the drive, but the LAN-free storage agent still holds a SCSI reserve on the tape drive, so the TSM server can't open the drive to talk to it.

We went out to the EDL appliance and dismounted the virtual tapes from the drives, so the drives are now empty. We have tried restarting both the TSM server software and the LAN-free agent. We have rebooted the Windows server running the LAN-free agent. We have deleted and rediscovered the AIX rmt devices on the library master. All of those steps completed without error. We also ran 'update server STL-PVMCONBKP02 forcesync=yes' on the TSM server to force synchronization with the LAN-free agent, but that didn't help. The 'Retry Dismount Failure' errors still persist.

Since the session between the server and the LAN-free agent STL-PVMCONBKP02 isn't getting any errors, this is not a simple communication problem between them. Every so often we still get the following messages in the server activity log:
04/27/10 08:16:51 ANR0408I Session 11595 started for server STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library sharing. (SESSION: 11595)
04/27/10 08:16:51 ANR0408I Session 11596 started for server STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library sharing. (SESSION: 11596)
04/27/10 08:16:51 ANR0408I Session 11597 started for server STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library sharing. (SESSION: 11597)
04/27/10 08:16:51 ANR0408I Session 11598 started for server STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library sharing. (SESSION: 11598)
04/27/10 08:16:51 ANR0408I Session 11599 started for server STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library sharing. (SESSION: 11599)
04/27/10 08:16:51 ANR0409I Session 11595 ended for server STL-PVMCONBKP02 (Windows). (SESSION: 11595)
04/27/10 08:16:51 ANR0409I Session 11596 ended for server STL-PVMCONBKP02 (Windows). (SESSION: 11596)
04/27/10 08:16:51 ANR0409I Session 11597 ended for server STL-PVMCONBKP02 (Windows). (SESSION: 11597)
04/27/10 08:16:51 ANR0409I Session 11598 ended for server STL-PVMCONBKP02 (Windows). (SESSION: 11598)
04/27/10 08:16:51 ANR0409I Session 11599 ended for server STL-PVMCONBKP02 (Windows). (SESSION: 11599)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options. (SESSION: 11595)
04/27/10 08:16:51 ANR8965W The server is unable to automatically determine the serial number for the device. (SESSION: 11595)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-044, error number=16. (SESSION: 11595)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options. (SESSION: 11598)
04/27/10 08:16:51 ANR8965W The server is unable to automatically determine the serial number for the device. (SESSION: 11598)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-040, error number=16. (SESSION: 11598)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options. (SESSION: 11596)
04/27/10 08:16:51 ANR8965W The server is unable to automatically determine the serial number for the device. (SESSION: 11596)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-006, error number=16. (SESSION: 11596)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options. (SESSION: 11599)
04/27/10 08:16:51 ANR8965W The server is unable to automatically determine the serial number for the device. (SESSION: 11599)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-047, error number=16. (SESSION: 11599)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options. (SESSION: 11597)
04/27/10 08:16:51 ANR8965W The server is unable to automatically determine the serial number for the device. (SESSION: 11597)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-025, error number=16. (SESSION: 11597)

We have seen this before, but rarely, and in the past we were always able to clear it by restarting the LAN-free agent and the TSM server software. This time, that isn't working.

FYI, every once in a while the 'Retry Dismount Failure' status changes to 'Dismounting' for a few seconds, then goes back to 'Retry Dismount Failure' again, so TSM is evidently trying to do something to clear it.

Can anyone suggest a procedure for clearing this condition?

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721