VSWITCH Controller failover issue on z/VM 5.2
This weekend the LAN team upgraded the CISCO router connected to the OSA card. The VSWITCH controller console shows the message:

    DTCOSD309W Received adapter-initiated Stop Lan

after which it tries to fail over from the primary OSA to the backup OSA. Eventually I see:

    DTCOSD306I Received adapter-initiated Start Lan

and the backup OSA addresses get started fine and assign all the IP addresses.

There are several more iterations of Stop Lan and Start Lan, but it is always on the backup OSA addresses, never on the primary OSA addresses.

So the question is: why didn't it ever try to restart on the primary OSA when the backup OSA received the Stop Lan? If both OSA cards were the same it might not matter that much, but the primary is a Gig-OSA and the backup is a Fast-OSA.

Oh, and just to make it more fun, lastly there is a Stop Lan, the various normal messages about stopping, attempting to restart, and:

    AMPX036I ASSERTION FAILURE CHECKING ERROR
    TRACE BACK OF CALLED ROUTINES
    ROUTINE                     STMT  AT ADDRESS  IN MODULE
    SPSM_BLKALLOCATE              39  00D429D0    TCFPSM_FPSM
    INITDCB                       48  00DA079A    TCTOOSD_TOOSD
    TOOSDINIT                     33  00DA2DF4    TCTOOSD_TOOSD
    CALLINITRTN                   30  00CB9506    TCPARSE_PARSETCP
    PROCESSSTARTSTOPSTATEMENT     33  00CAE37E    TCPARSE_PARSETCP
    PARSEOPTION                  292  00CB3C00    TCPARSE_PARSETCP
    RECEIVECONTROLLERMSGFROMCP    25  00D9F010    TCTOOSD_TOOSD
    TOIUCV                       180  00D3D580    TCTOIUC_TOIUCV
    Schedule                    1670  00CFA118
    MAIN-PROGRAM                  14  00C441FE    TCPIP
    VSPASCAL                          00E4D74A
    DTCOSD100E Insufficient Fixed Page Storage Pool storage

after which it tries to shut down and then goes into repeated:

    AMPX015I ADDRESSING EXCEPTION

abends. Needless to say there were a lot of dump files in spool space.

So, IBM will get a call on the abends. I could add more virtual storage to the controllers, but that will presumably only add cushion for more stops/restarts before it runs out of storage again.

Brian Nielsen
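When the trace back above is copied out of a console log, the statement number and the address often arrive fused together (for example `3900D429D0` or `INITDCB4800DA079A`). The following is a minimal sketch of how such a token could be split apart again. It assumes that every address is exactly 8 hexadecimal digits and that routine names do not end in a digit; both are assumptions made for illustration, not documented properties of the message format. The function name `split_fused` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumptions (not taken from IBM documentation):
#   * the address is always the LAST 8 hex digits of the token
#   * the statement number is the run of decimal digits right before it
#   * anything before that is the routine name (may be empty)
_FUSED = re.compile(r"(?P<routine>.*?)(?P<stmt>\d*)(?P<addr>[0-9A-F]{8})")


def split_fused(token: str) -> Tuple[str, Optional[int], str]:
    """Split a fused trace-back token into (routine, stmt, address).

    stmt is None when no statement number precedes the address,
    as in the VSPASCAL row of the trace back.
    """
    match = _FUSED.fullmatch(token)
    if match is None:
        raise ValueError(f"not a recognisable trace-back token: {token!r}")
    stmt_text = match["stmt"]
    stmt = int(stmt_text) if stmt_text else None
    return match["routine"], stmt, match["addr"]
```

Calling `split_fused("167000CFA118")` separates the statement number `1670` from the address `00CFA118`, which is how the Schedule row was read here. If the 8-digit address assumption does not hold for a given dump, the split will be wrong, so the result should be checked against the original console output.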
Re: VSWITCH Controller failover issue on z/VM 5.2
After the successful failover from the primary to the backup RDEVs for the OSAs, what state was reported for the primary RDEVs?

David

-----Original Message-----
From: The IBM z/VM Operating System on behalf of Brian Nielsen
Sent: Mon 4/24/2006 1:39 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: [IBMVM] VSWITCH Controller failover issue on z/VM 5.2
Re: VSWITCH Controller failover issue on z/VM 5.2
On Monday, 04/24/2006 at 12:39 EST, Brian Nielsen [EMAIL PROTECTED] wrote:

> There are several more iterations of Stop Lan and Start Lan, but it is always on the backup OSA addresses, never on the primary OSA addresses.
>
> So the question is: why didn't it ever try to restart on the primary OSA when the backup OSA received the Stop Lan? If both OSA cards were the same it might not matter that much, but the primary is a Gig-OSA and the backup is a Fast-OSA.

There is no permanent association of primary and backup in the VSWITCH. There is just a list of available OSAs. The VSWITCH starts with the first one and fails over until it finds a working one. Whichever one is active is the primary. The others are backups. (You can see that in QUERY VSWITCH.)

So, I'm not 100% sure I'm reading your post correctly, but I would have expected identical behavior vis-à-vis the decision to fail over, without regard to which OSA was active and which were backups. A StopLAN on the currently active OSA should have caused a failover. If none of the backups worked (i.e. all OSAs unplugged at the same time), I would have expected CP to come back to the OSA that is currently 'active' and sit and wait for StartLAN on the active OSA or one of the backups. At that point, whichever one comes up first would become the active OSA.

Of course, with the abend in the controller, bizarre things were obviously going on. The Support Center definitely needs to look at it.

FWIW, if you want it to go back to the first OSA in the list, set up some system automation to do a SET VSWITCH DISCONNECT followed by a SET VSWITCH CONNECT when you get an adapter-initiated StartLAN on the desired device (as seen on the controller's console). CP will go back to the beginning of the OSA list and start looking for a working device.

Alan Altmark
z/VM Development
IBM Endicott
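The behavior described in this reply can be sketched as a small toy model. This is only an illustration of the rules as stated in the message (an ordered OSA list, failover to the next working device, waiting when nothing works, and a rescan from the top after DISCONNECT/CONNECT); it is not CP's implementation. The class and function names, the device labels, and the switch name `VSW1` are all hypothetical, and the behavior of an initial connect with no working OSA is left undefined here because the message does not describe it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VSwitchModel:
    """Toy model of the OSA list behavior described above (not CP code)."""
    osas: List[str]                                   # configured order
    link_up: Dict[str, bool] = field(default_factory=dict)
    active: Optional[str] = None

    def _scan(self, start: int) -> None:
        # Walk the list once, wrapping around, and take the first
        # device whose link is up. If none is up, 'active' is left
        # unchanged and the switch simply waits for a StartLAN.
        count = len(self.osas)
        for step in range(count):
            dev = self.osas[(start + step) % count]
            if self.link_up.get(dev, False):
                self.active = dev
                return

    def connect(self) -> None:
        self._scan(0)

    def stop_lan(self, dev: str) -> None:
        self.link_up[dev] = False
        if dev == self.active:
            self._scan(self.osas.index(dev) + 1)

    def start_lan(self, dev: str) -> None:
        self.link_up[dev] = True
        # Only matters if the switch is waiting: first one up wins.
        if self.active is None or not self.link_up.get(self.active, False):
            self.active = dev

    def reconnect(self) -> None:
        # Models SET VSWITCH ... DISCONNECT followed by CONNECT:
        # the search restarts at the beginning of the list.
        self.active = None
        self._scan(0)


def commands_on_start_lan(device: str, preferred: str,
                          active: Optional[str], switch: str) -> List[str]:
    """CP commands an automation rule might issue after a StartLAN."""
    if device == preferred and active != preferred:
        return [f"SET VSWITCH {switch} DISCONNECT",
                f"SET VSWITCH {switch} CONNECT"]
    return []
```

Under this model, if both OSAs receive a StopLAN and the second one in the list reports StartLAN first, it becomes and stays the active device even after the first one recovers, which is consistent with what the original post reports. Feeding the StartLAN for the preferred device into `commands_on_start_lan` then yields the DISCONNECT/CONNECT pair that sends the search back to the top of the list.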
Re: VSWITCH Controller failover issue on z/VM 5.2
The primaries were listed as BACKUP.

Brian Nielsen

On Mon, 24 Apr 2006 14:21:11 -0400, David Kreuter [EMAIL PROTECTED] resources.com wrote:

> after the successful fail over from primary to backup RDEVs for the OSAs, what state was reported for the primary RDEVs?
>
> David
Re: VSWITCH Controller failover issue on z/VM 5.2
Thanks. So whichever OSA does a Start Lan first becomes the current primary. Perhaps someone knows if there's a way to force the CISCO router to start the preferred Gig-OSA interface before the Fast-OSA interface.

I accomplished going back to the desired OSA by autologging DTCVSW1 (which is normally the active one, but had abended and was no longer logged on), then forcing and autologging DTCVSW2 (which had taken over after DTCVSW1 abended). A side benefit was a clean virtual address space for each.

Brian Nielsen

On Mon, 24 Apr 2006 14:39:53 -0400, Alan Altmark [EMAIL PROTECTED] wrote:

> There is no permanent association of primary and backup in the VSWITCH. There is just a list of available OSAs. The VSWITCH starts with the first one and fails over until it finds a working one. Whichever one is active is the primary. The others are backups. (You can see that in QUERY VSWITCH.)
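As a rough illustration of the recovery order described in this message, the steps can be written down as an ordered command list. The message only says "autologging" and "forcing"; mapping those to the CP commands XAUTOLOG and FORCE is an assumption made here, as is the hypothetical function name `recovery_commands`. The sketch only builds the command strings; it does not issue them.

```python
from typing import List


def recovery_commands(abended: str = "DTCVSW1",
                      took_over: str = "DTCVSW2") -> List[str]:
    """Return the recovery steps in the order given in the message.

    Assumption: 'autologging' is done with XAUTOLOG and 'forcing'
    with FORCE; the original message does not name the commands.
    """
    return [
        f"XAUTOLOG {abended}",    # bring back the controller that abended
        f"FORCE {took_over}",     # log off the controller that took over
        f"XAUTOLOG {took_over}",  # restart it with a clean address space
    ]
```

In practice an operator or automation tool would presumably need to wait for the forced user to finish logging off before starting it again; that wait is not modeled here.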