Re: [cisco-voip] MRA DR / Resilience
For SIP, yes, but I can’t tell whether it also works with UDS. I have a question in to find out, but that may depend on future releases.

I’ve been going over what happened, and SSO introduces a new layer. A very helpful gentleman from TAC spent a bunch of time with me going over how Jabber and Expressway handle this.

Expressway, at least in X12.6 where I am, has no understanding that a UCM node is down as far as UDS is concerned.

* Requests can be forwarded to a downed UCM, which results in an HTTP error status code being returned to Jabber
* HTTP transaction failures cause re-auth to be triggered
* Jabber can also want to talk to a UCM that isn’t there, sometimes repeatedly for some reason, instead of choosing a new one

So, there are a number of reasons that may trigger a cycle where Jabber wants to verify its token validity, tries to talk to nothing a few times, then after about 3 retries gives up and punts the user out.

It’s not clear from looking at the Jabber log (and going cross-eyed in the process) whether Jabber is aware that it can try a different UCM. It shows the URL being put on a block list, but then it just uses it again anyway.

Still waiting to learn more about what happened.

Best,

Adam

From: ROZA, Ariel
Sent: Monday, January 18, 2021 1:59 PM
To: ROZA, Ariel; NateCCIE; Pawlowski, Adam
Cc: cisco-voip@puck.nether.net
Subject: RE: [cisco-voip] MRA DR / Resilience

I just reread the release notes, and it includes the case where CUCM is down.
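The retry cycle Adam describes in this reply (check the token, hit an unreachable node a few times, sign the user out after about 3 attempts) can be contrasted with what a client that honors its block list would do. The following is only a minimal sketch under stated assumptions: it is not Jabber's actual logic, and `pick_node`, `validate_token`, `probe`, and the host names are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def pick_node(nodes: Iterable[str], blocked: set[str]) -> Optional[str]:
    """Return the first node that is not on the block list, or None."""
    for node in nodes:
        if node not in blocked:
            return node
    return None


def validate_token(
    nodes: list[str],
    probe: Callable[[str], bool],  # hypothetical: True if the node answered
    max_attempts: int = 3,         # "about 3 retries" from the thread
) -> Optional[str]:
    """Try up to max_attempts nodes, never reusing one that already failed.

    Returns the node that answered, or None when the client should give up
    (the point at which the user would be signed out).
    """
    blocked: set[str] = set()
    for _ in range(max_attempts):
        node = pick_node(nodes, blocked)
        if node is None:
            break  # every known node has already failed
        if probe(node):
            return node
        blocked.add(node)  # honor the block list on the next attempt
    return None


# Hypothetical two-subscriber cluster with the first subscriber down.
down = {"ucm-sub1.example.com"}
nodes = ["ucm-sub1.example.com", "ucm-sub2.example.com"]
print(validate_token(nodes, lambda n: n not in down))
# prints "ucm-sub2.example.com"
```

With this selection rule, a single downed node costs one failed attempt rather than all of them, which is the difference between the behavior observed in the logs and the behavior one would hope for.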
Re: [cisco-voip] MRA DR / Resilience
I just reread the release notes, and it includes the case where CUCM is down.

From: cisco-voip On behalf of ROZA, Ariel
Sent: Monday, January 18, 2021 15:53
To: NateCCIE; Pawlowski, Adam
Cc: cisco-voip@puck.nether.net
Subject: Re: [cisco-voip] MRA DR / Resilience

But will this include the scenario where one of the CUCMs is down? I don’t see it explicitly in the notes…

___
cisco-voip mailing list
cisco-voip@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip
Re: [cisco-voip] MRA DR / Resilience
But will this include the scenario where one of the CUCMs is down? I don’t see it explicitly in the notes…

From: cisco-voip On behalf of NateCCIE
Sent: Wednesday, January 13, 2021 10:56
To: Pawlowski, Adam
Cc: cisco-voip@puck.nether.net
Subject: Re: [cisco-voip] MRA DR / Resilience

SIP Registration Failover for Cisco Jabber - MRA Deployments

https://www.cisco.com/c/dam/en/us/td/docs/voice_ip_comm/expressway/release_note/Cisco-Expressway-Release-Note-X12-7.pdf#page16

This is new in X12.7.
Re: [cisco-voip] MRA DR / Resilience
Hi Nate, we’re still on X12.6.5, so I’ll have to scope this out. It looks like, if I read that right, the Expressway will finally flag servers as inactive instead of … just not. It’s unclear if this improves anything with Jabber’s behavior.

My customers have gifted my inbox with Jabber PRT logs this morning, and in reading through them, it looks like most of the issues are:

* Jabber trying to hit the CUC node that’s down for SSO auth, which results in a sign-in failure
* Jabber trying to hit the UCM node that’s down for UDS, which results in a sign-in failure

Both things would be resolved if the servers were marked inactive and not presented to the Jabber client, but the Jabber client also has to handle it better when it tries to reach something it cannot, instead of just bombing out. That’s probably a pipe dream with Jabber at this point.

Thanks again,

Adam

From: NateCCIE
Sent: Wednesday, January 13, 2021 8:56 AM
To: Pawlowski, Adam
Cc: cisco-voip@puck.nether.net
Subject: Re: [cisco-voip] MRA DR / Resilience

SIP Registration Failover for Cisco Jabber - MRA Deployments

https://www.cisco.com/c/dam/en/us/td/docs/voice_ip_comm/expressway/release_note/Cisco-Expressway-Release-Note-X12-7.pdf#page16

This is new in X12.7.
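Adam's point in this reply, that both sign-in failures would go away if down servers were marked inactive and no longer presented to the client, can be sketched as a small health tracker. This is an illustration only, not how Expressway implements it; the `NodeHealth` class, its threshold of two consecutive failures, and the host names are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeHealth:
    """Tracks consecutive failures per node (hypothetical helper)."""

    failure_threshold: int = 2  # assumed value, not taken from any product
    failures: dict[str, int] = field(default_factory=dict)

    def record(self, node: str, ok: bool) -> None:
        """Reset the counter on success, increment it on failure."""
        self.failures[node] = 0 if ok else self.failures.get(node, 0) + 1

    def is_active(self, node: str) -> bool:
        """A node stays active until it reaches the failure threshold."""
        return self.failures.get(node, 0) < self.failure_threshold

    def active_nodes(self, nodes: list[str]) -> list[str]:
        """Only these nodes would be presented to the client."""
        return [n for n in nodes if self.is_active(n)]


health = NodeHealth()
cluster = ["cuc1.example.com", "cuc2.example.com"]
health.record("cuc1.example.com", ok=False)
health.record("cuc1.example.com", ok=False)
print(health.active_nodes(cluster))
# prints "['cuc2.example.com']"
```

Resetting the counter on any success lets a node return to service on its own once it answers again, without an administrator having to re-enable it.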
Re: [cisco-voip] MRA DR / Resilience
SIP Registration Failover for Cisco Jabber - MRA Deployments

https://www.cisco.com/c/dam/en/us/td/docs/voice_ip_comm/expressway/release_note/Cisco-Expressway-Release-Note-X12-7.pdf#page16

This is new in X12.7.

Sent from my iPhone

> On Jan 13, 2021, at 6:10 AM, Pawlowski, Adam wrote:
>
> Hey all,
>
> I’m playing in this scenario now and trying to figure out what parts of the solution work, and which do not, in a DR “site failover” kind of scenario with regard to MRA.
>
> I understand the documentation prescribes there’s no failover for voice and video, but I think that failover is different than the one I’m describing here.
>
> I know I can take Expressway C and Expressway E nodes out of the cluster at will, and things will heal over time once the Jabber clients catch up.
>
> I can take a Unity Connection guest down, and it should work, though the Jetty service certainly has load limits. I don’t think I’m hitting those here.
>
> I can take an IM&P node down, and, with the exception of pChat services (DB was not deployed HA and merge job just seems to fail but that’s another investigation), clients will eventually fail over and recover.
>
> Today, we have half the C cluster, half the E cluster, and one of two CUC nodes down. All IMP are up. One UCM subscriber is down, and things have been going poorly. Jabber customers keep getting punted from the client with “Your session has expired” randomly. The Jabber log looks like this token has expired, but, doesn’t provide enough debugging to know why. It’s possible that the Expressway E is fronting this message, since I understand it sits between Jabber and the rest of the infrastructure for oAuth, and Jabber does not talk to the UCM/CUC directly.
>
> When we did not have SSO, the worst thing we had to do is make sure that the Jabber client’s device pool had an active UCM as the primary in the CMGroup, as they wouldn’t register properly without that, but, those UCMs are up.
>
> Does anyone know what might be going on here?
>
> My best guess is that the Expressway isn’t intelligent enough to mark a UCM out of service when unreachable (or CUC server for that matter) and it is trying to refresh a customer’s token against a server that isn’t up. When this times out, instead of trying another it is telling Jabber the refresh token is expired. If this is the case, there’s no cluster resilience with Jabber, if any nodes are down then things are going to be intermittent.
>
> Why does Jabber sometimes choose to pop the dialog asking for a new session, and sometimes it just kicks the customer out of the client requiring a new sign in? I see a bug that suggests enabling LegacyOAuthSignout parameter, but, it doesn’t explain what effect that’s going to have on the client.
>
> Basically, this is just a test but I am trying to learn from it, and would appreciate any thoughts/experiences. If it is the Expressway cluster, then there’s no way around this as far as I can tell. Marking a UCM inactive with xAPI doesn’t work, it just gets pushed back to active.
>
> Any comments appreciated.
>
> Best,
>
> Adam Pawlowski
> SUNYAB NCS
[cisco-voip] MRA DR / Resilience
Hey all,

I'm playing in this scenario now and trying to figure out which parts of the solution work, and which do not, in a DR "site failover" kind of scenario with regard to MRA.

I understand the documentation says there's no failover for voice and video, but I think that failover is different from the one I'm describing here.

I know I can take Expressway C and Expressway E nodes out of the cluster at will, and things will heal over time once the Jabber clients catch up.

I can take a Unity Connection guest down, and it should work, though the Jetty service certainly has load limits. I don't think I'm hitting those here.

I can take an IM&P node down, and, with the exception of pChat services (the DB was not deployed HA and the merge job just seems to fail, but that's another investigation), clients will eventually fail over and recover.

Today, we have half the C cluster, half the E cluster, and one of two CUC nodes down. All IMP nodes are up. One UCM subscriber is down, and things have been going poorly. Jabber customers keep getting punted from the client with "Your session has expired" at random. The Jabber log looks as though the token has expired, but it doesn't provide enough debugging to know why. It's possible that the Expressway E is fronting this message, since I understand it sits between Jabber and the rest of the infrastructure for OAuth, and Jabber does not talk to the UCM/CUC directly.

When we did not have SSO, the worst thing we had to do was make sure that the Jabber client's device pool had an active UCM as the primary in the CMGroup, as clients wouldn't register properly without that, but those UCMs are up.

Does anyone know what might be going on here?

My best guess is that the Expressway isn't intelligent enough to mark a UCM (or a CUC server, for that matter) out of service when it is unreachable, and it is trying to refresh a customer's token against a server that isn't up. When this times out, instead of trying another server, it tells Jabber the refresh token is expired. If this is the case, there's no cluster resilience with Jabber: if any nodes are down, things are going to be intermittent.

Why does Jabber sometimes pop the dialog asking for a new session, and sometimes just kick the customer out of the client, requiring a new sign-in? I see a bug that suggests enabling the LegacyOAuthSignout parameter, but it doesn't explain what effect that's going to have on the client.

Basically, this is just a test, but I am trying to learn from it and would appreciate any thoughts or experiences. If it is the Expressway cluster, then there's no way around this as far as I can tell. Marking a UCM inactive with xAPI doesn't work; it just gets pushed back to active.

Any comments appreciated.

Best,

Adam Pawlowski
SUNYAB NCS
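The guess in this message is that a timeout against a down server gets reported to Jabber as an expired refresh token. A minimal sketch of the distinction that would avoid this follows: an unreachable node leads to a retry on the next node, and only an explicit rejection counts as expiry. This is an assumption-laden illustration, not Expressway's code; `NodeUnreachable`, `TokenRejected`, `refresh_token`, and the host names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


class NodeUnreachable(Exception):
    """The node did not answer (timeout, connection refused)."""


class TokenRejected(Exception):
    """A live node answered and said the refresh token is no longer valid."""


def refresh_token(
    nodes: Sequence[str],
    refresh: Callable[[str], str],  # hypothetical per-node refresh call
) -> str:
    """Refresh against the first node that answers.

    NodeUnreachable moves on to the next node. TokenRejected is not caught,
    so only a real rejection reaches the caller as an expired token.
    """
    last_error: Optional[Exception] = None
    for node in nodes:
        try:
            return refresh(node)
        except NodeUnreachable as exc:
            last_error = exc  # remember the failure and try the next node
    raise NodeUnreachable("no node reachable") from last_error


def fake_refresh(node: str) -> str:
    """Stand-in for a real refresh call: the first subscriber is down."""
    if node == "ucm-sub1.example.com":
        raise NodeUnreachable(node)
    return "new-access-token"


print(refresh_token(["ucm-sub1.example.com", "ucm-sub2.example.com"], fake_refresh))
# prints "new-access-token"
```

Keeping the two failure types separate means the user is only asked to sign in again when a reachable server has actually refused the token.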