Wow. So as you say, two layers of problems -- failure of SAML (or just non-redeemed?) STs in the replicated instance, as well as a problem with Google resulting in a sort of DoS 'attack' against the system. From experience locally, I would say the latter is more likely to be a misconfigured browser or perhaps a mobile device (applications [compiled and web] like Google Drive on mobile can behave oddly depending on how you have your Google SSO set up). Do the access logs show anything consistent for these repeat STs -- i.e. the same or a similar followup URL in the user request to CAS, the same user agent, etc.?
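
In case it helps, here is a rough sketch of the kind of tally I mean. It assumes your access logs are in the common "combined" format and that the Google requests arrive at /cas/login with a SAMLRequest query parameter; the function name and the login path are placeholders of mine, so adjust them to your deployment.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit, parse_qs

# Combined log format (an assumption about your logging setup):
# ip ident user [time] "METHOD url PROTO" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_saml_logins(lines, login_path="/cas/login"):
    """Map (client IP, user agent) -> (total hits, distinct SAMLRequest values).

    login_path is a hypothetical default; use whatever path your CAS
    login endpoint is actually served on.
    """
    hits = Counter()
    distinct = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a combined-format line
        parts = urlsplit(m.group("url"))
        if parts.path != login_path:
            continue  # not a request to the login endpoint
        query = parse_qs(parts.query)
        if "SAMLRequest" not in query:
            continue  # an ordinary CAS login, not a SAML one
        key = (m.group("ip"), m.group("agent"))
        hits[key] += 1
        distinct[key].add(query["SAMLRequest"][0])
    return {key: (count, len(distinct[key])) for key, count in hits.items()}
```

Feed it the lines of the access log for the hour in question and sort by hit count. A client with thousands of hits but only a handful of distinct SAMLRequest values would point toward something local replaying the same request; a new SAMLRequest on nearly every hit would suggest Google is re-initiating the login each time.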

On 12/12/14, 6:20 AM, David A. Kovacic wrote:
> Exactly right. If we see 3000 STs created in an hour for a user, all
> to Google, we are also seeing Google report 3000 successful logins in
> the same time frame reported in the Google admin console audit logs.
> As far as I can tell, whatever condition triggers this (and it may be
> some form of malware being used to send spam through us) gets
> credentials once (only one TGT is ever created) and then somehow
> does >1000 logins in about an hour to Google without ever logging
> back out of Google. As far as we can tell, there are no errors in the
> process of logging in, but because the Google SAML process seems to
> leave fairly large remnants of instances in the heap, and those
> remnants are not being GCed in a timely fashion, we run out of heap
> memory and the SSO server process locks up, taking the other server
> with it. To summarize, it seems not to be the SAML process itself,
> but the VOLUME of SAML processes in a very short time that seems to
> cause the issue.
>
> On 12/11/14 9:01 PM, Sean Baker wrote:
>> Now that's interesting -- is that to say that when you see these
>> rapidly-generated service tickets for particular users you're seeing
>> them logging in as many times to Google as well?
>>
>> On 12/11/14, 14:17 PM, David A. Kovacic wrote:
>>> Google seems to be accepting the assertions each time as we are
>>> seeing the same number of logins in Google's audit logs as the
>>> number of STs being created. I would expect that if there was
>>> something wrong with assertion we would be receiving complaints from
>>> the users. I am more inclined at this point to believe some sort of
>>> crazy browser loop, but it's definitely not happening with any
>>> consistency.
>>>
>>> We have tried contacting the two people we identified once we
>>> started to get a handle on what the issue was, however neither has
>>> responded. That's not terribly surprising given that we are in our
>>> finals period here and requests for information go pretty much
>>> ignored by students and faculty alike at this time.
>>>
>>> Dave
>>>
>>> On 12/10/14 8:14 PM, Sean Baker wrote:
>>>> Your access logs should show the individual SAMLRequest's generated by
>>>> Google; if it's rejecting your assertions in some automated way you
>>>> should see a new SAMLRequest each time. If it's the same request over
>>>> and over, one might infer a more local issue (not definitively mind you;
>>>> just much more likely) [ehcache issue, browser configuration, etc.].
>>>>
>>>> Has anyone talked with your end users who're triggering these events
>>>> about what they experienced?
>>>>
>>>> On 12/10/14, 15:16 PM, David A. Kovacic wrote:
>>>>> Does anyone know what I would need to do to be able to log the
>>>>> actual SAML transactions? Is there any way to actually do that?
>>>>> We have isolated this issue to only logins to Google and only
>>>>> under certain conditions when something seems to start looping and
>>>>> generating STs rapidly. We are trying to isolate the conditions
>>>>> under which the loop starts.
>>>>>
>>>>> It would be helpful to actually see the SAML transactions being
>>>>> generated so we could begin to get a handle on what Google apps is
>>>>> being referenced and if Google is returning any errors or not
>>>>> (although Google claims valid logins).
>>>>>
>>>>> On 12/6/14 9:11 AM, Marvin Addison wrote:
>>>>>>
>>>>>>     Second, the massive number of STs are being created on only
>>>>>>     one server (we can tell by the host name in the logged ST)
>>>>>>     but the OTHER SERVER is where the memory is growing out of
>>>>>>     bounds.
>>>>>>
>>>>>> I'm still working through this thread, but I wanted to point out
>>>>>> that the other is hurting likely because of load balancer session
>>>>>> affinity. Recall that ticket validation is a back-channel call,
>>>>>> and the network source differs from that of the user's browser.
>>>>>> In our environment, services typically get stuck on one node
>>>>>> causing hot spots. This is because the service is validating
>>>>>> tickets frequently enough that the session affinity timeout never
>>>>>> kicks in.
>>>>>>
>>>>>> M

--
You are currently subscribed to cas-user@lists.jasig.org as: arch...@mail-archive.com
To unsubscribe, change settings or access archives, see http://www.ja-sig.org/wiki/display/JSG/cas-user