Attached is a rather massive explanation of a set of problems, and a "good enough" sloppy fix, that we encountered bringing up CAS 3.4.2 under clustered JBoss servers. Some elements of the problem are specific to our setup, but there are race conditions here that appear to be universal, in that they expose weaknesses inherent in modern tabbed browsers, HTTP, and J2EE in very tight timing situations. The proposed solution provides code that works for the common userid/password form case but needs to be generalized to support every situation that CAS can currently be configured to support.
Naked POST
This document describes modifications that were found to be necessary for high-availability (clustered) CAS instances.
Yale has deployed multiple instances of CAS 3.4.2 under clustered JBoss 5 containers. Clustered JBoss 5 is used for other applications and is understood by our production support team. An F5 serves as a front-end to load balance CAS requests across the CAS instances. It was originally configured with session affinity, so that once a CAS instance issued a JSESSIONID to a Browser, subsequent requests with that JSESSIONID would be routed to the same CAS instance that issued the cookie. When we had problems that cast doubt on this logic (see below), we added IP address affinity, so that a browser once routed to a CAS node would continue to be routed to the same node for a period of time.
CAS replicates its TicketRegistry using a named JBoss Cache instance configured by production support in the same files where the JBoss cluster defines replication of its HttpSession objects and EJB contexts. This involves a small modification of the standard cas-server-integration-JBoss project, which assumes that JBoss Cache is a JAR library of the application and expects to configure and create the Cache itself. The modification involves calling the JBoss container CacheManager to ask for a reference to the container managed Cache by name.
During testing we experienced infrequent cases of what I will call the Null Submit problem. A user would fill in the CAS login form, press Enter, and get back an empty CAS login form with no error message. The most likely cause of this behavior is that the Form was submitted to a different CAS node than the one that wrote it, and that node lacks the Session and/or FlowScope data saved by the first node when the Form was generated.
Web Flow, FlowState, and FlowExecutionKey
This requires a small technical explanation of how CAS uses Spring WebFlow. The login sequence is configured as a sequence of states, actions, and transitions in the login-webflow.xml file. First, the InitialFlowSetupAction class (doExecute method) is called by the configuration statement:
<on-start>
    <evaluate expression="initialFlowSetupAction" />
</on-start>
InitialFlowSetupAction examines the request and stores cookie data from the request into FlowScope. FlowScope is conceptually a set of Session variables that exist only during the period of time when the flow is active (from this on-start until the flow reaches an end-state). In particular, the service= string is used to create a Service object that is stored in FlowState until the end of the login sequence where it is then used to redirect the Browser back to the correct application.
Then a sequence of tests is made for an existing login cookie or non-interactive credentials. If none are found, the Login form is displayed by [example abbreviated]:
<view-state id="viewLoginForm" view="casLoginView" model="credentials">
    <var name="credentials" class="org.jasig.cas.authentication.principal.UsernamePasswordCredentials" />
    <transition on="submit" bind="true" validate="true" to="realSubmit">
        <set name="flowScope.credentials" value="credentials" />
        <evaluate expression="authenticationViaFormAction.doBind(flowRequestContext, flowScope.credentials)" />
    </transition>
</view-state>
<action-state id="realSubmit">
    <evaluate expression="authenticationViaFormAction.submit(flowRequestContext, flowScope.credentials, messageContext)" />
    <transition ... />
</action-state>
The special Spring WebFlow trick that you need to know is that this view-state that logically writes the Login Form also contains a "model" variable (credentials) that is an instance of the UsernamePasswordCredentials class. After the Form is submitted but before the "submit" event causes a transition to the "realSubmit" state, Spring WebFlow "binds" the submitted fields of the Form to the properties of the model variable. At this point the fields could be validated and the form redisplayed if, for example, the username or password fields are blank.
The point, however, is that on entry to the "realSubmit" state, an instance of the UsernamePasswordCredentials class must have been created, the username and password fields of the form must have been used to set its properties, and this object must be stored under the name "credentials" in FlowState.
The "realSubmit" state is associated with the AuthenticationViaFormAction class, but you can see that the previous "viewLoginForm" state calls the doBind() method of that bean during form-submit post-processing. If you plug a custom binding routine into authenticationViaFormAction in the Spring configuration, the custom code will move data from the Form to the Credentials object, but that only makes sense if you use a different Credentials class and a non-standard Form that has additional data fields.
Between the time that the Form is written to the Browser and the POST that submits the filled-in Form, the Service object generated from the GET service= parameter, the current state in the flow ("viewLoginForm"), and the particular UsernamePasswordCredentials object bound to the Form are all stored in FlowState, which in turn is associated with the HttpSession object.
Now suppose you display the CAS Login form on your screen but, instead of filling it in, you create a second tab and enter the same CAS login URL to generate a second Form. Now the same Browser with the same HttpSession has two different copies of the same CAS login form, and it needs two different FlowStates. Spring WebFlow handles this by storing more than one flow execution in the Session, each with its own key. In the WEB-INF/view/jsp/default/ui/casLoginView.jsp page there is a hidden field:
<input type="hidden" name="lt" value="${flowExecutionKey}" />
This corresponds to the CasDefaultFlowUrlHandler class which has the following code:
public String getFlowExecutionKey(final HttpServletRequest request) {
return request.getParameter("lt");
}
The flowExecutionKey is the key under which the FlowState for this form is saved in the HttpSession, and the getFlowExecutionKey() method finds the hidden field in the Form and returns it to Spring WebFlow so the correct FlowState can be bound to the POST that is the Form submit request.
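To make the key mechanism concrete, here is a deliberately simplified Java model of one HttpSession holding several paused login flows. It is a sketch, not Spring WebFlow's implementation: SessionFlowStore and its methods are hypothetical, a real flow execution stores a full snapshot rather than just the service string, and the "e<n>s1" key format simply mirrors the keys observed in the "lt" field.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of one HttpSession holding several paused login flows.
// Hypothetical names; this is NOT Spring WebFlow's implementation.
class SessionFlowStore {
    // flow execution key ("lt" value) -> service URL saved by that flow
    private final Map<String, String> serviceByKey = new HashMap<String, String>();
    // per-session counter, so the first flow in a session gets "e1s1"
    private int nextExecutionId = 1;

    // GET /cas/login?service=...: start a flow and return the key for the hidden "lt" field
    String startLoginFlow(String service) {
        String key = "e" + nextExecutionId + "s1";
        nextExecutionId++;
        serviceByKey.put(key, service);
        return key;
    }

    // Form POST with lt=key: resume the matching flow, or null if this session has no such key
    String resumeLoginFlow(String lt) {
        return serviceByKey.get(lt);
    }
}
```

With two tabs in the same session, each GET gets its own key, so each POST that carries its own "lt" value resumes the flow, and the service, that belongs to its tab.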
TabGuy
At Yale we have a few users who open a bunch of tabs to a bunch of different CAS protected applications. They then close the Browser, saving all the Tabs, and the next morning, after all the sessions have timed out, they open the browser with the saved tabs. All the applications redirect the browser to CAS for a new login, so these users can end up with five different instances of the CAS Login page, all bound to different service= strings and WebFlow execution keys. Let's call this user "TabGuy".
TabGuy will log in using one of the CAS Forms. He should be redirected back to the application that corresponds to that particular tab. At this point the other tabs still show Login forms, but if he re-enters the URL or refreshes the screen, CAS should discover the newly minted cookie, generate a Service Ticket, and redirect the browser back to each of those applications.
Multiple Session Objects
The first thing you realize is that JSESSIONID affinity is not good enough. The Session id is only generated in the response to the GET. If you have a load balancing front-end that receives five consecutive GET requests on different sockets from the same Browser, it is likely to round-robin the requests among the available nodes. IP address affinity may cause all the GET requests to be routed to the same CAS node.
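The difference between the two routing policies can be sketched with a toy router. This is an illustrative model under stated assumptions, not F5 configuration: ToyLoadBalancer and its methods are hypothetical, and "IP affinity" here is just a hash of the client address.

```java
// Toy front-end router contrasting two node-selection policies.
// Hypothetical model for illustration; NOT F5 configuration.
class ToyLoadBalancer {
    private final int nodeCount;
    private int requestsSeen = 0;

    ToyLoadBalancer(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    // Round robin: a request with no JSESSIONID simply goes to the next node in turn.
    int routeRoundRobin() {
        int node = requestsSeen % nodeCount;
        requestsSeen++;
        return node;
    }

    // IP affinity: every request from the same client address maps to the same node.
    int routeByClientIp(String clientIp) {
        return Math.floorMod(clientIp.hashCode(), nodeCount);
    }
}
```

With three nodes, five GETs carrying no JSESSIONID land on nodes 0, 1, 2, 0, 1 under round robin, while IP affinity sends them all to one node. As the next paragraph explains, landing on one node still does not guarantee one Session object.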
Each node that receives brand new requests from a Browser with no JSESSIONID, and with no TGT Cookie or a timed-out one, is going to generate new Session objects and new WebFlow states. It is possible that the container will generate two separate Session objects for concurrent requests from the same Browser IP address. After all, the only requirement is that it attach the correct session to any incoming request that contains a JSESSIONID, and none of these requests contains a session cookie. Certainly requests on different nodes of the cluster are going to generate separate Session objects.
Because these are all GET requests, they all end by writing the Login form with different "lt" FlowExecutionKey values and/or different JSESSIONID values. We have to be a bit fuzzy here, because Spring WebFlow tends to generate a FlowExecutionKey value of "e1s1" for the first Flow in a Session, and then a different value for subsequent flows generated while the first flow is still active. So if the requests generate different Session objects, then the responses to concurrent requests will have the same <input type="hidden" name="lt" value="e1s1" /> but different JSESSIONID cookie values. On the other hand, if the requests are matched to the same Session object then they will have the same JSESSIONID cookie value and different "lt" hidden field values.
At this point two separate concurrent network blasts occur. The five Forms get written back to the Browser with possibly more than one JSESSIONID cookie scoped to the same CAS URL, while under the covers the cluster tries to replicate Session objects across the nodes of the cluster.
Fortunately, if more than one Session object was created then each such object has a different key. They all replicate across all the nodes, but each Session object contains only a subset of the FlowStates for a subset of the tabs.
Meanwhile, the last Form that is written to the last Tab contains the JSESSIONID that wins. Cookies are not scoped to the individual tab but apply browser-wide, and the last Set-Cookie will leave behind the value that will be used by any subsequent CAS requests, including the Form POSTs.
ConcurrentModificationException
The java.util.ConcurrentModificationException is thrown when one thread is iterating over a Collection and discovers that another thread has modified the Collection during the iteration. Normally an application prevents this by synchronizing operations on a shared collection.
However, after a Request ends, if that Request modified the Session object, then a J2EE container has to replicate it to the other nodes in the cluster. In order for JBoss to replicate the Session object, it has to first serialize it by calling writeObject() because it is the serialized byte stream that is written across the network. Internally, writeObject() enumerates various Collections of object references attached to the Session, including the Map that stores the FlowStates keyed by their FlowExecutionKey.
Because the enumeration occurs inside JBoss after the previous Request has ended and no CAS code is running, it is not possible to synchronize this operation with other requests. And because updates to the Session state are buried somewhere in Spring WebFlow, this isn't something that CAS has direct control over anyway. So when more than one GET request arrives at the same CAS node, and these requests are assigned to the same Session object, a subsequent request may try to store FlowState in the Session object while the JBoss post-processing of a previous request that used the same Session object is enumerating the Session during the writeObject() operation. At that point a ConcurrentModificationException is thrown and logged in the JBoss log.
Fortunately, it is not clear that this Exception causes real harm. It may block the earlier Session replication from completing, but by definition it can only happen when a subsequent request is also changing Session state. Eventually the second request completes, and it also triggers post-processing to replicate its Session object across the cluster. The last request that asks for replication of the Session data has no trailing request to interfere with it. So the Session does eventually get replicated, although the ConcurrentModificationException may delay the replication briefly.
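The fail-fast behavior behind the exception can be reproduced in a single thread. In this sketch the for-each loop stands in for the replication thread enumerating the Session's map of FlowStates, and the put() stands in for a later request storing a new FlowState; FailFastDemo and the keys are hypothetical. The JDK documents fail-fast detection as best-effort, so with real threads the exception is likely but not guaranteed.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FailFastDemo {
    // Fills a map with two FlowState stand-ins keyed like the "lt" values.
    static Map<String, String> withTwoFlows(Map<String, String> flows) {
        flows.put("e1s1", "tab 1");
        flows.put("e2s1", "tab 2");
        return flows;
    }

    // The loop plays the replication thread enumerating the map; the put() plays a later
    // request storing a new FlowState mid-enumeration. Returns true if the iterator
    // detected the change and threw ConcurrentModificationException.
    static boolean storeFlowDuringEnumeration(Map<String, String> flows) {
        boolean stored = false;
        try {
            for (String key : flows.keySet()) {
                if (!stored) {
                    flows.put("e3s1", "tab 3");
                    stored = true;
                }
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    static boolean hashMapThrows() {
        return storeFlowDuringEnumeration(withTwoFlows(new HashMap<String, String>()));
    }

    static boolean concurrentHashMapThrows() {
        return storeFlowDuringEnumeration(withTwoFlows(new ConcurrentHashMap<String, String>()));
    }
}
```

A ConcurrentHashMap, whose iterators are weakly consistent, does not throw here, but swapping the map type is not an option for CAS: as noted above, the map belongs to Spring WebFlow and the container.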
Orphaned FlowState
This still leaves the possibility that, when the dust settles after these race conditions, there will be only one JSESSIONID cookie in the browser while the FlowStates have been spread across more than one Session object (from one node or from several nodes). The FlowStates associated with the other Session objects become orphans. The good news is that all the FlowStates point to the same place in the Web Flow (the "viewLoginForm" state) and mostly hold the same data. The mixed news is that since FlowStates in different Sessions share the same "e1s1" key, they are likely to be confused with each other. The bad news is that since the service= value is stored in the FlowState, the wrong tab may get redirected back to a given application.
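The "bad news" case can be modeled in a few lines, assuming a fully replicated session store and a browser cookie jar in which the last Set-Cookie wins. All names below are hypothetical and the model ignores timing; it only shows how two tabs that both carry lt=e1s1 end up resolving against the same surviving session.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the orphaned-FlowState race. Hypothetical names; ignores timing.
class OrphanedFlowDemo {
    // Replicated session store visible on every node: JSESSIONID -> (lt key -> service URL)
    private final Map<String, Map<String, String>> sessions = new HashMap<String, Map<String, String>>();
    // One cookie jar per browser, shared by all tabs: cookie name -> value
    private final Map<String, String> browserCookies = new HashMap<String, String>();

    // A GET with no JSESSIONID creates a new session whose first flow is keyed "e1s1".
    String getLoginForm(String newSessionId, String service) {
        Map<String, String> flows = new HashMap<String, String>();
        flows.put("e1s1", service);
        sessions.put(newSessionId, flows);
        browserCookies.put("JSESSIONID", newSessionId); // the last Set-Cookie overwrites earlier ones
        return "e1s1"; // written into this tab's hidden "lt" field
    }

    // The POST from any tab carries the browser's current JSESSIONID plus that tab's lt.
    String postLoginForm(String lt) {
        Map<String, String> flows = sessions.get(browserCookies.get("JSESSIONID"));
        return flows == null ? null : flows.get(lt);
    }
}
```

After two GETs, session A's flow is orphaned: no request will ever present SESSION-A again, and tab 1's POST resumes tab 2's flow, so tab 1 would be sent back to app2.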
Carry Service String in the Form
This already happens after a manner of speaking. The Form element contains an action attribute:
action="/cas/login;jsessionid=...?service=http://..."
So the service string is passed as QueryString data in the URL of a Form POST. However, generally QueryString parameters are not well defined in a POST of form data. It is slightly better to put it in the Form data itself. This can be easily done by adding another hidden field right after the "lt" field in the casLoginView.jsp file:
<input type="hidden" name="service" value="${service}" />
Then you enforce this by changing AuthenticationViaFormAction.submit() to use the service from the Form, if it is present, instead of the "service" variable stored in the WebFlow. This is a place where my code works at Yale, where we only have ordinary HTTP service strings and don't use any of the exotic Service types. The following example may not be generally correct and might be better replaced by code that allows for the possibility that someone will actually plug in an ArgumentExtractor other than CasArgumentExtractor, which just gets the "service" parameter from the Request. [There are several overloaded versions of WebUtils.getService. The one with an ArgumentExtractor list gets the service from the Request, while the one with just a RequestContext gets "service" as a WebFlow variable. So this code uses the Request parameter if one is provided, and only uses the FlowState if none is found in the Form.]
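public final String submit(final RequestContext context, final Credentials credentials, final MessageContext messageContext) throws Exception {
    final String ticketGrantingTicketId = WebUtils.getTicketGrantingTicketId(context);
    // Prefer the service carried in the hidden form field (read from the Request by CasArgumentExtractor)
    final List<ArgumentExtractor> extractors = new ArrayList<ArgumentExtractor>();
    extractors.add(new CasArgumentExtractor());
    Service service = WebUtils.getService(extractors, context);
    // Fall back to the "service" variable saved in FlowState only if the Form had none
    if (service == null)
        service = WebUtils.getService(context);
    // ... remainder of the original submit() unchanged ...
}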
This version of the code is now robust against the "lt" field matching an "e1s1" key stored in a Session object other than the Session object created by the GET request that returned the Form.
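The precedence rule in the modified submit() can also be stated independently of CAS types. In this sketch, plain maps stand in for the Request parameters and for FlowScope, and resolveService is a hypothetical helper, not part of WebUtils. It treats a missing or empty form field as absent and only then falls back to the value saved in FlowState.

```java
import java.util.Map;

// Hypothetical helper restating the service precedence used in the modified submit().
class ServiceResolver {
    static String resolveService(Map<String, String> formParameters, Map<String, Object> flowScope) {
        String fromForm = formParameters.get("service");
        if (fromForm != null && fromForm.length() > 0) {
            return fromForm; // the hidden field written into this tab's own form wins
        }
        Object fromFlow = flowScope.get("service"); // fall back to the value saved in FlowState
        return fromFlow == null ? null : fromFlow.toString();
    }
}
```

Applied to the orphaned-FlowState case described earlier, the hidden field from tab 1 wins even when the resumed FlowState belongs to tab 2.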
Naked POST
This now puts us in a position to do something when a POST arrives at a CAS server without matching any FlowExecutionKey. Current CAS ignores the mismatch, starts a new Flow, and processes the request down the same path as a GET. The response is an empty form: the user is forced to reenter the data, and the service= value is lost. The previous step might fix the service= problem, and that might be enough for TabGuy, because the normal login-webflow.xml would detect a TGT Cookie submitted with a POST that arrives at the beginning of the flow (the flow does not notice that the request is a POST rather than a GET, so it issues a Service Ticket and redirects back to the Service). However, especially in clustered configurations, the original Null Submit behavior is still a concern, and we should be able to fix that.
The solution is to add a check for a POST at the beginning of the Flow. There should never be a POST at the beginning of the Flow: a POST should always carry an lt=e1s1 key that selects a saved FlowState in which the Flow is positioned at the "viewLoginForm" state. However, if one does arrive at the front, we can bypass the broken Flow and do the right thing by testing for it. In login-webflow.xml, after the on-start, add a new action-state (Spring WebFlow starts a flow at its first state unless the flow element names a different start-state, so the new state should come first):
</on-start>
<action-state id="checkNakedPost">
    <evaluate expression="authenticationViaFormAction.checkNakedPost(flowRequestContext)" />
    <transition on="success" to="ticketGrantingTicketExistsCheck" />
    <transition on="error" to="realSubmit" />
</action-state>
<decision-state id="ticketGrantingTicketExistsCheck">
I decided to add a new method to AuthenticationViaFormAction in part because it belongs there and in part because we had to change that class to make the service= fix anyway. The new code looks something like:
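// Inside AuthenticationViaFormAction. Needs imports for javax.servlet.http.HttpServletRequest,
// org.springframework.webflow.context.ExternalContext,
// org.springframework.webflow.execution.RequestContext and
// org.jasig.cas.authentication.principal.UsernamePasswordCredentials.
public String checkNakedPost(final RequestContext context) {
    final ExternalContext externalContext = context.getExternalContext();
    final HttpServletRequest req = (HttpServletRequest) externalContext.getNativeRequest();
    if (!"POST".equalsIgnoreCase(req.getMethod()))
        return "success"; // It's a GET, behave normally
    // Read both fields before touching them, so a missing parameter cannot cause a NullPointerException
    final String userid = req.getParameter("username");
    final String password = req.getParameter("password");
    if (userid == null || userid.length() == 0 ||
        password == null || password.length() == 0)
        return "success"; // fall through and display a new form
    final UsernamePasswordCredentials credential = new UsernamePasswordCredentials();
    credential.setUsername(userid.toLowerCase());
    credential.setPassword(password);
    context.getFlowScope().put("credentials", credential);
    return "error"; // Skip forward to the realSubmit state
}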
This may sweep a lot of generality under the rug. As written, the code only handles the case where the Credentials object is a UsernamePasswordCredentials, does no special validation (beyond checking that a username and password are present), and does not call doBind(), so any plugged-in custom Credentials binder is bypassed. It will therefore only work for a plain vanilla username/password form, which is what we use at Yale and is good enough for me to solve my problem up front.
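To exercise the branch logic outside a container, the decision made by checkNakedPost() can be restated without servlet or Spring WebFlow types. This is a hypothetical sketch: NakedPostCheck and Creds are stand-ins (Creds is not the CAS UsernamePasswordCredentials class), plain maps replace the Request parameters and FlowScope, and the returned strings are the same event ids used by the checkNakedPost action-state.

```java
import java.util.Map;

// Hypothetical, container-free restatement of the checkNakedPost() decision.
class NakedPostCheck {
    // Stand-in for UsernamePasswordCredentials (NOT the CAS class).
    static final class Creds {
        final String username;
        final String password;

        Creds(String username, String password) {
            this.username = username;
            this.password = password;
        }
    }

    static boolean isBlank(String s) {
        return s == null || s.length() == 0;
    }

    // "success" = continue into the normal flow; "error" = jump straight to realSubmit.
    static String check(String httpMethod, Map<String, String> params, Map<String, Object> flowScope) {
        if (!"POST".equalsIgnoreCase(httpMethod)) {
            return "success"; // GET: behave normally
        }
        String username = params.get("username");
        String password = params.get("password");
        if (isBlank(username) || isBlank(password)) {
            return "success"; // nothing usable to submit: show a fresh form
        }
        flowScope.put("credentials", new Creds(username.toLowerCase(), password));
        return "error"; // skip forward to realSubmit with credentials already in flow scope
    }
}
```

A GET never touches flow scope, a naked POST with both fields populates "credentials" and jumps ahead, and a naked POST with a blank field falls back to showing the form.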
With support for Naked POST, it is no longer necessary to depend on the FlowState being unbroken. So we can drop the <distributable /> from web.xml and JBoss stops serializing Session objects to copy them over to the other nodes, which gets rid of the ConcurrentModificationException problem (and it saves a lot of overhead on the JBoss container).
The Bottom Line
While it would be nice to assume that J2EE containers always work as they are supposed to, the bottom line is that they don't always do what they are supposed to. Worse, the inherent nature of modern tabbed browsers and the HTTP protocol creates problems for clustered CAS and the current FlowState strategy that cannot be fully solved. However, CAS can be made robust against broken FlowState by adding a hidden service field to the form and a Naked POST workaround at the start of the Flow.
