Bugs item #863113, was opened at 2003-12-19 19:33
Message generated for change (Comment added) made by airsquig
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=863113&group_id=22866

Category: Clustering
Group: v3.2
Status: Open
Resolution: Invalid
Priority: 5
Submitted By: Jason Tetrault (airsquig)
Assigned to: Thomas Peuss (tpeuss)
Summary: Session Replication Inconsistent

Initial Comment:
This bug report is a result of discussions on the JBoss Forums:
http://www.jboss.org/index.html?module=bb&op=viewtopic&t=43602

Versions replicated on: JBoss 3.2.3, JBoss 3.2.2, JBoss 3.2.4 RC, JBoss 4.0 RC
Version researched: JBoss 3.2.4 nightly

Overview:
HTTP session replication appeared inconsistent in a clustered environment. Further research showed that this happens under round-robin load balancing. A session replicates to another node once and only once; after that, the sessions are inconsistent. This shows itself in an Apache round-robin, NON-sticky load-balanced configuration. The bug shows itself on the third hit.

After adding some trace, I found the following. It seems that org.jboss.web.tomcat.session.ClusterManager has a local session container (sessions). If and only if the session is not in the local container, it accesses the HTTPSessionMBean to get the session, which calls the org.jboss.ha.httpsession.beanimpl.ejb.ClusteredHTTPSessionBeanImpl EJB. All works well up to that point: if Node A is hit first and then Node B, Node B gets the session from the EJB that was set from Node A.

However, once both nodes have the session in their cache, the ClusterManager does not appear to go back to the MBean to re-fetch the session on get-session requests; it just uses the version in the local sessions container. This means that a session replicates ONCE, and after that Tomcat uses its local version. This causes an inconsistent session state. It does look like each servlet container is updating its version in the EJB, but the update is wrong because it is based on the inconsistent session in its local cache.

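The caching behaviour described above can be modelled with a small stand-alone sketch. This is not JBoss code: the Node class, the shared map standing in for the clustered session EJB, and the useLocalCache flag are all assumptions made for illustration. The sketch only shows why a node that prefers its local copy keeps appending to stale data when requests alternate between two nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleSessionCacheDemo {

    /** Hypothetical stand-in for one servlet container node (not a JBoss class). */
    static final class Node {
        private final String name;
        // Per-node session map, playing the role of the local "sessions" container.
        private final Map<String, String> localSessions = new HashMap<>();
        // Map shared by all nodes, playing the role of the clustered session store.
        private final Map<String, String> clusterStore;
        // Assumed switch: when true, a locally cached session wins over the store.
        private final boolean useLocalCache;

        Node(String name, Map<String, String> clusterStore, boolean useLocalCache) {
            this.name = name;
            this.clusterStore = clusterStore;
            this.useLocalCache = useLocalCache;
        }

        /** Handles one request: find the session, append this node's name, write it back. */
        String hit(String sessionId) {
            String value = useLocalCache ? localSessions.get(sessionId) : null;
            if (value == null) {
                // Only reached when the local cache is disabled or has no entry yet.
                value = clusterStore.get(sessionId);
            }
            value = (value == null) ? name : value + "," + name;
            localSessions.put(sessionId, value);
            clusterStore.put(sessionId, value); // every node still writes its version back
            return value;
        }
    }

    /** Sends the given number of hits alternately to NodeA and NodeB for one session. */
    static List<String> roundRobin(boolean useLocalCache, int hits) {
        Map<String, String> store = new HashMap<>();
        Node[] nodes = {
            new Node("NodeA", store, useLocalCache),
            new Node("NodeB", store, useLocalCache)
        };
        List<String> results = new ArrayList<>();
        for (int i = 0; i < hits; i++) {
            results.add(nodes[i % nodes.length].hit("session-1"));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> stale = roundRobin(true, 6);
        for (int i = 0; i < stale.size(); i++) {
            System.out.println("local cache on,  Hit" + (i + 1) + ": " + stale.get(i));
        }
        List<String> fresh = roundRobin(false, 5);
        for (int i = 0; i < fresh.size(); i++) {
            System.out.println("local cache off, Hit" + (i + 1) + ": " + fresh.get(i));
        }
    }
}
```

With the flag set to true, the third hit lands on NodeA, which still holds only "NodeA" locally and therefore never sees NodeB's update. With the flag set to false, every hit reads the shared store first, so each node builds on the other's last write.
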
Now, to test this, I made a quick code change to the ClusterManager.findSession() method to get the session from the MBean and not use the one from sessions. This appeared to fix the problem. (You have to call sessions.sessions.get(id) or it will break. Again, I did not spend much time on fixing it.) Exactly how you want to fix this is up to you. I can see a few ways:
1. Somewhat like mentioned above.
2. Making the backend call invalidate on the MBean to get rid of the session held in sessions.
3. Others.

Reproduction Information:
This can be reproduced with one JSP in a session-replicated web application in a clustered environment. The attached JSP reproduces the problem. On each cluster node, change the string being appended to the session key to the node name. In an Apache round-robin environment, one would expect the string in the session to look like the following:

Hit1: NodeA
Hit2: NodeA,NodeB
Hit3: NodeA,NodeB,NodeA
Hit4: NodeA,NodeB,NodeA,NodeB
Hit5: NodeA,NodeB,NodeA,NodeB,NodeA

Right now, this is what happens:

Hit1: NodeA
Hit2: NodeA,NodeB
Hit3: NodeA,NodeA
Hit4: NodeA,NodeB,NodeB
Hit5: NodeA,NodeA,NodeA
Hit6: NodeA,NodeB,NodeB,NodeB

You do not necessarily need the Apache round-robin configuration, but it helps. Email me if you have any questions.

Jason

----------------------------------------------------------------------

>Comment By: Jason Tetrault (airsquig)
Date: 2004-01-09 15:56

Message:
Logged In: YES
user_id=934522

Hello Thomas,

I have made the configuration change and run a simple test of it. All appears well with the simple test. Hopefully, I will be able to run some more aggressive testing later. I will update if I see any issues.

Thanks Again
Jason

----------------------------------------------------------------------

Comment By: Thomas Peuss (tpeuss)
Date: 2004-01-09 13:46

Message:
Logged In: YES
user_id=507779

Sorry, I forgot to explain the new feature.
You have to change the following in jbossweb-tomcat41.sar/META-INF/jboss-service.xml:

<attribute name="UseLocalCache">false</attribute>

This forces replication and loading of the session after every request.

CU
Thomas

----------------------------------------------------------------------

Comment By: Jason Tetrault (airsquig)
Date: 2004-01-08 14:18

Message:
Logged In: YES
user_id=934522

Hello Thomas,

Thank you for your effort. I ran the same test again against yesterday's nightly build and I am seeing the same issue. I have not yet had a chance to look at the code. Is there possibly a new configuration setting I need?

Regards,
Jason

----------------------------------------------------------------------

Comment By: Thomas Peuss (tpeuss)
Date: 2004-01-04 11:55

Message:
Logged In: YES
user_id=507779

I committed the changes today. Please give CVS Branch_3_2 a try.

CU
Thomas

----------------------------------------------------------------------

Comment By: Thomas Peuss (tpeuss)
Date: 2003-12-29 18:27

Message:
Logged In: YES
user_id=507779

I am working on a solution for this.

CU
Thomas

----------------------------------------------------------------------

Comment By: Jason Tetrault (airsquig)
Date: 2003-12-29 15:43

Message:
Logged In: YES
user_id=934522

Hello All,

I was wondering if there was any update on this?

- Jason

----------------------------------------------------------------------

Comment By: Jason Tetrault (airsquig)
Date: 2003-12-21 17:56

Message:
Logged In: YES
user_id=934522

Hello Sacha, Thomas,

My apologies. The requirements I speak of are system requirements that I am trying to help our customer meet, not necessarily J2EE requirements. The requirement was a clustered application that works without sticky sessions. I just wanted to note that I have helped deploy and develop a few large-scale web systems (not necessarily with JBoss) and this is not the first time I have seen this requirement. More of a planning note for JBoss.
I like the technology and wanted to help make this as competitive as possible. At least the choice would be nice, be it a bug or a feature request.

Regards,
Jason

----------------------------------------------------------------------

Comment By: Thomas Peuss (tpeuss)
Date: 2003-12-21 17:41

Message:
Logged In: YES
user_id=507779

Hello Sacha,

now I got the point. I will change the clustering code to look up the session in the clustering code in every situation. Should I apply this to 3.2 or HEAD first?

CU
Thomas

----------------------------------------------------------------------

Comment By: Sacha Labourey (slaboure)
Date: 2003-12-21 16:22

Message:
Logged In: YES
user_id=95900

> It is not uncommon for a highly redundant, large scale
> deployment to expect this (Of any technology, not just J2EE)
> from clustered technologies. I am working on a customer

What are these "requirements"? Your posts (this and the first one) fail to state them clearly. There is a bug in the code (in that we don't always use the latest known information in the case of dual network failure, flip-flap effect), ok, but I don't see the "feature requirement" you mention.

----------------------------------------------------------------------

Comment By: Jason Tetrault (airsquig)
Date: 2003-12-21 16:09

Message:
Logged In: YES
user_id=934522

Hello All,

Two notes. Looking at the code, I had the same worry Sacha did. This approach will only work if there is a Tomcat container failure and restart. It will cause faulty data if failures result from network blips or overloaded machines.

If this is the way HTTP clustering is going to work, this at least needs to be highlighted in the JBoss Cluster document. It is not uncommon for a highly redundant, large-scale deployment to expect this (of any technology, not just J2EE) from clustered technologies. I am working on a customer system right now for which this is a requirement. I also believe it is common that J2EE application servers support this.
Remember, many times you have a hardware-based load balancer in the mix as well, with replicated web servers (which, yes, you can configure for sticky sessions). The question really is what happens to the load AFTER the failure with many users in a sticky-session environment.

Cheers
Jason

----------------------------------------------------------------------

Comment By: Sacha Labourey (slaboure)
Date: 2003-12-21 15:41

Message:
Logged In: YES
user_id=95900

Hello Thomas,

No, you shouldn't redirect back, but a second failure could make it go back to the first node, and it wouldn't use the last known state. This is what is described in the first scenario.

If your HashMap is just holding *my* session, then simply drop it, as the clustering code keeps both a serialized and a non-serialized representation of the session, so it won't cost much.

Why do you think that the Tomcat clustering code was never designed to work without sticky sessions? I mean, which cases wouldn't work? As we can enable synchronous replication, it should be ok IMHO (except for concurrent updates, but in that case the spec is dumb anyway).

Cheers,
sacha

----------------------------------------------------------------------

Comment By: Thomas Peuss (tpeuss)
Date: 2003-12-21 15:01

Message:
Logged In: YES
user_id=507779

Sacha,

I see the problem of short network hangs leading to a failover to another node. But why should I redirect a client back to his "old" node after it comes up again? The session on the old node is dead and will be removed after some time.

The local session cache is there because the Tomcat session manager I derived the clustering session manager from has a HashMap where it holds its sessions. My understanding of the clustering code is that it holds the serialized sessions and I have to deserialize them on access (which is costly - but what do I tell you ;-) ). If this is no longer the case we can remove the local session cache and use the cluster cache on every access.
I think this is more straightforward anyway. But this is still no bug, because the Tomcat clustering code was never designed to work without sticky sessions.

CU
Thomas

----------------------------------------------------------------------

Comment By: Sacha Labourey (slaboure)
Date: 2003-12-21 07:44

Message:
Logged In: YES
user_id=95900

I don't agree, Thomas: while not using sticky sessions is not really the best thing to do (what if you have concurrent (frame) requests to the same session?), the described problem can occur even in sticky-session situations if we have a set of minor network outages between the LB and the JBoss boxes.

What is the purpose of this local session cache anyway? We already have the cache at the EJB level for HTTP sessions, so why do we need another one? Is it because its creation is costly?

----------------------------------------------------------------------

Comment By: Thomas Peuss (tpeuss)
Date: 2003-12-20 11:03

Message:
Logged In: YES
user_id=507779

The JBoss HTTP clustering ONLY works with sticky sessions, for performance reasons. Maybe we can introduce a configuration option that allows use in a round-robin fashion (I am still wondering why someone wants to do this). So this is not a bug. You can add this as a feature request.

CU
Thomas

----------------------------------------------------------------------