On Mon, Oct 12, 2009 at 16:08, Glen Daniels <g...@thoughtcraft.com> wrote:
> Hi folks!
>
> OK, so here are the results of my weekend investigations. The lockup when
> running the Rampart 1.5 tests with Axis2 1.5.1 was due to HTTP connection
> starvation. I've fixed two issues and everything works now, but I'd like to
> respin both Axis2 1.5.1 and Rampart 1.5 as a result. Details below.
>
> First, a quick summary of a major change in Axis2 1.5.1: we were formerly
> creating new MultithreadedHTTPConnectionManagers all the time in the HTTP
> sender code. In typical usage you'd never see connection pool starvation
> (since each new MHCM had a new pool), but two major problems occurred. 1)
> Connection reuse wasn't really possible, and 2) we would eventually (in
> high-volume situations) run into the OS limits for open sockets. So I fixed
> this so that 1.5.1 now re-uses a single MHCM for each ConfigurationContext,
> which allows for sharing connections across ServiceClient instances.
>
> The bigger problem *behind* the problem above is that users of the commons
> HTTPClient library (like Axis2) need to call releaseConnection() on each and
> every HTTPMethod after they are finished. The
> ServiceClient.cleanupTransport() call does this, but since we never told
> people to call that explicitly,

Well, I did :-) See [1] and [2].

Andreas

[1] http://markmail.org/message/c7wqfwzl23qrheic
[2] http://svn.apache.org/viewvc?view=rev&revision=748730

> no one was in the habit of doing it. A
> number of bugs about connection starvation came up, and we put in the
> Options.setCallTransportCleanup() option, which automatically calls
> cleanupTransport() after each call, but at a cost - since we're releasing
> connection resources you need to make sure you've read everything, which
> means building the whole Axiom tree. Bye-bye, streaming. So I also added a
> different connection cleanup option which automatically cleans up the *last*
> operation as you're setting up the next one.
>
> So, to make the Rampart story very short, the problem was this: a new
> ServiceClient gets created to deal with SecureConversation interactions (see
> STSClient.getServiceClient()). This SC shares the same ConfigurationContext
> with the outer (i.e. user) SC, so it shares a MHCM and a connection pool.
> The problem is that since the STS operations happen inside a user-level
> operation, the record of the "last operation" gets overwritten, and as a
> result my automatic cleanup mechanism can't catch both! So we lose one
> connection each time we go through the STS process, and that causes a hard
> lock.
>
> SOLUTION
> --------
>
> I did two things to fix this, both of which I think should be reflected in
> the released code. First, in Rampart, I added a call to
> setCallTransportCleanup(true) in STSClient - this means that the STS
> operations will be forced to build the complete Axiom tree (see above), but
> solves the connection starvation issue. Second, in Axis2, I added a default
> 30-second timeout while waiting for new connections - this doesn't change the
> functionality at all, but it does mean that we can no longer get into
> situations where the system just locks up forever.
> With that change, we'll now at least get an Exception if there's a
> starvation issue, which can then be debugged.
>
> Nandana/all, can you check what I did in Rampart and let me know if you
> foresee any problems with it? I'm going to respin Axis2 1.5.1 with this and
> one other fix, and we should respin Rampart 1.5 as well.
>
> Thoughts/comments?
>
> Thanks,
> --Glen
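
For anyone who wants to see the failure mode in isolation, here is a minimal sketch of what Glen describes: a shared, bounded pool where a caller that never releases its connection starves later callers, and a wait timeout turns the resulting hang into an exception. It deliberately uses only java.util.concurrent and not the Axis2 or Commons HttpClient APIs. BoundedPool, run() and the 50 ms timeout are hypothetical names and values chosen for illustration (the default Glen mentions is 30 seconds).

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationSketch {

    /** Hypothetical stand-in for one shared, bounded connection pool. */
    static final class BoundedPool {
        private final Semaphore permits;
        private final long timeoutMillis;

        BoundedPool(int maxConnections, long timeoutMillis) {
            this.permits = new Semaphore(maxConnections);
            this.timeoutMillis = timeoutMillis;
        }

        /** Waits for a free connection, but only up to the configured timeout. */
        void acquire() throws TimeoutException, InterruptedException {
            if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new TimeoutException("timed out waiting for a connection");
            }
        }

        /** Hands the connection back to the pool (the "cleanup" step). */
        void release() {
            permits.release();
        }

        int available() {
            return permits.availablePermits();
        }
    }

    /**
     * Performs up to 'calls' request cycles against the pool and returns how
     * many completed. With cleanup == false each cycle leaks its connection,
     * so the pool eventually runs dry and acquire() times out.
     */
    static int run(BoundedPool pool, int calls, boolean cleanup) {
        int completed = 0;
        try {
            for (int i = 0; i < calls; i++) {
                pool.acquire();
                try {
                    completed++; // pretend to send a request and read the reply
                } finally {
                    if (cleanup) {
                        pool.release();
                    }
                }
            }
        } catch (TimeoutException e) {
            // Starvation shows up as an exception instead of a permanent hang.
            System.out.println("starved after " + completed + " calls: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed;
    }

    public static void main(String[] args) {
        BoundedPool leaky = new BoundedPool(2, 50);
        System.out.println("without cleanup: " + run(leaky, 5, false) + " of 5 completed");

        BoundedPool tidy = new BoundedPool(2, 50);
        System.out.println("with cleanup: " + run(tidy, 5, true) + " of 5 completed");
    }
}
```

Without the timeout (a plain blocking acquire), the leaking case would simply block on the third call, which matches the hard lock seen in the Rampart tests. Releasing in a finally block is the analogue of calling cleanupTransport() after each operation.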