On Mon, Oct 12, 2009 at 16:08, Glen Daniels <g...@thoughtcraft.com> wrote:
> Hi folks!
>
> OK, so here are the results of my weekend investigations.  The lockup when
> running the Rampart 1.5 tests with Axis2 1.5.1 was due to http connection
> starvation.  I've fixed two issues and everything works now, but I'd like to
> respin both Axis2 1.5.1 and Rampart 1.5 as a result.  Details below.
>
> First, a quick summary of a major change in Axis2 1.5.1: we were formerly
> creating new MultiThreadedHttpConnectionManagers all the time in the HTTP
> sender code.  In typical usage you'd never see connection pool starvation
> (since each new MHCM had a new pool), but two major problems occurred.  1)
> Connection reuse wasn't really possible, and 2) we would eventually (in
> high-volume situations) run into the OS limits for open sockets.  So I fixed
> this so that 1.5.1 now re-uses a single MHCM for each ConfigurationContext,
> which allows for sharing connections across ServiceClient instances.
>
> The bigger problem *behind* the problem above is that users of the Commons
> HttpClient library (like Axis2) need to call releaseConnection() on each and
> every HttpMethod after they are finished with it.  The
> ServiceClient.cleanupTransport() call does this, but since we never told
> people to call that explicitly,

Well, I did :-) See [1] and [2].

Andreas

[1] http://markmail.org/message/c7wqfwzl23qrheic
[2] http://svn.apache.org/viewvc?view=rev&revision=748730

> no one was in the habit of doing it.  A
> number of bugs about connection starvation came up, and we put in the
> Options.setCallTransportCleanup() option, which automatically calls
> cleanupTransport() after each call, but at a cost - since we're releasing
> connection resources you need to make sure you've read everything, which
> means building the whole Axiom tree.  Bye-bye, streaming.  So I also added a
> different connection cleanup option which automatically cleans up the *last*
> operation as you're setting up the next one.
>
> So, to make the Rampart story very short, the problem was this: a new
> ServiceClient gets created to deal with SecureConversation interactions (see
> STSClient.getServiceClient()).  This SC shares the same ConfigurationContext
> with the outer (i.e. user) SC, so it shares a MHCM and a connection pool.
> The problem is that, since the STS operations happen inside a user-level
> operation, the record of the "last operation" gets overwritten, and as a
> result my automatic cleanup mechanism can't catch both!  So we lose one
> connection each time we go through the STS process, and once the pool is
> empty that causes a hard lock.
>
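
To make the leak described above concrete, here is a toy model in plain Java. It is only a sketch and does not use any Axis2 or Rampart classes: Conn, SharedContext, Client and leakedAfter are hypothetical names, and the ordering (the nested call runs before the outer call records its own connection) is an assumption, since the message does not say where the "last operation" record lives. The exhausted pool throws here so the demo terminates; a real pool without a timeout would block instead.

```java
public class LastOperationLeakDemo {

    /** Toy stand-in for a pooled HTTP connection. */
    static final class Conn {
        boolean released;
    }

    /** Toy stand-in for the shared context: one pool counter, one "last operation" slot. */
    static final class SharedContext {
        final int poolSize;
        int inUse;
        Conn last; // single record, shared by every client on this context

        SharedContext(int poolSize) {
            this.poolSize = poolSize;
        }

        Conn acquire() {
            if (inUse == poolSize) {
                // Assumption: throw instead of blocking, so the demo terminates.
                throw new IllegalStateException("pool exhausted");
            }
            inUse++;
            return new Conn();
        }

        void release(Conn c) {
            if (c != null && !c.released) {
                c.released = true;
                inUse--;
            }
        }
    }

    /** Hypothetical client that lazily cleans up the previous operation. */
    static final class Client {
        final SharedContext ctx;

        Client(SharedContext ctx) {
            this.ctx = ctx;
        }

        void call(Runnable beforeSend) {
            ctx.release(ctx.last);    // automatic cleanup of the *last* operation
            ctx.last = null;
            beforeSend.run();         // nested work, e.g. a token request
            ctx.last = ctx.acquire(); // overwrites whatever the nested call recorded
        }
    }

    /** Runs user-level calls, each with one nested call; returns the number of leaked connections. */
    static int leakedAfter(int calls, int poolSize) {
        SharedContext ctx = new SharedContext(poolSize);
        Client user = new Client(ctx);
        Client nested = new Client(ctx);
        for (int i = 0; i < calls; i++) {
            user.call(() -> nested.call(() -> { }));
        }
        // The user's most recent connection is still legitimately held; the rest are leaks.
        return Math.max(0, ctx.inUse - 1);
    }

    public static void main(String[] args) {
        System.out.println("leaked after 3 calls: " + leakedAfter(3, 10));
    }
}
```

In this model each user-level call strands exactly one nested connection, so a pool of any fixed size runs dry after enough calls.
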
> SOLUTION
> --------
>
> I did two things to fix this, both of which I think should be reflected in
> the released code.  First, in Rampart, I added a call to
> setCallTransportCleanup(true) in STSClient - this means that the STS
> operations will be forced to build the complete Axiom tree (see above), but
> solves the connection starvation issue.  Second, in Axis2, I added a default
> 30-second timeout while waiting for new connections - this doesn't change the
> functionality at all, but it does mean that we can no longer get into
> situations where the system just locks up forever.  With that change, we'll
> now at least get an Exception if there's a starvation issue, which can then
> be debugged.
>
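
The effect of a bounded wait can be sketched with nothing but java.util.concurrent. This is not the Axis2 change itself, just an illustration of the idea under stated assumptions: Pool and timesOutWhenStarved are hypothetical names, a Semaphore stands in for the connection pool, and the timeout is shortened from 30 seconds to a few milliseconds so the demo finishes quickly.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedPoolDemo {

    /** Toy pool: a semaphore with one permit per connection and a bounded wait. */
    static final class Pool {
        private final Semaphore permits;
        private final long timeoutMillis;

        Pool(int size, long timeoutMillis) {
            this.permits = new Semaphore(size);
            this.timeoutMillis = timeoutMillis;
        }

        /** Waits at most timeoutMillis for a free connection, then gives up with an exception. */
        void acquire() throws TimeoutException {
            try {
                if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    throw new TimeoutException(
                            "no connection available after " + timeoutMillis + " ms");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException("interrupted while waiting for a connection", e);
            }
        }

        void release() {
            permits.release();
        }
    }

    /** Starves a one-connection pool and reports whether the second acquire timed out. */
    static boolean timesOutWhenStarved(long timeoutMillis) {
        Pool pool = new Pool(1, timeoutMillis);
        try {
            pool.acquire(); // takes the only connection and never releases it
            pool.acquire(); // without a timeout this would block forever
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOutWhenStarved(50));
    }
}
```

The timeout does not fix the leak; it only turns a silent hang into an exception with a stack trace pointing at the caller that could not get a connection.
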
> Nandana/all, can you check what I did in Rampart and let me know if you
> foresee any problems with it?  I'm going to respin Axis2 1.5.1 with this and
> one other fix, and we should respin Rampart 1.5 as well.
>
> Thoughts/comments?
>
> Thanks,
> --Glen
>
