Re: Possible AbstractProtocol.waitingProcessors leak in Tomcat 9.0.75

2023-12-05 Thread Jakub Remenec
Hi,

I've experienced the same issue as described, on Apache Tomcat 10.1.13.
After downgrading to 10.1.5 it started to work correctly. I also inspected
a heap dump of the application with the memory problems and found that
there were many org.apache.tomcat.websocket.WsSession instances in the
OUTPUT_CLOSED state. When I tried locally, I found that when I open a few
websocket connections from Chrome and then go to Offline mode, the
WsSessions remain in the OUTPUT_CLOSED state. New connections made
afterwards have state OPEN. In the heap dump from production I saw around
4600 WsSessions, but only 40 were open; the rest were in the OUTPUT_CLOSED
state. The WsSessions are reachable through the
org.apache.coyote.AbstractProtocol -> waitingProcessors set, and in the
heap dump it was clearly visible that 49% of the heap was retained by that
set. When Tomcat was downgraded to 10.1.5, I saw that the WsSessions got
cleared after going to Offline mode.
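To make the suspected mechanism concrete, here is a minimal sketch. This is
not Tomcat's actual code; the class and method names are illustrative. It
only models how entries can get stranded in a waitingProcessors-style set
when the peer disappears before the close handshake completes:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative model only, not Tomcat source. */
class WaitingProcessorsModel {
    enum State { OPEN, OUTPUT_CLOSED, CLOSED }

    static class Processor { State state = State.OPEN; }

    final Set<Processor> waitingProcessors = ConcurrentHashMap.newKeySet();

    /** Connection upgraded to WebSocket: the processor is parked in the set. */
    Processor upgrade() {
        Processor p = new Processor();
        waitingProcessors.add(p);
        return p;
    }

    /** Server starts the close handshake: only the output side closes. */
    void serverClose(Processor p) {
        p.state = State.OUTPUT_CLOSED;
    }

    /** Client's close frame arrives: the only removal path in this model. */
    void clientCloseFrameReceived(Processor p) {
        p.state = State.CLOSED;
        waitingProcessors.remove(p);
    }
}
```

If the client goes offline after serverClose(), clientCloseFrameReceived()
never runs in this model, and the OUTPUT_CLOSED processor stays in the set
indefinitely, which would match what the heap dump shows.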

Additional info: I've set the session timeout to 10 minutes. The app uses
a Java 17 and Spring Boot 3.1.x stack. It does not use any external STOMP
broker relay.

Regards,
Jakub.

On 2023/08/20 22:44:46 Mark Thomas wrote:
> On 20/08/2023 05:21, Mark Thomas wrote:
> > On 18/08/2023 11:28, Rubén Pérez wrote:
>
> 
>
> >> I started experiencing exactly the same issue when updating from Spring
> >> 6.0.7 to 6.0.9, thereby updating Tomcat from 10.1.5 to 10.1.8. The
> >> memory leak is very clearly visible in my monitoring tools. A further
> >> heap dump reveals many times more entries in the waitingProcessors map
> >> than real active connections, and we end up with around 8 GB retained
> >> in memory, full of those entries.
> >>
> >> I believe I have found a way to reproduce the issue locally. Open a
> >> websocket session from a client in Chrome, go to dev-tools and switch
> >> the tab to offline mode, wait > 50 secs, then switch it back to No
> >> Throttling. Sometimes I get an error back to the client like:
> >>
> >> a["ERROR\nmessage:AMQ229014\\c Did not receive data from
> >> /192.168.0.1\\c12720
> >> within the 5ms connection TTL. The connection will now be
> >> closed.\ncontent-length:0\n\n\u"]
> >>
> >> And other times I get instead something like c[1002, ""] from Artemis
> >> followed by an "Invalid frame header" error from Chrome (websockets
> >> view in
> >> dev-tools).
> >>
> >> Only in the latter case does it look to leak entries in that map.
> >> Maybe it is a coincidence or not, but that is what I have observed at
> >> least 2 times.
> >>
> >> After the error appeared, I waited long enough for the front end to
> >> reconnect the session, and then I just quit Chrome.
> >
> > Thanks for the steps to reproduce. That is helpful. I'll let you know
> > how I get on.
>
> Unfortunately, I didn't get very far. Based on the log messages it looks
> very much like those are application generated rather than Tomcat
> generated.
>
> At this point I am wondering if this is an application or a Tomcat
> issue. I'm going to need a sample application (ideally as cut down as
> possible) that demonstrates the issue to make progress on this.
>
> Another option is debugging this yourself to figure out what has
> changed. I can provide some pointers if this is of interest. Given you
> can repeat the issue reasonably reliably, tracking down the commit that
> triggered the change isn't too hard.
>
> >> Again, after forcefully downgrading Tomcat 10.1.8 to 10.1.5 while
> >> preserving the same Spring version, the issue is gone (confirmed in
> >> production), in fact I have never managed to get an "Invalid frame
> >> header"
> >> in Chrome again with Tomcat 10.1.5 (in about 10 attempts). Before, I
> >> got it in 2 out of 4 attempts.
> >
> > Could you do some further testing and see if you can narrow down
> > exactly which version (10.1.6, 10.1.7 or 10.1.8) the issue first
> > appears in?
> >
> > It would also be helpful to confirm if the issue is still present in
> > 10.1.12.
>
> Answers to the above would still be helpful.
>
> Mark
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
>
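
For anyone trying to reproduce the abrupt-disconnect scenario described
above without Chrome, the JDK's built-in WebSocket client can simulate it:
WebSocket.abort() tears down the TCP connection without sending a close
frame, which is roughly what Offline mode does. This is a self-contained
sketch under assumptions, not a statement about Tomcat internals; the tiny
in-process handshake server exists only so the snippet runs on its own (in
a real test, the URI would point at the affected Tomcat endpoint).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class AbruptDisconnectDemo {

    /** Returns the first byte the server reads after the handshake:
     *  0x88 would be a clean close frame; a negative value means the
     *  peer simply vanished (EOF or connection reset). */
    static int run() throws Exception {
        ServerSocket server = new ServerSocket(0);
        final int[] firstByte = { -2 };
        Thread serverThread = new Thread(() -> {
            try (Socket s = server.accept()) {
                InputStream in = s.getInputStream();
                // Read the HTTP upgrade request up to the blank line.
                ByteArrayOutputStream req = new ByteArrayOutputStream();
                int b;
                while ((b = in.read()) != -1) {
                    req.write(b);
                    if (req.toString(StandardCharsets.ISO_8859_1).endsWith("\r\n\r\n")) break;
                }
                String key = null;
                for (String line : req.toString(StandardCharsets.ISO_8859_1).split("\r\n")) {
                    if (line.toLowerCase().startsWith("sec-websocket-key:")) {
                        key = line.substring(line.indexOf(':') + 1).trim();
                    }
                }
                // RFC 6455 accept token: SHA-1 of key + fixed GUID, Base64 encoded.
                String accept = Base64.getEncoder().encodeToString(
                        MessageDigest.getInstance("SHA-1").digest(
                                (key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11")
                                        .getBytes(StandardCharsets.ISO_8859_1)));
                s.getOutputStream().write((
                        "HTTP/1.1 101 Switching Protocols\r\n"
                        + "Upgrade: websocket\r\n"
                        + "Connection: Upgrade\r\n"
                        + "Sec-WebSocket-Accept: " + accept + "\r\n\r\n")
                        .getBytes(StandardCharsets.ISO_8859_1));
                s.getOutputStream().flush();
                try {
                    firstByte[0] = in.read();   // clean close would deliver 0x88 here
                } catch (IOException e) {
                    firstByte[0] = -1;          // connection reset also counts as abrupt
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        serverThread.start();

        WebSocket ws = HttpClient.newHttpClient().newWebSocketBuilder()
                .buildAsync(URI.create("ws://127.0.0.1:" + server.getLocalPort() + "/"),
                        new WebSocket.Listener() { })
                .join();
        ws.abort();                     // drop TCP without a WebSocket close frame
        serverThread.join(5000);
        server.close();
        return firstByte[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("first byte after handshake: " + run());
    }
}
```

Pointing the buildAsync() URI at the affected application instead, and
running this in a loop, might make it easier to bisect the Tomcat version
where the waitingProcessors entries stop being cleaned up.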

