[ 
https://issues.apache.org/jira/browse/HTTPCORE-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737385#comment-17737385
 ] 

Malay Shah commented on HTTPCORE-753:
-------------------------------------

Hi [~olegk], I am working to capture the logs you asked for. I haven't been 
able to reproduce the issue locally; we ran into it with some frequency on our 
CI/CD servers, which were under more stress. I'm still trying to get you 
that information, but I do have the call stack from the CancelledKeyException 
we caught on the server (see below).

Looking at the code more, I'm also not sure that swapping those two lines of 
code will solve the issue. It looks like, to get into this situation, the 
IOSessionImpl must still be open (not closed) while its SelectionKey is 
already invalid.


{{09:25:16 org.opentest4j.AssertionFailedError: failed with errorjava.nio.channels.CancelledKeyException at:}}
{{09:25:16 java.nio.channels.CancelledKeyException}}
{{09:25:16 at java.base/sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:71)}}
{{09:25:16 at java.base/sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:96)}}
{{09:25:16 at org.apache.hc.core5.reactor.IOSessionImpl.setEvent(IOSessionImpl.java:168)}}
{{09:25:16 at org.apache.hc.core5.reactor.InternalDataChannel.setEvent(InternalDataChannel.java:335)}}
{{09:25:16 at org.apache.hc.core5.http.impl.nio.AbstractHttp1StreamDuplexer$CapacityWindow.update(AbstractHttp1StreamDuplexer.java:625)}}
{{09:25:16 at com.company.webapi.CorrectedCapacityTrackingGrowthSharedInputBuffer.incrementCapacity(CorrectedCapacityTrackingGrowthSharedInputBuffer.java:125)}}
{{09:25:16 at com.company.webapi.CorrectedCapacityTrackingGrowthSharedInputBuffer.read(CorrectedCapacityTrackingGrowthSharedInputBuffer.java:197)}}
{{09:25:16 at org.apache.hc.core5.http.nio.support.classic.ContentInputStream.read(ContentInputStream.java:57)}}
{{09:25:16 at com.company.webapi.InputStreamClassicResponseConsumer$CheckForErrorInputStream.lambda$read$1(InputStreamClassicResponseConsumer.java:280)}}
{{09:25:16 at com.company.webapi.InputStreamClassicResponseConsumer$CheckForErrorInputStream.callWhileCheckingForErrors(InputStreamClassicResponseConsumer.java:254)}}
{{09:25:16 at com.company.webapi.InputStreamClassicResponseConsumer$CheckForErrorInputStream.read(InputStreamClassicResponseConsumer.java:280)}}
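
The two top frames of the trace (SelectionKeyImpl.ensureValid and SelectionKeyImpl.interestOps) can be illustrated with plain java.nio, without HttpCore. The following is only a minimal sketch: it uses a Pipe as a stand-in for the socket channel, and the class and method names are hypothetical. It shows the state described above, where the channel is still open but its SelectionKey has been cancelled, so interestOps(int) throws CancelledKeyException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class; not part of HttpCore.
class CancelledKeyDemo {

    /**
     * Registers an open channel, cancels its key, then tries to change the
     * interest set. Returns true if the channel was still open, the key was
     * invalid, and interestOps(int) threw CancelledKeyException.
     */
    static boolean interestOpsFailsAfterCancel() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);

                // Cancel only the key; the channel itself is left open.
                key.cancel();
                boolean channelStillOpen = pipe.source().isOpen();

                try {
                    key.interestOps(SelectionKey.OP_READ);
                    return false; // no exception: not the state seen in the trace
                } catch (CancelledKeyException expected) {
                    return channelStillOpen && !key.isValid();
                }
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("interestOps failed on cancelled key: " + interestOpsFailsAfterCancel());
    }
}
```

This only reproduces the JDK-level behaviour; it does not show how HttpCore ends up cancelling the key while the session is still considered open.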

> race condition with CapacityWindow resulting in CancelKeyException
> ------------------------------------------------------------------
>
>                 Key: HTTPCORE-753
>                 URL: https://issues.apache.org/jira/browse/HTTPCORE-753
>             Project: HttpComponents HttpCore
>          Issue Type: Bug
>          Components: HttpCore
>    Affects Versions: 5.1.3
>            Reporter: Malay Shah
>            Priority: Major
>
> We have found a race condition in AbstractHttp1StreamDuplexer where the 
> IOSessionImpl is closed (cancelling its SelectionKey) before the 
> CapacityWindow (which also has a reference to the IOSession) is closed. This 
> creates a window of opportunity: if a ResponseConsumer (something like 
> AbstractClassicEntityConsumer) holds a reference to the CapacityWindow and 
> tries to call update(), a CancelledKeyException is thrown. We are running 
> into this issue fairly regularly.
> Here is what I believe the fix is: in AbstractHttp1StreamDuplexer.onInput the 
> following two lines should be swapped:
> {{dataEnd(contentDecoder.getTrailers());}}
> {{capacityWindow.close();}}
>  
> That is, close the capacityWindow before calling dataEnd, which closes the 
> connection, so that the capacityWindow is always safe to use.
>  
> In case this is helpful, here is the callstack for when the connection is 
> closed and the SelectionKey becomes invalid:
> {{close:266, IOSessionImpl (org.apache.hc.core5.reactor)}}
> {{close:266, InternalDataChannel (org.apache.hc.core5.reactor)}}
> {{close:254, InternalDataChannel (org.apache.hc.core5.reactor)}}
> {{shutdownSession:157, AbstractHttp1StreamDuplexer (org.apache.hc.core5.http.impl.nio)}}
> {{close:116, ClientHttp1StreamDuplexer$1 (org.apache.hc.core5.http.impl.nio)}}
> {{dataEnd:277, ClientHttp1StreamHandler (org.apache.hc.core5.http.impl.nio)}}
> {{dataEnd:366, ClientHttp1StreamDuplexer (org.apache.hc.core5.http.impl.nio)}}
> {{onInput:333, AbstractHttp1StreamDuplexer (org.apache.hc.core5.http.impl.nio)}}
> {{inputReady:64, AbstractHttp1IOEventHandler (org.apache.hc.core5.http.impl.nio)}}
> {{inputReady:39, ClientHttp1IOEventHandler (org.apache.hc.core5.http.impl.nio)}}
> {{onIOEvent:131, InternalDataChannel (org.apache.hc.core5.reactor)}}
> {{handleIOEvent:51, InternalChannel (org.apache.hc.core5.reactor)}}
> {{processEvents:178, SingleCoreIOReactor (org.apache.hc.core5.reactor)}}
> {{doExecute:127, SingleCoreIOReactor (org.apache.hc.core5.reactor)}}
> {{execute:85, AbstractSingleCoreIOReactor (org.apache.hc.core5.reactor)}}
> {{run:44, IOReactorWorker (org.apache.hc.core5.reactor)}}
> {{run:829, Thread (java.lang)}}
>  
> In the interim, is it safe to catch and discard the CancelledKeyException 
> when calling update on the CapacityChannel?
>  
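
For reference, the interim workaround asked about in the quoted description could look roughly like the sketch below. It is an illustration only: the nested CapacityChannel interface is a local stand-in declared so the example compiles on its own (it is assumed to mirror the single update(int) method of HttpCore's CapacityChannel), and GuardedCapacityUpdate and tryUpdate are hypothetical names. Whether discarding the exception is actually safe is the open question in this issue; the sketch only shows the mechanics.

```java
import java.io.IOException;
import java.nio.channels.CancelledKeyException;

// Hypothetical helper class; not part of HttpCore.
class GuardedCapacityUpdate {

    /** Local stand-in (assumption) for HttpCore's CapacityChannel. */
    interface CapacityChannel {
        void update(int increment) throws IOException;
    }

    /**
     * Forwards the capacity increment to the channel. Returns true if the
     * update was applied, false if the underlying SelectionKey had already
     * been cancelled.
     */
    static boolean tryUpdate(CapacityChannel channel, int increment) throws IOException {
        try {
            channel.update(increment);
            return true;
        } catch (CancelledKeyException ex) {
            // The key was cancelled, so the update could not be applied.
            // Report that to the caller instead of propagating the exception.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        CapacityChannel healthy = increment -> { };
        CapacityChannel cancelled = increment -> { throw new CancelledKeyException(); };

        System.out.println(tryUpdate(healthy, 1024));   // prints "true"
        System.out.println(tryUpdate(cancelled, 1024)); // prints "false"
    }
}
```

Returning a boolean rather than silently swallowing the exception lets the caller stop requesting more input once the connection is known to be gone.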



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
