[jira] [Commented] (AMQ-6937) Recycling TCP/IP stack on z/OS causes an infinite error loop in transport server

Tomas Pavelka (JIRA) Tue, 17 Apr 2018 23:45:04 -0700

    [ 
https://issues.apache.org/jira/browse/AMQ-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441979#comment-16441979
 ]


Tomas Pavelka commented on AMQ-6937:
------------------------------------

Thanks, I see it now. The number of open sockets will be capped by the code in 
org.apache.activemq.transport.tcp.TcpTransportServer#doHandleSocket, so on Linux 
you are unlikely to run into the spin loop, and even if you do, it can be fixed 
by setting the limits correctly.

That would make this a z/OS-only problem. In our installation the problem is 
quite common because people often recycle the TCP/IP stack to make 
configuration changes, which triggers the infinite spin loop.

Unfortunately, I can't think of any other way to handle this, and if I don't get 
the fix into mainline ActiveMQ I will have to run a forked version, which is 
something I would very much like to avoid...
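
To make the spin loop discussed here concrete, below is a minimal sketch of an 
accept loop that backs off after consecutive accept failures instead of 
retrying at full speed. It uses only java.net and is not the code from the 
attached patch or from TcpTransportServer; the class name AcceptBackoff, the 
helper delayMillis and the 100 ms / 5000 ms values are assumptions made for 
illustration.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Consumer;

// Hypothetical illustration, not ActiveMQ code.
public class AcceptBackoff {

    // Delay before the next accept attempt: 0 when there were no failures,
    // otherwise baseMillis doubled per consecutive failure, capped at maxMillis.
    public static long delayMillis(int consecutiveFailures, long baseMillis, long maxMillis) {
        if (consecutiveFailures <= 0) {
            return 0;
        }
        int shift = Math.min(consecutiveFailures - 1, 20); // bound the shift to avoid overflow
        return Math.min(maxMillis, baseMillis << shift);
    }

    // Accept connections until the server socket is closed. A failed accept
    // sleeps for an increasing delay, so a persistently failing socket no
    // longer spins the CPU or floods the log.
    public static void acceptLoop(ServerSocket server, Consumer<Socket> handler)
            throws InterruptedException {
        int failures = 0;
        while (!server.isClosed()) {
            try {
                Socket socket = server.accept();
                failures = 0; // a successful accept resets the backoff
                handler.accept(socket);
            } catch (IOException e) {
                if (server.isClosed()) {
                    break; // normal shutdown
                }
                failures++;
                // Assumed values: start at 100 ms, never wait longer than 5 s.
                Thread.sleep(delayMillis(failures, 100, 5000));
            }
        }
    }
}
```

A backoff like this only limits the CPU and log damage. It does not make the 
socket usable again once the TCP/IP stack has been recycled.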

 

> Recycling TCP/IP stack on z/OS causes an infinite error loop in transport 
> server
> --------------------------------------------------------------------------------
>
>                 Key: AMQ-6937
>                 URL: https://issues.apache.org/jira/browse/AMQ-6937
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.15.3
>            Reporter: Tomas Pavelka
>            Priority: Major
>         Attachments: AMQ-6937-cpu-spin-prevention.patch
>
>
> The ActiveMQ transport servers (e.g. TcpTransportServer) run the socket 
> accept (java.net.ServerSocket#accept) in an infinite loop. The accept call 
> can repeatedly fail with an exception, spinning the CPU at full speed and 
> filling up logs quickly.
> Here is an example of an exception that gets repeated indefinitely:
> java.net.SocketException: EDC5122I Input/output error. (Accept failed)
>     at java.net.ServerSocket.implAccept(ServerSocket.java:623)
>     at java.net.ServerSocket.accept(ServerSocket.java:582)
>     at org.apache.activemq.transport.tcp.TcpTransportServer.run(TcpTransportServer.java:351)
>     at java.lang.Thread.run(Thread.java:785)
> This is a common problem on z/OS because the pattern of running accept in a 
> loop is used in many open source projects. For example, here is the same 
> issue in Derby:
> https://issues.apache.org/jira/browse/DERBY-5347
> And here in Jetty:
> [https://github.com/eclipse/jetty.project/issues/283]
> Whenever the problem appears the socket becomes unusable. Would it be 
> possible for ActiveMQ to allow inserting a custom 
> org.apache.activemq.transport.TransportAcceptListener that would detect the 
> problem and re-bind the socket?
>  
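
The re-bind asked for in the description could look roughly like the sketch 
below: close the broken listening socket and open a new one on the same 
address and port. This is plain java.net and not ActiveMQ's 
TransportAcceptListener API; the class RebindingAcceptor and its methods are 
hypothetical names, and whether a re-bind actually recovers the listener after 
a z/OS stack recycle is assumed here, not verified.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration, not ActiveMQ code.
public class RebindingAcceptor {
    private final InetSocketAddress address; // address with the resolved port
    private ServerSocket server;

    public RebindingAcceptor(InetSocketAddress requested) throws IOException {
        this.server = open(requested);
        // Remember the real port so that a request for port 0 (ephemeral)
        // is re-bound to the same port later.
        this.address = new InetSocketAddress(requested.getAddress(), server.getLocalPort());
    }

    private static ServerSocket open(InetSocketAddress addr) throws IOException {
        ServerSocket s = new ServerSocket(); // unbound socket
        s.setReuseAddress(true);             // allow a quick re-bind of the same port
        s.bind(addr);
        return s;
    }

    public synchronized ServerSocket current() {
        return server;
    }

    // Called by the accept loop when accept() keeps failing: drop the old
    // socket and listen again on the same address and port.
    public synchronized ServerSocket rebind() throws IOException {
        try {
            server.close();
        } catch (IOException ignored) {
            // The old socket is already unusable; nothing more to do with it.
        }
        server = open(address);
        return server;
    }
}
```

An accept loop would call rebind() after a configurable number of consecutive 
accept failures and continue accepting on the returned socket.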



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
