[ https://issues.apache.org/jira/browse/ARTEMIS-4884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864336#comment-17864336 ]
Justin Bertram commented on ARTEMIS-4884:
-----------------------------------------

bq. What puzzles me is why this issue is happening only when using Artemis.

I'm not terribly puzzled by differences in behavior between Artemis and Classic. While they both support the same basic semantics and most of the same protocols, they are implemented in _very_ different ways. This is, of course, why Artemis performs so much better than Classic, especially at scale. I've sometimes seen this increase in performance expose problems in other parts of the environment (e.g. race conditions pop up in client applications with poor thread safety). Something similar may be happening in your environment. Perhaps Artemis needs more CPU, more memory, faster I/O, etc. to deal with the concurrent load you're putting on it. It may be that Classic has a built-in bottleneck, due to the way it's designed, that throttles connection attempts internally and helps it deal with the load.

The more puzzling thing, in my opinion, is why the issue is _so_ specific. It only happens on Linux, using SSL, with a high number of concurrent connection attempts. Furthermore, as far as we know, it only happens _for you_. I can't reproduce it.

Ultimately, is it not possible to simply let the handshake-timeout close these connections and have the application(s) just re-connect? I assume re-connection works in this case, as the timeout is enforcing a crude form of connection throttling. Perhaps set {{handshake-timeout=1000}} so applications don't have to wait as long to reconnect.

bq. Is there some parameter in {{broker.xml}} that I can set or something I can do on the host to nail down the problem?

I don't understand enough about why this is happening to say. You could try [tuning thread pools|https://activemq.apache.org/components/artemis/documentation/latest/thread-pooling.html#server-side-thread-management] and gathering and analyzing thread dumps when you see the timeouts happening.
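For context, the handshake timeout is set as a URL parameter on the acceptor in {{broker.xml}}. A minimal sketch of the suggested change, assuming the acceptor name from the report; the host, port, and keystore values below are placeholders, not taken from the attached {{broker.xml}}, and the exact parameter spelling/units should be checked against the Artemis transport configuration docs:

{noformat}
<acceptors>
   <!-- sketch only: keystore path/password are placeholders -->
   <acceptor name="netty-ssl-acceptor">tcp://0.0.0.0:61617?sslEnabled=true;keyStorePath=/path/to/keystore;keyStorePassword=changeit;handshake-timeout=1000</acceptor>
</acceptors>
{noformat}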
Aside from that, maybe try setting up a fresh Linux environment with a different distribution than you're using now, running on different hardware, or using a different JVM. It seems there's something specific about your current environment that's contributing to the problem. I think if this were a general issue we would have seen reports of it before now; I expect there are many users running Linux & SSL & lots of concurrent connections. In any case, if there is a problem with Artemis we'll almost certainly need to be able to reproduce it before we can fix it.

> java.net.SocketException during multiple parallel SSL connections
> -----------------------------------------------------------------
>
>                 Key: ARTEMIS-4884
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4884
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.35.0
>            Reporter: Liviu Citu
>            Assignee: Justin Bertram
>            Priority: Major
>      Attachments: application.yml, artemis-client.jar, artemis-client.log, artemis_client.zip, artemis_client_executor_service.zip, broker.xml
>
> We are currently in the process of migrating our broker from Classic 5.x to Artemis. We are currently using the CMS C++ client for connecting to the broker *but the issue replicates also with the OpenWire JMS client*. Everything works fine when using a non-SSL setup (on both Windows and Linux) but we have some issues when using SSL on Linux (SSL on Windows is OK).
> The initial problem started with the following exceptions on the client side:
> {noformat}
> 2024-02-22 09:54:37.377 [ERROR] [activemq_connection.cc:336] CMS exception: Channel was inactive for too long:
> FILE: activemq/core/ActiveMQConnection.cpp, LINE: 1293
> FILE: activemq/core/ActiveMQConnection.cpp, LINE: 1371
> FILE: activemq/core/ActiveMQConnection.cpp, LINE: 573{noformat}
> while on the broker side we had:
> {noformat}
> 2024-03-20 12:29:08,700 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) on acceptor "netty-ssl-acceptor" during protocol handshake with /10.21.70.53:33053 has occurred.{noformat}
> To bypass these we have added the following setting to the *netty-ssl-acceptor* acceptor in *broker.xml*: *handshake-timeout=0*
> However, the exceptions we are receiving now are:
> *+CMS client+*
> {noformat}
> 2024-05-22 09:26:40.842 [ERROR] [activemq_connection.cc:348] CMS exception: OpenWireFormatNegotiator::oneway Wire format negotiation timeout: peer did not send his wire format.
> FILE: activemq/core/ActiveMQConnection.cpp, LINE: 1293
> FILE: activemq/core/ActiveMQConnection.cpp, LINE: 1371
> FILE: activemq/core/ActiveMQConnection.cpp, LINE: 573{noformat}
> +*Java client*+
> {noformat}
> jakarta.jms.JMSException: Could not connect to broker URL: ssl://linux_host:61617?keepAlive=true&wireFormat.maxInactivityDuration=0.
> Reason: java.net.SocketException: Broken pipe
> 	at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:49) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:423) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:353) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.activemq.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:245) ~[activemq-client-6.1.2.jar!/:6.1.2]
> .........................................................................
> Caused by: java.net.SocketException: Broken pipe
> 	at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:425) ~[?:?]
> 	at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:445) ~[?:?]
> 	at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:831) ~[?:?]
> 	at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035) ~[?:?]
> 	at java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:345) ~[?:?]
> 	at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1308) ~[?:?]
> 	at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at java.base/java.io.DataOutputStream.flush(DataOutputStream.java:128) ~[?:?]
> 	at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:194) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:336) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:318) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.activemq.transport.WireFormatNegotiator.sendWireFormat(WireFormatNegotiator.java:181) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.activemq.transport.WireFormatNegotiator.sendWireFormat(WireFormatNegotiator.java:84) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.activemq.transport.WireFormatNegotiator.start(WireFormatNegotiator.java:74) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:64) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:64) ~[activemq-client-6.1.2.jar!/:6.1.2]
> 	at org.apache.{noformat}
> The problem replicates under the following conditions:
> * SSL on Linux. The problem does not replicate if a non-SSL configuration is used. It also does not replicate on Windows (regardless of whether SSL or non-SSL is used).
> * Two Artemis instances running on the same Linux host (the problem does not replicate if there is only one Artemis instance running).
> * The problem also replicates if one Artemis broker and one Classic broker instance are running on the same host.
> * *The problem does not replicate with two instances of Classic brokers, so it is specific to the Artemis broker.*
> * When testing with both a Classic broker and an Artemis broker, the client connections using the Classic broker were fine. Only those using the Artemis broker were failing.
> * The Artemis clients are also running on the same host as the broker; basically both client and server run on the same host.
> * There are many connections made to the broker at the same time (25+). If there are only a few, the problem does not happen.
> * Example of a connection URL used by the client (the other instance just uses a different port): *ssl://linux_host:61617?keepAlive=true&wireFormat.MaxInactivityDuration=0*
> * The broker configuration file is attached (I just mangled the SSL details and the name of the host). The other one is similar (different ports).
> When monitoring the successful connections I found that a typical connection took less than 0.5 seconds to succeed. I was unable to find any successful connection that took more than this.
> Looking at the broker logs, we are unable to find any relevant message when the connection fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@activemq.apache.org
For additional commands, e-mail: issues-h...@activemq.apache.org
For further information, visit: https://activemq.apache.org/contact
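Postscript: the "25+ connections at the same time" pattern described in the report (and apparently exercised by the attached artemis_client_executor_service.zip) boils down to firing many connection attempts in parallel from a thread pool. Below is a minimal self-contained sketch of that concurrency pattern only; it uses plain sockets against a local stand-in listener so it can run anywhere. In a real reproduction attempt, each task would instead create a JMS connection against the ssl://linux_host:61617 broker URL; the class and listener here are illustrative, not part of the original report.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelConnectDemo {
    public static void main(String[] args) throws Exception {
        // Local stand-in listener; a real reproduction would target the broker's
        // SSL acceptor (e.g. ssl://linux_host:61617) instead.
        ServerSocket server = new ServerSocket(0);
        Thread acceptor = new Thread(() -> {
            try {
                while (true) { server.accept(); }   // accept and discard
            } catch (IOException ignored) { }       // thrown when server closes
        });
        acceptor.setDaemon(true);
        acceptor.start();

        int attempts = 30;  // the report sees failures starting around 25+ concurrent attempts
        ExecutorService pool = Executors.newFixedThreadPool(attempts);
        AtomicInteger ok = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            futures.add(pool.submit(() -> {
                // With a JMS client this would be factory.createConnection() + start()
                try (Socket s = new Socket("127.0.0.1", server.getLocalPort())) {
                    ok.incrementAndGet();
                } catch (IOException e) {
                    System.out.println("connect failed: " + e);
                }
                return null;
            }));
        }
        for (Future<?> f : futures) f.get();  // wait for all attempts
        pool.shutdown();
        System.out.println("successful connections: " + ok.get() + "/" + attempts);
        server.close();
    }
}
```

Against a local listener all attempts succeed; pointing the same loop at a broker's SSL acceptor is where the report observes handshake timeouts and broken pipes.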