[ 
https://issues.apache.org/jira/browse/ARTEMIS-4884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864336#comment-17864336
 ] 

Justin Bertram commented on ARTEMIS-4884:
-----------------------------------------

bq. What puzzles me is why this issue is happening only when using Artemis.

I'm not terribly puzzled by differences in behavior between Artemis and 
Classic. While they both support the same basic semantics and most of the same 
protocols they are implemented in _very_ different ways. This is, of course, 
why Artemis performs so much better than Classic - especially at scale. I've 
sometimes seen this increase in performance expose problems in other parts of 
the environment (e.g. race conditions pop up in client applications with poor 
thread safety). Something similar may be happening in your environment. Perhaps 
Artemis needs more CPU, more memory, faster IO, etc. to deal with the 
concurrent load you're putting on it. It may be that Classic's design includes 
a built-in bottleneck that throttles connection attempts internally, which 
helps it deal with the load.

The more puzzling thing in my opinion is why it's _so_ specific. The issue only 
happens on Linux, using SSL, with a high number of concurrent connection 
attempts. Furthermore, as far as we know, it only happens _for you_. I can't 
reproduce it.

Ultimately, is it not possible to simply let the handshake-timeout close these 
connections and have the application(s) just re-connect? I assume re-connection 
works in this case as the timeout is enforcing a crude form of connection 
throttling. Perhaps set {{handshake-timeout=1000}} so applications don't have 
to wait as long to reconnect.
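
To illustrate the "just re-connect" idea, here is a minimal sketch of a client-side retry loop with capped exponential backoff and random jitter. It uses only the JDK. The {{ReconnectSketch}} class, the {{connectWithRetry}} method and its parameters are hypothetical names made up for this example, and the {{Callable}} is a stand-in for whatever call your application uses to open the connection. It is not taken from the attached reproducer.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: retries a connection attempt with capped backoff and jitter. */
final class ReconnectSketch {

    /**
     * Calls connect until it succeeds or maxAttempts attempts have failed.
     * connect is a stand-in for the application's own connection call.
     */
    static <T> T connectWithRetry(Callable<T> connect, int maxAttempts,
                                  long initialDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1 || initialDelayMs < 0 || maxDelayMs < 0) {
            throw new IllegalArgumentException("invalid retry settings");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, rethrow the last failure below
                }
                // Random jitter spreads retries out so that many clients
                // do not all reconnect at the same instant.
                long jitter = ThreadLocalRandom.current().nextLong(delayMs + 1);
                Thread.sleep(delayMs + jitter);
                // Double the delay for the next round, up to the cap.
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
        throw last;
    }
}
```

With a JMS client this might be called as {{ReconnectSketch.connectWithRetry(factory::createConnection, 5, 200, 2000)}}, where {{factory}} is the application's existing connection factory and the numbers are arbitrary starting points rather than recommendations.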

bq. Is there some parameter in {{broker.xml}} that I can set or something I can 
do on the host to nail down the problem?

I don't understand enough about why this is happening to say. You could try 
[tuning thread 
pools|https://activemq.apache.org/components/artemis/documentation/latest/thread-pooling.html#server-side-thread-management]
 and gathering and analyzing thread dumps when you see the timeouts happening. 
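
For the broker, which runs in its own JVM, the usual way to capture a thread dump is with the JDK tools, e.g. {{jstack <pid>}} or {{jcmd <pid> Thread.print}}. If you also want dumps from the Java client at the moment a connection attempt fails, something along these lines could work. It is a rough sketch using the standard {{java.lang.management}} API, and the {{ThreadDumpSketch}} class name is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: renders a thread dump of the current JVM as text. */
final class ThreadDumpSketch {

    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Only ask for lock details when this JVM supports reporting them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            // Print the full stack of each thread, one frame per line.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Logging the returned string from the {{catch}} block that handles the connection failure would give a client-side snapshot to compare against the broker's dump taken at about the same time.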

Aside from that, maybe try setting up a fresh Linux environment with a 
different distribution than the one you're using now, running it on different 
hardware, or using a different JVM. It seems there's something specific about 
your current environment that's contributing to the problem. I think if this 
were a general issue we would have seen reports of it before now. I expect 
there are many users using Linux & SSL & lots of concurrent connections.

In any case, if there is a problem with Artemis we'll almost certainly need to 
be able to reproduce it before we can fix it.

> java.net.SocketException during multiple parallel SSL connections
> -----------------------------------------------------------------
>
>                 Key: ARTEMIS-4884
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4884
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.35.0
>            Reporter: Liviu Citu
>            Assignee: Justin Bertram
>            Priority: Major
>         Attachments: application.yml, artemis-client.jar, artemis-client.log, 
> artemis_client.zip, artemis_client_executor_service.zip, broker.xml
>
>
> We are currently in the process of migrating our broker from Classic 5.x to 
> Artemis. We are currently using the CMS C++ client for connecting to the 
> broker {*}but the issue also replicates with the OpenWire JMS client{*}. 
> Everything works fine when using a non-SSL setup (on both Windows and Linux) 
> but we have some issues when using SSL on Linux (SSL on Windows is OK).
> The initial problem started with the following exceptions on the client side:
> {noformat}
> 024-02-22 09:54:37.377 [ERROR] [activemq_connection.cc:336] CMS exception: 
> Channel was inactive for too long:
>                 FILE: activemq/core/ActiveMQConnection.cpp, LINE: 1293
>                 FILE: activemq/core/ActiveMQConnection.cpp, LINE: 1371
>                 FILE: activemq/core/ActiveMQConnection.cpp, LINE: 
> 573{noformat}
> while on the broker side we had:
> {noformat}
> 2024-03-20 12:29:08,700 ERROR [org.apache.activemq.artemis.core.server] 
> AMQ224088: Timeout (10 seconds) on acceptor "netty-ssl-acceptor" during 
> protocol handshake with /10.21.70.53:33053 has occurred.{noformat}
> To bypass these we have added the following setting to the *broker.xml* 
> *netty-ssl-acceptor* acceptor: *handshake-timeout=0*
> However now the exceptions we are receiving are:
> *+CMS client+*
> {noformat}
> 2024-05-22 09:26:40.842 [ERROR] [activemq_connection.cc:348] CMS exception: 
> OpenWireFormatNegotiator::onewayWire format negotiation timeout: peer did not 
> send his wire format.
>         FILE: activemq/core/ActiveMQConnection.cpp, LINE: 1293
>         FILE: activemq/core/ActiveMQConnection.cpp, LINE: 1371
>         FILE: activemq/core/ActiveMQConnection.cpp, LINE: 573{noformat}
> +*Java client*+
> {noformat}
> jakarta.jms.JMSException: Could not connect to broker URL: 
> ssl://linux_host:61617?keepAlive=true&wireFormat.maxInactivityDuration=0. 
> Reason: java.net.SocketException: Broken pipe
>   at 
> org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:49)
>  ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at 
> org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:423)
>  ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at 
> org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:353)
>  ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at 
> org.apache.activemq.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:245)
>  ~[activemq-client-6.1.2.jar!/:6.1.2]
> .........................................................................
> Caused by: java.net.SocketException: Broken pipe
>   at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:425) 
> ~[?:?]
>   at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:445) ~[?:?]
>   at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:831) ~[?:?]
>   at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035) 
> ~[?:?]
>   at 
> java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:345)
>  ~[?:?]
>   at 
> java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1308)
>  ~[?:?]
>   at 
> org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115)
>  ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at java.base/java.io.DataOutputStream.flush(DataOutputStream.java:128) 
> ~[?:?]
>   at 
> org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:194) 
> ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at 
> org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:336)
>  ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at 
> org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:318)
>  ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at 
> org.apache.activemq.transport.WireFormatNegotiator.sendWireFormat(WireFormatNegotiator.java:181)
>  ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at 
> org.apache.activemq.transport.WireFormatNegotiator.sendWireFormat(WireFormatNegotiator.java:84)
>  ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at 
> org.apache.activemq.transport.WireFormatNegotiator.start(WireFormatNegotiator.java:74)
>  ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at 
> org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:64) 
> ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at 
> org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:64) 
> ~[activemq-client-6.1.2.jar!/:6.1.2]
>   at org.apache.{noformat}
> The problem replicates with the following:
>  * SSL on Linux. Problem does not replicate if non-SSL configuration is used. 
> Also does not replicate on Windows (regardless if SSL or non-SSL is used)
>  * two Artemis instances running on the same Linux host (problem does not 
> replicate if there is only one Artemis instance running)
>  * problem also replicates if there is one Artemis Broker and one Classic 
> Broker instance running on the same host
>  * *problem does not replicate with two instances of Classic Brokers. So it 
> is specific to Artemis broker*
>  * when testing with both Classic Broker and Artemis Broker, the client 
> connections using the Classic Broker were fine. Only those using Artemis 
> Broker were failing
>  * Artemis clients are also running on the same host as the Broker. 
> Basically both client and server are running on the same host
>  * there are many connections made at the same time to the broker (25+). If 
> there are only a few then the problem does not happen
>  * example of the connection URL used by the client (the other instance just 
> uses a different port)
> *ssl://linux_host:61617?keepAlive=true&wireFormat.MaxInactivityDuration=0*
>  * Broker configuration file attached (just mangled the SSL stuff and name of 
> the host). The other one is similar (different ports)
> When monitoring the successful connections I found out that connections 
> usually took less than 0.5 seconds to succeed. I was unable to find any 
> successful connection that took more than this.
> Looking at the broker logs we are unable to find any relevant message when 
> the connection fails.
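
The conditions in the description (25+ connection attempts started at the same time, with successful ones taking under 0.5 seconds) could be measured with something like the following. This is only a sketch using the JDK's {{java.util.concurrent}} classes. The {{ConcurrentConnectSketch}} class, its {{Attempt}} type and the {{run}} method are hypothetical names, the {{Callable}} is a stand-in for the real connection call, and none of it is taken from the attached reproducers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: times many connection attempts started together. */
final class ConcurrentConnectSketch {

    /** Outcome of one attempt: elapsed milliseconds and the failure, if any. */
    static final class Attempt {
        final long elapsedMs;
        final Exception failure; // null when the attempt succeeded

        Attempt(long elapsedMs, Exception failure) {
            this.elapsedMs = elapsedMs;
            this.failure = failure;
        }
    }

    static List<Attempt> run(int clients, Callable<?> connect)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<Attempt>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < clients; i++) {
                futures.add(pool.submit(() -> {
                    start.await(); // wait until every task has been submitted
                    long t0 = System.nanoTime();
                    Exception failure = null;
                    try {
                        connect.call();
                    } catch (Exception e) {
                        failure = e; // record the failure instead of aborting the run
                    }
                    return new Attempt((System.nanoTime() - t0) / 1_000_000, failure);
                }));
            }
            start.countDown(); // release all attempts at once
            List<Attempt> results = new ArrayList<>();
            for (Future<Attempt> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Running it with the real connection call at different values of {{clients}} would show at what level of concurrency attempts start to fail and how long the failing ones take.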




