[jira] [Comment Edited] (ARTEMIS-3831) Scale-down fails when using same discovery-group used by Broker cluster connection

Bob Maloney (Jira) Tue, 23 Aug 2022 10:35:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARTEMIS-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583767#comment-17583767
 ]


Bob Maloney edited comment on ARTEMIS-3831 at 8/23/22 5:34 PM:
---------------------------------------------------------------

I am receiving the same error. Are there example config files for the possible 
workaround in the description? Apart from scale-down, clustering is working 
for me in Kubernetes.

Note that the error can be reproduced with a single cluster-enabled broker. For 
the workaround, I have essentially duplicated the existing configs, but nothing 
in them makes it obvious that the broker will now use one JGroups channel and 
scale-down the other. There are no errors on startup, but I still receive 
AMQ222181 on shutdown.

ha policy
{code:xml}
      <ha-policy>
         <live-only>
            <scale-down>
               <enabled>true</enabled>
               <discovery-group-ref discovery-group-name="jgroups-discovery-group"/>
            </scale-down>
         </live-only>
      </ha-policy> 
{code}
acceptor/connector (for separate port)
{code:xml}
         ...         
         <acceptor name="netty-acceptor">tcp://0.0.0.0:61618</acceptor>

         <!-- added -->
         <acceptor name="jgroups-netty-acceptor">tcp://0.0.0.0:61619</acceptor>

      </acceptors>

      <connectors>
         <connector name="netty-connector">tcp://0.0.0.0:61618</connector>
         <!-- added -->
         <connector name="jgroups-netty-connector">tcp://0.0.0.0:61619</connector>
      </connectors>
{code}
broadcast-group
{code:xml}
      <broadcast-groups>
         <broadcast-group name="artemis-broadcast-group">
            <broadcast-period>2000</broadcast-period>
            <jgroups-file>jgroups.xml</jgroups-file>
            <jgroups-channel>artemis_broadcast_channel</jgroups-channel>
            <connector-ref>netty-connector</connector-ref>
         </broadcast-group>
         <!-- added below -->
         <broadcast-group name="jgroups-broadcast-group">
            <broadcast-period>2000</broadcast-period>
            <jgroups-file>jgroups_2.xml</jgroups-file>
            <jgroups-channel>jgroups_broadcast_channel</jgroups-channel>
            <connector-ref>jgroups-netty-connector</connector-ref>
         </broadcast-group>
      </broadcast-groups>
{code}
discovery-group
{code:xml}
      <discovery-groups>
         <discovery-group name="artemis-discovery-group">
            <jgroups-file>jgroups.xml</jgroups-file>
            <jgroups-channel>artemis_broadcast_channel</jgroups-channel>
            <refresh-timeout>10000</refresh-timeout>
         </discovery-group>
         <!-- added below -->
         <discovery-group name="jgroups-discovery-group">
            <jgroups-file>jgroups_2.xml</jgroups-file>
            <jgroups-channel>jgroups_broadcast_channel</jgroups-channel>
            <refresh-timeout>10000</refresh-timeout>
         </discovery-group>
      </discovery-groups> 
{code}
cluster-connection
{code:xml}
      <cluster-connections>
         <cluster-connection name="artemis-cluster">
            <address></address>
            <connector-ref>netty-connector</connector-ref>
            <check-period>1000</check-period>
            <connection-ttl>5000</connection-ttl>
            <min-large-message-size>50000</min-large-message-size>
            <call-timeout>5000</call-timeout>
            <retry-interval>500</retry-interval>
            <retry-interval-multiplier>1.0</retry-interval-multiplier>
            <max-retry-interval>5000</max-retry-interval>
            <initial-connect-attempts>-1</initial-connect-attempts>
            <reconnect-attempts>-1</reconnect-attempts>
            <use-duplicate-detection>true</use-duplicate-detection>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <confirmation-window-size>32000</confirmation-window-size>
            <call-failover-timeout>30000</call-failover-timeout>
            <notification-interval>1000</notification-interval>
            <notification-attempts>2</notification-attempts>
            <discovery-group-ref discovery-group-name="artemis-discovery-group"/>
         </cluster-connection>
         <!-- added below -->
         <cluster-connection name="jgroups-cluster">
            <address></address>
            <connector-ref>jgroups-netty-connector</connector-ref>
            <check-period>1000</check-period>
            <connection-ttl>5000</connection-ttl>
            <min-large-message-size>50000</min-large-message-size>
            <call-timeout>5000</call-timeout>
            <retry-interval>500</retry-interval>
            <retry-interval-multiplier>1.0</retry-interval-multiplier>
            <max-retry-interval>5000</max-retry-interval>
            <initial-connect-attempts>-1</initial-connect-attempts>
            <reconnect-attempts>-1</reconnect-attempts>
            <use-duplicate-detection>true</use-duplicate-detection>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <confirmation-window-size>32000</confirmation-window-size>
            <call-failover-timeout>30000</call-failover-timeout>
            <notification-interval>1000</notification-interval>
            <notification-attempts>2</notification-attempts>
            <discovery-group-ref discovery-group-name="jgroups-discovery-group"/>
         </cluster-connection>
      </cluster-connections>
{code}
New <jgroups-file>; the only change from the existing one is a separate 
bind_port. jgroups-kubernetes is used for server discovery.
{code:xml}
<config xmlns="urn:org:jgroups"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <TCP external_addr="match-interface:eth0"
         bind_addr="site_local,match-interface:eth0" bind_port="7801"
         recv_buf_size="5M" send_buf_size="1M" thread_naming_pattern="cl"
         thread_pool.min_threads="0" thread_pool.max_threads="500"
         thread_pool.keep_alive_time="30000"/>

    <org.jgroups.protocols.kubernetes.KUBE_PING namespace="..."
         labels="app.kubernetes.io/instance=..." useNotReadyAddresses="false"/>

    <PING return_entire_cache="true"/>

    <MERGE3 max_interval="30000" min_interval="10000"/>
    <FD_SOCK2/>
    <FD_ALL timeout="10000" interval="3000"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK2 xmit_interval="500" xmit_table_num_rows="100"
         xmit_table_msgs_per_row="2000" xmit_table_max_compaction_time="30000"
         use_mcast_xmit="false" discard_delivered_msgs="true"/>
    <UNICAST3 xmit_table_num_rows="100" xmit_table_msgs_per_row="1000"
         xmit_table_max_compaction_time="30000"/>
    <pbcast.STABLE desired_avg_gossip="50000" max_bytes="8m"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"/>
    <MFC max_credits="2M" min_threshold="0.4"/>
    <FRAG2 frag_size="60K"/>
</config>
{code}
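For comparison, a reduced sketch of the workaround as the description words it: 
a separate discovery-group with its own broadcast-group, referenced only by 
scale-down. It reuses the names from the config above. Leaving out the second 
cluster-connection is an assumption and is untested. Nothing in this issue 
confirms it. The reasoning is that a cluster-connection referencing 
jgroups-discovery-group might tie that JGroups channel to the broker's own 
shutdown again.
{code:xml}
      <!-- Sketch only: names reused from the config above, not a verified fix. -->
      <ha-policy>
         <live-only>
            <scale-down>
               <enabled>true</enabled>
               <!-- Assumption: scale-down is the only user of this discovery group. -->
               <discovery-group-ref discovery-group-name="jgroups-discovery-group"/>
            </scale-down>
         </live-only>
      </ha-policy>

      <broadcast-groups>
         <!-- artemis-broadcast-group stays as it is and is omitted here. -->
         <broadcast-group name="jgroups-broadcast-group">
            <broadcast-period>2000</broadcast-period>
            <jgroups-file>jgroups_2.xml</jgroups-file>
            <jgroups-channel>jgroups_broadcast_channel</jgroups-channel>
            <connector-ref>jgroups-netty-connector</connector-ref>
         </broadcast-group>
      </broadcast-groups>

      <discovery-groups>
         <!-- artemis-discovery-group stays as it is and is omitted here. -->
         <discovery-group name="jgroups-discovery-group">
            <jgroups-file>jgroups_2.xml</jgroups-file>
            <jgroups-channel>jgroups_broadcast_channel</jgroups-channel>
            <refresh-timeout>10000</refresh-timeout>
         </discovery-group>
      </discovery-groups>

      <cluster-connections>
         <!-- Only the original cluster-connection; no "jgroups-cluster" entry (assumption). -->
         <cluster-connection name="artemis-cluster">
            <connector-ref>netty-connector</connector-ref>
            <!-- remaining settings as in artemis-cluster above -->
            <discovery-group-ref discovery-group-name="artemis-discovery-group"/>
         </cluster-connection>
      </cluster-connections>
{code}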
 



> Scale-down fails when using same discovery-group used by Broker cluster 
> connection
> ----------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-3831
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3831
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.19.1
>            Reporter: Apache Dev
>            Priority: Major
>
> Two live brokers are clustered.
> Both have the following HA policy:
> {code}
>         <ha-policy>
>             <live-only>
>                 <scale-down>
>                     <enabled>true</enabled>
>                     <discovery-group-ref discovery-group-name="activemq-discovery-group"/>
>                 </scale-down>
>             </live-only>
>         </ha-policy>
> {code}
> where "activemq-discovery-group" is using JGroups TCPPING:
> {code}
>         <discovery-groups>
>             <discovery-group name="activemq-discovery-group">
>                 <jgroups-file>...</jgroups-file>
>                 <jgroups-channel>...</jgroups-channel>
>                 <refresh-timeout>10000</refresh-timeout>
>             </discovery-group>
>         </discovery-groups>
> {code}
> and it is used by the cluster of 2 brokers:
> {code}
>         <cluster-connections>
>             <cluster-connection name="activemq-cluster">
>                 <connector-ref>netty-connector</connector-ref>
>                 <retry-interval>5000</retry-interval>
>                 <use-duplicate-detection>true</use-duplicate-detection>
>                 <message-load-balancing>OFF</message-load-balancing>
>                 <max-hops>1</max-hops>
>                 <discovery-group-ref discovery-group-name="activemq-discovery-group"/>
>             </cluster-connection>
>         </cluster-connections>
> {code}
> The issue is that scale-down fails when the broker shuts down:
> {code}
> org.apache.activemq.artemis.core.server                      W AMQ222181: Unable to scaleDown messages
>         ActiveMQInternalErrorException[errorType=INTERNAL_ERROR message=AMQ219004: Failed to initialise session factory]
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.initialize(ServerLocatorImpl.java:272)
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:655)
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:554)
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:533)
>         at org.apache.activemq.artemis.core.server.LiveNodeLocator.connectToCluster(LiveNodeLocator.java:85)
>         at org.apache.activemq.artemis.core.server.impl.LiveOnlyActivation.connectToScaleDownTarget(LiveOnlyActivation.java:146)
>         at org.apache.activemq.artemis.core.server.impl.LiveOnlyActivation.freezeConnections(LiveOnlyActivation.java:114)
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.freezeConnections(ActiveMQServerImpl.java:1468)
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1250)
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1166)
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1150)
>         ...
>         Caused by: ActiveMQInternalErrorException[errorType=INTERNAL_ERROR message=channel is closed]
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.startDiscovery(ServerLocatorImpl.java:286)
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.initialize(ServerLocatorImpl.java:268)
>         ... 44 more
>         Caused by: java.lang.IllegalStateException: channel is closed
>         at org.jgroups.JChannel.checkClosed(JChannel.java:957)
>         at org.jgroups.JChannel._preConnect(JChannel.java:548)
>         at org.jgroups.JChannel.connect(JChannel.java:288)
>         at org.jgroups.JChannel.connect(JChannel.java:279)
>         at org.apache.activemq.artemis.api.core.jgroups.JChannelWrapper.connect(JChannelWrapper.java:126)
>         at org.apache.activemq.artemis.api.core.JGroupsBroadcastEndpoint.internalOpen(JGroupsBroadcastEndpoint.java:113)
>         at org.apache.activemq.artemis.api.core.JGroupsBroadcastEndpoint.openClient(JGroupsBroadcastEndpoint.java:91)
>         at org.apache.activemq.artemis.core.cluster.DiscoveryGroup.start(DiscoveryGroup.java:111)
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.startDiscovery(ServerLocatorImpl.java:284)
>         ... 45 more
> {code}
> The JGroups channel used by scale-down is probably the same one used by the 
> broker, and it has already been closed by the broker's own shutdown.
> As a workaround, it is possible to create a separate discovery-group (with 
> its own broadcast-group) so that scale-down uses a new JGroups channel that 
> the broker does not close.
> However, this duplicates configuration and requires opening a new JGroups 
> port for the scale-down discovery.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
