[jira] [Updated] (ARTEMIS-3767) Rolling upgrade from 2.17 and older broken since 2.18

Jira Thu, 07 Apr 2022 04:55:14 -0700


     [ https://issues.apache.org/jira/browse/ARTEMIS-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jan Šmucr updated ARTEMIS-3767:
-------------------------------
    Description: 
It's not possible to perform a rolling upgrade in a replication environment.
After the *slave* is upgraded from 2.17 to 2.18, it reports:
{noformat}
AMQ214013: Failed to decode packet: java.lang.IndexOutOfBoundsException: readerIndex(57) + length(1) exceeds writerIndex(57): PooledUnsafeDirectByteBuf(ridx: 57, widx: 57, cap: 57)
{noformat}
The 2.17 *master* then crashes with an exception:
{noformat}
2022-04-07 10:01:23,032 WARN  [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=AMQ229114: Replication synchronization process timed out after waiting 30,000 milliseconds: ActiveMQReplicationTimeooutException[errorType=REPLICATION_TIMEOUT_ERROR message=AMQ229114: Replication synchronization process timed out after waiting 30,000 milliseconds]
        at org.apache.activemq.artemis.core.replication.ReplicationManager.sendSynchronizationDone(ReplicationManager.java:660) [artemis-server-2.17.0.jar:2.17.0]
        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:717) [artemis-server-2.17.0.jar:2.17.0]
        at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:180) [artemis-server-2.17.0.jar:2.17.0]
        at java.base/java.lang.Thread.run(Thread.java:829) [java.base:]
{noformat}
Upgrades from versions older than 2.17, or to versions newer than 2.18, aren't possible either.

Steps to reproduce the issue:
 # Create a master instance (replace the IPs to match your setup):
{noformat}
apache-artemis-2.17.0/bin/artemis create --aio --allow-anonymous --user admin \
  --password admin --clustered --cluster-user admin --cluster-password admin \
  --host 10.35.4.16 --http-host 10.35.4.16 --replicated --staticCluster \
  tcp://10.35.4.211:61616 -- broker-master
{noformat}
 # Start the instance:
{noformat}
broker-master/bin/artemis run{noformat}
 # Create a slave instance (it's fine to start with 2.18 right away; no real upgrade is needed):
{noformat}
apache-artemis-2.18.0/bin/artemis create --aio --allow-anonymous --user admin \
  --password admin --clustered --slave --cluster-user admin --cluster-password \
  admin --host 10.35.4.211 --http-host 10.35.4.211 --replicated --staticCluster \
  tcp://10.35.4.16:61616 -- broker-slave
{noformat}
 # Start the instance:
{noformat}
broker-slave/bin/artemis run {noformat}
 # The master crashes, while the slave keeps running but does nothing.
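
To confirm that the steps above reproduced the problem, the two log codes quoted in this description can be searched for in each broker's log. The following is only a minimal sketch and not part of the original report: the function name check_repro is hypothetical, and the log paths are passed as arguments because their location depends on your setup (the attachments on this issue are named broker-master.log and broker-slave.log).

```shell
#!/bin/sh
# Hypothetical helper: reports whether the failure described in this issue
# shows up in a pair of broker logs.
#   $1 = master log file, $2 = slave log file (paths are assumptions)
check_repro() {
    master_log=$1
    slave_log=$2
    status=0

    # The 2.18 slave is expected to log the packet decode failure.
    if grep -q 'AMQ214013' "$slave_log"; then
        echo "slave: AMQ214013 (failed to decode packet) found"
    else
        echo "slave: AMQ214013 not found"
        status=1
    fi

    # The 2.17 master is expected to log the replication sync timeout.
    if grep -q 'AMQ229114' "$master_log"; then
        echo "master: AMQ229114 (replication synchronization timed out) found"
    else
        echo "master: AMQ229114 not found"
        status=1
    fi

    # 0 only when both symptoms are present.
    return "$status"
}
```

For example, "check_repro broker-master.log broker-slave.log && echo reproduced" prints "reproduced" only when both codes are present. A match shows that the symptoms occurred; it says nothing about their cause.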


> Rolling upgrade from 2.17 and older broken since 2.18
> -----------------------------------------------------
>
>                 Key: ARTEMIS-3767
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3767
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.18.0
>         Environment: AWS EC2 t3a.large
> CentOS Linux release 7.9.2009
> OpenJDK 8, OpenJDK 11
>            Reporter: Jan Šmucr
>            Priority: Major
>         Attachments: broker-master.log, broker-slave.log
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
