[ https://issues.apache.org/jira/browse/ARTEMIS-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Šmucr updated ARTEMIS-3767:
-------------------------------
    Description:
It's not possible to perform a rolling upgrade in a replication environment. After upgrading the *slave* from 2.17 to 2.18, the slave reports:
{noformat}
AMQ214013: Failed to decode packet: java.lang.IndexOutOfBoundsException: readerIndex(57) + length(1) exceeds writerIndex(57): PooledUnsafeDirectByteBuf(ridx: 57, widx: 57, cap: 57)
{noformat}
The 2.17 *master* then crashes with an exception:
{noformat}
2022-04-07 10:01:23,032 WARN [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=AMQ229114: Replication synchronization process timed out after waiting 30,000 milliseconds: ActiveMQReplicationTimeooutException[errorType=REPLICATION_TIMEOUT_ERROR message=AMQ229114: Replication synchronization process timed out after waiting 30,000 milliseconds]
    at org.apache.activemq.artemis.core.replication.ReplicationManager.sendSynchronizationDone(ReplicationManager.java:660) [artemis-server-2.17.0.jar:2.17.0]
    at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:717) [artemis-server-2.17.0.jar:2.17.0]
    at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:180) [artemis-server-2.17.0.jar:2.17.0]
    at java.base/java.lang.Thread.run(Thread.java:829) [java.base:]
{noformat}
Rolling upgrades from versions older than 2.17, or to versions newer than 2.18, aren't possible either.
Steps to replicate the issue:
# Create a master instance (replace the IPs to match your setup):
{noformat}
apache-artemis-2.17.0/bin/artemis create --aio --allow-anonymous --user admin --password admin --clustered --cluster-user admin --cluster-password admin --host 10.35.4.16 --http-host 10.35.4.16 --replicated --staticCluster tcp://10.35.4.211:61616 -- broker-master
{noformat}
# Start the instance:
{noformat}
broker-master/bin/artemis run
{noformat}
# Create a slave instance (it's fine to start with 2.18 right away; no real upgrade is needed):
{noformat}
apache-artemis-2.17.0/bin/artemis create --aio --allow-anonymous --user admin --password admin --clustered --slave --cluster-user admin --cluster-password admin --host 10.35.4.211 --http-host 10.35.4.211 --replicated --staticCluster tcp://10.35.4.16:61616 -- broker-slave
{noformat}
# Start the instance:
{noformat}
broker-slave/bin/artemis run
{noformat}
# The master crashes, while the slave keeps running but does nothing.
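To confirm that a run of the steps above hit this issue rather than some other failure, the two error codes quoted in the description can be searched for in the broker logs. The following is only a minimal sketch: the helper name {{check_log}}, the {{SLAVE_LOG}} and {{MASTER_LOG}} variables, and the default log paths are assumptions, so point them at wherever your instances actually write their logs (or at the attached broker-master.log and broker-slave.log).

```shell
#!/bin/sh
# Hypothetical helper (not part of Artemis): report whether an error code
# appears in a given log file.
check_log() {
    # $1 = log file, $2 = error code to look for (matched as a fixed string)
    if grep -qF "$2" "$1" 2>/dev/null; then
        echo "$2 found in $1"
    else
        echo "$2 not found in $1"
    fi
}

# Assumed default log locations; override via SLAVE_LOG / MASTER_LOG.
# AMQ214013 is the decode failure reported by the 2.18 slave.
check_log "${SLAVE_LOG:-broker-slave/log/artemis.log}" AMQ214013
# AMQ229114 is the replication synchronization timeout on the 2.17 master.
check_log "${MASTER_LOG:-broker-master/log/artemis.log}" AMQ229114
```

If both codes are reported as found, the run most likely reproduced the behaviour described in this issue; a missing or unreadable log file is reported as "not found".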
> Rolling upgrade from 2.17 and older broken since 2.18
> -----------------------------------------------------
>
>                 Key: ARTEMIS-3767
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3767
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.18.0
>         Environment: AWS EC2 t3a.large
> CentOS Linux release 7.9.2009
> OpenJDK 8, OpenJDK 11
>            Reporter: Jan Šmucr
>            Priority: Major
>         Attachments: broker-master.log, broker-slave.log

--
This message was sent by Atlassian Jira (v8.20.1#820001)