[jira] [Work logged] (ARTEMIS-4305) Zero persistence does not work in kubernetes

2024-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=923259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-923259
 ]

ASF GitHub Bot logged work on ARTEMIS-4305:
---

Author: ASF GitHub Bot
Created on: 13/Jun/24 08:53
Start Date: 13/Jun/24 08:53
Worklog Time Spent: 10m 
  Work Description: iiliev2 commented on PR #4899:
URL: 
https://github.com/apache/activemq-artemis/pull/4899#issuecomment-2165039460

   Fixed the checkstyle issues.




Issue Time Tracking
---

Worklog Id: (was: 923259)
Time Spent: 1.5h  (was: 1h 20m)

> Zero persistence does not work in kubernetes
> 
>
> Key: ARTEMIS-4305
> URL: https://issues.apache.org/jira/browse/ARTEMIS-4305
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>Reporter: Ivan Iliev
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In a cluster deployed in Kubernetes, when a node is destroyed, the process is 
> terminated and the network is shut down before the process has a chance to 
> close connections. A new node might then be brought up, reusing the old 
> node's IP. If this happens before the connection TTL expires, then from 
> Artemis' point of view it looks as if the connection came back. Yet it is not 
> actually the same connection: the peer has a new node ID, etc. This confuses 
> the cluster, because the old message flow record is invalid.
> One way to fix it could be for the {{Ping}} messages, which are typically used 
> to detect dead connections, to carry some sort of connection ID so that each 
> side can verify that the other side really is the peer it is supposed to be.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@activemq.apache.org
For additional commands, e-mail: issues-h...@activemq.apache.org
For further information, visit: https://activemq.apache.org/contact




[jira] [Work logged] (ARTEMIS-4305) Zero persistence does not work in kubernetes

2024-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=922600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-922600
 ]

ASF GitHub Bot logged work on ARTEMIS-4305:
---

Author: ASF GitHub Bot
Created on: 07/Jun/24 15:07
Start Date: 07/Jun/24 15:07
Worklog Time Spent: 10m 
  Work Description: jbertram commented on PR #4899:
URL: 
https://github.com/apache/activemq-artemis/pull/4899#issuecomment-2155033417

   I believe the use-case here only involves cluster nodes and the core 
connections between them. Therefore, I don't think AMQP is in view.




Issue Time Tracking
---

Worklog Id: (was: 922600)
Time Spent: 1h 20m  (was: 1h 10m)





[jira] [Work logged] (ARTEMIS-4305) Zero persistence does not work in kubernetes

2024-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=922599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-922599
 ]

ASF GitHub Bot logged work on ARTEMIS-4305:
---

Author: ASF GitHub Bot
Created on: 07/Jun/24 15:01
Start Date: 07/Jun/24 15:01
Worklog Time Spent: 10m 
  Work Description: clebertsuconic commented on PR #4899:
URL: 
https://github.com/apache/activemq-artemis/pull/4899#issuecomment-2155022268

   If this is an issue in Core, it will be an issue in AMQP as well. Should we 
make sure AMQP also takes care of this?
   
   WDYT @jbertram @gemmellr @gtully @tabish121 ?




Issue Time Tracking
---

Worklog Id: (was: 922599)
Time Spent: 1h 10m  (was: 1h)





[jira] [Work logged] (ARTEMIS-4305) Zero persistence does not work in kubernetes

2024-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=922596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-922596
 ]

ASF GitHub Bot logged work on ARTEMIS-4305:
---

Author: ASF GitHub Bot
Created on: 07/Jun/24 14:37
Start Date: 07/Jun/24 14:37
Worklog Time Spent: 10m 
  Work Description: iiliev2 commented on code in PR #4899:
URL: https://github.com/apache/activemq-artemis/pull/4899#discussion_r1631319204


##
artemis-core-client/src/main/java/org/apache/activemq/artemis/core/protocol/core/impl/RemotingConnectionImpl.java:
##
@@ -408,10 +416,15 @@ public void endOfBatch(Object connectionID) {
    }
 
    private void doBufferReceived(final Packet packet) {
+      if (isHealthy && !isCorrectPing(packet)) {
+         isHealthy = false;

Review Comment:
   Commenting this line out will effectively disable the fix. This will cause 
the new test `ZeroPersistenceSymmetricalClusterTest` to fail.





Issue Time Tracking
---

Worklog Id: (was: 922596)
Time Spent: 1h  (was: 50m)





[jira] [Work logged] (ARTEMIS-4305) Zero persistence does not work in kubernetes

2024-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=922595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-922595
 ]

ASF GitHub Bot logged work on ARTEMIS-4305:
---

Author: ASF GitHub Bot
Created on: 07/Jun/24 14:36
Start Date: 07/Jun/24 14:36
Worklog Time Spent: 10m 
  Work Description: iiliev2 commented on code in PR #4899:
URL: https://github.com/apache/activemq-artemis/pull/4899#discussion_r1631317798


##
artemis-core-client/src/main/java/org/apache/activemq/artemis/core/protocol/core/impl/RemotingConnectionImpl.java:
##
@@ -421,6 +434,34 @@ private void doBufferReceived(final Packet packet) {
       }
    }
 
+   private boolean isCorrectPing(Packet packet) {
+      if (packet.getType() != PING) {
+         return true;

Review Comment:
   Commenting this line out will effectively disable the fix. This will cause 
the new test `ZeroPersistenceSymmetricalClusterTest` to fail.





Issue Time Tracking
---

Worklog Id: (was: 922595)
Time Spent: 50m  (was: 40m)





[jira] [Work logged] (ARTEMIS-4305) Zero persistence does not work in kubernetes

2024-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=922594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-922594
 ]

ASF GitHub Bot logged work on ARTEMIS-4305:
---

Author: ASF GitHub Bot
Created on: 07/Jun/24 14:36
Start Date: 07/Jun/24 14:36
Worklog Time Spent: 10m 
  Work Description: iiliev2 commented on code in PR #4899:
URL: https://github.com/apache/activemq-artemis/pull/4899#discussion_r1631317798


##
artemis-core-client/src/main/java/org/apache/activemq/artemis/core/protocol/core/impl/RemotingConnectionImpl.java:
##
@@ -421,6 +434,34 @@ private void doBufferReceived(final Packet packet) {
       }
    }
 
+   private boolean isCorrectPing(Packet packet) {
+      if (packet.getType() != PING) {
+         return true;

Review Comment:
   Commenting this line out will effectively disable the fix. This will cause 
the new test `ZeroPersistenceSymmetricalClusterTest` to fail.





Issue Time Tracking
---

Worklog Id: (was: 922594)
Time Spent: 40m  (was: 0.5h)





[jira] [Work logged] (ARTEMIS-4305) Zero persistence does not work in kubernetes

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=916585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-916585
 ]

ASF GitHub Bot logged work on ARTEMIS-4305:
---

Author: ASF GitHub Bot
Created on: 26/Apr/24 08:24
Start Date: 26/Apr/24 08:24
Worklog Time Spent: 10m 
  Work Description: iiliev2 commented on PR #4899:
URL: 
https://github.com/apache/activemq-artemis/pull/4899#issuecomment-2078890094

   Initially I attempted what you suggest, lazily initializing the node ID 
like that, precisely because I wanted to keep the code changes to a minimum. 
However, that ended up being much more complicated (rather than simpler) 
because of the way `ClientSessionFactoryImpl` creates a new connection object 
on reconnects. It is very hard to reason about, both when reading the code and 
when debugging it at runtime. So instead, I filled in the gaps so that the 
data that is already there gets used; it just wasn't being propagated deep 
enough.
   
   IMO, from a functional standpoint, adding the `TransportConfiguration` to 
the connector (and connection) is the right thing to do here anyway. I assume 
that, for historical reasons, those classes were working with a subset of the 
data, and no one had a good reason to fix this until now. For example, 
`NettyConnection#getConnectorConfig` was creating a bogus transport 
configuration, even though a configuration was available when the connection 
was created; it just was not being passed in.
   
   `Ping` is already the only `Packet` that is being modified. Why do you want 
to use a raw `byte[]` rather than `UUID`? IMO that will be more confusing: it 
suggests that other kinds of data could be contained.
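
   To make the `UUID` versus `byte[]` trade-off concrete, here is a minimal 
sketch, not code from the PR. It assumes `java.util.UUID` as a stand-in for 
the broker's node ID type, and `NodeIdCodec` is a hypothetical name. A typed 
ID can still travel as exactly 16 bytes, while the typed API rules out 
payloads of any other shape.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper for illustration only; not part of the Artemis API.
final class NodeIdCodec {
   static final int ENCODED_LENGTH = 16; // a UUID is two 64-bit longs

   // Encodes the UUID as 16 big-endian bytes (most significant bits first).
   static byte[] encode(UUID id) {
      return ByteBuffer.allocate(ENCODED_LENGTH)
         .putLong(id.getMostSignificantBits())
         .putLong(id.getLeastSignificantBits())
         .array();
   }

   // Decodes 16 bytes back into a UUID and rejects any other length,
   // which is the check a raw byte[] field would otherwise leave to callers.
   static UUID decode(byte[] bytes) {
      if (bytes.length != ENCODED_LENGTH) {
         throw new IllegalArgumentException("expected " + ENCODED_LENGTH + " bytes, got " + bytes.length);
      }
      ByteBuffer buffer = ByteBuffer.wrap(bytes);
      return new UUID(buffer.getLong(), buffer.getLong());
   }
}

final class NodeIdCodecDemo {
   public static void main(String[] args) {
      UUID id = UUID.randomUUID();
      UUID roundTripped = NodeIdCodec.decode(NodeIdCodec.encode(id));
      System.out.println(id.equals(roundTripped));
   }
}
```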




Issue Time Tracking
---

Worklog Id: (was: 916585)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (ARTEMIS-4305) Zero persistence does not work in kubernetes

2024-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=916557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-916557
 ]

ASF GitHub Bot logged work on ARTEMIS-4305:
---

Author: ASF GitHub Bot
Created on: 26/Apr/24 04:26
Start Date: 26/Apr/24 04:26
Worklog Time Spent: 10m 
  Work Description: jbertram commented on PR #4899:
URL: 
https://github.com/apache/activemq-artemis/pull/4899#issuecomment-2078606956

   I think you could simplify this quite a bit. Here's what I suggest...
   
   - Don't modify any packet aside from `Ping` and only modify it with a new 
`byte[]`.
   - The broker should send its node ID in every `Ping`.
   - The first time the client receives a `Ping` it should save the node ID.
   - If a client ever receives a `Ping` whose node ID differs from the one it 
has saved, then it should disconnect.
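
   The steps suggested above can be sketched as follows. This is an 
illustration under stated assumptions, not the PR's implementation: 
`PingNodeIdTracker` is a hypothetical class, the node ID is modelled as the 
raw `byte[]` proposed in the list, and tolerating a `Ping` without a node ID 
is an assumption made here for the sake of the example.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the Artemis API.
final class PingNodeIdTracker {
   private byte[] savedNodeId; // null until the first Ping with a node ID arrives

   // Returns true if the connection may stay open, false if the client
   // should disconnect because the peer's node ID has changed.
   synchronized boolean onPing(byte[] nodeIdFromPing) {
      if (nodeIdFromPing == null) {
         // Assumption: a Ping that carries no node ID is tolerated.
         return true;
      }
      if (savedNodeId == null) {
         // First Ping: remember who is on the other side.
         savedNodeId = nodeIdFromPing.clone();
         return true;
      }
      // Later Pings: the node ID must match the saved one.
      return Arrays.equals(savedNodeId, nodeIdFromPing);
   }
}

final class PingNodeIdTrackerDemo {
   public static void main(String[] args) {
      PingNodeIdTracker tracker = new PingNodeIdTracker();
      byte[] oldNode = {1, 2, 3};
      byte[] newNode = {9, 9, 9};
      System.out.println(tracker.onPing(oldNode)); // first Ping is saved
      System.out.println(tracker.onPing(oldNode)); // same node again
      System.out.println(tracker.onPing(newNode)); // a different node reusing the address
   }
}
```

   Copying the array on first receipt keeps the saved ID stable even if the 
caller later reuses its buffer.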




Issue Time Tracking
---

Worklog Id: (was: 916557)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (ARTEMIS-4305) Zero persistence does not work in kubernetes

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=915875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-915875
 ]

ASF GitHub Bot logged work on ARTEMIS-4305:
---

Author: ASF GitHub Bot
Created on: 22/Apr/24 16:58
Start Date: 22/Apr/24 16:58
Worklog Time Spent: 10m 
  Work Description: iiliev2 opened a new pull request, #4899:
URL: https://github.com/apache/activemq-artemis/pull/4899

   In a cluster deployed in Kubernetes, when a node is destroyed, the process 
is terminated and the network is shut down before the process has a chance to 
close connections. A new node might then be brought up, reusing the old node's 
IP. If this happens before the connection TTL expires, then from Artemis' point 
of view it looks as if the connection came back. Yet it is not actually the 
same connection: the peer has a new node ID, etc. This confuses the cluster, 
because the old message flow record is invalid.
   
   This also solves another similar issue: if a node goes down and a new one 
comes in with a new nodeUUID and the same IP before the cluster connections in 
the other nodes time out, those nodes get stuck and list both the old and the 
new node in their topologies.
   
   The changes are grouped in tightly related incremental commits to make it 
easier to understand what is changed:
   
   1. `Ping` packets include `nodeUUID`
   2. Acceptors and connectors carry `TransportConfiguration`
   3. `RemotingConnectionImpl#doBufferReceived` checks each ping's nodeUUID 
against the target's and flags the connection as `unhealthy` on a mismatch; 
`ClientSessionFactoryImpl` destroys unhealthy connections (in addition to 
those that do not receive any data in time)
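
   A rough sketch of the idea behind step 3, under stated assumptions rather 
than the PR's actual code: `SketchConnection` and its methods are hypothetical 
names, `java.util.UUID` stands in for the node ID type, and time is passed in 
explicitly to keep the example deterministic. A connection is torn down when 
it has been flagged unhealthy or when no data has arrived within the TTL.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical model for illustration only; not the Artemis classes.
final class SketchConnection {
   private final UUID expectedNodeId; // node ID of the peer this connection targets
   private volatile boolean healthy = true;
   private volatile long lastDataMillis;

   SketchConnection(UUID expectedNodeId, long nowMillis) {
      this.expectedNodeId = Objects.requireNonNull(expectedNodeId);
      this.lastDataMillis = nowMillis;
   }

   // Called for every received ping; a mismatching node ID marks the
   // connection unhealthy, and the flag is never reset.
   void onPing(UUID pingNodeId, long nowMillis) {
      lastDataMillis = nowMillis;
      if (healthy && pingNodeId != null && !pingNodeId.equals(expectedNodeId)) {
         healthy = false;
      }
   }

   // The periodic check: destroy on an identity mismatch or on a TTL timeout.
   boolean shouldDestroy(long nowMillis, long ttlMillis) {
      return !healthy || nowMillis - lastDataMillis > ttlMillis;
   }
}

final class SketchConnectionDemo {
   public static void main(String[] args) {
      UUID oldNode = UUID.randomUUID();
      UUID newNode = UUID.randomUUID();
      SketchConnection connection = new SketchConnection(oldNode, 0L);
      connection.onPing(oldNode, 1_000L);
      System.out.println(connection.shouldDestroy(1_500L, 60_000L));
      connection.onPing(newNode, 2_000L); // a new node answers on the old address
      System.out.println(connection.shouldDestroy(2_500L, 60_000L));
   }
}
```

   Without the health flag, the second ping would simply refresh the timeout 
and the stale connection would look alive.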




Issue Time Tracking
---

Worklog Id: (was: 915875)
Remaining Estimate: 0h
Time Spent: 10m
