[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.
[ https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905117#comment-15905117 ]

clebert suconic commented on ARTEMIS-473:
-----------------------------------------

[~martyntaylor] This is unfixable. We can only avoid split brains. This feature was a request to fix the journal after a split brain, but once the data is mixed there is no way to differentiate it. You can only configure the system to avoid it.

> Resolve split brain data after split brains scenarios.
> ------------------------------------------------------
>
>                 Key: ARTEMIS-473
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-473
>             Project: ActiveMQ Artemis
>          Issue Type: New Feature
>          Components: Broker
>    Affects Versions: 1.2.0
>            Reporter: Miroslav Novak
>            Assignee: clebert suconic
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> If a master-slave pair is configured using a replicated journal and there are no
> other servers in the cluster, then the slave will activate if the network between
> master and slave is broken. Depending on whether clients were disconnected from
> the master or not, there may or may not be a failover to the slave. The problem
> happens at the moment when the network between master and slave is restored:
> master and slave are active at the same time, which is the split brain
> syndrome. Currently there is no recovery mechanism to solve this situation.
> Suggested improvement: If clients failed over to the slave, then the master will
> restart itself so that failback occurs (if configured). If clients did not
> fail over and stayed connected to the master, then the backup will restart itself.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.
[ https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690577#comment-15690577 ]

ASF subversion and git services commented on ARTEMIS-473:
---------------------------------------------------------

Commit 402f25be7dc5eda6e4dd1e8170e242415ce94fa8 in activemq-artemis's branch refs/heads/master from Clebert Suconic
[ https://git-wip-us.apache.org/repos/asf?p=activemq-artemis.git;h=402f25b ]

ARTEMIS-473/ARTEMIS-863 Detect network failures
[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.
[ https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688863#comment-15688863 ]

ASF GitHub Bot commented on ARTEMIS-473:
----------------------------------------

GitHub user clebertsuconic opened a pull request:

    https://github.com/apache/activemq-artemis/pull/895

    ARTEMIS-473/ARTEMIS-863 Detect network failures

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/clebertsuconic/activemq-artemis netcheck

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/activemq-artemis/pull/895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #895

commit 5d040c41822cbda784c29c6488073e5dff4d6f13
Author: Clebert Suconic
Date:   2016-11-17T15:01:31Z

    ARTEMIS-473/ARTEMIS-863 Detect network failures
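The pull request above is identified only by its title, "Detect network failures", so the following is a minimal sketch of the general idea and not the code from that change. It assumes a broker that periodically probes a list of well-known addresses and treats itself as isolated when none of them answers. The class name NetworkCheckSketch, the method names, and the "one reachable address is enough" rule are all hypothetical. The default probe uses the standard java.net.InetAddress#isReachable call.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper (not Artemis code): does this broker still see the network? */
final class NetworkCheckSketch {
    private final List<String> checkList;   // addresses expected to be reachable, e.g. a gateway
    private final Predicate<String> probe;  // how a single address is tested

    NetworkCheckSketch(List<String> checkList, Predicate<String> probe) {
        this.checkList = List.copyOf(checkList);
        this.probe = probe;
    }

    /** Default probe based on InetAddress#isReachable; any I/O error counts as unreachable. */
    static Predicate<String> pingProbe(int timeoutMillis) {
        return host -> {
            try {
                return InetAddress.getByName(host).isReachable(timeoutMillis);
            } catch (IOException e) {
                return false;
            }
        };
    }

    /**
     * Assumption of this sketch: the network is "up" if at least one address answers,
     * and an empty list disables the check.
     */
    boolean networkIsUp() {
        if (checkList.isEmpty()) {
            return true;
        }
        return checkList.stream().anyMatch(probe);
    }
}
```

A backup that finds networkIsUp() false would, under this assumption, refrain from activating, because the lost replication link is more likely its own network problem than a dead live server. The probe is passed in so that the decision can be exercised without real network access.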
[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.
[ https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432755#comment-15432755 ]

Miroslav Novak commented on ARTEMIS-473:
----------------------------------------

[~clebertsuconic] Do you think that options a) and b) are feasible? Option c) would require resolving hard issues and merging the journals, as you mentioned above, which would be out of scope for this RFE.
[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.
[ https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417115#comment-15417115 ]

Miroslav Novak commented on ARTEMIS-473:
----------------------------------------

This feature request should handle options a) and b). Option c) requires resolving hard issues, which is out of scope.
[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.
[ https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417098#comment-15417098 ]

clebert suconic commented on ARTEMIS-473:
-----------------------------------------

I closed https://issues.apache.org/jira/browse/ARTEMIS-679 as Won't Fix because it defeats the purpose of replication itself: if you had the access needed to make a comparison, you would not need replication.

The only possible solution is to resolve any issues that arise from a split brain in cases of network failures between live and backup.
[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.
[ https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416643#comment-15416643 ]

Miroslav Novak commented on ARTEMIS-473:
----------------------------------------

I've created a new JIRA for the description: ARTEMIS-679 - Activate most up to date server from master-slave(live-backup) pair.

If a split brain happens, there is not much Artemis can do about it. Still, it can recover from quite common cases. Basically three situations can occur when a split brain happens (that is, master and slave are active at the same time):
a) Clients do not lose their connection to the master and stay connected to it.
b) Clients lose their connection to the master and fail over to the backup.
c) Clients lose their connection to master and slave at the same time. They will try to reconnect to either the master or the slave.

I believe that for situations a) and b) Artemis can recover when the network is reconnected. At the moment when master and slave notice that they are active at the same time, they will check who has external (not in-vm) connections. The server without external client connections will restart. Only the server with the clients has the up-to-date journal.
Option c) is problematic because clients can connect to either master or slave, so in this case there is nothing Artemis can do.

wdyt?
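The rule proposed in the comment above (the server without external client connections restarts) can be written down as a small decision function. This is only a sketch of the proposal as worded there, not existing Artemis behaviour: the names Action, SplitBrainDecisionSketch and decide are hypothetical, and it assumes each server can learn the other's external connection count once the network is restored. The case where neither server has external clients is not covered by the proposal, so the sketch leaves it to the operator.

```java
/** Possible outcomes for the local server once it notices that its peer is also active. */
enum Action { KEEP_RUNNING, RESTART, NO_AUTOMATIC_RECOVERY }

/** Hypothetical sketch of the "server without external clients restarts" proposal. */
final class SplitBrainDecisionSketch {

    /**
     * @param localExternalClients number of external (not in-vm) client connections on this server
     * @param peerExternalClients  the same count as reported by the other active server
     */
    static Action decide(int localExternalClients, int peerExternalClients) {
        boolean localHasClients = localExternalClients > 0;
        boolean peerHasClients = peerExternalClients > 0;

        if (localHasClients && peerHasClients) {
            // Situation c): clients wrote to both servers, so the journals have diverged.
            return Action.NO_AUTOMATIC_RECOVERY;
        }
        if (localHasClients) {
            // This server holds the up-to-date journal; the peer is expected to restart.
            return Action.KEEP_RUNNING;
        }
        if (peerHasClients) {
            // The peer holds the up-to-date journal; restart and resynchronize from it.
            return Action.RESTART;
        }
        // Neither side has external clients: not covered by the proposal (assumption of this sketch).
        return Action.NO_AUTOMATIC_RECOVERY;
    }
}
```

In situation a) the master evaluates decide(n, 0) and keeps running while the slave evaluates decide(0, n) and restarts. Situation b) is the mirror image, after which failback can occur if configured.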
[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.
[ https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416620#comment-15416620 ]

Miroslav Novak commented on ARTEMIS-473:
----------------------------------------

Sorry, the title says something different than what is in the description. I'll change the description to match the title and create a new JIRA for the problem in the description.

> Resolve split brain data after split brains scenarios.
> ------------------------------------------------------
>
>                 Key: ARTEMIS-473
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-473
>             Project: ActiveMQ Artemis
>          Issue Type: New Feature
>          Components: Broker
>    Affects Versions: 1.2.0
>            Reporter: Miroslav Novak
>            Priority: Critical
>
> If there are 2 live/backup pairs with replicated journal in a colocated
> topology Artemis1(L1/B2) <-> Artemis2(L2/B1), then there is no easy way to
> start them if they are all shut down.
> The problem is that there is no way to start the servers with the most up-to-date
> journal. Suppose the administrator shuts down the servers in sequence, Artemis1 and
> then Artemis2. Then Artemis2 has the most up-to-date journals because backup B1
> on Artemis2 activated.
> If the administrator then decides to start Artemis2, live L2 activates and
> backup B1 waits for live L1 in Artemis1 to start. But once L1 starts, L1
> replicates its own "old" journal to B1.
> So L1 started with a bad old journal. I would suggest that L1 and B1 compare
> their journals and figure out which one is more up-to-date. Then the server with
> the more up-to-date journal activates.
> In the scenario described above it would be backup B1 which activates first.
> Live L1 will synchronize its own journal from B1 and then failback happens.
[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.
[ https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415913#comment-15415913 ]

clebert suconic commented on ARTEMIS-473:
-----------------------------------------

There's no way to determine that. We have no connection to the other server when the other server is started; if we did, we would be using shared storage. So there is no way to determine which server to start.

We could give the admin tools to determine what the last ID is.

The reason for this is that if you kill the networking between live and backup you will end up with two lives. Once a server is live, there's no way for us to bring it back to backup, as it could lose data.

We could maybe add a configuration option to fix split brain scenarios and merge both journals once each server sees the other. That's a major feature though.
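The comment above mentions giving the administrator tools to determine the last ID. As a sketch of how such a value might be used, and not of any existing Artemis tool, the helper below takes the last journal record ID that an administrator has read from each stopped server and names the one to start first. The class and method names are hypothetical. It also assumes that a higher last ID means a more recent journal, which holds for the ordered-shutdown scenario in the description but not after a split brain, where both journals kept advancing independently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical admin-side helper: which stopped server should be started first? */
final class StartOrderSketch {

    /**
     * @param lastRecordIdByServer last journal record ID per server name, as gathered by the administrator
     * @return the single server with the highest last ID, or empty if there is no input or a tie
     */
    static Optional<String> startFirst(Map<String, Long> lastRecordIdByServer) {
        if (lastRecordIdByServer.isEmpty()) {
            return Optional.empty();
        }
        long highest = Collections.max(lastRecordIdByServer.values());

        // Collect every server that reports the highest ID so that ties can be detected.
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastRecordIdByServer.entrySet()) {
            if (entry.getValue() == highest) {
                candidates.add(entry.getKey());
            }
        }
        // A tie gives no basis for a choice, so the decision stays with the administrator.
        return candidates.size() == 1 ? Optional.of(candidates.get(0)) : Optional.empty();
    }
}
```

For the colocated example, an input such as L1 = 100 and B1 = 250 would name B1, matching the start order suggested in the description.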