[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.

2017-03-10 Thread clebert suconic (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905117#comment-15905117
 ] 

clebert suconic commented on ARTEMIS-473:
-

[~martyntaylor] this is unfixable. We can only avoid split brains.

This feature was a request to fix the journal after a split brain. Once the 
data is mixed, there's no way to differentiate it.

You can only configure the system to avoid it.
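
One way to read "configure the system to avoid it", in the spirit of the 
"Detect network failures" commit referenced in this thread, is a periodic 
reachability check: a broker that cannot reach any well-known address assumes 
it is the isolated side and stops serving. The sketch below is only an 
illustration under that assumption. The names `should_stay_active`, 
`check_list` and `is_reachable` are hypothetical and are not Artemis APIs.

```python
from typing import Callable, Iterable


def should_stay_active(check_list: Iterable[str],
                       is_reachable: Callable[[str], bool]) -> bool:
    """Decide whether a broker should keep serving clients.

    check_list   -- hypothetical list of addresses (e.g. a router) to probe
    is_reachable -- hypothetical probe, e.g. a ping wrapper, injected so the
                    decision logic can be exercised without a real network
    """
    addresses = list(check_list)
    if not addresses:
        # Nothing configured: treat the check as disabled.
        return True
    # Staying active requires at least one reachable address; an isolated
    # broker stops, which lowers the chance of two live servers.
    return any(is_reachable(address) for address in addresses)
```

A caller would run this on a timer and stop the broker when it returns False. 
This reduces the window for a split brain but does not repair journals that 
have already diverged.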

> Resolve split brain data after split brains scenarios.
> --
>
> Key: ARTEMIS-473
> URL: https://issues.apache.org/jira/browse/ARTEMIS-473
> Project: ActiveMQ Artemis
> Issue Type: New Feature
> Components: Broker
> Affects Versions: 1.2.0
> Reporter: Miroslav Novak
> Assignee: clebert suconic
> Priority: Critical
> Fix For: 1.5.0
>
>
> If a master-slave pair is configured with a replicated journal and there are 
> no other servers in the cluster, then the slave activates when the network 
> between master and slave is broken. Depending on whether clients were 
> disconnected from the master, there may or may not be a failover to the 
> slave. The problem appears at the moment the network between master and slave 
> is restored: master and slave are active at the same time, which is the split 
> brain syndrome. Currently there is no recovery mechanism for this situation.
> Suggested improvement: if clients failed over to the slave, then the master 
> restarts itself so that failback occurs (if configured). If clients did not 
> fail over and stayed connected to the master, then the backup restarts itself.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.

2016-11-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690577#comment-15690577
 ] 

ASF subversion and git services commented on ARTEMIS-473:
-

Commit 402f25be7dc5eda6e4dd1e8170e242415ce94fa8 in activemq-artemis's branch 
refs/heads/master from Clebert Suconic
[ https://git-wip-us.apache.org/repos/asf?p=activemq-artemis.git;h=402f25b ]

ARTEMIS-473/ARTEMIS-863 Detect network failures




[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.

2016-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688863#comment-15688863
 ] 

ASF GitHub Bot commented on ARTEMIS-473:


GitHub user clebertsuconic opened a pull request:

https://github.com/apache/activemq-artemis/pull/895

ARTEMIS-473/ARTEMIS-863 Detect network failures



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/clebertsuconic/activemq-artemis netcheck

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/activemq-artemis/pull/895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #895


commit 5d040c41822cbda784c29c6488073e5dff4d6f13
Author: Clebert Suconic 
Date:   2016-11-17T15:01:31Z

ARTEMIS-473/ARTEMIS-863 Detect network failures






[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.

2016-08-23 Thread Miroslav Novak (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432755#comment-15432755
 ] 

Miroslav Novak commented on ARTEMIS-473:


[~clebertsuconic] do you think that options a) and b) are feasible? Option c) 
would require resolving hard issues and merging the journals, as you mentioned 
above, which would be out of scope for this RFE.



[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.

2016-08-11 Thread Miroslav Novak (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417115#comment-15417115
 ] 

Miroslav Novak commented on ARTEMIS-473:


This feature request should handle options a) and b). Option c) requires 
resolving hard issues, which is out of scope.



[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.

2016-08-11 Thread clebert suconic (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417098#comment-15417098
 ] 

clebert suconic commented on ARTEMIS-473:
-

I closed https://issues.apache.org/jira/browse/ARTEMIS-679 as Won't Fix because 
it defeats the purpose of replication itself. If you had the access needed to 
make a comparison, you wouldn't need replication.


The only possible solution is to resolve any issues caused by a split brain 
when the network fails between live and backup.



[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.

2016-08-11 Thread Miroslav Novak (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416643#comment-15416643
 ] 

Miroslav Novak commented on ARTEMIS-473:


I've created a new jira for the description - ARTEMIS-679 - Activate most up to 
date server from master-slave(live-backup) pair. 

If a split brain happens then there is not much Artemis can do about it. Still, 
it can recover from quite common cases. Basically, 3 situations can occur when 
a split brain happens (= master and slave are active at the same time):

a) Clients do not lose connection to the master and stay connected to it.
b) Clients lose connection to the master and fail over to the backup.
c) Clients lose connection to master and slave at the same time. They will try 
to reconnect to either the master or the slave.

I believe that for situations a) and b) Artemis can recover when the network is 
reconnected. At the moment when master and slave notice that they're active at 
the same time, they check who has external (not in-vm) connections. The server 
without external client connections restarts. Only the server with the clients 
has the up-to-date journal. 

Option c) is problematic as clients can connect to master or slave, so in this 
case there is nothing Artemis can do. wdyt?
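
The rule proposed above for options a) and b) can be sketched as a small 
decision function. This is only an illustration of the proposal, not Artemis 
code. `ServerState`, its fields and `server_to_restart` are hypothetical names, 
and the sketch assumes both servers can exchange their external connection 
counts once the network is back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerState:
    """Hypothetical snapshot of one broker when the network is restored."""
    name: str
    active: bool
    external_connections: int  # client connections that are not in-vm


def server_to_restart(master: ServerState, slave: ServerState) -> Optional[str]:
    """Return the name of the server that should restart, or None.

    None means there is no split brain, or the case is ambiguous and is
    left to the administrator.
    """
    if not (master.active and slave.active):
        return None  # at most one live server: no split brain
    master_has_clients = master.external_connections > 0
    slave_has_clients = slave.external_connections > 0
    if master_has_clients and not slave_has_clients:
        return slave.name   # option a): clients stayed on the master
    if slave_has_clients and not master_has_clients:
        return master.name  # option b): clients failed over to the slave
    return None             # option c) or no clients at all: do nothing


# Example: clients never left the master, so the slave restarts.
decision = server_to_restart(ServerState("master", True, 3),
                             ServerState("slave", True, 0))
```

Returning None when both servers have clients mirrors the point about option 
c): once both sides have accepted work, restarting either one would discard 
data.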



[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.

2016-08-11 Thread Miroslav Novak (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416620#comment-15416620
 ] 

Miroslav Novak commented on ARTEMIS-473:


Sorry, the title says something different than what is in the description. 
I'll change the description to match the title and create a new jira for the 
problem in the description.

> Resolve split brain data after split brains scenarios.
> --
>
> Key: ARTEMIS-473
> URL: https://issues.apache.org/jira/browse/ARTEMIS-473
> Project: ActiveMQ Artemis
> Issue Type: New Feature
> Components: Broker
> Affects Versions: 1.2.0
> Reporter: Miroslav Novak
> Priority: Critical
>
> If there are 2 live/backup pairs with replicated journal in a colocated 
> topology Artemis1(L1/B2) <-> Artemis2(L2/B1), then there is no easy way to 
> start them if they are all shut down.
> The problem is that there is no way to start the servers with the most 
> up-to-date journal. If the administrator shuts down the servers in sequence, 
> Artemis1 and then Artemis2, then Artemis2 has the most up-to-date journals 
> because backup B1 on server 2 activated.
> If the administrator then decides to start Artemis2, live L2 activates and 
> backup B1 waits for live L1 in Artemis1 to start. But once L1 starts, L1 
> replicates its own "old" journal to B1.
> So L1 started with a bad old journal. I would suggest that L1 and B1 compare 
> their journals and figure out which one is more up-to-date. Then the server 
> with the more up-to-date journal activates.
> In the scenario described above it would be backup B1 which activates first. 
> Live L1 would synchronize its own journal from B1 and then failback happens.





[jira] [Commented] (ARTEMIS-473) Resolve split brain data after split brains scenarios.

2016-08-10 Thread clebert suconic (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415913#comment-15415913
 ] 

clebert suconic commented on ARTEMIS-473:
-

There's no way to determine that. We have no connection to the other server 
when the other server is started. If we did, we would be using shared storage. 
So there is no way to determine which server to start.


We could give the admin tools to determine what the last ID is.


The reason for this is that if you kill the networking between live and backup 
you will end up with two lives. Once a server is live there's no way for us to 
bring it back to backup, as it could lose data.


We could maybe add a configuration option to fix split brain scenarios and 
merge both journals once the servers see each other. That's a major feature 
though.
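
As a rough illustration of the "last ID" tooling idea, the sketch below 
reports the highest record ID found in each journal copy so that an admin 
could compare them by hand. It assumes the record IDs have already been 
extracted from each journal by some other means. `last_record_id` and 
`report_last_ids` are hypothetical helpers, not an existing Artemis tool.

```python
from typing import Dict, Iterable, Optional


def last_record_id(record_ids: Iterable[int]) -> Optional[int]:
    """Return the highest record ID in one journal copy, or None if empty."""
    return max(record_ids, default=None)


def report_last_ids(journals: Dict[str, Iterable[int]]) -> Dict[str, Optional[int]]:
    """Map each journal name to its last record ID for manual comparison."""
    return {name: last_record_id(ids) for name, ids in journals.items()}


# Example with made-up IDs: B1 kept writing after L1 was shut down.
report = report_last_ids({"L1": [1, 2, 3], "B1": [1, 2, 3, 4, 5]})
```

Such a report only helps when one copy simply stopped earlier than the other. 
After a split brain both servers may have written different records under 
overlapping IDs, so a higher last ID does not mean one journal contains the 
other.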
