[jira] [Updated] (IGNITE-4491) Commutation loss between two nodes leads to hang whole cluster.

Vladislav Pyatkov (JIRA) Mon, 26 Dec 2016 03:26:54 -0800

     [ 
https://issues.apache.org/jira/browse/IGNITE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vladislav Pyatkov updated IGNITE-4491:
--------------------------------------
    Description: 
Reproduction steps:
1) Start nodes:

{noformat}
DC1                       DC2

1 (10.116.172.1)      8 (10.116.64.11)
2 (10.116.172.2)      7 (10.116.64.12)
3 (10.116.172.3)      6 (10.116.64.13)
4 (10.116.172.4)      5 (10.116.64.14)
{noformat}

Each node has a client that runs on the same host as the server (see the 
source in the attachment).

2) Drop connection

Between 1-8,
{noformat}
1 (10.116.172.1)      8 (10.116.64.11)
{noformat}

Drop all inbound and outbound traffic.
Run on 10.116.172.1:
iptables -A INPUT -s 10.116.64.11 -j DROP
iptables -A OUTPUT -d 10.116.64.11 -j DROP

Between  4-5

4 (10.116.172.4)      5 (10.116.64.14)

Run on 10.116.172.4:
iptables -A INPUT -s 10.116.64.14 -j DROP
iptables -A OUTPUT -d 10.116.64.14 -j DROP
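
The two pairs of rules above can be generated and later removed with a small helper. This is only a sketch: the function names and the APPLY variable are hypothetical and not part of the attached reproducer. By default it only prints the commands; with APPLY=1 it also runs them, which requires root. Passing -D instead of -A deletes the matching rules, restoring connectivity after the test.

```shell
#!/bin/sh
# Sketch only: helper names and the APPLY switch are illustrative assumptions.

# Emit the two rules for one peer.
# $1 is the iptables action (-A to append, -D to delete), $2 is the peer IP.
partition_rules() {
    action=$1
    peer=$2
    echo "iptables $action INPUT -s $peer -j DROP"
    echo "iptables $action OUTPUT -d $peer -j DROP"
}

# Print each rule; execute it only when APPLY=1 (needs root privileges).
run_rules() {
    partition_rules "$1" "$2" | while read -r cmd; do
        echo "$cmd"
        if [ "${APPLY:-0}" = "1" ]; then
            $cmd
        fi
    done
}
```

For example, `run_rules -A 10.116.64.11` on 10.116.172.1 prints the rules for the 1-8 link, and `APPLY=1 run_rules -D 10.116.64.11` removes them again.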

3) Stop the grid, after several seconds

In the logs you can see which nodes were segmented after the traffic was 
dropped (note which clients were not segmented):
[12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] 
Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]

And all operations stopped at the same time.
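
To follow how the topology changes in a node's log, the snapshot lines can be reduced to their version and node counts. The following is a minimal sketch that assumes each snapshot is written on a single line in the format shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: reads log text on stdin and prints one summary line per
# "Topology snapshot" entry. Assumes the field order shown in the report.
topology_counts() {
    sed -n 's/.*Topology snapshot \[ver=\([0-9]*\), servers=\([0-9]*\), clients=\([0-9]*\).*/ver=\1 servers=\2 clients=\3/p'
}
```

Run against a log file, for example `topology_counts < ignite.log` (the file name is an assumption). For the line quoted above it reports servers=6 and clients=8, which is consistent with two of the eight servers leaving the topology while all eight clients remain.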

  was:
Reproduction steps:
1) Start nodes:

DC1                       DC2

1 (10.116.172.1)      8 (10.116.64.11)
2 (10.116.172.2)      7 (10.116.64.12)
3 (10.116.172.3)      6 (10.116.64.13)
4 (10.116.172.4)      5 (10.116.64.14)

Each node has a client that runs on the same host as the server (see the 
source in the attachment).

2) Drop connection

Between 1-8,

1 (10.116.172.1)      8 (10.116.64.11)

Drop all inbound and outbound traffic.
Run on 10.116.172.1:
iptables -A INPUT -s 10.116.64.11 -j DROP
iptables -A OUTPUT -d 10.116.64.11 -j DROP

Between  4-5

4 (10.116.172.4)      5 (10.116.64.14)

Run on 10.116.172.4:
iptables -A INPUT -s 10.116.64.14 -j DROP
iptables -A OUTPUT -d 10.116.64.14 -j DROP

3) Stop the grid, after several seconds

In the logs you can see which nodes were segmented after the traffic was 
dropped (note which clients were not segmented):
[12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] 
Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]

And all operations stopped at the same time.


> Commutation loss between two nodes leads to hang whole cluster.
> ---------------------------------------------------------------
>
>                 Key: IGNITE-4491
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4491
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 1.8
>            Reporter: Vladislav Pyatkov
>            Priority: Critical
>
> Reproduction steps:
> 1) Start nodes:
> {noformat}
> DC1                       DC2
> 1 (10.116.172.1)      8 (10.116.64.11)
> 2 (10.116.172.2)      7 (10.116.64.12)
> 3 (10.116.172.3)      6 (10.116.64.13)
> 4 (10.116.172.4)      5 (10.116.64.14)
> {noformat}
> Each node has a client that runs on the same host as the server (see the 
> source in the attachment).
> 2) Drop connection
> Between 1-8,
> {noformat}
> 1 (10.116.172.1)      8 (10.116.64.11)
> {noformat}
> Drop all inbound and outbound traffic.
> Run on 10.116.172.1:
> iptables -A INPUT -s 10.116.64.11 -j DROP
> iptables -A OUTPUT -d 10.116.64.11 -j DROP
> Between  4-5
> 4 (10.116.172.4)      5 (10.116.64.14)
> Run on 10.116.172.4:
> iptables -A INPUT -s 10.116.64.14 -j DROP
> iptables -A OUTPUT -d 10.116.64.14 -j DROP
> 3) Stop the grid, after several seconds
> In the logs you can see which nodes were segmented after the traffic was 
> dropped (note which clients were not segmented):
> [12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] 
> Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]
> And all operations stopped at the same time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
