[ 
https://issues.apache.org/jira/browse/GEODE-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakov Varenina updated GEODE-10056:
-----------------------------------
    Description: 
It looks like each locator maintains its own map of receiver load, based only
on the connections it has handed out itself. Connections therefore stay
balanced until either that locator restarts or clients get their connections
from some other locator in the cluster.

How to test?

Start 2 clusters. Let's call site1 the sending site and site2 the receiving
site. The receiving site should have at least 2 locators. Both sites have 2
servers. No regions are needed.

Cluster-1 gfsh>list members

Member Count : 3

  Name    | Id
--------- | -------------------------------------------------------------
locator10 | 10.0.2.15(locator10:7332:locator)<ec><v0>:41000 [Coordinator]
server11  | 10.0.2.15(server11:8358)<v1>:41003
server12  | 10.0.2.15(server12:8717)<v2>:41005

Cluster-2 gfsh>list members

Member Count : 4

  Name    | Id
--------- | -------------------------------------------------------------
locator10 | 10.0.2.15(locator10:7562:locator)<ec><v0>:41001 [Coordinator]
locator11 | 10.0.2.15(locator11:8103:locator)<ec><v1>:41002
server11  | 10.0.2.15(server11:8547)<v2>:41004
server12  | 10.0.2.15(server12:8908)<v3>:41006

Create GW receiver in Site2 on both servers.

Cluster-2 gfsh>list gateways

GatewayReceiver Section

              Member               | Port | Sender Count | Senders Connected
---------------------------------- | ---- | ------------ | -----------------
10.0.2.15(server11:8547)<v2>:41004 | 5175 | 0            |
10.0.2.15(server12:8908)<v3>:41006 | 5457 | 0            |

Create a GW sender in Site1 on both servers. Use 10 dispatcher threads for
easier observation.

Cluster-1 gfsh>list gateways

GatewaySender Section

GatewaySender Id |               Member               | Remote Cluster Id |   Type   |        Status         | Queued Events | Receiver Location
---------------- | ---------------------------------- | ----------------- | -------- | --------------------- | ------------- | -----------------
senderTo2        | 10.0.2.15(server11:8358)<v1>:41003 | 2                 | Parallel | Running and Connected | 0             | 10.0.2.15:5457
senderTo2        | 10.0.2.15(server12:8717)<v2>:41005 | 2                 | Parallel | Running and Connected | 0             | 10.0.2.15:5457

Observe the balance of GW receiver connections in Site2. It is perfect.

Cluster-2 gfsh>list gateways

GatewayReceiver Section

              Member               | Port | Sender Count | Senders Connected
---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
10.0.2.15(server11:8547)<v2>:41004 | 5175 | 12           | 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
10.0.2.15(server12:8908)<v3>:41006 | 5457 | 12           | 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:..

12 connections each - 10 payload + 2 ping connections.
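
To compare the Sender Count column between steps without reading the table by
eye, the counts can be pulled out of the "list gateways" output with a small
script. This is only a sketch: the function names and the regular expression
are my own assumptions, based on the table layout shown in this report, and
are not part of gfsh.

```python
import re

# One data row of the "GatewayReceiver Section" table, as printed above:
#   <member id> | <port> | <sender count> | <senders connected ...>
# Header, separator and wrapped continuation lines do not match this shape.
ROW = re.compile(r"^(\S+)[ \t]*\|[ \t]*(\d+)[ \t]*\|[ \t]*(\d+)[ \t]*\|", re.MULTILINE)


def sender_counts(gfsh_output: str) -> dict[str, int]:
    """Map each receiver member id to the value of its Sender Count column."""
    return {member: int(count) for member, _port, count in ROW.findall(gfsh_output)}


def imbalance(counts: dict[str, int]) -> int:
    """Difference between the most and the least loaded receiver."""
    return max(counts.values()) - min(counts.values()) if counts else 0


# Sample copied from the printout above (wrapped exactly as in this mail).
SAMPLE = """\
---------------------------------- | ---- | ------------ |
10.0.2.15(server11:8547)<v2>:41004 | 5175 | 12           |
10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005,
10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
10.0.2.15(server12:8908)<v3>:41006 | 5457 | 12           |
10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005,
10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:..
"""

if __name__ == "__main__":
    counts = sender_counts(SAMPLE)
    print(counts, "imbalance:", imbalance(counts))
```

An imbalance of 0 corresponds to the perfectly balanced state at this step; a
larger value means one receiver holds that many more connections than the
other.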

Now stop the GW receiver on one server in Site2. In Site1, run the stop/start
gateway-sender commands; all connections will go to the only remaining
receiver in Site2 (as expected). Check it:

Cluster-2 gfsh>list gateways

GatewayReceiver Section

              Member               | Port | Sender Count | Senders Connected
---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
10.0.2.15(server11:8547)<v2>:41004 | 5175 | 22           | 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
10.0.2.15(server12:8908)<v3>:41006 | 5457 | 0            |

Now there are 22 connections on just one receiver: 20 payload connections plus
1 ping connection from each sender.

Stop the GW sender on one server in Site1. The connection count on the GW
receiver drops to half (also expected).

Cluster-2 gfsh>list gateways

GatewayReceiver Section

              Member               | Port | Sender Count | Senders Connected
---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
10.0.2.15(server11:8547)<v2>:41004 | 5175 | 11           | 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
10.0.2.15(server12:8908)<v3>:41006 | 5457 | 0            |

Now there are 11, because one sender in Site1 is stopped.

Start the GW receiver on the Site2 server where it was stopped earlier. It
will not receive new connections just yet.

Start the GW sender on the Site1 server where it was stopped earlier. All of
its connections land on the receiver that was just restarted, so the load is
balanced again.

Cluster-2 gfsh>list gateways

GatewayReceiver Section

              Member               | Port | Sender Count | Senders Connected
---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
10.0.2.15(server11:8547)<v2>:41004 | 5175 | 11           | 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
10.0.2.15(server12:8908)<v3>:41006 | 5182 | 11           | 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:..

There are 11 connections on each receiver because the mapping is now exactly
server11 to server11 and server12 to server12 (i.e. there is just 1 ping
connection on each receiver). As expected, balance is restored.

Stop the GW sender on the same Site1 server again. Again, the receiver in
Site2 that we just restarted has no connections (expected).

Cluster-2 gfsh>list gateways

GatewayReceiver Section

              Member               | Port | Sender Count | Senders Connected
---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
10.0.2.15(server11:8547)<v2>:41004 | 5175 | 11           | 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
10.0.2.15(server12:8908)<v3>:41006 | 5182 | 0            |

Now stop one locator in Site2, the one that was serving the GW senders
(locator10 in my case). Start the GW sender on that Site1 server again. Check
the balance on the Site2 GW receivers:

Cluster-2 gfsh>list gateways

GatewayReceiver Section

              Member               | Port | Sender Count | Senders Connected
---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
10.0.2.15(server11:8547)<v2>:41004 | 5175 | 17           | 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
10.0.2.15(server12:8908)<v3>:41006 | 5182 | 6            | 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:..

As the printout above shows, connections are not balanced correctly when the
connection request is sent to a different locator.
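
The counts seen in the steps above can be reproduced with a toy model of the
suspected behaviour. This is a sketch under assumptions, not Geode code: the
Locator and ReceivingSite classes are hypothetical, each locator is assumed to
pick the least loaded receiver according to its own private view, the second
locator is assumed to start with an empty view, and each sender is assumed to
open 1 ping connection per receiver it uses (which matches the 12, 22 and 11
counts reported above).

```python
from collections import Counter


class Locator:
    """Toy locator: it only knows about connections it handed out itself."""

    def __init__(self):
        self.view = Counter()  # receiver -> payload connections, as seen by this locator

    def assign(self, running):
        # Pick the receiver that looks least loaded in this locator's own view.
        target = min(running, key=lambda r: self.view[r])
        self.view[target] += 1
        return target

    def release(self, receiver, n):
        self.view[receiver] -= n


class ReceivingSite:
    """Toy receiving site that tracks the real connection count per receiver."""

    def __init__(self, receivers):
        self.running = list(receivers)
        self.actual = {r: 0 for r in receivers}
        self.senders = {}  # sender name -> (locator used, payload connections per receiver)

    def start_sender(self, name, locator, threads=10):
        payload = Counter(locator.assign(self.running) for _ in range(threads))
        self.senders[name] = (locator, payload)
        for receiver, n in payload.items():
            self.actual[receiver] += n + 1  # assumed: 1 ping connection per receiver used

    def stop_sender(self, name):
        locator, payload = self.senders.pop(name)
        for receiver, n in payload.items():
            self.actual[receiver] -= n + 1
            locator.release(receiver, n)

    def counts(self):
        return tuple(self.actual.values())


def run_walkthrough():
    loc10, loc11 = Locator(), Locator()
    site2 = ReceivingSite(["server11", "server12"])
    seen = []

    site2.start_sender("sender11", loc10)
    site2.start_sender("sender12", loc10)
    seen.append(site2.counts())            # both receivers up

    # Simplification: the senders are stopped before the receiver is removed.
    site2.stop_sender("sender11")
    site2.stop_sender("sender12")
    site2.running.remove("server12")       # stop one receiver
    site2.start_sender("sender11", loc10)
    site2.start_sender("sender12", loc10)
    seen.append(site2.counts())            # everything on the remaining receiver

    site2.stop_sender("sender12")
    seen.append(site2.counts())            # one sender stopped

    site2.running.append("server12")       # restart the receiver
    site2.start_sender("sender12", loc10)
    seen.append(site2.counts())            # same locator: balance restored

    site2.stop_sender("sender12")
    seen.append(site2.counts())

    site2.start_sender("sender12", loc11)  # locator10 stopped, other locator answers
    seen.append(site2.counts())            # other locator: unbalanced
    return seen


if __name__ == "__main__":
    for server11, server12 in run_walkthrough():
        print(f"server11={server11:2d}  server12={server12:2d}")
```

Under these assumptions the model ends at 17 and 6 connections, the same
counts as in the last printout: the second locator sees both receivers as
empty and splits the 10 payload connections 5/5, although server11 already
holds 11 connections.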

> Gateway-receiver load maintained only on one locator
> ----------------------------------------------------
>
>                 Key: GEODE-10056
>                 URL: https://issues.apache.org/jira/browse/GEODE-10056
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: needsTriage
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
