Re: incorrect partition map exchange behaviour

2021-01-14 Thread Ilya Kasnacheev
Hello!

I have just looked at this issue today, and the relevant fix seems to be
https://issues.apache.org/jira/browse/IGNITE-11147

Regards,
-- 
Ilya Kasnacheev


Wed, 13 Jan 2021 at 20:26, tschauenberg :

> Sorry about mixing the terminology.  My post was meant to be about the PME
> and the primary keys.
>
> So the summary of my post, and what it was trying to show, was that the PME
> was only happening on cluster node leaves (server or visor) but not on
> cluster node joins (at least with previously joined nodes - I haven't tested
> joining a brand new node for the first time, such as expanding the cluster
> from 3 nodes to 4 nodes).
>
> The PME doc suggests a PME should happen on joins as well, but the logs and
> visor/stats show that it is only happening on the leaves.
>
> So what I am trying to identify is:
> * is this a known bug, and if so, which version fixes it?
> * what is the impact of the database state where one node has no designated
> primaries?
> ** This probably effectively reduces get/put performance to that of n-1
> nodes?
> ** Also, for compute tasks that operate on local data, such as those using
> SqlFieldsQuery.setLocal(true), the node with no primaries will do nothing?
>
>


RE: incorrect partition map exchange behaviour

2021-01-13 Thread tschauenberg
Sorry about mixing the terminology.  My post was meant to be about the PME
and the primary keys.

So the summary of my post, and what it was trying to show, was that the PME
was only happening on cluster node leaves (server or visor) but not on
cluster node joins (at least with previously joined nodes - I haven't tested
joining a brand new node for the first time, such as expanding the cluster
from 3 nodes to 4 nodes).

The PME doc suggests a PME should happen on joins as well, but the logs and
visor/stats show that it is only happening on the leaves.

So what I am trying to identify is:
* is this a known bug, and if so, which version fixes it?
* what is the impact of the database state where one node has no designated
primaries?
** This probably effectively reduces get/put performance to that of n-1
nodes?
** Also, for compute tasks that operate on local data, such as those using
SqlFieldsQuery.setLocal(true), the node with no primaries will do nothing?
(See the sketch below.)
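
A minimal sketch of that kind of local-only query, run on every server node
via a compute broadcast. The cache/table name "Devices" and the client config
path are placeholders, not something prescribed in this thread:

import java.util.Collection;
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.lang.IgniteCallable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class LocalQueryPerNode {
    public static void main(String[] args) {
        // Client config path is a placeholder.
        try (Ignite client = Ignition.start("client-config.xml")) {
            // Ask every server node to count only the rows in its own local partitions.
            Collection<Long> counts = client.compute(client.cluster().forServers())
                .broadcast(new IgniteCallable<Long>() {
                    @IgniteInstanceResource
                    private Ignite ignite;

                    @Override public Long call() {
                        SqlFieldsQuery qry = new SqlFieldsQuery("select count(*) from Devices");
                        qry.setLocal(true); // scan only partitions owned by this node
                        List<List<?>> rows = ignite.cache("Devices").query(qry).getAll();
                        return (Long) rows.get(0).get(0);
                    }
                });
            // Per the question above, a node owning no primaries would be
            // expected to report 0 here.
            System.out.println("Per-node local counts: " + counts);
        }
    }
}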





Re: incorrect partition map exchange behaviour

2021-01-13 Thread tschauenberg
Haven't tested on 2.9.1 as we don't have that database provisioned and sadly
won't for a while.  When we do, though, I will update.





RE: incorrect partition map exchange behaviour

2021-01-13 Thread Alexandr Shapkin
Hi,

As you correctly pointed to the PME implementation details webpage, this is a
process of exchanging information about partition holders. It happens on every
topology change, cluster deactivation, etc. The process itself is not about
data rebalancing; it is about which node should store a particular partition.

If you want to check whether a data rebalance happened, you need to find
something like:

[2020-01-15 15:46:57,042][INFO ][sys-#50][GridDhtPartitionDemander] Starting
rebalance routine [ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=6,
minorTopVer=0], supplier=9e88a103-4465-4e5b-865f-4edaa909fee1,
fullPartitions=[0-99], histPartitions=[]]

It also depends on whether your cluster is under load during the rolling
upgrade; if there are no updates happening, then no data rebalance should
happen either.

I'm not quite sure about the metric and visor. Anyway, you can perform the
checks explicitly from code:

ignite.cache("myCache").localSize(CachePeekMode.BACKUP);
ignite.cache("myCache").localSize(CachePeekMode.PRIMARY);

From: tschauenberg
Sent: Friday, January 8, 2021 3:59 AM
To: user@ignite.apache.org
Subject: incorrect partition map exchange behaviour

Hi,

We have a cluster of Ignite 2.8.1 server nodes and have recently started
looking at the individual cache metric for primary keys:
org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl.OffHeapPrimaryEntriesCount

In our configuration we have a replicated cache with 2 backups.  Our cluster
has 3 nodes in it so the primaries should be spread equally on the 3 nodes
and each node has backups from the other two nodes.  All these server nodes
are in the baseline.  Additionally we have some thick clients connected but
I don't think they are relevant to the discussion.

Whenever we do a rolling restart one node at a time, at the end after the
last node is restarted it always owns zero primaries and owns solely
backups.  The two nodes restarted earlier during the rolling restart own all
the primaries.

When our cluster is in this scenario, if we start and then stop visor, visor
leaving the cluster triggers a PME where all keys get balanced across all
server nodes.  Looking at the visor cache stats between the start and stop,
we can see a min of 0 keys across the nodes for our cache, so visor and the
JMX metrics line up on that front.  After stopping visor, the JMX metrics
show the evenly distributed primaries, and starting visor a second time
confirms that the min, average, and max node keys are all evenly distributed.

Every join and leave during the rolling restart and during visor start/stop
reflects a topology increment and node leave and join events in the logs.

According to
https://cwiki.apache.org/confluence/display/IGNITE/%2528Partition+Map%2529+Exchange+-+under+the+hood
each leave and join should trigger the PME, but we only see the keys changing
on the leaves.

Additionally, we tried waiting longer between the stop and start part of the
rolling restart to see if that had any effect.  We ensured we waited long
enough for a PME to do any moving, but waiting longer didn't have any effect.
The stop always has the PME move the keys off that node and the start never
sees the PME move any primaries back.

Why are we only seeing the PME change keys when nodes (server or visor) stop
and never when they join?
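
For reference, a minimal, self-contained sketch of that check (the cache name
"myCache" is from the snippet above; the client config path and the use of a
compute broadcast are assumptions). It runs localSize() on every server node,
since the call only reflects the partitions held by the node it executes on:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class PrimaryBackupCheck {
    public static void main(String[] args) {
        // Client config path is a placeholder.
        try (Ignite client = Ignition.start("client-config.xml")) {
            client.compute(client.cluster().forServers()).broadcast(new IgniteRunnable() {
                @IgniteInstanceResource
                private Ignite ignite;

                @Override public void run() {
                    int primary = ignite.cache("myCache").localSize(CachePeekMode.PRIMARY);
                    int backup = ignite.cache("myCache").localSize(CachePeekMode.BACKUP);
                    // Each server prints its own counts; a node holding only
                    // backups would show primary=0 here.
                    System.out.println("node=" + ignite.cluster().localNode().consistentId()
                        + " primary=" + primary + " backup=" + backup);
                }
            });
        }
    }
}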


Re: incorrect partition map exchange behaviour

2021-01-13 Thread Ilya Kasnacheev
Hello!

Does it happen to work on 2.9.1, or does it fail there too? I recommend
checking it, since I vaguely remember some discussions about a late affinity
assignment fix.

Regards,
-- 
Ilya Kasnacheev


Sat, 9 Jan 2021 at 03:11, tschauenberg :

> Here's my attempt to demonstrate and also provide logs
>
> Standup 3 node cluster and load with data
>
>
> Using a thick client, 250k devices are loaded into the device cache.  The
> thick client then leaves.  There's one other thick client connected the
> whole time for serving requests but I think that's irrelevant for the test
> but want to point it out in case someone notices there's still a client
> connected.
>
> Show topology from logs of the client leaving:
>
>
>
> [2021-01-08T23:08:05.012Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
> Node left topology: TcpDiscoveryNode
> [id=611e30ee-b7c6-4ead-a746-f609b206cfb4,
> consistentId=611e30ee-b7c6-4ead-a746-f609b206cfb4, addrs=ArrayList
> [127.0.0.1, 172.17.0.3], sockAddrs=HashSet [/127.0.0.1:0, /172.17.0.3:0],
> discPort=0, order=7, intOrder=6, lastExchangeTime=1610146373751, loc=false,
> ver=2.8.1#20200521-sha1:86422096, isClient=true]
>
> [2021-01-08T23:08:05.013Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
> Topology snapshot [ver=8, locNode=75e4ddea, servers=3, clients=1,
> state=ACTIVE, CPUs=7, offheap=3.0GB, heap=3.1GB]
>
> Start visor on one of the nodes
>
>
> Show topology from logs
>
>
> [2021-01-08T23:30:33.461Z][INFO][tcp-disco-msg-worker-[4ea8efe1
> 10.12.3.76:47500]-#2][TcpDiscoverySpi] New next node
> [newNext=TcpDiscoveryNode [id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7,
> consistentId=10.12.2.110:47501, addrs=ArrayList [10.12.2.110],
> sockAddrs=HashSet [/10.12.2.110:47501], discPort=47501, order=0,
> intOrder=7,
> lastExchangeTime=1610148633458, loc=false,
> ver=2.8.1#20200521-sha1:86422096,
> isClient=false]]
>
> [2021-01-08T23:30:34.045Z][INFO][sys-#1011][GridDhtPartitionsExchangeFuture]
> Completed partition exchange
> [localNode=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
> [topVer=9, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
> [id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7, consistentId=10.12.2.110:47501,
> addrs=ArrayList [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47501],
> discPort=47501, order=9, intOrder=7, lastExchangeTime=1610148633458,
> loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
> newCrdFut=null], topVer=AffinityTopologyVersion [topVer=9, minorTopVer=0]]
>
> Show data balanced in visor
>
>
>
> +---+-+---+-+---+---+---+---++
> | Devices(@c2)  | PARTITIONED | 3 | 25 (0 / 25)
>
> | min: 80315 (0 / 80315)| min: 0| min: 0| min: 0|
> min: 25|
> |   | |   |
>
> | avg: 8.33 (0.00 / 8.33)   | avg: 0.00 | avg: 0.00 | avg: 0.00 |
> avg: 25.00 |
> |   | |   |
>
> | max: 86968 (0 / 86968)| max: 0| max: 0| max: 0|
> max: 25|
>
> +---+-+---+-+---+---+---+---++
>
> At this point the data is all relatively balanced and the topology
> increased
> when visor connected.
>
> Stop ignite on one node
>
>
> Show topology and PME from logs (from a different ignite node as the ignite
> process was stopped)
>
>
>
> [2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
> Node left topology: TcpDiscoveryNode
> [id=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
> consistentId=3a4a497f-5a89-4f2c-8531-b2b05f2ede22, addrs=ArrayList
> [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47500], discPort=47500,
> order=3, intOrder=3, lastExchangeTime=1610139164908, loc=false,
> ver=2.8.1#20200521-sha1:86422096, isClient=false]
>
> [2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
> Topology snapshot [ver=10, locNode=4ea8efe1, servers=2, clients=1,
> state=ACTIVE, CPUs=5, offheap=2.0GB, heap=2.1GB]
> [2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
>
> ^-- Baseline [id=0, size=3, online=2, offline=1]
> [2021-01-08T23:35:39.335Z][INFO][exchange-worker-#41][time] Started
> exchange
> init [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], crd=true,
> evt=NODE_LEFT, evtNode=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
> customEvt=null,
> allowMerge=false, exchangeFreeSwitch=true]
> [2021-01-08T23:35:39.338Z][INFO][sys-#1031][GridAffinityAssignmentCache]
> Local node affinity assignment distribution is not ideal [cache=Households,
> expectedPrimary=512.00, actualPrimary=548, expectedBackups=1024.00,
> actualBackups=476, warningThreshold=50.00%]
> 

Re: incorrect partition map exchange behaviour

2021-01-08 Thread tschauenberg
Here's my attempt to demonstrate and also provide logs

Standup 3 node cluster and load with data


Using a thick client, 250k devices are loaded into the device cache.  The
thick client then leaves.  There's one other thick client connected the whole
time for serving requests; I think that's irrelevant for the test, but I want
to point it out in case someone notices there's still a client connected.

Show topology from logs of the client leaving:


[2021-01-08T23:08:05.012Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Node left topology: TcpDiscoveryNode
[id=611e30ee-b7c6-4ead-a746-f609b206cfb4,
consistentId=611e30ee-b7c6-4ead-a746-f609b206cfb4, addrs=ArrayList
[127.0.0.1, 172.17.0.3], sockAddrs=HashSet [/127.0.0.1:0, /172.17.0.3:0],
discPort=0, order=7, intOrder=6, lastExchangeTime=1610146373751, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=true]
[2021-01-08T23:08:05.013Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Topology snapshot [ver=8, locNode=75e4ddea, servers=3, clients=1,
state=ACTIVE, CPUs=7, offheap=3.0GB, heap=3.1GB]

Start visor on one of the nodes


Show topology from logs


[2021-01-08T23:30:33.461Z][INFO][tcp-disco-msg-worker-[4ea8efe1
10.12.3.76:47500]-#2][TcpDiscoverySpi] New next node
[newNext=TcpDiscoveryNode [id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7,
consistentId=10.12.2.110:47501, addrs=ArrayList [10.12.2.110],
sockAddrs=HashSet [/10.12.2.110:47501], discPort=47501, order=0, intOrder=7,
lastExchangeTime=1610148633458, loc=false, ver=2.8.1#20200521-sha1:86422096,
isClient=false]]
[2021-01-08T23:30:34.045Z][INFO][sys-#1011][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=9, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7, consistentId=10.12.2.110:47501,
addrs=ArrayList [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47501],
discPort=47501, order=9, intOrder=7, lastExchangeTime=1610148633458,
loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=9, minorTopVer=0]]

Show data balanced in visor


+---+-+---+-+---+---+---+---++
| Devices(@c2)  | PARTITIONED | 3 | 25 (0 / 25)
| min: 80315 (0 / 80315)| min: 0| min: 0| min: 0|
min: 25|
|   | |   |
| avg: 8.33 (0.00 / 8.33)   | avg: 0.00 | avg: 0.00 | avg: 0.00 |
avg: 25.00 |
|   | |   |
| max: 86968 (0 / 86968)| max: 0| max: 0| max: 0|
max: 25|
+---+-+---+-+---+---+---+---++

At this point the data is all relatively balanced and the topology increased
when visor connected.

Stop ignite on one node


Show topology and PME from logs (from a different ignite node as the ignite
process was stopped)


[2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Node left topology: TcpDiscoveryNode
[id=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
consistentId=3a4a497f-5a89-4f2c-8531-b2b05f2ede22, addrs=ArrayList
[10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47500], discPort=47500,
order=3, intOrder=3, lastExchangeTime=1610139164908, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false]
[2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Topology snapshot [ver=10, locNode=4ea8efe1, servers=2, clients=1,
state=ACTIVE, CPUs=5, offheap=2.0GB, heap=2.1GB]
[2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]  
^-- Baseline [id=0, size=3, online=2, offline=1]
[2021-01-08T23:35:39.335Z][INFO][exchange-worker-#41][time] Started exchange
init [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], crd=true,
evt=NODE_LEFT, evtNode=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c, customEvt=null,
allowMerge=false, exchangeFreeSwitch=true]
[2021-01-08T23:35:39.338Z][INFO][sys-#1031][GridAffinityAssignmentCache]
Local node affinity assignment distribution is not ideal [cache=Households,
expectedPrimary=512.00, actualPrimary=548, expectedBackups=1024.00,
actualBackups=476, warningThreshold=50.00%]
[2021-01-08T23:35:39.340Z][INFO][sys-#1032][GridAffinityAssignmentCache]
Local node affinity assignment distribution is not ideal [cache=Devices,
expectedPrimary=512.00, actualPrimary=548, expectedBackups=1024.00,
actualBackups=476, warningThreshold=50.00%]
[2021-01-08T23:35:39.354Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future

incorrect partition map exchange behaviour

2021-01-07 Thread tschauenberg
Hi,

We have a cluster of Ignite 2.8.1 server nodes and have recently started
looking at the individual cache metric for primary keys:
org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl.OffHeapPrimaryEntriesCount
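
For reference, the same value can also be read programmatically from the
cache's local metrics on each server node - a rough sketch only (the helper
name is hypothetical, the cache name is a placeholder, and cache statistics
may need to be enabled in the cache configuration):

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.CacheMetrics;

public final class PrimaryEntriesProbe {
    // Intended to run on a server node, e.g. from inside a compute broadcast,
    // because localMetrics() reports only that node's own partitions.
    static long offHeapPrimaryEntries(Ignite ignite, String cacheName) {
        CacheMetrics metrics = ignite.cache(cacheName).localMetrics();
        return metrics.getOffHeapPrimaryEntriesCount();
    }
}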

In our configuration we have a replicated cache with 2 backups.  Our cluster
has 3 nodes in it so the primaries should be spread equally on the 3 nodes
and each node has backups from the other two nodes.  All these server nodes
are in the baseline.  Additionally we have some thick clients connected but
I don't think they are relevant to the discussion.

Whenever we do a rolling restart one node at a time, at the end after the
last node is restarted it always owns zero primaries and owns solely
backups.  The two nodes restarted earlier during the rolling restart own all
the primaries.

When our cluster is in this scenario, if we start and then stop visor, visor
leaving the cluster triggers a PME where all keys get balanced across all
server nodes.  Looking at the visor cache stats between the start and stop,
we can see a min of 0 keys across the nodes for our cache, so visor and the
JMX metrics line up on that front.  After stopping visor, the JMX metrics
show the evenly distributed primaries, and starting visor a second time
confirms that the min, average, and max node keys are all evenly
distributed.

Every join and leave during the rolling restart and during visor start/stop
reflects a topology increment and node leave and join events in the logs.

According to
https://cwiki.apache.org/confluence/display/IGNITE/%2528Partition+Map%2529+Exchange+-+under+the+hood
each leave and join should trigger the PME but we only see the keys changing
on the leaves.

Additionally, we tried waiting longer between the stop and start part of the
rolling restart to see if that had any effect.  We ensured we waited long
enough for a PME to do any moving but waiting longer didn't have any effect. 
The stop always has the PME move the keys off that node and the start never
sees the PME move any primaries back.

Why are we only seeing the PME change keys when nodes (server or visor) stop
and never when they join?
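
A minimal sketch of one way to double-check the skew independent of visor and
JMX - asking the affinity function how many partitions each server currently
holds as primary vs backup (the cache name "Devices" and the client config
path are placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

public class AffinityDistribution {
    public static void main(String[] args) {
        // Client config path is a placeholder.
        try (Ignite client = Ignition.start("client-config.xml")) {
            Affinity<Object> aff = client.affinity("Devices");

            // Print, per server node, how many partitions it is primary/backup for.
            for (ClusterNode n : client.cluster().forServers().nodes()) {
                System.out.println(n.consistentId()
                    + " primaryParts=" + aff.primaryPartitions(n).length
                    + " backupParts=" + aff.backupPartitions(n).length);
            }
        }
    }
}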


