Re: Cache was inconsistent state
John,

It looks like a split-brain; they were in one cluster at first. I'm not sure what caused it; it could be a network problem or something else. I saw in the logs that you use both IPv4 and IPv6. I would recommend using only one of them to avoid problems: just add -Djava.net.preferIPv4Stack=true to all nodes in the cluster.

Also, to avoid split-brain situations, you can use Zookeeper Discovery:
https://apacheignite.readme.io/docs/zookeeper-discovery#failures-and-split-brain-handling
or implement a segmentation resolver. More information about the latter can be found on the forum, for example here:
http://apache-ignite-users.70518.x6.nabble.com/split-brain-problem-and-GridSegmentationProcessor-td14590.html

Evgenii
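[For reference, a minimal sketch of the Zookeeper Discovery configuration mentioned above, assuming the ignite-zookeeper module is on the classpath; the ZooKeeper connection string, root path, and timeout are hypothetical placeholders, not values from this thread:]

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi;

public class ZkDiscoveryConfig {
    public static void main(String[] args) {
        // Replaces TcpDiscoverySpi; with ZooKeeper coordinating membership,
        // a split-brain can be resolved by shutting down the smaller segment.
        ZookeeperDiscoverySpi zkSpi = new ZookeeperDiscoverySpi();
        zkSpi.setZkConnectionString("zk1:2181,zk2:2181,zk3:2181"); // hypothetical ensemble
        zkSpi.setZkRootPath("/ignite");
        zkSpi.setSessionTimeout(30_000);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(zkSpi);

        // Note: -Djava.net.preferIPv4Stack=true is a JVM option (e.g. in JVM_OPTS),
        // not something set on this configuration object.
        Ignite ignite = Ignition.start(cfg);
    }
}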
Re: Cache was inconsistent state
How though? It's the same cluster! We haven't changed anything; this happened on its own...

All I did was reboot the node and the cluster fixed itself.
Re: Cache was inconsistent state
Hi John,

*Yes, it looks like they are in different clusters.*

*Metrics from the node with the problem:*

[15:17:28,668][INFO][grid-timeout-worker-#23%xx%][IgniteKernal%xx]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
  ^-- Node [id=5bbf262e, name=xx, uptime=93 days, 19:36:10.921]
  ^-- H/N/C [hosts=3, nodes=4, CPUs=10]

*Metrics from another node:*

[15:17:05,635][INFO][grid-timeout-worker-#23%xx%][IgniteKernal%xx]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
  ^-- Node [id=dddefdcd, name=xx, uptime=19 days, 16:49:48.381]
  ^-- H/N/C [hosts=6, nodes=7, CPUs=21]

*The same topology version on two nodes shows different sets of nodes:*

[03:56:17,643][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager] Topology snapshot [ver=1036, locNode=5bbf262e, servers=1, clients=3, state=ACTIVE, CPUs=10, offheap=10.0GB, heap=13.0GB]
[03:56:17,643][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]   ^-- Baseline [id=0, size=3, online=1, offline=2]

*And:*

[03:56:43,388][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager] Topology snapshot [ver=1036, locNode=4394fdd4, servers=2, clients=2, state=ACTIVE, CPUs=15, offheap=20.0GB, heap=19.0GB]
[03:56:43,389][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]   ^-- Baseline [id=0, size=3, online=2, offline=1]

So, it's just 2 different clusters.

Best Regards,
Evgenii
Re: Cache was inconsistent state
Hi Evgenii, here are the logs.

https://www.dropbox.com/s/ke71qsoqg588kc8/ignite-logs.zip?dl=0
Re: Event for an update that should have been filtered is received in Local Listener of Continuous Query when a 1000 row insert is triggered
It might; I would need to see a reproducer to make a determination.
Re: Event for an update that should have been filtered is received in Local Listener of Continuous Query when a 1000 row insert is triggered
I see this line being printed just before any local listener is invoked:

2020-05-06T16:28:17,909 INFO o.a.i.s.c.t.TcpCommunicationSpi [grid-nio-worker-tcp-comm-4-#255%ActivDataPublisher-ACTIVEI2-igniteclient-GREEN%]: Accepted incoming communication connection [locAddr=/x.x.x.x:yyy, rmtAddr=/x.x.x.x:yyy]

The remote address is the address of the Ignite server node. The question is: should a client whose remote filter should have filtered out the update receive this connection at all?
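[For context, a minimal sketch of the kind of continuous query setup under discussion, assuming a hypothetical cache named "demo" and a trivial filter; this is illustrative only, not the poster's actual code:]

import javax.cache.Cache;
import javax.cache.configuration.Factory;
import javax.cache.event.CacheEntryEventFilter;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

public class CqFilterSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();
        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("demo"); // hypothetical name

        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

        // The remote filter runs on the server nodes; only events it accepts
        // should ever reach the client's local listener.
        qry.setRemoteFilterFactory((Factory<CacheEntryEventFilter<Integer, String>>) () ->
            evt -> evt.getValue() != null && evt.getValue().startsWith("keep"));

        // The local listener runs on this node for events that passed the filter.
        qry.setLocalListener(evts ->
            evts.forEach(e -> System.out.println("Update: " + e)));

        QueryCursor<Cache.Entry<Integer, String>> cur = cache.query(qry);
        // Keep 'cur' open for as long as the query should stay active;
        // closing it cancels the continuous query.
    }
}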
Re: Event for an update that should have been filtered is received in Local Listener of Continuous Query when a 1000 row insert is triggered
Hi Alex,

Thank you for the reply.

>> verify the continuous query definitions using the appropriate view: https://apacheignite.readme.io/docs/continuous_queries

We are on version 2.7.6, so I guess this view is not available to us. Also, we have been running the CQs for a couple of months now in our test environment and have not faced any issues. This issue is more recent and happens only sometimes (I cannot figure out what could have caused it yet). Although the issue has happened a couple of times on our test environment, I am not able to reproduce it on my local machine. I was able to cause this failure only once on the Linux test environment (never so far on my Windows machine, even though I have tested many scenarios). It looks like some exceptional scenario or race condition has caused this.

Please note that we have recently added an EVT_NODE_SEGMENTED handler, and we also have a handler for a cluster-switch request, where we switch to a different cluster based on updates to a particular record in a particular table. The handling of both events is to call ignite.close() and then Ignition.start() with the right cluster config. As mentioned, we have tested the event handler and huge inserts/updates after segmentation etc., and I am not able to cause this issue to occur.

regards,
Veena
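[A rough sketch of the segmentation handling described above (restart via ignite.close() plus Ignition.start()), assuming EVT_NODE_SEGMENTED is enabled in the node configuration; the config path is a hypothetical placeholder:]

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;

public class SegmentationRestartSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // EVT_NODE_SEGMENTED is not recorded unless explicitly enabled.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_SEGMENTED);

        Ignite ignite = Ignition.start(cfg);

        ignite.events().localListen(evt -> {
            // Restart on a separate thread: stopping the node from inside
            // the event callback could block the worker that delivered it.
            new Thread(() -> {
                ignite.close();
                Ignition.start("config/ignite-green.xml"); // hypothetical config path
            }).start();
            return false; // one-shot: stop listening after the first event
        }, EventType.EVT_NODE_SEGMENTED);
    }
}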
Re: Cache was inconsistent state
Ok, let me try to get them...

On Thu., May 7, 2020, 1:14 p.m. Evgenii Zhuravlev, wrote:

> Hi,
>
> It looks like the third server node was not a part of this cluster before
> the restart. Can you share full logs from all server nodes?
>
> Evgenii
>
> On Thu, 7 May 2020 at 09:11, John Smith wrote:
>
>> Hi, running 2.7.0 on 3 server nodes deployed on VMs running Ubuntu.
>>
>> I checked the state of the cluster by going to: /ignite?cmd=currentState
>> And the response was:
>> {"successStatus":0,"error":null,"sessionToken":null,"response":true}
>> I also checked: /ignite?cmd=size&cacheName=
>>
>> 2 nodes were reporting 3 million records.
>> 1 node was reporting 2 million records.
>>
>> When I connected to visor and ran the node command, the details were
>> wrong: it only showed 2 server nodes and only 1 client, but 3 server
>> nodes actually exist and more clients are connected.
>>
>> So I rebooted the node that was claiming 2 million records instead of 3,
>> and when I re-ran the node command it displayed all the proper nodes.
>> Also, after the reboot all the nodes started reporting 2 million records
>> instead of 3 million, so there was some sort of rebalancing or correction
>> (the cache has a 90-day TTL)?
>>
>> Before reboot
>>
>> +===+=================+===============+===========+==========+======+==========+===========+
>> | # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
>> +===+=================+===============+===========+==========+======+==========+===========+
>> | 0 | xx(@n0), xx.69  | xx            | Server    | 20:25:30 | 4    | 1.27 %   | 84.00 %   |
>> | 1 | xx(@n1), xx.1   | xx            | Client    | 13:12:01 | 3    | 0.67 %   | 74.00 %   |
>> | 2 | xx(@n2), xx.63  | xx            | Server    | 16:55:05 | 4    | 6.57 %   | 84.00 %   |
>> +---+-----------------+---------------+-----------+----------+------+----------+-----------+
>>
>> After reboot
>>
>> +===+=================+===============+===========+==========+======+==========+===========+
>> | # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
>> +===+=================+===============+===========+==========+======+==========+===========+
>> | 0 | xx(@n0), xx.69  | xx            | Server    | 21:13:45 | 4    | 0.77 %   | 56.00 %   |
>> | 1 | xx(@n1), xx.1   | xx            | Client    | 14:00:17 | 3    | 0.77 %   | 56.00 %   |
>> | 2 | xx(@n2), xx.63  | xx            | Server    | 17:43:20 | 4    | 1.00 %   | 60.00 %   |
>> | 3 | xx(@n3), xx.65  | xx            | Client    | 01:42:45 | 4    | 4.10 %   | 56.00 %   |
>> | 4 | xx(@n4), xx.65  | xx            | Client    | 01:42:45 | 4    | 3.93 %   | 56.00 %   |
>> | 5 | xx(@n5), xx.1   | xx            | Client    | 16:59:53 | 2    | 0.67 %   | 91.00 %   |
>> | 6 | xx(@n6), xx.79  | xx            | Server    | 00:41:31 | 4    | 1.00 %   | 97.00 %   |
>> +---+-----------------+---------------+-----------+----------+------+----------+-----------+
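[For reference, a small sketch of the REST checks described in the quoted message, assuming the ignite-rest-http module is enabled on its default port 8080 and a hypothetical cache name; requires Java 11+ for java.net.http:]

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestStateCheck {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Run these against each node in turn; diverging size results,
        // as in this thread, hint that the nodes disagree about the topology.
        String[] urls = {
            "http://localhost:8080/ignite?cmd=currentState",
            "http://localhost:8080/ignite?cmd=size&cacheName=myCache" // hypothetical cache name
        };

        for (String url : urls) {
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(url + " -> " + resp.body());
        }
    }
}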
Re: Deploying the Ignite Maven Project in LINUX
Thanks for sharing the link. It is really helpful.