How to fix Ignite node segmentation without restart

Actarus Tue, 16 Jun 2020 06:26:18 -0700

Hello,

I'm running Apache Ignite (2.4.0) embedded in a Java application that runs
in a master/slave architecture. This means there are only ever two nodes in
the grid, with caches in REPLICATED mode and FULL_SYNC. Only the master
application writes to the grid; the slave reads from it only once it has
been promoted to master on a failover.


In such an architecture, network segmentation has different implications.
The usual way to handle segmentation, as far as I can see, is to restart the
node that experienced it. In this scenario, however, if the master is
segmented I do not want to restart it, and I cannot fail over either,
because a network issue has just happened and the standby may be invalid.
The fix is to always restart the slave.
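
To make the intended behaviour concrete, here is a tiny sketch of that rule.
Role, Action and SegmentationPlan are made-up names used only for
illustration; none of them is part of the Ignite API.

```java
// Illustrative only: these types are hypothetical, not Ignite classes.
public class SegmentationPlan {
    enum Role { MASTER, SLAVE }
    enum Action { KEEP_RUNNING, RESTART_NODE }

    // The master must survive segmentation untouched;
    // the slave is always the node that gets restarted.
    static Action onSegmented(Role role) {
        return role == Role.MASTER ? Action.KEEP_RUNNING : Action.RESTART_NODE;
    }

    public static void main(String[] args) {
        for (Role role : Role.values()) {
            System.out.println(role + " -> " + onSegmented(role));
        }
    }
}
```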

However, no matter what I try (handling the EVT_NODE_SEGMENTED event, adding
a SegmentationProcess, running with SegmentationPolicy.NOOP, and using a
segmentation plugin that always returns true/OK), the node running in the
master always remains in the segmented state, and it cannot rejoin a cluster
after the slave node is restarted.

Is there some mechanism I can use to tell the node within my master process
to completely ignore segmentation? Or to tell it that everything is fine, so
that discovery can still happen after I restart the slave node? Currently I
use port 4444 with TcpDiscoverySpi and hard-coded addresses (the master and
slave IP addresses). When the master node is segmented (which I simulate by
causing network issues from the command line), there appears to be no way
for discovery to recover: port 4444 is shut down, and the slave node always
comes up blind to the master.
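
For reference, the master-side setup described above corresponds roughly to
the sketch below. It only restates the configuration and does not solve the
problem. The class name and the two IP addresses are placeholders, and the
calls are written from the Ignite 2.x public API as commonly documented, so
they may need adjusting for 2.4.0.

```java
import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;
import org.apache.ignite.plugin.segmentation.SegmentationPolicy;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

// Hypothetical class name; stands in for the master application.
public class MasterNodeSketch {
    public static void main(String[] args) {
        // Placeholder addresses for the master and slave hosts.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList("192.0.2.10:4444", "192.0.2.11:4444"));

        // Discovery on the fixed port 4444 with the hard-coded address list.
        TcpDiscoverySpi discovery = new TcpDiscoverySpi();
        discovery.setLocalPort(4444);
        discovery.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discovery);
        // Ask Ignite to take no automatic action when this node is segmented.
        cfg.setSegmentationPolicy(SegmentationPolicy.NOOP);
        // Events are disabled by default, so enable the one being handled.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_SEGMENTED);

        Ignite ignite = Ignition.start(cfg);

        // Log the segmentation event; returning true keeps the listener registered.
        IgnitePredicate<Event> onSegmented = evt -> {
            System.out.println("Node segmented: " + evt);
            return true;
        };
        ignite.events().localListen(onSegmented, EventType.EVT_NODE_SEGMENTED);
    }
}
```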

I would appreciate any insights on this issue. Thank you.



