Of course, I meant that ticket [1] increased cluster stability in situations with a blinking network.
[1] https://issues.apache.org/jira/browse/IGNITE-7163

On Mon, Jun 8, 2020 at 1:51 PM Sergey Chugunov <sergey.chugu...@gmail.com> wrote:

> Vladimir,
>
> Adding to what Alexey has said, I remember that cases of short-term network
> issues (blinking network) were also a driver for this improvement. They are
> indeed hard to reproduce but have been seen in real-world setups and have
> proven to increase cluster stability.
>
> On Sat, Jun 6, 2020 at 5:09 PM Denis Magda <dma...@apache.org> wrote:
>
>> Finally, I got your question.
>>
>> Back in 2017-2018, there was a Discovery SPI stabilization activity. The
>> networking component could fail in various hard-to-reproduce scenarios,
>> affecting cluster availability and consistency. That ticket reminds me of
>> those notorious issues that would fire once a week or month under specific
>> configuration settings. So, I would not touch the code that fixes the issue
>> unless @Alexey Goncharuk <alexey.goncha...@gmail.com> or @Sergey Chugunov
>> <schugu...@gridgain.com> confirms that it's safe to do. Also, there should
>> be a test for this scenario.
>>
>> -
>> Denis
>>
>> On Fri, Jun 5, 2020 at 12:28 AM Vladimir Steshin <vlads...@gmail.com> wrote:
>>
>> > Denis,
>> >
>> > I have no nodes that are unable to interconnect. This case is simulated
>> > in IgniteDiscoveryMassiveNodeFailTest.testMassiveFailSelfKill(),
>> > introduced in [1].
>> >
>> > I'm asking whether it is a real or a hypothetical problem. Where was it
>> > encountered? Which network configurations or issues could cause it?
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-7163
>> >
>> > 05.06.2020 1:01, Denis Magda wrote:
>> > > Vladimir,
>> > >
>> > > I'm suggesting that you share the log files from the nodes that are
>> > > unable to interconnect so that the community can check them for
>> > > potential issues. Instead of sharing the logs from all 5 nodes, try to
>> > > start a two-node cluster with the nodes that fail to discover each
>> > > other and attach the logs from those.
>> > >
>> > > -
>> > > Denis
>> > >
>> > > On Thu, Jun 4, 2020 at 1:57 PM Vladimir Steshin <vlads...@gmail.com> wrote:
>> > >
>> > >> Denis, hi.
>> > >>
>> > >> Sorry, I didn't catch your idea. Are you saying this can happen and
>> > >> suggesting an experiment? I'm not describing a hypothetical case. It
>> > >> is already done in [1]. I'm asking whether it is real and where it was
>> > >> encountered.
>> > >>
>> > >> 04.06.2020 23:33, Denis Magda wrote:
>> > >>> Vladimir,
>> > >>>
>> > >>> Please do the following experiment. Start a 2-node cluster booting
>> > >>> node 3 and, for instance, node 5. Those won't be able to interconnect
>> > >>> according to your description. Attach the log files from both nodes
>> > >>> for analysis. This should be a networking issue.
>> > >>>
>> > >>> -
>> > >>> Denis
>> > >>>
>> > >>> On Thu, Jun 4, 2020 at 1:24 PM Vladimir Steshin <vlads...@gmail.com> wrote:
>> > >>>> Hi, Igniters.
>> > >>>>
>> > >>>> I wanted to ask how one node may not be able to connect to another
>> > >>>> whereas the rest of the cluster can. This got covered in [1]. In
>> > >>>> short: node 3 can't connect to nodes 4 and 5 but can to 1. At the
>> > >>>> same time, node 2 can connect to 4. Questions:
>> > >>>>
>> > >>>> 1) Is it a real case? Where did this problem come from?
>> > >>>>
>> > >>>> 2) If node 3 can't connect to 4 and 5, does it mean node 2 can't
>> > >>>> connect to 4 (and 5) too?
>> > >>>>
>> > >>>> Sergey, Dmitry, maybe you can shed some light (I see you in [1])? I'm
>> > >>>> participating in [2] and found this backward connection checking.
>> > >>>> Answering would help us a lot.
>> > >>>>
>> > >>>> Thanks!
>> > >>>>
>> > >>>> [1] https://issues.apache.org/jira/browse/IGNITE-7163
>> > >>>>
>> > >>>> [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up