Of course, I meant that ticket [1] increased cluster stability in the presence of
a blinking network.

[1] https://issues.apache.org/jira/browse/IGNITE-7163

On Mon, Jun 8, 2020 at 1:51 PM Sergey Chugunov <sergey.chugu...@gmail.com> wrote:

> Vladimir,
>
> Adding to what Alexey has said, I remember that cases of short-term network
> issues (blinking network) were also a driver for this improvement. They are
> indeed hard to reproduce, but they have been seen in real-world setups, and
> the improvement has proven to increase cluster stability.
>
> On Sat, Jun 6, 2020 at 5:09 PM Denis Magda <dma...@apache.org> wrote:
>
>> Finally, I got your question.
>>
>> Back in 2017-2018, there was a Discovery SPI stabilization activity. The
>> networking component could fail in various hard-to-reproduce scenarios,
>> affecting cluster availability and consistency. That ticket reminds me of
>> those notorious issues that would fire once a week or month under specific
>> configuration settings. So, I would not touch the code that fixes the issue
>> unless @Alexey Goncharuk <alexey.goncha...@gmail.com> or @Sergey Chugunov
>> <schugu...@gridgain.com> confirms that it's safe to do so. Also, there
>> should be a test for this scenario.
>>
>> -
>> Denis
>>
>>
>> On Fri, Jun 5, 2020 at 12:28 AM Vladimir Steshin <vlads...@gmail.com> wrote:
>>
>> > Denis,
>> >
>> > I don't have any nodes that are unable to interconnect. This case is
>> > simulated in IgniteDiscoveryMassiveNodeFailTest.testMassiveFailSelfKill(),
>> > which was introduced in [1].
>> >
>> > I'm asking whether this is a real or a hypothetical problem. Where was it
>> > encountered? What network configuration or issues could cause it?
>> >
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-7163
>> >
>> > On 05.06.2020 1:01, Denis Magda wrote:
>> > > Vladimir,
>> > >
>> > > I'm suggesting that you share the log files from the nodes that are
>> > > unable to interconnect so that the community can check them for
>> > > potential issues. Instead of sharing the logs from all 5 nodes, try to
>> > > start a two-node cluster with the nodes that fail to discover each other
>> > > and attach the logs from those.
>> > >
>> > > -
>> > > Denis
>> > >
>> > >
>> > > On Thu, Jun 4, 2020 at 1:57 PM Vladimir Steshin <vlads...@gmail.com> wrote:
>> > >
>> > >> Denis, hi.
>> > >>
>> > >> Sorry, I didn't catch your idea. Are you saying this can happen and
>> > >> suggesting an experiment? I'm not describing a hypothetical case: it is
>> > >> already simulated in [1]. I'm asking whether it is real and where it was
>> > >> encountered.
>> > >>
>> > >>
>> > >> On 04.06.2020 23:33, Denis Magda wrote:
>> > >>> Vladimir,
>> > >>>
>> > >>> Please do the following experiment. Start a two-node cluster by booting
>> > >>> node 3 and, for instance, node 5. According to your description, those
>> > >>> won't be able to interconnect. Attach the log files from both nodes for
>> > >>> analysis. This should be a networking issue.
>> > >>>
>> > >>> -
>> > >>> Denis
>> > >>>
>> > >>>
>> > >>> On Thu, Jun 4, 2020 at 1:24 PM Vladimir Steshin <vlads...@gmail.com> wrote:
>> > >>>> Hi, Igniters.
>> > >>>>
>> > >>>> I wanted to ask how one node may be unable to connect to another
>> > >>>> while the rest of the cluster can. This is covered in [1]. In short:
>> > >>>> node 3 can't connect to nodes 4 and 5 but can connect to node 1. At
>> > >>>> the same time, node 2 can connect to node 4. Questions:
>> > >>>>
>> > >>>> 1) Is this a real case? Where did this problem come from?
>> > >>>>
>> > >>>> 2) If node 3 can't connect to nodes 4 and 5, does it mean that node 2
>> > >>>> can't connect to node 4 (and 5) either?
>> > >>>>
>> > >>>> Sergey, Dmitry, maybe you can shed some light on this (I see you in
>> > >>>> [1])? I'm participating in [2] and found this backward connection
>> > >>>> check. Answers would help us a lot.
>> > >>>>
>> > >>>> Thanks!
>> > >>>>
>> > >>>> [1] https://issues.apache.org/jira/browse/IGNITE-7163
>> > >>>>
>> > >>>> [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up
>> >
>>
>
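The connectivity described in the original question (node 3 cannot connect to nodes 4 and 5 but can connect to node 1, while node 2 can connect to node 4) can be modeled as a set of directed links. The following is a minimal sketch, assuming that every link not mentioned in the thread works; `build_links`, `direct` and `relay_path` are hypothetical helpers, not part of Ignite or TcpDiscoverySpi. It only illustrates that pairwise reachability is not transitive: a failed 3 -> 4 link says nothing about the 2 -> 4 link, and the cluster as a whole can stay connected through a relay node.

```python
from collections import deque

# Hypothetical five-node cluster from the question.
NODES = [1, 2, 3, 4, 5]


def build_links():
    """Return the directed links (a, b) that node a can open to node b.

    Assumption: every link is healthy except the failures reported
    in the thread (node 3 cannot reach nodes 4 and 5).
    """
    links = {(a, b) for a in NODES for b in NODES if a != b}
    links -= {(3, 4), (3, 5)}
    return links


def direct(links, a, b):
    """True if node a can connect to node b directly."""
    return (a, b) in links


def relay_path(links, src, dst):
    """Breadth-first search for a path src -> ... -> dst over working links.

    Returns the list of nodes on a shortest path, or None if dst is unreachable.
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in NODES:
            if (cur, nxt) in links and nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None


if __name__ == "__main__":
    links = build_links()
    print("3 -> 4 direct:", direct(links, 3, 4))
    print("2 -> 4 direct:", direct(links, 2, 4))
    print("3 -> 4 via relay:", relay_path(links, 3, 4))
```

This models reachability only; it does not reproduce how the discovery ring routes messages or how it reacts to a failed link.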
