Re: Sort nodes in the ring in order to minimize the number of reconnections

Denis Magda Fri, 23 Dec 2016 12:56:14 -0800

Alexander,

This is something different and looks unrelated to the discussion we have over 
here.


A transaction will not be rolled back the way you’re describing. It will be 
either committed once or rolled back once. There can and will be inter-node 
communication when something fails at the commit phase, but this depends on how 
the affinity function distributes the keys and partitions, not on how the nodes 
are connected at the discovery SPI layer.
 
You can learn more about how the two-phase commit protocol handles failures here:
http://gridgain.blogspot.com/2014/09/two-phase-commit-for-in-memory-caches.html

—
Denis

> On Dec 23, 2016, at 12:24 PM, Александр Меньшиков <sharple...@gmail.com> 
> wrote:
> 
> I am in fact worried about the following situation:
> 
> As I said, we have the ring A->F->B->E->C->D->A, and the connection between
> A,B,C and D,E,F has been broken. But the nodes will not all detect that the
> other nodes are unavailable at the same time. Meanwhile, the client will
> perform transactional operations. Transactions may be rolled back many times
> in the following sequence of events:
> 
> 0. Everything is fine: A->F->B->E->C->D->A.
> 1. Connection between A,B,C and D,E,F is broken.
> 2. "A" sees that "F" has fallen out of the topology and reconnects to "B";
> all transactions using "F" are rolled back and restarted with a backup node
> ("B", for example).
> 3. After that, "B" sees that "E" has fallen out of the topology and
> reconnects to "C"; all transactions using "E" are rolled back and restarted
> with a backup node ("C", for example).
> 4. After that, "C" sees that "D" has fallen out of the topology and
> reconnects to "A"; all transactions using "D" are rolled back and restarted
> with a backup node ("A", for example).
> 
> And we get 3 different sets of rollbacks instead of one set of rollbacks.
> 
> 2016-12-23 22:43 GMT+03:00 Valentin Kulichenko <
> valentin.kuliche...@gmail.com>:
> 
>> Hi Vyacheslav,
>> 
>> Discovery logic is encapsulated in TcpDiscoverySpi.
>> TcpDiscoveryMulticastIpFinder is one of many implementations of IP finder.
>> The only purpose of the IP finder is to provide a list of addresses where a
>> node can send its initial join request, and the fact that it sends this
>> initial request to node A doesn't actually mean that it will be connected to
>> A within the ring. Having said that, I doubt that the IP finder will be
>> affected at all if the discussed change is implemented.
>> 
>> Discovery protocol already maintains consistent information about the ring,
>> so any node in topology already knows everything about other nodes,
>> including ordering in the ring. So on discovery level it should not be very
>> difficult to customize where a joining node is placed on the ring.
>> 
>> However, here is the concern I have. Currently, when a new node joins, the
>> coordinator assigns an order number to this node (e.g. if we already have
>> nodes 1, 2 and 3, the new node will have order 4). This node will then be the
>> last one on the ring, i.e. nodes are always ordered in the ring by this
>> order number (1->2->3->4->1). If we change this, we will basically allow a
>> node to be placed anywhere else (something like 1->2->4->3->1). I'm not 100%
>> sure this is going to cause issues, but it sounds dangerous.
>> 
>> Yakov, can you please chime in and share your thoughts on this?
>> 
>> -Val
>> 
>> On Fri, Dec 23, 2016 at 2:46 AM, Vyacheslav Daradur <daradu...@gmail.com>
>> wrote:
>> 
>>> Thanks for the reply.
>>> 
>>> I have some questions:
>>> 
>>> 1. Where is the logic of Ignite cluster building implemented? DiscoverySpi
>>> and TcpDiscoveryMulticastIpFinder?
>>> 
>>> 2. Which standard Ignite metrics would you recommend using for
>>> node-ordering?
>>> 
>>> 2016-12-22 19:08 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>>> 
>>>> I think having some user-defined ordering can be beneficial. However, we
>>>> are only talking about node discovery protocol here to maintain the
>>>> cluster. All other communication between nodes happens directly (does not
>>>> go through the ring).
>>>> 
>>>> D.
>>>> 
>>>> On Thu, Dec 22, 2016 at 6:32 AM, Vyacheslav Daradur <daradu...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hello, Alex!
>>>>> 
>>>>> I think it is a great idea.
>>>>> 
>>>>> I suggest building the connections between nodes based on weight (or
>>>>> priority).
>>>>> 
>>>>> For example, ordering on latency:
>>>>> - nodes on one host = 1
>>>>> - nodes in one rack-blade = 2
>>>>> - nodes in one server-rack = 3
>>>>> - nodes in one physical cluster = 4
>>>>> - nodes in one subnet = 5
>>>>> - etc.
>>>>> 
>>>>> Maybe it would be better to use some metrics from the ClusterMetrics
>>>>> interface.
>>>>> 
>>>>> The ordering algorithm could be implemented in a class such as a
>>>>> Comparator and used when we build a cluster or select a place for a new
>>>>> node.
>>>>> 
>>>>> --
>>>>> With best regards,
>>>>> Vyacheslav Daradur
>>>>> 
>>>>> 2016-12-22 13:59 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>>>>> 
>>>>>> Hello everyone,
>>>>>> 
>>>>>> As far as I know, nodes are connected in a ring. For example, if I have
>>>>>> 6 nodes named A, B, C, D, E, and F, they can connect in a ring in any
>>>>>> possible way: A-B-C-D-E-F-A, or A-F-B-E-C-D-A, etc. If some node falls
>>>>>> out of the topology, the neighboring nodes must reconnect. If nodes A, B
>>>>>> and C are located in one physical location and D, E and F in another,
>>>>>> and at some point one location becomes unreachable from the other, we
>>>>>> can get a different number of reconnections. The best case is a ring
>>>>>> like A-B-CxD-E-FxA ('x' means disconnect): then we get only one
>>>>>> reconnect (C reconnects to A, or F reconnects to D, depending on which
>>>>>> part of the cluster we leave alive). But currently the case
>>>>>> AxFxBxExCxDxA is also possible: then we get a lot of reconnections
>>>>>> (A to B, B to C, C to A; in general n/2 reconnections, where n is the
>>>>>> number of nodes). So I propose adding something to ensure that we always
>>>>>> have a good ordering of node connections (A-B-C-...-Z-A).
>>>>>> 
>>>>>> Of course, in the real world we can have multiple levels of physical
>>>>>> closeness.
>>>>>> 
>>>>>> In my opinion, it is enough to add one 'int' parameter to the
>>>>>> configuration (with a name like 'ExtraNodeOrder') and to change the node
>>>>>> comparison method so that it first compares 'ExtraNodeOrder' and then
>>>>>> falls back to the old criterion (as far as I know, Ignite uses the
>>>>>> topology version). So if a user has multiple levels of physical
>>>>>> closeness, they can use different bits. For example, use the high 16
>>>>>> bits for the DC number and the low 16 bits for racks.
>>>>>> 
>>>>>> Alternatively, we can add an array of 'int' to the configuration and
>>>>>> compare nodes element by element, from the zero element to the last.
>>>>>> 
>>>>> 
>>>> 
>>> 
>>
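
The reconnection counts in the original message at the bottom of the thread
(one reconnect for A-B-C-D-E-F-A, n/2 for A-F-B-E-C-D-A) can be checked with
a small model. This is only a sketch under simple assumptions: the ring is a
list of node names, each node connects to the next one in the list, and a
surviving node has to reconnect when its next node is on the lost side. The
class and method names (RingSplitSketch, reconnects) are hypothetical and
are not part of Ignite.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a discovery ring; not Ignite code. */
final class RingSplitSketch {
    /**
     * Counts surviving nodes whose next node in the ring is on the lost side.
     * Each such node has to reconnect to another surviving node.
     */
    static int reconnects(List<String> ring, Set<String> survivors) {
        int count = 0;
        for (int i = 0; i < ring.size(); i++) {
            String node = ring.get(i);
            // The last node in the list connects back to the first one.
            String next = ring.get((i + 1) % ring.size());
            if (survivors.contains(node) && !survivors.contains(next)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> survivors = new HashSet<>(Arrays.asList("A", "B", "C"));
        List<String> sorted = Arrays.asList("A", "B", "C", "D", "E", "F");
        List<String> interleaved = Arrays.asList("A", "F", "B", "E", "C", "D");
        System.out.println(reconnects(sorted, survivors));      // prints 1
        System.out.println(reconnects(interleaved, survivors)); // prints 3
    }
}
```

With A, B and C surviving, the sorted ring needs one reconnect (C to A) and
the interleaved ring needs three (A to B, B to C, C to A), which matches the
n/2 figure for n = 6.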

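The 'ExtraNodeOrder' proposal from the original message can be sketched as a
comparator that looks at the user-supplied key first and falls back to the
existing order number. This is an illustration only: Node, extraOrder and
RING_ORDER are hypothetical names, the 'order' field stands in for whatever
criterion discovery uses today, and the unsigned comparison is an assumption
made so that DC numbers above 32767 still sort after smaller ones.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the proposed ordering; not Ignite code. */
final class RingOrderSketch {
    /** Minimal stand-in for a cluster node. */
    static final class Node {
        final String name;
        final int extraNodeOrder; // proposed user-supplied key
        final long order;         // existing order number (join order)

        Node(String name, int extraNodeOrder, long order) {
            this.name = name;
            this.extraNodeOrder = extraNodeOrder;
            this.order = order;
        }
    }

    /** Packs the DC number into the high 16 bits and the rack into the low 16. */
    static int extraOrder(int dc, int rack) {
        return ((dc & 0xFFFF) << 16) | (rack & 0xFFFF);
    }

    /** Compares by the extra key first, then by the existing order number. */
    static final Comparator<Node> RING_ORDER = (a, b) -> {
        int c = Integer.compareUnsigned(a.extraNodeOrder, b.extraNodeOrder);
        return c != 0 ? c : Long.compare(a.order, b.order);
    };

    /** Returns node names in ring order without modifying the input list. */
    static List<String> ring(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(RING_ORDER);
        List<String> names = new ArrayList<>();
        for (Node n : sorted) {
            names.add(n.name);
        }
        return names;
    }
}
```

For nodes that joined in the order A, F, B, E, C, D, with A, B, C in DC 1
and D, E, F in DC 2, ring(...) yields A, B, C, F, E, D: each location stays
contiguous, and join order is preserved within a location.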