Re: [Dev] about network partitions

Asanka Abeyweera Mon, 08 Jun 2015 00:23:27 -0700

Hi Asitha,

Yes. We have to go for something like a lock in the DB. We can easily do that
in an RDBMS. But how can we do that in Cassandra? I found this page [1], but
it does not look very promising.


[1] http://wiki.apache.org/cassandra/Locking
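
A minimal sketch of the RDBMS side of this idea, assuming a lease-style lock
row that the slot coordinator must hold and renew. The table name
coordinator_lock, the lock name slot_coordinator, the function names and the
TTL are all hypothetical, and sqlite3 only stands in for a real RDBMS. It does
not answer the Cassandra question; there the same compare-and-set step would
presumably need lightweight transactions (INSERT ... IF NOT EXISTS), which is
an assumption and not something verified in this thread.

```python
import sqlite3


def create_lock_table(conn):
    # One row per named lock; the primary key guarantees a single owner row.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS coordinator_lock ("
        " name TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    conn.commit()


def try_acquire(conn, node_id, now, ttl=10.0, name="slot_coordinator"):
    """Return True if node_id holds the lease after this call.

    `now` is passed in explicitly so the behaviour is easy to reason about;
    a real deployment would need a clock all nodes agree on (assumption).
    """
    with conn:  # single transaction: commit on success, roll back on error
        # Renew our own lease, or take over a lease that has expired.
        conn.execute(
            "UPDATE coordinator_lock SET owner = ?, expires_at = ? "
            "WHERE name = ? AND (owner = ? OR expires_at <= ?)",
            (node_id, now + ttl, name, node_id, now),
        )
        # If no row exists yet, the first insert wins; later ones are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO coordinator_lock (name, owner, expires_at) "
            "VALUES (?, ?, ?)",
            (name, node_id, now + ttl),
        )
        row = conn.execute(
            "SELECT owner FROM coordinator_lock WHERE name = ?", (name,)
        ).fetchone()
        return row[0] == node_id
```

With something like this, a node would act as slot coordinator only while
try_acquire keeps returning True for it, regardless of what its Hazelcast
partition believes.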

On Mon, Jun 8, 2015 at 12:29 PM, Asitha Nanayakkara <asi...@wso2.com> wrote:

> Hi Asanka,
>
> Adding dev@
>
> On Mon, Jun 8, 2015 at 12:04 PM, Asanka Abeyweera <asank...@wso2.com>
> wrote:
>
>> Hi all,
>>
>> How are we going to handle the following case with Hazelcast?
>>
>> Assume we had an 8-node MB cluster and, due to a network failure, the cluster
>> divided into two partitions with 4 nodes each. Now each partition has its
>> own Hazelcast cluster, but both partitions point to a single DB.
>> Since the slot manager uses a range to define a slot, a slot can include
>> messages from the other partition's publishers. One side effect of this is
>> message duplication, which should not happen with queues. Another is
>> message content being removed by the other partition before delivery. There
>> can be some other complications too.
>>
>> WDYT?
>>
>
> Yes, I agree with you on this. In my opinion, in a Hazelcast partitioned
> scenario we can't make decisions depending on Hazelcast. What matters here
> is DB access. If we can have some sort of lock for the slot coordinator in
> terms of the database, then we might be able to get away with most of the
> complexities involved. I've talked about this in another mail thread as well
> [1]. If there is no DB access, there is nothing that a slot
> coordinator can do anyway.
>
> Since DB access is vital for the slot coordinator, we might be better off
> using a database-specific locking mechanism at all times without depending on
> Hazelcast. WDYT?
>
> [1] [MB] Hazelcast coordinator issue after cluster partitioning
>
> Thanks,
> Asitha
>
>
>>
>>
>> On Mon, Jun 1, 2015 at 11:20 AM, Asitha Nanayakkara <asi...@wso2.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> What if we use the first member of the Hazelcast node list as the
>>> coordinator (suggested by Hazelcast support)? When a member leaves or
>>> joins, we evaluate the node list and check whether the node list's
>>> first member has changed. If it has changed, we fire a coordinator-changed
>>> event with the new coordinator details (this should be done in the kernel),
>>> and we write our coordinator logic depending on this event. The current slot
>>> coordinator might learn that it is no longer the coordinator, so it can stop.
>>> The new slot coordinator can start. Others can update their coordinator details.
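
The first-member rule above can be sketched roughly as follows. This is only
an illustration in plain Python: CoordinatorTracker, membership_changed and
the on_change callback are hypothetical names, not Hazelcast or kernel APIs,
and the member list is assumed to arrive in the cluster's own ordering.

```python
def coordinator_of(members):
    """The first member of the ordered node list is the coordinator."""
    return members[0] if members else None


class CoordinatorTracker:
    """Fires on_change(old, new) only when the first member changes."""

    def __init__(self, members, on_change):
        self._on_change = on_change
        self._current = coordinator_of(members)

    @property
    def current(self):
        return self._current

    def membership_changed(self, members):
        # Called on every member-joined / member-left event (assumption).
        new = coordinator_of(members)
        if new != self._current:
            old, self._current = self._current, new
            self._on_change(old, new)
```

Note that after a split each partition evaluates its own node list, so each
side still ends up with its own first member; this rule alone does not give a
single coordinator across partitions.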
>>>
>>> IMHO at all times, regardless of whether the cluster is partitioned or not,
>>> there should be only one slot coordinator.
>>>
>>> In a situation where each node has access to the DB (separate network card)
>>> but doesn't have access to coordination through Hazelcast (malfunctioning
>>> network card), there will be a cluster partition, and multiple slot
>>> coordinators will operate. If there are publishers and subscribers for the
>>> same queue on each partition, messages will be duplicated, and each slot
>>> coordinator will deliver messages from overlapping slots on its own.
>>>
>>> My point here is that, in a partition scenario, if we have DB access from all
>>> partitions, having multiple slot coordinators will be problematic. All these
>>> options assume Thrift is working without any issue. If Thrift is not
>>> working between partitions, then having a single slot coordinator will
>>> starve subscribers in other partitions.
>>>
>>> So we have four communication links we need to look at:
>>>
>>>    - Database link
>>>    - Coordination link
>>>    - Thrift link
>>>    - Publisher/subscriber link (AMQP and MQTT ports)
>>>
>>> We need to analyze the impact of losing these links in any combination.
>>> I may be totally or partially wrong on this.
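
As a starting point for that analysis, the scenarios can simply be
enumerated: four links, each up or down, gives 2^4 = 16 combinations. The
sketch below only lists them; the link identifiers are made-up labels for the
four links named above, and the impact of each combination still has to be
worked out by hand.

```python
from itertools import product

# Hypothetical labels for the four links listed above.
LINKS = ("database", "coordination", "thrift", "publisher_subscriber")


def link_failure_scenarios(links=LINKS):
    """Yield one dict per scenario, mapping link name -> True (up) / False (down)."""
    for states in product((True, False), repeat=len(links)):
        yield dict(zip(links, states))


def describe(scenario):
    """Short human-readable label for a scenario."""
    down = [name for name, up in scenario.items() if not up]
    return "all links up" if not down else "down: " + ", ".join(down)


if __name__ == "__main__":
    for scenario in link_failure_scenarios():
        print(describe(scenario))
```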
>>>
>>> Thanks
>>> Asitha
>>>
>>> On Mon, Jun 1, 2015 at 9:32 AM, Asanka Abeyweera <asank...@wso2.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> When the two partitions connect again, can the cluster select a new
>>>> slot manager node (other than the ones already present in the two
>>>> partitions)? We might also have to understand how the Hazelcast lists and
>>>> maps are merged internally in these scenarios to fully answer this.
>>>>
>>>> On Sun, May 31, 2015 at 8:08 PM, Ramith Jayasinghe <ram...@wso2.com>
>>>> wrote:
>>>>
>>>>> Well, I'm not actually asking to implement this. BUT we absolutely have
>>>>> to have a reconciliation model, otherwise we are screwed.
>>>>>
>>>>>
>>>>> On Sun, May 31, 2015 at 7:28 PM, Hasitha Hiranya <hasit...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We need to merge all operating lists (fresh slots/assigned
>>>>>> slots/overlapped slots/returned slots) in the two slot managers together.
>>>>>>
>>>>>> If we meet a conflict during merging (same slot assigned to different
>>>>>> nodes), we should give a BIG warning, and maybe continue. At that point
>>>>>> we cannot do anything from the Slot Manager side; individual nodes will
>>>>>> be delivering the same message.
>>>>>>
>>>>>> Otherwise we need to introduce some abortImmediately method, which
>>>>>> is hectic.
>>>>>>
>>>>>> So, yeah, Ramith's proposal looks simple enough. When partitions are
>>>>>> merged, allow the bigger part to continue, and do not allow any new slot
>>>>>> assignments to nodes which are not in that partition; rather, put a BIG
>>>>>> log: this node is useless and not in a cluster. Please restart.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Sun, May 31, 2015 at 7:45 AM, Pamod Sylvester <pa...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> In this case, might we need to sort the messages lying in a
>>>>>>> queue or a durable subscription to preserve message ordering,
>>>>>>> i.e. by maintaining a timestamp etc.?
>>>>>>>
>>>>>>>
>>>>>>> On Sunday, May 31, 2015, Ramith Jayasinghe <ram...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Suppose there are two network partitions,
>>>>>>>>  P1 and P2, where nodecount(P1) >= nodecount(P2).
>>>>>>>>
>>>>>>>>  def: nodecount := number of broker nodes in the partition.
>>>>>>>>
>>>>>>>>  So the brokers in the two partitions will operate on their own during
>>>>>>>> the partition, each side with its own coordinator, which is bad. We need
>>>>>>>> to find/observe what the exact behavior is:
>>>>>>>>
>>>>>>>>  1) how are slots being used?
>>>>>>>>  2) will this leave stale messages in the DB?
>>>>>>>>  3) will there be duplicates (which is OK at this point, rather than
>>>>>>>> losing messages)?
>>>>>>>>
>>>>>>>> The biggest problem we want to solve is: what are we going to do when
>>>>>>>> partitions are merged?
>>>>>>>> My proposal is:
>>>>>>>>  The partition which has the biggest node count (max(nodecount(P1),
>>>>>>>> nodecount(P2))) continues to operate,
>>>>>>>> and all other nodes have to be restarted (by the user) if nodecount(P2) > 2.
>>>>>>>>
>>>>>>>> thoughts?
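
A rough sketch of the merge rule proposed above: the partition with the
biggest node count keeps running and every other node is flagged for a manual
restart. plan_merge and the node ids are hypothetical. The tie-break for
equal-sized partitions (keep the side holding the smallest node id) is an
added assumption, since the proposal does not say what happens when
nodecount(P1) == nodecount(P2), and the "nodecount(P2) > 2" condition is left
out because its intent is not clear from the mail.

```python
def plan_merge(partitions):
    """Decide which partition survives a merge.

    `partitions` is a list of non-empty lists of node ids.
    Returns (survivor, nodes_to_restart).
    """
    # Biggest node count wins; ties go to the partition that contains the
    # smallest node id (assumption, not part of the original proposal).
    survivor = min(partitions, key=lambda nodes: (-len(nodes), min(nodes)))
    nodes_to_restart = sorted(
        node
        for nodes in partitions
        if nodes is not survivor
        for node in nodes
    )
    return survivor, nodes_to_restart
```

For the 4 + 4 split discussed later in the thread, a deterministic tie-break
like this (or the DB lock) is needed, otherwise both halves could consider
themselves the survivor.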
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ramith Jayasinghe
>>>>>>>> Technical Lead
>>>>>>>> WSO2 Inc., http://wso2.com
>>>>>>>> lean.enterprise.middleware
>>>>>>>>
>>>>>>>> E: ram...@wso2.com
>>>>>>>> P: +94 777542851
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Pamod Sylvester *
>>>>>>>
>>>>>>> *WSO2 Inc.; http://wso2.com*
>>>>>>> cell: +94 77 7779495
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Hasitha Abeykoon*
>>>>>> Senior Software Engineer; WSO2, Inc.; http://wso2.com
>>>>>> *cell:* *+94 719363063*
>>>>>> *blog:* abeykoon.blogspot.com
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ramith Jayasinghe
>>>>> Technical Lead
>>>>> WSO2 Inc., http://wso2.com
>>>>> lean.enterprise.middleware
>>>>>
>>>>> E: ram...@wso2.com
>>>>> P: +94 777542851
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Asanka Abeyweera
>>>> Software Engineer
>>>> WSO2 Inc.
>>>>
>>>> Phone: +94 712228648
>>>> Blog: a5anka.github.io
>>>>
>>>
>>>
>>>
>>> --
>>> *Asitha Nanayakkara*
>>> Software Engineer
>>> WSO2, Inc. http://wso2.com/
>>> Mob: + 94 77 85 30 682
>>>
>>>
>>
>>
>> --
>> Asanka Abeyweera
>> Software Engineer
>> WSO2 Inc.
>>
>> Phone: +94 712228648
>> Blog: a5anka.github.io
>>
>
>
>
> --
> *Asitha Nanayakkara*
> Software Engineer
> WSO2, Inc. http://wso2.com/
> Mob: + 94 77 85 30 682
>
>


-- 
Asanka Abeyweera
Software Engineer
WSO2 Inc.

Phone: +94 712228648
Blog: a5anka.github.io

_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
