Hi Asitha,

Yes, we have to go for something like a lock in the DB. We can easily do that in an RDBMS, but how can we do that in Cassandra? I found this page [1], but it does not look very promising.
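One common way to approximate a DB lock without row locking is a lease: a single row that names the current owner and expires unless it is renewed. The sketch below only illustrates that idea and is not MB code. `LeaseTable` is a hypothetical in-memory stand-in for a one-row lock table, and `try_acquire` plays the role of an atomic conditional write. In Cassandra 2.0 or later this could presumably be expressed as a lightweight transaction such as `INSERT ... IF NOT EXISTS USING TTL <n>`, but whether that suits the deployed Cassandra version and its consistency settings is an assumption that would need checking.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Lease:
    owner: str
    expires_at: float


class LeaseTable:
    """Hypothetical in-memory stand-in for a one-row-per-lock DB table.

    In a real database the check-and-write inside try_acquire would have to
    be a single atomic conditional write; here it is plain Python and is
    only safe within one process.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._rows: Dict[str, Lease] = {}

    def try_acquire(self, lock_name: str, node_id: str, ttl_seconds: float) -> bool:
        """Take or renew the lease; fail if another node holds a live lease."""
        now = self._clock()
        current = self._rows.get(lock_name)
        if current is not None and current.expires_at > now and current.owner != node_id:
            return False
        self._rows[lock_name] = Lease(node_id, now + ttl_seconds)
        return True

    def holder(self, lock_name: str) -> Optional[str]:
        """Return the node holding a live lease, or None if it has expired."""
        lease = self._rows.get(lock_name)
        if lease is None or lease.expires_at <= self._clock():
            return None
        return lease.owner
```

With something like this, a node would act as slot coordinator only while `try_acquire("slot-coordinator", node_id, ttl)` keeps succeeding, and it would stop as soon as a renewal fails. A node that loses DB access cannot renew, so its lease lapses and a node on the other side of the partition can take over.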
[1] http://wiki.apache.org/cassandra/Locking

On Mon, Jun 8, 2015 at 12:29 PM, Asitha Nanayakkara <asi...@wso2.com> wrote:

> Hi Asanka,
>
> Adding dev@
>
> On Mon, Jun 8, 2015 at 12:04 PM, Asanka Abeyweera <asank...@wso2.com>
> wrote:
>
>> Hi all,
>>
>> How are we going to handle the following case with Hazelcast?
>>
>> Assume we had an 8-node MB cluster and, due to a network failure, the
>> cluster divided into two partitions with 4 nodes each. Now each partition
>> has its own Hazelcast cluster, but both partitions point to a single DB.
>> Since the slot manager uses a range to define a slot, a slot can include
>> messages from the other partition's publishers. One side effect of this is
>> message duplication, which should not happen with queues. Another is
>> message content being removed by the other partition before delivery.
>> There can be some other complications too.
>>
>> WDYT?
>>
>
> Yes, I agree with you on this. In my opinion, in a Hazelcast partitioned
> scenario we can't take decisions depending on Hazelcast. What matters here
> is DB access. If we can have some sort of lock for the slot coordinator in
> terms of the database, then we might be able to get away with most of the
> complexities involved. I've talked about this in another mail thread as
> well [1]. If there is no DB access, there is nothing that a slot
> coordinator can do anyway.
>
> Since DB access is vital for the slot coordinator, we might be better off
> using a database-specific locking mechanism at all times without depending
> on Hazelcast. WDYT?
>
> [1] [MB] Hazelcast coordinator issue after cluster partitioning
>
> Thanks,
> Asitha
>
>
>>
>>
>> On Mon, Jun 1, 2015 at 11:20 AM, Asitha Nanayakkara <asi...@wso2.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> What if we use the first member of the Hazelcast node list as coordinator
>>> (suggested by Hazelcast support)? When a member leaves or joins, we
>>> evaluate the node list and check whether the node list's
>>> first member has changed.
>>> If that has changed, we fire a coordinator-changed
>>> event with the new coordinator details (this should be done in the kernel),
>>> and we write our coordinator logic depending on this event. The current slot
>>> coordinator might be notified that it is no longer the coordinator, so it
>>> can stop. The new slot coordinator can start. Others can update their
>>> coordinator details.
>>>
>>> IMHO, at all times, regardless of whether the cluster is partitioned or not,
>>> there should be only one slot coordinator.
>>>
>>> In a situation where each node has access to the DB (separate network card)
>>> but doesn't have access to coordination through Hazelcast (malfunctioning
>>> network card), there will be a cluster partition, and multiple slot
>>> coordinators will operate. If there are publishers and subscribers for the
>>> same queue on each partition, messages will be duplicated and each slot
>>> coordinator will deliver messages from overlapping slots on its own.
>>>
>>> My point here is, in a partition scenario, if we have DB access from all
>>> partitions, having multiple slot coordinators will be problematic. All these
>>> options assume Thrift is working without any issue. If Thrift is not
>>> working between partitions, then having a single slot coordinator will
>>> starve subscribers in other partitions.
>>>
>>> So we have four communication links we need to look at:
>>>
>>> - Database link
>>> - Coordination link
>>> - Thrift link
>>> - Publisher/subscriber link (AMQP and MQTT ports)
>>>
>>> We need to analyze the impact of losing these links in any combination.
>>> I may be totally or partially wrong on this.
>>>
>>> Thanks
>>> Asitha
>>>
>>> On Mon, Jun 1, 2015 at 9:32 AM, Asanka Abeyweera <asank...@wso2.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> When the two partitions connect again, can the cluster select a new
>>>> slot manager node (other than the ones already present in the two partitions)?
>>>> We might also have to understand how the Hazelcast lists and maps are
>>>> merged internally in these scenarios to fully answer this.
>>>>
>>>> On Sun, May 31, 2015 at 8:08 PM, Ramith Jayasinghe <ram...@wso2.com>
>>>> wrote:
>>>>
>>>>> Well, I'm not actually asking to implement this. BUT we absolutely have
>>>>> to have a reconciliation model, otherwise we are screwed.
>>>>>
>>>>>
>>>>> On Sun, May 31, 2015 at 7:28 PM, Hasitha Hiranya <hasit...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We need to merge all operating lists (fresh slots/assigned
>>>>>> slots/overlapped slots/returned slots) from the two slot managers together.
>>>>>>
>>>>>> If we meet a conflict during merging (same slot assigned to different
>>>>>> nodes), we should give a BIG warning, and maybe continue. At that point
>>>>>> we cannot do anything from the slot manager side; individual nodes will be
>>>>>> delivering the same message.
>>>>>>
>>>>>> Otherwise we need to introduce some abortImmediately method, which
>>>>>> is hectic.
>>>>>>
>>>>>> So, yeah, Ramith's proposal looks simple enough. When partitions are
>>>>>> merged, allow the bigger partition to continue, and do not allow any new slot
>>>>>> assignments to nodes which are not in that partition; rather, log a BIG
>>>>>> message: this node is useless and not in a cluster. Please restart.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Sun, May 31, 2015 at 7:45 AM, Pamod Sylvester <pa...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> In this case, we might need to sort the messages lying in a
>>>>>>> queue or a durable subscription for message ordering, i.e. by
>>>>>>> maintaining a timestamp etc.
>>>>>>>
>>>>>>>
>>>>>>> On Sunday, May 31, 2015, Ramith Jayasinghe <ram...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Suppose there are two network partitions:
>>>>>>>> P1, P2 where nodecount(P1) >= nodecount(P2)
>>>>>>>>
>>>>>>>> def: nodecount: number of broker nodes in the partition.
>>>>>>>>
>>>>>>>> So two brokers will operate on their own during the partition
>>>>>>>> (with their own coordinator, which is bad) -> we need to find/observe
>>>>>>>> what the exact behavior is:
>>>>>>>>
>>>>>>>> 1) how are slots being used?
>>>>>>>> 2) will this leave stale messages in the DB?
>>>>>>>> 3) will there be duplicates (which at this point is preferable to
>>>>>>>> losing messages)?
>>>>>>>>
>>>>>>>> And the biggest problem we want to solve is: what are we going to do when
>>>>>>>> the partitions are merged?
>>>>>>>> My proposal is:
>>>>>>>> The partition which has the biggest node count (max(nodecount(P1),
>>>>>>>> nodecount(P2))) continues to operate,
>>>>>>>> and all other nodes have to be restarted (by the user) if nodecount(P2) > 2.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ramith Jayasinghe
>>>>>>>> Technical Lead
>>>>>>>> WSO2 Inc., http://wso2.com
>>>>>>>> lean.enterprise.middleware
>>>>>>>>
>>>>>>>> E: ram...@wso2.com
>>>>>>>> P: +94 777542851
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Pamod Sylvester*
>>>>>>>
>>>>>>> *WSO2 Inc.; http://wso2.com*
>>>>>>> cell: +94 77 7779495
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Hasitha Abeykoon*
>>>>>> Senior Software Engineer; WSO2, Inc.; http://wso2.com
>>>>>> *cell:* +94 719363063
>>>>>> *blog:* abeykoon.blogspot.com <http://abeykoon.blogspot.com>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ramith Jayasinghe
>>>>> Technical Lead
>>>>> WSO2 Inc., http://wso2.com
>>>>> lean.enterprise.middleware
>>>>>
>>>>> E: ram...@wso2.com
>>>>> P: +94 777542851
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Asanka Abeyweera
>>>> Software Engineer
>>>> WSO2 Inc.
>>>>
>>>> Phone: +94 712228648
>>>> Blog: a5anka.github.io
>>>>
>>>
>>>
>>>
>>> --
>>> *Asitha Nanayakkara*
>>> Software Engineer
>>> WSO2, Inc. http://wso2.com/
>>> Mob: + 94 77 85 30 682
>>>
>>>
>>
>>
>> --
>> Asanka Abeyweera
>> Software Engineer
>> WSO2 Inc.
>>
>> Phone: +94 712228648
>> Blog: a5anka.github.io
>>
>
>
>
> --
> *Asitha Nanayakkara*
> Software Engineer
> WSO2, Inc. http://wso2.com/
> Mob: + 94 77 85 30 682
>
>

--
Asanka Abeyweera
Software Engineer
WSO2 Inc.

Phone: +94 712228648
Blog: a5anka.github.io
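
As a rough illustration of the merge check discussed in the quoted thread (same or overlapping slots assigned to different nodes after the two partitions rejoin), here is a minimal sketch. It assumes a slot is just an inclusive message-ID range plus the node it was assigned to; the `Slot` type and `find_conflicts` function are hypothetical names for illustration, not part of the MB slot manager.

```python
from typing import List, NamedTuple, Tuple


class Slot(NamedTuple):
    start: int  # first message ID in the slot (inclusive)
    end: int    # last message ID in the slot (inclusive)
    node: str   # node the slot was assigned to


def find_conflicts(p1_slots: List[Slot], p2_slots: List[Slot]) -> List[Tuple[Slot, Slot]]:
    """Return pairs of overlapping slots that are assigned to different nodes.

    The assigned-slot lists of both slot managers are combined and sorted by
    start ID, so each slot only needs to be compared with the slots that
    begin before it ends.
    """
    ordered = sorted(p1_slots + p2_slots)
    conflicts = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start > first.end:
                break  # later slots start even further right; no overlap
            if second.node != first.node:
                conflicts.append((first, second))
    return conflicts


if __name__ == "__main__":
    partition_1 = [Slot(1, 100, "node-1"), Slot(101, 200, "node-2")]
    partition_2 = [Slot(50, 150, "node-5"), Slot(201, 300, "node-6")]
    for a, b in find_conflicts(partition_1, partition_2):
        # Mirrors the "BIG warning" idea: report the conflict and carry on.
        print(f"WARNING: slot {a.start}-{a.end} on {a.node} overlaps "
              f"slot {b.start}-{b.end} on {b.node}")
```

Each reported pair is a range of messages that two nodes may both deliver, which is where the duplicates mentioned in the thread would come from. The sketch only detects conflicts and does not decide which assignment should win.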
_______________________________________________ Dev mailing list Dev@wso2.org http://wso2.org/cgi-bin/mailman/listinfo/dev