Also, could we make it optional for the happily_chugging_along_current_master to become a slave again? This sounds like an unnecessary disruption to the distributed system. I am assuming that somebody has a need for it.
The following use case sounds reasonable:
- current master fails
- slave becomes master
- the old_master may or may not come back up again (who cares? we may just
  empower another cheapo machine for the slave functionality)
- another machine becomes the new slave

The point being that, in the majority of cases, there should be no value in
coupling master/slave functionality to specific pieces of hardware.

Thanks
Regards
- Sridhar

On 3/8/06, Sridhar Komandur <[EMAIL PROTECTED]> wrote:
>
> I agree that doing bulk synch first and then enabling dynamic synch cuts
> down on complexity.
>
> Please see answers inline ...
>
> On 3/8/06, Rob Davies <[EMAIL PROTECTED]> wrote:
> >
> > Is not that simple - while the slave is syncing, there are also
> > running clients that are acknowledging messages (and hence they get
> > deleted).
>
> These new messages will be part of the dynamic sync. Note that the slave
> has the option of not acting on them right away.
>
> > We could record all the message exchanges (adds/deletes/new durable
> > subscribers/delete subscribers etc. etc.) - but is it really likely
> > that the slave will ever catch up without a pause?
> >
> > This type of synchronization gets very difficult very quickly. We
> > haven't even gone through edge cases (fail-over scenarios whilst the
> > master/slave are still syncing for example).
>
> I am not sure if it is fruitful to record and send every message to the
> slave. The slave is recreating the state of the master, so only
> summary/outstanding info at the master that is of relevance to recreating
> slave state should be sent. I am not yet familiar enough with activemq to
> delve into specific design details.
>
> > Which is why my preference is to pause processing whilst a bulk transfer
> > happens. In reality, as we prefer shared-nothing architectures, this
> > involves copying journal files and database files from one machine to
> > another - which can be done relatively quickly - so pausing the
> > clients won't be too onerous.
> >
> > cheers,
> >
> > Rob
>
> So, just to clarify, when bulk sync is happening what happens to the
> messages from producers (are they also blocked)? How about putting the new
> messages in a 'holding' area, while sending them to the slave - the slave is
> in a 'bulk synch' state and would do the same.
>
> As I mentioned, waiting for bulk transfer to finish simplifies design. It
> also gives us an opportunity to evaluate if the pain of pausing the clients
> is worth the additional complexity.
>
> Regards
> - Sridhar
>
> > I
> > On 8 Mar 2006, at 19:28, Sridhar Komandur wrote:
> >
> > > On 3/8/06, Ning Li <[EMAIL PROTECTED]> wrote:
> > >>
> > >> Bulk synch is a good idea. I think we can find a way to do it in the
> > >> current system, like creating a topic to which every incoming message
> > >> is sent; when the secondary comes up, it can pull those messages. Or
> > >> we can find other ways to do it.
> > >
> > > Yes, an internally created (persisted) queue at the primary
> > > to store stuff when the secondary is not in sight. When the
> > > secondary comes up it drains from that subject? Sounds like a good
> > > idea to me.
> > >
> > >> One difficulty is we cannot pause the primary broker; it is hard for
> > >> the secondary to catch up with both the historic and ongoing messages. I
> > >> think there is a timing issue in it. I guess that is why James
> > >> recommended pausing the primary broker.
> > >>
> > >> I am not sure if we can find a way to do both dynamic synch and bulk
> > >> synch at the same time in the current system; that would be great.
> > >
> > > It can be done - we need a notion of ordering among all the
> > > messages (coming from both dynamic as well as bulk synch). This
> > > ordering can be provided by the message arrival time stamp at the
> > > primary.
> > >
> > > Once we do this it is a matter of inserting the incoming messages
> > > (without worrying about the source) into the same target store. We can
> > > even have the bulk synch proceed in a lazy fashion - a background task
> > > at the primary (and possibly at the secondary) - for a couple of reasons:
> > > - latest messages are more relevant/important
> > > - latest messages could in fact be retransmissions of the old, so it is
> > >   ok to process the old messages later for recovery purposes
> > >
> > > Regards
> > > - Sridhar
> > >
> > >> Thanks.
> > >>
> > >> Ning
> > >> -----Original Message-----
> > >> From: [EMAIL PROTECTED] [mailto: [EMAIL PROTECTED]] On Behalf Of Sridhar Komandur
> > >> Sent: Wednesday, March 08, 2006 9:59 AM
> > >> To: activemq-dev@geronimo.apache.org
> > >> Subject: Re: improve master/slave topology
> > >>
> > >> I like the idea of broker-broker synchronization. One of the issues to
> > >> resolve is how reliable this synch activity needs to be. A transactional
> > >> approach is too heavyweight for the common case.
> > >>
> > >> I think a middle ground based on TCP may be good enough. We can divide
> > >> the synchronization into two phases:
> > >> - dynamic synch: messages are sent to the partner on an ongoing basis
> > >> - bulk synch: a new secondary comes up and its state needs to be
> > >>   brought up to par with the primary
> > >>
> > >> Thanks
> > >> Regards
> > >> - Sridhar
> > >>
> > >> On 3/6/06, Ning Li <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> This is a continued discussion about dynamically reintroducing the
> > >>> master after a failure; the original discussion is here:
> > >>>
> > >>> http://forums.activemq.org/posts/list/468.page#1653
> > >>>
> > >>> James's idea about pausing the slave and synchronizing the two DBs is
> > >>> better than stopping the slave and doing a manual sync.
> > >>> But I doubt this is acceptable to us, as in a real production
> > >>> environment we won't be able to pause the only message broker except
> > >>> for a really short interval (I guess it has to be less than one
> > >>> minute, otherwise the end user will notice it).
> > >>>
> > >>> Maybe a broker-broker synchronization protocol is the ultimate
> > >>> solution; we are just not sure how to get there. Any recommendations
> > >>> or suggestions?
> > >>>
> > >>> Thanks
> > >>>
> > >>> Ning
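
For illustration only, here is a minimal sketch of the ordering idea raised in this thread: messages arriving via dynamic synch and via a lazy bulk synch are inserted into the same target store, and the arrival time stamp assigned at the primary gives the order. This is not ActiveMQ code. The names `SyncMessage`, `SecondaryStore`, `arrival_ts` and `message_id` are hypothetical, and the sketch assumes every message carries a unique id so that a message delivered by both paths is stored once. Acknowledgements and deletes are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncMessage:
    # Hypothetical fields: the thread only says the primary's arrival
    # time stamp can provide the ordering.
    arrival_ts: int   # arrival time stamp assigned at the primary
    message_id: str   # assumed to be unique per message
    body: str


class SecondaryStore:
    """Hypothetical target store on the secondary (not an ActiveMQ class)."""

    def __init__(self):
        self._by_id = {}

    def insert(self, msg):
        # Source-agnostic and idempotent: a message seen via both
        # dynamic and bulk synch is kept only once.
        self._by_id.setdefault(msg.message_id, msg)

    def ordered(self):
        # Order by the primary's arrival time stamp; the id breaks ties.
        return sorted(self._by_id.values(),
                      key=lambda m: (m.arrival_ts, m.message_id))


store = SecondaryStore()

# Dynamic synch delivers the latest messages first ...
dynamic = [SyncMessage(105, "m5", "new order"),
           SyncMessage(106, "m6", "new payment")]
# ... while the lazy bulk synch delivers older state later, with overlap.
bulk = [SyncMessage(101, "m1", "old order"),
        SyncMessage(105, "m5", "new order")]

for msg in dynamic + bulk:
    store.insert(msg)

ids = [m.message_id for m in store.ordered()]
print(ids)
```

Because inserts ignore the source and are keyed by id, the bulk transfer can run as a background task in any interleaving with the dynamic stream and the store still ends up in arrival order.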