Re: True master-master fail-over without data gaps (choosing CA in CAP)
Hi,

----- Original Message -----
> From: Jake Luciani
> To: solr-user@lucene.apache.org
> Sent: Wed, March 9, 2011 8:07:00 PM
> Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP)
>
> Yeah sure. Let me update this on the Solandra wiki. I'll send across the link.

Excellent. You could include ES there, too, if you feel extra adventurous. ;)

> I think you hit the main two shortcomings atm.

- Grandma, why are your eyes so big?
- To see you better.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Yeah sure. Let me update this on the Solandra wiki. I'll send across the link.

I think you hit the main two shortcomings atm.

-Jake

On Wed, Mar 9, 2011 at 6:17 PM, Otis Gospodnetic wrote:
> Maybe it's time to come up with the Solandra/Solr matrix so we can see
> Solandra's strengths (e.g. RT, no replication) and weaknesses (e.g. I think I
> saw a mention of some big indices?) or missing features (e.g. no delete by
> query), etc.
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Jake,

Maybe it's time to come up with the Solandra/Solr matrix so we can see
Solandra's strengths (e.g. RT, no replication) and weaknesses (e.g. I think I
saw a mention of some big indices?) or missing features (e.g. no delete by
query), etc.

Thanks!
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
> From: Jake Luciani
> Sent: Wed, March 9, 2011 6:04:13 PM
>
> Its predecessor, Lucandra, did. But Solandra is a new approach that manages
> shards of documents across the cluster for you and uses Solr's distributed
> search to query indexes.
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Jason,

Its predecessor, Lucandra, did. But Solandra is a new approach that manages
shards of documents across the cluster for you and uses Solr's distributed
search to query indexes.

Jake

On Mar 9, 2011, at 5:15 PM, Jason Rutherglen wrote:
> Doesn't Solandra partition by term instead of document?
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Doesn't Solandra partition by term instead of document?

On Wed, Mar 9, 2011 at 2:13 PM, Smiley, David W. wrote:
> I was just about to jump in this conversation to mention Solandra and, go
> fig, Solandra's committer comes in. :-) It was nice to meet you at Strata,
> Jake.
Re: True master-master fail-over without data gaps (choosing CA in CAP)
I was just about to jump in this conversation to mention Solandra and, go fig,
Solandra's committer comes in. :-) It was nice to meet you at Strata, Jake.

I haven't dug into the code yet, but Solandra strikes me as a killer way to
scale Solr. I'm looking forward to playing with it, particularly looking at
disk requirements and performance measurements.

~ David Smiley

On Mar 9, 2011, at 3:14 PM, Jake Luciani wrote:
> Have you considered using Solandra with Quorum writes
> to achieve master/master with CA semantics?
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Hi Otis,

Have you considered using Solandra with Quorum writes to achieve
master/master with CA semantics?

-Jake

On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic wrote:
> Yeah, "complete up to the last replication" is the problem. It's a data gap
> that now needs to be filled somehow.

--
http://twitter.com/tjake
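Quorum reads and writes get their consistency guarantee from simple overlap arithmetic: with replication factor N, a write acknowledged by W replicas and a read consulting R replicas must share at least one up-to-date replica whenever R + W > N. A minimal standalone sketch of that rule (plain Java, not the Solandra/Cassandra API):

```java
public class QuorumCheck {
    // A read is guaranteed to see the latest acknowledged write
    // exactly when the read and write replica sets must intersect.
    static boolean consistent(int n, int w, int r) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;              // replication factor
        int quorum = n / 2 + 1; // majority quorum: 2 of 3
        System.out.println(consistent(n, quorum, quorum)); // QUORUM/QUORUM -> true
        System.out.println(consistent(n, 1, 1));           // ONE/ONE -> false
    }
}
```

With QUORUM on both sides, losing one of three replicas still leaves both reads and writes available, which is the "master/master with CA semantics" Jake is pointing at (within a single, non-partitioned cluster).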
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Hi,

----- Original Message -----
> From: Robert Petersen
>
> Can't you skip the SAN and keep the indexes locally? Then you would
> have two redundant copies of the index and no lock issues.

I could, but then I'd have the issue of keeping them in sync, which seems more
fragile. I think SAN makes things simpler overall.

> Also, can't master02 just be a slave to master01 (in the master farm and
> separate from the slave farm) until such time as master01 fails?

No, because it wouldn't be in sync. It would always be N minutes behind, and
when the primary master fails, the secondary would not have all the docs -
data loss.

> master02 would start receiving the new documents with an index
> complete up to the last replication at least, and the other slaves would
> be directed by the LB to poll master02 also...

Yeah, "complete up to the last replication" is the problem. It's a data gap
that now needs to be filled somehow.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
RE: True master-master fail-over without data gaps (choosing CA in CAP)
Can't you skip the SAN and keep the indexes locally? Then you would have two redundant copies of the index and no lock issues.

Also, can't master02 just be a slave to master01 (in the master farm and separate from the slave farm) until such time as master01 fails? Then master02 would start receiving the new documents with an index complete at least up to the last replication, and the other slaves would be directed by the LB to poll master02 as well...

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, March 09, 2011 9:47 AM
To: solr-user@lucene.apache.org
Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP)

[Otis's message quoted here in full; trimmed, since it appears as its own message below.]
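The master02-as-slave idea above can be sketched with Solr's (1.4+) ReplicationHandler. This is a minimal slave-side fragment, not a tested setup; the `masterUrl` host and port are placeholders:

```xml
<!-- solrconfig.xml on master02, acting as a slave of master01 -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- placeholder URL; point at master01's replication handler -->
    <str name="masterUrl">http://master01:8983/solr/replication</str>
    <!-- hh:mm:ss between polls of master01 -->
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>
```

On fail-over the LB would start sending index requests to master02, whose polling of the dead master01 can then be stopped via the replication handler's `command=disablepoll` request.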
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Hi,

----- Original Message ----
> From: Walter Underwood
>
> On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
> > You mean it's not possible to have 2 masters that are in nearly real-time sync?
> > How about with DRBD? I know people use DRBD to keep 2 Hadoop NNs (their edit
> > logs) in sync to avoid the current NN SPOF, for example, so I'm thinking this
> > could be doable with Solr masters, too, no?
>
> If you add fault-tolerant, you run into the CAP Theorem. Consistency,
> availability, partition: choose two. You cannot have it all.

Right, so I'll take Consistency and Availability, and I'll put my 2 masters in the same rack (which has redundant switches, power supply, etc.) and thus minimize/avoid partitioning.

Assuming the above actually works, I think my Q remains: How do you set up 2 Solr masters so they are in near real-time sync? DRBD?

But here is maybe a simpler scenario that more people may be considering: Imagine 2 masters on 2 different servers in 1 rack, pointing to the same index on shared storage (SAN) that also happens to live in the same rack. The 2 Solr masters are behind 1 LB VIP that the indexer talks to. The VIP is configured so that all requests always get routed to the primary master (because only 1 master can be modifying an index at a time), except when this primary is down, in which case the requests are sent to the secondary master.

So in this case my Q is around automation of this, around Lucene index locks, around the need for manual intervention, and such. Concretely, if you have these 2 master instances, the primary master has the Lucene index lock in the index dir. When the secondary master needs to take over (i.e., when it starts receiving documents via the LB), it needs to be able to write to that same index. But what if that lock is still around? One could use the Native lock to make the lock disappear if the primary master's JVM exited unexpectedly, and in that case everything *should* work and be completely transparent, right? That is, the secondary will start getting new docs, it will use its IndexWriter to write to that same shared index, which won't be locked for writes because the lock is gone, and everyone will be happy. Did I miss something important here?

Assuming the above is correct, what if the lock is *not* gone because the primary master's JVM is actually not dead, although maybe unresponsive, so the LB thinks the primary master is dead? Then the LB will route indexing requests to the secondary master, which will attempt to write to the index, but be denied because of the lock. So a human needs to jump in, remove the lock, and manually reindex failed docs if the upstream component doesn't buffer docs that failed to get indexed and doesn't retry indexing them automatically. Is this correct, or is there a way to avoid humans here?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
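The native-lock reasoning above can be sketched without Lucene at all: Lucene's Native lock factory (NativeFSLockFactory) is built on java.nio file locks, which the OS releases when the holding process dies but keeps held while the process is merely hung. A minimal single-JVM sketch of that behavior using only the JDK (the class and file names here are illustrative, not from the thread):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteLockDemo {

    // Returns true if a second lock attempt is denied while the first is still held,
    // i.e. the "primary master is alive but unresponsive" scenario from the thread.
    public static boolean secondAttemptDenied() throws IOException {
        Path lockFile = Files.createTempFile("write", ".lock");
        try (FileChannel primary = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock held = primary.tryLock(); // the "primary master" takes the write lock
            try (FileChannel secondary = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
                // The "secondary master" tries the same lock. Across two processes
                // tryLock() would return null; within a single JVM it throws instead.
                FileLock second = secondary.tryLock();
                return second == null; // not reached in this single-JVM demo
            } catch (OverlappingFileLockException e) {
                return true; // denied: the primary still holds the lock
            } finally {
                held.release(); // on real JVM death the OS releases this automatically
            }
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("secondary denied: " + secondAttemptDenied());
    }
}
```

Across two real JVMs the second tryLock() returns null rather than throwing, but either way the secondary is denied as long as the primary process is alive and holding the lock, which is exactly the case where a human (or some fencing mechanism) has to step in.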