Re: True master-master fail-over without data gaps (choosing CA in CAP)
Hi,

----- Original Message -----
From: Jake Luciani jak...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, March 9, 2011 8:07:00 PM
Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP)

Yeah sure. Let me update this on the Solandra wiki. I'll send across the link.

Excellent. You could include ES there, too, if you feel extra adventurous. ;)
I think you hit the main two shortcomings atm.

- Grandma, why are your eyes so big?
- To see you better.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

-Jake

On Wed, Mar 9, 2011 at 6:17 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Jake,

Maybe it's time to come up with the Solandra/Solr matrix so we can see Solandra's strengths (e.g. RT, no replication) and weaknesses (e.g. I think I saw a mention of some big indices?) or missing features (e.g. no delete-by-query), etc.

Thanks!
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Jake Luciani jak...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, March 9, 2011 6:04:13 PM
Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP)

Jason,

Its predecessor, Lucandra, did. But Solandra is a new approach that manages shards of documents across the cluster for you and uses Solr's distributed search to query the indexes.

Jake

On Mar 9, 2011, at 5:15 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote:

Doesn't Solandra partition by term instead of document?

On Wed, Mar 9, 2011 at 2:13 PM, Smiley, David W. dsmi...@mitre.org wrote:

I was just about to jump into this conversation to mention Solandra and, go figure, Solandra's committer comes in. :-) It was nice to meet you at Strata, Jake. I haven't dug into the code yet, but Solandra strikes me as a killer way to scale Solr. I'm looking forward to playing with it; particularly looking at disk requirements and performance measurements.

~ David Smiley

On Mar 9, 2011, at 3:14 PM, Jake Luciani wrote:

Hi Otis,

Have you considered using Solandra with quorum writes to achieve master/master with CA semantics?

-Jake

On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Hi,

----- Original Message -----
From: Robert Petersen rober...@buy.com

Can't you skip the SAN and keep the indexes locally? Then you would have two redundant copies of the index and no lock issues.

I could, but then I'd have the issue of keeping them in sync, which seems more fragile. I think the SAN makes things simpler overall.

Also, can't master02 just be a slave to master01 (in the master farm and separate from the slave farm) until such time as master01 fails?

No, because it wouldn't be in sync. It would always be N minutes behind, and when the primary master fails, the secondary would not have all the docs - data loss.

Then master02 would start receiving the new documents with an index complete up to the last replication at least, and the other slaves would be directed by the LB to poll master02 also...

Yeah, "complete up to the last replication" is the problem. It's a data gap that now needs to be filled somehow.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, March 09, 2011 9:47 AM
To: solr-user@lucene.apache.org
Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP)

Hi,

----- Original Message -----
From: Walter Underwood wun...@wunderwood.org

On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:

You mean it's not possible to have 2 masters that are in nearly real-time sync? How about with DRBD? I know people use DRBD to keep 2 Hadoop NNs (their edit logs) in sync to avoid the current NN SPOF, for example, so I'm thinking this could be doable with Solr masters, too, no?

If you add fault-tolerance, you run into the CAP theorem. Consistency, availability, partition tolerance: choose two. You cannot have it all.

Right, so I'll take Consistency and Availability, and I'll put my 2 masters in the same rack (which has redundant switches, power supply
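Jake's quorum-write suggestion above can be illustrated abstractly. This is a hypothetical sketch of the semantics only, not Solandra's actual API; the `send` callables stand in for per-replica (Cassandra node) indexing calls:

```python
def quorum_write(doc, replicas, quorum):
    """Fan the doc out to every replica; the write succeeds only if at
    least `quorum` replicas acknowledge it (e.g. 2 of 3). Paired with
    quorum reads, this gives the master/master CA-style behavior
    discussed in the thread."""
    acks = 0
    for send in replicas:
        try:
            send(doc)        # hypothetical per-replica indexing call
            acks += 1
        except Exception:
            pass             # a down replica simply doesn't ack
    if acks < quorum:
        raise RuntimeError("quorum not met: %d/%d acks" % (acks, len(replicas)))
    return acks
```

With three replicas and quorum=2, losing any single node neither blocks writes nor creates a data gap, since every accepted write exists on at least two nodes.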
Re: True master-master fail-over without data gaps
Yes, I think this should be pushed upstream - insert a tee in the document stream so that all documents go to both masters. Then use a load balancer to make requests of the masters.

The tee itself then becomes a possible single point of failure, but you didn't say anything about the architecture of the document feed. Is that also fault-tolerant?

-Mike

On 3/9/2011 1:06 AM, Jonathan Rochkind wrote:

I'd honestly think about buffering the incoming documents in some store that's actually made for fail-over persistence reliability, maybe CouchDB or something. Then that takes care of not losing anything, and the problem becomes how we make sure that our Solr master indexes are kept in sync with the actual persistent store - which I'm still not sure about, but I'm thinking it's a simpler problem. The right tool for the right job; that kind of failover persistence is not Solr's specialty.

From: Otis Gospodnetic [otis_gospodne...@yahoo.com]
Sent: Tuesday, March 08, 2011 11:45 PM
To: solr-user@lucene.apache.org
Subject: True master-master fail-over without data gaps

Hello,

What are some common or good ways to handle indexing (master) fail-over? Imagine you have a continuous stream of incoming documents that you have to index without losing any of them (or losing as few of them as possible). How do you set up your masters?

In other words, you can't just have 2 masters where the secondary is the Repeater (or Slave) of the primary master and replicates the index periodically: you need 2 masters that are in sync at all times! How do you achieve that?

* Do you just put N masters behind a LB VIP, configure them both to point to the index on some shared storage (e.g. SAN), and count on the LB to fail over to the secondary master when the primary becomes unreachable? If so, how do you deal with index locks? Do you use the Native lock and count on it disappearing when the primary master goes down? That means you count on the whole JVM process dying, which may not be the case...

* Or do you use tools like DRBD, Corosync, Pacemaker, etc. to keep 2 masters with 2 separate indices in sync, while making sure you write to only 1 of them via LB VIP or otherwise?

* Or ...?

This thread is on a similar topic, but is inconclusive: http://search-lucene.com/m/aOsyN15f1qd1
Here is another similar thread, but this one doesn't cover how 2 masters are kept in sync at all times: http://search-lucene.com/m/aOsyN15f1qd1

Thanks,
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
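For reference, the "Native lock" the question refers to is a one-line setting in solrconfig.xml. A sketch from the Solr 1.4/3.x era (section names may differ in other versions):

```xml
<!-- solrconfig.xml: "native" means an OS-level file lock that is
     released when the JVM process dies -- which, as noted above, is
     exactly the assumption that may not hold -->
<mainIndex>
  <lockType>native</lockType>
</mainIndex>
```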
Re: True master-master fail-over without data gaps
If you're using the delta-import handler, the problem would seem to go away: you can have two separate masters running at all times, and if one fails, you can then point the slaves to the secondary master, which is guaranteed to be in sync because it has been importing from the same database?
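For context, the suggestion above refers to Solr's DataImportHandler. A minimal sketch of building the delta-import request for each master, assuming the handler is registered at the usual /dataimport path (Otis notes later in the thread that no database is involved in his case, so this approach wouldn't apply there):

```python
from urllib.parse import urlencode

def delta_import_urls(masters, clean=False):
    """Build the DataImportHandler delta-import request for each master.
    If both masters pull increments from the same database, issuing this
    on each keeps them (eventually) in sync, as suggested above."""
    params = urlencode({"command": "delta-import", "clean": str(clean).lower()})
    return ["%s/dataimport?%s" % (m.rstrip("/"), params) for m in masters]
```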
RE: True master-master fail-over without data gaps
If you have a wrapper, like an indexer app which prepares Solr docs and sends them into Solr, then it is simple. The wrapper is your 'tee', and it can send docs to both (or N) masters.

-----Original Message-----
From: Michael Sokolov [mailto:soko...@ifactory.com]
Sent: Wednesday, March 09, 2011 4:14 AM
To: solr-user@lucene.apache.org
Cc: Jonathan Rochkind
Subject: Re: True master-master fail-over without data gaps

Yes, I think this should be pushed upstream - insert a tee in the document stream so that all documents go to both masters. Then use a load balancer to make requests of the masters.
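A minimal sketch of the wrapper-as-tee idea, with hypothetical `send` callables standing in for posts to each master's update handler:

```python
def tee_index(doc, masters):
    """Send the same doc to every master. Best-effort: a failure on one
    master is recorded but doesn't stop the others -- which is also how
    the masters can silently drift apart, the objection raised later in
    the thread."""
    status = {}
    for name, send in masters.items():
        try:
            send(doc)
            status[name] = True
        except Exception:
            status[name] = False  # this master just missed a doc
    return status
```

On its own this only detects a gap at write time; closing it still needs some retry or replay mechanism on top.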
Re: True master-master fail-over without data gaps
Hi,

----- Original Message -----

If you're using the delta import handler the problem would seem to go away because you can have two separate masters running at all times, and if one fails, you can then point the slaves to the secondary master, that is guaranteed to be in sync because it's been importing from the same database?

Oh, there is no DB involved. Think of a document stream continuously coming in, a component listening to that stream, grabbing docs, and pushing them to the master(s).

Otis
Re: True master-master fail-over without data gaps
Hi,

----- Original Message -----
From: Robert Petersen rober...@buy.com
To: solr-user@lucene.apache.org
Sent: Wed, March 9, 2011 11:40:56 AM
Subject: RE: True master-master fail-over without data gaps

If you have a wrapper, like an indexer app which prepares solr docs and sends them into solr, then it is simple. The wrapper is your 'tee' and it can send docs to both (or N) masters.

Doesn't this make it too easy for the 2 masters to get out of sync, even if the problem is not with them? E.g. something happens in this tee component and it indexes a doc to master A, but not master B.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: True master-master fail-over without data gaps
Oh, there is no DB involved. Think of a document stream continuously coming in, a component listening to that stream, grabbing docs, and pushing it to master(s).

I don't think Solr is designed for this use case, e.g. I wouldn't expect deterministic results with the current architecture, as it's something that's inherently a key component of [No]SQL databases.
Re: True master-master fail-over without data gaps
Hi,

----- Original Message -----

Yes, I think this should be pushed upstream - insert a tee in the document stream so that all documents go to both masters. Then use a load balancer to make requests of the masters.

Hm, but this makes the tee app aware of this. What if I want to hide that from any code of mine?

The tee itself then becomes a possible single point of failure, but you didn't say anything about the architecture of the document feed. Is that also fault-tolerant?

Let's say it is! :)

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: True master-master fail-over without data gaps
Hi,

----- Original Message -----

Oh, there is no DB involved. Think of a document stream continuously coming in, a component listening to that stream, grabbing docs, and pushing it to master(s).

I don't think Solr is designed for this use case, e.g. I wouldn't expect deterministic results with the current architecture, as it's something that's inherently a key component of [No]SQL databases.

You mean it's not possible to have 2 masters that are in nearly real-time sync? How about with DRBD? I know people use DRBD to keep 2 Hadoop NNs (their edit logs) in sync to avoid the current NN SPOF, for example, so I'm thinking this could be doable with Solr masters, too, no?

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
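For the record, the DRBD setup alluded to here would look roughly like the sketch below. Hostnames, devices, and addresses are made up; "protocol C" is DRBD's synchronous mode, which is what "nearly real-time sync" requires:

```
# /etc/drbd.d/solr-index.res -- hypothetical resource mirroring the
# partition that holds the master index, block-for-block
resource solr-index {
  protocol C;        # synchronous: a write completes only after it
                     # has reached the peer's disk as well
  device    /dev/drbd0;
  disk      /dev/sdb1;
  meta-disk internal;
  on master01 { address 10.0.0.1:7788; }
  on master02 { address 10.0.0.2:7788; }
}
```

Only one node mounts the filesystem at a time (primary/secondary), so this pairs naturally with Pacemaker/Corosync handling the fail-over, as listed in the original question.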
RE: True master-master fail-over without data gaps
Currently I use an application connected to a queue containing incoming data, which my indexer app turns into Solr docs. I log everything to a log table and have never had an issue with losing anything. I can trace incoming docs exactly, and keep timing data in there also. If I added a second Solr URL for a second master and resent the same doc to master02 that I sent to master01, I would expect near 100% synchronization.

The problem here is how to get the slave farm to start replicating from the second master if and when the first goes down. I can only see that as being a manual operation: repointing the slaves to master02 and restarting or reloading them, etc...

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, March 09, 2011 8:52 AM
To: solr-user@lucene.apache.org
Subject: Re: True master-master fail-over without data gaps

Doesn't this make it too easy for 2 masters to get out of sync even if the problem is not with them? e.g. something happens in this tee component and it indexes a doc to master A, but not master B.
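Robert's log-table idea extends the tee naturally: record every send attempt, so the docs a master missed can be read back and replayed after a fail-over instead of becoming silent data loss. A rough sketch, with an in-memory list standing in for his log table:

```python
class LoggedTee:
    """Dual-write with an audit trail: every (doc_id, master, status)
    is logged, so divergence between masters is detectable and the gap
    is replayable."""

    def __init__(self, masters):
        self.masters = masters   # e.g. {"master01": send_fn, ...}
        self.log = []

    def index(self, doc_id, doc):
        for name, send in self.masters.items():
            try:
                send(doc)
                self.log.append((doc_id, name, "ok"))
            except Exception:
                self.log.append((doc_id, name, "failed"))

    def gap(self, master):
        """Doc ids the given master missed and must be re-sent."""
        return [d for d, m, s in self.log if m == master and s == "failed"]
```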
Re: True master-master fail-over without data gaps
Hi,

- Original Message -

I'd honestly think about buffering the incoming documents in some store that's actually made for fail-over persistence reliability, maybe CouchDB or something. And then that's taking care of not losing anything, and the problem becomes how we make sure that our Solr master indexes are kept in sync with the actual persistent store; which I'm still not sure about, but I'm thinking it's a simpler problem. The right tool for the right job - that kind of failover persistence is not Solr's specialty.

But check this! In some cases one is not allowed to save content to disk (think copyrights). I'm not making this up - we actually have a customer with this "cannot save to disk (but can index)" requirement. So buffering to disk is not an option, and buffering in memory is not practical because of the input document rate and their size.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

From: Otis Gospodnetic [otis_gospodne...@yahoo.com]
Sent: Tuesday, March 08, 2011 11:45 PM
To: solr-user@lucene.apache.org
Subject: True master-master fail-over without data gaps

Hello,

What are some common or good ways to handle indexing (master) fail-over? Imagine you have a continuous stream of incoming documents that you have to index without losing any of them (or with losing as few of them as possible). How do you set up your masters?

In other words, you can't just have 2 masters where the secondary is the Repeater (or Slave) of the primary master and replicates the index periodically: you need to have 2 masters that are in sync at all times! How do you achieve that?

* Do you just put N masters behind a LB VIP, configure them all to point to the index on some shared storage (e.g. SAN), and count on the LB to fail over to the secondary master when the primary becomes unreachable? If so, how do you deal with index locks? Do you use the Native lock and count on it disappearing when the primary master goes down? That means you count on the whole JVM process dying, which may not be the case...

* Or do you use tools like DRBD, Corosync, Pacemaker, etc. to keep 2 masters with 2 separate indices in sync, while making sure you write to only 1 of them via LB VIP or otherwise?

* Or ...

This thread is on a similar topic, but is inconclusive: http://search-lucene.com/m/aOsyN15f1qd1

Here is another similar thread, but this one doesn't cover how 2 masters are kept in sync at all times: http://search-lucene.com/m/aOsyN15f1qd1

Thanks,
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
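The "Native lock" mentioned above is Lucene's NativeFSLockFactory, selected per index in solrconfig.xml. A minimal sketch (element placement varies by Solr version; in the 1.4/3.x era it sits under <mainIndex> and <indexDefaults>):

```xml
<!-- solrconfig.xml -->
<mainIndex>
  <!-- "native" = OS-level file lock; the OS releases it when the
       JVM process holding it exits, even on a crash -->
  <lockType>native</lockType>
</mainIndex>
```

The catch, as noted above, is that the OS only releases the lock when the process actually exits; a hung-but-alive JVM keeps holding it.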
Re: True master-master fail-over without data gaps
On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:

You mean it's not possible to have 2 masters that are in nearly real-time sync? How about with DRBD? I know people use DRBD to keep 2 Hadoop NNs (their edit logs) in sync to avoid the current NN SPOF, for example, so I'm thinking this could be doable with Solr masters, too, no?

Otis

If you add fault-tolerance, you run into the CAP Theorem. Consistency, availability, partition tolerance: choose two. You cannot have it all.

wunder
--
Walter Underwood
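For the DRBD route, synchronous block-level mirroring is DRBD's "protocol C". A minimal, hypothetical resource sketch (hostnames, devices, and addresses are made up) that would mirror the partition holding the master's index to a standby:

```
# /etc/drbd.conf (DRBD 8.x style) -- hypothetical hosts/devices
resource solr-index {
  protocol C;              # synchronous: a write completes only after
                           # both nodes have it, i.e. no data gap
  on master01 {
    device    /dev/drbd0;
    disk      /dev/sda7;   # partition holding the Solr index
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on master02 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}
```

Note that DRBD only mirrors the block device; on failover the standby's Solr still has to open an index that may have been mid-commit when the primary died.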
RE: True master-master fail-over without data gaps
...but the index resides on disk doesn't it??? lol

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, March 09, 2011 9:06 AM
To: solr-user@lucene.apache.org
Subject: Re: True master-master fail-over without data gaps
Re: True master-master fail-over without data gaps
Hi,

- Original Message -

Currently I use an application connected to a queue containing incoming data which my indexer app turns into solr docs. I log everything to a log table and have never had an issue with losing anything.

Yeah, if everything goes through some storage that can be polled (either a DB or a durable JMS Topic or some such), then N masters could connect to it, not miss anything, and be more or less in near real-time sync.

I can trace incoming docs exactly, and keep timing data in there also. If I added a second solr url for a second master and resent the same doc to master02 that I sent to master01, I would expect near 100% synchronization. The problem here is how to get the slave farm to start replicating from the second master if and when the first goes down. I can only see that as being a manual operation, repointing the slaves to master02 and restarting or reloading them etc...

Actually, you can configure a LB to handle that, so that's less of a problem, I think.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, March 09, 2011 8:52 AM
To: solr-user@lucene.apache.org
Subject: Re: True master-master fail-over without data gaps

Hi,

- Original Message -
From: Robert Petersen rober...@buy.com
To: solr-user@lucene.apache.org
Sent: Wed, March 9, 2011 11:40:56 AM
Subject: RE: True master-master fail-over without data gaps

If you have a wrapper, like an indexer app which prepares solr docs and sends them into solr, then it is simple. The wrapper is your 'tee' and it can send docs to both (or N) masters.

Doesn't this make it too easy for 2 masters to get out of sync even if the problem is not with them? E.g. something happens in this tee component and it indexes a doc to master A, but not master B.

-Original Message-
From: Michael Sokolov [mailto:soko...@ifactory.com]
Sent: Wednesday, March 09, 2011 4:14 AM
To: solr-user@lucene.apache.org
Cc: Jonathan Rochkind
Subject: Re: True master-master fail-over without data gaps

Yes, I think this should be pushed upstream - insert a tee in the document stream so that all documents go to both masters. Then use a load balancer to make requests of the masters. The tee itself then becomes a possible single point of failure, but you didn't say anything about the architecture of the document feed. Is that also fault-tolerant?

-Mike
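The "tee" idea and the worry about partial failures can be made concrete. A minimal, hypothetical sketch (not a real Solr client; `post_to_master` stands in for an HTTP POST to a master's update handler): fan each document out to N masters and record any master that missed it, so the gap can be replayed later instead of letting indexes silently diverge.

```python
def tee_index(doc, masters, post_to_master, failed_log):
    """Send doc to every master; log (master, doc) pairs that failed.

    post_to_master(master, doc) stands in for an HTTP POST to a
    Solr master's /update handler and should raise on failure.
    Returns the list of masters that accepted the doc.
    """
    ok = []
    for master in masters:
        try:
            post_to_master(master, doc)
            ok.append(master)
        except Exception:
            # Record the gap so it can be replayed when the master
            # comes back, instead of the two indexes drifting apart.
            failed_log.append((master, doc))
    return ok
```

This addresses Otis's objection only partially: the tee no longer *loses* the inconsistency, but something still has to drain `failed_log`, and the tee itself remains a single point of failure as Mike notes.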
Re: True master-master fail-over without data gaps
On disk, yes, but only indexed, and thus far enough from the original content that storing terms in Lucene's inverted index is acceptable.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message -
From: Robert Petersen rober...@buy.com
To: solr-user@lucene.apache.org
Sent: Wed, March 9, 2011 12:07:27 PM
Subject: RE: True master-master fail-over without data gaps
Re: True master-master fail-over without data gaps
RAMdisk

...but the index resides on disk doesn't it??? lol
Re: True master-master fail-over without data gaps
This is why there's block cipher cryptography.

On Wed, Mar 9, 2011 at 9:11 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

On disk, yes, but only indexed, and thus far enough from the original content that storing terms in Lucene's inverted index is acceptable.
RE: True master-master fail-over without data gaps
I guess you could put a LB between slaves and masters, never thought of that! :)
Re: True master-master fail-over without data gaps
Right. LB VIP on both sides of master(s). Black box.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: True master-master fail-over without data gaps
On 3/9/2011 12:05 PM, Otis Gospodnetic wrote:

But check this! In some cases one is not allowed to save content to disk (think copyrights). I'm not making this up - we actually have a customer with this "cannot save to disk (but can index)" requirement.

Do they realize that a Solr index is on disk, and that if you save something to a Solr index it's being saved to disk? If they prohibited you from putting the doc in a stored field in Solr, I guess that would at least be somewhat consistent, although annoying. But I don't think it's our customers' job to tell us HOW to implement our software to get the results they want. They can certainly make you promise not to distribute or use copyrighted material, and they can even ask to see your security procedures to make sure it doesn't get out. But if you need to buffer documents to achieve the application they want, and they won't let you... Solr can't help you with that.

As I suggested before, though, I might rather buffer to a NoSQL store like MongoDB or CouchDB instead of actually to disk. Perhaps your customer won't notice those stores keep data on disk, just like they haven't noticed Solr does. I am not an expert in the various kinds of NoSQL stores, but I think some of them in fact specialize in the area of concern here: absolute failover reliability through replication. Solr is not a store.
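The buffer-then-index pattern described above can be sketched language-agnostically (Python here; the deque and `index_to_solr` callable are hypothetical stand-ins for a durable queue/NoSQL store and a Solr client): a document is removed from the buffer only after the master confirms the index succeeded, so a master failure never loses it.

```python
from collections import deque

def drain_buffer(buffer, index_to_solr):
    """Index buffered docs; remove each only after indexing succeeds.

    buffer: a deque standing in for a durable queue/NoSQL store.
    index_to_solr(doc): stand-in for a Solr client call; raises on failure.
    Returns the number of docs indexed; docs that failed stay at the
    head of the buffer for the next attempt (e.g. after failover).
    """
    indexed = 0
    while buffer:
        doc = buffer[0]          # peek, don't pop yet
        try:
            index_to_solr(doc)
        except Exception:
            break                # master down: stop, keep doc buffered
        buffer.popleft()         # ack only after confirmed success
        indexed += 1
    return indexed
```

With this shape, "keeping the master in sync with the persistent store" reduces to re-running the drain loop against whichever master is currently live.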
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Hi,

- Original Message -
From: Walter Underwood wun...@wunderwood.org

On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:

You mean it's not possible to have 2 masters that are in nearly real-time sync? How about with DRBD? I know people use DRBD to keep 2 Hadoop NNs (their edit logs) in sync to avoid the current NN SPOF, for example, so I'm thinking this could be doable with Solr masters, too, no?

If you add fault-tolerance, you run into the CAP Theorem. Consistency, availability, partition tolerance: choose two. You cannot have it all.

Right, so I'll take Consistency and Availability, and I'll put my 2 masters in the same rack (which has redundant switches, power supply, etc.) and thus minimize/avoid partitioning. Assuming the above actually works, I think my Q remains: how do you set up 2 Solr masters so they are in near real-time sync? DRBD?

But here is maybe a simpler scenario that more people may be considering: imagine 2 masters on 2 different servers in 1 rack, pointing to the same index on shared storage (SAN) that also happens to live in the same rack. The 2 Solr masters are behind 1 LB VIP that the indexer talks to. The VIP is configured so that all requests always get routed to the primary master (because only 1 master can be modifying an index at a time), except when this primary is down, in which case the requests are sent to the secondary master. So in this case my Q is around automation of this, around Lucene index locks, around the need for manual intervention, and such.

Concretely, if you have these 2 master instances, the primary master has the Lucene index lock in the index dir. When the secondary master needs to take over (i.e., when it starts receiving documents via the LB), it needs to be able to write to that same index. But what if that lock is still around? One could use the Native lock to make the lock disappear if the primary master's JVM exited unexpectedly, and in that case everything *should* work and be completely transparent, right? That is, the secondary will start getting new docs, it will use its IndexWriter to write to that same shared index, which won't be locked for writes because the lock is gone, and everyone will be happy. Did I miss something important here?

Assuming the above is correct, what if the lock is *not* gone because the primary master's JVM is actually not dead, although maybe unresponsive, so the LB thinks the primary master is dead? Then the LB will route indexing requests to the secondary master, which will attempt to write to the index, but be denied because of the lock. So a human needs to jump in, remove the lock, and manually reindex failed docs if the upstream component doesn't buffer docs that failed to get indexed and doesn't retry indexing them automatically. Is this correct, or is there a way to avoid humans here?

Thanks,
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
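The failure mode described above (the LB declares the primary dead while its hung JVM still holds the native lock) can be sketched. All names here are hypothetical: `is_healthy` stands in for the LB health check, `try_acquire_native_lock` for Lucene's lock acquisition on the shared index, and `send` for forwarding the update.

```python
def route_update(doc, primary, secondary, is_healthy,
                 try_acquire_native_lock, send):
    """LB-style routing: updates go to the primary until its health
    check fails, then to the secondary -- which can only proceed if it
    can actually acquire the shared index's native lock."""
    if is_healthy(primary):
        return send(primary, doc)
    # Primary looks dead. If its JVM really exited, the OS released
    # the native lock and the secondary takes over transparently.
    if try_acquire_native_lock(secondary):
        return send(secondary, doc)
    # JVM is unresponsive but alive: the lock is still held, so a
    # human must intervene, and failed docs must be replayed from
    # whatever upstream buffer exists.
    raise RuntimeError("index still locked by unresponsive primary")
```

The third branch is exactly the manual-intervention case in the question: routing alone can't fix it, because the lock, not the LB, is the arbiter of who may write.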
RE: True master-master fail-over without data gaps (choosing CA in CAP)
Can't you skip the SAN and keep the indexes locally? Then you would have two redundant copies of the index and no lock issues.

Also, can't master02 just be a slave to master01 (in the master farm and separate from the slave farm) until such time as master01 fails? Then master02 would start receiving the new documents with an index complete up to the last replication at least, and the other slaves would be directed by LB to poll master02 also...

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, March 09, 2011 9:47 AM
To: solr-user@lucene.apache.org
Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP)
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Hi,

----- Original Message ----- From: Robert Petersen rober...@buy.com

> Can't you skip the SAN and keep the indexes locally? Then you would
> have two redundant copies of the index and no lock issues.

I could, but then I'd have the issue of keeping them in sync, which seems more fragile. I think the SAN makes things simpler overall.

> Also, can't master02 just be a slave to master01 (in the master farm
> and separate from the slave farm) until such time as master01 fails?

No, because it wouldn't be in sync. It would always be N minutes behind, and when the primary master fails, the secondary would not have all the docs - data loss.

> Then master02 would start receiving the new documents with an index
> complete up to the last replication at least and the other slaves
> would be directed by LB to poll master02 also...

Yeah, "complete up to the last replication" is the problem. It's a data gap that now needs to be filled somehow.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Hi Otis,

Have you considered using Solandra with Quorum writes to achieve master/master with CA semantics?

-Jake
--
http://twitter.com/tjake
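For context on why quorum writes and reads can stand in for a single master: with N replicas, any write quorum W and read quorum R satisfying W + R > N must overlap in at least one replica, so every read touches a replica that holds the latest acknowledged write. A tiny sketch of the arithmetic (illustrative only, not Solandra or Cassandra code):

```python
# A write acked by W of N replicas and a read served by R of N replicas
# are guaranteed to share at least one replica whenever W + R > N, so
# the read sees the latest acknowledged write. Cassandra's QUORUM level
# uses N // 2 + 1 for both sides.
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every read quorum intersects every write quorum."""
    return w + r > n

n = 3
quorum = n // 2 + 1                        # 2 of 3 replicas
assert quorums_overlap(n, quorum, quorum)  # QUORUM/QUORUM: consistent reads
assert not quorums_overlap(n, 1, 1)        # ONE/ONE: stale reads possible
```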
Re: True master-master fail-over without data gaps (choosing CA in CAP)
I was just about to jump in this conversation to mention Solandra and, go fig, Solandra's committer comes in. :-) It was nice to meet you at Strata, Jake. I haven't dug into the code yet, but Solandra strikes me as a killer way to scale Solr. I'm looking forward to playing with it; particularly looking at disk requirements and performance measurements.

~ David Smiley
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Doesn't Solandra partition by term instead of document?
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Jason,

Its predecessor, Lucandra, did. But Solandra is a new approach that manages shards of documents across the cluster for you and uses Solr's distributed search to query the indexes.

Jake
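The distinction Jake draws can be made concrete. With document partitioning, each whole document is routed to one shard, so a shard can answer queries for its documents locally and distributed search merges across shards; with term partitioning, one document's postings would be scattered across nodes. The hash routing below is a hypothetical illustration, not Solandra's actual implementation:

```python
import hashlib

# Hypothetical document-based routing (not Solandra's real code): each
# doc lands whole on exactly one shard, chosen by a stable hash of its
# id, so the same doc always maps to the same shard.
def shard_for(doc_id: str, num_shards: int) -> int:
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Every shard index is self-contained for its documents; a distributed
# query fans out to all shards and merges the results.
assignments = {d: shard_for(d, 4) for d in ("doc-1", "doc-2", "doc-3")}
```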
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Jake,

Maybe it's time to come up with a Solandra/Solr matrix so we can see Solandra's strengths (e.g. RT, no replication), weaknesses (e.g. I think I saw a mention of some big indices?), missing features (e.g. no delete by query), etc.

Thanks!
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Yeah sure. Let me update this on the Solandra wiki. I'll send across the link I think you hit the main two shortcomings atm. -Jake On Wed, Mar 9, 2011 at 6:17 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Jake, Maybe it's time to come up with the Solandra/Solr matrix so we can see Solandra's strengths (e.g. RT, no replication) and weaknesses (e.g. I think I saw a mention of some big indices?) or missing feature (e.g. no delete by query), etc. Thanks! Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Jake Luciani jak...@gmail.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Wed, March 9, 2011 6:04:13 PM Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP) Jason, It's predecessor did, Lucandra. But Solandra is a new approach that manages shards of documents across the cluster for you and uses solrs distributed search to query indexes. Jake On Mar 9, 2011, at 5:15 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Doesn't Solandra partition by term instead of document? On Wed, Mar 9, 2011 at 2:13 PM, Smiley, David W. dsmi...@mitre.org wrote: I was just about to jump in this conversation to mention Solandra and go fig, Solandra's committer comes in. :-) It was nice to meet you at Strata, Jake. I haven't dug into the code yet but Solandra strikes me as a killer way to scale Solr. I'm looking forward to playing with it; particularly looking at disk requirements and performance measurements. ~ David Smiley On Mar 9, 2011, at 3:14 PM, Jake Luciani wrote: Hi Otis, Have you considered using Solandra with Quorum writes to achieve master/master with CA semantics? -Jake On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, Original Message From: Robert Petersen rober...@buy.com Can't you skip the SAN and keep the indexes locally? 
Then you would have two redundant copies of the index and no lock issues. I could, but then I'd have the issue of keeping them in sync, which seems more fragile. I think SAN makes things simpler overall. Also, Can't master02 just be a slave to master01 (in the master farm and separate from the slave farm) until such time as master01 fails? Then No, because it wouldn't be in sync. It would always be N minutes behind, and when the primary master fails, the secondary would not have all the docs - data loss. master02 would start receiving the new documents with an indexes complete up to the last replication at least and the other slaves would be directed by LB to poll master02 also... Yeah, complete up to the last replication is the problem. It's a data gap that now needs to be filled somehow. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, March 09, 2011 9:47 AM To: solr-user@lucene.apache.org Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP) Hi, - Original Message From: Walter Underwood wun...@wunderwood.org On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote: You mean it's not possible to have 2 masters that are in nearly real-time sync? How about with DRBD? I know people use DRBD to keep 2 Hadoop NNs (their edit logs) in sync to avoid the current NN SPOF, for example, so I'm thinking this could be doable with Solr masters, too, no? If you add fault-tolerant, you run into the CAP Theorem. Consistency, availability, partition: choose two. You cannot have it all. Right, so I'll take Consistency and Availability, and I'll put my 2 masters in the same rack (which has redundant switches, power supply, etc.) and thus minimize/avoid partitioning. Assuming the above actually works, I think my Q remains: How do you set up 2 Solr masters so they are in near real-time sync? DRBD? 
But here is maybe a simpler scenario that more people may be considering: imagine 2 masters on 2 different servers in 1 rack, pointing to the same index on the shared storage (SAN) that also happens to live in the same rack. The 2 Solr masters are behind 1 LB VIP that the indexer talks to. The VIP is configured so that all requests always get routed to the primary master (because only 1 master can be modifying an index at a time), except when this primary is down, in which case the requests are sent to the secondary master.
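The VIP behavior described here (all traffic to the primary, secondary used only on failure) maps to an active/backup pool in most load balancers. A hedged sketch in HAProxy terms, with hypothetical host names and ports:

```
# Hypothetical HAProxy config for the scenario above: indexing traffic
# always goes to master01; master02 receives requests only when
# master01 fails its health check, thanks to the "backup" flag.
frontend solr_master_vip
    bind *:8983
    default_backend solr_masters

backend solr_masters
    option httpchk GET /solr/admin/ping
    server master01 10.0.0.1:8983 check
    server master02 10.0.0.2:8983 check backup
```

This handles routing, but as the thread notes it does nothing about the index lock left behind on the shared SAN index if the primary's JVM is still alive but unreachable.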
RE: True master-master fail-over without data gaps
I'd honestly think about buffering the incoming documents in some store that's actually made for fail-over persistence and reliability, maybe CouchDB or something. That takes care of not losing anything, and the problem becomes making sure that our Solr master indexes are kept in sync with the actual persistent store; I'm still not sure how to do that, but I'm thinking it's a simpler problem. The right tool for the right job: that kind of fail-over persistence is not Solr's specialty.

From: Otis Gospodnetic [otis_gospodne...@yahoo.com]
Sent: Tuesday, March 08, 2011 11:45 PM
To: solr-user@lucene.apache.org
Subject: True master-master fail-over without data gaps

Hello,

What are some common or good ways to handle indexing (master) fail-over? Imagine you have a continuous stream of incoming documents that you have to index without losing any of them (or losing as few of them as possible). How do you set up your masters? In other words, you can't just have 2 masters where the secondary is the Repeater (or Slave) of the primary master and replicates the index periodically: you need to have 2 masters that are in sync at all times! How do you achieve that?

* Do you just put N masters behind an LB VIP, configure them both to point to the index on some shared storage (e.g. SAN), and count on the LB to fail over to the secondary master when the primary becomes unreachable? If so, how do you deal with index locks? Do you use the Native lock and count on it disappearing when the primary master goes down? That means you count on the whole JVM process dying, which may not be the case...
* Or do you use tools like DRBD, Corosync, Pacemaker, etc. to keep 2 masters with 2 separate indices in sync, while making sure you write to only 1 of them via an LB VIP or otherwise?
* Or ...
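The buffering idea at the top of this message can be sketched simply: documents land first in a durable, append-only log, and each Solr master remembers the offset it has indexed up to, so a freshly promoted master can replay the tail and close any gap. A minimal in-memory sketch of that scheme (the `DocLog` class stands in for a real store such as CouchDB; names are hypothetical):

```python
# Sketch of buffer-then-index fail-over: the durable log is the source
# of truth, and masters catch up from their last indexed offset.
class DocLog:
    """Durable, append-only document buffer (stand-in for a real store)."""
    def __init__(self):
        self.entries = []

    def append(self, doc) -> int:
        self.entries.append(doc)
        return len(self.entries) - 1        # offset of this doc

    def read_from(self, offset: int):
        return self.entries[offset:]

class Master:
    """A Solr master that indexes from the log and remembers its offset."""
    def __init__(self, log: DocLog):
        self.log = log
        self.indexed = []
        self.offset = 0

    def catch_up(self):
        for doc in self.log.read_from(self.offset):
            self.indexed.append(doc)        # stand-in for a Solr add/commit
            self.offset += 1

log = DocLog()
primary, secondary = Master(log), Master(log)

log.append({"id": 1}); log.append({"id": 2})
primary.catch_up()                          # primary has indexed docs 1-2

log.append({"id": 3})                       # suppose primary dies here
secondary.catch_up()                        # promoted secondary replays all
assert [d["id"] for d in secondary.indexed] == [1, 2, 3]
```

Because the secondary replays from the log rather than replicating from the primary, its index is complete up to the log's tail rather than up to the last replication, which is exactly the gap being discussed.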
This thread is on a similar topic, but is inconclusive: http://search-lucene.com/m/aOsyN15f1qd1

Here is another similar thread, but this one doesn't cover how 2 masters are kept in sync at all times: http://search-lucene.com/m/aOsyN15f1qd1

Thanks,
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/