Re: SolrCloud upgrade concern
Thanks for all this information. It clears up a lot of the confusion surrounding the CDCR feature. That said, if CDCR is so fragile in SolrCloud and not worth pursuing much, would it make sense to add a warning about its possible shortcomings to the documentation?

On Thu, May 28, 2020 at 9:02 AM Jan Høydahl wrote:
> [...]
Re: SolrCloud upgrade concern
I had a client who asked a lot about CDCR a few years ago, but I kept recommending against it and steered them toward Erick's alternative (2), since they needed to replicate their Oracle DBs in each DC anyway. It is a much cleaner design to give each cluster a local data source and keep it in sync with the local DB than to replicate both the DB and the index.

There are of course use cases where you want to sync a read-only copy of indices to multiple DCs. I hope we'll see a 3rd party tool for that some day, something that can sit outside your Solr clusters, monitor ZK of each cluster, and do some magic :)

Jan

On 28 May 2020 at 01:17, Erick Erickson wrote:
> [...]
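The sync check in alternative (2) can be sketched as a streaming expression that lists ids present in one DC but missing in the other. What follows is only a minimal sketch under stated assumptions: the collection name `products`, the ZooKeeper addresses, and the helper function names are hypothetical, and the sketch uses the `complement` decorator (which emits tuples from the first stream that have no match in the second) rather than the "unique" decorator mentioned in the thread. It also assumes the id field has docValues, which the `/export` handler requires.

```python
def search_expr(collection, zk_host, query="*:*", id_field="id"):
    """Build a search() expression that streams every id, sorted, via /export.

    zk_host points the search at a specific cluster; the value is a placeholder.
    """
    return (
        f'search({collection}, zkHost="{zk_host}", q="{query}", '
        f'fl="{id_field}", sort="{id_field} asc", qt="/export")'
    )


def missing_in_other_expr(collection, zk_here, zk_there, id_field="id"):
    """Build complement(A, B): ids found in cluster A with no match in cluster B.

    Both inner streams are sorted on id_field, which complement() needs.
    """
    return (
        "complement("
        + search_expr(collection, zk_here, id_field=id_field) + ", "
        + search_expr(collection, zk_there, id_field=id_field) + ", "
        + f'on="{id_field}")'
    )


# Hypothetical names: one collection called "products" in two DCs.
expr = missing_in_other_expr("products", "zk-dc1:2181", "zk-dc2:2181")
print(expr)
```

The resulting string would be sent as the `expr` parameter to a collection's `/stream` handler. Running it once in each direction (DC1 against DC2, then DC2 against DC1) covers documents missing on either side; an empty result both times suggests the id sets match.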
Re: SolrCloud upgrade concern
The biggest issue with CDCR is that it's rather fragile and requires monitoring; it's not a "fire and forget" type of functionality. For instance, the use of the tlogs as a queueing mechanism means that if, for any reason, communication between DCs is broken, the tlogs will grow forever until the connection is re-established. Plus the other issues Jason pointed out.

So yes, some companies do use CDCR to communicate between separate DCs. But they also put in some "roll your own" type of monitoring to ensure things don't go haywire.

Alternatives:

1> Use something that's built from the ground up to provide reliable messaging between DCs. Kafka or similar has been mentioned. Write your updates to the Kafka queue and consume them in both DCs. These kinds of solutions have a lot more robustness.

2> Reproduce your system-of-record rather than Solr in the DCs and treat the DCs as separate installations. If you adopt this approach, some of the streaming capabilities can be used to monitor that they stay in sync. For instance, have a background or periodic task (a complete run will take a while) wrap two "search" streams in a "unique" decorator; anything except an empty result identifies docs not on both DCs.

3> Oh Dear. This one is "interesting". Wrap a "topic" stream on DC1 in an update decorator for DC2 and wrap both of those in a daemon decorator. That's gobbledygook, and you'll have to dig through the docs a bit for that to make sense. Essentially the topic stream is one of the very few streams that does not (IIRC) require all values in the fl list be docValues. It fires the first time and establishes a checkpoint, finding all docs up to that point. Thereafter, it'll get docs that have changed since the last time it ran. It uses a tiny collection for record keeping. Each time the topic stream finds new docs, it passes them to the update stream, which sends them to another DC. Wrapping the whole thing in a daemon decorator means it periodically runs in the background. The one shortcoming is that this approach doesn't propagate deletes. That's enough of that until you tell us whether it sounds worth pursuing ;)

So overall, you _can_ use CDCR to connect remote DCs, but it takes time and energy to make it robust. Its advantage is that it's entirely contained within Solr. But it's not getting much attention lately, meaning nobody has decided the functionality is important enough to them to donate the time/resources to make it more robust. Were someone to take an active interest in it, likely it could be kept around as a plugin that core Solr is not responsible for.

Best,
Erick

On May 27, 2020, at 4:43 PM, gnandre wrote:
> [...]
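To make alternative (3) less of a gobbledygook, here is a rough sketch of how the nested daemon / update / topic expression might be assembled. Everything named here is an assumption for illustration: the collection names `products` and `checkpoints`, the ids, the ZooKeeper address, the field list, and the helper functions. The `zkHost` parameter on `update` and the `initialCheckpoint=0` setting (intended to make the first run start from the beginning of the index) should be checked against the reference guide for the Solr version in use.

```python
def topic_expr(checkpoint_collection, source_collection, topic_id, query, fields):
    """topic(): returns docs matching the query that changed since the last run.

    checkpoint_collection is the tiny record-keeping collection from the thread.
    """
    return (
        f'topic({checkpoint_collection}, {source_collection}, '
        f'id="{topic_id}", q="{query}", fl="{fields}", initialCheckpoint=0)'
    )


def update_expr(dest_collection, dest_zk_host, inner, batch_size=250):
    """update(): indexes the tuples produced by the inner stream into the target."""
    return (
        f'update({dest_collection}, zkHost="{dest_zk_host}", '
        f'batchSize={batch_size}, {inner})'
    )


def daemon_expr(daemon_id, inner, run_interval_ms=60000):
    """daemon(): reruns the inner expression in the background at a fixed interval."""
    return f'daemon(id="{daemon_id}", runInterval="{run_interval_ms}", {inner})'


# Hypothetical setup: push changes from "products" in DC1 to "products" in DC2.
expr = daemon_expr(
    "dc1_to_dc2",
    update_expr(
        "products",
        "zk-dc2:2181",
        topic_expr("checkpoints", "products", "dc1_to_dc2_topic", "*:*", "id,title"),
    ),
)
print(expr)
```

The string would be submitted once to a `/stream` handler in DC1, after which the daemon keeps running on that node. As noted in the thread, deletes are not propagated by this approach, so it would still need a separate mechanism for removals.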
Re: SolrCloud upgrade concern
Thanks, Jason. This is very helpful.

I should clarify, though, that I am not currently using CDCR with my existing master-slave architecture. What I meant to say earlier was that we will be relying heavily on the CDCR feature if we migrate from Solr master-slave architecture to SolrCloud architecture. Are there any alternatives to CDCR? AFAIK, if you want to replicate between different data centers then CDCR is the only option. Also, when you say a lot of customers are using SolrCloud successfully, how are they working around the CDCR situation? Do they not have any data center use cases? Is there a list maintained somewhere of companies that are using SolrCloud successfully?

On Wed, May 27, 2020 at 9:27 AM Jason Gerlowski wrote:
> [...]
Re: SolrCloud upgrade concern
Hi Arnold,

From what I saw in the community, CDCR saw an initial burst of development around when it was contributed, but hasn't seen much attention or improvement since. So while it's been around for a few years, I'm not sure it's improved much in terms of stability or compatibility with other Solr features.

Some of the bigger ticket issues still open around CDCR:
- SOLR-11959 no support for basic-auth
- SOLR-12842 infinite retry of failed update-requests (leads to sync/recovery problems)
- SOLR-12057 no real support for NRT/TLOG/PULL replicas
- SOLR-10679 no support for collection aliases

These are in addition to other more architectural issues: CDCR can be a bottleneck on clusters with high ingestion rates, CDCR uses full-index-replication more than traditional indexing setups, which can cause issues with modern index sizes, etc.

So, unfortunately, no real good news in terms of CDCR maturing much in recent releases. Joel Bernstein actually filed a JIRA recently suggesting its removal entirely, though I don't think it's gone anywhere.

That said, I gather from what you said that you're already using CDCR successfully with Master-Slave. If none of these pitfalls are biting you in your current Master-Slave setup, you might not be bothered by them any more in SolrCloud. Most of the problems with CDCR are applicable in master-slave as well as SolrCloud. I wouldn't recommend CDCR if you were starting from scratch, and I still recommend you consider other options. But since you're already using it with some success, it might be an orthogonal concern to your potential migration to SolrCloud.

Best of luck deciding!

Jason

On Fri, May 22, 2020 at 7:06 PM gnandre wrote:
> [...]
Re: SolrCloud upgrade concern
Thanks for this reply, Jason.

I am mostly worried about the CDCR feature. I am relying heavily on it. I am planning to use Solr 8.3, though. It has been a long time since CDCR was first introduced. I wonder what the state of CDCR is in 8.3. Is it stable now?

On Wed, Jan 22, 2020, 8:01 AM Jason Gerlowski wrote:
> [...]
Re: SolrCloud upgrade concern
Hi Arnold,

The stability and complexity issues Mark highlighted in his post aren't just imagined - there are real, sometimes serious, bugs in SolrCloud features. But at the same time there are many, many stable deployments out there where SolrCloud is a real success story for users. Small example: I work at a company (Lucidworks) where our main product (Fusion) is built heavily on top of SolrCloud and we see it deployed successfully every day.

In no way am I trying to minimize Mark's concerns (or David's). There are stability bugs. But the extent to which those need affect you depends a lot on what your deployment looks like. How many nodes? How many collections? How tightly are you trying to squeeze your hardware? Is your network flaky? Are you looking to use any of SolrCloud's newer, less stable features like CDCR, etc.?

Is SolrCloud better for you than Master/Slave? It depends on what you're hoping to gain by a move to SolrCloud, and on your answers to some of the questions above. I would be leery of following any recommendations that are made without regard for your reason for switching or your deployment details. Those things are always the biggest driver in terms of success.

Good luck making your decision!

Best,

Jason
Re: SolrCloud upgrade concern
Ha, I'm on that thread. I didn't know they got stored on a site; that's good to know!

I stand by what I said in there, so I have nothing more to add.

On Thu, Jan 16, 2020 at 3:29 PM Arnold Bronley wrote:
> [...]
SolrCloud upgrade concern
Hi,

I am trying to upgrade my system from Solr master-slave architecture to SolrCloud architecture. Meanwhile, I stumbled upon this very negative post about SolrCloud:

https://lucene.472066.n3.nabble.com/A-Last-Message-to-the-Solr-Users-td4452980.html

Given that it is from one of the initial authors of the SolrCloud functionality, I am having second thoughts about the upgrade and I am somewhat concerned.

I would greatly appreciate any advice or feedback on this from the Solr community.