Looking forward to the doc.

Thanks,
Marcus

On Sun, Dec 6, 2020 at 05:47 Erick Erickson <[email protected]> wrote:

> Anshum:
>
> I know I’ve been recommending something like this to clients for a while,
> do you think a call to the community for people who’ve already put
> something in the middle might net us some good info on the lurking
> gremlins? Mind you “recommend” hasn’t actually involved me _doing_ it
> so I don’t have any actual experience there…
>
> But yeah, absolutely +1 for something making this easier for clients...
>
> Erick
>
> > On Dec 5, 2020, at 11:43 AM, Ilan Ginzburg <[email protected]> wrote:
> >
> > That's an interesting initiative Anshum!
> >
> > I can see at least two different approaches here, your mention of SolrJ seems to hint at the first one:
> > 1. Get the data as it comes from the client and fork it to local and remote data centers,
> > 2. Create (an asynchronous) stream replicating local data center data to remote.
> >
> > Option 1 is strongly consistent but adds latency and potentially blocking on the critical path.
> > Option 2 could look like remote PULL replicas, might have lower impact on the local data center but has to deal with the remote data center always being somewhat behind. If the client application can handle that, the performance and efficiency gain (as well as simpler implementation? It doesn't require another persistence layer) might be worth it...
> >
> > Ilan
> >
> > On Fri, Dec 4, 2020 at 5:24 PM Anshum Gupta <[email protected]> wrote:
> > Hi everyone,
> >
> > Large scale Solr installations often require cross data-center replication in order to achieve data replication for both, access latency reasons as well as disaster recovery. In the past users have either designed their own solutions to deal with this or have tried to rely on the now-deprecated CDCR.
> >
> > It would be really good to have support for cross data-center replication within Solr, that is offered and supported by the community. This would allow the effort around this shared problem to converge.
> >
> > I’d like to propose a new solution based on my experiences at my day job. The key points about this approach:
> > • Uses an external, configurable, messaging system in the middle for actual replication/mirroring.
> > • We offer an abstraction and some default implementations based on what we can support and what users really want. An example here would be Kafka.
> > • This would be a separate repository allowing it to have its own release cadence. We shouldn’t have to release this with every Solr release as the overlap is just limited to SolrJ interactions.
> >
> > I’ll share a more detailed and evolving document soon with the design for everyone else to contribute to but wanted to share this as I’m starting to work on this and wanted to avoid parallel efforts towards the same end-goal.
> >
> > --
> > Anshum Gupta
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

--
Marcus Eagan
