Hi everyone,

Large-scale Solr installations often require cross data-center replication, both to reduce access latency and for disaster recovery. In the past, users have either designed their own solutions to deal with this or have tried to rely on the now-deprecated CDCR. It would be really good to have support for cross data-center replication within Solr that is offered and supported by the community. This would allow the effort around this shared problem to converge.

I’d like to propose a new solution based on my experiences at my day job. The key points about this approach:

1. It uses an external, configurable messaging system in the middle for the actual replication/mirroring.
2. We offer an abstraction and some default implementations, based on what we can support and what users really want. An example here would be Kafka.
3. It would live in a separate repository, allowing it to have its own release cadence. We shouldn’t have to release it with every Solr release, as the overlap is limited to SolrJ interactions.

I’ll share a more detailed, evolving design document soon for everyone to contribute to. I wanted to share this now because I’m starting to work on it and want to avoid parallel efforts towards the same end goal.

-- Anshum Gupta
