While it seems quite easy to attach listeners to an ES node, capture operations in translog style, and push out index/delete operations at the shard level, a reliable solution has more to consider.
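To make "more to consider" a bit more concrete, here is a minimal, language-neutral sketch of a per-shard change log. It is written in Python only for brevity; a real plugin would be Java. Nothing in it is ES API: `ChangeOp`, `ShardChangeLog`, and all field and method names are hypothetical. It assumes the listener can hand over each index/delete operation as it happens. The point is that even the simplest capture buffer has to decide what happens when it fills up, and when an operation may be forgotten.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class ChangeOp:
    """One captured operation. All field names are hypothetical."""
    seq: int                       # position in the per-shard log, assigned on append
    shard: int
    kind: str                      # "index" or "delete"
    doc_id: str
    source: Optional[str] = None   # document body for "index", None for "delete"


class ShardChangeLog:
    """Bounded, ordered buffer of operations for a single primary shard."""

    def __init__(self, shard: int, capacity: int) -> None:
        self.shard = shard
        self.capacity = capacity
        self._ops: Deque[ChangeOp] = deque()
        self._next_seq = 0

    def append(self, kind: str, doc_id: str, source: Optional[str] = None) -> bool:
        """Record an operation; return False if the buffer is full.

        A False return is a signal the caller must act on (block, spill to
        disk, or fail the write). Silently dropping would lose changes.
        """
        if kind not in ("index", "delete"):
            raise ValueError(f"unsupported operation: {kind}")
        if len(self._ops) >= self.capacity:
            return False
        self._ops.append(ChangeOp(self._next_seq, self.shard, kind, doc_id, source))
        self._next_seq += 1
        return True

    def peek_batch(self, max_ops: int) -> List[ChangeOp]:
        """Return up to max_ops of the oldest operations without removing them."""
        return [self._ops[i] for i in range(min(max_ops, len(self._ops)))]

    def ack(self, up_to_seq: int) -> None:
        """Drop operations the remote side has confirmed, up to and including up_to_seq."""
        while self._ops and self._ops[0].seq <= up_to_seq:
            self._ops.popleft()


# Usage: capture two operations, ship them, then acknowledge.
log = ShardChangeLog(shard=0, capacity=1000)
log.append("index", "doc-1", '{"field": "value"}')
log.append("delete", "doc-1")
batch = log.peek_batch(100)   # would be sent to the remote cluster here
log.ack(batch[-1].seq)        # only now are the operations removed locally
```

Operations are removed only on acknowledgement, so a failed shipment can be retried from the same position. The sketch keeps everything in memory, so it would not survive a node restart; that is one of the things a reliable solution would have to address.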
The Couchbase developers have added a data replication protocol (DCP) to their product, which is meant for transporting changes over long distances with latency for in-memory processing. To learn about the most important features, see

https://github.com/couchbaselabs/dcp-documentation
http://docs.couchbase.com/admin/admin/Concepts/dcp.html

I think bringing such a concept of an inter-cluster protocol into ES could be a good starting point for sketching the complete path of such an ambitious project beforehand.

The most challenging part could be dealing with back pressure when receiving nodes/clusters become slow. For a solution to this, reactive Java / reactive streams look like a viable possibility. See also

https://github.com/ReactiveX/RxJava/wiki/Backpressure
http://www.ratpack.io/manual/current/streams.html

I'm in favor of Ratpack since it comes with Java 8, Groovy, Google Guava, and Netty, which resembles the stack ES is built on.

In ES, not much has been implemented for inter-cluster communication afaik, except snapshot/restore. Maybe snapshot/restore in incremental mode can provide everything you want. Lucene will offer numbered segment files for faster incremental snapshot/restore.

Just my 2¢

Jörg

On Thu, Jan 15, 2015 at 7:00 PM, Todd Nine <tn...@apigee.com> wrote:

> Hey all,
> I would like to create a plugin, and I need a hand. Below are the
> requirements I have.
>
> - Our documents are immutable. They are only ever created or deleted;
>   updates do not apply.
> - We want mirrors of our ES cluster in multiple AWS regions. This way,
>   if the WAN between regions is severed for any reason, we do not suffer
>   an outage, just a delay in consistency.
> - As documents are added or removed they are rolled up, then shipped in
>   batch to the other AWS regions. This can be as fast as a few
>   milliseconds, or as slow as minutes, and will be user configurable.
>   Note that a full backup+load is too slow; this is more of a near
>   realtime operation.
> - This will sync the following operations:
>   - Index creation/deletion
>   - Alias creation/deletion
>   - Document creation/deletion
>
> What I'm thinking architecturally:
>
> - The plugin is installed on each node in our cluster in all regions.
> - The plugin will only gather changes for the primary shards on the
>   local node.
> - After the timeout elapses, the plugin will ship the changelog to the
>   other AWS regions, where the plugin will receive it and process it.
>
> Are there any APIs I can look at that are a good starting point for
> developing this? I'd like to do a simple prototype with two single-node
> clusters reasonably soon. I found several plugin tutorials, but I'm more
> concerned with what part of the ES API I can call to receive events, if
> any.
>
> Thanks,
> Todd
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/dff53da5-8a0c-4805-8f97-72844019a79e%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.