Re: Help creating a near real time streaming plugin to perform replication between clusters

joergpra...@gmail.com Thu, 15 Jan 2015 16:29:27 -0800

While it seems quite easy to attach listeners to an ES node, capture
operations translog-style, and push out index/delete operations at the
shard level, a reliable solution has to consider more than that.
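
To make "translog-style" a bit more concrete, here is a minimal sketch of
a per-shard change buffer. It is not an ES API: the names
ShardChangelogSketch, Op, record and drainBatch are all made up for
illustration, and how the operations would actually be captured from ES is
left open. It also assumes that only the last operation per document id
needs to be shipped, which seems reasonable when documents are only ever
created or deleted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-shard change buffer; none of these names exist in ES.
final class ShardChangelogSketch {

    enum OpType { INDEX, DELETE }

    /** One captured operation, numbered in the order it was recorded. */
    static final class Op {
        final long seqNo;
        final OpType type;
        final String docId;

        Op(long seqNo, OpType type, String docId) {
            this.seqNo = seqNo;
            this.type = type;
            this.docId = docId;
        }

        @Override
        public String toString() {
            return seqNo + ":" + type + ":" + docId;
        }
    }

    private long nextSeqNo = 0;
    // Keyed by document id, so a later operation replaces an earlier one.
    private final Map<String, Op> pending = new LinkedHashMap<>();

    /** Record an operation; assumed to be called by some capture hook. */
    synchronized void record(OpType type, String docId) {
        pending.remove(docId); // re-insert so the entry moves to the end
        pending.put(docId, new Op(nextSeqNo++, type, docId));
    }

    /** Hand out everything recorded so far and start a new batch. */
    synchronized List<Op> drainBatch() {
        List<Op> batch = new ArrayList<>(pending.values());
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        ShardChangelogSketch log = new ShardChangelogSketch();
        log.record(OpType.INDEX, "a");
        log.record(OpType.INDEX, "b");
        log.record(OpType.DELETE, "a");
        System.out.println(log.drainBatch());
    }
}
```

The sequence numbers are there so a receiver could tell where it left off.
Whether that is enough for a reliable restart is exactly the kind of
question a real protocol would have to answer.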


The Couchbase developers have added a data replication protocol (DCP) to
their product, which is meant for transporting changes over long distances
with latency for in-memory processing.

To learn about the most important features, see

https://github.com/couchbaselabs/dcp-documentation

and

http://docs.couchbase.com/admin/admin/Concepts/dcp.html

I think bringing the concept of such an inter-cluster protocol into ES
could be a good starting point for sketching the complete path of such an
ambitious project beforehand.

The most challenging part could be dealing with back pressure when the
receiving nodes/clusters become slow. Reactive Java / reactive streams
look like a viable solution to this.

See also

https://github.com/ReactiveX/RxJava/wiki/Backpressure

http://www.ratpack.io/manual/current/streams.html
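
As a rough illustration of the back pressure idea, here is a minimal
sketch. Note that it does not use RxJava or Ratpack: it swaps in the JDK's
own java.util.concurrent.Flow interfaces and SubmissionPublisher (Java 9
and later), which follow the same reactive streams model. The class names,
the "index doc-N" strings and the 5 ms delay are invented for the example.
The receiver only requests one change at a time, and the sender blocks in
submit() once the small buffer is full, so a slow receiver slows the
sender down instead of being flooded.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

final class BackpressureSketch {

    /** Hypothetical slow remote cluster: asks for one change at a time. */
    static final class SlowReceiver implements Flow.Subscriber<String> {
        final List<String> applied = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1); // signal demand for exactly one item
        }

        @Override
        public void onNext(String change) {
            try {
                Thread.sleep(5); // simulate slow processing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            applied.add(change);
            subscription.request(1); // ready for the next item
        }

        @Override
        public void onError(Throwable t) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    /** Send the given number of changes through a small bounded buffer. */
    static List<String> replicate(int changes, int bufferSize) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        SlowReceiver receiver = new SlowReceiver();
        try {
            try (SubmissionPublisher<String> publisher =
                     new SubmissionPublisher<>(executor, bufferSize)) {
                publisher.subscribe(receiver);
                for (int i = 0; i < changes; i++) {
                    // Blocks while the receiver's buffer is full.
                    publisher.submit("index doc-" + i);
                }
            } // close() completes the stream once buffered items are delivered
            receiver.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            executor.shutdown();
        }
        return receiver.applied;
    }

    public static void main(String[] args) {
        System.out.println(replicate(20, 4).size() + " changes applied");
    }
}
```

Blocking the sender is only one possible policy. SubmissionPublisher also
has offer() variants that drop or retry instead, and which one fits
replication between clusters is an open design question.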

I'm in favor of Ratpack since it comes with Java 8, Groovy, Google Guava,
and Netty, a stack that resembles that of ES.

In ES, there is not much coded for inter-cluster communication afaik,
except snapshot/restore. Maybe snapshot/restore in incremental mode can
provide everything you want. Lucene will offer numbered segment files for
faster incremental snapshot/restore.

Just my 2¢

Jörg



On Thu, Jan 15, 2015 at 7:00 PM, Todd Nine <tn...@apigee.com> wrote:

> Hey all,
>   I would like to create a plugin, and I need a hand.  Below are the
> requirements I have.
>
>
>    - Our documents are immutable.  They are only ever created or deleted;
>    updates do not apply.
>    - We want mirrors of our ES cluster in multiple AWS regions.  This way
>    if the WAN between regions is severed for any reason, we do not suffer an
>    outage, just a delay in consistency.
>    - As documents are added or removed they are rolled up then shipped in
>    batch to the other AWS Regions.  This can be as fast as a few milliseconds,
>    or as slow as minutes, and will be user configurable.  Note that a full
>    backup+load is too slow; this is more of a near realtime operation.
>    - This will sync the following operations.
>       - Index creation/deletion
>       - Alias creation/deletion
>       - Document creation/deletion
>
>
> What I'm thinking architecturally.
>
>
>    - The plugin is installed on each node in our cluster in all regions
>    - The plugin will only gather changes for the primary shards on the
>    local node
>    - After the timeout elapses, the plugin will ship the changelog to the
>    other AWS regions, where the plugin will receive it and process it
>
>
> Are there any APIs I can look at that are a good starting point for
> developing this?  I'd like to do a simple prototype with two 1-node clusters
> reasonably soon.  I found several plugin tutorials, but I'm more concerned
> with what part of the ES API I can call to receive events, if any.
>
> Thanks,
> Todd
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/dff53da5-8a0c-4805-8f97-72844019a79e%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/dff53da5-8a0c-4805-8f97-72844019a79e%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
