Re: Notifications from Elasticsearch when documents are added.

2014-06-27 Thread Matthew Parrott
Hey!

I have looked at tribes, but didn't look deeply because of this:

The merged view cannot handle indices with the same name in multiple 
clusters.

I'd like to have indexes replicated across datacenters. Is there a way to 
accomplish that with tribes?

Thanks!

On Friday, June 27, 2014 2:29:46 AM UTC-7, Jörg Prante wrote:

 Have you seen the Tribe Node? This is a kind of a merged state 
 multi-master cluster. 


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-tribe.html

 Jörg


 On Fri, Jun 27, 2014 at 1:39 AM, Matthew Parrott matthe...@gmail.com wrote:

 Hi!

 Have there been any further explorations in the area of WAN replication?

 I have ES clusters in multiple datacenters connected via a high-speed 
 private network. I'm wondering if multi-master replication would be 
 possible in this environment, or if we'd need some type of 'shovel' plugin 
 like the one described here to ship data between the DCs.

 Thanks,
 Matthew


 On Tuesday, July 23, 2013 10:06:10 AM UTC-7, Jörg Prante wrote:

 Yes, I once examined Kafka, and discovered that many components are 
 already there in Elasticsearch. For example, the activity stream is already 
 there as the ES translog (if you focus on indexing operations), and the ES 
 gateway is a useful persistence mechanism. What I didn't like was the 
 single Kafka JVM and the ZooKeeper infrastructure; it all adds 
 complexity alongside ES.

 For cross-cluster replication, I think the best approach is distributed 
 log replication. This is hard, because logged ES operations must be 
 synchronized by a common clock (e.g. vector clocks) before they can be used 
 as a global event stream. A pubsub mechanism could then work at the 
 primary shards of an index, running in the ES node as a service and merging 
 the translogs for an external agent that has subscribed to the 
 replication stream. The vector clock is required for distributed, 
 time-machine-like behavior (snapshots), assuming the translog is not deleted 
 but kept for a certain time window.
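
 To make the vector clock idea above concrete, here is a minimal sketch in 
 plain Python. It is not how Elasticsearch or the translog works; the node 
 names ("dc1", "dc2") and the function names are hypothetical and only 
 illustrate how logged operations from several clusters could be ordered, 
 or recognized as concurrent, without a shared wall clock.

```python
from typing import Dict

# A vector clock maps a node name to the number of events seen from that node.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with the local node's counter incremented."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a is causally earlier than b (<= everywhere, < somewhere)."""
    nodes = set(a) | set(b)
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if neither clock precedes the other and they are not equal."""
    nodes = set(a) | set(b)
    equal = all(a.get(n, 0) == b.get(n, 0) for n in nodes)
    return not equal and not happened_before(a, b) and not happened_before(b, a)


# Two clusters log an operation independently, then dc1 sees dc2's state.
op_dc1 = tick({}, "dc1")
op_dc2 = tick({}, "dc2")
after_sync = tick(merge(op_dc1, op_dc2), "dc1")
```

 Operations whose clocks are concurrent are the ones that need an explicit 
 conflict policy; everything else can be replayed in causal order.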

 Jörg


 On Tue, Jul 23, 2013 at 3:55 PM, Vinicius Carvalho 
 vinicius...@gmail.com wrote:

 Thanks again Jorg. Just so you know, I'm actually considering using 
 Kafka for inter-cluster replication. We want to push the index operations 
 to a topic, and clusters in other DCs would then subscribe to 
 it. Conflict resolution will be last-commit-wins. In case of a 
 Kafka cluster failure, we will append changes to a local index and 
 send them over once the bus is back. If the ES cluster dies, one nice 
 thing about Kafka is that messages can be requested by offset, so after 
 recovery we could resume consuming from the last point the 
 cluster had consumed.
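
 A rough sketch of the plan above, under loud assumptions: the log is a plain 
 Python list standing in for a Kafka topic, the "index" is a dict standing in 
 for an ES index, and the IndexOp fields are hypothetical. No real Kafka or 
 Elasticsearch API is used. It only shows last-commit-wins combined with 
 resuming from a stored offset.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexOp:
    doc_id: str
    timestamp: int  # commit time set by the originating cluster (hypothetical)
    source: dict    # the document body


def replay(log: List[IndexOp], index: Dict[str, IndexOp], start_offset: int) -> int:
    """Apply operations from log[start_offset:] and return the next offset.

    An operation replaces the stored document only if its timestamp is not
    older than the stored one, so the last commit wins.
    """
    for op in log[start_offset:]:
        current = index.get(op.doc_id)
        if current is None or op.timestamp >= current.timestamp:
            index[op.doc_id] = op
    return len(log)


# The consuming cluster keeps the returned offset somewhere durable.
log = [
    IndexOp("1", 10, {"v": "a"}),
    IndexOp("1", 5, {"v": "stale"}),  # older commit, ignored
    IndexOp("2", 7, {"v": "b"}),
]
index: Dict[str, IndexOp] = {}
offset = replay(log, index, 0)

# After an outage, consumption resumes from the stored offset.
log.append(IndexOp("1", 12, {"v": "c"}))
offset_after_recovery = replay(log, index, offset)
```

 Last-commit-wins by timestamp depends on the clusters' clocks agreeing 
 closely enough; with skewed clocks an older write can silently win.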

 These are all ideas I'm working on right now. I'll probably have time to 
 start coding them soon. Thanks for all the support :)

 Cheers

   -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/37b1c902-a74d-4c35-bc41-5e1d5e76e72d%40googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.






Re: Notifications from Elasticsearch when documents are added.

2014-06-27 Thread Matthew Parrott
I found this note:
http://www.elasticsearch.org/blog/1-0-0-beta2-released/

Which mentions:
Later on we plan on making cross data-center replication possible by 
adding the ability to do incremental restores into a read-only index.

Is that feature still on the roadmap?

Thanks




Re: cross data center replication

2014-06-27 Thread Matthew Parrott
I'm interested in this too.
es-reindex seems to lack conflict resolution and, as noted in the 
docs, would be better implemented as a river.

On Wednesday, June 4, 2014 9:03:37 PM UTC-7, Todd Nine wrote:

 Hey all,
  
 Sorry to resurrect a dead thread.  Did you ever find a solution for 
 eventual consistency of documents across EC2 regions?

 Thanks,
 todd



 On Wednesday, May 1, 2013 5:50:00 AM UTC-7, Norberto Meijome wrote:

 +1 on all of the above. es-reindex already in my list of things to 
 investigate (for a number of issues...)

 cheers,
 b 


 On Wed, May 1, 2013 at 6:58 AM, Paul Hill pare...@gmail.com wrote:

 On 4/23/2013 8:44 AM, Daniel Maher wrote:

 On 2013-04-23 5:22 PM, Saikat Kanjilal wrote:

 Hello Folks,
 [...] does ES out of the box currently support cross data
 center replication,  []


 Hello,

 I'd wager that the question you're really asking about is how to 
 control where shards are placed; if you can make deterministic statements 
 about where shards are, then you can create your own rack-aware or data 
 centre-aware scenarios.  ES has supported this out of the box for well 
 over a year now (possibly longer).

 You'll want to investigate zones and routing allocation, which are 
 the key elements of shard placement.  There is an excellent blog post 
 which describes exactly how to set things up here:
 http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/
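
 As a rough illustration of the zone-based placement described above, the 
 sketch below builds a request body for the cluster settings API. The 
 setting names are the allocation awareness settings as I understand them; 
 check them against the documentation for your version. The attribute name 
 "zone" and the values "dc1" and "dc2" are hypothetical, and each node must 
 also be tagged with the matching attribute in its own configuration.

```python
import json
from typing import Dict, List


def awareness_settings(attribute: str, values: List[str]) -> Dict[str, dict]:
    """Build a cluster settings body enabling allocation awareness.

    The 'force' entry asks the cluster not to pile all copies of a shard
    into the zones that happen to be up when another zone is missing.
    """
    prefix = "cluster.routing.allocation.awareness"
    return {
        "persistent": {
            f"{prefix}.attributes": attribute,
            f"{prefix}.force.{attribute}.values": ",".join(values),
        }
    }


# Body that could be sent to the cluster settings endpoint.
body = awareness_settings("zone", ["dc1", "dc2"])
print(json.dumps(body, indent=2))
```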

  Is shard allocation really the correct solution if the data centers 
 are globally distributed?

 If I have a data center in the US intended to serve data from the US, 
 but it should also have access to Europe and Asia data, and clusters in 
 both Europe and Asia with similar needs, would I really want to use zones 
 etc. and have one great global cluster with data center aware 
 configurations?

 Assuming that the US would be happy to deal with old documents from Asia 
 and Europe when Asia or Europe is offline or just not caught up, it would 
 seem that you would NOT want a world cluster, because I can't picture how 
 you'd configure a 3-part world cluster to index into the right 
 indices and search the right (possibly combined) shards while also 
 preventing split brain.

 In the scenario I've described, I would think each data center might 
 better provide availability and eventual consistency (with less concern for 
 the remote data from the other regions) by having three clusters and some 
 type of syncing from one index to copies at the other two locations.  For 
 example, the US datacenter might have a US, copyOfEurope, and copyOfAsia 
 index.
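
 The layout above can be sketched as a small helper that tells each data 
 center which index it writes to and which indices it searches. This is 
 only an illustration: the region list and function names are hypothetical, 
 and the index names are lowercased (for example "copy_of_europe" instead 
 of copyOfEurope) because Elasticsearch index names must be lowercase.

```python
from typing import List, Tuple

# Hypothetical set of regions, one cluster per region.
REGIONS = ("us", "europe", "asia")


def copy_name(region: str) -> str:
    """Name of the local, read-only copy of a remote region's index."""
    return f"copy_of_{region}"


def indices_for(local: str) -> Tuple[str, List[str]]:
    """Return (index to write to, indices to search) for one data center.

    Writes go only to the local region's index. Searches cover the local
    index plus the possibly stale copies synced from the other regions.
    """
    if local not in REGIONS:
        raise ValueError(f"unknown region: {local}")
    read = [local] + [copy_name(r) for r in REGIONS if r != local]
    return local, read


write_index, search_indices = indices_for("us")
```

 Because each index has exactly one writing cluster, the sync between 
 regions is one-directional per index and needs no conflict resolution.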

 Does anyone have any observations about such a world-wide scenario?
 Are there any index-to-index copy utilities?
 Is there a river or other plugin that might be useful for this scenario 
 of three clusters working together?
 How about the project https://github.com/karussell/elasticsearch-reindex?
 Comments?

 -Paul







 -- 
 Norberto 'Beto' Meijome
  




Re: Notifications from Elasticsearch when documents are added.

2014-06-26 Thread Matthew Parrott
Hi!

Have there been any further explorations in the area of WAN replication?

I have ES clusters in multiple datacenters connected via a high-speed private 
network. I'm wondering if multi-master replication would be possible in 
this environment, or if we'd need some type of 'shovel' plugin like the one 
described here to ship data between the DCs.

Thanks,
Matthew




multimaster over wan

2014-03-30 Thread Matthew Parrott
Hi,

I'm wondering if ES supports a multi-master configuration over WAN. Let's 
say I have node A and node B and they're in different parts of the world. 
Can I write to the instance closest to my location and have the data 
automatically replicated to the other?

Reading through the cluster configuration docs, it wasn't quite clear if 
this was possible.

Thanks,
Matthew
