Re: cross data center replication

2014-06-27 Thread Matthew Parrott
I'm interested in this too.
es-reindex seems to lack conflict resolution and, as noted in the 
docs, would be better implemented as a river.

On Wednesday, June 4, 2014 9:03:37 PM UTC-7, Todd Nine wrote:
>
> Hey all,
>  
> Sorry to resurrect a dead thread.  Did you ever find a solution for 
> eventual consistency of documents across EC2 regions?
>
> Thanks,
> todd
>
>
>
> On Wednesday, May 1, 2013 5:50:00 AM UTC-7, Norberto Meijome wrote:
>>
>> +1 on all of the above. es-reindex is already on my list of things to 
>> investigate (for a number of issues...)
>>
>> cheers,
>> b 
>>
>>
>> On Wed, May 1, 2013 at 6:58 AM, Paul Hill  wrote:
>>
>>> On 4/23/2013 8:44 AM, Daniel Maher wrote:
>>>
>>>> On 2013-04-23 5:22 PM, Saikat Kanjilal wrote:
>>>>
>>>>> Hello Folks,
>>>>> [...] does ES out of the box currently support cross data
>>>>> center replication,  []
>>>>
>>>> Hello,
>>>>
>>>> I'd wager that the question you're really asking about is how to 
>>>> control where shards are placed; if you can make deterministic statements 
>>>> about where shards are, then you can create your own "rack-aware" or "data 
>>>> centre-aware" scenarios.  ES has supported this "out of the box" for well 
>>>> over a year now (possibly longer).
>>>>
>>>> You'll want to investigate "zones" and "routing allocation", which are 
>>>> the key elements of shard placement.  There is an excellent blog post 
>>>> which describes exactly how to set things up here:
>>>> http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/

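The "zones" and "routing allocation" advice above can be sketched as a pair of settings bodies. This is a minimal sketch, not taken from the thread or the linked post: it assumes the `cluster.routing.allocation.awareness.*` cluster settings and the `index.routing.allocation.include.*` index setting, and it assumes every node has been started with a custom attribute named `zone`. The attribute name, the zone values, and both helper functions are hypothetical, and the way a node declares its attribute differs between Elasticsearch versions. The code only builds the JSON; it does not contact a cluster.

```python
import json


def awareness_settings(attribute, zones):
    """Build a cluster-settings body that spreads shard copies across zones.

    `attribute` is the (assumed) custom node attribute, e.g. "zone".
    `zones` lists every value that attribute can take.
    """
    return {
        "persistent": {
            # Tell the allocator which node attribute identifies a zone.
            "cluster.routing.allocation.awareness.attributes": attribute,
            # Forced awareness: never pile all copies into the surviving zone.
            f"cluster.routing.allocation.awareness.force.{attribute}.values": ",".join(zones),
        }
    }


def pin_index_to_zone(attribute, zone):
    """Build an index-settings body that keeps an index on nodes in one zone."""
    return {f"index.routing.allocation.include.{attribute}": zone}


if __name__ == "__main__":
    print(json.dumps(awareness_settings("zone", ["us", "eu"]), indent=2))
    print(json.dumps(pin_index_to_zone("zone", "us"), indent=2))
```

The first body would go to the cluster settings endpoint and the second to an index's settings endpoint. Note that this still describes one cluster stretched over several locations, which is the design questioned in the reply that follows.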
>>> Is shard allocation really the correct solution if the data centers 
>>> are globally distributed?
>>>
>>> If I have a data center in the US intended to serve data from the US, 
>>> but it should also have access to Europe and Asia data, and clusters in 
>>> both Europe and Asia with similar needs, would I really want to use zones 
>>> etc. and have one great global cluster with data center aware 
>>> configurations?
>>>
>>> Assuming that the US would be happy to deal with old documents from Asia 
>>> and Europe when Asia or Europe is offline or just not caught up, it would 
>>> seem that you would NOT want a "world" cluster, because I can't picture how 
>>> you'd configure a 3-part world cluster to index into the right indices and 
>>> search the right (possible combination of) shards while also preventing 
>>> "split brain".
>>>
>>> In the scenario I've described, I would think each data center might 
>>> better provide availability and eventual consistency (with less concern for 
>>> the remote data from the other regions) by having three clusters and some 
>>> type of syncing from one index to copies at the other two locations.  For 
>>> example, the US datacenter might have a US, copyOfEurope, and copyOfAsia 
>>> index.
>>>
>>> Anyone have any observations about such a world-wide scenario?
>>> Are there any index-to-index copy utilities?
>>> Is there a river or other plugin that might be useful for this scenario of 
>>> three clusters working together?
>>> How about the project https://github.com/karussell/elasticsearch-reindex ?
>>> Comments?
>>>
>>> -Paul
>>>
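The three-cluster layout described above (each data center owns its own index and holds read-only copies of the other two) can be sketched as a small lookup. This is a minimal sketch under stated assumptions: the region names and the `copyOf` prefix come from the example in the message, while `index_layout` and the returned dictionary shape are hypothetical. Elasticsearch requires lowercase index names, so a real deployment would need lowercased equivalents of these names.

```python
REGIONS = ("US", "Europe", "Asia")


def index_layout(local_region, regions=REGIONS):
    """Return which index a data center writes to and which ones it searches.

    The local index takes all writes; the copies are filled by whatever
    sync mechanism ships documents in from the other clusters.
    """
    if local_region not in regions:
        raise ValueError(f"unknown region: {local_region}")
    copies = [f"copyOf{region}" for region in regions if region != local_region]
    return {"write": local_region, "search": [local_region] + copies}


if __name__ == "__main__":
    for region in REGIONS:
        print(region, index_layout(region))
```

Because each cluster only ever writes to its own index, the copies can lag or go stale while a remote region is unreachable without blocking local indexing or search.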
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
>>
>>
>> -- 
>> Norberto 'Beto' Meijome
>>  
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/86f03167-6803-4bdd-9278-21b222e56d7c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Notifications from Elasticsearch when documents are added.

2014-06-27 Thread Matthew Parrott
I found this note:
http://www.elasticsearch.org/blog/1-0-0-beta2-released/

Which mentions:
"Later on we plan on making cross data-center replication possible by 
adding the ability to do incremental restores into a read-only index."

Is that feature still on the roadmap?

Thanks

On Friday, June 27, 2014 10:40:55 AM UTC-7, Matthew Parrott wrote:
>
> Hey!
>
> I have looked at tribes, but didn't look deeply because of this:
>
> "The merged view cannot handle indices with the same name in multiple 
> clusters."
>
> I'd like to have indexes replicated across datacenters. Is there a way to 
> accomplish that with tribes?
>
> Thanks!



Re: Notifications from Elasticsearch when documents are added.

2014-06-27 Thread Matthew Parrott
Hey!

I have looked at tribes, but didn't look deeply because of this:

"The merged view cannot handle indices with the same name in multiple 
clusters."

I'd like to have indexes replicated across datacenters. Is there a way to 
accomplish that with tribes?

Thanks!

On Friday, June 27, 2014 2:29:46 AM UTC-7, Jörg Prante wrote:
>
> Have you seen the Tribe Node? This is a kind of a "merged state" 
> multi-master cluster. 
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-tribe.html
>
> Jörg



Re: Notifications from Elasticsearch when documents are added.

2014-06-26 Thread Matthew Parrott
Hi!

Have there been any further explorations in the area of WAN replication?

I have ES clusters in multiple datacenters connected via a high-speed private 
network. I'm wondering if multi-master replication would be possible in 
this environment or if we'd need some type of 'shovel' plugin like the one 
described here to ship data between the DCs.

Thanks,
Matthew

On Tuesday, July 23, 2013 10:06:10 AM UTC-7, Jörg Prante wrote:
>
> Yes, I once examined Kafka, and discovered that many components are 
> already there in Elasticsearch. For example, the activity stream is already 
> there as the ES translog (if you focus on indexing operations) and the ES 
> gateway is a useful persistence mechanism. What I didn't like was the 
> single Kafka JVM and the ZooKeeper infrastructure; it all adds 
> complexity beside ES.
>
> For cross-cluster replication, I think the best approach is distributed 
> log replication. This is hard, because logged ES operations must be 
> synchronized by an external time source (e.g. vector clocks) to use them 
> like a global event stream. A pubsub mechanism could then work at the 
> primary shards of an index in the ES node as a service, merging the 
> translogs for an external agent that previously subscribed to the 
> replication stream. The vector clock is required for distributed, 
> time-machine-like behavior (snapshots), assuming the translog is not deleted 
> but stored for a certain time window.
>
> Jörg
>
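The role vector clocks play in ordering a merged operation stream can be sketched as follows. This is a minimal, generic vector clock, not anything Elasticsearch provides; the node names and all function names are hypothetical. Vector clocks are logical clocks rather than wall-clock time, which is what lets two clusters tell "this write happened after that one" apart from "these writes were concurrent".

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


def compare(a, b):
    """Classify clock `a` relative to `b`.

    Returns "equal", "before", "after", or "concurrent". Missing counters
    are treated as zero.
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


if __name__ == "__main__":
    us_write = increment({}, "us")      # a write accepted in the US cluster
    eu_write = increment({}, "eu")      # an independent write in the EU cluster
    print(compare(us_write, eu_write))  # neither write saw the other
    seen_both = increment(merge(us_write, eu_write), "us")
    print(compare(us_write, seen_both))
```

Only the "concurrent" case needs a conflict-resolution policy; operations ordered "before" or "after" can be replayed in that order.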
> On Tue, Jul 23, 2013 at 3:55 PM, Vinicius Carvalho wrote:
>
>> Thanks again Jörg. Just so you know, I'm actually considering using Kafka 
>> for inter-cluster replication. We want to push the index operations to a 
>> topic, and then other clusters in different DCs would subscribe to it. 
>> Conflict resolution will be last-commit-wins. In case of a Kafka 
>> cluster failure we will append changes to a local index, and then send them 
>> over once the bus is back. If the ES cluster dies, then when it recovers, 
>> one nice thing about Kafka is that one can request messages based on an 
>> offset, so we could resume consuming messages from the last point the 
>> cluster had consumed them.
>>
>> These are all ideas I'm working on right now. I'll probably have time to start 
>> coding them soon. Thanks for all the support :)
>>
>> Cheers
>>
>> 
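The plan quoted above (publish index operations to a topic, resolve conflicts by last write, resume from a stored offset after an outage) can be sketched without any Kafka dependency. This is a minimal in-memory simulation: `Topic` stands in for a single Kafka partition, `ReplicaCluster` stands in for a remote ES cluster, and every class, field, and method name here is hypothetical. Using a caller-supplied integer timestamp as the "last commit" criterion is also an assumption; the thread does not say how commits would be ordered.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """One index operation published by the originating cluster."""
    doc_id: str
    timestamp: int
    source: dict


class Topic:
    """In-memory stand-in for one topic partition: an append-only log."""

    def __init__(self):
        self._log = []

    def append(self, op):
        """Append an operation and return its offset."""
        self._log.append(op)
        return len(self._log) - 1

    def read_from(self, offset):
        """Return (offset, op) pairs starting at `offset`."""
        return list(enumerate(self._log[offset:], start=offset))


class ReplicaCluster:
    """Stand-in for a subscribing cluster in another data center."""

    def __init__(self):
        self.docs = {}        # doc_id -> (timestamp, source)
        self.next_offset = 0  # first offset not yet consumed

    def catch_up(self, topic):
        """Consume everything after the stored offset; return ops applied."""
        applied = 0
        for offset, op in topic.read_from(self.next_offset):
            current = self.docs.get(op.doc_id)
            # Last write wins: skip operations older than what we hold.
            if current is None or op.timestamp >= current[0]:
                self.docs[op.doc_id] = (op.timestamp, op.source)
                applied += 1
            self.next_offset = offset + 1
        return applied


if __name__ == "__main__":
    topic, replica = Topic(), ReplicaCluster()
    topic.append(Op("1", 10, {"title": "first"}))
    replica.catch_up(topic)
    # The replica is "down" while these arrive, then resumes by offset.
    topic.append(Op("1", 5, {"title": "stale"}))
    topic.append(Op("1", 20, {"title": "latest"}))
    print(replica.catch_up(topic), replica.docs, replica.next_offset)
```

Storing `next_offset` alongside the replica's data is what makes recovery cheap: after a restart the replica asks the log for everything from that offset onward instead of re-copying the index. Timestamp-based last-write-wins silently discards one side of a concurrent update, so it is only suitable where that loss is acceptable.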



multimaster over wan

2014-03-30 Thread Matthew Parrott
Hi,

I'm wondering if ES supports a multi-master configuration over WAN. Let's 
say I have node A and node B and they're in different parts of the world. 
Can I write to the instance closest to my location and have the data 
automatically replicated to the other?

Reading through the cluster configuration docs, it wasn't quite clear if 
this was possible.

Thanks,
Matthew
