Re: Multi DC cluster or separate cluster per DC?

2014-05-15 Thread Sebastian Łaskawiec
We are still thinking about the production configuration. Here is a short 
list of the advantages and disadvantages of a single cluster versus a 
separate cluster per DC.

Single cluster:

   - (+) With a single cluster you run a single query against the 
   database. With a cluster per DC, each cluster needs to query the DB 
   separately
   - (+) Data consistency - as a matter of fact, this follows from the 
   single query to the DB
   - (+) You can introduce a new DC easily
   - (+) True active-active configuration
   - (-) Split brain, and a fairly complicated configuration to avoid it 
   when the DC link is down
   - (-) The node.master setting cannot be changed at runtime (take a look 
   at my first post and the split brain solution)
   - (-) After a disaster we need to operate on a single DC. With a single 
   cluster spanning 2 DCs you can't really tell whether one DC alone is 
   strong enough to handle the query and indexing load
   - (-) In the pessimistic scenario data travels through the WAN twice 
   (first for database replication, then for ES replication)
   - (-) You can't really tell which node will respond to a query. Let's 
   assume that you have a full index in each DC (forced awareness option). 
   ES might decide to gather results from the remote DC rather than the 
   local one, which adds WAN latency to your query time.
   - (-) You need to shut down the whole cluster or perform rolling 
   restarts during an upgrade

Separate cluster per DC:

   - (+) No split brain
   - (+) You can tell precisely when the ES cluster in each DC runs out of 
   resources to handle the load
   - (+) You can experiment with different settings in production. If 
   something goes wrong - just switch clients to the standby DC.
   - (+) Full failover - in case of any problems, just switch to the other 
   DC
   - (+) Upgrades are easy and cause no downtime (upgrade the first DC, 
   stabilize it, test it, and then do the same in the other DC)
   - (+) Since these are 2 separate clusters you can avoid data traveling 
   through the WAN during queries. Each DC queries its nodes locally.
   - (-) It is not a full active-active configuration. It's more like an 
   active-standby configuration
   - (-) Data inconsistency might occur (different results when querying 
   the local and the remote DC)
   - (-) Each DC will query the DB separately. This will put additional 
   load on the DB
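
The "just switch clients to the other DC" idea can be sketched on the client 
side roughly as below. This is only a minimal illustration, not something 
from our setup: the endpoint URLs and the `send` callable are hypothetical 
placeholders for whatever HTTP client the application really uses.

```python
from typing import Callable, Sequence


def query_with_failover(endpoints: Sequence[str],
                        send: Callable[[str], dict]) -> dict:
    """Try each cluster endpoint in order and return the first answer.

    `endpoints` lists the local DC first and the standby DC second.
    `send` is a hypothetical callable that runs the query against one
    endpoint and raises ConnectionError or TimeoutError when that
    cluster cannot be reached.
    """
    last_error = None
    for endpoint in endpoints:
        try:
            return send(endpoint)
        except (ConnectionError, TimeoutError) as exc:
            # Remember the failure and fall through to the next DC.
            last_error = exc
    raise RuntimeError("no cluster reachable") from last_error
```

Because each DC has its own cluster, a failed attempt never leaves the client 
talking to a half-partitioned cluster. The price is the possible data 
inconsistency between the two DCs listed above.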

Right now we think we should go for 2 separate clusters. DB load is the 
thing that worries me the most (we have a really complicated query with a 
lot of left joins). However, we think that in our case having a separate 
cluster per DC has more advantages than disadvantages.

If you have some more arguments or comments - please let us know :)

Regards
Sebastian


Re: Multi DC cluster or separate cluster per DC?

2014-05-14 Thread Amit Soni
I am just wondering whether the Elasticsearch team has any plans to add
features for multi-data-center deployment (active-active)?

-Amit.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAAOGaQJU_Mb2Puk9kc0KcEwZeaQj2XaFdCrUCuVMMa%2BWt_289A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Multi DC cluster or separate cluster per DC?

2014-05-12 Thread Deepak Jha
Having a separate cluster is definitely the better way to go. Or you can 
control shard and replica placement so that they are always placed in the 
same DC. That way you can avoid inter-DC issues while still having a single 
cluster. I have a similar issue and I am looking at this as one of the 
alternatives.
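
One possible reading of "control shard and replica placement" is index-level 
shard allocation filtering. The sketch below only builds the settings body 
that would be sent to the index `_settings` endpoint. It assumes every node 
was started with a custom attribute; the attribute name `dc` and the value 
`dc1` are hypothetical and not taken from this thread.

```python
import json


def allocation_filter_settings(attribute: str, value: str) -> dict:
    """Build index settings that require shards to live on nodes whose
    custom attribute `attribute` equals `value`.

    The key follows the pattern index.routing.allocation.require.<attr>.
    """
    return {f"index.routing.allocation.require.{attribute}": value}


# Hypothetical example: keep every shard copy of an index in DC "dc1".
body = json.dumps(allocation_filter_settings("dc", "dc1"))
```

Note that pinning all copies of an index to one DC keeps replication traffic 
off the WAN, but queries from the other DC still have to cross it.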







Re: Multi DC cluster or separate cluster per DC?

2014-05-10 Thread Sebastian Łaskawiec
Thanks for the answer! We've been talking with several other teams in our 
company and it looks like this is the most recommended and stable setup.

Regards
Sebastian







Multi DC cluster or separate cluster per DC?

2014-05-06 Thread Sebastian Łaskawiec
Hi!

I'd like to ask for advice about deployment in a multi-DC scenario.

Currently we operate on 2 data centers in active/standby mode. In case of 
ES we'd like to have a different approach - we'd like to operate in 
active-active mode (we want to optimize our resources, especially for 
querying). 
Here are some details about the target configuration:

   - 4 ES instances per DC. The full cluster will have 8 instances.
   - Up to 1 TB of data
   - Data pulled from the database using the JDBC River
   - The database is replicated asynchronously between DCs. Each DC will 
   have its own database instance to pull data from.
   - Average latency between DCs is several milliseconds
   - We need to keep operating when the passive DC is down

We know that a multi-DC configuration might end with a split brain issue. 
Here is how we want to prevent it:

   - Set node.master: true only on the 4 nodes in the active DC
   - Set node.master: false in the passive DC
   - This way we'll be sure that a new cluster will not be formed in the 
   passive DC
   - Additionally we'd like to set discovery.zen.minimum_master_nodes: 3 
   (to avoid split brain in the active DC)
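
The value 3 follows from the usual majority rule for master-eligible nodes, 
floor(n / 2) + 1. A minimal sketch of that arithmetic is below; the function 
names are made up for illustration and are not part of any ES API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(reachable_master_eligible: int, quorum: int) -> bool:
    """A partition can elect a master only if it still sees a quorum."""
    return reachable_master_eligible >= quorum


# Setup from this post: 4 master-eligible nodes, all in the active DC.
quorum = minimum_master_nodes(4)

# If the DC link goes down, the passive DC sees 0 master-eligible nodes.
passive_dc_can_elect = can_elect_master(0, quorum)
```

With all master-eligible nodes in the active DC, the passive DC can never 
reach the quorum on its own, which is the point of the node.master settings 
above.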

There is also a problem with switchover (the passive DC becomes active and 
the active one becomes passive). In our system it takes about 20 minutes, 
and this is the maximum length of our maintenance window. We were thinking 
of shutting down the whole ES cluster and switching the node.master setting 
in the configuration files (as far as I know this setting cannot be changed 
via the REST API). Then we'd need to start the whole cluster again.

So my question is: is it better to have one big ES cluster spanning both 
DCs, or should we change our approach and create 2 separate clusters (and 
rely on database replication)? I'd be grateful for any advice.

Regards
Sebastian



Re: Multi DC cluster or separate cluster per DC?

2014-05-06 Thread Mark Walkom
Go with the latter method and have two clusters. ES can be very sensitive
to network latency, and with a single cluster you'll likely end up with
more problems than it is worth.
Given that your source-of-truth data is already being replicated, the
sanest option is to just read it locally.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com



