Re: Multi DC cluster or separate cluster per DC?
We are still thinking about the production configuration, and here is a short list of the advantages and disadvantages of a single cluster versus separate clusters.

Single cluster:
- (+) With a single cluster you perform a single query to the database. With a cluster per DC, each cluster needs to query the DB separately.
- (+) Data consistency. As a matter of fact, this is achieved by the single query to the DB.
- (+) You can introduce a new DC easily.
- (+) True active-active configuration.
- (-) Split brain, and a pretty complicated configuration to avoid split brain when the DC link is down.
- (-) The node.master setting cannot be changed at runtime (take a look at my first post and the split brain solution).
- (-) In case of a disaster we need to operate on a single DC. If you use a single cluster across 2 DCs, you can't really tell whether a single DC is strong enough to handle the query and indexing load.
- (-) In the pessimistic scenario data travels through the WAN 2 times (first for database replication, second for ES replication).
- (-) You can't really tell which node will respond to a query. Let's assume that you have the full index in each DC (forced awareness option). ES might decide to gather results from the remote DC and not from the local one, so you need to add WAN latency to your query time.
- (-) You need to turn off the whole cluster or perform rolling restarts during an upgrade.

Separate cluster per DC:
- (+) No split brain.
- (+) You can tell precisely when you are out of resources to handle the load in the ES cluster in each DC.
- (+) You can experiment with different settings in production. If something goes wrong, just switch clients to the standby DC.
- (+) Full failover. In case of any problems, just switch to the other DC.
- (+) Upgrades are easy and you have no downtime (upgrade the first DC, stabilize it, test it, and then do the same in the other DC).
- (+) Since these are 2 separate clusters, you can avoid data traveling through the WAN during queries. Each DC queries its nodes locally.
- (-) It is not a full active-active configuration. It's more like an active-standby configuration.
- (-) Data inconsistency might occur (different results when querying the local and the remote DC).
- (-) Each DC will query the DB separately. This will generate additional load on the DB.

Right now we think we should go for 2 separate clusters. DB load is the thing that worries me the most (we have a really complicated query with a lot of left joins). However, we think that in our case having two separate clusters has more advantages than disadvantages.

If you have some more arguments or comments, please let us know :)

Regards
Sebastian

On Monday, 12 May 2014 20:02:35 UTC+2, Deepak Jha wrote:

Having a separate cluster is definitely a better way to go. Or you can control the shard and replica placement so that they are always placed in the same DC. This way you can avoid inter-DC issues while still having a single cluster. I have a similar issue and I am looking at this as one of the alternatives.

On Saturday, May 10, 2014 1:05:08 AM UTC-7, Sebastian Łaskawiec wrote:

Thanks for the answer! We've been talking with several other teams in our company and it looks like this is the most recommended and stable setup.

Regards
Sebastian

On Wednesday, 7 May 2014 03:23:43 UTC+2, Mark Walkom wrote:

Go with the latter method and have two clusters. ES can be very sensitive to network latency, and you'll likely end up with more problems than it is worth. Given you already have the data source of truth being replicated, it's the sanest option to just read that locally.

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 23:51, Sebastian Łaskawiec sebastian...@gmail.com wrote:

Hi!

I'd like to ask for advice about deployment in a multi-DC scenario.

Currently we operate on 2 data centers in active/standby mode. In the case of ES we'd like to have a different approach: we'd like to operate in active-active mode (we want to optimize our resources, especially for querying).

Here are some details about the target configuration:
- 4 ES instances per DC. The full cluster will have 8 instances.
- Up to 1 TB of data
- Data pulled from the database using JDBC River
- The database is replicated asynchronously between DCs. Each DC will have its own database instance to pull data from.
- Average latency between DCs is several milliseconds
- We need to operate when the passive DC is down

We know that a multi-DC configuration might end up with a split brain issue. Here is how we want to prevent it:
- Set node.master: true only on the 4 nodes in the active DC
- Set node.master: false in the passive DC
- This way we'll be sure that a new cluster will not be created in the passive DC
- Additionally we'd like to set discovery.zen.minimum_master_nodes: 3 (to avoid split brain in the active DC)
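The "data inconsistency might occur" point in the list above could be monitored with a simple comparison of per-index document counts between the two clusters. The following is only a rough sketch: the function name `find_drift`, the index names, and the counts are made up for illustration, and how the counts are collected from each cluster is left out.

```python
def find_drift(local_counts, remote_counts, tolerance=0):
    """Return indices whose document counts differ by more than `tolerance`.

    Both arguments map index name -> document count, one dict per cluster.
    An index missing from one side is treated as having 0 documents there.
    """
    drift = {}
    for index in sorted(set(local_counts) | set(remote_counts)):
        local = local_counts.get(index, 0)
        remote = remote_counts.get(index, 0)
        if abs(local - remote) > tolerance:
            drift[index] = (local, remote)
    return drift


# Illustrative numbers only (not taken from the thread).
dc1 = {"products": 120000, "orders": 45210}
dc2 = {"products": 120000, "orders": 45180}

print(find_drift(dc1, dc2))                # {'orders': (45210, 45180)}
print(find_drift(dc1, dc2, tolerance=50))  # {}
```

Because the databases replicate asynchronously, some lag between the DCs is expected, which is why the sketch takes a tolerance instead of demanding equal counts.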
Re: Multi DC cluster or separate cluster per DC?
I am just wondering whether the Elasticsearch team has any plans to add features for multi-data center deployment (active-active)?

-Amit.

--
You received this message because you are subscribed to the Google Groups elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAAOGaQJU_Mb2Puk9kc0KcEwZeaQj2XaFdCrUCuVMMa%2BWt_289A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Re: Multi DC cluster or separate cluster per DC?
Having a separate cluster is definitely a better way to go. Or you can control the shard and replica placement so that they are always placed in the same DC. This way you can avoid inter-DC issues while still having a single cluster. I have a similar issue and I am looking at this as one of the alternatives.
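One possible way to keep all copies of an index in the same DC is Elasticsearch's shard allocation filtering (`index.routing.allocation.require.*`). The message above does not say which mechanism is meant, so treat this as an assumption. In the sketch below the attribute name `dc` and the values `dc1`/`dc2` are hypothetical, and every node would have to be started with a matching custom node attribute for the filter to have any effect.

```python
import json


def pin_index_to_dc(dc_name, attribute="dc"):
    """Build index settings that require all shards of an index to live on
    nodes whose custom attribute `attribute` equals `dc_name`."""
    return {"index.routing.allocation.require.{}".format(attribute): dc_name}


def settings_request_body(dc_name, attribute="dc"):
    """Serialize the settings as a JSON body for an index settings update."""
    return json.dumps(pin_index_to_dc(dc_name, attribute), sort_keys=True)


print(settings_request_body("dc1"))
# {"index.routing.allocation.require.dc": "dc1"}
```

The resulting body would be sent to the index's `_settings` endpoint. Note that pinning an index to one DC keeps replication traffic off the WAN, but queries coming from the other DC would still have to cross it.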
Re: Multi DC cluster or separate cluster per DC?
Thanks for the answer! We've been talking with several other teams in our company and it looks like this is the most recommended and stable setup.

Regards
Sebastian
Multi DC cluster or separate cluster per DC?
Hi!

I'd like to ask for advice about deployment in a multi-DC scenario.

Currently we operate on 2 data centers in active/standby mode. In the case of ES we'd like to have a different approach: we'd like to operate in active-active mode (we want to optimize our resources, especially for querying).

Here are some details about the target configuration:
- 4 ES instances per DC. The full cluster will have 8 instances.
- Up to 1 TB of data
- Data pulled from the database using JDBC River
- The database is replicated asynchronously between DCs. Each DC will have its own database instance to pull data from.
- Average latency between DCs is several milliseconds
- We need to operate when the passive DC is down

We know that a multi-DC configuration might end up with a split brain issue. Here is how we want to prevent it:
- Set node.master: true only on the 4 nodes in the active DC
- Set node.master: false in the passive DC
- This way we'll be sure that a new cluster will not be created in the passive DC
- Additionally we'd like to set discovery.zen.minimum_master_nodes: 3 (to avoid split brain in the active DC)

Additionally, there is a problem with switchover (the passive DC becomes active and the active one becomes passive). In our system it takes about 20 minutes, and this is the maximum length of our maintenance window. We were thinking of shutting down the whole ES cluster and switching the node.master setting in the configuration files (as far as I know this setting cannot be changed via the REST API). Then we'd need to start the whole cluster.

So my question is: is it better to have one big ES cluster operating on both DCs, or should we change our approach and create 2 separate clusters (and rely on database replication)?

I'd be grateful for advice.

Regards
Sebastian
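The value 3 for discovery.zen.minimum_master_nodes matches the usual majority rule for 4 master-eligible nodes, floor(n / 2) + 1. The sketch below is a simplified model with hypothetical function names (`quorum`, `can_elect_master`), not Elasticsearch's actual election code. It only illustrates which side of a broken DC link could still elect a master under the proposed layout, compared with making all 8 nodes master-eligible.

```python
def quorum(master_eligible_nodes):
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def can_elect_master(reachable_master_eligible, minimum_master_nodes):
    """A partition can elect a master only if it can reach at least
    `minimum_master_nodes` master-eligible nodes."""
    return reachable_master_eligible >= minimum_master_nodes


# Proposed layout: 4 master-eligible nodes in the active DC, none in the passive DC.
proposed_minimum = quorum(4)
link_down_proposed = {
    "active": can_elect_master(4, proposed_minimum),
    "passive": can_elect_master(0, proposed_minimum),
}

# Contrast: all 8 nodes master-eligible, DC link down (4 nodes on each side).
symmetric_minimum = quorum(8)
link_down_symmetric = {
    "dc1": can_elect_master(4, symmetric_minimum),
    "dc2": can_elect_master(4, symmetric_minimum),
}

print(proposed_minimum, link_down_proposed)    # 3 {'active': True, 'passive': False}
print(symmetric_minimum, link_down_symmetric)  # 5 {'dc1': False, 'dc2': False}
```

Under this model the proposed layout keeps the active DC working when the link or the passive DC is down, but the passive DC can never take over on its own, which is exactly why the switchover needs the configuration change and restart described above.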
Re: Multi DC cluster or separate cluster per DC?
Go with the latter method and have two clusters. ES can be very sensitive to network latency, and you'll likely end up with more problems than it is worth. Given you already have the data source of truth being replicated, it's the sanest option to just read that locally.

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
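With two clusters, the choice of which one to query moves into the client. A minimal sketch of a local-first selection with fallback to the other DC might look like the following. The endpoint URLs, `pick_cluster`, and the fake health check are all hypothetical; a real client would plug in its own health probe.

```python
def pick_cluster(endpoints, is_healthy):
    """Return the first healthy endpoint.

    `endpoints` is ordered by preference (local DC first), and `is_healthy`
    is any callable that takes an endpoint and returns True or False.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy Elasticsearch cluster available")


# Hypothetical endpoints for the two clusters.
LOCAL = "http://es.dc1.example:9200"
REMOTE = "http://es.dc2.example:9200"


def fake_health(down):
    """Build a health check that reports every endpoint in `down` as unhealthy."""
    return lambda endpoint: endpoint not in down


print(pick_cluster([LOCAL, REMOTE], fake_health(set())))    # http://es.dc1.example:9200
print(pick_cluster([LOCAL, REMOTE], fake_health({LOCAL})))  # http://es.dc2.example:9200
```

Queries stay inside the local DC in the normal case and cross the WAN only when the local cluster is unavailable.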