I appreciate the explanation.  That seems to confirm what I was thinking,
that until regions are working 100% we'll just have to make sure the
DC-to-DC links are as stable/redundant as possible to prevent HA issues.
If we increase the HA delay it shouldn't be a major issue, and it will
still be better than nothing.
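
To illustrate the trade-off, here is a rough sketch only. The function
name and the numbers are hypothetical, and this is not CloudStack's actual
HA logic; it just models "only act once a host has been unreachable for
longer than the configured delay":

```python
def should_trigger_ha(unreachable_seconds: float, ha_delay_seconds: float) -> bool:
    """Hypothetical model: act only when the outage outlasts the delay."""
    return unreachable_seconds > ha_delay_seconds


# Illustrative outage lengths in seconds: a brief blip, a short
# maintenance window, and a real failure.
outages = [5, 45, 600]

for delay in (30, 120):
    triggered = [s for s in outages if should_trigger_ha(s, delay)]
    print(f"delay={delay}s -> HA would trigger for outages of {triggered}")
```

With the longer delay, only the sustained outage would trigger HA, at the
cost of a slower reaction when a host really is down.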

For us it probably also makes sense to not worry about having management
servers in each DC for now.  If we have a big enough outage in our primary
DC to affect access to the management server we probably have bigger
problems to worry about.

Much appreciated!


Thank You,

Logan Barfield
Tranquil Hosting

On Wed, Jan 7, 2015 at 12:15 PM, Simon Weller <swel...@ena.com> wrote:

> Logan,
>
> We currently run CS in multiple geographically separate DCs, and may be
> able to give you a little insight into things.
>
> We run KVM in advanced networking mode, with CLVM clusters backed onto
> Dell Compellent SANs. We currently have different DCs running different
> zones per DC, in a single region. We've been running CS in production now
> since 4.0 prior to regions, so that functionality (along with its
> limitations) hasn't been something we've adopted yet. We run our Management
> (With Multiple clustered nodes) out of 1 DC, and have a backup set of
> Management Nodes in another DC should we need to invoke BCDR in the event
> the primary Management nodes became unavailable.
>
> Your concerns regarding HA problems are well founded. We run our own nationwide
> MPLS backbone, and therefore have multiple high capacity bandwidth paths
> between our different DCs, and even with that capacity and fault tolerant
> design, we've seen issues where Management has attempted to invoke HA due
> to brief loss of connectivity (typically due to maintenance or grooming
> activity), and this can be quite problematic. VPN tunnels are going to be
> very challenging for you, and you really need to look at VPLS or some other
> technology that can layer on top of a resilient infrastructure with
> multiple paths and fast failover (e.g. MPLS Fast Reroute).
>
> Ideally, regions should solve this with dedicated local management nodes,
> but until the syncing is sorted out, and those newer releases are stable,
> there isn't much option other than using a single region right now, short of
> setting up completely separate CS instances per DC.
>
> Hope this helps a little.
>
> - Si
>
> ________________________________________
> From: Logan Barfield <lbarfi...@tqhosting.com>
> Sent: Tuesday, January 06, 2015 1:45 PM
> To: dev@cloudstack.apache.org; us...@cloudstack.apache.org
> Subject: Multi-Datacenter Deployment
>
> We are currently running a single location CloudStack deployment:
> - 1 Hardware firewall
> - 1 Management/Database Server
> - 1 NFS staging store (for S3 secondary storage)
> - Ceph RBD for primary storage
> - 4 Hypervisors
> - 1 Zone/Pod/Cluster
>
> We are looking to expand our deployment to other datacenters, and I'm
> trying to determine the best way to go about it.  The documentation is a
> bit lacking for multi-site deployments.
>
> Our goal for the multi-site deployment is to have a zone for each site
> (E.G. US East, US West, Europe) that our customers can use to deploy
> instances in their preferred geographic area.
>
> Since we don't want to have different accounts for every datacenter, I
> don't think using Regions makes sense for us (and I'm not sure what they're
> actually good for without keeping accounts/users/domains in sync).
>
> Right now I'm thinking our setup will be as follows:
> - Firewall, Management Server, NFS staging server, primary storage, and
> Hypervisors in each datacenter.
> - All Management servers will be on the same management network.
> - Management servers will be connected via site-to-site VPN links over WAN.
> - MySQL replication (Percona?) will be set up on the management servers.
> Having an odd number of servers to protect against split brain, and keeping
> redundant database backups.
> - One region (default)
> - One zone for each datacenter
> - Geo-enabled DNS to direct customers to the nearest Management server
> - Object storage for secondary storage across cloud.
>
> My primary concerns with this setup are:
> - I haven't really seen multi-site deployment details anywhere.
> - Potential for split-brain.
> - How will HA be handled (e.g., if a VPN link goes down and one of the
> remote management servers can't contact a host, will it try to initiate
> HA?) - This sort of goes along with the split brain problem.
>
> Are my assumptions here sound, or is there a standard recommended way of
> doing multi-site deployments?
>
> Any suggestions are much appreciated.
>
