Logan,

We currently run CS in multiple geographically separate DCs, and may be able to 
give you a little insight into things.

We run KVM in advanced networking mode, with CLVM clusters backed by Dell
Compellent SANs. Each DC currently runs as its own zone, all within a single
region. We've been running CS in production since 4.0, prior
to regions, so that functionality (along with its limitations) hasn't been 
something we've adopted yet. We run our Management (with multiple clustered
nodes) out of one DC, and have a backup set of Management nodes in another DC
should we need to invoke BCDR in the event the primary Management nodes become
unavailable.

Your concerns regarding HA problems are well-founded. We run our own nationwide MPLS
backbone, and therefore have multiple high capacity bandwidth paths between our 
different DCs, and even with that capacity and fault tolerant design, we've 
seen issues where Management has attempted to invoke HA due to brief loss of 
connectivity (typically due to maintenance or grooming activity), and this can 
be quite problematic. VPN tunnels are going to be very challenging for you, and 
you really need to look at VPLS or some other technology that can layer on top 
of a resilient infrastructure with multiple paths and fast failover (e.g. MPLS 
Fast Reroute).

Ideally, regions should solve this with dedicated local management nodes, but
until the syncing is sorted out and those newer releases are stable, there
isn't much option right now other than using a single region, short of setting
up completely separate CS instances per DC.

Hope this helps a little.

- Si

________________________________________
From: Logan Barfield <lbarfi...@tqhosting.com>
Sent: Tuesday, January 06, 2015 1:45 PM
To: dev@cloudstack.apache.org; us...@cloudstack.apache.org
Subject: Multi-Datacenter Deployment

We are currently running a single location CloudStack deployment:
- 1 Hardware firewall
- 1 Management/Database Server
- 1 NFS staging store (for S3 secondary storage)
- Ceph RBD for primary storage
- 4 Hypervisors
- 1 Zone/Pod/Cluster

We are looking to expand our deployment to other datacenters, and I'm
trying to determine the best way to go about it.  The documentation is a
bit lacking for multi-site deployments.

Our goal for the multi-site deployment is to have a zone for each site
(e.g., US East, US West, Europe) that our customers can use to deploy
instances in their preferred geographic area.

Since we don't want to have different accounts for every datacenter, I
don't think using Regions makes sense for us (and I'm not sure what they're
actually good for without keeping accounts/users/domains in sync).

Right now I'm thinking our setup will be as follows:
- Firewall, Management Server, NFS staging server, primary storage, and
Hypervisors in each datacenter.
- All Management servers will be on the same management network.
- Management servers will be connected via site-to-site VPN links over WAN.
- MySQL replication (Percona?) set up on the management servers, with an odd
number of servers to protect against split brain, plus redundant database
backups.
- One region (default)
- One zone for each datacenter
- Geo-enabled DNS to direct customers to the nearest Management server
- Object storage for secondary storage across cloud.
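For illustration, if the replication layer ends up being Percona XtraDB Cluster (Galera) rather than plain async replication, a three-node setup along these lines would give you the odd-node quorum mentioned above. This is only a sketch; the cluster name, node names, and addresses below are placeholders, not anything from an actual deployment:

```ini
# /etc/my.cnf fragment for one node of a 3-node Percona XtraDB Cluster.
# An odd node count lets two surviving nodes keep quorum if one DC drops.
[mysqld]
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_name=cloudstack-mgmt
# One address per management server, one per DC (placeholder IPs):
wsrep_cluster_address=gcomm://10.0.1.10,10.0.2.10,10.0.3.10
wsrep_node_name=mgmt-dc1
wsrep_node_address=10.0.1.10
wsrep_sst_method=xtrabackup-v2
# Galera requires row-based replication and InnoDB:
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
```

Note that Galera's synchronous replication is latency-sensitive, so WAN round-trip times between DCs would directly affect commit latency on the management servers.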

My primary concerns with this setup are:
- I haven't really seen multi-site deployment details anywhere.
- Potential for split-brain.
- How will HA be handled (e.g., if a VPN link goes down and one of the
remote management servers can't contact a host, will it try to initiate
HA?) - This sort of goes along with the split brain problem.

Are my assumptions here sound, or is there a standard recommended way of
doing multi-site deployments?

Any suggestions are much appreciated.
