On 15/08/13 04:54, Digimer wrote:
On 14/08/13 10:58, Christian Völker wrote:
Hi all,

I'm planning to use DRBD in a production environment. I prefer to use
CentOS as base system.

The reason to use DRBD is the synchronisation, not the high availability.

We'll have two locations connected through a 100 Mbit line. At both
locations users will access the data at the same time, so I know I have
to use a cluster-aware filesystem.

I'm a little bit unsure about the performance: of course it will slow
down all access to the device which might be secondary. But are there
any tweaks to improve the performance despite the slow 100 Mbit
connection?

So questions are:
Is CentOS6 with DRBD suitable for production use?
Which filesystem is recommended? GFS? ZFS (experimental?)?

Thanks & Greetings

Christian

First, the short answer: yes, DRBD on CentOS 6 is perfectly stable. I've used 8.3.{11~15} on CentOS 6.{0~4} in production without issue. I also use GFS2 partitions on all of my clusters, likewise without issue.
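
For what it's worth, putting GFS2 on top of the DRBD device is a one-liner. A minimal sketch, assuming a two-node cluster named "mycluster" and the DRBD device at /dev/drbd0 (both placeholders; the -t value must match the cluster name in your cluster configuration):

    # Two journals, one per node; lock_dlm is required for shared access.
    mkfs.gfs2 -p lock_dlm -t mycluster:shared_fs -j 2 /dev/drbd0

    # Mount on both nodes once the resource is Primary/Primary.
    mount /dev/drbd0 /mnt/shared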

If you want both locations to have simultaneous access to the storage / filesystem, then you need a cluster-aware filesystem and you need to run DRBD in dual-primary. This, in turn, requires the use of "Protocol C", which says that DRBD will not tell the caller that a write has completed until it has hit persistent storage on both nodes. This effectively makes your storage performance that of the speed/latency of your network link. A 100 Mbit link moves at most 12.5 MB/sec in theory, and protocol overhead eats into that, so your raw write speeds will never exceed ~11-12 MB/sec. The write latency will also be the network link's latency plus the storage latency.
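
For reference, the dual-primary side of this boils down to a few lines in the resource config. A minimal sketch for DRBD 8.3; the resource name, hostnames, backing disks and IPs are all placeholders you'd replace with your own:

    resource r0 {
      protocol C;                        # writes acknowledged only once both nodes have them
      net {
        allow-two-primaries;             # required for dual-primary / GFS2
        # split-brain recovery policies; review these carefully for your setup
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
      }
      on node1.example.com {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on node2.example.com {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }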

Performance will not be stellar.

What you're proposing is called a "stretch cluster" and it's notoriously hard to do well.

There is a further complication to your plan though: it will be nearly impossible to differentiate a broken link from a failed remote server, so your network link becomes a single point of failure... If the link breaks, both nodes will block and call a fence against their peer. The fence will fail because the link to the fence device is lost, so the nodes will remain blocked and your storage will hang (better to hang than to risk corruption). The fence actions will remain pending for however long it takes to repair the link, and then both will try to fence the other at the exact same time. There is a chance that, post network repair, both nodes will get fenced and you will have to manually boot them back up.
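
The DRBD side of that fencing behaviour is configured in the resource as well. A rough sketch; note that the handler scripts are stack-specific (crm-fence-peer.sh ships with DRBD for Pacemaker clusters, while cman/rgmanager clusters use an RHCS handler such as rhcs_fence or obliterate-peer.sh), so treat the exact paths below as assumptions to verify against your install:

      disk {
        fencing resource-and-stonith;    # block I/O and call the fence handler on peer loss
      }
      handlers {
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }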

There is yet another concern: corosync expects low-latency networks (corosync being the communication and membership layer of the cluster). So you will need to allocate time to tweaking the corosync timeouts to handle your high-latency network. If there is an intermittent blip in your network that exceeds corosync's timeouts, the cluster will partition and one or both of the nodes will be fenced, as per the issue above.
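
Those timeouts live in corosync's totem section. A sketch only; the numbers below are illustrative assumptions and have to be tuned against the real latency and jitter of your WAN link (and on a cman-based CentOS 6 cluster you would set the equivalent values via the <totem> tag in cluster.conf rather than editing corosync.conf directly):

    totem {
        version: 2
        # Raise the token timeout well above the link's worst-case round-trip time.
        token: 10000
        token_retransmits_before_loss_const: 10
        # consensus must be larger than token (>= 1.2x is the usual rule of thumb).
        consensus: 12000
    }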

You said that "HA" is not your highest concern, so this might be an acceptable risk... you have to make that call. The software is stable; your implementation, however, may not be.

digimer


Just to jump in on this....

If one of those nodes were serving a small branch office, could you configure the small-office node to always shut down / disable access / go offline on link failure, and the big office to become a single primary? Then on recovery, the small-office node reconnects, resyncs any updates, and then returns to dual primary?
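
Done by hand, that sequence would look roughly like the sketch below (assuming a resource named r0 and the mount point from earlier; automating it safely is the hard part, since you still need fencing to be sure the small-office node really did stand down):

    # On the small-office node, when the link drops (or before planned maintenance):
    umount /mnt/shared
    drbdadm secondary r0

    # On recovery, reconnect and let it resync from the big office:
    drbdadm connect r0
    # Once 'drbdadm dstate r0' reports UpToDate/UpToDate, promote and remount:
    drbdadm primary r0
    mount /dev/drbd0 /mnt/shared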

I don't know if this is applicable for Christian, but it is something I've considered previously.


Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
