On 14/06/2012 03:01, Jake Smith wrote:
----- Original Message -----
From: "Arnold Krille" <arn...@arnoldarts.de>
To: drbd-user@lists.linbit.com
Sent: Wednesday, June 13, 2012 3:04:04 PM
Subject: Re: [DRBD-user] Corosync Configuration

On 13.06.2012 17:56, William Seligman wrote:
A data point:

On my cluster, I have two dedicated direct-link cables between the two
nodes, one for DRBD traffic, the other for corosync/pacemaker traffic.
Roughly once per week, I get a "link down" message on one of the nodes:
A) Use several communication rings in corosync. We use one on the
regular user network and a second on the storage network. If one fails,
no problem: corosync sees no need to fence anything.
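
(For illustration, a redundant-ring totem section in corosync.conf
looks roughly like this; the network addresses below are made-up
placeholders, not the real ones:)

  totem {
          version: 2
          rrp_mode: passive
          interface {
                  ringnumber: 0
                  bindnetaddr: 192.168.1.0   # regular user network (example)
                  mcastaddr: 226.94.1.1
                  mcastport: 5405
          }
          interface {
                  ringnumber: 1
                  bindnetaddr: 10.17.0.0     # storage network (example)
                  mcastaddr: 226.94.1.2
                  mcastport: 5407
          }
  }
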
B) Use bonded/bridged interfaces for the storage connection. We
currently have our storage network (aka vlan17) tagged on eth0 of all
the servers and untagged on eth1, using a bond in active-backup mode
where eth1 is the primary and the vlan17 interface the backup.
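
(Again only as a sketch, in Debian-style /etc/network/interfaces syntax
with placeholder names and addresses; whether a tagged vlan interface
like eth0.17 can be enslaved this way depends on the distro/kernel, so
treat it as an illustration of the idea rather than a tested config:)

  auto bond0
  iface bond0 inet static
          address 10.17.0.11          # placeholder storage address
          netmask 255.255.255.0
          bond-slaves eth1 eth0.17    # eth1 untagged, eth0.17 = tagged vlan17
          bond-mode active-backup
          bond-primary eth1
          bond-miimon 100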

With these two in place I didn't even notice when my boss unplugged the
network cables of one of our servers one by one. Neither did DRBD feel
any glitch, nor did the cluster see a need to move/kill/fence anything.
And a 5-second hang for the x2go sessions on one of the machines doesn't
matter when everyone is on break.

I haven't yet figured out how to build the bridges/bonds when all the
servers have 4 NICs. But that won't be a real problem until I have also
done functionality tests with two (or three) new switches.
I think I will do one bridge of two ports with RSTP for the normal user
network and one bridge of two ports with RSTP for the storage network,
then skip the active-backup bonding and let RSTP find the paths. Of
course this wouldn't necessarily improve throughput between two nodes,
but throughput from one node to two nodes would probably be higher.
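
(As a rough sketch of what such a bridge could look like per server, in
Debian-style /etc/network/interfaces syntax; names and addresses are
placeholders, and note that the stock kernel bridge only speaks classic
STP, so real RSTP would need a user-space daemon such as mstpd:)

  auto br17
  iface br17 inet static
          address 10.17.0.11        # placeholder storage address
          netmask 255.255.255.0
          bridge_ports eth2 eth3    # the two storage-facing ports
          bridge_stp on             # classic STP; RSTP needs e.g. mstpd
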
Or I extend my current setup and, instead of eth0 and eth1, use one
pair of bonded ports for each. That would give me a total of three
bonds per server: two in one of the 'real' modes and one in
active-backup mode...

This whole thing is somewhat off-topic for DRBD and we should probably
move the thread to the pacemaker mailing list, but since no one has
complained I'll chime in with our setup :-)

We have a pair of bonded NICs in each server, using round robin,
directly connected to each other for DRBD sync and the first corosync
ring.
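
(For anyone wanting to replicate that, a back-to-back balance-rr bond
in Debian-style /etc/network/interfaces looks roughly like this on each
node; the interface names and address are placeholders, not our exact
config:)

  auto bond0
  iface bond0 inet static
          address 10.0.0.1          # peer node would use 10.0.0.2
          netmask 255.255.255.0
          bond-slaves eth2 eth3     # the directly cabled pair
          bond-mode balance-rr
          bond-miimon 100           # link monitoring, reliable back to back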

Then we have a second pair of NICs on our regular network.  These use
802.3ad link aggregation through an HP 5412 chassis switch.  That switch
supports link aggregation between separate switch modules, so each of
the two NICs on each server is connected to a different module, giving
us greater fault tolerance as well as aggregation.  We run our second
corosync ring on that bond.
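
(A sketch of the server side of that bond, again in Debian-style
/etc/network/interfaces with placeholder names and addresses; the
switch side needs a matching LACP trunk spanning the two modules:)

  auto bond1
  iface bond1 inet static
          address 192.168.1.10      # placeholder LAN address
          netmask 255.255.255.0
          bond-slaves eth0 eth1     # one NIC to each switch module
          bond-mode 802.3ad
          bond-miimon 100
          bond-lacp-rate slow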

We are doing something similar in our storage setup: balance-rr (round
robin) with multiple ports directly between the two servers for the
first ring, and the LAN (no bonding) for the second ring.

Using a switch in the middle did /not/ work for us with balance-rr,
because miimon only detects a link failure on a back-to-back connection
(i.e. with balance-rr across vlans/switches, packets would be dropped if
san0's eth0 link to the switch was still up but eth0 on san1 was down).

We also use 802.3ad to diverse HP 2010 switches for the networking of
our KVM hosts and it works great.

This config theoretically gives us tolerance for 3 NIC/cable failures
without requiring a STONITH.
We've not had a STONITH event due to a link problem since we got the
setup running (over 9 months).

Jake
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

