Thought I'd share details of my setup, as I am effectively achieving this
(i.e. making monitors accessible over multiple interfaces) with IP routing,
as follows:
My Ceph hosts each have a /32 IP address on a loopback interface, and
that is the IP address that all their Ceph daemons are bound to. In
ceph.conf I do that by setting all the following values to the host's
loopback IP: "mon addr" in the [mon.x] sections; "cluster addr" and
"public addr" in the [mds.x] sections; and "cluster addr", "public
addr", and "osd heartbeat addr" in the [osd.x] sections. Then I use IP
routing to ensure that the hosts can all reach each other's loopback
IPs, and that clients can reach them too, over the relevant networks.
This also allows inter-host traffic to fail over to using alternate
paths if the normal path is down for some reason.
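To make the binding concrete, here is a minimal sketch of what one host's ceph.conf sections might look like under this scheme. The host name, section names, and the loopback address 10.255.0.1 are hypothetical, not taken from my actual config:

```ini
; Sketch for one host (node1), assuming 10.255.0.1/32 is its loopback IP.
[mon.node1]
    host = node1
    mon addr = 10.255.0.1

[mds.node1]
    host = node1
    cluster addr = 10.255.0.1
    public addr = 10.255.0.1

[osd.0]
    host = node1
    cluster addr = 10.255.0.1
    public addr = 10.255.0.1
    osd heartbeat addr = 10.255.0.1
```

With every daemon bound to the loopback IP, which physical interface the traffic enters or leaves on is purely a routing decision, which is what makes the multi-path setup below possible.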
To put this in context, my cluster is just a small 3-node cluster, and I
have a pair of layer 3 switches, with networking arranged like this:
node1:
has an 8 Gbps point-to-point routed link to node2 (and uses this link to
communicate with node2 under normal circumstances)
has an 8 Gbps point-to-point routed link to node3 (and uses this link to
communicate with node3 under normal circumstances)
has a 2 Gbps routed link to the primary layer 3 switch (clients which
are not other nodes in the cluster use this link to communicate with
node1 under normal circumstances, and traffic to nodes 2 and 3 will
switch to using this if their respective point-to-point links go down)
has a 1 Gbps routed link to the secondary layer 3 switch (this is not
used under normal circumstances - only if the 2 Gbps link goes down)
node2:
has an 8 Gbps point-to-point routed link to node3 (and uses this link to
communicate with node3 under normal circumstances)
has an 8 Gbps point-to-point routed link to node1 (and uses this link to
communicate with node1 under normal circumstances)
has a 2 Gbps routed link to the primary layer 3 switch (clients which
are not other nodes in the cluster use this link to communicate with
node2 under normal circumstances, and traffic to nodes 3 and 1 will
switch to using this if their respective point-to-point links go down)
has a 1 Gbps routed link to the secondary layer 3 switch (this is not
used under normal circumstances - only if the 2 Gbps link goes down)
node3:
has an 8 Gbps point-to-point routed link to node1 (and uses this link to
communicate with node1 under normal circumstances)
has an 8 Gbps point-to-point routed link to node2 (and uses this link to
communicate with node2 under normal circumstances)
has a 2 Gbps routed link to the primary layer 3 switch (clients which
are not other nodes in the cluster use this link to communicate with
node3 under normal circumstances, and traffic to nodes 1 and 2 will
switch to using this if their respective point-to-point links go down)
has a 1 Gbps routed link to the secondary layer 3 switch (this is not
used under normal circumstances - only if the 2 Gbps link goes down)
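The per-node routing described above can be expressed as static routes with different metrics, so the kernel prefers the point-to-point link and falls back automatically when a link loses carrier (the kernel drops routes on a downed interface, letting the next-best metric take over). This is an illustrative sketch for node1's routes towards node2; all addresses and interface names are hypothetical:

```shell
# Preferred path to node2's loopback: the 8 Gbps point-to-point link.
ip route add 10.255.0.2/32 via 192.168.12.2 dev p2p-node2 metric 10

# Fallback via the primary layer 3 switch (2 Gbps uplink).
ip route add 10.255.0.2/32 via 192.168.1.254 dev uplink1 metric 20

# Last resort via the secondary layer 3 switch (1 Gbps uplink).
ip route add 10.255.0.2/32 via 192.168.2.254 dev uplink2 metric 30
```

Equivalent sets of routes would exist on each node for each of the other nodes' loopback IPs.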
I have avoided using a proper routing protocol for this, as failover
still works automatically when links go down even with static routes. I
do, however, also have scripts running on the hosts that detect when the
device at the other end of a link is not pingable even though the link
is up, and dynamically remove/insert the routes as necessary in that
situation. But adapting this approach to a larger cluster, where
point-to-point links between all hosts aren't viable, might well warrant
the use of a routing protocol.
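The watchdog scripts I mentioned amount to something like the following per-link sketch. It is not my actual script, just an illustration of the idea; the peer loopback, next-hop address, and interface name are all placeholders:

```shell
#!/bin/sh
# Per-link watchdog sketch: withdraw the preferred route when the far
# end stops answering pings even though the link itself is still up.
PEER=10.255.0.2   # node2's loopback IP (hypothetical)
VIA=192.168.12.2  # far end of the point-to-point link (hypothetical)
DEV=p2p-node2     # interface for the point-to-point link (hypothetical)

if ping -c 3 -W 1 "$VIA" >/dev/null 2>&1; then
    # Far end answers: ensure the preferred route is in place.
    ip route replace "$PEER/32" via "$VIA" dev "$DEV" metric 10
else
    # Link is up but the far end is dead: remove the route so traffic
    # falls back to the higher-metric route via the layer 3 switch.
    ip route del "$PEER/32" via "$VIA" dev "$DEV" 2>/dev/null
fi
```

Run periodically (e.g. from cron or a loop), this covers the failure mode that static routes alone miss: a peer that is unreachable while the physical link still shows carrier.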
The end result is that I have more control over where the different
kinds of traffic go, and I can mess around with the networking without
any effect on the cluster.
Alex
On 20/12/2014 5:23 AM, Craig Lewis wrote:
On Fri, Dec 19, 2014 at 6:19 PM, Francois Lafont <flafdiv...@free.fr> wrote:
So, indeed, I have to use routing *or* maybe create 2 monitors
by server like this:
[mon.node1-public1]
host = ceph-node1
mon addr = 10.0.1.1
[mon.node1-public2]
host = ceph-node1
mon addr = 10.0.2.1
# etc...
But, in this case, the working directories of mon.node1-public1
and mon.node1-public2 will be in the same disk (I have no
choice). Is it a problem? Are monitors big consumers of I/O disk?
Interesting idea. While you will have an even number of monitors,
you'll still have an odd number of failure domains. I'm not sure if
it'll work though... make sure you test having the leader on both
networks. It might cause problems if the leader is on the 10.0.1.0/24
network?
Monitors can be big consumers of disk IO if there is a lot of cluster
activity. Monitors record all of the cluster changes in LevelDB, and
send copies to all of the daemons. There have been posts to the ML
about people running out of disk IOps on the monitors, and the
problems that causes. The bigger the cluster, the more IOps. As long
as you monitor and alert on your monitor disk IOps, I don't think it
would be a problem.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com