[ceph-users] Re: Cluster network and public network
What is a saturated network with modern switched technologies? Links to individual hosts? Uplinks from TORs (public)? Switch backplane (cluster)?

> That is correct. I didn't explain it clearly. I said that because in
> some write-only scenarios the public network and the cluster network
> will both be saturated at the same time.
>
> linyunfan
>
> Janne Johansson wrote on Thu, 14 May 2020 at 15:42:
>>
>> On Thu, 14 May 2020 at 08:42, lin yunfan wrote:
>>>
>>> Besides the recovery scenario, in a write-only scenario the cluster
>>> network will use almost the same bandwidth as the public network.
>>
>> That would depend on the replication factor. If it is high, I would assume
>> every MB from the client network would make (repl-factor - 1) times the data
>> on the private network to send replication requests to the other OSD hosts
>> with the same amount of data.
>>
>> --
>> May the most significant bit of your life be positive.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Cluster network and public network
On Thu, 14 May 2020 at 10:46, Amudhan P wrote:

> Will an EC-based write benefit from a public network and a cluster network?

I guess this depends on what parameters you use. All in all I think using one
network is probably better. In the cases where I have seen missing heartbeats,
it was not the network that prevented packets from coming over; it was the
OSDs making themselves busy doing something else instead of heartbeating that
made them flip out. So if you can set it up with LACP or other kinds of
bonding, you allow the OSD hosts to use the network optimally regardless of
what state they are in.

> On Thu, May 14, 2020 at 1:39 PM lin yunfan wrote:
>
>> That is correct. I didn't explain it clearly. I said that because in
>> some write-only scenarios the public network and the cluster network
>> will both be saturated at the same time.
>>
>> linyunfan
>>
>> Janne Johansson wrote on Thu, 14 May 2020 at 15:42:
>>>
>>> On Thu, 14 May 2020 at 08:42, lin yunfan wrote:
>>>>
>>>> Besides the recovery scenario, in a write-only scenario the cluster
>>>> network will use almost the same bandwidth as the public network.

--
May the most significant bit of your life be positive.
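For context on "depends on what parameters you use": a rough back-of-envelope for EC back-network traffic, under the assumption that the primary OSD keeps one chunk and ships the remaining (k+m-1) chunks, each 1/k of the client write, to its peers. The figures are made up for illustration.

```shell
# Rough estimate of back-network traffic per client write (sketch).
# For a k+m erasure-coded pool, a write of S MB is cut into k data
# chunks of S/k MB plus m coding chunks; the primary keeps one chunk
# and sends the other (k + m - 1) over the cluster network.
client_mb=1000   # hypothetical client write rate, MB/s
k=8; m=2         # example EC profile: 8+2
backend_mb=$(( client_mb * (k + m - 1) / k ))
echo "$backend_mb"   # 1125 MB/s for 8+2, i.e. close to 1x the client rate
```

So for wide EC profiles the back network carries roughly the same load as the public network, versus (repl-factor - 1) times for replicated pools.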
[ceph-users] Re: Cluster network and public network
Will an EC-based write benefit from a public network and a cluster network?

On Thu, May 14, 2020 at 1:39 PM lin yunfan wrote:

> That is correct. I didn't explain it clearly. I said that because in
> some write-only scenarios the public network and the cluster network
> will both be saturated at the same time.
>
> linyunfan
>
> Janne Johansson wrote on Thu, 14 May 2020 at 15:42:
>>
>> On Thu, 14 May 2020 at 08:42, lin yunfan wrote:
>>>
>>> Besides the recovery scenario, in a write-only scenario the cluster
>>> network will use almost the same bandwidth as the public network.
>>
>> That would depend on the replication factor. If it is high, I would assume
>> every MB from the client network would make (repl-factor - 1) times the data
>> on the private network to send replication requests to the other OSD hosts
>> with the same amount of data.
>>
>> --
>> May the most significant bit of your life be positive.
[ceph-users] Re: Cluster network and public network
That is correct. I didn't explain it clearly. I said that because in some
write-only scenarios the public network and the cluster network will both be
saturated at the same time.

linyunfan

Janne Johansson wrote on Thu, 14 May 2020 at 15:42:
>
> On Thu, 14 May 2020 at 08:42, lin yunfan wrote:
>>
>> Besides the recovery scenario, in a write-only scenario the cluster
>> network will use almost the same bandwidth as the public network.
>
> That would depend on the replication factor. If it is high, I would assume
> every MB from the client network would make (repl-factor - 1) times the data
> on the private network to send replication requests to the other OSD hosts
> with the same amount of data.
>
> --
> May the most significant bit of your life be positive.
[ceph-users] Re: Cluster network and public network
On Thu, 14 May 2020 at 08:42, lin yunfan wrote:

> Besides the recovery scenario, in a write-only scenario the cluster
> network will use almost the same bandwidth as the public network.

That would depend on the replication factor. If it is high, I would assume
every MB from the client network would make (repl-factor - 1) times the data
on the private network to send replication requests to the other OSD hosts
with the same amount of data.

--
May the most significant bit of your life be positive.
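The (repl-factor - 1) rule of thumb above can be sketched numerically. The rates are assumed figures, for illustration only.

```shell
# Back-network traffic for a replicated pool (sketch): every client MB
# written to the primary OSD is re-sent (repl - 1) times to the replica
# OSDs over the cluster network.
client_mb=1000   # hypothetical client write rate, MB/s
repl=3           # replication factor (pool size=3)
echo $(( client_mb * (repl - 1) ))   # 2000 MB/s of replication traffic
```

With size=3, the back network carries twice the client write rate, which is why recommendations from the 1Gb/s era pushed for a dedicated cluster network.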
[ceph-users] Re: Cluster network and public network
Besides the recovery scenario, in a write-only scenario the cluster network
will use almost the same bandwidth as the public network.

linyunfan

Anthony D'Atri wrote on Sat, 9 May 2020 at 16:32:
>
>> Hi,
>>
>> I deployed a few clusters with two networks as well as only one network.
>> There has been little impact between them in my experience.
>>
>> I did a performance test on a nautilus cluster with two networks last week.
>> What I found is that the cluster network has low bandwidth usage
>
> During steady-state, sure. Heartbeats go over that, as do replication ops
> when clients write data.
>
> During heavy recovery or backfill, including healing from failures,
> balancing, adding/removing drives, much more will be used.
>
> Conventional wisdom has been to not let that traffic DoS clients, or
> clients to DoS heartbeats.
>
> But this I think dates to a time when 1Gb/s networks were common. If one's
> using modern multiple/bonded 25Gb/s or 40Gb/s links ...
>
>> while public network bandwidth is nearly full.
>
> If your public network is saturated, that actually is a problem; the last
> thing you want is to add recovery traffic, or to slow down heartbeats. For
> most people, it isn't saturated.
>
> How do you define "full"? TOR uplinks? TORs to individual nodes? Switch
> backplanes? Are you using bonding with the wrong hash policy?
>
>> As a result, I don't think the cluster network is necessary.
>
> For an increasing percentage of folks deploying production-quality
> clusters, agreed.
>
>> Willi Schiegel wrote on Fri, 8 May 2020 at 18:14:
>>
>>> Hello Nghia,
>>>
>>> I once asked a similar question about network architecture and got the
>>> same answer as Martin wrote from Wido den Hollander:
>>>
>>> There is no need to have a public and cluster network with Ceph. Working
>>> as a Ceph consultant I've deployed multi-PB Ceph clusters with a single
>>> public network without any problems. Each node has a single IP-address,
>>> nothing more, nothing less.
>>>
>>> In the current Ceph manual you can read:
>>>
>>> It is possible to run a Ceph Storage Cluster with two networks: a public
>>> (front-side) network and a cluster (back-side) network. However, this
>>> approach complicates network configuration (both hardware and software)
>>> and does not usually have a significant impact on overall performance.
>>> For this reason, we generally recommend that dual-NIC systems either be
>>> configured with two IPs on the same network, or bonded.
>>>
>>> I followed the advice from Wido, "One system, one IP address", and
>>> everything works fine. So, you should be fine with one interface for
>>> MONs, MGRs, and OSDs.
>>>
>>> Best
>>> Willi
>>>
>>> On 5/8/20 11:57 AM, Nghia Viet Tran wrote:
>>>> Hi Martin,
>>>>
>>>> Thanks for your response. You mean one network interface for only MON
>>>> hosts, or for the whole cluster including OSD hosts? I'm confused now
>>>> because there are some projects that only use one public network for
>>>> the whole cluster. That means the rebalancing, replicating objects and
>>>> heartbeats from OSD hosts would affect the performance of Ceph clients.
>>>>
>>>> *From: *Martin Verges
>>>> *Date: *Friday, May 8, 2020 at 16:20
>>>> *To: *Nghia Viet Tran
>>>> *Cc: *"ceph-users@ceph.io"
>>>> *Subject: *Re: [ceph-users] Cluster network and public network
>>>>
>>>> Hello Nghia,
>>>>
>>>> just use one network interface card and use frontend and backend traffic
>>>> on the same. No problem with that.
>>>>
>>>> If you have a dual port card, use both ports as an LACP channel and
>>>> maybe separate it using VLANs if you want to, but not required as well.
>>>>
>>>> --
>>>>
>>>> Martin Verges
>>>> Managing director
>>>>
>>>> Mobile: +49 174 9335695
>>>> E-Mail: martin.ver...@croit.io
>>>> Chat: https://t.me/MartinVerges
>>>>
>>>> croit GmbH, Freseniusstr.
>>>> 31h, 81247 Munich
>>>> CEO: Martin Verges - VAT-ID: DE310638492
>>>> Com. register: Amtsgericht Munich HRB 231263
>>>>
>>>> Web: https://croit.io
>>>> YouTube: ht
[ceph-users] Re: Cluster network and public network
Dear all, it looks like I need to be more precise:

>>> I think, however, that a disappearing back network has no real
>>> consequences as the heartbeats always go over both.
>>
>> FWIW this has not been my experience, at least through Luminous.
>>
>> What I've seen is that when the cluster/replication net is configured but
>> unavailable, OSD heartbeats fail and peers report them to the mons as
>> down. The mons send out a map accordingly, and the affected OSDs report
>> "I'm not dead yet!". Flap flap flap.
>
> +1. This has also been my experience. And it's quite hard to debug as
> well (confusing / seemingly contradictory messages).
>
> It uses the back network to replicate data ... and as long as it can't,
> (client) IO won't go through.

I did not mean to have a back network configured but then taken down. Of
course this won't work. What I mean is that you:

1. remove the cluster network definition from the cluster config (ceph.conf
   and/or ceph config ...)
2. restart OSDs to apply the change
3. remove the physical network

Step 2 will most likely require down time as you write, because during the
transition some OSDs will think all OSDs listen on 2 networks while other
OSDs think everyone is listening on 1 network. If you can afford to take all
clients down and do a full cluster restart, this is doable. If you set
noout,nodown,pause and maybe some other flags
(norebalance,nobackfill,norecover), wait for all client *and* recovery I/O to
complete, it is probably possible to do this transition without disconnecting
clients by just restarting all OSDs failure domain by failure domain.

After the transition things should work fine with just 1 network.

In any case, my recommendation would be to keep both networks if they are on
different VLAN IDs. Then, nothing special is required to do the transition.
This is what I did to simplify the physical networking (two logical networks,
identical physical networking).
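The three steps above can be sketched as commands. This is an outline under assumptions (the cluster_network setting lives in the mon config database, OSDs are restarted per host via systemd), not a tested runbook; adapt it to your deployment tooling and remove the setting from ceph.conf as well if present there.

```shell
# Step 1: drop the cluster network definition from the mon config db.
ceph config rm global cluster_network

# Guard against flapping while OSDs disagree about which networks exist.
ceph osd set noout
ceph osd set nodown

# Step 2: restart OSDs, one failure domain at a time, waiting for PGs
# to go active+clean between domains (run per host, not cluster-wide).
systemctl restart ceph-osd.target

ceph osd unset nodown
ceph osd unset noout
# Step 3 happens outside Ceph: reclaim the physical back network.
```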
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

____
From: Stefan Kooman
Sent: 13 May 2020 07:40
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster network and public network

On 2020-05-12 18:59, Anthony D'Atri wrote:
>
>> I think, however, that a disappearing back network has no real
>> consequences as the heartbeats always go over both.
>
> FWIW this has not been my experience, at least through Luminous.
>
> What I've seen is that when the cluster/replication net is configured but
> unavailable, OSD heartbeats fail and peers report them to the mons as down.
> The mons send out a map accordingly, and the affected OSDs report "I'm not
> dead yet!". Flap flap flap.

+1. This has also been my experience. And it's quite hard to debug as
well (confusing / seemingly contradictory messages).

It uses the back network to replicate data ... and as long as it can't,
(client) IO won't go through.

Gr. Stefan
[ceph-users] Re: Cluster network and public network
On 2020-05-12 18:59, Anthony D'Atri wrote:

>> I think, however, that a disappearing back network has no real
>> consequences as the heartbeats always go over both.
>
> FWIW this has not been my experience, at least through Luminous.
>
> What I've seen is that when the cluster/replication net is configured but
> unavailable, OSD heartbeats fail and peers report them to the mons as down.
> The mons send out a map accordingly, and the affected OSDs report "I'm not
> dead yet!". Flap flap flap.

+1. This has also been my experience. And it's quite hard to debug as
well (confusing / seemingly contradictory messages).

It uses the back network to replicate data ... and as long as it can't,
(client) IO won't go through.

Gr. Stefan
[ceph-users] Re: Cluster network and public network
> I did not mean to have a back network configured but then taken down. Of
> course this won't work. What I mean is that you:
>
> 1. remove the cluster network definition from the cluster config
>    (ceph.conf and/or ceph config ...)
> 2. restart OSDs to apply the change
> 3. remove the physical network
>
> Step 2 will most likely require down time as you write, because during the
> transition some OSDs will think all OSDs listen on 2 networks while other
> OSDs think everyone is listening on 1 network. If you can afford to take
> all clients down and do a full cluster restart, this is doable. If you set
> noout,nodown,pause and maybe some other flags
> (norebalance,nobackfill,norecover), wait for all client *and* recovery I/O
> to complete, it is probably possible to do this transition without
> disconnecting clients by just restarting all OSDs failure domain by
> failure domain.

Perhaps temporarily setting mon_osd_min_down_reporters to a large number
would help avoid flapping. I fear at least some [RBD] clients would still
experience timeouts / kernel panics though.

> After the transition things should work fine with just 1 network.
>
> In any case, my recommendation would be to keep both networks if they are
> on different VLAN IDs. Then, nothing special is required to do the
> transition, and this is what I did to simplify the physical networking
> (two logical networks, identical physical networking).
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ____
> From: Stefan Kooman
> Sent: 13 May 2020 07:40
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: Cluster network and public network
>
> On 2020-05-12 18:59, Anthony D'Atri wrote:
>>
>>> I think, however, that a disappearing back network has no real
>>> consequences as the heartbeats always go over both.
>>
>> FWIW this has not been my experience, at least through Luminous.
>>
>> What I've seen is that when the cluster/replication net is configured but
>> unavailable, OSD heartbeats fail and peers report them to the mons as
>> down. The mons send out a map accordingly, and the affected OSDs report
>> "I'm not dead yet!". Flap flap flap.
>
> +1. This has also been my experience. And it's quite hard to debug as
> well (confusing / seemingly contradictory messages).
>
> It uses the back network to replicate data ... and as long as it can't,
> (client) IO won't go through.
>
> Gr. Stefan
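A minimal sketch of the mon_osd_min_down_reporters suggestion, assuming the mon config database is in use. The values are examples; the upstream default for this option is 2 in recent releases, but verify the default for your release before restoring it.

```shell
# Raise the number of distinct OSD reporters required before a mon
# marks an OSD down, so isolated heartbeat failures during the restart
# window don't trigger flapping.
ceph config set mon mon_osd_min_down_reporters 10

# ... restart OSDs failure domain by failure domain ...

# Restore the default afterwards.
ceph config set mon mon_osd_min_down_reporters 2
```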
[ceph-users] Re: Cluster network and public network
> I think, however, that a disappearing back network has no real consequences
> as the heartbeats always go over both.

FWIW this has not been my experience, at least through Luminous.

What I've seen is that when the cluster/replication net is configured but
unavailable, OSD heartbeats fail and peers report them to the mons as down.
The mons send out a map accordingly, and the affected OSDs report "I'm not
dead yet!". Flap flap flap.

YMMV
[ceph-users] Re: Cluster network and public network
Hi MJ,

this should work. Note that when using cloned devices all traffic will go
through the same VLAN. In that case, I believe you can simply remove the
cluster network definition and use just one IP; there is no point having the
second IP on the same VLAN. You will probably have to set "noout,nodown" for
the flip-over, which probably requires a restart of each OSD. I think,
however, that a disappearing back network has no real consequences as the
heartbeats always go over both. There might be stuck replication traffic for
a while, but even this can be avoided with "osd pause".

Our configuration with 2 VLANs is this:

public network:
  ceph0.81: flags=4163 mtu 9000
cluster network:
  ceph0.82: flags=4163 mtu 9000

ceph0: flags=5187 mtu 9000
em1: flags=6211 mtu 9000
em2: flags=6211 mtu 9000
p1p1: flags=6211 mtu 9000
p1p2: flags=6211 mtu 9000
p2p1: flags=6211 mtu 9000
p2p2: flags=6211 mtu 9000

If you already have 2 VLANs with different IDs, then this flip-over is
trivial. I did it without service outage.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

____
From: mj
Sent: 12 May 2020 13:12:47
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster network and public network

Hi,

On 11/05/2020 08:50, Wido den Hollander wrote:
> Great to hear! I'm still behind this idea and all the clusters I design
> have a single (or LACP) network going to the host.
>
> One IP address per node where all traffic goes over. That's Ceph, SSH,
> (SNMP) Monitoring, etc.
>
> Wido

We have an 'old-style' cluster with a separated LAN/cluster network. We would
like to move over to the 'new-style'. Is it as easy as: define the NICs in a
2x10G LACP bond0, add both NICs to the bond0 config, and configure it like:

> auto bond0
> iface bond0 inet static
>     address 192.168.0.5
>     netmask 255.255.255.0

and add our cluster IP as a second IP, like:

> auto bond0:1
> iface bond0:1 inet static
>     address 192.168.10.160
>     netmask 255.255.255.0

On all nodes, reboot, and everything will work?
Or are there ceph specifics to consider?

Thanks, MJ
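For reference, a hedged sketch of the bonded, VLAN-tagged layout discussed above, expressed as iproute2 commands. Interface names, VLAN IDs, and addresses are assumptions taken from the examples in this thread; an LACP (802.3ad) bond also requires matching LAG configuration on the switch side, and a persistent setup would normally live in the distribution's network config files instead.

```shell
# Illustrative only: build an LACP bond with two tagged VLANs on top.
ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer3+4
ip link set em1 down; ip link set em1 master bond0
ip link set em2 down; ip link set em2 master bond0

ip link add link bond0 name bond0.81 type vlan id 81   # public VLAN
ip link add link bond0 name bond0.82 type vlan id 82   # cluster VLAN
ip addr add 192.168.0.5/24 dev bond0.81
ip addr add 192.168.10.160/24 dev bond0.82

ip link set bond0 up; ip link set em1 up; ip link set em2 up
ip link set bond0.81 up; ip link set bond0.82 up
```

With both VLANs riding the same bond, dropping the cluster network later is a config-only change, which is the "flip-over is trivial" point made above.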
[ceph-users] Re: Cluster network and public network
Hi,

On 11/05/2020 08:50, Wido den Hollander wrote:

> Great to hear! I'm still behind this idea and all the clusters I design
> have a single (or LACP) network going to the host.
>
> One IP address per node where all traffic goes over. That's Ceph, SSH,
> (SNMP) Monitoring, etc.
>
> Wido

We have an 'old-style' cluster with a separated LAN/cluster network. We would
like to move over to the 'new-style'. Is it as easy as: define the NICs in a
2x10G LACP bond0, add both NICs to the bond0 config, and configure it like:

auto bond0
iface bond0 inet static
    address 192.168.0.5
    netmask 255.255.255.0

and add our cluster IP as a second IP, like:

auto bond0:1
iface bond0:1 inet static
    address 192.168.10.160
    netmask 255.255.255.0

On all nodes, reboot, and everything will work? Or are there ceph specifics
to consider?

Thanks, MJ
[ceph-users] Re: Cluster network and public network
Hi Anthony and Phil,

since my meltdown case was mentioned and I might have a network capacity
issue, here is a question about why having separate VLANs for private and
public network might have its merits.

In the part of our ceph cluster that was overloaded (our cluster has 2 sites,
logically separate and physically different), I see a lot of dropped packets
on the spine switch, and it looks like it is the downlinks to the leaves
where the storage servers sit. I'm still not finished investigating, so a
network overload is still a hypothetical part of our meltdown. The question
below should, however, be interesting in any case, as it might help prevent
a meltdown in similarly set up clusters.

Our network connectivity is as follows: we have 1 storage server and up to
18 clients per leaf. The storage servers have 6x10G connectivity in an LACP
bond, and front- and back-network share all ports but are separated by VLAN.
The clients have 1x10G on the public network. Unfortunately, the uplinks
from leaf to spine switches are currently limited to 2x10G. We are in the
process of upgrading to 2x40G, so let's ignore fixing this temporary
bottleneck here (Corona got in the way) and focus on workarounds until we
can access the site again.

For every write, currently every storage server is hit (10 servers with 8+2
EC). Since we believed the low uplink bandwidth would be short-term only,
during a network upgrade, we were willing to accept it, assuming that the
competition between client and storage traffic would throttle the clients
sufficiently to result in a working system, maybe with reduced performance,
but not becoming unstable.

The question relevant to this thread: I kept the separation into public and
cluster network because it enables QOS definitions, which are typically per
VLAN. In my situation, what if the uplinks were saturated by the competing
client and storage-server traffic? Both run on the same VLAN, obviously.
The only way to make space for the OSD/heartbeat traffic would be to give the
cluster network VLAN higher priority over the public network via QOS
settings. This should at least allow the OSDs to continue checking heartbeats
etc. over a busy line. Is this correct?

This also raises a question I had a long time ago that was also raised by
Anthony. Why are the MONs not on the cluster network? If I can make a
priority line for the OSDs, why can't I make OSD-MON communication a priority
too?

While digging through heartbeat options as a consequence of our meltdown, I
found this one:

# ceph daemon osd.0 config show | grep heart
...
    "osd_heartbeat_addr": "-",
...
# ceph daemon mon.ceph-01 config show | grep heart
...
    "osd_heartbeat_addr": "-",
...

Is it actually possible to reserve a dedicated (third) VLAN with high QOS for
heartbeat traffic by providing a per-host IP address to this parameter? What
does this parameter do?

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

____
From: Anthony D'Atri
Sent: 09 May 2020 23:59:49
To: Phil Regnauld
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster network and public network

>> If your public network is saturated, that actually is a problem; the last
>> thing you want is to add recovery traffic, or to slow down heartbeats.
>> For most people, it isn't saturated.
>
> See Frank Schilder's post about a meltdown which he believes could have
> been caused by beacon/heartbeat traffic being drowned out by other
> recovery/IO traffic, not at the network level, but at the processing level
> on the OSDs.
>
> If indeed there are cases where the OSDs are too busy to send (or process)
> heartbeat/beacon messaging, it wouldn't help to have a separate network,
> would it?

Agreed. Many times I've had to argue that CPUs that aren't nearly saturated
*aren't* necessarily overkill, especially with fast media where latency
hurts. It would be interesting to consider an architecture where a core/HT
is dedicated to the control plane.
That said, I've seen a situation where excess CPU headroom appeared to affect
latency by allowing the CPUs to drop into C-states; this especially affected
network traffic (2x dual 10GE). Curiously, some systems in the same cluster
experienced this but some didn't. There was a mix of Sandy Bridge and Ivy
Bridge IIRC, as well as different Broadcom chips. Despite an apparent
alignment with older vs newer Broadcom chips, I never fully characterized
the situation; replacing one of the Broadcom NICs in an affected system with
the model in use on unaffected systems didn't resolve the issue. It's
possible that replacing the other would have made a difference.
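On the per-VLAN QOS idea raised above: one hedged way to express "the cluster VLAN gets priority" on the host side is the 802.1p egress map of a Linux VLAN device. This is a sketch; the VLAN IDs and the skb-priority-to-PCP mapping are assumptions, and the switches must actually honor PCP (e.g. with strict-priority queuing) for it to matter on a congested uplink.

```shell
# Illustrative only: tag the cluster VLAN with a higher 802.1p PCP than
# the public VLAN, so a PCP-aware switch services cluster/heartbeat
# traffic first when the link is saturated.
ip link add link bond0 name bond0.82 type vlan id 82 \
    egress-qos-map 0:5      # skb priority 0 -> PCP 5 (cluster)
ip link add link bond0 name bond0.81 type vlan id 81 \
    egress-qos-map 0:0      # skb priority 0 -> PCP 0 (public, best effort)
```

Whether this actually protects OSD heartbeats depends on where the congestion is; it does nothing for the processing-level starvation discussed elsewhere in this thread.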
[ceph-users] Re: Cluster network and public network
On 5/8/20 12:13 PM, Willi Schiegel wrote:
> Hello Nghia,
>
> I once asked a similar question about network architecture and got the
> same answer as Martin wrote from Wido den Hollander:
>
> There is no need to have a public and cluster network with Ceph. Working
> as a Ceph consultant I've deployed multi-PB Ceph clusters with a single
> public network without any problems. Each node has a single IP-address,
> nothing more, nothing less.
>
> In the current Ceph manual you can read:
>
> It is possible to run a Ceph Storage Cluster with two networks: a public
> (front-side) network and a cluster (back-side) network. However, this
> approach complicates network configuration (both hardware and software)
> and does not usually have a significant impact on overall performance.
> For this reason, we generally recommend that dual-NIC systems either be
> configured with two IPs on the same network, or bonded.
>
> I followed the advice from Wido, "One system, one IP address", and
> everything works fine. So, you should be fine with one interface for
> MONs, MGRs, and OSDs.

Great to hear! I'm still behind this idea and all the clusters I design have
a single (or LACP) network going to the host.

One IP address per node where all traffic goes over. That's Ceph, SSH,
(SNMP) Monitoring, etc.

Wido

> Best
> Willi
>
> On 5/8/20 11:57 AM, Nghia Viet Tran wrote:
>> Hi Martin,
>>
>> Thanks for your response. You mean one network interface for only MON
>> hosts, or for the whole cluster including OSD hosts? I'm confused now
>> because there are some projects that only use one public network for
>> the whole cluster. That means the rebalancing, replicating objects and
>> heartbeats from OSD hosts would affect the performance of Ceph clients.
>> *From: *Martin Verges
>> *Date: *Friday, May 8, 2020 at 16:20
>> *To: *Nghia Viet Tran
>> *Cc: *"ceph-users@ceph.io"
>> *Subject: *Re: [ceph-users] Cluster network and public network
>>
>> Hello Nghia,
>>
>> just use one network interface card and use frontend and backend
>> traffic on the same. No problem with that.
>>
>> If you have a dual port card, use both ports as an LACP channel and
>> maybe separate it using VLANs if you want to, but not required as well.
>>
>> --
>>
>> Martin Verges
>> Managing director
>>
>> Mobile: +49 174 9335695
>> E-Mail: martin.ver...@croit.io
>> Chat: https://t.me/MartinVerges
>>
>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>> CEO: Martin Verges - VAT-ID: DE310638492
>> Com. register: Amtsgericht Munich HRB 231263
>>
>> Web: https://croit.io
>> YouTube: https://goo.gl/PGE1Bx
>>
>> On Fri, 8 May 2020 at 09:29, Nghia Viet Tran wrote:
>>
>>> Hi everyone,
>>>
>>> I have a question about the network setup. From the document, it's
>>> recommended to have 2 NICs per host, as described in the picture below.
>>>
>>> [Diagram]
>>>
>>> In the picture, OSD hosts will connect to the cluster network for
>>> replication and heartbeats between OSDs; therefore, we definitely need
>>> 2 NICs for it. But it seems there are no connections between the Ceph
>>> MONs and the cluster network. Can we install 1 NIC on the Ceph MONs
>>> then?
>>>
>>> I appreciate any comments!
>>>
>>> Thank you!
>>>
>>> --
>>> Nghia Viet Tran (Mr)
[ceph-users] Re: Cluster network and public network
>> If your public network is saturated, that actually is a problem; the last
>> thing you want is to add recovery traffic, or to slow down heartbeats.
>> For most people, it isn't saturated.
>
> See Frank Schilder's post about a meltdown which he believes could have
> been caused by beacon/heartbeat traffic being drowned out by other
> recovery/IO traffic, not at the network level, but at the processing level
> on the OSDs.
>
> If indeed there are cases where the OSDs are too busy to send (or process)
> heartbeat/beacon messaging, it wouldn't help to have a separate network,
> would it?

Agreed. Many times I've had to argue that CPUs that aren't nearly saturated
*aren't* necessarily overkill, especially with fast media where latency
hurts. It would be interesting to consider an architecture where a core/HT
is dedicated to the control plane.

That said, I've seen a situation where excess CPU headroom appeared to affect
latency by allowing the CPUs to drop into C-states; this especially affected
network traffic (2x dual 10GE). Curiously, some systems in the same cluster
experienced this but some didn't. There was a mix of Sandy Bridge and Ivy
Bridge IIRC, as well as different Broadcom chips. Despite an apparent
alignment with older vs newer Broadcom chips, I never fully characterized
the situation; replacing one of the Broadcom NICs in an affected system with
the model in use on unaffected systems didn't resolve the issue. It's
possible that replacing the other would have made a difference.
[ceph-users] Re: Cluster network and public network
Anthony D'Atri (anthony.datri) writes:
>
> During heavy recovery or backfill, including healing from failures,
> balancing, adding/removing drives, much more will be used.
>
> Conventional wisdom has been to not let that traffic DoS clients, or
> clients to DoS heartbeats.
[...]
> If your public network is saturated, that actually is a problem; the last
> thing you want is to add recovery traffic, or to slow down heartbeats.
> For most people, it isn't saturated.

See Frank Schilder's post about a meltdown which he believes could have been
caused by beacon/heartbeat traffic being drowned out by other recovery/IO
traffic, not at the network level, but at the processing level on the OSDs.

If indeed there are cases where the OSDs are too busy to send (or process)
heartbeat/beacon messaging, it wouldn't help to have a separate network,
would it?

Cheers, Phil
[ceph-users] Re: Cluster network and public network
Hi Anthony,

Thanks for the feedback! The servers are using two bond interfaces for two
networks, and each interface is bonded from two 25Gb/s cards (active-backup
mode). You're right, I should have done the test in a heavy recovery or
backfill situation. I will benchmark the cluster once again in order to get
accurate statistics.

Furthermore, the bandwidth data was obtained from 'iftop'. The speed is much
higher on the public interface than on the cluster interface.

Thanks

Anthony D'Atri wrote on Sat, 9 May 2020 at 16:32:
>
>> Hi,
>>
>> I deployed a few clusters with two networks as well as only one network.
>> There has been little impact between them in my experience.
>>
>> I did a performance test on a nautilus cluster with two networks last
>> week. What I found is that the cluster network has low bandwidth usage
>
> During steady-state, sure. Heartbeats go over that, as do replication ops
> when clients write data.
>
> During heavy recovery or backfill, including healing from failures,
> balancing, adding/removing drives, much more will be used.
>
> Conventional wisdom has been to not let that traffic DoS clients, or
> clients to DoS heartbeats.
>
> But this I think dates to a time when 1Gb/s networks were common. If one's
> using modern multiple/bonded 25Gb/s or 40Gb/s links ...
>
>> while public network bandwidth is nearly full.
>
> If your public network is saturated, that actually is a problem; the last
> thing you want is to add recovery traffic, or to slow down heartbeats.
> For most people, it isn't saturated.
>
> How do you define "full"? TOR uplinks? TORs to individual nodes? Switch
> backplanes? Are you using bonding with the wrong hash policy?
>
>> As a result, I don't think the cluster network is necessary.
>
> For an increasing percentage of folks deploying production-quality
> clusters, agreed.
> > > > > > > Willi Schiegel wrote on Friday, May 8, 2020 at 18:14: > > >> Hello Nghia, > >> > >> I once asked a similar question about network architecture and got the > >> same answer as Martin wrote from Wido den Hollander: > >> > >> There is no need to have a public and cluster network with Ceph. Working > >> as a Ceph consultant I've deployed multi-PB Ceph clusters with a single > >> public network without any problems. Each node has a single IP-address, > >> nothing more, nothing less. > >> > >> In the current Ceph manual you can read > >> > >> It is possible to run a Ceph Storage Cluster with two networks: a public > >> (front-side) network and a cluster (back-side) network. However, this > >> approach complicates network configuration (both hardware and software) > >> and does not usually have a significant impact on overall performance. > >> For this reason, we generally recommend that dual-NIC systems either be > >> configured with two IPs on the same network, or bonded. > >> > >> I followed the advice from Wido "One system, one IP address" and > >> everything works fine. So, you should be fine with one interface for > >> MONs, MGRs, and OSDs. > >> > >> Best > >> Willi > >> > >> On 5/8/20 11:57 AM, Nghia Viet Tran wrote: > >>> Hi Martin, > >>> > >>> Thanks for your response. You mean one network interface for only MON > >>> hosts or for the whole cluster including OSD hosts? I’m confused now > >>> because there are some projects that only use one public network for the > >>> whole cluster. That means the rebalancing, replicating objects and > >>> heartbeats from OSD hosts would affect the performance of Ceph clients. > >>> > >>> *From: *Martin Verges > >>> *Date: *Friday, May 8, 2020 at 16:20 > >>> *To: *Nghia Viet Tran > >>> *Cc: *"ceph-users@ceph.io" > >>> *Subject: *Re: [ceph-users] Cluster network and public network > >>> > >>> Hello Nghia, > >>> > >>> just use one network interface card and use frontend and backend traffic > >>> on the same. No problem with that.
> >>> > >>> If you have a dual port card, use both ports as an LACP channel and > >>> maybe separate it using VLANs if you want to, but not required either. > >>> > >>> > >>> -- > >>> > >>> Martin Verges > >>> Managing director > >>> > >>> Mobile: +49 174 9335695 > >>> E-Mail: martin.ver...@croit.io <mailto:martin.ver...@croit.io> > >>> Chat: https
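Janne's replication-factor point from earlier in the thread can be sketched numerically. This is a back-of-the-envelope Python illustration with made-up byte counts, not measurements from this benchmark:

```python
# For a replicated pool of size N, each byte of client writes arriving on
# the public network makes the primary OSD forward (N - 1) bytes to the
# replica OSDs over the cluster network.

def cluster_net_write_bytes(client_write_bytes: float, repl_size: int) -> float:
    """Replication bytes the primary forwards over the cluster network."""
    return client_write_bytes * (repl_size - 1)

# With the common size=3, a write-only workload pushes roughly twice the
# public-network write bandwidth over the cluster network:
assert cluster_net_write_bytes(1_000_000, 3) == 2_000_000
# With size=2 the two networks carry about the same write traffic, which
# matches lin yunfan's "almost the same bandwidth" observation:
assert cluster_net_write_bytes(1_000_000, 2) == 1_000_000
```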
[ceph-users] Re: Cluster network and public network
> Hi, > > I deployed a few clusters with two networks as well as only one network. > There has been little difference between them in my experience. > > I did a performance test on a nautilus cluster with two networks last week. > What I found is that the cluster network has low bandwidth usage

During steady-state, sure. Heartbeats go over that, as do replication ops when clients write data.

During heavy recovery or backfill, including healing from failures, balancing, adding/removing drives, much more will be used.

Conventional wisdom has been to not let that traffic DoS clients, or clients DoS heartbeats.

But this I think dates to a time when 1Gb/s networks were common. If one’s using modern multiple/bonded 25Gb/s or 40Gb/s links ….

> while public network bandwidth is nearly full.

If your public network is saturated, that actually is a problem; the last thing you want is to add recovery traffic, or to slow down heartbeats. For most people, it isn’t saturated.

How do you define “full”? TOR uplinks? TORs to individual nodes? Switch backplanes? Are you using bonding with the wrong hash policy?

> As a result, I don't think the cluster network is necessary.

For an increasing percentage of folks deploying production-quality clusters, agreed.

> > > Willi Schiegel wrote on Friday, May 8, 2020 at 18:14: > >> Hello Nghia, >> >> I once asked a similar question about network architecture and got the >> same answer as Martin wrote from Wido den Hollander: >> >> There is no need to have a public and cluster network with Ceph. Working >> as a Ceph consultant I've deployed multi-PB Ceph clusters with a single >> public network without any problems. Each node has a single IP-address, >> nothing more, nothing less. >> >> In the current Ceph manual you can read >> >> It is possible to run a Ceph Storage Cluster with two networks: a public >> (front-side) network and a cluster (back-side) network.
However, this >> approach complicates network configuration (both hardware and software) >> and does not usually have a significant impact on overall performance. >> For this reason, we generally recommend that dual-NIC systems either be >> configured with two IPs on the same network, or bonded. >> >> I followed the advice from Wido "One system, one IP address" and >> everything works fine. So, you should be fine with one interface for >> MONs, MGRs, and OSDs. >> >> Best >> Willi >> >> On 5/8/20 11:57 AM, Nghia Viet Tran wrote: >>> Hi Martin, >>> >>> Thanks for your response. You mean one network interface for only MON >>> hosts or for the whole cluster including OSD hosts? I’m confused now >>> because there are some projects that only use one public network for the >>> whole cluster. That means the rebalancing, replicating objects and >>> heartbeats from OSD hosts would affect the performance of Ceph clients. >>> >>> *From: *Martin Verges >>> *Date: *Friday, May 8, 2020 at 16:20 >>> *To: *Nghia Viet Tran >>> *Cc: *"ceph-users@ceph.io" >>> *Subject: *Re: [ceph-users] Cluster network and public network >>> >>> Hello Nghia, >>> >>> just use one network interface card and use frontend and backend traffic >>> on the same. No problem with that. >>> >>> If you have a dual port card, use both ports as an LACP channel and >>> maybe separate it using VLANs if you want to, but not required either. >>> >>> >>> -- >>> >>> Martin Verges >>> Managing director >>> >>> Mobile: +49 174 9335695 >>> E-Mail: martin.ver...@croit.io <mailto:martin.ver...@croit.io> >>> Chat: https://t.me/MartinVerges >>> >>> croit GmbH, Freseniusstr. 31h, 81247 Munich >>> CEO: Martin Verges - VAT-ID: DE310638492 >>> Com. register: Amtsgericht Munich HRB 231263 >>> >>> Web: https://croit.io >>> YouTube: https://goo.gl/PGE1Bx >>> >>> On Fri., May 8, 2020 at 09:29, Nghia Viet Tran >>> <nghia.viet.t...@mgm-tp.com> wrote: >>> >>> Hi everyone, >>> >>> I have a question about the network setup.
From the document, it’s >>> recommended to have 2 NICs per host as described in the picture below >>> >>> [Diagram] >>> >>> In the picture, OSD hosts will connect to the Cluster network for >>> replication and heartbeats between OSDs; therefore, we definitely need >>> 2 NICs for it. But it seems there are no connections between Ceph MON >>
[ceph-users] Re: Cluster network and public network
Hi,

I deployed a few clusters with two networks as well as only one network. There has been little difference between them in my experience. I did a performance test on a nautilus cluster with two networks last week. What I found is that the cluster network has low bandwidth usage while the public network bandwidth is nearly full. As a result, I don't think the cluster network is necessary.

Willi Schiegel wrote on Friday, May 8, 2020 at 18:14: > Hello Nghia, > > I once asked a similar question about network architecture and got the > same answer as Martin wrote from Wido den Hollander: > > There is no need to have a public and cluster network with Ceph. Working > as a Ceph consultant I've deployed multi-PB Ceph clusters with a single > public network without any problems. Each node has a single IP-address, > nothing more, nothing less. > > In the current Ceph manual you can read > > It is possible to run a Ceph Storage Cluster with two networks: a public > (front-side) network and a cluster (back-side) network. However, this > approach complicates network configuration (both hardware and software) > and does not usually have a significant impact on overall performance. > For this reason, we generally recommend that dual-NIC systems either be > configured with two IPs on the same network, or bonded. > > I followed the advice from Wido "One system, one IP address" and > everything works fine. So, you should be fine with one interface for > MONs, MGRs, and OSDs. > > Best > Willi > > On 5/8/20 11:57 AM, Nghia Viet Tran wrote: > > Hi Martin, > > > > Thanks for your response. You mean one network interface for only MON > > hosts or for the whole cluster including OSD hosts? I’m confused now > > because there are some projects that only use one public network for the > > whole cluster. That means the rebalancing, replicating objects and > > heartbeats from OSD hosts would affect the performance of Ceph clients.
> > > > *From: *Martin Verges > > *Date: *Friday, May 8, 2020 at 16:20 > > *To: *Nghia Viet Tran > > *Cc: *"ceph-users@ceph.io" > > *Subject: *Re: [ceph-users] Cluster network and public network > > > > Hello Nghia, > > > > just use one network interface card and use frontend and backend traffic > > on the same. No problem with that. > > > > If you have a dual port card, use both ports as an LACP channel and > > maybe separate it using VLANs if you want to, but not required either. > > > > > > -- > > > > Martin Verges > > Managing director > > > > Mobile: +49 174 9335695 > > E-Mail: martin.ver...@croit.io <mailto:martin.ver...@croit.io> > > Chat: https://t.me/MartinVerges > > > > croit GmbH, Freseniusstr. 31h, 81247 Munich > > CEO: Martin Verges - VAT-ID: DE310638492 > > Com. register: Amtsgericht Munich HRB 231263 > > > > Web: https://croit.io > > YouTube: https://goo.gl/PGE1Bx > > > > On Fri., May 8, 2020 at 09:29, Nghia Viet Tran > > <nghia.viet.t...@mgm-tp.com> wrote: > > > > Hi everyone, > > > > I have a question about the network setup. From the document, it’s > > recommended to have 2 NICs per host as described in the picture below > > > > [Diagram] > > > > In the picture, OSD hosts will connect to the Cluster network for > > replication and heartbeats between OSDs; therefore, we definitely need > > 2 NICs for it. But it seems there are no connections between Ceph MON > > and the Cluster network. Can we install 1 NIC on Ceph MON then? > > > > I appreciate any comments! > > > > Thank you!
> > > > -- > > Nghia Viet Tran (Mr)
[ceph-users] Re: Cluster network and public network
Hello Nghia,

I once asked a similar question about network architecture and got the same answer as Martin wrote from Wido den Hollander:

There is no need to have a public and cluster network with Ceph. Working as a Ceph consultant I've deployed multi-PB Ceph clusters with a single public network without any problems. Each node has a single IP-address, nothing more, nothing less.

In the current Ceph manual you can read:

It is possible to run a Ceph Storage Cluster with two networks: a public (front-side) network and a cluster (back-side) network. However, this approach complicates network configuration (both hardware and software) and does not usually have a significant impact on overall performance. For this reason, we generally recommend that dual-NIC systems either be configured with two IPs on the same network, or bonded.

I followed the advice from Wido, "One system, one IP address", and everything works fine. So, you should be fine with one interface for MONs, MGRs, and OSDs.

Best
Willi

On 5/8/20 11:57 AM, Nghia Viet Tran wrote:

Hi Martin,

Thanks for your response. You mean one network interface for only MON hosts or for the whole cluster including OSD hosts? I’m confused now because there are some projects that only use one public network for the whole cluster. That means the rebalancing, replicating objects and heartbeats from OSD hosts would affect the performance of Ceph clients.

*From: *Martin Verges
*Date: *Friday, May 8, 2020 at 16:20
*To: *Nghia Viet Tran
*Cc: *"ceph-users@ceph.io"
*Subject: *Re: [ceph-users] Cluster network and public network

Hello Nghia,

just use one network interface card and use frontend and backend traffic on the same. No problem with that.

If you have a dual port card, use both ports as an LACP channel and maybe separate it using VLANs if you want to, but not required either.
--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io <mailto:martin.ver...@croit.io>
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx

On Fri., May 8, 2020 at 09:29, Nghia Viet Tran <nghia.viet.t...@mgm-tp.com> wrote:

Hi everyone,

I have a question about the network setup. From the document, it’s recommended to have 2 NICs per host as described in the picture below

[Diagram]

In the picture, OSD hosts will connect to the Cluster network for replication and heartbeats between OSDs; therefore, we definitely need 2 NICs for it. But it seems there are no connections between Ceph MON and the Cluster network. Can we install 1 NIC on Ceph MON then?

I appreciate any comments!

Thank you!

--
Nghia Viet Tran (Mr)
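For reference, the two-network split the manual describes corresponds to two options in the `[global]` section of ceph.conf (`public_network` and `cluster_network` are real option names; the subnets below are placeholders):

```ini
[global]
# Front-side: clients, mons, mgrs (placeholder subnet)
public_network = 192.168.0.0/24
# Back-side: OSD replication, recovery, and backfill (placeholder subnet)
cluster_network = 192.168.1.0/24
```

Leaving `cluster_network` unset gives the single-network, one-IP-per-node layout Wido recommends.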
[ceph-users] Re: Cluster network and public network
Hi Martin,

Thanks for your response. You mean one network interface for only MON hosts or for the whole cluster including OSD hosts? I’m confused now because there are some projects that only use one public network for the whole cluster. That means the rebalancing, replicating objects and heartbeats from OSD hosts would affect the performance of Ceph clients.

From: Martin Verges
Date: Friday, May 8, 2020 at 16:20
To: Nghia Viet Tran
Cc: "ceph-users@ceph.io"
Subject: Re: [ceph-users] Cluster network and public network

Hello Nghia,

just use one network interface card and use frontend and backend traffic on the same. No problem with that.

If you have a dual port card, use both ports as an LACP channel and maybe separate it using VLANs if you want to, but not required either.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io<mailto:martin.ver...@croit.io>
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx

On Fri., May 8, 2020 at 09:29, Nghia Viet Tran <nghia.viet.t...@mgm-tp.com> wrote:

Hi everyone,

I have a question about the network setup. From the document, it’s recommended to have 2 NICs per host as described in the picture below

[Diagram]

In the picture, OSD hosts will connect to the Cluster network for replication and heartbeats between OSDs; therefore, we definitely need 2 NICs for it. But it seems there are no connections between Ceph MON and the Cluster network. Can we install 1 NIC on Ceph MON then?

I appreciate any comments!

Thank you!

--
Nghia Viet Tran (Mr)
[ceph-users] Re: Cluster network and public network
Hello Nghia,

just use one network interface card and use frontend and backend traffic on the same. No problem with that.

If you have a dual port card, use both ports as an LACP channel and maybe separate it using VLANs if you want to, but not required either.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx

On Fri., May 8, 2020 at 09:29, Nghia Viet Tran <nghia.viet.t...@mgm-tp.com> wrote:

> Hi everyone,
>
> I have a question about the network setup. From the document, it’s recommended to have 2 NICs per host as described in the picture below
>
> [image: Diagram]
>
> In the picture, OSD hosts will connect to the Cluster network for replication and heartbeats between OSDs; therefore, we definitely need 2 NICs for it. But it seems there are no connections between Ceph MON and the Cluster network. Can we install 1 NIC on Ceph MON then?
>
> I appreciate any comments!
>
> Thank you!
>
> --
>
> Nghia Viet Tran (Mr)
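To illustrate the bonding Martin suggests, here is a sketch of an LACP bond in netplan syntax. The interface names and address are hypothetical, and your distribution may configure bonds differently (ifcfg, systemd-networkd, etc.):

```yaml
network:
  version: 2
  ethernets:
    ens1f0: {}   # hypothetical NIC port names
    ens1f1: {}
  bonds:
    bond0:
      interfaces: [ens1f0, ens1f1]
      parameters:
        mode: 802.3ad                  # LACP, as Martin suggests
        transmit-hash-policy: layer3+4 # spread flows across both links
      addresses: [192.168.0.10/24]     # placeholder address
```

With layer3+4 hashing, different TCP connections can land on different member links, which speaks to Anthony's "wrong hash policy" question earlier in the thread; the switch side must be configured for LACP as well.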