[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus
Hi all,

I can now add another data point as well. We upgraded our production cluster from mimic to octopus with this procedure:

- set quick-fix-on-start=false in all ceph.conf files and the mon config store
- set nosnaptrim
- upgrade all daemons
- set require-osd-release=octopus
- host by host: set quick-fix-on-start=true in ceph.conf and restart OSDs
- unset nosnaptrim

On our production system the conversion went much faster than on the test system. The process is very CPU intensive, yet converting 70 OSDs per host with 2x18-core Broadwell CPUs worked without problems. Load reached more than 200%, but it all finished without crashes. Upgrading the daemons and completing the conversion of all hosts took 3 very long days. After conversion in this way we see no problems with snaptrim.

We also enabled ephemeral pinning on our FS with 8 active MDSes and see no change in single-user performance, but at least 2-3 times higher aggregated throughput (home for a 500-node HPC cluster).

We did have a severe hiccup, though: very small OSDs with a size of ca. 100G crash on octopus when OMAP reaches a certain size. I don't know yet what a safe minimum size is (see the ongoing thread "OSD crashes during upgrade mimic->octopus"). The 300G OSDs on our test cluster worked fine.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

____________
From: Tyler Stachecki
Sent: 27 September 2022 02:00
To: Marc
Cc: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

Just a datapoint - we upgraded several large Mimic-born clusters straight to 15.2.12 with the quick fsck disabled in ceph.conf, then did require-osd-release, and finally did the omap conversion offline after the cluster was upgraded, using the bluestore tool while the OSDs were down (all done in batches). Clusters are zippy as ever. Maybe on a whim, try doing an offline fsck with the bluestore tool and see if it improves things?
To answer an earlier question: if you have no health statuses muted, a 'ceph health detail' should show you at least a subset of OSDs that have not gone through the omap conversion yet.

Cheers,
Tyler

On Mon, Sep 26, 2022, 5:13 PM Marc <m...@f1-outsourcing.eu> wrote:

> Hi Frank,
> Thank you very much for this! :)
>
> > we just completed a third upgrade test. There are 2 ways to convert the
> > OSDs:
> >
> > A) convert along with the upgrade (quick-fix-on-start=true)
> > B) convert after setting require-osd-release=octopus
> >    (quick-fix-on-start=false until require-osd-release is set to
> >    octopus, then restart to initiate conversion)
> >
> > There is a variation A' of A: follow A, then initiate manual compaction
> > and restart all OSDs.
> >
> > Our experiments show that paths A and B do *not* yield the same result.
> > Following path A leads to a severely performance-degraded cluster. As of
> > now, we cannot confirm that A' fixes this. It seems that the only way
> > out is to zap and re-deploy all OSDs, basically what Boris is doing
> > right now.
> >
> > We have now extended our procedure to adding
> >
> > bluestore_fsck_quick_fix_on_mount = false
> >
> > to every ceph.conf file and executing
> >
> > ceph config set osd bluestore_fsck_quick_fix_on_mount false
> >
> > to catch any accidents. After the daemon upgrade, we set
> > bluestore_fsck_quick_fix_on_mount = true host by host in the ceph.conf
> > and restart OSDs.
> >
> > This procedure works like a charm.
> >
> > I don't know what the difference between A and B is. It is possible that
> > B executes an extra step that is missing in A. The performance
> > degradation only shows up when snaptrim is active, but then it is very
> > severe. I suspect that many users who complained about snaptrim in the
> > past have at least one A-converted OSD in their cluster.
> >
> > If you have a cluster upgraded with B-converted OSDs, it works like a
> > native octopus cluster. There is very little performance reduction
> > compared with mimic. In exchange, I have the impression that it operates
> > more stably.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
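For readers following along, the mimic-to-octopus procedure Frank describes above maps roughly onto the commands below. This is a sketch, not a verified runbook: the conversion option is spelled bluestore_fsck_quick_fix_on_mount in octopus (Frank's "quick-fix-on-start" shorthand), and the exact upgrade steps depend on your deployment.

```shell
# Before upgrading any daemon: make sure no OSD starts converting on its
# own, and pause snapshot trimming cluster-wide.
ceph config set osd bluestore_fsck_quick_fix_on_mount false
ceph osd set nosnaptrim

# ... upgrade mon/mgr/osd/mds packages and restart all daemons ...

# Only once every daemon runs octopus:
ceph osd require-osd-release octopus

# Then, host by host: set bluestore_fsck_quick_fix_on_mount = true in that
# host's ceph.conf and restart its OSDs to trigger the omap conversion.

# When all hosts are converted:
ceph osd unset nosnaptrim
```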
[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus
Hi,
i just checked and all OSDs have it set to true. It also does not seem to be a problem with the snaptrim operation itself. We just had two times in the last 7 days where nearly all OSDs logged a lot (around 3k times in 20 minutes) of these messages:

2022-09-12T20:27:19.146+0200 7f576de49700 -1 osd.9 786378 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.153241560.0:42288714 8.56 8:6a19e4ee:::rbd_data.4c64dc3662fb05.0c00:head [write 2162688~4096 in=4096b] snapc 9835e=[] ondisk+write+known_if_redirected e786375)

On Tue, 13 Sept 2022 at 20:20, Wesley Dillingham <w...@wesdillingham.com> wrote:

> I haven't read through this entire thread so forgive me if already
> mentioned:
>
> What is the parameter "bluefs_buffered_io" set to on your OSDs? We once
> saw a terrible slowdown on our OSDs during snaptrim events and setting
> bluefs_buffered_io to true alleviated that issue. That was on a nautilus
> cluster.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
> On Tue, Sep 13, 2022 at 10:48 AM Boris Behrens wrote:
>
>> The cluster is SSD only with 2TB, 4TB and 8TB disks. I would expect that
>> this should be done fairly fast.
>> For now I will recreate every OSD in the cluster and check if this helps.
>>
>> Do you experience slow OPS (so the cluster shows a message like "cluster
>> [WRN] Health check update: 679 slow ops, oldest one blocked for 95 sec,
>> daemons [osd.0,osd.106,osd.107,osd.108,osd.113,osd.116,osd.123,osd.124,osd.125,osd.134]...
>> have slow ops. (SLOW_OPS)")?
>>
>> I can also see a huge spike in the load of all hosts in our cluster for
>> a couple of minutes.
>>
>> On Tue, 13 Sept 2022 at 13:14, Frank Schilder wrote:
>>
>> > Hi Boris.
>> >
>> > > 3. wait some time (took around 5-20 minutes)
>> >
>> > Sounds short. Might just have been the compaction that the OSDs do
>> > anyways on startup after upgrade.
>> > I don't know how to check for completed format conversion. What I see
>> > in your MON log is exactly what I have seen with default snap trim
>> > settings until all OSDs were converted. Once an OSD falls behind and
>> > slow ops start piling up, everything comes to a halt. Your logs
>> > clearly show a sudden drop of IOP/s on snap trim start and I would
>> > guess this is the cause of the slowly growing OPS backlog of the OSDs.
>> >
>> > If it's not that, I don't know what else to look for.
>> >
>> > Best regards,
>> > =
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
>> >
>> > ____________
>> > From: Boris Behrens
>> > Sent: 13 September 2022 12:58:19
>> > To: Frank Schilder
>> > Cc: ceph-users@ceph.io
>> > Subject: Re: [ceph-users] laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus
>> >
>> > Hi Frank,
>> > we converted the OSDs directly on the upgrade.
>> >
>> > 1. installing new ceph versions
>> > 2. restart all OSD daemons
>> > 3. wait some time (took around 5-20 minutes)
>> > 4. all OSDs were online again.
>> >
>> > So I would expect that the OSDs are all upgraded correctly.
>> > I also checked when the trimming happens, and it does not seem to be
>> > an issue on its own, as the trim happens all the time in various sizes.
>> >
>> > On Tue, 13 Sept 2022 at 12:45, Frank Schilder <fr...@dtu.dk> wrote:
>> > Are you observing this here:
>> > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
>> > =
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
>> >
>> > ____________
>> > From: Boris Behrens <b...@kervyn.de>
>> > Sent: 13 September 2022 11:43:20
>> > To: ceph-users@ceph.io
>> > Subject: [ceph-users] laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus
>> >
>> > Hi, I need your help really bad.
>> >
>> > we are currently experiencing very bad cluster hangups that happen
>> > sporadically (once on 2022-09-08 mid day (48 hrs after the upgrade)
>> > and once on 2022-09-12 in the evening).
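Boris's check at the top of this message ("all OSDs have it set to true") can be done with `ceph config show osd.<id> bluefs_buffered_io` or `ceph tell osd.\* config get bluefs_buffered_io`. To quantify a burst like the one he describes (~3k messages in 20 minutes), you can bucket the slow-ops log lines per minute. A small sketch using a synthetic log excerpt; the grep pattern assumes the message format shown above, and the path is an example:

```shell
# Synthetic excerpt standing in for /var/log/ceph/ceph-osd.9.log:
cat > /tmp/osd9.log <<'EOF'
2022-09-12T20:27:19.146+0200 7f576de49700 -1 osd.9 786378 get_health_metrics reporting 1 slow ops, oldest is osd_op(...)
2022-09-12T20:27:23.001+0200 7f576de49700 -1 osd.9 786378 get_health_metrics reporting 3 slow ops, oldest is osd_op(...)
2022-09-12T20:28:02.950+0200 7f576de49700 -1 osd.9 786378 get_health_metrics reporting 7 slow ops, oldest is osd_op(...)
EOF

# Count "get_health_metrics reporting ... slow ops" lines per minute:
# the first 16 characters of the timestamp field are YYYY-MM-DDTHH:MM.
grep 'get_health_metrics reporting' /tmp/osd9.log \
  | awk '{ print substr($1, 1, 16) }' \
  | sort | uniq -c
```

On the sample above this prints two buckets: two hits in minute 20:27 and one in 20:28.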
[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus
> > It might be possible that converting OSDs before setting
> > require-osd-release=octopus leads to a broken state of the converted
> > OSDs. I could not yet find a way out of this situation. We will soon
> > perform a third upgrade test to test this hypothesis.

So when upgrading, one should put this line in ceph.conf before restarting the osd daemons?

require-osd-release=octopus

(I still need to upgrade from Nautilus.)

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
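One clarification on the question above: require-osd-release is not a ceph.conf key but a cluster-wide flag applied once via the CLI, after all daemons run the new release; the per-OSD conversion switch mentioned elsewhere in the thread is what goes into ceph.conf or the config store. Roughly (a sketch, not upgrade advice for any specific deployment):

```shell
# Cluster flag, set once after every daemon runs octopus (CLI, not ceph.conf):
ceph osd require-osd-release octopus

# Per-OSD conversion switch, kept off until the flag above has been set:
ceph config set osd bluestore_fsck_quick_fix_on_mount false
```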
[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus
I checked the cluster for other snaptrim operations and they happen all over the place, so to me it looks like they just happened to be done when the issue occurred, but were not the driving factor.

On Tue, 13 Sept 2022 at 12:04, Boris Behrens wrote:

> Because someone mentioned that the attachments did not go through, I
> created pastebin links:
>
> monlog: https://pastebin.com/jiNPUrtL
> osdlog: https://pastebin.com/dxqXgqDz
>
> On Tue, 13 Sept 2022 at 11:43, Boris Behrens wrote:
>
>> Hi, I need your help really bad.
>>
>> we are currently experiencing very bad cluster hangups that happen
>> sporadically (once on 2022-09-08 mid day (48 hrs after the upgrade) and
>> once on 2022-09-12 in the evening).
>> We use krbd without cephx for the qemu clients, and when the OSDs are
>> getting laggy, the krbd connection comes to a grinding halt, to the
>> point that all IO is stalling and we can't even unmap the rbd device.
>>
>> From the logs, it looks like the cluster starts to snaptrim a lot of
>> PGs, then PGs become laggy, and then the cluster snowballs into laggy
>> OSDs. I have attached the monitor log and the osd log (from one OSD)
>> around the time where it happened.
>>
>> - is this a known issue?
>> - what can I do to debug it further?
>> - can I downgrade back to nautilus?
>> - should I upgrade the PGs for the pool to 4096 or 8192?
>>
>> The cluster contains a mixture of 2, 4 and 8TB SSDs (no rotating disks)
>> where the 8TB disks got ~120 PGs and the 2TB disks got ~30 PGs. All
>> hosts have a minimum of 128GB RAM, and the kernel logs of all ceph
>> hosts do not show anything for the timeframe.
>>
>> Cluster stats:
>>
>>   cluster:
>>     id:     74313356-3b3d-43f3-bce6-9fb0e4591097
>>     health: HEALTH_OK
>>
>>   services:
>>     mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age 25h)
>>     mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4, ceph-rbd-mon6
>>     osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
>>
>>   data:
>>     pools:   4 pools, 2241 pgs
>>     objects: 25.43M objects, 82 TiB
>>     usage:   231 TiB used, 187 TiB / 417 TiB avail
>>     pgs:     2241 active+clean
>>
>>   io:
>>     client: 211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr
>>
>> --- RAW STORAGE ---
>> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
>> ssd    417 TiB  187 TiB  230 TiB  231 TiB   55.30
>> TOTAL  417 TiB  187 TiB  230 TiB  231 TiB   55.30
>>
>> --- POOLS ---
>> POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
>> isos                   7   64    455 GiB  117.92k  1.3 TiB  1.17   38 TiB
>> rbd                    8   2048  76 TiB   24.65M   222 TiB  66.31  38 TiB
>> archive                9   128   2.4 TiB  669.59k  7.3 TiB  6.06   38 TiB
>> device_health_metrics  10  1     25 MiB   149      76 MiB   0      38 TiB

--
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
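Tyler's offline-conversion route near the top of the thread (running the omap conversion with the bluestore tool while the OSDs are down, in batches) looks roughly like the following per OSD. This is a sketch assuming systemd-managed OSDs and the default data path; octopus builds of ceph-bluestore-tool also ship a dedicated quick-fix command for the same purpose, so check your version's man page first.

```shell
# Per OSD (id 0 here as an example): stop it, run the offline repair/fsck
# against its data directory, then start it again. Do this in small
# batches so PGs stay available.
systemctl stop ceph-osd@0
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
systemctl start ceph-osd@0
```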