[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-10-15 Thread Frank Schilder
Hi all,

I can now add another data point as well. We upgraded our production cluster
from mimic to octopus using the following procedure (a rough command sketch
follows the list):

- set quick-fix-on-start=false in all ceph.conf files and the mon config store
- set nosnaptrim
- upgrade all daemons
- set require-osd-release=octopus
- host by host: set quick-fix-on-start=true in ceph.conf and restart OSDs
- unset nosnaptrim
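
A sketch of the cluster-side commands behind those steps (the shorthand
"quick-fix-on-start" above refers to the bluestore_fsck_quick_fix_on_mount
option; the host-by-host ceph.conf edits and OSD restarts happen in between):

  ceph config set osd bluestore_fsck_quick_fix_on_mount false
  ceph osd set nosnaptrim
  # ... upgrade and restart all mon/mgr/osd/mds daemons ...
  ceph osd require-osd-release octopus
  # per host: set bluestore_fsck_quick_fix_on_mount = true in ceph.conf, restart OSDs
  ceph osd unset nosnaptrim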

On our production system the conversion went much faster than on the test
system. The process is very CPU intensive, yet converting 70 OSDs per host
with 2x18-core Broadwell CPUs worked without problems. The load reached more
than 200%, but it all finished without crashes.

Upgrading the daemons and completing the conversion of all hosts took 3 very
long days. Converted this way, we saw no problems with snaptrim. We also
enabled ephemeral pinning on our FS with 8 active MDSes and see no change in
single-user performance, but at least 2-3 times higher aggregated throughput
(the FS is home for a 500-node HPC cluster).
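
For reference, distributed ephemeral pinning is enabled per directory via an
extended attribute; a minimal sketch, with /mnt/cephfs/home standing in for
whatever the real home subtree is:

  setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home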

We did have a severe hiccup though. Very small OSDs with a size of ca. 100G 
crash on octopus when OMAP reaches a certain size. I don't know yet what a safe 
minimum size is (ongoing thread "OSD crashes during upgrade mimic->octopus"). 
The 300G OSDs on our test cluster worked fine.
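
To judge whether an OSD is heading into that small-disk/large-OMAP territory,
the per-OSD OMAP usage can be checked beforehand (recent releases print OMAP
and META columns here):

  ceph osd df tree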

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Tyler Stachecki 
Sent: 27 September 2022 02:00
To: Marc
Cc: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from 
nautilus to octopus

Just a datapoint - we upgraded several large Mimic-born clusters straight to 
15.2.12 with the quick fsck disabled in ceph.conf, then did 
require-osd-release, and finally did the omap conversion offline after the 
cluster was upgraded using the bluestore tool while the OSDs were down (all 
done in batches). Clusters are zippy as ever.

Maybe on a whim, try doing an offline fsck with the bluestore tool and see if 
it improves things?
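
A sketch of what that looks like for a single OSD (osd.12 and the default data
path are only examples; the OSD has to be stopped first):

  systemctl stop ceph-osd@12
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12
  # 'repair' (or 'quick-fix', where available) also runs the omap conversion offline
  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-12
  systemctl start ceph-osd@12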

To answer an earlier question, if you have no health statuses muted, a 'ceph 
health detail' should show you at least a subset of OSDs that have not gone 
through the omap conversion yet.
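
Roughly, assuming nothing is muted:

  ceph health detail                        # per the above, shows OSDs that have not gone through the omap conversion
  ceph osd dump | grep require_osd_release  # confirm the cluster-wide release flag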

Cheers,
Tyler

On Mon, Sep 26, 2022, 5:13 PM Marc <m...@f1-outsourcing.eu> wrote:
Hi Frank,

Thank you very much for this! :)

>
> we just completed a third upgrade test. There are 2 ways to convert the
> OSDs:
>
> A) convert along with the upgrade (quick-fix-on-start=true)
> B) convert after setting require-osd-release=octopus (quick-fix-on-
> start=false until require-osd-release set to octopus, then restart to
> initiate conversion)
>
> There is a variation A' of A: follow A, then initiate manual compaction
> and restart all OSDs.
>
> Our experiments show that paths A and B do *not* yield the same result.
> Following path A leads to a severely performance degraded cluster. As of
> now, we cannot confirm that A' fixes this. It seems that the only way
> out is to zap and re-deploy all OSDs, basically what Boris is doing
> right now.
>
> We have now extended our procedure by adding
>
>   bluestore_fsck_quick_fix_on_mount = false
>
> to every ceph.conf file and executing
>
>   ceph config set osd bluestore_fsck_quick_fix_on_mount false
>
> to catch any accidents. After daemon upgrade, we set
> bluestore_fsck_quick_fix_on_mount = true host by host in the ceph.conf
> and restart OSDs.
>
> This procedure works like a charm.
>
> I don't know what the difference between A and B is. It is possible that
> B executes an extra step that is missing in A. The performance
> degradation only shows up when snaptrim is active, but then it is very
> severe. I suspect that many users who complained about snaptrim in the
> past have at least 1 A-converted OSD in their cluster.
>
> If you have a cluster upgraded with B-converted OSDs, it works like a
> native octopus cluster. There is very little performance reduction
> compared with mimic. In exchange, I have the impression that it operates
> more stably.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-26 Thread Tyler Stachecki
Just a datapoint - we upgraded several large Mimic-born clusters straight
to 15.2.12 with the quick fsck disabled in ceph.conf, then did
require-osd-release, and finally did the omap conversion offline after the
cluster was upgraded using the bluestore tool while the OSDs were down (all
done in batches). Clusters are zippy as ever.

Maybe on a whim, try doing an offline fsck with the bluestore tool and see
if it improves things?

To answer an earlier question, if you have no health statuses muted, a
'ceph health detail' should show you at least a subset of OSDs that have
not gone through the omap conversion yet.

Cheers,
Tyler

On Mon, Sep 26, 2022, 5:13 PM Marc  wrote:

> Hi Frank,
>
> Thank you very much for this! :)
>
> >
> > we just completed a third upgrade test. There are 2 ways to convert the
> > OSDs:
> >
> > A) convert along with the upgrade (quick-fix-on-start=true)
> > B) convert after setting require-osd-release=octopus (quick-fix-on-
> > start=false until require-osd-release set to octopus, then restart to
> > initiate conversion)
> >
> > There is a variation A' of A: follow A, then initiate manual compaction
> > and restart all OSDs.
> >
> > Our experiments show that paths A and B do *not* yield the same result.
> > Following path A leads to a severely performance degraded cluster. As of
> > now, we cannot confirm that A' fixes this. It seems that the only way
> > out is to zap and re-deploy all OSDs, basically what Boris is doing
> > right now.
> >
> > We have now extended our procedure by adding
> >
> >   bluestore_fsck_quick_fix_on_mount = false
> >
> > to every ceph.conf file and executing
> >
> >   ceph config set osd bluestore_fsck_quick_fix_on_mount false
> >
> > to catch any accidents. After daemon upgrade, we set
> > bluestore_fsck_quick_fix_on_mount = true host by host in the ceph.conf
> > and restart OSDs.
> >
> > This procedure works like a charm.
> >
> > I don't know what the difference between A and B is. It is possible that
> > B executes an extra step that is missing in A. The performance
> > degradation only shows up when snaptrim is active, but then it is very
> > severe. I suspect that many users who complained about snaptrim in the
> > past have at least 1 A-converted OSD in their cluster.
> >
> > If you have a cluster upgraded with B-converted OSDs, it works like a
> > native octopus cluster. There is very little performance reduction
> > compared with mimic. In exchange, I have the impression that it operates
> > more stably.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-26 Thread Marc
Hi Frank,

Thank you very much for this! :)

> 
> we just completed a third upgrade test. There are 2 ways to convert the
> OSDs:
> 
> A) convert along with the upgrade (quick-fix-on-start=true)
> B) convert after setting require-osd-release=octopus (quick-fix-on-
> start=false until require-osd-release set to octopus, then restart to
> initiate conversion)
> 
> There is a variation A' of A: follow A, then initiate manual compaction
> and restart all OSDs.
> 
> Our experiments show that paths A and B do *not* yield the same result.
> Following path A leads to a severely performance degraded cluster. As of
> now, we cannot confirm that A' fixes this. It seems that the only way
> out is to zap and re-deploy all OSDs, basically what Boris is doing
> right now.
> 
> We have now extended our procedure by adding
> 
>   bluestore_fsck_quick_fix_on_mount = false
> 
> to every ceph.conf file and executing
> 
>   ceph config set osd bluestore_fsck_quick_fix_on_mount false
> 
> to catch any accidents. After daemon upgrade, we set
> bluestore_fsck_quick_fix_on_mount = true host by host in the ceph.conf
> and restart OSDs.
> 
> This procedure works like a charm.
> 
> I don't know what the difference between A and B is. It is possible that
> B executes an extra step that is missing in A. The performance
> degradation only shows up when snaptrim is active, but then it is very
> severe. I suspect that many users who complained about snaptrim in the
> past have at least 1 A-converted OSD in their cluster.
> 
> If you have a cluster upgraded with B-converted OSDs, it works like a
> native octopus cluster. There is very little performance reduction
> compared with mimic. In exchange, I have the impression that it operates
> more stably.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Hi,
I just checked and all OSDs have it set to true.
It also does not seem to be a problem with the snaptrim operation itself.

Twice in the last 7 days, nearly all OSDs logged a lot of these messages
(around 3k occurrences in 20 minutes):
2022-09-12T20:27:19.146+0200 7f576de49700 -1 osd.9 786378 get_health_metrics
reporting 1 slow ops, oldest is osd_op(client.153241560.0:42288714 8.56
8:6a19e4ee:::rbd_data.4c64dc3662fb05.0c00:head [write
2162688~4096 in=4096b] snapc 9835e=[] ondisk+write+known_if_redirected
e786375)
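
For reference, what such an OSD is actually stuck on can be inspected via its
admin socket (osd.9 only because it is the one from the log line above):

  ceph daemon osd.9 dump_ops_in_flight   # currently pending/slow ops and their state
  ceph daemon osd.9 dump_historic_ops    # recently completed ops, including the slowest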


On Tue, 13 Sept 2022 at 20:20, Wesley Dillingham <w...@wesdillingham.com> wrote:

> I haven't read through this entire thread so forgive me if already
> mentioned:
>
> What is the parameter "bluefs_buffered_io" set to on your OSDs? We once
> saw a terrible slowdown on our OSDs during snaptrim events and setting
> bluefs_buffered_io to true alleviated that issue. That was on a nautilus
> cluster.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, Sep 13, 2022 at 10:48 AM Boris Behrens  wrote:
>
>> The cluster is SSD only with 2TB,4TB and 8TB disks. I would expect that
>> this should be done fairly fast.
>> For now I will recreate every OSD in the cluster and check if this helps.
>>
>> Do you experience slow OPS (so the cluster shows a message like "cluster
>> [WRN] Health check update: 679 slow ops, oldest one blocked for 95 sec,
>> daemons
>>
>> [osd.0,osd.106,osd.107,osd.108,osd.113,osd.116,osd.123,osd.124,osd.125,osd.134]...
>> have slow ops. (SLOW_OPS)")?
>>
>> I can also see a huge spike in the load of all hosts in our cluster for a
>> couple of minutes.
>>
>>
>> On Tue, 13 Sept 2022 at 13:14, Frank Schilder wrote:
>>
>> > Hi Boris.
>> >
>> > > 3. wait some time (took around 5-20 minutes)
>> >
>> > Sounds short. It might just have been the compaction that the OSDs do
>> > anyway on startup after an upgrade. I don't know how to check for completed
>> > format conversion. What I see in your MON log is exactly what I have
>> seen
>> > with default snap trim settings until all OSDs were converted. Once an
>> OSD
>> > falls behind and slow ops start piling up, everything comes to a halt.
>> Your
>> > logs clearly show a sudden drop of IOP/s on snap trim start and I would
>> > guess this is the cause of the slowly growing OPS back log of the OSDs.
>> >
>> > If it's not that, I don't know what else to look for.
>> >
>> > Best regards,
>> > =
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
>> >
>> > 
>> > From: Boris Behrens 
>> > Sent: 13 September 2022 12:58:19
>> > To: Frank Schilder
>> > Cc: ceph-users@ceph.io
>> > Subject: Re: [ceph-users] laggy OSDs and staling krbd IO after upgrade
>> > from nautilus to octopus
>> >
>> > Hi Frank,
>> > we converted the OSDs directly on the upgrade.
>> >
>> > 1. installing new ceph versions
>> > 2. restart all OSD daemons
>> > 3. wait some time (took around 5-20 minutes)
>> > 4. all OSDs were online again.
>> >
>> > So I would expect that the OSDs are all upgraded correctly.
>> > I also checked when the trimming happens, and it does not seem to be an
>> > issue on its own, as the trim happens all the time in various sizes.
>> >
>> > On Tue, 13 Sept 2022 at 12:45, Frank Schilder <fr...@dtu.dk> wrote:
>> > Are you observing this here:
>> >
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
>> > =
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
>> >
>> > 
>> > From: Boris Behrens <b...@kervyn.de>
>> > Sent: 13 September 2022 11:43:20
>> > To: ceph-users@ceph.io
>> > Subject: [ceph-users] laggy OSDs and staling krbd IO after upgrade from
>> > nautilus to octopus
>> >
>> > Hi, I need your help really badly.
>> >
>> > We are currently experiencing very bad cluster hangups that happen
>> > sporadically (once on 2022-09-08 midday (48 hrs after the upgrade) and
>> once
>> > 2022-09-12 in the evening)

[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Wesley Dillingham
I haven't read through this entire thread so forgive me if already
mentioned:

What is the parameter "bluefs_buffered_io" set to on your OSDs? We once saw
a terrible slowdown on our OSDs during snaptrim events and setting
bluefs_buffered_io to true alleviated that issue. That was on a nautilus
cluster.
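
For reference, a rough way to check the value in effect and to change it
cluster-wide (osd.0 is only an example id):

  ceph config get osd bluefs_buffered_io           # central default for OSDs
  ceph daemon osd.0 config get bluefs_buffered_io  # value actually in effect on one OSD
  ceph config set osd bluefs_buffered_io true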

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Sep 13, 2022 at 10:48 AM Boris Behrens  wrote:

> The cluster is SSD only with 2TB,4TB and 8TB disks. I would expect that
> this should be done fairly fast.
> For now I will recreate every OSD in the cluster and check if this helps.
>
> Do you experience slow OPS (so the cluster shows a message like "cluster
> [WRN] Health check update: 679 slow ops, oldest one blocked for 95 sec,
> daemons
>
> [osd.0,osd.106,osd.107,osd.108,osd.113,osd.116,osd.123,osd.124,osd.125,osd.134]...
> have slow ops. (SLOW_OPS)")?
>
> I can also see a huge spike in the load of all hosts in our cluster for a
> couple of minutes.
>
>
> On Tue, 13 Sept 2022 at 13:14, Frank Schilder wrote:
>
> > Hi Boris.
> >
> > > 3. wait some time (took around 5-20 minutes)
> >
> > Sounds short. It might just have been the compaction that the OSDs do
> > anyway on startup after an upgrade. I don't know how to check for completed
> > format conversion. What I see in your MON log is exactly what I have seen
> > with default snap trim settings until all OSDs were converted. Once an
> OSD
> > falls behind and slow ops start piling up, everything comes to a halt.
> Your
> > logs clearly show a sudden drop of IOP/s on snap trim start and I would
> > guess this is the cause of the slowly growing OPS back log of the OSDs.
> >
> > If it's not that, I don't know what else to look for.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ____________
> > From: Boris Behrens 
> > Sent: 13 September 2022 12:58:19
> > To: Frank Schilder
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] laggy OSDs and staling krbd IO after upgrade
> > from nautilus to octopus
> >
> > Hi Frank,
> > we converted the OSDs directly on the upgrade.
> >
> > 1. installing new ceph versions
> > 2. restart all OSD daemons
> > 3. wait some time (took around 5-20 minutes)
> > 4. all OSDs were online again.
> >
> > So I would expect that the OSDs are all upgraded correctly.
> > I also checked when the trimming happens, and it does not seem to be an
> > issue on its own, as the trim happens all the time in various sizes.
> >
> > On Tue, 13 Sept 2022 at 12:45, Frank Schilder <fr...@dtu.dk> wrote:
> > Are you observing this here:
> >
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Boris Behrens <b...@kervyn.de>
> > Sent: 13 September 2022 11:43:20
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] laggy OSDs and staling krbd IO after upgrade from
> > nautilus to octopus
> >
> > Hi, I need your help really badly.
> >
> > We are currently experiencing very bad cluster hangups that happen
> > sporadically (once on 2022-09-08 midday (48 hrs after the upgrade) and once
> > 2022-09-12 in the evening)
> > We use krbd without cephx for the qemu clients and when the OSDs are
> > getting laggy, the krbd connection comes to a grinding halt, to a point
> > that all IO is staling and we can't even unmap the rbd device.
> >
> > From the logs, it looks like the cluster starts to snaptrim a lot of
> > PGs, then PGs become laggy and then the cluster snowballs into laggy
> OSDs.
> > I have attached the monitor log and the osd log (from one OSD) around the
> > time where it happened.
> >
> > - is this a known issue?
> > - what can I do to debug it further?
> > - can I downgrade back to nautilus?
> > - should I upgrade the PGs for the pool to 4096 or 8192?
> >
> > The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
> > where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
> > have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
> > show anything for the timeframe.
> >
> > Cluster stats:
> >   cluster:

[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
The cluster is SSD only with 2TB,4TB and 8TB disks. I would expect that
this should be done fairly fast.
For now I will recreate every OSD in the cluster and check if this helps.

Do you experience slow OPS (so the cluster shows a message like "cluster
[WRN] Health check update: 679 slow ops, oldest one blocked for 95 sec,
daemons
[osd.0,osd.106,osd.107,osd.108,osd.113,osd.116,osd.123,osd.124,osd.125,osd.134]...
have slow ops. (SLOW_OPS)")?

I can also see a huge spike in the load of all hosts in our cluster for a
couple of minutes.
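
For completeness, redeploying a single OSD would look roughly like this (a
sketch assuming ceph-volume/LVM OSDs; osd.12 and /dev/sdX are placeholders,
and the OSD should be drained and the cluster healthy before purging):

  ceph osd out 12
  # wait until backfill finishes and all PGs are active+clean again
  ceph osd purge 12 --yes-i-really-mean-it
  ceph-volume lvm zap --destroy /dev/sdX
  ceph-volume lvm create --data /dev/sdX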


On Tue, 13 Sept 2022 at 13:14, Frank Schilder wrote:

> Hi Boris.
>
> > 3. wait some time (took around 5-20 minutes)
>
> Sounds short. It might just have been the compaction that the OSDs do
> anyway on startup after an upgrade. I don't know how to check for completed
> format conversion. What I see in your MON log is exactly what I have seen
> with default snap trim settings until all OSDs were converted. Once an OSD
> falls behind and slow ops start piling up, everything comes to a halt. Your
> logs clearly show a sudden drop of IOP/s on snap trim start and I would
> guess this is the cause of the slowly growing OPS back log of the OSDs.
>
> If it's not that, I don't know what else to look for.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Boris Behrens 
> Sent: 13 September 2022 12:58:19
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] laggy OSDs and staling krbd IO after upgrade
> from nautilus to octopus
>
> Hi Frank,
> we converted the OSDs directly on the upgrade.
>
> 1. installing new ceph versions
> 2. restart all OSD daemons
> 3. wait some time (took around 5-20 minutes)
> 4. all OSDs were online again.
>
> So I would expect that the OSDs are all upgraded correctly.
> I also checked when the trimming happens, and it does not seem to be an
> issue on its own, as the trim happens all the time in various sizes.
>
> On Tue, 13 Sept 2022 at 12:45, Frank Schilder <fr...@dtu.dk> wrote:
> Are you observing this here:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Boris Behrens <b...@kervyn.de>
> Sent: 13 September 2022 11:43:20
> To: ceph-users@ceph.io
> Subject: [ceph-users] laggy OSDs and staling krbd IO after upgrade from
> nautilus to octopus
>
> Hi, I need your help really badly.
>
> We are currently experiencing very bad cluster hangups that happen
> sporadically (once on 2022-09-08 midday (48 hrs after the upgrade) and once
> 2022-09-12 in the evening)
> We use krbd without cephx for the qemu clients and when the OSDs are
> getting laggy, the krbd connection comes to a grinding halt, to a point
> that all IO is staling and we can't even unmap the rbd device.
>
> From the logs, it looks like the cluster starts to snaptrim a lot of
> PGs, then PGs become laggy and then the cluster snowballs into laggy OSDs.
> I have attached the monitor log and the osd log (from one OSD) around the
> time where it happened.
>
> - is this a known issue?
> - what can I do to debug it further?
> - can I downgrade back to nautilus?
> - should I upgrade the PGs for the pool to 4096 or 8192?
>
> The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
> where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
> have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
> show anything for the timeframe.
>
> Cluster stats:
>   cluster:
> id: 74313356-3b3d-43f3-bce6-9fb0e4591097
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
> 25h)
> mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
> ceph-rbd-mon6
> osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
>
>   data:
> pools:   4 pools, 2241 pgs
> objects: 25.43M objects, 82 TiB
> usage:   231 TiB used, 187 TiB / 417 TiB avail
> pgs: 2241 active+clean
>
>   io:
> client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr
>
> --- RAW STORAGE ---
> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> ssd    417 TiB  187 TiB  230 TiB   231 TiB      55.30
> TOTAL  417 TiB  187 TiB  230 TiB   231 TiB      55.30
>
> --- POOLS ---
> POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
> isos                    7    64  455 GiB  117.

[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Marc


> 
> It might be possible that converting OSDs before setting require-osd-
> release=octopus leads to a broken state of the converted OSDs. I could
> not yet find a way out of this situation. We will soon perform a third
> upgrade test to test this hypothesis.
> 

So when upgrading, should one put this line in ceph.conf before restarting the
OSD daemons?
require-osd-release=octopus

(I still need to upgrade from Nautilus)
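
For reference: in the procedures described elsewhere in this thread,
require-osd-release is not a ceph.conf key but a cluster flag set once from
the CLI after all daemons run the new version, roughly:

  ceph osd require-osd-release octopus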
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Hi Frank,
we converted the OSDs directly on the upgrade.

1. installing new ceph versions
2. restart all OSD daemons
3. wait some time (took around 5-20 minutes)
4. all OSDs were online again.

So I would expect that the OSDs are all upgraded correctly.
I also checked when the trimming happens, and it does not seem to be an
issue on its own, as the trim happens all the time in various sizes.
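
A quick sanity check that every daemon really runs the new release (it does
not prove that the omap format conversion ran, though):

  ceph versions   # per daemon type, which ceph versions are currently running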

On Tue, 13 Sept 2022 at 12:45, Frank Schilder wrote:

> Are you observing this here:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Boris Behrens 
> Sent: 13 September 2022 11:43:20
> To: ceph-users@ceph.io
> Subject: [ceph-users] laggy OSDs and staling krbd IO after upgrade from
> nautilus to octopus
>
> Hi, I need your help really badly.
>
> We are currently experiencing very bad cluster hangups that happen
> sporadically (once on 2022-09-08 midday (48 hrs after the upgrade) and once
> 2022-09-12 in the evening)
> We use krbd without cephx for the qemu clients and when the OSDs are
> getting laggy, the krbd connection comes to a grinding halt, to a point
> that all IO is staling and we can't even unmap the rbd device.
>
> From the logs, it looks like the cluster starts to snaptrim a lot of
> PGs, then PGs become laggy and then the cluster snowballs into laggy OSDs.
> I have attached the monitor log and the osd log (from one OSD) around the
> time where it happened.
>
> - is this a known issue?
> - what can I do to debug it further?
> - can I downgrade back to nautilus?
> - should I upgrade the PGs for the pool to 4096 or 8192?
>
> The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
> where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
> have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
> show anything for the timeframe.
>
> Cluster stats:
>   cluster:
> id: 74313356-3b3d-43f3-bce6-9fb0e4591097
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
> 25h)
> mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
> ceph-rbd-mon6
> osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
>
>   data:
> pools:   4 pools, 2241 pgs
> objects: 25.43M objects, 82 TiB
> usage:   231 TiB used, 187 TiB / 417 TiB avail
> pgs: 2241 active+clean
>
>   io:
> client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr
>
> --- RAW STORAGE ---
> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> ssd    417 TiB  187 TiB  230 TiB   231 TiB      55.30
> TOTAL  417 TiB  187 TiB  230 TiB   231 TiB      55.30
>
> --- POOLS ---
> POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
> isos                    7    64  455 GiB  117.92k  1.3 TiB   1.17     38 TiB
> rbd                     8  2048   76 TiB   24.65M  222 TiB  66.31     38 TiB
> archive                 9   128  2.4 TiB  669.59k  7.3 TiB   6.06     38 TiB
> device_health_metrics  10     1   25 MiB      149   76 MiB      0     38 TiB
>
>
>
> --
> The self-help group "UTF-8 Problems" will meet in the large hall this time,
> as an exception.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
The self-help group "UTF-8 Problems" will meet in the large hall this time,
as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
I checked the cluster for other snaptrim operations and they happen all
over the place, so to me it looks like they just happened to be running when
the issue occurred, but were not the driving factor.
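
For reference, the PGs currently trimming can be listed with something like:

  ceph pg ls | grep -E 'snaptrim'   # PGs in snaptrim / snaptrim_wait states, if any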

On Tue, 13 Sept 2022 at 12:04, Boris Behrens wrote:

> Because someone mentioned that the attachments did not go through, I
> created pastebin links:
>
> monlog: https://pastebin.com/jiNPUrtL
> osdlog: https://pastebin.com/dxqXgqDz
>
> On Tue, 13 Sept 2022 at 11:43, Boris Behrens wrote:
>
>> Hi, I need your help really badly.
>>
>> We are currently experiencing very bad cluster hangups that happen
>> sporadically (once on 2022-09-08 midday (48 hrs after the upgrade) and once
>> 2022-09-12 in the evening)
>> We use krbd without cephx for the qemu clients and when the OSDs are
>> getting laggy, the krbd connection comes to a grinding halt, to a point
>> that all IO is staling and we can't even unmap the rbd device.
>>
>> From the logs, it looks like the cluster starts to snaptrim a lot of
>> PGs, then PGs become laggy and then the cluster snowballs into laggy OSDs.
>> I have attached the monitor log and the osd log (from one OSD) around the
>> time where it happened.
>>
>> - is this a known issue?
>> - what can I do to debug it further?
>> - can I downgrade back to nautilus?
>> - should I upgrade the PGs for the pool to 4096 or 8192?
>>
>> The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
>> where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
>> have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
>> show anything for the timeframe.
>>
>> Cluster stats:
>>   cluster:
>> id: 74313356-3b3d-43f3-bce6-9fb0e4591097
>> health: HEALTH_OK
>>
>>   services:
>> mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
>> 25h)
>> mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
>> ceph-rbd-mon6
>> osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
>>
>>   data:
>> pools:   4 pools, 2241 pgs
>> objects: 25.43M objects, 82 TiB
>> usage:   231 TiB used, 187 TiB / 417 TiB avail
>> pgs: 2241 active+clean
>>
>>   io:
>> client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr
>>
>> --- RAW STORAGE ---
>> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
>> ssd    417 TiB  187 TiB  230 TiB   231 TiB      55.30
>> TOTAL  417 TiB  187 TiB  230 TiB   231 TiB      55.30
>>
>> --- POOLS ---
>> POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
>> isos                    7    64  455 GiB  117.92k  1.3 TiB   1.17     38 TiB
>> rbd                     8  2048   76 TiB   24.65M  222 TiB  66.31     38 TiB
>> archive                 9   128  2.4 TiB  669.59k  7.3 TiB   6.06     38 TiB
>> device_health_metrics  10     1   25 MiB      149   76 MiB      0     38 TiB
>>
>>
>>
>> --
>> The self-help group "UTF-8 Problems" will meet in the large hall this time,
>> as an exception.
>>
>
>
> --
> The self-help group "UTF-8 Problems" will meet in the large hall this time,
> as an exception.
>


-- 
The self-help group "UTF-8 Problems" will meet in the large hall this time,
as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Because someone mentioned that the attachments did not go through, I
created pastebin links:

monlog: https://pastebin.com/jiNPUrtL
osdlog: https://pastebin.com/dxqXgqDz

On Tue, 13 Sept 2022 at 11:43, Boris Behrens wrote:

> Hi, I need your help really badly.
>
> We are currently experiencing very bad cluster hangups that happen
> sporadically (once on 2022-09-08 midday (48 hrs after the upgrade) and once
> 2022-09-12 in the evening)
> We use krbd without cephx for the qemu clients and when the OSDs are
> getting laggy, the krbd connection comes to a grinding halt, to a point
> that all IO is staling and we can't even unmap the rbd device.
>
> From the logs, it looks like the cluster starts to snaptrim a lot of
> PGs, then PGs become laggy and then the cluster snowballs into laggy OSDs.
> I have attached the monitor log and the osd log (from one OSD) around the
> time where it happened.
>
> - is this a known issue?
> - what can I do to debug it further?
> - can I downgrade back to nautilus?
> - should I upgrade the PGs for the pool to 4096 or 8192?
>
> The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
> where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
> have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
> show anything for the timeframe.
>
> Cluster stats:
>   cluster:
> id: 74313356-3b3d-43f3-bce6-9fb0e4591097
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
> 25h)
> mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
> ceph-rbd-mon6
> osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
>
>   data:
> pools:   4 pools, 2241 pgs
> objects: 25.43M objects, 82 TiB
> usage:   231 TiB used, 187 TiB / 417 TiB avail
> pgs: 2241 active+clean
>
>   io:
> client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr
>
> --- RAW STORAGE ---
> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> ssd    417 TiB  187 TiB  230 TiB   231 TiB      55.30
> TOTAL  417 TiB  187 TiB  230 TiB   231 TiB      55.30
>
> --- POOLS ---
> POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
> isos                    7    64  455 GiB  117.92k  1.3 TiB   1.17     38 TiB
> rbd                     8  2048   76 TiB   24.65M  222 TiB  66.31     38 TiB
> archive                 9   128  2.4 TiB  669.59k  7.3 TiB   6.06     38 TiB
> device_health_metrics  10     1   25 MiB      149   76 MiB      0     38 TiB
>
>
>
> --
> The self-help group "UTF-8 Problems" will meet in the large hall this time,
> as an exception.
>


-- 
The self-help group "UTF-8 Problems" will meet in the large hall this time,
as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io