[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Torkil Svensgaard
On 25-03-2024 23:07, Kai Stian Olstad wrote: On Mon, Mar 25, 2024 at 10:58:24PM +0100, Kai Stian Olstad wrote: On Mon, Mar 25, 2024 at 09:28:01PM +0100, Torkil Svensgaard wrote: My tally came to 412 out of 539 OSDs showing up in a blocked_by list and that is about every OSD with data prior

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Torkil Svensgaard
On 25-03-2024 22:58, Kai Stian Olstad wrote: On Mon, Mar 25, 2024 at 09:28:01PM +0100, Torkil Svensgaard wrote: My tally came to 412 out of 539 OSDs showing up in a blocked_by list and that is about every OSD with data prior to adding ~100 empty OSDs. How 400 read targets and 100 write

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Kai Stian Olstad
On Mon, Mar 25, 2024 at 10:58:24PM +0100, Kai Stian Olstad wrote: On Mon, Mar 25, 2024 at 09:28:01PM +0100, Torkil Svensgaard wrote: My tally came to 412 out of 539 OSDs showing up in a blocked_by list and that is about every OSD with data prior to adding ~100 empty OSDs. How 400 read targets

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Kai Stian Olstad
On Mon, Mar 25, 2024 at 09:28:01PM +0100, Torkil Svensgaard wrote: My tally came to 412 out of 539 OSDs showing up in a blocked_by list and that is about every OSD with data prior to adding ~100 empty OSDs. How 400 read targets and 100 write targets can only equal ~60 backfills with
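
A hedged sketch of how such a tally could be reproduced (not necessarily the exact method used in the thread): count the PGs currently in backfill states and ask the cluster which OSDs are reported as blocking others.

  # rough counts of running and queued backfills (each includes one header line)
  ceph pg ls backfilling | wc -l
  ceph pg ls backfill_wait | wc -l

  # OSDs currently reported as blocking other OSDs, with per-OSD counts
  ceph osd blocked-by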

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Torkil Svensgaard
Neither downing nor restarting the OSD cleared the bogus blocked_by. I guess it makes no sense to look further at blocked_by as the cause when the data can't be trusted and there is no obvious smoking gun like a few OSDs blocking everything. My tally came to 412 out of 539 OSDs showing up in a

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Anthony D'Atri
First try "ceph osd down 89" > On Mar 25, 2024, at 15:37, Alexander E. Patrakov wrote: > > On Mon, Mar 25, 2024 at 7:37 PM Torkil Svensgaard wrote: >> >> >> >> On 24/03/2024 01:14, Torkil Svensgaard wrote: >>> On 24-03-2024 00:31, Alexander E. Patrakov wrote: Hi Torkil, >>> >>> Hi

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Alexander E. Patrakov
On Mon, Mar 25, 2024 at 7:37 PM Torkil Svensgaard wrote: > On 24/03/2024 01:14, Torkil Svensgaard wrote: > > On 24-03-2024 00:31, Alexander E. Patrakov wrote: > >> Hi Torkil, > > Hi Alexander > >> Thanks for the update. Even though the improvement is small, it is > >> still an

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Torkil Svensgaard
On 24/03/2024 01:14, Torkil Svensgaard wrote: On 24-03-2024 00:31, Alexander E. Patrakov wrote: Hi Torkil, Hi Alexander Thanks for the update. Even though the improvement is small, it is still an improvement, consistent with the osd_max_backfills value, and it proves that there are still

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-24 Thread Torkil Svensgaard
On 24-03-2024 13:41, Tyler Stachecki wrote: On Sat, Mar 23, 2024, 4:26 AM Torkil Svensgaard wrote: Hi ... Using mclock with high_recovery_ops profile. What is the bottleneck here? I would have expected a huge number of simultaneous backfills. Backfill reservation logjam? mClock is very

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-24 Thread Tyler Stachecki
On Sat, Mar 23, 2024, 4:26 AM Torkil Svensgaard wrote: > Hi > > ... Using mclock with high_recovery_ops profile. > > What is the bottleneck here? I would have expected a huge number of > simultaneous backfills. Backfill reservation logjam? > mClock is very buggy in my experience and frequently
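
A sketch of the mClock-related knobs discussed here, assuming a Quincy or later release; the values are examples, not recommendations.

  # favour recovery/backfill over client I/O
  ceph config set osd osd_mclock_profile high_recovery_ops

  # allow osd_max_backfills / osd_recovery_max_active to take effect while mClock is active
  ceph config set osd osd_mclock_override_recovery_settings true

  # or bypass mClock entirely by switching back to the wpq scheduler (OSD restart required)
  ceph config set osd osd_op_queue wpq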

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hi Torkil, Thanks for the update. Even though the improvement is small, it is still an improvement, consistent with the osd_max_backfills value, and it proves that there are still unsolved peering issues. I have looked at both the old and the new state of the PG, but could not find anything else

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Torkil Svensgaard
Hi Alex New query output attached after restarting both OSDs. OSD 237 is no longer mentioned but it unfortunately made no difference to the number of backfills, which went 59->62->62. Best regards, Torkil On 23-03-2024 22:26, Alexander E. Patrakov wrote: Hi Torkil, I have looked at the files that

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hi Torkil, I have looked at the files that you attached. They were helpful: pool 11 is problematic, it complains about degraded objects for no obvious reason. I think that is the blocker. I also noted that you mentioned peering problems, and I suspect that they are not completely resolved. As a
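
The per-PG inspection referred to in this exchange can be done roughly as follows; the pool name and the PG id 11.0 are placeholders, not values taken from the thread.

  # list PGs of the problematic pool that are not active+clean
  ceph pg ls-by-pool <pool name> | grep -v 'active+clean'

  # dump the full peering/backfill state of one suspect PG, including blocked_by
  ceph pg 11.0 query > pg-11.0-query.json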

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hi Torkil, I have looked at the CRUSH rules, and the equivalent rules work on my test cluster. So this cannot be the cause of the blockage. What happens if you increase the osd_max_backfills setting temporarily? It may be a good idea to investigate a few of the stalled PGs. Please run commands
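
Sketched out, a temporary osd_max_backfills bump looks like this; the value 4 is arbitrary, and on recent releases it only takes effect if mClock is overridden or bypassed as noted earlier in the thread.

  ceph config set osd osd_max_backfills 4

  # verify what a given OSD actually runs with
  ceph config show osd.0 osd_max_backfills

  # revert when done
  ceph config rm osd osd_max_backfills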

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Torkil Svensgaard
On 23-03-2024 18:43, Alexander E. Patrakov wrote: Hi Torkil, Unfortunately, your files contain nothing obviously bad or suspicious, except for two things: more PGs than usual and bad balance. What's your "mon max pg per osd" setting? [root@lazy ~]# ceph config get mon mon_max_pg_per_osd
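
A sanity check along the lines of the question above, sketched as an assumption about what is worth comparing: the configured per-OSD PG limit versus the actual per-OSD PG counts.

  ceph config get mon mon_max_pg_per_osd

  # compare against the PGS column in the per-OSD listing
  ceph osd df tree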

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Torkil Svensgaard
On 23-03-2024 19:05, Alexander E. Patrakov wrote: Sorry for replying to myself, but "ceph osd pool ls detail" by itself is insufficient. For every erasure code profile mentioned in the output, please also run something like this: ceph osd erasure-code-profile get prf-for-ec-data ...where

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Sorry for replying to myself, but "ceph osd pool ls detail" by itself is insufficient. For every erasure code profile mentioned in the output, please also run something like this: ceph osd erasure-code-profile get prf-for-ec-data ...where "prf-for-ec-data" is the name that appears after the
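
Spelled out, with prf-for-ec-data kept as the placeholder profile name used above:

  # note the erasure code profile names mentioned in the pool listing
  ceph osd pool ls detail

  # then dump each profile
  ceph osd erasure-code-profile ls
  ceph osd erasure-code-profile get prf-for-ec-data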

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hi Torkil, I take my previous response back. You have an erasure-coded pool with nine shards but only three datacenters. This, in general, cannot work. You need either nine datacenters or a very custom CRUSH rule. The second option may not be available if the current EC setup is already
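
For context, the "very custom CRUSH rule" hinted at here usually takes the following shape for a k+m=9 erasure-coded pool spread over three datacenters (three shards per datacenter). This is a generic sketch with a made-up rule name and id, not the rule from this cluster.

  rule ec-9-shards-3-dcs {
      id 99
      type erasure
      step set_chooseleaf_tries 5
      step set_choose_tries 100
      step take default
      step choose indep 3 type datacenter
      step chooseleaf indep 3 type host
      step emit
  }

With three shards per datacenter the data remains recoverable after losing a whole datacenter only if m is at least 3, e.g. a 6+3 or 5+4 profile.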

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hi Torkil, Unfortunately, your files contain nothing obviously bad or suspicious, except for two things: more PGs than usual and bad balance. What's your "mon max pg per osd" setting? On Sun, Mar 24, 2024 at 1:08 AM Torkil Svensgaard wrote: > > On 2024-03-23 17:54, Kai Stian Olstad wrote: > >

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Kai Stian Olstad
On Sat, Mar 23, 2024 at 12:09:29PM +0100, Torkil Svensgaard wrote: The other output is too big for pastebin and I'm not familiar with paste services, any suggestion for a preferred way to share such output? You can attach files to the mail here on the list. -- Kai Stian Olstad

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Torkil Svensgaard
On 23-03-2024 10:44, Alexander E. Patrakov wrote: Hello Torkil, Hi Alexander It would help if you provided the whole "ceph osd df tree" and "ceph pg ls" outputs. Of course, here's ceph osd df tree to start with: https://pastebin.com/X50b2W0J The other output is too big for pastebin and

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hello Torkil, It would help if you provided the whole "ceph osd df tree" and "ceph pg ls" outputs. On Sat, Mar 23, 2024 at 4:26 PM Torkil Svensgaard wrote: > > Hi > > We have this after adding some hosts and changing crush failure domain > to datacenter: > > pgs: 1338512379/3162732055
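
The requested data can be collected like this; the output file names are arbitrary.

  ceph osd df tree > osd-df-tree.txt
  ceph pg ls > pg-ls.txt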