[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-23 Thread duluxoz
Hi Alexander, DOH! Thanks for pointing out my typo - I missed it, and yes, it was my issue.  :-) New issue (sort of): the new RBD image needs to be 2 TB in size (it's for a MariaDB database/data warehouse). However, I'm getting the following errors: ~~~ mkfs.xfs: pwrite failed:
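One way to sanity-check this before re-running mkfs.xfs - a minimal sketch, with placeholder pool/image names ("rbd_pool"/"db_image"); the point is just to confirm the size the kernel actually exposes before formatting:

~~~
# verify the size Ceph thinks the image has, and the size the kernel exposes
rbd info rbd_pool/db_image
rbd device map rbd_pool/db_image          # maps to e.g. /dev/rbd0
blockdev --getsize64 /dev/rbd0
# if the image turns out smaller than intended, grow it and format again
rbd resize --size 2T rbd_pool/db_image
mkfs.xfs /dev/rbd0
~~~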

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hi Torkil, Thanks for the update. Even though the improvement is small, it is still an improvement, consistent with the osd_max_backfills value, and it proves that there are still unsolved peering issues. I have looked at both the old and the new state of the PG, but could not find anything else

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Torkil Svensgaard
Hi Alex New query output attached after restarting both OSDs. OSD 237 is no longer mentioned, but unfortunately it made no difference for the number of backfills, which went 59->62->62. Best regards, Torkil On 23-03-2024 22:26, Alexander E. Patrakov wrote: Hi Torkil, I have looked at the files that

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hi Torkil, I have looked at the files that you attached. They were helpful: pool 11 is problematic, it complains about degraded objects for no obvious reason. I think that is the blocker. I also noted that you mentioned peering problems, and I suspect that they are not completely resolved. As a
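One way to look at the complaint about pool 11 - a minimal sketch, with a placeholder PG id:

~~~
# list PGs in pool 11 that currently report degraded objects
ceph pg ls 11 degraded
# dump the full state of one of them (the PG id is a placeholder)
ceph pg 11.2f query > pg_11.2f_query.json
~~~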

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hi Torkil, I have looked at the CRUSH rules, and the equivalent rules work on my test cluster. So this cannot be the cause of the blockage. What happens if you increase the osd_max_backfills setting temporarily? It may be a good idea to investigate a few of the stalled PGs. Please run commands
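The suggested experiment could look roughly like this (sketch only; the PG id is a placeholder, and on Quincy/Reef with the mclock scheduler the backfill limit may be ignored unless the override flag is set):

~~~
ceph config get osd osd_max_backfills
# let mclock-scheduled OSDs honour manually set recovery/backfill limits
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 4        # temporary, revert afterwards
# then pick a few stalled PGs and dump their state
ceph pg ls backfill_wait | head
ceph pg 37.1a query > pg_37.1a_query.json
~~~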

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Torkil Svensgaard
On 23-03-2024 18:43, Alexander E. Patrakov wrote: Hi Torkil, Unfortunately, your files contain nothing obviously bad or suspicious, except for two things: more PGs than usual and bad balance. What's your "mon max pg per osd" setting? [root@lazy ~]# ceph config get mon mon_max_pg_per_osd

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Torkil Svensgaard
On 23-03-2024 19:05, Alexander E. Patrakov wrote: Sorry for replying to myself, but "ceph osd pool ls detail" by itself is insufficient. For every erasure code profile mentioned in the output, please also run something like this: ceph osd erasure-code-profile get prf-for-ec-data ...where

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Sorry for replying to myself, but "ceph osd pool ls detail" by itself is insufficient. For every erasure code profile mentioned in the output, please also run something like this: ceph osd erasure-code-profile get prf-for-ec-data ...where "prf-for-ec-data" is the name that appears after the
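In other words, something like:

~~~
ceph osd erasure-code-profile ls
ceph osd erasure-code-profile get prf-for-ec-data
~~~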

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hi Torkil, I take my previous response back. You have an erasure-coded pool with nine shards but only three datacenters. This, in general, cannot work. You need either nine datacenters or a very custom CRUSH rule. The second option may not be available if the current EC setup is already
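For reference, a custom rule placing nine shards as 3+3+3 across three datacenters usually follows the pattern sketched below (rule name and id are placeholders; this illustrates the general shape, not the poster's actual rule):

~~~
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# append a rule along these lines, then recompile and inject it:
#
#   rule ec_3dc {
#       id 99
#       type erasure
#       step set_chooseleaf_tries 5
#       step set_choose_tries 100
#       step take default
#       step choose indep 3 type datacenter
#       step chooseleaf indep 3 type host
#       step emit
#   }
#
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin
~~~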

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hi Torkil, Unfortunately, your files contain nothing obviously bad or suspicious, except for two things: more PGs than usual and bad balance. What's your "mon max pg per osd" setting? On Sun, Mar 24, 2024 at 1:08 AM Torkil Svensgaard wrote: > > On 2024-03-23 17:54, Kai Stian Olstad wrote: > >

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Kai Stian Olstad
On Sat, Mar 23, 2024 at 12:09:29PM +0100, Torkil Svensgaard wrote: The other output is too big for pastebin and I'm not familiar with paste services, any suggestion for a preferred way to share such output? You can attach files to the mail here on the list. -- Kai Stian Olstad

[ceph-users] Re: Are we logging IRC channels?

2024-03-23 Thread Anthony D'Atri
I fear this will raise controversy, but in 2024 what’s the value in perpetuating an interface from early 1980s BITnet batch operating systems? > On Mar 23, 2024, at 5:45 AM, Janne Johansson wrote: > >> Sure! I think Wido just did it all unofficially, but afaik we've lost >> all of those

[ceph-users] Re: log_latency slow operation observed for submit_transact, latency = 22.644258499s

2024-03-23 Thread Torkil Svensgaard
Hi guys Thanks for the suggestions, we'll do the offline compaction and see how big an impact it will have. Even if compact-on-iteration should take care of it, doing it offline should avoid I/O problems during the compaction, correct? That's why offline is preferred to online? Best regards,
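Roughly, yes: an online compaction runs inside the OSD while it keeps serving I/O, whereas the offline variant runs with the OSD stopped. A sketch of both (OSD id and data path are placeholders; paths and units differ on cephadm/container deployments):

~~~
# online, one OSD at a time
ceph tell osd.0 compact
# offline, with the OSD stopped, so client I/O does not compete with the compaction
systemctl stop ceph-osd@0
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact
systemctl start ceph-osd@0
~~~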

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Torkil Svensgaard
On 23-03-2024 10:44, Alexander E. Patrakov wrote: Hello Torkil, Hi Alexander It would help if you provided the whole "ceph osd df tree" and "ceph pg ls" outputs. Of course, here's ceph osd df tree to start with: https://pastebin.com/X50b2W0J The other output is too big for pastebin and

[ceph-users] Re: Are we logging IRC channels?

2024-03-23 Thread Janne Johansson
> Sure! I think Wido just did it all unofficially, but afaik we've lost > all of those records now. I don't know if Wido still reads the mailing > list but he might be able to chime in. There was a ton of knowledge in > the irc channel back in the day. With slack, it feels like a lot of >

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Alexander E. Patrakov
Hello Torkil, It would help if you provided the whole "ceph osd df tree" and "ceph pg ls" outputs. On Sat, Mar 23, 2024 at 4:26 PM Torkil Svensgaard wrote: > > Hi > > We have this after adding some hosts and changing crush failure domain > to datacenter: > > pgs: 1338512379/3162732055

[ceph-users] Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Torkil Svensgaard
Hi We have this after adding some hosts and changing crush failure domain to datacenter: pgs: 1338512379/3162732055 objects misplaced (42.321%) 5970 active+remapped+backfill_wait 4853 active+clean 11 active+remapped+backfilling We have 3 datacenters each
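When backfill is this slow on Quincy/Reef, one thing worth checking (a sketch, not a conclusion from this thread) is whether the mclock scheduler is throttling recovery in favour of client I/O:

~~~
ceph config get osd osd_op_queue          # "mclock_scheduler" on recent releases
ceph config get osd osd_mclock_profile
# temporarily favour recovery/backfill over client traffic; revert when done
ceph config set osd osd_mclock_profile high_recovery_ops
~~~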

[ceph-users] Re: Are we logging IRC channels?

2024-03-23 Thread Alvaro Soto
Sweet, I started the irc/slack/discord bridge as unofficial and now it's official, so it's only a little effort to make this happen. Btw, I'm concerned about the crimson Slack channel not being bridged to IRC, and not being logged, so I'll start working on it. Cheers! On Fri, Mar 22, 2024, 7:28 PM

[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-23 Thread Alexander E. Patrakov
Hello Dulux-Oz, Please treat the RBD as a normal block device. Therefore, "mkfs" needs to be run before mounting it. The mistake is that you run "mkfs xfs" instead of "mkfs.xfs" (space vs dot). And, you are not limited to xfs, feel free to use ext4 or btrfs or any other block-based filesystem.
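So the flow is the usual one for any block device, e.g. (device path is an example; check "rbd device list" for the real one):

~~~
mkfs.xfs /dev/rbd0        # note the dot: "mkfs xfs" is not a valid invocation
mkdir -p /mnt/data
mount /dev/rbd0 /mnt/data
~~~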

[ceph-users] Re: Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread duluxoz
On 23/03/2024 18:25, Konstantin Shalygin wrote: Hi, Yes, this is a generic solution for end-user mounts - a Samba gateway k Sent from my iPhone Thanks Konstantin, I really appreciate the help

[ceph-users] Re: Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread duluxoz
On 23/03/2024 18:22, Alexander E. Patrakov wrote: On Sat, Mar 23, 2024 at 3:08 PM duluxoz wrote: Almost right. Please set up a cluster of two SAMBA servers with CTDB, for high availability. Cool - thanks Alex, I really appreciate it :-)
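A minimal share definition for such a gateway might look like the fragment below (sketch only; the share name and CephX user are placeholders, and each node additionally needs CTDB configured with a shared recovery lock, e.g. on CephFS):

~~~
# /etc/samba/smb.conf (fragment)
[cephfs]
    path = /
    vfs objects = ceph
    ceph:config_file = /etc/ceph/ceph.conf
    ceph:user_id = samba
    kernel share modes = no
    read only = no
~~~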

[ceph-users] Re: Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread Konstantin Shalygin
Hi, Yes, this is a generic solution for end-user mounts - a Samba gateway k Sent from my iPhone > On 23 Mar 2024, at 12:10, duluxoz wrote: > > Hi Alex, and thanks for getting back to me so quickly (I really appreciate > it), > > So from what you said it looks like we've got the wrong

[ceph-users] Re: Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread Alexander E. Patrakov
On Sat, Mar 23, 2024 at 3:08 PM duluxoz wrote: > > > On 23/03/2024 18:00, Alexander E. Patrakov wrote: > > Hi Dulux-Oz, > > > > CephFS is not designed to deal with mobile clients such as laptops > > that can lose connectivity at any time. And I am not talking about the > > inconveniences on the

[ceph-users] Re: Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread duluxoz
On 23/03/2024 18:00, Alexander E. Patrakov wrote: Hi Dulux-Oz, CephFS is not designed to deal with mobile clients such as laptops that can lose connectivity at any time. And I am not talking about the inconveniences on the laptop itself, but about problems that your laptop would cause to

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-23 Thread Frédéric Nass
Considering https://github.com/ceph/ceph/blob/f6edcef6efe209e8947887752bd2b833d0ca13b7/src/osd/OSD.cc#L10086, the OSD: - always sets and updates its per osd osd_mclock_max_capacity_iops_{hdd,ssd} when the benchmark occurs and its measured iops is below or equal to
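To see what the benchmark has stored, and to clear or pin a value that looks off, something along these lines should work (osd.12 and the 450 IOPS figure are placeholders):

~~~
ceph config dump | grep osd_mclock_max_capacity_iops
# drop a stored value (the OSD falls back to the default and may re-measure at restart)
ceph config rm osd.12 osd_mclock_max_capacity_iops_hdd
# or pin it explicitly if the measured value is clearly wrong
ceph config set osd.12 osd_mclock_max_capacity_iops_hdd 450
~~~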

[ceph-users] Re: Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread Alexander E. Patrakov
Hi Dulux-Oz, CephFS is not designed to deal with mobile clients such as laptops that can lose connectivity at any time. And I am not talking about the inconveniences on the laptop itself, but about problems that your laptop would cause to other clients. The problems stem from the fact that MDSes
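The practical symptom is usually stale client sessions holding caps; they can be inspected and, if necessary, evicted on the MDS (sketch; the mds target and session id are placeholders):

~~~
ceph tell mds.0 session ls
ceph tell mds.0 session evict id=4305
~~~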

[ceph-users] Mounting A RBD Via Kernel Modules

2024-03-23 Thread duluxoz
Hi All, I'm trying to mount a Ceph Reef (v18.2.2 - latest version) RBD Image as a 2nd HDD on a Rocky Linux v9.3 (latest version) host. The EC pool has been created and initialised and the image has been created. The ceph-common package has been installed on the host. The correct keyring
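For reference, the kernel-client path usually looks like the sketch below (pool, image, user and keyring names are placeholders; note that an RBD image normally keeps its metadata in a replicated pool and uses the EC pool only via --data-pool):

~~~
rbd device map rbd_pool/db_image --id dbuser \
    --keyring /etc/ceph/ceph.client.dbuser.keyring
rbd device list                               # shows e.g. /dev/rbd0
# make the mapping persistent across reboots
echo "rbd_pool/db_image id=dbuser,keyring=/etc/ceph/ceph.client.dbuser.keyring" \
    >> /etc/ceph/rbdmap
systemctl enable rbdmap.service
~~~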

[ceph-users] Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread duluxoz
Hi All, I'm looking for some help/advice to solve the issue outlined in the heading. I'm running CephFS (name: cephfs) on a Ceph Reef (v18.2.2 - latest update) cluster, connecting from a laptop running Rocky Linux v9.3 (latest update) with KDE v5 (latest update). I've set up the laptop to
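One kernel-client mount option sometimes used for intermittently connected clients is recover_session=clean, which lets a blocklisted client reconnect after waking up; it does not make CephFS a good fit for laptops, as the replies in this thread point out. A sketch (the CephX user name is a placeholder; monitor addresses are resolved from /etc/ceph/ceph.conf):

~~~
mount -t ceph :/ /mnt/cephfs -o name=laptopuser,recover_session=clean
~~~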