[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread Wesley Dillingham
I suspect this may be a network/firewall issue between the client and one OSD server. Perhaps the 100 MB RBD didn't have an object mapped to a PG whose primary is on this problematic OSD host, but the 2 TB RBD does. Just a theory. Respectfully, *Wes Dillingham*
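
One way to test this theory, sketched with the pool and image names that appear later in the thread (my_pool.meta, my_pool.data, my_image); the object name is a placeholder, not taken from the messages:
~~~
# Find which OSDs serve the image's data objects; substitute the block_name_prefix
# that "rbd info" actually reports for the placeholder below.
rbd info my_pool.meta/my_image                              # note block_name_prefix
rados -p my_pool.data ls | grep '<block_name_prefix>' | head
ceph osd map my_pool.data '<block_name_prefix>.0000000000000000'
# The last command prints the PG and the acting OSD set, i.e. which host is primary.
~~~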

[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread duluxoz
Hi Alexander, Already set (and confirmed by running the command again) - no good, I'm afraid. So I just restarted with a brand new image and ran the following commands on the Ceph cluster and the host respectively. Results are below: On the Ceph cluster: [code] rbd create --size 4T

[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread Alexander E. Patrakov
Hello Matthew, Are overwrites enabled on the erasure-coded pool? If not, here is how to fix it: ceph osd pool set my_pool.data allow_ec_overwrites true On Mon, Mar 25, 2024 at 11:17 AM duluxoz wrote: > > Hi Curt, > > Blockdev --getbsz: 4096 > > Rbd info my_pool.meta/my_image: > > ~~~ > > rbd
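
For reference, a check-then-set sequence for the setting Alexander names, using the pool name from the thread (a sketch, not copied from the message):
~~~
# Verify, then enable, EC overwrites on the data pool.
ceph osd pool get my_pool.data allow_ec_overwrites
ceph osd pool set my_pool.data allow_ec_overwrites true
~~~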

[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread duluxoz
Hi Curt, Blockdev --getbsz: 4096 Rbd info my_pool.meta/my_image: ~~~ rbd image 'my_image': size 4 TiB in 1048576 objects order 22 (4 MiB objects) snapshot_count: 0 id: 294519bf21a1af data_pool: my_pool.data block_name_prefix:

[ceph-users] el7 + nautilus rbd snapshot map + lvs mount crash

2024-03-24 Thread Marc
Looks like this procedure crashes the Ceph node. Tried this now for the 2nd time after updating, and it crashed again. el7 + nautilus -> rbd snapshot map -> lvs mount -> crash (the LVs don't even have duplicate names)
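
The reported sequence, reconstructed as a sketch; pool, image, snapshot, and LV names are hypothetical and Marc's exact commands are not in the message:
~~~
# Illustrative names throughout; this only mirrors the el7 + Nautilus steps described above.
rbd snap create mypool/myimage@snap1     # take a snapshot
rbd map mypool/myimage@snap1             # map the snapshot read-only via the kernel client
vgscan && vgchange -ay                   # activate any LVM volumes found on the mapped device
mount /dev/myvg/mylv /mnt                # mounting the LV is where the node reportedly crashes
~~~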

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-24 Thread Torkil Svensgaard
On 24-03-2024 13:41, Tyler Stachecki wrote: On Sat, Mar 23, 2024, 4:26 AM Torkil Svensgaard wrote: Hi ... Using mclock with high_recovery_ops profile. What is the bottleneck here? I would have expected a huge number of simultaneous backfills. Backfill reservation logjam? mClock is very
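
For context, the profile Torkil mentions can be confirmed and set with the standard config commands (a sketch, not taken from the message):
~~~
# Check, and if needed set, the mClock profile referenced above.
ceph config get osd osd_mclock_profile
ceph config set osd osd_mclock_profile high_recovery_ops
~~~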

[ceph-users] Re: log_latency slow operation observed for submit_transact, latency = 22.644258499s

2024-03-24 Thread Torkil Svensgaard
No latency spikes seen in the last 24 hours after manually compacting all the OSDs, so it seems to have solved it for us, at least. Thanks all. Best regards, Torkil On 23-03-2024 12:32, Torkil Svensgaard wrote: Hi guys Thanks for the suggestions, we'll do the offline compaction and see how big an impact it
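
Two common ways to run the compaction discussed here, sketched with an example OSD id; Torkil's exact procedure is not shown in the snippet:
~~~
# Online compaction of a running OSD (osd.12 is an illustrative id):
ceph tell osd.12 compact
# Offline compaction with the OSD stopped (path assumes the default data directory layout):
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
~~~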

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-24 Thread Tyler Stachecki
On Sat, Mar 23, 2024, 4:26 AM Torkil Svensgaard wrote: > Hi > > ... Using mclock with high_recovery_ops profile. > > What is the bottleneck here? I would have expected a huge number of > simultaneous backfills. Backfill reservation logjam? > mClock is very buggy in my experience and frequently
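
One way to see whether a reservation logjam is the bottleneck is to compare waiting versus running backfills (a sketch using standard commands, not taken from the thread):
~~~
# Many PGs in backfill_wait but only a few actually backfilling points at reservation throttling.
ceph pg dump pgs_brief 2>/dev/null | grep -c backfill_wait
ceph pg dump pgs_brief 2>/dev/null | grep -c backfilling
~~~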

[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread duluxoz
Hi Alwin, Command (as requested): rbd create --size 4T my_pool.meta/my_image --data-pool my_pool.data --image-feature exclusive-lock --image-feature deep-flatten --image-feature fast-diff --image-feature layering --image-feature object-map --image-feature data-pool On 24/03/2024 22:53,

[ceph-users] Re: ceph cluster extremely unbalanced

2024-03-24 Thread Matt Vandermeulen
Hi, I would expect that almost every PG in the cluster is going to have to move once you start standardizing CRUSH weights, and I wouldn't want to move data twice. My plan would look something like: - Make sure the cluster is healthy (no degraded PGs) - Set nobackfill, norebalance flags to
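
The flag-setting step of this plan looks like the following (standard commands; the rest of the plan is truncated above):
~~~
# Pause data movement while CRUSH weights are being standardized, then re-enable it.
ceph osd set nobackfill
ceph osd set norebalance
# ... adjust CRUSH weights here ...
ceph osd unset nobackfill
ceph osd unset norebalance
~~~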

[ceph-users] Re: ceph cluster extremely unbalanced

2024-03-24 Thread Alexander E. Patrakov
Hi Denis, My approach would be: 1. Run "ceph osd metadata" and see if you have a mix of 64K and 4K bluestore_min_alloc_size. If so, you cannot really use the built-in balancer, as it would result in a bimodal distribution instead of a proper balance, see https://tracker.ceph.com/issues/64715,
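
A one-liner for step 1, assuming jq is available and that the OSD metadata reports a bluestore_min_alloc_size field (a sketch, not from the message):
~~~
# Count how many OSDs report each min_alloc_size value.
ceph osd metadata | jq -r '.[].bluestore_min_alloc_size' | sort | uniq -c
~~~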

[ceph-users] ceph cluster extremely unbalanced

2024-03-24 Thread Denis Polom
Hi guys, recently I took over care of a Ceph cluster that is extremely unbalanced. The cluster is running Quincy 17.2.7 (upgraded Nautilus -> Octopus -> Quincy) and has 1428 OSDs (HDDs). We are running CephFS on it. The CRUSH failure domain is datacenter (there are 3), and the data pool is EC 3+3. This
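
For a first picture of the imbalance, the standard read-only commands below are enough (a sketch, not part of Denis's message):
~~~
# The summary lines of "ceph osd df" include MIN/MAX VAR and STDDEV across OSDs.
ceph osd df | tail -n 2
ceph balancer status      # is the built-in balancer enabled, and in which mode?
~~~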

[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread Curt
Hey Mathew, One more thing, out of curiosity: can you send the output of blockdev --getbsz on the rbd dev, and rbd info? I'm using 16TB rbd images without issue, but I haven't updated to reef .2 yet. Cheers, Curt On Sun, 24 Mar 2024, 11:12 duluxoz, wrote: > Hi Curt, > > Nope, no dropped

[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread duluxoz
Hi Curt, Nope, no dropped packets or errors - sorry, wrong tree :-) Thanks for chiming in. On 24/03/2024 20:01, Curt wrote: I may be barking up the wrong tree, but if you run ip -s link show yourNicID on this server or your OSDs do you see any errors/dropped/missed?

[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread Curt
I may be barking up the wrong tree, but if you run ip -s link show yourNicID on this server or your OSDs do you see any errors/dropped/missed? On Sun, 24 Mar 2024, 09:20 duluxoz, wrote: > Hi, > > Yeah, I've been testing various configurations since I sent my last > email - all to no avail. > >

[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread duluxoz
Hi, Yeah, I've been testing various configurations since I sent my last email - all to no avail. So I'm back to the start with a brand new 4T image which is rbd-mapped to /dev/rbd0. It's not formatted (yet) and so not mounted. Every time I attempt a mkfs.xfs /dev/rbd0 (or mkfs.xfs
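
The failing sequence, sketched with the names used in this thread; the -K variant is an extra idea, not something tried in the thread (it skips the initial discard, which on a large thin-provisioned image can make mkfs appear to hang):
~~~
# Names follow the thread; only the -K line is an addition not taken from the messages.
rbd map my_pool.meta/my_image        # appears as /dev/rbd0 on this host
mkfs.xfs /dev/rbd0                   # the step that reportedly hangs
mkfs.xfs -K /dev/rbd0                # -K disables the discard pass at mkfs time
~~~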

[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread Alexander E. Patrakov
Hi, Please test again, it must have been some network issue. A 10 TB RBD image is used here without any problems. On Sun, Mar 24, 2024 at 1:01 PM duluxoz wrote: > > Hi Alexander, > > DOH! > > Thanks for pointing out my typo - I missed it, and yes, it was my > issue. :-) > > New issue (sort