[ceph-users] Re: librbd hangs during large backfill

2023-07-20 Thread Jack Hayhurst
We did have a peering storm; we're past that portion of the backfill and are still experiencing new instances of RBD volumes hanging, so it is definitely not just the peering storm. We still have 22.184% of objects misplaced, with a large number of PGs left to backfill (around 75k). Our rbd pool is using about
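For anyone wanting to watch that progress, here is a minimal Python sketch that polls `ceph -s --format json`; the pgmap field names (misplaced_ratio, pgs_by_state) are assumptions based on recent Ceph releases, so adjust for your version:

import json
import subprocess
import time

def backfill_status():
    # `ceph -s --format json` prints the cluster status as JSON on stdout
    out = subprocess.run(
        ["ceph", "-s", "--format", "json"],
        capture_output=True, check=True, text=True,
    ).stdout
    pgmap = json.loads(out).get("pgmap", {})
    misplaced_pct = pgmap.get("misplaced_ratio", 0.0) * 100
    backfilling = sum(
        s.get("count", 0)
        for s in pgmap.get("pgs_by_state", [])
        if "backfill" in s.get("state_name", "")
    )
    return misplaced_pct, backfilling

if __name__ == "__main__":
    while True:
        misplaced_pct, backfilling = backfill_status()
        print(f"misplaced: {misplaced_pct:.3f}%  PGs in backfill states: {backfilling}")
        if misplaced_pct == 0 and backfilling == 0:
            break
        time.sleep(60)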

[ceph-users] Re: librbd hangs during large backfill

2023-07-18 Thread Anthony D'Atri
I've seen this dynamic contribute to a hypervisor with many attachments running out of system-wide file descriptors. > On Jul 18, 2023, at 16:21, Konstantin Shalygin wrote: > > Hi, > > Check your libvirt limits for qemu open files/sockets. It seems that when you added > new OSDs, your librbd client
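A quick way to see that system-wide pressure on a Linux hypervisor is to read /proc/sys/fs/file-nr; a minimal sketch, assuming the usual procfs layout:

def system_fd_usage():
    # /proc/sys/fs/file-nr holds: allocated handles, unused handles, maximum
    with open("/proc/sys/fs/file-nr") as f:
        allocated, _unused, maximum = (int(x) for x in f.read().split())
    return allocated, maximum

if __name__ == "__main__":
    allocated, maximum = system_fd_usage()
    print(f"system-wide file handles: {allocated}/{maximum} "
          f"({100.0 * allocated / maximum:.1f}% used)")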

[ceph-users] Re: librbd hangs during large backfill

2023-07-18 Thread Konstantin Shalygin
Hi, Check your libvirt limits for qemu open files/sockets. It seems that when you added new OSDs, your librbd client limit was reached. k Sent from my iPhone > On 18 Jul 2023, at 19:32, Wesley Dillingham wrote: > > Did your automation / process allow for stalls in between changes to allow > peering to
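One way to check those limits is to compare each qemu process's open fd count against its soft NOFILE limit; a rough Python sketch, assuming a Linux hypervisor and root access so /proc/<pid>/fd is readable for libvirt-managed qemu processes:

import os

def qemu_fd_usage():
    # Walk /proc for qemu processes and compare open fds to the soft NOFILE limit
    for pid in (p for p in os.listdir("/proc") if p.isdigit()):
        try:
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
            if not comm.startswith("qemu"):
                continue
            open_fds = len(os.listdir(f"/proc/{pid}/fd"))
            soft_limit = None
            with open(f"/proc/{pid}/limits") as f:
                for line in f:
                    if line.startswith("Max open files"):
                        soft_limit = int(line.split()[3])  # soft limit column
            yield pid, comm, open_fds, soft_limit
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # process exited or fd dir not readable

if __name__ == "__main__":
    for pid, comm, open_fds, soft_limit in qemu_fd_usage():
        print(f"pid {pid} ({comm}): {open_fds} open fds, soft NOFILE limit {soft_limit}")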

[ceph-users] Re: librbd hangs during large backfill

2023-07-18 Thread Wesley Dillingham
Did your automation / process allow for stalls in between changes so that peering could complete? My hunch is that you caused a very large peering storm (during peering a PG is inactive), which in turn caused your VMs to panic. If the RBDs are unmapped and re-mapped, do they still struggle?
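A minimal sketch of that stall-between-changes idea, polling `ceph -s --format json` until no PGs are left in peering-related states (the state names and the pgs_by_state field are assumptions based on recent Ceph releases, not taken from this thread):

import json
import subprocess
import time

UNSETTLED = ("peering", "activating", "stale", "unknown")

def pgs_not_settled():
    out = subprocess.run(
        ["ceph", "-s", "--format", "json"],
        capture_output=True, check=True, text=True,
    ).stdout
    states = json.loads(out).get("pgmap", {}).get("pgs_by_state", [])
    return sum(
        s.get("count", 0)
        for s in states
        if any(flag in s.get("state_name", "") for flag in UNSETTLED)
    )

def wait_for_peering(poll_seconds=10):
    # Block until no PGs report a peering/activating/stale/unknown state
    while (count := pgs_not_settled()) > 0:
        print(f"{count} PGs still settling; waiting...")
        time.sleep(poll_seconds)

if __name__ == "__main__":
    # Call between each CRUSH/OSD change instead of applying them all at once
    wait_for_peering()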