[ceph-users] Re: RBD Mirror - Failed to unlink peer
Can you share the 'ceph versions' output? Do you see the same behaviour when adding a snapshot schedule, e.g.:

    rbd -p mirror snapshot schedule add 30m

Unfortunately I can't reproduce it; creating those mirror snapshots manually still works for me.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: RBD Mirror - Failed to unlink peer
We have the rbd-mirror daemon running on both sites, but replication is one-way only: the daemon on the remote site is the only live one, and the one on the primary site is there only in case we ever need to set up two-way mirroring. It is not currently set up for any replication, so it makes sense that there is nothing in the log files on the primary site.

I'm not seeing any errors in the rbd-mirror daemon log at either end. The primary log is blank as expected, and the error appears on the primary when the snapshot is taken, so the remote cluster never sees any errors.

We receive the error whether we run the snapshot command manually or through cron, e.g. running the following on the primary site:

    # rbd mirror image snapshot ceph-ssd/vm-101-disk-1
    Snapshot ID: 58393
    2024-08-26T12:39:54.958+0100 7b5ad6a006c0 -1 librbd::mirror::snapshot::CreatePrimaryRequest: 0x7b5ac0019e60 handle_unlink_peer: failed to unlink peer: (2) No such file or directory

This appears in the console as the output of the command (we used to get only the "Snapshot ID: x" line), not in any rbd log files.

Hope that clarifies it?

Thanks.
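Since the message above reaches the console rather than a log file, a cron job could tell "snapshot taken, but with the unlink warning" apart from a real failure by inspecting the command output. The following is only a minimal sketch: the function name classify_snapshot_output and its three result strings are hypothetical, and the matching assumes the output looks like the sample quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): classify the combined
# stdout/stderr of "rbd mirror image snapshot <pool>/<image>",
# read from stdin. Prints one of:
#   ok                      - a snapshot ID was reported, no warning
#   ok-with-unlink-warning  - a snapshot ID was reported, plus the
#                             "failed to unlink peer" message
#   failed                  - no snapshot ID was reported at all
classify_snapshot_output() {
    out=$(cat)
    if ! printf '%s\n' "$out" | grep -Eq 'Snapshot ID: [0-9]+'; then
        echo "failed"
    elif printf '%s\n' "$out" | grep -q 'handle_unlink_peer: failed to unlink peer'; then
        echo "ok-with-unlink-warning"
    else
        echo "ok"
    fi
}
```

It could be used as "rbd mirror image snapshot ceph-ssd/vm-101-disk-1 2>&1 | classify_snapshot_output", so that a cron wrapper only raises an alert on "failed" while still counting how often the warning occurs.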
[ceph-users] Re: RBD Mirror - Failed to unlink peer
Hi,

I think I need some clarification. You have an rbd-mirror daemon running on the primary site although you have configured one-way mirroring only? And are the errors you see in the rbd-mirror daemon log on the primary site? Maybe the daemon was started or activated by accident (or was never disabled after some two-way mirror tests)? You don't need an rbd-mirror daemon on the primary site if you only mirror one way.
[ceph-users] Re: RBD Mirror - Failed to unlink peer
Thanks - I got sidetracked with other work, so I have only just got around to testing this.

Unfortunately, with rbd-mirror logging enabled on the source cluster I'm not seeing any events logged at all, whereas on the remote cluster I can see constant activity (mostly imageReplayer, mirrorStatusUpdater, etc. logs).

Currently our sync is one-way only (from source to remote), and the error appears on the source (i.e. as soon as the snapshot is taken). There is no error in the rbd mirror logs on the remote cluster, and nothing logged at all in the rbd mirror logs on the source cluster.
[ceph-users] Re: RBD Mirror - Failed to unlink peer
Hi,

sorry for the delayed response, I was on vacation. I would set the "debug_rbd_mirror" config to 15 (or higher) and then watch the logs:

    # ceph config set client.rbd-mirror. debug_rbd_mirror 15

Maybe that reveals something.

Regards,
Eugen
[ceph-users] Re: RBD Mirror - Failed to unlink peer
Thanks - hopefully I'll hear back from the devs then, as I can't find anything online about others encountering the same warning, but I surely can't be the only one!

Would it be the rbd subsystem whose debug level I should raise to 15, or is there another subsystem for rbd mirroring? What would be the best way to enable it (ceph config set client debug_rbd 20, then change back to 0/5 once done)?
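The set-then-revert idea in this message can be sketched as a small wrapper. This is an illustration under assumptions, not a recommended procedure: the helper names run and with_debug are hypothetical, the sketch assumes the message is emitted by librbd inside the rbd command itself (the "librbd::" prefix and the console output suggest so), and it reverts with "ceph config rm" so that the built-in default applies again instead of writing 0/5 back explicitly. DRY_RUN defaults to 1 here, so by default the commands are only printed.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Ceph): raise a debug option,
# run one command, then remove the override again.

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: with_debug WHO OPTION LEVEL COMMAND [ARGS...]
with_debug() {
    who=$1
    option=$2
    level=$3
    shift 3
    run ceph config set "$who" "$option" "$level"
    run "$@"
    status=$?
    # Removing the override restores the default debug level.
    run ceph config rm "$who" "$option"
    return "$status"
}
```

For example, "with_debug client debug_rbd 20 rbd mirror image snapshot ceph-ssd/vm-101-disk-1" prints the three commands it would run; setting DRY_RUN=0 in the environment executes them. The wrapper returns the exit status of the wrapped command, so the override is removed even when the snapshot command fails.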
[ceph-users] Re: RBD Mirror - Failed to unlink peer
Hi,

I don't have much to contribute, but according to the source code [1] this seems to be a non-fatal message (excerpt):

    void CreatePrimaryRequest::handle_unlink_peer(int r) {
      CephContext *cct = m_image_ctx->cct;
      ldout(cct, 15) << "r=" << r << dendl;
      if (r < 0) {
        lderr(cct) << "failed to unlink peer: " << cpp_strerror(r) << dendl;
        finish(0); // not fatal
        return;
      }

I guess if you increased the debug level to 15, you might see where exactly that message comes from. But I don't know how to get rid of it, so maybe one of the devs can comment on that.

Regards,
Eugen

[1] https://github.com/ceph/ceph/blob/v17.2.7/src/librbd/mirror/snapshot/CreatePrimaryRequest.cc#L260

Quoting Scott Cairns:

Hi,

Following the introduction of an additional node to our Ceph cluster, we've started to see unlink errors when taking an rbd mirror snapshot.

We've had RBD mirroring configured for over a year now and it has been working flawlessly. However, after we created OSDs on a new node we've been receiving the following error:

    librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f60c80056f0 handle_unlink_peer: failed to unlink peer: (2) No such file or directory

This appeared on around 3 of 150 snapshots on the first night and over the weeks has progressed to almost every snapshot.

What's odd is that the snapshot appears to be taken without any issues and does mirror to the DR site: the snapshot ID taken on the source side shows up on the destination side when checking rbd snap ls, and we've tested promoting an image on the DR site to confirm that the snapshot includes up-to-date data, which it does.

I can't see any other errors generated when the snapshot is taken that would identify which file or directory isn't found. Everything appears to be working okay; it's just generating an error during the snapshot.

I've also tried disabling and re-enabling mirroring on the disk, but it doesn't make any difference: there's no error on the initial mirror image or on the first snapshot taken after that, but every subsequent snapshot shows the error again.

Any ideas?

Thanks,
Scott