[ceph-users] Re: RBD Mirror - Failed to unlink peer

2024-08-27 Thread Eugen Block

Can you share 'ceph versions' output?
Do you see the same behaviour when adding a snapshot schedule, e.g.

rbd -p <pool> mirror snapshot schedule add 30m

Unfortunately, I can't reproduce it; creating those mirror snapshots
manually still works for me.
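
For anyone who wants to script the schedule test, here is a minimal Python sketch that only builds the command line and does not run it. The helper name `schedule_add_command` is hypothetical; the pool and image names are the ones mentioned elsewhere in this thread. It assumes the `--pool`/`--image` options of `rbd mirror snapshot schedule add` and an interval written as a number followed by m, h or d.

```python
import re


def schedule_add_command(pool, interval, image=None):
    """Build an `rbd mirror snapshot schedule add` argument list (not executed)."""
    # Assumption: intervals look like "30m", "2h" or "1d".
    if not re.fullmatch(r"\d+[mhd]", interval):
        raise ValueError(f"unexpected interval: {interval!r}")
    cmd = ["rbd", "mirror", "snapshot", "schedule", "add", "--pool", pool]
    if image is not None:
        # Restrict the schedule to one image instead of the whole pool.
        cmd += ["--image", image]
    cmd.append(interval)
    return cmd


# Pool and image names taken from the thread; adjust for your cluster.
print(" ".join(schedule_add_command("ceph-ssd", "30m", image="vm-101-disk-1")))
```

The resulting list can be handed to `subprocess.run` once the command looks right.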


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io





[ceph-users] Re: RBD Mirror - Failed to unlink peer

2024-08-27 Thread scott . cairns
We have an rbd-mirror daemon running on both sites; however, replication is only
one way (i.e. the one on the remote site is the only live one; the one on the
primary site is just there in case we ever need to set up two-way mirroring and
is not currently set up for any replication, so it makes sense that there's
nothing in the log files on the primary site, as it's doing nothing).

I'm not seeing any errors in the rbd-mirror daemon log at either end - primary
is blank as expected, and the error appears to be on the primary when the
snapshot is taken, so the remote cluster never sees any errors.

When we either manually run the command to take a snapshot, or have it run
through cron, we receive the error, e.g. when running the following on the
primary site:

# rbd mirror image snapshot ceph-ssd/vm-101-disk-1
Snapshot ID: 58393
2024-08-26T12:39:54.958+0100 7b5ad6a006c0 -1 librbd::mirror::snapshot::CreatePrimaryRequest: 0x7b5ac0019e60 handle_unlink_peer: failed to unlink peer: (2) No such file or directory


This appears on the console as part of the command's output (we used to get
only the "Snapshot ID: x" line), not in any rbd log files.
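
For cron jobs, it may help to separate the snapshot ID from the warning. Below is a minimal Python sketch, assuming the output looks exactly like the sample above; the function name `parse_snapshot_output` is hypothetical and not part of any Ceph tooling.

```python
import re


def parse_snapshot_output(text):
    """Split `rbd mirror image snapshot` output into (snapshot_id, warnings)."""
    snapshot_id = None
    warnings = []
    for line in text.splitlines():
        match = re.match(r"\s*Snapshot ID:\s*(\d+)\s*$", line)
        if match:
            snapshot_id = int(match.group(1))
        elif "failed to unlink peer" in line:
            # Keep the whole line so it can be logged or counted later.
            warnings.append(line.strip())
    return snapshot_id, warnings


# Sample copied from the console output quoted in this message.
sample = (
    "Snapshot ID: 58393\n"
    "2024-08-26T12:39:54.958+0100 7b5ad6a006c0 -1 "
    "librbd::mirror::snapshot::CreatePrimaryRequest: 0x7b5ac0019e60 "
    "handle_unlink_peer: failed to unlink peer: (2) No such file or directory\n"
)
snapshot_id, warnings = parse_snapshot_output(sample)
print(snapshot_id, len(warnings))
```

A wrapper could treat "ID present, warning present" as a soft failure worth counting rather than a failed snapshot.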

Hope that clarifies it? Thanks.


[ceph-users] Re: RBD Mirror - Failed to unlink peer

2024-08-26 Thread Eugen Block

Hi,

I think I need some clarification. You have an rbd-mirror daemon
running on the primary site although you have configured rbd mirroring
one-way only? And do you see those errors in the rbd-mirror daemon log
on the primary site?
Maybe the daemon got started/activated by accident (or it was not
disabled after some two-way mirror tests)? You don't need an rbd-mirror
daemon on the primary site if you mirror only one-way.





[ceph-users] Re: RBD Mirror - Failed to unlink peer

2024-08-24 Thread scott . cairns
Thanks - sidetracked with other work so only just got around to testing this.

Unfortunately, after enabling rbd-mirror logs on the source cluster I'm not
seeing any events logged at all; on the remote cluster, however, I can see
constant activity (mostly imageReplayer, mirrorStatusUpdater, etc. logs).

Currently our sync is only one way (from source to remote), and the error 
appears to be on the source (i.e. as soon as the snapshot is taken).

There's no error on the remote cluster in the rbd mirror logs, and nothing 
logged at all on the source cluster in the rbd mirror logs.


[ceph-users] Re: RBD Mirror - Failed to unlink peer

2024-07-08 Thread Eugen Block

Hi,

sorry for the delayed response, I was on vacation.

I would set the "debug_rbd_mirror" config to 15 (or higher) and then  
watch the logs:


# ceph config set client.rbd-mirror.<id> debug_rbd_mirror 15

Maybe that reveals something.
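
To make it easier to revert the setting afterwards, here is a minimal Python sketch that builds both the "raise" and the "revert" command without running them. The helper name `debug_commands` and the daemon id "a" are hypothetical placeholders. It assumes that removing the override with `ceph config rm` is how you want to return to the default.

```python
import shlex


def debug_commands(daemon_id, level=15):
    """Return (raise_cmd, revert_cmd) for the rbd-mirror debug level."""
    who = f"client.rbd-mirror.{daemon_id}"
    raise_cmd = ["ceph", "config", "set", who, "debug_rbd_mirror", str(level)]
    # Dropping the override restores whatever default applied before.
    revert_cmd = ["ceph", "config", "rm", who, "debug_rbd_mirror"]
    return raise_cmd, revert_cmd


# "a" is a made-up daemon id; substitute the id of your rbd-mirror daemon.
raise_cmd, revert_cmd = debug_commands("a")
print(shlex.join(raise_cmd))
print(shlex.join(revert_cmd))
```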

Regards,
Eugen



[ceph-users] Re: RBD Mirror - Failed to unlink peer

2024-06-30 Thread scott . cairns
Thanks - hopefully I'll hear back from devs then as I can't seem to find 
anything online about others encountering the same warning, but I surely can't 
be the only one!

Would it be the rbd subsystem I'm looking to increase to debug level 15 or is 
there another subsystem for rbd mirroring?
What would be the best way to enable it (ceph config set client debug_rbd 20 
then change back to 0/5 once done)?


[ceph-users] Re: RBD Mirror - Failed to unlink peer

2024-06-04 Thread Eugen Block

Hi,

I don't have much to contribute, but according to the source code [1]  
this seems to be a non-fatal message:


void CreatePrimaryRequest::handle_unlink_peer(int r) {
  CephContext *cct = m_image_ctx->cct;
  ldout(cct, 15) << "r=" << r << dendl;

  if (r < 0) {
    lderr(cct) << "failed to unlink peer: " << cpp_strerror(r) << dendl;
    finish(0); // not fatal
    return;
  }

I guess if you increased the debug level to 15, you might see where
exactly that message comes from. But I don't know how to get rid of
it, so maybe one of the devs can comment on that.
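
As a side note on reading the message itself: judging by the log line, the "(2)" is the errno number and the text after it is the matching description. Here is a minimal Python sketch that decodes it; the helper name `decode_cpp_strerror` is hypothetical, and the mapping of 2 to ENOENT assumes a Linux host.

```python
import errno
import os
import re


def decode_cpp_strerror(message):
    """Extract the errno from a trailing '(N) text' and return (N, name, text)."""
    match = re.search(r"\((\d+)\)\s+(.+)$", message)
    if match is None:
        return None
    number = int(match.group(1))
    # errno.errorcode maps numbers to symbolic names such as "ENOENT".
    return number, errno.errorcode.get(number, "unknown"), os.strerror(number)


line = "handle_unlink_peer: failed to unlink peer: (2) No such file or directory"
decoded = decode_cpp_strerror(line)
print(decoded)
```

So the unlink step reports that something it expected to find was missing, and the snippet above shows the request still finishing with 0.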


Regards,
Eugen

[1]  
https://github.com/ceph/ceph/blob/v17.2.7/src/librbd/mirror/snapshot/CreatePrimaryRequest.cc#L260


Quoting Scott Cairns:


Hi,

Following the introduction of an additional node to our Ceph  
cluster, we've started to see unlink errors when taking a rbd mirror  
snapshot.


We've had RBD mirroring configured for over a year now and it's been
working flawlessly; however, after we created OSDs on a new node
we've been receiving the following error:


librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f60c80056f0 handle_unlink_peer: failed to unlink peer: (2) No such file or directory


This seemed to appear on around 3 of 150 snapshots on the first  
night and over the weeks has progressed to almost every snapshot.


What's odd is that the snapshot appears to be taken without any
issues and does mirror to the DR site - we can see that the snapshot ID
taken on the source side is mirrored to the destination side when
checking rbd snap ls, and we've tested promoting an image on the
DR site to ensure the snapshot does include up-to-date data, which
it does.


I can't see any other errors generated when the snapshot is taken to
identify which file/directory isn't found - everything appears to be
working okay; it's just generating an error during the snapshot.



I've also tried disabling mirroring on the disk and re-enabling it;
however, it doesn't appear to make any difference - there's no error
on the initial mirror image, or on the first snapshot taken after that,
but every subsequent snapshot shows the error again.


Any ideas?

Thanks,
Scott


