Hi Andreas,

Thanks for that piece of information.

I understand that transient lock migration is important under "normal"
operational conditions. The use case I have in mind is live migration, where
one might want to do a clean hand-over of a lock between two librbd clients.
Specifically, in our case we ended up with two VMs running on the same image,
which ultimately destroyed the local file system. Here, I could imagine that
the *eligibility* to acquire a transient lock could be managed by something
like an eligibility list, so that the orchestrator can, in one atomic
operation, mark the target client as eligible and the source client as not.
Then transient lock migration cannot happen to clients that should not get
the lock under any circumstances.
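
For context, my understanding is that a client can already pin the managed
exclusive lock explicitly via librbd, which is what disables the transparent
hand-over. A rough, untested python sketch with placeholder pool/image names:

import rados
import rbd

# Connect to the cluster (conffile, pool and image names are placeholders).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

with rbd.Image(ioctx, 'vm-disk-1') as image:
    # Take the managed exclusive lock explicitly. Held this way, it is not
    # handed over transparently to another client; someone would have to
    # break it on purpose.
    image.lock_acquire(rbd.RBD_LOCK_MODE_EXCLUSIVE)

    # ... I/O happens here ...

    # Release explicitly, e.g. right before the migration hand-over.
    image.lock_release()

cluster.shutdown()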

The orchestrator knows about this, but I don't see a way to record it in the
state of an RBD image so that the Ceph side could enforce it independently as
well. That would help dramatically with troubleshooting orchestrator errors,
because they would then be much less likely to lead to data corruption. I
could imagine a command that makes a client read-only no matter what, for
example by locking a client into read-only mode. Such a client should not be
able to acquire an exclusive write lock; a request for one should simply be
ignored.
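
As far as I can tell, the closest existing thing is a client voluntarily
opening the image read-only, but nothing on the Ceph side enforces that,
which is exactly the gap. A sketch with made-up names:

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

# The client chooses to open the image read-only. Writes through this handle
# fail, but nothing cluster-side stops the same client from re-opening the
# image read-write, which is the enforcement I am missing.
with rbd.Image(ioctx, 'vm-disk-1', read_only=True) as image:
    data = image.read(0, 4096)

cluster.shutdown()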

Maybe I misunderstand the rbd lock add documentation, but it doesn't seem to 
offer this kind of permission lock. It would be great if I'm wrong or if it 
could be added.
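
For what it's worth, the advisory locks behind "rbd lock add" look roughly
like this from the python bindings, and as far as I understand they only
record ownership, they do not block I/O from other clients. Another sketch
with made-up names:

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

with rbd.Image(ioctx, 'vm-disk-1') as image:
    # Take an advisory exclusive lock with an arbitrary cookie.
    # This is bookkeeping only; other clients can still write.
    image.lock_exclusive('migration-target')

    # Anyone can inspect the current lockers.
    for locker in image.list_lockers().get('lockers', []):
        print(locker)

    image.unlock('migration-target')

cluster.shutdown()

So even with such a lock held, a misbehaving client could still write to the
image unless it is fenced explicitly, which is why I would like a
cluster-side read-only mode.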

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Andreas Teuchert <a.teuch...@syseleven.de>
Sent: 19 January 2023 13:30:52
To: Frank Schilder; Ilya Dryomov
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Ceph rbd clients surrender exclusive lock in 
critical situation

Hi Frank,

one thing that might be relevant here: If you disable transparent lock
transitions, you cannot create snapshots of images that are in use in
such a way.

This may or may not be relevant in your case. I'm just mentioning it
because I myself was surprised by that.

Best regards,

Andreas

On 19.01.23 12:50, Frank Schilder wrote:
> Hi Ilya,
>
> thanks for the info, it did help. I agree, it's the orchestration layer's 
> responsibility to handle things right. I have a case open already with 
> support and it looks like there is indeed a bug on that side. I was mainly 
> after a way that ceph librbd clients could offer a safety net in case such 
> bugs occur. It's a bit like the four-eyes principle, having an orchestration 
> layer do things right is good, but having a second instance confirming the 
> same thing is much better. A bug in one layer will not cause a catastrophe, 
> because the second layer catches it.
>
> I'm not sure if the rbd lock capabilities are sufficiently powerful to 
> provide a command-line interface to that. The flag RBD_LOCK_MODE_EXCLUSIVE 
> seems to be the only way, and if qemu is not using it, there doesn't seem to 
> be a lot one can do in scripts.
>
> Thanks for your help and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
