On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdill...@redhat.com> wrote:
> Maxim, can you provide steps for a reproducer?

Yes, but it involves adding two artificial delays: one in tcmu-runner and
another in kernel iscsi. If you're willing to take the pains of recompiling
the kernel and tcmu-runner on one of the gateway nodes, I'll help to
reproduce.

The idea of the reproducer is simple: let's model a situation where two
stale requests are stuck in the kernel mailbox waiting to be consumed by
tcmu-runner, and another one is stuck in the iscsi layer, immediately after
the iscsi request was read from the socket. If we unblock tcmu-runner after
newer data went through another gateway, the first stale request will switch
the tcmu-runner state from LOCKED to UNLOCKED, then the second stale request
will trigger alua_thread to re-acquire the lock, so when the third request
comes to tcmu-runner, the lock is already reacquired and the request goes to
the OSD smoothly, overwriting newer data.

> On Wed, Mar 14, 2018 at 2:06 PM, Maxim Patlasov <mpatla...@skytap.com>
> wrote:
> > On Sun, Mar 11, 2018 at 5:10 PM, Mike Christie <mchri...@redhat.com>
> > wrote:
> >>
> >> On 03/11/2018 08:54 AM, shadow_lin wrote:
> >> > Hi Jason,
> >> > How is the old target gateway blacklisted? Is it a feature that the
> >> > target gateway (which can support active/passive multipath) should
> >> > provide, or is it done only by the rbd exclusive lock?
> >> > I think the exclusive lock only lets one client write to rbd at a
> >> > time, but another client can obtain the lock later when the lock is
> >> > released.
> >>
> >> For the case where we had the lock and it got taken:
> >>
> >> If IO was blocked, then unjammed and it has already passed the target
> >> level checks then the IO will be failed by the OSD due to the
> >> blacklisting. When we get IO errors from ceph indicating we are
> >> blacklisted the tcmu rbd layer will fail the IO indicating the state
> >> change and that the IO can be retried. We will also tell the target
> >> layer rbd does not have the lock anymore and to just stop the iscsi
> >> connection while we clean up the blacklisting, running commands and
> >> update our state.
> >
> > Mike, can you please give more details on how you tell the target layer
> > that rbd does not have the lock and to stop the iscsi connection? Which
> > tcmu-runner/kernel-target functions are used for that?
> >
> > In fact, I performed an experiment with three stale write requests stuck
> > on a blacklisted gateway, and one of them managed to overwrite newer
> > data. I followed all instructions from
> > http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/
> > and http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/, so I'm
> > interested in what I'm missing...
> >
> > Thanks,
> > Maxim
>
> --
> Jason
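
To make the sequence of the three stale requests easier to follow, here is a
toy model of it in Python. This is only a sketch under loud assumptions: it is
not tcmu-runner or kernel code, and every name in it (Osd, Gateway,
handle_write, simulate, "gw-A", "gw-B") is hypothetical. It assumes that taking
the lock blacklists the previous owner and clears the new owner's own blacklist
entry, and that a gateway in the UNLOCKED state reacts to the next command by
re-acquiring the lock and failing that command for retry.

```python
from dataclasses import dataclass, field

LOCKED, UNLOCKED = "LOCKED", "UNLOCKED"


@dataclass
class Osd:
    """Toy backing store: one block of data plus lock/blacklist bookkeeping."""
    data: str = "old"
    lock_owner: str = "gw-A"
    blacklisted: set = field(default_factory=set)

    def acquire_lock(self, gw):
        # Assumption: taking the lock blacklists the previous owner and
        # removes any blacklist entry of the new owner.
        if self.lock_owner != gw:
            self.blacklisted.add(self.lock_owner)
        self.blacklisted.discard(gw)
        self.lock_owner = gw

    def write(self, gw, payload):
        # A blacklisted gateway gets an error; anyone else overwrites the data.
        if gw in self.blacklisted:
            return False
        self.data = payload
        return True


@dataclass
class Gateway:
    """Toy gateway holding only its own (possibly stale) view of the lock."""
    name: str
    osd: Osd
    state: str = LOCKED

    def handle_write(self, payload):
        if self.state == UNLOCKED:
            # Models the request that triggers lock re-acquisition.
            self.osd.acquire_lock(self.name)
            self.state = LOCKED
            return "retry (lock re-acquired)"
        if self.osd.write(self.name, payload):
            return "written"
        # Models the request that discovers the blacklisting.
        self.state = UNLOCKED
        return "retry (blacklisted)"


def simulate():
    osd = Osd()
    gw_a = Gateway("gw-A", osd)                  # old active gateway
    gw_b = Gateway("gw-B", osd, state=UNLOCKED)  # gateway used after failover
    log = []
    # Three writes are stuck on gw-A; meanwhile newer data goes through gw-B.
    log.append(("gw-B", gw_b.handle_write("new")))
    log.append(("gw-B", gw_b.handle_write("new")))
    # gw-A is unblocked and consumes the three stale requests in order.
    for payload in ("stale-1", "stale-2", "stale-3"):
        log.append(("gw-A", gw_a.handle_write(payload)))
    return osd.data, log


if __name__ == "__main__":
    final_data, log = simulate()
    for gw, result in log:
        print(gw, result)
    print("final data:", final_data)
```

In this model the first stale request fails on the blacklist, the second
re-acquires the lock, and the third is written, so the final data is the stale
payload rather than the newer one. The real code paths may well differ; the
sketch only illustrates the ordering problem.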
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com