Re: [ceph-users] rbd rm snap on image with exclusive lock
Thanks -- let me know. In the future, you may want to consider having
librbd create an admin socket so that you can change (certain) settings
or interact w/ the process w/o restarting it.

On Wed, Oct 25, 2017 at 9:54 AM, Piotr Dałek wrote:
> On 17-10-25 03:30 PM, Jason Dillaman wrote:
>> Hmm, hard to say off the top of my head. If you could enable "debug
>> librbd = 20" logging on the buggy client that owns the lock, create a
>> new snapshot, and attempt to delete it, it would be interesting to
>> verify that the image is being properly refreshed.
>
> I'd love to, but that would require us to restart that client - not an
> option. We'll try to reproduce this somehow anyway and let you know if
> something interesting shows up.
>
> --
> Piotr Dałek
> piotr.da...@corp.ovh.com
> https://www.ovh.com/us/

--
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
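Jason's suggestion above can be sketched roughly as follows. The socket path and client id are illustrative placeholders; the `admin socket` option and `ceph --admin-daemon ... config set` command are standard Ceph mechanisms:

```shell
# In the librbd client's ceph.conf, ask librbd to create an admin socket.
# The $pid/$cctid metavariables keep per-process sockets from colliding:
#
#   [client]
#       admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
#
# With the socket in place, logging can be raised on the live process
# without restarting it (socket path below is an example):
ceph --admin-daemon /var/run/ceph/ceph-client.cinder.12345.asok \
    config set debug_librbd 20/20

# ...and lowered again once the interesting events have been captured:
ceph --admin-daemon /var/run/ceph/ceph-client.cinder.12345.asok \
    config set debug_librbd 0/5
```

The process only picks up the `admin socket` option at startup, so it has to be configured before the long-running client is launched -- which is why it helps "in the future" rather than for the currently stuck client.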
Re: [ceph-users] rbd rm snap on image with exclusive lock
On 17-10-25 03:30 PM, Jason Dillaman wrote:
> Hmm, hard to say off the top of my head. If you could enable "debug
> librbd = 20" logging on the buggy client that owns the lock, create a
> new snapshot, and attempt to delete it, it would be interesting to
> verify that the image is being properly refreshed.

I'd love to, but that would require us to restart that client - not an
option. We'll try to reproduce this somehow anyway and let you know if
something interesting shows up.

--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
Re: [ceph-users] rbd rm snap on image with exclusive lock
Hmm, hard to say off the top of my head. If you could enable "debug
librbd = 20" logging on the buggy client that owns the lock, create a
new snapshot, and attempt to delete it, it would be interesting to
verify that the image is being properly refreshed.

On Wed, Oct 25, 2017 at 9:23 AM, Piotr Dałek wrote:
> On 17-10-25 02:39 PM, Jason Dillaman wrote:
>> That log is showing that a snap remove request was made from a client
>> that couldn't acquire the lock to a client that currently owns the
>> lock. The client that currently owns the lock responded w/ an -ENOENT
>> error that the snapshot doesn't exist. Depending on the maintenance
>> operation requested, different error codes are filtered out to handle
>> the case where Ceph double (or more) delivers the request message to
>> the lock owner. Normally this isn't an issue since the local client
>> pre-checks the image state before sending the RPC message (i.e. snap
>> remove will first locally ensure the snap exists and respond w/
>> -ENOENT if it doesn't).
>>
>> Therefore, in this case, the question is who is this rogue client that
>> still owns the lock and is responding to a snap remove request but
>> hasn't refreshed its state to know that the snapshot exists.
>
> Thanks, that makes things clear.
>
> Seems like we have some Cinders utilizing Infernalis (9.2.1) librbd. Are
> you aware of any bugs in 9.2.x that could cause such behavior? We've seen
> that for the first time...
>
> --
> Piotr Dałek
> piotr.da...@corp.ovh.com
> https://www.ovh.com/us/
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Jason
Re: [ceph-users] rbd rm snap on image with exclusive lock
On 17-10-25 02:39 PM, Jason Dillaman wrote:
> That log is showing that a snap remove request was made from a client
> that couldn't acquire the lock to a client that currently owns the
> lock. The client that currently owns the lock responded w/ an -ENOENT
> error that the snapshot doesn't exist. Depending on the maintenance
> operation requested, different error codes are filtered out to handle
> the case where Ceph double (or more) delivers the request message to
> the lock owner. Normally this isn't an issue since the local client
> pre-checks the image state before sending the RPC message (i.e. snap
> remove will first locally ensure the snap exists and respond w/
> -ENOENT if it doesn't).
>
> Therefore, in this case, the question is who is this rogue client that
> still owns the lock and is responding to a snap remove request but
> hasn't refreshed its state to know that the snapshot exists.

Thanks, that makes things clear.

Seems like we have some Cinders utilizing Infernalis (9.2.1) librbd. Are
you aware of any bugs in 9.2.x that could cause such behavior? We've seen
that for the first time...

--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
Re: [ceph-users] rbd rm snap on image with exclusive lock
That log is showing that a snap remove request was made from a client
that couldn't acquire the lock to a client that currently owns the
lock. The client that currently owns the lock responded w/ an -ENOENT
error that the snapshot doesn't exist. Depending on the maintenance
operation requested, different error codes are filtered out to handle
the case where Ceph double (or more) delivers the request message to
the lock owner. Normally this isn't an issue since the local client
pre-checks the image state before sending the RPC message (i.e. snap
remove will first locally ensure the snap exists and respond w/
-ENOENT if it doesn't).

Therefore, in this case, the question is who is this rogue client that
still owns the lock and is responding to a snap remove request but
hasn't refreshed its state to know that the snapshot exists.

On Wed, Oct 25, 2017 at 3:33 AM, Piotr Dałek wrote:
> Hello,
>
> I have a 10.2.5 Ceph cluster; there is an image with an exclusive lock
> that is being held by a client. Some other client creates a snapshot on
> that image, then (that client) goes away. Later, a third client attempts
> to remove that snapshot using rbd snap rm, but fails to do so, without
> reporting any error:
>
> 2017-10-24 09:50:29.585675 7f754cbb1d80  5 librbd::AioImageRequestWQ: 0x7f75579349f0 : ictx=0x7f7557932f50
> 2017-10-24 09:50:29.585759 7f754cbb1d80 20 librbd::ImageState: open
> 2017-10-24 09:50:29.585761 7f754cbb1d80 10 librbd::ImageState: 0x7f7557933d90 send_open_unlock
> 2017-10-24 09:50:29.585767 7f754cbb1d80 10 librbd::image::OpenRequest: 0x7f7557935360 send_v2_detect_header
> 2017-10-24 09:50:29.588006 7f752e205700 10 librbd::image::OpenRequest: handle_v2_detect_header: r=0
> 2017-10-24 09:50:29.588018 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_v2_get_id
> 2017-10-24 09:50:29.589679 7f752e205700 10 librbd::image::OpenRequest: handle_v2_get_id: r=0
> 2017-10-24 09:50:29.589698 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_v2_get_immutable_metadata
> 2017-10-24 09:50:29.598466 7f752e205700 10 librbd::image::OpenRequest: handle_v2_get_immutable_metadata: r=0
> 2017-10-24 09:50:29.598481 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_v2_get_stripe_unit_count
> 2017-10-24 09:50:29.601050 7f752e205700 10 librbd::image::OpenRequest: handle_v2_get_stripe_unit_count: r=-8
> 2017-10-24 09:50:29.601074 7f752e205700 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rbd_data.[redacted] format rbd_data.[redacted].%016llx
> 2017-10-24 09:50:29.601077 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_v2_apply_metadata: start_key=conf_
> 2017-10-24 09:50:29.603277 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 handle_v2_apply_metadata: r=0
> 2017-10-24 09:50:29.603294 7f752e205700 20 librbd::ImageCtx: apply_metadata
> 2017-10-24 09:50:29.603425 7f752e205700 20 librbd::ImageCtx: enabling caching...
> 2017-10-24 09:50:29.603432 7f752e205700 20 librbd::ImageCtx: Initial cache settings: size=33554432 num_objects=10 max_dirty=25165824 target_dirty=16777216 max_dirty_age=1
> 2017-10-24 09:50:29.603535 7f752e205700 10 librbd::ImageCtx: cache bytes 33554432 -> about 1048 objects
> 2017-10-24 09:50:29.603790 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_register_watch
> 2017-10-24 09:50:29.603867 7f752e205700 10 librbd::ImageWatcher: 0x7f75140017e0 registering image watcher
> 2017-10-24 09:50:29.606497 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 handle_register_watch: r=0
> 2017-10-24 09:50:29.606513 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_refresh
> 2017-10-24 09:50:29.606518 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 send_v2_get_mutable_metadata
> 2017-10-24 09:50:29.619931 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 handle_v2_get_mutable_metadata: r=0
> 2017-10-24 09:50:29.619964 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 send_v2_get_flags
> 2017-10-24 09:50:29.626218 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 handle_v2_get_flags: r=0
> 2017-10-24 09:50:29.626239 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 send_v2_get_snapshots
> 2017-10-24 09:50:29.647141 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 handle_v2_get_snapshots: r=0
> 2017-10-24 09:50:29.647164 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 send_v2_init_exclusive_lock
> 2017-10-24 09:50:29.647173 7f752e205700 10 librbd::ExclusiveLock: 0x7f7514013510 init
> 2017-10-24 09:50:29.647177 7f752e205700  5 librbd::AioImageRequestWQ: block_writes: 0x7f7557932f50, num=1
> 2017-10-24 09:50:29.647227 7f752da04700 10 librbd::ExclusiveLock: 0x7f7514013510 handle_init_complete
> 2017-10-24 09:50:29.647237 7f752da04700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 handle_v2_init_exclusive_lock: r=0
> 2017-10-24 09:50:29.647
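The pre-check and error-filtering flow described above can be illustrated with a small sketch. All names here (`ImageView`, `request_snap_remove`) are hypothetical and are not the actual librbd code; the point is only to show how a stale lock owner turns into a silent no-op:

```python
# Hypothetical sketch of the snap-remove flow: the requester pre-checks
# its own refreshed view of the image, forwards the request to the
# exclusive-lock owner, and filters out -ENOENT from the owner because
# the message may legitimately be delivered (and applied) twice.
import errno

class ImageView:
    """A client's (possibly stale) cached view of an image's snapshots."""
    def __init__(self, snaps):
        self.snaps = set(snaps)

    def snap_remove(self, name):
        # Local handling on the lock owner: -ENOENT if the snap is unknown.
        if name not in self.snaps:
            return -errno.ENOENT
        self.snaps.discard(name)
        return 0

def request_snap_remove(requester, lock_owner, name):
    # Requester pre-checks its own (freshly refreshed) state first...
    if name not in requester.snaps:
        return -errno.ENOENT
    # ...then forwards the request to the exclusive-lock owner.
    r = lock_owner.snap_remove(name)
    # -ENOENT from the owner is filtered out: normally it just means the
    # duplicated request message already removed the snapshot.
    if r == -errno.ENOENT:
        r = 0
    return r

# Healthy case: both clients see the snapshot, removal succeeds.
owner, client = ImageView({"snap1"}), ImageView({"snap1"})
assert request_snap_remove(client, owner, "snap1") == 0
assert "snap1" not in owner.snaps

# Buggy case from this thread: the lock owner never refreshed, answers
# -ENOENT for a snapshot that really exists, and the filter silently
# turns that into "success" -- yet nothing was removed.
stale_owner, fresh_client = ImageView(set()), ImageView({"snap1"})
assert request_snap_remove(fresh_client, stale_owner, "snap1") == 0
```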
[ceph-users] rbd rm snap on image with exclusive lock
Hello,

I have a 10.2.5 Ceph cluster; there is an image with an exclusive lock
that is being held by a client. Some other client creates a snapshot on
that image, then (that client) goes away. Later, a third client attempts
to remove that snapshot using rbd snap rm, but fails to do so, without
reporting any error:

2017-10-24 09:50:29.585675 7f754cbb1d80  5 librbd::AioImageRequestWQ: 0x7f75579349f0 : ictx=0x7f7557932f50
2017-10-24 09:50:29.585759 7f754cbb1d80 20 librbd::ImageState: open
2017-10-24 09:50:29.585761 7f754cbb1d80 10 librbd::ImageState: 0x7f7557933d90 send_open_unlock
2017-10-24 09:50:29.585767 7f754cbb1d80 10 librbd::image::OpenRequest: 0x7f7557935360 send_v2_detect_header
2017-10-24 09:50:29.588006 7f752e205700 10 librbd::image::OpenRequest: handle_v2_detect_header: r=0
2017-10-24 09:50:29.588018 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_v2_get_id
2017-10-24 09:50:29.589679 7f752e205700 10 librbd::image::OpenRequest: handle_v2_get_id: r=0
2017-10-24 09:50:29.589698 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_v2_get_immutable_metadata
2017-10-24 09:50:29.598466 7f752e205700 10 librbd::image::OpenRequest: handle_v2_get_immutable_metadata: r=0
2017-10-24 09:50:29.598481 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_v2_get_stripe_unit_count
2017-10-24 09:50:29.601050 7f752e205700 10 librbd::image::OpenRequest: handle_v2_get_stripe_unit_count: r=-8
2017-10-24 09:50:29.601074 7f752e205700 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rbd_data.[redacted] format rbd_data.[redacted].%016llx
2017-10-24 09:50:29.601077 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_v2_apply_metadata: start_key=conf_
2017-10-24 09:50:29.603277 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 handle_v2_apply_metadata: r=0
2017-10-24 09:50:29.603294 7f752e205700 20 librbd::ImageCtx: apply_metadata
2017-10-24 09:50:29.603425 7f752e205700 20 librbd::ImageCtx: enabling caching...
2017-10-24 09:50:29.603432 7f752e205700 20 librbd::ImageCtx: Initial cache settings: size=33554432 num_objects=10 max_dirty=25165824 target_dirty=16777216 max_dirty_age=1
2017-10-24 09:50:29.603535 7f752e205700 10 librbd::ImageCtx: cache bytes 33554432 -> about 1048 objects
2017-10-24 09:50:29.603790 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_register_watch
2017-10-24 09:50:29.603867 7f752e205700 10 librbd::ImageWatcher: 0x7f75140017e0 registering image watcher
2017-10-24 09:50:29.606497 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 handle_register_watch: r=0
2017-10-24 09:50:29.606513 7f752e205700 10 librbd::image::OpenRequest: 0x7f7557935360 send_refresh
2017-10-24 09:50:29.606518 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 send_v2_get_mutable_metadata
2017-10-24 09:50:29.619931 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 handle_v2_get_mutable_metadata: r=0
2017-10-24 09:50:29.619964 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 send_v2_get_flags
2017-10-24 09:50:29.626218 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 handle_v2_get_flags: r=0
2017-10-24 09:50:29.626239 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 send_v2_get_snapshots
2017-10-24 09:50:29.647141 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 handle_v2_get_snapshots: r=0
2017-10-24 09:50:29.647164 7f752e205700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 send_v2_init_exclusive_lock
2017-10-24 09:50:29.647173 7f752e205700 10 librbd::ExclusiveLock: 0x7f7514013510 init
2017-10-24 09:50:29.647177 7f752e205700  5 librbd::AioImageRequestWQ: block_writes: 0x7f7557932f50, num=1
2017-10-24 09:50:29.647227 7f752da04700 10 librbd::ExclusiveLock: 0x7f7514013510 handle_init_complete
2017-10-24 09:50:29.647237 7f752da04700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 handle_v2_init_exclusive_lock: r=0
2017-10-24 09:50:29.647239 7f752da04700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 send_v2_apply
2017-10-24 09:50:29.647242 7f752da04700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 handle_v2_apply
2017-10-24 09:50:29.647246 7f752da04700 20 librbd::image::RefreshRequest: 0x7f7514008ab0 apply
2017-10-24 09:50:29.647248 7f752da04700 20 librbd::image::RefreshRequest: new snapshot id=272717 name=[redacted name] size=107374182400
2017-10-24 09:50:29.647257 7f752da04700 20 librbd::image::RefreshRequest: new snapshot id=240459 name=[redacted name#2] size=107374182400
2017-10-24 09:50:29.647270 7f752da04700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 send_flush_aio
2017-10-24 09:50:29.647272 7f752da04700 10 librbd::image::RefreshRequest: 0x7f7514008ab0 handle_flush_aio: r=0
2017-10-24 09:50:29.647276 7f752da04700 10 librbd::image::OpenRequest: handle_refresh: r=0
2017-10-24 09:50:29.647278 7f752da04700 10 librbd::ImageState: 0x7f7557933d90 handle_open: r=0
2017-10-24 09:50:29.647328 7f754cbb1d8
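When tracking down which client actually holds the exclusive lock in a situation like this, it may help to inspect the image's lockers and watchers from any node. The pool/image names below are placeholders; `rbd lock ls` and `rbd status` are standard rbd CLI subcommands (the latter available in jewel-era releases like the 10.2.x cluster described here):

```shell
# List current lockers on the image; the output includes the locker's
# client id and address, which points at the process holding the lock:
rbd lock ls rbd/myimage

# Show watchers registered on the image header -- useful for matching
# the locker's address against known librbd clients (e.g. Cinder hosts):
rbd status rbd/myimage
```

Cross-referencing the locker's address against the running Cinder/QEMU processes is one way to find the stale 9.2.1 librbd client mentioned later in the thread.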