Hello again! Unfortunately I have to raise the problem again. Snapshots keep hanging on several of my images. My Ceph version is now 0.94.5. The rbd CLI always gives me this:

root@slpeah001:[~]:# rbd snap create volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd --snap test
2016-01-13 12:04:39.107166 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
2016-01-13 12:04:44.108783 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
2016-01-13 12:04:49.110321 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
2016-01-13 12:04:54.112373 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
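For reference, the snapshot also hangs when taken through the Python bindings. A minimal sketch of that call path, assuming the stock python-rados/python-rbd packages and a readable /etc/ceph/ceph.conf (pool and image names are the same as above):

import rados
import rbd

# Connect using the default client configuration.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('volumes')
    try:
        image = rbd.Image(ioctx, 'volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd')
        try:
            # Never returns, just like the CLI 'rbd snap create'.
            image.create_snap('test')
        finally:
            image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()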
I turned on "debug rbd = 20" and found these records only on one of the OSDs (on the same host as the rbd client):

2016-01-13 11:44:46.076780 7fb5f05d8700 0 -- 192.168.252.11:6804/407141 >> 192.168.252.11:6800/407122 pipe(0x392d2000 sd=257 :6804 s=2 pgs=17 cs=1 l=0 c=0x383b4160).fault with nothing to send, going to standby
2016-01-13 11:58:26.261460 7fb5efbce700 0 -- 192.168.252.11:6804/407141 >> 192.168.252.11:6802/407124 pipe(0x39e45000 sd=156 :6804 s=2 pgs=17 cs=1 l=0 c=0x386fbb20).fault with nothing to send, going to standby
2016-01-13 12:04:23.948931 7fb5fede2700 0 -- 192.168.254.11:6804/407141 submit_message watch-notify(notify_complete (2) cookie 44850800 notify 99720550678667 ret -110) v3 remote, 192.168.254.11:0/1468572, failed lossy con, dropping message 0x3ab76fc0
2016-01-13 12:09:04.254329 7fb5fede2700 0 -- 192.168.254.11:6804/407141 submit_message watch-notify(notify_complete (2) cookie 69846112 notify 99720550678721 ret -110) v3 remote, 192.168.254.11:0/1509673, failed lossy con, dropping message 0x3830cb40

Here are the image properties:

root@slpeah001:[~]:# rbd info volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
rbd image 'volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd':
        size 200 GB in 51200 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.2f2a81562fea59
        format: 2
        features: layering, striping, exclusive, object map
        flags:
        stripe unit: 4096 kB
        stripe count: 1
root@slpeah001:[~]:# rbd status volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
Watchers:
        watcher=192.168.254.17:0/2088291 client.3424561 cookie=93888518795008
root@slpeah001:[~]:# rbd lock list volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
There is 1 exclusive lock on this image.
Locker          ID                    Address
client.3424561  auto 93888518795008   192.168.254.17:0/2088291

Taking RBD snapshots from the Python API (as sketched above) also hangs. The image is in use by libvirt. Any suggestions? Thanks!

Regards,
Vasily.

2016-01-06 1:11 GMT+08:00 Мистер Сёма <anga...@gmail.com>:
> Well, I believe the problem is no longer valid.
> My code before was:
>
> virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-freeze"}'
> rbd snap create $RBD_ID --snap `date +%F-%T`
>
> and then snapshot creation was hanging forever. I inserted a 2 second sleep.
>
> My code after:
>
> virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-freeze"}'
> sleep 2
> rbd snap create $RBD_ID --snap `date +%F-%T`
>
> And now it works perfectly. Again, I have no idea how it solved the problem.
> Thanks :)
>
> 2016-01-06 0:49 GMT+08:00 Мистер Сёма <anga...@gmail.com>:
>> I am very sorry, but I am not able to increase log verbosity because
>> it's a production cluster with very limited space for logs. Sounds
>> crazy, but that's it.
>> I have found out that the RBD snapshot process hangs forever only when
>> a QEMU fsfreeze was issued just before the snapshot. If the guest is not
>> frozen, the snapshot is taken with no problem... I have absolutely no
>> idea how these two things could be related to each other... And again,
>> this issue occurs only when there is an exclusive lock on the image and
>> the exclusive-lock feature is also enabled on it.
>>
>> Does anybody else have such a problem?
>>
>> 2016-01-05 2:55 GMT+08:00 Jason Dillaman <dilla...@redhat.com>:
>>> I am surprised by the error you are seeing with exclusive lock enabled.
>>> The rbd CLI should be able to send the 'snap create' request to QEMU
>>> without an error. Are you able to provide "debug rbd = 20" logs from
>>> shortly before and after your snapshot attempt?
>>>
>>> --
>>>
>>> Jason Dillaman
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Мистер Сёма" <anga...@gmail.com>
>>>> To: "ceph-users" <ceph-users@lists.ceph.com>
>>>> Sent: Monday, January 4, 2016 12:37:07 PM
>>>> Subject: [ceph-users] How to do quiesced rbd snapshot in libvirt?
>>>>
>>>> Hello,
>>>>
>>>> Can anyone please tell me what is the right way to do quiesced RBD
>>>> snapshots in libvirt (OpenStack)?
>>>> My Ceph version is 0.94.3.
>>>>
>>>> I found two possible ways, and neither of them is working for me. I wonder
>>>> if I'm doing something wrong:
>>>>
>>>> 1) Do a VM fsFreeze through the QEMU guest agent, perform the RBD snapshot,
>>>> then do fsThaw. This looks good, but the bad thing here is that libvirt
>>>> holds an exclusive lock on the image, which results in errors like this
>>>> when taking the snapshot: "7f359d304880 -1 librbd::ImageWatcher: no lock
>>>> owners detected". It seems like the rbd client is trying to take the
>>>> snapshot on behalf of the exclusive lock owner but is unable to find this
>>>> owner. Without an exclusive lock everything works fine.
>>>>
>>>> 2) Perform QEMU external snapshots with a local QCOW2 file overlaid on top
>>>> of the RBD image. This seems really interesting, but the bad thing is that
>>>> there is currently no way to remove this kind of snapshot, because active
>>>> blockcommit does not currently work for RBD images
>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1189998).
>>>>
>>>> So again my question is: how do you guys take quiesced RBD snapshots in
>>>> libvirt?
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com