Hello again!

Unfortunately I have to raise this problem again. Snapshot creation is
constantly hanging on several of my images.
My Ceph version is now 0.94.5.
The rbd CLI always gives me this:
root@slpeah001:[~]:# rbd snap create
volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd --snap test
2016-01-13 12:04:39.107166 7fb70e4c2880 -1 librbd::ImageWatcher:
0x427a710 no lock owners detected
2016-01-13 12:04:44.108783 7fb70e4c2880 -1 librbd::ImageWatcher:
0x427a710 no lock owners detected
2016-01-13 12:04:49.110321 7fb70e4c2880 -1 librbd::ImageWatcher:
0x427a710 no lock owners detected
2016-01-13 12:04:54.112373 7fb70e4c2880 -1 librbd::ImageWatcher:
0x427a710 no lock owners detected

I turned "debug rbd = 20" and found this records only on one of OSDs
(on the same host as RBD client):
2016-01-13 11:44:46.076780 7fb5f05d8700  0 --
192.168.252.11:6804/407141 >> 192.168.252.11:6800/407122
pipe(0x392d2000 sd=257 :6804 s=2 pgs=17 cs=1 l=0 c=0x383b4160).fault
with nothing to send, going to standby
2016-01-13 11:58:26.261460 7fb5efbce700  0 --
192.168.252.11:6804/407141 >> 192.168.252.11:6802/407124
pipe(0x39e45000 sd=156 :6804 s=2 pgs=17 cs=1 l=0 c=0x386fbb20).fault
with nothing to send, going to standby
2016-01-13 12:04:23.948931 7fb5fede2700  0 --
192.168.254.11:6804/407141 submit_message watch-notify(notify_complete
(2) cookie 44850800 notify 99720550678667 ret -110) v3 remote,
192.168.254.11:0/1468572, failed lossy con, dropping message
0x3ab76fc0
2016-01-13 12:09:04.254329 7fb5fede2700  0 --
192.168.254.11:6804/407141 submit_message watch-notify(notify_complete
(2) cookie 69846112 notify 99720550678721 ret -110) v3 remote,
192.168.254.11:0/1509673, failed lossy con, dropping message
0x3830cb40

Here are the image properties:
root@slpeah001:[~]:# rbd info
volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
rbd image 'volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd':
        size 200 GB in 51200 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.2f2a81562fea59
        format: 2
        features: layering, striping, exclusive, object map
        flags:
        stripe unit: 4096 kB
        stripe count: 1
root@slpeah001:[~]:# rbd status
volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
Watchers:
        watcher=192.168.254.17:0/2088291 client.3424561 cookie=93888518795008
root@slpeah001:[~]:# rbd lock list
volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
There is 1 exclusive lock on this image.
Locker         ID                  Address
client.3424561 auto 93888518795008 192.168.254.17:0/2088291

Taking RBD snapshots from the Python API hangs as well...
This image is being used by libvirt.
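
For reference, this is roughly what the Python-side attempt looks like (a
minimal sketch using the standard rados/rbd bindings; the ceph.conf path is
an assumption, while the pool, image, and snapshot names mirror the CLI
attempt above). create_snap() blocks just like the CLI does:

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('volumes')
    try:
        image = rbd.Image(ioctx, 'volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd')
        try:
            # Hangs here while the exclusive lock is held by the QEMU client.
            image.create_snap('test')
        finally:
            image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()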

Any suggestions?
Thanks!

Regards, Vasily.


2016-01-06 1:11 GMT+08:00 Мистер Сёма <anga...@gmail.com>:
> Well, I believe the problem is no longer valid.
> My code before was:
> virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-freeze"}'
> rbd snap create $RBD_ID --snap `date +%F-%T`
>
> and then snapshot creation was hanging forever, so I inserted a 2-second sleep.
>
> My code after:
> virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-freeze"}'
> sleep 2
> rbd snap create $RBD_ID --snap `date +%F-%T`
>
> And now it works perfectly. Again, I have no idea how this solved the problem.
> Thanks :)
>
> 2016-01-06 0:49 GMT+08:00 Мистер Сёма <anga...@gmail.com>:
>> I am very sorry, but I am not able to increase log verbosity because
>> it's a production cluster with very limited space for logs. Sounds
>> crazy, but that's how it is.
>> I have found out that the RBD snapshot process hangs forever only when
>> a QEMU fsfreeze was issued just before the snapshot. If the guest is
>> not frozen, the snapshot is taken with no problem... I have absolutely
>> no idea how these two things could be related to each other... And
>> again, this issue occurs only when an exclusive lock is held on the
>> image and the exclusive-lock feature is enabled on it.
>>
>> Does anybody else have this problem?
>>
>> 2016-01-05 2:55 GMT+08:00 Jason Dillaman <dilla...@redhat.com>:
>>> I am surprised by the error you are seeing with exclusive lock enabled.  
>>> The rbd CLI should be able to send the 'snap create' request to QEMU 
>>> without an error.  Are you able to provide "debug rbd = 20" logs from 
>>> shortly before and after your snapshot attempt?
>>>
>>> --
>>>
>>> Jason Dillaman
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Мистер Сёма" <anga...@gmail.com>
>>>> To: "ceph-users" <ceph-users@lists.ceph.com>
>>>> Sent: Monday, January 4, 2016 12:37:07 PM
>>>> Subject: [ceph-users] How to do quiesced rbd snapshot in libvirt?
>>>>
>>>> Hello,
>>>>
>>>> Can anyone please tell me what is the right way to do quiesced RBD
>>>> snapshots in libvirt (OpenStack)?
>>>> My Ceph version is 0.94.3.
>>>>
>>>> I found two possible ways, but neither of them works for me. I wonder if
>>>> I'm doing something wrong:
>>>> 1) Do a VM fsFreeze through the QEMU guest agent, perform the RBD snapshot,
>>>> then do fsThaw. This looks good, but the bad thing here is that libvirt takes
>>>> an exclusive lock on the image, which results in errors like this when taking
>>>> the snapshot: " 7f359d304880 -1 librbd::ImageWatcher: no lock owners
>>>> detected". It seems like the rbd client is trying to take the snapshot on
>>>> behalf of the exclusive-lock owner but is unable to find that owner.
>>>> Without an exclusive lock everything works fine.
>>>>
>>>> 2) Perform a QEMU external snapshot with a local QCOW2 file overlaid on
>>>> top of the RBD image. This seems really interesting, but the bad thing is
>>>> that there is currently no way to remove this kind of snapshot, because
>>>> active blockcommit does not currently work for RBD images
>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1189998).
>>>>
>>>> So again my question is: how do you guys take quiesced RBD snapshots in
>>>> libvirt?
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
