There is an edge case with cloned image writeback caching. It is triggered by 
a read of a clone RADOS object that does not exist yet, followed by a write to 
that object, followed by another read.  The second read causes the cached write 
to be flushed to the OSD without the required lock being held, which trips the 
owner_lock assertion in LibrbdWriteback.cc seen in your log.  This issue is 
being tracked via an upstream tracker ticket [1].
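To make the triggering sequence concrete, here is a toy Python model of it. 
Everything in it (ToyCloneCache, LockNotHeldError, known_missing and the 
flush-on-read rule) is hypothetical and only mirrors the description above; it 
is not librbd code. The max_dirty parameter stands in for rbd_cache_max_dirty 
so the same sequence can be replayed with write-through caching.

```python
from contextlib import contextmanager


class LockNotHeldError(AssertionError):
    """Stands in for librbd's FAILED assert(m_ictx->owner_lock.is_locked())."""


class ToyCloneCache:
    """Hypothetical toy model, not librbd code. It replays the reported sequence
    (read of a missing clone object, cached write, second read)."""

    def __init__(self, parent, max_dirty=1):
        self.parent = parent        # parent image objects: name -> bytes
        self.max_dirty = max_dirty  # 0 models rbd_cache_max_dirty = 0 (write-through)
        self.clone = {}             # clone objects that exist on the "OSD"
        self.dirty = {}             # writes held only in the cache
        self.known_missing = set()  # clone objects a read found absent
        self.lock_held = False

    @contextmanager
    def owner_lock(self):
        self.lock_held = True
        try:
            yield
        finally:
            self.lock_held = False

    def _flush(self, name):
        # Mirrors the assertion in LibrbdWriteback::write from the log.
        if not self.lock_held:
            raise LockNotHeldError("cached write flushed without owner_lock")
        self.clone[name] = self.dirty.pop(name)

    def write(self, name, data):
        with self.owner_lock():
            self.dirty[name] = data
            if self.max_dirty == 0:
                self._flush(name)  # write-through: reaches the OSD under the lock

    def read(self, name):
        if name in self.dirty and name in self.known_missing:
            # Modeled bug: the second read flushes the cached write
            # without taking owner_lock first.
            self._flush(name)
        if name in self.dirty:
            return self.dirty[name]
        if name in self.clone:
            return self.clone[name]
        self.known_missing.add(name)       # clone object does not exist yet
        return self.parent.get(name, b"")  # fall back to the parent image


if __name__ == "__main__":
    for max_dirty in (1, 0):
        cache = ToyCloneCache({"obj": b"parent"}, max_dirty=max_dirty)
        cache.read("obj")           # 1. read a clone object that does not exist yet
        cache.write("obj", b"new")  # 2. write to it
        try:
            print(max_dirty, cache.read("obj"))  # 3. read it again
        except LockNotHeldError as err:
            print(max_dirty, "assert:", err)
```

With max_dirty=1 the third step raises LockNotHeldError; with max_dirty=0 the 
write is pushed to the "OSD" while the lock is held, so the second read finds 
nothing dirty to flush.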

This issue affects librbd clients using v0.94.4 and v9.x.  Disabling the cache 
or switching to write-through caching (rbd_cache_max_dirty = 0) should avoid 
the issue until it is fixed in the next Ceph release.
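As a sketch of either workaround, assuming the hypervisor's librbd clients 
read their settings from the [client] section of /etc/ceph/ceph.conf, the 
fragment could look like the one below. Pick one option, not both. Depending 
on the QEMU version, QEMU's own cache= drive setting may also set rbd_cache, so 
it is worth confirming the effective value on the hypervisor.

```ini
[client]
# Option 1: disable the librbd cache entirely
rbd_cache = false

# Option 2 (instead of option 1): keep the cache, but make it write-through
# rbd_cache = true
# rbd_cache_max_dirty = 0
```

The running VMs need to be restarted (or their disks reopened) for librbd to 
pick up the change.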

[1] http://tracker.ceph.com/issues/13559

-- 

Jason Dillaman 


----- Original Message ----- 

> From: "Andrei Mikhailovsky" <and...@arhont.com>
> To: ceph-us...@ceph.com
> Sent: Wednesday, October 21, 2015 8:17:39 AM
> Subject: [ceph-users] [urgent] KVM issues after upgrade to 0.94.4

> Hello guys,

> I've upgraded to the latest Hammer release and I've just noticed a massive
> issue after the upgrade (((

> I am using ceph for virtual machine rbd storage over cloudstack. I am having
> issues with starting virtual routers. The libvirt error message is:

> cat r-1407-VM.log
> 2015-10-21 11:04:59.262+0000: starting up
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
> QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name r-1407-VM -S -machine
> pc-i440fx-trusty,accel=kvm,usb=off -m 256 -realtime mlock=off -smp
> 1,sockets=1,cores=1,threads=1 -uuid 815d2860-cc7f-475d-bf63-02814c720fe4
> -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/r-1407-VM.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
> -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive
> file=rbd:Primary-ubuntu-1/c3f90fb4-c1a6-4e99-a2c0-64ae4517412e:id=admin:key=AQDiDbJR2GqPABAAWCcsUQ+UQwK8z9c6LWrizw==:auth_supported=cephx\;none:mon_host=ceph-mon.csprdc.arhont.com\:6789,if=none,id=drive-virtio-disk0,format=raw,cache=none
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
> -drive
> file=/usr/share/cloudstack-common/vms/systemvm.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,cache=none
> -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
> -netdev tap,fd=54,id=hostnet0,vhost=on,vhostfd=55 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:2e:f7:00:18,bus=pci.0,addr=0x3,rombar=0,romfile=
> -netdev tap,fd=56,id=hostnet1,vhost=on,vhostfd=57 -device
> virtio-net-pci,netdev=hostnet1,id=net1,mac=0e:00:a9:fe:01:42,bus=pci.0,addr=0x4,rombar=0,romfile=
> -netdev tap,fd=58,id=hostnet2,vhost=on,vhostfd=59 -device
> virtio-net-pci,netdev=hostnet2,id=net2,mac=06:0c:b6:00:02:13,bus=pci.0,addr=0x5,rombar=0,romfile=
> -chardev pty,id=charserial0 -device
> isa-serial,chardev=charserial0,id=serial0 -chardev
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/r-1407-VM.agent,server,nowait
> -device
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=r-1407-VM.vport
> -device usb-tablet,id=input0 -vnc 192.168.169.2:10,password -device
> cirrus-vga,id=video0,bus=pci.0,addr=0x2
> Domain id=42 is tainted: high-privileges
> libust[20136/20136]: Warning: HOME environment variable not set. Disabling
> LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
> char device redirected to /dev/pts/13 (label charserial0)
> librbd/LibrbdWriteback.cc: In function 'virtual ceph_tid_t
> librbd::LibrbdWriteback::write(const object_t&, const object_locator_t&,
> uint64_t, uint64_t, const SnapContext&, const bufferlist&, utime_t,
> uint64_t, __u32, Context*)' thread 7ffa6b7fe700 time 2015-10-21
> 12:05:07.901876
> librbd/LibrbdWriteback.cc: 160: FAILED assert(m_ictx->owner_lock.is_locked())
> ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
> 1: (()+0x17258b) [0x7ffa92ef758b]
> 2: (()+0xa9573) [0x7ffa92e2e573]
> 3: (()+0x3a90ca) [0x7ffa9312e0ca]
> 4: (()+0x3b583d) [0x7ffa9313a83d]
> 5: (()+0x7212c) [0x7ffa92df712c]
> 6: (()+0x9590f) [0x7ffa92e1a90f]
> 7: (()+0x969a3) [0x7ffa92e1b9a3]
> 8: (()+0x4782a) [0x7ffa92dcc82a]
> 9: (()+0x56599) [0x7ffa92ddb599]
> 10: (()+0x7284e) [0x7ffa92df784e]
> 11: (()+0x162b7e) [0x7ffa92ee7b7e]
> 12: (()+0x163c10) [0x7ffa92ee8c10]
> 13: (()+0x8182) [0x7ffa8ec49182]
> 14: (clone()+0x6d) [0x7ffa8e97647d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> 2015-10-21 11:05:08.091+0000: shutting down

> From what I can see, there seems to be an issue with locking
> (librbd/LibrbdWriteback.cc: 160: FAILED
> assert(m_ictx->owner_lock.is_locked())). However, the r-1407-VM virtual
> router is a new router and has not been created or run before. So, I don't
> see why there is an issue with locking.

> Could someone please help me determine the cause of the error and how to fix
> it? I've not seen this on 0.94.1.

> Many thanks

> Andrei

> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com