No, I have not: we haven't faced the issue for some time now, and the less promotion happening, the better for us. This is a cluster for backups, and the same disks currently back both the cache and the EC pools.
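For the archive: as far as I understand Nick's suggestion, forcing promotion amounts to zeroing the recency requirements on the cache pool, something like the following (untested on our side; 'cache-pool' is a placeholder for the actual cache tier pool name, and I am not sure the write-side setting exists before Jewel):

    # require zero recent hits before promoting, i.e. promote on first access
    ceph osd pool set cache-pool min_read_recency_for_promote 0
    ceph osd pool set cache-pool min_write_recency_for_promote 0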
I will try this if the bug happens again.

On Mon, Feb 29, 2016 at 11:40 AM, Christian Balzer <ch...@gol.com> wrote:
>
> Hello,
>
> On Mon, 29 Feb 2016 11:14:28 +0100 Adrien Gillard wrote:
>
> > We are likely facing the same kind of issue in our Infernalis cluster
> > with EC.
>
> Have you tried what Nick Fisk suggested (and which makes perfect sense to
> me, but I can't test it, no EC pools here)?
>
> That is, setting the recency values to 0, which should always force
> promotion.
>
> Christian
>
> > From time to time, some of our volumes mounted via the RBD kernel
> > module will start to "freeze". I can still browse the volume, but the
> > (backup) application using it hangs. I guess it's because it tries to
> > access an object from the EC pool (tracker.ceph.com seems down at the
> > moment, so I can't access the details).
> >
> > I can't map / unmap the affected volumes (it rarely concerns all the
> > volumes at the same time). Running 'rbd -p ec-pool info volume-1' gets
> > me the same errors as Frederic ((95) Operation not supported). The
> > sloppy workaround I found is running 'rbd -p ec-pool ls -l' a couple
> > of times. It "magically" gets the volumes in order and they become
> > usable again.
> >
> > Adrien
> >
> > On Sat, Feb 27, 2016 at 12:14 PM, SCHAER Frederic
> > <frederic.sch...@cea.fr> wrote:
> >
> > > Hi,
> > >
> > > Many thanks.
> > > Just tested: I could see the rbd_id object in the EC pool, and after
> > > promoting it I could see it in the SSD cache pool and could
> > > successfully list the image information, indeed.
> > >
> > > Cheers
> > >
> > > -----Original Message-----
> > > From: Jason Dillaman [mailto:dilla...@redhat.com]
> > > Sent: Wednesday, February 24, 2016 19:16
> > > To: SCHAER Frederic <frederic.sch...@cea.fr>
> > > Cc: ceph-us...@ceph.com; HONORE Pierre-Francois
> > > <pierre-francois.hon...@cea.fr>
> > > Subject: Re: [ceph-users] ceph hammer : rbd info/Status : operation
> > > not supported (95) (EC+RBD tier pools)
> > >
> > > If you run "rados -p <cache pool> ls | grep "rbd_id.<yyy-disk1>" and
> > > don't see that object, you are experiencing that issue [1].
> > >
> > > You can attempt to work around this issue by running "rados -p
> > > irfu-virt setomapval rbd_id.<yyy-disk1> dummy value" to force-promote
> > > the object to the cache pool. I haven't tested / verified that this
> > > will alleviate the issue, though.
> > >
> > > [1] http://tracker.ceph.com/issues/14762
> > >
> > > --
> > >
> > > Jason Dillaman
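For what it's worth, with the names from this thread, that check and force-promote would look like this (assuming 'ssd-hot-irfu-virt' is the cache pool sitting in front of 'irfu-virt'; untested on my side):

    # check whether the image's id object is visible in the cache tier
    rados -p ssd-hot-irfu-virt ls | grep rbd_id.yyy-disk1

    # if it is missing, write a dummy omap key through the base pool;
    # the write goes through the cache tier, which should promote the object
    rados -p irfu-virt setomapval rbd_id.yyy-disk1 dummy value

I have not checked whether the leftover 'dummy' omap key bothers librbd afterwards.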
> > > ----- Original Message -----
> > >
> > > > From: "SCHAER Frederic" <frederic.sch...@cea.fr>
> > > > To: ceph-us...@ceph.com
> > > > Cc: "HONORE Pierre-Francois" <pierre-francois.hon...@cea.fr>
> > > > Sent: Wednesday, February 24, 2016 12:56:48 PM
> > > > Subject: [ceph-users] ceph hammer : rbd info/Status : operation not
> > > > supported (95) (EC+RBD tier pools)
> > > >
> > > > Hi,
> > > >
> > > > I just started testing VMs inside Ceph this week, ceph-hammer
> > > > 0.94-5 here.
> > > >
> > > > I built several pools, using pool tiering:
> > > > - A small replicated SSD pool (5 SSDs only, but I thought it'd be
> > > >   better for IOPS; I intend to test the difference with disks only)
> > > > - Overlaying a larger EC pool
> > > >
> > > > I just have 2 VMs in Ceph... and one of them is breaking something.
> > > > The VM that is not breaking was migrated using qemu-img for
> > > > creating the Ceph volume, then migrating the data. Its rbd format
> > > > is 1:
> > > >
> > > > rbd image 'xxx-disk1':
> > > >     size 20480 MB in 5120 objects
> > > >     order 22 (4096 kB objects)
> > > >     block_name_prefix: rb.0.83a49.3d1b58ba
> > > >     format: 1
> > > >
> > > > The VM that's failing has rbd format 2.
> > > > This is what I had before things started breaking:
> > > >
> > > > rbd image 'yyy-disk1':
> > > >     size 10240 MB in 2560 objects
> > > >     order 22 (4096 kB objects)
> > > >     block_name_prefix: rbd_data.8ae1f47398c89
> > > >     format: 2
> > > >     features: layering, striping
> > > >     flags:
> > > >     stripe unit: 4096 kB
> > > >     stripe count: 1
> > > >
> > > > The VM started behaving weirdly, with a huge iowait % during its
> > > > install (that's to say, it did not take long to go wrong ;) ).
> > > > Now, this is the only thing that I can get:
> > > >
> > > > [root@ceph0 ~]# rbd -p irfu-virt info yyy-disk1
> > > > 2016-02-24 18:30:33.213590 7f00e6f6d7c0 -1 librbd::ImageCtx: error
> > > > reading image id: (95) Operation not supported
> > > > rbd: error opening image yyy-disk1: (95) Operation not supported
> > > >
> > > > One thing to note: the VM *IS STILL* working; I can still do disk
> > > > operations, apparently.
> > > > During the VM installation, I realized I had wrongly set the target
> > > > SSD caching size to 100 Mbytes instead of 100 Gbytes, and Ceph
> > > > complained it was almost full:
> > > >
> > > > health HEALTH_WARN
> > > >     'ssd-hot-irfu-virt' at/near target max
> > > >
> > > > My question is... am I facing the bug as reported in the list
> > > > thread titled "Possible Cache Tier Bug - Can someone confirm"?
> > > > Or did I do something wrong?
> > > >
> > > > The libvirt and kvm that are writing into Ceph are the following:
> > > > libvirt-1.2.17-13.el7_2.3.x86_64
> > > > qemu-kvm-1.5.3-105.el7_2.3.x86_64
> > > >
> > > > Any idea how I could recover the VM file, if possible?
> > > > Please note I have no problem with deleting the VM and rebuilding
> > > > it; I just spawned it to test.
> > > > As a matter of fact, I just "virsh destroyed" the VM to see if I
> > > > could start it again... and I can't:
> > > >
> > > > # virsh start yyy
> > > > error: Failed to start domain yyy
> > > > error: internal error: process exited while connecting to monitor:
> > > > 2016-02-24T17:49:59.262170Z qemu-kvm: -drive
> > > > file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***==:auth_supported=cephx\;none:mon_host=_____\:6789,if=none,id=drive-virtio-disk0,format=raw:
> > > > error reading header from yyy-disk1
> > > > 2016-02-24T17:49:59.263743Z qemu-kvm: -drive
> > > > file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=A***==:auth_supported=cephx\;none:mon_host=___\:6789,if=none,id=drive-virtio-disk0,format=raw:
> > > > could not open disk image
> > > > rbd:irfu-virt/___-disk1:id=irfu-***==:auth_supported=cephx\;none:mon_host=___\:6789:
> > > > Could not open 'rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***
> > > >
> > > > Ideas?
> > > > Thanks
> > > > Frederic
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
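P.S.: since this tends to hit several volumes at once, here is a rough, completely untested sketch of how a whole pool could be checked in one go. The pool names are the ones from this thread, and it assumes every image in the base pool is format 2 (format-1 images have no rbd_id object, and setomapval would create a bogus one for them):

    #!/bin/sh
    BASE=irfu-virt
    CACHE=ssd-hot-irfu-virt

    # snapshot the list of objects currently sitting in the cache tier
    rados -p "$CACHE" ls > /tmp/cache-objects

    # force-promote the rbd_id object of every image that is not cached
    for img in $(rbd -p "$BASE" ls); do
        if ! grep -qxF "rbd_id.$img" /tmp/cache-objects; then
            echo "promoting rbd_id.$img"
            rados -p "$BASE" setomapval "rbd_id.$img" dummy value
        fi
    done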
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com