Re: [ceph-users] rbd watchers
The times I have seen this message, it has always been because there are snapshots of the image that haven't been deleted yet. You can see the snapshots with rbd snap list <image>.

On Tue, May 20, 2014 at 4:26 AM, James Eckersall <james.eckers...@gmail.com> wrote:

Hi,

I'm having some trouble with an rbd image. I want to rename the current rbd and create a new rbd with the same name. I renamed the rbd with rbd mv, but it was still mapped on another node, so rbd mv gave me an error that it was unable to remove the source. I then unmapped the original rbd and tried to remove it. Despite it being unmapped, the cluster still believes that there is a watcher on the rbd:

root@ceph-admin:~# rados -p poolname listwatchers rbdname.rbd
watcher=x.x.x.x:0/2329830975 client.26367 cookie=48

root@ceph-admin:~# rbd rm -p poolname rbdname
Removing image: 99% complete...failed.
2014-05-20 11:50:15.023823 7fa6372e4780 -1 librbd: error removing header: (16) Device or resource busy
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

I've already rebooted the node that the cluster claims is a watcher and confirmed the image definitely is not mapped there. I'm 99.9% sure that there are no nodes actually using this rbd. Does anyone know how I can get rid of it?

Currently running ceph 0.73-1 on Ubuntu 12.04.

Thanks
J
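If the snapshot check comes up empty and the watching client really is gone, blacklisting its address will force the stale watch to drop. A minimal sketch using the names and address from the output above (the blacklist entry expires on its own after a default interval):

$ rbd snap ls poolname/rbdname                  # snapshots must be removed first
$ rbd snap purge poolname/rbdname               # removes all unprotected snapshots
$ rados -p poolname listwatchers rbdname.rbd    # identify the stale watcher
$ ceph osd blacklist add x.x.x.x:0/2329830975   # force its watch to drop
$ rbd rm -p poolname rbdname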
Re: [ceph-users] Ceph Object Storage front-end?
You can use librados directly, or you can use radosgw, which, I think, would be pretty much exactly what you are looking for.

On Tue, Apr 29, 2014 at 4:36 PM, Stuart Longland <stua...@vrt.com.au> wrote:

Hi all,

Is there some kind of web-based or WebDAV-based front-end for accessing a Ceph cluster? Our situation is that we sometimes have big blobs we'd like to stash somewhere safe: things like customer database backups, etc. Things other than disk images.

We haven't deployed CephFS at this stage because, at the time, running more than one MDS was not supported, and I'd rather not rely on something with that single point of failure. For now I'm just creating an RBD, formatting it XFS and slopping my data into that. Not ideal, but it works: for me, as I run a Linux workstation. It won't work for the Windows users (who outnumber us greatly).

I was thinking something along the lines of a WebDAV or Samba interface, which I realise could be done with conventional Apache/Samba atop CephFS, but I was wondering if there was something that would do it using the librados API? Something like Ceph Gateway, but without the specialised client requirement. Has anyone seen something along these lines, or am I being too vague?

Regards,
--
Stuart Longland
Systems Engineer, VRT Systems
38b Douglas Street, Milton QLD 4064
http://www.vrt.com.au
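For the "stash a blob somewhere safe" case, the plain rados CLI already covers it without any gateway. A sketch, where the backups pool, object name, and file paths are all illustrative:

$ ceph osd pool create backups 128
$ rados -p backups put customer-db-20140429.dump /srv/dumps/customer-db.dump
$ rados -p backups ls
$ rados -p backups get customer-db-20140429.dump /tmp/restore.dump

Note that rados put stores the whole file as a single RADOS object, so for very large blobs radosgw is the better fit, since it chunks big uploads across many objects.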
Re: [ceph-users] Using RBD with LVM
You need to add a line to /etc/lvm/lvm.conf, in the devices section of the file:

types = [ "rbd", 1024 ]

On Tue, Sep 24, 2013 at 5:00 PM, John-Paul Robinson <j...@uab.edu> wrote:

Hi,

I'm exploring a configuration with multiple Ceph block devices used with LVM. The goal is to provide a way to grow and shrink my file systems while they are online. I've created three block devices:

$ sudo ./ceph-ls | grep home
jpr-home-lvm-p01: 102400 MB
jpr-home-lvm-p02: 102400 MB
jpr-home-lvm-p03: 102400 MB

And have them mapped into my kernel (3.2.0-23-generic #36-Ubuntu SMP):

$ sudo rbd showmapped
id pool image            snap device
0  rbd  jpr-test-vol01   -    /dev/rbd0
1  rbd  jpr-home-lvm-p01 -    /dev/rbd1
2  rbd  jpr-home-lvm-p02 -    /dev/rbd2
3  rbd  jpr-home-lvm-p03 -    /dev/rbd3

In order to use them with LVM, I need to define them as physical volumes. But when I run this command I get an unexpected error:

$ sudo pvcreate /dev/rbd1
Device /dev/rbd1 not found (or ignored by filtering).

I am able to use other RBDs on this same machine to create file systems directly and mount them:

$ df -h /mnt-test
Filesystem  Size  Used Avail Use% Mounted on
/dev/rbd0    50G  885M   47G   2% /mnt-test

Is there a reason that the /dev/rbd[1-3] devices can't be initialized as physical volumes in LVM?

Thanks,
~jpr
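Putting the answer together with the poster's setup, a sketch of the lvm.conf stanza and the subsequent PV/VG/LV flow (the volume group and logical volume names are illustrative):

# /etc/lvm/lvm.conf
devices {
    # allow LVM to scan rbd block devices: [ type name, max partitions ]
    types = [ "rbd", 1024 ]
}

$ sudo pvcreate /dev/rbd1 /dev/rbd2 /dev/rbd3
$ sudo vgcreate vg_home /dev/rbd1 /dev/rbd2 /dev/rbd3
$ sudo lvcreate -n home -l 100%FREE vg_home
$ sudo mkfs.xfs /dev/vg_home/home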
Re: [ceph-users] ceph/rbd nova image cache
The easy solution to this is to create a really tiny image in glance (call it fake_image or something like that) and tell nova that it is the image you are using. Since you are booting from the RBD anyway, nova doesn't actually use the image for anything, and should only put a single copy of it in the _base directory.

On Wed, Aug 21, 2013 at 8:06 AM, Sébastien Han <sebastien@enovance.com> wrote:

Do you use Xen or KVM? It seems that Xen has a flag called cache_images=all. However, I haven't seen anything for KVM.

Sébastien Han
Cloud Engineer, eNovance
http://www.enovance.com

On August 20, 2013 at 10:07:56 PM, w sun (ws...@hotmail.com) wrote:

This might be slightly off topic, though many Ceph users might have run into similar issues. For one of our Grizzly OpenStack environments, we are using Ceph/RBD as the exclusive image and volume storage for VMs, which boot from rbd-backed Cinder volumes. As a result, the nova image cache is not being used at all. For some reason, nova still creates an image cache under /var/lib/nova/_base on the nova nodes. This fills up our shared (NFS) /var/lib/nova/instances directory, which has limited size (50GB) and is used to store config drives and enable VM failover/restart during HW failure.

Does anyone know how to disable nova's image caching function completely? Or have a suggestion on how best to deal with this issue? We know we can do aggressive clearing with some of the nova image cache management configuration options, but that doesn't help reduce the extra IO overhead of caching images on the nova nodes. Thanks.

--weiguo
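A sketch of the fake-image workaround with the Grizzly-era glance client (the image name and 1 KB size are arbitrary, and CLI flags may differ slightly between client versions):

$ dd if=/dev/zero of=/tmp/fake_image.img bs=1K count=1
$ glance image-create --name fake_image --disk-format raw \
      --container-format bare --file /tmp/fake_image.img

Boot-from-volume instances then reference the resulting image ID, so at most this 1 KB file ever lands in /var/lib/nova/instances/_base.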
Re: [ceph-users] Why is my mon store.db is 220GB?
Hmm. This sounds very similar to the problem I reported (with debug mon = 20 and debug ms = 1 logs as of today) on our support site (ticket #438). Sage, please take a look.

On Mon, Aug 12, 2013 at 9:49 PM, Sage Weil <s...@inktank.com> wrote:

On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:

Joao, (log file uploaded to http://pastebin.com/Ufrxn6fZ)

I had some good luck and some bad luck. I copied the store.db to a new monitor, injected a modified monmap and started it up (this is all on the same host). Very quickly it reached quorum (as far as I can tell) but didn't respond. Running 'ceph -w' just hung, no timeouts or errors. Same thing when restarting an OSD. The last lines of the log file ('...ms_verify_authorizer...') are from 'ceph -w' attempts.

I restarted everything again and it sat there synchronizing. iostat reported about 100 MB/s, but just reads. I let it sit there for 7 min but nothing happened.

Can you do this again with --debug-mon 20 --debug-ms 1? It looks as though the main dispatch thread is blocked (7f71a1aa5700 does nothing after winning the election). It would also be helpful to gdb attach to the running ceph-mon and capture the output from 'thread apply all bt'.

Side question: how long can a ceph cluster run without a monitor? I was able to upload files via rados gateway without issue even when the monitor was down.

Quite a while, as long as no new processes need to authenticate and no nodes go up or down. Eventually the authentication keys are going to time out, though (1 hour is the default).

sage
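Concretely, Sage's two requests look something like this (the monitor id mon0 is a placeholder for the actual name):

$ ceph-mon -i mon0 --debug-mon 20 --debug-ms 1   # restart the mon with verbose logging

$ gdb -p $(pidof ceph-mon)                       # attach to the running mon
(gdb) thread apply all bt                        # capture backtraces from every thread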
[ceph-users] Weird problem - maybe quorum related
One of our tests last night failed in a weird way. We started with a three node cluster with three monitors, expanded to a 5 node cluster with 5 monitors, and dropped back to a 4 node cluster with three monitors. The sequence of events was:

- start 3 monitors (monitors 0, 1, 2) -> monmap e1
- add one node
- restart the 3 monitors
- add another node
- add monitor 4 -> monmap e2
- restart monitor 0
- add monitor 3 -> monmap e3
- restart monitor 1
- restart monitor 2
- shutdown the server with monitor 4 on it
- remove monitor 4 -> monmap e4
- restart monitor 0
- mon.0 had an odd time sync problem and respawned
- stop monitor 3
- remove monitor 3

At that point (08:23:52 in the log), ceph stopped responding (as if quorum was lost). Note that we do not see a new monmap (e5) created by the removal of monitor 3. See the (sort of) full log at: https://gist.github.com/mdegerne/06fa38243bd462c46d39
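When the cluster stops answering like this, each monitor's local admin socket usually still responds even without quorum, which helps establish what every surviving mon thinks the monmap and quorum look like. A sketch (socket paths vary with cluster name and mon ids):

$ ceph --admin-daemon /var/run/ceph/ceph-mon.0.asok mon_status
$ ceph --admin-daemon /var/run/ceph/ceph-mon.1.asok mon_status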
[ceph-users] Possible bug with image.list_lockers()
I'm not certain what the correct behavior should be in this case, so maybe it is not a bug, but here is what is happening: when an OSD becomes full, a process fails, so we unmount the rbd and attempt to remove the lock associated with the rbd for that process. The unmount works fine, but removing the lock is failing right now because the list_lockers() function call never returns. Here is a code snippet I tried with a fake rbd lock on a test cluster:

import rbd
import rados

with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
    with cluster.open_ioctx('rbd') as ioctx:
        with rbd.Image(ioctx, 'msd1') as image:
            image.list_lockers()

The process never returns, even after the ceph cluster is returned to healthy. The only indication of the error is a line in /var/log/messages:

Jul 11 23:25:05 node-172-16-0-13 python: 2013-07-11 23:25:05.826793 7ffc66d72700 0 client.6911.objecter FULL, paused modify 0x7ffc687c6050 tid 2

Any help would be greatly appreciated.

ceph version: ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
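The log line suggests the client's objecter paused the modify op while the cluster was flagged full and never resumed it. As a possible workaround while the Python call is wedged, the rbd CLI exposes the same lock operations; a sketch, where the lock id and locker values come from the list output:

$ rbd lock list msd1
$ rbd lock remove msd1 <lock-id> <locker>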
Re: [ceph-users] monitor removal and re-add
Precisely. This is what we need to do; it is just a case of adjusting our process to make that possible. As I stated a couple of e-mails ago, the design of Ceph allows it; it is just a bit of a challenge to fit it into our existing processes. It's on me now to fix the process.

On Mon, Jun 24, 2013 at 11:26 PM, Alex Bligh <a...@alex.org.uk> wrote:

On 25 Jun 2013, at 00:39, Mandell Degerness wrote:

The issue, Sage, is that we have to deal with the cluster being re-expanded. If we start with 5 monitors and scale back to 3, running the ceph mon remove N command after stopping each monitor, and don't restart the existing monitors, we cannot re-add those same monitors that were previously removed. They will suicide at startup.

Can you not restart the remaining monitors individually at the end of the process, once the monmaps and the ceph.confs have been updated so they only think there are 3 monitors? Once you have got to a stable 3-mon config, you can go back up to 5.

--
Alex Bligh
Re: [ceph-users] monitor removal and re-add
Hmm. This is a bit ugly from our perspective, but not fatal to your design (just our implementation). At the time we run the rm, the cluster is smaller, and so the restart of each monitor is not fatal to the cluster. The problem is on our side in terms of guaranteeing order of behaviors.

On Mon, Jun 24, 2013 at 1:54 PM, Sage Weil <s...@inktank.com> wrote:

On Mon, 24 Jun 2013, Mandell Degerness wrote:

I'm testing the change (actually re-starting the monitors after the monitor removal), but this brings up the issue with why we didn't want to do this in the first place: when reducing the number of monitors from 5 to 3, we are guaranteed to have a service outage for the time it takes to restart at least one of the monitors (and possibly for two of the restarts, now that I think on it). In theory, the stop/start cycle is very short and should complete in a reasonable time. What I'm concerned about, however, is the case where something is wrong with our re-written config file. In that case, the outage is immediate and will last until the problem is corrected on the first server to have its monitor restarted.

I'm jumping into this thread late, but: why would you follow the second removal procedure (the one for broken clusters)? To go from 5 -> 3 mons, you should just stop 2 of the mons and do 'ceph mon rm addr1', 'ceph mon rm addr2'.

sage

On Mon, Jun 24, 2013 at 10:07 AM, John Nielsen <li...@jnielsen.net> wrote:

On Jun 21, 2013, at 5:00 PM, Mandell Degerness <mand...@pistoncloud.com> wrote:

There is a scenario where we would want to remove a monitor and, at a later date, re-add the monitor (using the same IP address). Is there a supported way to do this? I tried deleting the monitor directory and rebuilding from scratch following the add-monitor procedures from the web, but the monitor still suicides when started.

I assume you're already referencing this:
http://ceph.com/docs/master/rados/operations/add-or-rm-mons/

I have done what you describe before. There were a couple hiccups; let's see if I remember the specifics:

Remove:
- Follow the first two steps under "removing a monitor (manual)" at the link above:
  service ceph stop mon.N
  ceph mon remove N
- Comment out the monitor entry in ceph.conf on ALL mon, osd and client hosts.
- Restart services as required to make everyone happy with the smaller set of monitors.

Re-add:
- Wipe the old monitor's directory and re-create it.
- Follow the steps for "adding a monitor (manual)" at the link above. Instead of adding a new entry, you can just un-comment your old ones in ceph.conf. You can also start the monitor with "service ceph start mon.N" on the appropriate host instead of running it yourself (step 8). Note that you DO need to run ceph-mon as specified in step 5. I was initially confused about the '--mkfs' flag there: it doesn't refer to the OS's filesystem; you should use a directory or mountpoint that is already prepared/mounted.

HTH. If you run into trouble, post exactly the steps you followed and additional details about your setup.

JN
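A minimal sketch of the 5 -> 3 shrink Sage describes (the monitor names 3 and 4 are illustrative):

$ service ceph stop mon.3
$ service ceph stop mon.4
$ ceph mon remove 3
$ ceph mon remove 4
$ ceph mon stat        # the monmap should now list three monitors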
Re: [ceph-users] monitor removal and re-add
The issue, Sage, is that we have to deal with the cluster being re-expanded. If we start with 5 monitors and scale back to 3 by running the ceph mon remove N command after stopping each monitor, and don't restart the existing monitors, we cannot re-add those same monitors that were previously removed. They will suicide at startup.

On Mon, Jun 24, 2013 at 4:22 PM, Sage Weil <s...@inktank.com> wrote:

On Mon, 24 Jun 2013, Mandell Degerness wrote:

Hmm. This is a bit ugly from our perspective, but not fatal to your design (just our implementation). At the time we run the rm, the cluster is smaller, and so the restart of each monitor is not fatal to the cluster. The problem is on our side in terms of guaranteeing order of behaviors.

Sorry, I'm still confused about where the monitor gets restarted. It doesn't matter if the removed monitor is stopped or failed/gone; 'ceph mon rm ...' will remove it from the monmap and quorum. It sounds like you're suggesting that the surviving monitors need to be restarted, but they do not, as long as enough of them are alive to form a quorum and pass the decree that the mon cluster is smaller. So 5 -> 2 would be problematic, but 5 -> 3 (assuming there are 3 currently up) will work without restarts...

sage

[earlier messages in the thread quoted in full; trimmed]
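For the re-add itself, a sketch following the add-or-rm-mons procedure linked earlier in the thread (the paths, the mon id 3, and the address are illustrative):

$ ceph mon getmap -o /tmp/monmap
$ ceph auth get mon. -o /tmp/mon.keyring
$ ceph-mon -i 3 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
$ ceph mon add 3 192.168.0.13:6789
$ service ceph start mon.3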
Re: [ceph-users] radosgw placement groups
It is possible to create all of the pools manually before starting radosgw. That allows control of the pg_num used. The pools are: .rgw, .rgw.control, .rgw.gc, .log, .intent-log, .usage, .users, .users.email, .users.swift, .users.uid

On Wed, Jun 19, 2013 at 6:13 PM, Derek Yarnell <de...@umiacs.umd.edu> wrote:

Hi,

When bootstrapping radosgw you are not given the option to create the pools (and therefore set a specific pg_num). There are a lot of pools created: .rgw, .rgw.gc, .rgw.control, .users.uid, .users.email, .users. I know I can set osd_pool_default_pg_num, but that will apply to all of them uniformly. Since I don't see any way to specify these built-in pools, I am guessing I may be able to just create them by hand beforehand with individualized numbers of placement groups. Is there a guideline for sizing these? My guess is that they are not going to be uniform. I also see the PG splitting feature, but it is still experimental, and I am guessing not what I should be doing to continually resize these.

Thanks,
derek

--
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies
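A sketch of pre-creating the pools with individual sizes (the pg_num values are placeholders; the data-bearing .rgw pool generally wants far more placement groups than the small bookkeeping pools):

$ ceph osd pool create .rgw 256
$ ceph osd pool create .rgw.control 8
$ ceph osd pool create .rgw.gc 8
$ ceph osd pool create .log 8
$ ceph osd pool create .intent-log 8
$ ceph osd pool create .usage 8
$ ceph osd pool create .users 8
$ ceph osd pool create .users.email 8
$ ceph osd pool create .users.swift 8
$ ceph osd pool create .users.uid 8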
[ceph-users] RBD Reference Counts for deletion
I know that there was another report of bad behavior when deleting an RBD that is currently mounted on a host. My problem is related, but slightly different.

We are using OpenStack and Grizzly Cinder to create a bootable Ceph volume. The instance was booted and all was well. The server on which the instance had been booted was unplugged. The user deleted the instance, which amounts to a database update on the Nova side. They then tried to delete the volume, which failed with the following error:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/cinder/volume/driver.py", line 90, in _try_execute
    self._execute(*command, **kwargs)
  File "/usr/lib64/python2.7/site-packages/cinder/utils.py", line 190, in execute
    cmd=' '.join(cmd))
ProcessExecutionError: Unexpected error while running command.
Command: rbd rm --pool rbd volume-66e11621-1c38-4e2d-9d90-cc511013c290
Exit code: 16
Stdout: '\rRemoving image: 1% complete...\rRemoving image: 2% complete... [progress output trimmed] ...\rRemoving image: 99% complete...failed.\n'
Stderr: 'rbd: error: image still has watchers\nThis means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.\n2013-05-09 21:51:27.522986 7f8aca884780 -1 librbd: error removing header: (16) Device or resource busy\n'

It appears to me that Ceph still believes the volume is mounted somewhere. Is there a way to tell Ceph to delete the RBD, in spite of its belief that it is mounted?
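With a format 1 image (the default in bobtail-era Ceph), the watch is held on the image's header object, named <image>.rbd, so the stale client can be identified and then blacklisted to force the watch to drop. A sketch using the pool and volume name from the traceback above:

$ rados -p rbd listwatchers volume-66e11621-1c38-4e2d-9d90-cc511013c290.rbd
$ ceph osd blacklist add <watcher-address>   # address comes from the listwatchers output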
Re: [ceph-users] RBD Reference Counts for deletion
Sorry, I should have mentioned: this is using the bobtail version of Ceph.

On Mon, May 13, 2013 at 1:13 PM, Mandell Degerness <mand...@pistoncloud.com> wrote:

I know that there was another report of bad behavior when deleting an RBD that is currently mounted on a host. My problem is related, but slightly different. [original message quoted in full above; trimmed]