Re: [ceph-users] rbd watchers

2014-05-21 Thread Mandell Degerness
The times I have seen this message, it has always been because there
are snapshots of the image that haven't been deleted yet. You can see
the snapshots with 'rbd snap ls <image-name>'.
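
For example, something along these lines should show and then clear any
leftover snapshots (a rough sketch; substitute your own pool and image
names, and note that protected snapshots have to be unprotected first):

rbd snap ls poolname/rbdname                    # list snapshots of the image
rbd snap unprotect poolname/rbdname@snapname    # only needed for protected snapshots
rbd snap purge poolname/rbdname                 # delete all snapshots of the image
rbd rm -p poolname rbdname                      # then retry the removal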

On Tue, May 20, 2014 at 4:26 AM, James Eckersall
james.eckers...@gmail.com wrote:
 Hi,



 I'm having some trouble with an rbd image.  I want to rename the current rbd
 and create a new rbd with the same name.

 I renamed the rbd with rbd mv, but it was still mapped on another node, so
 rbd mv gave me an error that it was unable to remove the source.


 I then unmapped the original rbd and tried to remove it.


 Despite it being unmapped, the cluster still believes that there is a
 watcher on the rbd:


 root@ceph-admin:~# rados -p poolname listwatchers rbdname.rbd

 watcher=x.x.x.x:0/2329830975 client.26367 cookie=48

 root@ceph-admin:~# rbd rm -p poolname rbdname

 Removing image: 99% complete...failed.
 2014-05-20 11:50:15.023823 7fa6372e4780 -1 librbd: error removing header: (16) Device or resource busy


 rbd: error: image still has watchers

 This means the image is still open or the client using it crashed. Try again
 after closing/unmapping it or waiting 30s for the crashed client to timeout.



 I've already rebooted the node that the cluster claims is a watcher and
 confirmed it definitely is not mapped.

 I'm 99.9% sure that there are no nodes actually using this rbd.


 Does anyone know how I can get rid of it?


 Currently running ceph 0.73-1 on Ubuntu 12.04.


 Thanks


 J




Re: [ceph-users] Ceph Object Storage front-end?

2014-05-01 Thread Mandell Degerness
You can use librados directly or you can use radosgw, which, I think,
would be pretty much exactly what you are looking for.
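
radosgw speaks S3- and Swift-compatible HTTP, so ordinary S3/Swift clients
(including Windows ones) can talk to it.  If you would rather hit librados
directly, a minimal sketch with the Python bindings looks roughly like this
(assumptions: python-rados is installed, /etc/ceph/ceph.conf is readable, and
a pool named 'backups' already exists):

import rados

# connect using the local ceph.conf and the default keyring
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('backups')   # assumed pool name
    try:
        # store a blob under an object name, then read it back
        with open('customer-db.dump', 'rb') as f:
            ioctx.write_full('customer-db-2014-04-29', f.read())
        size, _mtime = ioctx.stat('customer-db-2014-04-29')
        data = ioctx.read('customer-db-2014-04-29', size)
    finally:
        ioctx.close()
finally:
    cluster.shutdown()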

On Tue, Apr 29, 2014 at 4:36 PM, Stuart Longland stua...@vrt.com.au wrote:
 Hi all,

 Is there some kind of web-based or WebDAV-based front-end for accessing
 a Ceph cluster?

 Our situation is sometimes we have big blobs that we'd like to stash
 somewhere safe, things like customer database backups, etc.  Things
 other than disk images.

 We haven't deployed CephFS at this stage since, at the time, running more
 than one MDS was not supported, and I'd rather not rely on having
 something with that single point of failure.

 For now I'm just creating an RBD, formatting it with XFS and slopping my
 data into that.  Not ideal, but it works for me, as I run a Linux
 workstation.  It won't work for the Windows users (who outnumber us
 greatly).

 I was thinking something along the lines of a WebDAV or Samba interface,
 which I realise could be done with conventional Apache/Samba atop
 CephFS, but I was wondering if there was something that would do it
 using the librados API?  Something like Ceph Gateway, but without the
 specialised client requirement.

 Has anyone seen something along these lines, or am I being too vague?
 Regards,
 --
 Stuart Longland
 Systems Engineer
 VRT Systems
 38b Douglas Street, Milton QLD 4064
 T: +61 7 3535 9619   F: +61 7 3535 9699
 http://www.vrt.com.au




Re: [ceph-users] Using RBD with LVM

2013-09-24 Thread Mandell Degerness
You need to add a line to /etc/lvm/lvm.conf:

types = [ "rbd", 1024 ]

It should be in the devices section of the file.
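
In context, the relevant part of /etc/lvm/lvm.conf would look something like
this, and the usual LVM workflow applies afterwards (a rough sketch; the
volume group and logical volume names are just examples for this thread's
use case):

devices {
    # second value is the maximum number of partitions, per lvm.conf(5)
    types = [ "rbd", 1024 ]
}

$ sudo pvcreate /dev/rbd1 /dev/rbd2 /dev/rbd3
$ sudo vgcreate vg_home /dev/rbd1 /dev/rbd2 /dev/rbd3
$ sudo lvcreate -L 250G -n lv_home vg_home
$ sudo lvextend -L +50G vg_home/lv_home   # grow later, then grow the filesystem on top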

On Tue, Sep 24, 2013 at 5:00 PM, John-Paul Robinson j...@uab.edu wrote:
 Hi,

 I'm exploring a configuration with multiple Ceph block devices used with
 LVM.  The goal is to provide a way to grow and shrink my file systems
 while they are online.

 I've created three block devices:

 $ sudo ./ceph-ls  | grep home
 jpr-home-lvm-p01: 102400 MB
 jpr-home-lvm-p02: 102400 MB
 jpr-home-lvm-p03: 102400 MB

 And have them mapped into my kernel (3.2.0-23-generic #36-Ubuntu SMP):

 $ sudo rbd showmapped
 id pool image            snap device
 0  rbd  jpr-test-vol01   -    /dev/rbd0
 1  rbd  jpr-home-lvm-p01 -    /dev/rbd1
 2  rbd  jpr-home-lvm-p02 -    /dev/rbd2
 3  rbd  jpr-home-lvm-p03 -    /dev/rbd3

 In order to use them with LVM, I need to define them as physical
 volumes.  But when I run this command I get an unexpected error:

 $ sudo pvcreate /dev/rbd1
   Device /dev/rbd1 not found (or ignored by filtering).

 I am able to use other RBD on this same machine to create file systems
 directly and mount them:

 $ df -h /mnt-test
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/rbd050G  885M   47G   2% /mnt-test

 Is there a reason that the /dev/rbd[1-2] devices can't be initialized as
 physical volumes in LVM?

 Thanks,

 ~jpr


Re: [ceph-users] ceph/rbd nova image cache

2013-08-23 Thread Mandell Degerness
The easy solution to this is to create a really tiny image in glance (call
it fake_image or something like that) and tell nova that it is the image
you are using.  Since you are booting from the RBD anyway, it doesn't
actually use the image for anything, and should only put a single copy of
it in the _base directory.
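
A rough sketch of that approach (the image name fake_image comes from the
suggestion above; exact glance/nova CLI flags vary a little between releases,
so treat this as illustrative):

$ dd if=/dev/zero of=/tmp/fake.img bs=1M count=1
$ glance image-create --name fake_image --disk-format raw \
    --container-format bare --file /tmp/fake.img
# boot instances with --image fake_image while actually booting from the
# RBD-backed Cinder volume (e.g. via --block-device-mapping)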


On Wed, Aug 21, 2013 at 8:06 AM, Sébastien Han
sebastien@enovance.com wrote:

 Do you use Xen or KVM?

 It seems that Xen has a flag called cache_images=all. However, I haven't
 seen anything for KVM.

 
 Sébastien Han
 Cloud Engineer

 "Always give 100%. Unless you're giving blood."

 Phone: +33 (0)1 49 70 99 72 - Mobile: +33 (0)6 52 84 44 70
 Mail: sebastien@enovance.com - Skype: han.sbastien
 Address: 10, rue de la Victoire - 75009 Paris
 Web: www.enovance.com - Twitter: @enovance

 On August 20, 2013 at 10:07:56 PM, w sun (ws...@hotmail.com) wrote:

 This might be slightly off topic though many of ceph users might have run
 into similar issues.

 For one of our Grizzly OpenStack environments, we are using Ceph/RBD as the
 exclusive image and volume storage for VMs, which boot from RBD-backed
 Cinder volumes. As a result, the nova image cache is not needed at all.
 For some reason, nova still creates an image cache under /var/lib/nova/_base
 on the nova nodes. This fills up our shared /var/lib/nova/instances directory
 (on NFS). This NFS share has a limited size (50GB) and is used to store the
 config drives and to enable VM failover/restart during hardware failure.

 Does anyone know how to disable the nova image caching function completely?
 Or have a suggestion for the best way to deal with this issue? We know we can
 do aggressive clearing with some of the nova image cache management
 configuration options, but that doesn't help reduce the extra I/O overhead of
 caching images on the nova nodes.

 Thanks. --weiguo




Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Mandell Degerness
Hmm.  This sounds very similar to the problem I reported (with
debug-mon = 20 and debug ms = 1 logs as of today) on our support site
(ticket #438) - Sage, please take a look.

On Mon, Aug 12, 2013 at 9:49 PM, Sage Weil s...@inktank.com wrote:
 On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:
 Joao,

 (log file uploaded to http://pastebin.com/Ufrxn6fZ)

 I had some good luck and some bad luck. I copied the store.db to a new 
 monitor, injected a modified monmap and started it up (This is all on the 
 same host.) Very quickly it reached quorum (as far as I can tell) but didn't 
 respond. Running 'ceph -w' just hung, no timeouts or errors. Same thing when 
 restarting an OSD.

 The last lines of the log file   '...ms_verify_authorizer..' are from 'ceph 
 -w' attempts.

 I restarted everything again and it sat there synchronizing. IO stat 
 reported about 100MB/s, but just reads. I let it sit there for 7 min but 
 nothing happened.

 Can you do this again with --debug-mon 20 --debug-ms 1?  It looks as
 though the main dispatch thread is blocked (7f71a1aa5700 does nothing
 after winning the election).  It would also be helpful to gdb attach to
 the running ceph-mon and capture the output from 'thread apply all bt'.
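
 Concretely, that would be something like the following (a sketch; the mon id
 'a' is just an example):

 $ ceph-mon -i a -d --debug-mon 20 --debug-ms 1   # or set the options in ceph.conf and restart
 $ gdb -p $(pidof ceph-mon)
 (gdb) thread apply all bt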

 Side question, how long can a ceph cluster run without a monitor? I was
 able to upload files via rados gateway without issue even when the
 monitor was down.

 Quite a while, as long as no new processes need to authenticate, and no
 nodes go up or down.  Eventually the authentication keys are going to time
 out, though (1 hour is the default).

 sage


[ceph-users] Weird problem - maybe quorum related

2013-07-23 Thread Mandell Degerness
One of our tests last night failed in a weird way.  We started with a
three node cluster, with three monitors, expanded to a 5 node cluster
with 5 monitors and dropped back to a 4 node cluster with three
monitors.

The sequence of events was:

start 3 monitors (monitors 0, 1, 2) - monmap e1
add one node
restart the 3 monitors
add another node
add monitor 4 - monmap e2
restart monitor 0
add monitor 3 - monmap e3
restart monitor 1
restart monitor 2
shutdown server with monitor 4 on it
remove monitor 4 - monmap e4
restart monitor 0
mon.0 had an odd time sync problem and respawned
stop monitor 3
remove monitor 3

At that point (08:23:52 in the log), ceph stopped responding (as if
quorum was lost).  Note that we do not see a new monmap (e5) created
by the removal of monitor 3.

See the (sort of) full log at:
https://gist.github.com/mdegerne/06fa38243bd462c46d39


[ceph-users] Possible bug with image.list_lockers()

2013-07-11 Thread Mandell Degerness
I'm not certain what the correct behavior should be in this case, so
maybe it is not a bug, but here is what is happening:

When an OSD becomes full, a process fails and we unmount the rbd and
attempt to remove the lock associated with the rbd for that process.
The unmount works fine, but removing the lock is failing right now
because the list_lockers() function call never returns.

Here is a code snippet I tried with a fake rbd lock on a test cluster:

import rbd
import rados

with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
    with cluster.open_ioctx('rbd') as ioctx:
        with rbd.Image(ioctx, 'msd1') as image:
            image.list_lockers()

The process never returns, even after the ceph cluster is returned to
healthy.  The only indication of the error is an error in the
/var/log/messages file:

Jul 11 23:25:05 node-172-16-0-13 python: 2013-07-11 23:25:05.826793
7ffc66d72700  0 client.6911.objecter  FULL, paused modify
0x7ffc687c6050 tid 2

Any help would be greatly appreciated.

ceph version:

ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
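
For reference, the same lock information can also be inspected and cleared
from the CLI (a sketch using the 'msd1' test image from the snippet above;
I have not verified whether these calls hit the same pause on a full cluster):

rbd lock list msd1
rbd lock remove msd1 <lock-id> <locker>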


Re: [ceph-users] monitor removal and re-add

2013-06-25 Thread Mandell Degerness
Precisely.  This is what we need to do.  It is just a case of
adjusting our process to make that possible.  As I stated a couple of
e-mails ago, the design of Ceph allows it; it is just a bit of a
challenge to fit it into our existing processes.  It's on me now to
fix the process.

On Mon, Jun 24, 2013 at 11:26 PM, Alex Bligh a...@alex.org.uk wrote:

 On 25 Jun 2013, at 00:39, Mandell Degerness wrote:

 The issue, Sage, is that we have to deal with the cluster being
 re-expanded.  If we start with 5 monitors and scale back to 3, run
 the 'ceph mon remove N' command after stopping each monitor, and don't
 restart the existing monitors, we cannot re-add those same monitors
 that were previously removed.  They will suicide at startup.

 Can you not restart the remaining monitors individually at the
 end of the process once the monmaps and the ceph.confs have been
 updated so they only think there are 3 monitors?

 Once you have got to a stable 3 mon config, you can go back up
 to 5.

 --
 Alex Bligh






Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread Mandell Degerness
Hmm.  This is a bit ugly from our perspective, but not fatal to your
design (just our implementation).  At the time we run the rm, the
cluster is smaller and so the restart of each monitor is not fatal to
the cluster.  The problem is on our side in terms of guaranteeing
order of behaviors.

On Mon, Jun 24, 2013 at 1:54 PM, Sage Weil s...@inktank.com wrote:
 On Mon, 24 Jun 2013, Mandell Degerness wrote:
 I'm testing the change (actually re-starting the monitors after the
 monitor removal), but this brings up the issue with why we didn't want
 to do this in the first place:  When reducing the number of monitors
 from 5 to 3, we are guaranteed to have a service outage for the time
 it takes to restart at least one of the monitors (and, possibly, for
 two of the restarts, now that I think on it).  In theory, the
 stop/start cycle is very short and should complete in a reasonable
 time.  What I'm concerned about, however, is the case that something
 is wrong with our re-written config file.  In that case, the outage is
 immediate and will last until the problem is corrected on the first
 server to have the monitor restarted.

 I'm jumping into this thread late, but: why would you follow the second
 removal procedure for broken clusters?  To go from 5 -> 3 mons, you should
 just stop 2 of the mons and do 'ceph mon rm addr1' 'ceph mon rm
 addr2'.
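
 Spelled out, that is roughly (a sketch; mon ids 'd' and 'e' stand in for the
 two monitors being dropped, and the remaining three keep quorum throughout):

 service ceph stop mon.d
 ceph mon remove d
 service ceph stop mon.e
 ceph mon remove e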

 sage


 On Mon, Jun 24, 2013 at 10:07 AM, John Nielsen li...@jnielsen.net wrote:
  On Jun 21, 2013, at 5:00 PM, Mandell Degerness mand...@pistoncloud.com 
  wrote:
 
  There is a scenario where we would want to remove a monitor and, at a
  later date, re-add the monitor (using the same IP address).  Is there
  a supported way to do this?  I tried deleting the monitor directory
  and rebuilding from scratch following the add monitor procedures from
  the web, but the monitor still suicides when started.
 
 
  I assume you're already referencing this:
  http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
 
  I have done what you describe before. There were a couple hiccups, let's 
  see if I remember the specifics:
 
  Remove:
  Follow the first two steps under removing a monitor (manual) at the link 
  above:
  service ceph stop mon.N
  ceph mon remove N
  Comment out the monitor entry in ceph.conf on ALL mon, osd and client 
  hosts.
  Restart services as required to make everyone happy with the smaller set 
  of monitors
 
  Re-add:
  Wipe the old monitor's directory and re-create it
  Follow the steps for adding a monitor (manual) at the link above. Instead 
  of adding a new entry you can just un-comment your old ones in ceph.conf. 
  You can also start the monitor with service ceph start mon N on the 
  appropriate host instead of running yourself (step 8). Note that you DO 
  need to run ceph-mon as specified in step 5. I was initially confused 
  about the '--mkfs' flag there--it doesn't refer to the OS's filesystem, 
  you should use a directory or mountpoint that is already prepared/mounted.
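
  In command form, those re-add steps are roughly the following (a sketch
  based on the linked add-or-rm-mons docs; <id> and <ip> are placeholders):

  ceph mon getmap -o /tmp/monmap
  ceph auth get mon. -o /tmp/mon.keyring
  ceph-mon -i <id> --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
  ceph mon add <id> <ip>:6789
  service ceph start mon.<id>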
 
  HTH. If you run into trouble post exactly the steps you followed and 
  additional details about your setup.
 
  JN
 


Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread Mandell Degerness
The issue, Sage, is that we have to deal with the cluster being
re-expanded.  If we start with 5 monitors and scale back to 3, run
the 'ceph mon remove N' command after stopping each monitor, and don't
restart the existing monitors, we cannot re-add those same monitors
that were previously removed.  They will suicide at startup.

On Mon, Jun 24, 2013 at 4:22 PM, Sage Weil s...@inktank.com wrote:
 On Mon, 24 Jun 2013, Mandell Degerness wrote:
 Hmm.  This is a bit ugly from our perspective, but not fatal to your
 design (just our implementation).  At the time we run the rm, the
 cluster is smaller and so the restart of each monitor is not fatal to
 the cluster.  The problem is on our side in terms of guaranteeing
 order of behaviors.

 Sorry, I'm still confused about where the monitor gets restarted.  It
 doesn't matter if the removed monitor is stopped or failed/gone; 'ceph mon
 rm ...' will remove it from the monmap and quorum.  It sounds like you're
 suggesting that the surviving monitors need to be restarted, but they do
 not, as long as enough of them are alive to form a quorum and pass the
 decree that the mon cluster is smaller.  So 5 -> 2 would be problematic,
 but 5 -> 3 (assuming there are 3 currently up) will work without
 restarts...

 sage



 On Mon, Jun 24, 2013 at 1:54 PM, Sage Weil s...@inktank.com wrote:
  On Mon, 24 Jun 2013, Mandell Degerness wrote:
  I'm testing the change (actually re-starting the monitors after the
  monitor removal), but this brings up the issue with why we didn't want
  to do this in the first place:  When reducing the number of monitors
  from 5 to 3, we are guaranteed to have a service outage for the time
  it takes to restart at least one of the monitors (and, possibly, for
  two of the restarts, now that I think on it).  In theory, the
  stop/start cycle is very short and should complete in a reasonable
  time.  What I'm concerned about, however, is the case that something
  is wrong with our re-written config file.  In that case, the outage is
  immediate and will last until the problem is corrected on the first
  server to have the monitor restarted.
 
  I'm jumping into this thread late, but: why would you follow the second
  removal procedure for broken clusters?  To go from 5 -> 3 mons, you should
  just stop 2 of the mons and do 'ceph mon rm addr1' 'ceph mon rm
  addr2'.
 
  sage
 
 
  On Mon, Jun 24, 2013 at 10:07 AM, John Nielsen li...@jnielsen.net wrote:
   On Jun 21, 2013, at 5:00 PM, Mandell Degerness 
   mand...@pistoncloud.com wrote:
  
   There is a scenario where we would want to remove a monitor and, at a
   later date, re-add the monitor (using the same IP address).  Is there
   a supported way to do this?  I tried deleting the monitor directory
   and rebuilding from scratch following the add monitor procedures from
    the web, but the monitor still suicides when started.
  
  
   I assume you're already referencing this:
   http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
  
   I have done what you describe before. There were a couple hiccups, 
   let's see if I remember the specifics:
  
   Remove:
   Follow the first two steps under removing a monitor (manual) at the 
   link above:
   service ceph stop mon.N
   ceph mon remove N
   Comment out the monitor entry in ceph.conf on ALL mon, osd and client 
   hosts.
   Restart services as required to make everyone happy with the smaller 
   set of monitors
  
   Re-add:
   Wipe the old monitor's directory and re-create it
   Follow the steps for adding a monitor (manual) at the link above. 
   Instead of adding a new entry you can just un-comment your old ones in 
   ceph.conf. You can also start the monitor with service ceph start mon 
   N on the appropriate host instead of running yourself (step 8). Note 
   that you DO need to run ceph-mon as specified in step 5. I was 
   initially confused about the '--mkfs' flag there--it doesn't refer to 
   the OS's filesystem, you should use a directory or mountpoint that is 
   already prepared/mounted.
  
   HTH. If you run into trouble post exactly the steps you followed and 
   additional details about your setup.
  
   JN
  


Re: [ceph-users] radosgw placement groups

2013-06-20 Thread Mandell Degerness
It is possible to create all of the pools manually before starting
radosgw.  That allows control of the pg_num used.  The pools are:

.rgw, .rgw.control, .rgw.gc, .log, .intent-log, .usage, .users,
.users.email, .users.swift, .users.uid
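
A rough sketch of pre-creating them (the pg_num of 8 is only a placeholder;
size each pool according to how much data you expect it to hold):

for pool in .rgw .rgw.control .rgw.gc .log .intent-log .usage \
            .users .users.email .users.swift .users.uid; do
    ceph osd pool create "$pool" 8
done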

On Wed, Jun 19, 2013 at 6:13 PM, Derek Yarnell de...@umiacs.umd.edu wrote:
 Hi,

 So when bootstrapping radosgw you are not given the option to create the
 pools (and therefore set a specific pg_num).  There are a lot of pools
 created: .rgw, .rgw.gc, .rgw.control, .users.uid, .users.email, .users.
 I know I can set osd_pool_default_pg_num but that will apply to all of
 those uniformly.

 Since I don't see any way to specify these built-in pools, I am guessing
 I may be able to just create them by hand beforehand with individualized
 numbers of placement groups.  Is there a guideline for sizing these? My
 guess is that they are not going to be uniform.

 I also see the PG splitting feature but it is still experimental and I
 am guessing not what I should be doing to continually resize these.

 Thanks,
 derek

 --
 ---
 Derek T. Yarnell
 University of Maryland
 Institute for Advanced Computer Studies


[ceph-users] RBD Reference Counts for deletion

2013-05-13 Thread Mandell Degerness
I know that there was another report of the bad behavior when deleting
an RBD that is currently mounted on a host.  My problem is related,
but slightly different.

We are using openstack and Grizzly Cinder to create a bootable ceph
volume.  The instance was booted and all was well.  The server on
which the instance had been booted was unplugged.  The user deleted
the instance - which amounts to a database update on the Nova side.
They then tried to delete the volume, which failed with the following
error:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/cinder/volume/driver.py", line 90, in _try_execute
    self._execute(*command, **kwargs)
  File "/usr/lib64/python2.7/site-packages/cinder/utils.py", line 190, in execute
    cmd=' '.join(cmd))
ProcessExecutionError: Unexpected error while running command.
Command: rbd rm --pool rbd volume-66e11621-1c38-4e2d-9d90-cc511013c290
Exit code: 16
Stdout: '\rRemoving image: 1% complete...\rRemoving image: 2% complete...
[progress output from 3% through 98% elided]
\rRemoving image: 99% complete...failed.\n'
Stderr: 'rbd: error: image still has watchers\nThis means the image is still
open or the client using it crashed. Try again after closing/unmapping it or
waiting 30s for the crashed client to timeout.\n2013-05-09 21:51:27.522986
7f8aca884780 -1 librbd: error removing header: (16) Device or resource busy\n'

It appears to me that Ceph still believes the volume is mounted
somewhere.  Is there a way to tell Ceph to delete the RBD, in spite of
its belief that it is mounted?

Re: [ceph-users] RBD Reference Counts for deletion

2013-05-13 Thread Mandell Degerness
Sorry.  I should have mentioned, this is using the bobtail version of ceph.

On Mon, May 13, 2013 at 1:13 PM, Mandell Degerness
mand...@pistoncloud.com wrote:
 I know that there was another report of the bad behavior when deleting
 an RBD that is currently mounted on a host.  My problem is related,
 but slightly different.

 We are using openstack and Grizzly Cinder to create a bootable ceph
 volume.  The instance was booted and all was well.  The server on
 which the instance had been booted was unplugged.  The user deleted
 the instance - which amounts to a database update on the Nova side.
 They then tried to delete the volume, which failed with the following
 error:

 [full cinder traceback and 'rbd rm' progress output, quoted from the previous message, omitted]

 It appears to me that Ceph still believes the volume