[ceph-users] Lost monitors in a multi mon cluster
Hello,

I was running a multi-mon (3) Ceph cluster and, during a migration, I reinstalled 2 of the 3 monitor nodes without first removing them properly from the cluster. So there is only one monitor left, which is stuck in the probing phase, and the cluster is down. As I can only connect to the mon admin socket, I don't know if it's possible to add a monitor, or to get and edit the monmap. This cluster is running Ceph version 0.67.1.

Is there a way to force my last monitor into a leader state, or to rebuild a lost monitor so that it can pass the probing and election phases?

Thank you,
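A recovery note for this thread: the usual way out on 0.67.x is monmap surgery on the surviving monitor. Stop it, extract its monmap, remove the two reinstalled monitors from the map, and inject the edited map back so the survivor can form a quorum of one. A sketch, in which the surviving monitor id "a", the dead ids "b" and "c", and the /tmp path are placeholders, not values from this thread:

    # Stop the surviving monitor first.
    $ service ceph stop mon.a

    # Extract the current monmap from the monitor's store.
    $ ceph-mon -i a --extract-monmap /tmp/monmap

    # Inspect it, then drop the two reinstalled monitors.
    $ monmaptool --print /tmp/monmap
    $ monmaptool --rm b --rm c /tmp/monmap

    # Inject the edited map and restart; mon.a can now win the
    # election on its own and the cluster becomes reachable again.
    $ ceph-mon -i a --inject-monmap /tmp/monmap
    $ service ceph start mon.a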
[ceph-users] RE: Balance data on near full osd warning or error
Hi,

thank you, it's rebalancing now :)

From: Eric Eastman [eri...@aol.com]
Sent: Wednesday, October 23, 2013 01:19
To: HURTEVENT VINCENT; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Balance data on near full osd warning or error

Hello,

What I have used to rebalance my cluster is:

    ceph osd reweight-by-utilization

> we're using a small Ceph cluster with 8 nodes, each with 4 osds. People are
> using it through instances and volumes in an OpenStack platform.
>
> We're facing a HEALTH_ERR with full or near full osds:
>
>   cluster 5942e110-ea2f-4bac-80f7-243fe3e35732
>    health HEALTH_ERR 1 full osd(s); 13 near full osd(s)
>    monmap e1: 3 mons at {0=192.168.73.131:6789/0,1=192.168.73.135:6789/0,2=192.168.73.140:6789/0}, election epoch 2974, quorum 0,1,2 0,1,2
>    osdmap e4127: 32 osds: 32 up, 32 in full
>    pgmap v6055899: 10304 pgs: 10304 active+clean; 12444 GB data, 24953 GB used, 4840 GB / 29793 GB avail
>    mdsmap e792: 1/1/1 up {0=2=up:active}, 2 up:standby

Eric
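A note for readers hitting the same warning: on 0.67.x, reweight-by-utilization accepts an optional overload threshold in percent (120 by default), and individual OSDs can also be reweighted by hand. A minimal sketch; the 110 threshold, osd.31 and the 0.85 weight below are example values, not taken from this thread:

    # Reweight only the OSDs whose utilization exceeds 110% of the
    # cluster average (the default threshold is 120).
    $ ceph osd reweight-by-utilization 110

    # Alternatively, lower a single over-full OSD's override weight,
    # a value between 0 and 1; data moves off the OSD as it drops.
    $ ceph osd reweight 31 0.85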
[ceph-users] Balance data on near full osd warning or error
Hello,

we're using a small Ceph cluster with 8 nodes, each with 4 osds. People are using it through instances and volumes in an OpenStack platform.

We're facing a HEALTH_ERR with full or near full osds:

  cluster 5942e110-ea2f-4bac-80f7-243fe3e35732
   health HEALTH_ERR 1 full osd(s); 13 near full osd(s)
   monmap e1: 3 mons at {0=192.168.73.131:6789/0,1=192.168.73.135:6789/0,2=192.168.73.140:6789/0}, election epoch 2974, quorum 0,1,2 0,1,2
   osdmap e4127: 32 osds: 32 up, 32 in full
   pgmap v6055899: 10304 pgs: 10304 active+clean; 12444 GB data, 24953 GB used, 4840 GB / 29793 GB avail
   mdsmap e792: 1/1/1 up {0=2=up:active}, 2 up:standby

Here is the df output on these osds:

  /dev/sdc  932G  785G  147G  85%  /data/ceph/osd/data/1
  /dev/sdd  932G  879G   53G  95%  /data/ceph/osd/data/2
  /dev/sde  932G  765G  167G  83%  /data/ceph/osd/data/3
  /dev/sdf  932G  754G  178G  81%  /data/ceph/osd/data/4
  /dev/sdc  932G  799G  133G  86%  /data/ceph/osd/data/6
  /dev/sdd  932G  818G  114G  88%  /data/ceph/osd/data/7
  /dev/sde  932G  814G  118G  88%  /data/ceph/osd/data/8
  /dev/sdf  932G  801G  131G  86%  /data/ceph/osd/data/9
  /dev/sdc  932G  764G  168G  83%  /data/ceph/osd/data/11
  /dev/sdd  932G  840G   92G  91%  /data/ceph/osd/data/12
  /dev/sde  932G  699G  233G  76%  /data/ceph/osd/data/13
  /dev/sdf  932G  721G  211G  78%  /data/ceph/osd/data/14
  /dev/sdc  932G  778G  154G  84%  /data/ceph/osd/data/16
  /dev/sdd  932G  820G  112G  88%  /data/ceph/osd/data/17
  /dev/sde  932G  684G  248G  74%  /data/ceph/osd/data/18
  /dev/sdf  932G  763G  169G  82%  /data/ceph/osd/data/19
  /dev/sdc  932G  757G  175G  82%  /data/ceph/osd/data/21
  /dev/sdd  932G  715G  217G  77%  /data/ceph/osd/data/22
  /dev/sde  932G  762G  170G  82%  /data/ceph/osd/data/23
  /dev/sdf  932G  728G  204G  79%  /data/ceph/osd/data/24
  /dev/sdc  932G  841G   91G  91%  /data/ceph/osd/data/26
  /dev/sdd  932G  795G  137G  86%  /data/ceph/osd/data/27
  /dev/sde  932G  691G  241G  75%  /data/ceph/osd/data/28
  /dev/sdf  932G  772G  160G  83%  /data/ceph/osd/data/29
  /dev/sdc  932G  738G  195G  80%  /data/ceph/osd/data/36
  /dev/sdd  932G  803G  129G  87%  /data/ceph/osd/data/37
  /dev/sde  932G  783G  149G  85%  /data/ceph/osd/data/38
  /dev/sdf  932G  844G   88G  91%  /data/ceph/osd/data/39
  /dev/sdc  932G  885G   47G  96%  /data/ceph/osd/data/31
  /dev/sdd  932G  708G  224G  76%  /data/ceph/osd/data/32
  /dev/sde  932G  802G  130G  87%  /data/ceph/osd/data/33
  /dev/sdf  932G  862G   70G  93%  /data/ceph/osd/data/34

Some osds are very nearly full whereas others are OK, I think (below 80%). We're losing GBs we thought usable. There is no custom CRUSH map. Is there a way to rebalance data between osds? As I understand the documentation, we could do this by adding new osd(s), or by switching off heavily used osds to force the cluster to rebuild the data elsewhere. Is there a way to do this without adding osds? We're using ceph version 0.67.1.

Thank you,
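One further hedged note: if the full flag is already blocking client writes, the full/near-full thresholds can be raised temporarily so the cluster stays writable while data moves. This is a stopgap, not a fix. The values below are examples (the defaults are 0.95 and 0.85), and the commands are the pre-Luminous form that applies to 0.67.x:

    # Temporarily raise the thresholds, then restore the defaults once
    # rebalancing has brought every OSD back under the near-full line.
    $ ceph pg set_full_ratio 0.97
    $ ceph pg set_nearfull_ratio 0.90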
[ceph-users] RE: OpenStack Cinder + Ceph, unable to remove unattached volumes, still watchers
Hi Josh,

thank you for your answer, but I was on Bobtail, so no listwatchers command :) I planned a reboot of the concerned compute nodes and all went fine then. I updated Ceph to the latest stable release though.

From: Josh Durgin [josh.dur...@inktank.com]
Sent: Tuesday, August 20, 2013 22:40
To: HURTEVENT VINCENT
Cc: Maciej Gałkiewicz; ceph-us...@ceph.com
Subject: Re: [ceph-users] OpenStack Cinder + Ceph, unable to remove unattached volumes, still watchers

On 08/20/2013 11:20 AM, Vincent Hurtevent wrote:
> I'm not the end user. It's possible that the volume has been detached
> without unmounting.
>
> As the volume is unattached and the initial kvm instance is down, I was
> expecting the rbd volume to be properly unlocked even if the guest unmount
> hasn't been done, like a physical disk in fact.

Yes, detaching the volume will remove the watch regardless of the guest
having it mounted.

> Which part of the Ceph thing is always locked or marked in use? Do we
> have to go down to the rados object level?
> The data can be destroyed.

It's a watch on the rbd header object, registered when the rbd volume is
attached, and unregistered when it is detached or 30 seconds after the
qemu/kvm process using it dies.

From rbd info you can get the id of the image (part of the
block_name_prefix), and use the rados tool to see which ip is watching the
volume's header object, i.e.:

$ rbd info volume-name | grep prefix
block_name_prefix: rbd_data.102f74b0dc51
$ rados -p rbd listwatchers rbd_header.102f74b0dc51
watcher=192.168.106.222:0/1029129 client.4152 cookie=1

> Reboot compute nodes could clean librbd layer and clean watchers?

Yes, because this would kill all the qemu/kvm processes.

Josh

> From: Don Talton (dotalton) [dotal...@cisco.com]
> Sent: Tuesday, August 20, 2013 19:57
> To: HURTEVENT VINCENT
> Subject: RE: [ceph-users] OpenStack Cinder + Ceph, unable to remove
> unattached volumes, still watchers
>
> Did you unmount them in the guest before detaching?
>
> > -----Original Message-----
> > From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
> > boun...@lists.ceph.com] On Behalf Of Vincent Hurtevent
> > Sent: Tuesday, August 20, 2013 10:33 AM
> > To: ceph-us...@ceph.com
> > Subject: [ceph-users] OpenStack Cinder + Ceph, unable to remove
> > unattached volumes, still watchers
> >
> > Hello,
> >
> > I'm using Ceph as the Cinder backend. It's actually working pretty well
> > and some users have been using this cloud platform for a few weeks, but
> > I came back from vacation and I've got some errors removing volumes,
> > errors I didn't have a few weeks ago.
> >
> > Here's the situation:
> >
> > Volumes are unattached, but Ceph is telling Cinder (or me, when I try to
> > remove them through the rbd tools) that the volumes still have watchers:
> >
> > rbd --pool cinder rm volume-46e241ee-ed3f-446a-87c7-1c9df560d770
> > Removing image: 99% complete...failed.
> > rbd: error: image still has watchers
> > This means the image is still open or the client using it crashed. Try
> > again after closing/unmapping it or waiting 30s for the crashed client
> > to timeout.
> > 2013-08-20 19:17:36.075524 7fedbc7e1780 -1 librbd: error removing
> > header: (16) Device or resource busy
> >
> > The kvm instances to which the volumes had been attached are now
> > terminated. There's no lock on the volumes according to 'rbd lock list'.
> >
> > I restarted all the monitors (3) one by one, with no better success.
> >
> > From the OpenStack PoV, these volumes are correctly unattached.
> >
> > How can I unlock the volumes or trace back the watcher/process? These
> > could be on several different compute nodes.
> >
> > Thank you for any hint,
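For later readers: on Cuttlefish and newer (as noted above, listwatchers does not exist in Bobtail), Josh's check can be looped over a whole pool to find which client ip still holds a watch on each volume. A sketch assuming format-2 images, whose header objects follow the rbd_header.<id> naming shown in the thread:

    # Print the watchers of every image header in the "cinder" pool.
    $ for vol in $(rbd --pool cinder ls); do
    >     prefix=$(rbd --pool cinder info "$vol" | awk '/block_name_prefix/ {print $2}')
    >     echo "== $vol =="
    >     rados -p cinder listwatchers "rbd_header.${prefix#rbd_data.}"
    > done

Any volume that prints a watcher line points at the compute node (by ip) whose qemu/kvm process, or stale librbd client, is still holding the image open.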