[ceph-users] Lost monitors in a multi mon cluster

2014-10-24 Thread HURTEVENT VINCENT
Hello,

I was running a multi-mon (3) Ceph cluster and, during a migration, I reinstalled 
2 of the 3 monitor nodes without properly removing them from the cluster first.

So there is only one monitor left, which is stuck in the probing phase, and the 
cluster is down.

As I can only connect to the monitor's admin socket, I don't know if it's possible 
to add a monitor, or to get and edit the monmap.

This cluster is running Ceph version 0.67.1.

Is there a way to force my last monitor into a leader state, or to rebuild a lost 
monitor, so it can pass the probe and election phases?
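
For reference, the procedure I'm considering, based on the documentation for 
removing monitors from an unhealthy cluster, is roughly the following (the mon 
IDs a, b, c are only placeholders for my setup, and I haven't run this yet):

# stop the surviving monitor, then extract its current monmap
service ceph stop mon.a
ceph-mon -i a --extract-monmap /tmp/monmap

# drop the two reinstalled monitors from the map
monmaptool /tmp/monmap --rm b
monmaptool /tmp/monmap --rm c

# inject the edited map back and restart the survivor,
# which should then be able to form a quorum of one
ceph-mon -i a --inject-monmap /tmp/monmap
service ceph start mon.a

Does that sound like the right direction?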

Thank you,


[ceph-users] RE : Balance data on near full osd warning or error

2013-10-23 Thread HURTEVENT VINCENT
Hi,

Thank you, it's rebalancing now :)




From: Eric Eastman [eri...@aol.com]
Sent: Wednesday, 23 October 2013 01:19
To: HURTEVENT VINCENT; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Balance data on near full osd warning or error

Hello,
What I have used to rebalance my cluster is:

ceph osd reweight-by-utilization
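
If it helps, the command also accepts an optional overload threshold (a 
percentage; 120 is the default, and the value below is just an example), so 
that only the most over-used OSDs are reweighted:

ceph osd reweight-by-utilization 110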


>we're using a small Ceph cluster with 8 nodes, each with 4 OSDs. People are
>using it through instances and volumes in an OpenStack platform.
>
>We're facing a HEALTH_ERR with full or near-full OSDs:
>
>  cluster 5942e110-ea2f-4bac-80f7-243fe3e35732
>   health HEALTH_ERR 1 full osd(s); 13 near full osd(s)
>   monmap e1: 3 mons at {0=192.168.73.131:6789/0,1=192.168.73.135:6789/0,2=192.168.73.140:6789/0},
>   election epoch 2974, quorum 0,1,2 0,1,2
>   osdmap e4127: 32 osds: 32 up, 32 in full
>   pgmap v6055899: 10304 pgs: 10304 active+clean; 12444 GB data, 24953 GB used, 4840 GB / 29793 GB avail
>   mdsmap e792: 1/1/1 up {0=2=up:active}, 2 up:standby

Eric


[ceph-users] Balance data on near full osd warning or error

2013-10-22 Thread HURTEVENT VINCENT
Hello,

we're using a small Ceph cluster with 8 nodes, each with 4 OSDs. People are using it 
through instances and volumes in an OpenStack platform.

We're facing a HEALTH_ERR with full or near-full OSDs:

  cluster 5942e110-ea2f-4bac-80f7-243fe3e35732
   health HEALTH_ERR 1 full osd(s); 13 near full osd(s)
   monmap e1: 3 mons at {0=192.168.73.131:6789/0,1=192.168.73.135:6789/0,2=192.168.73.140:6789/0}, election epoch 2974, quorum 0,1,2 0,1,2
   osdmap e4127: 32 osds: 32 up, 32 in full
   pgmap v6055899: 10304 pgs: 10304 active+clean; 12444 GB data, 24953 GB used, 4840 GB / 29793 GB avail
   mdsmap e792: 1/1/1 up {0=2=up:active}, 2 up:standby

Here is the df output for these OSDs:

Filesystem   Size  Used  Avail  Use%  Mounted on
/dev/sdc     932G  785G   147G   85%  /data/ceph/osd/data/1
/dev/sdd     932G  879G    53G   95%  /data/ceph/osd/data/2
/dev/sde     932G  765G   167G   83%  /data/ceph/osd/data/3
/dev/sdf     932G  754G   178G   81%  /data/ceph/osd/data/4
/dev/sdc     932G  799G   133G   86%  /data/ceph/osd/data/6
/dev/sdd     932G  818G   114G   88%  /data/ceph/osd/data/7
/dev/sde     932G  814G   118G   88%  /data/ceph/osd/data/8
/dev/sdf     932G  801G   131G   86%  /data/ceph/osd/data/9
/dev/sdc     932G  764G   168G   83%  /data/ceph/osd/data/11
/dev/sdd     932G  840G    92G   91%  /data/ceph/osd/data/12
/dev/sde     932G  699G   233G   76%  /data/ceph/osd/data/13
/dev/sdf     932G  721G   211G   78%  /data/ceph/osd/data/14
/dev/sdc     932G  778G   154G   84%  /data/ceph/osd/data/16
/dev/sdd     932G  820G   112G   88%  /data/ceph/osd/data/17
/dev/sde     932G  684G   248G   74%  /data/ceph/osd/data/18
/dev/sdf     932G  763G   169G   82%  /data/ceph/osd/data/19
/dev/sdc     932G  757G   175G   82%  /data/ceph/osd/data/21
/dev/sdd     932G  715G   217G   77%  /data/ceph/osd/data/22
/dev/sde     932G  762G   170G   82%  /data/ceph/osd/data/23
/dev/sdf     932G  728G   204G   79%  /data/ceph/osd/data/24
/dev/sdc     932G  841G    91G   91%  /data/ceph/osd/data/26
/dev/sdd     932G  795G   137G   86%  /data/ceph/osd/data/27
/dev/sde     932G  691G   241G   75%  /data/ceph/osd/data/28
/dev/sdf     932G  772G   160G   83%  /data/ceph/osd/data/29
/dev/sdc     932G  738G   195G   80%  /data/ceph/osd/data/36
/dev/sdd     932G  803G   129G   87%  /data/ceph/osd/data/37
/dev/sde     932G  783G   149G   85%  /data/ceph/osd/data/38
/dev/sdf     932G  844G    88G   91%  /data/ceph/osd/data/39
/dev/sdc     932G  885G    47G   96%  /data/ceph/osd/data/31
/dev/sdd     932G  708G   224G   76%  /data/ceph/osd/data/32
/dev/sde     932G  802G   130G   87%  /data/ceph/osd/data/33
/dev/sdf     932G  862G    70G   93%  /data/ceph/osd/data/34

Some OSDs are nearly full whereas others look OK to me (below 80%). We're losing 
GBs we thought were usable. There is no custom CRUSH map.

Is there a way to rebalance data between OSDs? As I understand the documentation, 
we should do this by adding new OSD(s) or by switching off heavily used OSDs to 
force the cluster to rebuild the data elsewhere.

Is there a way to do this without adding OSDs? We're using Ceph version 
0.67.1.
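
For example, would temporarily lowering the override weight of the fullest OSD 
be a reasonable way to push data off it? Something like the following is what I 
had in mind (osd.31 is the one at 96% in the df output above; the 0.9 value is 
just a guess on my part):

ceph osd reweight 31 0.9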

Thank you,





[ceph-users] RE : OpenStack Cinder + Ceph, unable to remove unattached volumes, still watchers

2013-08-22 Thread HURTEVENT VINCENT
Hi Josh,

thank you for your answer, but I was on Bobtail, so there was no listwatchers command :)

I scheduled a reboot of the affected compute nodes and everything went fine after that. 
I also updated Ceph to the latest stable release.





From: Josh Durgin [josh.dur...@inktank.com]
Sent: Tuesday, 20 August 2013 22:40
To: HURTEVENT VINCENT
Cc: Maciej Gałkiewicz; ceph-us...@ceph.com
Subject: Re: [ceph-users] OpenStack Cinder + Ceph, unable to remove unattached 
volumes, still watchers

On 08/20/2013 11:20 AM, Vincent Hurtevent wrote:
>
>
> I'm not the end user. It's possible that the volume has been detached
> without unmounting.
>
> As the volume is unattached and the initial kvm instance is down, I was
> expecting the rbd volume to be properly unlocked even if the guest unmount
> hadn't been done, just like a physical disk.

Yes, detaching the volume will remove the watch regardless of the guest
having it mounted.

> Which part of Ceph is still locked or marked in use? Do we have to go down
> to the RADOS object level?
> The data can be destroyed.

It's a watch on the rbd header object, registered when the rbd volume
is attached, and unregistered when it is detached or 30 seconds after
the qemu/kvm process using it dies.

From rbd info you can get the id of the image (part of the
block_name_prefix), and use the rados tool to see which IP is watching
the volume's header object, i.e.:

$ rbd info volume-name | grep prefix
 block_name_prefix: rbd_data.102f74b0dc51
$ rados -p rbd listwatchers rbd_header.102f74b0dc51
watcher=192.168.106.222:0/1029129 client.4152 cookie=1

> Would rebooting the compute nodes clean up the librbd layer and clear the watchers?

Yes, because this would kill all the qemu/kvm processes.
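
If you'd rather not reboot a whole node, the watcher output above gives you the
client's IP, so you could log into that compute node and look for a leftover
qemu/kvm process that still references the volume (the IP and volume name here
are just the examples from earlier in this thread):

$ ssh 192.168.106.222 'ps aux | grep [q]emu | grep volume-46e241ee'

Killing that process (or detaching the device via libvirt) should release the
watch within about 30 seconds.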

Josh

> 
> From: Don Talton (dotalton) [dotal...@cisco.com]
> Sent: Tuesday, 20 August 2013 19:57
> To: HURTEVENT VINCENT
> Subject: RE: [ceph-users] OpenStack Cinder + Ceph, unable to remove
> unattached volumes, still watchers
>
> Did you unmount them in the guest before detaching?
>
>  > -Original Message-
>  > From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
>  > boun...@lists.ceph.com] On Behalf Of Vincent Hurtevent
>  > Sent: Tuesday, August 20, 2013 10:33 AM
>  > To: ceph-us...@ceph.com
>  > Subject: [ceph-users] OpenStack Cinder + Ceph, unable to remove
>  > unattached volumes, still watchers
>  >
>  > Hello,
>  >
>  > I'm using Ceph as the Cinder backend. It's working pretty well and some
>  > users have been using this cloud platform for a few weeks, but I came back
>  > from vacation and I'm now getting errors removing volumes, errors I didn't
>  > have a few weeks ago.
>  >
>  > Here's the situation :
>  >
>  > Volumes are unattached, but Ceph is telling Cinder (or me, when I try to
>  > remove them through the rbd tool) that the volumes still have watchers.
>  >
>  > rbd --pool cinder rm volume-46e241ee-ed3f-446a-87c7-1c9df560d770
>  > Removing image: 99% complete...failed.
>  > rbd: error: image still has watchers
>  > This means the image is still open or the client using it crashed. Try again after
>  > closing/unmapping it or waiting 30s for the crashed client to timeout.
>  > 2013-08-20 19:17:36.075524 7fedbc7e1780 -1 librbd: error removing
>  > header: (16) Device or resource busy
>  >
>  >
>  > The kvm instances on which the volumes have been attached are now
>  > terminated. There's no lock on the volume using 'rbd lock list'.
>  >
>  > I restarted all the monitors (3) one by one, with no better success.
>  >
>  > From the OpenStack point of view, these volumes are indeed unattached.
>  >
>  > How can I unlock the volumes or trace the watchers back to their processes?
>  > These could be on several different compute nodes.
>  >
>  >
>  > Thank you for any hint,
>  >
>  >
>
