[ceph-users] Replacing an OSD Drive

2015-02-06 Thread Gaylord Holder

When the time comes to replace an OSD's drive, I've used the following procedure:

1) Stop/down/out the osd and replace the drive
2) Create the ceph osd directory: ceph-osd -i N --mkfs
3) Copy the osd key out of the authorized keys list
4) ceph osd crush rm osd.N
5) ceph osd crush add osd.$i $osd_size root=default host=$(hostname -s)
6) ceph osd in osd.N
7) service ceph start osd.N
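
Consolidated as a sketch (the OSD id and weight are illustrative, the auth
command is one reading of step 3, and the drive swap/remount of the OSD data
directory is assumed already done):

  N=40
  ceph osd out osd.$N                                           # step 1
  service ceph stop osd.$N
  # ...swap the drive and remount /var/lib/ceph/osd/ceph-$N...
  ceph-osd -i $N --mkfs                                         # step 2
  ceph auth get osd.$N -o /var/lib/ceph/osd/ceph-$N/keyring     # step 3 (one way)
  ceph osd crush rm osd.$N                                      # step 4
  ceph osd crush add osd.$N 2.73 root=default host=$(hostname -s)   # step 5
  ceph osd in osd.$N                                            # step 6
  service ceph start osd.$N                                     # step 7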

If I don't do steps 4 and 5, the osd process times out in futex:

[pid 22822] futex(0x4604cc4, 
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 98, {1423237460, 
296281000},  

[pid 22821] futex(0x4604cc0, FUTEX_WAKE_PRIVATE, 1 
[pid 22822] <... futex resumed> )   = -1 EAGAIN (Resource 
temporarily unavailable)


Upping the debugging only shows:

2015-02-06 10:48:22.656012 7f9acf967700 20 osd.40 396 update_osd_stat 
osd_stat(62060 kB used, 2793 GB avail, 2793 GB total, peers []/[] op 
hist [])
2015-02-06 10:48:22.656025 7f9acf967700  5 osd.40 396 heartbeat: 
osd_stat(62060 kB used, 2793 GB avail, 2793 GB total, peers []/[] op 
hist [])

2015-02-06 10:48:23.356299 7f9ae76c7700  5 osd.40 396 tick
2015-02-06 10:48:23.356308 7f9ae76c7700 10 osd.40 396 do_waiters -- start
2015-02-06 10:48:23.356310 7f9ae76c7700 10 osd.40 396 do_waiters -- finish
2015-02-06 10:48:24.356114 7f9acf967700 20 osd.40 396 update_osd_stat 
osd_stat(62060 kB used, 2793 GB avail, 2793 GB total, peers []/[] op 
hist [])


in the osd log file.

What is ceph-osd waiting for, such that removing and re-adding the OSD in the CRUSH map (steps 4 and 5) changes its behavior?

Thanks for any enlightenment on this.
-Gaylord

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Openstack Instances and RBDs

2013-11-01 Thread Gaylord Holder

http://www.sebastien-han.fr/blog/2013/06/03/ceph-integration-in-openstack-grizzly-update-and-roadmap-for-havana/

suggests it is possible to run openstack instances (not only images) off 
of RBDs in grizzly and havana (which I'm running), and to use RBDs in 
lieu of a shared file system.


I've followed

http://ceph.com/docs/next/rbd/libvirt/

but I can only get boot-from-volume to work.  Instances are still being 
housed in /var/lib/nova/instances, making live-migration a non-starter.
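
For what it's worth, the Havana libvirt driver grew an RBD image backend for 
ephemeral (non-volume) disks, enabled roughly as below in nova.conf. The 
option names are as I recall them for that release and the pool name is 
illustrative, so treat this as a sketch and verify against your version:

  [DEFAULT]
  # store instance ephemeral disks directly in RBD instead of
  # /var/lib/nova/instances, which is what makes live migration workable
  libvirt_images_type = rbd
  libvirt_images_rbd_pool = vms
  libvirt_images_rbd_ceph_conf = /etc/ceph/ceph.conf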


Is there a better guide for running openstack instances out of RBDs, or 
is it just not ready yet?


Thanks,

-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Non-Ceph cluster name

2013-10-24 Thread Gaylord Holder

Works perfectly.
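
For the archive, the working invocation is presumably along these lines (a 
sketch, reusing the keyring from the quoted message below):

  ceph-mon --cluster csceph --mkfs -i a --keyring csceph.mon.keyring
  # with --cluster set, ceph-mon reads /etc/ceph/csceph.conf and creates
  # its monfs under /var/lib/ceph/mon/csceph-a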

My only gripe is that --cluster isn't listed as a valid argument in the output of

  ceph-mon --help

and the only hit when searching for --cluster in the ceph documentation 
is in regard to ceph-rest-api.


Shall I file a bug to correct the documentation?

Thanks again for the quick and accurate response.

-Gaylord

On 10/24/2013 08:11 AM, Sage Weil wrote:

Try passing --cluster csceph instead of the config file path and I
suspect it will work.

sage



Gaylord Holder  wrote:

I'm trying to bring up a ceph cluster not named ceph.

I'm running version 0.61.

  From my reading of the documentation, the $cluster metavariable is set
by the basename of the configuration file: specifying the configuration
file "/etc/ceph/mycluster.conf" sets the $cluster metavariable to
"mycluster"

However, given a configuration file /etc/ceph/csceph.conf:

[global]
 fsid = 70d421fe-28ca-4804-bce8-d51a16b531ec
 mon host = 192.168.124.202
 mon_initial_members = a

[mon.a]
host = monnode
mon addr = 192.168.124.202:6789

and running:

ceph-authtool csceph.mon.keyring --create-keyring --name=mon.
--gen-key --cap mon 'allow *'

ceph-mon -c /etc/ceph/csceph.conf --mkfs -i a --keyring
csceph.mon.keyring

ceph-mon tries to create monfs in

/var/lib/ceph/mon/ceph-a

not

/var/lib/ceph/mon/csceph-a

as expected.


Thank you for any help you can give.

Cheers,
-Gaylord


ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Non-Ceph cluster name

2013-10-24 Thread Gaylord Holder

I'm trying to bring up a ceph cluster not named ceph.

I'm running version 0.61.

From my reading of the documentation, the $cluster metavariable is set 
by the basename of the configuration file: specifying the configuration 
file "/etc/ceph/mycluster.conf" sets the $cluster metavariable to 
"mycluster"


However, given a configuration file /etc/ceph/csceph.conf:

  [global]
   fsid = 70d421fe-28ca-4804-bce8-d51a16b531ec
   mon host = 192.168.124.202
   mon_initial_members = a

  [mon.a]
  host = monnode
  mon addr = 192.168.124.202:6789

and running:

  ceph-authtool csceph.mon.keyring --create-keyring --name=mon. 
--gen-key --cap mon 'allow *'


  ceph-mon -c /etc/ceph/csceph.conf --mkfs -i a --keyring 
csceph.mon.keyring


ceph-mon tries to create monfs in

  /var/lib/ceph/mon/ceph-a

not

  /var/lib/ceph/mon/csceph-a

as expected.


Thank you for any help you can give.

Cheers,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How many rbds can you map?

2013-10-08 Thread Gaylord Holder

Always nice to see I've hit a real problem, and not just my being dumb.

-Gaylord

On 10/08/2013 01:46 PM, Gregory Farnum wrote:

I believe this is a result of how we used the kernel interfaces
(allocating a major device ID for each RBD volume), and some kernel
limits (only 8 bits for storing major device IDs, and some used for
other purposes). See http://tracker.ceph.com/issues/5048

I believe we have discussed not using a major device ID for each
mounted RBD volume, but I don't remember the details as they involved
kernel-fu beyond what I'm familiar with.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
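
A quick way to see this on a client, sketched (the listing is illustrative, and 
the exact number of free majors depends on what else is registered):

  # each mapped image gets its own block-device major on these kernels;
  # the major is the first of the two numbers in the device listing:
  ls -l /dev/rbd0
  # brw-rw---- 1 root disk 254, 0 Oct  8 10:00 /dev/rbd0   <- major 254
  # with 8-bit majors there are only ~256 to hand out, and a few dozen are
  # reserved for other drivers -- roughly the ~230 ceiling seen above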


On Tue, Oct 8, 2013 at 10:19 AM, Gaylord Holder  wrote:

I'm testing how many rbds I can map on a single server.

I've created 10,000 rbds in the rbd pool, but I can only actually map 230.

Mapping the 230th one fails with:
rbd: add failed: (16) Device or resource busy

Is there a way to bump this up?

-Gaylord

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How many rbds can you map?

2013-10-08 Thread Gaylord Holder

I'm testing how many rbds I can map on a single server.

I've created 10,000 rbds in the rbd pool, but I can only actually map 230.

Mapping the 230th one fails with:
rbd: add failed: (16) Device or resource busy

Is there a way to bump this up?

-Gaylord

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full OSD questions

2013-09-22 Thread Gaylord Holder



On 09/22/2013 02:12 AM, yy-nm wrote:

On 2013/9/10 6:38, Gaylord Holder wrote:

Indeed, that pool was created with the default pg_num of 8.

8 pg_num * 2T/OSD / 2 repl ~ 8TB, which is about how far I got.

I bumped up the pg_num to 600 for that pool and nothing happened.
I bumped up the pgp_num to 600 for that pool and ceph started shifting
things around.

Can you explain the difference between pg_num and pgp_num to me?
I can't understand the distinction.

Thank you for your help!

-Gaylord

On 09/09/2013 04:58 PM, Samuel Just wrote:

This is usually caused by having too few pgs.  Each pool with a
significant amount of data needs at least around 100pgs/osd.
-Sam

On Mon, Sep 9, 2013 at 10:32 AM, Gaylord Holder
 wrote:

I'm starting to load up my ceph cluster.

I currently have 12 2TB drives (10 up and in, 2 defined but down and
out).

rados df

says I have 8TB free, but I have 2 nearly full OSDs.

I don't understand how/why these two disks are filled while the
others are
relatively empty.

How do I tell ceph to spread the data around more, and why isn't it
already
doing it?

Thank you for helping me understand this system better.

Cheers,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


well, pg_num is the total number of PGs, and pgp_num is the number of PGs
that are used for placement now

you can reference the description of pgp_num at
http://ceph.com/docs/master/rados/operations/pools/#create-a-pool


The reference simply says pgp_num is:

> The total number of placement groups for placement purposes.

Why is the number of placement groups different from the number of 
placement groups for placement purposes?


When would you want them to be different?

Thank you for helping me understand this.

Cheers,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Understanding ceph status

2013-09-09 Thread Gaylord Holder

There are a lot of numbers ceph status prints.

Is there any documentation on what they are?

I'm particularly curious about what seems to be a total data figure.

ceph status says I have 314TB, when I calculate I have 24TB.

It also says:

10615 GB used, 8005 GB / 18621 GB avail;

which I take to be 10TB used, 8TB available for use, and 18TB total.

This doesn't make sense to me as I have 24TB raw and with default 2x 
replication, I should only have 12TB available??


I see MB/s, K/s, o/s, but what are E/s units?

-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full OSD questions

2013-09-09 Thread Gaylord Holder

Indeed, that pool was created with the default pg_num of 8.

8 pg_num * 2T/OSD / 2 repl ~ 8TB, which is about how far I got.

I bumped up the pg_num to 600 for that pool and nothing happened.
I bumped up the pgp_num to 600 for that pool and ceph started shifting 
things around.
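
The commands involved are presumably of this form (pool name illustrative):

  ceph osd pool set rbd pg_num 600    # creates the new, empty PGs
  ceph osd pool set rbd pgp_num 600   # lets CRUSH start placing data into them,
                                      # which is why only this step moved anything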


Can you explain the difference between pg_num and pgp_num to me?
I can't understand the distinction.

Thank you for your help!

-Gaylord

On 09/09/2013 04:58 PM, Samuel Just wrote:

This is usually caused by having too few pgs.  Each pool with a
significant amount of data needs at least around 100pgs/osd.
-Sam

On Mon, Sep 9, 2013 at 10:32 AM, Gaylord Holder  wrote:

I'm starting to load up my ceph cluster.

I currently have 12 2TB drives (10 up and in, 2 defined but down and out).

rados df

says I have 8TB free, but I have 2 nearly full OSDs.

I don't understand how/why these two disks are filled while the others are
relatively empty.

How do I tell ceph to spread the data around more, and why isn't it already
doing it?

Thank you for helping me understand this system better.

Cheers,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Full OSD questions

2013-09-09 Thread Gaylord Holder

I'm starting to load up my ceph cluster.

I currently have 12 2TB drives (10 up and in, 2 defined but down and out).

rados df

says I have 8TB free, but I have 2 nearly full OSDs.

I don't understand how/why these two disks are filled while the others 
are relatively empty.


How do I tell ceph to spread the data around more, and why isn't it 
already doing it?


Thank you for helping me understand this system better.

Cheers,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD map question

2013-09-04 Thread Gaylord Holder

Is it possible to know if an RBD is mapped by a machine?
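
One way to check, sketched: the kernel client holds a watch on the image's 
header object, so listing the watchers shows who has it mapped. This assumes 
a format 1 image named "myimage" in pool "rbd"; names are illustrative:

  rados -p rbd listwatchers myimage.rbd
  # each watcher line includes the client address of a machine
  # that currently has the image mapped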
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to force lost PGs

2013-09-03 Thread Gaylord Holder

Awesome Sage!

I knew I had lost data.  I'm trying to find out what will happen when 
the worst happens (like the ceph administrator is an idiot).


So those PGs are hanging around in an OSD/pool somewhere with some kind 
of reference count and they just need to be recreated?


Thanks again for unsticking me.

-Gaylord
On 09/03/2013 10:44 AM, Sage Weil wrote:

On Sun, 1 Sep 2013, Gaylord Holder wrote:


I created a pool with no replication and an RBD within that pool.  I mapped
the RBD to a machine, formatted it with a file system and dumped data on it.

Just to see what kind of trouble I can get into, I stopped the OSD the RBD was
using, marked the OSD as out, and reformatted the OSD tree.

When I brought the OSD back up, I now have three stale PGs.

Now I'm trying to clear the stale PGs.  I've tried removing the OSD from the
crush maps, the OSD lists etc, without any luck.


Note that this means that you destroyed all copies of those 3 PGs, which
means this experiment lost data.

You can make ceph recreate the PGs (empty!) with

  ceph pg force_create_pg 
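
(A sketch of applying that here; the pgid is illustrative, and dump_stuck is 
one way to list the affected PGs:)

  ceph pg dump_stuck stale          # list the three stale PGs
  ceph pg force_create_pg 3.1       # recreate each one, empty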

sage



Running
   ceph pg 3.1 query
   ceph pg 3.1 mark_unfound_lost revert
ceph explains it doesn't have a PG 3.1

Running
  ceph osd repair osd.1
hangs after pg 2.3e

Running
   ceph osd lost 1 --yes-i-really-mean-it
nukes the osd.  Rebuilding osd.1 goes fine, but I still have 3 stale PGs.

Any help clearing these stale PGs would be appreciated.

Thanks,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to force lost PGs

2013-09-02 Thread Gaylord Holder


I created a pool with no replication and an RBD within that pool.  I 
mapped the RBD to a machine, formatted it with a file system and dumped 
data on it.


Just to see what kind of trouble I can get into, I stopped the OSD the 
RBD was using, marked the OSD as out, and reformatted the OSD tree.


When I brought the OSD back up, I now have three stale PGs.

Now I'm trying to clear the stale PGs.  I've tried removing the OSD from 
the crush maps, the OSD lists etc, without any luck.


Running
  ceph pg 3.1 query
  ceph pg 3.1 mark_unfound_lost revert
ceph explains it doesn't have a PG 3.1

Running
 ceph osd repair osd.1
hangs after pg 2.3e

Running
  ceph osd lost 1 --yes-i-really-mean-it
nukes the osd.  Rebuilding osd.1 goes fine, but I still have 3 stale PGs.

Any help clearing these stale PGs would be appreciated.

Thanks,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD Mapping

2013-07-23 Thread Gaylord Holder

Is it possible to find out which machines are mapping an RBD?

-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph pgs stuck or degraded.

2013-07-22 Thread Gaylord Holder
If I understand what the #tunables page is saying, changing the tunables 
kicks the OSD re-balancing mechanism a bit and resets it to try again.


I'll see about getting the 3.9 kernel in for my RBD machines, and then reset 
everything to optimal.


Thanks again.

-Gaylord

On 07/22/2013 04:51 PM, Sage Weil wrote:

On Mon, 22 Jul 2013, Gaylord Holder wrote:

Sage,

The crush tunables did the trick.

why?  Could you explain what was causing the problem?


This has a good explanation, I think:

http://ceph.com/docs/master/rados/operations/crush-map/#tunables


I haven't installed 3.9 on my RBD servers yet.  Will setting crush tunables
back to default or legacy cause me similar problems in the future?


Yeah.  For 3.6+ kernels, you can set slightly different tunables and it
will be very close to optimal...

sage




Thank you again Sage!

-Gaylord

On 07/22/2013 02:27 PM, Sage Weil wrote:

On Mon, 22 Jul 2013, Gaylord Holder wrote:


I have a 12 OSD/3 host setup, and have been stuck with a bunch of stuck
pgs.

I've verified the OSDs are all up and in.  The crushmap looks fine.
I've tried restarting all the daemons.



root@never:/var/lib/ceph/mon# ceph status
 health HEALTH_WARN 139 pgs degraded; 461 pgs stuck unclean; recovery
216/6213 degraded (3.477%)
 monmap e4: 2 mons at {a=192.168.225.9:6789/0,b=192.168.225.10:6789/0},
election epoch 14, quorum 0,1 a,b


Add another monitor; right now if 1 fails the cluster is unavailable.


 osdmap e238: 12 osds: 12 up, 12 in
  pgmap v7396: 2528 pgs: 2067 active+clean, 322 active+remapped, 139
active+degraded; 8218 MB data, 103 GB used, 22241 GB / 22345 GB avail;
216/6213 degraded (3.477%)
 mdsmap e1: 0/0/1 up


My guess crush tunables.  Try

   ceph osd crush tunables optimal

unless you are using a pre-3.8(ish) kernel or other very old (pre-bobtail)
clients.

sage





I have one non-default pool with 3x replication.  Fewer than half of the
pgs
have expanded to 3x (278/400 pgs still have acting 2x sets).

Where can I go look for the trouble?

Thank you for any light someone can shed on this.

Cheers,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph pgs stuck or degraded.

2013-07-22 Thread Gaylord Holder

Sage,

The crush tunables did the trick.

why?  Could you explain what was causing the problem?

I haven't installed 3.9 on my RBD servers yet.  Will setting crush 
tunables back to default or legacy cause me similar problems in the future?


Thank you again Sage!

-Gaylord

On 07/22/2013 02:27 PM, Sage Weil wrote:

On Mon, 22 Jul 2013, Gaylord Holder wrote:


I have a 12 OSD/3 host setup, and have been stuck with a bunch of stuck pgs.

I've verified the OSDs are all up and in.  The crushmap looks fine.
I've tried restarting all the daemons.



root@never:/var/lib/ceph/mon# ceph status
health HEALTH_WARN 139 pgs degraded; 461 pgs stuck unclean; recovery
216/6213 degraded (3.477%)
monmap e4: 2 mons at {a=192.168.225.9:6789/0,b=192.168.225.10:6789/0},
election epoch 14, quorum 0,1 a,b


Add another monitor; right now if 1 fails the cluster is unavailable.
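
(A rough sketch of the manual procedure for adding a third monitor in that 
era; the mon id "c", the address and the temp paths are illustrative:)

  ceph auth get mon. -o /tmp/mon.keyring
  ceph mon getmap -o /tmp/monmap
  ceph-mon -i c --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
  ceph mon add c 192.168.225.11:6789
  service ceph start mon.c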


osdmap e238: 12 osds: 12 up, 12 in
 pgmap v7396: 2528 pgs: 2067 active+clean, 322 active+remapped, 139
active+degraded; 8218 MB data, 103 GB used, 22241 GB / 22345 GB avail;
216/6213 degraded (3.477%)
mdsmap e1: 0/0/1 up


My guess crush tunables.  Try

  ceph osd crush tunables optimal

unless you are using a pre-3.8(ish) kernel or other very old (pre-bobtail)
clients.

sage





I have one non-default pool with 3x replication.  Fewer than half of the pgs
have expanded to 3x (278/400 pgs still have acting 2x sets).

Where can I go look for the trouble?

Thank you for any light someone can shed on this.

Cheers,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph pgs stuck or degraded.

2013-07-22 Thread Gaylord Holder


I have a 12 OSD/3 host setup, and have been stuck with a bunch of stuck 
pgs.


I've verified the OSDs are all up and in.  The crushmap looks fine.
I've tried restarting all the daemons.



root@never:/var/lib/ceph/mon# ceph status
   health HEALTH_WARN 139 pgs degraded; 461 pgs stuck unclean; recovery 
216/6213 degraded (3.477%)
   monmap e4: 2 mons at 
{a=192.168.225.9:6789/0,b=192.168.225.10:6789/0}, election epoch 14, 
quorum 0,1 a,b

   osdmap e238: 12 osds: 12 up, 12 in
pgmap v7396: 2528 pgs: 2067 active+clean, 322 active+remapped, 139 
active+degraded; 8218 MB data, 103 GB used, 22241 GB / 22345 GB avail; 
216/6213 degraded (3.477%)

   mdsmap e1: 0/0/1 up


I have one non-default pool with 3x replication.  Fewer than half of the 
pgs have expanded to 3x (278/400 pgs still have acting 2x sets).


Where can I go look for the trouble?

Thank you for any light someone can shed on this.

Cheers,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] feature set mismatch

2013-07-19 Thread Gaylord Holder



On 07/17/2013 05:49 PM, Josh Durgin wrote:

[please keep replies on the list]

On 07/17/2013 04:04 AM, Gaylord Holder wrote:



On 07/16/2013 09:22 PM, Josh Durgin wrote:

On 07/16/2013 06:06 PM, Gaylord Holder wrote:

Now whenever I try to map an RBD to a machine, mon0 complains:

feature set mismatch, my 2 < server's 2040002, missing 204
missing required protocol features.


Your cluster is using newer crush tunables to get better data
distribution, but your kernel client doesn't support that.

You'll need to upgrade to linux 3.9, or set the tunables
to 'legacy', which your kernel understands [1].

Josh

[1] http://ceph.com/docs/master/rados/operations/crush-map/#tuning-crush
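
For what it's worth, the bits in the original error can be checked with a 
little shell arithmetic; the feature-bit names below are my reading of the 
client code of that era:

  printf '%x\n' $(( 0x2040002 & ~0x2 ))   # -> 2040000, the bits the kernel lacks
  # 0x2040000 = 0x2000000 + 0x40000, i.e. CRUSH_TUNABLES2 + CRUSH_TUNABLES,
  # which matches the tunables diagnosis above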



Josh,

That was certainly the trick.

  ceph osd crush tunables legacy

now allows me to map the rbd.


To be clear, did you change the tunables before? If the upgrade enabled
them somehow without your intervention, it would be a bug.


No bugs on this issue.

I had changed the tunables but had not connected the tunables to the 
protocol mismatch error messages.


Thanks again for your help.
-gaylord




Who needs to be running 3.9?  Just the machines mounting the rbd, or
everyone?


Just the machines mounting it.



Is there a better place in the documentation to track the recommended
kernel version than

   http://ceph.com/docs/next/install/os-recommendations/


That and the release notes are the best places to look.
Nothing incompatible with old kernels should be enabled by default,
but some new features (like the crush tunables) may require newer
kernel clients.

Josh


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] feature set mismatch

2013-07-16 Thread Gaylord Holder
I had RBDs and RBD mapping working.  Then I grew the cluster and 
increased the number of OSDs.


Now whenever I try to map an RBD to a machine, mon0 complains:

feature set mismatch, my 2 < server's 2040002, missing 204
missing required protocol features.

I don't see any other problems with the cluster, only

rbd map  -p pool image

hanging.

Any help or pointers would be appreciated.

Thank you,
-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com