Adding back ceph-users; try not to turn public threads into private ones when the problem hasn't been resolved.

On 08/13/2013 04:42 AM, Joshua Young wrote:
So I put the journals on their own partitions and they worked just
fine. All night they were up doing normal operations. When running
initctl list | grep ceph I would get ...

ceph-mds-all-starter stop/waiting
ceph-mds-all start/running
ceph-osd-all start/running
ceph-osd-all-starter stop/waiting
ceph-all start/running
ceph-mon-all start/running
ceph-mon-all-starter stop/waiting
ceph-mon (ceph/cloud3) start/running, process 1864
ceph-create-keys stop/waiting
ceph-osd (ceph/8) start/running, process 2136
ceph-osd (ceph/20) start/running, process 5281
ceph-osd (ceph/15) start/running, process 5292
ceph-osd (ceph/14) start/running, process 2135
ceph-mds stop/waiting



This is correct. There are 4 OSDs on this server. Now I have come in
today, and running ceph -s still says all of my OSDs are up. When I run
the same command as above I only see OSD 14. When I go into the logs of
one of the others (OSD 15) I see this...

Does ps agree that only one OSD is left running?
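
One way to answer that question is to compare the OSD ids upstart tracks with the ids of ceph-osd processes that are actually alive. The following is only a sketch: the function names are made up for illustration, and it assumes each daemon was started with "-i <id>" on its command line (check your own ps output before relying on that).

```shell
# OSD ids upstart believes are running, parsed from `initctl list` output on stdin.
# Matches lines such as: ceph-osd (ceph/15) start/running, process 5292
upstart_osd_ids() {
    sed -n 's|^ceph-osd (ceph/\([0-9][0-9]*\)) start/running.*|\1|p' | sort
}

# OSD ids of live daemons, parsed from `ps -eo args` output on stdin.
# Assumption: the command line looks like ".../ceph-osd ... -i <id> ...".
ps_osd_ids() {
    sed -n 's|^[^ ]*ceph-osd .*-i  *\([0-9][0-9]*\).*|\1|p' | sort
}

# Print ids that are alive according to ps but unknown to upstart.
# $1 = file holding `initctl list` output, $2 = file holding `ps -eo args` output.
untracked_osds() {
    _up=$(mktemp) && _live=$(mktemp) || return 1
    upstart_osd_ids < "$1" > "$_up"
    ps_osd_ids < "$2" > "$_live"
    comm -13 "$_up" "$_live"    # lines present only in the ps-derived list
    rm -f "$_up" "$_live"
}

# Example use on the OSD host (the /tmp paths are arbitrary):
#   sudo initctl list > /tmp/initctl.txt
#   ps -eo args > /tmp/ps.txt
#   untracked_osds /tmp/initctl.txt /tmp/ps.txt
```

Any id printed here would belong to a daemon that is running but was not started (or is no longer tracked) by upstart, which would also explain why "stop ceph-osd id=N" reports an unknown instance.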

2013-08-13 06:37:48.414775 7ffa2099a7c0  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 16597
2013-08-13 06:37:48.421208 7ffa2099a7c0  0 filestore(/var/lib/ceph/osd/ceph-15) 
lock_fsid failed to lock /var/lib/ceph/osd/ceph-15/fsid, is another ceph-osd 
still running? (11) Resource temporarily unavailable
2013-08-13 06:37:48.421246 7ffa2099a7c0 -1 filestore(/var/lib/ceph/osd/ceph-15) 
FileStore::mount: lock_fsid failed
2013-08-13 06:37:48.421274 7ffa2099a7c0 -1  ** ERROR: error converting 
store /var/lib/ceph/osd/ceph-15: (16) Device or resource busy
2013-08-13 06:37:48.445927 7f0fbb6687c0  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 16659
2013-08-13 06:37:48.447470 7f0fbb6687c0  0 filestore(/var/lib/ceph/osd/ceph-15) 
lock_fsid failed to lock /var/lib/ceph/osd/ceph-15/fsid, is another ceph-osd 
still running? (11) Resource temporarily unavailable
2013-08-13 06:37:48.447480 7f0fbb6687c0 -1 filestore(/var/lib/ceph/osd/ceph-15) 
FileStore::mount: lock_fsid failed
2013-08-13 06:37:48.447500 7f0fbb6687c0 -1  ** ERROR: error converting 
store /var/lib/ceph/osd/ceph-15: (16) Device or resource busy
2013-08-13 06:37:48.474852 7f28f332c7c0  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 16752
2013-08-13 06:37:48.476695 7f28f332c7c0  0 filestore(/var/lib/ceph/osd/ceph-15) 
lock_fsid failed to lock /var/lib/ceph/osd/ceph-15/fsid, is another ceph-osd 
still running? (11) Resource temporarily unavailable
2013-08-13 06:37:48.476707 7f28f332c7c0 -1 filestore(/var/lib/ceph/osd/ceph-15) 
FileStore::mount: lock_fsid failed
2013-08-13 06:37:48.476728 7f28f332c7c0 -1  ** ERROR: error converting 
store /var/lib/ceph/osd/ceph-15: (16) Device or resource busy
2013-08-13 06:37:48.501723 7f84618467c0  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 16845
2013-08-13 06:37:48.503919 7f84618467c0  0 filestore(/var/lib/ceph/osd/ceph-15) 
lock_fsid failed to lock /var/lib/ceph/osd/ceph-15/fsid, is another ceph-osd 
still running? (11) Resource temporarily unavailable
2013-08-13 06:37:48.503932 7f84618467c0 -1 filestore(/var/lib/ceph/osd/ceph-15) 
FileStore::mount: lock_fsid failed
2013-08-13 06:37:48.503955 7f84618467c0 -1  ** ERROR: error converting 
store /var/lib/ceph/osd/ceph-15: (16) Device or resource busy
2013-08-13 06:37:48.529665 7f29c2a367c0  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 16944
2013-08-13 06:37:48.531227 7f29c2a367c0  0 filestore(/var/lib/ceph/osd/ceph-15) 
lock_fsid failed to lock /var/lib/ceph/osd/ceph-15/fsid, is another ceph-osd 
still running? (11) Resource temporarily unavailable
2013-08-13 06:37:48.531239 7f29c2a367c0 -1 filestore(/var/lib/ceph/osd/ceph-15) 
FileStore::mount: lock_fsid failed
2013-08-13 06:37:48.531260 7f29c2a367c0 -1  ** ERROR: error converting 
store /var/lib/ceph/osd/ceph-15: (16) Device or resource busy


So the OSD can't get a lock on its data. You aren't attempting to share devices/partitions for OSD storage as well, are you?

What is your cluster configuration?
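
The log above shows several start attempts with different pids within a fraction of a second, each failing on lock_fsid. A small helper can summarise that pattern from an OSD log. This is a rough sketch: the function name is hypothetical, and the log path in the usage comment is an assumed default location rather than something confirmed in this thread.

```shell
# Summarise failed ceph-osd start attempts in a log read from stdin.
# Prints how many FileStore mounts failed on lock_fsid and which pids tried to start.
lock_failures() {
    awk '
        /FileStore::mount: lock_fsid failed/ { fails++ }
        match($0, /process ceph-osd, pid [0-9]+/) {
            # skip the 22-character prefix "process ceph-osd, pid " to keep the number
            pids = pids " " substr($0, RSTART + 22, RLENGTH - 22)
        }
        END { printf "failed mounts: %d; pids tried:%s\n", fails, pids }
    '
}

# Example use (log path is an assumption; adjust to your setup):
#   lock_failures < /var/log/ceph/ceph-osd.15.log
# To see which process currently has the fsid file open:
#   sudo fuser -v /var/lib/ceph/osd/ceph-15/fsid
```

If fuser reports a ceph-osd process holding the file, the original daemon for that id is probably still alive and the errors come from repeated attempts to start a second copy.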


Any idea? Thanks



-----Original Message-----
From: Dan Mick [mailto:dan.m...@inktank.com]
Sent: Monday, August 12, 2013 5:50 PM
To: Joshua Young
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Start Stop OSD



On 08/12/2013 04:49 AM, Joshua Young wrote:
I have 2 issues that I cannot find a solution to.

First: I am unable to stop / start any OSD by command. I have deployed
with ceph-deploy on Ubuntu 13.04 and everything seems to be working
fine. I have 5 hosts, 5 mons, and 20 OSDs.

Using initctl list | grep ceph gives me

ceph-osd (ceph/15) start/running, process 2122

The fact that only one is output means that upstart believes there's only one 
OSD job running.  Are you sure the other daemons are actually alive and started 
by upstart?

However, OSDs 12, 13, 14, and 15 are all on this server.

sudo stop ceph-osd id=12

gives me stop: Unknown instance: ceph/12

Does anyone know what is wrong? Nothing in logs.

Also, when trying to put the journal on an SSD everything works fine.
I can add all 4 disks per host to the same SSD. The issue is when I
restart the server, only 1 out of the 3 OSDs will come back up. Has
anyone else had this issue?

Are you using partitions on the SSD?  If not, that's obviously going to be a 
problem; the device is usable by only one journal at a time.
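
A quick way to check for that condition is to see whether the journals of several OSDs resolve to the same device. The sketch below assumes a layout where each OSD data directory under /var/lib/ceph/osd contains a "journal" entry that is either a regular file or a symlink to the journal device; the function name is made up for illustration, and paths containing spaces are not handled.

```shell
# Print every journal target that is used by more than one OSD.
# $1 = directory holding the OSD data dirs (defaults to /var/lib/ceph/osd).
shared_journals() {
    base=${1:-/var/lib/ceph/osd}
    for j in "$base"/*/journal; do
        # skip the unexpanded glob when no journal entries exist
        [ -e "$j" ] || [ -L "$j" ] || continue
        # resolve symlinks so two links to one device compare equal
        printf '%s %s\n' "$(readlink -f "$j")" "$j"
    done | sort | awk '
        { n[$1]++; who[$1] = who[$1] " " $2 }
        END { for (t in n) if (n[t] > 1) print t ":" who[t] }
    ' | sort
}

# Example use on the OSD host:
#   shared_journals
```

No output means every OSD resolves to its own journal target. A printed line names a device (or partition) that more than one OSD is trying to use as its journal.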


--
Dan Mick, Filesystem Engineering
Inktank Storage, Inc.   http://inktank.com
Ceph docs: http://ceph.com/docs
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
