Re: [ceph-users] FW: OSD deployed with ceph directories but not using Cinder volumes

2015-05-27 Thread Sergio A. de Carvalho Jr.
I was under the impression that ceph-disk activate would take care of
setting OSD weights. In fact, the short-form documentation for adding OSDs
only covers running ceph-disk prepare and activate:

http://ceph.com/docs/master/install/manual-deployment/#adding-osds
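
For reference, that short form boils down to roughly the following on each
node (a sketch only; /dev/vdb here is just a placeholder for whatever device
backs the OSD, and the fsid is the one from this cluster):

  sudo ceph-disk prepare --cluster ceph \
       --cluster-uuid cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf \
       --fs-type xfs /dev/vdb
  sudo ceph-disk activate /dev/vdb1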

This is also how the Ceph cookbook provisions OSDs (
https://github.com/ceph/ceph-cookbook) and we've been using it successfully
in other scenarios, without having to manually set weights.

Sergio



On Wed, May 27, 2015 at 2:09 AM, Christian Balzer <ch...@gol.com> wrote:


 Hello,

 your problem is of course that the weight is 0 for all your OSDs.
 Thus no data can be placed anywhere at all.

 You will want to re-read the manual deployment documentation or dissect
 ceph-deploy/ceph-disk more.
 Your script misses the crush add bit of that process:
 ceph osd crush add {id-or-name} {weight}  [{bucket-type}={bucket-name} ...]

 like
 ceph osd crush add osd.0 1 host=host01

 Christian

 On Tue, 26 May 2015 17:29:52 + Johanni Thunstrom wrote:

  [Johanni's original report snipped here; it is quoted in full in the messages below.]

Re: [ceph-users] FW: OSD deployed with ceph directories but not using Cinder volumes

2015-05-26 Thread Christian Balzer

Hello,

your problem is of course that the weight is 0 for all your OSDs.
Thus no data can be placed anywhere at all.

You will want to re-read the manual deployment documentation or dissect
ceph-deploy/ceph-disk more.
Your script misses the crush add bit of that process:
ceph osd crush add {id-or-name} {weight}  [{bucket-type}={bucket-name} ...]

like
ceph osd crush add osd.0 1 host=host01
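
To flesh that out a little (a sketch only, using the osd.0/host01 naming from
the example above; CRUSH weight is conventionally the device capacity in TB,
so a 10G volume would be roughly 0.01):

  ceph osd crush add-bucket host01 host       # create the host bucket if it does not exist yet
  ceph osd crush move host01 root=default     # place it under the default root
  ceph osd crush add osd.0 0.01 host=host01   # register the OSD with a non-zero weight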

Christian

On Tue, 26 May 2015 17:29:52 + Johanni Thunstrom wrote:

 Dear Ceph Team,
 
 Our cluster has three Ceph nodes, each running 1 MON and 1 OSD. All nodes
 are CentOS 6.5 (kernel 2.6.32) VMs in a testing cluster, not production.
 The script we’re using is a simplified sequence of steps that does more or
 less what the ceph-cookbook does. Using OpenStack Cinder, we attached a 10G
 block volume to each node to set up its OSD. After running our Ceph cluster
 initialization script (pasted below), the cluster reports HEALTH_WARN and
 every PG is incomplete. In addition, all PGs on every Ceph node have the
 same acting and up set: [0]. Is this an indicator that the PGs have not
 even started the creating state, since not every OSD has id 0 yet they all
 report 0 as their up and acting OSD? The weight of every OSD is also 0.
 Otherwise, the OSDs appear to be up and in, and the network appears to be
 fine; we can ping and telnet to each server from one another.
 
 To isolate the problem, we tried replacing the attached Cinder volume with
 a 10G XFS-formatted file mounted at /ceph-data. We set OSD_PATH=/ceph-data
 and JOURNAL_PATH=/ceph-data/journal, and kept the rest of our setup_ceph.sh
 script the same. With that setup, the cluster reached HEALTH_OK and all PGs
 were active+clean.
 
 What seems to be missing is the communication between the OSDs needed to
 replicate/create the PGs correctly. Any advice on what’s blocking the PGs
 from reaching an active+clean state? We are stumped as to why the cluster
 fails to reach HEALTH_OK when it uses an attached Cinder volume.
 
 If I left out any important information or explanation on how the ceph
 cluster was created, let me know. Thank you!
 
 Sincerely,
 Johanni B. Thunstrom
 
 Health Output:
 [ceph -s, ceph health detail, and ceph mon dump output snipped here; quoted in full in the original message below.]

[ceph-users] FW: OSD deployed with ceph directories but not using Cinder volumes

2015-05-26 Thread Johanni Thunstrom
Dear Ceph Team,

Our cluster has three Ceph nodes, each running 1 MON and 1 OSD. All nodes are
CentOS 6.5 (kernel 2.6.32) VMs in a testing cluster, not production. The script
we’re using is a simplified sequence of steps that does more or less what the
ceph-cookbook does. Using OpenStack Cinder, we attached a 10G block volume to
each node to set up its OSD. After running our Ceph cluster initialization
script (pasted below), the cluster reports HEALTH_WARN and every PG is
incomplete. In addition, all PGs on every Ceph node have the same acting and up
set: [0]. Is this an indicator that the PGs have not even started the creating
state, since not every OSD has id 0 yet they all report 0 as their up and
acting OSD? The weight of every OSD is also 0. Otherwise, the OSDs appear to be
up and in, and the network appears to be fine; we can ping and telnet to each
server from one another.

To isolate the problem, we tried replacing the attached Cinder volume with a
10G XFS-formatted file mounted at /ceph-data. We set OSD_PATH=/ceph-data and
JOURNAL_PATH=/ceph-data/journal, and kept the rest of our setup_ceph.sh script
the same. With that setup, the cluster reached HEALTH_OK and all PGs were
active+clean.
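
For completeness, the file-backed variant was set up roughly along these lines
(the backing file path here is illustrative, not the exact one from our script):

  truncate -s 10G /root/ceph-backing.img          # sparse 10G backing file
  mkfs.xfs -f /root/ceph-backing.img              # format the file with XFS
  mkdir -p /ceph-data
  mount -o loop /root/ceph-backing.img /ceph-data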

What seems to be missing is the communication between the OSDs needed to
replicate/create the PGs correctly. Any advice on what’s blocking the PGs from
reaching an active+clean state? We are stumped as to why the cluster fails to
reach HEALTH_OK when it uses an attached Cinder volume.

If I left out any important information or explanation on how the ceph cluster 
was created, let me know. Thank you!

Sincerely,
Johanni B. Thunstrom

Health Output:

ceph -s
cluster cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
 health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs 
stuck unclean
 monmap e3: 3 mons at 
{cephscriptdeplcindervol01=10.98.66.235:6789/0,cephscriptdeplcindervol02=10.98.66.229:6789/0,cephscriptdeplcindervol03=10.98.66.226:6789/0},
 election epoch 6, quorum 0,1,2 
cephscriptdeplcindervol03,cephscriptdeplcindervol02,cephscriptdeplcindervol01
 osdmap e11: 3 osds: 3 up, 3 in
  pgmap v23: 192 pgs, 3 pools, 0 bytes data, 0 objects
101608 kB used, 15227 MB / 15326 MB avail
 192 incomplete

ceph health detail
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
pg 1.2c is stuck inactive since forever, current state incomplete, last acting 
[0]
pg 0.2d is stuck inactive since forever, current state incomplete, last acting 
[0]
..
…
..
pg 0.2e is stuck unclean since forever, current state incomplete, last acting 
[0]
pg 1.2f is stuck unclean since forever, current state incomplete, last acting 
[0]
pg 2.2c is stuck unclean since forever, current state incomplete, last acting 
[0]
pg 2.2f is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; 
search ceph.com/docs for 'incomplete')
..
….
..
pg 1.30 is incomplete, acting [0] (reducing pool metadata min_size from 2 may 
help; search ceph.com/docs for 'incomplete')
pg 0.31 is incomplete, acting [0] (reducing pool data min_size from 2 may help; 
search ceph.com/docs for 'incomplete')
pg 2.32 is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; 
search ceph.com/docs for 'incomplete')
pg 1.31 is incomplete, acting [0] (reducing pool metadata min_size from 2 may 
help; search ceph.com/docs for 'incomplete')
pg 0.30 is incomplete, acting [0] (reducing pool data min_size from 2 may help; 
search ceph.com/docs for 'incomplete')
pg 2.2d is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; 
search ceph.com/docs for 'incomplete')
pg 1.2e is incomplete, acting [0] (reducing pool metadata min_size from 2 may 
help; search ceph.com/docs for 'incomplete')
pg 0.2f is incomplete, acting [0] (reducing pool data min_size from 2 may help; 
search ceph.com/docs for 'incomplete')
pg 2.2c is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; 
search ceph.com/docs for 'incomplete')
pg 1.2f is incomplete, acting [0] (reducing pool metadata min_size from 2 may 
help; search ceph.com/docs for 'incomplete')
pg 0.2e is incomplete, acting [0] (reducing pool data min_size from 2 may help; 
search ceph.com/docs for 'incomplete')
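
For reference, the per-OSD CRUSH weights and the min_size value mentioned in
the hints above can be inspected with commands along these lines (pool name
taken from the output above):

  ceph osd tree                    # shows the CRUSH hierarchy and each OSD's weight
  ceph osd pool get rbd min_size   # the pool setting the 'incomplete' hint refers to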

ceph mon dump
dumped monmap epoch 3
epoch 3
fsid cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
last_changed 2015-05-18 23:10:39.218552
created 0.00
0: 10.98.66.226:6789/0 mon.cephscriptdeplcindervol03
1: 10.98.66.229:6789/0 mon.cephscriptdeplcindervol02
2: 10.98.66.235:6789/0 mon.cephscriptdeplcindervol01

ceph osd dump
epoch 11
fsid cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
created 2015-05-18 22:35:14.823379
modified 2015-05-18 23:10:59.037467
flags
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45 
stripe_width 0
pool 1 'metadata' replicated size 3