Re: [ceph-users] FW: OSD deployed with ceph directories but not using Cinder volumes
I was under the impression that ceph-disk activate would take care of setting OSD weights. In fact, the short-form documentation for adding OSDs only talks about running ceph-disk prepare and activate:
http://ceph.com/docs/master/install/manual-deployment/#adding-osds

This is also how the Ceph cookbook provisions OSDs (https://github.com/ceph/ceph-cookbook), and we've been using it successfully in other scenarios without having to set weights manually.

Sergio

On Wed, May 27, 2015 at 2:09 AM, Christian Balzer ch...@gol.com wrote:

Hello,

your problem is of course that the weight is 0 for all your OSDs, so no data can be placed anywhere at all.

You will want to re-read the manual deployment documentation or dissect ceph-deploy/ceph-disk further. Your script misses the "crush add" bit of that process:

    ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

for example:

    ceph osd crush add osd.0 1 host=host01

Christian

On Tue, 26 May 2015 17:29:52 +0000 Johanni Thunstrom wrote:

Dear Ceph Team,

Our cluster consists of three Ceph nodes, each running 1 MON and 1 OSD. All nodes are CentOS 6.5 (kernel 2.6.32) VMs in a testing cluster, not production. The script we're using is a simplified sequence of steps that does more or less what the ceph-cookbook does. Using OpenStack Cinder, we attached a 10G block volume to each node to set up its OSD.

After running our cluster initialization script (pasted below), the cluster reports HEALTH_WARN and all PGs are incomplete. Additionally, all PGs on every node have the same acting and up set: [0]. Is this an indicator that the PGs have not even started the creating state, given that not every OSD has id 0 yet they all report [0] as their up and acting set? The weight of all OSDs is also 0. Otherwise, the OSDs appear to be up and in, and the network appears to be fine; we are able to ping and telnet to each server from the others.

To isolate the problem, we tried replacing the attached Cinder volume with a 10G XFS-formatted file mounted at /ceph-data. We set OSD_PATH=/ceph-data and JOURNAL_PATH=/ceph-data/journal and kept the rest of our setup_ceph.sh script the same. With that, the cluster reached HEALTH_OK and all PGs were active+clean.

What seems to be missing is the communication between the OSDs to replicate/create the PGs correctly. Any advice on what is blocking the PGs from reaching an active+clean state? We are stumped as to why the cluster using attached Cinder volumes fails to reach HEALTH_OK. If I left out any important information about how the cluster was created, let me know. Thank you!

Sincerely,
Johanni B. Thunstrom

Health output:

ceph -s
    cluster cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e3: 3 mons at {cephscriptdeplcindervol01=10.98.66.235:6789/0,cephscriptdeplcindervol02=10.98.66.229:6789/0,cephscriptdeplcindervol03=10.98.66.226:6789/0}, election epoch 6, quorum 0,1,2 cephscriptdeplcindervol03,cephscriptdeplcindervol02,cephscriptdeplcindervol01
     osdmap e11: 3 osds: 3 up, 3 in
      pgmap v23: 192 pgs, 3 pools, 0 bytes data, 0 objects
            101608 kB used, 15227 MB / 15326 MB avail
                 192 incomplete

ceph health detail
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
pg 1.2c is stuck inactive since forever, current state incomplete, last acting [0]
pg 0.2d is stuck inactive since forever, current state incomplete, last acting [0]
...
pg 0.2e is stuck unclean since forever, current state incomplete, last acting [0]
pg 1.2f is stuck unclean since forever, current state incomplete, last acting [0]
pg 2.2c is stuck unclean since forever, current state incomplete, last acting [0]
pg 2.2f is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
...
pg 1.30 is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.31 is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.32 is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.31 is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.30 is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.2d is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.2e is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.2f is incomplete, acting [0] (reducing pool
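Whether activation places an OSD in the CRUSH map automatically depends in part on the "osd crush update on start" option (enabled by default), so that is worth checking alongside the weights. A minimal sketch of the check and the manual fallback, assuming the admin socket is available on the OSD host and using a hostname from this thread; the 0.01 weight is just an arbitrary value sized for a 10G test volume:

    ceph osd tree
    # lists every OSD with its CRUSH weight and position; all-zero weights match the symptoms above

    ceph daemon osd.0 config show | grep osd_crush_update_on_start
    # default is true; if it is false, starting the OSD will not set a CRUSH position or weight

    ceph osd crush create-or-move osd.0 0.01 root=default host=cephscriptdeplcindervol01
    # idempotent way to give the OSD a position and a non-zero weight by hand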
Re: [ceph-users] FW: OSD deployed with ceph directories but not using Cinder volumes
Hello,

your problem is of course that the weight is 0 for all your OSDs, so no data can be placed anywhere at all.

You will want to re-read the manual deployment documentation or dissect ceph-deploy/ceph-disk further. Your script misses the "crush add" bit of that process:

    ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

for example:

    ceph osd crush add osd.0 1 host=host01

Christian

On Tue, 26 May 2015 17:29:52 +0000 Johanni Thunstrom wrote:

Dear Ceph Team,

Our cluster consists of three Ceph nodes, each running 1 MON and 1 OSD. All nodes are CentOS 6.5 (kernel 2.6.32) VMs in a testing cluster, not production. The script we're using is a simplified sequence of steps that does more or less what the ceph-cookbook does. Using OpenStack Cinder, we attached a 10G block volume to each node to set up its OSD.

After running our cluster initialization script (pasted below), the cluster reports HEALTH_WARN and all PGs are incomplete. Additionally, all PGs on every node have the same acting and up set: [0]. Is this an indicator that the PGs have not even started the creating state, given that not every OSD has id 0 yet they all report [0] as their up and acting set? The weight of all OSDs is also 0. Otherwise, the OSDs appear to be up and in, and the network appears to be fine; we are able to ping and telnet to each server from the others.

To isolate the problem, we tried replacing the attached Cinder volume with a 10G XFS-formatted file mounted at /ceph-data. We set OSD_PATH=/ceph-data and JOURNAL_PATH=/ceph-data/journal and kept the rest of our setup_ceph.sh script the same. With that, the cluster reached HEALTH_OK and all PGs were active+clean.

What seems to be missing is the communication between the OSDs to replicate/create the PGs correctly. Any advice on what is blocking the PGs from reaching an active+clean state? We are stumped as to why the cluster using attached Cinder volumes fails to reach HEALTH_OK. If I left out any important information about how the cluster was created, let me know. Thank you!

Sincerely,
Johanni B. Thunstrom

Health output:

ceph -s
    cluster cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e3: 3 mons at {cephscriptdeplcindervol01=10.98.66.235:6789/0,cephscriptdeplcindervol02=10.98.66.229:6789/0,cephscriptdeplcindervol03=10.98.66.226:6789/0}, election epoch 6, quorum 0,1,2 cephscriptdeplcindervol03,cephscriptdeplcindervol02,cephscriptdeplcindervol01
     osdmap e11: 3 osds: 3 up, 3 in
      pgmap v23: 192 pgs, 3 pools, 0 bytes data, 0 objects
            101608 kB used, 15227 MB / 15326 MB avail
                 192 incomplete

ceph health detail
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
pg 1.2c is stuck inactive since forever, current state incomplete, last acting [0]
pg 0.2d is stuck inactive since forever, current state incomplete, last acting [0]
...
pg 0.2e is stuck unclean since forever, current state incomplete, last acting [0]
pg 1.2f is stuck unclean since forever, current state incomplete, last acting [0]
pg 2.2c is stuck unclean since forever, current state incomplete, last acting [0]
pg 2.2f is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
...
pg 1.30 is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.31 is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.32 is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.31 is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.30 is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.2d is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.2e is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.2f is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.2c is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.2f is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.2e is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')

ceph mon dump
dumped monmap epoch 3
epoch 3
fsid cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
last_changed 2015-05-18 23:10:39.218552
created 0.00
0: 10.98.66.226:6789/0
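Applied to the three hosts in this thread, the missing step would look something like the following. The host bucket names are assumptions taken from the mon names, root=default assumes the stock CRUSH map, and a weight of 1 is fine for a small test cluster (by convention the weight is the drive size in TiB):

    ceph osd crush add osd.0 1 root=default host=cephscriptdeplcindervol01
    ceph osd crush add osd.1 1 root=default host=cephscriptdeplcindervol02
    ceph osd crush add osd.2 1 root=default host=cephscriptdeplcindervol03

    ceph osd tree
    # the weights should now be non-zero and the PGs should start peering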
[ceph-users] FW: OSD deployed with ceph directories but not using Cinder volumes
Dear Ceph Team,

Our cluster consists of three Ceph nodes, each running 1 MON and 1 OSD. All nodes are CentOS 6.5 (kernel 2.6.32) VMs in a testing cluster, not production. The script we're using is a simplified sequence of steps that does more or less what the ceph-cookbook does. Using OpenStack Cinder, we attached a 10G block volume to each node to set up its OSD.

After running our cluster initialization script (pasted below), the cluster reports HEALTH_WARN and all PGs are incomplete. Additionally, all PGs on every node have the same acting and up set: [0]. Is this an indicator that the PGs have not even started the creating state, given that not every OSD has id 0 yet they all report [0] as their up and acting set? The weight of all OSDs is also 0. Otherwise, the OSDs appear to be up and in, and the network appears to be fine; we are able to ping and telnet to each server from the others.

To isolate the problem, we tried replacing the attached Cinder volume with a 10G XFS-formatted file mounted at /ceph-data. We set OSD_PATH=/ceph-data and JOURNAL_PATH=/ceph-data/journal and kept the rest of our setup_ceph.sh script the same. With that, the cluster reached HEALTH_OK and all PGs were active+clean.

What seems to be missing is the communication between the OSDs to replicate/create the PGs correctly. Any advice on what is blocking the PGs from reaching an active+clean state? We are stumped as to why the cluster using attached Cinder volumes fails to reach HEALTH_OK. If I left out any important information about how the cluster was created, let me know. Thank you!

Sincerely,
Johanni B. Thunstrom

Health output:

ceph -s
    cluster cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e3: 3 mons at {cephscriptdeplcindervol01=10.98.66.235:6789/0,cephscriptdeplcindervol02=10.98.66.229:6789/0,cephscriptdeplcindervol03=10.98.66.226:6789/0}, election epoch 6, quorum 0,1,2 cephscriptdeplcindervol03,cephscriptdeplcindervol02,cephscriptdeplcindervol01
     osdmap e11: 3 osds: 3 up, 3 in
      pgmap v23: 192 pgs, 3 pools, 0 bytes data, 0 objects
            101608 kB used, 15227 MB / 15326 MB avail
                 192 incomplete

ceph health detail
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
pg 1.2c is stuck inactive since forever, current state incomplete, last acting [0]
pg 0.2d is stuck inactive since forever, current state incomplete, last acting [0]
...
pg 0.2e is stuck unclean since forever, current state incomplete, last acting [0]
pg 1.2f is stuck unclean since forever, current state incomplete, last acting [0]
pg 2.2c is stuck unclean since forever, current state incomplete, last acting [0]
pg 2.2f is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
...
pg 1.30 is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.31 is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.32 is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.31 is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.30 is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.2d is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.2e is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.2f is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.2c is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.2f is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.2e is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')

ceph mon dump
dumped monmap epoch 3
epoch 3
fsid cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
last_changed 2015-05-18 23:10:39.218552
created 0.00
0: 10.98.66.226:6789/0 mon.cephscriptdeplcindervol03
1: 10.98.66.229:6789/0 mon.cephscriptdeplcindervol02
2: 10.98.66.235:6789/0 mon.cephscriptdeplcindervol01

ceph osd dump
epoch 11
fsid cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
created 2015-05-18 22:35:14.823379
modified 2015-05-18 23:10:59.037467
flags
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3
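For anyone hitting the same symptoms, a few read-only commands narrow the problem down quickly. The PG id is taken from the health detail above, and test-object is just an arbitrary name for the mapping check:

    ceph osd tree
    # lists each OSD with its CRUSH weight and host bucket; in this cluster every weight is 0

    ceph pg 1.2c query
    # shows the peering state of one of the incomplete PGs and which OSDs it has probed

    ceph osd map rbd test-object
    # shows which OSDs CRUSH would choose for an object in the rbd pool; with every weight at 0,
    # CRUSH cannot pick a usable set of OSDs, which is why the PGs stay incomplete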