Craig, Gregory,
my disks were a bit smaller than 10 GB; I replaced them with 20 GB disks and the cluster's health went back to OK.
Thanks a lot
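In case it helps anyone else who hits this: as Craig explains below, OSDs on very small disks end up with a CRUSH weight of 0.00, so CRUSH can never map any PGs to them. Besides recreating the disks with a bigger size, a rough sketch of a workaround with the standard ceph CLI would be to raise the weights by hand (the 0.02 below is only an illustrative value, roughly a 20 GB disk expressed in TiB):

ceph osd tree                          # the weight column was 0 for all six OSDs
ceph osd crush reweight osd.0 0.02     # repeat for osd.1 through osd.5
ceph -s                                # then check whether the PGs peer and go active+clean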
2014-12-10 0:08 GMT+01:00 Craig Lewis <cle...@centraldesktop.com>:

> When I first created a test cluster, I used 1 GiB disks. That causes
> problems.
>
> Ceph has a CRUSH weight. By default, the weight is the size of the disk
> in TiB, truncated to 2 decimal places, i.e. any disk smaller than 10 GiB
> will have a weight of 0.00.
>
> I increased all of my virtual disks to 10 GiB. After rebooting the nodes
> (to see the changes), everything healed.
>
>
> On Tue, Dec 9, 2014 at 9:45 AM, Gregory Farnum <g...@gregs42.com> wrote:
>
>> It looks like your OSDs all have weight zero for some reason. I'd fix
>> that. :)
>> -Greg
>>
>> On Tue, Dec 9, 2014 at 6:24 AM Giuseppe Civitella <giuseppe.civite...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> thanks for the quick answer.
>>> I did try the force_create_pg on a pg but it is stuck on "creating":
>>> root@ceph-mon1:/home/ceph# ceph pg dump |grep creating
>>> dumped all in format plain
>>> 2.2f  0  0  0  0  0  0  0  creating  2014-12-09 13:11:37.384808  0'0  0:0  []  -1  []  -1  0'0  0.000000  0'0  0.000000
>>>
>>> root@ceph-mon1:/home/ceph# ceph pg 2.2f query
>>> { "state": "active+degraded",
>>>   "epoch": 105,
>>>   "up": [
>>>         0],
>>>   "acting": [
>>>         0],
>>>   "actingbackfill": [
>>>         "0"],
>>>   "info": { "pgid": "2.2f",
>>>       "last_update": "0'0",
>>>       "last_complete": "0'0",
>>>       "log_tail": "0'0",
>>>       "last_user_version": 0,
>>>       "last_backfill": "MAX",
>>>       "purged_snaps": "[]",
>>>       "last_scrub": "0'0",
>>>       "last_scrub_stamp": "2014-12-06 14:15:11.499769",
>>>       "last_deep_scrub": "0'0",
>>>       "last_deep_scrub_stamp": "2014-12-06 14:15:11.499769",
>>>       "last_clean_scrub_stamp": "0.000000",
>>>       "log_size": 0,
>>>       "ondisk_log_size": 0,
>>>       "stats_invalid": "0",
>>>       "stat_sum": { "num_bytes": 0,
>>>           "num_objects": 0,
>>>           "num_object_clones": 0,
>>>           "num_object_copies": 0,
>>>           "num_objects_missing_on_primary": 0,
>>>           "num_objects_degraded": 0,
>>>           "num_objects_unfound": 0,
>>>           "num_objects_dirty": 0,
>>>           "num_whiteouts": 0,
>>>           "num_read": 0,
>>>           "num_read_kb": 0,
>>>           "num_write": 0,
>>>           "num_write_kb": 0,
>>>           "num_scrub_errors": 0,
>>>           "num_shallow_scrub_errors": 0,
>>>           "num_deep_scrub_errors": 0,
>>>           "num_objects_recovered": 0,
>>>           "num_bytes_recovered": 0,
>>>           "num_keys_recovered": 0,
>>>           "num_objects_omap": 0,
>>>           "num_objects_hit_set_archive": 0},
>>>       "stat_cat_sum": {},
>>>       "up": [
>>>             0],
>>>       "acting": [
>>>             0],
>>>       "up_primary": 0,
>>>       "acting_primary": 0},
>>>   "empty": 1,
>>>   "dne": 0,
>>>   "incomplete": 0,
>>>   "last_epoch_started": 104,
>>>   "hit_set_history": { "current_last_update": "0'0",
>>>       "current_last_stamp": "0.000000",
>>>       "current_info": { "begin": "0.000000",
>>>           "end": "0.000000",
>>>           "version": "0'0"},
>>>       "history": []}},
>>>   "peer_info": [],
>>>   "recovery_state": [
>>>         { "name": "Started\/Primary\/Active",
>>>           "enter_time": "2014-12-09 12:12:52.760384",
>>>           "might_have_unfound": [],
>>>           "recovery_progress": { "backfill_targets": [],
>>>               "waiting_on_backfill": [],
>>>               "last_backfill_started": "0\/\/0\/\/-1",
>>>               "backfill_info": { "begin": "0\/\/0\/\/-1",
>>>                   "end": "0\/\/0\/\/-1",
>>>                   "objects": []},
>>>               "peer_backfill_info": [],
>>>               "backfills_in_flight": [],
>>>               "recovering": [],
>>>               "pg_backend": { "pull_from_peer": [],
>>>                   "pushing": []}},
>>>           "scrub": { "scrubber.epoch_start": "0",
>>>               "scrubber.active": 0,
>>>               "scrubber.block_writes": 0,
>>>               "scrubber.finalizing": 0,
>>>               "scrubber.waiting_on": 0,
>>>               "scrubber.waiting_on_whom": []}},
>>>         { "name": "Started",
>>>           "enter_time": "2014-12-09 12:12:51.845686"}],
>>>   "agent_state": {}}root@ceph-mon1:/home/ceph#
>>>
>>>
>>>
>>> 2014-12-09 13:01 GMT+01:00 Irek Fasikhov <malm...@gmail.com>:
>>>
>>>> Hi.
>>>>
>>>> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
>>>>
>>>> ceph pg force_create_pg <pgid>
>>>>
>>>>
>>>> 2014-12-09 14:50 GMT+03:00 Giuseppe Civitella <giuseppe.civite...@gmail.com>:
>>>>
>>>>> Hi all,
>>>>>
>>>>> last week I installed a new ceph cluster on 3 VMs running Ubuntu 14.04
>>>>> with the default kernel.
>>>>> There is a ceph monitor and two osd hosts. Here are some details:
>>>>> ceph -s
>>>>>     cluster c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
>>>>>      health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
>>>>>      monmap e1: 1 mons at {ceph-mon1=10.1.1.83:6789/0}, election epoch 1, quorum 0 ceph-mon1
>>>>>      osdmap e83: 6 osds: 6 up, 6 in
>>>>>       pgmap v231: 192 pgs, 3 pools, 0 bytes data, 0 objects
>>>>>             207 MB used, 30446 MB / 30653 MB avail
>>>>>                  192 active+degraded
>>>>>
>>>>> root@ceph-mon1:/home/ceph# ceph osd dump
>>>>> epoch 99
>>>>> fsid c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
>>>>> created 2014-12-06 13:15:06.418843
>>>>> modified 2014-12-09 11:38:04.353279
>>>>> flags
>>>>> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 18 flags hashpspool crash_replay_interval 45 stripe_width 0
>>>>> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 19 flags hashpspool stripe_width 0
>>>>> pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 20 flags hashpspool stripe_width 0
>>>>> max_osd 6
>>>>> osd.0 up in weight 1 up_from 90 up_thru 90 down_at 89 last_clean_interval [58,89) 10.1.1.84:6805/995 10.1.1.84:6806/4000995 10.1.1.84:6807/4000995 10.1.1.84:6808/4000995 exists,up e3895075-614d-48e2-b956-96e13dbd87fe
>>>>> osd.1 up in weight 1 up_from 88 up_thru 0 down_at 87 last_clean_interval [8,87) 10.1.1.85:6800/23146 10.1.1.85:6815/7023146 10.1.1.85:6816/7023146 10.1.1.85:6817/7023146 exists,up 144bc6ee-2e3d-4118-a460-8cc2bb3ec3e8
>>>>> osd.2 up in weight 1 up_from 61 up_thru 0 down_at 60 last_clean_interval [11,60) 10.1.1.85:6805/26784 10.1.1.85:6802/5026784 10.1.1.85:6811/5026784 10.1.1.85:6812/5026784 exists,up 8d5c7108-ef11-4947-b28c-8e20371d6d78
>>>>> osd.3 up in weight 1 up_from 95 up_thru 0 down_at 94 last_clean_interval [57,94) 10.1.1.84:6800/810 10.1.1.84:6810/3000810 10.1.1.84:6811/3000810 10.1.1.84:6812/3000810 exists,up bd762b2d-f94c-4879-8865-cecd63895557
>>>>> osd.4 up in weight 1 up_from 97 up_thru 0 down_at 96 last_clean_interval [74,96) 10.1.1.84:6801/9304 10.1.1.84:6802/2009304 10.1.1.84:6803/2009304 10.1.1.84:6813/2009304 exists,up 7d28a54b-b474-4369-b958-9e6bf6c856aa
>>>>> osd.5 up in weight 1 up_from 99 up_thru 0 down_at 98 last_clean_interval [79,98) 10.1.1.85:6801/19513 10.1.1.85:6808/2019513 10.1.1.85:6810/2019513 10.1.1.85:6813/2019513 exists,up f4d76875-0e40-487c-a26d-320f8b8d60c5
>>>>>
>>>>> root@ceph-mon1:/home/ceph# ceph osd tree
>>>>> # id    weight  type name               up/down reweight
>>>>> -1      0       root default
>>>>> -2      0               host ceph-osd1
>>>>> 0       0                       osd.0   up      1
>>>>> 3       0                       osd.3   up      1
>>>>> 4       0                       osd.4   up      1
>>>>> -3      0               host ceph-osd2
>>>>> 1       0                       osd.1   up      1
>>>>> 2       0                       osd.2   up      1
>>>>> 5       0                       osd.5   up      1
>>>>>
>>>>> The current HEALTH_WARN state has said "192 active+degraded" since I rebooted an osd host.
>>>>> Previously it was "incomplete". It never reached a HEALTH_OK state.
>>>>> Any hint about what to do next to get a healthy cluster?
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Фасихов Ирек Нургаязович
>>>> Mob.: +79229045757
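For the archives, the arithmetic behind the fix, taking Craig's rule of thumb literally (weight = disk size in TiB, truncated to two decimal places; the exact rounding may differ between versions):

10 GB disk: 10 / 1024 = ~0.0098 TiB -> CRUSH weight 0.00, so CRUSH cannot place any copies
20 GB disk: 20 / 1024 = ~0.0195 TiB -> CRUSH weight 0.01, which is enough for the PGs to map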
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com