Hello,
Environment: Bluestore, EC 4+1, v11.2.0, RHEL 7.3, 16384 PGs.

We did our resiliency testing and found that the OSDs keep flapping and the cluster went into an error state. What we did:

1. We have a 5-node cluster.
2. Powered off the last node (stopped ceph.target on it) and waited; everything appeared to return to normal.
3. Powered the last node back up, and then saw recovery stuck on remapped PGs:

~~~
osdmap e4829: 340 osds: 101 up, 112 in; 15011 remapped pgs
~~~

4. Initially all 340 OSDs came up; at the same time the remapped count reached 16384, at OSD map epoch e818.
5. Then, after an hour or two, the remapped PG count kept incrementing and decrementing, and the OSDs started failing one by one.

We also tested with the patch below, but still saw no change:
https://github.com/ceph/ceph-ci/commit/wip-prune-past-intervals-kraken

~~~
# ceph -s
2017-05-18 18:07:45.876586 7fd6bb87e700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2017-05-18 18:07:45.900045 7fd6bb87e700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
    cluster cb55baa8-d5a5-442e-9aae-3fd83553824e
     health HEALTH_ERR
            27056 pgs are stuck inactive for more than 300 seconds
            744 pgs degraded
            10944 pgs down
            3919 pgs peering
            11416 pgs stale
            744 pgs stuck degraded
            15640 pgs stuck inactive
            11416 pgs stuck stale
            16384 pgs stuck unclean
            744 pgs stuck undersized
            744 pgs undersized
            recovery 1279809/135206985 objects degraded (0.947%)
            too many PGs per OSD (731 > max 300)
            11/112 in osds are down
     monmap e3: 5 mons at {PL6-CN1=10.50.62.151:6789/0,PL6-CN2=10.50.62.152:6789/0,PL6-CN3=10.50.62.153:6789/0,PL6-CN4=10.50.62.154:6789/0,PL6-CN5=10.50.62.155:6789/0}
            election epoch 22, quorum 0,1,2,3,4 PL6-CN1,PL6-CN2,PL6-CN3,PL6-CN4,PL6-CN5
        mgr no daemons active
     osdmap e4827: 340 osds: 101 up, 112 in; 15011 remapped pgs
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v83202: 16384 pgs, 1 pools, 52815 GB data, 26407 kobjects
            12438 GB used, 331 TB / 343 TB avail
            1279809/135206985 objects degraded (0.947%)
                4512 stale+down+remapped
                3060 down+remapped
                2204 stale+down
                2000 stale+remapped+peering
                1259 stale+peering
                1167 down
                 739 stale+active+undersized+degraded
                 702 stale+remapped
                 557 peering
                 102 remapped+peering
~~~

~~~
# ceph pg stat
2017-05-18 18:09:18.345865 7fe2f72ec700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2017-05-18 18:09:18.368566 7fe2f72ec700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
v83204: 16384 pgs: 1 inactive, 1259 stale+peering, 75 remapped, 2000 stale+remapped+peering, 102 remapped+peering, 2204 stale+down, 739 stale+active+undersized+degraded, 1 down+remapped+peering, 702 stale+remapped, 557 peering, 4512 stale+down+remapped, 3060 down+remapped, 5 active+undersized+degraded, 1167 down; 52815 GB data, 12438 GB used, 331 TB / 343 TB avail; 1279809/135206985 objects degraded (0.947%)
~~~
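To see why individual PGs are stuck, the standard CLI can list and query them; a minimal sketch (3.3ffa is one of the down+remapped PGs from the dump below):

~~~
# List PGs stuck in a given state (stale / inactive / unclean):
ceph pg dump_stuck stale
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean

# Query one down+remapped PG; the "recovery_state" section at the end
# of the output shows which OSDs peering is blocked on:
ceph pg 3.3ffa query
~~~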
Some randomly captured PG values:

~~~
3.3ffc 1646 0 1715 0 0 3451912192 1646 1646 stale+active+undersized+degraded 2017-05-18 11:06:32.453158 846'1646 872:1634 [36,NONE,278,219,225] 36 [36,NONE,278,219,225] 36 0'0 2017-05-18 07:14:30.303859 0'0 2017-05-18 07:14:30.303859
3.3ffb 1711 0 0 0 0 3588227072 1711 1711 down 2017-05-18 15:20:52.858840 846'1711 1602:1708 [150,161,NONE,NONE,83] 150 [150,161,NONE,NONE,83] 150 0'0 2017-05-18 07:14:30.303838 0'0 2017-05-18 07:14:30.303838
3.3ffa 1617 0 0 0 0 3391094784 1617 1617 down+remapped 2017-05-18 17:12:54.943317 846'1617 2525:1637 [48,292,77,277,49] 48 [48,NONE,NONE,277,49] 48 0'0 2017-05-18 07:14:30.303807 0'0 2017-05-18 07:14:30.303807
3.3ff9 1682 0 0 0 0 3527409664 1682 1682 down+remapped 2017-05-18 16:16:42.223632 846'1682 2195:1678 [266,79,NONE,309,258] 266 [NONE,NONE,NONE,NONE,258] 258 0'0 2017-05-18 07:14:30.303793 0'0 2017-05-18 07:14:30.303793
~~~

ceph.conf:

~~~
[mon]
mon_osd_down_out_interval = 3600
mon_osd_reporter_subtree_level = host
mon_osd_down_out_subtree_limit = host
mon_osd_min_down_reporters = 4
mon_allow_pool_delete = true

[osd]
bluestore = true
bluestore_cache_size = 107374182
bluefs_buffered_io = true
osd_op_threads = 24
osd_op_num_shards = 5
osd_op_num_threads_per_shard = 2
osd_enable_op_tracker = false
osd_scrub_begin_hour = 1
osd_scrub_end_hour = 7
osd_deep_scrub_interval = 3.154e+9
osd_max_backfills = 3
osd_recovery_max_active = 3
osd_recovery_op_priority = 1
~~~

~~~
# ceph osd stat
2017-05-18 18:10:11.864303 7fedc5a98700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2017-05-18 18:10:11.887182 7fedc5a98700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
osdmap e4829: 340 osds: 101 up, 112 in; 15011 remapped pgs   <<== note the remapped PG count
flags sortbitwise,require_jewel_osds,require_kraken_osds
~~~

Is there any config directive that helps skip the remapped PG count during the recovery process?
Does Luminous v12.0.3 fix the OSD flapping issue?

Awaiting your suggestions.

Thanks,
Jayaram
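P.S. A minimal sketch of the two standard cluster-wide mechanisms that seem relevant here (not verified against this cluster): noout masks a planned node outage so its PGs are never remapped, and the nobackfill/norecover/norebalance flags pause data movement while flapping OSDs settle.

~~~
# Planned node outage: keep the mons from marking the node's OSDs "out"
# (otherwise mon_osd_down_out_interval = 3600 kicks in and PGs remap):
ceph osd set noout
systemctl stop ceph.target        # on the node under test
# ...power the node back up, wait for its OSDs to rejoin, then:
ceph osd unset noout

# To quiet recovery churn while OSDs stabilise:
ceph osd set nobackfill
ceph osd set norecover
ceph osd set norebalance
# ...once all OSDs are up and stable again:
ceph osd unset nobackfill
ceph osd unset norecover
ceph osd unset norebalance
~~~

These flags pause movement rather than skipping the remapped count, but they may stop the increment/decrement churn from knocking OSDs over.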