On Sun, Feb 11, 2018 at 8:19 PM Chris Apsey wrote:
> All,
>
> Recently doubled the number of OSDs in our cluster, and towards the end
> of the rebalancing, I noticed that recovery IO fell to nothing and that
> the ceph mons eventually looked like this when I ran ceph -s
>
>   cluster:
>     id:     6a65c3d0-b84e-4c89-bbf7-a38a1966d780
>     health: HEALTH_WARN
>             34922/4329975 objects misplaced (0.807%)
>             Reduced data availability: 542 pgs inactive, 49 pgs peering, 13502 pgs stale
>             Degraded data redundancy: 248778/4329975 objects degraded (5.745%), 7319 pgs unclean, 2224 pgs degraded, 1817 pgs undersized
>
>   services:
>     mon: 3 daemons, quorum cephmon-0,cephmon-1,cephmon-2
>     mgr: cephmon-0(active), standbys: cephmon-1, cephmon-2
>     osd: 376 osds: 376 up, 376 in
>
>   data:
>     pools:   9 pools, 13952 pgs
>     objects: 1409k objects, 5992 GB
>     usage:   31528 GB used, 1673 TB / 1704 TB avail
>     pgs:     3.225% pgs unknown
>              0.659% pgs not active
>              248778/4329975 objects degraded (5.745%)
>              34922/4329975 objects misplaced (0.807%)
>              6141 stale+active+clean
>              4537 stale+active+remapped+backfilling
>              1575 stale+active+undersized+degraded
>              489  stale+active+clean+remapped
>              450  unknown
>              396  stale+active+recovery_wait+degraded
>              216  stale+active+undersized+degraded+remapped+backfilling
>              40   stale+peering
>              30   stale+activating
>              24   stale+active+undersized+remapped
>              22   stale+active+recovering+degraded
>              13   stale+activating+degraded
>              9    stale+remapped+peering
>              4    stale+active+remapped+backfill_wait
>              3    stale+active+clean+scrubbing+deep
>              2    stale+active+undersized+degraded+remapped+backfill_wait
>              1    stale+active+remapped
>
> The strange thing is that everything actually works fine. If I run ceph
> health detail and then query one of the 'degraded' placement groups, it
> reports back as active+clean (example commands below). All clients in the
> cluster can read and write at normal speeds, but no IO information is ever
> reported in ceph -s.
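>
> For reference, the checks look roughly like this (the pg id below is just
> an illustration; substitute one that ceph health detail flags as degraded):
>
>     # ceph health detail
>     # ceph pg 3.1a5 query | grep -m1 '"state"'
>         "state": "active+clean",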
>
> From what I can see, everything in the cluster is working properly
> except the reporting on the status of the cluster itself. Has anyone
> seen this before, or does anyone know how to sync the mons up to what
> the OSDs are actually reporting? I see no connectivity errors in the
> logs of the mons or the osds.
>
It sounds like the manager has gone stale somehow. You can probably fix it
by restarting the active mgr daemon, though if you have logs it would be
good to file a bug report at tracker.ceph.com.
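
If you're on a systemd deployment, the restart would look roughly like this
(cephmon-0 is the active mgr in your ceph -s output; a standby should take
over while it restarts):

    # on cephmon-0, the active mgr per your status output
    systemctl restart ceph-mgr@cephmon-0
    # then watch for the pg states and client IO stats to come back
    ceph -s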
-Greg
>
> Thanks,
>
> ---
> v/r
>
> Chris Apsey
> bitskr...@bitskrieg.net
> https://www.bitskrieg.net
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com