An update on this:

The affected pg didn't show any inconsistent objects though:

[root@admin-ceph1-qh2 ~]# ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
   pg 6.20 is active+clean+inconsistent, acting [114,26,44]
[root@admin-ceph1-qh2 ~]# rados list-inconsistent-obj 6.20 --format=json-pretty
{
   "epoch": 210034,
   "inconsistents": []
}

A pg query did show, however, that the primary's info.stats.stat_sum.num_bytes
differed from that of the peers.
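
In case it helps anyone comparing the same thing, this is roughly how the values
can be pulled out of the query output (just a sketch, assuming jq is available and
the usual luminous pg query JSON layout with info and peer_info at the top level):

# num_bytes as seen by the primary
ceph pg 6.20 query | jq '.info.stats.stat_sum.num_bytes'

# num_bytes as reported by each peer in the acting set
ceph pg 6.20 query | jq '.peer_info[].stats.stat_sum.num_bytes'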

A pg repair on 6.20 seems to have resolved the issue for now, but the
info.stats.stat_sum.num_bytes still differs, so presumably it will become
inconsistent again the next time it scrubs.
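
For completeness, the repair was just the standard command; forcing another deep
scrub should show whether the stat mismatch comes back without waiting for the
schedule (a sketch only, commands as in 12.2.x; the scrub is queued asynchronously,
so the result only turns up in the cluster log once it actually runs):

ceph pg repair 6.20

# after the repair completes, queue a manual deep scrub and check the outcome
ceph pg deep-scrub 6.20
ceph health detail
ceph -w          # watch the cluster log for the deep-scrub result on 6.20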

Adrian.

On Tue, Jun 5, 2018 at 12:09 PM, Adrian <aussie...@gmail.com> wrote:

> Hi Cephers,
>
> We recently upgraded one of our clusters from hammer to jewel and then to
> luminous (12.2.5, 5 mons/mgr, 21 storage nodes * 9 osd's). After some
> deep-scrubs we have an inconsistent pg with a log message we've not seen
> before:
>
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
>     pg 6.20 is active+clean+inconsistent, acting [114,26,44]
>
>
> Ceph log shows
>
> 2018-06-03 06:53:35.467791 osd.114 osd.114 172.26.28.25:6825/40819 395 : 
> cluster [ERR] 6.20 scrub stat mismatch, got 6526/6526 objects, 87/87 clones, 
> 6526/6526 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 
> 25952454144/25952462336 bytes, 0/0 hit_set_archive bytes.
> 2018-06-03 06:53:35.467799 osd.114 osd.114 172.26.28.25:6825/40819 396 : 
> cluster [ERR] 6.20 scrub 1 errors
> 2018-06-03 06:53:40.701632 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41298 
> : cluster [ERR] Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
> 2018-06-03 06:53:40.701668 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41299 
> : cluster [ERR] Health check failed: Possible data damage: 1 pg inconsistent 
> (PG_DAMAGED)
> 2018-06-03 07:00:00.000137 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41345 
> : cluster [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg 
> inconsistent
>
> There are no EC pools - looks like it may be the same as
> https://tracker.ceph.com/issues/22656 although as in #7 this is not a
> cache pool.
>
> Wondering if it is ok to issue a pg repair on 6.20, or if there's
> something else we should be looking at first?
>
> Thanks in advance,
> Adrian.
>
> ---
> Adrian : aussie...@gmail.com
> If violence doesn't solve your problem, you're not using enough of it.
>



-- 
---
Adrian : aussie...@gmail.com
If violence doesn't solve your problem, you're not using enough of it.