A correction to that last part about the scrubs: a manually issued deep-scrub
coming back clean may not be accurate.  I'm still doing more testing, but the
problem is definitely persistent even after a repair reports the PG as clean.

On Fri, Jan 26, 2018 at 7:41 AM David Turner <drakonst...@gmail.com> wrote:

> I just upgraded to Luminous yesterday and before the upgrade was complete,
> we had SSD OSDs flapping up and down and scrub errors in the RGW index
> pools.  I consistently made sure that we had all OSDs back up and the
> cluster healthy before continuing and never reduced the min_size below 2
> for the pools on the NVMes.  The RGW daemons for our 2 multi-site realms
> restarted themselves (due to a long-standing memory leak supposedly fixed
> in 12.2.2) and prematurely upgraded themselves before all of the OSDs had
> been upgraded, and I thought that was the reason for the scrub errors and
> inconsistent PGs... however, this morning I had a scrub error in our local-
> only realm, which does not use multi-site and had not restarted any of its
> RGW daemons until after all of the OSDs had been upgraded.
>
> Is there anything we should be looking at for this?  Any idea what could
> be causing these scrub errors?  I can issue a repair on the PG and the
> scrub errors go away, but then they keep coming back on the same PGs
> later.  I can also issue a deep-scrub on every PG in these pools and they
> all return clean, but the scrub errors and inconsistent PGs later show up
> again on the same PGs.
>
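
(For reference, the repair / deep-scrub cycle described above amounts to
roughly the following; the pool name and PG ID are placeholders for one of
the RGW index pools and an affected PG:)

  # list PGs flagged inconsistent in the index pool
  rados list-inconsistent-pg default.rgw.buckets.index

  # repair one flagged PG
  ceph pg repair 7.12

  # deep-scrub every PG in the pool; first column of ls-by-pool is the PG ID
  ceph pg ls-by-pool default.rgw.buckets.index | awk '/^[0-9]/ {print $1}' |
    while read pg; do ceph pg deep-scrub "$pg"; done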
