That last part about the scrubs returning clean when run manually may not be accurate. I'm still doing more testing, but the problem definitely persists even after a repair reports the PG as clean.
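For anyone following along, the cycle I'm describing looks roughly like this (the PG ID below is just a placeholder, not one of ours):

    # see which PGs are currently flagged inconsistent
    ceph health detail
    ceph pg ls inconsistent

    # inspect what the scrub found on a given PG (e.g. 7.1a)
    rados list-inconsistent-obj 7.1a --format=json-pretty

    # repair it; the errors clear, then later come back on the same PG
    ceph pg repair 7.1a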
On Fri, Jan 26, 2018 at 7:41 AM David Turner <drakonst...@gmail.com> wrote:
> I just upgraded to Luminous yesterday and before the upgrade was complete,
> we had SSD OSDs flapping up and down and scrub errors in the RGW index
> pools. I consistently made sure that we had all OSDs back up and the
> cluster healthy before continuing, and never reduced the min_size below 2
> for the pools on the NVMes. The RGW daemons for our 2 multi-site realms
> restarted themselves (due to a long-standing memory leak supposedly fixed
> in 12.2.2) and prematurely upgraded themselves before all of the OSDs had
> been upgraded, and I thought that was the reason for the scrub errors and
> inconsistent PGs... however, this morning I had a scrub error in our
> local-only realm, which does not use multi-site and had not restarted any
> of its RGW daemons until after all of the OSDs had been upgraded.
>
> Is there anything we should be looking at for this? Any idea what could
> be causing these scrub errors? I can issue a repair on the PG and the
> scrub errors go away, but then they keep coming back on the same PGs
> later. I can also issue a deep-scrub on every PG in these pools and they
> return clean, but then later show back up with the scrub errors and
> inconsistent PGs on the same PGs.
>
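For reference, kicking off the deep-scrub of every PG in one of the index pools was done with something like the loop below (pool name is an example, and the output parsing of ceph pg ls-by-pool may vary a bit by version):

    # deep-scrub every PG in the RGW index pool (example pool name)
    for pg in $(ceph pg ls-by-pool default.rgw.buckets.index | awk '$1 ~ /^[0-9]+\./ {print $1}'); do
        ceph pg deep-scrub "$pg"
    done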