On Tue, Aug 9, 2016 at 11:15 PM, Goncalo Borges
<goncalo.bor...@sydney.edu.au> wrote:
> Hi Greg...
>
> Thanks for replying. You seem omnipresent in all ceph/cephfs issues!
>
> Can you please confirm that, in Jewel, 'ceph pg repair' simply copies the
> pg contents of the primary osd to the others? And that this can lead to
> data corruption if the problematic osd is indeed the primary?
>
> If in Jewel there is some clever way for the system to know which osd has
> the problematic pg/object, then there is no real need to inspect the pgs
> in the different osds. If that is not the case, we need to find out which
> osd has the incorrect data.
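> For a data pool object we would do that by md5summing the copy of the
> object in each osd's pg directory and going with the 2-out-of-3 majority.
> Roughly something like this on each of the osd hosts holding the pg (the
> pg and object name below are made up, just to illustrate the idea):
>
> # (example pg 5.161 on osd.0, example object 10000000abc.00000000)
> # find /var/lib/ceph/osd/ceph-0/current/5.161_head -name '10000000abc.*' -type f -exec md5sum {} \;
>
> The copy whose checksum disagrees with the other two is the bad one, and
> if that happens to be on the primary we would stop that osd and move the
> bad file aside before running 'ceph pg repair'.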
I think some of the new scrub infrastructure went in, but I'm not sure how
much. David or Sam?

> I am not sure if 'ceph pg <id> query' can give you a clear indication of
> what is wrong. We will have to look closely at the output of a ceph pg
> <id> query and see what it reports once it happens.
>
> The other alternative, i.e., using ceph-objectstore-tool to inspect an
> object's omap key/values per osd, can't be used on an active / live osd.
> For example, just trying to get some info from a pg, I get:
>
> # ceph-objectstore-tool --data /var/lib/ceph/osd/ceph-0 --journal /var/lib/ceph/osd/ceph-0/journal --op info --pgid 5.291
> OSD has the store locked
>
> So, it seems that we have to stop the osd (and set the cluster to noout to
> avoid rebalancing) to release the lock, which will bring the cluster to a
> not-ok state (which is kind of ugly).

Ah yeah. I'm not aware of anything more elegant right now though. :(
-Greg

>
> Cheers
>
> G.
>
>
> On 08/10/2016 12:13 AM, Gregory Farnum wrote:
>>
>> On Tue, Aug 9, 2016 at 2:00 AM, Kenneth Waegeman
>> <kenneth.waege...@ugent.be> wrote:
>>>
>>> Hi,
>>>
>>> I did a diff on the directories of all three OSDs, no difference... So
>>> I don't know what's wrong.
>>
>> omap (as implied by the omap_digest complaint) is stored in the OSD
>> leveldb, not in the data directories, so you wouldn't expect to see
>> any differences from a raw diff. I think you can extract the omaps as
>> well by using the ceph-objectstore-tool or whatever it's called (I
>> haven't done it myself) and compare those. You should see if you get
>> more useful information out of the pg query first, though!
>> -Greg
>>
>>> The only thing I see that is different is a scrub file in the TEMP
>>> folder (it is already a different pg than in my last mail):
>>>
>>> -rw-r--r-- 1 ceph ceph 0 Aug  9 09:51
>>> scrub\u6.107__head_00000107__fffffffffffffff8
>>>
>>> But it is empty..
>>>
>>> Thanks!
>>>
>>>
>>>
>>> On 09/08/16 04:33, Goncalo Borges wrote:
>>>>
>>>> Hi Kenneth...
>>>>
>>>> The previous default behavior of 'ceph pg repair' was to copy the pg
>>>> objects from the primary osd to the others. I am not sure if that is
>>>> still the case in Jewel. For this reason, once we get these kinds of
>>>> errors in a data pool, the best practice is to compare the md5
>>>> checksums of the damaged object in all osds involved in the
>>>> inconsistent pg. Since we have a 3-replica cluster, we should find a
>>>> quorum of 2 good objects. If by chance the primary osd has the wrong
>>>> object, we should delete it there before running the repair.
>>>>
>>>> On a metadata pool, I am not sure exactly how to cross-check, since
>>>> all objects are size 0 and therefore md5sum is meaningless. Maybe one
>>>> way forward could be to check the contents of the pg directories (ex:
>>>> /var/lib/ceph/osd/ceph-0/current/5.161_head/) in all osds involved in
>>>> the pg and see if we spot something wrong?
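>>>> Roughly something like this on each of the osds involved (the osd id
>>>> and pg here are only an example, reusing the path above), and then
>>>> diff the three listings on one node:
>>>>
>>>> # (example osd.0, example pg 5.161)
>>>> # find /var/lib/ceph/osd/ceph-0/current/5.161_head -type f -printf '%f %s\n' | sort > /tmp/osd-0-pg-5.161.txt
>>>>
>>>> Of course this only compares object names and sizes, so it may well
>>>> show nothing even if something is wrong somewhere else.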
>>>>
>>>> Cheers
>>>>
>>>> G.
>>>>
>>>>
>>>> On 08/08/2016 09:40 PM, Kenneth Waegeman wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Since last week, some pgs have been going into the inconsistent
>>>>> state after a scrub error. Last week we had 4 pgs in that state;
>>>>> they were on different OSDs, but all in the metadata pool.
>>>>> I did a pg repair on them, and all were healthy again. But now
>>>>> again one pg is inconsistent.
>>>>>
>>>>> With health detail I see:
>>>>>
>>>>> pg 6.2f4 is active+clean+inconsistent, acting [3,5,1]
>>>>> 1 scrub errors
>>>>>
>>>>> And in the log of the primary:
>>>>>
>>>>> 2016-08-06 06:24:44.723224 7fc5493f3700 -1 log_channel(cluster) log [ERR]
>>>>> : 6.2f4 shard 5: soid 6:2f55791f:::606.00000000:head omap_digest
>>>>> 0x3a105358 != best guess omap_digest 0xc85c4361 from auth shard 1
>>>>> 2016-08-06 06:24:53.931029 7fc54bbf8700 -1 log_channel(cluster) log [ERR]
>>>>> : 6.2f4 deep-scrub 0 missing, 1 inconsistent objects
>>>>> 2016-08-06 06:24:53.931055 7fc54bbf8700 -1 log_channel(cluster) log [ERR]
>>>>> : 6.2f4 deep-scrub 1 errors
>>>>>
>>>>> I looked in dmesg but I couldn't see any IO errors on any of the
>>>>> OSDs in the acting set. Last week it was another set. It is of
>>>>> course possible that more than one OSD is failing, but how can we
>>>>> check this, since there is nothing more in the logs?
>>>>>
>>>>> Thanks !!
>>>>>
>>>>> K
>
> --
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW 2006
> T: +61 2 93511937
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com