Further to this, we managed to repair the inconsistent PG by comparing the
object digests and removing the replica that didn't match (3 of the 4
replicas had the same digest, 1 didn't), then issuing a pg repair and a
scrub.
This removed the inconsistent flag on the PG; however, we are still
seeing the MDS report damage.
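For reference, the replica we removed was simply the odd one out by majority
vote on the per-shard digests. A minimal sketch of that selection logic
(`find_outliers` is just an illustrative helper name, and the digest values
are made-up placeholders, not the real ones from our cluster):

```python
from collections import Counter

def find_outliers(shard_digests):
    """Given {osd_id: omap_digest}, return the OSD ids whose digest
    disagrees with the majority value."""
    counts = Counter(shard_digests.values())
    majority_digest, _ = counts.most_common(1)[0]
    return sorted(osd for osd, digest in shard_digests.items()
                  if digest != majority_digest)

# Example: 3 of 4 replicas agree, one does not.
shards = {3: "0x6748eef3", 10: "0x6748eef3",
          11: "0x6748eef3", 23: "0x97b80594"}
print(find_outliers(shards))  # -> [23]
```

This only illustrates how the bad replica was chosen; the actual removal was
done on the OSD holding the mismatched copy before re-running the repair.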

We tried removing the damage from the MDS with damage rm, then ran a
recursive stat across the problem directory, but the damage reappeared.
We also tried doing a scrub_path, but the command returned code -2 and the
MDS log shows that the scrub started and finished less than 1ms later.
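For the record, the sequence we ran was roughly the following. The damage id
shown is just an example taken from the damage ls output further down the
thread, the mount point and directory are placeholders, and the exact
scrub_path invocation varies between releases, so treat this as a sketch:

```shell
# List damage entries and note their "id" values
ceph tell mds.0 damage ls

# Remove one damage entry by id (example id from the damage ls output)
ceph tell mds.0 damage rm 5129156

# Recursively stat the problem directory from a client mount
find /mnt/cephfs/problem-dir -exec stat {} + > /dev/null

# Ask the MDS to scrub the path via its admin socket
# (this is the call that came back with -2 for us)
ceph daemon mds.0 scrub_path /problem-dir recursive
```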

Any further help is greatly appreciated.

On 17 May 2017 at 10:58, James Eckersall <james.eckers...@gmail.com> wrote:

> An update to this.  The cluster has been upgraded to Kraken, but I've
> still got the same PG reporting inconsistent and the same error message
> about MDS metadata damage.
> Can anyone offer any further advice please?
> If you need output from the ceph-osdomap-tool, could you please explain
> how to use it?  I haven't been able to find any docs that explain it.
>
> Thanks
> J
>
> On 3 May 2017 at 14:35, James Eckersall <james.eckers...@gmail.com> wrote:
>
>> Hi David,
>>
>> Thanks for the reply, it's appreciated.
>> We're going to upgrade the cluster to Kraken and see if that fixes the
>> metadata issue.
>>
>> J
>>
>> On 2 May 2017 at 17:00, David Zafman <dzaf...@redhat.com> wrote:
>>
>>>
>>> James,
>>>
>>>     You have an omap corruption.  It is likely caused by a bug which has
>>> already been identified.  A fix for that problem is available but it is
>>> still pending backport for the next Jewel point release.  All 4 of your
>>> replicas have different "omap_digest" values.
>>>
>>> Rather than the xattrs, the ceph-osdomap-tool --command
>>> dump-objects-with-keys output from OSDs 3, 10, 11 and 23 would be
>>> interesting to compare.
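Since James later asks how to drive the tool: a rough invocation sketch,
assuming the default FileStore omap location under the OSD data directory.
The OSD should be stopped first, and the path and flags are worth
double-checking against your build:

```shell
# Stop the OSD so nothing else has the omap DB open
systemctl stop ceph-osd@3

# Dump each object's omap keys from this OSD's omap store
ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-3/current/omap \
    --command dump-objects-with-keys > /tmp/osd3-omap-dump.txt

systemctl start ceph-osd@3
```

Repeating this for OSDs 10, 11 and 23 and diffing the four dumps should show
which keys diverge between the replicas.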
>>>
>>> ***WARNING*** Please backup your data before doing any repair attempts.
>>>
>>> If you can upgrade to Kraken v11.2.0, it will auto repair the omaps on
>>> ceph-osd start up.  It will likely still require a ceph pg repair to make
>>> the 4 replicas consistent with each other.  The final result may be the
>>> reappearance of removed MDS files in the directory.
>>>
>>> If you can recover the data, you could remove the directory entirely and
>>> rebuild it.  The original bug was typically triggered during an omap
>>> deletion in a large directory, which corresponds to an individual unlink
>>> in cephfs.
>>>
>>> If you can build a branch from github to get the newer ceph-osdomap-tool,
>>> you could try using it to repair the omaps.
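If that branch behaves like the eventual fix, the per-OSD repair pass might
look something like this. The check and repair commands here are an
assumption about the newer tool, so verify they exist in the branch you
build before relying on them (and back up the omap directory first):

```shell
systemctl stop ceph-osd@3

# Look for inconsistencies in the omap DB (assumed newer-tool command)
ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-3/current/omap --command check

# Attempt the repair (assumed newer-tool command)
ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-3/current/omap --command repair

systemctl start ceph-osd@3
```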
>>>
>>> David
>>>
>>>
>>> On 5/2/17 5:05 AM, James Eckersall wrote:
>>>
>>> Hi,
>>>
>>> I'm having some issues with a ceph cluster.  It's an 8 node cluster running
>>> Jewel ceph-10.2.7-0.el7.x86_64 on CentOS 7.
>>> This cluster provides RBDs and a CephFS filesystem to a number of clients.
>>>
>>> ceph health detail is showing the following errors:
>>>
>>> pg 2.9 is active+clean+inconsistent, acting [3,10,11,23]
>>> 1 scrub errors
>>> mds0: Metadata damage detected
>>>
>>>
>>> The pg 2.9 is in the cephfs_metadata pool (id 2).
>>>
>>> I've looked at the OSD logs for OSD 3, which is the primary for this PG,
>>> but the only thing that appears relating to this PG is the following:
>>>
>>> log_channel(cluster) log [ERR] : 2.9 deep-scrub 1 errors
>>>
>>> After initiating a ceph pg repair 2.9, I see the following in the primary
>>> OSD log:
>>>
>>> log_channel(cluster) log [ERR] : 2.9 repair 1 errors, 0 fixed
>>> log_channel(cluster) log [ERR] : 2.9 deep-scrub 1 errors
>>>
>>>
>>> I found the below command in a previous ceph-users post.  Running this
>>> returns the following:
>>>
>>> # rados list-inconsistent-obj 2.9
>>> {
>>>   "epoch": 23738,
>>>   "inconsistents": [
>>>     {
>>>       "object": {
>>>         "name": "10000411194.00000000",
>>>         "nspace": "",
>>>         "locator": "",
>>>         "snap": "head",
>>>         "version": 14737091
>>>       },
>>>       "errors": ["omap_digest_mismatch"],
>>>       "union_shard_errors": [],
>>>       "selected_object_info": "2:9758b358:::10000411194.00000000:head(33456'14737091 mds.0.214448:248532 dirty|omap|data_digest s 0 uv 14737091 dd ffffffff)",
>>>       "shards": [
>>>         { "osd": 3,  "errors": [], "size": 0, "omap_digest": "0x6748eef3", "data_digest": "0xffffffff" },
>>>         { "osd": 10, "errors": [], "size": 0, "omap_digest": "0xa791d5a4", "data_digest": "0xffffffff" },
>>>         { "osd": 11, "errors": [], "size": 0, "omap_digest": "0x53f46ab0", "data_digest": "0xffffffff" },
>>>         { "osd": 23, "errors": [], "size": 0, "omap_digest": "0x97b80594", "data_digest": "0xffffffff" }
>>>       ]
>>>     }
>>>   ]
>>> }
>>>
>>>
>>> So from this, I think that the object in PG 2.9 with the problem is
>>> 10000411194.00000000.
>>>
>>> This is what I see on the filesystem on the 4 OSDs this PG resides on:
>>>
>>> -rw-r--r--. 1 ceph ceph 0 Apr 27 12:31
>>> /var/lib/ceph/osd/ceph-3/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
>>> -rw-r--r--. 1 ceph ceph 0 Apr 15 22:05
>>> /var/lib/ceph/osd/ceph-10/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
>>> -rw-r--r--. 1 ceph ceph 0 Apr 15 22:07
>>> /var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
>>> -rw-r--r--. 1 ceph ceph 0 Apr 16 03:58
>>> /var/lib/ceph/osd/ceph-23/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
>>>
>>> The extended attrs are as follows, although I have no idea what any of them
>>> mean.
>>>
>>> # file:
>>> var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
>>> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
>>> user.ceph._@1=0s//////8=
>>> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
>>> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
>>> user.ceph._parent@1
>>> =0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
>>> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
>>> user.cephos.seq=0sAQEQAAAAgcAqFAAAAAAAAAAAAgAAAAA=
>>> user.cephos.spill_out=0sMAA=
>>> getfattr: Removing leading '/' from absolute path names
>>>
>>> # file:
>>> var/lib/ceph/osd/ceph-3/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
>>> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
>>> user.ceph._@1=0s//////8=
>>> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
>>> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
>>> user.ceph._parent@1
>>> =0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
>>> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
>>> user.cephos.seq=0sAQEQAAAAZaQ9GwAAAAAAAAAAAgAAAAA=
>>> user.cephos.spill_out=0sMAA=
>>>
>>> # file:
>>> var/lib/ceph/osd/ceph-10/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
>>> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
>>> user.ceph._@1=0s//////8=
>>> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
>>> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
>>> user.ceph._parent@1
>>> =0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
>>> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
>>> user.cephos.seq=0sAQEQAAAA1T1dEQAAAAAAAAAAAgAAAAA=
>>> user.cephos.spill_out=0sMAA=
>>>
>>> # file:
>>> var/lib/ceph/osd/ceph-23/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
>>> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
>>> user.ceph._@1=0s//////8=
>>> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
>>> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
>>> user.ceph._parent@1
>>> =0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
>>> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
>>> user.cephos.seq=0sAQEQAAAADiM7AAAAAAAAAAAAAgAAAAA=
>>> user.cephos.spill_out=0sMAA=
>>>
>>>
>>> With the metadata damage issue, I can get the list of damaged inodes with
>>> the command below.
>>>
>>> $ ceph tell mds.0 damage ls | python -m "json.tool"
>>> [
>>>     {
>>>         "damage_type": "dir_frag",
>>>         "frag": "*",
>>>         "id": 5129156,
>>>         "ino": 1099556021325
>>>     },
>>>     {
>>>         "damage_type": "dir_frag",
>>>         "frag": "*",
>>>         "id": 8983971,
>>>         "ino": 1099548098243
>>>     },
>>>     {
>>>         "damage_type": "dir_frag",
>>>         "frag": "*",
>>>         "id": 33278608,
>>>         "ino": 1099548257921
>>>     },
>>>     {
>>>         "damage_type": "dir_frag",
>>>         "frag": "*",
>>>         "id": 33455691,
>>>         "ino": 1099548271575
>>>     },
>>>     {
>>>         "damage_type": "dir_frag",
>>>         "frag": "*",
>>>         "id": 38203788,
>>>         "ino": 1099548134708
>>>     },
>>> ...
>>>
>>> All of the inodes (approx. 800 of them) are for various directories within
>>> a wordpress cache directory.
>>> I ran rm -rf on each of the directories as I do not need the content.
>>> The content of the directories was removed, but the directories themselves
>>> cannot be removed: rmdir reports they are not empty, despite ls listing 0
>>> files.
>>>
>>> I'm not sure if these two issues are related to each other.  They were
>>> noticed within a day of each other.  I think the metadata damage error
>>> appeared before the scrub error.
>>>
>>> I'm at a bit of a loss with how to proceed and I don't want to make things
>>> worse.
>>>
>>> I'd really appreciate any help that anyone can give to try and resolve
>>> these problems.
>>>
>>> Thanks
>>>
>>> J
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
