The recover_dentries run from my last message finally finished:

2017-10-24 22:50:11.766519 7f775e539bc0  1 scavenge_dentries: frag 607.00000000 is corrupt, overwriting
Events by type:
  OPEN: 5640344
  SESSION: 10
  SUBTREEMAP: 8070
  UPDATE: 1384964
Errors: 0
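
(If it would be useful, I believe the journal can also be sanity-checked at this point with the tool's inspect mode; I have not captured that output:)

# cephfs-journal-tool journal inspect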

I then truncated the journal:
# cephfs-journal-tool journal reset
old journal was 6255163020467~8616264519
new journal start will be 6263781982208 (2697222 bytes past old end)
writing journal head
writing EResetJournal entry
done
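
(If I understand the tool correctly, the rewritten journal header can be dumped with the command below; I can post that output if it helps:)

# cephfs-journal-tool header get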

I reset sessions:
# cephfs-table-tool all reset session
{
    "0": {
        "data": {},
        "result": 0
    }
}
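
(If I am reading the disaster recovery guide right, the same tool can also wipe the snap and inode tables, but I have held off on those for now:)

# cephfs-table-tool all reset snap
# cephfs-table-tool all reset inode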

I marked it repaired:

# ceph mds repaired 0

And I still got errors, as shown by ceph -w:
2017-10-25 00:02:08.929404 mds.0 [ERR] dir 607 object missing on disk; some files may be lost (~mds0/stray7)
2017-10-25 00:02:09.099472 mon.0 [INF] mds.0 172.16.31.1:6800/3462673422 down:damaged
2017-10-25 00:02:09.105643 mon.0 [INF] fsmap e121619: 0/1/1 up, 1 damaged
2017-10-25 00:02:10.182101 mon.0 [INF] mds.? 172.16.31.1:6809/2991612296 up:boot
2017-10-25 00:02:10.182189 mon.0 [INF] fsmap e121620: 0/1/1 up, 1 up:standby, 1 damaged

What should I do next? ceph fs reset igbhome scares me.
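
(For what it's worth, the full command as I understand it from the docs would be the one below; the flag is exactly what scares me:)

# ceph fs reset igbhome --yes-i-really-mean-it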

Dan


On 10/24/2017 09:25 PM, Daniel Davidson wrote:
Out of desperation, I started with the disaster recovery guide:

http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
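
(The first step there is taking a journal backup before touching anything; the output file name below is just the example from the guide:)

# cephfs-journal-tool journal export backup.bin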

After exporting the journal, I started doing:

cephfs-journal-tool event recover_dentries summary

That was about 7 hours ago, and it is still running.  I am getting a lot of messages like:

2017-10-24 21:24:10.910489 7f775e539bc0  1 scavenge_dentries: frag 607.00000000 is corrupt, overwriting

The frag number is the same on every line, and there have been thousands of them.

I really could use some assistance,

Dan




On 10/24/2017 12:14 PM, Daniel Davidson wrote:
Our ceph system is having a problem.

A few days ago we had a pg that was marked as inconsistent, and today I fixed it with:

# ceph pg repair 1.37c

Then a file was stuck as missing, so I ran:

# ceph pg 1.37c mark_unfound_lost delete
pg has 1 objects unfound and apparently lost marking
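
(For the record, I believe the object it gave up on could have been listed beforehand with something like the following; I did not save that output:)

# ceph pg 1.37c list_missing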

That fixed the unfound object problem, and all the pgs went active+clean.  A few minutes later, though, the FS seemed to pause and the MDS started giving errors.

# ceph -w
    cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77
     health HEALTH_ERR
            mds rank 0 is damaged
            mds cluster is degraded
            noscrub,nodeep-scrub flag(s) set
     monmap e3: 4 mons at {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:6789/0,ceph-3=172.16.31.4:6789/0}
            election epoch 652, quorum 0,1,2,3 ceph-0,ceph-1,ceph-2,ceph-3
      fsmap e121409: 0/1/1 up, 4 up:standby, 1 damaged
     osdmap e35220: 32 osds: 32 up, 32 in
            flags noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
      pgmap v28398840: 1536 pgs, 2 pools, 795 TB data, 329 Mobjects
            1595 TB used, 1024 TB / 2619 TB avail
                1536 active+clean

Looking at the logs when I try:

# ceph mds repaired 0

2017-10-24 12:01:27.354271 mds.0 172.16.31.3:6801/1949050374 75 : cluster [ERR] dir 607 object missing on disk; some files may be lost (~mds0/stray7)
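
(In case it is relevant, I believe the MDS damage table can be listed with the command below; I can send that output along if it helps:)

# ceph tell mds.0 damage ls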

Any ideas as to what to do next? I am stumped.

Dan

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



