[ceph-users] MDS recovery

2023-04-25 Thread jack
Hi All,

We have a CephFS cluster running Octopus, with three control nodes each running
an MDS, Monitor, and Manager on Ubuntu 20.04. The OS drive on one of these
nodes failed recently and we had to do a fresh install, but we made the mistake
of installing Ubuntu 22.04, where Octopus is not available. We tried to force
apt to use the Ubuntu 20.04 repo when installing Ceph so that it would install
Octopus, but for some reason Quincy was installed anyway. We re-integrated this
node and it seemed to work fine for about a week, until the cluster reported
damage to an MDS daemon and placed our filesystem into a degraded state.
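
For what it's worth, "ceph versions" is a quick way to confirm which release
each daemon type is actually running, which makes a mixed Octopus/Quincy
deployment easy to spot (invocation only, output omitted here):

$ ceph versions   # JSON summary of the running release per daemon type (mon, mgr, osd, mds)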

cluster:
id: 692905c0-f271-4cd8-9e43-1c32ef8abd13
health: HEALTH_ERR
mons are allowing insecure global_id reclaim
1 filesystem is degraded
1 filesystem is offline
1 mds daemon damaged
noout flag(s) set
161 scrub errors
Possible data damage: 24 pgs inconsistent
8 pgs not deep-scrubbed in time
4 pgs not scrubbed in time
6 daemons have recently crashed

  services:
mon: 3 daemons, quorum database-0,file-server,webhost (age 12d)
mgr: database-0(active, since 4w), standbys: webhost, file-server
mds: cephfs:0/1 3 up:standby, 1 damaged
osd: 91 osds: 90 up (since 32h), 90 in (since 5M)
 flags noout

  task status:

  data:
pools:   7 pools, 633 pgs
objects: 169.18M objects, 640 TiB
usage:   883 TiB used, 251 TiB / 1.1 PiB avail
pgs: 605 active+clean
 23  active+clean+inconsistent
 4   active+clean+scrubbing+deep
 1   active+clean+scrubbing+deep+inconsistent
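
Side note: the inconsistent PGs themselves can be enumerated and repaired
with the usual scrub-repair commands; the PG ID below is a placeholder, not
one of ours:

$ ceph health detail                                        # lists the PG IDs flagged inconsistent
$ rados list-inconsistent-obj <pgid> --format=json-pretty   # shows which objects/shards failed scrub
$ ceph pg repair <pgid>                                     # asks the primary OSD to repair that PG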

We are not sure whether the Quincy/Octopus version mismatch is the problem, but
we are in the process of downgrading this node to ensure all nodes are running
Octopus. Before doing that, we ran the following commands to try to recover:

$ cephfs-journal-tool --rank=cephfs:all journal export backup.bin
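
For reference, the journal can also be inspected non-destructively before the
riskier steps below; we did not capture that output, so this is just the
invocation:

$ cephfs-journal-tool --rank=cephfs:0 journal inspect   # reports overall journal integrity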

$ sudo cephfs-journal-tool --rank=cephfs:all event recover_dentries summary:

Events by type:
  OPEN: 29589
  PURGED: 1
  SESSION: 16
  SESSIONS: 4
  SUBTREEMAP: 127
  UPDATE: 70438
Errors: 0

$ cephfs-journal-tool --rank=cephfs:0 journal reset:

old journal was 170234219175~232148677
new journal start will be 170469097472 (2729620 bytes past old end)
writing journal head
writing EResetJournal entry
done

$ cephfs-table-tool all reset session
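
For context, the disaster-recovery docs these steps come from also cover
resetting the remaining MDS tables and marking the damaged rank repaired
before restarting the daemons; listed here for completeness (a sketch,
assuming the filesystem name cephfs as above), not something we have run:

$ cephfs-table-tool all reset snap     # reset the snapshot table
$ cephfs-table-tool all reset inode    # reset the inode table
$ ceph mds repaired cephfs:0           # clear the damaged flag on rank 0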

All of our MDS daemons are down and fail to restart with the following errors:

-3> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log [ERR] 
: journal replay alloc 0x153af79 not in free 
[0x153af7d~0x1e8,0x153b35c~0x1f7,0x153b555~0x2,0x153b559~0x2,0x153b55d~0x2,0x153b561~0x2,0x153b565~0x1de,0x153b938~0x1fd,0x153bd2a~0x4,0x153bf23~0x4,0x153c11c~0x4,0x153cd7b~0x158,0x153ced8~0xac3128]
-2> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log 
[ERR] : journal replay alloc 
[0x153af7a~0x1eb,0x153b35c~0x1f7,0x153b555~0x2,0x153b559~0x2,0x153b55d~0x2],
 only 
[0x153af7d~0x1e8,0x153b35c~0x1f7,0x153b555~0x2,0x153b559~0x2,0x153b55d~0x2]
 is in free 
[0x153af7d~0x1e8,0x153b35c~0x1f7,0x153b555~0x2,0x153b559~0x2,0x153b55d~0x2,0x153b561~0x2,0x153b565~0x1de,0x153b938~0x1fd,0x153bd2a~0x4,0x153bf23~0x4,0x153c11c~0x4,0x153cd7b~0x158,0x153ced8~0xac3128]
-1> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 
/build/ceph-15.2.15/src/mds/journal.cc: In function 'void 
EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7f0465069700 
time 2023-04-20T10:25:15.076784-0700
/build/ceph-15.2.15/src/mds/journal.cc: 1513: FAILED ceph_assert(inotablev == 
mds->inotable->get_version())

 ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus 
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x155) [0x7f04717a3be1]
 2: (()+0x26ade9) [0x7f04717a3de9]
 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) 
[0x560feaca36f2]
 4: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
 5: (MDLog::_replay_thread()+0x90c) [0x560feac393ac]
 6: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821]
 7: (()+0x8609) [0x7f0471318609]
 8: (clone()+0x43) [0x7f0470ee9163]

 0> 2023-04-20T10:25:15.076-0700 7f0465069700 -1 *** Caught signal 
(Aborted) **
 in thread 7f0465069700 thread_name:md_log_replay

 ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus 
(stable)
 1: (()+0x143c0) [0x7f04713243c0]
 2: (gsignal()+0xcb) [0x7f0470e0d03b]
 3: (abort()+0x12b) [0x7f0470dec859]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x1b0) [0x7f04717a3c3c]
 5: (()+0x26ade9) [0x7f04717a3de9]
 6: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) 
[0x560feaca36f2]
 7: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
 8: (MDLog::_

[ceph-users] MDS recovery with existing pools

2023-12-07 Thread Eugen Block

Hi,

following up on the previous thread (After hardware failure tried to
recover ceph and followed instructions for recovery using OSDS), we
were able to get Ceph back into a healthy state (including the unfound
object). Now the CephFS needs to be recovered, and I'm having trouble
working out from the docs [1] what the next steps should be. We ran
the following, which according to [1] sets the state to "existing but
failed":


ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover

But how do we continue from here? Should we expect an active MDS at this
point or not? The "ceph fs status" output still shows rank 0 as failed.
We then tried:


ceph fs set <fs_name> joinable true

But apparently it was already joinable, and nothing changed. Before doing
anything (destructive) from the advanced options [2] I wanted to ask the
community how to proceed from here. I pasted the MDS logs at the bottom;
I'm not really sure whether the current state is expected or not.
Apparently, the journal recovers but the purge_queue does not:


mds.0.41 Booting: 2: waiting for purge queue recovered
mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512  
(header had 14789452521). recovered.

mds.0.purge_queue operator(): open complete
mds.0.purge_queue operator(): recovering write_pos
monclient: get_auth_request con 0x55c280bc5c00 auth_method 0
monclient: get_auth_request con 0x55c280ee0c00 auth_method 0
mds.0.journaler.pq(ro) _finish_read got error -2
mds.0.purge_queue _recover: Error -2 recovering write_pos
mds.0.purge_queue _go_readonly: going readonly because internal IO  
failed: No such file or directory

mds.0.journaler.pq(ro) set_readonly
mds.0.41 unhandled write error (2) No such file or directory, force  
readonly...

mds.0.cache force file system read-only
force file system read-only
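
In case it's relevant, cephfs-journal-tool can also operate on the purge
queue journal directly via --journal=purge_queue; nothing destructive below,
and the filesystem name is a placeholder:

$ cephfs-journal-tool --rank=<fs_name>:0 --journal=purge_queue journal inspect   # integrity check of the purge queue journal
$ cephfs-journal-tool --rank=<fs_name>:0 --journal=purge_queue header get        # shows trimmed/expire/write positions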

Is this expected because the "--recover" flag prevents an active MDS, or
not? Before running "ceph mds rmfailed ..." and/or "ceph fs reset
<fs_name>" with the --yes-i-really-mean-it flag, I'd like to ask for your
input. In which cases should we run those commands? The docs are not
really clear to me. Any input is highly appreciated!
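
For reference, the joinable flag and the per-rank states can be
double-checked with the following (fs name is again a placeholder):

$ ceph fs get <fs_name>   # dumps the MDS map: flags (joinable), max_mds, rank states
$ ceph fs dump            # the same information for all filesystems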


Thanks!
Eugen

[1] https://docs.ceph.com/en/latest/cephfs/recover-fs-after-mon-store-loss/
[2] https://docs.ceph.com/en/latest/cephfs/administration/#advanced-cephfs-admin-settings


---snip---
Dec 07 15:35:48 node02 bash[692598]: debug-90>  
2023-12-07T13:35:47.730+ 7f4cd855f700  1 mds.storage.node02.hemalk  
Updating MDS map to version 41 from mon.0
Dec 07 15:35:48 node02 bash[692598]: debug-89>  
2023-12-07T13:35:47.730+ 7f4cd855f700  4 mds.0.purge_queue  
operator():  data pool 3 not found in OSDMap
Dec 07 15:35:48 node02 bash[692598]: debug-88>  
2023-12-07T13:35:47.730+ 7f4cd855f700  5 asok(0x55c27fe86000)  
register_command objecter_requests hook 0x55c27fe16310
Dec 07 15:35:48 node02 bash[692598]: debug-87>  
2023-12-07T13:35:47.730+ 7f4cd855f700 10 monclient: _renew_subs
Dec 07 15:35:48 node02 bash[692598]: debug-86>  
2023-12-07T13:35:47.730+ 7f4cd855f700 10 monclient:  
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug-85>  
2023-12-07T13:35:47.730+ 7f4cd855f700 10 log_channel(cluster)  
update_config to_monitors: true to_syslog: false syslog_facility:   
prio: info to_graylog: false graylog_host: 127.0.0.1 graylog_port:  
12201)
Dec 07 15:35:48 node02 bash[692598]: debug-84>  
2023-12-07T13:35:47.730+ 7f4cd855f700  4 mds.0.purge_queue  
operator():  data pool 3 not found in OSDMap
Dec 07 15:35:48 node02 bash[692598]: debug-83>  
2023-12-07T13:35:47.730+ 7f4cd855f700  4 mds.0.0 apply_blocklist:  
killed 0, blocklisted sessions (0 blocklist entries, 0)
Dec 07 15:35:48 node02 bash[692598]: debug-82>  
2023-12-07T13:35:47.730+ 7f4cd855f700  1 mds.0.41 handle_mds_map i  
am now mds.0.41
Dec 07 15:35:48 node02 bash[692598]: debug-81>  
2023-12-07T13:35:47.734+ 7f4cd855f700  1 mds.0.41 handle_mds_map  
state change up:standby --> up:replay
Dec 07 15:35:48 node02 bash[692598]: debug-80>  
2023-12-07T13:35:47.734+ 7f4cd855f700  5  
mds.beacon.storage.node02.hemalk set_want_state: up:standby -> up:replay
Dec 07 15:35:48 node02 bash[692598]: debug-79>  
2023-12-07T13:35:47.734+ 7f4cd855f700  1 mds.0.41 replay_start
Dec 07 15:35:48 node02 bash[692598]: debug-78>  
2023-12-07T13:35:47.734+ 7f4cd855f700  2 mds.0.41 Booting: 0:  
opening inotable
Dec 07 15:35:48 node02 bash[692598]: debug-77>  
2023-12-07T13:35:47.734+ 7f4cd855f700 10 monclient:  
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug-76>  
2023-12-07T13:35:47.734+ 7f4cd855f700  2 mds.0.41 Booting: 0:  
opening sessionmap
Dec 07 15:35:48 node02 bash[692598]: debug-75>  
2023-12-07T13:35:47.734+ 7f4cd855f700 10 monclient:  
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug-74>