[ceph-users] mgr finish mon failed to return metadata for mds
The current Ceph version that we use is 17.2.7. We see the errors below in the manager logs:

2 mgr.server handle_open ignoring open from mds.storage.node01.zjltbu v2:10.40.99.11:6800/1327026642; not ready for session (expect reconnect)
0 7faf43715700 1 mgr finish mon failed to return metadata for mds.storage.node01.zjltbu: (2) No such file or directory

and in "# ceph fs status":

Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1759, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/status/module.py", line 109, in handle_fs_status
    assert metadata
AssertionError

Does anyone know what these errors mean and how we can fix them?

Thanks,
Manolis
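For anyone hitting the same assertion: a quick way to confirm whether the MONs actually hold metadata for that MDS, and to bounce the active mgr if its cached view is stale. This is a rough sketch; the daemon name is copied from the log lines above.

    # ask the MONs for the metadata of the MDS named in the error
    ceph mds metadata storage.node01.zjltbu
    # dump the FSMap to see which MDS daemons the cluster knows about
    ceph fs dump
    # if the mgr's cached view looks stale, fail over to a standby mgr
    ceph mgr fail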
[ceph-users] Re: After hardware failure tried to recover ceph and followed instructions for recovery using OSDS
5T20:05:23.103069+
pg 2.2 not scrubbed since 2023-11-15T05:46:04.363718+
pg 3.2 not scrubbed since 2023-11-14T20:25:27.750532+
pg 3.c not scrubbed since 2023-11-15T18:47:44.742320+
pg 3.27 not scrubbed since 2023-11-15T21:09:57.747494+
pg 3.2a not scrubbed since 2023-11-15T18:01:21.875230+
[WRN] POOL_NEARFULL: 3 pool(s) nearfull
    pool '.mgr' is nearfull
    pool 'cephfs.storage.meta' is nearfull
    pool 'cephfs.storage.data' is nearfull

Any ideas?

Thanks,
Manolis Daramas

-----Original Message-----
From: Eugen Block
Sent: Tuesday, November 21, 2023 1:10 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: After hardware failure tried to recover ceph and followed instructions for recovery using OSDS

Hi,

I guess you could just redeploy the third MON which fails to start (after the orchestrator is responding again), unless you have figured it out already. What is it logging?

> 1 osds exist in the crush map but not in the osdmap

This could be due to the input/output error, but it's just a guess:

> osd.10 : 9225 osdmaps trimmed, 0 osdmaps added.
> Mount failed with '(5) Input/output error'

Can you add the 'ceph osd tree' output?

> # ceph fs ls (output below):
> No filesystems enabled

Ceph doesn't report active MDS daemons, but there are two MDS processes listed, one on node01 and the other on node02. What are those daemons logging?

> It looks like that we have a problem with the orchestrator now
> (we've lost cephadm orchestrator) and we also cannot see the
> filesystem.

Depending on the cluster status the orchestrator might not behave as expected, and HEALTH_ERR isn't too good, of course. But you could try a 'ceph mgr fail' and see if it reacts again.

Zitat von Manolis Daramas:

> Hello everyone,
>
> We had a recent power failure on a server which hosts a 3-node Ceph
> cluster (with Ubuntu 20.04 and Ceph version 17.2.7), and we think
> that we may have lost some of our data, if not all of it.
>
> We have followed the instructions on
> https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
> but with no luck.
>
> We kept a backup of the store.db folder on all 3 nodes prior to the
> steps below.
>
> We stopped ceph.target on all 3 nodes.
>
> We ran the first part of the script, altered according to our
> configuration:
>
> ms=/root/mon-store
> mkdir $ms
>
> hosts="node01 node02 node03"
> # collect the cluster map from stopped OSDs
> for host in $hosts; do
>   rsync -avz $ms/. root@$host:$ms.remote
>   rm -rf $ms
>   ssh root@$host <<EOF
>     for osd in /var/lib/ceph/be4304e4-b0d5-11ec-8c6a-2965d4229f37/osd*; do
>       ceph-objectstore-tool --data-path \$osd --no-mon-config --op update-mon-db --mon-store-path $ms.remote
>     done
> EOF
>   rsync -avz root@$host:$ms.remote/. $ms
> done
>
> and the results were:
>
> for node01
>
> osd.0 : 0 osdmaps trimmed, 673 osdmaps added.
> osd.10 : 9225 osdmaps trimmed, 0 osdmaps added.
> Mount failed with '(5) Input/output error'
> osd.4 : 0 osdmaps trimmed, 0 osdmaps added.
> osd.8 : 0 osdmaps trimmed, 0 osdmaps added.
> receiving incremental file list
> created directory /root/mon-store
> ./
> kv_backend
> store.db/
> store.db/08.sst
> store.db/14.sst
> store.db/20.sst
> store.db/22.log
> store.db/CURRENT
> store.db/IDENTITY
> store.db/LOCK
> store.db/MANIFEST-21
> store.db/OPTIONS-18
> store.db/OPTIONS-24
>
> sent 248 bytes  received 286,474 bytes  191,148.00 bytes/sec
> total size is 7,869,025  speedup is 27.44
> sending incremental file list
> created directory /root/mon-store.remote
> ./
> kv_backend
> store.db/
> store.db/08.sst
> store.db/14.sst
> store.db/20.sst
> store.db/22.log
> store.db/CURRENT
> store.db/IDENTITY
> store.db/LOCK
> store.db/MANIFEST-21
> store.db/OPTIONS-18
> store.db/OPTIONS-24
>
> sent 286,478 bytes  received 285 bytes  191,175.33 bytes/sec
> total size is 7,869,025  speedup is 27.44
>
> for node02
>
> osd.12 : 0 osdmaps trimmed, 0 osdmaps added.
> osd.2 : 0 osdmaps trimmed, 0 osdmaps added.
> osd.5 : 0 osdmaps trimmed, 0 osdmaps added.
> osd.7 : 0 osdmaps trimmed, 0 osdmaps added.
> osd.9 : 0 osdmaps trimmed, 0 osdmaps added.
> receiving incremental file list
> created directory /root/mon-store
> ./
> kv_backend
> store.db/
> store.db/08.sst
> store.db/14.sst
> store.db/20.sst
> store.db/26.sst
> store.db/32.sst
> store.db/38.sst
> store.db/44.sst
> store.db/50.sst
> store.db/52.log
> store.db/CURRENT
> store.db/IDENTITY
> store.db/LOCK
> store.db/MANIFEST-51
> store.d
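For completeness, the troubleshooting document linked in this thread continues by rebuilding the monitor store from the collected maps. Below is a sketch of that next step, with the keyring path left as a placeholder and the mon IDs filled in from this cluster as examples; note that on a cephadm/containerized deployment the mon data directory lives under /var/lib/ceph/<fsid>/ rather than the path shown here.

    # rebuild the monitor store from the collected osdmaps
    ceph-monstore-tool /root/mon-store rebuild -- --keyring /path/to/admin.keyring --mon-ids node01 node02 node03
    # move the corrupted store aside on the mon host and install the rebuilt one
    mv /var/lib/ceph/mon/mon.node01/store.db /var/lib/ceph/mon/mon.node01/store.db.corrupted
    mv /root/mon-store/store.db /var/lib/ceph/mon/mon.node01/store.db
    chown -R ceph:ceph /var/lib/ceph/mon/mon.node01/store.db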
[ceph-users] After hardware failure tried to recover ceph and followed instructions for recovery using OSDS
d5-11ec-8c6a-2965d4229f37-osd-1
f4cda871218d quay.io/ceph/ceph "/usr/bin/ceph-osd -..." 2 days ago Up 2 days ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37-osd-13
969e670dc47c quay.io/ceph/ceph "/usr/bin/ceph-osd -..." 2 days ago Up 2 days ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37-osd-6
a49e91a7bb8e quay.io/prometheus/node-exporter:v1.5.0 "/bin/node_exporter ..." 2 days ago Up 2 days ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37-node-exporter-node03
835c3893a3f4 quay.io/ceph/ceph "/usr/bin/ceph-crash..." 2 days ago Up 2 days ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37-crash-node03
bfa6f5b989ea quay.io/ceph/ceph "/usr/bin/ceph-mon -..." 2 days ago Up 2 days ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37-mon-node03

# ceph -s (output below):

  cluster:
    id:     be4304e4-b0d5-11ec-8c6a-2965d4229f37
    health: HEALTH_ERR
            20 stray daemon(s) not managed by cephadm
            3 stray host(s) with 20 daemon(s) not managed by cephadm
            1/3 mons down, quorum node02,node03
            1/523510 objects unfound (0.000%)
            3 nearfull osd(s)
            1 osds exist in the crush map but not in the osdmap
            Low space hindering backfill (add storage if this doesn't resolve itself): 20 pgs backfill_toofull
            Possible data damage: 1 pg recovery_unfound
            Degraded data redundancy: 74666/1570530 objects degraded (4.754%), 21 pgs degraded, 21 pgs undersized
            3 pool(s) nearfull

  services:
    mon: 3 daemons, quorum node02,node03 (age 2d), out of quorum: node01
    mgr: node01.xlciyx(active, since 2d), standbys: node02.gudauu
    osd: 14 osds: 14 up (since 2d), 14 in (since 3d); 21 remapped pgs

  data:
    pools:   3 pools, 161 pgs
    objects: 523.51k objects, 299 GiB
    usage:   1014 GiB used, 836 GiB / 1.8 TiB avail
    pgs:     74666/1570530 objects degraded (4.754%)
             1/523510 objects unfound (0.000%)
             140 active+clean
             20  active+undersized+degraded+remapped+backfill_toofull
             1   active+recovery_unfound+undersized+degraded+remapped

# ceph fs ls (output below):

No filesystems enabled

It looks like we have a problem with the orchestrator now (we've lost the cephadm orchestrator) and we also cannot see the filesystem. Could you please assist, since we are not able to mount the filesystem?

Thank you,
Manolis Daramas
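As a starting point for the details requested in the reply above, something along these lines should gather them. This is only a sketch: the daemon name and fsid are copied from the outputs above, and a cephadm deployment is assumed, as in this cluster.

    # OSD layout and detailed health, as requested in the reply
    ceph osd tree
    ceph health detail
    # logs of the MON that is out of quorum
    cephadm logs --name mon.node01
    # equivalent via systemd on the mon host
    journalctl -u ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37@mon.node01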