[ceph-users] mgr finish mon failed to return metadata for mds

2023-12-12 Thread Manolis Daramas
We are currently running Ceph version 17.2.7.

We see the following errors in the manager logs:

2 mgr.server handle_open ignoring open from mds.storage.node01.zjltbu v2:10.40.99.11:6800/1327026642; not ready for session (expect reconnect)
0 7faf43715700  1 mgr finish mon failed to return metadata for mds.storage.node01.zjltbu: (2) No such file or directory

and we get the following error when running:

# ceph fs status
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1759, in _handle_command
return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/status/module.py", line 109, in handle_fs_status
assert metadata
AssertionError
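
For context, the metadata the status module asserts on is what the monitors return for each MDS daemon; it can be queried directly with something like the following (the daemon name is taken from the log line above, and may need to be given without the 'mds.' prefix):

# dump the metadata for all MDS daemons the monitors know about
ceph mds metadata

# or for the specific daemon named in the error
ceph mds metadata storage.node01.zjltbu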

Does anyone know what these errors mean and what we can do to fix them?

Thanks,

Manolis


[ceph-users] Re: After hardware failure tried to recover ceph and followed instructions for recovery using OSDS

2023-12-05 Thread Manolis Daramas
5T20:05:23.103069+
pg 2.2 not scrubbed since 2023-11-15T05:46:04.363718+
pg 3.2 not scrubbed since 2023-11-14T20:25:27.750532+
pg 3.c not scrubbed since 2023-11-15T18:47:44.742320+
pg 3.27 not scrubbed since 2023-11-15T21:09:57.747494+
pg 3.2a not scrubbed since 2023-11-15T18:01:21.875230+
[WRN] POOL_NEARFULL: 3 pool(s) nearfull
pool '.mgr' is nearfull
pool 'cephfs.storage.meta' is nearfull
pool 'cephfs.storage.data' is nearfull

Any ideas?

Thanks,

Manolis Daramas

-Original Message-
From: Eugen Block 
Sent: Tuesday, November 21, 2023 1:10 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: After hardware failure tried to recover ceph and followed instructions for recovery using OSDS

Hi,

I guess you could just redeploy the third MON that fails to start (once
the orchestrator is responding again), unless you have figured it out
already. What is it logging?
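
A minimal sketch of that, assuming the failing MON is the one on node01 and that the orchestrator answers again:

# see how the orchestrator currently lists the MON daemons
ceph orch ps --daemon-type mon

# redeploy the failing monitor (daemon name assumed to be mon.node01)
ceph orch daemon redeploy mon.node01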

> 1 osds exist in the crush map but not in the osdmap

This could be due to the input/output error, but it's just a guess:

> osd.10  : 9225 osdmaps trimmed, 0 osdmaps added.
> Mount failed with '(5) Input/output error'

Can you add the 'ceph osd tree' output?
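
To narrow down which OSD is in the crush map but not in the osdmap, comparing these two lists should be enough:

# OSD entries as placed in the CRUSH map
ceph osd tree

# OSD ids actually present in the osdmap
ceph osd ls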

> # ceph fs ls (output below):
> No filesystems enabled

Ceph doesn't report any active MDS daemons, yet there are two MDS
processes listed: one on node01, the other on node02. What are those
daemons logging?
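
If they run as cephadm containers, their logs should be reachable with something along these lines (the daemon name below is only an example; adjust it to whatever 'ceph orch ps' reports):

# run on the host that carries the MDS container
cephadm logs --name mds.storage.node01.zjltbu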

> It looks like that we have a problem with the orchestrator now
> (we've lost cephadm orchestrator) and we also cannot see the
> filesystem.

Depending on the cluster status, the orchestrator might not behave as
expected, and HEALTH_ERR isn't too good, of course. But you could try
a 'ceph mgr fail' and see if it reacts again.
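
A sketch of that, plus a quick check whether the orchestrator module responds again afterwards:

# fail over the active MGR to a standby
ceph mgr fail

# then see whether the orchestrator answers
ceph orch status
ceph orch ps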

Zitat von Manolis Daramas :

> Hello everyone,
>
> We had a recent power failure on a server which hosts a 3-node ceph
> cluster (with Ubuntu 20.04 and Ceph version 17.2.7), and we think
> that we may have lost some, if not all, of our data.
>
> We have followed the instructions on
> https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
>  but with no
> luck.
>
> We have kept a backup of store.db folder on all 3 nodes prior the
> below steps.
>
> We have stopped ceph.target on all 3 nodes.
>
> We have run the first part of the script and we have altered it
> according to our configuration:
>
> ms=/root/mon-store
> mkdir $ms
>
> hosts="node01 node02 node03"
> # collect the cluster map from stopped OSDs
> for host in $hosts; do
>   rsync -avz $ms/. root@$host:$ms.remote
>   rm -rf $ms
>   ssh root@$host <<EOF
>   for osd in /var/lib/ceph/be4304e4-b0d5-11ec-8c6a-2965d4229f37/osd*; do
>   ceph-objectstore-tool --data-path \$osd --no-mon-config --op update-mon-db --mon-store-path $ms.remote
> done
> EOF
>   rsync -avz root@$host:$ms.remote/. $ms
> done
>
> and the results were:
>
> for node01
>
> osd.0   : 0 osdmaps trimmed, 673 osdmaps added.
> osd.10  : 9225 osdmaps trimmed, 0 osdmaps added.
> Mount failed with '(5) Input/output error'
> osd.4   : 0 osdmaps trimmed, 0 osdmaps added.
> osd.8   : 0 osdmaps trimmed, 0 osdmaps added.
> receiving incremental file list
> created directory /root/mon-store
> ./
> kv_backend
> store.db/
> store.db/08.sst
> store.db/14.sst
> store.db/20.sst
> store.db/22.log
> store.db/CURRENT
> store.db/IDENTITY
> store.db/LOCK
> store.db/MANIFEST-21
> store.db/OPTIONS-18
> store.db/OPTIONS-24
>
> sent 248 bytes  received 286,474 bytes  191,148.00 bytes/sec
> total size is 7,869,025  speedup is 27.44
> sending incremental file list
> created directory /root/mon-store.remote
> ./
> kv_backend
> store.db/
> store.db/08.sst
> store.db/14.sst
> store.db/20.sst
> store.db/22.log
> store.db/CURRENT
> store.db/IDENTITY
> store.db/LOCK
> store.db/MANIFEST-21
> store.db/OPTIONS-18
> store.db/OPTIONS-24
>
> sent 286,478 bytes  received 285 bytes  191,175.33 bytes/sec
> total size is 7,869,025  speedup is 27.44
>
> for node02
>
> osd.12  : 0 osdmaps trimmed, 0 osdmaps added.
> osd.2   : 0 osdmaps trimmed, 0 osdmaps added.
> osd.5   : 0 osdmaps trimmed, 0 osdmaps added.
> osd.7   : 0 osdmaps trimmed, 0 osdmaps added.
> osd.9   : 0 osdmaps trimmed, 0 osdmaps added.
> receiving incremental file list
> created directory /root/mon-store
> ./
> kv_backend
> store.db/
> store.db/08.sst
> store.db/14.sst
> store.db/20.sst
> store.db/26.sst
> store.db/32.sst
> store.db/38.sst
> store.db/44.sst
> store.db/50.sst
> store.db/52.log
> store.db/CURRENT
> store.db/IDENTITY
> store.db/LOCK
> store.db/MANIFEST-51
> store.d

[ceph-users] After hardware failure tried to recover ceph and followed instructions for recovery using OSDS

2023-11-20 Thread Manolis Daramas
d5-11ec-8c6a-2965d4229f37-osd-1
f4cda871218d   quay.io/ceph/ceph "/usr/bin/ceph-osd -..."   2 days ago  Up 2 days  ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37-osd-13
969e670dc47c   quay.io/ceph/ceph "/usr/bin/ceph-osd -..."   2 days ago  Up 2 days  ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37-osd-6
a49e91a7bb8e   quay.io/prometheus/node-exporter:v1.5.0   "/bin/node_exporter ..."   2 days ago  Up 2 days  ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37-node-exporter-node03
835c3893a3f4   quay.io/ceph/ceph "/usr/bin/ceph-crash..."   2 days ago  Up 2 days  ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37-crash-node03
bfa6f5b989ea   quay.io/ceph/ceph "/usr/bin/ceph-mon -..."   2 days ago  Up 2 days  ceph-be4304e4-b0d5-11ec-8c6a-2965d4229f37-mon-node03


# ceph -s (output below):

cluster:
id: be4304e4-b0d5-11ec-8c6a-2965d4229f37
health: HEALTH_ERR
20 stray daemon(s) not managed by cephadm
3 stray host(s) with 20 daemon(s) not managed by cephadm
1/3 mons down, quorum node02,node03
1/523510 objects unfound (0.000%)
3 nearfull osd(s)
1 osds exist in the crush map but not in the osdmap
Low space hindering backfill (add storage if this doesn't resolve itself): 20 pgs backfill_toofull
Possible data damage: 1 pg recovery_unfound
Degraded data redundancy: 74666/1570530 objects degraded (4.754%), 21 pgs degraded, 21 pgs undersized
3 pool(s) nearfull

  services:
mon: 3 daemons, quorum node02,node03 (age 2d), out of quorum: node01
mgr: node01.xlciyx(active, since 2d), standbys: node02.gudauu
osd: 14 osds: 14 up (since 2d), 14 in (since 3d); 21 remapped pgs

  data:
pools:   3 pools, 161 pgs
objects: 523.51k objects, 299 GiB
usage:   1014 GiB used, 836 GiB / 1.8 TiB avail
pgs: 74666/1570530 objects degraded (4.754%)
 1/523510 objects unfound (0.000%)
 140 active+clean
 20  active+undersized+degraded+remapped+backfill_toofull

 1   active+recovery_unfound+undersized+degraded+remapped

# ceph fs ls (output below):
No filesystems enabled

It looks like we have a problem with the orchestrator now (we've lost the
cephadm orchestrator), and we also cannot see the filesystem.
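
For reference, whether the filesystem still exists in the FSMap can be checked independently of the orchestrator; a quick look along these lines should tell:

# dump the full FSMap (filesystems, MDS ranks, standbys)
ceph fs dump

# and the overall MDS state
ceph mds stat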


Could you please assist, since we are not able to mount the filesystem?


Thank you,

Manolis Daramas

