Hi Stefan,

unfortunately, it doesn't start.

The failed OSD (osd.0) is located on gedaopl02:

[root@gedasvl02 ~]# ceph osd tree
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
ID  CLASS  WEIGHT   TYPE NAME           STATUS  REWEIGHT  PRI-AFF
-1         0.43658  root default
-7         0.21829      host gedaopl01
 2    ssd  0.21829          osd.2           up   1.00000  1.00000
-3               0      host gedaopl02
-5         0.21829      host gedaopl03
 3    ssd  0.21829          osd.3           up   1.00000  1.00000
 0               0  osd.0                 down         0  1.00000
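
By the way, osd.0 shows up with weight 0 and outside the gedaopl02 host bucket in the tree above. In case it's useful, I guess I could also check what cephadm itself reports for the daemons on that node, something along these lines (not 100% sure these are the exact commands needed):

[root@gedaopl02 ~]# cephadm ls            # lists the daemons cephadm has deployed on this host
[root@gedasvl02 ~]# ceph osd metadata 0   # what the cluster knows about osd.0 from its last boot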


[root@gedaopl02 ~]# systemctl --failed
UNIT LOAD   ACTIVE SUB    DESCRIPTION
● ceph-d0920c36-2368-11eb-a5de-005056b703af@mgr.gedaopl02.pijxbm.service loaded failed failed Ceph mgr.gedaopl02.pijxbm for d0920c36-2368-11eb-a5de-005056b703af
● ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service                loaded failed failed Ceph osd.0 for d0920c36-2368-11eb-a5de-005056b703af
● ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.1.service                loaded failed failed Ceph osd.1 for d0920c36-2368-11eb-a5de-005056b703af

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

3 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

I can start the service, but after a minute or so it fails again. Maybe I'm looking at the wrong log file, but it's empty:

[root@gedaopl02 ~]# tail -f /var/log/ceph/d0920c36-2368-11eb-a5de-005056b703af/ceph-osd.0.log
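
Since this is a cephadm (containerized) deployment, I suppose the logs might only go to the journal and not to files under /var/log/ceph unless file logging is enabled. So maybe something like this is the right place to look (taking the unit name from the systemctl output above):

[root@gedaopl02 ~]# journalctl -u ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service --since "1 hour ago"
[root@gedaopl02 ~]# cephadm logs --name osd.0   # should show the same journal entries for the osd.0 container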

Yesterday, when I deleted the failed OSD and recreated it, there were lots of messages in the log file:

https://pastebin.com/5hH27pdR
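
For reference, this is roughly what I did to remove and recreate the OSD (from memory, so the exact commands may have differed slightly; /dev/sdX is just a placeholder for the actual device):

[root@gedasvl02 ~]# ceph orch osd rm 0                            # remove the failed OSD
[root@gedasvl02 ~]# ceph orch daemon add osd gedaopl02:/dev/sdX   # recreate it on the same disk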

Cheers,

Oliver

On 01.12.2020 at 09:22, Stefan Kooman wrote:
On 2020-11-30 15:55, Oliver Weinmann wrote:

I have another error "pgs undersized", maybe this is also causing trouble?
This is a result of the loss of one OSD and the PGs located on it. As
you only have 2 OSDs left, the cluster cannot recover onto a third OSD
(assuming defaults here). The cluster will heal itself as soon as the
third OSD is back online.

Can you start the OSD? If not, can you provide logs of the failing OSD?

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io