[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Boris Behrens
Hi Mohamed,
are all mons down, or do you still have at least one that is running?

AFAIK: the mons save their DB on the normal OS disks, and not within the
ceph cluster.
So if all mons are dead, meaning the disks that contained the mon data
are unrecoverably lost, you might need to bootstrap a new cluster and add
the OSDs to it. This will likely involve some tinkering with cephx
authentication so you don't wipe the old OSD data.

If you still have at least ONE mon alive, you can shut it down, remove
all the other mons from its monmap, and start it again. You CAN run a
cluster with only one mon.
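
A quick way to check whether a mon daemon is still responding is to ask it
over its local admin socket, which works even without quorum. A minimal
sketch (the mon ID "a" and the socket path are assumptions and depend on how
the daemons were deployed):

  # on the mon host; replace "a" with the ID of your mon
  ceph daemon mon.a mon_status
  # or point at the admin socket directly
  ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status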

Or did your hosts just lose their boot disks and you only need to bring them
back up somehow? Losing 4x2 NVMe disks at the same time sounds a bit strange.

On Thu, 2 Nov 2023 at 11:34, Mohamed LAMDAOUAR <mohamed.lamdao...@enyx.fr> wrote:

> Hello,
>
>   I have 7 machines in a CEPH cluster; the ceph services run in Docker
> containers.
>   Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked).
>   During a reboot, the SSDs bricked on 4 machines. The data are available on
> the HDD disks, but the NVMe is bricked and the system is not available. Is it
> possible to recover the data of the cluster (the data disks are all
> available)?


-- 
The self-help group "UTF-8 Problems" will meet this time, as an exception, in
the large hall.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Robert Sander

Hi,

On 11/2/23 11:28, Mohamed LAMDAOUAR wrote:


   I have 7 machines in a CEPH cluster; the ceph services run in Docker
containers.
  Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked).
   During a reboot, the SSDs bricked on 4 machines. The data are available on
the HDD disks, but the NVMe is bricked and the system is not available. Is it
possible to recover the data of the cluster (the data disks are all
available)?


You can try to recover the MON db from the OSDs, as they keep a copy of it:

https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#monitor-store-failures
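
In rough outline, that procedure collects the cluster maps from every OSD and
then rebuilds a monitor store from them. A condensed, hedged sketch for a
single host (the OSDs must be stopped; the paths, the OSD glob and the
keyring location are placeholders to adapt to your installation):

  ms=/root/mon-store
  mkdir -p "$ms"
  # pull the cluster map out of each (stopped) OSD into $ms
  for osd in /var/lib/ceph/osd/ceph-*; do
      ceph-objectstore-tool --data-path "$osd" --no-mon-config \
          --op update-mon-db --mon-store-path "$ms"
  done
  # rebuild a monitor store from the collected maps
  # (add --mon-ids if you rebuild the store for more than one monitor)
  ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring

This only works if the RocksDB of each OSD is still readable.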

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Joachim Kraftmayer - ceph ambassador

Hi,

Another short note regarding the documentation: the paths there assume a
package installation.

The paths for a container installation look a bit different, e.g.:
/var/lib/ceph//osd.y/
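
Side by side, the two layouts look like this (purely illustrative; the
cluster FSID and the OSD id are placeholders):

  # package installation:
  /var/lib/ceph/osd/ceph-9/
  # cephadm / container installation:
  /var/lib/ceph/<cluster-fsid>/osd.9/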


Joachim

___
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

On 02.11.23 at 12:02, Robert Sander wrote:

Hi,

On 11/2/23 11:28, Mohamed LAMDAOUAR wrote:


   I have 7 machines in a CEPH cluster; the ceph services run in Docker
containers.
  Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked).
   During a reboot, the SSDs bricked on 4 machines. The data are available on
the HDD disks, but the NVMe is bricked and the system is not available. Is it
possible to recover the data of the cluster (the data disks are all
available)?


You can try to recover the MON db from the OSDs, as they keep a copy 
of it:


https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#monitor-store-failures 



Regards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Mohamed LAMDAOUAR
Hello Boris,

I have one monitor server up, and two other servers of the cluster are also
up (these two servers are not monitors).
I have four other servers down (the boot disks are dead), but the OSD data
disks are safe.
I reinstalled the OS on a new SSD disk. How can I rebuild my cluster with
only one mon?
If you would like, you can join me for a meeting. I will give you more
information about the cluster.

Thanks for your help. I'm very stuck because the data is present, but I
don't know how to add the old OSDs back into the cluster to recover the data.



On Thu, 2 Nov 2023 at 11:55, Boris Behrens wrote:

> Hi Mohamed,
> are all mons down, or do you still have at least one that is running?
>
> AFAIK: the mons save their DB on the normal OS disks, and not within the
> ceph cluster.
> So if all mons are dead, meaning the disks that contained the mon data
> are unrecoverably lost, you might need to bootstrap a new cluster and add
> the OSDs to it. This will likely involve some tinkering with cephx
> authentication so you don't wipe the old OSD data.
>
> If you still have at least ONE mon alive, you can shut it down, remove
> all the other mons from its monmap, and start it again. You CAN run a
> cluster with only one mon.
>
> Or did your hosts just lose their boot disks and you only need to bring
> them back up somehow? Losing 4x2 NVMe disks at the same time sounds a bit
> strange.
>
> On Thu, 2 Nov 2023 at 11:34, Mohamed LAMDAOUAR <mohamed.lamdao...@enyx.fr> wrote:
>
> > Hello,
> >
> >   I have 7 machines in a CEPH cluster; the ceph services run in Docker
> > containers.
> >   Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked).
> >   During a reboot, the SSDs bricked on 4 machines. The data are available on
> > the HDD disks, but the NVMe is bricked and the system is not available. Is it
> > possible to recover the data of the cluster (the data disks are all
> > available)?
>
>
> --
> The self-help group "UTF-8 Problems" will meet this time, as an exception,
> in the large hall.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Mohamed LAMDAOUAR
Thanks Robert,

I tried this but I'm stuck. If you have some time to help me with that, I
would be very happy, because I'm lost :(



8 rue Greneta, 75003 Paris, FRANCE
enyx.com | exegy.com

Mohamed Lamdaouar
Infrastructure Engineer
mohamed.lamdao...@enyx.fr



On Thu, 2 Nov 2023 at 12:08, Robert Sander wrote:

> Hi,
>
> On 11/2/23 11:28, Mohamed LAMDAOUAR wrote:
>
> >   I have 7 machines in a CEPH cluster; the ceph services run in Docker
> > containers.
> >   Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked).
> >   During a reboot, the SSDs bricked on 4 machines. The data are available on
> > the HDD disks, but the NVMe is bricked and the system is not available. Is it
> > possible to recover the data of the cluster (the data disks are all
> > available)?
>
> You can try to recover the MON db from the OSDs, as they keep a copy of it:
>
>
> https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#monitor-store-failures
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Mohamed LAMDAOUAR
Thanks Joachim for the clarification ;)


8 rue Greneta, 75003 Paris, FRANCE
enyx.com | exegy.com

Mohamed Lamdaouar
Infrastructure Engineer
mohamed.lamdao...@enyx.fr



On Thu, 2 Nov 2023 at 12:32, Joachim Kraftmayer - ceph ambassador <joachim.kraftma...@clyso.com> wrote:

> Hi,
>
> another short note regarding the documentation, the paths are designed
> for a package installation.
>
> the paths for container installation look a bit different e.g.:
> /var/lib/ceph//osd.y/
>
> Joachim
>
> ___
> ceph ambassador DACH
> ceph consultant since 2012
>
> Clyso GmbH - Premier Ceph Foundation Member
>
>
> https://www.clyso.com/
>
> On 02.11.23 at 12:02, Robert Sander wrote:
> > Hi,
> >
> > On 11/2/23 11:28, Mohamed LAMDAOUAR wrote:
> >
> >>   I have 7 machines in a CEPH cluster; the ceph services run in Docker
> >> containers.
> >>   Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked).
> >>   During a reboot, the SSDs bricked on 4 machines. The data are available on
> >> the HDD disks, but the NVMe is bricked and the system is not available. Is it
> >> possible to recover the data of the cluster (the data disks are all
> >> available)?
> >
> > You can try to recover the MON db from the OSDs, as they keep a copy
> > of it:
> >
> >
> https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#monitor-store-failures
> >
> >
> > Regards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Robert Sander

On 11/2/23 12:48, Mohamed LAMDAOUAR wrote:


I reinstalled the OS on a new SSD disk. How can I rebuild my cluster with
only one mon?


If there is one MON still operating, you can try to extract its monmap
and remove all the other MONs from it with the monmaptool:


https://docs.ceph.com/en/latest/man/8/monmaptool/
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap

This way the remaining MON will be the only one in the map and will have 
quorum and the cluster will work again.
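
A rough sketch of that procedure, assuming the surviving mon has the ID "a",
the dead ones are "b" through "e", and the mon runs as a classic systemd
service (all of these are assumptions to adapt):

  systemctl stop ceph-mon@a                  # stop the surviving mon first
  ceph-mon -i a --extract-monmap /tmp/monmap
  monmaptool /tmp/monmap --print             # inspect the current map
  monmaptool /tmp/monmap --rm b --rm c --rm d --rm e
  ceph-mon -i a --inject-monmap /tmp/monmap
  systemctl start ceph-mon@a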


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Mohamed LAMDAOUAR
Hi Robert,

When I ran this command, I got this error (because the database of the OSD
was on the boot disk):

ceph-objectstore-tool \
> --type bluestore \
> --data-path /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9 \
> --op update-mon-db \
> --mon-store-path /home/enyx-admin/backup-osd-9 \
> --no-mon-config --debug
2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42400
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open path
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42400
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open size
1827154432 (0x9187fc0, 9.1 TiB) block_size 4096 (4 KiB) rotational
device, discard not supported

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluestore(/var/lib/ceph/
c80891ba-55f3-11ed-9389-919f4368965c/osd.9) _set_cache_sizes cache_size
1073741824 meta 0.45 kv 0.45 data 0.06

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42c00
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open path
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42c00
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open size
1827154432 (0x9187fc0, 9.1 TiB) block_size 4096 (4 KiB) rotational
device, discard not supported

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluefs add_block_device bdev 1
path /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block size
9.1 TiB

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluefs mount

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluefs _init_alloc shared, id
1, capacity 0x9187fc0, block size 0x1

2023-11-02T10:59:33.441+ 7f6724da71c0 -1 bluefs _replay 0x0: stop: uuid
369c96dd-2df1-8d88-2722-3f8334920e83 != super.uuid
ba94c6e8-394b-4a78-84d6-9afe1cbc280b,
block dump:
  [hex/ASCII block dump truncated]

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Boris Behrens
Hi,
follow these instructions:
https://docs.ceph.com/en/quincy/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
As you are using containers, you might need to specify the --mon-data
directory (/var/lib/ceph/CLUSTER_UUID/mon.MONNAME) explicitly (I have
actually never done this in an orchestrator environment).
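
If the ceph-mon binary has to be pointed at the cephadm data directory
explicitly, it could look roughly like this (untested, as said above;
CLUSTER_UUID and MONNAME are placeholders, and the commands can also be run
from inside "cephadm shell" if no ceph-mon binary is installed on the host):

  ceph-mon -i MONNAME \
      --mon-data /var/lib/ceph/CLUSTER_UUID/mon.MONNAME \
      --extract-monmap /tmp/monmap
  # ... edit the map with monmaptool as in the linked docs, then ...
  ceph-mon -i MONNAME \
      --mon-data /var/lib/ceph/CLUSTER_UUID/mon.MONNAME \
      --inject-monmap /tmp/monmap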

Good luck.


On Thu, 2 Nov 2023 at 12:48, Mohamed LAMDAOUAR <mohamed.lamdao...@enyx.fr> wrote:

> Hello Boris,
>
> I have one monitor server up, and two other servers of the cluster are also
> up (these two servers are not monitors).
> I have four other servers down (the boot disks are dead), but the OSD data
> disks are safe.
> I reinstalled the OS on a new SSD disk. How can I rebuild my cluster with
> only one mon?
> If you would like, you can join me for a meeting. I will give you more
> information about the cluster.
>
> Thanks for your help. I'm very stuck because the data is present, but I
> don't know how to add the old OSDs back into the cluster to recover the data.
>
>
>
> On Thu, 2 Nov 2023 at 11:55, Boris Behrens wrote:
>
>> Hi Mohamed,
>> are all mons down, or do you still have at least one that is running?
>>
>> AFAIK: the mons save their DB on the normal OS disks, and not within the
>> ceph cluster.
>> So if all mons are dead, meaning the disks that contained the mon data
>> are unrecoverably lost, you might need to bootstrap a new cluster and add
>> the OSDs to it. This will likely involve some tinkering with cephx
>> authentication so you don't wipe the old OSD data.
>>
>> If you still have at least ONE mon alive, you can shut it down, remove
>> all the other mons from its monmap, and start it again. You CAN run a
>> cluster with only one mon.
>>
>> Or did your hosts just lose their boot disks and you only need to bring
>> them back up somehow? Losing 4x2 NVMe disks at the same time sounds a bit
>> strange.
>>
>> On Thu, 2 Nov 2023 at 11:34, Mohamed LAMDAOUAR <mohamed.lamdao...@enyx.fr> wrote:
>>
>> > Hello,
>> >
>> >   I have 7 machines in a CEPH cluster; the ceph services run in Docker
>> > containers.
>> >   Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked).
>> >   During a reboot, the SSDs bricked on 4 machines. The data are available on
>> > the HDD disks, but the NVMe is bricked and the system is not available. Is it
>> > possible to recover the data of the cluster (the data disks are all
>> > available)?
>>
>>
>> --
>> The self-help group "UTF-8 Problems" will meet this time, as an exception,
>> in the large hall.
>

-- 
The self-help group "UTF-8 Problems" will meet this time, as an exception, in
the large hall.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Malte Stroem

Hey Mohamed,

just send us the output of

ceph -s

and

ceph mon dump

please.

Best,
Malte

On 02.11.23 13:05, Mohamed LAMDAOUAR wrote:

Hi robert,

when I ran this command, I got this error (because the database of the osd
was on the boot disk)

ceph-objectstore-tool \

--type bluestore \
--data-path /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9 \
--op update-mon-db \
--mon-store-path /home/enyx-admin/backup-osd-9 \
--no-mon-config --debug

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42400
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open path
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42400
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open size
1827154432 (0x9187fc0, 9.1 TiB) block_size 4096 (4 KiB) rotational
device, discard not supported

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluestore(/var/lib/ceph/
c80891ba-55f3-11ed-9389-919f4368965c/osd.9) _set_cache_sizes cache_size
1073741824 meta 0.45 kv 0.45 data 0.06

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42c00
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open path
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42c00
/var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open size
1827154432 (0x9187fc0, 9.1 TiB) block_size 4096 (4 KiB) rotational
device, discard not supported

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluefs add_block_device bdev 1
path /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block size
9.1 TiB

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluefs mount

2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluefs _init_alloc shared, id
1, capacity 0x9187fc0, block size 0x1

2023-11-02T10:59:33.441+ 7f6724da71c0 -1 bluefs _replay 0x0: stop: uuid
369c96dd-2df1-8d88-2722-3f8334920e83 != super.uuid
ba94c6e8-394b-4a78-84d6-9afe1cbc280b,
block dump:
  [hex/ASCII block dump truncated]

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Robert Sander

Hi,

On 11/2/23 13:05, Mohamed LAMDAOUAR wrote:


when I ran this command, I got this error (because the database of the 
osd was on the boot disk)


The RocksDB part of the OSD was on the failed SSD?

Then the OSD is lost and cannot be recovered.
The RocksDB contains the information about where each object is stored on the
OSD data partition, and without it nobody knows where any object is. The
data is lost.


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread David C.
Hi Mohamed,

I understand there is still one operational monitor, correct?
If so, you need to reprovision the other monitors on an empty database so that
they synchronize with the only remaining monitor.
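
Assuming the cluster is cephadm-managed (the Docker setup suggests it, but
that is an assumption), reprovisioning could look roughly like this once the
surviving mon is up and has quorum on its own:

  # drop the dead mon daemons from the orchestrator's view
  ceph orch daemon rm mon.<dead-mon-name> --force
  # let cephadm deploy fresh mons; they start with an empty store and
  # sync from the surviving monitor
  ceph orch apply mon --placement="host1,host2,host3"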


Kind regards,

David CASIER





On Thu, 2 Nov 2023 at 13:42, Mohamed LAMDAOUAR wrote:

> Hi robert,
>
> when I ran this command, I got this error (because the database of the osd
> was on the boot disk)
>
> ceph-objectstore-tool \
> > --type bluestore \
> > --data-path /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9 \
> > --op update-mon-db \
> > --mon-store-path /home/enyx-admin/backup-osd-9 \
> > --no-mon-config --debug
> 2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42400
> /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open path
> /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block
>
> 2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42400
> /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open size
> 1827154432 (0x9187fc0, 9.1 TiB) block_size 4096 (4 KiB) rotational
> device, discard not supported
>
> 2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluestore(/var/lib/ceph/
> c80891ba-55f3-11ed-9389-919f4368965c/osd.9) _set_cache_sizes cache_size
> 1073741824 meta 0.45 kv 0.45 data 0.06
>
> 2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42c00
> /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open path
> /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block
>
> 2023-11-02T10:59:33.381+ 7f6724da71c0  1 bdev(0x560257b42c00
> /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block) open size
> 1827154432 (0x9187fc0, 9.1 TiB) block_size 4096 (4 KiB) rotational
> device, discard not supported
>
> 2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluefs add_block_device bdev 1
> path /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9/block size
> 9.1 TiB
>
> 2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluefs mount
>
> 2023-11-02T10:59:33.381+ 7f6724da71c0  1 bluefs _init_alloc shared, id
> 1, capacity 0x9187fc0, block size 0x1
>
> 2023-11-02T10:59:33.441+ 7f6724da71c0 -1 bluefs _replay 0x0: stop: uuid
> 369c96dd-2df1-8d88-2722-3f8334920e83 != super.uuid
> ba94c6e8-394b-4a78-84d6-9afe1cbc280b,
> block dump:
> [hex/ASCII block dump truncated]

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Anthony D'Atri
This admittedly is the case throughout the docs.

> On Nov 2, 2023, at 07:27, Joachim Kraftmayer - ceph ambassador 
>  wrote:
> 
> Hi,
> 
> another short note regarding the documentation, the paths are designed for a 
> package installation.
> 
> the paths for container installation look a bit different e.g.: 
> /var/lib/ceph//osd.y/
> 
> Joachim
> 
> ___
> ceph ambassador DACH
> ceph consultant since 2012
> 
> Clyso GmbH - Premier Ceph Foundation Member
> 
> https://www.clyso.com/
> 
> On 02.11.23 at 12:02, Robert Sander wrote:
>> Hi,
>> 
>> On 11/2/23 11:28, Mohamed LAMDAOUAR wrote:
>> 
>>>   I have 7 machines in a CEPH cluster; the ceph services run in Docker
>>> containers.
>>>   Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked).
>>>   During a reboot, the SSDs bricked on 4 machines. The data are available on
>>> the HDD disks, but the NVMe is bricked and the system is not available. Is it
>>> possible to recover the data of the cluster (the data disks are all
>>> available)?
>> 
>> You can try to recover the MON db from the OSDs, as they keep a copy of it:
>> 
>> https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#monitor-store-failures
>>  
>> 
>> Regards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread David C.
Hi,

I've just checked with the team and the situation is much more serious than
it seems: the lost disks contained the MON AND OSD databases (5 servers
down out of 8, replica 3).

It seems that the team fell victim to a bad batch of Samsung 980 Pros (I'm
not a big fan of this "Pro" range, but that's not the point), which have
never been able to restart since the incident.

Someone please correct me, but as far as I'm concerned, the cluster is lost.


Kind regards,

David CASIER
Direct line: +33(0) 9 72 61 98 29




On Thu, 2 Nov 2023 at 15:49, Anthony D'Atri wrote:

> This admittedly is the case throughout the docs.
>
> > On Nov 2, 2023, at 07:27, Joachim Kraftmayer - ceph ambassador <
> joachim.kraftma...@clyso.com> wrote:
> >
> > Hi,
> >
> > another short note regarding the documentation, the paths are designed
> for a package installation.
> >
> > the paths for container installation look a bit different e.g.:
> /var/lib/ceph//osd.y/
> >
> > Joachim
> >
> > ___
> > ceph ambassador DACH
> > ceph consultant since 2012
> >
> > Clyso GmbH - Premier Ceph Foundation Member
> >
> > https://www.clyso.com/
> >
> > On 02.11.23 at 12:02, Robert Sander wrote:
> >> Hi,
> >>
> >> On 11/2/23 11:28, Mohamed LAMDAOUAR wrote:
> >>
> >>>   I have 7 machines in a CEPH cluster; the ceph services run in Docker
> >>> containers.
> >>>   Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked).
> >>>   During a reboot, the SSDs bricked on 4 machines. The data are available on
> >>> the HDD disks, but the NVMe is bricked and the system is not available. Is it
> >>> possible to recover the data of the cluster (the data disks are all
> >>> available)?
> >>
> >> You can try to recover the MON db from the OSDs, as they keep a copy of
> it:
> >>
> >>
> https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#monitor-store-failures
> >>
> >> Regards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io