[ceph-users] Re: restoring ceph cluster from osds
Hi,

I still think the best approach would be to rebuild the MON store from the OSDs as described here [2]. Just creating new MONs with the same IDs might not be sufficient, because they would be missing all the OSD keyrings etc., so you'd still have some work to do to get it up. It might be easier with the OSD approach, but other users might know a better one; it's been a while since I last went through that troubleshooting section.

Regards,
Eugen

[2] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds

Quoting Ben:

> Hi,
>
> Yes, the old mon daemons are removed. In the first post, the mon daemons were started with mon data from scratch. After some code searching, I suspect that without the original mon data I could restore the cluster from all the OSDs, but I may be wrong on this. For now, I think it would require less configuration if I could start a mon cluster with exactly the same IDs as the original one (something like k, m, o). Any thoughts on this?
>
> Ben

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
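The "recovery using OSDs" procedure in [2] essentially harvests the cluster maps stored on every OSD and rebuilds the mon store from them. A dry-run sketch of the steps (the OSD data path and keyring location are assumptions for a default deployment; remove the `echo` prefixes and stop the daemons before running this for real):

```shell
# Dry-run sketch of the mon-store rebuild from [2]; echoes the commands
# instead of executing them.
ms=/tmp/mon-store            # scratch directory for the rebuilt store
mkdir -p "$ms"
for osd in /var/lib/ceph/osd/ceph-*; do
  [ -d "$osd" ] || continue  # no local OSDs in a dry run
  echo ceph-objectstore-tool --data-path "$osd" --no-mon-config \
       --op update-mon-db --mon-store-path "$ms"
done
# Rebuild the store, recreating auth entries from the given keyring:
echo ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring
```

The rebuilt store then replaces the mon's store.db (back up the old one first); see [2] for what the rebuilt store does and does not recover.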
[ceph-users] Re: restoring ceph cluster from osds
Hi,

Yes, the old mon daemons are removed. In the first post, the mon daemons were started with mon data from scratch. After some code searching, I suspect that without the original mon data I could restore the cluster from all the OSDs, but I may be wrong on this. For now, I think it would require less configuration if I could start a mon cluster with exactly the same IDs as the original one (something like k, m, o). Any thoughts on this?

Ben

Eugen Block wrote on Thursday, March 9, 2023 at 20:56:

> Hi,
>
> I'm not familiar with Rook, so the required steps may vary. If you try to reuse the old mon stores, you'll have the mentioned mismatch between the new daemons and the old monmap (which still contains the old mon daemons). It's not entirely clear what went wrong in the first place and what exactly you already tried, so it's hard to tell whether editing the monmap is the way to go here. I guess the old mon daemons are removed, is that assumption correct? In that case it could be worth a try to edit the current monmap so that it contains only the new mons and to inject it (see [1] for details). If the mons start and form a quorum you'd have a cluster, but I can't tell whether the OSDs will register successfully. I think the previous approach, when the original mons were up but the OSDs didn't start, would have been more promising. Anyway, maybe editing the monmap will fix this for you.
>
> [1] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap
[ceph-users] Re: Trying to throttle global backfill
I received a few suggestions and resolved my issue.

Anthony D'Atri suggested mclock (newer than my Nautilus version), adding "--osd_recovery_max_single_start 1" (didn't seem to take), "osd_op_queue_cut_off=high" (which I didn't get around to checking), and pgremapper (from GitHub).

pgremapper did the trick: it cancelled the backfill that had been initiated by an unfortunate OSD name-changing sequence. Big winner, it achieved EXACTLY what I needed, which was to undo an unfortunate recalculation of placement groups.

Before: 310842802/17308319325 objects misplaced (1.796%)

Ran: pgremapper cancel-backfill --yes

After: 421709/17308356309 objects misplaced (0.002%)

The "before" scenario was causing over 10 GiB/s of backfill traffic. The "after" scenario was a very cool 300-400 MiB/s, entirely within the realm of sanity. The cluster is temporarily split between two datacenters, being physically lifted and shifted over a period of a month.

Alex Gorbachev also suggested setting osd-recovery-sleep. That was probably the solution I was looking for to throttle backfill operations in the first place, and I'll be keeping it in my toolbox as well.

As always, I'm HUGELY appreciative of the community response. I learned a lot in the process, had an outage-inducing scenario rectified very quickly, and got back to work. Thanks so much! Happy to answer any follow-up questions and return the favor when I can.

From: Rice, Christian
Date: Wednesday, March 8, 2023 at 3:57 PM
To: ceph-users
Subject: [EXTERNAL] [ceph-users] Trying to throttle global backfill

I have a large number of misplaced objects, and I already have all OSD settings at "1":

sudo ceph tell osd.\* injectargs '--osd_max_backfills=1 --osd_recovery_max_active=1 --osd_recovery_op_priority=1'

How can I slow it down even more? The cluster is too large; it's impacting other network traffic.
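As a quick check of the before/after percentages quoted above (plain awk arithmetic, nothing Ceph-specific):

```shell
# Recompute the misplaced-object percentages from the raw counts above.
before=$(awk 'BEGIN { printf "%.3f", 310842802 / 17308319325 * 100 }')
after=$(awk 'BEGIN { printf "%.3f", 421709 / 17308356309 * 100 }')
echo "before: ${before}%  after: ${after}%"   # before: 1.796%  after: 0.002%
```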
[ceph-users] Re: restoring ceph cluster from osds
Hi,

I'm not familiar with Rook, so the required steps may vary. If you try to reuse the old mon stores, you'll have the mentioned mismatch between the new daemons and the old monmap (which still contains the old mon daemons). It's not entirely clear what went wrong in the first place and what exactly you already tried, so it's hard to tell whether editing the monmap is the way to go here. I guess the old mon daemons are removed, is that assumption correct? In that case it could be worth a try to edit the current monmap so that it contains only the new mons and to inject it (see [1] for details). If the mons start and form a quorum you'd have a cluster, but I can't tell whether the OSDs will register successfully. I think the previous approach, when the original mons were up but the OSDs didn't start, would have been more promising. Anyway, maybe editing the monmap will fix this for you.

[1] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap

Quoting Ben:

> Hi Eugen,
>
> Thank you for the help on this.
>
> Forget the log. A little progress: the monitor store was restored. I created a new Ceph cluster to use the restored monitor store, but the monitor log complains:
>
> debug 2023-03-09T11:00:31.233+ 7fe95234f880 0 starting mon.a rank -1 at public addrs [v2:169.169.163.25:3300/0,v1:169.169.163.25:6789/0] at bind addrs [v2:197.166.206.27:3300/0,v1:197.166.206.27:6789/0] mon_data /var/lib/ceph/mon/ceph-a fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
> debug 2023-03-09T11:00:31.234+ 7fe95234f880 1 mon.a@-1(???) e27 preinit fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
> debug 2023-03-09T11:00:31.234+ 7fe95234f880 -1 mon.a@-1(???) e27 not in monmap and have been in a quorum before; must have been removed
> debug 2023-03-09T11:00:31.234+ 7fe95234f880 -1 mon.a@-1(???) e27 commit suicide!
> debug 2023-03-09T11:00:31.234+ 7fe95234f880 -1 failed to initialize
>
> The fact is that the original monitors' IDs are k, m, o, while the new ones are a, b, d. It was deployed by Rook. Any ideas to make this work?
>
> Ben
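The monmap editing referenced in [1] can be sketched as follows. This is a dry run that only echoes the commands; the mon IDs and address are taken from this thread for illustration, the mons must be stopped first, and each new mon needs its own `--add`:

```shell
# Dry-run sketch of editing the monmap per [1]: extract it from one mon,
# remove the stale mon entries, add the new ones, and inject it back.
mm=/tmp/monmap
echo ceph-mon -i a --extract-monmap "$mm"           # dump the current monmap
for old in k m o; do                                # stale mon IDs from the thread
  echo monmaptool "$mm" --rm "$old"
done
echo monmaptool "$mm" --add a 169.169.163.25:6789   # new mon, addr from the log
echo ceph-mon -i a --inject-monmap "$mm"            # write the edited map back
```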
[ceph-users] Re: restoring ceph cluster from osds
Hi Eugen,

Thank you for the help on this.

Forget the log. A little progress: the monitor store was restored. I created a new Ceph cluster to use the restored monitor store, but the monitor log complains:

debug 2023-03-09T11:00:31.233+ 7fe95234f880 0 starting mon.a rank -1 at public addrs [v2:169.169.163.25:3300/0,v1:169.169.163.25:6789/0] at bind addrs [v2:197.166.206.27:3300/0,v1:197.166.206.27:6789/0] mon_data /var/lib/ceph/mon/ceph-a fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
debug 2023-03-09T11:00:31.234+ 7fe95234f880 1 mon.a@-1(???) e27 preinit fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
debug 2023-03-09T11:00:31.234+ 7fe95234f880 -1 mon.a@-1(???) e27 not in monmap and have been in a quorum before; must have been removed
debug 2023-03-09T11:00:31.234+ 7fe95234f880 -1 mon.a@-1(???) e27 commit suicide!
debug 2023-03-09T11:00:31.234+ 7fe95234f880 -1 failed to initialize

The fact is that the original monitors' IDs are k, m, o, while the new ones are a, b, d. It was deployed by Rook. Any ideas to make this work?

Ben

Eugen Block wrote on Thursday, March 9, 2023 at 16:00:

> Hi,
>
> there's no attachment to your email; please use something like pastebin to provide the OSD logs.
>
> Thanks
> Eugen
[ceph-users] Re: rbd on EC pool with fast and extremely slow writes/reads
Thanks for the hint. I ran some short tests, all fine; I am not sure it's a drive issue.

Some more digging: the file with bad performance has these segments:

[root@afsvos01 vicepa]# hdparm --fibmap $PWD/0

/vicepa/0:
 filesystem blocksize 4096, begins at LBA 2048; assuming 512 byte sectors.
 byte_offset   begin_LBA    end_LBA    sectors
           0      743232    2815039    2071808
  1060765696     3733064    5838279    2105216
  2138636288    70841232   87586575   16745344
 10712252416    87586576   87635727      49152

Reading by segments:

# dd if=0 of=/tmp/0 bs=4M status=progress count=252
1052770304 bytes (1.1 GB, 1004 MiB) copied, 45 s, 23.3 MB/s
252+0 records in
252+0 records out

# dd if=0 of=/tmp/0 bs=4M status=progress skip=252 count=256
935329792 bytes (935 MB, 892 MiB) copied, 4 s, 234 MB/s
256+0 records in
256+0 records out

# dd if=0 of=/tmp/0 bs=4M status=progress skip=510
7885291520 bytes (7.9 GB, 7.3 GiB) copied, 12 s, 657 MB/s
2050+0 records in
2050+0 records out

So the first 1 GiB is very slow, the second segment is faster, and the rest is quite fast, and it's reproducible (I dropped caches before each dd).

Now, the rbd image is 3 TB with 256 PGs (EC 8+3). I checked with rados that the objects are randomly distributed across PGs, e.g.

# rados --pgid 23.82 ls | grep rbd_data.20.2723bd3292f6f8
rbd_data.20.2723bd3292f6f8.0008
rbd_data.20.2723bd3292f6f8.000d
rbd_data.20.2723bd3292f6f8.01cb
rbd_data.20.2723bd3292f6f8.000601b2
rbd_data.20.2723bd3292f6f8.0009001b
rbd_data.20.2723bd3292f6f8.005b
rbd_data.20.2723bd3292f6f8.000900e8

where object ...05b, for example, corresponds to the first block of the file I am testing. If my understanding of rbd is correct, LBA regions are mapped to consecutive rbd objects. So now I am completely confused, since the slow chunk of the file is still mapped to ~256 objects on different PGs. Maybe I misunderstood the whole thing. Any other hints? We will still run HDD tests on all the drives.

Cheers,
Andrej

On 3/6/23 20:25, Paul Mezzanini wrote:
> When I have seen behavior like this it was a dying drive. It only became obvious when I ran a SMART long test and got failed reads. It still reported SMART OK, though, so that was a lie.
>
> --
> Paul Mezzanini
> Platform Engineer III
> Research Computing
> Rochester Institute of Technology

From: Andrej Filipcic
Sent: Monday, March 6, 2023 8:51 AM
To: ceph-users
Subject: [ceph-users] rbd on EC pool with fast and extremely slow writes/reads

Hi,

I have a problem on one of my Ceph clusters that I do not understand: ceph 17.2.5 on 17 servers, 400 HDD OSDs, 10 and 25 Gb/s NICs.

A 3 TB rbd image is on an erasure-coded 8+3 pool with 128 PGs, with an xfs filesystem and 4 MB objects in the rbd image, mostly empty. I created a bunch of 10G files; most of them were written at 1.5 GB/s, but a few of them were really slow, ~10 MB/s, a factor of 100. When reading these files back, the fast-written ones are read fast, ~2-2.5 GB/s, while the slowly-written ones are also extremely slow to read; iotop shows between 1 and 30 MB/s reading speed. This does not happen at all on replicated images.

There are some OSDs with higher apply/commit latency, e.g. 200 ms, but there are no slow ops. The tests were actually done on a proxmox VM with librbd, but the same happens with krbd, and on bare metal with mounted krbd as well. I have tried to check all OSDs for laggy drives, but they all look about the same. I have also copied the entire image with "rados get ...", object by object; the strange thing here is that most objects were copied within 0.1-0.2 s, but quite a few took more than 1 s.

The cluster is quite busy with a base traffic of ~1-2 GB/s, so the speeds can vary due to that, but I would not expect a factor-of-100 slowdown for some writes/reads with rbds. Any clues on what might be wrong or what else to check? I have another similar Ceph cluster where everything looks fine.

Best,
Andrej

--
_____
prof. dr. Andrej Filipcic, E-mail: andrej.filip...@ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674  Fax: +386-1-477-3166
_____
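For orientation: with default striping, librbd/krbd map byte offset N of the image to object number N / object_size, and the object-name suffix is that number as 16-digit zero-padded hex. A small illustrative helper (the 4 MiB object size and name prefix are taken from the post; the truncated suffixes in the rados listing would be zero-padded in reality):

```shell
# Compute the rbd object name holding a given byte offset of the image,
# assuming default striping (stripe_count=1) and 4 MiB objects.
rbd_object() {  # usage: rbd_object <name_prefix> <byte_offset>
  printf 'rbd_data.%s.%016x\n' "$1" $(( $2 / (4 * 1024 * 1024) ))
}
rbd_object 20.2723bd3292f6f8 0            # first object of the image
rbd_object 20.2723bd3292f6f8 1060765695   # last byte of the slow first extent
```

So a contiguous ~1 GiB slow extent spans roughly 250 consecutive object numbers, which, as observed above, still hash to many different PGs.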
[ceph-users] libceph: mds1 IP+PORT wrong peer at address
Hi all,

we seem to have hit a bug in the ceph fs kernel client and I just want to confirm what action to take. We get the error "wrong peer at address" in dmesg, and some jobs on that server seem to get stuck in fs access; a log extract is below. I found these two tracker items, https://tracker.ceph.com/issues/23883 and https://tracker.ceph.com/issues/41519, which don't seem to have fixes.

My questions:

- Is this harmless, or does it indicate invalid/corrupted client cache entries?
- How to resolve it: ignore, umount+mount, or reboot?

Here is an extract from the dmesg log; the error has survived a couple of MDS restarts already:

[Mon Mar 6 12:56:46 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar 6 13:05:18 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-1572619386
[Mon Mar 6 13:05:18 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar 6 13:13:50 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-1572619386
[Mon Mar 6 13:13:50 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar 6 13:16:41 2023] libceph: mds1 192.168.32.87:6801 socket closed (con state OPEN)
[Mon Mar 6 13:16:41 2023] libceph: mds1 192.168.32.87:6801 socket closed (con state OPEN)
[Mon Mar 6 13:16:45 2023] ceph: mds1 reconnect start
[Mon Mar 6 13:16:45 2023] ceph: mds1 reconnect start
[Mon Mar 6 13:16:48 2023] ceph: mds1 reconnect success
[Mon Mar 6 13:16:48 2023] ceph: mds1 reconnect success
[Mon Mar 6 13:18:13 2023] ceph: update_snap_trace error -22
[Mon Mar 6 13:18:17 2023] libceph: mds7 192.168.32.88:6801 socket closed (con state OPEN)
[Mon Mar 6 13:18:17 2023] libceph: mds7 192.168.32.88:6801 socket closed (con state OPEN)
[Mon Mar 6 13:18:23 2023] ceph: mds1 recovery completed
[Mon Mar 6 13:18:23 2023] ceph: mds1 recovery completed
[Mon Mar 6 13:18:28 2023] ceph: mds7 reconnect start
[Mon Mar 6 13:18:28 2023] ceph: mds7 reconnect start
[Mon Mar 6 13:18:28 2023] ceph: mds7 reconnect success
[Mon Mar 6 13:18:29 2023] ceph: mds7 reconnect success
[Mon Mar 6 13:18:35 2023] ceph: update_snap_trace error -22
[Mon Mar 6 13:18:35 2023] ceph: mds7 recovery completed
[Mon Mar 6 13:18:35 2023] ceph: mds7 recovery completed
[Mon Mar 6 13:22:22 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Mon Mar 6 13:22:22 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar 6 13:30:54 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[...]
[Thu Mar 9 09:37:24 2023] slurm.epilog.cl (31457): drop_caches: 3
[Thu Mar 9 09:38:26 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar 9 09:38:26 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Thu Mar 9 09:46:58 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar 9 09:46:58 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Thu Mar 9 09:55:30 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar 9 09:55:30 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Thu Mar 9 10:04:02 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar 9 10:04:02 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
[ceph-users] radosgw - octopus - 500 Bad file descriptor on upload
Hi,

we've observed 500 errors when uploading files to a single bucket, but the problem went away after around two hours. We checked and saw the following error messages:

2023-03-08T17:55:58.778+ 7f8062f15700 0 WARNING: set_req_state_err err_no=125 resorting to 500
2023-03-08T17:55:58.778+ 7f8062f15700 0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Bad file descriptor
2023-03-08T17:55:58.778+ 7f8062f15700 1 == req done req=0x7f81d0189700 op status=-125 http_status=500 latency=65003730017ns ==
2023-03-08T17:55:58.778+ 7f8062f15700 1 beast: 0x7f81d0189700: IPADDRESS - - [2023-03-08T17:55:58.778961+] "PUT /BUCKET/OBJECT HTTP/1.1" 500 57 - "aws-sdk-php/3.257.11 OS/Linux/5.15.0-60-generic lang/php/8.2.3 GuzzleHttp/7" -

It only happened to a single bucket over a period of 1-2 hours (around 300 requests). In the same time frame we had >20k PUT requests that were working fine on other buckets. This error also seems to happen to other buckets, but only very sporadically.

Did someone encounter this issue or know what it could be?

Cheers
Boris
[ceph-users] Re: LRC k6m3l3, rack outage and availability
Hi,

I haven't had the chance to play with LRC yet, so I can't really comment on that. But can you share your osd tree as well? I assume you already did, but can you verify that the crush rule works as expected and the chunks are distributed correctly?

Regards,
Eugen

Quoting steve.bake...@gmail.com:

Hi,

currently we are testing LRC codes, and I have a cluster setup with 3 racks and 4 hosts in each of them. What I want to achieve is a storage-efficient erasure code (<=200% overhead) that also stays available during a rack outage. In (my) theory, that should have worked with the LRC profile k6m3l3 with crush-locality=rack and crush-failure-domain=host. But when I tested it, the PGs of the pool all went into the "down" state.

So, when we've got k=6 data chunks and m=3 coding chunks, the data should be reconstructable from 6 of these 9 objects. With l=3, LRC splits these 9 objects into 3 groups of 3 objects and creates one additional locality chunk per group. We now have 3 groups of 4 objects. These 3 groups get distributed over the 3 racks, and the 4 objects of each group get distributed over the 4 hosts of a rack. I thought that on a full rack outage, the 6 remaining k/m chunks on the other 2 racks should still be enough to keep up availability, and the cluster could proceed in a degraded state. But it does not, so I guess my thinking is wrong :)

I wonder what the reason for this is; is it maybe some min_size setting? The default min_size of this pool becomes 7. I also changed that to 6 (yes, one shouldn't do that in production) but got the same result. Below I've added some details about the cluster, the pool creation, and the pg dumps. Any ideas? Can someone explain why this does not work, or suggest another solution to achieve the described specifications? Thanks!

Ceph version:

ceph --version
ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)

# Creation of the pool:
ceph osd erasure-code-profile set lrc_individual_profile plugin=lrc k=6 m=3 l=3 crush-failure-domain=host crush-locality=rack crush-root=default
ceph osd pool create lrc_individual_pool 1024 1024 erasure lrc_individual_profile
ceph osd pool set lrc_individual_pool pg_num 1024
ceph osd pool set lrc_individual_pool pg_num_min 1024
ceph osd pool set lrc_individual_pool pgp_num 1024
ceph osd pool set lrc_individual_pool pg_autoscale_mode warn
ceph osd pool set lrc_individual_pool bulk true

## Resulting pool details:
ceph osd pool ls detail
pool 72 'lrc_individual_pool' erasure profile lrc_individual_profile size 12 min_size 7 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 140484 flags hashpspool,bulk stripe_width 24576 pg_num_min 1024

ceph osd pool get lrc_individual_pool all
size: 12
min_size: 7
pg_num: 1024
pgp_num: 1024
crush_rule: lrc_individual_pool
hashpspool: true
allow_ec_overwrites: false
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: lrc_individual_profile
fast_read: 0
pg_autoscale_mode: warn
pg_num_min: 1024
bulk: true

# Resulting crush rule:
ceph osd crush rule dump lrc_individual_pool
{
    "rule_id": 1,
    "rule_name": "lrc_individual_pool",
    "ruleset": 1,
    "type": 3,
    "min_size": 3,
    "max_size": 12,
    "steps": [
        { "op": "set_chooseleaf_tries", "num": 5 },
        { "op": "set_choose_tries", "num": 100 },
        { "op": "take", "item": -1, "item_name": "default" },
        { "op": "choose_indep", "num": 3, "type": "rack" },
        { "op": "chooseleaf_indep", "num": 4, "type": "host" },
        { "op": "emit" }
    ]
}

Ceph status after the rack outage:

cluster:
    id: ...
    health: HEALTH_WARN
            96 osds down
            4 hosts (96 osds) down
            1 rack (96 osds) down
            Reduced data availability: 1024 pgs inactive, 1024 pgs down

services:
    mon: 3 daemons, quorum ...,...,... (age 4d)
    mgr: ...(active, since 4d), standbys: ...,...
    osd: 288 osds: 192 up (since 116s), 288 in (since 21h)

data:
    pools: 2 pools, 1025 pgs
    objects: 291 objects, 0 B
    usage: 199 GiB used, 524 TiB / 524 TiB avail
    pgs: 99.902% pgs not active
         1024 down
         1 active+clean

# Section of pg dump:
72.32 0 0 0 00 00 0 0 0 down 2023-03-08T09:04:02.992141+0100
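Counting the chunks in the question above (illustrative arithmetic only; it restates why the "down" state is surprising rather than explaining it):

```shell
# Chunk bookkeeping for the k6m3l3 profile above.
k=6 m=3 l=3 racks=3
km=$(( k + m ))                      # 9 data+coding chunks
locality=$(( km / l ))               # 3 locality chunks, one per group
total=$(( km + locality ))           # 12 chunks = pool size, 4 per rack
per_rack=$(( total / racks ))
surviving=$(( total - per_rack ))    # a full rack outage leaves 8 chunks
surviving_km=$(( km - km / racks ))  # 6 of them are k/m chunks, i.e. >= k
echo "surviving=$surviving (min_size=7), surviving k/m=$surviving_km (k=$k)"
```

By this count 8 >= min_size and 6 >= k, so one would expect the PGs to stay active in a degraded state, which is exactly the puzzle posed above.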
[ceph-users] Re: restoring ceph cluster from osds
Hi,

there's no attachment to your email; please use something like pastebin to provide the OSD logs.

Thanks
Eugen

Quoting Ben:

Hi,

I ended up with the whole set of OSDs to get the original Ceph cluster back. I figured out how to get the cluster running; however, its status is as below:

bash-4.4$ ceph -s

cluster:
    id: 3f271841-6188-47c1-b3fd-90fd4f978c76
    health: HEALTH_WARN
            7 daemons have recently crashed
            4 slow ops, oldest one blocked for 35077 sec, daemons [mon.a,mon.b] have slow ops.

services:
    mon: 3 daemons, quorum a,b,d (age 9h)
    mgr: b(active, since 14h), standbys: a
    osd: 4 osds: 0 up, 4 in (since 9h)

data:
    pools: 0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage: 0 B used, 0 B / 0 B avail
    pgs:

All OSDs are down.

I checked the OSD logs and attached them to this mail.

Please help; I wonder whether it's possible to get the cluster back. I have some backup of the monitors' data; so far I haven't restored it in the process.

Thanks,
Ben