Hi,
I haven't seen this error yet. Did you upgrade while the cluster was
not healthy? The more history you can provide, the better.
Can you add the output of these CLI commands?
ceph -s
ceph health detail
ceph pg ls-by-pool <pool_with_id_57> (not the entire output, just
enough to see if the PGs are listed)
Before deleting a PG, I'd export it with ceph-objectstore-tool, just
in case. Then you could try to remove it from one OSD (also with
ceph-objectstore-tool) and see if that single OSD starts again. If
that works, you could do the same for the remaining PG chunks.
Downgrading is generally not supported, so you might break things even more.
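In case it helps, that export-then-remove sequence could look roughly like
the sketch below. The OSD id, data path, and backup file name are only
examples for illustration; the actual ceph-objectstore-tool invocations are
shown as comments because they must only be run while that OSD is stopped.

```shell
#!/bin/sh
# Example values -- adjust for your host (osd.21 and the default
# data path are assumptions, not taken from your cluster).
OSD_PATH=/var/lib/ceph/osd/ceph-21
PGID=57.3s7
BACKUP=/root/pg-${PGID}.export

# 1) With the OSD daemon stopped, export the PG shard as a safety copy:
#    ceph-objectstore-tool --data-path "$OSD_PATH" --pgid "$PGID" \
#        --op export --file "$BACKUP"
#
# 2) Then remove the PG shard from this one OSD and try starting it:
#    ceph-objectstore-tool --data-path "$OSD_PATH" --pgid "$PGID" \
#        --op remove --force

echo "would export $PGID from $OSD_PATH to $BACKUP"
```

If the OSD comes back up after the removal, repeat the same export/remove
for the other OSDs holding chunks of that PG, keeping every export file
until the cluster is healthy again.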
Regards,
Eugen
Quoting Daniel Williams <[email protected]>:
Some background: pool 57 is a new rbd pool (12 MiB used) that I was just
experimenting with (performance of striped hdd rbd devices). I don't think
I deleted it, but I can't say for sure, since it didn't matter to me at the
time; it still appears in ceph df.
This pool was created on reef, and a full deep scrub has completed several
times since moving to reef (March 2024). Likely no deep scrub has been done
since moving to squid, since I've had lots of troubles...
This error, however, has broken a 150 TiB machine, and worse, I don't know
that a restart won't break others.
After a host reboot I've lost half the OSDs on that host; they all refuse
to start with:
-725> 2025-09-25T18:02:37.157+0000 7f93d0aab8c0 -1 Falling back to public
interface
-2> 2025-09-25T18:02:40.033+0000 7f93d0aab8c0 -1 osd.21 2098994 init
missing pg_pool_t for deleted pool 57 for pg 57.3s7; please downgrade to
luminous and allow pg deletion to complete before upgrading
-1> 2025-09-25T18:02:40.037+0000 7f93d0aab8c0 -1 ./src/osd/OSD.cc: In
function 'int OSD::init()' thread 7f93d0aab8c0 time
2025-09-25T18:02:40.040491+0000
./src/osd/OSD.cc: 3867: ceph_abort_msg("abort() called")
ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid
(stable)
1: (ceph::__ceph_abort(char const*, int, char const*,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&)+0xb7) [0x560b78a7056a]
2: /usr/bin/ceph-osd(+0x385bcb) [0x560b789f0bcb]
3: main()
4: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f93d165dd90]
5: __libc_start_main()
6: _start()
0> 2025-09-25T18:02:40.037+0000 7f93d0aab8c0 -1 *** Caught signal
(Aborted) **
in thread 7f93d0aab8c0 thread_name:ceph-osd
ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid
(stable)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f93d1676520]
2: pthread_kill()
3: raise()
4: abort()
5: (ceph::__ceph_abort(char const*, int, char const*,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&)+0x16a) [0x560b78a7061d]
6: /usr/bin/ceph-osd(+0x385bcb) [0x560b789f0bcb]
7: main()
8: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f93d165dd90]
9: __libc_start_main()
10: _start()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
Aborted
Will deleting the PG help? Is there any way I can recover these OSDs? Will
moving back to reef help?
Daniel
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]