On 09/01/2026 13:22, Loïc Tortay via ceph-users wrote:
On 09/01/2026 13:15, Chris Palmer via ceph-users wrote:
I have a squid package-installed test cluster that I am trying to adopt into cephadm.

Monitors & managers have been adopted, and the orchestrator is functioning.

When I try to adopt an OSD though (e.g. osd.0), cephadm creates the new directory (<fsid>/osd.0) and populates it with the files from the legacy directory. It then crashes, complaining that the legacy directory osd/ceph-0 is not empty and so cannot be removed. In some cases none of the files have been removed, in others only the block symlink and the require_osd_release file remain. I can then remove the new <fsid>/osd.0 and use ceph-volume to activate the OSD again.

cephadm.log reports that every file has been moved and the ownership set, even for files that still remain in the legacy directory. It doesn't report any other errors (apart from the directory-not-empty error when it crashes).

I really can't find where to go from here. Any pointers appreciated...


Hello,
We've had the same issue, with clusters initially installed/managed with Ceph-Ansible.

The issue (with Ceph-Ansible) seems to be related to some failure during the initial configuration that left multiple mounts stacked on the same "/var/lib/ceph/$CLUSTERID/osd.$OSD" directory (it remains an active mountpoint, so it cannot be removed).
"mount|grep /var/lib/ceph|sort|uniq -c|sort -n" shows which OSDs
have multiple mounts (first field).
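
To look at one OSD in detail, something like this is enough (a rough sketch, not tested as written; $OSD being the affected OSD number):

# All mount entries stacked on one OSD directory (normally there is exactly one)
grep " /var/lib/ceph/$CLUSTERID/osd.$OSD " /proc/mounts
# umount removes the topmost entry; repeat until a single one is left
umount /var/lib/ceph/$CLUSTERID/osd.$OSD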

After the failed adoption, our workaround/recovery procedure was the following ($NOK is the OSD number):
# Move the configuration out of the way (do not remove it)
mv -v /var/lib/ceph/$CLUSTERID/osd.$NOK /var/lib/ceph/osd.$NOK.nok
# If there are more than 2 mounts on the same directory, umount all but
# one to allow the adoption process to complete (the configuration will
# not be valid, the OSD won't start)
cephadm adopt --style legacy --name osd.$NOK
# Copy the correct configuration files to the adopted OSD directory
cd /var/lib/ceph/$CLUSTERID/osd.$NOK
cp -v ../../osd.$NOK.nok/{keyring,fsid} .
# $OKOSD is the number of a successfully adopted OSD.
cp -v ../osd.$OKOSD/config .
#
# Check the "block*" links point to the same devices as in the first
# directory ("/var/lib/ceph/osd.$NOK.nok"), correct the links if
# they're wrong
#
# Make sure the files & links belong to the correct UID for the containers
chown -hv 167:167 block*
chown -v 167:167 *

systemctl status ceph-$CLUSTERID@osd.$NOK.service
systemctl start ceph-$CLUSTERID@osd.$NOK.service
systemctl status ceph-$CLUSTERID@osd.$NOK.service
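
For the "block*" check above, a comparison along these lines is what we use (a sketch, adapt the paths to your layout):

# Where the adopted directory points
readlink -f /var/lib/ceph/$CLUSTERID/osd.$NOK/block*
# Where the original configuration pointed
readlink -f /var/lib/ceph/osd.$NOK.nok/block*
# ceph-volume (the packages are still installed at this point) lists the
# devices/LVs belonging to each OSD on the node
ceph-volume lvm list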

Other things we have noted after adoptions are:
- do *not* remove the packages used before the adoption; they (at least the RPMs) have pre-remove scripts which stop the OSDs/Ceph on the node
- clean up the conflicting "logrotate" configurations from the packages & containers, i.e. remove the configuration files provided by the packages (see the sketch below)
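
For the logrotate part, on our nodes it came down to something like this (the exact paths may differ, check before removing anything):

# logrotate configuration installed by the Ceph packages
rm -v /etc/logrotate.d/ceph
# cephadm writes its own per-cluster configuration, which stays
cat /etc/logrotate.d/ceph-$CLUSTERID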


Loïc.


Thanks for the info, and the last two hints! Will have to watch out for those...

This test cluster was freshly built on squid 19.2.3 for testing this migration, purely using the manual install method (not ceph-ansible). I've already triple-checked the mounts, symlinks, ownerships, etc, and we don't have any of those problems.
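
For reference, the per-OSD checks look roughly like this (osd.0 as the example):

mount | grep /var/lib/ceph | sort | uniq -c | sort -n    # duplicate mounts
ls -l /var/lib/ceph/osd/ceph-0/block                     # block symlink target
ls -ln /var/lib/ceph/osd/ceph-0/                         # numeric ownership of the files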

Some progress though. The problem is not entirely repeatable. The files are always replicated into the new directory correctly. Sometimes no files are removed from the legacy directory (and it fails as it is not empty), sometimes some are left (and it fails), and sometimes it works. Even on the same OSD.

If it fails, I can manually remove the files from the legacy directory, and repeat the adopt. It then completes, removing the legacy directory, although the OSD is still stopped. Manually starting it using "ceph orch daemon start osd.n" works.
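
In other words the manual recovery amounts to roughly this (osd.0 as the example, and only after confirming that everything was copied into <fsid>/osd.0):

# Clear the leftovers so the adopt can remove the legacy directory
rm -v /var/lib/ceph/osd/ceph-0/*
cephadm adopt --style legacy --name osd.0
# The adopted daemon stays stopped until started through the orchestrator
ceph orch daemon start osd.0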

I've also noticed that sometimes it takes a while for the adopted daemon to appear in "ceph orch ps", and I have to wait until it does before starting it.
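
Something like this covers the wait (again a rough sketch for osd.0):

# Wait for the adopted daemon to show up in the orchestrator's inventory
until ceph orch ps | grep -q '^osd\.0 '; do sleep 10; done
ceph orch daemon start osd.0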

I'm going to do some reboot and daemon restart tests, then continue on the rest of the OSDs....

Thanks, Chris
