On 09/01/2026 13:22, Loïc Tortay via ceph-users wrote:
On 09/01/2026 13:15, Chris Palmer via ceph-users wrote:
I have a squid package-installed test cluster that I am trying to adopt into cephadm.

Monitors & managers have been adopted, and the orchestrator is functioning.

When I try to adopt an OSD though (e.g. osd.0), cephadm creates the new directory (<fsid>/osd.0) and populates it with the files from the legacy directory. It then crashes, complaining that the legacy directory osd/ceph-0 is not empty and so cannot be removed. In some cases none of the files have been removed, in others only the block symlink and the require_osd_release file remain. I can then remove the new <fsid>/osd.0 and use ceph-volume to activate the OSD again.

cephadm.log reports that every file has been moved and the ownership set, even for files that still remain in the legacy directory. It doesn't report any other errors (apart from the directory-not-empty error when it crashes).

I really can't find where to go from here. Any pointers appreciated...


Hello,
We've had the same issue, with clusters initially installed/managed with Ceph-Ansible.

The issue (with Ceph-Ansible) seems to be related to some failure during the initial configuration that left multiple mounts stacked on the same "/var/lib/ceph/$CLUSTERID/osd.$OSD" directory (it remains an active mountpoint, so it cannot be removed).
"mount|grep /var/lib/ceph|sort|uniq -c|sort -n" shows which OSDs
have multiple mounts (first field).
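
To look at one OSD in detail, something like this is enough (a rough sketch, not tested as written; $OSD being the affected OSD number):

# All mount entries stacked on one OSD directory (normally there is exactly one)
grep " /var/lib/ceph/$CLUSTERID/osd.$OSD " /proc/mounts
# umount removes the topmost entry; repeat until a single one is left
umount /var/lib/ceph/$CLUSTERID/osd.$OSD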

After the failed adoption, our workaround/recovery procedure was the following ($NOK is the OSD number):
# Move the configuration out of the way (do not remove it)
mv -v /var/lib/ceph/$CLUSTERID/osd.$NOK /var/lib/ceph/osd.$NOK.nok
# If there are more than 2 mounts on the same directory, umount all but
# one to allow the adoption process to complete (the configuration will
# not be valid, the OSD won't start)
cephadm adopt --style legacy --name osd.$NOK
# Copy the correct configuration files to the adopted OSD directory
cd /var/lib/ceph/$CLUSTERID/osd.$NOK
cp -v ../../osd.$NOK.nok/{keyring,fsid} .
# $OKOSD is the number of a successfully adopted OSD.
cp -v ../osd.$OKOSD/config .
#
# Check the "block*" links point to the same devices as in the first
# directory ("/var/lib/ceph/osd.$NOK.nok"), correct the links if
# they're wrong
#
# Make sure the files & links belong to the correct UID for the containers
chown -hv 167:167 block*
chown -v 167:167 *

systemctl status ceph-$CLUSTERID@osd.$NOK.service
systemctl start ceph-$CLUSTERID@osd.$NOK.service
systemctl status ceph-$CLUSTERID@osd.$NOK.service
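
For the "block*" check above, a comparison along these lines is what we use (a sketch, adapt the paths to your layout):

# Where the adopted directory points
readlink -f /var/lib/ceph/$CLUSTERID/osd.$NOK/block*
# Where the original configuration pointed
readlink -f /var/lib/ceph/osd.$NOK.nok/block*
# ceph-volume (the packages are still installed at this point) lists the
# devices/LVs belonging to each OSD on the node
ceph-volume lvm list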

Other things we have noted after adoptions are:
- do *not* remove the packages used before the adoption; they (at least the RPMs) have pre-remove scripts which stop the OSDs/Ceph on the node
- clean up the conflicting "logrotate" configurations from the packages & containers, i.e. remove the configuration files provided by the packages (see the sketch below)
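
For the logrotate part, on our nodes it came down to something like this (the exact paths may differ, check before removing anything):

# logrotate configuration installed by the Ceph packages
rm -v /etc/logrotate.d/ceph
# cephadm writes its own per-cluster configuration, which stays
cat /etc/logrotate.d/ceph-$CLUSTERID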


Loïc.


Thanks for the info, and the last two hints! Will have to watch out for those...

This test cluster was freshly built on squid 19.2.3 for testing this migration, purely using the manual install method (not ceph-ansible). I've already triple-checked the mounts, symlinks, ownerships, etc, and we don't have any of those problems.
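
For reference, the per-OSD checks look roughly like this (osd.0 as the example):

mount | grep /var/lib/ceph | sort | uniq -c | sort -n    # duplicate mounts
ls -l /var/lib/ceph/osd/ceph-0/block                     # block symlink target
ls -ln /var/lib/ceph/osd/ceph-0/                         # numeric ownership of the files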

Some progress though. The problem is not entirely repeatable. The files are always replicated into the new directory correctly. Sometimes no files are removed from the legacy directory (and it fails as it is not empty), sometimes some are left (and it fails), and sometimes it works. Even on the same OSD.

If it fails, I can manually remove the files from the legacy directory, and repeat the adopt. It then completes, removing the legacy directory, although the OSD is still stopped. Manually starting it using "ceph orch daemon start osd.n" works.
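
In other words the manual recovery amounts to roughly this (osd.0 as the example, and only after confirming that everything was copied into <fsid>/osd.0):

# Clear the leftovers so the adopt can remove the legacy directory
rm -v /var/lib/ceph/osd/ceph-0/*
cephadm adopt --style legacy --name osd.0
# The adopted daemon stays stopped until started through the orchestrator
ceph orch daemon start osd.0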

I've also noticed that sometimes it takes a while for the adopted daemon to appear in "ceph orch ps", and I have to wait until it does before starting it.
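
Something like this covers the wait (again a rough sketch for osd.0):

# Wait for the adopted daemon to show up in the orchestrator's inventory
until ceph orch ps | grep -q '^osd\.0 '; do sleep 10; done
ceph orch daemon start osd.0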

I'm going to do some reboot and daemon restart tests, then continue on the rest of the OSDs....

Thanks, Chris
