On 09/01/2026 13:22, Loïc Tortay via ceph-users wrote:
On 09/01/2026 13:15, Chris Palmer via ceph-users wrote:
I have a squid package-installed test cluster that I am trying to
adopt into cephadm.
Monitors & managers have been adopted, and the orchestrator is
functioning.
When I try to adopt an OSD though (e.g. osd.0), cephadm creates the
new directory (<fsid>/osd.0) and populates the files from the legacy
directory. It then crashes, complaining that the directory osd/ceph-0
is not empty and so cannot be removed. In some cases none of the files
have been removed, and in others the block symlink and
require_osd_release file remain. I can remove the new <fsid>/osd.0
and use ceph-volume to activate the OSD again.
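For reference, the recovery is roughly the following (just a sketch;
<fsid> is the cluster fsid, osd.0 is the example OSD, and ceph-volume
may start the unit by itself):
rm -rf /var/lib/ceph/<fsid>/osd.0   # drop the half-populated cephadm directory
ceph-volume lvm activate --all      # re-activate the legacy OSD(s)
systemctl start ceph-osd@0          # start the unit if it wasn't started automatically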
cephadm.log reports that every file has been moved and the ownership
set - even for files that still remain in the legacy directory. It
doesn't report any other errors (apart from the directory-not-empty
error when it crashes).
I really can't find where to go from here. Any pointers appreciated...
Hello,
We've had the same issue, with clusters initially installed/managed
with Ceph-Ansible.
The issue (w/ Ceph-Ansible) seems to be related to some failure during
the initial configuration, which left multiple mounts on the same
"/var/lib/ceph/$CLUSTERID/osd.$OSD" directory (it's an active
mountpoint, so it can't be removed).
"mount | grep /var/lib/ceph | sort | uniq -c | sort -n" shows which OSDs
have multiple mounts (the mount count is in the first field).
After the failed adoption, our workaround/recovery procedure was the
following ($NOK is the OSD number):
# Move the configuration out of the way (do not remove it)
mv -v /var/lib/ceph/$CLUSTERID/osd.$NOK /var/lib/ceph/osd.$NOK.nok
# If there are more than 2 mounts on the same directory, umount all but
# one to allow the adoption process to complete (the configuration will
# not be valid, the OSD won't start)
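# A rough sketch of that umount step (adjust the path to your layout);
# it drops the topmost mount until a single one is left:
#   while [ "$(mount | grep -c " /var/lib/ceph/$CLUSTERID/osd.$NOK ")" -gt 1 ]; do
#       umount /var/lib/ceph/$CLUSTERID/osd.$NOK
#   done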
cephadm adopt --style legacy --name osd.$NOK
# Copy the correct configuration files to the adopted OSD directory
cd /var/lib/ceph/$CLUSTERID/osd.$NOK
cp -v ../../osd.$NOK.nok/{keyring,fsid} .
# $OKOSD is the number of a successfully adopted OSD.
cp -v ../../osd.$OKOSD/config .
#
# Check that the "block*" links point to the same devices as in the
# first directory ("/var/lib/ceph/osd.$NOK.nok") and correct the links
# if they're wrong
#
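# For example, from the current directory (a sketch assuming a
# bluestore OSD with "block" and possibly "block.db"/"block.wal" links):
#   ls -l block* ../../osd.$NOK.nok/block*
#   readlink -f block* ../../osd.$NOK.nok/block*
# Recreate any wrong link with "ln -sfn <correct device> block" (or
# block.db/block.wal)
#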
# Make sure the files & links belong to the correct UID for the containers
chown -hv 167:167 block*
chown -v 167:167 *
systemctl status ceph-$CLUSTERID@osd.$NOK.service
systemctl start ceph-$CLUSTERID@osd.$NOK.service
systemctl status ceph-$CLUSTERID@osd.$NOK.service
Other things we have noted after adoptions are:
- do *not* remove the packages used before the adoption; they (at
least the RPMs) have pre-remove scripts which stop the OSDs/Ceph on
the node
- clean up conflicting "logrotate" configurations from the packages &
containers, i.e. remove the configuration files provided by the
packages (example below)
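A sketch (the exact names depend on the distribution, but the packages
typically ship /etc/logrotate.d/ceph while cephadm writes
/etc/logrotate.d/ceph-$CLUSTERID):
ls -l /etc/logrotate.d/ceph*
# keep the ceph-$CLUSTERID file created by cephadm, remove the package one
rm -v /etc/logrotate.d/ceph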
Loïc.
Thanks for the info, and the last two hints! Will have to watch out for
those...
This test cluster was freshly built on squid 19.2.3 for testing this
migration, purely using the manual install method (not ceph-ansible).
I've already triple-checked the mounts, symlinks, ownerships, etc, and
we don't have any of those problems.
Some progress, though: the problem is not entirely repeatable. The files
are always replicated into the new directory correctly. Sometimes no
files are removed from the legacy directory (and it fails as it is not
empty), sometimes some are left (and it fails), and sometimes it works -
even on the same OSD.
If it fails, I can manually remove the files from the legacy directory,
and repeat the adopt. It then completes, removing the legacy directory,
although the OSD is still stopped. Manually starting it using "ceph orch
daemon start osd.n" works.
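For the record, the retry after a failure looks roughly like this (a
sketch, using osd.0 and the stock package paths):
rm -f /var/lib/ceph/osd/ceph-0/*            # clear what the failed attempt left behind
cephadm adopt --style legacy --name osd.0   # repeat the adopt; it then removes the directory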
I've also noticed that sometimes it takes a while to appear in "ceph
orch ps", and I have to wait until it does before starting it.
I'm going to do some reboot and daemon restart tests, then continue with
the rest of the OSDs...
Thanks, Chris