I think I have tracked the problem down to missing read permissions on the /etc/shadow file in the squid container image:

    docker exec -it ceph-eac6da30-d72a-11f0-88be-7cc255639332-osd-9 ls -l /etc/shadow
    ---------- 1 root root 583 Nov 4 16:31 /etc/shadow

I have verified that the same situation exists in the tentacle images.

If I make the file readable by root:

    docker exec -it ceph-eac6da30-d72a-11f0-88be-7cc255639332-osd-9 chmod u+r /etc/shadow
    docker exec -it ceph-eac6da30-d72a-11f0-88be-7cc255639332-osd-9 ls -l /etc/shadow
    -r-------- 1 root root 583 Nov 4 16:31 /etc/shadow

then the sudo command needed by the osd works perfectly.

Looking at the host's dmesg, I finally noticed that the problem is with the default AppArmor profile called unix-chkpwd:

    [Mon Dec 15 09:52:25 2025] audit: type=1400 audit(1765792348.851:248): apparmor="DENIED" operation="open" class="file" profile="unix-chkpwd" name="/dev/console" pid=1028560 comm="unix_chkpwd" requested_mask="w" denied_mask="w" fsuid=0 ouid=0

I disabled the profile by running these commands on the host:

    mkdir /etc/apparmor.d/disable/
    ln -s /etc/apparmor.d/unix-chkpwd /etc/apparmor.d/disable/
    apparmor_parser -R /etc/apparmor.d/unix-chkpwd

After getting rid of the AppArmor profile, all the osd containers started working perfectly.

It's quite possible that there is a more elegant solution, but I can't be bothered to find it at the moment.

On Fri, 12 Dec 2025 at 11:58, Flemming Frandsen via ceph-users <[email protected]> wrote:

> I've been building a new cluster with cephadm, the OS is Ubuntu 24.04 and
> I'm using the ubuntu provided host packages, docker is 29.1.2 and
> containerd is 2.2.0 and the ceph release is squid 19.2.3.
>
> Everything seems to work just perfectly, except for scrape-health-metrics,
> which records this result (for all of the osds in the 4 hosts in the
> cluster):
>
>     "20251212-074916": {
>         "dev": "/dev/nvme5n1",
>         "error": "smartctl failed",
>         "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1",
>         "nvme_smart_health_information_add_log_error_code": -22,
>         "nvme_vendor": "samsung",
>         "smartctl_error_code": -22,
>         "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
>     },
>
> Digging through the source I found that it's the OSD container that runs the
> command:
>
>     sudo /usr/sbin/smartctl -x --json=o /dev/nvme5n1
>
> I can run the command in the container using:
>
>     docker exec -it -u ceph ceph-eac6da30-d72a-11f0-88be-7cc255639332-osd-3 sudo /usr/sbin/smartctl -x --json=o /dev/nvme5n1
>
> The result of the manual command is:
>
>     sudo: PAM account management error: Authentication service cannot retrieve authentication info
>     sudo: a password is required
>
> I have enabled debugging for sudo in the container and verified that the
> command the osd is running is the same one and that the error is the same
> as well. Running ceph device scrape-health-metrics causes this sudo debug
> output:
>
>     Dec 12 10:30:35 sudo[845] user command "/usr/sbin/smartctl -x --json=o /dev/nvme5n1" matches sudoers command "/usr/sbin/smartctl -x --json=o /dev/*": true @ command_matches() ./match_command.c:667
>     Dec 12 10:30:35 sudo[845] userspec matched @ /etc/sudoers.d/ceph-smartctl:4:57: allowed @ sudoers_lookup_check() ./parse.c:167
>     Dec 12 10:30:35 sudo[845] sudo_putenv: SUDO_COMMAND=/usr/sbin/smartctl -x --json=o /dev/nvme5n1
>     Dec 12 10:30:35 sudo[845] <- new_logline @ ./eventlog.c:218 := PAM account management error: Authentication service cannot retrieve authentication info ; TTY=pts/8 ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -x --json=o /dev/nvme5n1
>
> My only guess is that I have chosen a too-new Ubuntu version for Ceph and I
> should just bite the bullet and re-install with Ubuntu 22.04, but if anyone
> has a better idea, please let me know.
>
> --
> Flemming Frandsen - YAPH - http://osaa.dk - http://dren.dk/
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

--
Flemming Frandsen - YAPH - http://osaa.dk - http://dren.dk/
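
For anyone who wants to check whether a host is hitting the same kind of denial before disabling a profile, here is a minimal sketch that summarises AppArmor "DENIED" audit lines like the dmesg line quoted in this message. The function name aa_denials is made up for illustration and is not part of Ceph or AppArmor. It only assumes the key="value" layout seen in that one log line, so denial records of a different shape (for example ones without a name= field) are silently skipped.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph or AppArmor.
# Reads kernel audit lines on stdin and prints one summary line per
# AppArmor denial, in the form:
#   profile=<profile> comm=<comm> operation=<operation> name=<name>
# Assumes the field order seen in the quoted dmesg line:
#   operation=... profile=... name=... comm=...
aa_denials() {
    sed -n '/apparmor="DENIED"/s/.*operation="\([^"]*\)".*profile="\([^"]*\)".*name="\([^"]*\)".*comm="\([^"]*\)".*/profile=\2 comm=\4 operation=\1 name=\3/p'
}

# Example use on the host (typically needs root to read the kernel log):
#   dmesg | aa_denials | sort | uniq -c
```

Piping the output through sort and uniq -c, as in the comment, groups identical denials so a recurring one such as the unix-chkpwd case stands out.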
