On Tue Dec 23, 2025 at 1:43 PM CET, Maximiliano Sandoval wrote:
> "Max R. Carrara" <[email protected]> writes:
>
> > Fix #6816: Prevent ceph-exporter Daemon from Crashing on Startup - v2
> > =====================================================================
> >
> > tl;dr: Stop ceph-exporter.service from ending up in a crash loop by
> > handing it a custom keyring file and setting its group to `www-data`,
> > similar to what we did for ceph-crash.service [0] before.
> >
> > This is a refresh of a somewhat older series that has been rebased, with
> > the version guard in `debian/postinst` adapted. The description from the
> > previous version is provided here again for the reader's convenience.
> >
> > Currently, the `ceph-exporter` daemon ends up in a short startup crash
> > loop before ultimately failing to start at all, because it tries to
> > access the keyring file at `/etc/pve/priv/ceph.client.admin.keyring`,
> > for which it doesn't have the permissions to do so.
> >
> > Instead of giving it access to the admin ring, give it its own keyring
> > located at `/etc/pve/ceph/ceph.client.exporter.keyring`. This file and
> > its corresponding section in `/etc/pve/ceph.conf` is created when the
> > first MON is created via the API. If the cluster has already been set
> > up, a postinst hook creates the keyring file and adapts
> > `/etc/pve/ceph.conf` instead.
> >
> > The core logic of all of this was already added for `ceph-crash` a while
> > ago [0] and is reused throughout the series, with some alterations to
> > the original code in order to make it a little more generic.
>
> I tested this series and it works as advertised modulo a race condition:
>
> When the ceph-exporter unit is started before installing this series it
> will fail and systemd will retry a handful of times, during this time
> `systemctl is-failed ceph-exporter.service` returns 'activating' instead
> of 'failed'. This might explain that then the reset-failed is never
> called. This results in ceph-exporter being restarted as part of the
> postinst script but failing because the reset-failed was never called
> and there have been too many attempts already.
>
> Otherwise, it works as expected. Thanks!
>
> Tested-by: Maximiliano Sandoval <[email protected]>
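For reference, the race described above can be illustrated with a minimal shell sketch. It assumes, hypothetically, that the reset is guarded by a check that only matches the state `failed` as printed by `systemctl is-failed`; the helper `should_reset_failed` is made up for illustration and is not the code from `debian/postinst`.

```shell
#!/bin/sh
# Hypothetical helper (not from debian/postinst): decide whether to call
# `systemctl reset-failed` based on the state string printed by
# `systemctl is-failed <unit>`.
should_reset_failed() {
    # Only the literal state "failed" triggers a reset. While systemd is
    # still retrying the unit, `is-failed` prints "activating" instead,
    # so the reset is skipped.
    [ "$1" = "failed" ]
}

# In a real maintainer script the state would come from:
#   state=$(systemctl is-failed ceph-exporter.service)
for state in failed activating; do
    if should_reset_failed "$state"; then
        echo "$state: reset-failed would be called"
    else
        echo "$state: reset-failed would be skipped"
    fi
done
```

Because `activating` does not match, `reset-failed` is never run, so the unit's start rate limit counter is not flushed and the later restart from the postinst can still be refused by systemd.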
Thanks a ton for testing this! That's a really good catch.

As discussed off-list, `ceph-exporter` won't be reset and restarted
anymore in `debian/postinst`. See v3 [0] for an update.

[0]: https://lore.proxmox.com/pve-devel/[email protected]/

_______________________________________________
pve-devel mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
