Script 'mail_helper' called by obssrc
Hello community,

here is the log from the commit of package health-checker for openSUSE:Factory checked in at 2025-11-04 18:40:20
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/health-checker (Old)
 and /work/SRC/openSUSE:Factory/.health-checker.new.1980 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "health-checker" Tue Nov 4 18:40:20 2025 rev:27 rq:1314047 version:1.13+git20251028.c9a2249 Changes: -------- --- /work/SRC/openSUSE:Factory/health-checker/health-checker.changes 2024-12-09 21:09:41.364245822 +0100 +++ /work/SRC/openSUSE:Factory/.health-checker.new.1980/health-checker.changes 2025-11-04 18:40:22.012400122 +0100 @@ -1,0 +2,8 @@ +Tue Oct 28 10:20:17 UTC 2025 - Danilo Spinella <[email protected]> + +- Update to version 1.13+git20251028.c9a2249: + * Release version 1.13 + * Use Automatic Boot Assessment for systemd-boot/grub2-bls + * README: clearify services in After section + +------------------------------------------------------------------- Old: ---- health-checker-1.12+git20241105.2e2832f15742.obscpio New: ---- health-checker-1.13+git20251028.c9a2249.obscpio ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ health-checker.spec ++++++ --- /var/tmp/diff_new_pack.Ay3i4m/_old 2025-11-04 18:40:24.628510254 +0100 +++ /var/tmp/diff_new_pack.Ay3i4m/_new 2025-11-04 18:40:24.632510424 +0100 @@ -1,7 +1,7 @@ # # spec file for package health-checker # -# Copyright (c) 2024 SUSE LLC +# Copyright (c) 2025 SUSE LLC and contributors # # All modifications and additions to the file contributed by third parties # remain the property of their copyright owners, unless otherwise agreed @@ -19,7 +19,7 @@ %define _dracutmoduledir %(pkg-config --variable=dracutmodulesdir dracut) Name: health-checker -Version: 1.12+git20241105.2e2832f15742 +Version: 1.13+git20251028.c9a2249 Release: 0 Summary: Service for verifying that important services are running License: GPL-2.0-only @@ -74,6 +74,12 @@ %install %make_install +mkdir -p %{buildroot}%{_sysconfdir}/kernel +echo '3' > %{buildroot}%{_sysconfdir}/kernel/tries +# Add autoreboot on kernel panic +mkdir -p %{buildroot}%{_prefix}/lib/sysctl.d +echo 'panic=5' > %{buildroot}%{_prefix}/lib/sysctl.d/health-checker.conf +%fdupes %{buildroot}%{_mandir} 
%fdupes %{buildroot}%{_mandir} %pre @@ -108,6 +114,9 @@ %dir %{_sysconfdir}/grub.d %{_sysconfdir}/grub.d/*_health_check* %{_dracutmoduledir}/50health-checker +%dir %{_sysconfdir}/kernel +%config(noreplace) %{_sysconfdir}/kernel/tries +%{_prefix}/lib/sysctl.d/%{name}.conf %files plugins-MicroOS %{_libexecdir}/health-checker/etc-overlayfs.sh ++++++ _service ++++++ --- /var/tmp/diff_new_pack.Ay3i4m/_old 2025-11-04 18:40:24.796517328 +0100 +++ /var/tmp/diff_new_pack.Ay3i4m/_new 2025-11-04 18:40:24.836519012 +0100 @@ -1,7 +1,7 @@ <services> <service name="obs_scm" mode="manual"> - <param name="version">1.12</param> - <param name="versionformat">1.12+git%cd.%h</param> + <param name="version">1.13</param> + <param name="versionformat">1.13+git%cd.%h</param> <param name="url">https://github.com/openSUSE/health-checker.git</param> <param name="scm">git</param> <param name="changesgenerate">enable</param> ++++++ _servicedata ++++++ --- /var/tmp/diff_new_pack.Ay3i4m/_old 2025-11-04 18:40:24.956524063 +0100 +++ /var/tmp/diff_new_pack.Ay3i4m/_new 2025-11-04 18:40:24.996525748 +0100 @@ -1,7 +1,7 @@ <servicedata> <service name="tar_scm"> <param name="url">https://github.com/openSUSE/health-checker.git</param> - <param name="changesrevision">2e2832f157429be17ddf7bb21976eee2cda251d6</param></service> + <param name="changesrevision">c9a224963060436a7991e26d0235a77ba267f70e</param></service> </servicedata> (No newline at EOF) ++++++ health-checker-1.12+git20241105.2e2832f15742.obscpio -> health-checker-1.13+git20251028.c9a2249.obscpio ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/health-checker-1.12+git20241105.2e2832f15742/NEWS new/health-checker-1.13+git20251028.c9a2249/NEWS --- old/health-checker-1.12+git20241105.2e2832f15742/NEWS 2024-11-05 09:00:04.000000000 +0100 +++ new/health-checker-1.13+git20251028.c9a2249/NEWS 2025-10-28 11:06:25.000000000 +0100 @@ -2,6 +2,9 @@ Copyright (C) 2017-2024 Thorsten Kukuk et al. 
+Version 1.13 +* Use Automatic Boot Assessment for systemd-boot and grub2-bls + Version 1.12 * Improve RPM database consistency check: Use rpm command directly to prevent blocking when there is no zypper database yet or when the diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/health-checker-1.12+git20241105.2e2832f15742/README.md new/health-checker-1.13+git20251028.c9a2249/README.md --- old/health-checker-1.12+git20241105.2e2832f15742/README.md 2024-11-05 09:00:04.000000000 +0100 +++ new/health-checker-1.13+git20251028.c9a2249/README.md 2025-10-28 11:06:25.000000000 +0100 @@ -6,14 +6,47 @@ ## How does this work? `health-checker` will be called by a systemd service during the boot -process. All services, which should be checked, needs to be listed in the -'After' section. +process. -The `health-checker` script will call several plugins. Every plugin is -responsible to check a special service or condition. For this, the plugin is -called with the option *check*. If this fails, the plugin will exit with the -return value `1`, else `0`. -If everyting was fine, the script will create a +The `health-checker` script will call several plugins; each plugin is +responsible to check a special service or condition. To run the check, the +plugin is called with the option *check* (e.g. +`/usr/libexec/health-checker/tmp.sh check`). If this fails, the plugin will +exit with the return value `1`, else `0`. The `health-checker` script will call +all plugins inside `/usr/libexec/health-checker` and +`/usr/local/libexec/health-checker` directories; the former directory includes +plugins installed by system packages (and therefore coming through an RPM), the +latter includes plugins installed manually by the system admin. All services, +which should be checked by the plugins, needs to be listed in the 'After' +section, so that they are started before `health-checker`. 
+ +Its behavior depends if the system is using systemd-boot, grub2-bls or any +other bootloader following the Boot Loader Specification (BLS) or legacy. + +### systemd-boot and grub2-bls + +On systemd-boot and grub2-bls, when the BLS is followed, `health-checker` +assess the status of the system at the boot and act accordingly. It uses the +[Automatic Boot Assessment](https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT/) +provided by systemd to check every new snapshot, to reboot a predetermined +amount of times, and then it lets the user access an emergency shell when +everything else fails. + +Every new snapshot has a separate boot entry with a boot counter (according to +`/etc/kernel/tries`, which health-checker sets to 3 by default); when that +snapshot is booted for the first time, the bootloader will decrease the amount +of tries left. Then if health-checker succeed, systemd will call +`systemd-bless-boot`, which will mark the new snapshot as working. If instead +`health-checker` fails, the system will reboot into the next available +snapshot, performing a rollback at boot. The bootloader will order the snapshot +depending if they are new/working, leaving all the non working snapshots last +(so with a non 0 boots available left). If an entry that was working previously +is now broken, `health-checker` will try rebooting once before starting an +emergency shell. + +### Legacy grub + +If everything was fine, the script will create a `/var/lib/misc/health-check.state` file with the number of the current, working btrfs subvolume with the root filesystem. 
If a plugin reports an error condition, the `health-checker` script will take diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/health-checker-1.12+git20241105.2e2832f15742/configure.ac new/health-checker-1.13+git20251028.c9a2249/configure.ac --- old/health-checker-1.12+git20241105.2e2832f15742/configure.ac 2024-11-05 09:00:04.000000000 +0100 +++ new/health-checker-1.13+git20251028.c9a2249/configure.ac 2025-10-28 11:06:25.000000000 +0100 @@ -1,5 +1,5 @@ dnl Process this file with autoconf to produce a configure script. -AC_INIT(health-checker, 1.12) +AC_INIT(health-checker, 1.13) AM_INIT_AUTOMAKE AC_PREFIX_DEFAULT(/usr) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/health-checker-1.12+git20241105.2e2832f15742/man/health-checker.8.xml.in new/health-checker-1.13+git20251028.c9a2249/man/health-checker.8.xml.in --- old/health-checker-1.12+git20241105.2e2832f15742/man/health-checker.8.xml.in 2024-11-05 09:00:04.000000000 +0100 +++ new/health-checker-1.13+git20251028.c9a2249/man/health-checker.8.xml.in 2025-10-28 11:06:25.000000000 +0100 @@ -62,11 +62,27 @@ <refsect1 id='description'><title>DESCRIPTION</title> <para><emphasis remap='B'>health-checker</emphasis> checks if the system is coming up correctly during boot up. - In case of an error, the remediation action depends on what happened before. - If this is the first boot after a transactional update, an automatic rollback - to the last known working snapshot is executed. If the snapshot was already - rebooted successfully before, a reboot is tried. If this does not help, - some sevices are shutdown and an admin has to repair the system. + The behavior depends if the system is using systemd-boot/grub2-bls + or legacy grub2.</para> + <para>With systemd-boot/grub2-bls: the bootloader picks the first available entry + and health-checker checks that the system has booted successfully. 
When + any plugin fails, then health-checker will behave depending on the entry + booted: if it is a snapshot that has previously been booted successfully + then health-checker will try rebooting the system once more, and it will + open an emergency shell if the system still fails to boot. + If this is the first boot of a new snapshot, then the system will reboot + a number of timed configured in <filename>/etc/kernel/tries</filename>, + checking the status each time. If the system stills fails to boot, then + health-checker marks the current entry as not working and reboot. The + bootloader will pick the next available boot entry. + It is possible to disable health-checker automated rebooting by adding + <command>"health-checker-reboot=disabled"</command> to the kernel cmdline.</para> + <para>In case of an error and legacy grub2 is used, the remediation action depends + on what happened before. If this is the first boot after a transactional + update, an automatic rollback to the last known working snapshot is + executed. If the snapshot was already rebooted successfully before, a + reboot is tried. If this does not help, some services are shutdown and an + admin has to repair the system. 
</para> <para> If the boot was successful, the current snapshot is marked as known to be diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/health-checker-1.12+git20241105.2e2832f15742/plugins/health-check-tester.sh new/health-checker-1.13+git20251028.c9a2249/plugins/health-check-tester.sh --- old/health-checker-1.12+git20241105.2e2832f15742/plugins/health-check-tester.sh 2024-11-05 09:00:04.000000000 +0100 +++ new/health-checker-1.13+git20251028.c9a2249/plugins/health-check-tester.sh 2025-10-28 11:06:25.000000000 +0100 @@ -6,6 +6,11 @@ run_checks() { + # Check if the system is running legacy grub as the check file + # is only used there + if [ -d /boot/efi/loader/entries ]; then + return 0 + fi # Simple check: if this is the very first boot, succeed. # If not, fail. if [ -f /var/lib/misc/health-check.state ]; then diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/health-checker-1.12+git20241105.2e2832f15742/sbin/health-checker.in new/health-checker-1.13+git20251028.c9a2249/sbin/health-checker.in --- old/health-checker-1.12+git20241105.2e2832f15742/sbin/health-checker.in 2024-11-05 09:00:04.000000000 +0100 +++ new/health-checker-1.13+git20251028.c9a2249/sbin/health-checker.in 2025-10-28 11:06:25.000000000 +0100 @@ -21,6 +21,11 @@ SNAPSHOT_DEFAULT="" BTRFS_ID_CURRENT=0 +is_bls=0 +if [ -d /boot/efi/loader/entries ]; then + is_bls=1 +fi + set_btrfs_id() { BTRFS_ID_DEFAULT=`btrfs subvolume get-default / | awk '{print $2}'` @@ -28,6 +33,20 @@ BTRFS_ID_CURRENT=`findmnt --output OPTIONS --noheadings / | sed -e 's|.*subvolid=\([0-9]\+\).*|\1|g'` } +get_snapshot() +{ + sed -e 's|.*@/\.snapshots/\([0-9]\+\)/snapshot.*|\1|g' +} + +set_snapshot_id() +{ + SNAPSHOT_DEFAULT="$(btrfs subvolume get-default / | get_snapshot)" + SNAPSHOT_CURRENT="$(findmnt --output OPTIONS --noheadings --first-only --direction backward /usr | get_snapshot)" + if [ -z "${SNAPSHOT_CURRENT}" ]; then + 
SNAPSHOT_CURRENT="$(findmnt --output OPTIONS --noheadings --first-only --direction backward / | get_snapshot)" + fi +} + create_log() { local SEVERITY=1 @@ -103,7 +122,24 @@ done } -error_decission() +# We want to enter an emergency shell just once every boot, otherwise +# systemd restarts health-checker every time the user continues from +# the emergency shell. This causes a loop with no way to exit the emergency shell +# other than fixing the issue +start_emergency_shell() { + if [ ! -f /run/health-checker/.emergency-shell-started ]; then + create_log user.emerg "Machine didn't come up correctly, starting emergency shell" + telem_send_record 1 + mkdir /run/health-checker + touch /run/health-checker/.emergency-shell-started + stop_services + systemctl start emergency.target + else + exit 1 + fi +} + +error_decision_legacy() { if [ ! -f ${STATE_FILE} ]; then # No state file, no successful boot @@ -133,15 +169,105 @@ telem_send_record 1 systemctl reboot else - create_log user.emerg "Machine didn't come up correctly, starting emergency shell" - stop_services - systemctl start emergency.target + start_emergency_shell fi } -# Clear GRUB flag (used to determine if system was able to boot at all) -echo "Clearing GRUB flag" -grub2-editenv - set health_checker_flag=0 +systemd-bless-boot() { + if [ -x /usr/lib/systemd/systemd-bless-boot ]; then + /usr/lib/systemd/systemd-bless-boot "$@" + fi +} + +get_current_entry_bls() { + bootctl list --json=short | jq -r ".[] | select(.isSelected) | .id" +} + +error_decision_bls() { + local status should_reboot current_entry + # systemd-bless-boot returns: + # clean: boot counting is not in effect + # good: this entry booted fine. Since we are calling this at boot, + # it shouldn't be possible to return "good" + # dirty: this entry has no more tries available + # indeterminate: when an entry is neither good or bad, i.e. 
+ # we are still trying to boot 3 times + status=$(systemd-bless-boot status) + # The bootloader fills the EFI variables with the info we need + + # Get the booted entry + current_entry=$(get_current_entry_bls) + should_reboot=0 + # Do not reboot by default if the entry has been chosen manually or the reboot has + # been disabled in the kernel cmdline + # selected_entry contains the boot count, remove it before comparing it to the default entry + if ! grep -qw "health-checker-reboot=disabled" /proc/cmdline; then + should_reboot=1 + fi + set_snapshot_id + # the entry is the default one + case "$status" in + # boot counting is still in effect, let systemd-boot do the rest + "indeterminate") + if [ "$should_reboot" -eq 1 ]; then + create_log user.alert "Machine didn't come up correctly, trying the same snapshot" + # We want to reboot into the current snapshot + bootctl set-oneshot "$current_entry" + systemctl reboot + fi + start_emergency_shell $should_reboot + ;; + "dirty") + if [ "$should_reboot" -eq 1 ]; then + create_log user.alert "Machine didn't come up correctly, rebooting a different snapshot" + echo "NEW_SNAPSHOT_FAILED=1" > $STATE_FILE + if [ "$SNAPSHOT_DEFAULT" == "$SNAPSHOT_CURRENT" ]; then + # If the default entry has been marked as not dirty, tell + # the bootloader to pick the first new / working snapshot + bootctl set-default "" + fi + systemctl reboot + fi + start_emergency_shell $should_reboot + ;; + "clean"|"good") + [ -f $STATE_FILE ] && . 
$STATE_FILE + # We want to reboot into the current snapshot to try one more time if it works + if [ "$REBOOTING_GOOD_SNAPSHOT" != "$SNAPSHOT_CURRENT" ] && [ "$should_reboot" -eq 1 ]; then + create_log user.alert "Machine didn't come up correctly, trying same snapshot" + bootctl set-oneshot "$current_entry" + sed -i '/REBOOTING_GOOD_SNAPSHOT/d' $STATE_FILE + echo "REBOOTING_GOOD_SNAPSHOT=$SNAPSHOT_CURRENT" >> $STATE_FILE + systemctl reboot + fi + start_emergency_shell + ;; + "bad") + start_emergency_shell + ;; + *) + create_log user.alert "Machine didn't come up correctly, found unexpected verb in systemd-bless-boot" + reboot_or_emergency_shell $should_reboot + ;; + # good should never appear here because systemd-bless-boot.service that + # marks an entry as "good" is called after boot.entry (and health-checker as well) + # All the next reboots for the same entry will have a "clean" state + esac +} + +error_decision() { + if [ "$is_bls" == "1" ]; then + error_decision_bls + else + error_decision_legacy + fi +} + +if [ "$is_bls" != "1" ]; then + # Clear GRUB flag (used to determine if system was able to boot at all) + echo "Clearing GRUB flag" + grub2-editenv - set health_checker_flag=0 +fi echo "Starting health check" FAILED=0; @@ -157,20 +283,34 @@ if [ ${FAILED} -ne 0 ]; then echo "Health check failed!" - error_decission + error_decision telem_send_record 0 exit 1 else echo "Health check passed" - # Save good working state and remove old rebooted state file - save_working_snapshot - if [ -f ${REBOOTED_STATE} ]; then - create_log user.info "Health check passed after reboot" - rm -rf ${REBOOTED_STATE} + if [ "$is_bls" != "1" ]; then + # Save good working state and remove old rebooted state file + save_working_snapshot + if [ -f ${REBOOTED_STATE} ]; then + create_log user.info "Health check passed after reboot" + rm -rf ${REBOOTED_STATE} + fi + else + NEW_SNAPSHOT_FAILED=0 + REBOOTING_GOOD_SNAPSHOT="" + [ -f $STATE_FILE ] && . 
$STATE_FILE + set_snapshot_id + # If the new snapshot failed, update the default to the current one + if [ "$NEW_SNAPSHOT_FAILED" -eq 1 ] && [ "$SNAPSHOT_CURRENT" != "$SNAPSHOT_DEFAULT" ]; then + if ! sdbootutil set-default-snapshot "$SNAPSHOT_CURRENT" ; then + create_log user.crit "Cannot set current snapshot as default boot entry using sdbootutil" + fi + fi + [ -f $STATE_FILE ] && rm $STATE_FILE fi fi - +echo "Health check passed" if [ -z "${TELEM_PAYLOAD}" ]; then TELEM_PAYLOAD="Health check passed" fi diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/health-checker-1.12+git20241105.2e2832f15742/systemd/health-checker.service new/health-checker-1.13+git20251028.c9a2249/systemd/health-checker.service --- old/health-checker-1.12+git20241105.2e2832f15742/systemd/health-checker.service 2024-11-05 09:00:04.000000000 +0100 +++ new/health-checker-1.13+git20251028.c9a2249/systemd/health-checker.service 2025-10-28 11:06:25.000000000 +0100 @@ -14,4 +14,5 @@ RemainAfterExit=yes [Install] +RequiredBy=boot-complete.target WantedBy=default.target ++++++ health-checker.obsinfo ++++++ --- /var/tmp/diff_new_pack.Ay3i4m/_old 2025-11-04 18:40:25.268537199 +0100 +++ /var/tmp/diff_new_pack.Ay3i4m/_new 2025-11-04 18:40:25.276537536 +0100 @@ -1,5 +1,5 @@ name: health-checker -version: 1.12+git20241105.2e2832f15742 -mtime: 1730793604 -commit: 2e2832f157429be17ddf7bb21976eee2cda251d6 +version: 1.13+git20251028.c9a2249 +mtime: 1761645985 +commit: c9a224963060436a7991e26d0235a77ba267f70e
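
For readers who want to try the plugin contract described in the README hunk above (a plugin is called with the option `check` and exits with `0` on success or `1` on failure), a minimal sketch could look like the following. It is not the `tmp.sh` plugin shipped upstream. The `CHECK_DIR` variable, the `plugin_main` wrapper and the probe file name are assumptions made for illustration; only the `check` verb and the exit codes come from the README.

```shell
#!/bin/sh
# Hypothetical plugin sketch, not shipped by the health-checker package.
# Assumption: CHECK_DIR is an illustrative override for the probed directory.
CHECK_DIR="${CHECK_DIR:-/tmp}"

# Succeed only if a scratch file can be created in and removed from CHECK_DIR.
run_checks() {
    probe=$(mktemp "${CHECK_DIR}/health-probe.XXXXXX") || return 1
    rm -f "$probe"
}

# Dispatch on the verb; per the README, health-checker passes "check".
plugin_main() {
    case "${1:-}" in
        check)
            run_checks
            ;;
        *)
            echo "usage: plugin check" >&2
            return 1
            ;;
    esac
}

# Simulate the call health-checker would make and show the resulting status.
plugin_main check
echo "plugin exit status: $?"
```

According to the README, a file like this would go into `/usr/local/libexec/health-checker` if it is installed manually by the admin. In that case the last two lines would likely be replaced by `plugin_main "$@"` so that the exit status reaches `health-checker`.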

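The branches of `error_decision_bls` in the `sbin/health-checker.in` hunk above can be summarized as a small lookup. This is only a reading aid written under assumptions: the function name `decide_bls_action` and the action labels it prints are invented here. It also ignores side effects such as `bootctl set-oneshot`, the state file updates, and the fact that the real script still reaches `start_emergency_shell` after calling `systemctl reboot`.

```shell
#!/bin/sh
# Illustrative summary of the case statement in error_decision_bls.
# The printed action labels are invented for this sketch.
# Usage: decide_bls_action STATUS SHOULD_REBOOT ALREADY_RETRIED
#   STATUS          output of "systemd-bless-boot status"
#   SHOULD_REBOOT   0 if health-checker-reboot=disabled is on the cmdline
#   ALREADY_RETRIED 1 if REBOOTING_GOOD_SNAPSHOT matches the current snapshot
decide_bls_action() {
    case "$1" in
        indeterminate)
            # Boot counting is still in effect: try the same entry again.
            if [ "$2" -eq 1 ]; then echo "retry-same-entry"; else echo "emergency-shell"; fi
            ;;
        dirty)
            # No tries left: reboot and let the bootloader pick another entry.
            if [ "$2" -eq 1 ]; then echo "reboot-other-entry"; else echo "emergency-shell"; fi
            ;;
        clean|good)
            # Previously working entry: one extra reboot, then give up.
            if [ "$2" -eq 1 ] && [ "${3:-0}" -eq 0 ]; then
                echo "retry-same-entry"
            else
                echo "emergency-shell"
            fi
            ;;
        bad)
            echo "emergency-shell"
            ;;
        *)
            # The diff logs an alert and calls reboot_or_emergency_shell, whose
            # body is not shown there, so the sketch only reports the case.
            echo "unexpected-status"
            ;;
    esac
}

# Print the action chosen for each documented status with reboots enabled.
for state in indeterminate dirty clean bad; do
    printf '%s -> %s\n' "$state" "$(decide_bls_action "$state" 1 0)"
done
```

Keeping the decision separate from the side effects makes the mapping easy to compare with the man page text in the same commit, which describes the same three outcomes: retry, fall back to another entry, or open an emergency shell.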