I had this nicely formatted when I sent it, but it seems to have been
reformatted elsewhere in transit. Hopefully this helps but if not I will
leave it be.
On 2/14/21 6:27 PM, Judah Kocher wrote:
Thanks to each of you for your replies,
Lesson 1: always get machines with remote console access. It will save
the day some day and help in diagnosing issues.
Having remote console access would be sweet, but unfortunately that goes
far beyond the hobbyist price point I currently have to work with.
On the system that succeeded when you were watching on the console,
did automatic sysupgrades start to work after that?
It did not. This unit still fails the weekly upgrade.
In general, my guess would be boot.conf contents that prevent the
automatic upgrade from working. Or maybe you have very old boot
loaders on the failing machines.
I do not have an /etc/boot.conf file on any of these systems. Both the
units having issues are less than 12 months old so I wouldn't think the
bootloader age would be an issue. I have both older and newer units that
are working correctly. Are there major recent changes you are aware of
that might lead to something like this?
BTW, kernel # cannot be used to identify a kernel.
Interesting. When I realized two of my machines were still on old
snapshots, it seemed like a simple metric to use for tracking this. The
actual number isn't really as relevant to me at this point as the fact
that it does or does not change. Could the update be partially and/or
completely succeeding with the kernel # staying the same? In my limited
experience following -current for the last 4+ years, every snapshot
comes with a different #.
-Otto
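Since the #NNN counter isn't a unique identifier, the build timestamp in
the full version string is a steadier thing to compare. A minimal sketch,
with the caveat that the sample string below is illustrative, not real
output; on a live OpenBSD system the string would come from
`sysctl -n kern.version`:

```shell
# Extract the build timestamp from a kernel version string instead of
# relying on the #NNN counter. The sample string is made up for
# illustration; a real one would come from `sysctl -n kern.version`.
version="OpenBSD 6.9-beta (GENERIC.MP) #302: Sun Feb 14 10:00:00 MST 2021"
build_date=${version#*: }   # strip everything up to and including "#NNN: "
echo "$build_date"
```

Comparing that timestamp before and after the reboot would catch the
(unlikely) case where two different snapshots carry the same #.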
Care to show a script? Otherwise it looks like a rather lengthy
mathematical problem with quite some variables.
The script is very basic. I didn't show it originally because I
know from reading various threads in the past that many of the folks on
this list find a "non-default" install abhorrent and I wasn't interested
in dragging that into this. However, here are the complete script contents:
#!/bin/sh
# Fetch current system version (e.g. GENERIC.MP#302)
CURRENT=$(uname -a | awk '{print $4}')
# Download latest snapshot but do not install
doas sysupgrade -n -s
# Delete unwanted sets
doas rm /home/_sysupgrade/x*
doas rm /home/_sysupgrade/g*
doas rm /home/_sysupgrade/c*
# Log system time (single date call, so the fields are consistent)
STAMP=$(date "+%H:%M:%S on %m/%d/%Y")
echo "System Snapshot upgrade started from $CURRENT at $STAMP" >> /home/$USER/systemLog
doas reboot
I have a separate script that runs on each boot which checks if an
upgrade attempt was the last logged item and fetches the current
system version to compare. If it is the same it logs a failure. If it
is different it logs the new version #. Either way it emails me the
results. This script is running on all 6 systems.
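For reference, a minimal sketch of what that boot-time check could look
like. Only the log line format is taken from the upgrade script above;
the function name, messages, and mail invocation are my assumptions, not
the original script:

```shell
#!/bin/sh
# Hedged reconstruction of the boot-time check described above; the
# function name and messages are assumptions, only the log line format
# is taken from the upgrade script.

# Compare the version recorded when the upgrade started against the
# version running now. Prints FAILED/OK plus the running version.
check_upgrade() {
    last_line=$1   # last line of the log file
    running=$2     # current kernel version, e.g. GENERIC.MP#302
    case $last_line in
    "System Snapshot upgrade started from "*)
        prev=${last_line#System Snapshot upgrade started from }
        prev=${prev%% *}   # keep only the version field
        if [ "$prev" = "$running" ]; then
            echo "FAILED: still on $running"
        else
            echo "OK: now on $running"
        fi
        ;;
    *)
        echo "no upgrade pending"
        ;;
    esac
}

# On boot this would be driven by something like:
#   check_upgrade "$(tail -1 /home/$USER/systemLog)" \
#                 "$(uname -a | awk '{print $4}')" | mail -s sysupgrade me
```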
What are the permissions on the bsd.upgrade that's left behind? If
they are still +x then your issue is with the boot loader, maybe that
boot.conf otto suggested. If they are -x then the boot loader started
the install kernel but something went wrong.
The permissions of the left-behind bsd.upgrade are -rw------- 1 root
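For anyone wanting to script the check suggested above, a small sketch.
The helper name and messages are mine; the +x/-x interpretation is taken
from the preceding paragraph (sysupgrade leaves /bsd.upgrade executable,
the boot loader boots it once, and the installer normally removes it):

```shell
# Sketch of the suggested check; helper name and wording are mine.
# +x on a leftover bsd.upgrade -> the boot loader never booted it;
# -x -> the install kernel was started but something went wrong later.
check_pending() {
    if [ -x "$1" ]; then
        echo "still executable: the boot loader never booted it"
    else
        echo "execute bit cleared: the install kernel was started"
    fi
}

# usage: check_pending /bsd.upgrade
```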
On 14 February 2021 18:02:07 CET, Judah Kocher <koche...@hotmail.com>
wrote:
Hello folks,
I am having an issue with sysupgrade and I have had trouble finding the
source of the problem so I hope someone here might be able and willing
to point me in the right direction.
I have 6 small systems running OpenBSD -current and I have a basic
script which upgrades to the latest snapshot weekly. The systems are all
relatively similar. Three are the exact same piece of hardware, two are
slightly different, and one is a VM configured to match the first three
as closely as possible with virtual hardware.
The script checks the current kernel version (e.g. "GENERIC.MP#302"),
logs it, runs sysupgrade, and after the reboot it checks the kernel
version again. If it is different it logs it as a "success", and if it
is still the same it logs it as a failure.
All 6 systems were configured using the same autoinstall configuration
and the upgrade script is identical on each unit. However, two of the
three identical units always fail. When I remote into either system and
manually run the upgrade script it also fails. I was able to get onsite
with one of them, where I connected a monitor and keyboard and manually
ran the script to observe the results, but oddly enough it succeeded so
I learned nothing actionable. However, it continues to fail the weekly
upgrade. I have confirmed that the script permissions are identical on
the working and nonworking units.
The 4 units that successfully upgrade leave a mail message with a log of
the upgrade process. However, I have been unable to find any record or
log on the systems that are failing to help me figure out why this isn't
working. The only difference I can identify between the systems is that
"auto_upgrade.conf" and "bsd.upgrade" are both present in "/" on the two
systems that fail, but are properly removed on the 4 that succeed.
I would appreciate any suggestions of what else I can try or check to
figure out what is causing this issue.
Thanks
Judah