On 6/7/24 1:57 AM, Thomas Huth wrote:
On 05/06/2024 16.48, Jared Rossi wrote:
diff --git a/pc-bios/s390-ccw/s390-ccw.h b/pc-bios/s390-ccw/s390-ccw.h
index c977a52b50..de3d1f0d5a 100644
--- a/pc-bios/s390-ccw/s390-ccw.h
+++ b/pc-bios/s390-ccw/s390-ccw.h
@@ -43,6 +43,7 @@ typedef unsigned long long u64;
#include "iplb.h"
/* start.s */
+extern char _start[];
void disabled_wait(void) __attribute__ ((__noreturn__));
void consume_sclp_int(void);
void consume_io_int(void);
@@ -88,6 +89,11 @@ __attribute__ ((__noreturn__))
static inline void panic(const char *string)
{
sclp_print(string);
+ if (load_next_iplb()) {
+ sclp_print("\nTrying next boot device...");
+ jump_to_IPL_code((long)_start);
+ }
+
disabled_wait();
}
Honestly, I am unsure whether this is a really cool idea or a very
ugly hack ... but I think I tend towards the latter, sorry. Jumping
back to the startup code might cause various problem, e.g.
pre-initialized variables don't get their values reset, causing
different behavior when the s390-ccw bios runs a function a second
time this way. Thus this sounds very fragile. Could we please try to
get things cleaned up correctly, so that functions return with error
codes instead of panicking when we can continue with another boot
device? Even if its more work right now, I think this will be much
more maintainable in the future.
Thomas
Thanks Thomas, I appreciate your insight. Your hesitation is
perfectly understandable as well. My initial design was like you
suggest, where the functions return instead of panic, but the issue I
ran into is that netboot uses a separate image, which we jump in to
at the start of IPL from a network device (see zipl_load() in
pc-bios/s390-ccw/bootmap.c). I wasn't able to come up with a simple
way to return to the main BIOS code if a netboot fails other than by
jumping back. So, it seems to me that netboot kind of throws a
monkeywrench into the basic idea of reworking the panics into returns.
I'm open to suggestions on a better way to recover from a failed
netboot, and it's certainly possible I've overlooked something, but
as far as I can tell a jump is necessary in that particular case at
least. Netboot could perhaps be handled as a special case where the
jump back is permitted whereas other device types return, but I don't
think that actually solves the main issue.
What are your thoughts on this?
Yes, I agree that jumping is currently required to get back from the
netboot code. So if you could rework your patches in a way that limits
the jumping to a failed netboot, that would be acceptable, I think.
Apart from that: We originally decided to put the netboot code into a
separate binary since the required roms/SLOF module might not always
have been checked out (it needed to be done manually), so we were not
able to compile it in all cases. But nowadays, this is handled in a
much nicer way, the submodule is automatically checked out once you
compile the s390x-softmmu target and have a s390x compiler available,
so I wonder whether we should maybe do the next step and integrate the
netboot code into the main s390-ccw.img now? Anybody got an opinion on
this?
Thomas
Hi Thomas,
I would generally defer the decision about integrating the netboot code
to someone with more insight than me, but for what it's worth, I am of
the opinion that if we want to rework all of panics into returns, then
it would make the most sense to also do the integration now so that we
can avoid using jump altogether. Unless I'm missing something simple, I
don't think the panic/return conversion will be trivial, and actually I
think it will be quite invasive since there are dozens of calls to panic
and assert that will need to be changed. It doesn't seem worthwhile to
do all of these conversions in order to avoid using jump, but then still
being exposed to possible problems caused by jumping due to netboot
requiring it anyway.
Regards,
Jared Rossi