On 8/31/2021 04:31, Sebastian Huber wrote:
On 30/08/2021 17:13, Kinsey Moore wrote:
On 8/30/2021 07:50, Sebastian Huber wrote:
On 30/08/2021 14:27, Kinsey Moore wrote:
On 8/30/2021 00:42, Sebastian Huber wrote:
Hello Kinsey,
why can't you use the existing fatal error extension for this? You
just have to test for an RTEMS_FATAL_SOURCE_EXCEPTION source. The
fatal code is a pointer to the exception frame.
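
As a rough illustration of the suggestion above, a fatal extension can filter on the exception source and treat the fatal code as a pointer to the exception frame. This is only a sketch: every demo_* name below is a hypothetical stand-in so the example compiles on its own, and none of it is the real <rtems.h> API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the RTEMS types; not the real <rtems.h> API. */
typedef enum {
  DEMO_FATAL_SOURCE_APPLICATION,
  DEMO_FATAL_SOURCE_EXCEPTION
} demo_fatal_source;

typedef uintptr_t demo_fatal_code;

typedef struct {
  uintptr_t pc; /* address of the faulting instruction */
  uintptr_t sp; /* stack pointer at the time of the fault */
} demo_exception_frame;

/* Last exception frame seen by the extension, NULL if none was seen. */
static const demo_exception_frame *demo_last_frame;

/*
 * Fatal extension sketch: ignore every source except CPU exceptions and
 * interpret the fatal code as a pointer to the exception frame.
 */
void demo_fatal_extension(
  demo_fatal_source source,
  bool              always_set_to_false,
  demo_fatal_code   code
)
{
  (void) always_set_to_false;

  if (source != DEMO_FATAL_SOURCE_EXCEPTION) {
    return;
  }

  demo_last_frame = (const demo_exception_frame *) code;
  assert(demo_last_frame != NULL);
  /* A real extension would log the frame here before the system stops. */
}
```

Note that nothing in this path resumes the interrupted thread, which is the limitation discussed in the reply that follows.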
Unfortunately, the fatal error extensions framework necessarily
assumes that the exception is fatal and so does not include the
machinery to perform a thread dispatch or restore the exception
frame for additional execution. It could theoretically be done in
the fatal error extensions context, but it would end up being
reimplemented for every architecture and you'd have to unwind the
stack manually. I'm sure there are other ragged edges that would
have to be smoothed over as well.
Non-interrupt exceptions are not uniformly handled across
architectures in RTEMS currently. Adding the
RTEMS_FATAL_SOURCE_EXCEPTION fatal source was an attempt to do this.
I am not that fond of adding a second approach unless there are
strong technical reasons to do this.
This was in an effort to formalize how recoverable exceptions are
handled. Currently, it's done on SPARC by handling exception traps
as you would an interrupt trap since they share a common architecture
on that platform. This representation varies quite a bit among
platforms, so we needed a different mechanism.
I recently changed the non-interrupt exception handling on sparc,
since it was not robust against a corrupt stack pointer:
http://devel.rtems.org/ticket/4459
The initial fatal extensions are quite robust, you only need a
stack, valid read-only data and valid code. So, using a user
extension is the right thing to do, but I don't think we need a new
one.
Doing the non-interrupt exception processing on the stack which
caused the exception is a bit problematic, since the stack pointer
might be corrupt as well. It is more robust to switch to, for example,
the interrupt stack. If the exception was caused by an interrupt,
then this exception is not recoverable.
The non-interrupt exception processing occurs on the interrupt stack,
not the thread/user stack. In the AArch64 support code provided, the
stack is switched back to the thread/user stack before thread
dispatch and exception frame restoration occurs.
You can only switch back to the thread stack if it is valid. Doing a
thread dispatch should be only done if you are sure that the system
state is still intact. This is probably not the case for most exceptions.
If the handler has declared that it handled the exception and corrected
the cause underlying the exception then the system state should be
valid. If it can't make that claim then it should not have handled the
exception.
If the non-interrupt exception was caused by a thread, then you
could do some high level actions for some exceptions, such as
floating-point exceptions and arithmetic exceptions. If you get a
data abort or instruction error, then it is probably better to
terminate the system.
I leave that decision to the handlers defined on this framework. In
the case of the exception-to-signal mapping, I'm carrying over the
existing exception set from the SPARC implementation.
It is probably this code:
+ case EXCEPTION_DATA_ABORT_READ:
+ case EXCEPTION_DATA_ABORT_WRITE:
+ case EXCEPTION_DATA_ABORT_UNSPECIFIED:
+ case EXCEPTION_INSTRUCTION_ABORT:
+ case EXCEPTION_MMU_UNSPECIFIED:
+ case EXCEPTION_ACCESS_ALIGNMENT:
+ signal = SIGSEGV;
+ break;
+
+ default:
+ /*
+ * Covers unknown, PC/SP alignment, illegal execution state, and any
+ * new exception classes that get added.
+ */
+ signal = SIGILL;
+ break;
+ }
Using signals to handle these exceptions is like playing Russian
roulette.
You're right. Specifically, SP alignment faults should be moved to the
not-handled section because they're not actually handled here and would
have to be to proceed with further execution. I'll make that change, thanks.
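
For illustration, the mapping quoted above with SP alignment faults moved to a not-handled result might look roughly like the sketch below. The exception class names come from the quoted patch, but their numeric values, the two classes with a _DEMO suffix, DEMO_NOT_HANDLED, and the function names are assumptions made only for this example.

```c
#include <assert.h>
#include <signal.h>

/*
 * Exception classes named in the quoted patch, plus two hypothetical
 * entries (suffix _DEMO) that exist only for this sketch.
 */
typedef enum {
  EXCEPTION_DATA_ABORT_READ,
  EXCEPTION_DATA_ABORT_WRITE,
  EXCEPTION_DATA_ABORT_UNSPECIFIED,
  EXCEPTION_INSTRUCTION_ABORT,
  EXCEPTION_MMU_UNSPECIFIED,
  EXCEPTION_ACCESS_ALIGNMENT,
  EXCEPTION_SP_ALIGNMENT_DEMO,
  EXCEPTION_UNKNOWN_DEMO
} demo_exception_class;

/* Returned when the exception must not be turned into a signal. */
#define DEMO_NOT_HANDLED (-1)

/* Map an exception class to a signal number, or DEMO_NOT_HANDLED. */
int demo_exception_to_signal(demo_exception_class exception_class)
{
  switch (exception_class) {
    case EXCEPTION_DATA_ABORT_READ:
    case EXCEPTION_DATA_ABORT_WRITE:
    case EXCEPTION_DATA_ABORT_UNSPECIFIED:
    case EXCEPTION_INSTRUCTION_ABORT:
    case EXCEPTION_MMU_UNSPECIFIED:
    case EXCEPTION_ACCESS_ALIGNMENT:
      return SIGSEGV;

    case EXCEPTION_SP_ALIGNMENT_DEMO:
      /* The stack cannot be trusted, so leave this to the fatal path. */
      return DEMO_NOT_HANDLED;

    default:
      /* Unknown classes and any classes added later. */
      return SIGILL;
  }
}

/* Usage example: a few spot checks of the mapping. */
void demo_mapping_example(void)
{
  assert(demo_exception_to_signal(EXCEPTION_DATA_ABORT_READ) == SIGSEGV);
  assert(demo_exception_to_signal(EXCEPTION_SP_ALIGNMENT_DEMO) == DEMO_NOT_HANDLED);
  assert(demo_exception_to_signal(EXCEPTION_UNKNOWN_DEMO) == SIGILL);
}
```

Signal numbers are positive in standard C, so a negative sentinel cannot collide with a real signal.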
Non-interrupt exception handling is always architecture-dependent.
It is just a matter of how you organize it. In general, the most
sensible way to deal with non-interrupt exceptions is to log the
error somehow and terminate the system. The mapping to signals is a
bit of a special case if you ask me. My preferred way to handle
non-interrupt exceptions would be to
1. switch to a dedicated stack
2. save the complete register set to the CPU exception frame
3. call the fatal error extensions with RTEMS_FATAL_SOURCE_EXCEPTION
and the CPU exception frame (with interrupts disabled)
Add a new API to query/alter the CPU exception frame, switch to the
stack indicated by the CPU exception frame, and restore the context
stored in the CPU exception frame. With this architecture-dependent
CPU exception frame support it should be possible to implement a
high level mapper to signals.
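
A query/alter API of the kind described above could be sketched as follows. This is purely illustrative: the frame layout, the demo_* accessors, the fixed 4-byte instruction size, and the 16-byte stack alignment check are all assumptions, and the actual stack switch and context restore would be architecture-specific assembly that is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame layout; a real frame holds the complete register set. */
typedef struct {
  uintptr_t pc;
  uintptr_t sp;
  uint64_t  gpr[4];
} demo_cpu_exception_frame;

/* Query the address at which execution would resume. */
uintptr_t demo_frame_get_pc(const demo_cpu_exception_frame *frame)
{
  assert(frame != NULL);
  return frame->pc;
}

/* Alter the address at which execution would resume. */
void demo_frame_set_pc(demo_cpu_exception_frame *frame, uintptr_t pc)
{
  assert(frame != NULL);
  frame->pc = pc;
}

/* Query the stack pointer recorded in the frame. */
uintptr_t demo_frame_get_sp(const demo_cpu_exception_frame *frame)
{
  assert(frame != NULL);
  return frame->sp;
}

/*
 * High-level handler sketch: step over the faulting instruction, assuming
 * fixed 4-byte instructions. Returns true if the frame may be restored,
 * false if the recorded stack pointer looks unusable (here: not 16-byte
 * aligned) and the exception should stay fatal.
 */
bool demo_skip_faulting_instruction(demo_cpu_exception_frame *frame)
{
  if (demo_frame_get_sp(frame) % 16 != 0) {
    return false;
  }

  demo_frame_set_pc(frame, demo_frame_get_pc(frame) + 4);
  return true;
}
```

The point of the accessors is that a signal mapper built on top of them would not need to know the architecture-specific frame layout.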
What you've described is basically what is happening here (the
dedicated stack is currently the interrupt/exception stack on
AArch64), but the low level details are necessarily contained within
the CPU port in patch 3/5. Support for this framework is not required
for any CPU port, but CPU ports that do support it repurpose the
existing code underlying the fatal error extensions with the
additional support you described above.
I don't think that looking at existing code is the right thing to do.
The exception handling is too diverse in RTEMS. We should think about
what a clean design should look like.
I repurposed the existing code in the AArch64 CPU port because it
happened to do part of what was needed as you listed just above. This
may not be a perfectly clean design, but it's cleaner than what
currently exists for recoverably handling machine exceptions. What
currently exists is: hooking the exception vector(s) with one-off
assembly for each platform and exception type.
This does not exist in parallel to the fatal error extensions, but
rather the fatal error extensions are moved on top of the Exception
Manager for CPU ports that support it. The Exception Manager returns
whether the exception was handled and the CPU port then calls the
fatal error extensions if the exception wasn't handled. With this
patch set, only an accessor was added to get the exception class, but
my initial thoughts included manipulation of the execution address
and several other more generic manipulators.
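
The layering described above can be sketched in a few lines: registered handlers get the first chance, and the fatal path runs only when none of them claims the exception. All names here are hypothetical, the fatal extensions are reduced to a counter so the behaviour can be observed, and a real CPU port would restore the frame or terminate instead of returning a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal frame: only the exception class is modelled. */
typedef struct {
  int exception_class;
} demo_frame;

/* A handler returns true if it handled the exception and fixed its cause. */
typedef bool (*demo_exception_handler)(demo_frame *frame);

#define DEMO_MAX_HANDLERS 4

static demo_exception_handler demo_handlers[DEMO_MAX_HANDLERS];
static size_t demo_handler_count;
static int demo_fatal_calls; /* counts invocations of the fatal path */

/* Install a handler; nothing is installed by default. */
bool demo_exception_manager_register(demo_exception_handler handler)
{
  if (handler == NULL || demo_handler_count == DEMO_MAX_HANDLERS) {
    return false;
  }

  demo_handlers[demo_handler_count++] = handler;
  return true;
}

/* Returns whether any registered handler claimed the exception. */
bool demo_exception_manager_dispatch(demo_frame *frame)
{
  for (size_t i = 0; i < demo_handler_count; ++i) {
    if (demo_handlers[i](frame)) {
      return true;
    }
  }

  return false;
}

/* Stand-in for calling the fatal error extensions. */
static void demo_fatal_extensions(demo_frame *frame)
{
  (void) frame;
  ++demo_fatal_calls;
}

/* CPU port entry sketch: returns true if execution may resume. */
bool demo_cpu_exception_entry(demo_frame *frame)
{
  assert(frame != NULL);

  if (demo_exception_manager_dispatch(frame)) {
    return true;
  }

  demo_fatal_extensions(frame);
  return false;
}

/* Example handler: claims exception class 1 only. */
bool demo_handle_class_one(demo_frame *frame)
{
  return frame->exception_class == 1;
}
```

With no handler registered, every exception falls through to the fatal path, which matches the default behaviour described later in this message.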
If a non-interrupt exception occurs, the default behaviour should be
to terminate the system as robustly and safely as possible. Raising a signal
should be optional and not make the exception handling less robust.
The support for the signals should also not lead to dead code in the
default case. This is why I proposed a two step approach. The first
step is a normal fatal error handler. The second step is a resume of
normal multitasking in a special signal fatal error extension using an
architecture-specific "jump" which is defined by the CPU exception frame.
I don't think I understand how a signal could be sent to the runtime
while simultaneously shutting down the system since system shutdown
would necessarily occur before the signal could be sent in thread dispatch.
As things are currently set up, the signal mapping hook is only installed
if the application specifically requests it and is off by default. The
average application will see no change to exception handling since it
does not request the mapping and there are no default recoverable
exception handlers.
If fatal error handlers run first, assumptions are made that violate the
ability to resume processing because they are specifically fatal handlers.
Kinsey
_______________________________________________
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel