Re: [PATCH v1 2/5] cpukit: Add Exception Manager

Kinsey Moore Tue, 31 Aug 2021 06:35:27 -0700

On 8/31/2021 04:31, Sebastian Huber wrote:

On 30/08/2021 17:13, Kinsey Moore wrote:
On 8/30/2021 07:50, Sebastian Huber wrote:
On 30/08/2021 14:27, Kinsey Moore wrote:
On 8/30/2021 00:42, Sebastian Huber wrote:
Hello Kinsey,
why can't you use the existing fatal error extension for this? You just have to test for an RTEMS_FATAL_SOURCE_EXCEPTION source. The fatal code is a pointer to the exception frame.
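A minimal sketch of that check, using stand-in types so it compiles without the RTEMS headers. An RTEMS fatal extension receives a source, an internal-error flag, and a code; every `demo_` and `DEMO_` name below is hypothetical and only mirrors that shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the RTEMS fatal source and exception frame. */
typedef enum {
  DEMO_FATAL_SOURCE_EXIT,
  DEMO_FATAL_SOURCE_EXCEPTION
} demo_fatal_source;

typedef struct {
  uintptr_t pc; /* address of the faulting instruction */
  uintptr_t sp; /* stack pointer at the time of the exception */
} demo_exception_frame;

/* Last exception frame seen by the extension; NULL if none was seen. */
static const demo_exception_frame *demo_last_frame;

/*
 * Fatal extension: only the exception source carries a frame pointer in
 * the fatal code, so every other source is ignored here.
 */
static void demo_fatal_extension(
  demo_fatal_source source,
  bool              always_set_to_false,
  uintptr_t         code
)
{
  (void) always_set_to_false;

  if (source != DEMO_FATAL_SOURCE_EXCEPTION) {
    return;
  }

  demo_last_frame = (const demo_exception_frame *) code;
  assert(demo_last_frame != NULL);
}
```

A real extension would log or inspect the frame at this point. Nothing in this sketch resumes execution, which is the gap discussed in the reply below.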
Unfortunately, the fatal error extensions framework necessarily assumes that the exception is fatal and so does not include the machinery to perform a thread dispatch or restore the exception frame for additional execution. It could theoretically be done in the fatal error extensions context, but it would end up being reimplemented for every architecture and you'd have to unwind the stack manually. I'm sure there are other ragged edges that would have to be smoothed over as well.
Non-interrupt exceptions are not uniformly handled across architectures in RTEMS currently. Adding the RTEMS_FATAL_SOURCE_EXCEPTION fatal source was an attempt to do this. I am not that fond of adding a second approach unless there are strong technical reasons to do this.
This was in an effort to formalize how recoverable exceptions are handled. Currently, it's done on SPARC by handling exception traps as you would an interrupt trap since they share a common architecture on that platform. This representation varies quite a bit among platforms, so we needed a different mechanism.
I recently changed the non-interrupt exception handling on SPARC, since it was not robust against a corrupt stack pointer:
http://devel.rtems.org/ticket/4459
The initial fatal extensions are quite robust; you only need a stack, valid read-only data, and valid code. So, using a user extension is the right thing to do, but I don't think we need a new one.
Doing the non-interrupt exception processing on the stack which caused the exception is a bit problematic, since the stack pointer might be corrupt as well. It is more robust to switch to, for example, the interrupt stack. If the exception was caused by an interrupt, then this exception is not recoverable.
The non-interrupt exception processing occurs on the interrupt stack, not the thread/user stack. In the AArch64 support code provided, the stack is switched back to the thread/user stack before thread dispatch and exception frame restoration occurs.
You can only switch back to the thread stack if it is valid. A thread dispatch should only be done if you are sure that the system state is still intact. This is probably not the case for most exceptions.

If the handler has declared that it handled the exception and corrected the cause underlying the exception then the system state should be valid. If it can't make that claim then it should not have handled the exception.

If the non-interrupt exception was caused by a thread, then you could do some high level actions for some exceptions, such as floating-point exceptions and arithmetic exceptions. If you get a data abort or instruction error, then it is probably better to terminate the system.
I leave that decision to the handlers defined on this framework. In the case of the exception-to-signal mapping, I'm carrying over the existing exception set from the SPARC implementation.
It is probably this code:

+    case EXCEPTION_DATA_ABORT_READ:
+    case EXCEPTION_DATA_ABORT_WRITE:
+    case EXCEPTION_DATA_ABORT_UNSPECIFIED:
+    case EXCEPTION_INSTRUCTION_ABORT:
+    case EXCEPTION_MMU_UNSPECIFIED:
+    case EXCEPTION_ACCESS_ALIGNMENT:
+      signal = SIGSEGV;
+      break;
+
+    default:
+      /*
+       * Covers unknown, PC/SP alignment, illegal execution state, and any new
+       * exception classes that get added.
+       */
+      signal = SIGILL;
+      break;
+  }
Using signals to handle these exceptions is like playing Russian roulette.

You're right. Specifically, SP alignment faults should be moved to the not-handled section because they're not actually handled here and would have to be to proceed with further execution. I'll make that change, thanks.
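A rough sketch of how the mapping might look with that change, written as a standalone function rather than as the actual patch. The abort and alignment class names come from the quoted diff; `DEMO_EXCEPTION_SP_ALIGNMENT`, `DEMO_EXCEPTION_UNKNOWN`, `demo_exception_class`, and `demo_map_exception` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
  EXCEPTION_DATA_ABORT_READ,
  EXCEPTION_DATA_ABORT_WRITE,
  EXCEPTION_DATA_ABORT_UNSPECIFIED,
  EXCEPTION_INSTRUCTION_ABORT,
  EXCEPTION_MMU_UNSPECIFIED,
  EXCEPTION_ACCESS_ALIGNMENT,
  DEMO_EXCEPTION_SP_ALIGNMENT, /* hypothetical name for an SP alignment fault */
  DEMO_EXCEPTION_UNKNOWN       /* hypothetical name for an unknown class */
} demo_exception_class;

/*
 * Returns true and stores a signal number if the class can be mapped to a
 * signal. Returns false, leaving *signal_out untouched, if the exception
 * must stay unhandled so that the fatal path runs.
 */
static bool demo_map_exception(demo_exception_class cls, int *signal_out)
{
  assert(signal_out != NULL);

  switch (cls) {
    case EXCEPTION_DATA_ABORT_READ:
    case EXCEPTION_DATA_ABORT_WRITE:
    case EXCEPTION_DATA_ABORT_UNSPECIFIED:
    case EXCEPTION_INSTRUCTION_ABORT:
    case EXCEPTION_MMU_UNSPECIFIED:
    case EXCEPTION_ACCESS_ALIGNMENT:
      *signal_out = SIGSEGV;
      return true;

    case DEMO_EXCEPTION_SP_ALIGNMENT:
      /* The mapper cannot repair the stack pointer, so this stays fatal. */
      return false;

    default:
      *signal_out = SIGILL;
      return true;
  }
}
```

Returning a separate handled flag keeps "no signal" distinct from any valid signal number.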

Non-interrupt exception handling is always architecture-dependent. It is just a matter of how you organize it. In general, the most sensible way to deal with non-interrupt exceptions is to log the error somehow and terminate the system. The mapping to signals is a bit of a special case if you ask me. My preferred way to handle non-interrupt exceptions would be to
1. switch to a dedicated stack
2. save the complete register set to the CPU exception frame
3. call the fatal error extensions with RTEMS_FATAL_SOURCE_EXCEPTION and the CPU exception frame (with interrupts disabled)
Add a new API to query/alter the CPU exception frame, switch to the stack indicated by the CPU exception frame, and restore the context stored in the CPU exception frame. With this architecture-dependent CPU exception frame support, it should be possible to implement a high level mapper to signals.
What you've described is basically what is happening here (the dedicated stack is currently the interrupt/exception stack on AArch64), but the low level details are necessarily contained within the CPU port in patch 3/5. Support for this framework is not required for any CPU port, but CPU ports that do support it repurpose the existing code underlying the fatal error extensions with the additional support you described above.
I don't think that looking at existing code is the right thing to do. The exception handling is too diverse in RTEMS. We should think about what a clean design should look like.

I repurposed the existing code in the AArch64 CPU port because it happened to do part of what was needed as you listed just above. This may not be a perfectly clean design, but it's cleaner than what currently exists for recoverably handling machine exceptions. What currently exists is: hooking the exception vector(s) with one-off assembly for each platform and exception type.

This does not exist in parallel to the fatal error extensions, but rather the fatal error extensions are moved on top of the Exception Manager for CPU ports that support it. The Exception Manager returns whether the exception was handled and the CPU port then calls the fatal error extensions if the exception wasn't handled. With this patch set, only an accessor was added to get the exception class, but my initial thoughts included manipulation of the execution address and several other more generic manipulators.
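The ordering described here can be sketched roughly as follows. This is an illustration with stand-in types, not the code from the patch set; every `demo_` and `DEMO_` name is hypothetical, and the thread dispatch and frame restoration steps are reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU exception frame. */
typedef struct {
  int exception_class;
} demo_exception_frame;

/* A handler returns true only if it fully corrected the cause. */
typedef bool (*demo_exception_handler)(demo_exception_frame *frame);

#define DEMO_MAX_HANDLERS 4

static demo_exception_handler demo_handlers[DEMO_MAX_HANDLERS];
static size_t demo_handler_count;
static bool demo_fatal_called;

/* Handlers are opt-in: with none installed, every exception is fatal. */
static bool demo_install_handler(demo_exception_handler handler)
{
  assert(handler != NULL);

  if (demo_handler_count == DEMO_MAX_HANDLERS) {
    return false;
  }

  demo_handlers[demo_handler_count++] = handler;
  return true;
}

/* Exception manager step: true if some handler claims the exception. */
static bool demo_exception_manager_handle(demo_exception_frame *frame)
{
  for (size_t i = 0; i < demo_handler_count; ++i) {
    if ((*demo_handlers[i])(frame)) {
      return true;
    }
  }

  return false;
}

/* Stand-in for invoking the fatal error extensions. */
static void demo_fatal(demo_exception_frame *frame)
{
  (void) frame;
  demo_fatal_called = true;
}

/* CPU port entry: the fatal path runs only for unhandled exceptions. */
static void demo_cpu_port_exception(demo_exception_frame *frame)
{
  if (!demo_exception_manager_handle(frame)) {
    demo_fatal(frame);
    return;
  }

  /* A real port would do thread dispatch and restore the frame here. */
}

/* Example handler that claims only exception class 1. */
static bool demo_handle_class_one(demo_exception_frame *frame)
{
  return frame->exception_class == 1;
}
```

With no handler installed, `demo_cpu_port_exception()` always reaches the fatal path. That matches the default behaviour for applications that never request a mapping.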
If a non-interrupt exception occurs, the default behaviour should be to terminate the system as robustly and safely as possible. Raising signals should be optional and not make the exception handling less robust. The support for the signals should also not lead to dead code in the default case. This is why I proposed a two step approach. The first step is a normal fatal error handler. The second step is a resume of normal multitasking in a special signal fatal error extension using an architecture-specific "jump" which is defined by the CPU exception frame.

I don't think I understand how a signal could be sent to the runtime while simultaneously shutting down the system since system shutdown would necessarily occur before the signal could be sent in thread dispatch.

As things are currently set up, the signal mapping hook is only installed if the application specifically requests it and is off by default. The average application will see no change to exception handling since it does not request the mapping and there are no default recoverable exception handlers.

If fatal error handlers run first, assumptions are made that violate the ability to resume processing because they are specifically fatal handlers.



Kinsey

_______________________________________________
devel mailing list
[email protected]
http://lists.rtems.org/mailman/listinfo/devel
