The patches in the following mails are a significant rewrite of the MCA/INIT handlers. At this stage they are for review, not for inclusion in the ia64 tree.

Some background might be useful. The current MCA/INIT handlers have several shortcomings :-

(1) Only one MCA stack, so we cannot handle concurrent MCA on multiple cpus.

(2) Only one INIT stack, for the monarch. Slave INIT events never get into the C code, which gives no data for the slave processes.

(3) The lack of slave INIT processing also means that some MCA events that could normally be recovered may turn into fatal events. If one or more cpus are spinning disabled when an MCA occurs then SAL will eventually hit the disabled cpus with a slave INIT event. Even if the MCA is recoverable (e.g. DBE in user space), the cpus that were hit by INIT are now dead, which makes MCA recovery pointless.

(4) A monarch INIT event assumes that it can use the existing stack. If the INIT was delivered while the cpu was in physical mode then the OS monarch handler gets a recursive error. Ditto if the kernel stack has overflowed.

(5) MCA and INIT stacks are completely non-standard. You cannot get a backtrace nor debug the MCA/INIT handlers. We even have a special entry point in the unwind code just for MCA/INIT. Only the kernel knows about that unwind routine; external code such as libunwind does not.

(6) The current code relies on getting data from the MCA/INIT record. If we hang trying to retrieve that record then we get no useful data. A side effect of using the MCA/INIT record is that we may read a record from an earlier event, since it may not have been cleared when a second event occurs.

(7) Some horrible assembler code in minstate.h, to handle both the normal stacks and the non-standard MCA/INIT stacks.

(8) Only one copy of the SAL to OS state, which prevents multiple cpus from returning to SAL.

My MCA/INIT rewrite addresses these problems by :-

(1) Using per cpu MCA stacks.

(2) Using per cpu INIT stacks.

(3) Using a common code path for both monarch and slave INIT events, passing in a flag to indicate if the event is monarch or slave.

(4) Neither MCA nor INIT will use any part of the current stack until they have verified that it is safe to do so.

(5) MCA/INIT stacks look like normal process stacks. I can even get a backtrace through the MCA/INIT handlers :). This removes the need for the special unwind routine.

(6) All data is obtained from PAL/SAL data areas. There is no need to call SAL to get the record, and the problem of stale data goes away.

(7) minstate.h is now all virtual mode code.

(8) Each cpu gets its own copy of the SAL to OS state.

The original plan was to treat an MCA/INIT as an interrupt that switched stacks, even if a cpu was already using a kernel stack. However that caused problems with the notion of "current", mainly because the task structure is stored in the stack area. Separating the task structure from the rest of the stack was vetoed on performance grounds, because it would require extra TLB entries. This plan would also have required changes to unwinders, both in the kernel and in external packages such as lcrash.

Plan B involves switching to the MCA/INIT stacks, making them look like normal processes with no dependency on data in other stacks. The process that was running at the time of MCA/INIT is converted to look like a sleeping task, complete with its state at the time of interrupt. The MCA/INIT stack has a pointer to the interrupted task; in addition the pid of the interrupted task is placed in the 'comm' field of the MCA/INIT process for humans to read. This approach does not require extra TLBs and it works with the existing unwind code. The only downside is that it requires two small hooks in the scheduler code to adjust the scheduler's notion of "this process is on this cpu".

The following 7 patches contain :-

1) Scheduler hooks to change which process is deemed to be on a cpu.

2) Add an extra thread_info flag to indicate the special MCA/INIT stacks. Mainly for debuggers.

3) The bulk of the change. Use per cpu MCA/INIT stacks. Change the SAL to OS state (sos) to be per process. Do all the assembler work on the MCA/INIT stacks, leaving the original stack alone. Pass per cpu state data to the C handlers for MCA and INIT, which also means changing the mca_drv interfaces slightly. Lots of verification on whether the original stack is usable before converting it to a sleeping process.

4) Remove the physical mode path from minstate.h.

5) Align the stack for the initial task to be the same alignment as all other process stacks. Otherwise the validation code needs special cases for the initial task, which is currently only page aligned.

6) Delete the special case unwind code that was only used by the old MCA/INIT handler.

7) Turn off PAL halt. For some reason, INIT that is delivered while the cpu is in PAL halt gets corrupt registers on return from the INIT handler. I am still investigating this; for now skip the PAL halt.

Patches are against 2.6.12-rc4, but they should fit rc6.

TODO:

Although we could theoretically handle concurrent MCA with these patches, MCA is still single threaded by ia64_mca_serialize. It is not clear what our model should be for handling concurrent MCA on multiple cpus; some discussion is required first.

Not all state is preserved over MCA/INIT and the return to the previous task. In particular the interrupt registers are not preserved. No big deal, it is just a matter of verifying the save/restore state of every register. This should be fixed in the next iteration.

Now that MCA/INIT is recoverable, we will have to address the SCSI timeouts that occur if interrupts are disabled for long periods. MCA can disable interrupts for up to 20 seconds while it does the rendezvous. On resume, the timer code tries to bring jiffies in sync with itc, so time runs too fast and we get spurious timeouts. There is no point in recovering from MCA if the disk dies as a side effect of the lost interrupts.
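
To illustrate why patch 5) aligns the initial task like every other process stack, here is a minimal user-space sketch of the kind of check a handler could make before trusting an interrupted stack pointer. It is not the code from the patches. Every name in it (sketch_area_base, sketch_stack_usable, SKETCH_STACK_SIZE, SKETCH_TASK_RESERVED), the 32K area size, the 4K reserved region and the 16-byte stack pointer alignment are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed size of one stack area (task structure plus kernel stack). */
#define SKETCH_STACK_SIZE ((uintptr_t)32 * 1024)
/* Assumed space at the bottom of the area taken by the task structure. */
#define SKETCH_TASK_RESERVED ((uintptr_t)4 * 1024)

static_assert((SKETCH_STACK_SIZE & (SKETCH_STACK_SIZE - 1)) == 0,
              "masking only works for a power-of-two area size");

/* Round a stack pointer down to the start of its stack area.  This is
 * only correct if every area starts on a SKETCH_STACK_SIZE boundary. */
static uintptr_t sketch_area_base(uintptr_t sp)
{
    return sp & ~(SKETCH_STACK_SIZE - 1);
}

/* Return true if sp plausibly points into a whole stack area that lies
 * inside the known-good memory range [mem_lo, mem_hi). */
static bool sketch_stack_usable(uintptr_t sp, uintptr_t mem_lo,
                                uintptr_t mem_hi)
{
    uintptr_t base = sketch_area_base(sp);

    /* Reject a null or misaligned sp (16-byte alignment is assumed). */
    if (sp == 0 || (sp & 15) != 0)
        return false;
    /* The whole area must fit inside the known-good range. */
    if (base < mem_lo || base > mem_hi || mem_hi - base < SKETCH_STACK_SIZE)
        return false;
    /* sp must not point into the task structure at the bottom. */
    return sp - base >= SKETCH_TASK_RESERVED;
}
```

With these assumed sizes, an area that starts at 0x104000 is not a multiple of 32K, so masking a stack pointer of 0x10A000 inside it yields 0x108000 rather than the real start. That is the kind of special case that a common alignment for all stacks avoids.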