>> Maybe the other difference in approach between Andy and me is whether to
>> go for a solution that covers all the corner cases, or just make an
>> incremental improvement that allows for recovery in some useful subset of
>> remaining fatal cases, but still dies in other cases.
>
> Does that mean more core code surgery?

Yes. I need to look at other user access inside pagefault_disable/enable()
as likely spots where the code may continue after a machine check and retry
the access. So expect some more "if (ret == ENXIO) { do something to give up
gracefully }"

>> I'm happy to replace error messages with ones that are more descriptive and
>> helpful to humans.
>
> Yap, that: "Multiple copyin" with something more understandable to users...

I'll work on it. We tend not to have essay length messages as panic() strings.
But I can add a comment in the code there so that people who grep whatever
panic message we choose can get more details on what happened and what to do.

-Tony