On Thu, Sep 19, 2019 at 12:37:40PM -0400, Boris Ostrovsky wrote:
> On 9/19/19 12:14 PM, James Dingwall wrote:
> > On Thu, Sep 19, 2019 at 03:51:33PM +0000, Luck, Tony wrote:
> >>> I have been investigating a regression in our environment where pstore 
> >>> (efi-pstore specifically but I suspect this would affect all 
> >>> implementations) no longer works after upgrading from a 4.4 to 5.0 
> >>> kernel when running under xen.  (This is an Ubuntu kernel but I don't 
> >>> think there are patches which affect this area.)
> >> I don't have any answer for this ... but want to throw out the idea that
> >> VMM systems could provide some hypercalls to guests to save/return
> >> some blob of memory (perhaps the "save" triggers automagically if the
> >> guest crashes?).
> >>
> >> That would provide a much better pstore back end than relying on emulation
> >> of EFI persistent variables (which have severe constraints on size, and
> >> don't support some pstore modes because you can't dynamically update EFI
> >> variables hundreds of times per second).
> >>
> > For clarification this is a dom0 crash rather than an HVM guest with EFI.  I
> > should probably have also mentioned the xen version has changed from 4.8.4 to
> > 4.11.2 in case its behaviour on detection of a crashed domain has changed.
> >
> > (For capturing guest crashes we have enabled xenconsole logging so the
> > hvc0 log is available in dom0.)
> 
> 
> Do you only see this difference between 4.4 and 5.0 when you crash via
> sysrq?
> 
> Because that's where things changed. On 4.4 we seem to be forcing an
> oops, which eventually calls kmsg_dump() and then panic. On 5.0 we call
> panic() directly from sysrq handler. And because Xen's panic notifier
> doesn't return we never get a chance to call kmsg_dump().
> 

Ok, I see that change in 8341f2f222d729688014ce8306727fdb9798d37e.  I
hadn't tested it any other way before.  Using the null pointer
dereference module code at [1], a pstore record is generated as expected
when the module is loaded (with panic_on_oops=1).

I have also tested swapping the order of the kmsg_dump() and
atomic_notifier_call_chain() calls in panic.c, and this also results in
a pstore record being created with sysrq-c.  I don't know if that would
be an acceptable solution though, since it may break behaviour that other
things depend on.
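
To illustrate the ordering issue, here is a userspace toy model in C.  It
is only a sketch and not kernel code: model_panic(), model_kmsg_dump() and
model_xen_panic_notifier() are hypothetical names made up for this example,
and longjmp() merely stands in for a panic notifier that never returns to
its caller.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Userspace toy model only; all names here are hypothetical. */
static jmp_buf machine_halted;      /* stands in for "control never comes back" */
static bool pstore_record_written;  /* set when the modelled dump step runs */

/* Models kmsg_dump(): the step at which pstore would save the log. */
static void model_kmsg_dump(void)
{
    pstore_record_written = true;
}

/* Models a panic notifier that does not return to panic(). */
static void model_xen_panic_notifier(void)
{
    longjmp(machine_halted, 1);
}

/*
 * Runs the two steps in the requested order and reports whether the
 * dump step was reached before the notifier took control away.
 */
bool model_panic(bool dump_before_notifiers)
{
    pstore_record_written = false;

    if (setjmp(machine_halted) == 0) {
        if (dump_before_notifiers) {
            model_kmsg_dump();
            model_xen_panic_notifier();
        } else {
            model_xen_panic_notifier();
            model_kmsg_dump();  /* never reached in this model */
        }
    }
    return pstore_record_written;
}

/* Small self-check covering both orderings. */
void model_self_check(void)
{
    assert(model_panic(true));   /* dump first: record written */
    assert(!model_panic(false)); /* notifier first: no record */
}
```

In this model, calling the notifier first means the dump step is never
reached, which matches the behaviour seen with sysrq-c; dumping first lets
the record be written before control is lost.  It says nothing about
whether other notifiers depend on running before kmsg_dump().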

James

[1] http://ubuntu.5.x6.nabble.com/How-To-Cause-An-Oops-td3681145.html
