On 24/06/2016 12:42, Mark Cave-Ayland wrote:
> On 24/06/16 07:36, Paolo Bonzini wrote:
> 
>> On 24/06/2016 05:57, Richard Henderson wrote:
>>>
>>> Whatever happens, it happens after 10GB of logs, which is simply too
>>> much to sift through.  I've tried to narrow it down, but the lack of a
>>> hardware TLB refill means that we get hundreds of thousands of Data
>>> Access Faults that are simply TLB misses and not the actual Segmentation
>>> Fault in question.
>>>
>>> It doesn't seem to affect other OSes, so I can't imagine what quirk is
>>> being exercised in this case.
>>>
>>> As loath as I am to suggest it, we may have to revert the sparc indirect
>>> register patch for the release.
>>
>> We have more than a month.  If it's reproducible, it can be fixed. :)
>>
>>> I'm now pinging the rest of my sparc improvements patchset.  It's
>>> completely independent of the use of indirect registers.
>>
>> Mark, perhaps you can try to use migration to reduce the amount of
>> logging?  (Start QEMU with -snapshot, try to stop the VM before it
>> fails.  If you succeed, do a "migrate exec:cat>foo.sav" followed by
>> "commit"; if you fail, try again).
> 
> Yeah, given the improvements that Richard has made, I'd prefer not to
> revert if at all possible. I finally have some spare time today, so
> I'll try to get this down to an easily testable qcow2 image that can
> reproduce the issue.
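
To spell out the monitor sequence I suggested above: with the HMP
monitor (e.g. -monitor stdio), it would look roughly like this
("foo.sav" is just a placeholder name):

  (qemu) stop                          # pause the guest before it faults
  (qemu) migrate "exec:cat > foo.sav"  # save the machine state to a file
  (qemu) commit                        # write the -snapshot overlay back,
                                       # so the disk matches the saved state

The run can then be resumed from that point by starting QEMU with
-incoming "exec:cat foo.sav".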

I've gotten an image that reaches the segmentation fault in about 1
second, but I cannot upload it anywhere in the next few hours.  The good
news is that it fails even without a hard disk (so it's a stateless VM)
and with -d nochain -singlestep.  The bad news is that the dump is not
very deterministic and that I failed to create images closer to the
failure.
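
For concreteness, the kind of command line I mean is roughly this (a
sketch only: I'm assuming the default sun4m machine, and "foo.sav" is
the placeholder name from above, not the actual file):

  qemu-system-sparc -nographic \
      -d nochain -singlestep \
      -incoming "exec:cat foo.sav"

-d nochain disables direct block chaining and -singlestep limits each
translation block to a single instruction, which keeps the execution
easy to correlate with the log.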

Paolo
