On 24/06/2016 12:42, Mark Cave-Ayland wrote:
> On 24/06/16 07:36, Paolo Bonzini wrote:
>
>> On 24/06/2016 05:57, Richard Henderson wrote:
>>>
>>> Whatever happens, it happens after 10GB of logs, which is simply too
>>> much to sift through. I've tried to narrow it down, but the lack of a
>>> hardware tlb refill means that we get hundreds of thousands of Data
>>> Access Faults that are simply TLB misses and not the actual Segmentation
>>> Fault in question.
>>>
>>> It doesn't seem to affect other OSes, so I can't imagine what quirk is
>>> being exercised in this case.
>>>
>>> As loath as I am to suggest it, we may have to revert the sparc indirect
>>> register patch for the release.
>>
>> We have more than a month. If it's reproducible, it can be fixed. :)
>>
>>> I do now ping the rest of my sparc improvements patchset. It's
>>> completely independent of the use of indirect registers.
>>
>> Mark, perhaps you can try to use migration to reduce the amount of
>> logging? (Start QEMU with -snapshot, try to stop the vm before it
>> fails. If you succeed, do a "migrate exec:cat>foo.sav" followed by
>> "commit"; if you fail, try again).
>
> Yeah, given the improvements that Richard has made, I'd prefer not to
> revert if at all possible. Finally I have some spare time today so I'll
> try and get this down to an easily-testable qcow2 image that can
> reproduce the issue.
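For reference, the snapshot/migrate trick suggested above boils down to
roughly the following monitor commands (foo.sav and the sparc64 binary
name are just placeholders, adjust for the actual guest):

    (qemu) stop
    (qemu) migrate exec:cat>foo.sav
    (qemu) commit

and then restarting from a point closer to the failure with something like

    qemu-system-sparc64 ... -incoming "exec:cat foo.sav"

The idea is that "commit" flushes the -snapshot overlay into the base
image, so each retry starts from a later disk state and the log before
the crash shrinks.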
I've gotten an image that reaches the segmentation fault in about 1
second, but I cannot upload it anywhere in the next few hours. The good
news is that it fails even without a hard disk (so it's a stateless VM)
and with -d nochain -singlestep. The bad news is that the dump is not
very deterministic and that I failed to create images closer to the
failure.

Paolo
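For anyone wanting to try once the image is up, the repro invocation
would look roughly like the following (the binary name and the trailing
options are placeholders for whatever the image needs):

    qemu-system-sparc64 -d nochain -singlestep -D qemu.log ...

-singlestep puts one guest instruction per translation block and
nochain disables direct block chaining, so execution is as close to
deterministic as TCG gets and the -D log stays in execution order.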