On 24/06/2016 05:57, Richard Henderson wrote:
>
> Whatever happens, it happens after 10GB of logs, which is simply too
> much to sift through. I've tried to narrow it down, but the lack of a
> hardware tlb refill means that we get hundreds of thousands of Data
> Access Faults that are simply TLB misses and not the actual Segmentation
> Fault in question.
>
> It doesn't seem to affect other OSes, so I can't imagine what quirk is
> being exercised in this case.
>
> As loath as I am to suggest it, we may have to revert the sparc indirect
> register patch for the release.

We have more than a month. If it's reproducible, it can be fixed. :)

> I do now ping the rest of my sparc improvements patchset. It's
> completely independent of the use of indirect registers.

Mark, perhaps you can try to use migration to reduce the amount of
logging? (Start QEMU with -snapshot, try to stop the vm before it
fails. If you succeed, do a "migrate exec:cat>foo.sav" followed by
"commit"; if you fail, try again).

It would be nice to have a mechanism to stop the VM after executing N
basic blocks. Binary search on this value then can help with coming up
with a more easily debuggable snapshot, possibly to a point where the
difference between pre-patch and post-patch becomes deterministic.

Paolo
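
A rough sketch of the binary search over N mentioned above, in Python. QEMU has no "stop after N basic blocks" mechanism as of this mail, so `fails_within` is a purely hypothetical callback standing in for "run the guest for at most n basic blocks and report whether the fault was hit". It also assumes the failure is deterministic and monotonic in n, which may not hold here.

```python
from typing import Callable


def first_failing_block_count(fails_within: Callable[[int], bool],
                              upper: int) -> int:
    """Return the smallest n in [1, upper] for which fails_within(n) is True.

    Assumptions (hypothetical, not an existing QEMU interface):
      - fails_within(n) runs the guest for at most n basic blocks and
        returns True if the fault was observed;
      - the result is monotonic: once it fails for n, it fails for any m >= n;
      - fails_within(upper) is True.
    """
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if fails_within(mid):
            hi = mid        # fault already seen: the answer is mid or earlier
        else:
            lo = mid + 1    # no fault yet: the answer is after mid
    return lo


if __name__ == "__main__":
    # Stand-in for a real run: pretend the fault shows up at block 123456.
    fake_guest = lambda n: n >= 123456
    print(first_failing_block_count(fake_guest, 1_000_000))
```

Stopping just before the returned count and saving a snapshot there would leave only a short run to log.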