On 26/02/2019 19:03, Mark Cave-Ayland wrote:
> On 26/02/2019 09:24, Alex Bennée wrote:
>
>>> Presumably the issue here is somehow related to the compiler incorrectly
>>> extending/reducing the shift when the larger type is involved? Also during
>>> my tests the visual corruption was only present for 32-bit accesses, but
>>> presumably all the access sizes will need a similar fix.
>>
>> So I did fix this in the third patch when I out-of-lined the unaligned
>> helpers, so:
>>
>>     const tcg_target_ulong size_mask = MAKE_64BIT_MASK(0, size * 8);
>>
>> and
>>
>>     /* Big-endian combine. */
>>     r2 &= size_mask;
>>
>> or
>>
>>     /* Little-endian combine. */
>>     r1 &= size_mask;
>>
>> I guess I should also apply that to patch 1 for bisectability.
>
> I've done that locally, and while things have improved with progress bars I'm
> still seeing some strange blank fills appearing on the display in MacOS. I'll
> dig further and see what's going on...
Okay, I've found it: it looks like you also need to apply size_mask to the
final result to keep it within the number of bits represented by size:

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 7729254424..73710f9b9c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1274,11 +1274,11 @@ static tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
     if (big_endian) {
         /* Big-endian combine. */
         r2 &= size_mask;
-        res = (r1 << shift) | (r2 >> ((size * 8) - shift));
+        res = ((r1 << shift) | (r2 >> ((size * 8) - shift))) & size_mask;
     } else {
         /* Little-endian combine. */
         r1 &= size_mask;
-        res = (r1 >> shift) | (r2 << ((size * 8) - shift));
+        res = ((r1 >> shift) | (r2 << ((size * 8) - shift))) & size_mask;
     }
     return res;
 }

I've now incorporated this into your original patchset, rebased, and pushed
the result to https://github.com/mcayland/qemu/tree/alex-softmmu for you to
grab and test.

Hopefully this version will now pass Emilio's tests too: I don't have a
benchmark setup in place, so it's worth testing to make sure that my fixes
haven't introduced any performance regressions.

Something else I noticed is that patch 3 removes the extra victim TLB fill
from the unaligned access path in store_helper() but doesn't mention it in
the commit message - is this deliberate?

ATB,

Mark.