Chris Wilson <ch...@chris-wilson.co.uk> writes:

> Quoting Chris Wilson (2018-04-05 20:54:54)
> > Quoting Scott D Phillips (2018-04-03 21:05:42)
[...]
> > Ok, was hoping to see how you choose to use the streaming load, but I
> > guess that's the next patch.
> >
> > Reviewed-by: Chris Wilson <ch...@chris-wilson.co.uk>
>
> Oh, one point Eric Anholt mentioned on another thread about movntdqa is
> that stale data inside the internal buffer is not automatically
> invalidated. We may need to emit explicit mfence before the copies if we
> are in doubt. A single mfence per tiled-copy is probably not enough to
> worry about optimising away.

Looking around, I found this erratum about movntdqa not honoring the
ordering guarantees of locked instructions (VLP31 in the pdf):

https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-n3520-j2850-celeron-n2920-n2820-n2815-n2806-j1850-j1750-spec-update.pdf

So I added this code near the top of tiled_to_linear():

   if (mem_copy == (mem_copy_fn)_mesa_streaming_load_memcpy) {
      /* Various Atom processors have errata where the movntdqa instruction
       * (which is used in streaming_load_memcpy) may incorrectly be reordered
       * before locked instructions. To work around that, we put an lfence
       * here to manually wait for preceding loads to be completed.
       */
      __builtin_ia32_lfence();
   }

By my hazy understanding, an mfence won't suffice here, because the
erratum says the workaround needs an lfence. Do I have that right, or
should this be an mfence?
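For anyone who wants to poke at the fence placement outside of Mesa, here is a minimal standalone sketch of the same pattern. It is not the Mesa code: fenced_streaming_copy() is a hypothetical helper, the 16-byte alignment and length requirements are assumptions made to keep it short, and the memcpy() fallback is only there so it builds and runs on machines without SSE4.1. Also note that on ordinary write-back memory movntdqa behaves like a normal load, so this only checks that the copy is correct; it cannot reproduce the erratum or settle the lfence vs. mfence question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>   /* SSE4.1: _mm_stream_load_si128 (movntdqa) */
#define HAVE_STREAMING_LOAD 1
#else
#define HAVE_STREAMING_LOAD 0
#endif

/* Hypothetical helper (not part of Mesa): copy len bytes from src to dst
 * using streaming loads, with one lfence issued before the first load.
 * Assumes src is 16-byte aligned and len is a multiple of 16.
 */
#if HAVE_STREAMING_LOAD
__attribute__((target("sse4.1")))
#endif
void
fenced_streaming_copy(void *dst, const void *src, size_t len)
{
   assert(((uintptr_t)src & 15) == 0 && (len & 15) == 0);

#if HAVE_STREAMING_LOAD
   if (__builtin_cpu_supports("sse4.1")) {
      /* Same instruction as __builtin_ia32_lfence() in the snippet above:
       * wait for all earlier loads to complete before the movntdqa loads.
       */
      _mm_lfence();

      for (size_t i = 0; i < len; i += 16) {
         /* movntdqa needs an aligned source; the store may be unaligned. */
         __m128i v = _mm_stream_load_si128((__m128i *)((const char *)src + i));
         _mm_storeu_si128((__m128i *)((char *)dst + i), v);
      }
      return;
   }
#endif

   /* Fallback for CPUs or architectures without SSE4.1. */
   memcpy(dst, src, len);
}
```

The fence is issued once per call rather than once per 16-byte load, which matches the "single fence per tiled-copy" cost discussed above.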