On 09/13/2018 06:52 PM, Dylan Baker wrote:
Quoting D Scott Phillips (2018-09-13 08:28:29)
Tapani Pälli <tapani.pa...@intel.com> writes:
From: D Scott Phillips <d.scott.phill...@intel.com>
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
v3: add Android build support (Tapani)
Reviewed-by: Chris Wilson <ch...@chris-wilson.co.uk>
Reviewed-by: Matt Turner <matts...@gmail.com>
Acked-by: Kenneth Graunke <kenn...@whitecape.org>
---
src/mesa/drivers/dri/i965/Android.mk | 22 +++++++++
src/mesa/drivers/dri/i965/Makefile.am | 7 +++
src/mesa/drivers/dri/i965/Makefile.sources | 6 ++-
src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 62 ++++++++++++++++++++++++++
src/mesa/drivers/dri/i965/meson.build | 18 ++++++--
5 files changed, 110 insertions(+), 5 deletions(-)
.. snip ..
diff --git a/src/mesa/drivers/dri/i965/Makefile.am
b/src/mesa/drivers/dri/i965/Makefile.am
index 0afa7a2f216..d9e06930d38 100644
--- a/src/mesa/drivers/dri/i965/Makefile.am
+++ b/src/mesa/drivers/dri/i965/Makefile.am
@@ -92,8 +92,14 @@ libi965_gen11_la_CFLAGS = $(AM_CFLAGS) -DGEN_VERSIONx10=110
noinst_LTLIBRARIES = \
libi965_dri.la \
+ libintel_tiled_memcpy.la \
$(I965_PERGEN_LIBS)
+libintel_tiled_memcpy_la_SOURCES = \
+ $(intel_tiled_memcpy_FILES)
+libintel_tiled_memcpy_la_CFLAGS = \
+ $(AM_CFLAGS) $(SSE41_CFLAGS)
+
The issue here is that SSE41_CFLAGS includes -msse4.1, which (1) allows
us to use sse4.1 intrinsics and (2) allows the compiler to use sse4.1
instructions in whatever way it wants.
1 is the desired behavior here and 2 is an unfortunate side-effect. The
intrinsics we use are properly guarded by runtime checks so they are
only exercised on systems with support. The other uses of sse4.1 by the
compiler outside the intrinsics is unpredictable. When I made this
change there actually were none, which is why everything worked. But the
compiler has permission to change its mind at any point later.
The sse4.1 code needs isolated from everything else somehow, either
split into separate files or compile the same file multiple times with
different flags, or something.
The way we use sse4.1 in mesa core (the only place I know of that we use it) is
to compile a separate static library using the sse4.1 flags.
OK, I'll try isolating functions requiring sse41 in to own library and
link against that. TBH I'm not 100% sure how this will work right with
all the flatten + inlining used, the surrounding code needs to be
compiled with sse41 as well to be able to always inline a function that
uses intrinsics. Perhaps we need to have full blown
intel_tiled_memcpy_sse41 and intel_tiled_memcpy and then choose one or
another based on support.
// Tapani
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev