According to the Intel Software Development Manual:

   Streaming loads may be weakly ordered and may appear to software to
   execute out of order with respect to other memory operations.
   Software must explicitly use fences (e.g. MFENCE) if it needs to
   preserve order among streaming loads or between streaming loads and
   other memory operations.

That is, a memory fence is needed to preserve the order between the GPU
writing the buffer and the streaming loads reading it back.

Reported-by: Joseph Nuzman <joseph.nuz...@intel.com>
---
 src/mesa/main/streaming-load-memcpy.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/main/streaming-load-memcpy.c 
b/src/mesa/main/streaming-load-memcpy.c
index d7147af..32854b6 100644
--- a/src/mesa/main/streaming-load-memcpy.c
+++ b/src/mesa/main/streaming-load-memcpy.c
@@ -54,16 +54,19 @@ _mesa_streaming_load_memcpy(void *restrict dst, void 
*restrict src, size_t len)
 
       memcpy(d, s, MIN2(bytes_before_alignment_boundary, len));
 
       d = (char *)ALIGN((uintptr_t)d, 16);
       s = (char *)ALIGN((uintptr_t)s, 16);
       len -= MIN2(bytes_before_alignment_boundary, len);
    }
 
+   if (len >= 64)
+      _mm_mfence();
+
    while (len >= 64) {
       __m128i *dst_cacheline = (__m128i *)d;
       __m128i *src_cacheline = (__m128i *)s;
 
       __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
       __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
       __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
       __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3);
-- 
2.4.9

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to