Hello,

I have found several invalid memory accesses when running
dEQP-GLES31.functional.ssbo.* tests on i965 driver (and gen7+). That
invalid memory accesses were unluckily happening when generating the
assembly instructions for SSBO stores for different compute shaders.

However it looks like this problem could happen to other shaders and
situations. Because of that, I am going to explain the problem here:

When generating a untyped surface write/read, i965 driver will end up
calling brw_send_indirect_message() through
brw_send_indirect_surface_message(). At brw_send_indirect_message()'s
'else' branch, the code generates a load of the indirect descriptor to
an address register using an OR instruction and it also generates a new
SEND instruction; if this case happens, the OR instruction is returned.
brw_send_indirect_surface_message() uses that OR instruction to set mlen
and rlen's descriptor bits later.

Just to give more context, when generating instructions in fs/vec4
generators, i965 driver uses pointers to elements in the 'store' table
inside struct brw_codegen. That table has an initial size of 1024 but,
when it's full, it is resized (doubling its size each time,
see brw_next_insn()). This resize operation ends up calling
realloc(). However the returned pointer by realloc() could be different
and the old allocated memory would be free'd as part of the process.

Back to the issue, if the p->store's resize happens when we get the
pointer to the SEND instruction at brw_send_indirect_message()'s 'else'
branch, we could have the following problem:

The realloc() returns a new pointer and *free'd* the old allocation, so
the pointer we previously saved for the OR instruction at
brw_send_indirect_surface_message() becomes invalid (because it is a
pointer of the old allocation). Then, we access to that invalid pointer
when setting up rlen/mlen bits at the end of
brw_send_indirect_surface_message() and we would have undefined results.

This issue is quite unlikely to happen but it is reproducible on
~120 dEQP-GLES31.functional.ssbo.* tests, basically because they have
the same shaders except the buffer variable's data type. Those tests
were failing intermittently at different rates but valgrind helped to
find what was happening.

I would like to expose publicly the problem and analyse possible
solutions for it along with the community. For the time being, a patch
is proposed to mitigate this issue, which is sent as a reply to this
one.

What this work-around patch does is: book enough room for one or more
consecutive SEND* instructions at the end of the p->store table in order
to avoid reallocating p->store in the aforementioned case. The 32 value
was chosen arbitrary because it has low impact in p->store
(its initial size is 1024) and makes this issue much less likely to
happen. We could tune this number to a less conservative value if
needed. If you want to test it, that patch should be applied on top of
this Curro's patch [0] as it fixes a lot of compute shader compilation
errors (~700 dEQP-GLES31.functional.ssbo.* tests). I have setup a branch
with both patches in [1].

Feel free to comment about other solutions, ideas, opinions, etc.

Thanks,

Sam

[0] See attachment at
http://lists.freedesktop.org/archives/mesa-dev/2015-October/097183.html
[1] $ git clone -b dEQP-functional-ssbo-fixes-v1 \
https://github.com/Igalia/mesa.git

Samuel Iglesias Gonsalvez (1):
  i965: book space at the end of p->store for SEND opcodes to avoid
    invalid memory access

 src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

-- 
2.1.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to