Hello, I have found several invalid memory accesses when running dEQP-GLES31.functional.ssbo.* tests on i965 driver (and gen7+). That invalid memory accesses were unluckily happening when generating the assembly instructions for SSBO stores for different compute shaders.
However it looks like this problem could happen to other shaders and situations. Because of that, I am going to explain the problem here: When generating a untyped surface write/read, i965 driver will end up calling brw_send_indirect_message() through brw_send_indirect_surface_message(). At brw_send_indirect_message()'s 'else' branch, the code generates a load of the indirect descriptor to an address register using an OR instruction and it also generates a new SEND instruction; if this case happens, the OR instruction is returned. brw_send_indirect_surface_message() uses that OR instruction to set mlen and rlen's descriptor bits later. Just to give more context, when generating instructions in fs/vec4 generators, i965 driver uses pointers to elements in the 'store' table inside struct brw_codegen. That table has an initial size of 1024 but, when it's full, it is resized (doubling its size each time, see brw_next_insn()). This resize operation ends up calling realloc(). However the returned pointer by realloc() could be different and the old allocated memory would be free'd as part of the process. Back to the issue, if the p->store's resize happens when we get the pointer to the SEND instruction at brw_send_indirect_message()'s 'else' branch, we could have the following problem: The realloc() returns a new pointer and *free'd* the old allocation, so the pointer we previously saved for the OR instruction at brw_send_indirect_surface_message() becomes invalid (because it is a pointer of the old allocation). Then, we access to that invalid pointer when setting up rlen/mlen bits at the end of brw_send_indirect_surface_message() and we would have undefined results. This issue is quite unlikely to happen but it is reproducible on ~120 dEQP-GLES31.functional.ssbo.* tests, basically because they have the same shaders except the buffer variable's data type. Those tests were failing intermittently at different rates but valgrind helped to find what was happening. I would like to expose publicly the problem and analyse possible solutions for it along with the community. For the time being, a patch is proposed to mitigate this issue, which is sent as a reply to this one. What this work-around patch does is: book enough room for one or more consecutive SEND* instructions at the end of the p->store table in order to avoid reallocating p->store in the aforementioned case. The 32 value was chosen arbitrary because it has low impact in p->store (its initial size is 1024) and makes this issue much less likely to happen. We could tune this number to a less conservative value if needed. If you want to test it, that patch should be applied on top of this Curro's patch [0] as it fixes a lot of compute shader compilation errors (~700 dEQP-GLES31.functional.ssbo.* tests). I have setup a branch with both patches in [1]. Feel free to comment about other solutions, ideas, opinions, etc. Thanks, Sam [0] See attachment at http://lists.freedesktop.org/archives/mesa-dev/2015-October/097183.html [1] $ git clone -b dEQP-functional-ssbo-fixes-v1 \ https://github.com/Igalia/mesa.git Samuel Iglesias Gonsalvez (1): i965: book space at the end of p->store for SEND opcodes to avoid invalid memory access src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) -- 2.1.4 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev