Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4
On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quirogawrote: > Right now we rely on the code at the bottom of brw_set_dest to set the > correct execution size for anything that does not operate on a full SIMD > register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, > where operands are twice as big and we see instructions with a horizontal > width of 4 that still require an execution size of 8. We cannot fix this by > simply checking the type of the operands involved and skip the automatic > execsize adjustment when they are doubles because we can also operate on > doubles as integers (for pack and unpack operations for example). Can you give an example of when checking the type wouldn't be sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do they generate? Could we look at the destination's stride as well? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4
On Thu, Dec 17, 2015 at 12:31 PM, Connor Abbottwrote: > On Thu, Dec 17, 2015 at 11:44 AM, Matt Turner wrote: >> On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga wrote: >>> Right now we rely on the code at the bottom of brw_set_dest to set the >>> correct execution size for anything that does not operate on a full SIMD >>> register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, >>> where operands are twice as big and we see instructions with a horizontal >>> width of 4 that still require an execution size of 8. We cannot fix this by >>> simply checking the type of the operands involved and skip the automatic >>> execsize adjustment when they are doubles because we can also operate on >>> doubles as integers (for pack and unpack operations for example). >> >> Can you give an example of when checking the type wouldn't be >> sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do >> they generate? Could we look at the destination's stride as well? > > Yes, packDouble2x32 is what would cause the same issue, since it turns > into two 32-bit MOV's with a destination stride of 2 (and offset of 0 > and 1, ofc). We could check for the stride, but if we really don't > want to change this code, the better way to go would be to avoid the > width adjustment that causes this whole thing to happen for > destination registers, since everything other than this piece of code > ignores the destination width and vstride. But that being said, this > code is a hack since it breaks the assumption that fs_inst::exec_size > always equals the final ExecSize of the instruction, and Iago and I > agreed to fix it (or at least for now, do the portion of the fix > needed for fp64) rather than work around it. Not sure if you guys are thinking about it, but a lot of these things that apply to doubles also apply to ARB_gpu_shader_int64. While you obviously don't need to support it now, you may want to keep it in mind as you make design decisions. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4
On Thu, Dec 17, 2015 at 11:44 AM, Matt Turnerwrote: > On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga wrote: >> Right now we rely on the code at the bottom of brw_set_dest to set the >> correct execution size for anything that does not operate on a full SIMD >> register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, >> where operands are twice as big and we see instructions with a horizontal >> width of 4 that still require an execution size of 8. We cannot fix this by >> simply checking the type of the operands involved and skip the automatic >> execsize adjustment when they are doubles because we can also operate on >> doubles as integers (for pack and unpack operations for example). > > Can you give an example of when checking the type wouldn't be > sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do > they generate? Could we look at the destination's stride as well? Yes, packDouble2x32 is what would cause the same issue, since it turns into two 32-bit MOV's with a destination stride of 2 (and offset of 0 and 1, ofc). We could check for the stride, but if we really don't want to change this code, the better way to go would be to avoid the width adjustment that causes this whole thing to happen for destination registers, since everything other than this piece of code ignores the destination width and vstride. But that being said, this code is a hack since it breaks the assumption that fs_inst::exec_size always equals the final ExecSize of the instruction, and Iago and I agreed to fix it (or at least for now, do the portion of the fix needed for fp64) rather than work around it. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4
Right now we rely on the code at the bottom of brw_set_dest to set the correct execution size for anything that does not operate on a full SIMD register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, where operands are twice as big and we see instructions with a horizontal width of 4 that still require an execution size of 8. We cannot fix this by simply checking the type of the operands involved and skip the automatic execsize adjustment when they are doubles because we can also operate on doubles as integers (for pack and unpack operations for example). Connor and I have been discussing this and we see two options: 1) We fix all the instructions that do not operate on doubles to set the correct execution size so they don't depend on the code in brw_set_dest any more, then remove that code from brw_set_dest so it does not break fp64. However, this involves a lot of changes throughout the driver and across all hardware platforms... it looks rather scary, so I think we probably don't want to go down this path. 2) Since we only produce double instructions with a width of 4, we can limit the change to these instructions. This means that we only automatically adjust execsize in brw_set_dest for instructions where dst.width < BRW_EXECUTE_4 and we fix driver code that emits non-double instructions with a width of 4 to set the execsize they need explicitly. For gen7/8 it seems that only a handful of changes are required for this according to piglit. This RFC series implements 2). As mentioned, piglit seems happy with the changes, but it might be a good idea to run this through Jenkins to make sure that we are not missing anything. My original idea was to expand the fix to cover other gens as well, but unfortunately, this would involve many more changes. For example, ILK's strip-and-fans or clipping modules are full with code that mixes width8 and width4 instructions (I have ~10 patches that fix this that involve changing the default execution size to 4 in the sf module), gen6 also needs a few changes and I would not be surprised if gen4 required more. Gen9 might require changes too. Another problem with this is that we don't have gen4/9 hardware available, so trying this would also require involvement from other people with access to this hardware. So considering that we are not targetting fp64 support on gens < 7, at least for now, it seems that implementing 2) but special-case it for gen7 and gen8 only, leaving other generations intact is a resoanble way to go for now. Opinions? Iago Toral Quiroga (5): i965/eu: set correct execution size in brw_NOP i965/fs: set execution size for SEND messages in generate_uniform_pull_constant_load_gen7 i965/eu: set execution size for SEND message in brw_send_indirect_message i965: set correct execsize for MOVS with a width of 4 in brw_find_live_channel i965: Skip execution size adjustment for instructions of width 4 src/mesa/drivers/dri/i965/brw_eu_emit.c| 21 - src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 ++ 2 files changed, 22 insertions(+), 1 deletion(-) -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev