Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4

2015-12-17 Thread Matt Turner
On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga  wrote:
> Right now we rely on the code at the bottom of brw_set_dest to set the 
> correct execution size for anything that does not operate on a full SIMD 
> register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, 
> where operands are twice as big and we see instructions with a horizontal 
> width of 4 that still require an execution size of 8. We cannot fix this by 
> simply checking the type of the operands involved and skip the automatic 
> execsize adjustment when they are doubles because we can also operate on 
> doubles as integers (for pack and unpack operations for example).

Can you give an example of when checking the type wouldn't be
sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do
they generate? Could we look at the destination's stride as well?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4

2015-12-17 Thread Ilia Mirkin
On Thu, Dec 17, 2015 at 12:31 PM, Connor Abbott  wrote:
> On Thu, Dec 17, 2015 at 11:44 AM, Matt Turner  wrote:
>> On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga  wrote:
>>> Right now we rely on the code at the bottom of brw_set_dest to set the 
>>> correct execution size for anything that does not operate on a full SIMD 
>>> register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, 
>>> where operands are twice as big and we see instructions with a horizontal 
>>> width of 4 that still require an execution size of 8. We cannot fix this by 
>>> simply checking the type of the operands involved and skip the automatic 
>>> execsize adjustment when they are doubles because we can also operate on 
>>> doubles as integers (for pack and unpack operations for example).
>>
>> Can you give an example of when checking the type wouldn't be
>> sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do
>> they generate? Could we look at the destination's stride as well?
>
> Yes, packDouble2x32 is what would cause the same issue, since it turns
> into two 32-bit MOV's with a destination stride of 2 (and offset of 0
> and 1, ofc). We could check for the stride, but if we really don't
> want to change this code, the better way to go would be to avoid the
> width adjustment that causes this whole thing to happen for
> destination registers, since everything other than this piece of code
> ignores the destination width and vstride. But that being said, this
> code is a hack since it breaks the assumption that fs_inst::exec_size
> always equals the final ExecSize of the instruction, and Iago and I
> agreed to fix it (or at least for now, do the portion of the fix
> needed for fp64) rather than work around it.

Not sure if you guys are thinking about it, but a lot of these things
that apply to doubles also apply to ARB_gpu_shader_int64. While you
obviously don't need to support it now, you may want to keep it in
mind as you make design decisions.

Cheers,

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4

2015-12-17 Thread Connor Abbott
On Thu, Dec 17, 2015 at 11:44 AM, Matt Turner  wrote:
> On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga  wrote:
>> Right now we rely on the code at the bottom of brw_set_dest to set the 
>> correct execution size for anything that does not operate on a full SIMD 
>> register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, 
>> where operands are twice as big and we see instructions with a horizontal 
>> width of 4 that still require an execution size of 8. We cannot fix this by 
>> simply checking the type of the operands involved and skip the automatic 
>> execsize adjustment when they are doubles because we can also operate on 
>> doubles as integers (for pack and unpack operations for example).
>
> Can you give an example of when checking the type wouldn't be
> sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do
> they generate? Could we look at the destination's stride as well?

Yes, packDouble2x32 is what would cause the same issue, since it turns
into two 32-bit MOV's with a destination stride of 2 (and offset of 0
and 1, ofc). We could check for the stride, but if we really don't
want to change this code, the better way to go would be to avoid the
width adjustment that causes this whole thing to happen for
destination registers, since everything other than this piece of code
ignores the destination width and vstride. But that being said, this
code is a hack since it breaks the assumption that fs_inst::exec_size
always equals the final ExecSize of the instruction, and Iago and I
agreed to fix it (or at least for now, do the portion of the fix
needed for fp64) rather than work around it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4

2015-12-09 Thread Iago Toral Quiroga
Right now we rely on the code at the bottom of brw_set_dest to set the correct 
execution size for anything that does not operate on a full SIMD register 
(dst.width < BRW_EXECUTE_8). However, this won't work with fp64, where operands 
are twice as big and we see instructions with a horizontal width of 4 that 
still require an execution size of 8. We cannot fix this by simply checking the 
type of the operands involved and skip the automatic execsize adjustment when 
they are doubles because we can also operate on doubles as integers (for pack 
and unpack operations for example).

Connor and I have been discussing this and we see two options:

1) We fix all the instructions that do not operate on doubles to set the
correct execution size so they don't depend on the code in brw_set_dest any 
more, then remove that code from brw_set_dest so it does not break fp64. 
However, this involves a lot of changes throughout the driver and across all 
hardware platforms... it looks rather scary, so I think we probably don't want 
to go down this path.

2) Since we only produce double instructions with a width of 4, we can limit 
the change to these instructions. This means that we only automatically adjust 
execsize in brw_set_dest for instructions where dst.width < BRW_EXECUTE_4 and 
we fix driver code that emits non-double instructions with a width of 4 to set 
the execsize they need explicitly. For gen7/8 it seems that only a handful of 
changes are required for this according to piglit.

This RFC series implements 2). As mentioned, piglit seems happy with the 
changes, but it might be a good idea to run this through Jenkins to make sure 
that we are not missing anything.

My original idea was to expand the fix to cover other gens as well, but 
unfortunately, this would involve many more changes. For example, ILK's 
strip-and-fans or clipping modules are full with code that mixes width8 and 
width4 instructions (I have ~10 patches that fix this that involve changing the 
default execution size to 4 in the sf module), gen6 also needs a few changes 
and I would not be surprised if gen4 required more. Gen9 might require changes 
too. Another problem with this is that we don't have gen4/9 hardware available, 
so trying this would also require involvement from other people with access to 
this hardware.

So considering that we are not targetting fp64 support on gens < 7, at least 
for now, it seems that implementing 2) but special-case it for gen7 and gen8 
only, leaving other generations intact is a resoanble way to go for now.

Opinions?

Iago Toral Quiroga (5):
  i965/eu: set correct execution size in brw_NOP
  i965/fs: set execution size for SEND messages in
generate_uniform_pull_constant_load_gen7
  i965/eu: set execution size for SEND message in
brw_send_indirect_message
  i965: set correct execsize for MOVS with a width of 4 in
brw_find_live_channel
  i965: Skip execution size adjustment for instructions of width 4

 src/mesa/drivers/dri/i965/brw_eu_emit.c| 21 -
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  2 ++
 2 files changed, 22 insertions(+), 1 deletion(-)

-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev