fs: Lower 32x32 bit multiplication on BXT.

Francisco Jerez Wed, 12 Aug 2015 04:42:10 -0700

Francisco Jerez <curroje...@riseup.net> writes:

> Neil Roberts <n...@linux.intel.com> writes:
>
>> Ok, that makes sense, thanks for the detailed explanation. I think
>> you're right, given the extra restrictions avoiding the integer dword
>> multiplication seems like the simplest solution.
>>
>> I can't say that I fully understand what the documentation is trying to
>> say here. Does it mean that the horizontal strides must be set such that
>> each element in the source is offset to the same qword within its
>> register as the corresponding element in the destination?
>>
>
> The way I understand it one's required to:
>  - Align the stride in bytes (i.e. horizontal stride times the size of
>    the register type) between components of an individual source or
>    destination to a multiple of 64b (if the register region is more than
>    a single scalar that is).  The multiple must be the same for all
>    non-scalar sources and the destination.
>  - Use the same sub-register offsets for all non-scalar sources and the
>    destination.
>  - Set vertical stride to horizontal stride times width.
>  - Don't use indirect addressing or ARF registers.
>
>> I tried changing the test case¹ so that one of the sources comes from a
>> uniform. That would make the register for the uniform source be a scalar
>> and thus all of its elements would come from the first qword, whereas
>> for the destination the elements would be written to different qwords.
>> That ends up with an instruction like this:
>>
>> mul(16)         g120<1>D        g9<8,8,1>D      g6<0,1,0>D { align1
>>                                                             1H compacted };
>>
>
> Yeah, this should definitely violate the alignment rule on both the
> destination and first source.
>
>> However, the test still passes on my BXT so I've probably misunderstood
>> something. On the other hand this does look similar to the stride you
>> had in your example instruction that the simulator complained about.
>>
>> Given that the documentation and simulator implies this shouldn't work
>> I'm happy to leave this patch and I'm not suggesting we do anything now.
>>
>
> Honestly I'm also not 100% sure if this restriction really applies to
> all BXT hardware or if it's specific to some preproduction steppings --
> The documentation seems to imply the former, but I guess it could be
> wrong.
>
I forgot to mention that when I tried this yesterday the simulator
seemed to complain about the alignment rule regardless of the stepping
for all the ones it recognised (a-c IIRC).


>> Regards,
>> - Neil
>>
>> 1. 
>> https://github.com/bpeel/piglit/blob/test-integer-multiply-uniform/tests/general/mult32.c
>>
>> Francisco Jerez <curroje...@riseup.net> writes:
>>
>>> Neil Roberts <n...@linux.intel.com> writes:
>>>
>>>> Are you sure this patch is necessary? The documentation for the multiply
>>>> instruction on BDW+ says:
>>>>
>>>> SourceType :       *D      
>>>> DestinationType :  *D      
>>>> Project :  EXCLUDE(CHV) 
>>>>
>>>> This to me implies that it should work on BXT because it doesn't say
>>>> EXCLUDE(BXT).
>>>>
>>>
>>> In fact both CHV and BXT *can* support 32x32 multiplication, that's not
>>> really the reason that motivated this patch -- The problem is that 32x32
>>> multiplication has QWORD execution type which has several restrictions
>>> on CHV and BXT, the annoying one is:
>>>
>>> "CHV, BXT
>>>
>>> When source or destination datatype is 64b or operation is integer DWord
>>> multiply, regioning in Align1 must follow these rules:
>>>
>>>     Source and Destination horizontal stride must be aligned to the same
>>>     qword.
>>>
>>>     Example:
>>> [...]
>>>     // mul (4) r10.0<2>:d r11.0<8;4,2>:d r12.0<8;4,2>:d // Source and 
>>> Destination stride must be 2 since the execution type is Qword."
>>>
>>> So it sounds like we could use the native 32x32 multiply with some
>>> additional effort to pass the arguments with the correct stride, but
>>> it's not clear to me whether the solution would be any more better than
>>> splitting up the multiply into 32x16 multiplies, so doing the same as in
>>> CHV and pre-Gen8 seemed like the most KISS solution for now.
>>>
>>>> I made a little test case and ran it on my BXT and it seems to work even
>>>> without this patch. I looked at the assembler source and it is
>>>> definitely doing a MUL instruction with D types for the dst and two
>>>> sources.
>>>>
>>>> https://github.com/bpeel/piglit/blob/test-integer-multiply/tests/general/mult32.c
>>>>
>>> Hmm, I guess the docs could be wrong, but I'm not sure what the
>>> consequences would be of violating the alignment rule, I guess the
>>> failure may be non-deterministic.  What stepping of BXT did you test
>>> this on?
>>>
>>>> Regards,
>>>> - Neil
>>>>
>>>> Francisco Jerez <curroje...@riseup.net> writes:
>>>>
>>>>> AFAIK BXT has the same annoying alignment limitation as CHV on the
>>>>> source register regions of 32x32 bit MULs, give it the same treatment.
>>>>> ---
>>>>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>>>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>>> index 244f299..fc9f007 100644
>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>>> @@ -3126,9 +3126,9 @@ fs_visitor::lower_integer_multiplication()
>>>>>     bool progress = false;
>>>>>  
>>>>>     /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit operation
>>>>> -    * directly, but Cherryview cannot.
>>>>> +    * directly, but CHV/BXT cannot.
>>>>>      */
>>>>> -   if (devinfo->gen >= 8 && !devinfo->is_cherryview)
>>>>> +   if (devinfo->gen >= 8 && !devinfo->is_cherryview && 
>>>>> !devinfo->is_broxton)
>>>>>        return false;
>>>>>  
>>>>>     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>>>>> -- 
>>>>> 2.4.6
>>>>>
>>>>> _______________________________________________
>>>>> mesa-dev mailing list
>>>>> mesa-dev@lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/5] i965/fs: Lower 32x32 bit multiplication on BXT.

Reply via email to