fs: Lower 32x32 bit multiplication on BXT.

Francisco Jerez Tue, 11 Aug 2015 05:44:59 -0700

Francisco Jerez <curroje...@riseup.net> writes:

> Neil Roberts <n...@linux.intel.com> writes:
>
>> Are you sure this patch is necessary? The documentation for the multiply
>> instruction on BDW+ says:
>>
>> SourceType :         *D      
>> DestinationType :    *D      
>> Project :    EXCLUDE(CHV) 
>>
>> This to me implies that it should work on BXT because it doesn't say
>> EXCLUDE(BXT).
>>
>
> In fact both CHV and BXT *can* support 32x32 multiplication, that's not
> really the reason that motivated this patch -- The problem is that 32x32
> multiplication has QWORD execution type which has several restrictions
> on CHV and BXT, the annoying one is:
>
> "CHV, BXT
>
> When source or destination datatype is 64b or operation is integer DWord
> multiply, regioning in Align1 must follow these rules:
>
>     Source and Destination horizontal stride must be aligned to the same
>     qword.
>
>     Example:
> [...]
>     // mul (4) r10.0<2>:d r11.0<8;4,2>:d r12.0<8;4,2>:d // Source and 
> Destination stride must be 2 since the execution type is Qword."
>
> So it sounds like we could use the native 32x32 multiply with some
> additional effort to pass the arguments with the correct stride, but
> it's not clear to me whether the solution would be any more better than
> splitting up the multiply into 32x16 multiplies, so doing the same as in
> CHV and pre-Gen8 seemed like the most KISS solution for now.
>
>> I made a little test case and ran it on my BXT and it seems to work even
>> without this patch. I looked at the assembler source and it is
>> definitely doing a MUL instruction with D types for the dst and two
>> sources.
>>
>> https://github.com/bpeel/piglit/blob/test-integer-multiply/tests/general/mult32.c
>>
> Hmm, I guess the docs could be wrong, but I'm not sure what the
> consequences would be of violating the alignment rule, I guess the
> failure may be non-deterministic.  What stepping of BXT did you test
> this on?
>
I just tried this on the BXT simulator, and it doesn't seem to be too
happy about multiplies breaking the alignment rule:


| .aub:0x0> CEuExecUnit::CDecodeInst::RegionCheck>Instruction is:
| mul (16) r3.0<1>:d    r2.3<0;1,0>:d    0xe1ac99a1:d     { Align1, N1, 
NoCompact }.
| For 64-bit Align1 operation or multiplication of dwords in this
| device, destination horizontal stride must be aligned to qword.
|
| Fail

>> Regards,
>> - Neil
>>
>> Francisco Jerez <curroje...@riseup.net> writes:
>>
>>> AFAIK BXT has the same annoying alignment limitation as CHV on the
>>> source register regions of 32x32 bit MULs, give it the same treatment.
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> index 244f299..fc9f007 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> @@ -3126,9 +3126,9 @@ fs_visitor::lower_integer_multiplication()
>>>     bool progress = false;
>>>  
>>>     /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit operation
>>> -    * directly, but Cherryview cannot.
>>> +    * directly, but CHV/BXT cannot.
>>>      */
>>> -   if (devinfo->gen >= 8 && !devinfo->is_cherryview)
>>> +   if (devinfo->gen >= 8 && !devinfo->is_cherryview && 
>>> !devinfo->is_broxton)
>>>        return false;
>>>  
>>>     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>>> -- 
>>> 2.4.6
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/5] i965/fs: Lower 32x32 bit multiplication on BXT.

Reply via email to