Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-10 Thread Paul Koning via Gcc



> On Jan 9, 2023, at 11:27 AM, Stefan Kanthak  wrote:
> 
> "Paul Koning"  wrote:
> 
>>> ...
> 
>> Yes, I was thinking the same.  But I spent a while on that pattern -- I
>> wanted to support div/mod as a single operation because the machine has
>> that primitive.  And I'm pretty sure I saw it work before I committed
>> that change.  That's why I'm wondering if something changed.
> 
> I can't tell from the past how GCC once worked, but today it can't
> (or doesn't) use such patterns, at least not on i386/AMD64 processors.

It turns out I was confused by the RTL generated by my pattern.  That pattern 
is for divmodhi, so it works as desired given same-size inputs.  

I'm wondering if the case of longer dividend -- which is a common thing for 
several machines -- could be handled by a define_peephole2 that matches the 
sign-extend of the divisor followed by the (longer) divide.  I made a stab at 
that but what I wrote wasn't valid.

So, question to the list:  suppose I want to write RTL that matches what Stefan 
is talking about, with a div or mod or divmod that has si results and a di 
dividend (or hi results and an si dividend), how would you do that?  Can a 
define_peephole2 do it, and if so, what would it look like?

paul




Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
"Paul Koning"  wrote:

>> On Jan 9, 2023, at 10:20 AM, Stefan Kanthak  wrote:
>> 
>> "Paul Koning"  wrote:
>> 
>>>> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> GCC (and other C compilers too) support the widening multiplication
>>>> of i386/AMD64 processors, but DON'T support their narrowing division:
>>> 
>>> I wonder if this changed in the recent past.
>>> I have a pattern for this type of thing in pdp11.md:
>> [...]
>>> and I'm pretty sure this worked at some point in the past.  
>> 
>> Unfortunately the C standard defines that the smaller operand (of lesser
>> conversion rank), here divisor, has to undergo a conversion to the "real
>> common type", i.e. the broader operand (of higher conversion rank), here
>> dividend. Unless the information about promotion/conversion is handed over
>> to the code generator it can't apply such patterns -- as demonstrated by
>> the demo code.

> Yes, I was thinking the same.  But I spent a while on that pattern -- I
> wanted to support div/mod as a single operation because the machine has
> that primitive.  And I'm pretty sure I saw it work before I committed
> that change.  That's why I'm wondering if something changed.

I can't tell from the past how GCC once worked, but today it can't
(or doesn't) use such patterns, at least not on i386/AMD64 processors.
To give another example where the necessary information is most
obviously NOT propagated from front end to back end:

--- clmul.c ---
// widening carry-less multiplication

unsigned long long clmul(unsigned long p, unsigned long q)
{
    unsigned long long r = 0;
    unsigned long      s = 1UL << 31;

    do {
        r <<= 1;
        if (q & s)
#ifdef _MSC_VER
            (unsigned long) r ^= p;
#else
            r ^= p; // no need to promote/convert p here!
#endif
    } while (s >>= 1);

    return r;
}
--- EOF ---
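
For readers who want to try the routine outside a 32-bit build, here is a minimal sketch of the same loop with fixed-width types. The original relies on `unsigned long` being 32 bits wide, which holds for i386 but not for LP64 targets. The names `clmul32` and `clmul32_selftest` are made up for this sketch and do not appear in the original post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-width restatement of clmul(): carry-less
   (polynomial over GF(2)) product of two 32-bit values, 64-bit result. */
uint64_t clmul32(uint32_t p, uint32_t q)
{
    uint64_t r = 0;

    /* Walk the bits of q from most to least significant. */
    for (uint32_t s = UINT32_C(1) << 31; s != 0; s >>= 1) {
        r <<= 1;
        if (q & s)
            r ^= p;   /* p is zero-extended, so only the low half changes */
    }
    return r;
}

/* Spot checks; the expected values follow from polynomial arithmetic. */
static void clmul32_selftest(void)
{
    assert(clmul32(3, 3) == 5);                               /* (x+1)^2 = x^2+1 */
    assert(clmul32(UINT32_C(0x80000000), 2) == UINT64_C(0x100000000));
}
```

Because `p` is zero-extended, the XOR can only ever touch the low 32 bits of `r`, which is the point the `xor edx, edi` remark in the listing below is about.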

# https://gcc.godbolt.org/z/E99v7fEP3
clmul(unsigned long, unsigned long):
        push    ebp
        mov     ecx, -2147483648
        xor     eax, eax
        xor     edx, edx
        push    edi                     # OOPS: superfluous
        xor     edi, edi                # OOPS: superfluous
        push    esi
        push    ebx                     # OUCH: WTF?
        mov     ebp, DWORD PTR [esp+24]
        mov     ebx, 32                 # OUCH: WTF?
        mov     esi, DWORD PTR [esp+20]
.L3:
        shld    edx, eax, 1
        add     eax, eax
        test    ebp, ecx
        je      .L2
        xor     eax, esi
        xor     edx, edi                # OOPS: superfluous
.L2:
        shr     ecx, 1
        sub     ebx, 1                  # OUCH: WTF?
        jne     .L3
        pop     ebx                     # OUCH: WTF?
        pop     esi
        pop     edi                     # OOPS: superfluous
        pop     ebp
        ret

8 superfluous instructions out of the total 25 instructions!

NOT AMUSED
Stefan


Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Paul Koning via Gcc



> On Jan 9, 2023, at 10:20 AM, Stefan Kanthak  wrote:
> 
> "Paul Koning"  wrote:
> 
>>> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak  wrote:
>>> 
>>> Hi,
>>> 
>>> GCC (and other C compilers too) support the widening multiplication
>>> of i386/AMD64 processors, but DON'T support their narrowing division:
>> 
>> I wonder if this changed in the recent past.
>> I have a pattern for this type of thing in pdp11.md:
> [...]
>> and I'm pretty sure this worked at some point in the past.  
> 
> Unfortunately the C standard defines that the smaller operand (of lesser
> conversion rank), here divisor, has to undergo a conversion to the "real
> common type", i.e. the broader operand (of higher conversion rank), here
> dividend. Unless the information about promotion/conversion is handed over
> to the code generator it can't apply such patterns -- as demonstrated by
> the demo code.
> 
> regards
> Stefan

Yes, I was thinking the same.  But I spent a while on that pattern -- I wanted 
to support div/mod as a single operation because the machine has that 
primitive.  And I'm pretty sure I saw it work before I committed that change.  
That's why I'm wondering if something changed.

paul



Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
"Paul Koning"  wrote:

>> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak  wrote:
>> 
>> Hi,
>> 
>> GCC (and other C compilers too) support the widening multiplication
>> of i386/AMD64 processors, but DON'T support their narrowing division:
>
> I wonder if this changed in the recent past.
> I have a pattern for this type of thing in pdp11.md:
[...]
> and I'm pretty sure this worked at some point in the past.  

Unfortunately the C standard defines that the smaller operand (of lesser
conversion rank), here divisor, has to undergo a conversion to the "real
common type", i.e. the broader operand (of higher conversion rank), here
dividend. Unless the information about promotion/conversion is handed over
to the code generator it can't apply such patterns -- as demonstrated by
the demo code.
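
The conversion rule described here can be checked directly in C. The following is only a sketch: the function name `quotient` is invented for illustration, and `_Static_assert` assumes a C11 (or later) compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example function, not taken from the thread. */
static uint32_t quotient(uint64_t dividend, uint32_t divisor)
{
    /* The usual arithmetic conversions widen the divisor to uint64_t,
       so the division expression itself has a 64-bit type. */
    _Static_assert(sizeof(dividend / divisor) == sizeof(uint64_t),
                   "division is performed in the wider type");

    assert(divisor != 0);

    /* Only this final conversion narrows the quotient to 32 bits. */
    return (uint32_t)(dividend / divisor);
}
```

By the time the division reaches the back end it is therefore a 64-bit by 64-bit operation; the fact that the divisor started out as a 32-bit value has to be recovered from the zero-extension, if it is recovered at all.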

regards
Stefan


Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Paul Koning via Gcc



> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak  wrote:
> 
> Hi,
> 
> GCC (and other C compilers too) support the widening multiplication
> of i386/AMD64 processors, but DON'T support their narrowing division:

I wonder if this changed in the recent past.  I have a pattern for this type of 
thing in pdp11.md:

(define_expand "divmodhi4"
  [(parallel
    [(set (subreg:HI (match_dup 1) 0)
          (div:HI (match_operand:SI 1 "register_operand" "0")
                  (match_operand:HI 2 "general_operand" "g")))
     (set (subreg:HI (match_dup 1) 2)
          (mod:HI (match_dup 1) (match_dup 2)))])
   (set (match_operand:HI 0 "register_operand" "=r")
        (subreg:HI (match_dup 1) 0))
   (set (match_operand:HI 3 "register_operand" "=r")
        (subreg:HI (match_dup 1) 2))]
  "TARGET_40_PLUS"
  "")

and I'm pretty sure this worked at some point in the past.  

paul



Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
LIU Hao wrote:

>On 2023/1/9 20:20, Stefan Kanthak wrote:
>> Hi,
>>
>> GCC (and other C compilers too) support the widening multiplication
>> of i386/AMD64 processors, but DON'T support their narrowing division:
>>
>>
>
> QWORD-DWORD division would change the behavior of your program.
[...]
> If DIV was used, it would effect an exception:

Guess why I use "schoolbook" division?
Please read the end of my post until you understand the code.
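
The code from the original post is not quoted in this part of the thread, so the following is only a sketch of what two-step "schoolbook" (long) division usually looks like, not Stefan's actual code. The names `div_narrow` and `div_schoolbook` are assumptions; `div_narrow` stands in for the hardware 64/32 -> 32 bit DIV and is emulated with ordinary 64-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a narrowing divide: (hi:lo) / d with a 32-bit quotient.
   The precondition hi < d guarantees that the quotient fits. */
static uint32_t div_narrow(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem)
{
    assert(d != 0 && hi < d);
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
}

/* Long division of a 64-bit dividend by a 32-bit divisor, one 32-bit
   "digit" at a time. The remainder of the first step is always smaller
   than d, so the second step can never overflow. */
static uint64_t div_schoolbook(uint64_t x, uint32_t d, uint32_t *rem)
{
    assert(d != 0);
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;

    uint32_t q_hi = hi / d;                     /* high digit of quotient */
    uint32_t r    = hi % d;                     /* r < d by construction  */
    uint32_t q_lo = div_narrow(r, lo, d, rem);  /* low digit of quotient  */

    return ((uint64_t)q_hi << 32) | q_lo;
}
```

With this split, the case raised in the quoted reply (high half not smaller than the divisor) is absorbed by the first step instead of faulting.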

regards
Stefan



Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread LIU Hao via Gcc

On 2023/1/9 20:20, Stefan Kanthak wrote:

> Hi,
>
> GCC (and other C compilers too) support the widening multiplication
> of i386/AMD64 processors, but DON'T support their narrowing division:




QWORD-DWORD division would change the behavior of your program.


Given:

   ```
   uint32_t xdiv(uint64_t x, uint32_t y) { return x / y;  }
   ```

then `xdiv(0x200000002, 2)` should first convert both operands to `uint64_t`, perform the division,
which yields `0x100000001`, then truncate the quotient to 32 bits, which gives `1`. The result is exact.



If DIV were used, it would raise an exception:

   ```
   mov edx, 2
   mov eax, edx   # edx:eax = 0x200000002

   mov ecx, edx
   div ecx        # division overflows because the quotient
                  # can't be stored in EAX
   ```
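
The same case (EDX = 2, EAX = 2, divisor 2) can be sketched in C. This is an illustration under assumptions, not compiler output: `div_would_fault` and `xdiv_narrow` are hypothetical names, and the fault condition encoded here is the documented one for unsigned DIV (divisor zero, or quotient too large for the destination, which for a 64/32 divide means high half >= divisor).

```c
#include <assert.h>
#include <stdint.h>

/* Plain C semantics: divide in 64 bits, then truncate the quotient. */
static uint32_t xdiv(uint64_t x, uint32_t y)
{
    return (uint32_t)(x / y);
}

/* Hypothetical predicate: would a single 64/32 -> 32 DIV raise #DE? */
static int div_would_fault(uint64_t x, uint32_t y)
{
    return y == 0 || (uint32_t)(x >> 32) >= y;
}

/* Hypothetical narrowing divide: only valid when the predicate is false,
   in which case it agrees with xdiv(). */
static uint32_t xdiv_narrow(uint64_t x, uint32_t y)
{
    assert(!div_would_fault(x, y));
    return (uint32_t)(x / y);
}
```

A compiler could therefore only substitute the single DIV for `xdiv` when it can prove that the high half of the dividend is smaller than the divisor.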





--
Best regards,
LIU Hao


