Re: [PATCH] [libiberty] remove TBAA violation in iterative_hash, improve code-gen

2024-02-14 Thread Jakub Jelinek
On Wed, Feb 14, 2024 at 05:09:39PM +0100, Richard Biener wrote:
> 
> 
> > On 14.02.2024 at 16:22, Jakub Jelinek wrote:
> > 
> > On Wed, Feb 14, 2024 at 04:13:51PM +0100, Richard Biener wrote:
> >> The following removes the TBAA violation present in iterative_hash.
> >> Since we eventually LTO this code, it is important to fix.  This also
> >> improves code generation for the >= 12 bytes loop by using | to
> >> compose the 4-byte words, since at least GCC 7 and later recognize
> >> that pattern and perform a single 4-byte load, while the variant
> >> with + is not recognized (not on trunk either).  I think we have an
> >> enhancement bug for this somewhere.
> >>
> >> Given that we reliably merge the loads, and the bogus "optimized" path
> >> might only be relevant for archs that cannot do misaligned loads
> >> efficiently, I've chosen to keep a specialization for aligned accesses.
> >> 
> >> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk?
> >> 
> >> Thanks,
> >> Richard.
> >> 
> >> libiberty/
> >>   * hashtab.c (iterative_hash): Remove TBAA-violating handling
> >>   of aligned little-endian case in favor of just keeping the
> >>   aligned case special-cased.  Use | for composing a larger word.
> > 
> > Have you tried using memcpy into a hashval_t temporary?
> > Just wonder whether you get better or worse code with that compared to
> > the shifts.
> 
> I didn't, but I verified that I get a single movd on x86-64 when using |
> instead of + with GCC 7 and trunk.

Ok then.

Jakub



Re: [PATCH] [libiberty] remove TBAA violation in iterative_hash, improve code-gen

2024-02-14 Thread Richard Biener



> On 14.02.2024 at 16:22, Jakub Jelinek wrote:
> 
> On Wed, Feb 14, 2024 at 04:13:51PM +0100, Richard Biener wrote:
>> The following removes the TBAA violation present in iterative_hash.
>> Since we eventually LTO this code, it is important to fix.  This also
>> improves code generation for the >= 12 bytes loop by using | to
>> compose the 4-byte words, since at least GCC 7 and later recognize
>> that pattern and perform a single 4-byte load, while the variant
>> with + is not recognized (not on trunk either).  I think we have an
>> enhancement bug for this somewhere.
>>
>> Given that we reliably merge the loads, and the bogus "optimized" path
>> might only be relevant for archs that cannot do misaligned loads
>> efficiently, I've chosen to keep a specialization for aligned accesses.
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk?
>> 
>> Thanks,
>> Richard.
>> 
>> libiberty/
>>   * hashtab.c (iterative_hash): Remove TBAA-violating handling
>>   of aligned little-endian case in favor of just keeping the
>>   aligned case special-cased.  Use | for composing a larger word.
> 
> Have you tried using memcpy into a hashval_t temporary?
> Just wonder whether you get better or worse code with that compared to
> the shifts.

I didn't, but I verified that I get a single movd on x86-64 when using |
instead of + with GCC 7 and trunk.

Richard 

>Jakub
> 
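
For readers following the | versus + point, here is a minimal sketch of the
two ways of composing a 4-byte word from individual bytes.  The function
names are hypothetical and not taken from hashtab.c.  Because the shifted
bytes occupy disjoint bit positions, both variants compute the same value;
per the message above, only the | form was observed to be turned into a
single 4-byte load (movd on x86-64) by GCC 7 and trunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compose a little-endian 32-bit word with |.
   This is the form the thread reports GCC merging into one load.  */
uint32_t
compose_or (const unsigned char *k)
{
  return (uint32_t) k[0]
	 | ((uint32_t) k[1] << 8)
	 | ((uint32_t) k[2] << 16)
	 | ((uint32_t) k[3] << 24);
}

/* Hypothetical helper: the same composition written with +.
   The value is identical because the shifted bytes never overlap,
   so no carries can occur.  */
uint32_t
compose_add (const unsigned char *k)
{
  return (uint32_t) k[0]
	 + ((uint32_t) k[1] << 8)
	 + ((uint32_t) k[2] << 16)
	 + ((uint32_t) k[3] << 24);
}

/* Small self-check: both variants agree on a sample buffer.  */
void
compose_selfcheck (void)
{
  const unsigned char sample[4] = { 0xde, 0xad, 0xbe, 0xef };
  assert (compose_or (sample) == compose_add (sample));
}
```

Neither variant dereferences a pointer of a different type, so both are
free of the aliasing problem regardless of the alignment of k.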


Re: [PATCH] [libiberty] remove TBAA violation in iterative_hash, improve code-gen

2024-02-14 Thread Jakub Jelinek
On Wed, Feb 14, 2024 at 04:13:51PM +0100, Richard Biener wrote:
> The following removes the TBAA violation present in iterative_hash.
> Since we eventually LTO this code, it is important to fix.  This also
> improves code generation for the >= 12 bytes loop by using | to
> compose the 4-byte words, since at least GCC 7 and later recognize
> that pattern and perform a single 4-byte load, while the variant
> with + is not recognized (not on trunk either).  I think we have an
> enhancement bug for this somewhere.
>
> Given that we reliably merge the loads, and the bogus "optimized" path
> might only be relevant for archs that cannot do misaligned loads
> efficiently, I've chosen to keep a specialization for aligned accesses.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk?
> 
> Thanks,
> Richard.
> 
> libiberty/
>   * hashtab.c (iterative_hash): Remove TBAA-violating handling
>   of aligned little-endian case in favor of just keeping the
>   aligned case special-cased.  Use | for composing a larger word.

Have you tried using memcpy into a hashval_t temporary?
I just wonder whether you get better or worse code with that compared to
the shifts.

Jakub
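
The memcpy alternative asked about above could look roughly like the
sketch below.  It is an illustration only, not the code from the patch:
the hashval_t typedef is an assumed stand-in for libiberty's type, and
both function names are hypothetical.  Copying the bytes into a temporary
avoids reading the buffer through a hashval_t pointer, which is what makes
a cast-and-dereference a TBAA violation.  Note that memcpy yields the
word in host byte order, while the shift form always yields the
little-endian interpretation, so the two agree only on little-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for libiberty's hashval_t.  */
typedef uint32_t hashval_t;

/* Hypothetical helper: load a word via memcpy into a temporary.
   Well-defined for any alignment of K; result is in host byte order.  */
hashval_t
load_word_memcpy (const unsigned char *k)
{
  hashval_t w;
  memcpy (&w, k, sizeof w);
  return w;
}

/* Hypothetical helper: load a word with shifts and |.
   Well-defined for any alignment of K; result is always the
   little-endian interpretation of the four bytes.  */
hashval_t
load_word_shifts (const unsigned char *k)
{
  return (hashval_t) k[0]
	 | ((hashval_t) k[1] << 8)
	 | ((hashval_t) k[2] << 16)
	 | ((hashval_t) k[3] << 24);
}

/* Small self-check: the memcpy result holds exactly the source bytes,
   even when the source is deliberately misaligned.  */
void
load_word_selfcheck (void)
{
  const unsigned char raw[8] = { 0, 0x78, 0x56, 0x34, 0x12, 0, 0, 0 };
  hashval_t w = load_word_memcpy (raw + 1);
  assert (memcmp (&w, raw + 1, sizeof w) == 0);
}
```

Which form produces better code is exactly the open question in this
message; the sketch makes no claim either way.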