[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-15 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #13 from Aldy Hernandez  ---
(In reply to Aldy Hernandez from comment #10)
> Created attachment 58202 [details]
> proof of concept implementing a range-op entry for builtin_assume_aligned
> 
> Something like this (properly coded and enhanced) would work.
> 
> The pointers_handled_p() cruft is temporary boilerplate to indicate that we
> only support a fold-range() of PRANGE = IRANGE op PRANGE for this operation.
> This is going away when the dust settles.

FYI, I've removed the pointers_handled_p requirement for new range-op entries.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Aldy Hernandez :

https://gcc.gnu.org/g:c400b2100719d0a9e5989c63e0827b9e98919df3

commit r15-504-gc400b2100719d0a9e5989c63e0827b9e98919df3
Author: Aldy Hernandez 
Date:   Tue May 14 16:21:50 2024 +0200

[prange] Default pointers_handled_p() to false.

The pointers_handled_p() method is an internal range-op helper to help
catch dispatch type mismatches for pointer operands.  This is what
caught the IPA mismatch in PR114985.

This method is only a temporary measure to catch any incompatibilities
in the current pointer range-op entries.  This patch returns true for
any *new* entries in the range-op table, as the current ones are
already fleshed out.  This keeps us from having to implement this
boilerplate function for any new range-op entries.

PR tree-optimization/114995
* range-op-ptr.cc (range_operator::pointers_handled_p): Default to
true.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #11 from Aldy Hernandez  ---
Just to clarify.  prange as well as irange keep a value/mask pair where we can
store alignment info, so every calculation (range-op) along the way can track
this properly.

Now, for pointers we loose this information across passes because of the union
in SSA_NAME_RANGE_INFO (struct tree_ssa_name) which keeps the range info in an
ptr_info_def, not a proper vrange_storage.  It's in my long term TODO list to
propose we properly track pointer ranges once prange comes live.

Note that IPA keeps alignment info on the side, so part of this info is kept
across passes.  But I assume they're doing this, because originally ranges
didn't have alignment (value/mask) info associated with them.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #10 from Aldy Hernandez  ---
Created attachment 58202
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58202&action=edit
proof of concept implementing a range-op entry for builtin_assume_aligned

Something like this (properly coded and enhanced) would work.

The pointers_handled_p() cruft is temporary boilerplate to indicate that we
only support a fold-range() of PRANGE = IRANGE op PRANGE for this operation. 
This is going away when the dust settles.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #9 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #7)
> The above examples just show misunderstanding what __builtin_assume_aligned
> is and what it is not.  You need to use the result of the built-in function
> in the accesses to be able to use the alignment information, if you just try
> to compare __builtin_assume_aligned (x, 32) == x, it will just fold as
> always true.  The design of the builtin is to attach the alignment
> information to the result of the builtin function only.
> 
> CCing Aldy/Andrew for whether prange can or could be taught to handle the
> assume cases with uintptr_t and bitwise and + comparison.

All the pieces are there to make it work, both with the assume aligned and with
the uintptr_t case.  And we could probably get it all without prange.

For example:

#include 

void foo (const float *);

void bar1 (const float *array)
{
  [[assume(array != nullptr)]];
  const float *aligned = (const float *) __builtin_assume_aligned (array, 32);
  foo (aligned);
}

The __builtin_assume_aligned hasn't been expanded by evrp, so we should be able
to add a range-op entry for it.  This is what evrp sees:

void bar1 (const float * array)
{
  const float * aligned;

   :
  aligned_2 = __builtin_assume_aligned (array_1(D), 32);
  foo (aligned_2);
  return;

}

All we need is a range-op implementation for builtin_assume_aligned.  The
attached crude implementation does it.

=== BB 2 
 :
aligned_2 = __builtin_assume_aligned (array_1(D), 32);
foo (aligned_2);
return;

aligned_2 : [prange] const float * [0, +INF] MASK 0x VALUE 0x0

That is, the bottom 32 bits are cleared.

Andrew will have to comment on the uintptr_t idiom, because it gets expanded
into an .ASSUME() function which I'm unfamiliar with.

For this small function:

void bar2 (const float *array)
{
  [[assume((uintptr_t (array) & (32 - 1)) == 0)]];
  foo (array);
}

evrp expands to:

=== BB 2 
Partial equiv (array.0_3 pe64 array_2(D))
 :
array.0_3 = (long unsigned int) array_2(D);
_4 = array.0_3 & 31;
_5 = _4 == 0;
return _5;

_4 : [irange] long unsigned int [0, 31] MASK 0x1f VALUE 0x0

I don't see any reason why we couldn't get that array.0_3 and array_2 are
aligned to 32-bits.  Maybe we don't set the value/mask pair for the
bitwise_and::op1_range?  The value/mask stuff is not very fleshed out,
especially for the op1_range operators.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-09 Thread pratikc at live dot co.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #8 from Pratik Chowdhury  ---
> if you just try to compare __builtin_assume_aligned (x, 32) == x, it will 
> just fold as always true

Aah. Dead code elimination.

> CCing Aldy/Andrew for whether prange can or could be taught to handle the 
> assume cases with uintptr_t and bitwise and + comparison.

Yeah this could be very helpful in cases such as this.

@Jakub @Andrew I think [this](https://gcc.godbolt.org/z/MEre8hr71) also has
scope for taking advantage of the same.

```cpp
void MulAddLoopWorksWithBuitInUnreachableAndConst(const float* const __restrict
mul_array,
 const float* const __restrict add_array,
 const ::std::size_t size, float* const __restrict x_array)
{
if((reinterpret_cast<::std::uintptr_t>(mul_array) & (32-1)) != 0)
{
__builtin_unreachable();
}
if((reinterpret_cast<::std::uintptr_t>(add_array) & (32-1)) != 0)
{
__builtin_unreachable();
}
if((reinterpret_cast<::std::uintptr_t>(x_array) & (32-1)) != 0)
{
__builtin_unreachable();
}
if ((size & (32 - 1)) != 0) __builtin_unreachable();
for (::std::size_t i = 0; i != size; i++) [[likely]] {
const auto mul = *(mul_array + i);
const auto add = *(add_array + i);
x_array[i] = x_array[i] * mul + add;
// x_array[i] *= mul;
// x_array[i] += add;
}
}
```

Here we are working under the assumption that the memory addresses themselves
are multiples of 32 if aligned for AVX2.

Clang seems to be able to take advantage of the same here.

If the __builtin_assume_aligned is kinda not supported due to dead code
elimination, then this looks like a nice enough alternative.

It also retains const correctness for me.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-08 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

Jakub Jelinek  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org,
   ||amacleod at redhat dot com,
   ||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
The above examples just show misunderstanding what __builtin_assume_aligned is
and what it is not.  You need to use the result of the built-in function in the
accesses to be able to use the alignment information, if you just try to
compare __builtin_assume_aligned (x, 32) == x, it will just fold as always
true.  The design of the builtin is to attach the alignment information to the
result of the builtin function only.

CCing Aldy/Andrew for whether prange can or could be taught to handle the
assume cases with uintptr_t and bitwise and + comparison.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-08 Thread pratikc at live dot co.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #6 from Pratik Chowdhury  ---
> [[assume((uintptr_t(x_array) & (32-1)) == 0)]];

The Parans in the & have definitely given someone sleepless nights LOL. I
myself was saved by the warnings.

> Right now we don't always prop back what information we get from the assume 
> attributes.
Maybe with the recent prange addition, it can for pointers ...

Aah

Guess we will switch to assume in the future.


I tried [something else just about now](https://gcc.godbolt.org/z/d8aPjzMhq)

I think its a bit wrong. Clang seems to be able to handle it.

Is this syntax even valid?

```cpp
   
if(reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(mul_array),
32)) != reinterpret_cast<::std::uintptr_t>(mul_array))
{
__builtin_unreachable();
}
   
if(reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(add_array),
32)) != reinterpret_cast<::std::uintptr_t>(add_array))
{
__builtin_unreachable();
}
   
if(reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(x_array),
32)) != reinterpret_cast<::std::uintptr_t>(x_array))
{
__builtin_unreachable();
}
```

I have my doubts on the previous one

But this should ideally be valid

```cpp
   
if((reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(mul_array),
32)) & (32-1)) != 0)
{
__builtin_unreachable();
}
   
if((reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(add_array),
32)) & (32-1)) != 0)
{
__builtin_unreachable();
}
   
if((reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(x_array),
32)) & (32-1)) != 0)
{
__builtin_unreachable();
}
```

But either of them are unable to change the Load Stores from Unaligned to
Aligned.

Maybe victims of aggressive Dead Code elimination here? GCC intrinsics believe
that there can be no case its false and the code is deleted for the same?
Because __builtin_assume_aligned should always be a multiple of 32 in the above
cases.

Or __builtin_assume_aligned does not support usage like this in GCC and in
Clang it does? And due to that difference, we have a difference in behavior.

Its pretty interesting either way.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Keywords||missed-optimization
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=109045

--- Comment #5 from Andrew Pinski  ---
Right now we don't always prop back what information we get from the assume
attributes.
Maybe with the recent prange addition, it can for pointers ...

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #4 from Andrew Pinski  ---
Oh  this is the more correct syntax:
[[assume((uintptr_t(x_array) & (32-1)) == 0)]];
[[assume((uintptr_t(mul_array) & (32-1)) == 0)]];
[[assume((uintptr_t(add_array) & (32-1)) == 0)]];

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-08 Thread pratikc at live dot co.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #3 from Pratik Chowdhury  ---
Yeah definitely.

My bad.

Sorry.

@Andrew Pinski [however even that change does not seem to change the results
for GCC with Aligned Loads not being used](https://gcc.godbolt.org/z/9WbMbePc1)

Added the more corrected 2 Power Trick function at the very end 

Any other mistake I could have made?

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #2 from Andrew Pinski  ---
I suspect the syntax you want instead is:
[[assume(uintptr_t(x_array) & (32-1) == 0]];

Becuase __builtin_assume_aligned takes a pointer and returns a pointer that has
the assumed alignment

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #1 from Andrew Pinski  ---
Created attachment 58135
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58135&action=edit
testcase