[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #13 from Aldy Hernandez --- (In reply to Aldy Hernandez from comment #10) > Created attachment 58202 [details] > proof of concept implementing a range-op entry for builtin_assume_aligned > > Something like this (properly coded and enhanced) would work. > > The pointers_handled_p() cruft is temporary boilerplate to indicate that we > only support a fold-range() of PRANGE = IRANGE op PRANGE for this operation. > This is going away when the dust settles. FYI, I've removed the pointers_handled_p requirement for new range-op entries.
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #12 from GCC Commits --- The master branch has been updated by Aldy Hernandez : https://gcc.gnu.org/g:c400b2100719d0a9e5989c63e0827b9e98919df3 commit r15-504-gc400b2100719d0a9e5989c63e0827b9e98919df3 Author: Aldy Hernandez Date: Tue May 14 16:21:50 2024 +0200 [prange] Default pointers_handled_p() to false. The pointers_handled_p() method is an internal range-op helper to help catch dispatch type mismatches for pointer operands. This is what caught the IPA mismatch in PR114985. This method is only a temporary measure to catch any incompatibilities in the current pointer range-op entries. This patch returns true for any *new* entries in the range-op table, as the current ones are already fleshed out. This keeps us from having to implement this boilerplate function for any new range-op entries. PR tree-optimization/114995 * range-op-ptr.cc (range_operator::pointers_handled_p): Default to true.
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #11 from Aldy Hernandez --- Just to clarify. prange as well as irange keep a value/mask pair where we can store alignment info, so every calculation (range-op) along the way can track this properly. Now, for pointers we loose this information across passes because of the union in SSA_NAME_RANGE_INFO (struct tree_ssa_name) which keeps the range info in an ptr_info_def, not a proper vrange_storage. It's in my long term TODO list to propose we properly track pointer ranges once prange comes live. Note that IPA keeps alignment info on the side, so part of this info is kept across passes. But I assume they're doing this, because originally ranges didn't have alignment (value/mask) info associated with them.
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #10 from Aldy Hernandez --- Created attachment 58202 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58202&action=edit proof of concept implementing a range-op entry for builtin_assume_aligned Something like this (properly coded and enhanced) would work. The pointers_handled_p() cruft is temporary boilerplate to indicate that we only support a fold-range() of PRANGE = IRANGE op PRANGE for this operation. This is going away when the dust settles.
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #9 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #7) > The above examples just show misunderstanding what __builtin_assume_aligned > is and what it is not. You need to use the result of the built-in function > in the accesses to be able to use the alignment information, if you just try > to compare __builtin_assume_aligned (x, 32) == x, it will just fold as > always true. The design of the builtin is to attach the alignment > information to the result of the builtin function only. > > CCing Aldy/Andrew for whether prange can or could be taught to handle the > assume cases with uintptr_t and bitwise and + comparison. All the pieces are there to make it work, both with the assume aligned and with the uintptr_t case. And we could probably get it all without prange. For example: #include void foo (const float *); void bar1 (const float *array) { [[assume(array != nullptr)]]; const float *aligned = (const float *) __builtin_assume_aligned (array, 32); foo (aligned); } The __builtin_assume_aligned hasn't been expanded by evrp, so we should be able to add a range-op entry for it. This is what evrp sees: void bar1 (const float * array) { const float * aligned; : aligned_2 = __builtin_assume_aligned (array_1(D), 32); foo (aligned_2); return; } All we need is a range-op implementation for builtin_assume_aligned. The attached crude implementation does it. === BB 2 : aligned_2 = __builtin_assume_aligned (array_1(D), 32); foo (aligned_2); return; aligned_2 : [prange] const float * [0, +INF] MASK 0x VALUE 0x0 That is, the bottom 32 bits are cleared. Andrew will have to comment on the uintptr_t idiom, because it gets expanded into an .ASSUME() function which I'm unfamiliar with. For this small function: void bar2 (const float *array) { [[assume((uintptr_t (array) & (32 - 1)) == 0)]]; foo (array); } evrp expands to: === BB 2 Partial equiv (array.0_3 pe64 array_2(D)) : array.0_3 = (long unsigned int) array_2(D); _4 = array.0_3 & 31; _5 = _4 == 0; return _5; _4 : [irange] long unsigned int [0, 31] MASK 0x1f VALUE 0x0 I don't see any reason why we couldn't get that array.0_3 and array_2 are aligned to 32-bits. Maybe we don't set the value/mask pair for the bitwise_and::op1_range? The value/mask stuff is not very fleshed out, especially for the op1_range operators.
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #8 from Pratik Chowdhury --- > if you just try to compare __builtin_assume_aligned (x, 32) == x, it will > just fold as always true Aah. Dead code elimination. > CCing Aldy/Andrew for whether prange can or could be taught to handle the > assume cases with uintptr_t and bitwise and + comparison. Yeah this could be very helpful in cases such as this. @Jakub @Andrew I think [this](https://gcc.godbolt.org/z/MEre8hr71) also has scope for taking advantage of the same. ```cpp void MulAddLoopWorksWithBuitInUnreachableAndConst(const float* const __restrict mul_array, const float* const __restrict add_array, const ::std::size_t size, float* const __restrict x_array) { if((reinterpret_cast<::std::uintptr_t>(mul_array) & (32-1)) != 0) { __builtin_unreachable(); } if((reinterpret_cast<::std::uintptr_t>(add_array) & (32-1)) != 0) { __builtin_unreachable(); } if((reinterpret_cast<::std::uintptr_t>(x_array) & (32-1)) != 0) { __builtin_unreachable(); } if ((size & (32 - 1)) != 0) __builtin_unreachable(); for (::std::size_t i = 0; i != size; i++) [[likely]] { const auto mul = *(mul_array + i); const auto add = *(add_array + i); x_array[i] = x_array[i] * mul + add; // x_array[i] *= mul; // x_array[i] += add; } } ``` Here we are working under the assumption that the memory addresses themselves are multiples of 32 if aligned for AVX2. Clang seems to be able to take advantage of the same here. If the __builtin_assume_aligned is kinda not supported due to dead code elimination, then this looks like a nice enough alternative. It also retains const correctness for me.
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 Jakub Jelinek changed: What|Removed |Added CC||aldyh at gcc dot gnu.org, ||amacleod at redhat dot com, ||jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- The above examples just show misunderstanding what __builtin_assume_aligned is and what it is not. You need to use the result of the built-in function in the accesses to be able to use the alignment information, if you just try to compare __builtin_assume_aligned (x, 32) == x, it will just fold as always true. The design of the builtin is to attach the alignment information to the result of the builtin function only. CCing Aldy/Andrew for whether prange can or could be taught to handle the assume cases with uintptr_t and bitwise and + comparison.
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #6 from Pratik Chowdhury --- > [[assume((uintptr_t(x_array) & (32-1)) == 0)]]; The Parans in the & have definitely given someone sleepless nights LOL. I myself was saved by the warnings. > Right now we don't always prop back what information we get from the assume > attributes. Maybe with the recent prange addition, it can for pointers ... Aah Guess we will switch to assume in the future. I tried [something else just about now](https://gcc.godbolt.org/z/d8aPjzMhq) I think its a bit wrong. Clang seems to be able to handle it. Is this syntax even valid? ```cpp if(reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(mul_array), 32)) != reinterpret_cast<::std::uintptr_t>(mul_array)) { __builtin_unreachable(); } if(reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(add_array), 32)) != reinterpret_cast<::std::uintptr_t>(add_array)) { __builtin_unreachable(); } if(reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(x_array), 32)) != reinterpret_cast<::std::uintptr_t>(x_array)) { __builtin_unreachable(); } ``` I have my doubts on the previous one But this should ideally be valid ```cpp if((reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(mul_array), 32)) & (32-1)) != 0) { __builtin_unreachable(); } if((reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(add_array), 32)) & (32-1)) != 0) { __builtin_unreachable(); } if((reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(x_array), 32)) & (32-1)) != 0) { __builtin_unreachable(); } ``` But either of them are unable to change the Load Stores from Unaligned to Aligned. Maybe victims of aggressive Dead Code elimination here? GCC intrinsics believe that there can be no case its false and the code is deleted for the same? Because __builtin_assume_aligned should always be a multiple of 32 in the above cases. Or __builtin_assume_aligned does not support usage like this in GCC and in Clang it does? And due to that difference, we have a difference in behavior. Its pretty interesting either way.
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Keywords||missed-optimization See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=109045 --- Comment #5 from Andrew Pinski --- Right now we don't always prop back what information we get from the assume attributes. Maybe with the recent prange addition, it can for pointers ...
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #4 from Andrew Pinski --- Oh this is the more correct syntax: [[assume((uintptr_t(x_array) & (32-1)) == 0)]]; [[assume((uintptr_t(mul_array) & (32-1)) == 0)]]; [[assume((uintptr_t(add_array) & (32-1)) == 0)]];
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #3 from Pratik Chowdhury --- Yeah definitely. My bad. Sorry. @Andrew Pinski [however even that change does not seem to change the results for GCC with Aligned Loads not being used](https://gcc.godbolt.org/z/9WbMbePc1) Added the more corrected 2 Power Trick function at the very end Any other mistake I could have made?
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #2 from Andrew Pinski --- I suspect the syntax you want instead is: [[assume(uintptr_t(x_array) & (32-1) == 0]]; Becuase __builtin_assume_aligned takes a pointer and returns a pointer that has the assumed alignment
[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995 --- Comment #1 from Andrew Pinski --- Created attachment 58135 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58135&action=edit testcase