https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117001
Bug ID: 117001
Summary: O3 auto tree loop vectorization produces incorrect
output on armv8.2-a+sve
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: Robert.Hardwick at arm dot com
Target Milestone: ---
We have seen incorrect numbers being produced when -O3 is enabled on Arm
Neoverse V1 (armv8.2-a+sve). I have reduced the problem to a small
reproducer and found that adding -fno-tree-loop-vectorize to the gcc options
produces the correct output.
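As a possible way to narrow this down without changing the flags for a whole translation unit, the vectorizer can be disabled for a single function. The sketch below is an assumption, not part of the original reduction: select_row is a hypothetical helper, it takes plain pointers rather than the std::array type from the reproducer, and it relies on GCC's function-level optimize attribute, which the GCC documentation recommends for debugging rather than production use. It has not been tried on the affected target.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not taken from the report: element-wise select over
// one row of n elements. The GCC-specific optimize attribute applies
// -fno-tree-loop-vectorize to this function only.
__attribute__((optimize("no-tree-loop-vectorize")))
void select_row(const uint64_t* mask, const uint64_t* a, const uint64_t* b,
                uint64_t* result, std::size_t n) {
    for (std::size_t j = 0; j < n; j++) {
        // Take b where the mask is set, otherwise take a.
        result[j] = (mask[j] != 0) ? b[j] : a[j];
    }
}
```

If the output becomes correct with only this function excluded from loop vectorization, that would point at the vectorized select loop rather than at the printing code.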
It seems to happen when a C-style array is contained within a std::array
structure, and only when auto loop vectorization is enabled.
This has been observed on 10.2.1 and 11.4.1.
Reproducible example
#include <array>
#include <cstdint>
#include <iostream>
typedef std::array<uint64_t[4], 2> my_type;
// helpful to print output to stdout
std::ostream& operator<<(std::ostream& stream, const my_type& vec) {
    stream << "[";
    for (int j = 0; j < 2; j++) {
        for (int i = 0; i != 4; i++) {
            if (i != 0 || j != 0) {
                stream << ", ";
            }
            stream << vec[j][i];
        }
    }
    stream << "]";
    return stream;
}
int main() {
    my_type a = {{0, 0, 0, 1, 0, 0, 1, 0}};
    my_type b = {{1, 1, 1, 1, 1, 1, 1, 1}};
    my_type mask = {{0, 0, 0, 0, 0, 1, 0, 0}};
    my_type result = {{0, 0, 0, 0, 0, 0, 0, 0}};
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 4; j++) {
            if (mask[i][j] != 0) {
                result[i][j] = b[i][j];
            } else {
                result[i][j] = a[i][j];
            }
        }
    }
    std::cout << result << std::endl;
}
Observations
With -O3 -march=armv8.2-a+sve output is INCORRECT
[0, 0, 0, 1, 0, 0, 1, 0]
With -O3 -fno-tree-loop-vectorize -march=armv8.2-a+sve output is CORRECT
[0, 0, 0, 1, 0, 1, 1, 0]
The operation should be doing the equivalent of
result[i][j] = mask[i][j] ? b[i][j] : a[i][j]
So the 6th element (at i=1, j=1) should be 1, not 0.
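For bisecting or scripted testing, a self-checking variant may be easier to use than comparing printed output by eye. The following is a minimal sketch under stated assumptions: select_elements, count_mismatches and check_reproducer are hypothetical names that do not appear in the original reproducer, and moving the loop into its own function could change whether the vectorizer takes the same path, so a passing result here does not rule out the bug.

```cpp
#include <array>
#include <cstdint>

typedef std::array<uint64_t[4], 2> my_type;

// Hypothetical helper: the select loop from the reproducer, moved into a
// function so it can be called from a check.
void select_elements(const my_type& mask, const my_type& a, const my_type& b,
                     my_type& result) {
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 4; j++) {
            result[i][j] = (mask[i][j] != 0) ? b[i][j] : a[i][j];
        }
    }
}

// Hypothetical helper: counts elements that differ from the expected output
// [0, 0, 0, 1, 0, 1, 1, 0] stated in the report.
int count_mismatches(const my_type& result) {
    const uint64_t expected[2][4] = {{0, 0, 0, 1}, {0, 1, 1, 0}};
    int mismatches = 0;
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 4; j++) {
            if (result[i][j] != expected[i][j]) {
                mismatches++;
            }
        }
    }
    return mismatches;
}

// Hypothetical driver: builds the same inputs as the reproducer and returns
// the number of wrong elements (0 means the output is correct).
int check_reproducer() {
    my_type a = {{0, 0, 0, 1, 0, 0, 1, 0}};
    my_type b = {{1, 1, 1, 1, 1, 1, 1, 1}};
    my_type mask = {{0, 0, 0, 0, 0, 1, 0, 0}};
    my_type result = {{0, 0, 0, 0, 0, 0, 0, 0}};
    select_elements(mask, a, b, result);
    return count_mismatches(result);
}
```

Returning the value of check_reproducer() from main gives a non-zero exit status whenever any element is wrong, which is convenient for automated comparison across compiler versions and flag sets.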