https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119881
Bug ID: 119881
Summary: support alias analysis for large number of pointers
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Blocks: 53947, 115130
Target Milestone: ---
Consider the following example:
void foo (int *a1,
          int *a2,
          int *a3,
          int *a4,
          int *a5,
          int *a6,
          int *a7,
          int *a8,
          int *a9,
          int *a10,
          int *a11,
          int *a12,
          int *a13,
          int *a14,
          int *a15,
          int *a16,
          int *a17,
          int *a18,
          int *a19,
          int *a20,
          int n, int c)
{
  for (int i = 0; i < n; i++)
    {
      a1[i] += c;
      a2[i] += c;
      a3[i] += c;
      a4[i] += c;
      a5[i] += c;
      a6[i] += c;
      a7[i] += c;
      a8[i] += c;
      a9[i] += c;
#if 1
      a10[i] += c;
      a11[i] += c;
      a12[i] += c;
      a13[i] += c;
      a14[i] += c;
      a15[i] += c;
      a16[i] += c;
      a17[i] += c;
      a18[i] += c;
      a19[i] += c;
      a20[i] += c;
#endif
    }
}
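Both GCC and Clang fail to vectorize this loop. They give up because the number
of pairwise runtime alias checks grows quadratically with the number of
pointers (20 pointers need 20 * 19 / 2 = 190 pairwise checks).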
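To make "quadratic" concrete, here is an illustrative sketch of a naive pairwise
runtime overlap test. It assumes each pointer touches one contiguous byte range
[start, end). The type addr_range and the function ranges_independent_pairwise
are hypothetical names, and this is not the code GCC or Clang actually emit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte range [start, end) accessed through one pointer. */
typedef struct {
    uintptr_t start;
    uintptr_t end;   /* exclusive */
} addr_range;

/* Naive runtime alias check: test every pair of ranges for overlap.
   For k pointers this performs up to k*(k-1)/2 comparisons
   (190 for k = 20). Returns 1 if no two ranges overlap, else 0. */
static int ranges_independent_pairwise(const addr_range *r, size_t k)
{
    for (size_t i = 0; i < k; i++)
        for (size_t j = i + 1; j < k; j++)
            /* Half-open ranges overlap iff each starts before the other ends. */
            if (r[i].start < r[j].end && r[j].start < r[i].end)
                return 0;
    return 1;
}
```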
The Intel compiler, however, does vectorize it, using a smarter runtime check:
https://godbolt.org/z/3Wq7ax7o6
The pointers are dumped into a local array and the loop is guarded by a call to
__intel_rtdd_indep.
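Disassembling that function shows that it stores each pointer as a pair
{start_ptr, end_ptr}, sorts the pairs with a quicksort on start_ptr, and then
does a linear scan over the sorted result checking that
r[i].end_ptr <= r[i+1].start_ptr, i.e. that no range runs into the next one.
For k pointers the sort costs O(k log k) on average and the scan O(k), instead
of the quadratic pairwise checks, which lets them vectorize loops with many
more pointers.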
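A minimal C sketch of the sort-and-scan check described above, plus a versioned
loop guarded by it. This is a reconstruction of the idea, not Intel's
__intel_rtdd_indep. It assumes each pointer touches one contiguous byte range
[start, end) and that every range is written (as in foo). The names addr_range,
ranges_independent, add_c_versioned and MAX_PTRS are hypothetical, and
add_c_versioned takes the pointers as an array instead of 20 parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PTRS 32   /* hypothetical capacity of the local range array */

/* Hypothetical byte range [start, end) accessed through one pointer. */
typedef struct {
    uintptr_t start;
    uintptr_t end;   /* exclusive */
} addr_range;

/* qsort comparator: order ranges by start address. */
static int cmp_start(const void *pa, const void *pb)
{
    const addr_range *a = pa, *b = pb;
    return (a->start > b->start) - (a->start < b->start);
}

/* Returns 1 if no two of the k ranges overlap, else 0. Sorts r in place.
   After sorting by start, it suffices to compare each range with its
   successor: if r[i].end <= r[i+1].start for all i, the ranges are disjoint. */
static int ranges_independent(addr_range *r, size_t k)
{
    qsort(r, k, sizeof *r, cmp_start);
    for (size_t i = 0; i + 1 < k; i++)
        if (r[i].end > r[i + 1].start)   /* r[i] runs into r[i+1] */
            return 0;
    return 1;
}

/* Loop versioning: run the check once, then pick a loop copy.
   Returns 1 if the no-alias copy was taken. */
static int add_c_versioned(int **a, size_t k, int n, int c)
{
    addr_range r[MAX_PTRS];
    assert(k <= MAX_PTRS);
    for (size_t j = 0; j < k; j++) {
        r[j].start = (uintptr_t)a[j];
        r[j].end = (uintptr_t)(a[j] + (n > 0 ? n : 0));
    }
    int independent = ranges_independent(r, k);
    if (independent) {
        /* No overlap: this is the copy a compiler could vectorize. */
        for (int i = 0; i < n; i++)
            for (size_t j = 0; j < k; j++)
                a[j][i] += c;
    } else {
        /* Possible overlap: keep the original scalar loop. */
        for (int i = 0; i < n; i++)
            for (size_t j = 0; j < k; j++)
                a[j][i] += c;
    }
    return independent;
}
```

In C source both loop copies are identical; the difference would only appear in
the generated code, where the first copy can be vectorized because the check
has ruled out aliasing.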
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
[Bug 115130] [meta-bug] early break vectorization