https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116855
Bug ID: 116855
Summary: Unsafe early-break vectorization
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: fxue at os dot amperecomputing.com
Target Milestone: ---

For the case:

  char string[1020];

  char * find(size_t n, char c)
  {
    for (size_t i = 0; i < n; i++) {
      if (string[i] == c)
        return &string[i];
    }
    return 0;
  }

On aarch64 (non-SVE compilation), the loop is vectorized at -O3 as:

  ...
  bnd.5_22 = n_4(D) >> 4;
  vect_cst__50 = {c_6(D), c_6(D), ..., c_6(D), c_6(D)};
  ...
  # vectp_string.10_47 = PHI <vectp_string.10_48(8), &string(13)>
  # ivtmp_63 = PHI <ivtmp_64(8), 0(13)>
  ...
  vect__1.12_49 = MEM <vector(16) char> [(char *)vectp_string.10_47];
  mask_patt_9.13_51 = vect__1.12_49 == vect_cst__50;
  if (mask_patt_9.13_51 != { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 })
    goto <bb 20>; [5.50%]
  else
    goto <bb 5>; [94.50%]
  ...
  vectp_string.10_48 = vectp_string.10_47 + 16;
  ivtmp_64 = ivtmp_63 + 1;
  if (ivtmp_64 < bnd.5_22)
    goto <bb 8>; [94.50%]
  else
    goto <bb 15>; [5.50%]

Suppose that n is 1026, which is larger than the length of "string", and that only the last element of "string" equals "char c". The scalar loop returns at i = 1019 and never reads past the array. The vector loop, however, is bounded by 1026 >> 4 = 64 iterations, and the 64th iteration loads bytes 1008 to 1023. The last four of those bytes are outside the bounds of "string", so the search ends with an unsafe vector load that may trigger a segfault.

One possible fix is to compute the vector niter from the smaller of the known constant bound and the variable scalar niter.

Another option is to follow the precedent of "-fallow-store-data-races", which assumes that a segfault will not happen and therefore accepts the introduction of new data races. The vectorization would then be enabled with -Ofast rather than -O3. With that approach, it could also be extended to cover a data array represented by a pointer, with no statically determined bound, for example:

  char * find(char *string, size_t n, char c)
  {
    for (size_t i = 0; i < n; i++) {
      if (string[i] == c)
        return &string[i];
    }
    return 0;
  }
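
To illustrate the first suggested fix (vector niter taken from the smaller of the known constant bound and the scalar niter), here is a rough scalar model in plain C. It is only a sketch of the idea, not what the vectorizer emits: the names STRING_LEN, VF, clamped_vector_niters and find_clamped are made up for this example, and a memchr over 16 bytes stands in for the vector(16) char load and compare.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRING_LEN 1020   /* statically known bound of "string" */
#define VF 16             /* lanes in one vector(16) char load */

char string[STRING_LEN];

/* Hypothetical helper: vector niter from min(constant bound, scalar niter). */
static size_t clamped_vector_niters(size_t n)
{
    size_t bound = n < STRING_LEN ? n : STRING_LEN;
    return bound / VF;    /* same as bound >> 4 */
}

char *find_clamped(size_t n, char c)
{
    size_t i = 0;
    size_t bnd = clamped_vector_niters(n);

    for (size_t iv = 0; iv < bnd; iv++, i += VF) {
        /* memchr over VF bytes stands in for the vector load and compare. */
        char *hit = memchr(&string[i], c, VF);
        if (hit)
            return hit;
    }

    /* Every block read so far lay fully inside the array. */
    assert(i <= STRING_LEN);

    /* Scalar epilogue: same accesses as the original loop from here on. */
    for (; i < n; i++) {
        if (string[i] == c)
            return &string[i];
    }
    return 0;
}
```

With n = 1026 and the only match at string[1019], the clamped count is 1020 / 16 = 63, so the block loop covers bytes 0 to 1007 and the scalar epilogue finds the match at index 1019 without any read past the array. The unclamped count of 64 would have loaded bytes 1008 to 1023.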