https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117558
Bug ID: 117558
Summary: peeling for gap overrun check imprecise for VLA
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
RISC-V FAILs:
FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect-17.c scan-assembler-times vlseg4e64\\.v 1
FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect-18.c scan-assembler-times vlseg4e64\\.v 1
The relevant check is
  /* Peeling for gaps assumes that a single scalar iteration
     is enough to make sure the last vector iteration doesn't
     access excess elements. */
  if (overrun_p
      && (!can_div_trunc_p (group_size
                            * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap,
                            nunits, &tem, &remain)
          || maybe_lt (remain + group_size, nunits)))
    {
      /* But peeling a single scalar iteration is enough if
         we can use the next power-of-two sized partial
         access and that is sufficiently small to be covered
         by the single scalar iteration. */
      unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size;
      if (!nunits.is_constant (&cnunits)
          || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf)
          || (((cremain = group_size * cvf - gap % cnunits), true)
              && ((cpart_size = (1 << ceil_log2 (cremain))) != cnunits)
              && (cremain + group_size < cpart_size
                  || vector_vector_composition_type
                       (vectype, cnunits / cpart_size,
                        &half_vtype) == NULL_TREE)))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "peeling for gaps insufficient for "
                             "access\n");
          return false;
But with RVVM1DF we have group_size == 4, gap == 3, VF [2, 2] and nunits [2, 2],
so can_div_trunc_p of [5, 8] by [2, 2] fails.
For RVVM1SF with VF [4, 4] (same group/gap) and nunits [4, 4], can_div_trunc_p
of [13, 16] by [4, 4] succeeds.
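
The two divisions above can be reproduced with a small stand-alone model. This is only a sketch, not GCC's poly_int code: the Poly struct and the functions can_div_trunc, maybe_lt and overrun_check_trips are hypothetical simplifications that assume two non-negative coefficients, with [a, b] meaning a + b * x for a runtime x >= 0.

```cpp
#include <cassert>

// Toy two-coefficient "poly int" (assumption, not GCC's poly_int):
// value = c0 + c1 * x for a runtime x >= 0.  [2, 2] is Poly{2, 2}.
struct Poly { long c0, c1; };

// Simplified model of can_div_trunc_p for non-negative coefficients only:
// look for a constant q and a remainder r = a - q * b with 0 <= r < b
// for every x >= 0.
bool can_div_trunc(Poly a, Poly b, long *q, Poly *r) {
  assert(a.c0 >= 0 && a.c1 >= 0 && b.c0 > 0 && b.c1 >= 0);
  long q0 = a.c0 / b.c0;
  Poly rem = {a.c0 - q0 * b.c0, a.c1 - q0 * b.c1};
  // rem.c0 < b.c0 holds by construction, so r < b for all x as long as
  // the x coefficient of the remainder stays within [0, b.c1].
  if (rem.c1 < 0 || rem.c1 > b.c1)
    return false;
  *q = q0;
  *r = rem;
  return true;
}

// True if a < b for at least one x >= 0.
bool maybe_lt(Poly a, Poly b) { return a.c0 < b.c0 || a.c1 < b.c1; }

// Models only the outer condition of the quoted check (overrun_p assumed
// true): the division fails, or remain + group_size may be below nunits.
bool overrun_check_trips(long group_size, long gap, Poly vf, Poly nunits) {
  Poly a = {group_size * vf.c0 - gap, group_size * vf.c1};
  long q;
  Poly r;
  if (!can_div_trunc(a, nunits, &q, &r))
    return true;
  return maybe_lt(Poly{r.c0 + group_size, r.c1}, nunits);
}
```

In this model overrun_check_trips (4, 3, Poly{2, 2}, Poly{2, 2}) returns true because [5, 8] by [2, 2] leaves a remainder of [1, 4], which is not known to be smaller than [2, 2]. With Poly{4, 4} for both VF and nunits the remainder of [13, 16] by [4, 4] is again [1, 4], which is below [4, 4], and the function returns false.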
I'll note the non-SLP path lacks the above correctness check.
So the thing we're missing here is that when nunits < group_size the
maybe_lt (remain + group_size, nunits) check is never true. Of course
here maybe_gt (nunits, group_size) holds, but with, say, a VF of two the
division would succeed.
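
To illustrate the imprecision, the same condition can be evaluated one concrete vector length at a time, using plain integers instead of poly-ints. This is a sketch with hypothetical helper names (check_trips_for_length, trips_for_any_sampled_length), not GCC code, and it only samples lengths for the RVVM1DF numbers (group_size 4, gap 3, VF and nunits both 2 + 2 * x) rather than proving anything.

```cpp
#include <cassert>

// Hypothetical helper, not GCC code: evaluates the outer overrun condition
// for one concrete vector length, with every quantity an ordinary integer.
// A constant division always succeeds, so only the remainder test is left.
bool check_trips_for_length(long group_size, long gap, long vf, long nunits) {
  assert(nunits > 0 && group_size * vf >= gap);
  long remain = (group_size * vf - gap) % nunits;
  return remain + group_size < nunits;
}

// Returns true if the condition trips for any sampled x in [0, max_x],
// where VF and nunits are both [2, 2], i.e. 2 + 2 * x, and
// group_size == 4, gap == 3 as in the RVVM1DF case.
bool trips_for_any_sampled_length(long max_x) {
  for (long x = 0; x <= max_x; ++x) {
    long n = 2 + 2 * x;
    if (check_trips_for_length(4, 3, n, n))
      return true;
  }
  return false;
}
```

For these numbers the condition does not trip for any sampled length: for n >= 4 the remainder is n - 3, so remain + group_size is n + 1, and for n == 2 it is 5. The rejection therefore comes only from the constant-quotient division failing on the poly-int values.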
I wonder how to improve the check for this case.