https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #27 from Li Pan <pan2.li at intel dot com> ---
Hi Richard and Juzhe.

I investigated this issue recently and noticed that it may be related to the
array size of the constant memory. Consider the two functions below.

vuint8m1_t fn_00000 () {
  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
                     1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};

  return __riscv_vle8_v_u8m1(arr, 32);
}

vuint8m2_t fn_11111 () {
  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
                     1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};

  return __riscv_vle8_v_u8m2(arr, 32);
}

The vuint8m1 function (fn_00000) still has stack variables, but the vuint8m2
function (fn_11111) does not. So I guess some optimization is being limited
here. I eventually traced it to extract_low_bits, which is reached from
get_stored_val in DSE. It looks like it can only handle scalar modes when the
nunits are not equal.

rtx extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
{
  ...
  if (!int_mode_for_mode (src_mode).exists (&src_int_mode)
      || !int_mode_for_mode (mode).exists (&int_mode))
    return NULL_RTX;
  ...
}

I tried allowing vector modes to reach gen_lowpart here if and only if the
size of mode is not greater than the size of src_mode. As we expected, this
eliminates the stack variables for the above functions, up to a point.
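
To make the condition concrete, here is a rough, self-contained model of the
check in plain C. It is only a sketch and not GCC code: struct toy_mode, its
has_int_mode field (standing in for int_mode_for_mode (...).exists ()), and
both toy_* functions are hypothetical names made up for illustration. It also
assumes that both modes must be vector modes for the relaxed path to apply,
which the real change may or may not require.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a machine mode; NOT GCC's machine_mode. */
struct toy_mode {
  bool is_vector;     /* true for a vector mode */
  size_t size_bytes;  /* total size of the mode in bytes */
  bool has_int_mode;  /* models int_mode_for_mode (m).exists () */
};

/* Models the existing check: both modes need an equivalent integer mode,
   otherwise extract_low_bits gives up (returns NULL_RTX in the real code). */
static bool
toy_low_bits_current (struct toy_mode mode, struct toy_mode src_mode)
{
  return mode.has_int_mode && src_mode.has_int_mode;
}

/* Models the proposed relaxation: additionally accept vector modes when
   the size of mode is not greater than the size of src_mode.
   Assumption: both sides are vector modes. */
static bool
toy_low_bits_proposed (struct toy_mode mode, struct toy_mode src_mode)
{
  if (toy_low_bits_current (mode, src_mode))
    return true;

  return mode.is_vector && src_mode.is_vector
	 && mode.size_bytes <= src_mode.size_bytes;
}
```

With two made-up vector modes of 16 and 32 bytes, neither having an integer
mode, the current model rejects the 16-from-32 extraction while the proposed
model accepts it. The reverse direction (32 from 16) is rejected by both.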

I ran the RVV regression tests and they look good so far. Before we start
more testing, I would like to double-check with you that this approach is
reasonable. ;)

Thanks.
