https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
--- Comment #27 from Li Pan <pan2.li at intel dot com> ---
Hi Richard and Juzhe.

I investigated this issue recently and noticed that it may be related to the array size of the constant memory. Consider the two functions below:

  vuint8m1_t fn_00000 ()
  {
    uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
                       1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};
    return __riscv_vle8_v_u8m1(arr, 32);
  }

  vuint8m2_t fn_11111 ()
  {
    uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
                       1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};
    return __riscv_vle8_v_u8m2(arr, 32);
  }

The vuint8m1 version keeps the stack variable, but the vuint8m2 version does not. So I suspect there is some limitation during optimization.

I eventually traced it to extract_low_bits, which get_stored_val calls in DSE. It looks like it can only handle scalar modes when the number of units (nunits) differs:

  rtx
  extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
  {
    ...
    if (!int_mode_for_mode (src_mode).exists (&src_int_mode)
        || !int_mode_for_mode (mode).exists (&int_mode))
      return NULL_RTX;
    ...
  }

I tried allowing vector modes through gen_lowpart here, if and only if the size of mode is not greater than the size of src_mode. This eliminates the stack variables as expected, to a certain extent, for the functions above. The RVV regression tests look good so far.

But before we start more testing, I would like to double-check with you: is this approach reasonable? ;)

Thanks.