>> So we're inserting a dummy vect_perm element (that's live from the start?).
>> Would it make sense to instead increase the number of needed registers for
>> a load/store and handle this similarly to compute_nregs_for_mode?
>> Maybe also do it directly in compute_local_live_ranges and extend live_range
>> by an nregs?

Yeah,  we're inserting a dummy vect_perm element which has live range from 0 to 
max_point of the bb.
I have tried it in 'compute_nregs_for_mode' and 'compute_local_live_ranges' 
since we don't know the maximum point of bb yet.

Address other comments in V3:
[PATCH V3] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for 
non-adjacent load/store (gnu.org)



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-16 20:33
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Fix unexpected big LMUL choosing in dynamic 
LMUL model for non-adjacent load/store
Hi Juzhe,
 
> +/* Get STORE value.  */
> +static tree
> +get_store_value (gimple *stmt)
> +{
> +  if (is_gimple_call (stmt) && gimple_call_internal_p (stmt))
> +    {
> +      if (gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
> + return gimple_call_arg (stmt, 3);
> +      else
> + gcc_unreachable ();
> +    }
> +  else
> +    return gimple_assign_rhs1 (stmt);
> +}
 
This was something I was about to mention in the review of v1
that I already started.  This is better now.
 
> +       basic_block bb = e->src;
 
Rename to pred or so?  And then keep the original bb.
 
>        if (!live_range)
> - continue;
> + {
> +   if (single_succ_p (e->src))
> +     {
> +       /*
> + <bb 7> [local count: 850510900]:
> + goto <bb 3>; [100.00%]
> +
> + Handle this case, we should extend live range of bb 3.
> +       */
 
/* If the predecessor is an extended basic block extend it and look for
   DEF's definition again.  */
 
> +       bb = single_succ (e->src);
> +       if (!program_points_per_bb.get (bb))
> + continue;
> +       live_ranges = live_ranges_per_bb.get (bb);
> +       max_point
> + = (*program_points_per_bb.get (bb)).length () - 1;
> +       live_range = live_ranges->get (def);
> +       if (!live_range)
> + continue;
> +     }
> +   else
> +     continue;
> + }
 
We're approaching a complexity where reverse postorder would have
helped ;)  Maybe split out the live range into a separate function
get_live_range_for_bb ()?
 
> +      for (si = gsi_start_bb (bbs[i]); !gsi_end_p (si); gsi_next (&si))
> + {
> +   if (!(is_gimple_assign (gsi_stmt (si))
> + || is_gimple_call (gsi_stmt (si))))
> +     continue;
> +   stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
> +   enum stmt_vec_info_type type
> +     = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
> +   if ((type == load_vec_info_type || type == store_vec_info_type)
> +       && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
> +     {
> +       /* For non-adjacent load/store STMT, we will potentially
> + convert it into:
> +
> +    1. MASK_LEN_GATHER_LOAD (..., perm indice).
> +    2. Continguous load/store + VEC_PERM (..., perm indice)
> +
> + We will be likely using one more vector variable.  */
> +       unsigned int max_point
> + = (*program_points_per_bb.get (bbs[i])).length () - 1;
> +       auto *live_ranges = live_ranges_per_bb.get (bbs[i]);
> +       bool existed_p = false;
> +       tree var = type == load_vec_info_type
> +    ? gimple_get_lhs (gsi_stmt (si))
> +    : get_store_value (gsi_stmt (si));
> +       tree sel_type = build_nonstandard_integer_type (
> + TYPE_PRECISION (TREE_TYPE (var)), 1);
> +       tree sel = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +      get_identifier ("vect_perm"), sel_type);
> +       pair &live_range = live_ranges->get_or_insert (sel, &existed_p);
> +       gcc_assert (!existed_p);
> +       live_range = pair (0, max_point);
> +       if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "Add perm indice %T, start = 0, end = %d\n",
> + sel, max_point);
> +     }
> + }
>      }
>  }
 
So we're inserting a dummy vect_perm element (that's live from the start?).
Would it make sense to instead increase the number of needed registers for
a load/store and handle this similarly to compute_nregs_for_mode?
Maybe also do it directly in compute_local_live_ranges and extend live_range
by an nregs?
 
Regards
Robin
 
 

Reply via email to