>> Thanks for the explanation, exporting the functions is OK.
Thanks.

>> It would be nice to put these internal functions into a class or a
>> namespace given their non LCM name.
Hi, Richi. I saw there is a function "extern void compute_available (sbitmap *, 
sbitmap *, sbitmap *, sbitmap *);"
which is already exported as global.

Do you mean I add those 2 functions (I export this patch) and 
"compute_avaialble" which has already been exported
into namespace lcm like this:

namespace lcm
{
  compute_available 
compute_antinout_edge
compute_earliest
}
?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-21 15:50
To: juzhe.zh...@rivai.ai
CC: gcc-patches; jeffreyalaw
Subject: Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL 
PASS use in RISC-V backend
On Mon, 21 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi. Richi.
> I'd like to share more details that I want to do in VSETVL PASS.
> 
> Consider this following case:
> 
> for
>   for 
>     for
>       ...
>          for
>      VSETVL demand: RATIO = 32 and TU policy.
> 
> For this simple case, 'pre_edge_lcm_av' can perfectly work for us, will hoist 
> "vsetvli e32,tu" to the outer-most loop.
> 
> However, for this case:
>   for
>   for 
>     for
>       ...
>          for
>    if (...)
>      VSETVL 1 demand: RATIO = 32 and TU policy.
>    else if (...)
>      VSETVL 2 demand: SEW = 16.
>    else
>      VSETVL 3 demand: MU policy.
> 
> 'pre_edge_lcm_av' is not sufficient to give us optimal codegen since VSETVL 
> 1,  VSETVL 2 and VSETVL 3 are 3 different VSETVL demands
> 'pre_edge_lcm_av' can only hoist one of them. Such case I can easily produce 
> by RVV intrinsic and they are already in our RVV testsuite.
> 
> To get the optimal codegen for this case,  We need I call it as "Demand 
> fusion" which is fusing all "compatible" VSETVLs into a single VSETVL
> then set them to avoid redundant VSETVLs.
> 
> In this case, we should be able to fuse VSETVL 1, VSETVL 2 and VSETVL 3 into 
> new VSETVL demand : SEW = 16, LMUL = MF2, TU, MU into a single 
> new VSETVL demand. Instead of giving 'pre_edge_lcm_av' 3 VSETVL demands 
> (VSETVL 1/2/3). I give 'pre_edge_lcm_av' only single 1 new VSETVL demand.
> Then, LCM PRE can hoist such fused VSETVL to the outer-most loop. So the 
> program will be transformed as:
> 
> VSETVL SEW = 16, LMUL = MF2, TU, MU
>   for
>   for 
>     for
>       ...
>          for
>    if (...) 
>      .....   no vsetvl insn.
>    else if (...)
>      ....  no vsetvl insn.
> 
>    else
>      ....  no vsetvl insn.
> 
> So, how to do the demand fusion in this case? 
> Before this patch and following RISC-V refactor patch, I do it explictly with 
> my own decide algorithm.
> Meaning I calculate which location of the program to do the VSETVL fusion is 
> correct and optimal.
> 
> However, I found "compute_earliest" can help us to do the job for calculating 
> the location of the program to do VSETVL fusion and
> turns out it's a quite more reliable and reasonable approach than I do.
> 
> So that's why I export those 2 functions for us to be use in Phase 3 (Demand 
> fusion) in RISC-V backend VSETVL PASS.
 
Thanks for the explanation, exporting the functions is OK.
 
Richard.
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-21 15:09
> To: Juzhe-Zhong
> CC: gcc-patches; jeffreyalaw
> Subject: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL 
> PASS use in RISC-V backend
> On Mon, 21 Aug 2023, Juzhe-Zhong wrote:
>  
> > This patch exports 'compute_antinout_edge' and 'compute_earliest' as global 
> > scope
> > which is going to be used in VSETVL PASS of RISC-V backend.
> > 
> > The demand fusion is the fusion of VSETVL information to emit VSETVL which 
> > dominate and pre-config for most
> > of the RVV instructions in order to elide redundant VSETVLs.
> > 
> > For exmaple:
> > 
> > for
> >  for
> >   for
> >     if (cond}
> >       VSETVL demand 1: SEW/LMUL = 16 and TU policy
> >     else
> >       VSETVL demand 2: SEW = 32
> > 
> > VSETVL pass should be able to fuse demand 1 and demand 2 into new demand: 
> > SEW = 32, LMUL = M2, TU policy.
> > Then emit such VSETVL at the outmost of the for loop to get the most 
> > optimal codegen and run-time execution.
> > 
> > Currenty the VSETVL PASS Phase 3 (demand fusion) is really messy and 
> > un-reliable as well as un-maintainable.
> > And, I recently read dragon book and morgan's book again, I found there 
> > "earliest" can allow us to do the
> > demand fusion in a very reliable and optimal way.
> > 
> > So, this patch exports these 2 functions which are very helpful for VSETVL 
> > pass.
>  
> It would be nice to put these internal functions into a class or a
> namespace given their non LCM name.  I don't see how you are going
> to use these intermediate DF functions - they are just necessary
> to compute pre_edge_lcm_avs which I see you already do.  Just to say
> you are possibly going to blow up compile-time complexity of your
> VSETVL dataflow problem?
>  
> > gcc/ChangeLog:
> > 
> > * lcm.cc (compute_antinout_edge): Export as global use.
> > (compute_earliest): Ditto.
> > (compute_rev_insert_delete): Ditto.
> > * lcm.h (compute_antinout_edge): Ditto.
> > (compute_earliest): Ditto.
> > 
> > ---
> >  gcc/lcm.cc | 7 ++-----
> >  gcc/lcm.h  | 3 +++
> >  2 files changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/gcc/lcm.cc b/gcc/lcm.cc
> > index 94a3ed43aea..03421e490e4 100644
> > --- a/gcc/lcm.cc
> > +++ b/gcc/lcm.cc
> > @@ -56,9 +56,6 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "lcm.h"
> >  
> >  /* Edge based LCM routines.  */
> > -static void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, 
> > sbitmap *);
> > -static void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap 
> > *,
> > -       sbitmap *, sbitmap *, sbitmap *);
> >  static void compute_laterin (struct edge_list *, sbitmap *, sbitmap *,
> >       sbitmap *, sbitmap *);
> >  static void compute_insert_delete (struct edge_list *edge_list, sbitmap *,
> > @@ -79,7 +76,7 @@ static void compute_rev_insert_delete (struct edge_list 
> > *edge_list, sbitmap *,
> >     This is done based on the flow graph, and not on the pred-succ lists.
> >     Other than that, its pretty much identical to compute_antinout.  */
> >  
> > -static void
> > +void
> >  compute_antinout_edge (sbitmap *antloc, sbitmap *transp, sbitmap *antin,
> >         sbitmap *antout)
> >  {
> > @@ -170,7 +167,7 @@ compute_antinout_edge (sbitmap *antloc, sbitmap 
> > *transp, sbitmap *antin,
> >  
> >  /* Compute the earliest vector for edge based lcm.  */
> >  
> > -static void
> > +void
> >  compute_earliest (struct edge_list *edge_list, int n_exprs, sbitmap *antin,
> >    sbitmap *antout, sbitmap *avout, sbitmap *kill,
> >    sbitmap *earliest)
> > diff --git a/gcc/lcm.h b/gcc/lcm.h
> > index e08339352e0..7145d6fc46d 100644
> > --- a/gcc/lcm.h
> > +++ b/gcc/lcm.h
> > @@ -31,4 +31,7 @@ extern struct edge_list *pre_edge_rev_lcm (int, sbitmap *,
> >     sbitmap *, sbitmap *,
> >     sbitmap *, sbitmap **,
> >     sbitmap **);
> > +extern void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, 
> > sbitmap *);
> > +extern void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap 
> > *,
> > +       sbitmap *, sbitmap *, sbitmap *);
> >  #endif /* GCC_LCM_H */
> > 
>  
> 
 
-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 

Reply via email to