On Tue, 9 May 2023, Li, Pan2 wrote:

> Here are the updated memory-allocated-bytes numbers for both the all 
> 12-bits patch and the 8-bit code + 16-bit mode patch.

Just to throw in a comment here - for the IL, tree/GIMPLE is the more
important part, since the whole program will be in tree/GIMPLE while
we only have a single function in RTL at a time.

Some host archs will have difficulties loading unaligned words so
it is important to keep often accessed larger bitfields aligned
to allow efficient access (aligned load + mask, no shifts).  That
means ideally machine_mode will be 16 bits and code 8 or 16 bits.
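
To make the alignment point concrete, here is a minimal sketch (not GCC
code) that models a 32-bit header word under two packings.  The function
names and the exact bit positions are assumptions for illustration only;
the real bit-field layout is up to the host ABI.

```cpp
#include <cstdint>

// Hypothetical layout A: code:12 in bits 0..11, mode:12 in bits 12..23.
constexpr std::uint32_t pack_12_12(std::uint32_t code, std::uint32_t mode) {
  return (code & 0xfffu) | ((mode & 0xfffu) << 12);
}

// The mode field starts in the middle of a byte, so reading it
// needs a shift followed by a mask.
constexpr std::uint32_t mode_12_12(std::uint32_t word) {
  return (word >> 12) & 0xfffu;
}

// Hypothetical layout B: mode:16 in bits 0..15, code:8 in bits 16..23.
constexpr std::uint32_t pack_16_8(std::uint32_t code, std::uint32_t mode) {
  return (mode & 0xffffu) | ((code & 0xffu) << 16);
}

// The mode field is 16 bits wide and 16-bit aligned: a mask alone
// (or an aligned halfword load on the host) is enough.
constexpr std::uint32_t mode_16_8(std::uint32_t word) {
  return word & 0xffffu;
}

// In this word model the code needs a shift, but on a real host the
// field is byte-aligned, so an aligned byte load could fetch it directly.
constexpr std::uint32_t code_16_8(std::uint32_t word) {
  return (word >> 16) & 0xffu;
}
```

Layout B is the "mode 16 bits, code 8 bits" shape; layout A is the shape
where the larger field cannot be fetched without shifting.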

I think shrinking RTX code is a good idea; we're unlikely to run out of
bits there.  Shrinking RTX code means you have to re-order
code and mode (see above about alignment), which will complicate the
var-tracking "fixup".
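
A rough sketch of why the re-ordering matters for var-tracking, modelling
only the first 32 bits of a tree_base as a plain word.  This is not GCC
code: the helper names, the tree code values 257 and 40, and the
assumption that bit-fields are allocated from the low bit upwards are all
made up for illustration.  Only "VALUE is 1" comes from the thread.

```cpp
#include <cstdint>

constexpr std::uint32_t VALUE_CODE = 1;  // rtx code VALUE, per the thread

// Model of a tree_base header: code:16 in bits 0..15, followed by
// sixteen 1-bit flags in bits 16..31 (side_effects_flag is flag bit 0).
constexpr std::uint32_t tree_header(std::uint32_t code, std::uint32_t flags) {
  return (code & 0xffffu) | ((flags & 0xffffu) << 16);
}

// What GET_CODE ((rtx) dv) would read from that word under three
// hypothetical rtx_def layouts.
constexpr std::uint32_t rtx_code_16_first(std::uint32_t w) {
  return w & 0xffffu;          // code:16 at bit 0 (1:1 overlap with tree code)
}
constexpr std::uint32_t rtx_code_8_first(std::uint32_t w) {
  return w & 0xffu;            // code:8 at bit 0 (low byte of tree code)
}
constexpr std::uint32_t rtx_code_8_after_mode(std::uint32_t w) {
  return (w >> 16) & 0xffu;    // mode:16 first, code:8 next (tree flag bits)
}

// The discriminator used by the dv_is_value_p style check.
constexpr bool is_value(std::uint32_t code) { return code == VALUE_CODE; }
```

With the 1:1 overlap a tree with (made-up) code 257 is not mistaken for a
VALUE, with an 8-bit code at bit 0 it is, and with mode placed first a
tree whose only set flag is the first one is mistaken for a VALUE
regardless of its code.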

We are going to run out of bits in tree_type_common; we've been
handing them out without much care recently :/

Richard.

> Bytes allocated with O2:
> ----------------------------------------------------------------------------------------------------
> Benchmark      | upstream     | with the all 12-bits patch | with 8 bits code and 16 bits mode patch
> ----------------------------------------------------------------------------------------------------
> 400.perlbench  | 25286185160  | 25286590847 ~0.0%          | 25286927562 ~0.0%
> 401.bzip2      | 1429883731   | 1430373103 ~0.0%           | 1430401245 ~0.0%
> 403.gcc        | 55023568981  | 55027574220 ~0.0%          | 55028727683 ~0.0%
> 429.mcf        | 1360975660   | 1360959361 ~0.0%           | 1360960745 ~0.0%
> 445.gobmk      | 12791636502  | 12789648370 ~0.0%          | 12789919097 ~0.0%
> 456.hmmer      | 9354433652   | 9353899089 ~0.0%           | 9353990523 ~0.0%
> 458.sjeng      | 1991260562   | 1991107773 ~0.0%           | 1991153851 ~0.0%
> 462.libquantum | 1725112078   | 1724972077 ~0.0%           | 1724983726 ~0.0%
> 464.h264ref    | 8597673515   | 8597748172 ~0.0%           | 8597931771 ~0.0%
> 471.omnetpp    | 37613034778  | 37614346380 ~0.0%          | 37614470890 ~0.0%
> 473.astar      | 3817295518   | 3817226365 ~0.0%           | 3817239631 ~0.0%
> 483.xalancbmk  | 149418776991 | 149405214817 ~0.0%         | 149405744428 ~0.0%
> 
> Bytes allocated with Ofast + funroll-loops:
> ----------------------------------------------------------------------------------------------------
> Benchmark      | upstream     | with the all 12-bits patch | with 8 bits code and 16 bits mode patch
> ----------------------------------------------------------------------------------------------------
> 400.perlbench  | 30438407499  | 30568217795 +0.4%          | 30568869401 +0.4%
> 401.bzip2      | 2277114519   | 2318588280 +1.8%           | 2318659896 +1.8%
> 403.gcc        | 64499664264  | 64764400606 +0.4%          | 64766107560 +0.4%
> 429.mcf        | 1361486758   | 1399872438 +2.8%           | 1399876436 +2.8%
> 445.gobmk      | 15258056111  | 15392769408 +0.9%          | 15393305108 +0.9%
> 456.hmmer      | 10896615649  | 10934649010 +0.3%          | 10934858994 +0.4%
> 458.sjeng      | 2592620709   | 2641551464 +1.9%           | 2641641389 +1.9%
> 462.libquantum | 1814487525   | 1856446214 +2.3%           | 1856475555 +2.3%
> 464.h264ref    | 13528736878  | 13606989269 +0.6%          | 13607467432 +0.6%
> 471.omnetpp    | 38721066702  | 38908678658 +0.5%          | 38908940169 +0.5%
> 473.astar      | 3924015756   | 3967867190 +1.1%           | 3967897551 +1.1%
> 483.xalancbmk  | 165897692838 | 166818255397 +0.6%         | 166819397831 +0.6%
> 
> Pan
> 
> 
> -----Original Message-----
> From: Li, Pan2 
> Sent: Monday, May 8, 2023 4:06 PM
> To: Richard Biener <rguent...@suse.de>
> Cc: Jeff Law <jeffreya...@gmail.com>; Kito Cheng <kito.ch...@gmail.com>; 
> juzhe.zh...@rivai.ai; richard.sandiford <richard.sandif...@arm.com>; 
> gcc-patches <gcc-patches@gcc.gnu.org>; palmer <pal...@dabbelt.com>; jakub 
> <ja...@redhat.com>
> Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
> 16-bit
> 
> This is the state after the bit-width patch below.
> 
> rtx_def code 16 => 8 bits.
> rtx_def mode 8 => 16 bits.
> tree_base code unchanged.
> 
> The structure layouts of rtx_def and tree_base will then look roughly as 
> shown below. As I understand it, the lower 8 bits of tree_base will be 
> inspected when 'dv' is a tree and gets cast to rtx.
> 
> tree_base             rtx_def
> code: 16              code: 8
> side_effects_flag: 1  mode: 16
> constant_flag: 1
> addressable_flag: 1
> volatile_flag: 1
> readonly_flag: 1
> asm_written_flag: 1
> nowarning_flag: 1
> visited: 1
> used_flag: 1
> nothrow_flag: 1
> static_flag: 1
> public_flag: 1
> private_flag: 1
> protected_flag: 1
> deprecated_flag: 1
> default_def_flag: 1
> 
> I also tried an approach similar to the one you mentioned (below), i.e. 
> shrinking tree_code to keep the 1:1 overlap with rtx_code, and reported the 
> memory-allocated-bytes results for it in another email.
> 
> rtx_def code 16 => 12 bits.
> rtx_def mode 8 => 12 bits.
> tree_base code 16 => 12 bits.
> 
> Pan
> 
> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Monday, May 8, 2023 3:38 PM
> To: Li, Pan2 <pan2...@intel.com>
> Cc: Jeff Law <jeffreya...@gmail.com>; Kito Cheng <kito.ch...@gmail.com>; 
> juzhe.zh...@rivai.ai; richard.sandiford <richard.sandif...@arm.com>; 
> gcc-patches <gcc-patches@gcc.gnu.org>; palmer <pal...@dabbelt.com>; jakub 
> <ja...@redhat.com>
> Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
> 16-bit
> 
> On Mon, 8 May 2023, Li, Pan2 wrote:
> 
> > return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able to 
> > fix this ICE after mode bits change.
> 
> Can you check which bits this will inspect when 'dv' is a tree after your 
> patch?  VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when 
> there was a 1:1 overlap.
> 
> I think for all cases but struct loc_exp_dep we could find a bit to record 
> whether we deal with a VALUE or a decl, but for loc_exp_dep it's going to be 
> difficult (unless we start to take bits from pointer representations).
> 
> That said, I agree with Jeff that the code is ugly, but a simplistic 
> conversion isn't what we want.
> 
> An alternative "solution" might be to also shrink tree_code when we shrink 
> rtx_code and keep the 1:1 overlap.
> 
> Richard.
> 
> > I will re-run the memory-allocated-bytes test on x86 with the changes 
> > below.
> > 
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> > tree_base code unchanged.
> > 
> > Pan
> > 
> > -----Original Message-----
> > From: Li, Pan2
> > Sent: Monday, May 8, 2023 2:42 PM
> > To: Richard Biener <rguent...@suse.de>; Jeff Law 
> > <jeffreya...@gmail.com>
> > Cc: Kito Cheng <kito.ch...@gmail.com>; juzhe.zh...@rivai.ai; 
> > richard.sandiford <richard.sandif...@arm.com>; gcc-patches 
> > <gcc-patches@gcc.gnu.org>; palmer <pal...@dabbelt.com>; jakub 
> > <ja...@redhat.com>
> > Subject: RE: [PATCH] machine_mode type size: Extend enum size from 
> > 8-bit to 16-bit
> > 
> > Oops. Actually I am patching a version as you mentioned like storage 
> > allocation. Thank you Richard, will try your suggestion and keep you posted.
> > 
> > Pan
> > 
> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Monday, May 8, 2023 2:30 PM
> > To: Jeff Law <jeffreya...@gmail.com>
> > Cc: Li, Pan2 <pan2...@intel.com>; Kito Cheng <kito.ch...@gmail.com>; 
> > juzhe.zh...@rivai.ai; richard.sandiford <richard.sandif...@arm.com>; 
> > gcc-patches <gcc-patches@gcc.gnu.org>; palmer <pal...@dabbelt.com>; 
> > jakub <ja...@redhat.com>
> > Subject: Re: [PATCH] machine_mode type size: Extend enum size from 
> > 8-bit to 16-bit
> > 
> > On Sun, 7 May 2023, Jeff Law wrote:
> > 
> > > 
> > > 
> > > On 5/6/23 19:55, Li, Pan2 wrote:
> > > > It looks like we cannot simply swap the code and mode in rtx_def, 
> > > > the code may have to be the same bits as the tree_code in tree_base.
> > > > Or we will meet ICE like below.
> > > > 
> > > > rtx_def code 16 => 8 bits.
> > > > rtx_def mode 8 => 16 bits.
> > > > 
> > > > static inline decl_or_value
> > > > dv_from_value (rtx value)
> > > > {
> > > >    decl_or_value dv;
> > > >    dv = value;
> > > >    gcc_checking_assert (dv_is_value_p (dv));  <=  ICE
> > > >    return dv;
> > > Ugh.  We really just need to fix this code.  It assumes particular 
> > > structure layouts and that's just wrong/dumb.
> > 
> > Well, it's a neat trick ... we just need to adjust it to
> > 
> > static inline bool
> > dv_is_decl_p (decl_or_value dv)
> > {
> >   return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE;
> > }
> > 
> > I think (and hope) that for the 'decl' case the bits inspected are never 
> > 'VALUE'.  Of course the above stinks from a TBAA perspective ...
> > 
> > Any "real" fix would require allocating storage for a discriminator and 
> > thus hurt the resource constrained var-tracking a lot.
> > 
> > Richard.
> > 
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, 
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 
> 36809 (AG Nuernberg)
> 

