I tried memory profiling with valgrind --tool=memcheck --trace-children=yes
for this change, targeting the SPEC 2006 INT benchmarks with rv64gcv. Note that
we only count the bytes allocated, as reported in valgrind log lines like
"==2832896==   total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
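With --trace-children=yes every traced process prints its own summary line, so
a benchmark's figure has to be gathered from several such lines. A minimal
Python sketch of that extraction, assuming the per-process "bytes allocated"
values are simply summed (the function name is hypothetical, not the script
actually used for these numbers):

```python
import re

# Matches memcheck's per-process summary line, e.g.
# "==2832896==   total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated"
HEAP_USAGE = re.compile(
    r"==\d+==\s+total heap usage: [\d,]+ allocs, [\d,]+ frees, "
    r"([\d,]+) bytes allocated"
)


def total_bytes_allocated(log_text: str) -> int:
    """Sum 'bytes allocated' over every process summary found in the log.

    Assumption: the per-process totals are added together to get one
    figure per benchmark.
    """
    total = 0
    for match in HEAP_USAGE.finditer(log_text):
        # Strip the thousands separators valgrind prints.
        total += int(match.group(1).replace(",", ""))
    return total


if __name__ == "__main__":
    sample = ("==2832896==   total heap usage: 208 allocs, 165 frees, "
              "123,204 bytes allocated\n")
    print(total_bytes_allocated(sample))  # 123204
```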

Allowing for some variance in the valgrind measurements, the impact on bytes
allocated appears to be limited. However, I am still running this for x86,
which takes more than 30 hours per iteration...

RISC-V GCC Version:
>> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Bytes allocated with -O2:
------------------------------------------------------------
Benchmark        | upstream       | with this PATCH | change
------------------------------------------------------------
400.perlbench    | 29699642875    | 29949876269     | ~0.0%
401.bzip2        | 1641041659     | 1755563972      | +7.0%
403.gcc          | 68447500516    | 68900883291     | ~0.0%
429.mcf          | 1433156462     | 1433253373      | ~0.0%
445.gobmk        | 14239225210    | 14463438465     | ~0.0%
456.hmmer        | 9635955623     | 9808534948      | +1.8%
458.sjeng        | 2419478204     | 2545478940      | +5.2%
462.libquantum   | 1686404489     | 1800884197      | +6.8%
464.h264ref      | 10190413900    | 10351134161     | +1.6%
471.omnetpp      | 40814627684    | 41185864529     | ~0.0%
473.astar        | 3807097529     | 3928428183      | +3.2%
483.xalancbmk    | 152959418167   | 154201738843    | ~0.0%

Bytes allocated with -Ofast -funroll-loops:
------------------------------------------------------------
Benchmark        | upstream       | with this PATCH | change
------------------------------------------------------------
400.perlbench    | 39491184733    | 39223020267     | ~0.0%
401.bzip2        | 2843871517     | 2730383463      | -4.0%
403.gcc          | 84195991898    | 83730632955     | ~0.0%
429.mcf          | 1481381164     | 1367309565      | -7.7%
445.gobmk        | 20123943663    | 19886116394     | -1.2%
456.hmmer        | 12302445139    | 12121745383     | -1.5%
458.sjeng        | 3884712615     | 3755481930      | -3.3%
462.libquantum   | 1966619940     | 1852274342      | -5.8%
464.h264ref      | 19219365552    | 19050288201     | ~0.0%
471.omnetpp      | 45701008325    | 45327805079     | ~0.0%
473.astar        | 4118600354     | 3995943705      | -3.0%
483.xalancbmk    | 179481305182   | 178160306301    | ~0.0%
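The percentage column can be cross-checked by recomputing the relative change
from the raw byte counts. A minimal Python sketch (the helper name is
hypothetical, not part of the original measurement scripts), using a few rows
from the tables:

```python
def relative_change_pct(upstream: int, patched: int) -> float:
    """Percentage change in bytes allocated, relative to upstream."""
    return (patched - upstream) / upstream * 100.0


# A few (upstream, with this patch) rows copied from the tables.
ROWS = {
    "401.bzip2 (-O2)": (1641041659, 1755563972),
    "429.mcf (-Ofast)": (1481381164, 1367309565),
    "403.gcc (-Ofast)": (84195991898, 83730632955),
}

if __name__ == "__main__":
    for name, (upstream, patched) in ROWS.items():
        # Prints +6.98%, -7.70% and -0.55% respectively.
        print(f"{name:20} {relative_change_pct(upstream, patched):+.2f}%")
```

Dividing by the upstream value keeps the sign convention of the tables:
positive means the patch allocates more bytes.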

Pan


-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel....@gcc.gnu.org> On Behalf
Of juzhe.zh...@rivai.ai
Sent: Thursday, April 13, 2023 7:23 AM
To: kito.cheng <kito.ch...@gmail.com>; rguenther <rguent...@suse.de>
Cc: richard.sandiford <richard.sandif...@arm.com>; Jeff Law 
<jeffreya...@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer 
<pal...@dabbelt.com>; jakub <ja...@redhat.com>
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

Yes, as Kito said, it turns out that the ARM SVE tuple type model is the
optimal solution for RVV, and we like the ARM SVE style implementation.

We now see that swapping rtx_code and mode in rtx_def keeps rtx_def within
64 bits overall. However, it seems there is still a problem in
tree_type_common and tree_decl_common; is that right?
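The 64-bit constraint can be read as a simple bit budget. The sketch below
assumes, for illustration only, a header made of a 16-bit rtx_code field, an
8-bit machine_mode field, eight 1-bit flags and a 32-bit union (check these
widths against rtl.h in your own tree), and reads "swapping" as exchanging the
widths of the code and mode fields, which presumes all rtx codes still fit in
8 bits:

```python
# Assumed (illustrative) widths of the fixed rtx_def header fields, in bits.
FLAG_BITS = 8    # eight 1-bit flags (jump, call, unchanging, ...), assumed
UNION_BITS = 32  # the 32-bit union that follows the flags, assumed
LIMIT_BITS = 64


def header_bits(code_bits: int, mode_bits: int) -> int:
    """Total header width for the given rtx_code / machine_mode widths."""
    return code_bits + mode_bits + FLAG_BITS + UNION_BITS


for label, code_bits, mode_bits in [
    ("assumed baseline: code 16, mode 8", 16, 8),
    ("widen mode only: code 16, mode 16", 16, 16),
    ("swap widths: code 8, mode 16", 8, 16),
]:
    total = header_bits(code_bits, mode_bits)
    verdict = "fits in" if total <= LIMIT_BITS else "exceeds"
    print(f"{label}: {total} bits ({verdict} {LIMIT_BITS})")
```

Under these assumptions, widening mode alone costs 8 extra bits, while
exchanging the two widths keeps the total unchanged.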

After several attempts (removing all redundant TI/TF vector modes and the FP16
vector modes), there are now 252 modes in the RISC-V port. Basically, I can
keep supporting the recent new RVV intrinsic features. However, we can't
support more in the future, for example FP16 vectors, BF16 vectors, matrix
modes, VLS modes, etc.

From the RVV side, I think extending machine_mode by 1 more bit (512 modes
overall) should be enough for RVV. Is it possible to make that happen in
tree_type_common and tree_decl_common, Richards?
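The headroom arithmetic behind this: an 8-bit field can encode at most
2^8 = 256 distinct modes, so 252 modes leave only 4 spare values, while one
more bit doubles the capacity to 512. A small Python sketch, assuming the 252
modes are all the values the field must encode (helper names are
hypothetical):

```python
def bits_needed(num_modes: int) -> int:
    """Smallest bit-field width that can encode values 0 .. num_modes - 1."""
    return max(1, (num_modes - 1).bit_length())


def capacity(bits: int) -> int:
    """Number of distinct enum values a bit-field of this width can hold."""
    return 1 << bits


print(bits_needed(252), capacity(8))  # 8 256
print(capacity(9))                    # 512
print(bits_needed(257))               # 9
```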

Thank you so much for all comments.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-04-12 17:31
To: Richard Biener
CC: juzhe.zh...@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; palmer; 
jakub
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
> > The concept of fractional LMUL is the same as the concept of 
> > AArch64's partial SVE vectors, so they can only access the lowest 
> > part, like SVE's partial vector.
> >
> > We want to spill/restore the exact size of those modes (1/2, 1/4, 
> > 1/8), so adding dedicated modes for those partial vector modes 
> > should be unavoidable IMO.
> >
> > And even if we use sub-vector, we still need to define those partial 
> > vector types.
>
> Could you use integer modes for the fractional vectors?
 
You mean using a scalar integer mode, e.g. (subreg:SI (reg:VNx4SI) 0), to
represent LMUL=1/4? (Assuming VNx4SI is the mode for M1.)
 
If so, I don't think that models it correctly: it looks as if we are using
32 bits, but we are actually using poly_int16(1, 1) * 32 bits.
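To see why a fixed 32-bit SImode does not capture that size, one can model a
two-coefficient poly_int as GCC's poly_int documentation describes it: a value
c0 + c1 * x, where x is a nonnegative runtime parameter. A minimal Python
sketch; the class is a hypothetical illustration, not GCC's poly_int API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolyInt:
    """Value c0 + c1 * x, where x is an unknown runtime value >= 0."""
    c0: int
    c1: int

    def scale(self, factor: int) -> "PolyInt":
        return PolyInt(self.c0 * factor, self.c1 * factor)

    def evaluate(self, x: int) -> int:
        return self.c0 + self.c1 * x

    def is_constant(self) -> bool:
        return self.c1 == 0


# Size of the LMUL=1/4 value from the discussion: poly_int16(1, 1) * 32 bits.
frac_bits = PolyInt(1, 1).scale(32)
si_bits = PolyInt(32, 0)  # SImode: always exactly 32 bits

for x in (0, 1, 3):
    print(x, frac_bits.evaluate(x), si_bits.evaluate(x))
# 0 32 32
# 1 64 32
# 3 128 32
```

The two sizes agree only at x = 0, i.e. at the minimum vector length.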
 
> For computation you can always appropriately limit the LEN?
 
RVV provides the zvl*b extensions, zvl<N>b (e.g. zvl128b or zvl256b), which
guarantee that the vector length is at least N bits, but they only guarantee
the minimum length, just as SVE guarantees a minimum vector length of 128 bits.
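Because zvl<N>b only bounds VLEN from below, the width of a fractional-LMUL
register group is still not a compile-time constant. A small Python sketch of
that arithmetic, assuming VLEN is a power of two no larger than 2^16 (as the
RVV specification requires) and that a register group covers VLEN * LMUL bits
(function names are hypothetical):

```python
from fractions import Fraction

MAX_VLEN = 1 << 16  # RVV caps VLEN at 2^16 bits


def possible_vlens(zvl_min: int) -> list[int]:
    """All power-of-two VLEN values still allowed once zvl<zvl_min>b holds."""
    vlen, result = zvl_min, []
    while vlen <= MAX_VLEN:
        result.append(vlen)
        vlen *= 2
    return result


def group_bits(vlen: int, lmul: Fraction) -> int:
    """Bits covered by one register group at a (possibly fractional) LMUL."""
    return int(vlen * lmul)


quarter = Fraction(1, 4)
sizes = [group_bits(v, quarter) for v in possible_vlens(128)]
print(sizes[:4], "...", sizes[-1])  # [32, 64, 128, 256] ... 16384
```

Even with zvl128b, an LMUL=1/4 value may be anywhere from 32 to 16384 bits
wide, which is the range a single fixed-size integer mode cannot describe.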
 
