I tried memory profiling with valgrind --tool=memcheck --trace-children=yes for this change, targeting the INT part of SPEC 2006 with rv64gcv. Note that we only count the bytes allocated, taken from valgrind log lines like this: "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
Considering valgrind's run-to-run variance, the impact on bytes allocated appears to be limited. However, I am still running this on x86, which takes more than 30 hours per iteration...

RISC-V GCC version:

>> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Bytes allocated with -O2:
------------------------------------------------------
Benchmark      |     upstream | with this PATCH
------------------------------------------------------
400.perlbench  |  29699642875 |  29949876269  ~0.0%
401.bzip2      |   1641041659 |   1755563972  +6.95%
403.gcc        |  68447500516 |  68900883291  ~0.0%
429.mcf        |   1433156462 |   1433253373  ~0.0%
445.gobmk      |  14239225210 |  14463438465  ~0.0%
456.hmmer      |   9635955623 |   9808534948  +1.8%
458.sjeng      |   2419478204 |   2545478940  +5.4%
462.libquantum |   1686404489 |   1800884197  +6.8%
464.h264ref    |  10190413900 |  10351134161  +1.6%
471.omnetpp    |  40814627684 |  41185864529  ~0.0%
473.astar      |   3807097529 |   3928428183  +3.2%
483.xalancbmk  | 152959418167 | 154201738843  ~0.0%

Bytes allocated with -Ofast -funroll-loops:
------------------------------------------------------
Benchmark      |     upstream | with this PATCH
------------------------------------------------------
400.perlbench  |  39491184733 |  39223020267  ~0.0%
401.bzip2      |   2843871517 |   2730383463  ~0.0%
403.gcc        |  84195991898 |  83730632955  -4.0%
429.mcf        |   1481381164 |   1367309565  -7.7%
445.gobmk      |  20123943663 |  19886116394  -1.2%
456.hmmer      |  12302445139 |  12121745383  -1.5%
458.sjeng      |   3884712615 |   3755481930  -3.3%
462.libquantum |   1966619940 |   1852274342  -5.8%
464.h264ref    |  19219365552 |  19050288201  ~0.0%
471.omnetpp    |  45701008325 |  45327805079  ~0.0%
473.astar      |   4118600354 |   3995943705  -3.0%
483.xalancbmk  | 179481305182 | 178160306301  ~0.0%

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of ???
Sent: Thursday, April 13, 2023 7:23 AM
To: kito.cheng; rguenther
Cc: richard.sandiford; Jeff Law; gcc-patches; palmer; jakub
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

Yeah, like Kito said. It turns out the tuple type model in ARM SVE is the optimal solution for RVV, and we like the ARM SVE style implementation. We now see that swapping rtx_code and mode in rtx_def can keep rtx_def from exceeding 64 bits overall. But it seems there is still a problem in tree_type_common and tree_decl_common, is that right?

After several tries (removing all redundant TI/TF vector modes and FP16 vector modes), there are now 252 modes in the RISC-V port. Basically, I can keep supporting the recent RVV intrinsic features. However, we can't support more in the future, for example FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc. From the RVV side, I think extending machine mode by 1 more bit (512 modes overall) should be enough for RVV. Is it possible to make that happen in tree_type_common and tree_decl_common, Richards?

Thank you so much for all the comments.

From: Kito Cheng
Date: 2023-04-12 17:31
To: Richard Biener
CC: richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

> > The concept of fractional LMUL is the same as the concept of
> > AArch64's partial SVE vectors, so they can only access the lowest
> > part, like SVE's partial vectors.
> >
> > We want to spill/restore the exact size of those modes (1/2, 1/4,
> > 1/8), so adding dedicated modes for those partial vector modes
> > should be unavoidable IMO.
> >
> > And even if we use sub-vectors, we still need to define those partial
> > vector types.
>
> Could you use integer modes for the fractional vectors?

You mean using a scalar integer mode, like using (subreg:SI (reg:VNx4SI) 0) to represent LMUL=1/4? (Assume VNx4SI is the mode for M1.) If so, I think it might not be able to model that correctly: it looks like we are using 32 bits, but we are actually using poly_int16(1, 1) * 32 bits.

> For computation you can always appropriately limit the LEN?

RVV provides the zvl*b extensions, i.e. zvl<N>b (e.g. zvl128b or zvl256b), to guarantee that the vector length is at least N bits. But that only guarantees the minimum length, just as SVE guarantees a minimum vector length of 128 bits.
