On 5/7/24 11:38 PM, Christoph Müllner wrote:
The Zicboz extension offers the cbo.zero instruction, which can be used
to zero a memory region corresponding to a cache block.
The Zic64b extension defines the cache block size to be 64 bytes.
If both extensions are available, it is possible to use cbo.zero
to clear memory, if the alignment and size constraints are met.
This patch implements this.

gcc/ChangeLog:

        * config/riscv/riscv-protos.h (riscv_expand_block_clear): New prototype.
        * config/riscv/riscv-string.cc (riscv_expand_block_clear_zicboz_zic64b):
        New function to expand a block-clear with cbo.zero.
        (riscv_expand_block_clear): New RISC-V block-clear expansion function.
        * config/riscv/riscv.md (setmem<mode>): New setmem expansion.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/cmo-zicboz-zic64-1.c: New test.
Depending on the underlying uarch details, cbo.zero may not be nearly as useful as it might first appear. There can be multiple uarch details that come into play. We've done a fair amount of measurement internally in this space, and while cbo.zero is a win, it's not a huge win. Point being, we may need to come back and make this part of the tuning structure so uarchs can adjust.

--


I know in the cbo memset implementation VRULL provided to Ventana you used the trick of allowing overlapping stores to avoid the alignment requirements, i.e. we issue a series of "sd" instructions to ensure we cross the alignment barrier, then a series of cbo.zero instructions for the cache lines (possibly overlapping the locations stored by those "sd" instructions), then handle the residuals, which may overlap the last cbo.zero instructions.
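
To make that sequence concrete, here is a rough C model of the overlapping scheme. It is only a sketch under stated assumptions: cbo_zero_model and sd_zero are made-up stand-ins (plain memset/memcpy) for the cbo.zero and "sd" instructions, clear_overlapping is a hypothetical name, and none of this is the VRULL implementation or what the patch emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_BLOCK 64 /* block size guaranteed by Zic64b */

/* Hypothetical stand-in for cbo.zero: zeroes one whole, aligned block. */
static void cbo_zero_model(unsigned char *block)
{
    assert(((uintptr_t)block % CACHE_BLOCK) == 0);
    memset(block, 0, CACHE_BLOCK);
}

/* Hypothetical stand-in for an 8-byte "sd zero" that tolerates any alignment. */
static void sd_zero(unsigned char *p)
{
    uint64_t zero = 0;
    memcpy(p, &zero, sizeof zero);
}

/* Zero n bytes at dst without requiring dst or n to be block-aligned. */
static void clear_overlapping(unsigned char *dst, size_t n)
{
    if (n < 8) {                 /* too small for even one 8-byte store */
        memset(dst, 0, n);
        return;
    }

    /* Bytes from dst up to the first cache-block boundary. */
    size_t head = (size_t)(-(uintptr_t)dst & (CACHE_BLOCK - 1));
    size_t body_end;

    if (head + CACHE_BLOCK > n) {
        /* No full block fits: use stores only. */
        head = n;
        body_end = n;
    } else {
        body_end = head + ((n - head) & ~(size_t)(CACHE_BLOCK - 1));
    }

    /* Head: 8-byte stores until the boundary is crossed. A store may spill
       into the first block (rewritten by cbo.zero below); the last one is
       pulled back to n - 8 if it would run past the end. */
    for (size_t i = 0; i < head; i += 8)
        sd_zero(dst + (i + 8 <= n ? i : n - 8));

    /* Body: one cbo.zero per full, aligned cache block. */
    for (size_t i = head; i < body_end; i += CACHE_BLOCK)
        cbo_zero_model(dst + i);

    /* Residual: 8-byte stores; the final one is placed at n - 8 and may
       overlap bytes already zeroed by the last cbo.zero. */
    for (size_t i = body_end; i < n; i += 8)
        sd_zero(dst + (i + 8 <= n ? i : n - 8));
}
```

The point of the overlap is that neither the head nor the residual needs a byte-granular loop: every store stays inside [dst, dst + n), and any bytes written twice are simply zeroed twice.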

I don't think you necessarily have to do that for this patch, but I suspect that a similar approach would make this apply much more often in practice.

So, OK for the trunk and consider the unaligned cases as potential follow-up enhancements.

Thanks,
Jeff
