Hi Mark,

On 1/15/21 3:45 PM, Mark Rutland wrote:
> On Fri, Jan 15, 2021 at 12:00:43PM +0000, Vincenzo Frascino wrote:
>> mte_assign_mem_tag_range() is called on production KASAN HW hot
>> paths. It makes sense to optimize it in an attempt to reduce the
>> overhead.
>>
>> Optimize mte_assign_mem_tag_range() based on the indications provided at
>> [1].
>
> ... what exactly is the optimization?
>
> I /think/ you're just trying to have it inlined, but you should mention
> that explicitly.
>
Good point, I will change it in the next version. I used "Optimize" as a
continuation of the topic in the previous thread, but you are right that it is
not immediately obvious.

>>
>> [1] https://lore.kernel.org/r/caaehk+wco+j7d1_t89dg+jjrplk3x9rsgfkxjgd0zcufjqt...@mail.gmail.com/
>>
>> Cc: Catalin Marinas <catalin.mari...@arm.com>
>> Cc: Will Deacon <w...@kernel.org>
>> Signed-off-by: Vincenzo Frascino <vincenzo.frasc...@arm.com>
>> ---
>>  arch/arm64/include/asm/mte.h | 26 +++++++++++++++++++++++++-
>>  arch/arm64/lib/mte.S         | 15 ---------------
>>  2 files changed, 25 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
>> index 1a715963d909..9730f2b07b79 100644
>> --- a/arch/arm64/include/asm/mte.h
>> +++ b/arch/arm64/include/asm/mte.h
>> @@ -49,7 +49,31 @@ long get_mte_ctrl(struct task_struct *task);
>>  int mte_ptrace_copy_tags(struct task_struct *child, long request,
>>  			 unsigned long addr, unsigned long data);
>>  
>> -void mte_assign_mem_tag_range(void *addr, size_t size);
>> +static inline void mte_assign_mem_tag_range(void *addr, size_t size)
>> +{
>> +	u64 _addr = (u64)addr;
>> +	u64 _end = _addr + size;
>> +
>> +	/*
>> +	 * This function must be invoked from an MTE enabled context.
>> +	 *
>> +	 * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
>> +	 * size must be non-zero and MTE_GRANULE_SIZE aligned.
>> +	 */
>> +	do {
>> +		/*
>> +		 * 'asm volatile' is required to prevent the compiler from
>> +		 * moving the statement outside of the loop.
>> +		 */
>> +		asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
>> +			     :
>> +			     : "r" (_addr)
>> +			     : "memory");
>> +
>> +		_addr += MTE_GRANULE_SIZE;
>> +	} while (_addr < _end);
>
> Is there any chance that this can be used for the last bytes of the
> virtual address space? This might need to change to `_addr == _end` if
> that is possible, otherwise it'll terminate early in that case.
>

Theoretically it is a possibility. I will change the condition and add a note
for that.
>> +}
>
> What does the code generation look like for this, relative to the
> assembly version?
>

The assembly looks like this:

 390:	8b000022 	add	x2, x1, x0
 394:	aa0003e1 	mov	x1, x0
 398:	d9200821 	stg	x1, [x1]
 39c:	91004021 	add	x1, x1, #0x10
 3a0:	eb01005f 	cmp	x2, x1
 3a4:	54ffffa8 	b.hi	398 <mte_set_mem_tag_range+0x48>

You can see the handcrafted one below.

> Thanks,
> Mark.
>
>> +
>>
>>  #else /* CONFIG_ARM64_MTE */
>>
>> diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
>> index 9e1a12e10053..a0a650451510 100644
>> --- a/arch/arm64/lib/mte.S
>> +++ b/arch/arm64/lib/mte.S
>> @@ -150,18 +150,3 @@ SYM_FUNC_START(mte_restore_page_tags)
>>  	ret
>>  SYM_FUNC_END(mte_restore_page_tags)
>>  
>> -/*
>> - * Assign allocation tags for a region of memory based on the pointer tag
>> - * x0 - source pointer
>> - * x1 - size
>> - *
>> - * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
>> - * size must be non-zero and MTE_GRANULE_SIZE aligned.
>> - */
>> -SYM_FUNC_START(mte_assign_mem_tag_range)
>> -1:	stg	x0, [x0]
>> -	add	x0, x0, #MTE_GRANULE_SIZE
>> -	subs	x1, x1, #MTE_GRANULE_SIZE
>> -	b.gt	1b
>> -	ret
>> -SYM_FUNC_END(mte_assign_mem_tag_range)
>> --
>> 2.30.0
>>

-- 
Regards,
Vincenzo