On Mon, Dec 11, 2023 at 06:49:37PM +0000, Catalin Marinas wrote:
> On Fri, Nov 24, 2023 at 04:34:59PM +0000, Joey Gouly wrote:
> > @@ -211,11 +212,24 @@ init_new_context(struct task_struct *tsk, struct 
> > mm_struct *mm)
> >  {
> >     atomic64_set(&mm->context.id, 0);
> >     refcount_set(&mm->context.pinned, 0);
> > +
> > +   // pkey 0 is the default, so always reserve it.
> > +   mm->context.pkey_allocation_map = 0x1;
> 
> Nit: use /* */ style comments.
> 
> > @@ -151,7 +170,9 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t 
> > phys)
> >   * PTE_VALID bit set.
> >   */
> >  #define pte_access_permitted(pte, write) \
> > -   (((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && 
> > (!(write) || pte_write(pte)))
> > +   (((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && 
> > \
> > +    (!(write) || pte_write(pte)) && \
> > +    por_el0_allows_pkey(FIELD_GET(PTE_PO_IDX_MASK, pte_val(pte)), write, 
> > false))
> 
> Do not change pte_access_permitted(), just let it handle the base
> permissions. This check is about the mm tables, not some current POR_EL0
> setting of the thread.
> 
> As an example, with this change Linux may decide not to clear the MTE
> tags just because the current POR_EL0 says no-access. The thread
> subsequently changes POR_EL0 and it can read the stale tags.
> 
> I haven't checked what x86 and powerpc do here. There may be some
> implications on GUP but I'd rather ignore POE for this case.

There are tests in tools/testing/selftests/mm/protection_keys.c
test_kernel_gup_of_access_disabled_region etc, that explicitly check GUP is
affected by pkeys. So if we didn't modify pte_access_permitted() it would fail
the test and work differently than the other architectures.  For
MTE/__sync_cache_and_tags, I think could add a pte_access_permitted_no_poe().

> 
> >  #define pmd_access_permitted(pmd, write) \
> >     (pte_access_permitted(pmd_pte(pmd), (write)))
> >  #define pud_access_permitted(pud, write) \
> > diff --git a/arch/arm64/include/asm/pkeys.h b/arch/arm64/include/asm/pkeys.h
> > index 5761fb48fd53..a80c654da93d 100644
> > --- a/arch/arm64/include/asm/pkeys.h
> > +++ b/arch/arm64/include/asm/pkeys.h
> [...]
> >  static inline int execute_only_pkey(struct mm_struct *mm)
> >  {
> > +   // Execute-only mappings are handled by EPAN/FEAT_PAN3.
> > +   WARN_ON_ONCE(!cpus_have_final_cap(ARM64_HAS_EPAN));
> > +
> >     return -1;
> >  }
> 
> Why the WARN_ON_ONCE() here? It will trigger if the user asks for
> PROT_EXEC and I can't see any subsequent patch that changes the core
> code not to call it. I think we need some arch_has_execute_only_pkey()
> to avoid going on this path. Our arch would support exec-only with any
> pkey.

This would warn if you somehow had a CPU with FEAT_POE but not FEAT_PAN3. So I
don't really need the WARN() here, if the CPU supports FEAT_POE it should also
support FEAT_PAN3. Arm v8.7 made FEAT_PAN3 mandatory. 

> 
> > @@ -1490,6 +1491,38 @@ void ptep_modify_prot_commit(struct vm_area_struct 
> > *vma, unsigned long addr, pte
> >  #ifdef CONFIG_ARCH_HAS_PKEYS
> >  int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned 
> > long init_val)
> >  {
> > -   return -ENOSPC;
> > +   u64 new_por = POE_RXW;
> > +   u64 old_por;
> > +   u64 pkey_shift;
> > +
> > +   if (!arch_pkeys_enabled())
> > +           return -ENOSPC;
> > +
> > +   /*
> > +    * This code should only be called with valid 'pkey'
> > +    * values originating from in-kernel users.  Complain
> > +    * if a bad value is observed.
> > +    */
> > +   if (WARN_ON_ONCE(pkey >= arch_max_pkey()))
> > +           return -EINVAL;
> > +
> > +   /* Set the bits we need in POR:  */
> > +   if (init_val & PKEY_DISABLE_ACCESS)
> > +           new_por = POE_X;
> 
> Does PKEY_DISABLE_ACCESS mean allow execute? Or does x86 not have a way
> to disable execution?

As I understand it X86 can either disable Read and Write or disable Write. No
way to disable execution.

The man page says:
        PKEY_DISABLE_ACCESS
              Disable all data access to memory covered by the returned
              protection key.

> 
> > +   else if (init_val & PKEY_DISABLE_WRITE)
> > +           new_por = POE_RX;
> > +
> > +   /* Shift the bits in to the correct place in POR for pkey: */
> > +   pkey_shift = pkey * POR_BITS_PER_PKEY;
> > +   new_por <<= pkey_shift;
> > +
> > +   /* Get old POR and mask off any old bits in place: */
> > +   old_por = read_sysreg_s(SYS_POR_EL0);
> > +   old_por &= ~(POE_MASK << pkey_shift);
> > +
> > +   /* Write old part along with new part: */
> > +   write_sysreg_s(old_por | new_por, SYS_POR_EL0);
> > +
> > +   return 0;
> >  }
> >  #endif

Thanks,
Joey

Reply via email to