Re: [PATCH v5 0/8] KVM: arm64: permit MAP_SHARED mappings with MTE enabled

2022-11-04 Thread Peter Collingbourne
On Fri, Nov 4, 2022 at 9:23 AM Marc Zyngier  wrote:
>
> On Fri, 04 Nov 2022 01:10:33 +0000,
> Peter Collingbourne  wrote:
> >
> > Hi,
> >
> > This patch series allows VMMs to use shared mappings in MTE enabled
> > guests. The first five patches were taken from Catalin's tree [1] which
> > addressed some review feedback from when they were previously sent out
> > as v3 of this series. The first patch from Catalin's tree makes room
> > for an additional PG_arch_3 flag by making the newer PG_arch_* flags
> > arch-dependent. The next four patches are based on a series that
> > Catalin sent out prior to v3, whose cover letter [2] I quote from below:
> >
> > > This series aims to fix the races between initialising the tags on a
> > > page and setting the PG_mte_tagged flag. Currently the flag is set
> > > either before or after that tag initialisation and this can lead to CoW
> > > copying stale tags. The first patch moves the flag setting after the
> > > tags have been initialised, solving the CoW issue. However, concurrent
> > > mprotect() on a shared mapping may (very rarely) lead to valid tags
> > > being zeroed.
> > >
> > > The second skips the sanitise_mte_tags() call in kvm_set_spte_gfn(),
> > > deferring it to user_mem_abort(). The outcome is that
> > > sanitise_mte_tags() can be simplified to skip the pfn_to_online_page()
> > > check and only rely on the VM_MTE_ALLOWED vma flag that can be checked
> > > in user_mem_abort().
> > >
> > > The third and fourth patches use PG_arch_3 as a lock for page tagging,
> > > based on Peter Collingbourne's idea of a two-bit lock.
> > >
> > > I think the first patch can be queued but the rest needs some in depth
> > > review and test. With this series (if correct) we could allow MAP_SHARED
> > > on KVM guest memory but this is to be discussed separately as there are
> > > some KVM ABI implications.
> >
> > In this v5 I rebased Catalin's tree onto -next again. Please double check
>
> Please don't use -next as a base. In-flight series should be based
> on a *stable* tag, either 6.0 or one of the early -RCs. If there is a
> known conflict with -next, do mention it in the cover letter and
> provide a resolution.

Okay, I will keep that in mind.

> > my rebase, which resolved the conflict with commit a8e5e5146ad0 ("arm64:
> > mte: Avoid setting PG_mte_tagged if no tags cleared or restored").
>
> This commit seems part of -rc1, so I guess the patches directly apply
> on top of that tag?

Yes, sorry, this also applies cleanly to -rc1.

> > I now have Reviewed-by for all patches except for the last one, which adds
> > the documentation. Thanks for the reviews so far, and please take a look!
>
> I'd really like the MM folks (list now cc'd) to look at the relevant
> patches (1 and 5) and ack them before I take this.

Okay, here are the lore links for the convenience of the MM folks:
https://lore.kernel.org/all/20221104011041.290951-2-...@google.com/
https://lore.kernel.org/all/20221104011041.290951-6-...@google.com/

Peter


[PATCH v5 7/8] KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled

2022-11-03 Thread Peter Collingbourne
Certain VMMs such as crosvm have features (e.g. sandboxing) that depend
on being able to map guest memory as MAP_SHARED. The current restriction
on sharing MAP_SHARED pages with the guest is preventing the use of
those features with MTE. Now that the races between tasks concurrently
clearing tags on the same page have been fixed, remove this restriction.

Note that this is a relaxation of the ABI.

Signed-off-by: Peter Collingbourne 
Reviewed-by: Catalin Marinas 
Reviewed-by: Steven Price 
---
 arch/arm64/kvm/mmu.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 9ff9a271cf01..b9402d8b5a90 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1110,14 +1110,6 @@ static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 
 static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 {
-   /*
-* VM_SHARED mappings are not allowed with MTE to avoid races
-* when updating the PG_mte_tagged page flag, see
-* sanitise_mte_tags for more details.
-*/
-   if (vma->vm_flags & VM_SHARED)
-   return false;
-
return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
-- 
2.38.1.431.g37b22c650d-goog



[PATCH v5 6/8] KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled

2022-11-03 Thread Peter Collingbourne
Previously we allowed creating a memslot containing a private mapping that
was not VM_MTE_ALLOWED, but would later reject KVM_RUN with -EFAULT. Now
we reject the memory region at memslot creation time.

Since this is a minor tweak to the ABI (a VMM that created one of
these memslots would fail later anyway), no VMM to my knowledge has
MTE support yet, and the hardware with the necessary features is not
generally available, we can probably make this ABI change at this point.
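
To make the behavioural difference concrete, here is a minimal, hypothetical
VMM snippet (mine, not part of the patch). It assumes KVM_CAP_ARM_MTE has
already been enabled on the VM fd and that the backing mapping is one the
kernel does not mark VM_MTE_ALLOWED (for example a MAP_SHARED mapping of a
regular disk file); the guest physical base address is a placeholder.

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Register guest RAM with KVM.  With this patch, a backing mapping that is
 * not VM_MTE_ALLOWED is rejected here with -EINVAL instead of surfacing
 * later as -EFAULT from KVM_RUN.
 */
static int add_guest_ram(int vm_fd, void *host_mem, unsigned long size)
{
	struct kvm_userspace_memory_region region = {
		.slot            = 0,
		.guest_phys_addr = 0x80000000,        /* placeholder layout */
		.memory_size     = size,
		.userspace_addr  = (unsigned long)host_mem,
	};

	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
		perror("KVM_SET_USER_MEMORY_REGION"); /* EINVAL for bad VMAs */
		return -1;
	}
	return 0;
}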

Signed-off-by: Peter Collingbourne 
Reviewed-by: Catalin Marinas 
Reviewed-by: Steven Price 
---
 arch/arm64/kvm/mmu.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index fa2c85b93149..9ff9a271cf01 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1108,6 +1108,19 @@ static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
}
 }
 
+static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
+{
+   /*
+* VM_SHARED mappings are not allowed with MTE to avoid races
+* when updating the PG_mte_tagged page flag, see
+* sanitise_mte_tags for more details.
+*/
+   if (vma->vm_flags & VM_SHARED)
+   return false;
+
+   return vma->vm_flags & VM_MTE_ALLOWED;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  struct kvm_memory_slot *memslot, unsigned long hva,
  unsigned long fault_status)
@@ -1284,9 +1297,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}
 
if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
-   /* Check the VMM hasn't introduced a new VM_SHARED VMA */
-   if ((vma->vm_flags & VM_MTE_ALLOWED) &&
-   !(vma->vm_flags & VM_SHARED)) {
+   /* Check the VMM hasn't introduced a new disallowed VMA */
+   if (kvm_vma_mte_allowed(vma)) {
sanitise_mte_tags(kvm, pfn, vma_pagesize);
} else {
ret = -EFAULT;
@@ -1730,12 +1742,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
if (!vma)
break;
 
-   /*
-* VM_SHARED mappings are not allowed with MTE to avoid races
-* when updating the PG_mte_tagged page flag, see
-* sanitise_mte_tags for more details.
-*/
-   if (kvm_has_mte(kvm) && vma->vm_flags & VM_SHARED) {
+   if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
ret = -EINVAL;
break;
}
-- 
2.38.1.431.g37b22c650d-goog



[PATCH v5 8/8] Documentation: document the ABI changes for KVM_CAP_ARM_MTE

2022-11-03 Thread Peter Collingbourne
Document both the restriction on VM_MTE_ALLOWED mappings and
the relaxation for shared mappings.
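
As a rough illustration of what the updated text permits (my sketch, not from
the patch), a VMM could back an MTE-enabled guest with a memfd-based
MAP_SHARED mapping, which is a RAM-based file mapping in the documented
sense; the size below is arbitrary.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	unsigned long size = 1UL << 30;           /* 1 GiB, arbitrary */
	int fd = memfd_create("guest-ram", MFD_CLOEXEC);

	if (fd < 0 || ftruncate(fd, size) < 0)
		return 1;

	/* tmpfs-backed MAP_SHARED mapping: acceptable for KVM_CAP_ARM_MTE */
	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED)
		return 1;

	printf("guest RAM mapped at %p\n", mem);  /* hand off to memslot setup */
	return 0;
}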

Signed-off-by: Peter Collingbourne 
Acked-by: Catalin Marinas 
---
 Documentation/virt/kvm/api.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eee9f857a986..b55f80dadcfe 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7385,8 +7385,9 @@ hibernation of the host; however the VMM needs to manually save/restore the
 tags as appropriate if the VM is migrated.
 
 When this capability is enabled all memory in memslots must be mapped as
-not-shareable (no MAP_SHARED), attempts to create a memslot with a
-MAP_SHARED mmap will result in an -EINVAL return.
+``MAP_ANONYMOUS`` or with a RAM-based file mapping (``tmpfs``, ``memfd``),
+attempts to create a memslot with an invalid mmap will result in an
+-EINVAL return.
 
 When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
 perform a bulk copy of tags to/from the guest.
-- 
2.38.1.431.g37b22c650d-goog



[PATCH v5 5/8] arm64: mte: Lock a page for MTE tag initialisation

2022-11-03 Thread Peter Collingbourne
From: Catalin Marinas 

Initialising the tags and setting PG_mte_tagged flag for a page can race
between multiple set_pte_at() on shared pages or setting the stage 2 pte
via user_mem_abort(). Introduce a new PG_mte_lock flag as PG_arch_3 and
set it before attempting page initialisation. Given that PG_mte_tagged
is never cleared for a page, consider setting this flag to mean page
unlocked and wait on this bit with acquire semantics if the page is
locked:

- try_page_mte_tagging() - lock the page for tagging, return true if it
  can be tagged, false if already tagged. No acquire semantics if it
  returns true (PG_mte_tagged not set) as there is no serialisation with
  a previous set_page_mte_tagged().

- set_page_mte_tagged() - set PG_mte_tagged with release semantics.

The two-bit locking is based on Peter Collingbourne's idea.
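
As an illustration of the intended calling pattern (a sketch mirroring the
callers this patch converts, e.g. in arch/arm64/mm/fault.c and copypage.c;
the helper name here is hypothetical):

/* Exactly one caller wins try_page_mte_tagging() and initialises the tags;
 * the losers wait inside try_page_mte_tagging() until PG_mte_tagged is
 * observed with acquire semantics, so the tags are valid on return. */
static void ensure_page_tags_initialised(struct page *page)
{
	if (try_page_mte_tagging(page)) {
		mte_clear_page_tags(page_address(page)); /* or restore from swap */
		set_page_mte_tagged(page);               /* release: flag after tags */
	}
	/* PG_mte_tagged is now set and the tags may safely be read. */
}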

Signed-off-by: Catalin Marinas 
Signed-off-by: Peter Collingbourne 
Reviewed-by: Steven Price 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Peter Collingbourne 
---
 arch/arm64/include/asm/mte.h | 35 +++-
 arch/arm64/include/asm/pgtable.h |  4 ++--
 arch/arm64/kernel/cpufeature.c   |  2 +-
 arch/arm64/kernel/mte.c  | 12 +++
 arch/arm64/kvm/guest.c   | 16 +--
 arch/arm64/kvm/mmu.c |  2 +-
 arch/arm64/mm/copypage.c |  2 ++
 arch/arm64/mm/fault.c|  2 ++
 arch/arm64/mm/mteswap.c  | 14 +
 9 files changed, 60 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 3f8199ba265a..20dd06d70af5 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -25,7 +25,7 @@ unsigned long mte_copy_tags_to_user(void __user *to, void *from,
unsigned long n);
 int mte_save_tags(struct page *page);
 void mte_save_page_tags(const void *page_addr, void *tag_storage);
-bool mte_restore_tags(swp_entry_t entry, struct page *page);
+void mte_restore_tags(swp_entry_t entry, struct page *page);
 void mte_restore_page_tags(void *page_addr, const void *tag_storage);
 void mte_invalidate_tags(int type, pgoff_t offset);
 void mte_invalidate_tags_area(int type);
@@ -36,6 +36,8 @@ void mte_free_tag_storage(char *storage);
 
 /* track which pages have valid allocation tags */
 #define PG_mte_tagged  PG_arch_2
+/* simple lock to avoid multiple threads tagging the same page */
#define PG_mte_lock    PG_arch_3
 
 static inline void set_page_mte_tagged(struct page *page)
 {
@@ -60,6 +62,33 @@ static inline bool page_mte_tagged(struct page *page)
return ret;
 }
 
+/*
+ * Lock the page for tagging and return 'true' if the page can be tagged,
+ * 'false' if already tagged. PG_mte_tagged is never cleared and therefore the
+ * locking only happens once for page initialisation.
+ *
+ * The page MTE lock state:
+ *
+ *   Locked:   PG_mte_lock && !PG_mte_tagged
+ *   Unlocked: !PG_mte_lock || PG_mte_tagged
+ *
+ * Acquire semantics only if the page is tagged (returning 'false').
+ */
+static inline bool try_page_mte_tagging(struct page *page)
+{
+   if (!test_and_set_bit(PG_mte_lock, &page->flags))
+   return true;
+
+   /*
+* The tags are either being initialised or may have been initialised
+* already. Check if the PG_mte_tagged flag has been set or wait
+* otherwise.
+*/
+   smp_cond_load_acquire(&page->flags, VAL & (1UL << PG_mte_tagged));
+
+   return false;
+}
+
 void mte_zero_clear_page_tags(void *addr);
 void mte_sync_tags(pte_t old_pte, pte_t pte);
 void mte_copy_page_tags(void *kto, const void *kfrom);
@@ -86,6 +115,10 @@ static inline bool page_mte_tagged(struct page *page)
 {
return false;
 }
+static inline bool try_page_mte_tagging(struct page *page)
+{
+   return false;
+}
 static inline void mte_zero_clear_page_tags(void *addr)
 {
 }
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c6a2d8891d2a..c99fc9aec373 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1047,8 +1047,8 @@ static inline void arch_swap_invalidate_area(int type)
 #define __HAVE_ARCH_SWAP_RESTORE
 static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 {
-   if (system_supports_mte() && mte_restore_tags(entry, &folio->page))
-   set_page_mte_tagged(&folio->page);
+   if (system_supports_mte())
+   mte_restore_tags(entry, >page);
 }
 
 #endif /* CONFIG_ARM64_MTE */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index df11cfe61fcb..afb4ffd745c3 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2050,7 +2050,7 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
 * Clear the tags in the zero page. This needs to be done via the
 * linear map which has the Tagged attribute.
 */

[PATCH v5 3/8] KVM: arm64: Simplify the sanitise_mte_tags() logic

2022-11-03 Thread Peter Collingbourne
From: Catalin Marinas 

Currently sanitise_mte_tags() checks if it's an online page before
attempting to sanitise the tags. Such detection should be done in the
caller via the VM_MTE_ALLOWED vma flag. Since kvm_set_spte_gfn() does
not have the vma, leave the page unmapped if not already tagged. Tag
initialisation will be done on a subsequent access fault in
user_mem_abort().

Signed-off-by: Catalin Marinas 
[p...@google.com: fix the page initializer]
Signed-off-by: Peter Collingbourne 
Reviewed-by: Steven Price 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Peter Collingbourne 
---
 arch/arm64/kvm/mmu.c | 40 +++-
 1 file changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2c3759f1f2c5..e81bfb730629 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1091,23 +1091,14 @@ static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
  * - mmap_lock protects between a VM faulting a page in and the VMM performing
  *   an mprotect() to add VM_MTE
  */
-static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
-unsigned long size)
+static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
+ unsigned long size)
 {
unsigned long i, nr_pages = size >> PAGE_SHIFT;
-   struct page *page;
+   struct page *page = pfn_to_page(pfn);
 
if (!kvm_has_mte(kvm))
-   return 0;
-
-   /*
-* pfn_to_online_page() is used to reject ZONE_DEVICE pages
-* that may not support tags.
-*/
-   page = pfn_to_online_page(pfn);
-
-   if (!page)
-   return -EFAULT;
+   return;
 
for (i = 0; i < nr_pages; i++, page++) {
if (!page_mte_tagged(page)) {
@@ -1115,8 +1106,6 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
set_page_mte_tagged(page);
}
}
-
-   return 0;
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
@@ -1127,7 +1116,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
bool write_fault, writable, force_pte = false;
bool exec_fault;
bool device = false;
-   bool shared;
unsigned long mmu_seq;
struct kvm *kvm = vcpu->kvm;
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
@@ -1177,8 +1165,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
vma_shift = get_vma_page_shift(vma, hva);
}
 
-   shared = (vma->vm_flags & VM_SHARED);
-
switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
case PUD_SHIFT:
@@ -1299,12 +1285,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
/* Check the VMM hasn't introduced a new VM_SHARED VMA */
-   if (!shared)
-   ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
-   else
+   if ((vma->vm_flags & VM_MTE_ALLOWED) &&
+   !(vma->vm_flags & VM_SHARED)) {
+   sanitise_mte_tags(kvm, pfn, vma_pagesize);
+   } else {
ret = -EFAULT;
-   if (ret)
goto out_unlock;
+   }
}
 
if (writable)
@@ -1526,15 +1513,18 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
kvm_pfn_t pfn = pte_pfn(range->pte);
-   int ret;
 
if (!kvm->arch.mmu.pgt)
return false;
 
WARN_ON(range->end - range->start != 1);
 
-   ret = sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
-   if (ret)
+   /*
+* If the page isn't tagged, defer to user_mem_abort() for sanitising
+* the MTE tags. The S2 pte should have been unmapped by
+* mmu_notifier_invalidate_range_end().
+*/
+   if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
return false;
 
/*
-- 
2.38.1.431.g37b22c650d-goog



[PATCH v5 2/8] arm64: mte: Fix/clarify the PG_mte_tagged semantics

2022-11-03 Thread Peter Collingbourne
From: Catalin Marinas 

Currently the PG_mte_tagged page flag mostly means the page contains
valid tags and it should be set after the tags have been cleared or
restored. However, in mte_sync_tags() it is set before setting the tags
to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for
shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on
write in another thread can cause the new page to have stale tags.
Similarly, tag reading via ptrace() can read stale tags if the
PG_mte_tagged flag is set before actually clearing/restoring the tags.

Fix the PG_mte_tagged semantics so that it is only set after the tags
have been cleared or restored. This is safe for swap restoring into a
MAP_SHARED or CoW page since the core code takes the page lock. Add two
functions to test and set the PG_mte_tagged flag with acquire and
release semantics. The downside is that concurrent mprotect(PROT_MTE) on
a MAP_SHARED page may cause tag loss. This is already the case for KVM
guests if a VMM changes the page protection while the guest triggers a
user_mem_abort().
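
A small sketch of the ordering contract the two new helpers are meant to
provide (the function names here are hypothetical, for illustration only):

/* Initialiser side (e.g. tag clearing on fault or a CoW copy): the tags
 * are written first, then the flag is published with release semantics
 * (smp_wmb() + set_bit() inside set_page_mte_tagged()). */
static void publish_tags(struct page *page, void *addr)
{
	mte_clear_page_tags(addr);
	set_page_mte_tagged(page);
}

/* Consumer side (e.g. ptrace tag reads or a core dump): only trust the
 * tags after page_mte_tagged() returns true, which pairs the acquire
 * (test_bit() + smp_rmb()) with the release above. */
static bool tags_are_valid(struct page *page)
{
	return page_mte_tagged(page);
}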

Signed-off-by: Catalin Marinas 
[p...@google.com: fix build with CONFIG_ARM64_MTE disabled]
Signed-off-by: Peter Collingbourne 
Reviewed-by: Cornelia Huck 
Reviewed-by: Steven Price 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Peter Collingbourne 
---
 arch/arm64/include/asm/mte.h | 30 ++
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/arm64/kernel/cpufeature.c   |  4 +++-
 arch/arm64/kernel/elfcore.c  |  2 +-
 arch/arm64/kernel/hibernate.c|  2 +-
 arch/arm64/kernel/mte.c  | 17 +++--
 arch/arm64/kvm/guest.c   |  4 ++--
 arch/arm64/kvm/mmu.c |  4 ++--
 arch/arm64/mm/copypage.c |  5 +++--
 arch/arm64/mm/fault.c|  2 +-
 arch/arm64/mm/mteswap.c  |  2 +-
 11 files changed, 56 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 760c62f8e22f..3f8199ba265a 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -37,6 +37,29 @@ void mte_free_tag_storage(char *storage);
 /* track which pages have valid allocation tags */
 #define PG_mte_tagged  PG_arch_2
 
+static inline void set_page_mte_tagged(struct page *page)
+{
+   /*
+* Ensure that the tags written prior to this function are visible
+* before the page flags update.
+*/
+   smp_wmb();
+   set_bit(PG_mte_tagged, &page->flags);
+}
+
+static inline bool page_mte_tagged(struct page *page)
+{
+   bool ret = test_bit(PG_mte_tagged, &page->flags);
+
+   /*
+* If the page is tagged, ensure ordering with a likely subsequent
+* read of the tags.
+*/
+   if (ret)
+   smp_rmb();
+   return ret;
+}
+
 void mte_zero_clear_page_tags(void *addr);
 void mte_sync_tags(pte_t old_pte, pte_t pte);
 void mte_copy_page_tags(void *kto, const void *kfrom);
@@ -56,6 +79,13 @@ size_t mte_probe_user_range(const char __user *uaddr, size_t size);
 /* unused if !CONFIG_ARM64_MTE, silence the compiler */
 #define PG_mte_tagged  0
 
+static inline void set_page_mte_tagged(struct page *page)
+{
+}
+static inline bool page_mte_tagged(struct page *page)
+{
+   return false;
+}
 static inline void mte_zero_clear_page_tags(void *addr)
 {
 }
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4873c1d6e7d0..c6a2d8891d2a 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1048,7 +1048,7 @@ static inline void arch_swap_invalidate_area(int type)
 static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 {
if (system_supports_mte() && mte_restore_tags(entry, &folio->page))
-   set_bit(PG_mte_tagged, &folio->page.flags);
+   set_page_mte_tagged(&folio->page);
 }
 
 #endif /* CONFIG_ARM64_MTE */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6062454a9067..df11cfe61fcb 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2050,8 +2050,10 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
 * Clear the tags in the zero page. This needs to be done via the
 * linear map which has the Tagged attribute.
 */
-   if (!test_and_set_bit(PG_mte_tagged, &ZERO_PAGE(0)->flags))
+   if (!page_mte_tagged(ZERO_PAGE(0))) {
mte_clear_page_tags(lm_alias(empty_zero_page));
+   set_page_mte_tagged(ZERO_PAGE(0));
+   }
 
kasan_init_hw_tags_cpu();
 }
diff --git a/arch/arm64/kernel/elfcore.c b/arch/arm64/kernel/elfcore.c
index 27ef7ad3ffd2..353009d7f307 100644
--- a/arch/arm64/kernel/elfcore.c
+++ b/arch/arm64/kernel/elfcore.c
@@ -47,7 +47,7 @@ static int mte_dump_tag_range(struct coredump_params *cprm,
 * Pages mapped in user space as !pte_access_permitted() (e.g.
 * PROT_EXEC only) may not have 

[PATCH v5 4/8] mm: Add PG_arch_3 page flag

2022-11-03 Thread Peter Collingbourne
As with PG_arch_2, this flag is only allowed on 64-bit architectures due
to the shortage of bits available. It will be used by the arm64 MTE code
in subsequent patches.

Signed-off-by: Peter Collingbourne 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Steven Price 
[catalin.mari...@arm.com: added flag preserving in __split_huge_page_tail()]
Signed-off-by: Catalin Marinas 
Reviewed-by: Steven Price 
---
 fs/proc/page.c| 1 +
 include/linux/kernel-page-flags.h | 1 +
 include/linux/page-flags.h| 1 +
 include/trace/events/mmflags.h| 1 +
 mm/huge_memory.c  | 1 +
 5 files changed, 5 insertions(+)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 882525c8e94c..6249c347809a 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -221,6 +221,7 @@ u64 stable_page_flags(struct page *page)
u |= kpf_copy_bit(k, KPF_ARCH,  PG_arch_1);
 #ifdef CONFIG_ARCH_USES_PG_ARCH_X
u |= kpf_copy_bit(k, KPF_ARCH_2,PG_arch_2);
+   u |= kpf_copy_bit(k, KPF_ARCH_3,PG_arch_3);
 #endif
 
return u;
diff --git a/include/linux/kernel-page-flags.h b/include/linux/kernel-page-flags.h
index eee1877a354e..859f4b0c1b2b 100644
--- a/include/linux/kernel-page-flags.h
+++ b/include/linux/kernel-page-flags.h
@@ -18,5 +18,6 @@
 #define KPF_UNCACHED   39
 #define KPF_SOFTDIRTY  40
 #define KPF_ARCH_2 41
+#define KPF_ARCH_3 42
 
 #endif /* LINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5dc7977edf9d..c50ce2812f17 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -134,6 +134,7 @@ enum pageflags {
 #endif
 #ifdef CONFIG_ARCH_USES_PG_ARCH_X
PG_arch_2,
+   PG_arch_3,
 #endif
 #ifdef CONFIG_KASAN_HW_TAGS
PG_skip_kasan_poison,
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 4673e58a7626..9db52bc4ce19 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -130,6 +130,7 @@ IF_HAVE_PG_HWPOISON(PG_hwpoison,"hwpoison"  )   \
 IF_HAVE_PG_IDLE(PG_young,  "young" )   \
 IF_HAVE_PG_IDLE(PG_idle,   "idle"  )   \
 IF_HAVE_PG_ARCH_X(PG_arch_2,   "arch_2")   \
+IF_HAVE_PG_ARCH_X(PG_arch_3,   "arch_3")   \
 IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison")
 
 #define show_page_flags(flags) \
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5d87dc4611b9..c509011bd4a2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2403,6 +2403,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 (1L << PG_unevictable) |
 #ifdef CONFIG_ARCH_USES_PG_ARCH_X
 (1L << PG_arch_2) |
+(1L << PG_arch_3) |
 #endif
 (1L << PG_dirty) |
 LRU_GEN_MASK | LRU_REFS_MASK));
-- 
2.38.1.431.g37b22c650d-goog



[PATCH v5 1/8] mm: Do not enable PG_arch_2 for all 64-bit architectures

2022-11-03 Thread Peter Collingbourne
From: Catalin Marinas 

Commit 4beba9486abd ("mm: Add PG_arch_2 page flag") introduced a new
page flag for all 64-bit architectures. However, even if an architecture
is 64-bit, it may still have limited spare bits in the 'flags' member of
'struct page'. This may happen if an architecture enables SPARSEMEM
without SPARSEMEM_VMEMMAP as is the case with the newly added loongarch.
This architecture port needs 19 more bits for the sparsemem section
information and, while it is currently fine with PG_arch_2, adding any
more PG_arch_* flags will trigger build-time warnings.

Add a new CONFIG_ARCH_USES_PG_ARCH_X option which can be selected by
architectures that need more PG_arch_* flags beyond PG_arch_1. Select it
on arm64.
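
For a rough feel of why the spare bits run out, here is an illustrative (not
loongarch-accurate) bit budget for a 64-bit page->flags word when SPARSEMEM
is used without SPARSEMEM_VMEMMAP; all numbers below are assumptions made
only for the sake of the arithmetic.

#include <stdio.h>

enum {
	SECTION_BITS   = 19, /* e.g. 48 physical address bits, 512 MiB sections */
	NODE_BITS      = 6,
	ZONE_BITS      = 2,
	CORE_FLAG_BITS = 27, /* roughly NR_PAGEFLAGS on a typical config */
};

int main(void)
{
	int used = SECTION_BITS + NODE_BITS + ZONE_BITS + CORE_FLAG_BITS;

	/* Every additional PG_arch_* flag costs one of the few spare bits,
	 * hence the opt-in ARCH_USES_PG_ARCH_X instead of CONFIG_64BIT. */
	printf("%d of 64 bits used, %d spare\n", used, 64 - used);
	return 0;
}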

Signed-off-by: Catalin Marinas 
[p...@google.com: fix build with CONFIG_ARM64_MTE disabled]
Signed-off-by: Peter Collingbourne 
Reported-by: kernel test robot 
Cc: Andrew Morton 
Cc: Steven Price 
Reviewed-by: Steven Price 
---
 arch/arm64/Kconfig | 1 +
 fs/proc/page.c | 2 +-
 include/linux/page-flags.h | 2 +-
 include/trace/events/mmflags.h | 8 
 mm/Kconfig | 8 
 mm/huge_memory.c   | 2 +-
 6 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 2d505fc0e85e..db6b80752e5d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1966,6 +1966,7 @@ config ARM64_MTE
depends on ARM64_PAN
select ARCH_HAS_SUBPAGE_FAULTS
select ARCH_USES_HIGH_VMA_FLAGS
+   select ARCH_USES_PG_ARCH_X
help
  Memory Tagging (part of the ARMv8.5 Extensions) provides
  architectural support for run-time, always-on detection of
diff --git a/fs/proc/page.c b/fs/proc/page.c
index f2273b164535..882525c8e94c 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -219,7 +219,7 @@ u64 stable_page_flags(struct page *page)
u |= kpf_copy_bit(k, KPF_PRIVATE_2, PG_private_2);
u |= kpf_copy_bit(k, KPF_OWNER_PRIVATE, PG_owner_priv_1);
u |= kpf_copy_bit(k, KPF_ARCH,  PG_arch_1);
-#ifdef CONFIG_64BIT
+#ifdef CONFIG_ARCH_USES_PG_ARCH_X
u |= kpf_copy_bit(k, KPF_ARCH_2,PG_arch_2);
 #endif
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0b0ae5084e60..5dc7977edf9d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -132,7 +132,7 @@ enum pageflags {
PG_young,
PG_idle,
 #endif
-#ifdef CONFIG_64BIT
+#ifdef CONFIG_ARCH_USES_PG_ARCH_X
PG_arch_2,
 #endif
 #ifdef CONFIG_KASAN_HW_TAGS
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 11524cda4a95..4673e58a7626 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -90,10 +90,10 @@
 #define IF_HAVE_PG_IDLE(flag,string)
 #endif
 
-#ifdef CONFIG_64BIT
-#define IF_HAVE_PG_ARCH_2(flag,string) ,{1UL << flag, string}
+#ifdef CONFIG_ARCH_USES_PG_ARCH_X
+#define IF_HAVE_PG_ARCH_X(flag,string) ,{1UL << flag, string}
 #else
-#define IF_HAVE_PG_ARCH_2(flag,string)
+#define IF_HAVE_PG_ARCH_X(flag,string)
 #endif
 
 #ifdef CONFIG_KASAN_HW_TAGS
@@ -129,7 +129,7 @@ IF_HAVE_PG_UNCACHED(PG_uncached,"uncached"  )   \
 IF_HAVE_PG_HWPOISON(PG_hwpoison,   "hwpoison"  )   \
 IF_HAVE_PG_IDLE(PG_young,  "young" )   \
 IF_HAVE_PG_IDLE(PG_idle,   "idle"  )   \
-IF_HAVE_PG_ARCH_2(PG_arch_2,   "arch_2")   \
+IF_HAVE_PG_ARCH_X(PG_arch_2,   "arch_2")   \
 IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison")
 
 #define show_page_flags(flags) \
diff --git a/mm/Kconfig b/mm/Kconfig
index b0b56c33f2ed..8e9e26ca472c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1005,6 +1005,14 @@ config ARCH_USES_HIGH_VMA_FLAGS
 config ARCH_HAS_PKEYS
bool
 
+config ARCH_USES_PG_ARCH_X
+   bool
+   help
+ Enable the definition of PG_arch_x page flags with x > 1. Only
+ suitable for 64-bit architectures with CONFIG_FLATMEM or
+ CONFIG_SPARSEMEM_VMEMMAP enabled, otherwise there may not be
+ enough room for additional bits in page->flags.
+
 config VM_EVENT_COUNTERS
default y
bool "Enable VM event counters for /proc/vmstat" if EXPERT
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1d47b3f7b877..5d87dc4611b9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2401,7 +2401,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 (1L << PG_workingset) |
 (1L << PG_locked) |
 (1L << PG_unevictable) |
-#ifdef CONFIG_64BIT
+#ifdef CONFIG_ARCH_USES_PG_ARCH_X
  

[PATCH v5 0/8] KVM: arm64: permit MAP_SHARED mappings with MTE enabled

2022-11-03 Thread Peter Collingbourne
Hi,

This patch series allows VMMs to use shared mappings in MTE enabled
guests. The first five patches were taken from Catalin's tree [1] which
addressed some review feedback from when they were previously sent out
as v3 of this series. The first patch from Catalin's tree makes room
for an additional PG_arch_3 flag by making the newer PG_arch_* flags
arch-dependent. The next four patches are based on a series that
Catalin sent out prior to v3, whose cover letter [2] I quote from below:

> This series aims to fix the races between initialising the tags on a
> page and setting the PG_mte_tagged flag. Currently the flag is set
> either before or after that tag initialisation and this can lead to CoW
> copying stale tags. The first patch moves the flag setting after the
> tags have been initialised, solving the CoW issue. However, concurrent
> mprotect() on a shared mapping may (very rarely) lead to valid tags
> being zeroed.
>
> The second skips the sanitise_mte_tags() call in kvm_set_spte_gfn(),
> deferring it to user_mem_abort(). The outcome is that
> sanitise_mte_tags() can be simplified to skip the pfn_to_online_page()
> check and only rely on the VM_MTE_ALLOWED vma flag that can be checked
> in user_mem_abort().
>
> The third and fourth patches use PG_arch_3 as a lock for page tagging,
> based on Peter Collingbourne's idea of a two-bit lock.
>
> I think the first patch can be queued but the rest needs some in depth
> review and test. With this series (if correct) we could allow MAP_SHARED
> on KVM guest memory but this is to be discussed separately as there are
> some KVM ABI implications.

In this v5 I rebased Catalin's tree onto -next again. Please double check
my rebase, which resolved the conflict with commit a8e5e5146ad0 ("arm64:
mte: Avoid setting PG_mte_tagged if no tags cleared or restored").

I now have Reviewed-by for all patches except for the last one, which adds
the documentation. Thanks for the reviews so far, and please take a look!

I've tested it on QEMU as well as on MTE-capable hardware by booting a
Linux kernel and userspace under a crosvm with MTE support [3].

[1] git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/mte-pg-flags
[2] 
https://lore.kernel.org/all/20220705142619.4135905-1-catalin.mari...@arm.com/
[3] https://chromium-review.googlesource.com/c/crosvm/crosvm/+/3892141

Catalin Marinas (4):
  mm: Do not enable PG_arch_2 for all 64-bit architectures
  arm64: mte: Fix/clarify the PG_mte_tagged semantics
  KVM: arm64: Simplify the sanitise_mte_tags() logic
  arm64: mte: Lock a page for MTE tag initialisation

Peter Collingbourne (4):
  mm: Add PG_arch_3 page flag
  KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled
  KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled
  Documentation: document the ABI changes for KVM_CAP_ARM_MTE

 Documentation/virt/kvm/api.rst|  5 ++-
 arch/arm64/Kconfig|  1 +
 arch/arm64/include/asm/mte.h  | 65 ++-
 arch/arm64/include/asm/pgtable.h  |  4 +-
 arch/arm64/kernel/cpufeature.c|  4 +-
 arch/arm64/kernel/elfcore.c   |  2 +-
 arch/arm64/kernel/hibernate.c |  2 +-
 arch/arm64/kernel/mte.c   | 21 +-
 arch/arm64/kvm/guest.c| 18 +
 arch/arm64/kvm/mmu.c  | 55 +++---
 arch/arm64/mm/copypage.c  |  7 +++-
 arch/arm64/mm/fault.c |  4 +-
 arch/arm64/mm/mteswap.c   | 16 +++-
 fs/proc/page.c|  3 +-
 include/linux/kernel-page-flags.h |  1 +
 include/linux/page-flags.h|  3 +-
 include/trace/events/mmflags.h|  9 +++--
 mm/Kconfig|  8 
 mm/huge_memory.c  |  3 +-
 19 files changed, 152 insertions(+), 79 deletions(-)

-- 
2.38.1.431.g37b22c650d-goog



Re: [PATCH v3 3/7] mm: Add PG_arch_3 page flag

2022-09-20 Thread Peter Collingbourne
On Tue, Sep 20, 2022 at 9:58 AM Catalin Marinas  wrote:
>
> On Tue, Sep 20, 2022 at 05:33:42PM +0100, Marc Zyngier wrote:
> > On Tue, 20 Sep 2022 16:39:47 +0100,
> > Catalin Marinas  wrote:
> > > On Mon, Sep 19, 2022 at 07:12:53PM +0100, Marc Zyngier wrote:
> > > > On Mon, 05 Sep 2022 18:01:55 +0100,
> > > > Catalin Marinas  wrote:
> > > > > Peter, please let me know if you want to pick this series up together
> > > > > with your other KVM patches. Otherwise I can post it separately, it's
> > > > > worth merging it on its own as it clarifies the page flag vs tag 
> > > > > setting
> > > > > ordering.
> > > >
> > > > I'm looking at queuing this, but I'm confused by this comment. Do I
> > > > need to pick this as part of the series? Or is this an independent
> > > > thing (my hunch is that it is actually required not to break other
> > > > architectures...).
> > >
> > > This series (at least the first patches) won't apply cleanly on
> > > top of 6.0-rc1 and, of course, we shouldn't break other architectures. I
> > > can repost the whole series but I don't have the setup to test the
> > > MAP_SHARED KVM option (unless Peter plans to post it soon).
> >
> > I don't feel brave enough to take a series affecting all architectures
>
> It shouldn't affect the others, the only change is that PG_arch_2 is now
> only defined for arm64 but no other architecture is using it. The
> problem with loongarch is that it doesn't have enough spare bits in
> page->flags and even without any patches I think it's broken with the
> right value for NR_CPUS.
>
> > so late in the game, and the whole thing had very little arm64
> > exposure. The latest QEMU doesn't seem to work anymore, so I don't
> > have any MTE-capable emulation (and using the FVP remotely is a pain
> > in the proverbial neck).
> >
> > I'll come back to this after the merge window, should Peter decide to
> > respin the series.
>
> It makes sense.

Apologies for the delay, I've now sent out v4 of this series which
includes the patches on your branch.

Peter


[PATCH v4 7/8] KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled

2022-09-20 Thread Peter Collingbourne
Certain VMMs such as crosvm have features (e.g. sandboxing) that depend
on being able to map guest memory as MAP_SHARED. The current restriction
on sharing MAP_SHARED pages with the guest is preventing the use of
those features with MTE. Now that the races between tasks concurrently
clearing tags on the same page have been fixed, remove this restriction.

Note that this is a relaxation of the ABI.

Signed-off-by: Peter Collingbourne 
Reviewed-by: Catalin Marinas 
Reviewed-by: Steven Price 
---
 arch/arm64/kvm/mmu.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e34fbabd8b93..996ea11fb0e5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1075,14 +1075,6 @@ static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t 
pfn,
 
 static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 {
-   /*
-* VM_SHARED mappings are not allowed with MTE to avoid races
-* when updating the PG_mte_tagged page flag, see
-* sanitise_mte_tags for more details.
-*/
-   if (vma->vm_flags & VM_SHARED)
-   return false;
-
return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
-- 
2.37.3.968.ga6b4b080e4-goog



[PATCH v4 5/8] arm64: mte: Lock a page for MTE tag initialisation

2022-09-20 Thread Peter Collingbourne
From: Catalin Marinas 

Initialising the tags and setting PG_mte_tagged flag for a page can race
between multiple set_pte_at() on shared pages or setting the stage 2 pte
via user_mem_abort(). Introduce a new PG_mte_lock flag as PG_arch_3 and
set it before attempting page initialisation. Given that PG_mte_tagged
is never cleared for a page, consider setting this flag to mean page
unlocked and wait on this bit with acquire semantics if the page is
locked:

- try_page_mte_tagging() - lock the page for tagging, return true if it
  can be tagged, false if already tagged. No acquire semantics if it
  returns true (PG_mte_tagged not set) as there is no serialisation with
  a previous set_page_mte_tagged().

- set_page_mte_tagged() - set PG_mte_tagged with release semantics.

The two-bit locking is based on Peter Collingbourne's idea.

Signed-off-by: Catalin Marinas 
Signed-off-by: Peter Collingbourne 
Reviewed-by: Steven Price 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Peter Collingbourne 
---
 arch/arm64/include/asm/mte.h | 35 +++-
 arch/arm64/include/asm/pgtable.h |  4 ++--
 arch/arm64/kernel/cpufeature.c   |  2 +-
 arch/arm64/kernel/mte.c  | 12 +--
 arch/arm64/kvm/guest.c   | 16 +--
 arch/arm64/kvm/mmu.c |  2 +-
 arch/arm64/mm/copypage.c |  2 ++
 arch/arm64/mm/fault.c|  2 ++
 arch/arm64/mm/mteswap.c  | 11 +-
 9 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 46618c575eac..be6560e1ff2b 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -25,7 +25,7 @@ unsigned long mte_copy_tags_to_user(void __user *to, void 
*from,
unsigned long n);
 int mte_save_tags(struct page *page);
 void mte_save_page_tags(const void *page_addr, void *tag_storage);
-bool mte_restore_tags(swp_entry_t entry, struct page *page);
+void mte_restore_tags(swp_entry_t entry, struct page *page);
 void mte_restore_page_tags(void *page_addr, const void *tag_storage);
 void mte_invalidate_tags(int type, pgoff_t offset);
 void mte_invalidate_tags_area(int type);
@@ -36,6 +36,8 @@ void mte_free_tag_storage(char *storage);
 
 /* track which pages have valid allocation tags */
 #define PG_mte_tagged  PG_arch_2
+/* simple lock to avoid multiple threads tagging the same page */
#define PG_mte_lock    PG_arch_3
 
 static inline void set_page_mte_tagged(struct page *page)
 {
@@ -60,6 +62,33 @@ static inline bool page_mte_tagged(struct page *page)
return ret;
 }
 
+/*
+ * Lock the page for tagging and return 'true' if the page can be tagged,
+ * 'false' if already tagged. PG_mte_tagged is never cleared and therefore the
+ * locking only happens once for page initialisation.
+ *
+ * The page MTE lock state:
+ *
+ *   Locked:   PG_mte_lock && !PG_mte_tagged
+ *   Unlocked: !PG_mte_lock || PG_mte_tagged
+ *
+ * Acquire semantics only if the page is tagged (returning 'false').
+ */
+static inline bool try_page_mte_tagging(struct page *page)
+{
+   if (!test_and_set_bit(PG_mte_lock, &page->flags))
+   return true;
+
+   /*
+* The tags are either being initialised or may have been initialised
+* already. Check if the PG_mte_tagged flag has been set or wait
+* otherwise.
+*/
+   smp_cond_load_acquire(&page->flags, VAL & (1UL << PG_mte_tagged));
+
+   return false;
+}
+
 void mte_zero_clear_page_tags(void *addr);
 void mte_sync_tags(pte_t old_pte, pte_t pte);
 void mte_copy_page_tags(void *kto, const void *kfrom);
@@ -84,6 +113,10 @@ static inline bool page_mte_tagged(struct page *page)
 {
return false;
 }
+static inline bool try_page_mte_tagging(struct page *page)
+{
+   return false;
+}
 static inline void mte_zero_clear_page_tags(void *addr)
 {
 }
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 98b638441521..8735ac1a1e32 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1049,8 +1049,8 @@ static inline void arch_swap_invalidate_area(int type)
 #define __HAVE_ARCH_SWAP_RESTORE
 static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 {
-   if (system_supports_mte() && mte_restore_tags(entry, &folio->page))
-   set_page_mte_tagged(&folio->page);
+   if (system_supports_mte())
+   mte_restore_tags(entry, >page);
 }
 
 #endif /* CONFIG_ARM64_MTE */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index ab3312788d60..e2c0a707a941 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2049,7 +2049,7 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities 
const *cap)
 * Clear the tags in the zero page. This needs to be done via the
 * linear map which has the Tagged attribute.
 */

[PATCH v4 4/8] mm: Add PG_arch_3 page flag

2022-09-20 Thread Peter Collingbourne
As with PG_arch_2, this flag is only allowed on 64-bit architectures due
to the shortage of bits available. It will be used by the arm64 MTE code
in subsequent patches.

Signed-off-by: Peter Collingbourne 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Steven Price 
[catalin.mari...@arm.com: added flag preserving in __split_huge_page_tail()]
Signed-off-by: Catalin Marinas 
---
 fs/proc/page.c| 1 +
 include/linux/kernel-page-flags.h | 1 +
 include/linux/page-flags.h| 1 +
 include/trace/events/mmflags.h| 1 +
 mm/huge_memory.c  | 1 +
 5 files changed, 5 insertions(+)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 6f4b4bcb9b0d..43d371e6b366 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -220,6 +220,7 @@ u64 stable_page_flags(struct page *page)
u |= kpf_copy_bit(k, KPF_ARCH,  PG_arch_1);
 #ifdef CONFIG_ARCH_USES_PG_ARCH_X
u |= kpf_copy_bit(k, KPF_ARCH_2,PG_arch_2);
+   u |= kpf_copy_bit(k, KPF_ARCH_3,PG_arch_3);
 #endif
 
return u;
diff --git a/include/linux/kernel-page-flags.h 
b/include/linux/kernel-page-flags.h
index eee1877a354e..859f4b0c1b2b 100644
--- a/include/linux/kernel-page-flags.h
+++ b/include/linux/kernel-page-flags.h
@@ -18,5 +18,6 @@
 #define KPF_UNCACHED   39
 #define KPF_SOFTDIRTY  40
 #define KPF_ARCH_2 41
+#define KPF_ARCH_3 42
 
 #endif /* LINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5dc7977edf9d..c50ce2812f17 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -134,6 +134,7 @@ enum pageflags {
 #endif
 #ifdef CONFIG_ARCH_USES_PG_ARCH_X
PG_arch_2,
+   PG_arch_3,
 #endif
 #ifdef CONFIG_KASAN_HW_TAGS
PG_skip_kasan_poison,
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 4673e58a7626..9db52bc4ce19 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -130,6 +130,7 @@ IF_HAVE_PG_HWPOISON(PG_hwpoison,"hwpoison"  )   
\
 IF_HAVE_PG_IDLE(PG_young,  "young" )   \
 IF_HAVE_PG_IDLE(PG_idle,   "idle"  )   \
 IF_HAVE_PG_ARCH_X(PG_arch_2,   "arch_2")   \
+IF_HAVE_PG_ARCH_X(PG_arch_3,   "arch_3")   \
 IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison")
 
 #define show_page_flags(flags) \
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 24974a4ce28f..c7c5f9fb226d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2446,6 +2446,7 @@ static void __split_huge_page_tail(struct page *head, int 
tail,
 (1L << PG_unevictable) |
 #ifdef CONFIG_ARCH_USES_PG_ARCH_X
 (1L << PG_arch_2) |
+(1L << PG_arch_3) |
 #endif
 (1L << PG_dirty) |
 LRU_GEN_MASK | LRU_REFS_MASK));
-- 
2.37.3.968.ga6b4b080e4-goog



[PATCH v4 8/8] Documentation: document the ABI changes for KVM_CAP_ARM_MTE

2022-09-20 Thread Peter Collingbourne
Document both the restriction on VM_MTE_ALLOWED mappings and
the relaxation for shared mappings.

Signed-off-by: Peter Collingbourne 
Acked-by: Catalin Marinas 
---
 Documentation/virt/kvm/api.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index abd7c32126ce..7afe603567fd 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7486,8 +7486,9 @@ hibernation of the host; however the VMM needs to 
manually save/restore the
 tags as appropriate if the VM is migrated.
 
 When this capability is enabled all memory in memslots must be mapped as
-not-shareable (no MAP_SHARED), attempts to create a memslot with a
-MAP_SHARED mmap will result in an -EINVAL return.
+``MAP_ANONYMOUS`` or with a RAM-based file mapping (``tmpfs``, ``memfd``),
+attempts to create a memslot with an invalid mmap will result in an
+-EINVAL return.
 
 When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
 perform a bulk copy of tags to/from the guest.
-- 
2.37.3.968.ga6b4b080e4-goog



[PATCH v4 6/8] KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled

2022-09-20 Thread Peter Collingbourne
Previously we allowed creating a memslot containing a private mapping that
was not VM_MTE_ALLOWED, but would later reject KVM_RUN with -EFAULT. Now
we reject the memory region at memslot creation time.

Since this is a minor tweak to the ABI (a VMM that created one of
these memslots would fail later anyway), no VMM to my knowledge has
MTE support yet, and the hardware with the necessary features is not
generally available, we can probably make this ABI change at this point.

Signed-off-by: Peter Collingbourne 
Reviewed-by: Catalin Marinas 
Reviewed-by: Steven Price 
---
 arch/arm64/kvm/mmu.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index bebfd1e0bbf0..e34fbabd8b93 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1073,6 +1073,19 @@ static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t 
pfn,
}
 }
 
+static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
+{
+   /*
+* VM_SHARED mappings are not allowed with MTE to avoid races
+* when updating the PG_mte_tagged page flag, see
+* sanitise_mte_tags for more details.
+*/
+   if (vma->vm_flags & VM_SHARED)
+   return false;
+
+   return vma->vm_flags & VM_MTE_ALLOWED;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  struct kvm_memory_slot *memslot, unsigned long hva,
  unsigned long fault_status)
@@ -1249,9 +1262,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
}
 
if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
-   /* Check the VMM hasn't introduced a new VM_SHARED VMA */
-   if ((vma->vm_flags & VM_MTE_ALLOWED) &&
-   !(vma->vm_flags & VM_SHARED)) {
+   /* Check the VMM hasn't introduced a new disallowed VMA */
+   if (kvm_vma_mte_allowed(vma)) {
sanitise_mte_tags(kvm, pfn, vma_pagesize);
} else {
ret = -EFAULT;
@@ -1695,12 +1707,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
if (!vma)
break;
 
-   /*
-* VM_SHARED mappings are not allowed with MTE to avoid races
-* when updating the PG_mte_tagged page flag, see
-* sanitise_mte_tags for more details.
-*/
-   if (kvm_has_mte(kvm) && vma->vm_flags & VM_SHARED) {
+   if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
ret = -EINVAL;
break;
}
-- 
2.37.3.968.ga6b4b080e4-goog



[PATCH v4 3/8] KVM: arm64: Simplify the sanitise_mte_tags() logic

2022-09-20 Thread Peter Collingbourne
From: Catalin Marinas 

Currently sanitise_mte_tags() checks if it's an online page before
attempting to sanitise the tags. Such detection should be done in the
caller via the VM_MTE_ALLOWED vma flag. Since kvm_set_spte_gfn() does
not have the vma, leave the page unmapped if not already tagged. Tag
initialisation will be done on a subsequent access fault in
user_mem_abort().

Signed-off-by: Catalin Marinas 
[p...@google.com: fix the page initializer]
Signed-off-by: Peter Collingbourne 
Reviewed-by: Steven Price 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Peter Collingbourne 
---
 arch/arm64/kvm/mmu.c | 40 +++-
 1 file changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 012ed1bc0762..5a131f009cf9 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1056,23 +1056,14 @@ static int get_vma_page_shift(struct vm_area_struct 
*vma, unsigned long hva)
  * - mmap_lock protects between a VM faulting a page in and the VMM performing
  *   an mprotect() to add VM_MTE
  */
-static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
-unsigned long size)
+static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
+ unsigned long size)
 {
unsigned long i, nr_pages = size >> PAGE_SHIFT;
-   struct page *page;
+   struct page *page = pfn_to_page(pfn);
 
if (!kvm_has_mte(kvm))
-   return 0;
-
-   /*
-* pfn_to_online_page() is used to reject ZONE_DEVICE pages
-* that may not support tags.
-*/
-   page = pfn_to_online_page(pfn);
-
-   if (!page)
-   return -EFAULT;
+   return;
 
for (i = 0; i < nr_pages; i++, page++) {
if (!page_mte_tagged(page)) {
@@ -1080,8 +1071,6 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t 
pfn,
set_page_mte_tagged(page);
}
}
-
-   return 0;
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
@@ -1092,7 +1081,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
bool write_fault, writable, force_pte = false;
bool exec_fault;
bool device = false;
-   bool shared;
unsigned long mmu_seq;
struct kvm *kvm = vcpu->kvm;
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
@@ -1142,8 +1130,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
vma_shift = get_vma_page_shift(vma, hva);
}
 
-   shared = (vma->vm_flags & VM_SHARED);
-
switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
case PUD_SHIFT:
@@ -1264,12 +1250,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
 
if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
/* Check the VMM hasn't introduced a new VM_SHARED VMA */
-   if (!shared)
-   ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
-   else
+   if ((vma->vm_flags & VM_MTE_ALLOWED) &&
+   !(vma->vm_flags & VM_SHARED)) {
+   sanitise_mte_tags(kvm, pfn, vma_pagesize);
+   } else {
ret = -EFAULT;
-   if (ret)
goto out_unlock;
+   }
}
 
if (writable)
@@ -1491,15 +1478,18 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct 
kvm_gfn_range *range)
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
kvm_pfn_t pfn = pte_pfn(range->pte);
-   int ret;
 
if (!kvm->arch.mmu.pgt)
return false;
 
WARN_ON(range->end - range->start != 1);
 
-   ret = sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
-   if (ret)
+   /*
+* If the page isn't tagged, defer to user_mem_abort() for sanitising
+* the MTE tags. The S2 pte should have been unmapped by
+* mmu_notifier_invalidate_range_end().
+*/
+   if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
return false;
 
/*
-- 
2.37.3.968.ga6b4b080e4-goog



[PATCH v4 2/8] arm64: mte: Fix/clarify the PG_mte_tagged semantics

2022-09-20 Thread Peter Collingbourne
From: Catalin Marinas 

Currently the PG_mte_tagged page flag mostly means the page contains
valid tags and it should be set after the tags have been cleared or
restored. However, in mte_sync_tags() it is set before setting the tags
to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for
shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on
write in another thread can cause the new page to have stale tags.
Similarly, tag reading via ptrace() can read stale tags if the
PG_mte_tagged flag is set before actually clearing/restoring the tags.

Fix the PG_mte_tagged semantics so that it is only set after the tags
have been cleared or restored. This is safe for swap restoring into a
MAP_SHARED or CoW page since the core code takes the page lock. Add two
functions to test and set the PG_mte_tagged flag with acquire and
release semantics. The downside is that concurrent mprotect(PROT_MTE) on
a MAP_SHARED page may cause tag loss. This is already the case for KVM
guests if a VMM changes the page protection while the guest triggers a
user_mem_abort().

Signed-off-by: Catalin Marinas 
[p...@google.com: fix build with CONFIG_ARM64_MTE disabled]
Signed-off-by: Peter Collingbourne 
Reviewed-by: Cornelia Huck 
Reviewed-by: Steven Price 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Peter Collingbourne 
---
 arch/arm64/include/asm/mte.h | 30 ++
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/arm64/kernel/cpufeature.c   |  4 +++-
 arch/arm64/kernel/elfcore.c  |  2 +-
 arch/arm64/kernel/hibernate.c|  2 +-
 arch/arm64/kernel/mte.c  | 12 +++-
 arch/arm64/kvm/guest.c   |  4 ++--
 arch/arm64/kvm/mmu.c |  4 ++--
 arch/arm64/mm/copypage.c |  5 +++--
 arch/arm64/mm/fault.c|  2 +-
 arch/arm64/mm/mteswap.c  |  2 +-
 11 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index aa523591a44e..46618c575eac 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -37,6 +37,29 @@ void mte_free_tag_storage(char *storage);
 /* track which pages have valid allocation tags */
 #define PG_mte_tagged  PG_arch_2
 
+static inline void set_page_mte_tagged(struct page *page)
+{
+   /*
+* Ensure that the tags written prior to this function are visible
+* before the page flags update.
+*/
+   smp_wmb();
+   set_bit(PG_mte_tagged, &page->flags);
+}
+
+static inline bool page_mte_tagged(struct page *page)
+{
+   bool ret = test_bit(PG_mte_tagged, &page->flags);
+
+   /*
+* If the page is tagged, ensure ordering with a likely subsequent
+* read of the tags.
+*/
+   if (ret)
+   smp_rmb();
+   return ret;
+}
+
 void mte_zero_clear_page_tags(void *addr);
 void mte_sync_tags(pte_t old_pte, pte_t pte);
 void mte_copy_page_tags(void *kto, const void *kfrom);
@@ -54,6 +77,13 @@ size_t mte_probe_user_range(const char __user *uaddr, size_t 
size);
 /* unused if !CONFIG_ARM64_MTE, silence the compiler */
 #define PG_mte_tagged  0
 
+static inline void set_page_mte_tagged(struct page *page)
+{
+}
+static inline bool page_mte_tagged(struct page *page)
+{
+   return false;
+}
 static inline void mte_zero_clear_page_tags(void *addr)
 {
 }
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 71a1af42f0e8..98b638441521 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1050,7 +1050,7 @@ static inline void arch_swap_invalidate_area(int type)
 static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 {
if (system_supports_mte() && mte_restore_tags(entry, &folio->page))
-   set_bit(PG_mte_tagged, &folio->page.flags);
+   set_page_mte_tagged(&folio->page);
 }
 
 #endif /* CONFIG_ARM64_MTE */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 5d0527ba0804..ab3312788d60 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2049,8 +2049,10 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities 
const *cap)
 * Clear the tags in the zero page. This needs to be done via the
 * linear map which has the Tagged attribute.
 */
-   if (!test_and_set_bit(PG_mte_tagged, &ZERO_PAGE(0)->flags))
+   if (!page_mte_tagged(ZERO_PAGE(0))) {
mte_clear_page_tags(lm_alias(empty_zero_page));
+   set_page_mte_tagged(ZERO_PAGE(0));
+   }
 
kasan_init_hw_tags_cpu();
 }
diff --git a/arch/arm64/kernel/elfcore.c b/arch/arm64/kernel/elfcore.c
index 27ef7ad3ffd2..353009d7f307 100644
--- a/arch/arm64/kernel/elfcore.c
+++ b/arch/arm64/kernel/elfcore.c
@@ -47,7 +47,7 @@ static int mte_dump_tag_range(struct coredump_params *cprm,
 * Pages mapped in user space as !pte_access_permitted() (e.g.
 * PROT_EXEC only) may not have 

[PATCH v4 1/8] mm: Do not enable PG_arch_2 for all 64-bit architectures

2022-09-20 Thread Peter Collingbourne
From: Catalin Marinas 

Commit 4beba9486abd ("mm: Add PG_arch_2 page flag") introduced a new
page flag for all 64-bit architectures. However, even if an architecture
is 64-bit, it may still have limited spare bits in the 'flags' member of
'struct page'. This may happen if an architecture enables SPARSEMEM
without SPARSEMEM_VMEMMAP as is the case with the newly added loongarch.
This architecture port needs 19 more bits for the sparsemem section
information and, while it is currently fine with PG_arch_2, adding any
more PG_arch_* flags will trigger build-time warnings.

Add a new CONFIG_ARCH_USES_PG_ARCH_X option which can be selected by
architectures that need more PG_arch_* flags beyond PG_arch_1. Select it
on arm64.
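
To make the bit budget concrete (a rough sketch with assumed widths, not the kernel's actual layout check): when the sparsemem section number lives in page->flags it shares the word with the flag bits, so each extra PG_arch_* flag shrinks the remaining room:

#include <assert.h>

#define BITS_PER_LONG	64
#define SECTIONS_WIDTH	19	/* SPARSEMEM without VMEMMAP, as on loongarch */
#define NODES_WIDTH	6	/* assumed value */
#define ZONES_WIDTH	3	/* assumed value */
#define NR_PAGEFLAGS	27	/* assumed; grows with every new PG_arch_* flag */

static_assert(SECTIONS_WIDTH + NODES_WIDTH + ZONES_WIDTH + NR_PAGEFLAGS <=
	      BITS_PER_LONG, "not enough spare bits in page->flags");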

Signed-off-by: Catalin Marinas 
Signed-off-by: Peter Collingbourne 
Reported-by: kernel test robot 
Cc: Andrew Morton 
Cc: Steven Price 
---
 arch/arm64/Kconfig             | 1 +
 fs/proc/page.c                 | 2 +-
 include/linux/page-flags.h     | 2 +-
 include/trace/events/mmflags.h | 8 
 mm/Kconfig                     | 8 
 mm/huge_memory.c               | 2 +-
 6 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f6737d2f37b2..f2435b62e0ba 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1948,6 +1948,7 @@ config ARM64_MTE
depends on ARM64_PAN
select ARCH_HAS_SUBPAGE_FAULTS
select ARCH_USES_HIGH_VMA_FLAGS
+   select ARCH_USES_PG_ARCH_X
help
  Memory Tagging (part of the ARMv8.5 Extensions) provides
  architectural support for run-time, always-on detection of
diff --git a/fs/proc/page.c b/fs/proc/page.c
index a2873a617ae8..6f4b4bcb9b0d 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -218,7 +218,7 @@ u64 stable_page_flags(struct page *page)
u |= kpf_copy_bit(k, KPF_PRIVATE_2, PG_private_2);
u |= kpf_copy_bit(k, KPF_OWNER_PRIVATE, PG_owner_priv_1);
u |= kpf_copy_bit(k, KPF_ARCH,  PG_arch_1);
-#ifdef CONFIG_64BIT
+#ifdef CONFIG_ARCH_USES_PG_ARCH_X
u |= kpf_copy_bit(k, KPF_ARCH_2,PG_arch_2);
 #endif
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0b0ae5084e60..5dc7977edf9d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -132,7 +132,7 @@ enum pageflags {
PG_young,
PG_idle,
 #endif
-#ifdef CONFIG_64BIT
+#ifdef CONFIG_ARCH_USES_PG_ARCH_X
PG_arch_2,
 #endif
 #ifdef CONFIG_KASAN_HW_TAGS
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 11524cda4a95..4673e58a7626 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -90,10 +90,10 @@
 #define IF_HAVE_PG_IDLE(flag,string)
 #endif
 
-#ifdef CONFIG_64BIT
-#define IF_HAVE_PG_ARCH_2(flag,string) ,{1UL << flag, string}
+#ifdef CONFIG_ARCH_USES_PG_ARCH_X
+#define IF_HAVE_PG_ARCH_X(flag,string) ,{1UL << flag, string}
 #else
-#define IF_HAVE_PG_ARCH_2(flag,string)
+#define IF_HAVE_PG_ARCH_X(flag,string)
 #endif
 
 #ifdef CONFIG_KASAN_HW_TAGS
@@ -129,7 +129,7 @@ IF_HAVE_PG_UNCACHED(PG_uncached,	"uncached"	)		\
 IF_HAVE_PG_HWPOISON(PG_hwpoison,   "hwpoison"  )   \
 IF_HAVE_PG_IDLE(PG_young,  "young" )   \
 IF_HAVE_PG_IDLE(PG_idle,   "idle"  )   \
-IF_HAVE_PG_ARCH_2(PG_arch_2,   "arch_2")   \
+IF_HAVE_PG_ARCH_X(PG_arch_2,   "arch_2")   \
 IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison")
 
 #define show_page_flags(flags) \
diff --git a/mm/Kconfig b/mm/Kconfig
index ceec438c0741..a976cbb07bd6 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -999,6 +999,14 @@ config ARCH_USES_HIGH_VMA_FLAGS
 config ARCH_HAS_PKEYS
bool
 
+config ARCH_USES_PG_ARCH_X
+   bool
+   help
+ Enable the definition of PG_arch_x page flags with x > 1. Only
+ suitable for 64-bit architectures with CONFIG_FLATMEM or
+ CONFIG_SPARSEMEM_VMEMMAP enabled, otherwise there may not be
+ enough room for additional bits in page->flags.
+
 config VM_EVENT_COUNTERS
default y
bool "Enable VM event counters for /proc/vmstat" if EXPERT
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1cc4a5f4791e..24974a4ce28f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2444,7 +2444,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 (1L << PG_workingset) |
 (1L << PG_locked) |
 (1L << PG_unevictable) |
-#ifdef CONFIG_64BIT
+#ifdef CONFIG_ARCH_USES_PG_ARCH_X
 (1L << PG_arch_2) |
 #endif
 (1L << PG_dirty)

[PATCH v4 0/8] KVM: arm64: permit MAP_SHARED mappings with MTE enabled

2022-09-20 Thread Peter Collingbourne
Hi,

This patch series allows VMMs to use shared mappings in MTE enabled
guests. The first five patches were taken from Catalin's tree [1] which
addressed some review feedback from when they were previously sent out
as v3 of this series. The first patch from Catalin's tree makes room
for an additional PG_arch_3 flag by making the newer PG_arch_* flags
arch-dependent. The next four patches are based on a series that
Catalin sent out prior to v3, whose cover letter [2] I quote from below:

> This series aims to fix the races between initialising the tags on a
> page and setting the PG_mte_tagged flag. Currently the flag is set
> either before or after that tag initialisation and this can lead to CoW
> copying stale tags. The first patch moves the flag setting after the
> tags have been initialised, solving the CoW issue. However, concurrent
> mprotect() on a shared mapping may (very rarely) lead to valid tags
> being zeroed.
>
> The second skips the sanitise_mte_tags() call in kvm_set_spte_gfn(),
> deferring it to user_mem_abort(). The outcome is that now
> sanitise_mte_tags() can be simplified to skip the pfn_to_online_page()
> check and only rely on VM_MTE_ALLOWED vma flag that can be checked in
> user_mem_abort().
>
> The third and fourth patches use PG_arch_3 as a lock for page tagging,
> based on Peter Collingbourne's idea of a two-bit lock.
>
> I think the first patch can be queued but the rest needs some in-depth
> review and testing. With this series (if correct) we could allow MAP_SHARED
> on KVM guest memory but this is to be discussed separately as there are
> some KVM ABI implications.
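
As a purely illustrative aside on the "two-bit lock" mentioned in the quoted paragraph above (a sketch of the general idea only, not the code in patches 4 and 5; PG_mte_lock is used here as a stand-in name for the lock bit carved out of PG_arch_3):

#include <linux/mm.h>
#include <linux/bitops.h>
#include <asm/mte.h>
#include <asm/processor.h>

/* Hypothetical alias; the real definition lives in the series itself. */
#define PG_mte_lock	PG_arch_3

static void tag_page_once(struct page *page)
{
	if (page_mte_tagged(page))
		return;				/* already initialised */

	if (!test_and_set_bit(PG_mte_lock, &page->flags)) {
		/* We won the race: initialise the tags, then publish them. */
		mte_clear_page_tags(page_address(page));
		set_page_mte_tagged(page);
	} else {
		/* Somebody else is initialising; wait for them to publish. */
		while (!page_mte_tagged(page))
			cpu_relax();
	}
}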

I rebased Catalin's tree onto -next and added the proposed userspace
enablement patches after the series. I've tested it on QEMU as well as
on MTE-capable hardware by booting a Linux kernel and userspace under
a crosvm with MTE support [3].

[1] git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/mte-pg-flags
[2] https://lore.kernel.org/all/20220705142619.4135905-1-catalin.mari...@arm.com/
[3] https://chromium-review.googlesource.com/c/crosvm/crosvm/+/3892141

Catalin Marinas (4):
  mm: Do not enable PG_arch_2 for all 64-bit architectures
  arm64: mte: Fix/clarify the PG_mte_tagged semantics
  KVM: arm64: Simplify the sanitise_mte_tags() logic
  arm64: mte: Lock a page for MTE tag initialisation

Peter Collingbourne (4):
  mm: Add PG_arch_3 page flag
  KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled
  KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled
  Documentation: document the ABI changes for KVM_CAP_ARM_MTE

 Documentation/virt/kvm/api.rst    |  5 ++-
 arch/arm64/Kconfig                |  1 +
 arch/arm64/include/asm/mte.h      | 65 ++-
 arch/arm64/include/asm/pgtable.h  |  4 +-
 arch/arm64/kernel/cpufeature.c    |  4 +-
 arch/arm64/kernel/elfcore.c       |  2 +-
 arch/arm64/kernel/hibernate.c     |  2 +-
 arch/arm64/kernel/mte.c           | 16 
 arch/arm64/kvm/guest.c            | 18 +
 arch/arm64/kvm/mmu.c              | 55 +++---
 arch/arm64/mm/copypage.c          |  7 +++-
 arch/arm64/mm/fault.c             |  4 +-
 arch/arm64/mm/mteswap.c           | 13 ---
 fs/proc/page.c                    |  3 +-
 include/linux/kernel-page-flags.h |  1 +
 include/linux/page-flags.h        |  3 +-
 include/trace/events/mmflags.h    |  9 +++--
 mm/Kconfig                        |  8 
 mm/huge_memory.c                  |  3 +-
 19 files changed, 152 insertions(+), 71 deletions(-)

-- 
2.37.3.968.ga6b4b080e4-goog
