Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

2023-09-11 Thread David Hildenbrand

On 11.09.23 13:52, Catalin Marinas wrote:
> On Wed, Sep 06, 2023 at 12:23:21PM +0100, Alexandru Elisei wrote:
> > On Thu, Aug 24, 2023 at 04:24:30PM +0100, Catalin Marinas wrote:
> > > On Thu, Aug 24, 2023 at 01:25:41PM +0200, David Hildenbrand wrote:
> > > > On 24.08.23 13:06, David Hildenbrand wrote:
> > > > > Regarding one complication: "The kernel needs to know where to allocate
> > > > > a PROT_MTE page from or migrate a current page if it becomes PROT_MTE
> > > > > (mprotect()) and the range it is in does not support tagging.",
> > > > > simplified handling would be if it's in a MIGRATE_CMA pageblock, it
> > > > > doesn't support tagging. You have to migrate to a !CMA page (for
> > > > > example, not specifying GFP_MOVABLE as a quick way to achieve that).
> > > > 
> > > > Okay, I now realize that this patch set effectively duplicates some CMA
> > > > behavior using a new migrate-type.
> [...]
> > I considered mixing the tag storage memory with normal memory and
> > adding it to MIGRATE_CMA. But since tag storage memory cannot be tagged,
> > this means that it's not enough anymore to have a __GFP_MOVABLE allocation
> > request to use MIGRATE_CMA.
> > 
> > I considered two solutions to this problem:
> > 
> > 1. Only allocate from MIGRATE_CMA if the requested memory is not tagged =>
> > this effectively means transforming all memory from MIGRATE_CMA into the
> > MIGRATE_METADATA migratetype that the series introduces. Not very
> > appealing, because that means treating normal memory that is also on the
> > MIGRATE_CMA lists as tagged memory.
> 
> That's indeed not ideal. We could try this if it makes the patches
> significantly simpler, though I'm not so sure.
> 
> Allocating metadata is the easier part as we know the correspondence
> from the tagged pages (32 PROT_MTE pages) to the metadata page (1 tag
> storage page), so alloc_contig_range() does this for us. Just adding it
> to the CMA range is sufficient.
> 
> However, making sure that we don't allocate PROT_MTE pages from the
> metadata range is what led us to another migrate type. I guess we could
> achieve something similar with a new zone or a CPU-less NUMA node,

Ideally, no significant core-mm changes to optimize for an architecture
oddity. That implies no new zones and no new migratetypes -- unless it
is unavoidable and you are confident that you can convince core-MM
people that the use case (giving back 3% of system RAM at max in some
setups) is worth the trouble.

I also had CPU-less NUMA nodes in mind when thinking about that, but not
sure how easy it would be to integrate it. If the tag memory has
actually different performance characteristics as well, a NUMA node
would be the right choice.

If we could find some way to easily support this either via CMA or
CPU-less NUMA nodes, that would be much preferable; even if we cannot
cover each and every future use case right now. I expect some issues
with CXL+MTE either way, but am happy to be taught otherwise :)

Another thought I had was adding something like CMA memory
characteristics. Like, asking if a given CMA area/page supports tagging
(i.e., flag for the CMA area set?)?

When you need memory that supports tagging and have a page that does
not support tagging (CMA && !taggable), simply migrate to !MOVABLE
memory (eventually we could also try adding !CMA).

Was that discussed, and what would be the challenges with that? Page
migration due to compaction comes to mind, but it might also be easy to
handle if we can just avoid CMA memory for that.

> though the latter is not guaranteed not to allocate memory from the
> range, only make it less likely. Both these options are less flexible in
> terms of size/alignment/placement.
> 
> Maybe as a quick hack - only allow PROT_MTE from ZONE_NORMAL and
> configure the metadata range in ZONE_MOVABLE but at some point I'd
> expect some CXL-attached memory to support MTE with additional carveout
> reserved.

I have no idea how we could possibly cleanly support memory hotplug in
virtual environments (virtual DIMMs, virtio-mem) with MTE. In contrast
to s390x storage keys, the approach that arm64 with MTE took here
(exposing tag memory to the VM) makes it rather hard and complicated.
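[Editor's note: the migrate-to-!MOVABLE idea above can be sketched with a
migration target callback that simply drops __GFP_MOVABLE. This is only an
illustration of the suggestion, not code from the series; mte_migrate_target()
is a made-up name, and the callback shape matches migrate_pages() around
v6.5.]

#include <linux/gfp.h>
#include <linux/migrate.h>

/*
 * Sketch only: allocate the migration destination without __GFP_MOVABLE,
 * so a page that must become PROT_MTE cannot land in a MIGRATE_CMA (or
 * ZONE_MOVABLE) range again. GFP_USER does not include __GFP_MOVABLE.
 */
static struct folio *mte_migrate_target(struct folio *src, unsigned long private)
{
        gfp_t gfp = GFP_USER;   /* deliberately not __GFP_MOVABLE */

        return folio_alloc(gfp, folio_order(src));
}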


--
Cheers,

David / dhildenb



Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

2023-09-11 Thread Catalin Marinas
On Wed, Sep 06, 2023 at 12:23:21PM +0100, Alexandru Elisei wrote:
> On Thu, Aug 24, 2023 at 04:24:30PM +0100, Catalin Marinas wrote:
> > On Thu, Aug 24, 2023 at 01:25:41PM +0200, David Hildenbrand wrote:
> > > On 24.08.23 13:06, David Hildenbrand wrote:
> > > > Regarding one complication: "The kernel needs to know where to allocate
> > > > a PROT_MTE page from or migrate a current page if it becomes PROT_MTE
> > > > (mprotect()) and the range it is in does not support tagging.",
> > > > simplified handling would be if it's in a MIGRATE_CMA pageblock, it
> > > > doesn't support tagging. You have to migrate to a !CMA page (for
> > > > example, not specifying GFP_MOVABLE as a quick way to achieve that).
> > > 
> > > Okay, I now realize that this patch set effectively duplicates some CMA
> > > behavior using a new migrate-type.
[...]
> I considered mixing the tag storage memory with normal memory and
> adding it to MIGRATE_CMA. But since tag storage memory cannot be tagged,
> this means that it's not enough anymore to have a __GFP_MOVABLE allocation
> request to use MIGRATE_CMA.
> 
> I considered two solutions to this problem:
> 
> 1. Only allocate from MIGRATE_CMA if the requested memory is not tagged =>
> this effectively means transforming all memory from MIGRATE_CMA into the
> MIGRATE_METADATA migratetype that the series introduces. Not very
> appealing, because that means treating normal memory that is also on the
> MIGRATE_CMA lists as tagged memory.

That's indeed not ideal. We could try this if it makes the patches
significantly simpler, though I'm not so sure.

Allocating metadata is the easier part as we know the correspondence
from the tagged pages (32 PROT_MTE pages) to the metadata page (1 tag
storage page), so alloc_contig_range() does this for us. Just adding it
to the CMA range is sufficient.

However, making sure that we don't allocate PROT_MTE pages from the
metadata range is what led us to another migrate type. I guess we could
achieve something similar with a new zone or a CPU-less NUMA node,
though the latter is not guaranteed not to allocate memory from the
range, only make it less likely. Both these options are less flexible in
terms of size/alignment/placement.

Maybe as a quick hack - only allow PROT_MTE from ZONE_NORMAL and
configure the metadata range in ZONE_MOVABLE but at some point I'd
expect some CXL-attached memory to support MTE with additional carveout
reserved.

To recap, in this series, a PROT_MTE page allocation starts with a
typical allocation from anywhere other than MIGRATE_METADATA followed by
the hooks to reserve the corresponding metadata range at (pfn * 128 +
offset) for a 4K page. The whole metadata page is reserved, so the
adjacent 31 pages around the original allocation can also be mapped as
PROT_MTE.
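[Editor's note: the correspondence described above is simple arithmetic; a
hedged sketch follows, where data_base/tag_base are made-up names for the
start pfns of the data and tag storage regions.]

#include <linux/mm.h>

/* For a 4K page, PAGE_SIZE / 32 = 128 bytes of tag storage, i.e. one
 * tag storage page covers 32 data pages, as described above. */
#define TAG_BYTES_PER_PAGE      (PAGE_SIZE / 32)

static unsigned long data_pfn_to_tag_pfn(unsigned long data_pfn,
                                         unsigned long data_base,
                                         unsigned long tag_base)
{
        unsigned long tag_bytes = (data_pfn - data_base) * TAG_BYTES_PER_PAGE;

        return tag_base + tag_bytes / PAGE_SIZE;
}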

(Peter and Evgenii @ Google had a slightly different approach in their
prototype: separate free_area[] array for PROT_MTE pages; while it has
some advantages, I found it more intrusive since the same page can be on
one free_area/free_list or another)

> 2. Keep track of which pages are tag storage at page granularity (either by
> a page flag, or by checking that the pfn falls in one of the tag storage
> regions, or by some other mechanism). When the page allocator takes free
> pages from the MIGRATE_METADATA list to satisfy an allocation, compare the
> gfp mask with the page type, and if the allocation is tagged and the page
> is a tag storage page, put it back at the tail of the free list and choose
> the next page. Repeat until the page allocator finds a normal memory page
> that can be tagged (some refinements obviously needed to avoid
> infinite loops).
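[Editor's note: for reference, the scan described in option 2 would look
roughly like the bounded loop below. Illustrative only: page_is_tag_storage()
and MAX_SCAN are assumptions, and the bound is one of the refinements needed
against the infinite loop mentioned above.]

#include <linux/list.h>
#include <linux/mm.h>

#define MAX_SCAN        128     /* hypothetical bound on the rotation */

static struct page *take_free_page(struct list_head *free_list, bool want_tagged)
{
        int loops;

        for (loops = 0; loops < MAX_SCAN && !list_empty(free_list); loops++) {
                struct page *page = list_first_entry(free_list, struct page, lru);

                if (want_tagged && page_is_tag_storage(page)) {
                        /* put it back at the tail and try the next page */
                        list_move_tail(&page->lru, free_list);
                        continue;
                }
                list_del(&page->lru);
                return page;
        }
        return NULL;    /* fall back to another free list */
}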

With large enough CMA areas, there's a real risk of latency spikes, RCU
stalls etc. Not really keen on such heuristics.

-- 
Catalin


Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

2023-09-13 Thread Kuan-Ying Lee (李冠穎)
On Wed, 2023-08-23 at 14:13 +0100, Alexandru Elisei wrote:
> Introduction
> ============
> 
> Arm has implemented memory coloring in hardware, and the feature is called
> Memory Tagging Extensions (MTE). It works by embedding a 4 bit tag in bits
> 59..56 of a pointer, and storing this tag to a reserved memory location.
> When the pointer is dereferenced, the hardware compares the tag embedded
> in the pointer (logical tag) with the tag stored in memory (allocation
> tag).
> 
> The relation between memory and where the tag for that memory is stored
> is static.
> 
> The memory where the tags are stored has been so far inaccessible to
> Linux. This series aims to change that, by adding support for using the
> tag storage memory only as data memory; tag storage memory cannot be
> itself tagged.
> 
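[Editor's note: the logical tag described above occupies bits 59..56 of the
pointer and can be manipulated in plain C; a minimal illustration, not code
from the series.]

#include <stdint.h>

/* Extract the 4-bit logical tag from bits 59..56 of a pointer. */
static inline uint8_t mte_logical_tag(const void *ptr)
{
        return ((uintptr_t)ptr >> 56) & 0xf;
}

/* Build a pointer carrying the given logical tag in bits 59..56. */
static inline void *mte_tagged_ptr(void *ptr, uint8_t tag)
{
        uintptr_t p = (uintptr_t)ptr & ~((uintptr_t)0xf << 56);

        return (void *)(p | ((uintptr_t)(tag & 0xf) << 56));
}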
> 
> Implementation
> ==============
> 
> The series is based on v6.5-rc3 with these two patches cherry picked:
> 
> - mm: Call arch_swap_restore() from unuse_pte():
> 
>   https://lore.kernel.org/all/20230523004312.1807357-3-...@google.com/
> 
> - arm64: mte: Simplify swap tag restoration logic:
> 
>   https://lore.kernel.org/all/20230523004312.1807357-4-...@google.com/
> 
> The above two patches are queued for the v6.6 merge window:
> 
>   https://lore.kernel.org/all/20230702123821.04e64ea2c04dd0fdc947b...@linux-foundation.org/
> 
> The entire series, including the above patches, can be cloned with:
> 
> $ git clone https://gitlab.arm.com/linux-arm/linux-ae.git \
>   -b arm-mte-dynamic-carveout-rfc-v1
> 
> On the arm64 architecture side, an extension is being worked on that will
> clarify how MTE tag storage reuse should behave. The extension will be
> made public soon.
> 
> On the Linux side, MTE tag storage reuse is accomplished with the
> following changes:
> 
> 1. The tag storage memory is exposed to the memory allocator as a new
> migratetype, MIGRATE_METADATA. It behaves similarly to MIGRATE_CMA, with
> the restriction that it cannot be used to allocate tagged memory (tag
> storage memory cannot be tagged). On tagged page allocation, the
> corresponding tag storage is reserved via alloc_contig_range().
> 
> 2. mprotect(PROT_MTE) is implemented by changing the pte prot to
> PAGE_METADATA_NONE. When the page is next accessed, a fault is taken and
> the corresponding tag storage is reserved (see the sketch after this
> list).
> 
> 3. When the code tries to copy tags to a page which doesn't have the tag
> storage reserved, the tags are copied to an xarray and restored in
> set_pte_at(), when the page is eventually mapped with the tag storage
> reserved.
> 
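[Editor's note: change 2 above implies a fault-time flow roughly like the
sketch below. reserve_tag_storage() and the exact fault plumbing are
assumptions based on the description, not the series' actual code.]

#include <linux/mm.h>

/* Hedged sketch of change 2: on a fault against a PAGE_METADATA_NONE
 * pte, reserve the tag storage, then install the real protection. */
static vm_fault_t handle_metadata_none_fault(struct vm_fault *vmf)
{
        if (reserve_tag_storage(vmf->page))     /* alloc_contig_range() inside */
                return VM_FAULT_OOM;

        /* tag storage is now backed; map the page with its real prot */
        set_pte_at(vmf->vma->vm_mm, vmf->address, vmf->pte,
                   pte_modify(*vmf->pte, vmf->vma->vm_page_prot));
        return 0;
}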
> KVM support has not been implemented yet; that is because a non-MTE
> enabled VMA can back the memory of an MTE-enabled VM. After there is a
> consensus on the right approach on the memory management support, I will
> add it.
> 
> Explanations for the last two changes follow. The gist of it is that
> they were added mostly because of races, and it is my intention to make
> the code more robust.
> 
> PAGE_METADATA_NONE was introduced to avoid races with mprotect(PROT_MTE).
> For example, migration can race with mprotect(PROT_MTE):
> - thread 0 initiates migration for a page in a non-MTE enabled VMA and a
>   destination page is allocated without tag storage.
> - thread 1 handles an mprotect(PROT_MTE), the VMA becomes tagged, and an
>   access turns the source page that is in the process of being migrated
>   into a tagged page.
> - thread 0 finishes migration and the destination page is mapped as
>   tagged, but without tag storage reserved.
> More details and examples can be found in the patches.
> 
> This race is also related to how tag restoring is handled when tag
> storage is missing: when a tagged page is swapped out, the tags are saved
> in an xarray indexed by swp_entry.val. When a page is swapped back in, if
> there are tags corresponding to the swp_entry that the page will replace,
> the tags are unconditionally restored, even if the page will be mapped as
> untagged. Because the page will be mapped as untagged, tag storage was
> not reserved when the page was allocated to replace the swp_entry which
> has tags associated with it.
> 
> To get around this, save the tags in a new xarray, this time indexed by
> pfn, and restore them when the same page is mapped as tagged.
> 
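[Editor's note: the pfn-indexed xarray can be sketched with the stock xarray
API; the helper names below are assumptions, and the management of the saved
tags buffer itself is omitted.]

#include <linux/xarray.h>

static DEFINE_XARRAY(mte_tags_by_pfn);

/* Save tags for a page that has no tag storage reserved yet. */
static int mte_save_tags_for_pfn(unsigned long pfn, void *tags)
{
        /* xa_store() returns the old entry or an xa_err()-encoded error */
        return xa_err(xa_store(&mte_tags_by_pfn, pfn, tags, GFP_KERNEL));
}

/* Remove and return the saved tags once the page is mapped as tagged. */
static void *mte_take_tags_for_pfn(unsigned long pfn)
{
        return xa_erase(&mte_tags_by_pfn, pfn);
}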
> This also solves another race, this time with copy_highpage. In the
> scenario where migration races with mprotect(PROT_MTE), before the page
> is mapped, the contents of the source page are copied to the destination.
> And this includes tags, which will be copied to a page with missing tag
> storage, which can lead to data corruption if the missing tag storage is
> in use for data. So copy_highpage() has received a similar treatment to
> the swap code, and the source tags are copied in the xarray indexed by
> the destination page pfn.
> 
> 
> Overview of the patches
> =======================
> 
> Patches 1-3 do some preparatory work by renaming [...]

Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

2023-09-13 Thread Catalin Marinas
On Mon, Sep 11, 2023 at 02:29:03PM +0200, David Hildenbrand wrote:
> On 11.09.23 13:52, Catalin Marinas wrote:
> > On Wed, Sep 06, 2023 at 12:23:21PM +0100, Alexandru Elisei wrote:
> > > On Thu, Aug 24, 2023 at 04:24:30PM +0100, Catalin Marinas wrote:
> > > > On Thu, Aug 24, 2023 at 01:25:41PM +0200, David Hildenbrand wrote:
> > > > > On 24.08.23 13:06, David Hildenbrand wrote:
> > > > > > Regarding one complication: "The kernel needs to know where to 
> > > > > > allocate
> > > > > > a PROT_MTE page from or migrate a current page if it becomes 
> > > > > > PROT_MTE
> > > > > > (mprotect()) and the range it is in does not support tagging.",
> > > > > > simplified handling would be if it's in a MIGRATE_CMA pageblock, it
> > > > > > doesn't support tagging. You have to migrate to a !CMA page (for
> > > > > > example, not specifying GFP_MOVABLE as a quick way to achieve that).
> > > > > 
> > > > > Okay, I now realize that this patch set effectively duplicates some 
> > > > > CMA
> > > > > behavior using a new migrate-type.
> > [...]
> > > I considered mixing the tag storage memory with normal memory and
> > > adding it to MIGRATE_CMA. But since tag storage memory cannot be tagged,
> > > this means that it's not enough anymore to have a __GFP_MOVABLE allocation
> > > request to use MIGRATE_CMA.
> > > 
> > > I considered two solutions to this problem:
> > > 
> > > 1. Only allocate from MIGRATE_CMA if the requested memory is not tagged =>
> > > this effectively means transforming all memory from MIGRATE_CMA into the
> > > MIGRATE_METADATA migratetype that the series introduces. Not very
> > > appealing, because that means treating normal memory that is also on the
> > > MIGRATE_CMA lists as tagged memory.
> > 
> > That's indeed not ideal. We could try this if it makes the patches
> > significantly simpler, though I'm not so sure.
> > 
> > Allocating metadata is the easier part as we know the correspondence
> > from the tagged pages (32 PROT_MTE pages) to the metadata page (1 tag
> > storage page), so alloc_contig_range() does this for us. Just adding it
> > to the CMA range is sufficient.
> > 
> > However, making sure that we don't allocate PROT_MTE pages from the
> > metadata range is what led us to another migrate type. I guess we could
> > achieve something similar with a new zone or a CPU-less NUMA node,
> 
> Ideally, no significant core-mm changes to optimize for an architecture
> oddity. That implies, no new zones and no new migratetypes -- unless it is
> unavoidable and you are confident that you can convince core-MM people that
> the use case (giving back 3% of system RAM at max in some setups) is worth
> the trouble.

If I was an mm maintainer, I'd also question this ;). But vendors seem
pretty picky about the amount of RAM reserved for MTE (e.g. 0.5G for a
16G platform does look somewhat big). As more and more apps adopt MTE,
the wastage would be smaller but the first step is getting vendors to
enable it.

> I also had CPU-less NUMA nodes in mind when thinking about that, but not
> sure how easy it would be to integrate it. If the tag memory has actually
> different performance characteristics as well, a NUMA node would be the
> right choice.

In general I'd expect the same characteristics. However, changing the
memory designation from tag to data (and vice-versa) requires some cache
maintenance. The allocation cost is slightly higher (not the runtime
one), so it would help if the page allocator does not favour this range.
Anyway, that's an optimisation to worry about later.

> If we could find some way to easily support this either via CMA or CPU-less
> NUMA nodes, that would be much preferable; even if we cannot cover each and
> every future use case right now. I expect some issues with CXL+MTE either
> way, but am happy to be taught otherwise :)

I think CXL+MTE is rather theoretical at the moment. Given that PCIe
doesn't have any notion of MTE, more likely there would be some piece of
interconnect that generates two memory accesses: one for data and the
other for tags at a configurable offset (which may or may not be in the
same CXL range).

> Another thought I had was adding something like CMA memory characteristics.
> Like, asking if a given CMA area/page supports tagging (i.e., flag for the
> CMA area set?)?

I don't think adding CMA memory characteristics helps much. The metadata
allocation wouldn't go through cma_alloc() but rather
alloc_contig_range() directly for a specific pfn corresponding to the
data pages with PROT_MTE. The core mm code doesn't need to know about
the tag storage layout.

It's also unlikely for cma_alloc() memory to be mapped as PROT_MTE.
That's typically coming from device drivers (DMA API) with their own
mmap() implementation that doesn't normally set VM_MTE_ALLOWED (and
therefore PROT_MTE is rejected).

What we need though is to prevent vma_alloc_folio() from allocating from
a MIGRATE_CMA list if PROT_MTE (VM_MTE). I guess that's basically
removing __GFP_MOVABLE [...]
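[Editor's note: a gfp-mask filter along those lines could be as small as the
sketch below. Assumptions: a VM_MTE check at allocation time;
vma_mte_gfp_mask() is a made-up helper, not the series' code.]

#include <linux/mm.h>

static inline gfp_t vma_mte_gfp_mask(struct vm_area_struct *vma, gfp_t gfp)
{
        if (vma->vm_flags & VM_MTE)     /* PROT_MTE mapping */
                gfp &= ~__GFP_MOVABLE;  /* stay off MIGRATE_CMA/MOVABLE lists */
        return gfp;
}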

Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

2023-09-14 Thread Catalin Marinas
Hi Kuan-Ying,

On Wed, Sep 13, 2023 at 08:11:40AM +0000, Kuan-Ying Lee (李冠穎) wrote:
> On Wed, 2023-08-23 at 14:13 +0100, Alexandru Elisei wrote:
> > diff --git a/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> > b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> > index 60472d65a355..bd050373d6cf 100644
> > --- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> > +++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> > @@ -165,10 +165,28 @@ C1_L2: l2-cache1 {
> > };
> > };
> >  
> > -   memory@80000000 {
> > +   memory0: memory@80000000 {
> > device_type = "memory";
> > -   reg = <0x00000000 0x80000000 0 0x80000000>,
> > - <0x00000008 0x80000000 0 0x80000000>;
> > +   reg = <0x00 0x80000000 0x00 0x7c000000>;
> > +   };
> > +
> > +   metadata0: metadata@c0000000  {
> > +   compatible = "arm,mte-tag-storage";
> > +   reg = <0x00 0xfc000000 0x00 0x3e00000>;
> > +   block-size = <0x1000>;
> > +   memory = <&memory0>;
> > +   };
> > +
> > +   memory1: memory@880000000 {
> > +   device_type = "memory";
> > +   reg = <0x08 0x80000000 0x00 0x7c000000>;
> > +   };
> > +
> > +   metadata1: metadata@8c0000000  {
> > +   compatible = "arm,mte-tag-storage";
> > +   reg = <0x08 0xfc000000 0x00 0x3e00000>;
> > +   block-size = <0x1000>;
> > +   memory = <&memory1>;
> > };
> >  
> 
> AFAIK, the above memory configuration means that there are two regions
> of DRAM (0x80000000-0xfc000000 and 0x8_80000000-0x8_fc000000) and this
> is called the PDD memory map.
> 
> Document [1] says there are some constraints on tag memory, as below.
> 
> | The following constraints apply to the tag regions in DRAM:
> | 1. The tag region cannot be interleaved with the data region.
> | The tag region must also be above the data region within DRAM.
> |
> | 2. The tag region in the physical address space cannot straddle
> | multiple regions of a memory map.
> |
> | PDD memory map is not allowed to have part of the tag region between
> | 2GB-4GB and another part between 34GB-64GB.
> 
> I'm not sure if we can separate tag memory with the above
> configuration. Or do I miss something?
> 
> [1] https://developer.arm.com/documentation/101569/0300/?lang=en
> (Section 5.4.6.1)

Good point, thanks. The above dts is some random layout we picked as an
example; it doesn't match any real hardware and we didn't pay attention
to the interconnect limitations (we fake the tag storage on the model).

I'll try to dig out how the mtu_tag_addr_shutter registers work and how
the sparse DRAM space is compressed to a smaller tag range. But that's
something done by firmware and the kernel only learns the tag storage
location from the DT (provided by firmware). We also don't need to know
the fine-grained mapping between 32 bytes of data and 1 byte (2 tags) in
the tag storage, only the block size in the tag storage space that
covers all interleaving done by the interconnect (it can be from 1 byte
to something larger like a page; the kernel will then use the lowest
common multiple between a page size and this tag block size to figure
out how many pages to reserve).
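[Editor's note: the lowest-common-multiple rule described above maps directly
onto the kernel's lcm() helper; a sketch, assuming block_size comes from the
DT block-size property shown earlier.]

#include <linux/lcm.h>
#include <linux/mm.h>

/* How many pages to reserve at once so the reservation covers whole
 * tag blocks, per the lowest-common-multiple rule described above. */
static unsigned long tag_reserve_granule_pages(unsigned long block_size)
{
        /* e.g. 4K pages with block-size = <0x1000> -> 1 page per granule */
        return lcm(PAGE_SIZE, block_size) / PAGE_SIZE;
}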

-- 
Catalin


Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

2023-10-24 Thread Hyesoo Yu
On Wed, Sep 13, 2023 at 04:29:25PM +0100, Catalin Marinas wrote:
> On Mon, Sep 11, 2023 at 02:29:03PM +0200, David Hildenbrand wrote:
> > On 11.09.23 13:52, Catalin Marinas wrote:
> > > On Wed, Sep 06, 2023 at 12:23:21PM +0100, Alexandru Elisei wrote:
> > > > On Thu, Aug 24, 2023 at 04:24:30PM +0100, Catalin Marinas wrote:
> > > > > On Thu, Aug 24, 2023 at 01:25:41PM +0200, David Hildenbrand wrote:
> > > > > > On 24.08.23 13:06, David Hildenbrand wrote:
> > > > > > > Regarding one complication: "The kernel needs to know where to 
> > > > > > > allocate
> > > > > > > a PROT_MTE page from or migrate a current page if it becomes 
> > > > > > > PROT_MTE
> > > > > > > (mprotect()) and the range it is in does not support tagging.",
> > > > > > > simplified handling would be if it's in a MIGRATE_CMA pageblock, 
> > > > > > > it
> > > > > > > doesn't support tagging. You have to migrate to a !CMA page (for
> > > > > > > example, not specifying GFP_MOVABLE as a quick way to achieve 
> > > > > > > that).
> > > > > > 
> > > > > > Okay, I now realize that this patch set effectively duplicates some 
> > > > > > CMA
> > > > > > behavior using a new migrate-type.
> > > [...]
> > > > I considered mixing the tag storage memory with normal memory and
> > > > adding it to MIGRATE_CMA. But since tag storage memory cannot be tagged,
> > > > this means that it's not enough anymore to have a __GFP_MOVABLE 
> > > > allocation
> > > > request to use MIGRATE_CMA.
> > > > 
> > > > I considered two solutions to this problem:
> > > > 
> > > > 1. Only allocate from MIGRATE_CMA if the requested memory is not tagged
> > > > =>
> > > > this effectively means transforming all memory from MIGRATE_CMA into the
> > > > MIGRATE_METADATA migratetype that the series introduces. Not very
> > > > appealing, because that means treating normal memory that is also on the
> > > > MIGRATE_CMA lists as tagged memory.
> > > 
> > > That's indeed not ideal. We could try this if it makes the patches
> > > significantly simpler, though I'm not so sure.
> > > 
> > > Allocating metadata is the easier part as we know the correspondence
> > > from the tagged pages (32 PROT_MTE pages) to the metadata page (1 tag
> > > storage page), so alloc_contig_range() does this for us. Just adding it
> > > to the CMA range is sufficient.
> > > 
> > > However, making sure that we don't allocate PROT_MTE pages from the
> > > metadata range is what led us to another migrate type. I guess we could
> > > achieve something similar with a new zone or a CPU-less NUMA node,
> > 
> > Ideally, no significant core-mm changes to optimize for an architecture
> > oddity. That implies, no new zones and no new migratetypes -- unless it is
> > unavoidable and you are confident that you can convince core-MM people that
> > the use case (giving back 3% of system RAM at max in some setups) is worth
> > the trouble.
> 
> If I was an mm maintainer, I'd also question this ;). But vendors seem
> pretty picky about the amount of RAM reserved for MTE (e.g. 0.5G for a
> 16G platform does look somewhat big). As more and more apps adopt MTE,
> the wastage would be smaller but the first step is getting vendors to
> enable it.
> 
> > I also had CPU-less NUMA nodes in mind when thinking about that, but not
> > sure how easy it would be to integrate it. If the tag memory has actually
> > different performance characteristics as well, a NUMA node would be the
> > right choice.
> 
> In general I'd expect the same characteristics. However, changing the
> memory designation from tag to data (and vice-versa) requires some cache
> maintenance. The allocation cost is slightly higher (not the runtime
> one), so it would help if the page allocator does not favour this range.
> Anyway, that's an optimisation to worry about later.
> 
> > If we could find some way to easily support this either via CMA or CPU-less
> > NUMA nodes, that would be much preferable; even if we cannot cover each and
> > every future use case right now. I expect some issues with CXL+MTE either
> > way, but am happy to be taught otherwise :)
> 
> I think CXL+MTE is rather theoretical at the moment. Given that PCIe
> doesn't have any notion of MTE, more likely there would be some piece of
> interconnect that generates two memory accesses: one for data and the
> other for tags at a configurable offset (which may or may not be in the
> same CXL range).
> 
> > Another thought I had was adding something like CMA memory characteristics.
> > Like, asking if a given CMA area/page supports tagging (i.e., flag for the
> > CMA area set?)?
> 
> I don't think adding CMA memory characteristics helps much. The metadata
> allocation wouldn't go through cma_alloc() but rather
> alloc_contig_range() directly for a specific pfn corresponding to the
> data pages with PROT_MTE. The core mm code doesn't need to know about
> the tag storage layout.
> 
> It's also unlikely for cma_alloc() memory to be mapped as PROT_MTE.
> That's typically coming from device drivers (DMA API) with their own
> mmap() implementation that doesn't normally set VM_MTE_ALLOWED (and
> therefore PROT_MTE is rejected). [...]

Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

2023-10-25 Thread Alexandru Elisei
Hi,

On Wed, Oct 25, 2023 at 11:59:32AM +0900, Hyesoo Yu wrote:
> On Wed, Sep 13, 2023 at 04:29:25PM +0100, Catalin Marinas wrote:
> > On Mon, Sep 11, 2023 at 02:29:03PM +0200, David Hildenbrand wrote:
> > > On 11.09.23 13:52, Catalin Marinas wrote:
> > > > On Wed, Sep 06, 2023 at 12:23:21PM +0100, Alexandru Elisei wrote:
> > > > > On Thu, Aug 24, 2023 at 04:24:30PM +0100, Catalin Marinas wrote:
> > > > > > On Thu, Aug 24, 2023 at 01:25:41PM +0200, David Hildenbrand wrote:
> > > > > > > On 24.08.23 13:06, David Hildenbrand wrote:
> > > > > > > > Regarding one complication: "The kernel needs to know where to 
> > > > > > > > allocate
> > > > > > > > a PROT_MTE page from or migrate a current page if it becomes 
> > > > > > > > PROT_MTE
> > > > > > > > (mprotect()) and the range it is in does not support tagging.",
> > > > > > > > simplified handling would be if it's in a MIGRATE_CMA 
> > > > > > > > pageblock, it
> > > > > > > > doesn't support tagging. You have to migrate to a !CMA page (for
> > > > > > > > example, not specifying GFP_MOVABLE as a quick way to achieve 
> > > > > > > > that).
> > > > > > > 
> > > > > > > Okay, I now realize that this patch set effectively duplicates 
> > > > > > > some CMA
> > > > > > > behavior using a new migrate-type.
> > > > [...]
> > > > > I considered mixing the tag storage memory with normal
> > > > > and
> > > > > adding it to MIGRATE_CMA. But since tag storage memory cannot be 
> > > > > tagged,
> > > > > this means that it's not enough anymore to have a __GFP_MOVABLE 
> > > > > allocation
> > > > > request to use MIGRATE_CMA.
> > > > > 
> > > > > I considered two solutions to this problem:
> > > > > 
> > > > > 1. Only allocate from MIGRATE_CMA if the requested memory is not
> > > > > tagged =>
> > > > > this effectively means transforming all memory from MIGRATE_CMA into 
> > > > > the
> > > > > MIGRATE_METADATA migratetype that the series introduces. Not very
> > > > > appealing, because that means treating normal memory that is also on 
> > > > > the
> > > > > MIGRATE_CMA lists as tagged memory.
> > > > 
> > > > That's indeed not ideal. We could try this if it makes the patches
> > > > significantly simpler, though I'm not so sure.
> > > > 
> > > > Allocating metadata is the easier part as we know the correspondence
> > > > from the tagged pages (32 PROT_MTE pages) to the metadata page (1 tag
> > > > storage page), so alloc_contig_range() does this for us. Just adding it
> > > > to the CMA range is sufficient.
> > > > 
> > > > However, making sure that we don't allocate PROT_MTE pages from the
> > > > metadata range is what led us to another migrate type. I guess we could
> > > > achieve something similar with a new zone or a CPU-less NUMA node,
> > > 
> > > Ideally, no significant core-mm changes to optimize for an architecture
> > > oddity. That implies, no new zones and no new migratetypes -- unless it is
> > > unavoidable and you are confident that you can convince core-MM people 
> > > that
> > > the use case (giving back 3% of system RAM at max in some setups) is worth
> > > the trouble.
> > 
> > If I was an mm maintainer, I'd also question this ;). But vendors seem
> > pretty picky about the amount of RAM reserved for MTE (e.g. 0.5G for a
> > 16G platform does look somewhat big). As more and more apps adopt MTE,
> > the wastage would be smaller but the first step is getting vendors to
> > enable it.
> > 
> > > I also had CPU-less NUMA nodes in mind when thinking about that, but not
> > > sure how easy it would be to integrate it. If the tag memory has actually
> > > different performance characteristics as well, a NUMA node would be the
> > > right choice.
> > 
> > In general I'd expect the same characteristics. However, changing the
> > memory designation from tag to data (and vice-versa) requires some cache
> > maintenance. The allocation cost is slightly higher (not the runtime
> > one), so it would help if the page allocator does not favour this range.
> > Anyway, that's an optimisation to worry about later.
> > 
> > > If we could find some way to easily support this either via CMA or 
> > > CPU-less
> > > NUMA nodes, that would be much preferable; even if we cannot cover each 
> > > and
> > > every future use case right now. I expect some issues with CXL+MTE either
> > > way, but am happy to be taught otherwise :)
> > 
> > I think CXL+MTE is rather theoretical at the moment. Given that PCIe
> > doesn't have any notion of MTE, more likely there would be some piece of
> > interconnect that generates two memory accesses: one for data and the
> > other for tags at a configurable offset (which may or may not be in the
> > same CXL range).
> > 
> > > Another thought I had was adding something like CMA memory 
> > > characteristics.
> > > Like, asking if a given CMA area/page supports tagging (i.e., flag for the
> > > CMA area set?)?
> > 
> > I don't think adding CMA memory characteristics helps much. The metadata
> > allocation wouldn't go through cma_alloc() but rather
> > alloc_contig_range() directly for a specific pfn corresponding to the
> > data pages with PROT_MTE. [...]

Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

2023-10-25 Thread Hyesoo Yu
On Wed, Oct 25, 2023 at 09:47:36AM +0100, Alexandru Elisei wrote:
> Hi,
> 
> On Wed, Oct 25, 2023 at 11:59:32AM +0900, Hyesoo Yu wrote:
> > On Wed, Sep 13, 2023 at 04:29:25PM +0100, Catalin Marinas wrote:
> > > On Mon, Sep 11, 2023 at 02:29:03PM +0200, David Hildenbrand wrote:
> > > > On 11.09.23 13:52, Catalin Marinas wrote:
> > > > > On Wed, Sep 06, 2023 at 12:23:21PM +0100, Alexandru Elisei wrote:
> > > > > > On Thu, Aug 24, 2023 at 04:24:30PM +0100, Catalin Marinas wrote:
> > > > > > > On Thu, Aug 24, 2023 at 01:25:41PM +0200, David Hildenbrand wrote:
> > > > > > > > On 24.08.23 13:06, David Hildenbrand wrote:
> > > > > > > > > Regarding one complication: "The kernel needs to know where 
> > > > > > > > > to allocate
> > > > > > > > > a PROT_MTE page from or migrate a current page if it becomes 
> > > > > > > > > PROT_MTE
> > > > > > > > > (mprotect()) and the range it is in does not support 
> > > > > > > > > tagging.",
> > > > > > > > > simplified handling would be if it's in a MIGRATE_CMA 
> > > > > > > > > pageblock, it
> > > > > > > > > doesn't support tagging. You have to migrate to a !CMA page 
> > > > > > > > > (for
> > > > > > > > > example, not specifying GFP_MOVABLE as a quick way to achieve 
> > > > > > > > > that).
> > > > > > > > 
> > > > > > > > Okay, I now realize that this patch set effectively duplicates 
> > > > > > > > some CMA
> > > > > > > > behavior using a new migrate-type.
> > > > > [...]
> > > > > > I considered mixing the tag storage memory with normal
> > > > > > memory and
> > > > > > adding it to MIGRATE_CMA. But since tag storage memory cannot be 
> > > > > > tagged,
> > > > > > this means that it's not enough anymore to have a __GFP_MOVABLE 
> > > > > > allocation
> > > > > > request to use MIGRATE_CMA.
> > > > > > 
> > > > > > I considered two solutions to this problem:
> > > > > > 
> > > > > > 1. Only allocate from MIGRATE_CMA if the requested memory is not
> > > > > > tagged =>
> > > > > > this effectively means transforming all memory from MIGRATE_CMA 
> > > > > > into the
> > > > > > MIGRATE_METADATA migratetype that the series introduces. Not very
> > > > > > appealing, because that means treating normal memory that is also 
> > > > > > on the
> > > > > > MIGRATE_CMA lists as tagged memory.
> > > > > 
> > > > > That's indeed not ideal. We could try this if it makes the patches
> > > > > significantly simpler, though I'm not so sure.
> > > > > 
> > > > > Allocating metadata is the easier part as we know the correspondence
> > > > > from the tagged pages (32 PROT_MTE pages) to the metadata page (1 tag
> > > > > storage page), so alloc_contig_range() does this for us. Just adding 
> > > > > it
> > > > > to the CMA range is sufficient.
> > > > > 
> > > > > However, making sure that we don't allocate PROT_MTE pages from the
> > > > > metadata range is what led us to another migrate type. I guess we 
> > > > > could
> > > > > achieve something similar with a new zone or a CPU-less NUMA node,
> > > > 
> > > > Ideally, no significant core-mm changes to optimize for an architecture
> > > > oddity. That implies, no new zones and no new migratetypes -- unless it 
> > > > is
> > > > unavoidable and you are confident that you can convince core-MM people 
> > > > that
> > > > the use case (giving back 3% of system RAM at max in some setups) is 
> > > > worth
> > > > the trouble.
> > > 
> > > If I was an mm maintainer, I'd also question this ;). But vendors seem
> > > pretty picky about the amount of RAM reserved for MTE (e.g. 0.5G for a
> > > 16G platform does look somewhat big). As more and more apps adopt MTE,
> > > the wastage would be smaller but the first step is getting vendors to
> > > enable it.
> > > 
> > > > I also had CPU-less NUMA nodes in mind when thinking about that, but not
> > > > sure how easy it would be to integrate it. If the tag memory has 
> > > > actually
> > > > different performance characteristics as well, a NUMA node would be the
> > > > right choice.
> > > 
> > > In general I'd expect the same characteristics. However, changing the
> > > memory designation from tag to data (and vice-versa) requires some cache
> > > maintenance. The allocation cost is slightly higher (not the runtime
> > > one), so it would help if the page allocator does not favour this range.
> > > Anyway, that's an optimisation to worry about later.
> > > 
> > > > If we could find some way to easily support this either via CMA or 
> > > > CPU-less
> > > > NUMA nodes, that would be much preferable; even if we cannot cover each 
> > > > and
> > > > every future use case right now. I expect some issues with CXL+MTE 
> > > > either
> > > > way, but am happy to be taught otherwise :)
> > > 
> > > I think CXL+MTE is rather theoretical at the moment. Given that PCIe
> > > doesn't have any notion of MTE, more likely there would be some piece of
> > > interconnect that generates two memory accesses: one for data and the
> > > other for tags at a configurable offset (which may or may not be in the
> > > same CXL range). [...]

Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

2023-10-27 Thread Catalin Marinas
On Wed, Oct 25, 2023 at 05:52:58PM +0900, Hyesoo Yu wrote:
> If we only avoid using ALLOC_CMA for __GFP_TAGGED, would we still be able
> to use the next iteration even if the hardware does not support "tag of
> tag"?

It depends on what the next iteration looks like. The plan was not to
support this so that we avoid another complication where a non-tagged
page is mprotect'ed to become tagged and it would need to be migrated
out of the CMA range. Not sure how much code it would save.

> I am not sure every vendor will support tag of tag, since there is no 
> information
> related to that feature, like in the Google spec document.

If you are aware of any vendors not supporting this, please direct them
to the Arm support team, it would be very useful information for us.

Thanks.

-- 
Catalin