Re: [RFC 0/9] Linear Address Masking enabling

2021-02-07 Thread Kirill A. Shutemov
On Sun, Feb 07, 2021 at 09:24:23AM +0100, Dmitry Vyukov wrote:
> On Fri, Feb 5, 2021 at 4:16 PM Kirill A. Shutemov
>  wrote:
> >
> > Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > 64-bit linear addresses, allowing software to use the untranslated
> > address bits for metadata.
> >
> > The patchset brings support for LAM for userspace addresses.
> >
> > The most sensitive part of the enabling is the change in tlb.c, where the
> > CR3 flags get set. Please take a look and check that what I'm doing makes
> > sense.
> >
> > The patchset is RFC quality and the code requires more testing before it
> > can be applied.
> >
> > The userspace API is not finalized yet. The patchset extends the API used
> > by ARM64: PR_GET/SET_TAGGED_ADDR_CTRL. The API is adjusted to not imply
> > ARM TBI: it now allows requesting the number of metadata bits needed and
> > reports where these bits are located in the address.
> >
> > There's an alternative proposal[2] for the API based on the Intel CET
> > interface. Please let us know if you prefer one over the other.
> >
> > The feature competes for bits with 5-level paging: LAM_U48 makes it
> > impossible to map anything above 47 bits. The patchset makes these
> > capabilities mutually exclusive: whichever is used first wins. LAM_U57
> > can be combined with mappings above 47 bits.
> >
> > I include a QEMU patch in case somebody wants to play with the feature.
> 
> Exciting! Do you plan to send the QEMU patch to QEMU?

Sure. After more testing, once I'm sure it conforms to the hardware.

-- 
 Kirill A. Shutemov


Re: [RFC 0/9] Linear Address Masking enabling

2021-02-07 Thread Dmitry Vyukov
On Fri, Feb 5, 2021 at 4:16 PM Kirill A. Shutemov
 wrote:
>
> Linear Address Masking[1] (LAM) modifies the checking that is applied to
> 64-bit linear addresses, allowing software to use the untranslated
> address bits for metadata.
>
> The patchset brings support for LAM for userspace addresses.
>
> The most sensitive part of the enabling is the change in tlb.c, where the
> CR3 flags get set. Please take a look and check that what I'm doing makes
> sense.
>
> The patchset is RFC quality and the code requires more testing before it
> can be applied.
>
> The userspace API is not finalized yet. The patchset extends the API used
> by ARM64: PR_GET/SET_TAGGED_ADDR_CTRL. The API is adjusted to not imply
> ARM TBI: it now allows requesting the number of metadata bits needed and
> reports where these bits are located in the address.
>
> There's an alternative proposal[2] for the API based on the Intel CET
> interface. Please let us know if you prefer one over the other.
>
> The feature competes for bits with 5-level paging: LAM_U48 makes it
> impossible to map anything above 47 bits. The patchset makes these
> capabilities mutually exclusive: whichever is used first wins. LAM_U57
> can be combined with mappings above 47 bits.
>
> I include a QEMU patch in case somebody wants to play with the feature.

Exciting! Do you plan to send the QEMU patch to QEMU?

> The branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam
>
> Any comments are welcome.
>
> [1] ISE, Chapter 14. 
> https://software.intel.com/content/dam/develop/external/us/en/documents-tps/architecture-instruction-set-extensions-programming-reference.pdf
> [2] 
> https://github.com/hjl-tools/linux/commit/e85fa032e5b276ddf17edd056f92f599db9e8369
>
> Kirill A. Shutemov (9):
>   mm, arm64: Update PR_SET/GET_TAGGED_ADDR_CTRL interface
>   x86/mm: Fix CR3_ADDR_MASK
>   x86: CPUID and CR3/CR4 flags for Linear Address Masking
>   x86/mm: Introduce TIF_LAM_U57 and TIF_LAM_U48
>   x86/mm: Provide untagged_addr() helper
>   x86/uaccess: Remove tags from the address before checking
>   x86/mm: Handle tagged memory accesses from kernel threads
>   x86/mm: Make LAM_U48 and mappings above 47-bits mutually exclusive
>   x86/mm: Implement PR_SET/GET_TAGGED_ADDR_CTRL with LAM
>
>  arch/arm64/include/asm/processor.h|  12 +-
>  arch/arm64/kernel/process.c   |  45 +-
>  arch/arm64/kernel/ptrace.c|   4 +-
>  arch/x86/include/asm/cpufeatures.h|   1 +
>  arch/x86/include/asm/elf.h|   3 +-
>  arch/x86/include/asm/mmu.h|   1 +
>  arch/x86/include/asm/mmu_context.h|  13 ++
>  arch/x86/include/asm/page_32.h|   3 +
>  arch/x86/include/asm/page_64.h|  19 +++
>  arch/x86/include/asm/processor-flags.h|   2 +-
>  arch/x86/include/asm/processor.h  |  10 ++
>  arch/x86/include/asm/thread_info.h|   9 +-
>  arch/x86/include/asm/tlbflush.h   |   5 +
>  arch/x86/include/asm/uaccess.h|  16 +-
>  arch/x86/include/uapi/asm/processor-flags.h   |   6 +
>  arch/x86/kernel/process_64.c  | 145 ++
>  arch/x86/kernel/sys_x86_64.c  |   5 +-
>  arch/x86/mm/hugetlbpage.c |   6 +-
>  arch/x86/mm/mmap.c|   9 +-
>  arch/x86/mm/tlb.c | 124 +--
>  kernel/sys.c  |  14 +-
>  .../testing/selftests/arm64/tags/tags_test.c  |  31 
>  .../selftests/{arm64 => vm}/tags/.gitignore   |   0
>  .../selftests/{arm64 => vm}/tags/Makefile |   0
>  .../{arm64 => vm}/tags/run_tags_test.sh   |   0
>  tools/testing/selftests/vm/tags/tags_test.c   |  57 +++
>  26 files changed, 464 insertions(+), 76 deletions(-)
>  delete mode 100644 tools/testing/selftests/arm64/tags/tags_test.c
>  rename tools/testing/selftests/{arm64 => vm}/tags/.gitignore (100%)
>  rename tools/testing/selftests/{arm64 => vm}/tags/Makefile (100%)
>  rename tools/testing/selftests/{arm64 => vm}/tags/run_tags_test.sh (100%)
>  create mode 100644 tools/testing/selftests/vm/tags/tags_test.c
>
> --
> 2.26.2
>


Re: [RFC 0/9] Linear Address Masking enabling

2021-02-05 Thread Peter Zijlstra
On Fri, Feb 05, 2021 at 06:16:20PM +0300, Kirill A. Shutemov wrote:
> The feature competes for bits with 5-level paging: LAM_U48 makes it
> impossible to map anything above 47 bits. The patchset makes these
> capabilities mutually exclusive: whichever is used first wins. LAM_U57
> can be combined with mappings above 47 bits.

And I suppose we still can't switch between 4 and 5 level at runtime,
using a CR3 bit?


Re: [RFC 0/9] Linear Address Masking enabling

2021-02-05 Thread Kirill A. Shutemov
On Fri, Feb 05, 2021 at 04:49:05PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 05, 2021 at 06:16:20PM +0300, Kirill A. Shutemov wrote:
> > The feature competes for bits with 5-level paging: LAM_U48 makes it
> > impossible to map anything above 47 bits. The patchset makes these
> > capabilities mutually exclusive: whichever is used first wins. LAM_U57
> > can be combined with mappings above 47 bits.
> 
> And I suppose we still can't switch between 4 and 5 level at runtime,
> using a CR3 bit?

No. And I can't imagine how it would work with 5-level on the kernel side.

-- 
 Kirill A. Shutemov


Re: [RFC 0/9] Linear Address Masking enabling

2021-02-05 Thread Peter Zijlstra
On Fri, Feb 05, 2021 at 07:01:27PM +0300, Kirill A. Shutemov wrote:
> On Fri, Feb 05, 2021 at 04:49:05PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 05, 2021 at 06:16:20PM +0300, Kirill A. Shutemov wrote:
> > > The feature competes for bits with 5-level paging: LAM_U48 makes it
> > > impossible to map anything above 47 bits. The patchset makes these
> > > capabilities mutually exclusive: whichever is used first wins. LAM_U57
> > > can be combined with mappings above 47 bits.
> > 
> > And I suppose we still can't switch between 4 and 5 level at runtime,
> > using a CR3 bit?
> 
> No. And I can't imagine how it would work with 5-level on the kernel side.

KPTI already switches CR3 on every entry and only maps a very limited
number of kernel pages in the user map. This means a 4-level user
page-table should be possible.

The kernel page-tables would only need to update their p5d[0] on every
4-level user change.

Not as nice as actually having separate user and kernel page-tables in
hardware, but it would actually make 5-level page-tables useful on
machines with less than stupid amounts of memory, I think.

One of the road-blocks to doing per-CPU kernel page-tables is having to
do 2k copies; only having to update a single P5D entry would be ideal.

Of course, once we get 5-level user tables we're back to being stupid,
but maybe tasks with that much memory don't actually switch much; who
knows.


[RFC 0/9] Linear Address Masking enabling

2021-02-05 Thread Kirill A. Shutemov
Linear Address Masking[1] (LAM) modifies the checking that is applied to
64-bit linear addresses, allowing software to use the untranslated
address bits for metadata.
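
As a rough illustration of what this lets userspace do (not part of the
patchset itself): with LAM_U57 active the CPU ignores bits 62:57 of a user
pointer on dereference, so an allocator or sanitizer can keep a small tag
there without stripping it before every access. The bit positions below
follow the ISE description of LAM_U57 and the helper names are only a
sketch:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * LAM_U57: bits 62:57 of a user pointer are ignored by address
 * translation, so they can carry a 6-bit tag (ISE, Chapter 14).
 */
#define LAM_U57_TAG_SHIFT	57
#define LAM_U57_TAG_MASK	(0x3fULL << LAM_U57_TAG_SHIFT)

static void *tag_pointer(void *p, uint64_t tag)
{
	return (void *)(((uint64_t)p & ~LAM_U57_TAG_MASK) |
			((tag << LAM_U57_TAG_SHIFT) & LAM_U57_TAG_MASK));
}

static void *untag_pointer(void *p)
{
	return (void *)((uint64_t)p & ~LAM_U57_TAG_MASK);
}

int main(void)
{
	int *p = malloc(sizeof(*p));
	void *tagged = tag_pointer(p, 0x2a);

	/*
	 * With LAM_U57 enabled for the thread, 'tagged' could be
	 * dereferenced directly; without LAM the tag has to be masked
	 * off first, as done here so the sketch runs anywhere.
	 */
	*(int *)untag_pointer(tagged) = 1;
	printf("%d\n", *(int *)untag_pointer(tagged));
	free(p);
	return 0;
}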

The patchset brings support for LAM for userspace addresses.

The most sensitive part of the enabling is the change in tlb.c, where the
CR3 flags get set. Please take a look and check that what I'm doing makes
sense.

The patchset is RFC quality and the code requires more testing before it
can be applied.

The userspace API is not finalized yet. The patchset extends the API used
by ARM64: PR_GET/SET_TAGGED_ADDR_CTRL. The API is adjusted to not imply
ARM TBI: it now allows requesting the number of metadata bits needed and
reports where these bits are located in the address.
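
For reference, the existing ARM64-style call sequence looks like the sketch
below. How the extended interface will encode the number of requested bits
and report their position is exactly what is not settled yet, so that part
is left as a plain query here; only the PR_*_TAGGED_ADDR_CTRL constants are
the ones already in include/uapi/linux/prctl.h:

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL	55
#define PR_GET_TAGGED_ADDR_CTRL	56
#define PR_TAGGED_ADDR_ENABLE	(1UL << 0)
#endif

int main(void)
{
	/* Today's ARM64 TBI-style request: just ask for tagged addresses. */
	if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0))
		perror("PR_SET_TAGGED_ADDR_CTRL");

	/*
	 * The extended interface would additionally let the task say how
	 * many metadata bits it wants and learn where they ended up; the
	 * encoding is not finalized, so this only issues the plain query.
	 */
	long ctrl = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);
	printf("tagged addr ctrl: %ld\n", ctrl);
	return 0;
}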

There's an alternative proposal[2] for the API based on the Intel CET
interface. Please let us know if you prefer one over the other.

The feature competes for bits with 5-level paging: LAM_U48 makes it
impossible to map anything above 47 bits. The patchset makes these
capabilities mutually exclusive: whichever is used first wins. LAM_U57
can be combined with mappings above 47 bits.
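
The conflict is easy to see from the userspace side: on a kernel with
5-level paging, mmap() only returns addresses above the 47-bit boundary
when the caller passes a hint address up there, and those are exactly the
bits (62:48) that LAM_U48 repurposes as metadata. A minimal probe, with the
hint value chosen arbitrarily for the example:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	/*
	 * Without a high hint, mmap() stays below 1UL << 47 even on
	 * 5-level-paging kernels; with one, the 56-bit address space
	 * opens up. LAM_U48 treats bits 62:48 as metadata, so such a
	 * mapping and LAM_U48 cannot coexist in the same process.
	 */
	void *hint = (void *)(1UL << 48);
	void *p = mmap(hint, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		perror("mmap");
	else
		printf("mapped at %p\n", p);
	return 0;
}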

I include a QEMU patch in case somebody wants to play with the feature.

The branch:

git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam

Any comments are welcome.

[1] ISE, Chapter 14. 
https://software.intel.com/content/dam/develop/external/us/en/documents-tps/architecture-instruction-set-extensions-programming-reference.pdf
[2] 
https://github.com/hjl-tools/linux/commit/e85fa032e5b276ddf17edd056f92f599db9e8369

Kirill A. Shutemov (9):
  mm, arm64: Update PR_SET/GET_TAGGED_ADDR_CTRL interface
  x86/mm: Fix CR3_ADDR_MASK
  x86: CPUID and CR3/CR4 flags for Linear Address Masking
  x86/mm: Introduce TIF_LAM_U57 and TIF_LAM_U48
  x86/mm: Provide untagged_addr() helper
  x86/uaccess: Remove tags from the address before checking
  x86/mm: Handle tagged memory accesses from kernel threads
  x86/mm: Make LAM_U48 and mappings above 47-bits mutually exclusive
  x86/mm: Implement PR_SET/GET_TAGGED_ADDR_CTRL with LAM

 arch/arm64/include/asm/processor.h|  12 +-
 arch/arm64/kernel/process.c   |  45 +-
 arch/arm64/kernel/ptrace.c|   4 +-
 arch/x86/include/asm/cpufeatures.h|   1 +
 arch/x86/include/asm/elf.h|   3 +-
 arch/x86/include/asm/mmu.h|   1 +
 arch/x86/include/asm/mmu_context.h|  13 ++
 arch/x86/include/asm/page_32.h|   3 +
 arch/x86/include/asm/page_64.h|  19 +++
 arch/x86/include/asm/processor-flags.h|   2 +-
 arch/x86/include/asm/processor.h  |  10 ++
 arch/x86/include/asm/thread_info.h|   9 +-
 arch/x86/include/asm/tlbflush.h   |   5 +
 arch/x86/include/asm/uaccess.h|  16 +-
 arch/x86/include/uapi/asm/processor-flags.h   |   6 +
 arch/x86/kernel/process_64.c  | 145 ++
 arch/x86/kernel/sys_x86_64.c  |   5 +-
 arch/x86/mm/hugetlbpage.c |   6 +-
 arch/x86/mm/mmap.c|   9 +-
 arch/x86/mm/tlb.c | 124 +--
 kernel/sys.c  |  14 +-
 .../testing/selftests/arm64/tags/tags_test.c  |  31 
 .../selftests/{arm64 => vm}/tags/.gitignore   |   0
 .../selftests/{arm64 => vm}/tags/Makefile |   0
 .../{arm64 => vm}/tags/run_tags_test.sh   |   0
 tools/testing/selftests/vm/tags/tags_test.c   |  57 +++
 26 files changed, 464 insertions(+), 76 deletions(-)
 delete mode 100644 tools/testing/selftests/arm64/tags/tags_test.c
 rename tools/testing/selftests/{arm64 => vm}/tags/.gitignore (100%)
 rename tools/testing/selftests/{arm64 => vm}/tags/Makefile (100%)
 rename tools/testing/selftests/{arm64 => vm}/tags/run_tags_test.sh (100%)
 create mode 100644 tools/testing/selftests/vm/tags/tags_test.c

-- 
2.26.2