Very nice.
At first, I wanted to suggest that maybe we should keep the old mapping
behind some #ifdef, in case one day we want to return to it because 47
bits is not enough.

But then I realized that we already make do with much fewer than 47 bits
for normal allocations (if I understand correctly, we have just 44 bits,
or about 17 TB). We had 47 bits for mmap and now we'll "only" have 46
bits for mmap, but that's still 70 TB, probably more than enough and in
any case much more than we have for malloc(). So it is unlikely we'll
ever want to change the mapping back (and if we do, we always have the
git history).
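
(For anyone double-checking those numbers, here is a throwaway snippet,
nothing OSv-specific, that prints the sizes in decimal TB and binary TiB:

    #include <cstdio>
    #include <cstdint>

    int main() {
        const int widths[] = {44, 46, 47};
        for (int bits : widths) {
            uint64_t bytes = uint64_t(1) << bits;
            printf("%d bits: %.1f TB, %lu TiB\n", bits,
                   bytes / 1e12, (unsigned long)(bytes >> 40));
        }
    }

It prints 17.6 TB / 16 TiB, 70.4 TB / 64 TiB and 140.7 TB / 128 TiB, so
the rounded figures above hold.)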

So I'll test this and commit it as-is. Thanks!

--
Nadav Har'El
n...@scylladb.com


On Fri, Oct 14, 2022 at 3:08 AM Waldemar Kozaczuk <jwkozac...@gmail.com>
wrote:

> This patch changes the layout of the virtual address space to make
> all virtual memory fit below the 0x0000800000000000 address.
>
> As Nadav Har'El explains in the description of the issue #1196:
>
> "Although x86 is nominally a 64-bit address space, it didn't fully
> support the entire 64 bits and doesn't even now. Rather (see a good
> explanation
> in https://en.wikipedia.org/wiki/X86-64, "canonical form address") it only
> supported 48 bits.
>
> Moreover, all the highest bits must be copies of the bit 47. So basically
> you have 47 bits (128 TB) with all highest bits 0, and another 128 TB with
> the highest bits 1 - these are the 0xfffff... addresses.
>
> So it was convenient for OSv to divide the address space with one half for
> mmap
> and one half for malloc."
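>
> (To make the "canonical form" rule concrete, here is a minimal sketch,
> not OSv code, of the check the hardware effectively applies: an address
> is canonical when bits 63:48 are all copies of bit 47, i.e. when it
> sign-extends from 48 bits back to itself.
>
>     #include <cstdint>
>
>     // true for e.g. 0x0000700000000000 and 0xffff800000000000,
>     // false for e.g. 0x0000900000000000
>     static bool is_canonical(uint64_t addr) {
>         return (int64_t(addr << 16) >> 16) == int64_t(addr);
>     }
>
> Both the 0x0000... and the 0xffff... halves mentioned above pass this
> check.)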
>
> As it turns out, the virtual address space story on AArch64 is similar
> (for details see https://www.kernel.org/doc/html/latest/arm64/memory.html),
> where bits 63:48 are all set to either 0 or 1. The difference from x86
> is that the 0-addresses are mapped by the table pointed to by the TTBR0
> system register and the 1-addresses by the table pointed to by TTBR1.
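>
> (A simplified model of that split, ignoring the TCR_EL1.T0SZ/T1SZ
> configuration and assuming 48-bit virtual addresses; this is a sketch,
> not how OSv actually selects the table:
>
>     #include <cstdint>
>
>     // 0-addresses (bits 63:48 all clear) walk the TTBR0 table,
>     // 1-addresses (bits 63:48 all set) walk the TTBR1 table.
>     static bool translated_via_ttbr1(uint64_t va) {
>         return (va >> 48) == 0xffff;
>     }
>
> This is why the arch-setup.cc hunk below can stop touching TTBR1 once
> the phys_mem mappings live entirely in the low half.)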
>
> So the virtual-to-physical linear memory mapping before this patch
> would look like this, per the 'osv linear_mmap' gdb command:
>
> x86_64)
>            vaddr            paddr     size perm memattr name
>         40200000           200000   67d434 rwxp  normal kernel
> ffff800000000000                0 40000000 rwxp  normal main
> ffff8000000f0000            f0000    10000 rwxp  normal dmi
> ffff8000000f5a10            f5a10      17c rwxp  normal smbios
> ffff800040000000         40000000 3ffdd000 rwxp  normal main
> ffff80007fe00000         7fe00000   200000 rwxp  normal acpi
> ffff8000feb91000         feb91000     1000 rwxp  normal pci_bar
> ffff8000feb92000         feb92000     1000 rwxp  normal pci_bar
> ffff8000fec00000         fec00000     1000 rwxp  normal ioapic
> ffff900000000000                0 40000000 rwxp  normal page
> ffff900040000000         40000000 3ffdd000 rwxp  normal page
> ffffa00000000000                0 40000000 rwxp  normal mempool
> ffffa00040000000         40000000 3ffdd000 rwxp  normal mempool
>
> aarch64)
>            vaddr            paddr     size perm memattr name
>          8000000          8000000    10000 rwxp     dev gic_dist
>          8010000          8010000    10000 rwxp     dev gic_cpu
>          9000000          9000000     1000 rwxp     dev pl011
>          9010000          9010000     1000 rwxp     dev pl031
>         10000000         10000000 2eff0000 rwxp     dev pci_mem
>         3eff0000         3eff0000    10000 rwxp     dev pci_io
>        fc0000000         40000000   84e000 rwxp  normal kernel
>       4010000000       4010000000 10000000 rwxp     dev pci_cfg
> ffff80000a000000          a000000      200 rwxp  normal virtio_mmio_cfg
> ffff80000a000200          a000200      200 rwxp  normal virtio_mmio_cfg
> ffff80000a000400          a000400      200 rwxp  normal virtio_mmio_cfg
> ffff80000a000600          a000600      200 rwxp  normal virtio_mmio_cfg
> ffff80000a000800          a000800      200 rwxp  normal virtio_mmio_cfg
> ffff80000a000a00          a000a00      200 rwxp  normal virtio_mmio_cfg
> ffff80000a000c00          a000c00      200 rwxp  normal virtio_mmio_cfg
> ffff80000a000e00          a000e00      200 rwxp  normal virtio_mmio_cfg
> ffff80004084e000         4084e000 7f7b2000 rwxp  normal main
> ffff90004084e000         4084e000 7f7b2000 rwxp  normal page
> ffffa0004084e000         4084e000 7f7b2000 rwxp  normal mempool
>
> The mappings above include the kernel code and memory-mapped devices,
> as well as the malloc-related areas marked with the "main", "page" and
> "mempool" names.
>
> There are also mmap-related areas as indicated by this gdb example:
> osv mmap
> 0x0000000000000000 0x0000000000000000 [0.0 kB]         flags=none perm=none
> 0x0000200000000000 0x0000200000001000 [4.0 kB]         flags=p perm=none
> 0x0000800000000000 0x0000800000000000 [0.0 kB]         flags=none perm=none
>
> Unfortunately, this virtual memory layout, while convenient, prevents
> some Linux applications from running correctly on OSv. More
> specifically, the RapidJSON C++ library (see
> https://rapidjson.org/index.html) on x86_64 and the Java JIT compiler
> on AArch64 (see #1145 and #1157) use bits 63:48 to "pack" some extra
> information for certain optimizations and thus assume that these bits
> are 0.
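>
> (For illustration only, not RapidJSON's or the JIT's actual code, this
> is the kind of pointer "packing" that breaks on addresses such as
> 0xffff8xxxxxxxxxxx, where bits 63:48 are already non-zero:
>
>     #include <cstdint>
>
>     // Stash a 16-bit tag in the top bits of a pointer, assuming the
>     // address itself never uses them.
>     static void* pack(void* p, uint16_t tag) {
>         uintptr_t bits = reinterpret_cast<uintptr_t>(p);
>         return reinterpret_cast<void*>(bits | uintptr_t(tag) << 48);
>     }
>
>     // Strip the tag off again; this only recovers the original pointer
>     // if bits 63:48 really were 0 to begin with.
>     static void* unpack(void* tagged) {
>         uintptr_t bits = reinterpret_cast<uintptr_t>(tagged);
>         return reinterpret_cast<void*>(bits & 0x0000ffffffffffffULL);
>     }
>
> With the old layout, malloc() could hand out 0xffff8... pointers whose
> top bits such code silently drops.)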
>
> So this patch changes the virtual memory layout to make the "malloc"
> areas fall below 0x0000800000000000. In short, we effectively move the
> areas:
>
> ffff800000000000 - ffff8fffffffffff (main)
> ffff900000000000 - ffff9fffffffffff (page)
> ffffa00000000000 - ffffafffffffffff (mempool)
> ffffb00000000000 - ffffbfffffffffff (debug)
>
> to:
>
> 0000400000000000 - 00004fffffffffff (main)
> 0000500000000000 - 00005fffffffffff (page)
> 0000600000000000 - 00006fffffffffff (mempool)
> 0000700000000000 - 00007fffffffffff (debug)
>
> We also squeeze the mmap area from:
>
> 0000000000000000 - 0000800000000000
>
> to:
>
> 0000000000000000 - 0000400000000000
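>
> (As a quick sanity check of the new bases, mirroring the
> get_mem_area_base() change below and assuming the mem_area enum values
> 0-3 for main/page/mempool/debug:
>
>     #include <cstdio>
>     #include <cstdint>
>
>     int main() {
>         const char* names[] = {"main", "page", "mempool", "debug"};
>         for (uintptr_t area = 0; area < 4; area++) {
>             uintptr_t base = 0x400000000000UL | area << 44;
>             printf("%-8s %016lx - %016lx\n", names[area],
>                    (unsigned long)base,
>                    (unsigned long)(base + (uintptr_t(1) << 44) - 1));
>         }
>     }
>
> This prints exactly the four ranges listed above.)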
>
> As a result the linear mappings after the patch look like this:
>
> x86_64)
>            vaddr            paddr     size perm memattr name
>         40200000           200000   67c434 rwxp  normal kernel
>     400000000000                0 40000000 rwxp  normal main
>     4000000f0000            f0000    10000 rwxp  normal dmi
>     4000000f5a10            f5a10      17c rwxp  normal smbios
>     400040000000         40000000 3ffdd000 rwxp  normal main
>     40007fe00000         7fe00000   200000 rwxp  normal acpi
>     4000feb91000         feb91000     1000 rwxp  normal pci_bar
>     4000feb92000         feb92000     1000 rwxp  normal pci_bar
>     4000fec00000         fec00000     1000 rwxp  normal ioapic
>     500000000000                0 40000000 rwxp  normal page
>     500040000000         40000000 3ffdd000 rwxp  normal page
>     600000000000                0 40000000 rwxp  normal mempool
>     600040000000         40000000 3ffdd000 rwxp  normal mempool
>
> aarch64)
>            vaddr            paddr     size perm memattr name
>          8000000          8000000    10000 rwxp     dev gic_dist
>          8010000          8010000    10000 rwxp     dev gic_cpu
>          9000000          9000000     1000 rwxp     dev pl011
>          9010000          9010000     1000 rwxp     dev pl031
>         10000000         10000000 2eff0000 rwxp     dev pci_mem
>         3eff0000         3eff0000    10000 rwxp     dev pci_io
>        fc0000000         40000000   7de000 rwxp  normal kernel
>       4010000000       4010000000 10000000 rwxp     dev pci_cfg
>     40000a000000          a000000      200 rwxp  normal virtio_mmio_cfg
>     40000a000200          a000200      200 rwxp  normal virtio_mmio_cfg
>     40000a000400          a000400      200 rwxp  normal virtio_mmio_cfg
>     40000a000600          a000600      200 rwxp  normal virtio_mmio_cfg
>     40000a000800          a000800      200 rwxp  normal virtio_mmio_cfg
>     40000a000a00          a000a00      200 rwxp  normal virtio_mmio_cfg
>     40000a000c00          a000c00      200 rwxp  normal virtio_mmio_cfg
>     40000a000e00          a000e00      200 rwxp  normal virtio_mmio_cfg
>     4000407de000         407de000 7f822000 rwxp  normal main
>     5000407de000         407de000 7f822000 rwxp  normal page
>     6000407de000         407de000 7f822000 rwxp  normal mempool
>
> Fixes #1196
> Fixes #1145
> Fixes #1157
>
> Signed-off-by: Waldemar Kozaczuk <jwkozac...@gmail.com>
> ---
>  arch/aarch64/arch-setup.cc | 5 ++---
>  core/mmu.cc                | 2 +-
>  include/osv/mmu-defs.hh    | 4 ++--
>  scripts/loader.py          | 2 +-
>  4 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/arch/aarch64/arch-setup.cc b/arch/aarch64/arch-setup.cc
> index 24e007c4..89dceae9 100644
> --- a/arch/aarch64/arch-setup.cc
> +++ b/arch/aarch64/arch-setup.cc
> @@ -36,12 +36,11 @@
>
>  void setup_temporary_phys_map()
>  {
> -    // duplicate 1:1 mapping into phys_mem
> +    // duplicate 1:1 mapping into the lower part of phys_mem
>      u64 *pt_ttbr0 = reinterpret_cast<u64*>(processor::read_ttbr0());
> -    u64 *pt_ttbr1 = reinterpret_cast<u64*>(processor::read_ttbr1());
>      for (auto&& area : mmu::identity_mapped_areas) {
>          auto base = reinterpret_cast<void*>(get_mem_area_base(area));
> -        pt_ttbr1[mmu::pt_index(base, 3)] = pt_ttbr0[0];
> +        pt_ttbr0[mmu::pt_index(base, 3)] = pt_ttbr0[0];
>      }
>      mmu::flush_tlb_all();
>  }
> diff --git a/core/mmu.cc b/core/mmu.cc
> index 007d4331..33ae8407 100644
> --- a/core/mmu.cc
> +++ b/core/mmu.cc
> @@ -78,7 +78,7 @@ public:
>  };
>
>  constexpr uintptr_t lower_vma_limit = 0x0;
> -constexpr uintptr_t upper_vma_limit = 0x800000000000;
> +constexpr uintptr_t upper_vma_limit = 0x400000000000;
>
>  typedef boost::intrusive::set<vma,
>                                bi::compare<vma_compare>,
> diff --git a/include/osv/mmu-defs.hh b/include/osv/mmu-defs.hh
> index 18edf441..fd6a85a6 100644
> --- a/include/osv/mmu-defs.hh
> +++ b/include/osv/mmu-defs.hh
> @@ -46,12 +46,12 @@ constexpr uintptr_t mem_area_size = uintptr_t(1) << 44;
>
>  constexpr uintptr_t get_mem_area_base(mem_area area)
>  {
> -    return 0xffff800000000000 | uintptr_t(area) << 44;
> +    return 0x400000000000 | uintptr_t(area) << 44;
>  }
>
>  static inline mem_area get_mem_area(void* addr)
>  {
> -    return mem_area(reinterpret_cast<uintptr_t>(addr) >> 44 & 7);
> +    return mem_area(reinterpret_cast<uintptr_t>(addr) >> 44 & 3);
>  }
>
>  constexpr void* translate_mem_area(mem_area from, mem_area to, void* addr)
> diff --git a/scripts/loader.py b/scripts/loader.py
> index 6878a7a3..0ce782d0 100755
> --- a/scripts/loader.py
> +++ b/scripts/loader.py
> @@ -27,7 +27,7 @@ class status_enum_class(object):
>      pass
>  status_enum = status_enum_class()
>
> -phys_mem = 0xffff800000000000
> +phys_mem = 0x400000000000
>
>  def pt_index(addr, level):
>      return (addr >> (12 + 9 * level)) & 511
> --
> 2.34.1
>
