On Tue, Mar 24, 2020 at 12:05 AM Waldemar Kozaczuk <jwkozac...@gmail.com>
wrote:

> This patch partially addresses issue #854. In essence, it changes
> malloc_large() to use mmu::map_anon() if the requested memory is >= 2MB and
> does not have to be contiguous in physical memory. It is supposed to help
> OSv memory allocation behave better when memory in free_page_ranges is
> heavily fragmented and large allocations (>= 2MB) cannot be satisfied with
> a single contiguous page range.
>
> Please note this patch does NOT address the scenario where allocation
> requests > 4K and < 2MB cannot be satisfied because memory is fragmented
> at that level.
>
> Signed-off-by: Waldemar Kozaczuk <jwkozac...@gmail.com>
> ---
>  core/mempool.cc | 41 ++++++++++++++++++++++++++++++++++++-----
>  core/mmu.cc     |  9 +++++----
>  2 files changed, 41 insertions(+), 9 deletions(-)
>
> diff --git a/core/mempool.cc b/core/mempool.cc
> index d902eea8..0d82f145 100644
> --- a/core/mempool.cc
> +++ b/core/mempool.cc
> @@ -702,10 +702,13 @@ page_range* page_range_allocator::alloc(size_t size)
>          for (auto&& pr : _free[exact_order - 1]) {
>

Is the comment just above this line, mentioning non-contiguous allocation,
still relevant with your patch?
Unfortunately, I don't understand what that comment was trying to say :-(

>              if (pr.size >= size) {
>                  range = &pr;
> +                remove_list(exact_order - 1, *range);
>

I'm not familiar with this code, and I don't understand what I'm seeing
here. What do the change here and the one a few lines below do to this
function?

>                  break;
>              }
>          }
> -        return nullptr;
> +        if (!range) {
> +            return nullptr;
> +        }
>      } else if (order == max_order) {
>          range = &*_free_huge.rbegin();
>          if (range->size < size) {
> @@ -820,7 +823,9 @@ void page_range_allocator::for_each(unsigned min_order, Func f)
>      }
>  }
>
> -static void* malloc_large(size_t size, size_t alignment, bool block = true)
> +static size_t huge_page_size = 0x200000;
>

We already have mmu::huge_page_size, so perhaps we don't need another one?
If it's just an arbitrary limit and doesn't really need to be the same as
mmu::huge_page_size, it should get a different name.
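For example (just a sketch of that suggestion, assuming the threshold really
is meant to match the huge page size):

    // Reuse the existing constant instead of a second, file-local copy:
    if (size >= mmu::huge_page_size && !contiguous) {
        ...
    }

Or, if it is really just an arbitrary cut-off, a (hypothetical) name like
large_malloc_mmap_threshold would make the intent clearer.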


> +
> +static void* malloc_large(size_t size, size_t alignment, bool block = true, bool contiguous = true)
>  {
>      auto requested_size = size;
>      size_t offset;
> @@ -832,6 +837,16 @@ static void* malloc_large(size_t size, size_t alignment, bool block = true)
>      size += offset;
>      size = align_up(size, page_size);
>
> +    if (size >= huge_page_size && !contiguous) {
> +        // Map memory if requested memory greater than 2MB and does not need to be contiguous
>

Please use the constant name here, not "2MB". I'm not sure what it should
be, and if we ever change it,
it would be a shame to have to hunt down all these references.
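For example (with whatever name the constant ends up getting):

    // Map memory with mmu::map_anon() if the requested size is at least
    // huge_page_size and does not need to be physically contiguous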

> +        //TODO: For now pre-populate the memory, in future consider turning it off
> +        void* obj = mmu::map_anon(nullptr, size, mmu::mmap_populate, mmu::perm_read | mmu::perm_write);
>
> +        page_range* ret_header = reinterpret_cast<page_range*>(obj);
> +        ret_header->size = size;
>

Wow, if this code wasn't confusing enough already ;-) So now we have a
"page_range" structure but it isn't a normal page_range,
and can't be freed with free_page_range() as usual (I see you already
handled that correctly below). I guess it's fine, but sad.
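If it helps, the mmap-backed path could use its own small header instead of
overloading page_range (a sketch, not what the patch currently does; the
struct name is made up):

    // Hypothetical header for mmap-backed large allocations; only the size
    // is needed, so that free_large() later knows how much to munmap().
    struct mmap_alloc_header {
        size_t size;   // total mapped size, including header and padding
    };

    void* obj = mmu::map_anon(nullptr, size, mmu::mmap_populate,
                              mmu::perm_read | mmu::perm_write);
    auto hdr = static_cast<mmap_alloc_header*>(obj);
    hdr->size = size;

A distinct type would at least make it obvious at the call sites that such a
block must never reach free_page_range().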

> +        trace_memory_malloc_large(obj + offset, requested_size, size, alignment);
> +        return obj + offset;
> +    }
> +
>      while (true) {
>          WITH_LOCK(free_page_ranges_lock) {
>              reclaimer_thread.wait_for_minimum_memory();
> @@ -1061,10 +1076,21 @@ static void free_page_range(void *addr, size_t size)
>      free_page_range(static_cast<page_range*>(addr));
>  }
>
> +static inline bool is_addr_memory_mmapped(void* addr)
> +{
> +    return (ulong)addr >= 0x200000000000 && (ulong)addr < 0x800000000000;
>

Where did you get these numbers? Don't we already have numeric constants or
functions to check these things? Functions are probably better, and would
presumably need to be implemented differently for x86 and aarch64?

I see we already have a function is_linear_mapped(); would using its
negation be enough? Or am I missing some other options?
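Something along these lines, assuming is_linear_mapped(addr, size) is the
helper I am thinking of, would avoid the hard-coded addresses:

    static inline bool is_addr_memory_mmapped(void* addr)
    {
        // Anything outside the linear (phys_mem) mapping must have come
        // from mmu::map_anon(), so it needs munmap() rather than
        // free_page_range().
        return !mmu::is_linear_mapped(addr, sizeof(page_range));
    }

and it should keep working on aarch64, where the address-space layout is
different.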


> +}
> +
>  static void free_large(void* obj)
>  {
>      obj = align_down(obj - 1, page_size);
> -    free_page_range(static_cast<page_range*>(obj));
> +    auto range = static_cast<page_range *>(obj);
> +    if (is_addr_memory_mmapped(obj)) {
> +        mmu::munmap(obj, range->size);
>
> +    }
> +    else {
> +        free_page_range(range);
> +    }
>  }
>
>  static unsigned large_object_size(void *obj)
> @@ -1689,7 +1715,7 @@ static inline void* std_malloc(size_t size, size_t alignment)
>                                         memory::alloc_page());
>          trace_memory_malloc_page(ret, size, mmu::page_size, alignment);
>      } else {
> -        ret = memory::malloc_large(size, alignment);
> +        ret = memory::malloc_large(size, alignment, true, false);
>      }
>      memory::tracker_remember(ret, size);
>      return ret;
> @@ -1760,6 +1786,11 @@ void free(void* object)
>          return;
>      }
>      memory::tracker_forget(object);
> +    if (memory::is_addr_memory_mmapped(object)) {
> +        memory::free_large(object);
>

This was confusing for me to see. Above, you changed free_large() to handle
both the "normal" (contiguous) large range and the mmap case. But in this
place, you call it *only* for the mmap case, and a few lines below you call
it again *only* for the normal (contiguous) case. So it didn't have to be a
single function at all... You could have left free_large() unchanged and,
in the line above, called mmu::munmap() instead of free_large().
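Concretely, something like this in free(), keeping free_large() untouched
(a sketch of the above, reusing the size header the patch stores at the
start of the mapping):

    if (memory::is_addr_memory_mmapped(object)) {
        // mmap-backed large allocation: recover the stored size and
        // unmap the whole region directly.
        object = align_down(object - 1, mmu::page_size);
        auto header = static_cast<memory::page_range*>(object);
        mmu::munmap(object, header->size);
        return;
    }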

> +        return;
> +    }
> +
>      switch (mmu::get_mem_area(object)) {
>      case mmu::mem_area::page:
>          object = mmu::translate_mem_area(mmu::mem_area::page,
> @@ -1969,7 +2000,7 @@ void* alloc_phys_contiguous_aligned(size_t size, size_t align, bool block)
>      assert(is_power_of_two(align));
>      // make use of the standard large allocator returning properly aligned
>      // physically contiguous memory:
> -    auto ret = malloc_large(size, align, block);
> +    auto ret = malloc_large(size, align, block, true);
>      assert (!(reinterpret_cast<uintptr_t>(ret) & (align - 1)));
>      return ret;
>  }
> diff --git a/core/mmu.cc b/core/mmu.cc
> index ff3fab47..d184541c 100644
> --- a/core/mmu.cc
> +++ b/core/mmu.cc
> @@ -115,10 +115,11 @@ phys virt_to_phys(void *virt)
>      }
>  #endif
>
> -    // For now, only allow non-mmaped areas.  Later, we can either
> -    // bounce such addresses, or lock them in memory and translate
> -    assert(virt >= phys_mem);
> -    return reinterpret_cast<uintptr_t>(virt) & (mem_area_size - 1);
> +    if (virt >= phys_mem) {
> +        return reinterpret_cast<uintptr_t>(virt) & (mem_area_size - 1);
> +    } else {
> +        return virt_to_phys_pt(virt);
> +    }
>

I am not sure I understand this change. Do we really want to do this? The
caller of this function won't know how many bytes after the starting
physical address are actually contiguous, and if one calls this on an
mmap()ed address, it's not even guaranteed that the mapping stays in place
or doesn't change...

Note that if I understand correctly, your code doesn't change
alloc_phys_contiguous_aligned(), and virt_to_phys() can still be used on
what it returns. So maybe this function doesn't need to be changed at all?
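In that case the original code could simply stay, perhaps with a comment
spelling out the assumption (a sketch):

    // Only linearly-mapped addresses have a stable physical address;
    // mmap()ed ranges may move or be unmapped under us.
    assert(virt >= phys_mem);
    return reinterpret_cast<uintptr_t>(virt) & (mem_area_size - 1);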


>  }
>
>  template <int N, typename MakePTE>
> --
> 2.20.1
>
