Let me try to address your comments and send V5.

On Sunday, March 29, 2020 at 11:21:00 AM UTC-4, Nadav Har'El wrote:
>
>
> On Sun, Mar 29, 2020 at 5:37 PM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
>
>>
>>
>> On Sunday, March 29, 2020 at 10:04:45 AM UTC-4, Nadav Har'El wrote:
>>>
>>> Looks very good, I have several minor suggestions below (inline), but I 
>>> would have committed this patch anyway,
>>> except that I didn't understand why this patch is titled "2/2" - 
>>> where is 1/2? I couldn't find it.
>>>
>> It is called '[PATCH V2 1/2] mempool: fix a bug in page_range_allocator() 
>> when handling worst case O(n) scenario' from March 24th. It fixes a somewhat 
>> related bug.
>>
>
> Ok, I didn't realize I should look for older versions.
> Do you want to change anything in v4 based on my comments, or want me to 
> commit your patch as-is and then consider my comments?
>
> If you want to change anything, please send v5 with both patches (or 
> commit them yourself). Thanks!
>
>
>>> I see you have new realloc() tests - did these tests fail before your 
>>> patches? 
>>>
>> I think one of the previous versions of this patch was failing, and in any 
>> case it was a way for me to better verify my changes.
>>
>>>
>>> --
>>> Nadav Har'El
>>> n...@scylladb.com
>>>
>>>
>>> On Thu, Mar 26, 2020 at 8:36 PM Waldemar Kozaczuk <jwkoz...@gmail.com> 
>>> wrote:
>>>
>>>> This patch mostly addresses issue #854. In essence, it changes malloc_large()
>>>> to use mmu::map_anon() if the requested memory is greater than the size of
>>>> a "huge page" and does not have to be contiguous in physical memory. It also
>>>> makes malloc_large() fall back to map_anon()-based allocation for requests
>>>> > 4K and < 2MB that cannot be satisfied when memory is heavily fragmented.
>>>>
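(To make the new logic concrete, here is a simplified sketch of the decision
malloc_large() now makes - pseudocode distilled from the diff below, not the
literal patch code:)

    if (size >= mmu::huge_page_size && !contiguous) {
        // large and non-contiguous: go straight to an anonymous mapping
        return mapped_malloc_large(size, offset);
    }
    // otherwise try free_page_ranges first; if that fails and the caller
    // does not require contiguous memory, fall back to mapped_malloc_large()
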
>>>> It is supposed to help OSv memory allocation behave better when memory
>>>> in free_page_ranges is heavily fragmented and large allocations (>=2MB)
>>>> cannot be satisfied with a single contiguous page range. While working on
>>>> this patch it also became clear that applications that heavily use
>>>> realloc(), which allocates new chunks of memory and frees old ones, are
>>>> particularly likely to encounter heavy memory fragmentation. Hopefully
>>>> this patch helps in these scenarios.
>>>>
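(As a concrete illustration - hypothetical user code, not part of the patch -
a growth pattern like this allocates an ever-larger chunk and frees the
previous one on every step, punching progressively larger holes into
free_page_ranges:)

    #include <stdlib.h>

    int main()
    {
        size_t size = 0x100000;           // start at 1MB
        void *buf = malloc(size);
        for (int i = 0; i < 8 && buf; i++) {
            size *= 2;                    // each step needs a larger chunk...
            buf = realloc(buf, size);     // ...and frees the old, smaller one
        }
        free(buf);
        return 0;
    }
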
>>>> It just happens that object_size() and large_object_size() needed to be
>>>> adjusted to work with the new malloc_large() implementation. As a result,
>>>> other related issues are fixed by this patch as well.
>>>>
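(For reference, the layout those two functions now have to understand for a
map_anon()-backed allocation, as set up by mapped_malloc_large() in the diff
below:)

    // page-aligned mapping returned by mmu::map_anon():
    //
    //   [ size_t header = mapped size | padding up to offset | user data ]
    //
    // object_size()/large_object_size() align the user pointer down to a
    // page boundary, read the header, and return *header - offset.
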
>>>> Fixes #854
>>>> Fixes #1077
>>>>
>>>> This patch also potentially addresses the realloc()-related issue #784.
>>>>
>>>
>>> It seems very likely it does address this as well. Maybe we should just 
>>> close that issue too, because we're never going to get more information 
>>> about it.
>>>
>>>
>>>> Lastly, this patch adds a new test verifying realloc() and
>>>> malloc_usable_size().
>>>>
>>>> Signed-off-by: Waldemar Kozaczuk <jwkoz...@gmail.com>
>>>> ---
>>>>  core/mempool.cc        | 66 +++++++++++++++++++++++------
>>>>  modules/tests/Makefile |  2 +-
>>>>  tests/tst-realloc.cc   | 94 ++++++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 149 insertions(+), 13 deletions(-)
>>>>  create mode 100644 tests/tst-realloc.cc
>>>>
>>>> diff --git a/core/mempool.cc b/core/mempool.cc
>>>> index 11fd1456..1bf010d6 100644
>>>> --- a/core/mempool.cc
>>>> +++ b/core/mempool.cc
>>>> @@ -546,7 +546,7 @@ public:
>>>>      page_range_allocator() : _deferred_free(nullptr) { }
>>>>
>>>>      template<bool UseBitmap = true>
>>>> -    page_range* alloc(size_t size);
>>>> +    page_range* alloc(size_t size, bool contiguous = true);
>>>>      page_range* alloc_aligned(size_t size, size_t offset, size_t alignment,
>>>>                                bool fill = false);
>>>>      void free(page_range* pr);
>>>> @@ -678,7 +678,7 @@ void page_range_allocator::bitmap_allocator<T>::deallocate(T* p, size_t n)
>>>>  }
>>>>
>>>>  template<bool UseBitmap>
>>>> -page_range* page_range_allocator::alloc(size_t size)
>>>> +page_range* page_range_allocator::alloc(size_t size, bool contiguous)
>>>>  {
>>>>      auto exact_order = ilog2_roundup(size / page_size);
>>>>      if (exact_order > max_order) {
>>>> @@ -692,13 +692,12 @@ page_range* page_range_allocator::alloc(size_t size)
>>>>
>>>>      page_range* range = nullptr;
>>>>      if (!bitset) {
>>>> -        if (!exact_order || _free[exact_order - 1].empty()) {
>>>> +        if (!contiguous || !exact_order || _free[exact_order - 1].empty()) {
>>>>              return nullptr;
>>>>          }
>>>> -        // TODO: This linear search makes worst case complexity of the allocator
>>>> -        // O(n). It would be better to fall back to non-contiguous allocation
>>>> -        // and make worst case complexity depend on the size of requested memory
>>>> -        // block and the logarithm of the number of free huge page ranges.
>>>> +        // This linear search makes worst case complexity of the allocator
>>>> +        // O(n). Unfortunately we do not have a choice for contiguous allocations,
>>>> +        // so let us hope there is a large enough range.
>>>>          for (auto&& pr : _free[exact_order - 1]) {
>>>>              if (pr.size >= size) {
>>>>                  range = &pr;
>>>> @@ -823,7 +822,16 @@ void page_range_allocator::for_each(unsigned min_order, Func f)
>>>>      }
>>>>  }
>>>>
>>>> -static void* malloc_large(size_t size, size_t alignment, bool block = true)
>>>> +static void* mapped_malloc_large(size_t size, size_t offset)
>>>> +{
>>>> +    //TODO: For now pre-populate the memory, in future consider doing lazy population
>>>> +    void* obj = mmu::map_anon(nullptr, size, mmu::mmap_populate, mmu::perm_read | mmu::perm_write);
>>>> +    size_t* ret_header = static_cast<size_t*>(obj);
>>>> +    *ret_header = size;
>>>> +    return obj + offset;
>>>> +}
>>>> +
>>>> +static void* malloc_large(size_t size, size_t alignment, bool block = true, bool contiguous = true)
>>>>  {
>>>>      auto requested_size = size;
>>>>      size_t offset;
>>>> @@ -835,6 +843,14 @@ static void* malloc_large(size_t size, size_t alignment, bool block = true)
>>>>      size += offset;
>>>>      size = align_up(size, page_size);
>>>>
>>>> +    // Use mmap if the requested memory is greater than the "huge page" size
>>>> +    // and does not need to be contiguous
>>>> +    if (size >= mmu::huge_page_size && !contiguous) {
>>>> +        void* obj = mapped_malloc_large(size, offset);
>>>> +        trace_memory_malloc_large(obj, requested_size, size, alignment);
>>>> +        return obj;
>>>> +    }
>>>> +
>>>>      while (true) {
>>>>          WITH_LOCK(free_page_ranges_lock) {
>>>>              reclaimer_thread.wait_for_minimum_memory();
>>>> @@ -842,7 +858,7 @@ static void* malloc_large(size_t size, size_t alignment, bool block = true)
>>>>              if (alignment > page_size) {
>>>>                  ret_header = free_page_ranges.alloc_aligned(size, page_size, alignment);
>>>>              } else {
>>>> -                ret_header = free_page_ranges.alloc(size);
>>>> +                ret_header = free_page_ranges.alloc(size, contiguous);
>>>>              }
>>>>              if (ret_header) {
>>>>                  on_alloc(size);
>>>> @@ -850,6 +866,10 @@ static void* malloc_large(size_t size, size_t alignment, bool block = true)
>>>>                  obj += offset;
>>>>                  trace_memory_malloc_large(obj, requested_size, size, alignment);
>>>>                  return obj;
>>>> +            } else if (!contiguous) {
>>>> +                // If we failed to get a contiguous memory allocation and
>>>> +                // the caller does not require one, let us use map-based allocation
>>>> +                break;
>>>>              }
>>>>              if (block)
>>>>                  reclaimer_thread.wait_for_memory(size);
>>>> @@ -857,6 +877,10 @@ static void* malloc_large(size_t size, size_t alignment, bool block = true)
>>>>                  return nullptr;
>>>>          }
>>>>      }
>>>> +
>>>> +    void* obj = mapped_malloc_large(size, offset);
>>>> +    trace_memory_malloc_large(obj, requested_size, size, alignment);
>>>> +    return obj;
>>>>
>>>
>>> nitpick: wouldn't it be clearer to put this code where you have the 
>>> "break" above (in place of the "break"), instead of here?
>>>
>>>>  }
>>>>
>>>>  void shrinker::deactivate_shrinker()
>>>> @@ -1072,9 +1096,11 @@ static void free_large(void* obj)
>>>>
>>>>  static unsigned large_object_size(void *obj)
>>>>  {
>>>> +    auto original_obj = obj;
>>>>      obj = align_down(obj - 1, page_size);
>>>> +    size_t offset = reinterpret_cast<uint64_t>(original_obj) - reinterpret_cast<uint64_t>(obj);
>>>>      auto header = static_cast<page_range*>(obj);
>>>> -    return header->size;
>>>> +    return header->size - offset;
>>>>  }
>>>>
>>>
>>> I wonder if just this change explains 
>>> https://github.com/cloudius-systems/osv/issues/784 and we should close 
>>> that issue.
>>> But I don't feel expert enough on this issue to say...
>>>
>>>
>>>>  namespace page_pool {
>>>> @@ -1692,7 +1718,7 @@ static inline void* std_malloc(size_t size, size_t alignment)
>>>>                                         memory::alloc_page());
>>>>          trace_memory_malloc_page(ret, size, mmu::page_size, alignment);
>>>>      } else {
>>>> -        ret = memory::malloc_large(size, alignment);
>>>> +        ret = memory::malloc_large(size, alignment, true, false);
>>>>      }
>>>>      memory::tracker_remember(ret, size);
>>>>      return ret;
>>>> @@ -1714,6 +1740,14 @@ void* calloc(size_t nmemb, size_t size)
>>>>
>>>>  static size_t object_size(void *object)
>>>>  {
>>>> +    if (!mmu::is_linear_mapped(object, 0)) {
>>>> +        auto original_object = object;
>>>> +        object = align_down(object - 1, mmu::page_size);
>>>> +        size_t offset = reinterpret_cast<uint64_t>(original_object) - reinterpret_cast<uint64_t>(object);
>>>> +        size_t* ret_header = static_cast<size_t*>(object);
>>>> +        return *ret_header - offset;
>>>>
>>>
>>> Can you just call large_object_size() here instead of duplicating its 
>>> code?
>>> Just like you do below in the mmu::mem_area::main case?
>>>   
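(i.e., something like the following, assuming the size_t header written by
mapped_malloc_large() and the page_range header stay layout-compatible - I
will double-check that for V5:)

    if (!mmu::is_linear_mapped(object, 0)) {
        return memory::large_object_size(object);
    }
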
>>>
>>>> +    }
>>>> +
>>>>      switch (mmu::get_mem_area(object)) {
>>>>      case mmu::mem_area::main:
>>>>          return memory::large_object_size(object);
>>>> @@ -1763,6 +1797,14 @@ void free(void* object)
>>>>          return;
>>>>      }
>>>>      memory::tracker_forget(object);
>>>> +
>>>> +    if (!mmu::is_linear_mapped(object, 0)) {
>>>> +        object = align_down(object - 1, mmu::page_size);
>>>> +        size_t* ret_header = static_cast<size_t*>(object);
>>>> +        mmu::munmap(object, *ret_header);
>>>> +        return;
>>>>
>>>
>>> Nitpick: it would be nice to put this code in a function mapped_free_large(),
>>> and put this function next to mapped_malloc_large() which you already have -
>>> so a reader can see the two of them next to each other.
>>>
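(A sketch of that helper, extracted verbatim from the free() path above - name
as per your suggestion:)

    static void mapped_free_large(void* object)
    {
        object = align_down(object - 1, mmu::page_size);
        size_t* ret_header = static_cast<size_t*>(object);
        mmu::munmap(object, *ret_header);
    }
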
>>>> +    }
>>>> +
>>>>      switch (mmu::get_mem_area(object)) {
>>>>      case mmu::mem_area::page:
>>>>          object = mmu::translate_mem_area(mmu::mem_area::page,
>>>> @@ -1972,7 +2014,7 @@ void* alloc_phys_contiguous_aligned(size_t size, size_t align, bool block)
>>>>      assert(is_power_of_two(align));
>>>>      // make use of the standard large allocator returning properly aligned
>>>>      // physically contiguous memory:
>>>> -    auto ret = malloc_large(size, align, block);
>>>> +    auto ret = malloc_large(size, align, block, true);
>>>>      assert (!(reinterpret_cast<uintptr_t>(ret) & (align - 1)));
>>>>      return ret;
>>>>  }
>>>> diff --git a/modules/tests/Makefile b/modules/tests/Makefile
>>>> index ce004339..10df022f 100644
>>>> --- a/modules/tests/Makefile
>>>> +++ b/modules/tests/Makefile
>>>> @@ -129,7 +129,7 @@ tests := tst-pthread.so misc-ramdisk.so tst-vblk.so tst-bsd-evh.so \
>>>>         tst-sigaltstack.so tst-fread.so tst-tcp-cork.so tst-tcp-v6.so \
>>>>         tst-calloc.so tst-crypt.so tst-non-fpic.so tst-small-malloc.so \
>>>>         tst-mmx-fpu.so tst-getopt.so tst-getopt-pie.so tst-non-pie.so tst-semaphore.so \
>>>> -       tst-elf-init.so
>>>> +       tst-elf-init.so tst-realloc.so
>>>>  #      libstatic-thread-variable.so tst-static-thread-variable.so \
>>>>
>>>>  tests += testrunner.so
>>>> diff --git a/tests/tst-realloc.cc b/tests/tst-realloc.cc
>>>> new file mode 100644
>>>> index 00000000..49edfb7a
>>>> --- /dev/null
>>>> +++ b/tests/tst-realloc.cc
>>>> @@ -0,0 +1,94 @@
>>>> +/*
>>>> +* Copyright (C) 2020 Waldemar Kozaczuk
>>>> +*
>>>> +* This work is open source software, licensed under the terms of the
>>>> +* BSD license as described in the LICENSE file in the top-level directory.
>>>> +*/
>>>> +
>>>> +#include <stdlib.h>
>>>> +#include <string.h>
>>>> +#include <cassert>
>>>> +#include <algorithm>
>>>> +#include <iostream>
>>>> +
>>>> +extern "C" size_t malloc_usable_size (void *ptr);
>>>> +
>>>> +static void test_realloc(size_t original_size, size_t new_size)
>>>> +{
>>>> +    char data[11] = "0123456789";
>>>> +
>>>> +    void *original_buf = malloc(original_size);
>>>> +    assert(original_buf);
>>>> +
>>>> +    char *buf = static_cast<char*>(original_buf);
>>>> +    for (size_t i = 0; i < original_size; i++) {
>>>> +        buf[i] = data[i % 10];
>>>> +    }
>>>> +
>>>> +    void *new_buf = realloc(original_buf, new_size);
>>>> +    assert(new_buf);
>>>> +
>>>> +    auto expected_same_data_len = std::min(original_size, new_size);
>>>> +    buf = static_cast<char*>(new_buf);
>>>> +    for (size_t i = 0; i < expected_same_data_len; i++) {
>>>> +        assert(buf[i] == data[i % 10]);
>>>> +    }
>>>> +
>>>> +    free(new_buf);
>>>> +
>>>> +    std::cerr << "PASSED realloc() for original_size: " << 
>>>> original_size << ", new_size: " << new_size << std::endl;
>>>> +}
>>>> +
>>>> +static void test_usable_size(size_t size, size_t expected_usable_size)
>>>> +{
>>>> +    void* ptr = malloc(size);
>>>> +    assert(expected_usable_size == malloc_usable_size(ptr));
>>>> +    free(ptr);
>>>> +
>>>> +    std::cerr << "PASSED malloc_usable_size() for size: " << size << 
>>>> std::endl;
>>>> +}
>>>> +
>>>> +int main()
>>>> +{
>>>> +    test_realloc(1,2);
>>>> +    test_realloc(2,1);
>>>> +
>>>> +    test_realloc(4,7);
>>>> +    test_realloc(7,4);
>>>> +
>>>> +    test_realloc(63,128);
>>>> +    test_realloc(128,63);
>>>> +
>>>> +    test_realloc(4000,5000);
>>>> +    test_realloc(5000,4000);
>>>> +
>>>> +    test_realloc(4096,4096);
>>>> +
>>>> +    test_realloc(0x100000,0x100000);
>>>> +    test_realloc(0x100000,0x100900);
>>>> +    test_realloc(0x100900,0x100000);
>>>> +
>>>> +    test_realloc(0x200000,0x200000);
>>>> +    test_realloc(0x200000,0x300900);
>>>> +    test_realloc(0x300900,0x200000);
>>>> +
>>>> +    test_realloc(0x600900,0x600000);
>>>> +    test_realloc(0x400000,0x600000);
>>>> +    test_realloc(0x600000,0x400900);
>>>> +
>>>> +    void *buf = realloc(nullptr, 0);
>>>> +    assert(buf);
>>>> +    free(buf);
>>>> +
>>>> +    buf = malloc(16);
>>>> +    assert(!realloc(buf, 0));
>>>> +
>>>> +    test_usable_size(1, 8);
>>>> +    test_usable_size(8, 8);
>>>> +    test_usable_size(67, 128);
>>>> +    test_usable_size(0x4010, 0x4FC0);
>>>> +    test_usable_size(0x100000, 0x100FC0);
>>>> +    test_usable_size(0x200000, 0x200FC0);
>>>> +
>>>> +    std::cerr << "PASSED\n";
>>>> +    return 0;
>>>> +}
>>>> \ No newline at end of file
>>>> -- 
>>>> 2.20.1
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "OSv Development" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to osv...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/osv-dev/20200326183626.12013-1-jwkozaczuk%40gmail.com
>>>> .
>>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "OSv Development" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to osv...@googlegroups.com <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/osv-dev/29288786-0656-4ab7-b068-c6b249e2dfa3%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/osv-dev/29288786-0656-4ab7-b068-c6b249e2dfa3%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

