On Tue, Sep 10, 2019 at 5:15 PM Waldek Kozaczuk <jwkozac...@gmail.com> wrote:
> On Tuesday, September 10, 2019 at 2:52:52 AM UTC-4, Nadav Har'El wrote:
>>
>> On Mon, Sep 9, 2019 at 2:41 PM <pusno...@gmail.com> wrote:
>>
>>> Hi,
>>> I found that malloc returns a physical address in the mempool area and
>>> does not perform demand paging (only mmap does).
>>> Is there any reason for this design choice?
>>
>> I guess you're not really asking about *demand* paging ("swapping"),
>> because this feature is usually an unnecessary complication in
>> single-application kernels. If I understand correctly, your question is
>> more about why malloc() allocates physically contiguous memory, unlike
>> mmap().
>>
>> The answer is that we originally did this because of the issue of huge
>> pages. Modern CPUs have another level above the regular 4 KB pages - 2 MB
>> pages called "huge pages". Applications get a performance boost by using
>> huge pages, because the CPU's page-table cache (the TLB) can only fit a
>> fixed number of pages, so an application using a few huge pages instead
>> of a large number of small pages will have a higher hit rate in this
>> cache, and improved performance. So it is inefficient to satisfy an 8 KB
>> allocation with small pages (two pages that are separate in physical
>> memory but contiguous in virtual memory) - it is more efficient to set up
>> huge pages and return the 8 KB allocation as a contiguous part of such a
>> huge page. We measured this to noticeably improve (by a few percent) the
>> performance of applications which use memory in small- and medium-sized
>> allocations.
>>
>> That being said, for really large allocations - significantly over 2 MB
>> (the huge-page size) - there's no real reason why we need them to be
>> contiguous in physical memory - we can build them from 2 MB huge pages,
>> each contiguous in physical memory, even though the object as a whole is
>> not. In fact, this is *exactly* what our mmap() does. So it would be nice
>> if malloc() could fall back to calling mmap() for allocations larger than
>> some threshold (2 MB, 4 MB, or whatever). This is definitely doable - we
>> have an open issue about it:
>> https://github.com/cloudius-systems/osv/issues/854 - and it explains how
>> it can be done.
>
> Wouldn't we also have to employ the trick you suggested in issue
> https://github.com/cloudius-systems/osv/issues/143 - pre-fault the memory
> to make sure that kernel code does not access non-committed memory when
> preemption is disabled? Or does that requirement only apply to memory
> mmaped for stacks?

Most OSv kernel code runs with preemption enabled. Only a small amount of
kernel code runs with preemption disabled, and it doesn't normally access
user-allocated objects. One notable exception is the stack, which even
preemption-disabled code uses. But you're right that there may be *kernel*
code which uses malloc() with the implicit assumption that it always
returns mapped and/or physically contiguous memory. Such code should really
call alloc_phys_contiguous_aligned() but perhaps doesn't (and in any case
that function calls malloc() today :-)). I'm hoping that if we only use
mmap() for very large malloc() calls, we'll never notice any of these
problems, because the kernel is not likely to be working with very large
allocations.
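For concreteness, here is a minimal user-space sketch of the fallback idea
from issue #854 - not OSv's actual allocator code. The names big_alloc()
and big_free(), the small header, and the 4 MB cut-off are all made up for
the illustration; a real change would live inside OSv's own malloc()/free()
paths, which already track block sizes and so would not need such a header.

/*
 * Sketch only: serve "large" allocations with an anonymous, lazily
 * populated mmap() mapping (no physical contiguity needed) and small
 * ones with the regular malloc(). A small header in front of the user
 * data records the size and which path was used, so big_free() knows
 * whether to call munmap() or free().
 */
#include <stdlib.h>
#include <sys/mman.h>

#define BIG_ALLOC_THRESHOLD (4UL * 1024 * 1024)  /* hypothetical 4 MB cut-off */

struct big_header {
    size_t size;     /* total size including the header, needed by munmap() */
    int    is_mmap;  /* 1 if the block came from mmap(), 0 if from malloc() */
};

void *big_alloc(size_t size)
{
    size_t total = size + sizeof(struct big_header);
    struct big_header *h;

    if (size >= BIG_ALLOC_THRESHOLD) {
        h = mmap(NULL, total, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (h == MAP_FAILED)
            return NULL;
        h->is_mmap = 1;
    } else {
        h = malloc(total);
        if (h == NULL)
            return NULL;
        h->is_mmap = 0;
    }
    h->size = total;
    return h + 1;    /* user data starts right after the header */
}

void big_free(void *p)
{
    if (p == NULL)
        return;
    struct big_header *h = (struct big_header *)p - 1;
    if (h->is_mmap)
        munmap(h, h->size);
    else
        free(h);
}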
>>> OSv fails, even if it only uses a small portion of the allocated memory.
>>
>> In your example, if I understand correctly, you tried to allocate 512 MB
>> with only 128 MB of memory, so it's not "a small portion" of memory -
>> it's more than the memory you have :-)
>>
>> But the issue still has merit. If you tried to allocate 50 MB it might
>> still have failed, because of memory fragmentation (i.e., we have 50 MB
>> of free memory, but not contiguous in physical memory).
>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <sys/mman.h>
>>>
>>> int main()
>>> {
>>>     size_t size = 512 * 1024 * 1024;
>>>     printf("Hello from main\n");
>>>     printf("allocation %zx start\n", size);
>>>     //int *p = (int *)malloc(size); // FAIL
>>>     int *p = (int *)mmap(NULL, size, PROT_READ | PROT_WRITE,
>>>                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); // OK
>>>     printf("allocation %zx = %p\n", size, p);
>>>     *(p) = 512;
>>>     printf("access done\n");
>>>
>>>     return 0;
>>> }
>>>
>>> Thanks.
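As an aside, for anyone experimenting with this: a small variation of the
program quoted above makes the difference easy to see. OSv's mmap()
populates anonymous memory lazily, so touching only part of a large mapping
commits only the touched pages. Something like the following sketch should
therefore run fine on a 128 MB guest even though the mapping is 512 MB (the
16 MB "touch" size is an arbitrary choice for the example):

/* Map 512 MB but touch only the first 16 MB. Only the touched pages
 * should actually be committed, since the mapping is populated lazily. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t size  = 512UL * 1024 * 1024;  /* virtual size of the mapping */
    size_t touch = 16UL * 1024 * 1024;   /* small portion actually used */

    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(p, 0x5a, touch);              /* commit only `touch` bytes */
    printf("mapped %zu MB, touched %zu MB\n", size >> 20, touch >> 20);
    munmap(p, size);
    return 0;
}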