On 03/13/2014 11:37 PM, Minchan Kim wrote:
> This patch is an attempt to support MADV_FREE for Linux.
>
> The rationale is as follows.
>
> Allocators call munmap(2) when the user calls free(3) if ptr is
> in an mmaped area. But munmap isn't cheap because it has to clean up
> all pte entries, unlink a vma, and return free pages to buddy,
> so the overhead increases linearly with the mmaped area's size.
> So they prefer madvise_dontneed to munmap.
>
> "dontneed" holds the read-side lock of mmap_sem, so other threads
> of the process can proceed with concurrent page faults; it is
> better than munmap if address space is not scarce.
> But the problem is that most allocators reuse that address
> space soonish, so applications see page fault, page allocation,
> and page zeroing if the allocator already called madvise_dontneed
> on the address space.
>
> To avoid that overhead, other OSes have supported MADV_FREE.
> The idea is to just mark pages as lazyfree when madvise is called
> and purge them if memory pressure happens. Otherwise, the VM doesn't
> detach pages from the address space, so the application can use
> that memory space without the above overhead.

I must be missing something.

If the application issues MADV_FREE and then writes to the MADV_FREEd
range, the kernel needs to know that the pages are no longer safe to
lazily free.  This would presumably happen via a page fault on write.
For that to happen reliably, the kernel has to write protect the pages
when MADV_FREE is called, which in turn requires flushing the TLBs.

How does this end up being faster than munmap?

--Andy
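
For concreteness, the userspace sequence in question looks roughly like
the sketch below. It shows only what the application does: map, touch,
advise, write again. It says nothing about how the kernel notices the
second write, which is exactly the open question. The helper name
reuse_after_advice() and the LAZY_ADVICE macro are made up for
illustration, and the fallback to MADV_DONTNEED when MADV_FREE is
unavailable is an assumption so that the sketch builds and runs
anywhere.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: use MADV_FREE where the headers define it, else fall
 * back to MADV_DONTNEED so the sketch still compiles. */
#ifdef MADV_FREE
#define LAZY_ADVICE MADV_FREE
#else
#define LAZY_ADVICE MADV_DONTNEED
#endif

/* Hypothetical helper: returns 0 if data written after the advice call
 * is still intact when read back, -1 on any failure. */
int reuse_after_advice(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Touch every page so the range is actually populated. */
    memset(p, 0xAA, len);

    /* "free" path of an allocator: tell the kernel the old contents
     * are no longer needed, but keep the mapping. */
    if (madvise(p, len, LAZY_ADVICE) != 0 &&
        madvise(p, len, MADV_DONTNEED) != 0) {
        /* Neither advice value was accepted by the running kernel. */
        munmap(p, len);
        return -1;
    }

    /* "malloc" path shortly afterwards: the allocator hands the same
     * range out again and the application writes to it. The kernel
     * must not discard these pages from here on. */
    memset(p, 0x55, len);

    /* The new contents must survive. */
    int ok = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0x55) {
            ok = 0;
            break;
        }
    }

    munmap(p, len);
    return ok ? 0 : -1;
}
```

Calling reuse_after_advice(16 * 4096) from a small main() exercises the
sequence over several pages. The second memset() is the write whose
detection cost I'm asking about.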