Hello,

On Wed, Oct 31, 2012 at 02:59:07PM -0700, Paul Turner wrote:
> On Wed, Oct 31, 2012 at 2:35 PM, Andrew Morton
> <a...@linux-foundation.org> wrote:
> >
> > On Tue, 30 Oct 2012 10:29:54 +0900
> > Minchan Kim <minc...@kernel.org> wrote:
> >
> > > This patch introudces new madvise behavior MADV_VOLATILE and
> > > MADV_NOVOLATILE for anonymous pages. It's different with
> > > John Stultz's version which considers only tmpfs while this patch
> > > considers only anonymous pages so this cannot cover John's one.
> > > If below idea is proved as reasonable, I hope we can unify both
> > > concepts by madvise/fadvise.
> > >
> > > Rationale is following as.
> > > Many allocators call munmap(2) when user call free(3) if ptr is
> > > in mmaped area. But munmap isn't cheap because it have to clean up
> > > all pte entries and unlinking a vma so overhead would be increased
> > > linearly by mmaped area's size.
> >
> > Presumably the userspace allocator will internally manage memory in
> > large chunks, so the munmap() call frequency will be much lower than
> > the free() call frequency. So the performance gains from this change
> > might be very small.
>
> I don't think I strictly understand the motivation from a
> malloc-standpoint here.
>
> These days we (tcmalloc) use madvise(..., MADV_DONTNEED) when we want
> to perform discards on Linux. For any reasonable allocator (short
> of binding malloc --> mmap, free --> unmap) this seems a better
> choice.
>
> Note also from a performance stand-point I doubt any allocator (which
> case about performance) is going to want to pay the cost of even a
> null syscall about typical malloc/free usage (consider: a tcmalloc
> malloc/free pairis currently <20ns).

Good point.

> Given then that this cost is
> amortized once you start doing discards on larger blocks MADV_DONTNEED
> seems a preferable interface:
> - You don't need to reconstruct an arena when you do want to allocate
> since there's no munmap/mmap for the region to change about
> - There are no syscalls involved in later reallocating the block.

The above benefits apply to MADV_VOLATILE, too.
But as you pointed out, it has a little more overhead than DONTNEED,
because the allocator has to call madvise(MADV_NOVOLATILE) before
allocation. madvise(NOVOLATILE) only marks a vma flag, but it does need
mmap_sem, and that could be a problem on parallel malloc/free workloads,
as KOSAKI pointed out.

In that case, we can change the semantics so malloc doesn't need to call
madvise(NOVOLATILE) before allocating. Then the page fault handler has to
check whether the fault was caused by an access to a volatile vma. If so,
it could return a zero page instead of SIGBUS and mark the vma as no
longer volatile.

> The only real additional cost is address-space. Are you strongly
> concerned about the 32-bit case?

No. I believe allocators have logic to clean them up once the address
space is almost full.

Thanks, Paul.

-- 
Kind regards,
Minchan Kim
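
For reference, the MADV_DONTNEED discard pattern Paul describes can be
sketched roughly as below. This is only an illustration under stated
assumptions, not tcmalloc's actual code: chunk_create(), chunk_discard()
and CHUNK_SIZE are hypothetical names made up for the example, and it
relies on the Linux behavior that private anonymous pages read back as
zero-filled after MADV_DONTNEED. MADV_VOLATILE/MADV_NOVOLATILE are not
used because they exist only in the proposed patch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical chunk size (1 MiB), chosen only for illustration. */
#define CHUNK_SIZE (1UL << 20)

/* Map a private anonymous region standing in for an allocator chunk. */
static unsigned char *chunk_create(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/* Discard the chunk's pages; the mapping (vma) itself stays in place. */
static int chunk_discard(unsigned char *chunk, size_t len)
{
    return madvise(chunk, len, MADV_DONTNEED);
}

int main(void)
{
    unsigned char *chunk = chunk_create(CHUNK_SIZE);
    if (chunk == NULL) {
        perror("mmap");
        return 1;
    }

    /* Touch every page, as if the chunk had been handed out by malloc. */
    memset(chunk, 0xAB, CHUNK_SIZE);

    /* "free": one syscall to drop the pages, no munmap. */
    if (chunk_discard(chunk, CHUNK_SIZE) != 0) {
        perror("madvise");
        return 1;
    }
    printf("first byte after discard: %u\n", (unsigned)chunk[0]);

    /* "malloc" again: just touch the memory, no syscall is needed. */
    chunk[0] = 42;
    printf("first byte after reuse: %u\n", (unsigned)chunk[0]);

    munmap(chunk, CHUNK_SIZE);
    return 0;
}
```

The reuse step is where this differs from the current MADV_VOLATILE
proposal: with DONTNEED the allocator simply touches the memory again,
whereas with VOLATILE it would first have to call
madvise(MADV_NOVOLATILE), unless the page fault handler change
discussed above is adopted.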