> I'm having a guess that this can be caused by use of
> alloc_pages_exact() for NoMMU private anonymous mappings.
>
> This routine causes "tail" of allocation to be returned back
> to allocator... and inserted at top of free list. Later, when
> whatever in the system makes a trivial order-0 allocation, these
> just-freed tails immediately get used (because free pages are
> inserted at end of free lists... and new pages are allocated
> from either beginning or end of free list, depending on 'cold'
> parameter). At a glance, this should have a net effect of much
> increased probability of "tail" of large allocation get used
> as a small allocation, and thus inability to rebuild a large free
> block at time when large allocation is freed.
>
> Is there any protection against this effect in the allocator
> of current kernels? (kernel of system in question is somewhat
> outdated)
Just a small followup for whoever may be interested. The guess was correct.

In the old 3.0.8 kernel running on the system in question, do_mmap_private() in mm/nommu.c did not yet use alloc_pages_exact(), but instead had its own implementation of the same logic, and there was a tunable, 'nr_trim_pages', that could alter it. In particular, setting that tunable to 0 effectively disables any freeing of tails. We set nr_trim_pages=0 and the fragmentation issue went away.

Nikita
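For reference, the tunable is exposed on NOMMU kernels via sysctl; a sketch of the workaround described above (paths assume a standard procfs layout):

```shell
# Disable trimming of mapping tails on a NOMMU kernel:
# 0 means never release excess pages back to the allocator.
echo 0 > /proc/sys/vm/nr_trim_pages

# Equivalent via sysctl:
sysctl -w vm.nr_trim_pages=0
```

The trade-off is that every private mapping then permanently occupies its full power-of-two buddy block, so this spends memory to avoid fragmentation.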