On Tue, Nov 20, 2012 at 05:09:18PM +0100, Ingo Molnar wrote:
>
> Ok, the patch withstood a bit more testing as well. Below is a
> v2 version of it, with a couple of cleanups (no functional
> changes).
>
> Thanks,
>
>	Ingo
>
> ----------------->
> Subject: mm, numa: Turn 4K pte NUMA faults into effective hugepage ones
> From: Ingo Molnar <mi...@kernel.org>
> Date: Tue Nov 20 15:48:26 CET 2012
>
> Reduce the 4K page fault count by looking around and processing
> nearby pages if possible.
>
> To keep the logic and cache overhead simple and straightforward
> we do a couple of simplifications:
>
>  - we only scan in the HPAGE_SIZE range of the faulting address
>  - we only go as far as the vma allows us
>
> Also simplify the do_numa_page() flow while at it and fix the
> previous double faulting we incurred due to not properly fixing
> up freshly migrated ptes.
>
> Suggested-by: Mel Gorman <mgor...@suse.de>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Peter Zijlstra <a.p.zijls...@chello.nl>
> Cc: Andrea Arcangeli <aarca...@redhat.com>
> Cc: Rik van Riel <r...@redhat.com>
> Cc: Hugh Dickins <hu...@google.com>
> Signed-off-by: Ingo Molnar <mi...@kernel.org>
> ---
>  mm/memory.c |   99 ++++++++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 64 insertions(+), 35 deletions(-)
>

This is functionally similar to what balancenuma does but there is one key
difference worth noting. I only mark the PMD pmd_numa if all the pages
pointed to by the updated[*] PTEs underneath are on the same node. The
intention is that if the workload is converged on a PMD boundary then a
migration of all the pages underneath will be remote->local copies. If the
workload is not converged on a PMD boundary and you handle all the faults
then you are potentially incurring remote->remote copies.

It also means that if the workload is not converged on the PMD boundary
then a PTE fault is just one page. With yours, it will be the full PMD
every time, right?

[*] Note I said only the updated ptes are checked. I do not check every PTE
    underneath. I could but felt the benefit would be marginal.

-- 
Mel Gorman
SUSE Labs
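
To make the "same node" condition above concrete, here is a minimal userspace
sketch of the check, not the actual balancenuma code. The struct and function
names are hypothetical and exist only for illustration; the real code works on
page tables, not on an array. Returning false when no PTE was updated is also
an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one PTE that the scanner just updated. */
struct updated_pte {
	int page_nid;	/* node the backing page currently lives on */
};

/*
 * Return true when every updated PTE under one PMD points to a page on
 * the same node, i.e. the PMD could be marked pmd_numa as a whole.
 * Only the updated PTEs are passed in; untouched PTEs are not checked.
 */
static bool pmd_range_on_one_node(const struct updated_pte *ptes, size_t nr)
{
	assert(ptes != NULL || nr == 0);

	if (nr == 0)
		return false;	/* assumption: nothing updated, nothing to batch */

	for (size_t i = 1; i < nr; i++) {
		if (ptes[i].page_nid != ptes[0].page_nid)
			return false;	/* mixed nodes: fall back to per-PTE faults */
	}
	return true;
}
```

When the check fails, each PTE fault handles a single page, which avoids
migrating a whole PMD worth of pages that may end up as remote->remote copies.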