In the special "pmd" mode of knuma_scand
(/sys/kernel/mm/autonuma/knuma_scand/pmd == 1), the pmd may be of numa
type (_PAGE_PRESENT not set) while the pte below it is still present.
gup_pmd_range() must therefore return 0 in this case, so that gup_fast
does not lose a NUMA hinting page fault.

Note: gup_fast already skips over non-present ptes (such as numa
types), so no explicit check is needed for the pte_numa case. gup_fast
also skips over THP when the trans huge pmd is non-present, so the
pmd_numa case of a trans huge pmd is skipped correctly too, with no
additional code changes required.
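
As an illustration only, the bail-out condition described above can be
modelled in user space. This is a minimal sketch, not the kernel code:
the SKETCH_* flag bits, the sketch_pmd_t type and every sketch_*
helper are hypothetical names invented here, and the assumption that a
numa pmd has its NUMA bit set and its PRESENT bit clear is taken from
the description above rather than from the real pmd layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's real pmd layout. */
#define SKETCH_PAGE_PRESENT   (1u << 0)
#define SKETCH_PAGE_NUMA      (1u << 1)
#define SKETCH_PAGE_SPLITTING (1u << 2)

typedef unsigned int sketch_pmd_t;

static bool sketch_pmd_none(sketch_pmd_t pmd)
{
	return pmd == 0;
}

static bool sketch_pmd_trans_splitting(sketch_pmd_t pmd)
{
	return (pmd & SKETCH_PAGE_SPLITTING) != 0;
}

/* Assumption: a numa pmd has the NUMA bit set and the PRESENT bit clear. */
static bool sketch_pmd_numa(sketch_pmd_t pmd)
{
	return (pmd & (SKETCH_PAGE_NUMA | SKETCH_PAGE_PRESENT)) ==
		SKETCH_PAGE_NUMA;
}

/* Returns 0 when the fast path must bail out, 1 when it may descend. */
static int sketch_gup_pmd_may_descend(sketch_pmd_t pmd)
{
	if (sketch_pmd_none(pmd) || sketch_pmd_trans_splitting(pmd) ||
	    sketch_pmd_numa(pmd))
		return 0;
	return 1;
}

/* Small self-check covering the three cases discussed in the text. */
static void sketch_self_check(void)
{
	assert(sketch_gup_pmd_may_descend(SKETCH_PAGE_PRESENT) == 1);
	assert(sketch_gup_pmd_may_descend(SKETCH_PAGE_NUMA) == 0);
	assert(sketch_gup_pmd_may_descend(0) == 0);
}
```

In this model a present pmd lets the walk continue to the pte level,
while an empty, splitting or numa pmd makes the fast path return 0 so
the caller falls back to the slow path, where the NUMA hinting page
fault can be taken.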

Acked-by: Rik van Riel <r...@redhat.com>
Signed-off-by: Andrea Arcangeli <aarca...@redhat.com>
---
 arch/x86/mm/gup.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index dd74e46..02c5ec5 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -163,8 +163,19 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
                 * can't because it has irq disabled and
                 * wait_split_huge_page() would never return as the
                 * tlb flush IPI wouldn't run.
+                *
+                * The pmd_numa() check is needed because the code
+                * doesn't check the _PAGE_PRESENT bit of the pmd if
+                * the gup_pte_range() path is taken. NOTE: not all
+                * gup_fast users will access the page contents
+                * using the CPU through the NUMA memory channels like
+                * KVM does. So we're forced to trigger NUMA hinting
+                * page faults unconditionally for all gup_fast users
+                * even though NUMA hinting page faults aren't useful
+                * to I/O drivers that will access the page with DMA
+                * and not with the CPU.
                 */
-               if (pmd_none(pmd) || pmd_trans_splitting(pmd))
+               if (pmd_none(pmd) || pmd_trans_splitting(pmd) || pmd_numa(pmd))
                        return 0;
                if (unlikely(pmd_large(pmd))) {
                        if (!gup_huge_pmd(pmd, addr, next, write, pages, nr))