Mel Gorman <mgor...@suse.de> writes:

> On Wed, Oct 01, 2014 at 09:18:25AM -0700, Linus Torvalds wrote:
>> On Wed, Oct 1, 2014 at 9:01 AM, Linus Torvalds
>> <torva...@linux-foundation.org> wrote:
>> >
>> > We need to get rid of it, and just make it the same as pte_protnone().
>> > And then the real protnone is in the vma flags, and if you actually
>> > ever get to a pte that is marked protnone, you know it's a numa page.
>>
>> So I'd really suggest we do exactly that. Get rid of "pte_numa()"
>> entirely, get rid of "_PAGE_[BIT_]NUMA" entirely, and instead add a
>> "pte_protnone()" helper to check for the "protnone" case (which on x86
>> is testing the _PAGE_PROTNONE bit, and on most other architectures is
>> just testing that the page has no access rights).
>>
>
> Do not interpret the following as being against the idea of taking the
> pte_protnone approach. This is intended to give background.
>
> At the time the changes were made to the _PAGE_NUMA bits it was acknowledged
> that a full move to prot_none was an option but it was not the preferred
> solution at the time. It replaced one set of corner cases with another and
> the last time like this time, there was considerable time pressure. The
> VMA would be required to distinguish between a NUMA hinting fault and a
> real prot_none bit. In most cases, we have the VMA now with the exception
> of GUP. GUP would have to unconditionally go into the slow path to do the
> VMA lookup. That is not likely to be a big of a problem but it was a concern.
>
> In early implementations based on prot_none there were some VMA-based
> protection checks that had higher overhead. At the time, there were severe
> problems with overhead due to NUMA balancing and adding more was not
> desirable. This has been addressed since with changes in multiple other
> areas so it's much less of a concern now than it was. In the current shape,
> these probably is not as much a problem as long as any check on pte_numa
> was first guarded by a VMA check. One way of handling the corner cases
> where would be to pass in the VMA where available and have a VM_BUG_ON that
> fires if its a PROT_NONE VMA. That would catch problems during debugging
> without adding overhead in the !debug case.
>
> Going back to the start, the PTE bit was used as the approach due to
> concerns that a pte_protnone helper would not work on all architectures,
> ppc64 in particular. There was no PROT_NONE bit there and instead prot_none
> protections rely on PAGE_USER not being set so it's inaccessible from
> userspace. There was discussion at the time that this could conceivably be
> broken from some sub-architectures but I don't recall the details. Looking
> at the current shape and your patch, it's conceivable that the pte_protnone
> could be implemented as a _PAGE_PRESENT && !_PAGE_USER check as long
> as it was guarded by a VMA check which x86 requires anyway. Not sure
> if that would work for PMDs as I'm not familiar with with ppc64 to tell
> offhand. Alternatively, ppc64 would potentially use the bit currently used
> for _PAGE_NUMA as a _PROT_NONE bit.

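For reference, the `_PAGE_PRESENT && !_PAGE_USER` idea quoted above could look roughly like the following. This is a user-space model only, not kernel code: the bit values, `struct vma_model`, the `MODEL_*` flags and all helper names are assumptions made up for illustration, and the real ppc64 PTE layout is different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE bits; positions are illustrative, not the ppc64 layout. */
#define MODEL_PAGE_PRESENT (1u << 0)
#define MODEL_PAGE_USER    (1u << 1)

/* Hypothetical VMA access flags, loosely modelled on VM_READ/WRITE/EXEC. */
#define MODEL_VM_READ  (1u << 0)
#define MODEL_VM_WRITE (1u << 1)
#define MODEL_VM_EXEC  (1u << 2)

/* Stand-in for the part of a VMA that this check needs. */
struct vma_model {
    uint32_t vm_flags;
};

/* protnone as "present but not accessible from userspace". */
static bool model_pte_protnone(uint64_t pte)
{
    return (pte & MODEL_PAGE_PRESENT) != 0 && (pte & MODEL_PAGE_USER) == 0;
}

/* A VMA with no access rights at all is a real PROT_NONE mapping. */
static bool model_vma_accessible(const struct vma_model *vma)
{
    return (vma->vm_flags &
            (MODEL_VM_READ | MODEL_VM_WRITE | MODEL_VM_EXEC)) != 0;
}

/*
 * VMA-guarded check: a protnone PTE counts as a NUMA hinting PTE only
 * when the VMA itself is accessible.
 */
static bool model_is_numa_hinting_pte(const struct vma_model *vma, uint64_t pte)
{
    return model_vma_accessible(vma) && model_pte_protnone(pte);
}

/*
 * Debug-only variant in the spirit of the VM_BUG_ON suggestion: the
 * assert() compiles away under NDEBUG, so it adds no overhead in
 * non-debug builds.
 */
static bool model_pte_numa_checked(const struct vma_model *vma, uint64_t pte)
{
    assert(model_vma_accessible(vma)); /* fires on a PROT_NONE VMA */
    return model_pte_protnone(pte);
}
```

With this shape the PTE alone no longer says whether a fault is a NUMA hinting fault; the caller has to supply the VMA, which is the cost discussed above for GUP.
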
Are we still looking at these options? I could look at implementing the
first option, which will also enable us to free up one pte bit.

Note: Freeing up one bit will enable us to implement the soft dirty tracking
needed for CRIU.

-aneesh