On 2015/04/25 5:01, Andrew Morton wrote:
> On Fri, 24 Apr 2015 17:58:33 +0800 Gu Zheng <[email protected]> wrote:
>
>> Since the change to the cpu <--> mapping (map the cpu to the physical
>> node for all possible at the boot), the node of cpu may be not present,
>> so we use the best near online node if the node is not present in the
>> low level allocation APIs.
>>
>> ...
>>
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -298,9 +298,31 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
>>  	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
>>  }
>>
>> +static int find_near_online_node(int node)
>> +{
>> +	int n, val;
>> +	int min_val = INT_MAX;
>> +	int best_node = -1;
>> +
>> +	for_each_online_node(n) {
>> +		val = node_distance(node, n);
>> +
>> +		if (val < min_val) {
>> +			min_val = val;
>> +			best_node = n;
>> +		}
>> +	}
>> +
>> +	return best_node;
>> +}
>
> This should be `inline' if it's in a header file.
>
> But it is far too large to be inlined anyway - please move it to a .c file.
>
> And please document it.  A critical thing to describe is how we determine
> whether a node is "near".  There are presumably multiple ways in which we
> could decide that a node is "near" (number of hops, minimum latency, ...).
> Which one did you choose, and why?
>
>>  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>  					unsigned int order)
>>  {
>> +	/* Offline node, use the best near online node */
>> +	if (!node_online(nid))
>> +		nid = find_near_online_node(nid);
>> +
>>  	/* Unknown node is current node */
>>  	if (nid < 0)
>>  		nid = numa_node_id();
>> @@ -311,7 +333,11 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>  static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>>  					unsigned int order)
>>  {
>> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>> +	/* Offline node, use the best near online node */
>> +	if (!node_online(nid))
>> +		nid = find_near_online_node(nid);
In the above VM_BUG_ON(), !node_online(nid) is the check that trips -- that is the bug being papered over.
>> +
>> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>>
>>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>  }
>
> Ouch.  These functions are called very frequently, and adding overhead
> to them is a big deal.  And the patch even adds overhead to non-x86
> architectures which don't benefit from it!
>
> Is there no way this problem can be fixed somewhere else?  Preferably
> by fixing things up at hotplug time.
I agree. The results should be cached, if necessary in a per-cpu variable.
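For illustration, a minimal sketch of the cached approach (the table and
helper names below are hypothetical, not part of the posted patch): rebuild
a nid -> nearest-online-nid table from node_distance() in the node hotplug
path, so the allocator fast path is reduced to a single table lookup.

	/* Hypothetical sketch only -- not the posted patch. */
	static int nearest_online_node[MAX_NUMNODES];

	/*
	 * Recompute the whole table from node_distance() (the same SLIT
	 * metric the posted patch uses).  Called from the node
	 * online/offline notifiers, so the O(nodes^2) scan stays off the
	 * allocation path.
	 */
	static void update_nearest_online_nodes(void)
	{
		int node, n;

		for_each_node(node) {
			int min_val = INT_MAX, best = NUMA_NO_NODE;

			for_each_online_node(n) {
				int val = node_distance(node, n);

				if (val < min_val) {
					min_val = val;
					best = n;
				}
			}
			nearest_online_node[node] = best;
		}
	}

	/* Fast path: one array load instead of an online-node scan. */
	static inline int numa_nearest_online(int nid)
	{
		return node_online(nid) ? nid : nearest_online_node[nid];
	}

A per-cpu copy of the result, as suggested above, could avoid even the
shared cacheline if that ever showed up in profiles.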
Thanks,
-Kame
