On Fri, 26 Oct 2007, Lee Schermerhorn wrote:

> That's what my "cpuset-independent interleave" patch does.  David
> doesn't like the "null node mask" interface because it doesn't work with
> libnuma.  I plan to fix that, but I'm chasing other issues.  I should
> get back to the mempol work after today.
> 

Hacking and requiring an updated version of libnuma to allow empty 
nodemasks to be passed is a poor solution; if mempolicy's are supposed to 
be independent from cpusets, then what semantics does an empty nodemask 
actually imply when using MPOL_INTERLEAVE?  To me, it means the entire 
set_mempolicy() should be a no-op, and that's exactly how mainline 
currently treats it _as_well_ as libnuma.  So justifying this change in 
the man page is respectible, but passing an empty nodemask just doesn't 
make sense.

> What I like about the cpuset independent interleave is that the "policy
> remap" when cpusets are changed is a NO-OP--no need to change the
> policy.  Just as "preferred local" policy chooses the node where the
> allocation occurs, my cpuset independent interleave patch interleaves
> across the set of nodes available at the time of the allocation.  The
> application has to specifically ask for this behavior by the null/empty
> nodemask or the TBD libnuma API.  IMO, this is the only reasonable
> interleave policy for apps running in dynamic cpusets.
> 

Passing empty nodemasks with MPOL_INTERLEAVE to set_mempolicy() is the 
only reasonable way of specifying you want, at all times, to interleave 
over all available nodes?  I doubt it.

I personally prefer an approach where cpusets take the responsibility for 
determining how policies change (they use set_mempolicy() anyway to effect 
their mems boundaries) because it's cpusets that has changed the available 
nodemask out from beneath the application.  So instead of trying to create 
a solution where cpusets impact mempolicies and mempolicies impact 
cpusets, it should only be in a single direction.  Cpusets change the 
set of available nodes and should update the attached tasks' mempolicies 
at the same time.  That's the same as saying that cpusets should be built 
on top of mempolicies, which they are, and shouldn't have any reverse 
dependency.

> An aside:  if David et al [at google] are using cpusets on fake numa for
> resource management [I don't know this is the case, but saw some
> discussions way back that indicate it might be?], then maybe this
> becomes less of an issue when control groups [a.k.a. containers] and
> memory resource controls come to fruition?
> 

Completely irrelevant; I care about the interaction between cpusets and 
mempolicies in mainline Linux.

                David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to