On Wed, 13 Feb 2008, Paul Jackson wrote:

> Yes, if an application considers nodes to be interchangeable, I'm
> trying to avoid having that application -have- to know its current
> cpuset placement, for two reasons:
> 
>     For one thing, it's racy.  Its cpuset placement could change,
>     unbeknownst to it, between the time it queried it, and the time
>     that it issued the mbind or set_mempolicy call.
> 
>     For the other thing, it's not always possible.  If the application
>     is currently in a cpuset that is smaller than its preferred
>     configuration, it would not be possible to express its preferred
>     memory policies using just the smaller number of memory nodes
>     allowed by its current cpuset placement.  How do you say "put
>     this on my third node" if you don't have a third node and you
>     can only speak of the nodes you currently have?
> 

So let's say, like my first example from the previous email, that you have 
MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES over nodes 3-4 and your cpuset's 
mems is only nodes 5-7.  This would interleave over no nodes.  Correct?
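
To make that reading concrete, here is a minimal sketch, not kernel code. 
It assumes relative node n simply means the n-th node of the cpuset's mems 
and that relative positions beyond the cpuset are dropped rather than 
folded back in; the helper name relative_onto() and the 64-bit mask type 
are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: map a relative node mask onto
 * the set of allowed nodes.  Bit n of 'relative' selects the n-th lowest
 * set bit of 'allowed'.  Relative positions past the number of allowed
 * nodes are dropped (an assumption; wrapping them with a modulo would be
 * another possible semantic).
 */
static uint64_t relative_onto(uint64_t relative, uint64_t allowed)
{
	uint64_t result = 0;
	unsigned int pos = 0;	/* ordinal of the current allowed node */

	for (unsigned int node = 0; node < 64; node++) {
		if (!(allowed & (UINT64_C(1) << node)))
			continue;
		if (relative & (UINT64_C(1) << pos))
			result |= UINT64_C(1) << node;
		pos++;
	}
	return result;
}
```

Under those assumptions, relative nodes 3-4 (mask 0x18) over mems 5-7 
(mask 0xe0) yield an empty mask, while relative nodes 0-1 (mask 0x03) 
would land on nodes 5-6.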

It seems like MPOL_F_RELATIVE_NODES is primarily designed to maintain a 
certain order among the nodes the mempolicy applies to.  It comes 
with the premise that the task doesn't already know its cpuset mems 
(otherwise, the current implementation without MPOL_F_STATIC_NODES would 
work fine for this), so it doesn't really care what nodes it allocates 
pages on; it just cares about the order.

This works for MPOL_PREFERRED and MPOL_BIND as well, right?

I don't understand the use case for this (at all), but if you have 
workloads that require this type of setting then I can implement this as 
part of my series.  I just want to confirm that there are real-world cases 
backing this so that we don't end up with flags for highly specialized 
corner cases.

 [ If a user _does_ specify MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES
   as part of their syscall, then we'll simply return -EINVAL. ]
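
A rough sketch of that check, for illustration only: the bit values 
below are placeholders, and check_mode_flags() is a hypothetical name, 
not a function from the series.

```c
#include <errno.h>

/* Placeholder bit values for illustration; the real values are not given here. */
#define MPOL_F_STATIC_NODES	(1 << 15)
#define MPOL_F_RELATIVE_NODES	(1 << 14)

/*
 * Hypothetical check: reject a request that sets both optional flags.
 * Returns 0 if the combination is acceptable, -EINVAL otherwise.
 */
static int check_mode_flags(int mode_and_flags)
{
	const int both = MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES;

	if ((mode_and_flags & both) == both)
		return -EINVAL;
	return 0;
}
```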

> > Well, I didn't cave on anything 
> 
> ;)   Your simple "ok" was ambiguous enough that we were able to
>      read into it whatever we wanted to.
> 
> But I've made my case on that issue (involving the separate or
> packed policy flag field).  So I probably won't say more, and
> I expect to live with whatever you choose, after any further
> input from Lee or others.
> 

Well, there are advantages and disadvantages to either approach.

My preference (both mode and flags stored in the same member of struct 
mempolicy):

   Advantages:

        - completely consistent with the userspace API of passing modes
          and flags together in a pointer to an int, and

        - does not require additional formals to be added to several
          functions, including functions outside mm/mempolicy.c.

   Disadvantage:

        - use of mpol_mode() throughout mm/mempolicy.c code to mask
          off optional mode flags for conditionals or switch statements.

Your preference (separate mode and flags members in struct mempolicy):

   Advantage:

        - clearer implementation when dealing with modes: all existing
          statements involving pol->policy can remain unchanged.

   Disadvantages:

        - requires additional formals to be added to several functions,
          including functions outside mm/mempolicy.c, and

        - takes additional space in struct mempolicy (two bytes) which
          could eventually be used for something else.

In both cases the testing of mode flags is the same as before:

        if (pol->policy & MPOL_F_STATIC_NODES) {
                ...
        }
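
To illustrate how the two layouts differ when testing the mode itself, 
here is a minimal sketch.  The struct names, the MPOL_MODE_FLAGS mask, 
the numeric values, and the body of mpol_mode() are all assumptions made 
for the example; only the name mpol_mode() comes from the discussion 
above.

```c
/* Placeholder values for illustration; not taken from any header. */
#define MPOL_INTERLEAVE		3
#define MPOL_F_STATIC_NODES	(1 << 15)
#define MPOL_MODE_FLAGS		(MPOL_F_STATIC_NODES)

/* Option A: mode and flags packed into one member. */
struct mempolicy_packed {
	unsigned short policy;	/* mode in the low bits, flags in the high bits */
};

/* Mask off the optional flags, leaving only the mode. */
static inline int mpol_mode(unsigned short policy)
{
	return policy & ~MPOL_MODE_FLAGS;
}

/* Option B: separate mode and flags members. */
struct mempolicy_split {
	unsigned short policy;	/* mode only */
	unsigned short flags;	/* optional mode flags */
};

/* Packed layout: every comparison against a mode goes through mpol_mode(). */
static int is_interleave_packed(const struct mempolicy_packed *pol)
{
	return mpol_mode(pol->policy) == MPOL_INTERLEAVE;
}

/* Split layout: existing comparisons on pol->policy stay as they are. */
static int is_interleave_split(const struct mempolicy_split *pol)
{
	return pol->policy == MPOL_INTERLEAVE;
}
```

With the packed layout, a direct comparison such as 
pol->policy == MPOL_INTERLEAVE stops matching once a flag is set, which 
is why the mask has to be applied in conditionals and switch statements.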