Re: [ckrm-tech] [RFC][PATCH][2/4] Add RSS accounting and control

2007-02-19 Thread Vaidyanathan Srinivasan


Balbir Singh wrote:
> Vaidyanathan Srinivasan wrote:
>> Balbir Singh wrote:
>>> Paul Menage wrote:
 On 2/19/07, Balbir Singh <[EMAIL PROTECTED]> wrote:
>> More worrisome is the potential for use-after-free.  What prevents the
>> pointer at mm->container from referring to freed memory after we're 
>> dropped
>> the lock?
>>
> The container cannot be freed unless all tasks holding references to it 
> are
> gone,
 ... or have been moved to other containers. If you're not holding
 task->alloc_lock or one of the container mutexes, there's nothing to
 stop the task being moved to another container, and the container
 being deleted.

 If you're in an RCU section then you can guarantee that the container
 (that you originally read from the task) and its subsystems at least
 won't be deleted while you're accessing them, but for accounting like
 this I suspect that's not enough, since you need to be adding to the
 accounting stats on the correct container. I think you'll need to hold
 mm->container_lock for the duration of memctl_update_rss()

 Paul

>>> Yes, that sounds like the correct thing to do.
>>>
>> Accounting accuracy will anyway be affected when a process is migrated
>> while it is still allocating pages.  Having a lock here does not
>> necessarily improve the accounting accuracy.  Charges from the old
>> container would have to be moved to the new container before deletion
>> which implies all tasks have already left the container and no
>> mm_struct is holding a pointer to it.
>>
>> The only condition that will break our code will be if the container
>> pointer becomes invalid while we are updating stats.  This can be
>> prevented by RCU section as mentioned by Paul.  I believe explicit
>> lock and unlock may not provide additional benefit here.
>>
> 
> Yes, if the container pointer becomes invalid, then consider the following
> scenario
> 
> 1. Use RCU, get a reference to the container
> 2. All tasks/mm's move to newer container (and the accounting information
> moves)
> 3. Container is RCU deleted
> 4. We still charge the older container that is going to be deleted soon
> 5. Release RCU
> 6. RCU garbage collects (callback runs)
> 
> We end up charging/uncharging a soon to be deleted container, that
> is not good.
> 
> What did I miss?

You are right.  We should go with your read/write lock method.  Later
we can evaluate if using an RCU and then fixing the wrong charge will
work better or worse.

--Vaidy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [RFC][PATCH][2/4] Add RSS accounting and control

2007-02-19 Thread Balbir Singh

Vaidyanathan Srinivasan wrote:


Balbir Singh wrote:

Paul Menage wrote:

On 2/19/07, Balbir Singh <[EMAIL PROTECTED]> wrote:

More worrisome is the potential for use-after-free.  What prevents the
pointer at mm->container from referring to freed memory after we're dropped
the lock?


The container cannot be freed unless all tasks holding references to it are
gone,

... or have been moved to other containers. If you're not holding
task->alloc_lock or one of the container mutexes, there's nothing to
stop the task being moved to another container, and the container
being deleted.

If you're in an RCU section then you can guarantee that the container
(that you originally read from the task) and its subsystems at least
won't be deleted while you're accessing them, but for accounting like
this I suspect that's not enough, since you need to be adding to the
accounting stats on the correct container. I think you'll need to hold
mm->container_lock for the duration of memctl_update_rss()

Paul


Yes, that sounds like the correct thing to do.



Accounting accuracy will anyway be affected when a process is migrated
while it is still allocating pages.  Having a lock here does not
necessarily improve the accounting accuracy.  Charges from the old
container would have to be moved to the new container before deletion
which implies all tasks have already left the container and no
mm_struct is holding a pointer to it.

The only condition that will break our code will be if the container
pointer becomes invalid while we are updating stats.  This can be
prevented by RCU section as mentioned by Paul.  I believe explicit
lock and unlock may not provide additional benefit here.



Yes, if the container pointer becomes invalid, then consider the following
scenario

1. Use RCU, get a reference to the container
2. All tasks/mm's move to newer container (and the accounting information
   moves)
3. Container is RCU deleted
4. We still charge the older container that is going to be deleted soon
5. Release RCU
6. RCU garbage collects (callback runs)

We end up charging/uncharging a soon to be deleted container, that
is not good.

What did I miss?


--Vaidy




--
Warm Regards,
Balbir Singh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [RFC][PATCH][2/4] Add RSS accounting and control

2007-02-19 Thread Vaidyanathan Srinivasan


Balbir Singh wrote:
> Paul Menage wrote:
>> On 2/19/07, Balbir Singh <[EMAIL PROTECTED]> wrote:
 More worrisome is the potential for use-after-free.  What prevents the
 pointer at mm->container from referring to freed memory after we're dropped
 the lock?

>>> The container cannot be freed unless all tasks holding references to it are
>>> gone,
>> ... or have been moved to other containers. If you're not holding
>> task->alloc_lock or one of the container mutexes, there's nothing to
>> stop the task being moved to another container, and the container
>> being deleted.
>>
>> If you're in an RCU section then you can guarantee that the container
>> (that you originally read from the task) and its subsystems at least
>> won't be deleted while you're accessing them, but for accounting like
>> this I suspect that's not enough, since you need to be adding to the
>> accounting stats on the correct container. I think you'll need to hold
>> mm->container_lock for the duration of memctl_update_rss()
>>
>> Paul
>>
> 
> Yes, that sounds like the correct thing to do.
> 

Accounting accuracy will anyway be affected when a process is migrated
while it is still allocating pages.  Having a lock here does not
necessarily improve the accounting accuracy.  Charges from the old
container would have to be moved to the new container before deletion
which implies all tasks have already left the container and no
mm_struct is holding a pointer to it.

The only condition that will break our code will be if the container
pointer becomes invalid while we are updating stats.  This can be
prevented by RCU section as mentioned by Paul.  I believe explicit
lock and unlock may not provide additional benefit here.

--Vaidy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [RFC][PATCH][2/4] Add RSS accounting and control

2007-02-19 Thread Balbir Singh

Paul Menage wrote:

On 2/19/07, Balbir Singh <[EMAIL PROTECTED]> wrote:

More worrisome is the potential for use-after-free.  What prevents the
pointer at mm->container from referring to freed memory after we're dropped
the lock?


The container cannot be freed unless all tasks holding references to it are
gone,


... or have been moved to other containers. If you're not holding
task->alloc_lock or one of the container mutexes, there's nothing to
stop the task being moved to another container, and the container
being deleted.

If you're in an RCU section then you can guarantee that the container
(that you originally read from the task) and its subsystems at least
won't be deleted while you're accessing them, but for accounting like
this I suspect that's not enough, since you need to be adding to the
accounting stats on the correct container. I think you'll need to hold
mm->container_lock for the duration of memctl_update_rss()

Paul



Yes, that sounds like the correct thing to do.

--
Warm Regards,
Balbir Singh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [RFC][PATCH][2/4] Add RSS accounting and control

2007-02-19 Thread Paul Menage

On 2/19/07, Balbir Singh <[EMAIL PROTECTED]> wrote:

>
> More worrisome is the potential for use-after-free.  What prevents the
> pointer at mm->container from referring to freed memory after we're dropped
> the lock?
>

The container cannot be freed unless all tasks holding references to it are
gone,


... or have been moved to other containers. If you're not holding
task->alloc_lock or one of the container mutexes, there's nothing to
stop the task being moved to another container, and the container
being deleted.

If you're in an RCU section then you can guarantee that the container
(that you originally read from the task) and its subsystems at least
won't be deleted while you're accessing them, but for accounting like
this I suspect that's not enough, since you need to be adding to the
accounting stats on the correct container. I think you'll need to hold
mm->container_lock for the duration of memctl_update_rss()

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [RFC][PATCH][2/4] Add RSS accounting and control

2007-02-19 Thread Balbir Singh

Andrew Morton wrote:

On Mon, 19 Feb 2007 16:39:33 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:


Andrew Morton wrote:

On Mon, 19 Feb 2007 16:07:44 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:


+void memctlr_mm_free(struct mm_struct *mm)
+{
+   kfree(mm->counter);
+}
+
+static inline void memctlr_mm_assign_container_direct(struct mm_struct *mm,
+   struct container *cont)
+{
+   write_lock(&mm->container_lock);
+   mm->container = cont;
+   write_unlock(&mm->container_lock);
+}

More weird locking here.


The container field of the mm_struct is protected by a read write spin lock.

That doesn't mean anything to me.

What would go wrong if the above locking was simply removed?  And how does
the locking prevent that fault?


Some pages could charged to the wrong container. Apart from that I do not
see anything going bad (I'll double check that).


Argh.  Please, think about this.



Sure, I will. I guess I am short circuiting my thinking process :-)



That locking *doesn't do anything*.  Except for that one situation I
described: some other holder of the lock reads mm->container twice inside
the lock and requires that the value be the same both times (and that sort
of code should be converted to take a local copy, so this locking here can
be removed).



Yes, that makes sense.


+
+   read_lock(&mm->container_lock);
+   cont = mm->container;
+   read_unlock(&mm->container_lock);
+
+   if (!cont)
+   goto done;

And here.  I mean, if there was a reason for taking the lock around that
read, then testing `cont' outside the lock just invalidated that reason.


We took a consistent snapshot of cont. It cannot change outside the lock,
we check the value outside. I am sure I missed something.

If it cannot change outside the lock then we don't need to take the lock!


We took a snapshot that we thought was consistent.


Consistent with what?  That's a single-word read inside that lock.



Yes, that makes sense.


We check for the value
outside. I guess there is no harm, the worst thing that could happen
is wrong accounting during mm->container changes (when a task changes
container).


If container->lock is held when a task is removed from the
container then yes, `cont' here can refer to a container to which the task
no longer belongs.

More worrisome is the potential for use-after-free.  What prevents the
pointer at mm->container from referring to freed memory after we're dropped
the lock?



The container cannot be freed unless all tasks holding references to it are
gone, that would ensure that all mm->containers are pointing elsewhere and
never to a stale value.

I hope my short-circuited brain got this right :-)



--
Warm Regards,
Balbir Singh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [RFC][PATCH][2/4] Add RSS accounting and control

2007-02-19 Thread Andrew Morton
On Mon, 19 Feb 2007 16:39:33 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Mon, 19 Feb 2007 16:07:44 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:
> > 
>  +void memctlr_mm_free(struct mm_struct *mm)
>  +{
>  +kfree(mm->counter);
>  +}
>  +
>  +static inline void memctlr_mm_assign_container_direct(struct mm_struct 
>  *mm,
>  +struct 
>  container *cont)
>  +{
>  +write_lock(&mm->container_lock);
>  +mm->container = cont;
>  +write_unlock(&mm->container_lock);
>  +}
> >>> More weird locking here.
> >>>
> >> The container field of the mm_struct is protected by a read write spin 
> >> lock.
> > 
> > That doesn't mean anything to me.
> > 
> > What would go wrong if the above locking was simply removed?  And how does
> > the locking prevent that fault?
> > 
> 
> Some pages could charged to the wrong container. Apart from that I do not
> see anything going bad (I'll double check that).

Argh.  Please, think about this.

That locking *doesn't do anything*.  Except for that one situation I
described: some other holder of the lock reads mm->container twice inside
the lock and requires that the value be the same both times (and that sort
of code should be converted to take a local copy, so this locking here can
be removed).

>  +
>  +read_lock(&mm->container_lock);
>  +cont = mm->container;
>  +read_unlock(&mm->container_lock);
>  +
>  +if (!cont)
>  +goto done;
> >>> And here.  I mean, if there was a reason for taking the lock around that
> >>> read, then testing `cont' outside the lock just invalidated that reason.
> >>>
> >> We took a consistent snapshot of cont. It cannot change outside the lock,
> >> we check the value outside. I am sure I missed something.
> > 
> > If it cannot change outside the lock then we don't need to take the lock!
> > 
> 
> We took a snapshot that we thought was consistent.

Consistent with what?  That's a single-word read inside that lock.

> We check for the value
> outside. I guess there is no harm, the worst thing that could happen
> is wrong accounting during mm->container changes (when a task changes
> container).

If container->lock is held when a task is removed from the
container then yes, `cont' here can refer to a container to which the task
no longer belongs.

More worrisome is the potential for use-after-free.  What prevents the
pointer at mm->container from referring to freed memory after we're dropped
the lock?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [RFC][PATCH][2/4] Add RSS accounting and control

2007-02-19 Thread Andrew Morton
On Mon, 19 Feb 2007 16:07:44 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:

> >> +void memctlr_mm_free(struct mm_struct *mm)
> >> +{
> >> +  kfree(mm->counter);
> >> +}
> >> +
> >> +static inline void memctlr_mm_assign_container_direct(struct mm_struct 
> >> *mm,
> >> +  struct container *cont)
> >> +{
> >> +  write_lock(&mm->container_lock);
> >> +  mm->container = cont;
> >> +  write_unlock(&mm->container_lock);
> >> +}
> > 
> > More weird locking here.
> > 
> 
> The container field of the mm_struct is protected by a read write spin lock.

That doesn't mean anything to me.

What would go wrong if the above locking was simply removed?  And how does
the locking prevent that fault?


> >> +void memctlr_mm_assign_container(struct mm_struct *mm, struct task_struct 
> >> *p)
> >> +{
> >> +  struct container *cont = task_container(p, &memctlr_subsys);
> >> +  struct memctlr *mem = memctlr_from_cont(cont);
> >> +
> >> +  BUG_ON(!mem);
> >> +  write_lock(&mm->container_lock);
> >> +  mm->container = cont;
> >> +  write_unlock(&mm->container_lock);
> >> +}
> > 
> > And here.
> 
> Ditto.

ditto ;)

> > 
> >> +/*
> >> + * Update the rss usage counters for the mm_struct and the container it 
> >> belongs
> >> + * to. We do not fail rss for pages shared during fork (see 
> >> copy_one_pte()).
> >> + */
> >> +int memctlr_update_rss(struct mm_struct *mm, int count, bool check)
> >> +{
> >> +  int ret = 1;
> >> +  struct container *cont;
> >> +  long usage, limit;
> >> +  struct memctlr *mem;
> >> +
> >> +  read_lock(&mm->container_lock);
> >> +  cont = mm->container;
> >> +  read_unlock(&mm->container_lock);
> >> +
> >> +  if (!cont)
> >> +  goto done;
> > 
> > And here.  I mean, if there was a reason for taking the lock around that
> > read, then testing `cont' outside the lock just invalidated that reason.
> > 
> 
> We took a consistent snapshot of cont. It cannot change outside the lock,
> we check the value outside. I am sure I missed something.

If it cannot change outside the lock then we don't need to take the lock!

> 
> MEMCTLR_DONT_CHECK_LIMIT exists for the following reasons
> 
> 1. Pages are shared during fork, fork() is not failed at that point
> since the pages are shared anyway, we allow the RSS limit to be
> exceeded.
> 2. When ZERO_PAGE is added, we don't check for limits (zeromap_pte_range).
> 3. On reducing RSS (passing -1 as the value)

OK, that might make a nice comment somewhere (if it's not already there).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [RFC][PATCH][2/4] Add RSS accounting and control

2007-02-19 Thread Balbir Singh

Andrew Morton wrote:

On Mon, 19 Feb 2007 16:07:44 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:


+void memctlr_mm_free(struct mm_struct *mm)
+{
+   kfree(mm->counter);
+}
+
+static inline void memctlr_mm_assign_container_direct(struct mm_struct *mm,
+   struct container *cont)
+{
+   write_lock(&mm->container_lock);
+   mm->container = cont;
+   write_unlock(&mm->container_lock);
+}

More weird locking here.


The container field of the mm_struct is protected by a read write spin lock.


That doesn't mean anything to me.

What would go wrong if the above locking was simply removed?  And how does
the locking prevent that fault?



Some pages could charged to the wrong container. Apart from that I do not
see anything going bad (I'll double check that).




+void memctlr_mm_assign_container(struct mm_struct *mm, struct task_struct *p)
+{
+   struct container *cont = task_container(p, &memctlr_subsys);
+   struct memctlr *mem = memctlr_from_cont(cont);
+
+   BUG_ON(!mem);
+   write_lock(&mm->container_lock);
+   mm->container = cont;
+   write_unlock(&mm->container_lock);
+}

And here.

Ditto.


ditto ;)



:-)


+/*
+ * Update the rss usage counters for the mm_struct and the container it belongs
+ * to. We do not fail rss for pages shared during fork (see copy_one_pte()).
+ */
+int memctlr_update_rss(struct mm_struct *mm, int count, bool check)
+{
+   int ret = 1;
+   struct container *cont;
+   long usage, limit;
+   struct memctlr *mem;
+
+   read_lock(&mm->container_lock);
+   cont = mm->container;
+   read_unlock(&mm->container_lock);
+
+   if (!cont)
+   goto done;

And here.  I mean, if there was a reason for taking the lock around that
read, then testing `cont' outside the lock just invalidated that reason.


We took a consistent snapshot of cont. It cannot change outside the lock,
we check the value outside. I am sure I missed something.


If it cannot change outside the lock then we don't need to take the lock!



We took a snapshot that we thought was consistent. We check for the value
outside. I guess there is no harm, the worst thing that could happen
is wrong accounting during mm->container changes (when a task changes
container).


MEMCTLR_DONT_CHECK_LIMIT exists for the following reasons

1. Pages are shared during fork, fork() is not failed at that point
since the pages are shared anyway, we allow the RSS limit to be
exceeded.
2. When ZERO_PAGE is added, we don't check for limits (zeromap_pte_range).
3. On reducing RSS (passing -1 as the value)


OK, that might make a nice comment somewhere (if it's not already there).


Yes, thanks for keeping us humble and honest, I'll add it.

--
Warm Regards,
Balbir Singh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [RFC][PATCH][2/4] Add RSS accounting and control

2007-02-19 Thread Balbir Singh

Andrew Morton wrote:

On Mon, 19 Feb 2007 12:20:34 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:


This patch adds the basic accounting hooks to account for pages allocated
into the RSS of a process. Accounting is maintained at two levels, in
the mm_struct of each task and in the memory controller data structure
associated with each node in the container.

When the limit specified for the container is exceeded, the task is killed.
RSS accounting is consistent with the current definition of RSS in the
kernel. Shared pages are accounted into the RSS of each process as is
done in the kernel currently. The code is flexible in that it can be easily
modified to work with any definition of RSS.

..

+static inline int memctlr_mm_init(struct mm_struct *mm)
+{
+   return 0;
+}


So it returns zero on success.  OK.



Oops. it should return 1 on success.


--- linux-2.6.20/kernel/fork.c~memctlr-acct 2007-02-18 22:55:50.0 
+0530
+++ linux-2.6.20-balbir/kernel/fork.c   2007-02-18 22:55:50.0 +0530
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 

 #include 
@@ -342,10 +343,15 @@ static struct mm_struct * mm_init(struct
mm->free_area_cache = TASK_UNMAPPED_BASE;
mm->cached_hole_size = ~0UL;
 
+	if (!memctlr_mm_init(mm))

+   goto err;
+


But here we treat zero as an error?



It's a BUG, I'll fix it.



if (likely(!mm_alloc_pgd(mm))) {
mm->def_flags = 0;
return mm;
}
+
+err:
free_mm(mm);
return NULL;
 }

...

+int memctlr_mm_init(struct mm_struct *mm)
+{
+   mm->counter = kmalloc(sizeof(struct res_counter), GFP_KERNEL);
+   if (!mm->counter)
+   return 0;
+   atomic_long_set(&mm->counter->usage, 0);
+   atomic_long_set(&mm->counter->limit, 0);
+   rwlock_init(&mm->container_lock);
+   return 1;
+}


ah-ha, we have another Documentation/SubmitChecklist customer.

It would be more conventional to make this return -EFOO on error,
zero on success.



ok.. I'll convert the functions to be consistent with the
"return 0 on success" philosophy.


+void memctlr_mm_free(struct mm_struct *mm)
+{
+   kfree(mm->counter);
+}
+
+static inline void memctlr_mm_assign_container_direct(struct mm_struct *mm,
+   struct container *cont)
+{
+   write_lock(&mm->container_lock);
+   mm->container = cont;
+   write_unlock(&mm->container_lock);
+}


More weird locking here.



The container field of the mm_struct is protected by a read write spin lock.


+void memctlr_mm_assign_container(struct mm_struct *mm, struct task_struct *p)
+{
+   struct container *cont = task_container(p, &memctlr_subsys);
+   struct memctlr *mem = memctlr_from_cont(cont);
+
+   BUG_ON(!mem);
+   write_lock(&mm->container_lock);
+   mm->container = cont;
+   write_unlock(&mm->container_lock);
+}


And here.


Ditto.




+/*
+ * Update the rss usage counters for the mm_struct and the container it belongs
+ * to. We do not fail rss for pages shared during fork (see copy_one_pte()).
+ */
+int memctlr_update_rss(struct mm_struct *mm, int count, bool check)
+{
+   int ret = 1;
+   struct container *cont;
+   long usage, limit;
+   struct memctlr *mem;
+
+   read_lock(&mm->container_lock);
+   cont = mm->container;
+   read_unlock(&mm->container_lock);
+
+   if (!cont)
+   goto done;


And here.  I mean, if there was a reason for taking the lock around that
read, then testing `cont' outside the lock just invalidated that reason.



We took a consistent snapshot of cont. It cannot change outside the lock,
we check the value outside. I am sure I missed something.


+static inline void memctlr_double_lock(struct memctlr *mem1,
+   struct memctlr *mem2)
+{
+   if (mem1 > mem2) {
+   spin_lock(&mem1->lock);
+   spin_lock(&mem2->lock);
+   } else {
+   spin_lock(&mem2->lock);
+   spin_lock(&mem1->lock);
+   }
+}


Conventionally we take the lower-addressed lock first when doing this, not
the higher-addressed one.



Will fix this.


+static inline void memctlr_double_unlock(struct memctlr *mem1,
+   struct memctlr *mem2)
+{
+   if (mem1 > mem2) {
+   spin_unlock(&mem2->lock);
+   spin_unlock(&mem1->lock);
+   } else {
+   spin_unlock(&mem1->lock);
+   spin_unlock(&mem2->lock);
+   }
+}
+
...

retval = -ENOMEM;
+
+   if (!memctlr_update_rss(mm, 1, MEMCTLR_CHECK_LIMIT))
+   goto out;
+


Again, please use zero for success and -EFOO for error.

That way, you don't have to assume that the reason memctlr_update_rss()
failed was out-of-memory.  Just propagate the error back.



Yes, will do.


flush_dcache_page(page);
pte = get_locked_pte(mm,