On 10/11/2012 02:11 PM, Michal Hocko wrote:
> On Mon 08-10-12 14:06:10, Glauber Costa wrote:
>> This patch adds the basic infrastructure for the accounting of the slab
>> caches. To control that, the following files are created:
>>
>>  * memory.kmem.usage_in_bytes
>>  * memory.kmem.limit_in_bytes
>>  * memory.kmem.failcnt
>>  * memory.kmem.max_usage_in_bytes
>>
>> They have the same meaning of their user memory counterparts. They
>> reflect the state of the "kmem" res_counter.
>>
>> Per cgroup slab memory accounting is not enabled until a limit is set
> 
> s/slab/kmem/ right?
> 
right.

>> +static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
>> +{
>> +    int ret = -EINVAL;
>> +#ifdef CONFIG_MEMCG_KMEM
>> +    struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>> +    /*
>> +     * For simplicity, we won't allow this to be disabled.  It also can't
>> +     * be changed if the cgroup has children already, or if tasks had
>> +     * already joined.
>> +     *
>> +     * If tasks join before we set the limit, a person looking at
>> +     * kmem.usage_in_bytes will have no way to determine when it took
>> +     * place, which makes the value quite meaningless.
>> +     *
>> +     * After it first became limited, changes in the value of the limit are
>> +     * of course permitted.
>> +     *
>> +     * Taking the cgroup_lock is really offensive, but it is so far the only
>> +     * way to guarantee that no children will appear. There are plenty of
>> +     * other offenders, and they should all go away. Fine grained locking
>> +     * is probably the way to go here. When we are fully hierarchical, we
>> +     * can also get rid of the use_hierarchy check.
>> +     */
>> +    cgroup_lock();
>> +    mutex_lock(&set_limit_mutex);
>> +    if (!memcg->kmem_accounted && val != RESOURCE_MAX) {
> 
> Just a nit but wouldn't memcg_kmem_is_accounted(memcg) be better than
> directly checking kmem_accounted?
> Besides that I am not sure I fully understand RESOURCE_MAX test. Say I
> want to have kmem accounting for monitoring so I do 
> echo -1 > memory.kmem.limit_in_bytes
> 
> so you set the value but do not activate it. Isn't this just a reminder
> from the time when the accounting could be deactivated?
> 

No, not at all.

I see you have talked about that in other e-mails, (I was on sick leave
yesterday), so let me consolidate it all here:

What we discussed before, regarding to echo -1 > ... was around the
disable code, something that we no longer allow. So now, if you will
echo -1 to that file *after* it is limited, you get in track only mode.

But for you to start that, you absolutely have to write something
different than -1.

Just one example: libcgroup, regardless of how lame we think it is in
this regard, will write to all cgroup files by default when a file is
updated. If you haven't written anything, it will still write the same
value that the file had before.

This means that an already deployed libcg-managed installation will
suddenly enable kmem for every cgroup. Sure this can be fixed in
userspace, but:

1) There is no reason to break it, if we can
2) It is perfectly reasonable to expect that if you write to a file the
same value that was already there, nothing happens.

I'll update the docs to say that you can just write -1 *after* it is
limited, but i believe enabling it has to be a very clear transition,
for sanity's sake.

>> +            if (cgroup_task_count(cont) || (memcg->use_hierarchy &&
>> +                                            !list_empty(&cont->children))) {
>> +                    ret = -EBUSY;
>> +                    goto out;
>> +            }
>> +            ret = res_counter_set_limit(&memcg->kmem, val);
> 
> VM_BUG_IN(ret) ?
> There shouldn't be any usage when you enable it or something bad is
> going on.
>
Good point, this is indeed an impossible scenario I was just being
overcautious about.

_______________________________________________
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel

Reply via email to