Re: [PATCH RFC 00/14] The new slab memory controller

2019-10-03 Thread Roman Gushchin
On Thu, Oct 03, 2019 at 12:47:41PM +0200, Michal Koutný wrote:
> On Wed, Oct 02, 2019 at 10:00:07PM +0900, Suleiman Souhlal wrote:
> > kmem.slabinfo has been absolutely invaluable for debugging, in my 
> > experience.
> > I am however not aware of any automation based on it.
> My experience is the same. However, the point is that this has been
> exposed for ages, so the safe assumption is that there may be users.

Yes, but kernel memory accounting was an opt-in feature for years,
and it can also be disabled at boot time, so displaying an empty
memory.slabinfo file doesn't break the interface.

> 
> > Maybe it might be worth adding it to cgroup v2 and have a CONFIG
> > option to enable it?
> I don't think v2 file is necessary given the cost of obtaining the
> information. But I concur with the idea of making the per-object tracking
> switchable at boot time or at least CONFIGurable.

As I said, the cost is the same and has to be paid in any case,
no matter whether cgroup v1 or v2 is used. A user can dynamically switch
between v1 and v2, and there is no way to obtain this information
after the fact, so it has to be collected from the start either way.

Another concern I have is that it will require a non-trivial amount
of new code (as we no longer dynamically create and destroy kmem_caches).
It's perfectly doable, but I'm not sure we need it badly enough
to postpone merging the main part. That said, I'm happy to hear
any arguments to the contrary.

Thanks!


Re: [PATCH RFC 00/14] The new slab memory controller

2019-10-03 Thread Michal Koutný
On Wed, Oct 02, 2019 at 10:00:07PM +0900, Suleiman Souhlal wrote:
> kmem.slabinfo has been absolutely invaluable for debugging, in my experience.
> I am however not aware of any automation based on it.
My experience is the same. However, the point is that this has been
exposed for ages, so the safe assumption is that there may be users.

> Maybe it might be worth adding it to cgroup v2 and have a CONFIG
> option to enable it?
I don't think v2 file is necessary given the cost of obtaining the
information. But I concur with the idea of making the per-object tracking
switchable at boot time or at least CONFIGurable.

Michal


Re: [PATCH RFC 00/14] The new slab memory controller

2019-10-02 Thread Suleiman Souhlal
On Wed, Oct 2, 2019 at 11:09 AM Roman Gushchin  wrote:
>
> On Tue, Oct 01, 2019 at 05:12:02PM +0200, Michal Koutný wrote:
> > On Thu, Sep 05, 2019 at 02:45:44PM -0700, Roman Gushchin wrote:
> > > Roman Gushchin (14):
> > > [...]
> > >   mm: memcg/slab: use one set of kmem_caches for all memory cgroups
> > From that commit's message:
> >
> > > 6) obsoletes kmem.slabinfo cgroup v1 interface file, as there are
> > >   no per-memcg kmem_caches anymore (empty output is printed)
> >
> > The empty file means no allocations took place in the particular cgroup.
> > I find this quite a surprising change for consumers of these stats.
> >
> > I understand obtaining the same data efficiently from the proposed
> > structures is difficult, however, such a change should be avoided. (In
> > my understanding, obsoleted file ~ not available in v2, however, it
> > should not disappear from v1.)
>
> Well, my assumption is that nobody is using this file for anything except
> debugging purposes (I might be wrong; if somebody has automation based
> on it, please let me know). The number of allocations of each type per memory
> cgroup is definitely useful debug information, but currently it barely works
> (the displayed numbers mostly show the number of allocated pages, not the number
> of active objects). We can support it, but it comes with a price, and
> most users don't really need it. So I don't think it's worth making all
> allocations slower just to keep some debug interface working for some
> cgroup v1 users. Do you have examples where it's really useful and worth
> the extra CPU cost?
>
> Unfortunately, we can't enable it conditionally, as a user can switch
> between cgroup v1 and cgroup v2 memory controllers dynamically.

kmem.slabinfo has been absolutely invaluable for debugging, in my experience.
I am however not aware of any automation based on it.

Maybe it might be worth adding it to cgroup v2 and have a CONFIG
option to enable it?

-- Suleiman


Re: [PATCH RFC 00/14] The new slab memory controller

2019-10-01 Thread Roman Gushchin
On Tue, Oct 01, 2019 at 05:12:02PM +0200, Michal Koutný wrote:
> On Thu, Sep 05, 2019 at 02:45:44PM -0700, Roman Gushchin  wrote:
> > Roman Gushchin (14):
> > [...]
> >   mm: memcg/slab: use one set of kmem_caches for all memory cgroups
> From that commit's message:
> 
> > 6) obsoletes kmem.slabinfo cgroup v1 interface file, as there are
> >   no per-memcg kmem_caches anymore (empty output is printed)
> 
> The empty file means no allocations took place in the particular cgroup.
> I find this quite a surprising change for consumers of these stats.
> 
> I understand obtaining the same data efficiently from the proposed
> structures is difficult, however, such a change should be avoided. (In
> my understanding, obsoleted file ~ not available in v2, however, it
> should not disappear from v1.)

Well, my assumption is that nobody is using this file for anything except
debugging purposes (I might be wrong; if somebody has automation based
on it, please let me know). The number of allocations of each type per memory
cgroup is definitely useful debug information, but currently it barely works
(the displayed numbers mostly show the number of allocated pages, not the number
of active objects). We can support it, but it comes with a price, and
most users don't really need it. So I don't think it's worth making all
allocations slower just to keep some debug interface working for some
cgroup v1 users. Do you have examples where it's really useful and worth
the extra CPU cost?

Unfortunately, we can't enable it conditionally, as a user can switch
between cgroup v1 and cgroup v2 memory controllers dynamically.

Thanks!


Re: [PATCH RFC 00/14] The new slab memory controller

2019-10-01 Thread Michal Koutný
On Thu, Sep 05, 2019 at 02:45:44PM -0700, Roman Gushchin  wrote:
> Roman Gushchin (14):
> [...]
>   mm: memcg/slab: use one set of kmem_caches for all memory cgroups
From that commit's message:

> 6) obsoletes kmem.slabinfo cgroup v1 interface file, as there are
>   no per-memcg kmem_caches anymore (empty output is printed)

The empty file means no allocations took place in the particular cgroup.
I find this quite a surprising change for consumers of these stats.

I understand obtaining the same data efficiently from the proposed
structures is difficult, however, such a change should be avoided. (In
my understanding, obsoleted file ~ not available in v2, however, it
should not disappear from v1.)

Michal


Re: [PATCH RFC 00/14] The new slab memory controller

2019-09-19 Thread Roman Gushchin
On Fri, Sep 20, 2019 at 06:10:11AM +0900, Suleiman Souhlal wrote:
> On Fri, Sep 20, 2019 at 1:22 AM Roman Gushchin  wrote:
> >
> > On Thu, Sep 19, 2019 at 10:39:18PM +0900, Suleiman Souhlal wrote:
> > > On Fri, Sep 6, 2019 at 6:57 AM Roman Gushchin  wrote:
> > > > The patchset has been tested on a number of different workloads in our
> > > > production. In all cases, it saved hefty amounts of memory:
> > > > 1) web frontend, 650-700 Mb, ~42% of slab memory
> > > > 2) database cache, 750-800 Mb, ~35% of slab memory
> > > > 3) dns server, 700 Mb, ~36% of slab memory
> > >
> > > Do these workloads cycle through a lot of different memcgs?
> >
> > Not really, those are just plain services managed by systemd.
> > They aren't restarted too often, maybe several times per day at most.
> >
> > Also, there is nothing fb-specific. You can take any modern
> > distribution (I've tried Fedora 30), boot it up and look at the
> > amount of slab memory. The numbers are roughly the same.
> 
> Ah, ok.
> These numbers are kind of surprising to me.
> Do you know if the savings are similar if you use CONFIG_SLAB instead
> of CONFIG_SLUB?

I only did brief testing of the SLAB version: the savings were there; the numbers
were slightly less impressive, but still in the double digits of percent.

> 
> > > For workloads that don't, wouldn't this approach potentially use more
> > > memory? For example, a workload where everything is in one or two
> > > memcgs, and those memcgs last forever.
> > >
> >
> > Yes, it's true, if you have a very small and fixed number of memory cgroups,
> > in theory the new approach can take ~10% more memory.
> >
> > I don't think it's such a big problem though: it seems that the majority
> > of cgroup users have a lot of them, and they are dynamically created and
> > destroyed by systemd/kubernetes/whatever else.
> >
> > And if somebody has a very special setup with only 1-2 cgroups, arguably
> > kernel memory accounting isn't such a big deal for them, so it can simply be
> > disabled. Am I wrong, and do you have a real-life example?
> 
> No, I don't have any specific examples.
> 
> -- Suleiman


Re: [PATCH RFC 00/14] The new slab memory controller

2019-09-19 Thread Suleiman Souhlal
On Fri, Sep 20, 2019 at 1:22 AM Roman Gushchin  wrote:
>
> On Thu, Sep 19, 2019 at 10:39:18PM +0900, Suleiman Souhlal wrote:
> > On Fri, Sep 6, 2019 at 6:57 AM Roman Gushchin  wrote:
> > > The patchset has been tested on a number of different workloads in our
> > > production. In all cases, it saved hefty amounts of memory:
> > > 1) web frontend, 650-700 Mb, ~42% of slab memory
> > > 2) database cache, 750-800 Mb, ~35% of slab memory
> > > 3) dns server, 700 Mb, ~36% of slab memory
> >
> > Do these workloads cycle through a lot of different memcgs?
>
> Not really, those are just plain services managed by systemd.
> They aren't restarted too often, maybe several times per day at most.
>
> Also, there is nothing fb-specific. You can take any modern
> distribution (I've tried Fedora 30), boot it up and look at the
> amount of slab memory. The numbers are roughly the same.

Ah, ok.
These numbers are kind of surprising to me.
Do you know if the savings are similar if you use CONFIG_SLAB instead
of CONFIG_SLUB?

> > For workloads that don't, wouldn't this approach potentially use more
> > memory? For example, a workload where everything is in one or two
> > memcgs, and those memcgs last forever.
> >
>
> Yes, it's true, if you have a very small and fixed number of memory cgroups,
> in theory the new approach can take ~10% more memory.
>
> I don't think it's such a big problem though: it seems that the majority
> of cgroup users have a lot of them, and they are dynamically created and
> destroyed by systemd/kubernetes/whatever else.
>
> And if somebody has a very special setup with only 1-2 cgroups, arguably
> kernel memory accounting isn't such a big deal for them, so it can simply be
> disabled. Am I wrong, and do you have a real-life example?

No, I don't have any specific examples.

-- Suleiman


Re: [PATCH RFC 00/14] The new slab memory controller

2019-09-19 Thread Roman Gushchin
On Thu, Sep 19, 2019 at 10:39:18PM +0900, Suleiman Souhlal wrote:
> On Fri, Sep 6, 2019 at 6:57 AM Roman Gushchin  wrote:
> > The patchset has been tested on a number of different workloads in our
> > production. In all cases, it saved hefty amounts of memory:
> > 1) web frontend, 650-700 Mb, ~42% of slab memory
> > 2) database cache, 750-800 Mb, ~35% of slab memory
> > 3) dns server, 700 Mb, ~36% of slab memory
> 
> Do these workloads cycle through a lot of different memcgs?

Not really, those are just plain services managed by systemd.
They aren't restarted too often, maybe several times per day at most.

Also, there is nothing fb-specific. You can take any modern
distribution (I've tried Fedora 30), boot it up and look at the
amount of slab memory. The numbers are roughly the same.

> 
> For workloads that don't, wouldn't this approach potentially use more
> memory? For example, a workload where everything is in one or two
> memcgs, and those memcgs last forever.
>

Yes, it's true, if you have a very small and fixed number of memory cgroups,
in theory the new approach can take ~10% more memory.

I don't think it's such a big problem though: it seems that the majority
of cgroup users have a lot of them, and they are dynamically created and
destroyed by systemd/kubernetes/whatever else.

And if somebody has a very special setup with only 1-2 cgroups, arguably
kernel memory accounting isn't such a big deal for them, so it can simply be
disabled. Am I wrong, and do you have a real-life example?

Thanks!

Roman


Re: [PATCH RFC 00/14] The new slab memory controller

2019-09-19 Thread Suleiman Souhlal
On Fri, Sep 6, 2019 at 6:57 AM Roman Gushchin  wrote:
> The patchset has been tested on a number of different workloads in our
> production. In all cases, it saved hefty amounts of memory:
> 1) web frontend, 650-700 Mb, ~42% of slab memory
> 2) database cache, 750-800 Mb, ~35% of slab memory
> 3) dns server, 700 Mb, ~36% of slab memory

Do these workloads cycle through a lot of different memcgs?

For workloads that don't, wouldn't this approach potentially use more
memory? For example, a workload where everything is in one or two
memcgs, and those memcgs last forever.

-- Suleiman


Re: [PATCH RFC 00/14] The new slab memory controller

2019-09-17 Thread Roman Gushchin
On Tue, Sep 17, 2019 at 03:48:57PM -0400, Waiman Long wrote:
> On 9/5/19 5:45 PM, Roman Gushchin wrote:
> > The existing slab memory controller is based on the idea of replicating
> > slab allocator internals for each memory cgroup. This approach promises
> > a low memory overhead (one pointer per page), and isn't adding too much
> > code on hot allocation and release paths. But it has a very serious flaw:
> > it leads to a low slab utilization.
> >
> > Using a drgn* script I've got an estimation of slab utilization on
> > a number of machines running different production workloads. In most
> > cases it was between 45% and 65%, and the best number I've seen was
> > around 85%. Turning kmem accounting off brings it to high 90s. Also
> > it brings back 30-50% of slab memory. It means that the real price
> > of the existing slab memory controller is way bigger than a pointer
> > per page.
> >
> > The real reason why the existing design leads to a low slab utilization
> > is simple: slab pages are used exclusively by one memory cgroup.
> > If there are only a few allocations of a certain size made by a cgroup,
> > or if some active objects (e.g. dentries) are left after the cgroup is
> > deleted, or the cgroup contains a single-threaded application which
> > barely allocates any kernel objects, but does so every time on a new CPU:
> > in all these cases the resulting slab utilization is very low.
> > If kmem accounting is off, the kernel is able to use free space
> > on slab pages for other allocations.
> >
> > Arguably it wasn't an issue back in the days when the kmem controller was
> > introduced and was an opt-in feature, which had to be turned on
> > individually for each memory cgroup. But now it's turned on by default
> > on both cgroup v1 and v2. And modern systemd-based systems tend to
> > create a large number of cgroups.
> >
> > This patchset provides a new implementation of the slab memory controller,
> > which aims to reach a much better slab utilization by sharing slab pages
> > between multiple memory cgroups. Below is the short description of the new
> > design (more details in commit messages).
> >
> > Accounting is performed per-object instead of per-page. Slab-related
> > vmstat counters are converted to bytes. Charging is performed on a page basis,
> > with rounding up and remembering leftovers.
> >
> > Memcg ownership data is stored in a per-slab-page vector: for each slab page
> > a vector of the corresponding size is allocated. To keep slab memory reparenting
> > working, an intermediate object is used instead of a direct pointer to the
> > memory cgroup. It's simply a pointer to a memcg (which can easily be
> > changed to the parent) with a built-in reference counter. This scheme
> > allows reparenting all allocated objects without walking over them and
> > changing the memcg pointer to the parent.
> >
> > Instead of creating an individual set of kmem_caches for each memory cgroup,
> > two global sets are used: the root set for non-accounted and root-cgroup
> > allocations, and a second set for all other allocations. This simplifies
> > the lifetime management of individual kmem_caches: they are destroyed
> > together with their root counterparts. It also allows removing a good amount
> > of code and makes things generally simpler.
> >
> > The patchset contains a couple of semi-independent parts, which could find
> > use outside of the slab memory controller too:
> > 1) subpage charging API, which can be used in the future for accounting of
> >    other non-page-sized objects, e.g. percpu allocations.
> > 2) mem_cgroup_ptr API (refcounted pointers to a memcg), which can be reused
> >    for the efficient reparenting of other objects, e.g. pagecache.
> >
> > The patchset has been tested on a number of different workloads in our
> > production. In all cases, it saved hefty amounts of memory:
> > 1) web frontend, 650-700 Mb, ~42% of slab memory
> > 2) database cache, 750-800 Mb, ~35% of slab memory
> > 3) dns server, 700 Mb, ~36% of slab memory
> >
> > So far I haven't found any regressions on the tested workloads, but
> > potential CPU regression caused by more precise accounting is a concern.
> >
> > Obviously the amount of saved memory depends on the number of memory cgroups,
> > uptime and specific workloads, but overall it feels like the new controller
> > saves 30-40% of slab memory, sometimes more. Additionally, it should lead
> > to a lower memory fragmentation, just because of a smaller number of
> > non-movable pages and also because there is no more need to move all
> > slab objects to a new set of pages when a workload is restarted in a new
> > memory cgroup.
> >
> > * https://github.com/osandov/drgn
> >
> >
> > Roman Gushchin (14):
> >   mm: memcg: subpage charging API
> >   mm: memcg: introduce mem_cgroup_ptr
> >   mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat
> >   mm: vmstat: convert slab vmstat counter to bytes
> >   mm: memcg/slab: allocate space for mem

Re: [PATCH RFC 00/14] The new slab memory controller

2019-09-17 Thread Waiman Long
On 9/5/19 5:45 PM, Roman Gushchin wrote:
> The existing slab memory controller is based on the idea of replicating
> slab allocator internals for each memory cgroup. This approach promises
> a low memory overhead (one pointer per page), and isn't adding too much
> code on hot allocation and release paths. But it has a very serious flaw:
> it leads to a low slab utilization.
>
> Using a drgn* script I've got an estimation of slab utilization on
> a number of machines running different production workloads. In most
> cases it was between 45% and 65%, and the best number I've seen was
> around 85%. Turning kmem accounting off brings it to high 90s. Also
> it brings back 30-50% of slab memory. It means that the real price
> of the existing slab memory controller is way bigger than a pointer
> per page.
>
> The real reason why the existing design leads to a low slab utilization
> is simple: slab pages are used exclusively by one memory cgroup.
> If there are only a few allocations of a certain size made by a cgroup,
> or if some active objects (e.g. dentries) are left after the cgroup is
> deleted, or the cgroup contains a single-threaded application which
> barely allocates any kernel objects, but does so every time on a new CPU:
> in all these cases the resulting slab utilization is very low.
> If kmem accounting is off, the kernel is able to use free space
> on slab pages for other allocations.
>
> Arguably it wasn't an issue back in the days when the kmem controller was
> introduced and was an opt-in feature, which had to be turned on
> individually for each memory cgroup. But now it's turned on by default
> on both cgroup v1 and v2. And modern systemd-based systems tend to
> create a large number of cgroups.
>
> This patchset provides a new implementation of the slab memory controller,
> which aims to reach a much better slab utilization by sharing slab pages
> between multiple memory cgroups. Below is the short description of the new
> design (more details in commit messages).
>
> Accounting is performed per-object instead of per-page. Slab-related
> vmstat counters are converted to bytes. Charging is performed on a page basis,
> with rounding up and remembering leftovers.
>
> Memcg ownership data is stored in a per-slab-page vector: for each slab page
> a vector of the corresponding size is allocated. To keep slab memory reparenting
> working, an intermediate object is used instead of a direct pointer to the
> memory cgroup. It's simply a pointer to a memcg (which can easily be
> changed to the parent) with a built-in reference counter. This scheme
> allows reparenting all allocated objects without walking over them and
> changing the memcg pointer to the parent.
>
> Instead of creating an individual set of kmem_caches for each memory cgroup,
> two global sets are used: the root set for non-accounted and root-cgroup
> allocations, and a second set for all other allocations. This simplifies
> the lifetime management of individual kmem_caches: they are destroyed
> together with their root counterparts. It also allows removing a good amount
> of code and makes things generally simpler.
>
> The patchset contains a couple of semi-independent parts, which could find
> use outside of the slab memory controller too:
> 1) subpage charging API, which can be used in the future for accounting of
>    other non-page-sized objects, e.g. percpu allocations.
> 2) mem_cgroup_ptr API (refcounted pointers to a memcg), which can be reused
>    for the efficient reparenting of other objects, e.g. pagecache.
>
> The patchset has been tested on a number of different workloads in our
> production. In all cases, it saved hefty amounts of memory:
> 1) web frontend, 650-700 Mb, ~42% of slab memory
> 2) database cache, 750-800 Mb, ~35% of slab memory
> 3) dns server, 700 Mb, ~36% of slab memory
>
> So far I haven't found any regressions on the tested workloads, but
> potential CPU regression caused by more precise accounting is a concern.
>
> Obviously the amount of saved memory depends on the number of memory cgroups,
> uptime and specific workloads, but overall it feels like the new controller
> saves 30-40% of slab memory, sometimes more. Additionally, it should lead
> to a lower memory fragmentation, just because of a smaller number of
> non-movable pages and also because there is no more need to move all
> slab objects to a new set of pages when a workload is restarted in a new
> memory cgroup.
>
> * https://github.com/osandov/drgn
>
>
> Roman Gushchin (14):
>   mm: memcg: subpage charging API
>   mm: memcg: introduce mem_cgroup_ptr
>   mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat
>   mm: vmstat: convert slab vmstat counter to bytes
>   mm: memcg/slab: allocate space for memcg ownership data for non-root
> slabs
>   mm: slub: implement SLUB version of obj_to_index()
>   mm: memcg/slab: save memcg ownership data for non-root slab objects
>   mm: memcg: move memcg_kmem_bypass() to memcontrol.h
>   mm: memcg: introduc

[PATCH RFC 00/14] The new slab memory controller

2019-09-05 Thread Roman Gushchin
The existing slab memory controller is based on the idea of replicating
slab allocator internals for each memory cgroup. This approach promises
a low memory overhead (one pointer per page), and isn't adding too much
code on hot allocation and release paths. But it has a very serious flaw:
it leads to a low slab utilization.

Using a drgn* script I've got an estimation of slab utilization on
a number of machines running different production workloads. In most
cases it was between 45% and 65%, and the best number I've seen was
around 85%. Turning kmem accounting off brings it to high 90s. Also
it brings back 30-50% of slab memory. It means that the real price
of the existing slab memory controller is way bigger than a pointer
per page.

The real reason why the existing design leads to a low slab utilization
is simple: slab pages are used exclusively by one memory cgroup.
If there are only a few allocations of a certain size made by a cgroup,
or if some active objects (e.g. dentries) are left after the cgroup is
deleted, or the cgroup contains a single-threaded application which
barely allocates any kernel objects, but does so every time on a new CPU:
in all these cases the resulting slab utilization is very low.
If kmem accounting is off, the kernel is able to use free space
on slab pages for other allocations.

Arguably it wasn't an issue back in the days when the kmem controller was
introduced and was an opt-in feature, which had to be turned on
individually for each memory cgroup. But now it's turned on by default
on both cgroup v1 and v2. And modern systemd-based systems tend to
create a large number of cgroups.

This patchset provides a new implementation of the slab memory controller,
which aims to reach a much better slab utilization by sharing slab pages
between multiple memory cgroups. Below is the short description of the new
design (more details in commit messages).

Accounting is performed per-object instead of per-page. Slab-related
vmstat counters are converted to bytes. Charging is performed on a page basis,
with rounding up and remembering leftovers.
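
To make the charging scheme concrete, here is a simplified sketch (not the
actual patch code; obj_stock and memcg_charge_pages() are made-up names used
purely for illustration) of byte-granular charging built on top of
page-granular charges, with the unused remainder of the last charge kept
around for reuse:

  /* Illustrative sketch: a small "stock" of pre-charged bytes is kept per
   * memcg; only when it runs dry are whole pages charged. */
  struct obj_stock {
      unsigned int nr_bytes;    /* charged to the memcg, not yet consumed */
  };

  /* Assumes @stock is dedicated to @memcg (flushed when the owner changes). */
  static int charge_slab_obj(struct mem_cgroup *memcg,
                             struct obj_stock *stock, size_t size)
  {
      unsigned int nr_pages;

      if (stock->nr_bytes >= size) {
          /* Fast path: consume previously charged bytes. */
          stock->nr_bytes -= size;
          return 0;
      }

      /* Slow path: charge whole pages, remember the unused remainder. */
      nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
      if (memcg_charge_pages(memcg, nr_pages))    /* hypothetical helper */
          return -ENOMEM;

      stock->nr_bytes += nr_pages * PAGE_SIZE - size;
      return 0;
  }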

Memcg ownership data is stored in a per-slab-page vector: for each slab page
a vector of the corresponding size is allocated. To keep slab memory reparenting
working, an intermediate object is used instead of a direct pointer to the
memory cgroup. It's simply a pointer to a memcg (which can easily be
changed to the parent) with a built-in reference counter. This scheme
allows reparenting all allocated objects without walking over them and
changing the memcg pointer to the parent.
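
A minimal sketch of such an intermediate object (illustrative only; the real
structure and helpers are defined in the mem_cgroup_ptr patch) could look like
this; reparenting then touches a single shared pointer instead of every object:

  /* A refcounted, redirectable pointer to a memory cgroup. Charged slab
   * objects reference this object rather than the memcg directly. */
  struct mem_cgroup_ptr {
      struct mem_cgroup *memcg;
      refcount_t count;
  };

  /* Reparent all objects owned by a dying cgroup by redirecting the shared
   * pointer once. (The real code also has to handle RCU and concurrent
   * readers, which is omitted here.) */
  static void memcg_ptr_reparent(struct mem_cgroup_ptr *ptr,
                                 struct mem_cgroup *parent)
  {
      ptr->memcg = parent;
  }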

Instead of creating an individual set of kmem_caches for each memory cgroup,
two global sets are used: the root set for non-accounted and root-cgroup
allocations, and a second set for all other allocations. This simplifies
the lifetime management of individual kmem_caches: they are destroyed
together with their root counterparts. It also allows removing a good amount
of code and makes things generally simpler.
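
For intuition, cache selection could then look roughly like this (a sketch only;
the root_kmalloc_caches/memcg_kmalloc_caches arrays are placeholders, not the
identifiers used in the patches):

  extern struct kmem_cache *root_kmalloc_caches[];   /* placeholder */
  extern struct kmem_cache *memcg_kmalloc_caches[];  /* placeholder */

  /* One shared accounted set for all memcgs, one root set for
   * unaccounted and root-cgroup allocations. */
  static struct kmem_cache *pick_kmalloc_cache(size_t size, gfp_t flags)
  {
      int idx = kmalloc_index(size);

      if (memcg_kmem_enabled() && (flags & __GFP_ACCOUNT))
          return memcg_kmalloc_caches[idx];
      return root_kmalloc_caches[idx];
  }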

The patchset contains a couple of semi-independent parts, which could find
use outside of the slab memory controller too (rough sketches of both
interfaces follow below):
1) subpage charging API, which can be used in the future for accounting of
   other non-page-sized objects, e.g. percpu allocations.
2) mem_cgroup_ptr API (refcounted pointers to a memcg), which can be reused
   for the efficient reparenting of other objects, e.g. pagecache.
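
As a rough idea of the shape of these two interfaces (the prototypes below are
illustrative; the exact names and signatures live in the individual patches):

  /* 1) Subpage charging: charge/uncharge an arbitrary number of bytes to a
   *    memcg; internally rounded up to whole pages, leftovers remembered. */
  int memcg_charge_subpage(struct mem_cgroup *memcg, size_t nr_bytes, gfp_t gfp);
  void memcg_uncharge_subpage(struct mem_cgroup *memcg, size_t nr_bytes);

  /* 2) Refcounted memcg pointers (see the mem_cgroup_ptr sketch above),
   *    reusable for cheap reparenting of other object types, e.g. pagecache. */
  struct mem_cgroup_ptr *memcg_get_kmem_ptr(struct mem_cgroup *memcg);
  void memcg_ptr_put(struct mem_cgroup_ptr *ptr);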

The patchset has been tested on a number of different workloads in our
production. In all cases, it saved hefty amounts of memory:
1) web frontend, 650-700 Mb, ~42% of slab memory
2) database cache, 750-800 Mb, ~35% of slab memory
3) dns server, 700 Mb, ~36% of slab memory

So far I haven't found any regressions on the tested workloads, but
potential CPU regression caused by more precise accounting is a concern.

Obviously the amount of saved memory depends on the number of memory cgroups,
uptime and specific workloads, but overall it feels like the new controller
saves 30-40% of slab memory, sometimes more. Additionally, it should lead
to a lower memory fragmentation, just because of a smaller number of
non-movable pages and also because there is no more need to move all
slab objects to a new set of pages when a workload is restarted in a new
memory cgroup.

* https://github.com/osandov/drgn


Roman Gushchin (14):
  mm: memcg: subpage charging API
  mm: memcg: introduce mem_cgroup_ptr
  mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat
  mm: vmstat: convert slab vmstat counter to bytes
  mm: memcg/slab: allocate space for memcg ownership data for non-root
slabs
  mm: slub: implement SLUB version of obj_to_index()
  mm: memcg/slab: save memcg ownership data for non-root slab objects
  mm: memcg: move memcg_kmem_bypass() to memcontrol.h
  mm: memcg: introduce __mod_lruvec_memcg_state()
  mm: memcg/slab: charge individual slab objects instead of pages
  mm: memcg: move get_mem_cgroup_from_current() to memcontrol.h
  mm: memcg/slab: replace memcg_from_slab_page() with