Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Thu 28-06-18 12:21:26, Shakeel Butt wrote: > On Thu, Jun 28, 2018 at 12:03 PM Jan Kara wrote: > > > > On Wed 27-06-18 12:12:49, Shakeel Butt wrote: > > > A lot of memory can be consumed by the events generated for the huge or > > > unlimited queues if there is either no or slow listener. This can cause > > > system level memory pressure or OOMs. So, it's better to account the > > > fsnotify kmem caches to the memcg of the listener. > > > > > > However the listener can be in a different memcg than the memcg of the > > > producer and these allocations happen in the context of the event > > > producer. This patch introduces remote memcg charging API which the > > > producer can use to charge the allocations to the memcg of the listener. > > > > > > There are seven fsnotify kmem caches and among them allocations from > > > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and > > > inotify_inode_mark_cachep happens in the context of syscall from the > > > listener. So, SLAB_ACCOUNT is enough for these caches. > > > > > > The objects from fsnotify_mark_connector_cachep are not accounted as they > > > are small compared to the notification mark or events and it is unclear > > > whom to account connector to since it is shared by all events attached to > > > the inode. > > > > > > The allocations from the event caches happen in the context of the event > > > producer. For such caches we will need to remote charge the allocations > > > to the listener's memcg. Thus we save the memcg reference in the > > > fsnotify_group structure of the listener. > > > > > > This patch has also moved the members of fsnotify_group to keep the size > > > same, at least for 64 bit build, even with additional member by filling > > > the holes. > > > > ... > > > > > static int __init fanotify_user_setup(void) > > > { > > > - fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC); > > > + fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, > > > + SLAB_PANIC|SLAB_ACCOUNT); > > > fanotify_event_cachep = KMEM_CACHE(fanotify_event_info, SLAB_PANIC); > > > if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS)) { > > > fanotify_perm_event_cachep = > > > > Why don't you setup also fanotify_event_cachep and > > fanotify_perm_event_cachep caches with SLAB_ACCOUNT and instead specify > > __GFP_ACCOUNT manually? Otherwise the patch looks good to me. > > > > Hi Jan, IMHO having a visible __GFP_ACCOUNT along with > memalloc_use_memcg() makes the code more explicit and readable that we > want to targeted/remote memcg charging. However if you think > otherwise, I will replace __GFP_ACCOUNT with SLAB_ACCOUNT. OK, fair enough. Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Thu 28-06-18 12:21:26, Shakeel Butt wrote: > On Thu, Jun 28, 2018 at 12:03 PM Jan Kara wrote: > > > > On Wed 27-06-18 12:12:49, Shakeel Butt wrote: > > > A lot of memory can be consumed by the events generated for the huge or > > > unlimited queues if there is either no or slow listener. This can cause > > > system level memory pressure or OOMs. So, it's better to account the > > > fsnotify kmem caches to the memcg of the listener. > > > > > > However the listener can be in a different memcg than the memcg of the > > > producer and these allocations happen in the context of the event > > > producer. This patch introduces remote memcg charging API which the > > > producer can use to charge the allocations to the memcg of the listener. > > > > > > There are seven fsnotify kmem caches and among them allocations from > > > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and > > > inotify_inode_mark_cachep happens in the context of syscall from the > > > listener. So, SLAB_ACCOUNT is enough for these caches. > > > > > > The objects from fsnotify_mark_connector_cachep are not accounted as they > > > are small compared to the notification mark or events and it is unclear > > > whom to account connector to since it is shared by all events attached to > > > the inode. > > > > > > The allocations from the event caches happen in the context of the event > > > producer. For such caches we will need to remote charge the allocations > > > to the listener's memcg. Thus we save the memcg reference in the > > > fsnotify_group structure of the listener. > > > > > > This patch has also moved the members of fsnotify_group to keep the size > > > same, at least for 64 bit build, even with additional member by filling > > > the holes. > > > > ... > > > > > static int __init fanotify_user_setup(void) > > > { > > > - fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC); > > > + fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, > > > + SLAB_PANIC|SLAB_ACCOUNT); > > > fanotify_event_cachep = KMEM_CACHE(fanotify_event_info, SLAB_PANIC); > > > if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS)) { > > > fanotify_perm_event_cachep = > > > > Why don't you setup also fanotify_event_cachep and > > fanotify_perm_event_cachep caches with SLAB_ACCOUNT and instead specify > > __GFP_ACCOUNT manually? Otherwise the patch looks good to me. > > > > Hi Jan, IMHO having a visible __GFP_ACCOUNT along with > memalloc_use_memcg() makes the code more explicit and readable that we > want to targeted/remote memcg charging. However if you think > otherwise, I will replace __GFP_ACCOUNT with SLAB_ACCOUNT. OK, fair enough. Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Thu 28-06-18 12:21:26, Shakeel Butt wrote: > On Thu, Jun 28, 2018 at 12:03 PM Jan Kara wrote: > > > > On Wed 27-06-18 12:12:49, Shakeel Butt wrote: > > > A lot of memory can be consumed by the events generated for the huge or > > > unlimited queues if there is either no or slow listener. This can cause > > > system level memory pressure or OOMs. So, it's better to account the > > > fsnotify kmem caches to the memcg of the listener. > > > > > > However the listener can be in a different memcg than the memcg of the > > > producer and these allocations happen in the context of the event > > > producer. This patch introduces remote memcg charging API which the > > > producer can use to charge the allocations to the memcg of the listener. > > > > > > There are seven fsnotify kmem caches and among them allocations from > > > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and > > > inotify_inode_mark_cachep happens in the context of syscall from the > > > listener. So, SLAB_ACCOUNT is enough for these caches. > > > > > > The objects from fsnotify_mark_connector_cachep are not accounted as they > > > are small compared to the notification mark or events and it is unclear > > > whom to account connector to since it is shared by all events attached to > > > the inode. > > > > > > The allocations from the event caches happen in the context of the event > > > producer. For such caches we will need to remote charge the allocations > > > to the listener's memcg. Thus we save the memcg reference in the > > > fsnotify_group structure of the listener. > > > > > > This patch has also moved the members of fsnotify_group to keep the size > > > same, at least for 64 bit build, even with additional member by filling > > > the holes. > > > > ... > > > > > static int __init fanotify_user_setup(void) > > > { > > > - fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC); > > > + fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, > > > + SLAB_PANIC|SLAB_ACCOUNT); > > > fanotify_event_cachep = KMEM_CACHE(fanotify_event_info, SLAB_PANIC); > > > if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS)) { > > > fanotify_perm_event_cachep = > > > > Why don't you setup also fanotify_event_cachep and > > fanotify_perm_event_cachep caches with SLAB_ACCOUNT and instead specify > > __GFP_ACCOUNT manually? Otherwise the patch looks good to me. > > > > Hi Jan, IMHO having a visible __GFP_ACCOUNT along with > memalloc_use_memcg() makes the code more explicit and readable that we > want to targeted/remote memcg charging. Agreed. If you had an implicit SLAB_ACCOUNT then you could get inconsistencies when some allocations would get charged to the current task while others would not. -- Michal Hocko SUSE Labs
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Thu 28-06-18 12:21:26, Shakeel Butt wrote: > On Thu, Jun 28, 2018 at 12:03 PM Jan Kara wrote: > > > > On Wed 27-06-18 12:12:49, Shakeel Butt wrote: > > > A lot of memory can be consumed by the events generated for the huge or > > > unlimited queues if there is either no or slow listener. This can cause > > > system level memory pressure or OOMs. So, it's better to account the > > > fsnotify kmem caches to the memcg of the listener. > > > > > > However the listener can be in a different memcg than the memcg of the > > > producer and these allocations happen in the context of the event > > > producer. This patch introduces remote memcg charging API which the > > > producer can use to charge the allocations to the memcg of the listener. > > > > > > There are seven fsnotify kmem caches and among them allocations from > > > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and > > > inotify_inode_mark_cachep happens in the context of syscall from the > > > listener. So, SLAB_ACCOUNT is enough for these caches. > > > > > > The objects from fsnotify_mark_connector_cachep are not accounted as they > > > are small compared to the notification mark or events and it is unclear > > > whom to account connector to since it is shared by all events attached to > > > the inode. > > > > > > The allocations from the event caches happen in the context of the event > > > producer. For such caches we will need to remote charge the allocations > > > to the listener's memcg. Thus we save the memcg reference in the > > > fsnotify_group structure of the listener. > > > > > > This patch has also moved the members of fsnotify_group to keep the size > > > same, at least for 64 bit build, even with additional member by filling > > > the holes. > > > > ... > > > > > static int __init fanotify_user_setup(void) > > > { > > > - fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC); > > > + fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, > > > + SLAB_PANIC|SLAB_ACCOUNT); > > > fanotify_event_cachep = KMEM_CACHE(fanotify_event_info, SLAB_PANIC); > > > if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS)) { > > > fanotify_perm_event_cachep = > > > > Why don't you setup also fanotify_event_cachep and > > fanotify_perm_event_cachep caches with SLAB_ACCOUNT and instead specify > > __GFP_ACCOUNT manually? Otherwise the patch looks good to me. > > > > Hi Jan, IMHO having a visible __GFP_ACCOUNT along with > memalloc_use_memcg() makes the code more explicit and readable that we > want to targeted/remote memcg charging. Agreed. If you had an implicit SLAB_ACCOUNT then you could get inconsistencies when some allocations would get charged to the current task while others would not. -- Michal Hocko SUSE Labs
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Thu, Jun 28, 2018 at 12:03 PM Jan Kara wrote: > > On Wed 27-06-18 12:12:49, Shakeel Butt wrote: > > A lot of memory can be consumed by the events generated for the huge or > > unlimited queues if there is either no or slow listener. This can cause > > system level memory pressure or OOMs. So, it's better to account the > > fsnotify kmem caches to the memcg of the listener. > > > > However the listener can be in a different memcg than the memcg of the > > producer and these allocations happen in the context of the event > > producer. This patch introduces remote memcg charging API which the > > producer can use to charge the allocations to the memcg of the listener. > > > > There are seven fsnotify kmem caches and among them allocations from > > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and > > inotify_inode_mark_cachep happens in the context of syscall from the > > listener. So, SLAB_ACCOUNT is enough for these caches. > > > > The objects from fsnotify_mark_connector_cachep are not accounted as they > > are small compared to the notification mark or events and it is unclear > > whom to account connector to since it is shared by all events attached to > > the inode. > > > > The allocations from the event caches happen in the context of the event > > producer. For such caches we will need to remote charge the allocations > > to the listener's memcg. Thus we save the memcg reference in the > > fsnotify_group structure of the listener. > > > > This patch has also moved the members of fsnotify_group to keep the size > > same, at least for 64 bit build, even with additional member by filling > > the holes. > > ... > > > static int __init fanotify_user_setup(void) > > { > > - fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC); > > + fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, > > + SLAB_PANIC|SLAB_ACCOUNT); > > fanotify_event_cachep = KMEM_CACHE(fanotify_event_info, SLAB_PANIC); > > if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS)) { > > fanotify_perm_event_cachep = > > Why don't you setup also fanotify_event_cachep and > fanotify_perm_event_cachep caches with SLAB_ACCOUNT and instead specify > __GFP_ACCOUNT manually? Otherwise the patch looks good to me. > Hi Jan, IMHO having a visible __GFP_ACCOUNT along with memalloc_use_memcg() makes the code more explicit and readable that we want to targeted/remote memcg charging. However if you think otherwise, I will replace __GFP_ACCOUNT with SLAB_ACCOUNT. thanks, Shakeel
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Thu, Jun 28, 2018 at 12:03 PM Jan Kara wrote: > > On Wed 27-06-18 12:12:49, Shakeel Butt wrote: > > A lot of memory can be consumed by the events generated for the huge or > > unlimited queues if there is either no or slow listener. This can cause > > system level memory pressure or OOMs. So, it's better to account the > > fsnotify kmem caches to the memcg of the listener. > > > > However the listener can be in a different memcg than the memcg of the > > producer and these allocations happen in the context of the event > > producer. This patch introduces remote memcg charging API which the > > producer can use to charge the allocations to the memcg of the listener. > > > > There are seven fsnotify kmem caches and among them allocations from > > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and > > inotify_inode_mark_cachep happens in the context of syscall from the > > listener. So, SLAB_ACCOUNT is enough for these caches. > > > > The objects from fsnotify_mark_connector_cachep are not accounted as they > > are small compared to the notification mark or events and it is unclear > > whom to account connector to since it is shared by all events attached to > > the inode. > > > > The allocations from the event caches happen in the context of the event > > producer. For such caches we will need to remote charge the allocations > > to the listener's memcg. Thus we save the memcg reference in the > > fsnotify_group structure of the listener. > > > > This patch has also moved the members of fsnotify_group to keep the size > > same, at least for 64 bit build, even with additional member by filling > > the holes. > > ... > > > static int __init fanotify_user_setup(void) > > { > > - fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC); > > + fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, > > + SLAB_PANIC|SLAB_ACCOUNT); > > fanotify_event_cachep = KMEM_CACHE(fanotify_event_info, SLAB_PANIC); > > if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS)) { > > fanotify_perm_event_cachep = > > Why don't you setup also fanotify_event_cachep and > fanotify_perm_event_cachep caches with SLAB_ACCOUNT and instead specify > __GFP_ACCOUNT manually? Otherwise the patch looks good to me. > Hi Jan, IMHO having a visible __GFP_ACCOUNT along with memalloc_use_memcg() makes the code more explicit and readable that we want to targeted/remote memcg charging. However if you think otherwise, I will replace __GFP_ACCOUNT with SLAB_ACCOUNT. thanks, Shakeel
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Wed 27-06-18 12:12:49, Shakeel Butt wrote: > A lot of memory can be consumed by the events generated for the huge or > unlimited queues if there is either no or slow listener. This can cause > system level memory pressure or OOMs. So, it's better to account the > fsnotify kmem caches to the memcg of the listener. > > However the listener can be in a different memcg than the memcg of the > producer and these allocations happen in the context of the event > producer. This patch introduces remote memcg charging API which the > producer can use to charge the allocations to the memcg of the listener. > > There are seven fsnotify kmem caches and among them allocations from > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and > inotify_inode_mark_cachep happens in the context of syscall from the > listener. So, SLAB_ACCOUNT is enough for these caches. > > The objects from fsnotify_mark_connector_cachep are not accounted as they > are small compared to the notification mark or events and it is unclear > whom to account connector to since it is shared by all events attached to > the inode. > > The allocations from the event caches happen in the context of the event > producer. For such caches we will need to remote charge the allocations > to the listener's memcg. Thus we save the memcg reference in the > fsnotify_group structure of the listener. > > This patch has also moved the members of fsnotify_group to keep the size > same, at least for 64 bit build, even with additional member by filling > the holes. ... > static int __init fanotify_user_setup(void) > { > - fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC); > + fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, > + SLAB_PANIC|SLAB_ACCOUNT); > fanotify_event_cachep = KMEM_CACHE(fanotify_event_info, SLAB_PANIC); > if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS)) { > fanotify_perm_event_cachep = Why don't you setup also fanotify_event_cachep and fanotify_perm_event_cachep caches with SLAB_ACCOUNT and instead specify __GFP_ACCOUNT manually? Otherwise the patch looks good to me. Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Wed 27-06-18 12:12:49, Shakeel Butt wrote: > A lot of memory can be consumed by the events generated for the huge or > unlimited queues if there is either no or slow listener. This can cause > system level memory pressure or OOMs. So, it's better to account the > fsnotify kmem caches to the memcg of the listener. > > However the listener can be in a different memcg than the memcg of the > producer and these allocations happen in the context of the event > producer. This patch introduces remote memcg charging API which the > producer can use to charge the allocations to the memcg of the listener. > > There are seven fsnotify kmem caches and among them allocations from > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and > inotify_inode_mark_cachep happens in the context of syscall from the > listener. So, SLAB_ACCOUNT is enough for these caches. > > The objects from fsnotify_mark_connector_cachep are not accounted as they > are small compared to the notification mark or events and it is unclear > whom to account connector to since it is shared by all events attached to > the inode. > > The allocations from the event caches happen in the context of the event > producer. For such caches we will need to remote charge the allocations > to the listener's memcg. Thus we save the memcg reference in the > fsnotify_group structure of the listener. > > This patch has also moved the members of fsnotify_group to keep the size > same, at least for 64 bit build, even with additional member by filling > the holes. ... > static int __init fanotify_user_setup(void) > { > - fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC); > + fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, > + SLAB_PANIC|SLAB_ACCOUNT); > fanotify_event_cachep = KMEM_CACHE(fanotify_event_info, SLAB_PANIC); > if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS)) { > fanotify_perm_event_cachep = Why don't you setup also fanotify_event_cachep and fanotify_perm_event_cachep caches with SLAB_ACCOUNT and instead specify __GFP_ACCOUNT manually? Otherwise the patch looks good to me. Honza -- Jan Kara SUSE Labs, CR
[PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. Signed-off-by: Shakeel Butt Cc: Michal Hocko Cc: Jan Kara Cc: Amir Goldstein Cc: Greg Thelen Cc: Johannes Weiner Cc: Vladimir Davydov Cc: Roman Gushchin Cc: Andrew Morton Cc: Alexander Viro --- Changelog since v7: - Simplify the remote memcg charging API - get_mem_cgroup_from_current() returns root_mem_cgroup if current->active_memcg is not NULL but css_tryget_online() fails. Changelog since v6: - Removed Jan's ACK as the code has changed a lot - Squashed the separate remote charging API path into this one - Removed kmalloc* & kmem_cache_alloc* APIs and only kept the scope API - Changed fsnotify remote charging code to use scope API Changelog since v5: - None Changelog since v4: - Fixed the build for CONFIG_MEMCG=n Changelog since v3: - Rebased over Jan's patches. - Some cleanup based on Amir's comments. Changelog since v2: - None Changelog since v1: - no more charging fsnotify_mark_connector objects - Fixed the build for SLOB fs/notify/dnotify/dnotify.c | 5 ++-- fs/notify/fanotify/fanotify.c| 14 --- fs/notify/fanotify/fanotify_user.c | 5 +++- fs/notify/group.c| 3 +++ fs/notify/inotify/inotify_fsnotify.c | 7 +- fs/notify/inotify/inotify_user.c | 5 +++- include/linux/fsnotify_backend.h | 12 ++--- include/linux/memcontrol.h | 10 +++- include/linux/sched.h| 3 +++ include/linux/sched/mm.h | 37 kernel/fork.c| 3 +++ mm/memcontrol.c | 37 +--- 12 files changed, 123 insertions(+), 18 deletions(-) diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index e2bea2ac5dfb..a6365e6bc047 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -384,8 +384,9 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) static int __init dnotify_init(void) { - dnotify_struct_cache = KMEM_CACHE(dnotify_struct, SLAB_PANIC); - dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC); + dnotify_struct_cache = KMEM_CACHE(dnotify_struct, + SLAB_PANIC|SLAB_ACCOUNT); + dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); dnotify_group = fsnotify_alloc_group(_fsnotify_ops); if (IS_ERR(dnotify_group)) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index f90842efea13..6ff1f75d156d 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "fanotify.h" @@ -140,8 +141,8 @@ struct fanotify_event_info *fanotify_alloc_event(struct fsnotify_group *group, struct inode *inode, u32 mask, const struct path *path) { - struct fanotify_event_info *event; - gfp_t gfp = GFP_KERNEL; + struct fanotify_event_info *event = NULL; + gfp_t gfp = GFP_KERNEL | __GFP_ACCOUNT; /* * For queues with unlimited length lost events are not expected and @@ -151,19 +152,22 @@ struct fanotify_event_info *fanotify_alloc_event(struct fsnotify_group *group, if (group->max_events == UINT_MAX) gfp |= __GFP_NOFAIL; + /* Whoever is interested in the event, pays for the allocation. */ + memalloc_use_memcg(group->memcg); + if (fanotify_is_perm_event(mask)) {
[PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. Signed-off-by: Shakeel Butt Cc: Michal Hocko Cc: Jan Kara Cc: Amir Goldstein Cc: Greg Thelen Cc: Johannes Weiner Cc: Vladimir Davydov Cc: Roman Gushchin Cc: Andrew Morton Cc: Alexander Viro --- Changelog since v7: - Simplify the remote memcg charging API - get_mem_cgroup_from_current() returns root_mem_cgroup if current->active_memcg is not NULL but css_tryget_online() fails. Changelog since v6: - Removed Jan's ACK as the code has changed a lot - Squashed the separate remote charging API path into this one - Removed kmalloc* & kmem_cache_alloc* APIs and only kept the scope API - Changed fsnotify remote charging code to use scope API Changelog since v5: - None Changelog since v4: - Fixed the build for CONFIG_MEMCG=n Changelog since v3: - Rebased over Jan's patches. - Some cleanup based on Amir's comments. Changelog since v2: - None Changelog since v1: - no more charging fsnotify_mark_connector objects - Fixed the build for SLOB fs/notify/dnotify/dnotify.c | 5 ++-- fs/notify/fanotify/fanotify.c| 14 --- fs/notify/fanotify/fanotify_user.c | 5 +++- fs/notify/group.c| 3 +++ fs/notify/inotify/inotify_fsnotify.c | 7 +- fs/notify/inotify/inotify_user.c | 5 +++- include/linux/fsnotify_backend.h | 12 ++--- include/linux/memcontrol.h | 10 +++- include/linux/sched.h| 3 +++ include/linux/sched/mm.h | 37 kernel/fork.c| 3 +++ mm/memcontrol.c | 37 +--- 12 files changed, 123 insertions(+), 18 deletions(-) diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index e2bea2ac5dfb..a6365e6bc047 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -384,8 +384,9 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) static int __init dnotify_init(void) { - dnotify_struct_cache = KMEM_CACHE(dnotify_struct, SLAB_PANIC); - dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC); + dnotify_struct_cache = KMEM_CACHE(dnotify_struct, + SLAB_PANIC|SLAB_ACCOUNT); + dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); dnotify_group = fsnotify_alloc_group(_fsnotify_ops); if (IS_ERR(dnotify_group)) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index f90842efea13..6ff1f75d156d 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "fanotify.h" @@ -140,8 +141,8 @@ struct fanotify_event_info *fanotify_alloc_event(struct fsnotify_group *group, struct inode *inode, u32 mask, const struct path *path) { - struct fanotify_event_info *event; - gfp_t gfp = GFP_KERNEL; + struct fanotify_event_info *event = NULL; + gfp_t gfp = GFP_KERNEL | __GFP_ACCOUNT; /* * For queues with unlimited length lost events are not expected and @@ -151,19 +152,22 @@ struct fanotify_event_info *fanotify_alloc_event(struct fsnotify_group *group, if (group->max_events == UINT_MAX) gfp |= __GFP_NOFAIL; + /* Whoever is interested in the event, pays for the allocation. */ + memalloc_use_memcg(group->memcg); + if (fanotify_is_perm_event(mask)) {
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue 26-06-18 12:07:57, Shakeel Butt wrote: > On Tue, Jun 26, 2018 at 11:55 AM Johannes Weiner wrote: > > > > On Tue, Jun 26, 2018 at 11:00:53AM -0700, Shakeel Butt wrote: > > > On Mon, Jun 25, 2018 at 10:49 PM Amir Goldstein > > > wrote: > > > > > > > ... > > > > > > > > The verb 'unuse' takes an argument memcg and 'uses' it - too weird. > > > > You can use 'override'/'revert' verbs like override_creds or just call > > > > memalloc_use_memcg(old_memcg) since there is no reference taken > > > > anyway in use_memcg and no reference released in unuse_memcg. > > > > > > > > Otherwise looks good to me. > > > > > > > > > > Thanks for your feedback. Just using memalloc_use_memcg(old_memcg) and > > > ignoring the return seems more simple. I will wait for feedback from > > > other before changing anything. > > > > We're not nesting calls to memalloc_use_memcg(), right? So we don't > > have to return old_memcg and don't have to pass anything to unuse, it > > can always set current->active_memcg to NULL. > > For buffer_head, the allocation is done with GFP_NOFS. So, I think > there is no chance of nesting. The fsnotify uses GFP_KERNEL but based > on my limited understanding of fsnotify, there should not be any > nesting i.e. the allocation triggering reclaim which trigger fsnotify > events. Though I would like Amir or Jan to confirm there is no nesting > possible. You are correct. Fsnotify events are generated only as a result of some syscall, not due to reclaim or stuff like that. Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue 26-06-18 12:07:57, Shakeel Butt wrote: > On Tue, Jun 26, 2018 at 11:55 AM Johannes Weiner wrote: > > > > On Tue, Jun 26, 2018 at 11:00:53AM -0700, Shakeel Butt wrote: > > > On Mon, Jun 25, 2018 at 10:49 PM Amir Goldstein > > > wrote: > > > > > > > ... > > > > > > > > The verb 'unuse' takes an argument memcg and 'uses' it - too weird. > > > > You can use 'override'/'revert' verbs like override_creds or just call > > > > memalloc_use_memcg(old_memcg) since there is no reference taken > > > > anyway in use_memcg and no reference released in unuse_memcg. > > > > > > > > Otherwise looks good to me. > > > > > > > > > > Thanks for your feedback. Just using memalloc_use_memcg(old_memcg) and > > > ignoring the return seems more simple. I will wait for feedback from > > > other before changing anything. > > > > We're not nesting calls to memalloc_use_memcg(), right? So we don't > > have to return old_memcg and don't have to pass anything to unuse, it > > can always set current->active_memcg to NULL. > > For buffer_head, the allocation is done with GFP_NOFS. So, I think > there is no chance of nesting. The fsnotify uses GFP_KERNEL but based > on my limited understanding of fsnotify, there should not be any > nesting i.e. the allocation triggering reclaim which trigger fsnotify > events. Though I would like Amir or Jan to confirm there is no nesting > possible. You are correct. Fsnotify events are generated only as a result of some syscall, not due to reclaim or stuff like that. Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue, Jun 26, 2018 at 11:05 PM, Shakeel Butt wrote: > On Tue, Jun 26, 2018 at 12:03 PM Johannes Weiner wrote: >> >> On Mon, Jun 25, 2018 at 04:06:58PM -0700, Shakeel Butt wrote: >> > @@ -140,8 +141,9 @@ struct fanotify_event_info >> > *fanotify_alloc_event(struct fsnotify_group *group, >> >struct inode *inode, u32 >> > mask, >> >const struct path *path) >> > { >> > - struct fanotify_event_info *event; >> > + struct fanotify_event_info *event = NULL; >> > gfp_t gfp = GFP_KERNEL; >> > + struct mem_cgroup *old_memcg = NULL; >> > >> > /* >> >* For queues with unlimited length lost events are not expected and >> > @@ -151,19 +153,25 @@ struct fanotify_event_info >> > *fanotify_alloc_event(struct fsnotify_group *group, >> > if (group->max_events == UINT_MAX) >> > gfp |= __GFP_NOFAIL; >> > >> > + /* Whoever is interested in the event, pays for the allocation. */ >> > + if (group->memcg) { >> > + gfp |= __GFP_ACCOUNT; >> > + old_memcg = memalloc_use_memcg(group->memcg); >> > + } >> >> group->memcg is only NULL when memcg is disabled or there is some >> offlining race. Can you make memalloc_use_memcg(NULL) mean that it >> should charge root_mem_cgroup instead of current->mm->memcg? That way >> we can make this site unconditional while retaining the behavior: >> >> gfp_t gfp = GFP_KERNEL | __GFP_ACCOUNT; >> >> memalloc_use_memcg(group->memcg); >> kmem_cache_alloc(..., gfp); >> out: >> memalloc_unuse_memcg(); >> >> (dropping old_memcg and the unuse parameter as per the other mail) >> > > group->memcg is only NULL when memcg is disabled (i.e. > get_mem_cgroup_from_mm() returns root_mem_cgroup for offlined > mm->memcg). Though group->memcg can point to an offlined memcg. > > If I understand you correctly this is what we want: > > 1. If group->memcg is NULL then __GFP_ACCOUNT is a noop i.e. memcg is > disabled. > 2. If group->memcg is root_mem_cgroup, then __GFP_ACCOUNT again is a > kind of noop (charges to root_mem_cgroups are bypassed). > 3. If group->memcg is offlined memcg, then make __GFP_ACCOUNT noop by > returning root_mem_cgroup from get_mem_cgroup_from_current(). > 4. Else charge group->memcg. > > This seems reasonable. After your Ack and Amir's or Jan's answer to > the nesting query, I will resend the next version of this patch > series. > > In future if we find any use-cases of memalloc_use_memcg nesting then > we can make it work for nesting. > For the fsnotify use case memalloc_use_memcg() certainly doesn't need to nest, but I wonder, if that facility becomes popular among different subsystems, how exactly do you intend to monitor that it doesn't grow nested use cases? I would suggest that you at least leave a WARN_ON_ONCE if memalloc_use_memcg() is called and active_memcg is already set. Thanks, Amir.
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue, Jun 26, 2018 at 11:05 PM, Shakeel Butt wrote: > On Tue, Jun 26, 2018 at 12:03 PM Johannes Weiner wrote: >> >> On Mon, Jun 25, 2018 at 04:06:58PM -0700, Shakeel Butt wrote: >> > @@ -140,8 +141,9 @@ struct fanotify_event_info >> > *fanotify_alloc_event(struct fsnotify_group *group, >> >struct inode *inode, u32 >> > mask, >> >const struct path *path) >> > { >> > - struct fanotify_event_info *event; >> > + struct fanotify_event_info *event = NULL; >> > gfp_t gfp = GFP_KERNEL; >> > + struct mem_cgroup *old_memcg = NULL; >> > >> > /* >> >* For queues with unlimited length lost events are not expected and >> > @@ -151,19 +153,25 @@ struct fanotify_event_info >> > *fanotify_alloc_event(struct fsnotify_group *group, >> > if (group->max_events == UINT_MAX) >> > gfp |= __GFP_NOFAIL; >> > >> > + /* Whoever is interested in the event, pays for the allocation. */ >> > + if (group->memcg) { >> > + gfp |= __GFP_ACCOUNT; >> > + old_memcg = memalloc_use_memcg(group->memcg); >> > + } >> >> group->memcg is only NULL when memcg is disabled or there is some >> offlining race. Can you make memalloc_use_memcg(NULL) mean that it >> should charge root_mem_cgroup instead of current->mm->memcg? That way >> we can make this site unconditional while retaining the behavior: >> >> gfp_t gfp = GFP_KERNEL | __GFP_ACCOUNT; >> >> memalloc_use_memcg(group->memcg); >> kmem_cache_alloc(..., gfp); >> out: >> memalloc_unuse_memcg(); >> >> (dropping old_memcg and the unuse parameter as per the other mail) >> > > group->memcg is only NULL when memcg is disabled (i.e. > get_mem_cgroup_from_mm() returns root_mem_cgroup for offlined > mm->memcg). Though group->memcg can point to an offlined memcg. > > If I understand you correctly this is what we want: > > 1. If group->memcg is NULL then __GFP_ACCOUNT is a noop i.e. memcg is > disabled. > 2. If group->memcg is root_mem_cgroup, then __GFP_ACCOUNT again is a > kind of noop (charges to root_mem_cgroups are bypassed). > 3. If group->memcg is offlined memcg, then make __GFP_ACCOUNT noop by > returning root_mem_cgroup from get_mem_cgroup_from_current(). > 4. Else charge group->memcg. > > This seems reasonable. After your Ack and Amir's or Jan's answer to > the nesting query, I will resend the next version of this patch > series. > > In future if we find any use-cases of memalloc_use_memcg nesting then > we can make it work for nesting. > For the fsnotify use case memalloc_use_memcg() certainly doesn't need to nest, but I wonder, if that facility becomes popular among different subsystems, how exactly do you intend to monitor that it doesn't grow nested use cases? I would suggest that you at least leave a WARN_ON_ONCE if memalloc_use_memcg() is called and active_memcg is already set. Thanks, Amir.
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue, Jun 26, 2018 at 12:03 PM Johannes Weiner wrote: > > On Mon, Jun 25, 2018 at 04:06:58PM -0700, Shakeel Butt wrote: > > @@ -140,8 +141,9 @@ struct fanotify_event_info *fanotify_alloc_event(struct > > fsnotify_group *group, > >struct inode *inode, u32 > > mask, > >const struct path *path) > > { > > - struct fanotify_event_info *event; > > + struct fanotify_event_info *event = NULL; > > gfp_t gfp = GFP_KERNEL; > > + struct mem_cgroup *old_memcg = NULL; > > > > /* > >* For queues with unlimited length lost events are not expected and > > @@ -151,19 +153,25 @@ struct fanotify_event_info > > *fanotify_alloc_event(struct fsnotify_group *group, > > if (group->max_events == UINT_MAX) > > gfp |= __GFP_NOFAIL; > > > > + /* Whoever is interested in the event, pays for the allocation. */ > > + if (group->memcg) { > > + gfp |= __GFP_ACCOUNT; > > + old_memcg = memalloc_use_memcg(group->memcg); > > + } > > group->memcg is only NULL when memcg is disabled or there is some > offlining race. Can you make memalloc_use_memcg(NULL) mean that it > should charge root_mem_cgroup instead of current->mm->memcg? That way > we can make this site unconditional while retaining the behavior: > > gfp_t gfp = GFP_KERNEL | __GFP_ACCOUNT; > > memalloc_use_memcg(group->memcg); > kmem_cache_alloc(..., gfp); > out: > memalloc_unuse_memcg(); > > (dropping old_memcg and the unuse parameter as per the other mail) > group->memcg is only NULL when memcg is disabled (i.e. get_mem_cgroup_from_mm() returns root_mem_cgroup for offlined mm->memcg). Though group->memcg can point to an offlined memcg. If I understand you correctly this is what we want: 1. If group->memcg is NULL then __GFP_ACCOUNT is a noop i.e. memcg is disabled. 2. If group->memcg is root_mem_cgroup, then __GFP_ACCOUNT again is a kind of noop (charges to root_mem_cgroups are bypassed). 3. If group->memcg is offlined memcg, then make __GFP_ACCOUNT noop by returning root_mem_cgroup from get_mem_cgroup_from_current(). 4. Else charge group->memcg. This seems reasonable. After your Ack and Amir's or Jan's answer to the nesting query, I will resend the next version of this patch series. In future if we find any use-cases of memalloc_use_memcg nesting then we can make it work for nesting. thanks, Shakeel
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue, Jun 26, 2018 at 12:03 PM Johannes Weiner wrote: > > On Mon, Jun 25, 2018 at 04:06:58PM -0700, Shakeel Butt wrote: > > @@ -140,8 +141,9 @@ struct fanotify_event_info *fanotify_alloc_event(struct > > fsnotify_group *group, > >struct inode *inode, u32 > > mask, > >const struct path *path) > > { > > - struct fanotify_event_info *event; > > + struct fanotify_event_info *event = NULL; > > gfp_t gfp = GFP_KERNEL; > > + struct mem_cgroup *old_memcg = NULL; > > > > /* > >* For queues with unlimited length lost events are not expected and > > @@ -151,19 +153,25 @@ struct fanotify_event_info > > *fanotify_alloc_event(struct fsnotify_group *group, > > if (group->max_events == UINT_MAX) > > gfp |= __GFP_NOFAIL; > > > > + /* Whoever is interested in the event, pays for the allocation. */ > > + if (group->memcg) { > > + gfp |= __GFP_ACCOUNT; > > + old_memcg = memalloc_use_memcg(group->memcg); > > + } > > group->memcg is only NULL when memcg is disabled or there is some > offlining race. Can you make memalloc_use_memcg(NULL) mean that it > should charge root_mem_cgroup instead of current->mm->memcg? That way > we can make this site unconditional while retaining the behavior: > > gfp_t gfp = GFP_KERNEL | __GFP_ACCOUNT; > > memalloc_use_memcg(group->memcg); > kmem_cache_alloc(..., gfp); > out: > memalloc_unuse_memcg(); > > (dropping old_memcg and the unuse parameter as per the other mail) > group->memcg is only NULL when memcg is disabled (i.e. get_mem_cgroup_from_mm() returns root_mem_cgroup for offlined mm->memcg). Though group->memcg can point to an offlined memcg. If I understand you correctly this is what we want: 1. If group->memcg is NULL then __GFP_ACCOUNT is a noop i.e. memcg is disabled. 2. If group->memcg is root_mem_cgroup, then __GFP_ACCOUNT again is a kind of noop (charges to root_mem_cgroups are bypassed). 3. If group->memcg is offlined memcg, then make __GFP_ACCOUNT noop by returning root_mem_cgroup from get_mem_cgroup_from_current(). 4. Else charge group->memcg. This seems reasonable. After your Ack and Amir's or Jan's answer to the nesting query, I will resend the next version of this patch series. In future if we find any use-cases of memalloc_use_memcg nesting then we can make it work for nesting. thanks, Shakeel
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue, Jun 26, 2018 at 11:55 AM Johannes Weiner wrote: > > On Tue, Jun 26, 2018 at 11:00:53AM -0700, Shakeel Butt wrote: > > On Mon, Jun 25, 2018 at 10:49 PM Amir Goldstein wrote: > > > > > ... > > > > > > The verb 'unuse' takes an argument memcg and 'uses' it - too weird. > > > You can use 'override'/'revert' verbs like override_creds or just call > > > memalloc_use_memcg(old_memcg) since there is no reference taken > > > anyway in use_memcg and no reference released in unuse_memcg. > > > > > > Otherwise looks good to me. > > > > > > > Thanks for your feedback. Just using memalloc_use_memcg(old_memcg) and > > ignoring the return seems more simple. I will wait for feedback from > > other before changing anything. > > We're not nesting calls to memalloc_use_memcg(), right? So we don't > have to return old_memcg and don't have to pass anything to unuse, it > can always set current->active_memcg to NULL. For buffer_head, the allocation is done with GFP_NOFS. So, I think there is no chance of nesting. The fsnotify uses GFP_KERNEL but based on my limited understanding of fsnotify, there should not be any nesting i.e. the allocation triggering reclaim which trigger fsnotify events. Though I would like Amir or Jan to confirm there is no nesting possible. thanks, Shakeel
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue, Jun 26, 2018 at 11:55 AM Johannes Weiner wrote: > > On Tue, Jun 26, 2018 at 11:00:53AM -0700, Shakeel Butt wrote: > > On Mon, Jun 25, 2018 at 10:49 PM Amir Goldstein wrote: > > > > > ... > > > > > > The verb 'unuse' takes an argument memcg and 'uses' it - too weird. > > > You can use 'override'/'revert' verbs like override_creds or just call > > > memalloc_use_memcg(old_memcg) since there is no reference taken > > > anyway in use_memcg and no reference released in unuse_memcg. > > > > > > Otherwise looks good to me. > > > > > > > Thanks for your feedback. Just using memalloc_use_memcg(old_memcg) and > > ignoring the return seems more simple. I will wait for feedback from > > other before changing anything. > > We're not nesting calls to memalloc_use_memcg(), right? So we don't > have to return old_memcg and don't have to pass anything to unuse, it > can always set current->active_memcg to NULL. For buffer_head, the allocation is done with GFP_NOFS. So, I think there is no chance of nesting. The fsnotify uses GFP_KERNEL but based on my limited understanding of fsnotify, there should not be any nesting i.e. the allocation triggering reclaim which trigger fsnotify events. Though I would like Amir or Jan to confirm there is no nesting possible. thanks, Shakeel
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Mon, Jun 25, 2018 at 04:06:58PM -0700, Shakeel Butt wrote: > @@ -140,8 +141,9 @@ struct fanotify_event_info *fanotify_alloc_event(struct > fsnotify_group *group, >struct inode *inode, u32 mask, >const struct path *path) > { > - struct fanotify_event_info *event; > + struct fanotify_event_info *event = NULL; > gfp_t gfp = GFP_KERNEL; > + struct mem_cgroup *old_memcg = NULL; > > /* >* For queues with unlimited length lost events are not expected and > @@ -151,19 +153,25 @@ struct fanotify_event_info *fanotify_alloc_event(struct > fsnotify_group *group, > if (group->max_events == UINT_MAX) > gfp |= __GFP_NOFAIL; > > + /* Whoever is interested in the event, pays for the allocation. */ > + if (group->memcg) { > + gfp |= __GFP_ACCOUNT; > + old_memcg = memalloc_use_memcg(group->memcg); > + } group->memcg is only NULL when memcg is disabled or there is some offlining race. Can you make memalloc_use_memcg(NULL) mean that it should charge root_mem_cgroup instead of current->mm->memcg? That way we can make this site unconditional while retaining the behavior: gfp_t gfp = GFP_KERNEL | __GFP_ACCOUNT; memalloc_use_memcg(group->memcg); kmem_cache_alloc(..., gfp); out: memalloc_unuse_memcg(); (dropping old_memcg and the unuse parameter as per the other mail) > if (fanotify_is_perm_event(mask)) { > struct fanotify_perm_event_info *pevent; > > pevent = kmem_cache_alloc(fanotify_perm_event_cachep, gfp); > if (!pevent) > - return NULL; > + goto out; > event = >fae; > pevent->response = 0; > goto init; > } > event = kmem_cache_alloc(fanotify_event_cachep, gfp); > if (!event) > - return NULL; > + goto out; > init: __maybe_unused > fsnotify_init_event(>fse, inode, mask); > event->tgid = get_pid(task_tgid(current)); > @@ -174,6 +182,9 @@ init: __maybe_unused > event->path.mnt = NULL; > event->path.dentry = NULL; > } > +out: > + if (group->memcg) > + memalloc_unuse_memcg(old_memcg); > return event; > } Thanks, Johannes
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Mon, Jun 25, 2018 at 04:06:58PM -0700, Shakeel Butt wrote: > @@ -140,8 +141,9 @@ struct fanotify_event_info *fanotify_alloc_event(struct > fsnotify_group *group, >struct inode *inode, u32 mask, >const struct path *path) > { > - struct fanotify_event_info *event; > + struct fanotify_event_info *event = NULL; > gfp_t gfp = GFP_KERNEL; > + struct mem_cgroup *old_memcg = NULL; > > /* >* For queues with unlimited length lost events are not expected and > @@ -151,19 +153,25 @@ struct fanotify_event_info *fanotify_alloc_event(struct > fsnotify_group *group, > if (group->max_events == UINT_MAX) > gfp |= __GFP_NOFAIL; > > + /* Whoever is interested in the event, pays for the allocation. */ > + if (group->memcg) { > + gfp |= __GFP_ACCOUNT; > + old_memcg = memalloc_use_memcg(group->memcg); > + } group->memcg is only NULL when memcg is disabled or there is some offlining race. Can you make memalloc_use_memcg(NULL) mean that it should charge root_mem_cgroup instead of current->mm->memcg? That way we can make this site unconditional while retaining the behavior: gfp_t gfp = GFP_KERNEL | __GFP_ACCOUNT; memalloc_use_memcg(group->memcg); kmem_cache_alloc(..., gfp); out: memalloc_unuse_memcg(); (dropping old_memcg and the unuse parameter as per the other mail) > if (fanotify_is_perm_event(mask)) { > struct fanotify_perm_event_info *pevent; > > pevent = kmem_cache_alloc(fanotify_perm_event_cachep, gfp); > if (!pevent) > - return NULL; > + goto out; > event = >fae; > pevent->response = 0; > goto init; > } > event = kmem_cache_alloc(fanotify_event_cachep, gfp); > if (!event) > - return NULL; > + goto out; > init: __maybe_unused > fsnotify_init_event(>fse, inode, mask); > event->tgid = get_pid(task_tgid(current)); > @@ -174,6 +182,9 @@ init: __maybe_unused > event->path.mnt = NULL; > event->path.dentry = NULL; > } > +out: > + if (group->memcg) > + memalloc_unuse_memcg(old_memcg); > return event; > } Thanks, Johannes
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue, Jun 26, 2018 at 11:00:53AM -0700, Shakeel Butt wrote: > On Mon, Jun 25, 2018 at 10:49 PM Amir Goldstein wrote: > > > ... > > > > The verb 'unuse' takes an argument memcg and 'uses' it - too weird. > > You can use 'override'/'revert' verbs like override_creds or just call > > memalloc_use_memcg(old_memcg) since there is no reference taken > > anyway in use_memcg and no reference released in unuse_memcg. > > > > Otherwise looks good to me. > > > > Thanks for your feedback. Just using memalloc_use_memcg(old_memcg) and > ignoring the return seems more simple. I will wait for feedback from > other before changing anything. We're not nesting calls to memalloc_use_memcg(), right? So we don't have to return old_memcg and don't have to pass anything to unuse, it can always set current->active_memcg to NULL.
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue, Jun 26, 2018 at 11:00:53AM -0700, Shakeel Butt wrote: > On Mon, Jun 25, 2018 at 10:49 PM Amir Goldstein wrote: > > > ... > > > > The verb 'unuse' takes an argument memcg and 'uses' it - too weird. > > You can use 'override'/'revert' verbs like override_creds or just call > > memalloc_use_memcg(old_memcg) since there is no reference taken > > anyway in use_memcg and no reference released in unuse_memcg. > > > > Otherwise looks good to me. > > > > Thanks for your feedback. Just using memalloc_use_memcg(old_memcg) and > ignoring the return seems more simple. I will wait for feedback from > other before changing anything. We're not nesting calls to memalloc_use_memcg(), right? So we don't have to return old_memcg and don't have to pass anything to unuse, it can always set current->active_memcg to NULL.
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Mon, Jun 25, 2018 at 10:49 PM Amir Goldstein wrote: > ... > > The verb 'unuse' takes an argument memcg and 'uses' it - too weird. > You can use 'override'/'revert' verbs like override_creds or just call > memalloc_use_memcg(old_memcg) since there is no reference taken > anyway in use_memcg and no reference released in unuse_memcg. > > Otherwise looks good to me. > Thanks for your feedback. Just using memalloc_use_memcg(old_memcg) and ignoring the return seems more simple. I will wait for feedback from other before changing anything. thanks, Shakeel
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Mon, Jun 25, 2018 at 10:49 PM Amir Goldstein wrote: > ... > > The verb 'unuse' takes an argument memcg and 'uses' it - too weird. > You can use 'override'/'revert' verbs like override_creds or just call > memalloc_use_memcg(old_memcg) since there is no reference taken > anyway in use_memcg and no reference released in unuse_memcg. > > Otherwise looks good to me. > Thanks for your feedback. Just using memalloc_use_memcg(old_memcg) and ignoring the return seems more simple. I will wait for feedback from other before changing anything. thanks, Shakeel
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue, Jun 26, 2018 at 2:06 AM, Shakeel Butt wrote: > A lot of memory can be consumed by the events generated for the huge or > unlimited queues if there is either no or slow listener. This can cause > system level memory pressure or OOMs. So, it's better to account the > fsnotify kmem caches to the memcg of the listener. > > However the listener can be in a different memcg than the memcg of the > producer and these allocations happen in the context of the event > producer. This patch introduces remote memcg charging scope API which the > producer can use to charge the allocations to the memcg of the listener. > > There are seven fsnotify kmem caches and among them allocations from > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and > inotify_inode_mark_cachep happens in the context of syscall from the > listener. So, SLAB_ACCOUNT is enough for these caches. > > The objects from fsnotify_mark_connector_cachep are not accounted as they > are small compared to the notification mark or events and it is unclear > whom to account connector to since it is shared by all events attached to > the inode. > > The allocations from the event caches happen in the context of the event > producer. For such caches we will need to remote charge the allocations > to the listener's memcg. Thus we save the memcg reference in the > fsnotify_group structure of the listener. > > This patch has also moved the members of fsnotify_group to keep the size > same, at least for 64 bit build, even with additional member by filling > the holes. > > Signed-off-by: Shakeel Butt > Cc: Michal Hocko > Cc: Jan Kara > Cc: Amir Goldstein > Cc: Greg Thelen > Cc: Johannes Weiner > Cc: Vladimir Davydov > Cc: Roman Gushchin > --- > Changelog since v6: > - Removed Jan's ACK as the code has changed a lot > - Squashed the separate remote charging API path into this one > - Removed kmalloc* & kmem_cache_alloc* APIs and only kept the scope API > - Changed fsnotify remote charging code to use scope API > > Changelog since v5: > - None > > Changelog since v4: > - Fixed the build for CONFIG_MEMCG=n > > Changelog since v3: > - Rebased over Jan's patches. > - Some cleanup based on Amir's comments. > > Changelog since v2: > - None > > Changelog since v1: > - no more charging fsnotify_mark_connector objects > - Fixed the build for SLOB > > fs/notify/dnotify/dnotify.c | 5 ++-- > fs/notify/fanotify/fanotify.c| 17 ++-- > fs/notify/fanotify/fanotify_user.c | 5 +++- > fs/notify/group.c| 4 +++ > fs/notify/inotify/inotify_fsnotify.c | 15 +- > fs/notify/inotify/inotify_user.c | 5 +++- > include/linux/fsnotify_backend.h | 12 +--- > include/linux/memcontrol.h | 7 + > include/linux/sched.h| 3 ++ > include/linux/sched/mm.h | 41 > kernel/fork.c| 3 ++ > mm/memcontrol.c | 38 +++--- > 12 files changed, 139 insertions(+), 16 deletions(-) > > diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c > index e2bea2ac5dfb..a6365e6bc047 100644 > --- a/fs/notify/dnotify/dnotify.c > +++ b/fs/notify/dnotify/dnotify.c > @@ -384,8 +384,9 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned > long arg) > > static int __init dnotify_init(void) > { > - dnotify_struct_cache = KMEM_CACHE(dnotify_struct, SLAB_PANIC); > - dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC); > + dnotify_struct_cache = KMEM_CACHE(dnotify_struct, > + SLAB_PANIC|SLAB_ACCOUNT); > + dnotify_mark_cache = KMEM_CACHE(dnotify_mark, > SLAB_PANIC|SLAB_ACCOUNT); > > dnotify_group = fsnotify_alloc_group(_fsnotify_ops); > if (IS_ERR(dnotify_group)) > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c > index f90842efea13..d6dfcf0ec21f 100644 > --- a/fs/notify/fanotify/fanotify.c > +++ b/fs/notify/fanotify/fanotify.c > @@ -11,6 +11,7 @@ > #include > #include > #include > +#include > > #include "fanotify.h" > > @@ -140,8 +141,9 @@ struct fanotify_event_info *fanotify_alloc_event(struct > fsnotify_group *group, > struct inode *inode, u32 > mask, > const struct path *path) > { > - struct fanotify_event_info *event; > + struct fanotify_event_info *event = NULL; > gfp_t gfp = GFP_KERNEL; > + struct mem_cgroup *old_memcg = NULL; > > /* > * For queues with unlimited length lost events are not expected and > @@ -151,19 +153,25 @@ struct fanotify_event_info *fanotify_alloc_event(struct > fsnotify_group *group, > if (group->max_events == UINT_MAX) > gfp |= __GFP_NOFAIL; > > + /* Whoever is interested in the event, pays for the allocation. */ > + if (group->memcg) { > +
Re: [PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
On Tue, Jun 26, 2018 at 2:06 AM, Shakeel Butt wrote: > A lot of memory can be consumed by the events generated for the huge or > unlimited queues if there is either no or slow listener. This can cause > system level memory pressure or OOMs. So, it's better to account the > fsnotify kmem caches to the memcg of the listener. > > However the listener can be in a different memcg than the memcg of the > producer and these allocations happen in the context of the event > producer. This patch introduces remote memcg charging scope API which the > producer can use to charge the allocations to the memcg of the listener. > > There are seven fsnotify kmem caches and among them allocations from > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and > inotify_inode_mark_cachep happens in the context of syscall from the > listener. So, SLAB_ACCOUNT is enough for these caches. > > The objects from fsnotify_mark_connector_cachep are not accounted as they > are small compared to the notification mark or events and it is unclear > whom to account connector to since it is shared by all events attached to > the inode. > > The allocations from the event caches happen in the context of the event > producer. For such caches we will need to remote charge the allocations > to the listener's memcg. Thus we save the memcg reference in the > fsnotify_group structure of the listener. > > This patch has also moved the members of fsnotify_group to keep the size > same, at least for 64 bit build, even with additional member by filling > the holes. > > Signed-off-by: Shakeel Butt > Cc: Michal Hocko > Cc: Jan Kara > Cc: Amir Goldstein > Cc: Greg Thelen > Cc: Johannes Weiner > Cc: Vladimir Davydov > Cc: Roman Gushchin > --- > Changelog since v6: > - Removed Jan's ACK as the code has changed a lot > - Squashed the separate remote charging API path into this one > - Removed kmalloc* & kmem_cache_alloc* APIs and only kept the scope API > - Changed fsnotify remote charging code to use scope API > > Changelog since v5: > - None > > Changelog since v4: > - Fixed the build for CONFIG_MEMCG=n > > Changelog since v3: > - Rebased over Jan's patches. > - Some cleanup based on Amir's comments. > > Changelog since v2: > - None > > Changelog since v1: > - no more charging fsnotify_mark_connector objects > - Fixed the build for SLOB > > fs/notify/dnotify/dnotify.c | 5 ++-- > fs/notify/fanotify/fanotify.c| 17 ++-- > fs/notify/fanotify/fanotify_user.c | 5 +++- > fs/notify/group.c| 4 +++ > fs/notify/inotify/inotify_fsnotify.c | 15 +- > fs/notify/inotify/inotify_user.c | 5 +++- > include/linux/fsnotify_backend.h | 12 +--- > include/linux/memcontrol.h | 7 + > include/linux/sched.h| 3 ++ > include/linux/sched/mm.h | 41 > kernel/fork.c| 3 ++ > mm/memcontrol.c | 38 +++--- > 12 files changed, 139 insertions(+), 16 deletions(-) > > diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c > index e2bea2ac5dfb..a6365e6bc047 100644 > --- a/fs/notify/dnotify/dnotify.c > +++ b/fs/notify/dnotify/dnotify.c > @@ -384,8 +384,9 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned > long arg) > > static int __init dnotify_init(void) > { > - dnotify_struct_cache = KMEM_CACHE(dnotify_struct, SLAB_PANIC); > - dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC); > + dnotify_struct_cache = KMEM_CACHE(dnotify_struct, > + SLAB_PANIC|SLAB_ACCOUNT); > + dnotify_mark_cache = KMEM_CACHE(dnotify_mark, > SLAB_PANIC|SLAB_ACCOUNT); > > dnotify_group = fsnotify_alloc_group(_fsnotify_ops); > if (IS_ERR(dnotify_group)) > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c > index f90842efea13..d6dfcf0ec21f 100644 > --- a/fs/notify/fanotify/fanotify.c > +++ b/fs/notify/fanotify/fanotify.c > @@ -11,6 +11,7 @@ > #include > #include > #include > +#include > > #include "fanotify.h" > > @@ -140,8 +141,9 @@ struct fanotify_event_info *fanotify_alloc_event(struct > fsnotify_group *group, > struct inode *inode, u32 > mask, > const struct path *path) > { > - struct fanotify_event_info *event; > + struct fanotify_event_info *event = NULL; > gfp_t gfp = GFP_KERNEL; > + struct mem_cgroup *old_memcg = NULL; > > /* > * For queues with unlimited length lost events are not expected and > @@ -151,19 +153,25 @@ struct fanotify_event_info *fanotify_alloc_event(struct > fsnotify_group *group, > if (group->max_events == UINT_MAX) > gfp |= __GFP_NOFAIL; > > + /* Whoever is interested in the event, pays for the allocation. */ > + if (group->memcg) { > +
[PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging scope API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. Signed-off-by: Shakeel Butt Cc: Michal Hocko Cc: Jan Kara Cc: Amir Goldstein Cc: Greg Thelen Cc: Johannes Weiner Cc: Vladimir Davydov Cc: Roman Gushchin --- Changelog since v6: - Removed Jan's ACK as the code has changed a lot - Squashed the separate remote charging API path into this one - Removed kmalloc* & kmem_cache_alloc* APIs and only kept the scope API - Changed fsnotify remote charging code to use scope API Changelog since v5: - None Changelog since v4: - Fixed the build for CONFIG_MEMCG=n Changelog since v3: - Rebased over Jan's patches. - Some cleanup based on Amir's comments. Changelog since v2: - None Changelog since v1: - no more charging fsnotify_mark_connector objects - Fixed the build for SLOB fs/notify/dnotify/dnotify.c | 5 ++-- fs/notify/fanotify/fanotify.c| 17 ++-- fs/notify/fanotify/fanotify_user.c | 5 +++- fs/notify/group.c| 4 +++ fs/notify/inotify/inotify_fsnotify.c | 15 +- fs/notify/inotify/inotify_user.c | 5 +++- include/linux/fsnotify_backend.h | 12 +--- include/linux/memcontrol.h | 7 + include/linux/sched.h| 3 ++ include/linux/sched/mm.h | 41 kernel/fork.c| 3 ++ mm/memcontrol.c | 38 +++--- 12 files changed, 139 insertions(+), 16 deletions(-) diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index e2bea2ac5dfb..a6365e6bc047 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -384,8 +384,9 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) static int __init dnotify_init(void) { - dnotify_struct_cache = KMEM_CACHE(dnotify_struct, SLAB_PANIC); - dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC); + dnotify_struct_cache = KMEM_CACHE(dnotify_struct, + SLAB_PANIC|SLAB_ACCOUNT); + dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); dnotify_group = fsnotify_alloc_group(_fsnotify_ops); if (IS_ERR(dnotify_group)) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index f90842efea13..d6dfcf0ec21f 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "fanotify.h" @@ -140,8 +141,9 @@ struct fanotify_event_info *fanotify_alloc_event(struct fsnotify_group *group, struct inode *inode, u32 mask, const struct path *path) { - struct fanotify_event_info *event; + struct fanotify_event_info *event = NULL; gfp_t gfp = GFP_KERNEL; + struct mem_cgroup *old_memcg = NULL; /* * For queues with unlimited length lost events are not expected and @@ -151,19 +153,25 @@ struct fanotify_event_info *fanotify_alloc_event(struct fsnotify_group *group, if (group->max_events == UINT_MAX) gfp |= __GFP_NOFAIL; + /* Whoever is interested in the event, pays for the allocation. */ + if (group->memcg) { + gfp |= __GFP_ACCOUNT; + old_memcg = memalloc_use_memcg(group->memcg); + } + if (fanotify_is_perm_event(mask)) { struct fanotify_perm_event_info *pevent; pevent = kmem_cache_alloc(fanotify_perm_event_cachep,
[PATCH 1/2] fs: fsnotify: account fsnotify metadata to kmemcg
A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging scope API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. Signed-off-by: Shakeel Butt Cc: Michal Hocko Cc: Jan Kara Cc: Amir Goldstein Cc: Greg Thelen Cc: Johannes Weiner Cc: Vladimir Davydov Cc: Roman Gushchin --- Changelog since v6: - Removed Jan's ACK as the code has changed a lot - Squashed the separate remote charging API path into this one - Removed kmalloc* & kmem_cache_alloc* APIs and only kept the scope API - Changed fsnotify remote charging code to use scope API Changelog since v5: - None Changelog since v4: - Fixed the build for CONFIG_MEMCG=n Changelog since v3: - Rebased over Jan's patches. - Some cleanup based on Amir's comments. Changelog since v2: - None Changelog since v1: - no more charging fsnotify_mark_connector objects - Fixed the build for SLOB fs/notify/dnotify/dnotify.c | 5 ++-- fs/notify/fanotify/fanotify.c| 17 ++-- fs/notify/fanotify/fanotify_user.c | 5 +++- fs/notify/group.c| 4 +++ fs/notify/inotify/inotify_fsnotify.c | 15 +- fs/notify/inotify/inotify_user.c | 5 +++- include/linux/fsnotify_backend.h | 12 +--- include/linux/memcontrol.h | 7 + include/linux/sched.h| 3 ++ include/linux/sched/mm.h | 41 kernel/fork.c| 3 ++ mm/memcontrol.c | 38 +++--- 12 files changed, 139 insertions(+), 16 deletions(-) diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index e2bea2ac5dfb..a6365e6bc047 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -384,8 +384,9 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) static int __init dnotify_init(void) { - dnotify_struct_cache = KMEM_CACHE(dnotify_struct, SLAB_PANIC); - dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC); + dnotify_struct_cache = KMEM_CACHE(dnotify_struct, + SLAB_PANIC|SLAB_ACCOUNT); + dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); dnotify_group = fsnotify_alloc_group(_fsnotify_ops); if (IS_ERR(dnotify_group)) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index f90842efea13..d6dfcf0ec21f 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "fanotify.h" @@ -140,8 +141,9 @@ struct fanotify_event_info *fanotify_alloc_event(struct fsnotify_group *group, struct inode *inode, u32 mask, const struct path *path) { - struct fanotify_event_info *event; + struct fanotify_event_info *event = NULL; gfp_t gfp = GFP_KERNEL; + struct mem_cgroup *old_memcg = NULL; /* * For queues with unlimited length lost events are not expected and @@ -151,19 +153,25 @@ struct fanotify_event_info *fanotify_alloc_event(struct fsnotify_group *group, if (group->max_events == UINT_MAX) gfp |= __GFP_NOFAIL; + /* Whoever is interested in the event, pays for the allocation. */ + if (group->memcg) { + gfp |= __GFP_ACCOUNT; + old_memcg = memalloc_use_memcg(group->memcg); + } + if (fanotify_is_perm_event(mask)) { struct fanotify_perm_event_info *pevent; pevent = kmem_cache_alloc(fanotify_perm_event_cachep,