Re: [PATCH v2 1/1] kvfree_rcu: Release a page cache under memory pressure

On Tue, Mar 16, 2021 at 09:42:07PM +0100, Uladzislau Rezki wrote:
> It would be good to have a kind of clear interface saying that:
>
> - a low memory condition has started;
> - it is over, the watermarks have been restored.
>
> But I do not see one. Therefore a 5-second back-off has been chosen
> to make the cache refilling less aggressive. Suppose 5 seconds is not
> enough; in that case the work will attempt to allocate some pages
> using less permissive parameters. This means that if we are still in
> a low memory condition, the refilling will probably fail and the next
> job will be invoked 5 seconds later, one more time.

I would like such an interface as well, but from what I hear it is
easier to ask for than to provide. :-/

[ . . . ]
Re: [PATCH v2 1/1] kvfree_rcu: Release a page cache under memory pressure

On Tue, Mar 16, 2021 at 02:01:25PM -0700, Paul E. McKenney wrote:
> Does this capture it?
>
> 	Add a drain_page_cache() function that drains the specified
> 	per-cpu page cache. This function is invoked on each CPU when
> 	the system enters a low-memory state, that is, when the shrinker
> 	invokes kfree_rcu_shrink_scan(). Thus, when the system is low
> 	on memory, kvfree_rcu() starts taking its slow paths.
>
> 	In addition, the first subsequent attempt to refill the caches
> 	is delayed for five seconds.

It would be good to have a kind of clear interface saying that:

- a low memory condition has started;
- it is over, the watermarks have been restored.

But I do not see one. Therefore a 5-second back-off has been chosen to
make the cache refilling less aggressive. Suppose 5 seconds is not
enough; in that case the work will attempt to allocate some pages using
less permissive parameters. This means that if we are still in a low
memory condition, the refilling will probably fail and the next job
will be invoked 5 seconds later, one more time.

[ . . . ]
Re: [PATCH v2 1/1] kvfree_rcu: Release a page cache under memory pressure

On Wed, Mar 10, 2021 at 09:07:57PM +0100, Uladzislau Rezki (Sony) wrote:
> From: Zhang Qiang
>
> Add a drain_page_cache() function to drain a per-cpu page cache.
> The reason behind of it is a system can run into a low memory
> condition, in that case a page shrinker can ask for its users
> to free their caches in order to get extra memory available for
> other needs in a system.
>
> When a system hits such condition, a page cache is drained for
> all CPUs in a system. Apart of that a page cache work is delayed
> with 5 seconds interval until a memory pressure disappears.

Does this capture it?

	Add a drain_page_cache() function that drains the specified
	per-cpu page cache. This function is invoked on each CPU when
	the system enters a low-memory state, that is, when the shrinker
	invokes kfree_rcu_shrink_scan(). Thus, when the system is low
	on memory, kvfree_rcu() starts taking its slow paths.

	In addition, the first subsequent attempt to refill the caches
	is delayed for five seconds.

A few questions below.

							Thanx, Paul

> Co-developed-by: Uladzislau Rezki (Sony)
> Signed-off-by: Uladzislau Rezki (Sony)
> Signed-off-by: Zqiang
> ---
>  kernel/rcu/tree.c | 59 ++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 51 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 2c9cf4df942c..46b8a98ca077 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3163,7 +3163,7 @@ struct kfree_rcu_cpu {
>  	bool initialized;
>  	int count;
>  
> -	struct work_struct page_cache_work;
> +	struct delayed_work page_cache_work;
>  	atomic_t work_in_progress;
>  	struct hrtimer hrtimer;
>  
> @@ -3175,6 +3175,17 @@ static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
>  	.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
>  };
>  
> +// A page shrinker can ask for freeing extra pages
> +// to get them available for other needs in a system.
> +// Usually it happens under low memory condition, in
> +// that case hold on a bit with page cache filling.
> +static unsigned long backoff_page_cache_fill;
> +
> +// 5 seconds delay. That is long enough to reduce
> +// an interfering and racing with a shrinker where
> +// the cache is drained.
> +#define PAGE_CACHE_FILL_DELAY (5 * HZ)
> +
>  static __always_inline void
>  debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
>  {
> @@ -3229,6 +3240,26 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
>  
>  }
>  
> +static int
> +drain_page_cache(struct kfree_rcu_cpu *krcp)
> +{
> +	unsigned long flags;
> +	struct llist_node *page_list, *pos, *n;
> +	int freed = 0;
> +
> +	raw_spin_lock_irqsave(&krcp->lock, flags);
> +	page_list = llist_del_all(&krcp->bkvcache);
> +	krcp->nr_bkv_objs = 0;
> +	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> +
> +	llist_for_each_safe(pos, n, page_list) {
> +		free_page((unsigned long)pos);
> +		freed++;
> +	}
> +
> +	return freed;
> +}
> +
>  /*
>   * This function is invoked in workqueue context after a grace period.
>   * It frees all the objects queued on ->bhead_free or ->head_free.
> @@ -3419,7 +3450,7 @@ schedule_page_work_fn(struct hrtimer *t)
>  	struct kfree_rcu_cpu *krcp =
>  		container_of(t, struct kfree_rcu_cpu, hrtimer);
>  
> -	queue_work(system_highpri_wq, &krcp->page_cache_work);
> +	queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0);
>  	return HRTIMER_NORESTART;
>  }
>  
> @@ -3428,7 +3459,7 @@ static void fill_page_cache_func(struct work_struct *work)
>  	struct kvfree_rcu_bulk_data *bnode;
>  	struct kfree_rcu_cpu *krcp =
>  		container_of(work, struct kfree_rcu_cpu,
> -			page_cache_work);
> +			page_cache_work.work);
>  	unsigned long flags;
>  	bool pushed;
>  	int i;
> @@ -3457,10 +3488,14 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
>  {
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>  			!atomic_xchg(&krcp->work_in_progress, 1)) {
> -		hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC,
> -			HRTIMER_MODE_REL);
> -		krcp->hrtimer.function = schedule_page_work_fn;
> -		hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
> +		if (xchg(&backoff_page_cache_fill, 0UL)) {

How often can run_page_cache_worker() be invoked? I am a bit concerned
about the possibility of heavy memory contention on the
backoff_page_cache_fill variable on large systems.

Unless there is something that sharply bounds the frequency of calls to
run_page_cache_worker(), something like this would be more scalable:

	if (backoff_page_cache_fill && xchg(&backoff_page_cache_fill, 0UL)) {

It looks to me like all the CPUs could