Re: [PATCH v2 1/1] kvfree_rcu: Release a page cache under memory pressure

2021-03-18 Thread Uladzislau Rezki
On Tue, Mar 16, 2021 at 02:01:25PM -0700, Paul E. McKenney wrote:
> On Tue, Mar 16, 2021 at 09:42:07PM +0100, Uladzislau Rezki wrote:
> > > On Wed, Mar 10, 2021 at 09:07:57PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > From: Zhang Qiang 
> > > > 
> > > > Add a drain_page_cache() function to drain a per-cpu page cache.
> > > > The reason behind of it is a system can run into a low memory
> > > > condition, in that case a page shrinker can ask for its users
> > > > to free their caches in order to get extra memory available for
> > > > other needs in a system.
> > > > 
> > > > When a system hits such condition, a page cache is drained for
> > > > all CPUs in a system. Apart of that a page cache work is delayed
> > > > with 5 seconds interval until a memory pressure disappears.
> > > 
> > > Does this capture it?
> > > 
> > It would be good to have kind of clear interface saying that:
> > 
> > - low memory condition starts;
> > - it is over, watermarks were fixed.
> > 
> > but i do not see it. Therefore 5 seconds back-off has been chosen
> > to make a cache refilling to be less aggressive. Suppose 5 seconds
> > is not enough, in that case the work will attempt to allocate some
> > pages using less permissive parameters. What means that if we are
> > still in a low memory condition a refilling will probably fail and
> > next job will be invoked in 5 seconds one more time.
> 
> I would like such an interface as well, but from what I hear it is
> easier to ask for than to provide.  :-/
> 
> > > 
> > > 
> > > Add a drain_page_cache() function that drains the specified per-cpu
> > > page cache.  This function is invoked on each CPU when the system
> > > enters a low-memory state, that is, when the shrinker invokes
> > > kfree_rcu_shrink_scan().  Thus, when the system is low on memory,
> > > kvfree_rcu() starts taking its slow paths.
> > > 
> > > In addition, the first subsequent attempt to refill the caches is
> > > delayed for five seconds.
> > > 
> > > 
> > > 
> > > A few questions below.
> > > 
> > >   Thanx, Paul
> > > 
> > > > Co-developed-by: Uladzislau Rezki (Sony) 
> > > > Signed-off-by: Uladzislau Rezki (Sony) 
> > > > Signed-off-by: Zqiang 
> > > > ---
> > > >  kernel/rcu/tree.c | 59 ---
> > > >  1 file changed, 51 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 2c9cf4df942c..46b8a98ca077 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3163,7 +3163,7 @@ struct kfree_rcu_cpu {
> > > > bool initialized;
> > > > int count;
> > > >  
> > > > -   struct work_struct page_cache_work;
> > > > +   struct delayed_work page_cache_work;
> > > > atomic_t work_in_progress;
> > > > struct hrtimer hrtimer;
> > > >  
> > > > @@ -3175,6 +3175,17 @@ static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) 
> > > > = {
> > > > .lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
> > > >  };
> > > >  
> > > > +// A page shrinker can ask for freeing extra pages
> > > > +// to get them available for other needs in a system.
> > > > +// Usually it happens under low memory condition, in
> > > > +// that case hold on a bit with page cache filling.
> > > > +static unsigned long backoff_page_cache_fill;
> > > > +
> > > > +// 5 seconds delay. That is long enough to reduce
> > > > +// an interfering and racing with a shrinker where
> > > > +// the cache is drained.
> > > > +#define PAGE_CACHE_FILL_DELAY (5 * HZ)
> > > > +
> > > >  static __always_inline void
> > > >  debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
> > > >  {
> > > > @@ -3229,6 +3240,26 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
> > > >  
> > > >  }
> > > >  
> > > > +static int
> > > > +drain_page_cache(struct kfree_rcu_cpu *krcp)
> > > > +{
> > > > +   unsigned long flags;
> > > > +   struct llist_node *page_list, *pos, *n;
> > > > +   int freed = 0;
> > > > +
> > > > +   raw_spin_lock_irqsave(>lock, flags);
> > > > +   page_list = llist_del_all(>bkvcache);
> > > > +   krcp->nr_bkv_objs = 0;
> > > > +   raw_spin_unlock_irqrestore(>lock, flags);
> > > > +
> > > > +   llist_for_each_safe(pos, n, page_list) {
> > > > +   free_page((unsigned long)pos);
> > > > +   freed++;
> > > > +   }
> > > > +
> > > > +   return freed;
> > > > +}
> > > > +
> > > >  /*
> > > >   * This function is invoked in workqueue context after a grace period.
> > > >   * It frees all the objects queued on ->bhead_free or ->head_free.
> > > > @@ -3419,7 +3450,7 @@ schedule_page_work_fn(struct hrtimer *t)
> > > > struct kfree_rcu_cpu *krcp =
> > > > container_of(t, struct kfree_rcu_cpu, hrtimer);
> > > >  
> > > > -   

Re: [PATCH v2 1/1] kvfree_rcu: Release a page cache under memory pressure

2021-03-16 Thread Paul E. McKenney
On Tue, Mar 16, 2021 at 09:42:07PM +0100, Uladzislau Rezki wrote:
> > On Wed, Mar 10, 2021 at 09:07:57PM +0100, Uladzislau Rezki (Sony) wrote:
> > > From: Zhang Qiang 
> > > 
> > > Add a drain_page_cache() function to drain a per-cpu page cache.
> > > The reason behind of it is a system can run into a low memory
> > > condition, in that case a page shrinker can ask for its users
> > > to free their caches in order to get extra memory available for
> > > other needs in a system.
> > > 
> > > When a system hits such condition, a page cache is drained for
> > > all CPUs in a system. Apart of that a page cache work is delayed
> > > with 5 seconds interval until a memory pressure disappears.
> > 
> > Does this capture it?
> > 
> It would be good to have kind of clear interface saying that:
> 
> - low memory condition starts;
> - it is over, watermarks were fixed.
> 
> but i do not see it. Therefore 5 seconds back-off has been chosen
> to make a cache refilling to be less aggressive. Suppose 5 seconds
> is not enough, in that case the work will attempt to allocate some
> pages using less permissive parameters. What means that if we are
> still in a low memory condition a refilling will probably fail and
> next job will be invoked in 5 seconds one more time.

I would like such an interface as well, but from what I hear it is
easier to ask for than to provide.  :-/

> > 
> > 
> > Add a drain_page_cache() function that drains the specified per-cpu
> > page cache.  This function is invoked on each CPU when the system
> > enters a low-memory state, that is, when the shrinker invokes
> > kfree_rcu_shrink_scan().  Thus, when the system is low on memory,
> > kvfree_rcu() starts taking its slow paths.
> > 
> > In addition, the first subsequent attempt to refill the caches is
> > delayed for five seconds.
> > 
> > 
> > 
> > A few questions below.
> > 
> > Thanx, Paul
> > 
> > > Co-developed-by: Uladzislau Rezki (Sony) 
> > > Signed-off-by: Uladzislau Rezki (Sony) 
> > > Signed-off-by: Zqiang 
> > > ---
> > >  kernel/rcu/tree.c | 59 ---
> > >  1 file changed, 51 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 2c9cf4df942c..46b8a98ca077 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3163,7 +3163,7 @@ struct kfree_rcu_cpu {
> > >   bool initialized;
> > >   int count;
> > >  
> > > - struct work_struct page_cache_work;
> > > + struct delayed_work page_cache_work;
> > >   atomic_t work_in_progress;
> > >   struct hrtimer hrtimer;
> > >  
> > > @@ -3175,6 +3175,17 @@ static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = 
> > > {
> > >   .lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
> > >  };
> > >  
> > > +// A page shrinker can ask for freeing extra pages
> > > +// to get them available for other needs in a system.
> > > +// Usually it happens under low memory condition, in
> > > +// that case hold on a bit with page cache filling.
> > > +static unsigned long backoff_page_cache_fill;
> > > +
> > > +// 5 seconds delay. That is long enough to reduce
> > > +// an interfering and racing with a shrinker where
> > > +// the cache is drained.
> > > +#define PAGE_CACHE_FILL_DELAY (5 * HZ)
> > > +
> > >  static __always_inline void
> > >  debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
> > >  {
> > > @@ -3229,6 +3240,26 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
> > >  
> > >  }
> > >  
> > > +static int
> > > +drain_page_cache(struct kfree_rcu_cpu *krcp)
> > > +{
> > > + unsigned long flags;
> > > + struct llist_node *page_list, *pos, *n;
> > > + int freed = 0;
> > > +
> > > + raw_spin_lock_irqsave(>lock, flags);
> > > + page_list = llist_del_all(>bkvcache);
> > > + krcp->nr_bkv_objs = 0;
> > > + raw_spin_unlock_irqrestore(>lock, flags);
> > > +
> > > + llist_for_each_safe(pos, n, page_list) {
> > > + free_page((unsigned long)pos);
> > > + freed++;
> > > + }
> > > +
> > > + return freed;
> > > +}
> > > +
> > >  /*
> > >   * This function is invoked in workqueue context after a grace period.
> > >   * It frees all the objects queued on ->bhead_free or ->head_free.
> > > @@ -3419,7 +3450,7 @@ schedule_page_work_fn(struct hrtimer *t)
> > >   struct kfree_rcu_cpu *krcp =
> > >   container_of(t, struct kfree_rcu_cpu, hrtimer);
> > >  
> > > - queue_work(system_highpri_wq, >page_cache_work);
> > > + queue_delayed_work(system_highpri_wq, >page_cache_work, 0);
> > >   return HRTIMER_NORESTART;
> > >  }
> > >  
> > > @@ -3428,7 +3459,7 @@ static void fill_page_cache_func(struct work_struct 
> > > *work)
> > >   struct kvfree_rcu_bulk_data *bnode;
> > >   struct kfree_rcu_cpu *krcp =
> > >   container_of(work, struct kfree_rcu_cpu,
> > > - page_cache_work);
> > 

Re: [PATCH v2 1/1] kvfree_rcu: Release a page cache under memory pressure

2021-03-16 Thread Uladzislau Rezki
> On Wed, Mar 10, 2021 at 09:07:57PM +0100, Uladzislau Rezki (Sony) wrote:
> > From: Zhang Qiang 
> > 
> > Add a drain_page_cache() function to drain a per-cpu page cache.
> > The reason behind of it is a system can run into a low memory
> > condition, in that case a page shrinker can ask for its users
> > to free their caches in order to get extra memory available for
> > other needs in a system.
> > 
> > When a system hits such condition, a page cache is drained for
> > all CPUs in a system. Apart of that a page cache work is delayed
> > with 5 seconds interval until a memory pressure disappears.
> 
> Does this capture it?
> 
It would be good to have kind of clear interface saying that:

- low memory condition starts;
- it is over, watermarks were fixed.

but i do not see it. Therefore 5 seconds back-off has been chosen
to make a cache refilling to be less aggressive. Suppose 5 seconds
is not enough, in that case the work will attempt to allocate some
pages using less permissive parameters. What means that if we are
still in a low memory condition a refilling will probably fail and
next job will be invoked in 5 seconds one more time.

> 
> 
> Add a drain_page_cache() function that drains the specified per-cpu
> page cache.  This function is invoked on each CPU when the system
> enters a low-memory state, that is, when the shrinker invokes
> kfree_rcu_shrink_scan().  Thus, when the system is low on memory,
> kvfree_rcu() starts taking its slow paths.
> 
> In addition, the first subsequent attempt to refill the caches is
> delayed for five seconds.
> 
> 
> 
> A few questions below.
> 
>   Thanx, Paul
> 
> > Co-developed-by: Uladzislau Rezki (Sony) 
> > Signed-off-by: Uladzislau Rezki (Sony) 
> > Signed-off-by: Zqiang 
> > ---
> >  kernel/rcu/tree.c | 59 ---
> >  1 file changed, 51 insertions(+), 8 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 2c9cf4df942c..46b8a98ca077 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3163,7 +3163,7 @@ struct kfree_rcu_cpu {
> > bool initialized;
> > int count;
> >  
> > -   struct work_struct page_cache_work;
> > +   struct delayed_work page_cache_work;
> > atomic_t work_in_progress;
> > struct hrtimer hrtimer;
> >  
> > @@ -3175,6 +3175,17 @@ static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
> > .lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
> >  };
> >  
> > +// A page shrinker can ask for freeing extra pages
> > +// to get them available for other needs in a system.
> > +// Usually it happens under low memory condition, in
> > +// that case hold on a bit with page cache filling.
> > +static unsigned long backoff_page_cache_fill;
> > +
> > +// 5 seconds delay. That is long enough to reduce
> > +// an interfering and racing with a shrinker where
> > +// the cache is drained.
> > +#define PAGE_CACHE_FILL_DELAY (5 * HZ)
> > +
> >  static __always_inline void
> >  debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
> >  {
> > @@ -3229,6 +3240,26 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
> >  
> >  }
> >  
> > +static int
> > +drain_page_cache(struct kfree_rcu_cpu *krcp)
> > +{
> > +   unsigned long flags;
> > +   struct llist_node *page_list, *pos, *n;
> > +   int freed = 0;
> > +
> > +   raw_spin_lock_irqsave(>lock, flags);
> > +   page_list = llist_del_all(>bkvcache);
> > +   krcp->nr_bkv_objs = 0;
> > +   raw_spin_unlock_irqrestore(>lock, flags);
> > +
> > +   llist_for_each_safe(pos, n, page_list) {
> > +   free_page((unsigned long)pos);
> > +   freed++;
> > +   }
> > +
> > +   return freed;
> > +}
> > +
> >  /*
> >   * This function is invoked in workqueue context after a grace period.
> >   * It frees all the objects queued on ->bhead_free or ->head_free.
> > @@ -3419,7 +3450,7 @@ schedule_page_work_fn(struct hrtimer *t)
> > struct kfree_rcu_cpu *krcp =
> > container_of(t, struct kfree_rcu_cpu, hrtimer);
> >  
> > -   queue_work(system_highpri_wq, >page_cache_work);
> > +   queue_delayed_work(system_highpri_wq, >page_cache_work, 0);
> > return HRTIMER_NORESTART;
> >  }
> >  
> > @@ -3428,7 +3459,7 @@ static void fill_page_cache_func(struct work_struct 
> > *work)
> > struct kvfree_rcu_bulk_data *bnode;
> > struct kfree_rcu_cpu *krcp =
> > container_of(work, struct kfree_rcu_cpu,
> > -   page_cache_work);
> > +   page_cache_work.work);
> > unsigned long flags;
> > bool pushed;
> > int i;
> > @@ -3457,10 +3488,14 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
> >  {
> > if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
> > !atomic_xchg(>work_in_progress, 1)) {
> > -   hrtimer_init(>hrtimer, 

Re: [PATCH v2 1/1] kvfree_rcu: Release a page cache under memory pressure

2021-03-16 Thread Paul E. McKenney
On Wed, Mar 10, 2021 at 09:07:57PM +0100, Uladzislau Rezki (Sony) wrote:
> From: Zhang Qiang 
> 
> Add a drain_page_cache() function to drain a per-cpu page cache.
> The reason behind of it is a system can run into a low memory
> condition, in that case a page shrinker can ask for its users
> to free their caches in order to get extra memory available for
> other needs in a system.
> 
> When a system hits such condition, a page cache is drained for
> all CPUs in a system. Apart of that a page cache work is delayed
> with 5 seconds interval until a memory pressure disappears.

Does this capture it?



Add a drain_page_cache() function that drains the specified per-cpu
page cache.  This function is invoked on each CPU when the system
enters a low-memory state, that is, when the shrinker invokes
kfree_rcu_shrink_scan().  Thus, when the system is low on memory,
kvfree_rcu() starts taking its slow paths.

In addition, the first subsequent attempt to refill the caches is
delayed for five seconds.



A few questions below.

Thanx, Paul

> Co-developed-by: Uladzislau Rezki (Sony) 
> Signed-off-by: Uladzislau Rezki (Sony) 
> Signed-off-by: Zqiang 
> ---
>  kernel/rcu/tree.c | 59 ---
>  1 file changed, 51 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 2c9cf4df942c..46b8a98ca077 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3163,7 +3163,7 @@ struct kfree_rcu_cpu {
>   bool initialized;
>   int count;
>  
> - struct work_struct page_cache_work;
> + struct delayed_work page_cache_work;
>   atomic_t work_in_progress;
>   struct hrtimer hrtimer;
>  
> @@ -3175,6 +3175,17 @@ static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
>   .lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
>  };
>  
> +// A page shrinker can ask for freeing extra pages
> +// to get them available for other needs in a system.
> +// Usually it happens under low memory condition, in
> +// that case hold on a bit with page cache filling.
> +static unsigned long backoff_page_cache_fill;
> +
> +// 5 seconds delay. That is long enough to reduce
> +// an interfering and racing with a shrinker where
> +// the cache is drained.
> +#define PAGE_CACHE_FILL_DELAY (5 * HZ)
> +
>  static __always_inline void
>  debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
>  {
> @@ -3229,6 +3240,26 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
>  
>  }
>  
> +static int
> +drain_page_cache(struct kfree_rcu_cpu *krcp)
> +{
> + unsigned long flags;
> + struct llist_node *page_list, *pos, *n;
> + int freed = 0;
> +
> + raw_spin_lock_irqsave(>lock, flags);
> + page_list = llist_del_all(>bkvcache);
> + krcp->nr_bkv_objs = 0;
> + raw_spin_unlock_irqrestore(>lock, flags);
> +
> + llist_for_each_safe(pos, n, page_list) {
> + free_page((unsigned long)pos);
> + freed++;
> + }
> +
> + return freed;
> +}
> +
>  /*
>   * This function is invoked in workqueue context after a grace period.
>   * It frees all the objects queued on ->bhead_free or ->head_free.
> @@ -3419,7 +3450,7 @@ schedule_page_work_fn(struct hrtimer *t)
>   struct kfree_rcu_cpu *krcp =
>   container_of(t, struct kfree_rcu_cpu, hrtimer);
>  
> - queue_work(system_highpri_wq, >page_cache_work);
> + queue_delayed_work(system_highpri_wq, >page_cache_work, 0);
>   return HRTIMER_NORESTART;
>  }
>  
> @@ -3428,7 +3459,7 @@ static void fill_page_cache_func(struct work_struct 
> *work)
>   struct kvfree_rcu_bulk_data *bnode;
>   struct kfree_rcu_cpu *krcp =
>   container_of(work, struct kfree_rcu_cpu,
> - page_cache_work);
> + page_cache_work.work);
>   unsigned long flags;
>   bool pushed;
>   int i;
> @@ -3457,10 +3488,14 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
>  {
>   if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>   !atomic_xchg(>work_in_progress, 1)) {
> - hrtimer_init(>hrtimer, CLOCK_MONOTONIC,
> - HRTIMER_MODE_REL);
> - krcp->hrtimer.function = schedule_page_work_fn;
> - hrtimer_start(>hrtimer, 0, HRTIMER_MODE_REL);
> + if (xchg(_page_cache_fill, 0UL)) {

How often can run_page_cache_worker() be invoked?  I am a bit
concerned about the possibility of heavy memory contention on the
backoff_page_cache_fill variable on large systems.  Unless there
is something that sharply bounds the frequency of calls to
run_page_cache_worker(), something like this would be more scalable:

if (backoff_page_cache_fill &&
xchg(_page_cache_fill, 0UL)) {

It looks to me like all the CPUs could