Re: [RFC 1/2] mm: disable LRU pagevec during the migration temporarily

Minchan Kim Wed, 17 Feb 2021 12:53:17 -0800

On Wed, Feb 17, 2021 at 10:50:55AM +0100, Michal Hocko wrote:
> On Wed 17-02-21 09:59:54, Michal Hocko wrote:
> > On Tue 16-02-21 09:03:47, Minchan Kim wrote:
> [...]
> > >  /*
> > >   * migrate_prep() needs to be called before we start compiling a list of 
> > > pages
> > >   * to be migrated using isolate_lru_page(). If scheduling work on other 
> > > CPUs is
> > > @@ -64,11 +80,27 @@
> > >   */
> > >  void migrate_prep(void)
> > >  {
> > > + unsigned int cpu;
> > > +
> > > + spin_lock(&migrate_pending_lock);
> > > + migrate_pending_count++;
> > > + spin_unlock(&migrate_pending_lock);
> > 
> > I suspect you do not want to add atomic_read inside hot paths, right? Is
> > this really something that we have to microoptimize for? atomic_read is
> > a simple READ_ONCE on many archs.
> 
> Or you rather wanted to prevent from read memory barrier to enfore the
> ordering.


Yub.

> 
> > > +
> > > + for_each_online_cpu(cpu) {
> > > +         struct work_struct *work = &per_cpu(migrate_pending_work, cpu);
> > > +
> > > +         INIT_WORK(work, read_migrate_pending);
> > > +         queue_work_on(cpu, mm_percpu_wq, work);
> > > + }
> > > +
> > > + for_each_online_cpu(cpu)
> > > +         flush_work(&per_cpu(migrate_pending_work, cpu));
> > 
> > I also do not follow this scheme. Where is the IPI you are mentioning
> > above?
> 
> Thinking about it some more I think you mean the rescheduling IPI here?

True.

> 
> > > + /*
> > > +  * From now on, every online cpu will see uptodate
> > > +  * migarte_pending_work.
> > > +  */
> > >   /*
> > >    * Clear the LRU lists so pages can be isolated.
> > > -  * Note that pages may be moved off the LRU after we have
> > > -  * drained them. Those pages will fail to migrate like other
> > > -  * pages that may be busy.
> > >    */
> > >   lru_add_drain_all();
> > 
> > Overall, this looks rather heavy weight to my taste. Have you tried to
> > play with a simple atomic counter approach? atomic_read when adding to
> > the cache and atomic_inc inside migrate_prep followed by lrdu_add_drain.

I'd like to avoid atomic operation if we could.

> 
> If you really want a strong ordering then it should be sufficient to
> simply alter lru_add_drain_all to force draining all CPUs. This will
> make sure no new pages are added to the pcp lists and you will also sync
> up anything that has accumulated because of a race between atomic_read
> and inc:
> diff --git a/mm/swap.c b/mm/swap.c
> index 2cca7141470c..91600d7bb7a8 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -745,7 +745,7 @@ static void lru_add_drain_per_cpu(struct work_struct 
> *dummy)
>   * Calling this function with cpu hotplug locks held can actually lead
>   * to obscure indirect dependencies via WQ context.
>   */
> -void lru_add_drain_all(void)
> +void lru_add_drain_all(bool force_all_cpus)
>  {
>       /*
>        * lru_drain_gen - Global pages generation number
> @@ -820,7 +820,8 @@ void lru_add_drain_all(void)
>       for_each_online_cpu(cpu) {
>               struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
>  
> -             if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
> +             if (force_all_cpus ||
> +                 pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
>                   data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
>                   pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) 
> ||
>                   pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||

Yub, that's a idea.
How about this?

diff --git a/mm/migrate.c b/mm/migrate.c
index a69da8aaeccd..2531642dd9ce 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -57,6 +57,14 @@
 
 #include "internal.h"
 
+static DEFINE_SPINLOCK(migrate_pending_lock);
+static unsigned long migrate_pending_count;
+
+bool migrate_pending(void)
+{
+       return migrate_pending_count;
+}
+
 /*
  * migrate_prep() needs to be called before we start compiling a list of pages
  * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
@@ -64,13 +72,20 @@
  */
 void migrate_prep(void)
 {
+       unsigned int cpu;
+
+       /*
+        * lru_add_drain_all's IPI will make sure no new pages are added
+        * to the pcp lists and drain them all.
+        */
+       spin_lock(&migrate_pending_lock);
+       migrate_pending_count++;
+       spin_unlock(&migrate_pending_lock);
+
        /*
         * Clear the LRU lists so pages can be isolated.
-        * Note that pages may be moved off the LRU after we have
-        * drained them. Those pages will fail to migrate like other
-        * pages that may be busy.
         */
-       lru_add_drain_all();
+       lru_add_drain_all(true);
 }
 
 /* Do the necessary work of migrate_prep but not if it involves other CPUs */
@@ -79,6 +94,15 @@ void migrate_prep_local(void)
        lru_add_drain();
 }
 
+void migrate_finish(void)
+{
+       int cpu;
+
+       spin_lock(&migrate_pending_lock);
+       migrate_pending_count--;
+       spin_unlock(&migrate_pending_lock);
+}

A odd here is there are no barrier for migrate_finish for
updating migrate_pending_count so other CPUs will see
stale value until they encounters the barrier by chance.
However, it wouldn't be a big deal, IMHO.

What do you think?

Re: [RFC 1/2] mm: disable LRU pagevec during the migration temporarily

Reply via email to