Re: [PATCH 13/19] mm/migrate: Use xchg instead of spinlock
On Tue, Jun 05, 2018 at 12:24:39AM -0700, Srikar Dronamraju wrote:
> * Peter Zijlstra [2018-06-04 21:28:21]:
>
> > >	if (time_after(jiffies, pgdat->numabalancing_migrate_next_window)) {
> > > -		spin_lock(&pgdat->numabalancing_migrate_lock);
> > > -		pgdat->numabalancing_migrate_nr_pages = 0;
> > > -		pgdat->numabalancing_migrate_next_window = jiffies +
> > > -			msecs_to_jiffies(migrate_interval_millisecs);
> > > -		spin_unlock(&pgdat->numabalancing_migrate_lock);
> > > +		if (xchg(&pgdat->numabalancing_migrate_nr_pages, 0))
> > > +			pgdat->numabalancing_migrate_next_window = jiffies +
> > > +				msecs_to_jiffies(migrate_interval_millisecs);
> >
> > Note that both are in fact wrong. That wants to be something like:
> >
> >	pgdat->numabalancing_migrate_next_window += interval;
> >
> > Otherwise you stretch every interval by 'jiffies -
> > numabalancing_migrate_next_window'.
>
> Okay, I get your point.

Note that in practice it probably doesn't matter, but it just upsets my
OCD ;-)

> > Also, that all wants READ_ONCE/WRITE_ONCE, irrespective of the
> > spinlock/xchg.
>
> unsigned long interval = READ_ONCE(pgdat->numabalancing_migrate_next_window);
>
> if (time_after(jiffies, interval)) {
>	interval += msecs_to_jiffies(migrate_interval_millisecs);
>	if (xchg(&pgdat->numabalancing_migrate_nr_pages, 0))
>		WRITE_ONCE(pgdat->numabalancing_migrate_next_window, interval);
> }
>
> Something like this?

Almost, you forgot about the case where 'jiffies -
numabalancing_migrate_next_window > interval'. That wants to be
something like:

	unsigned long timo = READ_ONCE(stupid_long_name);

	if (time_after(jiffies, timo) && xchg(&other_long_name, 0)) {
		do {
			timo += msecs_to_jiffies(..);
		} while (unlikely(time_after(jiffies, timo)));

		WRITE_ONCE(stupid_long_name, timo);
	}
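
For reference, mapping the placeholder names in the sketch above back
onto the actual pgdat fields gives roughly the following; this is an
illustrative reconstruction for readers of the archive, not a patch
posted in this thread:

	unsigned long timo = READ_ONCE(pgdat->numabalancing_migrate_next_window);

	/*
	 * The xchg() both clears the page counter and elects the single
	 * CPU that gets to advance the window; the do/while catches up
	 * when several whole intervals have elapsed since expiry.
	 */
	if (time_after(jiffies, timo) &&
	    xchg(&pgdat->numabalancing_migrate_nr_pages, 0)) {
		do {
			timo += msecs_to_jiffies(migrate_interval_millisecs);
		} while (unlikely(time_after(jiffies, timo)));

		WRITE_ONCE(pgdat->numabalancing_migrate_next_window, timo);
	}

The READ_ONCE()/WRITE_ONCE() pair matters because next_window is now
read and written with no lock held: it stops the compiler from tearing,
refetching or caching those accesses.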
Re: [PATCH 13/19] mm/migrate: Use xchg instead of spinlock
* Peter Zijlstra [2018-06-04 21:28:21]:

> >	if (time_after(jiffies, pgdat->numabalancing_migrate_next_window)) {
> > -		spin_lock(&pgdat->numabalancing_migrate_lock);
> > -		pgdat->numabalancing_migrate_nr_pages = 0;
> > -		pgdat->numabalancing_migrate_next_window = jiffies +
> > -			msecs_to_jiffies(migrate_interval_millisecs);
> > -		spin_unlock(&pgdat->numabalancing_migrate_lock);
> > +		if (xchg(&pgdat->numabalancing_migrate_nr_pages, 0))
> > +			pgdat->numabalancing_migrate_next_window = jiffies +
> > +				msecs_to_jiffies(migrate_interval_millisecs);
>
> Note that both are in fact wrong. That wants to be something like:
>
>	pgdat->numabalancing_migrate_next_window += interval;
>
> Otherwise you stretch every interval by 'jiffies -
> numabalancing_migrate_next_window'.

Okay, I get your point.

> Also, that all wants READ_ONCE/WRITE_ONCE, irrespective of the
> spinlock/xchg.
>
> I suppose the problem here is that PPC has a very nasty test-and-set
> spinlock with fwd progress issues while xchg maps to a fairly simple
> ll/sc that (hopefully) has some hardware fairness.
>
> And pgdat being a rather coarse data structure (per node?) there could
> be a lot of CPUs stomping on this here thing.
>
> So simpler not really, but better for PPC.

unsigned long interval = READ_ONCE(pgdat->numabalancing_migrate_next_window);

if (time_after(jiffies, interval)) {
	interval += msecs_to_jiffies(migrate_interval_millisecs);
	if (xchg(&pgdat->numabalancing_migrate_nr_pages, 0))
		WRITE_ONCE(pgdat->numabalancing_migrate_next_window, interval);
}

Something like this?
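
To put numbers on the gap Peter flags in his follow-up above (invented
values, assuming HZ=1000 so the 100ms interval is 100 jiffies): with
next_window at 5000 and nothing hitting this path until jiffies reaches
5350, a single 'interval += 100' yields 5100, which is still in the
past, so every subsequent call re-enters the reset path until the
window finally overtakes jiffies -- hence the do/while catch-up loop in
the reply above.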
Re: [PATCH 13/19] mm/migrate: Use xchg instead of spinlock
On Mon, Jun 04, 2018 at 03:30:22PM +0530, Srikar Dronamraju wrote:
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8c0af0f..1c55956 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1874,11 +1874,9 @@ static bool numamigrate_update_ratelimit(pg_data_t *pgdat,
>	 * all the time is being spent migrating!
>	 */
>	if (time_after(jiffies, pgdat->numabalancing_migrate_next_window)) {
> -		spin_lock(&pgdat->numabalancing_migrate_lock);
> -		pgdat->numabalancing_migrate_nr_pages = 0;
> -		pgdat->numabalancing_migrate_next_window = jiffies +
> -			msecs_to_jiffies(migrate_interval_millisecs);
> -		spin_unlock(&pgdat->numabalancing_migrate_lock);
> +		if (xchg(&pgdat->numabalancing_migrate_nr_pages, 0))
> +			pgdat->numabalancing_migrate_next_window = jiffies +
> +				msecs_to_jiffies(migrate_interval_millisecs);

Note that both are in fact wrong. That wants to be something like:

	pgdat->numabalancing_migrate_next_window += interval;

Otherwise you stretch every interval by 'jiffies -
numabalancing_migrate_next_window'.

Also, that all wants READ_ONCE/WRITE_ONCE, irrespective of the
spinlock/xchg.

I suppose the problem here is that PPC has a very nasty test-and-set
spinlock with fwd progress issues while xchg maps to a fairly simple
ll/sc that (hopefully) has some hardware fairness.

And pgdat being a rather coarse data structure (per node?) there could
be a lot of CPUs stomping on this here thing.

So simpler not really, but better for PPC.

>	}
>	if (pgdat->numabalancing_migrate_nr_pages > ratelimit_pages) {
>		trace_mm_numa_migrate_ratelimit(current, pgdat->node_id,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4526643..464a25c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6208,7 +6208,6 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>
>	pgdat_resize_init(pgdat);
> #ifdef CONFIG_NUMA_BALANCING
> -	spin_lock_init(&pgdat->numabalancing_migrate_lock);
>	pgdat->numabalancing_migrate_nr_pages = 0;
>	pgdat->active_node_migrate = 0;
>	pgdat->numabalancing_migrate_next_window = jiffies;
> --
> 1.8.3.1
>
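
To make the stretch concrete (invented values, assuming HZ=1000): if
the window expires at jiffies 5000 and the first fault past expiry
arrives at 5040, 'jiffies + interval' schedules the next expiry at
5140, so each window silently grows by the latency of that first fault;
'next_window += interval' keeps the schedule anchored at 5100.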
Re: [PATCH 13/19] mm/migrate: Use xchg instead of spinlock
On Mon, 2018-06-04 at 15:30 +0530, Srikar Dronamraju wrote:
>
> +++ b/mm/migrate.c
> @@ -1874,11 +1874,9 @@ static bool numamigrate_update_ratelimit(pg_data_t *pgdat,
>	 * all the time is being spent migrating!
>	 */
>	if (time_after(jiffies, pgdat->numabalancing_migrate_next_window)) {
> -		spin_lock(&pgdat->numabalancing_migrate_lock);
> -		pgdat->numabalancing_migrate_nr_pages = 0;
> -		pgdat->numabalancing_migrate_next_window = jiffies +
> -			msecs_to_jiffies(migrate_interval_millisecs);
> -		spin_unlock(&pgdat->numabalancing_migrate_lock);
> +		if (xchg(&pgdat->numabalancing_migrate_nr_pages, 0))
> +			pgdat->numabalancing_migrate_next_window = jiffies +
> +				msecs_to_jiffies(migrate_interval_millisecs);
>	}

I am not convinced this is simpler, but no real objection either way :)

--
All Rights Reversed.
[PATCH 13/19] mm/migrate: Use xchg instead of spinlock
Currently, resetting the migrate rate-limit window is done under a
spinlock. The spinlock only serializes the rate-limit reset itself,
and the same serialization can be achieved with a simpler xchg.

Testcase       Time:       Min       Max       Avg    StdDev
numa01.sh      Real:    435.67    707.28    527.49     97.85
numa01.sh       Sys:     76.41    231.19    162.49     56.13
numa01.sh      User:  38247.36  59033.52  45129.31   7642.69
numa02.sh      Real:     60.35     62.09     61.09      0.69
numa02.sh       Sys:     15.01     30.20     20.64      5.56
numa02.sh      User:   5195.93   5294.82   5240.99     40.55
numa03.sh      Real:    752.04    919.89    836.81     63.29
numa03.sh       Sys:    115.10    133.35    125.46      7.78
numa03.sh      User:  58736.44  70084.26  65103.67   4416.10
numa04.sh      Real:    418.43    709.69    512.53    104.17
numa04.sh       Sys:    242.99    370.47    297.39     42.20
numa04.sh      User:  34916.14  48429.54  38955.65   4928.05
numa05.sh      Real:    379.27    434.05    403.70     17.79
numa05.sh       Sys:    145.94    344.50    268.72     68.53
numa05.sh      User:  32679.32  35449.75  33989.10    913.19

Testcase       Time:       Min       Max       Avg    StdDev   %Change
numa01.sh      Real:    490.04    774.86    596.26     96.46    -11.5%
numa01.sh       Sys:    151.52    242.88    184.82     31.71    -12.0%
numa01.sh      User:  41418.41  60844.59  48776.09   6564.27    -7.47%
numa02.sh      Real:     60.14     62.94     60.98      1.00    0.180%
numa02.sh       Sys:     16.11     30.77     21.20      5.28    -2.64%
numa02.sh      User:   5184.33   5311.09   5228.50     44.24    0.238%
numa03.sh      Real:    790.95    856.35    826.41     24.11    1.258%
numa03.sh       Sys:    114.93    118.85    117.05      1.63    7.184%
numa03.sh      User:  60990.99  64959.28  63470.43   1415.44    2.573%
numa04.sh      Real:    434.37    597.92    504.87     59.70    1.517%
numa04.sh       Sys:    237.63    397.40    289.74     55.98    2.640%
numa04.sh      User:  34854.87  41121.83  38572.52   2615.84    0.993%
numa05.sh      Real:    386.77    448.90    417.22     22.79    -3.24%
numa05.sh       Sys:    149.23    379.95    303.04     79.55    -11.3%
numa05.sh      User:  32951.76  35959.58  34562.18   1034.05    -1.65%

Signed-off-by: Srikar Dronamraju
---
 include/linux/mmzone.h | 3 ---
 mm/migrate.c           | 8 +++-----
 mm/page_alloc.c        | 1 -
 3 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b0767703..0dbe1d5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -669,9 +669,6 @@ struct zonelist {
 	struct task_struct *kcompactd;
 #endif
 #ifdef CONFIG_NUMA_BALANCING
-	/* Lock serializing the migrate rate limiting window */
-	spinlock_t numabalancing_migrate_lock;
-
 	/* Rate limiting time interval */
 	unsigned long numabalancing_migrate_next_window;
diff --git a/mm/migrate.c b/mm/migrate.c
index 8c0af0f..1c55956 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1874,11 +1874,9 @@ static bool numamigrate_update_ratelimit(pg_data_t *pgdat,
 	 * all the time is being spent migrating!
 	 */
 	if (time_after(jiffies, pgdat->numabalancing_migrate_next_window)) {
-		spin_lock(&pgdat->numabalancing_migrate_lock);
-		pgdat->numabalancing_migrate_nr_pages = 0;
-		pgdat->numabalancing_migrate_next_window = jiffies +
-			msecs_to_jiffies(migrate_interval_millisecs);
-		spin_unlock(&pgdat->numabalancing_migrate_lock);
+		if (xchg(&pgdat->numabalancing_migrate_nr_pages, 0))
+			pgdat->numabalancing_migrate_next_window = jiffies +
+				msecs_to_jiffies(migrate_interval_millisecs);
 	}
 	if (pgdat->numabalancing_migrate_nr_pages > ratelimit_pages) {
 		trace_mm_numa_migrate_ratelimit(current, pgdat->node_id,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4526643..464a25c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6208,7 +6208,6 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 
 	pgdat_resize_init(pgdat);
 #ifdef CONFIG_NUMA_BALANCING
-	spin_lock_init(&pgdat->numabalancing_migrate_lock);
 	pgdat->numabalancing_migrate_nr_pages = 0;
 	pgdat->active_node_migrate = 0;
 	pgdat->numabalancing_migrate_next_window = jiffies;
--
1.8.3.1
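
As a self-contained illustration of the pattern the patch relies on,
here is an editorial userspace sketch, not kernel code: C11
atomic_exchange() stands in for the kernel's xchg(), a plain counter
stands in for jiffies, and all names are invented. It deliberately
keeps the 'now + WINDOW_LEN' behaviour of the posted patch, including
the window stretch pointed out in the review above.

	/* build with: cc -std=c11 -o ratelimit ratelimit.c */
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	#define WINDOW_LEN	100UL	/* stand-in for msecs_to_jiffies(...) */
	#define RATELIMIT_PAGES	128UL

	static atomic_ulong nr_pages;		/* pages counted in this window */
	static atomic_ulong next_window;	/* time the current window ends */

	/* Returns true when this migration attempt should be rate limited. */
	static bool update_ratelimit(unsigned long now, unsigned long pages)
	{
		if (now > atomic_load(&next_window)) {
			/*
			 * Whichever thread's exchange sees a non-zero counter
			 * wins the race and advances the window; everyone else
			 * sees 0 and leaves the window alone. This replaces
			 * the spinlock removed by the patch.
			 */
			if (atomic_exchange(&nr_pages, 0))
				atomic_store(&next_window, now + WINDOW_LEN);
		}
		if (atomic_load(&nr_pages) > RATELIMIT_PAGES)
			return true;
		atomic_fetch_add(&nr_pages, pages);
		return false;
	}

	int main(void)
	{
		atomic_store(&next_window, WINDOW_LEN);
		for (unsigned long now = 0; now <= 300; now += 30)
			printf("t=%3lu rate limited: %d\n",
			       now, update_ratelimit(now, 64));
		return 0;
	}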