Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
* Robin Holt wrote:

> I have the patches sort-of finished.  The patch set starts by moving
> the halt/shutdown/reboot functions over to a new kernel/reboot.c, next
> patch gets a checkpatch.pl cleanup to work, third patch is essentially
> the below patch against the new file, and the fourth patch introduces
> a kernel boot parameter.
>
> That said, I don't like them because of the 'stable' marking for these
> patches.  I think I am going to submit them with the existing patch
> first in the series, then introduce the kernel parameter, then move
> them to kernel/reboot.c, and finally pass checkpatch.pl.
>
> Does that sound alright?

Yeah, that ordering sounds right.  If there are no objections from others
I'll first apply the first patch (with a -stable tag), test it for a day,
then apply the rest.

Even patch #1 probably won't make it for v3.9-final [there are too many
potential downsides IMHO], but this could be one of the cases where
marking a patch for -stable and merging it in the merge window is legit.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On 04/15, Robin Holt wrote:
>
> On Sat, Apr 13, 2013 at 06:30:22PM +0200, Oleg Nesterov wrote:
> > On 04/12, Robin Holt wrote:
> > >
> > > +void migrate_to_boot_cpu(void)
> > > +{
> > > +	/* The boot cpu is always logical cpu 0 */
> > > +	int reboot_cpu_id = 0;
> > > +
> > > +	/* Make certain the cpu I'm about to reboot on is online */
> > > +	if (!cpu_online(reboot_cpu_id))
> > > +		reboot_cpu_id = smp_processor_id();
> > > +
> > > +	/* Make certain I only run on the appropriate processor */
> > > +	set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
> >
> > This is only theoretical, but perhaps it makes sense to set
> > PF_THREAD_BOUND before set_cpus_allowed_ptr()?  To prevent the
> > race with another thread doing sched_setaffinity().
>
> I don't quite understand this comment.  We are migrating our own
> thread.  How does setting PF_THREAD_BOUND have any effect?

Suppose that another thread does, say, sys_sched_setaffinity(our_pid)
right after set_cpus_allowed_ptr().  If we set PF_THREAD_BOUND the next
set_cpus_allowed() will fail.

Oleg.
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On Mon, Apr 15, 2013 at 11:04:08AM -0500, Robin Holt wrote:
> On Sat, Apr 13, 2013 at 06:30:22PM +0200, Oleg Nesterov wrote:
> > On 04/12, Robin Holt wrote:
> > >
> > > +void migrate_to_boot_cpu(void)
> > > +{
> > > +	/* The boot cpu is always logical cpu 0 */
> > > +	int reboot_cpu_id = 0;
> > > +
> > > +	/* Make certain the cpu I'm about to reboot on is online */
> > > +	if (!cpu_online(reboot_cpu_id))
> > > +		reboot_cpu_id = smp_processor_id();
> > > +
> > > +	/* Make certain I only run on the appropriate processor */
> > > +	set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
> >
> > This is only theoretical, but perhaps it makes sense to set
> > PF_THREAD_BOUND before set_cpus_allowed_ptr()?  To prevent the
> > race with another thread doing sched_setaffinity().
>
> I don't quite understand this comment.  We are migrating our own
> thread.  How does setting PF_THREAD_BOUND have any effect?

I should have taken a few more minutes.  I understand now.  Will do.

Robin
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On Sat, Apr 13, 2013 at 06:30:22PM +0200, Oleg Nesterov wrote:
> On 04/12, Robin Holt wrote:
> >
> > +void migrate_to_boot_cpu(void)
> > +{
> > +	/* The boot cpu is always logical cpu 0 */
> > +	int reboot_cpu_id = 0;
> > +
> > +	/* Make certain the cpu I'm about to reboot on is online */
> > +	if (!cpu_online(reboot_cpu_id))
> > +		reboot_cpu_id = smp_processor_id();
> > +
> > +	/* Make certain I only run on the appropriate processor */
> > +	set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
>
> This is only theoretical, but perhaps it makes sense to set
> PF_THREAD_BOUND before set_cpus_allowed_ptr()?  To prevent the
> race with another thread doing sched_setaffinity().

I don't quite understand this comment.  We are migrating our own thread.
How does setting PF_THREAD_BOUND have any effect?

Robin
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
I have the patches sort-of finished.  The patch set starts by moving the
halt/shutdown/reboot functions over to a new kernel/reboot.c, next patch
gets a checkpatch.pl cleanup to work, third patch is essentially the below
patch against the new file, and the fourth patch introduces a kernel boot
parameter.

That said, I don't like them because of the 'stable' marking for these
patches.  I think I am going to submit them with the existing patch first
in the series, then introduce the kernel parameter, then move them to
kernel/reboot.c, and finally pass checkpatch.pl.

Does that sound alright?

Robin

On Mon, Apr 15, 2013 at 07:02:16AM -0500, Robin Holt wrote:
> On Mon, Apr 15, 2013 at 12:16:44PM +0200, Ingo Molnar wrote:
> >
> > * Robin Holt <h...@sgi.com> wrote:
> >
> > > From 1767003c943325e52ac78cac6fdbaf2ab63d Mon Sep 17 00:00:00 2001
> > > From: Robin Holt <h...@sgi.com>
> > > Date: Wed, 3 Apr 2013 13:52:00 -0500
> > > Subject: [PATCH] Migrate shutdown/reboot to boot cpu.
> > >
> > > We recently noticed that reboot of a 1024 cpu machine takes approx 16
> > > minutes of just stopping the cpus.  The slowdown was tracked to
> > > commit f96972f.
> > >
> > > The current implementation does all the work of hot removing the cpus
> > > before halting the system.  We are switching to just migrating to the
> > > boot cpu and then continuing with shutdown/reboot.
> > >
> > > This also has the effect of not breaking x86's command line parameter
> > > for specifying the reboot cpu.  Note, this code was shamelessly
> > > copied from arch/x86/kernel/reboot.c with bits removed pertaining to
> > > the reboot_cpu command line parameter.
> > >
> > > Signed-off-by: Robin Holt <h...@sgi.com>
> > > To: Shawn Guo <shawn@linaro.org>
> > > To: Ingo Molnar <mi...@redhat.com>
> > > To: Russ Anderson <r...@sgi.com>
> > > Cc: Andrew Morton <a...@linux-foundation.org>
> > > Cc: "H. Peter Anvin" <h...@zytor.com>
> > > Cc: Lai Jiangshan <la...@cn.fujitsu.com>
> > > Cc: Linus Torvalds <torva...@linux-foundation.org>
> > > Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> > > Cc: Michel Lespinasse <wal...@google.com>
> > > Cc: Oleg Nesterov <o...@redhat.com>
> > > Cc: "Paul E. McKenney" <paul...@linux.vnet.ibm.com>
> > > Cc: Paul Mackerras <pau...@samba.org>
> > > Cc: Peter Zijlstra <pet...@infradead.org>
> > > Cc: Robin Holt <h...@sgi.com>
> > > Cc: "ru...@rustcorp.com.au" <ru...@rustcorp.com.au>
> > > Cc: Tejun Heo <t...@kernel.org>
> > > Cc: the arch/x86 maintainers <x...@kernel.org>
> > > Cc: Thomas Gleixner <t...@linutronix.de>
> > > Cc: <sta...@vger.kernel.org>
> > > ---
> > >  kernel/sys.c | 17 +++--
> > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/sys.c b/kernel/sys.c
> > > index 0da73cf..4d1047d 100644
> > > --- a/kernel/sys.c
> > > +++ b/kernel/sys.c
> > > @@ -357,6 +357,19 @@ int unregister_reboot_notifier(struct notifier_block *nb)
> > >  }
> > >  EXPORT_SYMBOL(unregister_reboot_notifier);
> > >
> > > +void migrate_to_boot_cpu(void)
> > > +{
> > > +	/* The boot cpu is always logical cpu 0 */
> > > +	int reboot_cpu_id = 0;
> > > +
> > > +	/* Make certain the cpu I'm about to reboot on is online */
> > > +	if (!cpu_online(reboot_cpu_id))
> > > +		reboot_cpu_id = smp_processor_id();
> > > +
> > > +	/* Make certain I only run on the appropriate processor */
> > > +	set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
> > > +}
> >
> > I guess you suspect what I'm going to suggest next? :-)
> >
> > While I think something like this commit would be acceptable as a
> > minimal regression fix, it would be really lovely to add a second
> > patch as well, which removes the same code from
> > arch/x86/kernel/reboot.c:native_machine_shutdown() and merges it with
> > the kernel/sys.c version.  That way all platforms would gain a
> > reboot_cpu command line, and we'd have less code duplication as well.
> > Win-win.
> >
> > ( While at it, it might make sense to move the reboot/shutdown related
> > bits from kernel/sys.c to a new kernel/shutdown.c or kernel/reboot.c
> > file or so. )
>
> That all sounds good.  I will work on that this morning.
>
> Robin
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On Mon, Apr 15, 2013 at 12:16:44PM +0200, Ingo Molnar wrote:
>
> * Robin Holt <h...@sgi.com> wrote:
>
> > From 1767003c943325e52ac78cac6fdbaf2ab63d Mon Sep 17 00:00:00 2001
> > From: Robin Holt <h...@sgi.com>
> > Date: Wed, 3 Apr 2013 13:52:00 -0500
> > Subject: [PATCH] Migrate shutdown/reboot to boot cpu.
> >
> > We recently noticed that reboot of a 1024 cpu machine takes approx 16
> > minutes of just stopping the cpus.  The slowdown was tracked to commit
> > f96972f.
> >
> > The current implementation does all the work of hot removing the cpus
> > before halting the system.  We are switching to just migrating to the
> > boot cpu and then continuing with shutdown/reboot.
> >
> > This also has the effect of not breaking x86's command line parameter
> > for specifying the reboot cpu.  Note, this code was shamelessly copied
> > from arch/x86/kernel/reboot.c with bits removed pertaining to the
> > reboot_cpu command line parameter.
> >
> > Signed-off-by: Robin Holt <h...@sgi.com>
> > To: Shawn Guo <shawn@linaro.org>
> > To: Ingo Molnar <mi...@redhat.com>
> > To: Russ Anderson <r...@sgi.com>
> > Cc: Andrew Morton <a...@linux-foundation.org>
> > Cc: "H. Peter Anvin" <h...@zytor.com>
> > Cc: Lai Jiangshan <la...@cn.fujitsu.com>
> > Cc: Linus Torvalds <torva...@linux-foundation.org>
> > Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> > Cc: Michel Lespinasse <wal...@google.com>
> > Cc: Oleg Nesterov <o...@redhat.com>
> > Cc: "Paul E. McKenney" <paul...@linux.vnet.ibm.com>
> > Cc: Paul Mackerras <pau...@samba.org>
> > Cc: Peter Zijlstra <pet...@infradead.org>
> > Cc: Robin Holt <h...@sgi.com>
> > Cc: "ru...@rustcorp.com.au" <ru...@rustcorp.com.au>
> > Cc: Tejun Heo <t...@kernel.org>
> > Cc: the arch/x86 maintainers <x...@kernel.org>
> > Cc: Thomas Gleixner <t...@linutronix.de>
> > Cc: <sta...@vger.kernel.org>
> > ---
> >  kernel/sys.c | 17 +++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 0da73cf..4d1047d 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -357,6 +357,19 @@ int unregister_reboot_notifier(struct notifier_block *nb)
> >  }
> >  EXPORT_SYMBOL(unregister_reboot_notifier);
> >
> > +void migrate_to_boot_cpu(void)
> > +{
> > +	/* The boot cpu is always logical cpu 0 */
> > +	int reboot_cpu_id = 0;
> > +
> > +	/* Make certain the cpu I'm about to reboot on is online */
> > +	if (!cpu_online(reboot_cpu_id))
> > +		reboot_cpu_id = smp_processor_id();
> > +
> > +	/* Make certain I only run on the appropriate processor */
> > +	set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
> > +}
>
> I guess you suspect what I'm going to suggest next? :-)
>
> While I think something like this commit would be acceptable as a
> minimal regression fix, it would be really lovely to add a second patch
> as well, which removes the same code from
> arch/x86/kernel/reboot.c:native_machine_shutdown() and merges it with
> the kernel/sys.c version.  That way all platforms would gain a
> reboot_cpu command line, and we'd have less code duplication as well.
> Win-win.
>
> ( While at it, it might make sense to move the reboot/shutdown related
> bits from kernel/sys.c to a new kernel/shutdown.c or kernel/reboot.c
> file or so. )

That all sounds good.  I will work on that this morning.

Robin
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
* Robin Holt <h...@sgi.com> wrote:

> From 1767003c943325e52ac78cac6fdbaf2ab63d Mon Sep 17 00:00:00 2001
> From: Robin Holt <h...@sgi.com>
> Date: Wed, 3 Apr 2013 13:52:00 -0500
> Subject: [PATCH] Migrate shutdown/reboot to boot cpu.
>
> We recently noticed that reboot of a 1024 cpu machine takes approx 16
> minutes of just stopping the cpus.  The slowdown was tracked to commit
> f96972f.
>
> The current implementation does all the work of hot removing the cpus
> before halting the system.  We are switching to just migrating to the
> boot cpu and then continuing with shutdown/reboot.
>
> This also has the effect of not breaking x86's command line parameter
> for specifying the reboot cpu.  Note, this code was shamelessly copied
> from arch/x86/kernel/reboot.c with bits removed pertaining to the
> reboot_cpu command line parameter.
>
> Signed-off-by: Robin Holt <h...@sgi.com>
> To: Shawn Guo <shawn@linaro.org>
> To: Ingo Molnar <mi...@redhat.com>
> To: Russ Anderson <r...@sgi.com>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: Lai Jiangshan <la...@cn.fujitsu.com>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Cc: Michel Lespinasse <wal...@google.com>
> Cc: Oleg Nesterov <o...@redhat.com>
> Cc: "Paul E. McKenney" <paul...@linux.vnet.ibm.com>
> Cc: Paul Mackerras <pau...@samba.org>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Robin Holt <h...@sgi.com>
> Cc: "ru...@rustcorp.com.au" <ru...@rustcorp.com.au>
> Cc: Tejun Heo <t...@kernel.org>
> Cc: the arch/x86 maintainers <x...@kernel.org>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: <sta...@vger.kernel.org>
> ---
>  kernel/sys.c | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 0da73cf..4d1047d 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -357,6 +357,19 @@ int unregister_reboot_notifier(struct notifier_block *nb)
>  }
>  EXPORT_SYMBOL(unregister_reboot_notifier);
>
> +void migrate_to_boot_cpu(void)
> +{
> +	/* The boot cpu is always logical cpu 0 */
> +	int reboot_cpu_id = 0;
> +
> +	/* Make certain the cpu I'm about to reboot on is online */
> +	if (!cpu_online(reboot_cpu_id))
> +		reboot_cpu_id = smp_processor_id();
> +
> +	/* Make certain I only run on the appropriate processor */
> +	set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
> +}

I guess you suspect what I'm going to suggest next? :-)

While I think something like this commit would be acceptable as a minimal
regression fix, it would be really lovely to add a second patch as well,
which removes the same code from
arch/x86/kernel/reboot.c:native_machine_shutdown() and merges it with the
kernel/sys.c version.  That way all platforms would gain a reboot_cpu
command line, and we'd have less code duplication as well.  Win-win.

( While at it, it might make sense to move the reboot/shutdown related
bits from kernel/sys.c to a new kernel/shutdown.c or kernel/reboot.c file
or so. )

Thanks,

	Ingo
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On 04/12/2013 03:01 PM, Robin Holt wrote:
> kernel/sys.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 0da73cf..4d1047d 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -357,6 +357,19 @@ int unregister_reboot_notifier(struct notifier_block *nb)
>  }
>  EXPORT_SYMBOL(unregister_reboot_notifier);
>
> +void migrate_to_boot_cpu(void)
> +{
> +	/* The boot cpu is always logical cpu 0 */
> +	int reboot_cpu_id = 0;
> +
> +	/* Make certain the cpu I'm about to reboot on is online */
> +	if (!cpu_online(reboot_cpu_id))
> +		reboot_cpu_id = smp_processor_id();
> +

If CPU 0 is offline, there is no point in binding, right?

[Fenghua (in CC) added the support to offline CPU0 on x86 Intel platforms.
So it's possible that CPU0 is offline when you try a reboot.]

> +	/* Make certain I only run on the appropriate processor */
> +	set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
> +}
> +
>  /**
>   * kernel_restart - reboot the system
>   * @cmd: pointer to buffer containing command to execute for restart
> @@ -368,7 +381,7 @@ EXPORT_SYMBOL(unregister_reboot_notifier);
>  void kernel_restart(char *cmd)
>  {
>  	kernel_restart_prepare(cmd);
> -	disable_nonboot_cpus();
> +	migrate_to_boot_cpu();
>  	syscore_shutdown();
>  	if (!cmd)
>  		printk(KERN_EMERG "Restarting system.\n");
> @@ -414,7 +427,7 @@ void kernel_power_off(void)
>  	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
>  	if (pm_power_off_prepare)
>  		pm_power_off_prepare();
> -	disable_nonboot_cpus();
> +	migrate_to_boot_cpu();

Okay, so you are touching poweroff also. Restart was only recently altered
by Shawn, so we can assume that his fix was necessary only to his platform.
However, for poweroff, I see the commit below in the git log, which added
the disable_nonboot_cpus() call.

commit 4047727e5ae33f9b8d2b7766d1994ea6e5ec2991
Author: Mark Lord
Date:   Mon Oct 1 01:20:10 2007 -0700

    Fix SMP poweroff hangs

It's an old commit, so perhaps the issue no longer holds good, but I
thought I should bring this to notice, just in case.

>  	syscore_shutdown();
>  	printk(KERN_EMERG "Power down.\n");
>  	kmsg_dump(KMSG_DUMP_POWEROFF);

Regards,
Srivatsa S. Bhat
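On the CPU0-can-be-offline point, one way to keep the binding meaningful
is to fall back to the lowest-numbered online CPU rather than whatever CPU
the task happens to be running on. The variant below is purely
illustrative (it is not what was posted in this thread):

```c
/* Sketch: if the boot cpu has been offlined (possible on x86 once CPU0
 * hotplug is supported), bind to the first online cpu instead of the
 * current one, so the chosen reboot cpu stays deterministic. */
void migrate_to_boot_cpu(void)
{
	int cpu = 0;	/* the boot cpu, when it is still online */

	if (!cpu_online(cpu))
		cpu = cpumask_first(cpu_online_mask);

	set_cpus_allowed_ptr(current, cpumask_of(cpu));
}
```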
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On 04/12, Robin Holt wrote:
>
> +void migrate_to_boot_cpu(void)
> +{
> +	/* The boot cpu is always logical cpu 0 */
> +	int reboot_cpu_id = 0;
> +
> +	/* Make certain the cpu I'm about to reboot on is online */
> +	if (!cpu_online(reboot_cpu_id))
> +		reboot_cpu_id = smp_processor_id();
> +
> +	/* Make certain I only run on the appropriate processor */
> +	set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));

This is only theoretical, but perhaps it makes sense to set
PF_THREAD_BOUND before set_cpus_allowed_ptr() ? To prevent the race with
another thread doing sched_setaffinity().

Oleg.
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
Meant to send this to Shawn. Too early in the morning.

Robin

On Fri, Apr 12, 2013 at 04:31:49AM -0500, Robin Holt wrote:
> On Fri, Apr 12, 2013 at 11:39:51AM +0530, Srivatsa S. Bhat wrote:
> > On 04/12/2013 11:07 AM, Ingo Molnar wrote:
> > >
> > > * Robin Holt wrote:
> > >
> > >> For the v3.9 release, can we consider my awful patch?
> > >
> > > How about trying what I suggested, to make reboot affine to the boot
> > > CPU explicitly, not by shutting down all the other CPUs, but by
> > > set_cpus_allowed() or so?
> >
> > I agree, that sounds like the right thing to do for 3.9. Of course, it
> > would be nice if Shawn could verify that doing that doesn't break his
> > platform due to some unknown corner case.
> >
> > > That should solve the regression, without the ugly special-casing -
> > > while giving time to address the hot-unplug performance bottleneck.
> > >
> > > Once that is done disable_nonboot_cpus() can be used again for reboot.
> > >
> > > (But no strong feelings either way - both solutions are a workaround
> > > in a sense.)
>
> From 1767003c943325e52ac78cac6fdbaf2ab63d Mon Sep 17 00:00:00 2001
> From: Robin Holt
> Date: Wed, 3 Apr 2013 13:52:00 -0500
> Subject: [PATCH] Migrate shutdown/reboot to boot cpu.
>
> We recently noticed that reboot of a 1024 cpu machine takes approx 16
> minutes of just stopping the cpus. The slowdown was tracked to commit
> f96972f.
>
> The current implementation does all the work of hot removing the cpus
> before halting the system. We are switching to just migrating to the
> boot cpu and then continuing with shutdown/reboot.
>
> This also has the effect of not breaking x86's command line parameter
> for specifying the reboot cpu. Note, this code was shamelessly copied
> from arch/x86/kernel/reboot.c with bits removed pertaining to the
> reboot_cpu command line parameter.
>
> Signed-off-by: Robin Holt
> To: Shawn Guo
> To: Ingo Molnar
> To: Russ Anderson
> Cc: Andrew Morton
> Cc: "H. Peter Anvin"
> Cc: Lai Jiangshan
> Cc: Linus Torvalds
> Cc: Linux Kernel Mailing List
> Cc: Michel Lespinasse
> Cc: Oleg Nesterov
> Cc: "Paul E. McKenney"
> Cc: Paul Mackerras
> Cc: Peter Zijlstra
> Cc: Robin Holt
> Cc: "ru...@rustcorp.com.au"
> Cc: Tejun Heo
> Cc: the arch/x86 maintainers
> Cc: Thomas Gleixner
> Cc:
> ---
> kernel/sys.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 0da73cf..4d1047d 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -357,6 +357,19 @@ int unregister_reboot_notifier(struct notifier_block *nb)
>  }
>  EXPORT_SYMBOL(unregister_reboot_notifier);
>
> +void migrate_to_boot_cpu(void)
> +{
> +	/* The boot cpu is always logical cpu 0 */
> +	int reboot_cpu_id = 0;
> +
> +	/* Make certain the cpu I'm about to reboot on is online */
> +	if (!cpu_online(reboot_cpu_id))
> +		reboot_cpu_id = smp_processor_id();
> +
> +	/* Make certain I only run on the appropriate processor */
> +	set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
> +}
> +
>  /**
>   * kernel_restart - reboot the system
>   * @cmd: pointer to buffer containing command to execute for restart
> @@ -368,7 +381,7 @@ EXPORT_SYMBOL(unregister_reboot_notifier);
>  void kernel_restart(char *cmd)
>  {
>  	kernel_restart_prepare(cmd);
> -	disable_nonboot_cpus();
> +	migrate_to_boot_cpu();
>  	syscore_shutdown();
>  	if (!cmd)
>  		printk(KERN_EMERG "Restarting system.\n");
> @@ -414,7 +427,7 @@ void kernel_power_off(void)
>  	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
>  	if (pm_power_off_prepare)
>  		pm_power_off_prepare();
> -	disable_nonboot_cpus();
> +	migrate_to_boot_cpu();
>  	syscore_shutdown();
>  	printk(KERN_EMERG "Power down.\n");
>  	kmsg_dump(KMSG_DUMP_POWEROFF);
> --
> 1.8.1.2
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On Fri, Apr 12, 2013 at 11:39:51AM +0530, Srivatsa S. Bhat wrote:
> On 04/12/2013 11:07 AM, Ingo Molnar wrote:
> >
> > * Robin Holt wrote:
> >
> >> For the v3.9 release, can we consider my awful patch?
> >
> > How about trying what I suggested, to make reboot affine to the boot
> > CPU explicitly, not by shutting down all the other CPUs, but by
> > set_cpus_allowed() or so?
>
> I agree, that sounds like the right thing to do for 3.9. Of course, it
> would be nice if Shawn could verify that doing that doesn't break his
> platform due to some unknown corner case.
>
> > That should solve the regression, without the ugly special-casing -
> > while giving time to address the hot-unplug performance bottleneck.
> >
> > Once that is done disable_nonboot_cpus() can be used again for reboot.
> >
> > (But no strong feelings either way - both solutions are a workaround
> > in a sense.)

From 1767003c943325e52ac78cac6fdbaf2ab63d Mon Sep 17 00:00:00 2001
From: Robin Holt
Date: Wed, 3 Apr 2013 13:52:00 -0500
Subject: [PATCH] Migrate shutdown/reboot to boot cpu.

We recently noticed that reboot of a 1024 cpu machine takes approx 16
minutes of just stopping the cpus. The slowdown was tracked to commit
f96972f.

The current implementation does all the work of hot removing the cpus
before halting the system. We are switching to just migrating to the
boot cpu and then continuing with shutdown/reboot.

This also has the effect of not breaking x86's command line parameter
for specifying the reboot cpu. Note, this code was shamelessly copied
from arch/x86/kernel/reboot.c with bits removed pertaining to the
reboot_cpu command line parameter.

Signed-off-by: Robin Holt
To: Shawn Guo
To: Ingo Molnar
To: Russ Anderson
Cc: Andrew Morton
Cc: "H. Peter Anvin"
Cc: Lai Jiangshan
Cc: Linus Torvalds
Cc: Linux Kernel Mailing List
Cc: Michel Lespinasse
Cc: Oleg Nesterov
Cc: "Paul E. McKenney"
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Robin Holt
Cc: "ru...@rustcorp.com.au"
Cc: Tejun Heo
Cc: the arch/x86 maintainers
Cc: Thomas Gleixner
Cc:
---
 kernel/sys.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 0da73cf..4d1047d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -357,6 +357,19 @@ int unregister_reboot_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL(unregister_reboot_notifier);
 
+void migrate_to_boot_cpu(void)
+{
+	/* The boot cpu is always logical cpu 0 */
+	int reboot_cpu_id = 0;
+
+	/* Make certain the cpu I'm about to reboot on is online */
+	if (!cpu_online(reboot_cpu_id))
+		reboot_cpu_id = smp_processor_id();
+
+	/* Make certain I only run on the appropriate processor */
+	set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
+}
+
 /**
  * kernel_restart - reboot the system
  * @cmd: pointer to buffer containing command to execute for restart
@@ -368,7 +381,7 @@ EXPORT_SYMBOL(unregister_reboot_notifier);
 void kernel_restart(char *cmd)
 {
 	kernel_restart_prepare(cmd);
-	disable_nonboot_cpus();
+	migrate_to_boot_cpu();
 	syscore_shutdown();
 	if (!cmd)
 		printk(KERN_EMERG "Restarting system.\n");
@@ -414,7 +427,7 @@ void kernel_power_off(void)
 	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
 	if (pm_power_off_prepare)
 		pm_power_off_prepare();
-	disable_nonboot_cpus();
+	migrate_to_boot_cpu();
 	syscore_shutdown();
 	printk(KERN_EMERG "Power down.\n");
 	kmsg_dump(KMSG_DUMP_POWEROFF);
--
1.8.1.2
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On 04/12/2013 11:07 AM, Ingo Molnar wrote:
>
> * Robin Holt wrote:
>
>> For the v3.9 release, can we consider my awful patch?
>
> How about trying what I suggested, to make reboot affine to the boot CPU
> explicitly, not by shutting down all the other CPUs, but by
> set_cpus_allowed() or so?
>

I agree, that sounds like the right thing to do for 3.9. Of course, it
would be nice if Shawn could verify that doing that doesn't break his
platform due to some unknown corner case.

> That should solve the regression, without the ugly special-casing -
> while giving time to address the hot-unplug performance bottleneck.
>
> Once that is done disable_nonboot_cpus() can be used again for reboot.
>
> (But no strong feelings either way - both solutions are a workaround in
> a sense.)
>

Regards,
Srivatsa S. Bhat
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
* Robin Holt wrote:

> For the v3.9 release, can we consider my awful patch?

How about trying what I suggested, to make reboot affine to the boot CPU
explicitly, not by shutting down all the other CPUs, but by
set_cpus_allowed() or so?

That should solve the regression, without the ugly special-casing - while
giving time to address the hot-unplug performance bottleneck.

Once that is done disable_nonboot_cpus() can be used again for reboot.

(But no strong feelings either way - both solutions are a workaround in a
sense.)

Thanks,

	Ingo
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On Thu, Apr 11, 2013 at 03:08:20PM -0500, Russ Anderson wrote:
> On Thu, Apr 11, 2013 at 08:15:27PM +0530, Srivatsa S. Bhat wrote:
> > On 04/11/2013 07:53 PM, Russ Anderson wrote:
> > > On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
> > > >
> > > > One more thing we have to note is that, there are 4 notifiers for
> > > > taking a CPU offline:
> > > >
> > > > CPU_DOWN_PREPARE
> > > > CPU_DYING
> > > > CPU_DEAD
> > > > CPU_POST_DEAD
> > > >
> > > > The first can be run in parallel as mentioned above. The second is
> > > > run in parallel in the stop_machine() phase as shown in Russ' patch.
> > > > But the third and fourth set of notifications all end up running
> > > > only on CPU0, which will again slow down things.
> > >
> > > In my testing the third and fourth set were a small part of the
> > > overall time. Less than 10%, with cpu notifiers 90+% of the time.
> >
> > *All* of them are cpu notifiers! All of them invoke __cpu_notify()
> > internally. So how did you differentiate between them and find out
> > that the third and fourth sets take less time?
>
> I reran a test on a 1024 cpu system, using my test patch to only call
> __stop_machine() once. Added printks to show the kernel timestamp at
> various points.
>
> When calling disable_nonboot_cpus() and enable_nonboot_cpus() just after
> booting the system:
> The loop calling __cpu_notify(CPU_DOWN_PREPARE) took 376.6 seconds.
> The loop calling cpu_notify_nofail(CPU_DEAD) took 8.1 seconds.
>
> My guess is that notifiers do more work in the CPU_DOWN_PREPARE case.
>
> I also added a loop calling a new notifier (CPU_TEST) which none of the
> notifiers would recognize, to measure the time it took to spin through
> the call chain without the notifiers doing any work. It took 0.0067
> seconds.
>
> On the actual reboot, as the system was shutting down:
> The loop calling __cpu_notify(CPU_DOWN_PREPARE) took 333.8 seconds.
> The loop calling cpu_notify_nofail(CPU_DEAD) took 2.7 seconds.

How about taking the notifier_call_chain function, copying it to
kernel/sys.c, and timing each notifier_call() callout individually?

Robin
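A minimal sketch of that per-callback instrumentation might look like the
following. This is illustrative only: the function name timed_call_chain
and the pr_info() format are invented here, and it mirrors the shape of
the upstream notifier_call_chain() walk rather than reproducing it
exactly.

```c
/* Sketch: walk a notifier chain and report the time spent in each
 * callback, to find the heavy hitter in CPU_DOWN_PREPARE processing. */
static int timed_call_chain(struct notifier_block *nb,
			    unsigned long val, void *v)
{
	int ret = NOTIFY_DONE;

	while (nb) {
		ktime_t t0 = ktime_get();

		ret = nb->notifier_call(nb, val, v);

		/* %ps prints the symbol name of the callback */
		pr_info("notifier %ps took %lld ns\n",
			nb->notifier_call,
			ktime_to_ns(ktime_sub(ktime_get(), t0)));

		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
			break;
		nb = rcu_dereference_raw(nb->next);
	}
	return ret;
}
```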
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On 04/12/2013 01:38 AM, Russ Anderson wrote:
> On Thu, Apr 11, 2013 at 08:15:27PM +0530, Srivatsa S. Bhat wrote:
>> On 04/11/2013 07:53 PM, Russ Anderson wrote:
>>> On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
>>>> One more thing we have to note is that, there are 4 notifiers for
>>>> taking a CPU offline:
>>>>
>>>> CPU_DOWN_PREPARE
>>>> CPU_DYING
>>>> CPU_DEAD
>>>> CPU_POST_DEAD
>>>>
>>>> The first can be run in parallel as mentioned above. The second is
>>>> run in parallel in the stop_machine() phase as shown in Russ' patch.
>>>> But the third and fourth set of notifications all end up running only
>>>> on CPU0, which will again slow down things.
>>>
>>> In my testing the third and fourth set were a small part of the
>>> overall time. Less than 10%, with cpu notifiers 90+% of the time.
>>
>> *All* of them are cpu notifiers! All of them invoke __cpu_notify()
>> internally. So how did you differentiate between them and find out that
>> the third and fourth sets take less time?
>
> I reran a test on a 1024 cpu system, using my test patch to only call
> __stop_machine() once. Added printks to show the kernel timestamp at
> various points.
>
> When calling disable_nonboot_cpus() and enable_nonboot_cpus() just after
> booting the system:
> The loop calling __cpu_notify(CPU_DOWN_PREPARE) took 376.6 seconds.
> The loop calling cpu_notify_nofail(CPU_DEAD) took 8.1 seconds.
>
> My guess is that notifiers do more work in the CPU_DOWN_PREPARE case.
>
> I also added a loop calling a new notifier (CPU_TEST) which none of the
> notifiers would recognize, to measure the time it took to spin through
> the call chain without the notifiers doing any work. It took 0.0067
> seconds.
>
> On the actual reboot, as the system was shutting down:
> The loop calling __cpu_notify(CPU_DOWN_PREPARE) took 333.8 seconds.
> The loop calling cpu_notify_nofail(CPU_DEAD) took 2.7 seconds.
>
> I don't know how many notifiers are on the chain, or if there is one
> heavy hitter accounting for much of the time in the CPU_DOWN_PREPARE
> case.
>
> FWIW, the overall cpu stop times are somewhat longer than what I
> measured before. Not sure if the difference is due to changes in my test
> patch, other kernel changes pulled in, or some difference on the test
> system.
>

Thanks a lot for reporting the time taken at each stage. It's extremely
useful.

So we can drop the idea of taking CPUs down in multiple rounds like 512,
256 etc. And, like you mentioned earlier, just running the
CPU_DOWN_PREPARE notifiers in parallel should give us all the performance
improvement.

Or perhaps, we can instrument the code in kernel/notifier.c
(notifier_call_chain) to find out if there is a rogue notifier which
contributes most to the ~300 seconds.

Regards,
Srivatsa S. Bhat
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On Thu, Apr 11, 2013 at 08:15:27PM +0530, Srivatsa S. Bhat wrote:
> On 04/11/2013 07:53 PM, Russ Anderson wrote:
> > On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
> >>
> >> One more thing we have to note is that, there are 4 notifiers for
> >> taking a CPU offline:
> >>
> >> CPU_DOWN_PREPARE
> >> CPU_DYING
> >> CPU_DEAD
> >> CPU_POST_DEAD
> >>
> >> The first can be run in parallel as mentioned above. The second is run
> >> in parallel in the stop_machine() phase as shown in Russ' patch. But
> >> the third and fourth set of notifications all end up running only on
> >> CPU0, which will again slow down things.
> >
> > In my testing the third and fourth set were a small part of the overall
> > time. Less than 10%, with cpu notifiers 90+% of the time.
>
> *All* of them are cpu notifiers! All of them invoke __cpu_notify()
> internally. So how did you differentiate between them and find out that
> the third and fourth sets take less time?

I reran a test on a 1024 cpu system, using my test patch to only call
__stop_machine() once. Added printks to show the kernel timestamp at
various points.

When calling disable_nonboot_cpus() and enable_nonboot_cpus() just after
booting the system:
The loop calling __cpu_notify(CPU_DOWN_PREPARE) took 376.6 seconds.
The loop calling cpu_notify_nofail(CPU_DEAD) took 8.1 seconds.

My guess is that notifiers do more work in the CPU_DOWN_PREPARE case.

I also added a loop calling a new notifier (CPU_TEST) which none of the
notifiers would recognize, to measure the time it took to spin through
the call chain without the notifiers doing any work. It took 0.0067
seconds.

On the actual reboot, as the system was shutting down:
The loop calling __cpu_notify(CPU_DOWN_PREPARE) took 333.8 seconds.
The loop calling cpu_notify_nofail(CPU_DEAD) took 2.7 seconds.

I don't know how many notifiers are on the chain, or if there is one
heavy hitter accounting for much of the time in the CPU_DOWN_PREPARE
case.

FWIW, the overall cpu stop times are somewhat longer than what I measured
before. Not sure if the difference is due to changes in my test patch,
other kernel changes pulled in, or some difference on the test system.

> > So you may not need the added complexity, or at least fix the cpu
> > notifier part first.
>
> To make the 3rd and 4th run fast, the only thing we need to do is take
> CPUs offline in smaller steps, like 512, 256 etc. It doesn't add any
> extra complexity over and above what is necessary to make the cpu
> notifiers run in parallel in the first place.
>
> Regards,
> Srivatsa S. Bhat

--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc          r...@sgi.com
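Russ's overhead probe could be sketched roughly as below. This is
hypothetical: CPU_TEST is an invented event value, and since
__cpu_notify() is static to kernel/cpu.c in this era, such a probe would
have to live there (or use an exported wrapper).

```c
/* Sketch: send an event no notifier recognizes (CPU_TEST is a made-up
 * value), so every callback falls through immediately; the elapsed time
 * is then pure call-chain overhead, separate from real notifier work. */
#define CPU_TEST 0x00ff		/* hypothetical, matched by no notifier */

static void measure_chain_overhead(void)
{
	unsigned int cpu;
	ktime_t t0 = ktime_get();

	for_each_online_cpu(cpu) {
		void *hcpu = (void *)(long)cpu;

		__cpu_notify(CPU_TEST, hcpu, -1, NULL);
	}

	pr_info("bare chain walk over all cpus: %lld ns\n",
		ktime_to_ns(ktime_sub(ktime_get(), t0)));
}
```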
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On 04/11/2013 07:53 PM, Russ Anderson wrote:
> On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
>> On 04/11/2013 11:01 AM, Paul Mackerras wrote:
>>> On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
>>>> The optimal solution would be to just speed up the
>>>> disable_nonboot_cpus() code so much that it isn't an issue. That would
>>>> be good for suspending too, although I guess suspend isn't a big issue
>>>> if you have a thousand CPU's.
>>>>
>>>> Has anybody checked whether we could do the cpu_down() on non-boot
>>>> CPU's in parallel? Right now we serialize the thing completely, with
>>>
>>> I thought Srivatsa S. Bhat had a patchset that did exactly that.
>>> Srivatsa?
>>
>> Thanks for the CC, Paul! Adding some more people to CC.
>>
>> Actually, my patchset was about removing stop_machine() from the CPU
>> offline path.
>> http://lwn.net/Articles/538819/
>
> I certainly agree with the intent.

Thank you!

>> And here is the performance improvement I had measured in the version
>> prior to that:
>> http://article.gmane.org/gmane.linux.kernel/1435249
>>
>> I'm planning to revive this patchset after the 3.10 merge window closes,
>> because it depends on doing a tree-wide sweep, and I think it's a little
>> late to do it in time for the upcoming 3.10 merge window itself.
>>
>> Anyway, that's about removing stop_machine from CPU hotplug.
>>
>> Coming to bulk CPU hotplug, yes, I had ideas similar to what Russ
>> suggested. But I believe we can do more than that.
>>
>> As Russ pointed out, the notifiers are not thread-safe, so calling them
>> in parallel with different CPUs as arguments isn't going to work.
>>
>> So, first, we can convert all the CPU hotplug notifiers to take a
>> cpumask instead of a single CPU. So assuming that there are 'n'
>> notifiers in total, the number of function calls would become n,
>> instead of n*1024.
>>
>> But that itself most likely won't give us much benefit over the for-loop
>> that Russ has done in his patch, because it'll simply do longer
>> processing in each of those 'n' notifiers, by iterating over the cpumask
>> inside each notifier.
>
> As an alternative, how about each cpu having its own notifier list?
> Then one task per cpu can spin through that cpu's notifier list,
> allowing them to run in parallel.
>
> I don't know if that would be a faster solution than adding a cpumask
> to the notifiers, but my guess is it may.

That might not work out well, because those notifiers will have to lock
against each other. That is, notifier callback A cannot run as A(cpuX)
and A(cpuY) in parallel; they will have to serialize themselves, which
will make the whole effort useless. But, as I mentioned earlier, A(cpuX)
and B(cpuX) can run in parallel without additional serialization, if A
and B are completely different callbacks (i.e., belonging to different
subsystems).

>> Now comes the interesting thing:
>>
>> Consider a notifier chain that looks like this:
>> Priority 0: A->B->C->D
>>
>> We can't invoke, say, notifier callback A simultaneously on 2 CPUs with
>> 2 different hotcpus as argument. *However*, since A, B, C, D all (more
>> or less) belong to different subsystems, we can call A, B, C and D in
>> parallel on different CPUs. They won't even serialize amongst
>> themselves, because they take locks (if any) of different subsystems.
>> And since they are of the same priority, the ordering (A after B or
>> B after A) doesn't matter either.
>>
>> So with this, if we combine the idea I wrote above about giving a
>> cpumask to each of these notifiers to work with, we end up with this:
>>
>> CPU 0          CPU 1          CPU 2
>> A(cpumask)     B(cpumask)     C(cpumask)
>>
>> So, for example, the CPU_DOWN_PREPARE notification can be processed in
>> parallel on multiple CPUs at a time, for a given cpumask! That should
>> definitely give us a good speed-up.
>>
>> One more thing we have to note is that there are 4 notifiers for taking
>> a CPU offline:
>>
>> CPU_DOWN_PREPARE
>> CPU_DYING
>> CPU_DEAD
>> CPU_POST_DEAD
>>
>> The first can be run in parallel as mentioned above. The second is run
>> in parallel in the stop_machine() phase, as shown in Russ' patch. But
>> the third and fourth set of notifications all end up running only on
>> CPU0, which will again slow things down.
>
> In my testing the third and fourth set were a small part of the overall
> time. Less than 10%, with cpu notifiers 90+% of the time.

*All* of them are cpu notifiers! All of them invoke __cpu_notify()
internally. So how did you differentiate between them and find out that
the third and fourth sets take less time?

> So you may not need the added complexity, or at least fix the cpu
> notifier part first.

To make the 3rd and 4th run fast, the only thing we need to do is take
CPUs offline in smaller steps, like 512, 256 etc. It doesn't add any
extra complexity over and above what is necessary to make the cpu
notifiers run in parallel in the first place.

Regards,
Srivatsa S. Bhat
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
> On 04/11/2013 11:01 AM, Paul Mackerras wrote:
> > On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
> >> The optimal solution would be to just speed up the
> >> disable_nonboot_cpus() code so much that it isn't an issue. That would
> >> be good for suspending too, although I guess suspend isn't a big issue
> >> if you have a thousand CPU's.
> >>
> >> Has anybody checked whether we could do the cpu_down() on non-boot
> >> CPU's in parallel? Right now we serialize the thing completely, with
> >
> > I thought Srivatsa S. Bhat had a patchset that did exactly that.
> > Srivatsa?
>
> Thanks for the CC, Paul! Adding some more people to CC.
>
> Actually, my patchset was about removing stop_machine() from the CPU
> offline path.
> http://lwn.net/Articles/538819/

I certainly agree with the intent.

> And here is the performance improvement I had measured in the version
> prior to that:
> http://article.gmane.org/gmane.linux.kernel/1435249
>
> I'm planning to revive this patchset after the 3.10 merge window closes,
> because it depends on doing a tree-wide sweep, and I think it's a little
> late to do it in time for the upcoming 3.10 merge window itself.
>
> Anyway, that's about removing stop_machine from CPU hotplug.
>
> Coming to bulk CPU hotplug, yes, I had ideas similar to what Russ
> suggested. But I believe we can do more than that.
>
> As Russ pointed out, the notifiers are not thread-safe, so calling them
> in parallel with different CPUs as arguments isn't going to work.
>
> So, first, we can convert all the CPU hotplug notifiers to take a
> cpumask instead of a single CPU. So assuming that there are 'n'
> notifiers in total, the number of function calls would become n,
> instead of n*1024.
>
> But that itself most likely won't give us much benefit over the for-loop
> that Russ has done in his patch, because it'll simply do longer
> processing in each of those 'n' notifiers, by iterating over the cpumask
> inside each notifier.

As an alternative, how about each cpu having its own notifier list?
Then one task per cpu can spin through that cpu's notifier list,
allowing them to run in parallel.

I don't know if that would be a faster solution than adding a cpumask
to the notifiers, but my guess is it may.

> Now comes the interesting thing:
>
> Consider a notifier chain that looks like this:
> Priority 0: A->B->C->D
>
> We can't invoke, say, notifier callback A simultaneously on 2 CPUs with
> 2 different hotcpus as argument. *However*, since A, B, C, D all (more
> or less) belong to different subsystems, we can call A, B, C and D in
> parallel on different CPUs. They won't even serialize amongst
> themselves, because they take locks (if any) of different subsystems.
> And since they are of the same priority, the ordering (A after B or
> B after A) doesn't matter either.
>
> So with this, if we combine the idea I wrote above about giving a
> cpumask to each of these notifiers to work with, we end up with this:
>
> CPU 0          CPU 1          CPU 2
> A(cpumask)     B(cpumask)     C(cpumask)
>
> So, for example, the CPU_DOWN_PREPARE notification can be processed in
> parallel on multiple CPUs at a time, for a given cpumask! That should
> definitely give us a good speed-up.
>
> One more thing we have to note is that there are 4 notifiers for taking
> a CPU offline:
>
> CPU_DOWN_PREPARE
> CPU_DYING
> CPU_DEAD
> CPU_POST_DEAD
>
> The first can be run in parallel as mentioned above. The second is run
> in parallel in the stop_machine() phase, as shown in Russ' patch. But
> the third and fourth set of notifications all end up running only on
> CPU0, which will again slow things down.

In my testing the third and fourth set were a small part of the overall
time. Less than 10%, with cpu notifiers 90+% of the time. So you may
not need the added complexity, or at least fix the cpu notifier part
first.

> So I suggest taking down the 1024 CPUs in multiple phases, like a binary
> search. First, take 512 CPUs down, then 256 CPUs, then 128 CPUs etc. So
> at every bulk CPU hotplug, we have enough online CPUs to handle the
> notifier load, and that helps speed things up. Moreover, a handful of
> calls to stop_machine() is OK because stop_machine() takes progressively
> less and less time, since fewer CPUs are online on each iteration (and
> hence the synchronization overhead of the stop-machine phase shrinks).
>
> The only downside to this whole idea of running the notifiers of a given
> priority in parallel is error handling - if a notifier fails, it would
> be troublesome to roll back, I guess. But if we forget that for a
> moment, we can give this idea a try!

Yes.

> Regards,
> Srivatsa S. Bhat

--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc          r...@sgi.com
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
For the v3.9 release, can we consider my awful patch?

Thanks,
Robin

On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
> On 04/11/2013 11:01 AM, Paul Mackerras wrote:
> > On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
> >> The optimal solution would be to just speed up the
> >> disable_nonboot_cpus() code so much that it isn't an issue. That would
> >> be good for suspending too, although I guess suspend isn't a big issue
> >> if you have a thousand CPU's.
> >>
> >> Has anybody checked whether we could do the cpu_down() on non-boot
> >> CPU's in parallel? Right now we serialize the thing completely, with
> >
> > I thought Srivatsa S. Bhat had a patchset that did exactly that.
> > Srivatsa?
>
> Thanks for the CC, Paul! Adding some more people to CC.
>
> Actually, my patchset was about removing stop_machine() from the CPU
> offline path.
> http://lwn.net/Articles/538819/
>
> And here is the performance improvement I had measured in the version
> prior to that:
> http://article.gmane.org/gmane.linux.kernel/1435249
>
> I'm planning to revive this patchset after the 3.10 merge window closes,
> because it depends on doing a tree-wide sweep, and I think it's a little
> late to do it in time for the upcoming 3.10 merge window itself.
>
> Anyway, that's about removing stop_machine from CPU hotplug.
>
> Coming to bulk CPU hotplug, yes, I had ideas similar to what Russ
> suggested. But I believe we can do more than that.
>
> As Russ pointed out, the notifiers are not thread-safe, so calling them
> in parallel with different CPUs as arguments isn't going to work.
>
> So, first, we can convert all the CPU hotplug notifiers to take a
> cpumask instead of a single CPU. So assuming that there are 'n'
> notifiers in total, the number of function calls would become n,
> instead of n*1024.
>
> But that itself most likely won't give us much benefit over the for-loop
> that Russ has done in his patch, because it'll simply do longer
> processing in each of those 'n' notifiers, by iterating over the cpumask
> inside each notifier.
>
> Now comes the interesting thing:
>
> Consider a notifier chain that looks like this:
> Priority 0: A->B->C->D
>
> We can't invoke, say, notifier callback A simultaneously on 2 CPUs with
> 2 different hotcpus as argument. *However*, since A, B, C, D all (more
> or less) belong to different subsystems, we can call A, B, C and D in
> parallel on different CPUs. They won't even serialize amongst
> themselves, because they take locks (if any) of different subsystems.
> And since they are of the same priority, the ordering (A after B or
> B after A) doesn't matter either.
>
> So with this, if we combine the idea I wrote above about giving a
> cpumask to each of these notifiers to work with, we end up with this:
>
> CPU 0          CPU 1          CPU 2
> A(cpumask)     B(cpumask)     C(cpumask)
>
> So, for example, the CPU_DOWN_PREPARE notification can be processed in
> parallel on multiple CPUs at a time, for a given cpumask! That should
> definitely give us a good speed-up.
>
> One more thing we have to note is that there are 4 notifiers for taking
> a CPU offline:
>
> CPU_DOWN_PREPARE
> CPU_DYING
> CPU_DEAD
> CPU_POST_DEAD
>
> The first can be run in parallel as mentioned above. The second is run
> in parallel in the stop_machine() phase, as shown in Russ' patch. But
> the third and fourth set of notifications all end up running only on
> CPU0, which will again slow things down.
>
> So I suggest taking down the 1024 CPUs in multiple phases, like a binary
> search. First, take 512 CPUs down, then 256 CPUs, then 128 CPUs etc. So
> at every bulk CPU hotplug, we have enough online CPUs to handle the
> notifier load, and that helps speed things up. Moreover, a handful of
> calls to stop_machine() is OK because stop_machine() takes progressively
> less and less time, since fewer CPUs are online on each iteration (and
> hence the synchronization overhead of the stop-machine phase shrinks).
>
> The only downside to this whole idea of running the notifiers of a given
> priority in parallel is error handling - if a notifier fails, it would
> be troublesome to roll back, I guess. But if we forget that for a
> moment, we can give this idea a try!
>
> Regards,
> Srivatsa S. Bhat
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On 04/12/2013 01:38 AM, Russ Anderson wrote:
> On Thu, Apr 11, 2013 at 08:15:27PM +0530, Srivatsa S. Bhat wrote:
> > [...]
> > So how did you differentiate between them and find out that the
> > third and fourth sets take less time?
>
> I reran a test on a 1024 cpu system, using my test patch to only call
> __stop_machine() once. I added printks to show the kernel timestamp at
> various points.
>
> When calling disable_nonboot_cpus() and enable_nonboot_cpus() just
> after booting the system:
>
>   The loop calling __cpu_notify(CPU_DOWN_PREPARE) took 376.6 seconds.
>   The loop calling cpu_notify_nofail(CPU_DEAD) took 8.1 seconds.
>
> My guess is that notifiers do more work in the CPU_DOWN_PREPARE case.
>
> I also added a loop calling a new notifier (CPU_TEST) which none of
> the notifiers would recognize, to measure the time it took to spin
> through the call chain without the notifiers doing any work. It took
> 0.0067 seconds.
>
> On the actual reboot, as the system was shutting down:
>
>   The loop calling __cpu_notify(CPU_DOWN_PREPARE) took 333.8 seconds.
>   The loop calling cpu_notify_nofail(CPU_DEAD) took 2.7 seconds.
>
> I don't know how many notifiers are on the chain, or if there is one
> heavy hitter accounting for much of the time in the CPU_DOWN_PREPARE
> case.
>
> FWIW, the overall cpu stop times are somewhat longer than what I
> measured before.
> Not sure if the difference is due to changes in my test patch, other
> kernel changes pulled in, or some difference on the test system.

Thanks a lot for reporting the time taken at each stage. It's extremely
useful.

So we can drop the idea of taking CPUs down in multiple rounds like
512, 256 etc. And, as you mentioned, just running the CPU_DOWN_PREPARE
notifiers in parallel (as we discussed earlier) should give us all the
performance improvement.

Or perhaps we can instrument the code in kernel/notifier.c
(notifier_call_chain()) to find out if there is a rogue notifier that
contributes most of the ~300 seconds.

Regards,
Srivatsa S. Bhat
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
On Thu, Apr 11, 2013 at 03:08:20PM -0500, Russ Anderson wrote:
> [...]
> I reran a test on a 1024 cpu system, using my test patch to only call
> __stop_machine() once. I added printks to show the kernel timestamp at
> various points.
>
> When calling disable_nonboot_cpus() and enable_nonboot_cpus() just
> after booting the system:
>
>   The loop calling __cpu_notify(CPU_DOWN_PREPARE) took 376.6 seconds.
>   The loop calling cpu_notify_nofail(CPU_DEAD) took 8.1 seconds.
>
> On the actual reboot, as the system was shutting down:
>
>   The loop calling __cpu_notify(CPU_DOWN_PREPARE) took 333.8 seconds.
>   The loop calling cpu_notify_nofail(CPU_DEAD) took 2.7 seconds.
>
> I don't know how many notifiers are on the chain, or if there is one
> heavy hitter accounting for much of the time in the CPU_DOWN_PREPARE
> case.

How about taking the notifier_call_chain() function, copying it to
kernel/sys.c, and timing each notifier_call() callout individually?
Robin
Re: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)
* Robin Holt h...@sgi.com wrote:

> For the v3.9 release, can we consider my awful patch?

How about trying what I suggested: make reboot affine to the boot CPU
explicitly, not by shutting down all the other CPUs, but by
set_cpus_allowed() or so?

That should solve the regression without the ugly special-casing, while
giving time to address the hot-unplug performance bottleneck. Once that
is done, disable_nonboot_cpus() can be used again for reboot.

(But no strong feelings either way - both solutions are a workaround in
a sense.)

Thanks,

	Ingo