Re: [PATCH] x86: Add check for number of available vectors before CPU down [v7]
* Prarit Bhargava wrote:

> On 01/12/2014 04:56 AM, Ingo Molnar wrote:
> >
> > * Prarit Bhargava wrote:
> >
> >> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
> >>
> >> When a cpu is downed on a system, the irqs on the cpu are assigned to
> >> other cpus.  It is possible, however, that when a cpu is downed there
> >> aren't enough free vectors on the remaining cpus to account for the
> >> vectors from the cpu that is being downed.
> >>
> >> This results in an interesting "overflow" condition where irqs are
> >> "assigned" to a CPU but are not handled.
> >>
> >> For example, when downing cpus on a 1-64 logical processor system:
> >>
> >> <snip>
> >>
> >> [  232.021745] smpboot: CPU 61 is now offline
> >> [  238.480275] smpboot: CPU 62 is now offline
> >> [  245.991080] ------------[ cut here ]------------
> >> [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264
> >> dev_watchdog+0x246/0x250()
> >> [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
> >> [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt
> >> iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core
> >> lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi
> >> acpi_cpufreq isci libsas scsi_transport_sas
> >> [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
> >> [  246.044451] Hardware name: Intel Corporation S4600LH
> >> ../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
> >> [  246.057371]  0009 88081fa03d40 8164fbf6 88081fa0ee48
> >> [  246.065728]  88081fa03d90 88081fa03d80 81054ecc 88081fa13040
> >> [  246.074073]  88200cce 0040
> >> [  246.082430] Call Trace:
> >> [  246.085174]  <IRQ>  [<ffffffff8164fbf6>] dump_stack+0x46/0x58
> >> [  246.091633]  [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
> >> [  246.098352]  [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
> >> [  246.104786]  [<ffffffff815710d6>] dev_watchdog+0x246/0x250
> >> [  246.110923]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
> >> [  246.119097]  [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
> >> [  246.125224]  [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
> >> [  246.132137]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
> >> [  246.140308]  [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
> >> [  246.146933]  [<ffffffff81059a80>] __do_softirq+0xe0/0x220
> >> [  246.152976]  [<ffffffff8165fedc>] call_softirq+0x1c/0x30
> >> [  246.158920]  [<ffffffff810045f5>] do_softirq+0x55/0x90
> >> [  246.164670]  [<ffffffff81059d35>] irq_exit+0xa5/0xb0
> >> [  246.170227]  [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
> >> [  246.177324]  [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
> >> [  246.184041]  <EOI>  [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
> >> [  246.191559]  [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
> >> [  246.198374]  [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
> >> [  246.204900]  [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
> >> [  246.210846]  [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
> >> [  246.217371]  [<ffffffff81646b47>] rest_init+0x77/0x80
> >> [  246.223028]  [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
> >> [  246.229165]  [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
> >> [  246.235787]  [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
> >> [  246.242990]  [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
> >> [  246.249610] ---[ end trace fb74fdef54d79039 ]---
> >> [  246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
> >> [  246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
> >> Last login: Mon Nov 11 08:35:14 from 10.18.17.119
> >> [root@(none) ~]# [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
> >> [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow
> >> Control: RX/TX
> >> [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
> >> [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow
> >> Control: RX/TX
> >>
> >> (last lines keep repeating.  ixgbe driver is dead until module reload.)
> >>
> >> If the downed cpu has more vectors than are free on the remaining cpus
> >> on the system, it is possible that some vectors are "orphaned" even
> >> though they are assigned to a cpu.  In this case, since the ixgbe
> >> driver had a watchdog, the watchdog fired and notified that something
> >> was wrong.
> >>
> >> This patch adds a function, check_vectors(), that compares the number
> >> of vectors on the CPU going down to the number of vectors available on
> >> the rest of the system.  If there aren't enough vectors for the CPU to
> >> go down, an error is returned and propagated back to userspace.
> >>
> >> v2: Do not need to look at percpu irqs
> >> v3: Need to check affinity to prevent counting of MSIs in IOAPIC
> >>     Lowest Priority Mode
> >> v4: Additional changes suggested by Gong Chen.
> >> v5/v6/v7: Updated comment text
> >>
> >> Signed-off-by: Prarit Bhargava
> >> Reviewed-by: Gong Chen
> >> Cc: Andi Kleen
> >> Cc: Michel Lespinasse
> >> Cc: Seiji Aguchi
> >> Cc: Yang Zhang
> >> Cc: Paul Gortmaker
> >> Cc: Janet Morgan
> >> Cc: Tony Luck
> >> Cc: Ruiv Wang
> >> Cc: Gong Chen
> >> Cc: H. Peter Anvin
> >> Cc: Gong Chen
> >> Cc: x...@kernel.org
> >> ---
> >>  arch/x86/include/asm/irq.h |    1 +
> >>  arch/x86/kernel/irq.c      |   69
> >>  arch/x86/kernel/smpboot.c  |    6
> >>  3 files changed, 76 insertions(+)
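The availability check the changelog describes can be sketched outside the kernel as a toy model. This is a hypothetical illustration only: the names (`can_offline`, `vectors_in_use`) and the simple capacity arithmetic are assumptions for exposition, not the kernel's actual `check_vectors()` accounting, which walks `vector_irq[]` and excludes per-cpu and inactive irqs.

```python
# Toy model of the idea described above: before a CPU is downed, compare
# the vectors it must give up against the free vector slots left on the
# remaining CPUs.  All names and constants here are illustrative.
NR_VECTORS = 256
FIRST_EXTERNAL_VECTOR = 0x20

def can_offline(cpu, vectors_in_use):
    """vectors_in_use: dict mapping cpu id -> set of assigned vector numbers."""
    needed = len(vectors_in_use[cpu])              # vectors to rehome
    capacity = NR_VECTORS - FIRST_EXTERNAL_VECTOR  # slots per remaining CPU
    free = sum(capacity - len(v)
               for c, v in vectors_in_use.items() if c != cpu)
    return free >= needed

# CPU 1 holds nearly all of its vectors; CPU 0 cannot absorb them all,
# which is the "orphaned vector" situation the patch refuses to enter.
usage = {0: set(range(0x20, 0x30)), 1: set(range(0x20, 0x100))}
```

In the failing configuration above, `can_offline(1, usage)` is false, which in the patch corresponds to returning an error to userspace instead of completing the CPU down.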
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v6]
On 01/07/2014 12:54 PM, Luck, Tony wrote:
> > +	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> > +		irq = __this_cpu_read(vector_irq[vector]);
> > +		if (irq >= 0) {
> > +			desc = irq_to_desc(irq);
> > +			data = irq_desc_get_irq_data(desc);
> > +			cpumask_copy(&affinity_new, data->affinity);
> > +			cpu_clear(this_cpu, affinity_new);
> > +			/*
> > +			 * The check below determines if this irq requires
> > +			 * an empty vector_irq[irq] entry on an online
> > +			 * cpu.
> > +			 *
> > +			 * The code only counts active non-percpu irqs, and
> > +			 * those irqs that are not linked to on an online cpu.
> > +			 * The first test is trivial, the second is not.
> > +			 *
> > +			 * The second test takes into account the
> > +			 * account that a single irq may be mapped to multiple
> > +			 * cpu's vector_irq[] (for example IOAPIC cluster
> > +			 * mode).  In this case we have two
> > +			 * possibilities:
> > +			 *
> > +			 * 1) the resulting affinity mask is empty; that is
> > +			 * this the down'd cpu is the last cpu in the irq's
> > +			 * affinity mask, and
>
> Code says "||" - so I think comment should say "or".
>
> > +			 *
> > +			 * 2) the resulting affinity mask is no longer
> > +			 * a subset of the online cpus but the affinity
> > +			 * mask is not zero; that is the down'd cpu is the
> > +			 * last online cpu in a user set affinity mask.
> > +			 *
> > +			 * In both possibilities, we need to remap the irq
> > +			 * to a new vector_irq[].
> > +			 */
> > +			if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> > +			    (cpumask_empty(&affinity_new) ||
> > +			     !cpumask_subset(&affinity_new, &online_new)))
> > +				this_count++;
> > +		}
>
> That's an impressive 6:1 ratio of lines-of-comment to lines-of-code!

Heh -- I couldn't decide if I should keep it all together in one comment
or divide it up.  I guess it does look less scary if I divide it up.  So
how about (sorry for the cut-and-paste)

	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
		irq = __this_cpu_read(vector_irq[vector]);
		if (irq >= 0) {
			desc = irq_to_desc(irq);
			data = irq_desc_get_irq_data(desc);
			cpumask_copy(&affinity_new, data->affinity);
			cpu_clear(this_cpu, affinity_new);

			/* Do not count inactive or per-cpu irqs. */
			if (!irq_has_action(irq) || irqd_is_per_cpu(data))
				continue;

			/*
			 * A single irq may be mapped to multiple
			 * cpu's vector_irq[] (for example IOAPIC cluster
			 * mode).  In this case we have two
			 * possibilities:
			 *
			 * 1) the resulting affinity mask is empty; that is
			 * this the down'd cpu is the last cpu in the irq's
			 * affinity mask, or
			 *
			 * 2) the resulting affinity mask is no longer
			 * a subset of the online cpus but the affinity
			 * mask is not zero; that is the down'd cpu is the
			 * last online cpu in a user set affinity mask.
			 */
			if (cpumask_empty(&affinity_new) ||
			    !cpumask_subset(&affinity_new, &online_new))
				this_count++;
		}
	}

Everyone okay with that?

P.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
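The two-part test under discussion can be modeled with plain Python sets. This is an illustrative sketch only: `needs_new_home` and the set arithmetic are stand-ins for the kernel's cpumask operations, not the patch's code.

```python
# Set-based model of the test discussed above: after dropping the down'd
# CPU from an irq's affinity mask, the irq needs a new vector_irq[] home if
# the mask went empty (case 1) or if it is nonzero but no longer a subset
# of the still-online CPUs (case 2).
def needs_new_home(affinity, down_cpu, online):
    affinity_new = affinity - {down_cpu}   # cpumask minus the down'd cpu
    online_new = online - {down_cpu}       # online cpus after the down
    return not affinity_new or not affinity_new <= online_new
```

Case 2 is the subtle one from the review: a user-set affinity mask may still name CPUs, but if every remaining CPU in it is already offline, the irq must still be rehomed and therefore counted.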
RE: [PATCH] x86: Add check for number of available vectors before CPU down [v6]
+	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
+		irq = __this_cpu_read(vector_irq[vector]);
+		if (irq >= 0) {
+			desc = irq_to_desc(irq);
+			data = irq_desc_get_irq_data(desc);
+			cpumask_copy(&affinity_new, data->affinity);
+			cpu_clear(this_cpu, affinity_new);
+			/*
+			 * The check below determines if this irq requires
+			 * an empty vector_irq[irq] entry on an online
+			 * cpu.
+			 *
+			 * The code only counts active non-percpu irqs, and
+			 * those irqs that are not linked to on an online cpu.
+			 * The first test is trivial, the second is not.
+			 *
+			 * The second test takes into account the
+			 * account that a single irq may be mapped to multiple
+			 * cpu's vector_irq[] (for example IOAPIC cluster
+			 * mode).  In this case we have two
+			 * possibilities:
+			 *
+			 * 1) the resulting affinity mask is empty; that is
+			 * this the down'd cpu is the last cpu in the irq's
+			 * affinity mask, and

Code says "||" - so I think comment should say "or".

+			 *
+			 * 2) the resulting affinity mask is no longer
+			 * a subset of the online cpus but the affinity
+			 * mask is not zero; that is the down'd cpu is the
+			 * last online cpu in a user set affinity mask.
+			 *
+			 * In both possibilities, we need to remap the irq
+			 * to a new vector_irq[].
+			 */
+			if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
+			    (cpumask_empty(&affinity_new) ||
+			     !cpumask_subset(&affinity_new, &online_new)))
+				this_count++;
+		}

That's an impressive 6:1 ratio of lines-of-comment to lines-of-code!

Perhaps it would be less scary if the test were broken up into the
easy/obvious part and the one that has taken all these revisions to
work out?  E.g.

	/* no need to count inactive or per-cpu irqs */
	if (!irq_has_action(irq) || irqd_is_per_cpu(data))
		continue;

	/*
	 * We need to look for a new home for this irq if:
	 * ... paste in 1), 2) from above here ...
	 * (but s/and/or/ to match code)
	 */
	if (cpumask_empty(&affinity_new) ||
	    !cpumask_subset(&affinity_new, &online_new))
		this_count++;

-Tony
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v6]
On 01/07/2014 12:54 PM, Luck, Tony wrote:
> +	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> +		irq = __this_cpu_read(vector_irq[vector]);
> +		if (irq >= 0) {
> +			desc = irq_to_desc(irq);
> +			data = irq_desc_get_irq_data(desc);
> +			cpumask_copy(&affinity_new, data->affinity);
> +			cpu_clear(this_cpu, affinity_new);
> +			/*
> +			 * The check below determines if this irq requires
> +			 * an empty vector_irq[irq] entry on an online
> +			 * cpu.
> +			 *
> +			 * The code only counts active non-percpu irqs, and
> +			 * those irqs that are not linked to on an online cpu.
> +			 * The first test is trivial, the second is not.
> +			 *
> +			 * The second test takes into account the
> +			 * account that a single irq may be mapped to multiple
> +			 * cpu's vector_irq[] (for example IOAPIC cluster
> +			 * mode).  In this case we have two
> +			 * possibilities:
> +			 *
> +			 * 1) the resulting affinity mask is empty; that is
> +			 * this the down'd cpu is the last cpu in the irq's
> +			 * affinity mask, and
>
> Code says "||" - so I think comment should say "or".
>
> +			 *
> +			 * 2) the resulting affinity mask is no longer
> +			 * a subset of the online cpus but the affinity
> +			 * mask is not zero; that is the down'd cpu is the
> +			 * last online cpu in a user set affinity mask.
> +			 *
> +			 * In both possibilities, we need to remap the irq
> +			 * to a new vector_irq[].
> +			 */
> +			if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> +			    (cpumask_empty(&affinity_new) ||
> +			     !cpumask_subset(&affinity_new, &online_new)))
> +				this_count++;
> +		}
>
> That's an impressive 6:1 ratio of lines-of-comment to lines-of-code!

Heh -- I couldn't decide if I should keep it all together in one comment
or divide it up.  I guess it does look less scary if I divide it up.

So how about (sorry for the cut-and-paste)

	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
		irq = __this_cpu_read(vector_irq[vector]);
		if (irq >= 0) {
			desc = irq_to_desc(irq);
			data = irq_desc_get_irq_data(desc);
			cpumask_copy(&affinity_new, data->affinity);
			cpu_clear(this_cpu, affinity_new);

			/* Do not count inactive or per-cpu irqs. */
			if (!irq_has_action(irq) || irqd_is_per_cpu(data))
				continue;

			/*
			 * A single irq may be mapped to multiple
			 * cpu's vector_irq[] (for example IOAPIC cluster
			 * mode).  In this case we have two possibilities:
			 *
			 * 1) the resulting affinity mask is empty; that
			 * is, the down'd cpu is the last cpu in the irq's
			 * affinity mask, or
			 *
			 * 2) the resulting affinity mask is no longer
			 * a subset of the online cpus but the affinity
			 * mask is not zero; that is, the down'd cpu is
			 * the last online cpu in a user set affinity mask.
			 */
			if (cpumask_empty(&affinity_new) ||
			    !cpumask_subset(&affinity_new, &online_new))
				this_count++;
		}
	}

Everyone okay with that?

P.
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v4]
Add some nitpicks below.

Reviewed-by: Chen, Gong

On Thu, Jan 02, 2014 at 07:47:24PM -0500, Prarit Bhargava wrote:
> +int check_irq_vectors_for_cpu_disable(void)
> +{
> +	int irq, cpu;
> +	unsigned int vector, this_count, count;
> +	struct irq_desc *desc;
> +	struct irq_data *data;
> +	struct cpumask affinity_new, online_new;
> +
> +	cpumask_copy(&online_new, cpu_online_mask);
> +	cpu_clear(smp_processor_id(), online_new);

I notice that you use smp_processor_id() many times. Maybe we can save
it first for speed.

> +
> +	this_count = 0;
> +	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> +		irq = __this_cpu_read(vector_irq[vector]);
> +		if (irq >= 0) {
> +			desc = irq_to_desc(irq);
> +			data = irq_desc_get_irq_data(desc);
> +			cpumask_copy(&affinity_new, data->affinity);
> +			cpu_clear(smp_processor_id(), affinity_new);
> +			/*
> +			 * Only count active non-percpu irqs, and those
> +			 * irqs that are not linked to on an online cpu; in
> +			 * fixup_irqs(), chip->irq_set_affinity() will be
> +			 * called which will set vector_irq[irq] for each
> +			 * cpu.
> +			 */
> +			if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> +			    (cpumask_empty(&affinity_new) ||
> +			     !cpumask_subset(&affinity_new, &online_new)))
> +				this_count++;

Would you please add some extra comments to describe these two different
conditions, where the cpumask is empty and where it is non-empty?  I feel
it is a little bit subtle and I don't want to be confused by it myself
one day :-).
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 01/02/2014 07:57 AM, Prarit Bhargava wrote:
>
> On 01/01/2014 09:41 PM, Chen, Gong wrote:
>> On Tue, Dec 31, 2013 at 04:22:09PM -0500, Prarit Bhargava wrote:
>>> Okay, how about,
>>>	if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>>>	    ((!cpumask_empty(&affinity_new)) &&
>>>	     !cpumask_subset(&affinity_new, &online_new)) ||
>>>	    cpumask_empty(&affinity_new))
>>>		this_count++;
>>>
>> I think it is good but a little bit complicated. How about this:
>>
>>	if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>>	    /* add some comments to emphasize the importance of turn */
>>	    (cpumask_empty(&affinity_new) ||
>>	     !cpumask_subset(&affinity_new, &online_new)))
>
> Yeah :)  I thought of that after I sent it. :)
>
>>> I tried this with the following examples and AFAICT I get the correct
>>> result:
>>>
>>> 1) affinity mask = online mask = 0xf.  CPU 3 (1000b) is down'd.
>>>
>>> this_count is not incremented.
>>>
>>> 2) affinity mask is a non-zero subset of the online mask (which IMO is
>>> the "typical" case).  For example, affinity_mask = 0x9, online mask =
>>> 0xf.  CPU 3 is again down'd.
>>>
>>> this_count is not incremented.
>>>
>>> 3) affinity_mask = 0x1, online mask = 0x3.  (this is your example).
>>> CPU 1 is going down.
>>>
>>> this_count is incremented, as the resulting affinity mask will be 0.
>>>
>>> 4) affinity_mask = 0x0, online mask = 0x7.  CPU 1 is going down.
>>>
>>> this_count is incremented, as the affinity mask is 0.
>>>
>> The 4th scenario is very tricky. If you try to set affinity from user
>> space, it will return failure because before the kernel tries to change
>> the affinity it will verify it:
>>
>>	int __ioapic_set_affinity(...)
>>	{
>>		...
>>		if (!cpumask_intersects(mask, cpu_online_mask))
>>			return -EINVAL;
>>		...
>>	}
>>
>> So from this point of view, affinity can't be 0. But your patch is very
>> special because you change it by hand:
>>
>>	cpu_clear(smp_processor_id(), affinity_new);
>>
>> so it is reasonable. It makes me think a little bit more. In fixup_irqs
>> we have similar logic but we don't protect it. Maybe it is because
>> currently scenario 4 can't happen because we stop it in advance. But
>> who knows, if one day we use it in another situation we will probably
>> hit this subtle issue.
>>
>> So, Prarit, I suggest writing another patch to fix this potential issue
>> for fixup_irqs. What would you think?
>
> As you know Rui, I've been staring at that code wondering if it needs a
> fix.  I'd like to hear Gong Chen's thoughts about it too...

Oops -- got my sender header mixed up somehow.  Sorry 'bout that Gong :(

Gong, I think I'd like to go with the above patch for now and then take a
closer look at fixing fixup_irqs() in another patch.  The reason is that
I think the above patch is critical for getting cpu hotplug working on
large systems with many devices.

I'm going to do more testing and will resubmit the patch later in the day.

P.
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 01/01/2014 09:41 PM, Chen, Gong wrote:
> On Tue, Dec 31, 2013 at 04:22:09PM -0500, Prarit Bhargava wrote:
>> Okay, how about,
>>	if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>>	    ((!cpumask_empty(&affinity_new)) &&
>>	     !cpumask_subset(&affinity_new, &online_new)) ||
>>	    cpumask_empty(&affinity_new))
>>		this_count++;
>>
> I think it is good but a little bit complicated. How about this:
>
>	if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>	    /* add some comments to emphasize the importance of turn */
>	    (cpumask_empty(&affinity_new) ||
>	     !cpumask_subset(&affinity_new, &online_new)))

Yeah :)  I thought of that after I sent it. :)

>
>> I tried this with the following examples and AFAICT I get the correct
>> result:
>>
>> 1) affinity mask = online mask = 0xf.  CPU 3 (1000b) is down'd.
>>
>> this_count is not incremented.
>>
>> 2) affinity mask is a non-zero subset of the online mask (which IMO is
>> the "typical" case).  For example, affinity_mask = 0x9, online mask =
>> 0xf.  CPU 3 is again down'd.
>>
>> this_count is not incremented.
>>
>> 3) affinity_mask = 0x1, online mask = 0x3.  (this is your example).
>> CPU 1 is going down.
>>
>> this_count is incremented, as the resulting affinity mask will be 0.
>>
>> 4) affinity_mask = 0x0, online mask = 0x7.  CPU 1 is going down.
>>
>> this_count is incremented, as the affinity mask is 0.
>>
> The 4th scenario is very tricky. If you try to set affinity from user
> space, it will return failure because before the kernel tries to change
> the affinity it will verify it:
>
>	int __ioapic_set_affinity(...)
>	{
>		...
>		if (!cpumask_intersects(mask, cpu_online_mask))
>			return -EINVAL;
>		...
>	}
>
> So from this point of view, affinity can't be 0. But your patch is very
> special because you change it by hand:
>
>	cpu_clear(smp_processor_id(), affinity_new);
>
> so it is reasonable. It makes me think a little bit more. In fixup_irqs
> we have similar logic but we don't protect it. Maybe it is because
> currently scenario 4 can't happen because we stop it in advance. But
> who knows, if one day we use it in another situation we will probably
> hit this subtle issue.
>
> So, Prarit, I suggest writing another patch to fix this potential issue
> for fixup_irqs. What would you think?

As you know Rui, I've been staring at that code wondering if it needs a
fix.  I'd like to hear Gong Chen's thoughts about it too...

P.
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On Tue, Dec 31, 2013 at 04:22:09PM -0500, Prarit Bhargava wrote:
> Okay, how about,
>	if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>	    ((!cpumask_empty(&affinity_new)) &&
>	     !cpumask_subset(&affinity_new, &online_new)) ||
>	    cpumask_empty(&affinity_new))
>		this_count++;
>
I think it is good but a little bit complicated. How about this:

	if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
	    /* add some comments to emphasize the importance of turn */
	    (cpumask_empty(&affinity_new) ||
	     !cpumask_subset(&affinity_new, &online_new)))

> I tried this with the following examples and AFAICT I get the correct
> result:
>
> 1) affinity mask = online mask = 0xf.  CPU 3 (1000b) is down'd.
>
> this_count is not incremented.
>
> 2) affinity mask is a non-zero subset of the online mask (which IMO is
> the "typical" case).  For example, affinity_mask = 0x9, online mask =
> 0xf.  CPU 3 is again down'd.
>
> this_count is not incremented.
>
> 3) affinity_mask = 0x1, online mask = 0x3.  (this is your example).
> CPU 1 is going down.
>
> this_count is incremented, as the resulting affinity mask will be 0.
>
> 4) affinity_mask = 0x0, online mask = 0x7.  CPU 1 is going down.
>
> this_count is incremented, as the affinity mask is 0.
>
The 4th scenario is very tricky. If you try to set affinity from user
space, it will return failure because before the kernel tries to change
the affinity it will verify it:

int __ioapic_set_affinity(...)
{
	...
	if (!cpumask_intersects(mask, cpu_online_mask))
		return -EINVAL;
	...
}

So from this point of view, affinity can't be 0. But your patch is very
special because you change it by hand:

	cpu_clear(smp_processor_id(), affinity_new);

so it is reasonable. It makes me think a little bit more. In fixup_irqs
we have similar logic but we don't protect it. Maybe it is because
currently scenario 4 can't happen because we stop it in advance. But who
knows, if one day we use it in another situation we will probably hit
this subtle issue.

So, Prarit, I suggest writing another patch to fix this potential issue
for fixup_irqs. What would you think?
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/30/2013 09:58 PM, rui wang wrote:
> On 12/30/13, Prarit Bhargava wrote:
>>
>> On 12/30/2013 07:56 AM, rui wang wrote:
>>
> ...
>> Okay, so the big issue is that we need to do the calculation without
>> this cpu,
>
>> int check_irq_vectors_for_cpu_disable(void)
>> {
>>	int irq, cpu;
>>	unsigned int vector, this_count, count;
>>	struct irq_desc *desc;
>>	struct irq_data *data;
>>	struct cpumask online_new;	/* cpu_online_mask - this_cpu */
>>	struct cpumask affinity_new;	/* affinity - this_cpu */
>>
>>	cpumask_copy(&online_new, cpu_online_mask);
>>	cpu_clear(smp_processor_id(), online_new);
>>
>>	this_count = 0;
>>	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
>>		irq = __this_cpu_read(vector_irq[vector]);
>>		if (irq >= 0) {
>>			desc = irq_to_desc(irq);
>>			data = irq_desc_get_irq_data(desc);
>>			cpumask_copy(&affinity_new, data->affinity);
>>			cpu_clear(smp_processor_id(), affinity_new);
>>			if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>>			    !cpumask_subset(&affinity_new, &online_new) &&
>>			    !cpumask_empty(&affinity_new))
>
> If this cpu is the only target, then affinity_new becomes empty.
> Should we count it for migration?

Okay, how about,

	if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
	    ((!cpumask_empty(&affinity_new)) &&
	     !cpumask_subset(&affinity_new, &online_new)) ||
	    cpumask_empty(&affinity_new))
		this_count++;

I tried this with the following examples and AFAICT I get the correct
result:

1) affinity mask = online mask = 0xf.  CPU 3 (1000b) is down'd.

this_count is not incremented.

2) affinity mask is a non-zero subset of the online mask (which IMO is
the "typical" case).  For example, affinity_mask = 0x9, online mask =
0xf.  CPU 3 is again down'd.

this_count is not incremented.

3) affinity_mask = 0x1, online mask = 0x3.  (this is your example).
CPU 1 is going down.

this_count is incremented, as the resulting affinity mask will be 0.

4) affinity_mask = 0x0, online mask = 0x7.  CPU 1 is going down.

this_count is incremented, as the affinity mask is 0.

P.
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/30/13, Prarit Bhargava wrote:
>
> On 12/30/2013 07:56 AM, rui wang wrote:
>
>> An irq can be mapped to only one vector number, but can have multiple
>> destination CPUs. i.e. the same irq/vector can appear on multiple
>> CPUs' vector_irq[]. So checking data->affinity is necessary I think.
>
> That's true Rui -- but here's what I think the scenario actually is.
>
> Suppose we have a 4-cpu system, and we have an IRQ that is mapped to
> multiple cpu's vector_irq[].  For example, we have IRQ 200 that is
> mapped to CPU 2 vector_irq[50], and CPU 3 vector_irq[60].
>
This should not happen. There's only one LSB to fill the vector number
in IRTE. So either 50 or 60 but not both for an irq.

...

> Okay, so the big issue is that we need to do the calculation without
> this cpu,

Yes if it's done before calling cpu_disable_common().

> so I think this works (sorry for the cut-and-paste)
>
> int check_irq_vectors_for_cpu_disable(void)
> {
>	int irq, cpu;
>	unsigned int vector, this_count, count;
>	struct irq_desc *desc;
>	struct irq_data *data;
>	struct cpumask online_new;	/* cpu_online_mask - this_cpu */
>	struct cpumask affinity_new;	/* affinity - this_cpu */
>
>	cpumask_copy(&online_new, cpu_online_mask);
>	cpu_clear(smp_processor_id(), online_new);
>
>	this_count = 0;
>	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
>		irq = __this_cpu_read(vector_irq[vector]);
>		if (irq >= 0) {
>			desc = irq_to_desc(irq);
>			data = irq_desc_get_irq_data(desc);
>			cpumask_copy(&affinity_new, data->affinity);
>			cpu_clear(smp_processor_id(), affinity_new);
>			if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>			    !cpumask_subset(&affinity_new, &online_new) &&
>			    !cpumask_empty(&affinity_new))

If this cpu is the only target, then affinity_new becomes empty.
Should we count it for migration?

Thanks
Rui
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/30/2013 07:56 AM, rui wang wrote: > On 12/29/13, Prarit Bhargava wrote: >> >> >> On 12/20/2013 04:41 AM, rui wang wrote: > <> >>> The vector number for an irq is programmed in the LSB of the IOAPIC >>> IRTE (or MSI data register in the case of MSI/MSIx). So there can be >>> only one vector number (although multiple CPUs can be specified >>> through DM). An MSI-capable device can dynamically change the lower >>> few bits in the LSB to signal multiple interrupts with a contiguous >>> range of vectors in powers of 2,but each of these vectors is treated >>> as a separate IRQ. i.e. each of them has a separate irq desc, or a >>> separate line in the /proc/interrupt file. This patch shows the MSI >>> irq allocation in detail: >>> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9 >>> >>> Thanks >>> Rui >>> >> >> Gong and Rui, >> >> After looking at this in detail I realized I made a mistake in my patch by >> including the check for the smp_affinity. Simply put, it shouldn't be >> there >> given Rui's explanation above. >> >> So I think the patch simply needs to do: >> >> this_count = 0; >> for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) >> { >> irq = __this_cpu_read(vector_irq[vector]); >> if (irq >= 0) { >> desc = irq_to_desc(irq); >> data = irq_desc_get_irq_data(desc); >> affinity = data->affinity; >> if (irq_has_action(irq) && !irqd_is_per_cpu(data)) >> this_count++; >> } >> } >> >> Can the two of you confirm the above is correct? It would be greatly >> appreciated. > > An irq can be mapped to only one vector number, but can have multiple > destination CPUs. i.e. the same irq/vector can appear on multiple > CPUs' vector_irq[]. So checking data->affinity is necessary I think. > But notice that data->affinity is updated in chip->irq_set_affinity() > inside fixup_irqs(), while cpu_online_mask is updated in > remove_cpu_from_maps() inside cpu_disable_common(). 
> They are updated in different places. So the algorithm to check them
> against each other should be different, depending on where you put the
> check_vectors(). That's my understanding.

Okay, so the big issue is that we need to do the calculation without this
cpu, so I think this works (sorry for the cut-and-paste):

int check_irq_vectors_for_cpu_disable(void)
{
	int irq, cpu;
	unsigned int vector, this_count, count;
	struct irq_desc *desc;
	struct irq_data *data;
	struct cpumask online_new;	/* cpu_online_mask - this_cpu */
	struct cpumask affinity_new;	/* affinity - this_cpu */

	cpumask_copy(&online_new, cpu_online_mask);
	cpu_clear(smp_processor_id(), online_new);

	this_count = 0;
	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
		irq = __this_cpu_read(vector_irq[vector]);
		if (irq >= 0) {
			desc = irq_to_desc(irq);
			data = irq_desc_get_irq_data(desc);
			cpumask_copy(&affinity_new, data->affinity);
			cpu_clear(smp_processor_id(), affinity_new);
			if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
			    !cpumask_subset(&affinity_new, &online_new) &&
			    !cpumask_empty(&affinity_new))
				this_count++;
		}
	}
	...

If I go back to the various examples this appears to work. For example,
your previous case was: all cpus are online, CPU 1 goes down, and we have
an IRQ with affinity for CPUs (1,2). We skip this IRQ, which is correct.
And if we have another IRQ with affinity of only CPU 1 we will not skip
this IRQ, which is also correct. I've tried other examples and they
appear to work AFAICT.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/30/2013 07:56 AM, rui wang wrote:
> An irq can be mapped to only one vector number, but can have multiple
> destination CPUs, i.e. the same irq/vector can appear on multiple
> CPUs' vector_irq[]. So checking data->affinity is necessary I think.

That's true Rui -- but here's what I think the scenario actually is.
Suppose we have a 4-cpu system, and we have an IRQ that is mapped to
multiple cpus' vector_irq[]. For example, we have IRQ 200 that is mapped
to CPU 2 vector_irq[50], and CPU 3 vector_irq[60].

Now I 'echo 0 > /sys/devices/system/cpu/cpu3/online'. cpu_disable is
called and the kernel migrates IRQs off to other cpus. Regardless of
whether IRQ 200 is already mapped to CPU 2 vector_irq[50], the mapping
for CPU 3 vector_irq[60] *must be migrated* to another CPU. It has a
valid irq handler and the IRQ is active. It doesn't just disappear
because the CPU went down.

ie) AFAICT we should not differentiate between a multiply mapped IRQ and
a singly mapped IRQ when traversing the vector_irq[] for CPU 3. I'm
probably being dense on this but I'm not seeing a problem with migrating
the IRQ.

> But notice that data->affinity is updated in chip->irq_set_affinity()
> inside fixup_irqs(), while cpu_online_mask is updated in
> remove_cpu_from_maps() inside cpu_disable_common().

It shouldn't matter that the maps are updated in different areas during
the execution, as we're in stop_machine().

> They are updated in different places. So the algorithm to check them
> against each other should be different, depending on where you put the
> check_vectors(). That's my understanding.

P.

> Thanks
> Rui
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/30/2013 02:44 AM, Chen, Gong wrote:
> On Sat, Dec 28, 2013 at 12:10:38PM -0500, Prarit Bhargava wrote:
>> Gong and Rui,
>>
>> After looking at this in detail I realized I made a mistake in my patch by
>> including the check for the smp_affinity. Simply put, it shouldn't be there
>> given Rui's explanation above.
>>
>> So I think the patch simply needs to do:
>>
>> 	this_count = 0;
>> 	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
>> 		irq = __this_cpu_read(vector_irq[vector]);
>> 		if (irq >= 0) {
>> 			desc = irq_to_desc(irq);
>> 			data = irq_desc_get_irq_data(desc);
>> 			affinity = data->affinity;
>> 			if (irq_has_action(irq) && !irqd_is_per_cpu(data))
>> 				this_count++;
>> 		}
>> 	}
>>
>> Can the two of you confirm the above is correct? It would be greatly
>> appreciated.
>
> No, I don't think it is correct. We still need to consider smp_affinity.
>
> fixup_irqs
>   irq_set_affinity(native_ioapic_set_affinity)
>     __ioapic_set_affinity
>       assign_irq_vector
>         __assign_irq_vector
>           cpu_mask_to_apicid_and
>           /* now begin to set the ioapic RTE */
>
> __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
> {
> 	...
> 	apic->vector_allocation_domain(cpu, tmp_mask, mask);
> 	...
> 	for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
> 		per_cpu(vector_irq, new_cpu)[vector] = irq;
> 	cfg->vector = vector;
> 	cpumask_copy(cfg->domain, tmp_mask);
> 	...
> }
>
> The irq is set at the same vector in all related vector_irq[], so this
> kind of irq can appear in multiple vector_irq[]. In cpu_mask_to_apicid_and()
> (e.g. x2apic_cpu_mask_to_apicid_and() for cluster mode) the apic is updated
> depending on the new mask. That's why I think this kind of interrupt
> should be bypassed.

Hmm ... okay. I'll take a closer look at this. Thanks for the additional
information.

P.
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/29/13, Prarit Bhargava wrote:
>
> On 12/20/2013 04:41 AM, rui wang wrote:
> <snip>
>> The vector number for an irq is programmed in the LSB of the IOAPIC
>> IRTE (or MSI data register in the case of MSI/MSIx). So there can be
>> only one vector number (although multiple CPUs can be specified
>> through DM). An MSI-capable device can dynamically change the lower
>> few bits in the LSB to signal multiple interrupts with a contiguous
>> range of vectors in powers of 2, but each of these vectors is treated
>> as a separate IRQ, i.e. each of them has a separate irq desc, or a
>> separate line in the /proc/interrupts file. This patch shows the MSI
>> irq allocation in detail:
>> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9
>>
>> Thanks
>> Rui
>
> Gong and Rui,
>
> After looking at this in detail I realized I made a mistake in my patch by
> including the check for the smp_affinity. Simply put, it shouldn't be there
> given Rui's explanation above.
>
> So I think the patch simply needs to do:
>
> 	this_count = 0;
> 	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> 		irq = __this_cpu_read(vector_irq[vector]);
> 		if (irq >= 0) {
> 			desc = irq_to_desc(irq);
> 			data = irq_desc_get_irq_data(desc);
> 			affinity = data->affinity;
> 			if (irq_has_action(irq) && !irqd_is_per_cpu(data))
> 				this_count++;
> 		}
> 	}
>
> Can the two of you confirm the above is correct? It would be greatly
> appreciated.

An irq can be mapped to only one vector number, but can have multiple
destination CPUs, i.e. the same irq/vector can appear on multiple CPUs'
vector_irq[]. So checking data->affinity is necessary I think. But
notice that data->affinity is updated in chip->irq_set_affinity() inside
fixup_irqs(), while cpu_online_mask is updated in remove_cpu_from_maps()
inside cpu_disable_common(). They are updated in different places. So
the algorithm to check them against each other should be different,
depending on where you put the check_vectors(). That's my understanding.

Thanks
Rui
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On Sat, Dec 28, 2013 at 12:10:38PM -0500, Prarit Bhargava wrote:
> Gong and Rui,
>
> After looking at this in detail I realized I made a mistake in my patch by
> including the check for the smp_affinity. Simply put, it shouldn't be there
> given Rui's explanation above.
>
> So I think the patch simply needs to do:
>
> 	this_count = 0;
> 	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> 		irq = __this_cpu_read(vector_irq[vector]);
> 		if (irq >= 0) {
> 			desc = irq_to_desc(irq);
> 			data = irq_desc_get_irq_data(desc);
> 			affinity = data->affinity;
> 			if (irq_has_action(irq) && !irqd_is_per_cpu(data))
> 				this_count++;
> 		}
> 	}
>
> Can the two of you confirm the above is correct? It would be greatly
> appreciated.

No, I don't think it is correct. We still need to consider smp_affinity.

fixup_irqs
  irq_set_affinity(native_ioapic_set_affinity)
    __ioapic_set_affinity
      assign_irq_vector
        __assign_irq_vector
          cpu_mask_to_apicid_and
          /* now begin to set the ioapic RTE */

__assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
{
	...
	apic->vector_allocation_domain(cpu, tmp_mask, mask);
	...
	for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
		per_cpu(vector_irq, new_cpu)[vector] = irq;
	cfg->vector = vector;
	cpumask_copy(cfg->domain, tmp_mask);
	...
}

The irq is set at the same vector in all related vector_irq[], so this
kind of irq can appear in multiple vector_irq[]. In cpu_mask_to_apicid_and()
(e.g. x2apic_cpu_mask_to_apicid_and() for cluster mode) the apic is updated
depending on the new mask. That's why I think this kind of interrupt
should be bypassed.
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/30/13, Prarit Bhargava <pra...@redhat.com> wrote:
>
> On 12/30/2013 07:56 AM, rui wang wrote:
>> An irq can be mapped to only one vector number, but can have multiple
>> destination CPUs, i.e. the same irq/vector can appear on multiple
>> CPUs' vector_irq[]. So checking data->affinity is necessary I think.
>
> That's true Rui -- but here's what I think the scenario actually is.
> Suppose we have a 4-cpu system, and we have an IRQ that is mapped to
> multiple cpus' vector_irq[]. For example, we have IRQ 200 that is
> mapped to CPU 2 vector_irq[50], and CPU 3 vector_irq[60].

This should not happen. There's only one LSB to fill the vector number
in the IRTE. So either 50 or 60, but not both, for an irq.

> ...
> Okay, so the big issue is that we need to do the calculation without
> this cpu,

Yes, if it's done before calling cpu_disable_common().

> so I think this works (sorry for the cut-and-paste):
>
> int check_irq_vectors_for_cpu_disable(void)
> {
> 	int irq, cpu;
> 	unsigned int vector, this_count, count;
> 	struct irq_desc *desc;
> 	struct irq_data *data;
> 	struct cpumask online_new;	/* cpu_online_mask - this_cpu */
> 	struct cpumask affinity_new;	/* affinity - this_cpu */
>
> 	cpumask_copy(&online_new, cpu_online_mask);
> 	cpu_clear(smp_processor_id(), online_new);
>
> 	this_count = 0;
> 	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> 		irq = __this_cpu_read(vector_irq[vector]);
> 		if (irq >= 0) {
> 			desc = irq_to_desc(irq);
> 			data = irq_desc_get_irq_data(desc);
> 			cpumask_copy(&affinity_new, data->affinity);
> 			cpu_clear(smp_processor_id(), affinity_new);
> 			if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> 			    !cpumask_subset(&affinity_new, &online_new) &&
> 			    !cpumask_empty(&affinity_new))
> 				this_count++;
> 		}
> 	}

If this cpu is the only target, then affinity_new becomes empty. Should
we count it for migration?

Thanks
Rui
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/20/2013 04:41 AM, rui wang wrote:
> On 12/20/13, Prarit Bhargava wrote:
>>
>> On 12/19/2013 01:05 PM, Tony Luck wrote:
>>> On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck wrote:
>>>> Looks good to me.
>>>
>>> Though now I've been confused by an offline question about affinity.
>>
>> Heh :) I'm pursuing it now. Rui has asked a pretty good question that I
>> don't know the answer to off the top of my head. I'm still looking at
>> the code.
>>
>>> Suppose we have some interrupt that has affinity to multiple cpus. E.g.
>>> (real example from one of my machines):
>>>
>>> # cat /proc/irq/94/smp_affinity_list
>>> 26,54
>>>
>>> Now if I want to take either cpu26 or cpu54 offline - I'm guessing that
>>> I don't really need to find a new home for vector 94 - because the
>>> other one of that pair already has that set up. But your check_vectors
>>> code doesn't look like it accounts for that - if we take cpu26 offline
>>> - it would see that cpu54 doesn't have 94 free - but doesn't check that
>>> it is for the same interrupt.
>>>
>>> But I may be mixing "vectors" and "irqs" here.
>>
>> Yep. The question really is this: is the irq mapped to a single vector
>> or multiple vectors. (I think)
>
> The vector number for an irq is programmed in the LSB of the IOAPIC
> IRTE (or MSI data register in the case of MSI/MSIx). So there can be
> only one vector number (although multiple CPUs can be specified
> through DM). An MSI-capable device can dynamically change the lower
> few bits in the LSB to signal multiple interrupts with a contiguous
> range of vectors in powers of 2, but each of these vectors is treated
> as a separate IRQ, i.e. each of them has a separate irq desc, or a
> separate line in the /proc/interrupts file. This patch shows the MSI
> irq allocation in detail:
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9
>
> Thanks
> Rui

Gong and Rui,

After looking at this in detail I realized I made a mistake in my patch by
including the check for the smp_affinity. Simply put, it shouldn't be there
given Rui's explanation above.

So I think the patch simply needs to do:

	this_count = 0;
	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
		irq = __this_cpu_read(vector_irq[vector]);
		if (irq >= 0) {
			desc = irq_to_desc(irq);
			data = irq_desc_get_irq_data(desc);
			affinity = data->affinity;
			if (irq_has_action(irq) && !irqd_is_per_cpu(data))
				this_count++;
		}
	}

Can the two of you confirm the above is correct? It would be greatly
appreciated.

Tony, I apologize -- your comments made me think you were stating a fact
and not asking a question about the behavior of affinity. I completely
misunderstood what you were suggesting. I thought you were implying that
the affinity "tied" IRQ behavior together; it does not. It is simply a
suggestion of which IRQs should be assigned to a particular CPU. There is
an expectation that the system will attempt to honour the affinity;
however, it is not like each CPU is assigned a separate IRQ.

P.
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v3]
On 12/23/2013 02:12 AM, Chen, Gong wrote:
> On Fri, Dec 20, 2013 at 10:50:09AM -0500, Prarit Bhargava wrote:
>> +int check_vectors(void)
>> +{
>> +	int irq, cpu;
>> +	unsigned int vector, this_count, count;
>> +	struct irq_desc *desc;
>> +	struct irq_data *data;
>> +	struct cpumask *affinity;
>> +
>> +	this_count = 0;
>> +	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
>> +		irq = __this_cpu_read(vector_irq[vector]);
>> +		if (irq >= 0) {
>> +			desc = irq_to_desc(irq);
>> +			data = irq_desc_get_irq_data(desc);
>> +			affinity = data->affinity;
>> +			if (irq_has_action(irq) || !irqd_is_per_cpu(data) ||
>
> This line looks weird. Why will the counter be increased once the irq
> has an action? Do you mean
>
> 	if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> 	    !cpumask_subset(affinity, cpu_online_mask))

Yes, you're absolutely right. I mixed up the ands and ors on this line of
code. I will send out a patch against tip shortly.

P.
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v3]
On Fri, Dec 20, 2013 at 10:50:09AM -0500, Prarit Bhargava wrote:
> +int check_vectors(void)
> +{
> +	int irq, cpu;
> +	unsigned int vector, this_count, count;
> +	struct irq_desc *desc;
> +	struct irq_data *data;
> +	struct cpumask *affinity;
> +
> +	this_count = 0;
> +	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> +		irq = __this_cpu_read(vector_irq[vector]);
> +		if (irq >= 0) {
> +			desc = irq_to_desc(irq);
> +			data = irq_desc_get_irq_data(desc);
> +			affinity = data->affinity;
> +			if (irq_has_action(irq) || !irqd_is_per_cpu(data) ||

This line looks weird.  Why will the counter be increased once the irq has an action?  Do you mean

	if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
	    !cpumask_subset(affinity, cpu_online_mask))

> +			    !cpumask_subset(affinity, cpu_online_mask))

It looks like cpu_online_mask will not be updated until cpu_disable_common() is called, but your check_vectors() is called before that.

[...]

> {
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 85dc05a..7ac6654 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1312,6 +1312,12 @@ void cpu_disable_common(void)
>
>  int native_cpu_disable(void)
>  {
> +	int ret;
> +
> +	ret = check_vectors();
> +	if (ret)
> +		return ret;
> +
>  	clear_local_APIC();
>
>  	cpu_disable_common();
> --
> 1.7.9.3
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v3]
On 12/20/2013 03:06 PM, Luck, Tony wrote:
>> I haven't double checked, but I'm assuming the hot plug locks are held
>> while you are doing this.
>
> I dug into that when looking at v2 - the whole thing is under "stop_machine()"
> so locking was not an issue.
>
> Reviewed-by: Tony Luck

Awesome, thank you!

I have queued this up for Linus already as a fix for 3.13, as it seems important enough.

	-hpa
RE: [PATCH] x86: Add check for number of available vectors before CPU down [v3]
> I haven't double checked, but I'm assuming the hot plug locks are held
> while you are doing this.

I dug into that when looking at v2 - the whole thing is under "stop_machine()" so locking was not an issue.

Reviewed-by: Tony Luck

-Tony
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v3]
On Fri, Dec 20, 2013 at 10:50:09AM -0500, Prarit Bhargava wrote:
> I tested across various systems with both the linux.git and linux-next.git
> trees and do not see any false positives.  The patch behaves as expected on
> a system with 590 IRQs allocated at boot time; the system refuses to down
> cpu 62.

I read the patch and it looks good to me.

I haven't double checked, but I'm assuming the hot plug locks are held while you are doing this.

Reviewed-by: Andi Kleen

-Andi
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/20/2013 04:41 AM, rui wang wrote:
> On 12/20/13, Prarit Bhargava wrote:
>> On 12/19/2013 01:05 PM, Tony Luck wrote:
>>> On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck wrote:
>>>> Looks good to me.
>>>
>>> Though now I've been confused by an offline question about affinity.
>>
>> Heh :)  I'm pursuing it now.  Rui has asked a pretty good question that I
>> don't know the answer to off the top of my head.  I'm still looking at
>> the code.
>>
>>> Suppose we have some interrupt that has affinity to multiple cpus.  E.g.
>>> (real example from one of my machines):
>>>
>>> # cat /proc/irq/94/smp_affinity_list
>>> 26,54
>>>
>>> Now if I want to take either cpu26 or cpu54 offline - I'm guessing that
>>> I don't really need to find a new home for vector 94 - because the other
>>> one of that pair already has that set up.  But your check_vectors code
>>> doesn't look like it accounts for that - if we take cpu26 offline - it
>>> would see that cpu54 doesn't have 94 free - but doesn't check that it is
>>> for the same interrupt.
>>>
>>> But I may be mixing "vectors" and "irqs" here.
>>
>> Yep.  The question really is this: is the irq mapped to a single vector
>> or multiple vectors.  (I think)
>
> The vector number for an irq is programmed in the LSB of the IOAPIC
> IRTE (or the MSI data register in the case of MSI/MSI-x).  So there can
> be only one vector number (although multiple CPUs can be specified
> through DM).  An MSI-capable device can dynamically change the lower
> few bits in the LSB to signal multiple interrupts with a contiguous
> range of vectors in powers of 2, but each of these vectors is treated
> as a separate IRQ, i.e. each of them has a separate irq desc, or a
> separate line in the /proc/interrupts file.  This patch shows the MSI
> irq allocation in detail:
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9

Yep, that's where I got the idea for the affinity check in my previous post.

So far linux.git top-of-tree + my patch looks good.  I'm pulling down linux-next.git right now ...

P.

> Thanks
> Rui
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/20/13, Prarit Bhargava wrote:
> On 12/19/2013 01:05 PM, Tony Luck wrote:
>> On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck wrote:
>>> Looks good to me.
>>
>> Though now I've been confused by an offline question about affinity.
>
> Heh :)  I'm pursuing it now.  Rui has asked a pretty good question that I
> don't know the answer to off the top of my head.  I'm still looking at the
> code.
>
>> Suppose we have some interrupt that has affinity to multiple cpus.  E.g.
>> (real example from one of my machines):
>>
>> # cat /proc/irq/94/smp_affinity_list
>> 26,54
>>
>> Now if I want to take either cpu26 or cpu54 offline - I'm guessing that I
>> don't really need to find a new home for vector 94 - because the other one
>> of that pair already has that set up.  But your check_vectors code doesn't
>> look like it accounts for that - if we take cpu26 offline - it would see
>> that cpu54 doesn't have 94 free - but doesn't check that it is for the
>> same interrupt.
>>
>> But I may be mixing "vectors" and "irqs" here.
>
> Yep.  The question really is this: is the irq mapped to a single vector or
> multiple vectors.  (I think)

The vector number for an irq is programmed in the LSB of the IOAPIC IRTE (or the MSI data register in the case of MSI/MSI-x).  So there can be only one vector number (although multiple CPUs can be specified through DM).  An MSI-capable device can dynamically change the lower few bits in the LSB to signal multiple interrupts with a contiguous range of vectors in powers of 2, but each of these vectors is treated as a separate IRQ, i.e. each of them has a separate irq desc, or a separate line in the /proc/interrupts file.  This patch shows the MSI irq allocation in detail:
http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9

Thanks
Rui
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On Thu, Dec 19, 2013 at 01:11:32PM -0500, Prarit Bhargava wrote:
>
> Yep.  The question really is this: is the irq mapped to a single vector or
> multiple vectors.  (I think)
>
> P.

Yes, this is the original thought I wanted to express on the bugzilla.  In an ideal environment, without irq balancing, all vector_irq should hold the same values, say, 0x.  So when one CPU is off-lined, your patch considers that there are no places to hold these to-be-migrated irqs, but in fact they are shared on different CPUs.  So if we can answer this question, the answer will be clear :-).
Re: [PATCH] x86: Add check for number of available vectors before CPU down
On 12/19/2013 02:19 AM, rui wang wrote:
> On 12/19/13, Prarit Bhargava wrote:
>> On 12/03/2013 09:48 PM, rui wang wrote:
>>> On 11/20/13, Prarit Bhargava wrote:
>>> Have you considered the case when an IRQ is destined to more than one
>>> CPU?  e.g.:
>>>
>>> bash# cat /proc/irq/89/smp_affinity_list
>>> 30,62
>>> bash#
>>>
>>> In this case offlining CPU30 does not seem to require an empty vector
>>> slot.  It seems that we only need to change the affinity mask of irq89.
>>> Your check_vectors() assumed that each irq on the offlining cpu
>>> requires a new vector slot.
>>
>> Rui,
>>
>> The smp_affinity_list only indicates a preferred destination of the IRQ,
>> not the *actual* location of the CPU.  So the IRQ is on one of cpu 30 or
>> 62 but not both simultaneously.
>
> It depends on how the IOAPIC (or MSI/MSI-x) is configured.  An IRQ can be
> simultaneously broadcast to all destination CPUs (Fixed Mode) or
> delivered to the CPU with the lowest priority task (Lowest Priority
> Mode).  It's programmed in the Delivery Mode bits of the IOAPIC's IO
> Redirection Table registers, or the Message Data Register in the case
> of MSI/MSI-x.

Thanks for clueing me in Rui :).  You're right.  I do need to do a

	if (irq_has_action(irq) || !irqd_is_per_cpu(data) ||
	    !cpumask_subset(affinity, cpu_online_mask))

instead of just

	if (!irqd_is_per_cpu(data))

I'm going to do some additional testing tonight across various systems and will repost tomorrow if the testing is successful.

P.
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On 12/19/2013 01:05 PM, Tony Luck wrote:
> On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck wrote:
>> Looks good to me.
>
> Though now I've been confused by an offline question about affinity.

Heh :)  I'm pursuing it now.  Rui has asked a pretty good question that I don't know the answer to off the top of my head.  I'm still looking at the code.

> Suppose we have some interrupt that has affinity to multiple cpus.  E.g.
> (real example from one of my machines):
>
> # cat /proc/irq/94/smp_affinity_list
> 26,54
>
> Now if I want to take either cpu26 or cpu54 offline - I'm guessing that I
> don't really need to find a new home for vector 94 - because the other one
> of that pair already has that set up.  But your check_vectors code doesn't
> look like it accounts for that - if we take cpu26 offline - it would see
> that cpu54 doesn't have 94 free - but doesn't check that it is for the
> same interrupt.
>
> But I may be mixing "vectors" and "irqs" here.

Yep.  The question really is this: is the irq mapped to a single vector or multiple vectors.  (I think)

P.

> -Tony
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck wrote:
> Looks good to me.

Though now I've been confused by an offline question about affinity.

Suppose we have some interrupt that has affinity to multiple cpus.  E.g. (real example from one of my machines):

# cat /proc/irq/94/smp_affinity_list
26,54

Now if I want to take either cpu26 or cpu54 offline - I'm guessing that I don't really need to find a new home for vector 94 - because the other one of that pair already has that set up.  But your check_vectors code doesn't look like it accounts for that - if we take cpu26 offline - it would see that cpu54 doesn't have 94 free - but doesn't check that it is for the same interrupt.

But I may be mixing "vectors" and "irqs" here.

-Tony
Re: [PATCH] x86: Add check for number of available vectors before CPU down
On 12/19/2013 02:19 AM, rui wang wrote:
> On 12/19/13, Prarit Bhargava wrote:
>> On 12/03/2013 09:48 PM, rui wang wrote:
>>> On 11/20/13, Prarit Bhargava wrote:
>>> Have you considered the case when an IRQ is destined to more than one
>>> CPU?  e.g.:
>>>
>>> bash# cat /proc/irq/89/smp_affinity_list
>>> 30,62
>>> bash#
>>>
>>> In this case offlining CPU30 does not seem to require an empty vector
>>> slot.  It seems that we only need to change the affinity mask of irq89.
>>> Your check_vectors() assumed that each irq on the offlining cpu
>>> requires a new vector slot.
>>
>> Rui,
>>
>> The smp_affinity_list only indicates a preferred destination of the IRQ,
>> not the *actual* location of the CPU.  So the IRQ is on one of cpu 30 or
>> 62 but not both simultaneously.
>
> It depends on how the IOAPIC (or MSI/MSI-x) is configured.  An IRQ can be
> simultaneously broadcast to all destination CPUs (Fixed Mode) or
> delivered to the CPU with the lowest priority task (Lowest Priority
> Mode).  It's programmed in the Delivery Mode bits of the IOAPIC's IO
> Redirection Table registers, or the Message Data Register in the case
> of MSI/MSI-x.

Hmm ... I didn't realize that this was a possibility.  I'll go back and rework the patch.  Thanks for the info Rui!

P.

>> If the case is that 62 is being brought down, then the smp_affinity mask
>> will be updated to reflect only cpu 30 (and vice versa).
>
> Yes the affinity mask should be updated.  But if it was destined to
> more than one CPU, your "this_counter" does not seem to count the
> right numbers.  Are you saying that the smp_affinity mask is broken on
> Linux so that there's no way to configure an IRQ to target more than
> one CPU?
>
> Thanks
> Rui
Re: [PATCH] x86: Add check for number of available vectors before CPU down
On 12/19/13, Prarit Bhargava wrote:
> On 12/03/2013 09:48 PM, rui wang wrote:
>> On 11/20/13, Prarit Bhargava wrote:
>> Have you considered the case when an IRQ is destined to more than one
>> CPU?  e.g.:
>>
>> bash# cat /proc/irq/89/smp_affinity_list
>> 30,62
>> bash#
>>
>> In this case offlining CPU30 does not seem to require an empty vector
>> slot.  It seems that we only need to change the affinity mask of irq89.
>> Your check_vectors() assumed that each irq on the offlining cpu
>> requires a new vector slot.
>
> Rui,
>
> The smp_affinity_list only indicates a preferred destination of the IRQ,
> not the *actual* location of the CPU.  So the IRQ is on one of cpu 30 or
> 62 but not both simultaneously.

It depends on how the IOAPIC (or MSI/MSI-x) is configured.  An IRQ can be simultaneously broadcast to all destination CPUs (Fixed Mode) or delivered to the CPU with the lowest priority task (Lowest Priority Mode).  It's programmed in the Delivery Mode bits of the IOAPIC's IO Redirection Table registers, or the Message Data Register in the case of MSI/MSI-x.

> If the case is that 62 is being brought down, then the smp_affinity mask
> will be updated to reflect only cpu 30 (and vice versa).

Yes the affinity mask should be updated.  But if it was destined to more than one CPU, your "this_counter" does not seem to count the right numbers.  Are you saying that the smp_affinity mask is broken on Linux so that there's no way to configure an IRQ to target more than one CPU?

Thanks
Rui
Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]
On Wed, Dec 18, 2013 at 11:29 AM, Prarit Bhargava wrote:
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
>
> When a cpu is downed on a system, the irqs on the cpu are assigned to other
> cpus. It is possible, however, that when a cpu is downed there aren't enough
> free vectors on the remaining cpus to account for the vectors from the cpu
> that is being downed.
>
> This results in an interesting "overflow" condition where irqs are "assigned"
> to a CPU but are not handled.

Looks good to me.

Acked-by: Tony Luck
Re: [PATCH] x86: Add check for number of available vectors before CPU down
On 12/03/2013 09:48 PM, rui wang wrote:
> On 11/20/13, Prarit Bhargava wrote:
> Have you considered the case when an IRQ is destined to more than one CPU?
> e.g.:
>
> bash# cat /proc/irq/89/smp_affinity_list
> 30,62
> bash#
>
> In this case offlining CPU30 does not seem to require an empty vector
> slot. It seems that we only need to change the affinity mask of irq89.
> Your check_vectors() assumed that each irq on the offlining cpu
> requires a new vector slot.

Rui,

The smp_affinity_list only indicates a preferred destination for the IRQ, not the *actual* CPU handling it. So the IRQ is on one of cpu 30 or 62, but not both simultaneously.

If the case is that 62 is being brought down, then the smp_affinity mask will be updated to reflect only cpu 30 (and vice versa).

P.
Re: [PATCH] x86: Add check for number of available vectors before CPU down
On 12/03/2013 06:42 PM, Yu, Fenghua wrote:
>> -----Original Message-----
>> From: Prarit Bhargava [mailto:pra...@redhat.com]
>>
>> Second try at this ...
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
>>
>> When a cpu is downed on a system, the irqs on the cpu are assigned to
>> other cpus. It is possible, however, that when a cpu is downed there
>> aren't enough free vectors on the remaining cpus to account for the
>> vectors from the cpu that is being downed.
>>
>> This results in an interesting "overflow" condition where irqs are
>> "assigned" to a CPU but are not handled.
>>
>> For example, when downing cpus on a 1-64 logical processor system:
>>
>> +	if (per_cpu(vector_irq, cpu)[vector] < 0)
>> +		count++;
>
> But later on fixup_irqs will set some of the vector_irq entries to -1 on this
> to-be-disabled cpu. That will release vectors assigned to this cpu. So
> checking vector_irq at this point, before fixup_irqs, doesn't make sense, right?

Fenghua,

IIUC the purpose of fixup_irqs() is to migrate irqs from the to-be-disabled cpu to other cpus by using affinity, etc. The to-be-disabled cpu's irqs are set to -1, however, only after they are migrated off to an enabled cpu. The check I'm introducing is well before that stage, in an effort to determine whether or not the system has "room" for the irqs on the remaining cpus.

So I think that the check is in the right spot. We should do it early. I suppose an argument could be made to do the check in fixup_irqs(); however, my feeling is that by then it is very late in the process of downing the cpu.

I noticed a subtle bug in my patch that I will send with a [v2]: I'm not taking per-cpu interrupts into account in the check_vectors() function.

P.

> Thanks.
>
> -Fenghua
Re: [PATCH] x86: Add check for number of available vectors before CPU down
On 11/20/13, Prarit Bhargava wrote: > Second try at this ... > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791 > > When a cpu is downed on a system, the irqs on the cpu are assigned to other > cpus. It is possible, however, that when a cpu is downed there aren't > enough free vectors on the remaining cpus to account for the vectors from > the cpu that is being downed. > > This results in an interesting "overflow" condition where irqs are > "assigned" to a CPU but are not handled. > > For example, when downing cpus on a 1-64 logical processor system: > > > [ 232.021745] smpboot: CPU 61 is now offline > [ 238.480275] smpboot: CPU 62 is now offline > [ 245.991080] [ cut here ] > [ 245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 > dev_watchdog+0x246/0x250() > [ 246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out > [ 246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support > sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp > mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas > scsi_transport_sas > [ 246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14 > [ 246.044451] Hardware name: Intel Corporation S4600LH > ../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 > [ 246.057371] 0009 88081fa03d40 8164fbf6 > 88081fa0ee48 > [ 246.065728] 88081fa03d90 88081fa03d80 81054ecc > 88081fa13040 > [ 246.074073] 88200cce 0040 > > [ 246.082430] Call Trace: > [ 246.085174][] dump_stack+0x46/0x58 > [ 246.091633] [] warn_slowpath_common+0x8c/0xc0 > [ 246.098352] [] warn_slowpath_fmt+0x46/0x50 > [ 246.104786] [] dev_watchdog+0x246/0x250 > [ 246.110923] [] ? > dev_deactivate_queue.constprop.31+0x80/0x80 > [ 246.119097] [] call_timer_fn+0x3a/0x110 > [ 246.125224] [] ? update_process_times+0x6f/0x80 > [ 246.132137] [] ? 
> dev_deactivate_queue.constprop.31+0x80/0x80 > [ 246.140308] [] run_timer_softirq+0x1f0/0x2a0 > [ 246.146933] [] __do_softirq+0xe0/0x220 > [ 246.152976] [] call_softirq+0x1c/0x30 > [ 246.158920] [] do_softirq+0x55/0x90 > [ 246.164670] [] irq_exit+0xa5/0xb0 > [ 246.170227] [] smp_apic_timer_interrupt+0x4a/0x60 > [ 246.177324] [] apic_timer_interrupt+0x6a/0x70 > [ 246.184041][] ? cpuidle_enter_state+0x5b/0xe0 > [ 246.191559] [] ? cpuidle_enter_state+0x57/0xe0 > [ 246.198374] [] cpuidle_idle_call+0xbd/0x200 > [ 246.204900] [] arch_cpu_idle+0xe/0x30 > [ 246.210846] [] cpu_startup_entry+0xd0/0x250 > [ 246.217371] [] rest_init+0x77/0x80 > [ 246.223028] [] start_kernel+0x3ee/0x3fb > [ 246.229165] [] ? repair_env_string+0x5e/0x5e > [ 246.235787] [] x86_64_start_reservations+0x2a/0x2c > [ 246.242990] [] x86_64_start_kernel+0xf8/0xfc > [ 246.249610] ---[ end trace fb74fdef54d79039 ]--- > [ 246.254807] ixgbe :c2:00.0 p786p1: initiating reset due to tx > timeout > [ 246.262489] ixgbe :c2:00.0 p786p1: Reset adapter > Last login: Mon Nov 11 08:35:14 from 10.18.17.119 > [root@(none) ~]# [ 246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5 > [ 249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow > Control: RX/TX > [ 246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5 > [ 249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow > Control: RX/TX > > (last lines keep repeating. ixgbe driver is dead until module reload.) > > If the downed cpu has more vectors than are free on the remaining cpus on > the > system, it is possible that some vectors are "orphaned" even though they > are > assigned to a cpu. In this case, since the ixgbe driver had a watchdog, > the > watchdog fired and notified that something was wrong. > > This patch adds a function, check_vectors(), to compare the number of > vectors > on the CPU going down and compares it to the number of vectors available on > the system. 
If there aren't enough vectors for the CPU to go down, an > error is returned and propagated back to userspace. > > Signed-off-by: Prarit Bhargava > Cc: x...@kernel.org > --- > arch/x86/include/asm/irq.h |1 + > arch/x86/kernel/irq.c | 33 + > arch/x86/kernel/smpboot.c |6 ++ > 3 files changed, 40 insertions(+) > > diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h > index 0ea10f27..dfd7372 100644 > --- a/arch/x86/include/asm/irq.h > +++ b/arch/x86/include/asm/irq.h > @@ -25,6 +25,7 @@ extern void irq_ctx_init(int cpu); > > #ifdef CONFIG_HOTPLUG_CPU > #include > +extern int check_vectors(void); > extern void fixup_irqs(void); > extern void irq_force_complete_move(int); > #endif > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c > index 22d0687..ea5fa19 100644 > --- a/arch/x86/kernel/irq.c > +++ b/arch/x86/kernel/irq.c > @@ -262,6 +262,39 @@ __visible void
RE: [PATCH] x86: Add check for number of available vectors before CPU down
> -----Original Message-----
> From: Prarit Bhargava [mailto:pra...@redhat.com]
>
> Second try at this ...
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
>
> When a cpu is downed on a system, the irqs on the cpu are assigned to
> other cpus. It is possible, however, that when a cpu is downed there
> aren't enough free vectors on the remaining cpus to account for the
> vectors from the cpu that is being downed.
>
> This results in an interesting "overflow" condition where irqs are
> "assigned" to a CPU but are not handled.
>
> For example, when downing cpus on a 1-64 logical processor system:
>
> +	if (per_cpu(vector_irq, cpu)[vector] < 0)
> +		count++;

But later on fixup_irqs will set some of the vector_irq entries to -1 on this to-be-disabled cpu. That will release vectors assigned to this cpu. So checking vector_irq at this point, before fixup_irqs, doesn't make sense, right?

Thanks.

-Fenghua