Re: [PATCH] x86: Add check for number of available vectors before CPU down [v7]

2014-01-13 Thread Ingo Molnar

* Prarit Bhargava  wrote:

> 
> 
> On 01/12/2014 04:56 AM, Ingo Molnar wrote:
> > 
> > * Prarit Bhargava  wrote:
> > 
> >> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
> >>
> >> When a cpu is downed on a system, the irqs on the cpu are assigned to
> >> other cpus.  It is possible, however, that when a cpu is downed there
> >> aren't enough free vectors on the remaining cpus to account for the
> >> vectors from the cpu that is being downed.
> >>
> >> This results in an interesting "overflow" condition where irqs are
> >> "assigned" to a CPU but are not handled.
> >>
> >> For example, when downing cpus on a 1-64 logical processor system:
> >>
> >> 
> >> [  232.021745] smpboot: CPU 61 is now offline
> >> [  238.480275] smpboot: CPU 62 is now offline
> >> [  245.991080] [ cut here ]
> >> [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 
> >> dev_watchdog+0x246/0x250()
> >> [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
> >> [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt 
> >> iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core 
> >> lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi 
> >> acpi_cpufreq isci libsas scsi_transport_sas
> >> [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
> >> [  246.044451] Hardware name: Intel Corporation S4600LH 
> >> ../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
> >> [  246.057371]  0009 88081fa03d40 8164fbf6 
> >> 88081fa0ee48
> >> [  246.065728]  88081fa03d90 88081fa03d80 81054ecc 
> >> 88081fa13040
> >> [  246.074073]   88200cce 0040 
> >> 
> >> [  246.082430] Call Trace:
> >> [  246.085174][] dump_stack+0x46/0x58
> >> [  246.091633]  [] warn_slowpath_common+0x8c/0xc0
> >> [  246.098352]  [] warn_slowpath_fmt+0x46/0x50
> >> [  246.104786]  [] dev_watchdog+0x246/0x250
> >> [  246.110923]  [] ? 
> >> dev_deactivate_queue.constprop.31+0x80/0x80
> >> [  246.119097]  [] call_timer_fn+0x3a/0x110
> >> [  246.125224]  [] ? update_process_times+0x6f/0x80
> >> [  246.132137]  [] ? 
> >> dev_deactivate_queue.constprop.31+0x80/0x80
> >> [  246.140308]  [] run_timer_softirq+0x1f0/0x2a0
> >> [  246.146933]  [] __do_softirq+0xe0/0x220
> >> [  246.152976]  [] call_softirq+0x1c/0x30
> >> [  246.158920]  [] do_softirq+0x55/0x90
> >> [  246.164670]  [] irq_exit+0xa5/0xb0
> >> [  246.170227]  [] smp_apic_timer_interrupt+0x4a/0x60
> >> [  246.177324]  [] apic_timer_interrupt+0x6a/0x70
> >> [  246.184041][] ? cpuidle_enter_state+0x5b/0xe0
> >> [  246.191559]  [] ? cpuidle_enter_state+0x57/0xe0
> >> [  246.198374]  [] cpuidle_idle_call+0xbd/0x200
> >> [  246.204900]  [] arch_cpu_idle+0xe/0x30
> >> [  246.210846]  [] cpu_startup_entry+0xd0/0x250
> >> [  246.217371]  [] rest_init+0x77/0x80
> >> [  246.223028]  [] start_kernel+0x3ee/0x3fb
> >> [  246.229165]  [] ? repair_env_string+0x5e/0x5e
> >> [  246.235787]  [] x86_64_start_reservations+0x2a/0x2c
> >> [  246.242990]  [] x86_64_start_kernel+0xf8/0xfc
> >> [  246.249610] ---[ end trace fb74fdef54d79039 ]---
> >> [  246.254807] ixgbe :c2:00.0 p786p1: initiating reset due to tx 
> >> timeout
> >> [  246.262489] ixgbe :c2:00.0 p786p1: Reset adapter
> >> Last login: Mon Nov 11 08:35:14 from 10.18.17.119
> >> [root@(none) ~]# [  246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5
> >> [  249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow 
> >> Control: RX/TX
> >> [  246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5
> >> [  249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow 
> >> Control: RX/TX
> >>
> >> (last lines keep repeating.  ixgbe driver is dead until module reload.)
> >>
> >> If the downed cpu has more vectors than are free on the remaining cpus
> >> on the system, it is possible that some vectors are "orphaned" even
> >> though they are assigned to a cpu.  In this case, since the ixgbe
> >> driver had a watchdog, the watchdog fired and notified that something
> >> was wrong.
> >>
> >> This patch adds a function, check_vectors(), to compare the number of
> >> vectors on the CPU going down to the number of vectors available on
> >> the rest of the system.  If there aren't enough vectors for the CPU to
> >> go down, an error is returned and propagated back to userspace.
> >>
> >> v2: Do not need to look at percpu irqs
> >> v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
> >> Priority Mode
> >> v4: Additional changes suggested by Gong Chen.
> >> v5/v6/v7: Updated comment text
> >>
> >> Signed-off-by: Prarit Bhargava 
> >> Reviewed-by: Gong Chen 
> >> Cc: Andi Kleen 
> >> Cc: Michel Lespinasse 
> >> Cc: Seiji Aguchi 
> >> Cc: Yang Zhang 
> >> Cc: Paul Gortmaker 
> >> Cc: Janet Morgan 
> >> Cc: Tony Luck 
> >> Cc: Ruiv Wang 
> >> Cc: Gong Chen 
> >> Cc: H. 

Re: [PATCH] x86: Add check for number of available vectors before CPU down [v7]

2014-01-12 Thread Prarit Bhargava


On 01/12/2014 04:56 AM, Ingo Molnar wrote:
> 
> * Prarit Bhargava  wrote:
> 
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
>>
>> When a cpu is downed on a system, the irqs on the cpu are assigned to
>> other cpus.  It is possible, however, that when a cpu is downed there
>> aren't enough free vectors on the remaining cpus to account for the
>> vectors from the cpu that is being downed.
>>
>> This results in an interesting "overflow" condition where irqs are
>> "assigned" to a CPU but are not handled.
>>
>> For example, when downing cpus on a 1-64 logical processor system:
>>
>> 
>> [  232.021745] smpboot: CPU 61 is now offline
>> [  238.480275] smpboot: CPU 62 is now offline
>> [  245.991080] [ cut here ]
>> [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 
>> dev_watchdog+0x246/0x250()
>> [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
>> [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support 
>> sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp 
>> mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas 
>> scsi_transport_sas
>> [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
>> [  246.044451] Hardware name: Intel Corporation S4600LH 
>> ../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
>> [  246.057371]  0009 88081fa03d40 8164fbf6 
>> 88081fa0ee48
>> [  246.065728]  88081fa03d90 88081fa03d80 81054ecc 
>> 88081fa13040
>> [  246.074073]   88200cce 0040 
>> 
>> [  246.082430] Call Trace:
>> [  246.085174][] dump_stack+0x46/0x58
>> [  246.091633]  [] warn_slowpath_common+0x8c/0xc0
>> [  246.098352]  [] warn_slowpath_fmt+0x46/0x50
>> [  246.104786]  [] dev_watchdog+0x246/0x250
>> [  246.110923]  [] ? 
>> dev_deactivate_queue.constprop.31+0x80/0x80
>> [  246.119097]  [] call_timer_fn+0x3a/0x110
>> [  246.125224]  [] ? update_process_times+0x6f/0x80
>> [  246.132137]  [] ? 
>> dev_deactivate_queue.constprop.31+0x80/0x80
>> [  246.140308]  [] run_timer_softirq+0x1f0/0x2a0
>> [  246.146933]  [] __do_softirq+0xe0/0x220
>> [  246.152976]  [] call_softirq+0x1c/0x30
>> [  246.158920]  [] do_softirq+0x55/0x90
>> [  246.164670]  [] irq_exit+0xa5/0xb0
>> [  246.170227]  [] smp_apic_timer_interrupt+0x4a/0x60
>> [  246.177324]  [] apic_timer_interrupt+0x6a/0x70
>> [  246.184041][] ? cpuidle_enter_state+0x5b/0xe0
>> [  246.191559]  [] ? cpuidle_enter_state+0x57/0xe0
>> [  246.198374]  [] cpuidle_idle_call+0xbd/0x200
>> [  246.204900]  [] arch_cpu_idle+0xe/0x30
>> [  246.210846]  [] cpu_startup_entry+0xd0/0x250
>> [  246.217371]  [] rest_init+0x77/0x80
>> [  246.223028]  [] start_kernel+0x3ee/0x3fb
>> [  246.229165]  [] ? repair_env_string+0x5e/0x5e
>> [  246.235787]  [] x86_64_start_reservations+0x2a/0x2c
>> [  246.242990]  [] x86_64_start_kernel+0xf8/0xfc
>> [  246.249610] ---[ end trace fb74fdef54d79039 ]---
>> [  246.254807] ixgbe :c2:00.0 p786p1: initiating reset due to tx timeout
>> [  246.262489] ixgbe :c2:00.0 p786p1: Reset adapter
>> Last login: Mon Nov 11 08:35:14 from 10.18.17.119
>> [root@(none) ~]# [  246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5
>> [  249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow 
>> Control: RX/TX
>> [  246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5
>> [  249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow 
>> Control: RX/TX
>>
>> (last lines keep repeating.  ixgbe driver is dead until module reload.)
>>
>> If the downed cpu has more vectors than are free on the remaining cpus on the
>> system, it is possible that some vectors are "orphaned" even though they are
>> assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
>> watchdog fired and notified that something was wrong.
>>
>> This patch adds a function, check_vectors(), to compare the number of vectors
>> on the CPU going down to the number of vectors available on the rest of the
>> system.  If there aren't enough vectors for the CPU to go down, an
>> error is returned and propagated back to userspace.
>>
>> v2: Do not need to look at percpu irqs
>> v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
>> Priority Mode
>> v4: Additional changes suggested by Gong Chen.
>> v5/v6/v7: Updated comment text
>>
>> Signed-off-by: Prarit Bhargava 
>> Reviewed-by: Gong Chen 
>> Cc: Andi Kleen 
>> Cc: Michel Lespinasse 
>> Cc: Seiji Aguchi 
>> Cc: Yang Zhang 
>> Cc: Paul Gortmaker 
>> Cc: Janet Morgan 
>> Cc: Tony Luck 
>> Cc: Ruiv Wang 
>> Cc: Gong Chen 
>> Cc: H. Peter Anvin 
>> Cc: Gong Chen 
>> Cc: x...@kernel.org
>> ---
>>  arch/x86/include/asm/irq.h |1 +
>>  arch/x86/kernel/irq.c  |   69 
>> 
>>  arch/x86/kernel/smpboot.c  |6 
>>  3 files changed, 76 insertions(+)
>>

Re: [PATCH] x86: Add check for number of available vectors before CPU down [v7]

2014-01-12 Thread Ingo Molnar

* Prarit Bhargava  wrote:

> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
> 
> When a cpu is downed on a system, the irqs on the cpu are assigned to
> other cpus.  It is possible, however, that when a cpu is downed there
> aren't enough free vectors on the remaining cpus to account for the
> vectors from the cpu that is being downed.
> 
> This results in an interesting "overflow" condition where irqs are
> "assigned" to a CPU but are not handled.
> 
> For example, when downing cpus on a 1-64 logical processor system:
> 
> 
> [  232.021745] smpboot: CPU 61 is now offline
> [  238.480275] smpboot: CPU 62 is now offline
> [  245.991080] [ cut here ]
> [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 
> dev_watchdog+0x246/0x250()
> [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
> [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support 
> sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp 
> mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas 
> scsi_transport_sas
> [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
> [  246.044451] Hardware name: Intel Corporation S4600LH 
> ../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
> [  246.057371]  0009 88081fa03d40 8164fbf6 
> 88081fa0ee48
> [  246.065728]  88081fa03d90 88081fa03d80 81054ecc 
> 88081fa13040
> [  246.074073]   88200cce 0040 
> 
> [  246.082430] Call Trace:
> [  246.085174][] dump_stack+0x46/0x58
> [  246.091633]  [] warn_slowpath_common+0x8c/0xc0
> [  246.098352]  [] warn_slowpath_fmt+0x46/0x50
> [  246.104786]  [] dev_watchdog+0x246/0x250
> [  246.110923]  [] ? 
> dev_deactivate_queue.constprop.31+0x80/0x80
> [  246.119097]  [] call_timer_fn+0x3a/0x110
> [  246.125224]  [] ? update_process_times+0x6f/0x80
> [  246.132137]  [] ? 
> dev_deactivate_queue.constprop.31+0x80/0x80
> [  246.140308]  [] run_timer_softirq+0x1f0/0x2a0
> [  246.146933]  [] __do_softirq+0xe0/0x220
> [  246.152976]  [] call_softirq+0x1c/0x30
> [  246.158920]  [] do_softirq+0x55/0x90
> [  246.164670]  [] irq_exit+0xa5/0xb0
> [  246.170227]  [] smp_apic_timer_interrupt+0x4a/0x60
> [  246.177324]  [] apic_timer_interrupt+0x6a/0x70
> [  246.184041][] ? cpuidle_enter_state+0x5b/0xe0
> [  246.191559]  [] ? cpuidle_enter_state+0x57/0xe0
> [  246.198374]  [] cpuidle_idle_call+0xbd/0x200
> [  246.204900]  [] arch_cpu_idle+0xe/0x30
> [  246.210846]  [] cpu_startup_entry+0xd0/0x250
> [  246.217371]  [] rest_init+0x77/0x80
> [  246.223028]  [] start_kernel+0x3ee/0x3fb
> [  246.229165]  [] ? repair_env_string+0x5e/0x5e
> [  246.235787]  [] x86_64_start_reservations+0x2a/0x2c
> [  246.242990]  [] x86_64_start_kernel+0xf8/0xfc
> [  246.249610] ---[ end trace fb74fdef54d79039 ]---
> [  246.254807] ixgbe :c2:00.0 p786p1: initiating reset due to tx timeout
> [  246.262489] ixgbe :c2:00.0 p786p1: Reset adapter
> Last login: Mon Nov 11 08:35:14 from 10.18.17.119
> [root@(none) ~]# [  246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5
> [  249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow 
> Control: RX/TX
> [  246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5
> [  249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow 
> Control: RX/TX
> 
> (last lines keep repeating.  ixgbe driver is dead until module reload.)
> 
> If the downed cpu has more vectors than are free on the remaining cpus on the
> system, it is possible that some vectors are "orphaned" even though they are
> assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
> watchdog fired and notified that something was wrong.
> 
> This patch adds a function, check_vectors(), to compare the number of vectors
> on the CPU going down to the number of vectors available on the rest of the
> system.  If there aren't enough vectors for the CPU to go down, an
> error is returned and propagated back to userspace.
> 
> v2: Do not need to look at percpu irqs
> v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
> Priority Mode
> v4: Additional changes suggested by Gong Chen.
> v5/v6/v7: Updated comment text
> 
> Signed-off-by: Prarit Bhargava 
> Reviewed-by: Gong Chen 
> Cc: Andi Kleen 
> Cc: Michel Lespinasse 
> Cc: Seiji Aguchi 
> Cc: Yang Zhang 
> Cc: Paul Gortmaker 
> Cc: Janet Morgan 
> Cc: Tony Luck 
> Cc: Ruiv Wang 
> Cc: Gong Chen 
> Cc: H. Peter Anvin 
> Cc: Gong Chen 
> Cc: x...@kernel.org
> ---
>  arch/x86/include/asm/irq.h |1 +
>  arch/x86/kernel/irq.c  |   69 
> 
>  arch/x86/kernel/smpboot.c  |6 
>  3 files changed, 76 insertions(+)
> 
> diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
> index 0ea10f27..cb6cfcd 100644
> --- a/arch/x86/include/asm/irq.h
> +++ 

Re: [PATCH] x86: Add check for number of available vectors before CPU down [v6]

2014-01-07 Thread Prarit Bhargava


On 01/07/2014 12:54 PM, Luck, Tony wrote:
> + for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> + irq = __this_cpu_read(vector_irq[vector]);
> + if (irq >= 0) {
> + desc = irq_to_desc(irq);
> + data = irq_desc_get_irq_data(desc);
> + cpumask_copy(&affinity_new, data->affinity);
> + cpu_clear(this_cpu, affinity_new);
> + /*
> +  * The check below determines if this irq requires
> +  * an empty vector_irq[irq] entry on an online
> +  * cpu.
> +  *
> +  * The code only counts active non-percpu irqs, and
> +  * those irqs that are not linked to on an online cpu.
> +  * The first test is trivial, the second is not.
> +  *
> +  * The second test takes into account the
> +  * account that a single irq may be mapped to multiple
> +  * cpu's vector_irq[] (for example IOAPIC cluster
> +  * mode).  In this case we have two
> +  * possibilities:
> +  *
> +  * 1) the resulting affinity mask is empty; that is
> +  * this the down'd cpu is the last cpu in the irq's
> +  * affinity mask, and
> Code says "||" - so I think comment should say "or".
> +  *
> +  * 2) the resulting affinity mask is no longer
> +  * a subset of the online cpus but the affinity
> +  * mask is not zero; that is the down'd cpu is the
> +  * last online cpu in a user set affinity mask.
> +  *
> +  * In both possibilities, we need to remap the irq
> +  * to a new vector_irq[].
> +  *
> +  */
> + if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> + (cpumask_empty(&affinity_new) ||
> +  !cpumask_subset(&affinity_new, &online_new)))
> + this_count++;
> + }
> 
> That's an impressive 6:1 ratio of lines-of-comment to lines-of-code!

Heh -- I couldn't decide if I should keep it all together in one comment or
divide it up.  I guess it does look less scary if I divide it up.  So how about
(sorry for the cut-and-paste)


	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
		irq = __this_cpu_read(vector_irq[vector]);
		if (irq >= 0) {
			desc = irq_to_desc(irq);
			data = irq_desc_get_irq_data(desc);
			cpumask_copy(&affinity_new, data->affinity);
			cpu_clear(this_cpu, affinity_new);

			/* Do not count inactive or per-cpu irqs. */
			if (!irq_has_action(irq) || irqd_is_per_cpu(data))
				continue;

			/*
			 * A single irq may be mapped to multiple
			 * cpu's vector_irq[] (for example IOAPIC cluster
			 * mode).  In this case we have two
			 * possibilities:
			 *
			 * 1) the resulting affinity mask is empty; that is
			 * this the down'd cpu is the last cpu in the irq's
			 * affinity mask, or
			 *
			 * 2) the resulting affinity mask is no longer
			 * a subset of the online cpus but the affinity
			 * mask is not zero; that is the down'd cpu is the
			 * last online cpu in a user set affinity mask.
			 */
			if (cpumask_empty(&affinity_new) ||
			    !cpumask_subset(&affinity_new, &online_new))
				this_count++;
		}
	}


Everyone okay with that?

P.
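
The second half of check_irq_vectors_for_cpu_disable() is not quoted anywhere
in this excerpt.  A hedged sketch of how this_count might then be weighed
against the free vector_irq[] slots on the CPUs that stay online follows; the
exact bookkeeping is an assumption inferred from the snippets in this thread,
not the patch text:

	/*
	 * Sketch, not the literal patch: count unused vector_irq[] entries
	 * on every CPU other than the one going down, then refuse the
	 * offline if they cannot absorb this_count vectors.
	 */
	count = 0;
	for_each_online_cpu(cpu) {
		if (cpu == this_cpu)
			continue;
		for (vector = FIRST_EXTERNAL_VECTOR;
		     vector < NR_VECTORS; vector++) {
			if (per_cpu(vector_irq, cpu)[vector] < 0)
				count++;
		}
	}

	if (count < this_count)
		return -ERANGE;	/* not enough free vectors on the other CPUs */

	return 0;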


RE: [PATCH] x86: Add check for number of available vectors before CPU down [v6]

2014-01-07 Thread Luck, Tony
+   for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
+   irq = __this_cpu_read(vector_irq[vector]);
+   if (irq >= 0) {
+   desc = irq_to_desc(irq);
+   data = irq_desc_get_irq_data(desc);
+   cpumask_copy(&affinity_new, data->affinity);
+   cpu_clear(this_cpu, affinity_new);
+   /*
+* The check below determines if this irq requires
+* an empty vector_irq[irq] entry on an online
+* cpu.
+*
+* The code only counts active non-percpu irqs, and
+* those irqs that are not linked to on an online cpu.
+* The first test is trivial, the second is not.
+*
+* The second test takes into account the
+* account that a single irq may be mapped to multiple
+* cpu's vector_irq[] (for example IOAPIC cluster
+* mode).  In this case we have two
+* possibilities:
+*
+* 1) the resulting affinity mask is empty; that is
+* this the down'd cpu is the last cpu in the irq's
+* affinity mask, and
Code says "||" - so I think comment should say "or".
+*
+* 2) the resulting affinity mask is no longer
+* a subset of the online cpus but the affinity
+* mask is not zero; that is the down'd cpu is the
+* last online cpu in a user set affinity mask.
+*
+* In both possibilities, we need to remap the irq
+* to a new vector_irq[].
+*
+*/
+   if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
+   (cpumask_empty(&affinity_new) ||
+!cpumask_subset(&affinity_new, &online_new)))
+   this_count++;
+   }

That's an impressive 6:1 ratio of lines-of-comment to lines-of-code!

Perhaps it would be less scary if the test were broken up into the easy/obvious 
part
and the one that has taken all these revisions to work out?  E.g.

/* no need to count inactive or per-cpu irqs */
if (!irq_has_action(irq) || irqd_is_per_cpu(data))
continue;

/*
  * We need to look for a new home for this irq if:
... paste in 1), 2) from above here ... (but 
s/and/or/ to match code)
 */
if (cpumask_empty(&affinity_new) ||
!cpumask_subset(&affinity_new, &online_new))
this_count++;

-Tony



Re: [PATCH] x86: Add check for number of available vectors before CPU down [v4]

2014-01-02 Thread Chen, Gong
Add some nitpicks below.

Reviewed-by: Chen, Gong 

On Thu, Jan 02, 2014 at 07:47:24PM -0500, Prarit Bhargava wrote:
> +int check_irq_vectors_for_cpu_disable(void)
> +{
> + int irq, cpu;
> + unsigned int vector, this_count, count;
> + struct irq_desc *desc;
> + struct irq_data *data;
> + struct cpumask affinity_new, online_new;
> +
> + cpumask_copy(&online_new, cpu_online_mask);
> + cpu_clear(smp_processor_id(), online_new);
I notice that you use smp_processor_id() many times. Maybe we can
save it first for speed.

> +
> + this_count = 0;
> + for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> + irq = __this_cpu_read(vector_irq[vector]);
> + if (irq >= 0) {
> + desc = irq_to_desc(irq);
> + data = irq_desc_get_irq_data(desc);
> + cpumask_copy(&affinity_new, data->affinity);
> + cpu_clear(smp_processor_id(), affinity_new);
> + /*
> +  * Only count active non-percpu irqs, and those
> +  * irqs that are not linked to on an online cpu; in
> +  * fixup_irqs(), chip->irq_set_affinity() will be
> +  * called which will set vector_irq[irq] for each
> +  * cpu.
> +  */
> + if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> + (cpumask_empty(&affinity_new) ||
> +  !cpumask_subset(&affinity_new, &online_new)))
> + this_count++;
Would you please add some extra comments to describe these two different
conditions, i.e. when the cpumask is empty and when it is non-empty? I feel
it is a little bit subtle and I don't want to be confused by it myself one
day :-).
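
(The v6 code quoted earlier in this excerpt does cache it as this_cpu; a
minimal sketch of what the suggestion amounts to, reusing the names from those
snippets:)

	int this_cpu = smp_processor_id();	/* read once, reuse below */

	cpumask_copy(&online_new, cpu_online_mask);
	cpu_clear(this_cpu, online_new);

	/* ... and later, inside the vector scan ... */
	cpu_clear(this_cpu, affinity_new);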





Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2014-01-02 Thread Prarit Bhargava


On 01/02/2014 07:57 AM, Prarit Bhargava wrote:
> 
> 
> On 01/01/2014 09:41 PM, Chen, Gong wrote:
>> On Tue, Dec 31, 2013 at 04:22:09PM -0500, Prarit Bhargava wrote:
>>> Okay, how about,
>>> if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>>> ((!cpumask_empty(&affinity_new)) &&
>>>   !cpumask_subset(&affinity_new, &online_new)) ||
>>>  cpumask_empty(&affinity_new))
>>> this_count++;
>>>
>> I think it is good but a little bit complicated. How about this:
>>
>> if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>> /* add some comments to emphasize the importance of the order */
>> (cpumask_empty(&affinity_new) ||
>> !cpumask_subset(&affinity_new, &online_new)))
> 
> Yeah :)  I thought of that after I sent it. :)
> 
>>
>>> I tried this with the following examples and AFAICT I get the correct 
>>> result:
>>>
>>> 1) affinity mask = online mask = 0xf.  CPU 3 (1000b) is down'd.
>>>
>>> this_count is not incremented.
>>>
>>> 2) affinity mask is a non-zero subset of the online mask (which IMO is
>>> the "typical" case).  For example, affinity_mask = 0x9, online mask = 0xf.  
>>> CPU
>>> 3 is again down'd.
>>>
>>> this_count is not incremented.
>>>
>>> 3) affinity_mask = 0x1, online mask = 0x3. (this is your example).  CPU
>>> 1 is going down.
>>>
>>> this_count is incremented, as the resulting affinity mask will be 0.
>>>
>>> 4) affinity_mask = 0x0, online mask = 0x7.  CPU 1 is going down.
>>>
>>> this_count is incremented, as the affinity mask is 0.
>>>
>> The 4th scenario is very tricky. If you try to set affinity from user space,
>> it will return failure because before kernel tried to change the affinity it
>> will verify it:
>> int __ioapic_set_affinity(...)
>> {
>> ...
>> if (!cpumask_intersects(mask, cpu_online_mask))
>> return -EINVAL;
>> ...
>> }
>>
>> So from this point of view, affinity can't be 0. But your patch is very
>> special because you change it by hand:
>> cpu_clear(smp_processor_id(), affinity_new);
>>
>> so it is reasonable. It makes me thinking a little bit more. In fixup_irqs
>> we have similar logic but we don't protect it. Maybe it is because currently
>> the scenario 4 can't happen because we stop it in advance. But who knows
>> if one day we use it in other situation we will hit this subtle issue
>> probably.
>>
>> So, Prarit, I suggest you writing another patch to fix this potential issue
>> for fixup_irqs. How would you think?
> 
> As you know Rui, I've been staring at that code wondering if it needs a fix.
> I'd like to hear Gong Chen's thoughts about it too...
> 

Oops -- got my sender header mixed up somehow.  Sorry 'bout that Gong :(

Gong, I think I'd like to go with the above patch for now and then take a closer
look at fixing the fixup_irqs() in another patch.  The reason is that I think
the above patch is critical for getting cpu hotplug working with large systems
with many devices.

I'm going to do more testing and will resubmit the patch later in the day.

P.


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2014-01-02 Thread Prarit Bhargava


On 01/01/2014 09:41 PM, Chen, Gong wrote:
> On Tue, Dec 31, 2013 at 04:22:09PM -0500, Prarit Bhargava wrote:
>> Okay, how about,
>> if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>> ((!cpumask_empty(&affinity_new)) &&
>>   !cpumask_subset(&affinity_new, &online_new)) ||
>>  cpumask_empty(&affinity_new))
>> this_count++;
>>
> I think it is good but a little bit complicated. How about this:
> 
> if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> /* add some comments to emphasize the importance of the order */
> (cpumask_empty(&affinity_new) ||
> !cpumask_subset(&affinity_new, &online_new)))

Yeah :)  I thought of that after I sent it. :)

> 
>> I tried this with the following examples and AFAICT I get the correct result:
>>
>> 1) affinity mask = online mask = 0xf.  CPU 3 (1000b) is down'd.
>>
>> this_count is not incremented.
>>
>> 2) affinity mask is a non-zero subset of the online mask (which IMO is
>> the "typical" case).  For example, affinity_mask = 0x9, online mask = 0xf.  
>> CPU
>> 3 is again down'd.
>>
>> this_count is not incremented.
>>
>> 3) affinity_mask = 0x1, online mask = 0x3. (this is your example).  CPU
>> 1 is going down.
>>
>> this_count is incremented, as the resulting affinity mask will be 0.
>>
>> 4) affinity_mask = 0x0, online mask = 0x7.  CPU 1 is going down.
>>
>> this_count is incremented, as the affinity mask is 0.
>>
> The 4th scenario is very tricky. If you try to set affinity from user space,
> it will return failure because before kernel tried to change the affinity it
> will verify it:
> int __ioapic_set_affinity(...)
> {
> ...
> if (!cpumask_intersects(mask, cpu_online_mask))
> return -EINVAL;
> ...
> }
> 
> So from this point of view, affinity can't be 0. But your patch is very
> special because you change it by hand:
> cpu_clear(smp_processor_id(), affinity_new);
> 
> so it is reasonable. It makes me thinking a little bit more. In fixup_irqs
> we have similar logic but we don't protect it. Maybe it is because currently
> the scenario 4 can't happen because we stop it in advance. But who knows
> if one day we use it in other situation we will hit this subtle issue
> probably.
> 
> So, Prarit, I suggest you writing another patch to fix this potential issue
> for fixup_irqs. How would you think?

As you know Rui, I've been staring at that code wondering if it needs a fix.
I'd like to hear Gong Chen's thoughts about it too...

P.
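
The four scenarios above can be exercised outside the kernel with plain
bitmasks standing in for struct cpumask.  This is an illustrative sketch only
(the helper name and the literal mask values for case 3 are chosen here, not
taken from the thread); it shows why the empty-mask test has to stand on its
own, since an empty mask is trivially a subset of anything:

#include <stdio.h>
#include <stdbool.h>

/* Illustration only: unsigned longs stand in for struct cpumask. */
static bool needs_new_vector(unsigned long affinity, unsigned long online,
			     int downed_cpu)
{
	unsigned long affinity_new = affinity & ~(1UL << downed_cpu);
	unsigned long online_new   = online   & ~(1UL << downed_cpu);

	/* cpumask_empty(&affinity_new) ||
	   !cpumask_subset(&affinity_new, &online_new) */
	return affinity_new == 0 || (affinity_new & ~online_new) != 0;
}

int main(void)
{
	/* 1) affinity == online == 0xf, CPU 3 downed: not counted (prints 0) */
	printf("1: %d\n", needs_new_vector(0xf, 0xf, 3));
	/* 2) affinity 0x9, online 0xf, CPU 3 downed: not counted (prints 0) */
	printf("2: %d\n", needs_new_vector(0x9, 0xf, 3));
	/* 3) downed CPU is the last cpu in the irq's affinity: counted (prints 1) */
	printf("3: %d\n", needs_new_vector(0x2, 0x3, 1));
	/* 4) affinity mask already empty: counted (prints 1) */
	printf("4: %d\n", needs_new_vector(0x0, 0x7, 1));
	return 0;
}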




Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2014-01-01 Thread Chen, Gong
On Tue, Dec 31, 2013 at 04:22:09PM -0500, Prarit Bhargava wrote:
> Okay, how about,
> if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> ((!cpumask_empty(&affinity_new)) &&
>   !cpumask_subset(&affinity_new, &online_new)) ||
>  cpumask_empty(&affinity_new))
> this_count++;
> 
I think it is good but a little bit complicated. How about this:

if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
/* add some comments to emphasize the importance of the order */
(cpumask_empty(&affinity_new) ||
!cpumask_subset(&affinity_new, &online_new)))

> I tried this with the following examples and AFAICT I get the correct result:
> 
> 1) affinity mask = online mask = 0xf.  CPU 3 (1000b) is down'd.
> 
> this_count is not incremented.
> 
> 2) affinity mask is a non-zero subset of the online mask (which IMO is
> the "typical" case).  For example, affinity_mask = 0x9, online mask = 0xf.  
> CPU
> 3 is again down'd.
> 
> this_count is not incremented.
> 
> 3) affinity_mask = 0x1, online mask = 0x3. (this is your example).  CPU
> 1 is going down.
> 
> this_count is incremented, as the resulting affinity mask will be 0.
> 
> 4) affinity_mask = 0x0, online mask = 0x7.  CPU 1 is going down.
> 
> this_count is incremented, as the affinity mask is 0.
> 
The 4th scenario is very tricky. If you try to set affinity from user space,
it will return failure because before kernel tried to change the affinity it
will verify it:
int __ioapic_set_affinity(...)
{
...
if (!cpumask_intersects(mask, cpu_online_mask))
return -EINVAL;
...
}

So from this point of view, affinity can't be 0. But your patch is very
special because you change it by hand:
cpu_clear(smp_processor_id(), affinity_new);

so it is reasonable. It makes me thinking a little bit more. In fixup_irqs
we have similar logic but we don't protect it. Maybe it is because currently
the scenario 4 can't happen because we stop it in advance. But who knows
if one day we use it in other situation we will hit this subtle issue
probably.

So, Prarit, I suggest you writing another patch to fix this potential issue
for fixup_irqs. How would you think?


signature.asc
Description: Digital signature




Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-31 Thread Prarit Bhargava


On 12/30/2013 09:58 PM, rui wang wrote:
> On 12/30/13, Prarit Bhargava  wrote:
>>
>>
>> On 12/30/2013 07:56 AM, rui wang wrote:
>>
> ...
>> Okay, so the big issue is that we need to do the calculation without this 
>> cpu,
> 
>>
>> int check_irq_vectors_for_cpu_disable(void)
>> {
>>int irq, cpu;
>>unsigned int vector, this_count, count;
>>struct irq_desc *desc;
>>struct irq_data *data;
>>struct cpumask online_new; /* cpu_online_mask - this_cpu */
>>struct cpumask affinity_new; /* affinity - this_cpu */
>>
>>cpumask_copy(&online_new, cpu_online_mask);
>>cpu_clear(smp_processor_id(), online_new);
>>
>>this_count = 0;
>>for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
>>irq = __this_cpu_read(vector_irq[vector]);
>>if (irq >= 0) {
>>desc = irq_to_desc(irq);
>>data = irq_desc_get_irq_data(desc);
>>cpumask_copy(&affinity_new, data->affinity);
>>cpu_clear(smp_processor_id(), affinity_new);
>>if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>>!cpumask_subset(&affinity_new, &online_new) &&
>>!cpumask_empty(&affinity_new))
> 
> If this cpu is the only target, then affinity_new becomes empty.
> Should we count it for migration?

Okay, how about,
if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
((!cpumask_empty(&affinity_new)) &&
  !cpumask_subset(&affinity_new, &online_new)) ||
 cpumask_empty(&affinity_new))
this_count++;

I tried this with the following examples and AFAICT I get the correct result:

1) affinity mask = online mask = 0xf.  CPU 3 (1000b) is down'd.

this_count is not incremented.

2) affinity mask is a non-zero subset of the online mask (which IMO is
the "typical" case).  For example, affinity_mask = 0x9, online mask = 0xf.  CPU
3 is again down'd.

this_count is not incremented.

3) affinity_mask = 0x1, online mask = 0x3. (this is your example).  CPU
1 is going down.

this_count is incremented, as the resulting affinity mask will be 0.

4) affinity_mask = 0x0, online mask = 0x7.  CPU 1 is going down.

this_count is incremented, as the affinity mask is 0.
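
(A small standalone sketch, plain C rather than kernel code, of how the
condition above plays out in these four scenarios.  The masks are chosen to
reproduce the described outcomes rather than copied literally, and
needs_migration() leaves out the irq_has_action()/per-cpu checks.)

#include <stdio.h>

/* cpumasks modeled as plain bitmaps of cpus */
static int needs_migration(unsigned affinity, unsigned online, int dying_cpu)
{
	unsigned affinity_new = affinity & ~(1u << dying_cpu);
	unsigned online_new   = online   & ~(1u << dying_cpu);

	/* empty, or not a subset of the cpus that stay online */
	return affinity_new == 0 || (affinity_new & ~online_new) != 0;
}

int main(void)
{
	printf("1) %d\n", needs_migration(0xf, 0xf, 3)); /* 0: not counted */
	printf("2) %d\n", needs_migration(0x9, 0xf, 3)); /* 0: not counted */
	printf("3) %d\n", needs_migration(0x2, 0x3, 1)); /* 1: dying cpu was the only target */
	printf("4) %d\n", needs_migration(0x0, 0x7, 1)); /* 1: affinity already empty */
	return 0;
}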

P.


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-30 Thread rui wang
On 12/30/13, Prarit Bhargava  wrote:
>
>
> On 12/30/2013 07:56 AM, rui wang wrote:
>
>> An irq can be mapped to only one vector number, but can have multiple
>> destination CPUs. i.e. the same irq/vector can appear on multiple
>> CPUs' vector_irq[]. So checking data->affinity is necessary I think.
>
> That's true Rui -- but here's what I think the scenario actually is.
>
> Suppose we have a 4-cpu system, and we have an IRQ that is mapped to
> multiple
> cpu's vector_irq[].  For example, we have IRQ 200 that is mapped to CPU 2
> vector_irq[50], and CPU 3 vector_irq[60].
>

This should not happen. There's only one LSB to fill the vector number
in IRTE. So either 50 or 60 but not both for an irq.

...
> Okay, so the big issue is that we need to do the calculation without this cpu,

Yes, if it's done before calling cpu_disable_common().

> so I think this works (sorry for the cut-and-paste)
>
> int check_irq_vectors_for_cpu_disable(void)
> {
>int irq, cpu;
>unsigned int vector, this_count, count;
>struct irq_desc *desc;
>struct irq_data *data;
>struct cpumask online_new; /* cpu_online_mask - this_cpu */
>struct cpumask affinity_new; /* affinity - this_cpu */
>
>cpumask_copy(&online_new, cpu_online_mask);
>cpu_clear(smp_processor_id(), online_new);
>
>this_count = 0;
>for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
>irq = __this_cpu_read(vector_irq[vector]);
>if (irq >= 0) {
>desc = irq_to_desc(irq);
>data = irq_desc_get_irq_data(desc);
>cpumask_copy(&affinity_new, data->affinity);
>cpu_clear(smp_processor_id(), affinity_new);
>if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>!cpumask_subset(&affinity_new, &online_new) &&
>!cpumask_empty(&affinity_new))

If this cpu is the only target, then affinity_new becomes empty.
Should we count it for migration?

Thanks
Rui


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-30 Thread Prarit Bhargava


On 12/30/2013 07:56 AM, rui wang wrote:
> On 12/29/13, Prarit Bhargava  wrote:
>>
>>
>> On 12/20/2013 04:41 AM, rui wang wrote:
> <>
>>> The vector number for an irq is programmed in the LSB of the IOAPIC
>>> IRTE (or MSI data register in the case of MSI/MSIx). So there can be
>>> only one vector number (although multiple CPUs can be specified
>>> through DM). An MSI-capable device can dynamically change the lower
>>> few bits in the LSB to signal multiple interrupts with a contiguous
>>> range of vectors in powers of 2,but each of these vectors is treated
>>> as a separate IRQ. i.e. each of them has a separate irq desc, or a
>>> separate line in the /proc/interrupt file. This patch shows the MSI
>>> irq allocation in detail:
>>> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9
>>>
>>> Thanks
>>> Rui
>>>
>>
>> Gong and Rui,
>>
>> After looking at this in detail I realized I made a mistake in my patch by
>> including the check for the smp_affinity.  Simply put, it shouldn't be
>> there
>> given Rui's explanation above.
>>
>> So I think the patch simply needs to do:
>>
>> this_count = 0;
>> for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++)
>> {
>> irq = __this_cpu_read(vector_irq[vector]);
>> if (irq >= 0) {
>> desc = irq_to_desc(irq);
>> data = irq_desc_get_irq_data(desc);
>> affinity = data->affinity;
>> if (irq_has_action(irq) && !irqd_is_per_cpu(data))
>> this_count++;
>> }
>> }
>>
>> Can the two of you confirm the above is correct?  It would be greatly
>> appreciated.
> 
> An irq can be mapped to only one vector number, but can have multiple
> destination CPUs. i.e. the same irq/vector can appear on multiple
> CPUs' vector_irq[]. So checking data->affinity is necessary I think.
> But notice that data->affinity is updated in chip->irq_set_affinity()
> inside fixup_irqs(), while cpu_online_mask is updated in
> remove_cpu_from_maps() inside cpu_disable_common(). They are updated
> in different places. So the algorithm to check them against each other
> should be different, depending on where you put the check_vectors().
> That's my understanding.


Okay, so the big issue is that we need to do the calculation without this cpu,
so I think this works (sorry for the cut-and-paste)

int check_irq_vectors_for_cpu_disable(void)
{
int irq, cpu;
unsigned int vector, this_count, count;
struct irq_desc *desc;
struct irq_data *data;
struct cpumask online_new; /* cpu_online_mask - this_cpu */
struct cpumask affinity_new; /* affinity - this_cpu */

>cpumask_copy(&online_new, cpu_online_mask);
cpu_clear(smp_processor_id(), online_new);

this_count = 0;
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
irq = __this_cpu_read(vector_irq[vector]);
if (irq >= 0) {
desc = irq_to_desc(irq);
data = irq_desc_get_irq_data(desc);
>cpumask_copy(&affinity_new, data->affinity);
cpu_clear(smp_processor_id(), affinity_new);
if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
>!cpumask_subset(&affinity_new, &online_new) &&
>!cpumask_empty(&affinity_new))
this_count++;
}
}

...

If I go back to the various examples this appears to work.  For example, your
previous case was all cpus are online, CPU 1 goes down and we have an IRQ with
affinity for CPU (1,2).  We skip this IRQ which is correct.

And if we have another IRQ with affinity of only CPU 1 we will not skip this
IRQ, which is also correct.

I've tried other examples and they appear to work AFAICT.


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-30 Thread Prarit Bhargava


On 12/30/2013 07:56 AM, rui wang wrote:

> An irq can be mapped to only one vector number, but can have multiple
> destination CPUs. i.e. the same irq/vector can appear on multiple
> CPUs' vector_irq[]. So checking data->affinity is necessary I think.

That's true Rui -- but here's what I think the scenario actually is.

Suppose we have a 4-cpu system, and we have an IRQ that is mapped to multiple
cpu's vector_irq[].  For example, we have IRQ 200 that is mapped to CPU 2
vector_irq[50], and CPU 3 vector_irq[60].

Now I 'echo 0 > /sys/devices/system/cpu/cpu3/online'.

cpu_disable is called and the kernel migrates IRQs off to other cpus.
Regardless of whether IRQ 200 is already mapped to CPU2 vector_irq[50], the mapping for
CPU 3 vector_irq[60] *must be migrated* to another CPU.  It has a valid irq
handler and the IRQ is active.  It doesn't just disappear because the CPU went 
down.

ie) AFAICT we should not differentiate between a multiple mapped IRQ and a
singly mapped IRQ when traversing the vector_irq[] for CPU 3.

I'm probably being dense on this but I'm not seeing a problem with migrating the
IRQ.

> But notice that data->affinity is updated in chip->irq_set_affinity()
> inside fixup_irqs(), while cpu_online_mask is updated in
> remove_cpu_from_maps() inside cpu_disable_common(). 

It shouldn't matter that the maps are updated in different areas during the
execution as we're in stop_machine().

They are updated
> in different places. So the algorithm to check them against each other
> should be different, depending on where you put the check_vectors().
> That's my understanding.
> 

P.

> Thanks
> Rui


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-30 Thread Prarit Bhargava


On 12/30/2013 02:44 AM, Chen, Gong wrote:
> On Sat, Dec 28, 2013 at 12:10:38PM -0500, Prarit Bhargava wrote:
>> Gong and Rui,
>>
>> After looking at this in detail I realized I made a mistake in my patch by
>> including the check for the smp_affinity.  Simply put, it shouldn't be there
>> given Rui's explanation above.
>>
>> So I think the patch simply needs to do:
>>
>> this_count = 0;
>> for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
>> irq = __this_cpu_read(vector_irq[vector]);
>> if (irq >= 0) {
>> desc = irq_to_desc(irq);
>> data = irq_desc_get_irq_data(desc);
>> affinity = data->affinity;
>> if (irq_has_action(irq) && !irqd_is_per_cpu(data))
>> this_count++;
>> }
>> }
>>
>> Can the two of you confirm the above is correct?  It would be greatly 
>> appreciated.
>>
> 
> No, I don't think it is correct. We still need to consider smp_affinity.
> 
> fixup_irqs
> irq_set_affinity(native_ioapic_set_affinity)
> __ioapic_set_affinity
> assign_irq_vector
> __assign_irq_vector
> cpu_mask_to_apicid_and
> /* now begin to set ioapic RTE */
> 
> __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
> {
> ...
> apic->vector_allocation_domain(cpu, tmp_mask, mask);
> ...
> for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
> per_cpu(vector_irq, new_cpu)[vector] = irq;
> cfg->vector = vector;
> cpumask_copy(cfg->domain, tmp_mask);
> ...
> }
> 
> On the same vector, irq is set in all of the related per-cpu vector_irq[]
> arrays, so such an irq can show up in multiple vector_irq[]. In
> cpu_mask_to_apicid_and() (e.g. x2apic_cpu_mask_to_apicid_and for cluster
> mode), the apic is updated depending on the new mask. That's why I think
> this kind of interrupt should be bypassed.

Hmm ... okay.  I'll take a closer look at this.

Thanks for the additional information.

P.



Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-30 Thread rui wang
On 12/29/13, Prarit Bhargava  wrote:
>
>
> On 12/20/2013 04:41 AM, rui wang wrote:
<>
>> The vector number for an irq is programmed in the LSB of the IOAPIC
>> IRTE (or MSI data register in the case of MSI/MSIx). So there can be
>> only one vector number (although multiple CPUs can be specified
>> through DM). An MSI-capable device can dynamically change the lower
>> few bits in the LSB to signal multiple interrupts with a contiguous
>> range of vectors in powers of 2,but each of these vectors is treated
>> as a separate IRQ. i.e. each of them has a separate irq desc, or a
>> separate line in the /proc/interrupt file. This patch shows the MSI
>> irq allocation in detail:
>> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9
>>
>> Thanks
>> Rui
>>
>
> Gong and Rui,
>
> After looking at this in detail I realized I made a mistake in my patch by
> including the check for the smp_affinity.  Simply put, it shouldn't be
> there
> given Rui's explanation above.
>
> So I think the patch simply needs to do:
>
> this_count = 0;
> for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++)
> {
> irq = __this_cpu_read(vector_irq[vector]);
> if (irq >= 0) {
> desc = irq_to_desc(irq);
> data = irq_desc_get_irq_data(desc);
> affinity = data->affinity;
> if (irq_has_action(irq) && !irqd_is_per_cpu(data))
> this_count++;
> }
> }
>
> Can the two of you confirm the above is correct?  It would be greatly
> appreciated.

An irq can be mapped to only one vector number, but can have multiple
destination CPUs. i.e. the same irq/vector can appear on multiple
CPUs' vector_irq[]. So checking data->affinity is necessary I think.
But notice that data->affinity is updated in chip->irq_set_affinity()
inside fixup_irqs(), while cpu_online_mask is updated in
remove_cpu_from_maps() inside cpu_disable_common(). They are updated
in different places. So the algorithm to check them against each other
should be different, depending on where you put the check_vectors().
That's my understanding.

Thanks
Rui


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-30 Thread Chen, Gong
On Sat, Dec 28, 2013 at 12:10:38PM -0500, Prarit Bhargava wrote:
> Gong and Rui,
> 
> After looking at this in detail I realized I made a mistake in my patch by
> including the check for the smp_affinity.  Simply put, it shouldn't be there
> given Rui's explanation above.
> 
> So I think the patch simply needs to do:
> 
> this_count = 0;
> for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> irq = __this_cpu_read(vector_irq[vector]);
> if (irq >= 0) {
> desc = irq_to_desc(irq);
> data = irq_desc_get_irq_data(desc);
> affinity = data->affinity;
> if (irq_has_action(irq) && !irqd_is_per_cpu(data))
> this_count++;
> }
> }
> 
> Can the two of you confirm the above is correct?  It would be greatly 
> appreciated.
> 

No, I don't think it is correct. We still need to consider smp_affinity.

fixup_irqs
irq_set_affinity(native_ioapic_set_affinity)
__ioapic_set_affinity
assign_irq_vector
__assign_irq_vector
cpu_mask_to_apicid_and
/* now begin to set ioapic RTE */

__assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
{
...
apic->vector_allocation_domain(cpu, tmp_mask, mask);
...
for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
per_cpu(vector_irq, new_cpu)[vector] = irq;
cfg->vector = vector;
cpumask_copy(cfg->domain, tmp_mask);
...
}

On the same vector, irq is set in all of the related per-cpu vector_irq[]
arrays, so such an irq can show up in multiple vector_irq[]. In
cpu_mask_to_apicid_and() (e.g. x2apic_cpu_mask_to_apicid_and for cluster
mode), the apic is updated depending on the new mask. That's why I think
this kind of interrupt should be bypassed.
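
(A toy model, plain C rather than kernel code, of that point: with a
multi-cpu vector allocation domain the same irq ends up in the same vector
slot of several cpus' vector_irq[] tables.  The cpu count, vector and irq
numbers are made up.)

#include <stdio.h>

#define NR_CPUS		4
#define NR_VECTORS	256

static int vector_irq[NR_CPUS][NR_VECTORS];	/* -1 == unused */

static void toy_assign_irq_vector(int irq, int vector, unsigned cpu_mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_mask & (1u << cpu))
			vector_irq[cpu][vector] = irq;
}

int main(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (int v = 0; v < NR_VECTORS; v++)
			vector_irq[cpu][v] = -1;

	/* irq 200 allocated at vector 50 for the domain {cpu2, cpu3} */
	toy_assign_irq_vector(200, 50, 0xc);

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu%d: vector_irq[50] = %d\n", cpu, vector_irq[cpu][50]);
	return 0;
}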


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-28 Thread Prarit Bhargava


On 12/20/2013 04:41 AM, rui wang wrote:
> On 12/20/13, Prarit Bhargava  wrote:
>>
>>
>> On 12/19/2013 01:05 PM, Tony Luck wrote:
>>> On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck  wrote:
 Looks good to me.
>>>
>>> Though now I've been confused by an offline question about affinity.
>>
>> Heh :)  I'm pursuing it now.  Rui has asked a pretty good question that I
>> don't
>> know the answer to off the top of my head.  I'm still looking at the code.
>>
>>>
>>> Suppose we have some interrupt that has affinity to multiple cpus. E.g.
>>> (real example from one of my machines):
>>>
>>> # cat /proc/irq/94/smp_affinity_list
>>> 26,54
>>>
>>> Now If I want to take either cpu26 or cpu54 offline - I'm guessing that I
>>> don't
>>> really need to find a new home for vector 94 - because the other one of
>>> that
>>> pair already has that set up.  But your check_vectors code doesn't look
>>> like
>>> it accounts for that - if we take cpu26 offline - it would see that
>>> cpu54 doesn't
>>> have 94 free - but doesn't check that it is for the same interrupt.
>>>
>>> But I may be mixing "vectors" and "irqs" here.
>>
>> Yep.  The question really is this: is the irq mapped to a single vector or
>> multiple vectors. (I think)
>>
> 
> The vector number for an irq is programmed in the LSB of the IOAPIC
> IRTE (or MSI data register in the case of MSI/MSIx). So there can be
> only one vector number (although multiple CPUs can be specified
> through DM). An MSI-capable device can dynamically change the lower
> few bits in the LSB to signal multiple interrupts with a contiguous
> range of vectors in powers of 2,but each of these vectors is treated
> as a separate IRQ. i.e. each of them has a separate irq desc, or a
> separate line in the /proc/interrupt file. This patch shows the MSI
> irq allocation in detail:
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9
> 
> Thanks
> Rui
> 

Gong and Rui,

After looking at this in detail I realized I made a mistake in my patch by
including the check for the smp_affinity.  Simply put, it shouldn't be there
given Rui's explanation above.

So I think the patch simply needs to do:

this_count = 0;
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
irq = __this_cpu_read(vector_irq[vector]);
if (irq >= 0) {
desc = irq_to_desc(irq);
data = irq_desc_get_irq_data(desc);
affinity = data->affinity;
if (irq_has_action(irq) && !irqd_is_per_cpu(data))
this_count++;
}
}

Can the two of you confirm the above is correct?  It would be greatly 
appreciated.

Tony, I apologize -- your comments made me think you were stating a fact and not
asking a question on the behavior of affinity.  I completely misunderstood what
you were suggesting.  I thought you were implying that the affinity "tied"
IRQ behavior together; it does not.  It is simply a suggestion of what IRQs
should be assigned to a particular CPU.  There is an expectation that the system
will attempt to honour the affinity, however, it is not like each CPU is
assigned a separate IRQ.

P.


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v3]

2013-12-23 Thread Prarit Bhargava


On 12/23/2013 02:12 AM, Chen, Gong wrote:
> On Fri, Dec 20, 2013 at 10:50:09AM -0500, Prarit Bhargava wrote:
>> +int check_vectors(void)
>> +{
>> +int irq, cpu;
>> +unsigned int vector, this_count, count;
>> +struct irq_desc *desc;
>> +struct irq_data *data;
>> +struct cpumask *affinity;
>> +
>> +this_count = 0;
>> +for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
>> +irq = __this_cpu_read(vector_irq[vector]);
>> +if (irq >= 0) {
>> +desc = irq_to_desc(irq);
>> +data = irq_desc_get_irq_data(desc);
>> +affinity = data->affinity;
>> +if (irq_has_action(irq) || !irqd_is_per_cpu(data) ||
> This line looks weird. Why will the counter be increased once the irq has
> an action? Do you mean
> 
> if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> !cpumask_subset(affinity, cpu_online_mask))
> 
> 

Yes, you're absolutely right.  I mixed up the and's and or's on this line of code.
 I will send out a patch against tip shortly.

P.


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v3]

2013-12-22 Thread Chen, Gong
On Fri, Dec 20, 2013 at 10:50:09AM -0500, Prarit Bhargava wrote:
> +int check_vectors(void)
> +{
> + int irq, cpu;
> + unsigned int vector, this_count, count;
> + struct irq_desc *desc;
> + struct irq_data *data;
> + struct cpumask *affinity;
> +
> + this_count = 0;
> + for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> + irq = __this_cpu_read(vector_irq[vector]);
> + if (irq >= 0) {
> + desc = irq_to_desc(irq);
> + data = irq_desc_get_irq_data(desc);
> + affinity = data->affinity;
> + if (irq_has_action(irq) || !irqd_is_per_cpu(data) ||
This line looks weird. Why will the counter be increased once the irq has
an action? Do you mean

if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
!cpumask_subset(affinity, cpu_online_mask))


> + !cpumask_subset(affinity, cpu_online_mask))
It looks like cpu_online_mask won't be updated until cpu_disable_common
is called, but your check_vectors is called before that.
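
(For reference, the later revisions in this thread deal with exactly this by
doing the subtraction by hand on a local copy, rather than relying on masks
that have not been updated yet; roughly:)

	struct cpumask online_new;	/* cpu_online_mask minus the cpu going down */

	cpumask_copy(&online_new, cpu_online_mask);
	cpu_clear(smp_processor_id(), online_new);
	...
	if (!cpumask_subset(&affinity_new, &online_new))
		this_count++;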

[...]
>  {
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 85dc05a..7ac6654 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1312,6 +1312,12 @@ void cpu_disable_common(void)
>  
>  int native_cpu_disable(void)
>  {
> + int ret;
> +
> + ret = check_vectors();
> + if (ret)
> + return ret;
> +
>   clear_local_APIC();
>  
>   cpu_disable_common();
> -- 
> 1.7.9.3
> 


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v3]

2013-12-20 Thread H. Peter Anvin
On 12/20/2013 03:06 PM, Luck, Tony wrote:
>> I haven't double checked, but I'm assuming the hot plug locks are held
>> while you are doing this.
> 
> I dug into that when looking at v2 - the whole thing is under "stop_machine()"
> so locking was not an issue.
> 
> Reviewed-by: Tony Luck 
> 

Awesome, thank you!  I have queued this up for Linus already as a fix
for 3.13, as it seems important enough.

-hpa




RE: [PATCH] x86: Add check for number of available vectors before CPU down [v3]

2013-12-20 Thread Luck, Tony
> I haven't double checked, but I'm assuming the hot plug locks are held
> while you are doing this.

I dug into that when looking at v2 - the whole thing is under "stop_machine()"
so locking was not an issue.

Reviewed-by: Tony Luck 

-Tony


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v3]

2013-12-20 Thread Andi Kleen
On Fri, Dec 20, 2013 at 10:50:09AM -0500, Prarit Bhargava wrote:
> I tested across various systems with both the linux.git and linux-next.git
> trees and do not see any false positives.  The patch behaves as expected on
> a system with 590 IRQs allocated at boot time; the system refuses to down cpu
> 62.

I read the patch and it looks good to me

I haven't double checked, but I'm assuming the hot plug locks are held
while you are doing this.

Reviewed-by: Andi Kleen 

-Andi


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-20 Thread Prarit Bhargava


On 12/20/2013 04:41 AM, rui wang wrote:
> On 12/20/13, Prarit Bhargava  wrote:
>>
>>
>> On 12/19/2013 01:05 PM, Tony Luck wrote:
>>> On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck  wrote:
 Looks good to me.
>>>
>>> Though now I've been confused by an offline question about affinity.
>>
>> Heh :)  I'm pursuing it now.  Rui has asked a pretty good question that I
>> don't
>> know the answer to off the top of my head.  I'm still looking at the code.
>>
>>>
>>> Suppose we have some interrupt that has affinity to multiple cpus. E.g.
>>> (real example from one of my machines):
>>>
>>> # cat /proc/irq/94/smp_affinity_list
>>> 26,54
>>>
>>> Now If I want to take either cpu26 or cpu54 offline - I'm guessing that I
>>> don't
>>> really need to find a new home for vector 94 - because the other one of
>>> that
>>> pair already has that set up.  But your check_vectors code doesn't look
>>> like
>>> it accounts for that - if we take cpu26 offline - it would see that
>>> cpu54 doesn't
>>> have 94 free - but doesn't check that it is for the same interrupt.
>>>
>>> But I may be mixing "vectors" and "irqs" here.
>>
>> Yep.  The question really is this: is the irq mapped to a single vector or
>> multiple vectors. (I think)
>>
> 
> The vector number for an irq is programmed in the LSB of the IOAPIC
> IRTE (or MSI data register in the case of MSI/MSIx). So there can be
> only one vector number (although multiple CPUs can be specified
> through DM). An MSI-capable device can dynamically change the lower
> few bits in the LSB to signal multiple interrupts with a contiguous
> range of vectors in powers of 2,but each of these vectors is treated
> as a separate IRQ. i.e. each of them has a separate irq desc, or a
> separate line in the /proc/interrupt file. This patch shows the MSI
> irq allocation in detail:
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9
> 

Yep, that's where I got the idea for the affinity check in my previous post.  So
far linux.git top-of-tree + my patch looks good.  I'm pulling down
linux-next.git right now ...

P.

> Thanks
> Rui
> 
> 
>> P.
>>
>>>
>>> -Tony


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-20 Thread rui wang
On 12/20/13, Prarit Bhargava  wrote:
>
>
> On 12/19/2013 01:05 PM, Tony Luck wrote:
>> On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck  wrote:
>>> Looks good to me.
>>
>> Though now I've been confused by an offline question about affinity.
>
> Heh :)  I'm pursuing it now.  Rui has asked a pretty good question that I
> don't
> know the answer to off the top of my head.  I'm still looking at the code.
>
>>
>> Suppose we have some interrupt that has affinity to multiple cpus. E.g.
>> (real example from one of my machines):
>>
>> # cat /proc/irq/94/smp_affinity_list
>> 26,54
>>
>> Now If I want to take either cpu26 or cpu54 offline - I'm guessing that I
>> don't
>> really need to find a new home for vector 94 - because the other one of
>> that
>> pair already has that set up.  But your check_vectors code doesn't look
>> like
>> it accounts for that - if we take cpu26 offline - it would see that
>> cpu54 doesn't
>> have 94 free - but doesn't check that it is for the same interrupt.
>>
>> But I may be mixing "vectors" and "irqs" here.
>
> Yep.  The question really is this: is the irq mapped to a single vector or
> multiple vectors. (I think)
>

The vector number for an irq is programmed in the LSB of the IOAPIC
IRTE (or MSI data register in the case of MSI/MSIx). So there can be
only one vector number (although multiple CPUs can be specified
through DM). An MSI-capable device can dynamically change the lower
few bits in the LSB to signal multiple interrupts with a contiguous
range of vectors in powers of 2,but each of these vectors is treated
as a separate IRQ. i.e. each of them has a separate irq desc, or a
separate line in the /proc/interrupt file. This patch shows the MSI
irq allocation in detail:
http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9
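
(A tiny standalone illustration, plain C rather than kernel code, of the
multi-MSI layout described above: N interrupts get a contiguous, power-of-2
aligned block of vectors, and each one is a separate irq.  The base numbers
are made up.)

#include <stdio.h>

int main(void)
{
	int nvec = 4;		/* multi-MSI counts are powers of 2 */
	int vector_base = 0x60;	/* aligned: the low log2(nvec) bits are zero */
	int irq_base = 100;

	for (int i = 0; i < nvec; i++)
		printf("irq %d -> vector 0x%x (device puts %d in the low bits)\n",
		       irq_base + i, vector_base + i, i);
	return 0;
}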

Thanks
Rui


> P.
>
>>
>> -Tony


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-20 Thread Prarit Bhargava


On 12/20/2013 04:41 AM, rui wang wrote:
 On 12/20/13, Prarit Bhargava pra...@redhat.com wrote:


 On 12/19/2013 01:05 PM, Tony Luck wrote:
 On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck tony.l...@intel.com wrote:
 Looks good to me.

 Though now I've been confused by an offline question about affinity.

 Heh :)  I'm pursuing it now.  Rui has asked a pretty good question that I
 don't
 know the answer to off the top of my head.  I'm still looking at the code.


 Suppose we have some interrupt that has affinity to multiple cpus. E.g.
 (real example from one of my machines):

 # cat /proc/irq/94/smp_affinity_list
 26,54

 Now If I want to take either cpu26 or cpu54 offline - I'm guessing that I
 don't
 really need to find a new home for vector 94 - because the other one of
 that
 pair already has that set up.  But your check_vectors code doesn't look
 like
 it accounts for that - if we take cpu26 offline - it would see that
 cpu54 doesn't
 have 94 free - but doesn't check that it is for the same interrupt.

 But I may be mixing vectors and irqs here.

 Yep.  The question really is this: is the irq mapped to a single vector or
 multiple vectors. (I think)

 
 The vector number for an irq is programmed in the LSB of the IOAPIC
 IRTE (or MSI data register in the case of MSI/MSIx). So there can be
 only one vector number (although multiple CPUs can be specified
 through DM). An MSI-capable device can dynamically change the lower
 few bits in the LSB to signal multiple interrupts with a contiguous
 range of vectors in powers of 2, but each of these vectors is treated
 as a separate IRQ, i.e. each of them has a separate irq desc, or a
 separate line in the /proc/interrupts file. This patch shows the MSI
 irq allocation in detail:
 http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=51906e779f2b13b38f8153774c4c7163d412ffd9
 

Yep, that's where I got the idea for the affinity check in my previous post.  So
far linux.git top-of-tree + my patch looks good.  I'm pulling down
linux-next.git right now ...

P.

 Thanks
 Rui
 
 
 P.


 -Tony
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v3]

2013-12-20 Thread Andi Kleen
On Fri, Dec 20, 2013 at 10:50:09AM -0500, Prarit Bhargava wrote:
 I tested across various systems with both the linux.git and linux-next.git
 trees and do not see any false positives.  The patch behaves as expected on
 a system with 590 IRQs allocated at boot time; the system refuses to down cpu
 62.

I read the patch and it looks good to me

I haven't double checked, but I'm assuming the hot plug locks are held
while you are doing this.

Reviewed-by: Andi Kleen a...@linux.intel.com

-Andi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] x86: Add check for number of available vectors before CPU down [v3]

2013-12-20 Thread Luck, Tony
 I haven't double checked, but I'm assuming the hot plug locks are held
 while you are doing this.

I dug into that when looking at v2 - the whole thing is under stop_machine()
so locking was not an issue.

Reviewed-by: Tony Luck tony.l...@intel.com

-Tony
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v3]

2013-12-20 Thread H. Peter Anvin
On 12/20/2013 03:06 PM, Luck, Tony wrote:
 I haven't double checked, but I'm assuming the hot plug locks are held
 while you are doing this.
 
 I dug into that when looking at v2 - the whole thing is under stop_machine()
 so locking was not an issue.
 
 Reviewed-by: Tony Luck tony.l...@intel.com
 

Awesome, thank you!  I have queued this up for Linus already as a fix
for 3.13, as it seems important enough.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-19 Thread Chen, Gong
On Thu, Dec 19, 2013 at 01:11:32PM -0500, Prarit Bhargava wrote:
> 
> Yep.  The question really is this: is the irq mapped to a single vector or
> multiple vectors. (I think)
> 
> P.
> 
Yes, this is the original thought I wanted to express on the bugzilla. In an
ideal environment, without irq balancing, all vector_irq entries should hold
the same values, say, 0x. So when one CPU is off-lined, your patch will
consider that there is no room for these to-be-migrated irqs, but in fact they
are shared across different CPUs. So if we can answer this question, the
answer will be clear :-).
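
As a small sketch of that situation, assuming the usual x86 per-cpu vector_irq
table (illustrative only, not code from the patch):

/*
 * In the "ideal" case described above, one irq sits in the same vector slot
 * of every online cpu's vector_irq table, so a naive count of free slots on
 * the remaining cpus understates the room that is really available.
 */
static bool vector_shared_by_all_cpus(int vector, int irq)
{
        int cpu;

        for_each_online_cpu(cpu)
                if (per_cpu(vector_irq, cpu)[vector] != irq)
                        return false;
        return true;
}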


signature.asc
Description: Digital signature


Re: [PATCH] x86: Add check for number of available vectors before CPU down

2013-12-19 Thread Prarit Bhargava


On 12/19/2013 02:19 AM, rui wang wrote:
> On 12/19/13, Prarit Bhargava  wrote:
>>
>>
>> On 12/03/2013 09:48 PM, rui wang wrote:
>>> On 11/20/13, Prarit Bhargava  wrote:
>>> Have you considered the case when an IRQ is destined to more than one CPU?
>>> e.g.:
>>>
>>> bash# cat /proc/irq/89/smp_affinity_list
>>> 30,62
>>> bash#
>>>
>>> In this case offlining CPU30 does not seem to require an empty vector
>>> slot. It seems that we only need to change the affinity mask of irq89.
>>> Your check_vectors() assumed that each irq on the offlining cpu
>>> requires a new vector slot.
>>>
>>
>> Rui,
>>
>> The smp_affinity_list only indicates a preferred destination of the IRQ, not
>> the
>> *actual* location of the CPU.  So the IRQ is on one of cpu 30 or 62 but not
>> both
>> simultaneously.
>>
> 
> It depends on how IOAPIC (or MSI/MSIx) is configured. An IRQ can be
> simultaneously broadcast to all destination CPUs (Fixed Mode) or
> delivered to the CPU with the lowest priority task (Lowest Priority
> Mode). It's programmed in the Delivery Mode bits of the IOAPIC's IO
> Redirection table registers, or the Message Data Register in the case
> of MSI/MSIx
> 

Thanks for clueing me in Rui :).  You're right.  I do need to do a

if (irq_has_action(irq) || !irqd_is_per_cpu(data) ||
!cpumask_subset(affinity, cpu_online_mask))

instead of just

if  (!irqd_is_per_cpu(data))

I'm going to do some additional testing tonight across various systems and will
repost tomorrow if the testing is successful.
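
As a sketch of how that test could sit in the counting loop (a paraphrase of
the discussion, not the code that was actually posted; the affinity field is
assumed to be the 3.13-era irq_data member):

/* Count, on the cpu being offlined, the vectors that would need a new home. */
static int count_vectors_to_move(int cpu)
{
        struct irq_data *data;
        const struct cpumask *affinity;
        int vector, irq, this_count = 0;

        for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
                irq = per_cpu(vector_irq, cpu)[vector];
                if (irq < 0)
                        continue;               /* vector slot unused */
                data = irq_get_irq_data(irq);
                affinity = data->affinity;
                if (irq_has_action(irq) || !irqd_is_per_cpu(data) ||
                    !cpumask_subset(affinity, cpu_online_mask))
                        this_count++;
        }
        return this_count;
}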

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-19 Thread Prarit Bhargava


On 12/19/2013 01:05 PM, Tony Luck wrote:
> On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck  wrote:
>> Looks good to me.
> 
> Though now I've been confused by an offline question about affinity.

Heh :)  I'm pursuing it now.  Rui has asked a pretty good question that I don't
know the answer to off the top of my head.  I'm still looking at the code.

> 
> Suppose we have some interrupt that has affinity to multiple cpus. E.g.
> (real example from one of my machines):
> 
> # cat /proc/irq/94/smp_affinity_list
> 26,54
> 
> Now If I want to take either cpu26 or cpu54 offline - I'm guessing that I 
> don't
> really need to find a new home for vector 94 - because the other one of that
> pair already has that set up.  But your check_vectors code doesn't look like
> it accounts for that - if we take cpu26 offline - it would see that
> cpu54 doesn't
> have 94 free - but doesn't check that it is for the same interrupt.
> 
> But I may be mixing "vectors" and "irqs" here.

Yep.  The question really is this: is the irq mapped to a single vector or
multiple vectors. (I think)

P.

> 
> -Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-19 Thread Tony Luck
On Wed, Dec 18, 2013 at 11:50 AM, Tony Luck  wrote:
> Looks good to me.

Though now I've been confused by an offline question about affinity.

Suppose we have some interrupt that has affinity to multiple cpus. E.g.
(real example from one of my machines):

# cat /proc/irq/94/smp_affinity_list
26,54

Now If I want to take either cpu26 or cpu54 offline - I'm guessing that I don't
really need to find a new home for vector 94 - because the other one of that
pair already has that set up.  But your check_vectors code doesn't look like
it accounts for that - if we take cpu26 offline - it would see that
cpu54 doesn't
have 94 free - but doesn't check that it is for the same interrupt.

But I may be mixing "vectors" and "irqs" here.
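
One way to express the distinction being asked about, as a sketch only (not
taken from the posted patch):

/*
 * A vector slot on a surviving cpu that already carries the *same* irq as
 * the slot on the dying cpu would not need to be treated as occupied by
 * somebody else.
 */
static bool vector_already_homed(int dying_cpu, int surviving_cpu, int vector)
{
        int irq = per_cpu(vector_irq, dying_cpu)[vector];

        return irq >= 0 && per_cpu(vector_irq, surviving_cpu)[vector] == irq;
}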

-Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down

2013-12-19 Thread Prarit Bhargava


On 12/19/2013 02:19 AM, rui wang wrote:
> On 12/19/13, Prarit Bhargava  wrote:
>>
>>
>> On 12/03/2013 09:48 PM, rui wang wrote:
>>> On 11/20/13, Prarit Bhargava  wrote:
>>> Have you considered the case when an IRQ is destined to more than one CPU?
>>> e.g.:
>>>
>>> bash# cat /proc/irq/89/smp_affinity_list
>>> 30,62
>>> bash#
>>>
>>> In this case offlining CPU30 does not seem to require an empty vector
>>> slot. It seems that we only need to change the affinity mask of irq89.
>>> Your check_vectors() assumed that each irq on the offlining cpu
>>> requires a new vector slot.
>>>
>>
>> Rui,
>>
>> The smp_affinity_list only indicates a preferred destination of the IRQ, not
>> the
>> *actual* location of the CPU.  So the IRQ is on one of cpu 30 or 62 but not
>> both
>> simultaneously.
>>
> 
> It depends on how IOAPIC (or MSI/MSIx) is configured. An IRQ can be
> simultaneously broadcast to all destination CPUs (Fixed Mode) or
> delivered to the CPU with the lowest priority task (Lowest Priority
> Mode). It's programmed in the Delivery Mode bits of the IOAPIC's IO
> Redirection table registers, or the Message Data Register in the case
> of MSI/MSIx

Hmm ... I didn't realize that this was a possibility.  I'll go back and rework
the patch.

Thanks for the info Rui!

P.

> 
>> If the case is that 62 is being brought down, then the smp_affinity mask
>> will be
>> updated to reflect only cpu 30 (and vice versa).
>>
> 
> Yes the affinity mask should be updated. But if it was destined to
> more than one CPU, your "this_counter" does not seem to count the
> right numbers. Are you saying that smp_affinity mask is broken on
> Linux so that there's no way to configure an IRQ to target more than
> one CPU?
> 
> Thanks
> Rui
> 
>> P.
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down

2013-12-18 Thread rui wang
On 12/19/13, Prarit Bhargava  wrote:
>
>
> On 12/03/2013 09:48 PM, rui wang wrote:
>> On 11/20/13, Prarit Bhargava  wrote:
>> Have you considered the case when an IRQ is destined to more than one CPU?
>> e.g.:
>>
>> bash# cat /proc/irq/89/smp_affinity_list
>> 30,62
>> bash#
>>
>> In this case offlining CPU30 does not seem to require an empty vector
>> slot. It seems that we only need to change the affinity mask of irq89.
>> Your check_vectors() assumed that each irq on the offlining cpu
>> requires a new vector slot.
>>
>
> Rui,
>
> The smp_affinity_list only indicates a preferred destination of the IRQ, not
> the
> *actual* location of the CPU.  So the IRQ is on one of cpu 30 or 62 but not
> both
> simultaneously.
>

It depends on how IOAPIC (or MSI/MSIx) is configured. An IRQ can be
simultaneously broadcast to all destination CPUs (Fixed Mode) or
delivered to the CPU with the lowest priority task (Lowest Priority
Mode). It's programmed in the Delivery Mode bits of the IOAPIC's IO
Redirection table registers, or the Message Data Register in the case
of MSI/MSIx
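
For reference, those fields sit in the MSI Message Data register roughly as
sketched below (the IOAPIC redirection entry uses the same low-bit layout; the
macro names here are made up for illustration, not the kernel's own
definitions):

#define MSI_EXAMPLE_VECTOR_MASK         0x000000ff  /* bits 7:0  - vector number           */
#define MSI_EXAMPLE_DELIVERY_SHIFT      8           /* bits 10:8 - delivery mode           */
#define MSI_EXAMPLE_DELIVERY_FIXED      (0 << 8)    /* Fixed: all destination cpus         */
#define MSI_EXAMPLE_DELIVERY_LOWPRI     (1 << 8)    /* Lowest Priority: one cpu is chosen  */

Changing only bits 10:8 switches between the two behaviours without touching
the vector number itself.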

> If the case is that 62 is being brought down, then the smp_affinity mask
> will be
> updated to reflect only cpu 30 (and vice versa).
>

Yes the affinity mask should be updated. But if it was destined to
more than one CPU, your "this_counter" does not seem to count the
right numbers. Are you saying that smp_affinity mask is broken on
Linux so that there's no way to configure an IRQ to target more than
one CPU?

Thanks
Rui

> P.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down [v2]

2013-12-18 Thread Tony Luck
On Wed, Dec 18, 2013 at 11:29 AM, Prarit Bhargava  wrote:
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
>
> When a cpu is downed on a system, the irqs on the cpu are assigned to other 
> cpus.  It is possible, however, that when a cpu is downed there aren't enough 
> free vectors on the remaining cpus to account for the vectors from the cpu 
> that is being downed.
>
> This results in an interesting "overflow" condition where irqs are "assigned" 
> to a CPU but are not handled.

Looks good to me.

Acked-by: Tony Luck 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down

2013-12-18 Thread Prarit Bhargava


On 12/03/2013 09:48 PM, rui wang wrote:
> On 11/20/13, Prarit Bhargava  wrote:
> Have you considered the case when an IRQ is destined to more than one CPU? 
> e.g.:
> 
> bash# cat /proc/irq/89/smp_affinity_list
> 30,62
> bash#
> 
> In this case offlining CPU30 does not seem to require an empty vector
> slot. It seems that we only need to change the affinity mask of irq89.
> Your check_vectors() assumed that each irq on the offlining cpu
> requires a new vector slot.
> 

Rui,

The smp_affinity_list only indicates a preferred destination of the IRQ, not the
*actual* location of the CPU.  So the IRQ is on one of cpu 30 or 62 but not both
simultaneously.

If the case is that 62 is being brought down, then the smp_affinity mask will be
updated to reflect only cpu 30 (and vice versa).

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down

2013-12-04 Thread Prarit Bhargava


On 12/03/2013 06:42 PM, Yu, Fenghua wrote:
> 
> 
>> -Original Message-
>> From: Prarit Bhargava [mailto:pra...@redhat.com]
>>
>> Second try at this ...
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
>>
>> When a cpu is downed on a system, the irqs on the cpu are assigned to
>> other cpus.  It is possible, however, that when a cpu is downed there
>> aren't enough free vectors on the remaining cpus to account for the
>> vectors from the cpu that is being downed.
>>
>> This results in an interesting "overflow" condition where irqs are
>> "assigned" to a CPU but are not handled.
>>
>> For example, when downing cpus on a 1-64 logical processor system:
>>
>> +if (per_cpu(vector_irq, cpu)[vector] < 0)
>> +count++;
> 
> But later on fixup_irqs() will set some of the vector_irq entries to -1 on this
> to-be-disabled cpu. That will release vectors assigned to this cpu. So checking
> vector_irq at this point, before fixup_irqs(), doesn't make sense, right?
> 

Fenghua, IIUC the purpose of fixup_irqs() is to migrate irqs from the
to-be-disabled cpu to other cpus by using affinity, etc.  The to-be-disabled
cpus' irqs are set to -1, however, only after they are migrated off to an
enabled cpu.

This check I'm introducing is well before that stage, in an effort to determine
whether or not the system has "room" for the irqs on the remaining cpus.

So I think that the check is in the right spot.  We should do it early.  I
suppose an argument could be made to do the check in fixup_irqs(), however, my
feeling is by then it is very late in the process of downing the cpu.
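
Roughly, the hook sits in the cpu-disable path as in the sketch below (an
illustration of the idea, not the literal hunk from the patch):

int native_cpu_disable(void)
{
        int ret;

        /* refuse the offline up front if the surviving cpus have no room */
        ret = check_vectors();
        if (ret)
                return ret;     /* propagated back to userspace by the hotplug core */

        clear_local_APIC();
        cpu_disable_common();
        return 0;
}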

I noticed a subtle bug in my patch that I will send with a [v2].  I'm not taking
into account per-cpu interrupts in the check_vectors() function.

P.

> Thanks.
> 
> -Fenghua
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down

2013-12-03 Thread rui wang
On 11/20/13, Prarit Bhargava  wrote:
> Second try at this ...
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
>
> When a cpu is downed on a system, the irqs on the cpu are assigned to other
> cpus.  It is possible, however, that when a cpu is downed there aren't
> enough free vectors on the remaining cpus to account for the vectors from
> the cpu that is being downed.
>
> This results in an interesting "overflow" condition where irqs are
> "assigned" to a CPU but are not handled.
>
> For example, when downing cpus on a 1-64 logical processor system:
>
> 
> [  232.021745] smpboot: CPU 61 is now offline
> [  238.480275] smpboot: CPU 62 is now offline
> [  245.991080] [ cut here ]
> [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264
> dev_watchdog+0x246/0x250()
> [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
> [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support
> sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp
> mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas
> scsi_transport_sas
> [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
> [  246.044451] Hardware name: Intel Corporation S4600LH
> ../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
> [  246.057371]  0009 88081fa03d40 8164fbf6
> 88081fa0ee48
> [  246.065728]  88081fa03d90 88081fa03d80 81054ecc
> 88081fa13040
> [  246.074073]   88200cce 0040
> 
> [  246.082430] Call Trace:
> [  246.085174][] dump_stack+0x46/0x58
> [  246.091633]  [] warn_slowpath_common+0x8c/0xc0
> [  246.098352]  [] warn_slowpath_fmt+0x46/0x50
> [  246.104786]  [] dev_watchdog+0x246/0x250
> [  246.110923]  [] ?
> dev_deactivate_queue.constprop.31+0x80/0x80
> [  246.119097]  [] call_timer_fn+0x3a/0x110
> [  246.125224]  [] ? update_process_times+0x6f/0x80
> [  246.132137]  [] ?
> dev_deactivate_queue.constprop.31+0x80/0x80
> [  246.140308]  [] run_timer_softirq+0x1f0/0x2a0
> [  246.146933]  [] __do_softirq+0xe0/0x220
> [  246.152976]  [] call_softirq+0x1c/0x30
> [  246.158920]  [] do_softirq+0x55/0x90
> [  246.164670]  [] irq_exit+0xa5/0xb0
> [  246.170227]  [] smp_apic_timer_interrupt+0x4a/0x60
> [  246.177324]  [] apic_timer_interrupt+0x6a/0x70
> [  246.184041][] ? cpuidle_enter_state+0x5b/0xe0
> [  246.191559]  [] ? cpuidle_enter_state+0x57/0xe0
> [  246.198374]  [] cpuidle_idle_call+0xbd/0x200
> [  246.204900]  [] arch_cpu_idle+0xe/0x30
> [  246.210846]  [] cpu_startup_entry+0xd0/0x250
> [  246.217371]  [] rest_init+0x77/0x80
> [  246.223028]  [] start_kernel+0x3ee/0x3fb
> [  246.229165]  [] ? repair_env_string+0x5e/0x5e
> [  246.235787]  [] x86_64_start_reservations+0x2a/0x2c
> [  246.242990]  [] x86_64_start_kernel+0xf8/0xfc
> [  246.249610] ---[ end trace fb74fdef54d79039 ]---
> [  246.254807] ixgbe :c2:00.0 p786p1: initiating reset due to tx
> timeout
> [  246.262489] ixgbe :c2:00.0 p786p1: Reset adapter
> Last login: Mon Nov 11 08:35:14 from 10.18.17.119
> [root@(none) ~]# [  246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5
> [  249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
> [  246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5
> [  249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
>
> (last lines keep repeating.  ixgbe driver is dead until module reload.)
>
> If the downed cpu has more vectors than are free on the remaining cpus on
> the
> system, it is possible that some vectors are "orphaned" even though they
> are
> assigned to a cpu.  In this case, since the ixgbe driver had a watchdog,
> the
> watchdog fired and notified that something was wrong.
>
> This patch adds a function, check_vectors(), which compares the number of
> vectors on the CPU going down to the number of vectors available on
> the system.  If there aren't enough vectors for the CPU to go down, an
> error is returned and propagated back to userspace.
>
> Signed-off-by: Prarit Bhargava 
> Cc: x...@kernel.org
> ---
>  arch/x86/include/asm/irq.h |1 +
>  arch/x86/kernel/irq.c  |   33 +
>  arch/x86/kernel/smpboot.c  |6 ++
>  3 files changed, 40 insertions(+)
>
> diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
> index 0ea10f27..dfd7372 100644
> --- a/arch/x86/include/asm/irq.h
> +++ b/arch/x86/include/asm/irq.h
> @@ -25,6 +25,7 @@ extern void irq_ctx_init(int cpu);
>
>  #ifdef CONFIG_HOTPLUG_CPU
>  #include 
> +extern int check_vectors(void);
>  extern void fixup_irqs(void);
>  extern void irq_force_complete_move(int);
>  #endif
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index 22d0687..ea5fa19 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -262,6 +262,39 @@ __visible void 

RE: [PATCH] x86: Add check for number of available vectors before CPU down

2013-12-03 Thread Yu, Fenghua


> -Original Message-
> From: Prarit Bhargava [mailto:pra...@redhat.com]
> 
> Second try at this ...
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
> 
> When a cpu is downed on a system, the irqs on the cpu are assigned to
> other cpus.  It is possible, however, that when a cpu is downed there
> aren't enough free vectors on the remaining cpus to account for the
> vectors from the cpu that is being downed.
> 
> This results in an interesting "overflow" condition where irqs are
> "assigned" to a CPU but are not handled.
> 
> For example, when downing cpus on a 1-64 logical processor system:
> 
> + if (per_cpu(vector_irq, cpu)[vector] < 0)
> + count++;

But later on fixup_irqs() will set some of the vector_irq entries to -1 on this
to-be-disabled cpu. That will release vectors assigned to this cpu. So checking
vector_irq at this point, before fixup_irqs(), doesn't make sense, right?
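
For reference, the quoted test is part of a count of free vector slots on the
cpus that stay online, roughly as in this sketch (not the exact posted code):

int cpu, vector, count = 0;
int this_cpu = smp_processor_id();

for_each_online_cpu(cpu) {
        if (cpu == this_cpu)            /* skip the cpu being offlined */
                continue;
        for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++)
                if (per_cpu(vector_irq, cpu)[vector] < 0)
                        count++;        /* free vector slot */
}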

Thanks.

-Fenghua

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Add check for number of available vectors before CPU down

2013-12-03 Thread rui wang
On 11/20/13, Prarit Bhargava pra...@redhat.com wrote:
 Second try at this ...

 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791

 When a cpu is downed on a system, the irqs on the cpu are assigned to other
 cpus.  It is possible, however, that when a cpu is downed there aren't
 enough free vectors on the remaining cpus to account for the vectors from
 the cpu that is being downed.

 This results in an interesting overflow condition where irqs are
 assigned to a CPU but are not handled.

 For example, when downing cpus on a 1-64 logical processor system:

 snip
 [  232.021745] smpboot: CPU 61 is now offline
 [  238.480275] smpboot: CPU 62 is now offline
 [  245.991080] [ cut here ]
 [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264
 dev_watchdog+0x246/0x250()
 [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
 [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support
 sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp
 mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas
 scsi_transport_sas
 [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
 [  246.044451] Hardware name: Intel Corporation S4600LH
 ../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
 [  246.057371]  0009 88081fa03d40 8164fbf6
 88081fa0ee48
 [  246.065728]  88081fa03d90 88081fa03d80 81054ecc
 88081fa13040
 [  246.074073]   88200cce 0040
 
 [  246.082430] Call Trace:
 [  246.085174]  IRQ  [8164fbf6] dump_stack+0x46/0x58
 [  246.091633]  [81054ecc] warn_slowpath_common+0x8c/0xc0
 [  246.098352]  [81054fb6] warn_slowpath_fmt+0x46/0x50
 [  246.104786]  [815710d6] dev_watchdog+0x246/0x250
 [  246.110923]  [81570e90] ?
 dev_deactivate_queue.constprop.31+0x80/0x80
 [  246.119097]  [8106092a] call_timer_fn+0x3a/0x110
 [  246.125224]  [8106280f] ? update_process_times+0x6f/0x80
 [  246.132137]  [81570e90] ?
 dev_deactivate_queue.constprop.31+0x80/0x80
 [  246.140308]  [81061db0] run_timer_softirq+0x1f0/0x2a0
 [  246.146933]  [81059a80] __do_softirq+0xe0/0x220
 [  246.152976]  [8165fedc] call_softirq+0x1c/0x30
 [  246.158920]  [810045f5] do_softirq+0x55/0x90
 [  246.164670]  [81059d35] irq_exit+0xa5/0xb0
 [  246.170227]  [8166062a] smp_apic_timer_interrupt+0x4a/0x60
 [  246.177324]  [8165f40a] apic_timer_interrupt+0x6a/0x70
 [  246.184041]  EOI  [81505a1b] ? cpuidle_enter_state+0x5b/0xe0
 [  246.191559]  [81505a17] ? cpuidle_enter_state+0x57/0xe0
 [  246.198374]  [81505b5d] cpuidle_idle_call+0xbd/0x200
 [  246.204900]  [8100b7ae] arch_cpu_idle+0xe/0x30
 [  246.210846]  [810a47b0] cpu_startup_entry+0xd0/0x250
 [  246.217371]  [81646b47] rest_init+0x77/0x80
 [  246.223028]  [81d09e8e] start_kernel+0x3ee/0x3fb
 [  246.229165]  [81d0989f] ? repair_env_string+0x5e/0x5e
 [  246.235787]  [81d095a5] x86_64_start_reservations+0x2a/0x2c
 [  246.242990]  [81d0969f] x86_64_start_kernel+0xf8/0xfc
 [  246.249610] ---[ end trace fb74fdef54d79039 ]---
 [  246.254807] ixgbe :c2:00.0 p786p1: initiating reset due to tx
 timeout
 [  246.262489] ixgbe :c2:00.0 p786p1: Reset adapter
 Last login: Mon Nov 11 08:35:14 from 10.18.17.119
 [root@(none) ~]# [  246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5
 [  249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow
 Control: RX/TX
 [  246.792676] ixgbe :c2:00.0 p786p1: detected SFP+: 5
 [  249.231598] ixgbe :c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow
 Control: RX/TX

 (last lines keep repeating.  ixgbe driver is dead until module reload.)

 If the downed cpu has more vectors than are free on the remaining cpus on
 the
 system, it is possible that some vectors are orphaned even though they
 are
 assigned to a cpu.  In this case, since the ixgbe driver had a watchdog,
 the
 watchdog fired and notified that something was wrong.

 This patch adds a function, check_vectors(), which compares the number of
 vectors on the CPU going down to the number of vectors available on
 the system.  If there aren't enough vectors for the CPU to go down, an
 error is returned and propagated back to userspace.

 Signed-off-by: Prarit Bhargava pra...@redhat.com
 Cc: x...@kernel.org
 ---
  arch/x86/include/asm/irq.h |1 +
  arch/x86/kernel/irq.c  |   33 +
  arch/x86/kernel/smpboot.c  |6 ++
  3 files changed, 40 insertions(+)

 diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
 index 0ea10f27..dfd7372 100644
 --- a/arch/x86/include/asm/irq.h
 +++ b/arch/x86/include/asm/irq.h
 @@ -25,6 +25,7 @@ extern void irq_ctx_init(int cpu);

  #ifdef CONFIG_HOTPLUG_CPU