Re: Observing Softlockup's while running heavy IOs
On Mon, Sep 12, 2016 at 01:48:39PM +0530, Sreekanth Reddy wrote:
[...]
Re: Observing Softlockup's while running heavy IOs
On Thu, Sep 8, 2016 at 7:09 PM, Neil Horman wrote:
[...]
Re: Observing Softlockup's while running heavy IOs
On Thu, Sep 08, 2016 at 11:12:40AM +0530, Sreekanth Reddy wrote:
[...]
Re: Observing Softlockup's while running heavy IOs
On Wed, Sep 7, 2016 at 6:54 PM, Neil Horman wrote:
[...]
Re: Observing Softlockup's while running heavy IOs
On Wed, Sep 07, 2016 at 11:30:04AM +0530, Sreekanth Reddy wrote:
[...]
> Hi Neil,
>
> Thanks for reply.
>
> Today I tried with setting balance_level to 'cache' for mpt3sas driver
> IRQ's using below policy script and used 1.0.9 versioned irqbalance,
> --
> #!/bin/bash
> # Header
> # Linux Shell Scripting for Irq Balance Policy select for mpt3sas driver
> #
>
> # Command Line Args
>
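The policy script quoted above is truncated in the archive. As a hedged sketch only — not the script from the original mail — a minimal policy script of the kind Neil suggests could look like this. irqbalance invokes a --policyscript once per IRQ, passing the device's sysfs path and the IRQ number as arguments, and reads key=value settings (such as balance_level) from its stdout. The mpt-matching heuristic and the optional file argument (added so the match can be exercised against a canned snapshot instead of /proc/interrupts) are our assumptions.

```shell
#!/bin/sh
# Hypothetical irqbalance policy script -- NOT the truncated one above.
# irqbalance calls it as: <script> <sysfs device path> <irq number>
# and parses key=value lines from stdout.

# print_policy IRQ [INTERRUPTS_FILE]
# Emit "balance_level=cache" for IRQs owned by mpt2sas/mpt3sas; print
# nothing for other IRQs so irqbalance keeps its default policy.
# The optional file argument exists only to make the match testable;
# normally the IRQ is looked up in /proc/interrupts.
print_policy() {
    irq=$1
    interrupts=${2:-/proc/interrupts}
    if grep -q "^ *${irq}:.*mpt[23]sas" "$interrupts" 2>/dev/null; then
        echo "balance_level=cache"
    fi
}

# $1 is the sysfs device path (unused in this sketch), $2 the IRQ number.
print_policy "$2"
```

With irqbalance 1.0.9 (the version used above) this would be wired up via the --policyscript option that Neil mentions; consult irqbalance(1) for the exact set of recognized keys.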
Re: Observing Softlockup's while running heavy IOs
On Tue, Sep 06, 2016 at 04:52:37PM +0530, Sreekanth Reddy wrote:
[...]
> Bart,
>
> Thanks a lot for providing lot of inputs and valuable information on
> this issue.
>
> Today I got one more observation, i.e. I am not observing any lockups
> if I use 1.0.4-6 versioned irqbalance, since this versioned irqbalance
> is able to shift the load to other CPU when one CPU is heavily loaded.

This isn't happening because irqbalance is no longer able to shift load
between cpus, it's happening because of commit
996ee2cf7a4d10454de68ac4978adb5cf22850f8: irqs with higher interrupt
volumes should be balanced to a specific cpu core, rather than to a cache
domain, to maximize cpu-local cache hit rates. Prior to that change we
balanced to a cache domain and your workload didn't have to serialize
multiple interrupts to a single core.
My suggestion to you is to use the --policyscript option to make your
storage irqs get balanced to the cache level, rather than the core level.
That should return the behavior to what you want.

Neil

> while running heavy IOs, for first few seconds here is my driver irq's
> attributes,
>
> ioc number = 0
> number of core processors = 24
> msix vector count = 2
> number of cores per msix vector = 16
>
> msix index = 0, irq number = 50, smp_affinity = 40
>                 affinity_hint = 000fff
> msix index = 1, irq number = 51, smp_affinity = 001000
>                 affinity_hint = fff000
>
> We have set affinity for 2 msix vectors and 24 core processors
> --
>
> After few seconds it observed that CPU12 is heavily loaded for IRQ 51
> and it changed the smp_affinity to CPU21
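Observations like the one above (CPU12 servicing all of IRQ 51's load) can be confirmed by diffing two snapshots of /proc/interrupts. A small sketch — the helper name and the file arguments are ours, added only so the parsing can be tested against canned snapshots:

```shell
#!/bin/sh
# Report per-CPU interrupt-count deltas for one IRQ between two snapshots
# of /proc/interrupts, to see which CPU is actually servicing it.
# irq_delta IRQ BEFORE_FILE AFTER_FILE
irq_delta() {
    awk -v irq="$1" '
        $1 == irq":" {
            # Fields 2..N are per-CPU counts, up to the first non-numeric
            # field (the interrupt type / device name).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                c[(FNR == NR ? "b" : "a"), i] = $i
            n = i - 1
        }
        END {
            for (i = 2; i <= n; i++)
                printf "CPU%d %d\n", i - 2, c["a", i] - c["b", i]
        }
    ' "$2" "$3"
}

# Typical use (IRQ 51 as in the data above):
#   cat /proc/interrupts > /tmp/before; sleep 5; cat /proc/interrupts > /tmp/after
#   irq_delta 51 /tmp/before /tmp/after
```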
Re: Observing Softlockup's while running heavy IOs
On 09/01/2016 03:31 AM, Sreekanth Reddy wrote:
[...]
Hello Sreekanth,

To me this sounds like something that should be implemented in the I/O
chipset on the motherboard. If you have a look at the Intel Software
Developer Manuals then you will see that logical destination mode supports
round-robin interrupt delivery. However, the Linux kernel selects physical
destination mode on systems with more than eight logical CPUs (see also
arch/x86/kernel/apic/apic_flat_64.c).

I'm not sure the maintainers of the interrupt subsystem would welcome code
that emulates round-robin interrupt delivery. So your best option is
probably to minimize the amount of work that is done in interrupt context
and to move as much work as possible out of interrupt context in such a
way that it can be spread over multiple CPU cores, e.g. by using
queue_work_on().

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Observing Softlockup's while running heavy IOs
On Fri, Aug 19, 2016 at 9:26 PM, Bart Van Assche wrote:
> On 08/19/2016 04:44 AM, Sreekanth Reddy wrote:
>>
>> [ +0.000439] __blk_mq_run_hw_queue() finished after 10058 ms
>> [ ... ]
>> [ +0.05] [] ? finish_task_switch+0x6b/0x200
>> [ +0.06] [] __schedule+0x36c/0x950
>> [ +0.02] [] schedule+0x37/0x80
>> [ +0.06] [] futex_wait_queue_me+0xbc/0x120
>> [ +0.04] [] futex_wait+0x119/0x270
>> [ +0.04] [] ? futex_wake+0x90/0x180
>> [ +0.03] [] do_futex+0x12b/0xb00
>> [ +0.05] [] ? set_next_entity+0x23e/0x440
>> [ +0.07] [] ? __switch_to+0x261/0x4b0
>> [ +0.04] [] SyS_futex+0x81/0x180
>> [ +0.02] [] ? schedule+0x37/0x80
>> [ +0.04] [] entry_SYSCALL_64_fastpath+0x12/0x71
>
> Hello Sreekanth,
>
> If a "soft lockup" is reported that often means that kernel code is iterating too long in a loop without giving up the CPU. Inserting a cond_resched() call in such loops usually resolves these soft lockup complaints. However, your latest e-mail shows that the soft lockup complaint was reported on other code than __blk_mq_run_hw_queue(). I'm afraid this means that the CPU on which the soft lockup was reported is hammered so hard with interrupts that hardly any time remains for the scheduler to run code on that CPU. You will have to follow Robert Elliott's advice and reduce the time that is spent per CPU in interrupt context.

Sorry for the delay in response; I was on vacation.

Bart, I reduced the ISR workload by one third in order to reduce the time that is spent per CPU in interrupt context; even then I am observing softlockups. As I mentioned before, only the same single CPU in the set of CPUs (enabled in affinity_hint) is busy with handling the interrupts from the corresponding IRQx. I have done the below experiment in the driver to limit these softlockups/hardlockups.
But I am not sure whether it is reasonable to do this in the driver.

Experiment: if CPUx is continuously busy handling, in the same ISR context, I/O completions belonging to the remote CPUs (enabled in the corresponding IRQ's affinity_hint) amounting to 1/4th of the HBA queue depth, then a flag called 'change_smp_affinity' is enabled for this IRQ. I also created a thread which polls this flag for every IRQ (enabled by the driver) every second. If this thread sees that the flag is enabled for any IRQ, it writes the next CPU number from the CPUs enabled in the IRQ's affinity_hint to the IRQ's smp_affinity procfs attribute using the 'call_usermodehelper()' API.

This is to make sure that interrupts are not processed by the same single CPU all the time, and to make the other CPUs handle the interrupts if the current CPU is continuously busy with handling the other CPUs' I/O interrupts.

For example, consider a system which has 8 logical CPUs and one MSI-X vector enabled in the driver (say IRQ 120), with an HBA queue depth of 8K. Then the IRQ's procfs attributes will be: IRQ# 120, affinity_hint=0xff, smp_affinity=0x00.

After starting heavy I/Os, we will observe that only CPU0 is busy with handling the interrupts. The experimental driver will change the smp_affinity to the next CPU number, i.e. 0x01 (using the command 'echo 0x01 > /proc/irq/120/smp_affinity'; the driver issues this command using the call_usermodehelper() API), if it observes that CPU0 is continuously processing more than 2K of I/O replies of other CPUs, i.e. from CPU1 to CPU7.

Is doing this kind of thing in the driver OK?

Thanks,
Sreekanth

> Bart.
RE: Observing Softlockup's while running heavy IOs
> -Original Message-
> From: Elliott, Robert (Persistent Memory) [mailto:elli...@hpe.com]
> Sent: Saturday, August 20, 2016 2:58 AM
> To: Sreekanth Reddy
> Cc: linux-scsi@vger.kernel.org; linux-ker...@vger.kernel.org; irqbala...@lists.infradead.org; Kashyap Desai; Sathya Prakash Veerichetty; Chaitra Basappa; Suganath Prabu Subramani
> Subject: RE: Observing Softlockup's while running heavy IOs
>
> > -Original Message-
> > From: Sreekanth Reddy [mailto:sreekanth.re...@broadcom.com]
> > Sent: Friday, August 19, 2016 6:45 AM
> > To: Elliott, Robert (Persistent Memory)
> > Subject: Re: Observing Softlockup's while running heavy IOs
> > ...
> > Yes I am also observing that all the interrupts are routed to one CPU. But I am still observing softlockups (sometimes hardlockups) even when I set rq_affinity to 2.

How about the below scenario? For simplicity, consider an HBA with a single MSI-X vector. (Whenever the HBA supports fewer MSI-X vectors than the number of logical CPUs on the system, there is a good chance of seeing these issues frequently.) Assume we have 32 logical CPUs (4 sockets, each with 8 logical CPUs). CPU-0 is not participating in IO; the remaining CPUs, 1 to 31, are submitting IO. In such a scenario, rq_affinity=2 and an irqbalance supporting *exact* smp_affinity_hint will not help. We may see a soft/hard lockup on CPU-0. Are we going to resolve such issues, or are they very rare in the field?

> That'll ensure the block layer's completion handling is done there, but not your driver's interrupt handler (which precedes the block layer completion handling).
>
> > Is there any way to route the interrupts to the same CPUs which submitted the corresponding IOs?
> > or
> > Is there any way/option in irqbalance/the kernel which can route interrupts to CPUs (enabled in affinity_hint) in a round-robin manner after a specific time period?
> Ensure your driver creates one MSIX interrupt per CPU core, uses that interrupt for all submissions from that core, and reports that it would like that interrupt to be serviced by that core in /proc/irq/nnn/affinity_hint.
>
> Even with hyperthreading, this needs to be based on the logical CPU cores, not just the physical core or the physical socket. You can swamp a logical CPU core as easily as a physical CPU core.
>
> Then, provide an irqbalance policy script that honors the affinity_hint for your driver, or turn off irqbalance and manually set /proc/irq/nnn/smp_affinity to match the affinity_hint.
>
> Some versions of irqbalance honor the hints; some purposely don't and need to be overridden with a policy script.
>
> ---
> Robert Elliott, HPE Persistent Memory
RE: Observing Softlockup's while running heavy IOs
> -Original Message-
> From: Sreekanth Reddy [mailto:sreekanth.re...@broadcom.com]
> Sent: Friday, August 19, 2016 6:45 AM
> To: Elliott, Robert (Persistent Memory)
> Subject: Re: Observing Softlockup's while running heavy IOs
> ...
> Yes I am also observing that all the interrupts are routed to one CPU. But I am still observing softlockups (sometimes hardlockups) even when I set rq_affinity to 2.

That'll ensure the block layer's completion handling is done there, but not your driver's interrupt handler (which precedes the block layer completion handling).

> Is there any way to route the interrupts to the same CPUs which submitted the corresponding IOs?
> or
> Is there any way/option in irqbalance/the kernel which can route interrupts to CPUs (enabled in affinity_hint) in a round-robin manner after a specific time period?

Ensure your driver creates one MSI-X interrupt per CPU core, uses that interrupt for all submissions from that core, and reports that it would like that interrupt to be serviced by that core in /proc/irq/nnn/affinity_hint.

Even with hyperthreading, this needs to be based on the logical CPU cores, not just the physical core or the physical socket. You can swamp a logical CPU core as easily as a physical CPU core.

Then, provide an irqbalance policy script that honors the affinity_hint for your driver, or turn off irqbalance and manually set /proc/irq/nnn/smp_affinity to match the affinity_hint.

Some versions of irqbalance honor the hints; some purposely don't and need to be overridden with a policy script.

---
Robert Elliott, HPE Persistent Memory
Re: Observing Softlockup's while running heavy IOs
On 08/19/2016 04:44 AM, Sreekanth Reddy wrote:

[ +0.000439] __blk_mq_run_hw_queue() finished after 10058 ms
[ ... ]
[ +0.05] [] ? finish_task_switch+0x6b/0x200
[ +0.06] [] __schedule+0x36c/0x950
[ +0.02] [] schedule+0x37/0x80
[ +0.06] [] futex_wait_queue_me+0xbc/0x120
[ +0.04] [] futex_wait+0x119/0x270
[ +0.04] [] ? futex_wake+0x90/0x180
[ +0.03] [] do_futex+0x12b/0xb00
[ +0.05] [] ? set_next_entity+0x23e/0x440
[ +0.07] [] ? __switch_to+0x261/0x4b0
[ +0.04] [] SyS_futex+0x81/0x180
[ +0.02] [] ? schedule+0x37/0x80
[ +0.04] [] entry_SYSCALL_64_fastpath+0x12/0x71

Hello Sreekanth,

If a "soft lockup" is reported that often means that kernel code is iterating too long in a loop without giving up the CPU. Inserting a cond_resched() call in such loops usually resolves these soft lockup complaints. However, your latest e-mail shows that the soft lockup complaint was reported on other code than __blk_mq_run_hw_queue(). I'm afraid this means that the CPU on which the soft lockup was reported is hammered so hard with interrupts that hardly any time remains for the scheduler to run code on that CPU. You will have to follow Robert Elliott's advice and reduce the time that is spent per CPU in interrupt context.

Bart.
Re: Observing Softlockup's while running heavy IOs
First of all, thanks to Robert and Bart for the replies.

Robert, thanks for the URL; I have gone through it. Yes, I am also observing that all the interrupts are routed to one CPU, but I am still observing softlockups (sometimes hardlockups) even when I set rq_affinity to 2.

Is there any way to route the interrupts to the same CPUs which submitted the corresponding IOs? Or is there any way/option in irqbalance/the kernel which can route interrupts to the CPUs (enabled in affinity_hint) in a round-robin manner after a specific time period?

Bart, I have tried with your patch and here are the logs:

[Aug19 13:48] __blk_mq_run_hw_queue() finished after 1 ms
[ +1.196454] __blk_mq_run_hw_queue() finished after 11 ms
[Aug19 13:49] __blk_mq_run_hw_queue() finished after 20 ms
[ +14.173018] __blk_mq_run_hw_queue() finished after 278 ms
[ +14.066448] __blk_mq_run_hw_queue() finished after 588 ms
[ +5.394698] __blk_mq_run_hw_queue() finished after 1360 ms
[Aug19 13:51] __blk_mq_run_hw_queue() finished after 1539 ms
[Aug19 13:54] __blk_mq_run_hw_queue() finished after 1762 ms
[Aug19 13:55] __blk_mq_run_hw_queue() finished after 2087 ms
[Aug19 13:57] __blk_mq_run_hw_queue() finished after 2915 ms
[Aug19 14:06] perf interrupt took too long (2509 > 2500), lowering kernel.perf_event_max_sample_rate to 5
[Aug19 14:10] __blk_mq_run_hw_queue() finished after 3266 ms
[Aug19 14:15] __blk_mq_run_hw_queue() finished after 3953 ms
[Aug19 14:22] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s!
[llvmpipe-9:3152]
[ +0.000439] __blk_mq_run_hw_queue() finished after 10058 ms
[ +0.007206] Modules linked in: mpt3sas(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables raid456 async_raid6_recov async_memcpy async_pq async_xor xor intel_rapl async_tx iosf_mbi x86_pkg_temp_thermal raid6_pq coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me joydev pcspkr sb_edac i2c_i801 mei iTCO_wdt iTCO_vendor_support edac_core lpc_ich ipmi_ssif ipmi_si ipmi_msghandler shpchp tpm_tis ioatdma acpi_power_meter tpm wmi acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c ast i2c_algo_bit drm_kms_helper ttm ixgbe drm mdio vxlan ip6_udp_tunnel udp_tunnel crc32c_intel raid_class ptp scsi_transport_sas
[ +0.63] pps_core nvme dca [last unloaded: mpt3sas]
[ +0.07] CPU: 6 PID: 3152 Comm: llvmpipe-9 Tainted: GW OE 4.2.0 #1
[ +0.02] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+, BIOS 1.1 07/22/2015
[ +0.03] task: 883f5cf557c0 ti: 883f5afd8000 task.ti: 883f5afd8000
[ +0.02] RIP: 0010:[] [] __do_softirq+0x7b/0x290
[ +0.08] RSP: :883f7f183f08 EFLAGS: 0206
[ +0.02] RAX: 883f5afdc000 RBX: 883f7f190080 RCX: 06e0
[ +0.02] RDX: 3508 RSI: 71c139c0 RDI: 883f5cf557c0
[ +0.02] RBP: 883f7f183f68 R08: 3508717f8da4 R09: 883f7f183d80
[ +0.02] R10: R11: 0004 R12: 883f7f183e78
[ +0.01] R13: 8177304b R14: 883f7f183f68 R15:
[ +0.03] FS: 7fa76b7f6700() GS:883f7f18() knlGS:

Message from syslogd@dhcp-135-24-192-127 at Aug 19 14:22:42 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s!
[llvmpipe-9:3152]
[ +0.02] CS: 0010 DS: ES: CR0: 80050033
[ +0.02] CR2: 7fa03a2d90c0 CR3: 003f61fe4000 CR4: 001406e0
[ +0.02] Stack:
[ +0.01] 883f7f18fc40 404040407f18fce8 883f5afdc000 000100aecc1a
[ +0.04] 0ab1000a 02027f18fc58 883f7f183f48
[ +0.03] 883f7f1967c0 883f448d57c0 0001
[ +0.03] Call Trace:
[ +0.02]
[ +0.05] [] irq_exit+0x116/0x120
[ +0.05] [] smp_apic_timer_interrupt+0x46/0x60
[ +0.05] [] apic_timer_interrupt+0x6b/0x70
[ +0.02]
[ +0.05] [] ? finish_task_switch+0x6b/0x200
[ +0.06] [] __schedule+0x36c/0x950
[ +0.02] [] schedule+0x37/0x80
[ +0.06] [] futex_wait_queue_me+0xbc/0x120
[ +0.04] [] futex_wait+0x119/0x270
[ +0.04] [] ? futex_wake+0x90/0x180
[ +0.03] [] do_futex+0x12b/0xb00
[ +0.05] [] ? set_next_entity+0x23e/0x440
[ +0.07] [] ? __switch_to+0x261/0x4b0
[ +0.04] [] SyS_futex+0x81/0x180
[ +0.02] [] ? schedule+0x37/0x80
[ +0.04] [] entry_SYSCALL_64_fastpath+0x12/0x71
[ +0.01] Code: 7e 00 01 00 00 65 48 8b 04 25 c4 3c 01 00 c7 45 c0 0a 00 00 00 48 89 45 b0 65 c7 05 6c 26 f7 7e 00 00 00 00 fb 66 0f 1f 44 00 00 ff ff ff ff 49 c7 c4 c0 a0 c0 81 0f bc 45 cc 83 c0 01 89 45

and here is the output of 'sar -u ALL 1 -P ALL 1':

02:22:43 PM  CPU  %usr  %nice  %sys  %iowait  %steal  %irq  %soft  %guest  %gnice  %idle
02:22:44 PM  all  5.73  0.60
Re: Observing Softlockup's while running heavy IOs
On 08/17/16 22:55, Sreekanth Reddy wrote:
> Observing softlockups while running heavy IOs on 8 SSD drives
> connected behind our LSI SAS 3004 HBA.

Hello Sreekanth,

This means that more than 23s was spent before the scheduler was invoked, probably due to a loop. Can you give the attached (untested) patch a try to see whether it is the loop in __blk_mq_run_hw_queue()?

Thanks,

Bart.

From 4da94f2ec37ee5d1b4a5f1ce2886bdafd5cd394c Mon Sep 17 00:00:00 2001
From: Bart Van Assche
Date: Thu, 18 Aug 2016 07:51:49 -0700
Subject: [PATCH] block: Measure __blk_mq_run_hw_queue() execution time

Note: the "max_elapsed" variable can be modified by multiple threads
concurrently.
---
 block/blk-mq.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e931a0e..6d0961c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -792,6 +792,9 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	LIST_HEAD(driver_list);
 	struct list_head *dptr;
 	int queued;
+	static long max_elapsed = -1;
+	unsigned long start = jiffies;
+	long elapsed;
 
 	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
@@ -889,6 +892,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 		 */
 		blk_mq_run_hw_queue(hctx, true);
 	}
+
+	elapsed = jiffies - start;
+	if (elapsed > max_elapsed) {
+		max_elapsed = elapsed;
+		pr_info("%s() finished after %d ms\n", __func__,
+			jiffies_to_msecs(elapsed));
+	}
 }
-- 
2.9.2
RE: Observing Softlockup's while running heavy IOs
> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Sreekanth Reddy
> Sent: Thursday, August 18, 2016 12:56 AM
> Subject: Observing Softlockup's while running heavy IOs
>
> Problem statement:
> Observing softlockups while running heavy IOs on 8 SSD drives connected behind our LSI SAS 3004 HBA.
> ...
> Observing a loop in the IO path, i.e. only one CPU is busy with processing the interrupts while the other CPUs (in the affinity_hint mask) are busy with sending the IOs (these CPUs are not at all receiving any interrupts). For example, only CPU6 is busy with processing the interrupts from IRQ 219, and the remaining CPUs, i.e. CPU 7, 8, 9, 10 & 11, are just busy with pumping the IOs and never process any IO interrupts from IRQ 219. So we are observing softlockups due to the existence of this loop in the IO path.
>
> We may not observe these softlockups if the irqbalancer had balanced the interrupts among the CPUs enabled in the particular IRQ's affinity_hint mask, so that all the CPUs are equally busy with sending IOs and processing the interrupts. I am not sure how the irqbalancer balances the load among the CPUs, but here I see that only one CPU from the IRQ's affinity_hint mask is busy with interrupts, and the remaining CPUs never receive any interrupts from this IRQ.
>
> Please help me with any suggestions/recommendations to solve/limit these kinds of softlockups. Also please let me know if I have missed any setting in irqbalance.

The CPUs need to be forced to self-throttle by processing interrupts for their own submissions, which reduces the time they can submit more IOs. See https://lkml.org/lkml/2014/9/9/931 for discussion of this problem when blk-mq was added.

---
Robert Elliott, HPE Persistent Memory