Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
All this discussion is well and good, but I suspect there is a driver setup problem where the interrupt isn't being handled properly. Please retest with the latest version of skge driver (I just pushed patches to netdev about 2min ago). One patch changes to disable IRQ's from device for packets until device is brought up. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
All this discussion is well and good, but I suspect there is a driver setup problem where the interrupt isn't being handled properly. Please retest with the latest version of skge driver (I just pushed patches to netdev about 2min ago). One patch changes to disable IRQ's from device for packets until device is brought up. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Wed, Mar 14, 2007 at 10:46:47PM +0100, Andi Kleen wrote: > Tejun Heo <[EMAIL PROTECTED]> writes: > > > > Let's assume there's a device which shares its INTX IRQ line with > > another device and the other one is already initialized. During boot, > > due to BIOS's fault, bad hardware design or sheer bad luck, the device > > has got a pending IRQ. > > This seems to be also common after kexec during kexec crashdumps > where the device just continues doing what it did before the crash. > That's true. It happens very frequently in kdump case where underlying device can very well have pending interrupts while second kernel is booting. Currently we allow the kernel to disable that irq line and we boot the kernel with "irqpoll" so that device still operates in polling mode. But getting this fixed will help. If device interrupts are enabled only after driver has had a chance to reset the device, things will be better, at least from kdump perspective. Thanks Vivek - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
> Do you mean between disabling IRQ mechanisms and enabling PCI device in > pcim_prepare_device()? Yes. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Do you mean between disabling IRQ mechanisms and enabling PCI device in pcim_prepare_device()? Yes. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Wed, Mar 14, 2007 at 10:46:47PM +0100, Andi Kleen wrote: Tejun Heo [EMAIL PROTECTED] writes: Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. This seems to be also common after kexec during kexec crashdumps where the device just continues doing what it did before the crash. That's true. It happens very frequently in kdump case where underlying device can very well have pending interrupts while second kernel is booting. Currently we allow the kernel to disable that irq line and we boot the kernel with irqpoll so that device still operates in polling mode. But getting this fixed will help. If device interrupts are enabled only after driver has had a chance to reset the device, things will be better, at least from kdump perspective. Thanks Vivek - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Thu, Mar 15, 2007 at 11:37:20AM +0900, Tejun Heo wrote: ... > Also, the current implementation doesn't have any arch independent part. I thnk you meant "arch dependent" here. > It's wholly contained in arch independent PCI layer, but it might be > beneficial to have arch dependent hooks (IRQ line enable/disable?) in > the future. > > >What if the device with the IRQ problem is never loaded? Sometimes > >devices aren't loaded until after boot. > > What do you mean by loading a device? Do you mean loading driver for > the device? Yes, I think that's what he meant. > >Any change like this has to be done without changing device drivers. > >Changing the skge/sky2 drivers as special case is not acceptable. I don't like the idead of changing the driver API for PCI device setup. But if it's necessary to solve this class of problem, I think it's ok. > I dunno about that. What I'm proposing is alternative two-step PCI > initialization step - the first step enables the device just enough for > initialization/reset and the second one enables full access. We're > doing part of it already for bus master. I'm proposing to expand that > approach and make them handled by generic PCI layer. As you can see, it > doesn't add noticeable complexity to drivers. I think it's even clearer > than doing pci_set_master() explicitly. Please update Documentation/pci.txt to reflect the API changes too. > If this way of solving the problem is chosen, eventually most drivers > should be converted to new initialization steps. And there is no way to > do this without modifying low level driver. Only low level driver knows > when full blown access can be enabled and such thing must happen before > registering the device to upper layer (e.g. ATA/SCSI, netif). Agreed. ISTR this has been discussed before but don't recall the exact context. I'll try to find the previous thread. When I started the parisc port on 2.4 kernels, the policy was to leave all interrupts enabled even if no interrupt handler was registered. This is useful for debugging misconfigured IRQ routing. Did the policy already change or is this a proposal to change the policy? thanks, grant > sky2/skge aren't exceptions. If this way of solving the problem is > chosen, eventually most if not all drivers should be converted to new > model. It may take two years, maybe five, but as a start just > converting ATA and network drivers shouldn't take too long and that > would help a lot of cases. > > Thanks. > > -- > tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
[cc'ing Andi, Hi!] Hello, Russell King wrote: > On Wed, Mar 14, 2007 at 06:34:11PM -0400, Jeff Garzik wrote: >> Russell King wrote: >>> pci_enable_device() doesn't deal with this; in most PCI setups I've >>> seen, there is no control at PCI level over whether a device generates >>> an interrupt on the bus. Certainly the memory and io command enables >> PCI grew an interrupt enable while you weren't looking: >> PCI_COMMAND_INTX_DISABLE > > That's fine for devices which conform to the later PCI specs, but not > all do. > >> It was added in PCI 2.3 I think. > > Correct. > >> Older PCI devices certainly do not have this standardized bit. > > No PCI device that I have has that bit - including the raid card I > bought last year... Many recent ATA and network controllers do and most new ones will probably do. > In any case, relying on such a new control bit to implement this kind > of functionality would result in a very hit and miss result; Linux > tends to get used on things other than the bleeding edge of hardware > technology. I don't think INTX_DISABLE is on the bleeding edge of hardware technology and many common cases will benefit from using it (just think about the number of newish notebook users). The problem with INTX_DISABLE is that there doesn't seem to be any way to tell whether writing to that bit is safe or not. You are right in that turning off IRQ mechanisms in pci_enable_device() doesn't fix all the problems as PCI-wise it only enables IO and memory address space access, but to some extent it does because in the arch code, it enables the IRQ line and the physical IRQ line might not be shared even if the final IRQ number is shared (Andi, am I correct)? Anyways, I think the proper solution is to make sure all generic IRQ controls including INTX turned off early in the boot during PCI subsystem initialization (ie. do the disable part of pcim_prepare_device() early in the boot before any IRQ line is requested) and let each driver enable after initialization as necessary and do similar things during resume. Note that drivers still need to be modified to signify when the device is initialized enough to enable IRQ, and bus mastering. We can also arch-dep IRQ enabling to the activation time. That will give us more protection even when INTX_DISABLE is not available. Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Andi Kleen wrote: Tejun Heo <[EMAIL PROTECTED]> writes: Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. This seems to be also common after kexec during kexec crashdumps where the device just continues doing what it did before the crash. This patch expands the pci_set_master() approach. Instead of enabling the device in one go, it's done in two steps - prepare and activate. 'prepare' enables access to PCI configuration, I hope there aren't any new erratas triggered by this. Perhaps it would make sense to add some paranoia sleeps at least before touching other state? Do you mean between disabling IRQ mechanisms and enabling PCI device in pcim_prepare_device()? Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Stephen Hemminger wrote: The problem is the BIOS is busted on these machines. How much effort do we want to put into dealing with systems with broken BIOS? I would rather have the root cause fixed than creating a bandaid that has to be maintained for all the other architectures and platforms. For sky2/skge, it might be caused by broken BIOS. For some ATA devices, it's just the hardware which is designed that way. Also, under non-x86 machines and during resume, there's no BIOS to nudge chips into sane state. This is an existing problem which has to be solved. How much effort we are gonna put into it is certainly debatable. Also, the current implementation doesn't have any arch independent part. It's wholly contained in arch independent PCI layer, but it might be beneficial to have arch dependent hooks (IRQ line enable/disable?) in the future. What if the device with the IRQ problem is never loaded? Sometimes devices aren't loaded until after boot. What do you mean by loading a device? Do you mean loading driver for the device? The patch as posted is probably not a complete solution. We probably need to make sure during early boot and resume that all IRQ / bus master are turned off where possible and let low level drivers enable them as needed and after certain amount of initialization is performed. If you use MSI interrupts, they aren't shared so there isn't a problem. Maybe the root cause of this is bad MSI emulation handling in BIOS. Yes, if MSI is used things are better. Any change like this has to be done without changing device drivers. Changing the skge/sky2 drivers as special case is not acceptable. I dunno about that. What I'm proposing is alternative two-step PCI initialization step - the first step enables the device just enough for initialization/reset and the second one enables full access. We're doing part of it already for bus master. I'm proposing to expand that approach and make them handled by generic PCI layer. As you can see, it doesn't add noticeable complexity to drivers. I think it's even clearer than doing pci_set_master() explicitly. If this way of solving the problem is chosen, eventually most drivers should be converted to new initialization steps. And there is no way to do this without modifying low level driver. Only low level driver knows when full blown access can be enabled and such thing must happen before registering the device to upper layer (e.g. ATA/SCSI, netif). sky2/skge aren't exceptions. If this way of solving the problem is chosen, eventually most if not all drivers should be converted to new model. It may take two years, maybe five, but as a start just converting ATA and network drivers shouldn't take too long and that would help a lot of cases. Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Russell King wrote: In any case, relying on such a new control bit to implement this kind of functionality would result in a very hit and miss result; Linux How does Tejun's patch or thesis rely on this new control bit? He explicitly mentions DISABLE_INTX variability... Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Wed, Mar 14, 2007 at 06:34:11PM -0400, Jeff Garzik wrote: > Russell King wrote: > >pci_enable_device() doesn't deal with this; in most PCI setups I've > >seen, there is no control at PCI level over whether a device generates > >an interrupt on the bus. Certainly the memory and io command enables > > PCI grew an interrupt enable while you weren't looking: > PCI_COMMAND_INTX_DISABLE That's fine for devices which conform to the later PCI specs, but not all do. > It was added in PCI 2.3 I think. Correct. > Older PCI devices certainly do not have this standardized bit. No PCI device that I have has that bit - including the raid card I bought last year... In any case, relying on such a new control bit to implement this kind of functionality would result in a very hit and miss result; Linux tends to get used on things other than the bleeding edge of hardware technology. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Russell King wrote: pci_enable_device() doesn't deal with this; in most PCI setups I've seen, there is no control at PCI level over whether a device generates an interrupt on the bus. Certainly the memory and io command enables PCI grew an interrupt enable while you weren't looking: PCI_COMMAND_INTX_DISABLE No idea about ARM, but almost all PCI devices made in the past few years support that bit. Unless you are using a PCI Express device (maybe PCI-X too?), though, you cannot count on the bit's presence. It was added in PCI 2.3 I think. Older PCI devices certainly do not have this standardized bit. Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Thu, Mar 15, 2007 at 12:23:02AM +0900, Tejun Heo wrote: > The problem is that a PCI device can be in any arbitrary when it gets > enabled and the device has to be enabled for its driver to > initialize/reset it. The most common case this causes headache is as > follows. > > Let's assume there's a device which shares its INTX IRQ line with > another device and the other one is already initialized. pci_enable_device() doesn't deal with this; in most PCI setups I've seen, there is no control at PCI level over whether a device generates an interrupt on the bus. Certainly the memory and io command enables have no effect on the ability of the device to cause an interrupt. In most cases, only the driver of the device knows how to disable interrupts on any particular device. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Tejun Heo <[EMAIL PROTECTED]> writes: > > Let's assume there's a device which shares its INTX IRQ line with > another device and the other one is already initialized. During boot, > due to BIOS's fault, bad hardware design or sheer bad luck, the device > has got a pending IRQ. This seems to be also common after kexec during kexec crashdumps where the device just continues doing what it did before the crash. > This patch expands the pci_set_master() approach. Instead of enabling > the device in one go, it's done in two steps - prepare and activate. > 'prepare' enables access to PCI configuration, I hope there aren't any new erratas triggered by this. Perhaps it would make sense to add some paranoia sleeps at least before touching other state? -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Thu, 15 Mar 2007 00:23:02 +0900 Tejun Heo <[EMAIL PROTECTED]> wrote: > Hello, all. > > This patch started off from the following thread. > > http://thread.gmane.org/gmane.linux.ide/16899 > > The problem is that a PCI device can be in any arbitrary when it gets > enabled and the device has to be enabled for its driver to > initialize/reset it. The most common case this causes headache is as > follows. > > Let's assume there's a device which shares its INTX IRQ line with > another device and the other one is already initialized. During boot, > due to BIOS's fault, bad hardware design or sheer bad luck, the device > has got a pending IRQ. When its driver enables the device, the > pending IRQ hits INTX. The IRQ line has been enabled by the other > driver sharing the IRQ but IRQ handler for this device hasn't been > registered yet. So, screaming interrupts. IRQ subsystem shuts up the > IRQ line in an attempt to save the machine from complete lockup and > both devices end up dead. > > There seem to be other scenarios that this can be triggered as, > reportedly, similar behavior is also observed when IRQ line is not > shared. I suspect similar thing can and is more likely to happen > while resuming as the device can really be in any random state. > > The dilemma here is that 1. the device needs to be enabled to be > initialized/reset 2. enabling the device may cause all hell to break > loose. Note that bus mastering has similar risks and we are currently > dealing with that by doing pci_set_master() from each driver after > certain initialization steps are taken. > > This patch expands the pci_set_master() approach. Instead of enabling > the device in one go, it's done in two steps - prepare and activate. > 'prepare' enables access to PCI configuration, IO, MMIO but prohibits > the device from bus mastering or raising IRQ by adjusting respective > PCI control bits prior to enabling the device. The second step > 'activate' allows the device to bus master and raise IRQs. Typical > initialization would look like the following. > > static int __devinit my_init_one(blah blah) > { > ... > > rc = pcim_prepare_device(pdev); > if (rc) > return rc; > > /* >* reset the controller and initialize msi/msix, whatnot >*/ > > rc = devm_request_irq(>dev, blah blah); > if (rc) > return rc; > > rc = pcim_activate_device(pdev); > if (rc) > return rc; > > /* >* register to upper layer, probe sub devices, etc... >*/ > > return 0; > } > > I've implemented it only as part of resource managed interface because > I didn't want to multiply interface functions with permutations && PCI > devres already has some of the needed feature. If non-managed > interface is ever necessary, that should be doable. > > Anyways, pcim_prepare_device() makes sure bus master is off, INTX (see > gotchas below), MSI and MSIX are disabled, then enables the device. > When it returns, the driver can access the device to initialize it but > the device can't havoc the rest of the system during initialization. > > After most things are set up and IRQ handler is registered, the driver > calls pcim_activate_device() which allows the device to master bus and > raise IRQs. Note that PCI layer keeps track of which IRQ mechanism is > in use and enable only that one. > > Similar thing is done during resume. pci_restore_state() restores PCI > configuration space but makes sure bus master and IRQ mechanisms are > disabled. Driver has to call pcim_activate_device() after it has > restored the device into sane state. > > This change makes init and resume more safe & robust without adding > too much complexity and easily applicable to existing drivers. > > I've converted skge and sky2 as example. skge is only compile tested > and sky2 is lightly tested. > > ** GOTCHAS > > * DISABLE_INTX is relatively new thing. Older devices don't support > the bit and there seems to be no way whether to determine this bit > is implemented or not. > > * Currently bus master bit is set unconditonally on activation. Is > this okay? > > * MSI IRQ masking isn't considered during activation. Can be fixed > easily. > > * Hmmm... prepare/activate? Any better names? > > The patch is against linus #master[M] as of today and converts skge > and sky2 to use new prepare/activate model. > > So, what do you think? > The problem is the BIOS is busted on these machines. How much effort do we want to put into dealing with systems with broken BIOS? I would rather have the root cause fixed than creating a bandaid that has to be maintained for all the other architectures and platforms. What if the device with the IRQ problem is never loaded? Sometimes devices aren't loaded until after boot. If you use MSI interrupts, they aren't shared so there isn't a problem. Maybe the root cause of this is bad MSI emulation
Re: [linux-pm] [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Thu, 15 Mar 2007, Tejun Heo wrote: > Hello, all. > > This patch started off from the following thread. > > http://thread.gmane.org/gmane.linux.ide/16899 > > The problem is that a PCI device can be in any arbitrary when it gets > enabled and the device has to be enabled for its driver to > initialize/reset it. The most common case this causes headache is as > follows. > > Let's assume there's a device which shares its INTX IRQ line with > another device and the other one is already initialized. During boot, > due to BIOS's fault, bad hardware design or sheer bad luck, the device > has got a pending IRQ. When its driver enables the device, the > pending IRQ hits INTX. The IRQ line has been enabled by the other > driver sharing the IRQ but IRQ handler for this device hasn't been > registered yet. So, screaming interrupts. IRQ subsystem shuts up the > IRQ line in an attempt to save the machine from complete lockup and > both devices end up dead. In several cases this problem has indeed come up and been fixed by adding a PCI quirk routine to turn off the device's pending IRQ. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Hello, all. This patch started off from the following thread. http://thread.gmane.org/gmane.linux.ide/16899 The problem is that a PCI device can be in any arbitrary when it gets enabled and the device has to be enabled for its driver to initialize/reset it. The most common case this causes headache is as follows. Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. When its driver enables the device, the pending IRQ hits INTX. The IRQ line has been enabled by the other driver sharing the IRQ but IRQ handler for this device hasn't been registered yet. So, screaming interrupts. IRQ subsystem shuts up the IRQ line in an attempt to save the machine from complete lockup and both devices end up dead. There seem to be other scenarios that this can be triggered as, reportedly, similar behavior is also observed when IRQ line is not shared. I suspect similar thing can and is more likely to happen while resuming as the device can really be in any random state. The dilemma here is that 1. the device needs to be enabled to be initialized/reset 2. enabling the device may cause all hell to break loose. Note that bus mastering has similar risks and we are currently dealing with that by doing pci_set_master() from each driver after certain initialization steps are taken. This patch expands the pci_set_master() approach. Instead of enabling the device in one go, it's done in two steps - prepare and activate. 'prepare' enables access to PCI configuration, IO, MMIO but prohibits the device from bus mastering or raising IRQ by adjusting respective PCI control bits prior to enabling the device. The second step 'activate' allows the device to bus master and raise IRQs. Typical initialization would look like the following. static int __devinit my_init_one(blah blah) { ... rc = pcim_prepare_device(pdev); if (rc) return rc; /* * reset the controller and initialize msi/msix, whatnot */ rc = devm_request_irq(>dev, blah blah); if (rc) return rc; rc = pcim_activate_device(pdev); if (rc) return rc; /* * register to upper layer, probe sub devices, etc... */ return 0; } I've implemented it only as part of resource managed interface because I didn't want to multiply interface functions with permutations && PCI devres already has some of the needed feature. If non-managed interface is ever necessary, that should be doable. Anyways, pcim_prepare_device() makes sure bus master is off, INTX (see gotchas below), MSI and MSIX are disabled, then enables the device. When it returns, the driver can access the device to initialize it but the device can't havoc the rest of the system during initialization. After most things are set up and IRQ handler is registered, the driver calls pcim_activate_device() which allows the device to master bus and raise IRQs. Note that PCI layer keeps track of which IRQ mechanism is in use and enable only that one. Similar thing is done during resume. pci_restore_state() restores PCI configuration space but makes sure bus master and IRQ mechanisms are disabled. Driver has to call pcim_activate_device() after it has restored the device into sane state. This change makes init and resume more safe & robust without adding too much complexity and easily applicable to existing drivers. I've converted skge and sky2 as example. skge is only compile tested and sky2 is lightly tested. ** GOTCHAS * DISABLE_INTX is relatively new thing. Older devices don't support the bit and there seems to be no way whether to determine this bit is implemented or not. * Currently bus master bit is set unconditonally on activation. Is this okay? * MSI IRQ masking isn't considered during activation. Can be fixed easily. * Hmmm... prepare/activate? Any better names? The patch is against linus #master[M] as of today and converts skge and sky2 to use new prepare/activate model. So, what do you think? [M] 2.6.21-rc3 baab1087c61d4506f2c9f4cdb7da162160de16c2 DO NOT APPLY diff --git a/drivers/net/skge.c b/drivers/net/skge.c index eea75a4..b7bf949 100644 --- a/drivers/net/skge.c +++ b/drivers/net/skge.c @@ -3585,9 +3585,9 @@ static int __devinit skge_probe(struct pci_dev *pdev, struct skge_hw *hw; int err, using_dac = 0; - err = pci_enable_device(pdev); + err = pcim_prepare_device(pdev); if (err) { - dev_err(>dev, "cannot enable PCI device\n"); + dev_err(>dev, "cannot prepare PCI device\n"); goto err_out; } @@ -3671,6 +3671,10 @@ static int __devinit skge_probe(struct pci_dev *pdev, } skge_show_addr(dev); + err = pcim_activate_device(pdev); +
[PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Hello, all. This patch started off from the following thread. http://thread.gmane.org/gmane.linux.ide/16899 The problem is that a PCI device can be in any arbitrary when it gets enabled and the device has to be enabled for its driver to initialize/reset it. The most common case this causes headache is as follows. Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. When its driver enables the device, the pending IRQ hits INTX. The IRQ line has been enabled by the other driver sharing the IRQ but IRQ handler for this device hasn't been registered yet. So, screaming interrupts. IRQ subsystem shuts up the IRQ line in an attempt to save the machine from complete lockup and both devices end up dead. There seem to be other scenarios that this can be triggered as, reportedly, similar behavior is also observed when IRQ line is not shared. I suspect similar thing can and is more likely to happen while resuming as the device can really be in any random state. The dilemma here is that 1. the device needs to be enabled to be initialized/reset 2. enabling the device may cause all hell to break loose. Note that bus mastering has similar risks and we are currently dealing with that by doing pci_set_master() from each driver after certain initialization steps are taken. This patch expands the pci_set_master() approach. Instead of enabling the device in one go, it's done in two steps - prepare and activate. 'prepare' enables access to PCI configuration, IO, MMIO but prohibits the device from bus mastering or raising IRQ by adjusting respective PCI control bits prior to enabling the device. The second step 'activate' allows the device to bus master and raise IRQs. Typical initialization would look like the following. static int __devinit my_init_one(blah blah) { ... rc = pcim_prepare_device(pdev); if (rc) return rc; /* * reset the controller and initialize msi/msix, whatnot */ rc = devm_request_irq(pdev-dev, blah blah); if (rc) return rc; rc = pcim_activate_device(pdev); if (rc) return rc; /* * register to upper layer, probe sub devices, etc... */ return 0; } I've implemented it only as part of resource managed interface because I didn't want to multiply interface functions with permutations PCI devres already has some of the needed feature. If non-managed interface is ever necessary, that should be doable. Anyways, pcim_prepare_device() makes sure bus master is off, INTX (see gotchas below), MSI and MSIX are disabled, then enables the device. When it returns, the driver can access the device to initialize it but the device can't havoc the rest of the system during initialization. After most things are set up and IRQ handler is registered, the driver calls pcim_activate_device() which allows the device to master bus and raise IRQs. Note that PCI layer keeps track of which IRQ mechanism is in use and enable only that one. Similar thing is done during resume. pci_restore_state() restores PCI configuration space but makes sure bus master and IRQ mechanisms are disabled. Driver has to call pcim_activate_device() after it has restored the device into sane state. This change makes init and resume more safe robust without adding too much complexity and easily applicable to existing drivers. I've converted skge and sky2 as example. skge is only compile tested and sky2 is lightly tested. ** GOTCHAS * DISABLE_INTX is relatively new thing. Older devices don't support the bit and there seems to be no way whether to determine this bit is implemented or not. * Currently bus master bit is set unconditonally on activation. Is this okay? * MSI IRQ masking isn't considered during activation. Can be fixed easily. * Hmmm... prepare/activate? Any better names? The patch is against linus #master[M] as of today and converts skge and sky2 to use new prepare/activate model. So, what do you think? [M] 2.6.21-rc3 baab1087c61d4506f2c9f4cdb7da162160de16c2 DO NOT APPLY diff --git a/drivers/net/skge.c b/drivers/net/skge.c index eea75a4..b7bf949 100644 --- a/drivers/net/skge.c +++ b/drivers/net/skge.c @@ -3585,9 +3585,9 @@ static int __devinit skge_probe(struct pci_dev *pdev, struct skge_hw *hw; int err, using_dac = 0; - err = pci_enable_device(pdev); + err = pcim_prepare_device(pdev); if (err) { - dev_err(pdev-dev, cannot enable PCI device\n); + dev_err(pdev-dev, cannot prepare PCI device\n); goto err_out; } @@ -3671,6 +3671,10 @@ static int __devinit skge_probe(struct pci_dev *pdev, } skge_show_addr(dev); + err =
Re: [linux-pm] [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Thu, 15 Mar 2007, Tejun Heo wrote: Hello, all. This patch started off from the following thread. http://thread.gmane.org/gmane.linux.ide/16899 The problem is that a PCI device can be in any arbitrary when it gets enabled and the device has to be enabled for its driver to initialize/reset it. The most common case this causes headache is as follows. Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. When its driver enables the device, the pending IRQ hits INTX. The IRQ line has been enabled by the other driver sharing the IRQ but IRQ handler for this device hasn't been registered yet. So, screaming interrupts. IRQ subsystem shuts up the IRQ line in an attempt to save the machine from complete lockup and both devices end up dead. In several cases this problem has indeed come up and been fixed by adding a PCI quirk routine to turn off the device's pending IRQ. Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Thu, 15 Mar 2007 00:23:02 +0900 Tejun Heo [EMAIL PROTECTED] wrote: Hello, all. This patch started off from the following thread. http://thread.gmane.org/gmane.linux.ide/16899 The problem is that a PCI device can be in any arbitrary when it gets enabled and the device has to be enabled for its driver to initialize/reset it. The most common case this causes headache is as follows. Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. When its driver enables the device, the pending IRQ hits INTX. The IRQ line has been enabled by the other driver sharing the IRQ but IRQ handler for this device hasn't been registered yet. So, screaming interrupts. IRQ subsystem shuts up the IRQ line in an attempt to save the machine from complete lockup and both devices end up dead. There seem to be other scenarios that this can be triggered as, reportedly, similar behavior is also observed when IRQ line is not shared. I suspect similar thing can and is more likely to happen while resuming as the device can really be in any random state. The dilemma here is that 1. the device needs to be enabled to be initialized/reset 2. enabling the device may cause all hell to break loose. Note that bus mastering has similar risks and we are currently dealing with that by doing pci_set_master() from each driver after certain initialization steps are taken. This patch expands the pci_set_master() approach. Instead of enabling the device in one go, it's done in two steps - prepare and activate. 'prepare' enables access to PCI configuration, IO, MMIO but prohibits the device from bus mastering or raising IRQ by adjusting respective PCI control bits prior to enabling the device. The second step 'activate' allows the device to bus master and raise IRQs. Typical initialization would look like the following. static int __devinit my_init_one(blah blah) { ... rc = pcim_prepare_device(pdev); if (rc) return rc; /* * reset the controller and initialize msi/msix, whatnot */ rc = devm_request_irq(pdev-dev, blah blah); if (rc) return rc; rc = pcim_activate_device(pdev); if (rc) return rc; /* * register to upper layer, probe sub devices, etc... */ return 0; } I've implemented it only as part of resource managed interface because I didn't want to multiply interface functions with permutations PCI devres already has some of the needed feature. If non-managed interface is ever necessary, that should be doable. Anyways, pcim_prepare_device() makes sure bus master is off, INTX (see gotchas below), MSI and MSIX are disabled, then enables the device. When it returns, the driver can access the device to initialize it but the device can't havoc the rest of the system during initialization. After most things are set up and IRQ handler is registered, the driver calls pcim_activate_device() which allows the device to master bus and raise IRQs. Note that PCI layer keeps track of which IRQ mechanism is in use and enable only that one. Similar thing is done during resume. pci_restore_state() restores PCI configuration space but makes sure bus master and IRQ mechanisms are disabled. Driver has to call pcim_activate_device() after it has restored the device into sane state. This change makes init and resume more safe robust without adding too much complexity and easily applicable to existing drivers. I've converted skge and sky2 as example. skge is only compile tested and sky2 is lightly tested. ** GOTCHAS * DISABLE_INTX is relatively new thing. Older devices don't support the bit and there seems to be no way whether to determine this bit is implemented or not. * Currently bus master bit is set unconditonally on activation. Is this okay? * MSI IRQ masking isn't considered during activation. Can be fixed easily. * Hmmm... prepare/activate? Any better names? The patch is against linus #master[M] as of today and converts skge and sky2 to use new prepare/activate model. So, what do you think? The problem is the BIOS is busted on these machines. How much effort do we want to put into dealing with systems with broken BIOS? I would rather have the root cause fixed than creating a bandaid that has to be maintained for all the other architectures and platforms. What if the device with the IRQ problem is never loaded? Sometimes devices aren't loaded until after boot. If you use MSI interrupts, they aren't shared so there isn't a problem. Maybe the root cause of this is bad MSI emulation handling in BIOS. Any change like this has to be done without changing device drivers. Changing the skge/sky2 drivers
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Tejun Heo [EMAIL PROTECTED] writes: Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. This seems to be also common after kexec during kexec crashdumps where the device just continues doing what it did before the crash. This patch expands the pci_set_master() approach. Instead of enabling the device in one go, it's done in two steps - prepare and activate. 'prepare' enables access to PCI configuration, I hope there aren't any new erratas triggered by this. Perhaps it would make sense to add some paranoia sleeps at least before touching other state? -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Thu, Mar 15, 2007 at 12:23:02AM +0900, Tejun Heo wrote: The problem is that a PCI device can be in any arbitrary when it gets enabled and the device has to be enabled for its driver to initialize/reset it. The most common case this causes headache is as follows. Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. pci_enable_device() doesn't deal with this; in most PCI setups I've seen, there is no control at PCI level over whether a device generates an interrupt on the bus. Certainly the memory and io command enables have no effect on the ability of the device to cause an interrupt. In most cases, only the driver of the device knows how to disable interrupts on any particular device. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Russell King wrote: pci_enable_device() doesn't deal with this; in most PCI setups I've seen, there is no control at PCI level over whether a device generates an interrupt on the bus. Certainly the memory and io command enables PCI grew an interrupt enable while you weren't looking: PCI_COMMAND_INTX_DISABLE No idea about ARM, but almost all PCI devices made in the past few years support that bit. Unless you are using a PCI Express device (maybe PCI-X too?), though, you cannot count on the bit's presence. It was added in PCI 2.3 I think. Older PCI devices certainly do not have this standardized bit. Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Wed, Mar 14, 2007 at 06:34:11PM -0400, Jeff Garzik wrote: Russell King wrote: pci_enable_device() doesn't deal with this; in most PCI setups I've seen, there is no control at PCI level over whether a device generates an interrupt on the bus. Certainly the memory and io command enables PCI grew an interrupt enable while you weren't looking: PCI_COMMAND_INTX_DISABLE That's fine for devices which conform to the later PCI specs, but not all do. It was added in PCI 2.3 I think. Correct. Older PCI devices certainly do not have this standardized bit. No PCI device that I have has that bit - including the raid card I bought last year... In any case, relying on such a new control bit to implement this kind of functionality would result in a very hit and miss result; Linux tends to get used on things other than the bleeding edge of hardware technology. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Russell King wrote: In any case, relying on such a new control bit to implement this kind of functionality would result in a very hit and miss result; Linux How does Tejun's patch or thesis rely on this new control bit? He explicitly mentions DISABLE_INTX variability... Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Stephen Hemminger wrote: The problem is the BIOS is busted on these machines. How much effort do we want to put into dealing with systems with broken BIOS? I would rather have the root cause fixed than creating a bandaid that has to be maintained for all the other architectures and platforms. For sky2/skge, it might be caused by broken BIOS. For some ATA devices, it's just the hardware which is designed that way. Also, under non-x86 machines and during resume, there's no BIOS to nudge chips into sane state. This is an existing problem which has to be solved. How much effort we are gonna put into it is certainly debatable. Also, the current implementation doesn't have any arch independent part. It's wholly contained in arch independent PCI layer, but it might be beneficial to have arch dependent hooks (IRQ line enable/disable?) in the future. What if the device with the IRQ problem is never loaded? Sometimes devices aren't loaded until after boot. What do you mean by loading a device? Do you mean loading driver for the device? The patch as posted is probably not a complete solution. We probably need to make sure during early boot and resume that all IRQ / bus master are turned off where possible and let low level drivers enable them as needed and after certain amount of initialization is performed. If you use MSI interrupts, they aren't shared so there isn't a problem. Maybe the root cause of this is bad MSI emulation handling in BIOS. Yes, if MSI is used things are better. Any change like this has to be done without changing device drivers. Changing the skge/sky2 drivers as special case is not acceptable. I dunno about that. What I'm proposing is alternative two-step PCI initialization step - the first step enables the device just enough for initialization/reset and the second one enables full access. We're doing part of it already for bus master. I'm proposing to expand that approach and make them handled by generic PCI layer. As you can see, it doesn't add noticeable complexity to drivers. I think it's even clearer than doing pci_set_master() explicitly. If this way of solving the problem is chosen, eventually most drivers should be converted to new initialization steps. And there is no way to do this without modifying low level driver. Only low level driver knows when full blown access can be enabled and such thing must happen before registering the device to upper layer (e.g. ATA/SCSI, netif). sky2/skge aren't exceptions. If this way of solving the problem is chosen, eventually most if not all drivers should be converted to new model. It may take two years, maybe five, but as a start just converting ATA and network drivers shouldn't take too long and that would help a lot of cases. Thanks. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Andi Kleen wrote: Tejun Heo [EMAIL PROTECTED] writes: Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. This seems to be also common after kexec during kexec crashdumps where the device just continues doing what it did before the crash. This patch expands the pci_set_master() approach. Instead of enabling the device in one go, it's done in two steps - prepare and activate. 'prepare' enables access to PCI configuration, I hope there aren't any new erratas triggered by this. Perhaps it would make sense to add some paranoia sleeps at least before touching other state? Do you mean between disabling IRQ mechanisms and enabling PCI device in pcim_prepare_device()? Thanks. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
[cc'ing Andi, Hi!] Hello, Russell King wrote: On Wed, Mar 14, 2007 at 06:34:11PM -0400, Jeff Garzik wrote: Russell King wrote: pci_enable_device() doesn't deal with this; in most PCI setups I've seen, there is no control at PCI level over whether a device generates an interrupt on the bus. Certainly the memory and io command enables PCI grew an interrupt enable while you weren't looking: PCI_COMMAND_INTX_DISABLE That's fine for devices which conform to the later PCI specs, but not all do. It was added in PCI 2.3 I think. Correct. Older PCI devices certainly do not have this standardized bit. No PCI device that I have has that bit - including the raid card I bought last year... Many recent ATA and network controllers do and most new ones will probably do. In any case, relying on such a new control bit to implement this kind of functionality would result in a very hit and miss result; Linux tends to get used on things other than the bleeding edge of hardware technology. I don't think INTX_DISABLE is on the bleeding edge of hardware technology and many common cases will benefit from using it (just think about the number of newish notebook users). The problem with INTX_DISABLE is that there doesn't seem to be any way to tell whether writing to that bit is safe or not. You are right in that turning off IRQ mechanisms in pci_enable_device() doesn't fix all the problems as PCI-wise it only enables IO and memory address space access, but to some extent it does because in the arch code, it enables the IRQ line and the physical IRQ line might not be shared even if the final IRQ number is shared (Andi, am I correct)? Anyways, I think the proper solution is to make sure all generic IRQ controls including INTX turned off early in the boot during PCI subsystem initialization (ie. do the disable part of pcim_prepare_device() early in the boot before any IRQ line is requested) and let each driver enable after initialization as necessary and do similar things during resume. Note that drivers still need to be modified to signify when the device is initialized enough to enable IRQ, and bus mastering. We can also arch-dep IRQ enabling to the activation time. That will give us more protection even when INTX_DISABLE is not available. Thanks. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
On Thu, Mar 15, 2007 at 11:37:20AM +0900, Tejun Heo wrote: ... Also, the current implementation doesn't have any arch independent part. I thnk you meant arch dependent here. It's wholly contained in arch independent PCI layer, but it might be beneficial to have arch dependent hooks (IRQ line enable/disable?) in the future. What if the device with the IRQ problem is never loaded? Sometimes devices aren't loaded until after boot. What do you mean by loading a device? Do you mean loading driver for the device? Yes, I think that's what he meant. Any change like this has to be done without changing device drivers. Changing the skge/sky2 drivers as special case is not acceptable. I don't like the idead of changing the driver API for PCI device setup. But if it's necessary to solve this class of problem, I think it's ok. I dunno about that. What I'm proposing is alternative two-step PCI initialization step - the first step enables the device just enough for initialization/reset and the second one enables full access. We're doing part of it already for bus master. I'm proposing to expand that approach and make them handled by generic PCI layer. As you can see, it doesn't add noticeable complexity to drivers. I think it's even clearer than doing pci_set_master() explicitly. Please update Documentation/pci.txt to reflect the API changes too. If this way of solving the problem is chosen, eventually most drivers should be converted to new initialization steps. And there is no way to do this without modifying low level driver. Only low level driver knows when full blown access can be enabled and such thing must happen before registering the device to upper layer (e.g. ATA/SCSI, netif). Agreed. ISTR this has been discussed before but don't recall the exact context. I'll try to find the previous thread. When I started the parisc port on 2.4 kernels, the policy was to leave all interrupts enabled even if no interrupt handler was registered. This is useful for debugging misconfigured IRQ routing. Did the policy already change or is this a proposal to change the policy? thanks, grant sky2/skge aren't exceptions. If this way of solving the problem is chosen, eventually most if not all drivers should be converted to new model. It may take two years, maybe five, but as a start just converting ATA and network drivers shouldn't take too long and that would help a lot of cases. Thanks. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/