Re: [PATCH] eeh: Fixing a bug when pci structure is null
On 2/19/2010 1:54 PM, Benjamin Herrenschmidt wrote: On Fri, 2010-02-19 at 14:43 -0200, Breno Leitao wrote: Hi Ben, I'd like to ask about this patch ? Should I re-submit ? Thanks, Breno Leitao wrote: During a EEH recover, the pci_dev structure can be null, mainly if an eeh event is detected during cpi config operation. In this case, the pci_dev will not be known (and will be null) the kernel will crash with the following message: It should be in -next, can you dbl check ? I just confirmed the patch is in the -next tree. Mike Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/3] Support for PCI Express reset type
This is the first of three patches that implement a bit field that PCI Express device drivers can use to indicate they need a fundamental reset during error recovery. By default, the EEH framework on powerpc does what's known as a hot reset during recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction. The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev that indicates whether the device requires a fundamental reset during recovery. These patches supersede the previously submitted patch that implemented a fundamental reset bit field. Please review and let me know of any concerns. Signed-off-by: Mike Mason mm...@us.ibm.com Signed-off-by: Richard Lary rl...@us.ibm.com diff -uNrp a/include/linux/pci.h b/include/linux/pci.h --- a/include/linux/pci.h 2009-07-13 14:25:37.0 -0700 +++ b/include/linux/pci.h 2009-07-15 10:25:37.0 -0700 @@ -273,6 +273,7 @@ struct pci_dev { unsigned intari_enabled:1; /* ARI forwarding */ unsigned intis_managed:1; unsigned intis_pcie:1; + unsigned intneeds_freset:1; /* Dev requires fundamental reset */ unsigned intstate_saved:1; unsigned intis_physfn:1; unsigned intis_virtfn:1; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/3] Support for PCI Express reset type
This is the second of three patches that implement a bit field that PCI Express device drivers can use to indicate they need a fundamental reset during error recovery. By default, the EEH framework on powerpc does what's known as a hot reset during recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction. The attached patch updates the Documentation/PCI/pci-error-recovery.txt file with changes related to this new bit field, as well a few unrelated updates. These patches supersede the previously submitted patch that implemented a fundamental reset bit field. Please review and let me know of any concerns. Signed-off-by: Mike Mason mm...@us.ibm.com Signed-off-by: Richard Lary rl...@us.ibm.com --- a/Documentation/PCI/pci-error-recovery.txt 2009-06-09 20:05:27.0 -0700 +++ b/Documentation/PCI/pci-error-recovery.txt 2009-07-30 13:57:15.0 -0700 @@ -4,15 +4,17 @@ February 2, 2006 Current document maintainer: - Linas Vepstas li...@austin.ibm.com + Linas Vepstas linasveps...@gmail.com + updated by Richard Lary rl...@us.ibm.com + and Mike Mason mm...@us.ibm.com on 27-Jul-2009 Many PCI bus controllers are able to detect a variety of hardware PCI errors on the bus, such as parity errors on the data and address busses, as well as SERR and PERR errors. Some of the more advanced chipsets are able to deal with these errors; these include PCI-E chipsets, -and the PCI-host bridges found on IBM Power4 and Power5-based pSeries -boxes. A typical action taken is to disconnect the affected device, +and the PCI-host bridges found on IBM Power4, Power5 and Power6-based +pSeries boxes. A typical action taken is to disconnect the affected device, halting all I/O to it. The goal of a disconnection is to avoid system corruption; for example, to halt system memory corruption due to DMA's to wild addresses. Typically, a reconnection mechanism is also @@ -37,10 +39,11 @@ is forced by the need to handle multi-fu devices that have multiple device drivers associated with them. In the first stage, each driver is allowed to indicate what type of reset it desires, the choices being a simple re-enabling of I/O -or requesting a hard reset (a full electrical #RST of the PCI card). -If any driver requests a full reset, that is what will be done. +or requesting a slot reset. -After a full reset and/or a re-enabling of I/O, all drivers are +If any driver requests a slot reset, that is what will be done. + +After a reset and/or a re-enabling of I/O, all drivers are again notified, so that they may then perform any device setup/config that may be required. After these have all completed, a final resume normal operations event is sent out. @@ -101,7 +104,7 @@ if it implements any, it must implement is not implemented, the corresponding feature is considered unsupported. For example, if mmio_enabled() and resume() aren't there, then it is assumed that the driver is not doing any direct recovery and requires -a reset. If link_reset() is not implemented, the card is assumed as +a slot reset. If link_reset() is not implemented, the card is assumed to not care about link resets. Typically a driver will want to know about a slot_reset(). @@ -111,7 +114,7 @@ sequence described below. STEP 0: Error Event --- -PCI bus error is detect by the PCI hardware. On powerpc, the slot +A PCI bus error is detected by the PCI hardware. On powerpc, the slot is isolated, in that all I/O is blocked: all reads return 0x, all writes are ignored. @@ -139,7 +142,7 @@ The driver must return one of the follow a chance to extract some diagnostic information (see mmio_enable, below). - PCI_ERS_RESULT_NEED_RESET: - Driver returns this if it can't recover without a hard + Driver returns this if it can't recover without a slot reset. - PCI_ERS_RESULT_DISCONNECT: Driver returns this if it doesn't want to recover at all. @@ -169,11 +172,11 @@ is STEP 6 (Permanent Failure). The current powerpc implementation doesn't much care if the device attempts I/O at this point, or not. I/O's will fail, returning - a value of 0xff on read, and writes will be dropped. If the device - driver attempts more than 10K I/O's to a frozen adapter, it will - assume that the device driver has gone into an infinite loop, and - it will panic the kernel. There doesn't seem to be any other - way of stopping a device driver that insists on spinning on I/O. + a value of 0xff on read, and writes will be dropped. If more than + EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH + assumes that the device driver has gone into an infinite loop
[PATCH 3/3] Support for PCI Express reset type
This is the third of three patches that implement a bit field that PCI Express device drivers can use to indicate they need a fundamental reset during error recovery. By default, the EEH framework on powerpc does what's known as a hot reset during recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction. The attached patch makes changes to EEH to utilize the new bit field. These patches supersede the previously submitted patch that implemented a fundamental reset bit field. Please review and let me know of any concerns. Signed-off-by: Mike Mason mm...@us.ibm.com Signed-off-by: Richard Lary rl...@us.ibm.com diff -uNrp a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c --- a/arch/powerpc/kernel/pci_64.c 2009-07-13 14:25:24.0 -0700 +++ b/arch/powerpc/kernel/pci_64.c 2009-07-15 10:26:26.0 -0700 @@ -143,6 +143,7 @@ struct pci_dev *of_create_pci_dev(struct dev-dev.bus = pci_bus_type; dev-devfn = devfn; dev-multifunction = 0; /* maybe a lie? */ + dev-needs_freset = 0; /* pcie fundamental reset required */ dev-vendor = get_int_prop(node, vendor-id, 0x); dev-device = get_int_prop(node, device-id, 0x); diff -uNrp a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c --- a/arch/powerpc/platforms/pseries/eeh.c 2009-06-09 20:05:27.0 -0700 +++ b/arch/powerpc/platforms/pseries/eeh.c 2009-07-15 10:29:04.0 -0700 @@ -744,7 +744,15 @@ int pcibios_set_pcie_reset_state(struct static void __rtas_set_slot_reset(struct pci_dn *pdn) { - rtas_pci_slot_reset (pdn, 1); + struct pci_dev *dev = pdn-pcidev; + + /* Determine type of EEH reset required by device, +* default hot reset or fundamental reset +*/ + if (dev-needs_freset) + rtas_pci_slot_reset(pdn, 3); + else + rtas_pci_slot_reset(pdn, 1); /* The PCI bus requires that the reset be held high for at least * a 100 milliseconds. We wait a bit longer 'just in case'. */___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] Hold reference to device_node during EEH event handling
Michael Ellerman wrote: On Thu, 2009-07-16 at 09:33 -0700, Mike Mason wrote: Michael Ellerman wrote: On Wed, 2009-07-15 at 14:43 -0700, Mike Mason wrote: This patch increments the device_node reference counter when an EEH error occurs and decrements the counter when the event has been handled. This is to prevent the device_node from being released until eeh_event_handler() has had a chance to deal with the event. We've seen cases where the device_node is released too soon when an EEH event occurs during a dlpar remove, causing the event handler to attempt to access bad memory locations. Please review and let me know of any concerns. Taking a reference sounds sane, but ... Signed-off-by: Mike Mason mm...@us.ibm.com --- a/arch/powerpc/platforms/pseries/eeh_event.c2008-10-09 15:13:53.0 -0700 +++ b/arch/powerpc/platforms/pseries/eeh_event.c2009-07-14 14:14:00.0 -0700 @@ -75,6 +75,14 @@ static int eeh_event_handler(void * dumm if (event == NULL) return 0; + /* EEH holds a reference to the device_node, so if it +* equals 1 it's no longer valid and the event should +* be ignored */ + if (atomic_read(event-dn-kref.refcount) == 1) { + of_node_put(event-dn); + return 0; + } That's really gross :) Agreed. I'll look for another way to determine if device is gone and the event should be ignored. Suggestions are welcome :-) Actually, it turns out the atomic_read() isn't necessary. I just need to take the reference to the device_node when the EEH error is detected and let EEH try to handle the error. EEH detects the fact that the device is no longer valid, aborts the recovery attempt, then gives the device_node reference back. Works as expected. I'll resubmit the patch without the atomic_read(). Benh and I had a quick chat about it, and were wondering whether what you really should be doing is taking a reference to the pci device (perhaps as well as the device node). EEH already does that 3 lines before the of_node_get (see below). @@ -140,7 +149,7 @@ int eeh_send_failure_event (struct devic if (dev) pci_dev_get(dev); - event-dn = dn; + event-dn = of_node_get(dn); event-dev = dev; Thanks, Mike ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] Hold reference to device_node during EEH event handling
Michael Ellerman wrote: On Wed, 2009-07-15 at 14:43 -0700, Mike Mason wrote: This patch increments the device_node reference counter when an EEH error occurs and decrements the counter when the event has been handled. This is to prevent the device_node from being released until eeh_event_handler() has had a chance to deal with the event. We've seen cases where the device_node is released too soon when an EEH event occurs during a dlpar remove, causing the event handler to attempt to access bad memory locations. Please review and let me know of any concerns. Taking a reference sounds sane, but ... Signed-off-by: Mike Mason mm...@us.ibm.com --- a/arch/powerpc/platforms/pseries/eeh_event.c2008-10-09 15:13:53.0 -0700 +++ b/arch/powerpc/platforms/pseries/eeh_event.c2009-07-14 14:14:00.0 -0700 @@ -75,6 +75,14 @@ static int eeh_event_handler(void * dumm if (event == NULL) return 0; + /* EEH holds a reference to the device_node, so if it +* equals 1 it's no longer valid and the event should +* be ignored */ + if (atomic_read(event-dn-kref.refcount) == 1) { + of_node_put(event-dn); + return 0; + } That's really gross :) Agreed. I'll look for another way to determine if device is gone and the event should be ignored. Suggestions are welcome :-) And what happens if the refcount goes to 1 just after the check? ie. here. /* Serialize processing of EEH events */ mutex_lock(eeh_event_mutex); eeh_mark_slot(event-dn, EEH_MODE_RECOVERING); cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Support for PCI Express reset type in EEH
This patch was simultaneously submitted to Red Hat for review. As a result of that review, I'm withdrawing this patch and will submit a new version shortly. Mike Mike Mason wrote: By default, EEH does what's known as a hot reset during error recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction. The attached patch (courtesy of Richard Lary) implements a reset type callback that can be used to determine what type of reset a device requires. It is backwards compatible with all other drivers that implement PCI error recovery callbacks. Only drivers that require a fundamental reset need to be changed. So far we're only aware of one driver that has the requirement (qla2xxx). The patch touches mostly EEH and pseries code, but does require a couple of minor additions to the overall PCI error recovery framework. Signed-off-by: Mike Mason mm...@us.ibm.com --- a/arch/powerpc/include/asm/ppc-pci.h2009-06-09 20:05:27.0 -0700 +++ b/arch/powerpc/include/asm/ppc-pci.h2009-07-13 16:12:31.0 -0700 @@ -90,7 +90,9 @@ int rtas_pci_enable(struct pci_dn *pdn, * * Returns a non-zero value if the reset failed. */ -int rtas_set_slot_reset (struct pci_dn *); +#define HOT_RESET1 +#define FUNDAMENTAL_RESET3 +int rtas_set_slot_reset (struct pci_dn *, int reset_type); int eeh_wait_for_slot_status(struct pci_dn *pdn, int max_wait_msecs); /** --- a/arch/powerpc/platforms/pseries/eeh.c2009-06-09 20:05:27.0 -0700 +++ b/arch/powerpc/platforms/pseries/eeh.c2009-07-13 16:27:27.0 -0700 @@ -666,7 +666,7 @@ rtas_pci_enable(struct pci_dn *pdn, int /** * rtas_pci_slot_reset - raises/lowers the pci #RST line * @pdn pci device node - * @state: 1/0 to raise/lower the #RST + * @state: 1/3/0 to raise hot-reset/fundamental-reset/lower the #RST * * Clear the EEH-frozen condition on a slot. This routine * asserts the PCI #RST line if the 'state' argument is '1', @@ -742,9 +742,9 @@ int pcibios_set_pcie_reset_state(struct * Return 0 if success, else a non-zero value. */ -static void __rtas_set_slot_reset(struct pci_dn *pdn) +static void __rtas_set_slot_reset(struct pci_dn *pdn, int reset_type) { -rtas_pci_slot_reset (pdn, 1); +rtas_pci_slot_reset (pdn, reset_type); /* The PCI bus requires that the reset be held high for at least * a 100 milliseconds. We wait a bit longer 'just in case'. */ @@ -766,13 +766,13 @@ static void __rtas_set_slot_reset(struct msleep (PCI_BUS_SETTLE_TIME_MSEC); } -int rtas_set_slot_reset(struct pci_dn *pdn) +int rtas_set_slot_reset(struct pci_dn *pdn, int reset_type) { int i, rc; /* Take three shots at resetting the bus */ for (i=0; i3; i++) { -__rtas_set_slot_reset(pdn); +__rtas_set_slot_reset(pdn, reset_type); rc = eeh_wait_for_slot_status(pdn, PCI_BUS_RESET_WAIT_MSEC); if (rc == 0) --- a/arch/powerpc/platforms/pseries/eeh_driver.c2009-07-13 14:25:24.0 -0700 +++ b/arch/powerpc/platforms/pseries/eeh_driver.c2009-07-13 16:39:16.0 -0700 @@ -115,6 +115,34 @@ static void eeh_enable_irq(struct pci_de /* --- */ /** + * eeh_query_reset_type - query each device driver for reset type + * + * Query each device driver for special reset type if required + * merge the device driver responses. Cumulative response + * passed back in userdata. + */ + +static int eeh_query_reset_type(struct pci_dev *dev, void *userdata) +{ +enum pci_ers_result rc, *res = userdata; +struct pci_driver *driver = dev-driver; + +if (!driver) +return 0; + +if (!driver-err_handler || +!driver-err_handler-reset_type) +return 0; + +rc = driver-err_handler-reset_type (dev); + +/* A driver that needs a special reset trumps all others */ +if (rc == PCI_ERS_RESULT_FUNDAMENTAL_RESET ) *res = rc; + +return 0; +} + +/** * eeh_report_error - report pci error to each device driver * * Report an EEH error to each device driver, collect up and @@ -282,9 +310,12 @@ static int eeh_report_failure(struct pci * @pe_dn: pointer to a Partionable Endpoint device node. *This is the top-level structure on which pci *bus resets can be performed. + * + * reset_type: some devices may require type other than default hot reset. */ -static int eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus) +static int eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus, + int reset_type) { struct device_node *dn; int cnt, rc; @@ -298,7 +329,7 @@ static int eeh_reset_device (struct pci_ /* Reset the pci controller. (Asserts RST#; resets config space). * Reconfigure bridges and devices. Don't try to bring the system * up if the reset failed for some reason. */ -rc = rtas_set_slot_reset
[PATCH] Support for PCI Express reset type in EEH
By default, EEH does what's known as a hot reset during error recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction. The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev that indicates whether the device requires a fundamental reset during error recovery. This bit can be checked by EEH to determine which reset type is required. This patch supersedes the previously submitted patch that implemented a reset type callback. Please review and let me know of any concerns. Signed-off-by: Mike Mason mm...@us.ibm.com diff -uNrp a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c --- a/arch/powerpc/kernel/pci_64.c 2009-07-13 14:25:24.0 -0700 +++ b/arch/powerpc/kernel/pci_64.c 2009-07-15 10:26:26.0 -0700 @@ -143,6 +143,7 @@ struct pci_dev *of_create_pci_dev(struct dev-dev.bus = pci_bus_type; dev-devfn = devfn; dev-multifunction = 0; /* maybe a lie? */ + dev-fndmntl_rst_rqd = 0; /* pcie fundamental reset required */ dev-vendor = get_int_prop(node, vendor-id, 0x); dev-device = get_int_prop(node, device-id, 0x); diff -uNrp a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c --- a/arch/powerpc/platforms/pseries/eeh.c 2009-06-09 20:05:27.0 -0700 +++ b/arch/powerpc/platforms/pseries/eeh.c 2009-07-15 10:29:04.0 -0700 @@ -744,7 +744,15 @@ int pcibios_set_pcie_reset_state(struct static void __rtas_set_slot_reset(struct pci_dn *pdn) { - rtas_pci_slot_reset (pdn, 1); + struct pci_dev *dev = pdn-pcidev; + + /* Determine type of EEH reset required by device, +* default hot reset or fundamental reset +*/ + if (dev-fndmntl_rst_rqd) + rtas_pci_slot_reset(pdn, 3); + else + rtas_pci_slot_reset(pdn, 1); /* The PCI bus requires that the reset be held high for at least * a 100 milliseconds. We wait a bit longer 'just in case'. */ diff -uNrp a/include/linux/pci.h b/include/linux/pci.h --- a/include/linux/pci.h 2009-07-13 14:25:37.0 -0700 +++ b/include/linux/pci.h 2009-07-15 10:25:37.0 -0700 @@ -273,6 +273,7 @@ struct pci_dev { unsigned intari_enabled:1; /* ARI forwarding */ unsigned intis_managed:1; unsigned intis_pcie:1; + unsigned intfndmntl_rst_rqd:1; /* Dev requires fundamental reset */ unsigned intstate_saved:1; unsigned intis_physfn:1; unsigned intis_virtfn:1; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] Hold reference to device_node during EEH event handling
This patch increments the device_node reference counter when an EEH error occurs and decrements the counter when the event has been handled. This is to prevent the device_node from being released until eeh_event_handler() has had a chance to deal with the event. We've seen cases where the device_node is released too soon when an EEH event occurs during a dlpar remove, causing the event handler to attempt to access bad memory locations. Please review and let me know of any concerns. Signed-off-by: Mike Mason mm...@us.ibm.com --- a/arch/powerpc/platforms/pseries/eeh_event.c2008-10-09 15:13:53.0 -0700 +++ b/arch/powerpc/platforms/pseries/eeh_event.c2009-07-14 14:14:00.0 -0700 @@ -75,6 +75,14 @@ static int eeh_event_handler(void * dumm if (event == NULL) return 0; + /* EEH holds a reference to the device_node, so if it +* equals 1 it's no longer valid and the event should +* be ignored */ + if (atomic_read(event-dn-kref.refcount) == 1) { + of_node_put(event-dn); + return 0; + } + /* Serialize processing of EEH events */ mutex_lock(eeh_event_mutex); eeh_mark_slot(event-dn, EEH_MODE_RECOVERING); @@ -86,6 +94,7 @@ static int eeh_event_handler(void * dumm eeh_clear_slot(event-dn, EEH_MODE_RECOVERING); pci_dev_put(event-dev); + of_node_put(event-dn); kfree(event); mutex_unlock(eeh_event_mutex); @@ -140,7 +149,7 @@ int eeh_send_failure_event (struct devic if (dev) pci_dev_get(dev); - event-dn = dn; + event-dn = of_node_get(dn); event-dev = dev; /* We may or may not be called in an interrupt context */ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Support for PCI Express reset type in EEH
By default, EEH does what's known as a hot reset during error recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction. The attached patch (courtesy of Richard Lary) implements a reset type callback that can be used to determine what type of reset a device requires. It is backwards compatible with all other drivers that implement PCI error recovery callbacks. Only drivers that require a fundamental reset need to be changed. So far we're only aware of one driver that has the requirement (qla2xxx). The patch touches mostly EEH and pseries code, but does require a couple of minor additions to the overall PCI error recovery framework. Signed-off-by: Mike Mason mm...@us.ibm.com --- a/arch/powerpc/include/asm/ppc-pci.h2009-06-09 20:05:27.0 -0700 +++ b/arch/powerpc/include/asm/ppc-pci.h2009-07-13 16:12:31.0 -0700 @@ -90,7 +90,9 @@ int rtas_pci_enable(struct pci_dn *pdn, * * Returns a non-zero value if the reset failed. */ -int rtas_set_slot_reset (struct pci_dn *); +#define HOT_RESET 1 +#define FUNDAMENTAL_RESET 3 +int rtas_set_slot_reset (struct pci_dn *, int reset_type); int eeh_wait_for_slot_status(struct pci_dn *pdn, int max_wait_msecs); /** --- a/arch/powerpc/platforms/pseries/eeh.c 2009-06-09 20:05:27.0 -0700 +++ b/arch/powerpc/platforms/pseries/eeh.c 2009-07-13 16:27:27.0 -0700 @@ -666,7 +666,7 @@ rtas_pci_enable(struct pci_dn *pdn, int /** * rtas_pci_slot_reset - raises/lowers the pci #RST line * @pdn pci device node - * @state: 1/0 to raise/lower the #RST + * @state: 1/3/0 to raise hot-reset/fundamental-reset/lower the #RST * * Clear the EEH-frozen condition on a slot. This routine * asserts the PCI #RST line if the 'state' argument is '1', @@ -742,9 +742,9 @@ int pcibios_set_pcie_reset_state(struct * Return 0 if success, else a non-zero value. */ -static void __rtas_set_slot_reset(struct pci_dn *pdn) +static void __rtas_set_slot_reset(struct pci_dn *pdn, int reset_type) { - rtas_pci_slot_reset (pdn, 1); + rtas_pci_slot_reset (pdn, reset_type); /* The PCI bus requires that the reset be held high for at least * a 100 milliseconds. We wait a bit longer 'just in case'. */ @@ -766,13 +766,13 @@ static void __rtas_set_slot_reset(struct msleep (PCI_BUS_SETTLE_TIME_MSEC); } -int rtas_set_slot_reset(struct pci_dn *pdn) +int rtas_set_slot_reset(struct pci_dn *pdn, int reset_type) { int i, rc; /* Take three shots at resetting the bus */ for (i=0; i3; i++) { - __rtas_set_slot_reset(pdn); + __rtas_set_slot_reset(pdn, reset_type); rc = eeh_wait_for_slot_status(pdn, PCI_BUS_RESET_WAIT_MSEC); if (rc == 0) --- a/arch/powerpc/platforms/pseries/eeh_driver.c 2009-07-13 14:25:24.0 -0700 +++ b/arch/powerpc/platforms/pseries/eeh_driver.c 2009-07-13 16:39:16.0 -0700 @@ -115,6 +115,34 @@ static void eeh_enable_irq(struct pci_de /* --- */ /** + * eeh_query_reset_type - query each device driver for reset type + * + * Query each device driver for special reset type if required + * merge the device driver responses. Cumulative response + * passed back in userdata. + */ + +static int eeh_query_reset_type(struct pci_dev *dev, void *userdata) +{ + enum pci_ers_result rc, *res = userdata; + struct pci_driver *driver = dev-driver; + + if (!driver) + return 0; + + if (!driver-err_handler || + !driver-err_handler-reset_type) + return 0; + + rc = driver-err_handler-reset_type (dev); + + /* A driver that needs a special reset trumps all others */ + if (rc == PCI_ERS_RESULT_FUNDAMENTAL_RESET ) *res = rc; + + return 0; +} + +/** * eeh_report_error - report pci error to each device driver * * Report an EEH error to each device driver, collect up and @@ -282,9 +310,12 @@ static int eeh_report_failure(struct pci * @pe_dn: pointer to a Partionable Endpoint device node. *This is the top-level structure on which pci *bus resets can be performed. + * + * reset_type: some devices may require type other than default hot reset. */ -static int eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus) +static int eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus, +int reset_type) { struct device_node *dn; int cnt, rc; @@ -298,7 +329,7 @@ static int eeh_reset_device (struct pci_ /* Reset the pci controller. (Asserts RST#; resets config space). * Reconfigure bridges and devices. Don't try to bring the system * up if the reset failed for some reason. */ - rc = rtas_set_slot_reset(pe_dn); + rc
Re: [PATCH] Set error_state to pci_channel_io_normal in eeh_report_reset()
Paul Mackerras wrote: Mike Mason writes: diff --git a/arch/powerpc/platforms/pseries/eeh_driver.c b/arch/powerpc/platforms/pseries/eeh_driver.c index 380420f..9a2a6e3 100644 --- a/arch/powerpc/platforms/pseries/eeh_driver.c +++ b/arch/powerpc/platforms/pseries/eeh_driver.c @@ -182,6 +182,8 @@ static void eeh_report_reset(struct pci_dev *dev, void *userdata) if (!driver) return; + dev-error_state = pci_channel_io_normal; + eeh_enable_irq(dev); if (!driver-err_handler || Your mail client totally mangled the whitespace. I fixed it up and applied it, since the patch was small, but in future please fix your mail client so it doesn't do that. Sorry about that. It sounds like this patch needs to be applied to earlier kernel versions -- do you agree? Maybe. The only drivers that care at this point are the emulex and qlogic drivers with native eeh support. How far back would you typically apply a patch like this? I'm submitting it for inclusion in SLES 10 11 and RHEL 5. Mike Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH] Set error_state to pci_channel_io_normal in eeh_report_reset()
While adding native EEH support to Emulex and Qlogic drivers, it was discovered that dev-error_state was set to pci_io_channel_normal too late in the recovery process. These drivers rely on error_state to determine if they can access the device in their slot_reset callback, thus error_state needs to be set to pci_io_channel_norm in eeh_report_reset(). Below is a detailed explanation (courtesy of Richard Lary) as to why this is necessary. Background: PCI MMIO or DMA accesses to a frozen slot generate additional EEH errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS the adapter will be shutdown. To avoid triggering excessive EEH errors and an undesirable adapter shutdown, some drivers use the pci_channel_offline(dev) wrapper function to return a Boolean value based on the value of pci_dev-error_state to determine if PCI MMIO or DMA accesses are safe. If the wrapper returns TRUE, drivers must not make PCI MMIO or DMA access to their hardware. The pci_dev structure member error_state reflects one of three values, 1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3) pci_channel_io_perm_failure. Function pci_channel_offline(dev) returns TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure. The EEH driver sets pci_dev-error_state to pci_channel_io_frozen at the point where the PCI slot is frozen. Currently, the EEH driver restores dev-error_state to pci_channel_io_normal in eeh_report_resume() before calling the driver's resume callback. However, when the EEH driver calls the driver's slot_reset callback() from eeh_report_reset(), it incorrectly indicates the error state is still pci_channel_io_frozen. Waiting until eeh_report_resume() to restore dev-error_state to pci_channel_io_normal is too late for Emulex and QLogic FC drivers and any other drivers which are designed to use common code paths in these two cases: i) those called after the driver's slot_reset callback() and ii) those called after the PCI slot is frozen but before the driver's slot_reset callback is called. Case i) all driver paths executed to reinitialize the hardware after a reset and case ii) all code paths executed by driver kernel threads that run asynchronous to the main driver thread, such as interrupt handlers and worker threads to process driver work queues. Emulex and QLogic FC drivers are designed with common code paths which require that pci_channel_offline(dev) reflect the true state of the hardware. The state transitions that the hardware takes from Normal Operations to Slot Frozen to Reset to Normal Operations are documented in the Power Architectureâ„¢ Platform Requirements+ (PAPR+) in Table 75. PE State Control. PAPR defines the following 3 states: 0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed (Normal Operations) 1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled 2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled (Slot Frozen) An EEH error places the slot in state 2 (Frozen) and the adapter driver is notified that an EEH error was detected. If the adapter driver returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls eeh_reset_device() to place the slot into state 1 (Reset) and eeh_reset_device completes by placing the slot into State 0 (Normal Operations). Upon return from eeh_reset_device(), the EEH driver calls eeh_report_reset, which then calls the adapter's slot_reset callback. At the time the adapter's slot_reset callback is called, the true state of the hardware is Normal Operations and should be accurately reflected by setting dev-error_state to pci_channel_io_normal. The current implementation of EEH driver does not do so and requires the following patch to correct this deficiency. Signed-off-by: Mike Mason mm...@us.ibm.com diff --git a/arch/powerpc/platforms/pseries/eeh_driver.c b/arch/powerpc/platforms/pseries/eeh_driver.c index 380420f..9a2a6e3 100644 --- a/arch/powerpc/platforms/pseries/eeh_driver.c +++ b/arch/powerpc/platforms/pseries/eeh_driver.c @@ -182,6 +182,8 @@ static void eeh_report_reset(struct pci_dev *dev, void *userdata) if (!driver) return; + dev-error_state = pci_channel_io_normal; + eeh_enable_irq(dev); if (!driver-err_handler || ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] Only disable/enable LSI interrupts in EEH
Michael Ellerman wrote: On Tue, 2009-02-10 at 13:12 -0800, Mike Mason wrote: I'm resubmitting this patch with a couple changes suggested by Michael Ellerman. 1) the new functions should be static, and 2) some people may object to including unrelated formating changes. = The EEH code disables and enables interrupts during the device recovery process. This is unnecessary for MSI and MSI-X interrupts because they are effectively disabled by the DMA Stopped state when an EEH error occurs. The current code is also incorrect for MSI-X interrupts. It doesn't take into account that MSI-X interrupts are tracked in a different way than LSI/MSI interrupts. This patch ensures only LSI interrupts are disabled/enabled. Signed-off-by: Mike Mason mm...@us.ibm.com Acked-by: Linas Vepstas linasveps...@gmail.com Looks good. Assuming you've tested it :) Yes, it's been tested with network devices that use LSI, MSI and MSI-X interrupts. All recovered fine. Acked-by: Michael Ellerman mich...@ellerman.id.au ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] Only disable/enable LSI interrupts in EEH
I'm resubmitting this patch with a couple changes suggested by Michael Ellerman. 1) the new functions should be static, and 2) some people may object to including unrelated formating changes. = The EEH code disables and enables interrupts during the device recovery process. This is unnecessary for MSI and MSI-X interrupts because they are effectively disabled by the DMA Stopped state when an EEH error occurs. The current code is also incorrect for MSI-X interrupts. It doesn't take into account that MSI-X interrupts are tracked in a different way than LSI/MSI interrupts. This patch ensures only LSI interrupts are disabled/enabled. Signed-off-by: Mike Mason mm...@us.ibm.com Acked-by: Linas Vepstas linasveps...@gmail.com --- arch/powerpc/platforms/pseries/eeh_driver.c-orig2009-02-10 07:12:31.0 -0800 +++ arch/powerpc/platforms/pseries/eeh_driver.c 2009-02-10 08:19:54.0 -0800 @@ -76,6 +76,40 @@ static int irq_in_use(unsigned int irq) return rc; } +/** + * eeh_disable_irq - disable interrupt for the recovering device + */ +static void eeh_disable_irq(struct pci_dev *dev) +{ + struct device_node *dn = pci_device_to_OF_node(dev); + + /* Don't disable MSI and MSI-X interrupts. They are + * effectively disabled by the DMA Stopped state + * when an EEH error occurs. + */ + if (dev-msi_enabled || dev-msix_enabled) + return; + + if (!irq_in_use(dev-irq)) + return; + + PCI_DN(dn)-eeh_mode |= EEH_MODE_IRQ_DISABLED; + disable_irq_nosync(dev-irq); +} + +/** + * eeh_enable_irq - enable interrupt for the recovering device + */ +static void eeh_enable_irq(struct pci_dev *dev) +{ + struct device_node *dn = pci_device_to_OF_node(dev); + + if ((PCI_DN(dn)-eeh_mode) EEH_MODE_IRQ_DISABLED) { + PCI_DN(dn)-eeh_mode = ~EEH_MODE_IRQ_DISABLED; + enable_irq(dev-irq); + } +} + /* --- */ /** * eeh_report_error - report pci error to each device driver @@ -95,11 +129,8 @@ static void eeh_report_error(struct pci_ if (!driver) return; - if (irq_in_use (dev-irq)) { - struct device_node *dn = pci_device_to_OF_node(dev); - PCI_DN(dn)-eeh_mode |= EEH_MODE_IRQ_DISABLED; - disable_irq_nosync(dev-irq); - } + eeh_disable_irq(dev); + if (!driver-err_handler || !driver-err_handler-error_detected) return; @@ -144,15 +175,12 @@ static void eeh_report_reset(struct pci_ { enum pci_ers_result rc, *res = userdata; struct pci_driver *driver = dev-driver; - struct device_node *dn = pci_device_to_OF_node(dev); if (!driver) return; - if ((PCI_DN(dn)-eeh_mode) EEH_MODE_IRQ_DISABLED) { - PCI_DN(dn)-eeh_mode = ~EEH_MODE_IRQ_DISABLED; - enable_irq(dev-irq); - } + eeh_enable_irq(dev); + if (!driver-err_handler || !driver-err_handler-slot_reset) return; @@ -171,17 +199,14 @@ static void eeh_report_reset(struct pci_ static void eeh_report_resume(struct pci_dev *dev, void *userdata) { struct pci_driver *driver = dev-driver; - struct device_node *dn = pci_device_to_OF_node(dev); dev-error_state = pci_channel_io_normal; if (!driver) return; - if ((PCI_DN(dn)-eeh_mode) EEH_MODE_IRQ_DISABLED) { - PCI_DN(dn)-eeh_mode = ~EEH_MODE_IRQ_DISABLED; - enable_irq(dev-irq); - } + eeh_enable_irq(dev); + if (!driver-err_handler || !driver-err_handler-resume) return; @@ -205,15 +230,12 @@ static void eeh_report_failure(struct pc if (!driver) return; - if (irq_in_use (dev-irq)) { - struct device_node *dn = pci_device_to_OF_node(dev); - PCI_DN(dn)-eeh_mode |= EEH_MODE_IRQ_DISABLED; - disable_irq_nosync(dev-irq); - } - if (!driver-err_handler) - return; - if (!driver-err_handler-error_detected) + eeh_disable_irq(dev); + + if (!driver-err_handler || + !driver-err_handler-error_detected) return; + driver-err_handler-error_detected(dev, pci_channel_io_perm_failure); } ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH] Only disable/enable LSI interrupts in EEH
The EEH code disables and enables interrupts during the device recovery process. This is unnecessary for MSI and MSI-X interrupts because they are effectively disabled by the DMA Stopped state when an EEH error occurs. The current code is also incorrect for MSI-X interrupts. It doesn't take into account that MSI-X interrupts are tracked in a different way than LSI/MSI interrupts. This patch ensures only LSI interrupts are disabled/enabled. The patch also includes a couple minor formatting fixes. Signed-off-by: Mike Mason mm...@us.ibm.com --- linux-2.6.18.ppc64-orig/arch/powerpc/platforms/pseries/eeh_driver.c 2009-01-16 10:59:59.0 -0800 +++ linux-2.6.18.ppc64/arch/powerpc/platforms/pseries/eeh_driver.c 2009-02-07 12:29:14.0 -0800 @@ -67,7 +67,7 @@ static int irq_in_use(unsigned int irq) { int rc = 0; unsigned long flags; - struct irq_desc *desc = irq_desc + irq; + struct irq_desc *desc = irq_desc + irq; spin_lock_irqsave(desc-lock, flags); if (desc-action) @@ -76,6 +76,40 @@ static int irq_in_use(unsigned int irq) return rc; } +/** + * eeh_disable_irq - disable interrupt for the recovering device + */ +void eeh_disable_irq(struct pci_dev *dev) +{ + struct device_node *dn = pci_device_to_OF_node(dev); + + /* Don't disable MSI and MSI-X interrupts. They are + * effectively disabled by the DMA Stopped state + * when an EEH error occurs. + */ + if (dev-msi_enabled || dev-msix_enabled) + return; + + if (!irq_in_use(dev-irq)) + return; + + PCI_DN(dn)-eeh_mode |= EEH_MODE_IRQ_DISABLED; + disable_irq_nosync(dev-irq); +} + +/** + * eeh_enable_irq - enable interrupt for the recovering device + */ +void eeh_enable_irq(struct pci_dev *dev) +{ + struct device_node *dn = pci_device_to_OF_node(dev); + + if ((PCI_DN(dn)-eeh_mode) EEH_MODE_IRQ_DISABLED) { + PCI_DN(dn)-eeh_mode = ~EEH_MODE_IRQ_DISABLED; + enable_irq(dev-irq); + } +} + /* --- */ /** * eeh_report_error - report pci error to each device driver @@ -95,11 +129,8 @@ static void eeh_report_error(struct pci_ if (!driver) return; - if (irq_in_use (dev-irq)) { - struct device_node *dn = pci_device_to_OF_node(dev); - PCI_DN(dn)-eeh_mode |= EEH_MODE_IRQ_DISABLED; - disable_irq_nosync(dev-irq); - } + eeh_disable_irq(dev); + if (!driver-err_handler || !driver-err_handler-error_detected) return; @@ -144,15 +175,12 @@ static void eeh_report_reset(struct pci_ { enum pci_ers_result rc, *res = userdata; struct pci_driver *driver = dev-driver; - struct device_node *dn = pci_device_to_OF_node(dev); if (!driver) return; - if ((PCI_DN(dn)-eeh_mode) EEH_MODE_IRQ_DISABLED) { - PCI_DN(dn)-eeh_mode = ~EEH_MODE_IRQ_DISABLED; - enable_irq(dev-irq); - } + eeh_enable_irq(dev); + if (!driver-err_handler || !driver-err_handler-slot_reset) return; @@ -171,17 +199,14 @@ static void eeh_report_reset(struct pci_ static void eeh_report_resume(struct pci_dev *dev, void *userdata) { struct pci_driver *driver = dev-driver; - struct device_node *dn = pci_device_to_OF_node(dev); dev-error_state = pci_channel_io_normal; if (!driver) return; - if ((PCI_DN(dn)-eeh_mode) EEH_MODE_IRQ_DISABLED) { - PCI_DN(dn)-eeh_mode = ~EEH_MODE_IRQ_DISABLED; - enable_irq(dev-irq); - } + eeh_enable_irq(dev); + if (!driver-err_handler || !driver-err_handler-resume) return; @@ -205,15 +230,12 @@ static void eeh_report_failure(struct pc if (!driver) return; - if (irq_in_use (dev-irq)) { - struct device_node *dn = pci_device_to_OF_node(dev); - PCI_DN(dn)-eeh_mode |= EEH_MODE_IRQ_DISABLED; - disable_irq_nosync(dev-irq); - } - if (!driver-err_handler) - return; - if (!driver-err_handler-error_detected) + eeh_disable_irq(dev); + + if (!driver-err_handler || + !driver-err_handler-error_detected) return; + driver-err_handler-error_detected(dev, pci_channel_io_perm_failure); } ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] Don't panic when EEH_MAX_FAILS is exceeded
Here's a repost of the patch with the suggested changes. This patch changes the EEH_MAX_FAILS action from panic to printing an error message. Panicking under under this condition is too harsh. Although performance will be affected and the device may not recover, the system is still running, which at the very least should allow for a more graceful shutdown. The patch also removes the msleep() within a spinlock, which can lead to a deadlock and is not recommended. Signed-off-by: Mike Mason [EMAIL PROTECTED] Acked-by: Linas Vepstas [EMAIL PROTECTED] --- powerpc.git/arch/powerpc/platforms/pseries/eeh.c2008-07-18 08:51:42.0 -0700 +++ powerpc.git-new/arch/powerpc/platforms/pseries/eeh.c2008-07-21 03:25:43.0 -0700 @@ -75,9 +75,9 @@ */ /* If a device driver keeps reading an MMIO register in an interrupt - * handler after a slot isolation event has occurred, we assume it - * is broken and panic. This sets the threshold for how many read - * attempts we allow before panicking. + * handler after a slot isolation event, it might be broken. + * This sets the threshold for how many read attempts we allow + * before printing an error message. */ #define EEH_MAX_FAILS 210 @@ -470,6 +470,7 @@ int eeh_dn_check_failure(struct device_n unsigned long flags; struct pci_dn *pdn; int rc = 0; + const char *location; total_mmio_ffs++; @@ -509,18 +510,15 @@ int eeh_dn_check_failure(struct device_n rc = 1; if (pdn-eeh_mode EEH_MODE_ISOLATED) { pdn-eeh_check_count ++; - if (pdn-eeh_check_count = EEH_MAX_FAILS) { - printk (KERN_ERR EEH: Device driver ignored %d bad reads, panicing\n, - pdn-eeh_check_count); + if (pdn-eeh_check_count % EEH_MAX_FAILS == 0) { + location = of_get_property(dn, ibm,loc-code, NULL); + printk (KERN_ERR EEH: %d reads ignored for recovering device at + location=%s driver=%s pci addr=%s\n, + pdn-eeh_check_count, location, + dev-driver-name, pci_name(dev)); + printk (KERN_ERR EEH: Might be infinite loop in %s driver\n, + dev-driver-name); dump_stack(); - msleep(5000); - - /* re-read the slot reset state */ - if (read_slot_reset_state(pdn, rets) != 0) - rets[0] = -1; /* reset state unknown */ - - /* If we are here, then we hit an infinite loop. Stop. */ - panic(EEH: MMIO halt (%d) on device:%s\n, rets[0], pci_name(dev)); } goto dn_unlock; } ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH] Don't panic when EEH_MAX_FAILS is exceeded
This patch changes the EEH_MAX_FAILS action from panic to printing an error message. Panicking under under this condition is too harsh. Although performance will be affected and the device may not recover, the system is still running, which at the very least, should allow for a more graceful shutdown. The panic() is now wrapped in a DEBUG statement for development purposes. The patch also removes the msleep() within a spinlock, which is not allowed. Signed-off-by: Mike Mason [EMAIL PROTECTED] --- powerpc.git/arch/powerpc/platforms/pseries/eeh.c2008-07-18 08:51:42.0 -0700 +++ powerpc.git-new/arch/powerpc/platforms/pseries/eeh.c2008-07-18 13:26:37.0 -0700 @@ -75,9 +75,9 @@ */ /* If a device driver keeps reading an MMIO register in an interrupt - * handler after a slot isolation event has occurred, we assume it - * is broken and panic. This sets the threshold for how many read - * attempts we allow before panicking. + * handler after a slot isolation event, it might be broken. + * This sets the threshold for how many read attempts we allow + * before printing an error message. */ #define EEH_MAX_FAILS 210 @@ -470,6 +470,7 @@ unsigned long flags; struct pci_dn *pdn; int rc = 0; + const char *location; total_mmio_ffs++; @@ -509,18 +510,24 @@ rc = 1; if (pdn-eeh_mode EEH_MODE_ISOLATED) { pdn-eeh_check_count ++; - if (pdn-eeh_check_count = EEH_MAX_FAILS) { - printk (KERN_ERR EEH: Device driver ignored %d bad reads, panicing\n, - pdn-eeh_check_count); + if (pdn-eeh_check_count % EEH_MAX_FAILS == 0) { + location = (char *) of_get_property(dn, ibm,loc-code, NULL); + printk (KERN_ERR EEH: %d reads ignored for recovering device at + location=%s driver=%s pci addr=%s\n, + pdn-eeh_check_count, location, + dev-driver-name, pci_name(dev)); + printk (KERN_ERR EEH: Might be infinite loop in %s driver\n, + dev-driver-name); +#ifdef DEBUG dump_stack(); - msleep(5000); - + /* re-read the slot reset state */ if (read_slot_reset_state(pdn, rets) != 0) rets[0] = -1; /* reset state unknown */ /* If we are here, then we hit an infinite loop. Stop. */ panic(EEH: MMIO halt (%d) on device:%s\n, rets[0], pci_name(dev)); +#endif } goto dn_unlock; } ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] Restore PERR/SERR bit settings during EEH device recovery
Linas Vepstas wrote: 2008/7/7 Mike Mason [EMAIL PROTECTED]: The following patch restores the PERR and SERR bits in the PCI command register during an EEH device recovery. We have found at least one case (an Agilent test card) where the PERR/SERR bits are set to 1 by firmware at boot time, but are not restored to 1 during EEH recovery. Any chance they should be zero, and were accidentally set to 1? In which case, you'd need an else clause, below. I suppose it's possible. I'll add your suggestion and resubmit. Mike The patch fixes the Agilent card problem. It has been tested on several other EEH-enabled cards with no regressions. Signed-off-by: Mike Mason [EMAIL PROTECTED] --- linux-2.6.26-rc9/arch/powerpc/platforms/pseries/eeh.c 2008-07-07 16:06:57.0 -0700 +++ linux-2.6.26-rc9-new/arch/powerpc/platforms/pseries/eeh.c 2008-07-07 16:11:10.0 -0700 @@ -812,6 +812,7 @@ static inline void __restore_bars (struct pci_dn *pdn) { int i; + u32 cmd; if (NULL==pdn-phb) return; for (i=4; i10; i++) { @@ -832,6 +833,15 @@ /* max latency, min grant, interrupt pin and line */ rtas_write_config(pdn, 15*4, 4, pdn-config_space[15]); + + /* Restore PERR SERR bits, some devices require it, + don't touch the other command bits */ + rtas_read_config(pdn, PCI_COMMAND, 4, cmd); + if (pdn-config_space[1] PCI_COMMAND_PARITY) + cmd |= PCI_COMMAND_PARITY; else cmd = ~PCI_COMMAND_PARITY; + if (pdn-config_space[1] PCI_COMMAND_SERR) + cmd |= PCI_COMMAND_SERR; else cmd = ~PCI_COMMAND_SERR; + rtas_write_config(pdn, PCI_COMMAND, 4, cmd); } Other than that, I'll add an Acked-by: Linas Vepstas [EMAIL PROTECTED] --linas ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev