RE: [RFC/PATCH]reconfigure MSI registers after resume

2005-09-02 Thread Nguyen, Tom L
On Thursday, September 01, 2005 10:38 PM Greg KH wrote:
>> Existing pci_save_state(dev)/pci_restore_state(dev) covers only 64
bytes
>> of PCI header. One solution is to extend these APIs to cover up to
256
>> bytes. What do you think?
> Will that solve this issue? 
Yes.

Thanks,
Tom
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC/PATCH]reconfigure MSI registers after resume

2005-09-01 Thread Nguyen, Tom L
On Thursday, September 01, 2005 1:10 PM Andrew Morton wrote:
>>
>> On Thursday, September 01, 2005 12:32 PM Andrew Morton wrote:
>> > So what is the alternative to Shaohua's fix?  Restore all the msi 
>> > registers on resume?
>> 
>> Yes, the PCIe port bus driver, for example, did that.
>> 

> So you're saying that each individual driver which uses MSI is
responsible
> for the restore?  
Yes.

> Is it not possible to do this in some single centralized place?
Existing pci_save_state(dev)/pci_restore_state(dev) covers only 64 bytes
of PCI header. One solution is to extend these APIs to cover up to 256
bytes. What do you think?

Thanks,
Tom
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC/PATCH]reconfigure MSI registers after resume

2005-09-01 Thread Nguyen, Tom L
On Thursday, September 01, 2005 12:32 PM Andrew Morton wrote:
> So what is the alternative to Shaohua's fix?  Restore all the msi 
> registers on resume?

Yes, the PCIe port bus driver, for example, did that.

Thanks,
Tom
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC/PATCH]reconfigure MSI registers after resume

2005-09-01 Thread Nguyen, Tom L
On Wednesday, August 31, 2005 2:44 PM Greg KH wrote:
>>On Thu, Aug 18, 2005 at 01:35:46PM +0800, Shaohua Li wrote:
>> Hi,
>> It appears pci_enable_msi doesn't reconfigure msi registers if it
>> successfully look up a msi for a device. It assumes the data and
address
>> registers unchanged after calling pci_disable_msi. But this isn't
always
>> true, such as in a suspend/resume circle. In my test system, the
>> registers unsurprised become zero after a S3 resume. This patch fixes
my
>> problem, please look at it. MSIX might have the same issue, but I
>> haven't taken a close look.

> Tom, any comments on this?

In the cases of suspend/resume, a device driver needs to restore its PCI
configuration space registers, which include the MSI/MSI-X capability
structures if a device uses MSI/MSI-X. I think reconfiguring MSI
data/address each time a driver calls pci_enable_msi may not be
necessary.

Thanks,
Tom
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] PCI bridge driver rewrite

2005-04-04 Thread Nguyen, Tom L
On Thu Feb 24 2005 - 01:33:38 Adam Belay wrote:

>The basic flow of the new code is as follows:
>1.) A standard "driver core" driver binds to a bridge device.
>2.) When "*probe" is called it sets up the hardware and allocates a
"struct pci_bus".
>3.) The "struct pci_bus" is filled with information about the detected
bridge.
>4.) The driver then registers the "struct pci_bus" with the PCI Bus
Class.
>5.) The PCI Bus Class makes the bridge available to sysfs.
>6.) It then detects hardware attached to the bridge.
>7.) Each new PCI bridge device is registered with the driver model.
>8.) All remaining PCI devices are registered with the driver model.
>
>+static void pci_enable_crs(struct pci_dev *dev)
>+{
>+ u16 cap, rpctl;
>+ int rpcap = pci_find_capability(dev, PCI_CAP_ID_EXP);
>+ if (!rpcap)
>+ return;
>+
>+ pci_read_config_word(dev, rpcap + PCI_CAP_FLAGS, &cap);
>+ if (((cap & PCI_EXP_FLAGS_TYPE) >> 4) != PCI_EXP_TYPE_ROOT_PORT)
>+ return;
>+
>+ pci_read_config_word(dev, rpcap + PCI_EXP_RTCTL, &rpctl);
>+ rpctl |= PCI_EXP_RTCTL_CRSSVE;
>+ pci_write_config_word(dev, rpcap + PCI_EXP_RTCTL, rpctl);
>+}

Adam,

We need to coordinate your work with the PCI Express Port bus driver
that was accepted into the 2.6.x kernel.  The PCI Express Port Bus
driver claims all PCI-PCI Bridge's which implements PCI Express
Capability Structure. Please refer to PCIEBUS-HOWTO.txt for why we
developed PCI Express Port Bus driver to support PCI Express features.
Your current patch will claim PCI Express root ports, preventing the PCI
Express Port bus driver from loading.   Given the many advanced features
of PCI Express a separate bus driver was required.   Can you change the
patch so it only loads on standard PCI bridges and not PCI Express
devices?

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Nguyen, Tom L
On Thursday, March 17, 2005 2:58 PM Benjamin Herrenschmidt wrote:
> Does the link side of PCIE provides a way to trigger a hard reset of
the
> rest of the card ? If not, then it's dodgy as there may be no way to
> consistently "reset" the card if it's in a bad state. 

The PCI Express spec does not make it clear of whether an in-band
mechanism, called a hot-reset, triggers a hard reset of the rest of the
card. I agree that if not, then it's dodgy.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 0/6] PCI Express Advanced Error Reporting Driver

2005-03-18 Thread Nguyen, Tom L
On Friday, March 18, 2005 10:26 AM Grant Grundler wrote:
>> He was referring to an unpublished draft "Error Reporting ECN".
>> You'll have to talk to Intel's PCI-SIG representative to get a copy.
>
>Good News: the "Error Reporting ECN" is now posted on the PCISIG
website.
>
>Tom, please review and see if/how that changes your implementation.

Agree. Thanks for the update.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Nguyen, Tom L
On Friday, March 18, 2005 10:10 AM Grant Grundler wrote:
>A port bus driver does NOT sound like a normal device driver.
>If PCI Express defines a standard register set for a bridge
>device (like PCI COnfig space for PCI-PCI Bridges), then I
>don't see a problem with PCI-Express error handling code mucking
>with those registers. Look at how PCI-PCI bridges are supported
>today and which bits of code poke registers on PCI-PCI Bridges.

Please refer to PCIEBUS-HOWTO.txt for how port bus driver works.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Nguyen, Tom L
On Thursday, March 17, 2005 8:01 PM Paul Mackerras wrote:
> Does the PCI Express AER specification define an API for drivers?

No. That is why we agree a general API that works for all platforms.

>Likewise, with EEH the device driver could take recovery action on its
>own.  But we don't want to end up with multiple sets of recovery code
>in drivers, if possible.  Also we want the recovery code to be as
>simple as possible, otherwise driver authors will get it wrong.

Drivers own their devices register sets.  Therefore if there are any
vendor unique actions that can be taken by the driver to recovery we
expect the driver to do so.  For example, if the drivers see "xyz" error
and there is a known errata and workaround that involves resetting some
registers on the card.  From our perspective we see drivers taking care
of their own cards but the AER driver and your platform code will take
care of the bus/link interfaces.

>I would see the AER driver as being included in the "platform" code.
>The AER driver would be be closely involved in the recovery process.

Our goal is to have the AER driver be part of the general code base
because it is based on a PCI SIG specification that can be implemented
across all architectures.   

>What is the state of a link during the time between when an error is
>detected and when a link reset is done?  Is the link usable?  What
>happens if you try to do a MMIO read from a device downstream of the
>link?

For a FATAL error the link is "unreliable".  This means MMIO operations
may or may not succeed.  That is why the reset is performed by the
upstream port driver.  The interface to that is reliable.  A reset of an
upstream port will propagate to all downstream links.  So we need an
interface to the bus/port driver to request a reset on its downstream
link.  We don't want the AER driver writing port bus driver bridge
control registers.  We are trying to keep the ownership of the devices
register read/write within the domain of the devices driver.  In our
case the port bus driver.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Nguyen, Tom L
On Thursday, March 17, 2005 6:44 PM Benjamin Herrenschmidt wrote:
>I have difficulties following all of your previous explanations, I must
>admit. My point here is I'd like you to find out if the API can fit on
>the driver side, and if not, what would need to be changed.

In summary, we agreed that the API you propose should be: 

int (*error_handler)(struct pci_dev *dev, union error_src *);

I believe this API works for most of PCI Express needs.  The only
addition PCI Express needs is a mechanism for the AER code to request a
port bus driver to perform a downstream link reset when an error occurs
on that downstream link. For example, you can add the
PCIERR_ERROR_PORT_RESET message with the return is either
PCIERR_RESULT_RECOVERED or PCIERR_RESULT_DISCONNECT to fit PCI Express
needs.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Nguyen, Tom L
On Wednesday, March 16, 2005 7:20 PM Benjamin Herrenschmidt wrote:
>> What mechanism (message??) is used to perform the bus and/or link
>> level reset?  For PCI Express the reset is performed by the upstream
>> port driver.  My API takes this into account.  Are you assuming the
PCI
>> device on the bus does the reset or will there be a PCI bus driver
that
>> will do the reset?  How will the PCI error handling code initiate a
>> reset?
>
>The "caller", that is the error management framework. I'm defining the
>API at the driver level, not the implementation at the core level.
>
>For example, on IBM pSeries with PCI-Express, we will probably not have
>an AER driver. This will be all dealt by the firmware which will mimmic
>that to the existing EEH error management. We'll have the same API to
do
>the reset that we have today for resetting a slot.

We decided to implement PCI Express error handling based on the PCI
Express specification in a platform independent manner.  This allows any
platform that implements PCI Express AER per the PCI SIG specification
can take advantage of the advanced features, much like SHPC hot-plug or
PCI Express hot-plug implementations.

>You may have noticed in general that I didn't either define who is
>callign those callbacks. It's all implicit that this is done by
platform
>error management code. For example, on ppc64, even the recovery step
>requires action from the platform since the slot has been physically
>isolated. After we have notified all drivers  with the "error detected"
>callback, if we decide we can try the "recover" step (all drivers
>returned they could try it and we decided the error wasn't too fatal)
we
>will call the firmware to re-enable IOs on the slot and call the
>"recover" step.

For PCI Express the endpoint device driver can take recovery action on
its own, depending on the nature of the error so long as it does not
affect the upstream device.  This can include endpoint device resets.
We expect the driver to do this upon error notification, if possible.
In PCI Express since the driver will have the most knowledge regarding
the error it will have the best ability to do device dependent recovery
and IO retry.  If its recovery fails then the AER driver will ask the
upstream device driver to perform the link reset.  Since this is more of
a side effect an explicit call to recover is not necessary.  However, we
understand and agree that it is needed to support the general error
recovery cases for PCI.

To support the AER driver calling an upstream device to initiate a reset
of the link we need a specific callback since the driver doing the reset
is not the driver who got the error.  In the case of general PCI this
could be useful if a PCI bus driver were available to support the
callback for a bridge device.  This would also support specific error
recovery calls to reset an endpoint adapter.  We need a call to request
a driver to perform a reset on a link or device.  

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Nguyen, Tom L
On Wednesday, March 16, 2005 7:52 PM Paul Mackerras wrote:
>> We need some PCI
>> based error flows to understand the details of the flow so we can
>> develop an interface compatible with both.
>
>Here is a basic outline of what happens with EEH (Enhanced Error
>Handling) on IBM PPC64 platforms.  This applies to PCI, PCI-X and
>PCI-Express devices.

Is EEH a PCI-SIG specification? Is EEH specs available in public?

>We have a PCI-PCI bridge per slot.  The bridge (and the PCI fabric
>generally) look for errors such as address parity errors,
>out-of-bounds DMA accesses by the device, or anything that would
>normally cause SERR to be set.  If such an error occurs, the bridge
>immediately isolates the device, meaning that writes by the CPU to the
>device are discarded, reads by the CPU are returned with all 1s data,
>and DMA accesses by the device are blocked.

It seems that a PCI-PCI bridge per slot is hardware implementation
specific. The fact that the PCI-PCI Bridge can isolate the slot is
hardware feature specific.

>What happens at the driver level depends on whether the driver is
>EEH-aware or not.  (This description is more what we would like to
>have rather than what is necessarily implemented at present).

PCI Express AER driver uses similar concept of determining whether the
driver is AER-aware or not except that PCI Express AER is independent
from firmware support.

>If the driver is not EEH-aware but is hot-plug capable, then the
>platform code will notice that reads from the device are returning all
>1s and query firmware about the state of the slot.  Firmware will
>indicate that the slot has been isolated.  Platform code can obtain
>more specific information about the error from firmware and log it.
>Then, platform code will generate a hot-unplug event for the slot.
>After the driver has cleaned up and notified higher levels that its
>device has gone away, platform code will call firmware to reset and
>unisolate the slot, and then generate a hotplug event to tell the
>driver that it can use the device - but as far as the driver is
>concerned, it is a new device.

Where does the platform code reside and where does it log the error?

In PCI Express if the driver is not AER-aware the fatal error message is
reported by its upstream switch, the AER driver obtains comprehensive
error information from the upstream switch (like EEH platform code
obtains error information from the firmware). Since the driver is not
AER-aware, the fatal error is reported to user to make a policy decision
since the PCI Express does not have a hot-plug event for the slot like
EEH platform. 

So it looks like the hot-plug capability of the driver is being used in
lieu of specific callbacks to freeze and thaw IO in the case of a
non-aware driver.  If the driver does not support hot-plug then the
error is just logged.  Do you leave the slot isolated or perform error
recovery anyway?

On a fatal error the interface is down.  No matter what the driver
supports (AER aware, EEH aware, unaware) all IO is likely to fail.
Resetting a bus in a point-to-point environment like PCI Express or EEH
(as you describe) should have little adverse effect.  The risk is the
bus reset will cause a card reset and the driver must understand to
re-initialize the card.  A link reset in PCI Express will not cause a
card reset.  We assume the driver will reset its card if necessary.

>If the driver is EEH-aware, then we use the API that Ben has
>proposed.  Platform code can either reset the slot (by calling
>firmware) or not, depending on what the driver asks for, and also
>depending on any other information the platform code has available to
>it, such as specific information about the error that has occurred.
>Platform code then unisolates the slot and then informs the driver
>that it can reinitialize the device and restart any transfers that
>were in progress.

In PCI Express the AER driver obtains fatal error information from the
upstream switch driver. We can use the same API with message =
PCIERR_ERROR_RECOVER to notify the endpoint driver, which is maybe
unaware of the fatal error reported by its upstream device. Mostly the
driver will respond with PCIERR_RESULT_NEED_RESET.

>Ben's API is aimed at supporting the code flows that we need for EEH
>as well as those needed for recovery from errors on PCI Express.  Part
>of the reason for not just requiring the driver to do everything
>itself is that a slot isolation event can affect multiple drivers,
>because the card in the slot could have a PCI-PCI bridge with multiple
>devices behind it.  Thus the recovery process potentially requires a
>degree of coordination between multiple drivers, and Ben's API
>addresses that.  The same coordination could be required on PCI
>Express, if I understand correctly, because a fault on an upstream
>link could affect many devices downstream of that link.

Yes the same case applies to PCI Express upstream links.  So halting IO
is desired when other devices are affected.

RE: [PATCH 1/6] PCI Express Advanced Error Reporting Driver

2005-03-15 Thread Nguyen, Tom L
Tuesday, March 15, 2005 2:51 PM Linas Vepstas wrote:
>> +void hw_aer_unregister(void)
>> +{
>> +struct pci_dev *dev = (struct pci_dev*)host->dev;
>> +unsigned short id;
>> +
>> +id = (dev->bus->number << 8) | dev->devfn;
>> +
>> +/* Unregister with AER Root driver */
>> +pcie_aer_unregister(id);
>> +}
>
>I don't understand how this can work on a system with 
>more than one domain.  On any midrange/high-end system, 
>you'll have a number of devices with identical values
>for (bus->number << 8) | devfn)

Good catch! I forgot to encounter multiple segments. However, based on
LKML inputs for a common interface in the pci_driver data structure, it
appears that pcie_aer_register and pcie_aer_unregister are no longer
required.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 0/6] PCI Express Advanced Error Reporting Driver

2005-03-15 Thread Nguyen, Tom L
On Tuesday, March 15, 2005 2:38 PM Grant Grundler wrote:
>> >A co-worker made the following observation (I'm paraphrasing):
>> >...this proposal does not deal with the Error Reporting ECN.
>> >For example, they do not show the advisory non-fatal bit in
>> >the correctable error status register.
>> 
>> Does he refer to the ECN update on the Received Error Bit[0] of the
>> Correctable Error Status Register and on the Training Error Bit[0] of
>> the Uncorrectable Error Status Register? If not, please clarify his
>> comments for us.

>Yes - I believe so.

Great! I will make changes to reflect this update. Thanks for pointing
it out.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 0/6] PCI Express Advanced Error Reporting Driver

2005-03-15 Thread Nguyen, Tom L
On Tuesday, March 15, 2005 12:12 PM Grant Grundler wrote:
>Tom,
>A co-worker made the following observation (I'm paraphrasing):
>   ...this proposal does not deal with the Error Reporting ECN.
>   For example, they do not show the advisory non-fatal bit in
>   the correctable error status register.

Does he refer to the ECN update on the Received Error Bit[0] of the
Correctable Error Status Register and on the Training Error Bit[0] of
the Uncorrectable Error Status Register? If not, please clarify his
comments for us.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 5/6] PCI Express Advanced Error Reporting Driver

2005-03-15 Thread Nguyen, Tom L
Friday, March 11, 2005 11:30 PM Greg KH wrote:
>> +
>> +LIST_HEAD(rc_list); /* Define Root Complex List */
>>
>Static?

The rc_list is not static. Thanks for pointing it out. Will make it
static. 

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] PCI Express Advanced Error Reporting Driver

2005-03-14 Thread Nguyen, Tom L
On Friday, March 11, 2005 2:49 PM Paul Mackerras wrote:
>> The standard PCI Specification calls out SERR and PERR. I am not sure
>> about the recent discussion of PCI error of recovery. It is perhaps
>> regarding the possibility of recovering from a PERR or SERR. However,
>> PCI Express error occurs on the PCI Express link or on behalf of
>> transactions occurred on the PCI Express link. PCI Express component,
>> which implements PCI Express Advanced Error Reporting Capability,
sends
>> error message to the Root Port to indicate error occurred on the PCI
>> Express link where it is connected. The PCI Express error recovery is
on
>> behalf of attempting to do a PCI Express link recovery, not PCI error
>> recovery. It appears that PCI Express AER is disjoint from PCI error
>> recovery.
>
>To give you some context, the recent discussion was about how we could
>give a unified interface to drivers for both PCI-Express error
>reporting and for the "Enhanced Error Handling" (EEH) facilities we
>have on IBM PPC64 boxes.  EEH includes not only the detection and
>reporting of errors (for PCI, PCI-X and PCI-Express buses) but also
>hardware support for isolating devices when an error is detected, plus
>means for resetting individual bus segments or slots, to assist in
>recovering a device which has got into a bad state.

Thanks for providing this information.

>Does PCI Express provide any facilities for recovering from errors,
>beyond just "try that transaction again"?

PCI Express AER Root driver provides AER callback interfaces to
coordinate with PCI Express AER aware drivers. However, based on recent
LKML inputs, we like the suggestion for a common interface in the
drivers to support error handling for different platforms.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/6] PCI Express Advanced Error Reporting Driver

2005-03-14 Thread Nguyen, Tom L
On Saturday, March 12, 2005 1:38 AM Andi Kleen wrote:
>I haven't read your code in detail, just a high level remark.
>
>> +6. Enabling AER Aware Support in PCI Express Device Driver
>> +
>> +To enable AER aware support requires a software driver to configure
>> +the AER capability structure within its device, to initialize its
AER
>> +aware callback handle and to call pcie_aer_register. Sections 6.1,
>> +6.2, and 6.3 describe how to enable AER aware support in details.
>
>There is currently discussion underway for a generic portable PCI 
>error reporting interface for drivers. This is already being worked
>on by some PPC64 and IA64 people. I don't think it would be a good idea
>to add another incompatible PCI-E specific interface.
>
>So I would recommend to not apply pcie_aer_register() et.al.
>and coordinate with the others working on this area on a common
>interface.
>
>This would only impact the device driver interface; having
>a PCI Express specific interface in sysfs is probably ok.
>
>Otherwise we would end up with tons of ifdefs in the drivers
>supporting multiple error reporting interfaces for different platforms,

>which would be bad.
>
>Also in general I think the necessary callbacks should
>be part of the basic device; not provided in a separate structure.

Agree. We will coordinate with the others working on this area on a
common interface.

Thanks for your suggestions,
Long

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 0/6] PCI Express Advanced Error Reporting Driver

2005-03-14 Thread Nguyen, Tom L
On Friday, March 11, 2005 11:21 PM Greg KH wrote:
>> 
>> -Report the errors to user.
>>
>This is done through the syslog, right?  Is that acceptable?

Reporting the errors to user can be written automatically to
/var/log/messages or be manually consumed through the syslog. I am not
sure whether it is acceptable or not, but I like your below suggestion. 

>It looks like you are logging a lot of stuff, all without a kernel log
>level, which is going to really mess up syslog parsers.
>
>Have you thought about just providing userspace with access to the
error
>message, in binary form, from a sysfs file, and causing a kevent to
wake
>userspace up to know to read from the file?  That way all of the
parsing
>of the error log can be done in userspace, and there is no formatting
of
>the messages from within the kernel.

Again, I like this suggestion.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/6] PCI Express Advanced Error Reporting Driver

2005-03-14 Thread Nguyen, Tom L
On Friday, March 11, 2005 11:25 PM Greg KH wrote:
>> +static ssize_t aer_sysfs_consume_show(struct device_driver *dev,
char >>*buf)
>> +{
>> +return aer_fsprint_record(buf);
>> +}
>> +
>> +static ssize_t aer_sysfs_status_show(struct device_driver *dev, char
>>*buf)
>> +{
>> +return aer_fsprint_devices(buf);
>> +}
>> +
>
>Why call wrapper functions that only do one thing?  Why have this extra
>layer of indirection that is not needed from what I can tell?

Agree, will make changes appropriately. Thanks for your comment.

>> +static ssize_t aer_sysfs_verbose_show(struct device_driver *dev,
char >>*buf)
>> +{
>> +return sprintf(buf, "Verbose display set to %d\n", 
>> +aer_get_verbose()); 
>> +}
>>
>Just echo the value, don't print out pretty strings :)

Agree, will make changes appropriately.

>> +static ssize_t aer_sysfs_verbose_store(struct device_driver *drv,   
>> +const char *buf, size_t count)  
>> +{
>> +aer_set_verbose(buf[0] - 0x30); 
>> +return count;   
>> +}
>>
>Oh, that's a problem waiting to happen... Please validate the user
>provided value before acting on it.

Agree, will make changes appropriately.

>> +static ssize_t aer_sysfs_auto_show(struct device_driver *dev, char
*buf)
>> +{
>> +return sprintf(buf, "Automatic reporting is %s\n",  
>> +(aer_get_auto_mode()) ? "on" : "off");

>> +}
>>
>Again, just print on/off.

Agree, will make changes appropriately.

>> +static ssize_t aer_sysfs_auto_store(struct device_driver *drv,  
>> +const char *buf, size_t count)

>> +{
>> +aer_set_auto_mode(buf[0] - 0x30);   
>> +return count;   
>> +}
>>
>Also validate this.

Agree, will make changes appropriately.

Thanks for all comments,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [2.6 patch] drivers/pci/pci-acpi.c: possible cleanups

2005-03-14 Thread Nguyen, Tom L
On Saturday, March 12, 2005 2:27 PM Adrian Bunk wrote:
> This patch contains the following possible cleanups:
> - pci-acpi.c: make OSC_UUID static
> - remove the following unused functions:
>  - pci-acpi.c: acpi_query_osc
>  - pci-acpi.c: pci_osc_support_set
> - remove the following unneeded EXPORT_SYMBOL's:
>  - pci-acpi.c: pci_osc_support_set
>
>Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Please do not remove. These above functions, implemented for _OSC usage,
are used by PCI Express Native Hot Plug and PCI Express Advanced Error
Reporting drivers.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/6] PCI Express Advanced Error Reporting Driver

2005-03-14 Thread Nguyen, Tom L
Monday, March 14, 2005 3:01 AM David Vrabel wrote:

>> This patch includes PCIEAER-HOWTO.txt, which describes how the PCI
>> Express Advanced Error Reporting Root driver works.
>>
>> --- linux-2.6.11-rc5/Documentation/PCIEAER-HOWTO.txt
>>
>Could this be placed in a sub-system subdirectory (creating one if
>necessary, e.g., pci/)?  The root of Documentation/ is rather full of
>random files as is.

Most of the HOWTO documents are under Documentation/ directory. I have
no problem of placing it in a sub-system subdirectory if it is OK with
Linux community?

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 3/6] PCI Express Advanced Error Reporting Driver

2005-03-14 Thread Nguyen, Tom L
On Friday, March 11, 2005 11:28 PM Greg KH wrote:
>> +
>> +LIST_HEAD(evt_queue);   /* Define Event Queue
List */
>
>Make this static?

Agree, will make this static. Good comment! Thanks.


>> +
>> +/**
>> + * evt_queue_pop - restore an event node from an event log list 
>> + * @where: either from top or bottom of a list
>> + *
>> + * Invoked when an error being consumed
>> + **/
>> +static struct event_node* evt_queue_pop(int where)
>> +{
>> +struct list_head *head = &evt_queue;
>> +struct event_node *evt_node = NULL;
>> +
>> +if (!list_empty(head)) {
>> +head = ((where == GET_ERR_RECORD_TOP) ? head->prev :
head->>next);
>> +evt_node = container_of(head, struct event_node,
e_node);
>> +list_del(&evt_node->e_node);
>> +records--;
>> +}
>> +
>> +return evt_node;
>> +}
>
>The lock is not held in the pop, like it is in the push function.  Any
>reason for this?

The lock is held by its caller, consume_record. I can add a comment at
this function to make it clear.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] PCI Express Advanced Error Reporting Driver

2005-03-11 Thread Nguyen, Tom L

On Thursday, March 10, 2005 9:23 PM Greg KH wrote:
>> On Thu, Mar 10, 2005 at 03:04:18PM -0800, long wrote:
>> PCI Express error signaling can occur on the PCI Express link itself
>> or on behalf of transactions initiated on the link. PCI Express
>> defines the Advanced Error Reporting capability, which is implemented

>> with a PCI Express advanced error reporting extended capability
>> structure, to provide more robust error reporting. With the Advanced
>> Error Reporting capability a PCI Express component, which detects an
>> error, can send an error message to the Root Port associated with
>> its hierarchy.  

>This patch was too big for lkml, and should also be sent to the
>linux-pci list.  Care to split it up and resend it?

Will split it up and resend.

> Also, how does this tie into the recent discussion about pci error
> recovery?

The standard PCI Specification calls out SERR and PERR. I am not sure
about the recent discussion of PCI error of recovery. It is perhaps
regarding the possibility of recovering from a PERR or SERR. However,
PCI Express error occurs on the PCI Express link or on behalf of
transactions occurred on the PCI Express link. PCI Express component,
which implements PCI Express Advanced Error Reporting Capability, sends
error message to the Root Port to indicate error occurred on the PCI
Express link where it is connected. The PCI Express error recovery is on
behalf of attempting to do a PCI Express link recovery, not PCI error
recovery. It appears that PCI Express AER is disjoint from PCI error
recovery.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: PCI Express MSI in kernel 2.4 ?

2005-01-26 Thread Nguyen, Tom L
On Tuesday, January 25, 2005 10:59 PM Philippe Marteau wrote:
> I saw that there is an implementation of MSI in the Linux kernel 2.6 
> stream.
>
> Is there a possibility to port this in the 2.4 kernel tree (we are at 
> this time using 2.4.17) ?
>
> Is this already foreseen and planned and when ?
There is no patch for 2.4 and we have no plans to do one.

> Is this MSI feature already used out there ? on which target processor

> and southbridge ?
Chipsets have been shipping with MSI support for several years. The
major dependency is if the PCI card you want to enable supports MSI.

Thanks,
Long
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/