from:"Linas Vepstas"

Re: [PATCH 3/4] Documentation: PCI: pci-error-recovery: drop doubled words

2020-07-04 Thread Linas Vepstas

Acked-by: Linas Vepstas 

for this and the other patches in the series.


On Fri, Jul 3, 2020 at 4:22 PM Randy Dunlap  wrote:
>
> Drop the doubled word "the".
>
> Signed-off-by: Randy Dunlap 
> Cc: Jonathan Corbet 
> Cc: linux-...@vger.kernel.org
> Cc: Bjorn Helgaas 
> Cc: Linas Vepstas 
> Cc: linux-...@vger.kernel.org
> ---
>  Documentation/PCI/pci-error-recovery.rst |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- linux-next-20200701.orig/Documentation/PCI/pci-error-recovery.rst
> +++ linux-next-20200701/Documentation/PCI/pci-error-recovery.rst
> @@ -248,7 +248,7 @@ STEP 4: Slot Reset
>  --
>
>  In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
> -the platform will perform a slot reset on the requesting PCI device(s).
> +platform will perform a slot reset on the requesting PCI device(s).
>  The actual steps taken by a platform to perform a slot reset
>  will be platform-dependent. Upon completion of slot reset, the
>  platform will call the device slot_reset() callback.



-- 
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
--Peter da Silva

Re: [PATCH] pci-error-recovery: doc cleanup

2017-03-21 Thread Linas Vepstas

Hi,

On Tue, Mar 21, 2017 at 9:24 PM, Cao jin  wrote:
> Include whitespace shooting; correction; typo fix; superfluous word
> dropping.

>
> diff --git a/Documentation/PCI/pci-error-recovery.txt 
> b/Documentation/PCI/pci-error-recovery.txt
> index da3b217..0b6bb3e 100644
> --- a/Documentation/PCI/pci-error-recovery.txt
> +++ b/Documentation/PCI/pci-error-recovery.txt
>
> @@ -231,14 +231,14 @@ proceeds to STEP 4 (Slot Reset)
>  STEP 3: Link Reset
>  --
>  The platform resets the link.  This is a PCI-Express specific step
> -and is done whenever a non-fatal error has been detected that can be
> +and is done whenever a fatal error has been detected that can be
>  "solved" by resetting the link.

First: I thought I saw a patch a few months ago that proposed removing
the link rest step.  I don't know if the patch was accepted or not.

If link resets are still supported, then they can only fix NON-fatal errors:
basically, one resets the link, and only the link; one does NOT reset
either the device driver, nor the device state. The idea is that after a link
reset, communications with the device can immediately resume right
where it left off. (this can be hard in practice, if the driver/firmware doesn't
know what it was doing when the error occurred.  this might be why no one
implements it.)  Anyway, the whole point of a link reset is that it is
explicitly
a non-fatal error.

--linas

Re: [PATCH] pci-error-recovery: doc cleanup

2017-03-21 Thread Linas Vepstas

Hi,

On Tue, Mar 21, 2017 at 9:24 PM, Cao jin  wrote:
> Include whitespace shooting; correction; typo fix; superfluous word
> dropping.

>
> diff --git a/Documentation/PCI/pci-error-recovery.txt 
> b/Documentation/PCI/pci-error-recovery.txt
> index da3b217..0b6bb3e 100644
> --- a/Documentation/PCI/pci-error-recovery.txt
> +++ b/Documentation/PCI/pci-error-recovery.txt
>
> @@ -231,14 +231,14 @@ proceeds to STEP 4 (Slot Reset)
>  STEP 3: Link Reset
>  --
>  The platform resets the link.  This is a PCI-Express specific step
> -and is done whenever a non-fatal error has been detected that can be
> +and is done whenever a fatal error has been detected that can be
>  "solved" by resetting the link.

First: I thought I saw a patch a few months ago that proposed removing
the link rest step.  I don't know if the patch was accepted or not.

If link resets are still supported, then they can only fix NON-fatal errors:
basically, one resets the link, and only the link; one does NOT reset
either the device driver, nor the device state. The idea is that after a link
reset, communications with the device can immediately resume right
where it left off. (this can be hard in practice, if the driver/firmware doesn't
know what it was doing when the error occurred.  this might be why no one
implements it.)  Anyway, the whole point of a link reset is that it is
explicitly
a non-fatal error.

--linas

Re: [PATCH] pci-error-recover: doc cleanup

2016-12-08 Thread Linas Vepstas

On Fri, Dec 9, 2016 at 2:37 PM, Cao jin <caoj.f...@cn.fujitsu.com> wrote:
>
>
> On 12/09/2016 02:24 PM, Linas Vepstas wrote:
>> I suppose I'm confused, but I recall that link resets are non-fatal.
>> Fatal errors typically require that the the pci adapter be completely
>> reset, any adapter firmware to be reloaded from scratch, the device
>> driver has to kill all device state and start from scratch. Its huge.
>> If the fatal error is on pci device that is under a block device
>> holding a file system, then (usually) there is no way to recover,
>> because the block layer (and file system) cannot deal with a block
>> device that disappeared and then reappeared some few seconds later.
>> (maybe some future zfs or lvm or btrfs might be able to deal with
>> this, but not today)
>>
>> By contrast, link resets are far more gentle: the device driver might
>> have to discard some half-full FIFO's, or cancel some in-flight
>> commands, but can otherwise gracefully recover without telling the
>> higher layers that there were any problems.
>>
>> --linas
>>
>
> I am little confused too, even not sure if we are talking the same
> *fatal error*, I am talking the fatal error defined in PCI Express spec,
> chapter 6.2.2.2.1:
>
> Fatal errors are uncorrectable error conditions which render the
> particular Link and related hardware unreliable. For Fatal errors, a
> reset of the components on the Link may be required to return to
> reliable operation. Platform handling of Fatal errors, and any efforts
> to limit the effects of these errors, is platform implementation specific.
>
> Link reset means set *secondary bus reset* bit in pci bridge config
> space, can reset the link and device simultaneously, is the strongest
> kind of reset as I know.

OK, well, its been far too many years, and I don't have the PCI spec
at my fingertips.
Isn't there a link reset that can be performed, without forcing a device reset?

The intent was that some PCI link errors are due to vibration,
ground-bounce, humidity, etc. and that these errors can be detected
and do not corrupt the device state or the device driver state.  Since
they are not associated with data corruption (or rather, the
corruption is local to the link), these can be recovered by reseting
just the link, without resetting the whole adapter. They may require
reseting some device-driver state, but not all of it.

However, this was all decided before the PCI-E spec was written, so
maybe the newer PCI-E specs now say something different.

--linas

>
>> On Thu, Dec 8, 2016 at 10:13 PM, Cao jin <caoj.f...@cn.fujitsu.com> wrote:
>>>
>>>
>>> On 12/08/2016 10:05 PM, Jonathan Corbet wrote:
>>>> On Thu, 8 Dec 2016 16:16:14 +0800
>>>> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
>>>>
>>>>>  The platform resets the link, and then calls the link_reset() callback
>>>>>  on all affected device drivers.  This is a PCI-Express specific state
>>>>> -and is done whenever a non-fatal error has been detected that can be
>>>>> +and is done whenever a fatal error has been detected that can be
>>>>>  "solved" by resetting the link. This call informs the driver of the
>>>>
>>>> As far as I can tell, the original text was correct here; why do you
>>>> think this change needs to be made?
>>>>
>>>
>>> See do_recovery() in aer core, reset_link() is called only seeing fatal
>>> error.
>>>
>>> --
>>> Sincerely,
>>> Cao jin
>>>
>>>
>>
>>
>>
>
> --
> Sincerely,
> Cao jin
>
>

Re: [PATCH] pci-error-recover: doc cleanup

2016-12-08 Thread Linas Vepstas

On Fri, Dec 9, 2016 at 2:37 PM, Cao jin  wrote:
>
>
> On 12/09/2016 02:24 PM, Linas Vepstas wrote:
>> I suppose I'm confused, but I recall that link resets are non-fatal.
>> Fatal errors typically require that the the pci adapter be completely
>> reset, any adapter firmware to be reloaded from scratch, the device
>> driver has to kill all device state and start from scratch. Its huge.
>> If the fatal error is on pci device that is under a block device
>> holding a file system, then (usually) there is no way to recover,
>> because the block layer (and file system) cannot deal with a block
>> device that disappeared and then reappeared some few seconds later.
>> (maybe some future zfs or lvm or btrfs might be able to deal with
>> this, but not today)
>>
>> By contrast, link resets are far more gentle: the device driver might
>> have to discard some half-full FIFO's, or cancel some in-flight
>> commands, but can otherwise gracefully recover without telling the
>> higher layers that there were any problems.
>>
>> --linas
>>
>
> I am little confused too, even not sure if we are talking the same
> *fatal error*, I am talking the fatal error defined in PCI Express spec,
> chapter 6.2.2.2.1:
>
> Fatal errors are uncorrectable error conditions which render the
> particular Link and related hardware unreliable. For Fatal errors, a
> reset of the components on the Link may be required to return to
> reliable operation. Platform handling of Fatal errors, and any efforts
> to limit the effects of these errors, is platform implementation specific.
>
> Link reset means set *secondary bus reset* bit in pci bridge config
> space, can reset the link and device simultaneously, is the strongest
> kind of reset as I know.

OK, well, its been far too many years, and I don't have the PCI spec
at my fingertips.
Isn't there a link reset that can be performed, without forcing a device reset?

The intent was that some PCI link errors are due to vibration,
ground-bounce, humidity, etc. and that these errors can be detected
and do not corrupt the device state or the device driver state.  Since
they are not associated with data corruption (or rather, the
corruption is local to the link), these can be recovered by reseting
just the link, without resetting the whole adapter. They may require
reseting some device-driver state, but not all of it.

However, this was all decided before the PCI-E spec was written, so
maybe the newer PCI-E specs now say something different.

--linas

>
>> On Thu, Dec 8, 2016 at 10:13 PM, Cao jin  wrote:
>>>
>>>
>>> On 12/08/2016 10:05 PM, Jonathan Corbet wrote:
>>>> On Thu, 8 Dec 2016 16:16:14 +0800
>>>> Cao jin  wrote:
>>>>
>>>>>  The platform resets the link, and then calls the link_reset() callback
>>>>>  on all affected device drivers.  This is a PCI-Express specific state
>>>>> -and is done whenever a non-fatal error has been detected that can be
>>>>> +and is done whenever a fatal error has been detected that can be
>>>>>  "solved" by resetting the link. This call informs the driver of the
>>>>
>>>> As far as I can tell, the original text was correct here; why do you
>>>> think this change needs to be made?
>>>>
>>>
>>> See do_recovery() in aer core, reset_link() is called only seeing fatal
>>> error.
>>>
>>> --
>>> Sincerely,
>>> Cao jin
>>>
>>>
>>
>>
>>
>
> --
> Sincerely,
> Cao jin
>
>

Re: [PATCH] pci-error-recover: doc cleanup

2016-12-08 Thread Linas Vepstas

I suppose I'm confused, but I recall that link resets are non-fatal.
Fatal errors typically require that the the pci adapter be completely
reset, any adapter firmware to be reloaded from scratch, the device
driver has to kill all device state and start from scratch. Its huge.
If the fatal error is on pci device that is under a block device
holding a file system, then (usually) there is no way to recover,
because the block layer (and file system) cannot deal with a block
device that disappeared and then reappeared some few seconds later.
(maybe some future zfs or lvm or btrfs might be able to deal with
this, but not today)

By contrast, link resets are far more gentle: the device driver might
have to discard some half-full FIFO's, or cancel some in-flight
commands, but can otherwise gracefully recover without telling the
higher layers that there were any problems.

--linas

On Thu, Dec 8, 2016 at 10:13 PM, Cao jin  wrote:
>
>
> On 12/08/2016 10:05 PM, Jonathan Corbet wrote:
>> On Thu, 8 Dec 2016 16:16:14 +0800
>> Cao jin  wrote:
>>
>>>  The platform resets the link, and then calls the link_reset() callback
>>>  on all affected device drivers.  This is a PCI-Express specific state
>>> -and is done whenever a non-fatal error has been detected that can be
>>> +and is done whenever a fatal error has been detected that can be
>>>  "solved" by resetting the link. This call informs the driver of the
>>
>> As far as I can tell, the original text was correct here; why do you
>> think this change needs to be made?
>>
>
> See do_recovery() in aer core, reset_link() is called only seeing fatal
> error.
>
> --
> Sincerely,
> Cao jin
>
>

Re: [PATCH] pci-error-recover: doc cleanup

2016-12-08 Thread Linas Vepstas

I suppose I'm confused, but I recall that link resets are non-fatal.
Fatal errors typically require that the the pci adapter be completely
reset, any adapter firmware to be reloaded from scratch, the device
driver has to kill all device state and start from scratch. Its huge.
If the fatal error is on pci device that is under a block device
holding a file system, then (usually) there is no way to recover,
because the block layer (and file system) cannot deal with a block
device that disappeared and then reappeared some few seconds later.
(maybe some future zfs or lvm or btrfs might be able to deal with
this, but not today)

By contrast, link resets are far more gentle: the device driver might
have to discard some half-full FIFO's, or cancel some in-flight
commands, but can otherwise gracefully recover without telling the
higher layers that there were any problems.

--linas

On Thu, Dec 8, 2016 at 10:13 PM, Cao jin  wrote:
>
>
> On 12/08/2016 10:05 PM, Jonathan Corbet wrote:
>> On Thu, 8 Dec 2016 16:16:14 +0800
>> Cao jin  wrote:
>>
>>>  The platform resets the link, and then calls the link_reset() callback
>>>  on all affected device drivers.  This is a PCI-Express specific state
>>> -and is done whenever a non-fatal error has been detected that can be
>>> +and is done whenever a fatal error has been detected that can be
>>>  "solved" by resetting the link. This call informs the driver of the
>>
>> As far as I can tell, the original text was correct here; why do you
>> think this change needs to be made?
>>
>
> See do_recovery() in aer core, reset_link() is called only seeing fatal
> error.
>
> --
> Sincerely,
> Cao jin
>
>

Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-21 Thread Linas Vepstas

Zhang, Bjorn,

On 21 May 2013 10:41, Liu, Joseph  wrote:
> Bjorn,
>
>>> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
>>> index ed4d094..7abefd9 100644
>>> --- a/drivers/pci/pcie/portdrv_pci.c
>>> +++ b/drivers/pci/pcie/portdrv_pci.c
>>> @@ -332,13 +332,11 @@ static pci_ers_result_t 
>>> pcie_portdrv_slot_reset(struct pci_dev *dev)
>>>  pci_ers_result_t status = PCI_ERS_RESULT_RECOVERED;
>>>  int retval;
>>>
>>> -/* If fatal, restore cfg space for possible link reset at upstream */
>>> -if (dev->error_state == pci_channel_io_frozen) {
>>> -dev->state_saved = true;
>>> -pci_restore_state(dev);
>>> -pcie_portdrv_restore_config(dev);
>>> -pci_enable_pcie_error_reporting(dev);
>>> -}
>>> +/* restore cfg space for possible link reset at upstream */
>>> +dev->state_saved = true;
>>> +pci_restore_state(dev);
>>> +pcie_portdrv_restore_config(dev);
>>> +pci_enable_pcie_error_reporting(dev);
>>>
>>>  /* get true return value from  */
>>>  retval = device_for_each_child(>dev, , slot_reset_iter);
>>
>>I think this patch changes the behavior in the case of a non-fatal error
>>where one of the .error_detected() methods returned
>>PCI_ERS_RESULT_NEED_RESET.  In that case, pcie_portdrv_slot_reset()
>>previously did not restore config space, but after your patch, it *will*
>>restore it.  We need an explanation of why this is safe.
>
> Here is my understanding of this part of the patch:
>
> I think your concern here should not be an issue. Whether it is a non-fatal 
> error or a fatal error, as long as one of the .error_detected() method from 
> the downstream drivers involved returns a PCI_ERS_RESULT_NEED_RESET, it 
> should trump all others and a slot reset should be performed, even though it 
> was originally due to a non-fatal error reported or only one of the 
> downstream drivers requests it. In the case of AER driver, this should be 
> implemented in the broadcast_error_message() with pci_walk_bus() by passing 
> in the report_error_detected() function, and merging the results into the 
> result_data->result...
>
> In any case, the fact that this pcie_portdrv_slot_reset() being invoked 
> should be considered as a aftermath of a slot reset has been performed, thus 
> the restore of config space should be performed after the reset. I suppose 
> the restore should be to the same state as fresh power-on states, right?


Again, I think Joe has it exactly right.   The patch, I'm not so sure.
 In my earlier emails, I was assuming that the device has just gotten
either a hard reset (power has been turned off-n-on, e.g. using
pci-hotplug, or a 'soft reset' (whatever that means :-)).   If the
adapter has been reset, then it is appropriate to restore pci config
space to something resembling a fresh power-on.

If the adapter has NOT been reset, then, ummm .. changing
('restoring') the config space would be wrong ... if I recall
correctly, a pci link reset does not whack the config space, and if
the device itself has not been whacked, then all the contents of the
config space (dma mappings, etc) are all still valid, and should not
be changed!

So, the restore of the config space should be conditional, depending
on whether the device has been reset or not.

-- Linas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-21 Thread Linas Vepstas

Hi,

On 21 May 2013 02:49, Yanmin Zhang  wrote:
> On Mon, 2013-05-20 at 10:37 -0500, Linas Vepstas wrote:

>>  My impression
>> is that maybe the AER driver had been doing not quite the right thing
>> for a long time.
> Pls. provide evidence/facts. The new patch is to facilitate device driver
> implementation. It doesn't mean current AER driver is incorrect. We need
> a tradeoff.
>
> Just like what Bjoin says, we shouldn't change error_state to 
> pci_channel_io_normal
> before we really recover the hardware. The patch changes it just because
> drivers might call some functions to recover the devices, while such functions
> need (error_state==pci_channel_io_normal).

Perhaps we are talking past each other.  One needs to set error_state
== pci_channel_io_normal before calling slot_reset().  If the aer
driver wasn't doing this all along, then it seems like an old bug to
me.   The error_state flag indicates the status of the PCI channel,
and not the status of the attached device.  Once the channel has been
reset, the error state is "normal" even if the card itself hasn't yet
been brought up.

Whatever it is that the aer driver is doing, it should be doing
something similar to what the eeh driver is doing, in
arch/powerpc/platforms/pseries/eeh_driver.c  -- This is the "reference
implementation" -- Its known right, it was and continues to be heavily
tested, has found its way into sriov, etc.

The one thing that eeh does NOT do is to handle suspend/sleep states.
The basic design assumption back then was that no one would ever
suspend/sleep their server.  Since suspend/sleep messes with PCI
config space, then, yes, one would need to somehow save a second,
pristine copy of config space for device recovery.

-- Linas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-21 Thread Linas Vepstas

Hi,

On 21 May 2013 02:49, Yanmin Zhang yanmin_zh...@linux.intel.com wrote:
 On Mon, 2013-05-20 at 10:37 -0500, Linas Vepstas wrote:

  My impression
 is that maybe the AER driver had been doing not quite the right thing
 for a long time.
 Pls. provide evidence/facts. The new patch is to facilitate device driver
 implementation. It doesn't mean current AER driver is incorrect. We need
 a tradeoff.

 Just like what Bjoin says, we shouldn't change error_state to 
 pci_channel_io_normal
 before we really recover the hardware. The patch changes it just because
 drivers might call some functions to recover the devices, while such functions
 need (error_state==pci_channel_io_normal).

Perhaps we are talking past each other.  One needs to set error_state
== pci_channel_io_normal before calling slot_reset().  If the aer
driver wasn't doing this all along, then it seems like an old bug to
me.   The error_state flag indicates the status of the PCI channel,
and not the status of the attached device.  Once the channel has been
reset, the error state is normal even if the card itself hasn't yet
been brought up.

Whatever it is that the aer driver is doing, it should be doing
something similar to what the eeh driver is doing, in
arch/powerpc/platforms/pseries/eeh_driver.c  -- This is the reference
implementation -- Its known right, it was and continues to be heavily
tested, has found its way into sriov, etc.

The one thing that eeh does NOT do is to handle suspend/sleep states.
The basic design assumption back then was that no one would ever
suspend/sleep their server.  Since suspend/sleep messes with PCI
config space, then, yes, one would need to somehow save a second,
pristine copy of config space for device recovery.

-- Linas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-21 Thread Linas Vepstas

Zhang, Bjorn,

On 21 May 2013 10:41, Liu, Joseph joseph@emulex.com wrote:
 Bjorn,

 diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
 index ed4d094..7abefd9 100644
 --- a/drivers/pci/pcie/portdrv_pci.c
 +++ b/drivers/pci/pcie/portdrv_pci.c
 @@ -332,13 +332,11 @@ static pci_ers_result_t 
 pcie_portdrv_slot_reset(struct pci_dev *dev)
  pci_ers_result_t status = PCI_ERS_RESULT_RECOVERED;
  int retval;

 -/* If fatal, restore cfg space for possible link reset at upstream */
 -if (dev-error_state == pci_channel_io_frozen) {
 -dev-state_saved = true;
 -pci_restore_state(dev);
 -pcie_portdrv_restore_config(dev);
 -pci_enable_pcie_error_reporting(dev);
 -}
 +/* restore cfg space for possible link reset at upstream */
 +dev-state_saved = true;
 +pci_restore_state(dev);
 +pcie_portdrv_restore_config(dev);
 +pci_enable_pcie_error_reporting(dev);

  /* get true return value from status */
  retval = device_for_each_child(dev-dev, status, slot_reset_iter);

I think this patch changes the behavior in the case of a non-fatal error
where one of the .error_detected() methods returned
PCI_ERS_RESULT_NEED_RESET.  In that case, pcie_portdrv_slot_reset()
previously did not restore config space, but after your patch, it *will*
restore it.  We need an explanation of why this is safe.

 Here is my understanding of this part of the patch:

 I think your concern here should not be an issue. Whether it is a non-fatal 
 error or a fatal error, as long as one of the .error_detected() method from 
 the downstream drivers involved returns a PCI_ERS_RESULT_NEED_RESET, it 
 should trump all others and a slot reset should be performed, even though it 
 was originally due to a non-fatal error reported or only one of the 
 downstream drivers requests it. In the case of AER driver, this should be 
 implemented in the broadcast_error_message() with pci_walk_bus() by passing 
 in the report_error_detected() function, and merging the results into the 
 result_data-result...

 In any case, the fact that this pcie_portdrv_slot_reset() being invoked 
 should be considered as a aftermath of a slot reset has been performed, thus 
 the restore of config space should be performed after the reset. I suppose 
 the restore should be to the same state as fresh power-on states, right?


Again, I think Joe has it exactly right.   The patch, I'm not so sure.
 In my earlier emails, I was assuming that the device has just gotten
either a hard reset (power has been turned off-n-on, e.g. using
pci-hotplug, or a 'soft reset' (whatever that means :-)).   If the
adapter has been reset, then it is appropriate to restore pci config
space to something resembling a fresh power-on.

If the adapter has NOT been reset, then, ummm .. changing
('restoring') the config space would be wrong ... if I recall
correctly, a pci link reset does not whack the config space, and if
the device itself has not been whacked, then all the contents of the
config space (dma mappings, etc) are all still valid, and should not
be changed!

So, the restore of the config space should be conditional, depending
on whether the device has been reset or not.

-- Linas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-20 Thread Linas Vepstas

On 20 May 2013 12:21, Bjorn Helgaas  wrote:
> On Fri, May 17, 2013 at 5:56 PM, Rafael J. Wysocki  wrote:
>> On Friday, May 17, 2013 05:43:33 PM Bjorn Helgaas wrote:
>>> On Fri, Apr 26, 2013 at 12:28 AM, Zhang, LongX  
>>> wrote:
>
>>> > +   /* restore cfg space for possible link reset at upstream */
>>> > +   dev->state_saved = true;
>>>
>>> "dev->state_saved == true" means that the dev->saved_config_space
>>> contains valid data.  Why do we know that's the case here?  I see that
>>> pcie_portdrv_probe() calls pci_save_state() when we first claim the
>>> port, and I guess we're assuming the state saved then is still valid.

Yes, see below.

>>> But why do we need to actually set dev->state_saved here?  Shouldn't
>>> it be already set to true anyway?
>>
>> This is a dirty trick to make pci_restore_state(dev) always work here
>> (because it checks dev->state_saved and does nothing if it isn't set).
>> I suppose.
>
> Yes, I did investigate enough to see that this is a dirty trick.  My
> question is how we know it's safe to do this dirty trick.
>
>>> > +   pci_restore_state(dev);
>>> > +   pcie_portdrv_restore_config(dev);

Lets review what is supposed to happen:

-- poweron
-- BIOS/firmware/bootloader maybe fiddles with PCI config space
-- kernel fiddles with PCI config space during boot
-- device driver sets up PCI config space correctly for normal i/o
-- PCI error occurs
-- PCI port/link is reset

At this point, we want to set the PCI config space to something
resembling what it was just before the device driver first touched it
after poweron.  Why?  Because the device driver will typically set up
DMA segments, and tweak other settings as it initializes the card.  It
needs to do almost exactly the same things again, when re-initializing
the device after the error reset.  Thus, the PCI config needs to be
appropriate for a fresh initialization of the card.

If  some other variant of PCI config was loaded here, it might contain
incorrect DMA mappings or other settings.  In particular, while the
driver is initializing the card, you risk having the card run away and
start doing things based on these incorrect settings -- provoking
another error or maybe just silently corrupting data.   The config
that you want to load should be more-or-less the same one that was
there during kernel boot, before the device-driver started touching
things.

The "dirty hack" is weird:  either there's valid data, and the flag is
set, or the data is garbage and the flag isn't set ... how could it be
otherwise?

-- Linas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-20 Thread Linas Vepstas

I think Joe Liu has it right.  I'm going to top-post because things
are a bit tangled below.  I urge a review of
/Documentation/PCI/pci-error-recovery.txt, as that gives the details.

The intended sequence is that, after an error, the device driver gets
a shot at running some diagnostics & dumps, and then the pci
bridges/controllers/ports/links are reset (by platform code, viz. aer
in this case) to a state resembling a fresh power-on.  Then the
.slot_reset() callback is called on the device driver, to tell the
driver "hey everything upstream is now working, go set yourself up for
normal ops."  Thus, in particular, one should have pdev->error_state =
pci_channel_io_normal; before .slot_reset() is called, and the PCI
config space contents should resemble a fresh power-on state (and
**NOT** some other saved state!)

If the device driver wanted to leave the card in a dead state, it had
several opportunities to say so, earlier in the callback sequence.  If
the driver wanted to fiddle with the card with the old PCI config
space, it already had a chance to do that too, before the
bridges/controllers/ports/links were fully reset by the platform.   By
the time that .slot_reset() is being called, both the platform and the
device driver are expecting smooth sailing ahead.

So, looking at the original patch, it seems reasonable.  My impression
is that maybe the AER driver had been doing not quite the right thing
for a long time.

-- Linas Vepstas

On 20 May 2013 09:38, Liu, Joseph  wrote:
> Bjorn,
>
> I have been working with the low level device drivers for supporting both EEH 
> and AER and were involved in working with Yanmin for verifying this AER 
> patch. Please see some of my responses to your questions inline, prefixed 
> with [jzl].
>
>
> Thanks,
> Joe Liu, Ph.D.
> Senior Principal Engineer
> Emulex Corporation
>
> -Original Message-
> From: Bjorn Helgaas [mailto:bhelg...@google.com]
> Sent: Friday, May 17, 2013 7:44 PM
> To: Zhang, LongX
> Cc: linasveps...@gmail.com; linux-...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; yanmin_zh...@linux.intel.com; Liu, Joseph; 
> Rafael J. Wysocki
> Subject: Re: Subject : [ PATCH ] 
> pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset
>
> [+cc Rafael because he knows about dev->state_saved]
>
> Sorry, I'm not very familiar with AER, so please excuse some naive
> questions below.
>
> On Fri, Apr 26, 2013 at 12:28 AM, Zhang, LongX  wrote:
>> From: Zhang Long 
>>
>> Specific pci device drivers might have many functions to call
>> pci_channel_offline to check device states. When slot_reset happens,
>> drivers' slot_reset callback might call such functions and eventually
>> abort the reset.
>
> |Where does this happen?  I looked at all the references to
> |dev->error_state and all the callers of pci_channel_offline(), and I
> |didn't see any in .slot_reset() methods.
> |
> |(There are *assignments* to dev->error_state in qlcnic_attach_func(),
> |qlge_io_slot_reset(), and qla2xxx_pci_slot_reset().  You might be able
> |to remove those assignments after this patch, but this patch wouldn't
> |really change anything for those paths.)
>
> [jzl] Although low level driver might not call pci_channel_offline() directly 
> in .slot_reset() method itself, it does not mean it will not call it during 
> the device error recovery execution of .slot_reset() method. The 
> .slot_reset() method needs to call some of its bottom layer routines 
> interfacing with device PCI interface for preparing the PCI device from 
> recovery of PCI error (EEH or AER). As a matter of fact, it seems that 
> qla2xxx_pci_slot_reset() routine's assignments to dev->error was an attempt 
> to work around this AER driver issue that this patch is trying to resolve. 
> From upstream kernel 3.9.2's qla2xxx_pci_slot_reset(), it has the assignment 
> and comment below:
>
> static pci_ers_result_t
> qla2xxx_pci_slot_reset(struct pci_dev *pdev)
> {
> pci_ers_result_t ret = PCI_ERS_RESULT_DISCONNECT;
> scsi_qla_host_t *base_vha = pci_get_drvdata(pdev);
> struct qla_hw_data *ha = base_vha->hw;
> struct rsp_que *rsp;
> int rc, retries = 10;
>
> ql_dbg(ql_dbg_aer, base_vha, 0x9004,
> "Slot Reset.\n");
>
> /* Workaround: qla2xxx driver which access hardware earlier
>  * needs error state to be pci_channel_io_online.
>  * Otherwise mailbox command timesout.
>  */
> pdev->error_state = pci_channel_io_normal;
>
>
>
>> The patch resets pdev->error_state to pci_channel_io_normal at
>> the begining of report_slot_reset.
>
>> Signed-off-by: Zhang Yanmin 
>> Signed-off-by: Zhang Long 
>

Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-20 Thread Linas Vepstas

I think Joe Liu has it right.  I'm going to top-post because things
are a bit tangled below.  I urge a review of
/Documentation/PCI/pci-error-recovery.txt, as that gives the details.

The intended sequence is that, after an error, the device driver gets
a shot at running some diagnostics  dumps, and then the pci
bridges/controllers/ports/links are reset (by platform code, viz. aer
in this case) to a state resembling a fresh power-on.  Then the
.slot_reset() callback is called on the device driver, to tell the
driver hey everything upstream is now working, go set yourself up for
normal ops.  Thus, in particular, one should have pdev-error_state =
pci_channel_io_normal; before .slot_reset() is called, and the PCI
config space contents should resemble a fresh power-on state (and
**NOT** some other saved state!)

If the device driver wanted to leave the card in a dead state, it had
several opportunities to say so, earlier in the callback sequence.  If
the driver wanted to fiddle with the card with the old PCI config
space, it already had a chance to do that too, before the
bridges/controllers/ports/links were fully reset by the platform.   By
the time that .slot_reset() is being called, both the platform and the
device driver are expecting smooth sailing ahead.

So, looking at the original patch, it seems reasonable.  My impression
is that maybe the AER driver had been doing not quite the right thing
for a long time.

-- Linas Vepstas

On 20 May 2013 09:38, Liu, Joseph joseph@emulex.com wrote:
 Bjorn,

 I have been working with the low level device drivers for supporting both EEH 
 and AER and were involved in working with Yanmin for verifying this AER 
 patch. Please see some of my responses to your questions inline, prefixed 
 with [jzl].


 Thanks,
 Joe Liu, Ph.D.
 Senior Principal Engineer
 Emulex Corporation

 -Original Message-
 From: Bjorn Helgaas [mailto:bhelg...@google.com]
 Sent: Friday, May 17, 2013 7:44 PM
 To: Zhang, LongX
 Cc: linasveps...@gmail.com; linux-...@vger.kernel.org; 
 linux-kernel@vger.kernel.org; yanmin_zh...@linux.intel.com; Liu, Joseph; 
 Rafael J. Wysocki
 Subject: Re: Subject : [ PATCH ] 
 pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

 [+cc Rafael because he knows about dev-state_saved]

 Sorry, I'm not very familiar with AER, so please excuse some naive
 questions below.

 On Fri, Apr 26, 2013 at 12:28 AM, Zhang, LongX longx.zh...@intel.com wrote:
 From: Zhang Long longx.zh...@intel.com

 Specific pci device drivers might have many functions to call
 pci_channel_offline to check device states. When slot_reset happens,
 drivers' slot_reset callback might call such functions and eventually
 abort the reset.

 |Where does this happen?  I looked at all the references to
 |dev-error_state and all the callers of pci_channel_offline(), and I
 |didn't see any in .slot_reset() methods.
 |
 |(There are *assignments* to dev-error_state in qlcnic_attach_func(),
 |qlge_io_slot_reset(), and qla2xxx_pci_slot_reset().  You might be able
 |to remove those assignments after this patch, but this patch wouldn't
 |really change anything for those paths.)

 [jzl] Although low level driver might not call pci_channel_offline() directly 
 in .slot_reset() method itself, it does not mean it will not call it during 
 the device error recovery execution of .slot_reset() method. The 
 .slot_reset() method needs to call some of its bottom layer routines 
 interfacing with device PCI interface for preparing the PCI device from 
 recovery of PCI error (EEH or AER). As a matter of fact, it seems that 
 qla2xxx_pci_slot_reset() routine's assignments to dev-error was an attempt 
 to work around this AER driver issue that this patch is trying to resolve. 
 From upstream kernel 3.9.2's qla2xxx_pci_slot_reset(), it has the assignment 
 and comment below:

 static pci_ers_result_t
 qla2xxx_pci_slot_reset(struct pci_dev *pdev)
 {
 pci_ers_result_t ret = PCI_ERS_RESULT_DISCONNECT;
 scsi_qla_host_t *base_vha = pci_get_drvdata(pdev);
 struct qla_hw_data *ha = base_vha-hw;
 struct rsp_que *rsp;
 int rc, retries = 10;

 ql_dbg(ql_dbg_aer, base_vha, 0x9004,
 Slot Reset.\n);

 /* Workaround: qla2xxx driver which access hardware earlier
  * needs error state to be pci_channel_io_online.
  * Otherwise mailbox command timesout.
  */
 pdev-error_state = pci_channel_io_normal;



 The patch resets pdev-error_state to pci_channel_io_normal at
 the begining of report_slot_reset.

 Signed-off-by: Zhang Yanmin yanmin_zh...@linux.intel.com
 Signed-off-by: Zhang Long longx.zh...@intel.com
 ---
  drivers/pci/pcie/aer/aerdrv_core.c |1 +
  drivers/pci/pcie/portdrv_pci.c |   12 +---
  2 files changed, 6 insertions(+), 7 deletions(-)

 diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
 b/drivers/pci/pcie/aer/aerdrv_core.c
 index 564d97f..c61fd44 100644
 --- a/drivers/pci/pcie/aer/aerdrv_core.c

Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-20 Thread Linas Vepstas

On 20 May 2013 12:21, Bjorn Helgaas bhelg...@google.com wrote:
 On Fri, May 17, 2013 at 5:56 PM, Rafael J. Wysocki r...@sisk.pl wrote:
 On Friday, May 17, 2013 05:43:33 PM Bjorn Helgaas wrote:
 On Fri, Apr 26, 2013 at 12:28 AM, Zhang, LongX longx.zh...@intel.com 
 wrote:

  +   /* restore cfg space for possible link reset at upstream */
  +   dev-state_saved = true;

 dev-state_saved == true means that the dev-saved_config_space
 contains valid data.  Why do we know that's the case here?  I see that
 pcie_portdrv_probe() calls pci_save_state() when we first claim the
 port, and I guess we're assuming the state saved then is still valid.

Yes, see below.

 But why do we need to actually set dev-state_saved here?  Shouldn't
 it be already set to true anyway?

 This is a dirty trick to make pci_restore_state(dev) always work here
 (because it checks dev-state_saved and does nothing if it isn't set).
 I suppose.

 Yes, I did investigate enough to see that this is a dirty trick.  My
 question is how we know it's safe to do this dirty trick.

  +   pci_restore_state(dev);
  +   pcie_portdrv_restore_config(dev);

Lets review what is supposed to happen:

-- poweron
-- BIOS/firmware/bootloader maybe fiddles with PCI config space
-- kernel fiddles with PCI config space during boot
-- device driver sets up PCI config space correctly for normal i/o
-- PCI error occurs
-- PCI port/link is reset

At this point, we want to set the PCI config space to something
resembling what it was just before the device driver first touched it
after poweron.  Why?  Because the device driver will typically set up
DMA segments, and tweak other settings as it initializes the card.  It
needs to do almost exactly the same things again, when re-initializing
the device after the error reset.  Thus, the PCI config needs to be
appropriate for a fresh initialization of the card.

If  some other variant of PCI config was loaded here, it might contain
incorrect DMA mappings or other settings.  In particular, while the
driver is initializing the card, you risk having the card run away and
start doing things based on these incorrect settings -- provoking
another error or maybe just silently corrupting data.   The config
that you want to load should be more-or-less the same one that was
there during kernel boot, before the device-driver started touching
things.

The dirty hack is weird:  either there's valid data, and the flag is
set, or the data is garbage and the flag isn't set ... how could it be
otherwise?

-- Linas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-03 Thread Linas Vepstas

Greg,

I've been receiving (and reading!) these messages; I replied that I am
not maintaining a tree, and have no way of testing these patches (no
access to hardware.)  I believe it unlikely that my situation will
change, and so I should probably be removed from the maintainers file.

No acked-by or anything; the patches are not "obviously correct" by
visual inspection; they may be right, but would require deeper
thinking about what is actually happening than I'm capable of at this
time; I'm a bit rusty with this code base, and might miss something
important.

-- Linas

p.s. you didn't see my earlier reply because I forgot to hit 'plain text reply'


On 2 May 2013 22:13, Yanmin Zhang  wrote:
>
> On Thu, 2013-05-02 at 19:00 -0700, Greg Kroah-Hartman wrote:
> > On Fri, May 03, 2013 at 08:33:00AM +0800, Yanmin Zhang wrote:
> > > On Wed, 2013-05-01 at 20:20 -0500, Linas Vepstas wrote:
> > > > Hi,
> > > >
> > > > On 1 May 2013 19:30, Yanmin Zhang 
> > > > wrote:
> > > > On Fri, 2013-04-26 at 06:28 +, Zhang, LongX wrote:
> > > > > From: Zhang Long 
> > > > >
> > > > > Specific pci device drivers might have many functions to
> > > > call
> > > > > pci_channel_offline to check device states. When slot_reset
> > > > happens,
> > > > > drivers' slot_reset callback might call such functions and
> > > > eventually
> > > > > abort the reset.
> > > > >
> > > > > The patch resets pdev->error_state to pci_channel_io_normal
> > > > at
> > > > > the begining of report_slot_reset.
> > > > >
> > > > > Thank Liu Joseph for pointing it out.
> > > >
> > > > Linas, Bjorn,
> > > >
> > > > Would you like to merge the patch to your testing tree?
> > > > The patch resolves one issue pointed out by Joseph.
> > > >
> > > >
> > > > I'm not maintaining a tree at this time, and am not able to test.
> > > > Sorry.
> > > Thanks Linas.
> > >
> > > Greg,
> > >
> > > Would you like to merge it into your testing tree? Joseph Liu tested
> > > the patch and it does work.
> >
> > You do know about the scripts/get_maintainer.pl script, right?
> >
> > Hint, try it out :)
> Greg,
>
> Thanks. We did send to the right people, Linas and Bjorn.
>
> scripts/get_maintainer.pl 
> 0001-pci-reset-error_state-to-pci_channel_io_normal-at-re.patch
> Bjorn Helgaas  (supporter:PCI 
> SUBSYSTEM,commit_signer:7/8=88%,commit_signer:10/11=91%)
> Linas Vepstas  (commit_signer:2/8=25%)
> Yijing Wang  
> (commit_signer:2/8=25%,commit_signer:2/11=18%)
> Jiang Liu  
> (commit_signer:2/8=25%,commit_signer:2/11=18%)
> Stephen Hemminger  (commit_signer:1/8=12%)
> "Rafael J. Wysocki"  (commit_signer:6/11=55%)
> Huang Ying  (commit_signer:5/11=45%)
> linux-...@vger.kernel.org (open list:PCI SUBSYSTEM)
> linux-kernel@vger.kernel.org (open list)
>
>
> I remember Jesse was maintaining PCI subsystem and he responded quickly.
>
> Yanmin
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-03 Thread Linas Vepstas

Greg,

I've been receiving (and reading!) these messages; I replied that I am
not maintaining a tree, and have no way of testing these patches (no
access to hardware.)  I believe it unlikely that my situation will
change, and so I should probably be removed from the maintainers file.

No acked-by or anything; the patches are not obviously correct by
visual inspection; they may be right, but would require deeper
thinking about what is actually happening than I'm capable of at this
time; I'm a bit rusty with this code base, and might miss something
important.

-- Linas

p.s. you didn't see my earlier reply because I forgot to hit 'plain text reply'


On 2 May 2013 22:13, Yanmin Zhang yanmin_zh...@linux.intel.com wrote:

 On Thu, 2013-05-02 at 19:00 -0700, Greg Kroah-Hartman wrote:
  On Fri, May 03, 2013 at 08:33:00AM +0800, Yanmin Zhang wrote:
   On Wed, 2013-05-01 at 20:20 -0500, Linas Vepstas wrote:
Hi,
   
On 1 May 2013 19:30, Yanmin Zhang yanmin_zh...@linux.intel.com
wrote:
On Fri, 2013-04-26 at 06:28 +, Zhang, LongX wrote:
 From: Zhang Long longx.zh...@intel.com

 Specific pci device drivers might have many functions to
call
 pci_channel_offline to check device states. When slot_reset
happens,
 drivers' slot_reset callback might call such functions and
eventually
 abort the reset.

 The patch resets pdev-error_state to pci_channel_io_normal
at
 the begining of report_slot_reset.

 Thank Liu Joseph for pointing it out.
   
Linas, Bjorn,
   
Would you like to merge the patch to your testing tree?
The patch resolves one issue pointed out by Joseph.
   
   
I'm not maintaining a tree at this time, and am not able to test.
Sorry.
   Thanks Linas.
  
   Greg,
  
   Would you like to merge it into your testing tree? Joseph Liu tested
   the patch and it does work.
 
  You do know about the scripts/get_maintainer.pl script, right?
 
  Hint, try it out :)
 Greg,

 Thanks. We did send to the right people, Linas and Bjorn.

 scripts/get_maintainer.pl 
 0001-pci-reset-error_state-to-pci_channel_io_normal-at-re.patch
 Bjorn Helgaas bhelg...@google.com (supporter:PCI 
 SUBSYSTEM,commit_signer:7/8=88%,commit_signer:10/11=91%)
 Linas Vepstas linasveps...@gmail.com (commit_signer:2/8=25%)
 Yijing Wang wangyij...@huawei.com 
 (commit_signer:2/8=25%,commit_signer:2/11=18%)
 Jiang Liu jiang@huawei.com 
 (commit_signer:2/8=25%,commit_signer:2/11=18%)
 Stephen Hemminger shemmin...@vyatta.com (commit_signer:1/8=12%)
 Rafael J. Wysocki rafael.j.wyso...@intel.com (commit_signer:6/11=55%)
 Huang Ying ying.hu...@intel.com (commit_signer:5/11=45%)
 linux-...@vger.kernel.org (open list:PCI SUBSYSTEM)
 linux-kernel@vger.kernel.org (open list)


 I remember Jesse was maintaining PCI subsystem and he responded quickly.

 Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/6] Hexagon: check to if we will overflow the signal stack

2013-04-04 Thread Linas Vepstas

On 3 April 2013 19:02, Richard Kuo  wrote:

> +   /* check if we would overflow the alt stack */
> +   if (on_sig_stack(sp) && !likely(on_sig_stack(sp - frame_size)))
> +   return (void __user __force *)-1UL;

I found the !likely construction confusing, as its doing both a
'unlikely' (right?) and inverting the argument. It seems clearer,
to idiots like me, to write this as:

if (on_sig_stack(sp) && unlikely(!on_sig_stack(sp - frame_size)))

since where checking for overflow, and its unlikely that the overflow happened.

-- Linas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/6] Hexagon: check to if we will overflow the signal stack

2013-04-04 Thread Linas Vepstas

On 3 April 2013 19:02, Richard Kuo r...@codeaurora.org wrote:

 +   /* check if we would overflow the alt stack */
 +   if (on_sig_stack(sp)  !likely(on_sig_stack(sp - frame_size)))
 +   return (void __user __force *)-1UL;

I found the !likely construction confusing, as its doing both a
'unlikely' (right?) and inverting the argument. It seems clearer,
to idiots like me, to write this as:

if (on_sig_stack(sp)  unlikely(!on_sig_stack(sp - frame_size)))

since where checking for overflow, and its unlikely that the overflow happened.

-- Linas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [ PATCH RESEND ] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers

2012-11-14 Thread Linas Vepstas

Yes, what you describe appears to be the correct semantics; this would
then be the more correct patch.

Read-the-email-but-didn't-try-to-test-by: Linas Vepstas  gmail.com>

-- Linas

On 14 November 2012 18:51, Bjorn Helgaas  wrote:
>
> [+cc Lance, Huang, Hidetoshi, Andrew, Zhang]
>
> On Sat, Nov 10, 2012 at 07:41:04AM +, Pandarathil, Vijaymohan R wrote:
> > When an error is detected on a PCIe device which does not have an
> > AER-aware driver, prevent AER infrastructure from reporting
> > successful error recovery.
> >
> > This is because the report_error_detected() function that gets
> > called in the first phase of recovery process allows forward
> > progress even when the driver for the device does not have AER
> > capabilities. It seems that all callbacks (in pci_error_handlers
> > structure) registered by drivers that gets called during error
> > recovery are not mandatory. So the intention of the infrastructure
> > design seems to be to allow forward progress even when a specific
> > callback has not been registered by a driver. However, if error
> > handler structure itself has not been registered, it doesn't make
> > sense to allow forward progress.
> >
> > As a result of the current design, in the case of a single device
> > having an AER-unaware driver or in the case of any function in a
> > multi-function card having an AER-unaware driver, a successful
> > recovery is reported.
> >
> > Typical scenario this happens is when a PCI device is detached
> > from a KVM host and the pci-stub driver on the host claims the
> > device. The pci-stub driver does not have error handling capabilities
> > but the AER infrastructure still reports that the device recovered
> > successfully.
> >
> > The changes proposed here leaves the device in an unrecovered state
> > if the driver for the device or for any function in a multi-function
> > card does not have error handler structure registered. This reflects
> > the true state of the device and prevents any partial recovery (or no
> > recovery at all) reported as successful.
> >
> > Please also see comments from Linas Vepstas at the following link
> > http://www.spinics.net/lists/linux-pci/msg18572.html
> >
> > Reviewed-by: Linas Vepstas  gmail.com>
> > Reviewed-by: Myron Stowe  redhat.com>
> > Signed-off-by: Vijay Mohan Pandarathil  hp.com>
> >
> > ---
> >
> > drivers/pci/pcie/aer/aerdrv_core.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
> > b/drivers/pci/pcie/aer/aerdrv_core.c
> > index 06bad96..030b229 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > @@ -215,6 +215,12 @@ static int report_error_detected(struct pci_dev *dev, 
> > void *data)
> >
> >   dev->error_state = result_data->state;
> >
> > + if ((!dev->driver || !dev->driver->err_handler) &&
> > + !(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE)) {
> > + dev_info(>dev, "AER: Error detected but no driver has 
> > claimed this device or the driver is AER-unaware\n");
> > + result_data->result = PCI_ERS_RESULT_NONE;
> > + return 1;
>
> This doesn't seem right because returning 1 here causes pci_walk_bus()
> to terminate, which means we won't set dev->error_state or notify
> drivers for any devices we haven't visited yet.
>
> > + }
> >   if (!dev->driver ||
> >   !dev->driver->err_handler ||
> >   !dev->driver->err_handler->error_detected) {
>
> If none of the drivers in the affected hierarchy supports error handling,
> I think the call tree looks like this:
>
> do_recovery # uncorrectable only
> broadcast_error_message(dev, ..., report_error_detected)
> result_data.result = CAN_RECOVER
> pci_walk_bus(..., report_error_detected)
> report_error_detected   # (each dev in subtree)
> return 0# no change to result
> return result_data.result
> broadcast_error_message(dev, ..., report_mmio_enabled)
> result_data.result = PCI_ERS_RESULT_RECOVERED
> pci_walk_bus(..., report_mmio_enabled)
> report_mmio_enabled # (each dev in subtree)
> return 0# no change to result
> dev_info("recovery successful")
>
> Specifically, ther

Re: [ PATCH RESEND ] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers

2012-11-14 Thread Linas Vepstas

Yes, what you describe appears to be the correct semantics; this would
then be the more correct patch.

Read-the-email-but-didn't-try-to-test-by: Linas Vepstas linasvepstas
at gmail.com

-- Linas

On 14 November 2012 18:51, Bjorn Helgaas bhelg...@google.com wrote:

 [+cc Lance, Huang, Hidetoshi, Andrew, Zhang]

 On Sat, Nov 10, 2012 at 07:41:04AM +, Pandarathil, Vijaymohan R wrote:
  When an error is detected on a PCIe device which does not have an
  AER-aware driver, prevent AER infrastructure from reporting
  successful error recovery.
 
  This is because the report_error_detected() function that gets
  called in the first phase of recovery process allows forward
  progress even when the driver for the device does not have AER
  capabilities. It seems that all callbacks (in pci_error_handlers
  structure) registered by drivers that gets called during error
  recovery are not mandatory. So the intention of the infrastructure
  design seems to be to allow forward progress even when a specific
  callback has not been registered by a driver. However, if error
  handler structure itself has not been registered, it doesn't make
  sense to allow forward progress.
 
  As a result of the current design, in the case of a single device
  having an AER-unaware driver or in the case of any function in a
  multi-function card having an AER-unaware driver, a successful
  recovery is reported.
 
  Typical scenario this happens is when a PCI device is detached
  from a KVM host and the pci-stub driver on the host claims the
  device. The pci-stub driver does not have error handling capabilities
  but the AER infrastructure still reports that the device recovered
  successfully.
 
  The changes proposed here leaves the device in an unrecovered state
  if the driver for the device or for any function in a multi-function
  card does not have error handler structure registered. This reflects
  the true state of the device and prevents any partial recovery (or no
  recovery at all) reported as successful.
 
  Please also see comments from Linas Vepstas at the following link
  http://www.spinics.net/lists/linux-pci/msg18572.html
 
  Reviewed-by: Linas Vepstas linasvepstas at gmail.com
  Reviewed-by: Myron Stowe mstowe at redhat.com
  Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarathil at hp.com
 
  ---
 
  drivers/pci/pcie/aer/aerdrv_core.c | 6 ++
   1 file changed, 6 insertions(+)
 
  diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
  b/drivers/pci/pcie/aer/aerdrv_core.c
  index 06bad96..030b229 100644
  --- a/drivers/pci/pcie/aer/aerdrv_core.c
  +++ b/drivers/pci/pcie/aer/aerdrv_core.c
  @@ -215,6 +215,12 @@ static int report_error_detected(struct pci_dev *dev, 
  void *data)
 
dev-error_state = result_data-state;
 
  + if ((!dev-driver || !dev-driver-err_handler) 
  + !(dev-hdr_type  PCI_HEADER_TYPE_BRIDGE)) {
  + dev_info(dev-dev, AER: Error detected but no driver has 
  claimed this device or the driver is AER-unaware\n);
  + result_data-result = PCI_ERS_RESULT_NONE;
  + return 1;

 This doesn't seem right because returning 1 here causes pci_walk_bus()
 to terminate, which means we won't set dev-error_state or notify
 drivers for any devices we haven't visited yet.

  + }
if (!dev-driver ||
!dev-driver-err_handler ||
!dev-driver-err_handler-error_detected) {

 If none of the drivers in the affected hierarchy supports error handling,
 I think the call tree looks like this:

 do_recovery # uncorrectable only
 broadcast_error_message(dev, ..., report_error_detected)
 result_data.result = CAN_RECOVER
 pci_walk_bus(..., report_error_detected)
 report_error_detected   # (each dev in subtree)
 return 0# no change to result
 return result_data.result
 broadcast_error_message(dev, ..., report_mmio_enabled)
 result_data.result = PCI_ERS_RESULT_RECOVERED
 pci_walk_bus(..., report_mmio_enabled)
 report_mmio_enabled # (each dev in subtree)
 return 0# no change to result
 dev_info(recovery successful)

 Specifically, there are no error_handler functions, so we never change
 result_data.result, and the default is that we treat the error as
 recovered successfully.  That seems broken.  An uncorrectable error
 is by definition recoverable only by device-specific software, i.e.,
 the driver.  We didn't call any drivers, so we can't have recovered
 anything.

 What do you think of the following alternative?  I don't know why you
 checked for bridge devices in your patch, so I don't know whether
 that's important here or not.

 diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
 b/drivers/pci/pcie/aer/aerdrv_core.c
 index 06bad96..a109c68 100644
 --- a/drivers/pci/pcie/aer

Re: [PATCH] ehea: Add kdump support

2007-11-26 Thread Linas Vepstas

Hi,

On Mon, Nov 26, 2007 at 01:41:37PM -0200, Luke Browning wrote:
> On Mon, 2007-11-26 at 19:16 +1100, Michael Ellerman wrote:
> 
> > For kdump we have to assume that the kernel is fundamentally broken,

If I may so humbly suggest: since ehea is a power6 thing only,
we should refocus our energies on "hypervisor assisted dump",
which solves all of these problems. 

In short, upon crash, the hypervisor will reset the 
pci devices into working order, and will then boot
a new fresh kernel into a tiny corner of ram. The rest
of ram is not cleared, and can be dumped. After the 
dump, the mem is returned to general use.

The key point here, for ehea, is "the hypervisor
will reset he device state to something rational".

Preliminary patches are at
http://patchwork.ozlabs.org/linuxppc/patch?id=14884
and following.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ehea: Add kdump support

2007-11-26 Thread Linas Vepstas


Hi,

On Mon, Nov 26, 2007 at 01:41:37PM -0200, Luke Browning wrote:
 On Mon, 2007-11-26 at 19:16 +1100, Michael Ellerman wrote:
 
  For kdump we have to assume that the kernel is fundamentally broken,

If I may so humbly suggest: since ehea is a power6 thing only,
we should refocus our energies on hypervisor assisted dump,
which solves all of these problems. 

In short, upon crash, the hypervisor will reset the 
pci devices into working order, and will then boot
a new fresh kernel into a tiny corner of ram. The rest
of ram is not cleared, and can be dumped. After the 
dump, the mem is returned to general use.

The key point here, for ehea, is the hypervisor
will reset he device state to something rational.

Preliminary patches are at
http://patchwork.ozlabs.org/linuxppc/patch?id=14884
and following.

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2] pci hotplug: fix rpaphp directory naming

2007-11-14 Thread Linas Vepstas

Fix presentation of the slot number in the /sys/bus/pci/slots
directory to match that used in the majority of other drivers.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>

--
On Tue, Nov 13, 2007 at 07:26:07PM -0800, Greg KH wrote:
> We need a signed-off-by: to be able to apply this...

Whoops. See above. Same patch as list time, no changes.

On Tue, Nov 13, 2007 at 02:58:30PM -0700, Matthew Wilcox wrote:
> On Tue, Nov 13, 2007 at 03:41:21PM -0600, Linas Vepstas wrote:
> > /sys/bus/pci/slots
> > /sys/bus/pci/slots/control
> > /sys/bus/pci/slots/control/remove_slot
> > /sys/bus/pci/slots/control/add_slot
> > /sys/bus/pci/slots/0001:00:02.0
> > /sys/bus/pci/slots/0001:00:02.0/phy_location
>
> Ugh.  Almost two years ago, paulus promised me he was going to fix the
> slot name for rpaphp.  Guess he didn't.

You have to ask the right person. :-) I've been defacto mainaining
the rpaphp code for unpteen years now. On the other hand, I am also
much, much better at promising than delivering.

> This is one of the hateful things about the current design -- hotplug
> drivers do too much.  Instead of being just the interface between the
> Linux PCI code and the hardware, they create sysfs directories, add
> files,
> and generally have far too much freedom.

I chopped out several hundred LOC from rpaphp a year ago,
and hopefuly that might make furthre simplification easier 
someday.

> We have four different schemes currently for naming in slots/,
> 1. slot number.  Used by cpqphp, ibmphp, acpiphp, pciehp, shpc.
> 2. domain:bus:dev:fn.  Used by fakephp.
> 3a. domain:bus:dev.  Used by rpaphp and sgihp.
> 3b. Except that rpaphp uses phy_location to present the information
> that
> should be in the name and sgihp uses path.
>
> ... I've forgotten what cpci uses.  And yenta doesn't use it.
>
> How is anyone supposed to write sane managability tools in the
> presence
> of such anarchy?
>
> > ~ # cat /sys/bus/pci/slots/:00:02.2/phy_location
> > U787A.001.DNZ00Z5-P1-C2
>
> Right.  This should look like:
>
> # cat /sys/bus/pci/slots/U787A.001.DNZ00Z5-P1-C2/address
> :00:02

This patch implements exactly what you describe. Boot tested.
I assume you really mean it -- if so, then please review and
ack the patch !?

I have absolutely no clue if this breaks any existing IBM tools.
I'm pretty sure it doesn't ... but attention Mike Strosaker! does it?


 drivers/pci/hotplug/rpaphp.h  |1 
 drivers/pci/hotplug/rpaphp_pci.c  |   14 ---
 drivers/pci/hotplug/rpaphp_slot.c |   47 +++---
 3 files changed, 24 insertions(+), 38 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_pci.c  2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c   2007-11-13 
17:52:10.0 -0600
@@ -64,19 +64,6 @@ int rpaphp_get_sensor_state(struct slot 
return rc;
 }
 
-static void set_slot_name(struct slot *slot)
-{
-   struct pci_bus *bus = slot->bus;
-   struct pci_dev *bridge;
-
-   bridge = bus->self;
-   if (bridge)
-   strcpy(slot->name, pci_name(bridge));
-   else
-   sprintf(slot->name, "%04x:%02x:00.0", pci_domain_nr(bus),
-   bus->number);
-}
-
 /**
  * rpaphp_enable_slot - record slot state, config pci device
  *
@@ -114,7 +101,6 @@ int rpaphp_enable_slot(struct slot *slot
info->adapter_status = EMPTY;
slot->bus = bus;
slot->pci_devs = >devices;
-   set_slot_name(slot);
 
/* if there's an adapter in the slot, go add the pci devices */
if (state == PRESENT) {
Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_slot.c 2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c  2007-11-13 
18:05:13.0 -0600
@@ -33,23 +33,31 @@
 #include 
 #include "rpaphp.h"
 
-static ssize_t location_read_file (struct hotplug_slot *php_slot, char *buf)
+static ssize_t address_read_file (struct hotplug_slot *php_slot, char *buf)
 {
-   char *value;
-   int retval = -ENOENT;
+   int retval;
struct slot *slot = (struct slot *)php_slot->private;
+   struct pci_bus *bus;
 
if (!slot)
-   return retval;
+   return -ENOENT;
 
-   value = slot->location;
-   retval = sprintf (buf, "%s\n", value);
+   bus = slot->bus;
+   if (!bus)
+   return -ENOENT;
+
+   if (bus->self)
+   retval = sprintf(buf, pci_name(bus-&

[PATCH v2] pci hotplug: fix rpaphp directory naming

2007-11-14 Thread Linas Vepstas

Fix presentation of the slot number in the /sys/bus/pci/slots
directory to match that used in the majority of other drivers.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
On Tue, Nov 13, 2007 at 07:26:07PM -0800, Greg KH wrote:
 We need a signed-off-by: to be able to apply this...

Whoops. See above. Same patch as list time, no changes.

On Tue, Nov 13, 2007 at 02:58:30PM -0700, Matthew Wilcox wrote:
 On Tue, Nov 13, 2007 at 03:41:21PM -0600, Linas Vepstas wrote:
  /sys/bus/pci/slots
  /sys/bus/pci/slots/control
  /sys/bus/pci/slots/control/remove_slot
  /sys/bus/pci/slots/control/add_slot
  /sys/bus/pci/slots/0001:00:02.0
  /sys/bus/pci/slots/0001:00:02.0/phy_location

 Ugh.  Almost two years ago, paulus promised me he was going to fix the
 slot name for rpaphp.  Guess he didn't.

You have to ask the right person. :-) I've been defacto mainaining
the rpaphp code for unpteen years now. On the other hand, I am also
much, much better at promising than delivering.

 This is one of the hateful things about the current design -- hotplug
 drivers do too much.  Instead of being just the interface between the
 Linux PCI code and the hardware, they create sysfs directories, add
 files,
 and generally have far too much freedom.

I chopped out several hundred LOC from rpaphp a year ago,
and hopefuly that might make furthre simplification easier 
someday.

 We have four different schemes currently for naming in slots/,
 1. slot number.  Used by cpqphp, ibmphp, acpiphp, pciehp, shpc.
 2. domain:bus:dev:fn.  Used by fakephp.
 3a. domain:bus:dev.  Used by rpaphp and sgihp.
 3b. Except that rpaphp uses phy_location to present the information
 that
 should be in the name and sgihp uses path.

 ... I've forgotten what cpci uses.  And yenta doesn't use it.

 How is anyone supposed to write sane managability tools in the
 presence
 of such anarchy?

  ~ # cat /sys/bus/pci/slots/:00:02.2/phy_location
  U787A.001.DNZ00Z5-P1-C2

 Right.  This should look like:

 # cat /sys/bus/pci/slots/U787A.001.DNZ00Z5-P1-C2/address
 :00:02

This patch implements exactly what you describe. Boot tested.
I assume you really mean it -- if so, then please review and
ack the patch !?

I have absolutely no clue if this breaks any existing IBM tools.
I'm pretty sure it doesn't ... but attention Mike Strosaker! does it?


 drivers/pci/hotplug/rpaphp.h  |1 
 drivers/pci/hotplug/rpaphp_pci.c  |   14 ---
 drivers/pci/hotplug/rpaphp_slot.c |   47 +++---
 3 files changed, 24 insertions(+), 38 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_pci.c  2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c   2007-11-13 
17:52:10.0 -0600
@@ -64,19 +64,6 @@ int rpaphp_get_sensor_state(struct slot 
return rc;
 }
 
-static void set_slot_name(struct slot *slot)
-{
-   struct pci_bus *bus = slot-bus;
-   struct pci_dev *bridge;
-
-   bridge = bus-self;
-   if (bridge)
-   strcpy(slot-name, pci_name(bridge));
-   else
-   sprintf(slot-name, %04x:%02x:00.0, pci_domain_nr(bus),
-   bus-number);
-}
-
 /**
  * rpaphp_enable_slot - record slot state, config pci device
  *
@@ -114,7 +101,6 @@ int rpaphp_enable_slot(struct slot *slot
info-adapter_status = EMPTY;
slot-bus = bus;
slot-pci_devs = bus-devices;
-   set_slot_name(slot);
 
/* if there's an adapter in the slot, go add the pci devices */
if (state == PRESENT) {
Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_slot.c 2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c  2007-11-13 
18:05:13.0 -0600
@@ -33,23 +33,31 @@
 #include asm/rtas.h
 #include rpaphp.h
 
-static ssize_t location_read_file (struct hotplug_slot *php_slot, char *buf)
+static ssize_t address_read_file (struct hotplug_slot *php_slot, char *buf)
 {
-   char *value;
-   int retval = -ENOENT;
+   int retval;
struct slot *slot = (struct slot *)php_slot-private;
+   struct pci_bus *bus;
 
if (!slot)
-   return retval;
+   return -ENOENT;
 
-   value = slot-location;
-   retval = sprintf (buf, %s\n, value);
+   bus = slot-bus;
+   if (!bus)
+   return -ENOENT;
+
+   if (bus-self)
+   retval = sprintf(buf, pci_name(bus-self));
+   else
+   retval = sprintf(buf, %04x:%02x:00.0, 
+   pci_domain_nr(bus), bus-number);
+   
return retval;
 }
 
-static struct hotplug_slot_attribute php_attr_location = {
-   .attr = {.name = phy_location, .mode

[PATCH] pci hotplug: fix rpaphp directory naming

2007-11-13 Thread Linas Vepstas



Fix presentation of the slot number in the /sys/bus/pci/slots
directory to match that used in the majority of other drivers.

--

On Tue, Nov 13, 2007 at 02:58:30PM -0700, Matthew Wilcox wrote:
> On Tue, Nov 13, 2007 at 03:41:21PM -0600, Linas Vepstas wrote:
> > /sys/bus/pci/slots
> > /sys/bus/pci/slots/control
> > /sys/bus/pci/slots/control/remove_slot
> > /sys/bus/pci/slots/control/add_slot
> > /sys/bus/pci/slots/0001:00:02.0
> > /sys/bus/pci/slots/0001:00:02.0/phy_location
>
> Ugh.  Almost two years ago, paulus promised me he was going to fix the
> slot name for rpaphp.  Guess he didn't.

You have to ask the right person. :-) I've been defacto mainaining
the rpaphp code for unpteen years now. On the other hand, I am also
much, much better at promising than delivering.

> This is one of the hateful things about the current design -- hotplug
> drivers do too much.  Instead of being just the interface between the
> Linux PCI code and the hardware, they create sysfs directories, add
> files,
> and generally have far too much freedom.

I chopped out several hundred LOC from rpaphp a year ago,
and hopefuly that might make furthre simplification easier 
someday.

> We have four different schemes currently for naming in slots/,
> 1. slot number.  Used by cpqphp, ibmphp, acpiphp, pciehp, shpc.
> 2. domain:bus:dev:fn.  Used by fakephp.
> 3a. domain:bus:dev.  Used by rpaphp and sgihp.
> 3b. Except that rpaphp uses phy_location to present the information
> that
> should be in the name and sgihp uses path.
>
> ... I've forgotten what cpci uses.  And yenta doesn't use it.
>
> How is anyone supposed to write sane managability tools in the
> presence
> of such anarchy?
>
> > ~ # cat /sys/bus/pci/slots/:00:02.2/phy_location
> > U787A.001.DNZ00Z5-P1-C2
>
> Right.  This should look like:
>
> # cat /sys/bus/pci/slots/U787A.001.DNZ00Z5-P1-C2/address
> :00:02

This patch implements exactly what you describe. Boot tested.
I assume you really mean it -- if so, then please review and
ack the patch !?

I have absolutely no clue if this breaks any existing IBM tools.
I'm pretty sure it doesn't ... but attention Mike Strosaker! does it?


 drivers/pci/hotplug/rpaphp.h  |1 
 drivers/pci/hotplug/rpaphp_pci.c  |   14 ---
 drivers/pci/hotplug/rpaphp_slot.c |   47 +++---
 3 files changed, 24 insertions(+), 38 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_pci.c  2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c   2007-11-13 
17:52:10.0 -0600
@@ -64,19 +64,6 @@ int rpaphp_get_sensor_state(struct slot 
return rc;
 }
 
-static void set_slot_name(struct slot *slot)
-{
-   struct pci_bus *bus = slot->bus;
-   struct pci_dev *bridge;
-
-   bridge = bus->self;
-   if (bridge)
-   strcpy(slot->name, pci_name(bridge));
-   else
-   sprintf(slot->name, "%04x:%02x:00.0", pci_domain_nr(bus),
-   bus->number);
-}
-
 /**
  * rpaphp_enable_slot - record slot state, config pci device
  *
@@ -114,7 +101,6 @@ int rpaphp_enable_slot(struct slot *slot
info->adapter_status = EMPTY;
slot->bus = bus;
slot->pci_devs = >devices;
-   set_slot_name(slot);
 
/* if there's an adapter in the slot, go add the pci devices */
if (state == PRESENT) {
Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_slot.c 2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c  2007-11-13 
18:05:13.0 -0600
@@ -33,23 +33,31 @@
 #include 
 #include "rpaphp.h"
 
-static ssize_t location_read_file (struct hotplug_slot *php_slot, char *buf)
+static ssize_t address_read_file (struct hotplug_slot *php_slot, char *buf)
 {
-   char *value;
-   int retval = -ENOENT;
+   int retval;
struct slot *slot = (struct slot *)php_slot->private;
+   struct pci_bus *bus;
 
if (!slot)
-   return retval;
+   return -ENOENT;
 
-   value = slot->location;
-   retval = sprintf (buf, "%s\n", value);
+   bus = slot->bus;
+   if (!bus)
+   return -ENOENT;
+
+   if (bus->self)
+   retval = sprintf(buf, pci_name(bus->self));
+   else
+   retval = sprintf(buf, "%04x:%02x:00.0", 
+   pci_domain_nr(bus), bus->number);
+   
return retval;
 }
 
-static struct hotplug_slot_attribute php_attr_l

Re: [PATCH 0/5][RFC] Physical PCI slot objects

2007-11-13 Thread Linas Vepstas

On Tue, Nov 13, 2007 at 01:59:39PM -0700, Alex Chiang wrote:
> 
> > On pseries systems, I deal with something called the
> > "partitionable endpoint", which I think probably usually
> > corresponds to physical slots, but I don't really know.  
> >
> > So, naively, the physical slot concept doesn't really map to
> > what I have to work with; it just adds one more appendix to it
> > all, one more thing to get confused about.
> 
> Sorry, I'm a bit ignorant about pseries -- what kind of name does
> your PCI hotplug driver give to those slots? What shows up in
> /sys/bus/pci/slots/?

~ # find /sys/bus/pci/slots -print
/sys/bus/pci/slots
/sys/bus/pci/slots/control
/sys/bus/pci/slots/control/remove_slot
/sys/bus/pci/slots/control/add_slot
/sys/bus/pci/slots/0001:00:02.0
/sys/bus/pci/slots/0001:00:02.0/phy_location
/sys/bus/pci/slots/0001:00:02.0/max_bus_speed
/sys/bus/pci/slots/0001:00:02.0/adapter
/sys/bus/pci/slots/0001:00:02.0/attention
/sys/bus/pci/slots/0001:00:02.0/power
/sys/bus/pci/slots/0001:00:02.6
etc.

~ # cat /sys/bus/pci/slots/:00:02.2/phy_location
U787A.001.DNZ00Z5-P1-C2

phy_location corresponds to the printed labels on the box,
and is also shared with the various management consoles
and serivce processors and what not that get involved 
with stuff like that.

Although, please note: on these sysmes, 99% of the "useful"
info about hardware is in /proc/device-tree, and not in /sys
I doubt anyone looks at /sys/bus/pci/slots, I think they mostly
look at the open firmware device tree.

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/5][RFC] Physical PCI slot objects

2007-11-13 Thread Linas Vepstas

On Tue, Nov 13, 2007 at 09:01:29AM -0800, Greg KH wrote:
> 
> Also, some companies already provide userspace tools to get all of this
> information about the different slots in a system and what is where,
> from userspace, no kernel changes are needed.  So, why add all this
> extra complexity to the kernel if it is not needed?

Second that motion I don't get it. What are the goals of this
patch, really?  Just to get a "slot geographical location" from the
kernel?  I'm balancing the intellectual appeal of having a kernel
struct for representing physical objects, against the headache of 
reading (debugging, modifying) code that has yet another struct 
doing yet another thing.  So far, the dread of future headaches 
is winning.

On pseries systems, I deal with something called the "partitionable
endpoint", which I think probably usually corresponds to physical 
slots, but I don't really know.  The hardware guys pitch changeups 
all the time.  Some of these are soldered onto the planar, 
so the are not physical slots. But they are, um, "hotpluggable"
in that you can dynamically add & remove them from the system
(even though you cannot physically unplug the ones that are 
soldered on -- you can only assign them to other hardware partitions
(think hardware VM or hardware Xen)).  So, naively, the physical 
slot concept doesn't really map to what I have to work with; 
it just adds one more appendix to it all, one more thing to 
get confused about.

To be clear: above remarks are for the PowerPC boxes. I have
no clue about how things work on the IBM Intel-based boxes.
And Greg's original "get IBM to agree" remark is about the
Intel-based boxes.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/5, RFC] Introduce pci_slot

2007-11-13 Thread Linas Vepstas

On Mon, Nov 12, 2007 at 05:14:47PM -0700, Alex Chiang wrote:
> +/* pci_slot represents a physical slot */
> +struct pci_slot {
> + struct pci_bus *bus;/* The bus this slot is on */
> + struct pci_slot *next;  /* Next slot on this bus */
> + struct hotplug_slot *hotplug;   /* Hotplug info (migrate over time) */

How much migration do you expect?  Some of it? All of it? If the
answer is "all of it", wouldn't it just be easier to rename 
struct hotplug_slot, and go from there?

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/5] Construct one fakephp slot per pci slot

2007-11-13 Thread Linas Vepstas

On Mon, Nov 12, 2007 at 05:13:36PM -0700, Alex Chiang wrote:
> + slot->name = kmalloc(8, GFP_KERNEL);
> + sprintf(slot->name, "fake%d", count++);

Please use snprintf to avoid buffer overruns!

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/5] Construct one fakephp slot per pci slot

2007-11-13 Thread Linas Vepstas

On Mon, Nov 12, 2007 at 05:13:36PM -0700, Alex Chiang wrote:
 + slot-name = kmalloc(8, GFP_KERNEL);
 + sprintf(slot-name, fake%d, count++);

Please use snprintf to avoid buffer overruns!

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/5, RFC] Introduce pci_slot

2007-11-13 Thread Linas Vepstas

On Mon, Nov 12, 2007 at 05:14:47PM -0700, Alex Chiang wrote:
 +/* pci_slot represents a physical slot */
 +struct pci_slot {
 + struct pci_bus *bus;/* The bus this slot is on */
 + struct pci_slot *next;  /* Next slot on this bus */
 + struct hotplug_slot *hotplug;   /* Hotplug info (migrate over time) */

How much migration do you expect?  Some of it? All of it? If the
answer is all of it, wouldn't it just be easier to rename 
struct hotplug_slot, and go from there?

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/5][RFC] Physical PCI slot objects

2007-11-13 Thread Linas Vepstas

On Tue, Nov 13, 2007 at 09:01:29AM -0800, Greg KH wrote:
 
 Also, some companies already provide userspace tools to get all of this
 information about the different slots in a system and what is where,
 from userspace, no kernel changes are needed.  So, why add all this
 extra complexity to the kernel if it is not needed?

Second that motion I don't get it. What are the goals of this
patch, really?  Just to get a slot geographical location from the
kernel?  I'm balancing the intellectual appeal of having a kernel
struct for representing physical objects, against the headache of 
reading (debugging, modifying) code that has yet another struct 
doing yet another thing.  So far, the dread of future headaches 
is winning.

On pseries systems, I deal with something called the partitionable
endpoint, which I think probably usually corresponds to physical 
slots, but I don't really know.  The hardware guys pitch changeups 
all the time.  Some of these are soldered onto the planar, 
so the are not physical slots. But they are, um, hotpluggable
in that you can dynamically add  remove them from the system
(even though you cannot physically unplug the ones that are 
soldered on -- you can only assign them to other hardware partitions
(think hardware VM or hardware Xen)).  So, naively, the physical 
slot concept doesn't really map to what I have to work with; 
it just adds one more appendix to it all, one more thing to 
get confused about.

To be clear: above remarks are for the PowerPC boxes. I have
no clue about how things work on the IBM Intel-based boxes.
And Greg's original get IBM to agree remark is about the
Intel-based boxes.

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/5][RFC] Physical PCI slot objects

2007-11-13 Thread Linas Vepstas

On Tue, Nov 13, 2007 at 01:59:39PM -0700, Alex Chiang wrote:
 
  On pseries systems, I deal with something called the
  partitionable endpoint, which I think probably usually
  corresponds to physical slots, but I don't really know.  
 
  So, naively, the physical slot concept doesn't really map to
  what I have to work with; it just adds one more appendix to it
  all, one more thing to get confused about.
 
 Sorry, I'm a bit ignorant about pseries -- what kind of name does
 your PCI hotplug driver give to those slots? What shows up in
 /sys/bus/pci/slots/?

~ # find /sys/bus/pci/slots -print
/sys/bus/pci/slots
/sys/bus/pci/slots/control
/sys/bus/pci/slots/control/remove_slot
/sys/bus/pci/slots/control/add_slot
/sys/bus/pci/slots/0001:00:02.0
/sys/bus/pci/slots/0001:00:02.0/phy_location
/sys/bus/pci/slots/0001:00:02.0/max_bus_speed
/sys/bus/pci/slots/0001:00:02.0/adapter
/sys/bus/pci/slots/0001:00:02.0/attention
/sys/bus/pci/slots/0001:00:02.0/power
/sys/bus/pci/slots/0001:00:02.6
etc.

~ # cat /sys/bus/pci/slots/:00:02.2/phy_location
U787A.001.DNZ00Z5-P1-C2

phy_location corresponds to the printed labels on the box,
and is also shared with the various management consoles
and serivce processors and what not that get involved 
with stuff like that.

Although, please note: on these sysmes, 99% of the useful
info about hardware is in /proc/device-tree, and not in /sys
I doubt anyone looks at /sys/bus/pci/slots, I think they mostly
look at the open firmware device tree.

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] pci hotplug: fix rpaphp directory naming

2007-11-13 Thread Linas Vepstas



Fix presentation of the slot number in the /sys/bus/pci/slots
directory to match that used in the majority of other drivers.

--

On Tue, Nov 13, 2007 at 02:58:30PM -0700, Matthew Wilcox wrote:
 On Tue, Nov 13, 2007 at 03:41:21PM -0600, Linas Vepstas wrote:
  /sys/bus/pci/slots
  /sys/bus/pci/slots/control
  /sys/bus/pci/slots/control/remove_slot
  /sys/bus/pci/slots/control/add_slot
  /sys/bus/pci/slots/0001:00:02.0
  /sys/bus/pci/slots/0001:00:02.0/phy_location

 Ugh.  Almost two years ago, paulus promised me he was going to fix the
 slot name for rpaphp.  Guess he didn't.

You have to ask the right person. :-) I've been defacto mainaining
the rpaphp code for unpteen years now. On the other hand, I am also
much, much better at promising than delivering.

 This is one of the hateful things about the current design -- hotplug
 drivers do too much.  Instead of being just the interface between the
 Linux PCI code and the hardware, they create sysfs directories, add
 files,
 and generally have far too much freedom.

I chopped out several hundred LOC from rpaphp a year ago,
and hopefuly that might make furthre simplification easier 
someday.

 We have four different schemes currently for naming in slots/,
 1. slot number.  Used by cpqphp, ibmphp, acpiphp, pciehp, shpc.
 2. domain:bus:dev:fn.  Used by fakephp.
 3a. domain:bus:dev.  Used by rpaphp and sgihp.
 3b. Except that rpaphp uses phy_location to present the information
 that
 should be in the name and sgihp uses path.

 ... I've forgotten what cpci uses.  And yenta doesn't use it.

 How is anyone supposed to write sane managability tools in the
 presence
 of such anarchy?

  ~ # cat /sys/bus/pci/slots/:00:02.2/phy_location
  U787A.001.DNZ00Z5-P1-C2

 Right.  This should look like:

 # cat /sys/bus/pci/slots/U787A.001.DNZ00Z5-P1-C2/address
 :00:02

This patch implements exactly what you describe. Boot tested.
I assume you really mean it -- if so, then please review and
ack the patch !?

I have absolutely no clue if this breaks any existing IBM tools.
I'm pretty sure it doesn't ... but attention Mike Strosaker! does it?


 drivers/pci/hotplug/rpaphp.h  |1 
 drivers/pci/hotplug/rpaphp_pci.c  |   14 ---
 drivers/pci/hotplug/rpaphp_slot.c |   47 +++---
 3 files changed, 24 insertions(+), 38 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_pci.c  2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c   2007-11-13 
17:52:10.0 -0600
@@ -64,19 +64,6 @@ int rpaphp_get_sensor_state(struct slot 
return rc;
 }
 
-static void set_slot_name(struct slot *slot)
-{
-   struct pci_bus *bus = slot-bus;
-   struct pci_dev *bridge;
-
-   bridge = bus-self;
-   if (bridge)
-   strcpy(slot-name, pci_name(bridge));
-   else
-   sprintf(slot-name, %04x:%02x:00.0, pci_domain_nr(bus),
-   bus-number);
-}
-
 /**
  * rpaphp_enable_slot - record slot state, config pci device
  *
@@ -114,7 +101,6 @@ int rpaphp_enable_slot(struct slot *slot
info-adapter_status = EMPTY;
slot-bus = bus;
slot-pci_devs = bus-devices;
-   set_slot_name(slot);
 
/* if there's an adapter in the slot, go add the pci devices */
if (state == PRESENT) {
Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_slot.c 2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c  2007-11-13 
18:05:13.0 -0600
@@ -33,23 +33,31 @@
 #include asm/rtas.h
 #include rpaphp.h
 
-static ssize_t location_read_file (struct hotplug_slot *php_slot, char *buf)
+static ssize_t address_read_file (struct hotplug_slot *php_slot, char *buf)
 {
-   char *value;
-   int retval = -ENOENT;
+   int retval;
struct slot *slot = (struct slot *)php_slot-private;
+   struct pci_bus *bus;
 
if (!slot)
-   return retval;
+   return -ENOENT;
 
-   value = slot-location;
-   retval = sprintf (buf, %s\n, value);
+   bus = slot-bus;
+   if (!bus)
+   return -ENOENT;
+
+   if (bus-self)
+   retval = sprintf(buf, pci_name(bus-self));
+   else
+   retval = sprintf(buf, %04x:%02x:00.0, 
+   pci_domain_nr(bus), bus-number);
+   
return retval;
 }
 
-static struct hotplug_slot_attribute php_attr_location = {
-   .attr = {.name = phy_location, .mode = S_IFREG | S_IRUGO},
-   .show = location_read_file,
+static struct hotplug_slot_attribute php_attr_address = {
+   .attr = {.name = address, .mode = S_IFREG | S_IRUGO},
+   .show = address_read_file

Re: [PATCH] Align PCI memory regions to page size (4K) - Fix

2007-11-08 Thread Linas Vepstas

On Sun, Oct 28, 2007 at 11:52:16PM -0600, Grant Grundler wrote:
> > On Sun, Oct 28, 2007 at 03:53:20PM -0400, Barak Fargoun wrote:
> ...
> > > About your question: today, some of the hypervisors are using linux
> > > kernel as their domain-0 (e.g. Xen). In order to implement direct
> > > hardware access for these native domains (e.g.  running windows in a
> > > virtual machine above Xen), the PCI memory regions should be aligned
> > > at-least at the page-level (so, a virtual machine - can't see data of
> > > other devices which may not be assigned to it). So, for that reason,
> > > we wanted a boot parameter to let us force the kernel to align PCI
> > > memory regions at-least at a PAGE_SIZE alignment. It is very useful
> > > for hypervisors which are developed at Linux environment (e.g.: Xen).
> 
> It's a benefit IFF multiple devices are spread across more than one guest
> _and_ we don't trust every particating guest to play nicely with IO.  That way
> the Hypervisor can assign one device to a specific guest OS for direct access.
> E.g. 4 port Gige card could directly support the host and 3 guests with 
> somewhat
> lower risk of tromping on each other's MMIO space.
> 
> If Xen is cooperative, this seems a bit paranoid. I don't recall ever seeing a
> driver bug where the driver accidentally poked MMIO space at the wrong device.

I presume the issue is not a driver bug per-se, but a
spying/hacking-type security issue: Having root in one guest could in
principle allow one to write a driver that snooped on data in other
guests, and/or intentionally corrupted data on other guests.

I envision some ISP renting out 1/3 of a machine with a 4-port card,
and having some nosey college-kid wannabe hacker getting root on one of
the guests and causing trouble.  But perhaps I'm wy off base
here.

(Just like occasional cigarette smoking is known to inevitably lead to
full-fledged heroin addiction, I am pretty sure that the culture of
"cheat codes" among 12-year-olds is going to lead to an epidemic of
hackers in about 10 years. I am atuned to "wannabe hacker culture"). 

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Align PCI memory regions to page size (4K) - Fix

2007-11-08 Thread Linas Vepstas

On Sun, Oct 28, 2007 at 11:52:16PM -0600, Grant Grundler wrote:
  On Sun, Oct 28, 2007 at 03:53:20PM -0400, Barak Fargoun wrote:
 ...
   About your question: today, some of the hypervisors are using linux
   kernel as their domain-0 (e.g. Xen). In order to implement direct
   hardware access for these native domains (e.g.  running windows in a
   virtual machine above Xen), the PCI memory regions should be aligned
   at-least at the page-level (so, a virtual machine - can't see data of
   other devices which may not be assigned to it). So, for that reason,
   we wanted a boot parameter to let us force the kernel to align PCI
   memory regions at-least at a PAGE_SIZE alignment. It is very useful
   for hypervisors which are developed at Linux environment (e.g.: Xen).
 
 It's a benefit IFF multiple devices are spread across more than one guest
 _and_ we don't trust every particating guest to play nicely with IO.  That way
 the Hypervisor can assign one device to a specific guest OS for direct access.
 E.g. 4 port Gige card could directly support the host and 3 guests with 
 somewhat
 lower risk of tromping on each other's MMIO space.
 
 If Xen is cooperative, this seems a bit paranoid. I don't recall ever seeing a
 driver bug where the driver accidentally poked MMIO space at the wrong device.

I presume the issue is not a driver bug per-se, but a
spying/hacking-type security issue: Having root in one guest could in
principle allow one to write a driver that snooped on data in other
guests, and/or intentionally corrupted data on other guests.

I envision some ISP renting out 1/3 of a machine with a 4-port card,
and having some nosey college-kid wannabe hacker getting root on one of
the guests and causing trouble.  But perhaps I'm wy off base
here.

(Just like occasional cigarette smoking is known to inevitably lead to
full-fledged heroin addiction, I am pretty sure that the culture of
cheat codes among 12-year-olds is going to lead to an epidemic of
hackers in about 10 years. I am atuned to wannabe hacker culture). 

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] PCI: disable MSI on more ATI NorthBridges

2007-10-19 Thread Linas Vepstas

On Fri, Oct 19, 2007 at 09:17:23PM +0800, Shane Huang wrote:
> Since we have little experience on PCI and MSI here, we had to try to

As someone else pointed out, AMD should have *lots* of people with
pci and msi experience on the payroll.  (Folks here buy AMD-designed 
pci chips ...)

> ONLY
> comment out the pci_intx() call in drivers/ata/ahci.c
> My system can boot up too with MSI enabled!
> 
> So does it mean that the root cause is our SB700 SATA controller
> has a hardware bug where setting INTX_DISABLE in the PCI COMMAND
> register masks MSI interrupts too? 

That's what it sounds like, to me.

> And what is the software solution or workaround?

Not sure. Sounds like the device driver needs a quirk for this part.

The over-worked Jeff Garzik is the maintainer for that driver.

You should probably provide the pci device id for this beast.

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] PCI: disable MSI on more ATI NorthBridges

2007-10-19 Thread Linas Vepstas

On Fri, Oct 19, 2007 at 09:17:23PM +0800, Shane Huang wrote:
 Since we have little experience on PCI and MSI here, we had to try to

As someone else pointed out, AMD should have *lots* of people with
pci and msi experience on the payroll.  (Folks here buy AMD-designed 
pci chips ...)

 ONLY
 comment out the pci_intx() call in drivers/ata/ahci.c
 My system can boot up too with MSI enabled!
 
 So does it mean that the root cause is our SB700 SATA controller
 has a hardware bug where setting INTX_DISABLE in the PCI COMMAND
 register masks MSI interrupts too? 

That's what it sounds like, to me.

 And what is the software solution or workaround?

Not sure. Sounds like the device driver needs a quirk for this part.

The over-worked Jeff Garzik is the maintainer for that driver.

You should probably provide the pci device id for this beast.

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pci: implement "pci=noaer"

2007-10-15 Thread Linas Vepstas

On Fri, Oct 05, 2007 at 01:17:58PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap <[EMAIL PROTECTED]>
> 
> For cases in which CONFIG_PCIEAER=y (such as distro kernels), allow users
> to disable PCIE Advanced Error Reporting by using "pci=noaer" on the
> kernel command line.

Looks reasonable to me. 

(sorry for the belated reply... I also saw this in gregkh's patch
series. Since the email was addressed to me, I figure I should 
at least say "yes I read it". Dunno if "Yes-I-read-it-by:" is 
the same as "Acked by:", if it is, then acked-by me.. )

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pci: implement pci=noaer

2007-10-15 Thread Linas Vepstas



On Fri, Oct 05, 2007 at 01:17:58PM -0700, Randy Dunlap wrote:
 From: Randy Dunlap [EMAIL PROTECTED]
 
 For cases in which CONFIG_PCIEAER=y (such as distro kernels), allow users
 to disable PCIE Advanced Error Reporting by using pci=noaer on the
 kernel command line.

Looks reasonable to me. 

(sorry for the belated reply... I also saw this in gregkh's patch
series. Since the email was addressed to me, I figure I should 
at least say yes I read it. Dunno if Yes-I-read-it-by: is 
the same as Acked by:, if it is, then acked-by me.. )

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Simplify yenta code

2007-10-08 Thread Linas Vepstas

On Sat, Oct 06, 2007 at 01:26:01PM +0900, Komuro wrote:
> Hello,
> 
> Unfortunately, your patch is wrong,
> and does not results in the same run-time behaviour.

Yes, I thought I'd withdrawn it, as a monday-morning error.
If that wasn't clear .. NAK to my own patch.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Simplify yenta code

2007-10-08 Thread Linas Vepstas

On Sat, Oct 06, 2007 at 01:26:01PM +0900, Komuro wrote:
 Hello,
 
 Unfortunately, your patch is wrong,
 and does not results in the same run-time behaviour.

Yes, I thought I'd withdrawn it, as a monday-morning error.
If that wasn't clear .. NAK to my own patch.

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-05 Thread Linas Vepstas

On Thu, Oct 04, 2007 at 05:01:47PM -0700, Nish Aravamudan wrote:
> On 10/2/07, Tony Breeds <[EMAIL PROTECTED]> wrote:
> > On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
> >
> > > I realise it'll make the patch bigger, but this doesn't seem like a
> > > particularly good name for the variable anymore.
> >
> > Sure, what about?
> >
> > Clarify when RTAS logging is enabled.
> >
> > Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>
> 
> For what it's worth, on a different ppc64 box, this resolves a similar
> panic for me.
> 
> Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

For the reasons explained, I'd really like to nack Tony's patch.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-05 Thread Linas Vepstas

On Thu, Oct 04, 2007 at 05:01:47PM -0700, Nish Aravamudan wrote:
 On 10/2/07, Tony Breeds [EMAIL PROTECTED] wrote:
  On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
 
   I realise it'll make the patch bigger, but this doesn't seem like a
   particularly good name for the variable anymore.
 
  Sure, what about?
 
  Clarify when RTAS logging is enabled.
 
  Signed-off-by: Tony Breeds [EMAIL PROTECTED]
 
 For what it's worth, on a different ppc64 box, this resolves a similar
 panic for me.
 
 Tested-by: Nishanth Aravamudan [EMAIL PROTECTED]

For the reasons explained, I'd really like to nack Tony's patch.

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-10-04 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 07:27:30PM -0600, Matthew Wilcox wrote:
> 
> The thing to remember is that sym2 is in transition from being a dual
> BSD/Linux driver to being a purely Linux driver. 

I was wondering about that; couldn't tell if the split in the code
was historical, or being intentionally maintained.

> > My gut instinct is to say "ack", although prudence dictates that 
> > I should test first. Which might take a few days...
> 
> Fine by me.  

I tested the patch, it worked great. It also seemed to recover 
much more quickly -- so quickly, in fact, that I thought something 
had gone wrong.

I reviewed it one more time, it really does look good. A formal
submission and acked by's at earliest convenience would be good. 

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-10-04 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 07:27:30PM -0600, Matthew Wilcox wrote:
 
 The thing to remember is that sym2 is in transition from being a dual
 BSD/Linux driver to being a purely Linux driver. 

I was wondering about that; couldn't tell if the split in the code
was historical, or being intentionally maintained.

  My gut instinct is to say ack, although prudence dictates that 
  I should test first. Which might take a few days...
 
 Fine by me.  

I tested the patch, it worked great. It also seemed to recover 
much more quickly -- so quickly, in fact, that I thought something 
had gone wrong.

I reviewed it one more time, it really does look good. A formal
submission and acked by's at earliest convenience would be good. 

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-03 Thread Linas Vepstas

On Wed, Oct 03, 2007 at 02:09:46PM +1000, Michael Ellerman wrote:
> 
> Until we initialise what exactly?

Until we allocate the error log buffer. The original crash was 
for a null-pointer deref of the unallocated buffer. I just sent 
out a patch to fix this; its a bit simpler than the below.

In that email, I remarked:

Andy Whitcroft's crash was appearently due to firmware complaining
about lost power, (actually, lost power supply redundancy!), which
occurred very early during boot.

Type0040 (EPOW)
Status: bypassed new
Residual error from previous boot.
EPOW Sensor Value:  0002
EPOW warning due to loss of redundancy.
EPOW general power fault.

I've no clue why firmware thought it was OK to report this
during one of the earliest calls to RTAS; I'm still investiigating
that.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-03 Thread Linas Vepstas

On Wed, Oct 03, 2007 at 02:09:46PM +1000, Michael Ellerman wrote:
 
 Until we initialise what exactly?

Until we allocate the error log buffer. The original crash was 
for a null-pointer deref of the unallocated buffer. I just sent 
out a patch to fix this; its a bit simpler than the below.

In that email, I remarked:

Andy Whitcroft's crash was appearently due to firmware complaining
about lost power, (actually, lost power supply redundancy!), which
occurred very early during boot.

Type0040 (EPOW)
Status: bypassed new
Residual error from previous boot.
EPOW Sensor Value:  0002
EPOW warning due to loss of redundancy.
EPOW general power fault.

I've no clue why firmware thought it was OK to report this
during one of the earliest calls to RTAS; I'm still investiigating
that.

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Linas Vepstas

On Mon, Sep 24, 2007 at 01:35:31PM +0100, Andy Whitcroft wrote:
> Seeing the following from an older power LPAR, pretty sure we had
> this in the previous -mm also:

I haven't forgetten about this ... and am looking at it now.
Seems that whenever I go to reserve the machine pSeries-102,
someone else is using it :-)

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-10-02 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 07:27:30PM -0600, Matthew Wilcox wrote:
> 
> Fine by me.  Do you have the ability to produce failures on a whim on
> your platforms?  

Yes, although it is very platform specific -- there are actually
transistors in the pci bridge chip, which actually short out lines,
and so, from the point of view of the rest of the chip, it did
actually see a "real" error. Its supposed to be a very realistic 
test.

> I've been vaguely musing a PCI device failure patch for
> x86, just so people can test driver failure paths.

That would be good ... I've recently agreed to accept a fedex
to test someone elses card for them, which is outside my usual
activities.

There's also supposed to be some PCI-X riser card out there, 
(never seen one) which has the ability to inject actual pci 
errors. Its the Agilent PCI BestX card; I got the impression 
they might not sell it anymore; dunno.

One guy in the lab used to brush a grounding strap across
the pins; this usually got a rise out of the audience.

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-10-02 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 07:27:30PM -0600, Matthew Wilcox wrote:
 
 Fine by me.  Do you have the ability to produce failures on a whim on
 your platforms?  

Yes, although it is very platform specific -- there are actually
transistors in the pci bridge chip, which actually short out lines,
and so, from the point of view of the rest of the chip, it did
actually see a real error. Its supposed to be a very realistic 
test.

 I've been vaguely musing a PCI device failure patch for
 x86, just so people can test driver failure paths.

That would be good ... I've recently agreed to accept a fedex
to test someone elses card for them, which is outside my usual
activities.

There's also supposed to be some PCI-X riser card out there, 
(never seen one) which has the ability to inject actual pci 
errors. Its the Agilent PCI BestX card; I got the impression 
they might not sell it anymore; dunno.

One guy in the lab used to brush a grounding strap across
the pins; this usually got a rise out of the audience.

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Linas Vepstas

On Mon, Sep 24, 2007 at 01:35:31PM +0100, Andy Whitcroft wrote:
 Seeing the following from an older power LPAR, pretty sure we had
 this in the previous -mm also:

I haven't forgetten about this ... and am looking at it now.
Seems that whenever I go to reserve the machine pSeries-102,
someone else is using it :-)

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-10-01 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 02:12:47PM -0600, Matthew Wilcox wrote:
> 
> I think the fundamental problem is that completions aren't really
> supposed to be used like this.  Here's one attempt at using completions
> perhaps a little more the way they're supposed to be used, 

Yes, that looks very good to me.  I see it solves a bug that
I hadn't been quite aware of. I don't understand why 
struct host_data is preferable to struct sym_shcb (is it because 
this is the structure that is "naturally protectected" by the 
spinlock?)

My gut instinct is to say "ack", although prudence dictates that 
I should test first. Which might take a few days...

> although now
> I've written it, I wonder if we shouldn't just use a waitqueue instead.

I thought that earlier versions of the driver used waitqueues (I vaguely
remember "eh_wait" in the code), which were later converted to 
completions (I also vaguely recall thinking that the new code was
more elegant/simpler). I converted my patch to use the completions 
likewise, and, as you've clearly shown, did a rather sloppy job in 
the conversion.

I'm tempted to go with this patch; but if you prod, I could attempt
a wait-queue based patch.

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Simplify yenta code

2007-10-01 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 12:56:55PM -0600, Matthew Wilcox wrote:
> 
> Are BRIDGE_IO_MAX and BRIDGE_MEM_MAX guaranteed to be the same?

@#$%^&*. Red-faced, I withdraw the patch. Must be cross-eyed on
a Monday morning. Sorry.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Simplify yenta code

2007-10-01 Thread Linas Vepstas


Simplify some of the resource detection logic in yenta_socket.
This patch results in the same run-time behaviour as the 
current code, but does so with fewer lines of code. This
makes the logical flow of the code a bit easier to understand.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
Cc: Dominik Brodowski <[EMAIL PROTECTED]>


 drivers/pcmcia/yenta_socket.c |   22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pcmcia/yenta_socket.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pcmcia/yenta_socket.c 2007-10-01 
12:17:02.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pcmcia/yenta_socket.c  2007-10-01 
12:37:36.0 -0500
@@ -654,20 +654,14 @@ static int yenta_allocate_res(struct yen
pci_name(socket->dev), nr);
}
 
-   if (type & IORESOURCE_IO) {
-   if ((yenta_search_res(socket, res, BRIDGE_IO_MAX)) ||
-   (yenta_search_res(socket, res, BRIDGE_IO_ACC)) ||
-   (yenta_search_res(socket, res, BRIDGE_IO_MIN)))
-   return 1;
-   } else {
-   if (type & IORESOURCE_PREFETCH) {
-   if ((yenta_search_res(socket, res, BRIDGE_MEM_MAX)) ||
-   (yenta_search_res(socket, res, BRIDGE_MEM_ACC)) ||
-   (yenta_search_res(socket, res, BRIDGE_MEM_MIN)))
-   return 1;
-   /* Approximating prefetchable by non-prefetchable */
-   res->flags = IORESOURCE_MEM;
-   }
+   if ((yenta_search_res(socket, res, BRIDGE_MEM_MAX)) ||
+   (yenta_search_res(socket, res, BRIDGE_MEM_ACC)) ||
+   (yenta_search_res(socket, res, BRIDGE_MEM_MIN)))
+   return 1;
+
+   if (type & IORESOURCE_PREFETCH) {
+   /* Approximating prefetchable by non-prefetchable */
+   res->flags = IORESOURCE_MEM;
if ((yenta_search_res(socket, res, BRIDGE_MEM_MAX)) ||
(yenta_search_res(socket, res, BRIDGE_MEM_ACC)) ||
(yenta_search_res(socket, res, BRIDGE_MEM_MIN)))
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Simplify yenta code

2007-10-01 Thread Linas Vepstas


Simplify some of the resource detection logic in yenta_socket.
This patch results in the same run-time behaviour as the 
current code, but does so with fewer lines of code. This
makes the logical flow of the code a bit easier to understand.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Cc: Dominik Brodowski [EMAIL PROTECTED]


 drivers/pcmcia/yenta_socket.c |   22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pcmcia/yenta_socket.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pcmcia/yenta_socket.c 2007-10-01 
12:17:02.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pcmcia/yenta_socket.c  2007-10-01 
12:37:36.0 -0500
@@ -654,20 +654,14 @@ static int yenta_allocate_res(struct yen
pci_name(socket-dev), nr);
}
 
-   if (type  IORESOURCE_IO) {
-   if ((yenta_search_res(socket, res, BRIDGE_IO_MAX)) ||
-   (yenta_search_res(socket, res, BRIDGE_IO_ACC)) ||
-   (yenta_search_res(socket, res, BRIDGE_IO_MIN)))
-   return 1;
-   } else {
-   if (type  IORESOURCE_PREFETCH) {
-   if ((yenta_search_res(socket, res, BRIDGE_MEM_MAX)) ||
-   (yenta_search_res(socket, res, BRIDGE_MEM_ACC)) ||
-   (yenta_search_res(socket, res, BRIDGE_MEM_MIN)))
-   return 1;
-   /* Approximating prefetchable by non-prefetchable */
-   res-flags = IORESOURCE_MEM;
-   }
+   if ((yenta_search_res(socket, res, BRIDGE_MEM_MAX)) ||
+   (yenta_search_res(socket, res, BRIDGE_MEM_ACC)) ||
+   (yenta_search_res(socket, res, BRIDGE_MEM_MIN)))
+   return 1;
+
+   if (type  IORESOURCE_PREFETCH) {
+   /* Approximating prefetchable by non-prefetchable */
+   res-flags = IORESOURCE_MEM;
if ((yenta_search_res(socket, res, BRIDGE_MEM_MAX)) ||
(yenta_search_res(socket, res, BRIDGE_MEM_ACC)) ||
(yenta_search_res(socket, res, BRIDGE_MEM_MIN)))
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Simplify yenta code

2007-10-01 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 12:56:55PM -0600, Matthew Wilcox wrote:
 
 Are BRIDGE_IO_MAX and BRIDGE_MEM_MAX guaranteed to be the same?

@#$%^*. Red-faced, I withdraw the patch. Must be cross-eyed on
a Monday morning. Sorry.

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-10-01 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 02:12:47PM -0600, Matthew Wilcox wrote:
 
 I think the fundamental problem is that completions aren't really
 supposed to be used like this.  Here's one attempt at using completions
 perhaps a little more the way they're supposed to be used, 

Yes, that looks very good to me.  I see it solves a bug that
I hadn't been quite aware of. I don't understand why 
struct host_data is preferable to struct sym_shcb (is it because 
this is the structure that is naturally protectected by the 
spinlock?)

My gut instinct is to say ack, although prudence dictates that 
I should test first. Which might take a few days...

 although now
 I've written it, I wonder if we shouldn't just use a waitqueue instead.

I thought that earlier versions of the driver used waitqueues (I vaguely
remember eh_wait in the code), which were later converted to 
completions (I also vaguely recall thinking that the new code was
more elegant/simpler). I converted my patch to use the completions 
likewise, and, as you've clearly shown, did a rather sloppy job in 
the conversion.

I'm tempted to go with this patch; but if you prod, I could attempt
a wait-queue based patch.

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-09-27 Thread Linas Vepstas

On Thu, Sep 27, 2007 at 04:10:31PM -0600, Matthew Wilcox wrote:
> In the error handler, we wait_for_completion(io_reset_wait).
> In sym2_io_error_detected, we init_completion(io_reset_wait).
> Isn't it possible that we hit the error handler before we hit the
> io_error_detected path, and thus the completion wait is lost?
> Since the completion is already initialised in sym_attach(), I don't
> think we need to initialise it in sym2_io_error_detected().
> Makes sense to just delete it?

Good catch. But no ... and I had to study this a bit. Bear with me:

It is enough to call init_completion() once, and not once per use:
it initializes spinlocks, which shouldn't be intialized twice. 

But, that completion might be used multiple times when there are
multiple errors, and so, before using it a second time, one must 
set completion->done = 0.  The INIT_COMPLETION() macro does this. 

One must have completion->done = 0 before every use, as otherwise, 
wait_for_completion() won't actually wait. And since complete_all()
sets x->done += UINT_MAX/2, I'm pretty sure x->done won't be zero
the next time we use it, unless we make it so.

So I need to find a place to safely call INIT_COMPLETION() again, 
after the completion has been used. At the moment, I'm stumped
as to where to do this. 

 [think ... think ... think] 

I think the race you describe above is harmless. The first time
that sym_eh_handler() will run, it will be with SYM_EH_ABORT, 
in it doesn't matter if we lose that, since the device is hosed
anyway. At some later time, it will run with SYM_EH_DEVICE_RESET
and then SYM_EH_BUS_RESET and then SYM_EH_HOST_RESET, and we won't 
miss those, since, by now, sym2_io_error_detected() will have run.

So, by my reading, I'd say that init_completion() in
sym2_io_error_detected() has to stay (although perhaps
it should be replaced by the INIT_COMPLETION() macro.)
Removing it will prevent correct operation on the second 
and subsequent errors.

--Linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-09-27 Thread Linas Vepstas

On Wed, Sep 26, 2007 at 09:02:16AM -0600, Matthew Wilcox wrote:
> On Fri, Apr 20, 2007 at 03:47:20PM -0500, Linas Vepstas wrote:
> > Implement the so-called "first failure data capture" (FFDC) for the
> > symbios PCI error recovery.  After a PCI error event is reported,
> > the driver requests that MMIO be enabled. Once enabled, it 
> > then reads and dumps assorted status registers, and concludes
> > by requesting the usual reset sequence.
> 
> > +   /* Request that MMIO be enabled, so register dump can be taken. */
> > +   return PCI_ERS_RESULT_CAN_RECOVER;
> > +}
> 
> I'm a little concerned by the mention of MMIO.  It's entirely possible
> for the sym2 driver to be using ioports to access the card rather than
> MMIO.  Is it simply that it can't on the platform you test on?

The comment is misleading. I've been in the bad habit of calling
it "mmio" whenever its not DMA.

The habit is because there are two distinct enable bits in the 
pci-host bridge during error recovery: one to enable mmio/ioports, 
and the other to enable DMA. If the adapter has gone crazy, I don't 
want to enable DMA, so that it doesn't scribble to bad places. But, 
by enabling mmio/ioports, perhaps it can be finessed back into a 
semi-sane state, e.g. sane enough to perform a dump of its internal
state.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-09-27 Thread Linas Vepstas

On Wed, Sep 26, 2007 at 09:02:16AM -0600, Matthew Wilcox wrote:
 On Fri, Apr 20, 2007 at 03:47:20PM -0500, Linas Vepstas wrote:
  Implement the so-called first failure data capture (FFDC) for the
  symbios PCI error recovery.  After a PCI error event is reported,
  the driver requests that MMIO be enabled. Once enabled, it 
  then reads and dumps assorted status registers, and concludes
  by requesting the usual reset sequence.
 
  +   /* Request that MMIO be enabled, so register dump can be taken. */
  +   return PCI_ERS_RESULT_CAN_RECOVER;
  +}
 
 I'm a little concerned by the mention of MMIO.  It's entirely possible
 for the sym2 driver to be using ioports to access the card rather than
 MMIO.  Is it simply that it can't on the platform you test on?

The comment is misleading. I've been in the bad habit of calling
it mmio whenever its not DMA.

The habit is because there are two distinct enable bits in the 
pci-host bridge during error recovery: one to enable mmio/ioports, 
and the other to enable DMA. If the adapter has gone crazy, I don't 
want to enable DMA, so that it doesn't scribble to bad places. But, 
by enabling mmio/ioports, perhaps it can be finessed back into a 
semi-sane state, e.g. sane enough to perform a dump of its internal
state.

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-09-27 Thread Linas Vepstas

On Thu, Sep 27, 2007 at 04:10:31PM -0600, Matthew Wilcox wrote:
 In the error handler, we wait_for_completion(io_reset_wait).
 In sym2_io_error_detected, we init_completion(io_reset_wait).
 Isn't it possible that we hit the error handler before we hit the
 io_error_detected path, and thus the completion wait is lost?
 Since the completion is already initialised in sym_attach(), I don't
 think we need to initialise it in sym2_io_error_detected().
 Makes sense to just delete it?

Good catch. But no ... and I had to study this a bit. Bear with me:

It is enough to call init_completion() once, and not once per use:
it initializes spinlocks, which shouldn't be intialized twice. 

But, that completion might be used multiple times when there are
multiple errors, and so, before using it a second time, one must 
set completion-done = 0.  The INIT_COMPLETION() macro does this. 

One must have completion-done = 0 before every use, as otherwise, 
wait_for_completion() won't actually wait. And since complete_all()
sets x-done += UINT_MAX/2, I'm pretty sure x-done won't be zero
the next time we use it, unless we make it so.

So I need to find a place to safely call INIT_COMPLETION() again, 
after the completion has been used. At the moment, I'm stumped
as to where to do this. 

 [think ... think ... think] 

I think the race you describe above is harmless. The first time
that sym_eh_handler() will run, it will be with SYM_EH_ABORT, 
in it doesn't matter if we lose that, since the device is hosed
anyway. At some later time, it will run with SYM_EH_DEVICE_RESET
and then SYM_EH_BUS_RESET and then SYM_EH_HOST_RESET, and we won't 
miss those, since, by now, sym2_io_error_detected() will have run.

So, by my reading, I'd say that init_completion() in
sym2_io_error_detected() has to stay (although perhaps
it should be replaced by the INIT_COMPLETION() macro.)
Removing it will prevent correct operation on the second 
and subsequent errors.

--Linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2.6.23] ibmebus: Prevent bus_id collisions

2007-08-30 Thread Linas Vepstas

On Thu, Aug 30, 2007 at 04:00:56PM +0200, Joachim Fenkes wrote:
> 
> Plus, I rather like using 
> the full_name since it also contains a descriptive name as opposed to 
> being just nondescript numbers, helping the layman (ie user) to make sense 
> out of a dev_id.

Yes, well, but no. The location code is useful as a geographical
location: slots and devices are physically labelled with stickers 
so you can tell which is which.  Handy when you have to unplug stuff. 
By contrast, the device-tree full_name is mostly just gobldy-gook, 
with some crazy phb numbering in there that, after four years of 
staring at them, I still can't reliably do anything useful with.  
Location codes are nice. 

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2.6.23] ibmebus: Prevent bus_id collisions

2007-08-30 Thread Linas Vepstas

On Thu, Aug 30, 2007 at 04:00:56PM +0200, Joachim Fenkes wrote:
 
 Plus, I rather like using 
 the full_name since it also contains a descriptive name as opposed to 
 being just nondescript numbers, helping the layman (ie user) to make sense 
 out of a dev_id.

Yes, well, but no. The location code is useful as a geographical
location: slots and devices are physically labelled with stickers 
so you can tell which is which.  Handy when you have to unplug stuff. 
By contrast, the device-tree full_name is mostly just gobldy-gook, 
with some crazy phb numbering in there that, after four years of 
staring at them, I still can't reliably do anything useful with.  
Location codes are nice. 

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 02:44:36PM -0700, David Miller wrote:
> From: David Stevens <[EMAIL PROTECTED]>
> Date: Fri, 24 Aug 2007 09:50:58 -0700
> 
> > Problem is if it increases rapidly, you may drop packets
> > before you notice that the ring is full in the current estimated
> > interval.
> 
> This is one of many reasons why hardware interrupt mitigation
> is really needed for this.

When turning off interrupts, don't turn them *all* off.
Leave the queue-full interrupt always on.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 11:11:56PM +0200, Jan-Bernd Themann wrote:
> (when they are available for
> POWER in our case). 

hrtimer worked fine on the powerpc cell arch last summer.
I assume they work on p5 and p6 too, no ??

> I tried to implement something with "normal" timers, but the result
> was everything but great. The timers seem to be far too slow.
> I'm not sure if it helps to increase it from 1000HZ to 2500HZ
> or more.

Heh. Do the math. Even on 1gigabit cards, that's not enough:

(1gigabit/sec) x (byte/8 bits) x (packet/1500bytes) x (sec/1000 jiffy) 

is 83 packets a jiffy (for big packets, even more for small packets, 
and more again for 10 gigabit cards). So polling once per jiffy is a 
latency disaster.

--linas  

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 09:04:56PM +0200, Bodo Eggert wrote:
> Linas Vepstas <[EMAIL PROTECTED]> wrote:
> > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> >> 3) On modern systems the incoming packets are processed very fast. 
> >> Especially
> >> on SMP systems when we use multiple queues we process only a few packets
> >> per napi poll cycle. So NAPI does not work very well here and the interrupt
> >> rate is still high.
> > 
> > worst-case network ping-pong app: send one
> > packet, wait for reply, send one packet, etc.
> 
> Possible solution / possible brainfart:
> 
> Introduce a timer, but don't start to use it to combine packets unless you
> receive n packets within the timeframe. If you receive less than m packets
> within one timeframe, stop using the timer. The system should now have a
> decent response time when the network is idle, and when the network is
> busy, nobody will complain about the latency.-)

Ohh, that was inspirational. Let me free-associate some wild ideas.

Suppose we keep a running average of the recent packet arrival rate,
Lets say its 10 per millisecond ("typical" for a gigabit eth runnning
flat-out).  If we could poll the driver at a rate of 10-20 per
millisecond (i.e. letting the OS do other useful work for 0.05 millisec),
then we could potentially service the card without ever having to enable 
interrupts on the card, and without hurting latency.

If the packet arrival rate becomes slow enough, we go back to an
interrupt-driven scheme (to keep latency down).

The main problem here is that, even for HZ=1000 machines, this amounts 
to 10-20 polls per jiffy.  Which, if implemented in kernel, requires 
using the high-resolution timers. And, umm, don't the HR timers require
a cpu timer interrupt to make them go? So its not clear that this is much
of a win.

The eHEA is a 10 gigabit device, so it can expect 80-100 packets per
millisecond for large packets, and even more, say 1K packets per
millisec, for small packets. (Even the spec for my 1Gb spidernet card
claims its internal rate is 1M packets/sec.) 

Another possiblity is to set HZ to 5000 or 2 or something humongous
... after all cpu's are now faster! But, since this might be wasteful,
maybe we could make HZ be dynamically variable: have high HZ rates when
there's lots of network/disk activity, and low HZ rates when not. That
means a non-constant jiffy.

If all drivers used interrupt mitigation, then the variable-high
frequency jiffy could take thier place, and be more "fair" to everyone.
Most drivers would be polled most of the time when they're busy, and 
only use interrupts when they're not.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 08:52:03AM -0700, Stephen Hemminger wrote:
> 
> You need hardware support for deferred interrupts. Most devices have it 
> (e1000, sky2, tg3)
> and it interacts well with NAPI. It is not a generic thing you want done by 
> the stack,
> you want the hardware to hold off interrupts until X packets or Y usecs have 
> expired.

Just to be clear, in the previous email I posted on this thread, I
described a worst-case network ping-pong test case (send a packet, wait
for reply), and found out that a deffered interrupt scheme just damaged
the performance of the test case.  Since the folks who came up with the
test case were adamant, I turned off the defferred interrupts.  
While defferred interrupts are an "obvious" solution, I decided that 
they weren't a good solution. (And I have no other solution to offer).

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> 3) On modern systems the incoming packets are processed very fast. Especially
>    on SMP systems when we use multiple queues we process only a few packets
>    per napi poll cycle. So NAPI does not work very well here and the 
> interrupt 
>    rate is still high. 

I saw this too, on a system that is "modern" but not terribly fast, and
only slightly (2-way) smp. (the spidernet)

I experimented wih various solutions, none were terribly exciting.  The
thing that killed all of them was a crazy test case that someone sprung on
me:  They had written a worst-case network ping-pong app: send one
packet, wait for reply, send one packet, etc.  

If I waited (indefinitely) for a second packet to show up, the test case 
completely stalled (since no second packet would ever arrive).  And if I 
introduced a timer to wait for a second packet, then I just increased 
the latency in the response to the first packet, and this was noticed, 
and folks complained.  

In the end, I just let it be, and let the system work as a busy-beaver, 
with the high interrupt rate. Is this a wise thing to do?  I was
thinking that, if the system is under heavy load, then the interrupt
rate would fall, since (for less pathological network loads) more 
packets would queue up before the poll was serviced.  But I did not
actually measure the interrupt rate under heavy load ... 

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
 3) On modern systems the incoming packets are processed very fast. Especially
    on SMP systems when we use multiple queues we process only a few packets
    per napi poll cycle. So NAPI does not work very well here and the 
 interrupt 
    rate is still high. 

I saw this too, on a system that is modern but not terribly fast, and
only slightly (2-way) smp. (the spidernet)

I experimented wih various solutions, none were terribly exciting.  The
thing that killed all of them was a crazy test case that someone sprung on
me:  They had written a worst-case network ping-pong app: send one
packet, wait for reply, send one packet, etc.  

If I waited (indefinitely) for a second packet to show up, the test case 
completely stalled (since no second packet would ever arrive).  And if I 
introduced a timer to wait for a second packet, then I just increased 
the latency in the response to the first packet, and this was noticed, 
and folks complained.  

In the end, I just let it be, and let the system work as a busy-beaver, 
with the high interrupt rate. Is this a wise thing to do?  I was
thinking that, if the system is under heavy load, then the interrupt
rate would fall, since (for less pathological network loads) more 
packets would queue up before the poll was serviced.  But I did not
actually measure the interrupt rate under heavy load ... 

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 08:52:03AM -0700, Stephen Hemminger wrote:
 
 You need hardware support for deferred interrupts. Most devices have it 
 (e1000, sky2, tg3)
 and it interacts well with NAPI. It is not a generic thing you want done by 
 the stack,
 you want the hardware to hold off interrupts until X packets or Y usecs have 
 expired.

Just to be clear, in the previous email I posted on this thread, I
described a worst-case network ping-pong test case (send a packet, wait
for reply), and found out that a deffered interrupt scheme just damaged
the performance of the test case.  Since the folks who came up with the
test case were adamant, I turned off the defferred interrupts.  
While defferred interrupts are an obvious solution, I decided that 
they weren't a good solution. (And I have no other solution to offer).

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 09:04:56PM +0200, Bodo Eggert wrote:
 Linas Vepstas [EMAIL PROTECTED] wrote:
  On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
  3) On modern systems the incoming packets are processed very fast. 
  Especially
  on SMP systems when we use multiple queues we process only a few packets
  per napi poll cycle. So NAPI does not work very well here and the interrupt
  rate is still high.
  
  worst-case network ping-pong app: send one
  packet, wait for reply, send one packet, etc.
 
 Possible solution / possible brainfart:
 
 Introduce a timer, but don't start to use it to combine packets unless you
 receive n packets within the timeframe. If you receive less than m packets
 within one timeframe, stop using the timer. The system should now have a
 decent response time when the network is idle, and when the network is
 busy, nobody will complain about the latency.-)

Ohh, that was inspirational. Let me free-associate some wild ideas.

Suppose we keep a running average of the recent packet arrival rate,
Lets say its 10 per millisecond (typical for a gigabit eth runnning
flat-out).  If we could poll the driver at a rate of 10-20 per
millisecond (i.e. letting the OS do other useful work for 0.05 millisec),
then we could potentially service the card without ever having to enable 
interrupts on the card, and without hurting latency.

If the packet arrival rate becomes slow enough, we go back to an
interrupt-driven scheme (to keep latency down).

The main problem here is that, even for HZ=1000 machines, this amounts 
to 10-20 polls per jiffy.  Which, if implemented in kernel, requires 
using the high-resolution timers. And, umm, don't the HR timers require
a cpu timer interrupt to make them go? So its not clear that this is much
of a win.

The eHEA is a 10 gigabit device, so it can expect 80-100 packets per
millisecond for large packets, and even more, say 1K packets per
millisec, for small packets. (Even the spec for my 1Gb spidernet card
claims its internal rate is 1M packets/sec.) 

Another possiblity is to set HZ to 5000 or 2 or something humongous
... after all cpu's are now faster! But, since this might be wasteful,
maybe we could make HZ be dynamically variable: have high HZ rates when
there's lots of network/disk activity, and low HZ rates when not. That
means a non-constant jiffy.

If all drivers used interrupt mitigation, then the variable-high
frequency jiffy could take thier place, and be more fair to everyone.
Most drivers would be polled most of the time when they're busy, and 
only use interrupts when they're not.
 
--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 11:11:56PM +0200, Jan-Bernd Themann wrote:
 (when they are available for
 POWER in our case). 

hrtimer worked fine on the powerpc cell arch last summer.
I assume they work on p5 and p6 too, no ??

 I tried to implement something with normal timers, but the result
 was everything but great. The timers seem to be far too slow.
 I'm not sure if it helps to increase it from 1000HZ to 2500HZ
 or more.

Heh. Do the math. Even on 1gigabit cards, that's not enough:

(1gigabit/sec) x (byte/8 bits) x (packet/1500bytes) x (sec/1000 jiffy) 

is 83 packets a jiffy (for big packets, even more for small packets, 
and more again for 10 gigabit cards). So polling once per jiffy is a 
latency disaster.

--linas  

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 02:44:36PM -0700, David Miller wrote:
 From: David Stevens [EMAIL PROTECTED]
 Date: Fri, 24 Aug 2007 09:50:58 -0700

  Problem is if it increases rapidly, you may drop packets
  before you notice that the ring is full in the current estimated
  interval.

 This is one of many reasons why hardware interrupt mitigation
 is really needed for this.

When turning off interrupts, don't turn them *all* off.
Leave the queue-full interrupt always on.

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] PCI AER: fix warnings when PCIEAER=n

2007-08-23 Thread Linas Vepstas

On Thu, Aug 23, 2007 at 10:37:53AM -0700, Randy Dunlap wrote:
> From: Randy Dunlap <[EMAIL PROTECTED]>
> 
> Fix warnings when CONFIG_PCIEAER=n:
> 
> drivers/pci/pcie/portdrv_pci.c:105: warning: statement with no effect
> drivers/pci/pcie/portdrv_pci.c:226: warning: statement with no effect
> drivers/scsi/arcmsr/arcmsr_hba.c:352: warning: statement with no effect
> 
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>

Acked-by: Linas Vepstas <[EMAIL PROTECTED]>

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] PCI AER: fix warnings when PCIEAER=n

2007-08-23 Thread Linas Vepstas

On Thu, Aug 23, 2007 at 10:37:53AM -0700, Randy Dunlap wrote:
 From: Randy Dunlap [EMAIL PROTECTED]
 
 Fix warnings when CONFIG_PCIEAER=n:
 
 drivers/pci/pcie/portdrv_pci.c:105: warning: statement with no effect
 drivers/pci/pcie/portdrv_pci.c:226: warning: statement with no effect
 drivers/scsi/arcmsr/arcmsr_hba.c:352: warning: statement with no effect
 
 Signed-off-by: Randy Dunlap [EMAIL PROTECTED]

Acked-by: Linas Vepstas [EMAIL PROTECTED]

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ppc: remove unused amiga_request_irq and mach_request_irq

2007-08-22 Thread Linas Vepstas

On Wed, Aug 22, 2007 at 07:44:41PM +1000, Paul Mackerras wrote:
> Fernando Luis Vázquez Cao writes:
> 
> > amiga_request_irq and mach_request_irq are never used, so delete them.
> 
> OK, but is there a particular reason you want to do this?
> 
> The whole of arch/ppc is going away eventually, so I don't think we
> need to remove it piece by piece.

Its often easier to port/move stuff if you clean it up first.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ppc: remove unused amiga_request_irq and mach_request_irq

2007-08-22 Thread Linas Vepstas

On Wed, Aug 22, 2007 at 07:44:41PM +1000, Paul Mackerras wrote:
 Fernando Luis Vázquez Cao writes:
 
  amiga_request_irq and mach_request_irq are never used, so delete them.
 
 OK, but is there a particular reason you want to do this?
 
 The whole of arch/ppc is going away eventually, so I don't think we
 need to remove it piece by piece.

Its often easier to port/move stuff if you clean it up first.

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] Add scaled time to taskstats based process accounting

2007-08-17 Thread Linas Vepstas

On Fri, Aug 17, 2007 at 08:22:40AM +1000, Paul Mackerras wrote:
> Linas Vepstas writes:
> 
> > My gut impression (maybe wrong?) is that the scaled time is,
> > in a certain sense, "more accurate" than the unscaled time.
> 
> The "unscaled" time is just time, as in "how many seconds did this
> task spend on the CPU".  It's what all the tools (except a certain
> proprietary workload manager) expect.  Top, ps, etc. get unhappy if
> the times reported (user, system, hardirq, softirq, idle, stolen)
> don't add up to elapsed wall-clock time.

OK, so to keep the tools happy, the total time needs to add up
to wall-clock time.  Which tells me that the "scaled idle time"
should be defined as "wall clock time minus the other stuff".

> The "scaled" time is really CPU cycles divided by some arbitrary
> factor (the notional CPU frequency).  So yes it does give some
> indication of how much progress the task should have made, in some
> sense.

Yes, good, that's what I was expecting.  As a sysadmin and/or 
back-of-the-envelope performance person, I would certainly like 
to have ps and top report the scaled time. When I do "performance 
tuning", I almost always can get away with quick-n-dirty use of
vmstat and top, and only rarely have to descend into more complex 
tools.  I'd hate to loose this quick-n-dirty utility, which,
again ... my gut impression is that these numbers suddenly turn 
mostly meaningless.

That is, if I run the same task 3 times over the next few hours,
will vmstat/top/ps report more or less he same figures?  I'm 
concerned that they won't ... that I'll see different values
come out, depending on whether the chip is overheating, or whether
some other partition is stealing, or whatever causes this thing to 
dynamically scale.

> Both measures are useful.  Because the current user API is in terms of
> real time rather than cycles, we have to continue reporting real time,
> not scaled time, which is why the existing interfaces report unscaled
> time, and the scaled time values are reported through a new extension
> to the taskstats interface.

This begs the question of "what is the real, actual elapsed time?" 
... currently, the "real time" depends very much on how often your 
process got scheduled -- but, if your process is scheduled but 
(due to scaling) isn't "actually running", should that count towards 
the "real time"?

---
I supose that its inevitable that this stuff will get more complex;
I'm just trying to make sure we don't end up doing this backwards,
and deciding to change it around later.

I already notice that "stolen time" is causing confusion in some 
areas.  Its disconcerting to have lots of cores, and lots of threads
per core, only to find that some of your time has been "stolen".
I'm still wondering ... was this the right way to report this?

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] Add scaled time to taskstats based process accounting

2007-08-17 Thread Linas Vepstas

On Fri, Aug 17, 2007 at 08:22:40AM +1000, Paul Mackerras wrote:
 Linas Vepstas writes:
 
  My gut impression (maybe wrong?) is that the scaled time is,
  in a certain sense, more accurate than the unscaled time.
 
 The unscaled time is just time, as in how many seconds did this
 task spend on the CPU.  It's what all the tools (except a certain
 proprietary workload manager) expect.  Top, ps, etc. get unhappy if
 the times reported (user, system, hardirq, softirq, idle, stolen)
 don't add up to elapsed wall-clock time.

OK, so to keep the tools happy, the total time needs to add up
to wall-clock time.  Which tells me that the scaled idle time
should be defined as wall clock time minus the other stuff.

 The scaled time is really CPU cycles divided by some arbitrary
 factor (the notional CPU frequency).  So yes it does give some
 indication of how much progress the task should have made, in some
 sense.

Yes, good, that's what I was expecting.  As a sysadmin and/or 
back-of-the-envelope performance person, I would certainly like 
to have ps and top report the scaled time. When I do performance 
tuning, I almost always can get away with quick-n-dirty use of
vmstat and top, and only rarely have to descend into more complex 
tools.  I'd hate to loose this quick-n-dirty utility, which,
again ... my gut impression is that these numbers suddenly turn 
mostly meaningless.

That is, if I run the same task 3 times over the next few hours,
will vmstat/top/ps report more or less he same figures?  I'm 
concerned that they won't ... that I'll see different values
come out, depending on whether the chip is overheating, or whether
some other partition is stealing, or whatever causes this thing to 
dynamically scale.

 Both measures are useful.  Because the current user API is in terms of
 real time rather than cycles, we have to continue reporting real time,
 not scaled time, which is why the existing interfaces report unscaled
 time, and the scaled time values are reported through a new extension
 to the taskstats interface.

This begs the question of what is the real, actual elapsed time? 
... currently, the real time depends very much on how often your 
process got scheduled -- but, if your process is scheduled but 
(due to scaling) isn't actually running, should that count towards 
the real time?

---
I supose that its inevitable that this stuff will get more complex;
I'm just trying to make sure we don't end up doing this backwards,
and deciding to change it around later.

I already notice that stolen time is causing confusion in some 
areas.  Its disconcerting to have lots of cores, and lots of threads
per core, only to find that some of your time has been stolen.
I'm still wondering ... was this the right way to report this?

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] Add scaled time to taskstats based process accounting

2007-08-16 Thread Linas Vepstas

On Thu, Aug 16, 2007 at 05:09:22PM +1000, Michael Neuling wrote:
> This adds two items to the taststats struct to account for user and
> system time based on scaling the CPU frequency and instruction issue
> rates.
> 
> Adds account_(user|system)_time_scaled callbacks which architectures
> can use to account for time using this mechanism.

There's something simple here that I just don't understand.

>  /*
> + * Account scaled user cpu time to a process.
> + * @p: the process that the cpu time gets accounted to
> + * @cputime: the cpu time spent in user space since the last update
> + */
> +void account_user_time_scaled(struct task_struct *p, cputime_t cputime)
> +{
> + p->utimescaled = cputime_add(p->utimescaled, cputime);
> +}

My gut impression (maybe wrong?) is that the scaled time is,
in a certain sense, "more accurate" than the unscaled time.
In fact, the unscaled time gives me the impression of being
rather meaningless, as it has no particular significance
with respect to the wall-clock, and it also doesn't give
any accurate hint of how much cpu resource was actually
consumed.

If one has a cpu with frequency scaling, then when would
one ever be interested in the non-scaled time? If the answer
is "never", then why not just always use the scaled time,
instead of adding more stuff to the kernel structs?

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] Add scaled time to taskstats based process accounting

2007-08-16 Thread Linas Vepstas

On Thu, Aug 16, 2007 at 05:09:22PM +1000, Michael Neuling wrote:
 This adds two items to the taststats struct to account for user and
 system time based on scaling the CPU frequency and instruction issue
 rates.
 
 Adds account_(user|system)_time_scaled callbacks which architectures
 can use to account for time using this mechanism.

There's something simple here that I just don't understand.

  /*
 + * Account scaled user cpu time to a process.
 + * @p: the process that the cpu time gets accounted to
 + * @cputime: the cpu time spent in user space since the last update
 + */
 +void account_user_time_scaled(struct task_struct *p, cputime_t cputime)
 +{
 + p-utimescaled = cputime_add(p-utimescaled, cputime);
 +}

My gut impression (maybe wrong?) is that the scaled time is,
in a certain sense, more accurate than the unscaled time.
In fact, the unscaled time gives me the impression of being
rather meaningless, as it has no particular significance
with respect to the wall-clock, and it also doesn't give
any accurate hint of how much cpu resource was actually
consumed.

If one has a cpu with frequency scaling, then when would
one ever be interested in the non-scaled time? If the answer
is never, then why not just always use the scaled time,
instead of adding more stuff to the kernel structs?

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] [459/2many] MAINTAINERS - SPIDERNET NETWORK DRIVER for CELL

2007-08-14 Thread Linas Vepstas

On Mon, Aug 13, 2007 at 10:07:25AM -0700, Joe Perches wrote:
> On Mon, 2007-08-13 at 10:45 -0500, Linas Vepstas wrote:
> > Note quite right. spider-pic is not part of spider_net.
> 
> SPIDERNET NETWORK DRIVER for CELL
> P:Linas Vepstas
> M:[EMAIL PROTECTED]
> L:[EMAIL PROTECTED]
> S:Supported
> F:Documentation/networking/spider_net.txt
> F:drivers/net/spider_net*

Works for me.

Acked-by: Linas Vepstas <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] [459/2many] MAINTAINERS - SPIDERNET NETWORK DRIVER for CELL

2007-08-14 Thread Linas Vepstas

On Mon, Aug 13, 2007 at 10:07:25AM -0700, Joe Perches wrote:
 On Mon, 2007-08-13 at 10:45 -0500, Linas Vepstas wrote:
  Note quite right. spider-pic is not part of spider_net.
 
 SPIDERNET NETWORK DRIVER for CELL
 P:Linas Vepstas
 M:[EMAIL PROTECTED]
 L:[EMAIL PROTECTED]
 S:Supported
 F:Documentation/networking/spider_net.txt
 F:drivers/net/spider_net*

Works for me.

Acked-by: Linas Vepstas [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] [459/2many] MAINTAINERS - SPIDERNET NETWORK DRIVER for CELL

2007-08-13 Thread Linas Vepstas

On Sun, Aug 12, 2007 at 11:36:42PM -0700, [EMAIL PROTECTED] wrote:
> Add file pattern to MAINTAINER entry
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b616562..fa8fb1c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4377,6 +4377,9 @@ P:  Linas Vepstas
>  M:   [EMAIL PROTECTED]
>  L:   [EMAIL PROTECTED]
>  S:   Supported
> +F:   Documentation/networking/spider_net.txt
> +F:   arch/powerpc/platforms/cell/spider-pic.c
> +F:   drivers/net/spider_net*

Note quite right. spider-pic is not part of spider_net.
The rest loks fine.

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] [459/2many] MAINTAINERS - SPIDERNET NETWORK DRIVER for CELL

2007-08-13 Thread Linas Vepstas

On Sun, Aug 12, 2007 at 11:36:42PM -0700, [EMAIL PROTECTED] wrote:
 Add file pattern to MAINTAINER entry
 
 Signed-off-by: Joe Perches [EMAIL PROTECTED]
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index b616562..fa8fb1c 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -4377,6 +4377,9 @@ P:  Linas Vepstas
  M:   [EMAIL PROTECTED]
  L:   [EMAIL PROTECTED]
  S:   Supported
 +F:   Documentation/networking/spider_net.txt
 +F:   arch/powerpc/platforms/cell/spider-pic.c
 +F:   drivers/net/spider_net*

Note quite right. spider-pic is not part of spider_net.
The rest loks fine.

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH]: PCI Error Recovery: Symbios SCSI device driver

2007-08-02 Thread Linas Vepstas

On Thu, Jul 05, 2007 at 12:54:06PM -0600, Matthew Wilcox wrote:
> On Thu, Jul 05, 2007 at 11:28:38AM -0700, Andrew Morton wrote:
> > Well you've sent it a couple of times, and I've sent it in five more times
> > over the past year.  Once we were told "awaiting maintainer ack".
> > 
> > This situation is fairly stupid.  How about we make you the maintainer?
> 
> Last time I looked at it, I still wasn't comfortable with it.  I'm going
> to look at it again.

Please do. Its burning the proverbial hole in my pocket; I'd really
like to get this off my list of things I worry about.

> I'm fairly sure Linas doesn't want to be the sym2 maintainer.  It's
> still an ugly pile of junk that needs cleaning up.

Heh. I have no difficulty living with ugly code: its actually a 
great excuse to fix things instead of doing "real work" :-)

Rather, the menagerie of hardware I have access to is constantly 
changing; I don't have a symbios card just right now, and it might 
take a few days to even find someone who did.  Which is an incredibly
unpleasent, unrewarding activity.

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH]: PCI Error Recovery: Symbios SCSI device driver

2007-08-02 Thread Linas Vepstas

On Thu, Jul 05, 2007 at 12:54:06PM -0600, Matthew Wilcox wrote:
 On Thu, Jul 05, 2007 at 11:28:38AM -0700, Andrew Morton wrote:
  Well you've sent it a couple of times, and I've sent it in five more times
  over the past year.  Once we were told awaiting maintainer ack.
  
  This situation is fairly stupid.  How about we make you the maintainer?
 
 Last time I looked at it, I still wasn't comfortable with it.  I'm going
 to look at it again.

Please do. Its burning the proverbial hole in my pocket; I'd really
like to get this off my list of things I worry about.

 I'm fairly sure Linas doesn't want to be the sym2 maintainer.  It's
 still an ugly pile of junk that needs cleaning up.

Heh. I have no difficulty living with ugly code: its actually a 
great excuse to fix things instead of doing real work :-)

Rather, the menagerie of hardware I have access to is constantly 
changing; I don't have a symbios card just right now, and it might 
take a few days to even find someone who did.  Which is an incredibly
unpleasent, unrewarding activity.

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Linas Vepstas

On Mon, Jul 30, 2007 at 05:24:33PM +0200, Jan-Bernd Themann wrote:
> 
> Changes to http://www.spinics.net/lists/netdev/msg36912.html
> 
> 1) A new field called "features" has been added to the net_lro_mgr struct.
>It is set by the driver to indicate:
>- LRO_F_NAPI:Use NAPI / netif_rx to pass packets to stack
> 
>- LRO_F_EXTRACT_VLAN_ID: Set by driver if HW extracts VLAN IDs for VLAN
> packets but does not modify ETH protocol (ETH_P_8021Q)
> 
> 2) Padded frames are not aggregated for now. Bug fixed
> 
> 3) Correct header length now used. No minimal header length for aggregated
>packets used anymore.
> 
> 4) Statistic counters were introduced. They are stored in a new struct in
>the net_lro_mgr. This has the advantage that no locking is required in
>cases where the driver uses multiple lro_mgrs for different receive queues.
>Thus we get the following statistics per lro_mgr / eth device:
>- Number of aggregated packets
>- Number of flushed packets
>- Number of times we run out of lro_desc.
> 
>The ratio of "aggregated packets" and "flushed packets" give you an
>idea how well LRO is working.

I'd like to see an edited form of this, together with an introduction to
LRO, written up in the Documentation subdirectory.  

As someone with some driver experience, but not on te bleeding edge,
some basc newbie questions pop into mind:

-- what is LRO?
-- Basic principles of operation?
-- Can I use it in my driver?  
-- Does my hardware have to have some special feature before I can use it?
-- What sort of performance improvement does it provide? Throughput?
   Latency? CPU usage? How does it affect DMA allocation? Does it 
   improve only a certain type of traffic (large/small packets, etc.)
-- Example code? What's the API? How should my driver use it?

Right now, I can maybe find answers by doing lots of googling.  I'd like
to have some quick way of getting a grip on this.

--linas
   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Linas Vepstas

On Mon, Jul 30, 2007 at 05:24:33PM +0200, Jan-Bernd Themann wrote:
 
 Changes to http://www.spinics.net/lists/netdev/msg36912.html
 
 1) A new field called features has been added to the net_lro_mgr struct.
It is set by the driver to indicate:
- LRO_F_NAPI:Use NAPI / netif_rx to pass packets to stack
 
- LRO_F_EXTRACT_VLAN_ID: Set by driver if HW extracts VLAN IDs for VLAN
 packets but does not modify ETH protocol (ETH_P_8021Q)
 
 2) Padded frames are not aggregated for now. Bug fixed
 
 3) Correct header length now used. No minimal header length for aggregated
packets used anymore.
 
 4) Statistic counters were introduced. They are stored in a new struct in
the net_lro_mgr. This has the advantage that no locking is required in
cases where the driver uses multiple lro_mgrs for different receive queues.
Thus we get the following statistics per lro_mgr / eth device:
- Number of aggregated packets
- Number of flushed packets
- Number of times we run out of lro_desc.
 
The ratio of aggregated packets and flushed packets give you an
idea how well LRO is working.

I'd like to see an edited form of this, together with an introduction to
LRO, written up in the Documentation subdirectory.  

As someone with some driver experience, but not on te bleeding edge,
some basc newbie questions pop into mind:

-- what is LRO?
-- Basic principles of operation?
-- Can I use it in my driver?  
-- Does my hardware have to have some special feature before I can use it?
-- What sort of performance improvement does it provide? Throughput?
   Latency? CPU usage? How does it affect DMA allocation? Does it 
   improve only a certain type of traffic (large/small packets, etc.)
-- Example code? What's the API? How should my driver use it?

Right now, I can maybe find answers by doing lots of googling.  I'd like
to have some quick way of getting a grip on this.

--linas
   
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] crash in 2.6.22-git2 sysctl_set_parent()

2007-07-16 Thread Linas Vepstas

On Fri, Jul 13, 2007 at 03:47:02PM -0700, David Miller wrote:
> From: [EMAIL PROTECTED] (Linas Vepstas)
> Date: Fri, 13 Jul 2007 15:05:15 -0500
> 
> > 
> > This is a patch (& bug report) for a crash in sysctl_set_parent() 
> > in 2.6.22-git2. 
> > 
> > Problem: 2.6.22-git2 crashes with a stack trace 
> > 
> > Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> 
> Thanks for tracking this down, I'll apply your patch.

NAK. As I just explained in another email, this bug
was introduced by the "send-to-self" patch I habitually
apply -- so habitually, that I forgot I was not working 
with a "clean" tree. So it goes ... 

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] crash in 2.6.22-git2 sysctl_set_parent()

2007-07-16 Thread Linas Vepstas

On Fri, Jul 13, 2007 at 07:06:56PM -0600, Eric W. Biederman wrote:
> > .data   = _devconf.loop,
> > .maxlen = sizeof(int),
> > .mode   = 0644,
> > +   .child  = 0x0,
> > .proc_handler   = _dointvec,
> > },
> Where did this entry above in devinet_sysctl come from?

My bad.
I habitually apply the "send-to-self" patch, since some of the 
network testing that I do is easiest if I load up the all of the 
adapters in the same box. (If you're not familiar with this patch ... 
its great, and I wish it was integratedd into mainline. It allows
one to drive network traffic through the physical devices, even
if they are in the same box.  Without it, the network stack is
too clever, and won't allow you to do this.)

> > +   {
> > +   .ctl_name   = 0,
> > +   .procname   = 0,
> > +   },
> I probably would have just done:
> + {},

Yes, in retrospect, this would have been the simplest solution.

> What added the additional entry to devinet_root_dir?  I don't see that
> in Linus' tree?
> 
> The result may be fine but if it isn't named in a per network device
> manner we are adding duplicate entries to the root /proc/sys directory
> which is wrong.
> 
> Actually come to think of it I am concerned that someone added a
> settable entry into /proc/sys/ it should at least be in /proc/sys/net/
> where it won't conflict with other uses of that directory.  Especially
> as things like network devices have user controlled names.

Sigh. Silly me. Haste makes waste.

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] crash in 2.6.22-git2 sysctl_set_parent()

2007-07-16 Thread Linas Vepstas

On Fri, Jul 13, 2007 at 07:06:56PM -0600, Eric W. Biederman wrote:
  .data   = ipv4_devconf.loop,
  .maxlen = sizeof(int),
  .mode   = 0644,
  +   .child  = 0x0,
  .proc_handler   = proc_dointvec,
  },
 Where did this entry above in devinet_sysctl come from?

My bad.
I habitually apply the send-to-self patch, since some of the 
network testing that I do is easiest if I load up the all of the 
adapters in the same box. (If you're not familiar with this patch ... 
its great, and I wish it was integratedd into mainline. It allows
one to drive network traffic through the physical devices, even
if they are in the same box.  Without it, the network stack is
too clever, and won't allow you to do this.)

  +   {
  +   .ctl_name   = 0,
  +   .procname   = 0,
  +   },
 I probably would have just done:
 + {},

Yes, in retrospect, this would have been the simplest solution.

 What added the additional entry to devinet_root_dir?  I don't see that
 in Linus' tree?
 
 The result may be fine but if it isn't named in a per network device
 manner we are adding duplicate entries to the root /proc/sys directory
 which is wrong.
 
 Actually come to think of it I am concerned that someone added a
 settable entry into /proc/sys/ it should at least be in /proc/sys/net/
 where it won't conflict with other uses of that directory.  Especially
 as things like network devices have user controlled names.

Sigh. Silly me. Haste makes waste.

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] crash in 2.6.22-git2 sysctl_set_parent()

2007-07-16 Thread Linas Vepstas

On Fri, Jul 13, 2007 at 03:47:02PM -0700, David Miller wrote:
 From: [EMAIL PROTECTED] (Linas Vepstas)
 Date: Fri, 13 Jul 2007 15:05:15 -0500

  This is a patch ( bug report) for a crash in sysctl_set_parent() 
  in 2.6.22-git2. 

  Problem: 2.6.22-git2 crashes with a stack trace 

  Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 Thanks for tracking this down, I'll apply your patch.

NAK. As I just explained in another email, this bug
was introduced by the send-to-self patch I habitually
apply -- so habitually, that I forgot I was not working 
with a clean tree. So it goes ... 

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] crash in 2.6.22-git2 sysctl_set_parent()

2007-07-13 Thread Linas Vepstas


This is a patch (& bug report) for a crash in sysctl_set_parent() 
in 2.6.22-git2. 

Problem: 2.6.22-git2 crashes with a stack trace 
[c1d0fb00] c0067b4c .sysctl_set_parent+0x48/0x7c
[c1d0fb90] c0069b40 .register_sysctl_table+0x7c/0xf4
[c1d0fc30] c065e710 .devinet_init+0x88/0xb0
[c1d0fcc0] c065db74 .ip_rt_init+0x17c/0x32c
[c1d0fd70] c065deec .ip_init+0x10/0x34
[c1d0fdf0] c065e898 .inet_init+0x160/0x3dc
[c1d0fea0] c0630bc4 .kernel_init+0x204/0x3c8

A bit of poking around makes it clear what the problem is:
In sysctl_set_parent(), the for loop 

   for (; table->ctl_name || table->procname; table++) {

walks off the end of the table, and into garbage.  Basically,
this for-loop iterator expects all table arrays to be 
"null terminated".  However, net/ipv4/devinet.c statically 
declares an array that is not null-terminated.  The patch 
below fixes that; it works for me.  Its somewhat conservative;
if one wishes to assume that the compiler will always zero out
the empty parts of the structure, then this pach can be shrunk 
to one line: +  ctl_table   devinet_root_dir[3];

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


I tried to audit some of the code to see where else there 
might be similar badly-formed static declarations.  This is hard,
as there's a lot of code. Most seems fine.


 net/core/neighbour.c |4 
 net/ipv4/devinet.c   |7 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

Index: linux-2.6.22-git2/net/ipv4/devinet.c
===
--- linux-2.6.22-git2.orig/net/ipv4/devinet.c   2007-07-13 14:23:21.0 
-0500
+++ linux-2.6.22-git2/net/ipv4/devinet.c2007-07-13 14:24:15.0 
-0500
@@ -1424,7 +1424,7 @@ static struct devinet_sysctl_table {
ctl_table   devinet_dev[2];
ctl_table   devinet_conf_dir[2];
ctl_table   devinet_proto_dir[2];
-   ctl_table   devinet_root_dir[2];
+   ctl_table   devinet_root_dir[3];
 } devinet_sysctl = {
.devinet_vars = {
DEVINET_SYSCTL_COMPLEX_ENTRY(FORWARDING, "forwarding",
@@ -1493,8 +1493,13 @@ static struct devinet_sysctl_table {
.data   = _devconf.loop,
.maxlen = sizeof(int),
.mode   = 0644,
+   .child  = 0x0,
.proc_handler   = _dointvec,
},
+   {
+   .ctl_name   = 0,
+   .procname   = 0,
+   },
},
 };
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] crash in 2.6.22-git2 sysctl_set_parent()

2007-07-13 Thread Linas Vepstas


This is a patch ( bug report) for a crash in sysctl_set_parent() 
in 2.6.22-git2. 

Problem: 2.6.22-git2 crashes with a stack trace 
[c1d0fb00] c0067b4c .sysctl_set_parent+0x48/0x7c
[c1d0fb90] c0069b40 .register_sysctl_table+0x7c/0xf4
[c1d0fc30] c065e710 .devinet_init+0x88/0xb0
[c1d0fcc0] c065db74 .ip_rt_init+0x17c/0x32c
[c1d0fd70] c065deec .ip_init+0x10/0x34
[c1d0fdf0] c065e898 .inet_init+0x160/0x3dc
[c1d0fea0] c0630bc4 .kernel_init+0x204/0x3c8

A bit of poking around makes it clear what the problem is:
In sysctl_set_parent(), the for loop 

   for (; table-ctl_name || table-procname; table++) {

walks off the end of the table, and into garbage.  Basically,
this for-loop iterator expects all table arrays to be 
null terminated.  However, net/ipv4/devinet.c statically 
declares an array that is not null-terminated.  The patch 
below fixes that; it works for me.  Its somewhat conservative;
if one wishes to assume that the compiler will always zero out
the empty parts of the structure, then this pach can be shrunk 
to one line: +  ctl_table   devinet_root_dir[3];

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


I tried to audit some of the code to see where else there 
might be similar badly-formed static declarations.  This is hard,
as there's a lot of code. Most seems fine.


 net/core/neighbour.c |4 
 net/ipv4/devinet.c   |7 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

Index: linux-2.6.22-git2/net/ipv4/devinet.c
===
--- linux-2.6.22-git2.orig/net/ipv4/devinet.c   2007-07-13 14:23:21.0 
-0500
+++ linux-2.6.22-git2/net/ipv4/devinet.c2007-07-13 14:24:15.0 
-0500
@@ -1424,7 +1424,7 @@ static struct devinet_sysctl_table {
ctl_table   devinet_dev[2];
ctl_table   devinet_conf_dir[2];
ctl_table   devinet_proto_dir[2];
-   ctl_table   devinet_root_dir[2];
+   ctl_table   devinet_root_dir[3];
 } devinet_sysctl = {
.devinet_vars = {
DEVINET_SYSCTL_COMPLEX_ENTRY(FORWARDING, forwarding,
@@ -1493,8 +1493,13 @@ static struct devinet_sysctl_table {
.data   = ipv4_devconf.loop,
.maxlen = sizeof(int),
.mode   = 0644,
+   .child  = 0x0,
.proc_handler   = proc_dointvec,
},
+   {
+   .ctl_name   = 0,
+   .procname   = 0,
+   },
},
 };
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] [ide] mmio ide support

2007-07-10 Thread Linas Vepstas

On Sun, Jul 08, 2007 at 03:15:41PM +0200, Bartlomiej Zolnierkiewicz wrote:
> 
> > on the argument that drivers/ide/ is going away soon. Most
> > current distros have already moved over to using libata
> > exclusively.
> 
> The in-kernel default for PATA systems is still IDE subsystem.

In part because libata still does not yet work correctly 
for at least some (older) ide chipsets (like mine).

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] [ide] mmio ide support

2007-07-10 Thread Linas Vepstas

On Sun, Jul 08, 2007 at 03:15:41PM +0200, Bartlomiej Zolnierkiewicz wrote:
 
  on the argument that drivers/ide/ is going away soon. Most
  current distros have already moved over to using libata
  exclusively.
 
 The in-kernel default for PATA systems is still IDE subsystem.

In part because libata still does not yet work correctly 
for at least some (older) ide chipsets (like mine).

--linas
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH pata-2.6 fix] hpt366: use correct enablebits for HPT36x

2007-07-03 Thread Linas Vepstas

Hi,

On Sat, Jun 30, 2007 at 12:04:22AM +0400, Sergei Shtylyov wrote:
> The HPT36x chips finally turned out to have the channel enable bits -- 
> however,
> badly implemented.  Make use of them despite it's probably only going to 
> burden
> the driver's code -- assuming both channels are always enabled by the 
> HighPoint
> BIOS anyway...
> 
> Signed-off-by: Sergei Shtylyov <[EMAIL PROTECTED]>
> 
> ---
> Michal, Linas, please verify the patch... :-)

No negative impact for me.

(Am I supposed to acked-by if I do this kind of review/test?)

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 >

1 - 100 of 329 matches

Mail list logo