Re: Change in PCI behaviour

2010-12-04 Thread Gary Thomas

On 11/23/2010 07:44 AM, Gary Thomas wrote:

On 11/22/2010 01:26 PM, Benjamin Herrenschmidt wrote:

On Mon, 2010-11-22 at 03:01 -0700, Gary Thomas wrote:

I have a bit more information on this. I'm pretty sure that the failures
are only happening in my SCSI (SATA actually) code. My board (8347ea) has
a PCI bus with a SIL SATA controller. This combo works perfectly in 2.6.28.
In 2.6.32, it will run for a while (possibly quite a while), then timeout
trying to do a large block write - typically 256 blocks. Once this timeout
happens, the SIL controller is stuck and accesses to it will eventually
cause the whole system to hang (as above).

Was there any major change in how PCI or DMA was handled between 2.6.28
and 2.6.32? Given the ephemeral nature of these failures (multiple runs
all eventually fail, but never the same twice), my only hope of fixing it
will be to have some ideas what might have changed.


Maybe the changes you did to the PCI outbound windows are now breaking
DMA? Make sure the outbound and inbound don't overlap for example and
that all RAM is reachable for inbound.


Here's what I did to work around this - in my DTS, I set up my PCI as
ranges = 0x0200 0x0 0xC400 0xC400 0x0 0x1C00
0x0100 0x0 0x 0xB800 0x0 0x0010;
Before, I had it as
ranges = 0x0200 0x0 0xC000 0xC000 0x0 0x2000
0x0100 0x0 0x 0xB800 0x0 0x0010;

I wasn't sure how to reserve the memory (based on your earlier suggestion),
so I just narrowed the window. Note that I did not change the PCI hardware
registers (maybe the FSL code does?), so the outbound window should still
be the whole 512MB.

If this isn't viable, perhaps you could explain a bit more how to reserve
such a chunk of memory so that the PCI mappings remain the same.


Any ideas on this?  I'm a bit lost as to how to reserve the memory like
you suggested and what I've tried so far has met little success.

Thanks again

--

Gary Thomas |  Consulting for the
MLB Associates  |Embedded world

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Change in PCI behaviour

2010-12-04 Thread Benjamin Herrenschmidt
On Sat, 2010-12-04 at 05:49 -0700, Gary Thomas wrote:
 On 11/23/2010 07:44 AM, Gary Thomas wrote:
  On 11/22/2010 01:26 PM, Benjamin Herrenschmidt wrote:
  On Mon, 2010-11-22 at 03:01 -0700, Gary Thomas wrote:
  I have a bit more information on this. I'm pretty sure that the failures
  are only happening in my SCSI (SATA actually) code. My board (8347ea) has
  a PCI bus with a SIL SATA controller. This combo works perfectly in 2.6.28.
  In 2.6.32, it will run for a while (possibly quite a while), then timeout
  trying to do a large block write - typically 256 blocks. Once this timeout
  happens, the SIL controller is stuck and accesses to it will eventually
  cause the whole system to hang (as above).

  Was there any major change in how PCI or DMA was handled between 2.6.28
  and 2.6.32? Given the ephemeral nature of these failures (multiple runs
  all eventually fail, but never the same twice), my only hope of fixing it
  will be to have some ideas what might have changed.
 
  Maybe the changes you did to the PCI outbound windows are now breaking
  DMA? Make sure the outbound and inbound don't overlap for example and
  that all RAM is reachable for inbound.
 
  Here's what I did to work around this - in my DTS, I set up my PCI as
  ranges = 0x0200 0x0 0xC400 0xC400 0x0 0x1C00
  0x0100 0x0 0x 0xB800 0x0 0x0010;
  Before, I had it as
  ranges = 0x0200 0x0 0xC000 0xC000 0x0 0x2000
  0x0100 0x0 0x 0xB800 0x0 0x0010;
 
  I wasn't sure how to reserve the memory (based on your earlier suggestion),
  so I just narrowed the window. Note that I did not change the PCI hardware
  registers (maybe the FSL code does?), so the outbound window should still
  be the whole 512MB.
 
  If this isn't viable, perhaps you could explain a bit more how to reserve
  such a chunk of memory so that the PCI mappings remain the same.
 
 Any ideas on this?  I'm a bit lost as to how to reserve the memory like
 you suggested and what I've tried so far has met little success.
 
 Thanks again
 

Look at pcibios_reserve_legacy_regions() in
arch/powerpc/kernel/pci-common.c, it reserves the legacy IO and VGA
regions on host bridges. You can make it reserve whatever your
device is allergic to.

Cheers,
Ben.
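
To see why reserving a region before resources are assigned moves the BAR, here is a toy model in Python. It is not kernel code and does not reproduce what pcibios_reserve_legacy_regions() does internally; it only assumes a lowest-address-first allocator with power-of-two sized, naturally aligned BARs. The window bounds, the 64MB BAR size and the 1MB reservation are illustrative values, not taken from the truncated logs in this thread.

```python
def align_up(value, alignment):
    """Round value up to a multiple of alignment (alignment is a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def first_fit(window_start, window_end, size, busy):
    """Return the lowest size-aligned address in [window_start, window_end]
    where `size` bytes fit without touching any (start, end) range in `busy`."""
    addr = align_up(window_start, size)
    for start, end in sorted(busy):
        if addr + size - 1 < start:
            break  # the block fits in the gap below this busy range
        if addr <= end:
            # The candidate collides with this range, so retry just above it.
            addr = align_up(end + 1, size)
    if addr + size - 1 > window_end:
        raise ValueError("no space left in window")
    return addr


# Hypothetical numbers: a 512MB memory window and a 64MB BAR.
window = (0xC0000000, 0xDFFFFFFF)
bar_size = 64 * 1024 * 1024

# Nothing reserved: the BAR lands at the very start of the window.
print(hex(first_fit(*window, bar_size, [])))

# First 1MB reserved up front: the BAR is pushed to the next aligned slot.
print(hex(first_fit(*window, bar_size, [(0xC0000000, 0xC00FFFFF)])))
```

In this model the reservation costs a full 64MB slot because of BAR alignment, which is the same effect as narrowing the window by 64MB in the DTS.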




Re: Change in PCI behaviour

2010-11-23 Thread Gary Thomas

On 11/22/2010 01:26 PM, Benjamin Herrenschmidt wrote:

On Mon, 2010-11-22 at 03:01 -0700, Gary Thomas wrote:

I have a bit more information on this.  I'm pretty sure that the failures
are only happening in my SCSI (SATA actually) code.  My board (8347ea) has
a PCI bus with a SIL SATA controller.  This combo works perfectly in 2.6.28.
In 2.6.32, it will run for a while (possibly quite a while), then timeout
trying to do a large block write - typically 256 blocks.  Once this timeout
happens, the SIL controller is stuck and accesses to it will eventually
cause the whole system to hang (as above).

Was there any major change in how PCI or DMA was handled between 2.6.28
and 2.6.32?  Given the ephemeral nature of these failures (multiple runs
all eventually fail, but never the same twice), my only hope of fixing it
will be to have some ideas what might have changed.


Maybe the changes you did to the PCI outbound windows are now breaking
DMA? Make sure the outbound and inbound don't overlap for example and
that all RAM is reachable for inbound.


Here's what I did to work around this - in my DTS, I set up my PCI as
ranges = 0x0200 0x0 0xC400 0xC400 0x0 0x1C00
  0x0100 0x0 0x 0xB800 0x0 0x0010;
Before, I had it as
ranges = 0x0200 0x0 0xC000 0xC000 0x0 0x2000
  0x0100 0x0 0x 0xB800 0x0 0x0010;

I wasn't sure how to reserve the memory (based on your earlier suggestion),
so I just narrowed the window.  Note that I did not change the PCI hardware
registers (maybe the FSL code does?), so the outbound window should still
be the whole 512MB.

If this isn't viable, perhaps you could explain a bit more how to reserve
such a chunk of memory so that the PCI mappings remain the same.

Thanks again

--

Gary Thomas |  Consulting for the
MLB Associates  |Embedded world



Re: Change in PCI behaviour

2010-11-22 Thread Gary Thomas

On 11/21/2010 10:59 AM, Gary Thomas wrote:

On 11/19/2010 02:46 PM, Benjamin Herrenschmidt wrote:

On Fri, 2010-11-19 at 08:42 -0700, Gary Thomas wrote:

In this case, note that PCI device :00:0c.0 is at 0xc000.
This causes problems because it's a truly stupid device that does
not work properly at PCI [relative] address 0x. It simply
does not respond at that address. Pick anywhere else and it will
work fine!


Hrm, we used to have a trick to avoid giving out the first meg of a bus to
avoid that sort of thing, I suppose it got lost. The rest is related to
the way you map your PCI I suppose in your dts. You can switch back to a
1:1 instead of 1:0 mapping I suppose.

One way to achieve the above result would be to, in your platform code,
reserve the mem region that corresponds to PCI 0...1M (c000...+1M)
before the device resources are assigned/allocated.

I thought we had code to do that with the legacy regions somewhere...
oh well, no code at hand to check right now.


Thanks, I found a combo of regions in my DTS that fixed this.

That went well and the system is now running, but it's not stable :-(
It will crash randomly, generally leaving no trace of what went wrong.
I've attached a BDI to it, but mostly all it can tell me is that it's dead.
The one thing that seems to pop up is it looks like it's jumping into
space (aka the wrong place) when doing rfi (this is a guess). I've
seen things like the MSR ends up loaded with an address, or similar
strangeness.

Were there any system level changes during this period (I know it's
some time ago) that might have introduced such an instability? It's
tough to scan through the diffs and get a feeling for any little details
like this.

Any ideas or hints greatly appreciated, thanks



I have a bit more information on this.  I'm pretty sure that the failures
are only happening in my SCSI (SATA actually) code.  My board (8347ea) has
a PCI bus with a SIL SATA controller.  This combo works perfectly in 2.6.28.
In 2.6.32, it will run for a while (possibly quite a while), then timeout
trying to do a large block write - typically 256 blocks.  Once this timeout
happens, the SIL controller is stuck and accesses to it will eventually
cause the whole system to hang (as above).

Was there any major change in how PCI or DMA was handled between 2.6.28
and 2.6.32?  Given the ephemeral nature of these failures (multiple runs
all eventually fail, but never the same twice), my only hope of fixing it
will be to have some ideas what might have changed.

Thanks for any ideas

--

Gary Thomas |  Consulting for the
MLB Associates  |Embedded world



Re: Change in PCI behaviour

2010-11-22 Thread Gabriel Paubert
On Fri, Nov 19, 2010 at 08:42:46AM -0700, Gary Thomas wrote:
 In this case, note that PCI device :00:0c.0 is at 0xc000.
 This causes problems because it's a truly stupid device that does
 not work properly at PCI [relative] address 0x.  It simply
 does not respond at that address.  Pick anywhere else and it will
 work fine!

Yes, but it was once upon a time in the PCI spec that setting
a base register to 0 should disable the corresponding decoder.

I don't know whether this has changed (I actually never had the 
final PCI spec, only drafts). However I once had a device that
actually did not disable base addresses set to zero and this was 
described as a bug in its (numerous) errata. This also caused
a lot of mayhem since in some versions/configurations it used 
up to 64kB of PCI I/O space (especially fun on x86...). 

Gabriel


Re: Change in PCI behaviour

2010-11-22 Thread Benjamin Herrenschmidt
On Mon, 2010-11-22 at 03:01 -0700, Gary Thomas wrote:
 I have a bit more information on this.  I'm pretty sure that the failures
 are only happening in my SCSI (SATA actually) code.  My board (8347ea) has
 a PCI bus with a SIL SATA controller.  This combo works perfectly in 2.6.28.
 In 2.6.32, it will run for a while (possibly quite a while), then timeout
 trying to do a large block write - typically 256 blocks.  Once this timeout
 happens, the SIL controller is stuck and accesses to it will eventually
 cause the whole system to hang (as above).
 
 Was there any major change in how PCI or DMA was handled between 2.6.28
 and 2.6.32?  Given the ephemeral nature of these failures (multiple runs
 all eventually fail, but never the same twice), my only hope of fixing it
 will be to have some ideas what might have changed.

Maybe the changes you did to the PCI outbound windows are now breaking
DMA? Make sure the outbound and inbound don't overlap for example and
that all RAM is reachable for inbound.

Cheers,
Ben.
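
The two checks suggested above can be sketched as a small Python helper. Everything here is an assumption for illustration: the function names are made up, the inbound window is assumed to map onto RAM starting at physical address 0, and the window addresses are example values rather than the board's actual register settings. All addresses are as seen on the PCI bus.

```python
MB = 1 << 20


def ranges_overlap(start_a, size_a, start_b, size_b):
    """True if [start_a, start_a + size_a) and [start_b, start_b + size_b) intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a


def check_windows(ram_size, inbound, outbound_windows):
    """Return a list of problems found with a PCI window layout.

    inbound: (pci_start, size) of the window through which devices reach RAM.
    outbound_windows: list of (pci_start, size) for CPU-to-PCI windows.
    """
    problems = []
    in_start, in_size = inbound
    if in_size < ram_size:
        problems.append("inbound window does not cover all RAM")
    for out_start, out_size in outbound_windows:
        if ranges_overlap(in_start, in_size, out_start, out_size):
            problems.append("outbound window at PCI %#x overlaps inbound" % out_start)
    return problems


# 256MB of RAM exported at PCI 0, outbound memory window 1:1 at 0xC0000000.
print(check_windows(256 * MB, (0x0, 256 * MB), [(0xC0000000, 512 * MB)]))

# Same RAM, but the outbound window is translated down to PCI address 0.
print(check_windows(256 * MB, (0x0, 256 * MB), [(0x0, 512 * MB)]))
```

The second layout is the kind of overlap to look for: a device doing DMA to a low PCI address could hit either RAM or another device's BAR.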




Re: Change in PCI behaviour

2010-11-21 Thread Gary Thomas

On 11/19/2010 02:46 PM, Benjamin Herrenschmidt wrote:

On Fri, 2010-11-19 at 08:42 -0700, Gary Thomas wrote:

In this case, note that PCI device :00:0c.0 is at 0xc000.
This causes problems because it's a truly stupid device that does
not work properly at PCI [relative] address 0x.  It simply
does not respond at that address.  Pick anywhere else and it will
work fine!


Hrm, we used to have a trick to avoid giving out the first meg of a bus to
avoid that sort of thing, I suppose it got lost. The rest is related to
the way you map your PCI I suppose in your dts. You can switch back to a
1:1 instead of 1:0 mapping I suppose.

One way to achieve the above result would be to, in your platform code,
reserve the mem region that corresponds to PCI 0...1M (c000...+1M)
before the device resources are assigned/allocated.

I thought we had code to do that with the legacy regions somewhere...
oh well, no code at hand to check right now.


Thanks, I found a combo of regions in my DTS that fixed this.

That went well and the system is now running, but it's not stable :-(
It will crash randomly, generally leaving no trace of what went wrong.
I've attached a BDI to it, but mostly all it can tell me is that it's dead.
The one thing that seems to pop up is it looks like it's jumping into
space (aka the wrong place) when doing rfi (this is a guess).  I've
seen things like the MSR ends up loaded with an address, or similar
strangeness.

Were there any system level changes during this period (I know it's
some time ago) that might have introduced such an instability?  It's
tough to scan through the diffs and get a feeling for any little details
like this.

Any ideas or hints greatly appreciated, thanks

--

Gary Thomas |  Consulting for the
MLB Associates  |Embedded world



Change in PCI behaviour

2010-11-19 Thread Gary Thomas

I'm upgrading from 2.6.28 to 2.6.32 (yes, I know it's not the latest,
but it's the best I can do at the moment).  There seems to have been
a change in how the PCI bus is scanned/assigned which is causing me
some hardware problems.  My hardware is FSL MPC8347 and the problem
is likely specific to the FSL PCI code.

My system has 256MB RAM, fully mapped into the PCI export window.
There are an additional 2 devices on the PCI bus.  On 2.6.28, I
get this layout:

PCI: Probing PCI hardware
pci :00:00.0: reg 10 32bit mmio: [0x00-0x0f]
pci :00:00.0: reg 18 64bit mmio: [0x00-0xfff]
pci :00:0b.0: reg 10 io port: [0x1000-0x1007]
pci :00:0b.0: reg 14 io port: [0x1008-0x100b]
pci :00:0b.0: reg 18 io port: [0x1010-0x1017]
pci :00:0b.0: reg 1c io port: [0x1018-0x101b]
pci :00:0b.0: reg 20 io port: [0x1020-0x102f]
pci :00:0b.0: reg 24 32bit mmio: [0x10-0x1001ff]
pci :00:0b.0: reg 30 32bit mmio: [0x00-0x07]
pci :00:0b.0: supports D1 D2
pci :00:0c.0: reg 10 32bit mmio: [0x400-0x7ff]
PCI: Cannot allocate resource region 5 of device :00:0b.0, will remap
PCI: Cannot allocate resource region 0 of device :00:0c.0, will remap
bus: 00 index 0 io port: [0x00-0xf]
bus: 00 index 1 mmio: [0xc000-0xdfff]

There are no notices, but the key item is that device :00:0c.0 gets
mapped at PCI address 0xD000.

On 2.6.32, I get this:

PCI: Probing PCI hardware
pci :00:0b.0: reg 10: [io  0x1000-0x1007]
pci :00:0b.0: reg 14: [io  0x1008-0x100b]
pci :00:0b.0: reg 18: [io  0x1010-0x1017]
pci :00:0b.0: reg 1c: [io  0x1018-0x101b]
pci :00:0b.0: reg 20: [io  0x1020-0x102f]
pci :00:0b.0: reg 24: [mem 0x0010-0x001001ff]
pci :00:0b.0: reg 30: [mem 0x-0x0007 pref]
pci :00:0b.0: supports D1 D2
pci :00:0c.0: reg 10: [mem 0x0400-0x07ff]
PCI: Cannot allocate resource region 5 of device :00:0b.0, will remap
PCI: Cannot allocate resource region 0 of device :00:0c.0, will remap
pci :00:0c.0: BAR 0: assigned [mem 0xc000-0xc3ff]
pci :00:0c.0: BAR 0: set to [mem 0xc000-0xc3ff] (PCI address [0xc000-0xc3ff]
pci :00:0b.0: BAR 6: assigned [mem 0xc400-0xc407 pref]
pci :00:0b.0: BAR 5: assigned [mem 0xc408-0xc40801ff]
pci :00:0b.0: BAR 5: set to [mem 0xc408-0xc40801ff] (PCI address [0xc408-0xc40801ff]
pci_bus :00: resource 0 [io  0x-0xf]
pci_bus :00: resource 1 [mem 0xc000-0xdfff]

In this case, note that PCI device :00:0c.0 is at 0xc000.
This causes problems because it's a truly stupid device that does
not work properly at PCI [relative] address 0x.  It simply
does not respond at that address.  Pick anywhere else and it will
work fine!

On 2.6.28, I get this layout:
# ls -l /sys/bus/pci/devices
lrwxrwxrwx 1 root root 0 Jan  1  1970 :00:00.0 -> ../../../devices/pci:00/:00:00.0
lrwxrwxrwx 1 root root 0 Jan  1  1970 :00:0b.0 -> ../../../devices/pci:00/:00:0b.0
lrwxrwxrwx 1 root root 0 Jan  1  1970 :00:0c.0 -> ../../../devices/pci:00/:00:0c.0
# cat /sys/bus/pci/devices/\:00\:0c.0/resource
0xd000 0xd3ff 0x00020200
0x 0x 0x
0x 0x 0x
0x 0x 0x
0x 0x 0x
0x 0x 0x
0x 0x 0x


On 2.6.32, the final layout looks like this:
# ls -l /sys/bus/pci/devices/
lrwxrwxrwx 1 root root 0 Jan  1  1970 :00:0b.0 -> ../../../devices/pci:00/:00:0b.0
lrwxrwxrwx 1 root root 0 Jan  1  1970 :00:0c.0 -> ../../../devices/pci:00/:00:0c.0
# cat /sys/bus/pci/devices/\:00\:0c.0/resource
0xc000 0xc3ff 0x00020200
0x 0x 0x
0x 0x 0x
0x 0x 0x
0x 0x 0x
0x 0x 0x
0x 0x 0x

Bottom line: how can I get this behaviour back (so as to get my
stupid graphics controller working again)??

Thanks for any ideas

--

Gary Thomas |  Consulting for the
MLB Associates  |Embedded world
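
A check like the one done by eye on the sysfs resource files above could be scripted roughly as follows. This is a sketch under assumptions: the helper names are invented, the sample text uses made-up full-width addresses (the values in the archived message are truncated), and the CPU base of the PCI memory window is passed in by hand rather than read from the device tree.

```python
def parse_resource(text):
    """Parse the 'start end flags' hex triples of a sysfs PCI resource file."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start, end, flags = (int(field, 16) for field in fields)
        rows.append((start, end, flags))
    return rows


def bars_at_pci_zero(rows, cpu_base, pci_base=0):
    """Return indexes of populated resources whose PCI bus address is 0.

    cpu_base/pci_base describe the outbound window translation; unused
    resources (flags == 0) are ignored.
    """
    return [
        index
        for index, (start, _end, flags) in enumerate(rows)
        if flags != 0 and start - cpu_base + pci_base == 0
    ]


# Hypothetical file contents for a device whose BAR 0 sits at the window base.
sample = """\
0xc0000000 0xc3ffffff 0x00020200
0x00000000 0x00000000 0x00000000
"""

rows = parse_resource(sample)
print(bars_at_pci_zero(rows, cpu_base=0xC0000000))
```

On a real system the text would come from reading /sys/bus/pci/devices/<device>/resource.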



Re: Change in PCI behaviour

2010-11-19 Thread Benjamin Herrenschmidt
On Fri, 2010-11-19 at 08:42 -0700, Gary Thomas wrote:
 In this case, note that PCI device :00:0c.0 is at 0xc000.
 This causes problems because it's a truly stupid device that does
 not work properly at PCI [relative] address 0x.  It simply
 does not respond at that address.  Pick anywhere else and it will
 work fine! 

Hrm, we used to have a trick to avoid giving out the first meg of a bus to
avoid that sort of thing, I suppose it got lost. The rest is related to
the way you map your PCI I suppose in your dts. You can switch back to a
1:1 instead of 1:0 mapping I suppose.

One way to achieve the above result would be to, in your platform code,
reserve the mem region that corresponds to PCI 0...1M (c000...+1M)
before the device resources are assigned/allocated.

I thought we had code to do that with the legacy regions somewhere...
oh well, no code at hand to check right now.

Cheers,
Ben.
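
The arithmetic behind "the mem region that corresponds to PCI 0...1M" can be sketched in Python. The window tuples below are assumptions standing in for a DTS ranges entry (PCI base, CPU base, size) with example values; the helper name is hypothetical.

```python
def cpu_region_for_pci(pci_start, length, window):
    """Return the CPU (start, end) range that maps to PCI
    [pci_start, pci_start + length), clipped to the window, or None if the
    window does not cover any of it.

    window: (pci_base, cpu_base, size), as in one DTS 'ranges' entry.
    """
    pci_base, cpu_base, size = window
    lo = max(pci_start, pci_base)
    hi = min(pci_start + length, pci_base + size)
    if lo >= hi:
        return None
    offset = cpu_base - pci_base
    return (lo + offset, hi - 1 + offset)


ONE_MB = 1 << 20

# Window translated to PCI 0: the first PCI megabyte sits at the start of
# the CPU window, so that is the range platform code would have to reserve.
translated = (0x00000000, 0xC0000000, 0x20000000)
print(cpu_region_for_pci(0, ONE_MB, translated))

# 1:1 window: PCI 0...1M is not reachable through it, nothing to reserve.
one_to_one = (0xC0000000, 0xC0000000, 0x20000000)
print(cpu_region_for_pci(0, ONE_MB, one_to_one))
```

With the 1:1 layout no BAR can end up at PCI address 0 in the first place, which is why switching the mapping is an alternative to reserving the region.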
