Re: Inbound PCI and Memory Corruption

2013-08-02 Thread Peter LaDow
On Wed, Jul 24, 2013 at 11:13 PM, Peter LaDow  wrote:
> There are other items, such as drivers for our custom hardware modules
> implemented on the FPGA.  Perhaps I'll pull our drivers and run a
> stock kernel.  Maybe a stock 83xx configuration (such as the
> MPC8349E-MITX).  If we have problems even on a stock configuration...

Well, that didn't work either.  Unfortunately, the PCI slot on our
MPC8349E-MITX eval kit doesn't work.  It doesn't matter what card I
plug into that slot; neither U-Boot nor the kernel recognizes anything.

But I did have one thought.  Is it possible that somehow the
configured incoming PCI regions are marked as pre-fetchable, and the
e1000 is prefetching the descriptors?  Then at some later point the
kernel changes the descriptors without the e1000 being aware?

Thanks,
Pete


Re: Inbound PCI and Memory Corruption

2013-07-24 Thread Peter LaDow
On Wed, Jul 24, 2013 at 3:08 PM, Benjamin Herrenschmidt
 wrote:
> No, they resolve to the same thing under the hood. Did you do other
> changes ? Could it be another unrelated kernel bug causing something
> like use-after-free of network buffer or similar oddity unrelated to the
> network driver ?

There are other items, such as drivers for our custom hardware modules
implemented on the FPGA.  Perhaps I'll pull our drivers and run a
stock kernel.  Maybe a stock 83xx configuration (such as the
MPC8349E-MITX).  If we have problems even on a stock configuration...

> Have you tried with different kernel versions ?

Funny you mention it.  I just tried 3.10.2 today and we still get the
same memory corruption.  I was hoping that perhaps something had
changed between 3.0 and 3.10 that might clear up the problem, and then
I could bisect to find where it failed.  But unfortunately, 3.10.2
exhibits the same issue.

So clearly this isn't an issue specific to the kernel version.  That
said, the e1000 driver looks largely unchanged in 3.10, so if the
problem is driver related it would still be there.

Thanks,
Pete


Re: Inbound PCI and Memory Corruption

2013-07-24 Thread Benjamin Herrenschmidt
On Wed, 2013-07-24 at 08:39 -0700, Peter LaDow wrote:
> A bit of history that may help.  We were using an e100 (an 82559)
> part, but Intel EOL'd that part so we picked up the 82540EP (which
> they have also recently EOL'd).  The e100 driver uses a different DMA
> model.  It uses pci_map_single/pci_unmap_single along with
> pci_dma_sync_single_for* calls (as well as other PCI calls).  The
> e1000 driver, however, does not use the pci_* calls.  We have never
> had a problem with the e100 parts.  I don't suppose the use of
> pci_map_* vs dma_map_* makes a difference does it?

No, they resolve to the same thing under the hood. Did you do other
changes ? Could it be another unrelated kernel bug causing something
like use-after-free of network buffer or similar oddity unrelated to the
network driver ?
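
(For reference, the pci_* DMA helpers of that era were thin shims over the
dma_* API -- roughly what follows, paraphrased from memory from
include/asm-generic/pci-dma-compat.h, so check your exact tree:)

/* pci_map_single()/pci_unmap_single() just forward to the dma_* API on the
 * underlying struct device, so switching APIs should not change behaviour. */
static inline dma_addr_t
pci_map_single(struct pci_dev *hwdev, void *ptr, size_t size, int direction)
{
        return dma_map_single(hwdev == NULL ? NULL : &hwdev->dev,
                              ptr, size, (enum dma_data_direction)direction);
}

static inline void
pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, size_t size,
                 int direction)
{
        dma_unmap_single(hwdev == NULL ? NULL : &hwdev->dev,
                         dma_addr, size, (enum dma_data_direction)direction);
}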

Have you tried with different kernel versions ?

Cheers,
Ben.




Re: Inbound PCI and Memory Corruption

2013-07-24 Thread Peter LaDow
On Tue, Jul 23, 2013 at 9:27 PM, Benjamin Herrenschmidt
 wrote:
> CONFIG_NOT_COHERENT_CACHE will do it for you (in
> arch/powerpc/kernel/dma.c) provided the driver does the right things vs.
> the DMA accessors but afaik e1000 does.

Well, when I went to make the changes I noted a few things.  First,
the e1000 driver does a dma_unmap_single() prior to processing the
descriptor.  So it would seem that the dma_sync_single_for_cpu() isn't
necessary in that case.  And when allocating descriptors, it does
dma_map_single() after setting up the descriptor, so
dma_sync_single_for_device() probably isn't necessary either.
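
(A stripped-down sketch of that ordering -- illustrative names only, not the
actual e1000 code; the point is that the buffer is mapped after the CPU last
writes it and unmapped before the CPU reads what the device wrote:)

#include <linux/dma-mapping.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct rx_buf {
        struct sk_buff *skb;
        dma_addr_t dma;
        unsigned int len;
};

static void rx_give_to_hw(struct device *dev, struct rx_buf *b)
{
        /* CPU is done touching the buffer; map it and hand it to the NIC */
        b->dma = dma_map_single(dev, b->skb->data, b->len, DMA_FROM_DEVICE);
        /* ... the hardware descriptor is written with b->dma here ... */
}

static void rx_take_from_hw(struct device *dev, struct rx_buf *b)
{
        /* descriptor says the NIC is done; give the buffer back to the CPU */
        dma_unmap_single(dev, b->dma, b->len, DMA_FROM_DEVICE);
        /* only now does the CPU parse the received frame */
        netif_receive_skb(b->skb);
}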

But regardless, I put in the dma_sync_single_* calls and we still get
the same behavior.  So, even with CONFIG_NOT_COHERENT_CACHE we are
getting this error.

> If that helps, that might hint at either a missing barrier or some kind
> of HW (or HW configuration) bug with cache coherency.

And unfortunately it didn't help.  We have a few other things we are
trying, but I'm not hopeful that any will change the behavior.

A bit of history that may help.  We were using an e100 (an 82559)
part, but Intel EOL'd that part so we picked up the 82540EP (which
they have also recently EOL'd).  The e100 driver uses a different DMA
model.  It uses pci_map_single/pci_unmap_single along with
pci_dma_sync_single_for* calls (as well as other PCI calls).  The
e1000 driver, however, does not use the pci_* calls.  We have never
had a problem with the e100 parts.  I don't suppose the use of
pci_map_* vs dma_map_* makes a difference does it?

Thanks,
Pete


RE: Inbound PCI and Memory Corruption

2013-07-24 Thread David Laight
> On Fri, Jul 19, 2013 at 6:46 AM, Gerhard Sittig  wrote:
> > So:  No, not having to fiddle with DMA stuff when doing PCI need
> > not be a problem, it's actually expected.  But since a DMA engine
> > might be involved (that's just not under your command), the
> > accompanying problems may arise.  You may need to flush CPU
> > provided data upon write before telling an external entity to
> > access it, and may need to invalidate caches (to have data
> > re-fetched) before the CPU accesses what an external entity did
> > manipulate.  And this applies to both payload data as well as
> > management data (descriptors) if the latter apply to the former.
> 
> This is something I've been exploring today.  But what is unclear is
> _how_ to flush/invalidate the caches.  I was going to tweak the
> driver to setup the descriptors, flush the cache, then enable the
> hardware (and when taking the device down, disable the hardware, flush
> the cache, then deallocate the descriptors).  But this is in the
> network code and it isn't obvious how to make this happen.

FWIW it is almost impossible to code for non-coherent descriptors
(even ignoring problems with speculative cache line reads).
You don't even want to try to do it except for hardware where you
have no choice.

The problem is that you have no control over the device writes
into the descriptors. In order not to lose the device writes
the cpu must not write to any cache lines that contain active
descriptors.

For the receive side this can be arranged by initialising cache
line sized blocks of descriptors (if the cache line write isn't
atomic you still have problems).
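
(A hedged sketch of that receive-side arrangement, assuming a 32-byte cache
line and 16-byte descriptors; init_rx_desc() is a hypothetical helper, and
the descriptor ring is assumed to be a streaming DMA mapping:)

#include <linux/dma-mapping.h>

#define DESC_BYTES      16      /* assumed descriptor size */
#define LINE_BYTES      32      /* assumed e300 cache line size */
#define DESC_PER_LINE   (LINE_BYTES / DESC_BYTES)

static void init_rx_desc(void *desc);   /* hypothetical: fill one descriptor */

/* (Re)initialise RX descriptors a whole cache line at a time, then push that
 * line out in one go, so the CPU never dirties a line the device may be
 * updating concurrently. */
static void rx_init_desc_line(struct device *dev, void *ring,
                              dma_addr_t ring_dma, unsigned int first)
{
        unsigned int i;

        for (i = 0; i < DESC_PER_LINE; i++)
                init_rx_desc(ring + (first + i) * DESC_BYTES);

        dma_sync_single_range_for_device(dev, ring_dma, first * DESC_BYTES,
                                         LINE_BYTES, DMA_BIDIRECTIONAL);
}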

The send side is much more tricky: you either have to setup a
full cache line of descriptors or wait until the transmit is idle.

David




Re: Inbound PCI and Memory Corruption

2013-07-23 Thread Benjamin Herrenschmidt
On Tue, 2013-07-23 at 21:22 -0700, Peter LaDow wrote:
> On Fri, Jul 19, 2013 at 6:46 AM, Gerhard Sittig  wrote:
> > So:  No, not having to fiddle with DMA stuff when doing PCI need
> > not be a problem, it's actually expected.  But since a DMA engine
> > might be involved (that's just not under your command), the
> > accompanying problems may arise.  You may need to flush CPU
> > provided data upon write before telling an external entity to
> > access it, and may need to invalidate caches (to have data
> > re-fetched) before the CPU accesses what an external entity did
> > manipulate.  And this applies to both payload data as well as
> > management data (descriptors) if the latter apply to the former.
> 
> This is something I've been exploring today.  But what is unclear is
> _how_ to flush/invalidate the caches.  I was going to tweak the
> driver to setup the descriptors, flush the cache, then enable the
> hardware (and when taking the device down, disable the hardware, flush
> the cache, then deallocate the descriptors).  But this is in the
> network code and it isn't obvious how to make this happen.

CONFIG_NOT_COHERENT_CACHE will do it for you (in
arch/powerpc/kernel/dma.c) provided the driver does the right things vs.
the DMA accessors but afaik e1000 does.
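
(Concretely -- a paraphrased sketch, not the verbatim source: with
CONFIG_NOT_COHERENT_CACHE the powerpc map/sync hooks end up calling
__dma_sync() in the non-coherent DMA support code, which does the flush or
invalidate on the driver's behalf, so a driver that uses the DMA API
correctly needs nothing extra:)

#include <linux/dma-mapping.h>

static void example_sync_for_device(void *vaddr, size_t size,
                                    enum dma_data_direction dir)
{
#ifdef CONFIG_NOT_COHERENT_CACHE
        /* __dma_sync() picks flush vs. invalidate based on the direction */
        __dma_sync(vaddr, size, dir);
#endif
}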

The problem with that is we never "officially" supported that option of
non-coherent cache (non-coherent DMA) on any of the "S" processors
(including 603 aka e300) because first they are supposed to be used in
coherent fabrics, but also because the code somewhat assumes that your
CPU won't suddenly prefetch stuff back into the cache at any time.

The 603 does some amount of speculative prefetch, so it potentially might
pollute the cache.

But it's still worth trying out.

If that helps, that might hint at either a missing barrier or some kind
of HW (or HW configuration) bug with cache coherency.

> I think I figured something out.  Basically, in the receive interrupt,
> prior to reading the data in the descriptor, I call
> dma_sync_single_for_cpu().  Then the driver can continue to process
> the data, then unmap the DMA region (with dma_unmap_single() ).  When
> setting up the descriptors, after calling dma_map_single(),
> configuring the descriptor, I then call dma_sync_single_for_device().
> Does this look correct?

Yes.

> However, on the PPC platforms, these calls (dma_sync_*) are NOPs
> unless CONFIG_NOT_COHERENT_CACHE is defined (which it doesn't appear
> to be for the 8349).  So I tweaked the Kconfig to enable
> CONFIG_NOT_COHERENT_CACHE.  Things built ok, but I'm not sure if this is
> sufficient to invoke the cache flush necessary.
> 
> Am I on the right track?

Well, they are supposed to be nops ... that's the thing. Because afaik,
anything built on a 603 core is *supposed* to be coherent (though those
NOPs should at least be memory barriers imho).

In any case, let us know if that helps.

Cheers,
Ben.

> Thanks,
> Pete




Re: Inbound PCI and Memory Corruption

2013-07-23 Thread Peter LaDow
On Fri, Jul 19, 2013 at 6:46 AM, Gerhard Sittig  wrote:
> So:  No, not having to fiddle with DMA stuff when doing PCI need
> not be a problem, it's actually expected.  But since a DMA engine
> might be involved (that's just not under your command), the
> accompanying problems may arise.  You may need to flush CPU
> provided data upon write before telling an external entity to
> access it, and may need to invalidate caches (to have data
> re-fetched) before the CPU accesses what an external entity did
> manipulate.  And this applies to both payload data as well as
> management data (descriptors) if the latter apply to the former.

This is something I've been exploring today.  But what is unclear is
_how_ to flush/invalidate the caches.  I was going to tweak the
driver to setup the descriptors, flush the cache, then enable the
hardware (and when taking the device down, disable the hardware, flush
the cache, then deallocate the descriptors).  But this is in the
network code and it isn't obvious how to make this happen.

I think I figured something out.  Basically, in the receive interrupt,
prior to reading the data in the descriptor, I call
dma_sync_single_for_cpu().  Then the driver can continue to process
the data, then unmap the DMA region (with dma_unmap_single() ).  When
setting up the descriptors, after calling dma_map_single(),
configuring the descriptor, I then call dma_sync_single_for_device().
Does this look correct?
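
(A minimal sketch of those sync points; process_frame() and the descriptor
handling are placeholders, not driver code:)

#include <linux/dma-mapping.h>

static void process_frame(void *buf, unsigned int len);  /* hypothetical */

/* receive interrupt: hand the buffer back to the CPU before reading it */
static void rx_irq_path(struct device *dev, dma_addr_t dma, void *buf,
                        unsigned int len)
{
        dma_sync_single_for_cpu(dev, dma, len, DMA_FROM_DEVICE);
        process_frame(buf, len);
        dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);
}

/* descriptor setup: map, fill in the descriptor, then hand ownership back
 * to the device */
static dma_addr_t rx_setup_path(struct device *dev, void *buf,
                                unsigned int len)
{
        dma_addr_t dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

        /* ... write dma into the ring descriptor here ... */

        dma_sync_single_for_device(dev, dma, len, DMA_FROM_DEVICE);
        return dma;
}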

However, on the PPC platforms, these calls (dma_sync_*) are NOPs
unless CONFIG_NOT_COHERENT_CACHE is defined (which it doesn't appear
to be for the 8349).  So I tweaked the Kconfig to enable
CONFIG_NOT_COHERENT_CACHE.  Things built ok, but I'm not sure if this is
sufficient to invoke the cache flush necessary.

Am I on the right track?

Thanks,
Pete


Re: Inbound PCI and Memory Corruption

2013-07-19 Thread Timur Tabi
On Thu, Jul 18, 2013 at 4:30 PM, Peter LaDow  wrote:
>
> It does seem that for incoming PCI transactions the Freescale DMA
> engine is not used.  And in our device tree we have the DMA engine
> commented out.  That is, the "fsl,mpc8349-dma" and "fsl,elo-dma"
> compatible items are not present in the FDT.


This is the standard on-chip DMA engine used (primarily) as an
off-loaded memcpy.  I've never seen it used for anything related to
PCI.  You can remove the DMA nodes from the device tree and see if
that fixes anything.  If it does, then it might be the DMA offload
from the network layer that's causing the problems.


Re: Inbound PCI and Memory Corruption

2013-07-19 Thread Scott Wood

On 07/18/2013 05:02:33 PM, Benjamin Herrenschmidt wrote:

On Thu, 2013-07-18 at 14:30 -0700, Peter LaDow wrote:
> We are still stumped on this one, but during a review of the system
> setup one thing came up that we aren't sure about is the device tree
> and the DMA engine.
>
> It does seem that for incoming PCI transactions the Freescale DMA
> engine is not used.  And in our device tree we have the DMA engine
> commented out.  That is, the "fsl,mpc8349-dma" and "fsl,elo-dma"
> compatible items are not present in the FDT.
>
> I don't suppose this could be a problem?

I doubt it but somebody from FSL might be able to give a better answer.


The DMA engine is not related to inbound PCI transactions.

-Scott


Re: Inbound PCI and Memory Corruption

2013-07-19 Thread Gerhard Sittig
On Thu, Jul 18, 2013 at 14:30 -0700, Peter LaDow wrote:
> 
> We are still stumped on this one, but during a review of the system
> setup one thing came up that we aren't sure about is the device tree
> and the DMA engine.
> 
> It does seem that for incoming PCI transactions the Freescale DMA
> engine is not used.  And in our device tree we have the DMA engine
> commented out.  That is, the "fsl,mpc8349-dma" and "fsl,elo-dma"
> compatible items are not present in the FDT.
> 
> I don't suppose this could be a problem?

Can't tell whether it helps or whether I'm telling you what's
already known, but here we go:

Some Freescale SoCs don't have "_the_ DMA", but instead several
of them.  Many peripherals have a DMA engine of their own, which
you won't notice as a separate entity from the software POV
(typically ethernet, USB, PCI, partially video in and out, even
coprocessors may have dedicated DMA engines which they might take
care of themselves).

The thing that you do see (in the device tree, as a software
controllable entity) is the "general purpose DMA" with user
serviceable channels.  This one may be used for serial
communication via UART or SPI, or SDHC/MMC, or peripherals
attached to the EMB.  Sometimes it's called "DMA2" to reflect
that there are others as well.

So:  No, not having to fiddle with DMA stuff when doing PCI need
not be a problem, it's actually expected.  But since a DMA engine
might be involved (that's just not under your command), the
accompanying problems may arise.  You may need to flush CPU
provided data upon write before telling an external entity to
access it, and may need to invalidate caches (to have data
re-fetched) before the CPU accesses what an external entity did
manipulate.  And this applies to both payload data as well as
management data (descriptors) if the latter apply to the former.
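
(In other words -- a sketch using the raw powerpc cache helpers; a portable
driver would normally let dma_sync_single_for_{device,cpu}() do this, since
on a non-coherent build they boil down to the same flush/invalidate:)

#include <asm/cacheflush.h>

/* CPU produced data that an external master is about to read:
 * push the dirty cache lines out to memory first. */
static void flush_for_device(void *buf, unsigned long len)
{
        flush_dcache_range((unsigned long)buf, (unsigned long)buf + len);
}

/* An external master wrote the data: discard any stale cached copy
 * before the CPU reads it. */
static void invalidate_for_cpu(void *buf, unsigned long len)
{
        invalidate_dcache_range((unsigned long)buf, (unsigned long)buf + len);
}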

HTH


virtually yours
Gerhard Sittig
-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr. 5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-0 Fax: +49-8142-66989-80  Email: off...@denx.de


Re: Inbound PCI and Memory Corruption

2013-07-18 Thread Benjamin Herrenschmidt
On Thu, 2013-07-18 at 14:30 -0700, Peter LaDow wrote:
> We are still stumped on this one, but during a review of the system
> setup one thing came up that we aren't sure about is the device tree
> and the DMA engine.
> 
> It does seem that for incoming PCI transactions the Freescale DMA
> engine is not used.  And in our device tree we have the DMA engine
> commented out.  That is, the "fsl,mpc8349-dma" and "fsl,elo-dma"
> compatible items are not present in the FDT.
> 
> I don't suppose this could be a problem?

I doubt it but somebody from FSL might be able to give a better answer.

I'm personally at a loss. It looks like you are doing everything
right from what I can tell.

That leaves us with some kind of oddball driver bug or a problem
with the low level configuration of the PCIe bridge or the chip
internal bus related to cache coherency maybe.

Ben.




Re: Inbound PCI and Memory Corruption

2013-07-18 Thread Peter LaDow
We are still stumped on this one, but during a review of the system
setup one thing came up that we aren't sure about is the device tree
and the DMA engine.

It does seem that for incoming PCI transactions the Freescale DMA
engine is not used.  And in our device tree we have the DMA engine
commented out.  That is, the "fsl,mpc8349-dma" and "fsl,elo-dma"
compatible items are not present in the FDT.

I don't suppose this could be a problem?

Thanks,
Pete


Re: Inbound PCI and Memory Corruption

2013-07-11 Thread Peter LaDow
On Wed, Jul 10, 2013 at 2:40 PM, Benjamin Herrenschmidt
 wrote:
> Did you get any traces that show the flow that happens around a case of
> corruption ?

Well, I captured a lot of data, both logging kernel output and
capturing PCI traffic.  I've put the full console log output on
pastebin at http://pastebin.com/ZFYbneNR

The initial corruption is at starting address 0xe94f17f8. Looking at
the dumped data:

Slab corruption: fib6_nodes start=e94f17f8, len=32
Redzone: 0x9f911029d74e35b/0xd4bed90f1c6f0806.
Last user: [<06040001>](0x6040001)
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff ff ff
Prev obj: start=e94f17c0, len=32
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<  (null)>](0x0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5
Next obj: start=e94f1830, len=32
Redzone: 0xd4bed90f1c6f0aca/0xafba11029d74e35b.
Last user: [<  (null)>](0x0)
000: 0d 5b 00 00 00 00 00 00 0a ca 0d 01 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 bd 3e

The first corrupted byte is at address 0xe94f1802.  Looking at the
dump of all the DMA mappings, this range is never mapped.  Nor is there
a single PCI write to this address.  However, I did find
some correlation with a PCI write to a nearby address.  From the PCI
capture:

Command | Address  |  Data | /BE
Mem Wr  | 294F1810 |   |
|  |   | 0011
|  |   | 
|  | 0FD9BED4  | 
|  | 06086F1C  | 
|  | 00080100  | 
|  | 01000406  | 
|  | 0FD9BED4  | 
|  | CA0A6F1C  | 
|  | 5B0D  | 
|  |   | 
|  | 010DCA0A  | 
|  |   | 
|  |   | 
|  |   | 
|  |   | 
|  | 3EBD  | 1100

The data in this write looks very much like the pattern in the
detected slab corruption.  Looking at the PCI trace, it doesn't appear
to be the incoming PCI data (unless the PCI Inbound Address
Translation registers are misconfigured).  Yet clearly these are
corrupted with ethernet traffic.

Thanks,
Pete


Re: Inbound PCI and Memory Corruption

2013-07-10 Thread Peter LaDow
On Wed, Jul 10, 2013 at 2:40 PM, Benjamin Herrenschmidt
 wrote:
> Well, it should work, 

I tried forcing NET_IP_ALIGN to 0, and I did see the DMA accesses
align on 32-bit boundaries with all the byte enables set.  However,
the memory corruption still occurred.
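
(One way to force that for a local test build -- a debugging hack only, since
NET_IP_ALIGN exists to keep the IP header aligned for the stack; this just
shadows the generic definition within the driver's compilation unit:)

#include <linux/skbuff.h>

/* Debug hack: make RX buffers start on a word boundary so the NIC issues
 * fully aligned 32-bit writes.  Not a fix. */
#undef NET_IP_ALIGN
#define NET_IP_ALIGN 0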

> but it's possible that there is some subtle bug on this specific Freescale 
> SoC

I looked through the Freescale errata
(http://www.freescale.com/files/32bit/doc/errata/MPC8349ECE.pdf) and
only 2 seem relevant:  PCI19 and DMA2 (the rest are fixed in
our core version 3.1).

PCI19:  When using a dual-address cycle for inbound write accesses
when the IOS is full, the PCI controller overwrites the address for the IOS with
the new address from the bus.

DMA2: There can be corruption of the DMA data.  Examples are when DAHTS
is 8 bytes and the source port is a 32-bit PCI bus, or when the source
memory space is on the PCI bus and is not prefetchable.

I don't think PCI19 applies since no dual-address cycles are
generated.  From what I've seen, all the DMA addresses in the RX ring
descriptors are in the lower 32-bit address space.

I don't think DMA2 applies because it is for the DMA controller
specific to the 8349.  And since these transactions are not setup or
managed by the DMA controller...  At least I don't think they are
(unless dma_alloc and dma_map_single do something related to this).
My understanding is that in this case the PCI inbound registers are
configured and the DMA controller is not used.

> ...Did you correlate the corruption with one such packet ?
>
> Did you get any traces that show the flow that happens around a case of
> corruption ?

Not yet.  I'm having a difficult time syncing the PCI trace with the
kernel debug output.  And since the corruption may be detected well
after the actual corruption occurs, determining which DMA transfer
caused the corruption is difficult.

I'm still trying to gather more information.

Thanks,
Pete


Re: Inbound PCI and Memory Corruption

2013-07-10 Thread Benjamin Herrenschmidt
On Wed, 2013-07-10 at 14:06 -0700, Peter LaDow wrote:
> I have a bit more information, but I'm not sure of the impact.  So far
> I have been dump lots of debugging output trying to determine where
> this memory corruption could be coming from.  I've sprinkled the
> driver with wmb() (near every DMA function and the hardware IO), loads
> of printk's to get the DMA addresses, and lots and lots of PCI traces.
>
> One thing that I noticed is that the addresses programmed into the
> descriptor ring for the E1000 are not 32-bit aligned.  The E1000 part
> is aligning the transfers and uses the BEs to mask off bytes.  Is
> there an issue with the PPC (notably the MPC8349) with incoming PCI
> transactions that are 32-bit word aligned but write less than a full
> word?

Well, it should work, but it's possible that there is some subtle bug on
this specific Freescale SoC ... Did you correlate the corruption with
one such packet ?

Did you get any traces that show the flow that happens around a case of
corruption ?

Ben.

> In looking at the PCI trace, all the DMA's of packets from the E1000
> start at a 32-bit aligned address, but the first and last words are
> not full word writes.  For example (probably need a fixed font to
> view):
> 
> Command | Address  |  Data | /BE
> Mem Wr  | 2950D180 |   |
>    | 0011
>    | 
>  DBA24DF0  | 
>  00085F19  | 
>  2424  | 
>  C530  | 
>  80D81180  | 
>  F10DCA0A  | 
>  FF0DCA0A  | 
>  CF06CC06  | 
>  A1BA1000  | 
>  01400BC5  | 
>  F1001000  | 
>    | 
>    | 
>  6873  | 
>  0F22  | 1100
> 
> Note that the first word is only a 16-bit transfer (in the upper half)
> and the last is only 16-bits (in the lower half).  And I dumped the
> descriptors and here's what is read (via DMA):
> 
> Command | Address  |  Data | /BE
> Mem Rd  | 2A2A72F0 |   |
>  2950D812  | 
>    | 
>  C8C70040  | 
>    | 
> 
> Note that the descriptor programmed into the part has a DMA address
> that is not word aligned.  And the E1000 part sets the proper byte
> enables and does a write to the aligned address of 0x2850D180.
> 
> Is there any traction on this idea?
> 
> Thanks,
> Pete




Re: Inbound PCI and Memory Corruption

2013-07-10 Thread Peter LaDow
I have a bit more information, but I'm not sure of the impact.  So far
I have been dumping lots of debugging output trying to determine where
this memory corruption could be coming from.  I've sprinkled the
driver with wmb() (near every DMA function and the hardware IO), loads
of printk's to get the DMA addresses, and lots and lots of PCI traces.

One thing that I noticed is that the addresses programmed into the
descriptor ring for the E1000 are not 32-bit aligned.  The E1000 part
is aligning the transfers and uses the byte enables (BEs) to mask off bytes.  Is
there an issue with the PPC (notably the MPC8349) with incoming PCI
transactions that are 32-bit word aligned but write less than a full
word?

In looking at the PCI trace, all the DMA's of packets from the E1000
start at a 32-bit aligned address, but the first and last words are
not full word writes.  For example (probably need a fixed font to
view):

Command | Address  |  Data | /BE
Mem Wr  | 2950D180 |   |
   | 0011
   | 
 DBA24DF0  | 
 00085F19  | 
 2424  | 
 C530  | 
 80D81180  | 
 F10DCA0A  | 
 FF0DCA0A  | 
 CF06CC06  | 
 A1BA1000  | 
 01400BC5  | 
 F1001000  | 
   | 
   | 
 6873  | 
 0F22  | 1100

Note that the first word is only a 16-bit transfer (in the upper half)
and the last is only 16-bits (in the lower half).  And I dumped the
descriptors and here's what is read (via DMA):

Command | Address  |  Data | /BE
Mem Rd  | 2A2A72F0 |   |
 2950D812  | 
   | 
 C8C70040  | 
   | 

Note that the descriptor programmed into the part has a DMA address
that is not word aligned.  And the E1000 part sets the proper byte
enables and does a write to the aligned address of 0x2850D180.

Is there any traction on this idea?

Thanks,
Pete


Re: Inbound PCI and Memory Corruption

2013-06-25 Thread Peter LaDow
On Sat, Jun 22, 2013 at 5:00 PM, Benjamin Herrenschmidt
 wrote:
> Afaik e300 is slightly out of order, maybe it's missing a memory barrier
> somewhere ... One thing to try is to add some to the dma_map/unmap ops.

I went through the driver and added memory barriers to the
dma_map_page/dma_unmap_page and dma_alloc_coherent/dma_free_coherent
calls (wmb() calls after each, which resolves to a sync instruction).
I still get a kernel panic.
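
(For comparison, the barrier placement that usually matters most is between
the descriptor writes and the tail/doorbell write -- sketched here with
made-up structure and register names, not the driver's code; wmb() comes
from the arch barrier header:)

#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/types.h>

struct my_rx_desc {
        __le64 buffer_addr;
        __le32 status;
};

struct my_ring {
        struct my_rx_desc *desc;        /* ring in DMA-visible memory */
        void __iomem *tail_reg;         /* NIC tail/doorbell register */
};

static void post_rx_desc(struct my_ring *ring, unsigned int i, dma_addr_t dma)
{
        ring->desc[i].buffer_addr = cpu_to_le64(dma);
        ring->desc[i].status = 0;

        wmb();  /* descriptor contents must be visible before the tail moves */

        writel(i, ring->tail_reg);      /* NIC may start DMA right after this */
}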

I did turn on DEBUG_PAGE_ALLOC to try and get more information, but
I'm not finding anything new.  However, with the SLAB debugging I do
find SLAB corruption, e.g.:

Slab corruption: fib6_nodes start=e900c7f8, len=32
Redzone: 0x9f911029d74e35b/0x30a706a6050806.
Last user: [<06040001>](0x6040001)
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff ff ff
Prev obj: start=e900c7c0, len=32
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<  (null)>](0x0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5
Next obj: start=e900c830, len=32
Redzone: 0x30a706a6050aca/0xc8be11029d74e35b.
Last user: [<  (null)>](0x0)
000: 0d aa 00 00 00 00 00 00 0a ca 0d 49 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 75 8b

Which is clearly corrupted with ethernet frames.  The only interface
connected is the e1000.  Eventually this corruption leads to a kernel
panic.

I'm completely confused on how this could happen. Given the M bit is
set for all pages (see below), and with memory barriers on the DMA
map/unmap and register operations, the only thing I can think of is
something in the IO sequencer (which was suggested in the link I gave
earlier).  Yet the patch mentioned is in place.

> Also audit the driver to ensure that it properly uses barriers when
> populating descriptors (and maybe compare to a more recent version of
> the driver upstream).

I've gone through the driver and didn't see anything missing.  And the
upstream (v3.10-rc5) driver is the same version (7.3.21-k8-NAPI).  And
I've used the latest from the e1000 release (8.0.35-NAPI), and I get
the same problem.

On Sun, Jun 23, 2013 at 6:16 PM, Benjamin Herrenschmidt
 wrote:
> Also dbl check that the MMU is indeed mapping all these pages with the
> "M" bit.

The DBATs have the M bit set (both have 0x12 in the DBATxL
registers)...sometimes.  Usually when I halt the CPU and dump the
BATs, all the IBATs and DBATs have zeros.  But occasionally I see
DBAT2 and DBAT3 with values and the M bit set.

I also dumped all the TLB entries, and every one of them has the M bit
set (see below).

TLB dump:

BDI>dtlb 0 63
IDX  V RC VSID   VPIRPN  WIMG PP
  0: V 0C 000eee_e9a -> 2e9a --M- 00
  1: V 0C 000eee_f401000 -> 2f401000 --M- 00
  2: V 1C 000ccc_0502000 -> 00502000 --M- 00
  3: V 0C 000eee_f403000 -> 2f403000 --M- 00
  4: V 0C 000eee_c124000 -> 2c124000 --M- 00
  5: V 0C 000eee_f405000 -> 2f405000 --M- 00
  6: V 0C 000eee_e9e6000 -> 2e9e6000 --M- 00
  7: V 0C 33afd1_0427000 -> 005f8000 --M- 10
  8: V 0C 33afd1_0428000 -> 2ff63000 --M- 10
  9: V 0C 000ccc_0349000 -> 00349000 --M- 00
 10: V 1C 000ccc_03ca000 -> 003ca000 --M- 00
 11: V 1C 000ccc_03cb000 -> 003cb000 --M- 00
 12: V 0C 33afd1_040c000 -> 003b4000 --M- 11
 13: V 0C 000eee_f40d000 -> 2f40d000 --M- 00
 14: V 1C 000eee_fa8e000 -> 2fa8e000 --M- 00
 15: V 0- 33afd1_034f000 -> 2e6b1000 --M- 11
 16: V 0C 000eee_f47 -> 2f47 --M- 00
 17: V 0C 33afd1_0411000 -> 2fe54000 --M- 10
 18: V 0C 000eee_f4b2000 -> 2f4b2000 --M- 00
 19: V 1C 33eb14_8073000 -> 00462000 --M- 10
 20: V 0C 000ccc_02f4000 -> 002f4000 --M- 00
 21: V 0C 000eee_f415000 -> 2f415000 --M- 00
 22: V 1C 000ccc_03f6000 -> 003f6000 --M- 00
 23: V 0C 000ccc_02f7000 -> 002f7000 --M- 00
 24: V 1C 000ccc_03f8000 -> 003f8000 --M- 00
 25: V 0C 000ccc_03d9000 -> 003d9000 --M- 00
 26: V 1C 33b304_a31a000 -> 007f4000 --M- 10
 27: V 1C 000ccc_03fb000 -> 003fb000 --M- 00
 28: V 1C 000ccc_03fc000 -> 003fc000 --M- 00
 29: V 0C 000eee_f41d000 -> 2f41d000 --M- 00
 30: V 1C 000eee_e87e000 -> 2e87e000 --M- 00
 31: V 1C 33afd1_045f000 -> 2fe52000 --M- 10
 32: V 0C 000ccc_000 ->  --M- 00
 33: V 0C 000eee_e9a1000 -> 2e9a1000 --M- 00
 34: V 1C 33b304_8022000 -> 00f44000 --M- 10
 35: V 0C 000ccc_0503000 -> 00503000 --M- 00
 36: V 0C 33afd1_0744000 -> 2fe17000 --M- 10
 37: V 0C 000eee_c125000 -> 2c125000 --M- 00
 38: V 0C 33e7e1_0406000 -> 0078e000 --M- 11
 39: V 0C 000eee_e987000 -> 2e987000 --M- 00
 40: V 0C 000ccc_0008000 -> 8000 --M- 00
 41: V 0C 000ccc_03c9000 -> 003c9000 --M- 00
 42: V 1C 33ba7b_f8ea000 -> 005f9000 --M- 10
 43: V 1C 33afd1_040b000 -> 2ffe --M- 11
 44: V 0C 000ccc_03cc000 -> 003cc000 --M- 00
 45: V 0C 000eee_b68d000 -> 2b68d000 --M- 00
 46: V 1C 000eee_f40e000 -> 2f40e000 --M- 00
 47: V 0C 000eee_fa8f000 -> 2fa8f000 --M- 00
 48: V 0C 33afd1_041 -> 2fe4a000 --M- 10
 49: V 0C 000eee_f471000 -> 2f471000 --M- 00
 50: V 0C 000ccc_03f2000 -> 003f2000 --M- 00
 51: V 1C 000eee_f473000 -> 2f473000 --M- 00
 52: V 0C 000ccc_03f

Re: Inbound PCI and Memory Corruption

2013-06-23 Thread Benjamin Herrenschmidt
On Sun, 2013-06-23 at 20:47 -0700, Peter LaDow wrote:
> > 
> > 
> > On Jun 23, 2013, at 6:16 PM, Benjamin Herrenschmidt 
> >  wrote:
> >> Also dbl check that the MMU is indeed mapping all these pages with the
> >> "M" bit.
> > 
> > Just to be clear, do you mean the e1000 registers in PCI space? Or the RAM 
> > pages?

The RAM pages.

Cheers,
Ben.




Re: Inbound PCI and Memory Corruption

2013-06-23 Thread Peter LaDow
> 
> 
> On Jun 23, 2013, at 6:16 PM, Benjamin Herrenschmidt 
>  wrote:
>> Also dbl check that the MMU is indeed mapping all these pages with the
>> "M" bit.
> 
> Just to be clear, do you mean the e1000 registers in PCI space? Or the RAM 
> pages?
> 
> Thanks,
> Pete


Re: Inbound PCI and Memory Corruption

2013-06-23 Thread Benjamin Herrenschmidt
On Sun, 2013-06-23 at 17:56 -0700, Peter LaDow wrote:
> 
> On Jun 22, 2013, at 5:00 PM, Benjamin Herrenschmidt
>  wrote:
> 
> > On Fri, 2013-06-21 at 10:14 -0700, Peter LaDow wrote:
> >> 
> > Afaik e300 is slightly out of order, maybe it's missing a memory barrier
> > somewhere ... One thing to try is to add some to the dma_map/unmap ops.
> >
> > Also audit the driver to ensure that it properly uses barriers when
> > populating descriptors (and maybe compare to a more recent version of
> > the driver upstream).
> Thanks for the tips.
> 
> I've been working with the folks at Intel on the e1000-dev list, and
> they did add memory barriers. And I've tried the latest e1000 drivers
> (direct from the e1000 tree) with no luck.
> 
> I've done PCI traces, and there is no DMA after the disable is written
> to the e1000 part. All I can think is that there may be posted writes,
> the kernel goes on to cleanup the DMA buffers. But there are write
> memory barriers, so I don't see how this is possible.
> 
> Are the memory barriers meaningful in single processor builds?

Yes. However they have no effect on posted writes by the chip. You
need to do an MMIO read for these to take effect.
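
(The usual shape of that, with illustrative register offsets -- the e1000
driver wraps it as E1000_WRITE_FLUSH(), which if memory serves just reads
the STATUS register back:)

#include <linux/io.h>

#define MY_CTRL_REG     0x0000  /* illustrative offsets, not the real map */
#define MY_STATUS_REG   0x0008

static void disable_and_flush(void __iomem *regs)
{
        writel(0, regs + MY_CTRL_REG);          /* posted: may still be in flight */
        (void)readl(regs + MY_STATUS_REG);      /* the read forces it to land */
        /* only after the read returns is it safe to tear down DMA buffers */
}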

Also dbl check that the MMU is indeed mapping all these pages with the
"M" bit.

Cheers,
Ben.




Re: Inbound PCI and Memory Corruption

2013-06-23 Thread Peter LaDow


On Jun 22, 2013, at 5:00 PM, Benjamin Herrenschmidt  
wrote:

> On Fri, 2013-06-21 at 10:14 -0700, Peter LaDow wrote:
>> 
> Afaik e300 is slightly out of order, maybe it's missing a memory barrier
> somewhere ... One thing to try is to add some to the dma_map/unmap ops.
> 
> Also audit the driver to ensure that it properly uses barriers when
> populating descriptors (and maybe compare to a more recent version of
> the driver upstream).
Thanks for the tips.

I've been working with the folks at Intel on the e1000-dev list, and they did
add memory barriers. And I've tried the latest e1000 drivers (direct from the 
e1000 tree) with no luck.

I've done PCI traces, and there is no DMA after the disable is written to the 
e1000 part. All I can think is that there may be posted writes while the kernel
goes on to clean up the DMA buffers. But there are write memory barriers, so I don't
see how this is possible.

Are the memory barriers meaningful in single processor builds?

Thanks,
Pete


Re: Inbound PCI and Memory Corruption

2013-06-22 Thread Benjamin Herrenschmidt
On Fri, 2013-06-21 at 10:14 -0700, Peter LaDow wrote:
> After a (finally!) successful search of the list archive, I did find
> this:
> 
> http://web.archiveorange.com/archive/v/9IQA26gPvdf4foaTcmCV
> 
> Which seems very related to my problem.  However, the patch that is
> ultimately referenced is in place in 3.0.80 (see
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2006-February/021267.html).
> 
> Hmmm...  perhaps our FDT is bad?

Afaik e300 is slightly out of order, maybe it's missing a memory barrier
somewhere ... One thing to try is to add some to the dma_map/unmap ops.

Also audit the driver to ensure that it properly uses barriers when
populating descriptors (and maybe compare to a more recent version of
the driver upstream).

Ben.




Re: Inbound PCI and Memory Corruption

2013-06-21 Thread Peter LaDow
On Fri, Jun 21, 2013 at 9:56 AM, Peter LaDow  wrote:
> We are running into a case where we get memory corruption when an
> external PCI master writes to the processor.  We are using an MPC8349
> with an external Intel 82540EP (an E1000) part.  I've spent several
> weeks on the e1000 list trying to track down this problem, and after
> some digging I'm thinking the problem is somewhere other than the
> e1000 driver or the e1000 part.

After a (finally!) successful search of the list archive, I did find this:

http://web.archiveorange.com/archive/v/9IQA26gPvdf4foaTcmCV

Which seems very related to my problem.  However, the patch that is
ultimately referenced is in place in 3.0.80 (see
https://lists.ozlabs.org/pipermail/linuxppc-dev/2006-February/021267.html).

Hmmm...  perhaps our FDT is bad?

Thanks,
Pete


Inbound PCI and Memory Corruption

2013-06-21 Thread Peter LaDow
I'm posting this to the ppc-dev since I think the problem may be
specific to the PPC kernel.

We are running into a case where we get memory corruption when an
external PCI master writes to the processor.  We are using an MPC8349
with an external Intel 82540EP (an E1000) part.  I've spent several
weeks on the e1000 list trying to track down this problem, and after
some digging I'm thinking the problem is somewhere other than the
e1000 driver or the e1000 part.

We are running 3.0.57-rt82, though I can reliably re-create this
problem with 3.0.80 (vanilla, no preempt).  Basically it involves
bringing down the e1000 interface while it is on an extremely busy
network (i.e. a ton of traffic incoming traffic into the e1000).  From
the kernel panics I've seen, it appears that the incoming traffic is
present in the corrupted memory regions.

I'm suspecting some issue related to the DMA'ing of packet data from
the e1000 into main memory.  I've traced the PCI traffic with a bus
analyzer, and the part stops DMA'ing when disabled, and the driver
disables the part before unmapping the ring buffers (i.e. calls to
dma_unmap_*).

Prior to using the e1000 part, we were using an e100 part (an 82551).
We never experienced this problem, but then again, the driver is
different.  Some initial digging shows that the DMA is set up
differently.  The e100 driver uses pci_map_single, but the e1000 driver
uses dma_alloc_coherent.  I don't know much about the guts of the
kernel related to PCI and DMA, but based upon the PCI traces and other
debugging, it seems to point there.
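
(The difference in a nutshell -- a rough sketch, not either driver's actual
code: the e1000 keeps its descriptor rings in a coherent allocation and
streaming-maps packet buffers, while e100-style code streaming-maps its
control blocks and syncs explicitly:)

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/* coherent: no explicit cache maintenance needed for the ring itself */
static void *ring_alloc(struct device *dev, size_t size, dma_addr_t *ring_dma)
{
        return dma_alloc_coherent(dev, size, ring_dma, GFP_KERNEL);
}

/* streaming: ownership moves between CPU and device via map/unmap or
 * dma_sync_single_for_{cpu,device}() */
static dma_addr_t buf_map(struct device *dev, void *buf, size_t size)
{
        return dma_map_single(dev, buf, size, DMA_FROM_DEVICE);
}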

Below is the kernel panic output when the failure occurs.

Does anyone have an idea of where I can look to try and debug this?

Unable to handle kernel paging request for data at address 0x20454a46
 Faulting instruction address: 0xc0069924
 Oops: Kernel access of bad area, sig: 11 [#1]
 PREEMPT PPC Platform
 Modules linked in:
 NIP: c0069924 LR: c021cce0 CTR: c000cecc
 REGS: ed4f1c60 TRAP: 0300   Not tainted  (3.0.80-rt108)
 MSR: 9032   CR: 24008248  XER: 
 DAR: 20454a46, DSISR: 2000
 TASK = eda46780[3106] 'ifconfig' THREAD: ed4f
 GPR00:  ed4f1d10 eda46780 20454a46 2d6fcc2a 05f2 0002 
 GPR08: eda46780 ed6fd228 ed4f1cd0 90b1  10084718 bfcceaec 10062044
 GPR16: 10062120 bfcceadc  bfcceac4 0228  8914 c01ac398
 GPR24: c01ac8c8 ed066520 0061 ed0663a0 ef0448f0  0001 ed575580
 NIP [c0069924] put_page+0x0/0x34
 LR [c021cce0] skb_release_data+0x78/0xc8
 Call Trace:
 [ed4f1d20] [c021c914] __kfree_skb+0x18/0xbc
 [ed4f1d30] [c01a7620] e1000_clean_rx_ring+0x10c/0x1a4
 [ed4f1d60] [c01a76e0] e1000_clean_all_rx_rings+0x28/0x54
 [ed4f1d70] [c01aac50] e1000_close+0x30/0xb4
 [ed4f1d90] [c0226e2c] __dev_close_many+0xa0/0xe0
 [ed4f1da0] [c0228c64] __dev_close+0x2c/0x4c
 [ed4f1dc0] [c0225224] __dev_change_flags+0xb8/0x140
 [ed4f1de0] [c0226d48] dev_change_flags+0x1c/0x60
 [ed4f1e00] [c027e7f8] devinet_ioctl+0x2a4/0x700
 [ed4f1e60] [c027f450] inet_ioctl+0xc8/0xfc
 [ed4f1e70] [c02147b0] sock_ioctl+0x260/0x2a0
 [ed4f1e90] [c009b468] vfs_ioctl+0x2c/0x58
 [ed4f1ea0] [c009bc44] do_vfs_ioctl+0x64c/0x6d4
 [ed4f1f10] [c009bd24] sys_ioctl+0x58/0x88
 [ed4f1f40] [c000e954] ret_from_syscall+0x0/0x38
 --- Exception: c01 at 0xff35a3c
 LR = 0xff359a0
 Instruction dump:
 7c0802a6 3c80c007 3884a500 90010024 38a10008 3800 90010008 4b0d
 80010024 38210020 7c0803a6 4e800020
 <8003> 7c691b78 700bc000 41a20008
 Kernel panic - not syncing: Fatal exception
 Call Trace:
 [ed4f1b90] [c0007ccc] show_stack+0x58/0x154 (unreliable)
 [ed4f1bd0] [c001d744] panic+0xb0/0x1d8
 [ed4f1c20] [c000b4b8] die+0x1ac/0x1d0
 [ed4f1c40] [c0011e38] bad_page_fault+0xe8/0xfc
 [ed4f1c50] [c000edf4] handle_page_fault+0x7c/0x80
 --- Exception: 300 at put_page+0x0/0x34
 LR = skb_release_data+0x78/0xc8
 [ed4f1d10] []   (null) (unreliable)
 [ed4f1d20] [c021c914] __kfree_skb+0x18/0xbc
 [ed4f1d30] [c01a7620] e1000_clean_rx_ring+0x10c/0x1a4
 [ed4f1d60] [c01a76e0] e1000_clean_all_rx_rings+0x28/0x54
 [ed4f1d70] [c01aac50] e1000_close+0x30/0xb4
 [ed4f1d90] [c0226e2c] __dev_close_many+0xa0/0xe0
 [ed4f1da0] [c0228c64] __dev_close+0x2c/0x4c
 [ed4f1dc0] [c0225224] __dev_change_flags+0xb8/0x140
 [ed4f1de0] [c0226d48] dev_change_flags+0x1c/0x60
 [ed4f1e00] [c027e7f8] devinet_ioctl+0x2a4/0x700
 [ed4f1e60] [c027f450] inet_ioctl+0xc8/0xfc
 [ed4f1e70] [c02147b0] sock_ioctl+0x260/0x2a0
 [ed4f1e90] [c009b468] vfs_ioctl+0x2c/0x58
 [ed4f1ea0] [c009bc44] do_vfs_ioctl+0x64c/0x6d4
 [ed4f1f10] [c009bd24] sys_ioctl+0x58/0x88
 [ed4f1f40] [c000e954] ret_from_syscall+0x0/0x38
 --- Exception: c01 at 0xff35a3c
 LR = 0xff359a0

When turning on SLAB checks, I see:

Slab corruption: size-16384 start=ed4ec000, len=16384
 690: 6b 6b ff ff ff ff ff ff b8 ac 6f 99 bf 8b 08 00
 6a0: 45 00 00 24 3f 34 00 00 80 11 ca cf 0a ca 0d 33
 6b0: 0a ca 0d ff 06 cc 06 cf 00 10 bc 1d c5 0b 40 01
 6c0: 00 10 00 33 00 00 00 00 00 00 00 00 00 00 3f dd
 6d0: ed f8 6b 6b