Re: Inbound PCI and Memory Corruption
On Wed, Jul 24, 2013 at 11:13 PM, Peter LaDow wrote:
> There are other items, such as drivers for our custom hardware modules
> implemented on the FPGA. Perhaps I'll pull our drivers and run a
> stock kernel. Maybe a stock 83xx configuration (such as the
> MPC8349E-MITX). If we have problems even on a stock configuration...

Well, that didn't work either. Unfortunately, the PCI slot on our MPC8349E-MITX eval kit doesn't work. It doesn't matter what card I plug into that slot; neither U-Boot nor the kernel recognizes anything.

But I did have one thought. Is it possible that somehow the configured inbound PCI regions are marked as prefetchable, and the e1000 is prefetching the descriptors? Then at some later point the kernel changes things with the e1000 unaware?

Thanks,
Pete

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Inbound PCI and Memory Corruption
On Wed, Jul 24, 2013 at 3:08 PM, Benjamin Herrenschmidt wrote:
> No, they resolve to the same thing under the hood. Did you do other
> changes? Could it be another unrelated kernel bug causing something
> like use-after-free of network buffer or similar oddity unrelated to the
> network driver?

There are other items, such as drivers for our custom hardware modules implemented on the FPGA. Perhaps I'll pull our drivers and run a stock kernel. Maybe a stock 83xx configuration (such as the MPC8349E-MITX). If we have problems even on a stock configuration...

> Have you tried with different kernel versions?

Funny you mention it. I just tried 3.10.2 today and we still get the same memory corruption. I was hoping that perhaps something had changed between 3.0 and 3.10 that might clear up the problem, and then I could bisect to find where it failed. But unfortunately, 3.10.2 exhibits the same issue. So this isn't an issue specific to the kernel version. The e1000 driver looks largely unchanged in 3.10, though, so if the problem is driver related it would still be there.

Thanks,
Pete
Re: Inbound PCI and Memory Corruption
On Wed, 2013-07-24 at 08:39 -0700, Peter LaDow wrote:
> A bit of history that may help. We were using an e100 (an 82559)
> part, but Intel EOL'd that part so we picked up the 82540EP (which
> they have also recently EOL'd). The e100 driver uses a different DMA
> model. It uses pci_map_single/pci_unmap_single along with
> pci_dma_sync_single_for* calls (as well as other PCI calls). The
> e1000 driver, however, does not use the pci_* calls. We have never
> had a problem with the e100 parts. I don't suppose the use of
> pci_map_* vs dma_map_* makes a difference does it?

No, they resolve to the same thing under the hood. Did you do other changes? Could it be another unrelated kernel bug causing something like use-after-free of network buffer or similar oddity unrelated to the network driver?

Have you tried with different kernel versions?

Cheers,
Ben.
Re: Inbound PCI and Memory Corruption
On Tue, Jul 23, 2013 at 9:27 PM, Benjamin Herrenschmidt wrote:
> CONFIG_NOT_COHERENT_CACHE will do it for you (in
> arch/powerpc/kernel/dma.c) provided the driver does the right things vs.
> the DMA accessors but afaik e1000 does.

Well, when I went to make the changes I noted a few things. First, the e1000 driver does a dma_unmap_single() prior to processing the descriptor. So it would seem that the dma_sync_single_for_cpu() isn't necessary in that case. And when allocating descriptors, it does dma_map_single() after setting up the descriptor, so dma_sync_single_for_device() probably isn't necessary either. But regardless, I put in the dma_sync_single_* calls and we still get the same behavior. So, even with CONFIG_NOT_COHERENT_CACHE we are getting this error.

> If that helps, that might hint at either a missing barrier or some kind
> of HW (or HW configuration) bug with cache coherency.

And unfortunately it didn't help. We have a few other things we are trying, but I'm not hopeful that any will change the behavior.

A bit of history that may help. We were using an e100 (an 82559) part, but Intel EOL'd that part so we picked up the 82540EP (which they have also recently EOL'd). The e100 driver uses a different DMA model. It uses pci_map_single/pci_unmap_single along with pci_dma_sync_single_for* calls (as well as other PCI calls). The e1000 driver, however, does not use the pci_* calls. We have never had a problem with the e100 parts. I don't suppose the use of pci_map_* vs dma_map_* makes a difference, does it?

Thanks,
Pete
RE: Inbound PCI and Memory Corruption
> On Fri, Jul 19, 2013 at 6:46 AM, Gerhard Sittig wrote:
> > So: No, not having to fiddle with DMA stuff when doing PCI need
> > not be a problem, it's actually expected. But since a DMA engine
> > might be involved (that's just not under your command), the
> > accompanying problems may arise. You may need to flush CPU
> > provided data upon write before telling an external entity to
> > access it, and may need to invalidate caches (to have data
> > re-fetched) before the CPU accesses what an external entity did
> > manipulate. And this applies to both payload data as well as
> > management data (descriptors) if the latter apply to the former.
>
> This is something I've been exploring today. But what is unclear is
> _how_ to flush/invalidate the caches. I was going to tweak the
> driver to setup the descriptors, flush the cache, then enable the
> hardware (and when taking the device down, disable the hardware, flush
> the cache, then deallocate the descriptors). But this is in the
> network code and it isn't obvious how to make this happen.

FWIW, it is almost impossible to code for non-coherent descriptors (even ignoring problems with speculative cache line reads). You don't even want to try to do it except for hardware where you have no choice.

The problem is that you have no control over the device writes into the descriptors. In order not to lose the device writes, the CPU must not write to any cache lines that contain active descriptors. For the receive side this can be arranged by initialising cache-line-sized blocks of descriptors (if the cache line write isn't atomic you still have problems). The send side is much more tricky: you either have to set up a full cache line of descriptors or wait until the transmit is idle.

David
Re: Inbound PCI and Memory Corruption
On Tue, 2013-07-23 at 21:22 -0700, Peter LaDow wrote:
> On Fri, Jul 19, 2013 at 6:46 AM, Gerhard Sittig wrote:
> > So: No, not having to fiddle with DMA stuff when doing PCI need
> > not be a problem, it's actually expected. But since a DMA engine
> > might be involved (that's just not under your command), the
> > accompanying problems may arise. You may need to flush CPU
> > provided data upon write before telling an external entity to
> > access it, and may need to invalidate caches (to have data
> > re-fetched) before the CPU accesses what an external entity did
> > manipulate. And this applies to both payload data as well as
> > management data (descriptors) if the latter apply to the former.
>
> This is something I've been exploring today. But what is unclear is
> _how_ to flush/invalidate the caches. I was going to tweak the
> driver to setup the descriptors, flush the cache, then enable the
> hardware (and when taking the device down, disable the hardware, flush
> the cache, then deallocate the descriptors). But this is in the
> network code and it isn't obvious how to make this happen.

CONFIG_NOT_COHERENT_CACHE will do it for you (in arch/powerpc/kernel/dma.c) provided the driver does the right things vs. the DMA accessors, but afaik e1000 does.

The problem with that is we never "officially" supported that option of non-coherent cache (non-coherent DMA) on any of the "S" processors (including 603 aka e300), first because they are supposed to be used in coherent fabrics, but also because the code somewhat assumes that your CPU won't suddenly prefetch stuff back into the cache at any time. The 603 does some amount of speculative prefetch, so it might pollute the cache. But it's still worth trying out.

If that helps, that might hint at either a missing barrier or some kind of HW (or HW configuration) bug with cache coherency.

> I think I figured something out. Basically, in the receive interrupt,
> prior to reading the data in the descriptor, I call
> dma_sync_single_for_cpu(). Then the driver can continue to process
> the data, then unmap the DMA region (with dma_unmap_single()). When
> setting up the descriptors, after calling dma_map_single() and
> configuring the descriptor, I then call dma_sync_single_for_device().
> Does this look correct?

Yes.

> However, on the PPC platforms, these calls (dma_sync_*) are NOPs
> unless CONFIG_NOT_COHERENT_CACHE is defined (which it doesn't appear
> to be for the 8349). So I tweaked the Kconfig to enable
> CONFIG_NOT_COHERENT. Things built ok, but I'm not sure if this is
> sufficient to invoke the cache flush necessary.
>
> Am I on the right track?

Well, they are supposed to be NOPs... that's the thing. Because afaik, anything built on a 603 core is *supposed* to be coherent (though those NOPs should at least be memory barriers imho).

In any case, let us know if that helps.

Cheers,
Ben.

> Thanks,
> Pete
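For reference, the Kconfig tweak Pete describes (forcing CONFIG_NOT_COHERENT_CACHE on for an 834x build) might look roughly like this. This is a sketch, not the actual upstream entry: the symbol lives in arch/powerpc/platforms/Kconfig.cputype in kernels of this era, and the exact dependency list varies by version.

```kconfig
# Sketch only -- the real dependency list differs between kernel versions.
config NOT_COHERENT_CACHE
	bool
	depends on 4xx || 8xx || E200 || PPC_MPC512x || PPC_83xx
	default y if PPC_83xx
```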
Re: Inbound PCI and Memory Corruption
On Fri, Jul 19, 2013 at 6:46 AM, Gerhard Sittig wrote:
> So: No, not having to fiddle with DMA stuff when doing PCI need
> not be a problem, it's actually expected. But since a DMA engine
> might be involved (that's just not under your command), the
> accompanying problems may arise. You may need to flush CPU
> provided data upon write before telling an external entity to
> access it, and may need to invalidate caches (to have data
> re-fetched) before the CPU accesses what an external entity did
> manipulate. And this applies to both payload data as well as
> management data (descriptors) if the latter apply to the former.

This is something I've been exploring today. But what is unclear is _how_ to flush/invalidate the caches. I was going to tweak the driver to set up the descriptors, flush the cache, then enable the hardware (and when taking the device down, disable the hardware, flush the cache, then deallocate the descriptors). But this is in the network code and it isn't obvious how to make this happen.

I think I figured something out. Basically, in the receive interrupt, prior to reading the data in the descriptor, I call dma_sync_single_for_cpu(). Then the driver can continue to process the data, then unmap the DMA region (with dma_unmap_single()). When setting up the descriptors, after calling dma_map_single() and configuring the descriptor, I then call dma_sync_single_for_device(). Does this look correct?

However, on the PPC platforms, these calls (dma_sync_*) are NOPs unless CONFIG_NOT_COHERENT_CACHE is defined (which it doesn't appear to be for the 8349). So I tweaked the Kconfig to enable CONFIG_NOT_COHERENT. Things built ok, but I'm not sure if this is sufficient to invoke the necessary cache flush.

Am I on the right track?

Thanks,
Pete
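The ordering described above can be sketched in kernel style like this. This is not a compilable unit and the names (pdev, buffer_dma, rx_desc) are placeholders rather than the actual e1000 structures; note also that a freshly mapped buffer is already device-owned, so the sync after dma_map_single() is redundant on most implementations (as observed elsewhere in this thread):

```c
/* RX interrupt path: give the buffer back to the CPU before reading it. */
dma_sync_single_for_cpu(&pdev->dev, buffer_dma, len, DMA_FROM_DEVICE);
/* ... check descriptor status, pull the packet out ... */
dma_unmap_single(&pdev->dev, buffer_dma, len, DMA_FROM_DEVICE);

/* Refill path: map, fill in the descriptor, then hand it to the device. */
buffer_dma = dma_map_single(&pdev->dev, skb->data, len, DMA_FROM_DEVICE);
rx_desc->buffer_addr = cpu_to_le64(buffer_dma);
/* Redundant right after dma_map_single(), but harmless. */
dma_sync_single_for_device(&pdev->dev, buffer_dma, len, DMA_FROM_DEVICE);
```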
Re: Inbound PCI and Memory Corruption
On Thu, Jul 18, 2013 at 4:30 PM, Peter LaDow wrote:
>
> It does seem that for incoming PCI transactions the Freescale DMA
> engine is not used. And in our device tree we have the DMA engine
> commented out. That is, the "fsl,mpc8349-dma" and "fsl,elo-dma"
> compatible items are not present in the FDT.

This is the standard on-chip DMA engine used (primarily) as an off-loaded memcpy. I've never seen it used for anything related to PCI.

You can remove the DMA nodes from the device tree and see if that fixes anything. If it does, then it might be the DMA offload from the network layer that's causing the problems.
Re: Inbound PCI and Memory Corruption
On 07/18/2013 05:02:33 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-07-18 at 14:30 -0700, Peter LaDow wrote:
> > We are still stumped on this one, but during a review of the system
> > setup one thing came up that we aren't sure about is the device tree
> > and the DMA engine.
> >
> > It does seem that for incoming PCI transactions the Freescale DMA
> > engine is not used. And in our device tree we have the DMA engine
> > commented out. That is, the "fsl,mpc8349-dma" and "fsl,elo-dma"
> > compatible items are not present in the FDT.
> >
> > I don't suppose this could be a problem?
>
> I doubt it but somebody from FSL might be able to give a better answer.

The DMA engine is not related to inbound PCI transactions.

-Scott
Re: Inbound PCI and Memory Corruption
On Thu, Jul 18, 2013 at 14:30 -0700, Peter LaDow wrote:
>
> We are still stumped on this one, but during a review of the system
> setup one thing came up that we aren't sure about is the device tree
> and the DMA engine.
>
> It does seem that for incoming PCI transactions the Freescale DMA
> engine is not used. And in our device tree we have the DMA engine
> commented out. That is, the "fsl,mpc8349-dma" and "fsl,elo-dma"
> compatible items are not present in the FDT.
>
> I don't suppose this could be a problem?

Can't tell whether it helps or whether I'm telling you what's already known, but here we go:

Some Freescale SoCs don't have "_the_ DMA", but instead several of them. Many peripherals have a DMA engine of their own, which you won't notice as a separate entity from the software POV (typically ethernet, USB, PCI, partially video in and out; even coprocessors may have dedicated DMA engines which they might take care of themselves). The thing that you do see (in the device tree, as a software-controllable entity) is the "general purpose DMA" with user-serviceable channels. This one may be used for serial communication via UART or SPI, or SDHC/MMC, or peripherals attached to the EMB. Sometimes it's called "DMA2" to reflect that there are others as well.

So: No, not having to fiddle with DMA stuff when doing PCI need not be a problem, it's actually expected. But since a DMA engine might be involved (that's just not under your command), the accompanying problems may arise. You may need to flush CPU-provided data upon write before telling an external entity to access it, and may need to invalidate caches (to have data re-fetched) before the CPU accesses what an external entity did manipulate. And this applies to both payload data as well as management data (descriptors) if the latter apply to the former.

HTH
virtually yours
Gerhard Sittig

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr. 5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-0 Fax: +49-8142-66989-80 Email: off...@denx.de
Re: Inbound PCI and Memory Corruption
On Thu, 2013-07-18 at 14:30 -0700, Peter LaDow wrote:
> We are still stumped on this one, but during a review of the system
> setup one thing came up that we aren't sure about is the device tree
> and the DMA engine.
>
> It does seem that for incoming PCI transactions the Freescale DMA
> engine is not used. And in our device tree we have the DMA engine
> commented out. That is, the "fsl,mpc8349-dma" and "fsl,elo-dma"
> compatible items are not present in the FDT.
>
> I don't suppose this could be a problem?

I doubt it but somebody from FSL might be able to give a better answer.

I'm personally at a loss. It looks like you are doing everything right from what I can tell. That leaves us with some kind of oddball driver bug, or a problem with the low-level configuration of the PCIe bridge or the chip-internal bus, related to cache coherency maybe.

Ben.
Re: Inbound PCI and Memory Corruption
We are still stumped on this one, but during a review of the system setup, one thing came up that we aren't sure about: the device tree and the DMA engine.

It does seem that for incoming PCI transactions the Freescale DMA engine is not used. And in our device tree we have the DMA engine commented out. That is, the "fsl,mpc8349-dma" and "fsl,elo-dma" compatible items are not present in the FDT.

I don't suppose this could be a problem?

Thanks,
Pete
Re: Inbound PCI and Memory Corruption
On Wed, Jul 10, 2013 at 2:40 PM, Benjamin Herrenschmidt wrote:
> Did you get any traces that show the flow that happens around a case of
> corruption ?

Well, I captured a lot of data, both logging kernel output and capturing PCI traffic. I've put the full console log output on pastebin at http://pastebin.com/ZFYbneNR

The initial corruption has a starting address of 0xe94f17f8. Looking at the dumped data:

Slab corruption: fib6_nodes start=e94f17f8, len=32
Redzone: 0x9f911029d74e35b/0xd4bed90f1c6f0806.
Last user: [<06040001>](0x6040001)
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff ff ff
Prev obj: start=e94f17c0, len=32
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [< (null)>](0x0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5
Next obj: start=e94f1830, len=32
Redzone: 0xd4bed90f1c6f0aca/0xafba11029d74e35b.
Last user: [< (null)>](0x0)
000: 0d 5b 00 00 00 00 00 00 0a ca 0d 01 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 bd 3e

The first corrupted byte is at address 0xe94f1802. Looking at the dump of all the DMA mappings, this range is never mapped. Nor is there a single PCI write to this mapped address either. However, I did find some correlation with a PCI write to a nearby address. From the PCI capture:

Command | Address  | Data     | /BE
Mem Wr  | 294F1810 |          |
        |          |          | 0011
        |          | 0FD9BED4 |
        |          | 06086F1C |
        |          | 00080100 |
        |          | 01000406 |
        |          | 0FD9BED4 |
        |          | CA0A6F1C |
        |          | 5B0D     |
        |          | 010DCA0A |
        |          |          |
        |          | 3EBD     | 1100

The data in this write looks very much like the pattern in the detected slab corruption. Looking at the PCI trace, it doesn't appear to be the incoming PCI data (unless the PCI Inbound Address Translation registers are misconfigured). Yet clearly these are corrupted with ethernet traffic.

Thanks,
Pete
Re: Inbound PCI and Memory Corruption
On Wed, Jul 10, 2013 at 2:40 PM, Benjamin Herrenschmidt wrote:
> Well, it should work,

I tried forcing NET_IP_ALIGN to 0, and I did see the DMA accesses align on 32-bit boundaries with all the byte enables set. However, the memory corruption still occurred.

> but it's possible that there is some subtle bug on this specific Freescale
> SoC

I looked through the Freescale errata (http://www.freescale.com/files/32bit/doc/errata/MPC8349ECE.pdf) and only two seem relevant: PCI19 and DMA2 (the rest are fixed in our core, version 3.1).

PCI19: When using a dual-address cycle for inbound write accesses when the IOS is full, the PCI block overwrites the address for the IOS with the new address from the bus.

DMA2: There can be corruption of the DMA data. Examples are when DAHTS is 8 bytes and the source port is a 32-bit PCI bus, or when the source memory space is on the PCI bus and is not prefetchable.

I don't think PCI19 applies, since no dual-address cycles are generated. From what I've seen, all the DMA addresses in the RX ring descriptors are in the lower 32-bit address space.

I don't think DMA2 applies because it is for the DMA controller specific to the 8349, and these transactions are not set up or managed by the DMA controller... at least I don't think they are (unless dma_alloc and dma_map_single do something related to this). My understanding is that in this case the PCI inbound registers are configured and the DMA controller is not used.

> ...Did you correlate the corruption with one such packet ?
>
> Did you get any traces that show the flow that happens around a case of
> corruption ?

Not yet. I'm having a difficult time syncing the PCI trace with the kernel debug output. And since the corruption may be detected well after the actual corruption occurs, determining which DMA transfer caused a corruption is difficult. I'm still trying to gather more information.

Thanks,
Pete
Re: Inbound PCI and Memory Corruption
On Wed, 2013-07-10 at 14:06 -0700, Peter LaDow wrote:
> I have a bit more information, but I'm not sure of the impact. So far
> I have been dumping lots of debugging output trying to determine where
> this memory corruption could be coming from. I've sprinkled the
> driver with wmb() (near every DMA function and the hardware IO), loads
> of printk's to get the DMA addresses, and lots and lots of PCI traces.
>
> One thing that I noticed is that the addresses programmed into the
> descriptor ring for the E1000 are not 32-bit aligned. The E1000 part
> is aligning the transfers and using the BEs to mask off bytes. Is
> there an issue with the PPC (notably the MPC8349) with incoming PCI
> transactions that are 32-bit word aligned but write less than a full
> word?

Well, it should work, but it's possible that there is some subtle bug on this specific Freescale SoC.

Did you correlate the corruption with one such packet?

Did you get any traces that show the flow that happens around a case of corruption?

Ben.

> In looking at the PCI trace, all the DMA's of packets from the E1000
> start at a 32-bit aligned address, but the first and last words are
> not full word writes. For example (probably need a fixed font to
> view):
>
> Command | Address  | Data     | /BE
> Mem Wr  | 2950D180 |          |
>         |          |          | 0011
>         |          | DBA24DF0 |
>         |          | 00085F19 |
>         |          | 2424     |
>         |          | C530     |
>         |          | 80D81180 |
>         |          | F10DCA0A |
>         |          | FF0DCA0A |
>         |          | CF06CC06 |
>         |          | A1BA1000 |
>         |          | 01400BC5 |
>         |          | F1001000 |
>         |          |          |
>         |          | 6873     |
>         |          | 0F22     | 1100
>
> Note that the first word is only a 16-bit transfer (in the upper half)
> and the last is only 16-bits (in the lower half). And I dumped the
> descriptors and here's what is read (via DMA):
>
> Command | Address  | Data     | /BE
> Mem Rd  | 2A2A72F0 |          |
>         |          | 2950D812 |
>         |          |          |
>         |          | C8C70040 |
>         |          |          |
>
> Note that the descriptor programmed into the part has a DMA address
> that is not word aligned. And the E1000 part sets the proper byte
> enables and does a write to the aligned address of 0x2850D180.
>
> Is there any traction on this idea?
>
> Thanks,
> Pete
Re: Inbound PCI and Memory Corruption
I have a bit more information, but I'm not sure of the impact. So far I have been dumping lots of debugging output trying to determine where this memory corruption could be coming from. I've sprinkled the driver with wmb() (near every DMA function and the hardware IO), loads of printk's to get the DMA addresses, and lots and lots of PCI traces.

One thing that I noticed is that the addresses programmed into the descriptor ring for the E1000 are not 32-bit aligned. The E1000 part is aligning the transfers and using the BEs to mask off bytes. Is there an issue with the PPC (notably the MPC8349) with incoming PCI transactions that are 32-bit word aligned but write less than a full word?

In looking at the PCI trace, all the DMA's of packets from the E1000 start at a 32-bit aligned address, but the first and last words are not full word writes. For example (probably need a fixed font to view):

Command | Address  | Data     | /BE
Mem Wr  | 2950D180 |          |
        |          |          | 0011
        |          | DBA24DF0 |
        |          | 00085F19 |
        |          | 2424     |
        |          | C530     |
        |          | 80D81180 |
        |          | F10DCA0A |
        |          | FF0DCA0A |
        |          | CF06CC06 |
        |          | A1BA1000 |
        |          | 01400BC5 |
        |          | F1001000 |
        |          |          |
        |          | 6873     |
        |          | 0F22     | 1100

Note that the first word is only a 16-bit transfer (in the upper half) and the last is only 16 bits (in the lower half). And I dumped the descriptors and here's what is read (via DMA):

Command | Address  | Data     | /BE
Mem Rd  | 2A2A72F0 |          |
        |          | 2950D812 |
        |          |          |
        |          | C8C70040 |
        |          |          |

Note that the descriptor programmed into the part has a DMA address that is not word aligned. And the E1000 part sets the proper byte enables and does a write to the aligned address of 0x2850D180.

Is there any traction on this idea?

Thanks,
Pete
Re: Inbound PCI and Memory Corruption
On Sat, Jun 22, 2013 at 5:00 PM, Benjamin Herrenschmidt wrote:
> Afaik e300 is slightly out of order, maybe it's missing a memory barrier
> somewhere. One thing to try is to add some to the dma_map/unmap ops.

I went through the driver and added memory barriers to the dma_map_page/dma_unmap_page and dma_alloc_coherent/dma_free_coherent calls (wmb() calls after each, which resolves to a sync instruction). I still get a kernel panic.

I did turn on DEBUG_PAGEALLOC to try and get more information, but I'm not finding anything new. However, with the SLAB debugging I do find SLAB corruption, e.g.:

Slab corruption: fib6_nodes start=e900c7f8, len=32
Redzone: 0x9f911029d74e35b/0x30a706a6050806.
Last user: [<06040001>](0x6040001)
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff ff ff
Prev obj: start=e900c7c0, len=32
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [< (null)>](0x0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5
Next obj: start=e900c830, len=32
Redzone: 0x30a706a6050aca/0xc8be11029d74e35b.
Last user: [< (null)>](0x0)
000: 0d aa 00 00 00 00 00 00 0a ca 0d 49 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 75 8b

which is clearly corrupted with ethernet frames. The only interface connected is the e1000. Eventually this corruption leads to a kernel panic.

I'm completely confused on how this could happen. Given the M bit is set for all pages (see below), and with memory barriers on the DMA map/unmap and register operations, the only thing I can think of is something in the IO sequencer (which was suggested in the link I gave earlier). Yet the patch mentioned is in place.

> Also audit the driver to ensure that it properly uses barriers when
> populating descriptors (and maybe compare to a more recent version of
> the driver upstream).

I've gone through the driver and didn't see anything missing. And the upstream (v3.10-rc5) driver is the same version (7.3.21-k8-NAPI). I've also used the latest from the e1000 release (8.0.35-NAPI), and I get the same problem.

On Sun, Jun 23, 2013 at 6:16 PM, Benjamin Herrenschmidt wrote:
> Also dbl check that the MMU is indeed mapping all these pages with the
> "M" bit.

The DBATs have the M bit set (both have 0x12 in the DBATxL registers)... sometimes. Usually when I halt the CPU and dump the BATs, all the IBATs and DBATs have zeros. But occasionally I see DBAT2 and DBAT3 with values and the M bit set.

I also dumped all the TLB entries, and every one of them has the M bit set (see below).

TLB dump:
BDI>dtlb 0 63
IDX  V RC VSID    VPI/RPN          WIMG PP
 0:  V 0C 000eee_e9a -> 2e9a       --M- 00
 1:  V 0C 000eee_f401000 -> 2f401000 --M- 00
 2:  V 1C 000ccc_0502000 -> 00502000 --M- 00
 3:  V 0C 000eee_f403000 -> 2f403000 --M- 00
 4:  V 0C 000eee_c124000 -> 2c124000 --M- 00
 5:  V 0C 000eee_f405000 -> 2f405000 --M- 00
 6:  V 0C 000eee_e9e6000 -> 2e9e6000 --M- 00
 7:  V 0C 33afd1_0427000 -> 005f8000 --M- 10
 8:  V 0C 33afd1_0428000 -> 2ff63000 --M- 10
 9:  V 0C 000ccc_0349000 -> 00349000 --M- 00
10:  V 1C 000ccc_03ca000 -> 003ca000 --M- 00
11:  V 1C 000ccc_03cb000 -> 003cb000 --M- 00
12:  V 0C 33afd1_040c000 -> 003b4000 --M- 11
13:  V 0C 000eee_f40d000 -> 2f40d000 --M- 00
14:  V 1C 000eee_fa8e000 -> 2fa8e000 --M- 00
15:  V 0- 33afd1_034f000 -> 2e6b1000 --M- 11
16:  V 0C 000eee_f47 -> 2f47       --M- 00
17:  V 0C 33afd1_0411000 -> 2fe54000 --M- 10
18:  V 0C 000eee_f4b2000 -> 2f4b2000 --M- 00
19:  V 1C 33eb14_8073000 -> 00462000 --M- 10
20:  V 0C 000ccc_02f4000 -> 002f4000 --M- 00
21:  V 0C 000eee_f415000 -> 2f415000 --M- 00
22:  V 1C 000ccc_03f6000 -> 003f6000 --M- 00
23:  V 0C 000ccc_02f7000 -> 002f7000 --M- 00
24:  V 1C 000ccc_03f8000 -> 003f8000 --M- 00
25:  V 0C 000ccc_03d9000 -> 003d9000 --M- 00
26:  V 1C 33b304_a31a000 -> 007f4000 --M- 10
27:  V 1C 000ccc_03fb000 -> 003fb000 --M- 00
28:  V 1C 000ccc_03fc000 -> 003fc000 --M- 00
29:  V 0C 000eee_f41d000 -> 2f41d000 --M- 00
30:  V 1C 000eee_e87e000 -> 2e87e000 --M- 00
31:  V 1C 33afd1_045f000 -> 2fe52000 --M- 10
32:  V 0C 000ccc_000 ->            --M- 00
33:  V 0C 000eee_e9a1000 -> 2e9a1000 --M- 00
34:  V 1C 33b304_8022000 -> 00f44000 --M- 10
35:  V 0C 000ccc_0503000 -> 00503000 --M- 00
36:  V 0C 33afd1_0744000 -> 2fe17000 --M- 10
37:  V 0C 000eee_c125000 -> 2c125000 --M- 00
38:  V 0C 33e7e1_0406000 -> 0078e000 --M- 11
39:  V 0C 000eee_e987000 -> 2e987000 --M- 00
40:  V 0C 000ccc_0008000 -> 8000   --M- 00
41:  V 0C 000ccc_03c9000 -> 003c9000 --M- 00
42:  V 1C 33ba7b_f8ea000 -> 005f9000 --M- 10
43:  V 1C 33afd1_040b000 -> 2ffe   --M- 11
44:  V 0C 000ccc_03cc000 -> 003cc000 --M- 00
45:  V 0C 000eee_b68d000 -> 2b68d000 --M- 00
46:  V 1C 000eee_f40e000 -> 2f40e000 --M- 00
47:  V 0C 000eee_fa8f000 -> 2fa8f000 --M- 00
48:  V 0C 33afd1_041 -> 2fe4a000   --M- 10
49:  V 0C 000eee_f471000 -> 2f471000 --M- 00
50:  V 0C 000ccc_03f2000 -> 003f2000 --M- 00
51:  V 1C 000eee_f473000 -> 2f473000 --M- 00
52:  V 0C 000ccc_03f
Re: Inbound PCI and Memory Corruption
On Sun, 2013-06-23 at 20:47 -0700, Peter LaDow wrote:
> On Jun 23, 2013, at 6:16 PM, Benjamin Herrenschmidt wrote:
> > Also dbl check that the MMU is indeed mapping all these pages with the
> > "M" bit.
>
> Just to be clear, do you mean the e1000 registers in PCI space? Or the RAM
> pages?

The RAM pages.

Cheers,
Ben.
Re: Inbound PCI and Memory Corruption
On Jun 23, 2013, at 6:16 PM, Benjamin Herrenschmidt wrote:
> Also dbl check that the MMU is indeed mapping all these pages with the
> "M" bit.

Just to be clear, do you mean the e1000 registers in PCI space? Or the RAM pages?

Thanks,
Pete
Re: Inbound PCI and Memory Corruption
On Sun, 2013-06-23 at 17:56 -0700, Peter LaDow wrote:
> On Jun 22, 2013, at 5:00 PM, Benjamin Herrenschmidt wrote:
> > On Fri, 2013-06-21 at 10:14 -0700, Peter LaDow wrote:
> >
> > Afaik e300 is slightly out of order, maybe it's missing a memory barrier
> > somewhere. One thing to try is to add some to the dma_map/unmap ops.
> >
> > Also audit the driver to ensure that it properly uses barriers when
> > populating descriptors (and maybe compare to a more recent version of
> > the driver upstream).
>
> Thanks for the tips.
>
> I've been working with the folks at Intel on the e1000-dev list, and
> they did add memory barriers. And I've tried the latest e1000 drivers
> (direct from the e1000 tree) with no luck.
>
> I've done PCI traces, and there is no DMA after the disable is written
> to the e1000 part. All I can think is that there may be posted writes
> while the kernel goes on to clean up the DMA buffers. But there are
> write memory barriers, so I don't see how this is possible.
>
> Are the memory barriers meaningful in single processor builds?

Yes. However, they have no effect on posted writes by the chip. You need to do an MMIO read for these to take effect.

Also dbl check that the MMU is indeed mapping all these pages with the "M" bit.

Cheers,
Ben.
Re: Inbound PCI and Memory Corruption
On Jun 22, 2013, at 5:00 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2013-06-21 at 10:14 -0700, Peter LaDow wrote:
>
> Afaik e300 is slightly out of order, maybe it's missing a memory
> barrier somewhere. One thing to try is to add some to the dma_map/unmap
> ops.
>
> Also audit the driver to ensure that it properly uses barriers when
> populating descriptors (and maybe compare to a more recent version of
> the driver upstream).

Thanks for the tips.

I've been working with the folks at Intel on the e1000-dev list, and
they did add memory barriers. And I've tried the latest e1000 drivers
(direct from the e1000 tree) with no luck.

I've done PCI traces, and there is no DMA after the disable is written
to the e1000 part. All I can think is that there may be posted writes,
and the kernel goes on to clean up the DMA buffers. But there are write
memory barriers, so I don't see how this is possible.

Are the memory barriers meaningful in single processor builds?

Thanks,
Pete
Re: Inbound PCI and Memory Corruption
On Fri, 2013-06-21 at 10:14 -0700, Peter LaDow wrote:
> After a (finally!) successful search of the list archive, I did find
> this:
>
> http://web.archiveorange.com/archive/v/9IQA26gPvdf4foaTcmCV
>
> Which seems very related to my problem. However, the patch that is
> ultimately referenced is in place in 3.0.80 (see
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2006-February/021267.html).
>
> Hmmm... perhaps our FDT is bad?

Afaik e300 is slightly out of order, maybe it's missing a memory barrier
somewhere. One thing to try is to add some to the dma_map/unmap ops.

Also audit the driver to ensure that it properly uses barriers when
populating descriptors (and maybe compare to a more recent version of
the driver upstream).

Ben.
Re: Inbound PCI and Memory Corruption
On Fri, Jun 21, 2013 at 9:56 AM, Peter LaDow wrote:
> We are running into a case where we get memory corruption when an
> external PCI master writes to the processor. We are using an MPC8349
> with an external Intel 82540EP (an E1000) part. I've spent several
> weeks on the e1000 list trying to track down this problem, and after
> some digging I'm thinking the problem is somewhere other than the
> e1000 driver or the e1000 part.

After a (finally!) successful search of the list archive, I did find
this:

http://web.archiveorange.com/archive/v/9IQA26gPvdf4foaTcmCV

Which seems very related to my problem. However, the patch that is
ultimately referenced is in place in 3.0.80 (see
https://lists.ozlabs.org/pipermail/linuxppc-dev/2006-February/021267.html).

Hmmm... perhaps our FDT is bad?

Thanks,
Pete
Inbound PCI and Memory Corruption
I'm posting this to the ppc-dev list since I think the problem may be specific to the PPC kernel.

We are running into a case where we get memory corruption when an external PCI master writes to the processor. We are using an MPC8349 with an external Intel 82540EP (an E1000) part. I've spent several weeks on the e1000 list trying to track down this problem, and after some digging I'm thinking the problem is somewhere other than the e1000 driver or the e1000 part.

We are running 3.0.57-rt82, though I can reliably re-create this problem with 3.0.80 (vanilla, no preempt). Basically it involves bringing down the e1000 interface while it is on an extremely busy network (i.e. a ton of incoming traffic into the e1000). From the kernel panics I've seen, it appears that the incoming traffic is present in the corrupted memory regions.

I'm suspecting some issue related to the DMA'ing of packet data from the e1000 into main memory. I've traced the PCI traffic with a bus analyzer, and the part stops DMA'ing when disabled, and the driver disables the part before unmapping the ring buffers (i.e. calls to dma_unmap_*).

Prior to using the e1000 part, we were using an e100 part (an 82551). We never experienced this problem, but then again, the driver is different. Some initial digging shows that the DMA is set up differently: the e100 driver uses pci_map_single, but the e1000 driver uses dma_alloc_coherent. I don't know much about the guts of the kernel related to PCI and DMA, but based upon the PCI traces and other debugging, it seems to point there.

Below is the kernel panic output when the failure occurs. Does anyone have an idea of where I can look to try and debug this?
Unable to handle kernel paging request for data at address 0x20454a46
Faulting instruction address: 0xc0069924
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT PPC Platform
Modules linked in:
NIP: c0069924 LR: c021cce0 CTR: c000cecc
REGS: ed4f1c60 TRAP: 0300 Not tainted (3.0.80-rt108)
MSR: 9032 CR: 24008248 XER:
DAR: 20454a46, DSISR: 2000
TASK = eda46780[3106] 'ifconfig' THREAD: ed4f
GPR00: ed4f1d10 eda46780 20454a46 2d6fcc2a 05f2 0002
GPR08: eda46780 ed6fd228 ed4f1cd0 90b1 10084718 bfcceaec 10062044
GPR16: 10062120 bfcceadc bfcceac4 0228 8914 c01ac398
GPR24: c01ac8c8 ed066520 0061 ed0663a0 ef0448f0 0001 ed575580
NIP [c0069924] put_page+0x0/0x34
LR [c021cce0] skb_release_data+0x78/0xc8
Call Trace:
[ed4f1d20] [c021c914] __kfree_skb+0x18/0xbc
[ed4f1d30] [c01a7620] e1000_clean_rx_ring+0x10c/0x1a4
[ed4f1d60] [c01a76e0] e1000_clean_all_rx_rings+0x28/0x54
[ed4f1d70] [c01aac50] e1000_close+0x30/0xb4
[ed4f1d90] [c0226e2c] __dev_close_many+0xa0/0xe0
[ed4f1da0] [c0228c64] __dev_close+0x2c/0x4c
[ed4f1dc0] [c0225224] __dev_change_flags+0xb8/0x140
[ed4f1de0] [c0226d48] dev_change_flags+0x1c/0x60
[ed4f1e00] [c027e7f8] devinet_ioctl+0x2a4/0x700
[ed4f1e60] [c027f450] inet_ioctl+0xc8/0xfc
[ed4f1e70] [c02147b0] sock_ioctl+0x260/0x2a0
[ed4f1e90] [c009b468] vfs_ioctl+0x2c/0x58
[ed4f1ea0] [c009bc44] do_vfs_ioctl+0x64c/0x6d4
[ed4f1f10] [c009bd24] sys_ioctl+0x58/0x88
[ed4f1f40] [c000e954] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c LR = 0xff359a0
Instruction dump:
7c0802a6 3c80c007 3884a500 90010024 38a10008 3800 90010008 4b0d
80010024 38210020 7c0803a6 4e800020 <8003> 7c691b78 700bc000 41a20008
Kernel panic - not syncing: Fatal exception
Call Trace:
[ed4f1b90] [c0007ccc] show_stack+0x58/0x154 (unreliable)
[ed4f1bd0] [c001d744] panic+0xb0/0x1d8
[ed4f1c20] [c000b4b8] die+0x1ac/0x1d0
[ed4f1c40] [c0011e38] bad_page_fault+0xe8/0xfc
[ed4f1c50] [c000edf4] handle_page_fault+0x7c/0x80
--- Exception: 300 at put_page+0x0/0x34
    LR = skb_release_data+0x78/0xc8
[ed4f1d10] [] (null)
(unreliable)
[ed4f1d20] [c021c914] __kfree_skb+0x18/0xbc
[ed4f1d30] [c01a7620] e1000_clean_rx_ring+0x10c/0x1a4
[ed4f1d60] [c01a76e0] e1000_clean_all_rx_rings+0x28/0x54
[ed4f1d70] [c01aac50] e1000_close+0x30/0xb4
[ed4f1d90] [c0226e2c] __dev_close_many+0xa0/0xe0
[ed4f1da0] [c0228c64] __dev_close+0x2c/0x4c
[ed4f1dc0] [c0225224] __dev_change_flags+0xb8/0x140
[ed4f1de0] [c0226d48] dev_change_flags+0x1c/0x60
[ed4f1e00] [c027e7f8] devinet_ioctl+0x2a4/0x700
[ed4f1e60] [c027f450] inet_ioctl+0xc8/0xfc
[ed4f1e70] [c02147b0] sock_ioctl+0x260/0x2a0
[ed4f1e90] [c009b468] vfs_ioctl+0x2c/0x58
[ed4f1ea0] [c009bc44] do_vfs_ioctl+0x64c/0x6d4
[ed4f1f10] [c009bd24] sys_ioctl+0x58/0x88
[ed4f1f40] [c000e954] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c LR = 0xff359a0

When turning on SLAB checks, I see:

Slab corruption: size-16384 start=ed4ec000, len=16384
690: 6b 6b ff ff ff ff ff ff b8 ac 6f 99 bf 8b 08 00
6a0: 45 00 00 24 3f 34 00 00 80 11 ca cf 0a ca 0d 33
6b0: 0a ca 0d ff 06 cc 06 cf 00 10 bc 1d c5 0b 40 01
6c0: 00 10 00 33 00 00 00 00 00 00 00 00 00 00 3f dd
6d0: ed f8 6b 6b