[PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
Hello all,

Here is a patch set to support L2-cache synchronization routines for the ppc44x processor family. I know that the "ppc" branch is for bug fixes only, so the patch set is just FYI [though an enabled but non-coherent L2 cache may look like a bug to someone who uses one of the boards listed below :)].

[PATCH 1/2] [PPC 4xx] invalidate_l2cache_range() implementation for ppc44x;
[PATCH 2/2] [PPC 44x] enable L2-cache for the following ppc44x-based boards: ALPR, Katmai, Ocotea, and Taishan.

Regards,
Yuri

--
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
Hello, Eugene,

The h/w snooping mechanism you are talking about is limited to the Low Latency (LL) segment of the PLB bus in the ppc440sp and ppc440spe chips (see section "7.2.7 L2 Cache Coherency" of the ppc440spe spec), whereas the DMA and XOR engines use the High Bandwidth (HB) segment of the PLB bus (see section "1.1.2 Internal Buses" of the ppc440spe spec).

Thus, the h/w snooping mechanism cannot track the results of operations performed by the DMA and XOR engines and keep the L2 cache coherent with SDRAM, because the data flows through the HB PLB segment. This leads to, for example, incorrect results of RAID-parity calculations if one uses the h/w accelerated ppc440spe ADMA driver with the L2 cache enabled.

The s/w synchronization algorithms proposed in my patches do not have the LL PLB limitation of h/w snooping, but this is probably not the best way to implement it. Even though h/w accelerated RAID starts to operate correctly with these patches (with the L2 cache enabled), a performance degradation (induced by the loops in the L2-cache synchronization routines) is observed in most cases. So, as a result, there is no benefit from using the L2 cache for these RAID cases at all.

Regards,
Yuri

On Wednesday 28 November 2007 22:50, Eugene Surovegin wrote:
> On Wed, Nov 07, 2007 at 01:40:10AM +0300, Yuri Tikhonov wrote:
> > Hello all,
> >
> > Here is a patch set to support L2-cache synchronization routines for the ppc44x processor family. I know that the "ppc" branch is for bug fixes only, so the patch set is just FYI [though an enabled but non-coherent L2 cache may look like a bug to someone who uses one of the boards listed below :)].
> >
> > [PATCH 1/2] [PPC 4xx] invalidate_l2cache_range() implementation for ppc44x;
> > [PATCH 2/2] [PPC 44x] enable L2-cache for the following ppc44x-based boards: ALPR, Katmai, Ocotea, and Taishan.
>
> Why is this all needed?
>
> IIRC ibm440gx_l2c_enable() configures a 64G snoop region for L2C.
>
> Did AMCC make non-only-coherent L2C chips recently?
>
> --
> Eugene

--
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com
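To illustrate why the loops in a software synchronization routine cost more as buffers grow, here is a minimal sketch of a per-line range walk. It is not the code from the patches: `L2_LINE_SIZE` (32 bytes), `l2_for_each_line()`, `l2_line_op_t`, `struct line_log`, and `record_line()` are all hypothetical names and values chosen for illustration, and the per-line operation is a plain callback rather than a real cache instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L2 line size in bytes; hypothetical, check the chip manual. */
#define L2_LINE_SIZE 32u

/* Hypothetical per-line operation (stand-in for "invalidate this line"). */
typedef void (*l2_line_op_t)(uintptr_t line_addr, void *ctx);

/* Small log so the walk can be observed without real hardware. */
struct line_log {
    size_t count;
    uintptr_t first;
    uintptr_t last;
};

/* Example callback: remembers the first and last line address visited. */
static void record_line(uintptr_t line_addr, void *ctx)
{
    struct line_log *l = ctx;

    if (l->count == 0)
        l->first = line_addr;
    l->last = line_addr;
    l->count++;
}

/*
 * Call op() once for every cache line that overlaps [start, end).
 * Returns the number of lines visited; an empty range visits none.
 */
static size_t l2_for_each_line(uintptr_t start, uintptr_t end,
                               l2_line_op_t op, void *ctx)
{
    const uintptr_t mask = (uintptr_t)L2_LINE_SIZE - 1;
    uintptr_t first, last;
    size_t i, n;

    /* The masking below only works for a power-of-two line size. */
    assert((L2_LINE_SIZE & (L2_LINE_SIZE - 1)) == 0);

    if (end <= start)
        return 0;

    first = start & ~mask;       /* round start down to a line boundary */
    last = (end - 1) & ~mask;    /* line holding the last byte */
    n = (size_t)((last - first) / L2_LINE_SIZE) + 1;

    for (i = 0; i < n; i++)
        op(first + (uintptr_t)i * L2_LINE_SIZE, ctx);

    return n;
}
```

Under the assumed 32-byte line size, a line-aligned 4 KiB buffer takes 128 iterations, so the work grows linearly with the size of every buffer handed to the DMA or XOR engine.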
Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
On Fri, Jan 11, 2008 at 06:24:46PM +0300, Yuri Tikhonov wrote:
> Hello, Eugene,
>
> The h/w snooping mechanism you are talking about is limited to the Low Latency (LL) segment of the PLB bus in the ppc440sp and ppc440spe chips (see section "7.2.7 L2 Cache Coherency" of the ppc440spe spec), whereas the DMA and XOR engines use the High Bandwidth (HB) segment of the PLB bus (see section "1.1.2 Internal Buses" of the ppc440spe spec).
>
> Thus, the h/w snooping mechanism cannot track the results of operations performed by the DMA and XOR engines and keep the L2 cache coherent with SDRAM, because the data flows through the HB PLB segment. This leads to, for example, incorrect results of RAID-parity calculations if one uses the h/w accelerated ppc440spe ADMA driver with the L2 cache enabled.
>
> The s/w synchronization algorithms proposed in my patches do not have the LL PLB limitation of h/w snooping, but this is probably not the best way to implement it. Even though h/w accelerated RAID starts to operate correctly with these patches (with the L2 cache enabled), a performance degradation (induced by the loops in the L2-cache synchronization routines) is observed in most cases. So, as a result, there is no benefit from using the L2 cache for these RAID cases at all.

Thanks a lot for the explanation, Yuri. I'd never have imagined they were stupid enough to make new chips with such behaviour.

--
Eugene
Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
> > The s/w synchronization algorithms proposed in my patches do not have the LL PLB limitation of h/w snooping, but this is probably not the best way to implement it. Even though h/w accelerated RAID starts to operate correctly with these patches (with the L2 cache enabled), a performance degradation (induced by the loops in the L2-cache synchronization routines) is observed in most cases. So, as a result, there is no benefit from using the L2 cache for these RAID cases at all.
>
> Thanks a lot for the explanation, Yuri. I'd never have imagined they were stupid enough to make new chips with such behaviour.

Indeed. Now the question is: do we want to make that configurable by the platform, so it can select whether to enable snooping or use this mechanism (in which case we can disable snooping on the L2)?

Another option would be to make the dma_ops smart enough to know whether a given device is on the snooped portion of the bus, which would be easier to do after I merge the 32-bit and 64-bit DMA ops, so we get the ability to change the dma-ops per bus or even per device.

What do you guys think?

Cheers,
Ben.
Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
On Sat, Jan 12, 2008 at 09:05:35AM +1100, Benjamin Herrenschmidt wrote:
> > > The s/w synchronization algorithms proposed in my patches do not have the LL PLB limitation of h/w snooping, but this is probably not the best way to implement it. Even though h/w accelerated RAID starts to operate correctly with these patches (with the L2 cache enabled), a performance degradation (induced by the loops in the L2-cache synchronization routines) is observed in most cases. So, as a result, there is no benefit from using the L2 cache for these RAID cases at all.
> >
> > Thanks a lot for the explanation, Yuri. I'd never have imagined they were stupid enough to make new chips with such behaviour.
>
> Indeed. Now the question is: do we want to make that configurable by the platform, so it can select whether to enable snooping or use this mechanism (in which case we can disable snooping on the L2)?

I don't think we should punish platforms with sane L2 caches just because there are some brain-dead ones.

> Another option would be to make the dma_ops smart enough to know whether a given device is on the snooped portion of the bus, which would be easier to do after I merge the 32-bit and 64-bit DMA ops, so we get the ability to change the dma-ops per bus or even per device.
>
> What do you guys think?

I like the idea of having smart DMA routines with different per-bus/device behaviour.

--
Eugene
Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
On Fri, 2008-01-11 at 14:38 -0800, Eugene Surovegin wrote:
> On Sat, Jan 12, 2008 at 09:05:35AM +1100, Benjamin Herrenschmidt wrote:
> > > > The s/w synchronization algorithms proposed in my patches do not have the LL PLB limitation of h/w snooping, but this is probably not the best way to implement it. Even though h/w accelerated RAID starts to operate correctly with these patches (with the L2 cache enabled), a performance degradation (induced by the loops in the L2-cache synchronization routines) is observed in most cases. So, as a result, there is no benefit from using the L2 cache for these RAID cases at all.
> > >
> > > Thanks a lot for the explanation, Yuri. I'd never have imagined they were stupid enough to make new chips with such behaviour.
> >
> > Indeed. Now the question is: do we want to make that configurable by the platform, so it can select whether to enable snooping or use this mechanism (in which case we can disable snooping on the L2)?
>
> I don't think we should punish platforms with sane L2 caches just because there are some brain-dead ones.

I agree, which is why I'm thinking about making it an explicit call that a given platform would make from its setup_arch() callback to turn on manual L2 synchronization.

> > Another option would be to make the dma_ops smart enough to know whether a given device is on the snooped portion of the bus, which would be easier to do after I merge the 32-bit and 64-bit DMA ops, so we get the ability to change the dma-ops per bus or even per device.
> >
> > What do you guys think?
>
> I like the idea of having smart DMA routines with different per-bus/device behaviour.

That would be longer term. When I merge the DMA ops, I'll look into a way to provide 44x-specific DMA ops that handle that case, and then a way for devices to be tagged (maybe via the device tree) as being on an L2-coherent or non-L2-coherent segment of the bus.

Cheers,
Ben.
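The per-device idea discussed above could look roughly like the sketch below. This is only an illustration of the dispatch pattern, not the kernel's DMA API: `struct sketch_dma_ops`, `struct sketch_device`, the `l2_coherent` tag, `sketch_attach_dma_ops()`, and `sketch_dma_sync_for_cpu()` are hypothetical names, and the software synchronization is replaced by a counter so the behaviour can be checked without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts software L2 synchronizations so the effect is observable. */
static unsigned long l2_sw_sync_calls;

/* Hypothetical ops table; NOT the kernel's actual DMA ops structure. */
struct sketch_dma_ops {
    const char *name;
    void (*sync_for_cpu)(void *buf, size_t len);
};

/* Device on the snooped segment: hardware keeps the L2 coherent. */
static void sync_nop(void *buf, size_t len)
{
    (void)buf;
    (void)len;
}

/* Device on the non-snooped segment: stand-in for a range invalidate. */
static void sync_sw_l2(void *buf, size_t len)
{
    (void)buf;
    (void)len;
    l2_sw_sync_calls++;
}

static const struct sketch_dma_ops coherent_ops = {
    "l2-coherent", sync_nop
};
static const struct sketch_dma_ops noncoherent_ops = {
    "l2-noncoherent", sync_sw_l2
};

struct sketch_device {
    const char *name;
    bool l2_coherent;   /* hypothetical tag, e.g. read from the device tree */
    const struct sketch_dma_ops *ops;
};

/* Pick the ops table once, when the device is set up. */
static void sketch_attach_dma_ops(struct sketch_device *dev)
{
    dev->ops = dev->l2_coherent ? &coherent_ops : &noncoherent_ops;
}

/* Drivers call this; only non-coherent devices pay for the s/w sync. */
static void sketch_dma_sync_for_cpu(struct sketch_device *dev,
                                    void *buf, size_t len)
{
    assert(dev->ops != NULL);
    dev->ops->sync_for_cpu(buf, len);
}
```

With this shape, platforms whose devices all sit on the snooped segment never run the software loop, which matches the concern about not penalizing sane L2 caches.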
Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
On Wed, Nov 07, 2007 at 01:40:10AM +0300, Yuri Tikhonov wrote:
> Hello all,
>
> Here is a patch set to support L2-cache synchronization routines for the ppc44x processor family. I know that the "ppc" branch is for bug fixes only, so the patch set is just FYI [though an enabled but non-coherent L2 cache may look like a bug to someone who uses one of the boards listed below :)].
>
> [PATCH 1/2] [PPC 4xx] invalidate_l2cache_range() implementation for ppc44x;
> [PATCH 2/2] [PPC 44x] enable L2-cache for the following ppc44x-based boards: ALPR, Katmai, Ocotea, and Taishan.

Why is this all needed?

IIRC ibm440gx_l2c_enable() configures a 64G snoop region for L2C.

Did AMCC make non-only-coherent L2C chips recently?

--
Eugene