Re: Memory performance / Cache problem
> On Wednesday 14 October 2009 17:48:39 ext e...@gmx.de wrote:
> > Following Siarhei's hint to initialize the buffers (around 1.2 MByte each)
> > I get different results in the 22 kernel for use of
> > malloc alone
> >   memcpy = 473.764, loop4 = 448.430, loop1 = 102.770, rand = 29.641
> > calloc alone
> >   memcpy = 405.947, loop4 = 361.550, loop1 = 95.441, rand = 21.853
> > malloc+memset:
> >   memcpy = 239.294, loop4 = 188.617, loop1 = 80.871, rand = 4.726
> [...]
> What you see is just a (fake) performance boost because you have a single
> physical page shared between all the virtual pages in the source buffer. So
> you get no cache misses on read operations and everything seems fast.
> [...]

OK, I understand the difference if the memory is uninitialised. But why is there a difference between "malloc + memset" and "calloc"? In both cases the memory will be cleared.

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
RE: Memory performance / Cache problem
> From: Siarhei Siamashka [mailto:siarhei.siamas...@nokia.com]
> Sent: Wednesday, October 14, 2009 12:37 PM
> To: ext e...@gmx.de
> What you see is just a (fake) performance boost because you have a single
> physical page shared between all the virtual pages in the source buffer. So
> you get no cache misses on read operations and everything seems fast.
>
> This is unlikely to happen in real use, and it does not reflect real memory
> performance. So the benchmark is inadequate.

Yep, a benchmark is only useful up to a point. If you control the factors it can be useful, but it is far from 1:1 when you try to extrapolate it to something meaningful at the system level.

You can actually get even 'better' numbers if you take the DDR part geometry and the SDRC (SDRAM controller) scheduler into account. Our silicon validation people report out-of-this-world numbers for very specific test cases. These are component tests for the memory controller to make sure it behaves. If you alternate between open banks you can do really fast operations. A good amount of that test is not practical to count on at the HLOS level, but it can help explain some anomalies and help in designing a fair test.

Regards,
Richard W.
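One simple way to "control factors" in a userspace benchmark is to repeat each measurement and report the median, which is far less sensitive than a single run or a mean to a run disturbed by an interrupt or a page fault. A minimal sketch in C; `median_run`, `median_of` and the `canned_measure` stand-in are hypothetical helpers, not part of any tool mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A measurement returns one sample, e.g. the MByte/s of one timed copy. */
typedef double (*measure_fn)(void *ctx);

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n > 0 samples; sorts the array in place. */
double median_of(double *s, size_t n)
{
    assert(n > 0);
    qsort(s, n, sizeof *s, cmp_double);
    return (n % 2) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
}

/* Repeat a measurement `runs` times and return the median sample,
   or -1.0 if runs is 0 or allocation fails. */
double median_run(measure_fn measure, void *ctx, size_t runs)
{
    double *s = malloc(runs * sizeof *s);
    if (!s || runs == 0) {
        free(s);
        return -1.0;
    }
    for (size_t i = 0; i < runs; i++)
        s[i] = measure(ctx);
    double m = median_of(s, runs);
    free(s);
    return m;
}

/* Stand-in measurement for the usage example: replays canned samples. */
struct canned { const double *v; size_t n, i; };

double canned_measure(void *ctx)
{
    struct canned *c = ctx;
    return c->v[c->i++ % c->n];
}
```

With the canned samples {200, 205, 40, 198, 202}, the single disturbed run (40) leaves the median at 200, whereas the mean would drop to 169. A real `measure_fn` would wrap one timed copy.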
RE: RE: RE: RE: Memory performance / Cache problem
> From: e...@gmx.de [mailto:e...@gmx.de]
> Sent: Wednesday, October 14, 2009 12:23 PM
> To: Woodruff, Richard; Premi, Sanjeev; linux-omap@vger.kernel.org
> Subject: Re: RE: RE: RE: Memory performance / Cache problem
>
> > Yes, aligned buffers can make a difference, but probably more so for small
> > copies. Of course you must touch the memory or mprotect() it so it's
> > faulted in, but indications are you have done this.
>
> Mh, alignment (to an address) is done with malloc already. Probably you mean
> something different. I don't understand the difference. For me
> malloc+memset = calloc.
> I'll send you the benchmark code, if you like.

OK, if it compiles I could try it on an SDP3430 or SDP3630.

Alignment to a cache line is always best; this is 64 bytes on the A8. Better yet, being 4K-aligned is a good thing to reduce MMU effects.

> In both kernels I used the same rootfs (via NFS). Indeed I used CS2009q1 and
> its libs, but we are talking about a factor of 2..6. This must be something
> serious.

The 2009 paid version has optimized Thumb-2 and ARM-mode libraries that you download separately. I don't recall whether the free/Lite version integrated this.

> What is your feeling? Is 22 doing something strange, or are the newer kernels
> slower than they have to be?

There are a lot of differences between the 22 kernel and the current ones. The first things I'd check would be the MMU settings and the special ARM CP15 memory control registers, then the OMAP memory and clock settings. Also, some misbehaving device could undercut you. It's always good to cat /proc/interrupts and make sure something isn't spewing.

> It would be interesting to see results on other OMAP3 boards with both old and
> new kernels.

I've not noticed anything on the SDP with somewhat recent kernels.

Regards,
Richard W.
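Plain `malloc()` only guarantees the C library's fundamental alignment (typically 8 or 16 bytes), not a 64-byte cache line or a 4K page. A minimal sketch, assuming a POSIX system with `posix_memalign()`, of allocating benchmark buffers with the alignments Richard mentions; `alloc_bench_buffer` and `address_alignment` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define A8_CACHE_LINE 64u   /* Cortex-A8 cache line size, as stated above */
#define PAGE_ALIGN    4096u /* 4K alignment to reduce MMU effects */

/* Allocate `len` bytes aligned to `align` (a power of two and a multiple
   of sizeof(void *)), then fill them with a non-zero pattern so every page
   is faulted in with its own physical page. Returns NULL on failure. */
void *alloc_bench_buffer(size_t len, size_t align)
{
    void *p = NULL;
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
    if (posix_memalign(&p, align, len) != 0)
        return NULL;
    memset(p, 0x5a, len);
    return p;
}

/* Largest power of two, capped at 4096, that divides the address. */
size_t address_alignment(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    size_t al = 1;
    while (al < PAGE_ALIGN && (a & al) == 0)
        al <<= 1;
    return al;
}
```

Both the source and destination buffers of a copy test can be allocated this way; `address_alignment()` is handy for checking what a plain `malloc()` actually returned.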
Re: Memory performance / Cache problem
On Wednesday 14 October 2009 17:48:39 ext e...@gmx.de wrote:
> Mem clock is 166MHz both times. I don't know whether there are differences in
> cycle access and timing, but the mem clock is fine.
>
> Following Siarhei's hint to initialize the buffers (around 1.2 MByte each)
> I get different results in the 22 kernel for use of
> malloc alone
>   memcpy = 473.764, loop4 = 448.430, loop1 = 102.770, rand = 29.641
> calloc alone
>   memcpy = 405.947, loop4 = 361.550, loop1 = 95.441, rand = 21.853
> malloc+memset:
>   memcpy = 239.294, loop4 = 188.617, loop1 = 80.871, rand = 4.726
>
> In the 31 kernel all 3 measurements are at about the same (unfortunately low)
> level as malloc+memset in 22.
>
> First of all: What performance can be expected?
> Is 22 doing something wrong if it is so much faster?
> Can the later kernels get a boost in memory handling?

What you see is just a (fake) performance boost because you have a single physical page shared between all the virtual pages in the source buffer. So you get no cache misses on read operations and everything seems fast.

This is unlikely to happen in real use, and it does not reflect real memory performance. So the benchmark is inadequate.

You can get some basic information here:
http://en.wikipedia.org/wiki/Copy-on-write

Regarding the difference in behavior between .22 and recent kernels: it may be some regression in the copy-on-write implementation, or just some change done on purpose. That is assuming that the userspace stuff was identical in both tests.

--
Best regards,
Siarhei Siamashka
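The zero-page effect described above can be observed directly on Linux. The sketch below, assuming Linux and `/proc/self/statm` (whose second field is the resident set size in pages), maps an anonymous region, reads one byte per page, then writes the whole region, and reports how much the resident set grew after each pass. It calls `mmap()` directly because glibc normally serves large `malloc()`/`calloc()` requests (above its default 128 KiB mmap threshold) from fresh anonymous mappings in the same way; glibc's `calloc()` also skips clearing such memory because the kernel already hands it out zeroed, which is one plausible reason why "calloc alone" behaves differently from "malloc+memset" in the numbers above. `resident_pages` and `zero_page_demo` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Resident set size in pages, from the second field of /proc/self/statm. */
long resident_pages(void)
{
    long size = 0, resident = -1;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld %ld", &size, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident;
}

/* Map `len` bytes, read one byte per page (read faults on untouched
   anonymous memory may map the shared zero page), then write every byte
   (copy-on-write forces private physical pages). Stores the RSS growth
   in pages after each pass. Returns 0 on success, -1 on failure. */
int zero_page_demo(size_t len, long *after_read, long *after_write)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    long base = resident_pages();
    if (base < 0) {
        munmap(buf, len);
        return -1;
    }

    volatile unsigned sum = 0;           /* volatile keeps the reads alive */
    for (size_t off = 0; off < len; off += (size_t)page)
        sum += buf[off];
    assert(sum == 0);                    /* untouched anonymous memory reads 0 */
    *after_read = resident_pages() - base;

    memset(buf, 0x5a, len);              /* every page now gets its own frame */
    *after_write = resident_pages() - base;

    munmap(buf, len);
    return 0;
}
```

On a typical kernel `after_read` should stay close to zero while `after_write` should be close to `len` divided by the page size; the exact figures depend on the kernel version and its transparent huge page settings.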
Re: RE: RE: RE: Memory performance / Cache problem
> > Mem clock is 166MHz both times. I don't know whether there are differences
> > in cycle access and timing, but the mem clock is fine.
>
> How did you physically verify this?

The oscilloscope shows 166MHz, and the kernel messages about the frequency are the same in both kernels.

> > Following Siarhei's hint to initialize the buffers (around 1.2 MByte each)
> > I get different results in the 22 kernel for use of
> > malloc alone
> >   memcpy = 473.764, loop4 = 448.430, loop1 = 102.770, rand = 29.641
> > calloc alone
> >   memcpy = 405.947, loop4 = 361.550, loop1 = 95.441, rand = 21.853
> > malloc+memset:
> >   memcpy = 239.294, loop4 = 188.617, loop1 = 80.871, rand = 4.726
> >
> > In the 31 kernel all 3 measurements are at about the same (unfortunately
> > low) level as malloc+memset in 22.
>
> Yes, aligned buffers can make a difference, but probably more so for small
> copies. Of course you must touch the memory or mprotect() it so it's
> faulted in, but indications are you have done this.

Mh, alignment (to an address) is done with malloc already. Probably you mean something different. I don't understand the difference. For me malloc+memset = calloc. I'll send you the benchmark code, if you like.

> > I used a standard memcpy (I think this is glibc and hence not NEON-based)?
> > To be NEON-based I guess it has to be recompiled?
>
> The version of glibc in use can make a difference. CodeSourcery in the 2009
> release added PLDs to mem operations. This can give a good benefit. It
> might be you have an optimized library in one case and a non-optimized one
> in another.

In both kernels I used the same rootfs (via NFS). Indeed I used CS2009q1 and its libs, but we are talking about a factor of 2..6. This must be something serious.

What is your feeling? Is 22 doing something strange, or are the newer kernels slower than they have to be?

It would be interesting to see results on other OMAP3 boards with both old and new kernels.

Best regards
Steffen
RE: RE: RE: Memory performance / Cache problem
> From: e...@gmx.de [mailto:e...@gmx.de]
> Sent: Wednesday, October 14, 2009 9:49 AM
> To: Woodruff, Richard; linux-omap@vger.kernel.org; Premi, Sanjeev
> Subject: Re: RE: RE: Memory performance / Cache problem
>
> Mem clock is 166MHz both times. I don't know whether there are differences in
> cycle access and timing, but the mem clock is fine.

How did you physically verify this?

> Following Siarhei's hint to initialize the buffers (around 1.2 MByte each)
> I get different results in the 22 kernel for use of
> malloc alone
>   memcpy = 473.764, loop4 = 448.430, loop1 = 102.770, rand = 29.641
> calloc alone
>   memcpy = 405.947, loop4 = 361.550, loop1 = 95.441, rand = 21.853
> malloc+memset:
>   memcpy = 239.294, loop4 = 188.617, loop1 = 80.871, rand = 4.726
>
> In the 31 kernel all 3 measurements are at about the same (unfortunately low)
> level as malloc+memset in 22.

Yes, aligned buffers can make a difference, but probably more so for small copies. Of course you must touch the memory or mprotect() it so it's faulted in, but indications are you have done this.

> I used a standard memcpy (I think this is glibc and hence not NEON-based)?
> To be NEON-based I guess it has to be recompiled?

The version of glibc in use can make a difference. CodeSourcery in the 2009 release added PLDs to mem operations. This can give a good benefit. It might be you have an optimized library in one case and a non-optimized one in another.

Regards,
Richard W.
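To illustrate what adding PLDs to mem operations means, here is a sketch of a copy loop that issues one prefetch per 64-byte cache line, a fixed distance ahead of the read position. It uses the GCC/Clang builtin `__builtin_prefetch()`, which GCC normally emits as a `PLD` instruction on ARMv7; the 256-byte prefetch distance is an assumption to tune per platform, `copy_with_prefetch` is a hypothetical name, and this is not CodeSourcery's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE     64   /* bytes; Cortex-A8 line size */
#define PREFETCH_AHEAD 256  /* bytes; assumed distance, tune per platform */

/* Copy n bytes in 32-bit words, prefetching the source PREFETCH_AHEAD
   bytes ahead once per cache line; the tail is copied byte by byte. */
void copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    assert(n == 0 || (dst != NULL && src != NULL));
    for (; i + 4 <= n; i += 4) {
        /* Only prefetch addresses that lie inside the source buffer. */
        if ((i % CACHE_LINE) == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(s + i + PREFETCH_AHEAD, 0 /* read */, 0);
        /* Fixed-size memcpy is a portable, aliasing-safe word move that
           compilers usually turn into a plain load and store. */
        uint32_t w;
        memcpy(&w, s + i, sizeof w);
        memcpy(d + i, &w, sizeof w);
    }
    for (; i < n; i++)          /* remaining 0..3 bytes */
        d[i] = s[i];
}
```

Whether a given prefetch distance helps depends on the core and the memory system, so it is worth timing the loop both with and without the prefetch line.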
Re: RE: RE: Memory performance / Cache problem
Mem clock is 166MHz both times. I don't know whether there are differences in cycle access and timing, but the mem clock is fine.

Following Siarhei's hint to initialize the buffers (around 1.2 MByte each) I get different results in the 22 kernel for use of
malloc alone
  memcpy = 473.764, loop4 = 448.430, loop1 = 102.770, rand = 29.641
calloc alone
  memcpy = 405.947, loop4 = 361.550, loop1 = 95.441, rand = 21.853
malloc+memset:
  memcpy = 239.294, loop4 = 188.617, loop1 = 80.871, rand = 4.726

In the 31 kernel all 3 measurements are at about the same (unfortunately low) level as malloc+memset in 22.

First of all: What performance can be expected?
Is 22 doing something wrong if it is so much faster?
Can the later kernels get a boost in memory handling?

I used a standard memcpy (I think this is glibc and hence not NEON-based)?
To be NEON-based I guess it has to be recompiled?
How can I find out whether the NEON and cache settings are OK?

I am using an OMAP3530 on the EVM board.
Unfortunately I don't have a Lauterbach, just a Spectrum Digital, which works only until the Linux kernel is booting.

Best regards
Steffen

Original Message
> Date: Wed, 14 Oct 2009 08:59:05 -0500
> From: "Woodruff, Richard"
> To: "e...@gmx.de", "Premi, Sanjeev", "linux-omap@vger.kernel.org"
> Subject: RE: RE: Memory performance / Cache problem
>
> [...]
> Are you confident your memory bus isn't running at 1/2 speed?
> [...]
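For the question of whether NEON is available, one quick userspace check is the `Features` line of `/proc/cpuinfo`, where ARM kernels list hardware capabilities such as `vfp` and, on kernels that report it, `neon`. A minimal sketch; `cpuinfo_has_feature` and `read_cpuinfo` are hypothetical helpers, and this check says nothing about the cache configuration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole word on a "Features" line of
   cpuinfo-formatted text, 0 otherwise. */
int cpuinfo_has_feature(const char *text, const char *flag)
{
    size_t flen = strlen(flag);
    assert(flen > 0);
    for (const char *line = text; line && *line; ) {
        const char *eol = strchr(line, '\n');
        size_t llen = eol ? (size_t)(eol - line) : strlen(line);
        const char *end = line + llen;
        if (llen >= 8 && strncmp(line, "Features", 8) == 0) {
            const char *p = memchr(line, ':', llen);
            while (p && p < end) {
                while (p < end && (*p == ':' || *p == ' ' || *p == '\t'))
                    p++;                      /* skip the colon and separators */
                const char *w = p;
                while (p < end && *p != ' ' && *p != '\t')
                    p++;                      /* p now ends the current word */
                if ((size_t)(p - w) == flen && strncmp(w, flag, flen) == 0)
                    return 1;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return 0;
}

/* Read up to cap - 1 bytes of /proc/cpuinfo into buf, NUL-terminated.
   Returns 0 on success, -1 if the file cannot be opened. */
int read_cpuinfo(char *buf, size_t cap)
{
    assert(cap > 0);
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

On the board, `read_cpuinfo()` followed by `cpuinfo_has_feature(buf, "neon")` gives a quick yes/no; an absent flag can also mean the kernel was built without NEON support rather than that the hardware lacks it.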
RE: RE: Memory performance / Cache problem
> There is no newer u-boot from TI available. There is an SDK 02.01.03.11,
> but it contains the same uboot 2008.10, with the only addition being the
> second generation of EVM boards with another network chip.
>
> So I checked the uboot from git, but it doesn't support Micron's NAND flash
> anymore. It only works with OneNAND.
>
> I found a patch which shows the L2 cache status during kernel boot and
> implemented it: the L2 cache seems to be already enabled, so this is not the
> reason.
>
> So any other ideas?

Are you confident your memory bus isn't running at 1/2 speed?

I recall there was a window of a couple of days during WTBU kernel upgrades where the memory bus with PM was running at 1/2 speed after the kernel started up. This was somewhat a side effect of the constraints framework and a regression in forward porting. It seems unlikely the PSP kernel would have shipped with this bug, but it's something to check. This would match your results.

If your memcpy() is NEON-based then I might be worried about L1 NEON caching effects, along with the exclusive L1/L2 read-allocate cache and PLD not being effective on L1, only on L2.

Which memcpy test are you using? Something in lmbench, or just one you wrote? Generally, results are a little hard to interpret with the exclusive cache behavior of the 3430's r1px core. The 3630's r3p2 core gives more traditional results, as the exclusive feature has been removed by ARM.

If you have the ability, using a Lauterbach plus a PER file will allow an internal space dump which will show all critical parameters during the test. It's a 1-minute check for someone who has done it before to ensure the few parameters needed are in line. I can send an example offline of how to do the capture. I won't have time to expand on all relevant parameters.

Regards,
Richard W.
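One way to test the half-speed-bus hypothesis independently of memcpy() is a dependent-load ("pointer chasing") latency test over a buffer much larger than the L2 (256 KB on the OMAP3530). Each load's address comes from the previous load, so prefetching and the exclusive L1/L2 behaviour matter less, and a slower memory bus should show up as a clearly higher nanoseconds-per-load figure. A sketch, assuming POSIX `clock_gettime()`; it uses Sattolo's algorithm to build a single random cycle so every slot is visited, `rand()` is used only for brevity, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Build one random cycle over n slots (Sattolo's algorithm):
   next[i] is the slot visited after slot i. */
void build_cycle(size_t *next, size_t n, unsigned seed)
{
    assert(n > 0);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j < i: Sattolo, not Fisher-Yates */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

/* Length of the cycle through slot 0 (equals n for a single cycle). */
size_t cycle_length(const size_t *next, size_t n)
{
    size_t len = 0, p = 0;
    do {
        p = next[p];
        len++;
    } while (p != 0 && len <= n);
    return len;
}

/* Follow `steps` dependent loads and return nanoseconds per load.
   The final slot is stored through *sink so the loop is not optimised away. */
double chase_ns_per_load(const size_t *next, size_t steps, size_t *sink)
{
    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t k = 0; k < steps; k++)
        p = next[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sink = p;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return steps ? ns / (double)steps : 0.0;
}
```

Running the same binary with a working set of several megabytes on both kernels makes the comparison independent of which C library memcpy() is in use.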
Re: Memory performance / Cache problem
The L2 cache is set and running. I don't know - can it be configured or misconfigured somehow?

I just checked the output of the 2.6.22 kernel and get these lines (which I don't have in newer kernels):

CPU0: D VIPT write-through cache
CPU0: cache: 768 bytes, associativity 1, 8 byte lines, 64 sets
Built 1 zonelists. Total pages: 32512

I am wondering what this is. My first thought was the L1 cache, but it's too small.

The benchmark is running on the same hardware, same U-Boot, same rootfs; just the kernel is different.
Re: RE: Memory performance / Cache problem
> Can you upgrade to a newer u-boot? Either from the PSP release
> OR the u-boot tree hosted at git.denx.de (at least 2009.03)?
>
> Also, it will be good to see the sample program you are using.
>
> ~sanjeev

There is no newer u-boot from TI available. There is an SDK 02.01.03.11, but it contains the same uboot 2008.10, with the only addition being the second generation of EVM boards with another network chip.

So I checked the uboot from git, but it doesn't support Micron's NAND flash anymore. It only works with OneNAND.

I found a patch which shows the L2 cache status during kernel boot and implemented it: the L2 cache seems to be already enabled, so this is not the reason.

So any other ideas?
RE: Memory performance / Cache problem
I am sorry, the code mentioned is from the Android public Git tree, not the linux-omap PM branch. Sorry for the confusion.

>-----Original Message-----
>From: linux-omap-ow...@vger.kernel.org [mailto:linux-omap-ow...@vger.kernel.org] On Behalf Of Dasgupta, Romit
>Sent: Monday, October 12, 2009 2:38 PM
>To: e...@gmx.de; linux-omap@vger.kernel.org
>Subject: RE: Memory performance / Cache problem
>
>Please update to the latest HEAD on the linux-omap pm branch. In the gitweb it shows
>b7ecc865c5f0885fae4c4401fa78a24084f45c40
>
>Thanks,
>-Romit
RE: Memory performance / Cache problem
> -----Original Message-----
> From: linux-omap-ow...@vger.kernel.org
> [mailto:linux-omap-ow...@vger.kernel.org] On Behalf Of e...@gmx.de
> Sent: Monday, October 12, 2009 2:08 PM
> To: linux-omap@vger.kernel.org
> Subject: RE: Memory performance / Cache problem
>
> [...]
> I see you get the message about the L2 cache, which I don't have.
> Do you know how to enable it?
> Shouldn't the kernel configure all these things, and not rely on the
> bootloader?
> I am using U-Boot 2008.10 (TI's PSP).
> The old 22 kernel does not depend on U-Boot in this respect.

Can you upgrade to a newer u-boot? Either from the PSP release OR the u-boot tree hosted at git.denx.de (at least 2009.03)?

Also, it will be good to see the sample program you are using.

~sanjeev
RE: Memory performance / Cache problem
Please update to the latest HEAD on the linux-omap pm branch. In the gitweb it shows
b7ecc865c5f0885fae4c4401fa78a24084f45c40

Thanks,
-Romit
RE: Memory performance / Cache problem
Linux version 2.6.31 (s...@localhost) (gcc version 4.3.3 (Sourcery G++ Lite 2009q1-203) ) #1 Mon Oct 12 08:30:58 CEST 2009
CPU: ARMv7 Processor [411fc082] revision 2 (ARMv7), cr=10c53c7f
CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
Machine: OMAP3 EVM
Memory policy: ECC disabled, Data cache writeback
OMAP3430 ES2.1
SRAM: Mapped pa 0x4020 to va 0xe300 size: 0x10
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512

I see you get the message about the L2 cache, which I don't have.
Do you know how to enable it?
Shouldn't the kernel configure all these things, and not rely on the bootloader?
I am using U-Boot 2008.10 (TI's PSP).
The old 22 kernel does not depend on U-Boot in this respect.

Thanks
Steffen
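The `cr=` value in the boot banner above is the CP15 control register (SCTLR) value the kernel recorded at boot, so the basic MMU and cache enables can be decoded from it. A sketch using the ARMv7 SCTLR bit positions (M = bit 0, A = bit 1, C = bit 2, Z = bit 11, I = bit 12); on the Cortex-A8 the L2 cache has its own enable bit in the auxiliary control register, so this value alone cannot confirm the L2 state. `decode_sctlr` and `parse_cr` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sctlr_bits {
    int mmu;          /* M, bit 0: MMU enable */
    int align_check;  /* A, bit 1: alignment fault checking */
    int dcache;       /* C, bit 2: data/unified cache enable */
    int branch_pred;  /* Z, bit 11: branch prediction enable */
    int icache;       /* I, bit 12: instruction cache enable */
};

struct sctlr_bits decode_sctlr(unsigned long cr)
{
    struct sctlr_bits b;
    b.mmu         = (int)((cr >> 0) & 1);
    b.align_check = (int)((cr >> 1) & 1);
    b.dcache      = (int)((cr >> 2) & 1);
    b.branch_pred = (int)((cr >> 11) & 1);
    b.icache      = (int)((cr >> 12) & 1);
    return b;
}

/* Extract the hex value after "cr=" from a boot log line.
   Returns 0 on success, -1 if no value is found. */
int parse_cr(const char *line, unsigned long *cr)
{
    const char *p = strstr(line, "cr=");
    if (!p)
        return -1;
    return sscanf(p + 3, "%lx", cr) == 1 ? 0 : -1;
}
```

Applied to the two banners in this thread (cr=10c53c7f here, cr=10c5387f in Romit's log), the M, C, Z and I bits are all set in both; the two values differ only in bit 10, which is outside the bits decoded here.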
Re: Memory performance / Cache problem
On Monday 12 October 2009 10:54:09 ext e...@gmx.de wrote:
> I found the memory performance of newer kernels is quite poor on an
> EVM-Omap3 board. It runs with 2-6 times the performance on the old 2.6.22
> kernel from TI's PSP.
>
> Possible reasons:
> - problem in configuring the kernel (did omap3_evm_defconfig)
> - problem in the kernel
> - the kernel expects some settings from uboot, which are not done there
>
> I have tried the 2.6.29rc3 (from TI's PSP) and the 2.6.31 from the git tree.
> Both behave quite similarly:
>
> Transport in MByte:
>   memcpy = 204.073, loop4 = 183.212, loop1 = 81.693, rand = 4.534
>
> while the 22 kernel:
>   memcpy = 453.932, loop4 = 469.934, loop1 = 125.031, rand = 29.631
>
> Can someone give me help or at least confirm that?

The numbers from the 2.6.22 kernel look much better than anything I have ever seen with OMAP3.

How are you doing the benchmarking? Is the source buffer properly initialized?

The point is that if you just happen to allocate a large buffer without initializing it, it may end up having all its memory pages referencing a single zero page in physical memory. In this case reading from this buffer will in fact be perfectly cached in the L1 cache and memcpy would look fast.

If that is not the case, investigating how to boost memory performance in the latest kernels is very interesting for sure.

--
Best regards,
Siarhei Siamashka
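Following this advice, the sketch below initialises the source buffer with a non-zero pattern before timing, so each of its pages is backed by its own physical page, and runs an untimed warm-up copy so that page faults on the destination are not measured either. It assumes POSIX `clock_gettime()` and reports MByte/s with 1 MByte taken as 2^20 bytes; `bench_memcpy_mbps` and `fill_pattern` are hypothetical names, not the benchmark used in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Calling memcpy through a volatile pointer stops the compiler from
   treating the repeated, identical copies as dead stores. */
static void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;

/* Write a non-zero, position-dependent pattern so every page of the
   buffer gets its own physical page instead of the shared zero page. */
void fill_pattern(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(i * 31u + 7u);
}

/* Time `iters` copies of `len` bytes between initialised buffers.
   Returns MByte/s, or -1.0 on bad arguments or allocation failure. */
double bench_memcpy_mbps(size_t len, int iters)
{
    unsigned char *src = malloc(len), *dst = malloc(len);
    if (!src || !dst || iters <= 0) {
        free(src);
        free(dst);
        return -1.0;
    }
    fill_pattern(src, len);      /* fault in real pages for the source */
    copy_fn(dst, src, len);      /* untimed warm-up also faults in dst */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int k = 0; k < iters; k++)
        copy_fn(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    int ok = memcmp(dst, src, len) == 0;
    free(src);
    free(dst);
    assert(ok);
    return secs > 0 ? (double)len * iters / 1048576.0 / secs : 0.0;
}
```

For buffers comparable to the ~1.2 MByte ones discussed in this thread, a call such as `bench_memcpy_mbps(1200 * 1024, 100)` would be the corresponding measurement.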
RE: Memory performance / Cache problem
Can you dump the first few lines of the kernel log_buf during the kernel boot? Something like this:

[0.00] Linux version 2.6.29-omap1-5-gf9f407c-dirty (ro...@maxwell) (gcc version 4.3.3 (Sourcery G++ Lite 2009q1-203) ) #14 PREEMPT Fri Oct 9 14:04:52 IST 2009
[0.00] CPU: ARMv7 Processor [411fc083] revision 3 (ARMv7), cr=10c5387f
[0.00] CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[0.00] Machine: OMAP ZOOM2 board
[0.00] Memory policy: ECC disabled, Data cache writeback
[0.00] L2 CACHE is enabled in bootloader
[0.00] OMAP3430 ES3.1

It should tell you whether the L2 cache is enabled or not.

Thanks,
-Romit

>-----Original Message-----
>From: linux-omap-ow...@vger.kernel.org [mailto:linux-omap-ow...@vger.kernel.org] On Behalf Of e...@gmx.de
>Sent: Monday, October 12, 2009 1:24 PM
>To: linux-omap@vger.kernel.org
>Subject: Memory performance / Cache problem
>
>I found the memory performance of newer kernels is quite poor on an EVM-Omap3 board. It runs with 2-6 times the performance on the old 2.6.22 kernel from TI's PSP.
>
>Possible reasons:
>- problem in configuring the kernel (did omap3_evm_defconfig)
>- problem in the kernel
>- the kernel expects some settings from uboot, which are not done there
>
>I have tried the 2.6.29rc3 (from TI's PSP) and the 2.6.31 from the git tree.
>Both behave quite similarly:
>
>Transport in MByte:
>  memcpy = 204.073, loop4 = 183.212, loop1 = 81.693, rand = 4.534
>
>while the 22 kernel:
>  memcpy = 453.932, loop4 = 469.934, loop1 = 125.031, rand = 29.631
>
>Can someone give me help or at least confirm that?
>Best regards
>Steffen