Re: [coreboot] fam10/h8dmr: extreme slowness in CBFS memset / memcpy
> Following up on this - Patrick helped in IRC this evening, and we came to the conclusion that it's probably *not* an MTRR issue, since we figured out the code seems to set MTRRs properly.

I wonder what else could cause it to be so slow? It's especially surprising for the memset, which is pretty simple. Does it use movnti for that?

> We found out after adding an extra MTRR over the flash chip, which did not change anything.

Did you disable and re-enable the cache so that the settings take effect?

I guess I would:

1. Add some little benchmark loops reading/writing different areas
   a. read from ROM and time it
   b. read from RAM (cached area) and time it
   c. read from RAM (non-cached area)
   d. write to RAM (cached area)
   ...
2. Disable MTRRs to see if it would go even slower.

Sorry that's not much help, but I don't have a fam10 box to try things on.

Thanks,
Myles

--
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot
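A minimal sketch of the benchmark loops suggested above, written as ordinary user-space C rather than coreboot code. The names now_ns, bench_read and bench_write are hypothetical, and clock_gettime stands in for whatever timer is available that early in boot (a TSC read, for example). It only shows the shape of the measurement; it does not reproduce the firmware environment.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Monotonic clock in nanoseconds; a user-space stand-in for a TSC read. */
static unsigned long long now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
	assert(rc == 0);
	(void)rc;
	return (unsigned long long)ts.tv_sec * 1000000000ULL
	       + (unsigned long long)ts.tv_nsec;
}

/* Time a byte-wise read of [p, p+len). The byte sum is handed back through
 * *sum, and p is volatile, so the loads cannot be optimised away. */
static unsigned long long bench_read(const volatile unsigned char *p,
				     size_t len, unsigned long *sum)
{
	unsigned long acc = 0;
	unsigned long long start = now_ns();

	for (size_t i = 0; i < len; i++)
		acc += p[i];

	unsigned long long elapsed = now_ns() - start;
	*sum = acc;
	return elapsed;
}

/* Time a byte-wise write of value to [p, p+len). */
static unsigned long long bench_write(volatile unsigned char *p, size_t len,
				      unsigned char value)
{
	unsigned long long start = now_ns();

	for (size_t i = 0; i < len; i++)
		p[i] = value;

	return now_ns() - start;
}
```

Running bench_write and then bench_read over the same buffer gives a cached-RAM figure. Pointing the same loops at the flash mapping or at an uncached range would give the other rows of the list, but that needs firmware context and is not shown here.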
Re: [coreboot] fam10/h8dmr: extreme slowness in CBFS memset / memcpy
On Tue, Jul 21, 2009 at 06:25:38AM -0600, Myles Watson wrote:
> > Following up on this - Patrick helped in IRC this evening, and we came to the conclusion that it's probably *not* an MTRR issue, since we figured out the code seems to set MTRRs properly.
>
> I wonder what else could cause it to be so slow? It's especially surprising for the memset, which is pretty simple. Does it use movnti for that?

It's actually just a plain byte-by-byte assignment in C, see src/lib/memset.c.

> > We found out after adding an extra MTRR over the flash chip, which did not change anything.
>
> Did you disable and re-enable the cache so that the settings take effect?

Hmm, we tried adding it in src/cpu/amd/car/clear_init_ram.c, in function set_init_ram_access, which already sets an MTRR. This gets called just before CAR is disabled, I think. And then we found the MTRR set in src/cpu/amd/car/cache_as_ram.inc, which looks like it *should* do the right thing. But that's assembler of course. I don't suppose there's a way to print debug info from right there?

> I guess I would:
>
> 1. Add some little benchmark loops reading/writing different areas
>    a. read ROM time it
>    b. read from RAM (cached area) and time it
>    c. read from RAM (non-cached area)
>    d. write to RAM (cached area)
>    ...
> 2. disable MTRRs to see if it would go even slower.
>
> Sorry that's not much help, but I don't have a fam10 box to try things on.

Thanks - will see if I can try some of these things.

Thanks,
Ward.

--
Ward Vandewege w...@gnu.org
Re: [coreboot] fam10/h8dmr: extreme slowness in CBFS memset / memcpy
> On Tue, Jul 21, 2009 at 06:25:38AM -0600, Myles Watson wrote:
> > > Following up on this - Patrick helped in IRC this evening, and we came to the conclusion that it's probably *not* an MTRR issue, since we figured out the code seems to set MTRRs properly.
> >
> > I wonder what else could cause it to be so slow? It's especially surprising for the memset, which is pretty simple. Does it use movnti for that?
>
> It's actually just a plain byte-by-byte assignment in c, see src/lib/memset.c.

It would be interesting to see whether it gets 4x faster if you make it write 4 bytes at a time.

> > > We found out after adding an extra MTRR over the flash chip, which did not change anything.
> >
> > Did you disable and re-enable the cache so that the settings take effect?
>
> Hmm, we tried adding it here src/cpu/amd/car/clear_init_ram.c in function set_init_ram_access, which already sets an mtrr.

I always wondered about that one. The thing that makes it hard to debug is that it will read back correctly even if it hasn't taken effect.

> Thanks - will see if I can try some of these things.

Good luck,
Myles
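The 4-bytes-at-a-time experiment suggested above could look roughly like the sketch below. This is not the code in src/lib/memset.c; memset_words is a hypothetical name, and the sketch assumes the buffer may legitimately be accessed through a uint32_t pointer once it is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word-wise memset: bytes up to the first 4-byte boundary,
 * then 32-bit stores, then any leftover bytes. */
static void *memset_words(void *s, int c, size_t n)
{
	unsigned char *p = s;
	unsigned char byte = (unsigned char)c;

	/* Leading bytes until p is 4-byte aligned. */
	while (n > 0 && ((uintptr_t)p & 3) != 0) {
		*p++ = byte;
		n--;
	}

	/* Replicate the byte into all four lanes of a 32-bit word. */
	uint32_t word = (uint32_t)byte * 0x01010101u;
	uint32_t *w = (uint32_t *)p;
	while (n >= 4) {
		*w++ = word;
		n -= 4;
	}

	/* Trailing bytes that do not fill a whole word. */
	p = (unsigned char *)w;
	while (n > 0) {
		*p++ = byte;
		n--;
	}
	return s;
}
```

If the memory is effectively uncached, each store is a separate bus transaction, so a speedup close to 4x from this change would itself hint that caching is not active for the target range.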
Re: [coreboot] fam10/h8dmr: extreme slowness in CBFS memset / memcpy
Following up on this - Patrick helped in IRC this evening, and we came to the conclusion that it's probably *not* an MTRR issue, since we figured out the code seems to set MTRRs properly.

We found out after adding an extra MTRR over the flash chip, which did not change anything.

The system boots fairly normally after the slowdowns, and appears to work normally. It sets three MTRRs further in the bootup process:

reg00: base=0x ( 0MB), size=32768MB: write-back, count=1
reg01: base=0x8 (32768MB), size= 512MB: write-back, count=1
reg02: base=0xe000 (3584MB), size= 512MB: uncachable, count=1

Any thoughts on something else I should look at to debug this?

Thanks,
Ward.

On Sun, Jul 19, 2009 at 09:23:21PM -0400, Ward Vandewege wrote:
> Hi all,
>
> I'm working on a fam10 tree for supermicro h8dmr. I'm using CBFS. It boots, but I'm struggling with some extreme slowness during boot.
>
> In particular, the memset function in src/lib/memset.c takes *minutes* to clear 1.2MB of ram. A little further CBFS does a memcpy which takes another 20 or 30 seconds:
>
> Stage: load fallback/coreboot_ram @ 2097152/1245184 bytes, enter @ 20
> LOONG pause
> Stage: after memset on-stack variables at 00ffbec8 and 00ffbed4
> cbfs_decompress: algo: 0
> cbfs_decompress: uncompressed
> another lengthy pause
> cbfs_decompress: memcpy from 0xffbecc to 0xffbed0 for 0x2d304 bytes done
> Stage: done loading.
>
> The first, lengthy pause is new; it is apparently caused by something introduced between r4368 and r4440. The second pause was there already in r4368.
>
> I understand this may have something to do with MTRRs - looking at the logs it seems MTRRs are not set up until well after CBFS has dealt with coreboot_ram.
>
> This box has 32GB of ram, in case that makes a difference.
>
> Any suggestions?
>
> Thanks,
> Ward.

--
Ward Vandewege w...@gnu.org
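For reference when checking whether an MTRR was programmed as intended, here is a sketch of how a variable-range MTRR base/mask pair is formed from a region's base, size and memory type. The struct and function names are hypothetical, and the physical address width is passed in as a parameter because it is an assumption here, not something stated in the thread. It uses the standard x86 encoding: the type goes in the low byte of the base register, and bit 11 of the mask register is the valid bit.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_TYPE_UNCACHEABLE 0x00u
#define MTRR_TYPE_WRBACK      0x06u
#define MTRR_PHYS_MASK_VALID  (1ull << 11)

/* Hypothetical container for the two MSR values of one variable MTRR. */
struct var_mtrr {
	uint64_t base;	/* PHYSBASE value: base address | memory type */
	uint64_t mask;	/* PHYSMASK value: address mask | valid bit */
};

/* size must be a power of two (at least 4 KiB) and base aligned to size.
 * phys_bits is the CPU's physical address width (an assumption here). */
static struct var_mtrr encode_var_mtrr(uint64_t base, uint64_t size,
				       unsigned type, unsigned phys_bits)
{
	assert(size >= 4096 && (size & (size - 1)) == 0);
	assert((base & (size - 1)) == 0);
	assert(phys_bits >= 32 && phys_bits < 64);

	uint64_t addr_mask = (1ull << phys_bits) - 1;
	struct var_mtrr m;

	m.base = (base & addr_mask) | type;
	m.mask = (~(size - 1) & addr_mask) | MTRR_PHYS_MASK_VALID;
	return m;
}
```

Feeding in the megabyte figures from the listing above (for example a 512MB uncachable region at 3584MB) gives the pair one would expect to find in the corresponding MSRs, assuming the chosen address width matches the CPU.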
[coreboot] fam10/h8dmr: extreme slowness in CBFS memset / memcpy
Hi all,

I'm working on a fam10 tree for supermicro h8dmr. I'm using CBFS. It boots, but I'm struggling with some extreme slowness during boot.

In particular, the memset function in src/lib/memset.c takes *minutes* to clear 1.2MB of ram. A little further CBFS does a memcpy which takes another 20 or 30 seconds:

Stage: load fallback/coreboot_ram @ 2097152/1245184 bytes, enter @ 20
LOONG pause
Stage: after memset on-stack variables at 00ffbec8 and 00ffbed4
cbfs_decompress: algo: 0
cbfs_decompress: uncompressed
another lengthy pause
cbfs_decompress: memcpy from 0xffbecc to 0xffbed0 for 0x2d304 bytes done
Stage: done loading.

The first, lengthy pause is new; it is apparently caused by something introduced between r4368 and r4440. The second pause was there already in r4368.

I understand this may have something to do with MTRRs - looking at the logs it seems MTRRs are not set up until well after CBFS has dealt with coreboot_ram.

This box has 32GB of ram, in case that makes a difference.

Any suggestions?

Thanks,
Ward.

--
Ward Vandewege w...@gnu.org