Glenn Ko wrote:
>>>>>> 1. Is it because it's a 64bits and 4 bytes cannot be accessed?
>>>>>> 2. icache is working with 4 bytes because 32bit instructions are
>>>>>> processed?
>>>>>>
>>>>> Yup, that's the issue.
>>>>>
>>>>> With x86 support we now support cache-line-crossing accesses, but
>>>>> that's only in the simple CPU models for now... so it might work in
>>>>> the most recent code if you leave the "-d" off your command line.
>>>>>
>>>>> Steve
>>>>> _______________________________________________
>>>>> m5-users mailing list
>>>>> [email protected]
>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>>>>>
>>>> Also accesses can only cross 1 cache line. If you have an 8 byte access
>>>> offset by 1 byte from a cache line boundary, it actually falls on 3
>>>> cache lines and won't work. To work reliably, your block size needs to
>>>> be at least the size of the largest access that ISA performs, and in the
>>>> case of x86 I think that's 8 bytes.
>>>>
>>> Good point... but if you have an ISA that doesn't support unaligned
>>> accesses (like SPARC) you should be able to get away with a block size
>>> that's half the largest access size, right?
>>>
>>> Of course, notwithstanding the fact that it's totally unrealistic...
>>>
>>> Steve
>>>
>> Oh, sorry, yes. I thought he was using X86. SPARC actually has these
>> load/store double instructions that can load/store up to 16 bytes, I
>> believe, so that could be why 8 works and 4 doesn't. Also, I think O3
>> supports block boundary crossing accesses, although I may be wrong about
>> that.
>>
>> Gabe
>>
>
> Thanks a lot for your replies.
> I've tried not using the "-d" as suggested above but that resulted in
> multiple lines of,
>
> Info: Increasing stack size by one page.
>
> and
>
> fatal: Over max stack size for one thread.
>
> at the end. I do have to use detailed CPU model but I just wanted to see
> if it works.
>
The simple CPU is more likely to be correct than O3 (since it's simpler), so it's likely O3 just doesn't get far enough to fail like this. Also, I think I was wrong before when I said O3 supports cache block straddling accesses (although I still haven't actually checked). I'm pretty sure I was thinking of something else.

> In order to have the cache line size of 4 working, do I have to modify the
> ISA itself to 32bits or something? (which seems like a lot of work)
> Or is there a workaround such as just changing the size of the largest ISA
> access or increasing the number of allowed cache line cross?
>

In SPARC, if you compile your binaries to be 32-bit in SE mode, they should, I believe, access at most 8 bytes at a time. 32-bit SPARC also has the double loads, but they load two four-byte values. Modifying the ISA would be possible, but it wouldn't be trivial and probably isn't worth the effort. This trick would only potentially work in SE mode, since FS mode boots the OS and the OS would need to be 64-bit, I think.

There isn't an easy way to change the largest access size. You could try to modify the simple CPU to allow crossing more than one boundary by extending what's there. When I initially implemented split accesses I had an eye towards that sort of thing, although it's been reworked at least once and it was always a secondary consideration.

Like Steve said, though, cache lines that small don't seem very plausible. Is there a reason you're trying to get them to work?

Gabe
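
To make the arithmetic in this thread concrete, here is a rough Python sketch of how many cache blocks an access overlaps, and of how an access could be split into one fragment per block if more than one boundary crossing were allowed. This is not M5 code: the function names `lines_touched` and `split_access` are made up for illustration, and the sketch only assumes that blocks are aligned to multiples of the block size.

```python
def lines_touched(addr, size, block_size):
    """Return how many cache blocks an access of `size` bytes at `addr` overlaps.

    Hypothetical helper, not part of M5. Assumes size >= 1 and that blocks
    start at multiples of block_size.
    """
    first_block = addr // block_size
    last_block = (addr + size - 1) // block_size
    return last_block - first_block + 1


def split_access(addr, size, block_size):
    """Split an access into (addr, size) fragments, one per cache block.

    Hypothetical generalization of a two-way split to any number of
    block boundaries.
    """
    fragments = []
    end = addr + size
    while addr < end:
        # Address of the first byte of the next block.
        next_boundary = (addr // block_size + 1) * block_size
        chunk_end = min(end, next_boundary)
        fragments.append((addr, chunk_end - addr))
        addr = chunk_end
    return fragments
```

With 4-byte blocks, `lines_touched(1, 8, 4)` gives 3, which is the "falls on 3 cache lines" case mentioned earlier. An aligned 16-byte access overlaps 2 blocks when the block size is 8 (a single crossing) but 4 blocks when the block size is 4, which would be consistent with 8 working and 4 not if accesses that large are being issued.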
