Oh, gotcha, I forgot that these are each doing translations and not
just splitting up the access. I don't (offhand) think there's a
reason for splitting up the translation if it doesn't cross a page
boundary. There is a reason to split up the access itself, and it's
very important to translate all of the bytes one way or another, i.e.
you can't just delete the second call to the TLB. There are situations
where access control is more granular than a page, specifically
segments in x86, and the TLB does a lot of that work. It needs to see
all the bytes to determine if any fall out of bounds. That doesn't
mean you need to do two translations, just that the one translation
has to include everything even if you were to split up the access
itself later. I -think- that will work.
Gabe
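A minimal sketch of the idea in the message above: check the whole access
once, then split it for the memory system. This is not gem5 code. Segment,
inBounds() and splitAccess() are hypothetical names, and the limit model
(valid offsets are 0 through limit) is a simplification of x86 segments.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Addr = std::uint64_t;  // stand-in for gem5's Addr type

// Hypothetical segment: valid offsets are [0, limit].
struct Segment { Addr base; Addr limit; };

// One bounds check that sees every byte of the access.
inline bool inBounds(const Segment &seg, Addr offset, Addr size)
{
    assert(size > 0);
    return offset <= seg.limit && size - 1 <= seg.limit - offset;
}

// Split [addr, addr + size) at block boundaries into (start, length)
// pieces. block_size is assumed to be a power of two.
inline std::vector<std::pair<Addr, Addr>>
splitAccess(Addr addr, Addr size, Addr block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    std::vector<std::pair<Addr, Addr>> pieces;
    while (size > 0) {
        Addr room = block_size - (addr & (block_size - 1));
        Addr len = size < room ? size : room;
        pieces.emplace_back(addr, len);
        addr += len;
        size -= len;
    }
    return pieces;
}
```

With a limit of 0x3f, an 8-byte access at offset 0x3c fails the check even
though its first four bytes alone would pass, which is why the check has to
cover the full range before any splitting happens.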
Quoting Steve Reinhardt <[email protected]>:
Yes, I think the potentially confusing situation with this code is that it
appears we are doing two TLB lookups whenever the access crosses a
cache-line boundary, even if both of the accessed cache lines are on the
same page. Conceptually we should only need two TLB lookups if the access
crosses a page boundary (which, as Gabe points out, implies that the access
also crosses a cache line boundary, but the converse is not true).
I think the question Nilay is asking is whether this code is doing these
unnecessary TLB lookups just to keep the code simpler, or if there is a
deeper reason why it's hard to only do two TLB lookups when absolutely
necessary.
Steve
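To make the distinction above concrete, here is a small sketch (not gem5
code) of a boundary test that works for either cache lines or pages.
crosses() and lookupsNeeded() are hypothetical helpers, and the sizes in the
note below (64-byte lines, 4 KiB pages) are assumptions chosen for
illustration.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;  // stand-in for gem5's Addr type

// True if [addr, addr + size) touches more than one aligned region of
// `unit` bytes. `unit` must be a power of two and `size` non-zero.
inline bool crosses(Addr addr, Addr size, Addr unit)
{
    assert(size > 0 && unit != 0 && (unit & (unit - 1)) == 0);
    return (addr & ~(unit - 1)) != ((addr + size - 1) & ~(unit - 1));
}

// TLB lookups conceptually required, assuming the access is no larger
// than one page: one per page touched.
inline int lookupsNeeded(Addr addr, Addr size, Addr page_size)
{
    return crosses(addr, size, page_size) ? 2 : 1;
}
```

An 8-byte access at 0x103c crosses a 64-byte line but stays on one 4 KiB
page, so one lookup would suffice. The same access at 0x1ffc crosses both a
line and a page boundary, so it needs two.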
On Thu, Jul 7, 2011 at 2:12 PM, Gabriel Michael Black <[email protected]> wrote:
When I did the original version of this code (since improved by others), I
was told, by Steve I think, that accesses have to be contained in a single
"block". The size of the peer's block is reported through the port
interface. I think it's assumed that the page size is at least as large as a
cache line and that all page boundaries are also "block" boundaries. This
should be a valid assumption, although there's no true guarantee, I suppose.
Gabe
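The assumption described above could be stated as an explicit check. The
sketch below is hypothetical and is not something the gem5 code is known to
do: it assumes pages and blocks are both aligned from address zero, so every
page boundary is a block boundary exactly when the page size is a multiple
of the block size.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;  // stand-in for gem5's Addr type

// Hypothetical helper: true when every page boundary is also a block
// boundary, given that both are aligned from address zero.
inline bool pageBoundariesAreBlockBoundaries(Addr page_size, Addr block_size)
{
    return block_size != 0 && page_size >= block_size &&
           page_size % block_size == 0;
}
```

Under this condition, an access that stays within one block can never span
two pages.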
Quoting Nilay Vaish <[email protected]>:
Yesterday, Brad, Steve and I were looking at the code for TimingSimpleCPU.
There is a portion of the read/writeMem function that we could not fully
explain. I have copied the code below.
    Addr split_addr = roundDown(addr + size - 1, block_size);
    assert(split_addr <= addr || split_addr - addr < block_size);

    _status = DTBWaitResponse;
    if (split_addr > addr) {
        RequestPtr req1, req2;
        assert(!req->isLLSC() && !req->isSwap());
        req->splitOnVaddr(split_addr, req1, req2);

        WholeTranslationState *state =
            new WholeTranslationState(req, req1, req2, new uint8_t[size],
                                      NULL, mode);
        DataTranslation<TimingSimpleCPU> *trans1 =
            new DataTranslation<TimingSimpleCPU>(this, state, 0);
        DataTranslation<TimingSimpleCPU> *trans2 =
            new DataTranslation<TimingSimpleCPU>(this, state, 1);

        thread->dtb->translateTiming(req1, tc, trans1, mode);
        thread->dtb->translateTiming(req2, tc, trans2, mode);
    } else {
        WholeTranslationState *state =
            new WholeTranslationState(req, new uint8_t[size], NULL, mode);
        DataTranslation<TimingSimpleCPU> *translation
            = new DataTranslation<TimingSimpleCPU>(this, state);
        thread->dtb->translateTiming(req, tc, translation, mode);
    }
The code calls translateTiming() either once or twice depending on whether
or not the memory to be read lies in a single cache block. Shouldn't the
check instead be whether or not the memory to be read lies in a single page?
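For reference, a standalone sketch of the split test used in the quoted
code, with a worked example. roundDownTo(), splitAddr() and needsSplit() are
hypothetical stand-ins rather than gem5 functions, and the power-of-two
block size is an assumption.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;  // stand-in for gem5's Addr type

// Round `a` down to a multiple of `align`, which must be a power of two.
inline Addr roundDownTo(Addr a, Addr align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return a & ~(align - 1);
}

// Start of the block that holds the last byte of [addr, addr + size).
inline Addr splitAddr(Addr addr, Addr size, Addr block_size)
{
    assert(size > 0);
    return roundDownTo(addr + size - 1, block_size);
}

// True when the access spans two blocks, i.e. the case in which the
// quoted code builds req1 and req2.
inline bool needsSplit(Addr addr, Addr size, Addr block_size)
{
    return splitAddr(addr, size, block_size) > addr;
}
```

With 64-byte blocks, an 8-byte access at 0x103c ends at 0x1043, so the split
address is 0x1040 and the access is split. An 8-byte access at 0x1008 gives
a split address of 0x1000, which is not above the start address, so it is
not split. Note that the test depends only on block_size and never on the
page size.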
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev