Yes, I think the potentially confusing situation with this code is that it appears we are doing two TLB lookups whenever the access crosses a cache-line boundary, even if both of the accessed cache lines are on the same page. Conceptually we should only need two TLB lookups if the access crosses a page boundary (which, as Gabe points out, implies that the access also crosses a cache line boundary, but the converse is not true).
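To make the distinction concrete, here is a minimal sketch of the two checks. This is not gem5 code: `crossesBoundary` and `translationsNeeded` are names made up for illustration, the local `roundDown` only mimics the helper used in the quoted code, and the sketch assumes power-of-two block and page sizes and an access no larger than one page.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Round addr down to a multiple of align (align must be a power of two).
inline Addr roundDown(Addr addr, Addr align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

// True if the access [addr, addr + size) touches more than one
// align-sized chunk, i.e. its first and last bytes lie in different chunks.
inline bool crossesBoundary(Addr addr, unsigned size, Addr align)
{
    assert(size > 0);
    return roundDown(addr, align) != roundDown(addr + size - 1, align);
}

// Hypothetical helper: the number of TLB lookups that are conceptually
// required, assuming the access is no larger than one page.
inline int translationsNeeded(Addr addr, unsigned size, Addr page_size)
{
    return crossesBoundary(addr, size, page_size) ? 2 : 1;
}
```

With an assumed 64-byte block and 4096-byte page, an 8-byte access at 0x103C crosses a block boundary but stays on one page, so only one translation would be needed; the same access at 0x1FFC crosses both.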
I think the question Nilay is asking is whether this code is doing these unnecessary TLB lookups just to keep the code simpler, or if there is a deeper reason why it's hard to only do two TLB lookups when absolutely necessary.

Steve

On Thu, Jul 7, 2011 at 2:12 PM, Gabriel Michael Black <[email protected]> wrote:

> When I did the original version of this code (since improved by others) I
> was told, by Steve I think, that accesses have to be contained in a single
> "block". The size of the peer's block is reported through the port
> interface. I think it's assumed that the page size is at least as large as a
> cache line and that all page boundaries are also "block" boundaries. This
> should be a valid assumption, although there's no true guarantee I suppose.
>
> Gabe
>
>
> Quoting Nilay Vaish <[email protected]>:
>
>> Yesterday, Brad, Steve and I were looking at code for TimingSimpleCPU.
>> There is a portion of the read/writeMem function that is not completely
>> explainable. I have copied the code below.
>>
>> Addr split_addr = roundDown(addr + size - 1, block_size);
>> assert(split_addr <= addr || split_addr - addr < block_size);
>>
>> _status = DTBWaitResponse;
>> if (split_addr > addr) {
>>     RequestPtr req1, req2;
>>     assert(!req->isLLSC() && !req->isSwap());
>>     req->splitOnVaddr(split_addr, req1, req2);
>>
>>     WholeTranslationState *state =
>>         new WholeTranslationState(req, req1, req2, new uint8_t[size],
>>                                   NULL, mode);
>>     DataTranslation<TimingSimpleCPU> *trans1 =
>>         new DataTranslation<TimingSimpleCPU>(this, state, 0);
>>     DataTranslation<TimingSimpleCPU> *trans2 =
>>         new DataTranslation<TimingSimpleCPU>(this, state, 1);
>>
>>     thread->dtb->translateTiming(req1, tc, trans1, mode);
>>     thread->dtb->translateTiming(req2, tc, trans2, mode);
>> } else {
>>     WholeTranslationState *state =
>>         new WholeTranslationState(req, new uint8_t[size], NULL, mode);
>>     DataTranslation<TimingSimpleCPU> *translation
>>         = new DataTranslation<TimingSimpleCPU>(this, state);
>>     thread->dtb->translateTiming(req, tc, translation, mode);
>> }
>>
>>
>> The code calls translateTiming() either once or twice depending on whether
>> or not the memory to be read lies in a single cache block. Should not the
>> check be that whether or no the memory to be read lies in a single page?
>>
>> Thanks
>> Nilay

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
