I think the double-fault case for malloc'd memory could probably be handled by hooking malloc with a small wrapper that touches the region at page-sized offsets, so that every page has actually been allocated by the kernel before the instruction runs. Hopefully most benchmarks actually use all of the memory they malloc/mmap, so the extra touches won't be wasted cycles.
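Something like the following is what I have in mind. It is only a rough sketch: the names touched_malloc and touch_pages are made up for illustration, it assumes sysconf(_SC_PAGESIZE) is available in the target environment, and the 4096-byte fallback is a guess (Alpha normally uses 8 KB pages, so the fallback would just touch more often than needed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Write one byte into every page covered by [buf, buf + size).
 * Hypothetical helper, not part of any existing library. */
static void touch_pages(void *buf, size_t size)
{
    assert(buf != NULL);

    long ps = sysconf(_SC_PAGESIZE);              /* assumption: POSIX target */
    size_t page = (ps > 0) ? (size_t)ps : 4096;   /* fallback value is a guess */
    volatile unsigned char *p = buf;              /* volatile keeps the stores */

    /* One store per page-sized stride, starting at the first byte. */
    for (size_t off = 0; off < size; off += page)
        p[off] = 0;

    /* malloc'd memory need not be page-aligned, so the last byte can sit
     * on a page that the strided loop never reached. */
    if (size > 0)
        p[size - 1] = 0;
}

/* Hypothetical wrapper: allocate, then fault in every page up front. */
void *touched_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL && size > 0)
        touch_pages(p, size);
    return p;
}
```

The benchmark could call touched_malloc directly, or malloc could be redirected at link time (GNU ld's --wrap=malloc is one option, with the wrapper renamed to __wrap_malloc and calling __real_malloc). Note that the stores overwrite one byte per page, which is harmless for fresh malloc memory but would not be safe for calloc or realloc results.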

On Thu, Feb 9, 2012 at 12:44 PM, Ali Saidi <[email protected]> wrote:

> It's all not necessarily roses there either as you'll need to get the CPU
> model to call translate 3 times, but that is probably more contained and
> you might be able to leverage the WholeTranslation code (see
> src/cpu/translation.hh) which is normally used for requests that cross a
> page boundary. If you end up taking a fault that the CPU needs to handle
> (e.g. on a page that has been malloced, but not actually allocated by the
> kernel) it's still going to be trouble. However, you can probably work
> around this with your benchmark.
>
> Ali
>
> On 09.02.2012 11:27, Paul Rosenfeld wrote:
>
> Well that doesn't sound like fun. Perhaps I'll look at ARM as a potential
> target.
>
> On Thu, Feb 9, 2012 at 11:39 AM, Ali Saidi <[email protected]> wrote:
>
>> It's possible with Alpha, but it would take some work. You'd need to
>> "take" a fault up to times and in between each time fix up the fault
>> status registers to have consistent data. Keeping track of what needs to
>> be in the register at any one time sounds difficult, especially as
>> translation faults can nest (you take a fault on the page table that you
>> need to look at). An architecture with a hardware table walker is
>> probably a bit easier to deal with.
>>
>> Ali
>>
>> On 09.02.2012 10:09, Paul Rosenfeld wrote:
>>
>> Thanks for the replies. I'm still trying to find my way around M5 and I
>> thought the SE/TimingSimpleCPU would be a good way to see what's involved
>> in modifying M5.
>> One thing that I'm worried about is that for this work, I have multiple
>> memory operands in registers that need to be translated to physical
>> addresses. In the SE mode, I've simply added a new fake Fault where it will
>> translate all 3 addresses. However, I'm not sure if a similar approach
>> would be possible in FS mode (I haven't looked at how any of the PAL stuff
>> works).
>> Do you think it would be more feasible to generate multiple single
>> TLB faults per operand in the instruction, or to do something where they
>> all get translated together using a new fault?
>>
>> On Thu, Feb 9, 2012 at 10:16 AM, Ali Saidi <[email protected]> wrote:
>>
>>> Hi Paul,
>>>
>>> Yes, in SE mode, it's just faked as a pipeline flush (in the simple CPU
>>> model then pretty much nothing happens). It should be reasonably easy to
>>> change the model to delay some number of ns on a TLB miss, but you'll get
>>> the best results by running in FS mode.
>>>
>>> Ali
>>>
>>> On 09.02.2012 01:05, Paul Rosenfeld wrote:
>>>
>>> So do you think that my reasoning that the TLB miss penalty is simply a
>>> single cycle re-fetch penalty on the faulting instruction is correct for
>>> ALPHA_SE/TimingSimpleCPU?
>>>
>>> On Thu, Feb 9, 2012 at 12:31 AM, Gabriel Michael Black
>>> <[email protected]> wrote:
>>>
>>>> I believe that's correct.
>>>>
>>>> Gabe
>>>>
>>>> Quoting Paul Rosenfeld <[email protected]>:
>>>>
>>>>> I guess I forgot to mention in my original email that I was talking
>>>>> about alpha.... I think in FS it will vector into a PAL routine, but
>>>>> in SE it looks like it's all just faked ...
>>>>>
>>>>> On Wed, Feb 8, 2012 at 11:08 PM, Gabriel Michael Black
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> There are two types of mechanisms to handle TLB misses, in hardware
>>>>>> or in software. If the ISA you're using does it in software, there's
>>>>>> a fault which makes the OS handle the miss. In that case it will take
>>>>>> however long it takes the OS to get things set up again. If the miss
>>>>>> is handled in hardware, then there's a TLB walker component which
>>>>>> does memory accesses to look up the entry in the page tables, and the
>>>>>> delay is determined by those accesses.
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>> Quoting Paul Rosenfeld <[email protected]>:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I'm trying to modify the TLB code for SimpleTimingCPU, but one thing
>>>>>>> I can't seem to find is what the latency of a DTLB miss is. I found
>>>>>>> the code in NDtbMissFault->invoke() for reading the page table
>>>>>>> mapping, but I can't seem to figure out if there's any mechanism for
>>>>>>> stalling the CPU to handle the fault.
>>>>>>>
>>>>>>> Reading the wiki for the SimpleTimingCPU, it sounds like it isn't
>>>>>>> meant to model this kind of detail. So is it just a one cycle fetch
>>>>>>> penalty for handling a TLB miss?
>>>>>>>
>>>>>>> If this is the case, what's the simplest CPU model that will
>>>>>>> actually stall for TLB misses?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Paul

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
