I think the malloc double-fault case could be solved by hooking malloc with a
small wrapper that touches the allocated region at page-sized offsets, so that
every page is mapped before the benchmark uses it. Hopefully most benchmarks
actually use all of the memory they malloc/mmap, so the touches won't be
wasted cycles.
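
A minimal sketch of such a wrapper, in case it helps. The name touched_malloc is made up for illustration, and it assumes a POSIX sysconf() is available in the benchmark's environment (it falls back to 4096 bytes otherwise, which is only a guess at the page size):

```cpp
#include <cstddef>
#include <cstdlib>
#include <unistd.h>

// Hypothetical helper (not part of gem5 or of any benchmark): allocate
// `size` bytes and write one byte per page so that every page of the
// region is mapped before the interesting part of the benchmark runs.
void *touched_malloc(std::size_t size)
{
    void *ptr = std::malloc(size);
    if (ptr == nullptr || size == 0)
        return ptr;

    // Ask the OS for the page size; fall back to an assumed 4096 bytes.
    const long queried = sysconf(_SC_PAGESIZE);
    const std::size_t page =
        queried > 0 ? static_cast<std::size_t>(queried) : 4096;

    // volatile keeps the compiler from discarding the stores.
    volatile unsigned char *bytes = static_cast<unsigned char *>(ptr);
    for (std::size_t offset = 0; offset < size; offset += page)
        bytes[offset] = 0;

    // malloc'd memory need not be page-aligned, so the last byte may sit
    // on a page that the loop above did not reach.
    bytes[size - 1] = 0;
    return ptr;
}
```

The benchmark would call touched_malloc() wherever it currently calls malloc() during setup. The stores overwrite the first byte of each page, which is harmless for freshly allocated memory but means this should not be applied to a buffer that already holds data.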



On Thu, Feb 9, 2012 at 12:44 PM, Ali Saidi wrote:

> It's not all roses there either, as you'll need to get the CPU
> model to call translate 3 times, but that is probably more contained, and
> you might be able to leverage the WholeTranslation code (see
> src/cpu/translation.hh), which is normally used for requests that cross a
> page boundary. If you end up taking a fault that the CPU needs to handle
> (e.g. on a page that has been malloced but not actually allocated by the
> kernel), it's still going to be trouble. However, you can probably work
> around this with your benchmark.
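
To make the "call translate 3 times" idea concrete, here is a rough sketch of the bookkeeping involved: collect the result of each operand's translation and only proceed once all of them have reported. This is not the WholeTranslation code from src/cpu/translation.hh; MultiTranslation, FaultKind, and every member name are hypothetical stand-ins.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; none of these names are gem5 classes.
enum class FaultKind { None, TlbMiss, PageFault };

template <std::size_t N>
class MultiTranslation
{
  public:
    // Record the outcome of translating operand `index`.
    // Returns true once all N operands have reported.
    bool finish(std::size_t index, FaultKind fault, std::uint64_t paddr)
    {
        if (index < N && !done_[index]) {
            done_[index] = true;
            faults_[index] = fault;
            paddrs_[index] = paddr;
            ++completed_;
        }
        return allDone();
    }

    bool allDone() const { return completed_ == N; }

    // Index of the lowest-numbered operand that faulted, or N if none did.
    std::size_t firstFault() const
    {
        for (std::size_t i = 0; i < N; ++i)
            if (done_[i] && faults_[i] != FaultKind::None)
                return i;
        return N;
    }

    std::uint64_t paddr(std::size_t index) const { return paddrs_[index]; }

  private:
    std::array<bool, N> done_{};          // all false initially
    std::array<FaultKind, N> faults_{};   // all FaultKind::None initially
    std::array<std::uint64_t, N> paddrs_{};
    std::size_t completed_ = 0;
};
```

With N = 3, the CPU model would issue three translations, pass each result to finish(), and inspect firstFault() once finish() returns true. How faults are then delivered is exactly the hard part discussed in this thread and is not addressed by the sketch.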
>
>
>
> Ali
>
>
>
>
>
>
>
> On 09.02.2012 11:27, Paul Rosenfeld wrote:
>
> Well that doesn't sound like fun. Perhaps I'll look at ARM as a potential
> target.
>
> On Thu, Feb 9, 2012 at 11:39 AM, Ali Saidi wrote:
>
>> It's possible with Alpha, but it would take some work. You'd need to
>> "take" a fault up to three times and, in between each time, fix up the fault
>> status registers to have consistent data. Keeping track of what needs to be
>> in the registers at any one time sounds difficult, especially as translation
>> faults can nest (you take a fault on the page table that you need to look
>> at). An architecture with a hardware table walker is probably a bit easier
>> to deal with.
>>
>>
>>
>> Ali
>>
>>
>>
>>
>>
>> On 09.02.2012 10:09, Paul Rosenfeld wrote:
>>
>> Thanks for the replies. I'm still trying to find my way around M5 and I
>> thought the SE/TimingSimpleCPU would be a good way to see what's involved
>> in modifying M5.
>> One thing that I'm worried about is that for this work, I have multiple
>> memory operands in registers that need to be translated to physical
>> addresses. In SE mode, I've simply added a new fake Fault that
>> translates all 3 addresses. However, I'm not sure if a similar approach
>> would be possible in FS mode (I haven't looked at how any of the PAL stuff
>> works). Do you think it would be more feasible to generate a separate
>> TLB fault for each operand in the instruction, or to do something where they
>> all get translated together using a new fault?
>>
>> On Thu, Feb 9, 2012 at 10:16 AM, Ali Saidi wrote:
>>
>>>  Hi Paul,
>>>
>>>
>>>
>>> Yes, in SE mode, it's just faked as a pipeline flush (in the simple CPU
>>> model, pretty much nothing happens). It should be reasonably easy to
>>> change the model to delay some number of ns on a TLB miss, but you'll get
>>> the best results by running in FS mode.
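
As a rough illustration of "delay some number of ns on a TLB miss", the core of such a change is just computing when the translation should be considered finished. The sketch below is not gem5 code: the Tick alias, kTicksPerNs, and translationDoneTick are defined locally for illustration, and it assumes a resolution of 1000 ticks per nanosecond (1 tick = 1 ps), which should be checked against the simulator's configuration.

```cpp
#include <cstdint>

// Local alias for illustration only.
using Tick = std::uint64_t;

// Assumption: 1000 ticks per nanosecond (1 tick = 1 ps).
constexpr Tick kTicksPerNs = 1000;

// Hypothetical helper: the tick at which a translation that started at
// `now` completes. A hit completes immediately; a miss is charged a fixed
// penalty given in nanoseconds.
constexpr Tick translationDoneTick(Tick now, bool tlbHit, Tick missPenaltyNs)
{
    return tlbHit ? now : now + missPenaltyNs * kTicksPerNs;
}
```

The CPU model would then schedule the rest of the access for the returned tick instead of continuing immediately. A fixed penalty is a crude stand-in for what a real miss handler or table walker would cost.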
>>>
>>>
>>>
>>> Ali
>>>
>>>
>>>
>>> On 09.02.2012 01:05, Paul Rosenfeld wrote:
>>>
>>> So do you think that my reasoning that the TLB miss penalty is simply a
>>> single cycle re-fetch penalty on the faulting instruction is correct for
>>> ALPHA_SE/TimingSimpleCPU?
>>>
>>> On Thu, Feb 9, 2012 at 12:31 AM, Gabriel Michael Black wrote:
>>>
>>>> I believe that's correct.
>>>>
>>>>
>>>> Gabe
>>>>
>>>> Quoting Paul Rosenfeld:
>>>>
>>>>> I guess I forgot to mention in my original email that I was talking
>>>>> about Alpha... I think in FS it will vector into a PAL routine, but in
>>>>> SE it looks like it's all just faked ...
>>>>>
>>>>> On Wed, Feb 8, 2012 at 11:08 PM, Gabriel Michael Black wrote:
>>>>>
>>>>>> There are two types of mechanisms to handle TLB misses, in hardware or
>>>>>> in software. If the ISA you're using does it in software, there's a
>>>>>> fault which makes the OS handle the miss. In that case it will take
>>>>>> however long it takes the OS to get things set up again. If the miss is
>>>>>> handled in hardware, then there's a TLB walker component which does
>>>>>> memory accesses to look up the entry in the page tables, and the delay
>>>>>> is determined by those accesses.
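
For the hardware-walker case, "the delay is determined by those accesses" can be sketched as a sum over the page-table levels the walker has to read. The function name and the latency values in the usage note are illustrative assumptions, and the sketch assumes the accesses are dependent and therefore serialized.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical model: a hardware table walk reads one entry per page-table
// level, each read depends on the previous one, so the total delay is the
// sum of the per-level memory access latencies (in cycles).
std::uint64_t walkLatency(const std::vector<std::uint64_t> &levelLatencies)
{
    return std::accumulate(levelLatencies.begin(), levelLatencies.end(),
                           std::uint64_t{0});
}
```

For example, a three-level walk where the first two reads hit in a cache at 2 cycles each and the last goes to memory at 100 cycles would give walkLatency({2, 2, 100}) == 104. In the software-handled case there is no such closed form, since the cost is whatever the OS handler takes to run.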
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>>
>>>>>> Quoting Paul Rosenfeld:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I'm trying to modify the TLB code for TimingSimpleCPU, but one thing I
>>>>>>> can't seem to find is what the latency of a DTLB miss is. I found the
>>>>>>> code in NDtbMissFault->invoke() for reading the page table mapping, but
>>>>>>> I can't seem to figure out if there's any mechanism for stalling the
>>>>>>> CPU to handle the fault.
>>>>>>>
>>>>>>> Reading the wiki for the TimingSimpleCPU, it sounds like it isn't meant
>>>>>>> to model this kind of detail. So is it just a one-cycle fetch penalty
>>>>>>> for handling a TLB miss?
>>>>>>>
>>>>>>> If this is the case, what's the simplest CPU model that will actually
>>>>>>> stall for TLB misses?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Paul
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>> gem5-users mailing list
>>>>>> [email protected]
>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>>>>
>>>>>>
