Re: [m5-dev] X86 FS regression

Steve Reinhardt Tue, 23 Nov 2010 08:33:50 -0800

I think the two easy (python-only) solutions are sharing the existing L1 via
a bus and tacking on a small L1 to the walker.  Which one is more realistic
would depend on what you're trying to model.


Steve
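
A minimal sketch of the two topologies mentioned above, written as a toy Python model rather than real m5 configuration code. The `Node` class, the helper functions, and every component name (`cpu_side_bus`, `walker_cache`, `l1_to_l2_bus`) are hypothetical and chosen only for illustration; they are not m5 SimObjects or parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One component in a toy memory hierarchy (hypothetical, not an m5 SimObject)."""
    name: str
    below: Optional["Node"] = None   # next component toward memory
    is_cache: bool = False


def path_to_memory(node):
    """List component names from `node` down to memory."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.below
    return names


def first_cache(node):
    """Name of the first cache a request from `node` enters through its CPU side."""
    node = node.below
    while node is not None and not node.is_cache:
        node = node.below
    return node.name if node else None


def lower_hierarchy():
    """Part shared by both options: L1-to-L2 bus, L2, memory."""
    memory = Node("memory")
    l2 = Node("l2", below=memory, is_cache=True)
    return Node("l1_to_l2_bus", below=l2)


def shared_l1_via_bus():
    """Option 1: the walker shares the existing dcache through a CPU-side bus."""
    dcache = Node("dcache", below=lower_hierarchy(), is_cache=True)
    cpu_side_bus = Node("cpu_side_bus", below=dcache)
    return Node("walker", below=cpu_side_bus)


def small_walker_cache():
    """Option 2: the walker gets its own small L1 beside the dcache."""
    walker_cache = Node("walker_cache", below=lower_hierarchy(), is_cache=True)
    return Node("walker", below=walker_cache)


print(path_to_memory(shared_l1_via_bus()))
print(path_to_memory(small_walker_cache()))
```

In both layouts the walker's requests enter an L1 through its CPU side first, so only that cache's block-level requests appear on the bus that the other L1s snoop.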

On Tue, Nov 23, 2010 at 8:23 AM, Ali Saidi <[email protected]> wrote:

> So what is the relatively good way to make this work in the short term? A
> bus? What about the slightly better version? I suppose a small cache might
> be ok and probably somewhat realistic.
>
>
>
> Thanks,
>
> Ali
>
>
>
>
>
> On Tue, 23 Nov 2010 08:15:01 -0800, Steve Reinhardt <[email protected]>
> wrote:
>
> And even though I do think it could be made to work, I'm not sure it would
> be easy or a good idea.  There are a lot of corner cases to worry about,
> especially for writes, since you'd have to actually buffer the write data
> somewhere as opposed to just remembering that so-and-so has requested an
> exclusive copy.
>
> Actually as I think about it, that might be the case that's breaking now...
> if the L1 has an exclusive copy and then it snoops a write (and not a
> read-exclusive), I'm guessing it will just invalidate its copy, losing the
> modifications.  I wouldn't be terribly surprised if reads are working OK
> (the L1 should snoop those and respond if it's the owner), and of course
> it's all OK if the L1 doesn't have a copy of the block.
>
> So maybe there is a relatively easy way to make this work, but figuring out
> whether that's true and then testing it is still a non-trivial amount of
> effort.
>
> Steve
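
The suspected failure in the message above can be illustrated with a toy model. This is only a sketch of the hypothesis (an L1 that invalidates a modified block when it snoops a partial write); it does not reproduce m5's cache code, and `ToyL1`, `BLOCK`, and `snooped_partial_write` are made-up names.

```python
BLOCK = 8  # bytes per block in this toy model (assumption)


class ToyL1:
    """Single-block write-back cache, just enough to show the lost update."""

    def __init__(self):
        self.data = None     # bytearray while the block is held
        self.dirty = False

    def cpu_write(self, memory, offset, value):
        # Fetch an exclusive copy on a miss, then modify it locally.
        if self.data is None:
            self.data = bytearray(memory)
        self.data[offset] = value
        self.dirty = True

    def snoop_write_naive(self):
        # Hypothesized behavior: invalidate and drop the dirty bytes.
        self.data = None
        self.dirty = False

    def snoop_write_safe(self, memory):
        # Write the dirty block back before giving it up.
        if self.dirty:
            memory[:] = self.data
        self.data = None
        self.dirty = False


def snooped_partial_write(safe):
    memory = bytearray(BLOCK)
    l1 = ToyL1()
    l1.cpu_write(memory, 0, 0xAA)      # CPU modifies byte 0 in the L1 only
    if safe:
        l1.snoop_write_safe(memory)
    else:
        l1.snoop_write_naive()
    memory[4] = 0xBB                   # the snooped sub-block write lands in memory
    return bytes(memory)


print(snooped_partial_write(safe=False).hex())
print(snooped_partial_write(safe=True).hex())
```

In the naive case the CPU's byte at offset 0 never reaches memory; in the safe case both writes survive.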
>
> On Tue, Nov 23, 2010 at 7:57 AM, Steve Reinhardt <[email protected]> wrote:
>
>> No, when the L2 receives a request it assumes the L1s above it have
>> already been snooped, which is true since the request came in on the bus
>> that the L1s snoop.  The issue is that caches don't necessarily behave
>> correctly when non-cache-block requests come in through their mem-side
>> (snoop) port and not through their cpu-side (request) port.  I'm guessing
>> this could be made to work; I'd just be very surprised if it does right now,
>> since the caches weren't designed to deal with this case and aren't tested
>> this way.
>>
>> Steve
>>
>>
>> On Tue, Nov 23, 2010 at 7:50 AM, Ali Saidi <[email protected]> wrote:
>>
>>> Does it? Shouldn't the l2 receive the request, ask for the block and end
>>> up snooping the l1s?
>>>
>>>
>>>
>>> Ali
>>>
>>>
>>>
>>>
>>>
>>> On Tue, 23 Nov 2010 07:30:00 -0800, Steve Reinhardt <[email protected]>
>>> wrote:
>>>
>>>  The point is that connecting between the L1 and L2 induces the same
>>> problems wrt the L1 that connecting directly to memory induces wrt the whole
>>> cache hierarchy.  You're just statistically more likely to get away with it
>>> in the former case because the L1 is smaller.
>>>
>>> Steve
>>>
>>>   On Tue, Nov 23, 2010 at 7:16 AM, Ali Saidi <[email protected]> wrote:
>>>
>>>>
>>>> Where are you connecting the table walker? If it's between the l1 and l2
>>>> my guess is that it will work. If it is to the memory bus, yes, memory is
>>>> just responding without the help of a cache and this could be the reason.
>>>>
>>>> Ali
>>>>
>>>>
>>>>
>>>> On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black <[email protected]>
>>>> wrote:
>>>>
>>>>> I think I may have just now. I've fixed a few issues, and am now
>>>>> getting to the point where something that should be in the page
>>>>> tables is causing a page fault. I found where the table walker is
>>>>> walking the tables for this particular access, and the last-level
>>>>> entry is all 0s. There could be a number of reasons for that, but
>>>>> since the main difference, other than timing, between this and a
>>>>> working configuration is the presence of caches, and we've
>>>>> identified a potential issue there, I'm inclined to suspect the
>>>>> actual page table entry is still in the L1 and hasn't been evicted
>>>>> out to memory yet.
>>>>>
>>>>> To fix this, is the best solution to add a bus below the CPU for all
>>>>> the connections that need to go to the L1? I'm assuming they'd all go
>>>>> into the dcache since they're more data-ey and that keeps the icache
>>>>> read only (ignoring SMC issues), and the dcache is probably servicing
>>>>> lower bandwidth normally. It also seems a little strange that this
>>>>> type of configuration is going on in the BaseCPU.py SimObject python
>>>>> file and not a configuration file, but I could be convinced there's a
>>>>> reason. Even if this isn't really a "fix" or the "right thing" to do,
>>>>> I'd still like to try it temporarily at least to see if it corrects
>>>>> the problem I'm seeing.
>>>>>
>>>>> Gabe
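
The suspicion above (a page table entry still dirty in the L1 while the walker reads zeros from memory) can be sketched with a toy write-back cache. Everything here is illustrative: `WriteBackCache`, `PTE_ADDR`, and `PTE_VALUE` are invented for the example and do not correspond to m5 classes or real x86 page table contents.

```python
PTE_ADDR = 0x1000    # hypothetical address of the last-level entry
PTE_VALUE = 0x2067   # hypothetical non-zero entry written by the kernel


class WriteBackCache:
    """Toy write-back cache: dirty data reaches memory only on eviction."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value
        self.lines = {}        # address -> (value, dirty)

    def write(self, addr, value):
        self.lines[addr] = (value, True)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.memory.get(addr, 0), False)
        return self.lines[addr][0]

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:
            self.memory[addr] = value


def walker_reads():
    memory = {}
    dcache = WriteBackCache(memory)
    dcache.write(PTE_ADDR, PTE_VALUE)        # CPU stores the entry; it stays in the L1

    bypassing = memory.get(PTE_ADDR, 0)      # walker reads memory directly
    through_cache = dcache.read(PTE_ADDR)    # walker goes through the dcache

    dcache.evict(PTE_ADDR)                   # entry finally written back
    after_evict = memory.get(PTE_ADDR, 0)
    return bypassing, through_cache, after_evict


print([hex(v) for v in walker_reads()])
```

A walker that bypasses the cache sees 0 until the line is evicted, which matches the all-0s symptom; routing the walker through the dcache returns the current entry.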
>>>>>
>>>>> Ali Saidi wrote:
>>>>>
>>>>>>
>>>>>> I haven't seen any strange behavior yet. That isn't to say it's not
>>>>>> going to cause an issue in the future, but we've taken many a tlb miss
>>>>>> and it hasn't fallen over yet.
>>>>>>
>>>>>> Ali
>>>>>>
>>>>>> On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Yeah, I just got around to reading this thread and that was the point
>>>>>>> I was going to make... the L1 cache effectively serves as a
>>>>>>> translator between the CPU's word-size read & write requests and the
>>>>>>> coherent block-level requests that get snooped.  If you attach a
>>>>>>> CPU-like device (such as the table walker) directly to an L2, the
>>>>>>> CPU-like accesses that go to the L2 will get sent to the L1s but I'm
>>>>>>> not sure they'll be handled correctly.  Not that they fundamentally
>>>>>>> couldn't, this just isn't a configuration we test so it's likely that
>>>>>>> there are problems... for example, the L1 may try to hand ownership
>>>>>>> to the requester but the requester won't recognize that and things
>>>>>>> will break.
>>>>>>>
>>>>>>> Steve
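
One way to picture the "translator" role described above: a word-size CPU access is mapped to the aligned block that the coherent side of the cache actually requests. This is a sketch of the address arithmetic only, assuming a 64-byte block; `BLOCK_SIZE` and `block_request` are hypothetical names, not m5 identifiers.

```python
BLOCK_SIZE = 64  # bytes; assumed power-of-two cache block size


def block_request(addr, size, block_size=BLOCK_SIZE):
    """Map a word-size access to (aligned block address, offset within block)."""
    block_addr = addr & ~(block_size - 1)   # clear the offset bits
    offset = addr - block_addr
    if offset + size > block_size:
        raise ValueError("access crosses a block boundary")
    return block_addr, offset


# An 8-byte read at 0x1238 becomes a request for the whole block at 0x1200.
block_addr, offset = block_request(0x1238, 8)
print(hex(block_addr), hex(offset))
```

A device attached below the L1 issues the word-size form directly, which is the kind of request the caches in this thread are not tested against on their snoop side.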
>>>>>>>
>>>>>>> On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black <[email protected]> wrote:
>>>>>>>
>>>>>>>    What happens if an entry is in the L1 but not the L2?
>>>>>>>
>>>>>>>    Gabe
>>>>>>>
>>>>>>>    Ali Saidi wrote:
>>>>>>>    > Between the l1 and l2 caches seems like a good place to me. The
>>>>>>>    > caches can cache page table entries; otherwise a tlb miss would
>>>>>>>    > be even more expensive than it is. The l1 isn't normally used for
>>>>>>>    > such things since it would get polluted (look at why sparc has a
>>>>>>>    > "load 128 bits from l2, do not allocate into l1" instruction).
>>>>>>>    >
>>>>>>>    > Ali
>>>>>>>    >
>>>>>>>    > On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:
>>>>>>>    >
>>>>>>>    >
>>>>>>>    >> For anybody waiting for an x86 FS regression (yes, I know, you
>>>>>>>    >> can all hardly wait, but don't let this spoil your Thanksgiving)
>>>>>>>    >> I'm getting closer to having it working, but I've discovered some
>>>>>>>    >> issues with the mechanisms behind the --caches flag with fs.py and
>>>>>>>    >> x86. I'm surprised I never thought to try it before. It also
>>>>>>>    >> brings up some questions about where the table walkers should be
>>>>>>>    >> hooked up in x86 and ARM. Currently it's after the L1, if any, but
>>>>>>>    >> before the L2, if any, which seems wrong to me. Also caches don't
>>>>>>>    >> seem to propagate requests upwards to the CPUs which may or may
>>>>>>>    >> not be an issue. I'm still looking into that.
>>>>>>>    >>
>>>>>>>    >> Gabe
>>>>>>>    >> _______________________________________________
>>>>>>>    >> m5-dev mailing list
>>>>>>>    >> [email protected]
>>>>>>>    >> http://m5sim.org/mailman/listinfo/m5-dev
>>>>>>>    >>

