And even though I do think it could be made to work, I'm not sure it would
be easy or a good idea.  There are a lot of corner cases to worry about,
especially for writes, since you'd have to actually buffer the write data
somewhere as opposed to just remembering that so-and-so has requested an
exclusive copy.

Actually, as I think about it, that might be the case that's breaking now...
if the L1 has an exclusive copy and then it snoops a write (and not a
read-exclusive), I'm guessing it will just invalidate its copy, losing the
modifications.  I wouldn't be terribly surprised if reads are working OK
(the L1 should snoop those and respond if it's the owner), and of course
it's all OK if the L1 doesn't have a copy of the block.
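
To make that concrete, here's a rough sketch of the suspect snoop path in
Python -- purely illustrative, with made-up names rather than the actual
cache code:

    class CacheLine:
        def __init__(self, state="Invalid", data=None):
            self.state = state  # "Modified", "Exclusive", "Shared", "Invalid"
            self.data = data

    def snoop_write(line):
        """Handle a snooped write (not a read-exclusive) hitting this L1."""
        if line.state == "Modified":
            # What I suspect happens today: fall straight through to the
            # invalidation below, silently dropping the dirty data.
            # What should happen instead (one option): write the block back
            # first so the modifications survive, e.g. writeback(line).
            pass
        line.state = "Invalid"
        line.data = None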

So maybe there is a relatively easy way to make this work, but figuring out
whether that's true and then testing it is still a non-trivial amount of
effort.

Steve

On Tue, Nov 23, 2010 at 7:57 AM, Steve Reinhardt <[email protected]> wrote:

> No, when the L2 receives a request it assumes the L1s above it have already
> been snooped, which is true since the request came in on the bus that the
> L1s snoop.  The issue is that caches don't necessarily behave correctly when
> non-cache-block requests come in through their mem-side (snoop) port and not
> through their cpu-side (request) port.  I'm guessing this could be made to
> work; I'd just be very surprised if it does right now, since the caches
> weren't designed to deal with this case and aren't tested this way.
>
> Steve
>
>
> On Tue, Nov 23, 2010 at 7:50 AM, Ali Saidi <[email protected]> wrote:
>
>> Does it? Shouldn't the L2 receive the request, ask for the block, and
>> end up snooping the L1s?
>>
>> Ali
>>
>> On Tue, 23 Nov 2010 07:30:00 -0800, Steve Reinhardt <[email protected]>
>> wrote:
>>
>> The point is that connecting between the L1 and L2 induces the same
>> problems wrt the L1 that connecting directly to memory induces wrt the whole
>> cache hierarchy.  You're just statistically more likely to get away with it
>> in the former case because the L1 is smaller.
>>
>> Steve
>>
>> On Tue, Nov 23, 2010 at 7:16 AM, Ali Saidi <[email protected]> wrote:
>>
>>>
>>> Where are you connecting the table walker? If it's between the L1 and
>>> L2, my guess is that it will work. If it's connected to the memory bus,
>>> then yes, memory is just responding without the help of a cache, and
>>> that could be the reason.
>>>
>>> Ali
>>>
>>>
>>>
>>> On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black <[email protected]>
>>> wrote:
>>>
>>>> I think I may have just now. I've fixed a few issues, and am now getting
>>>> to the point where something that should be in the page tables is causing
>>>> a page fault. I found where the table walker is walking the tables for
>>>> this particular access, and the last level entry is all 0s. There could
>>>> be a number of reasons this is all 0s, but since the main difference
>>>> other than timing between this and a working configuration is the
>>>> presence of caches and we've identified a potential issue there, I'm
>>>> inclined to suspect the actual page table entry is still in the L1 and
>>>> hasn't been evicted out to memory yet.
>>>>
>>>> To fix this, is the best solution to add a bus below the CPU for all the
>>>> connections that need to go to the L1? I'm assuming they'd all go into
>>>> the dcache since they're more data-ey and that keeps the icache read
>>>> only (ignoring SMC issues), and the dcache is probably servicing lower
>>>> bandwidth normally. It also seems a little strange that this type of
>>>> configuration is going on in the BaseCPU.py SimObject python file and
>>>> not a configuration file, but I could be convinced there's a reason.
>>>> Even if this isn't really a "fix" or the "right thing" to do, I'd still
>>>> like to try it temporarily at least to see if it corrects the problem
>>>> I'm seeing.
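>>>>
>>>> Roughly what I have in mind is something like the sketch below --
>>>> untested, and the names (Bus, dcache_port, the walkers' port
>>>> attributes) are my guesses at the current tree rather than verified
>>>> code:
>>>>
>>>>     # Hypothetical sketch: put a small bus between the CPU-side
>>>>     # requesters and the L1 dcache so the walkers' accesses go
>>>>     # through the cache instead of around it.
>>>>     dcache_bus = Bus()
>>>>
>>>>     # CPU data accesses and both table walkers share the bus...
>>>>     cpu.dcache_port = dcache_bus.port
>>>>     cpu.itb.walker.port = dcache_bus.port
>>>>     cpu.dtb.walker.port = dcache_bus.port
>>>>
>>>>     # ...and the bus feeds the L1 dcache, which handles coherence.
>>>>     dcache.cpu_side = dcache_bus.port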
>>>>
>>>> Gabe
>>>>
>>>> Ali Saidi wrote:
>>>>
>>>>>
>>>>> I haven't seen any strange behavior yet. That isn't to say it's not
>>>>> going to cause an issue in the future, but we've taken many a TLB miss
>>>>> and it hasn't fallen over yet.
>>>>>
>>>>> Ali
>>>>>
>>>>> On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Yea, I just got around to reading this thread and that was the point
>>>>>> I was going to make... the L1 cache effectively serves as a
>>>>>> translator between the CPU's word-size read & write requests and the
>>>>>> coherent block-level requests that get snooped.  If you attach a
>>>>>> CPU-like device (such as the table walker) directly to an L2, the
>>>>>> CPU-like accesses that go to the L2 will get sent to the L1s but I'm
>>>>>> not sure they'll be handled correctly.  Not that they fundamentally
>>>>>> couldn't be, but this just isn't a configuration we test, so it's likely
>>>>>> that there are problems... for example, the L1 may try to hand ownership
>>>>>> to the requester but the requester won't recognize that and things
>>>>>> will break.
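>>>>>>
>>>>>> A toy sketch of that ownership hand-off problem (hypothetical names,
>>>>>> not the real packet interface):
>>>>>>
>>>>>>     class Packet:
>>>>>>         def __init__(self, addr):
>>>>>>             self.addr = addr
>>>>>>             self.data = None
>>>>>>             self.passes_ownership = False
>>>>>>
>>>>>>     def l1_snoop_response(pkt, line):
>>>>>>         pkt.data = line.data
>>>>>>         if line.dirty:
>>>>>>             # The L1 hands responsibility for the dirty block to
>>>>>>             # whoever made the request...
>>>>>>             pkt.passes_ownership = True
>>>>>>
>>>>>>     def table_walker_receive(pkt):
>>>>>>         # ...but a CPU-like requester only looks at the data, so the
>>>>>>         # ownership flag is dropped and nobody writes the block back.
>>>>>>         return pkt.data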
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black <[email protected]> wrote:
>>>>>>
>>>>>>    What happens if an entry is in the L1 but not the L2?
>>>>>>
>>>>>>    Gabe
>>>>>>
>>>>>>    Ali Saidi wrote:
>>>>>>    > Between the l1 and l2 caches seems like a good place to me. The
>>>>>>    > caches can cache page table entries; otherwise a TLB miss would
>>>>>>    > be even more expensive than it is. The l1 isn't normally used for
>>>>>>    > such things since it would get polluted (that's why SPARC has a
>>>>>>    > "load 128 bits from L2, do not allocate into L1" instruction).
>>>>>>    >
>>>>>>    > Ali
>>>>>>    >
>>>>>>    > On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:
>>>>>>    >
>>>>>>    >
>>>>>>    >> For anybody waiting for an x86 FS regression (yes, I know, you
>>>>>>    >> can all hardly wait, but don't let this spoil your Thanksgiving)
>>>>>>    >> I'm getting closer to having it working, but I've discovered some
>>>>>>    >> issues with the mechanisms behind the --caches flag with fs.py
>>>>>>    >> and x86. I'm surprised I never thought to try it before. It also
>>>>>>    >> brings up some questions about where the table walkers should be
>>>>>>    >> hooked up in x86 and ARM. Currently it's after the L1, if any,
>>>>>>    >> but before the L2, if any, which seems wrong to me. Also, caches
>>>>>>    >> don't seem to propagate requests upwards to the CPUs, which may
>>>>>>    >> or may not be an issue. I'm still looking into that.
>>>>>>    >>
>>>>>>    >> Gabe
>
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
