Of these, I think the walker cache sounds better for two reasons. First, it avoids the L1 pollution Ali was talking about, and second, a new bus would mostly just add an inert hop on the way to memory, and it would mean looking up which port to use even though it would always be the same one. I'll give that a try.
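In case it helps make the plan concrete, here's roughly the shape of what I'm thinking of trying, written in the usual python config style. This is an untested sketch: the walker port names (itb.walker.port / dtb.walker.port), the helper function, and the cache parameters are my guesses rather than what BaseCPU.py actually exposes, so the real hookup may need different attribute names.

# Untested sketch: give each table walker its own small cache instead of
# pointing it straight at the memory bus.  The walker port names and the
# BaseCache parameters below are assumptions; check BaseCPU.py / Caches.py
# for the real attribute names before trying this.
from m5.objects import *

class WalkerCache(BaseCache):
    # Deliberately tiny so page table entries stay out of the L1
    # (the pollution concern Ali raised).
    size = '1kB'
    assoc = 2
    block_size = 64
    latency = '1ns'
    mshrs = 4
    tgts_per_mshr = 8

def connectWalkerCaches(cpu, l2bus):
    # One private cache per walker, both feeding the bus below the L1s
    # (the tol2bus in an fs.py --caches setup).
    cpu.itb_walker_cache = WalkerCache()
    cpu.dtb_walker_cache = WalkerCache()
    cpu.itb.walker.port = cpu.itb_walker_cache.cpu_side  # assumed port name
    cpu.dtb.walker.port = cpu.dtb_walker_cache.cpu_side  # assumed port name
    cpu.itb_walker_cache.mem_side = l2bus.port
    cpu.dtb_walker_cache.mem_side = l2bus.port

The bus-sharing alternative would instead put a small Bus between the CPU/walkers and the dcache's cpu_side, which is the extra-hop-and-port-lookup version I'd rather avoid.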
Gabe

Steve Reinhardt wrote:
> I think the two easy (python-only) solutions are sharing the existing L1 via a bus and tacking on a small L1 to the walker. Which one is more realistic would depend on what you're trying to model.
>
> Steve
>
> On Tue, Nov 23, 2010 at 8:23 AM, Ali Saidi <[email protected]> wrote:
>> So what is the relatively good way to make this work in the short term? A bus? What about the slightly better version? I suppose a small cache might be ok and probably somewhat realistic.
>>
>> Thanks,
>> Ali
>>
>> On Tue, 23 Nov 2010 08:15:01 -0800, Steve Reinhardt <[email protected]> wrote:
>>> And even though I do think it could be made to work, I'm not sure it would be easy or a good idea. There are a lot of corner cases to worry about, especially for writes, since you'd have to actually buffer the write data somewhere as opposed to just remembering that so-and-so has requested an exclusive copy.
>>>
>>> Actually as I think about it, that might be the case that's breaking now... if the L1 has an exclusive copy and then it snoops a write (and not a read-exclusive), I'm guessing it will just invalidate its copy, losing the modifications. I wouldn't be terribly surprised if reads are working OK (the L1 should snoop those and respond if it's the owner), and of course it's all OK if the L1 doesn't have a copy of the block.
>>>
>>> So maybe there is a relatively easy way to make this work, but figuring out whether that's true and then testing it is still a non-trivial amount of effort.
>>>
>>> Steve
>>>
>>> On Tue, Nov 23, 2010 at 7:57 AM, Steve Reinhardt <[email protected]> wrote:
>>>> No, when the L2 receives a request it assumes the L1s above it have already been snooped, which is true since the request came in on the bus that the L1s snoop. The issue is that caches don't necessarily behave correctly when non-cache-block requests come in through their mem-side (snoop) port and not through their cpu-side (request) port. I'm guessing this could be made to work, I'd just be very surprised if it does right now, since the caches weren't designed to deal with this case and aren't tested this way.
>>>>
>>>> Steve
>>>>
>>>> On Tue, Nov 23, 2010 at 7:50 AM, Ali Saidi <[email protected]> wrote:
>>>>> Does it? Shouldn't the l2 receive the request, ask for the block and end up snooping the l1s?
>>>>>
>>>>> Ali
>>>>>
>>>>> On Tue, 23 Nov 2010 07:30:00 -0800, Steve Reinhardt <[email protected]> wrote:
>>>>>> The point is that connecting between the L1 and L2 induces the same problems wrt the L1 that connecting directly to memory induces wrt the whole cache hierarchy. You're just statistically more likely to get away with it in the former case because the L1 is smaller.
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Tue, Nov 23, 2010 at 7:16 AM, Ali Saidi <[email protected]> wrote:
>>>>>>> Where are you connecting the table walker? If it's between the l1 and l2 my guess is that it will work. if it is to the memory bus, yes, memory is just responding without the help of a cache and this could be the reason.
>>>>>>>
>>>>>>> Ali
>>>>>>>
>>>>>>> On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black <[email protected]> wrote:
>>>>>>>> I think I may have just now. I've fixed a few issues, and am now getting to the point where something that should be in the pagetables is causing a page fault. I found where the table walker is walking the tables for this particular access, and the last level entry is all 0s. There could be a number of reasons this is all 0s, but since the main difference other than timing between this and a working configuration is the presence of caches and we've identified a potential issue there, I'm inclined to suspect the actual page table entry is still in the L1 and hasn't been evicted out to memory yet.
>>>>>>>>
>>>>>>>> To fix this, is the best solution to add a bus below the CPU for all the connections that need to go to the L1? I'm assuming they'd all go into the dcache since they're more data-ey and that keeps the icache read only (ignoring SMC issues), and the dcache is probably servicing lower bandwidth normally. It also seems a little strange that this type of configuration is going on in the BaseCPU.py SimObject python file and not a configuration file, but I could be convinced there's a reason. Even if this isn't really a "fix" or the "right thing" to do, I'd still like to try it temporarily at least to see if it corrects the problem I'm seeing.
>>>>>>>>
>>>>>>>> Gabe
>>>>>>>>
>>>>>>>> Ali Saidi wrote:
>>>>>>>>> I haven't seen any strange behavior yet. That isn't to say it's not going to cause an issue in the future, but we've taken many a tlb miss and it hasn't fallen over yet.
>>>>>>>>>
>>>>>>>>> Ali
>>>>>>>>>
>>>>>>>>> On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt <[email protected]> wrote:
>>>>>>>>>> Yea, I just got around to reading this thread and that was the point I was going to make... the L1 cache effectively serves as a translator between the CPU's word-size read & write requests and the coherent block-level requests that get snooped. If you attach a CPU-like device (such as the table walker) directly to an L2, the CPU-like accesses that go to the L2 will get sent to the L1s but I'm not sure they'll be handled correctly. Not that they fundamentally couldn't, this just isn't a configuration we test so it's likely that there are problems... for example, the L1 may try to hand ownership to the requester but the requester won't recognize that and things will break.
>>>>>>>>>>
>>>>>>>>>> Steve
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black <[email protected]> wrote:
>>>>>>>>>>> What happens if an entry is in the L1 but not the L2?
>>>>>>>>>>>
>>>>>>>>>>> Gabe
>>>>>>>>>>>
>>>>>>>>>>> Ali Saidi wrote:
>>>>>>>>>>>> Between the l1 and l2 caches seems like a good place to me. The caches can cache page table entries, otherwise a tlb miss would be even more expensive then it is. The l1 isn't normally used for such things since it would get polluted (look why sparc has a load 128bits from l2, do not allocate into l1 instruction).
>>>>>>>>>>>>
>>>>>>>>>>>> Ali
>>>>>>>>>>>>
>>>>>>>>>>>> On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:
>>>>>>>>>>>>> For anybody waiting for an x86 FS regression (yes, I know, you can all hardly wait, but don't let this spoil your Thanksgiving) I'm getting closer to having it working, but I've discovered some issues with the mechanisms behind the --caches flag with fs.py and x86. I'm surprised I never thought to try it before. It also brings up some questions about where the table walkers should be hooked up in x86 and ARM. Currently it's after the L1, if any, but before the L2, if any, which seems wrong to me. Also caches don't seem to propagate requests upwards to the CPUs which may or may not be an issue. I'm still looking into that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Gabe

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
