And even though I do think it could be made to work, I'm not sure it would be easy or a good idea. There are a lot of corner cases to worry about, especially for writes, since you'd have to actually buffer the write data somewhere as opposed to just remembering that so-and-so has requested an exclusive copy.
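To make that write corner case concrete, here is a toy sketch (plain Python with MOESI-ish state names, emphatically not m5's actual cache code) of why a snooped write forces you to buffer and merge data, while a snooped read is comparatively easy:

    # Toy model of a snooping cache; states and methods are illustrative only.

    BLOCK_SIZE = 64

    class ToyCache:
        def __init__(self):
            self.blocks = {}  # addr -> (state, data)

        def snoop_read(self, addr):
            """Snooped read: if we own the block, respond with our data.
            This path only needs a state change, so it is easy to get right."""
            state, data = self.blocks.get(addr, ("I", None))
            if state in ("M", "O", "E"):
                # Dirty copies stay dirty ("O"); clean exclusive drops to shared.
                self.blocks[addr] = ("O" if state in ("M", "O") else "S", data)
                return data
            return None

        def snoop_write_naive(self, addr):
            """The suspected bug: treat a snooped write like an invalidation.
            If we held the block in "M", our modifications are silently lost."""
            self.blocks.pop(addr, None)

        def snoop_write_merged(self, addr, offset, wdata):
            """One correct option: buffer the snooped write data and merge it
            into our dirty copy, so no modifications are dropped."""
            state, data = self.blocks.get(addr, ("I", None))
            if state in ("M", "O"):
                merged = bytearray(data)
                merged[offset:offset + len(wdata)] = wdata
                self.blocks[addr] = ("M", bytes(merged))

    cache = ToyCache()
    cache.blocks[0x1000] = ("M", bytes(BLOCK_SIZE))  # a dirty block
    cache.snoop_write_naive(0x1000)                  # dirty data silently lost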
Actually, as I think about it, that might be the case that's breaking now... if the L1 has an exclusive copy and then it snoops a write (and not a read-exclusive), I'm guessing it will just invalidate its copy, losing the modifications. I wouldn't be terribly surprised if reads are working OK (the L1 should snoop those and respond if it's the owner), and of course it's all OK if the L1 doesn't have a copy of the block. So maybe there is a relatively easy way to make this work, but figuring out whether that's true and then testing it is still a non-trivial amount of effort.

Steve

On Tue, Nov 23, 2010 at 7:57 AM, Steve Reinhardt <[email protected]> wrote:

> No, when the L2 receives a request it assumes the L1s above it have already been snooped, which is true since the request came in on the bus that the L1s snoop. The issue is that caches don't necessarily behave correctly when non-cache-block requests come in through their mem-side (snoop) port rather than through their cpu-side (request) port. I'm guessing this could be made to work; I'd just be very surprised if it works right now, since the caches weren't designed to deal with this case and aren't tested this way.
>
> Steve
>
> On Tue, Nov 23, 2010 at 7:50 AM, Ali Saidi <[email protected]> wrote:
>
>> Does it? Shouldn't the L2 receive the request, ask for the block, and end up snooping the L1s?
>>
>> Ali
>>
>> On Tue, 23 Nov 2010 07:30:00 -0800, Steve Reinhardt <[email protected]> wrote:
>>
>> The point is that connecting between the L1 and L2 induces the same problems wrt the L1 that connecting directly to memory induces wrt the whole cache hierarchy. You're just statistically more likely to get away with it in the former case because the L1 is smaller.
>>
>> Steve
>>
>> On Tue, Nov 23, 2010 at 7:16 AM, Ali Saidi <[email protected]> wrote:
>>
>>> Where are you connecting the table walker? If it's between the L1 and L2, my guess is that it will work. If it's connected to the memory bus, then yes, memory is just responding without the help of a cache, and this could be the reason.
>>>
>>> Ali
>>>
>>> On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black <[email protected]> wrote:
>>>
>>>> I think I may have just now. I've fixed a few issues, and am now getting to the point where something that should be in the page tables is causing a page fault. I found where the table walker is walking the tables for this particular access, and the last-level entry is all 0s. There could be a number of reasons it's all 0s, but since the main difference other than timing between this and a working configuration is the presence of caches, and we've identified a potential issue there, I'm inclined to suspect the actual page table entry is still in the L1 and hasn't been evicted out to memory yet.
>>>>
>>>> To fix this, is the best solution to add a bus below the CPU for all the connections that need to go to the L1? I'm assuming they'd all go into the dcache, since they're more data-ey and that keeps the icache read-only (ignoring SMC issues), and the dcache is probably servicing lower bandwidth normally. It also seems a little strange that this type of configuration is going on in the BaseCPU.py SimObject Python file and not a configuration file, but I could be convinced there's a reason.
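(For concreteness, the hookup Gabe is proposing would look something like the sketch below in an fs.py-style config script. This is only a guess at the shape of it: the class and port names here, Bus, BaseCache, walker.port, cpu_side, are assumptions about m5's Python objects, not verified against any particular revision.)

    # Sketch of "a bus below the CPU" so the table walkers go through the L1
    # dcache. All class and port names are assumptions, not verified API.
    from m5.objects import *

    cpu = TimingSimpleCPU()
    icache = BaseCache(size='32kB', assoc=2)   # parameter list abbreviated
    dcache = BaseCache(size='64kB', assoc=2)

    # A private bus between the CPU side and the dcache, so the walkers'
    # requests enter the hierarchy through the L1 like ordinary data accesses.
    dcache_bus = Bus()
    cpu.dcache_port = dcache_bus.port
    cpu.itb.walker.port = dcache_bus.port   # hypothetical walker port names
    cpu.dtb.walker.port = dcache_bus.port
    dcache.cpu_side = dcache_bus.port

    # The icache stays directly attached and effectively read-only.
    cpu.icache_port = icache.cpu_side

The point of the extra bus is just that sub-block requests then arrive at the dcache's cpu-side port, where they are expected, instead of appearing as snoops on its mem-side port.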
>>>> Even if this isn't really a "fix" or the "right thing" to do, I'd still like to try it temporarily, at least to see if it corrects the problem I'm seeing.
>>>>
>>>> Gabe
>>>>
>>>> Ali Saidi wrote:
>>>>
>>>>> I haven't seen any strange behavior yet. That isn't to say it's not going to cause an issue in the future, but we've taken many a TLB miss and it hasn't fallen over yet.
>>>>>
>>>>> Ali
>>>>>
>>>>> On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt <[email protected]> wrote:
>>>>>
>>>>>> Yea, I just got around to reading this thread and that was the point I was going to make... the L1 cache effectively serves as a translator between the CPU's word-size read & write requests and the coherent block-level requests that get snooped. If you attach a CPU-like device (such as the table walker) directly to an L2, the CPU-like accesses that go to the L2 will get sent to the L1s, but I'm not sure they'll be handled correctly. Not that they fundamentally couldn't be; this just isn't a configuration we test, so it's likely that there are problems... for example, the L1 may try to hand ownership to the requester, but the requester won't recognize that and things will break.
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black <[email protected]> wrote:
>>>>>>
>>>>>> What happens if an entry is in the L1 but not the L2?
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>> Ali Saidi wrote:
>>>>>> > Between the L1 and L2 caches seems like a good place to me. The caches can cache page table entries; otherwise a TLB miss would be even more expensive than it is. The L1 isn't normally used for such things, since it would get polluted (look at why SPARC has a "load 128 bits from L2, do not allocate into L1" instruction).
>>>>>> >
>>>>>> > Ali
>>>>>> >
>>>>>> > On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:
>>>>>> >
>>>>>> >> For anybody waiting for an x86 FS regression (yes, I know, you can all hardly wait, but don't let this spoil your Thanksgiving), I'm getting closer to having it working, but I've discovered some issues with the mechanisms behind the --caches flag with fs.py and x86. I'm surprised I never thought to try it before. It also brings up some questions about where the table walkers should be hooked up in x86 and ARM. Currently it's after the L1, if any, but before the L2, if any, which seems wrong to me. Also, caches don't seem to propagate requests upwards to the CPUs, which may or may not be an issue. I'm still looking into that.
>>>>>> >> Gabe
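(Tying off Steve's point above about ownership handoff: the failure mode would look roughly like the toy sketch below. This is plain Python rather than m5's actual packet and coherence code, and all names are illustrative.)

    # Toy illustration of the ownership-handoff hazard: an L1 that owns a
    # dirty block answers a snoop by passing ownership along with the data,
    # but a CPU-like requester such as a table walker has no cache to
    # accept ownership into.

    class OwnerL1:
        def __init__(self, dirty_data):
            self.state, self.data = "M", dirty_data

        def respond_to_snoop(self):
            # Supply the dirty block and transfer ownership to the requester
            # rather than writing back to memory first.
            self.state = "I"
            return {"data": self.data, "owned": True}

    class TableWalker:
        def receive(self, resp):
            # A word-sized requester just consumes the data it asked for;
            # the "owned" flag is dropped, so the only dirty copy vanishes.
            return resp["data"][:8]  # e.g. one page table entry

    l1 = OwnerL1(b"\xab" * 64)
    entry = TableWalker().receive(l1.respond_to_snoop())
    # Now no cache owns the block and memory still holds stale data: broken.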
