The patches I referred to have since been committed; if you're using
the latest code from the development repository, you've got them.

Steve

On Sun, Oct 3, 2010 at 6:04 AM, Lesha Jolondz <[email protected]> wrote:
> Hi Steve,
>
> I am seeing the same problem when running the PARSEC benchmarks in a 4-core
> configuration with a shared L2 cache. Could you please send me your patches?
>
> Thanks,
> Aleksei
>
> On Tue, Aug 24, 2010 at 12:27 AM, Steve Reinhardt <[email protected]> wrote:
>>
>> Hi Stijn,
>>
>> It's true that there are subtle bugs in the coherence protocol that
>> seem to appear only when you use a different DRAM model that creates
>> different timings.  I spent a fair amount of time a month or two ago
>> to try and fix things up, and I made some progress, but it's hard to
>> fix one subtle bug without introducing another one, and I got busy
>> with other things before I could wrap it up.  I hope to get back to it
>> soon.  I could send you my patches if you are interested.
>>
>> This complexity is one of the reasons we're focusing more on Ruby as
>> our long-term memory system model.
>>
>> As far as the particular behavior you are seeing, note that the
>> protocol is configuration-independent, so the UpgradeReq has to be
>> passed through the L2 cache in case there are other L2 or L3 caches
>> that need to be invalidated.  Main memory is responsible for
>> responding to the UpgradeReq since that's the point where it globally
>> completes.  There's no need for the DRAM controller to access DRAM or
>> impose a corresponding delay though; it could just respond right away.
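
To make the last point concrete, here is a minimal sketch of a memory
controller that answers an UpgradeReq immediately, since an upgrade only asks
for write permission and carries no data. This is a toy model and not M5
code: ToyCmd, Tick and toyResponseDelay are hypothetical names chosen for
illustration, and the real DRAM model may be organized quite differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not the M5 types.
using Tick = std::uint64_t;
enum class ToyCmd { ReadReq, WriteReq, UpgradeReq };

// Delay the toy memory controller adds before responding.
// An UpgradeReq needs no DRAM access, so it gets no DRAM delay;
// every other command pays the full modelled latency.
Tick toyResponseDelay(ToyCmd cmd, Tick dramLatency)
{
    if (cmd == ToyCmd::UpgradeReq) {
        return 0;
    }
    return dramLatency;
}
```

With a modelled latency of 100 ticks, toyResponseDelay(ToyCmd::UpgradeReq, 100)
gives 0, while a ReadReq still pays the full 100.
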
>>
>> Steve
>>
>> On Mon, Aug 23, 2010 at 7:15 AM, Stijn Eyerman
>> <[email protected]> wrote:
>> > Hi,
>> >
>> > I'm simulating the PARSEC benchmarks using M5 (with the manual and files
>> > supplied by the UTAustin people). Some benchmarks execute without errors in
>> > functional mode, but seem to get stuck in an infinite loop when using timing
>> > simulation.
>> >
>> > One such benchmark is dedup with the test input. The cause is that one cpu
>> > (in this case cpu2) stops dispatching and issuing instructions before it has
>> > finished. After some days of debugging, I found this to be the cause:
>> > - cpu3 executes a store-conditional (SC)
>> > - the cache line is present in its private L1 cache, but its state is
>> >   shared, so an UpgradeReq event is scheduled
>> > - the L1 cache of cpu2 finds the same cache line and invalidates it
>> > - the UpgradeReq is also sent to the shared L2 cache (why?), where it
>> >   causes a miss (cache line not present)
>> > - main memory is accessed (why?) and the store cannot continue until
>> >   after the memory latency
>> > - in the meantime cpu2 executes a LoadLocked (LL) to the same cache line
>> > - since the cache line was invalidated, it accesses the bus
>> > - the L1 cache of cpu3 detects this load request, finds an mshr that
>> >   waits on the return of data for that cache line (the UpgradeReq),
>> >   attaches the request to that mshr's targets, and inhibits the L2 access
>> >   for that memory operation (since it will be served by the coherence
>> >   protocol)
>> > - when the memory has served the UpgradeReq from cpu3, the SC on cpu3 can
>> >   continue
>> > - the mshr finds another target (the LL from cpu2) but deletes it because
>> >   the cache line is not dirty
>> >   --> see Cache::handleSnoop in cache_impl.hh:
>> >                bool respond = blk->isDirty() && pkt->needsResponse();
>> >                ...
>> >                if (respond){
>> >                ...
>> >                }
>> >                else if (is_timing && is_deferred) {
>> >                    delete pkt;
>> >                }
>> > - the status of the LL in cpu2 remains issued, but since the request is
>> > deleted, no answer returns, and the cpu blocks forever
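
The sequence above can be condensed into a small toy model. This is only a
sketch under stated assumptions, not the M5 implementation: ToyBlock,
ToyPacket, SnoopOutcome, toyHandleSnoop and requesterGetsResponse are
hypothetical names, and the only logic taken from the thread is the
respond/delete decision quoted from Cache::handleSnoop. The second function
encodes the assumption that a requester whose L2 access was inhibited can
only be answered by the snooping cache.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not the M5 classes.
struct ToyBlock  { bool dirty; };
struct ToyPacket { bool needsResponse; };

enum class SnoopOutcome { Responded, Dropped, Ignored };

// Mirrors the decision quoted from Cache::handleSnoop: respond only if
// the block is dirty; otherwise a deferred timing snoop is deleted.
SnoopOutcome toyHandleSnoop(const ToyBlock& blk, const ToyPacket& pkt,
                            bool isTiming, bool isDeferred)
{
    const bool respond = blk.dirty && pkt.needsResponse;
    if (respond) {
        return SnoopOutcome::Responded;
    }
    if (isTiming && isDeferred) {
        return SnoopOutcome::Dropped;   // corresponds to "delete pkt"
    }
    return SnoopOutcome::Ignored;
}

// Assumption: if the snooper inhibited the lower-level (L2) access, the
// requester is answered only when the snooper itself responds.
bool requesterGetsResponse(SnoopOutcome outcome, bool lowerLevelInhibited)
{
    if (outcome == SnoopOutcome::Responded) {
        return true;
    }
    return !lowerLevelInhibited;
}
```

In the failing case the block in cpu3's L1 is clean, the LL from cpu2 is a
deferred timing snoop, and the L2 access was inhibited, so the toy model
drops the packet and requesterGetsResponse returns false: cpu2 waits forever.
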
>> >
>> > It is probably worth noting that I use the DRAM module to simulate physical
>> > memory. I've seen the warning that it is not tested with the current memory
>> > model, but as far as I can deduce, this is not the cause of the error (it
>> > just calculates and returns the memory latency).
>> >
>> > Can someone help me with this (complicated) problem?
>> >
>> > Thanks!
>> >
>> > Stijn
>> >
>> > --
>> > dr. ir. Stijn Eyerman
>> >
>> > Ghent University
>> > Electronics and Information Systems Department
>> > Sint-Pietersnieuwstraat 41
>> > 9000 Gent
>> > Belgium
>> >
>> > t: +32 9 264 3456
>> > f: +32 9 264 3594
>> > e: [email protected]
>> > w: http://www.elis.UGent.be/~seyerman/
>> >
>> > ----------------------------------------------------------------
>> > This message was sent using IMP, the Internet Messaging Program.
>> >
>> > _______________________________________________
>> > m5-users mailing list
>> > [email protected]
>> > http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>> >