OK, I found my problem... I had disabled dramsim in my scripts so I
could isolate the earlier bug in the cpu, and forgot to re-enable it.
Now that I'm really using dramsim I do see the hasTargets() assertion
failure.  I haven't tried to reproduce the newer one.  Anyway, that
seems to indicate that the bug is in your code... I ran under the
debugger to find the address in the packet it's dying on (0x149a000),
then re-ran like this:

build/ALPHA_FS/m5.opt -d nopre-trace  --trace-start=1813000000000
--trace-flags=Cache,Bus configs/example/dramsimfs.py -b stream -F
10000000000 --mp "physicaladdressmappingpolicy sdramhiperf
commandorderingalgorithm bankroundrobin rowBufferPolicy closepage"
--nopre |& egrep -i '[ x:]149a0[0-3][0-9a-f]' >& trace.out

and found this at the end of the trace:
1813625971850: system.membus: recvTiming: src 2 dst 3 ReadResp 0x149a000
1813625971850: system.iocache: Handling response to 149a000
1813625971850: system.iocache: Block for addr 149a000 being updated in Cache
1813625971850: system.iocache: replacement: replacing 1faffc00 with
149a000: writeback
1813625971850: system.iocache: Block addr 149a000 moving from state 0 to 5
1813625971851: system.membus: recvTiming: src 2 dst 3 ReadResp 0x149a000 BUSY
1813625973000: system.tol2bus: recvTiming: src 2 dst 0 ReadResp 0x149a000
1813625973001: system.tol2bus: recvTiming: src 2 dst 0 ReadResp 0x149a000 BUSY
1813625973390: system.membus: recvTiming: src 2 dst 3 ReadResp 0x149a000
1813625973390: system.iocache: Handling response to 149a000

...which shows that the iocache is getting two ReadResp responses to a
single request.  I did a little more digging and found that the
problem is in M5dramSystem::MemoryPort::recvTiming(), where you check
memory->needRetry and return false *before* you check
packet->memInhibitAsserted().  That ordering causes a packet with
memInhibit asserted to be retried.  After the cache that asserted
memInhibit sends its response, the bus retries the transaction, but
the cache won't inhibit memory a second time, so your memory module
ends up sending a second response.  I just reordered those two checks
in that function and it seems to work.
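
For reference, the reordered function looks roughly like this -- just a
sketch from memory, so treat everything around the two checks (the
delete, the fall-through to the normal DRAMsim enqueue path) as my
paraphrase rather than your actual code:

bool
M5dramSystem::MemoryPort::recvTiming(PacketPtr pkt)
{
    // Check memInhibitAsserted() first: if a cache has claimed this
    // request, it will supply the data, so memory must neither respond
    // nor force a retry.  A retried packet won't be inhibited a second
    // time, which is exactly what produced the duplicate ReadResp above.
    if (pkt->memInhibitAsserted()) {
        delete pkt;    // still the target's job to dispose of the packet
        return true;
    }

    // Only now apply flow control to requests memory actually has to handle.
    if (memory->needRetry)
        return false;

    // ... normal path: hand the request off to DRAMsim ...
    return true;
}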

I'm going to add an assert in bus.cc so that this problem can be
detected more easily in the future.
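It'll probably go somewhere in Bus::recvTiming(), at the point where the
target port refuses the packet and the bus schedules a retry -- something
like the following sketch (the surrounding names are made up; only the
assert itself is the point):

    // Sketch only; the exact context in bus.cc will differ.
    if (!port->sendTiming(pkt)) {
        // A target must never ask for a retry on a packet that a cache
        // has already claimed via memInhibit: the retried packet won't
        // be inhibited again, and memory will respond a second time.
        assert(!pkt->memInhibitAsserted() &&
               "mem-inhibited packet was retried by its target");
        // ... existing retry bookkeeping ...
    }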

Steve

On Wed, Sep 30, 2009 at 9:43 PM, Joe Gross <[email protected]> wrote:
> I'm also seeing a new one when I disable prefetchers and run stream again:
>  build/ALPHA_FS/mem/cache/cache_impl.hh:1335: Packet*
> Cache<TagStore>::getTimingPacket() [with TagStore = LRU]: Assertion
> `tags->findBlock(mshr->addr) == __null' failed.
>
> command line:
> m5/configs/example/dramsimfs.py -F 10000000000 --mp
> "physicaladdressmappingpolicy sdramhiperf commandorderingalgorithm
> bankroundrobin rowBufferPolicy closepage" -b stream
>
> I wonder if this could be the memory simulator handling the packet
> requests incorrectly, i.e., returning a response for a packet that
> didn't need one, or something like that.
>
> The prefetcher was disabled, and the caches were set as follows:
> test_sys.l2 = L2Cache(size = '512kB', assoc=16, latency="6ns")
> test_sys.cpu[i].addPrivateSplitL1Caches(L1Cache(size = '64kB'),
>                                         L1Cache(size = '64kB'))
>
> If you aren't able to replicate these, then we should compare setups and
> fix whatever's going on with my system. Or if you need any other info on
> how I'm running simulations, please let me know.
>
> Joe
>
> Steve Reinhardt wrote:
>> Thanks... I think you had mentioned that assertion failure before, but
>> it's not one I've been able to reproduce, so I forgot about it.  Sorry
>> about that.  I'll try again to see if I can reproduce it.
>>
>> Steve
>>
>> On Tue, Sep 29, 2009 at 12:33 AM, Joe Gross <[email protected]> wrote:
>>
>>> Steve,
>>>
>>> Thanks for looking into this. I'm able to get the stream benchmark to
>>> run to completion now.
>>>
>>> I'm not sure if this is the problem you're talking about (so sorry in
>>> advance if this is a repeat), but I'm still having trouble running bzip2
>>> and getting the assertion error:
>>>  build/ALPHA_FS/mem/cache/mshr.hh:228: MSHR::Target* MSHR::getTarget()
>>> const: Assertion `hasTargets()' failed.
>>>
>>> This turns into a segfault on m5.fast.
>>>
>>> I pulled your changes a few days ago and have been testing them out
>>> after updating a few of my files (changes pushed to my head rev). I
>>> tried running again with the prefetcher moved to the L1 cache, and
>>> even with all prefetchers removed from the system I still get this
>>> assertion, so I think this isn't related to the prefetcher, but I
>>> could be wrong. I also removed the old workaround of increasing
>>> tgts_per_mshr, and the assertion remains.
>>>
>>> I use these flags when running to get the assertion error (config files
>>> no longer necessary):
>>> m5/configs/example/dramsimfs.py -b bzip2 -F 10000000000 --mp
>>> "physicaladdressmappingpolicy sdramhiperf commandorderingalgorithm
>>> bankroundrobin rowBufferPolicy closepage"
>>>
>>> Joe
>>>
>>> Steve Reinhardt wrote:
>>>
>>>> Joe,
>>>>
>>>> I think with the changes I pushed recently both the CPU stalling and
>>>> the assertion failure should be taken care of.  Note that if you pull
>>>> from the head you'll have to make some minor changes in your python
>>>> scripts because of some code reorganization that Nate did.
>>>>
>>>> Unfortunately I've run into another bug that's more complicated to
>>>> fix: because our cache hierarchy does not enforce inclusion, it's
>>>> possible for the L2 prefetcher to prefetch a block that's already
>>>> modified in the L1 and end up prefetching a stale copy out of main
>>>> memory.  In at least one situation (where the L1 writes its modified
>>>> copy back to the L2 in the interval between when the L2's prefetch is
>>>> issued and when the prefetch response returns), the stale copy can
>>>> overwrite the modified copy and data is lost.  I think the best way to
>>>> address this is to have L2 prefetches probe the L1s, but there's no
>>>> clean mechanism for that right now, so it's not a trivial fix.  In the
>>>> meantime, having the prefetcher be associated with the L1 dcache and
>>>> not the L2 should be sufficient to avoid this problem.
>>>>
>>>> Please let us know if you run into anything else...
>>>>
>>>> Steve
