I had the same assertion problem when switching CPUs. I think there is a bug in the DRAM memory latency calculation (if you are using DRAMMemory, you should see this). The 'lat' variable is declared as an integer, and in some cases it overflows and goes negative (when the busy_until value for the current bank is large). When that negative value is eventually passed along, it reaches tport.cc and the assertion fails. I fixed it by declaring 'lat' within drammemory.cc as uint64_t and adding a separate check that 'lat' is always > 0.
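Roughly, the fix looks like the following sketch (the variable and field names are only illustrative; the actual latency expression in drammemory.cc differs):

    // drammemory.cc (sketch): use an unsigned 64-bit type so a large
    // busy_until on the current bank cannot wrap the computed latency
    // to a negative int.
    uint64_t lat = bank_busy_until - curTick + access_latency;  // was: int lat = ...
    assert(lat > 0);  // sanity check that the latency stays positive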

Secondly, the assertion in tport.cc that checks the timing against 1 ms should be modified. With more than 8 cores I often see latencies above 1 ms, so I changed that assertion to use "ms*10".
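As a sketch, the relaxed check in SimpleTimingPort::schedSendTiming() (mem/tport.cc) would look roughly like this (the exact name of the millisecond constant may differ in your tree):

    assert(when > curTick);                        // unchanged
    assert(when < curTick + Clock::Int::ms * 10);  // was: curTick + Clock::Int::ms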

- Sujay


--------------------------------------------------
From: "Joe Gross" <[email protected]>
Sent: Thursday, June 24, 2010 2:38 PM
To: <[email protected]>
Subject: Re: [m5-users] Segfault in Cache

Hey Steve,

I ran m5.opt so the assertion statements would work, so my command line was like this: ~/m5/build/ALPHA_FS/m5.opt ~/m5/configs/example/fs.py -n 4 --l3cache --detailed --caches -F 1000000000000 --script=<PARSEC script>

We added an option for L3 caches, which are configured as I mentioned below. I used the autogenerated PARSEC scripts and the image file provided by UTexas.

Actually, I got four different types of errors: two I/O-related and two cache-related. But the errors appear for any PARSEC workload, so pick one and try it.

e.g.
/home/joe/m5/build/ALPHA_FS/m5.opt /home/joe/m5/configs/example/fs.py -n 4 --l3cache --detailed --caches -F 1000000000 --script=/home/joe/dramsimii/m5/configs/boot/dedup_4c_simmedium.rcS
...
m5.opt: build/ALPHA_FS/mem/tport.cc:110: void SimpleTimingPort::schedSendTiming(Packet*, Tick): Assertion `when > curTick' failed.
Program aborted at cycle 2251954476108

The assertion error happens fairly quickly once it switches to detailed mode so it shouldn't take long to test.

Joe

On 6/23/2010 1:14 AM, Steve Reinhardt wrote:
Hi Joe,

Can you give a command line for a run that's giving you trouble?

Thanks,

Steve

On Fri, Jun 18, 2010 at 9:26 AM, Joe Gross<[email protected]>  wrote:

We're getting a variety of assertion errors in the memory and I/O subsystem
after the latest round of patches. Things like:

panic: I/O Read - va0x801fc000005 size 1

or

build/ALPHA_FS/mem/cache/mshr.hh:228: MSHR::Target* MSHR::getTarget() const:
Assertion `hasTargets()' failed.

We make the L3 cache like this, essentially:
class L1Cache(BaseCache):
    size = '32kB'
    assoc = 4
    block_size = 64
    latency = '3ns'
    num_cpus = 1
    mshrs = 10
    tgts_per_mshr = 5

class L2Cache(BaseCache):
    size = '256kB'
    assoc = 8
    block_size = 64
    latency = '9ns'
    num_cpus = 1
    mshrs = 20
    tgts_per_mshr = 12

class L3Cache(BaseCache):
    size = '8MB'
    assoc = 16
    block_size = 64
    latency = '25ns'
    num_cpus = 4
    mshrs = 80
    tgts_per_mshr = 12

and config_cache is changed to include:
        system.l3cache = L3Cache()
        system.tol3bus = Bus()
        system.l3cache.cpu_side = system.tol3bus.port
        system.l3cache.mem_side = system.membus.port
        system.l3cache.num_cpus = options.num_cpus

        for i in xrange(options.num_cpus):
            system.cpu[i].addTwoLevelCacheHierarchy(L1Cache(), L1Cache(), L2Cache())
            system.cpu[i].connectMemPorts(system.tol3bus)

Anyone know what's going wrong with this configuration? It worked for most
PARSEC workloads before the updates in the past 3-4 days.

Joe


On 6/17/2010 2:39 PM, Joe Gross wrote:

That seems to fix the segfault problem, based on my testing thus far.

I seem to have run across another obscure bug as well:
panic: Inconsistent DMA transfer state: dmaState = 2 devState = 0
   @ cycle 437119165559000

#0  0x00007ffff5c5b4b5 in *__GI_raise (sig=<value optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff5c5ef50 in *__GI_abort () at abort.c:92
#2  0x00000000007086e5 in __exit_message (prefix=<value optimized out>,
code=<value optimized out>, func=<value optimized out>, file=<value
optimized out>, line=323,
      fmt=<value optimized out>, a01=..., a02=..., a03=..., a04=...,
a05=..., a06=..., a07=..., a08=..., a09=..., a10=..., a11=..., a12=...,
a13=..., a14=..., a15=..., a16=...)
      at build/ALPHA_FS/base/misc.cc:90
#3  0x00000000008eac25 in IdeDisk::doDmaTransfer (this=<value optimized
out>) at build/ALPHA_FS/dev/ide_disk.cc:323
#4  0x0000000000907a42 in DmaPort::recvTiming (this=0x2689370,
pkt=0xeb05620) at build/ALPHA_FS/dev/io_device.cc:146
#5  0x00000000004abb15 in Port::sendTiming (this=0x24f16d0,
pkt=0xeb05620) at build/ALPHA_FS/mem/port.hh:186
#6  Bus::recvTiming (this=0x24f16d0, pkt=0xeb05620) at
build/ALPHA_FS/mem/bus.cc:243
#7  0x00000000004dbd5b in Port::sendTiming (this=0x158e9d0) at
build/ALPHA_FS/mem/port.hh:186
#8  SimpleTimingPort::sendDeferredPacket (this=0x158e9d0) at
build/ALPHA_FS/mem/tport.cc:150
#9  0x0000000000413754 in EventQueue::serviceOne (this=<value optimized
out>) at build/ALPHA_FS/sim/eventq.cc:203
#10 0x000000000045e56a in simulate (num_cycles=9223372036854775807) at
build/ALPHA_FS/sim/simulate.cc:73

I was running dedup, from PARSEC, using the kernel and compiled binaries
that are provided.

Prior to that one I ran into a bug in the simulated program (same
simulator run, first attempt at running dedup):
[HOOKS] Entering ROI
*** glibc detected *** ./dedup: corrupted double-linked list:
0x00000200185351a0 ***

I don't think these are related to the 3-level cache setup, but I can't
be sure. We're still using PhysicalMemory also.

Joe


On 6/16/2010 1:21 PM, Steve Reinhardt wrote:


Yea, that even looks familiar... I feel like I've fixed this once
already, but obviously if I did it didn't get checked in.  I think
just changing this conditional to:
      if (prefetcher && !mshrQueue.isFull()) {
should take care of it.  Let me know either way, and if it does I'll
commit the fix.

Steve


On Wed, Jun 16, 2010 at 10:04 AM, Joe Gross<[email protected]> wrote:



Hello,

We've modified the cache hierarchy to include private L1, private L2 and a
shared L3 cache, but we get a segfault when running certain PARSEC
workloads. The prefetcher value is NULL, as we have it turned off because
the prefetcher is causing problems lately. Specifically, the problem is at:

Program received signal SIGSEGV, Segmentation fault.
BasePrefetcher::getPacket (this=0x0) at
build/ALPHA_FS/mem/cache/prefetch/base.cc:135
135         if (pf.empty()) {
(gdb) bt
#0  BasePrefetcher::getPacket (this=0x0) at
build/ALPHA_FS/mem/cache/prefetch/base.cc:135
#1  0x0000000000503ceb in Cache<LRU>::getNextMSHR() ()
#2  0x0000000000503dc0 in Cache<LRU>::getTimingPacket() ()
#3  0x000000000050a589 in Cache<LRU>::MemSidePort::sendPacket() ()
#4 0x0000000000413754 in EventQueue::serviceOne (this=<value optimized
out>) at build/ALPHA_FS/sim/eventq.cc:203
#5 0x000000000045e56a in simulate (num_cycles=9223372036854775807) at
build/ALPHA_FS/sim/simulate.cc:73
...

I think the problem is in cache_impl.hh at line 1305:

PacketPtr pkt = prefetcher->getPacket();

Other points in the cache always have a conditional "if (prefetcher)" before
they access it, but not at this point. Hopefully this is the only problem
and is easily fixed.
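
In other words, a minimal sketch of what the guard might look like (the surrounding context in the cache code is assumed):

    // cache_impl.hh, around line 1305 (sketch): add the same "if (prefetcher)"
    // guard that the other call sites use before dereferencing it.
    PacketPtr pkt = NULL;
    if (prefetcher)
        pkt = prefetcher->getPacket();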

Joe