Hi Joel,

On 18/06/2015 15:35, Joel Hestness wrote:
   I think we should keep the cache tracing functionality. I've used cache
warm-up after taking repeated checkpoints to find particular system
activity levels, and I only restore+simulate those that meet some criteria
(i.e. like simpoints). Often, the intervals simulated between these
checkpoints are fine-grained, so it is important to be able to do cache
cool-down and warm-up in a slim, automatic way. I expect we would rather
not try to figure out another means of warming up caches.

Ah, I see. No worries; I just wanted to be sure that we want to keep this functionality before I dive into it too deeply.

   Can you describe the restore problems you're running into? Perhaps we can
help debug.

The biggest problem is a segfault within the simulated program, which happens almost immediately after restoration. An example of the output is below. I think my first step will be to verify that the blocks in each of the caches at the point the checkpoint is taken are the same as the blocks present after the checkpoint has been restored.
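
The cache comparison described above could be sketched roughly as follows. This is only an illustration, and it assumes that each cache's contents can be written out as text lines of the form "<cache> <hex block address> <state>" both when the checkpoint is taken and again after it is restored. That dump format and the function names are made up for the example; they are not an existing gem5 feature.

```python
from typing import Dict, Iterable, List, Tuple

# (cache name, block address); both come from the hypothetical dump format.
Key = Tuple[str, int]


def parse_dump(lines: Iterable[str]) -> Dict[Key, str]:
    """Parse lines of the assumed form '<cache> <hex block addr> <state>'."""
    blocks: Dict[Key, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cache, addr, state = line.split()
        blocks[(cache, int(addr, 16))] = state
    return blocks


def diff_dumps(before: Iterable[str], after: Iterable[str]) -> Dict[str, List]:
    """Report blocks that were lost, gained, or changed state across a restore."""
    b = parse_dump(before)
    a = parse_dump(after)
    return {
        # Present when the checkpoint was taken, absent after restore.
        "missing": sorted(k for k in b if k not in a),
        # Absent when the checkpoint was taken, present after restore.
        "extra": sorted(k for k in a if k not in b),
        # Present in both, but with a different state.
        "changed": sorted((k, b[k], a[k]) for k in b if k in a and b[k] != a[k]),
    }


if __name__ == "__main__":
    at_checkpoint = ["l1d 0x1000 M", "l1d 0x1040 S", "l2 0x1000 S"]
    after_restore = ["l1d 0x1000 S", "l2 0x1000 S", "l2 0x2000 E"]
    for kind, entries in diff_dumps(at_checkpoint, after_restore).items():
        print(kind, entries)
```

If all three lists come back empty for every cache, the restored contents match the checkpointed ones under this format, and the fault would have to come from somewhere else.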

Cheers
Tim


x264[1016]: segfault at 28 ip 00007ffbcb65dbc4 sp 00007fffe7d5f780 error 4 in ld-2.6.1.so[7ffbcb655000+1b000]
------------[ cut here ]------------
kernel BUG at mm/mmap.c:2274!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in:

Pid: 1016, comm: x264 Not tainted 3.2.24 #1
RIP: 0010:[<ffffffff810c00ae>]  [<ffffffff810c00ae>] exit_mmap+0xfe/0x100
RSP: 0000:ffff88001e1cfc08  EFLAGS: 0000022c
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88001e1cffd8
RDX: 000000000000006a RSI: ffff88001e1a1330 RDI: ffff88001e81e380
RBP: ffff88001e1b1740 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88001e8fb8d0 R11: ffffffff81739cc0 R12: 00007fffe7e00000
R13: 0000000000000000 R14: 000000000000012a R15: ffff88001e930930
FS:  00007ffbcb8666f0(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ffbcb65dbc4 CR3: 00000000016e3000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
Process x264 (pid: 1016, threadinfo ffff88001e1ce000, task ffff88001e930930)
Stack:
 00000000000006d5 ffff88001e1b1740 000000011e1cfc88 ffff88001e1cfc28
 0000000000000000 0000000800000000 ffffea00006e6a38 ffffea00006e6a00
 ffffea00006e69c8 ffffea00006e6990 ffffea00006e7100 ffffea00006e70c8
Call Trace:
 [<ffffffff810385d0>] ? mmput+0x30/0xe0
 [<ffffffff8103cd07>] ? exit_mm+0xf7/0x120
 [<ffffffff8105c2da>] ? hrtimer_try_to_cancel+0x6a/0xb0
 [<ffffffff8103e433>] ? do_exit+0x133/0x760
 [<ffffffff8103ead8>] ? do_group_exit+0x38/0xa0
 [<ffffffff8104c95e>] ? get_signal_to_deliver+0x19e/0x550
 [<ffffffff810016e4>] ? do_signal+0x44/0x700
 [<ffffffff814aac0d>] ? do_page_fault+0x3fd/0x490
 [<ffffffff8110ae3b>] ? fsnotify+0x24b/0x330
 [<ffffffff8102b8df>] ? __wake_up+0x2f/0x50
 [<ffffffff81223d10>] ? process_echoes+0x20/0x20
 [<ffffffff81001e09>] ? do_notify_resume+0x49/0x50
 [<ffffffff814a7d36>] ? retint_signal+0x3d/0x77
Code: e8 08 72 ff ff 0f 1f 84 00 00 00 00 00 48 89 df e8 68 d7 ff ff 48 85 c0 48 89 c3 75 f0 48 83 bd e0 00 00 00 00 0f 84 2a ff ff ff <0f> 0b 41 57 41 56 41 55 49 89 fd 41 54 49 89 f4 55 53 48 83 ec
RIP  [<ffffffff810c00ae>] exit_mmap+0xfe/0x100
 RSP <ffff88001e1cfc08>
---[ end trace d214638988f52ea9 ]---
Fixing recursive fault but reboot is needed!

--
Timothy M. Jones
http://www.cl.cam.ac.uk/~tmj32/
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
