Hi Tim,

> On 18/06/2015 15:35, Joel Hestness wrote:
>
>> I think we should keep the cache tracing functionality. I've used cache
>> warm-up after taking repeated checkpoints to find particular system
>> activity levels, and I only restore+simulate those that meet some criteria
>> (i.e. like simpoints). Often, the intervals simulated between these
>> checkpoints are fine-grained, so it is important to be able to do cache
>> cool-down and warm-up in a slim, automatic way. I expect we would rather
>> not try to figure out another means of warming up caches.
>
> Ah, I see. No worries, I just wanted to be sure that we want to keep
> this functionality before I dive into it too much.

Alright. Thanks.

>> Can you describe the restore problems you're running into? Perhaps we can
>> help debug.
>
> The biggest problem is a seg fault within the simulated program, which
> happens almost immediately after restoration. An example output is below.
> I think my first step will be to verify that the blocks in each of the
> caches at the point the checkpoint is taken is the same as after the
> checkpoint has been restored.
>
> Cheers
> Tim
>
>
> x264[1016]: segfault at 28 ip 00007ffbcb65dbc4 sp 00007fffe7d5f780 error 4
> in ld-2.6.1.so[7ffbcb655000+1b000]
> ------------[ cut here ]------------
> kernel BUG at mm/mmap.c:2274!
> invalid opcode: 0000 [#1] SMP
> CPU 0
> Modules linked in:
>
> Pid: 1016, comm: x264 Not tainted 3.2.24 #1
> RIP: 0010:[<ffffffff810c00ae>] [<ffffffff810c00ae>] exit_mmap+0xfe/0x100
> RSP: 0000:ffff88001e1cfc08 EFLAGS: 0000022c
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88001e1cffd8
> RDX: 000000000000006a RSI: ffff88001e1a1330 RDI: ffff88001e81e380
> RBP: ffff88001e1b1740 R08: 0000000000000000 R09: 0000000000000000
> R10: ffff88001e8fb8d0 R11: ffffffff81739cc0 R12: 00007fffe7e00000
> R13: 0000000000000000 R14: 000000000000012a R15: ffff88001e930930
> FS: 00007ffbcb8666f0(0000) GS:ffff88001fc00000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007ffbcb65dbc4 CR3: 00000000016e3000 CR4: 00000000000006b0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
> Process x264 (pid: 1016, threadinfo ffff88001e1ce000, task
> ffff88001e930930)
> Stack:
> 00000000000006d5 ffff88001e1b1740 000000011e1cfc88 ffff88001e1cfc28
> 0000000000000000 0000000800000000 ffffea00006e6a38 ffffea00006e6a00
> ffffea00006e69c8 ffffea00006e6990 ffffea00006e7100 ffffea00006e70c8
> Call Trace:
> [<ffffffff810385d0>] ? mmput+0x30/0xe0
> [<ffffffff8103cd07>] ? exit_mm+0xf7/0x120
> [<ffffffff8105c2da>] ? hrtimer_try_to_cancel+0x6a/0xb0
> [<ffffffff8103e433>] ? do_exit+0x133/0x760
> [<ffffffff8103ead8>] ? do_group_exit+0x38/0xa0
> [<ffffffff8104c95e>] ? get_signal_to_deliver+0x19e/0x550
> [<ffffffff810016e4>] ? do_signal+0x44/0x700
> [<ffffffff814aac0d>] ? do_page_fault+0x3fd/0x490
> [<ffffffff8110ae3b>] ? fsnotify+0x24b/0x330
> [<ffffffff8102b8df>] ? __wake_up+0x2f/0x50
> [<ffffffff81223d10>] ? process_echoes+0x20/0x20
> [<ffffffff81001e09>] ? do_notify_resume+0x49/0x50
> [<ffffffff814a7d36>] ? retint_signal+0x3d/0x77
> Code: e8 08 72 ff ff 0f 1f 84 00 00 00 00 00 48 89 df e8 68 d7 ff ff 48 85
> c0 48 89 c3 75 f0 48 83 bd e0 00 00 00 00 0f 84 2a ff ff ff <0f> 0b 41 57
> 41 56 41 55 49 89 fd 41 54 49 89 f4 55 53 48 83 ec
> RIP [<ffffffff810c00ae>] exit_mmap+0xfe/0x100
> RSP <ffff88001e1cfc08>
> ---[ end trace d214638988f52ea9 ]---
> Fixing recursive fault but reboot is needed!

Unsolicited hint: this looks like it could be a bug in x264 or libraries
(depending on which version of Linux you've booted). ld-2.6.1 is 7 years
old, and may only work with older versions of Linux (e.g. 2.6.28.4). If
you're booting a newer Linux kernel, you might be running into
kernel<>library version issues. Have you tried running a simpler
application on checkpoint restore (e.g. hello in
gem5/tests/test-progs/hello/src/)?

  Joel

-- 
  Joel Hestness
  PhD Candidate, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev