Hi Tim,

> On 18/06/2015 15:35, Joel Hestness wrote:
>
>>    I think we should keep the cache tracing functionality. I've used cache
>> warm-up after taking repeated checkpoints to find particular system
>> activity levels, and I only restore+simulate those that meet some criteria
>> (similar to SimPoints). Often, the intervals simulated between these
>> checkpoints are fine-grained, so it is important to be able to do cache
>> cool-down and warm-up in a slim, automatic way. I expect we would rather
>> not try to figure out another means of warming up caches.
>>
> Ah, I see.  No worries, I just wanted to be sure that we want to keep
> this functionality before I dive into it too much.


Alright. Thanks.


>>    Can you describe the restore problems you're running into? Perhaps we
>> can help debug.
>>
>
> The biggest problem is a seg fault within the simulated program, which
> happens almost immediately after restoration.  An example output is below.
> I think my first step will be to verify that the blocks in each of the
> caches at the point the checkpoint is taken are the same as after the
> checkpoint has been restored.
>
> Cheers
> Tim
>
>
> x264[1016]: segfault at 28 ip 00007ffbcb65dbc4 sp 00007fffe7d5f780 error 4 in ld-2.6.1.so[7ffbcb655000+1b000]
> ------------[ cut here ]------------
> kernel BUG at mm/mmap.c:2274!
> invalid opcode: 0000 [#1] SMP
> CPU 0
> Modules linked in:
>
> Pid: 1016, comm: x264 Not tainted 3.2.24 #1
> RIP: 0010:[<ffffffff810c00ae>]  [<ffffffff810c00ae>] exit_mmap+0xfe/0x100
> RSP: 0000:ffff88001e1cfc08  EFLAGS: 0000022c
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88001e1cffd8
> RDX: 000000000000006a RSI: ffff88001e1a1330 RDI: ffff88001e81e380
> RBP: ffff88001e1b1740 R08: 0000000000000000 R09: 0000000000000000
> R10: ffff88001e8fb8d0 R11: ffffffff81739cc0 R12: 00007fffe7e00000
> R13: 0000000000000000 R14: 000000000000012a R15: ffff88001e930930
> FS:  00007ffbcb8666f0(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007ffbcb65dbc4 CR3: 00000000016e3000 CR4: 00000000000006b0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
> Process x264 (pid: 1016, threadinfo ffff88001e1ce000, task ffff88001e930930)
> Stack:
>  00000000000006d5 ffff88001e1b1740 000000011e1cfc88 ffff88001e1cfc28
>  0000000000000000 0000000800000000 ffffea00006e6a38 ffffea00006e6a00
>  ffffea00006e69c8 ffffea00006e6990 ffffea00006e7100 ffffea00006e70c8
> Call Trace:
>  [<ffffffff810385d0>] ? mmput+0x30/0xe0
>  [<ffffffff8103cd07>] ? exit_mm+0xf7/0x120
>  [<ffffffff8105c2da>] ? hrtimer_try_to_cancel+0x6a/0xb0
>  [<ffffffff8103e433>] ? do_exit+0x133/0x760
>  [<ffffffff8103ead8>] ? do_group_exit+0x38/0xa0
>  [<ffffffff8104c95e>] ? get_signal_to_deliver+0x19e/0x550
>  [<ffffffff810016e4>] ? do_signal+0x44/0x700
>  [<ffffffff814aac0d>] ? do_page_fault+0x3fd/0x490
>  [<ffffffff8110ae3b>] ? fsnotify+0x24b/0x330
>  [<ffffffff8102b8df>] ? __wake_up+0x2f/0x50
>  [<ffffffff81223d10>] ? process_echoes+0x20/0x20
>  [<ffffffff81001e09>] ? do_notify_resume+0x49/0x50
>  [<ffffffff814a7d36>] ? retint_signal+0x3d/0x77
> Code: e8 08 72 ff ff 0f 1f 84 00 00 00 00 00 48 89 df e8 68 d7 ff ff 48 85 c0 48 89 c3 75 f0 48 83 bd e0 00 00 00 00 0f 84 2a ff ff ff <0f> 0b 41 57 41 56 41 55 49 89 fd 41 54 49 89 f4 55 53 48 83 ec
> RIP  [<ffffffff810c00ae>] exit_mmap+0xfe/0x100
>  RSP <ffff88001e1cfc08>
> ---[ end trace d214638988f52ea9 ]---
> Fixing recursive fault but reboot is needed!


Unsolicited hint: this looks like it could be a bug in x264 or in its
libraries, depending on which version of Linux you've booted. ld-2.6.1 is
7 years old, and may only work with older Linux kernels (e.g. 2.6.28.4).
If you're booting a newer kernel, you might be running into a
kernel/library version mismatch. Have you tried running a simpler
application after checkpoint restore (e.g. hello in
gem5/tests/test-progs/hello/src/)?
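
For the cache comparison you mentioned, here is a rough sketch of how the
before/after check could be scripted. It assumes you can get each cache's
valid blocks into a text file with one "<cache> <hex address> <state>" line
per block. That dump format and the function names are made up for
illustration; they are not something gem5 emits as-is, so the parsing would
need adapting to whatever you actually print.

```python
import sys
from collections import defaultdict


def parse_dump(lines):
    """Parse '<cache> <hex addr> <state>' lines into {cache: {addr: state}}.

    The line format is a hypothetical one chosen for this sketch.
    Blank lines and lines starting with '#' are skipped.
    """
    caches = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cache, addr, state = line.split()
        caches[cache][int(addr, 16)] = state
    return caches


def diff_dumps(before, after):
    """Return (cache, addr, state_before, state_after) for every mismatch.

    A state of None means the block was absent from that dump.
    """
    mismatches = []
    for cache in sorted(set(before) | set(after)):
        b = before.get(cache, {})
        a = after.get(cache, {})
        for addr in sorted(set(b) | set(a)):
            if b.get(addr) != a.get(addr):
                mismatches.append((cache, addr, b.get(addr), a.get(addr)))
    return mismatches


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python <this script> at_checkpoint.txt after_restore.txt
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        diffs = diff_dumps(parse_dump(f_before), parse_dump(f_after))
    for cache, addr, s_before, s_after in diffs:
        print(f"{cache} {addr:#x}: {s_before} -> {s_after}")
    print(f"{len(diffs)} mismatching block(s)")
```

If the two dumps match block for block, the caches are probably not the
culprit and the kernel/library angle becomes more likely.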


  Joel


-- 
  Joel Hestness
  PhD Candidate, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
