Dear Sir/Madam,
I am out of the office with limited internet access. Expect delays in my 
replies until 3 January.
Best regards,
Antonio

On Dec 14, 2021, at 16:14, Jordi Vaquero via gem5-users <gem5-users@gem5.org> 
wrote:

Hello Gelin and Giacomo,
Recently I was using gem5 in SE mode to run some SPEC 2006 benchmarks and I 
realized there is a checkpoint restore problem.
In my case it doesn't behave as in Gelin's case, but I thought it better to 
add it here than to create a new thread.

Some information:
        •       I am emulating RISC-V; I didn't check whether this also 
happens on other architectures.

        •       The specific test I am using is perlbench test1 from SPEC2006.

I created a checkpoint with gem5, and when restoring it the following error 
appears:

build/RISCV/arch/riscv/faults.cc:b4: panic: Illegal instruction 0x00000000 at 
pc 0x00000000000106c4:
Memory Usage: 1214492 KBytes
Program aborted at tick 4436647500
--- BEGIN LIBC BACKTRACE ---
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xba15cc)[0x55689b81c5cc]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xbbb2ba)[0x55689b8362ba]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f3dac504980]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f3daaae0fb7]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f3daaae2921]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x1f2a7f)[0x55689ae6da7f]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xa8ffe7)[0x55689b70afe7]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xa90190)[0x55689b70b190]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x95a714)[0x55689b5d5714]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x95b310)[0x55689b5d6310]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x95d330)[0x55689b5d8330]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x95d7a8)[0x55689b5d87a8]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x96c45b)[0x55689b5e745b]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xbac815)[0x55689b827815]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xbdc120)[0x55689b857120]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xbdca52)[0x55689b857a52]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xb91a0e)[0x55689b80ca0e]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x62e965)[0x55689b2a9965]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyCFunction_Call+0x96)[0x7f3dac924736]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x76e0)[0x7f3dac895b20]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f)[0x7f3dac88ca0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c0fc)[0x7f3dac88d0fc]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3)[0x7f3dac893303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803)[0x7f3dac88b803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be)[0x7f3dac88d2be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3)[0x7f3dac893303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f)[0x7f3dac88ca0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c0fc)[0x7f3dac88d0fc]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3)[0x7f3dac893303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f)[0x7f3dac88ca0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyEval_EvalCodeEx+0x3e)[0x7f3dac88d4ce]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyEval_EvalCode+0x1b)[0x7f3dac88e24b]
--- END LIBC BACKTRACE ---
Aborted (core dumped)

This occurs in gem5.opt as well as in gem5.debug.
After some debugging and tracing, comparing the full execution with the 
restored one, I found that the main difference is in the following piece of 
trace:

394631500: system.switch_cpus: A0 T0 : 0x62ee2 @Perl_yyparse+408    : lh s3, -1720(a4)           : MemRead :  D=0x0000000000000000 A=0x1362d0  FetchSeq=289047  CPSeq=255916  flags=(IsInteger|IsLoad)
394631500: system.switch_cpus: A0 T0 : 0x62ee6 @Perl_yyparse+412    : c_li a4, 1                 : IntAlu :  D=0x0000000000000001  FetchSeq=289048  CPSeq=255917  flags=(IsInteger)
394632000: system.switch_cpus: A0 T0 : 0x62ee8 @Perl_yyparse+414    : addi a3, zero, 197         : IntAlu :  D=0x00000000000000c5  FetchSeq=289049  CPSeq=255918  flags=(IsInteger)
394632000: system.switch_cpus: A0 T0 : 0x62eec @Perl_yyparse+418    : subw a4, a4, s3            : IntAlu :  D=0x0000000000000001  FetchSeq=289050  CPSeq=255919  flags=(IsInteger)
394632500: system.switch_cpus: A0 T0 : 0x62ef0 @Perl_yyparse+422    : c_slli a4, 3               : IntAlu :  D=0x0000000000000008  FetchSeq=289051  CPSeq=255920  flags=(IsInteger)
394633000: system.switch_cpus: A0 T0 : 0x62ef2 @Perl_yyparse+424    : c_add a4, s7               : IntAlu :  D=0x00000000001ae5d8  FetchSeq=289052  CPSeq=255921  flags=(IsInteger)
394633000: system.cpu.workload: Translating: 0x1ae5d8->0x515d8
394633000: system.cpu.dcache: access for ReadReq [515d8:515df] hit state: e (M) writable: 1 readable: 1 dirty: 1 prefetched: 0 | tag: 0xa secure: 0 valid: 1 | set: 0x57 way: 0x1
394634000: system.switch_cpus: A0 T0 : 0x62ef4 @Perl_yyparse+426    : c_ld a4, 0(a4)             : MemRead :  D=0xa423473885660018 A=0x1ae5d8  FetchSeq=289053  CPSeq=255922  flags=(IsInteger|IsLoad)
394634000: system.cpu.workload: Translating: 0x18c0c8->0x1c10c8
394634500: system.switch_cpus: A0 T0 : 0x62ef6 @Perl_yyparse+428    : sd a4, 200(s5)             : MemWrite :  D=0xa423473885660018 A=0x18c0c8  FetchSeq=289054  CPSeq=255923  flags=(IsInteger|IsStore)
394635000: system.cpu.dcache: access for WriteReq [1c10c8:1c10cf] hit state: e (M) writable: 1 readable: 1 dirty: 1 prefetched: 0 | tag: 0x38 secure: 0 valid: 1 | set: 0x43 way: 0x1
394635000: system.cpu.dcache: satisfyRequest for WriteReq [1c10c8:1c10cf] (write)
394635500: system.switch_cpus: A0 T0 : 0x62efa @Perl_yyparse+432    : bltu a3, a5, 414           : IntAlu :   FetchSeq=289055  CPSeq=255924  flags=(IsInteger|IsControl|IsDirectControl|IsCondControl)
394635500: system.switch_cpus: A0 T0 : 0x62efe @Perl_yyparse+436    : c_slli a5, 32              : IntAlu :  D=0x0000000400000000  FetchSeq=289056  CPSeq=255925  flags=(IsInteger)
394636000: system.switch_cpus: A0 T0 : 0x62f00 @Perl_yyparse+438    : lui a4, 309                : IntAlu :  D=0x0000000000135000  FetchSeq=289057  CPSeq=255926  flags=(IsInteger)
394636000: system.switch_cpus: A0 T0 : 0x62f04 @Perl_yyparse+442    : c_srli a5, 30              : IntAlu :  D=0x0000000000000010  FetchSeq=289058  CPSeq=255927  flags=(IsInteger)
394636500: system.switch_cpus: A0 T0 : 0x62f06 @Perl_yyparse+444    : addi a4, a4, 1560          : IntAlu :  D=0x0000000000135618  FetchSeq=289059  CPSeq=255928  flags=(IsInteger)
394637000: system.switch_cpus: A0 T0 : 0x62f0a @Perl_yyparse+448    : c_add a5, a4               : IntAlu :  D=0x0000000000135628  FetchSeq=289060  CPSeq=255929  flags=(IsInteger)
The trace shows that at tick 394634000 there is a C.LD instruction that loads 
from virtual address 0x1ae5d8 and returns 0xa423473885660018. In the trace 
from the full execution that load returns 0.
Looking for differences, I found that the main one is that the MMU translates 
that address to 0x515d8, a physical location that already contains data. 
Where does that data come from?
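To make the address arithmetic explicit, here is a small sketch that splits a `Translating:` trace line into page base and page offset. It assumes 4 KiB pages, which is consistent with the physical page 0x51000 mentioned further down; the helper name `split_translation` is made up for illustration and is not part of gem5.

```python
import re

PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# Matches the "Translating: <vaddr>-><paddr>" part of a gem5 trace line.
TRANSLATE_RE = re.compile(r"Translating: (0x[0-9a-fA-F]+)->(0x[0-9a-fA-F]+)")


def split_translation(line):
    """Return page bases and offsets for one translation line, or None."""
    m = TRANSLATE_RE.search(line)
    if m is None:
        return None
    vaddr, paddr = (int(g, 16) for g in m.groups())
    return {
        "vpage": vaddr & ~PAGE_MASK,
        "ppage": paddr & ~PAGE_MASK,
        "voffset": vaddr & PAGE_MASK,
        "poffset": paddr & PAGE_MASK,
    }


line = "394633000: system.cpu.workload: Translating: 0x1ae5d8->0x515d8"
info = split_translation(line)
print({name: hex(value) for name, value in info.items()})
```

The virtual and physical offsets should always match; only the page base changes, so the interesting question is which virtual pages ever pointed at the same physical page.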

Using gdb to check what puts that data there, and why the MMU considers that 
physical address free, I discovered that the area is populated during 
checkpoint restore, in the unserializeStore function, where it loads the 
system.physmem.store0.pmem file.
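One way to cross-check this without gdb is to read the bytes directly out of the memory image. This is only a sketch under two assumptions that should be verified against your gem5 version: that the `.pmem` file is a gzip-compressed raw dump of the backing store, and that the store begins at physical address 0. The helper names `read_pmem` and `read_u64_le` are hypothetical.

```python
import gzip


def read_pmem(path, paddr, length=8, store_base=0):
    """Return `length` bytes at physical address `paddr`.

    Assumes `path` is a gzip-compressed raw memory image whose first
    byte corresponds to physical address `store_base`.
    """
    with gzip.open(path, "rb") as image:
        image.seek(paddr - store_base)
        return image.read(length)


def read_u64_le(path, paddr, store_base=0):
    """Decode 8 bytes at `paddr` as a little-endian 64-bit value."""
    data = read_pmem(path, paddr, 8, store_base)
    return int.from_bytes(data, "little")


# Example call (path as it would appear inside a checkpoint directory):
# print(hex(read_u64_le("system.physmem.store0.pmem", 0x515d8)))
```

If the value at 0x515d8 in the image matches what the restored run loads, the stale content is coming from the checkpointed memory itself rather than from anything written after the restore.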

So that data comes from code that ran before the checkpoint. I therefore 
looked at the full execution trace to check where physical page 0x51000 is 
used.

   469397500: system.cpu: A0 T0 : 0x12ffc @Perl_yylex+6674    : c_jr a4                    : IntAlu :   FetchSeq=429379  CPSeq=363289  flags=(IsInteger|IsControl|IsIndirectControl|IsUncondControl|IsCall)
   469403000: system.cpu.workload: Translating: 0x1b580->0x51580
   469486000: system.cpu.workload: Translating: 0x1b5c0->0x515c0
   469488500: system.cpu: A0 T0 : 0x1b5b2 @Perl_yylex+40904    : jal zero, -12884           : IntAlu :  D=0x000000000001b5b6  FetchSeq=429404  CPSeq=363290  flags=(IsInteger|IsControl|IsDirectControl|IsUncondControl|IsCall)
   469621000: system.cpu.workload: Translating: 0x18340->0x4e340
   469704000: system.cpu.workload: Translating: 0x18380->0x4e380
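For traces too large to inspect by hand, a search like the one above can be scripted. The sketch below collects every `Translating:` line whose physical address falls in a given page, again assuming 4 KiB pages; `translations_into_page` is an illustrative helper, and the sample lines are copied from the excerpt above.

```python
import re

PAGE_SHIFT = 12  # assumption: 4 KiB pages

# tick, object name, then "Translating: <vaddr>-><paddr>"
TRANSLATE_RE = re.compile(
    r"^\s*(\d+): \S+: Translating: (0x[0-9a-fA-F]+)->(0x[0-9a-fA-F]+)"
)


def translations_into_page(lines, phys_page):
    """Return (tick, vaddr, paddr) for translations landing in phys_page."""
    hits = []
    for line in lines:
        m = TRANSLATE_RE.match(line)
        if m is None:
            continue  # not a translation line
        tick = int(m.group(1))
        vaddr = int(m.group(2), 16)
        paddr = int(m.group(3), 16)
        if paddr >> PAGE_SHIFT == phys_page >> PAGE_SHIFT:
            hits.append((tick, vaddr, paddr))
    return hits


sample = [
    "   469403000: system.cpu.workload: Translating: 0x1b580->0x51580",
    "   469486000: system.cpu.workload: Translating: 0x1b5c0->0x515c0",
    "   469621000: system.cpu.workload: Translating: 0x18340->0x4e340",
]
hits = translations_into_page(sample, 0x51000)
for tick, vaddr, paddr in hits:
    print(f"{tick}: {vaddr:#x} -> {paddr:#x}")
```

Because the function only iterates over `lines`, an open file object can be passed in directly, so a multi-hundred-megabyte trace does not have to be loaded into memory at once.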

So it looks like the value read in the restored run is actually an 
instruction from the .text segment.

At this point I wonder: should that page be marked as used in the page table 
(ptable) entries in m5.cpt?
If it is not there, is that because the page is not in use at the checkpoint? 
But then shouldn't that memory area be zeroed? In other words, when this 
happens, does physical memory hold content that is no longer in use?
Or am I reading all this wrongly? In that case, could you help me by pointing 
out my errors?
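To check what the checkpoint records for that physical page, the page table entries in m5.cpt can be listed with a short script. This is a sketch under assumptions: that m5.cpt is an INI-style file, and that page table entries are stored in sections whose names contain `.ptable.` with `vaddr` and `paddr` keys. Please compare against your own m5.cpt before relying on it. The helper names are hypothetical, and the sample text is synthetic: the first mapping mirrors the 0x1b580->0x51580 translation seen in the trace, the second is invented.

```python
import configparser


def ptable_entries(cpt_text):
    """Return (vaddr, paddr) pairs from sections that look like PTEs."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(cpt_text)
    entries = []
    for name in parser.sections():
        section = parser[name]
        if ".ptable." in name and "vaddr" in section and "paddr" in section:
            entries.append((int(section["vaddr"], 0), int(section["paddr"], 0)))
    return entries


def mappings_for_ppage(entries, ppage, page_size=4096):
    """Filter the entries that map onto the physical page containing ppage."""
    return [(v, p) for v, p in entries if p // page_size == ppage // page_size]


# Synthetic checkpoint fragment, for illustration only.
sample_cpt = """
## checkpoint generated: example
[system.cpu.workload.ptable]
size=2

[system.cpu.workload.ptable.Entry0]
vaddr=110592
paddr=331776
flags=0

[system.cpu.workload.ptable.Entry1]
vaddr=1761280
paddr=999424
flags=0
"""

entries = ptable_entries(sample_cpt)
for vaddr, paddr in mappings_for_ppage(entries, 0x51000):
    print(f"{vaddr:#x} -> {paddr:#x}")
```

If two different virtual pages turn out to share physical page 0x51000, or if the virtual page 0x1ae000 is absent from the checkpointed table, that would narrow down where the restore goes wrong.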


Thank you for your time and your help. I tried to attach the trace files, but 
the tgz of both of them together was over 500 MB.

The commands used in these tests are,

        •       full exec:

gem5_orig/build/RISCV/gem5.opt --outdir gem5_orig/configs/example/se.py 
--cpu-type=MinorCPU --bp-type=TAGE --caches --mem-size 1073741824 -c 
$SPEC_DIR/CPU2006/400.perlbench/exe/perlbench_base.riscv -o " -I./lib attrs.pl
        •       checkpoint_generation:

gem5_orig/build/RISCV/gem5.opt --outdir gem5_orig/configs/example/se.py 
--take-checkpoints 124853750,10000000 --cpu-type=MinorCPU --bp-type=TAGE 
--caches --mem-size 1073741824 -c 
$SPEC_DIR/CPU2006/400.perlbench/exe/perlbench_base.riscv -o " -I./lib attrs.pl
        •       checkpoint restore:

gem5_orig/build/RISCV/gem5.opt --outdir 
gem5_orig/configs/example/se.py --checkpoint-dir ./ -r 1 --cpu-type=MinorCPU 
--bp-type=TAGE --caches --mem-size 1073741824 -c 
$SPEC_DIR/CPU2006/400.perlbench/exe/perlbench_base.riscv -o " -I./lib attrs.pl


Thanks again, and sorry for the long post.



---- On Mon, 22 Nov 2021 14:23:48 +0100 Giacomo Travaglini via gem5-users 
<gem5-users@gem5.org> wrote ----

Hi Gelin,

Are you compiling gem5 in debug mode?
You can do that by using “debug” instead of “opt”:

$ scons build/ARM/gem5.debug -j`nproc`

Kind Regards
Giacomo

From: Gelin Fu via gem5-users <gem5-users@gem5.org>
Date: Monday, 22 November 2021 at 12:26
To: gem5-users@gem5.org <gem5-users@gem5.org>
Cc: Gelin Fu <20153...@cqu.edu.cn>
Subject: [gem5-users] Re: Problem with checkpoint and restoration in gem5 se 
mode
Hi Giacomo, thanks for your reply.
I am not familiar with using gdb in SE mode, so I tried to use debug 
functions such as curTick() and eventqDump(). But gdb tells me that there is 
no symbol for eventqDump() or curTick, so I could only use backtrace when the 
program aborted.
I am using the command below:
gdb --args $GEM5_BIN --outdir=$OUTPUT_PATH $GEM5_PATH/configs/example/se.py \
--num-cpu 1 --cpu-clock 2.5GHz --cpu-type O3_ARM_v7a_3 \
--restore-with-cpu O3_ARM_v7a_3 -r 1 --checkpoint-dir \
"$CHECK_PATH" --caches --mem-type DDR3_2133_8x8 --mem-size 1GB \
-c "$TARGET_PATH" --options "$DATA_PATH"
The gdb output is below:
(gdb) r
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff5ca7921 in __GI_abort () at abort.c:79
#2  0x00007ffff5c9748a in __assert_fail_base (
    fmt=0x7ffff5e1e750 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x55555868991a "when >= getCurTick()",
    file=file@entry=0x555558689902 "build/ARM/sim/eventq.hh",
    line=line@entry=766,
    function=function@entry=0x555558689b20 
<gem5::EventQueue::schedule(gem5::Event*, unsigned long, 
bool)::__PRETTY_FUNCTION__> "void gem5::EventQueue::schedule(gem5::Event*, 
gem5::Tick, bool)") at assert.c:92
#3  0x00007ffff5c97502 in __GI___assert_fail (
    assertion=0x55555868991a "when >= getCurTick()",
    file=0x555558689902 "build/ARM/sim/eventq.hh", line=766,
    function=0x555558689b20 <gem5::EventQueue::schedule(gem5::Event*, unsigned 
long, bool)::__PRETTY_FUNCTION__> "void 
gem5::EventQueue::schedule(gem5::Event*, gem5::Tick, bool)") at assert.c:101
#4  0x0000555555cc1dfe in gem5::EventQueue::schedule (this=0x55555ad72ea0,
    event=0x55555ace0800, when=1010, global=false)
    at build/ARM/sim/eventq.hh:766
#5  0x0000555555dd3a94 in gem5::EventManager::schedule (this=0x55555ace0708,
    event=..., when=1010) at build/ARM/sim/eventq.hh:1021
#6  0x00005555561fc1a9 in gem5::BaseCache::startup (this=0x55555ace0700)
    at build/ARM/mem/cache/base.cc:169

(gdb) p curTick
No symbol "curTick" in current context.
(gdb) p curTick()
No symbol "curTick" in current context.

Kind regards
Gelin
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org