Thanks a lot for the tips. I will give it a try.

best,
Da

On Thu, Jul 19, 2018 at 3:12 PM Gutierrez, Anthony <
anthony.gutier...@amd.com> wrote:

> Yes, make sure all buffers are flushed before taking your checkpoint. You
> can call the “sync” command, which should already be installed on the
> image. You’ll need to call sync before your commands to halt and take a
> checkpoint.
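
A minimal sketch of that ordering, written as a small Python helper that emits the guest-side script. The helper name and the /sbin/m5 path are assumptions for illustration only; adjust the path to wherever the m5 utility lives on your image.

```python
def make_checkpoint_script(m5_path="/sbin/m5"):
    """Return the text of a guest shell script that flushes filesystem
    buffers and only then asks gem5 to take a checkpoint.

    m5_path is an assumed location of the m5 utility inside the guest.
    """
    lines = [
        "#!/bin/sh",
        "# Flush dirty filesystem buffers to the disk image first.",
        "sync",
        "# Then request the checkpoint from the simulator.",
        m5_path + " checkpoint",
    ]
    return "\n".join(lines) + "\n"


script = make_checkpoint_script()
print(script)
```

If you stop gem5 by hand with --checkpoint-at-end instead, the same idea applies: run sync in the guest first, then stop the simulation.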
>
>
>
> This page explains how I did the same for an Android disk image:
> http://gem5.org/BBench-gem5#Tips_for_Making_Your_Disk_Image_gem5_Friendly
>
>
>
> -Tony
>
>
>
> *From:* gem5-users <gem5-users-boun...@gem5.org> *On Behalf Of *Da Zhang
> *Sent:* Thursday, July 19, 2018 12:00 PM
> *To:* gem5 users mailing list <gem5-users@gem5.org>
> *Subject:* Re: [gem5-users] dacapo (java) benchmark suite encounters
> "SIGSEGV" and "null exception" during timing mode (fs mode) after
> restarting from a checkpoint
>
>
>
> Hey Gutierrez,
>
>
>
> "*sync* the disk image", do you mean making sure all disk modifications
> are actually written to the disk (up to date) before taking the
> checkpoint? How do I do that?
>
> I haven't tried to take a checkpoint with COW layer disabled and then
> restart from that checkpoint before. All I have done is "ctrl+c" to stop
> gem5 to take the checkpoint (--checkpoint-at-end); I rely on gem5 to take
> care of all things that need to be checked when taking checkpoints.
>
>
>
> Best,
>
> Da Zhang
>
>
>
> On Thu, Jul 19, 2018 at 2:36 PM Gutierrez, Anthony <
> anthony.gutier...@amd.com> wrote:
>
> JIT was precisely the issue I suspected was causing this. One thing that
> may be necessary is to ensure you *sync* the disk image before taking
> your checkpoint.
>
>
>
> gem5’s debug flags should help you identify something like a hang, for
> example an ExecAll trace. A SyscallAll trace would most likely help you
> understand better what the JIT is doing.
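
As a rough sketch, a trace run could be assembled like this. The helper name, the trace file name, and the start tick are placeholders I made up; the binary and config paths are the ones mentioned elsewhere in this thread, and the remaining fs.py options are left for you to fill in.

```python
def gem5_trace_cmd(debug_flags, trace_file, debug_start=None,
                   gem5_bin="build/X86/gem5.opt",
                   config="configs/example/fs.py",
                   config_args=()):
    """Build a gem5 command line with debug tracing enabled.

    gem5's own options (--debug-*) must come before the config script;
    anything after the script is passed to the script itself.
    """
    cmd = [
        gem5_bin,
        "--debug-flags=" + ",".join(debug_flags),
        "--debug-file=" + trace_file,
    ]
    if debug_start is not None:
        # Start tracing at this tick; ExecAll output gets large quickly.
        cmd.append("--debug-start=%d" % debug_start)
    cmd.append(config)
    cmd.extend(config_args)
    return cmd


# Placeholder tick value; pick one shortly before the fault or suspected hang.
cmd = gem5_trace_cmd(["ExecAll"], "exec.trace", debug_start=1000000000000)
print(" ".join(cmd))
```

Passing ["SyscallAll"] instead of ["ExecAll"] gives the syscall-level view. If the trace stops growing while simulated ticks keep advancing, that is a hint the guest is stuck.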
>
>
>
> *From:* gem5-users <gem5-users-boun...@gem5.org> *On Behalf Of *Da Zhang
> *Sent:* Thursday, July 19, 2018 11:15 AM
> *To:* gem5 users mailing list <gem5-users@gem5.org>
> *Subject:* Re: [gem5-users] dacapo (java) benchmark suite encounters
> "SIGSEGV" and "null exception" during timing mode (fs mode) after
> restarting from a checkpoint
>
>
>
> Thanks for the suggestions.
>
> I have been trying a couple of solutions (I only tested a small subset of
> the DaCapo benchmark suite that hits segfaults with O3CPU):
>
>
>
> 1. using TimingSimpleCPU: no segfaults
>
> 2. disabling the COW layer and writing to the disk image when taking the
> checkpoint: there are still segfaults
>
> 3. taking checkpoints with the JIT compiler disabled (20x slowdown): no
> segfaults
>
> 4. taking checkpoints during atomic mode (without warming up the JIT): no
> segfaults
>
> 5. taking checkpoints with Java compressed OOPs disabled: there are still
> segfaults
>
>
>
> One thing that I can't tell is whether the benchmark hangs, since nothing
> is printed during execution. Is there a statistic I can use to tell if
> the benchmark hangs?
>
>
>
> So far, all my experiments run with 1 CPU (even though some benchmarks
> are multithreaded). I attempted to take some checkpoints with more CPUs
> using the KVM CPU, but unfortunately I got some "rcu_sched self-detected
> stall on CPU" issues. Any ideas?
>
>
>
> On Mon, Jul 16, 2018 at 5:47 PM Gutierrez, Anthony <
> anthony.gutier...@amd.com> wrote:
>
> Da,
>
>
>
> Do you encounter the segfault only when restoring from a checkpoint? That
> is, if you do not use checkpoints can any DaCapo benchmark successfully
> complete under one of the simple CPU models (and not just KVM CPU)?
>
>
>
> If so, you may want to get a syscall trace (e.g., using strace) to see
> what sorts of files the JVM is trying to read etc. It’s possible that the
> VM generates some files that it will read back later. If you use
> checkpoints, due to the disk image COW layer, I do not believe any disk
> updates are checkpointed, thus these files will not persist, which could
> lead to some weird segfault issues. Not sure if this is happening in your
> case, but it may be worth investigating.
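
One way to act on this: capture a file-related trace with something like "strace -f -e trace=file -o jvm.strace <your java command>" and then list the paths the JVM tried to open but could not find. The sketch below is only an illustration; the function name is mine and the sample lines are made up to show the format, not taken from a real DaCapo run.

```python
import re

# Matches strace lines such as:
#   1474 openat(AT_FDCWD, "/tmp/x.xml", O_RDONLY) = -1 ENOENT (No such file or directory)
# and the older open("...", flags) form.
OPEN_RE = re.compile(
    r'\bopen(?:at)?\((?:AT_FDCWD, )?"([^"]+)"[^)]*\)\s+=\s+(-?\d+)(?:\s+(\w+))?'
)


def missing_files(strace_lines):
    """Return paths whose open()/openat() call failed with ENOENT."""
    missing = []
    for line in strace_lines:
        m = OPEN_RE.search(line)
        if m and m.group(2) == "-1" and m.group(3) == "ENOENT":
            missing.append(m.group(1))
    return missing


# Made-up sample lines, for illustration only.
sample = [
    '1474 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    '1474 openat(AT_FDCWD, "/tmp/example-cache.xml", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '1474 read(3, "...", 832) = 832',
]
print(missing_files(sample))
```

Comparing this list between a run that never used a checkpoint and a run restored from one should show whether any generated files failed to persist.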
>
>
>
> I created some of the original Android disk images, and the original
> DaCapo image, and at that time I would typically run the benchmarks
> through FS mode with the Atomic CPU once, with the COW layer disabled, in
> order to generate the needed files on the disk image and have them
> persist. This was entirely for performance, to prevent the VMs from
> regenerating the same files for each run, but I can envision missing files
> causing issues during runtime as well. In particular, it seems your code
> is faulting while doing some XML serializing/deserializing; perhaps the
> XML file it is looking for is gone?
>
>
>
> Beyond that, assuming it is a real bug in gem5, I would recommend an
> ExecAll trace to figure out why the instruction at that PC is faulting.
>
>
>
> -Tony
>
>
>
> *From:* gem5-users [mailto:gem5-users-boun...@gem5.org] *On Behalf Of *Da
> Zhang
> *Sent:* Monday, July 16, 2018 1:50 PM
> *To:* gem5 users mailing list <gem5-users@gem5.org>
> *Subject:* Re: [gem5-users] dacapo (java) benchmark suite encounters
> "SIGSEGV" and "null exception" during timing mode (fs mode) after
> restarting from a checkpoint
>
>
>
> Hey Jason,
>
>
>
> There are a bunch of "warn: instruction 'prefetch_nta' unimplemented"
> messages in atomic mode, during which the Java benchmarks don't crash.
> However, there are no such warnings during timing mode. Does that imply
> that unimplemented instructions don't cause the problem? Any clues or
> suggestions for debugging these problems?
>
>
>
> best,
>
> Da Zhang
>
>
>
>
>
>
>
> On Mon, Jul 16, 2018 at 1:32 PM Jason Lowe-Power <ja...@lowepower.com>
> wrote:
>
> Hello,
>
>
>
> Are you seeing any warnings like "warn: Instruction XXX not implemented"?
>
>
>
> There are many X86 SIMD instructions that are currently unimplemented. I
> would bet that your application is using some of those instructions and
> getting 0's as the output instead of the correct value.
>
>
>
> The "right" way to solve this problem is to implement these instructions
> (and we would really appreciate it if you contributed your fixes back on
> https://gem5-review.googlesource.com). The other option is to recompile
> your applications without SIMD extensions (e.g., -march=athlon64 or
> whatever is the original x86-64 name in GCC). However, this likely requires
> compiling all of the Java runtime in your case.
>
>
>
> Cheers,
>
> Jason
>
>
>
> On Mon, Jul 16, 2018 at 10:11 AM Da Zhang <d...@vt.edu> wrote:
>
> To clarify, the SIGSEGV and null exceptions happen in the benchmark
> suite, not in gem5. gem5 itself runs without errors, but in the
> system.pc.com_1.device files I observe that most of the benchmarks crash
> due to SIGSEGV or null exceptions.
>
> Example:
>
> "
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f81d17742b7, pid=1474, tid=0x00007f81cf46d700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_171-b11) (build 1.8.0_171-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # J 1815 C2 org.apache.xml.serializer.ToHTMLStream.endElement(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)V (389 bytes) @ 0x00007f81d17742b7 [0x00007f81d1774280+0x37]
> #
> #
> "
>
>
>
> On Mon, Jul 16, 2018 at 11:39 AM Da Zhang <d...@vt.edu> wrote:
>
> Hey guys,
>
>
>
> I am testing a Java benchmark suite, DaCapo, on gem5 in FS mode.
> Unfortunately, I encounter a lot of SIGSEGV and null exceptions during
> timing mode after restarting from the checkpoints.
>
> I am using Linux kernel v4.8.13 and ubuntu-server-16.04.1 with
> Oracle JDK v8.0_171-b11. To rule out my own modifications to
> gem5 src/ and configs/, I re-downloaded gem5 and checked out commit
> "ee2ffdc0fdb489767768e5273a4ccd7b51735c7c", which is the gem5 version I am
> working on. The checkpoint was taken using the KVM CPU with 1 CPU and 16GB
> of memory. For the simulation, I use build/X86/gem5.opt (in order to enable
> assertions) in FS mode (configs/example/fs.py). Other options include
> "--cpu-type=DerivO3CPU -n 1 --mem-size=16GB --caches --l2cache
> --l2_size=${L2SIZE}" (I tried L2SIZE from 256KB to 8MB). With 100ms of
> warmup and 1ps of real simulation time, no errors appear. But with a
> longer real simulation time, the benchmark suite crashes with a segfault.
>
> I am able to run the DaCapo benchmark suite in FS mode with the KVM CPU
> without any segfaults or exceptions. I have also tested some simple Java
> benchmarks; they show neither segfaults nor exceptions.
>
> Does anyone have suggestions for, or experience with, these issues?
>
>
>
> best,
>
> Da Zhang
>
> _______________________________________________
> gem5-users mailing list
> gem5-users@gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
