Hi Mitch,

After applying the two patches and the IsQuiesce modification, the O3 CPU stays on the same track as the atomic CPU for longer than before. However, the apic_timer_interrupt function still shows up in the O3 CPU trace. It used to appear after about 500,000 instructions; now it appears after about 990,000 instructions.
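For reference, the point where the two CPUs leave the same track can be located with a small script. This is only a minimal sketch: it assumes the committed PCs of each CPU have already been dumped into Python lists of integers, and the function name and the sample PC values are hypothetical, not gem5 output.

```python
from itertools import zip_longest


def first_divergence(atomic_pcs, o3_pcs):
    """Return the index of the first committed instruction whose PC differs
    between the two traces, or None if the traces are identical."""
    # zip_longest pads the shorter trace with None, so a trace that simply
    # ends early is reported as diverging at the point where it ends.
    for index, (atomic_pc, o3_pc) in enumerate(zip_longest(atomic_pcs, o3_pcs)):
        if atomic_pc != o3_pc:
            return index
    return None


# Hypothetical traces: the O3 trace jumps to a kernel address at index 2.
atomic_trace = [0x400000, 0x400004, 0x400008, 0x40000C]
o3_trace = [0x400000, 0x400004, 0xFFFFFFFF81000000, 0xFFFFFFFF81000004]

divergence_index = first_divergence(atomic_trace, o3_trace)
print("first divergence at committed instruction", divergence_index)
```

Running the same comparison before and after a patch shows whether the divergence point moved.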
In addition, I dumped tick numbers along with the PCs, and found a gap of 459500 ticks between the last committed user instruction and the first instruction in the apic_timer_interrupt function. I think this confirms that the last user instruction sits in commit until the timer interrupt happens. Am I right about this?

As a next step, I think I need to tag all x86 quiesce instructions. Do you have a list of those instructions? Or does the Intel manual cover this somewhere?

Thanks.

--
Best Regards
Yan Zi

On 27 Aug 2014, at 15:59, Mitch Hayenga wrote:

> Yep, that should do it.
>
>
> On Wed, Aug 27, 2014 at 2:57 PM, Zi Yan <birdman...@gmail.com> wrote:
>
>> Thanks.
>>
>> I will apply 1, and 2 patches.
>>
>> For 3, I need to change the file src/arch/x86/isa/microops/specop.isa:66
>> from
>>     setFlags | (ULL(1) << StaticInst::IsNonSpeculative),
>> to
>>     setFlags | (ULL(1) << StaticInst::IsNonSpeculative) | (ULL(1) << StaticInst::IsQuiesce),
>>
>> Am I doing the right thing to tag "MicroHalt" instruction as "IsQuiesce"?
>>
>> BTW, what I did to boot linux is to install gentoo inside QEMU,
>> then use x86KvmCPU to boot up, then take checkpoints and run from
>> checkpoints.
>>
>> I will report whether this works or not.
>>
>> Thanks.
>>
>> --
>> Best Regards
>> Yan Zi
>>
>> On 27 Aug 2014, at 15:44, Mitch Hayenga wrote:
>>
>>> There are probably three main patches that could help. The fact you
>>> mention the timer interrupt makes me think Andreas is right and these
>>> might solve your issue.
>>>
>>> 1. http://reviews.gem5.org/r/2363/ - o3 is supposed to stop fetching
>>> instructions immediately once a quiesce instruction is encountered, some
>>> managed to sneak by. Quiesce is used for things like sleeping until an
>>> interrupt occurs, etc. Without this patch, we experienced the case where
>>> o3 state would get corrupted and an instruction would sit at commit until
>>> the next timer interrupt happened. At which point taking the interrupt
>>> would clear the state and execution would continue (until this same bug
>>> happened again).
>>>
>>> 2. http://reviews.gem5.org/r/2367/ - If o3 was being drained while an
>>> interrupt occurred on x86, it could deadlock.
>>>
>>> 3. I believe this last patch will be posted in a day or two. x86
>>> currently does not tag any instruction that suspends() the CPU as a
>>> "quiesce". This is required by o3 to properly operate, but not by the
>>> Atomic CPU. This makes the issue in #1 far more likely to occur. It's
>>> pretty amazing that x86 booted linux at all on o3 without this. I believe
>>> this patch will be posted shortly, but otherwise you could just tag the
>>> "MicroHalt" instruction as "IsQuiesce" yourself.
>>>
>>> So a combination of those things (mainly the last one) could lead to what
>>> you are seeing.
>>>
>>>
>>> On Wed, Aug 27, 2014 at 12:59 PM, Zi Yan via gem5-users
>>> <gem5-users@gem5.org> wrote:
>>>
>>>> OK. Could you please tell me which patches are there? In the
>>>> review board there are quite a lot of new patches waiting
>>>> for review.
>>>>
>>>> I can apply those patches myself and do a quick test.
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Best Regards
>>>> Yan Zi
>>>>
>>>> On 27 Aug 2014, at 13:56, Andreas Hansson wrote:
>>>>
>>>>> Hi Yan,
>>>>>
>>>>> I would suspect this is due to a bug in the X86 O3 CPU. There have been
>>>>> quite a few fixes posted on the review board for similar issues. I hope
>>>>> to have these committed in the next week or so.
>>>>>
>>>>> Andreas
>>>>>
>>>>>
>>>>> On 27/08/2014 18:02, "Zi Yan via gem5-users" <gem5-users@gem5.org> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am running kmeans via hadoop in gem5 X86 FS mode. I am using
>>>>>> linux kernel 3.2.60 with configuration file linux-2.6.28.4 from
>>>>>> gem5.org.
>>>>>>
>>>>>> I take a checkpoint before a map task and put a "m5 exit" after the
>>>>>> map task.
>>>>>> I am using *X86kvmCPU* to take checkpoints.
>>>>>>
>>>>>> When I restore from the same checkpoint, atomic CPU and O3 CPU give me
>>>>>> quite different executed instructions:
>>>>>> 1) atomic CPU executes about 350 million instructions, reaches
>>>>>> "m5 exit", then stops simulation.
>>>>>> 2) O3 CPU executes more than 12 billion instructions, and still not
>>>>>> reaches "m5 exit" to stop the simulation.
>>>>>>
>>>>>> I dump out committed PCs from atomic CPU and O3 CPU, finding out that
>>>>>> after about 500,000 instructions, the systems behave differently,
>>>>>> where atomic CPU is still executing user code, but O3 CPU switch to
>>>>>> apic_timer_interrupt(a kernel function, it also appears in atomic CPU
>>>>>> execution, but somewhere else).
>>>>>>
>>>>>> Could anyone please give some advice about why this happen?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>> Yan Zi
>>>>>
>>>>>
>>>>> -- IMPORTANT NOTICE: The contents of this email and any attachments are
>>>>> confidential and may also be privileged. If you are not the intended
>>>>> recipient, please notify the sender immediately and do not disclose the
>>>>> contents to any other person, use it for any purpose, or store or copy
>>>>> the information in any medium. Thank you.
>>>>>
>>>>> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>>>>> Registered in England & Wales, Company No: 2557590
>>>>> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1
>>>>> 9NJ, Registered in England & Wales, Company No: 2548782
>>>>
>>>> _______________________________________________
>>>> gem5-users mailing list
>>>> gem5-users@gem5.org
>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
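The tick gap reported at the top of this message (between the last committed user instruction and the first instruction of apic_timer_interrupt) is the kind of stall that a scan over a commit trace can surface. The following is only a sketch under stated assumptions: the trace is taken to be a list of (tick, pc) pairs in commit order, which is a hypothetical format, and the function name and sample values are made up for illustration.

```python
def largest_commit_gap(trace):
    """Given (tick, pc) pairs in commit order, return a tuple
    (gap_in_ticks, previous_entry, next_entry) for the largest gap between
    two consecutive commits, or None if the trace has fewer than two entries."""
    if len(trace) < 2:
        return None
    best = None
    # Walk over consecutive pairs of committed instructions.
    for previous, current in zip(trace, trace[1:]):
        gap = current[0] - previous[0]
        if best is None or gap > best[0]:
            best = (gap, previous, current)
    return best


# Hypothetical O3 commit trace: user code, a long stall, then kernel code.
o3_commits = [
    (1000, 0x400100),
    (1500, 0x400104),
    (461000, 0xFFFFFFFF81600000),  # first commit after the stall
    (461500, 0xFFFFFFFF81600004),
]

gap, last_before, first_after = largest_commit_gap(o3_commits)
print(f"largest gap: {gap} ticks, "
      f"from pc {last_before[1]:#x} to pc {first_after[1]:#x}")
```

A gap far larger than the typical spacing between commits, immediately followed by interrupt-handler PCs, would be consistent with an instruction sitting at commit until the next timer interrupt, though it does not prove the cause on its own.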
_______________________________________________ gem5-users mailing list gem5-users@gem5.org http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users