Re: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu
Hi,

Stores should be fine since they are only sent to the memory system after commit. The relevant functions to look at are sendStore, recvRetry, and writebackStores in lsq_unit_impl.hh. Basically, if a store gets blocked the core just waits until it gets a retry. Since stores are sent in-order from the SQ to the memory system, that queue just waits. The stores are never removed from the SQ unless they succeed.

Loads were special in that they were effectively removed from the scheduler, even if they might fail. Stores however always maintain their entries/order until they succeed.

On Thu, Jan 29, 2015 at 6:01 PM, Beckmann, Brad via gem5-dev gem5-dev@gem5.org wrote:

Hi Mitch,

Quick question regarding this patch. Does this patch also handle replaying stores once the cache becomes unblocked? The changes and comments appear to only handle loads, but it seems like stores could have the same problem.

Thanks,

Brad

-----Original Message-----
From: gem5-dev [mailto:gem5-dev-boun...@gem5.org] On Behalf Of Mitch Hayenga via gem5-dev
Sent: Wednesday, September 03, 2014 4:38 AM
To: gem5-...@m5sim.org
Subject: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu

changeset 6be8945d226b in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=6be8945d226b
description:
cpu: Fix cache blocked load behavior in o3 cpu

This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked.

Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width). This patch also corrects that.

Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior.
diffstat:

 src/cpu/o3/iew.hh               |   13 +-
 src/cpu/o3/iew_impl.hh          |   57 ++
 src/cpu/o3/inst_queue.hh        |   25 -
 src/cpu/o3/inst_queue_impl.hh   |   68 ++---
 src/cpu/o3/lsq.hh               |   27 +-
 src/cpu/o3/lsq_impl.hh          |   23 +---
 src/cpu/o3/lsq_unit.hh          |  198 ---
 src/cpu/o3/lsq_unit_impl.hh     |   40 ++-
 src/cpu/o3/mem_dep_unit.hh      |    4 +-
 src/cpu/o3/mem_dep_unit_impl.hh |    4 +-
 10 files changed, 203 insertions(+), 256 deletions(-)

diffs (truncated from 846 to 300 lines):

diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew.hh
--- a/src/cpu/o3/iew.hh	Wed Sep 03 07:42:38 2014 -0400
+++ b/src/cpu/o3/iew.hh	Wed Sep 03 07:42:39 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2012 ARM Limited
+ * Copyright (c) 2010-2012, 2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -181,6 +181,12 @@
     /** Re-executes all rescheduled memory instructions. */
     void replayMemInst(DynInstPtr inst);

+    /** Moves memory instruction onto the list of cache blocked instructions */
+    void blockMemInst(DynInstPtr inst);
+
+    /** Notifies that the cache has become unblocked */
+    void cacheUnblocked();
+
     /** Sends an instruction to commit through the time buffer. */
     void instToCommit(DynInstPtr inst);

@@ -233,11 +239,6 @@
      */
     void squashDueToMemOrder(DynInstPtr inst, ThreadID tid);

-    /** Sends commit proper information for a squash due to memory becoming
-     * blocked (younger issued instructions must be retried).
-     */
-    void squashDueToMemBlocked(DynInstPtr inst, ThreadID tid);
-
     /** Sets Dispatch to blocked, and signals back to other stages to block. */
     void block(ThreadID tid);

diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew_impl.hh
--- a/src/cpu/o3/iew_impl.hh	Wed Sep 03 07:42:38 2014 -0400
+++ b/src/cpu/o3/iew_impl.hh	Wed Sep 03 07:42:39 2014 -0400
@@ -530,29 +530,6 @@

 template<class Impl>
 void
-DefaultIEW<Impl>::squashDueToMemBlocked(DynInstPtr inst, ThreadID tid)
-{
-    DPRINTF(IEW, "[tid:%i]: Memory blocked, squashing load and younger insts, "
-            "PC: %s [sn:%i].\n", tid, inst->pcState(), inst->seqNum);
-    if (!toCommit->squash[tid] ||
-            inst->seqNum < toCommit->squashedSeqNum[tid]) {
-        toCommit->squash[tid] = true;
-
-        toCommit->squashedSeqNum[tid] = inst->seqNum;
-        toCommit->pc[tid] = inst->pcState();
-        toCommit->mispredictInst[tid] = NULL;
-
-        // Must include the broadcasted SN in the squash.
-        toCommit->includeSquashInst[tid] = true;
-
-        ldstQueue.setLoadBlockedHandled(tid);
-
-        wroteToTimeBuffer = true;
-    }
-}
-
-template<class Impl>
-void
 DefaultIEW<Impl>::block(ThreadID tid)
 {
     DPRINTF(IEW, "[tid:%u]: Blocking.\n", tid);
@@ -610,6 +587,20 @@
 template<class Impl>
 void
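
The block/replay idea described in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the gem5 implementation: `ToyInst`, `ToyInstQueue`, `issueRetries`, and the two internal lists are hypothetical names, and only `blockMemInst` and `cacheUnblocked` echo declarations visible in the diff. A load refused by a blocked cache is parked rather than squashed, and once the cache unblocks the parked loads are reissued, limited here by an assumed count of free functional units.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Toy stand-in for a dynamic instruction; only the sequence number matters.
struct ToyInst {
    unsigned long seqNum;
};

// Hypothetical, simplified instruction queue (not gem5's InstructionQueue).
class ToyInstQueue {
  public:
    // Park a load that the blocked cache refused, instead of squashing it.
    void blockMemInst(const ToyInst &inst) { blockedMemInsts.push_back(inst); }

    // The cache accepts requests again: move parked loads to the retry
    // list, oldest first.
    void cacheUnblocked() {
        while (!blockedMemInsts.empty()) {
            retryMemInsts.push_back(blockedMemInsts.front());
            blockedMemInsts.pop_front();
        }
    }

    // Reissue at most `freeUnits` retried loads in this cycle.
    std::vector<ToyInst> issueRetries(unsigned freeUnits) {
        std::vector<ToyInst> issued;
        while (freeUnits > 0 && !retryMemInsts.empty()) {
            issued.push_back(retryMemInsts.front());
            retryMemInsts.pop_front();
            --freeUnits;
        }
        return issued;
    }

    std::size_t numBlocked() const { return blockedMemInsts.size(); }
    std::size_t numRetry() const { return retryMemInsts.size(); }

  private:
    std::deque<ToyInst> blockedMemInsts;  // waiting for the cache to unblock
    std::deque<ToyInst> retryMemInsts;    // ready to be reissued
};
```

In this model nothing younger than the blocked load is flushed; the load simply waits on a side list, which is the contrast with the removed squashDueToMemBlocked path.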
[gem5-dev] changeset in gem5: config: arm: fix os_flags
changeset e17949745150 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=e17949745150
description:
config: arm: fix os_flags

Fix the makeArmSystem routine to reflect recent changes that support kernel commandline option when running android. Without this fix, trying to run android encounters a 'reference before assignment' error.

Committed by: Nilay Vaish ni...@cs.wisc.edu

diffstat:

 configs/common/FSConfig.py |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diffs (12 lines):

diff -r 3c42be107634 -r e17949745150 configs/common/FSConfig.py
--- a/configs/common/FSConfig.py	Sun Jan 25 07:22:56 2015 -0500
+++ b/configs/common/FSConfig.py	Fri Jan 30 15:49:34 2015 -0600
@@ -286,7 +286,7 @@
     self.flags_addr = self.realview.realview_io.pio_addr + 0x30

     if mdesc.disk().lower().count('android'):
-        boot_flags += "init=/init"
+        cmdline += "init=/init"
     self.boot_osflags = fillInCmdline(mdesc, cmdline)
     self.realview.attachOnChipIO(self.membus, self.bridge)
     self.realview.attachIO(self.iobus)
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
Re: [gem5-dev] Review Request 2611: cpu: Tidy up the MemTest and make false sharing more obvious
---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2611/#review5809
---

src/cpu/testers/memtest/MemTest.py
http://reviews.gem5.org/r/2611/#comment5129

    Are you sure this should be dropped? I think the coherence protocols that provide a dma controller need this for testing.

- Nilay Vaish

On Jan. 21, 2015, 1:23 p.m., Andreas Hansson wrote:

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2611/
---

(Updated Jan. 21, 2015, 1:23 p.m.)

Review request for Default.

Repository: gem5

Description
---

Changeset 10671:94bc71e83168
---
cpu: Tidy up the MemTest and make false sharing more obvious

The MemTest class really only tests false sharing, and as such there was a lot of old cruft that could be removed. This patch cleans up the tester, and also makes it more clear what the assumptions are. As part of this simplification the reference functional memory is also removed.

The regression configs using MemTest are updated to reflect the changes, and the stats will be bumped in a separate patch. The example config will be updated in a separate patch due to more extensive re-work.

In a follow-on patch a new tester will be introduced that uses the MemChecker to implement true sharing.

Diffs
---

  configs/example/memtest.py a6fe75e8296b
  src/cpu/testers/memtest/MemTest.py a6fe75e8296b
  src/cpu/testers/memtest/memtest.hh a6fe75e8296b
  src/cpu/testers/memtest/memtest.cc a6fe75e8296b
  tests/configs/memtest-filter.py a6fe75e8296b
  tests/configs/memtest-ruby.py a6fe75e8296b
  tests/configs/memtest.py a6fe75e8296b

Diff: http://reviews.gem5.org/r/2611/diff/

Testing
---

Thanks,

Andreas Hansson
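
For readers unfamiliar with the term, false sharing means several requesters touch different bytes of the same cache block. The sketch below shows one way a tester could generate such addresses. It is an illustration only: the 64-byte block size, the function names, and the choice to use the tester id as the byte offset are assumptions made for this example, not details confirmed by the review request.

```cpp
#include <cstdint>

// Assumed block size for illustration only.
constexpr std::uint64_t kBlockSize = 64;

// Align an address down to the start of its cache block.
constexpr std::uint64_t blockAlign(std::uint64_t addr) {
    return addr & ~(kBlockSize - 1);
}

// Each tester only touches the byte at offset `testerId` within a block,
// so testers share cache blocks but never the same byte (false sharing).
constexpr std::uint64_t testerAddr(std::uint64_t addr, unsigned testerId) {
    return blockAlign(addr) + (testerId % kBlockSize);
}
```

With this scheme two testers given the same raw address land in the same block at different bytes, so each tester can check its own data while the coherence protocol still sees contention on the block.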
Re: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu
Thanks Mitch for the quick reply.

While assuming stores are only sent after commit is true for the current O3 model, aggressive out-of-order processors send store addresses to the memory system as soon as they are available (i.e. speculatively). We actually have a patch that provides such a capability, but I'm having a tough time figuring out how to merge it with your change. Any suggestions you may have would be very much appreciated.

Thanks,

Brad

-----Original Message-----
From: gem5-dev [mailto:gem5-dev-boun...@gem5.org] On Behalf Of Mitch Hayenga via gem5-dev
Sent: Friday, January 30, 2015 9:34 AM
To: gem5 Developer List
Cc: gem5-...@m5sim.org
Subject: Re: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu

Hi,

Stores should be fine since they are only sent to the memory system after commit. The relevant functions to look at are sendStore, recvRetry, and writebackStores in lsq_unit_impl.hh. Basically, if a store gets blocked the core just waits until it gets a retry. Since stores are sent in-order from the SQ to the memory system, that queue just waits. The stores are never removed from the SQ unless they succeed.

Loads were special in that they were effectively removed from the scheduler, even if they might fail. Stores however always maintain their entries/order until they succeed.
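
The store behavior described in this thread (committed stores leave the SQ in order, and a rejected store stays at the head until a retry arrives) can be sketched as a toy model. This is a simplified illustration, not the code in lsq_unit_impl.hh: `ToyPort`, `ToyStoreQueue`, `commitStore`, and `sendTiming` are hypothetical, and only the names `writebackStores` and `recvRetry` are borrowed from the functions mentioned above.

```cpp
#include <cstddef>
#include <deque>

// Toy memory port: accepts a request only while it is not blocked.
struct ToyPort {
    bool blocked = false;
    bool sendTiming(unsigned long /*addr*/) { return !blocked; }
};

// Hypothetical in-order store queue holding already-committed stores.
class ToyStoreQueue {
  public:
    void commitStore(unsigned long addr) { stores.push_back(addr); }

    // Send stores in order. Stop at the first rejection and leave that
    // store at the head so ordering is preserved.
    void writebackStores(ToyPort &port) {
        while (!waitingForRetry && !stores.empty()) {
            if (port.sendTiming(stores.front())) {
                stores.pop_front();      // removed only on success
            } else {
                waitingForRetry = true;  // stall until a retry arrives
            }
        }
    }

    // The memory system signals that it can accept requests again.
    void recvRetry(ToyPort &port) {
        waitingForRetry = false;
        writebackStores(port);
    }

    std::size_t size() const { return stores.size(); }
    bool stalled() const { return waitingForRetry; }

  private:
    std::deque<unsigned long> stores;
    bool waitingForRetry = false;
};
```

Because a store is popped only after the port accepts it, no separate replay list is needed in this model, which matches the explanation of why the load fix did not have to touch stores.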
Re: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu
Ahh, yeah I'm familiar with speculatively grabbing coherence rights for stores prior to commit. But the store isn't done, right? It's just globally ordered. And other system activity might make that ownership go away prior to the actual store commit.

How about just dropping/ignoring the prefetch if the blocked case actually happens?

On Fri, Jan 30, 2015 at 6:16 PM, Beckmann, Brad via gem5-dev gem5-dev@gem5.org wrote:

Thanks Mitch for the quick reply.

While assuming stores are only sent after commit is true for the current O3 model, aggressive out-of-order processors send store addresses to the memory system as soon as they are available (i.e. speculatively). We actually have a patch that provides such a capability, but I'm having a tough time figuring out how to merge it with your change. Any suggestions you may have would be very much appreciated.

Thanks,

Brad
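
The suggestion to drop the prefetch when the cache is blocked could look roughly like the following. This is a hedged sketch of the idea only: `ToyCache`, `trySend`, and `sendOwnershipPrefetch` are hypothetical names, and the speculative-store patch mentioned in the thread is not shown anywhere in this archive.

```cpp
#include <cstddef>

// Toy cache port; `blocked` models a cache that cannot accept requests.
struct ToyCache {
    bool blocked = false;
    std::size_t prefetchesAccepted = 0;
    bool trySend() const { return !blocked; }
};

// Hypothetical handling of a speculative store-ownership prefetch.
// The prefetch is only a hint, so a rejection is ignored rather than
// queued for replay. Returns true if sent, false if dropped.
inline bool sendOwnershipPrefetch(ToyCache &cache) {
    if (!cache.trySend())
        return false;  // blocked: drop the hint, nothing to replay
    ++cache.prefetchesAccepted;
    return true;
}
```

Dropping is safe in this model because correctness does not depend on the hint: the committed store would still go through the normal in-order path and wait for a retry if needed.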
[gem5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/minor-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/minor-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby passed.
* build/ALPHA/tests/opt/quick/se/20.eio-short/alpha/eio/simple-atomic passed.
* build/ALPHA/tests/opt/quick/se/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA/tests/opt/quick/se/30.eio-mp/alpha/eio/simple-atomic-mp passed.
* build/ALPHA/tests/opt/quick/se/30.eio-mp/alpha/eio/simple-timing-mp passed.
* build/ALPHA/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby passed.
* build/ALPHA/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/ALPHA/tests/opt/quick/fs/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA/tests/opt/quick/se/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MESI_Two_Level passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/o3-timing passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-atomic passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-timing passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-timing-ruby passed.
* build/NULL/tests/opt/quick/se/50.memtest/null/none/memtest passed.
* build/NULL/tests/opt/quick/se/50.memtest/null/none/memtest-filter passed.
* build/NULL/tests/opt/quick/se/70.tgen/null/none/tgen-dram-ctrl passed.
* build/NULL/tests/opt/quick/se/70.tgen/null/none/tgen-simple-mem passed.
* build/POWER/tests/opt/quick/se/00.hello/power/linux/o3-timing passed.
* build/POWER/tests/opt/quick/se/00.hello/power/linux/simple-atomic passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-atomic passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-timing passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-timing-ruby passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/simple-atomic passed.
*