[m5-dev] Cron m5test@zizzer /z/m5/regression/do-regression --scratch all
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-timing-mp passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/long/00.gzip/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/long/60.bzip2/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/long/50.vortex/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-atomic-mp passed.
* build/ALPHA_SE/tests/fast/long/40.perlbmk/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/long/70.twolf/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/long/70.twolf/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/long/50.vortex/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/long/70.twolf/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby passed.
* build/ALPHA_SE/tests/fast/long/00.gzip/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/long/30.eon/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-atomic passed.
* build/ALPHA_SE/tests/fast/long/50.vortex/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby passed.
* build/ALPHA_SE/tests/fast/long/30.eon/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_SE/tests/fast/long/70.twolf/alpha/tru64/inorder-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/long/50.vortex/alpha/tru64/inorder-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/inorder-timing passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest passed.
* build/ALPHA_SE/tests/fast/long/40.perlbmk/alpha/tru64/simple-timing passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE/tests/fast/long/60.bzip2/alpha/tru64/simple-timing passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE/tests/fast/long/30.eon/alpha/tru64/o3-timing passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/ALPHA_FS/tests/fast/quick/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
*
[m5-dev] Notification from M5 Bugs
THIS IS AN AUTOMATED MESSAGE, DO NOT REPLY.

The following task has a new comment added:

FS#337 - Checkpoint Tester Identifies Mismatches (Bugs) for X86_FS
User who did this: - Brad Beckmann (beckmabd)

--
Responses below:

1. Actually I don't believe the MOESI_hammer part of the binary is important at all. It just happened to be the particular binary I had built when observing the issue. Since the test uses X86_FS in atomic mode, any X86_FS binary should behave the same.

2. Yes, the script option doesn't matter either. I only specified it because I copied it from the example listed in checkpoint-tester.py.

3. I believe that, since each user may keep their kernel in a different location, it made sense not to specify a default. I'll send you a separate mail with the specific kernel I used.

So in summary, the following commands also lead to the exact same problem:

% scons -j 4 default=X86_FS build/X86_FS/m5.debug USE_MYSQL=False NO_FAST_ALLOC=1 EXTRAS=
% util/checkpoint-tester.py -i 2000 -- build/X86_FS/m5.debug configs/example/fs.py

--
More information can be found at the following URL:
http://www.m5sim.org/flyspray/task/337#comment164

You are receiving this message because you have requested it from the Flyspray bugtracking system. You can be removed from future notifications by visiting the URL shown above.
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ruby: support to stallAndWait the mandatory queue
Thanks Arka for that response. You summed it up well. There are just a couple of additional things I want to point out:

1. One thing that makes this mechanism work is that one must rank each input port. In other words, the programmer must understand and communicate the dependencies between message classes/protocol virtual channels. That way the correct messages are woken up when the appropriate event occurs.

2. In Nilay's example, you want to make sure that you don't delay the issuing of request A until the replacement of block B completes. Instead, request A should allocate a TBE and issue in parallel with replacing B. The mandatory queue is popped only when the cache message is consumed. When the cache message is stalled, it is basically moved to a temporary data structure with the message buffer, where it waits until a higher priority message of the same cache block wakes it up.

Brad

From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Arkaprava Basu
Sent: Saturday, January 22, 2011 10:49 AM
To: M5 Developer List
Cc: Gabe Black; Ali Saidi
Subject: Re: [m5-dev] Review Request: ruby: support to stallAndWait the mandatory queue

Hi Nilay,

You are mostly correct. I believe this patch contains two things:

1. Support in SLICC for stalling and waiting on messages in a message buffer when the directory is in a blocking state for that address (i.e. it cannot process the message at this point), until some event occurs that makes consumption of the message possible. When the directory unblocks, it provides support for waking up the messages that were hitherto waiting (this is the precise reason why you did not see a pop of the mandatory queue, but do see WakeUpAllDependants).

2. Changes to the MOESI_hammer protocol that leverage this support.

For the purpose of this particular discussion, the first part is the relevant one.
As far as I understand, the support in SLICC for waiting and stalling was introduced primarily to enhance fairness in the way SLICC handles coherence requests. Without this support, when a message arrives at a controller in a blocking state, it recycles, which means it polls again (and thus looks up again) in 10 cycles (the recycle latency is generally set to 10). If multiple messages arrive while the controller is in a blocking state for a given address, you can easily see that there is NO fairness: the message that arrived last for the blocked address can be served first when the controller unblocks. With the new support for stalling and waiting, the blocked messages are put in a FIFO queue, which provides better fairness.

But as you have correctly guessed, another major advantage of this support is that it reduces the unnecessary lookups to the cache structure that happen due to polling (a.k.a. recycle). So in summary, I believe that the problem you are seeing with too many lookups will *reduce* when the protocols are adjusted to take advantage of this facility.

On a related note, I should also mention that another fringe benefit of this support is that it helps in debugging coherence protocols. With this, coherence protocol traces won't contain thousands of debug messages for recycling, which can be pretty annoying for protocol writers.

I hope this helps,
Thanks
Arka

On 01/22/2011 06:40 AM, Nilay Vaish wrote:

---
This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/408/#review797
---

I was thinking about the ratio of the number of memory lookups, as reported by gprof, to the number of memory references, as reported in stats.txt. While I was working with the MESI CMP directory protocol, I had seen that the same request from the processor is looked up again and again in the cache if the request is waiting for some event to happen.
For example, suppose a processor asks to load address A, but the cache has no space to hold address A. It must then give up some cache block B before it can bring in address A. The problem is that while cache block B is being given up, the request made for address A may be looked up in the cache again, even though we know it cannot possibly be found there. This is because the requests in the mandatory queue are recycled until they are done. Clearly, we should move the request for address A to a separate structure instead of looking it up again and again. The new structure should be looked up whenever an event occurs that could possibly affect the status of this request. If we do this, then I think we should see a further reduction in the number of lookups. I would expect almost 90% of the lookups to the cache to go away. This should also mean a 5% improvement in simulator performance. Brad, do agree
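
The difference between recycling and stall-and-wait described in this thread can be sketched with a toy model. This is not Ruby or SLICC code: the names lookupsWithRecycle, StallAndWaitBuffer, stallAndWait, and wakeUpAllDependents are hypothetical, and the model assumes one cache lookup per attempt to service a request.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

using Addr = std::uint64_t;

// Recycling: the request is re-examined every `recycle_latency` cycles
// until the block leaves its blocking state at cycle `unblock_at`.
// Returns the number of lookups performed for this one request.
std::uint64_t lookupsWithRecycle(std::uint64_t arrival,
                                 std::uint64_t unblock_at,
                                 std::uint64_t recycle_latency)
{
    std::uint64_t lookups = 0;
    for (std::uint64_t t = arrival; ; t += recycle_latency) {
        ++lookups;                  // every retry looks the block up again
        if (t >= unblock_at)
            break;                  // this attempt finds the block unblocked
    }
    return lookups;
}

// Stall-and-wait: a blocked request is parked in a per-address FIFO and is
// only examined again when that address is explicitly woken up.
class StallAndWaitBuffer
{
  public:
    // Park a request that hit a blocked address.
    void stallAndWait(Addr addr, int request_id)
    {
        waiting_[addr].push_back(request_id);
    }

    // Release every request parked on `addr`, oldest first.
    std::vector<int> wakeUpAllDependents(Addr addr)
    {
        std::vector<int> ready;
        auto it = waiting_.find(addr);
        if (it != waiting_.end()) {
            ready.assign(it->second.begin(), it->second.end());
            waiting_.erase(it);
        }
        return ready;
    }

  private:
    std::map<Addr, std::deque<int>> waiting_;
};
```

Under these assumptions, a request that arrives at cycle 0 for a block that unblocks at cycle 100 costs eleven lookups with a recycle latency of 10, whereas a parked request is examined once on arrival and once on wake-up, and parked requests are released in arrival order.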
Re: [m5-dev] Error in Simulating Mesh Network
Yes, but right now my repo is a couple of weeks behind the main repo, and I'd rather get all these patches resolved first, then sync up with the main repo and do my final regression testing once.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Gabe Black
Sent: Saturday, January 22, 2011 2:26 AM
To: M5 Developer List
Subject: Re: [m5-dev] Error in Simulating Mesh Network

You should be able to move that around any other patches ahead of it, right? It's so simple I wouldn't expect it to really depend on the intervening patches.

Gabe

Beckmann, Brad wrote:

Hi Nilay,

Yes, I am aware of this problem and one of the patches (http://reviews.m5sim.org/r/381/) I'm planning to check in does fix this. Unfortunately, those patches are being hung up because I need to do some more work on another one of them and right now I don't have any time to do so. As you can see from the patch, it is a very simple fix, so you may want to do it locally if it is blocking you.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish
Sent: Thursday, January 20, 2011 6:16 AM
To: m5-dev@m5sim.org
Subject: [m5-dev] Error in Simulating Mesh Network

Brad,

I tried simulating a mesh network with four processors.

./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py --maxtick 2000 -n 4 --topology Mesh --mesh-rows 2 --num-l2cache 4 --num-dir 4

I receive the following error:

panic: FIFO ordering violated: [MessageBuffer: consumer-yes [ [71227521, 870, 1; ] ]] [Version 1, L1Cache, triggerQueue_in]
name: [Version 1, L1Cache, triggerQueue_in] current time: 71227512 delta: 1 arrival_time: 71227513 last arrival_time: 71227521
@ cycle 35613756000
[enqueue:build/ALPHA_FS_MOESI_hammer/mem/ruby/buffers/MessageBuffer.cc, line 198]

Do you think that the options I have specified should work correctly?
Thanks
Nilay
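
The numbers in the panic message suggest the kind of check that fired: the new message's arrival time (current time 71227512 plus delta 1, i.e. 71227513) is earlier than the last recorded arrival time (71227521). A minimal sketch of such a check follows. It is an assumption based only on the panic text, not the actual MessageBuffer.cc code, and OrderedBuffer and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

using Tick = std::uint64_t;

// Hypothetical ordered buffer: a message may not be scheduled to arrive
// before a message that was enqueued earlier.
class OrderedBuffer
{
  public:
    // Computes the arrival time and enforces FIFO ordering.
    // Throws std::logic_error where the simulator would panic.
    Tick enqueue(Tick current_time, Tick delta)
    {
        const Tick arrival_time = current_time + delta;
        if (arrival_time < last_arrival_time_)
            throw std::logic_error("FIFO ordering violated");
        last_arrival_time_ = arrival_time;
        return arrival_time;
    }

    Tick lastArrivalTime() const { return last_arrival_time_; }

  private:
    Tick last_arrival_time_ = 0;
};
```

With this sketch, enqueueing at time 71227512 with delta 9 records an arrival time of 71227521; a second enqueue at the same time with delta 1 then fails, which mirrors the values reported in the panic.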
Re: [m5-dev] Review Request: ruby: support to stallAndWait the mandatory queue
On Sun, 23 Jan 2011, Beckmann, Brad wrote:

Thanks Arka for that response. You summed it up well. There are just a couple additional things I want to point out:

1. One thing that makes this mechanism work is that one must rank each input port. In other words, the programmer must understand and communicate the dependencies between message classes/protocol virtual channels. That way the correct messages are woken up when the appropriate event occurs.

2. In Nilay's example, you want to make sure that you don't delay the issuing of request A until the replacement of block B completes. Instead, request A should allocate a TBE and issue in parallel with replacing B. The mandatory queue is popped only when the cache message is consumed. When the cache message is stalled, it is basically moved to a temporary data structure with the message buffer where it waits until a higher priority message of the same cache block wakes it up.

Brad

I was testing the patch you had posted. I updated it so that it works with the latest version of the repository. Can you update the review board? Somehow, I do not see any change in the number of calls to lookup().
--
Nilay

# HG changeset patch
# Parent d10be3f3aa4e3440a58642ca4ddb6472efdfb2a7

diff --git a/src/mem/protocol/MOESI_CMP_token-L1cache.sm b/src/mem/protocol/MOESI_CMP_token-L1cache.sm
--- a/src/mem/protocol/MOESI_CMP_token-L1cache.sm
+++ b/src/mem/protocol/MOESI_CMP_token-L1cache.sm
@@ -433,7 +433,7 @@
   // ** IN_PORTS **
 
   // Use Timer
-  in_port(useTimerTable_in, Address, useTimerTable) {
+  in_port(useTimerTable_in, Address, useTimerTable, rank=5) {
     if (useTimerTable_in.isReady()) {
       TBE tbe := L1_TBEs[useTimerTable.readyAddress()];
 
@@ -459,7 +459,7 @@
   }
 
   // Reissue Timer
-  in_port(reissueTimerTable_in, Address, reissueTimerTable) {
+  in_port(reissueTimerTable_in, Address, reissueTimerTable, rank=4) {
     if (reissueTimerTable_in.isReady()) {
       trigger(Event:Request_Timeout, reissueTimerTable.readyAddress(),
               getCacheEntry(reissueTimerTable.readyAddress()),
@@ -467,10 +467,8 @@
     }
   }
 
-
-
   // Persistent Network
-  in_port(persistentNetwork_in, PersistentMsg, persistentToL1Cache) {
+  in_port(persistentNetwork_in, PersistentMsg, persistentToL1Cache, rank=3) {
     if (persistentNetwork_in.isReady()) {
       peek(persistentNetwork_in, PersistentMsg, block_on=Address) {
         assert(in_msg.Destination.isElement(machineID));
@@ -519,9 +517,80 @@
     }
   }
 
+  // Response Network
+  in_port(responseNetwork_in, ResponseMsg, responseToL1Cache, rank=2) {
+    if (responseNetwork_in.isReady()) {
+      peek(responseNetwork_in, ResponseMsg, block_on=Address) {
+        assert(in_msg.Destination.isElement(machineID));
+
+        Entry cache_entry := getCacheEntry(in_msg.Address);
+        TBE tbe := L1_TBEs[in_msg.Address];
+
+        // Mark TBE flag if response received off-chip. Use this to update average latency estimate
+        if ( machineIDToMachineType(in_msg.Sender) == MachineType:L2Cache ) {
+
+          if (in_msg.Sender == mapAddressToRange(in_msg.Address,
+                                                 MachineType:L2Cache,
+                                                 l2_select_low_bit,
+                                                 l2_select_num_bits)) {
+
+            // came from an off-chip L2 cache
+            if (is_valid(tbe)) {
+              // L1_TBEs[in_msg.Address].ExternalResponse := true;
+              // profile_offchipL2_response(in_msg.Address);
+            }
+          }
+          else {
+            // profile_onchipL2_response(in_msg.Address );
+          }
+        } else if ( machineIDToMachineType(in_msg.Sender) == MachineType:Directory ) {
+          if (is_valid(tbe)) {
+            setExternalResponse(tbe);
+            // profile_memory_response( in_msg.Address);
+          }
+        } else if ( machineIDToMachineType(in_msg.Sender) == MachineType:L1Cache) {
+          //if (isLocalProcessor(machineID, in_msg.Sender) == false) {
+            //if (is_valid(tbe)) {
+              // tbe.ExternalResponse := true;
+              // profile_offchipL1_response(in_msg.Address );
+            //}
+          //}
+          //else {
+            // profile_onchipL1_response(in_msg.Address );
+          //}
+        } else {
+          error("unexpected SenderMachine");
+        }
+
+
+        if (getTokens(cache_entry) + in_msg.Tokens != max_tokens()) {
+          if (in_msg.Type == CoherenceResponseType:ACK) {
+            assert(in_msg.Tokens < (max_tokens() / 2));
+            trigger(Event:Ack, in_msg.Address, cache_entry, tbe);
+          } else if (in_msg.Type == CoherenceResponseType:DATA_OWNER) {
+            trigger(Event:Data_Owner, in_msg.Address, cache_entry, tbe);
+          } else if (in_msg.Type == CoherenceResponseType:DATA_SHARED) {
+            assert(in_msg.Tokens < (max_tokens() / 2));
+
Re: [m5-dev] Profile Results for Mesh Network
I dug more into the code today. There are three paths along which calls are made to RubyPort::M5Port::recvTiming(), which eventually result in calls to CacheMemory::lookup().

1. TimingSimpleCPU::sendFetch() - 140 million
2. TimingSimpleCPU::handleReadPacket() - 30 million
3. TimingSimpleCPU::handleWritePacket() - 18 million

The number of times the last two functions are called is very close to the total number of memory references (48 million) for all the CPUs together. The number of lookup() calls is about 392 million. If we take into account the calls to sendFetch(), then the ratio of the number of lookup() calls to the number of requests pushed into Ruby reduces to 2 to 1, from an earlier estimate of 8 to 1.

My question would be: why does sendFetch() make calls to recvTiming()?

--
Nilay
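
The two ratios quoted in this message can be rechecked from the reported call counts. The snippet below is only a worked calculation using those figures (in millions); the constant and function names are made up for illustration.

```cpp
#include <cassert>

// Call counts in millions, as reported in the message.
constexpr double kFetchCalls  = 140.0;  // TimingSimpleCPU::sendFetch()
constexpr double kReadCalls   = 30.0;   // TimingSimpleCPU::handleReadPacket()
constexpr double kWriteCalls  = 18.0;   // TimingSimpleCPU::handleWritePacket()
constexpr double kLookupCalls = 392.0;  // CacheMemory::lookup()

// Data references only (reads plus writes).
constexpr double dataRequests() { return kReadCalls + kWriteCalls; }

// Every request pushed into Ruby, instruction fetches included.
constexpr double allRequests() { return dataRequests() + kFetchCalls; }

// Earlier estimate: lookups per data reference.
constexpr double lookupsPerDataRequest()
{
    return kLookupCalls / dataRequests();
}

// Revised estimate: lookups per request of any kind.
constexpr double lookupsPerRequest()
{
    return kLookupCalls / allRequests();
}
```

Counting only the 48 million data references gives roughly 8.2 lookups per request; counting all 188 million requests gives roughly 2.1, which matches the "8 to 1" and "2 to 1" figures.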
Re: [m5-dev] Profile Results for Mesh Network
On Sun, 23 Jan 2011, Korey Sewell wrote:

In sendFetch(), it calls sendTiming(), which would then call recvTiming() on the cache port, since those two should be bound as peers. I'm a little unsure of how the RubyPort, Sequencer, CacheMemory, and CacheController (?) relationship is working (right now at least), but the relationship between sendTiming and recvTiming is the key concept that connects two memory objects, unless things have changed.

On Sun, Jan 23, 2011 at 3:51 PM, Nilay Vaish ni...@cs.wisc.edu wrote:

I dug more in to the code today. There are three paths along which calls are made to the RubyPort::M5Port::recvTiming(), which eventually results in calls to CacheMemory::lookup().

1. TimingSimpleCPU::sendFetch() - 140 million
2. TimingSimpleCPU::handleReadPacket() - 30 million
3. TimingSimpleCPU::handleWritePacket() - 18 million

The number of times last two functions are called is very close to the total number of memory references (48 million) for all the cpus together. The number of lookup() calls is about 392 million. If we take into account the calls to sendFetch(), then the ratio of number of lookup() calls to that of the number of requests pushed in to ruby reduces to 2 to 1, from an earlier estimate of 8 to 1. My question would be why does sendFetch() makes calls to recvTiming()?

Some more reading revealed that sendFetch() is calling recvTiming() for instruction cache accesses, whereas the other two calls (handleReadPacket and handleWritePacket) are for data cache accesses.

--
Nilay
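
The peer relationship Korey describes can be illustrated with a small stand-alone model. This is a sketch, not the M5 Port class: ToyPort, CpuSidePort, CountingCachePort, and bindPorts are hypothetical names, and a packet is reduced to a string.

```cpp
#include <cassert>
#include <string>

// Hypothetical port: sending on one port means receiving on its peer.
class ToyPort
{
  public:
    virtual ~ToyPort() = default;

    void setPeer(ToyPort *peer) { peer_ = peer; }

    // sendTiming() simply forwards the packet to the peer's recvTiming().
    bool sendTiming(const std::string &pkt)
    {
        return peer_ != nullptr && peer_->recvTiming(pkt);
    }

    virtual bool recvTiming(const std::string &pkt) = 0;

  private:
    ToyPort *peer_ = nullptr;
};

// CPU-side port; responses are accepted and ignored in this sketch.
class CpuSidePort : public ToyPort
{
  public:
    bool recvTiming(const std::string &) override { return true; }
};

// Cache-side port that counts how many requests reach it.
class CountingCachePort : public ToyPort
{
  public:
    bool recvTiming(const std::string &) override
    {
        ++received;
        return true;
    }

    int received = 0;
};

// Bind two ports as peers of each other.
void bindPorts(ToyPort &a, ToyPort &b)
{
    a.setPeer(&b);
    b.setPeer(&a);
}
```

In this model an instruction fetch sent on the CPU's instruction-side port and a load or store sent on its data-side port each arrive as one recvTiming() call on the corresponding cache-side port, which is why fetches show up in the call counts alongside data accesses.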
[m5-dev] Notification from M5 Bugs
THIS IS AN AUTOMATED MESSAGE, DO NOT REPLY.

The following task has a new comment added:

FS#337 - Checkpoint Tester Identifies Mismatches (Bugs) for X86_FS
User who did this: - Gabe Black (gblack)

--
Yeah, I was able to reproduce the problem so that wasn't an issue, I just wanted to point out the differences in case somebody else wanted to reproduce it too. I poked at it a bit and have some idea what's going on, but I need to dig into what I was seeing a little deeper so I don't go charging off in the wrong direction.

--
More information can be found at the following URL:
http://www.m5sim.org/flyspray/task/337#comment165
[m5-dev] changeset in m5: refcnt: Change things around so that we handle ...
changeset 31a04e5ac4be in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=31a04e5ac4be
description:
    refcnt: Change things around so that we handle constness correctly.
    To use a non const pointer:
        typedef RefCountingPtr<Foo> FooPtr;
    To use a const pointer:
        typedef RefCountingPtr<const Foo> ConstFooPtr;

diffstat:

 src/base/refcnt.hh |  12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diffs (29 lines):

diff -r d38c1f650a4e -r 31a04e5ac4be src/base/refcnt.hh
--- a/src/base/refcnt.hh    Fri Jan 21 17:51:22 2011 -0800
+++ b/src/base/refcnt.hh    Sat Jan 22 21:48:06 2011 -0800
@@ -34,7 +34,7 @@
 class RefCounted
 {
   private:
-    int count;
+    mutable int count;
 
   private:
     // Don't allow a default copy constructor or copy operator on
@@ -84,13 +84,9 @@
     RefCountingPtr(const RefCountingPtr &r) { copy(r.data); }
     ~RefCountingPtr() { del(); }
 
-    T *operator->() { return data; }
-    T &operator*() { return *data; }
-    T *get() { return data; }
-
-    const T *operator->() const { return data; }
-    const T &operator*() const { return *data; }
-    const T *get() const { return data; }
+    T *operator->() const { return data; }
+    T &operator*() const { return *data; }
+    T *get() const { return data; }
 
     const RefCountingPtr &operator=(T *p) { set(p); return *this; }
     const RefCountingPtr &operator=(const RefCountingPtr &r)
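
The constness scheme in this changeset can be tried out with a simplified stand-in. The sketch below is not src/base/refcnt.hh: ToyRefCounted, ToyRefPtr, and Foo are hypothetical names, and only the two ideas from the description are modelled, namely a mutable count so that const objects can be reference counted, and accessors that are const member functions returning T, so that constness of the pointee is chosen through the template argument.

```cpp
#include <cassert>

// Simplified intrusive reference count. The count is mutable so that it
// can change even when the object is reached through a pointer to const.
class ToyRefCounted
{
  public:
    ToyRefCounted() : count(0) {}
    virtual ~ToyRefCounted() {}

    void incref() const { ++count; }
    void decref() const { if (--count <= 0) delete this; }
    int refs() const { return count; }

  private:
    mutable int count;

    // Not copyable, as in the original class.
    ToyRefCounted(const ToyRefCounted &);
    ToyRefCounted &operator=(const ToyRefCounted &);
};

// Simplified counting pointer. A const ToyRefPtr<T> still yields T*;
// use ToyRefPtr<const T> to get a read-only view of the pointee.
template <class T>
class ToyRefPtr
{
  public:
    ToyRefPtr(T *p = 0) : data(p) { if (data) data->incref(); }
    ToyRefPtr(const ToyRefPtr &r) : data(r.data) { if (data) data->incref(); }
    ~ToyRefPtr() { if (data) data->decref(); }

    ToyRefPtr &operator=(const ToyRefPtr &r)
    {
        if (r.data) r.data->incref();   // increment first: safe on self-assign
        if (data) data->decref();
        data = r.data;
        return *this;
    }

    T *operator->() const { return data; }
    T &operator*() const { return *data; }
    T *get() const { return data; }

  private:
    T *data;
};

// Example payload type, invented for this sketch.
struct Foo : ToyRefCounted
{
    int value;
    explicit Foo(int v) : value(v) {}
    int read() const { return value; }
    void write(int v) { value = v; }
};

typedef ToyRefPtr<Foo> FooPtr;             // pointee can be modified
typedef ToyRefPtr<const Foo> ConstFooPtr;  // pointee is read-only
```

With these typedefs, a const FooPtr can still call write() on its Foo, just as a `Foo *const` could, while a ConstFooPtr only permits const member functions such as read(); copying a ConstFooPtr still adjusts the count because the count is mutable.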