Re: [m5-dev] Review Request: ruby: support to stallAndWait the mandatory queue

2011-01-23 Thread Nilay Vaish

On Sun, 23 Jan 2011, Beckmann, Brad wrote:


Thanks Arka for that response.  You summed it up well.

There are just a couple additional things I want to point out:


1.   One thing that makes this mechanism work is that one must rank each 
input port. In other words, the programmer must understand and communicate 
the dependencies between message classes/protocol virtual channels. That way 
the correct messages are woken up when the appropriate event occurs.

2.   In Nilay's example, you want to make sure that you don't delay the 
issuing of request A until the replacement of block B completes. Instead, 
request A should allocate a TBE and issue in parallel with replacing B. The 
mandatory queue is popped only when the cache message is consumed. When the 
cache message is stalled, it is basically moved to a temporary data structure 
within the message buffer, where it waits until a higher-priority message for 
the same cache block wakes it up.
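[Editor's note: to make the two points above concrete, here is a minimal, self-contained C++ sketch of the idea. The names (Message, InPort, stallAndWait, wakeUpDependents) and the ranks are illustrative assumptions, not gem5's actual MessageBuffer/SLICC interface: stalled messages are parked in a per-address FIFO, a wakeup for that address splices them back, and ports are serviced in rank order so the unblocking message is consumed first.]

#include <cstdint>
#include <deque>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical message: an address plus a human-readable tag.
struct Message {
    uint64_t addr;
    std::string type;
};

// One ranked input port: a FIFO of ready messages plus, per address,
// a FIFO of messages stalled until that address is woken up.
struct InPort {
    int rank = 0;   // higher rank = serviced first
    std::deque<Message> ready;
    std::map<uint64_t, std::deque<Message>> stalled;

    // Park the head message (precondition: ready is non-empty) until
    // someone wakes up 'addr'. No lookups happen while it is parked.
    void stallAndWait(uint64_t addr) {
        stalled[addr].push_back(ready.front());
        ready.pop_front();
    }

    // Re-queue, in FIFO arrival order, everything waiting on 'addr'.
    void wakeUpDependents(uint64_t addr) {
        auto it = stalled.find(addr);
        if (it == stalled.end())
            return;
        for (const Message &m : it->second)
            ready.push_back(m);
        stalled.erase(it);
    }
};

// Service the highest-ranked port with a ready message, so a response
// that unblocks an address is consumed before the stalled request that
// depends on it.
Message *nextMessage(std::vector<InPort> &ports) {
    InPort *best = nullptr;
    for (InPort &p : ports)
        if (!p.ready.empty() && (best == nullptr || p.rank > best->rank))
            best = &p;
    return best ? &best->ready.front() : nullptr;
}

int main() {
    std::vector<InPort> ports(2);
    ports[0].rank = 1;  // mandatory queue: lowest priority
    ports[1].rank = 2;  // response network: must always make progress

    // Request A arrives while its block is busy: park it instead of
    // recycling (and re-looking it up) every 10 cycles.
    ports[0].ready.push_back({0xA0, "Load A"});
    ports[0].stallAndWait(0xA0);

    // The response for A arrives and wins by rank...
    ports[1].ready.push_back({0xA0, "Data for A"});
    std::cout << "servicing: " << nextMessage(ports)->type << "\n";
    ports[1].ready.pop_front();

    // ...and its handler wakes the parked request, FIFO order intact.
    ports[0].wakeUpDependents(0xA0);
    std::cout << "servicing: " << nextMessage(ports)->type << "\n";
    return 0;
}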

Brad



I was testing the patch you had posted. I updated it so that it works with 
the latest version of the repository. Can you update the review board?


Somehow, I do not see any change in the number of calls to lookup().

--
Nilay

# HG changeset patch
# Parent d10be3f3aa4e3440a58642ca4ddb6472efdfb2a7

diff --git a/src/mem/protocol/MOESI_CMP_token-L1cache.sm b/src/mem/protocol/MOESI_CMP_token-L1cache.sm
--- a/src/mem/protocol/MOESI_CMP_token-L1cache.sm
+++ b/src/mem/protocol/MOESI_CMP_token-L1cache.sm
@@ -433,7 +433,7 @@
   // ** IN_PORTS **
 
   // Use Timer
-  in_port(useTimerTable_in, Address, useTimerTable) {
+  in_port(useTimerTable_in, Address, useTimerTable, rank=5) {
 if (useTimerTable_in.isReady()) {
   TBE tbe := L1_TBEs[useTimerTable.readyAddress()];
 
@@ -459,7 +459,7 @@
   }
 
   // Reissue Timer
-  in_port(reissueTimerTable_in, Address, reissueTimerTable) {
+  in_port(reissueTimerTable_in, Address, reissueTimerTable, rank=4) {
 if (reissueTimerTable_in.isReady()) {
   trigger(Event:Request_Timeout, reissueTimerTable.readyAddress(),
   getCacheEntry(reissueTimerTable.readyAddress()),
@@ -467,10 +467,8 @@
 }
   }
 
-
-
   // Persistent Network
-  in_port(persistentNetwork_in, PersistentMsg, persistentToL1Cache) {
+  in_port(persistentNetwork_in, PersistentMsg, persistentToL1Cache, rank=3) {
 if (persistentNetwork_in.isReady()) {
   peek(persistentNetwork_in, PersistentMsg, block_on="Address") {
 assert(in_msg.Destination.isElement(machineID));
@@ -519,9 +517,80 @@
 }
   }
 
+  // Response Network
+  in_port(responseNetwork_in, ResponseMsg, responseToL1Cache, rank=2) {
+if (responseNetwork_in.isReady()) {
+  peek(responseNetwork_in, ResponseMsg, block_on="Address") {
+assert(in_msg.Destination.isElement(machineID));
+
+Entry cache_entry := getCacheEntry(in_msg.Address);
+TBE tbe := L1_TBEs[in_msg.Address];
+
+// Mark TBE flag if response received off-chip.  Use this to update average latency estimate
+if ( machineIDToMachineType(in_msg.Sender) == MachineType:L2Cache ) {
+
+  if (in_msg.Sender == mapAddressToRange(in_msg.Address,
+ MachineType:L2Cache,
+ l2_select_low_bit,
+ l2_select_num_bits)) {
+
+// came from an off-chip L2 cache
+if (is_valid(tbe)) {
+   // L1_TBEs[in_msg.Address].ExternalResponse := true;
+   // profile_offchipL2_response(in_msg.Address);
+}
+  }
+  else {
+   // profile_onchipL2_response(in_msg.Address );
+  }
+} else if ( machineIDToMachineType(in_msg.Sender) == MachineType:Directory ) {
+  if (is_valid(tbe)) {
+setExternalResponse(tbe);
+// profile_memory_response( in_msg.Address);
+  }
+} else if ( machineIDToMachineType(in_msg.Sender) == MachineType:L1Cache) {
+  //if (isLocalProcessor(machineID, in_msg.Sender) == false) {
+//if (is_valid(tbe)) {
+   // tbe.ExternalResponse := true;
+   // profile_offchipL1_response(in_msg.Address );
+//}
+  //}
+  //else {
+   // profile_onchipL1_response(in_msg.Address );
+  //}
+} else {
+  error("unexpected SenderMachine");
+}
+
+
+if (getTokens(cache_entry) + in_msg.Tokens != max_tokens()) {
+  if (in_msg.Type == CoherenceResponseType:ACK) {
+assert(in_msg.Tokens < (max_tokens() / 2));
+trigger(Event:Ack, in_msg.Address, cache_entry, tbe);
+  } else if (in_msg.Type == CoherenceResponseType:DATA_OWNER) {
+trigger(Event:Data_Owner, in_msg.Address, cache_entry, tbe);
+  } else if (in_msg.Type == CoherenceResponseType:DATA_SHARED) {
+assert(in_msg.Tokens < (max_tokens() / 2));
+   

Re: [m5-dev] Review Request: ruby: support to stallAndWait the mandatory queue

2011-01-23 Thread Beckmann, Brad
Thanks Arka for that response.  You summed it up well.

There are just a couple additional things I want to point out:


1.   One thing that makes this mechanism work is that one must rank each 
input port. In other words, the programmer must understand and communicate 
the dependencies between message classes/protocol virtual channels. That way 
the correct messages are woken up when the appropriate event occurs.

2.   In Nilay's example, you want to make sure that you don't delay the 
issuing of request A until the replacement of block B completes. Instead, 
request A should allocate a TBE and issue in parallel with replacing B. The 
mandatory queue is popped only when the cache message is consumed. When the 
cache message is stalled, it is basically moved to a temporary data structure 
within the message buffer, where it waits until a higher-priority message for 
the same cache block wakes it up.

Brad


From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of 
Arkaprava Basu
Sent: Saturday, January 22, 2011 10:49 AM
To: M5 Developer List
Cc: Gabe Black; Ali Saidi
Subject: Re: [m5-dev] Review Request: ruby: support to stallAndWait the 
mandatory queue

Hi Nilay,

You are mostly correct. I believe this patch contains two things:

1. Support in SLICC to allow waiting and stalling on messages in a message 
buffer when the directory is in a "blocking" state for that address (i.e. it 
cannot process the message at this point), until some event occurs that makes 
consumption of the message possible. When the directory "unblocks", it provides 
the support for waking up the messages that were hitherto waiting (this is the 
precise reason why you did not see a pop of the mandatory queue, but do see 
WakeUpAllDependents).

2. It contains changes to the MOESI_hammer protocol that leverage this support.

For the purpose of this particular discussion, the 1st part is the relevant one.

As far as I understand, the support in SLICC for waiting and stalling was 
introduced primarily to enhance "fairness" in the way SLICC handles coherence 
requests. Without this support, when a message arrives at a controller in a 
blocking state, it "recycles", which means it is polled again (and thus looked 
up again) after 10 cycles (the recycle latency is generally set to 10). If 
multiple messages arrive while the controller is in a blocking state for a 
given address, you can easily see that there is NO "fairness". A message that 
arrived latest for the blocking address can be served first when the controller 
"unblocks". With the new support for stalling and waiting, the blocked messages 
are put in a FIFO queue, thus providing better fairness.

But as you have correctly guessed, another major advantage of this support is 
that it reduces unnecessary "lookups" to the cache structure that happen due 
to polling (a.k.a. "recycle"). So in summary, I believe that the problem you 
are seeing with too many lookups will *reduce* when the protocols are adjusted 
to take advantage of this facility. On a related note, I should also mention 
that another fringe benefit of this support is that it helps in debugging 
coherence protocols. With this, coherence protocol traces won't contain 
thousands of debug messages for recycling, which can be pretty annoying for 
protocol writers.
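[Editor's note: to put rough numbers on the lookup savings described above, here is a small self-contained C++ model. The cycle and request counts are made-up assumptions, not measurements from this thread: under recycling, a message blocked for B cycles is re-looked-up about B/10 times (with the usual recycle latency of 10), while under stall-and-wait it is looked up once on arrival and once when woken.]

#include <iostream>

int main() {
    const int recycle_latency = 10;   // typical recycle latency (cycles)
    const int blocked_cycles  = 400;  // assumed: how long the address stays blocked
    const int num_requests    = 8;    // assumed: messages queued behind the block

    // Recycling: each blocked message is re-polled, and the cache is
    // re-looked-up, once every recycle period for the whole interval.
    long recycle_lookups =
        static_cast<long>(num_requests) * (blocked_cycles / recycle_latency);

    // Stall-and-wait: one lookup when the message first arrives and is
    // stalled, one more when the wakeup event re-delivers it.
    long stall_lookups = static_cast<long>(num_requests) * 2;

    std::cout << "recycle:      " << recycle_lookups << " lookups\n";
    std::cout << "stallAndWait: " << stall_lookups   << " lookups\n";
    // With these made-up numbers: 320 vs 16 lookups, a 95% reduction
    // for the blocked messages.
    return 0;
}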

I hope this helps,

Thanks
Arka



On 01/22/2011 06:40 AM, Nilay Vaish wrote:



---

This is an automatically generated e-mail. To reply, visit:

http://reviews.m5sim.org/r/408/#review797

---





I was thinking about why the ratio of the number of memory lookups, as 
reported by gprof, to the number of memory references, as reported in 
stats.txt, is so high.

While I was working with the MESI CMP directory protocol, I had seen that the 
same request from the processor is looked up again and again in the cache if 
the request is waiting for some event to happen. For example, suppose a 
processor asks for loading address A, but the cache has no space for holding 
address A. Then, it will give up some cache block B before it can bring in 
address A.

The problem is that while the cache block B is being given up, it is possible 
that the request made for address A is looked up in the cache again, even 
though we know it is not possible that we would find it in the cache. This is 
because the requests in the mandatory queue are recycled until they are 
consumed.

Clearly, we should move the request for bringing in address A to a separate 
structure, instead of looking it up again and again. The new structure should 
be looked up whenever an event that could possibly affect the status of this 
request occurs. If we do this, then I think we should see a further reduction 
in the number of lookups. I would expect almost 90% of the lookups to the 
cache to go away. This should also mean a 5% improvement in simulator 
performance.
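[Editor's note: a back-of-the-envelope consistency check on those two numbers, my own estimate rather than anything from the thread: a 5% overall improvement from removing 90% of lookups implies 0.05 ≈ 0.9 × f, where f is the fraction of simulator time spent in cache lookups. That gives f ≈ 0.056, so the two estimates are consistent if lookups account for roughly 5-6% of total simulator runtime.]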

Re: [m5-dev] Review Request: ruby: support to stallAndWait the mandatory queue

2011-01-22 Thread Arkaprava Basu

Hi Nilay,

You are mostly correct. I believe this patch contains two things:

1. Support in SLICC to allow waiting and stalling on messages in a message 
buffer when the directory is in a "blocking" state for that address (i.e. it 
cannot process the message at this point), until some event occurs that makes 
consumption of the message possible. When the directory "unblocks", it 
provides the support for waking up the messages that were hitherto waiting 
(this is the precise reason why you did not see a pop of the mandatory queue, 
but do see WakeUpAllDependents).


2. It contains changes to the MOESI_hammer protocol that leverage this support.

For the purpose of this particular discussion, the 1st part is the 
relevant one.


As far as I understand, the support in SLICC for waiting and stalling was 
introduced primarily to enhance "fairness" in the way SLICC handles coherence 
requests. Without this support, when a message arrives at a controller in a 
blocking state, it "recycles", which means it is polled again (and thus 
looked up again) after 10 cycles (the recycle latency is generally set to 
10). If multiple messages arrive while the controller is in a blocking state 
for a given address, you can easily see that there is NO "fairness". A 
message that arrived latest for the blocking address can be served first when 
the controller "unblocks". With the new support for stalling and waiting, the 
blocked messages are put in a FIFO queue, thus providing better fairness.

But as you have correctly guessed, another major advantage of this support is 
that it reduces unnecessary "lookups" to the cache structure that happen due 
to polling (a.k.a. "recycle"). So in summary, I believe that the problem you 
are seeing with too many lookups will *reduce* when the protocols are 
adjusted to take advantage of this facility. On a related note, I should also 
mention that another fringe benefit of this support is that it helps in 
debugging coherence protocols. With this, coherence protocol traces won't 
contain thousands of debug messages for recycling, which can be pretty 
annoying for protocol writers.


I hope this helps,

Thanks
Arka



On 01/22/2011 06:40 AM, Nilay Vaish wrote:

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/408/#review797
---


I was thinking about why the ratio of the number of memory lookups, as 
reported by gprof, to the number of memory references, as reported in 
stats.txt, is so high.

While I was working with the MESI CMP directory protocol, I had seen that the 
same request from the processor is looked up again and again in the cache if 
the request is waiting for some event to happen. For example, suppose a 
processor asks for loading address A, but the cache has no space for holding 
address A. Then, it will give up some cache block B before it can bring in 
address A.

The problem is that while the cache block B is being given up, it is possible 
that the request made for address A is looked up in the cache again, even 
though we know it is not possible that we would find it in the cache. This is 
because the requests in the mandatory queue are recycled until they are 
consumed.

Clearly, we should move the request for bringing in address A to a separate 
structure, instead of looking it up again and again. The new structure should 
be looked up whenever an event that could possibly affect the status of this 
request occurs. If we do this, then I think we should see a further reduction 
in the number of lookups. I would expect almost 90% of the lookups to the 
cache to go away. This should also mean a 5% improvement in simulator 
performance.

Brad, do you agree with the above reasoning? If I am reading the patch 
correctly, I think this patch is trying to do that, though I do not see the 
mandatory queue being popped. Can you explain the purpose of the patch in a 
slightly more verbose manner? If it is doing what I said above, then I think 
we should do this for all the protocols.

- Nilay


On 2011-01-06 16:19:46, Brad Beckmann wrote:

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/408/
---

(Updated 2011-01-06 16:19:46)


Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan 
Binkert.


Summary
---

ruby: support to stallAndWait the mandatory queue

By stalling and waiting the mandatory queue instead of recycling it, one can
ensure that no incoming messages are starved when the mandatory queue puts
significant pressure on the L1 cache controller (e.g., the ruby memtester).


Diffs
-

   src/mem/protocol/MOESI_CMP_token-L1cache.sm 9f9e10967912
   src/mem/protocol/MOESI_hammer-cache.sm 9f9e10967912
   src/mem/ruby/buffers/MessageBuffer.hh 9f9e10967912
   src/mem/ruby/buffers/MessageBuffer.cc 9f9e10967912
   src/mem/slicc/ast/WakeUpAllDependentsStatementAST.py PRE-CREATION
   src/mem/slicc/ast/__init__.py 9f9e10967912
   src/mem/slicc/parser.py 9f9e10967912
   src/mem/slicc/symbols/StateMachine.py 9f9e10967912

Re: [m5-dev] Review Request: ruby: support to stallAndWait the mandatory queue

2011-01-22 Thread Nilay Vaish

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/408/#review797
---


I was thinking about why the ratio of the number of memory lookups, as 
reported by gprof, to the number of memory references, as reported in 
stats.txt, is so high.

While I was working with the MESI CMP directory protocol, I had seen that the 
same request from the processor is looked up again and again in the cache if 
the request is waiting for some event to happen. For example, suppose a 
processor asks for loading address A, but the cache has no space for holding 
address A. Then, it will give up some cache block B before it can bring in 
address A.

The problem is that while the cache block B is being given up, it is possible 
that the request made for address A is looked up in the cache again, even 
though we know it is not possible that we would find it in the cache. This is 
because the requests in the mandatory queue are recycled until they are 
consumed.

Clearly, we should move the request for bringing in address A to a separate 
structure, instead of looking it up again and again. The new structure should 
be looked up whenever an event that could possibly affect the status of this 
request occurs. If we do this, then I think we should see a further reduction 
in the number of lookups. I would expect almost 90% of the lookups to the 
cache to go away. This should also mean a 5% improvement in simulator 
performance.

Brad, do you agree with the above reasoning? If I am reading the patch 
correctly, I think this patch is trying to do that, though I do not see the 
mandatory queue being popped. Can you explain the purpose of the patch in a 
slightly more verbose manner? If it is doing what I said above, then I think 
we should do this for all the protocols.

- Nilay


On 2011-01-06 16:19:46, Brad Beckmann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/408/
> ---
> 
> (Updated 2011-01-06 16:19:46)
> 
> 
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and 
> Nathan Binkert.
> 
> 
> Summary
> ---
> 
> ruby: support to stallAndWait the mandatory queue
> 
> By stalling and waiting the mandatory queue instead of recycling it, one can
> ensure that no incoming messages are starved when the mandatory queue puts
> significant pressure on the L1 cache controller (e.g., the ruby memtester).
> 
> 
> Diffs
> -
> 
>   src/mem/protocol/MOESI_CMP_token-L1cache.sm 9f9e10967912 
>   src/mem/protocol/MOESI_hammer-cache.sm 9f9e10967912 
>   src/mem/ruby/buffers/MessageBuffer.hh 9f9e10967912 
>   src/mem/ruby/buffers/MessageBuffer.cc 9f9e10967912 
>   src/mem/slicc/ast/WakeUpAllDependentsStatementAST.py PRE-CREATION 
>   src/mem/slicc/ast/__init__.py 9f9e10967912 
>   src/mem/slicc/parser.py 9f9e10967912 
>   src/mem/slicc/symbols/StateMachine.py 9f9e10967912 
> 
> Diff: http://reviews.m5sim.org/r/408/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Brad
> 
>



[m5-dev] Review Request: ruby: support to stallAndWait the mandatory queue

2011-01-06 Thread Brad Beckmann

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/408/
---

Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan 
Binkert.


Summary
---

ruby: support to stallAndWait the mandatory queue

By stalling and waiting the mandatory queue instead of recycling it, one can
ensure that no incoming messages are starved when the mandatory queue puts
significant pressure on the L1 cache controller (e.g., the ruby memtester).


Diffs
-

  src/mem/protocol/MOESI_CMP_token-L1cache.sm 9f9e10967912 
  src/mem/protocol/MOESI_hammer-cache.sm 9f9e10967912 
  src/mem/ruby/buffers/MessageBuffer.hh 9f9e10967912 
  src/mem/ruby/buffers/MessageBuffer.cc 9f9e10967912 
  src/mem/slicc/ast/WakeUpAllDependentsStatementAST.py PRE-CREATION 
  src/mem/slicc/ast/__init__.py 9f9e10967912 
  src/mem/slicc/parser.py 9f9e10967912 
  src/mem/slicc/symbols/StateMachine.py 9f9e10967912 

Diff: http://reviews.m5sim.org/r/408/diff


Testing
---


Thanks,

Brad

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev