Hi,
I've found a deadlock in the MOESI_CMP_directory coherence protocol and
I can't work out the cause. Could someone help me? The situation is the
following:
I'm using x86, detailed OoO core, syscall emulation mode and Ruby
MOESI_CMP_directory. I'm running a parallel vector addition benchmark
written in OpenMP (linked against the m5threads library) with 64 threads
on 64 cores. The command line I use for the execution is this one:
./build/X86/gem5.debug --debug-flags=Ruby --debug-start=1945000000 \
  --redirect-stdout --redirect-stderr configs/example/se.py \
  --output=/tmp/bench.out --errout=/tmp/bench.err --ruby \
  --cpu-type=detailed --num-cpus=64 --num-l2caches=64 --l2_size=1MB \
  --num-dirs=64 --topology=Mesh -c /benchmarks/vecadd -o "-N 64"
The sequence of events that happen during the simulation is the
following:
- All threads execute loads on physical address 0x103bc0. The state of
the line in the cache hierarchy is L1Cache-1 OWNER, all the other L1
caches SHARED and L2Cache-47 ILOSX (Invalid with local owner and
sharers).
- L1Cache-63 does a store on 0x103bc0, issues a GETX to the L2 and
transitions the state to SM.
- L2Cache-47 receives the GETX from L1Cache-63. It sends invalidations
to all L1 caches except L1Cache-1 (the owner) and L1Cache-63 (the
requestor), forwards the GETX to L1Cache-1, sends an ACK to
L1Cache-63 and transitions the state to IFLXO.
- L1Cache-1 receives the GETX from the L2Cache-47, sends a
DATA_EXCLUSIVE to the L1Cache-63 and transitions the state to I.
- All the L1 caches except L1Cache-1 and L1Cache-63 receive the INV
from the L2Cache-47, send the ACK to the L1Cache-63 and transition the
state to I.
- L1Cache-63 receives the DATA_EXCLUSIVE from L1Cache-1 and the ACKs
from all the other L1 caches. It transitions to OM and generates an
ALL_ACKS trigger event for itself.
(Up to here everything is fine)
- L1Cache-63 receives the ALL_ACKS trigger event, sends an
UNBLOCK_EXCLUSIVE to the L2Cache-47 and transitions the state to MM_W.
After a while it transitions the state to MM using a timeout.
- While all this is happening, the rest of the cores keep issuing loads
to 0x103bc0 and miss in their L1 caches (the state is invalid). Their
L1_GETS requests reach L2Cache-47, which is in the transient state
IFLXO, so it recycles the L1RequestQueue at every L1_GETS. L2Cache-47
should transition from IFLXO to the non-transient state ILX when the
UNBLOCK_EXCLUSIVE from L1Cache-63 arrives, but that never happens...
so, deadlock!
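To make the L2 side of the sequence above concrete, here is a minimal Python sketch of the L2Cache-47 transitions mentioned in this email. It is a toy table and not the SLICC code: the names `L2_TRANSITIONS`, `RECYCLED`, `step` and `replay` are made up for illustration, and only the (state, event) pairs described above are modelled.

```python
# Toy model of the L2Cache-47 behaviour described above (not SLICC code).
# Only the (state, event) pairs mentioned in this email are included.
L2_TRANSITIONS = {
    ("ILOSX", "L1_GETX"): "IFLXO",          # forward GETX to the owner and block
    ("IFLXO", "Exclusive_Unblock"): "ILX",  # requestor now holds the line exclusively
}
# Requests that hit the blocked line are recycled: the state does not change.
RECYCLED = {("IFLXO", "L1_GETS")}


def step(state, event):
    """Return (next_state, was_recycled) for a single event."""
    if (state, event) in RECYCLED:
        return state, True
    return L2_TRANSITIONS[(state, event)], False


def replay(events, state="ILOSX"):
    """Apply a list of events in order and return the final state."""
    for event in events:
        state, _ = step(state, event)
    return state


# Expected flow: the unblock is processed and the line leaves IFLXO.
print(replay(["L1_GETX", "L1_GETS", "Exclusive_Unblock"]))  # ILX
# Observed flow: only L1_GETS requests are ever processed.
print(replay(["L1_GETX"] + ["L1_GETS"] * 5))                # IFLXO
```

The second call shows the symptom: as long as Exclusive_Unblock is never delivered to the state machine, no number of recycled L1_GETS requests can move the line out of IFLXO.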
I've been trying to figure out why the UNBLOCK_EXCLUSIVE message does
not arrive at L2Cache-47, but I'm not able to find the reason.
According to the debug information, the message takes the following path:
1945386000: [Version 63, L1Cache, name=L1Cache_responseFromL1Cache]:
Enqueue arrival_time: 1945388000, Message: [ResponseMsg: Addr =
[0x103bc0, line 0x103bc0] Type = UNBLOCK_EXCLUSIVE Sender = L1Cache-63
SenderMachine = L1Cache Destination = [NetDest (4) 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 - - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 - ] DataBlk = [ 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 ]
Dirty = 0 Acks = 0 MessageSize = Unblock_Control Time = 1945386000000 ]
1945388000: PerfectSwitch-63: Message: (same message as above)
1945388000: [Queue to Throttle 63 3]: Enqueue arrival_time: 1945389000,
Message: (same message)
1945389000: [Queue from port 62 4 2 to PerfectSwitch]: Enqueue
arrival_time: 1945390000, Message: (same message)
1945389000: Throttle-63: [MessageBuffer: consumer-yes [ [1945390000,
191, (same message)] ]] [Queue from port 62 4 2 to PerfectSwitch]
1945390000: PerfectSwitch-62: Message: (same message)
(The message traverses all the switches until it arrives at
PerfectSwitch-47)
1945420000: PerfectSwitch-47: Message: (same message)
1945420000: [Queue to Throttle 47 1]: Enqueue arrival_time: 1945421000,
Message: (same message)
1945421000: Throttle-47: throttle: 1 my bw 16000 bw spent enqueueing
net msg 8000 time: 1945421.
1945421000: [Version 47, L2Cache, responseNetwork_in]: Enqueue
arrival_time: 1945422000, Message: (same message)
1945421000: [Queue to Throttle 47 1]: Popping
1945421000: Throttle-47: [MessageBuffer: consumer-yes [ [1945422000,
468, (same message)] ]] [Version 47, L2Cache, responseNetwork_in]
1945421000: Throttle-47: [1 bw: 16000] not scheduled again
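For anyone who wants to follow a message through a Ruby trace like the one above, this is a minimal Python sketch of a filter. It assumes only what is visible in the excerpt: every trace entry starts with "<tick>: ", and continuation lines (if the text has been wrapped) do not. The helper names `entries` and `matching` and the abbreviated `sample` are illustrative and not part of gem5.

```python
import re

# A new trace entry starts with "<tick>: "; wrapped continuation lines do not.
ENTRY_START = re.compile(r"^(\d+): ")


def entries(lines):
    """Group (possibly wrapped) trace lines into (tick, text) entries."""
    tick, parts = None, []
    for line in lines:
        m = ENTRY_START.match(line)
        if m:
            if tick is not None:
                yield tick, " ".join(parts)
            tick, parts = int(m.group(1)), [line.strip()]
        elif tick is not None:
            parts.append(line.strip())
    if tick is not None:
        yield tick, " ".join(parts)


def matching(lines, *needles):
    """Return the entries whose text contains every string in needles."""
    return [(t, text) for t, text in entries(lines)
            if all(n in text for n in needles)]


# Abbreviated sample built from the trace above (message body shortened).
sample = """\
1945421000: [Version 47, L2Cache, responseNetwork_in]: Enqueue
arrival_time: 1945422000, Message: [ResponseMsg: Addr = [0x103bc0, line
0x103bc0] Type = UNBLOCK_EXCLUSIVE Sender = L1Cache-63 ]
1945421000: [Queue to Throttle 47 1]: Popping
""".splitlines()

for tick, text in matching(sample, "UNBLOCK_EXCLUSIVE", "0x103bc0"):
    print(tick, text[:60])
```

On a real trace, the same `matching` call can be given an open file object instead of `sample`, since it only needs an iterable of lines.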
There's nothing more related to the UNBLOCK_EXCLUSIVE message in the
trace. I would expect that, at this point, L2Cache-47 picks up the
message, generates an Exclusive_Unblock event and performs the transition
(IFLXO, Exclusive_Unblock, ILX), but for some reason it doesn't.
Notice that the L1RequestQueue of L2Cache-47 is full of L1_GETS requests
that get recycled forever while the UNBLOCK_EXCLUSIVE is never consumed.
Could the UNBLOCK_EXCLUSIVE be starved because of this? It's the only
explanation that comes to mind...
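To illustrate the kind of starvation I have in mind, here is a toy Python model. It is not Ruby's actual scheduling code, and I have not verified that the controller works this way. The assumptions are that the controller handles at most `budget` messages per cycle, that it scans the request queue before the response queue, and that recycled L1_GETS requests are ready again on the next cycle. The function `simulate` and all its parameters and values are hypothetical.

```python
from collections import deque


def simulate(num_gets, budget, cycles, responses_first):
    """Return the cycle at which the unblock is consumed, or None.

    Toy model: each cycle the controller handles at most `budget` messages,
    scanning its two queues in a fixed order. L1_GETS requests hit a blocked
    line, so they are recycled and become ready again on the next cycle.
    """
    requests = deque(["L1_GETS"] * num_gets)
    responses = deque(["UNBLOCK_EXCLUSIVE"])
    for cycle in range(cycles):
        recycled = []
        handled = 0
        while handled < budget:
            if responses_first and responses:
                responses.popleft()
                return cycle
            if requests:
                recycled.append(requests.popleft())  # line is blocked: recycle
            elif responses:
                responses.popleft()
                return cycle
            else:
                break
            handled += 1
        requests.extend(recycled)  # recycled requests are ready next cycle
    return None


# More pending requests than the per-cycle budget: the unblock is starved.
print(simulate(62, 32, 1000, responses_first=False))  # None
# Same load, but the response queue is scanned first.
print(simulate(62, 32, 1000, responses_first=True))   # 0
# Fewer requests than the budget: the unblock is reached in the first cycle.
print(simulate(10, 32, 1000, responses_first=False))  # 0
```

In this model the unblock is starved only when the number of recycled requests is at least the per-cycle budget and the request queue is scanned first. Whether either condition holds for L2Cache-47 is exactly what I can't tell from the trace.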
Thanks and sorry for the heavy email,
Lluc