Hi

I've found a deadlock in the MOESI_CMP_directory coherence protocol and I can't see the reason for it. Can someone give me a hand? The situation is the following:

I'm using x86, detailed OoO core, syscall emulation mode and Ruby MOESI_CMP_directory. I'm running a parallel vector addition benchmark written in OpenMP (linked against the m5threads library) with 64 threads on 64 cores. The command line I use for the execution is this one:

./build/X86/gem5.debug --debug-flags=Ruby --debug-start=1945000000 --redirect-stdout --redirect-stderr configs/example/se.py --output=/tmp/bench.out --errout=/tmp/bench.err --ruby --cpu-type=detailed --num-cpus=64 --num-l2caches=64 --l2_size=1MB --num-dirs=64 --topology=Mesh -c /benchmarks/vecadd -o "-N 64"

The sequence of events that happen during the simulation is the following:

- All threads execute loads on physical address 0x103bc0. The state of the line in the cache hierarchy is L1Cache-1 OWNER, all the other L1 caches SHARED and L2Cache-47 ILOSX (Invalid with local owner and sharers).
- L1Cache-63 does a store on 0x103bc0, issues a GETX to the L2 and transitions the state to SM.
- L2Cache-47 receives the GETX from L1Cache-63. It sends invalidations to all L1 caches except L1Cache-1 (the owner) and L1Cache-63 (the requestor), forwards the GETX to L1Cache-1, sends an ACK to L1Cache-63 and transitions the state to IFLXO.
- L1Cache-1 receives the GETX from L2Cache-47, sends a DATA_EXCLUSIVE to L1Cache-63 and transitions the state to I.
- All the L1 caches except L1Cache-1 and L1Cache-63 receive the INV from L2Cache-47, send an ACK to L1Cache-63 and transition the state to I.
- L1Cache-63 receives the DATA_EXCLUSIVE from L1Cache-1 and the ACKs from all the other L1 caches. It transitions to OM and generates an ALL_ACKS trigger event for itself.
(Up to here everything is fine)
- L1Cache-63 receives the ALL_ACKS trigger event, sends an UNBLOCK_EXCLUSIVE to L2Cache-47 and transitions the state to MM_W. After a while it transitions the state to MM using a timeout.
- While all this is happening, the rest of the cores keep doing loads to 0x103bc0 and miss in their L1 caches (the state is invalid). Their L1_GETS requests reach L2Cache-47, which is in the transient state IFLXO, so it recycles the L1RequestQueue at every L1_GETS. L2Cache-47 should transition from IFLXO to the non-transient state ILX when the UNBLOCK_EXCLUSIVE from L1Cache-63 arrives, which never happens... so, deadlock!
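
To make the L2 side of the steps above easier to follow, here is a toy transition table in Python. It is only a sketch of the three (state, event) pairs mentioned in this email, not the SLICC source; the names RECYCLE, TRANSITIONS, step and run are made up for illustration.

```python
# Toy model of the L2Cache-47 behaviour described above for line 0x103bc0.
# Assumption: only the (state, event) pairs named in this email are modelled.
RECYCLE = "recycle"  # hypothetical marker: the message goes back to its queue

TRANSITIONS = {
    ("ILOSX", "L1_GETX"): "IFLXO",
    ("IFLXO", "Exclusive_Unblock"): "ILX",
    ("IFLXO", "L1_GETS"): RECYCLE,
}


def step(state, event):
    """Return (next_state, recycled) for a single event."""
    nxt = TRANSITIONS[(state, event)]
    if nxt == RECYCLE:
        return state, True  # state unchanged, request is retried later
    return nxt, False


def run(state, events):
    """Apply a list of events in order and return the final state."""
    for event in events:
        state, _ = step(state, event)
    return state


# Without the unblock, the L2 stays in the transient state.
print(run("ILOSX", ["L1_GETX", "L1_GETS", "L1_GETS"]))            # IFLXO
# With the unblock, it reaches the stable state.
print(run("ILOSX", ["L1_GETX", "L1_GETS", "Exclusive_Unblock"]))  # ILX
```

In this model the only way out of IFLXO is the Exclusive_Unblock event, which is why the missing UNBLOCK_EXCLUSIVE matters so much.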

I've been trying to figure out why the UNBLOCK_EXCLUSIVE message does not arrive at the L2Cache-47, but I can't find the reason. According to the debug information, the message takes the following path:

1945386000: [Version 63, L1Cache, name=L1Cache_responseFromL1Cache]: Enqueue arrival_time: 1945388000, Message: [ResponseMsg: Addr = [0x103bc0, line 0x103bc0] Type = UNBLOCK_EXCLUSIVE Sender = L1Cache-63 SenderMachine = L1Cache Destination = [NetDest (4) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - ] DataBlk = [ 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 ] Dirty = 0 Acks = 0 MessageSize = Unblock_Control Time = 1945386000000 ]

1945388000: PerfectSwitch-63: Message: (same message as above)

1945388000: [Queue to Throttle 63 3]: Enqueue arrival_time: 1945389000, Message: (same message)

1945389000: [Queue from port 62 4 2 to PerfectSwitch]: Enqueue arrival_time: 1945390000, Message: (same message)

1945389000: Throttle-63: [MessageBuffer: consumer-yes [ [1945390000, 191, (same message)] ]] [Queue from port 62 4 2 to PerfectSwitch]

1945390000: PerfectSwitch-62: Message: (same message)

(The message traverses all the switches until it arrives at PerfectSwitch-47)

1945420000: PerfectSwitch-47: Message: [ResponseMsg: Addr = [0x103bc0, line 0x103bc0] Type = UNBLOCK_EXCLUSIVE Sender = L1Cache-63 SenderMachine = L1Cache Destination = [NetDest (4) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - ] DataBlk = [ 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 ] Dirty = 0 Acks = 0 MessageSize = Unblock_Control Time = 1945386000000 ]

1945420000: [Queue to Throttle 47 1]: Enqueue arrival_time: 1945421000, Message: [ResponseMsg: Addr = [0x103bc0, line 0x103bc0] Type = UNBLOCK_EXCLUSIVE Sender = L1Cache-63 SenderMachine = L1Cache Destination = [NetDest (4) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - ] DataBlk = [ 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 ] Dirty = 0 Acks = 0 MessageSize = Unblock_Control Time = 1945386000000 ]

1945421000: Throttle-47: throttle: 1 my bw 16000 bw spent enqueueing net msg 8000 time: 1945421.

1945421000: [Version 47, L2Cache, responseNetwork_in]: Enqueue arrival_time: 1945422000, Message: [ResponseMsg: Addr = [0x103bc0, line 0x103bc0] Type = UNBLOCK_EXCLUSIVE Sender = L1Cache-63 SenderMachine = L1Cache Destination = [NetDest (4) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - ] DataBlk = [ 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 ] Dirty = 0 Acks = 0 MessageSize = Unblock_Control Time = 1945386000000 ]

1945421000: [Queue to Throttle 47 1]: Popping

1945421000: Throttle-47: [MessageBuffer: consumer-yes [ [1945422000, 468, [ResponseMsg: Addr = [0x103bc0, line 0x103bc0] Type = UNBLOCK_EXCLUSIVE Sender = L1Cache-63 SenderMachine = L1Cache Destination = [NetDest (4) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - ] DataBlk = [ 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 ] Dirty = 0 Acks = 0 MessageSize = Unblock_Control Time = 1945386000000 ]; ] ]] [Version 47, L2Cache, responseNetwork_in]

1945421000: Throttle-47: [1 bw: 16000] not scheduled again
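
A small Python filter along these lines can pull the hops of one message out of a Ruby trace like the one above. It is a sketch that assumes every line has the shape "<tick>: <component>: <rest>" seen in the excerpts; the names LINE_RE and hops are made up for illustration.

```python
import re

# Assumed line shape, taken from the excerpts above: "<tick>: <component>: <rest>"
LINE_RE = re.compile(r"^(\d+):\s+(.+?):\s+(.*)$")


def hops(lines, msg_type="UNBLOCK_EXCLUSIVE", addr="0x103bc0"):
    """Return (tick, component) for each trace line that mentions the message."""
    found = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a "<tick>: <component>: ..." line
        tick, component, rest = match.groups()
        # Match on the message type and the address printed in the message dump.
        if f"Type = {msg_type}" in rest and f"Addr = [{addr}" in rest:
            found.append((int(tick), component))
    return found
```

Passing an open file object (or any iterable of trace lines) to hops() gives the list of components the message went through, so the last entry shows where it was last seen.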


There's nothing more related to the UNBLOCK_EXCLUSIVE message in the trace. I would expect that, at this point, the L2Cache-47 picks up the message, generates an Exclusive_Unblock event and performs the transition (IFLXO, Exclusive_Unblock, ILX), but for some reason it doesn't. Notice that the L1RequestQueue of the L2Cache-47 is full of L1_GETS requests that get recycled forever while the UNBLOCK_EXCLUSIVE does not arrive. Could it be that the UNBLOCK_EXCLUSIVE is being starved because of this? It's the only reason that comes to my mind...
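
To illustrate the kind of starvation I have in mind, here is a toy Python model. It is not Ruby's code and the assumptions are mine: the controller polls its queues in a fixed order, handles at most `budget` messages per wakeup, and a recycled L1_GETS counts against that budget. The function and queue names are made up for illustration.

```python
from collections import deque


def simulate(port_order, budget=1, wakeups=50, n_gets=4):
    """Toy L2 controller stuck in IFLXO; returns its state after `wakeups`.

    Assumptions (not taken from Ruby): queues are polled in `port_order`,
    each wakeup handles at most `budget` messages in total, and recycling
    an L1_GETS uses up one unit of that budget.
    """
    state = "IFLXO"
    queues = {
        "L1RequestQueue": deque(["L1_GETS"] * n_gets),
        "responseQueue": deque(["UNBLOCK_EXCLUSIVE"]),
    }
    for _ in range(wakeups):
        used = 0
        for port in port_order:
            queue = queues[port]
            # Only messages already queued when the port is polled are ready.
            for _ in range(len(queue)):
                if used >= budget:
                    break
                msg = queue.popleft()
                used += 1
                if msg == "UNBLOCK_EXCLUSIVE":
                    state = "ILX"      # (IFLXO, Exclusive_Unblock, ILX)
                elif state == "IFLXO":
                    queue.append(msg)  # recycle: back to the tail of the queue
                # In ILX the L1_GETS would be served; the toy just drops it.
    return state


# Requests polled first with a budget of 1: the unblock is never handled.
print(simulate(["L1RequestQueue", "responseQueue"]))  # IFLXO
# Responses polled first: the unblock is handled on the first wakeup.
print(simulate(["responseQueue", "L1RequestQueue"]))  # ILX
```

In this model the unblock is starved whenever the recycled requests alone use up the whole per-wakeup budget. Whether the real controller polls its queues in this way is exactly what I don't know.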

Thanks and sorry for the heavy email,

Lluc

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
