Hi,
I'd like to share my recent findings on this issue.
In my case, I ran SPLASH on an 8-core target machine, and the benchmark got
stuck in a kernel region executing the spin lock "__ticket_spin_lock".
To illustrate the issue, I have simplified the ticket_spin_lock and
ticket_spin_unlock code:
<ticket_spin_lock>
{
    while LD [ticket_addr] != MY_TICKET   // Wait for my turn.
        continue;
}
<ticket_spin_unlock>
{
    LD [ticket_addr] -> TEMP
    ST [ticket_addr] <- TEMP + 1   // Set the next ticket number to wake up
                                   // the next waiter.
}
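The simplified code above leaves out how a core obtains MY_TICKET. A minimal Python sketch of a complete ticket lock could look like the following. The names TicketLock, now_serving and demo are made up for illustration, a threading.Lock stands in for the atomic fetch-and-add that hands out tickets, and this is not the kernel's implementation.

```python
import threading
import time


class TicketLock:
    """Toy ticket lock: FIFO hand-off through a 'now serving' counter."""

    def __init__(self):
        self._next_ticket = 0            # next ticket to hand out
        self.now_serving = 0             # plays the role of [ticket_addr]
        self._atomic = threading.Lock()  # stand-in for atomic fetch-and-add

    def lock(self):
        with self._atomic:               # take a ticket atomically
            my_ticket = self._next_ticket
            self._next_ticket += 1
        while self.now_serving != my_ticket:  # wait for my turn
            time.sleep(0)                # yield to other threads while spinning
        return my_ticket

    def unlock(self):
        temp = self.now_serving          # LD [ticket_addr] -> TEMP
        self.now_serving = temp + 1      # ST [ticket_addr] <- TEMP + 1


def demo(threads=4, iterations=100):
    """Increment a shared counter under the lock from several threads."""
    lock = TicketLock()
    counter = [0]

    def worker():
        for _ in range(iterations):
            lock.lock()
            counter[0] += 1
            lock.unlock()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter[0], lock.now_serving
```

Only the lock holder ever writes now_serving, which is why unlock can be a plain load followed by a plain store. It also means the scheme depends on that load seeing the latest value.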
Let's assume we have two cores, A and B, and one ticket lock (ticket value =
0). Core A currently holds the lock, and B is waiting for it to be released.
In most cases, when core A releases the lock, core B can acquire and
release it as in the following scenario.
Core A              Core B
LD[Lock] -> 0
ST[Lock] <- 1
                    while LD[Lock] != 1:   // Acquire the lock
                        continue;          // Exits once LD[Lock] == 1
                    LD[Lock] -> 1
                    ST[Lock] <- 2          // Release the lock
However, sometimes the ticket_spin_unlock code resides in the ROB together
with the ticket_spin_lock code. In this situation, the LD in
ticket_spin_unlock can bypass the earlier LD in ticket_spin_lock. If that
happens, the lock stays locked and is never released.
Core A              Core B
                    **LD[Lock] -> 0  <--------------------------+
LD[Lock] -> 0                                                   |
ST[Lock] <- 1                                                   |
                    while LD[Lock] != 1:  // Acquire the lock   |
                        continue;                               |
                    **LD[Lock] -> 0  ---------------------------+
                        // This load bypassed the previous load
                        // and got the previous value 0.
                    ST[Lock] <- 1  // Release the previous lock ???
                        // The next waiting core with ticket 2
                        // can never get the lock.
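
The two interleavings can be replayed deterministically. The following is a toy Python model of the scenario only, not MARSS code; replay_scenario and its bypass flag are hypothetical names chosen for this sketch.

```python
def replay_scenario(bypass):
    """Replay the two-core example and return the final value of [Lock].

    bypass=False: Core B's unlock load executes in program order.
    bypass=True:  Core B's unlock load executes early, before Core A's store.
    """
    mem = {"lock": 0}  # Core A holds ticket 0, Core B waits with ticket 1

    # With bypass, Core B's unlock load runs ahead of its spin-loop load.
    early_value = mem["lock"] if bypass else None

    # Core A: ticket_spin_unlock
    temp_a = mem["lock"]        # LD[Lock] -> 0
    mem["lock"] = temp_a + 1    # ST[Lock] <- 1

    # Core B: ticket_spin_lock, the spin loop exits because [Lock] == 1
    assert mem["lock"] == 1

    # Core B: ticket_spin_unlock
    temp_b = early_value if bypass else mem["lock"]
    mem["lock"] = temp_b + 1
    return mem["lock"]


in_order = replay_scenario(bypass=False)   # [Lock] ends at 2
reordered = replay_scenario(bypass=True)   # [Lock] ends at 1
```

In the reordered case [Lock] ends at 1 instead of 2, so a core waiting with ticket 2 would spin forever.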
As I understand it, MARSS allows load reordering, whereas the x86
architecture does not, according to the ISA manual. This incompatibility
seems to cause the issue. To solve it, I think we need either a technique
like the one used in the MIPS R10000 to enforce SC, or to simply disable
load reordering.
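
As a rough illustration of the first option: as far as I know, the R10000 lets loads execute speculatively but squashes and re-executes a load if the cache line it read is invalidated before the load graduates. Below is a toy Python sketch of that idea. LoadQueue, snoop_invalidate and unlock_with_replay are hypothetical names, and none of this is based on the MARSS source.

```python
class LoadQueue:
    """Toy load queue: tracks loads that executed but have not retired."""

    def __init__(self):
        self.pending = []

    def execute(self, mem, addr):
        # Speculatively perform the load and remember what it observed.
        entry = {"addr": addr, "value": mem[addr], "squashed": False}
        self.pending.append(entry)
        return entry

    def snoop_invalidate(self, addr):
        # Another core wrote addr: any unretired load of it may be stale.
        for entry in self.pending:
            if entry["addr"] == addr:
                entry["squashed"] = True

    def retire(self, mem, entry):
        # A squashed load is re-executed so it observes the current value.
        # (Real hardware would also replay all younger instructions.)
        if entry["squashed"]:
            entry["value"] = mem[entry["addr"]]
            entry["squashed"] = False
        self.pending.remove(entry)
        return entry["value"]


def unlock_with_replay():
    """Replay the failing interleaving with snoop-based squashing enabled."""
    mem = {"lock": 0}
    core_b = LoadQueue()

    early = core_b.execute(mem, "lock")  # Core B's unlock load runs early: 0
    mem["lock"] = mem["lock"] + 1        # Core A releases: [Lock] becomes 1
    core_b.snoop_invalidate("lock")      # Core A's store invalidates B's copy
    temp = core_b.retire(mem, early)     # replayed load now observes 1
    mem["lock"] = temp + 1               # Core B releases: [Lock] becomes 2
    return mem["lock"]
```

With the replay in place, the early load no longer commits the stale value 0, and [Lock] ends at 2 as in the in-order case.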
Thanks,
Hanhwi
2016-12-06 5:18 GMT+09:00 Seunghee Shin <[email protected]>:
> Dear,
>
> I hope you are doing well. Thank you for your efforts on this
> simulator. I ran into an issue about a month ago and I am
> still struggling with it. My issue looks very similar to one that
> was addressed about three years ago.
>
> http://marss86-devel.cs.binghamton.narkive.com/48BCHVib/simulation-is-stuck-in-kernel
>
> It seems to happen less frequently with PARSEC, but it is very likely
> to happen with my benchmarks (1 out of 2 simulations), even though they
> only perform simple data structure insert and delete operations. Everything
> works fine in a single-core configuration, but I need multiple cores.
> Currently, I am running my benchmarks on 4 cores. Does anyone know how
> to fix or avoid this? I would really appreciate your help. Thank
> you in advance.
>
> Sincerely,
> Seunghee Shin
>
>
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel