Dear Seunghee Shin,

What I did to fix the timing issue you mention was to change the way the
issue queue checks whether the RC operand is ready to read. I described the
change in the previous post.

I think that timing issue is not related to the deadlock issue. As far as I
know, the deadlock is caused by a TSO violation in MARSS; the MARSS pipeline
seems to implement an RC model. To work around the deadlock, you have two
choices, I think. One is to prevent load-load reordering in the issue stage;
the other is to check, when a load instruction commits, whether the loaded
value has changed since the load executed, and to flush the whole pipeline if
it has. The latter is what I used to fix the deadlock; a rough sketch of the
idea follows.
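For illustration, here is a minimal sketch of that commit-time check. This is
not the actual MARSS code; CommitEntry, read_memory() and flush_pipeline()
are made-up placeholders.

    #include <cstdint>

    // Illustrative only: re-check a load's value when it commits and flush
    // the pipeline on a mismatch. All names here are hypothetical.
    struct CommitEntry {
        bool     is_load;
        uint64_t addr;
        unsigned size;
        uint64_t value_at_execute;   // value the load obtained when it executed
    };

    uint64_t read_memory(uint64_t addr, unsigned size);        // placeholder
    void     flush_pipeline(const CommitEntry& refetch_from);  // placeholder

    // Called when a load reaches the head of the ROB and is about to commit.
    bool commit_load(const CommitEntry& e) {
        if (!e.is_load)
            return true;

        // Re-read the location at commit time.
        uint64_t value_now = read_memory(e.addr, e.size);

        // If another core's store changed the location between execute and
        // commit, the load observed a stale value: flush and re-fetch.
        if (value_now != e.value_at_execute) {
            flush_pipeline(e);
            return false;
        }
        return true;   // safe to commit
    }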

Thanks,
Hanhwi

2017-02-27 20:00 GMT+09:00 Seunghee Shin <[email protected]>:

> Thank you for your reply. By the way, while searching the previous mail
> archive for clues, I found your old discussion about a timing issue. In
> particular, I feel that one of those discussions may be related to this
> issue. Could you explain what you were trying to fix below? I am looking
> forward to hearing from you.
>
> Sincerely,
> Seunghee Shin
>
>
> 3) Stores that are waiting for a value to store wake up earlier than the
> uops they depend on.
> The O3 model issues memory operations in two phases, address calculation
> and memory access. So a store first issues when its address operands ra and
> rb are ready. At the same time, the rc value is checked with the
> PhysicalRegister::ready() function, and if rc is ready the store succeeds in
> issuing. However, ready() is not meant for checking whether the value is
> available in the O3 timing model; it only checks whether the value has been
> calculated by PTLSim.
>
> I changed ooo-exec.cpp:562 as below:
>
>     bool rcready = rc.state != PHYSREG_WAITING;
>     completed = issuestore(*lsq, origvirt, radata, rbdata, rcdata,
>                            rcready, pteupdate);
>
> This is not complete, as it does not take clustered architectures into
> account. If you want to model a clustered architecture, which has several
> issue queues, you need to know when the rc value arrives at the issue queue
> where the store is waiting to issue.
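> To make the idea concrete, here is a generic sketch with made-up names (not
> the MARSS source): the store issues once its address operands are ready, and
> the data operand is checked against the timing-model register state rather
> than the PhysicalRegister::ready() flag, which only tells whether PTLSim has
> calculated the value.
>
>     // Illustrative only: two-phase store issue with an explicit
>     // data-readiness check. PhysReg, StoreUop and issue_store() are
>     // hypothetical placeholders.
>     enum PhysRegState { PHYSREG_WAITING, PHYSREG_BYPASS, PHYSREG_WRITTEN };
>
>     struct PhysReg  { PhysRegState state; unsigned long long data; };
>     struct StoreUop { PhysReg *ra, *rb, *rc; };  // ra/rb: address, rc: store data
>
>     bool issue_store(StoreUop& st, bool rcready);  // placeholder for the issue path
>
>     bool try_issue_store(StoreUop& st) {
>         // Phase 1: address generation needs ra and rb.
>         if (st.ra->state == PHYSREG_WAITING || st.rb->state == PHYSREG_WAITING)
>             return false;
>
>         // Phase 2: the store data only counts as ready once rc has left the
>         // WAITING state in the timing model.
>         bool rcready = st.rc->state != PHYSREG_WAITING;
>         return issue_store(st, rcready);
>     }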
>
>
>
> On Thu, Dec 22, 2016 at 12:30 PM, hanhwi jang <[email protected]>
> wrote:
>
>> Hi,
>>
>> Let me share my recent findings on this issue.
>>
>> In my case, I ran splash on an 8-core target machine, and the benchmark got
>> stuck in a kernel region executing the spin lock "__ticket_spin_lock".
>>
>> To illustrate the issue, I simplified the ticket_spin_lock and
>> ticket_spin_unlock code.
>>
>> <ticket_spin_lock>
>> {
>>     while LD [ticket_addr] != MY_TICKET   // Wait for my turn.
>>         continue;
>> }
>>
>> <ticket_spin_unlock>
>> {
>>     LD [ticket_addr] -> TEMP
>>     ST [ticket_addr] <- TEMP + 1   // Publish the next ticket number to
>>                                    // wake up the next waiter.
>> }
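>>
>> For concreteness, roughly the same protocol written with C++11 atomics
>> (just an illustration; the kernel's __ticket_spin_lock packs both counters
>> into one word and is written differently):
>>
>>     #include <atomic>
>>
>>     struct TicketLock {
>>         std::atomic<unsigned> next_ticket{0};   // ticket given to the next arriving thread
>>         std::atomic<unsigned> now_serving{0};   // ticket currently allowed to hold the lock
>>
>>         void lock() {
>>             unsigned my_ticket = next_ticket.fetch_add(1);   // take a ticket (MY_TICKET)
>>             while (now_serving.load() != my_ticket)          // wait for my turn
>>                 ;                                            // spin
>>         }
>>
>>         void unlock() {
>>             // The LD/ST pair discussed above: read the current ticket,
>>             // then publish the next one to wake the next waiter.
>>             now_serving.store(now_serving.load() + 1);
>>         }
>>     };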
>>
>> Let's assume we have two cores, A and B, and one ticket lock (ticket value
>> = 0). Currently core A holds the lock and B is waiting for it to be
>> released.
>>
>> In most cases, when core A releases the lock, core B can acquire and
>> release it as in the following scenario.
>>
>>        Core A                   Core B
>>
>> LD[Lock] -> 0
>> ST[Lock] <- 1
>>                                while LD[Lock] != 1:   // Acquire the lock
>>                                     continue;
>>                                LD[Lock] == 1          // observes the new ticket, exits the loop
>>
>>                                LD[Lock] -> 1
>>                                ST[Lock] <- 2          // Release the lock
>>
>> However, sometimes the ticket_spin_unlock code resides in the ROB together
>> with the ticket_spin_lock code. In that situation, the LD in
>> ticket_spin_unlock can bypass the earlier LD in ticket_spin_lock. If that
>> happens, the lock stays locked and is never released.
>>
>>
>>        Core A                   Core B
>>
>>                                **LD[Lock] -> 0        // unlock's load executes here,
>>                                                       // bypassing the older load below
>> LD[Lock] -> 0
>> ST[Lock] <- 1
>>                                while LD[Lock] != 1:   // Acquire the lock
>>                                     continue;
>>
>>                                **LD[Lock] -> 0        // program-order position of the load
>>                                                       // above; it bypassed the previous load
>>                                                       // and got the previous value 0
>>                                ST[Lock] <- 1          // "Releases" the previous lock ???
>>                                                       // the next waiting core, holding
>>                                                       // ticket 2, can never get the lock.
>>
>> In my understanding, MARSS allows load-load reordering; however, the x86
>> architecture does not, according to the ISA manual. This incompatibility
>> seems to cause the issue. To solve it, I think we either need a technique
>> like the one used in the MIPS R10000 to enforce SC, or we can simply
>> disable load reordering. A rough sketch of the first option is below.
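>>
>> As an illustration only (these are placeholder names, not MARSS code):
>> snoop incoming invalidations against the load queue, and squash any
>> not-yet-committed load that has already obtained a value for the affected
>> line.
>>
>>     // Illustrative only: detect load-load ordering violations by snooping
>>     // invalidations against the load queue. LoadQueueEntry, load_queue,
>>     // same_cache_line() and squash_from() are hypothetical placeholders.
>>     #include <cstdint>
>>     #include <vector>
>>
>>     struct LoadQueueEntry {
>>         uint64_t addr;
>>         bool     value_obtained;   // load has already executed and read a value
>>         bool     committed;
>>     };
>>
>>     extern std::vector<LoadQueueEntry> load_queue;   // program order, oldest first
>>     bool same_cache_line(uint64_t a, uint64_t b);    // placeholder
>>     void squash_from(const LoadQueueEntry& e);       // placeholder
>>
>>     // Called when a coherence invalidation (a remote store becoming
>>     // visible) arrives for the line at 'line_addr'.
>>     void on_remote_invalidation(uint64_t line_addr) {
>>         for (auto& e : load_queue) {
>>             if (e.committed || !e.value_obtained)
>>                 continue;
>>             if (same_cache_line(e.addr, line_addr)) {
>>                 // An uncommitted load already read a value that is now
>>                 // stale; an older load could still return the new value,
>>                 // which would look like load-load reordering. Squash this
>>                 // load and everything younger.
>>                 squash_from(e);
>>                 break;
>>             }
>>         }
>>     }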
>>
>> Thanks,
>> Hanhwi
>>
>> 2016-12-06 5:18 GMT+09:00 Seunghee Shin <[email protected]>:
>>
>>> Dear,
>>>
>>> Hope that you are doing great. Thank you for your work on this simulator.
>>> By the way, I encountered an issue about a month ago and I am still
>>> struggling with it. My issue looks very similar to one that was addressed
>>> about three years ago:
>>>
>>> http://marss86-devel.cs.binghamton.narkive.com/48BCHVib/simulation-is-stuck-in-kernel
>>>
>>> It seems to happen less frequently with PARSEC, but it very often happens
>>> with my benchmarks (1 out of 2 simulations), even though my benchmarks only
>>> perform simple data-structure insert and delete operations. Everything
>>> works fine on a single-core configuration, but I need multiple cores;
>>> currently I am running my benchmarks on 4 cores. Would anyone know how to
>>> fix or avoid this? I would really appreciate your help. Thank you in
>>> advance.
>>>
>>> Sincerely,
>>> Seunghee Shin
>>>
>>>
>>
>
>
> --
> Sincerely,
> Seunghee Shin
>
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
