Well,

In MARSS, loaded values are stored in the LSQ's data field, and you can
read the current value at a given address with the loadvirt function. So,
when a load instruction is committed, compare its LSQ value against a
fresh value at its virtual address, read via loadvirt at the commit stage.
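
In code, the commit-stage check could look roughly like this. This is a
sketch, not the actual MARSS source: the entry fields and the loadvirt
callback are assumptions based on the description above.

#include <cstdint>

// Hypothetical LSQ entry for a load; the fields follow the description
// above, not necessarily the exact MARSS definitions.
struct LSQEntry {
    uint64_t virtaddr;  // virtual address the load read from
    uint64_t data;      // value the load obtained when it executed
};

// At commit, re-read the address (in MARSS, via loadvirt; passed in here
// as a callback so the sketch is self-contained) and compare against the
// value the load actually used. A mismatch means another core wrote the
// location in between, so committing the load would violate TSO: flush
// the whole pipeline and re-execute from this instruction.
bool load_needs_flush(const LSQEntry& e, uint64_t (*loadvirt)(uint64_t)) {
    return loadvirt(e.virtaddr) != e.data;
}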

Thanks,
Hanhwi

2017-02-28 16:48 GMT+09:00 Seunghee Shin <[email protected]>:

> Hello Hanhwi Jang,
>
> Thank you for your reply. May I ask about your solution further? How do
> you check the loaded value, and how can it change in between?
>
> I look forward to hearing more from you. Thank you for your advice.
>
> Sincerely,
> Seunghee Shin
>
>
> On Tue, Feb 28, 2017 at 2:37 AM, hanhwi jang <[email protected]> wrote:
>
>> Dear Seunghee Shin,
>>
>> What I did to fix the timing issue you mention was to change the way the
>> issue queue checks that the RC operand is ready to read. I described the
>> fix in a previous post.
>>
>> I think that timing issue is not related to the deadlock issue. As far
>> as I know, the deadlock is due to a TSO violation in MARSS: the MARSS
>> pipeline seems to implement an RC (release consistency) model. To get
>> around the deadlock, you have two choices, I think. One is to prevent
>> load-load reordering in the issue stage; the other is to check, when
>> committing a load instruction, whether the loaded value has changed,
>> and, if it has, flush the whole pipeline. The latter is what I used to
>> fix the deadlock.
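>>
>> For the first option, a minimal sketch of the issue-stage guard might
>> look like the following (the LSQ entry type and the scan are
>> illustrative only, not MARSS code):
>>
>> #include <cstddef>
>> #include <vector>
>>
>> // Hypothetical LSQ entry; illustrative, not the MARSS structure.
>> struct LSQEntry {
>>     bool is_load   = false;
>>     bool completed = false;  // data has returned from memory
>> };
>>
>> // Hold a load back at issue while any older load is still incomplete,
>> // so loads access memory in program order (no load-load reordering).
>> bool can_issue_load(const std::vector<LSQEntry>& lsq, size_t my_index) {
>>     for (size_t i = 0; i < my_index; i++)
>>         if (lsq[i].is_load && !lsq[i].completed)
>>             return false;
>>     return true;
>> }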
>>
>> Thanks,
>> Hanhwi
>>
>> 2017-02-27 20:00 GMT+09:00 Seunghee Shin <[email protected]>:
>>
>>> Thank you for your reply. By the way, while searching the mailing-list
>>> archive for clues, I found your old discussion about a timing issue. In
>>> particular, I feel that one part of that discussion may be related to
>>> this issue. Could you explain what you tried to fix below? I am looking
>>> forward to hearing from you.
>>>
>>> Sincerely,
>>> Seunghee Shin
>>>
>>>
>>> 3) Stores that are waiting for a value to store wake up earlier than
>>> the uops they depend on.
>>> The O3 model issues in two phases: address calculation and memory
>>> access. So, a store first issues when operands ra and rb are ready. At
>>> the same time, the rc value is checked with the
>>> PhysicalRegister::ready() function. If rc is ready, the store succeeds
>>> in issuing. However, ready() is not meant to check whether the value is
>>> available in the O3 timing model; it checks whether the value has been
>>> calculated by PTLSim.
>>>
>>> I changed the line at ooo-exec.cpp:562 as below:
>>>
>>> // rc is ready for the timing model only once it has left the WAITING
>>> // state, not merely once PTLSim has computed its value.
>>> bool rcready = rc.state != PHYSREG_WAITING;
>>> completed = issuestore(*lsq, origvirt, radata, rbdata, rcdata,
>>>                        rcready, pteupdate);
>>> This is not complete, as it does not take a clustered architecture into
>>> account. If you want to model a clustered architecture, which has
>>> several issue queues, you need to know when the rc value arrives at the
>>> issue queue where the store is waiting to issue.
>>>
>>>
>>>
>>> On Thu, Dec 22, 2016 at 12:30 PM, hanhwi jang <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'd like to share my recent findings on this issue.
>>>>
>>>> In my case, I ran SPLASH on an 8-core target machine, and the benchmark
>>>> got stuck in a kernel region executing the spin lock __ticket_spin_lock.
>>>>
>>>> To illustrate the issue, I simplified the ticket_spin_lock and
>>>> ticket_spin_unlock code.
>>>>
>>>> <ticket_spin_lock>
>>>> {
>>>>     while (LD[ticket_addr] != MY_TICKET)  // Wait for my turn.
>>>>         continue;
>>>> }
>>>>
>>>> <ticket_spin_unlock>
>>>> {
>>>>     LD[ticket_addr] -> TEMP
>>>>     ST[ticket_addr] <- TEMP + 1  // Set the next ticket number to wake
>>>>                                  // up the next waiter.
>>>> }
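>>>>
>>>> For reference, the full lock also takes a ticket atomically before
>>>> spinning. A minimal C++ sketch, with illustrative names rather than
>>>> the kernel's:
>>>>
>>>> #include <atomic>
>>>>
>>>> struct TicketLock {
>>>>     std::atomic<unsigned> next{0};   // next ticket to hand out
>>>>     std::atomic<unsigned> owner{0};  // ticket currently being served
>>>>
>>>>     void lock() {
>>>>         // Atomically take a ticket, then spin until it is our turn
>>>>         // (the "while LD != MY_TICKET" loop above).
>>>>         unsigned me = next.fetch_add(1, std::memory_order_relaxed);
>>>>         while (owner.load(std::memory_order_acquire) != me)
>>>>             ;  // spin
>>>>     }
>>>>
>>>>     void unlock() {
>>>>         // The LD/ST pair above: publish the next ticket number.
>>>>         owner.store(owner.load(std::memory_order_relaxed) + 1,
>>>>                     std::memory_order_release);
>>>>     }
>>>> };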
>>>>
>>>> Let's assume we have two cores, A and B, and one ticket lock (ticket
>>>> value = 0). Currently core A holds the lock and B is waiting for the
>>>> lock to be released.
>>>>
>>>> In most cases, when core A releases the lock, core B can acquire and
>>>> release it as in the following scenario.
>>>>
>>>>        Core A                   Core B
>>>>
>>>> LD[Lock] -> 0
>>>> ST[Lock] <- 1
>>>>                                while LD[Lock] != 1:  // Acquire the lock
>>>>                                     continue;
>>>>                                LD[Lock] == 1         // sees 1, exits the loop
>>>>
>>>>                                LD[Lock] -> 1
>>>>                                ST[Lock] <- 2         // Release the lock
>>>>
>>>> However, the ticket_spin_unlock code sometimes resides in the ROB
>>>> together with the ticket_spin_lock code. In that situation, the LD in
>>>> ticket_spin_unlock can bypass the earlier LD in ticket_spin_lock. If
>>>> that happens, the lock stays locked and is never released.
>>>>
>>>>
>>>>       Core A                    Core B
>>>>
>>>>                                **LD[Lock] -> 0   <--+  (executed early)
>>>> LD[Lock] -> 0                                       |
>>>> ST[Lock] <- 1                                       |
>>>>                                while LD[Lock] != 1: |  // Acquire the lock
>>>>                                     continue;       |
>>>>                                                     |
>>>>                                **LD[Lock] -> 0   ---+  // This load bypassed the
>>>>                                                        // previous load and got
>>>>                                                        // the previous value 0.
>>>>                                ST[Lock] <- 1          // "Release" the previous
>>>>                                                       // lock??? The next waiting
>>>>                                                       // core with ticket 2 can
>>>>                                                       // never get the lock.
>>>>
>>>> In my understanding, MARSS allows load-load reordering; however, the
>>>> x86 architecture does not, according to the ISA manual. This
>>>> incompatibility seems to cause the issue. To solve it, I think we
>>>> either need a technique like the one used in the MIPS R10000 to
>>>> enforce SC, or we should simply disable load reordering.
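>>>>
>>>> A minimal sketch of an R10000-style check, assuming a simple
>>>> load-queue structure (the names are illustrative, not MARSS code): on
>>>> each external invalidation, match the line against
>>>> completed-but-uncommitted loads and squash on a hit.
>>>>
>>>> #include <cstdint>
>>>> #include <vector>
>>>>
>>>> // Hypothetical load-queue entry; not an actual MARSS structure.
>>>> struct LQEntry {
>>>>     uint64_t paddr;      // physical address of the load
>>>>     bool     completed;  // value already forwarded to dependents
>>>>     bool     committed;  // retired from the ROB
>>>> };
>>>>
>>>> // Called when a coherence invalidation for line_addr arrives. Any
>>>> // completed-but-uncommitted load to that line may have read a value
>>>> // that is now stale, so the pipeline must be flushed from that load.
>>>> bool must_replay(const std::vector<LQEntry>& lq,
>>>>                  uint64_t line_addr, uint64_t line_mask) {
>>>>     for (const auto& e : lq)
>>>>         if (e.completed && !e.committed &&
>>>>             (e.paddr & line_mask) == (line_addr & line_mask))
>>>>             return true;  // squash and re-execute from this load
>>>>     return false;
>>>> }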
>>>>
>>>> Thanks,
>>>> Hanhwi
>>>>
>>>> 2016-12-06 5:18 GMT+09:00 Seunghee Shin <[email protected]>:
>>>>
>>>>> Dear,
>>>>>
>>>>> I hope you are doing great. Thank you for your efforts on this
>>>>> simulator. I encountered an issue about a month ago and am still
>>>>> struggling with it. It looks very similar to one that was addressed
>>>>> about three years ago:
>>>>>
>>>>> http://marss86-devel.cs.binghamton.narkive.com/48BCHVib/simulation-is-stuck-in-kernel
>>>>>
>>>>> It seems to happen less frequently with PARSEC, but it happens very
>>>>> often with my benchmarks (in 1 out of 2 simulations), even though they
>>>>> only perform simple data-structure insert and delete operations.
>>>>> Everything works fine in a single-core configuration, but I need
>>>>> multiple cores; currently I am running my benchmarks on 4 cores. Would
>>>>> anyone know how to fix or avoid this? I would really appreciate your
>>>>> help. Thank you in advance.
>>>>>
>>>>> Sincerely,
>>>>> Seunghee Shin
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Sincerely,
>>> Seunghee Shin
>>>
>>
>>
>
>
> --
> Sincerely,
> Seunghee Shin
>
