After applying the fix, I have never experienced the deadlock again. Actually, I think the dependency checking is correct. The problem happens when loads are reordered. Under the TSO model, loads cannot be reordered with respect to other loads. However, we allow an O3 core to speculatively reorder loads for performance. To do that safely, an O3 core has to flush the pipeline whenever the load reordering could have been observed by another core. I think the current MARSS core model does not have this check.
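To make the constraint concrete, here is a small litmus test in C++ (an illustration only, not MARSS code; the relaxed atomics stand in for the plain x86 loads and stores the simulated core executes, and on x86 hardware those stay in program order):

    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1, r2;

    void writer() {
        // TSO preserves store-store order: seeing y == 1 implies x == 1.
        x.store(1, std::memory_order_relaxed);
        y.store(1, std::memory_order_relaxed);
    }

    void reader() {
        // TSO also preserves load-load order, so r1 == 1 must imply
        // r2 == 1. A core that lets the second load execute early and
        // never re-checks it can observe r1 == 1 && r2 == 0, which is
        // the kind of violation that breaks the ticket lock discussed
        // further down in this thread.
        r1 = y.load(std::memory_order_relaxed);
        r2 = x.load(std::memory_order_relaxed);
    }

    int main() {
        std::thread t1(writer), t2(reader);
        t1.join(); t2.join();
        std::printf("r1=%d r2=%d\n", r1, r2); // r1=1 r2=0 forbidden under TSO
        return 0;
    }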
The fix I made does not prevent other cores from updating the data until a speculative load finishes. Instead, it replays each speculative load at the commit stage to check whether another core has updated the value. If the value has changed, the core that speculatively issued the load flushes its pipeline. I also want to mention that this fix is stricter than TSO (actually closer to SC, I think). To implement TSO precisely we would need a write-buffer model, and as far as I know the current model does not have one.

I hope this explanation is helpful to you. If there is something I have misunderstood, let me know and I will check again.

Thanks,
Hanhwi

2017-02-28 20:35 GMT+09:00 Seunghee Shin <[email protected]>:

> Ok. Now I can guess what you are trying to do, but I don't think the
> current MarssX86 is wrong. As long as it is an out-of-order processor,
> loads can be reordered, and data is read before the instruction is
> committed. Furthermore, as long as the dependency check is correct,
> reordering loads must be safe. By the way, this by itself has nothing
> to do with the memory consistency model.
>
> However, based on your finding, I suspect there is a bug in a corner
> case (maybe in the dependency check). I think the reason the data has
> changed at commit time is that other cores updated it after you loaded
> it, and you are trying to prevent such cases. Am I guessing correctly?
> In short, once you read data, you do not allow other cores to update
> it in memory until the load is committed. That would enforce an
> ordering across cores at certain points in time, which I feel is not
> correct, because such ordering would be very expensive. However, I
> think this is a good hint for fixing the bug.
>
> So after this fix, you have never experienced the deadlock issue
> again? I really appreciate your advice and look forward to discussing
> this with you further. Thank you.
>
> Sincerely,
> Seunghee Shin
>
>
> On Tue, Feb 28, 2017 at 5:47 AM, hanhwi jang <[email protected]> wrote:
>
>> Well,
>>
>> In MARSS, loaded values are stored in the LSQ's data field, and you
>> can read the current value at a given address with the loadvirt
>> function. So, when a load instruction is committed, compare its LSQ
>> value with a fresh value loaded from its virtual address by the
>> "loadvirt" function at the commit stage.
>>
>> Thanks,
>> Hanhwi
>>
>> 2017-02-28 16:48 GMT+09:00 Seunghee Shin <[email protected]>:
>>
>>> Hello Hanhwi Jang,
>>>
>>> Thank you for your reply. Could I ask about your solution further?
>>> How do you check the loaded value, and how can it change in the
>>> middle?
>>>
>>> I am looking forward to hearing from you. Thank you for your advice.
>>>
>>> Sincerely,
>>> Seunghee Shin
>>>
>>>
>>> On Tue, Feb 28, 2017 at 2:37 AM, hanhwi jang <[email protected]>
>>> wrote:
>>>
>>>> Dear Seunghee Shin,
>>>>
>>>> What I did to fix the timing issue you mention was to change the
>>>> way the issue queue checks whether the RC operand is ready to read.
>>>> I described that fix in the previous post.
>>>>
>>>> I think the timing issue is not related to the deadlock issue. As
>>>> far as I know, the deadlock is due to a TSO violation in MARSS; the
>>>> MARSS pipeline seems to implement an RC (release consistency)
>>>> model. To resolve the deadlock, you have two choices, I think. One
>>>> is to prevent load-load reordering at the issue stage; the other is
>>>> to check, when a load instruction commits, whether the loaded value
>>>> has changed, and to flush the whole pipeline if it has. The latter
>>>> is what I used to fix the deadlock.
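>>>>
>>>> Roughly, that commit-time check could look like the sketch below.
>>>> Only loadvirt is an actual MARSS function (its signature is
>>>> simplified here); the struct and the other names are illustrative
>>>> stand-ins, not the real MARSS code:
>>>>
>>>>     #include <cstdint>
>>>>
>>>>     // Illustrative stand-ins for MARSS internals.
>>>>     struct LSQEntry {
>>>>         uint64_t virtaddr; // virtual address the load accessed
>>>>         uint64_t data;     // value obtained when the load issued
>>>>     };
>>>>     uint64_t loadvirt(uint64_t virtaddr); // re-read current value
>>>>     void flush_pipeline();                // squash and refetch
>>>>
>>>>     // Returns true if the load may commit; flushes otherwise.
>>>>     bool commit_speculative_load(const LSQEntry& lsq) {
>>>>         uint64_t fresh = loadvirt(lsq.virtaddr); // replay at commit
>>>>         if (fresh != lsq.data) {
>>>>             // Another core wrote this address between issue and
>>>>             // commit, so the speculative load order may have been
>>>>             // observed: flush and re-execute from this load.
>>>>             flush_pipeline();
>>>>             return false;
>>>>         }
>>>>         return true; // value unchanged; safe to commit
>>>>     }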
>>>>
>>>> Thanks,
>>>> Hanhwi
>>>>
>>>> 2017-02-27 20:00 GMT+09:00 Seunghee Shin <[email protected]>:
>>>>
>>>>> Thank you for your reply. By the way, I searched the previous mail
>>>>> archive for clues and found your old discussion about a timing
>>>>> issue. In particular, I feel one part of that discussion may be
>>>>> related to this issue. Could you explain what you tried to fix
>>>>> below? I am looking forward to hearing from you.
>>>>>
>>>>> Sincerely,
>>>>> Seunghee Shin
>>>>>
>>>>>
>>>>> 3) Stores that are waiting for a value to store wake up earlier
>>>>> than their dependent uops.
>>>>> The O3 model issues in two phases: address calculation and memory
>>>>> access. So a store first issues when its operands ra and rb are
>>>>> ready. At the same time, the rc value is checked with the
>>>>> PhysicalRegister::ready() function; if rc is ready, the store
>>>>> succeeds in issuing. However, ready() is not meant to check
>>>>> whether the value is available in the O3 timing model; it only
>>>>> checks whether the value has been computed (functionally) by
>>>>> PTLSim.
>>>>>
>>>>> I changed the line ooo-exec.cpp:562 as below:
>>>>>
>>>>>     // Treat rc as ready only once it has left the WAITING state
>>>>>     // in the timing model, not merely once PTLSim computed it.
>>>>>     bool rcready = rc.state != PHYSREG_WAITING;
>>>>>     completed = issuestore(*lsq, origvirt, radata, rbdata, rcdata,
>>>>>                            rcready, pteupdate);
>>>>>
>>>>> This is not complete, as it does not take clustered architectures
>>>>> into account. If you want to model a clustered architecture, which
>>>>> has several issue queues, you need to know when the rc value
>>>>> arrives at the issue queue where the store is waiting to issue.
>>>>>
>>>>>
>>>>> On Thu, Dec 22, 2016 at 12:30 PM, hanhwi jang <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I would like to share my recent findings on this issue.
>>>>>>
>>>>>> In my case, I ran SPLASH on an 8-core target machine, and the
>>>>>> benchmark got stuck in a kernel region executing the spin lock
>>>>>> "__ticket_spin_lock".
>>>>>>
>>>>>> To illustrate the issue, I simplified the ticket_spin_lock and
>>>>>> ticket_spin_unlock code:
>>>>>>
>>>>>> <ticket_spin_lock>
>>>>>> {
>>>>>>     while LD [ticket_addr] != MY_TICKET  // Wait my turn.
>>>>>>         continue;
>>>>>> }
>>>>>>
>>>>>> <ticket_spin_unlock>
>>>>>> {
>>>>>>     LD [ticket_addr] -> TEMP
>>>>>>     ST [ticket_addr] <- TEMP + 1  // Set the next ticket number
>>>>>>                                   // to wake up the next waiter.
>>>>>> }
>>>>>>
>>>>>> Let's assume we have two cores, A and B, and one ticket lock
>>>>>> (ticket value = 0). Currently core A holds the lock and B is
>>>>>> waiting for it to be released.
>>>>>>
>>>>>> In most cases, when core A releases the lock, core B can acquire
>>>>>> and release it as in the following scenario:
>>>>>>
>>>>>> Core A                 Core B
>>>>>>
>>>>>> LD[Lock] -> 0
>>>>>> ST[Lock] <- 1
>>>>>>                        while LD[Lock] != 1: // Acquire the lock
>>>>>>                            continue;        // when LD[Lock] == 1
>>>>>>                        LD[Lock] -> 1
>>>>>>                        ST[Lock] <- 2        // Release the lock
>>>>>>
>>>>>> However, sometimes ticket_spin_unlock resides in the ROB together
>>>>>> with the ticket_spin_lock code. In that situation, the LD in
>>>>>> ticket_spin_unlock can bypass the earlier LD in ticket_spin_lock.
>>>>>> If that happens, the lock stays locked and is never released:
>>>>>>
>>>>>> Core A                 Core B
>>>>>>
>>>>>>                        **LD[Lock] -> 0  <------------------+
>>>>>> LD[Lock] -> 0                                              |
>>>>>> ST[Lock] <- 1                                              |
>>>>>>                        while LD[Lock] != 1: // Acquire     |
>>>>>>                            continue;        // the lock    |
>>>>>>                                                            |
>>>>>>                        **LD[Lock] -> 0 // This load bypassed the
>>>>>>                                        // previous load and got
>>>>>>                                        // the stale value 0.
>>>>>>                        ST[Lock] <- 1   // Release the previous
>>>>>>                                        // lock ??? The next
>>>>>>                                        // waiting core, with
>>>>>>                                        // ticket 2, can never get
>>>>>>                                        // the lock.
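>>>>>>
>>>>>> In C++-like form, the simplified lock above would be roughly the
>>>>>> following (an illustration only, not the actual kernel code):
>>>>>>
>>>>>>     #include <atomic>
>>>>>>
>>>>>>     std::atomic<unsigned> ticket{0}; // number currently served
>>>>>>
>>>>>>     void ticket_spin_lock(unsigned my_ticket) {
>>>>>>         // Wait my turn: while LD [ticket_addr] != MY_TICKET.
>>>>>>         while (ticket.load(std::memory_order_acquire) != my_ticket)
>>>>>>             ; // spin
>>>>>>     }
>>>>>>
>>>>>>     void ticket_spin_unlock() {
>>>>>>         // LD [ticket_addr] -> TEMP; ST [ticket_addr] <- TEMP + 1.
>>>>>>         unsigned temp = ticket.load(std::memory_order_relaxed);
>>>>>>         ticket.store(temp + 1, std::memory_order_release);
>>>>>>         // If the core executes this load before the spin loop's
>>>>>>         // final load (the ** reordering above), which TSO
>>>>>>         // forbids, temp is stale and the store publishes an old
>>>>>>         // ticket number, so later waiters hang forever.
>>>>>>     }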
>>>>>>
>>>>>> In my understanding, MARSS allows load reordering; however, the
>>>>>> x86 architecture does not, according to the ISA manual. This
>>>>>> incompatibility seems to cause the issue. To solve it, I think we
>>>>>> either need a technique like the one used in the MIPS R10000 to
>>>>>> enforce ordering, or we should simply disable load reordering.
>>>>>>
>>>>>> Thanks,
>>>>>> Hanhwi
>>>>>>
>>>>>> 2016-12-06 5:18 GMT+09:00 Seunghee Shin <[email protected]>:
>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I hope you are doing great, and thank you for your efforts on
>>>>>>> this simulator. I encountered an issue about a month ago and am
>>>>>>> still struggling with it. My issue looks very similar to one
>>>>>>> that was addressed about three years ago:
>>>>>>>
>>>>>>> http://marss86-devel.cs.binghamton.narkive.com/48BCHVib/simulation-is-stuck-in-kernel
>>>>>>>
>>>>>>> It seems to happen less frequently with PARSEC, but it happens
>>>>>>> very often with my benchmarks (1 out of 2 simulations), even
>>>>>>> though they only perform simple data-structure insert and delete
>>>>>>> operations. Everything works fine in a single-core
>>>>>>> configuration, but I need multiple cores; currently I am running
>>>>>>> my benchmarks on 4 cores. Would anyone know how to fix or avoid
>>>>>>> this? I would really appreciate your help. Thank you in advance.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Seunghee Shin
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Sincerely,
>>>>> Seunghee Shin
>>>>
>>>
>>> --
>>> Sincerely,
>>> Seunghee Shin
>>
>
> --
> Sincerely,
> Seunghee Shin
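P.S. The R10000-style technique mentioned in the quoted thread would
work roughly as sketched below: instead of replaying loads at commit,
the core snoops coherence invalidations against its load queue and
squashes any completed-but-uncommitted load that matches. All names
here are illustrative, not MARSS or R10000 code:

    #include <cstdint>
    #include <vector>

    struct LoadQueueEntry {
        uint64_t physaddr;     // cache-line address the load read
        bool     completed;    // executed but not yet committed
        bool     needs_squash; // set when a conflicting write arrives
    };

    std::vector<LoadQueueEntry> load_queue; // program order, oldest first

    // Called when a coherence invalidation (another core's write to
    // line_addr) reaches this core.
    void on_invalidation(uint64_t line_addr) {
        for (auto& e : load_queue) {
            // A completed but uncommitted load to the invalidated line
            // may have read a value that is no longer the latest one,
            // so it must be squashed (pipeline flushed from that load)
            // instead of being allowed to commit.
            if (e.completed && e.physaddr == line_addr)
                e.needs_squash = true;
        }
    }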
