I checked TSO and I agree that you are right. It is a strong consistency
model, and MarssX86 does not implement it. I also feel that your rollback
implementation for speculative loads is a reasonable way to fix the issue.
Thank you for your advice.
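
Just to make sure I follow, below is a rough sketch of how I understand the
commit-stage check you described: when a load commits, re-read the current
value at its virtual address and compare it against the value captured in
the LSQ entry, flushing the pipeline on a mismatch. The types and function
names here (other than loadvirt, which you mentioned) are only placeholders
for illustration, not the actual MARSS code.

#include <cstdint>

// Hypothetical, simplified stand-ins for the simulator structures; the
// real MARSS types and the exact loadvirt() signature differ.
struct LSQEntry {
    uint64_t virtaddr;   // virtual address of the load
    uint64_t data;       // value captured when the load issued
};

// Placeholder for MARSS's loadvirt(): returns the value currently
// visible at a virtual address.
uint64_t loadvirt(uint64_t virtaddr);

// Placeholder for squashing the pipeline starting at the committing load.
void flush_pipeline_at_commit();

// Commit-stage check: replay the load and compare against the value the
// load obtained speculatively. If another core changed the location in
// the meantime, the speculative value is stale, so squash and re-execute.
bool commit_load(const LSQEntry& lsq)
{
    uint64_t current = loadvirt(lsq.virtaddr);
    if (current != lsq.data) {
        flush_pipeline_at_commit();
        return false;   // the load and younger uops will be re-executed
    }
    return true;        // value unchanged; safe to commit
}

If that matches what you implemented, then I agree it behaves like the MIPS
R10000-style value check and, as you said, ends up stricter than TSO.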
Sincerely,
Seunghee Shin

On Tue, Feb 28, 2017 at 7:52 AM, hanhwi jang <[email protected]> wrote:

> After fixing it, I have never experienced the deadlock again. Actually, I
> think the dependency checking is correct. The problem happens when loads
> are reordered. In the TSO model, loads cannot be reordered with each
> other. However, we allow an O3 core to speculatively reorder loads for
> performance. To speculatively reorder loads, an O3 core has to flush the
> pipeline when the load reordering has been observed by other cores. I
> think the current MARSS core model does not have this.
>
> The fix I made is not to prevent other cores from updating the data until
> a speculative load finishes. The fix is to replay each speculative load
> at the commit stage to check whether other cores have updated the value.
> If the value has been updated, the core that speculatively issued the
> load flushes the pipeline.
>
> I also want to mention that the fix is stricter than TSO (actually closer
> to SC, I think). To implement TSO, we would need a write buffer model. As
> far as I know, we do not have one in the current model.
>
> I hope this explanation is helpful to you. If there is something I have
> misunderstood, let me know and I will check again.
>
> Thanks,
> Hanhwi
>
> 2017-02-28 20:35 GMT+09:00 Seunghee Shin <[email protected]>:
>
>> Ok. Now I can guess what you are trying to do, but I don't think the
>> current MarssX86 is wrong. As long as it is an out-of-order processor,
>> loads can be reordered, and data is read before the instruction is
>> committed. Furthermore, as long as the dependency check is correct,
>> reordering loads should be safe. By the way, this has nothing to do with
>> the memory consistency model.
>>
>> However, based on your finding, I feel that there is a bug in a corner
>> case (maybe in the dependency check). I think the reason the data is
>> changed at commit time is that other cores updated the data after you
>> loaded it, and you are trying to prevent that case. Am I guessing
>> correctly? In short, once you read data, you do not allow other cores to
>> update the data in memory until the load is committed. That seems to
>> enforce ordering across cores at a certain point in time. I feel that it
>> is not correct, because such ordering would be very expensive. However,
>> I think this is a good hint for fixing the bug.
>>
>> So after this fix, you have never experienced the deadlock issue again?
>> I really appreciate your advice and look forward to discussing this with
>> you further. Thank you.
>>
>> Sincerely,
>> Seunghee Shin
>>
>>
>> On Tue, Feb 28, 2017 at 5:47 AM, hanhwi jang <[email protected]>
>> wrote:
>>
>>> Well,
>>>
>>> In MARSS, loaded values are stored in the LSQ's data field, and you can
>>> read the current value at a given address with the loadvirt function.
>>> So, when a load instruction is committed, compare its LSQ value with a
>>> fresh value at its virtual address loaded by the "loadvirt" function at
>>> the commit stage.
>>>
>>> Thanks,
>>> Hanhwi
>>>
>>> 2017-02-28 16:48 GMT+09:00 Seunghee Shin <[email protected]>:
>>>
>>>> Hello Hanhwi Jang,
>>>>
>>>> Thank you for your reply. Could I ask about your solution further? How
>>>> do you check the loaded value, and how can it be changed in the
>>>> middle?
>>>>
>>>> I am looking forward to hearing from you. Thank you for your advice.
>>>>
>>>> Sincerely,
>>>> Seunghee Shin
>>>>
>>>>
>>>> On Tue, Feb 28, 2017 at 2:37 AM, hanhwi jang <[email protected]>
>>>> wrote:
>>>>
>>>>> Dear Seunghee Shin,
>>>>>
>>>>> What I did to fix the timing issue you mention was to change the way
>>>>> the RC operand is checked for readiness in the issue queue. I
>>>>> described what I fixed in the previous post.
>>>>>
>>>>> I think that timing issue is not relevant to the deadlock issue. As
>>>>> far as I know, the deadlock issue is due to a TSO violation in MARSS.
>>>>> The MARSS pipeline seems to implement an RC model. To work around the
>>>>> deadlock, you have two choices, I think. One is to prevent load-load
>>>>> reordering in the issue stage; the other is to check whether the
>>>>> loaded value has changed when committing a load instruction and, if
>>>>> it has, to flush the whole pipeline. The latter is what I used to fix
>>>>> the deadlock.
>>>>>
>>>>> Thanks,
>>>>> Hanhwi
>>>>>
>>>>> 2017-02-27 20:00 GMT+09:00 Seunghee Shin <[email protected]>:
>>>>>
>>>>>> Thank you for your reply. By the way, while searching the previous
>>>>>> mail archive for clues, I found your old discussion about a timing
>>>>>> issue. In particular, I feel that one part of that discussion may be
>>>>>> related to this issue. Could you explain what you tried to fix
>>>>>> below? I am looking forward to hearing from you.
>>>>>>
>>>>>> Sincerely,
>>>>>> Seunghee Shin
>>>>>>
>>>>>>
>>>>>> 3) Stores that wait for a value to store wake up earlier than their
>>>>>> dependent uops.
>>>>>> The O3 model issues in two phases, address calculation and memory
>>>>>> access. So, stores first issue when operands ra and rb are ready. At
>>>>>> the same time, the rc value is checked with the
>>>>>> PhysicalRegister::ready() function. If rc is ready, the store
>>>>>> succeeds in issuing. However, the ready function is not for checking
>>>>>> whether the value is available in the O3 timing model; it is for
>>>>>> checking whether the value has been calculated by PTLSim.
>>>>>>
>>>>>> I changed the line ooo-exec.cpp:562 as below:
>>>>>>
>>>>>> bool rcready = rc.state != PHYSREG_WAITING;
>>>>>> completed = issuestore(*lsq, origvirt, radata, rbdata, rcdata,
>>>>>>                        rcready, pteupdate);
>>>>>>
>>>>>> This is not complete, as it does not take account of a clustered
>>>>>> architecture. If you want to model a clustered architecture, which
>>>>>> has several issue queues, you need to know when the rc value arrives
>>>>>> in the issue queue where the store is waiting to issue.
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 22, 2016 at 12:30 PM, hanhwi jang <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I share my recent finding on this issue.
>>>>>>>
>>>>>>> In my case, I ran SPLASH on an 8-core target machine, and the
>>>>>>> benchmark got stuck in a kernel region executing the spin lock
>>>>>>> "__ticket_spin_lock".
>>>>>>>
>>>>>>> To illustrate the issue, I simplified the ticket_spin_lock and
>>>>>>> ticket_spin_unlock code.
>>>>>>>
>>>>>>> <ticket_spin_lock>
>>>>>>> {
>>>>>>>   while LD [ticket_addr] != MY_TICKET   // Wait my turn.
>>>>>>>     continue;
>>>>>>> }
>>>>>>>
>>>>>>> <ticket_spin_unlock>
>>>>>>> {
>>>>>>>   LD [ticket_addr] -> TEMP
>>>>>>>   ST [ticket_addr] <- TEMP + 1   // Set the next ticket number to
>>>>>>>                                  // wake up the next waiter.
>>>>>>> }
>>>>>>>
>>>>>>> Let's assume we have two cores, A and B, and one ticket lock (ticket
>>>>>>> value = 0). Currently core A holds the lock and B is waiting for the
>>>>>>> lock to be released.
>>>>>>>
>>>>>>> In most cases, when core A releases the lock, core B can acquire
>>>>>>> and release the lock as in the following scenario.
>>>>>>>
>>>>>>> Core A                    Core B
>>>>>>>
>>>>>>> LD[Lock] -> 0
>>>>>>> ST[Lock] <- 1
>>>>>>>                           while LD[Lock] != 1:  // Acquire the lock
>>>>>>>                             continue;           // once LD[Lock] == 1
>>>>>>>
>>>>>>>                           LD[Lock] -> 1
>>>>>>>                           ST[Lock] <- 2         // Release the lock
>>>>>>>
>>>>>>> However, sometimes ticket_spin_unlock resides in the ROB together
>>>>>>> with the ticket_spin_lock code. In that situation, the LD in
>>>>>>> ticket_spin_unlock can bypass the earlier LD in ticket_spin_lock.
>>>>>>> If that happens, the lock stays locked and is never released.
>>>>>>>
>>>>>>> Core A                    Core B
>>>>>>>
>>>>>>>                           **LD[Lock] -> 0   <-- issued early, out of
>>>>>>> LD[Lock] -> 0                                   program order
>>>>>>> ST[Lock] <- 1
>>>>>>>                           while LD[Lock] != 1:  // Acquire the lock
>>>>>>>                             continue;
>>>>>>>
>>>>>>>                           **LD[Lock] -> 0   // This load bypassed the
>>>>>>>                                             // previous load and got
>>>>>>>                                             // the old value 0.
>>>>>>>                           ST[Lock] <- 1     // "Releases" the previous
>>>>>>>                                             // lock, so the next core
>>>>>>>                                             // waiting with ticket 2
>>>>>>>                                             // can never get the lock.
>>>>>>>
>>>>>>> As I understand it, MARSS allows load reordering; however, the x86
>>>>>>> architecture does not, according to the ISA manual. This
>>>>>>> incompatibility seems to cause the issue. To solve it, I think we
>>>>>>> either need a technique like the one used in the MIPS R10000 to
>>>>>>> enforce SC, or we should simply disable load reordering.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Hanhwi
>>>>>>>
>>>>>>> 2016-12-06 5:18 GMT+09:00 Seunghee Shin <[email protected]>:
>>>>>>>
>>>>>>>> Dear,
>>>>>>>>
>>>>>>>> Hope that you are doing great. Thank you for your efforts on this
>>>>>>>> simulator. I encountered an issue about a month ago and I am still
>>>>>>>> struggling with it. My issue looks very similar to one that was
>>>>>>>> addressed about three years ago:
>>>>>>>>
>>>>>>>> http://marss86-devel.cs.binghamton.narkive.com/48BCHVib/simulation-is-stuck-in-kernel
>>>>>>>>
>>>>>>>> It seems to happen less frequently with PARSEC, but it very likely
>>>>>>>> happens with my benchmarks (1 out of 2 simulations), although my
>>>>>>>> benchmarks only perform simple data-structure insert and delete
>>>>>>>> operations. Everything works fine in a single-core configuration,
>>>>>>>> but I need multiple cores. Currently, I am running my benchmarks
>>>>>>>> on 4 cores. Would anyone know how to fix or avoid this? I would
>>>>>>>> really appreciate your help. Thank you in advance.
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Seunghee Shin
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sincerely,
>>>>>> Seunghee Shin
>>>>>
>>>>
>>>> --
>>>> Sincerely,
>>>> Seunghee Shin
>>>
>>
>> --
>> Sincerely,
>> Seunghee Shin
>

--
Sincerely,
Seunghee Shin
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
