I checked TSO and I agree that you are right. It is a strong consistency
model, and MarssX86 does not implement it. I also feel that your rollback
implementation for speculative loads is a reasonable way to fix the issue.
Thank you for your advice.
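
Just to make sure I follow, below is a rough sketch of how I understand the
commit-stage check you described: when a load commits, re-read the current
value at its virtual address and compare it against the value captured in
the LSQ entry, flushing the pipeline on a mismatch. The types and function
names here (other than loadvirt, which you mentioned) are only placeholders
for illustration, not the actual MARSS code.

#include <cstdint>

// Hypothetical, simplified stand-ins for the simulator structures; the
// real MARSS types and the exact loadvirt() signature differ.
struct LSQEntry {
    uint64_t virtaddr;   // virtual address of the load
    uint64_t data;       // value captured when the load issued
};

// Placeholder for MARSS's loadvirt(): returns the value currently
// visible at a virtual address.
uint64_t loadvirt(uint64_t virtaddr);

// Placeholder for squashing the pipeline starting at the committing load.
void flush_pipeline_at_commit();

// Commit-stage check: replay the load and compare against the value the
// load obtained speculatively. If another core changed the location in
// the meantime, the speculative value is stale, so squash and re-execute.
bool commit_load(const LSQEntry& lsq)
{
    uint64_t current = loadvirt(lsq.virtaddr);
    if (current != lsq.data) {
        flush_pipeline_at_commit();
        return false;   // the load and younger uops will be re-executed
    }
    return true;        // value unchanged; safe to commit
}

If that matches what you implemented, then I agree it behaves like the MIPS
R10000-style value check and, as you said, ends up stricter than TSO.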
Sincerely,
Seunghee Shin

On Tue, Feb 28, 2017 at 7:52 AM, hanhwi jang <[email protected]> wrote:

> After fixing it, I have never experienced the deadlock again. Actually, I
> think the dependency checking is correct. The problem happens when loads
> are reordered. In the TSO model, loads cannot be reordered with each
> other. However, we allow an O3 core to speculatively reorder loads for
> performance. To speculatively reorder loads, an O3 core has to flush the
> pipeline when the load reordering has been observed by other cores. I
> think the current MARSS core model does not have this.
>
> The fix I made is not to prevent other cores from updating the data until
> a speculative load finishes. The fix is to replay each speculative load
> at the commit stage to check whether other cores have updated the value.
> If the value has been updated, the core that speculatively issued the
> load flushes the pipeline.
>
> I also want to mention that the fix is stricter than TSO (actually closer
> to SC, I think). To implement TSO, we would need a write buffer model. As
> far as I know, we do not have one in the current model.
>
> I hope this explanation is helpful to you. If there is something I have
> misunderstood, let me know and I will check again.
>
> Thanks,
> Hanhwi
>
> 2017-02-28 20:35 GMT+09:00 Seunghee Shin <[email protected]>:
>
>> Ok. Now I can guess what you are trying to do, but I don't think the
>> current MarssX86 is wrong. As long as it is an out-of-order processor,
>> loads can be reordered, and data is read before the instruction is
>> committed. Furthermore, as long as the dependency check is correct,
>> reordering loads should be safe. By the way, this has nothing to do with
>> the memory consistency model.
>>
>> However, based on your finding, I feel that there is a bug in a corner
>> case (maybe in the dependency check). I think the reason the data is
>> changed at commit time is that other cores updated the data after you
>> loaded it, and you are trying to prevent that case. Am I guessing
>> correctly? In short, once you read data, you do not allow other cores to
>> update the data in memory until the load is committed. That seems to
>> enforce ordering across cores at a certain point in time. I feel that it
>> is not correct, because such ordering would be very expensive. However,
>> I think this is a good hint for fixing the bug.
>>
>> So after this fix, you have never experienced the deadlock issue again?
>> I really appreciate your advice and look forward to discussing this with
>> you further. Thank you.
>>
>> Sincerely,
>> Seunghee Shin
>>
>>
>> On Tue, Feb 28, 2017 at 5:47 AM, hanhwi jang <[email protected]>
>> wrote:
>>
>>> Well,
>>>
>>> In MARSS, loaded values are stored in the LSQ's data field, and you can
>>> read the current value at a given address with the loadvirt function.
>>> So, when a load instruction is committed, compare its LSQ value with a
>>> fresh value at its virtual address loaded by the "loadvirt" function at
>>> the commit stage.
>>>
>>> Thanks,
>>> Hanhwi
>>>
>>> 2017-02-28 16:48 GMT+09:00 Seunghee Shin <[email protected]>:
>>>
>>>> Hello Hanhwi Jang,
>>>>
>>>> Thank you for your reply. Could I ask about your solution further? How
>>>> do you check the loaded value, and how can it be changed in the
>>>> middle?
>>>>
>>>> I am looking forward to hearing from you. Thank you for your advice.
>>>>
>>>> Sincerely,
>>>> Seunghee Shin
>>>>
>>>>
>>>> On Tue, Feb 28, 2017 at 2:37 AM, hanhwi jang <[email protected]>
>>>> wrote:
>>>>
>>>>> Dear Seunghee Shin,
>>>>>
>>>>> What I did to fix the timing issue you mention was to change the way
>>>>> the RC operand is checked for readiness in the issue queue. I
>>>>> described what I fixed in the previous post.
>>>>>
>>>>> I think that timing issue is not relevant to the deadlock issue. As
>>>>> far as I know, the deadlock issue is due to a TSO violation in MARSS.
>>>>> The MARSS pipeline seems to implement an RC model. To work around the
>>>>> deadlock, you have two choices, I think. One is to prevent load-load
>>>>> reordering in the issue stage; the other is to check whether the
>>>>> loaded value has changed when committing a load instruction and, if
>>>>> it has, to flush the whole pipeline. The latter is what I used to fix
>>>>> the deadlock.
>>>>>
>>>>> Thanks,
>>>>> Hanhwi
>>>>>
>>>>> 2017-02-27 20:00 GMT+09:00 Seunghee Shin <[email protected]>:
>>>>>
>>>>>> Thank you for your reply. By the way, while searching the previous
>>>>>> mail archive for clues, I found your old discussion about a timing
>>>>>> issue. In particular, I feel that one part of that discussion may be
>>>>>> related to this issue. Could you explain what you tried to fix
>>>>>> below? I am looking forward to hearing from you.
>>>>>>
>>>>>> Sincerely,
>>>>>> Seunghee Shin
>>>>>>
>>>>>>
>>>>>> 3) Stores that wait for a value to store wake up earlier than their
>>>>>> dependent uops.
>>>>>> The O3 model issues in two phases, address calculation and memory
>>>>>> access. So, stores first issue when operands ra and rb are ready. At
>>>>>> the same time, the rc value is checked with the
>>>>>> PhysicalRegister::ready() function. If rc is ready, the store
>>>>>> succeeds in issuing. However, the ready function is not for checking
>>>>>> whether the value is available in the O3 timing model; it is for
>>>>>> checking whether the value has been calculated by PTLSim.
>>>>>>
>>>>>> I changed the line ooo-exec.cpp:562 as below:
>>>>>>
>>>>>> bool rcready = rc.state != PHYSREG_WAITING;
>>>>>> completed = issuestore(*lsq, origvirt, radata, rbdata, rcdata,
>>>>>>                        rcready, pteupdate);
>>>>>>
>>>>>> This is not complete, as it does not take account of a clustered
>>>>>> architecture. If you want to model a clustered architecture, which
>>>>>> has several issue queues, you need to know when the rc value arrives
>>>>>> in the issue queue where the store is waiting to issue.
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 22, 2016 at 12:30 PM, hanhwi jang <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I share my recent finding on this issue.
>>>>>>>
>>>>>>> In my case, I ran SPLASH on an 8-core target machine, and the
>>>>>>> benchmark got stuck in a kernel region executing the spin lock
>>>>>>> "__ticket_spin_lock".
>>>>>>>
>>>>>>> To illustrate the issue, I simplified the ticket_spin_lock and
>>>>>>> ticket_spin_unlock code.
>>>>>>>
>>>>>>> <ticket_spin_lock>
>>>>>>> {
>>>>>>>   while LD [ticket_addr] != MY_TICKET   // Wait my turn.
>>>>>>>     continue;
>>>>>>> }
>>>>>>>
>>>>>>> <ticket_spin_unlock>
>>>>>>> {
>>>>>>>   LD [ticket_addr] -> TEMP
>>>>>>>   ST [ticket_addr] <- TEMP + 1   // Set the next ticket number to
>>>>>>>                                  // wake up the next waiter.
>>>>>>> }
>>>>>>>
>>>>>>> Let's assume we have two cores, A and B, and one ticket lock (ticket
>>>>>>> value = 0). Currently core A holds the lock and B is waiting for the
>>>>>>> lock to be released.
>>>>>>>
>>>>>>> In most cases, when core A releases the lock, core B can acquire
>>>>>>> and release the lock as in the following scenario.
>>>>>>>
>>>>>>> Core A                    Core B
>>>>>>>
>>>>>>> LD[Lock] -> 0
>>>>>>> ST[Lock] <- 1
>>>>>>>                           while LD[Lock] != 1:  // Acquire the lock
>>>>>>>                             continue;           // once LD[Lock] == 1
>>>>>>>
>>>>>>>                           LD[Lock] -> 1
>>>>>>>                           ST[Lock] <- 2         // Release the lock
>>>>>>>
>>>>>>> However, sometimes ticket_spin_unlock resides in the ROB together
>>>>>>> with the ticket_spin_lock code. In that situation, the LD in
>>>>>>> ticket_spin_unlock can bypass the earlier LD in ticket_spin_lock.
>>>>>>> If that happens, the lock stays locked and is never released.
>>>>>>>
>>>>>>> Core A                    Core B
>>>>>>>
>>>>>>>                           **LD[Lock] -> 0   <-- issued early, out of
>>>>>>> LD[Lock] -> 0                                   program order
>>>>>>> ST[Lock] <- 1
>>>>>>>                           while LD[Lock] != 1:  // Acquire the lock
>>>>>>>                             continue;
>>>>>>>
>>>>>>>                           **LD[Lock] -> 0   // This load bypassed the
>>>>>>>                                             // previous load and got
>>>>>>>                                             // the old value 0.
>>>>>>>                           ST[Lock] <- 1     // "Releases" the previous
>>>>>>>                                             // lock, so the next core
>>>>>>>                                             // waiting with ticket 2
>>>>>>>                                             // can never get the lock.
>>>>>>>
>>>>>>> As I understand it, MARSS allows load reordering; however, the x86
>>>>>>> architecture does not, according to the ISA manual. This
>>>>>>> incompatibility seems to cause the issue. To solve it, I think we
>>>>>>> either need a technique like the one used in the MIPS R10000 to
>>>>>>> enforce SC, or we should simply disable load reordering.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Hanhwi
>>>>>>>
>>>>>>> 2016-12-06 5:18 GMT+09:00 Seunghee Shin <[email protected]>:
>>>>>>>
>>>>>>>> Dear,
>>>>>>>>
>>>>>>>> Hope that you are doing great. Thank you for your efforts on this
>>>>>>>> simulator. I encountered an issue about a month ago and I am still
>>>>>>>> struggling with it. My issue looks very similar to one that was
>>>>>>>> addressed about three years ago:
>>>>>>>>
>>>>>>>> http://marss86-devel.cs.binghamton.narkive.com/48BCHVib/simulation-is-stuck-in-kernel
>>>>>>>>
>>>>>>>> It seems to happen less frequently with PARSEC, but it very likely
>>>>>>>> happens with my benchmarks (1 out of 2 simulations), although my
>>>>>>>> benchmarks only perform simple data-structure insert and delete
>>>>>>>> operations. Everything works fine in a single-core configuration,
>>>>>>>> but I need multiple cores. Currently, I am running my benchmarks
>>>>>>>> on 4 cores. Would anyone know how to fix or avoid this? I would
>>>>>>>> really appreciate your help. Thank you in advance.
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Seunghee Shin
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sincerely,
>>>>>> Seunghee Shin
>>>>>
>>>>
>>>> --
>>>> Sincerely,
>>>> Seunghee Shin
>>>
>>
>> --
>> Sincerely,
>> Seunghee Shin
>

--
Sincerely,
Seunghee Shin
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
