Hello Hanhwi Jang,

Thank you for your reply. Could I ask about your solution further? How do you check the loaded value, and how do you detect that it was changed in the meantime?
I am looking forward to hearing from you. Thank you for your advice.

Sincerely,
Seunghee Shin

On Tue, Feb 28, 2017 at 2:37 AM, hanhwi jang <[email protected]> wrote:

> Dear Seunghee Shin,
>
> What I did to fix the timing issue you mention was to change the way the
> issue queue checks whether the RC operand is ready to read. I described
> the fix in the previous post.
>
> I think the timing issue is not related to the deadlock issue. As far as
> I know, the deadlock is due to a TSO violation in MARSS; the MARSS
> pipeline seems to implement an RC model. To work around the deadlock, you
> have two choices, I think. One is to prevent load-load reordering in the
> issue stage; the other is to check, when committing a load instruction,
> whether the loaded value has been changed, and if it has, to flush the
> whole pipeline. The latter is what I used to fix the deadlock.
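>
> To make that concrete, here is a minimal sketch of the commit-time check.
> The names (LoadEntry, Core, read_memory, flush_pipeline, commit_load) are
> made up for illustration; the actual MARSS structures differ:
>
>     #include <cstdint>
>
>     struct LoadEntry {
>         uint64_t paddr;        // physical address the load read from
>         uint64_t loaded_value; // value the load returned when it executed
>         uint64_t rip;          // PC of the load, used as the refetch target
>     };
>
>     struct Core {
>         uint64_t read_memory(uint64_t paddr);      // current memory contents
>         void flush_pipeline(uint64_t refetch_rip); // squash and refetch
>
>         // At commit, re-read the load's address and compare it with the
>         // value the load actually returned. A mismatch means another core
>         // wrote the location between execute and commit, i.e. the
>         // speculative load reordering became visible, so flush the
>         // pipeline and re-execute from the load.
>         bool commit_load(const LoadEntry& lq) {
>             if (read_memory(lq.paddr) != lq.loaded_value) {
>                 flush_pipeline(lq.rip);
>                 return false; // not committed; the load will re-execute
>             }
>             return true; // value unchanged: safe to commit in program order
>         }
>     };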
>
> Thanks,
> Hanhwi
>
> 2017-02-27 20:00 GMT+09:00 Seunghee Shin <[email protected]>:
>
>> Thank you for your reply. By the way, while searching the mail archive
>> for clues, I found your old discussion about a timing issue. In
>> particular, I feel that one discussion may be related to this issue.
>> Could you explain what you were trying to fix below? I am looking
>> forward to hearing from you.
>>
>> Sincerely,
>> Seunghee Shin
>>
>>
>> 3) Stores that wait for a value to store wake up earlier than their
>> dependent uops.
>> The O3 model issues stores in two phases, address calculation and memory
>> access, so a store first issues when operands ra and rb are ready. At
>> the same time, the rc value is checked with the PhysicalRegister::ready()
>> function. If rc is ready, the store succeeds in issuing. However, ready()
>> is not meant for checking whether a value is available in the O3 timing
>> model; it checks whether a value has been calculated by PTLSim.
>>
>> I changed the line ooo-exec.cpp:562 as below:
>>
>>     // Treat rc as ready only once it has left the WAITING state in the
>>     // timing model, instead of relying on PhysicalRegister::ready().
>>     bool rcready = rc.state != PHYSREG_WAITING;
>>     completed = issuestore(*lsq, origvirt, radata, rbdata, rcdata,
>>                            rcready, pteupdate);
>>
>> This is not complete, as it does not take account of clustered
>> architectures. If you want to model a clustered architecture, which has
>> several issue queues, you need to know when the rc value arrives at the
>> issue queue where the store is waiting to issue.
>>
>>
>> On Thu, Dec 22, 2016 at 12:30 PM, hanhwi jang <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I would like to share my recent findings on this issue.
>>>
>>> In my case, I ran SPLASH on an 8-core target machine, and the benchmark
>>> got stuck in a kernel region executing the spin lock
>>> "__ticket_spin_lock".
>>>
>>> To illustrate the issue, I simplified the ticket_spin_lock and
>>> ticket_spin_unlock code:
>>>
>>> <ticket_spin_lock>
>>> {
>>>     while LD [ticket_addr] != MY_TICKET // Wait for my turn.
>>>         continue;
>>> }
>>>
>>> <ticket_spin_unlock>
>>> {
>>>     LD [ticket_addr] -> TEMP
>>>     ST [ticket_addr] <- TEMP + 1 // Set the next ticket number to wake
>>>                                  // up the next waiter.
>>> }
>>>
>>> Let's assume we have two cores, A and B, and one ticket lock (ticket
>>> value = 0). Core A currently holds the lock and core B is waiting for
>>> it to be released.
>>>
>>> In most cases, when core A releases the lock, core B can acquire and
>>> release it, as in the following scenario:
>>>
>>>     Core A                     Core B
>>>
>>>     LD[Lock] -> 0
>>>     ST[Lock] <- 1              // Release the lock
>>>                                while LD[Lock] != 1: // Acquire the lock
>>>                                    continue;        // until LD[Lock] == 1
>>>
>>>                                LD[Lock] -> 1
>>>                                ST[Lock] <- 2        // Release the lock
>>>
>>> However, sometimes the ticket_spin_unlock code resides in the ROB
>>> together with the ticket_spin_lock code. In that situation, the LD in
>>> ticket_spin_unlock can bypass the earlier LD in ticket_spin_lock. If
>>> that happens, the lock ends up locked and never released:
>>>
>>>     Core A                     Core B
>>>
>>>                                **LD[Lock] -> 0 <-----------------+
>>>     LD[Lock] -> 0                                                |
>>>     ST[Lock] <- 1                                                |
>>>                                while LD[Lock] != 1: // Acquire   |
>>>                                    continue;        // the lock  |
>>>                                                                  |
>>>                                **LD[Lock] -> 0 // This load -----+
>>>                                                // bypassed the previous
>>>                                                // load and got the old
>>>                                                // value 0.
>>>                                ST[Lock] <- 1   // "Releases" the previous
>>>                                                // lock: the next waiting
>>>                                                // core, with ticket 2,
>>>                                                // can never get the lock.
>>>
>>> In my understanding, MARSS allows load reordering; however, the x86
>>> architecture does not, according to the ISA manual. This incompatibility
>>> seems to cause the issue. To solve it, I think we need a technique like
>>> the one used in the MIPS R10000 to enforce SC, or we must simply
>>> disable load reordering.
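>>>
>>> For reference, the R10000-style technique snoops coherence invalidations
>>> against the load queue: if another core's store invalidates a cache line
>>> that an executed-but-not-yet-committed load has read, that load and
>>> everything younger than it are squashed. A rough sketch with made-up
>>> names, not the actual MARSS code:
>>>
>>>     #include <cstdint>
>>>     #include <vector>
>>>
>>>     struct LoadQueueEntry {
>>>         uint64_t line_addr; // cache-line address the load read from
>>>         bool     executed;  // has a value, but has not committed yet
>>>         uint64_t rip;       // PC to refetch from if the load is squashed
>>>     };
>>>
>>>     struct Core {
>>>         std::vector<LoadQueueEntry> load_queue;    // kept oldest-first
>>>         void flush_pipeline(uint64_t refetch_rip); // squash and refetch
>>>
>>>         // Called when a coherence invalidation for line_addr arrives.
>>>         // An executed-but-uncommitted load to that line may have read a
>>>         // value that is now stale with respect to program order, so
>>>         // squash the oldest such load and everything younger than it.
>>>         void on_invalidation(uint64_t line_addr) {
>>>             for (const LoadQueueEntry& lq : load_queue) {
>>>                 if (lq.executed && lq.line_addr == line_addr) {
>>>                     flush_pipeline(lq.rip);
>>>                     return;
>>>                 }
>>>             }
>>>         }
>>>     };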
>>>
>>> Thanks,
>>> Hanhwi
>>>
>>> 2016-12-06 5:18 GMT+09:00 Seunghee Shin <[email protected]>:
>>>
>>>> Dear all,
>>>>
>>>> I hope you are doing great. Thank you for your efforts on this
>>>> simulator. I encountered an issue about one month ago and I am still
>>>> struggling with it. My issue looks very similar to one that was
>>>> addressed about three years ago:
>>>>
>>>> http://marss86-devel.cs.binghamton.narkive.com/48BCHVib/simulation-is-stuck-in-kernel
>>>>
>>>> It seems to happen less frequently with PARSEC, but it happens very
>>>> often with my benchmarks (1 out of 2 simulations), although they only
>>>> perform simple data-structure insert and delete operations. Everything
>>>> works fine in a single-core configuration, but I need multiple cores;
>>>> currently I am running my benchmarks on 4 cores. Would anyone know how
>>>> to fix or avoid this? I would really appreciate your help. Thank you
>>>> in advance.
>>>>
>>>> Sincerely,
>>>> Seunghee Shin
>>
>> --
>> Sincerely,
>> Seunghee Shin

--
Sincerely,
Seunghee Shin

_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
