hi Mikey On 09/18/2018 01:04 AM, Michael Neuling wrote: >> On top of that, both tm_reclaim_task() and tm_recheckpoint_new_task() >> functions are not used anymore, removing them. > > What about tm_reclaim_current(). This is being used in places like signals > which I would have thought we could avoid with this series
In fact tm_reclaim_current() is the default function to reclaim. It is the function that is being called by TM_KERNEL_ENTRY, other than that, it should never be called on the sane path. >> + * If we got here with an active transaction, then, it was >> + * aborted by TM_KERNEL_ENTRY and the fix the failure case >> + * needs to be fixed, so, indepedently how we arrived here, the >> + * new TM abort case will be TM_CAUSE_RESCHED now. > > What does "fix the failure case needs to be fixed" mean? > > also s/indepedently/independently/ In fact, I rewrote this paragraph. I try to say that the "TEXASR Failure code" needs to be updated/fixed, but it became that messy and crazy paragraph. :-( > >> + */ >> + if (MSR_TM_ACTIVE(prev->thread.regs->msr)) { >> + /* >> + * If there was an IRQ during trecheckpoint, it will >> + * cause an IRQ to be replayed. This replayed IRQ can >> + * invoke SCHEDULE_USER, thus, we arrive here with a TM >> + * active transaction. > > I don't think this can happen. trecheckpoint (and treclaim) are called with > IRQs > hard off (since they change r1). There will be an IRQ check and replay later. An IRQ being replayed can cause a process to be reschedule and arrive here after a trecheckpoint. > I think something else is going on here. I think this code and comment needs > to > go but I assume it's here because you are seeing something. Right, and it was my major concern about this whole review. The tm_recheckpoint() was being called with IRQ hard off, as you said, but the IRQ is being re-enabled later, after "trecheckpoint", when it calls local_irq_restore(). Since the IRQ was re-enabled, there is a mechanism to check for IRQs that were pending and execute them (after recheckpoint - hence with MSR[TS] active). Some of these pending IRQ might cause a task reschedule, bringing us to __switch_to with MSR[TS] active. I also checked that there were cases where a pending IRQ was waiting to be replayed even before tm_recheckpoint() is called. I suspect we were reaching tm_recheckpoint soft-disabled. On the v2 patchset, I am following your suggestion, which is calling restore_tm_state() much later (at the beginning of fast_exception_return), after the _TIF_USER_WORK_MASK section, and after the IRQ replay section also. So, after this point, there is no way to rollback the exit to userspace after trecheckpoint. Therefore, if there is a recheckpoint, a rfid will always come directly. In order to do so, I need to remove _TIF_RESTORE_TM as part of _TIF_USER_WORK_MASK and handle _TIF_RESTORE_TM individually later. This seems to solve this problem, and I can remove this reclaim in the middle of __switch_to_tm(). Thank you