On Fri, Feb 10, 2017 at 1:43 AM, Thomas Gleixner <t...@linutronix.de> wrote: > On Fri, 3 Feb 2017, Gabriel Beddingfield wrote: >> Hi Thomas and John, >> >> TL;DR: if an alarmtimer-backed timerfd expires just prior to >> alarmtimer_suspend() begin called, the system will continue to go into >> suspend (with possibly no future wakeups scheduled). The expected behavior >> is that the timer expiration would cause the suspend operation to abort. I >> see several ways to fix it and want to know your preference. >> >> When using an alarmtimer-backed timerfd (i.e. CLOCK_BOOTTIME_ALARM or >> CLOCK_REALTIME_ALARM) we have observed the following race condition: >> >> 1. Userspace commands the system to go into suspend (echo mem > >> /sys/power/state) >> 2. The alarmtimer for a timerfd expires, making the timer inactive until >> someone reads from the file descriptor. >> 3. alarmtimer_suspend() does not find any pending timers, and therefore >> does not schedule a wakeup. >> 4. device goes into suspend. >> >> However, if steps 2 and 3 are swapped, alarmtimer_suspend() would have seen >> that an expiration was "soon" and cause an abort of the suspend. This can >> be reproduced on an idle system by having a process aggressively doing >> `echo mem > /sys/power/state' while another process sets a 4-sec repeating >> timerfd backed by CLOCK_BOOTTIME_ALARM. Eventually the system will go to >> sleep and not wake up. >> >> I see a few ways to fix it: >> >> 1. Create a wakeup_source for each timerfd, and if it's an alarm timer call >> __pm_stay_awake() in timerfd_triggered() and __pm_relax() in timerfd_read(). >> 2. call pm_system_wakeup() in alarmtimer_fired() >> 3. call `if (isalarm(ctx)) pm_system_wakeup();' in timerfd_triggered() >> 4. call __pm_wakeup_event(ws, 2 * MSECS_PER_SEC) in alarmtimer_fired() >> 5. call `if (isalarm(ctc)) __pm_wakeup_event(ws, 2 * MSECS_PER_SEC);' in >> timerfd_triggered() (using a static struct wakeup_source). >> >> I think #1 is right, followed by #2. They all have pros/cons: >> >> * #1 Can eliminate race conditions (rather than an arbitrary 2-sec >> timeout)... but is effectively holding a hard-to-trace block on all PM >> operations (e.g. read/write of /sys/power/wakeup_count blocks until someone >> reads from the file descriptor). >> * #2 Matches the current behavior of the "happy case"... but bypasses the >> userspace policy system provided by wakeup. >> * #3 same pro/con as #2... but solution is specific to timerfd's. >> * #4 Matches the current behavior of the "happy case" if and only if >> userspace is using the 'wakeup' system, otherwise doesn't change any >> behavior. But, I wonder how many people think the current behavior is a bug. >> * #5 Same pro/con as #4... but solution is specific to timerfd's. >> >> I've attached a patch that implements #1.
Sorry for not getting back to you earlier (been traveling this week). I'm surprised this issue hasn't bitten any of the android folks, as I believe they have been making use of this for some time now. CC'ing a few just to make sure we're all on the same page. The approach you took in your patch looks basically ok to me, though I think the __pm_wakeup_event() method in #4 sounds safer, just to avoid the problematic issue if no one is waiting on the fd. Though I worry I'm not quite understanding the con for that case properly, so maybe you can clarify what concerns you there? thanks -john