On Thu, Nov 2, 2017 at 3:39 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 2 November 2017 at 04:29, Koos Zevenhoven <k7ho...@gmail.com> wrote:
>
>> SPOILER ALERT! At the moment, Nick's statement is in fact **always**
>> true in **all** cases (at least when ignoring untypical cases and some
>> inaccuracies in phrasing). Another question is whether the statement
>> **should** be true at all.
>>
>> PyErr_CheckSignals(), the function that checks for pending signals, now
>> **implicitly** uses the strictest possible memory-order requirement
>> (SEQ_CST) for checking the `is_tripped` flag, a value which can be used
>> to peek whether there are any pending signals. This means that two
>> threads that call PyErr_CheckSignals can't even **check** the value of
>> that flag at the same time, and they'll have to wait for each other and
>> for whatever the CPU hardware does for cache synchronization.
>
> Nice, that's deeper than I went - I just assumed there was an
> interlocked_increment in there somewhere, without checking whether or
> not there were any optimised code paths that managed to avoid that
> call :)
>
>> From a correctness point of view, that is absolutely great: if
>> PyErr_CheckSignals() is called, it is guaranteed to notice a new
>> signal, regardless of how few picoseconds have passed since the
>> `is_tripped` flag was set. But is that really important? Do we really
>> need things to be slowed down by this? And do we want people to
>> "optimize" code by avoiding signal checking?
>
> It isn't signal handling latency that's the concern here, it's
> potentially missing signals. Consider the case where we have 3 threads:
> A, B, C. A is the main thread that will actually handle signals; B and C
> both happened to receive them. We'll also assume the two threads receive
> *different* signals (otherwise they'll potentially get collapsed into
> one notification regardless).
>
> The scenario we want to avoid is having one or both of the signals set,
> but is_tripped cleared.
> With an interlocked check (where both 0->1 and 1->0 are memory
> barriers), that clearly can't happen. I suspect you're right that this
> could also be achieved with a less strict memory sync requirement on the
> 0->1 check, but that's much harder to reason about than simply limiting
> the number of iterations we make through an iterator consumption loop
> before checking for signals.

Missing signals is indeed a concern, and there's another concern that one
received signal might be interpreted as two separate occurrences. But
neither concern prevents us from *checking* is_tripped in a more relaxed
manner, as long as a positive 0->1 result is then *verified* using
something like SEQ_CST and the associated signal handling is synchronized
properly. That's why my PyErr_PROBE_SIGNALS below should work, and it
wouldn't prevent multiple threads from handling signals either.

>> The signals won't be caught until checked anyway, and as we've seen,
>> one solution is to use a counter to determine whether enough time has
>> passed that another check for potential signals should happen. That
>> does, however, raise the question of who should own the counter, and
>> whether it's OK for multiple threads to use the same counter. If yes,
>> then would we be able to avoid slow atomic decrements (or increments)?
>>
>> But another solution might be to provide a less strict but almost
>> equivalent functionality with much less overhead. Something like this
>> in a header file:
>>
>> static inline int PyErr_PROBE_SIGNALS() {
>>     static int volatile *flag = (int volatile *) &is_tripped;
>>     if (*flag) {
>>         return PyErr_CheckSignals();
>>     }
>>     else {
>>         return 0;
>>     }
>> }
>>
>> Here, the flag is cast to volatile to force a check to happen each time
>> from memory/cache. However, to make it more standard and to make sure
>> it works with all compilers/platforms, it might be better to use,
>> instead of *flag, an atomic load with "MEMORY_ORDER_RELAXED".
>> Another thing is that `is_tripped`, which is currently declared as
>> static inside signalmodule.c [4], should somehow be exposed in the API
>> to make it available for the inline function in headers.
>>
>> This solution might be fast enough for a lot of cases, although still
>> slightly slower than the counter approach, at least if the counter
>> approach were to completely avoid per-iteration memory access by
>> requiring each function to own the counter that is used for this.
>
> One thing I like about a nested inner loop is that it's easy to
> understand the rationale for it *without* knowing any of the details of
> how memory synchronisation works, as it's just:
>
> - we want to check for signals regularly so the latency in interrupt
>   handling isn't too high
> - PyErr_CheckSignals() is too expensive to call on every iteration
> - so we only check every N iterations

My initial "negative" reaction to that potential solution ("I don't
believe nesting loops is strictly a requirement." –K) was because
manually splitting loops into smaller inner loops sounds like preparing a
nightmare for all C programmers. And I don't think it's possible to write
a preprocessor macro that does that for you. Maybe things like Cython
could in principle do this automatically, though. Using a counter can
have a very small overhead, because everything can happen entirely in
processor registers.

> Memory synchronisation then only comes up if someone asks why
> "PyErr_CheckSignals" is expensive to call. And while it wouldn't
> surprise me at all if you're right and there are ways to make that call
> cheaper, they're still never going to be cheaper than explicitly
> reducing the frequency with which it is called.

There's a limit to how cheap the call to PyErr_CheckSignals() can be. As
I mentioned earlier, even just the fact that it's a C function call can
be too much.
That's why, in the above, I used a new name, PyErr_PROBE_SIGNALS(),
instead of optimizing the existing PyErr_CheckSignals() -- turning
PyErr_CheckSignals() into a static inline function would change the ABI.
I also didn't call it PyErr_CHECK_SIGNALS(), because the functionality is
not strictly equivalent to that of the existing function.

--Koos

--
+ Koos Zevenhoven
+ http://twitter.com/k7hoven +
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/