On Mon, Jan 30, 2023 at 6:36 PM Andres Freund <and...@anarazel.de> wrote:
> On 2023-01-30 15:22:34 +1300, Thomas Munro wrote:
> > On Mon, Jan 30, 2023 at 6:26 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> > > out-of-order hazard
> >
> > I've been trying to understand how that could happen, but my CPU-fu is
> > weak. Let me try to write an argument for why it can't happen, so
> > that later I can look back at how stupid and naive I was. We have A
> > B, and if the CPU sees no dependency and decides to execute B A
> > (pipelined), shouldn't an interrupt either wait for the whole
> > schemozzle to commit first (if not in a hurry), or nuke it, handle the
> > IPI and restart, or something?
>
> In a core local view, yes, I think so. But I don't think that's how it can
> work on multi-core, and even more so, multi-socket machines. Imagine how it'd
> influence latency if every interrupt on any CPU would prevent all out-of-order
> execution on any CPU.

Good. Yeah, I was talking only about a single thread/core.

> > After an hour of reviewing random
> > slides from classes on out-of-order execution and reorder buffers and
> > the like, I think the term for making sure that interrupts run with
> > the illusion of in-order execution maintained is called "precise
> > interrupts", and it is expected in all modern architectures, after the
> > early OoO pioneers lost their minds trying to program without it. I
> > guess generally you want that because it would otherwise run your
> > interrupt handler in a completely uncertain environment, and
> > specifically in this case it would reach our signal handler which
> > reads A's output (waiting) and writes to B's input (is_set), so B IPI
> > A surely shouldn't be allowed?
>
> Userspace signals aren't delivered synchronously during hardware interrupts
> afaik - and I don't think they even possibly could be (after all the process
> possibly isn't scheduled).

Yeah, they're not synchronous and the target might not even be
running. BUT if a suitable thread is running then AFAICT an IPI is
delivered to that sucker to get it running the handler ASAP, at least
on the three OSes I looked at. (See breadcrumbs below).

> I think what you're talking about with precise interrupts above is purely
> about the single-core view, and mostly about hardware interrupts for faults
> etc. The CPU will unwind state from speculatively executed code etc on
> interrupt, sure - but I think that's separate from guaranteeing that you can't
> have stale cache contents *due to work by another CPU*.

Yeah. I get the cache problem, a separate issue that does indeed look
pretty dodgy. I guess I wrote my email out-of-order: at the end I
speculated that cache coherency probably can't explain this failure at
least in THAT bit of the source, because of that funky extra
self-SetLatch(). I just got spooked by the mention of out-of-order
execution and I wanted to chase it down and straighten out my
understanding.

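To make the cross-CPU ordering question concrete, here is a minimal sketch of a two-flag handshake in C11 atomics. It is only an illustration and not PostgreSQL's latch code: SketchLatch, sketch_set() and sketch_prepare_to_sleep() are hypothetical names, and maybe_sleeping merely stands in for the "waiting" flag mentioned above. Each side stores its own flag, issues a full fence, and only then loads the other side's flag, so at least one of the two is guaranteed to observe the other's store.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-flag handshake; names are illustrative only. */
typedef struct SketchLatch
{
    atomic_bool is_set;          /* written by the setter, read by the waiter */
    atomic_bool maybe_sleeping;  /* written by the waiter, read by the setter */
} SketchLatch;

/*
 * Setter side: store is_set, full fence, then load maybe_sleeping.
 * Returns true if the caller must also send a wakeup (signal, pipe byte, ...).
 */
bool
sketch_set(SketchLatch *l)
{
    atomic_store_explicit(&l->is_set, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&l->maybe_sleeping, memory_order_relaxed);
}

/*
 * Waiter side: store maybe_sleeping, full fence, then load is_set.
 * Returns true if it is OK to block, false if the latch was already set.
 */
bool
sketch_prepare_to_sleep(SketchLatch *l)
{
    atomic_store_explicit(&l->maybe_sleeping, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&l->is_set, memory_order_relaxed))
    {
        /* Already set: withdraw the announcement and don't sleep. */
        atomic_store_explicit(&l->maybe_sleeping, false, memory_order_relaxed);
        return false;
    }
    return true;
}
```

Without the fences, both the CPU and the compiler would be free to perform the load before the store becomes visible, and then both sides could read a stale "false": the setter would skip the wakeup and the waiter would go to sleep anyway. That is a multi-core visibility problem, which precise interrupts on a single core say nothing about.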
> I'm not even sure that userspace signals are generally delivered via an
> immediate hardware interrupt, or whether they're processed at the next
> scheduler tick. After all, we know that multiple signals are coalesced, which
> certainly isn't compatible with synchronous execution. But it could be that
> that just happens when the target of a signal is not currently scheduled.

FreeBSD: By default, they are when possible, e.g. if the process is
currently running a suitable thread. You can set sysctl
kern.smp.forward_signal_enabled=0 to turn that off, and then it works
more like the way you imagined (checking for pending signals at
various arbitrary times, not sure). See tdsigwakeup() ->
forward_signal() -> ipi_cpu().

Linux: Well it certainly smells approximately similar. See
signal_wake_up_state() -> kick_process() -> smp_send_reschedule() ->
smp_cross_call() -> __ipi_send_mask(). The comment for kick_process()
explains that it's using the scheduler IPI to get signals handled
ASAP.

Darwin: ... -> cpu_signal() -> something that talks about IPIs

Coalescing is happening not only at the pending signal level (an
invention of the OS); for the inter-processor wakeups there is also
interrupt coalescing. It's latches all the way down.