Hi, On Mon, Aug 5, 2024 at 10:26 AM Yu Zhao <yuz...@google.com> wrote: > > > > > + /* > > > > + * Start with a normal IPI and wait up to one second for other > > > > CPUs to > > > > + * stop. We do this first because it gives other processors a > > > > chance > > > > + * to exit critical sections / drop locks and makes the rest of > > > > the > > > > + * stop process (especially console flush) more robust. > > > > + */ > > > > + smp_cross_call(&mask, IPI_CPU_STOP); > > > > > > I realise you've moved this out of crash_smp_send_stop() and it looks > > > like we inherited the code from x86, but do you know how this serialise > > > against CPU hotplug operations? I've spent the last 20m looking at the > > > code and I can't see what prevents other CPUs from coming and going > > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'. > > > > I don't think there is anything. ...and it's not just this code > > either. > > I agree -- I noticed this a while ago [1], and I added > cpus_read_lock/unlock() on the caller side, because having the locks > inside callees can be a problem for callers who can't sleep. > > [1] https://lore.kernel.org/linux-mm/znkgps-vqbiyn...@google.com/
Sounds reasonable. I'm happy to review a patch doing that. > Also, do the handlers always see `crash_stop` set by the sender? If > not, would it be a problem? (In the above link, it has to do extra > work to make sure that handlers eventually see any updated values.) I _think_ this is OK. It's been a while since I wrote the original patch but I seem to remember thinking that I didn't need to do anything special. Tracing through the code again, I see in gic_ipi_send_mask() the comment: /* * Ensure that stores to Normal memory are visible to the * other CPUs before they observe us issuing the IPI. */ dmb(ishst); ...so I think that means we're fine, right?