Quoting Doug Anderson (2020-12-03 08:40:46) > I would guess that if "mas->cur_xfer" is NULL then > geni_spi_handle_rx() should read all data in the FIFO and throw it > away and geni_spi_handle_tx() should set SE_GENI_TX_WATERMARK_REG to > 0. NOTE: I _think_ that with the synchronize_irq() I'm suggesting > above we'll avoid this case, but it never hurts to be defensive. > > > Does that all make sense? So the summary is that instead of your patch:
Can we get a CPU diagram describing the race and scenario where this happens? Something like: CPU0 CPU1 ---- ---- setup_fifo_xfer() spin_lock_irq(&mas->lock); spin_unlock_irq(&mas->lock); mas->cur_xfer = xfer ... <IRQ> geni_spi_isr() geni_spi_handle_rx() <NULL deref boom explosion!> But obviously this example diagram is incorrect and some timeout happens instead? Sorry, I'm super lazy and don't want to read many paragraphs of text. :) I'd rather have a diagram like above that clearly points out the steps taken to the NULL pointer deref. > > 1. Add synchronize_irq() at the start and end of > handle_fifo_timeout(). Not under lock. > > 2. In geni_spi_handle_rx(), check for NULL "mas->cur_xfer". Read all > data in the FIFO (don't cap at rx_rem_bytes), but throw it away. > > 3. In geni_spi_handle_tx(), check for NULL "mas->cur_xfer". Don't > write any data. Just write 0 to SE_GENI_TX_WATERMARK_REG. > > I think #1 is the real fix, but #2 and #3 will avoid crashes in case > there's another bug somewhere. > Aren't 2 and 3 papering over some weird problem though where irqs are coming in unexpectedly?