Quoting Doug Anderson (2020-12-03 08:40:46)

> I would guess that if "mas->cur_xfer" is NULL then
> geni_spi_handle_rx() should read all data in the FIFO and throw it
> away and geni_spi_handle_tx() should set SE_GENI_TX_WATERMARK_REG to
> 0.  NOTE: I _think_ that with the synchronize_irq() I'm suggesting
> above we'll avoid this case, but it never hurts to be defensive.
> 
> 
> Does that all make sense?  So the summary is that instead of your patch:

Can we get a CPU diagram describing the race and scenario where this
happens? Something like:

  CPU0                                CPU1
  ----                                ----
  setup_fifo_xfer()
   spin_lock_irq(&mas->lock);
   spin_unlock_irq(&mas->lock);
   mas->cur_xfer = xfer
   ...
   <IRQ>
                                      geni_spi_isr()
                                       geni_spi_handle_rx()
                                        <NULL deref boom explosion!>

But obviously this example diagram is incorrect and some timeout happens
instead? Sorry, I'm super lazy and don't want to read many paragraphs of
text. :) I'd rather have a diagram like above that clearly points out
the steps taken to the NULL pointer deref.

> 
> 1. Add synchronize_irq() at the start and end of
> handle_fifo_timeout().  Not under lock.
> 
> 2. In geni_spi_handle_rx(), check for NULL "mas->cur_xfer".  Read all
> data in the FIFO (don't cap at rx_rem_bytes), but throw it away.
> 
> 3. In geni_spi_handle_tx(), check for NULL "mas->cur_xfer".  Don't
> write any data.  Just write 0 to SE_GENI_TX_WATERMARK_REG.
> 
> I think #1 is the real fix, but #2 and #3 will avoid crashes in case
> there's another bug somewhere.
> 

Aren't 2 and 3 papering over some weird problem though where irqs are
coming in unexpectedly?

Reply via email to