> >>> I think the bigger issues as you've pointed out are the cost of
> >>> the additional spin lock and should the additional state be
> >>> stored in-band (fewer cache lines) or out-of band (less risk of
> >>> breaking due to unpredictable application behavior).
> >>
> >> We don't need the spinlock if clearing the shadow byte after
> >> setting the status to user.
> >>
> >> Worst case, user will set it back to kernel while the shadow
> >> byte is not cleared yet and the next producer will drop a packet.
> >> But next producers will make progress, so there is no deadlock
> >> or corruption.
> >
> > I thought so too for a while but after spending more time than I
> > care to admit I relized the following sequence was occuring:
> >
> >    Core A                       Core B
> >    ------                       ------
> >    - Enter spin_lock
> >    -   Get tp_status of head (X)
> >        tp_status == 0
> >    -   Check inuse
> >        inuse == 0
> >    -   Allocate entry X
> >        advance head (X+1)
> >        set inuse=1
> >    - Exit spin_lock
> >
> >      <very long delay>
> >
> >                                 <allocate N-1 entries
> >                                 where N = size of ring>
> >
> >                                 - Enter spin_lock
> >                                 -   get tp_status of head (X+N)
> >                                     tp_status == 0 (but slot
> >                                     in use for X on core A)
> >
> >    - write tp_status of         <--- trouble!
> >      X = TP_STATUS_USER         <--- trouble!
> >    - write inuse=0              <--- trouble!
> >
> >                                 -   Check inuse
> >                                     inuse == 0
> >                                 -   Allocate entry X+N
> >                                     advance head (X+N+1)
> >                                     set inuse=1
> >                                 - Exit spin_lock
> >
> >
> > At this point Core A just passed slot X to userspace with a
> > packet and Core B has just been assigned slot X+N (same slot as
> > X) for it's new packet. Both cores A and B end up filling in that
> > slot.  Tracking ths donw was one of the reasons it took me a
> > while to produce these updated diffs.
> 
> Is this not just an ordering issue? Since inuse is set after tp_status,
> it has to be tested first (and barriers are needed to avoid reordering).

I changed the code as you suggest to do the inuse check first and
removed the extra added spin_lock/unlock and it seems to be working.
I was able to run through the night without an issue (normally I would
hit the ring corruption in 1 to 2 hours).

Thanks for pointing that out, I should have caught that myself.  Next
I'll look at your suggestion for where to put the shadow ring.

Reply via email to