>>> I think the bigger issues as you've pointed out are the cost of
>>> the additional spin lock and should the additional state be
>>> stored in-band (fewer cache lines) or out-of band (less risk of
>>> breaking due to unpredictable application behavior).
>>
>> We don't need the spinlock if clearing the shadow byte after
>> setting the status to user.
>>
>> Worst case, user will set it back to kernel while the shadow
>> byte is not cleared yet and the next producer will drop a packet.
>> But next producers will make progress, so there is no deadlock
>> or corruption.
>
> I thought so too for a while but after spending more time than I
> care to admit I relized the following sequence was occuring:
>
>    Core A                       Core B
>    ------                       ------
>    - Enter spin_lock
>    -   Get tp_status of head (X)
>        tp_status == 0
>    -   Check inuse
>        inuse == 0
>    -   Allocate entry X
>        advance head (X+1)
>        set inuse=1
>    - Exit spin_lock
>
>      <very long delay>
>
>                                 <allocate N-1 entries
>                                 where N = size of ring>
>
>                                 - Enter spin_lock
>                                 -   get tp_status of head (X+N)
>                                     tp_status == 0 (but slot
>                                     in use for X on core A)
>
>    - write tp_status of         <--- trouble!
>      X = TP_STATUS_USER         <--- trouble!
>    - write inuse=0              <--- trouble!
>
>                                 -   Check inuse
>                                     inuse == 0
>                                 -   Allocate entry X+N
>                                     advance head (X+N+1)
>                                     set inuse=1
>                                 - Exit spin_lock
>
>
> At this point Core A just passed slot X to userspace with a
> packet and Core B has just been assigned slot X+N (same slot as
> X) for it's new packet. Both cores A and B end up filling in that
> slot.  Tracking ths donw was one of the reasons it took me a
> while to produce these updated diffs.

Is this not just an ordering issue? Since inuse is set after tp_status,
it has to be tested first (and barriers are needed to avoid reordering).

Reply via email to