On 04/21/2018 08:54 AM, Willem de Bruijn wrote:
> On Fri, Apr 20, 2018 at 9:48 AM, Jesper Dangaard Brouer
> <bro...@redhat.com> wrote:
>>
>> On Thu, 19 Apr 2018 06:47:10 -0700 Eric Dumazet <eric.duma...@gmail.com> wrote:
>>> On 04/19/2018 12:40 AM, Paolo Abeni wrote:
>>>> On Wed, 2018-04-18 at 12:21 -0700, Eric Dumazet wrote:
>>>>> On 04/18/2018 10:15 AM, Paolo Abeni wrote:
>> [...]
>>>>
>>>> Any suggestions for better results are more than welcome!
>>>
>>> Yes, remote skb freeing. I mentioned this idea to Jesper and Tariq in
>>> Seoul (netdev conference). Not tied to UDP, but a generic solution.
>>
>> Yes, I remember.  I think... was it the idea where you basically
>> wanted to queue SKBs back to the CPU that allocated them, right?
>>
>> Freeing an SKB on the same CPU that allocated it has multiple
>> advantages: (1) the SLUB allocator can use a non-atomic, cpu-local
>> (double-)cmpxchg; (2) the four cache lines of the SKB that get
>> memset-cleared stay local; (3) the atomic SKB refcnt/users
>> manipulation stays local.
>>
>> We just have to make sure that the queue-back mechanism itself
>> doesn't cost more than the operations we expect to save.  Bulk
>> transfer is an obvious approach.  For storing SKBs until they are
>> returned, we already have a fast mechanism: see napi_consume_skb()
>> calling _kfree_skb_defer(), which bulk-frees to SLUB/SLAB to
>> amortize cost (1).
>>
>> I guess the missing information is that we don't know which CPU the
>> SKB was created on...
> 
> For connected sockets, sk->sk_incoming_cpu has this data. It
> records the BH cpu on enqueue to the UDP socket, so one caveat is
> that it may be wrong with RPS/RFS.
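>
> For context, the enqueue-side update is roughly (simplified from
> include/net/sock.h):
>
>   static inline void sk_incoming_cpu_update(struct sock *sk)
>   {
>           int cpu = raw_smp_processor_id();
>
>           /* record the cpu that ran the (BH) enqueue */
>           if (unlikely(sk->sk_incoming_cpu != cpu))
>                   sk->sk_incoming_cpu = cpu;
>   }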
> 
> Another option is to associate the skb not with its source cpu but
> with its napi struct, and have the device driver free it in the
> context of its napi processing. This has the additional benefit that
> skb->napi_id is already stored per skb, so this also works for
> unconnected sockets.
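>
> A very rough sketch of that option (hypothetical: the skb_return_list
> field, the skb free_node llist hook, and exporting napi_by_id() are
> all assumptions, none of this exists today):
>
>   /* on free: hand the skb back to its originating napi context */
>   static void skb_return_to_napi(struct sk_buff *skb)
>   {
>           struct napi_struct *napi = napi_by_id(skb->napi_id);
>
>           if (napi)
>                   llist_add(&skb->free_node, &napi->skb_return_list);
>           else
>                   __kfree_skb(skb);       /* fallback: free in place */
>   }
>
>   /* in napi poll, on the cpu that allocated the skbs */
>   static void napi_drain_returned_skbs(struct napi_struct *napi)
>   {
>           struct llist_node *head = llist_del_all(&napi->skb_return_list);
>           struct sk_buff *skb, *tmp;
>
>           llist_for_each_entry_safe(skb, tmp, head, free_node)
>                   _kfree_skb_defer(skb);  /* cpu-local bulk free */
>   }
>
> An llist keeps the remote producer side to a single atomic op, which
> helps keep the queue-back cost below the expected savings.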
> 
> Third, the skb->napi_id field is unused after setting sk->sk_napi_id
> on sk enqueue, so the BH cpu could be stored there after that,
> essentially extending sk_incoming_cpu to unconnected sockets.
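>
> Something like (hypothetical; reusing the field is the only change):
>
>   /* sk_mark_napi_id() has copied skb->napi_id into the socket,
>    * so the skb field is free to carry the BH enqueue cpu instead */
>   sk_mark_napi_id(sk, skb);
>   skb->napi_id = raw_smp_processor_id();
>
> At free time the allocating cpu is then recoverable from the skb
> itself, for connected and unconnected sockets alike.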

At Google we use something named TXCS, which is what I mentioned to
Jesper and Tariq.

(In our case, we wanted to perform the skb destructor/freeing not on
the cpu handling the TX queue, but on the cpus that originally cooked
the skb, i.e. the ones running the TCP stack.)

To accommodate generic needs (both RX and TX), I do not believe we
can union any existing fields without a lot of pain/bugs.

