> I do know that, but fact is, the current drivers do it. I am concerned
> about the amount of entropy in one single 16 bit field used to
> distinguish flows. Flow labels are fine and good, but if current
> hardware does not support them, they do not help. Imagine containers
> with lots of applications; 16 bits don't seem to fit here.
>
Based on what? The RSS indirection table is only seven bits, so even 16
bits would be overkill for that. Please provide a concrete example, or
data, showing where 16 bits wouldn't be sufficient.
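
To put numbers on it: however wide the hash is, it only ever indexes a
small indirection table. A minimal sketch of that lookup (illustrative
only, not any particular driver's code, names made up):

#include <stdint.h>

#define RSS_INDIR_SIZE 128                   /* 7-bit indirection table */

struct rss_table {
        uint8_t queue[RSS_INDIR_SIZE];       /* table slot -> RX queue */
};

/* Only log2(RSS_INDIR_SIZE) = 7 bits of the hash matter for steering. */
static inline uint8_t rss_pick_queue(const struct rss_table *t,
                                     uint32_t hash)
{
        return t->queue[hash & (RSS_INDIR_SIZE - 1)];
}

16 bits of flow entropy gives 65536 distinct values feeding a table
with only 128 slots, so the table saturates long before the entropy
runs out.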

>> > Please provide a sketch of a protocol-generic API that can tell
>> > hardware where an inner protocol header starts, that supports vxlan,
>> > vxlan-gpe, geneve and ipv6 extension headers, and knows which
>> > protocol starts at that point.
>> >
>> BPF. Implementing protocol-generic offloads is not just a HW concern
>> either; adding kernel GRO code for every possible protocol that comes
>> along doesn't scale well. This becomes especially obvious when we
>> consider how to provide offloads for application protocols. If the
>> kernel provides a programmable framework for the offloads, then
>> application protocols, such as QUIC, could use that without needing
>> to hack the kernel to support the specific protocol (which no one
>> wants!). Application protocol parsing in KCM and some other use cases
>> of BPF have already foreshadowed this, and we are working on a
>> prototype for a BPF programmable engine in the kernel. Presumably,
>> this same model could eventually be applied as the HW API for
>> programmable offload.
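
To make that concrete, the kind of parsing such a program would express
is no more exotic than the following sketch (plain C rather than actual
BPF, names purely illustrative): match the encapsulation on the outer
UDP destination port, skip its header, and report where the inner
header starts. vxlan-gpe, geneve or IPv6 extension headers become
additional branches in the same program rather than new parsing code in
every driver.

#include <stdint.h>
#include <stddef.h>
#include <arpa/inet.h>

#define VXLAN_PORT 4789
#define VXLAN_HLEN 8                    /* flags + reserved + VNI */

struct udp_hdr {
        uint16_t source;
        uint16_t dest;
        uint16_t len;
        uint16_t check;
};

/* Return the offset of the inner header, or -1 if not recognized. */
static int inner_header_offset(const uint8_t *pkt, size_t len,
                               size_t udp_off)
{
        const struct udp_hdr *udp;

        if (udp_off + sizeof(*udp) + VXLAN_HLEN > len)
                return -1;

        udp = (const struct udp_hdr *)(pkt + udp_off);
        if (ntohs(udp->dest) != VXLAN_PORT)
                return -1;      /* other encapsulations: more branches */

        /* Inner Ethernet header follows the UDP and VXLAN headers. */
        return udp_off + sizeof(*udp) + VXLAN_HLEN;
}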
>
> So your proposal is like this:
>
> dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?
>
> What do network cards do when they don't support bpf in hardware, as is
> currently the case for all cards? Should they do program equivalence
> testing on the bpf program to check if it conforms to some of their
> offload capabilities, and activate those for the port they parsed out
> of the bpf program? I don't really care about more function pointers in
> struct net_device_ops because it really doesn't matter, but what really
> concerns me is the huge size of the drivers in the kernel. Just tell
> the driver specifically what is wanted and let them do that. Don't
> force them to do program inspection or anything.
>
Nobody is forcing anyone to do anything. If someone implements generic
offload like this, it's treated just like any other optional feature of
a NIC.
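
A hook along the lines of the signature you gave would be consumed like
any other optional ndo. A rough sketch (not actual kernel code, the
hook name is your hypothetical one):

#include <errno.h>

struct bpf_prog;                        /* opaque here */

struct net_device;

struct net_device_ops {
        int (*ndo_add_offload)(struct net_device *dev,
                               struct bpf_prog *prog);
};

struct net_device {
        const struct net_device_ops *netdev_ops;
};

static int try_hw_offload(struct net_device *dev, struct bpf_prog *prog)
{
        if (!dev->netdev_ops || !dev->netdev_ops->ndo_add_offload)
                return -EOPNOTSUPP;     /* no HW support: stay in SW */

        return dev->netdev_ops->ndo_add_offload(dev, prog);
}

A driver that can't translate the program just doesn't set the hook, or
returns an error, and the stack keeps doing the work in software.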

> About your argument regarding GRO for every possible protocol:
>
> Adding GRO for QUIC or SPUD transparently does not work, as it breaks
> the semantics of UDP. UDP is a framed protocol, not a streamed one, so
> it does not make sense to add that. You can implement GRO for
> fragmented UDP, though. The length of the packet is end-to-end
> information. If you add a new protocol with a new socket type, sure,
> you can add a GRO engine transparently for that, but not by simply
> peeking at data inside UDP when you don't know how the local
> application uses this data. In the case of forwarding you can never do
> that; it would actually break the internet. If you are the end host,
> the GRO engine can ask the socket what type it is or what framing is
> used inside UDP. Thus this cannot work in hardware either.
>
This is not correct. We already have many instances of GRO being used
over UDP in several UDP encapsulations, and there is no issue with
breaking UDP semantics. QUIC is a stream-based transport like TCP, so
it will fit into the model (granted, the fact that this is coming from
userspace, plus the per-packet security, will make offload a little
more challenging to implement). I don't know if this is needed, but I
can only assume that server performance in QUIC must be miserable if
all the I/O is done in 1350-byte packets.
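
For reference, the existing UDP-encapsulation GRO model dispatches on a
destination port that a tunnel (vxlan, fou/gue, geneve, ...)
registered; only traffic for a registered encapsulation is aggregated
based on its inner headers, and ordinary datagrams are left untouched.
Schematically (an illustration, not the kernel's actual udp_offload
structures):

#include <stdint.h>
#include <stddef.h>

struct sk_buff;                         /* opaque packet handle here */

typedef struct sk_buff *(*gro_receive_t)(struct sk_buff *list,
                                         struct sk_buff *skb);

struct udp_encap_offload {
        uint16_t port;                  /* tunnel's UDP destination port */
        gro_receive_t gro_receive;      /* merges based on inner headers */
};

#define MAX_ENCAPS 8
static struct udp_encap_offload encaps[MAX_ENCAPS];
static size_t nr_encaps;

static int udp_encap_register(uint16_t port, gro_receive_t cb)
{
        if (nr_encaps == MAX_ENCAPS)
                return -1;
        encaps[nr_encaps].port = port;
        encaps[nr_encaps].gro_receive = cb;
        nr_encaps++;
        return 0;
}

/* Called from UDP GRO: only registered tunnel ports are ever merged. */
static struct sk_buff *udp_gro_dispatch(uint16_t dport,
                                        struct sk_buff *list,
                                        struct sk_buff *skb)
{
        size_t i;

        for (i = 0; i < nr_encaps; i++)
                if (encaps[i].port == dport)
                        return encaps[i].gro_receive(list, skb);

        return NULL;            /* plain UDP datagram: left untouched */
}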

> I am not very happy with the use cases of BPF outside of tracing,
> cls_bpf and packet steering.
>
> Please don't propose that we should use BPF as the API for HW
> programmable offloading currently. It does not make sense.
>
If you have an alternative, please propose it now.

> Bye,
> Hannes