On Wed, Jan 22, 2020 at 12:54 AM Flavio Leitner <f...@sysclose.org> wrote: > > > Hi Ben, > > Thanks for reviewing it! > > On Tue, Jan 21, 2020 at 01:35:39PM -0800, Ben Pfaff wrote: > > On Sat, Jan 18, 2020 at 12:08:06AM +0100, Ilya Maximets wrote: > > > On 18.01.2020 00:03, Stokes, Ian wrote: > > > > Thanks all for review/testing, pushed to master. > > > > > > OK, thanks Ian. > > > > > > @Ben, even though this patch already merged, I'd ask you to take a look > > > at the code in case you'll spot some issues especially in non-DPDK related > > > parts. > > > > I found the name dp_packet_hwol_is_ipv4(), and similar, confusing. The > > name suggested to me "test whether the packet is IPv4" not "test whether > > the packet has an offloaded IPv4 checksum". I guess the "hwol" is > > offload related but... I like the name dp_packet_hwol_tx_l4_checksum() > > much more, it makes it obvious at a glance that it's a > > checksum-offloading check. > > hwol = hardware offloading. I hear that all the time, but maybe there is a > better name. I will improve that if no one gets on it first. > > > In the case where we actually receive a 64 kB packet, I think that this > > code is going to be relatively inefficient. If I'm reading the code > > correctly (I did it quickly), then this is what happens: > > > > - The first 1500 bytes of the packet land in the first > > dp_packet. > > > > - The remaining 64000ish bytes land in the second dp_packet.
It's not a dp_packet, it's a preallocated buffer per rxq (aux_bufs). struct netdev_rxq_linux { struct netdev_rxq up; bool is_tap; int fd; char *aux_bufs[NETDEV_MAX_BURST]; /* Batch of preallocated TSO buffers. */ }; > > > > - Then we expand the first dp_packet to the needed size and copy > > the remaining 64000 bytes into it. > > That's correct. > > > An alternative would be: > > > > - Set up the first dp_packet as currently. > > > > - Set up the second dp_packet so that the bytes are received > > into it starting at offset (mtu + headroom). > > > > - If more than mtu bytes are received, then copy those bytes > > into the headroom of the second dp_packet and return it to the > > caller instead of the first dp_packet. > > I wanted to avoid doing more extensive processing if it's not a TSO packet > to avoid performance regressions since it' very sensitive. Right now the 64k > buffer is preallocated and is static for each queue to avoid the malloc > performance issue. Now for TSO case, we have more time per packet for > processing. Can we implement Ben's idea by 1) set size of aux_buf to 64k + mtu 2) create 2nd dp_packet using this aux_buf and copy first packet to first mtu bytes of aux_buf 3) since we steal this aux_bufs, allocate a new aux_buf by rxq->aux_bufs[i] = xmalloc(64k + mtu) 4) free the first dp_packet, and use the second dp_packet Regards, William > > > The advantage is that we do a 1500-byte copy instead of a 64000-byte > > copy. The disadvantage is that we waste any memory leftover in the > > second dp_packet, e.g. 32 kB if it's only a 32 kB packet instead of 64 > > kB. Also we need slightly more sophisticated dp_packet allocation (we > > will need to replenish the supply of aux_bufs). > > I also tried to avoid waste of memory, which becomes a problem with multiple > ports and queues working in parallel. > > -- > fbl > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev