> > > +	/* When sw csum is needed, multi-segs needs a buf to contain
> > > +	 * the whole packet for later UDP/TCP csum calculation.
> > > +	 */
> > > +	if (m->nb_segs > 1 && !(tx_ol_flags & PKT_TX_TCP_SEG) &&
> > > +	    !(tx_offloads & UDP_TCP_CSUM)) {
> > > +		l3_buf = rte_zmalloc("csum l3_buf",
> > > +				     info.pkt_len - info.l2_len,
> > > +				     RTE_CACHE_LINE_SIZE);
> > > +		rte_pktmbuf_read(m, info.l2_len,
> > > +				 info.pkt_len - info.l2_len, l3_buf);
> > > +		l3_hdr = l3_buf;
> > > +	} else
> > > +		l3_hdr = (char *)eth_hdr + info.l2_len;
> > >
> >
> > Rather than copying whole packet, make the code handle checksum streaming.
>
> Copying is the easiest way to do this.
>
> The problem of handling checksum streaming is that in the first segment, l2
> and l3 hdr len is 14 bytes when checksum takes 4 bytes each
> time.
> If the datalen of the first segment is 4 bytes aligned (usual case), for the
> second segment and the following segments, they may need to add
> a special 2 bytes 0x0 at the start.
Didn't understand that one... Why do you suddenly need to pad non-first
segments with zeroes? Why can't rte_raw_cksum() simply be used for the
multi-seg case?

> Also, mbuf is not passed down to process_inner/outer_chksum so the change
> will be a lot.

I also think that copying the whole packet just to calculate a checksum is
way too much overhead.