Yossi -
You may already understand this, but the fragments of an IP datagram ("IP
packet" is non-standard slang that confuses IP fragments - packets - with the
end-to-end data unit of IP) need to be checksummed together, along with items
from the "virtual header" (the TCP pseudo-header), before delivery to TCP and
then userspace. Also, TCP segments can overlap each other's sequence space and
also be partially "old". There is no rule that says that a later IP datagram
cannot retransmit part of the sequence-number range of earlier received IP
datagrams. The bytes must be identical, of course.
So, for example, if a prior TCP segment had been received covering sequence
numbers 504-508, a subsequent TCP segment might cover sequence numbers 500-535
(if the sender has not seen the ack up to 508, which can happen for many
reasons). Bytes 504-508 would still be covered by that later segment's TCP
checksum (along with that segment's virtual header).
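
To make the checksum point concrete, here is a minimal sketch (IPv4 only) of the TCP checksum computed over the pseudo-header plus the whole segment. The function names are mine, purely for illustration. The thing to notice is that the checksum covers every byte of the segment, so a segment carrying 500-535 has to be verified in full even if only 509-535 is new to the receiver.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header followed by the full TCP segment.

    `segment` is the TCP header plus payload. With the checksum field zeroed
    this yields the value to transmit; over a valid received segment it
    yields 0.
    """
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,              # reserved zero byte
        6,              # IP protocol number for TCP
        len(segment),   # TCP length: header + payload
    )
    return internet_checksum(pseudo + segment)
```

A receiver would run `tcp_checksum` over the reassembled segment and accept it only if the result is 0.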
Whatever you do to implement zero-copy delivery of TCP data directly into the
receiver's buffers must, for example, be able to deliver bytes 509-535 directly
into the user buffer, if bytes 504-508 have already been delivered. Otherwise
it is a non-standard implementation.
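
As a rough sketch of that requirement: the hypothetical helper below trims the already-delivered prefix of an overlapping segment and places only the new bytes in the user buffer. It assumes in-order delivery only, ignores 32-bit sequence-number wraparound, and assumes the segment's checksum has already been verified. None of the names come from DPDK or any real stack.

```python
def deliver_new_bytes(user_buf: bytearray, buf_start_seq: int,
                      rcv_nxt: int, seg_seq: int, payload: bytes) -> int:
    """Place only the not-yet-delivered bytes of a segment; return the new rcv_nxt.

    user_buf[0] holds the byte with sequence number buf_start_seq.
    Assumption: no sequence wraparound, and no out-of-order queueing.
    """
    seg_end = seg_seq + len(payload)      # first sequence number after the segment
    if seg_end <= rcv_nxt:
        return rcv_nxt                    # entirely old data: nothing to deliver
    if seg_seq > rcv_nxt:
        return rcv_nxt                    # gap before the segment: not handled here
    skip = rcv_nxt - seg_seq              # bytes already delivered earlier
    new = memoryview(payload)[skip:]
    off = rcv_nxt - buf_start_seq         # where the new bytes belong in the buffer
    n = min(len(new), len(user_buf) - off)
    if n <= 0:
        return rcv_nxt                    # user buffer is full
    user_buf[off:off + n] = new[:n]
    return rcv_nxt + n
```

With 504-508 already delivered (`rcv_nxt` = 509), a segment covering 500-535 has its first nine bytes skipped and the remaining 27 placed at buffer offset 5.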
A simpler approach might work with certain sender stacks (those that use the
same "datagram boundaries" for retransmission), but hardly all, since the
standard does not require retransmission on such boundaries. In the old days,
terminal concentrators that used telnet over TCP would retransmit larger
segments than the "single character" segments in order to reduce the overhead
of catching up after dropped packets. It's dangerous to presume that one's
"sending stack" and one's "receiving stack" are the same version of the same
OS - especially dangerous to promote a technique that fails on certain standard
cases as a performance-improving win.
I suspect that a zero-copy TCP will still require actual copying at least
sometimes, given this "overlapping sequence number" issue, and especially when
fragmentation is involved.
So if you are talking about "almost always zero-copy with certain senders", that
might reduce the complexity considerably. Zero-copy fragment assembly only in
the IP layer is much more doable, but it still requires a copy from the
reassembled IP datagram into TCP sequence number space.
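
For the IP-layer part alone, the bookkeeping might look something like the sketch below: each fragment's payload goes at (fragment offset x 8) in a buffer sized for a whole datagram. This is only a model. In Python the slice assignment is itself a copy, standing in for a driver or NIC writing the fragment at that address. The class and method names are hypothetical, and the sketch assumes fragments do not partially overlap one another.

```python
class Reassembly:
    """Place IP fragments by offset into one datagram-sized buffer.

    Assumption: fragments do not partially overlap; an exact duplicate
    (same offset) is ignored.
    """

    def __init__(self, max_len: int = 65535):
        self.buf = bytearray(max_len)
        self.received = 0        # payload bytes placed so far
        self.total_len = None    # known once the last fragment (MF=0) arrives
        self.seen = set()        # byte offsets already filled

    def add(self, frag_offset_units: int, more_fragments: bool,
            payload: bytes) -> bool:
        """Store one fragment; return True once the datagram is complete."""
        start = frag_offset_units * 8    # the IP offset field counts 8-byte units
        end = start + len(payload)
        if end > len(self.buf):
            raise ValueError("fragment extends past the reassembly buffer")
        if start not in self.seen:
            self.seen.add(start)
            self.buf[start:end] = payload
            self.received += len(payload)
        if not more_fragments:
            self.total_len = end
        return self.total_len is not None and self.received == self.total_len

    def payload(self) -> memoryview:
        """View of the reassembled IP payload, without another copy."""
        return memoryview(self.buf)[:self.total_len]
```

The reassembled payload is still a whole TCP segment, so the checksum and the trimming of already-delivered sequence numbers have to happen after this step.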
David P. Reed, Ph.D.
TidalScale, Inc.
On Mar 30, 2014, at 2:52 AM, Yossi Barshishat <yossi at imvisiontech.com> wrote:
> Hi,
>
>
>
> Assuming I know ahead that all IP segments related to one single IP packet
> ID arrive consecutively and I need to forward the entire IP payload toward
> the application layer.
>
> One way to handle this is using a hash table for reassembly of the packet
> data (like the ipv4_reassembly example), another way would be to assume one
> single bucket (following the above assumption).
>
>
>
> However any means the DPDK provides doesn't enable a zero copy mechanism (it
> will be required to copy the segments payloads into one larger buffer).
>
>
>
> Does anybody have any idea regarding a method to control the place where each
> part of the packet will be written to?
>
> e.g. allocating the first segment regularly while the packet data buffer is
> set to the maximum packet length (rather than to MTU size), and then reading
> n bytes after the start of each following segment into the data buffer.
>
>
>
> That way I can forward the app layer the buffer without copying it.
>
>
>
> Thanks,