On 19 February 2018 at 16:32, Bill Fischofer <bill.fischo...@linaro.org> wrote: > On Mon, Feb 19, 2018 at 7:36 AM, Bogdan Pricope > <bogdan.pric...@linaro.org> wrote: >> >> On 17 February 2018 at 03:36, Bill Fischofer <bill.fischo...@linaro.org> >> wrote: >> > Changed to post this to the ODP mailing list, as this is a good topic to >> > discuss there. >> > >> > Let's first back up a bit and discuss the intent behind packet references. >> > Packets typically consist of one or more headers followed by a payload. >> > References are designed to allow multiple unique headers to share a common >> > payload, which must be treated as read/only. This is because headers >> > typically have things like unique destination addresses (for multicast) or >> > IP identifiers (for retransmission). A fundamental assumption of the ODP >> > programming model is that a packet handle is owned by only one thread at a >> > time. If two or more threads were to make simultaneous accesses to a packet >> > through the same handle, the result would be unpredictable race conditions >> > and likely data corruption.This is why the shared part of any packet >> > reference must be treated as read only by all parties. >> > >> > So any packet reference consists of a R/W prefix, which is unique to that >> > reference, followed by a (potentially) shared R/O suffix. The point at >> > which >> > the R/O suffix begins is set by the offset parameter. For dynamic >> > references >> > the prefix can either start out null, which is what odp_packet_ref() does, >> > or can be supplied by the caller, which is what odp_packet_ref_pkt() does. >> > In both cases the R/W prefix can be modified >> > >> > With this background let's look at the specifics. >> > >> > On Fri, Feb 16, 2018 at 9:31 AM, Bogdan Pricope <bogdan.pric...@linaro.org> >> > wrote: >> >> >> >> Hi, >> >> >> >> I am trying to use ODP packet reference API to improve transmission of >> >> TCP data. Yet, it seems current API is not enough for this. >> >> >> >> Usecase: three data areas stored inside odp_packet_t >> >> >> >> mb1: size 40 bytes – to send last 20 bytes >> >> mb2: size 40 bytes – to send 40 bytes >> >> mb3: size 40 bytes – to send first 20 bytes >> >> >> > >> > Are you looking at three areas within the same packet or three areas that >> > are in separate packets? You cannot create a reference to part of a packet. >> > The only control is the starting offset within the packet that is to become >> > the shared portion of the reference. That extends to the end of the packet >> > on which the reference is created. We had considered adding a length >> > parameter but that would pose undue implementation complications. >> >> Each data area is in separate odp packet: this is generated by 3 calls >> to an API equivalent to: >> >> ssize_t send(int sockfd, const void *buf, size_t len, int flags); >> >> Buffer 'buf' provided by application is stored as odp_packet_t inside >> socket send buffer array. > > > You can certainly use odp_packet_offset() to get addressability to a > particular offset within a packet and send() from there. You just need > to be aware of potential segmentation along the way (seg_len value > returned from the odp_packet_offset() call). If you need a particular > offset/len to be contiguously addressable odp_packet_align() may be > used to force that, but that can be an expensive operation if the > packet has to be resegmented to meet the needs expressed by the call.
I meant, send() calls are generating the tree data areas: - application calls send() - ofp identifies socket structure - ofp copy buf in an odp_packet_t and store it in socket's send sockbuf. Usually, it will try to send packet form sockbuf immediately, but in congestion situations, it may stay longer in sockbuf. Eventually, send window is long enough and ofp will try to aggregate multiple sends into one bigger packet. Right now, this is done with memcpy(), but I was wondering if we can use packet ref APIs > >> >> >> Packetization function receives as arguments: a list of payload >> buffers (odp_packet_t), offset (start of the un-ack data) and >> a len (min(data available, send window size)). Length parameter is needed. > > > The odp_pktout_send() API currently does not support sending partial > packets, however you can simulate this with references. If you've > ACK'd a bytestream up to a certain point and the bytestream is held in > an odp_packet_t, you can create a reference at the ACK point and then > push the transmit headers you need onto that. For example: > > odp_packet_t bytestream; > int ack_point; > odp_packet_t xmit_hdrs; > ...setup xmit_hdrs as needed > > odp_packet_t xmit_pkt = odp_packet_ref_pkt(bytestream, ack_point, xmit_hdrs); > > odp_pktout_send(pktio, xmit_pkt, 1); > > ...etc. > > The xmit_pkt consists of the xmit_hdrs followed by the bytestream > starting at offset ack_point, which is what it sounds like you're > asking for. > Ideally, odp_packet_pull_tail() would work on xmit_pkt to send partial packets. Yet, not... >> >> >> > >> >> >> >> >> >> Resulting packet should look like: >> >> mb0: len 32 bytes to hold IPv4 + TCP headers >> >> reference to mb1: 20 bytes offset, 20 bytes len >> >> reference to mb2: 0 bytes offset, 40 bytes len >> >> reference to mb3: 0 offset, 20 bytes len >> > >> > >> > What you're describing here is either a composite packet that is assembled >> > from multiple different packets via odp_packet_concat(), or a compound >> > reference where the packet on which you're creating a reference is itself a >> > reference. You cannot create a reference directly to more than one packet. >> >> I meant a compound reference. >> >> > >> >> >> >> >> >> >> >> API calls: >> >> pkt = odp_packet_alloc(pool, 32); >> >> pkt = odp_packet_ref_pkt(mb1, 20, pkt); >> >> pkt = odp_packet_ref_pkt(mb2, 0, pkt); >> >> pkt = odp_packet_ref_pkt(mb3, 0, pkt); >> > >> > >> > Not sure if this is an omission, but having pkt on both sides of the >> > assignment here is attempting to create a circular reference (pkt refers to >> >> I thought it works like: >> >> pkt = pkt + new_buff; > > > Yes it does, however as a result pkt is now a reference and Petri > added the restriction to odp_packet_ref_pkt() that while the base > packet may be a reference, the hdr may not. This was to avoid > implementation complications in some platforms. Therefore the first > call will succeed but the others will fail because you're trying to > prefix mb2 with pkt, which is now a reference. > > If we want to revisit this restriction that's something that could be > discussed for post-Tiger Moth. > I agree. Adding buffers in reverse order is enough for now. >> >> > itself) and that is expressly forbidden by the APIs. The proper way to do >> > what I think you're trying to do is: >> > >> > odp_packet_t mb0, mb1, mb2, mb3, refpkt; >> > >> > ... setup mb0 to hold 14 bytes of Ethernet header >> > ... setup mb1 to hold 20 bytes of IPv4 header >> > ... setup mb2 to hold 20 bytes of TCP header >> > ... setup mb3 to hold your payload >> > >> > refpkt = odp_packet_ref_pkt(odp_packet_ref_pkt(odp_packet_ref_pkt(mb3, 0, >> > mb2), 0, mb1), 0, mb0); >> > >> >> It should work (except for missing length part) on single thread >> scenario, yet is an odd construction. > > Of you want to start at an offset in mb0, just change this to: > > refpkt = odp_packet_ref_pkt(odp_packet_ref_pkt(odp_packet_ref_pkt(mb3, > ack_point, mb2), 0, mb1), 0, mb0); > > As for the odd construction, you could expand this to: > > mb2 = odp_packet_ref_pkt(mb3, ack_point, mb2); > mb1 = odp_packet_ref_pkt(mb2, 0, mb1); > mb0 = odp_packet_ref_pkt(mb1, 0, mb0); > > which does the same thing. > >> Not sure if single thread scenario is good enough for this usecase. > > Not sure I understand this concern. The general ODP restriction that > only a single thread can own an odp_packet_t handle at a time is > unchanged here. Can you elaborate? In usecase above, odp packet mb1 will be allocated by application thread (call of send()) and handler is stored in a shared memory in socket's send buffer. Further, xmit_pkt may be allocated by the same thread or not (congestion) . Further, application may have multiple 'control' threads and may send() data on the same socket - second odp packet mb2 may be allocated by this thread. Assuming enough send window, xmit_pkt may be allocated by this thread and contain references to mb1 and mb2. Assuming some sort of TX bursting mechanism, may exist multiple xmit_pkts for the same socket allocated by different threads that are referencing the same odp packets. At send time, segments references must be decremented. At ACK time, segments references must be decremented. What do you mean by "a single thread can own an odp_packet_t handle"? > >> >> >> > You of course could use intermediate assignments to capture the handles for >> > the intermediate reference. >> > >> > The result is refpkt is the handle to the composite reference {mb0 || mb1 >> > || >> > mb2 || mb3}, where only mb0 is R/W and all others must be treated as R/O. >> > Of >> > the original packets you now have (in linux-generic, other implementations >> > may be slightly different). >> > >> > refpkt has a ref_count of 1 since it is a unique (unreferenced) packet. >> > mb0 has a ref_count of 1 since it is returned as refpkt. >> > mb1 has a ref_count of 2 since is is referred by to mb0/refpkt. >> > mb2 has a ref_count of 3 since it is referred to by mb1 and (indirectly) by >> > mb0/refpkt. >> > mb3 has a ref_count of 4 since it is referred to by mb2 and (indirectly) by >> > both mb1, and mb0/refpkt. >> > >> >> >> >> >> >> Two problems: >> >> 1. Cannot add mb2 and mb3 because pkt has references after first >> >> call to odp_packet_ref_pkt() >> >> >> >> * The packet handle ('pkt') may itself be a (dynamic) reference to a >> >> packet, >> >> * but the header packet handle ('hdr') must be unique. >> >> >> >> odp_packet_t odp_packet_ref_pkt(odp_packet_t pkt, uint32_t offset, >> >> >> >> odp_packet_t hdr); >> >> >> >> 2. Although packet reference can start with an offset, it cannot >> >> be less that all the rest of the packet (no size). >> > >> > >> > Hopefully the above explanation makes sense. >> > >> >> >> >> >> >> Is there any other way to create this packet structure? Maybe with >> >> odp_packet_concat() / odp_packet_split()? >> > >> > >> > odp_packet_concat() is actually quite efficient in the current >> > linux-generic >> > implementation since it just links segments and does not do data copies. >> > Since this results in a single packet the result is R/W. However it's also >> > not sharable, which may or may not be what you want. >> > >> >> >> >> >> >> Br, >> >> Bogdan >> > >> >