From: EXT Bill Fischofer [mailto:bill.fischo...@linaro.org] Sent: Tuesday, January 12, 2016 4:29 AM To: Savolainen, Petri (Nokia - FI/Espoo) Cc: LNG ODP Mailman List Subject: Re: [lng-odp] [RFC API-NEXT PATCH] api: packet: add packet segment manipulation
On Mon, Jan 11, 2016 at 7:47 AM, Petri Savolainen <petri.savolai...@nokia.com<mailto:petri.savolai...@nokia.com>> wrote: Packet segments can be allocated/freed/multi-referenced. Segments data pointer and length can be modified (push/pull). Segments can be linked to packets when needed (can exist also when not connected to packets). TODO: Packet pool params need tuning for segment length configuration. Signed-off-by: Petri Savolainen <petri.savolai...@nokia.com<mailto:petri.savolai...@nokia.com>> --- include/odp/api/packet.h | 82 +++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 74 insertions(+), 8 deletions(-) diff --git a/include/odp/api/packet.h b/include/odp/api/packet.h index 8651a47..7d315ba 100644 --- a/include/odp/api/packet.h +++ b/include/odp/api/packet.h @@ -118,6 +118,15 @@ int odp_packet_alloc_multi(odp_pool_t pool, uint32_t len, odp_packet_t pkt[], int num); /** + * Create a new reference to the packet + * + * @param pkt Packet handle + * + * @return New handle which refers to the input packet + */ +odp_packet_t odp_packet_ref(odp_packet_t pkt); Do we need to distinguish between non-shared vs. shared references? That was something we discussed earlier, which is why I had a 2nd parameter in my proposed version of this API. Under this form this implies that all references are shared, meaning that a change to the underlying packet via any handle is visible to all other references to that packet. OK if that's what we want, but we need to note this behavior. A non-shared reference to a full packet is a new copy of the packet, and we have odp_packet_copy() defined for that. I think we need to specify only if packet metadata is duplicated or not. If not, this is just a reference counter increment. If metadata is duplicated, this allows sharing the (payload) data and adding new head segment(s) to it (create N new packets from the same payload part). This was demonstrated on my example code. See “[RFC API-NEXT PATCH] packet segmentation”. So, yes I think the “duplicate vs share metadata” parameter is needed. + +/** * Free packet * * Frees the packet into the buffer pool it was allocated from. @@ -810,11 +819,44 @@ void odp_packet_shaper_len_adjust_set(odp_packet_t pkt, int8_t adj); */ /** + * Allocate a new segment + * + * Allocates a new segment from the specified packet pool. The segment + * is initialized with data pointers and lengths set according to the + * specified len. All other segment metadata are set to their default values. + * + * @param pool Pool handle + * @param len Segment data length + * + * @return Handle of allocated packet + * @retval ODP_PACKET_SEG_INVALID Segment could not be allocated + */ +odp_packet_seg_t odp_packet_seg_alloc(odp_pool_t pool, uint32_t len); The problem I have with this proposed API is that segments are used to represent the physical organization of packets at a platform level, which controls contiguous addressability. This is why the various packet APIs that return addresses also return a seg_len output parameter which tells the caller how many bytes are contiguously addressable from the returned address. Since segment lengths are inherently platform-specific, what happens if the caller requests a len that's larger than one of these platform-defined segment lengths? That would seem to get awkward as alloc calls would either fail unpredictably from platform to platform or else the implementation would have to allocate multiple physical segments and chain them together into a "logical segment" that would share the same ODP data type as a physical segment (odp_packet_seg_t), which would also be confusing and ambiguous. We already have seg_len packet pool parameter which (currently) ensures that the first segment has at least seg_len bytes. In practice, most HW pools/pktios/etc work on scatter-gather lists and fixed sized segments. An implementation just needs to choose the segment size during pool initialization. Valid value for ‘len’ here is 0 to pool_param.pkt.seg_len (or similar, the pool param definition needs also an update). Len is used for initialization of the segment data pointer and length. But a logical segment, as envisioned here is effectively what a packet is, which is why I've proposed that composites be packet-based rather than segment based. The difference is the amount of metadata that's carried. A simple solution would be to provide a hint to the implementation that full metadata is not required for one of these packets that are intended to be used as composite packet components. I'd rather tackle the metadata management issue separately rather than conflate physical and logical segments as that would seem to be more portable and more easily implementable across a range of HW-based or assisted implementations. If application pre-pins e.g. a constant tunnel header to a packet, the tunnel header itself is not a packet (alone). It’s just a small segment of data that you want to link into the head of an existing packet (without a need to copy it). This is what I demonstrated in my example code. It’s logically wrong and wasteful to call a piece of packet data a full packet. For example after the composite call under, a user could access the same packet through three different handles. It very easy to mix up which of the three is the head, the tail or the one carrying the “valid” metadata. Also explicit reference creation is missing. After couple of these composite rounds (while the packet travels through the application pipeline), you would have even more packet handles around pointing into the same packet. odp_packet_t odp_packet_push_prefix(odp_packet_t pkt, odp_packet_t prefix); Usually, HW defines a packet as a (single) packet descriptor structure, which includes a scatter-gather list (a table or a linked list) of data segment pointers and lengths. Also, we’ll need to define later on true composites of (full) packets. For example, many applications (including OFP) need to store and process locally lists of packets (e.g. keep a sorted list of received packets waiting for re-assembly, re-transmission, etc). A packet is a list of data segments with a common metadata record. A packet-list would be a list of related packets. Possible solutions to this would be either an additional parameter (e.g., odp_packet_alloc(pool, len, options);) or a separate API that also returns an odp_packet_t (e.g., odp_packet_alloc_subpacket(pool, len);). This would also have the advantage that per-subpacket metadata can be added back as needed to allow for more sophisticated structures over time. The other issue is that this API is that it introduces the notion of odp_packet_seg_t objects that are not associated with any packet since until they are attached to a packet they are "floating", which is something new. Currently an odp_packet_seg_t is always a part of a packet and simply represents how a platform physically organizes an odp_packet_t into memory. It's not clear how easy this change would be for many implementations. This alloc or another (odp_packet_seg_list_alloc(pool, len)) call could also return a list of segments if size of one segment is too limiting. In practice, HW stores data into segments that are multiples of cache line size (64, 128, 256, 512, … bytes) and in most of the cases header manipulation involves only the first one or two segments. So, zero-copy insertion / deletion of the first segment the main use case. + +/** + * Free segment + * + * Frees the segment into the pool it was allocated from. Segment must not be + * linked into any packet. + * + * @param seg Segment handle + */ +void odp_packet_seg_free(odp_packet_seg_t seg); If we use packets for composite parts, then this would just be odp_packet_free() for the subpacket. There are no “sub-packets” on the wire, or in the HW. So, better call packets and segments on their common names. + +/** + * Create a new reference to the segment + * + * @param seg Segment handle + * + * @return New handle which refers to the input segment + */ +odp_packet_seg_t odp_packet_seg_ref(odp_packet_seg_t seg); For packet-based composites this is not needed. From implementation point of view odp_packet_ref() for a sub-packet is the same operation. + +/** * Segment buffer address * * Returns start address of the segment. * - * @param pkt Packet handle * @param seg Segment handle * * @return Start address of the segment @@ -822,28 +864,26 @@ void odp_packet_shaper_len_adjust_set(odp_packet_t pkt, int8_t adj); * * @see odp_packet_seg_buf_len() */ -void *odp_packet_seg_buf_addr(odp_packet_t pkt, odp_packet_seg_t seg); +void *odp_packet_seg_buf_addr(odp_packet_seg_t seg); Having the two arguments to this call was deemed useful at the time as a convenience for implementations since segments were always associated with a packet. Need to check how this complicates existing HW-based implementations to separate these two concepts. Again, aggregating packets would have no such issues. As mentioned above, true packet aggregation (e.g. a list of packets) is needed later on. Usage of a packet handle per segment would not be more efficient, it would just introduce lots of extra specification work to define when a packet is actually a “sub-packet” and what features can be done for a “sub-packet” vs a “real-packet”. Better to just use segment type for segments and packet type for packets. /** * Segment buffer length * * Returns segment buffer length in bytes. * - * @param pkt Packet handle * @param seg Segment handle * * @return Segment buffer length in bytes * * @see odp_packet_seg_buf_addr() */ -uint32_t odp_packet_seg_buf_len(odp_packet_t pkt, odp_packet_seg_t seg); +uint32_t odp_packet_seg_buf_len(odp_packet_seg_t seg); Same comment as for odp_packet_seg_buf_addr() /** * Segment data pointer * * Returns pointer to the first byte of data in the segment. * - * @param pkt Packet handle * @param seg Segment handle * * @return Pointer to the segment data @@ -851,21 +891,29 @@ uint32_t odp_packet_seg_buf_len(odp_packet_t pkt, odp_packet_seg_t seg); * * @see odp_packet_seg_data_len() */ -void *odp_packet_seg_data(odp_packet_t pkt, odp_packet_seg_t seg); +void *odp_packet_seg_data(odp_packet_seg_t seg); Same comment as above. /** * Segment data length * * Returns segment data length in bytes. * - * @param pkt Packet handle * @param seg Segment handle * * @return Segment data length in bytes * * @see odp_packet_seg_data() */ -uint32_t odp_packet_seg_data_len(odp_packet_t pkt, odp_packet_seg_t seg); +uint32_t odp_packet_seg_data_len(odp_packet_seg_t seg); Same comment as above. + + +void *odp_packet_seg_push_head(odp_packet_seg_t seg, uint32_t len); + +void *odp_packet_seg_pull_head(odp_packet_seg_t seg, uint32_t len); + +void *odp_packet_seg_push_tail(odp_packet_seg_t seg, uint32_t len); + +void *odp_packet_seg_pull_tail(odp_packet_seg_t seg, uint32_t len); We're now introducing the notion of segment-based headroom and tailroom, which is metadata, which is additional overhead for "regular" segments that are not part of composites. Again, much easier to do this by compositing packets rather than segments since each part itself has access to the packet APIs, which include the notion of headroom and tailroom. You’d reuse these calls but would need to re-define all other current and future (about 50 of them ?) odp_packet_xxx calls (= define on each call what happens when called on “main-packet” vs. “sub-packet” vs. “sub-sub-packet”, …). /* @@ -875,6 +923,24 @@ uint32_t odp_packet_seg_data_len(odp_packet_t pkt, odp_packet_seg_t seg); * */ +/** + * Insert segment into packet at an offset + * + * Links a segment into packet. Operation updates data pointers, offsets and + * lengths, but not other packet metadata (e.g. L2/L3/L4 offsets and pointers). + * The packet is not modified on an error. + * + * Segment must have been allocated from the same pool as the packet. + * + * @param pkt Packet handle + * @param offset Byte offset into the packet + * @param seg Segment to be added into the offset + * + * @retval 0 on success + * @retval !0 on failure + */ +int odp_packet_add_seg(odp_packet_t pkt, uint32_t offset, + odp_packet_seg_t seg); I agree that this is a better approach than the prefix/suffix routines I originally proposed. So perhaps: odp_packet_t odp_packet_add packet(odp_packet_t pkt, uint32_t offset, odp_packet_t subpacket); Returns a composite handle that results from inserting subpacket into packet at offset offset, or ODP_PACKET_INVALID if the operation is unsuccessful. Presumably there would also be a corresponding odp_packet_rem_packet() to reverse this operation (as I assume there would be a symmetric odp_packet_rem_seg() call in this nomenclature). Again multiplication of full packet metadata on each segment would not bring implementation benefit, but unused metadata and specification effort to ban usage of that unused metadata. We could also have a segment list version of this, which would allow addition of multiple segments at once. odp_packet_add_seg_list(odp_packet_t pkt, uint32_t offset, odp_packet_seg_t seg[], int num_seg); - Petri /** * Add data into an offset -- 2.6.3 _______________________________________________ lng-odp mailing list lng-odp@lists.linaro.org<mailto:lng-odp@lists.linaro.org> https://lists.linaro.org/mailman/listinfo/lng-odp
_______________________________________________ lng-odp mailing list lng-odp@lists.linaro.org https://lists.linaro.org/mailman/listinfo/lng-odp