From: EXT Bill Fischofer [mailto:bill.fischo...@linaro.org]
Sent: Tuesday, January 12, 2016 4:29 AM
To: Savolainen, Petri (Nokia - FI/Espoo)
Cc: LNG ODP Mailman List
Subject: Re: [lng-odp] [RFC API-NEXT PATCH] api: packet: add packet segment 
manipulation



On Mon, Jan 11, 2016 at 7:47 AM, Petri Savolainen 
<petri.savolai...@nokia.com<mailto:petri.savolai...@nokia.com>> wrote:
Packet segments can be allocated/freed/multi-referenced.
Segments data pointer and length can be modified (push/pull).
Segments can be linked to packets when needed (can exist also
when not connected to packets).

TODO: Packet pool params need tuning for segment length
configuration.

Signed-off-by: Petri Savolainen 
<petri.savolai...@nokia.com<mailto:petri.savolai...@nokia.com>>
---
 include/odp/api/packet.h | 82 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 74 insertions(+), 8 deletions(-)

diff --git a/include/odp/api/packet.h b/include/odp/api/packet.h
index 8651a47..7d315ba 100644
--- a/include/odp/api/packet.h
+++ b/include/odp/api/packet.h
@@ -118,6 +118,15 @@ int odp_packet_alloc_multi(odp_pool_t pool, uint32_t len,
                           odp_packet_t pkt[], int num);

 /**
+ * Create a new reference to the packet
+ *
+ * @param pkt           Packet handle
+ *
+ * @return New handle which refers to the input packet
+ */
+odp_packet_t odp_packet_ref(odp_packet_t pkt);

Do we need to distinguish between non-shared vs. shared references?  That was 
something we discussed earlier, which is why I had a 2nd parameter in my 
proposed version of this API.  Under this form this implies that all references 
are shared, meaning that a change to the underlying packet via any handle is 
visible to all other references to that packet.  OK if that's what we want, but 
we need to note this behavior.

A non-shared reference to a full packet is a new copy of the packet,  and we 
have odp_packet_copy() defined for that.
I think we need to specify only if packet metadata is duplicated or not. If 
not, this is just a reference counter increment. If metadata is duplicated, 
this allows sharing the (payload) data and adding new head segment(s) to it 
(create N new packets from the same payload part). This was demonstrated on my 
example code. See “[RFC API-NEXT PATCH] packet segmentation”.

So, yes I think the “duplicate vs share metadata” parameter is needed.


+
+/**
  * Free packet
  *
  * Frees the packet into the buffer pool it was allocated from.
@@ -810,11 +819,44 @@ void odp_packet_shaper_len_adjust_set(odp_packet_t pkt, 
int8_t adj);
  */

 /**
+ * Allocate a new segment
+ *
+ * Allocates a new segment from the specified packet pool. The segment
+ * is initialized with data pointers and lengths set according to the
+ * specified len. All other segment metadata are set to their default values.
+ *
+ * @param pool          Pool handle
+ * @param len           Segment data length
+ *
+ * @return Handle of allocated packet
+ * @retval ODP_PACKET_SEG_INVALID  Segment could not be allocated
+ */
+odp_packet_seg_t odp_packet_seg_alloc(odp_pool_t pool, uint32_t len);

The problem I have with this proposed API is that segments are used to 
represent the physical organization of packets at a platform level, which 
controls contiguous addressability.  This is why the various packet APIs that 
return addresses also return a seg_len output parameter which tells the caller 
how many bytes are contiguously addressable from the returned address. Since 
segment lengths are inherently platform-specific, what happens if the caller 
requests a len that's larger than one of these platform-defined segment 
lengths?  That would seem to get awkward as alloc calls would either fail 
unpredictably from platform to platform or else the implementation would have 
to allocate multiple physical segments and chain them together into a "logical 
segment" that would share the same ODP data type as a physical segment 
(odp_packet_seg_t), which would also be confusing and ambiguous.

We already have seg_len packet pool parameter which (currently) ensures that 
the first segment has at least seg_len bytes. In practice, most HW 
pools/pktios/etc work on scatter-gather lists and fixed sized segments. An 
implementation just needs to choose the segment size during pool initialization.

Valid value for  ‘len’ here is 0 to pool_param.pkt.seg_len (or similar, the 
pool param definition needs also an update). Len is used for initialization of 
the segment data pointer and length.


But a logical segment, as envisioned here is effectively what a packet is, 
which is why I've proposed that composites be packet-based rather than segment 
based.  The difference is the amount of metadata that's carried.  A simple 
solution would be to provide a hint to the implementation that full metadata is 
not required for one of these packets that are intended to be used as composite 
packet components.  I'd rather tackle the metadata management issue separately 
rather than conflate physical and logical segments as that would seem to be 
more portable and more easily implementable across a range of HW-based or 
assisted implementations.


If application pre-pins e.g. a constant tunnel header to a packet,  the tunnel 
header itself is not a packet (alone). It’s just a small segment of data that 
you want to link into the head of an existing packet (without a need to copy 
it). This is what I demonstrated in my example code.

It’s logically wrong and wasteful to call a piece of packet data a full packet. 
For example after the composite call under, a user could access the same packet 
through three different handles. It very easy to mix up which of the three is 
the head, the tail or the one carrying the “valid” metadata. Also explicit 
reference creation is missing. After couple of these composite rounds (while 
the packet travels through the application pipeline), you would have even more 
packet handles around pointing into the same packet.


odp_packet_t odp_packet_push_prefix(odp_packet_t pkt, odp_packet_t prefix);
Usually, HW defines a packet as a (single) packet descriptor structure, which 
includes a scatter-gather list (a table or a linked list) of data segment 
pointers and lengths.

Also, we’ll need to define later on true composites of (full) packets. For 
example, many applications (including OFP) need to store and process locally 
lists of packets (e.g. keep a sorted list of received packets waiting for 
re-assembly, re-transmission, etc).

A packet is a list of data segments with a common metadata record.
A packet-list would be a list of related packets.


Possible solutions to this would be either an additional parameter (e.g., 
odp_packet_alloc(pool, len, options);) or a separate API that also returns an 
odp_packet_t (e.g., odp_packet_alloc_subpacket(pool, len);).  This would also 
have the advantage that per-subpacket metadata can be added back as needed to 
allow for more sophisticated structures over time.

The other issue is that this API is that it introduces the notion of 
odp_packet_seg_t objects that are not associated with any packet since until 
they are attached to a packet they are "floating", which is something new.  
Currently an odp_packet_seg_t is always a part of a packet and simply 
represents how a platform physically organizes an odp_packet_t into memory. 
It's not clear how easy this change would be for many implementations.

This alloc or another (odp_packet_seg_list_alloc(pool, len)) call could also 
return a list of segments if size of one segment is too limiting. In practice, 
HW stores data into segments that are multiples of cache line size (64, 128, 
256, 512, … bytes) and in most of the cases header manipulation involves only 
the first one or two segments. So, zero-copy insertion / deletion of the first 
segment the main use case.


+
+/**
+ * Free segment
+ *
+ * Frees the segment into the pool it was allocated from. Segment must not be
+ * linked into any packet.
+ *
+ * @param seg           Segment handle
+ */
+void odp_packet_seg_free(odp_packet_seg_t seg);

If we use packets for composite parts, then this would just be 
odp_packet_free() for the subpacket.


There are no “sub-packets” on the wire, or in the HW. So, better call packets 
and segments on their common names.



+
+/**
+ * Create a new reference to the segment
+ *
+ * @param seg           Segment handle
+ *
+ * @return New handle which refers to the input segment
+ */
+odp_packet_seg_t odp_packet_seg_ref(odp_packet_seg_t seg);

For packet-based composites this is not needed.

From implementation point of view odp_packet_ref() for a sub-packet is the same 
operation.


+
+/**
  * Segment buffer address
  *
  * Returns start address of the segment.
  *
- * @param pkt  Packet handle
  * @param seg  Segment handle
  *
  * @return  Start address of the segment
@@ -822,28 +864,26 @@ void odp_packet_shaper_len_adjust_set(odp_packet_t pkt, 
int8_t adj);
  *
  * @see odp_packet_seg_buf_len()
  */
-void *odp_packet_seg_buf_addr(odp_packet_t pkt, odp_packet_seg_t seg);
+void *odp_packet_seg_buf_addr(odp_packet_seg_t seg);

Having the two arguments to this call was deemed useful at the time as a 
convenience for implementations since segments were always associated with a 
packet.  Need to check how this complicates existing HW-based implementations 
to separate these two concepts.  Again, aggregating packets would have no such 
issues.

As mentioned above, true packet aggregation (e.g. a list of packets) is needed 
later on. Usage of a packet handle per segment would not be more efficient, it 
would just introduce lots of extra specification work to define when a packet 
is actually a “sub-packet” and what features can be done for a “sub-packet” vs 
a “real-packet”. Better to just use segment type for segments and packet type 
for packets.



 /**
  * Segment buffer length
  *
  * Returns segment buffer length in bytes.
  *
- * @param pkt  Packet handle
  * @param seg  Segment handle
  *
  * @return  Segment buffer length in bytes
  *
  * @see odp_packet_seg_buf_addr()
  */
-uint32_t odp_packet_seg_buf_len(odp_packet_t pkt, odp_packet_seg_t seg);
+uint32_t odp_packet_seg_buf_len(odp_packet_seg_t seg);
Same comment as for odp_packet_seg_buf_addr()

 /**
  * Segment data pointer
  *
  * Returns pointer to the first byte of data in the segment.
  *
- * @param pkt  Packet handle
  * @param seg  Segment handle
  *
  * @return  Pointer to the segment data
@@ -851,21 +891,29 @@ uint32_t odp_packet_seg_buf_len(odp_packet_t pkt, 
odp_packet_seg_t seg);
  *
  * @see odp_packet_seg_data_len()
  */
-void *odp_packet_seg_data(odp_packet_t pkt, odp_packet_seg_t seg);
+void *odp_packet_seg_data(odp_packet_seg_t seg);

Same comment as above.


 /**
  * Segment data length
  *
  * Returns segment data length in bytes.
  *
- * @param pkt  Packet handle
  * @param seg  Segment handle
  *
  * @return  Segment data length in bytes
  *
  * @see odp_packet_seg_data()
  */
-uint32_t odp_packet_seg_data_len(odp_packet_t pkt, odp_packet_seg_t seg);
+uint32_t odp_packet_seg_data_len(odp_packet_seg_t seg);

Same comment as above.

+
+
+void *odp_packet_seg_push_head(odp_packet_seg_t seg, uint32_t len);
+
+void *odp_packet_seg_pull_head(odp_packet_seg_t seg, uint32_t len);
+
+void *odp_packet_seg_push_tail(odp_packet_seg_t seg, uint32_t len);
+
+void *odp_packet_seg_pull_tail(odp_packet_seg_t seg, uint32_t len);

We're now introducing the notion of segment-based headroom and tailroom, which 
is metadata, which is additional overhead for "regular" segments that are not 
part of composites.  Again, much easier to do this by compositing packets 
rather than segments since each part itself has access to the packet APIs, 
which include the notion of headroom and tailroom.

You’d reuse these calls but would need to re-define all other current and 
future (about 50 of them ?) odp_packet_xxx calls (= define on each call what 
happens when called on “main-packet” vs. “sub-packet” vs. “sub-sub-packet”, …).






 /*
@@ -875,6 +923,24 @@ uint32_t odp_packet_seg_data_len(odp_packet_t pkt, 
odp_packet_seg_t seg);
  *
  */

+/**
+ * Insert segment into packet at an offset
+ *
+ * Links a segment into packet. Operation updates data pointers, offsets and
+ * lengths, but not other packet metadata (e.g. L2/L3/L4 offsets and pointers).
+ * The packet is not modified on an error.
+ *
+ * Segment must have been allocated from the same pool as the packet.
+ *
+ * @param pkt     Packet handle
+ * @param offset  Byte offset into the packet
+ * @param seg     Segment to be added into the offset
+ *
+ * @retval 0 on success
+ * @retval !0 on failure
+ */
+int odp_packet_add_seg(odp_packet_t pkt, uint32_t offset,
+                      odp_packet_seg_t seg);

I agree that this is a better approach than the prefix/suffix routines I 
originally proposed.  So perhaps:

odp_packet_t odp_packet_add packet(odp_packet_t pkt, uint32_t offset, 
odp_packet_t subpacket);

Returns a composite handle that results from inserting subpacket into packet at 
offset offset, or ODP_PACKET_INVALID if the operation is unsuccessful.

Presumably there would also be a corresponding odp_packet_rem_packet() to 
reverse this operation (as I assume there would be a symmetric 
odp_packet_rem_seg() call in this nomenclature).


Again multiplication of full packet metadata on each segment would not bring 
implementation benefit, but unused metadata and specification effort to ban 
usage of that unused metadata.

We could also have a segment list version of this, which would allow addition 
of multiple segments at once.
odp_packet_add_seg_list(odp_packet_t pkt, uint32_t offset, odp_packet_seg_t 
seg[], int num_seg);

- Petri




 /**
  * Add data into an offset
--
2.6.3

_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org<mailto:lng-odp@lists.linaro.org>
https://lists.linaro.org/mailman/listinfo/lng-odp

_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp

Reply via email to