On 21 September 2016 at 17:55, Bill Fischofer <bill.fischo...@linaro.org> wrote:
>
>
>
> On Tue, Sep 20, 2016 at 10:16 AM, Christophe Milard 
> <christophe.mil...@linaro.org> wrote:
>>
>>
>>
>> On 20 September 2016 at 16:01, Bill Fischofer <bill.fischo...@linaro.org> 
>> wrote:
>>>
>>>
>>>
>>> On Tue, Sep 20, 2016 at 8:30 AM, Christophe Milard 
>>> <christophe.mil...@linaro.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am here trying to make a summary of what is needed by the driver 
>>>> interface
>>>> regarding odp packet handling. Will serve as the base for the discussions
>>>> at connect. Please read and comment... possibly at connect...
>>>>
>>>> /Christophe
>>>>
>>>> From the driver perspective, the situation is rather simple: what we need 
>>>> is:
>>>>
>>>> /* definition of a packet segment descriptor:
>>>>  * A packet segment is just an area, contiguous in the virtual address space
>>>>  * and contiguous in the physical address space -at least when no iommu is
>>>>  * used, e.g. for virtio-. Probably we want to have physical contiguity in
>>>>  * all cases (to avoid handling different cases to start with), but that
>>>>  * would not take advantage of the remapping that can be done by iommus,
>>>>  * so it can come with a little performance penalty in iommu cases.
>>>
>>>
>>> I thought we had discussed and agreed that ODP would assume it is running 
>>> on a platform with IOMMU capability? Are there any non-IOMMU platforms of 
>>> interest that we need to support? If not, then I see no need to make this 
>>> provision. In ODP we already have an odp_packet_seg_t type that represents 
>>> a portion of an odp_packet_t that can be contiguously addressed.
>>
>>
>> yes. we did. but then the focus changed to virtio. there is no iommu there...
>
>
> I thought virtio is independent of the underlying HW. If we assume the 
> underlying HW has an IOMMU, then virtio should see the benefits of that, no?


I wish you were right, but that is not my understanding: the IOMMU is
a physical device that sits between the IO HW (the PCI bus, assuming
PCI) and physical memory. The virtio communication between a switch in
a host system and ODP running in a guest only involves memory, so the
physical IOMMU is not usable there, as I understand it. There seem to
be ongoing initiatives to emulate an IOMMU in qemu/kvm so that guests
could use the iommu/dma interface in all drivers, but it doesn't look
mature yet.
I'd be happy to hear what virtualization gurus say on this topic, I
have to admit.
But since our focus moved from physical PCI NICs to host-to-guest
access with virtio, my understanding is that we'll have to cope with
guest physical memory, sadly.
>
>
>>
>>
>>>
>>>
>>>>
>>>>  * Segments are shared among all odp threads (including linux processes),
>>>
>>>
>>> Might be more precise to simply say "segments are accessible to all odp 
>>> threads". Sharing implies simultaneous access, along with some notion of 
>>> coherence, which is something that probably isn't needed.
>>
>>
>> Tell me if I am wrong, but the default in ODP is that queue access can be
>> shared between different ODP threads (there is a flag to guarantee
>> 1thread<->1queue access -and hence get a performance benefit-), but as it
>> is now, nothing
>
>
> Yes, queues store events and can be shared among threads, but remember that 
> what's on a queue is an odp_event_t, *not* an address. Events of most 
> interest to drivers are packets, which are of type odp_packet_t and are 
> obtained via the odp_packet_from_event() API. An odp_packet_t cannot be used 
> to access memory since it is not an address but an opaque type. If the driver 
> (or anyone else) needs to address the contents of an odp_packet_t, it calls a 
> data access function like odp_packet_data(), odp_packet_offset(), 
> odp_packet_seg_data(), etc., which returns a void * that can be used for 
> memory access. How this is done is implementation-defined, but the intended 
> user of the returned address is solely the calling thread.


I don't really agree with that: not that I see any error in what you
said, but all of this is valid on the north API. What we decide to show
to a driver on the south API does not need to be the same: the
abstract notion of an event is probably of no interest to a driver, and
even the odp_packet_t does not need to be the same as the odpdrv_packet_t
(or whatever we call it, if there is a need for it). While these 2
objects most likely refer to the same physical memory (at least on
systems where packets live in main CPU memory), many packet manipulation
methods available on the north API are of no interest on the
south interface. Likewise, many methods on the south interface (such
as those splitting a packet into its physically contiguous segments)
have no reason to be known on the north interface.
Is there any reason to abstract the packet type and then use a method
to get an address when ALL drivers are known to want that address?
My point is: the north application interface and the south driver
interface are 2 completely separate interfaces now; we can define
whatever we want on the south interface as long as it eases driver and
ODP implementation and performance. The way packets are presented on
the two interfaces does not need to be the same.
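Just to illustrate what I mean (sketch only: odp_packet_data() and
odp_packet_t are the existing north API names, pkt_sgmt_t is the
descriptor from my original proposal quoted further down, and the helper
names are made up):

/* north side: the application always goes through the opaque handle */
void touch_first_byte(odp_packet_t pkt)
{
        uint8_t *data = odp_packet_data(pkt); /* address valid for this caller */
        data[0] = 0;
}

/* south side: the driver already has the address, physical address and
 * length in the segment descriptor and can post it straight to the HW
 * ring, without any handle lookup */
void post_to_tx_ring(const pkt_sgmt_t *seg, volatile uint64_t *ring_slot)
{
        *ring_slot = seg->phy_addr; /* e.g. writing a DMA descriptor */
}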
>
>
>>
>> prevents thread A from putting something in the TX ring buffer and thread B
>> from freeing the TX'ed data when putting its own stuff in the same TX queue.
>> Same shareability in RX.
>
>
> The only way packets are freed is via the odp_packet_free() API, which takes 
> an odp_packet_t as an argument, not a data address. odp_packet_t's are freely 
> shareable among threads in the same ODP instance. So Thread B's call is 
> independent of what the odp_packet_t represents.


That is true on the north interface: if we intended to use the same
API on the south interface, we should have kept one single interface
for both the application and the drivers (this was rejected after my
first driver attempt). We have decided on 2 distinct interfaces, so
there is no need to define the same API, unless we see a reason to do so.
For drivers it seems reasonable to mostly work with segments of
physically contiguous memory (for non-iommu cases at least). Freeing the
last segment of a packet may be a valid way to free the packet (or
decrease its refcount) on that south interface.
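As a minimal sketch (assuming the odp_private convention from my original
proposal quoted further down: the packet handle is stored in the last
segment's odp_private, NULL in all others, and assuming the handle fits
in a void*, which is an implementation detail):

int pkt_sgmt_free(pkt_sgmt_t *segment)
{
        if (segment->odp_private == NULL)
                return 0; /* not the last segment: nothing to do */

        /* last segment: release the whole ODP packet (or decrement its
         * refcount when packet references exist) */
        odp_packet_free((odp_packet_t)segment->odp_private);
        return 0;
}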
>
>
>>
>> With these ODP assumptions, we have to access the segments from  different 
>> ODP threads. I would be very pleased to be wrong here :-)
>
>
> If you wish to access segments from different threads you do so via the same 
> API calls that any thread uses: odp_packet_data(), etc., which take an 
> odp_packet_t and returns a void * valid for the caller.


Same here: these are north interface assumptions; we do not need to do
that for drivers, which use another interface.
To be honest, I am afraid that going through all this dereferencing,
and getting different addresses (in different VA spaces) in each of the
ODP threads acting as "the driver", would make the driver rather
inefficient.
>
>
>>
>> Maybe I should say that I don't think it is an option to have a context 
>> switch at each driver "access", i.e. I don't see a driver as its own ODP 
>> thread/linux process being accessed by some IPC: For me, any ODPthread 
>> sending/receiving packet will act as a driver (same context).
>
>
> Agreed. The driver effectively runs under the calling thread, either 
> directly, for poll-mode I/O, or indirectly via the scheduler (RX) or traffic 
> manager (TX).


 OK :-)

> All use the same odp_packet_t handles either directly or packaged as 
> odp_event_t's when queues are involved.


No, again: different interface. If drivers have to use the primitives
available on the north interface, why did we decide to define a south
one?
>
>
>
>>
>>
>>>
>>>
>>>>
>>>>  * and are guaranteed to be mapped at the same virtual address space in
>>>>  * all ODP instances (single_va flag in ishm) */
>>>
>>>
>>> Why is this important? How does Thread A know how a segment is accessible 
>>> by Thread B, and does it care?
>>
>>
>> I am afraid it is with regard to my previous answer. If addresses of segment 
>> (and packets) differ from thread to thread, no reference via shared pointer 
>> will be possible between the ODP threads acting as driver =>loss in 
>> efficiency.
>
>
> That may or may not be true, and while it may be part of a given 
> implementation, it's not part of the ODP architecture. We need to make a 
> clear distinction here as we're doing two things. First and foremost, we're 
> defining a southbound API that, like the current northbound API, may support 
> many possible implementations. Second, we're creating a reference 
> implementation of that southbound API in odp-linux. A driver, in this sense, 
> is like an ODP application and should be usable with and portable across any 
> implementation of the defined southbound API, just as ODP applications are 
> portable across implementations of the northbound APIs.


Now we are back to memory and the "shareability" of pointers: points
S19, S20 and S21 of
https://collaborate.linaro.org/pages/viewpage.action?spaceKey=ODP&title=odp_thread+and+shmem+debate...
If I remember right, Bill, you were among the people supporting the
idea that the SINGLE_VA flag of the ishm allocator should be
available on the north interface (meaning that when shared memory is
allocated, one could decide whether the VA returned by the
shm_get_addr() function should be the same in all ODP threads -even
processes)...
I am not sure this was a good idea, as it makes a lot of assumptions
about the implementation to guarantee this "thread behaviour" for
processes. As the application writer can choose between forking
processes or using pthreads, he is given the ability to choose between
shared addresses or not anyway, even if the SINGLE_VA flag is not
available on the north interface.

For drivers, on the other hand, the problem is different: the driver
writer will not be able to assume common virtual addresses for anything
-unless the south interface specification guarantees it- because the
driver writer cannot choose which kind of ODP threads (pthreads or
processes) share the TX or RX interfaces. The application chooses.
If drivers cannot assume any common VA for any of their objects, that
would more or less ban the use of pointers for them, which I assume
would lead to quite a large efficiency loss.
Now regarding the packets themselves: are packet handles scoped by the
ODP thread or by the ODP instance? -S18 in the previous doc- (by that,
I mean: can one allocate a packet in thread A, save its handle in a
file, and from thread B read this handle and free the packet, for
instance?). Jerry and Bala said that ODP handles for certain objects
(packets included) would have the ODP thread as scope.
Now assume the following scenario: thread A sends 3 packets on
interface I. The driver code (running in thread A's context) puts these
3 packets on the TX ring buffer. Now thread B wants to send 1000
packets on the same interface I. Thread B may ignore thread A's entries
to start with and put its packets after those of thread A on the ring
buffer, but after some time (at the latest when the ring buffer wraps
around), thread B will have to remove thread A's items from the ring
buffer and replace them with its own. At this point, thread B has to
release the packets sent by thread A. How could it do that if neither
the packet address nor its saved handle (in some related driver
structure, not necessarily in a file) makes sense in its scope?
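To make the scenario concrete, the reclaim step in the TX path would look
roughly like this (sketch only; tx_slot_t and its fields are made up):

typedef struct {
        pkt_sgmt_t sgmt;  /* copy of the descriptor posted by the sending thread */
        int        in_use;
} tx_slot_t;

/* called by whichever thread reuses the slot, not necessarily the sender */
static void tx_reclaim_slot(tx_slot_t *slot)
{
        if (!slot->in_use)
                return;
        /* this only works if the handle stored behind sgmt.odp_private is
         * still valid in the calling thread, i.e. if handles are scoped to
         * the ODP instance rather than the thread - exactly the open point */
        pkt_sgmt_free(&slot->sgmt);
        slot->in_use = 0;
}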
>
>
>>>
>>>
>>>>
>>>>  * Note that this definition just implies that a packet segment is
>>>>  * reachable by the driver. A segment could actually be part of a HW IO
>>>>  * chip in a HW-accelerated system.
>>>
>>>
>>> I think this is the key. All that (should) be needed is for a driver to be 
>>> able to access any segment that it is working with. How it does so would 
>>> seem to be secondary from an architectural perspective.
>>
>>
>> Sure. but we still have to implement something on linux-generic, and to make 
>> it possible for other to do something good.
>
>
> Agreed, but odp-linux is just an implementation of the API specification. The 
> specification is independent of odp-linux. So while we may well map 
> everything to the same addresses in odp-linux, the architecture does not 
> require that.


Mapping everything at the same address in linux-generic while not
guaranteeing it at the south API specification level is useless for the
driver: the goal is that a given driver (for a given, possibly
simulated, piece of HW) should work on all ODP implementations, so the
driver will have to cope with the least common denominator.
The driver will not be able to take advantage of any linux-gen specific choice.
>
>
>>
>>
>>>
>>>
>>>>
>>>> /* for linux-gen:
>>>>  * Segments are memory areas.
>>>>  * In TX, pkt_sgmt_join() puts the pointer to the odp packet in the
>>>>  * 'odp_private' element of the last segment of each packet, so that
>>>>  * pkt_sgmt_free() can just do nothing when odp_private is NULL and
>>>>  * release the complete odp packet when it is not NULL. Segments allocated
>>>>  * with pkt_sgmt_alloc() will have their odp_private set to NULL. The name
>>>>  * and the 'void*' are there to make that opaque to the driver interface,
>>>>  * which really should not care...
>>>>  * Other ODP implementations could handle that as they wish.
>>>
>>>
>>> Need to elaborate on this. Currently we have an odp_packet_alloc() API that 
>>> allocates a packet that consists of one or more segments. What seems to be 
>>> new from the driver is the ability to allocate (and free) individual 
>>> segments and then (a) assemble them into odp_packet_t objects or (b) remove 
>>> them from odp_packet_t objects so that they become unaffiliated raw 
>>> segments not associated with any odp_packet_t.
>>
>>
>> Yes, a) is definitely needed. We have to be able to allocate segments without
>> telling which ODP packet they refer to, simply because (at least for some
>> NICs) we cannot know at alloc time which packet segment will relate to
>> which packet: if we put 32 x 2K segments in an RX buffer, this can result in
>> one single ODP packet using them all (for a 64K jumbo frame) or 32 separate
>> standard ODP packets. We don't know at alloc time.
>
>
> Agreed. This is one of the key new API areas needed.
>
>>
>> No, b) is not really as you wrote, I think: we have to be able to split the
>> packet to be transmitted into its physically contiguous components (assuming
>> no iommu, e.g. for virtio), but it is fully acceptable not to be able to
>> re-use a TX'd segment, i.e. the segment can die with the ODP packet,
>> probably when the last segment is freed, but not even necessarily: an ODP
>> packet having multiple references to it would keep all its segments valid
>> until the last reference disappears.
>> So calling pkt_sgmt_free() would do nothing except for the last segment of a
>> packet (it is reasonable to assume order, as segments will be TX'd from first
>> to last, I assume). On the last segment, it would decrease the packet ref
>> count and free the ODP packet (and hence all its segments) when the refcount
>> reaches 0.
>> This is a bit asymmetrical, you are right: segments need to be
>> packet-relation free when allocated, but this requirement is not needed in
>> TX, and we can gain performance with this asymmetry, I assume.
>
>
> I think you're right here and my initial generalization is overkill. Once 
> assembled into an odp_packet_t, the individual odp_packet_seg_t's remain with 
> it until the containing packet is freed, or until they are moved to some 
> other packet as part of some other packet-manipulation API like 
> odp_packet_split(), etc. Once assembled, they can no longer become "naked" 
> segments but are always part of a containing odp_packet_t.
>
> I think this is one of the reasons why we probably want to have a separate 
> odp_segment_t type that represents a "naked" segment, while the 
> odp_packet_seg_t represents an odp_segment_t once it has become part of an 
> odp_packet_t. So the relationship is sort of like odp_event_t being a base 
> type and odp_packet_t being a specialization of that event type. An 
> odp_segment_t transforms once it is a part of some larger aggregate, of which 
> odp_packet_t's are the only currently-defined examples. In our original 
> discussions we had the notion that buffers could potentially be segmented as 
> well. If we had need to re-introduce that concept, for example, then an 
> odp_segment_t could similarly transform into an odp_buffer_seg_t, etc.


Once again: south interface => we choose what fits best.
To me:
*In RX: the driver will need to provide an RX function, which needs:
-A function to allocate "packet unrelated" segments.
-A function to assemble a set of these segments into an odpdrv_packet.
-A function to pass the odpdrv_packets to the upper ODP.
 (if these 2 last functions are merged into one "assemble_and_pass()"
function, knowledge of odpdrv_packet may not even be needed)
-A function to release ("packet unrelated") segments, needed at
termination, when closing the interface, to release unused RX
segments.
*In TX: the driver needs to provide a TX function for transmitting a
set of segments (belonging to the same packet), which needs:
-A function to release the transmitted segments.

I am not even sure the drivers need to do much more about packets; a
prototype-only sketch of that minimal set follows.
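Something like this (all odpdrv_* names are placeholders, not proposed
API; pkt_sgmt_t and pkt_sgmt_hint are the types from my original proposal
quoted further down):

/* RX side */
int  odpdrv_sgmt_alloc(uint32_t size, pkt_sgmt_t *sgmt);  /* packet-unrelated segment */
int  odpdrv_sgmt_assemble_and_pass(pkt_sgmt_hint *hint,
                                   pkt_sgmt_t *sgmts, int nb_sgmts);
void odpdrv_sgmt_release(pkt_sgmt_t *sgmt);  /* unused RX segments at interface close */

/* TX side */
int  odpdrv_sgmt_tx_free(pkt_sgmt_t *sgmt);  /* transmitted segment; the last one frees the packet */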

>
>
>>>
>>>
>>> So it seems we need a corresponding set of odp_segment_xxx() APIs that 
>>> operate on a new base type: odp_segment_t. An odp_segment_t becomes an 
>>> odp_packet_seg_t when it (and possibly other segments) are converted into 
>>> an odp_packet_t as part of a packet assembly operation. Conversely, an 
>>> odp_packet_seg_t becomes an odp_segment_t when it is disconnected from an 
>>> odp_packet_t.
>>
>>
>> I don't think we have a need for these 2 types (odp_segment_t and
>> odp_packet_seg_t, depending on packet relationship): I think it is simpler to
>> give only one type to the driver: segment. The driver just sees an alloc()
>> and free() function for those, and a function to tell ODP which segments form
>> a packet at RX time. The fact that there is an asymmetry in the ODP internal
>> implementation of segments does not even need to be visible to the driver, so
>> I don't see the need to expose these 2 types.
>
>
> We definitely need to work through the details on this. My concern is that if 
> we overload an odp_packet_seg_t as being able to be either part of or not 
> part of a packet, then any API that takes an odp_packet_seg_t might need to 
> always add a check (have I mistakenly been passed a "naked" segment?) that 
> would add inefficiency when compared to the compile-time type checking that 
> is implicit in having separate (convertible) types.

Naked segments and packet segments can be 2 different types if we
wish; I don't see any problem with that if we need it. TX and RX
segments will have to be kept separate anyway.
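For instance (hypothetical names, just to show the distinction can be
enforced at compile time):

/* two distinct opaque types, so the compiler rejects a "naked" segment
 * where a packet segment is expected; the conversion would happen inside
 * the join/assemble call */
typedef struct odpdrv_segment_s  *odpdrv_segment_t;   /* not yet part of a packet */
typedef struct odpdrv_pkt_sgmt_s *odpdrv_pkt_sgmt_t;  /* belongs to an ODP packet */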

>
>>
>>
>>>
>>>
>>>>
>>>>  */
>>>>
>>>> typedef uint64_t phy_address_t;
>>>>
>>>> typedef struct{
>>>>         void            *address;
>>>>         phy_address_t   phy_addr;
>>>>         uint32_t        len;
>>>>         void*           odp_private;
>>>> } pkt_sgmt_t;
>>>>
>>>> /* FOR RX: */
>>>> /* segment allocation function:
>>>>  * As it is not possible to guarantee physical memory continuity from
>>>>  * user space, this segment alloc function is best effort:
>>>>  * The size passed as parameter is a hint of the most probable received
>>>>  * packet size: this alloc function will allocate a segment whose size
>>>>  * will be greater than or equal to the required size if the latter can
>>>>  * fit in a single page (or huge page), hence guaranteeing the segment's
>>>>  * physical continuity.
>>>>  * If there is no physical page large enough for 'size' bytes, then
>>>>  * the largest page is returned, meaning that in that case the allocated
>>>>  * segment will be smaller than the required size (the received packet
>>>>  * will be fragmented in this case).
>>>>  * This pkt_sgmt_alloc function is called by the driver RX side to populate
>>>>  * the NIC RX ring buffer(s).
>>>>  * Returns the number of allocated segments (1) on success or 0 on error.
>>>>  * Note: on unix systems with 2K and 2M pages, this means that 2M will get
>>>>  * allocated for each large (64K?) packet... too much waste? Should we
>>>>  * handle page fragmentation (which would really not change this interface)?
>>>>  */
>>>> int pkt_sgmt_alloc(uint32_t size, pkt_sgmt_t *returned_sgmt);
>>>
>>>
>>> Presumably the driver is maintaining rings of odp_segment_t's that it has 
>>> allocated. Currently allocations are from ODP pools that are of various 
>>> pool types. So one question is do we define a new pool type 
>>> (ODP_POOL_SEGMENT) of this existing odp_pool_t or do we introduce a 
>>> separate odp_segment_pool_t type to represent an aggregate area of memory 
>>> from which odp_segment_t's are drawn?
>>
>>
>> If I remember right, a pktio is given an RX pool at creation time (or maybe
>> this is now a per-queue thing, I have to check). But anyway, it makes indeed
>> sense that pkt_sgmt_alloc() would take its segments from the memory
>> associated with the RX queue.
>
>
> An odp_pool_t is passed as a parameter to odp_pktio_open(). I think we just 
> need to add a new odp_segment_alloc() API that takes a compatible odp_pool_t 
> (of type ODP_POOL_PACKET) and returns an odp_segment_t that can later be 
> assembled into an odp_packet_t.

Yes, I think we have relatively close opinions on that :-)

>
>>
>>
>>>
>>> Alternately, do we provide an extended set of semantics on the current 
>>> ODP_POOL_PACKET pools? That might be cleaner given how odp_segment_t 
>>> objects need to be convertible to and from odp_packet_seg_t objects. This 
>>> means that either (a) applications must pass an odp_pool_t to the driver as 
>>> part of odp_pktio_open() processing, or (b) the driver itself calls 
>>> odp_pool_create() and communicates that back to the application.
>>
>>
>> I thought that was already the case? Where are packets received on a pktio
>> put today? Doesn't this relationship exist already today?
>
>
> Yes, see my above comment.
>
>>
>>
>>>
>>>
>>>>
>>>>
>>>> /*
>>>>  * another variant of the above function could be:
>>>>  * returns the number of allocated segments on success or 0 on error.
>>>>  */
>>>> int pkt_sgmt_alloc_multi(uint32_t size, pkt_sgmt_t *returned_sgmts,
>>>>                          int* nb_sgmts);
>>>>
>>>> /*
>>>>  * creating ODP packets from the segments:
>>>>  * Once a series of segments belonging to a single received packet is
>>>>  * fully received (note that this series can be of length 1 if the received
>>>>  * packet fitted in a single segment), we need a function to create the
>>>>  * ODP packet from the list of segments.
>>>>  * We first define the "pkt_sgmt_hint" structure, which can be used by
>>>>  * a NIC to pass information about the received packet (the HW probably
>>>>  * knows a lot about the received packet, so the SW does not necessarily
>>>>  * need to reparse it: the hint struct contains info which is already known
>>>>  * by the HW). If hint is NULL when calling pkt_sgmt_join(), then the SW has
>>>>  * to reparse the received packet from scratch.
>>>>  * pkt_sgmt_join() returns 0 on success.
>>>>  */
>>>> typedef struct {
>>>>         /* ethtype, crc_ok, L2 and L3 offset, ip_crc_ok, ... */
>>>> } pkt_sgmt_hint;
>>>
>>>
>>> I think the main point here is that the driver may have information 
>>> relating to the received packet that should populate the packet metadata as 
>>> part of creating the odp_packet_t from the individual odp_segment_t's that 
>>> contain the packet data. At a gross level we'd expect this sort of 
>>> processing:
>>>
>>> odp_packet_t driver_rx() {
>>>          ...receive a packet into a collection of odp_segment_t's
>>>          return odp_packet_assemble(seg_list, seg_count, ...);
>>> }
>>>
>>> Where odp_packet_assemble() would have the signature:
>>>
>>> odp_packet_t odp_packet_assemble(odp_segment_t seg_list[], int seg_count, 
>>> ...);
>>>
>>> This would be a new API that, unlike odp_packet_alloc(), takes an existing 
>>> list of odp_segment_t's and associated info to return an odp_packet_t.
>>>
>>> The driver_rx() interface would be called by the scheduler as it polls the 
>>> pktio and is then processed by issuing the call:
>>>
>>> odp_packet_classify(pkt, 0, ODP_PARSE_L2, ODP_NO_REBUFFER);
>>>
>>> where this API is as proposed in the pipeline design doc:
>>>
>>> int odp_packet_classify(odp_packet_t pkt, int offset,
>>>                         enum odp_packet_layer,
>>>                         int rebuffer);
>>>
>>> This call releases the packet from the scheduler by populating the packet 
>>> metadata and enqueuing it on an appropriate (application) receive queue 
>>> according to its CoS.
>>
>>
>> I think we seem to agree here. This is just making it 2 function calls
>> instead of one. I'd like to keep the notion of odp_packet out of the driver's
>> knowledge though, as all these are south interface functions (actually I
>> should have prefixed all my symbols with odpdrv_* to make that clear). But
>> even if the driver should not know about the odp_packets (and all the stuff
>> an application using the north interface can do), it will be given the
>> possibility to tell which segments form a packet and to set some of the
>> packet attributes. That can be done by 2 separate function calls of course...
>
>
> Yes, just need to work out the details, but the driver's job should be simply 
> to produce/consume odp_packet_t's to/from the rest of ODP while operating the 
> NIC.
>
>>
>>
>>>
>>>
>>>>
>>>>
>>>> int pkt_sgmt_join(pkt_sgmt_hint *hint,
>>>>                   pkt_sgmt_t *segments, int nb_segments,
>>>>                   odp_packet_t *returned_packet);
>>>>
>>>> /* another variant of the above, directly passing the packet to a given
>>>> queue */
>>>> int pkt_sgmt_join_and_send(pkt_sgmt_hint *hint,
>>>>                            pkt_sgmt_t *segments, int nb_segments,
>>>>                            odp_queue_t *dest_queue);
>>>
>>>
>>> As noted above, we probably don't want the driver itself concerned with 
>>> destination queues since that's really the job of the classifier.
>>
>>
>> This is actually a tricky thing, as part (all?) of the classification can be
>> done by most NIC HW... To start with, I'll need a place to put the packets...
>> maybe the entrance of a SW classifier to start with... We'll need to
>> discuss that... But I am afraid we are touching the tip of an uncomfortable
>> iceberg here, because I assume people will be willing to use their HW
>> (NIC) ability to perform classification, so eventually the classification
>> might go (at least partly) down to the driver...
>
>
> What NICs do is things like choosing which ring to place an arriving packet on 
> (RSS) or a few fancier things like header splitting (where individual packets 
> are stored in two different rings). But packet rings are not an ODP concept, 
> so the driver still needs to produce an odp_packet_t as the end product.

No. odp_packet_t is not a south interface concept. We may have an
odpdrv_packet_t, which may just be a cast of the former, even if I
don't really think this is best: packet fragmentation/reassembly will
need to be performed by all drivers, so maybe these should remain ODP
internal functions, and the driver interface should stay focused on
segments.

> Things like which ring it arrived on is the sort of metadata hint that the 
> classifier might use in shortcutting CoS selection, but this is still a 
> classification function even if some of the preprocessing is done by the NIC. 
> That's why odp_packet_classify() might take some extra hint arguments to 
> capture this work.
>
>>
>>
>>>
>>>
>>>>
>>>>
>>>>
>>>> /* FOR TX: */
>>>> /*
>>>>  * Function returning a list of segments making an odp_packet:
>>>>  * return the number of segments or 0 on error:
>>>>  * The segments are returned in the segments[] array, whose length will
>>>>  * never exceed max_nb_segments.
>>>>  */
>>>> int pkt_sgmt_get(odp_packet_t packet, pkt_sgmt_t *segments,
>>>>                  int max_nb_segments);
>>>>
>>>> /*
>>>>  * "free" a segment
>>>>  */
>>>> /*
>>>>  * For linux-generic, that would just do nothing, unless
>>>>  * segment->odp_private is not NULL, in which case the whole ODP packet
>>>>  * is freed.
>>>>  */
>>>> int pkt_sgmt_free(pkt_sgmt_t *segment);
>>>> int pkt_sgmt_free_multi(pkt_sgmt_t *segments, int nb_segments);
>>>>
>>>
>>> We already have APIs for getting and walking the list of odp_packet_seg_t's 
>>> associated with a packet. An API that would do a bulk "disassemble" of an 
>>> odp_packet_t into individual odp_segment_t's would be reasonable here. We 
>>> do need to be careful about freeing, however, as this is going to intersect 
>>> with whatever we do with packet references.
>>
>>
>> I guess my comments above clarify my view on that: first, this new set of
>> functions belongs to the south interface and is unrelated to the north
>> interface functions (but they could of course share as much as they want
>> implementation-wise).
>> Hopefully, the asymmetry I discussed above clarifies my views regarding
>> freeing the segments.
>
>
> Yes, based on earlier discussion I think the TX side is simpler here since 
> there's no need to disassemble a packet as part of TX processing.
>
>>
>>
>>>
>>> Something to explore more fully.
>>>
>> Yes :-)
>>
>> Thanks for your comments. Hope my answers make sense.
>>
>> Christophe
>
>

At least, we won't get bored at LAS :-)

Thanks for your comments,

Christophe.
