Re: [lng-odp] RFC: packet interface to drivers

2016-09-22 Thread Christophe Milard
On 21 September 2016 at 17:55, Bill Fischofer  wrote:
>
>
>
> On Tue, Sep 20, 2016 at 10:16 AM, Christophe Milard 
>  wrote:
>>
>>
>>
>> On 20 September 2016 at 16:01, Bill Fischofer  
>> wrote:
>>>
>>>
>>>
>>> On Tue, Sep 20, 2016 at 8:30 AM, Christophe Milard 
>>>  wrote:

 Hi,

 I am here trying to make a summary of what is needed by the driver 
 interface
 regarding odp packet handling. Will serve as the base for the discussions
 at connect. Please read and comment... possibly at connect...

 /Christophe

 From the driver perspective, the situation is rather simple: what we need 
 is:

 /* definition of a packet segment descriptor:
  * A packet segment is just an area, contiguous in virtual address space,
  * and contiguous in the physical address space (at least when no iommu is
  * used, e.g. for virtio). Probably we want to have physical contiguity in
  * all cases (to avoid handling different cases to start with), but that
  * would not take advantage of the remapping that can be done by iommus,
  * so it can come with a small performance penalty for iommu cases.
>>>
>>>
>>> I thought we had discussed and agreed that ODP would assume it is running 
>>> on a platform with IOMMU capability? Are there any non-IOMMU platforms of 
>>> interest that we need to support? If not, then I see no need to make this 
>>> provision. In ODP we already have an odp_packet_seg_t type that represents 
>>> a portion of an odp_packet_t that can be contiguously addressed.
>>
>>
>> yes. we did. but then the focus changed to virtio. there is no iommu there...
>
>
> I thought virtio is independent of the underlying HW. If we assume the 
> underlying HW has an IOMMU, then virtio should see the benefits of that, no?


I wish you were right, but this is not my understanding: The iommu is
a physical device that sits between the IO HW (the pci bus assuming
pci) and the physical memory. The virtio communication between a
switch in a host system and odp running in a guest only involves
memory. The physical iommu is not usable there, as I understand. It
seems there are ongoing initiatives to emulate an iommu in qemu/kvm so
that guests could use the iommu/dma interface in all drivers, but it
doesn't seem mature yet.
I'd be happy to hear what virtualization gurus say on this topic, I
have to admit.
But since our focus moved from physical pci NICs to host-to-guest
access with virtio, my understanding is that we'll have to cope with
guest physical memory. Sadly.
>
>
>>
>>
>>>
>>>

  * Segments are shared among all odp threads (including linux processes),
>>>
>>>
>>> Might be more precise to simply say "segments are accessible to all odp 
>>> threads". Sharing implies simultaneous access, along with some notion of 
>>> coherence, which is something that probably isn't needed.
>>
>>
>> Tell me if I am wrong, but the default in ODP is that access to a queue can be
>> shared between different ODP threads (there is a flag to guarantee
>> 1thread<->1queue access, and hence a performance benefit), but as it
>> is now, nothing
>
>
> Yes, queues store events and can be shared among threads, but remember that 
> what's on a queue is an odp_event_t, *not* an address. Events of most 
> interest to drivers are packets, which are of type odp_packet_t and are 
> obtained via the odp_packet_from_event() API. An odp_packet_t cannot be used 
> to access memory since it is not an address but an opaque type. If the driver 
> (or anyone else) needs to address the contents of an odp_packet_t, it calls a 
> data access function like odp_packet_data(), odp_packet_offset(), 
> odp_packet_seg_data(), etc., which returns a void * that can be used for 
> memory access. How this is done is implementation-defined, but the intended 
> user of the returned address is solely the calling thread.


I don't really agree with that: not that I see any error in what you
said, but all this is valid on the north API; what we decide to show
to a driver on the south API does not need to be the same. The
abstract notion of event is probably of no interest to a driver, and
even the odp_packet_t doesn't need to be the same as the
odpdrv_packet_t (or whatever we call it, if there is a need for it).
While these two objects most likely refer to the same physical memory
(at least on systems where packets are in main CPU memory), many
packet manipulation methods available on the north API are of no
interest on the south interface. Likewise, many methods on the south
interface (such as those splitting the packet into its physically
contiguous segments) have no reason to be known on the north
interface.
Is there any reason to abstract the packet type and then use a method
to get an address when ALL drivers are known to want this address?
My point is: the north application 

Re: [lng-odp] RFC: packet interface to drivers

2016-09-21 Thread Bill Fischofer
On Tue, Sep 20, 2016 at 10:16 AM, Christophe Milard <
christophe.mil...@linaro.org> wrote:

>
>
> On 20 September 2016 at 16:01, Bill Fischofer 
> wrote:
>
>>
>>
>> On Tue, Sep 20, 2016 at 8:30 AM, Christophe Milard <
>> christophe.mil...@linaro.org> wrote:
>>
>>> Hi,
>>>
>>> I am here trying to make a summary of what is needed by the driver
>>> interface
>>> regarding odp packet handling. Will serve as the base for the discussions
>>> at connect. Please read and comment... possibly at connect...
>>>
>>> /Christophe
>>>
>>> From the driver perspective, the situation is rather simple: what we
>>> need is:
>>>
>>> /* definition of a packet segment descriptor:
>>>  * A packet segment is just an area, contiguous in virtual address space,
>>>  * and contiguous in the physical address space (at least when no iommu is
>>>  * used, e.g. for virtio). Probably we want to have physical contiguity in
>>>  * all cases (to avoid handling different cases to start with), but that
>>>  * would not take advantage of the remapping that can be done by iommus,
>>>  * so it can come with a small performance penalty for iommu cases.
>>>
>>
>> I thought we had discussed and agreed that ODP would assume it is running
>> on a platform with IOMMU capability? Are there any non-IOMMU platforms of
>> interest that we need to support? If not, then I see no need to make this
>> provision. In ODP we already have an odp_packet_seg_t type that represents
>> a portion of an odp_packet_t that can be contiguously addressed.
>>
>
> yes. we did. but then the focus changed to virtio. there is no iommu
> there...
>

I thought virtio is independent of the underlying HW. If we assume the
underlying HW has an IOMMU, then virtio should see the benefits of that, no?


>
>
>>
>>
>>>  * Segments are shared among all odp threads (including linux processes),
>>>
>>
>> Might be more precise to simply say "segments are accessible to all odp
>> threads". Sharing implies simultaneous access, along with some notion of
>> coherence, which is something that probably isn't needed.
>>
>
> Tell me if I am wrong, but the default in ODP is that access to a queue can
> be shared between different ODP threads (there is a flag to guarantee
> 1thread<->1queue access, and hence a performance benefit), but as it
> is now, nothing
>

Yes, queues store events and can be shared among threads, but remember that
what's on a queue is an odp_event_t, *not* an address. Events of most
interest to drivers are packets, which are of type odp_packet_t and are
obtained via the odp_packet_from_event() API. An odp_packet_t cannot be
used to access memory since it is not an address but an opaque type. If the
driver (or anyone else) needs to address the contents of an odp_packet_t,
it calls a data access function like odp_packet_data(),
odp_packet_offset(), odp_packet_seg_data(), etc., which returns a void *
that can be used for memory access. How this is done is
implementation-defined, but the intended user of the returned address is
solely the calling thread.
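
For illustration, a minimal sketch of the access pattern described above,
using only documented ODP APIs; the function name and the per-segment walk
are illustrative additions, not part of the proposal or of any existing
driver code:

#include <odp_api.h>

/* Walk the segments of a packet from the calling thread.
 * Each returned address is valid for the calling thread only. */
static void walk_packet_segments(odp_packet_t pkt)
{
        odp_packet_seg_t seg = odp_packet_first_seg(pkt);

        while (seg != ODP_PACKET_SEG_INVALID) {
                void *data = odp_packet_seg_data(pkt, seg);
                uint32_t len = odp_packet_seg_data_len(pkt, seg);

                /* a driver would hand (data, len) to the HW here */
                (void)data;
                (void)len;

                seg = odp_packet_next_seg(pkt, seg);
        }
}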


> prevents thread A from putting something in the TX ring buffer and thread B
> from freeing the TX'ed data when putting its own stuff in the same TX queue.
> Same shareability in RX.
>

The only way packets are freed is via the odp_packet_free() API, which
takes an odp_packet_t as an argument, not a data address. odp_packet_t's
are freely shareable among threads in the same ODP instance. So Thread B's
call is independent of what the odp_packet_t represents.


> With these ODP assumptions, we have to access the segments from different
> ODP threads. I would be very pleased to be wrong here :-)
>

If you wish to access segments from different threads you do so via the
same API calls that any thread uses: odp_packet_data(), etc., which take an
odp_packet_t and return a void * valid for the caller.


> Maybe I should say that I don't think it is an option to have a context
> switch at each driver "access", i.e. I don't see a driver as its own ODP
> thread/linux process being accessed by some IPC: for me, any ODP thread
> sending/receiving packets will act as a driver (same context).
>

Agreed. The driver effectively runs under the calling thread, either
directly, for poll-mode I/O, or indirectly via the scheduler (RX) or
traffic manager (TX). All use the same odp_packet_t handles either directly
or packaged as odp_event_t's when queues are involved.


>
>
>>
>>
>>>  * and are guaranteed to be mapped at the same virtual address space in
>>>  * all ODP instances (single_va flag in ishm) */
>>>
>>
>> Why is this important? How does Thread A know how a segment is accessible
>> by Thread B, and does it care?
>>
>
> I am afraid it is with regard to my previous answer. If addresses of
> segments (and packets) differ from thread to thread, no reference via shared
> pointers will be possible between the ODP threads acting as drivers => loss
> in efficiency.
>

That may or may not be true, and while it may be part of a given

Re: [lng-odp] RFC: packet interface to drivers

2016-09-20 Thread Christophe Milard
On 20 September 2016 at 16:01, Bill Fischofer 
wrote:

>
>
> On Tue, Sep 20, 2016 at 8:30 AM, Christophe Milard <
> christophe.mil...@linaro.org> wrote:
>
>> Hi,
>>
>> I am here trying to make a summary of what is needed by the driver
>> interface
>> regarding odp packet handling. Will serve as the base for the discussions
>> at connect. Please read and comment... possibly at connect...
>>
>> /Christophe
>>
>> From the driver perspective, the situation is rather simple: what we need
>> is:
>>
>> /* definition of a packet segment descriptor:
>>  * A packet segment is just an area, contiguous in virtual address space,
>>  * and contiguous in the physical address space (at least when no iommu is
>>  * used, e.g. for virtio). Probably we want to have physical contiguity in
>>  * all cases (to avoid handling different cases to start with), but that
>>  * would not take advantage of the remapping that can be done by iommus,
>>  * so it can come with a small performance penalty for iommu cases.
>>
>
> I thought we had discussed and agreed that ODP would assume it is running
> on a platform with IOMMU capability? Are there any non-IOMMU platforms of
> interest that we need to support? If not, then I see no need to make this
> provision. In ODP we already have an odp_packet_seg_t type that represents
> a portion of an odp_packet_t that can be contiguously addressed.
>

yes. we did. but then the focus changed to virtio. there is no iommu
there...


>
>
>>  * Segments are shared among all odp threads (including linux processes),
>>
>
> Might be more precise to simply say "segments are accessible to all odp
> threads". Sharing implies simultaneous access, along with some notion of
> coherence, which is something that probably isn't needed.
>

Tell me if I am wrong, but the default in ODP is that access to a queue can be
shared between different ODP threads (there is a flag to guarantee
1thread<->1queue access, and hence a performance benefit), but as it is now,
nothing prevents thread A from putting something in the TX ring buffer and
thread B from freeing the TX'ed data when putting its own stuff in the same TX
queue. Same shareability in RX.
With these ODP assumptions, we have to access the segments from different
ODP threads. I would be very pleased to be wrong here :-)
Maybe I should say that I don't think it is an option to have a context
switch at each driver "access", i.e. I don't see a driver as its own ODP
thread/linux process being accessed by some IPC: for me, any ODP thread
sending/receiving packets will act as a driver (same context).


>
>
>>  * and are guaranteed to be mapped at the same virtual address space in
>>  * all ODP instances (single_va flag in ishm) */
>>
>
> Why is this important? How does Thread A know how a segment is accessible
> by Thread B, and does it care?
>

I am afraid it is with regard to my previous answer. If addresses of
segments (and packets) differ from thread to thread, no reference via shared
pointers will be possible between the ODP threads acting as drivers => loss
in efficiency.

>
>
>>  * Note that this definition just implies that a packet segment is reachable
>>  * by the driver. A segment could actually be part of a HW IO chip in a
>>  * HW-accelerated system.
>>
>
> I think this is the key. All that (should) be needed is for a driver to be
> able to access any segment that it is working with. How it does so would
> seem to be secondary from an architectural perspective.
>

Sure, but we still have to implement something in linux-generic, and to
make it possible for others to do something good.


>
>
>> /* for linux-gen:
>>  * Segments are memory areas.
>>  * In TX, pkt_sgmt_join() puts the pointer to the odp packet in the
>>  * 'odp_private' element of the last segment of each packet, so that
>>  * pkt_sgmt_free() can just do nothing when odp_private is NULL and
>>  * release the complete odp packet when not NULL. Segments allocated with
>>  * pkt_sgmt_alloc() will have their odp_private set to NULL. The name and
>>  * the 'void*' are there to make that opaque to the driver interface,
>>  * which really should not care...
>>  * Other ODP implementations could handle that as they wish.
>>
>
> Need to elaborate on this. Currently we have an odp_packet_alloc() API
> that allocates a packet that consists of one or more segments. What seems
> to be new from the driver is the ability to allocate (and free) individual
> segments and then (a) assemble them into odp_packet_t objects or (b) remove
> them from odp_packet_t objects so that they become unaffiliated raw
> segments not associated with any odp_packet_t.
>

Yes, a) is definitely needed. We have to be able to allocate segments
without telling which ODP packet they refer to, simply because we cannot
know at alloc time (at least for some NICs) which segment will relate to
which packet: if we put 32 x 2K segments in a RX buffer, this can
result in one single ODP packet using them all (for a 64K jumbo frame) or

Re: [lng-odp] RFC: packet interface to drivers

2016-09-20 Thread Bill Fischofer
On Tue, Sep 20, 2016 at 8:30 AM, Christophe Milard <
christophe.mil...@linaro.org> wrote:

> Hi,
>
> I am here trying to make a summary of what is needed by the driver
> interface
> regarding odp packet handling. Will serve as the base for the discussions
> at connect. Please read and comment... possibly at connect...
>
> /Christophe
>
> From the driver perspective, the situation is rather simple: what we need
> is:
>
> /* definition of a packet segment descriptor:
>  * A packet segment is just an area, contiguous in virtual address space,
>  * and contiguous in the physical address space (at least when no iommu is
>  * used, e.g. for virtio). Probably we want to have physical contiguity in
>  * all cases (to avoid handling different cases to start with), but that
>  * would not take advantage of the remapping that can be done by iommus,
>  * so it can come with a small performance penalty for iommu cases.
>

I thought we had discussed and agreed that ODP would assume it is running
on a platform with IOMMU capability? Are there any non-IOMMU platforms of
interest that we need to support? If not, then I see no need to make this
provision. In ODP we already have an odp_packet_seg_t type that represents
a portion of an odp_packet_t that can be contiguously addressed.


>  * Segments are shared among all odp threads (including linux processes),
>

Might be more precise to simply say "segments are accessible to all odp
threads". Sharing implies simultaneous access, along with some notion of
coherence, which is something that probably isn't needed.


>  * and are guaranteed to be mapped at the same virtual address space in
>  * all ODP instances (single_va flag in ishm) */
>

Why is this important? How does Thread A know how a segment is accessible
by Thread B, and does it care?


>  * Note that this definition just implies that a packet segment is reachable
>  * by the driver. A segment could actually be part of a HW IO chip in a
>  * HW-accelerated system.
>

I think this is the key. All that (should) be needed is for a driver to be
able to access any segment that it is working with. How it does so would
seem to be secondary from an architectural perspective.


> /* for linux-gen:
>  * Segments are memory areas.
>  * In TX, pkt_sgmt_join() puts the pointer to the odp packet in the
>  * 'odp_private' element of the last segment of each packet, so that
>  * pkt_sgmt_free() can just do nothing when odp_private is NULL and
>  * release the complete odp packet when not NULL. Segments allocated with
>  * pkt_sgmt_alloc() will have their odp_private set to NULL. The name and
>  * the 'void*' are there to make that opaque to the driver interface,
>  * which really should not care...
>  * Other ODP implementations could handle that as they wish.
>

Need to elaborate on this. Currently we have an odp_packet_alloc() API that
allocates a packet that consists of one or more segments. What seems to be
new from the driver is the ability to allocate (and free) individual
segments and then (a) assemble them into odp_packet_t objects or (b) remove
them from odp_packet_t objects so that they become unaffiliated raw
segments not associated with any odp_packet_t.

So it seems we need a corresponding set of odp_segment_xxx() APIs that
operate on a new base type: odp_segment_t. An odp_segment_t becomes an
odp_packet_seg_t when it (and possibly other segments) are converted into
an odp_packet_t as part of a packet assembly operation. Conversely, an
odp_packet_seg_t becomes an odp_segment_t when it is disconnected from an
odp_packet_t.
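
For illustration, such a set of APIs might look like the following
declarations. These are hypothetical: none of them exist in ODP today, and
the names simply follow the odp_segment_t idea sketched above.

/* Hypothetical declarations only (not an existing ODP API) */
typedef struct odp_segment_s *odp_segment_t;  /* raw, unaffiliated segment */

/* allocate / free raw segments from a pool */
odp_segment_t odp_segment_alloc(odp_pool_t pool, uint32_t len);
void odp_segment_free(odp_segment_t seg);

/* (a) assemble raw segments into a packet; they become odp_packet_seg_t's */
odp_packet_t odp_packet_assemble(odp_pool_t pool, odp_segment_t segs[], int num);

/* (b) disconnect a segment from a packet; it becomes an odp_segment_t again */
odp_segment_t odp_packet_seg_disassemble(odp_packet_t pkt, odp_packet_seg_t seg);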


>  */
>
> typedef uint64_t phy_address_t;
>
> typedef struct {
>     void          *address;
>     phy_address_t  phy_addr;
>     uint32_t       len;
>     void          *odp_private;
> } pkt_sgmt_t;
>
> /* FOR RX: */
> /* segment allocation function:
>  * As it is not possible to guarantee physical memory contiguity from
>  * user space, this segment alloc function is best effort:
>  * The size passed as a parameter is a hint of what the most probable
>  * received packet size could be: this alloc function will allocate a
>  * segment whose size will be greater than or equal to the required size if
>  * the latter can fit in a single page (or huge page), hence guaranteeing
>  * the segment's physical contiguity.
>  * If there is no physical page large enough for 'size' bytes, then
>  * the largest page is returned, meaning that in that case the allocated
>  * segment will be smaller than the required size (the received packet
>  * will be fragmented in this case).
>  * This pkt_sgmt_alloc function is called by the driver RX side to populate
>  * the NIC RX ring buffer(s).
>  * Returns the number of allocated segments (1) on success or 0 on error.
>  * Note: on unix systems with 2K and 2M pages, this means that 2M will get
>  * allocated for each large (64K?) packet... too much waste? Should we
>  * handle page fragmentation (which would really not change this interface)?
>  */
> int 

[lng-odp] RFC: packet interface to drivers

2016-09-20 Thread Christophe Milard
Hi,

I am here trying to make a summary of what is needed by the driver interface
regarding odp packet handling. Will serve as the base for the discussions
at connect. Please read and comment... possibly at connect...

/Christophe

From the driver perspective, the situation is rather simple: what we need is:

/* definition of a packet segment descriptor:
 * A packet segment is just an area, contiguous in virtual address space,
 * and contiguous in the physical address space (at least when no iommu is
 * used, e.g. for virtio). Probably we want to have physical contiguity in
 * all cases (to avoid handling different cases to start with), but that
 * would not take advantage of the remapping that can be done by iommus,
 * so it can come with a small performance penalty for iommu cases.
 * Segments are shared among all odp threads (including linux processes),
 * and are guaranteed to be mapped at the same virtual address space in
 * all ODP instances (single_va flag in ishm).
 * Note that this definition just implies that a packet segment is reachable
 * by the driver. A segment could actually be part of a HW IO chip in a
 * HW-accelerated system. */
/* for linux-gen:
 * Segments are memory areas.
 * In TX, pkt_sgmt_join() puts the pointer to the odp packet in the 'odp_private'
 * element of the last segment of each packet, so that pkt_sgmt_free()
 * can just do nothing when odp_private is NULL and release the complete
 * odp packet when not NULL. Segments allocated with pkt_sgmt_alloc()
 * will have their odp_private set to NULL. The name and the 'void*' are there
 * to make that opaque to the driver interface, which really should not care...
 * Other ODP implementations could handle that as they wish.
 */

typedef uint64_t phy_address_t;

typedef struct {
    void          *address;
    phy_address_t  phy_addr;
    uint32_t       len;
    void          *odp_private;
} pkt_sgmt_t;
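
As an illustration of the 'odp_private' convention described above, a possible
linux-generic pkt_sgmt_free() could look like the sketch below. pkt_sgmt_free()
is only named, not defined, in this RFC, and the cast assumes linux-generic's
pointer-based odp_packet_t handle:

/* Illustrative sketch only; follows the odp_private rules stated above */
static inline void pkt_sgmt_free(pkt_sgmt_t *sgmt)
{
        /* NULL for segments from pkt_sgmt_alloc() and for all but the last
         * segment of a TX packet: nothing to do */
        if (sgmt->odp_private == NULL)
                return;

        /* last segment of a TX'ed packet: release the whole odp packet */
        odp_packet_free((odp_packet_t)sgmt->odp_private);
}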

/* FOR RX: */
/* segment allocation function:
 * As it is not possible to guarantee physical memory contiguity from
 * user space, this segment alloc function is best effort:
 * The size passed as a parameter is a hint of what the most probable received
 * packet size could be: this alloc function will allocate a segment whose size
 * will be greater than or equal to the required size if the latter can fit in
 * a single page (or huge page), hence guaranteeing the segment's physical
 * contiguity.
 * If there is no physical page large enough for 'size' bytes, then
 * the largest page is returned, meaning that in that case the allocated
 * segment will be smaller than the required size (the received packet
 * will be fragmented in this case).
 * This pkt_sgmt_alloc function is called by the driver RX side to populate
 * the NIC RX ring buffer(s).
 * Returns the number of allocated segments (1) on success or 0 on error.
 * Note: on unix systems with 2K and 2M pages, this means that 2M will get
 * allocated for each large (64K?) packet... too much waste? Should we handle
 * page fragmentation (which would really not change this interface)?
 */
int pkt_sgmt_alloc(uint32_t size, pkt_sgmt_t *returned_sgmt);

/*
 * another variant of the above function could be:
 * returns the number of allocated segments on success or 0 on error.
 */
int pkt_sgmt_alloc_multi(uint32_t size, pkt_sgmt_t *returned_sgmts,
                         int *nb_sgmts);

/*
 * creating ODP packets from the segments:
 * Once a series of segments belonging to a single received packet is
 * fully received (note that this series can be of length 1 if the received
 * packet fitted in a single segment), we need a function to create the
 * ODP packet from the list of segments.
 * We first define the "pkt_sgmt_hint" structure, which can be used by
 * a NIC to pass information about the received packet (the HW probably
 * knows a lot about the received packet so the SW does not necessarily
 * need to reparse it: the hint struct contains info which is already known
 * by the HW). If hint is NULL when calling pkt_sgmt_join(), then the SW has
 * to reparse the received packet from scratch.
 * pkt_sgmt_join() returns 0 on success.
 */
typedef struct {
/* ethtype, crc_ok, L2 and L3 offset, ip_crc_ok, ... */
} pkt_sgmt_hint;

int pkt_sgmt_join(pkt_sgmt_hint *hint,
                  pkt_sgmt_t *segments, int nb_segments,
                  odp_packet_t *returned_packet);

/* another variant of the above, directly passing the packet to a given queue */
int pkt_sgmt_join_and_send(pkt_sgmt_hint *hint,
                           pkt_sgmt_t *segments, int nb_segments,
                           odp_queue_t *dest_queue);
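
To make the intended RX flow concrete, here is a usage sketch of the proposed
interface. nic_rx_ring_has_room(), nic_rx_ring_post() and nic_rx_collect() are
invented placeholders for whatever a real NIC driver uses to manage its RX
descriptor ring, and the 2K segment size and batch of 32 are arbitrary:

/* Usage sketch of the proposed RX interface (illustration only) */
void driver_rx_poll_example(odp_queue_t dest_queue)
{
        pkt_sgmt_t sgmt;
        pkt_sgmt_t done[32];
        pkt_sgmt_hint hint;
        int num;

        /* keep the NIC RX ring populated with best-effort 2K segments */
        while (nic_rx_ring_has_room()) {
                if (pkt_sgmt_alloc(2048, &sgmt) != 1)
                        break;
                nic_rx_ring_post(&sgmt);
        }

        /* once all segments of a packet have been received, join them into
         * an ODP packet and enqueue it, passing the HW parsing hints along */
        num = nic_rx_collect(done, 32, &hint);
        if (num > 0)
                pkt_sgmt_join_and_send(&hint, done, num, &dest_queue);
}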


/* FOR TX: */
/*
 * Function returning the list of segments making up an odp_packet:
 * Returns the number of segments or 0 on error.
 * The segments are returned in the segments[] array, whose length will
 * never exceed max_nb_segments.
 */
int pkt_sgmt_get(odp_packet_t *packet, pkt_sgmt_t *segments, int max_nb_segments);
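
A matching TX-side usage sketch (again illustration only: nic_tx_ring_post()
is an invented placeholder for the NIC-specific descriptor write, and
MAX_SEGS_PER_PKT is an arbitrary bound):

#define MAX_SEGS_PER_PKT 16

int driver_tx_example(odp_packet_t pkt)
{
        pkt_sgmt_t segments[MAX_SEGS_PER_PKT];
        int num, i;

        num = pkt_sgmt_get(&pkt, segments, MAX_SEGS_PER_PKT);
        if (num == 0)
                return -1;

        /* post each physically contiguous segment to the NIC TX ring */
        for (i = 0; i < num; i++)
                nic_tx_ring_post(segments[i].phy_addr, segments[i].len);

        /* on TX completion the driver calls pkt_sgmt_free() on each segment;
         * only the last one (odp_private != NULL) releases the odp packet */
        return 0;
}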

/*
 * "free" a segment
 */
/*
 * For linux-generic,