Hi,

This is an attempt to summarize what the driver interface needs regarding ODP packet handling. It will serve as the basis for the discussions at Connect. Please read and comment... possibly at Connect...
/Christophe

From the driver perspective, the situation is rather simple. What we need is:

/* Definition of a packet segment descriptor:
 * A packet segment is just an area, contiguous in virtual address space,
 * and contiguous in the physical address space (at least when no IOMMU is
 * used, e.g. for virtio). Probably we want to have physical contiguity in
 * all cases (to avoid handling different cases to start with), but that
 * would not take advantage of the remapping that can be done by IOMMUs,
 * so it can come with a little performance penalty for IOMMU cases.
 * Segments are shared among all ODP threads (including linux processes),
 * and are guaranteed to be mapped at the same virtual address in
 * all ODP instances (single_va flag in ishm).
 * Note that this definition just implies that a packet segment is reachable
 * by the driver. A segment could actually be part of a HW IO chip in a
 * HW-accelerated system.
 */

/* For linux-gen:
 * Segments are memory areas.
 * In TX, pkt_sgmt_get() puts the pointer to the ODP packet in the
 * 'odp_private' element of the last segment of each packet, so that
 * pkt_sgmt_free() can just do nothing when odp_private is NULL and release
 * the complete ODP packet when it is not NULL. Segments allocated with
 * pkt_sgmt_alloc() will have their odp_private set to NULL. The name and
 * the 'void *' type are there to make that opaque to the driver interface,
 * which really should not care...
 * Other ODP implementations could handle that as they wish.
 */
typedef uint64_t phy_address_t;

typedef struct {
	void *address;
	phy_address_t phy_addr;
	uint32_t len;
	void *odp_private;
} pkt_sgmt_t;

/* FOR RX: */

/* Segment allocation function:
 * As it is not possible to guarantee physical memory contiguity from
 * user space, this segment alloc function is best effort:
 * The size passed as parameter is a hint of what the most probable received
 * packet size could be: this alloc function will allocate a segment whose
 * size will be greater than or equal to the requested size if the latter can
 * fit in a single page (or huge page), hence guaranteeing the segment's
 * physical contiguity.
 * If there is no physical page large enough for 'size' bytes, then
 * the largest page is returned, meaning that in that case the allocated
 * segment will be smaller than the requested size (the received packet
 * will be fragmented in this case).
 * This pkt_sgmt_alloc() function is called by the driver RX side to populate
 * the NIC RX ring buffer(s).
 * Returns the number of allocated segments (1) on success or 0 on error.
 * Note: on a unix system with 4K and 2M pages, this means that 2M will get
 * allocated for each large (64K?) packet... too much waste? Should we handle
 * page fragmentation (which would really not change this interface)?
 */
int pkt_sgmt_alloc(uint32_t size, pkt_sgmt_t *returned_sgmt);

/*
 * Another variant of the above function could be:
 * returns the number of allocated segments on success or 0 on error.
 */
int pkt_sgmt_alloc_multi(uint32_t size, pkt_sgmt_t *returned_sgmts,
			 int *nb_sgmts);

/*
 * Creating ODP packets from the segments:
 * Once a series of segments belonging to a single received packet is
 * fully received (note that this series can be of length 1 if the received
 * packet fitted in a single segment), we need a function to create the
 * ODP packet from the list of segments.
 * We first define the "pkt_sgmt_hint" structure, which can be used by
 * a NIC to pass information about the received packet (the HW probably
 * knows a lot about the received packet, so the SW does not necessarily
 * need to reparse it: the hint struct contains info which is already known
 * by the HW). If hint is NULL when calling pkt_sgmt_join(), then the SW has
 * to reparse the received packet from scratch.
 * pkt_sgmt_join() returns 0 on success.
 */
typedef struct {
	/* ethtype, crc_ok, L2 and L3 offset, ip_crc_ok, ... */
} pkt_sgmt_hint;

int pkt_sgmt_join(pkt_sgmt_hint *hint,
		  pkt_sgmt_t *segments, int nb_segments,
		  odp_packet_t *returned_packet);

/* Another variant of the above, directly passing the packet to a given queue */
int pkt_sgmt_join_and_send(pkt_sgmt_hint *hint,
			   pkt_sgmt_t *segments, int nb_segments,
			   odp_queue_t *dest_queue);

/* FOR TX: */

/*
 * Function returning the list of segments making up an odp_packet:
 * returns the number of segments or 0 on error.
 * The segments are returned in the segments[] array, whose length will
 * never exceed max_nb_segments.
 */
int pkt_sgmt_get(odp_packet_t *packet, pkt_sgmt_t *segments,
		 int max_nb_segments);

/*
 * "Free" a segment:
 * For linux-generic, that would just do nothing, unless segment->odp_private
 * is not NULL, in which case the whole ODP packet is freed.
 */
int pkt_sgmt_free(pkt_sgmt_t *segment);
int pkt_sgmt_free_multi(pkt_sgmt_t *segments, int nb_segments);
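
To make the TX side more concrete, here is a minimal sketch of how the 'odp_private' convention could play out on TX completion. Everything prefixed with mock_ is hypothetical and only there so the example compiles on its own: mock_packet_t is a stand-in, not the real odp_packet_t, and the two mock functions only imitate what pkt_sgmt_get() and pkt_sgmt_free() might do on linux-generic. It assumes that the "get" side tags the last segment of the packet, that "get" returns 0 when the packet needs more than max_nb_segments segments, and that "free" returns 0 on success (none of these is settled by the proposal above).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t phy_address_t;

typedef struct {
	void *address;
	phy_address_t phy_addr;
	uint32_t len;
	void *odp_private;
} pkt_sgmt_t;

/* Hypothetical stand-in for an ODP packet (NOT the real odp_packet_t). */
typedef struct {
	uint8_t *data;
	uint32_t len;      /* total packet length in bytes */
	uint32_t sgmt_len; /* maximum bytes per segment */
} mock_packet_t;

int mock_packets_freed; /* counts whole-packet releases, for the demo */

/* Imitates pkt_sgmt_get(): describe the packet as a list of segments.
 * Only the last segment carries the back pointer to the packet. */
static int mock_pkt_sgmt_get(mock_packet_t *pkt, pkt_sgmt_t *segments,
			     int max_nb_segments)
{
	uint32_t offset = 0;
	int nb = 0;

	if (pkt->len == 0 || pkt->sgmt_len == 0)
		return 0;

	while (offset < pkt->len) {
		uint32_t left = pkt->len - offset;

		if (nb == max_nb_segments)
			return 0; /* assumption: too many segments is an error */

		segments[nb].address = pkt->data + offset;
		segments[nb].phy_addr = 0; /* not modelled in this mock */
		segments[nb].len = left < pkt->sgmt_len ? left : pkt->sgmt_len;
		segments[nb].odp_private = NULL;
		offset += segments[nb].len;
		nb++;
	}
	segments[nb - 1].odp_private = pkt;
	return nb;
}

/* Imitates pkt_sgmt_free(): nothing to do unless odp_private is set,
 * in which case the whole packet is released. */
static int mock_pkt_sgmt_free(pkt_sgmt_t *segment)
{
	mock_packet_t *pkt = segment->odp_private;

	if (pkt != NULL) {
		free(pkt->data);
		free(pkt);
		mock_packets_freed++;
	}
	return 0;
}

/* What a driver could do once the NIC reports the segments as sent.
 * Returns the number of segments handled, 0 on error, -1 on out of memory. */
int tx_complete_demo(uint32_t pkt_len, uint32_t sgmt_len)
{
	pkt_sgmt_t segments[8];
	mock_packet_t *pkt = malloc(sizeof(*pkt));
	int nb, i;

	if (pkt == NULL)
		return -1;
	pkt->data = malloc(pkt_len ? pkt_len : 1);
	if (pkt->data == NULL) {
		free(pkt);
		return -1;
	}
	pkt->len = pkt_len;
	pkt->sgmt_len = sgmt_len;
	mock_packets_freed = 0;

	nb = mock_pkt_sgmt_get(pkt, segments, 8);
	assert(nb <= 8);
	if (nb == 0) {
		/* no segment was tagged, so release the packet directly */
		free(pkt->data);
		free(pkt);
		return 0;
	}

	/* The driver frees every segment blindly; only the last one
	 * actually releases the packet. */
	for (i = 0; i < nb; i++)
		mock_pkt_sgmt_free(&segments[i]);
	return nb;
}
```

The point of the sketch is that the driver never needs to know which segment ends a packet: it frees segments as the NIC completes them, and the implementation decides what "free" means.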