On 4/20/15, 8:19 AM, "Wiles, Keith" <keith.wiles at intel.com> wrote:
> > >From: Marc Sune <marc.sune at bisdn.de<mailto:marc.sune at bisdn.de>> >Date: Monday, April 20, 2015 at 1:51 AM >To: Keith Wiles <keith.wiles at intel.com<mailto:keith.wiles at intel.com>>, >"dev at dpdk.org<mailto:dev at dpdk.org>" <dev at dpdk.org<mailto:dev at >dpdk.org>> >Subject: Re: [dpdk-dev] [RFC PATCH 0/4] pktdev > > > >On 17/04/15 21:50, Wiles, Keith wrote: > >Hi Marc and Bruce, > >Hi Keith, Bruce, > > >On 4/17/15, 1:49 PM, "Marc Sune" ><marc.sune at bisdn.de><mailto:marc.sune at bisdn.de> wrote: > > > >On 17/04/15 17:16, Bruce Richardson wrote: > > >Hi all, > >to continue this discussion a bit more, here is my, slightly different, >slant >on what a pktdev abstraction may look like. > >The primary objective I had in mind when drafting this is to provide the >minimal abstraction that can be *easily* used as a common device >abstraction for >existing (and future) device types to be passed to dataplane code. The >patchset >demonstrates this by defining a minimal interface for pktdev - since I >firmly >believe the interface should be as small as possible - and then showing >how that >common interface can be used to unify rings and ethdevs under a common >API for the >datapath. I believe any attempt to unify things much beyond this to the >control >plane or setup phase is not worth doing - at least not initially - as at >init time the code always needs to be aware of the underlying resource >type in >order to configure it properly for dataplane use. > >The overall objective I look to achieve is illustrated by the final >patch in >the series, which is a sample app where the same code is used for all >cores, >irrespective of the underlying device type. > >To get to that point, patch 1 defines the minimal API - just RX and TX. >The .c >file in the library is empty for simplicity, though I would see some >functionality moving there when/if it makes sense e.g. the callback >support >from ethdev, as is done in Keith's patchset. >Patch 2 then makes very minimal changes to ethdev to allow ethdevs to >be used >as pktdevs, and to make use of the pktdev functions when appropriate >Patch 3 was, for me, the key test for this implementation - how hard >was it to >make an rte_ring usable as a pktdev too. Two single-line functions for >RX/TX >and a separate "converter" function proved to be all that was necessary >here - >and I believe simpler solutions may be possible too, as the extra >structures >allocated on conversion could be merged into the rte_ring structure >itself and >initialized on ring creation if we prefer that option. It is >hoped/presumed that >wrapping other structures, such as KNI, may prove to be just as easily >done. >[Not attempted yet - left as an exercise for the reader :-)]. > >Now, in terms of pktdev vs ethdev, there is nothing in this proposal >that >cannot also be done using ethdev AFAIK. However, pktdev as outlined here >should make the process far easier than trying to create a full PMD for >something. >All NIC specific functions, including things like stop/start, are >stripped out, >as they don't make sense for an rte_ring or other software objects. >Also, the other thing this provides is that we can move away from just >using >port ids. Instead in the same way as we now reference >rings/mempools/KNIs etc >via pointer, we can do the same with ethernet ports as pktdevs on the >data path. >There was discussion previously on moving beyond 8-bit port ids. If we >look to >use ethdev as a common abstraction, I feel that change will soon have >to be made >causing a large amount of code churn. > > >Hi Richard, > >First thank you both for taking the time to look at this. I did not not >reply to Keith because you Richard summarized most of my concerns. > >I had a brief look to this second proposal. It is more aligned to what I >had in mind. But still I feel it is slightly too complicated. I don't >like much the necessary (in your approach) MACRO-like pkt_dev_data >struct. It is also slightly inconvenient that the user has to do: > >+ struct rte_pkt_dev *in = rte_eth_get_dev(0); > >+ struct rte_pkt_dev *out = rte_ring_get_dev( >+ rte_ring_create(name, 4096, >rte_socket_id(), 0)); > > > >What about something like (~pseudo-code): > >rte_pkt_dev_data.h: > > enum rte_pkt_dev_type{ > RTE_PKT_DEV_ETH, > RTE_PKT_DEV_RING, > RTE_PKT_DEV_KNI, > //Keep adding as more PMDs are supported > }; > > > //This struct may be redundant if there is nothing more > struct rte_pkt_dev_data{ > enum rte_pkt_dev_type; > //Placeholder, maybe we need more... > }; > > //Make RX/TX pktdev APIs more readable, but not really needed > typedef void pkt_dev_t; > >(In all PMDs and e.g. KNI and RINGs): > > struct rte_eth_dev { > struct rte_pkt_dev_data pkt_dev;// ><++++++++++++++++++++++++++++++++++ > eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */ > eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. >*/ > struct rte_eth_dev_data *data; /**< Pointer to device data */ > /... > >rte_pkt_dev.h: > > #include <rte_ethdev.h> > //Include PMD (and non-PMD) TX/RX headers... > > static inline uint16_t > rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id, > struct rte_mbuf **tx_pkts, uint16_t nb_pkts) > { > switch (((struct rte_pkt_dev_data*)dev)->type){ > case RTE_PKT_DEV_ETH: > struct rte_eth_dev* eth_dev = (struct >rte_eth_dev*)pkt_dev; > rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts); > break; > case RTE_PKT_DEV_RING: > //... > } > } > > //... > > >I do not see the functional different from my proposal other then a bit >more search/replace, which does effect some of the PMD?s. The changes to >the PMD is just to use the macros as we now have a nested structure. > >I think what I am proposing is different. See below. > Start of Keith's comment. > >The nesting and movement some structures and code to a common location is >to provide a better way to add other devices. Moving code into a common >set of structures and APIs should only improve the code as we get more >devices added to DPDK. The basic flow and design is the same. End of Keith?s comment. > >In your proposal you are movig TX/RX queues as well as some shared stuff >and, most importantly, you are using function pointers to execute the >RX/TX routines. > >What I was proposing is to try to add the minimum common shared state in >order to properly demultiplex the RX/TX call and have a common set of >abstract calls (the pkt_dev type). In a way, I was proposing to >deliberately not have a shared struct rte_dev_data because I think the >internals of the "pkt_dev" can be very different across devices (e.g. >queues in kni vs eth port vs. crypto?). I treat the pkt_dev as a "black >box" that conforms to TX/RX API, leaving the developer of that device to >define its internal structures as it better suites the needs. I only use >each of the specific device type TX/RX APIs (external to us, pkt_dev >library) in rte_pkt_dev.h. This also simplifies the refactor required to >eventually integrate the rte_pkt_dev library and builds it "on top" of >the existing APIs. > >The other important difference with both, Bruce and your approach, and >mine is the use of function pointers for RX/TX. I don't use them, which >makes the entire abstracted TX/RX (including the final RX/TX routines >itself) functions be "inlinable". Making something inlinable is not based on using a pointer or not, is that what you mean? > >Btw, I forgot to add something basic in the previous pseudo-code. The >different types have to be conditionally compiled according to >compiled-in DPDK libs: > >rte_pkt_dev.h: > > #include <rte_config.h> > > //Eth devices > #ifdef RTE_LIBRTE_ETHER > #include <rte_ethdev.h> > #endif > > //KNI > #ifdef RTE_LIBRTE_KNI > #include <rte_kni.h> > #endif > > //... > //Include PMD (and non-PMD) TX/RX headers... > > static inline uint16_t > rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id, > struct rte_mbuf **tx_pkts, uint16_t nb_pkts) > { > switch (((struct rte_pkt_dev_data*)dev)->type){ > #ifdef RTE_LIBRTE_ETHER > case RTE_PKT_DEV_ETH: > struct rte_eth_dev* eth_dev = (struct >rte_eth_dev*)pkt_dev; > rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts); > break; > #endif > > #ifdef RTE_LIBRTE_KNI > case RTE_PKT_DEV_KNI: > //... > break; > #endif > > default: > //Corrupted type or unsupported (without compiled >support) > //Ignore or fail(fatal error)? > break; > } > } > > //... > >This adds a switch into the performance path, not a great solution IMO. > > >From rte_common_device.h (format screwed with cut/paste and not all of the >code) > > >struct rte_dev_data { > char name[RTE_DEV_NAME_MAX_LEN]; /**< Unique identifier name */ > > void **rx_queues; /**< Array of pointers to RX >queues. */ > void **tx_queues; /**< Array of pointers to TX >queues. */ > > uint16_t nb_rx_queues; /**< Number of RX queues. */ > uint16_t nb_tx_queues; /**< Number of TX queues. */ > > uint16_t mtu; /**< Maximum Transmission Unit. */ > uint8_t unit_id; /**< Unit/Port ID for this >instance */ > uint8_t _pad0; /* alignment filler */ > > void *dev_private; /**< PMD-specific private data */ > uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation >failures. */ > uint32_t min_rx_buf_size; /**< Common rx buffer size >handled by all >queues */ >}; > >struct rte_dev_info { > struct rte_pci_device *pci_dev; /**< Device PCI information. >*/ > const char *driver_name; /**< Device Driver name. >*/ > unsigned int if_index; /**< Index to bound host >interface, or 0 >if none. */ > /* Use if_indextoname() to translate into an interface name. */ > unsigned int _pad0; >}; > > > >struct rte_dev { > dev_rx_burst_t rx_pkt_burst; /**< Pointer to PMD >receive >function. */ > dev_tx_burst_t tx_pkt_burst; /**< Pointer to PMD >transmit >function. */ > void *data; /**< >Pointer to device data */ > struct rte_dev_global *gbl_data; /**< Pointer to device >global data >*/ > const struct rte_dev_drv *driver; /**< Driver for this >device */ > void *dev_ops; /**< >Functions exported by PMD */ > struct rte_pci_device *pci_dev; /**< PCI info. supplied by >probing */ > struct rte_dev_info *dev_info; /**< >Pointer to the device info >structure */ > > /** User application callback for interrupts if present */ > struct rte_dev_cb_list link_intr_cbs; > /** > * User-supplied functions called from rx_burst to post-process > * received packets before passing them to the user > */ > struct rte_dev_rxtx_callback **post_rx_burst_cbs; > /** > * User-supplied functions called from tx_burst to pre-process > * received packets before passing them to the driver for >transmission. > */ > struct rte_dev_rxtx_callback **pre_tx_burst_cbs; > >What is the indended use of these callbacks and what is the benefit? If >the user is supplying the callback, why not putting it directly after the >RX and before the TX call in user's code (anyway)? Keith?s Comment. > >Just moving the callback routines next to the Rx/Tx routine pointers does >not make a lot of different in this structure, right? Keith?s comment end. > > > > enum rte_dev_type dev_type; /**< Flag >indicating the device type */ > uint8_t attached; /**< Flag indicating the >port is >attached */ >}; > >From rte_ethdev.h > >struct rte_eth_dev_info { > struct rte_dev_info di; /**< Common device information */ > > /* Rest of structure is for private device data */ > uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */ > uint32_t max_rx_pktlen; /**< Maximum configurable length of RX >pkt. */ > uint16_t max_rx_queues; /**< Maximum number of RX queues. */ > uint16_t max_tx_queues; /**< Maximum number of TX queues. */ > uint32_t max_mac_addrs; /**< Maximum number of MAC addresses. */ > uint32_t max_hash_mac_addrs; > /** Maximum number of hash MAC addresses for MTA and UTA. */ > uint16_t max_vfs; /**< Maximum number of VFs. */ > uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */ > uint32_t rx_offload_capa; /**< Device RX offload capabilities. */ > uint32_t tx_offload_capa; /**< Device TX offload capabilities. */ > uint16_t reta_size; > /**< Device redirection table size, the total number of entries. >*/ > /** Bit mask of RSS offloads, the bit offset also means flow type >*/ > uint64_t flow_type_rss_offloads; > struct rte_eth_rxconf default_rxconf; /**< Default RX >configuration */ > struct rte_eth_txconf default_txconf; /**< Default TX >configuration */ > uint16_t vmdq_queue_base; /**< First queue ID for VMDQ pools. */ > uint16_t vmdq_queue_num; /**< Queue number for VMDQ pools. */ > uint16_t vmdq_pool_base; /**< First ID of VMDQ pools. */ >}; > >struct rte_eth_dev_data { > struct rte_dev_data dd; /**< >Common device data */ > > /* Rest of the structure is private data */ > struct rte_eth_dev_sriov sriov; /**< SRIOV data */ > > struct rte_eth_link dev_link; > /**< Link-level information & status */ > > struct rte_eth_conf dev_conf; /**< Configuration applied to >device. */ > struct ether_addr* mac_addrs; /**< Device Ethernet Link >address. */ > uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR]; > /** bitmap array of associating Ethernet MAC addresses to pools */ > struct ether_addr* hash_mac_addrs; > > /** Device Ethernet MAC addresses of hash filtering. */ > uint8_t promiscuous : 1, /**< RX promiscuous mode ON(1) / >OFF(0). */ > scattered_rx : 1, /**< RX of scattered packets is >ON(1) / OFF(0) */ > all_multicast: 1, /**< RX all multicast mode ON(1) / >OFF(0). */ > dev_started : 1; /**< Device state: STARTED(1) / >STOPPED(0). */ >}; > >In the case of rte_ethdev.h the rte_eth_dev turns into a macro '#define >rte_eth_dev rte_dev', in other devices they may have a rte_xyz_dev >structure rte_dev as the first member and then private members. Also later >if we find something private to ethdev then a structure can be created. > > >Most of the code changes outside of these changes are because using a >macro to fix the code was easier. Rewriting the functions to use real >variables instead of casting void pointer could yield without the macros. >Not sure as the performance impact in rewriting some of these functions to >eliminate the macros. > >This is the heart of what I am proposing and it just so happens to allow >Bruce? ideas as well. I have another patch I have not sent that includes >Bruce?s ideas using my suggestions. > >Being able to use a single API for all devices does have its benefits IMO, >but I want to cleanup ethdev to allow for other devices to be added to the >system. > >At some point (when the cryptodev is done) I want to add it to this >framework, plus add more APIs for crypto similar to Linux Crypto API or >something we can agree is a reasonable set of APIs for DPDK. > >I currently don't see the benefit of abstracting beyond the RX/TX >functions, and I see more cons than pros. Doing so we will also make >librte_pkt_dev sort of mandatory, even if you are not using it our >abstracted RX/TX functions at all. Keith Comment. > >I do not see why we would not pull these common items out. Can you list >your cons as I do not see your point here. Keith Comment > > >regards >marc > > > >Regards, > >++Keith > > >Maybe it is oversimplified? With this approach tough, we would barely >touch the PMDs and the functionality can be built on top of the existing >PMDs. > >Thoughts? >Marc > > > > >Bruce Richardson (4): > Add example pktdev implementation > Make ethdev explicitly a subclass of pktdev > add support for a ring to be a pktdev > example app showing pktdevs used in a chain > > config/common_bsdapp | 5 + > config/common_linuxapp | 5 + > examples/pktdev/Makefile | 57 +++++++++++ > examples/pktdev/basicfwd.c | 222 >+++++++++++++++++++++++++++++++++++++++++ > lib/Makefile | 1 + > lib/librte_ether/rte_ethdev.h | 26 ++--- > lib/librte_pktdev/Makefile | 56 +++++++++++ > lib/librte_pktdev/rte_pktdev.c | 35 +++++++ > lib/librte_pktdev/rte_pktdev.h | 144 ++++++++++++++++++++++++++ > lib/librte_ring/rte_ring.c | 41 ++++++++ > lib/librte_ring/rte_ring.h | 3 + > 11 files changed, 582 insertions(+), 13 deletions(-) > create mode 100644 examples/pktdev/Makefile > create mode 100644 examples/pktdev/basicfwd.c > create mode 100644 lib/librte_pktdev/Makefile > create mode 100644 lib/librte_pktdev/rte_pktdev.c > create mode 100644 lib/librte_pktdev/rte_pktdev.h > > >