Hi All, just to make a comment on my own email :-)

On 4/1/15, 7:44 AM, "Wiles, Keith" <keith.wiles at intel.com> wrote:
>Hi all, (hoping format of the text is maintained)
>
>Bruce and I are submitting this RFC in hopes of providing discussion
>points for the idea. Please do not get carried away with the code
>included, it was to help everyone understand the proposal/RFC.
>
>The RFC is to describe a proposed change we are looking to make to DPDK to
>add more device types. We would like to add in to DPDK the idea of a
>generic packet-device or "pktdev", which can be thought of as a thin layer
>for all device classes, including other device types such as, potentially,
>a "cryptodev" or "dpidev". One of the main goals is to not affect
>performance and not require any current application to be modified. The
>pktdev layer provides a light framework for developers to add a device
>to DPDK.
>
>Reason for Change
>-----------------
>
>The reasons why we are looking to introduce these concepts to DPDK are:
>
>* Expand the scope of DPDK so that it can provide APIs for more than just
>packet acquisition and transmission, but also provide APIs that can be
>used to work with other hardware and software offloads, such as
>cryptographic accelerators, or accelerated libraries for cryptographic
>functions. [The reason why both software and hardware are mentioned is so
>that the same APIs can be used whether or not a hardware accelerator is
>actually available].
>* Provide a minimal common basis for device abstraction in DPDK, that can
>be used to unify the different types of packet I/O devices already
>existing in DPDK. To this end, the ethdev APIs are a good starting point,
>but the ethdev library contains too many functions which are NIC-specific
>to be a general-purpose set of APIs across all devices.
>
>Note: The idea was previously touched on here:
>http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545
>
>Description of Proposed Change
>------------------------------
>
>The basic idea behind "pktdev" is to abstract out a few common routines
>and structures/members of structures, using the ethdev structures as a
>starting point: cut them down to little more than a few members in each
>structure, then possibly add just rx_burst and tx_burst. The resulting
>structures then serve as the starting point for writing a device type.
>Currently we have the rx_burst/tx_burst routines moved to the pktdev, and
>it seems like moving a couple more common functions may be reasonable. It
>could be that the Rx/Tx routines in pktdev should be left as is, but the
>code below shows a possible reason to abstract a few routines into a
>common set of files.
>
>From there, we have the ethdev type which adds in the existing functions
>specific to Ethernet devices, and also, for example, a cryptodev which may
>add in functions specific for cryptographic offload. As now, with the
>ethdev, the specific drivers provide concrete implementations of the
>functionality exposed by the interface. This hierarchy is shown in the
>diagram below, using the existing ethdev and ixgbe drivers as a reference,
>alongside a hypothetical cryptodev class and driver implementation
>(catchily called) "X":
>
>                    ,---------------------.
>                    | struct rte_pktdev   |
>                    +---------------------+
>                    | rte_pkt_rx_burst()  |
>            .-------| rte_pkt_tx_burst()  |-----------.
>            |       `---------------------'           |
>            |                                         |
>            |                                         |
>  ,-------------------------------.   ,------------------------------.
>  | struct rte_ethdev             |   | struct rte_cryptodev         |
>  +-------------------------------+   +------------------------------+
>  | rte_eth_dev_configure()       |   | rte_crypto_init_sym_session()|
>  | rte_eth_allmulticast_enable() |   | rte_crypto_del_sym_session() |
>  | rte_eth_filter_ctrl()         |   |                              |
>  `-------------------------------'   `---------------.--------------'
>            |                                         |
>            |                                         |
>  ,---------'---------------------.   ,---------------'--------------.
>  | struct rte_pmd_ixgbe          |   | struct rte_pmd_X             |
>  +-------------------------------+   +------------------------------+
>  | .configure -> ixgbe_configure |   | .init_session -> X_init_ses()|
>  | .tx_burst  -> ixgbe_xmit_pkts |   | .tx_burst -> X_handle_pkts() |
>  `-------------------------------'   `------------------------------'
>
>We are not attempting to create a real class model here, only looking at
>creating a very basic common set of APIs and structures for other device
>types.
>
>In terms of code changes for this, we obviously need to add in new
>interface libraries for pktdev and cryptodev. The pktdev library can
>define a skeleton structure for the first few elements of the nested
>structures to ensure consistency. Each of the defines below illustrates
>the common members in device structures, which gives some basic structure
>to the device framework. Each define is placed at the top of the device's
>matching structure and allows the device to contain both common and
>private data. The pktdev structures overlay the first common set of
>members for each device type.
>
>For example:
>------------
>
>We are using macros to reduce code changes to DPDK, but nested structures
>are a better solution:
>
>#define RTE_PKT_COMMON_DEV(_t) \
>    pkt_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */ \
>    pkt_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */ \
>    struct rte_##_t##_dev_data *data; /**< Pointer to device data */ \
>    const struct _t##_driver *driver; /**< Driver for this device */ \
>    struct _t##_dev_ops *dev_ops; /**< Functions exported by PMD */ \
>    struct rte_pci_device *pci_dev; /**< PCI info. supplied by probing */ \
>    /** User application callback for interrupts if present */ \
>    struct rte_##_t##_dev_cb_list link_intr_cbs; \
>    /** \
>     * User-supplied functions called from rx_burst to post-process \
>     * received packets before passing them to the user \
>     */ \
>    struct rte_##_t##_rxtx_callback **post_rx_burst_cbs; \
>    /** \
>     * User-supplied functions called from tx_burst to pre-process \
>     * received packets before passing them to the driver for \
>     * transmission. \
>     */ \
>    struct rte_##_t##_rxtx_callback **pre_tx_burst_cbs; \
>    enum rte_pkt_dev_type dev_type; /**< Flag indicating the device type */ \
>    uint8_t attached /**< Flag indicating the port is attached */
>    /* Possible alignment or a hole in the structure */
>
>#define RTE_PKT_NAME_MAX_LEN (32)
>
>#define RTE_PKT_COMMON_DEV_DATA \
>    char name[RTE_PKT_NAME_MAX_LEN]; /**< Unique identifier name */ \
>    \
>    void **rx_queues; /**< Array of pointers to RX queues. */ \
>    void **tx_queues; /**< Array of pointers to TX queues. */ \
>    uint16_t nb_rx_queues; /**< Number of RX queues. */ \
>    uint16_t nb_tx_queues; /**< Number of TX queues. */ \
>    \
>    uint16_t flags; /**< Bit fields for xyzdev's to use. */ \
>    uint16_t mtu; /**< Maximum Transmission Unit. */ \
>    uint8_t unit_id; /**< Unit ID for this instance */ \
>    uint8_t _filler[7]; /* alignment filler */ \
>    \
>    /* 64bit alignment starts here */ \
>    void *dev_private; /**< PMD-specific private data */ \
>    uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures. */ \
>    uint32_t min_rx_buf_size; /**< Common rx buffer size handled by all queues */ \
>    uint32_t _pad0
>
>#define port_id unit_id
>
>#define RTE_PKT_COMMON_DEV_INFO \
>    struct rte_pci_device *pci_dev; /**< Device PCI information. */ \
>    const char *driver_name; /**< Device Driver name. */ \
>    unsigned int if_index; /**< Index to bound host interface, or 0 if none. */ \
>    /* Use if_indextoname() to translate into an interface name. */ \
>    uint32_t _pad0
>
>The above is attempting to collect the common members to be placed at the
>top of private device structures, as we feel these members should be
>fairly common among the device types.
>
>/**
> * @internal
> * The generic data structure associated with each device.
> *
> * Pointers to burst-oriented packet receive and transmit functions are
> * located at the beginning of the structure, along with the pointer to
> * where all the data elements for the particular device are stored in
> * shared memory. This split allows the function pointer and driver data
> * to be per-process, while the actual configuration data for the device
> * is shared.
> */
>struct rte_pkt_dev {
>    RTE_PKT_COMMON_DEV(pkt);
>};
>
>/**
> * @internal
> * The data part, with no function pointers, associated with each device.
> *
> * This structure is safe to place in shared memory to be common among
> * different processes in a multi-process configuration.
> */
>struct rte_pkt_dev_data {
>    RTE_PKT_COMMON_DEV_DATA;
>};
>
>------
>
>The existing ethdev code can then have minor updates such as those shown
>below:
>
>struct rte_eth_dev_info {
>    RTE_PKT_COMMON_DEV_INFO;
>
>    /* Private device data may be here */
>    uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
>    uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
>    ...
>
>struct rte_eth_dev_data {
>    RTE_PKT_COMMON_DEV_DATA; /**< Define located in <rte_pkt.h> */
>
>    /* Private device data may be here */
>    struct rte_eth_dev_sriov sriov; /**< SRIOV data */
>
>    struct rte_eth_link dev_link; /**< Link-level information & status */
>    ...
>
>struct rte_eth_dev {
>    RTE_PKT_COMMON_DEV(eth);
>    /* Private device data may be here */
>};
>
>/* Bit defines for flags in common pkt structure */
>#define promiscuous   0x0008 /**< RX promiscuous mode ON(1) / OFF(0). */
>#define scattered_rx  0x0004 /**< RX of scattered packets is ON(1) / OFF(0) */
>#define all_multicast 0x0002 /**< RX all multicast mode ON(1) / OFF(0). */
>#define dev_started   0x0001 /**< Device state: STARTED(1)/STOPPED(0) */
>
>The advantage of using a common set of members is that the existing ethdev
>structures and APIs can remain exactly the same, but every ethdev is also
>a pktdev and can be used as either, as appropriate. Similarly for a type
>of crypto devices, or dpi devices (or software rings or KNI devices, if we
>so desire), we can base them off this common minimal framework and use
>them all in a similar manner.
>
>Moving some basic common functions and structures into a common set of
>files gives everyone a clean starting point for a new device, plus a
>light framework. The pktdev code is normally not called directly from the
>application, but called from the device itself via a define in the device
>header files. The pktdev RX/TX routines can be called from the
>application, but the application needs to get the device structure pointer
>based on the port id.
>
>The cryptodev API may be very different from other devices and may follow
>some type of Open Crypto API. The goal is not to restrict the device API,
>but to try to give some type of structure to the design. Does it make
>sense to have an mbuf-based Rx/Tx API? Maybe not. Could the mbuf-based
>APIs be hidden in the pktdev code? Very possibly. We have a lot of options
>here.
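
As an aside on the common-member layout quoted above, here is a minimal sketch of how the overlay and an rte_eth_rx_burst-style define fit together. Every demo_* name is made up for illustration and heavily cut down; none of it is the actual RFC code or an existing DPDK symbol. One caveat worth stating: the cast from the Ethernet structure to the generic one is the same kind of cast the proposed define makes, and it relies on both structures starting with identical members. ISO C only formally guarantees that for the first member of a structure or for structures held in a union, so treat this as an illustration of the idea rather than a portability guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet buffer; not the real rte_mbuf. */
struct demo_mbuf { uint32_t len; };

typedef uint16_t (*demo_rx_burst_t)(void *rxq, struct demo_mbuf **pkts,
                                    uint16_t nb_pkts);

/* Common members placed at the top of every device structure.
 * The last member has no semicolon; the user of the macro supplies it. */
#define DEMO_PKT_COMMON_DEV \
    demo_rx_burst_t rx_pkt_burst; /* per-driver receive function */ \
    void **rx_queues;             /* array of RX queue pointers */ \
    uint16_t nb_rx_queues         /* number of RX queues */

/* Generic view: only the common members. */
struct demo_pkt_dev {
    DEMO_PKT_COMMON_DEV;
};

/* Ethernet view: the same common members first, then private ones. */
struct demo_eth_dev {
    DEMO_PKT_COMMON_DEV;
    uint16_t mtu;
};

/* Generic receive: dispatches through the common function pointer. */
static inline uint16_t
demo_pkt_rx_burst(struct demo_pkt_dev *dev, uint16_t qid,
                  struct demo_mbuf **pkts, uint16_t nb_pkts)
{
    assert(dev != NULL && qid < dev->nb_rx_queues);
    return dev->rx_pkt_burst(dev->rx_queues[qid], pkts, nb_pkts);
}

/* Device table plus an ethdev-style wrapper keyed by port id. */
static struct demo_eth_dev demo_eth_devices[2];

#define demo_eth_rx_burst(_pid, _qid, _pkts, _nb_pkts) \
    demo_pkt_rx_burst((struct demo_pkt_dev *)&demo_eth_devices[_pid], \
                      _qid, _pkts, _nb_pkts)

/* Toy driver: hands back up to 3 buffers from a static pool. */
static uint16_t
demo_drv_rx(void *rxq, struct demo_mbuf **pkts, uint16_t nb_pkts)
{
    struct demo_mbuf *pool = rxq;
    uint16_t n = nb_pkts < 3 ? nb_pkts : 3;

    for (uint16_t i = 0; i < n; i++)
        pkts[i] = &pool[i];
    return n;
}

static struct demo_mbuf demo_pool[3];
static void *demo_queues[1] = { demo_pool };

/* Fill in port 0 the way a driver's init routine might. */
static void
demo_setup(void)
{
    demo_eth_devices[0].rx_pkt_burst = demo_drv_rx;
    demo_eth_devices[0].rx_queues = demo_queues;
    demo_eth_devices[0].nb_rx_queues = 1;
    demo_eth_devices[0].mtu = 1500;
}
```

After demo_setup(), an application written against the Ethernet-style call (demo_eth_rx_burst(0, 0, bufs, 8)) and one holding only a struct demo_pkt_dev pointer end up in the same driver function, which is the property the common members are meant to provide.
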

On the crypto API: adding a modified API that closely follows the
OpenCrypto API or the Linux kernel crypto API would be a good way to extend
DPDK and create a common API for crypto. Following a standard set of APIs
should help adoption.

>
>How the two Rx/Tx routines are defined:
>---------------------------------------
>
>/**
> *
> * Retrieve a burst of input packets from a receive queue of an Ethernet
> * device.
> *
><SNIP>
> */
>#define rte_eth_rx_burst(_pid, _qid, _pkts, _nb_pkts) \
>    rte_pkt_rx_burst((struct rte_pkt_dev *)&rte_eth_devices[_pid], _qid, \
>                     _pkts, _nb_pkts)
>
>/**
> * Send a burst of output packets on a transmit queue of an Ethernet
> * device.
> *
><SNIP>
> */
>#define rte_eth_tx_burst(_pid, _qid, _pkts, _nb_pkts) \
>    rte_pkt_tx_burst((struct rte_pkt_dev *)&rte_eth_devices[_pid], _qid, \
>                     _pkts, _nb_pkts)
>
>A snip of code showing some advantages and use case of using pktdev API:
>------------------------------------------------------------------------
>
>This is not the complete code, it has not been tested, and it is only an
>example of how one could use the design.
>
>/*
> * The lcore main. This is the main thread that does the work, reading
> * from an input port and writing to an output port.
> */
>static __attribute__((noreturn)) void
>do_work(const struct pipeline_params *p)
>{
>    printf("\nCore %u forwarding packets. %s -> %s\n",
>           rte_lcore_id(),
>           p->src->data->name,
>           p->dst->data->name);
>
>    /* Run until the application is quit or killed. */
>    for (;;) {
>        /*
>         * Receive packets on a src device and forward them on out
>         * the dst device.
>         */
>        /* Get burst of RX packets, from first port of pair. */
>        struct rte_mbuf *bufs[BURST_SIZE];
>        const uint16_t nb_rx = rte_pkt_rx_burst(p->src, 0,
>                                                bufs, BURST_SIZE);
>
>        if (unlikely(nb_rx == 0))
>            continue;
>
>        /* Send burst of TX packets, to second port of pair. */
>        const uint16_t nb_tx = rte_pkt_tx_burst(p->dst, 0,
>                                                bufs, nb_rx);
>
>        /* Free any unsent packets. */
>        if (unlikely(nb_tx < nb_rx)) {
>            uint16_t buf;
>            for (buf = nb_tx; buf < nb_rx; buf++)
>                rte_pktmbuf_free(bufs[buf]);
>        }
>    }
>}
>
>/*
> * The main function, which does initialization and calls the per-lcore
> * functions.
> */
>int
>main(int argc, char *argv[])
>{
>    struct pipeline_params p[RTE_MAX_LCORE];
>    struct rte_mempool *mbuf_pool;
>    unsigned nb_ports, lcore_id;
>    uint8_t portid;
>
>    /* Initialize the Environment Abstraction Layer (EAL). */
>    int ret = rte_eal_init(argc, argv);
>    if (ret < 0)
>        rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
>
>    argc -= ret;
>    argv += ret;
>
>    /* Check that there is an even number of ports to send/receive on. */
>    nb_ports = rte_eth_dev_count();
>    if (nb_ports < 2 || (nb_ports & 1))
>        rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
>
>    /* Creates a new mempool in memory to hold the mbufs. */
>    mbuf_pool = rte_mempool_create("MBUF_POOL",
>                                   NUM_MBUFS * nb_ports,
>                                   MBUF_SIZE,
>                                   MBUF_CACHE_SIZE,
>                                   sizeof(struct rte_pktmbuf_pool_private),
>                                   rte_pktmbuf_pool_init, NULL,
>                                   rte_pktmbuf_init, NULL,
>                                   rte_socket_id(),
>                                   0);
>
>    if (mbuf_pool == NULL)
>        rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
>
>    /* Initialize all ports. */
>    for (portid = 0; portid < nb_ports; portid++)
>        if (port_init(portid, mbuf_pool) != 0)
>            rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n",
>                     portid);
>
>    struct rte_pkt_dev *in = rte_eth_get_dev(0);
>    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
>        char name[RTE_RING_NAMESIZE];
>        snprintf(name, sizeof(name), "RING_from_%u", lcore_id);
>        struct rte_pkt_dev *out = rte_ring_get_dev(
>            rte_ring_create(name, 4096, rte_socket_id(), 0));
>
>        p[lcore_id].src = in;
>        p[lcore_id].dst = out;
>        rte_eal_remote_launch((lcore_function_t *)do_work,
>                              &p[lcore_id], lcore_id);
>        in = out; /* next pipeline stage reads from my output. */
>    }
>    /* now finish pipeline on master lcore */
>    lcore_id = rte_lcore_id();
>    p[lcore_id].src = in;
>    p[lcore_id].dst = rte_eth_get_dev(1);
>    do_work(&p[lcore_id]);
>
>    return 0;
>}
>
>
>Changes to rte_ethdev.[ch]
>--------------------------
>
>Most of the changes to rte_ethdev.[ch] were to use the new defines from
>rte_pkt.h. All of the references to the globals in ethdev had to be
>replaced with references to a global structure in ethdev. Moving the
>global or private data into a device-specific structure seemed reasonable
>to reduce name space issues with new devices. The rx_burst/tx_burst
>routines were removed as they now exist in the rte_pktdev.c file. If we
>use nested structures instead of macros, then more of the code will need
>to be converted, or macros used to convert the members to address the
>nested structures.
>
>Example:
>#define rx_pkt_burst dev_data.rx_pkt_burst
>#define tx_pkt_burst dev_data.tx_pkt_burst
>
>
>Impact to Existing Applications
>-------------------------------
>
>None. The existing APIs should all remain unchanged, only the underlying
>library code needs to change. [Obviously changes to apps will be needed to
>take advantage of new device classes as we make them available].
>
>The crypto API could be similar to the Open Crypto APIs, which seem
>reasonable, but using mbufs to hold data is just a way of using that
>container type to provide some common structure to the system. Some of the
>crypto data will be in the form of packets and some in the form of chunks
>of data, which the API should account for in the design.
>
>My goal is to provide a lightweight framework for adding more devices and
>not try to make everything look like an Ethernet device.
>
>Regards,
>++Keith and Bruce
>
>