Re: [dpdk-dev] [PATCH v7] app/testpmd: add forwarding mode to simulate a noisy neighbour
Hi Bernard,

On Mon, Oct 01, 2018 at 01:13:32PM +, Iremonger, Bernard wrote:
>Hi Jens,
>
>> -----Original Message-----
>> From: Jens Freimann [mailto:jfreim...@redhat.com]
>> Sent: Friday, September 21, 2018 2:27 PM
>> To: dev@dpdk.org
>> Cc: ai...@redhat.com; jan.scheur...@ericsson.com; Richardson, Bruce; tho...@monjalon.net; maxime.coque...@redhat.com; Ananyev, Konstantin; Yigit, Ferruh; Iremonger, Bernard; ktray...@redhat.com
>> Subject: [PATCH v7] app/testpmd: add forwarding mode to simulate a noisy neighbor
>
>./devtools/check-git-log.sh -1
>Headline too long:
>app/testpmd: add forwarding mode to simulate a noisy neighbour

I'm sorry, I failed to use checkpatches.sh correctly :) I did:

#> git show | DPDK_CHECKPATCH_PATH="/home/jfreiman/code/linux/scripts/checkpatch.pl" devtools/checkpatches.sh --

1/1 valid patch

I'll fix all errors/warnings and resend.

Thx!

regards,
Jens
[dpdk-dev] [PATCH v8] app/testpmd: add noisy neighbour forwarding mode
This adds a new forwarding mode to testpmd to simulate more realistic
behavior of a guest machine engaged in receiving and sending packets
while performing a Virtual Network Function (VNF).

The goal is to enable a simple way of measuring the performance impact on
cache and memory footprint utilization of various VNFs co-located on the
same host machine. For this it does:

* Buffer packets in a FIFO:

Create a FIFO to buffer received packets. Once it overflows, put those
packets into the actual TX queue. The FIFO is created per TX queue and
its size can be set with the --noisy-tx-sw-buffer-size command-line
parameter.

A second command-line parameter is used to set a timeout in milliseconds
after which the FIFO is flushed.

--noisy-tx-sw-buffer-size [packet numbers]
Keep the mbuf in a FIFO and forward the overflowing packets from the
FIFO. This queue is per TX queue (after all other packet processing).

--noisy-tx-sw-buffer-flushtime [delay]
Flush the packet queue if no packets have been seen during
[delay]. As long as packets are seen, the timer is reset.

Add several options to simulate route lookups (memory reads) in tables
that can be quite large, as well as route hit statistics updates.
These options simulate the whole stack traversal and
will thrash the cache. Memory access is random.

* Simulate route lookups:

Allocate a buffer and perform reads and writes on it as specified by
command-line options:

--noisy-lkup-memory [size]
Size of the VNF internal memory (MB), in which the random
reads/writes will be done, allocated by rte_malloc (hugepages).

--noisy-lkup-num-writes [num]
Number of random writes in memory to perform per packet,
simulating hit-flag updates. 64 bits per write,
all writes in different cache lines.

--noisy-lkup-num-reads [num]
Number of random reads in memory to perform per packet,
simulating FIB/table lookups. 64 bits per read,
all reads in different cache lines.
--noisy-lkup-num-reads-writes [num]
Number of random reads and writes in memory to perform per packet,
simulating stats updates. 64 bits per read-write, all
reads and writes in different cache lines.

Signed-off-by: Jens Freimann
Acked-by: Kevin Traynor
---
v7->v8:
* fix checkpatches.sh warnings/errors

v6->v7:
* fix return value of mem allocation in noisy_begin
* remove blank line
* allow 0 as parameter value

v5->v6:
* fix patch description
* fix comment for pkt_burst_noisy_vnf
* check if flush needed when no packets were received
* free dropped packets
* remove redundant else-if
* do memory access simulation in all cases
* change order of free'd data structures in noisy_fwd_end
* only allocate one noisy_config struct per port
* check for return of rte_ring_create()
* change checking of input parameters in noisy_fwd_begin(). Decided to
  allow parameters to be set to 0 (which is the default)
* did not change use of "=N" in documentation as suggested by Kevin
  because it is how it's done for most other parameters
* make error message match code in checking of flush time parameter
* don't add whitespace in testpmd.h

v4->v5:
* try to minimize impact in common code. Instead implement fwd_begin and fwd_end
* simplify actual fwd function
* remove unnecessary casts (Kevin)
* use more meaningful names for parameters and global variables (Kevin)
* free ring and vnf_mem as well (Kevin)
* squash documentation and code into single patch (Bernard)
* fix patch subject to "app/testpmd" (Bernard)

 app/test-pmd/Makefile | 1 +
 app/test-pmd/meson.build | 1 +
 app/test-pmd/noisy_vnf.c | 279
 app/test-pmd/parameters.c | 60 +
 app/test-pmd/testpmd.c | 35 +++
 app/test-pmd/testpmd.h | 8 +
 doc/guides/testpmd_app_ug/run_app.rst | 33 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 +-
 8 files changed, 422 insertions(+), 2 deletions(-)
 create mode 100644 app/test-pmd/noisy_vnf.c

diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
index 2b4d604b8..e2581ca66 100644
--- a/app/test-pmd/Makefile
+++ b/app/test-pmd/Makefile
@@ -33,6 +33,7 @@ SRCS-y += rxonly.c
 SRCS-y += txonly.c
 SRCS-y += csumonly.c
 SRCS-y += icmpecho.c
+SRCS-y += noisy_vnf.c
 SRCS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ieee1588fwd.c

 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_cmd.c
diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
index a0b3be07f..9ef6ed957 100644
--- a/app/test-pmd/meson.build
+++ b/app/test-pmd/meson.build
@@ -17,6 +17,7 @@ sources = files('cmdline.c',
 	'iofwd.c',
 	'macfwd.c',
 	'macswap.c',
+	'noisy_vnf.c',
 	'parameters.c',
 	'rxonly.c',
 	'testpmd.c',
diff --git a/app/test-pmd/noisy_vnf.c b/app/test-pmd/noisy_vnf.c
new file mode 100644
index 0..58c4ee925
--- /dev/null
+++ b/app/test-pmd/noisy_vnf.c
@@ -0,0 +1,279 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Red Ha
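
The body of noisy_vnf.c is cut off in this archive, so the following is only a rough sketch of the lookup simulation the commit message describes, not the code from the patch. It assumes a 64-byte cache line and uses a small xorshift generator; all demo_* names are made up for illustration. Random selection makes it unlikely, but does not guarantee, that every access lands in a different cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, i.e. eight 64-bit words per line. */
#define DEMO_WORDS_PER_LINE 8

/* One xorshift64 step; the state must be non-zero and never becomes zero. */
static uint64_t
demo_rand(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/*
 * Perform num_writes random 64-bit writes followed by num_reads random
 * 64-bit reads, each on the first word of a randomly chosen cache line.
 * Returns the XOR of all values read, so the reads have a visible result.
 */
static uint64_t
demo_sim_lookups(uint64_t *mem, size_t n_words, unsigned int num_reads,
		 unsigned int num_writes, uint64_t *rng)
{
	size_t n_lines = n_words / DEMO_WORDS_PER_LINE;
	volatile uint64_t *vmem = mem;	/* keep the compiler from eliding accesses */
	uint64_t acc = 0;
	unsigned int i;

	assert(mem != NULL && rng != NULL && *rng != 0 && n_lines > 0);

	for (i = 0; i < num_writes; i++) {
		size_t line = demo_rand(rng) % n_lines;

		vmem[line * DEMO_WORDS_PER_LINE] = demo_rand(rng);
	}
	for (i = 0; i < num_reads; i++) {
		size_t line = demo_rand(rng) % n_lines;

		acc ^= vmem[line * DEMO_WORDS_PER_LINE];
	}
	return acc;
}
```

In a forwarding loop, something like demo_sim_lookups() would presumably be called once per received packet with the values given to --noisy-lkup-num-reads and --noisy-lkup-num-writes.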
Re: [dpdk-dev] [PATCH v7] app/testpmd: add forwarding mode to simulate a noisy neighbour
02/10/2018 09:19, Jens Freimann:
> On Mon, Oct 01, 2018 at 01:13:32PM +, Iremonger, Bernard wrote:
> >./devtools/check-git-log.sh -1
> >Headline too long:
> >app/testpmd: add forwarding mode to simulate a noisy neighbour
>
> I'm sorry, I failed to use checkpatches.sh correctly :) I did:
>
> #> git show |
> DPDK_CHECKPATCH_PATH="/home/jfreiman/code/linux/scripts/checkpatch.pl"
> devtools/checkpatches.sh --
>
> 1/1 valid patch

Why is this command not correct?
Re: [dpdk-dev] [PATCH] net/softnic: add flow flush API
> --- a/drivers/net/softnic/rte_eth_softnic_flow.c
> +++ b/drivers/net/softnic/rte_eth_softnic_flow.c
> @@ -1915,6 +1915,50 @@ pmd_flow_destroy(struct rte_eth_dev *dev,
>  	return 0;
>  }
>
> +static int
> +pmd_flow_flush(struct rte_eth_dev *dev,
> +	struct rte_flow_error *error)
> +{
> +	struct pmd_internals *softnic = dev->data->dev_private;
> +	struct pipeline *pipeline;
> +	int status;
> +	uint32_t i = 0;
> +
> +	TAILQ_FOREACH(pipeline, &softnic->pipeline_list, node) {

Removing elements while iterating over tailq lists with the TAILQ_FOREACH macro is not safe. Instead, use TAILQ_FOREACH_SAFE for safe tailq element removal within the loop.

> +		/* Remove all the flows added to the tables. */
> +		for (i = 0; i < pipeline->n_tables; i++) {
> +			struct softnic_table *table;
> +			struct rte_flow *flow;
> +
> +			table = &pipeline->table[i];
> +			TAILQ_FOREACH(flow, &table->flows, node) {
> +				/* Rule delete. */
> +				status = softnic_pipeline_table_rule_delete
> +						(softnic,
> +						flow->pipeline->name,
> +						flow->table_id,
> +						&flow->match);
> +				if (status)
> +					return rte_flow_error_set(error,
> +						EINVAL,
> +						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +						NULL,
> +						"Pipeline table rule delete failed");
> +
> +				/* Update dependencies */
> +				if (is_meter_action_enable(softnic, table))
> +				flow_meter_owner_reset(softnic, flow);

Fix indentation here.

> +				/* Flow delete. */
> +				TAILQ_REMOVE(&table->flows, flow, node);
> +				free(flow);
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static int
>  pmd_flow_query(struct rte_eth_dev *dev __rte_unused,
>  	struct rte_flow *flow,
> @@ -1971,7 +2015,7 @@ const struct rte_flow_ops pmd_flow_ops = {
>  	.validate = pmd_flow_validate,
>  	.create = pmd_flow_create,
>  	.destroy = pmd_flow_destroy,
> -	.flush = NULL,
> +	.flush = pmd_flow_flush,
>  	.query = pmd_flow_query,
>  	.isolate = NULL,
>  };
> --
> 2.17.1
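
To illustrate the TAILQ_FOREACH_SAFE comment above, here is a minimal standalone sketch of removing every element while iterating. The demo_flow type and function names are made up for illustration and are not part of the softnic driver. glibc's <sys/queue.h> does not ship the _SAFE variant, so the sketch carries a fallback definition equivalent to the BSD one.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Fallback for C libraries whose <sys/queue.h> lacks the _SAFE iterator. */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = TAILQ_FIRST((head));				\
	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1);		\
	    (var) = (tvar))
#endif

/* Hypothetical stand-in for struct rte_flow. */
struct demo_flow {
	TAILQ_ENTRY(demo_flow) node;
	int id;
};

TAILQ_HEAD(demo_flow_list, demo_flow);

/* Append a new flow; returns 0 on success, -1 on allocation failure. */
static int
demo_flow_add(struct demo_flow_list *list, int id)
{
	struct demo_flow *flow;

	assert(list != NULL);
	flow = calloc(1, sizeof(*flow));
	if (flow == NULL)
		return -1;
	flow->id = id;
	TAILQ_INSERT_TAIL(list, flow, node);
	return 0;
}

/* Remove and free every flow; returns the number of flows removed. */
static unsigned int
demo_flow_flush(struct demo_flow_list *list)
{
	struct demo_flow *flow, *next;
	unsigned int removed = 0;

	assert(list != NULL);
	/* 'next' is saved before the body runs, so freeing 'flow' is safe. */
	TAILQ_FOREACH_SAFE(flow, list, node, next) {
		TAILQ_REMOVE(list, flow, node);
		free(flow);
		removed++;
	}
	return removed;
}
```

With plain TAILQ_FOREACH the loop increment would read flow->node after free(flow), which is the problem the review comment points at.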
Re: [dpdk-dev] [PATCH v3 00/15] bnxt patchset
On 9/29/2018 2:59 AM, Ajit Khaparde wrote:
> Patchset against dpdk-next-net.
>
> v1->v2:
> net/bnxt: get rid of ff pools array and use the vnic info array instead
> - Fix access to uninitialized variable.
> - Rectify the wrong 'Fixes' reference.
>
> net/bnxt: update HWRM version
> - Update from 1.9.2.45 to version 1.9.2.53
>
> v2->v3:
> net/bnxt: update HWRM version
> - Tried to split into more than one patch.
>
> - Updated commit logs and messages for rest based on review comments.
>
> Please apply.
>
> Ajit Khaparde (10):
>   net/bnxt: fix MTU setting
>   net/bnxt: update HWRM version
>   net/bnxt: update HWRM version part 2
>   net/bnxt: update HWRM version part 3

Is there a logical separation between parts 1, 2 and 3? The commit logs are empty and nothing distinguishes the commits from one another.

If the separation is not logical but just a physical split into 3 pieces, I am for merging them back with the "Update the HWRM API to version 1.9.2.53" commit log. Or, if there is a logic, please clarify it in the patch subject and commit log.

I will wait for your answer before moving on.

Thanks,
ferruh
Re: [dpdk-dev] [PATCH v2 1/2] eal: add eal option to configure iova mode
On 10/1/2018 5:00 PM, Eric Zhang wrote:
>
>
> On 09/26/2018 08:42 AM, Burakov, Anatoly wrote:
>> On 18-Sep-18 8:10 PM, eric zhang wrote:
>>> From: Santosh Shukla
>>>
>>> In the case of user don't want to use bus iova scheme and want
>>> to override.
>>>
>>> For that, Adding eal option --iova-mode= where valid input
>>> string is 'pa' or 'va'.
>>>
>>> Signed-off-by: Santosh Shukla
>>> Signed-off-by: Jerin Jacob
>>> ---
>>
>> Needs documentation update in Programmer's Guide to explain why such a
>> thing might be needed, and update EAL parameter guides.
>>
>> For the patch itself,
>> Acked-by: Anatoly Burakov
> Thanks Anatoly. Documentations were updated and patch is at
> http://patchwork.dpdk.org/patch/45785/. Would you please give a review?

I suggest sending a new version of this patchset with that patch included, instead of two separate patches. That makes life easier for people who need to follow that dependency, and it is good for keeping a record for the future.
Re: [dpdk-dev] [PATCH v6 5/5] vhost: message handling implemented as a callback array
Hi Nikolay,

On 09/24/2018 10:17 PM, Nikolay Nikolaev wrote:
> Introduce vhost_message_handlers, which maps the message request
> type to the message handler. Then replace the switch construct
> with a map and call.
>
> Failing vhost_user_set_features is fatal and all processing should
> stop immediately and propagate the error to the upper layers. Change
> the code accordingly to reflect that.
>
> Signed-off-by: Nikolay Nikolaev
> ---
>  lib/librte_vhost/vhost_user.c | 150 -
>  1 file changed, 57 insertions(+), 93 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index e1b705fa7..f6ce8e092 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1477,6 +1477,35 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
>  	return VH_RESULT_OK;
>  }
>
> +typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
> +					struct VhostUserMsg *msg);
> +static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
> +	[VHOST_USER_NONE] = NULL,
> +	[VHOST_USER_GET_FEATURES] = vhost_user_get_features,
> +	[VHOST_USER_SET_FEATURES] = vhost_user_set_features,
> +	[VHOST_USER_SET_OWNER] = vhost_user_set_owner,
> +	[VHOST_USER_RESET_OWNER] = vhost_user_reset_owner,
> +	[VHOST_USER_SET_MEM_TABLE] = vhost_user_set_mem_table,
> +	[VHOST_USER_SET_LOG_BASE] = vhost_user_set_log_base,
> +	[VHOST_USER_SET_LOG_FD] = vhost_user_set_log_fd,
> +	[VHOST_USER_SET_VRING_NUM] = vhost_user_set_vring_num,
> +	[VHOST_USER_SET_VRING_ADDR] = vhost_user_set_vring_addr,
> +	[VHOST_USER_SET_VRING_BASE] = vhost_user_set_vring_base,
> +	[VHOST_USER_GET_VRING_BASE] = vhost_user_get_vring_base,
> +	[VHOST_USER_SET_VRING_KICK] = vhost_user_set_vring_kick,
> +	[VHOST_USER_SET_VRING_CALL] = vhost_user_set_vring_call,
> +	[VHOST_USER_SET_VRING_ERR] = vhost_user_set_vring_err,
> +	[VHOST_USER_GET_PROTOCOL_FEATURES] = vhost_user_get_protocol_features,
> +	[VHOST_USER_SET_PROTOCOL_FEATURES] = vhost_user_set_protocol_features,
> +	[VHOST_USER_GET_QUEUE_NUM] = vhost_user_get_queue_num,
> +	[VHOST_USER_SET_VRING_ENABLE] = vhost_user_set_vring_enable,
> +	[VHOST_USER_SEND_RARP] = vhost_user_send_rarp,
> +	[VHOST_USER_NET_SET_MTU] = vhost_user_net_set_mtu,
> +	[VHOST_USER_SET_SLAVE_REQ_FD] = vhost_user_set_req_fd,
> +	[VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
> +};
> +
> +
>  /* return bytes# of read on success or negative val on failure. */
>  static int
>  read_vhost_message(int sockfd, struct VhostUserMsg *msg)
> @@ -1630,6 +1659,7 @@ vhost_user_msg_handler(int vid, int fd)
>  	int ret;
>  	int unlock_required = 0;
>  	uint32_t skip_master = 0;
> +	int request;
>
>  	dev = get_device(vid);
>  	if (dev == NULL)
> @@ -1722,100 +1752,34 @@ vhost_user_msg_handler(int vid, int fd)
>  		goto skip_to_post_handle;
>  	}
>
> -	switch (msg.request.master) {
> -	case VHOST_USER_GET_FEATURES:
> -		ret = vhost_user_get_features(&dev, &msg);
> -		send_vhost_reply(fd, &msg);
> -		break;
> -	case VHOST_USER_SET_FEATURES:
> -		ret = vhost_user_set_features(&dev, &msg);
> -		break;
> -
> -	case VHOST_USER_GET_PROTOCOL_FEATURES:
> -		ret = vhost_user_get_protocol_features(&dev, &msg);
> -		send_vhost_reply(fd, &msg);
> -		break;
> -	case VHOST_USER_SET_PROTOCOL_FEATURES:
> -		ret = vhost_user_set_protocol_features(&dev, &msg);
> -		break;
> -
> -	case VHOST_USER_SET_OWNER:
> -		ret = vhost_user_set_owner(&dev, &msg);
> -		break;
> -	case VHOST_USER_RESET_OWNER:
> -		ret = vhost_user_reset_owner(&dev, &msg);
> -		break;
> -
> -	case VHOST_USER_SET_MEM_TABLE:
> -		ret = vhost_user_set_mem_table(&dev, &msg);
> -		break;
> -
> -	case VHOST_USER_SET_LOG_BASE:
> -		ret = vhost_user_set_log_base(&dev, &msg);
> -		if (ret)
> -			goto skip_to_reply;
> -		/* it needs a reply */
> -		send_vhost_reply(fd, &msg);
> -		break;
> -	case VHOST_USER_SET_LOG_FD:
> -		ret = vhost_user_set_log_fd(&dev, &msg);
> -		break;
> -
> -	case VHOST_USER_SET_VRING_NUM:
> -		ret = vhost_user_set_vring_num(&dev, &msg);
> -		break;
> -	case VHOST_USER_SET_VRING_ADDR:
> -		ret = vhost_user_set_vring_addr(&dev, &msg);
> -		break;
> -	case VHOST_USER_SET_VRING_BASE:
> -		ret = vhost_user_set_vring_base(&dev, &msg);
> -		break;
> -
> -	case VHOST_USER_GET_VRING_BASE:
> -		ret = vhost_user_get_vring_base(&dev, &msg);
> -		if (ret)
> -			goto skip_to_reply;
> -		send_vhost_reply(fd, &msg);
> -		bre
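
The replacement code for the switch is cut off in this archive. As a rough illustration of the "map and call" idea from the commit message, here is a standalone sketch of dispatching through a handler array indexed by request type. The demo_* names and the request values are hypothetical and do not come from librte_vhost; the range and NULL checks are an assumption about what a dispatcher of this kind would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request IDs standing in for the VHOST_USER_* values. */
enum {
	DEMO_REQ_NONE = 0,
	DEMO_REQ_GET_FEATURES,
	DEMO_REQ_SET_FEATURES,
	DEMO_REQ_UNHANDLED,	/* deliberately left without a handler */
	DEMO_REQ_MAX
};

struct demo_msg {
	int request;
	unsigned long long payload;
};

typedef int (*demo_handler_t)(struct demo_msg *msg);

static int
demo_get_features(struct demo_msg *msg)
{
	msg->payload = 0x3ULL;	/* pretend feature bits */
	return 0;
}

static int
demo_set_features(struct demo_msg *msg)
{
	/* Reject bits outside the pretend supported set. */
	return (msg->payload & ~0x3ULL) ? -1 : 0;
}

/* Designated initializers; slots not listed here are NULL. */
static const demo_handler_t demo_handlers[DEMO_REQ_MAX] = {
	[DEMO_REQ_GET_FEATURES] = demo_get_features,
	[DEMO_REQ_SET_FEATURES] = demo_set_features,
};

/* Returns the handler's result, or -1 for unknown or unhandled requests. */
static int
demo_dispatch(struct demo_msg *msg)
{
	assert(msg != NULL);
	if (msg->request <= DEMO_REQ_NONE || msg->request >= DEMO_REQ_MAX)
		return -1;
	if (demo_handlers[msg->request] == NULL)
		return -1;
	return demo_handlers[msg->request](msg);
}
```

One consequence of this layout is that per-request behaviour that used to live in the switch arms, such as whether a reply is sent, has to move either into the handlers or into a second table.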
[dpdk-dev] [PATCH v6 01/10] examples/power: add checks around hypervisor
Allow vm_power_manager to run without requiring qemu to be present
on the machine. This will be required for instances where the JSON
interface is used for commands and policies, without any VMs present.
A use case for this is a container environment.

Signed-off-by: David Hunt
Acked-by: Anatoly Burakov
---
 examples/vm_power_manager/channel_manager.c | 71 +
 1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c
index 927fc35ab..2e471d0c1 100644
--- a/examples/vm_power_manager/channel_manager.c
+++ b/examples/vm_power_manager/channel_manager.c
@@ -43,7 +43,8 @@ static unsigned char *global_cpumaps;
 static virVcpuInfo *global_vircpuinfo;
 static size_t global_maplen;

-static unsigned global_n_host_cpus;
+static unsigned int global_n_host_cpus;
+static bool global_hypervisor_available;

 /*
  * Represents a single Virtual Machine
@@ -198,7 +199,11 @@ get_pcpus_mask(struct channel_info *chan_info, unsigned vcpu)
 {
 	struct virtual_machine_info *vm_info =
 			(struct virtual_machine_info *)chan_info->priv_info;
-	return rte_atomic64_read(&vm_info->pcpu_mask[vcpu]);
+
+	if (global_hypervisor_available && (vm_info != NULL))
+		return rte_atomic64_read(&vm_info->pcpu_mask[vcpu]);
+	else
+		return 0;
 }

 static inline int
@@ -559,6 +564,8 @@ get_all_vm(int *num_vm, int *num_vcpu)
 			VIR_CONNECT_LIST_DOMAINS_PERSISTENT;
 	unsigned int domain_flag = VIR_DOMAIN_VCPU_CONFIG;

+	if (!global_hypervisor_available)
+		return;

 	memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen);
 	if (virNodeGetInfo(global_vir_conn_ptr, &node_info)) {
@@ -768,38 +775,42 @@ connect_hypervisor(const char *path)
 	}
 	return 0;
 }
-
 int
-channel_manager_init(const char *path)
+channel_manager_init(const char *path __rte_unused)
 {
 	virNodeInfo info;

 	LIST_INIT(&vm_list_head);
 	if (connect_hypervisor(path) < 0) {
-		RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to initialize channel manager\n");
-		return -1;
-	}
-
-	global_maplen = VIR_CPU_MAPLEN(CHANNEL_CMDS_MAX_CPUS);
+		global_n_host_cpus = 64;
+		global_hypervisor_available = 0;
+		RTE_LOG(INFO, CHANNEL_MANAGER, "Unable to initialize channel manager\n");
+	} else {
+		global_hypervisor_available = 1;
+
+		global_maplen = VIR_CPU_MAPLEN(CHANNEL_CMDS_MAX_CPUS);
+
+		global_vircpuinfo = rte_zmalloc(NULL,
+				sizeof(*global_vircpuinfo) *
+				CHANNEL_CMDS_MAX_CPUS, RTE_CACHE_LINE_SIZE);
+		if (global_vircpuinfo == NULL) {
+			RTE_LOG(ERR, CHANNEL_MANAGER, "Error allocating memory for CPU Info\n");
+			goto error;
+		}
+		global_cpumaps = rte_zmalloc(NULL,
+				CHANNEL_CMDS_MAX_CPUS * global_maplen,
+				RTE_CACHE_LINE_SIZE);
+		if (global_cpumaps == NULL)
+			goto error;

-	global_vircpuinfo = rte_zmalloc(NULL, sizeof(*global_vircpuinfo) *
-			CHANNEL_CMDS_MAX_CPUS, RTE_CACHE_LINE_SIZE);
-	if (global_vircpuinfo == NULL) {
-		RTE_LOG(ERR, CHANNEL_MANAGER, "Error allocating memory for CPU Info\n");
-		goto error;
-	}
-	global_cpumaps = rte_zmalloc(NULL, CHANNEL_CMDS_MAX_CPUS * global_maplen,
-			RTE_CACHE_LINE_SIZE);
-	if (global_cpumaps == NULL) {
-		goto error;
+		if (virNodeGetInfo(global_vir_conn_ptr, &info)) {
+			RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node Info\n");
+			goto error;
+		}
+		global_n_host_cpus = (unsigned int)info.cpus;
 	}

-	if (virNodeGetInfo(global_vir_conn_ptr, &info)) {
-		RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node Info\n");
-		goto error;
-	}
-	global_n_host_cpus = (unsigned)info.cpus;

 	if (global_n_host_cpus > CHANNEL_CMDS_MAX_CPUS) {
 		RTE_LOG(WARNING, CHANNEL_MANAGER, "The number of host CPUs(%u) exceeds the "
@@ -811,7 +822,8 @@ channel_manager_init(const char *path)
 	return 0;
 error:
-	disconnect_hypervisor();
+	if (global_hypervisor_available)
+		disconnect_hypervisor();
 	return -1;
 }

@@ -838,7 +850,10 @@ channel_manager_exit(void)
 		rte_free(vm_info);
 	}

-	rte_free(global_cpumaps);
-	rte_free(global_vircpuinfo);
-	disconnect_hypervisor();
+	if (global_hypervisor_available) {
+		/* Only needed if hypervisor available */
+
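
The pattern in this patch is an "optional backend" flag: initialisation no longer fails when the hypervisor is missing, and every later use of hypervisor resources is guarded by the flag. A minimal standalone sketch of that idea follows; the demo_* names and the fake connect routine are made up, and only the fallback value of 64 host CPUs is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_DEFAULT_HOST_CPUS 64	/* fallback value used in the diff */

struct demo_manager {
	bool hypervisor_available;
	unsigned int n_host_cpus;
	unsigned char *cpumaps;	/* only allocated when a hypervisor is present */
};

/* Hypothetical connect routine: "succeeds" only when a path is given. */
static int
demo_connect(const char *path)
{
	return path != NULL ? 0 : -1;
}

/* Returns 0 on success (with or without hypervisor), -1 on allocation failure. */
static int
demo_manager_init(struct demo_manager *mgr, const char *path,
		  unsigned int detected_cpus)
{
	assert(mgr != NULL && detected_cpus > 0);
	mgr->cpumaps = NULL;

	if (demo_connect(path) < 0) {
		/* No hypervisor: carry on with defaults instead of failing. */
		mgr->hypervisor_available = false;
		mgr->n_host_cpus = DEMO_DEFAULT_HOST_CPUS;
		return 0;
	}

	mgr->hypervisor_available = true;
	mgr->cpumaps = calloc(detected_cpus, sizeof(*mgr->cpumaps));
	if (mgr->cpumaps == NULL)
		return -1;
	mgr->n_host_cpus = detected_cpus;
	return 0;
}

/* Release hypervisor-only resources; a no-op when none were set up. */
static void
demo_manager_exit(struct demo_manager *mgr)
{
	assert(mgr != NULL);
	if (mgr->hypervisor_available) {
		free(mgr->cpumaps);
		mgr->cpumaps = NULL;
		mgr->hypervisor_available = false;
	}
}
```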
[dpdk-dev] [PATCH v6 03/10] lib/power: add changes for host commands/policies
This patch does a couple of things:
* Adds a new message type for removing policies (PKT_POLICY_REMOVE),
  used when we want to remove a previously created policy.
* Adds a core_type bool to the channel packet struct to specify whether
  the type of core we want to control is virtual or physical.

Signed-off-by: David Hunt
Acked-by: Anatoly Burakov
---
 lib/librte_power/channel_commands.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h
index ee638eefa..e7b93a797 100644
--- a/lib/librte_power/channel_commands.h
+++ b/lib/librte_power/channel_commands.h
@@ -19,6 +19,7 @@ extern "C" {
 #define CPU_POWER 1
 #define CPU_POWER_CONNECT 2
 #define PKT_POLICY 3
+#define PKT_POLICY_REMOVE 4

 /* CPU Power Command Scaling */
 #define CPU_POWER_SCALE_UP 1
@@ -58,6 +59,9 @@ struct traffic {
 	uint32_t max_max_packet_thresh;
 };

+#define CORE_TYPE_VIRTUAL 0
+#define CORE_TYPE_PHYSICAL 1
+
 struct channel_packet {
 	uint64_t resource_id; /**< core_num, device */
 	uint32_t unit; /**< scale down/up/min/max */
@@ -70,6 +74,7 @@ struct channel_packet {
 	uint8_t vcpu_to_control[MAX_VCPU_PER_VM];
 	uint8_t num_vcpu;
 	struct timer_profile timer_policy;
+	bool core_type;
 	enum workload workload;
 	enum policy_to_use policy_to_use;
 	struct t_boost_status t_boost_status;
--
2.17.1
[dpdk-dev] [PATCH v6 04/10] examples/power: add necessary changes to guest app
The changes here are minimal, as the guest app functionality is not changing
at all, but there is a new element in the channel_packet struct that needs
to have a default set (channel_packet->core_type).

Signed-off-by: David Hunt
Acked-by: Anatoly Burakov
---
 examples/vm_power_manager/guest_cli/vm_power_cli_guest.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
index 0db1b804f..2d9e7689a 100644
--- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
+++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
@@ -92,6 +92,7 @@ set_policy_defaults(struct channel_packet *pkt)
 	pkt->timer_policy.hours_to_use_traffic_profile[0] = 8;
 	pkt->timer_policy.hours_to_use_traffic_profile[1] = 10;

+	pkt->core_type = CORE_TYPE_VIRTUAL;
 	pkt->workload = LOW;
 	pkt->policy_to_use = TIME;
 	pkt->command = PKT_POLICY;
--
2.17.1
[dpdk-dev] [PATCH v6 02/10] examples/power: allow for number of vms to be zero
Previously the vm_power_manager app required some VMs to be defined,
so the call to get_all_vm() always set the noVms variable. Now we're
accepting policies from the host OS (without any VMs defined), so it is
now valid to have zero VMs. This patch initialises the relevant variables
to zero in case the call to get_all_vm() does not find any VMs and
returns without setting them.

Signed-off-by: David Hunt
Acked-by: Anatoly Burakov
---
 examples/vm_power_manager/channel_monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c
index 7fa47ba97..f180d74e6 100644
--- a/examples/vm_power_manager/channel_monitor.c
+++ b/examples/vm_power_manager/channel_monitor.c
@@ -66,7 +66,7 @@
 static void
 core_share_status(int pNo)
 {

-	int noVms, noVcpus, z, x, t;
+	int noVms = 0, noVcpus = 0, z, x, t;

 	get_all_vm(&noVms, &noVcpus);

--
2.17.1
[dpdk-dev] [PATCH v6 0/10] add json power policy interface for containers
The current vm_power_manager example app has the capability to accept power
policies from virtual machines via virtio-serial channels. These power policies
allow a virtual machine to give information to the power manager so that the
power manager can take care of the power management of the virtual machine
based on the information in the policy.

This power policy functionality is limited to virtual machines sending the
policies to the power manager (which runs in the Host OS), and a solution was
needed for additional methods of sending power policies to the power manager
app.

The main use case for this modification is for containers and host applications
that wish to send policies to the power manager.

This patchset adds the capability to send power policies and power commands to
the vm_power_manager app via JSON strings through a fifo on the file system.

For example, given the following file, policy.json:

{"policy": {
  "name": "ubuntu2",
  "command": "create",
  "policy_type": "TIME",
  "busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ],
  "quiet_hours":[ 2, 3, 4, 5, 6 ],
  "core_list":[ 11, 12, 13 ]
}}

Then running the command:

cat policy.json >/tmp/powermonitor/fifo

The policy is sent to the vm_power_manager. The power manager app then parses
the JSON data, and inserts the policy into the array of policies.

Part of the patch series contains documentation updates to give all the details
of the valid name-value pairs, the data types, etc.

Patch v2:
* Fixed review comments from Stephen Hemminger and Lei A Yao.
* Added a check in the Makefile for libjansson-dev. Will warn the user and
  build without JSON functionality if not present, will build including
  JSON functionality if it is present.

Patch v3:
* Added meson/ninja support for vm_power_manager and guest_cli apps
* Fixed compilation issue with guest_cli app

Patch v4:
* Split out some unrelated changes to separate patches in the set
* Some changes out of review by Anatoly (Thanks!)

Patch v5:
* Removed the directory with JSON examples, as they already exist in the
  documentation.
* Fixed some typos and formatting issues in the documentation.
* Changed the JSON examples in the documentation to 'javascript', causing
  the syntax to be highlighted nicely.
* Inherited the Acks from previous version.

Patch v6:
* Added ability to set WORKLOAD policy to LOW, MEDIUM, or HIGH.
  "workload": "MEDIUM"
* Added missing functionality to allow passing of a list of mac addresses
  for the TRAFFIC profile type.
  "mac_list":[ "de:ad:be:ef:01:01", "de:ad:be:ef:01:02" ]
* Updated docs to include both of the above additions.

[01/10] examples/power: add checks around hypervisor
[02/10] examples/power: allow for number of vms to be zero
[03/10] lib/power: add changes for host commands/policies
[04/10] examples/power: add necessary changes to guest app
[05/10] examples/power: add host channel to power manager
[06/10] examples/power: increase allowed number of clients
[07/10] examples/power: add json string handling
[08/10] examples/power: clean up verbose messages
[09/10] examples/power: add meson/ninja build support
[10/10] doc/vm_power_manager: add JSON interface API info
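
For a container or host application that wants to do the equivalent of the "cat policy.json >/tmp/powermonitor/fifo" command from its own code, a sketch along the following lines should be enough. demo_send_policy() is a made-up helper, not part of the patchset; it assumes the FIFO has already been created by the power manager, and note that opening a FIFO for writing blocks until a reader has it open.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a JSON policy string to the given FIFO path.
 * Returns 0 on success, -1 if the path cannot be opened or a write fails.
 */
static int
demo_send_policy(const char *fifo_path, const char *json)
{
	size_t len, off = 0;
	int fd;

	assert(fifo_path != NULL && json != NULL);
	len = strlen(json);

	/* No O_CREAT: the FIFO is expected to exist already. */
	fd = open(fifo_path, O_WRONLY);
	if (fd < 0)
		return -1;

	while (off < len) {
		ssize_t n = write(fd, json + off, len - off);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			close(fd);
			return -1;
		}
		off += (size_t)n;
	}
	return close(fd) == 0 ? 0 : -1;
}
```

A caller would pass "/tmp/powermonitor/fifo" as the path and the contents of policy.json as the string.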
[dpdk-dev] [PATCH v6 06/10] examples/power: increase allowed number of clients
Now that we're handling host policies, containers and virtual machines,
we'll rename MAX_VMS to MAX_CLIENTS, and increase from 4 to 64.

Signed-off-by: David Hunt
Acked-by: Anatoly Burakov
---
 examples/vm_power_manager/channel_manager.h | 4 ++--
 examples/vm_power_manager/channel_monitor.c | 10 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h
index e32235b07..d948b304c 100644
--- a/examples/vm_power_manager/channel_manager.h
+++ b/examples/vm_power_manager/channel_manager.h
@@ -37,7 +37,7 @@ struct sockaddr_un _sockaddr_un;
 #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path)
 #endif

-#define MAX_VMS 4
+#define MAX_CLIENTS 64
 #define MAX_VCPUS 20


@@ -47,7 +47,7 @@ struct libvirt_vm_info {
 	uint8_t num_cpus;
 };

-struct libvirt_vm_info lvm_info[MAX_VMS];
+struct libvirt_vm_info lvm_info[MAX_CLIENTS];
 /* Communication Channel Status */
 enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0,
 	CHANNEL_MGR_CHANNEL_CONNECTED,
diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c
index c3c3d7bb1..53a4efe45 100644
--- a/examples/vm_power_manager/channel_monitor.c
+++ b/examples/vm_power_manager/channel_monitor.c
@@ -41,7 +41,7 @@ static volatile unsigned run_loop = 1;
 static int global_event_fd;
 static unsigned int policy_is_set;
 static struct epoll_event *global_events_list;
-static struct policy policies[MAX_VMS];
+static struct policy policies[MAX_CLIENTS];

 void channel_monitor_exit(void)
 {
@@ -199,7 +199,7 @@ update_policy(struct channel_packet *pkt)

 	RTE_LOG(INFO, CHANNEL_MONITOR, "Applying policy for %s\n", pkt->vm_name);

-	for (i = 0; i < MAX_VMS; i++) {
+	for (i = 0; i < MAX_CLIENTS; i++) {
 		if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) {
 			/* Copy the contents of *pkt into the policy.pkt */
 			policies[i].pkt = *pkt;
@@ -214,7 +214,7 @@ update_policy(struct channel_packet *pkt)
 		}
 	}
 	if (!updated) {
-		for (i = 0; i < MAX_VMS; i++) {
+		for (i = 0; i < MAX_CLIENTS; i++) {
 			if (policies[i].enabled == 0) {
 				policies[i].pkt = *pkt;
 				get_pcpu_to_control(&policies[i]);
@@ -238,7 +238,7 @@ remove_policy(struct channel_packet *pkt __rte_unused)
 	 * Disabling the policy is simply a case of setting
 	 * enabled to 0
 	 */
-	for (i = 0; i < MAX_VMS; i++) {
+	for (i = 0; i < MAX_CLIENTS; i++) {
 		if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) {
 			policies[i].enabled = 0;
 			return 0;
@@ -609,7 +609,7 @@ run_channel_monitor(void)
 		if (policy_is_set) {
 			int j;

-			for (j = 0; j < MAX_VMS; j++) {
+			for (j = 0; j < MAX_CLIENTS; j++) {
 				if (policies[j].enabled == 1)
 					apply_policy(&policies[j]);
 			}
--
2.17.1
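
The loops touched by this diff implement a fixed-size policy table: update a slot whose name matches, otherwise take the first disabled slot, and remove by clearing the enabled flag. The sketch below shows that shape in isolation. The demo_* names are made up, and unlike the quoted code the lookups here only match enabled slots, which is a deliberate simplification rather than the app's behaviour.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_CLIENTS 64
#define DEMO_NAME_LEN 32

struct demo_policy {
	char name[DEMO_NAME_LEN];
	int enabled;
	int value;	/* hypothetical stand-in for the channel packet */
};

/*
 * Update the enabled slot named 'name', or claim the first free slot.
 * Returns the slot index, or -1 if the table is full.
 */
static int
demo_update_policy(struct demo_policy *table, const char *name, int value)
{
	int i;

	assert(table != NULL && name != NULL);
	for (i = 0; i < DEMO_MAX_CLIENTS; i++) {
		if (table[i].enabled && strcmp(table[i].name, name) == 0) {
			table[i].value = value;
			return i;
		}
	}
	for (i = 0; i < DEMO_MAX_CLIENTS; i++) {
		if (!table[i].enabled) {
			snprintf(table[i].name, DEMO_NAME_LEN, "%s", name);
			table[i].value = value;
			table[i].enabled = 1;
			return i;
		}
	}
	return -1;
}

/* Disable the slot named 'name'. Returns 0 if found, -1 otherwise. */
static int
demo_remove_policy(struct demo_policy *table, const char *name)
{
	int i;

	assert(table != NULL && name != NULL);
	for (i = 0; i < DEMO_MAX_CLIENTS; i++) {
		if (table[i].enabled && strcmp(table[i].name, name) == 0) {
			table[i].enabled = 0;
			return 0;
		}
	}
	return -1;
}
```

Both operations are linear scans, so raising the limit from 4 to 64 makes each update or removal at most 64 string comparisons, which should be negligible at the rate policies arrive.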
[dpdk-dev] [PATCH v6 07/10] examples/power: add json string handling
Add JSON string handling to vm_power_manager for JSON strings received
through the fifo. The format of the JSON strings is detailed in the
next patch, the vm_power_manager user guide documentation updates.

This patch introduces a new dependency on Jansson, a C library for encoding,
decoding and manipulating JSON data. To compile the sample app you now need
to have installed libjansson4 and libjansson-dev (these may be named slightly
differently depending on your Operating System).

Signed-off-by: David Hunt
Acked-by: Anatoly Burakov
---
 examples/vm_power_manager/Makefile | 6 +
 examples/vm_power_manager/channel_monitor.c | 371 ++--
 2 files changed, 352 insertions(+), 25 deletions(-)

diff --git a/examples/vm_power_manager/Makefile b/examples/vm_power_manager/Makefile
index 13a5205ba..50147c05d 100644
--- a/examples/vm_power_manager/Makefile
+++ b/examples/vm_power_manager/Makefile
@@ -31,6 +31,12 @@ CFLAGS += $(WERROR_FLAGS)

 LDLIBS += -lvirt

+JANSSON := $(shell pkg-config --exists jansson; echo $$?)
+ifeq ($(JANSSON), 0)
+LDLIBS += $(shell pkg-config --libs jansson)
+CFLAGS += -DUSE_JANSSON
+endif
+
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)

 ifeq ($(CONFIG_RTE_LIBRTE_IXGBE_PMD),y)
diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c
index 53a4efe45..afb44a069 100644
--- a/examples/vm_power_manager/channel_monitor.c
+++ b/examples/vm_power_manager/channel_monitor.c
@@ -9,11 +9,18 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
-
+#include
+#include
+#ifdef USE_JANSSON
+#include
+#else
+#pragma message "Jansson dev libs unavailable, not including JSON parsing"
+#endif
 #include
 #include
 #include
@@ -35,6 +42,8 @@
 uint64_t vsi_pkt_count_prev[384];
 uint64_t rdtsc_prev[384];

+#define MAX_JSON_STRING_LEN 1024
+char json_data[MAX_JSON_STRING_LEN];
 double time_period_ms = 1;
 static volatile unsigned run_loop = 1;
@@ -43,6 +52,234 @@ static unsigned int policy_is_set;
 static struct epoll_event *global_events_list;
 static struct policy policies[MAX_CLIENTS];

+#ifdef USE_JANSSON
+
+union PFID {
+	struct ether_addr addr;
+	uint64_t pfid;
+};
+
+static int
+str_to_ether_addr(const char *a, struct ether_addr *ether_addr)
+{
+	int i;
+	char *end;
+	unsigned long o[ETHER_ADDR_LEN];
+
+	i = 0;
+	do {
+		errno = 0;
+		o[i] = strtoul(a, &end, 16);
+		if (errno != 0 || end == a || (end[0] != ':' && end[0] != 0))
+			return -1;
+		a = end + 1;
+	} while (++i != RTE_DIM(o) / sizeof(o[0]) && end[0] != 0);
+
+	/* Junk at the end of line */
+	if (end[0] != 0)
+		return -1;
+
+	/* Support the format XX:XX:XX:XX:XX:XX */
+	if (i == ETHER_ADDR_LEN) {
+		while (i-- != 0) {
+			if (o[i] > UINT8_MAX)
+				return -1;
+			ether_addr->addr_bytes[i] = (uint8_t)o[i];
+		}
+	/* Support the format :: */
+	} else if (i == ETHER_ADDR_LEN / 2) {
+		while (i-- != 0) {
+			if (o[i] > UINT16_MAX)
+				return -1;
+			ether_addr->addr_bytes[i * 2] =
+				(uint8_t)(o[i] >> 8);
+			ether_addr->addr_bytes[i * 2 + 1] =
+				(uint8_t)(o[i] & 0xff);
+		}
+	/* unknown format */
+	} else
+		return -1;
+
+	return 0;
+}
+
+static int
+set_policy_mac(struct channel_packet *pkt, int idx, char *mac)
+{
+	union PFID pfid;
+	int ret;
+
+	/* Use port MAC address as the vfid */
+	ret = str_to_ether_addr(mac, &pfid.addr);
+
+	if (ret != 0) {
+		RTE_LOG(ERR, CHANNEL_MONITOR,
+			"Invalid mac address received in JSON\n");
+		pkt->vfid[idx] = 0;
+		return -1;
+	}
+
+	printf("Received MAC Address: %02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 ":"
+			"%02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 "\n",
+			pfid.addr.addr_bytes[0], pfid.addr.addr_bytes[1],
+			pfid.addr.addr_bytes[2], pfid.addr.addr_bytes[3],
+			pfid.addr.addr_bytes[4], pfid.addr.addr_bytes[5]);
+
+	pkt->vfid[idx] = pfid.pfid;
+	return 0;
+}
+
+
+static int
+parse_json_to_pkt(json_t *element, struct channel_packet *pkt)
+{
+	const char *key;
+	json_t *value;
+	int ret;
+
+	memset(pkt, 0, sizeof(struct channel_packet));
+
+	pkt->nb_mac_to_monitor = 0;
+	pkt->t_boost_status.tbEnabled = false;
+	pkt->workload = LOW;
+	pkt->policy_to_use = TIME;
+	pkt->command = PKT_POLICY;
+	pkt->core_type = CORE_TYPE_PHYSICAL;
+
+	json_object_foreach(element,
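
As a companion to str_to_ether_addr() and union PFID above, here is a simplified standalone sketch that parses only the XX:XX:XX:XX:XX:XX form and packs the six bytes into a 64-bit ID. It is not the code from the patch: the demo_* names are made up, the loop is bounded by a fixed octet count, and memcpy() is used instead of a union. Like the quoted code it relies on strtoul(), so it is similarly lenient about leading whitespace or a "0x" prefix inside an octet.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ETHER_ADDR_LEN 6

/*
 * Parse "XX:XX:XX:XX:XX:XX" into six bytes.
 * Returns 0 on success, -1 on malformed input.
 */
static int
demo_parse_mac(const char *str, uint8_t out[DEMO_ETHER_ADDR_LEN])
{
	int i;

	assert(str != NULL && out != NULL);
	for (i = 0; i < DEMO_ETHER_ADDR_LEN; i++) {
		char *end;
		unsigned long octet;

		errno = 0;
		octet = strtoul(str, &end, 16);
		if (errno != 0 || end == str || octet > UINT8_MAX)
			return -1;
		/* Expect ':' between octets and end of string after the last. */
		if (*end != (i == DEMO_ETHER_ADDR_LEN - 1 ? '\0' : ':'))
			return -1;
		out[i] = (uint8_t)octet;
		str = end + 1;
	}
	return 0;
}

/*
 * Pack the MAC into the low-addressed six bytes of a 64-bit ID, leaving
 * the remaining two bytes zero. The numeric value depends on endianness.
 */
static uint64_t
demo_mac_to_id(const uint8_t mac[DEMO_ETHER_ADDR_LEN])
{
	uint64_t id = 0;

	memcpy(&id, mac, DEMO_ETHER_ADDR_LEN);
	return id;
}
```

Zero-initialising the ID before copying matters here: only six of the eight bytes are written, so without it the remaining two bytes would be indeterminate.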
[dpdk-dev] [PATCH v6 08/10] examples/power: clean up verbose messages
Some messages appearing several times a second, removing as they are unnecessary. Other less severe messages change from INFO to DEBUG Signed-off-by: David Hunt Acked-by: Anatoly Burakov --- examples/vm_power_manager/channel_monitor.c | 19 +-- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c index afb44a069..5da531542 100644 --- a/examples/vm_power_manager/channel_monitor.c +++ b/examples/vm_power_manager/channel_monitor.c @@ -361,7 +361,7 @@ get_pcpu_to_control(struct policy *pol) ci = get_core_info(); - RTE_LOG(INFO, CHANNEL_MONITOR, + RTE_LOG(DEBUG, CHANNEL_MONITOR, "Looking for pcpu for %s\n", pol->pkt.vm_name); /* @@ -528,8 +528,6 @@ apply_traffic_profile(struct policy *pol) diff = get_pkt_diff(pol); - RTE_LOG(INFO, CHANNEL_MONITOR, "Applying traffic profile\n"); - if (diff >= (pol->pkt.traffic_policy.max_max_packet_thresh)) { for (count = 0; count < pol->pkt.num_vcpu; count++) { if (pol->core_share[count].status != 1) @@ -573,9 +571,6 @@ apply_time_profile(struct policy *pol) if (pol->core_share[count].status != 1) { power_manager_scale_core_max( pol->core_share[count].pcpu); - RTE_LOG(INFO, CHANNEL_MONITOR, - "Scaling up core %d to max\n", - pol->core_share[count].pcpu); } } break; @@ -585,9 +580,6 @@ apply_time_profile(struct policy *pol) if (pol->core_share[count].status != 1) { power_manager_scale_core_min( pol->core_share[count].pcpu); - RTE_LOG(INFO, CHANNEL_MONITOR, - "Scaling down core %d to min\n", - pol->core_share[count].pcpu); } } break; @@ -649,8 +641,6 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info) if (chan_info == NULL) return -1; - RTE_LOG(INFO, CHANNEL_MONITOR, "Processing Request %s\n", pkt->vm_name); - if (rte_atomic32_cmpset(&(chan_info->status), CHANNEL_MGR_CHANNEL_CONNECTED, CHANNEL_MGR_CHANNEL_PROCESSING) == 0) return -1; @@ -719,8 +709,8 @@ process_request(struct channel_packet *pkt, struct channel_info 
*chan_info) } if (pkt->command == PKT_POLICY) { - RTE_LOG(INFO, CHANNEL_MONITOR, - "\nProcessing Policy request\n"); + RTE_LOG(INFO, CHANNEL_MONITOR, "Processing policy request %s\n", + pkt->vm_name); update_policy(pkt); policy_is_set = 1; } @@ -904,7 +894,8 @@ run_channel_monitor(void) global_events_list[i].data.ptr; if ((global_events_list[i].events & EPOLLERR) || (global_events_list[i].events & EPOLLHUP)) { - RTE_LOG(DEBUG, CHANNEL_MONITOR, "Remote closed connection for " + RTE_LOG(INFO, CHANNEL_MONITOR, + "Remote closed connection for " "channel '%s'\n", chan_info->channel_path); remove_channel(&chan_info); -- 2.17.1
[dpdk-dev] [PATCH v6 05/10] examples/power: add host channel to power manager
This patch adds a fifo channel to the vm_power_manager app through which we can send commands and policies. It is intended for sending JSON strings. The fifo is at /tmp/powermonitor/fifo Signed-off-by: David Hunt Acked-by: Anatoly Burakov --- examples/vm_power_manager/channel_manager.c | 109 +++ examples/vm_power_manager/channel_manager.h | 17 +++ examples/vm_power_manager/channel_monitor.c | 142 +++- examples/vm_power_manager/main.c| 2 + 4 files changed, 236 insertions(+), 34 deletions(-) diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c index 2e471d0c1..4fac099df 100644 --- a/examples/vm_power_manager/channel_manager.c +++ b/examples/vm_power_manager/channel_manager.c @@ -13,6 +13,7 @@ #include #include +#include #include #include @@ -284,6 +285,38 @@ open_non_blocking_channel(struct channel_info *info) return 0; } +static int +open_host_channel(struct channel_info *info) +{ + int flags; + + info->fd = open(info->channel_path, O_RDWR | O_RSYNC); + if (info->fd == -1) { + RTE_LOG(ERR, CHANNEL_MANAGER, "Error(%s) opening fifo for '%s'\n", + strerror(errno), + info->channel_path); + return -1; + } + + /* Get current flags */ + flags = fcntl(info->fd, F_GETFL, 0); + if (flags < 0) { + RTE_LOG(WARNING, CHANNEL_MANAGER, "Error(%s) fcntl get flags socket for" + "'%s'\n", strerror(errno), info->channel_path); + return 1; + } + /* Set to Non Blocking */ + flags |= O_NONBLOCK; + if (fcntl(info->fd, F_SETFL, flags) < 0) { + RTE_LOG(WARNING, CHANNEL_MANAGER, + "Error(%s) setting non-blocking " + "socket for '%s'\n", + strerror(errno), info->channel_path); + return -1; + } + return 0; +} + static int setup_channel_info(struct virtual_machine_info **vm_info_dptr, struct channel_info **chan_info_dptr, unsigned channel_num) @@ -294,6 +327,7 @@ setup_channel_info(struct virtual_machine_info **vm_info_dptr, chan_info->channel_num = channel_num; chan_info->priv_info = (void *)vm_info; chan_info->status = 
CHANNEL_MGR_CHANNEL_DISCONNECTED; + chan_info->type = CHANNEL_TYPE_BINARY; if (open_non_blocking_channel(chan_info) < 0) { RTE_LOG(ERR, CHANNEL_MANAGER, "Could not open channel: " "'%s' for VM '%s'\n", @@ -316,6 +350,42 @@ setup_channel_info(struct virtual_machine_info **vm_info_dptr, return 0; } +static void +fifo_path(char *dst, unsigned int len) +{ + snprintf(dst, len, "%sfifo", CHANNEL_MGR_SOCKET_PATH); +} + +static int +setup_host_channel_info(struct channel_info **chan_info_dptr, + unsigned int channel_num) +{ + struct channel_info *chan_info = *chan_info_dptr; + + chan_info->channel_num = channel_num; + chan_info->priv_info = (void *)NULL; + chan_info->status = CHANNEL_MGR_CHANNEL_DISCONNECTED; + chan_info->type = CHANNEL_TYPE_JSON; + + fifo_path(chan_info->channel_path, sizeof(chan_info->channel_path)); + + if (open_host_channel(chan_info) < 0) { + RTE_LOG(ERR, CHANNEL_MANAGER, "Could not open host channel: " + "'%s'\n", + chan_info->channel_path); + return -1; + } + if (add_channel_to_monitor(&chan_info) < 0) { + RTE_LOG(ERR, CHANNEL_MANAGER, "Could add channel: " + "'%s' to epoll ctl\n", + chan_info->channel_path); + return -1; + + } + chan_info->status = CHANNEL_MGR_CHANNEL_CONNECTED; + return 0; +} + int add_all_channels(const char *vm_name) { @@ -470,6 +540,45 @@ add_channels(const char *vm_name, unsigned *channel_list, return num_channels_enabled; } +int +add_host_channel(void) +{ + struct channel_info *chan_info; + char socket_path[PATH_MAX]; + int num_channels_enabled = 0; + int ret; + + fifo_path(socket_path, sizeof(socket_path)); + + ret = mkfifo(socket_path, 0660); + if ((errno != EEXIST) && (ret < 0)) { + RTE_LOG(ERR, CHANNEL_MANAGER, "Cannot create fifo '%s' error: " + "%s\n", socket_path, strerror(errno)); + return 0; + } + + if (access(socket_path, F_OK) < 0) { + RTE_LOG(ERR, CHANNEL_MANAGER, "Channel path '%s' error: " + "%s\n", socket_path, strerror(errno)); + return 0; + } + chan_info = rte_malloc(NULL, sizeof(*chan_info), 0); + if 
(chan_info == NULL) { +
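The create-then-open sequence used for the host channel in this patch can be sketched on its own as below. This is a simplified illustration under stated assumptions, not the patch's code: open_fifo_nonblock is a hypothetical helper, the O_RSYNC flag from the patch is left out, and opening a FIFO with O_RDWR relies on Linux behaviour (POSIX leaves it unspecified).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create the FIFO at 'path' if needed and open it non-blocking.
 * Returns an open fd, or -1 on error (errno is set).
 */
static int
open_fifo_nonblock(const char *path)
{
	int fd, flags;

	/* An already existing FIFO is fine; any other failure is not */
	if (mkfifo(path, 0660) < 0 && errno != EEXIST)
		return -1;

	/* O_RDWR means the open does not wait for a peer (Linux) */
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	/* Add O_NONBLOCK to the existing file status flags */
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Checking the mkfifo() return value before looking at errno avoids acting on a stale errno, and every failure path returns a negative value so a caller testing for < 0 sees it.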
[dpdk-dev] [PATCH v6 10/10] doc/vm_power_manager: add JSON interface API info
Also added meson/ninja build info Signed-off-by: David Hunt Acked-by: Marko Kovacevic --- .../sample_app_ug/vm_power_management.rst | 300 +- 1 file changed, 298 insertions(+), 2 deletions(-) diff --git a/doc/guides/sample_app_ug/vm_power_management.rst b/doc/guides/sample_app_ug/vm_power_management.rst index 855570d6b..1ad4f1490 100644 --- a/doc/guides/sample_app_ug/vm_power_management.rst +++ b/doc/guides/sample_app_ug/vm_power_management.rst @@ -199,7 +199,7 @@ see :doc:`compiling`. The application is located in the ``vm_power_manager`` sub-directory. -To build just the ``vm_power_manager`` application: +To build just the ``vm_power_manager`` application using ``make``: .. code-block:: console @@ -208,6 +208,22 @@ To build just the ``vm_power_manager`` application: cd ${RTE_SDK}/examples/vm_power_manager/ make +The resulting binary will be ${RTE_SDK}/build/examples/vm_power_manager + +To build just the ``vm_power_manager`` application using ``meson/ninja``: + +.. code-block:: console + + export RTE_SDK=/path/to/rte_sdk + cd ${RTE_SDK} + meson build + cd build + ninja + meson configure -Dexamples=vm_power_manager + ninja + +The resulting binary will be ${RTE_SDK}/build/examples/dpdk-vm_power_manager + Running ~~~ @@ -337,6 +353,270 @@ monitoring of branch ratio on cores doing busy polling via PMDs. and will need to be adjusted for different workloads. + +JSON API + + +In addition to the command line interface for host command and a virtio-serial +interface for VM power policies, there is also a JSON interface through which +power commands and policies can be sent. This functionality adds a dependency +on the Jansson library, and the Jansson development package must be installed +on the system before the JSON parsing functionality is included in the app. +This is achieved by: + + .. code-block:: javascript + +apt-get install libjansson-dev + +The command and package name may be different depending on your operating +system. 
It's worth noting that the app will successfully build without this +package present, but a warning is shown during compilation, and the JSON +parsing functionality will not be present in the app. + +Sending a command or policy to the power manager application is achieved by +simply opening a fifo file, writing a JSON string to that fifo, and closing +the file. + +The fifo is at /tmp/powermonitor/fifo + +The JSON string can be a policy or instruction, and takes the following +format: + + .. code-block:: javascript + +{"packet_type": { + "pair_1": value, + "pair_2": value +}} + +The 'packet_type' header can contain one of two values, depending on +whether a policy or power command is being sent. The two possible values are +"policy" and "instruction", and the expected name-value pairs are different +depending on which type is being sent. + +The pairs are in the format of standard JSON name-value pairs. The value type +varies between the different name/value pairs, and may be integers, strings, +arrays, etc. Examples of policies follow later in this document. The allowed +names and value types are as follows: + + +:Pair Name: "name" +:Description: Name of the VM or Host. Allows the parser to associate the + policy with the relevant VM or Host OS. +:Type: string +:Values: any valid string +:Required: yes +:Example: + +.. code-block:: javascript + + "name", "ubuntu2" + + +:Pair Name: "command" +:Description: The type of packet we're sending to the power manager. We can be + creating or destroying a policy, or sending a direct command to adjust + the frequency of a core, similar to the command line interface. +:Type: string +:Values: + + :CREATE: used when creating a new policy, + :DESTROY: used when removing a policy, + :POWER: used when sending an immediate command, max, min, etc. +:Required: yes +:Example: + +.. code-block:: javascript + + "command", "CREATE" + + +:Pair Name: "policy_type" +:Description: Type of policy to apply. 
Please see vm_power_manager documentation + for more information on the types of policies that may be used. +:Type: string +:Values: + + :TIME: Time-of-day policy. Frequencies of the relevant cores are +scaled up/down depending on busy and quiet hours. + :TRAFFIC: This policy takes statistics from the NIC and scales up +and down accordingly. + :WORKLOAD: This policy looks at how heavily loaded the cores are, +and scales up and down accordingly. + :BRANCH_RATIO: This out-of-band policy can look at the ratio between +branch hits and misses on a core, and is useful for detecting +how much packet processing a core is doing. +:Required: only for CREATE/DESTROY command +:Example: + + .. code-block:: javascript + +"policy_type", "TIME" + +:Pair Name: "busy_hours" +:Description: The hours of the day in which we scale up the cores for busy + times. +:Type: array of integers +:Values: array with list of hour numbers, (0-23)
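The open/write/close sequence described in this document can be sketched from the sender's side as follows. This is a hedged illustration: send_json is a hypothetical helper, and example_policy only combines pair names and values quoted above; the full list of required pairs is not visible in this excerpt, so the payload may be incomplete for the real parser.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative payload only; it uses the pair names documented above */
static const char example_policy[] =
	"{\"policy\": {"
	"\"name\": \"ubuntu2\", "
	"\"command\": \"CREATE\", "
	"\"policy_type\": \"TIME\", "
	"\"busy_hours\": [17, 18, 19]"
	"}}";

/*
 * Open the FIFO at 'path', write 'json' in full, and close it.
 * Returns 0 on success, -1 on error.
 */
static int
send_json(const char *path, const char *json)
{
	size_t len = strlen(json), off = 0;
	/* Blocks until some process has the FIFO open for reading */
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	/* write() may be partial, so loop until everything is sent */
	while (off < len) {
		ssize_t n = write(fd, json + off, len - off);

		if (n < 0) {
			close(fd);
			return -1;
		}
		off += (size_t)n;
	}
	return close(fd);
}
```

A caller would pass the fifo path given above, for example send_json("/tmp/powermonitor/fifo", example_policy), once the power manager is running and has the fifo open.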
[dpdk-dev] [PATCH v6 09/10] examples/power: add meson/ninja build support
Add meson.build in vm_power_manager and the guest_cli subdirectory. Building can be achieved by going to the build directory, and using meson configure -Dexamples=vm_power_manager,vm_power_manager/guest_cli Then, when ninja is invoked, it will build dpdk-vm_power_manager and dpdk-guest_cli. Work still needs to be done on the meson build system to handle the case where the target list of example apps is defined as 'all'. That will come in a future patch. Signed-off-by: David Hunt Acked-by: Bruce Richardson --- .../vm_power_manager/guest_cli/meson.build| 21 +++ examples/vm_power_manager/meson.build | 37 ++- 2 files changed, 56 insertions(+), 2 deletions(-) create mode 100644 examples/vm_power_manager/guest_cli/meson.build diff --git a/examples/vm_power_manager/guest_cli/meson.build b/examples/vm_power_manager/guest_cli/meson.build new file mode 100644 index 0..9e821ceb8 --- /dev/null +++ b/examples/vm_power_manager/guest_cli/meson.build @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Intel Corporation + +# meson file, for building this example as part of a main DPDK build. +# +# To build this example as a standalone application with an already-installed +# DPDK instance, use 'make' + +# Setting the name here because the default name will conflict with the +# vm_power_manager app because of the way the directories are parsed. 
+name = 'guest_cli' + +deps += ['power'] + +sources = files( + 'main.c', 'parse.c', 'vm_power_cli_guest.c' +) + +opt_dep = cc.find_library('virt', required : false) +build = opt_dep.found() +ext_deps += opt_dep diff --git a/examples/vm_power_manager/meson.build b/examples/vm_power_manager/meson.build index c370d7476..f98445bc6 100644 --- a/examples/vm_power_manager/meson.build +++ b/examples/vm_power_manager/meson.build @@ -6,5 +6,38 @@ # To build this example as a standalone application with an already-installed # DPDK instance, use 'make' -# Example app currently unsupported by meson build -build = false +if dpdk_conf.has('RTE_LIBRTE_BNXT_PMD') + deps += ['pmd_bnxt'] +endif + +if dpdk_conf.has('RTE_LIBRTE_I40E_PMD') + deps += ['pmd_i40e'] +endif + +if dpdk_conf.has('RTE_LIBRTE_IXGBE_PMD') + deps += ['pmd_ixgbe'] +endif + +deps += ['power'] + + +sources = files( + 'channel_manager.c', 'channel_monitor.c', 'main.c', 'parse.c', 'power_manager.c', 'vm_power_cli.c' +) + +# If we're on X86, pull in the x86 code for the branch monitor algo. +if dpdk_conf.has('RTE_ARCH_X86_64') + sources += files('oob_monitor_x86.c') +else + sources += files('oob_monitor_nop.c') +endif + +opt_dep = cc.find_library('virt', required : false) +build = opt_dep.found() +ext_deps += opt_dep + +opt_dep = dependency('jansson', required : false) +if opt_dep.found() + ext_deps += opt_dep + cflags += '-DUSE_JANSSON' +endif -- 2.17.1
Re: [dpdk-dev] [PATCH] app/testpmd: check Rx VLAN offload flag to print VLAN TCI
On 10/2/2018 3:29 AM, Hyong Youb Kim wrote: > On Mon, Oct 01, 2018 at 03:01:40PM +0100, Ferruh Yigit wrote: >> On 9/26/2018 4:06 AM, John Daley wrote: >>> From: Hyong Youb Kim >>> >>> Since the following commit, PKT_RX_VLAN indicates the presence of >>> mbuf's vlan_tci, not PKT_RX_VLAN_STRIPPED. >>> >>> commit 380a7aab1ae2 ("mbuf: rename deprecated VLAN flags") >>> Cc: olivier.m...@6wind.com >>> >>> Signed-off-by: Hyong Youb Kim >>> Reviewed-by: John Daley >>> --- >>> app/test-pmd/rxonly.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c >>> index a93d80612..e8d226624 100644 >>> --- a/app/test-pmd/rxonly.c >>> +++ b/app/test-pmd/rxonly.c >>> @@ -130,7 +130,7 @@ pkt_burst_receive(struct fwd_stream *fs) >>> } >>> if (ol_flags & PKT_RX_TIMESTAMP) >>> printf(" - timestamp %"PRIu64" ", mb->timestamp); >>> - if (ol_flags & PKT_RX_VLAN_STRIPPED) >>> + if (ol_flags & PKT_RX_VLAN) >>> printf(" - VLAN tci=0x%x", mb->vlan_tci); >>> if (ol_flags & PKT_RX_QINQ_STRIPPED) >>> printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x", >> >> Isn't same also correct for QinQ, PKT_RX_QINQ means mb->vlan_tci & >> mb->vlan_tci_outer are set? >> > > That is a good point. > > According to rte_mbuf.h, PKT_RX_QINQ means "The RX packet is a double > VLAN, and the outer tci has been saved in in mbuf->vlan_tci_outer." > > Here is a summary. > PKT_RX_VLAN => vlan_tci is set > PKT_RX_QINQ => vlan_tci_outer is set Because of the comment on "PKT_RX_QINQ_STRIPPED" I think: PKT_RX_QINQ => vlan_tci_outer & vlan_tci is set Although it is not clear from "PKT_RX_QINQ" comment. > PKT_RX_VLAN_STRIPPED => must also set PKT_RX_VLAN > PKT_RX_QINQ_STRIPPED => must also set PKT_RX_VLAN, PKT_RX_QINQ, > PKT_RX_VLAN_STRIPPED > > Looks like i40e is the only driver that is using > PKT_RX_QINQ_STRIPPED. And, it does not set PKT_RX_QINQ. I am CC'ing > i40e maintainers. > > Back to rxonly.. 
> > + if (ol_flags & (PKT_RX_QINQ | PKT_RX_VLAN)) > printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x", > mb->vlan_tci, mb->vlan_tci_outer); > > A change like this would be technically correct, but may break i40e > test cases. Or, if the above message is really meant for 'stripped', > then perhaps add comment or rephrase the message for now? > > As for the use of PKT_RX_VLAN, some drivers like enic and ixgbe can > set PKT_RX_VLAN independent of vlan stripping, which led me to writing > this patch. I think Olivier fixed all drivers when he introduced > PKT_RX_VLAN. So using PKT_RX_VLAN in rxonly shouldn't be breaking > anyone's test cases. +1 to PKT_RX_VLAN update. I was thinking PKT_RX_QINQ also can be fixed quickly in testpmd with this patch, taking into account that it may affect other piece of code, agree to get this patch as it is and consider QINQ changes in different patch.
Re: [dpdk-dev] [PATCH] app/testpmd: check Rx VLAN offload flag to print VLAN TCI
On 9/26/2018 4:06 AM, John Daley wrote: > From: Hyong Youb Kim > > Since the following commit, PKT_RX_VLAN indicates the presence of > mbuf's vlan_tci, not PKT_RX_VLAN_STRIPPED. > > commit 380a7aab1ae2 ("mbuf: rename deprecated VLAN flags") > Cc: olivier.m...@6wind.com > > Signed-off-by: Hyong Youb Kim > Reviewed-by: John Daley Reviewed-by: Ferruh Yigit
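The rule summarised in this thread (PKT_RX_VLAN means vlan_tci is set, PKT_RX_QINQ means vlan_tci_outer is set, and the STRIPPED flags say nothing extra about field validity) can be sketched as a small helper. The flag macros below are stand-ins with arbitrary bit positions chosen for illustration; real code would use the definitions from rte_mbuf.h, and valid_vlan_fields is a hypothetical name.

```c
#include <stdint.h>

/* Stand-in flag bits, arbitrary values; real code uses rte_mbuf.h */
#define RX_VLAN          (1ULL << 0)
#define RX_VLAN_STRIPPED (1ULL << 1)
#define RX_QINQ          (1ULL << 2)
#define RX_QINQ_STRIPPED (1ULL << 3)

/* Which TCI fields of the mbuf may be read */
enum { HAS_TCI = 1, HAS_TCI_OUTER = 2 };

/*
 * Decide which fields are valid from the "present" flags only.
 * The STRIPPED flags describe what hardware did to the packet data,
 * not whether the fields were filled in.
 */
static unsigned int
valid_vlan_fields(uint64_t ol_flags)
{
	unsigned int v = 0;

	if (ol_flags & RX_VLAN)
		v |= HAS_TCI;
	if (ol_flags & RX_QINQ)
		v |= HAS_TCI_OUTER;
	return v;
}
```

With this rule, a driver that reports the TCI without stripping the tag (the thread mentions enic and ixgbe) still gets its VLAN printed, which a test on the STRIPPED flag would miss.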
Re: [dpdk-dev] [PATCH 1/4] ethdev: add SCTP Rx checksum offload support
-Original Message- > Date: Mon, 1 Oct 2018 17:11:50 +0100 > From: Ferruh Yigit > To: Jerin Jacob > CC: Wenzhuo Lu , Jingjing Wu , > Bernard Iremonger , John McNamara > , Marko Kovacevic , > Thomas Monjalon , Andrew Rybchenko > , dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH 1/4] ethdev: add SCTP Rx checksum offload > support > User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 > Thunderbird/52.9.1 > > On 10/1/2018 4:59 PM, Jerin Jacob wrote: > > -Original Message- > >> Date: Mon, 1 Oct 2018 14:46:39 +0100 > >> From: Ferruh Yigit > >> To: Jerin Jacob , Wenzhuo Lu > >> , Jingjing Wu , Bernard > >> Iremonger , John McNamara > >> , Marko Kovacevic , > >> Thomas Monjalon , Andrew Rybchenko > >> > >> CC: dev@dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH 1/4] ethdev: add SCTP Rx checksum offload > >> support > >> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 > >> Thunderbird/52.9.1 > >> > >> > >> On 9/13/2018 2:47 PM, Jerin Jacob wrote: > >>> Signed-off-by: Jerin Jacob > >> > >> Overall set looks good to me, I put some comments on individual patches. > >> > >> And can you please rebase on top of latest head? > > > > Sure. > > > > Regarding space issue mentioned in other email in this thread. > > It looks like similar space added in other offloads. > > example: http://git.dpdk.org/dpdk/tree/app/test-pmd/config.c#n571 > > Hi Jerin, > > This is just detail, the alignment is broken in the output of the log, for > others on/off start from column 56, for this one it is 57, just delete a space > from printf please. Sure Ferruh. Will add it in v2 > > > > > So, I expect no change in this patch other than rebase to latest head. > > If not, let me know. > > > >> > >> Thanks, > >> ferruh > >> >
Re: [dpdk-dev] [PATCH v6 4/5] vhost: unify message handling function signature
On 09/24/2018 10:17 PM, Nikolay Nikolaev wrote: Each vhost-user message handling function will return an int result which is described in the new enum vh_result: error, OK and reply. All functions will now have two arguments, virtio_net double pointer and VhostUserMsg pointer. Signed-off-by: Nikolay Nikolaev --- lib/librte_vhost/vhost_user.c | 211 - 1 file changed, 125 insertions(+), 86 deletions(-) diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 77905dda0..e1b705fa7 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -71,6 +71,16 @@ static const char *vhost_message_str[VHOST_USER_MAX] = { [VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS", }; +/* The possible results of a message handling function */ +enum vh_result { + /* Message handling failed */ + VH_RESULT_ERR = -1, + /* Message handling successful */ + VH_RESULT_OK= 0, + /* Message handling successful and reply prepared */ + VH_RESULT_REPLY = 1, +}; + -vhost_user_get_vring_base(struct virtio_net *dev, +vhost_user_get_vring_base(struct virtio_net **pdev, struct VhostUserMsg *msg) { + struct virtio_net *dev = *pdev; struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index]; /* We have to stop the queue (virtio) if it is running. */ @@ -1135,7 +1161,7 @@ vhost_user_get_vring_base(struct virtio_net *dev, msg->size = sizeof(msg->payload.state); - return 0; + return VH_RESULT_OK; } VH_RESULT_REPLY here. -static void -vhost_user_get_protocol_features(struct virtio_net *dev, +static int +vhost_user_get_protocol_features(struct virtio_net **pdev, struct VhostUserMsg *msg) { + struct virtio_net *dev = *pdev; uint64_t features, protocol_features; rte_vhost_driver_get_features(dev->ifname, &features); @@ -1189,40 +1217,46 @@ vhost_user_get_protocol_features(struct virtio_net *dev, msg->payload.u64 = protocol_features; msg->size = sizeof(msg->payload.u64); + + return VH_RESULT_OK; } Ditto. 
I have patches to fix these; they will be posted as a preliminary part of my postcopy series. Please, next time, test your series before posting. Thanks, Maxime
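To illustrate why the two handlers flagged above should return VH_RESULT_REPLY, here is a minimal standalone sketch. The enum matches the patch; everything else (struct msg, send_reply, dispatch and the two handlers) is a hypothetical stand-in, and the dispatcher is only an assumption about how the return value is consumed.

```c
#include <stdint.h>

/* Same three results as the enum introduced by the patch */
enum vh_result {
	VH_RESULT_ERR = -1,
	VH_RESULT_OK = 0,
	VH_RESULT_REPLY = 1,
};

/* Stand-in for VhostUserMsg */
struct msg {
	uint64_t u64;
	uint32_t size;
};

typedef int (*handler_t)(struct msg *m);

static int replies_sent; /* counts calls to the mock send function */

static void
send_reply(const struct msg *m)
{
	(void)m;
	replies_sent++;
}

/* Fills in a payload the peer is waiting for, so it asks for a reply */
static int
get_features(struct msg *m)
{
	m->u64 = 0x1234;
	m->size = sizeof(m->u64);
	return VH_RESULT_REPLY;
}

/* Only consumes the message; there is nothing to send back */
static int
set_owner(struct msg *m)
{
	(void)m;
	return VH_RESULT_OK;
}

/* Assumed dispatcher: a reply goes out only on VH_RESULT_REPLY */
static int
dispatch(handler_t h, struct msg *m)
{
	int ret = h(m);

	if (ret == VH_RESULT_REPLY)
		send_reply(m);
	return ret;
}
```

Under this assumption, a "get" handler that fills the payload but returns VH_RESULT_OK would leave the peer waiting for an answer that is never sent.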
Re: [dpdk-dev] [PATCH v7] app/testpmd: add forwarding mode to simulate a noisy neighbour
On Tue, Oct 02, 2018 at 09:57:52AM +0200, Thomas Monjalon wrote: 02/10/2018 09:19, Jens Freimann: On Mon, Oct 01, 2018 at 01:13:32PM +, Iremonger, Bernard wrote: >./devtools/check-git-log.sh -1 >Headline too long: >app/testpmd: add forwarding mode to simulate a noisy neighbour I'm sorry, I failed to use checkpatches.sh correctly :) I did: #> git show | DPDK_CHECKPATCH_PATH="/home/jfreiman/code/linux/scripts/checkpatch.pl" devtools/checkpatches.sh -- 1/1 valid patch Why this command is not correct? checkpatches.sh looks for the string "Subject:" which is not included in git show output. Using cat on the patch file instead will work. regards, Jens
Re: [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list
On 01-Oct-18 6:01 PM, Stephen Hemminger wrote: On Mon, 1 Oct 2018 13:56:09 +0100 Anatoly Burakov wrote: diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h index aff0688dd..1d8b0a6fe 100644 --- a/lib/librte_eal/common/include/rte_eal_memconfig.h +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h @@ -30,6 +30,7 @@ struct rte_memseg_list { uint64_t addr_64; /**< Makes sure addr is always 64-bits */ }; + size_t len; /**< Length of memory area covered by this memseg list. */ int socket_id; /**< Socket ID for all memsegs in this list. */ uint64_t page_sz; /**< Page size for all memsegs in this list. */ volatile uint32_t version; /**< version number for multiprocess sync. */ If you are going to break ABI, why not try and rearrange to eliminate holes: Output of pahole (on x86 64 bit): struct rte_memseg_list { union { void * base_va; /* 0 8 */ uint64_t addr_64; /* 0 8 */ }; /* 0 8 */ size_t len; /* 8 8 */ int socket_id; /* 16 4 */ /* XXX 4 bytes hole, try to pack */ uint64_t page_sz; /* 24 8 */ volatile uint32_t version; /* 32 4 */ /* XXX 4 bytes hole, try to pack */ struct rte_fbarray memseg_arr; /* 40 96 */ /* XXX last struct has 4 bytes of padding */ /* size: 136, cachelines: 3, members: 6 */ /* sum members: 128, holes: 2, sum holes: 8 */ /* paddings: 1, sum paddings: 4 */ /* last cacheline: 8 bytes */ }; Hi Stephen, This data structure isn't performance-critical in any remote sense, but sure, I can do that. -- Thanks, Anatoly
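The effect of the rearrangement suggested above can be checked with a small standalone sketch. The two structs below only mirror the member order and sizes from the pahole output; struct fbarray_stub is a 96-byte stand-in for struct rte_fbarray, and all names are illustrative rather than DPDK definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* 96-byte, 8-byte-aligned stand-in for struct rte_fbarray */
struct fbarray_stub {
	uint64_t pad[12];
};

/* Member order as in the posted patch: two 4-byte holes on x86-64 */
struct msl_as_posted {
	union {
		void *base_va;
		uint64_t addr_64;
	};
	size_t len;
	int socket_id;
	uint64_t page_sz;
	volatile uint32_t version;
	struct fbarray_stub memseg_arr;
};

/* Same members with the two 4-byte fields placed next to each other */
struct msl_reordered {
	union {
		void *base_va;
		uint64_t addr_64;
	};
	size_t len;
	uint64_t page_sz;
	int socket_id;
	volatile uint32_t version;
	struct fbarray_stub memseg_arr;
};

/* Bytes saved by the reordering on the current target */
static size_t
hole_bytes_saved(void)
{
	return sizeof(struct msl_as_posted) - sizeof(struct msl_reordered);
}
```

On an x86-64 build the first layout should come out at 136 bytes and the second at 128, matching the "sum holes: 8" reported by pahole; other ABIs may differ.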
Re: [dpdk-dev] [PATCH 1/4] ethdev: add SCTP Rx checksum offload support
On 10/1/2018 4:59 PM, Jerin Jacob wrote: > -Original Message- >> Date: Mon, 1 Oct 2018 14:46:39 +0100 >> From: Ferruh Yigit >> To: Jerin Jacob , Wenzhuo Lu >> , Jingjing Wu , Bernard >> Iremonger , John McNamara >> , Marko Kovacevic , >> Thomas Monjalon , Andrew Rybchenko >> >> CC: dev@dpdk.org >> Subject: Re: [dpdk-dev] [PATCH 1/4] ethdev: add SCTP Rx checksum offload >> support >> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 >> Thunderbird/52.9.1 >> >> >> On 9/13/2018 2:47 PM, Jerin Jacob wrote: >>> Signed-off-by: Jerin Jacob >> >> Overall set looks good to me, I put some comments on individual patches. >> >> And can you please rebase on top of latest head? > > Sure. > > Regarding space issue mentioned in other email in this thread. > It looks like similar space added in other offloads. > example: http://git.dpdk.org/dpdk/tree/app/test-pmd/config.c#n571 > > So, I expect no change in this patch other than rebase to latest head. > If not, let me know. As commented to the patch, can you also check "csum show", "csum set" functions in testpmd, I think they are affected and need to be updated with your patch.
[dpdk-dev] [PATCH] mbuf: clarify QINQ flag usage
Update the implementation so that when a PMD sets PKT_RX_QINQ_STRIPPED in mbuf ol_flags, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED and PKT_RX_VLAN are also set. Clarify the mbuf documentation: when PKT_RX_QINQ is set, PKT_RX_VLAN should also be set, so that an application can rely on the PKT_RX_QINQ flag to access both mbuf.vlan_tci and mbuf.vlan_tci_outer. Signed-off-by: Ferruh Yigit --- Cc: Hyong Youb Kim Cc: John Daley --- app/test-pmd/rxonly.c| 2 +- doc/guides/nics/features.rst | 7 --- drivers/net/i40e/i40e_rxtx.c | 3 ++- lib/librte_mbuf/rte_mbuf.c | 1 + lib/librte_mbuf/rte_mbuf.h | 5 +++-- 5 files changed, 11 insertions(+), 7 deletions(-) diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c index a93d80612..08a5fc2cf 100644 --- a/app/test-pmd/rxonly.c +++ b/app/test-pmd/rxonly.c @@ -132,7 +132,7 @@ pkt_burst_receive(struct fwd_stream *fs) printf(" - timestamp %"PRIu64" ", mb->timestamp); if (ol_flags & PKT_RX_VLAN_STRIPPED) printf(" - VLAN tci=0x%x", mb->vlan_tci); - if (ol_flags & PKT_RX_QINQ_STRIPPED) + if (ol_flags & PKT_RX_QINQ) printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x", mb->vlan_tci, mb->vlan_tci_outer); if (mb->packet_type) { diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index b085bda86..c0cbe3784 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -528,7 +528,7 @@ Supports VLAN offload to hardware. * **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_VLAN_STRIP,DEV_RX_OFFLOAD_VLAN_FILTER,DEV_RX_OFFLOAD_VLAN_EXTEND``. * **[uses] rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_VLAN_INSERT``. * **[implements] eth_dev_ops**: ``vlan_offload_set``. -* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.vlan_tci``. +* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.ol_flags:PKT_RX_VLAN`` ``mbuf.vlan_tci``. 
* **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_VLAN_STRIP``, ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_VLAN_INSERT``. * **[related]API**: ``rte_eth_dev_set_vlan_offload()``, @@ -545,8 +545,9 @@ Supports QinQ (queue in queue) offload. * **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_QINQ_STRIP``. * **[uses] rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_QINQ_INSERT``. * **[uses] mbuf**: ``mbuf.ol_flags:PKT_TX_QINQ_PKT``. -* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_QINQ_STRIPPED``, ``mbuf.vlan_tci``, - ``mbuf.vlan_tci_outer``. +* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_QINQ_STRIPPED``, ``mbuf.ol_flags:PKT_RX_QINQ``, + ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.ol_flags:PKT_RX_VLAN`` + ``mbuf.vlan_tci``, ``mbuf.vlan_tci_outer``. * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_QINQ_STRIP``, ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_QINQ_INSERT``. 
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 7c986d535..b2819f757 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -83,7 +83,8 @@ i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp) #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC if (rte_le_to_cpu_16(rxdp->wb.qword2.ext_status) & (1 << I40E_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) { - mb->ol_flags |= PKT_RX_QINQ_STRIPPED; + mb->ol_flags |= PKT_RX_QINQ_STRIPPED | PKT_RX_QINQ | + PKT_RX_VLAN_STRIPPED | PKT_RX_VLAN; mb->vlan_tci_outer = mb->vlan_tci; mb->vlan_tci = rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_2); PMD_RX_LOG(DEBUG, "Descriptor l2tag2_1: %u, l2tag2_2: %u", diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index e714c5a59..05a5a17fe 100644 --- a/lib/librte_mbuf/rte_mbuf.c +++ b/lib/librte_mbuf/rte_mbuf.c @@ -297,6 +297,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask) case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP"; case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST"; case PKT_RX_QINQ_STRIPPED: return "PKT_RX_QINQ_STRIPPED"; + case PKT_RX_QINQ: return "PKT_RX_QINQ"; case PKT_RX_LRO: return "PKT_RX_LRO"; case PKT_RX_TIMESTAMP: return "PKT_RX_TIMESTAMP"; case PKT_RX_SEC_OFFLOAD: return "PKT_RX_SEC_OFFLOAD"; diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index a50b05c64..d018f19bd 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -140,7 +140,7 @@ extern "C" { * The 2 vlans have been stripped by the hardware and their tci are * saved in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer). * This can only happen if vlan stripping is enabled in the R
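The flag relationships this patch establishes can be written down as a small consistency check. This is a sketch under stated assumptions: the macros are stand-ins with arbitrary bit positions (real code would use rte_mbuf.h), and implies and rx_vlan_flags_consistent are hypothetical helpers, not DPDK API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits, arbitrary values; real code uses rte_mbuf.h */
#define RX_VLAN          (1ULL << 0)
#define RX_VLAN_STRIPPED (1ULL << 1)
#define RX_QINQ          (1ULL << 2)
#define RX_QINQ_STRIPPED (1ULL << 3)

/* True unless 'a' is set while some bit of 'b' is missing */
static bool
implies(uint64_t flags, uint64_t a, uint64_t b)
{
	return !(flags & a) || (flags & b) == b;
}

/*
 * Check the relationships described in the commit message:
 *   VLAN_STRIPPED -> VLAN
 *   QINQ          -> VLAN
 *   QINQ_STRIPPED -> QINQ, VLAN_STRIPPED and VLAN
 */
static bool
rx_vlan_flags_consistent(uint64_t f)
{
	return implies(f, RX_VLAN_STRIPPED, RX_VLAN) &&
	       implies(f, RX_QINQ, RX_VLAN) &&
	       implies(f, RX_QINQ_STRIPPED,
		       RX_QINQ | RX_VLAN_STRIPPED | RX_VLAN);
}
```

A check of this kind could sit in a debug-only Rx path or a unit test; setting RX_QINQ_STRIPPED alone, as the i40e code did before this change, fails it.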
[dpdk-dev] [PATCH v2 00/17] vhost: add postcopy live-migration support
In this v2: - Rebase on top of Nikolay's message handling series. It requires passing an extra parameter to message handlers (the fd of the socket, as set_mem_table needs to send an intermediate reply). - Preliminary patches to fix issues with the message handling rework not handling replies properly. - Handle userfaultfd region registration errors properly. - Don't build postcopy by default, as userfaultfd only landed in the v4.3 kernel. It gets automatically enabled with Meson if headers are present. With classic live-migration, the VM runs on source while its content is being migrated to destination. When pages already migrated to destination are dirtied by the source, they get copied until both source and destination memory converge. At that time, the source is stopped and destination is started. With postcopy live-migration, the VM is started on destination before all the memory has been migrated. When the VM tries to access a page that hasn't been migrated yet, a pagefault is triggered, handled by userfaultfd which pauses the thread. A Qemu thread in charge of postcopy requests the missing page from the source. Once received and mapped, the paused thread gets resumed. Userfaultfd supports handling faults from a different process, and Qemu supports postcopy with vhost-user backends since v2.12. One problem encountered with classic live-migration for VMs relying on vhost-user backends is that when the traffic is high (e.g. PVP), it may never converge, as pages get dirtied at a faster rate than they are copied to the destination. It is expected this problem should be solved by using postcopy, as ring memory and buffers will be copied once, when the destination pagefaults on them. Note that it will certainly require a rebase to apply on top of Nikolay's vhost-user message handling rework. Steps to test postcopy: 1. 
Run DPDK's Testpmd application on source: ./install/bin/testpmd -m 512 --file-prefix=src -l 0,2 -n 4 \ --vdev 'net_vhost0,iface=/tmp/vu-src' -- --portmask=1 -i \ --rxq=1 --txq=1 --nb-cores=1 --eth-peer=0,52:54:00:11:22:12 \ --no-mlockall 2. Run DPDK's Testpmd application on destination: ./install/bin/testpmd -m 512 --file-prefix=dst -l 0,2 -n 4 \ --vdev 'net_vhost0,iface=/tmp/vu-dst,postcopy-support=1' -- --portmask=1 -i \ --rxq=1 --txq=1 --nb-cores=1 --eth-peer=0,52:54:00:11:22:12 \ --no-mlockall 3. Launch VM on source: ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 3G -smp 2 -cpu host \ -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \ -numa node,memdev=mem -mem-prealloc \ -chardev socket,id=char0,path=/tmp/vu-src \ -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \ -device virtio-net-pci,netdev=mynet1 /home/virt/rhel7.6-1-clone.qcow2 \ -net none -vnc :0 -monitor stdio 4. Launch VM on destination: ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 3G -smp 2 -cpu host \ -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \ -numa node,memdev=mem -mem-prealloc \ -chardev socket,id=char0,path=/tmp/vu-dst \ -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \ -device virtio-net-pci,netdev=mynet1 /home/virt/rhel7.6-1-clone.qcow2 \ -net none -vnc :1 -monitor stdio -incoming tcp:: 5. In both testpmd prompts, start flooding the virtio-net device: testpmd> set fwd txonly testpmd> start 6. In destination's Qemu monitor, enable postcopy: (qemu) migrate_set_capability postcopy-ram on 7. 
In source's Qemu monitor, enable postcopy and launch migration: (qemu) migrate_set_capability postcopy-ram on (qemu) migrate -d tcp:0: (qemu) migrate_start_postcopy Maxime Coquelin (17): vhost: fix messages error checks vhost: fix return code of messages requiring replies vhost: fix error handling when mem table gets updated vhost: define postcopy protocol flag vhost: add number of fds to vhost-user messages and use it vhost: pass socket fd to message handling callbacks vhost: enable fds passing when sending vhost-user messages vhost: add config flag for postcopy feature vhost: introduce postcopy's advise message vhost: add support for postcopy's listen message vhost: register new regions with userfaultfd vhost: avoid useless VhostUserMemory copy vhost: send userfault range addresses back to qemu vhost: add support to postcopy's end request vhost: enable postcopy protocol feature vhost: add flag to enable postcopy live-migration net/vhost: add parameter to enable postcopy support config/common_linuxapp | 1 + doc/guides/nics/vhost.rst | 5 + doc/guides/prog_guide/vhost_lib.rst | 8 + drivers/net/vhost/rte_eth_vhost.c | 13 ++ lib/librte_vhost/meson.build| 2 + lib/librte_vhost/rte_vhost.h| 5 + lib/librte_vhost/socket.c | 40 +++- lib/librte_vhost/vhost.h| 3 + lib/librte_vhost/vhost_user.c | 304 +++- lib/librte_vhost/vhost_user.h | 12 +- 10 files change
[dpdk-dev] [PATCH v2 01/17] vhost: fix messages error checks
The return value of message handlers has changed to an enum that can take a
non-zero, non-negative value when a reply is needed. But the code checking
the variable afterwards has not been updated, so successfully handled
messages were being treated as errors.

Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")

Signed-off-by: Maxime Coquelin
---
 lib/librte_vhost/vhost_user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 7ef3fb4a4..060b41893 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
 	}
 
 skip_to_post_handle:
-	if (!ret && dev->extern_ops.post_msg_handle) {
+	if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
 		uint32_t need_reply;
 
 		ret = (*dev->extern_ops.post_msg_handle)(
@@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
 		vhost_user_unlock_all_queue_pairs(dev);
 
 	if (msg.flags & VHOST_USER_NEED_REPLY) {
-		msg.payload.u64 = !!ret;
+		msg.payload.u64 = ret == VH_RESULT_ERR;
 		msg.size = sizeof(msg.payload.u64);
 		send_vhost_reply(fd, &msg);
-	} else if (ret) {
+	} else if (ret == VH_RESULT_ERR) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"vhost message handling failed.\n");
 		return -1;
-- 
2.17.1
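For readers skimming the archive, the failure mode fixed by this patch can
be reproduced in isolation. The following is a minimal sketch, not code
from librte_vhost: it assumes the result enum uses -1 for errors and 0 for
success (only VH_RESULT_REPLY = 1 is visible in this series), and the
sketch_* helpers are hypothetical names standing in for the reply-payload
computation in vhost_user_msg_handler().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: only VH_RESULT_REPLY = 1 is shown in the series. */
enum vh_result {
	VH_RESULT_ERR = -1,
	VH_RESULT_OK = 0,
	VH_RESULT_REPLY = 1,
};

/* Pre-patch logic: any non-zero handler result is reported as a failure. */
static uint64_t
sketch_old_reply_payload(int ret)
{
	return !!ret;
}

/* Post-patch logic: only VH_RESULT_ERR is reported as a failure. */
static uint64_t
sketch_new_reply_payload(int ret)
{
	return ret == VH_RESULT_ERR;
}
```

With the old check, a handler returning VH_RESULT_REPLY would make the
NEED_REPLY acknowledgement carry 1 (failure); with the new check it
carries 0.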
[dpdk-dev] [PATCH v2 02/17] vhost: fix return code of messages requiring replies
VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE and
VHOST_USER_SET_LOG_BASE require replies, so their handlers should return
VH_RESULT_REPLY, not VH_RESULT_OK.

Fixes: 2cfbbb86c62a ("vhost: unify message handling function signature")

Signed-off-by: Maxime Coquelin
---
 lib/librte_vhost/vhost_user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 060b41893..ce0ac0098 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1161,7 +1161,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
 
 	msg->size = sizeof(msg->payload.state);
 
-	return VH_RESULT_OK;
+	return VH_RESULT_REPLY;
 }
 
 /*
@@ -1218,7 +1218,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
 	msg->payload.u64 = protocol_features;
 	msg->size = sizeof(msg->payload.u64);
 
-	return VH_RESULT_OK;
+	return VH_RESULT_REPLY;
 }
 
 static int
@@ -1298,7 +1298,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg)
 
 	msg->size = sizeof(msg->payload.u64);
 
-	return VH_RESULT_OK;
+	return VH_RESULT_REPLY;
 }
 
 static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused,
-- 
2.17.1
[dpdk-dev] [PATCH v2 03/17] vhost: fix error handling when mem table gets updated
When the memory table gets updated, the rings addresses need to be
translated again. If it fails, we need to exit cleanly by unmapping memory
regions.

Fixes: d5022533c20a ("vhost: retranslate vring addr when memory table changes")
Cc: sta...@dpdk.org

Signed-off-by: Maxime Coquelin
---
 lib/librte_vhost/vhost_user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ce0ac0098..c669d3c0a 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -964,7 +964,8 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg)
 
 			dev = translate_ring_addresses(dev, i);
 			if (!dev)
-				return VH_RESULT_ERR;
+				goto err_mmap;
+
 			*pdev = dev;
 		}
 
-- 
2.17.1
[dpdk-dev] [PATCH v2 04/17] vhost: define postcopy protocol flag
Signed-off-by: Dr. David Alan Gilbert
Signed-off-by: Maxime Coquelin
---
 lib/librte_vhost/rte_vhost.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index b02673d4a..b3cc6990d 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -66,6 +66,10 @@ extern "C" {
 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
 #endif
 
+#ifndef VHOST_USER_PROTOCOL_F_PAGEFAULT
+#define VHOST_USER_PROTOCOL_F_PAGEFAULT 8
+#endif
+
 /** Indicate whether protocol features negotiation is supported. */
 #ifndef VHOST_USER_F_PROTOCOL_FEATURES
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
-- 
2.17.1
[dpdk-dev] [PATCH v2 06/17] vhost: pass socket fd to message handling callbacks
This is not used for now, but will be needed for the special handling of VHOST_USER_SET_MEM_TABLE message once postcopy will be supported. Signed-off-by: Maxime Coquelin --- lib/librte_vhost/vhost_user.c | 71 +++ 1 file changed, 47 insertions(+), 24 deletions(-) diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 608d2f3e4..050fc8bf9 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -138,14 +138,16 @@ vhost_backend_cleanup(struct virtio_net *dev) */ static int vhost_user_set_owner(struct virtio_net **pdev __rte_unused, - struct VhostUserMsg *msg __rte_unused) + struct VhostUserMsg *msg __rte_unused, + int main_fd __rte_unused) { return VH_RESULT_OK; } static int vhost_user_reset_owner(struct virtio_net **pdev, - struct VhostUserMsg *msg __rte_unused) + struct VhostUserMsg *msg __rte_unused, + int main_fd __rte_unused) { struct virtio_net *dev = *pdev; vhost_destroy_device_notify(dev); @@ -159,7 +161,8 @@ vhost_user_reset_owner(struct virtio_net **pdev, * The features that we support are requested. */ static int -vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg) +vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg, + int main_fd __rte_unused) { struct virtio_net *dev = *pdev; uint64_t features = 0; @@ -176,7 +179,8 @@ vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg) * The queue number that we support are requested. */ static int -vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg) +vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg, + int main_fd __rte_unused) { struct virtio_net *dev = *pdev; uint32_t queue_num = 0; @@ -193,7 +197,8 @@ vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg) * We receive the negotiated features supported by us and the virtio device. 
*/ static int -vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg) +vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg, + int main_fd __rte_unused) { struct virtio_net *dev = *pdev; uint64_t features = msg->payload.u64; @@ -275,7 +280,8 @@ vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg) */ static int vhost_user_set_vring_num(struct virtio_net **pdev, -struct VhostUserMsg *msg) + struct VhostUserMsg *msg, + int main_fd __rte_unused) { struct virtio_net *dev = *pdev; struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index]; @@ -637,7 +643,8 @@ translate_ring_addresses(struct virtio_net *dev, int vq_index) * This function then converts these to our address space. */ static int -vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg) +vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg, + int main_fd __rte_unused) { struct virtio_net *dev = *pdev; struct vhost_virtqueue *vq; @@ -674,7 +681,8 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg) */ static int vhost_user_set_vring_base(struct virtio_net **pdev, - struct VhostUserMsg *msg) + struct VhostUserMsg *msg, + int main_fd __rte_unused) { struct virtio_net *dev = *pdev; dev->virtqueue[msg->payload.state.index]->last_used_idx = @@ -807,7 +815,8 @@ vhost_memory_changed(struct VhostUserMemory *new, } static int -vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg) +vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, + int main_fd __rte_unused) { struct virtio_net *dev = *pdev; struct VhostUserMemory memory = msg->payload.memory; @@ -1022,7 +1031,8 @@ virtio_is_ready(struct virtio_net *dev) } static int -vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg) +vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg, + int main_fd __rte_unused) { struct virtio_net *dev 
= *pdev; struct vhost_vring_file file; @@ -1046,7 +1056,8 @@ vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg) } static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused, - struct VhostUserMsg *msg) + struct VhostUserMsg *msg, + int main_fd __rte_unused) { if (!(msg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)) close(msg->
[dpdk-dev] [PATCH v2 08/17] vhost: add config flag for postcopy feature
The postcopy live-migration feature relies on userfaultfd, which was only
introduced in kernel v4.3.

This patch introduces a new define to allow building the vhost library on
kernels not supporting userfaultfd. With the legacy build system, the user
has to explicitly set CONFIG_RTE_LIBRTE_VHOST_POSTCOPY to 'y'. With the
Meson build system, RTE_LIBRTE_VHOST_POSTCOPY gets automatically defined if
the userfaultfd kernel header is present.

Suggested-by: Ilya Maximets
Signed-off-by: Maxime Coquelin
---
 config/common_linuxapp       | 1 +
 lib/librte_vhost/meson.build | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 9c5ea9d89..dc43dcc36 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -14,6 +14,7 @@ CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_VHOST_NUMA=y
+CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_IFC_PMD=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
index 9d25b4d88..e33e6fc16 100644
--- a/lib/librte_vhost/meson.build
+++ b/lib/librte_vhost/meson.build
@@ -7,6 +7,8 @@ endif
 if has_libnuma == 1
 	dpdk_conf.set10('RTE_LIBRTE_VHOST_NUMA', true)
 endif
+dpdk_conf.set('RTE_LIBRTE_VHOST_POSTCOPY',
+	      cc.has_header('linux/userfaultfd.h'))
 version = 4
 allow_experimental_apis = true
 cflags += '-fno-strict-aliasing'
-- 
2.17.1
[dpdk-dev] [PATCH v2 09/17] vhost: introduce postcopy's advise message
This patch opens a userfaultfd and sends it back to Qemu's VHOST_USER_POSTCOPY_ADVISE request. Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Maxime Coquelin --- lib/librte_vhost/vhost.h | 2 ++ lib/librte_vhost/vhost_user.c | 44 +++ lib/librte_vhost/vhost_user.h | 3 ++- 3 files changed, 48 insertions(+), 1 deletion(-) diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 25ffd7614..21722d8a8 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -363,6 +363,8 @@ struct virtio_net { int slave_req_fd; rte_spinlock_t slave_req_lock; + int postcopy_ufd; + /* * Device id to identify a specific backend device. * It's set to -1 for the default software implementation. diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 436ab7bf5..71721edc7 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -24,13 +24,19 @@ #include #include #include +#include +#include #include #include #include +#include #include #ifdef RTE_LIBRTE_VHOST_NUMA #include #endif +#ifdef RTE_LIBRTE_VHOST_POSTCOPY +#include +#endif #include #include @@ -69,6 +75,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = { [VHOST_USER_IOTLB_MSG] = "VHOST_USER_IOTLB_MSG", [VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS", [VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS", + [VHOST_USER_POSTCOPY_ADVISE] = "VHOST_USER_POSTCOPY_ADVISE", }; /* The possible results of a message handling function */ @@ -1505,6 +1512,42 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg, return VH_RESULT_OK; } +static int +vhost_user_set_postcopy_advise(struct virtio_net **pdev, + struct VhostUserMsg *msg, + int main_fd __rte_unused) +{ + struct virtio_net *dev = *pdev; +#ifdef RTE_LIBRTE_VHOST_POSTCOPY + struct uffdio_api api_struct; + + dev->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); + + if (dev->postcopy_ufd == -1) { + RTE_LOG(ERR, VHOST_CONFIG, "Userfaultfd 
not available: %s\n", + strerror(errno)); + return VH_RESULT_ERR; + } + api_struct.api = UFFD_API; + api_struct.features = 0; + if (ioctl(dev->postcopy_ufd, UFFDIO_API, &api_struct)) { + RTE_LOG(ERR, VHOST_CONFIG, "UFFDIO_API ioctl failure: %s\n", + strerror(errno)); + close(dev->postcopy_ufd); + return VH_RESULT_ERR; + } + msg->fds[0] = dev->postcopy_ufd; + msg->fd_num = 1; + + return VH_RESULT_REPLY; +#else + dev->postcopy_ufd = -1; + msg->fd_num = 0; + + return VH_RESULT_ERR; +#endif +} + typedef int (*vhost_message_handler_t)(struct virtio_net **pdev, struct VhostUserMsg *msg, int main_fd); @@ -1532,6 +1575,7 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = { [VHOST_USER_NET_SET_MTU] = vhost_user_net_set_mtu, [VHOST_USER_SET_SLAVE_REQ_FD] = vhost_user_set_req_fd, [VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg, + [VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise, }; diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h index dd0262f8f..2030b40a5 100644 --- a/lib/librte_vhost/vhost_user.h +++ b/lib/librte_vhost/vhost_user.h @@ -50,7 +50,8 @@ typedef enum VhostUserRequest { VHOST_USER_IOTLB_MSG = 22, VHOST_USER_CRYPTO_CREATE_SESS = 26, VHOST_USER_CRYPTO_CLOSE_SESS = 27, - VHOST_USER_MAX = 28 + VHOST_USER_POSTCOPY_ADVISE = 28, + VHOST_USER_MAX = 29 } VhostUserRequest; typedef enum VhostUserSlaveRequest { -- 2.17.1
[dpdk-dev] [PATCH v2 05/17] vhost: add number of fds to vhost-user messages and use it
As soon as some ancillary data (fds) is received, it is copied without
checking its length.

This patch adds the number of fds received to the message, which is set in
read_vhost_message().

This is preliminary work to support sending fds to Qemu.

Signed-off-by: Dr. David Alan Gilbert
Signed-off-by: Maxime Coquelin
---
 lib/librte_vhost/socket.c     | 21 ++++++++++++++++-----
 lib/librte_vhost/vhost_user.c |  2 +-
 lib/librte_vhost/vhost_user.h |  4 +++-
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index d63031747..c04d3d305 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -94,18 +94,23 @@ static struct vhost_user vhost_user = {
 	.mutex = PTHREAD_MUTEX_INITIALIZER,
 };
 
-/* return bytes# of read on success or negative val on failure. */
+/*
+ * return bytes# of read on success or negative val on failure. Update fdnum
+ * with number of fds read.
+ */
 int
-read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
+		int *fd_num)
 {
 	struct iovec iov;
 	struct msghdr msgh;
-	size_t fdsize = fd_num * sizeof(int);
-	char control[CMSG_SPACE(fdsize)];
+	char control[CMSG_SPACE(max_fds * sizeof(int))];
 	struct cmsghdr *cmsg;
 	int got_fds = 0;
 	int ret;
 
+	*fd_num = 0;
+
 	memset(&msgh, 0, sizeof(msgh));
 	iov.iov_base = buf;
 	iov.iov_len = buflen;
@@ -131,13 +136,19 @@ read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
 		if ((cmsg->cmsg_level == SOL_SOCKET) &&
 			(cmsg->cmsg_type == SCM_RIGHTS)) {
 			got_fds = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
+			if (got_fds > max_fds) {
+				RTE_LOG(ERR, VHOST_CONFIG,
+					"Received msg contains more fds than supported\n");
+				return -1;
+			}
+			*fd_num = got_fds;
 			memcpy(fds, CMSG_DATA(cmsg), got_fds * sizeof(int));
 			break;
 		}
 	}
 
 	/* Clear out unused file descriptors */
-	while (got_fds < fd_num)
+	while (got_fds < max_fds)
 		fds[got_fds++] = -1;
 
 	return ret;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index c669d3c0a..608d2f3e4 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1514,7 +1514,7 @@ read_vhost_message(int sockfd, struct VhostUserMsg *msg)
 	int ret;
 
 	ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE,
-		msg->fds, VHOST_MEMORY_MAX_NREGIONS);
+		msg->fds, VHOST_MEMORY_MAX_NREGIONS, &msg->fd_num);
 	if (ret <= 0)
 		return ret;
 
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 42166adf2..dd0262f8f 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -132,6 +132,7 @@ typedef struct VhostUserMsg {
 		VhostUserVringArea area;
 	} payload;
 	int fds[VHOST_MEMORY_MAX_NREGIONS];
+	int fd_num;
 } __attribute((packed)) VhostUserMsg;
 
 #define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
@@ -146,7 +147,8 @@ int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 int vhost_user_host_notifier_ctrl(int vid, bool enable);
 
 /* socket.c */
-int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
+int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
+		int *fd_num);
 int send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
 
 #endif
-- 
2.17.1
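The bounded-receive pattern introduced by this patch can be tried outside
of DPDK. The following is a minimal standalone sketch, not the library
code: sketch_send_fds(), sketch_recv_fds(), SKETCH_MAX_FDS and the control
union are hypothetical names, and only standard sendmsg()/recvmsg() with
SCM_RIGHTS is used. It reports how many descriptors actually arrived
through an out-parameter, rejects messages whose ancillary data was
truncated, and sets the unused slots to -1.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SKETCH_MAX_FDS 8	/* hypothetical upper bound for this sketch */

/* Control buffer big enough for SKETCH_MAX_FDS descriptors, aligned. */
union sketch_control {
	char buf[CMSG_SPACE(SKETCH_MAX_FDS * sizeof(int))];
	struct cmsghdr align;
};

/* Send len bytes from buf plus fd_num descriptors over a Unix socket. */
static ssize_t
sketch_send_fds(int sockfd, const void *buf, size_t len,
		const int *fds, int fd_num)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	union sketch_control control;
	struct msghdr msgh;
	struct cmsghdr *cmsg;

	if (fd_num < 0 || fd_num > SKETCH_MAX_FDS)
		return -1;

	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	if (fd_num > 0) {
		memset(&control, 0, sizeof(control));
		msgh.msg_control = control.buf;
		msgh.msg_controllen = CMSG_SPACE(fd_num * sizeof(int));
		cmsg = CMSG_FIRSTHDR(&msgh);
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		cmsg->cmsg_len = CMSG_LEN(fd_num * sizeof(int));
		memcpy(CMSG_DATA(cmsg), fds, fd_num * sizeof(int));
	}
	return sendmsg(sockfd, &msgh, 0);
}

/*
 * Receive up to len bytes and at most max_fds descriptors.
 * *fd_num is set to the number of descriptors received; unused
 * entries of fds[] are set to -1.
 */
static ssize_t
sketch_recv_fds(int sockfd, void *buf, size_t len,
		int *fds, int max_fds, int *fd_num)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	union sketch_control control;
	struct msghdr msgh;
	struct cmsghdr *cmsg;
	ssize_t ret;
	int got, i;

	*fd_num = 0;
	if (max_fds < 0 || max_fds > SKETCH_MAX_FDS)
		return -1;
	for (i = 0; i < max_fds; i++)
		fds[i] = -1;

	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = control.buf;
	msgh.msg_controllen = CMSG_SPACE(max_fds * sizeof(int));

	ret = recvmsg(sockfd, &msgh, 0);
	if (ret <= 0)
		return ret;
	/* The kernel flags truncation when the sender passed too much. */
	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS) {
			got = (int)((cmsg->cmsg_len - CMSG_LEN(0)) /
				    sizeof(int));
			if (got > max_fds)
				return -1;
			memcpy(fds, CMSG_DATA(cmsg), got * sizeof(int));
			*fd_num = got;
			break;
		}
	}
	return ret;
}
```

A quick way to exercise it is a socketpair() with one end sending a pipe
descriptor and the other end checking that exactly one fd is reported.
Note that this sketch does not close descriptors that may have been
partially received when truncation is detected; production code would
need to handle that.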
[dpdk-dev] [PATCH v2 07/17] vhost: enable fds passing when sending vhost-user messages
Passing userfault fds to Qemu will be required for postcopy live-migration feature. Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Maxime Coquelin --- lib/librte_vhost/vhost_user.c | 27 +++ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 050fc8bf9..436ab7bf5 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -171,6 +171,7 @@ vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg, msg->payload.u64 = features; msg->size = sizeof(msg->payload.u64); + msg->fd_num = 0; return VH_RESULT_REPLY; } @@ -189,6 +190,7 @@ vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg, msg->payload.u64 = (uint64_t)queue_num; msg->size = sizeof(msg->payload.u64); + msg->fd_num = 0; return VH_RESULT_REPLY; } @@ -1174,6 +1176,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev, vq->batch_copy_elems = NULL; msg->size = sizeof(msg->payload.state); + msg->fd_num = 0; return VH_RESULT_REPLY; } @@ -1233,6 +1236,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev, msg->payload.u64 = protocol_features; msg->size = sizeof(msg->payload.u64); + msg->fd_num = 0; return VH_RESULT_REPLY; } @@ -1315,6 +1319,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg, dev->log_size = size; msg->size = sizeof(msg->payload.u64); + msg->fd_num = 0; return VH_RESULT_REPLY; } @@ -1561,13 +1566,13 @@ read_vhost_message(int sockfd, struct VhostUserMsg *msg) } static int -send_vhost_message(int sockfd, struct VhostUserMsg *msg, int *fds, int fd_num) +send_vhost_message(int sockfd, struct VhostUserMsg *msg) { if (!msg) return 0; return send_fd_message(sockfd, (char *)msg, - VHOST_USER_HDR_SIZE + msg->size, fds, fd_num); + VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num); } static int @@ -1581,19 +1586,18 @@ send_vhost_reply(int sockfd, struct VhostUserMsg *msg) msg->flags |= VHOST_USER_VERSION; msg->flags |= 
VHOST_USER_REPLY_MASK; - return send_vhost_message(sockfd, msg, NULL, 0); + return send_vhost_message(sockfd, msg); } static int -send_vhost_slave_message(struct virtio_net *dev, struct VhostUserMsg *msg, -int *fds, int fd_num) +send_vhost_slave_message(struct virtio_net *dev, struct VhostUserMsg *msg) { int ret; if (msg->flags & VHOST_USER_NEED_REPLY) rte_spinlock_lock(&dev->slave_req_lock); - ret = send_vhost_message(dev->slave_req_fd, msg, fds, fd_num); + ret = send_vhost_message(dev->slave_req_fd, msg); if (ret < 0 && (msg->flags & VHOST_USER_NEED_REPLY)) rte_spinlock_unlock(&dev->slave_req_lock); @@ -1826,6 +1830,7 @@ vhost_user_msg_handler(int vid, int fd) if (msg.flags & VHOST_USER_NEED_REPLY) { msg.payload.u64 = ret == VH_RESULT_ERR; msg.size = sizeof(msg.payload.u64); + msg.fd_num = 0; send_vhost_reply(fd, &msg); } else if (ret == VH_RESULT_ERR) { RTE_LOG(ERR, VHOST_CONFIG, @@ -1909,7 +1914,7 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm) }, }; - ret = send_vhost_message(dev->slave_req_fd, &msg, NULL, 0); + ret = send_vhost_message(dev->slave_req_fd, &msg); if (ret < 0) { RTE_LOG(ERR, VHOST_CONFIG, "Failed to send IOTLB miss message (%d)\n", @@ -1925,8 +1930,6 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev, uint64_t offset, uint64_t size) { - int *fdp = NULL; - size_t fd_num = 0; int ret; struct VhostUserMsg msg = { .request.slave = VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG, @@ -1942,11 +1945,11 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev, if (fd < 0) msg.payload.area.u64 |= VHOST_USER_VRING_NOFD_MASK; else { - fdp = &fd; - fd_num = 1; + msg.fds[0] = fd; + msg.fd_num = 1; } - ret = send_vhost_slave_message(dev, &msg, fdp, fd_num); + ret = send_vhost_slave_message(dev, &msg); if (ret < 0) { RTE_LOG(ERR, VHOST_CONFIG, "Failed to set host notifier (%d)\n", ret); -- 2.17.1
[dpdk-dev] [PATCH v2 11/17] vhost: register new regions with userfaultfd
Signed-off-by: Dr. David Alan Gilbert
Signed-off-by: Maxime Coquelin
---
 lib/librte_vhost/vhost_user.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index bd468ca12..2f681d291 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -968,6 +968,32 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 				mmap_size,
 				alignment,
 				mmap_offset);
+
+		if (dev->postcopy_listening) {
+#ifdef RTE_LIBRTE_VHOST_POSTCOPY
+			struct uffdio_register reg_struct;
+
+			reg_struct.range.start = (uint64_t)(uintptr_t)mmap_addr;
+			reg_struct.range.len = mmap_size;
+			reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+			if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER,
+						&reg_struct)) {
+				RTE_LOG(ERR, VHOST_CONFIG,
+						"Failed to register ufd for region %d: (ufd = %d) %s\n",
+						i, dev->postcopy_ufd,
+						strerror(errno));
+				goto err_ufd;
+			}
+			RTE_LOG(INFO, VHOST_CONFIG,
+					"\t userfaultfd registered for range : %llx - %llx\n",
+					reg_struct.range.start,
+					reg_struct.range.start +
+					reg_struct.range.len - 1);
+#else
+			goto err_ufd;
+#endif
+		}
 	}
 
 	for (i = 0; i < dev->nr_vring; i++) {
@@ -983,7 +1009,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 
 			dev = translate_ring_addresses(dev, i);
 			if (!dev)
-				goto err_mmap;
+				goto err_ufd;
 
 			*pdev = dev;
 
@@ -994,6 +1020,11 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 
 	return VH_RESULT_OK;
 
+err_ufd:
+	if (dev->postcopy_ufd >= 0) {
+		close(dev->postcopy_ufd);
+		dev->postcopy_ufd = -1;
+	}
 err_mmap:
 	free_mem_region(dev);
 	rte_free(dev->mem);
-- 
2.17.1
[dpdk-dev] [PATCH v2 10/17] vhost: add support for postcopy's listen message
Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Maxime Coquelin --- lib/librte_vhost/vhost.h | 1 + lib/librte_vhost/vhost_user.c | 19 +++ lib/librte_vhost/vhost_user.h | 4 +++- 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 21722d8a8..9453cb28d 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -364,6 +364,7 @@ struct virtio_net { rte_spinlock_t slave_req_lock; int postcopy_ufd; + int postcopy_listening; /* * Device id to identify a specific backend device. diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 71721edc7..bd468ca12 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -76,6 +76,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = { [VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS", [VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS", [VHOST_USER_POSTCOPY_ADVISE] = "VHOST_USER_POSTCOPY_ADVISE", + [VHOST_USER_POSTCOPY_LISTEN] = "VHOST_USER_POSTCOPY_LISTEN", }; /* The possible results of a message handling function */ @@ -1548,6 +1549,23 @@ vhost_user_set_postcopy_advise(struct virtio_net **pdev, #endif } +static int +vhost_user_set_postcopy_listen(struct virtio_net **pdev, + struct VhostUserMsg *msg __rte_unused, + int main_fd __rte_unused) +{ + struct virtio_net *dev = *pdev; + + if (dev->mem && dev->mem->nregions) { + RTE_LOG(ERR, VHOST_CONFIG, + "Regions already registered at postcopy-listen\n"); + return VH_RESULT_ERR; + } + dev->postcopy_listening = 1; + + return VH_RESULT_OK; +} + typedef int (*vhost_message_handler_t)(struct virtio_net **pdev, struct VhostUserMsg *msg, int main_fd); @@ -1576,6 +1594,7 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = { [VHOST_USER_SET_SLAVE_REQ_FD] = vhost_user_set_req_fd, [VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg, [VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise, + 
[VHOST_USER_POSTCOPY_LISTEN] = vhost_user_set_postcopy_listen, }; diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h index 2030b40a5..73b1fe2b9 100644 --- a/lib/librte_vhost/vhost_user.h +++ b/lib/librte_vhost/vhost_user.h @@ -51,7 +51,9 @@ typedef enum VhostUserRequest { VHOST_USER_CRYPTO_CREATE_SESS = 26, VHOST_USER_CRYPTO_CLOSE_SESS = 27, VHOST_USER_POSTCOPY_ADVISE = 28, - VHOST_USER_MAX = 29 + VHOST_USER_POSTCOPY_LISTEN = 29, + VHOST_USER_POSTCOPY_END = 30, + VHOST_USER_MAX = 31 } VhostUserRequest; typedef enum VhostUserSlaveRequest { -- 2.17.1
[dpdk-dev] [PATCH v2 12/17] vhost: avoid useless VhostUserMemory copy
The VHOST_USER_SET_MEM_TABLE payload is copied when handled, whereas it could directly be referenced. This is not very important, but next, we'll need to update the payload and send it back to Qemu. Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Maxime Coquelin --- lib/librte_vhost/vhost_user.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 2f681d291..515d3c61c 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -829,7 +829,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, int main_fd __rte_unused) { struct virtio_net *dev = *pdev; - struct VhostUserMemory memory = msg->payload.memory; + struct VhostUserMemory *memory = &msg->payload.memory; struct rte_vhost_mem_region *reg; void *mmap_addr; uint64_t mmap_size; @@ -839,17 +839,17 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, int populate; int fd; - if (memory.nregions > VHOST_MEMORY_MAX_NREGIONS) { + if (memory->nregions > VHOST_MEMORY_MAX_NREGIONS) { RTE_LOG(ERR, VHOST_CONFIG, - "too many memory regions (%u)\n", memory.nregions); + "too many memory regions (%u)\n", memory->nregions); return VH_RESULT_ERR; } - if (dev->mem && !vhost_memory_changed(&memory, dev->mem)) { + if (dev->mem && !vhost_memory_changed(memory, dev->mem)) { RTE_LOG(INFO, VHOST_CONFIG, "(%d) memory regions not changed\n", dev->vid); - for (i = 0; i < memory.nregions; i++) + for (i = 0; i < memory->nregions; i++) close(msg->fds[i]); return VH_RESULT_OK; @@ -881,25 +881,25 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, } dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct rte_vhost_memory) + - sizeof(struct rte_vhost_mem_region) * memory.nregions, 0); + sizeof(struct rte_vhost_mem_region) * memory->nregions, 0); if (dev->mem == NULL) { RTE_LOG(ERR, VHOST_CONFIG, "(%d) failed to allocate memory for dev->mem\n", dev->vid); 
return VH_RESULT_ERR; } - dev->mem->nregions = memory.nregions; + dev->mem->nregions = memory->nregions; - for (i = 0; i < memory.nregions; i++) { + for (i = 0; i < memory->nregions; i++) { fd = msg->fds[i]; reg = &dev->mem->regions[i]; - reg->guest_phys_addr = memory.regions[i].guest_phys_addr; - reg->guest_user_addr = memory.regions[i].userspace_addr; - reg->size= memory.regions[i].memory_size; + reg->guest_phys_addr = memory->regions[i].guest_phys_addr; + reg->guest_user_addr = memory->regions[i].userspace_addr; + reg->size= memory->regions[i].memory_size; reg->fd = fd; - mmap_offset = memory.regions[i].mmap_offset; + mmap_offset = memory->regions[i].mmap_offset; /* Check for memory_size + mmap_offset overflow */ if (mmap_offset >= -reg->size) { -- 2.17.1
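As an aside, the context line "if (mmap_offset >= -reg->size)" at the end
of the hunk above relies on unsigned wrap-around, which can look odd at
first sight. The sketch below isolates that comparison for 64-bit unsigned
values; sketch_region_overflows() is a hypothetical helper, not a function
from librte_vhost. For a non-zero size, 0 - size wraps to 2^64 - size, so
the test is true exactly when mmap_offset + size would not fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 when mmap_offset + size would overflow a uint64_t.
 * For size > 0, (0 - size) wraps to 2^64 - size, so the comparison is
 * equivalent to mmap_offset + size >= 2^64.
 * Edge case: size == 0 gives a bound of 0, so every offset is flagged.
 */
static int
sketch_region_overflows(uint64_t mmap_offset, uint64_t size)
{
	return mmap_offset >= (uint64_t)0 - size;
}
```

The comparison avoids computing the sum itself, which is what would wrap
silently.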
[dpdk-dev] [PATCH v2 15/17] vhost: enable postcopy protocol feature
Enable the postcopy protocol feature, except if dequeue zero-copy is
enabled. In that case, guest memory needs to be populated, which is not
compatible with userfaultfd.

Signed-off-by: Dr. David Alan Gilbert
Signed-off-by: Maxime Coquelin
---
 lib/librte_vhost/vhost_user.c | 7 +++++++
 lib/librte_vhost/vhost_user.h | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ee7337ac8..9d08f4af0 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1317,6 +1317,13 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
 	if (!(features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
 		protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK);
 
+	/*
+	 * If dequeue zerocopy is enabled, guest memory requires to be
+	 * populated, which is not compatible with postcopy.
+	 */
+	if (dev->dequeue_zero_copy)
+		protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT);
+
 	msg->payload.u64 = protocol_features;
 	msg->size = sizeof(msg->payload.u64);
 	msg->fd_num = 0;
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 73b1fe2b9..dc97be843 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -22,7 +22,8 @@
 					 (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
-					 (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER))
+					 (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
+					 (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT))
 
 typedef enum VhostUserRequest {
 	VHOST_USER_NONE = 0,
-- 
2.17.1
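The masking done in vhost_user_get_protocol_features() can be shown in
isolation. This is a minimal sketch under stated assumptions: the two bit
numbers are the ones defined in rte_vhost.h in this series (PAGEFAULT = 8,
HOST_NOTIFIER = 11), while sketch_advertised_features() and its
dequeue_zero_copy argument are hypothetical stand-ins for the handler and
the device flag.

```c
#include <assert.h>
#include <stdint.h>

#define VHOST_USER_PROTOCOL_F_PAGEFAULT		8	/* from patch 04/17 */
#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER	11	/* from rte_vhost.h */

/*
 * Return the protocol feature mask to advertise: the postcopy
 * (PAGEFAULT) bit is cleared when dequeue zero-copy is enabled,
 * all other bits are left untouched.
 */
static uint64_t
sketch_advertised_features(uint64_t supported, int dequeue_zero_copy)
{
	uint64_t features = supported;

	if (dequeue_zero_copy)
		features &= ~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT);

	return features;
}
```

Since the bit is simply absent from the advertised mask, the master never
negotiates postcopy with a zero-copy backend, so no postcopy message
should reach it.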
[dpdk-dev] [PATCH v2 13/17] vhost: send userfault range addresses back to qemu
Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Maxime Coquelin --- lib/librte_vhost/vhost_user.c | 49 --- 1 file changed, 46 insertions(+), 3 deletions(-) diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 515d3c61c..b207de6e0 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -89,6 +89,11 @@ enum vh_result { VH_RESULT_REPLY = 1, }; +static int +send_vhost_reply(int sockfd, struct VhostUserMsg *msg); +static int +read_vhost_message(int sockfd, struct VhostUserMsg *msg); + static uint64_t get_blk_size(int fd) { @@ -826,7 +831,7 @@ vhost_memory_changed(struct VhostUserMemory *new, static int vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) + int main_fd) { struct virtio_net *dev = *pdev; struct VhostUserMemory *memory = &msg->payload.memory; @@ -970,11 +975,49 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, mmap_offset); if (dev->postcopy_listening) { + /* +* We haven't a better way right now than sharing +* DPDK's virtual address with Qemu, so that Qemu can +* retrieve the region offset when handling userfaults. +*/ + memory->regions[i].userspace_addr = + reg->host_user_addr; + } + } + if (dev->postcopy_listening) { + /* Send the addresses back to qemu */ + msg->fd_num = 0; + send_vhost_reply(main_fd, msg); + + /* Wait for qemu to acknowledge it's got the addresses +* we've got to wait before we're allowed to generate faults. 
+*/ + VhostUserMsg ack_msg; + if (read_vhost_message(main_fd, &ack_msg) <= 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to read qemu ack on postcopy set-mem-table\n"); + goto err_mmap; + } + if (ack_msg.request.master != VHOST_USER_SET_MEM_TABLE) { + RTE_LOG(ERR, VHOST_CONFIG, + "Bad qemu ack on postcopy set-mem-table (%d)\n", + ack_msg.request.master); + goto err_mmap; + } + + /* Now userfault register and we can use the memory */ + for (i = 0; i < memory->nregions; i++) { #ifdef RTE_LIBRTE_VHOST_POSTCOPY + reg = &dev->mem->regions[i]; struct uffdio_register reg_struct; - reg_struct.range.start = (uint64_t)(uintptr_t)mmap_addr; - reg_struct.range.len = mmap_size; + /* +* Let's register all the mmap'ed area to ensure +* alignment on page boundary. +*/ + reg_struct.range.start = + (uint64_t)(uintptr_t)reg->mmap_addr; + reg_struct.range.len = reg->mmap_size; reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING; if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, -- 2.17.1
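The comment in the last hunk says the whole mmap'ed area is registered so that the range handed to UFFDIO_REGISTER is aligned on a page boundary. A minimal sketch of that alignment condition follows; `struct sketch_range` and `sketch_range_is_page_aligned()` are hypothetical names used for illustration only and are not part of the vhost library.

```c
#include <stdint.h>

/* Hypothetical stand-in for the start/len pair handed to UFFDIO_REGISTER. */
struct sketch_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Return 1 if both the start and the length are multiples of page_size,
 * else 0. page_size is assumed to be a power of two.
 */
static int
sketch_range_is_page_aligned(const struct sketch_range *r, uint64_t page_size)
{
	uint64_t mask = page_size - 1;

	return (r->start & mask) == 0 && (r->len & mask) == 0;
}
```

A guest region that starts at an offset inside the mapping would typically fail this check, while the mapping itself would pass it.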
[dpdk-dev] [PATCH v2 14/17] vhost: add support to postcopy's end request
The master sends this message before stopping handling userfaults, so that the backend closes the userfaultfd. The master waits for the slave to acknowledge the request with an empty 64-bit payload for synchronization purposes. Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Maxime Coquelin --- lib/librte_vhost/vhost_user.c | 21 + 1 file changed, 21 insertions(+) diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index b207de6e0..ee7337ac8 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -77,6 +77,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = { [VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS", [VHOST_USER_POSTCOPY_ADVISE] = "VHOST_USER_POSTCOPY_ADVISE", [VHOST_USER_POSTCOPY_LISTEN] = "VHOST_USER_POSTCOPY_LISTEN", + [VHOST_USER_POSTCOPY_END] = "VHOST_USER_POSTCOPY_END", }; /* The possible results of a message handling function */ @@ -1640,6 +1641,25 @@ vhost_user_set_postcopy_listen(struct virtio_net **pdev, return VH_RESULT_OK; } +static int +vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg, + int main_fd __rte_unused) +{ + struct virtio_net *dev = *pdev; + + dev->postcopy_listening = 0; + if (dev->postcopy_ufd >= 0) { + close(dev->postcopy_ufd); + dev->postcopy_ufd = -1; + } + + msg->payload.u64 = 0; + msg->size = sizeof(msg->payload.u64); + msg->fd_num = 0; + + return VH_RESULT_REPLY; +} + typedef int (*vhost_message_handler_t)(struct virtio_net **pdev, struct VhostUserMsg *msg, int main_fd); @@ -1669,6 +1689,7 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = { [VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg, [VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise, [VHOST_USER_POSTCOPY_LISTEN] = vhost_user_set_postcopy_listen, + [VHOST_USER_POSTCOPY_END] = vhost_user_postcopy_end, }; -- 2.17.1
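The end-of-postcopy handling described in the commit message can be sketched outside the library as follows. This assumes the handler has to signal that a reply must be sent, since the master waits for the empty payload. The `sketch_` structures and result codes are hypothetical, trimmed-down stand-ins and not the real `virtio_net` or `VhostUserMsg` definitions.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical, trimmed-down stand-ins for the vhost structures. */
struct sketch_dev {
	int postcopy_listening;
	int postcopy_ufd;
};

struct sketch_msg {
	uint64_t payload_u64;
	uint32_t size;
	int fd_num;
};

/* Assumed result codes, mirroring the OK/REPLY split seen in the diff. */
enum sketch_result { SKETCH_ERR = -1, SKETCH_OK = 0, SKETCH_REPLY = 1 };

static int
sketch_postcopy_end(struct sketch_dev *dev, struct sketch_msg *msg)
{
	/* Stop listening and release the userfaultfd if one is open. */
	dev->postcopy_listening = 0;
	if (dev->postcopy_ufd >= 0) {
		close(dev->postcopy_ufd);
		dev->postcopy_ufd = -1;
	}

	/* Empty 64-bit payload that the master waits for. */
	msg->payload_u64 = 0;
	msg->size = sizeof(msg->payload_u64);
	msg->fd_num = 0;

	/* Tell the caller that the message must be sent back. */
	return SKETCH_REPLY;
}
```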
[dpdk-dev] [PATCH v2 17/17] net/vhost: add parameter to enable postcopy support
Introduce a new postcopy-support parameter to Vhost PMD that passes the RTE_VHOST_USER_POSTCOPY_SUPPORT flag at vhost device register time. Flag should only be set if application does not prefault guest memory using, for example, mlockall() syscall. Default value is 0, meaning that postcopy support is disabled unless specified explicitly. Example to enable postcopy support for a given device: --vdev 'net_vhost0,iface=/tmp/vhost-user1,postcopy-support=1' Signed-off-by: Maxime Coquelin --- doc/guides/nics/vhost.rst | 5 + drivers/net/vhost/rte_eth_vhost.c | 13 + 2 files changed, 18 insertions(+) diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst index 4f7ae8990..23f2e87aa 100644 --- a/doc/guides/nics/vhost.rst +++ b/doc/guides/nics/vhost.rst @@ -71,6 +71,11 @@ The user can specify below arguments in `--vdev` option. It is used to enable iommu support in vhost library. (Default: 0 (disabled)) +#. ``postcopy-support``: + +It is used to enable postcopy live-migration support in vhost library. 
+(Default: 0 (disabled)) + Vhost PMD event handling diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c index aa6052221..1330f06ba 100644 --- a/drivers/net/vhost/rte_eth_vhost.c +++ b/drivers/net/vhost/rte_eth_vhost.c @@ -30,6 +30,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; #define ETH_VHOST_CLIENT_ARG "client" #define ETH_VHOST_DEQUEUE_ZERO_COPY"dequeue-zero-copy" #define ETH_VHOST_IOMMU_SUPPORT"iommu-support" +#define ETH_VHOST_POSTCOPY_SUPPORT "postcopy-support" #define VHOST_MAX_PKT_BURST 32 static const char *valid_arguments[] = { @@ -38,6 +39,7 @@ static const char *valid_arguments[] = { ETH_VHOST_CLIENT_ARG, ETH_VHOST_DEQUEUE_ZERO_COPY, ETH_VHOST_IOMMU_SUPPORT, + ETH_VHOST_POSTCOPY_SUPPORT, NULL }; @@ -1339,6 +1341,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) int client_mode = 0; int dequeue_zero_copy = 0; int iommu_support = 0; + int postcopy_support = 0; struct rte_eth_dev *eth_dev; const char *name = rte_vdev_device_name(dev); @@ -1411,6 +1414,16 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) flags |= RTE_VHOST_USER_IOMMU_SUPPORT; } + if (rte_kvargs_count(kvlist, ETH_VHOST_POSTCOPY_SUPPORT) == 1) { + ret = rte_kvargs_process(kvlist, ETH_VHOST_POSTCOPY_SUPPORT, +&open_int, &postcopy_support); + if (ret < 0) + goto out_free; + + if (postcopy_support) + flags |= RTE_VHOST_USER_POSTCOPY_SUPPORT; + } + if (dev->device.numa_node == SOCKET_ID_ANY) dev->device.numa_node = rte_socket_id(); -- 2.17.1
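To illustrate how a `postcopy-support=1` device argument ends up as a registration flag, here is a standalone sketch. It deliberately does not use `rte_kvargs`; the small parser and all `sketch_` names are assumptions made for this example, and only the flag value `(1ULL << 4)` is taken from the companion patch in this series.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same bit as RTE_VHOST_USER_POSTCOPY_SUPPORT in the companion patch. */
#define SKETCH_POSTCOPY_SUPPORT (1ULL << 4)

/*
 * Look for "key=<int>" in a comma-separated argument string.
 * Returns 1 and stores the value if the key is found, 0 if it is absent.
 */
static int
sketch_find_int_arg(const char *args, const char *key, int *value)
{
	size_t klen = strlen(key);
	const char *p = args;

	while (p != NULL && *p != '\0') {
		if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
			*value = (int)strtol(p + klen + 1, NULL, 0);
			return 1;
		}
		/* Move to the token after the next comma, if any. */
		p = strchr(p, ',');
		if (p != NULL)
			p++;
	}
	return 0;
}

/* Postcopy stays disabled unless the argument is present and non-zero. */
static uint64_t
sketch_flags_from_args(const char *args)
{
	uint64_t flags = 0;
	int postcopy_support = 0;

	if (sketch_find_int_arg(args, "postcopy-support", &postcopy_support) &&
	    postcopy_support)
		flags |= SKETCH_POSTCOPY_SUPPORT;
	return flags;
}
```

With the example string from the commit message, `sketch_flags_from_args("net_vhost0,iface=/tmp/vhost-user1,postcopy-support=1")` sets the flag, and omitting the argument leaves it clear.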
[dpdk-dev] [PATCH v2 16/17] vhost: add flag to enable postcopy live-migration
The postcopy live-migration feature requires the application not to populate the guest memory. As the vhost library cannot prevent the application from doing that (e.g. by preventing the application from calling mlockall()), the feature is disabled by default. The application should only enable the feature if it does not force the guest memory to be populated. In case the user passes the RTE_VHOST_USER_POSTCOPY_SUPPORT flag at registration but the feature was not compiled in, registration fails. Signed-off-by: Maxime Coquelin --- doc/guides/prog_guide/vhost_lib.rst | 8 lib/librte_vhost/rte_vhost.h| 1 + lib/librte_vhost/socket.c | 19 +-- 3 files changed, 26 insertions(+), 2 deletions(-) diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index 77af4d775..c77df338f 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -106,6 +106,14 @@ The following is an overview of some key Vhost API functions: Enabling this flag with these Qemu version results in Qemu being blocked when multiple queue pairs are declared. + - ``RTE_VHOST_USER_POSTCOPY_SUPPORT`` + +Postcopy live-migration support will be enabled when this flag is set. +It is disabled by default. + +Enabling this flag should only be done when the calling application does +not pre-fault the guest shared memory, otherwise migration would fail. + * ``rte_vhost_driver_set_features(path, features)`` This function sets the feature bits the vhost-user driver supports. The diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index b3cc6990d..b26afbffa 100644 --- a/lib/librte_vhost/rte_vhost.h +++ b/lib/librte_vhost/rte_vhost.h @@ -28,6 +28,7 @@ extern "C" { #define RTE_VHOST_USER_NO_RECONNECT(1ULL << 1) #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY (1ULL << 2) #define RTE_VHOST_USER_IOMMU_SUPPORT (1ULL << 3) +#define RTE_VHOST_USER_POSTCOPY_SUPPORT(1ULL << 4) /** Protocol features. 
*/ #ifndef VHOST_USER_PROTOCOL_F_MQ diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index c04d3d305..3df303be8 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -51,6 +51,8 @@ struct vhost_user_socket { uint64_t supported_features; uint64_t features; + uint64_t protocol_features; + /* * Device id to identify a specific backend device. * It's set to -1 for the default software implementation. @@ -731,7 +733,7 @@ rte_vhost_driver_get_protocol_features(const char *path, did = vsocket->vdpa_dev_id; vdpa_dev = rte_vdpa_get_device(did); if (!vdpa_dev || !vdpa_dev->ops->get_protocol_features) { - *protocol_features = VHOST_USER_PROTOCOL_FEATURES; + *protocol_features = vsocket->protocol_features; goto unlock_exit; } @@ -744,7 +746,7 @@ rte_vhost_driver_get_protocol_features(const char *path, goto unlock_exit; } - *protocol_features = VHOST_USER_PROTOCOL_FEATURES + *protocol_features = vsocket->protocol_features & vdpa_protocol_features; unlock_exit: @@ -863,6 +865,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags) vsocket->use_builtin_virtio_net = true; vsocket->supported_features = VIRTIO_NET_SUPPORTED_FEATURES; vsocket->features = VIRTIO_NET_SUPPORTED_FEATURES; + vsocket->protocol_features = VHOST_USER_PROTOCOL_FEATURES; /* Dequeue zero copy can't assure descriptors returned in order */ if (vsocket->dequeue_zero_copy) { @@ -875,6 +878,18 @@ rte_vhost_driver_register(const char *path, uint64_t flags) vsocket->features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM); } + if (!(flags & RTE_VHOST_USER_POSTCOPY_SUPPORT)) { + vsocket->protocol_features &= + ~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT); + } else { +#ifndef RTE_LIBRTE_VHOST_POSTCOPY + RTE_LOG(ERR, VHOST_CONFIG, + "Postcopy requested but not compiled\n"); + ret = -1; + goto out_mutex; +#endif + } + if ((flags & RTE_VHOST_USER_CLIENT) != 0) { vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT); if (vsocket->reconnect && reconn_tid == 0) { -- 2.17.1
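The registration-time decision in the last hunk can be summarized as a small pure function. This is only a sketch: the compile-time `#ifndef RTE_LIBRTE_VHOST_POSTCOPY` check is modelled as a run-time `compiled_in` parameter, the page-fault bit position is an illustrative constant, and every `sketch_` name is hypothetical.

```c
#include <stdint.h>

#define SKETCH_USER_POSTCOPY_SUPPORT (1ULL << 4) /* flag bit from the patch */
#define SKETCH_PROTOCOL_F_PAGEFAULT 8            /* illustrative bit position */

/*
 * Returns 0 on success and -1 if postcopy was requested but support was not
 * compiled in. When the flag is absent, the page-fault protocol feature is
 * cleared from *protocol_features so it is never advertised.
 */
static int
sketch_apply_postcopy_flag(uint64_t flags, int compiled_in,
			   uint64_t *protocol_features)
{
	if (!(flags & SKETCH_USER_POSTCOPY_SUPPORT)) {
		*protocol_features &= ~(1ULL << SKETCH_PROTOCOL_F_PAGEFAULT);
		return 0;
	}
	return compiled_in ? 0 : -1;
}
```

Keeping the per-socket copy of the protocol features separate from the global default is what lets one socket advertise postcopy while another does not.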
Re: [dpdk-dev] [PATCH] mbuf: clarify QINQ flag usage
On 10/2/18 1:17 PM, Ferruh Yigit wrote: Update the implementation so that when the PMD sets PKT_RX_QINQ_STRIPPED in mbuf ol_flags, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED & PKT_RX_VLAN are also set. Clarify the mbuf documentation: when PKT_RX_QINQ is set, PKT_RX_VLAN should also be set, so that an application can rely on the PKT_RX_QINQ flag to access both mbuf.vlan_tci & mbuf.vlan_tci_outer Signed-off-by: Ferruh Yigit --- Cc: Hyong Youb Kim Cc: John Daley --- app/test-pmd/rxonly.c| 2 +- doc/guides/nics/features.rst | 7 --- drivers/net/i40e/i40e_rxtx.c | 3 ++- lib/librte_mbuf/rte_mbuf.c | 1 + lib/librte_mbuf/rte_mbuf.h | 5 +++-- 5 files changed, 11 insertions(+), 7 deletions(-) diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c index a93d80612..08a5fc2cf 100644 --- a/app/test-pmd/rxonly.c +++ b/app/test-pmd/rxonly.c @@ -132,7 +132,7 @@ pkt_burst_receive(struct fwd_stream *fs) printf(" - timestamp %"PRIu64" ", mb->timestamp); if (ol_flags & PKT_RX_VLAN_STRIPPED) It looks like it should be PKT_RX_VLAN above. printf(" - VLAN tci=0x%x", mb->vlan_tci); - if (ol_flags & PKT_RX_QINQ_STRIPPED) + if (ol_flags & PKT_RX_QINQ) printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x", mb->vlan_tci, mb->vlan_tci_outer); The first value duplicates the printout above, so the QinQ check should either be put before the PKT_RX_VLAN check, with PKT_RX_VLAN handled in the else branch, or the inner TCI should simply be removed from here. <...>
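One way to read the first option suggested in the review (test QinQ first, handle plain VLAN in the else branch) is sketched below. The flag bits are illustrative placeholders, not the real PKT_RX_* values, and `sketch_format_vlan()` is a hypothetical helper rather than testpmd code.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative flag bits; the real PKT_RX_* values are defined in rte_mbuf.h. */
#define SKETCH_RX_VLAN (1ULL << 0)
#define SKETCH_RX_QINQ (1ULL << 1)

/*
 * Format the VLAN part of the per-packet printout. QinQ is tested first so
 * the inner TCI is printed only once; plain VLAN is handled in the else
 * branch. Returns the snprintf() result, or 0 when no flag is set.
 */
static int
sketch_format_vlan(char *buf, size_t len, uint64_t ol_flags,
		   uint16_t vlan_tci, uint16_t vlan_tci_outer)
{
	if (ol_flags & SKETCH_RX_QINQ)
		return snprintf(buf, len,
				" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
				(unsigned int)vlan_tci,
				(unsigned int)vlan_tci_outer);
	else if (ol_flags & SKETCH_RX_VLAN)
		return snprintf(buf, len, " - VLAN tci=0x%x",
				(unsigned int)vlan_tci);

	if (len > 0)
		buf[0] = '\0';
	return 0;
}
```

Because the proposed semantics set the VLAN flag whenever the QinQ flag is set, testing QinQ first is what avoids printing the inner TCI twice.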
Re: [dpdk-dev] [PATCH v7] app/testpmd: add forwarding mode to simulate a noisy neighbour
02/10/2018 10:59, Jens Freimann: > On Tue, Oct 02, 2018 at 09:57:52AM +0200, Thomas Monjalon wrote: > >02/10/2018 09:19, Jens Freimann: > >> On Mon, Oct 01, 2018 at 01:13:32PM +, Iremonger, Bernard wrote: > >> >./devtools/check-git-log.sh -1 > >> >Headline too long: > >> >app/testpmd: add forwarding mode to simulate a noisy neighbour > >> > >> I'm sorry, I failed to use checkpatches.sh correctly :) I did: > >> > >> #> git show | > >> DPDK_CHECKPATCH_PATH="/home/jfreiman/code/linux/scripts/checkpatch.pl" > >> devtools/checkpatches.sh -- > >> > >> 1/1 valid patch > > > >Why this command is not correct? > > checkpatches.sh looks for the string "Subject:" which is not included > in git show output. Using cat on the patch file instead will work. Yes indeed. As an improvement, we could check for "git show" output, starting with "commit" word.
Re: [dpdk-dev] [PATCH 3/3] event/dpaa2: support for crypto adapter
Acked-By: Abhinandan Gujjar > -Original Message- > From: Jerin Jacob > Sent: Tuesday, September 25, 2018 8:42 AM > To: akhil.go...@nxp.com > Cc: dev@dpdk.org; hemant.agra...@nxp.com; De Lara Guarch, Pablo > ; Gujjar, Abhinandan S > > Subject: Re: [dpdk-dev] [PATCH 3/3] event/dpaa2: support for crypto adapter > > -Original Message- > > Date: Fri, 14 Sep 2018 17:18:10 +0530 > > From: akhil.go...@nxp.com > > To: dev@dpdk.org > > CC: hemant.agra...@nxp.com, pablo.de.lara.gua...@intel.com, Akhil > > Goyal > > Subject: [dpdk-dev] [PATCH 3/3] event/dpaa2: support for crypto > > adapter > > X-Mailer: git-send-email 2.17.1 > > > > > > From: Akhil Goyal > > > > Signed-off-by: Akhil Goyal > > Signed-off-by: Ashish Jain > > Signed-off-by: Hemant Agrawal > > > Adding Eventdev Crypto Adapter maintainer > > + Abhinandan Gujjar > > > > --- > > drivers/event/dpaa2/Makefile | 3 +- > > drivers/event/dpaa2/dpaa2_eventdev.c | 150 > +++ > > drivers/event/dpaa2/dpaa2_eventdev.h | 9 ++ > > drivers/event/dpaa2/meson.build | 3 +- > > 4 files changed, 163 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/event/dpaa2/Makefile > > b/drivers/event/dpaa2/Makefile index 5e1a63200..46f7d061e 100644 > > --- a/drivers/event/dpaa2/Makefile > > +++ b/drivers/event/dpaa2/Makefile > > @@ -20,9 +20,10 @@ CFLAGS += -I$(RTE_SDK)/drivers/event/dpaa2 CFLAGS > > += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal > > LDLIBS += -lrte_eal -lrte_eventdev > > LDLIBS += -lrte_bus_fslmc -lrte_mempool_dpaa2 -lrte_pmd_dpaa2 -LDLIBS > > += -lrte_bus_vdev > > +LDLIBS += -lrte_bus_vdev -lrte_pmd_dpaa2_sec > > CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2 CFLAGS += > > -I$(RTE_SDK)/drivers/net/dpaa2/mc > > +CFLAGS += -I$(RTE_SDK)/drivers/crypto/dpaa2_sec > > > > # versioning export map > > EXPORT_MAP := rte_pmd_dpaa2_event_version.map diff --git > > a/drivers/event/dpaa2/dpaa2_eventdev.c > > b/drivers/event/dpaa2/dpaa2_eventdev.c > > index cadbdb13b..890ab461c 100644 > > --- a/drivers/event/dpaa2/dpaa2_eventdev.c > > +++ 
b/drivers/event/dpaa2/dpaa2_eventdev.c > > @@ -27,6 +27,7 @@ > > #include > > #include > > #include > > +#include > > #include > > > > #include > > @@ -34,6 +35,7 @@ > > #include > > #include > > #include > > +#include > > #include "dpaa2_eventdev.h" > > #include "dpaa2_eventdev_logs.h" > > #include > > @@ -793,6 +795,149 @@ dpaa2_eventdev_eth_stop(const struct > rte_eventdev *dev, > > return 0; > > } > > > > +static int > > +dpaa2_eventdev_crypto_caps_get(const struct rte_eventdev *dev, > > + const struct rte_cryptodev *cdev, > > + uint32_t *caps) { > > + const char *name = cdev->data->name; > > + > > + EVENTDEV_INIT_FUNC_TRACE(); > > + > > + RTE_SET_USED(dev); > > + > > + if (!strncmp(name, "dpsec-", 6)) > > + *caps = RTE_EVENT_CRYPTO_ADAPTER_DPAA2_CAP; > > + else > > + return -1; > > + > > + return 0; > > +} > > + > > +static int > > +dpaa2_eventdev_crypto_queue_add_all(const struct rte_eventdev *dev, > > + const struct rte_cryptodev *cryptodev, > > + const struct rte_event *ev) { > > + struct dpaa2_eventdev *priv = dev->data->dev_private; > > + uint8_t ev_qid = ev->queue_id; > > + uint16_t dpcon_id = priv->evq_info[ev_qid].dpcon->dpcon_id; > > + int i, ret; > > + > > + EVENTDEV_INIT_FUNC_TRACE(); > > + > > + for (i = 0; i < cryptodev->data->nb_queue_pairs; i++) { > > + ret = dpaa2_sec_eventq_attach(cryptodev, i, > > + dpcon_id, ev); > > + if (ret) { > > + DPAA2_EVENTDEV_ERR("dpaa2_sec_eventq_attach failed: > > ret > %d\n", > > + ret); > > + goto fail; > > + } > > + } > > + return 0; > > +fail: > > + for (i = (i - 1); i >= 0 ; i--) > > + dpaa2_sec_eventq_detach(cryptodev, i); > > + > > + return ret; > > +} > > + > > +static int > > +dpaa2_eventdev_crypto_queue_add(const struct rte_eventdev *dev, > > + const struct rte_cryptodev *cryptodev, > > + int32_t rx_queue_id, > > + const struct rte_event *ev) { > > + struct dpaa2_eventdev *priv = dev->data->dev_private; > > + uint8_t ev_qid = ev->queue_id; > > + uint16_t dpcon_id = priv->evq_info[ev_qid].dpcon->dpcon_id; 
> > + int ret; > > + > > + EVENTDEV_INIT_FUNC_TRACE(); > > + > > + if (rx_queue_id == -1) > > + return dpaa2_eventdev_crypto_queue_add_all(dev, > > + cryptodev, ev); > > + > > + ret = dpaa2_sec_eventq_attach(cryptodev, rx_queue_id, > > + dpcon_id, ev); > > + if (ret) { > > + DPAA2_EVENTDEV_ERR( > > + "dpaa2_sec_eventq_att
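The attach-all-or-roll-back pattern used by the quoted dpaa2_eventdev_crypto_queue_add_all() can be shown in isolation. In this sketch the callbacks, the context structure and the `toy_` helpers are hypothetical stand-ins for the driver's attach and detach calls.

```c
/* Hypothetical attach/detach callbacks standing in for the driver calls. */
typedef int (*sketch_attach_t)(void *ctx, int qp_id);
typedef void (*sketch_detach_t)(void *ctx, int qp_id);

/*
 * Attach queue pairs 0..nb_qp-1. On the first failure, detach the ones
 * already attached (in reverse order) and return the error code.
 */
static int
sketch_attach_all(void *ctx, int nb_qp,
		  sketch_attach_t attach, sketch_detach_t detach)
{
	int i, ret = 0;

	for (i = 0; i < nb_qp; i++) {
		ret = attach(ctx, i);
		if (ret != 0)
			goto fail;
	}
	return 0;
fail:
	for (i = i - 1; i >= 0; i--)
		detach(ctx, i);
	return ret;
}

/* Toy context and callbacks, only there to exercise the sketch. */
struct sketch_ctx {
	int fail_at;   /* queue pair index that refuses to attach, or -1 */
	int attached;  /* number of queue pairs currently attached */
};

static int
toy_attach(void *c, int qp_id)
{
	struct sketch_ctx *ctx = c;

	if (qp_id == ctx->fail_at)
		return -1;
	ctx->attached++;
	return 0;
}

static void
toy_detach(void *c, int qp_id)
{
	struct sketch_ctx *ctx = c;

	(void)qp_id;
	ctx->attached--;
}
```

The rollback loop starts at `i - 1` because the queue pair at index `i` is the one that failed and was never attached.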
Re: [dpdk-dev] [PATCH 4/4] ethdev: add Tx offload outer L4 checksum definitions
-Original Message- > Date: Mon, 1 Oct 2018 14:45:39 +0100 > From: Ferruh Yigit > To: Jerin Jacob , Wenzhuo Lu > , Jingjing Wu , Bernard > Iremonger , John McNamara > , Marko Kovacevic , > Thomas Monjalon , Andrew Rybchenko > , Olivier Matz > CC: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH 4/4] ethdev: add Tx offload outer L4 > checksum definitions > User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 > Thunderbird/52.9.1 > > On 9/13/2018 2:47 PM, Jerin Jacob wrote: > > Introduced DEV_TX_OFFLOAD_OUTER_UDP_CKSUM, DEV_TX_OFFLOAD_OUTER_TCP_CKSUM > > and DEV_TX_OFFLOAD_OUTER_SCTP_CKSUM offload flags and > > > > PKT_TX_OUTER_L4_NO_CKSUM, PKT_TX_OUTER_TCP_CKSUM, PKT_TX_OUTER_SCTP_CKSUM > > and PKT_TX_OUTER_UDP_CKSUM mbuf ol_flags to enable Tx outer L4 checksum > > offload. > > > > To use hardware Tx outer L4 checksum offload, the user needs to. > > # enable following in mbuff: > > - fill outer_l2_len and outer_l3_len in mbuf > > - set the flags PKT_TX_OUTER_TCP_CKSUM, PKT_TX_OUTER_SCTP_CKSUM or > > PKT_TX_OUTER_UDP_CKSUM > > - set the flag PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6 > > > > # configure DEV_TX_OFFLOAD_OUTER_* offload flags in slow path. 
> > > > Signed-off-by: Jerin Jacob > > --- > > app/test-pmd/config.c | 27 +++ > > doc/guides/nics/features.rst | 6 ++ > > lib/librte_ethdev/rte_ethdev.c | 3 +++ > > lib/librte_ethdev/rte_ethdev.h | 6 ++ > > lib/librte_mbuf/rte_mbuf.c | 5 + > > lib/librte_mbuf/rte_mbuf.h | 23 ++- > > 6 files changed, 69 insertions(+), 1 deletion(-) > > > > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c > > index 92a177e29..85f832bf0 100644 > > --- a/app/test-pmd/config.c > > +++ b/app/test-pmd/config.c > > @@ -773,6 +773,33 @@ port_offload_cap_display(portid_t port_id) > > else > > printf("off\n"); > > } > > + > > + if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_OUTER_UDP_CKSUM) { > > + printf("TX Outer UDP checksum: "); > > + if (ports[port_id].dev_conf.txmode.offloads & > > + DEV_TX_OFFLOAD_OUTER_UDP_CKSUM) > > + printf("on\n"); > > + else > > + printf("off\n"); > > + } > > + > > + if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_OUTER_TCP_CKSUM) { > > + printf("TX Outer TCP checksum: "); > > + if (ports[port_id].dev_conf.txmode.offloads & > > + DEV_TX_OFFLOAD_OUTER_TCP_CKSUM) > > + printf("on\n"); > > + else > > + printf("off\n"); > > + } > > + > > + if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_OUTER_SCTP_CKSUM) { > > + printf("TX Outer SCTP checksum: "); > > + if (ports[port_id].dev_conf.txmode.offloads & > > + DEV_TX_OFFLOAD_OUTER_SCTP_CKSUM) > > + printf("on\n"); > > + else > > + printf("off\n"); > > + } > > } > > There is also "csum show", "csum set" functions, can you please check that > too? + Shahaf I checked the details. It is for "csumonly.c" forward engine to select various Tx checksum in HW or SW(provide fallback SW implementation) for testing purpose. If I need to implement this support for this release, I will reduce the scope to DEV_TX_OFFLOAD_OUTER_UDP_CKSUM and DEV_RX_OFFLOAD_OUTER_UDP_CKSUM. 
Since there are no real-world non-encrypted TCP/SCTP-based tunnel protocols (based on the http://patches.dpdk.org/patch/44692/ discussions), I will limit the offload definition to DEV_?X_OFFLOAD_OUTER_UDP_CKSUM and the associated test code in the "csumonly.c" forward engine in v2. Thoughts? I will split 1/4 and 2/4 into a separate patch series and rework 3/4 and 4/4 as another series to make forward progress. > And I am not sure why those functions seems only concerned about Tx csum > offloads. It does check for errors in Rx checksum too. See rx_bad_ip_csum, rx_bad_l4_csum
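The per-packet steps listed in the quoted commit message (fill the outer header lengths, set the outer checksum flag, set the outer IP version flag) can be sketched for the UDP case as below. The structure is a hypothetical subset of the mbuf fields and the flag bits are illustrative placeholders, not the real PKT_TX_* values.

```c
#include <stdint.h>

/* Illustrative flag bits; the real PKT_TX_* values are defined in rte_mbuf.h. */
#define SKETCH_TX_OUTER_UDP_CKSUM (1ULL << 0)
#define SKETCH_TX_OUTER_IPV4      (1ULL << 1)

/* Hypothetical subset of the mbuf fields involved. */
struct sketch_mbuf {
	uint64_t ol_flags;
	uint16_t outer_l2_len;
	uint16_t outer_l3_len;
};

/*
 * Request outer UDP checksum offload for an IPv4 tunnel packet: record the
 * outer L2/L3 header lengths, then set the checksum and IP version flags.
 */
static void
sketch_request_outer_udp_cksum(struct sketch_mbuf *m,
			       uint16_t outer_l2_len, uint16_t outer_l3_len)
{
	m->outer_l2_len = outer_l2_len;
	m->outer_l3_len = outer_l3_len;
	m->ol_flags |= SKETCH_TX_OUTER_UDP_CKSUM | SKETCH_TX_OUTER_IPV4;
}
```

For an untagged Ethernet frame with a 20-byte outer IPv4 header, the call would be `sketch_request_outer_udp_cksum(&m, 14, 20)`. The port-level DEV_TX_OFFLOAD_OUTER_* configuration mentioned in the commit message is a separate, slow-path step that this sketch does not cover.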
Re: [dpdk-dev] [PATCH v11 0/7] hot-unplug failure handle mechanism
hi, stephen Thanks for your review, my answer as below. On 10/1/2018 5:00 PM, Stephen Hemminger wrote: On Sun, 30 Sep 2018 19:29:56 +0800 Jeff Guo wrote: Hotplug is an important feature for use-cases like the datacenter device's fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher flexibility and continuality to networking services in multiple use-cases in the industry. So let's see how DPDK can help users implement hotplug solutions. We already have a general device-event monitor mechanism, failsafe driver, and hot plug/unplug API in DPDK. We have already got the solution of “ethdev event + kernel PMD hotplug handler + failsafe”, but we still not got “eal event + hotplug handler for pci PMD + failsafe” implement, and we need to considerate 2 different solutions between uio pci and vfio pci. In the case of hotplug for igb_uio, when a hardware device be removed physically or disabled in software, the application needs to be notified and detach the device out of the bus, and then make the device invalidate. The problem is that, the removal of the device is not instantaneous in software. If the application data path tries to read/write to the device when removal is still in process, it will cause an MMIO error and application will crash. In this patch set, we propose a PCIe bus failure handler mechanism for hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs, the application will not crash. The mechanism should work as below: First, the application enables the device event monitor, registers the hotplug event’s callback and enable hotplug handling before running the data path. Once the hot-unplug occurs, the mechanism will detect the removal event and then accordingly do the failure handling. In order to do that, the below functionality will be required: - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure. - Implement pci bus specific ops “pci_hot_unplug_handler”. 
For uio pci, it will remap memory for the unplugged device based on the failure address. For vfio pci, it could be implemented separately, case by case. For the data path or other unexpected behaviors from the control path when a hot unplug occurs: - Add a new bus ops “sigbus_handler”, which is responsible for handling the sigbus error, which is either an original memory error or a specific memory error caused by a hot unplug. When a sigbus error is captured, this function is called to handle it. - Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate over all devices on the PCI bus to find which device encountered the failure. - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a bus to handle the failure. - Add a couple of APIs “rte_dev_hotplug_handle_enable” and “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling. It will monitor the sigbus error with a per-process handler. Based on the signal event principle, the control path thread and the data path thread will randomly receive the sigbus error, but will call the common sigbus process. When a sigbus is captured, it will call the above API to find a bus to handle it. The mechanism could be used by applications or PMDs. For example, the whole process of hotplug in testpmd is: - Enable device event monitor->Enable hotplug handle->Register event callback ->attach port->start port->start forwarding->Device unplug->failure handle ->stop forwarding->stop port->close port->detach port. This patch set does not cover hotplug insert and binding, and it only implements the igb_uio failure handler; the vfio hotplug failure handler will come in a following patch set. patchset history: v11->v10: change the ops name, since both uio and vfio will use the hot-unplug ops. add experimental tag. since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage. 
move the igb_uio fixing part out, since it is a random issue and should be considered a kernel driver defect rather than part of this failure handler mechanism. v10->v9: modify the API name and expose it for public use. add hotplug handle enable/disable APIs refine commit log v9->v8: refine commit log to be more readable. v8->v7: refine errno process in sigbus handler. refine igb uio release process v7->v6: delete some unused part v6->v5: refine some description about bus ops refine commit log add some entry check. v5->v4: split patches to focus on the failure handle, remove the event usage by testpmd to another patch. change the hotplug failure handler name. refine the sigbus handle logic. add lock for udev state in igb uio driver. v4->v3: split patches to be small and clear. change to use new parameter "--hotplug-mode" in testpmd to identify the eal hotplug and ethdev hotplug. v3->v2: change bus ops name to bus_hotplug_handler. add new API and bus ops of bus_signal_handler d
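The bus iteration described in the cover letter (walk all buses until one claims the faulting address) can be sketched as below. This is not the proposed rte_bus_sigbus_handler() itself: the bus descriptor, the return-code convention (0 handled, 1 not ours, -1 error) and the `toy_` helpers are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical bus descriptor. The handler is assumed to return 0 if the
 * faulting address belongs to one of the bus's devices and was handled,
 * 1 if the address is not on this bus, and -1 on error.
 */
struct sketch_bus {
	const char *name;
	int (*sigbus_handler)(const void *failure_addr);
};

/* Ask each bus in turn; stop at the first one that handles or fails. */
static int
sketch_bus_sigbus_handler(const struct sketch_bus *buses, size_t nb_buses,
			  const void *failure_addr)
{
	size_t i;

	for (i = 0; i < nb_buses; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret <= 0)
			return ret; /* handled (0) or failed (-1) */
	}
	return 1; /* no bus claimed the address: a generic SIGBUS */
}

/* Toy "device memory" and handler, only there to exercise the sketch. */
static char toy_bar[64];

static int
toy_pci_sigbus(const void *failure_addr)
{
	uintptr_t a = (uintptr_t)failure_addr;
	uintptr_t lo = (uintptr_t)toy_bar;

	return (a >= lo && a < lo + sizeof(toy_bar)) ? 0 : 1;
}
```

A result of 1 is the case where the signal was not caused by a hot-unplug and would presumably be passed on to whatever SIGBUS handling the process had before.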
Re: [dpdk-dev] [PATCH] doc: update the doc for adding EAL option
Hi Eric, Ferruh has already mentioned that this should be part of the patch adding the --iova-mode flag, not separate (or at the very least be in the same patchset!). In addition, the commit headline is very vague. Suggested rewording: doc: document --iova-mode EAL flag On 01-Oct-18 4:54 PM, eric zhang wrote: This patch updates Programmer's Guide and EAL parameter guides to show EAL option "--iova-mode" support. Signed-off-by: eric zhang --- doc/guides/prog_guide/env_abstraction_layer.rst | 8 doc/guides/testpmd_app_ug/run_app.rst | 4 2 files changed, 12 insertions(+) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index d362c92..a47fb38 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -321,6 +321,14 @@ Misc Functions Locks and atomic operations are per-architecture (i686 and x86_64). +IOVA Mode Configuration +~~~ + +Auto detection of the IOVA mode, based on probing the PCI bus and IOMMU configuration, may not report +the desired addressing mode when virtual devices that are not directly attached to the PCI bus are present. +To facilitate forcing the IOVA mode to a specific value the EAL command line option ``--iova-mode=mode`` can +be used to select either physical addressing('pa') or virtual addressing('va'). Presumably this isn't only applicable to PCI bus, but can be any bus, correct? + Memory Segments and Memory Zones (memzone) -- diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst index f301c2b..be2911c 100644 --- a/doc/guides/testpmd_app_ug/run_app.rst +++ b/doc/guides/testpmd_app_ug/run_app.rst @@ -133,6 +133,10 @@ See the DPDK Getting Started Guides for more information on these options. 
I wanted to ask why you are adding this to the testpmd user guide, as this is an EAL parameter, not a testpmd parameter, but as far as I can tell, there isn't a central location where we document all EAL flags. 
+Thomas, John This looks like a gap in our documentation. There should be a place where we can describe all EAL parameters. Since they can be OS-specific, it probably should be somewhere under Linux/FreeBSD GSG. Thoughts? Use malloc instead of hugetlbfs. +* ``--iova-mode=mode`` Current style is to list all valid values, like this: ``--iova-mode `` + +Force IOVA mode to a specific value. Valid values are 'pa' or 'va'. + Testpmd Command-line Options -- Thanks, Anatoly
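As a small illustration of the two values the documented option accepts, the sketch below maps the argument of `--iova-mode` to a mode. The enum and function are hypothetical and are not the EAL's own option parser.

```c
#include <string.h>

/* Hypothetical mode values; "don't care" stands for auto-detection. */
enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,
	SKETCH_IOVA_PA,
	SKETCH_IOVA_VA
};

/* Returns 0 and sets *mode for "pa" or "va", -1 for anything else. */
static int
sketch_parse_iova_mode(const char *arg, enum sketch_iova_mode *mode)
{
	if (arg == NULL)
		return -1;
	if (strcmp(arg, "pa") == 0)
		*mode = SKETCH_IOVA_PA;
	else if (strcmp(arg, "va") == 0)
		*mode = SKETCH_IOVA_VA;
	else
		return -1;
	return 0;
}
```

Rejecting unknown strings instead of silently falling back to auto-detection keeps a mistyped value from being mistaken for a forced mode.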
[dpdk-dev] [PATCH] eal: remove experimental from hotplug add/remove
rte_eal_hotplug_add() & rte_eal_hotplug_remove() APIs first added on v17.08 as experimental Commit a3ee360f4440 ("eal: add hotplug add/remove device") When __rte_experimental tag created, APIs tagged with it on v18.02 Commit 77b7b81e32e9 ("add experimental tag to appropriate functions") After rte_eth_dev_attach() & rte_eth_dev_detach() APIs has been deprecated in v18.08 eal APIs are only ones for hotplug operations Commit 9f2be5b3db8b ("ethdev: deprecate attach and detach functions") These APIs are around for a few releases now and without an alternative, removing the experimental tag from them. Signed-off-by: Ferruh Yigit --- Cc: Ian Stokes CC: arybche...@solarflare.com --- drivers/raw/ifpga_rawdev/Makefile | 1 - drivers/raw/ifpga_rawdev/meson.build| 2 -- lib/librte_eal/common/eal_common_dev.c | 7 --- lib/librte_eal/common/include/rte_dev.h | 15 +-- lib/librte_eal/rte_eal_version.map | 4 ++-- 5 files changed, 11 insertions(+), 18 deletions(-) diff --git a/drivers/raw/ifpga_rawdev/Makefile b/drivers/raw/ifpga_rawdev/Makefile index f3b9d5e61..c534f7f08 100644 --- a/drivers/raw/ifpga_rawdev/Makefile +++ b/drivers/raw/ifpga_rawdev/Makefile @@ -8,7 +8,6 @@ include $(RTE_SDK)/mk/rte.vars.mk # LIB = librte_pmd_ifpga_rawdev.a -CFLAGS += -DALLOW_EXPERIMENTAL_API CFLAGS += -O3 CFLAGS += $(WERROR_FLAGS) CFLAGS += -I$(RTE_SDK)/drivers/bus/ifpga diff --git a/drivers/raw/ifpga_rawdev/meson.build b/drivers/raw/ifpga_rawdev/meson.build index 67256872d..37896afba 100644 --- a/drivers/raw/ifpga_rawdev/meson.build +++ b/drivers/raw/ifpga_rawdev/meson.build @@ -11,5 +11,3 @@ deps += ['rawdev', 'pci', 'bus_pci', 'kvargs', sources = files('ifpga_rawdev.c') includes += include_directories('base') - -allow_experimental_apis = true diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index 678dbcac7..ab3170ebc 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ b/lib/librte_eal/common/eal_common_dev.c @@ -127,8 +127,9 @@ int 
rte_eal_dev_detach(struct rte_device *dev) return ret; } -int __rte_experimental rte_eal_hotplug_add(const char *busname, const char *devname, - const char *devargs) +int +rte_eal_hotplug_add(const char *busname, const char *devname, + const char *devargs) { struct rte_bus *bus; struct rte_device *dev; @@ -193,7 +194,7 @@ int __rte_experimental rte_eal_hotplug_add(const char *busname, const char *devn return ret; } -int __rte_experimental +int rte_eal_hotplug_remove(const char *busname, const char *devname) { struct rte_bus *bus; diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h index b80a80598..2db506987 100644 --- a/lib/librte_eal/common/include/rte_dev.h +++ b/lib/librte_eal/common/include/rte_dev.h @@ -189,9 +189,6 @@ __rte_deprecated int rte_eal_dev_detach(struct rte_device *dev); /** - * @warning - * @b EXPERIMENTAL: this API may change without prior notice - * * Hotplug add a given device to a specific bus. * * @param busname @@ -204,13 +201,11 @@ int rte_eal_dev_detach(struct rte_device *dev); * @return * 0 on success, negative on error. */ -int __rte_experimental rte_eal_hotplug_add(const char *busname, const char *devname, - const char *devargs); +int +rte_eal_hotplug_add(const char *busname, const char *devname, + const char *devargs); /** - * @warning - * @b EXPERIMENTAL: this API may change without prior notice - * * Hotplug remove a given device from a specific bus. * * @param busname @@ -220,8 +215,8 @@ int __rte_experimental rte_eal_hotplug_add(const char *busname, const char *devn * @return * 0 on success, negative on error. */ -int __rte_experimental rte_eal_hotplug_remove(const char *busname, - const char *devname); +int +rte_eal_hotplug_remove(const char *busname, const char *devname); /** * Device comparison function. 
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 73282bbb0..bd6ba15e3 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -265,6 +265,8 @@ DPDK_18.08 { DPDK_18.11 { global: + rte_eal_hotplug_add; + rte_eal_hotplug_remove; rte_strscpy; } DPDK_18.08; @@ -292,8 +294,6 @@ EXPERIMENTAL { rte_devargs_remove; rte_devargs_type_count; rte_eal_cleanup; - rte_eal_hotplug_add; - rte_eal_hotplug_remove; rte_fbarray_attach; rte_fbarray_destroy; rte_fbarray_detach; -- 2.17.1
Re: [dpdk-dev] [PATCH] doc: update the doc for adding EAL option
02/10/2018 11:59, Burakov, Anatoly: > This looks like a gap in our documentation. There should be a place > where we can describe all EAL parameters. Since they can be OS-specific, > it probably should be somewhere under Linux/FreeBSD GSG. Thoughts? I agree
Re: [dpdk-dev] [PATCH] mbuf: clarify QINQ flag usage
On 10/2/2018 10:44 AM, Andrew Rybchenko wrote: > On 10/2/18 1:17 PM, Ferruh Yigit wrote: >> Update implementation that when PKT_RX_QINQ_STRIPPED mbuf ol_flags >> set by PMD, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED & PKT_RX_VLAN >> should be also set. >> >> Clarify mbuf documentations that when PKT_RX_QINQ set PKT_RX_VLAN also >> should be set. >> >> So that appllication can rely on PKT_RX_QINQ flag to access both >> mbuf.vlan_tci & mbuf.vlan_tci_outer >> >> Signed-off-by: Ferruh Yigit >> --- >> Cc: Hyong Youb Kim >> Cc: John Daley >> --- >> app/test-pmd/rxonly.c| 2 +- >> doc/guides/nics/features.rst | 7 --- >> drivers/net/i40e/i40e_rxtx.c | 3 ++- >> lib/librte_mbuf/rte_mbuf.c | 1 + >> lib/librte_mbuf/rte_mbuf.h | 5 +++-- >> 5 files changed, 11 insertions(+), 7 deletions(-) >> >> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c >> index a93d80612..08a5fc2cf 100644 >> --- a/app/test-pmd/rxonly.c >> +++ b/app/test-pmd/rxonly.c >> @@ -132,7 +132,7 @@ pkt_burst_receive(struct fwd_stream *fs) >> printf(" - timestamp %"PRIu64" ", mb->timestamp); >> if (ol_flags & PKT_RX_VLAN_STRIPPED) > > It looks like it should be PKT_RX_VLAN above. There is already a patch from Hyong for it which triggered this patch: https://patches.dpdk.org/patch/45350/ > >> printf(" - VLAN tci=0x%x", mb->vlan_tci); >> -if (ol_flags & PKT_RX_QINQ_STRIPPED) >> +if (ol_flags & PKT_RX_QINQ) >> printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x", >> mb->vlan_tci, mb->vlan_tci_outer); > > The first one duplicates above printout, so it should be either put before > PKT_RX_VLAN check and do PKT_RX_VLAN in else branch, or simply removed > from here. Right, let me check it.
Re: [dpdk-dev] [PATCH v11 0/7] hot-unplug failure handle mechanism
Hi Jerin,

Thanks for your comment; my reply is below.

On 10/1/2018 5:55 PM, Jerin Jacob wrote:

-Original Message-
Date: Mon, 1 Oct 2018 11:00:12 +0200
From: Stephen Hemminger
To: Jeff Guo
Cc: bruce.richard...@intel.com, ferruh.yi...@intel.com, konstantin.anan...@intel.com, gaetan.ri...@6wind.com, jingjing...@intel.com, tho...@monjalon.net, mo...@mellanox.com, ma...@mellanox.com, harry.van.haa...@intel.com, qi.z.zh...@intel.com, shaopeng...@intel.com, bernard.iremon...@intel.com, arybche...@solarflare.com, wenzhuo...@intel.com, anatoly.bura...@intel.com, jblu...@infradead.org, shreyansh.j...@nxp.com, dev@dpdk.org, helin.zh...@intel.com
Subject: Re: [dpdk-dev] [PATCH v11 0/7] hot-unplug failure handle mechanism

On Sun, 30 Sep 2018 19:29:56 +0800 Jeff Guo wrote:

Hotplug is an important feature for use cases such as datacenter device fail-safe and SR-IOV live migration in SDN/NFV. It brings more flexibility and continuity to networking services across many use cases in the industry. So let's see how DPDK can help users implement hotplug solutions.

DPDK already has a general device-event monitor mechanism, a failsafe driver, and a hot plug/unplug API. The “ethdev event + kernel PMD hotplug handler + failsafe” solution already exists, but “eal event + hotplug handler for pci PMD + failsafe” is not implemented yet, and uio pci and vfio pci need two different solutions.

In the case of hotplug for igb_uio, when a hardware device is physically removed or disabled in software, the application needs to be notified, detach the device from the bus, and then invalidate the device. The problem is that the removal of the device is not instantaneous in software. If the application data path tries to read from or write to the device while the removal is still in progress, it causes an MMIO error and the application crashes. In this patch set, we propose a PCIe bus failure handler mechanism for hot-unplug in igb_uio.
It aims to guarantee that the application will not crash when a hot-unplug occurs.

The mechanism should work as follows: first, the application enables the device event monitor, registers the hotplug event’s callback and enables hotplug handling before running the data path. Once a hot-unplug occurs, the mechanism detects the removal event and does the failure handling accordingly. To do that, the following functionality is required:

- Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
- Implement the pci bus specific ops “pci_hot_unplug_handler”. For uio pci, it remaps memory for the unplugged device based on the failure address. For vfio pci, it can be implemented separately, case by case.

For the data path, or for other unexpected behavior from the control path, when a hot unplug occurs:

- Add a new bus ops “sigbus_handler”, which is responsible for handling the sigbus error. That error is either an original memory error or a specific memory error caused by a hot unplug. When a sigbus error is captured, this function is called to handle it.
- Implement the PCI bus specific ops “pci_sigbus_handler”. It iterates over all devices on the PCI bus to find which device encountered the failure.
- Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a bus to handle the failure.
- Add a pair of APIs, “rte_dev_hotplug_handle_enable” and “rte_dev_hotplug_handle_disable”, to enable/disable hotplug handling. The sigbus error is monitored by a per-process handler. Because of the signal event principle, either the control path thread or the data path thread may receive the sigbus error, but both call the common sigbus handling. When a sigbus is captured, it calls the above API to find the bus to handle it.

The mechanism can be used by applications or PMDs.
For example, the whole process of hotplug in testpmd is:

- Enable device event monitor -> enable hotplug handle -> register event callback -> attach port -> start port -> start forwarding -> device unplug -> failure handle -> stop forwarding -> stop port -> close port -> detach port.

This patch set does not cover hotplug insert and binding; it only implements the igb_uio failure handler. The vfio hotplug failure handler will come in a following patch set.

patchset history:
v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
add experimental tag.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move the igb_uio fixing part out, since it is a random issue and should be considered a kernel driver defect rather than part of this failure handler mechanism.

v10->v9:
modify the api name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process
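The bus-level sigbus dispatch described in the cover letter can be sketched roughly as below. This is only an illustration under stated assumptions: every `sketch_*` name, the return convention (0 = handled, 1 = address not owned) and the flat device array are hypothetical and are not taken from the patch set, and a real handler would remap the device memory rather than set a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device record: one MMIO window [base, base + len). */
struct sketch_device {
	uintptr_t base;
	size_t len;
	int remapped;	/* set to 1 once the failure handler has run */
};

struct sketch_bus;

/* Assumed return convention: 0 = handled, 1 = address not on this bus. */
typedef int (*sketch_sigbus_op)(struct sketch_bus *bus, uintptr_t addr);

struct sketch_bus {
	const char *name;
	struct sketch_device *devs;
	size_t nb_devs;
	sketch_sigbus_op sigbus_handler;	/* NULL if the bus has no handler */
};

/* Bus-specific op: find the device whose window contains the address. */
static int
sketch_pci_sigbus_handler(struct sketch_bus *bus, uintptr_t addr)
{
	size_t i;

	for (i = 0; i < bus->nb_devs; i++) {
		struct sketch_device *dev = &bus->devs[i];

		if (addr >= dev->base && addr - dev->base < dev->len) {
			/* A real handler would remap the window here. */
			dev->remapped = 1;
			return 0;
		}
	}
	return 1;
}

/* Generic entry point: try each bus until one claims the address. */
static int
sketch_bus_sigbus_handler(struct sketch_bus *buses, size_t nb_buses,
			  uintptr_t addr)
{
	size_t i;

	for (i = 0; i < nb_buses; i++) {
		if (buses[i].sigbus_handler == NULL)
			continue;
		if (buses[i].sigbus_handler(&buses[i], addr) == 0)
			return 0;
	}
	/* No bus owns the address: treat it as an original memory error. */
	return 1;
}
```

In this sketch a per-process SIGBUS handler would pass the faulting address to `sketch_bus_sigbus_handler()` and fall back to the previously installed handler when it returns 1.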
Re: [dpdk-dev] [PATCH] eal: remove experimental from hotplug add/remove
On 10/2/18 2:04 PM, Ferruh Yigit wrote:
> rte_eal_hotplug_add() & rte_eal_hotplug_remove() APIs first added on
> v17.08 as experimental
> Commit a3ee360f4440 ("eal: add hotplug add/remove device")
>
> When __rte_experimental tag created, APIs tagged with it on v18.02
> Commit 77b7b81e32e9 ("add experimental tag to appropriate functions")
>
> After rte_eth_dev_attach() & rte_eth_dev_detach() APIs has been
> deprecated in v18.08 eal APIs are only ones for hotplug operations
> Commit 9f2be5b3db8b ("ethdev: deprecate attach and detach functions")
>
> These APIs are around for a few releases now and without an alternative,
> removing the experimental tag from them.
>
> Signed-off-by: Ferruh Yigit

Dup of http://patches.dpdk.org/patch/45791/ ?
Re: [dpdk-dev] [PATCH v8 1/4] lib/librte_power: traffic pattern aware power control
Hi Dave, Please check comment below. On 28 Sep 11:47, Hunt, David wrote: > Hi Liang, > > > On 17/9/2018 2:30 PM, Liang Ma wrote: > >1. Abstract > > > >For packet processing workloads such as DPDK polling is continuous. > >This means CPU cores always show 100% busy independent of how much work > >those cores are doing. It is critical to accurately determine how busy > >a core is hugely important for the following reasons: > > > >* No indication of overload conditions > > > >* User do not know how much real load is on a system meaning resulted > >in > > wasted energy as no power management is utilized > > > >Compared to the original l3fwd-power design, instead of going to sleep > >after detecting an empty poll, the new mechanism just lowers the core > >frequency. As a result, the application does not stop polling the device, > >which leads to improved handling of bursts of traffic. > > > >When the system become busy, the empty poll mechanism can also increase the > >core frequency (including turbo) to do best effort for intensive traffic. > >This gives us more flexible and balanced traffic awareness over the > >standard l3fwd-power application. > > > >2. Proposed solution > > > >The proposed solution focuses on how many times empty polls are executed. > >The less the number of empty polls, means current core is busy with > >processing workload, therefore, the higher frequency is needed. The high > >empty poll number indicates the current core not doing any real work > >therefore, we can lower the frequency to safe power. > > > >In the current implementation, each core has 1 empty-poll counter which > >assume 1 core is dedicated to 1 queue. This will need to be expanded in the > >future to support multiple queues per core. > > > >2.1 Power state definition: > > > > LOW: Not currently used, reserved for future use. > > > > MED: the frequency is used to process modest traffic workload. > > > > HIGH: the frequency is used to process busy traffic workload. 
> > > >2.2 There are two phases to establish the power management system: > > > > a.Initialization/Training phase. The training phase is necessary > > in order to figure out the system polling baseline numbers from > > idle to busy. The highest poll count will be during idle, where > > all polls are empty. These poll counts will be different between > > systems due to the many possible processor micro-arch, cache > > and device configurations, hence the training phase. > > In the training phase, traffic is blocked so the training > > algorithm can average the empty-poll numbers for the LOW, MED and > > HIGH power states in order to create a baseline. > > The core's counter are collected every 10ms, and the Training > > phase will take 2 seconds. > > Training is disabled as default configuration. the default > > parameter is applied. Simple App still can trigger training > > Typo: "Simple" should be "Sample" > > Suggest adding: Once the training phase has been executed once on a > system, the application > can then be started with the relevant thresholds provided on the command > line, allowing the > application to start passing start traffic immediately. agree > > > if that's needed. > > > > b.Normal phase. When the training phase is complete, traffic is > > started. The run-time poll counts are compared with the > > baseline and the decision will be taken to move to MED power > > state or HIGH power state. The counters are calculated every 10ms. > > Propose changing the first sentence: Traffic starts immediately based > on the default > thresholds, or based on the user supplied thresholds via the command > line parameters. > agree > > > > >3. Proposed API > > > >1. rte_power_empty_poll_stat_init(struct ep_params **eptr, > > uint8_t *freq_tlb, struct ep_policy *policy); > >which is used to initialize the power management system. > > > >2. rte_power_empty_poll_stat_free(void); > >which is used to free the resource hold by power management system. > > > >3. 
rte_power_empty_poll_stat_update(unsigned int lcore_id); > >which is used to update specific core empty poll counter, not thread safe > > > >4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt); > >which is used to update specific core valid poll counter, not thread safe > > > >5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id); > >which is used to get specific core empty poll counter. > > > >6. rte_power_poll_stat_fetch(unsigned int lcore_id); > >which is used to get specific core valid poll counter. > > > >7. rte_empty_poll_detection(struct rte_timer *tim, void *arg); > >which is used to detect empty poll state changes then take action. > > > >ChangeLog: > >v2: fix some coding style issues. > >v3: rename the filename, API name. > >v4: no change. > >v5: no change. > >v6: re-work the code layout, update
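As a rough illustration of the per-core counters behind the update/fetch calls listed above, the sketch below keeps one empty-poll and one valid-poll counter per lcore. It is not the librte_power implementation: all `sketch_*` names, the fixed `SKETCH_MAX_LCORE` bound and the return values are assumptions made for the example, and like the proposed API the update functions are not thread safe.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_LCORE 128	/* assumed bound, for the sketch only */

/* One pair of counters per core, matching the 1 core : 1 queue model. */
struct sketch_poll_stats {
	uint64_t empty_polls;	/* polls that returned no packets */
	uint64_t valid_polls;	/* polls that returned at least one packet */
};

static struct sketch_poll_stats sketch_stats[SKETCH_MAX_LCORE];

/* Count one empty poll on the given core. */
static int
sketch_empty_poll_update(unsigned int lcore_id)
{
	if (lcore_id >= SKETCH_MAX_LCORE)
		return -1;
	sketch_stats[lcore_id].empty_polls++;
	return 0;
}

/* Count one poll that returned nb_pkt (> 0) packets. */
static int
sketch_poll_update(unsigned int lcore_id, uint8_t nb_pkt)
{
	if (lcore_id >= SKETCH_MAX_LCORE || nb_pkt == 0)
		return -1;
	sketch_stats[lcore_id].valid_polls++;
	return 0;
}

/* What a forwarding loop would call after each RX burst. */
static void
sketch_account_burst(unsigned int lcore_id, uint16_t nb_rx)
{
	if (nb_rx == 0)
		sketch_empty_poll_update(lcore_id);
	else
		sketch_poll_update(lcore_id,
				   (uint8_t)(nb_rx > UINT8_MAX ? UINT8_MAX : nb_rx));
}

static uint64_t
sketch_empty_poll_fetch(unsigned int lcore_id)
{
	return lcore_id < SKETCH_MAX_LCORE ?
		sketch_stats[lcore_id].empty_polls : 0;
}

static uint64_t
sketch_poll_fetch(unsigned int lcore_id)
{
	return lcore_id < SKETCH_MAX_LCORE ?
		sketch_stats[lcore_id].valid_polls : 0;
}
```

A periodic timer callback would then read the counters with the fetch functions and compare them against the trained or user-supplied thresholds.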
Re: [dpdk-dev] [PATCH v8 2/4] examples/l3fwd-power: simple app update for new API
On 28 Sep 12:19, Hunt, David wrote: > Hi Liang, > > A few tweaks below: > > > On 17/9/2018 2:30 PM, Liang Ma wrote: > >Add the support for new traffic pattern aware power control > >power management API. > > > >Example: > >./l3fwd-power -l xxx -n 4 -w :xx:00.0 -w :xx:00.1 -- -p 0x3 > >-P --config="(0,0,xx),(1,0,xx)" --empty-poll="0,0,0" -l 14 -m 9 -h 1 > > > >Please Reference l3fwd-power document for all parameter except > >empty-poll. > > The docs should probably include empty poll parameter. Suggest > re-wording to > > Please Reference l3fwd-power document for full parameter usage > agree > > > >The option "l", "m", "h" are used to set the power index for > >LOW, MED, HIGH power state. only is useful after enable empty-poll > > > >--empty-poll="training_flag, med_threshold, high_threshold" > > > >The option training_flag is used to enable/disable training mode. > > > >The option med_threshold is used to indicate the empty poll threshold > >of modest state which is customized by user. > > > >The option high_threshold is used to indicate the empty poll threshold > >of busy state which is customized by user. > > > >Above three option default value is all 0. > > > >Once enable empty-poll. System will apply the default parameter. > >Training mode is disabled as default. > > Suggest: > > Once empty-poll is enabled, the system will apply the default parameters is > no > other command line options are provided. > agree > > > >If training mode is triggered, there should not has any traffic > >pass-through during training phase. > > Suggest: > If training mode is enabled, the user should ensure that no traffic > is allowed to pass through the system. > > >When training phase complete, system transfer to normal phase. > > When training phase complete, the application transfer to normal operation > > agree > > > > >System will running with modest power stat at beginning. > > System will start running with the modest power mode. 
> > > >If the system busyness percentage above 70%, then system will adjust > >power state move to High power state. If the traffic become lower(eg. The > >system busyness percentage drop below 30%), system will fallback > >to the modest power state. > > If the traffic goes above 70%, then system will move to High power state. > If the traffic drops below 30%, the system will fallback to the modest > power state. > > > >Example code use master thread to monitoring worker thread busyness. > >the default timer resolution is 10ms. > > > >ChangeLog: > >v2 fix some coding style issues > >v3 rename the API. > >v6 re-work the API. > >v7 no change. > >v8 disable training as default option. > > > >Signed-off-by: Liang Ma > > > >Reviewed-by: Lei Yao > >--- > > examples/l3fwd-power/Makefile| 3 + > > examples/l3fwd-power/main.c | 325 > > +-- > > examples/l3fwd-power/meson.build | 1 + > > 3 files changed, 312 insertions(+), 17 deletions(-) > > > >diff --git a/examples/l3fwd-power/Makefile b/examples/l3fwd-power/Makefile > >index d7e39a3..772ec7b 100644 > >--- a/examples/l3fwd-power/Makefile > >+++ b/examples/l3fwd-power/Makefile > >@@ -23,6 +23,8 @@ CFLAGS += -O3 $(shell pkg-config --cflags libdpdk) > > LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk) > > LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk) > > > >+CFLAGS += -DALLOW_EXPERIMENTAL_API > >+ > > build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build > > $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED) > > > >@@ -54,6 +56,7 @@ please change the definition of the RTE_TARGET > >environment variable) > > all: > > else > > > >+CFLAGS += -DALLOW_EXPERIMENTAL_API > > CFLAGS += -O3 > > CFLAGS += $(WERROR_FLAGS) > > > >diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c > >index 68527d2..1465608 100644 > >--- a/examples/l3fwd-power/main.c > >+++ b/examples/l3fwd-power/main.c > >@@ -43,6 +43,7 @@ > > #include > > #include > > #include > >+#include > > > > #include 
"perf_core.h" > > #include "main.h" > >@@ -55,6 +56,8 @@ > > > > /* 100 ms interval */ > > #define TIMER_NUMBER_PER_SECOND 10 > >+/* (10ms) */ > >+#define INTERVALS_PER_SECOND 100 > > /* 10 us */ > > #define SCALING_PERIOD > > (100/TIMER_NUMBER_PER_SECOND) > > #define SCALING_DOWN_TIME_RATIO_THRESHOLD 0.25 > >@@ -117,6 +120,11 @@ > > */ > > #define RTE_TEST_RX_DESC_DEFAULT 1024 > > #define RTE_TEST_TX_DESC_DEFAULT 1024 > >+#define EMPTY_POLL_MED_THRESHOLD 35UL > >+#define EMPTY_POLL_HGH_THRESHOLD 58UL > > I'd suggest adding some explanation around these two numbers. > E.g. > /* > * These two thresholds were decided on by running the training > algorithm on > * a 2.5GHz Xeon. These defaults can be overridden by supplying > non-zero values > * for the med_threshold and high_threshold parameters on the command line. > */ > > > >+ > >+
Re: [dpdk-dev] [PATCH] eal: remove experimental from hotplug add/remove
On 10/2/2018 11:08 AM, Andrew Rybchenko wrote:
> On 10/2/18 2:04 PM, Ferruh Yigit wrote:
>> rte_eal_hotplug_add() & rte_eal_hotplug_remove() APIs first added on
>> v17.08 as experimental
>> Commit a3ee360f4440 ("eal: add hotplug add/remove device")
>>
>> When __rte_experimental tag created, APIs tagged with it on v18.02
>> Commit 77b7b81e32e9 ("add experimental tag to appropriate functions")
>>
>> After rte_eth_dev_attach() & rte_eth_dev_detach() APIs has been
>> deprecated in v18.08 eal APIs are only ones for hotplug operations
>> Commit 9f2be5b3db8b ("ethdev: deprecate attach and detach functions")
>>
>> These APIs are around for a few releases now and without an alternative,
>> removing the experimental tag from them.
>>
>> Signed-off-by: Ferruh Yigit
>
> Dup of http://patches.dpdk.org/patch/45791/ ?

Ahh, yes it is, I will mark this as Rejected, thanks.
Re: [dpdk-dev] [PATCH v3 3/4] eal: remove experimental flag of hotplug functions
On 9/28/2018 5:21 PM, Thomas Monjalon wrote:
> These functions are quite old and are the only available replacement
> for the deprecated attach/detach functions.
>
> Note: some new functions may (again) replace these hotplug functions,
> in future, with better parameters.
>
> Signed-off-by: Thomas Monjalon
> ---
> lib/librte_eal/common/eal_common_dev.c | 7 ---
> lib/librte_eal/common/include/rte_dev.h | 11 ++-
> lib/librte_eal/rte_eal_version.map | 4 ++--

"-DALLOW_EXPERIMENTAL_API" (or "allow_experimental_apis = true" for meson) can be removed from drivers/raw/ifpga_rawdev once these APIs are no longer experimental. For reference: https://patches.dpdk.org/patch/45836/

It is easy to know when to add "-DALLOW_EXPERIMENTAL_API", but hard to know when to remove it; some helper would be good there.
Re: [dpdk-dev] [PATCH v3 1/2] net/tap: change queue fd to be pointers to process private
Hi, what I'm really doing is simply do some private array for all the fd's that each process will allocate it separately which will allow that each process will be able to access the fd's for the queues in order not to overwrite the shared ones and it's working for me this way. Now coming to your comment I'm not sure I fully understand it but, what you are suggesting is to create an array which will be accessed by the process_id to store these fd's in it. As far as I know we don't have something as process_id in dpdk we only have the system process id which is not relevant for the array of fd's. Please correct me if I'm wrong, I think this way we'll be limiting the number of secondary processes to number of queues by tap. Meanwhile, in my solution we don't have such limitation. Kindest regards, Raslan Darawsheh -Original Message- From: Wiles, Keith Sent: Thursday, September 27, 2018 4:18 PM To: Raslan Darawsheh Cc: Thomas Monjalon ; dev@dpdk.org; Shahaf Shuler ; Ori Katz Subject: Re: [PATCH v3 1/2] net/tap: change queue fd to be pointers to process private > On Sep 27, 2018, at 6:19 AM, Raslan Darawsheh wrote: > > change the fds for the queues to be pointers and add new process > private structure and make the queue fds point to it. > > Signed-off-by: Raslan Darawsheh > --- > drivers/net/tap/rte_eth_tap.c | 63 > --- > drivers/net/tap/rte_eth_tap.h | 9 +-- > drivers/net/tap/tap_intr.c| 4 +-- > 3 files changed, 44 insertions(+), 32 deletions(-) > > diff --git a/drivers/net/tap/rte_eth_tap.c > b/drivers/net/tap/rte_eth_tap.c index ad5ae98..8cc4552 100644 > --- a/drivers/net/tap/rte_eth_tap.c > +++ b/drivers/net/tap/rte_eth_tap.c > @@ -64,6 +64,7 @@ > > static struct rte_vdev_driver pmd_tap_drv; static struct > rte_vdev_driver pmd_tun_drv; > +static struct pmd_process_private *process_private; Maybe I do not see some minor point for making fd a pointer to fd when we could have an array of process_private[RTE_PMD_TAP_MAX_QUEUES] instead of a pointer type here. 
Then we do not need to allocate the memory each PMD and they would still have a private copy. Remove the array of rx/tx fds in the structure. This way it appears we can remove the code below that is making fd a pointer to fd. It just seems overly complex to me at the cost of a few more bytes of memory. This would remove int fd; from the structure and add a pointer to the pid_process_private instead, which is private by default. Did I miss some detail here that makes my comment wrong? > > static const char *valid_arguments[] = { > ETH_TAP_IFACE_ARG, > @@ -331,7 +332,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, > uint16_t nb_pkts) > uint16_t data_off = rte_pktmbuf_headroom(mbuf); > int len; > > - len = readv(rxq->fd, *rxq->iovecs, > + len = readv(*rxq->fd, *rxq->iovecs, > 1 + > (rxq->rxmode->offloads & DEV_RX_OFFLOAD_SCATTER ? >rxq->nb_rx_desc : 1)); > @@ -595,7 +596,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs, > tap_tx_l4_cksum(l4_cksum, l4_phdr_cksum, l4_raw_cksum); > > /* copy the tx frame data */ > - n = writev(txq->fd, iovecs, j); > + n = writev(*txq->fd, iovecs, j); > if (n <= 0) > break; > (*num_packets)++; > @@ -976,13 +977,13 @@ tap_dev_close(struct rte_eth_dev *dev) > tap_flow_implicit_flush(internals, NULL); > > for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) { > - if (internals->rxq[i].fd != -1) { > - close(internals->rxq[i].fd); > - internals->rxq[i].fd = -1; > + if (*internals->rxq[i].fd != -1) { > + close(*internals->rxq[i].fd); > + *internals->rxq[i].fd = -1; > } > - if (internals->txq[i].fd != -1) { > - close(internals->txq[i].fd); > - internals->txq[i].fd = -1; > + if (*internals->txq[i].fd != -1) { > + close(*internals->txq[i].fd); > + *internals->txq[i].fd = -1; > } > } > > @@ -1007,9 +1008,9 @@ tap_rx_queue_release(void *queue) { > struct rx_queue *rxq = queue; > > - if (rxq && (rxq->fd > 0)) { > - close(rxq->fd); > - rxq->fd = -1; > + if (rxq && rxq->fd && (*rxq->fd > 0)) { > + close(*rxq->fd); > + *rxq->fd = -1; > 
rte_pktmbuf_free(rxq->pool); > rte_free(rxq->iovecs); > rxq->pool = NULL; > @@ -1022,9 +1023,9 @@ tap_tx_queue_release(void *queue) { > struct tx_queue *txq = queue; > > - if (txq && (txq->fd > 0)) { > - close(txq->fd); > - txq->fd = -1; > + if (txq && txq->fd &
[dpdk-dev] [PATCH v4 1/2] net/tap: change queue fd to be pointers to process private
change the fds for the queues to be pointers and add new process private structure and make the queue fds point to it. Signed-off-by: Raslan Darawsheh --- drivers/net/tap/rte_eth_tap.c | 63 --- drivers/net/tap/rte_eth_tap.h | 9 +-- drivers/net/tap/tap_intr.c| 4 +-- 3 files changed, 44 insertions(+), 32 deletions(-) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index ad5ae98..8cc4552 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -64,6 +64,7 @@ static struct rte_vdev_driver pmd_tap_drv; static struct rte_vdev_driver pmd_tun_drv; +static struct pmd_process_private *process_private; static const char *valid_arguments[] = { ETH_TAP_IFACE_ARG, @@ -331,7 +332,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) uint16_t data_off = rte_pktmbuf_headroom(mbuf); int len; - len = readv(rxq->fd, *rxq->iovecs, + len = readv(*rxq->fd, *rxq->iovecs, 1 + (rxq->rxmode->offloads & DEV_RX_OFFLOAD_SCATTER ? rxq->nb_rx_desc : 1)); @@ -595,7 +596,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs, tap_tx_l4_cksum(l4_cksum, l4_phdr_cksum, l4_raw_cksum); /* copy the tx frame data */ - n = writev(txq->fd, iovecs, j); + n = writev(*txq->fd, iovecs, j); if (n <= 0) break; (*num_packets)++; @@ -976,13 +977,13 @@ tap_dev_close(struct rte_eth_dev *dev) tap_flow_implicit_flush(internals, NULL); for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) { - if (internals->rxq[i].fd != -1) { - close(internals->rxq[i].fd); - internals->rxq[i].fd = -1; + if (*internals->rxq[i].fd != -1) { + close(*internals->rxq[i].fd); + *internals->rxq[i].fd = -1; } - if (internals->txq[i].fd != -1) { - close(internals->txq[i].fd); - internals->txq[i].fd = -1; + if (*internals->txq[i].fd != -1) { + close(*internals->txq[i].fd); + *internals->txq[i].fd = -1; } } @@ -1007,9 +1008,9 @@ tap_rx_queue_release(void *queue) { struct rx_queue *rxq = queue; - if (rxq && (rxq->fd > 0)) { - close(rxq->fd); - rxq->fd = -1; + if (rxq && rxq->fd && 
(*rxq->fd > 0)) { + close(*rxq->fd); + *rxq->fd = -1; rte_pktmbuf_free(rxq->pool); rte_free(rxq->iovecs); rxq->pool = NULL; @@ -1022,9 +1023,9 @@ tap_tx_queue_release(void *queue) { struct tx_queue *txq = queue; - if (txq && (txq->fd > 0)) { - close(txq->fd); - txq->fd = -1; + if (txq && txq->fd && (*txq->fd > 0)) { + close(*txq->fd); + *txq->fd = -1; } } @@ -1214,13 +1215,13 @@ tap_setup_queue(struct rte_eth_dev *dev, struct rte_gso_ctx *gso_ctx; if (is_rx) { - fd = &rx->fd; - other_fd = &tx->fd; + fd = rx->fd; + other_fd = tx->fd; dir = "rx"; gso_ctx = NULL; } else { - fd = &tx->fd; - other_fd = &rx->fd; + fd = tx->fd; + other_fd = rx->fd; dir = "tx"; gso_ctx = &tx->gso_ctx; } @@ -1331,7 +1332,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev, } TAP_LOG(DEBUG, " RX TUNTAP device name %s, qid %d on fd %d", - internals->name, rx_queue_id, internals->rxq[rx_queue_id].fd); + internals->name, rx_queue_id, *internals->rxq[rx_queue_id].fd); return 0; @@ -1371,7 +1372,7 @@ tap_tx_queue_setup(struct rte_eth_dev *dev, return -1; TAP_LOG(DEBUG, " TX TUNTAP device name %s, qid %d on fd %d csum %s", - internals->name, tx_queue_id, internals->txq[tx_queue_id].fd, + internals->name, tx_queue_id, *internals->txq[tx_queue_id].fd, txq->csum ? "on" : "off"); return 0; @@ -1633,6 +1634,10 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, char *tap_name, goto error_exit_nodev; } + process_private = (struct pmd_process_private *) + rte_zmalloc_socket(tap_name, sizeof(struct pmd_process_private), + RTE_CACHE_LINE_SIZE, dev->device->numa_node); + pmd = dev->data->dev_private; pmd->dev = dev; snprintf(pmd->name, sizeof(pmd->name), "%s", tap_name); @@ -1669,8 +1674,10 @@ eth_dev_tap_
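The indirection this patch introduces, where a queue holds a pointer to a file descriptor stored in a per-process structure instead of the descriptor itself, can be sketched as below. The structure layout, the `sketch_*` names and the queue count are assumptions for illustration only and are not copied from rte_eth_tap.h; the release helper mirrors the `rxq && rxq->fd && (*rxq->fd > 0)` check in the diff but does not call close().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUES 16	/* assumed value, for the sketch only */

/* Per-process storage: each process keeps its own copy of the fds. */
struct sketch_process_private {
	int rxq_fds[SKETCH_MAX_QUEUES];
	int txq_fds[SKETCH_MAX_QUEUES];
};

/* The queue no longer owns an fd, it points at the process-private slot. */
struct sketch_rx_queue {
	int *fd;
};

/* Mark every slot as "not open". */
static void
sketch_private_init(struct sketch_process_private *pp)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_QUEUES; i++) {
		pp->rxq_fds[i] = -1;
		pp->txq_fds[i] = -1;
	}
}

/* Attach queue idx to its slot in this process's private structure. */
static void
sketch_rxq_bind(struct sketch_rx_queue *rxq,
		struct sketch_process_private *pp, size_t idx)
{
	rxq->fd = &pp->rxq_fds[idx];
}

/* Returns 1 if an open fd was released, 0 if there was nothing to do. */
static int
sketch_rxq_release(struct sketch_rx_queue *rxq)
{
	if (rxq && rxq->fd && *rxq->fd > 0) {
		/* The driver would call close(*rxq->fd) at this point. */
		*rxq->fd = -1;
		return 1;
	}
	return 0;
}
```

Because every access goes through the pointer, a process that fills in its own private structure never overwrites the descriptors used by another process.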
[dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process
In the case the device is created by the primary process, the secondary must request some file descriptors to attach the queues. The file descriptors are shared via IPC Unix socket. Thanks to the IPC synchronization, the secondary process is now able to do Rx/Tx on a TAP created by the primary process. Signed-off-by: Raslan Darawsheh Signed-off-by: Thomas Monjalon --- v2: - translate file descriptors via IPC API - add documentation v3: - rabse the commit - use private static array for fd's to be local for each process v4: - change strcpy to be strlcpy - remove the fixme and todo from comments. --- --- doc/guides/nics/tap.rst| 16 doc/guides/rel_notes/release_18_11.rst | 4 + drivers/net/tap/Makefile | 1 + drivers/net/tap/rte_eth_tap.c | 133 - 4 files changed, 153 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst index 2714868..d1f3e1c 100644 --- a/doc/guides/nics/tap.rst +++ b/doc/guides/nics/tap.rst @@ -152,6 +152,22 @@ Distribute IPv4 TCP packets using RSS to a given MAC address over queues 0-3:: testpmd> flow create 0 priority 4 ingress pattern eth dst is 0a:0b:0c:0d:0e:0f \ / ipv4 / tcp / end actions rss queues 0 1 2 3 end / end +Multi-process sharing +- + +It is possible to attach an existing TAP device in a secondary process, +by declaring it as a vdev with the same name as in the primary process, +and without any parameter. + +The port attached in a secondary process will give access to the +statistics and the queues. +Therefore it can be used for monitoring or Rx/Tx processing. + +The IPC synchronization of Rx/Tx queues is currently limited: + + - Only 8 queues + - Synchronized on probing, but not on later port update + Example --- diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index 8c4bb54..a9dda5b 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -67,6 +67,10 @@ New Features SR-IOV option in Hyper-V and Azure. 
This is an alternative to the previous vdev_netvsc, tap, and failsafe drivers combination. +* **Added TAP Rx/Tx queues sharing with a secondary process.** + + A secondary process can attach a TAP device created in the primary process, + probe the queues, and process Rx/Tx in a secondary process. API Changes --- diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile index 3243365..7748283 100644 --- a/drivers/net/tap/Makefile +++ b/drivers/net/tap/Makefile @@ -22,6 +22,7 @@ CFLAGS += -O3 CFLAGS += -I$(SRCDIR) CFLAGS += -I. CFLAGS += $(WERROR_FLAGS) +CFLAGS += -DALLOW_EXPERIMENTAL_API LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash LDLIBS += -lrte_bus_vdev -lrte_gso diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 8cc4552..a751f76 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -16,6 +16,8 @@ #include #include #include +#include +#include #include #include @@ -62,6 +64,9 @@ #define TAP_GSO_MBUFS_NUM \ (TAP_GSO_MBUFS_PER_CORE * TAP_GSO_MBUF_CACHE_SIZE) +/* IPC key for queue fds sync */ +#define TAP_MP_KEY "tap_mp_sync_queues" + static struct rte_vdev_driver pmd_tap_drv; static struct rte_vdev_driver pmd_tun_drv; static struct pmd_process_private *process_private; @@ -101,6 +106,17 @@ enum ioctl_mode { REMOTE_ONLY, }; +/* Message header to synchronize queues via IPC */ +struct ipc_queues { + char port_name[RTE_DEV_NAME_MAX_LEN]; + int rxq_count; + int txq_count; + /* +* The file descriptors are in the dedicated part +* of the Unix message to be translated by the kernel. +*/ +}; + static int tap_intr_handle_set(struct rte_eth_dev *dev, int set); /** @@ -1980,6 +1996,99 @@ rte_pmd_tun_probe(struct rte_vdev_device *dev) return ret; } +/* Request queue file descriptors from secondary to primary. 
*/ +static int +tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev) +{ + int ret; + struct timespec timeout = {.tv_sec = 1, .tv_nsec = 0}; + struct rte_mp_msg request, *reply; + struct rte_mp_reply replies; + struct ipc_queues *request_param = (struct ipc_queues *)request.param; + struct ipc_queues *reply_param; + int queue, fd_iterator; + + /* Prepare the request */ + strlcpy(request.name, TAP_MP_KEY, sizeof(request.name)); + strlcpy(request_param->port_name, port_name, + sizeof(request_param->port_name)); + request.len_param = sizeof(*request_param); + /* Send request and receive reply */ + ret = rte_mp_request_sync(&request, &replies, &timeout); + if (ret < 0) { + TAP_LOG(ERR, "Failed to request queues from primary: %d
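The comment in `struct ipc_queues` notes that the file descriptors travel in the dedicated part of the Unix message so the kernel can translate them. Outside DPDK, the underlying mechanism is SCM_RIGHTS ancillary data on a Unix domain socket, and a minimal sketch of it follows. It uses plain POSIX calls only and is not the rte_mp IPC code; the `sketch_*` names and the one-byte payload are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define SKETCH_MAX_FDS 8	/* mirrors the "only 8 queues" limit in the docs */

/* Control buffer, aligned for struct cmsghdr, big enough for all fds. */
union sketch_ctrl {
	char buf[CMSG_SPACE(sizeof(int) * SKETCH_MAX_FDS)];
	struct cmsghdr align;
};

/* Send nb_fds descriptors plus a one-byte payload; 0 on success. */
static int
sketch_send_fds(int sock, const int *fds, int nb_fds)
{
	char payload = 'q';
	struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
	union sketch_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	if (nb_fds <= 0 || nb_fds > SKETCH_MAX_FDS)
		return -1;
	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = CMSG_SPACE(sizeof(int) * (size_t)nb_fds);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int) * (size_t)nb_fds);
	memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * (size_t)nb_fds);

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive descriptors; returns how many were stored in fds, or -1. */
static int
sketch_recv_fds(int sock, int *fds, int max_fds)
{
	char payload;
	struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
	union sketch_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int tmp[SKETCH_MAX_FDS];
	int nb, i, kept = 0;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);
	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	nb = (int)((cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
	if (nb > SKETCH_MAX_FDS)
		nb = SKETCH_MAX_FDS;
	memcpy(tmp, CMSG_DATA(cmsg), sizeof(int) * (size_t)nb);
	for (i = 0; i < nb; i++) {
		if (kept < max_fds)
			fds[kept++] = tmp[i];
		else
			close(tmp[i]);	/* no room for it: do not leak it */
	}
	return kept;
}
```

The descriptor numbers the receiver gets are new entries in its own fd table that refer to the same open files, which is why the values cannot simply be copied through shared memory.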
[dpdk-dev] [PATCH v2] mbuf: clarify QINQ flag usage
Update implementation that when PKT_RX_QINQ_STRIPPED mbuf ol_flags set by PMD, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED & PKT_RX_VLAN should be also set. Clarify mbuf documentations that when PKT_RX_QINQ set PKT_RX_VLAN also should be set. So that appllication can rely on PKT_RX_QINQ flag to access both mbuf.vlan_tci & mbuf.vlan_tci_outer Signed-off-by: Ferruh Yigit --- Cc: Hyong Youb Kim Cc: John Daley --- app/test-pmd/rxonly.c| 6 +++--- doc/guides/nics/features.rst | 7 --- drivers/net/i40e/i40e_rxtx.c | 3 ++- lib/librte_mbuf/rte_mbuf.c | 1 + lib/librte_mbuf/rte_mbuf.h | 5 +++-- 5 files changed, 13 insertions(+), 9 deletions(-) diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c index a93d80612..21b60062f 100644 --- a/app/test-pmd/rxonly.c +++ b/app/test-pmd/rxonly.c @@ -130,11 +130,11 @@ pkt_burst_receive(struct fwd_stream *fs) } if (ol_flags & PKT_RX_TIMESTAMP) printf(" - timestamp %"PRIu64" ", mb->timestamp); - if (ol_flags & PKT_RX_VLAN_STRIPPED) - printf(" - VLAN tci=0x%x", mb->vlan_tci); - if (ol_flags & PKT_RX_QINQ_STRIPPED) + if (ol_flags & PKT_RX_QINQ) printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x", mb->vlan_tci, mb->vlan_tci_outer); + else if (ol_flags & PKT_RX_VLAN_STRIPPED) + printf(" - VLAN tci=0x%x", mb->vlan_tci); if (mb->packet_type) { rte_get_ptype_name(mb->packet_type, buf, sizeof(buf)); printf(" - hw ptype: %s", buf); diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index b085bda86..c0cbe3784 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -528,7 +528,7 @@ Supports VLAN offload to hardware. * **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_VLAN_STRIP,DEV_RX_OFFLOAD_VLAN_FILTER,DEV_RX_OFFLOAD_VLAN_EXTEND``. * **[uses] rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_VLAN_INSERT``. * **[implements] eth_dev_ops**: ``vlan_offload_set``. -* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.vlan_tci``. 
+* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.ol_flags:PKT_RX_VLAN`` ``mbuf.vlan_tci``. * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_VLAN_STRIP``, ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_VLAN_INSERT``. * **[related]API**: ``rte_eth_dev_set_vlan_offload()``, @@ -545,8 +545,9 @@ Supports QinQ (queue in queue) offload. * **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_QINQ_STRIP``. * **[uses] rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_QINQ_INSERT``. * **[uses] mbuf**: ``mbuf.ol_flags:PKT_TX_QINQ_PKT``. -* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_QINQ_STRIPPED``, ``mbuf.vlan_tci``, - ``mbuf.vlan_tci_outer``. +* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_QINQ_STRIPPED``, ``mbuf.ol_flags:PKT_RX_QINQ``, + ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.ol_flags:PKT_RX_VLAN`` + ``mbuf.vlan_tci``, ``mbuf.vlan_tci_outer``. * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_QINQ_STRIP``, ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_QINQ_INSERT``. 
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 7c986d535..b2819f757 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -83,7 +83,8 @@ i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp) #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC if (rte_le_to_cpu_16(rxdp->wb.qword2.ext_status) & (1 << I40E_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) { - mb->ol_flags |= PKT_RX_QINQ_STRIPPED; + mb->ol_flags |= PKT_RX_QINQ_STRIPPED | PKT_RX_QINQ | + PKT_RX_VLAN_STRIPPED | PKT_RX_VLAN; mb->vlan_tci_outer = mb->vlan_tci; mb->vlan_tci = rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_2); PMD_RX_LOG(DEBUG, "Descriptor l2tag2_1: %u, l2tag2_2: %u", diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index e714c5a59..05a5a17fe 100644 --- a/lib/librte_mbuf/rte_mbuf.c +++ b/lib/librte_mbuf/rte_mbuf.c @@ -297,6 +297,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask) case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP"; case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST"; case PKT_RX_QINQ_STRIPPED: return "PKT_RX_QINQ_STRIPPED"; + case PKT_RX_QINQ: return "PKT_RX_QINQ"; case PKT_RX_LRO: return "PKT_RX_LRO"; case PKT_RX_TIMESTAMP: return "PKT_RX_TIMESTAMP"; case PKT_RX_SEC_OFFLOAD: return "PKT_RX_SEC_OFFLOAD"; diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h in
Re: [dpdk-dev] [PATCH v2] mbuf: clarify QINQ flag usage
On 10/2/18 2:36 PM, Ferruh Yigit wrote: Update the implementation so that when a PMD sets PKT_RX_QINQ_STRIPPED in mbuf ol_flags, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED & PKT_RX_VLAN are also set. Clarify the mbuf documentation to state that when PKT_RX_QINQ is set, PKT_RX_VLAN should also be set, so that an application can rely on the PKT_RX_QINQ flag to access both mbuf.vlan_tci & mbuf.vlan_tci_outer. Signed-off-by: Ferruh Yigit Reviewed-by: Andrew Rybchenko
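For readers following the flag semantics, here is a minimal sketch of the contract the patch describes: the PMD sets all four flags together, and the application only needs to test the QinQ flag before reading both TCI fields. Everything prefixed with demo_ or DEMO_ is a stand-in invented for this illustration (including the bit positions); real code would use struct rte_mbuf and the PKT_RX_* macros from rte_mbuf.h.

```c
#include <stdint.h>

/* Stand-in flag bits for illustration only; real code uses PKT_RX_*. */
#define DEMO_RX_VLAN          (1ULL << 0)
#define DEMO_RX_VLAN_STRIPPED (1ULL << 6)
#define DEMO_RX_QINQ_STRIPPED (1ULL << 15)
#define DEMO_RX_QINQ          (1ULL << 20)

/* Minimal stand-in for the three mbuf fields discussed in the patch. */
struct demo_mbuf {
	uint64_t ol_flags;
	uint16_t vlan_tci;
	uint16_t vlan_tci_outer;
};

/* What a PMD is expected to do when it strips both tags. */
static void
demo_pmd_mark_qinq_stripped(struct demo_mbuf *mb, uint16_t inner,
			    uint16_t outer)
{
	mb->ol_flags |= DEMO_RX_QINQ_STRIPPED | DEMO_RX_QINQ |
			DEMO_RX_VLAN_STRIPPED | DEMO_RX_VLAN;
	mb->vlan_tci = inner;
	mb->vlan_tci_outer = outer;
}

/*
 * Application side: returns how many tags are valid (2, 1 or 0).
 * The QinQ flag alone is enough to trust both TCI fields.
 */
static int
demo_read_tags(const struct demo_mbuf *mb, uint16_t *inner, uint16_t *outer)
{
	if (mb->ol_flags & DEMO_RX_QINQ) {
		*inner = mb->vlan_tci;
		*outer = mb->vlan_tci_outer;
		return 2;
	}
	if (mb->ol_flags & DEMO_RX_VLAN) {
		*inner = mb->vlan_tci;
		return 1;
	}
	return 0;
}
```

Testing the QinQ flag first mirrors the if/else-if order the patch introduces in rxonly.c, so a double-tagged packet is not reported twice.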
Re: [dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process
02/10/2018 12:34, Raslan Darawsheh: > @@ -2056,6 +2179,13 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev) > TAP_LOG(NOTICE, "Initializing pmd_tap for %s as %s", > name, tap_name); > > + /* Register IPC feed callback */ > + ret = rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues); > + if (ret < 0 && rte_errno != EEXIST) { > + TAP_LOG(ERR, "%s: Failed to register IPC callback: %s", > + tuntap_name, strerror(rte_errno)); > + goto leave; > + } > ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac, > ETH_TUNTAP_TYPE_TAP); Is it an issue registering tap_mp_sync_queues at each tap probing? Should we do it only once?
Re: [dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process
02/10/2018 12:34, Raslan Darawsheh: > --- a/doc/guides/rel_notes/release_18_11.rst > +++ b/doc/guides/rel_notes/release_18_11.rst > @@ -67,6 +67,10 @@ New Features >SR-IOV option in Hyper-V and Azure. This is an alternative to the previous >vdev_netvsc, tap, and failsafe drivers combination. > > +* **Added TAP Rx/Tx queues sharing with a secondary process.** > + > + A secondary process can attach a TAP device created in the primary process, > + probe the queues, and process Rx/Tx in a secondary process. A blank line is missing here. > @@ -2006,9 +2115,23 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev) > TAP_LOG(ERR, "Failed to probe %s", name); > return -1; > } > - /* TODO: request info from primary to set up Rx and Tx */ > eth_dev->dev_ops = &ops; > eth_dev->device = &dev->device; > + eth_dev->rx_pkt_burst = pmd_rx_burst; > + eth_dev->tx_pkt_burst = pmd_tx_burst; > + if (!rte_eal_primary_proc_alive(NULL)) { > + TAP_LOG(ERR, "Primary process is missing"); > + return -1; > + } > + process_private = (struct pmd_process_private *) > + rte_zmalloc_socket(name, > + sizeof(struct pmd_process_private), > + RTE_CACHE_LINE_SIZE, > + eth_dev->device->numa_node); > + > + ret = tap_mp_attach_queues(name, eth_dev); > + if (ret != 0) > + return -1; > rte_eth_dev_probing_finish(eth_dev); > return 0; > } Should we manage rte_pmd_tun_probe too?
Re: [dpdk-dev] [PATCH v3] ethdev: get rxq interrupt fd
On 9/29/2018 3:12 AM, Xiaoyun Li wrote: > Some users want to use their own epoll instances to control both > DPDK rxq interrupt fds and their own other fds. So added a function > to get rxq interrupt fd based on port id and queue id. > > Signed-off-by: Xiaoyun Li Reviewed-by: Ferruh Yigit
Re: [dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process
It should be per device, so we should do it for each port separately, since several ports can have different queues. Moreover, if the port that holds the registration is closed or unplugged, we will not be able to sync queues for the other ports. Kindest regards, Raslan Darawsheh -Original Message- From: Thomas Monjalon Sent: Tuesday, October 2, 2018 1:41 PM To: Raslan Darawsheh Cc: dev@dpdk.org; keith.wi...@intel.com; Shahaf Shuler ; Ori Kam Subject: Re: [dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process 02/10/2018 12:34, Raslan Darawsheh: > @@ -2056,6 +2179,13 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev) > TAP_LOG(NOTICE, "Initializing pmd_tap for %s as %s", > name, tap_name); > > + /* Register IPC feed callback */ > + ret = rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues); > + if (ret < 0 && rte_errno != EEXIST) { > + TAP_LOG(ERR, "%s: Failed to register IPC callback: %s", > + tuntap_name, strerror(rte_errno)); > + goto leave; > + } > ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac, > ETH_TUNTAP_TYPE_TAP); Is it an issue registering tap_mp_sync_queues at each tap probing? Should we do it only once?
[dpdk-dev] [PATCH v2 2/2] mbuf: fix Tx offload mask
Add the missing PKT_TX_UDP_SEG, PKT_TX_OUTER_IPV6, PKT_TX_OUTER_IPV4, PKT_TX_IPV6 and PKT_TX_IPV4 values to PKT_TX_OFFLOAD_MASK. Also sort the entries in bitwise order so that missing items are easier to spot later. Fixes: 6d18505efaa6 ("vhost: support UDP Fragmentation Offload") Fixes: 1c3b7c33e977 ("mbuf: add Tx offloading flags for tunnels") Fixes: 711ba9e23e68 ("mbuf: remove aliasing of Tx offloading flags with Rx ones") Cc: sta...@dpdk.org Cc: jiayu...@intel.com Signed-off-by: Jerin Jacob --- v2: - Add all missing PKT_TX_ types - Sort them in bit mask order (Ferruh Yigit) --- lib/librte_mbuf/rte_mbuf.h | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index a50b05c64..c8ebc3230 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -334,16 +334,21 @@ extern "C" { * which can be set for packet. */ #define PKT_TX_OFFLOAD_MASK (\ + PKT_TX_OUTER_IPV6 | \ + PKT_TX_OUTER_IPV4 | \ + PKT_TX_OUTER_IP_CKSUM | \ + PKT_TX_VLAN_PKT |\ + PKT_TX_IPV6 |\ + PKT_TX_IPV4 |\ PKT_TX_IP_CKSUM |\ PKT_TX_L4_MASK | \ - PKT_TX_OUTER_IP_CKSUM | \ - PKT_TX_TCP_SEG | \ PKT_TX_IEEE1588_TMST | \ + PKT_TX_TCP_SEG | \ PKT_TX_QINQ_PKT |\ - PKT_TX_VLAN_PKT |\ PKT_TX_TUNNEL_MASK | \ PKT_TX_MACSEC | \ - PKT_TX_SEC_OFFLOAD) + PKT_TX_SEC_OFFLOAD |\ + PKT_TX_UDP_SEG) /** * Mbuf having an external buffer attached. shinfo in mbuf must be filled. -- 2.19.0
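To illustrate why a flag missing from the global mask matters, here is a small sketch under stated assumptions. It assumes a driver derives its "not supported" mask as the XOR of the global offload mask and its own supported set, which I believe is a pattern some PMDs use in their Tx prepare path; the DEMO_* names and bit positions are stand-ins made up for this example, not the real PKT_TX_* definitions.

```c
#include <stdint.h>

/* Stand-in Tx flag bits for illustration only. */
#define DEMO_TX_UDP_SEG  (1ULL << 42)
#define DEMO_TX_TCP_SEG  (1ULL << 50)
#define DEMO_TX_IP_CKSUM (1ULL << 54)

/*
 * Flags the driver cannot handle: everything in the global mask that
 * is not in the driver's supported set. Assumes supported is a subset
 * of all_mask.
 */
static uint64_t
demo_notsup_mask(uint64_t all_mask, uint64_t supported)
{
	return all_mask ^ supported;
}

/* Returns 1 when the packet requests only supported offloads. */
static int
demo_tx_flags_ok(uint64_t ol_flags, uint64_t notsup)
{
	return (ol_flags & notsup) == 0;
}
```

With a global mask that omits DEMO_TX_UDP_SEG, a driver supporting only IP checksum and TCP segmentation ends up with an empty "not supported" mask, so a packet requesting UDP segmentation passes the check unnoticed. Once the flag is part of the global mask, the same packet is rejected.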
[dpdk-dev] [PATCH v2 1/2] ethdev: add SCTP Rx checksum offload support
Added SCTP Rx checksum offload support Signed-off-by: Jerin Jacob --- v2: - Fix printf formatting error(Ferruh Yigit) --- app/test-pmd/config.c | 9 + doc/guides/nics/features.rst | 4 ++-- lib/librte_ethdev/rte_ethdev.c | 1 + lib/librte_ethdev/rte_ethdev.h | 1 + 4 files changed, 13 insertions(+), 2 deletions(-) diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index 794aa5268..1adc9b94b 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -576,6 +576,15 @@ port_offload_cap_display(portid_t port_id) printf("off\n"); } + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_SCTP_CKSUM) { + printf("RX SCTP checksum: "); + if (ports[port_id].dev_conf.rxmode.offloads & + DEV_RX_OFFLOAD_SCTP_CKSUM) + printf("on\n"); + else + printf("off\n"); + } + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM) { printf("RX Outer IPv4 checksum:"); if (ports[port_id].dev_conf.rxmode.offloads & diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index b085bda86..d42489b6d 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -576,7 +576,7 @@ L4 checksum offload Supports L4 checksum offload. -* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM``. +* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM,DEV_RX_OFFLOAD_SCTP_CKSUM``. * **[uses] rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_UDP_CKSUM,DEV_TX_OFFLOAD_TCP_CKSUM,DEV_TX_OFFLOAD_SCTP_CKSUM``. * **[uses] mbuf**: ``mbuf.ol_flags:PKT_TX_IPV4`` | ``PKT_TX_IPV6``, ``mbuf.ol_flags:PKT_TX_L4_NO_CKSUM`` | ``PKT_TX_TCP_CKSUM`` | @@ -584,7 +584,7 @@ Supports L4 checksum offload. * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_L4_CKSUM_UNKNOWN`` | ``PKT_RX_L4_CKSUM_BAD`` | ``PKT_RX_L4_CKSUM_GOOD`` | ``PKT_RX_L4_CKSUM_NONE``. 
-* **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM``, +* **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM,DEV_RX_OFFLOAD_SCTP_CKSUM``, ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_UDP_CKSUM,DEV_TX_OFFLOAD_TCP_CKSUM,DEV_TX_OFFLOAD_SCTP_CKSUM``. .. _nic_features_hw_timestamp: diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index ef99f7068..e9a82fe7f 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -126,6 +126,7 @@ static const struct { RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP), RTE_RX_OFFLOAD_BIT2STR(SECURITY), RTE_RX_OFFLOAD_BIT2STR(KEEP_CRC), + RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM), }; #undef RTE_RX_OFFLOAD_BIT2STR diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index 012577b0a..d02db14ad 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -888,6 +888,7 @@ struct rte_eth_conf { #define DEV_RX_OFFLOAD_TIMESTAMP 0x4000 #define DEV_RX_OFFLOAD_SECURITY 0x8000 #define DEV_RX_OFFLOAD_KEEP_CRC0x0001 +#define DEV_RX_OFFLOAD_SCTP_CKSUM 0x0002 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \ DEV_RX_OFFLOAD_UDP_CKSUM | \ -- 2.19.0
Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
On Mon, 2018-10-01 at 12:24 +0100, Luca Boccassi wrote: > On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote: > > On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote: > > > On Mon, 01 Oct 2018 10:46:02 +0100 > > > Luca Boccassi wrote: > > > > > > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote: > > > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson > > > > > wrote: > > > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi > > > > > > wrote: > > > > > > > Allow users and packagers to override the default > > > > > > > dpdk/drivers > > > > > > > subdirectory where the PMDs get installed under $lib. > > > > > > > > > > > > > > Signed-off-by: Luca Boccassi > > > > > > > --- > > > > > > > > > > > > I'm ok with this change, but what is the current location > > > > > > used by > > > > > > distro's > > > > > > right now? I mistakenly never checked what was done before I > > > > > > used > > > > > > dpdk/drivers as a default value, and I'd like the default to > > > > > > match > > > > > > the > > > > > > common option if possible. > > > > > > > > > > > > /Bruce > > > > > > > > > > > > > > > > Replying to my own question, I've just checked on CentOS and > > > > > Debian, > > > > > and it > > > > > appears both are using directory "dpdk-pmds" as the subdir > > > > > name. > > > > > Therefore, > > > > > let's just make that the default. [Does it need to be > > > > > configurable in > > > > > that > > > > > case?] > > > > > > > > > > /Bruce > > > > > > > > If the default is the one I expect then I'm fine without having > > > > an > > > > option (actually happier - less things to configure). > > > > > > > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last > > > > January :-) > > > > We changed because using a single directory creates problems when > > > > multiple different ABI versions are installed, due to the EAL > > > > autoload > > > > from that directory. 
So we need a different subdirectory per ABI > > > > revision. > > > > > > > > We were actually talking with Timothy a while ago to make this > > > > consistent across our distros, and perhaps Marco can chip in as > > > > well. > > > > > > > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm > > > > not > > > > too fussy on $something, it can be drivers or pmds or something > > > > else. > > > > > > > > > > LGTM. > > > If needed, we can just do a compatibility symlink using the current > > > dpdk-pmds path > > > > > > > One suggestion/comment. Would using a unique directory per release > > not lead > > to clobbering up the lib directory unnecessarily? How about having a > > single > > "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a > > subdir > > under that? > > > > E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/ > > dpdk-pmds/18.11 > > > > [The former of the above would be my preference, since I don't like > > having > > hyphenated names, and like having "dpdk" alone as a folder name :-)] > > > > /Bruce > > dpdk/pmds-XX.YY/ would work for me. Timothy and Marco? That would work for us. However, I would suggest keeping the path configurable (a feature that could perhaps be dropped in the next release), just to make sure the transition can happen without pain in the unlikely event that something goes wrong with packaging... > -- Marco V SUSE LINUX GmbH | GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Maxfeldstr. 5, D-90409, Nürnberg
Re: [dpdk-dev] [PATCH v8] app/testpmd: add noisy neighbour forwarding mode
Hi Jens, > -Original Message- > From: Jens Freimann [mailto:jfreim...@redhat.com] > Sent: Tuesday, October 2, 2018 8:44 AM > To: dev@dpdk.org > Cc: ai...@redhat.com; jan.scheur...@ericsson.com; Richardson, Bruce > ; tho...@monjalon.net; > maxime.coque...@redhat.com; Ananyev, Konstantin > ; Yigit, Ferruh ; > Iremonger, Bernard ; ktray...@redhat.com > Subject: [PATCH v8] app/testpmd: add noisy neighbour forwarding mode > > diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c index > 9220e1c1b..97e0dfa49 100644 > --- a/app/test-pmd/parameters.c > +++ b/app/test-pmd/parameters.c The usage() function needs to be updated with the noisy options after line 192. > diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst > b/doc/guides/testpmd_app_ug/testpmd_funcs.rst > index 3a73000a6..99a005a0c 100644 > --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst > +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst > -Example:: > +* ``noisy``: Noisy neighbour simulation. > + Simulate more realistic behavior of a guest machine engaged in > +receiving > + and sending packets performing Virtual Network Function (VNF). > > +Example:: A line has been deleted after the above line; it should be restored to correct the formatting in the HTML output. > testpmd> set fwd rxonly > > Set rxonly packet forwarding mode > -- > 2.17.1 Regards, Bernard.
Re: [dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process
02/10/2018 12:50, Raslan Darawsheh: > From: Thomas Monjalon > > 02/10/2018 12:34, Raslan Darawsheh: > > > @@ -2056,6 +2179,13 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev) > > > > > > TAP_LOG(NOTICE, "Initializing pmd_tap for %s as %s", > > > > > > name, tap_name); > > > > > > + /* Register IPC feed callback */ > > > + ret = rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues); > > > + if (ret < 0 && rte_errno != EEXIST) { > > > + TAP_LOG(ERR, "%s: Failed to register IPC callback: %s", > > > + tuntap_name, strerror(rte_errno)); > > > + goto leave; > > > + } > > > > > > ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac, > > > > > > ETH_TUNTAP_TYPE_TAP); > > > > Is it an issue registering tap_mp_sync_queues at each tap probing? > > Should we do it only once? > > It should be as of per device so we should do it for each port alone since > several ports can have different queues. > > Moreover, if the port that has the registration was closed or unplugged we'll > not be able to sync queues for other ports. I think we should register on the first tap device probing and never unregister. Ferruh, any opinion?
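A minimal sketch of the "register on first probe, never unregister" idea, assuming probing happens from a single thread. The demo_mp_action_register() function is a hypothetical stand-in for rte_mp_action_register() that only counts calls, and demo_tap_probe() is not the real rte_pmd_tap_probe(); this illustrates the guard, not the driver.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the IPC registration call; counts calls. */
static int demo_register_calls;

static int
demo_mp_action_register(const char *key)
{
	(void)key;
	demo_register_calls++;
	return 0;
}

/* Process-wide guard: the callback is registered at most once. */
static bool demo_ipc_registered;

/* Stand-in probe: registers the callback only for the first device. */
static int
demo_tap_probe(const char *name)
{
	(void)name;
	if (!demo_ipc_registered) {
		if (demo_mp_action_register("demo_tap_mp_sync_queues") < 0)
			return -1;
		demo_ipc_registered = true;
	}
	/* Per-device creation would follow here. */
	return 0;
}
```

Because the registration is keyed by name rather than by port, closing the first device does not remove the callback, which addresses the concern about syncing queues for the remaining ports. The original patch gets a similar effect by tolerating EEXIST on every probe.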
Re: [dpdk-dev] [PATCH v6 1/8] net/mvneta: add neta PMD skeleton
On 10/1/2018 10:26 AM, Andrzej Ostruszka wrote: > From: Zyta Szpak > > Add neta pmd driver skeleton providing base for the further > development. > > Signed-off-by: Natalie Samsonov > Signed-off-by: Yelena Krivosheev > Signed-off-by: Dmitri Epshtein > Signed-off-by: Zyta Szpak > Signed-off-by: Andrzej Ostruszka > --- > MAINTAINERS | 8 + > config/common_base| 5 + > devtools/test-build.sh| 2 + > doc/guides/nics/features/mvneta.ini | 11 + > doc/guides/nics/mvneta.rst| 152 +++ dpdk/doc/guides/nics/mvneta.rst: WARNING: document isn't included in any toctree Please add the document to doc/guides/nics/index.rst. <...> > +Config File Options > +--- > + > +The following options can be modified in the ``config`` file. > + > +- ``CONFIG_RTE_LIBRTE_MVNETA_PMD`` (default ``n``) > + > +Toggle compilation of the librte_pmd_mvneta driver. > + It would be good to have another section documenting "Runtime options" (iface). > + > +Usage example > +^ > + > +.. code-block:: console > + > + ./testpmd --vdev=net_mvneta,iface=eth0,iface=eth1 \ > +-c 3 -- -i --p 3 -a > + > + > +Building DPDK > +- > + > +Driver needs precompiled MUSDK library during compilation. > + > +.. code-block:: console > + > + export CROSS_COMPILE=/bin/aarch64-linux-gnu- > + ./bootstrap > + ./configure --host=aarch64-linux-gnu --enable-bpool-dma=64 I am getting "configure: WARNING: unrecognized options: --enable-bpool-dma". Is this config option still valid for 18.09? <...> > + > +static int mvneta_dev_num; > +static int mvneta_lcore_first; > +static int mvneta_lcore_last; These static variables seem to be assigned but never used; can you please check? <...> > + > +RTE_PMD_REGISTER_VDEV(net_mvneta, pmd_mvneta_drv); The supported devargs need to be documented with RTE_PMD_REGISTER_PARAM_STRING. <...> > +struct mvneta_priv { > + /* Hot fields, used in fast path. 
*/ > + struct neta_ppio*ppio;/**< Port handler pointer */ > + > + uint8_t pp_id; > + uint8_t ppio_id;/* ppio port id */ > + uint8_t uc_mc_flushed; > + uint8_t multiseg; > + > + struct neta_ppio_params ppio_params; > + uint16_t nb_rx_queues; Do you need this private variable? Isn't it a duplicate of "dev->data->nb_rx_queues"? As far as I can see, "dev->data->nb_rx_queues" is the one that is used.
Re: [dpdk-dev] [PATCH v6 2/8] net/mvneta: add Rx/Tx support
On 10/1/2018 10:26 AM, Andrzej Ostruszka wrote: > From: Zyta Szpak > > Add part of PMD for actual reception/transmission. > > Signed-off-by: Yelena Krivosheev > Signed-off-by: Dmitri Epshtein > Signed-off-by: Zyta Szpak <...> > @@ -0,0 +1,850 @@ > +#include "mvneta_rxtx.h" > + > +uint64_t cookie_addr_high = MVNETA_COOKIE_ADDR_INVALID; > +uint16_t rx_desc_free_thresh = MRVL_NETA_BUF_RELEASE_BURST_SIZE_MIN; Can these global variables be static? If not please add a mvneta_ prefix, but please try to make them static.
Re: [dpdk-dev] [PATCH v8] app/testpmd: add noisy neighbour forwarding mode
On 10/02/2018 08:44 AM, Jens Freimann wrote: > This adds a new forwarding mode to testpmd to simulate > more realistic behavior of a guest machine engaged in receiving > and sending packets performing Virtual Network Function (VNF). > As there's going to be a v9 anyway, you can also fix the below error messages to be '>= 0' > + if (!strcmp(lgopts[opt_idx].name, > + "noisy-lkup-memory")) { > + n = atoi(optarg); > + if (n >= 0) > + noisy_lkup_mem_sz = n; > + else > + rte_exit(EXIT_FAILURE, > + "noisy-lkup-memory must be > > 0\n"); > + } > + if (!strcmp(lgopts[opt_idx].name, > + "noisy-lkup-num-writes")) { > + n = atoi(optarg); > + if (n >= 0) > + noisy_lkup_num_writes = n; > + else > + rte_exit(EXIT_FAILURE, > + "noisy-lkup-num-writes must be > > 0\n"); > + } > + if (!strcmp(lgopts[opt_idx].name, > + "noisy-lkup-num-reads")) { > + n = atoi(optarg); > + if (n >= 0) > + noisy_lkup_num_reads = n; > + else > + rte_exit(EXIT_FAILURE, > + "noisy-lkup-num-reads must be > > 0\n"); > + } > + if (!strcmp(lgopts[opt_idx].name, > + "noisy-lkup-num-reads-writes")) { > + n = atoi(optarg); > + if (n >= 0) > + noisy_lkup_num_reads_writes = n; > + else > + rte_exit(EXIT_FAILURE, > + "noisy-lkup-num-reads-writes > must be > 0\n"); > + }
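Since the quoted checks accept zero, the messages should indeed read ">= 0". As an aside, atoi() cannot report malformed input, so a non-numeric argument silently becomes 0 and passes the check. Below is a hedged sketch of a stricter parser built on strtol(); demo_parse_nonneg() is a hypothetical helper name and is not part of testpmd.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal, non-negative int from arg.
 * Returns 0 and stores the value in *out on success, -1 otherwise
 * (empty string, trailing garbage, negative value or overflow).
 */
static int
demo_parse_nonneg(const char *arg, int *out)
{
	char *end = NULL;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0')
		return -1;
	if (v < 0 || v > INT_MAX)
		return -1;
	*out = (int)v;
	return 0;
}
```

A caller could then print one consistent "must be >= 0" message for every noisy-* option whenever the helper returns -1.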
Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
On Tue, Oct 02, 2018 at 01:02:26PM +0200, Marco Varlese wrote: > On Mon, 2018-10-01 at 12:24 +0100, Luca Boccassi wrote: > > On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote: > > > On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote: > > > > On Mon, 01 Oct 2018 10:46:02 +0100 > > > > Luca Boccassi wrote: > > > > > > > > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote: > > > > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson > > > > > > wrote: > > > > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi > > > > > > > wrote: > > > > > > > > Allow users and packagers to override the default > > > > > > > > dpdk/drivers > > > > > > > > subdirectory where the PMDs get installed under $lib. > > > > > > > > > > > > > > > > Signed-off-by: Luca Boccassi > > > > > > > > --- > > > > > > > > > > > > > > I'm ok with this change, but what is the current location > > > > > > > used by > > > > > > > distro's > > > > > > > right now? I mistakenly never checked what was done before I > > > > > > > used > > > > > > > dpdk/drivers as a default value, and I'd like the default to > > > > > > > match > > > > > > > the > > > > > > > common option if possible. > > > > > > > > > > > > > > /Bruce > > > > > > > > > > > > > > > > > > > Replying to my own question, I've just checked on CentOS and > > > > > > Debian, > > > > > > and it > > > > > > appears both are using directory "dpdk-pmds" as the subdir > > > > > > name. > > > > > > Therefore, > > > > > > let's just make that the default. [Does it need to be > > > > > > configurable in > > > > > > that > > > > > > case?] > > > > > > > > > > > > /Bruce > > > > > > > > > > If the default is the one I expect then I'm fine without having > > > > > an > > > > > option (actually happier - less things to configure). 
> > > > > > > > > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last > > > > > January :-) > > > > > We changed because using a single directory creates problems when > > > > > multiple different ABI versions are installed, due to the EAL > > > > > autoload > > > > > from that directory. So we need a different subdirectory per ABI > > > > > revision. > > > > > > > > > > We were actually talking with Timothy a while ago to make this > > > > > consistent across our distros, and perhaps Marco can chip in as > > > > > well. > > > > > > > > > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm > > > > > not > > > > > too fussy on $something, it can be drivers or pmds or something > > > > > else. > > > > > > > > > > > > > LGTM. > > > > If needed, we can just do a compatibility symlink using the current > > > > dpdk-pmds path > > > > > > > > > > One suggestion/comment. Would using a unique directory per release > > > not lead > > > to clobbering up the lib directory unnecessarily? How about having a > > > single > > > "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a > > > subdir > > > under that? > > > > > > E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/ > > > dpdk-pmds/18.11 > > > > > > [The former of the above would be my preference, since I don't like > > > having > > > hypenated names, and like having "dpdk" alone as a folder name :-)] > > > > > > /Bruce > > > > dpdk/pmds-XX.YY/ would work for me. Timothy and Marco? > That would work for us. > However, I would suggest to have the path to be configurable (feature to be > dropped in maybe next release). Just to make sure the transition can happen > without pain in the remote circumstance that something goes wrong with > packaging... > > > -- > Marco V > Yes, I think it needs to be configurable for the forseeable future. 
If the DPDK version is to be put in the path then we either need to always use a configurable version, since we can't hardcode a version number in the default, or else we need to put logic in the meson.build file to always insert a version number. /Bruce
[dpdk-dev] [PATCH v12 1/7] bus: add hot-unplug handler
A hot-unplug failure and an application crash can occur when a device is hot-unplugged but the application still tries to access it by reading or writing the BARs, which are already invalid but have not yet been unmapped or released. This patch introduces bus ops to handle hot-unplug failures. Each bus can implement its own case-dependent logic to handle the failures. Signed-off-by: Jeff Guo --- v12->v11: no change. --- lib/librte_eal/common/include/rte_bus.h | 16 1 file changed, 16 insertions(+) diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h index b7b5b08..1bb53dc 100644 --- a/lib/librte_eal/common/include/rte_bus.h +++ b/lib/librte_eal/common/include/rte_bus.h @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev); typedef int (*rte_bus_parse_t)(const char *name, void *addr); /** + * Implement a specific hot-unplug handler, which is responsible for + * handle the failure when device be hot-unplugged. When the event of + * hot-unplug be detected, it could call this function to handle + * the hot-unplug failure and avoid app crash. + * @param dev + * Pointer of the device structure. + * + * @return + * 0 on success. + * !0 on error. + */ +typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev); + +/** * Bus scan policies */ enum rte_bus_scan_mode { @@ -212,6 +226,8 @@ struct rte_bus { struct rte_bus_conf conf;/**< Bus configuration */ rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */ rte_dev_iterate_t dev_iterate; /**< Device iterator. */ + rte_bus_hot_unplug_handler_t hot_unplug_handler; + /**< handle hot-unplug failure on the bus */ }; /** -- 2.7.4
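The new op is an optional function pointer in the bus structure, so any caller has to cope with buses that leave it unset. A minimal sketch of that dispatch, using stand-in types: struct demo_bus, struct demo_device and demo_bus_handle_hot_unplug() are invented for this illustration and are not the EAL definitions.

```c
#include <stddef.h>

/* Stand-ins for struct rte_device and struct rte_bus. */
struct demo_device {
	const char *name;
};

typedef int (*demo_hot_unplug_handler_t)(struct demo_device *dev);

struct demo_bus {
	const char *name;
	demo_hot_unplug_handler_t hot_unplug_handler; /* may be NULL */
};

/* Example bus-specific handler; a real one would remap or release BARs. */
static int
demo_pci_hot_unplug(struct demo_device *dev)
{
	(void)dev;
	return 0;
}

/*
 * Dispatch to the bus handler.
 * Returns the handler's result, or -1 when the bus has no handler.
 */
static int
demo_bus_handle_hot_unplug(const struct demo_bus *bus,
			   struct demo_device *dev)
{
	if (bus == NULL || dev == NULL || bus->hot_unplug_handler == NULL)
		return -1;
	return bus->hot_unplug_handler(dev);
}
```

Returning an error for a bus without a handler lets the caller fall back to its previous behaviour instead of dereferencing a NULL pointer.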
[dpdk-dev] [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops
This patch implements the ops to handle hot-unplug on the PCI bus. For UIO PCI, it avoids BAR read/write errors by remapping the failing memory to newly created dummy memory. For VFIO or other kernel drivers, a specific handler can be implemented case by case. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v12->v11: no change. --- drivers/bus/pci/pci_common.c | 28 drivers/bus/pci/pci_common_uio.c | 33 + drivers/bus/pci/private.h| 12 3 files changed, 73 insertions(+) diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c index 7736b3f..d286234 100644 --- a/drivers/bus/pci/pci_common.c +++ b/drivers/bus/pci/pci_common.c @@ -406,6 +406,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp, } static int +pci_hot_unplug_handler(struct rte_device *dev) +{ + struct rte_pci_device *pdev = NULL; + int ret = 0; + + pdev = RTE_DEV_TO_PCI(dev); + if (!pdev) + return -1; + + switch (pdev->kdrv) { + case RTE_KDRV_IGB_UIO: + case RTE_KDRV_UIO_GENERIC: + case RTE_KDRV_NIC_UIO: + /* BARs resource is invalid, remap it to be safe. 
*/ + ret = pci_uio_remap_resource(pdev); + break; + default: + RTE_LOG(DEBUG, EAL, + "Not managed by a supported kernel driver, skipped\n"); + ret = -1; + break; + } + + return ret; +} + +static int pci_plug(struct rte_device *dev) { return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev)); @@ -435,6 +462,7 @@ struct rte_pci_bus rte_pci_bus = { .unplug = pci_unplug, .parse = pci_parse, .get_iommu_class = rte_pci_get_iommu_class, + .hot_unplug_handler = pci_hot_unplug_handler, }, .device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list), .driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list), diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c index 54bc20b..7ea73db 100644 --- a/drivers/bus/pci/pci_common_uio.c +++ b/drivers/bus/pci/pci_common_uio.c @@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res) } } +/* remap the PCI resource of a PCI device in anonymous virtual memory */ +int +pci_uio_remap_resource(struct rte_pci_device *dev) +{ + int i; + void *map_address; + + if (dev == NULL) + return -1; + + /* Remap all BARs */ + for (i = 0; i != PCI_MAX_RESOURCE; i++) { + /* skip empty BAR */ + if (dev->mem_resource[i].phys_addr == 0) + continue; + map_address = mmap(dev->mem_resource[i].addr, + (size_t)dev->mem_resource[i].len, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0); + if (map_address == MAP_FAILED) { + RTE_LOG(ERR, EAL, + "Cannot remap resource for device %s\n", + dev->name); + return -1; + } + RTE_LOG(INFO, EAL, + "Successful remap resource for device %s\n", + dev->name); + } + + return 0; +} + static struct mapped_pci_resource * pci_uio_find_resource(struct rte_pci_device *dev) { diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h index 8ddd03e..6b312e5 100644 --- a/drivers/bus/pci/private.h +++ b/drivers/bus/pci/private.h @@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev, struct mapped_pci_resource *uio_res); /** + * Remap the PCI 
resource of a PCI device in anonymous virtual memory. + * + * @param dev + * Point to the struct rte pci device. + * @return + * - On success, zero. + * - On failure, a negative value. + */ +int +pci_uio_remap_resource(struct rte_pci_device *dev); + +/** * Map device memory to uio resource * * This function is private to EAL. -- 2.7.4
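The core of pci_uio_remap_resource() above is a single mmap() call with MAP_FIXED and MAP_ANONYMOUS over the address range of each BAR. A self-contained sketch of just that technique on Linux follows; demo_remap_anonymous() is a hypothetical helper, and the region it operates on here is ordinary memory rather than a device BAR.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace the mapping at [addr, addr + len) with fresh, zero-filled
 * anonymous memory. MAP_FIXED atomically discards whatever was mapped
 * there before, so later loads and stores hit valid memory.
 * addr must be page aligned. Returns 0 on success, -1 on failure.
 */
static int
demo_remap_anonymous(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return p == MAP_FAILED ? -1 : 0;
}
```

After the remap, the application reads zeros and its writes go nowhere useful, which is the point: accesses stop faulting while the control path finishes detaching the device.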
[dpdk-dev] [PATCH v12 0/7] hot-unplug failure handle mechanism
Hotplug is an important feature for use cases such as datacenter device fail-safe and SR-IOV live migration in SDN/NFV. It can bring higher flexibility and continuity to networking services in multiple use cases in the industry. So let's see how DPDK can help users implement hotplug solutions. We already have a general device-event monitor mechanism, the failsafe driver, and a hot plug/unplug API in DPDK. We already have the solution of “ethdev event + kernel PMD hotplug handler + failsafe”, but we do not yet have an implementation of “eal event + hotplug handler for pci PMD + failsafe”, and we need to consider two different solutions for uio pci and vfio pci. In the case of hotplug for igb_uio, when a hardware device is removed physically or disabled in software, the application needs to be notified, detach the device from the bus, and then invalidate the device. The problem is that the removal of the device is not instantaneous in software. If the application data path tries to read from or write to the device while the removal is still in progress, it will cause an MMIO error and the application will crash. In this patch set, we propose a PCIe bus failure handler mechanism for hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs, the application will not crash. The mechanism should work as follows: First, the application enables the device event monitor, registers the hotplug event’s callback and enables hotplug handling before running the data path. Once the hot-unplug occurs, the mechanism will detect the removal event and then do the failure handling accordingly. In order to do that, the following functionality is required: - Add a new bus op “hot_unplug_handler” to handle hot-unplug failure. - Implement the PCI bus specific op “pci_hot_unplug_handler”. For uio pci, it remaps memory for the unplugged device, based on the address where the failure occurred. For vfio pci, a separate handler can be implemented case by case.
For the data path or other unexpected behaviors from the control path when a hot unplug occurs:
- Add a new bus ops “sigbus_handler”, which is responsible for handling the sigbus error, which is either an original memory error or a specific memory error caused by a hot unplug. When a sigbus error is captured, this function is called to handle it.
- Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate over all devices on the PCI bus to find which device encountered the failure.
- Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a bus to handle the failure.
- Add a couple of APIs “rte_dev_hotplug_handle_enable” and “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling. It will monitor the sigbus error by a handler which is per-process. Based on the signal event principle, the control path thread and the data path thread will randomly receive the sigbus error, but will call the common sigbus process. When a sigbus is captured, it will call the above API to find a bus to handle it.

The mechanism could be used by app or PMDs. For example, the whole process of hotplug in testpmd is:
- Enable device event monitor->Enable hotplug handle->Register event callback ->attach port->start port->start forwarding->Device unplug->failure handle ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insert and binding, and it only implements the igb_uio failure handler; the vfio hotplug failure handler will be in an upcoming patch set.

patchset history:
v12->v11: add and delete some checking about sigbus recover.
v11->v10: change the ops name, since both uio and vfio will use the hot-unplug ops. since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage. move the igb_uio fixing part, since it is a random issue and should be considered a kernel driver defect rather than part of this failure handler mechanism.
v10->v9: modify the api name and expose it for public use. add hotplug handle enable/disable APIs. refine commit log.
v9->v8: refine commit log to be more readable.
v8->v7: refine errno process in sigbus handler. refine igb uio release process.
v7->v6: delete some unused part.
v6->v5: refine some description about bus ops. refine commit log. add some entry check.
v5->v4: split patches to focus on the failure handle, remove the event usage by testpmd to another patch. change the hotplug failure handler name. refine the sigbus handle logic. add lock for udev state in igb uio driver.
v4->v3: split patches to be small and clear. change to use new parameter "--hotplug-mode" in testpmd to identify the eal hotplug and ethdev hotplug.
v3->v2: change bus ops name to bus_hotplug_handler. add new API and bus ops of bus_signal_handler to distinguish handling of generic sigbus and hotplug sigbus.
v2->v1(v21): refine some doc and commit log. fix igb uio kernel issue for control path fai
[dpdk-dev] [PATCH v12 4/7] bus/pci: implement sigbus handler ops
This patch implements the ops for the PCI bus sigbus handler. It finds the PCI device that is being hot-unplugged and calls the relevant ops of the hot-unplug handler to handle the hot-unplug failure of the device. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v12->v11: no change. --- drivers/bus/pci/pci_common.c | 53 1 file changed, 53 insertions(+) diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c index d286234..f313fe9 100644 --- a/drivers/bus/pci/pci_common.c +++ b/drivers/bus/pci/pci_common.c @@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp, return NULL; } +/** + * find the device which encounter the failure, by iterate over all device on + * PCI bus to check if the memory failure address is located in the range + * of the BARs of the device. + */ +static struct rte_pci_device * +pci_find_device_by_addr(const void *failure_addr) +{ + struct rte_pci_device *pdev = NULL; + int i; + + FOREACH_DEVICE_ON_PCIBUS(pdev) { + for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) { + if ((uint64_t)(uintptr_t)failure_addr >= + (uint64_t)(uintptr_t)pdev->mem_resource[i].addr && + (uint64_t)(uintptr_t)failure_addr < + (uint64_t)(uintptr_t)pdev->mem_resource[i].addr + + pdev->mem_resource[i].len) { + RTE_LOG(INFO, EAL, "Failure address " + "%16.16"PRIx64" belongs to " + "device %s!\n", + (uint64_t)(uintptr_t)failure_addr, + pdev->device.name); + return pdev; + } + } + } + return NULL; +} + static int pci_hot_unplug_handler(struct rte_device *dev) { @@ -433,6 +463,28 @@ pci_hot_unplug_handler(struct rte_device *dev) } static int +pci_sigbus_handler(const void *failure_addr) +{ + struct rte_pci_device *pdev = NULL; + int ret = 0; + + pdev = pci_find_device_by_addr(failure_addr); + if (!pdev) { + /* It is a generic sigbus error, no bus would handle it. */ + ret = 1; + } else { + /* The sigbus error is caused of hot-unplug. 
*/ + ret = pci_hot_unplug_handler(&pdev->device); + if (ret) { + RTE_LOG(ERR, EAL, "Failed to handle hot-unplug for " + "device %s", pdev->name); + ret = -1; + } + } + return ret; +} + +static int pci_plug(struct rte_device *dev) { return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev)); @@ -463,6 +515,7 @@ struct rte_pci_bus rte_pci_bus = { .parse = pci_parse, .get_iommu_class = rte_pci_get_iommu_class, .hot_unplug_handler = pci_hot_unplug_handler, + .sigbus_handler = pci_sigbus_handler, }, .device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list), .driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list), -- 2.7.4
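The address lookup in pci_find_device_by_addr above can be sketched outside of DPDK as follows. This is only an illustration under stated assumptions: `sketch_bar`, `sketch_dev` and `sketch_find_dev_by_addr` are hypothetical stand-ins for the real `mem_resource` array and `rte_pci_device`, and the device list is a plain array rather than the bus's TAILQ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_BARS 6

/* Hypothetical stand-in for one mapped BAR of a PCI device. */
struct sketch_bar {
	void *addr;   /* start of the mapping, NULL if the BAR is unused */
	uint64_t len; /* length of the mapping in bytes */
};

/* Hypothetical stand-in for struct rte_pci_device. */
struct sketch_dev {
	const char *name;
	struct sketch_bar bar[SKETCH_MAX_BARS];
};

/*
 * Return the device whose BAR range [addr, addr + len) contains the
 * faulting address, or NULL if no device owns it (a generic sigbus).
 */
static const struct sketch_dev *
sketch_find_dev_by_addr(const struct sketch_dev *devs, size_t ndevs,
			const void *failure_addr)
{
	uintptr_t fault = (uintptr_t)failure_addr;
	size_t i, b;

	for (i = 0; i < ndevs; i++) {
		for (b = 0; b < SKETCH_MAX_BARS; b++) {
			uintptr_t start = (uintptr_t)devs[i].bar[b].addr;
			uint64_t len = devs[i].bar[b].len;

			if (len == 0)
				continue; /* skip empty BAR */
			/* "fault - start < len" avoids overflow in start + len */
			if (fault >= start && fault - start < len)
				return &devs[i];
		}
	}
	return NULL;
}
```

A caller would treat a NULL result as "not a hot-unplug fault" and leave the signal to whatever generic handling the process has.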
[dpdk-dev] [PATCH v12 3/7] bus: add sigbus handler
When a device is hot-unplugged, a sigbus error will occur if the datapath can still read/write to the device. A handler is required here to capture the sigbus signal and handle it appropriately. This patch introduces a bus ops to handle sigbus errors. Each bus can implement its own case-dependent logic to handle the sigbus errors. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v12->v11: no change. --- lib/librte_eal/common/include/rte_bus.h | 18 ++ 1 file changed, 18 insertions(+) diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h index 1bb53dc..201454a 100644 --- a/lib/librte_eal/common/include/rte_bus.h +++ b/lib/librte_eal/common/include/rte_bus.h @@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr); typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev); /** + * Implement a specific sigbus handler, which is responsible for handling + * the sigbus error which is either original memory error, or specific memory + * error that caused of device be hot-unplugged. When sigbus error be captured, + * it could call this function to handle sigbus error. + * @param failure_addr + * Pointer of the fault address of the sigbus error. + * + * @return + * 0 for success handle the sigbus. + * 1 for no bus handle the sigbus. + * -1 for failed to handle the sigbus + */ +typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr); + +/** * Bus scan policies */ enum rte_bus_scan_mode { @@ -228,6 +243,9 @@ struct rte_bus { rte_dev_iterate_t dev_iterate; /**< Device iterator. */ rte_bus_hot_unplug_handler_t hot_unplug_handler; /**< handle hot-unplug failure on the bus */ + rte_bus_sigbus_handler_t sigbus_handler; + /**< handle sigbus error on the bus */ + }; /** -- 2.7.4
[dpdk-dev] [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug
The mechanism can initially register the sigbus handler after the device event monitor is enabled. When a sigbus event is captured, it will check the failure address and accordingly handle the memory failure of the corresponding device by invoke the hot-unplug handler. It could prevent the application from crashing when a device is hot-unplugged. By this patch, users could call below new added APIs to enable/disable the device hotplug handle mechanism. Note that it just implement the hot-unplug handler in these functions, the other handler of hotplug, such as handler for hotplug binding, could be add in the future if need: - rte_dev_hotplug_handle_enable - rte_dev_hotplug_handle_disable Signed-off-by: Jeff Guo --- v12->v11: add and delete some checking about sigbus recover. --- doc/guides/rel_notes/release_18_08.rst | 5 + lib/librte_eal/bsdapp/eal/eal_dev.c | 14 +++ lib/librte_eal/common/eal_private.h | 26 + lib/librte_eal/common/include/rte_dev.h | 26 + lib/librte_eal/linuxapp/eal/eal_dev.c | 164 +++- lib/librte_eal/rte_eal_version.map | 2 + 6 files changed, 236 insertions(+), 1 deletion(-) diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst index 321fa84..fe0e60f 100644 --- a/doc/guides/rel_notes/release_18_08.rst +++ b/doc/guides/rel_notes/release_18_08.rst @@ -117,6 +117,11 @@ New Features Added support for chained mbufs (input and output). +* **Added hot-unplug handle mechanism.** + + ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are + for enabling or disabling hotplug handle mechanism. 
+ API Changes --- diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c index 1c6c51b..255d611 100644 --- a/lib/librte_eal/bsdapp/eal/eal_dev.c +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c @@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void) RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n"); return -1; } + +int __rte_experimental +rte_dev_hotplug_handle_enable(void) +{ + RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n"); + return -1; +} + +int __rte_experimental +rte_dev_hotplug_handle_disable(void) +{ + RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n"); + return -1; +} diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index a2d1528..637f20d 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs, */ int rte_bus_sigbus_handler(const void *failure_addr); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Register the sigbus handler. + * + * @return + * - On success, zero. + * - On failure, a negative value. + */ +int +rte_dev_sigbus_handler_register(void); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Unregister the sigbus handler. + * + * @return + * - On success, zero. + * - On failure, a negative value. + */ +int +rte_dev_sigbus_handler_unregister(void); + #endif /* _EAL_PRIVATE_H_ */ diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h index b80a805..ff580a0 100644 --- a/lib/librte_eal/common/include/rte_dev.h +++ b/lib/librte_eal/common/include/rte_dev.h @@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void); int __rte_experimental rte_dev_event_monitor_stop(void); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Enable hotplug handling for devices. 
+ * + * @return + * - On success, zero. + * - On failure, a negative value. + */ +int __rte_experimental +rte_dev_hotplug_handle_enable(void); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Disable hotplug handling for devices. + * + * @return + * - On success, zero. + * - On failure, a negative value. + */ +int __rte_experimental +rte_dev_hotplug_handle_disable(void); + #endif /* _RTE_DEV_H_ */ diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c index 1cf6aeb..72fc033 100644 --- a/lib/librte_eal/linuxapp/eal/eal_dev.c +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c @@ -4,6 +4,8 @@ #include #include +#include +#include #include #include @@ -14,15 +16,32 @@ #include #include #include +#include +#include +#include +#include #include "eal_private.h" static struct rte_intr_handle intr_handle = {.fd = -1 }; static bool monitor_started; +static bool hotplug_handle; #define EAL_UEV_MSG_LEN 4096 #define EAL_UEV_MSG_ELEM_LEN 128 +/* + * spinlock for device hot-unplug failure handling. If it try to access bus or + * device, such as handle sigbus on bus or handle memory failure for device + * just need to use this lock. It could protect the bus and the device to
[dpdk-dev] [PATCH v12 5/7] bus: add helper to handle sigbus
This patch aims to add a helper to iterate over all buses to find the relevant bus to handle the sigbus error. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v12->v11: no change. --- lib/librte_eal/common/eal_common_bus.c | 43 ++ lib/librte_eal/common/eal_private.h| 13 ++ 2 files changed, 56 insertions(+) diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c index 0943851..62b7318 100644 --- a/lib/librte_eal/common/eal_common_bus.c +++ b/lib/librte_eal/common/eal_common_bus.c @@ -37,6 +37,7 @@ #include #include #include +#include #include "eal_private.h" @@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void) } return mode; } + +static int +bus_handle_sigbus(const struct rte_bus *bus, + const void *failure_addr) +{ + int ret; + + if (!bus->sigbus_handler) + return -1; + + ret = bus->sigbus_handler(failure_addr); + + /* find bus but handle failed, keep the errno be set. */ + if (ret < 0 && rte_errno == 0) + rte_errno = ENOTSUP; + + return ret > 0; +} + +int +rte_bus_sigbus_handler(const void *failure_addr) +{ + struct rte_bus *bus; + + int ret = 0; + int old_errno = rte_errno; + + rte_errno = 0; + + bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr); + /* can not find bus. */ + if (!bus) + return 1; + /* find bus but handle failed, pass on the new errno. */ + else if (rte_errno != 0) + return -1; + + /* restore the old errno. */ + rte_errno = old_errno; + + return ret; +} diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 4f809a8..a2d1528 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -304,4 +304,17 @@ int rte_devargs_layers_parse(struct rte_devargs *devargs, const char *devstr); +/** + * Iterate over all buses to find the corresponding bus to handle the sigbus + * error. + * @param failure_addr + * Pointer of the fault address of the sigbus error. + * + * @return + * 0 success to handle the sigbus. 
+ * -1 failed to handle the sigbus + * 1 no bus can handler the sigbus + */ +int rte_bus_sigbus_handler(const void *failure_addr); + #endif /* _EAL_PRIVATE_H_ */ -- 2.7.4
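The three-way return convention of rte_bus_sigbus_handler (0 handled, 1 no bus claims the address, -1 a bus claimed it but failed) can be illustrated with a small stand-alone sketch. Everything here is an assumption for illustration: `sketch_bus`, the two example handlers and the array-based bus list are hypothetical, and the sketch returns the status directly instead of going through rte_bus_find and rte_errno as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*sketch_sigbus_handler_t)(const void *failure_addr);

/* Hypothetical stand-in for struct rte_bus with only the new ops. */
struct sketch_bus {
	const char *name;
	sketch_sigbus_handler_t sigbus_handler; /* may be NULL */
};

/* Two fake "BAR" regions owned by the example buses below. */
static char sketch_good_bar[64];
static char sketch_bad_bar[64];

static int
sketch_in_range(const void *addr, const char *base, size_t len)
{
	uintptr_t a = (uintptr_t)addr, b = (uintptr_t)base;

	return a >= b && a - b < len;
}

/* Example bus that owns sketch_good_bar and recovers successfully. */
static int
sketch_bus_ok(const void *addr)
{
	return sketch_in_range(addr, sketch_good_bar,
			       sizeof(sketch_good_bar)) ? 0 : 1;
}

/* Example bus that owns sketch_bad_bar but fails to recover. */
static int
sketch_bus_fail(const void *addr)
{
	return sketch_in_range(addr, sketch_bad_bar,
			       sizeof(sketch_bad_bar)) ? -1 : 1;
}

/*
 * Walk all buses until one claims the address.
 * Returns 0 if a bus handled it, -1 if a bus claimed it but failed,
 * and 1 if no bus recognised the address (generic sigbus).
 */
static int
sketch_bus_sigbus_handler(const struct sketch_bus *buses, size_t n,
			  const void *failure_addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue; /* bus does not implement the ops */
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret == 1)
			continue; /* not this bus, keep looking */
		return ret < 0 ? -1 : 0;
	}
	return 1;
}
```

The value 1 matters to the caller: it is the signal to fall back to the original sigbus disposition rather than to treat the fault as a hot-unplug.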
[dpdk-dev] [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism
This patch uses testpmd as an example to show how an app can smoothly handle the failure when a device is hot-unplugged. Besides enabling the device event monitor and registering the hotplug event’s callback, the app also needs to enable the hotplug handle mechanism before running. Once the app detects the removal event, the hot-unplug callback is called. It will first stop the packet forwarding, then stop the port, close the port, and finally detach the port to clean the device and release the resources. Signed-off-by: Jeff Guo --- v12->v11: no change. --- app/test-pmd/testpmd.c | 39 +++ 1 file changed, 31 insertions(+), 8 deletions(-) diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 001f0e5..bfef483 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -2093,14 +2093,22 @@ pmd_test_exit(void) if (hot_plug) { ret = rte_dev_event_monitor_stop(); - if (ret) + if (ret) { RTE_LOG(ERR, EAL, "fail to stop device event monitor."); + return; + } ret = eth_dev_event_callback_unregister(); if (ret) + return; + + ret = rte_dev_hotplug_handle_disable(); + if (ret) { RTE_LOG(ERR, EAL, - "fail to unregister all event callbacks."); + "fail to disable hotplug handling."); + return; + } } printf("\nBye...\n"); @@ -2244,6 +2252,9 @@ static void eth_dev_event_callback(char *device_name, enum rte_dev_event_type type, __rte_unused void *arg) { + uint16_t port_id; + int ret; + if (type >= RTE_DEV_EVENT_MAX) { fprintf(stderr, "%s called upon invalid event %d\n", __func__, type); @@ -2254,9 +2265,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type, case RTE_DEV_EVENT_REMOVE: RTE_LOG(ERR, EAL, "The device: %s has been removed!\n", device_name); - /* TODO: After finish failure handle, begin to stop -* packet forward, stop port, close port, detach port. 
-*/ + ret = rte_eth_dev_get_port_by_name(device_name, &port_id); + if (ret) { + printf("can not get port by device %s!\n", device_name); + return; + } + rmv_event_callback((void *)(intptr_t)port_id); break; case RTE_DEV_EVENT_ADD: RTE_LOG(ERR, EAL, "The device: %s has been added!\n", @@ -2779,14 +2793,23 @@ main(int argc, char** argv) init_config(); if (hot_plug) { - /* enable hot plug monitoring */ + ret = rte_dev_hotplug_handle_enable(); + if (ret) { + RTE_LOG(ERR, EAL, + "fail to enable hotplug handling."); + return -1; + } + ret = rte_dev_event_monitor_start(); if (ret) { - rte_errno = EINVAL; + RTE_LOG(ERR, EAL, + "fail to start device event monitoring."); return -1; } - eth_dev_event_callback_register(); + ret = eth_dev_event_callback_register(); + if (ret) + return -1; } if (start_port(RTE_PORT_ALL) != 0) -- 2.7.4
[dpdk-dev] [PATCH v12 1/7] bus: add hot-unplug handler
A hot-unplug failure and application crash can occur when a device is hot-unplugged but the application still tries to access the device by reading or writing its BARs, which are already invalid but have not yet been unmapped or released. This patch introduces bus ops to handle hot-unplug failures. Each bus can implement its own case-dependent logic to handle the failures. Signed-off-by: Jeff Guo --- v12->v11: no change. --- lib/librte_eal/common/include/rte_bus.h | 16 1 file changed, 16 insertions(+) diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h index b7b5b08..1bb53dc 100644 --- a/lib/librte_eal/common/include/rte_bus.h +++ b/lib/librte_eal/common/include/rte_bus.h @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev); typedef int (*rte_bus_parse_t)(const char *name, void *addr); /** + * Implement a specific hot-unplug handler, which is responsible for + * handle the failure when device be hot-unplugged. When the event of + * hot-unplug be detected, it could call this function to handle + * the hot-unplug failure and avoid app crash. + * @param dev + * Pointer of the device structure. + * + * @return + * 0 on success. + * !0 on error. + */ +typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev); + +/** * Bus scan policies */ enum rte_bus_scan_mode { @@ -212,6 +226,8 @@ struct rte_bus { struct rte_bus_conf conf;/**< Bus configuration */ rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */ rte_dev_iterate_t dev_iterate; /**< Device iterator. */ + rte_bus_hot_unplug_handler_t hot_unplug_handler; + /**< handle hot-unplug failure on the bus */ }; /** -- 2.7.4
[dpdk-dev] [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops
This patch implements the ops to handle hot-unplug on the PCI bus. For UIO PCI, it avoids BAR read/write errors by remapping the memory where the failure occurred onto new dummy memory. For VFIO or other kernel drivers, a specific function could be implemented to handle hot-unplug case by case. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v12->v11: no change. --- drivers/bus/pci/pci_common.c | 28 drivers/bus/pci/pci_common_uio.c | 33 + drivers/bus/pci/private.h| 12 3 files changed, 73 insertions(+) diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c index 7736b3f..d286234 100644 --- a/drivers/bus/pci/pci_common.c +++ b/drivers/bus/pci/pci_common.c @@ -406,6 +406,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp, } static int +pci_hot_unplug_handler(struct rte_device *dev) +{ + struct rte_pci_device *pdev = NULL; + int ret = 0; + + pdev = RTE_DEV_TO_PCI(dev); + if (!pdev) + return -1; + + switch (pdev->kdrv) { + case RTE_KDRV_IGB_UIO: + case RTE_KDRV_UIO_GENERIC: + case RTE_KDRV_NIC_UIO: + /* BARs resource is invalid, remap it to be safe. 
*/ + ret = pci_uio_remap_resource(pdev); + break; + default: + RTE_LOG(DEBUG, EAL, + "Not managed by a supported kernel driver, skipped\n"); + ret = -1; + break; + } + + return ret; +} + +static int pci_plug(struct rte_device *dev) { return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev)); @@ -435,6 +462,7 @@ struct rte_pci_bus rte_pci_bus = { .unplug = pci_unplug, .parse = pci_parse, .get_iommu_class = rte_pci_get_iommu_class, + .hot_unplug_handler = pci_hot_unplug_handler, }, .device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list), .driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list), diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c index 54bc20b..7ea73db 100644 --- a/drivers/bus/pci/pci_common_uio.c +++ b/drivers/bus/pci/pci_common_uio.c @@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res) } } +/* remap the PCI resource of a PCI device in anonymous virtual memory */ +int +pci_uio_remap_resource(struct rte_pci_device *dev) +{ + int i; + void *map_address; + + if (dev == NULL) + return -1; + + /* Remap all BARs */ + for (i = 0; i != PCI_MAX_RESOURCE; i++) { + /* skip empty BAR */ + if (dev->mem_resource[i].phys_addr == 0) + continue; + map_address = mmap(dev->mem_resource[i].addr, + (size_t)dev->mem_resource[i].len, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0); + if (map_address == MAP_FAILED) { + RTE_LOG(ERR, EAL, + "Cannot remap resource for device %s\n", + dev->name); + return -1; + } + RTE_LOG(INFO, EAL, + "Successful remap resource for device %s\n", + dev->name); + } + + return 0; +} + static struct mapped_pci_resource * pci_uio_find_resource(struct rte_pci_device *dev) { diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h index 8ddd03e..6b312e5 100644 --- a/drivers/bus/pci/private.h +++ b/drivers/bus/pci/private.h @@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev, struct mapped_pci_resource *uio_res); /** + * Remap the PCI 
resource of a PCI device in anonymous virtual memory. + * + * @param dev + * Point to the struct rte pci device. + * @return + * - On success, zero. + * - On failure, a negative value. + */ +int +pci_uio_remap_resource(struct rte_pci_device *dev); + +/** * Map device memory to uio resource * * This function is private to EAL. -- 2.7.4
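The remap idea behind pci_uio_remap_resource can be tried without any hardware. The sketch below is an assumption-laden illustration, not the patch's code: a temporary file stands in for a BAR mapping, truncating the file stands in for the device disappearing, and all `sketch_*` names are hypothetical. It is Linux-oriented, and it calls mmap from a signal handler, which is not on the POSIX async-signal-safe list even though it is a plain system call in practice.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *sketch_region;        /* fake "BAR" mapping */
static size_t sketch_region_len;
static volatile sig_atomic_t sketch_remapped;

/* SIGBUS action: if the fault is inside the fake BAR, overlay it with
 * anonymous memory at the same virtual address so the access can retry. */
static void
sketch_sigbus_action(int sig, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	uintptr_t start = (uintptr_t)sketch_region;

	(void)sig;
	(void)ctx;
	if (fault < start || fault - start >= sketch_region_len)
		_exit(1); /* generic sigbus, not ours to handle */
	if (mmap(sketch_region, sketch_region_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		_exit(2);
	sketch_remapped = 1;
}

/* Returns 0 if the process survived the simulated unplug, -1 otherwise. */
static int
sketch_run_demo(void)
{
	char path[] = "/tmp/sketch_bar_XXXXXX";
	struct sigaction sa;
	volatile char *p;
	char value;
	long page = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);

	if (fd < 0 || page <= 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, page) != 0)
		return -1;
	sketch_region_len = (size_t)page;
	sketch_region = mmap(NULL, sketch_region_len,
			     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (sketch_region == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sketch_sigbus_action;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, NULL) != 0)
		return -1;

	p = sketch_region;
	p[0] = 42;                 /* works while the backing exists */
	if (ftruncate(fd, 0) != 0) /* simulate the unplug */
		return -1;
	value = p[0];              /* faults, handler remaps, read retries */

	munmap(sketch_region, sketch_region_len);
	close(fd);
	return (sketch_remapped && value == 0) ? 0 : -1;
}
```

After the remap the data path reads zero-filled dummy memory instead of device registers, which is why the application still has to stop and detach the port once the removal event arrives.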
[dpdk-dev] [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug
This mechanism registers the sigbus handler after the device event
monitor is enabled. When a sigbus event is captured, it checks the
failure address and handles the memory failure of the corresponding
device by invoking the hot-unplug handler. This can prevent the
application from crashing when a device is hot-unplugged.

With this patch, users can call the newly added APIs below to enable or
disable the device hotplug handle mechanism. Note that these functions
only implement the hot-unplug handler; other hotplug handlers, such as a
handler for hotplug binding, could be added in the future if needed:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable

Signed-off-by: Jeff Guo
---
v12->v11:
add and delete some checking about sigbus recover.
---
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_private.h     |  26 +
 lib/librte_eal/common/include/rte_dev.h |  26 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 164 +++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 6 files changed, 236 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 321fa84..fe0e60f 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -117,6 +117,11 @@ New Features

   Added support for chained mbufs (input and output).

+* **Added hot-unplug handle mechanism.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
+  for enabling or disabling hotplug handle mechanism.
+

 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..255d611 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index a2d1528..637f20d 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);

+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_register(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);

+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..72fc033 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@

 #include
 #include
+#include
+#include
 #include
 #include

@@ -14,15 +16,32 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include

 #include "eal_private.h"

 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;

 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128

+/*
+ * spinlock for device hot-unplug failure handling. If it try to access bus or
+ * device, such as handle sigbus on bus or handle memory failure for device
+ * just need to use this lock. It could protect the bus and the device to
[dpdk-dev] [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism
This patch uses testpmd as an example to show how an application can
smoothly handle failure when a device is hot-unplugged. Besides
enabling the device event monitor and registering the hotplug event
callback, the application also needs to enable the hotplug handle
mechanism before running. Once the application detects the removal
event, the hot-unplug callback is called. It first stops packet
forwarding, then stops the port, closes the port, and finally detaches
the port to clean up the device and release its resources.

Signed-off-by: Jeff Guo
---
v12->v11:
no change.
---
 app/test-pmd/testpmd.c | 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e5..bfef483 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2093,14 +2093,22 @@ pmd_test_exit(void)

 	if (hot_plug) {
 		ret = rte_dev_event_monitor_stop();
-		if (ret)
+		if (ret) {
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
+			return;
+		}

 		ret = eth_dev_event_callback_unregister();
 		if (ret)
+			return;
+
+		ret = rte_dev_hotplug_handle_disable();
+		if (ret) {
 			RTE_LOG(ERR, EAL,
-				"fail to unregister all event callbacks.");
+				"fail to disable hotplug handling.");
+			return;
+		}
 	}

 	printf("\nBye...\n");
@@ -2244,6 +2252,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2254,9 +2265,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			printf("can not get port by device %s!\n", device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2779,14 +2793,23 @@ main(int argc, char** argv)
 	init_config();

 	if (hot_plug) {
-		/* enable hot plug monitoring */
+		ret = rte_dev_hotplug_handle_enable();
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to enable hotplug handling.");
+			return -1;
+		}
+
 		ret = rte_dev_event_monitor_start();
 		if (ret) {
-			rte_errno = EINVAL;
+			RTE_LOG(ERR, EAL,
+				"fail to start device event monitoring.");
 			return -1;
 		}
-		eth_dev_event_callback_register();

+		ret = eth_dev_event_callback_register();
+		if (ret)
+			return -1;
 	}

 	if (start_port(RTE_PORT_ALL) != 0)
--
2.7.4
[dpdk-dev] [PATCH v12 4/7] bus/pci: implement sigbus handler ops
This patch implements the ops for the PCI bus sigbus handler. It finds
the PCI device that is being hot-unplugged and calls the relevant ops of
the hot-unplug handler to handle the hot-unplug failure of the device.

Signed-off-by: Jeff Guo
Acked-by: Shaopeng He
---
v12->v11:
no change.
---
 drivers/bus/pci/pci_common.c | 53
 1 file changed, 53 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d286234..f313fe9 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }

+/**
+ * Find the device which encountered the failure by iterating over all devices
+ * on the PCI bus to check if the memory failure address is located in the
+ * range of the BARs of the device.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hot_unplug_handler(struct rte_device *dev)
 {
@@ -433,6 +463,28 @@ pci_hot_unplug_handler(struct rte_device *dev)
 }

 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot-unplug. */
+		ret = pci_hot_unplug_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot-unplug for "
+				"device %s", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -463,6 +515,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hot_unplug_handler = pci_hot_unplug_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
--
2.7.4
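The address lookup in pci_find_device_by_addr() above comes down to a half-open range check of the failure address against each BAR. Below is a minimal standalone sketch of that check, assuming hypothetical mock_mem_resource and mock_pci_device types in place of the real rte_pci_device, and a plain array in place of FOREACH_DEVICE_ON_PCIBUS. As a small variation on the patch, the comparison uses a subtraction so that the start plus length sum cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_BARS 6	/* assumption: six BARs per device */

/* Hypothetical stand-ins for the DPDK resource and device structures. */
struct mock_mem_resource {
	void *addr;	/* start of the mapped BAR */
	uint64_t len;	/* length of the mapping, 0 if unused */
};

struct mock_pci_device {
	const char *name;
	struct mock_mem_resource mem_resource[MOCK_MAX_BARS];
};

/* Non-zero if failure_addr lies in [addr, addr + len). */
static int
addr_in_resource(const struct mock_mem_resource *res, const void *failure_addr)
{
	uint64_t fa = (uint64_t)(uintptr_t)failure_addr;
	uint64_t start = (uint64_t)(uintptr_t)res->addr;

	return fa >= start && fa - start < res->len;
}

/* Return the device owning failure_addr, or NULL if no BAR contains it. */
static const struct mock_pci_device *
find_device_by_addr(const struct mock_pci_device *devs, size_t nb_devs,
		    const void *failure_addr)
{
	size_t i, j;

	assert(devs != NULL || nb_devs == 0);

	for (i = 0; i < nb_devs; i++)
		for (j = 0; j < MOCK_MAX_BARS; j++)
			if (addr_in_resource(&devs[i].mem_resource[j],
					     failure_addr))
				return &devs[i];

	return NULL;
}
```

A NULL result corresponds to the "generic sigbus error" branch of pci_sigbus_handler(), where the bus reports that the address is not its own.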
[dpdk-dev] [PATCH v2 2/4] eal: modify device event callback process func
This patch renames the device event callback process function to be more
explicit and makes its device name parameter const. In addition, because
the callback will be used not only by the eal device helper but also by
the vfio bus to handle hot-unplug, the API is exposed outside of the
private eal. Bus drivers and the eal device code can then use this API
directly to process device event callbacks.

Signed-off-by: Jeff Guo
---
modify commit log to be more clear
---
 app/test-pmd/testpmd.c                  |  4 ++--
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  8
 lib/librte_eal/common/eal_common_dev.c  |  5 +++--
 lib/librte_eal/common/eal_private.h     | 12
 lib/librte_eal/common/include/rte_dev.h | 18 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  2 +-
 lib/librte_eal/rte_eal_version.map      |  1 +
 7 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index bfef483..1313100 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -431,7 +431,7 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
-static void eth_dev_event_callback(char *device_name,
+static void eth_dev_event_callback(const char *device_name,
 				enum rte_dev_event_type type,
 				void *param);
 static int eth_dev_event_callback_register(void);
@@ -2249,7 +2249,7 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,

 /* This function is used by the interrupt thread */
 static void
-eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+eth_dev_event_callback(const char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
 	uint16_t port_id;
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 255d611..3a3a2a5 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -33,3 +33,11 @@ rte_dev_hotplug_handle_disable(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+void __rte_experimental
+rte_dev_event_callback_process(const char *device_name,
+			       enum rte_dev_event_type event)
+{
+	RTE_LOG(ERR, EAL,
+		"Device event callback process is not supported for FreeBSD.\n");
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 678dbca..2d610a4 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -342,8 +342,9 @@ rte_dev_event_callback_unregister(const char *device_name,
 	return ret;
 }

-void
-dev_callback_process(char *device_name, enum rte_dev_event_type event)
+void __rte_experimental
+rte_dev_event_callback_process(const char *device_name,
+			       enum rte_dev_event_type event)
 {
 	struct dev_event_callback *cb_lst;

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 637f20d..47e8a33 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -259,18 +259,6 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
 int rte_mp_channel_init(void);

 /**
- * Internal Executes all the user application registered callbacks for
- * the specific device. It is for DPDK internal user only. User
- * application should not call it directly.
- *
- * @param device_name
- *  The device name.
- * @param event
- *  the device event type.
- */
-void dev_callback_process(char *device_name, enum rte_dev_event_type event);
-
-/**
  * @internal
  * Parse a device string and store its information in an
  * rte_devargs structure.
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index ff580a0..58fab43 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -39,7 +39,7 @@ struct rte_dev_event {
 	char *devname;			/**< device name */
 };

-typedef void (*rte_dev_event_cb_fn)(char *device_name,
+typedef void (*rte_dev_event_cb_fn)(const char *device_name,
 					enum rte_dev_event_type event,
 					void *cb_arg);

@@ -438,6 +438,22 @@ rte_dev_event_callback_unregister(const char *device_name,
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
+ * Executes all the user application registered callbacks for
+ * the specific device.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type.
+ */
+void __rte_experimental
+rte_dev_event_callback_process(const char *device_name,
+				 enum rte
[dpdk-dev] [PATCH v2 1/4] eal: add a new req notifier to eal interrupt
Add a new req notifier to the eal interrupt code to enable vfio hotplug.

Signed-off-by: Jeff Guo
---
v3->v2:
change some code style to be consistent.
---
 lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 71 ++
 2 files changed, 72 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 6eb4932..5204ed4 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -35,6 +35,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
 	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
+	RTE_INTR_HANDLE_VFIO_REQ,     /**< vfio device handle (req) */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 4076c6d..7f611b3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -308,6 +308,64 @@ vfio_disable_msix(const struct rte_intr_handle *intr_handle) {

 	return ret;
 }
+
+/* enable req notifier */
+static int
+vfio_enable_req(const struct rte_intr_handle *intr_handle)
+{
+	int len, ret;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	struct vfio_irq_set *irq_set;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+			 VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_REQ_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling req interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	return 0;
+}
+
+/* disable req notifier */
+static int
+vfio_disable_req(const struct rte_intr_handle *intr_handle)
+{
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_REQ_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret)
+		RTE_LOG(ERR, EAL, "Error disabling req interrupts for fd %d\n",
+			intr_handle->fd);
+
+	return ret;
+}
 #endif

 static int
@@ -556,6 +614,10 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle)
 		if (vfio_enable_intx(intr_handle))
 			return -1;
 		break;
+	case RTE_INTR_HANDLE_VFIO_REQ:
+		if (vfio_enable_req(intr_handle))
+			return -1;
+		break;
 #endif
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_DEV_EVENT:
@@ -606,6 +668,11 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 		if (vfio_disable_intx(intr_handle))
 			return -1;
 		break;
+	case RTE_INTR_HANDLE_VFIO_REQ:
+		if (vfio_disable_req(intr_handle))
+			return -1;
+		break;
+
 #endif
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_DEV_EVENT:
@@ -682,6 +749,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
+		case RTE_INTR_HANDLE_VFIO_REQ:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
--
2.7.4
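The two helpers above differ mainly in how they fill struct vfio_irq_set: the enable path carries one eventfd in the data area, while the disable path sends a bare header with count 0. The following standalone sketch only builds those two requests, assuming the Linux <linux/vfio.h> UAPI header is available. It never issues the VFIO_DEVICE_SET_IRQS ioctl, and the names req_irq_set_buf, req_irq_set_enable and req_irq_set_disable are hypothetical. A union is used instead of the patch's char buffer so that the header is suitably aligned.

```c
#include <assert.h>
#include <linux/vfio.h>
#include <string.h>

/* Room for the header plus one eventfd. */
#define REQ_IRQ_SET_LEN (sizeof(struct vfio_irq_set) + sizeof(int))

/* Union so the byte buffer is aligned for struct vfio_irq_set. */
union req_irq_set_buf {
	struct vfio_irq_set hdr;
	char raw[REQ_IRQ_SET_LEN];
};

/* Fill a request that attaches event_fd as the trigger of the req IRQ. */
static void
req_irq_set_enable(union req_irq_set_buf *buf, int event_fd)
{
	struct vfio_irq_set *irq_set = &buf->hdr;

	assert(buf != NULL);
	memset(buf, 0, sizeof(*buf));
	irq_set->argsz = sizeof(buf->raw);
	irq_set->count = 1;
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
			 VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = VFIO_PCI_REQ_IRQ_INDEX;
	irq_set->start = 0;
	/* the eventfd travels in the flexible data[] area */
	memcpy(irq_set->data, &event_fd, sizeof(event_fd));
}

/* Fill a request that detaches the trigger: header only, no payload. */
static void
req_irq_set_disable(union req_irq_set_buf *buf)
{
	struct vfio_irq_set *irq_set = &buf->hdr;

	assert(buf != NULL);
	memset(buf, 0, sizeof(*buf));
	irq_set->argsz = sizeof(struct vfio_irq_set);
	irq_set->count = 0;
	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = VFIO_PCI_REQ_IRQ_INDEX;
	irq_set->start = 0;
}
```

In a real caller the filled buffer would be passed to ioctl() on the vfio device fd, as the patch does; that step is left out here because it needs an actual vfio device.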
[dpdk-dev] [PATCH v2 0/4] Enable hotplug in vfio
As we may know, the hotplug process differs between igb_uio and vfio.
For igb_uio, the uevent notification and the memory failure handle
mechanism can be used for hot-unplug. For vfio, when a device is
hot-unplugged, the uevent cannot be detected immediately, because the
vfio kernel module uses a special mechanism to guarantee that the pci
device is not deleted until user space releases its resources. It
therefore first uses another event, the "req notifier", to notify user
space to release resources for hotplug.

This patchset adds a new req notifier interrupt type to the eal
interrupt code and adds a new interrupt handler to the pci device to
handle the req device event. When the req notifier is detected, it can
trigger the device event callback process to handle the hot-unplug.
With this mechanism, hotplug can be enabled in vfio.

patchset history:
v3->v2:
change some commit log and coding style and typo.

v2->v1:
change the rte_dev_event_callback_process from internal to external api
for bus or app usage.
change some code logic.

Jeff Guo (4):
  eal: add a new req notifier to eal interrupt
  eal: modify device event callback process func
  pci: add req handler field to generic pci device
  vfio: enable vfio hotplug by req notifier handler

 app/test-pmd/testpmd.c                             |   4 +-
 drivers/bus/pci/linux/pci_vfio.c                   | 111 +
 drivers/bus/pci/pci_common.c                       |  10 ++
 drivers/bus/pci/rte_bus_pci.h                      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |   8 ++
 lib/librte_eal/common/eal_common_dev.c             |   5 +-
 lib/librte_eal/common/eal_private.h                |  12 ---
 lib/librte_eal/common/include/rte_dev.h            |  18 +++-
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              |   2 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  71 +
 lib/librte_eal/rte_eal_version.map                 |   1 +
 12 files changed, 226 insertions(+), 18 deletions(-)

--
2.7.4
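The notification path described in this cover letter can be illustrated with a minimal Linux-only sketch built on eventfd(2). It is a simplification and not how the EAL interrupt thread works: a user-space write() stands in for the vfio kernel module raising the request, a single non-blocking read() stands in for the epoll loop, and all req_notifier_* names and the mark_released callback are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

typedef void (*req_cb_t)(void *arg);

/* Create the eventfd that would be handed to vfio as the req trigger. */
static int
req_notifier_create(void)
{
	return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Stand-in for the kernel signalling "please release this device". */
static int
req_notifier_signal(int fd)
{
	uint64_t one = 1;

	return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Check the eventfd once. Returns 1 if a request was pending and the
 * callback ran, 0 if nothing was pending, -1 on error.
 */
static int
req_notifier_poll(int fd, req_cb_t cb, void *arg)
{
	uint64_t count;
	ssize_t n;

	assert(cb != NULL);

	n = read(fd, &count, sizeof(count));
	if (n == (ssize_t)sizeof(count)) {
		cb(arg);	/* e.g. run the hot-unplug handling */
		return 1;
	}
	if (n < 0 && errno == EAGAIN)
		return 0;	/* counter is zero: no request yet */

	return -1;
}

/* Sample callback: records that resources were released. */
static void
mark_released(void *arg)
{
	*(int *)arg = 1;
}
```

In the patchset itself the callback role is played by the req handler registered through rte_intr_callback_register(), which then invokes the bus hot-unplug handler.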
[dpdk-dev] [PATCH v2 3/4] pci: add req handler field to generic pci device
Besides the existing interrupts, a vfio pci device has some extended
interrupt types, such as the err and req notifiers, which can be useful
for device error monitoring. The corresponding interrupt handlers differ
from the other interrupt handlers that are registered in PMDs, so a new
interrupt handler should be added. This patch adds a specific req
handler to the generic pci device.

Signed-off-by: Jeff Guo
---
v3->v2:
no change.
---
 drivers/bus/pci/rte_bus_pci.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 0d1955f..c45a820 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -66,6 +66,7 @@ struct rte_pci_device {
 	uint16_t max_vfs;                   /**< sriov enable if not zero */
 	enum rte_kernel_driver kdrv;        /**< Kernel driver passthrough */
 	char name[PCI_PRI_STR_SIZE+1];      /**< PCI location (ASCII) */
+	struct rte_intr_handle req_notifier_handler;/**< Req notifier handle */
 };

 /**
--
2.7.4
[dpdk-dev] [PATCH v2 4/4] vfio: enable vfio hotplug by req notifier handler
When a device is hot-unplugged, the vfio kernel module first sends a
req notifier to ask user space to release the allocated resources.
After that, the vfio kernel module detects that the device has
disappeared and deletes the device in the kernel.

This patch adds req notifier processing to enable hotplug for vfio.
With req notifier monitoring enabled and the notifier callback
registered, the hot-unplug handler is called to process hotplug for
vfio when a device is hot-unplugged.

Signed-off-by: Jeff Guo
---
v3->v2:
change some code style and typo
---
 drivers/bus/pci/linux/pci_vfio.c | 111 +++
 drivers/bus/pci/pci_common.c     |  10
 2 files changed, 121 insertions(+)

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 686386d..5d3d026 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -17,6 +17,8 @@
 #include
 #include
 #include
+#include
+#include

 #include "eal_filesystem.h"

@@ -277,6 +279,101 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
 	return -1;
 }

+static void
+pci_vfio_req_handler(void *param)
+{
+	struct rte_bus *bus;
+	int ret;
+	struct rte_device *device = (struct rte_device *)param;
+
+	bus = rte_bus_find_by_device(device);
+	if (bus == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot find bus for device (%s)\n",
+			device->name);
+		return;
+	}
+
+	/*
+	 * vfio kernel module request user space to release allocated
+	 * resources before device be deleted in kernel, so it can directly
+	 * call the vfio bus hot-unplug handler to process it.
+	 */
+	ret = bus->hot_unplug_handler(device);
+	if (ret)
+		RTE_LOG(ERR, EAL,
+			"Can not handle hot-unplug for device (%s)\n",
+			device->name);
+}
+
+/* enable notifier (only enable req now) */
+static int
+pci_vfio_enable_notifier(struct rte_pci_device *dev, int vfio_dev_fd)
+{
+	int ret;
+	int fd = -1;
+
+	/* set up an eventfd for req notifier */
+	fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+	if (fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot set up eventfd, error %i (%s)\n",
+			errno, strerror(errno));
+		return -1;
+	}
+
+	dev->req_notifier_handler.fd = fd;
+	dev->req_notifier_handler.type = RTE_INTR_HANDLE_VFIO_REQ;
+	dev->req_notifier_handler.vfio_dev_fd = vfio_dev_fd;
+	ret = rte_intr_callback_register(&dev->req_notifier_handler,
+					 pci_vfio_req_handler,
+					 (void *)&dev->device);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Fail to register req notifier handler.\n");
+		goto error;
+	}
+
+	ret = rte_intr_enable(&dev->req_notifier_handler);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Fail to enable req notifier.\n");
+		ret = rte_intr_callback_unregister(&dev->req_notifier_handler,
+						 pci_vfio_req_handler,
+						 (void *)&dev->device);
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"Fail to unregister req notifier handler.\n");
+		goto error;
+	}
+
+	return 0;
+error:
+	close(fd);
+	return -1;
+}
+
+/* disable notifier (only disable req now) */
+static int
+pci_vfio_disable_notifier(struct rte_pci_device *dev)
+{
+	int ret;
+
+	ret = rte_intr_disable(&dev->req_notifier_handler);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to disable req notifier.\n");
+		return -1;
+	}
+
+	ret = rte_intr_callback_unregister(&dev->req_notifier_handler,
+					   pci_vfio_req_handler,
+					   (void *)&dev->device);
+	if (ret) {
+		RTE_LOG(ERR, EAL,
+			 "fail to unregister req notifier handler.\n");
+		return -1;
+	}
+
+	close(dev->req_notifier_handler.fd);
+	return 0;
+}
+
 static int
 pci_vfio_is_ioport_bar(int vfio_dev_fd, int bar_index)
 {
@@ -430,6 +527,7 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	struct pci_map *maps;

 	dev->intr_handle.fd = -1;
+	dev->req_notifier_handler.fd = -1;

 	/* store PCI address string */
 	snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
@@ -521,6 +619,11 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 		goto err_vfio_res;
 	}

+	if (pci_vfio_enable_notifier(dev, vfio_dev_fd) != 0) {
+		RTE_LOG(ERR, EAL, "Error setting up notifier!\n");
+		goto err_vfio_res;
+	}
+
 	TAILQ_INSERT_T
Re: [dpdk-dev] [PATCH v2 1/5] net/bnx2x: fix logging to include dev name
On 9/29/2018 6:42 AM, Mody, Rasesh wrote:
> Fix PMD logging scheme to include device name in the messages printed.
>
> Fixes: 540a211084a7 ("bnx2x: driver core")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Rasesh Mody

Series applied to dpdk-next-net/master, thanks.
[dpdk-dev] DPDK passthrough to container in virtual machine
Hi,

I'm trying to pass an OVS DPDK vhost port through to a container that
resides in a VM. Here are the steps I followed:

1. Add a DPDK port to the OVS integration bridge.
2. Bring up an Ubuntu VM using virsh with a vhostuser interface
   configuration.
3. In the VM, convert this virtio port into a DPDK network device.

How can this DPDK interface be attached to a Docker container? After
the configuration, I would like to run a sample DPDK application (for
example, testpmd) in the container.

Thanks,
Periyasamy