date:20181002

Re: [dpdk-dev] [PATCH v7] app/testpmd: add forwarding mode to simulate a noisy neighbour

2018-10-02 Thread Jens Freimann


Hi Bernard,

On Mon, Oct 01, 2018 at 01:13:32PM +, Iremonger, Bernard wrote:

Hi Jens,


-----Original Message-----
From: Jens Freimann [mailto:jfreim...@redhat.com]
Sent: Friday, September 21, 2018 2:27 PM
To: dev@dpdk.org
Cc: ai...@redhat.com; jan.scheur...@ericsson.com; Richardson, Bruce
; tho...@monjalon.net;
maxime.coque...@redhat.com; Ananyev, Konstantin
; Yigit, Ferruh ;
Iremonger, Bernard ; ktray...@redhat.com
Subject: [PATCH v7] app/testpmd: add forwarding mode to simulate a noisy
neighbor


./devtools/check-git-log.sh -1
Headline too long:
   app/testpmd: add forwarding mode to simulate a noisy neighbour


I'm sorry, I failed to use checkpatches.sh correctly :) I did:

#> git show | 
DPDK_CHECKPATCH_PATH="/home/jfreiman/code/linux/scripts/checkpatch.pl" 
devtools/checkpatches.sh --

1/1 valid patch

I'll fix all errors/warnings and resend.


Thx!

regards,
Jens

[dpdk-dev] [PATCH v8] app/testpmd: add noisy neighbour forwarding mode

2018-10-02 Thread Jens Freimann

This adds a new forwarding mode to testpmd to simulate
more realistic behavior of a guest machine engaged in receiving
and sending packets while performing a Virtual Network Function (VNF).

The goal is to enable a simple way of measuring the performance impact on
cache and memory footprint utilization from various VNFs co-located on
the same host machine. For this it does:

* Buffer packets in a FIFO:

Create a fifo to buffer received packets. Once it overflows, put
those packets into the actual tx queue. The fifo is created per tx
queue and its size can be set with the --noisy-tx-sw-buffer-size
commandline parameter.

A second commandline parameter is used to set a timeout in
milliseconds after which the fifo is flushed.

--noisy-tx-sw-buffer-size [packet numbers]
Keep the mbuf in a FIFO and forward the overflowing packets from the
FIFO. This queue is per TX-queue (after all other packet processing).

--noisy-tx-sw-buffer-flushtime [delay]
Flush the packet queue if no packets have been seen during
[delay]. As long as packets are seen, the timer is reset.
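
The buffering behaviour described above can be sketched in plain C. This is
only an illustration under stated assumptions, not the code from
noisy_vnf.c: the names sw_fifo, sw_fifo_push and sw_fifo_flush_if_idle are
hypothetical, packets are plain ints instead of mbufs, and the two constants
stand in for --noisy-tx-sw-buffer-size and --noisy-tx-sw-buffer-flushtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SW_BUF_SIZE 4     /* stands in for --noisy-tx-sw-buffer-size */
#define FLUSH_TIME_MS 10  /* stands in for --noisy-tx-sw-buffer-flushtime */

/* Hypothetical per-TX-queue software FIFO. */
struct sw_fifo {
	int pkts[SW_BUF_SIZE]; /* packet handles; real code would hold mbufs */
	size_t head;           /* index of the oldest buffered packet */
	size_t count;          /* number of buffered packets */
	uint64_t last_rx_ms;   /* time of the most recent enqueue */
};

/*
 * Buffer one packet at time now_ms. If the FIFO is already full, the oldest
 * packet overflows into *tx_out and 1 is returned; otherwise 0 is returned.
 */
static int
sw_fifo_push(struct sw_fifo *f, int pkt, uint64_t now_ms, int *tx_out)
{
	int overflowed = 0;

	assert(f->count <= SW_BUF_SIZE);
	if (f->count == SW_BUF_SIZE) {
		*tx_out = f->pkts[f->head];
		f->head = (f->head + 1) % SW_BUF_SIZE;
		f->count--;
		overflowed = 1;
	}
	f->pkts[(f->head + f->count) % SW_BUF_SIZE] = pkt;
	f->count++;
	f->last_rx_ms = now_ms; /* seeing a packet resets the flush timer */
	return overflowed;
}

/*
 * Drain the FIFO into tx_out (room for SW_BUF_SIZE entries) if no packet
 * has been seen for FLUSH_TIME_MS. Returns the number of packets flushed.
 */
static size_t
sw_fifo_flush_if_idle(struct sw_fifo *f, uint64_t now_ms, int *tx_out)
{
	size_t n = 0;

	if (now_ms - f->last_rx_ms < FLUSH_TIME_MS)
		return 0;
	while (f->count > 0) {
		tx_out[n++] = f->pkts[f->head];
		f->head = (f->head + 1) % SW_BUF_SIZE;
		f->count--;
	}
	return n;
}
```

A forwarding loop would call sw_fifo_push() for every received packet and
sw_fifo_flush_if_idle() on iterations where nothing was received.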

Add several options to simulate route lookups (memory reads) in tables
that can be quite large, as well as route hit statistics update.
These options simulate the whole stack traversal and
will thrash the cache. Memory access is random.

* simulate route lookups:

Allocate a buffer and perform reads and writes on it as specified by
commandline options:

--noisy-lkup-memory [size]
Size of the VNF internal memory (MB), in which the random
read/write will be done, allocated by rte_malloc (hugepages).

--noisy-lkup-num-writes [num]
Number of random writes in memory to perform per packet,
simulating hit-flags update. 64 bits per write,
all writes in different cache lines.

--noisy-lkup-num-reads [num]
Number of random reads in memory to perform per packet,
simulating FIB/table lookups. 64 bits per read,
all reads in different cache lines.

--noisy-lkup-num-reads-writes [num]
Number of random reads and writes in memory to perform per packet,
simulating stats update. 64 bits per read-write, all
reads and writes in different cache lines.
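
The memory access pattern described by these options can be sketched as
follows. This is a simplified illustration, not the patch's code: the
function names, the xorshift generator and the 64-byte cache line size are
assumptions, and the buffer is an ordinary array rather than hugepage memory
from rte_malloc. Each access touches one 64-bit word at the start of a
randomly chosen cache line; the sketch does not guarantee that two accesses
never pick the same line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64 /* assumed cache line size */
#define WORDS_PER_LINE (CACHE_LINE_BYTES / sizeof(uint64_t))

/* Small xorshift64 generator; *state must be non-zero. */
static uint64_t
next_rand(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* Index of the first 64-bit word of a randomly chosen cache line. */
static size_t
random_line_start(size_t nb_words, uint64_t *rng)
{
	size_t lines = nb_words / WORDS_PER_LINE;

	assert(lines > 0);
	return (size_t)(next_rand(rng) % lines) * WORDS_PER_LINE;
}

/*
 * Per-packet memory work: num_writes random writes, num_reads random reads
 * and num_rw random read-modify-writes. Returns the sum of the values read
 * so the reads cannot be optimised away.
 */
static uint64_t
simulate_lookups(uint64_t *mem, size_t nb_words, unsigned int num_writes,
		 unsigned int num_reads, unsigned int num_rw, uint64_t *rng)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < num_writes; i++) {
		size_t idx = random_line_start(nb_words, rng);

		mem[idx] = next_rand(rng);      /* hit-flags update */
	}
	for (i = 0; i < num_reads; i++)
		sum += mem[random_line_start(nb_words, rng)]; /* table lookup */
	for (i = 0; i < num_rw; i++) {
		size_t idx = random_line_start(nb_words, rng);

		mem[idx] = mem[idx] + 1;        /* stats update */
	}
	return sum;
}
```

A forwarding function would call simulate_lookups() once per packet with the
three counts taken from the command line.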

Signed-off-by: Jens Freimann 
Acked-by: Kevin Traynor 

---
v7->v8:
 * fix checkpatches.sh warnings/errors

v6->v7:
 * fix return value of mem allocation in noisy_begin
 * remove blank line
 * allow 0 as parameter value

v5->v6:
 * fix patch description
 * fix comment for pkt_burst_noisy_vnf
 * check if flush needed when no packets were received
 * free dropped packets
 * remove redundant else-if
 * do memory access simulation in all cases
 * change order of free'd data structures in noisy_fwd_end
 * only allocate one noisy_config struct per port
 * check for return of rte_ring_create()
 * change checking of input parameters in noisy_fwd_begin(). Decided to
   allow setting parameters to 0 (which is the default)
 * did not change use of "=N" in documentation as suggested by Kevin
   because it is how it's done for most other parameters
 * make error message match code in checking of flush time parameter
 * don't add whitespace in testpmd.h

v4->v5:
 * try to minimize impact in common code. Instead implement fwd_begin and
fwd_end
 * simplify actual fwd function
 * remove unnecessary casts (Kevin)
 * use more meaningful names for parameters and global variables (Kevin)
 * free ring and vnf_mem as well (Kevin)
 * squash documentation and code into single patch (Bernard)
 * fix patch subject to "app/testpmd" (Bernard)

 app/test-pmd/Makefile   |   1 +
 app/test-pmd/meson.build|   1 +
 app/test-pmd/noisy_vnf.c| 279 
 app/test-pmd/parameters.c   |  60 +
 app/test-pmd/testpmd.c  |  35 +++
 app/test-pmd/testpmd.h  |   8 +
 doc/guides/testpmd_app_ug/run_app.rst   |  33 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 +-
 8 files changed, 422 insertions(+), 2 deletions(-)
 create mode 100644 app/test-pmd/noisy_vnf.c

diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
index 2b4d604b8..e2581ca66 100644
--- a/app/test-pmd/Makefile
+++ b/app/test-pmd/Makefile
@@ -33,6 +33,7 @@ SRCS-y += rxonly.c
 SRCS-y += txonly.c
 SRCS-y += csumonly.c
 SRCS-y += icmpecho.c
+SRCS-y += noisy_vnf.c
 SRCS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ieee1588fwd.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_cmd.c
 
diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
index a0b3be07f..9ef6ed957 100644
--- a/app/test-pmd/meson.build
+++ b/app/test-pmd/meson.build
@@ -17,6 +17,7 @@ sources = files('cmdline.c',
'iofwd.c',
'macfwd.c',
'macswap.c',
+   'noisy_vnf.c',
'parameters.c',
'rxonly.c',
'testpmd.c',
diff --git a/app/test-pmd/noisy_vnf.c b/app/test-pmd/noisy_vnf.c
new file mode 100644
index 0..58c4ee925
--- /dev/null
+++ b/app/test-pmd/noisy_vnf.c
@@ -0,0 +1,279 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Red Ha

Re: [dpdk-dev] [PATCH v7] app/testpmd: add forwarding mode to simulate a noisy neighbour

2018-10-02 Thread Thomas Monjalon

02/10/2018 09:19, Jens Freimann:
> On Mon, Oct 01, 2018 at 01:13:32PM +, Iremonger, Bernard wrote:
> >./devtools/check-git-log.sh -1
> >Headline too long:
> >app/testpmd: add forwarding mode to simulate a noisy neighbour
> 
> I'm sorry, I failed to use checkpatches.sh correctly :) I did:
> 
> #> git show | 
> DPDK_CHECKPATCH_PATH="/home/jfreiman/code/linux/scripts/checkpatch.pl" 
> devtools/checkpatches.sh --
> 
> 1/1 valid patch

Why is this command not correct?

Re: [dpdk-dev] [PATCH] net/softnic: add flow flush API

2018-10-02 Thread Singh, Jasvinder





> --- a/drivers/net/softnic/rte_eth_softnic_flow.c
> +++ b/drivers/net/softnic/rte_eth_softnic_flow.c
> @@ -1915,6 +1915,50 @@ pmd_flow_destroy(struct rte_eth_dev *dev,
>   return 0;
>  }
> 
> +static int
> +pmd_flow_flush(struct rte_eth_dev *dev,
> + struct rte_flow_error *error)
> +{
> + struct pmd_internals *softnic = dev->data->dev_private;
> + struct pipeline *pipeline;
> + int status;
> + uint32_t i = 0;
> +
> + TAILQ_FOREACH(pipeline, &softnic->pipeline_list, node) {

Removing elements when iterating tailq lists with TAILQ_FOREACH macro is not 
safe.
Instead, use TAILQ_FOREACH_SAFE  for safe tailq element removal within the loop.

> + /* Remove all the flows added to the tables. */
> + for (i = 0; i < pipeline->n_tables; i++) {
> + struct softnic_table *table;
> + struct rte_flow *flow;
> +
> + table = &pipeline->table[i];
> + TAILQ_FOREACH(flow, &table->flows, node) {
> + /* Rule delete. */
> + status = softnic_pipeline_table_rule_delete
> + (softnic,
> + flow->pipeline->name,
> + flow->table_id,
> + &flow->match);
> + if (status)
> + return rte_flow_error_set(error,
> + EINVAL,
> +
>   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> + NULL,
> + "Pipeline table rule delete
> failed");
> +
> + /* Update dependencies */
> + if (is_meter_action_enable(softnic, table))
> + flow_meter_owner_reset(softnic, flow);
Fix Indentation here.
> + /* Flow delete. */
> + TAILQ_REMOVE(&table->flows, flow, node);
> + free(flow);
> + }
> + }
> + }
> +
> + return 0;
> +}
> +
>  static int
>  pmd_flow_query(struct rte_eth_dev *dev __rte_unused,
>   struct rte_flow *flow,
> @@ -1971,7 +2015,7 @@ const struct rte_flow_ops pmd_flow_ops = {
>   .validate = pmd_flow_validate,
>   .create = pmd_flow_create,
>   .destroy = pmd_flow_destroy,
> - .flush = NULL,
> + .flush = pmd_flow_flush,
>   .query = pmd_flow_query,
>   .isolate = NULL,
>  };
> --
> 2.17.1
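
The review comment about TAILQ_FOREACH_SAFE can be illustrated with a small
stand-alone sketch. It is not the softnic code: struct flow here is a
hypothetical two-field type, and because glibc's <sys/queue.h> does not
provide TAILQ_FOREACH_SAFE (BSD headers do), a fallback definition is
included. The point is that the next element is saved before the current one
is removed and freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Fallback for C libraries whose <sys/queue.h> lacks the _SAFE variant. */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
	for ((var) = TAILQ_FIRST((head));			\
	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1);	\
	    (var) = (tvar))
#endif

/* Hypothetical flow entry, standing in for struct rte_flow. */
struct flow {
	int id;
	TAILQ_ENTRY(flow) node;
};
TAILQ_HEAD(flow_list, flow);

/* Remove and free every flow in the list; returns how many were freed. */
static unsigned int
flow_list_flush(struct flow_list *flows)
{
	struct flow *flow, *next;
	unsigned int n = 0;

	/* 'next' is read before 'flow' is freed, so iteration stays valid. */
	TAILQ_FOREACH_SAFE(flow, flows, node, next) {
		TAILQ_REMOVE(flows, flow, node);
		free(flow);
		n++;
	}
	assert(TAILQ_EMPTY(flows));
	return n;
}
```

With plain TAILQ_FOREACH the loop increment would read flow->node after
free(flow), which is a use-after-free.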

Re: [dpdk-dev] [PATCH v3 00/15] bnxt patchset

2018-10-02 Thread Ferruh Yigit

On 9/29/2018 2:59 AM, Ajit Khaparde wrote:
> Patchset against dpdk-next-net.
> 
> v1->v2:
> net/bnxt: get rid of ff pools array and use the vnic info array instead
> - Fix access to uninitialized variable.
> - Rectify the wrong 'Fixes' reference.
> 
> net/bnxt: update HWRM version
> - Update from 1.9.2.45 to version 1.9.2.53
> 
> v2->v3:
> net/bnxt: update HWRM version
> - Tried to split into more than one patch.
> 
> - Updated commit logs and messages for rest based on review comments.
> 
> Please apply.
> 
> Ajit Khaparde (10):
>   net/bnxt: fix MTU setting
>   net/bnxt: update HWRM version
>   net/bnxt: update HWRM version part 2
>   net/bnxt: update HWRM version part 3

Is there a logic to the separation of parts 1, 2 & 3?
Commit logs are empty and there is nothing distinctive in the commits. If the
separation is not logical but just a physical split into 3 pieces, I am for
merging them back with the "Update the HWRM API to version 1.9.2.53" commit log.

Or if there is a logic, please clarify it in the patch subject and commit log.

I will wait for your answer before moving on.

Thanks,
ferruh

Re: [dpdk-dev] [PATCH v2 1/2] eal: add eal option to configure iova mode

2018-10-02 Thread Ferruh Yigit

On 10/1/2018 5:00 PM, Eric Zhang wrote:
> 
> 
> On 09/26/2018 08:42 AM, Burakov, Anatoly wrote:
>> On 18-Sep-18 8:10 PM, eric zhang wrote:
>>> From: Santosh Shukla 
>>>
>>> In case the user doesn't want to use the bus iova scheme and wants
>>> to override it.
>>>
>>> For that, add the eal option --iova-mode= where the valid input
>>> string is 'pa' or 'va'.
>>>
>>> Signed-off-by: Santosh Shukla 
>>> Signed-off-by: Jerin Jacob 
>>> ---
>>
>> Needs documentation update in Programmer's Guide to explain why such a 
>> thing might be needed, and update EAL parameter guides.
>>
>> For the patch itself,
>> Acked-by: Anatoly Burakov 
> Thanks Anatoly. The documentation was updated and the patch is at 
> http://patchwork.dpdk.org/patch/45785/. Would you please give a review?

I suggest sending a new version of this patchset with that patch included,
instead of two separate patches.
It makes life easier for people who need to follow that dependency and is
good for keeping a record for the future.

Re: [dpdk-dev] [PATCH v6 5/5] vhost: message handling implemented as a callback array

2018-10-02 Thread Maxime Coquelin


Hi Nikolay,

On 09/24/2018 10:17 PM, Nikolay Nikolaev wrote:

Introduce vhost_message_handlers, which maps the message request
type to the message handler. Then replace the switch construct
with a map and call.

Failing vhost_user_set_features is fatal and all processing should
stop immediately and propagate the error to the upper layers. Change
the code accordingly to reflect that.

Signed-off-by: Nikolay Nikolaev 
---
  lib/librte_vhost/vhost_user.c |  150 -
  1 file changed, 57 insertions(+), 93 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index e1b705fa7..f6ce8e092 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1477,6 +1477,35 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct 
VhostUserMsg *msg)
return VH_RESULT_OK;
  }
  
+typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,

+   struct VhostUserMsg *msg);
+static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
+   [VHOST_USER_NONE] = NULL,
+   [VHOST_USER_GET_FEATURES] = vhost_user_get_features,
+   [VHOST_USER_SET_FEATURES] = vhost_user_set_features,
+   [VHOST_USER_SET_OWNER] = vhost_user_set_owner,
+   [VHOST_USER_RESET_OWNER] = vhost_user_reset_owner,
+   [VHOST_USER_SET_MEM_TABLE] = vhost_user_set_mem_table,
+   [VHOST_USER_SET_LOG_BASE] = vhost_user_set_log_base,
+   [VHOST_USER_SET_LOG_FD] = vhost_user_set_log_fd,
+   [VHOST_USER_SET_VRING_NUM] = vhost_user_set_vring_num,
+   [VHOST_USER_SET_VRING_ADDR] = vhost_user_set_vring_addr,
+   [VHOST_USER_SET_VRING_BASE] = vhost_user_set_vring_base,
+   [VHOST_USER_GET_VRING_BASE] = vhost_user_get_vring_base,
+   [VHOST_USER_SET_VRING_KICK] = vhost_user_set_vring_kick,
+   [VHOST_USER_SET_VRING_CALL] = vhost_user_set_vring_call,
+   [VHOST_USER_SET_VRING_ERR] = vhost_user_set_vring_err,
+   [VHOST_USER_GET_PROTOCOL_FEATURES] = vhost_user_get_protocol_features,
+   [VHOST_USER_SET_PROTOCOL_FEATURES] = vhost_user_set_protocol_features,
+   [VHOST_USER_GET_QUEUE_NUM] = vhost_user_get_queue_num,
+   [VHOST_USER_SET_VRING_ENABLE] = vhost_user_set_vring_enable,
+   [VHOST_USER_SEND_RARP] = vhost_user_send_rarp,
+   [VHOST_USER_NET_SET_MTU] = vhost_user_net_set_mtu,
+   [VHOST_USER_SET_SLAVE_REQ_FD] = vhost_user_set_req_fd,
+   [VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
+};
+
+
  /* return bytes# of read on success or negative val on failure. */
  static int
  read_vhost_message(int sockfd, struct VhostUserMsg *msg)
@@ -1630,6 +1659,7 @@ vhost_user_msg_handler(int vid, int fd)
int ret;
int unlock_required = 0;
uint32_t skip_master = 0;
+   int request;
  
  	dev = get_device(vid);

if (dev == NULL)
@@ -1722,100 +1752,34 @@ vhost_user_msg_handler(int vid, int fd)
goto skip_to_post_handle;
}
  
-	switch (msg.request.master) {

-   case VHOST_USER_GET_FEATURES:
-   ret = vhost_user_get_features(&dev, &msg);
-   send_vhost_reply(fd, &msg);
-   break;
-   case VHOST_USER_SET_FEATURES:
-   ret = vhost_user_set_features(&dev, &msg);
-   break;
-
-   case VHOST_USER_GET_PROTOCOL_FEATURES:
-   ret = vhost_user_get_protocol_features(&dev, &msg);
-   send_vhost_reply(fd, &msg);
-   break;
-   case VHOST_USER_SET_PROTOCOL_FEATURES:
-   ret = vhost_user_set_protocol_features(&dev, &msg);
-   break;
-
-   case VHOST_USER_SET_OWNER:
-   ret = vhost_user_set_owner(&dev, &msg);
-   break;
-   case VHOST_USER_RESET_OWNER:
-   ret = vhost_user_reset_owner(&dev, &msg);
-   break;
-
-   case VHOST_USER_SET_MEM_TABLE:
-   ret = vhost_user_set_mem_table(&dev, &msg);
-   break;
-
-   case VHOST_USER_SET_LOG_BASE:
-   ret = vhost_user_set_log_base(&dev, &msg);
-   if (ret)
-   goto skip_to_reply;
-   /* it needs a reply */
-   send_vhost_reply(fd, &msg);
-   break;
-   case VHOST_USER_SET_LOG_FD:
-   ret = vhost_user_set_log_fd(&dev, &msg);
-   break;
-
-   case VHOST_USER_SET_VRING_NUM:
-   ret = vhost_user_set_vring_num(&dev, &msg);
-   break;
-   case VHOST_USER_SET_VRING_ADDR:
-   ret = vhost_user_set_vring_addr(&dev, &msg);
-   break;
-   case VHOST_USER_SET_VRING_BASE:
-   ret = vhost_user_set_vring_base(&dev, &msg);
-   break;
-
-   case VHOST_USER_GET_VRING_BASE:
-   ret = vhost_user_get_vring_base(&dev, &msg);
-   if (ret)
-   goto skip_to_reply;
-   send_vhost_reply(fd, &msg);
-   bre
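
The "map and call" idea in this patch can be shown in isolation. The sketch
below is an assumption-laden miniature, not the vhost-user code: the message
types, the handler names and the feature mask 0x3 are all invented for
illustration. It shows a designated-initializer table indexed by request
type, plus the bounds and NULL checks a dispatcher needs before calling
through the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request types, standing in for the VHOST_USER_* values. */
enum msg_type { MSG_NONE = 0, MSG_GET_FEATURES, MSG_SET_FEATURES, MSG_MAX };

struct msg {
	enum msg_type request;
	unsigned long payload;
};

typedef int (*msg_handler_t)(struct msg *m);

static int
handle_get_features(struct msg *m)
{
	m->payload = 0x3; /* pretend two feature bits are supported */
	return 0;
}

static int
handle_set_features(struct msg *m)
{
	/* Reject feature bits outside the supported mask. */
	return (m->payload & ~0x3UL) ? -1 : 0;
}

/* One slot per request type; unlisted slots are NULL. */
static const msg_handler_t msg_handlers[MSG_MAX] = {
	[MSG_NONE] = NULL,
	[MSG_GET_FEATURES] = handle_get_features,
	[MSG_SET_FEATURES] = handle_set_features,
};

static_assert(sizeof(msg_handlers) / sizeof(msg_handlers[0]) == MSG_MAX,
	      "one handler slot per message type");

/* Look up and call the handler; -1 for unknown or unhandled requests. */
static int
dispatch(struct msg *m)
{
	int request = (int)m->request;

	if (request <= MSG_NONE || request >= MSG_MAX ||
	    msg_handlers[request] == NULL)
		return -1;
	return msg_handlers[request](m);
}
```

A table like this replaces the switch statement but leaves per-message
follow-up work, such as sending a reply, to the caller or the handler.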

[dpdk-dev] [PATCH v6 01/10] examples/power: add checks around hypervisor

2018-10-02 Thread David Hunt

Allow vm_power_manager to run without requiring qemu to be present
on the machine. This will be required for instances where the JSON
interface is used for commands and policies, without any VMs present.
A use case for this is a container environment.

Signed-off-by: David Hunt 
Acked-by: Anatoly Burakov 
---
 examples/vm_power_manager/channel_manager.c | 71 +
 1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/examples/vm_power_manager/channel_manager.c 
b/examples/vm_power_manager/channel_manager.c
index 927fc35ab..2e471d0c1 100644
--- a/examples/vm_power_manager/channel_manager.c
+++ b/examples/vm_power_manager/channel_manager.c
@@ -43,7 +43,8 @@ static unsigned char *global_cpumaps;
 static virVcpuInfo *global_vircpuinfo;
 static size_t global_maplen;
 
-static unsigned global_n_host_cpus;
+static unsigned int global_n_host_cpus;
+static bool global_hypervisor_available;
 
 /*
  * Represents a single Virtual Machine
@@ -198,7 +199,11 @@ get_pcpus_mask(struct channel_info *chan_info, unsigned 
vcpu)
 {
struct virtual_machine_info *vm_info =
(struct virtual_machine_info *)chan_info->priv_info;
-   return rte_atomic64_read(&vm_info->pcpu_mask[vcpu]);
+
+   if (global_hypervisor_available && (vm_info != NULL))
+   return rte_atomic64_read(&vm_info->pcpu_mask[vcpu]);
+   else
+   return 0;
 }
 
 static inline int
@@ -559,6 +564,8 @@ get_all_vm(int *num_vm, int *num_vcpu)
VIR_CONNECT_LIST_DOMAINS_PERSISTENT;
unsigned int domain_flag = VIR_DOMAIN_VCPU_CONFIG;
 
+   if (!global_hypervisor_available)
+   return;
 
memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen);
if (virNodeGetInfo(global_vir_conn_ptr, &node_info)) {
@@ -768,38 +775,42 @@ connect_hypervisor(const char *path)
}
return 0;
 }
-
 int
-channel_manager_init(const char *path)
+channel_manager_init(const char *path __rte_unused)
 {
virNodeInfo info;
 
LIST_INIT(&vm_list_head);
if (connect_hypervisor(path) < 0) {
-   RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to initialize channel 
manager\n");
-   return -1;
-   }
-
-   global_maplen = VIR_CPU_MAPLEN(CHANNEL_CMDS_MAX_CPUS);
+   global_n_host_cpus = 64;
+   global_hypervisor_available = 0;
+   RTE_LOG(INFO, CHANNEL_MANAGER, "Unable to initialize channel 
manager\n");
+   } else {
+   global_hypervisor_available = 1;
+
+   global_maplen = VIR_CPU_MAPLEN(CHANNEL_CMDS_MAX_CPUS);
+
+   global_vircpuinfo = rte_zmalloc(NULL,
+   sizeof(*global_vircpuinfo) *
+   CHANNEL_CMDS_MAX_CPUS, RTE_CACHE_LINE_SIZE);
+   if (global_vircpuinfo == NULL) {
+   RTE_LOG(ERR, CHANNEL_MANAGER, "Error allocating memory 
for CPU Info\n");
+   goto error;
+   }
+   global_cpumaps = rte_zmalloc(NULL,
+   CHANNEL_CMDS_MAX_CPUS * global_maplen,
+   RTE_CACHE_LINE_SIZE);
+   if (global_cpumaps == NULL)
+   goto error;
 
-   global_vircpuinfo = rte_zmalloc(NULL, sizeof(*global_vircpuinfo) *
-   CHANNEL_CMDS_MAX_CPUS, RTE_CACHE_LINE_SIZE);
-   if (global_vircpuinfo == NULL) {
-   RTE_LOG(ERR, CHANNEL_MANAGER, "Error allocating memory for CPU 
Info\n");
-   goto error;
-   }
-   global_cpumaps = rte_zmalloc(NULL, CHANNEL_CMDS_MAX_CPUS * 
global_maplen,
-   RTE_CACHE_LINE_SIZE);
-   if (global_cpumaps == NULL) {
-   goto error;
+   if (virNodeGetInfo(global_vir_conn_ptr, &info)) {
+   RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node 
Info\n");
+   goto error;
+   }
+   global_n_host_cpus = (unsigned int)info.cpus;
}
 
-   if (virNodeGetInfo(global_vir_conn_ptr, &info)) {
-   RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node Info\n");
-   goto error;
-   }
 
-   global_n_host_cpus = (unsigned)info.cpus;
 
if (global_n_host_cpus > CHANNEL_CMDS_MAX_CPUS) {
RTE_LOG(WARNING, CHANNEL_MANAGER, "The number of host CPUs(%u) 
exceeds the "
@@ -811,7 +822,8 @@ channel_manager_init(const char *path)
 
return 0;
 error:
-   disconnect_hypervisor();
+   if (global_hypervisor_available)
+   disconnect_hypervisor();
return -1;
 }
 
@@ -838,7 +850,10 @@ channel_manager_exit(void)
rte_free(vm_info);
}
 
-   rte_free(global_cpumaps);
-   rte_free(global_vircpuinfo);
-   disconnect_hypervisor();
+   if (global_hypervisor_available) {
+   /* Only needed if hypervisor available */
+

[dpdk-dev] [PATCH v6 03/10] lib/power: add changes for host commands/policies

2018-10-02 Thread David Hunt

This patch does a couple of things:
  * Adds a new message type for removing policies (PKT_POLICY_REMOVE)
Used when we want to remove a previously created policy.
  * Adds a core_type bool to the channel packet struct to specify whether
the type of core we want to control is virtual or physical.

Signed-off-by: David Hunt 
Acked-by: Anatoly Burakov 
---
 lib/librte_power/channel_commands.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/librte_power/channel_commands.h 
b/lib/librte_power/channel_commands.h
index ee638eefa..e7b93a797 100644
--- a/lib/librte_power/channel_commands.h
+++ b/lib/librte_power/channel_commands.h
@@ -19,6 +19,7 @@ extern "C" {
 #define CPU_POWER   1
 #define CPU_POWER_CONNECT   2
 #define PKT_POLICY  3
+#define PKT_POLICY_REMOVE   4
 
 /* CPU Power Command Scaling */
 #define CPU_POWER_SCALE_UP  1
@@ -58,6 +59,9 @@ struct traffic {
uint32_t max_max_packet_thresh;
 };
 
+#define CORE_TYPE_VIRTUAL 0
+#define CORE_TYPE_PHYSICAL 1
+
 struct channel_packet {
uint64_t resource_id; /**< core_num, device */
uint32_t unit;/**< scale down/up/min/max */
@@ -70,6 +74,7 @@ struct channel_packet {
uint8_t vcpu_to_control[MAX_VCPU_PER_VM];
uint8_t num_vcpu;
struct timer_profile timer_policy;
+   bool core_type;
enum workload workload;
enum policy_to_use policy_to_use;
struct t_boost_status t_boost_status;
-- 
2.17.1

[dpdk-dev] [PATCH v6 04/10] examples/power: add necessary changes to guest app

2018-10-02 Thread David Hunt

The changes here are minimal, as the guest app functionality is not
changing at all, but there is a new element in the channel_packet
struct that needs to have a default set (channel_packet->core_type).

Signed-off-by: David Hunt 
Acked-by: Anatoly Burakov 
---
 examples/vm_power_manager/guest_cli/vm_power_cli_guest.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c 
b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
index 0db1b804f..2d9e7689a 100644
--- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
+++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
@@ -92,6 +92,7 @@ set_policy_defaults(struct channel_packet *pkt)
pkt->timer_policy.hours_to_use_traffic_profile[0] = 8;
pkt->timer_policy.hours_to_use_traffic_profile[1] = 10;
 
+   pkt->core_type = CORE_TYPE_VIRTUAL;
pkt->workload = LOW;
pkt->policy_to_use = TIME;
pkt->command = PKT_POLICY;
-- 
2.17.1

[dpdk-dev] [PATCH v6 02/10] examples/power: allow for number of vms to be zero

2018-10-02 Thread David Hunt

Previously the vm_power_manager app required some VMs to be defined, so
the call to get_all_vm() always set the noVms variable. Now we're accepting
policies from the host OS (without any VMs defined), so it is now valid to
have zero VMs. This patch initialises the relevant variables to zero in
case the call to get_all_vm() does not find any and returns with
the variables uninitialised.

Signed-off-by: David Hunt 
Acked-by: Anatoly Burakov 
---
 examples/vm_power_manager/channel_monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/vm_power_manager/channel_monitor.c 
b/examples/vm_power_manager/channel_monitor.c
index 7fa47ba97..f180d74e6 100644
--- a/examples/vm_power_manager/channel_monitor.c
+++ b/examples/vm_power_manager/channel_monitor.c
@@ -66,7 +66,7 @@ static void
 core_share_status(int pNo)
 {
 
-   int noVms, noVcpus, z, x, t;
+   int noVms = 0, noVcpus = 0, z, x, t;
 
get_all_vm(&noVms, &noVcpus);
 
-- 
2.17.1
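
The reason for the zero initialisation can be seen in a reduced sketch. The
names below are hypothetical stand-ins (hypervisor_available for
global_hypervisor_available, get_all_vm_sketch for get_all_vm), and the
values 2 and 8 are invented. When the callee returns early it never writes
its output parameters, so the caller must not rely on them being set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for global_hypervisor_available. */
static bool hypervisor_available;

/* Fills the counts only when a hypervisor is present. */
static void
get_all_vm_sketch(int *num_vm, int *num_vcpu)
{
	assert(num_vm != NULL && num_vcpu != NULL);
	if (!hypervisor_available)
		return; /* early return: outputs are left untouched */
	*num_vm = 2;    /* invented values for the illustration */
	*num_vcpu = 8;
}

/* Caller initialises the outputs so the early-return path yields zero. */
static int
count_vms(void)
{
	int no_vms = 0, no_vcpus = 0;

	get_all_vm_sketch(&no_vms, &no_vcpus);
	(void)no_vcpus;
	return no_vms;
}
```

Without the "= 0" initialisers, count_vms() would return an indeterminate
value whenever no hypervisor is available.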

[dpdk-dev] [PATCH v6 0/10] add json power policy interface for containers

2018-10-02 Thread David Hunt

The current vm_power_manager example app has the capability to accept power
policies from virtual machines via virtio-serial channels. These power
policies allow a virtual machine to give information to the power manager
so that the power manager can take care of the power management of the virtual
machine based on the information in the policy.

This power policy functionality is limited to virtual machines sending
the policies to the power manager (which runs in the Host OS), and a solution
was needed for additional methods of sending power policies to the power
manager app.

The main use-case for this modification is for containers and host
applications that wish to send policies to the power manager.

This patchset adds the capability to send power policies and power commands
to the vm_power_manager app via JSON strings through a fifo on the file
system.
For example, given the following file, policy.json:

{"policy": {
  "name": "ubuntu2",
  "command": "create",
  "policy_type": "TIME",
  "busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ],
  "quiet_hours":[ 2, 3, 4, 5, 6 ],
  "core_list":[ 11, 12, 13 ]
}}

Then running the command:

cat policy.json >/tmp/powermonitor/fifo

The policy is sent to the vm_power_manager. The power manager app then parses
the JSON data, and inserts the policy into the array of policies.
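
A host application could do the equivalent of the "cat" command from C. The
sketch below is an assumption, not part of the patchset: send_policy is a
hypothetical helper that opens an existing path for writing and writes the
whole JSON string, retrying on short writes. It uses only POSIX calls and
does not validate the JSON.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a JSON policy string to the fifo (or file) at 'path'.
 * Returns 0 on success, -1 on failure.
 */
static int
send_policy(const char *path, const char *json)
{
	size_t len = strlen(json);
	size_t off = 0;
	int fd;

	assert(path != NULL);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	while (off < len) {
		ssize_t n = write(fd, json + off, len - off);

		if (n < 0) {
			close(fd);
			return -1;
		}
		off += (size_t)n; /* handle short writes */
	}
	return close(fd);
}
```

For example, send_policy("/tmp/powermonitor/fifo", json) would send the
contents of policy.json once it has been read into the string json. Note
that opening a fifo for writing blocks until a reader has it open.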

Part of the patch series contains documentation updates to give all the
details of the valid name-value pairs, the data types, etc.

Patch v2:
  * Fixed review comments from Stephen Hemminger and Lei A Yao.
  * Added a check in the Makefile for libjansson-dev. Will warn the user and
build without JSON functionality if not present, will build including JSON
functionality if it is present.

Patch v3:
  * Added meson/ninja support for vm_power_manager and guest_cli apps
  * Fixed compilation issue with guest_cli app

Patch v4:
  * Split out some unrelated changes to separate patches in the set
  * Some changes out of review by Anatoly (Thanks!)

Patch v5:
  * Removed the directory with JSON examples, as they already exist in
the documentation.
  * Fixed some typos and formatting issues in the documentation.
  * Changed the JSON examples in the documentation to 'javascript' causing
the syntax to be highlighted nicely.
  * Inherited the Acks from previous version.

Patch v6:
  * Added ability to set WORKLOAD policy to LOW, MEDIUM, or HIGH.
"workload": "MEDIUM"
  * Added missing functionality to allow passing of a list of mac
addresses for the TRAFFIC profile type.
"mac_list":[ "de:ad:be:ef:01:01", "de:ad:be:ef:01:02" ]
  * Updated docs to include both of the above additions.

[01/10] examples/power: add checks around hypervisor
[02/10] examples/power: allow for number of vms to be zero
[03/10] lib/power: add changes for host commands/policies
[04/10] examples/power: add necessary changes to guest app
[05/10] examples/power: add host channel to power manager
[06/10] examples/power: increase allowed number of clients
[07/10] examples/power: add json string handling
[08/10] examples/power: clean up verbose messages
[09/10] examples/power: add meson/ninja build support
[10/10] doc/vm_power_manager: add JSON interface API info

[dpdk-dev] [PATCH v6 06/10] examples/power: increase allowed number of clients

2018-10-02 Thread David Hunt

Now that we're handling host policies, containers and virtual machines,
we'll rename MAX_VMS to MAX_CLIENTS, and increase from 4 to 64

Signed-off-by: David Hunt 
Acked-by: Anatoly Burakov 
---
 examples/vm_power_manager/channel_manager.h |  4 ++--
 examples/vm_power_manager/channel_monitor.c | 10 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/examples/vm_power_manager/channel_manager.h 
b/examples/vm_power_manager/channel_manager.h
index e32235b07..d948b304c 100644
--- a/examples/vm_power_manager/channel_manager.h
+++ b/examples/vm_power_manager/channel_manager.h
@@ -37,7 +37,7 @@ struct sockaddr_un _sockaddr_un;
 #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path)
 #endif
 
-#define MAX_VMS 4
+#define MAX_CLIENTS 64
 #define MAX_VCPUS 20
 
 
@@ -47,7 +47,7 @@ struct libvirt_vm_info {
uint8_t num_cpus;
 };
 
-struct libvirt_vm_info lvm_info[MAX_VMS];
+struct libvirt_vm_info lvm_info[MAX_CLIENTS];
 /* Communication Channel Status */
 enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0,
CHANNEL_MGR_CHANNEL_CONNECTED,
diff --git a/examples/vm_power_manager/channel_monitor.c 
b/examples/vm_power_manager/channel_monitor.c
index c3c3d7bb1..53a4efe45 100644
--- a/examples/vm_power_manager/channel_monitor.c
+++ b/examples/vm_power_manager/channel_monitor.c
@@ -41,7 +41,7 @@ static volatile unsigned run_loop = 1;
 static int global_event_fd;
 static unsigned int policy_is_set;
 static struct epoll_event *global_events_list;
-static struct policy policies[MAX_VMS];
+static struct policy policies[MAX_CLIENTS];
 
 void channel_monitor_exit(void)
 {
@@ -199,7 +199,7 @@ update_policy(struct channel_packet *pkt)
RTE_LOG(INFO, CHANNEL_MONITOR,
"Applying policy for %s\n", pkt->vm_name);
 
-   for (i = 0; i < MAX_VMS; i++) {
+   for (i = 0; i < MAX_CLIENTS; i++) {
if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) {
/* Copy the contents of *pkt into the policy.pkt */
policies[i].pkt = *pkt;
@@ -214,7 +214,7 @@ update_policy(struct channel_packet *pkt)
}
}
if (!updated) {
-   for (i = 0; i < MAX_VMS; i++) {
+   for (i = 0; i < MAX_CLIENTS; i++) {
if (policies[i].enabled == 0) {
policies[i].pkt = *pkt;
get_pcpu_to_control(&policies[i]);
@@ -238,7 +238,7 @@ remove_policy(struct channel_packet *pkt __rte_unused)
 * Disabling the policy is simply a case of setting
 * enabled to 0
 */
-   for (i = 0; i < MAX_VMS; i++) {
+   for (i = 0; i < MAX_CLIENTS; i++) {
if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) {
policies[i].enabled = 0;
return 0;
@@ -609,7 +609,7 @@ run_channel_monitor(void)
if (policy_is_set) {
int j;
 
-   for (j = 0; j < MAX_VMS; j++) {
+   for (j = 0; j < MAX_CLIENTS; j++) {
if (policies[j].enabled == 1)
apply_policy(&policies[j]);
}
-- 
2.17.1
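
The loops touched by this patch all scan a fixed-size policy array. A
reduced sketch of that pattern follows; it is not the channel_monitor.c
code. The struct, the field names and the two functions are hypothetical,
and unlike the original this sketch only matches names of enabled slots.
An update reuses the slot with a matching name, otherwise takes the first
free slot, and fails when all MAX_CLIENTS slots are in use.

```c
#include <assert.h>
#include <string.h>

#define MAX_CLIENTS 64
#define NAME_LEN 32

/* Hypothetical, reduced policy slot. */
struct policy_slot {
	char name[NAME_LEN];
	int enabled;
	int workload;
};

static struct policy_slot slots[MAX_CLIENTS];

/* Update the policy called 'name' or create it; returns slot index or -1. */
static int
update_policy_sketch(const char *name, int workload)
{
	int i;

	assert(strlen(name) < NAME_LEN);
	for (i = 0; i < MAX_CLIENTS; i++) {
		if (slots[i].enabled && strcmp(slots[i].name, name) == 0) {
			slots[i].workload = workload;
			return i;
		}
	}
	for (i = 0; i < MAX_CLIENTS; i++) {
		if (!slots[i].enabled) {
			memcpy(slots[i].name, name, strlen(name) + 1);
			slots[i].workload = workload;
			slots[i].enabled = 1;
			return i;
		}
	}
	return -1; /* table full */
}

/* Disable the policy called 'name'; returns 0 if found, -1 otherwise. */
static int
remove_policy_sketch(const char *name)
{
	int i;

	for (i = 0; i < MAX_CLIENTS; i++) {
		if (slots[i].enabled && strcmp(slots[i].name, name) == 0) {
			slots[i].enabled = 0;
			return 0;
		}
	}
	return -1;
}
```

Raising the limit from 4 to 64 only changes the array size and the loop
bound; the linear scan stays cheap at this size.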

[dpdk-dev] [PATCH v6 07/10] examples/power: add json string handling

2018-10-02 Thread David Hunt

Add JSON string handling to vm_power_manager for JSON strings received
through the fifo. The format of the JSON strings is detailed in the
next patch, the vm_power_manager user guide documentation updates.

This patch introduces a new dependency on Jansson, a C library for
encoding, decoding and manipulating JSON data. To compile the sample app
you now need to have installed libjansson4 and libjansson-dev (these may
be named slightly differently depending on your Operating System).

Signed-off-by: David Hunt 
Acked-by: Anatoly Burakov 
---
 examples/vm_power_manager/Makefile  |   6 +
 examples/vm_power_manager/channel_monitor.c | 371 ++--
 2 files changed, 352 insertions(+), 25 deletions(-)

diff --git a/examples/vm_power_manager/Makefile 
b/examples/vm_power_manager/Makefile
index 13a5205ba..50147c05d 100644
--- a/examples/vm_power_manager/Makefile
+++ b/examples/vm_power_manager/Makefile
@@ -31,6 +31,12 @@ CFLAGS += $(WERROR_FLAGS)
 
 LDLIBS += -lvirt
 
+JANSSON := $(shell pkg-config --exists jansson; echo $$?)
+ifeq ($(JANSSON), 0)
+LDLIBS += $(shell pkg-config --libs jansson)
+CFLAGS += -DUSE_JANSSON
+endif
+
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 
 ifeq ($(CONFIG_RTE_LIBRTE_IXGBE_PMD),y)
diff --git a/examples/vm_power_manager/channel_monitor.c 
b/examples/vm_power_manager/channel_monitor.c
index 53a4efe45..afb44a069 100644
--- a/examples/vm_power_manager/channel_monitor.c
+++ b/examples/vm_power_manager/channel_monitor.c
@@ -9,11 +9,18 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
-
+#include 
+#include 
+#ifdef USE_JANSSON
+#include 
+#else
+#pragma message "Jansson dev libs unavailable, not including JSON parsing"
+#endif
 #include 
 #include 
 #include 
@@ -35,6 +42,8 @@
 
 uint64_t vsi_pkt_count_prev[384];
 uint64_t rdtsc_prev[384];
+#define MAX_JSON_STRING_LEN 1024
+char json_data[MAX_JSON_STRING_LEN];
 
 double time_period_ms = 1;
 static volatile unsigned run_loop = 1;
@@ -43,6 +52,234 @@ static unsigned int policy_is_set;
 static struct epoll_event *global_events_list;
 static struct policy policies[MAX_CLIENTS];
 
+#ifdef USE_JANSSON
+
+union PFID {
+   struct ether_addr addr;
+   uint64_t pfid;
+};
+
+static int
+str_to_ether_addr(const char *a, struct ether_addr *ether_addr)
+{
+   int i;
+   char *end;
+   unsigned long o[ETHER_ADDR_LEN];
+
+   i = 0;
+   do {
+   errno = 0;
+   o[i] = strtoul(a, &end, 16);
+   if (errno != 0 || end == a || (end[0] != ':' && end[0] != 0))
+   return -1;
+   a = end + 1;
+   } while (++i != RTE_DIM(o) && end[0] != 0);
+
+   /* Junk at the end of line */
+   if (end[0] != 0)
+   return -1;
+
+   /* Support the format XX:XX:XX:XX:XX:XX */
+   if (i == ETHER_ADDR_LEN) {
+   while (i-- != 0) {
+   if (o[i] > UINT8_MAX)
+   return -1;
+   ether_addr->addr_bytes[i] = (uint8_t)o[i];
+   }
+   /* Support the format XXXX:XXXX:XXXX */
+   } else if (i == ETHER_ADDR_LEN / 2) {
+   while (i-- != 0) {
+   if (o[i] > UINT16_MAX)
+   return -1;
+   ether_addr->addr_bytes[i * 2] =
+   (uint8_t)(o[i] >> 8);
+   ether_addr->addr_bytes[i * 2 + 1] =
+   (uint8_t)(o[i] & 0xff);
+   }
+   /* unknown format */
+   } else
+   return -1;
+
+   return 0;
+}
+
+static int
+set_policy_mac(struct channel_packet *pkt, int idx, char *mac)
+{
+   union PFID pfid;
+   int ret;
+
+   /* Use port MAC address as the vfid */
+   ret = str_to_ether_addr(mac, &pfid.addr);
+
+   if (ret != 0) {
+   RTE_LOG(ERR, CHANNEL_MONITOR,
+   "Invalid mac address received in JSON\n");
+   pkt->vfid[idx] = 0;
+   return -1;
+   }
+
+   printf("Received MAC Address: %02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 ":"
+   "%02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 "\n",
+   pfid.addr.addr_bytes[0], pfid.addr.addr_bytes[1],
+   pfid.addr.addr_bytes[2], pfid.addr.addr_bytes[3],
+   pfid.addr.addr_bytes[4], pfid.addr.addr_bytes[5]);
+
+   pkt->vfid[idx] = pfid.pfid;
+   return 0;
+}
+
+
+static int
+parse_json_to_pkt(json_t *element, struct channel_packet *pkt)
+{
+   const char *key;
+   json_t *value;
+   int ret;
+
+   memset(pkt, 0, sizeof(struct channel_packet));
+
+   pkt->nb_mac_to_monitor = 0;
+   pkt->t_boost_status.tbEnabled = false;
+   pkt->workload = LOW;
+   pkt->policy_to_use = TIME;
+   pkt->command = PKT_POLICY;
+   pkt->core_type = CORE_TYPE_PHYSICAL;
+
+   json_object_foreach(element,

[dpdk-dev] [PATCH v6 08/10] examples/power: clean up verbose messages

2018-10-02 Thread David Hunt

Some messages appear several times a second; remove them as they are
unnecessary. Change other, less severe messages from INFO to DEBUG.

Signed-off-by: David Hunt 
Acked-by: Anatoly Burakov 
---
 examples/vm_power_manager/channel_monitor.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c
index afb44a069..5da531542 100644
--- a/examples/vm_power_manager/channel_monitor.c
+++ b/examples/vm_power_manager/channel_monitor.c
@@ -361,7 +361,7 @@ get_pcpu_to_control(struct policy *pol)
 
ci = get_core_info();
 
-   RTE_LOG(INFO, CHANNEL_MONITOR,
+   RTE_LOG(DEBUG, CHANNEL_MONITOR,
"Looking for pcpu for %s\n", pol->pkt.vm_name);
 
/*
@@ -528,8 +528,6 @@ apply_traffic_profile(struct policy *pol)
 
diff = get_pkt_diff(pol);
 
-   RTE_LOG(INFO, CHANNEL_MONITOR, "Applying traffic profile\n");
-
if (diff >= (pol->pkt.traffic_policy.max_max_packet_thresh)) {
for (count = 0; count < pol->pkt.num_vcpu; count++) {
if (pol->core_share[count].status != 1)
@@ -573,9 +571,6 @@ apply_time_profile(struct policy *pol)
if (pol->core_share[count].status != 1) {
power_manager_scale_core_max(
pol->core_share[count].pcpu);
-   RTE_LOG(INFO, CHANNEL_MONITOR,
-   "Scaling up core %d to max\n",
-   pol->core_share[count].pcpu);
}
}
break;
@@ -585,9 +580,6 @@ apply_time_profile(struct policy *pol)
if (pol->core_share[count].status != 1) {
power_manager_scale_core_min(
pol->core_share[count].pcpu);
-   RTE_LOG(INFO, CHANNEL_MONITOR,
-   "Scaling down core %d to min\n",
-   pol->core_share[count].pcpu);
}
}
break;
@@ -649,8 +641,6 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info)
if (chan_info == NULL)
return -1;
 
-   RTE_LOG(INFO, CHANNEL_MONITOR, "Processing Request %s\n", pkt->vm_name);
-
if (rte_atomic32_cmpset(&(chan_info->status), CHANNEL_MGR_CHANNEL_CONNECTED,
CHANNEL_MGR_CHANNEL_PROCESSING) == 0)
return -1;
@@ -719,8 +709,8 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info)
}
 
if (pkt->command == PKT_POLICY) {
-   RTE_LOG(INFO, CHANNEL_MONITOR,
-   "\nProcessing Policy request\n");
+   RTE_LOG(INFO, CHANNEL_MONITOR, "Processing policy request %s\n",
+   pkt->vm_name);
update_policy(pkt);
policy_is_set = 1;
}
@@ -904,7 +894,8 @@ run_channel_monitor(void)
global_events_list[i].data.ptr;
if ((global_events_list[i].events & EPOLLERR) ||
(global_events_list[i].events & EPOLLHUP)) {
-   RTE_LOG(DEBUG, CHANNEL_MONITOR, "Remote closed connection for "
+   RTE_LOG(INFO, CHANNEL_MONITOR,
+   "Remote closed connection for "
"channel '%s'\n",
chan_info->channel_path);
remove_channel(&chan_info);
-- 
2.17.1

[dpdk-dev] [PATCH v6 05/10] examples/power: add host channel to power manager

2018-10-02 Thread David Hunt

This patch adds a fifo channel to the vm_power_manager app through which
we can send commands and policies. It is intended for sending JSON strings.
The fifo is at /tmp/powermonitor/fifo

Signed-off-by: David Hunt 
Acked-by: Anatoly Burakov 
---
 examples/vm_power_manager/channel_manager.c | 109 +++
 examples/vm_power_manager/channel_manager.h |  17 +++
 examples/vm_power_manager/channel_monitor.c | 142 +++-
 examples/vm_power_manager/main.c|   2 +
 4 files changed, 236 insertions(+), 34 deletions(-)

diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c
index 2e471d0c1..4fac099df 100644
--- a/examples/vm_power_manager/channel_manager.c
+++ b/examples/vm_power_manager/channel_manager.c
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -284,6 +285,38 @@ open_non_blocking_channel(struct channel_info *info)
return 0;
 }
 
+static int
+open_host_channel(struct channel_info *info)
+{
+   int flags;
+
+   info->fd = open(info->channel_path, O_RDWR | O_RSYNC);
+   if (info->fd == -1) {
+   RTE_LOG(ERR, CHANNEL_MANAGER, "Error(%s) opening fifo for '%s'\n",
+   strerror(errno),
+   info->channel_path);
+   return -1;
+   }
+
+   /* Get current flags */
+   flags = fcntl(info->fd, F_GETFL, 0);
+   if (flags < 0) {
+   RTE_LOG(WARNING, CHANNEL_MANAGER, "Error(%s) fcntl get flags socket for"
+   "'%s'\n", strerror(errno), info->channel_path);
+   return -1;
+   }
+   /* Set to Non Blocking */
+   flags |= O_NONBLOCK;
+   if (fcntl(info->fd, F_SETFL, flags) < 0) {
+   RTE_LOG(WARNING, CHANNEL_MANAGER,
+   "Error(%s) setting non-blocking "
+   "socket for '%s'\n",
+   strerror(errno), info->channel_path);
+   return -1;
+   }
+   return 0;
+}
+
 static int
 setup_channel_info(struct virtual_machine_info **vm_info_dptr,
struct channel_info **chan_info_dptr, unsigned channel_num)
@@ -294,6 +327,7 @@ setup_channel_info(struct virtual_machine_info **vm_info_dptr,
chan_info->channel_num = channel_num;
chan_info->priv_info = (void *)vm_info;
chan_info->status = CHANNEL_MGR_CHANNEL_DISCONNECTED;
+   chan_info->type = CHANNEL_TYPE_BINARY;
if (open_non_blocking_channel(chan_info) < 0) {
RTE_LOG(ERR, CHANNEL_MANAGER, "Could not open channel: "
"'%s' for VM '%s'\n",
@@ -316,6 +350,42 @@ setup_channel_info(struct virtual_machine_info **vm_info_dptr,
return 0;
 }
 
+static void
+fifo_path(char *dst, unsigned int len)
+{
+   snprintf(dst, len, "%sfifo", CHANNEL_MGR_SOCKET_PATH);
+}
+
+static int
+setup_host_channel_info(struct channel_info **chan_info_dptr,
+   unsigned int channel_num)
+{
+   struct channel_info *chan_info = *chan_info_dptr;
+
+   chan_info->channel_num = channel_num;
+   chan_info->priv_info = (void *)NULL;
+   chan_info->status = CHANNEL_MGR_CHANNEL_DISCONNECTED;
+   chan_info->type = CHANNEL_TYPE_JSON;
+
+   fifo_path(chan_info->channel_path, sizeof(chan_info->channel_path));
+
+   if (open_host_channel(chan_info) < 0) {
+   RTE_LOG(ERR, CHANNEL_MANAGER, "Could not open host channel: "
+   "'%s'\n",
+   chan_info->channel_path);
+   return -1;
+   }
+   if (add_channel_to_monitor(&chan_info) < 0) {
+   RTE_LOG(ERR, CHANNEL_MANAGER, "Could not add channel: "
+   "'%s' to epoll ctl\n",
+   chan_info->channel_path);
+   return -1;
+
+   }
+   chan_info->status = CHANNEL_MGR_CHANNEL_CONNECTED;
+   return 0;
+}
+
 int
 add_all_channels(const char *vm_name)
 {
@@ -470,6 +540,45 @@ add_channels(const char *vm_name, unsigned *channel_list,
return num_channels_enabled;
 }
 
+int
+add_host_channel(void)
+{
+   struct channel_info *chan_info;
+   char socket_path[PATH_MAX];
+   int num_channels_enabled = 0;
+   int ret;
+
+   fifo_path(socket_path, sizeof(socket_path));
+
+   ret = mkfifo(socket_path, 0660);
+   if ((errno != EEXIST) && (ret < 0)) {
+   RTE_LOG(ERR, CHANNEL_MANAGER, "Cannot create fifo '%s' error: "
+   "%s\n", socket_path, strerror(errno));
+   return 0;
+   }
+
+   if (access(socket_path, F_OK) < 0) {
+   RTE_LOG(ERR, CHANNEL_MANAGER, "Channel path '%s' error: "
+   "%s\n", socket_path, strerror(errno));
+   return 0;
+   }
+   chan_info = rte_malloc(NULL, sizeof(*chan_info), 0);
+   if (chan_info == NULL) {
+

[dpdk-dev] [PATCH v6 10/10] doc/vm_power_manager: add JSON interface API info

2018-10-02 Thread David Hunt

Also added meson/ninja build info

Signed-off-by: David Hunt 
Acked-by: Marko Kovacevic 
---
 .../sample_app_ug/vm_power_management.rst | 300 +-
 1 file changed, 298 insertions(+), 2 deletions(-)

diff --git a/doc/guides/sample_app_ug/vm_power_management.rst b/doc/guides/sample_app_ug/vm_power_management.rst
index 855570d6b..1ad4f1490 100644
--- a/doc/guides/sample_app_ug/vm_power_management.rst
+++ b/doc/guides/sample_app_ug/vm_power_management.rst
@@ -199,7 +199,7 @@ see :doc:`compiling`.
 
 The application is located in the ``vm_power_manager`` sub-directory.
 
-To build just the ``vm_power_manager`` application:
+To build just the ``vm_power_manager`` application using ``make``:
 
 .. code-block:: console
 
@@ -208,6 +208,22 @@ To build just the ``vm_power_manager`` application:
   cd ${RTE_SDK}/examples/vm_power_manager/
   make
 
+The resulting binary will be ${RTE_SDK}/build/examples/vm_power_manager
+
+To build just the ``vm_power_manager`` application using ``meson/ninja``:
+
+.. code-block:: console
+
+  export RTE_SDK=/path/to/rte_sdk
+  cd ${RTE_SDK}
+  meson build
+  cd build
+  ninja
+  meson configure -Dexamples=vm_power_manager
+  ninja
+
+The resulting binary will be ${RTE_SDK}/build/examples/dpdk-vm_power_manager
+
 Running
 ~~~~~~~
 
@@ -337,6 +353,270 @@ monitoring of branch ratio on cores doing busy polling via PMDs.
   and will need to be adjusted for different workloads.
 
 
+
+JSON API
+~~~~~~~~
+
+In addition to the command line interface for host command and a virtio-serial
+interface for VM power policies, there is also a JSON interface through which
+power commands and policies can be sent. This functionality adds a dependency
+on the Jansson library, and the Jansson development package must be installed
+on the system before the JSON parsing functionality is included in the app.
+This is achieved by:
+
+  .. code-block:: console
+
+    apt-get install libjansson-dev
+
+The command and package name may be different depending on your operating
+system. It's worth noting that the app will successfully build without this
+package present, but a warning is shown during compilation, and the JSON
+parsing functionality will not be present in the app.
+
+Sending a command or policy to the power manager application is achieved by
+simply opening a fifo file, writing a JSON string to that fifo, and closing
+the file.
+
+The fifo is at /tmp/powermonitor/fifo
+
+The JSON string can be a policy or instruction, and takes the following
+format:
+
+  .. code-block:: javascript
+
+    {"packet_type": {
+      "pair_1": value,
+      "pair_2": value
+    }}
+
+The 'packet_type' header can contain one of two values, depending on
+whether a policy or power command is being sent. The two possible values are
+"policy" and "instruction", and the expected name-value pairs are different
+depending on which type is being sent.
+
+The pairs are in the format of standard JSON name-value pairs. The value type
+varies between the different name/value pairs, and may be integers, strings,
+arrays, etc. Examples of policies follow later in this document. The allowed
+names and value types are as follows:
+
+
+:Pair Name: "name"
+:Description: Name of the VM or Host. Allows the parser to associate the
+  policy with the relevant VM or Host OS.
+:Type: string
+:Values: any valid string
+:Required: yes
+:Example:
+
+.. code-block:: javascript
+
+  "name": "ubuntu2"
+
+
+:Pair Name: "command"
+:Description: The type of packet we're sending to the power manager. We can be
+  creating or destroying a policy, or sending a direct command to adjust
+  the frequency of a core, similar to the command line interface.
+:Type: string
+:Values:
+
+  :CREATE: used when creating a new policy,
+  :DESTROY: used when removing a policy,
+  :POWER: used when sending an immediate command, max, min, etc.
+:Required: yes
+:Example:
+
+.. code-block:: javascript
+
+  "command": "CREATE"
+
+
+:Pair Name: "policy_type"
+:Description: Type of policy to apply. Please see vm_power_manager documentation
+  for more information on the types of policies that may be used.
+:Type: string
+:Values:
+
+  :TIME: Time-of-day policy. Frequencies of the relevant cores are
+scaled up/down depending on busy and quiet hours.
+  :TRAFFIC: This policy takes statistics from the NIC and scales up
+and down accordingly.
+  :WORKLOAD: This policy looks at how heavily loaded the cores are,
+and scales up and down accordingly.
+  :BRANCH_RATIO: This out-of-band policy can look at the ratio between
+branch hits and misses on a core, and is useful for detecting
+how much packet processing a core is doing.
+:Required: only for CREATE/DESTROY command
+:Example:
+
+  .. code-block:: javascript
+
+    "policy_type": "TIME"
+
+:Pair Name: "busy_hours"
+:Description: The hours of the day in which we scale up the cores for busy
+  times.
+:Type: array of integers
+:Values: array with list of hour numbers, (0-23)

[dpdk-dev] [PATCH v6 09/10] examples/power: add meson/ninja build support

2018-10-02 Thread David Hunt

Add meson.build in vm_power_manager and the guest_cli subdirectory.
Building can be achieved by going to the build directory, and using

meson configure -Dexamples=vm_power_manager,vm_power_manager/guest_cli

Then, when ninja is invoked, it will build dpdk-vm_power_manager and
dpdk-guest_cli

Work still needs to be done on the meson build system to handle the case
where the target list of example apps is defined as 'all'. That will come
in a future patch.

Signed-off-by: David Hunt 
Acked-by: Bruce Richardson 
---
 .../vm_power_manager/guest_cli/meson.build| 21 +++
 examples/vm_power_manager/meson.build | 37 ++-
 2 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 examples/vm_power_manager/guest_cli/meson.build

diff --git a/examples/vm_power_manager/guest_cli/meson.build b/examples/vm_power_manager/guest_cli/meson.build
new file mode 100644
index 0..9e821ceb8
--- /dev/null
+++ b/examples/vm_power_manager/guest_cli/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+# meson file, for building this example as part of a main DPDK build.
+#
+# To build this example as a standalone application with an already-installed
+# DPDK instance, use 'make'
+
+# Setting the name here because the default name will conflict with the
+# vm_power_manager app because of the way the directories are parsed.
+name = 'guest_cli'
+
+deps += ['power']
+
+sources = files(
+   'main.c', 'parse.c', 'vm_power_cli_guest.c'
+)
+
+opt_dep = cc.find_library('virt', required : false)
+build = opt_dep.found()
+ext_deps += opt_dep
diff --git a/examples/vm_power_manager/meson.build b/examples/vm_power_manager/meson.build
index c370d7476..f98445bc6 100644
--- a/examples/vm_power_manager/meson.build
+++ b/examples/vm_power_manager/meson.build
@@ -6,5 +6,38 @@
 # To build this example as a standalone application with an already-installed
 # DPDK instance, use 'make'
 
-# Example app currently unsupported by meson build
-build = false
+if dpdk_conf.has('RTE_LIBRTE_BNXT_PMD')
+   deps += ['pmd_bnxt']
+endif
+
+if dpdk_conf.has('RTE_LIBRTE_I40E_PMD')
+   deps += ['pmd_i40e']
+endif
+
+if dpdk_conf.has('RTE_LIBRTE_IXGBE_PMD')
+   deps += ['pmd_ixgbe']
+endif
+
+deps += ['power']
+
+
+sources = files(
+   'channel_manager.c', 'channel_monitor.c', 'main.c', 'parse.c', 'power_manager.c', 'vm_power_cli.c'
+)
+
+# If we're on X86, pull in the x86 code for the branch monitor algo.
+if dpdk_conf.has('RTE_ARCH_X86_64')
+   sources += files('oob_monitor_x86.c')
+else
+   sources += files('oob_monitor_nop.c')
+endif
+
+opt_dep = cc.find_library('virt', required : false)
+build = opt_dep.found()
+ext_deps += opt_dep
+
+opt_dep = dependency('jansson', required : false)
+if opt_dep.found()
+   ext_deps += opt_dep
+   cflags += '-DUSE_JANSSON'
+endif
-- 
2.17.1

Re: [dpdk-dev] [PATCH] app/testpmd: check Rx VLAN offload flag to print VLAN TCI

2018-10-02 Thread Ferruh Yigit

On 10/2/2018 3:29 AM, Hyong Youb Kim wrote:
> On Mon, Oct 01, 2018 at 03:01:40PM +0100, Ferruh Yigit wrote:
>> On 9/26/2018 4:06 AM, John Daley wrote:
>>> From: Hyong Youb Kim 
>>>
>>> Since the following commit, PKT_RX_VLAN indicates the presence of
>>> mbuf's vlan_tci, not PKT_RX_VLAN_STRIPPED.
>>>
>>> commit 380a7aab1ae2 ("mbuf: rename deprecated VLAN flags")
>>> Cc: olivier.m...@6wind.com
>>>
>>> Signed-off-by: Hyong Youb Kim 
>>> Reviewed-by: John Daley 
>>> ---
>>>  app/test-pmd/rxonly.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
>>> index a93d80612..e8d226624 100644
>>> --- a/app/test-pmd/rxonly.c
>>> +++ b/app/test-pmd/rxonly.c
>>> @@ -130,7 +130,7 @@ pkt_burst_receive(struct fwd_stream *fs)
>>> }
>>> if (ol_flags & PKT_RX_TIMESTAMP)
>>> printf(" - timestamp %"PRIu64" ", mb->timestamp);
>>> -   if (ol_flags & PKT_RX_VLAN_STRIPPED)
>>> +   if (ol_flags & PKT_RX_VLAN)
>>> printf(" - VLAN tci=0x%x", mb->vlan_tci);
>>> if (ol_flags & PKT_RX_QINQ_STRIPPED)
>>> printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
>>
>> Isn't same also correct for QinQ, PKT_RX_QINQ means mb->vlan_tci &
>> mb->vlan_tci_outer are set?
>>
> 
> That is a good point.
> 
> According to rte_mbuf.h, PKT_RX_QINQ means "The RX packet is a double
> VLAN, and the outer tci has been saved in in mbuf->vlan_tci_outer."
> 
> Here is a summary.
> PKT_RX_VLAN => vlan_tci is set
> PKT_RX_QINQ => vlan_tci_outer is set

Because of the comment on "PKT_RX_QINQ_STRIPPED" I think:
PKT_RX_QINQ => vlan_tci_outer & vlan_tci are set

Although it is not clear from the "PKT_RX_QINQ" comment.

> PKT_RX_VLAN_STRIPPED => must also set PKT_RX_VLAN
> PKT_RX_QINQ_STRIPPED => must also set PKT_RX_VLAN, PKT_RX_QINQ,
> PKT_RX_VLAN_STRIPPED
> 
> Looks like i40e is the only driver that is using
> PKT_RX_QINQ_STRIPPED. And, it does not set PKT_RX_QINQ. I am CC'ing
> i40e maintainers.
> 
> Back to rxonly..
> 
> +   if (ol_flags & (PKT_RX_QINQ | PKT_RX_VLAN))
> printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
> mb->vlan_tci, mb->vlan_tci_outer);
> 
> A change like this would be technically correct, but may break i40e
> test cases. Or, if the above message is really meant for 'stripped',
> then perhaps add comment or rephrase the message for now?
> 
> As for the use of PKT_RX_VLAN, some drivers like enic and ixgbe can
> set PKT_RX_VLAN independent of vlan stripping, which led me to writing
> this patch. I think Olivier fixed all drivers when he introduced
> PKT_RX_VLAN. So using PKT_RX_VLAN in rxonly shouldn't be breaking
> anyone's test cases.

+1 to PKT_RX_VLAN update.

I was thinking PKT_RX_QINQ could also be fixed quickly in testpmd with this
patch. But since that may affect other pieces of code, I agree to take this
patch as it is and consider the QINQ changes in a separate patch.
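
The flag relationships summarised in the quoted text (PKT_RX_VLAN_STRIPPED
implies PKT_RX_VLAN; PKT_RX_QINQ_STRIPPED implies PKT_RX_VLAN, PKT_RX_QINQ
and PKT_RX_VLAN_STRIPPED) can be written down as a small consistency check.
The sketch below is illustrative only: the DEMO_* bit values and the
function name are made up for this example and deliberately do not use the
real definitions from rte_mbuf.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; the real flags live in rte_mbuf.h. */
#define DEMO_RX_VLAN          (UINT64_C(1) << 0)
#define DEMO_RX_VLAN_STRIPPED (UINT64_C(1) << 1)
#define DEMO_RX_QINQ          (UINT64_C(1) << 2)
#define DEMO_RX_QINQ_STRIPPED (UINT64_C(1) << 3)

static_assert((DEMO_RX_VLAN | DEMO_RX_VLAN_STRIPPED |
	       DEMO_RX_QINQ | DEMO_RX_QINQ_STRIPPED) == 0xf,
	      "demo flags must be four distinct bits");

/*
 * Return true if the Rx VLAN flags obey the implications from the
 * summary in this thread:
 *   VLAN_STRIPPED => VLAN
 *   QINQ_STRIPPED => VLAN, QINQ and VLAN_STRIPPED
 */
static bool
demo_rx_vlan_flags_consistent(uint64_t ol_flags)
{
	const uint64_t qinq_needs = DEMO_RX_VLAN | DEMO_RX_QINQ |
				    DEMO_RX_VLAN_STRIPPED;

	if ((ol_flags & DEMO_RX_VLAN_STRIPPED) && !(ol_flags & DEMO_RX_VLAN))
		return false;
	if ((ol_flags & DEMO_RX_QINQ_STRIPPED) &&
	    (ol_flags & qinq_needs) != qinq_needs)
		return false;
	return true;
}
```

Under this check, a driver that sets only the QINQ_STRIPPED bit, as
described for i40e above, would be reported as inconsistent.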

Re: [dpdk-dev] [PATCH] app/testpmd: check Rx VLAN offload flag to print VLAN TCI

2018-10-02 Thread Ferruh Yigit

On 9/26/2018 4:06 AM, John Daley wrote:
> From: Hyong Youb Kim 
> 
> Since the following commit, PKT_RX_VLAN indicates the presence of
> mbuf's vlan_tci, not PKT_RX_VLAN_STRIPPED.
> 
> commit 380a7aab1ae2 ("mbuf: rename deprecated VLAN flags")
> Cc: olivier.m...@6wind.com
> 
> Signed-off-by: Hyong Youb Kim 
> Reviewed-by: John Daley 

Reviewed-by: Ferruh Yigit

Re: [dpdk-dev] [PATCH 1/4] ethdev: add SCTP Rx checksum offload support

2018-10-02 Thread Jerin Jacob

-Original Message-
> Date: Mon, 1 Oct 2018 17:11:50 +0100
> From: Ferruh Yigit 
> To: Jerin Jacob 
> CC: Wenzhuo Lu , Jingjing Wu ,
>  Bernard Iremonger , John McNamara
>  , Marko Kovacevic ,
>  Thomas Monjalon , Andrew Rybchenko
>  , dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/4] ethdev: add SCTP Rx checksum offload
>  support
> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> 
> On 10/1/2018 4:59 PM, Jerin Jacob wrote:
> > -Original Message-
> >> Date: Mon, 1 Oct 2018 14:46:39 +0100
> >> From: Ferruh Yigit 
> >> To: Jerin Jacob , Wenzhuo Lu
> >>  , Jingjing Wu , Bernard
> >>  Iremonger , John McNamara
> >>  , Marko Kovacevic ,
> >>  Thomas Monjalon , Andrew Rybchenko
> >>  
> >> CC: dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 1/4] ethdev: add SCTP Rx checksum offload
> >>  support
> >> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
> >>  Thunderbird/52.9.1
> >>
> >>
> >> On 9/13/2018 2:47 PM, Jerin Jacob wrote:
> >>> Signed-off-by: Jerin Jacob 
> >>
> >> Overall set looks good to me, I put some comments on individual patches.
> >>
> >> And can you please rebase on top of latest head?
> >
> > Sure.
> >
> > Regarding space issue mentioned in other email in this thread.
> > It looks like similar space added in other offloads.
> > example: http://git.dpdk.org/dpdk/tree/app/test-pmd/config.c#n571
> 
> Hi Jerin,
> 
> This is just detail, the alignment is broken in the output of the log, for
> others on/off start from column 56, for this one it is 57, just delete a space
> from printf please.

Sure Ferruh. Will add it in v2

> 
> >
> > So, I expect no change in this patch other than rebase to latest head.
> > If not, let me know.
> >
> >>
> >> Thanks,
> >> ferruh
> >>
>

Re: [dpdk-dev] [PATCH v6 4/5] vhost: unify message handling function signature

2018-10-02 Thread Maxime Coquelin





On 09/24/2018 10:17 PM, Nikolay Nikolaev wrote:

Each vhost-user message handling function will return an int result
which is described in the new enum vh_result: error, OK and reply.
All functions will now have two arguments, virtio_net double pointer
and VhostUserMsg pointer.

Signed-off-by: Nikolay Nikolaev 
---
  lib/librte_vhost/vhost_user.c |  211 -
  1 file changed, 125 insertions(+), 86 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 77905dda0..e1b705fa7 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -71,6 +71,16 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
  };
  
+/* The possible results of a message handling function */

+enum vh_result {
+   /* Message handling failed */
+   VH_RESULT_ERR   = -1,
+   /* Message handling successful */
+   VH_RESULT_OK=  0,
+   /* Message handling successful and reply prepared */
+   VH_RESULT_REPLY =  1,
+};
+




-vhost_user_get_vring_base(struct virtio_net *dev,
+vhost_user_get_vring_base(struct virtio_net **pdev,
  struct VhostUserMsg *msg)
  {
+   struct virtio_net *dev = *pdev;
struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index];
  
  	/* We have to stop the queue (virtio) if it is running. */

@@ -1135,7 +1161,7 @@ vhost_user_get_vring_base(struct virtio_net *dev,
  
  	msg->size = sizeof(msg->payload.state);
  
-	return 0;

+   return VH_RESULT_OK;
  }


VH_RESULT_REPLY here.


-static void
-vhost_user_get_protocol_features(struct virtio_net *dev,
+static int
+vhost_user_get_protocol_features(struct virtio_net **pdev,
 struct VhostUserMsg *msg)
  {
+   struct virtio_net *dev = *pdev;
uint64_t features, protocol_features;
  
  	rte_vhost_driver_get_features(dev->ifname, &features);

@@ -1189,40 +1217,46 @@ vhost_user_get_protocol_features(struct virtio_net *dev,
  
  	msg->payload.u64 = protocol_features;

msg->size = sizeof(msg->payload.u64);
+
+   return VH_RESULT_OK;
  }


Ditto.

I have the patches to fix these; they will be posted as a preliminary part
of my postcopy series.


Please, next time, test your series before posting.

Thanks,
Maxime
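
To illustrate why the return value matters in the review comments above,
here is a minimal sketch of a dispatcher that only sends a reply when the
handler returns VH_RESULT_REPLY. Everything prefixed with demo_ is
hypothetical: the message struct is a stripped-down stand-in for
struct VhostUserMsg, the handlers take only the message (the patch also
passes a struct virtio_net double pointer), and the reply is merely counted
instead of being written to a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result codes, mirroring the enum added by the quoted patch. */
enum vh_result {
	VH_RESULT_ERR   = -1,	/* Message handling failed */
	VH_RESULT_OK    =  0,	/* Handled, nothing to send back */
	VH_RESULT_REPLY =  1,	/* Handled, reply prepared in msg */
};

/* Hypothetical, stripped-down message; not the real VhostUserMsg. */
struct demo_msg {
	uint64_t payload;
	uint32_t size;
};

typedef int (*demo_handler_t)(struct demo_msg *msg);

/* Number of replies "sent"; stands in for writing to the socket. */
static int demo_replies_sent;

static int
demo_send_reply(const struct demo_msg *msg)
{
	(void)msg;
	demo_replies_sent++;
	return 0;
}

/* A "get" style handler fills in the payload, so it asks for a reply. */
static int
demo_get_features(struct demo_msg *msg)
{
	msg->payload = 0x1234;
	msg->size = sizeof(msg->payload);
	return VH_RESULT_REPLY;
}

/* A "set" style handler has nothing to send back. */
static int
demo_set_owner(struct demo_msg *msg)
{
	(void)msg;
	return VH_RESULT_OK;
}

/*
 * Run one handler and send a reply only if it asked for one.
 * Returns -1 on error, otherwise the number of replies sent (0 or 1).
 */
static int
demo_dispatch(demo_handler_t handler, struct demo_msg *msg)
{
	assert(handler != NULL && msg != NULL);

	switch (handler(msg)) {
	case VH_RESULT_REPLY:
		return demo_send_reply(msg) < 0 ? -1 : 1;
	case VH_RESULT_OK:
		return 0;
	default:
		return -1;
	}
}
```

With a dispatcher shaped like this, a "get" handler that fills in the
payload but returns VH_RESULT_OK would leave the peer waiting for a reply
that is never sent, which is the issue flagged for
vhost_user_get_vring_base() and vhost_user_get_protocol_features().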

Re: [dpdk-dev] [PATCH v7] app/testpmd: add forwarding mode to simulate a noisy neighbour

2018-10-02 Thread Jens Freimann

On Tue, Oct 02, 2018 at 09:57:52AM +0200, Thomas Monjalon wrote:

02/10/2018 09:19, Jens Freimann:

On Mon, Oct 01, 2018 at 01:13:32PM +, Iremonger, Bernard wrote:
>./devtools/check-git-log.sh -1
>Headline too long:
>app/testpmd: add forwarding mode to simulate a noisy neighbour

I'm sorry, I failed to use checkpatches.sh correctly :) I did:

#> git show | DPDK_CHECKPATCH_PATH="/home/jfreiman/code/linux/scripts/checkpatch.pl" devtools/checkpatches.sh --

1/1 valid patch

Why this command is not correct?

checkpatches.sh looks for the string "Subject:" which is not included
in git show output. Using cat on the patch file instead will work. 

regards,
Jens

Re: [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list

2018-10-02 Thread Burakov, Anatoly


On 01-Oct-18 6:01 PM, Stephen Hemminger wrote:

On Mon,  1 Oct 2018 13:56:09 +0100
Anatoly Burakov  wrote:


diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
uint64_t addr_64;
/**< Makes sure addr is always 64-bits */
};
+   size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
volatile uint32_t version; /**< version number for multiprocess sync. */


If you are going to break ABI, why not try and rearrange to eliminate holes:

Output of pahole (on x86 64 bit):

struct rte_memseg_list {
union {
void * base_va;  /* 0 8 */
uint64_t   addr_64;  /* 0 8 */
};   /* 0 8 */
size_t len;  /* 8 8 */
intsocket_id;/*16 4 */

/* XXX 4 bytes hole, try to pack */

uint64_t   page_sz;  /*24 8 */
volatile uint32_t  version;  /*32 4 */

/* XXX 4 bytes hole, try to pack */

struct rte_fbarray memseg_arr;   /*40 96 */

/* XXX last struct has 4 bytes of padding */

/* size: 136, cachelines: 3, members: 6 */
/* sum members: 128, holes: 2, sum holes: 8 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 8 bytes */
};



Hi Stephen,

This data structure isn't performance-critical in any remote sense, but 
sure, I can do that.


--
Thanks,
Anatoly
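
As a rough illustration of the rearrangement suggested above, the sketch
below compares the member order from the patch with one possible reordering
in which the two 4-byte members share a single 8-byte slot. The struct and
function names are hypothetical, and struct fbarray_stub is only a 96-byte,
8-byte-aligned stand-in for struct rte_fbarray (sized after the pahole
output). The sizes mentioned afterwards assume a typical LP64 x86-64 ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_fbarray: 96 bytes, 8-byte aligned. */
struct fbarray_stub {
	uint64_t opaque[12];
};

static_assert(sizeof(struct fbarray_stub) == 96,
	      "stub must match the 96 bytes reported by pahole");

/* Member order as in the quoted patch: two 4-byte holes on LP64. */
struct memseg_list_orig {
	union {
		void *base_va;
		uint64_t addr_64;
	};
	size_t len;
	int socket_id;
	uint64_t page_sz;
	volatile uint32_t version;
	struct fbarray_stub memseg_arr;
};

/* One possible reordering: socket_id and version sit side by side. */
struct memseg_list_packed {
	union {
		void *base_va;
		uint64_t addr_64;
	};
	size_t len;
	uint64_t page_sz;
	int socket_id;
	volatile uint32_t version;
	struct fbarray_stub memseg_arr;
};

/* Bytes saved by the reordering on the ABI this is compiled for. */
static size_t
memseg_list_bytes_saved(void)
{
	return sizeof(struct memseg_list_orig) -
	       sizeof(struct memseg_list_packed);
}
```

On x86-64 the first layout should come out at 136 bytes, matching the
pahole output, and the reordered one at 128 bytes. As noted in the reply,
this is about tidiness rather than performance.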

Re: [dpdk-dev] [PATCH 1/4] ethdev: add SCTP Rx checksum offload support

2018-10-02 Thread Ferruh Yigit

On 10/1/2018 4:59 PM, Jerin Jacob wrote:
> -Original Message-
>> Date: Mon, 1 Oct 2018 14:46:39 +0100
>> From: Ferruh Yigit 
>> To: Jerin Jacob , Wenzhuo Lu
>>  , Jingjing Wu , Bernard
>>  Iremonger , John McNamara
>>  , Marko Kovacevic ,
>>  Thomas Monjalon , Andrew Rybchenko
>>  
>> CC: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 1/4] ethdev: add SCTP Rx checksum offload
>>  support
>> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>>  Thunderbird/52.9.1
>>
>>
>> On 9/13/2018 2:47 PM, Jerin Jacob wrote:
>>> Signed-off-by: Jerin Jacob 
>>
>> Overall set looks good to me, I put some comments on individual patches.
>>
>> And can you please rebase on top of latest head?
> 
> Sure.
> 
> Regarding space issue mentioned in other email in this thread.
> It looks like similar space added in other offloads.
> example: http://git.dpdk.org/dpdk/tree/app/test-pmd/config.c#n571
> 
> So, I expect no change in this patch other than rebase to latest head.
> If not, let me know.

As commented to the patch, can you also check "csum show", "csum set" functions
in testpmd, I think they are affected and need to be updated with your patch.

[dpdk-dev] [PATCH] mbuf: clarify QINQ flag usage

2018-10-02 Thread Ferruh Yigit

Update the implementation so that when a PMD sets PKT_RX_QINQ_STRIPPED in
the mbuf ol_flags, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED & PKT_RX_VLAN
are also set.

Clarify the mbuf documentation: when PKT_RX_QINQ is set, PKT_RX_VLAN
should also be set.

This way an application can rely on the PKT_RX_QINQ flag to access both
mbuf.vlan_tci & mbuf.vlan_tci_outer

Signed-off-by: Ferruh Yigit 
---
Cc: Hyong Youb Kim 
Cc: John Daley 
---
 app/test-pmd/rxonly.c| 2 +-
 doc/guides/nics/features.rst | 7 ---
 drivers/net/i40e/i40e_rxtx.c | 3 ++-
 lib/librte_mbuf/rte_mbuf.c   | 1 +
 lib/librte_mbuf/rte_mbuf.h   | 5 +++--
 5 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index a93d80612..08a5fc2cf 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -132,7 +132,7 @@ pkt_burst_receive(struct fwd_stream *fs)
printf(" - timestamp %"PRIu64" ", mb->timestamp);
if (ol_flags & PKT_RX_VLAN_STRIPPED)
printf(" - VLAN tci=0x%x", mb->vlan_tci);
-   if (ol_flags & PKT_RX_QINQ_STRIPPED)
+   if (ol_flags & PKT_RX_QINQ)
printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
mb->vlan_tci, mb->vlan_tci_outer);
if (mb->packet_type) {
diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index b085bda86..c0cbe3784 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -528,7 +528,7 @@ Supports VLAN offload to hardware.
 * **[uses]   rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_VLAN_STRIP,DEV_RX_OFFLOAD_VLAN_FILTER,DEV_RX_OFFLOAD_VLAN_EXTEND``.
 * **[uses]   rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_VLAN_INSERT``.
 * **[implements] eth_dev_ops**: ``vlan_offload_set``.
-* **[provides]   mbuf**: ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.vlan_tci``.
+* **[provides]   mbuf**: ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.ol_flags:PKT_RX_VLAN`` ``mbuf.vlan_tci``.
 * **[provides]   rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_VLAN_STRIP``,
   ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_VLAN_INSERT``.
 * **[related]API**: ``rte_eth_dev_set_vlan_offload()``,
@@ -545,8 +545,9 @@ Supports QinQ (queue in queue) offload.
 * **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_QINQ_STRIP``.
 * **[uses] rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_QINQ_INSERT``.
 * **[uses] mbuf**: ``mbuf.ol_flags:PKT_TX_QINQ_PKT``.
-* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_QINQ_STRIPPED``, ``mbuf.vlan_tci``,
-   ``mbuf.vlan_tci_outer``.
+* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_QINQ_STRIPPED``, ``mbuf.ol_flags:PKT_RX_QINQ``,
+  ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.ol_flags:PKT_RX_VLAN``
+  ``mbuf.vlan_tci``, ``mbuf.vlan_tci_outer``.
 * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_QINQ_STRIP``,
   ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_QINQ_INSERT``.
 
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7c986d535..b2819f757 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -83,7 +83,8 @@ i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp)
 #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC
if (rte_le_to_cpu_16(rxdp->wb.qword2.ext_status) &
(1 << I40E_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) {
-   mb->ol_flags |= PKT_RX_QINQ_STRIPPED;
+   mb->ol_flags |= PKT_RX_QINQ_STRIPPED | PKT_RX_QINQ |
+   PKT_RX_VLAN_STRIPPED | PKT_RX_VLAN;
mb->vlan_tci_outer = mb->vlan_tci;
mb->vlan_tci = rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_2);
PMD_RX_LOG(DEBUG, "Descriptor l2tag2_1: %u, l2tag2_2: %u",
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index e714c5a59..05a5a17fe 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -297,6 +297,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
case PKT_RX_QINQ_STRIPPED: return "PKT_RX_QINQ_STRIPPED";
+   case PKT_RX_QINQ: return "PKT_RX_QINQ";
case PKT_RX_LRO: return "PKT_RX_LRO";
case PKT_RX_TIMESTAMP: return "PKT_RX_TIMESTAMP";
case PKT_RX_SEC_OFFLOAD: return "PKT_RX_SEC_OFFLOAD";
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index a50b05c64..d018f19bd 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -140,7 +140,7 @@ extern "C" {
  * The 2 vlans have been stripped by the hardware and their tci are
  * saved in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
  * This can only happen if vlan stripping is enabled in the R

[dpdk-dev] [PATCH v2 00/17] vhost: add postcopy live-migration support

2018-10-02 Thread Maxime Coquelin

In this v2:
- Rebase on top of Nikolay's message handling series. It
requires passing an extra parameter to message handlers
(the fd of the socket, as set_mem_table needs to send an
intermediate reply).
- Preliminary patches to fix issues with message handling
rework not handling replies properly.
- Handle userfaultfd region registration errors properly.
- Don't build postcopy by default, as userfaultfd only
landed in the v4.3 kernel. It gets automatically enabled with
Meson if headers are present.

With classic live-migration, the VM runs on source while its
content is being migrated to destination. When pages already
migrated to destination are dirtied by the source, they get
copied again, until both source and destination memory converge.
At that point, the source is stopped and destination is
started.

With postcopy live-migration, the VM is started on destination
before all the memory has been migrated. When the VM tries to
access a page that hasn't been migrated yet, a pagefault is
triggered and handled by userfaultfd, which pauses the thread.
A Qemu thread in charge of postcopy requests the missing page
from the source. Once received and mapped, the paused thread
gets resumed.

Userfaultfd supports handling faults from a different process,
and Qemu supports postcopy with vhost-user backends since
v2.12.

One problem encountered with classic live-migration for VMs
relying on vhost-user backends is that when the traffic is
high (e.g. PVP), migration may never converge, as
pages get dirtied at a faster rate than they are copied
to the destination.
This problem is expected to be solved by using
postcopy, as ring memory and buffers will be copied only once,
when the destination pagefaults on them.
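
To make the convergence argument concrete, here is a toy model in C. It is
not taken from Qemu or DPDK: it assumes constant copy and dirty rates in
pages per second, and precopy_rounds() with all its parameters is made up
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy precopy model. Each round copies the currently dirty pages; while
 * that copy is running, the guest dirties pages at dirty_rate.
 * Returns the round in which the dirty set drops to <= threshold pages,
 * or -1 if that does not happen within max_rounds.
 */
static int
precopy_rounds(uint64_t total_pages, uint64_t copy_rate,
		uint64_t dirty_rate, uint64_t threshold, int max_rounds)
{
	uint64_t dirty = total_pages;

	assert(copy_rate > 0);

	for (int round = 1; round <= max_rounds; round++) {
		/* copy time = dirty / copy_rate, so new dirty pages are: */
		dirty = dirty * dirty_rate / copy_rate;
		/* The guest cannot dirty more pages than it owns */
		if (dirty > total_pages)
			dirty = total_pages;
		if (dirty <= threshold)
			return round;
	}

	return -1;
}
```

With a 3G guest (786432 pages of 4K), a copy rate of 100000 pages/s and a
dirty rate of 50000 pages/s, the dirty set halves every round. Once the
dirty rate reaches the copy rate, the model never gets below the threshold,
which is the situation postcopy is meant to avoid.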

Note that it will certainly require a rebase to apply on top
of Nikolay's vhost-user message handling rework.

Steps to test postcopy:
1. Run DPDK's Testpmd application on source:
./install/bin/testpmd -m 512 --file-prefix=src -l 0,2 -n 4 \
  --vdev 'net_vhost0,iface=/tmp/vu-src' -- --portmask=1 -i \
  --rxq=1 --txq=1 --nb-cores=1 --eth-peer=0,52:54:00:11:22:12 \
  --no-mlockall

2. Run DPDK's Testpmd application on destination:
./install/bin/testpmd -m 512 --file-prefix=dst -l 0,2 -n 4 \
  --vdev 'net_vhost0,iface=/tmp/vu-dst,postcopy-support=1' -- --portmask=1 -i \
  --rxq=1 --txq=1 --nb-cores=1 --eth-peer=0,52:54:00:11:22:12 \
  --no-mlockall

3. Launch VM on source:
./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 3G -smp 2 -cpu host \
  -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \
  -numa node,memdev=mem -mem-prealloc \
  -chardev socket,id=char0,path=/tmp/vu-src \
  -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
  -device virtio-net-pci,netdev=mynet1 /home/virt/rhel7.6-1-clone.qcow2 \
  -net none -vnc :0 -monitor stdio

4. Launch VM on destination:
./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 3G -smp 2 -cpu host \
  -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \
  -numa node,memdev=mem -mem-prealloc \
  -chardev socket,id=char0,path=/tmp/vu-dst \
  -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
  -device virtio-net-pci,netdev=mynet1 /home/virt/rhel7.6-1-clone.qcow2 \
  -net none -vnc :1 -monitor stdio -incoming tcp::

5. In both testpmd prompts, start flooding the virtio-net device:
testpmd> set fwd txonly
testpmd> start

6. In destination's Qemu monitor, enable postcopy:
(qemu) migrate_set_capability postcopy-ram on

7. In source's Qemu monitor, enable postcopy and launch migration:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:0:
(qemu) migrate_start_postcopy

Maxime Coquelin (17):
  vhost: fix messages error checks
  vhost: fix return code of messages requiring replies
  vhost: fix error handling when mem table gets updated
  vhost: define postcopy protocol flag
  vhost: add number of fds to vhost-user messages and use it
  vhost: pass socket fd to message handling callbacks
  vhost: enable fds passing when sending vhost-user messages
  vhost: add config flag for postcopy feature
  vhost: introduce postcopy's advise message
  vhost: add support for postcopy's listen message
  vhost: register new regions with userfaultfd
  vhost: avoid useless VhostUserMemory copy
  vhost: send userfault range addresses back to qemu
  vhost: add support to postcopy's end request
  vhost: enable postcopy protocol feature
  vhost: add flag to enable postcopy live-migration
  net/vhost: add parameter to enable postcopy support

 config/common_linuxapp  |   1 +
 doc/guides/nics/vhost.rst   |   5 +
 doc/guides/prog_guide/vhost_lib.rst |   8 +
 drivers/net/vhost/rte_eth_vhost.c   |  13 ++
 lib/librte_vhost/meson.build|   2 +
 lib/librte_vhost/rte_vhost.h|   5 +
 lib/librte_vhost/socket.c   |  40 +++-
 lib/librte_vhost/vhost.h|   3 +
 lib/librte_vhost/vhost_user.c   | 304 +++-
 lib/librte_vhost/vhost_user.h   |  12 +-
 10 files change

[dpdk-dev] [PATCH v2 01/17] vhost: fix messages error checks

2018-10-02 Thread Maxime Coquelin

The return value of message handlers has changed to an enum that
takes a positive value when a reply is needed. But the code
checking that value afterwards has not been updated, leading to
successfully handled messages being treated as errors.

Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")

Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 7ef3fb4a4..060b41893 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
}
 
 skip_to_post_handle:
-   if (!ret && dev->extern_ops.post_msg_handle) {
+   if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
uint32_t need_reply;
 
ret = (*dev->extern_ops.post_msg_handle)(
@@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
vhost_user_unlock_all_queue_pairs(dev);
 
if (msg.flags & VHOST_USER_NEED_REPLY) {
-   msg.payload.u64 = !!ret;
+   msg.payload.u64 = ret == VH_RESULT_ERR;
msg.size = sizeof(msg.payload.u64);
send_vhost_reply(fd, &msg);
-   } else if (ret) {
+   } else if (ret == VH_RESULT_ERR) {
RTE_LOG(ERR, VHOST_CONFIG,
"vhost message handling failed.\n");
return -1;
-- 
2.17.1
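
For readers following the thread, the difference between the old and new
checks in patch 01/17 can be sketched in isolation. The numeric enum values
below are an assumption (the commit message only says the reply case is a
non-zero, non-negative value), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, for illustration only */
enum vh_result {
	VH_RESULT_ERR = -1,	/* handling failed */
	VH_RESULT_OK = 0,	/* handled, nothing to send back */
	VH_RESULT_REPLY = 1,	/* handled, reply prepared in the message */
};

/* Check as written before the patch: any non-zero value looks like a failure */
static bool
handling_failed_old(int ret)
{
	return ret != 0;
}

/* Check after the patch: only VH_RESULT_ERR is a failure */
static bool
handling_failed_new(int ret)
{
	return ret == VH_RESULT_ERR;
}

/* Value stored in payload.u64 of a NEED_REPLY ack: 0 on success, 1 on error */
static unsigned long long
ack_payload(int ret)
{
	assert(ret >= VH_RESULT_ERR && ret <= VH_RESULT_REPLY);
	return ret == VH_RESULT_ERR;
}
```

The old check misclassifies VH_RESULT_REPLY as an error, while the new one
accepts both VH_RESULT_OK and VH_RESULT_REPLY.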

[dpdk-dev] [PATCH v2 02/17] vhost: fix return code of messages requiring replies

2018-10-02 Thread Maxime Coquelin

VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
and VHOST_USER_SET_LOG_BASE require replies, so their handlers
should return VH_RESULT_REPLY, not VH_RESULT_OK.

Fixes: 2cfbbb86c62a ("vhost: unify message handling function signature")

Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 060b41893..ce0ac0098 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1161,7 +1161,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
 
msg->size = sizeof(msg->payload.state);
 
-   return VH_RESULT_OK;
+   return VH_RESULT_REPLY;
 }
 
 /*
@@ -1218,7 +1218,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
msg->payload.u64 = protocol_features;
msg->size = sizeof(msg->payload.u64);
 
-   return VH_RESULT_OK;
+   return VH_RESULT_REPLY;
 }
 
 static int
@@ -1298,7 +1298,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct 
VhostUserMsg *msg)
 
msg->size = sizeof(msg->payload.u64);
 
-   return VH_RESULT_OK;
+   return VH_RESULT_REPLY;
 }
 
 static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused,
-- 
2.17.1
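
A minimal sketch of why the return code matters for patch 02/17, assuming
the dispatcher only sends a reply when a handler returns VH_RESULT_REPLY.
The message struct, the handlers and example_dispatch() are simplified
stand-ins, not the vhost library code; the enum values are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum vh_result {
	VH_RESULT_ERR = -1,
	VH_RESULT_OK = 0,
	VH_RESULT_REPLY = 1,
};

/* Hypothetical, stripped-down message */
struct example_msg {
	uint64_t payload;
	uint32_t size;
};

/* "Get" style handler: fills the payload, so it must ask for a reply */
static int
example_get_features(struct example_msg *msg)
{
	msg->payload = 0x1234;
	msg->size = sizeof(msg->payload);
	return VH_RESULT_REPLY;
}

/* "Set" style handler: nothing to send back */
static int
example_set_owner(struct example_msg *msg)
{
	(void)msg;
	return VH_RESULT_OK;
}

/* Returns 0 on success, -1 on error; counts the replies it would send */
static int
example_dispatch(int (*handler)(struct example_msg *),
		struct example_msg *msg, int *replies_sent)
{
	assert(handler != NULL && msg != NULL && replies_sent != NULL);

	switch (handler(msg)) {
	case VH_RESULT_REPLY:
		(*replies_sent)++;	/* stand-in for send_vhost_reply() */
		return 0;
	case VH_RESULT_OK:
		return 0;
	default:
		return -1;
	}
}
```

Under that assumption, a handler that fills the payload but returns
VH_RESULT_OK would leave the master waiting for an answer that is never sent.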

[dpdk-dev] [PATCH v2 03/17] vhost: fix error handling when mem table gets updated

2018-10-02 Thread Maxime Coquelin

When the memory table gets updated, the ring addresses need
to be translated again. If translation fails, we need to exit
cleanly by unmapping memory regions.

Fixes: d5022533c20a ("vhost: retranslate vring addr when memory table changes")
Cc: sta...@dpdk.org

Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ce0ac0098..c669d3c0a 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -964,7 +964,8 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct 
VhostUserMsg *msg)
 
dev = translate_ring_addresses(dev, i);
if (!dev)
-   return VH_RESULT_ERR;
+   goto err_mmap;
+
 
*pdev = dev;
}
-- 
2.17.1
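
The cleanup pattern applied by patch 03/17 can be shown with a small
standalone sketch. It uses malloc()/free() in place of mmap()/munmap(), and
every name in it (struct example_dev, example_translate(),
example_set_mem_table()) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device: mem stands in for the mapped guest regions */
struct example_dev {
	void *mem;
};

/* Stand-in for translate_ring_addresses(): fails for ring fail_at */
static int
example_translate(int vq_index, int fail_at)
{
	return vq_index == fail_at ? -1 : 0;
}

static int
example_set_mem_table(struct example_dev *dev, int nr_vring, int fail_at)
{
	assert(dev != NULL);

	dev->mem = malloc(64);
	if (dev->mem == NULL)
		return -1;

	for (int i = 0; i < nr_vring; i++) {
		/* A plain "return -1" here would leak dev->mem */
		if (example_translate(i, fail_at) < 0)
			goto err_mmap;
	}

	return 0;

err_mmap:
	free(dev->mem);
	dev->mem = NULL;
	return -1;
}
```

On failure the caller sees dev->mem reset to NULL, so a later retry starts
from a clean state.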

[dpdk-dev] [PATCH v2 04/17] vhost: define postcopy protocol flag

2018-10-02 Thread Maxime Coquelin

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/rte_vhost.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index b02673d4a..b3cc6990d 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -66,6 +66,10 @@ extern "C" {
 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
 #endif
 
+#ifndef VHOST_USER_PROTOCOL_F_PAGEFAULT
+#define VHOST_USER_PROTOCOL_F_PAGEFAULT 8
+#endif
+
 /** Indicate whether protocol features negotiation is supported. */
 #ifndef VHOST_USER_F_PROTOCOL_FEATURES
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
-- 
2.17.1

[dpdk-dev] [PATCH v2 06/17] vhost: pass socket fd to message handling callbacks

2018-10-02 Thread Maxime Coquelin

This is not used for now, but will be needed for the
special handling of VHOST_USER_SET_MEM_TABLE message
once postcopy is supported.

Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 71 +++
 1 file changed, 47 insertions(+), 24 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 608d2f3e4..050fc8bf9 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -138,14 +138,16 @@ vhost_backend_cleanup(struct virtio_net *dev)
  */
 static int
 vhost_user_set_owner(struct virtio_net **pdev __rte_unused,
-   struct VhostUserMsg *msg __rte_unused)
+   struct VhostUserMsg *msg __rte_unused,
+   int main_fd __rte_unused)
 {
return VH_RESULT_OK;
 }
 
 static int
 vhost_user_reset_owner(struct virtio_net **pdev,
-   struct VhostUserMsg *msg __rte_unused)
+   struct VhostUserMsg *msg __rte_unused,
+   int main_fd __rte_unused)
 {
struct virtio_net *dev = *pdev;
vhost_destroy_device_notify(dev);
@@ -159,7 +161,8 @@ vhost_user_reset_owner(struct virtio_net **pdev,
  * The features that we support are requested.
  */
 static int
-vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
 {
struct virtio_net *dev = *pdev;
uint64_t features = 0;
@@ -176,7 +179,8 @@ vhost_user_get_features(struct virtio_net **pdev, struct 
VhostUserMsg *msg)
  * The queue number that we support are requested.
  */
 static int
-vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
 {
struct virtio_net *dev = *pdev;
uint32_t queue_num = 0;
@@ -193,7 +197,8 @@ vhost_user_get_queue_num(struct virtio_net **pdev, struct 
VhostUserMsg *msg)
  * We receive the negotiated features supported by us and the virtio device.
  */
 static int
-vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
 {
struct virtio_net *dev = *pdev;
uint64_t features = msg->payload.u64;
@@ -275,7 +280,8 @@ vhost_user_set_features(struct virtio_net **pdev, struct 
VhostUserMsg *msg)
  */
 static int
 vhost_user_set_vring_num(struct virtio_net **pdev,
-struct VhostUserMsg *msg)
+   struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
 {
struct virtio_net *dev = *pdev;
struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index];
@@ -637,7 +643,8 @@ translate_ring_addresses(struct virtio_net *dev, int 
vq_index)
  * This function then converts these to our address space.
  */
 static int
-vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
 {
struct virtio_net *dev = *pdev;
struct vhost_virtqueue *vq;
@@ -674,7 +681,8 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, struct 
VhostUserMsg *msg)
  */
 static int
 vhost_user_set_vring_base(struct virtio_net **pdev,
- struct VhostUserMsg *msg)
+   struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
 {
struct virtio_net *dev = *pdev;
dev->virtqueue[msg->payload.state.index]->last_used_idx  =
@@ -807,7 +815,8 @@ vhost_memory_changed(struct VhostUserMemory *new,
 }
 
 static int
-vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
 {
struct virtio_net *dev = *pdev;
struct VhostUserMemory memory = msg->payload.memory;
@@ -1022,7 +1031,8 @@ virtio_is_ready(struct virtio_net *dev)
 }
 
 static int
-vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
 {
struct virtio_net *dev = *pdev;
struct vhost_vring_file file;
@@ -1046,7 +1056,8 @@ vhost_user_set_vring_call(struct virtio_net **pdev, 
struct VhostUserMsg *msg)
 }
 
 static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
-   struct VhostUserMsg *msg)
+   struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
 {
if (!(msg->payload.u64 & VHOST_USER_VRING_NOFD_MASK))
close(msg->

[dpdk-dev] [PATCH v2 08/17] vhost: add config flag for postcopy feature

2018-10-02 Thread Maxime Coquelin

Postcopy live-migration feature relies on userfaultfd,
which was only introduced in kernel v4.3.

This patch introduces a new define to allow building vhost
library on kernels not supporting userfaultfd.

With the legacy build system, the user has to explicitly set
CONFIG_RTE_LIBRTE_VHOST_POSTCOPY to 'y'.

With the Meson build system, RTE_LIBRTE_VHOST_POSTCOPY gets
automatically defined if the userfaultfd kernel header is
present.

Suggested-by: Ilya Maximets 
Signed-off-by: Maxime Coquelin 
---
 config/common_linuxapp   | 1 +
 lib/librte_vhost/meson.build | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 9c5ea9d89..dc43dcc36 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -14,6 +14,7 @@ CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_VHOST_NUMA=y
+CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_IFC_PMD=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
index 9d25b4d88..e33e6fc16 100644
--- a/lib/librte_vhost/meson.build
+++ b/lib/librte_vhost/meson.build
@@ -7,6 +7,8 @@ endif
 if has_libnuma == 1
dpdk_conf.set10('RTE_LIBRTE_VHOST_NUMA', true)
 endif
+dpdk_conf.set('RTE_LIBRTE_VHOST_POSTCOPY',
+ cc.has_header('linux/userfaultfd.h'))
 version = 4
 allow_experimental_apis = true
 cflags += '-fno-strict-aliasing'
-- 
2.17.1
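
As an illustration of how code can consume the flag introduced by patch
08/17, here is a small sketch. Only the macro name comes from the patch;
example_postcopy_open() and its return convention are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifdef RTE_LIBRTE_VHOST_POSTCOPY
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>
#endif

/*
 * Hypothetical helper. Returns 0 and stores a userfaultfd in *ufd, or
 * returns a negative errno value and stores -1 in *ufd.
 */
static int
example_postcopy_open(int *ufd)
{
	assert(ufd != NULL);

#ifdef RTE_LIBRTE_VHOST_POSTCOPY
	/* Built with postcopy support: the kernel header is available */
	*ufd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	return *ufd < 0 ? -errno : 0;
#else
	/* Built without it: still compiles on kernels older than v4.3 */
	*ufd = -1;
	return -ENOTSUP;
#endif
}
```

Keeping the fallback branch in the same function means callers do not need
their own #ifdef and simply see the feature as unsupported.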

[dpdk-dev] [PATCH v2 09/17] vhost: introduce postcopy's advise message

2018-10-02 Thread Maxime Coquelin

This patch opens a userfaultfd and sends it back to Qemu in
reply to its VHOST_USER_POSTCOPY_ADVISE request.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost.h  |  2 ++
 lib/librte_vhost/vhost_user.c | 44 +++
 lib/librte_vhost/vhost_user.h |  3 ++-
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 25ffd7614..21722d8a8 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -363,6 +363,8 @@ struct virtio_net {
int slave_req_fd;
rte_spinlock_t  slave_req_lock;
 
+   int postcopy_ufd;
+
/*
 * Device id to identify a specific backend device.
 * It's set to -1 for the default software implementation.
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 436ab7bf5..71721edc7 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -24,13 +24,19 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #ifdef RTE_LIBRTE_VHOST_NUMA
 #include 
 #endif
+#ifdef RTE_LIBRTE_VHOST_POSTCOPY
+#include 
+#endif
 
 #include 
 #include 
@@ -69,6 +75,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
[VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS",
[VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
+   [VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
 };
 
 /* The possible results of a message handling function */
@@ -1505,6 +1512,42 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct 
VhostUserMsg *msg,
return VH_RESULT_OK;
 }
 
+static int
+vhost_user_set_postcopy_advise(struct virtio_net **pdev,
+   struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
+{
+   struct virtio_net *dev = *pdev;
+#ifdef RTE_LIBRTE_VHOST_POSTCOPY
+   struct uffdio_api api_struct;
+
+   dev->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+
+   if (dev->postcopy_ufd == -1) {
+   RTE_LOG(ERR, VHOST_CONFIG, "Userfaultfd not available: %s\n",
+   strerror(errno));
+   return VH_RESULT_ERR;
+   }
+   api_struct.api = UFFD_API;
+   api_struct.features = 0;
+   if (ioctl(dev->postcopy_ufd, UFFDIO_API, &api_struct)) {
+   RTE_LOG(ERR, VHOST_CONFIG, "UFFDIO_API ioctl failure: %s\n",
+   strerror(errno));
+   close(dev->postcopy_ufd);
+   return VH_RESULT_ERR;
+   }
+   msg->fds[0] = dev->postcopy_ufd;
+   msg->fd_num = 1;
+
+   return VH_RESULT_REPLY;
+#else
+   dev->postcopy_ufd = -1;
+   msg->fd_num = 0;
+
+   return VH_RESULT_ERR;
+#endif
+}
+
 typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
struct VhostUserMsg *msg,
int main_fd);
@@ -1532,6 +1575,7 @@ static vhost_message_handler_t 
vhost_message_handlers[VHOST_USER_MAX] = {
[VHOST_USER_NET_SET_MTU] = vhost_user_net_set_mtu,
[VHOST_USER_SET_SLAVE_REQ_FD] = vhost_user_set_req_fd,
[VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
+   [VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise,
 };
 
 
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index dd0262f8f..2030b40a5 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -50,7 +50,8 @@ typedef enum VhostUserRequest {
VHOST_USER_IOTLB_MSG = 22,
VHOST_USER_CRYPTO_CREATE_SESS = 26,
VHOST_USER_CRYPTO_CLOSE_SESS = 27,
-   VHOST_USER_MAX = 28
+   VHOST_USER_POSTCOPY_ADVISE = 28,
+   VHOST_USER_MAX = 29
 } VhostUserRequest;
 
 typedef enum VhostUserSlaveRequest {
-- 
2.17.1

[dpdk-dev] [PATCH v2 05/17] vhost: add number of fds to vhost-user messages and use it

2018-10-02 Thread Maxime Coquelin

As soon as some ancillary data (fds) are received, they are copied
without checking their length.

This patch adds the number of fds received to the message,
which is set in read_vhost_message().

This is preliminary work to support sending fds to Qemu.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/socket.c | 21 -
 lib/librte_vhost/vhost_user.c |  2 +-
 lib/librte_vhost/vhost_user.h |  4 +++-
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index d63031747..c04d3d305 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -94,18 +94,23 @@ static struct vhost_user vhost_user = {
.mutex = PTHREAD_MUTEX_INITIALIZER,
 };
 
-/* return bytes# of read on success or negative val on failure. */
+/*
+ * return bytes# of read on success or negative val on failure. Update fdnum
+ * with number of fds read.
+ */
 int
-read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
+   int *fd_num)
 {
struct iovec iov;
struct msghdr msgh;
-   size_t fdsize = fd_num * sizeof(int);
-   char control[CMSG_SPACE(fdsize)];
+   char control[CMSG_SPACE(max_fds * sizeof(int))];
struct cmsghdr *cmsg;
int got_fds = 0;
int ret;
 
+   *fd_num = 0;
+
memset(&msgh, 0, sizeof(msgh));
iov.iov_base = buf;
iov.iov_len  = buflen;
@@ -131,13 +136,19 @@ read_fd_message(int sockfd, char *buf, int buflen, int 
*fds, int fd_num)
if ((cmsg->cmsg_level == SOL_SOCKET) &&
(cmsg->cmsg_type == SCM_RIGHTS)) {
got_fds = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
+   if (got_fds > max_fds) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Received msg contains more fds than supported\n");
+   return -1;
+   }
+   *fd_num = got_fds;
memcpy(fds, CMSG_DATA(cmsg), got_fds * sizeof(int));
break;
}
}
 
/* Clear out unused file descriptors */
-   while (got_fds < fd_num)
+   while (got_fds < max_fds)
fds[got_fds++] = -1;
 
return ret;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index c669d3c0a..608d2f3e4 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1514,7 +1514,7 @@ read_vhost_message(int sockfd, struct VhostUserMsg *msg)
int ret;
 
ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE,
-   msg->fds, VHOST_MEMORY_MAX_NREGIONS);
+   msg->fds, VHOST_MEMORY_MAX_NREGIONS, &msg->fd_num);
if (ret <= 0)
return ret;
 
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 42166adf2..dd0262f8f 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -132,6 +132,7 @@ typedef struct VhostUserMsg {
VhostUserVringArea area;
} payload;
int fds[VHOST_MEMORY_MAX_NREGIONS];
+   int fd_num;
 } __attribute((packed)) VhostUserMsg;
 
 #define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
@@ -146,7 +147,8 @@ int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t 
iova, uint8_t perm);
 int vhost_user_host_notifier_ctrl(int vid, bool enable);
 
 /* socket.c */
-int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
+int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
+   int *fd_num);
 int send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
 
 #endif
-- 
2.17.1
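
The bounds check added by patch 05/17 can be tried outside of the vhost
library with the sketch below. It mirrors the logic of the patched
read_fd_message() for a single control message; example_extract_fds() is a
hypothetical name and the function is not part of DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Copy the fds carried by one SCM_RIGHTS control message into fds[].
 * Returns the number of fds copied, or -1 if the message carries more
 * than max_fds. Unused slots of fds[] are set to -1.
 */
static int
example_extract_fds(struct cmsghdr *cmsg, int *fds, int max_fds)
{
	int got_fds = 0;

	assert(cmsg != NULL && fds != NULL && max_fds >= 0);

	if (cmsg->cmsg_level == SOL_SOCKET &&
			cmsg->cmsg_type == SCM_RIGHTS) {
		got_fds = (int)((cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
		/* Reject the message before copying anything */
		if (got_fds > max_fds)
			return -1;
		memcpy(fds, CMSG_DATA(cmsg), got_fds * sizeof(int));
	}

	/* Clear out unused file descriptors */
	for (int i = got_fds; i < max_fds; i++)
		fds[i] = -1;

	return got_fds;
}
```

The important point is the order: the count derived from cmsg_len is
compared against the capacity of the destination array before memcpy() runs.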

[dpdk-dev] [PATCH v2 07/17] vhost: enable fds passing when sending vhost-user messages

2018-10-02 Thread Maxime Coquelin

Passing userfault fds to Qemu will be required for postcopy
live-migration feature.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 050fc8bf9..436ab7bf5 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -171,6 +171,7 @@ vhost_user_get_features(struct virtio_net **pdev, struct 
VhostUserMsg *msg,
 
msg->payload.u64 = features;
msg->size = sizeof(msg->payload.u64);
+   msg->fd_num = 0;
 
return VH_RESULT_REPLY;
 }
@@ -189,6 +190,7 @@ vhost_user_get_queue_num(struct virtio_net **pdev, struct 
VhostUserMsg *msg,
 
msg->payload.u64 = (uint64_t)queue_num;
msg->size = sizeof(msg->payload.u64);
+   msg->fd_num = 0;
 
return VH_RESULT_REPLY;
 }
@@ -1174,6 +1176,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
vq->batch_copy_elems = NULL;
 
msg->size = sizeof(msg->payload.state);
+   msg->fd_num = 0;
 
return VH_RESULT_REPLY;
 }
@@ -1233,6 +1236,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
 
msg->payload.u64 = protocol_features;
msg->size = sizeof(msg->payload.u64);
+   msg->fd_num = 0;
 
return VH_RESULT_REPLY;
 }
@@ -1315,6 +1319,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct 
VhostUserMsg *msg,
dev->log_size = size;
 
msg->size = sizeof(msg->payload.u64);
+   msg->fd_num = 0;
 
return VH_RESULT_REPLY;
 }
@@ -1561,13 +1566,13 @@ read_vhost_message(int sockfd, struct VhostUserMsg *msg)
 }
 
 static int
-send_vhost_message(int sockfd, struct VhostUserMsg *msg, int *fds, int fd_num)
+send_vhost_message(int sockfd, struct VhostUserMsg *msg)
 {
if (!msg)
return 0;
 
return send_fd_message(sockfd, (char *)msg,
-   VHOST_USER_HDR_SIZE + msg->size, fds, fd_num);
+   VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num);
 }
 
 static int
@@ -1581,19 +1586,18 @@ send_vhost_reply(int sockfd, struct VhostUserMsg *msg)
msg->flags |= VHOST_USER_VERSION;
msg->flags |= VHOST_USER_REPLY_MASK;
 
-   return send_vhost_message(sockfd, msg, NULL, 0);
+   return send_vhost_message(sockfd, msg);
 }
 
 static int
-send_vhost_slave_message(struct virtio_net *dev, struct VhostUserMsg *msg,
-int *fds, int fd_num)
+send_vhost_slave_message(struct virtio_net *dev, struct VhostUserMsg *msg)
 {
int ret;
 
if (msg->flags & VHOST_USER_NEED_REPLY)
rte_spinlock_lock(&dev->slave_req_lock);
 
-   ret = send_vhost_message(dev->slave_req_fd, msg, fds, fd_num);
+   ret = send_vhost_message(dev->slave_req_fd, msg);
if (ret < 0 && (msg->flags & VHOST_USER_NEED_REPLY))
rte_spinlock_unlock(&dev->slave_req_lock);
 
@@ -1826,6 +1830,7 @@ vhost_user_msg_handler(int vid, int fd)
if (msg.flags & VHOST_USER_NEED_REPLY) {
msg.payload.u64 = ret == VH_RESULT_ERR;
msg.size = sizeof(msg.payload.u64);
+   msg.fd_num = 0;
send_vhost_reply(fd, &msg);
} else if (ret == VH_RESULT_ERR) {
RTE_LOG(ERR, VHOST_CONFIG,
@@ -1909,7 +1914,7 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t 
iova, uint8_t perm)
},
};
 
-   ret = send_vhost_message(dev->slave_req_fd, &msg, NULL, 0);
+   ret = send_vhost_message(dev->slave_req_fd, &msg);
if (ret < 0) {
RTE_LOG(ERR, VHOST_CONFIG,
"Failed to send IOTLB miss message (%d)\n",
@@ -1925,8 +1930,6 @@ static int 
vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
uint64_t offset,
uint64_t size)
 {
-   int *fdp = NULL;
-   size_t fd_num = 0;
int ret;
struct VhostUserMsg msg = {
.request.slave = VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
@@ -1942,11 +1945,11 @@ static int 
vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
if (fd < 0)
msg.payload.area.u64 |= VHOST_USER_VRING_NOFD_MASK;
else {
-   fdp = &fd;
-   fd_num = 1;
+   msg.fds[0] = fd;
+   msg.fd_num = 1;
}
 
-   ret = send_vhost_slave_message(dev, &msg, fdp, fd_num);
+   ret = send_vhost_slave_message(dev, &msg);
if (ret < 0) {
RTE_LOG(ERR, VHOST_CONFIG,
"Failed to set host notifier (%d)\n", ret);
-- 
2.17.1
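
To illustrate the interface change in patch 07/17, where the fds travel
inside the message struct instead of being passed as separate arguments,
here is a standalone sketch. struct example_msg, EXAMPLE_MAX_FDS and
example_send_msg() are simplified, hypothetical stand-ins for VhostUserMsg
and send_vhost_message().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define EXAMPLE_MAX_FDS 8

/* Hypothetical message: wire content first, fd bookkeeping last */
struct example_msg {
	uint32_t request;
	uint32_t size;
	uint64_t payload;
	int fds[EXAMPLE_MAX_FDS];	/* sent as SCM_RIGHTS, not as data */
	int fd_num;
};

/* Returns the number of data bytes sent, or -1 on error */
static ssize_t
example_send_msg(int sockfd, struct example_msg *msg)
{
	union {
		char buf[CMSG_SPACE(EXAMPLE_MAX_FDS * sizeof(int))];
		struct cmsghdr align;	/* forces suitable alignment */
	} control;
	struct iovec iov;
	struct msghdr msgh;

	assert(msg != NULL);
	assert(msg->fd_num >= 0 && msg->fd_num <= EXAMPLE_MAX_FDS);

	memset(&msgh, 0, sizeof(msgh));
	memset(&control, 0, sizeof(control));
	iov.iov_base = msg;
	iov.iov_len = offsetof(struct example_msg, fds);
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;

	if (msg->fd_num > 0) {
		size_t fdsize = (size_t)msg->fd_num * sizeof(int);
		struct cmsghdr *cmsg;

		msgh.msg_control = control.buf;
		msgh.msg_controllen = CMSG_SPACE(fdsize);
		cmsg = CMSG_FIRSTHDR(&msgh);
		cmsg->cmsg_len = CMSG_LEN(fdsize);
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		memcpy(CMSG_DATA(cmsg), msg->fds, fdsize);
	}

	return sendmsg(sockfd, &msgh, 0);
}
```

With this shape, a caller that has nothing to pass must set fd_num to 0
itself before sending, since a stale count would attach whatever is left in
fds[]. That is why the patch resets msg->fd_num in the existing reply paths.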

[dpdk-dev] [PATCH v2 11/17] vhost: register new regions with userfaultfd

2018-10-02 Thread Maxime Coquelin

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index bd468ca12..2f681d291 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -968,6 +968,32 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct 
VhostUserMsg *msg,
mmap_size,
alignment,
mmap_offset);
+
+   if (dev->postcopy_listening) {
+#ifdef RTE_LIBRTE_VHOST_POSTCOPY
+   struct uffdio_register reg_struct;
+
+   reg_struct.range.start = (uint64_t)(uintptr_t)mmap_addr;
+   reg_struct.range.len = mmap_size;
+   reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+   if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER,
+   &reg_struct)) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Failed to register ufd for region %d: (ufd = %d) %s\n",
+   i, dev->postcopy_ufd,
+   strerror(errno));
+   goto err_ufd;
+   }
+   RTE_LOG(INFO, VHOST_CONFIG,
+   "\t userfaultfd registered for range : %llx - %llx\n",
+   reg_struct.range.start,
+   reg_struct.range.start +
+   reg_struct.range.len - 1);
+#else
+   goto err_ufd;
+#endif
+   }
}
 
for (i = 0; i < dev->nr_vring; i++) {
@@ -983,7 +1009,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct 
VhostUserMsg *msg,
 
dev = translate_ring_addresses(dev, i);
if (!dev)
-   goto err_mmap;
+   goto err_ufd;
 
 
*pdev = dev;
@@ -994,6 +1020,11 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct 
VhostUserMsg *msg,
 
return VH_RESULT_OK;
 
+err_ufd:
+   if (dev->postcopy_ufd >= 0) {
+   close(dev->postcopy_ufd);
+   dev->postcopy_ufd = -1;
+   }
 err_mmap:
free_mem_region(dev);
rte_free(dev->mem);
-- 
2.17.1

[dpdk-dev] [PATCH v2 10/17] vhost: add support for postcopy's listen message

2018-10-02 Thread Maxime Coquelin

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost.h  |  1 +
 lib/librte_vhost/vhost_user.c | 19 +++
 lib/librte_vhost/vhost_user.h |  4 +++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 21722d8a8..9453cb28d 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -364,6 +364,7 @@ struct virtio_net {
rte_spinlock_t  slave_req_lock;
 
int postcopy_ufd;
+   int postcopy_listening;
 
/*
 * Device id to identify a specific backend device.
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 71721edc7..bd468ca12 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -76,6 +76,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS",
[VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
[VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
+   [VHOST_USER_POSTCOPY_LISTEN]  = "VHOST_USER_POSTCOPY_LISTEN",
 };
 
 /* The possible results of a message handling function */
@@ -1548,6 +1549,23 @@ vhost_user_set_postcopy_advise(struct virtio_net **pdev,
 #endif
 }
 
+static int
+vhost_user_set_postcopy_listen(struct virtio_net **pdev,
+   struct VhostUserMsg *msg __rte_unused,
+   int main_fd __rte_unused)
+{
+   struct virtio_net *dev = *pdev;
+
+   if (dev->mem && dev->mem->nregions) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Regions already registered at postcopy-listen\n");
+   return VH_RESULT_ERR;
+   }
+   dev->postcopy_listening = 1;
+
+   return VH_RESULT_OK;
+}
+
 typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
struct VhostUserMsg *msg,
int main_fd);
@@ -1576,6 +1594,7 @@ static vhost_message_handler_t 
vhost_message_handlers[VHOST_USER_MAX] = {
[VHOST_USER_SET_SLAVE_REQ_FD] = vhost_user_set_req_fd,
[VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
[VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise,
+   [VHOST_USER_POSTCOPY_LISTEN] = vhost_user_set_postcopy_listen,
 };
 
 
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 2030b40a5..73b1fe2b9 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -51,7 +51,9 @@ typedef enum VhostUserRequest {
VHOST_USER_CRYPTO_CREATE_SESS = 26,
VHOST_USER_CRYPTO_CLOSE_SESS = 27,
VHOST_USER_POSTCOPY_ADVISE = 28,
-   VHOST_USER_MAX = 29
+   VHOST_USER_POSTCOPY_LISTEN = 29,
+   VHOST_USER_POSTCOPY_END = 30,
+   VHOST_USER_MAX = 31
 } VhostUserRequest;
 
 typedef enum VhostUserSlaveRequest {
-- 
2.17.1

[dpdk-dev] [PATCH v2 12/17] vhost: avoid useless VhostUserMemory copy

2018-10-02 Thread Maxime Coquelin

The VHOST_USER_SET_MEM_TABLE payload is copied when handled,
whereas it could directly be referenced.

This is not very important in itself, but a later patch needs to
update the payload and send it back to Qemu.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 2f681d291..515d3c61c 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -829,7 +829,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
int main_fd __rte_unused)
 {
struct virtio_net *dev = *pdev;
-   struct VhostUserMemory memory = msg->payload.memory;
+   struct VhostUserMemory *memory = &msg->payload.memory;
struct rte_vhost_mem_region *reg;
void *mmap_addr;
uint64_t mmap_size;
@@ -839,17 +839,17 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
int populate;
int fd;
 
-   if (memory.nregions > VHOST_MEMORY_MAX_NREGIONS) {
+   if (memory->nregions > VHOST_MEMORY_MAX_NREGIONS) {
RTE_LOG(ERR, VHOST_CONFIG,
-   "too many memory regions (%u)\n", memory.nregions);
+   "too many memory regions (%u)\n", memory->nregions);
return VH_RESULT_ERR;
}
 
-   if (dev->mem && !vhost_memory_changed(&memory, dev->mem)) {
+   if (dev->mem && !vhost_memory_changed(memory, dev->mem)) {
RTE_LOG(INFO, VHOST_CONFIG,
"(%d) memory regions not changed\n", dev->vid);
 
-   for (i = 0; i < memory.nregions; i++)
+   for (i = 0; i < memory->nregions; i++)
close(msg->fds[i]);
 
return VH_RESULT_OK;
@@ -881,25 +881,25 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
}
 
dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct rte_vhost_memory) +
-   sizeof(struct rte_vhost_mem_region) * memory.nregions, 0);
+   sizeof(struct rte_vhost_mem_region) * memory->nregions, 0);
if (dev->mem == NULL) {
RTE_LOG(ERR, VHOST_CONFIG,
"(%d) failed to allocate memory for dev->mem\n",
dev->vid);
return VH_RESULT_ERR;
}
-   dev->mem->nregions = memory.nregions;
+   dev->mem->nregions = memory->nregions;
 
-   for (i = 0; i < memory.nregions; i++) {
+   for (i = 0; i < memory->nregions; i++) {
fd  = msg->fds[i];
reg = &dev->mem->regions[i];
 
-   reg->guest_phys_addr = memory.regions[i].guest_phys_addr;
-   reg->guest_user_addr = memory.regions[i].userspace_addr;
-   reg->size= memory.regions[i].memory_size;
+   reg->guest_phys_addr = memory->regions[i].guest_phys_addr;
+   reg->guest_user_addr = memory->regions[i].userspace_addr;
+   reg->size= memory->regions[i].memory_size;
reg->fd  = fd;
 
-   mmap_offset = memory.regions[i].mmap_offset;
+   mmap_offset = memory->regions[i].mmap_offset;
 
/* Check for memory_size + mmap_offset overflow */
if (mmap_offset >= -reg->size) {
-- 
2.17.1
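
Reviewer's sketch (not part of the patch): the practical difference between
the old struct copy and the new pointer is whether a later edit reaches the
message that gets sent back. The types below are invented stand-ins for
VhostUserMsg and VhostUserMemory, reduced to the fields needed to show this.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_REGIONS 8

/* Invented stand-ins for the real message layout. */
struct demo_region { uint64_t userspace_addr; };
struct demo_memory {
	uint32_t nregions;
	struct demo_region regions[DEMO_MAX_REGIONS];
};
struct demo_msg { struct demo_memory memory; };

/* Old style: the edit lands in a private copy and is lost on return. */
static void
demo_patch_copy(struct demo_msg *msg, uint64_t host_addr)
{
	struct demo_memory memory = msg->memory;

	assert(memory.nregions > 0);
	memory.regions[0].userspace_addr = host_addr;
}

/* New style: the edit goes through the pointer and stays in the message. */
static void
demo_patch_ref(struct demo_msg *msg, uint64_t host_addr)
{
	struct demo_memory *memory = &msg->memory;

	assert(memory->nregions > 0);
	memory->regions[0].userspace_addr = host_addr;
}
```

Only the second variant leaves the message in a state where replying with it
would carry the updated address.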

[dpdk-dev] [PATCH v2 15/17] vhost: enable postcopy protocol feature

2018-10-02 Thread Maxime Coquelin

Enable the postcopy protocol feature unless dequeue
zero-copy is enabled. In that case, guest memory must
be populated, which is not compatible with userfaultfd.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 7 +++
 lib/librte_vhost/vhost_user.h | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ee7337ac8..9d08f4af0 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1317,6 +1317,13 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
if (!(features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK);
 
+   /*
+* If dequeue zerocopy is enabled, guest memory requires to be
+* populated, which is not compatible with postcopy.
+*/
+   if (dev->dequeue_zero_copy)
+   protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT);
+
msg->payload.u64 = protocol_features;
msg->size = sizeof(msg->payload.u64);
msg->fd_num = 0;
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 73b1fe2b9..dc97be843 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -22,7 +22,8 @@
 (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
 (1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \
 (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
-(1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER))
+(1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
+(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT))
 
 typedef enum VhostUserRequest {
VHOST_USER_NONE = 0,
-- 
2.17.1

[dpdk-dev] [PATCH v2 13/17] vhost: send userfault range addresses back to qemu

2018-10-02 Thread Maxime Coquelin

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 49 ---
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 515d3c61c..b207de6e0 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -89,6 +89,11 @@ enum vh_result {
VH_RESULT_REPLY =  1,
 };
 
+static int
+send_vhost_reply(int sockfd, struct VhostUserMsg *msg);
+static int
+read_vhost_message(int sockfd, struct VhostUserMsg *msg);
+
 static uint64_t
 get_blk_size(int fd)
 {
@@ -826,7 +831,7 @@ vhost_memory_changed(struct VhostUserMemory *new,
 
 static int
 vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
-   int main_fd __rte_unused)
+   int main_fd)
 {
struct virtio_net *dev = *pdev;
struct VhostUserMemory *memory = &msg->payload.memory;
@@ -970,11 +975,49 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
mmap_offset);
 
if (dev->postcopy_listening) {
+   /*
+* We haven't a better way right now than sharing
+* DPDK's virtual address with Qemu, so that Qemu can
+* retrieve the region offset when handling userfaults.
+*/
+   memory->regions[i].userspace_addr =
+   reg->host_user_addr;
+   }
+   }
+   if (dev->postcopy_listening) {
+   /* Send the addresses back to qemu */
+   msg->fd_num = 0;
+   send_vhost_reply(main_fd, msg);
+
+   /* Wait for qemu to acknowledge it's got the addresses;
+* we've got to wait before we're allowed to generate faults.
+*/
+   VhostUserMsg ack_msg;
+   if (read_vhost_message(main_fd, &ack_msg) <= 0) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Failed to read qemu ack on postcopy set-mem-table\n");
+   goto err_mmap;
+   }
+   if (ack_msg.request.master != VHOST_USER_SET_MEM_TABLE) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Bad qemu ack on postcopy set-mem-table (%d)\n",
+   ack_msg.request.master);
+   goto err_mmap;
+   }
+
+   /* Now userfault register and we can use the memory */
+   for (i = 0; i < memory->nregions; i++) {
 #ifdef RTE_LIBRTE_VHOST_POSTCOPY
+   reg = &dev->mem->regions[i];
struct uffdio_register reg_struct;
 
-   reg_struct.range.start = (uint64_t)(uintptr_t)mmap_addr;
-   reg_struct.range.len = mmap_size;
+   /*
+* Let's register all the mmap'ed area to ensure
+* alignment on page boundary.
+*/
+   reg_struct.range.start =
+   (uint64_t)(uintptr_t)reg->mmap_addr;
+   reg_struct.range.len = reg->mmap_size;
reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
 
if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER,
-- 
2.17.1

[dpdk-dev] [PATCH v2 14/17] vhost: add support to postcopy's end request

2018-10-02 Thread Maxime Coquelin

The master sends this message before stopping handling
userfaults, so that the backend closes the userfaultfd.

The master waits for the slave to acknowledge the request
with an empty 64-bit payload for synchronization purposes.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index b207de6e0..ee7337ac8 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -77,6 +77,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
[VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
[VHOST_USER_POSTCOPY_LISTEN]  = "VHOST_USER_POSTCOPY_LISTEN",
+   [VHOST_USER_POSTCOPY_END]  = "VHOST_USER_POSTCOPY_END",
 };
 
 /* The possible results of a message handling function */
@@ -1640,6 +1641,25 @@ vhost_user_set_postcopy_listen(struct virtio_net **pdev,
return VH_RESULT_OK;
 }
 
+static int
+vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg,
+   int main_fd __rte_unused)
+{
+   struct virtio_net *dev = *pdev;
+
+   dev->postcopy_listening = 0;
+   if (dev->postcopy_ufd >= 0) {
+   close(dev->postcopy_ufd);
+   dev->postcopy_ufd = -1;
+   }
+
+   msg->payload.u64 = 0;
+   msg->size = sizeof(msg->payload.u64);
+   msg->fd_num = 0;
+
+   return 0;
+}
+
 typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
struct VhostUserMsg *msg,
int main_fd);
@@ -1669,6 +1689,7 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
[VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
[VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise,
[VHOST_USER_POSTCOPY_LISTEN] = vhost_user_set_postcopy_listen,
+   [VHOST_USER_POSTCOPY_END] = vhost_user_postcopy_end,
 };
 
 
-- 
2.17.1
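
Reviewer's sketch (not part of the patch): the teardown described in the
commit message has three steps: stop listening, close the userfaultfd once,
and prepare an empty 64-bit acknowledgement. The standalone example below
walks through them with invented demo_* types. It returns a hypothetical
DEMO_REPLY code, mirroring VH_RESULT_REPLY = 1 from the enum shown earlier
in the series, to signal that a reply is pending. Whether the posted
handler's "return 0" leads to a reply depends on code outside this excerpt,
and the sketch makes no claim about that.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

enum demo_result { DEMO_ERR = -1, DEMO_OK = 0, DEMO_REPLY = 1 };

/* Invented stand-ins for the device and message structures. */
struct demo_dev {
	int postcopy_listening;
	int postcopy_ufd;	/* -1 when no userfaultfd is open */
};

struct demo_msg {
	uint64_t payload_u64;
	uint32_t size;
	int fd_num;
};

/* Leave postcopy mode and build the empty acknowledgement payload. */
static int
demo_postcopy_end(struct demo_dev *dev, struct demo_msg *msg)
{
	assert(dev != NULL && msg != NULL);

	dev->postcopy_listening = 0;
	if (dev->postcopy_ufd >= 0) {
		close(dev->postcopy_ufd);
		dev->postcopy_ufd = -1;	/* makes a second call harmless */
	}

	msg->payload_u64 = 0;
	msg->size = sizeof(msg->payload_u64);
	msg->fd_num = 0;

	return DEMO_REPLY;
}
```

Resetting the descriptor to -1 after close() is what keeps a repeated END
request from closing an unrelated descriptor that reused the same number.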

[dpdk-dev] [PATCH v2 17/17] net/vhost: add parameter to enable postcopy support

2018-10-02 Thread Maxime Coquelin

Introduce a new postcopy-support parameter to Vhost PMD that
passes the RTE_VHOST_USER_POSTCOPY_SUPPORT flag at vhost
device register time.

The flag should only be set if the application does not prefault guest
memory using, for example, the mlockall() syscall.

Default value is 0, meaning that postcopy support is disabled
unless specified explicitly.

Example to enable postcopy support for a given device:

--vdev 'net_vhost0,iface=/tmp/vhost-user1,postcopy-support=1'

Signed-off-by: Maxime Coquelin 
---
 doc/guides/nics/vhost.rst |  5 +
 drivers/net/vhost/rte_eth_vhost.c | 13 +
 2 files changed, 18 insertions(+)

diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst
index 4f7ae8990..23f2e87aa 100644
--- a/doc/guides/nics/vhost.rst
+++ b/doc/guides/nics/vhost.rst
@@ -71,6 +71,11 @@ The user can specify below arguments in `--vdev` option.
 It is used to enable iommu support in vhost library.
 (Default: 0 (disabled))
 
+#.  ``postcopy-support``:
+
+It is used to enable postcopy live-migration support in vhost library.
+(Default: 0 (disabled))
+
 Vhost PMD event handling
 
 
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index aa6052221..1330f06ba 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -30,6 +30,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 #define ETH_VHOST_CLIENT_ARG   "client"
 #define ETH_VHOST_DEQUEUE_ZERO_COPY"dequeue-zero-copy"
 #define ETH_VHOST_IOMMU_SUPPORT"iommu-support"
+#define ETH_VHOST_POSTCOPY_SUPPORT "postcopy-support"
 #define VHOST_MAX_PKT_BURST 32
 
 static const char *valid_arguments[] = {
@@ -38,6 +39,7 @@ static const char *valid_arguments[] = {
ETH_VHOST_CLIENT_ARG,
ETH_VHOST_DEQUEUE_ZERO_COPY,
ETH_VHOST_IOMMU_SUPPORT,
+   ETH_VHOST_POSTCOPY_SUPPORT,
NULL
 };
 
@@ -1339,6 +1341,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
int client_mode = 0;
int dequeue_zero_copy = 0;
int iommu_support = 0;
+   int postcopy_support = 0;
struct rte_eth_dev *eth_dev;
const char *name = rte_vdev_device_name(dev);
 
@@ -1411,6 +1414,16 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
}
 
+   if (rte_kvargs_count(kvlist, ETH_VHOST_POSTCOPY_SUPPORT) == 1) {
+   ret = rte_kvargs_process(kvlist, ETH_VHOST_POSTCOPY_SUPPORT,
+&open_int, &postcopy_support);
+   if (ret < 0)
+   goto out_free;
+
+   if (postcopy_support)
+   flags |= RTE_VHOST_USER_POSTCOPY_SUPPORT;
+   }
+
if (dev->device.numa_node == SOCKET_ID_ANY)
dev->device.numa_node = rte_socket_id();
 
-- 
2.17.1
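
Reviewer's sketch (not part of the patch): the PMD reads postcopy-support
through rte_kvargs. For readers without a DPDK tree at hand, the standalone
example below shows the same idea, pulling an integer "key=value" out of a
comma-separated devargs string. demo_devarg_int() is an invented helper and
a simplified stand-in, not the rte_kvargs API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the integer value of "key=<int>" in a comma-separated devargs
 * string, or def when the key is absent or its value is not a number.
 */
static int
demo_devarg_int(const char *args, const char *key, int def)
{
	size_t klen;
	const char *p;

	assert(args != NULL && key != NULL);
	klen = strlen(key);
	p = args;

	while (*p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		/* A match needs the key, '=', and at least one value char. */
		if (len > klen + 1 && strncmp(p, key, klen) == 0 &&
		    p[klen] == '=') {
			char *stop;
			long v = strtol(p + klen + 1, &stop, 0);

			if (stop != p + klen + 1 &&
			    (*stop == ',' || *stop == '\0'))
				return (int)v;
			return def;
		}

		p += len;
		if (*p == ',')
			p++;
	}

	return def;
}
```

With the devargs string from the commit message, looking up
"postcopy-support" yields 1, and a device string without the key falls back
to the default, which matches the "disabled unless specified" behaviour
described above.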

[dpdk-dev] [PATCH v2 16/17] vhost: add flag to enable postcopy live-migration

2018-10-02 Thread Maxime Coquelin

The postcopy live-migration feature requires the application
not to populate the guest memory. As the vhost library cannot
prevent the application from doing that (e.g. by preventing the
application from calling mlockall()), the feature is disabled by
default.

The application should only enable the feature if it does not
force the guest memory to be populated.

If the user passes the RTE_VHOST_USER_POSTCOPY_SUPPORT
flag at registration but the feature was not compiled in,
registration fails.

Signed-off-by: Maxime Coquelin 
---
 doc/guides/prog_guide/vhost_lib.rst |  8 
 lib/librte_vhost/rte_vhost.h|  1 +
 lib/librte_vhost/socket.c   | 19 +--
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 77af4d775..c77df338f 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -106,6 +106,14 @@ The following is an overview of some key Vhost API functions:
 Enabling this flag with these Qemu version results in Qemu being blocked
 when multiple queue pairs are declared.
 
+  - ``RTE_VHOST_USER_POSTCOPY_SUPPORT``
+
+Postcopy live-migration support will be enabled when this flag is set.
+It is disabled by default.
+
+Enabling this flag should only be done when the calling application does
+not pre-fault the guest shared memory, otherwise migration would fail.
+
 * ``rte_vhost_driver_set_features(path, features)``
 
   This function sets the feature bits the vhost-user driver supports. The
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index b3cc6990d..b26afbffa 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -28,6 +28,7 @@ extern "C" {
 #define RTE_VHOST_USER_NO_RECONNECT(1ULL << 1)
 #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY   (1ULL << 2)
 #define RTE_VHOST_USER_IOMMU_SUPPORT   (1ULL << 3)
+#define RTE_VHOST_USER_POSTCOPY_SUPPORT(1ULL << 4)
 
 /** Protocol features. */
 #ifndef VHOST_USER_PROTOCOL_F_MQ
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index c04d3d305..3df303be8 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -51,6 +51,8 @@ struct vhost_user_socket {
uint64_t supported_features;
uint64_t features;
 
+   uint64_t protocol_features;
+
/*
 * Device id to identify a specific backend device.
 * It's set to -1 for the default software implementation.
@@ -731,7 +733,7 @@ rte_vhost_driver_get_protocol_features(const char *path,
did = vsocket->vdpa_dev_id;
vdpa_dev = rte_vdpa_get_device(did);
if (!vdpa_dev || !vdpa_dev->ops->get_protocol_features) {
-   *protocol_features = VHOST_USER_PROTOCOL_FEATURES;
+   *protocol_features = vsocket->protocol_features;
goto unlock_exit;
}
 
@@ -744,7 +746,7 @@ rte_vhost_driver_get_protocol_features(const char *path,
goto unlock_exit;
}
 
-   *protocol_features = VHOST_USER_PROTOCOL_FEATURES
+   *protocol_features = vsocket->protocol_features
& vdpa_protocol_features;
 
 unlock_exit:
@@ -863,6 +865,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
vsocket->use_builtin_virtio_net = true;
vsocket->supported_features = VIRTIO_NET_SUPPORTED_FEATURES;
vsocket->features   = VIRTIO_NET_SUPPORTED_FEATURES;
+   vsocket->protocol_features  = VHOST_USER_PROTOCOL_FEATURES;
 
/* Dequeue zero copy can't assure descriptors returned in order */
if (vsocket->dequeue_zero_copy) {
@@ -875,6 +878,18 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
vsocket->features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
}
 
+   if (!(flags & RTE_VHOST_USER_POSTCOPY_SUPPORT)) {
+   vsocket->protocol_features &=
+   ~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT);
+   } else {
+#ifndef RTE_LIBRTE_VHOST_POSTCOPY
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Postcopy requested but not compiled\n");
+   ret = -1;
+   goto out_mutex;
+#endif
+   }
+
if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
if (vsocket->reconnect && reconn_tid == 0) {
-- 
2.17.1

Re: [dpdk-dev] [PATCH] mbuf: clarify QINQ flag usage

2018-10-02 Thread Andrew Rybchenko


On 10/2/18 1:17 PM, Ferruh Yigit wrote:

Update the implementation so that when the PMD sets PKT_RX_QINQ_STRIPPED
in mbuf ol_flags, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED & PKT_RX_VLAN
are also set.

Clarify the mbuf documentation: when PKT_RX_QINQ is set, PKT_RX_VLAN
should also be set.

This lets the application rely on the PKT_RX_QINQ flag to access both
mbuf.vlan_tci & mbuf.vlan_tci_outer.

Signed-off-by: Ferruh Yigit 
---
Cc: Hyong Youb Kim 
Cc: John Daley 
---
  app/test-pmd/rxonly.c| 2 +-
  doc/guides/nics/features.rst | 7 ---
  drivers/net/i40e/i40e_rxtx.c | 3 ++-
  lib/librte_mbuf/rte_mbuf.c   | 1 +
  lib/librte_mbuf/rte_mbuf.h   | 5 +++--
  5 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index a93d80612..08a5fc2cf 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -132,7 +132,7 @@ pkt_burst_receive(struct fwd_stream *fs)
printf(" - timestamp %"PRIu64" ", mb->timestamp);
if (ol_flags & PKT_RX_VLAN_STRIPPED)


It looks like it should be PKT_RX_VLAN above.


printf(" - VLAN tci=0x%x", mb->vlan_tci);
-   if (ol_flags & PKT_RX_QINQ_STRIPPED)
+   if (ol_flags & PKT_RX_QINQ)
printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
mb->vlan_tci, mb->vlan_tci_outer);


The first field duplicates the printout above, so the QinQ check should
either go before the PKT_RX_VLAN check, with PKT_RX_VLAN handled in the
else branch, or the first field should simply be removed from here.

<...>
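
For illustration, the first option from this review (test QinQ first and
handle plain VLAN in the else branch) could look like the standalone sketch
below. The DEMO_* flag bits and struct demo_mbuf are invented stand-ins,
not the real PKT_RX_* values or rte_mbuf layout, and the output goes to a
buffer instead of stdout so the result can be checked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Demo-only flag bits; the real PKT_RX_* values live in rte_mbuf.h. */
#define DEMO_RX_VLAN	(1ULL << 0)
#define DEMO_RX_QINQ	(1ULL << 1)

/* Invented stand-in for the few mbuf fields used here. */
struct demo_mbuf {
	uint64_t ol_flags;
	uint16_t vlan_tci;
	uint16_t vlan_tci_outer;
};

/*
 * Format the VLAN part of the Rx dump. QinQ is tested first so the inner
 * TCI is printed once, not once per flag. Returns the snprintf() result.
 */
static int
demo_format_vlan(const struct demo_mbuf *mb, char *buf, size_t len)
{
	assert(mb != NULL && buf != NULL && len > 0);

	buf[0] = '\0';
	if (mb->ol_flags & DEMO_RX_QINQ)
		return snprintf(buf, len,
				" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
				(unsigned int)mb->vlan_tci,
				(unsigned int)mb->vlan_tci_outer);
	else if (mb->ol_flags & DEMO_RX_VLAN)
		return snprintf(buf, len, " - VLAN tci=0x%x",
				(unsigned int)mb->vlan_tci);
	return 0;
}
```

Since the patch makes PKT_RX_QINQ imply PKT_RX_VLAN, testing both flags
independently would print the inner TCI twice for every QinQ packet.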

Re: [dpdk-dev] [PATCH v7] app/testpmd: add forwarding mode to simulate a noisy neighbour

2018-10-02 Thread Thomas Monjalon

02/10/2018 10:59, Jens Freimann:
> On Tue, Oct 02, 2018 at 09:57:52AM +0200, Thomas Monjalon wrote:
> >02/10/2018 09:19, Jens Freimann:
> >> On Mon, Oct 01, 2018 at 01:13:32PM +, Iremonger, Bernard wrote:
> >> >./devtools/check-git-log.sh -1
> >> >Headline too long:
> >> >app/testpmd: add forwarding mode to simulate a noisy neighbour
> >>
> >> I'm sorry, I failed to use checkpatches.sh correctly :) I did:
> >>
> >> #> git show | 
> >> DPDK_CHECKPATCH_PATH="/home/jfreiman/code/linux/scripts/checkpatch.pl" 
> >> devtools/checkpatches.sh --
> >>
> >> 1/1 valid patch
> >
> >Why this command is not correct?
> 
> checkpatches.sh looks for the string "Subject:" which is not included
> in git show output. Using cat on the patch file instead will work. 

Yes indeed.

As an improvement, we could check for "git show" output, which starts with the
word "commit".

Re: [dpdk-dev] [PATCH 3/3] event/dpaa2: support for crypto adapter

2018-10-02 Thread Gujjar, Abhinandan S

Acked-By: Abhinandan Gujjar 


> -Original Message-
> From: Jerin Jacob 
> Sent: Tuesday, September 25, 2018 8:42 AM
> To: akhil.go...@nxp.com
> Cc: dev@dpdk.org; hemant.agra...@nxp.com; De Lara Guarch, Pablo
> ; Gujjar, Abhinandan S
> 
> Subject: Re: [dpdk-dev] [PATCH 3/3] event/dpaa2: support for crypto adapter
> 
> -Original Message-
> > Date: Fri, 14 Sep 2018 17:18:10 +0530
> > From: akhil.go...@nxp.com
> > To: dev@dpdk.org
> > CC: hemant.agra...@nxp.com, pablo.de.lara.gua...@intel.com, Akhil
> > Goyal  
> > Subject: [dpdk-dev] [PATCH 3/3] event/dpaa2: support for crypto
> > adapter
> > X-Mailer: git-send-email 2.17.1
> >
> >
> > From: Akhil Goyal 
> >
> > Signed-off-by: Akhil Goyal 
> > Signed-off-by: Ashish Jain 
> > Signed-off-by: Hemant Agrawal 
> 
> 
> Adding Eventdev Crypto Adapter maintainer
> 
> + Abhinandan Gujjar 
> 
> 
> > ---
> >  drivers/event/dpaa2/Makefile |   3 +-
> >  drivers/event/dpaa2/dpaa2_eventdev.c | 150
> +++
> >  drivers/event/dpaa2/dpaa2_eventdev.h |   9 ++
> >  drivers/event/dpaa2/meson.build  |   3 +-
> >  4 files changed, 163 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/event/dpaa2/Makefile
> > b/drivers/event/dpaa2/Makefile index 5e1a63200..46f7d061e 100644
> > --- a/drivers/event/dpaa2/Makefile
> > +++ b/drivers/event/dpaa2/Makefile
> > @@ -20,9 +20,10 @@ CFLAGS += -I$(RTE_SDK)/drivers/event/dpaa2  CFLAGS
> > += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
> >  LDLIBS += -lrte_eal -lrte_eventdev
> >  LDLIBS += -lrte_bus_fslmc -lrte_mempool_dpaa2 -lrte_pmd_dpaa2 -LDLIBS
> > += -lrte_bus_vdev
> > +LDLIBS += -lrte_bus_vdev -lrte_pmd_dpaa2_sec
> >  CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2  CFLAGS +=
> > -I$(RTE_SDK)/drivers/net/dpaa2/mc
> > +CFLAGS += -I$(RTE_SDK)/drivers/crypto/dpaa2_sec
> >
> >  # versioning export map
> >  EXPORT_MAP := rte_pmd_dpaa2_event_version.map diff --git
> > a/drivers/event/dpaa2/dpaa2_eventdev.c
> > b/drivers/event/dpaa2/dpaa2_eventdev.c
> > index cadbdb13b..890ab461c 100644
> > --- a/drivers/event/dpaa2/dpaa2_eventdev.c
> > +++ b/drivers/event/dpaa2/dpaa2_eventdev.c
> > @@ -27,6 +27,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >
> >  #include 
> > @@ -34,6 +35,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "dpaa2_eventdev.h"
> >  #include "dpaa2_eventdev_logs.h"
> >  #include 
> > @@ -793,6 +795,149 @@ dpaa2_eventdev_eth_stop(const struct
> rte_eventdev *dev,
> > return 0;
> >  }
> >
> > +static int
> > +dpaa2_eventdev_crypto_caps_get(const struct rte_eventdev *dev,
> > +   const struct rte_cryptodev *cdev,
> > +   uint32_t *caps) {
> > +   const char *name = cdev->data->name;
> > +
> > +   EVENTDEV_INIT_FUNC_TRACE();
> > +
> > +   RTE_SET_USED(dev);
> > +
> > +   if (!strncmp(name, "dpsec-", 6))
> > +   *caps = RTE_EVENT_CRYPTO_ADAPTER_DPAA2_CAP;
> > +   else
> > +   return -1;
> > +
> > +   return 0;
> > +}
> > +
> > +static int
> > +dpaa2_eventdev_crypto_queue_add_all(const struct rte_eventdev *dev,
> > +   const struct rte_cryptodev *cryptodev,
> > +   const struct rte_event *ev) {
> > +   struct dpaa2_eventdev *priv = dev->data->dev_private;
> > +   uint8_t ev_qid = ev->queue_id;
> > +   uint16_t dpcon_id = priv->evq_info[ev_qid].dpcon->dpcon_id;
> > +   int i, ret;
> > +
> > +   EVENTDEV_INIT_FUNC_TRACE();
> > +
> > +   for (i = 0; i < cryptodev->data->nb_queue_pairs; i++) {
> > +   ret = dpaa2_sec_eventq_attach(cryptodev, i,
> > +   dpcon_id, ev);
> > +   if (ret) {
> > +   DPAA2_EVENTDEV_ERR("dpaa2_sec_eventq_attach failed: 
> > ret
> %d\n",
> > +   ret);
> > +   goto fail;
> > +   }
> > +   }
> > +   return 0;
> > +fail:
> > +   for (i = (i - 1); i >= 0 ; i--)
> > +   dpaa2_sec_eventq_detach(cryptodev, i);
> > +
> > +   return ret;
> > +}
> > +
> > +static int
> > +dpaa2_eventdev_crypto_queue_add(const struct rte_eventdev *dev,
> > +   const struct rte_cryptodev *cryptodev,
> > +   int32_t rx_queue_id,
> > +   const struct rte_event *ev) {
> > +   struct dpaa2_eventdev *priv = dev->data->dev_private;
> > +   uint8_t ev_qid = ev->queue_id;
> > +   uint16_t dpcon_id = priv->evq_info[ev_qid].dpcon->dpcon_id;
> > +   int ret;
> > +
> > +   EVENTDEV_INIT_FUNC_TRACE();
> > +
> > +   if (rx_queue_id == -1)
> > +   return dpaa2_eventdev_crypto_queue_add_all(dev,
> > +   cryptodev, ev);
> > +
> > +   ret = dpaa2_sec_eventq_attach(cryptodev, rx_queue_id,
> > +   dpcon_id, ev);
> > +   if (ret) {
> > +   DPAA2_EVENTDEV_ERR(
> > +   "dpaa2_sec_eventq_att

Re: [dpdk-dev] [PATCH 4/4] ethdev: add Tx offload outer L4 checksum definitions

2018-10-02 Thread Jerin Jacob

-Original Message-
> Date: Mon, 1 Oct 2018 14:45:39 +0100
> From: Ferruh Yigit 
> To: Jerin Jacob , Wenzhuo Lu
>  , Jingjing Wu , Bernard
>  Iremonger , John McNamara
>  , Marko Kovacevic ,
>  Thomas Monjalon , Andrew Rybchenko
>  , Olivier Matz 
> CC: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 4/4] ethdev: add Tx offload outer L4
>  checksum definitions
> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> 
> On 9/13/2018 2:47 PM, Jerin Jacob wrote:
> > Introduced DEV_TX_OFFLOAD_OUTER_UDP_CKSUM, DEV_TX_OFFLOAD_OUTER_TCP_CKSUM
> > and DEV_TX_OFFLOAD_OUTER_SCTP_CKSUM offload flags and
> >
> > PKT_TX_OUTER_L4_NO_CKSUM, PKT_TX_OUTER_TCP_CKSUM, PKT_TX_OUTER_SCTP_CKSUM
> > and PKT_TX_OUTER_UDP_CKSUM mbuf ol_flags to enable Tx outer L4 checksum
> > offload.
> >
> > To use hardware Tx outer L4 checksum offload, the user needs to.
> > # enable following in mbuff:
> > - fill outer_l2_len and outer_l3_len in mbuf
> > - set the flags PKT_TX_OUTER_TCP_CKSUM, PKT_TX_OUTER_SCTP_CKSUM or
> > PKT_TX_OUTER_UDP_CKSUM
> > - set the flag PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6
> >
> > # configure DEV_TX_OFFLOAD_OUTER_* offload flags in slow path.
> >
> > Signed-off-by: Jerin Jacob 
> > ---
> >  app/test-pmd/config.c  | 27 +++
> >  doc/guides/nics/features.rst   |  6 ++
> >  lib/librte_ethdev/rte_ethdev.c |  3 +++
> >  lib/librte_ethdev/rte_ethdev.h |  6 ++
> >  lib/librte_mbuf/rte_mbuf.c |  5 +
> >  lib/librte_mbuf/rte_mbuf.h | 23 ++-
> >  6 files changed, 69 insertions(+), 1 deletion(-)
> >
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> > index 92a177e29..85f832bf0 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -773,6 +773,33 @@ port_offload_cap_display(portid_t port_id)
> >   else
> >   printf("off\n");
> >   }
> > +
> > + if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_OUTER_UDP_CKSUM) {
> > + printf("TX Outer UDP checksum:   ");
> > + if (ports[port_id].dev_conf.txmode.offloads &
> > + DEV_TX_OFFLOAD_OUTER_UDP_CKSUM)
> > + printf("on\n");
> > + else
> > + printf("off\n");
> > + }
> > +
> > + if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_OUTER_TCP_CKSUM) {
> > + printf("TX Outer TCP checksum:   ");
> > + if (ports[port_id].dev_conf.txmode.offloads &
> > + DEV_TX_OFFLOAD_OUTER_TCP_CKSUM)
> > + printf("on\n");
> > + else
> > + printf("off\n");
> > + }
> > +
> > + if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_OUTER_SCTP_CKSUM) {
> > + printf("TX Outer SCTP checksum:   ");
> > + if (ports[port_id].dev_conf.txmode.offloads &
> > + DEV_TX_OFFLOAD_OUTER_SCTP_CKSUM)
> > + printf("on\n");
> > + else
> > + printf("off\n");
> > + }
> >  }
> 
> There is also "csum show", "csum set" functions, can you please check that 
> too?

+ Shahaf

I checked the details. It is there so that the "csumonly.c" forward engine can
select each Tx checksum in HW or SW (it provides a fallback SW implementation)
for testing purposes.

If I need to implement this support for this release, I will reduce the
scope to DEV_TX_OFFLOAD_OUTER_UDP_CKSUM and
DEV_RX_OFFLOAD_OUTER_UDP_CKSUM.

Since there are NO real-world non-encrypted TCP/SCTP-based tunnel
protocols (based on the http://patches.dpdk.org/patch/44692/ discussion),
I will limit the offload definition to DEV_?X_OFFLOAD_OUTER_UDP_CKSUM only and
add the associated test code to the "csumonly.c" forward engine in v2.

Thoughts?

I will split 1/4 and 2/4 out as a separate patch series and rework 3/4 and
4/4 as a separate series to make forward progress.


> And I am not sure why those functions seems only concerned about Tx csum 
> offloads.

It does check for errors in Rx checksum too. See rx_bad_ip_csum,
rx_bad_l4_csum

Re: [dpdk-dev] [PATCH v11 0/7] hot-unplug failure handle mechanism

2018-10-02 Thread Jeff Guo


hi, stephen

Thanks for your review, my answer as below.

On 10/1/2018 5:00 PM, Stephen Hemminger wrote:

On Sun, 30 Sep 2018 19:29:56 +0800
Jeff Guo  wrote:


Hotplug is an important feature for use cases such as datacenter device
fail-safe and SRIOV live migration in SDN/NFV. It could bring higher
flexibility and continuity to networking services in multiple use cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, a failsafe driver,
and a hot plug/unplug API in DPDK. We already have the
“ethdev event + kernel PMD hotplug handler + failsafe” solution, but we still
lack an “eal event + hotplug handler for pci PMD + failsafe” implementation,
and we need to consider two different solutions, one for uio pci and one for
vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified,
detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read/write to the device
while removal is still in progress, it will cause an MMIO error and
the application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once the hot-unplug occurs, the mechanism will detect the
removal event and then do the failure handling accordingly. In order to
do that, the following functionality is required:
  - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
  - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
it remaps memory for the unplugged device based on the failure address.
For vfio pci, it could be implemented separately, case by case.

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
  - Add a new bus ops “sigbus_handler”, which is responsible for handling
the sigbus error, whether it is an original memory error or a specific
memory error caused by a hot unplug. When a sigbus error is
captured, this function is called to handle it.
  - Implement PCI bus specific ops “pci_sigbus_handler”. It iterates over all
devices on the PCI bus to find which device encountered the failure.
  - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a bus
to handle the failure.
  - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
“rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
They monitor the sigbus error with a per-process handler.
Because of how signals are delivered, either the control path thread or the
data path thread may receive the sigbus error, but both call the
common sigbus handler. When a sigbus is captured, it calls the above API
to find the bus to handle it.
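
The lookup described in the list above (walk every bus, let each bus walk
its devices, stop at the one that owns the faulting address) can be sketched
as below. This is a standalone illustration, not the code from the patch
set. The demo_* types, the single address range per device and the return
convention (0 handled, 1 not ours, -1 failure) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented device model: one mapped range and an "unplugged" marker. */
struct demo_device {
	uintptr_t bar_start;
	size_t bar_len;
	int unplugged;
};

struct demo_bus {
	const char *name;
	struct demo_device *devs;
	size_t ndevs;
	/* Returns 0 if handled, 1 if the address is not ours, -1 on error. */
	int (*sigbus_handler)(struct demo_bus *bus, uintptr_t addr);
};

/* Per-bus op: find the device whose range contains the failing address. */
static int
demo_pci_sigbus_handler(struct demo_bus *bus, uintptr_t addr)
{
	for (size_t i = 0; i < bus->ndevs; i++) {
		struct demo_device *d = &bus->devs[i];

		if (addr >= d->bar_start && addr - d->bar_start < d->bar_len) {
			d->unplugged = 1;	/* stand-in for the remap step */
			return 0;
		}
	}
	return 1;
}

/* Generic entry point: ask each bus in turn until one claims the address. */
static int
demo_bus_sigbus_handler(struct demo_bus *buses, size_t nbuses, uintptr_t addr)
{
	assert(buses != NULL);

	for (size_t i = 0; i < nbuses; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;
		ret = buses[i].sigbus_handler(&buses[i], addr);
		if (ret <= 0)
			return ret;
	}
	return 1;
}
```

A return of 1 from the generic entry point means that no bus claimed the
address, which is the case where the process-wide signal handler would treat
the sigbus as a genuine memory error.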

The mechanism could be used by app or PMDs. For example, the whole process
of hotplug in testpmd is:
  - Enable device event monitor->Enable hotplug handle->Register event callback
->attach port->start port->start forwarding->Device unplug->failure handle
->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insert and binding, and it only
implements the igb_uio failure handler. The vfio hotplug failure handler
will come in a following patch set.

patchset history:
v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
add experimental tag.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move the igb_uio fixing part, since it is a random issue and should be considered
a kernel driver defect rather than part of this failure handler mechanism.

v10->v9:
modify the api name and exposure out for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler d

Re: [dpdk-dev] [PATCH] doc: update the doc for adding EAL option

2018-10-02 Thread Burakov, Anatoly


Hi Eric,

Ferruh has already mentioned that this should be part of the patch adding 
the --iova-mode flag, not separate (or at the very least be in the same 
patchset!).


In addition, the commit headline is very vague. Suggested rewording:

doc: document --iova-mode EAL flag

On 01-Oct-18 4:54 PM, eric zhang wrote:

This patch updates Programmer's Guide and EAL parameter guides
to show EAL option "--iova-mode" support.

Signed-off-by: eric zhang 
---
  doc/guides/prog_guide/env_abstraction_layer.rst | 8 
  doc/guides/testpmd_app_ug/run_app.rst   | 4 
  2 files changed, 12 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst 
b/doc/guides/prog_guide/env_abstraction_layer.rst
index d362c92..a47fb38 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -321,6 +321,14 @@ Misc Functions
  
  Locks and atomic operations are per-architecture (i686 and x86_64).
  
+IOVA Mode Configuration

+~~~
+
+Auto detection of the IOVA mode, based on probing the PCI bus and IOMMU 
configuration, may not report
+the desired addressing mode when virtual devices that are not directly 
attached to the PCI bus are present.
+To facilitate forcing the IOVA mode to a specific value the EAL command line 
option ``--iova-mode=mode`` can
+be used to select either physical addressing('pa') or virtual addressing('va').


Presumably this isn't only applicable to PCI bus, but can be any bus, 
correct?



+
  Memory Segments and Memory Zones (memzone)
  --
  
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst

index f301c2b..be2911c 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -133,6 +133,10 @@ See the DPDK Getting Started Guides for more information 
on these options.


I wanted to ask why you are adding this to the testpmd user guide, as this 
is an EAL parameter, not a testpmd parameter, but as far as I can tell, 
there isn't a central location where we document all EAL flags.


+Thomas, John

This looks like a gap in our documentation. There should be a place 
where we can describe all EAL parameters. Since they can be OS-specific, 
it probably should be somewhere under Linux/FreeBSD GSG. Thoughts?


  
  Use malloc instead of hugetlbfs.
  
+*   ``--iova-mode=mode``


Current style is to list all valid values, like this:

``--iova-mode <pa|va>``


+
+Force IOVA mode to a specific value. Valid values are 'pa' or 'va'.
+
  
  Testpmd Command-line Options

  




--
Thanks,
Anatoly

[dpdk-dev] [PATCH] eal: remove experimental from hotplug add/remove

2018-10-02 Thread Ferruh Yigit

rte_eal_hotplug_add() & rte_eal_hotplug_remove() APIs were first added in
v17.08 as experimental.
Commit a3ee360f4440 ("eal: add hotplug add/remove device")

When the __rte_experimental tag was created, the APIs were tagged with it
in v18.02.
Commit 77b7b81e32e9 ("add experimental tag to appropriate functions")

After the rte_eth_dev_attach() & rte_eth_dev_detach() APIs were
deprecated in v18.08, the eal APIs are the only ones for hotplug operations.
Commit 9f2be5b3db8b ("ethdev: deprecate attach and detach functions")

These APIs have been around for a few releases now and have no alternative,
so remove the experimental tag from them.

Signed-off-by: Ferruh Yigit 
---
Cc: Ian Stokes 
CC: arybche...@solarflare.com
---
 drivers/raw/ifpga_rawdev/Makefile   |  1 -
 drivers/raw/ifpga_rawdev/meson.build|  2 --
 lib/librte_eal/common/eal_common_dev.c  |  7 ---
 lib/librte_eal/common/include/rte_dev.h | 15 +--
 lib/librte_eal/rte_eal_version.map  |  4 ++--
 5 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/raw/ifpga_rawdev/Makefile 
b/drivers/raw/ifpga_rawdev/Makefile
index f3b9d5e61..c534f7f08 100644
--- a/drivers/raw/ifpga_rawdev/Makefile
+++ b/drivers/raw/ifpga_rawdev/Makefile
@@ -8,7 +8,6 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 LIB = librte_pmd_ifpga_rawdev.a
 
-CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -I$(RTE_SDK)/drivers/bus/ifpga
diff --git a/drivers/raw/ifpga_rawdev/meson.build 
b/drivers/raw/ifpga_rawdev/meson.build
index 67256872d..37896afba 100644
--- a/drivers/raw/ifpga_rawdev/meson.build
+++ b/drivers/raw/ifpga_rawdev/meson.build
@@ -11,5 +11,3 @@ deps += ['rawdev', 'pci', 'bus_pci', 'kvargs',
 sources = files('ifpga_rawdev.c')
 
 includes += include_directories('base')
-
-allow_experimental_apis = true
diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index 678dbcac7..ab3170ebc 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -127,8 +127,9 @@ int rte_eal_dev_detach(struct rte_device *dev)
return ret;
 }
 
-int __rte_experimental rte_eal_hotplug_add(const char *busname, const char 
*devname,
-   const char *devargs)
+int
+rte_eal_hotplug_add(const char *busname, const char *devname,
+   const char *devargs)
 {
struct rte_bus *bus;
struct rte_device *dev;
@@ -193,7 +194,7 @@ int __rte_experimental rte_eal_hotplug_add(const char 
*busname, const char *devn
return ret;
 }
 
-int __rte_experimental
+int
 rte_eal_hotplug_remove(const char *busname, const char *devname)
 {
struct rte_bus *bus;
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index b80a80598..2db506987 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -189,9 +189,6 @@ __rte_deprecated
 int rte_eal_dev_detach(struct rte_device *dev);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Hotplug add a given device to a specific bus.
  *
  * @param busname
@@ -204,13 +201,11 @@ int rte_eal_dev_detach(struct rte_device *dev);
  * @return
  *   0 on success, negative on error.
  */
-int __rte_experimental rte_eal_hotplug_add(const char *busname, const char 
*devname,
-   const char *devargs);
+int
+rte_eal_hotplug_add(const char *busname, const char *devname,
+   const char *devargs);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Hotplug remove a given device from a specific bus.
  *
  * @param busname
@@ -220,8 +215,8 @@ int __rte_experimental rte_eal_hotplug_add(const char 
*busname, const char *devn
  * @return
  *   0 on success, negative on error.
  */
-int __rte_experimental rte_eal_hotplug_remove(const char *busname,
- const char *devname);
+int
+rte_eal_hotplug_remove(const char *busname, const char *devname);
 
 /**
  * Device comparison function.
diff --git a/lib/librte_eal/rte_eal_version.map 
b/lib/librte_eal/rte_eal_version.map
index 73282bbb0..bd6ba15e3 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -265,6 +265,8 @@ DPDK_18.08 {
 DPDK_18.11 {
global:
 
+   rte_eal_hotplug_add;
+   rte_eal_hotplug_remove;
rte_strscpy;
 
 } DPDK_18.08;
@@ -292,8 +294,6 @@ EXPERIMENTAL {
rte_devargs_remove;
rte_devargs_type_count;
rte_eal_cleanup;
-   rte_eal_hotplug_add;
-   rte_eal_hotplug_remove;
rte_fbarray_attach;
rte_fbarray_destroy;
rte_fbarray_detach;
-- 
2.17.1

Re: [dpdk-dev] [PATCH] doc: update the doc for adding EAL option

2018-10-02 Thread Thomas Monjalon

02/10/2018 11:59, Burakov, Anatoly:
> This looks like a gap in our documentation. There should be a place 
> where we can describe all EAL parameters. Since they can be OS-specific, 
> it probably should be somewhere under Linux/FreeBSD GSG. Thoughts?

I agree

Re: [dpdk-dev] [PATCH] mbuf: clarify QINQ flag usage

2018-10-02 Thread Ferruh Yigit

On 10/2/2018 10:44 AM, Andrew Rybchenko wrote:
> On 10/2/18 1:17 PM, Ferruh Yigit wrote:
>> Update the implementation so that when the PKT_RX_QINQ_STRIPPED mbuf
>> ol_flag is set by a PMD, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED & PKT_RX_VLAN
>> are also set.
>>
>> Clarify the mbuf documentation that when PKT_RX_QINQ is set, PKT_RX_VLAN
>> should also be set.
>>
>> So that the application can rely on the PKT_RX_QINQ flag to access both
>> mbuf.vlan_tci & mbuf.vlan_tci_outer
>>
>> Signed-off-by: Ferruh Yigit 
>> ---
>> Cc: Hyong Youb Kim 
>> Cc: John Daley 
>> ---
>>   app/test-pmd/rxonly.c| 2 +-
>>   doc/guides/nics/features.rst | 7 ---
>>   drivers/net/i40e/i40e_rxtx.c | 3 ++-
>>   lib/librte_mbuf/rte_mbuf.c   | 1 +
>>   lib/librte_mbuf/rte_mbuf.h   | 5 +++--
>>   5 files changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
>> index a93d80612..08a5fc2cf 100644
>> --- a/app/test-pmd/rxonly.c
>> +++ b/app/test-pmd/rxonly.c
>> @@ -132,7 +132,7 @@ pkt_burst_receive(struct fwd_stream *fs)
>>  printf(" - timestamp %"PRIu64" ", mb->timestamp);
>>  if (ol_flags & PKT_RX_VLAN_STRIPPED)
> 
> It looks like it should be PKT_RX_VLAN above.

There is already a patch from Hyong for it which triggered this patch:
https://patches.dpdk.org/patch/45350/


> 
>>  printf(" - VLAN tci=0x%x", mb->vlan_tci);
>> -if (ol_flags & PKT_RX_QINQ_STRIPPED)
>> +if (ol_flags & PKT_RX_QINQ)
>>  printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
>>  mb->vlan_tci, mb->vlan_tci_outer);
> 
> The first one duplicates above printout, so it should be either put before
> PKT_RX_VLAN check and do PKT_RX_VLAN in else branch, or simply removed
> from here.

Right, let me check it.
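
For reference, the ordering Andrew suggests (check QinQ first, fall back to the plain VLAN printout in the else branch) could look roughly like the sketch below. It is only an illustration, not the testpmd code: the SK_RX_* flag values and the sketch_format_vlan() helper are made up for the example, the real PKT_RX_* bits live in rte_mbuf.h, and it writes into a buffer instead of calling printf() so the result is easy to check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag values for illustration; not the real PKT_RX_* bits. */
#define SK_RX_VLAN (1ULL << 0)
#define SK_RX_QINQ (1ULL << 1)

/*
 * Format the VLAN part of the rx-only printout.
 * QinQ is tested first, so the inner tci is printed once, not twice.
 * Returns the number of characters written (0 if no VLAN flag is set).
 */
static int
sketch_format_vlan(char *buf, size_t len, uint64_t ol_flags,
		   uint16_t vlan_tci, uint16_t vlan_tci_outer)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	if (ol_flags & SK_RX_QINQ)
		return snprintf(buf, len,
				" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
				(unsigned int)vlan_tci,
				(unsigned int)vlan_tci_outer);
	if (ol_flags & SK_RX_VLAN)
		return snprintf(buf, len, " - VLAN tci=0x%x",
				(unsigned int)vlan_tci);
	return 0;
}
```

With both flags set, only the QinQ line is produced; with only the VLAN flag set, the single-tag line is produced.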

Re: [dpdk-dev] [PATCH v11 0/7] hot-unplug failure handle mechanism

2018-10-02 Thread Jeff Guo


hi, jerin

Thanks for your comment and reply as below.

On 10/1/2018 5:55 PM, Jerin Jacob wrote:

-Original Message-

Date: Mon, 1 Oct 2018 11:00:12 +0200
From: Stephen Hemminger 
To: Jeff Guo 
Cc: bruce.richard...@intel.com, ferruh.yi...@intel.com,
  konstantin.anan...@intel.com, gaetan.ri...@6wind.com,
  jingjing...@intel.com, tho...@monjalon.net, mo...@mellanox.com,
  ma...@mellanox.com, harry.van.haa...@intel.com, qi.z.zh...@intel.com,
  shaopeng...@intel.com, bernard.iremon...@intel.com,
  arybche...@solarflare.com, wenzhuo...@intel.com,
  anatoly.bura...@intel.com, jblu...@infradead.org, shreyansh.j...@nxp.com,
  dev@dpdk.org, helin.zh...@intel.com
Subject: Re: [dpdk-dev] [PATCH v11 0/7] hot-unplug failure handle mechanism


On Sun, 30 Sep 2018 19:29:56 +0800
Jeff Guo  wrote:


Hotplug is an important feature for use-cases like the datacenter device's
fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher
flexibility and continuity to networking services in multiple use-cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, failsafe driver,
and hot plug/unplug API in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we still do not
have an “eal event + hotplug handler for pci PMD + failsafe” implementation,
and we need to consider 2 different solutions for uio pci and vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified
and detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read/write to the device
while removal is still in progress, it will cause an MMIO error and the
application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once the hot-unplug occurs, the mechanism will detect the
removal event and then accordingly do the failure handling. In order to
do that, the below functionality will be required:
  - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
  - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
it will remap memory, based on the failure address, for the corresponding
device that was unplugged. For vfio pci, it could be implemented separately,
case by case.

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
  - Add a new bus ops “sigbus_handler”, that is responsible for handling
the sigbus error which is either an original memory error, or a specific
memory error that is caused by a hot unplug. When a sigbus error is
captured, it will call this function to handle sigbus error.
  - Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate over
all devices on the PCI bus to find which device encountered the failure.
  - Implement a "rte_bus_sigbus_handler" to iterate all buses to find a bus
to handle the failure.
  - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
“rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
It will monitor the sigbus error by a handler which is per-process.
Based on the signal event principle, the control path thread and the
data path thread will randomly receive the sigbus error, but both will call
the common sigbus handler. When a sigbus is captured, it will call the above
API to find the bus that should handle it.
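
The bus iteration described in the list above can be sketched as follows. This is a simplified illustration and not the code from the patch set: the sketch_* types and functions are hypothetical stand-ins for the bus and device structures, the "remap" step is reduced to a counter, and the return convention (0 handled, 1 not ours, -1 error) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the DPDK structures. */
struct sketch_dev {
	uintptr_t bar_start;	/* start of the device's mapped region */
	size_t bar_len;		/* length of the mapped region */
};

struct sketch_bus {
	const char *name;
	struct sketch_dev *devs;
	size_t nb_devs;
	/* 0: handled, 1: address not on this bus, -1: handling failed */
	int (*sigbus_handler)(struct sketch_bus *bus, uintptr_t addr);
	int nb_handled;		/* stands in for the real remap work */
};

/* Per-bus handler: find the device whose mapping contains the address. */
static int
sketch_pci_sigbus_handler(struct sketch_bus *bus, uintptr_t addr)
{
	size_t i;

	for (i = 0; i < bus->nb_devs; i++) {
		const struct sketch_dev *d = &bus->devs[i];

		if (addr >= d->bar_start && addr - d->bar_start < d->bar_len) {
			bus->nb_handled++;	/* real code would remap here */
			return 0;
		}
	}
	return 1;
}

/* Common entry point: ask each bus in turn until one claims the fault. */
static int
sketch_bus_sigbus_handler(struct sketch_bus *buses, size_t nb_buses,
			  uintptr_t addr)
{
	size_t i;

	assert(buses != NULL || nb_buses == 0);
	for (i = 0; i < nb_buses; i++) {
		int ret = buses[i].sigbus_handler(&buses[i], addr);

		if (ret <= 0)
			return ret;	/* handled, or failed while handling */
	}
	return 1;	/* no bus owns the address: a generic memory error */
}
```

A return value of 1 from the common entry point is the case where the per-process signal handler would fall back to the original sigbus behaviour.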

The mechanism could be used by app or PMDs. For example, the whole process
of hotplug in testpmd is:
  - Enable device event monitor->Enable hotplug handle->Register event callback
->attach port->start port->start forwarding->Device unplug->failure handle
->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insert and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will be in the next patch set.

patchset history:
v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
add experimental tag.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move out the igb_uio fixing part, since it is a random issue and should be
considered a kernel driver defect, not part of this failure handler mechanism.

v10->v9:
modify the API name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process

Re: [dpdk-dev] [PATCH] eal: remove experimental from hotplug add/remove

2018-10-02 Thread Andrew Rybchenko


On 10/2/18 2:04 PM, Ferruh Yigit wrote:

rte_eal_hotplug_add() & rte_eal_hotplug_remove() APIs were first added in
v17.08 as experimental.
Commit a3ee360f4440 ("eal: add hotplug add/remove device")

When the __rte_experimental tag was created, the APIs were tagged with it
in v18.02.
Commit 77b7b81e32e9 ("add experimental tag to appropriate functions")

After the rte_eth_dev_attach() & rte_eth_dev_detach() APIs were
deprecated in v18.08, the eal APIs are the only ones for hotplug operations.
Commit 9f2be5b3db8b ("ethdev: deprecate attach and detach functions")

These APIs have been around for a few releases now and have no alternative,
so remove the experimental tag from them.

Signed-off-by: Ferruh Yigit 


Dup of http://patches.dpdk.org/patch/45791/ ?

Re: [dpdk-dev] [PATCH v8 1/4] lib/librte_power: traffic pattern aware power control

2018-10-02 Thread Liang, Ma

Hi Dave, 
Please check comment below. 

On 28 Sep 11:47, Hunt, David wrote:
> Hi Liang,
> 
> 
> On 17/9/2018 2:30 PM, Liang Ma wrote:
> >1. Abstract
> >
> >For packet processing workloads such as DPDK, polling is continuous.
> >This means CPU cores always show 100% busy independent of how much work
> >those cores are doing. It is critical to accurately determine how busy
> >a core is for the following reasons:
> >
> >* No indication of overload conditions
> >
> >* Users do not know how much real load is on a system, resulting in
> >  wasted energy as no power management is utilized
> >
> >Compared to the original l3fwd-power design, instead of going to sleep
> >after detecting an empty poll, the new mechanism just lowers the core
> >frequency. As a result, the application does not stop polling the device,
> >which leads to improved handling of bursts of traffic.
> >
> >When the system becomes busy, the empty poll mechanism can also increase the
> >core frequency (including turbo) to do best effort for intensive traffic.
> >This gives us more flexible and balanced traffic awareness over the
> >standard l3fwd-power application.
> >
> >2. Proposed solution
> >
> >The proposed solution focuses on how many times empty polls are executed.
> >A lower number of empty polls means the current core is busy processing
> >its workload, so a higher frequency is needed. A high empty poll number
> >indicates the current core is not doing any real work, so we can lower
> >the frequency to save power.
> >
> >In the current implementation, each core has 1 empty-poll counter, which
> >assumes 1 core is dedicated to 1 queue. This will need to be expanded in the
> >future to support multiple queues per core.
> >
> >2.1 Power state definition:
> >
> > LOW:  Not currently used, reserved for future use.
> >
> > MED:  the frequency is used to process modest traffic workload.
> >
> > HIGH: the frequency is used to process busy traffic workload.
> >
> >2.2 There are two phases to establish the power management system:
> >
> > a.Initialization/Training phase. The training phase is necessary
> >   in order to figure out the system polling baseline numbers from
> >   idle to busy. The highest poll count will be during idle, where
> >   all polls are empty. These poll counts will be different between
> >   systems due to the many possible processor micro-arch, cache
> >   and device configurations, hence the training phase.
> >   In the training phase, traffic is blocked so the training
> >   algorithm can average the empty-poll numbers for the LOW, MED and
> >   HIGH  power states in order to create a baseline.
> >   The core's counter are collected every 10ms, and the Training
> >   phase will take 2 seconds.
> >   Training is disabled as default configuration. the default
> >   parameter is applied. Simple App still can trigger training
> 
> Typo: "Simple" should be "Sample"
> 
> Suggest adding: Once the training phase has been executed once on a
> system, the application can then be started with the relevant thresholds
> provided on the command line, allowing the application to start passing
> traffic immediately.
agree
> 
> >   if that's needed.
> >
> > b.Normal phase. When the training phase is complete, traffic is
> >   started. The run-time poll counts are compared with the
> >   baseline and the decision will be taken to move to MED power
> >   state or HIGH power state. The counters are calculated every 10ms.
> 
> Propose changing the first sentence:  Traffic starts immediately based 
> on the default
> thresholds, or based on the user supplied thresholds via the command 
> line parameters.
>
agree
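
To make the timing in 2.2 concrete: with a 10ms collection interval and a 2 second training phase there are 200 samples to average. A rough sketch of that accumulation is below. It is not the librte_power implementation; the sketch_* names are hypothetical, and a plain arithmetic mean is assumed as the way the baseline is formed.

```c
#include <assert.h>
#include <stdint.h>

#define SK_INTERVAL_MS		10	/* counters collected every 10ms */
#define SK_TRAINING_MS		2000	/* training phase takes 2 seconds */
#define SK_TRAINING_SAMPLES	(SK_TRAINING_MS / SK_INTERVAL_MS)

/* Hypothetical accumulator for one core's training run. */
struct sketch_training {
	uint64_t sum;			/* sum of empty-poll counts so far */
	unsigned int nb_samples;	/* intervals accumulated so far */
};

/*
 * Record the empty-poll count of one 10ms interval.
 * Returns 1 once the training phase has all its samples, 0 otherwise.
 */
static int
sketch_training_add(struct sketch_training *t, uint64_t empty_polls)
{
	if (t->nb_samples >= SK_TRAINING_SAMPLES)
		return 1;	/* already complete, ignore extra samples */
	t->sum += empty_polls;
	t->nb_samples++;
	return t->nb_samples == SK_TRAINING_SAMPLES;
}

/* Baseline: average empty polls per interval while traffic is blocked. */
static uint64_t
sketch_training_baseline(const struct sketch_training *t)
{
	assert(t->nb_samples > 0);
	return t->sum / t->nb_samples;
}
```

The timer callback would call sketch_training_add() once per interval and switch to the normal phase when it returns 1.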
> 
> 
> 
> >3. Proposed  API
> >
> >1.  rte_power_empty_poll_stat_init(struct ep_params **eptr,
> > uint8_t *freq_tlb, struct ep_policy *policy);
> >which is used to initialize the power management system.
> >  
> >2.  rte_power_empty_poll_stat_free(void);
> >which is used to free the resources held by the power management system.
> >  
> >3.  rte_power_empty_poll_stat_update(unsigned int lcore_id);
> >which is used to update specific core empty poll counter, not thread safe
> >  
> >4.  rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> >which is used to update specific core valid poll counter, not thread safe
> >  
> >5.  rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
> >which is used to get specific core empty poll counter.
> >  
> >6.  rte_power_poll_stat_fetch(unsigned int lcore_id);
> >which is used to get specific core valid poll counter.
> >
> >7.  rte_empty_poll_detection(struct rte_timer *tim, void *arg);
> >which is used to detect empty poll state changes then take action.
> >
> >ChangeLog:
> >v2: fix some coding style issues.
> >v3: rename the filename, API name.
> >v4: no change.
> >v5: no change.
> >v6: re-work the code layout, update

Re: [dpdk-dev] [PATCH v8 2/4] examples/l3fwd-power: simple app update for new API

2018-10-02 Thread Liang, Ma

On 28 Sep 12:19, Hunt, David wrote:
> Hi Liang,
> 
> A few tweaks below:
> 
> 
> On 17/9/2018 2:30 PM, Liang Ma wrote:
> >Add the support for new traffic pattern aware power control
> >power management API.
> >
> >Example:
> >./l3fwd-power -l xxx   -n 4   -w :xx:00.0 -w :xx:00.1 -- -p 0x3
> >-P --config="(0,0,xx),(1,0,xx)" --empty-poll="0,0,0" -l 14 -m 9 -h 1
> >
> >Please Reference l3fwd-power document for all parameter except
> >empty-poll.
> 
> The docs should probably include empty poll parameter. Suggest 
> re-wording to
> 
> Please Reference l3fwd-power document for full parameter usage
> 
agree
> 
> 
> >The option "l", "m", "h" are used to set the power index for
> >LOW, MED, HIGH power state. only is useful after enable empty-poll
> >
> >--empty-poll="training_flag, med_threshold, high_threshold"
> >
> >The option training_flag is used to enable/disable training mode.
> >
> >The option med_threshold is used to indicate the empty poll threshold
> >of modest state which is customized by user.
> >
> >The option high_threshold is used to indicate the empty poll threshold
> >of busy state which is customized by user.
> >
> >Above three option default value is all 0.
> >
> >Once enable empty-poll. System will apply the default parameter.
> >Training mode is disabled as default.
> 
> Suggest:
> 
> Once empty-poll is enabled, the system will apply the default parameters if
> no other command line options are provided.
> 
agree
> 
> 
> >If training mode is triggered, there should not has any traffic
> >pass-through during training phase.
> 
> Suggest:
> If training mode is enabled, the user should ensure that no traffic
> is allowed to pass through the system.
> 
> >When training phase complete, system transfer to normal phase.
> 
> When the training phase is complete, the application transfers to normal operation.
> 
> 
agree
> 
> >
> >System will running with modest power stat at beginning.
> 
> System will start running with the modest power mode.
> 

> 
> >If the system busyness percentage above 70%, then system will adjust
> >power state move to High power state. If the traffic become lower(eg. The
> >system busyness percentage drop below 30%), system will fallback
> >to the modest power state.
> 
> If the traffic goes above 70%, then system will move to High power state.
> If the traffic drops below 30%, the system will fallback to the modest
> power state.
> 
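
The 70%/30% behaviour above is a simple hysteresis between the two power states. A minimal sketch, under stated assumptions, is below: the sketch_* names are hypothetical, and the busyness formula (share of the idle baseline's empty polls that did not happen) is an assumption for illustration, not the calculation used in the patch.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_state { SK_MED, SK_HIGH };

#define SK_BUSY_UP_PCT		70	/* above this, move to HIGH */
#define SK_BUSY_DOWN_PCT	30	/* below this, fall back to MED */

/*
 * Assumed busyness measure: the fewer empty polls relative to the idle
 * baseline, the busier the core.
 */
static unsigned int
sketch_busy_pct(uint64_t empty_polls, uint64_t baseline)
{
	assert(baseline > 0);
	if (empty_polls >= baseline)
		return 0;
	return (unsigned int)(100 - (empty_polls * 100) / baseline);
}

/* Hysteresis: between the two thresholds the current state is kept. */
static enum sketch_state
sketch_next_state(enum sketch_state cur, unsigned int busy_pct)
{
	if (cur == SK_MED && busy_pct > SK_BUSY_UP_PCT)
		return SK_HIGH;
	if (cur == SK_HIGH && busy_pct < SK_BUSY_DOWN_PCT)
		return SK_MED;
	return cur;
}
```

Keeping a gap between the two thresholds avoids flapping between frequencies when the load hovers around a single cut-off value.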
> 
> >Example code use master thread to monitoring worker thread busyness.
> >the default timer resolution is 10ms.
> >
> >ChangeLog:
> >v2 fix some coding style issues
> >v3 rename the API.
> >v6 re-work the API.
> >v7 no change.
> >v8 disable training as default option.
> >
> >Signed-off-by: Liang Ma 
> >
> >Reviewed-by: Lei Yao 
> >---
> >  examples/l3fwd-power/Makefile|   3 +
> >  examples/l3fwd-power/main.c  | 325 
> >  +--
> >  examples/l3fwd-power/meson.build |   1 +
> >  3 files changed, 312 insertions(+), 17 deletions(-)
> >
> >diff --git a/examples/l3fwd-power/Makefile b/examples/l3fwd-power/Makefile
> >index d7e39a3..772ec7b 100644
> >--- a/examples/l3fwd-power/Makefile
> >+++ b/examples/l3fwd-power/Makefile
> >@@ -23,6 +23,8 @@ CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
> >  LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
> >  LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
> >  
> >+CFLAGS += -DALLOW_EXPERIMENTAL_API
> >+
> >  build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
> > $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
> >  
> >@@ -54,6 +56,7 @@ please change the definition of the RTE_TARGET 
> >environment variable)
> >  all:
> >  else
> >  
> >+CFLAGS += -DALLOW_EXPERIMENTAL_API
> >  CFLAGS += -O3
> >  CFLAGS += $(WERROR_FLAGS)
> >  
> >diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
> >index 68527d2..1465608 100644
> >--- a/examples/l3fwd-power/main.c
> >+++ b/examples/l3fwd-power/main.c
> >@@ -43,6 +43,7 @@
> >  #include 
> >  #include 
> >  #include 
> >+#include 
> >  
> >  #include "perf_core.h"
> >  #include "main.h"
> >@@ -55,6 +56,8 @@
> >  
> >  /* 100 ms interval */
> >  #define TIMER_NUMBER_PER_SECOND   10
> >+/* (10ms) */
> >+#define INTERVALS_PER_SECOND 100
> >  /* 10 us */
> >  #define SCALING_PERIOD
> >  (100/TIMER_NUMBER_PER_SECOND)
> >  #define SCALING_DOWN_TIME_RATIO_THRESHOLD 0.25
> >@@ -117,6 +120,11 @@
> >   */
> >  #define RTE_TEST_RX_DESC_DEFAULT 1024
> >  #define RTE_TEST_TX_DESC_DEFAULT 1024
> >+#define EMPTY_POLL_MED_THRESHOLD 35UL
> >+#define EMPTY_POLL_HGH_THRESHOLD 58UL
> 
> I'd suggest adding some explanation around these two numbers.
> E.g.
> /*
>  * These two thresholds were decided on by running the training algorithm on
>  * a 2.5GHz Xeon. These defaults can be overridden by supplying non-zero values
>  * for the med_threshold and high_threshold parameters on the command line.
>  */
> 
> 
> >+
> >+

Re: [dpdk-dev] [PATCH] eal: remove experimental from hotplug add/remove

2018-10-02 Thread Ferruh Yigit

On 10/2/2018 11:08 AM, Andrew Rybchenko wrote:
> On 10/2/18 2:04 PM, Ferruh Yigit wrote:
>> rte_eal_hotplug_add() & rte_eal_hotplug_remove() APIs were first added in
>> v17.08 as experimental.
>> Commit a3ee360f4440 ("eal: add hotplug add/remove device")
>>
>> When the __rte_experimental tag was created, the APIs were tagged with it
>> in v18.02.
>> Commit 77b7b81e32e9 ("add experimental tag to appropriate functions")
>>
>> After the rte_eth_dev_attach() & rte_eth_dev_detach() APIs were
>> deprecated in v18.08, the eal APIs are the only ones for hotplug operations.
>> Commit 9f2be5b3db8b ("ethdev: deprecate attach and detach functions")
>>
>> These APIs have been around for a few releases now and have no alternative,
>> so remove the experimental tag from them.
>>
>> Signed-off-by: Ferruh Yigit 
> 
> Dup of http://patches.dpdk.org/patch/45791/ ?

Ahh, yes it is, I will mark this as Rejected, thanks.

Re: [dpdk-dev] [PATCH v3 3/4] eal: remove experimental flag of hotplug functions

2018-10-02 Thread Ferruh Yigit

On 9/28/2018 5:21 PM, Thomas Monjalon wrote:
> These functions are quite old and are the only available replacement
> for the deprecated attach/detach functions.
> 
> Note: some new functions may (again) replace these hotplug functions,
> in future, with better parameters.
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  lib/librte_eal/common/eal_common_dev.c  |  7 ---
>  lib/librte_eal/common/include/rte_dev.h | 11 ++-
>  lib/librte_eal/rte_eal_version.map  |  4 ++--

Can remove "-DALLOW_EXPERIMENTAL_API" (or "allow_experimental_apis = true" for
meson) from drivers/raw/ifpga_rawdev when these APIs are not experimental
anymore.
For reference: https://patches.dpdk.org/patch/45836/

It is easy to know when to add "-DALLOW_EXPERIMENTAL_API" but it is hard to know
when to remove one; a helper would be good there.

Re: [dpdk-dev] [PATCH v3 1/2] net/tap: change queue fd to be pointers to process private

2018-10-02 Thread Raslan Darawsheh

Hi,

What I'm really doing is simply keeping a private array of all the fds, which
each process allocates separately. This lets each process access the fds for
the queues without overwriting the shared ones, and it's working for me this
way.

Now, coming to your comment: I'm not sure I fully understand it, but what you
are suggesting is to create an array, indexed by process_id, to store these
fds in.
As far as I know, we don't have something like a process_id in DPDK; we only
have the system process id, which is not relevant for the array of fds.

Please correct me if I'm wrong,
I think this way we'll be limiting the number of secondary processes to the
number of tap queues.
Meanwhile, my solution has no such limitation.
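
To illustrate the layout I mean, here is a rough sketch. It is not the patch itself: the sk_* names are hypothetical, SK_MAX_QUEUES stands in for RTE_PMD_TAP_MAX_QUEUES, and the release helper only resets the slot where the driver would also call close().

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_QUEUES 16	/* stand-in for RTE_PMD_TAP_MAX_QUEUES */

/* One instance per process; never placed in shared memory. */
struct sk_process_private {
	int rxq_fds[SK_MAX_QUEUES];
	int txq_fds[SK_MAX_QUEUES];
};

/* Queue structure: holds a pointer to the fd slot, not the fd itself. */
struct sk_queue {
	int *fd;
};

/* Mark every slot as "not open". */
static void
sk_private_init(struct sk_process_private *priv)
{
	size_t i;

	for (i = 0; i < SK_MAX_QUEUES; i++) {
		priv->rxq_fds[i] = -1;
		priv->txq_fds[i] = -1;
	}
}

/* Point an rx queue at its slot in this process's private table. */
static void
sk_queue_attach(struct sk_queue *q, struct sk_process_private *priv,
		size_t idx)
{
	assert(idx < SK_MAX_QUEUES);
	q->fd = &priv->rxq_fds[idx];
}

/*
 * Release a queue: same guard as in the patch (queue, pointer, open fd).
 * Returns 1 if an fd was released, 0 otherwise.
 */
static int
sk_queue_release(struct sk_queue *q)
{
	if (q && q->fd && *q->fd > 0) {
		/* the driver would close(*q->fd) here */
		*q->fd = -1;
		return 1;
	}
	return 0;
}
```

Each process fills its own table with the fds it opened or received, so writing one process's fd never overwrites the value another process uses.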

Kindest regards,
Raslan Darawsheh

-Original Message-
From: Wiles, Keith  
Sent: Thursday, September 27, 2018 4:18 PM
To: Raslan Darawsheh 
Cc: Thomas Monjalon ; dev@dpdk.org; Shahaf Shuler 
; Ori Katz 
Subject: Re: [PATCH v3 1/2] net/tap: change queue fd to be pointers to process 
private

> On Sep 27, 2018, at 6:19 AM, Raslan Darawsheh  wrote:
> 
> change the fds for the queues to be pointers and add new process 
> private structure and make the queue fds point to it.
> 
> Signed-off-by: Raslan Darawsheh 
> ---
> drivers/net/tap/rte_eth_tap.c | 63 
> ---
> drivers/net/tap/rte_eth_tap.h |  9 +--
> drivers/net/tap/tap_intr.c|  4 +--
> 3 files changed, 44 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/net/tap/rte_eth_tap.c 
> b/drivers/net/tap/rte_eth_tap.c index ad5ae98..8cc4552 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -64,6 +64,7 @@
> 
> static struct rte_vdev_driver pmd_tap_drv; static struct 
> rte_vdev_driver pmd_tun_drv;
> +static struct pmd_process_private *process_private;

Maybe I do not see some minor point for making fd a pointer to fd when we could 
have an array of process_private[RTE_PMD_TAP_MAX_QUEUES] instead of a pointer 
type here. Then we do not need to allocate the memory each PMD and they would 
still have a private copy. Remove the array of rx/tx fds in the structure. This 
way it appears we can remove the code below that is making fd a pointer to fd. 
It just seems overly complex to me at the cost of a few more bytes of memory.

This would remove int fd; from the structure and add a pointer to the 
pid_process_private instead, which is private by default.

Did I miss some detail here that makes my comment wrong?

> 
> static const char *valid_arguments[] = {
>   ETH_TAP_IFACE_ARG,
> @@ -331,7 +332,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, 
> uint16_t nb_pkts)
>   uint16_t data_off = rte_pktmbuf_headroom(mbuf);
>   int len;
> 
> - len = readv(rxq->fd, *rxq->iovecs,
> + len = readv(*rxq->fd, *rxq->iovecs,
>   1 +
>   (rxq->rxmode->offloads & DEV_RX_OFFLOAD_SCATTER ?
>rxq->nb_rx_desc : 1));
> @@ -595,7 +596,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
>   tap_tx_l4_cksum(l4_cksum, l4_phdr_cksum, l4_raw_cksum);
> 
>   /* copy the tx frame data */
> - n = writev(txq->fd, iovecs, j);
> + n = writev(*txq->fd, iovecs, j);
>   if (n <= 0)
>   break;
>   (*num_packets)++;
> @@ -976,13 +977,13 @@ tap_dev_close(struct rte_eth_dev *dev)
>   tap_flow_implicit_flush(internals, NULL);
> 
>   for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
> - if (internals->rxq[i].fd != -1) {
> - close(internals->rxq[i].fd);
> - internals->rxq[i].fd = -1;
> + if (*internals->rxq[i].fd != -1) {
> + close(*internals->rxq[i].fd);
> + *internals->rxq[i].fd = -1;
>   }
> - if (internals->txq[i].fd != -1) {
> - close(internals->txq[i].fd);
> - internals->txq[i].fd = -1;
> + if (*internals->txq[i].fd != -1) {
> + close(*internals->txq[i].fd);
> + *internals->txq[i].fd = -1;
>   }
>   }
> 
> @@ -1007,9 +1008,9 @@ tap_rx_queue_release(void *queue) {
>   struct rx_queue *rxq = queue;
> 
> - if (rxq && (rxq->fd > 0)) {
> - close(rxq->fd);
> - rxq->fd = -1;
> + if (rxq && rxq->fd && (*rxq->fd > 0)) {
> + close(*rxq->fd);
> + *rxq->fd = -1;
>   rte_pktmbuf_free(rxq->pool);
>   rte_free(rxq->iovecs);
>   rxq->pool = NULL;
> @@ -1022,9 +1023,9 @@ tap_tx_queue_release(void *queue) {
>   struct tx_queue *txq = queue;
> 
> - if (txq && (txq->fd > 0)) {
> - close(txq->fd);
> - txq->fd = -1;
> + if (txq && txq->fd &

[dpdk-dev] [PATCH v4 1/2] net/tap: change queue fd to be pointers to process private

2018-10-02 Thread Raslan Darawsheh

change the fds for the queues to be pointers and add new process private
structure and make the queue fds point to it.

Signed-off-by: Raslan Darawsheh 
---
 drivers/net/tap/rte_eth_tap.c | 63 ---
 drivers/net/tap/rte_eth_tap.h |  9 +--
 drivers/net/tap/tap_intr.c|  4 +--
 3 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index ad5ae98..8cc4552 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -64,6 +64,7 @@
 
 static struct rte_vdev_driver pmd_tap_drv;
 static struct rte_vdev_driver pmd_tun_drv;
+static struct pmd_process_private *process_private;
 
 static const char *valid_arguments[] = {
ETH_TAP_IFACE_ARG,
@@ -331,7 +332,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
uint16_t data_off = rte_pktmbuf_headroom(mbuf);
int len;
 
-   len = readv(rxq->fd, *rxq->iovecs,
+   len = readv(*rxq->fd, *rxq->iovecs,
1 +
(rxq->rxmode->offloads & DEV_RX_OFFLOAD_SCATTER ?
 rxq->nb_rx_desc : 1));
@@ -595,7 +596,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
tap_tx_l4_cksum(l4_cksum, l4_phdr_cksum, l4_raw_cksum);
 
/* copy the tx frame data */
-   n = writev(txq->fd, iovecs, j);
+   n = writev(*txq->fd, iovecs, j);
if (n <= 0)
break;
(*num_packets)++;
@@ -976,13 +977,13 @@ tap_dev_close(struct rte_eth_dev *dev)
tap_flow_implicit_flush(internals, NULL);
 
for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
-   if (internals->rxq[i].fd != -1) {
-   close(internals->rxq[i].fd);
-   internals->rxq[i].fd = -1;
+   if (*internals->rxq[i].fd != -1) {
+   close(*internals->rxq[i].fd);
+   *internals->rxq[i].fd = -1;
}
-   if (internals->txq[i].fd != -1) {
-   close(internals->txq[i].fd);
-   internals->txq[i].fd = -1;
+   if (*internals->txq[i].fd != -1) {
+   close(*internals->txq[i].fd);
+   *internals->txq[i].fd = -1;
}
}
 
@@ -1007,9 +1008,9 @@ tap_rx_queue_release(void *queue)
 {
struct rx_queue *rxq = queue;
 
-   if (rxq && (rxq->fd > 0)) {
-   close(rxq->fd);
-   rxq->fd = -1;
+   if (rxq && rxq->fd && (*rxq->fd > 0)) {
+   close(*rxq->fd);
+   *rxq->fd = -1;
rte_pktmbuf_free(rxq->pool);
rte_free(rxq->iovecs);
rxq->pool = NULL;
@@ -1022,9 +1023,9 @@ tap_tx_queue_release(void *queue)
 {
struct tx_queue *txq = queue;
 
-   if (txq && (txq->fd > 0)) {
-   close(txq->fd);
-   txq->fd = -1;
+   if (txq && txq->fd && (*txq->fd > 0)) {
+   close(*txq->fd);
+   *txq->fd = -1;
}
 }
 
@@ -1214,13 +1215,13 @@ tap_setup_queue(struct rte_eth_dev *dev,
struct rte_gso_ctx *gso_ctx;
 
if (is_rx) {
-   fd = &rx->fd;
-   other_fd = &tx->fd;
+   fd = rx->fd;
+   other_fd = tx->fd;
dir = "rx";
gso_ctx = NULL;
} else {
-   fd = &tx->fd;
-   other_fd = &rx->fd;
+   fd = tx->fd;
+   other_fd = rx->fd;
dir = "tx";
gso_ctx = &tx->gso_ctx;
}
@@ -1331,7 +1332,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
}
 
TAP_LOG(DEBUG, "  RX TUNTAP device name %s, qid %d on fd %d",
-   internals->name, rx_queue_id, internals->rxq[rx_queue_id].fd);
+   internals->name, rx_queue_id, *internals->rxq[rx_queue_id].fd);
 
return 0;
 
@@ -1371,7 +1372,7 @@ tap_tx_queue_setup(struct rte_eth_dev *dev,
return -1;
TAP_LOG(DEBUG,
"  TX TUNTAP device name %s, qid %d on fd %d csum %s",
-   internals->name, tx_queue_id, internals->txq[tx_queue_id].fd,
+   internals->name, tx_queue_id, *internals->txq[tx_queue_id].fd,
txq->csum ? "on" : "off");
 
return 0;
@@ -1633,6 +1634,10 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, char *tap_name,
goto error_exit_nodev;
}
 
+   process_private = (struct pmd_process_private *)
+   rte_zmalloc_socket(tap_name, sizeof(struct pmd_process_private),
+   RTE_CACHE_LINE_SIZE, dev->device->numa_node);
+
pmd = dev->data->dev_private;
pmd->dev = dev;
snprintf(pmd->name, sizeof(pmd->name), "%s", tap_name);
@@ -1669,8 +1674,10 @@ eth_dev_tap_

[dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process

2018-10-02 Thread Raslan Darawsheh

In the case the device is created by the primary process,
the secondary must request some file descriptors to attach the queues.
The file descriptors are shared via IPC Unix socket.

Thanks to the IPC synchronization, the secondary process
is now able to do Rx/Tx on a TAP created by the primary process.
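
For readers unfamiliar with how a file descriptor crosses a Unix socket, here is a minimal sketch using plain POSIX calls (`sendmsg`/`recvmsg` with `SCM_RIGHTS`). It is not the DPDK IPC API and not the code of this patch; the helper names `send_fd` and `recv_fd` are made up for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected Unix socket as SCM_RIGHTS ancillary data. */
static int
send_fd(int sock, int fd)
{
	char byte = 'F'; /* one byte of ordinary data carries the ancillary part */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces correct alignment of buf */
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the kernel installs a new descriptor in the receiver. */
static int
recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The number returned by `recv_fd()` generally differs from the number the sender used, which is why the fds cannot simply be copied as integers through shared memory.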

Signed-off-by: Raslan Darawsheh 
Signed-off-by: Thomas Monjalon 

---
v2:
   - translate file descriptors via IPC API
   - add documentation
v3:
   - rebase the commit
   - use private static array for fd's to be local for each process

v4:
  - change strcpy to be strlcpy
  - remove the fixme and todo from comments.

---
---
 doc/guides/nics/tap.rst|  16 
 doc/guides/rel_notes/release_18_11.rst |   4 +
 drivers/net/tap/Makefile   |   1 +
 drivers/net/tap/rte_eth_tap.c  | 133 -
 4 files changed, 153 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
index 2714868..d1f3e1c 100644
--- a/doc/guides/nics/tap.rst
+++ b/doc/guides/nics/tap.rst
@@ -152,6 +152,22 @@ Distribute IPv4 TCP packets using RSS to a given MAC address over queues 0-3::
testpmd> flow create 0 priority 4 ingress pattern eth dst is 0a:0b:0c:0d:0e:0f \
 / ipv4 / tcp / end actions rss queues 0 1 2 3 end / end
 
+Multi-process sharing
+---------------------
+
+It is possible to attach an existing TAP device in a secondary process,
+by declaring it as a vdev with the same name as in the primary process,
+and without any parameter.
+
+The port attached in a secondary process will give access to the
+statistics and the queues.
+Therefore it can be used for monitoring or Rx/Tx processing.
+
+The IPC synchronization of Rx/Tx queues is currently limited:
+
+  - Only 8 queues
+  - Synchronized on probing, but not on later port update
+
 Example
 -------
 
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 8c4bb54..a9dda5b 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,6 +67,10 @@ New Features
   SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
   vdev_netvsc, tap, and failsafe drivers combination.
 
+* **Added TAP Rx/Tx queues sharing with a secondary process.**
+
+  A secondary process can attach a TAP device created in the primary process,
+  probe the queues, and process Rx/Tx in a secondary process.
 
 API Changes
 -----------
diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
index 3243365..7748283 100644
--- a/drivers/net/tap/Makefile
+++ b/drivers/net/tap/Makefile
@@ -22,6 +22,7 @@ CFLAGS += -O3
 CFLAGS += -I$(SRCDIR)
 CFLAGS += -I.
 CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash
 LDLIBS += -lrte_bus_vdev -lrte_gso
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 8cc4552..a751f76 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -16,6 +16,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -62,6 +64,9 @@
 #define TAP_GSO_MBUFS_NUM \
(TAP_GSO_MBUFS_PER_CORE * TAP_GSO_MBUF_CACHE_SIZE)
 
+/* IPC key for queue fds sync */
+#define TAP_MP_KEY "tap_mp_sync_queues"
+
 static struct rte_vdev_driver pmd_tap_drv;
 static struct rte_vdev_driver pmd_tun_drv;
 static struct pmd_process_private *process_private;
@@ -101,6 +106,17 @@ enum ioctl_mode {
REMOTE_ONLY,
 };
 
+/* Message header to synchronize queues via IPC */
+struct ipc_queues {
+   char port_name[RTE_DEV_NAME_MAX_LEN];
+   int rxq_count;
+   int txq_count;
+   /*
+* The file descriptors are in the dedicated part
+* of the Unix message to be translated by the kernel.
+*/
+};
+
 static int tap_intr_handle_set(struct rte_eth_dev *dev, int set);
 
 /**
@@ -1980,6 +1996,99 @@ rte_pmd_tun_probe(struct rte_vdev_device *dev)
return ret;
 }
 
+/* Request queue file descriptors from secondary to primary. */
+static int
+tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
+{
+   int ret;
+   struct timespec timeout = {.tv_sec = 1, .tv_nsec = 0};
+   struct rte_mp_msg request, *reply;
+   struct rte_mp_reply replies;
+   struct ipc_queues *request_param = (struct ipc_queues *)request.param;
+   struct ipc_queues *reply_param;
+   int queue, fd_iterator;
+
+   /* Prepare the request */
+   strlcpy(request.name, TAP_MP_KEY, sizeof(request.name));
+   strlcpy(request_param->port_name, port_name,
+   sizeof(request_param->port_name));
+   request.len_param = sizeof(*request_param);
+   /* Send request and receive reply */
+   ret = rte_mp_request_sync(&request, &replies, &timeout);
+   if (ret < 0) {
+   TAP_LOG(ERR, "Failed to request queues from primary: %d

[dpdk-dev] [PATCH v2] mbuf: clarify QINQ flag usage

2018-10-02 Thread Ferruh Yigit

Update implementation that when PKT_RX_QINQ_STRIPPED mbuf ol_flags
set by PMD, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED & PKT_RX_VLAN
should be also set.

Clarify mbuf documentations that when PKT_RX_QINQ set PKT_RX_VLAN also
should be set.

So that application can rely on PKT_RX_QINQ flag to access both
mbuf.vlan_tci & mbuf.vlan_tci_outer

Signed-off-by: Ferruh Yigit 
---
Cc: Hyong Youb Kim 
Cc: John Daley 
---
 app/test-pmd/rxonly.c| 6 +++---
 doc/guides/nics/features.rst | 7 ---
 drivers/net/i40e/i40e_rxtx.c | 3 ++-
 lib/librte_mbuf/rte_mbuf.c   | 1 +
 lib/librte_mbuf/rte_mbuf.h   | 5 +++--
 5 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index a93d80612..21b60062f 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -130,11 +130,11 @@ pkt_burst_receive(struct fwd_stream *fs)
}
if (ol_flags & PKT_RX_TIMESTAMP)
printf(" - timestamp %"PRIu64" ", mb->timestamp);
-   if (ol_flags & PKT_RX_VLAN_STRIPPED)
-   printf(" - VLAN tci=0x%x", mb->vlan_tci);
-   if (ol_flags & PKT_RX_QINQ_STRIPPED)
+   if (ol_flags & PKT_RX_QINQ)
printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
mb->vlan_tci, mb->vlan_tci_outer);
+   else if (ol_flags & PKT_RX_VLAN_STRIPPED)
+   printf(" - VLAN tci=0x%x", mb->vlan_tci);
if (mb->packet_type) {
rte_get_ptype_name(mb->packet_type, buf, sizeof(buf));
printf(" - hw ptype: %s", buf);
diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index b085bda86..c0cbe3784 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -528,7 +528,7 @@ Supports VLAN offload to hardware.
 * **[uses]   rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_VLAN_STRIP,DEV_RX_OFFLOAD_VLAN_FILTER,DEV_RX_OFFLOAD_VLAN_EXTEND``.
 * **[uses]   rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_VLAN_INSERT``.
 * **[implements] eth_dev_ops**: ``vlan_offload_set``.
-* **[provides]   mbuf**: ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.vlan_tci``.
+* **[provides]   mbuf**: ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.ol_flags:PKT_RX_VLAN`` ``mbuf.vlan_tci``.
 * **[provides]   rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_VLAN_STRIP``,
   ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_VLAN_INSERT``.
 * **[related]API**: ``rte_eth_dev_set_vlan_offload()``,
@@ -545,8 +545,9 @@ Supports QinQ (queue in queue) offload.
 * **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_QINQ_STRIP``.
 * **[uses] rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_QINQ_INSERT``.
 * **[uses] mbuf**: ``mbuf.ol_flags:PKT_TX_QINQ_PKT``.
-* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_QINQ_STRIPPED``, ``mbuf.vlan_tci``,
-   ``mbuf.vlan_tci_outer``.
+* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_QINQ_STRIPPED``, ``mbuf.ol_flags:PKT_RX_QINQ``,
+  ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.ol_flags:PKT_RX_VLAN``
+  ``mbuf.vlan_tci``, ``mbuf.vlan_tci_outer``.
 * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_QINQ_STRIP``,
   ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_QINQ_INSERT``.
 
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7c986d535..b2819f757 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -83,7 +83,8 @@ i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp)
 #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC
if (rte_le_to_cpu_16(rxdp->wb.qword2.ext_status) &
(1 << I40E_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) {
-   mb->ol_flags |= PKT_RX_QINQ_STRIPPED;
+   mb->ol_flags |= PKT_RX_QINQ_STRIPPED | PKT_RX_QINQ |
+   PKT_RX_VLAN_STRIPPED | PKT_RX_VLAN;
mb->vlan_tci_outer = mb->vlan_tci;
mb->vlan_tci = rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_2);
PMD_RX_LOG(DEBUG, "Descriptor l2tag2_1: %u, l2tag2_2: %u",
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index e714c5a59..05a5a17fe 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -297,6 +297,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
case PKT_RX_QINQ_STRIPPED: return "PKT_RX_QINQ_STRIPPED";
+   case PKT_RX_QINQ: return "PKT_RX_QINQ";
case PKT_RX_LRO: return "PKT_RX_LRO";
case PKT_RX_TIMESTAMP: return "PKT_RX_TIMESTAMP";
case PKT_RX_SEC_OFFLOAD: return "PKT_RX_SEC_OFFLOAD";
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
in

Re: [dpdk-dev] [PATCH v2] mbuf: clarify QINQ flag usage

2018-10-02 Thread Andrew Rybchenko


On 10/2/18 2:36 PM, Ferruh Yigit wrote:

Update implementation that when PKT_RX_QINQ_STRIPPED mbuf ol_flags
set by PMD, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED & PKT_RX_VLAN
should be also set.

Clarify mbuf documentations that when PKT_RX_QINQ set PKT_RX_VLAN also
should be set.

So that application can rely on PKT_RX_QINQ flag to access both
mbuf.vlan_tci & mbuf.vlan_tci_outer

Signed-off-by: Ferruh Yigit 


Reviewed-by: Andrew Rybchenko
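
To illustrate the flag contract described in the commit message, here is a minimal, self-contained sketch. The bit values and the `struct pkt` type are stand-ins, not the real `rte_mbuf.h` definitions, and the helper names are invented for the example; only the flag relationships and the check order are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in bit values, NOT the real rte_mbuf.h definitions. */
#define RX_VLAN          (1ULL << 0)
#define RX_VLAN_STRIPPED (1ULL << 1)
#define RX_QINQ_STRIPPED (1ULL << 2)
#define RX_QINQ          (1ULL << 3)

/* Stand-in for the few mbuf fields involved. */
struct pkt {
	uint64_t ol_flags;
	uint16_t vlan_tci;
	uint16_t vlan_tci_outer;
};

/* PMD side: a stripped QinQ packet sets all four flags, as in the i40e hunk. */
static void
pmd_mark_qinq_stripped(struct pkt *p, uint16_t tci, uint16_t tci_outer)
{
	p->ol_flags |= RX_QINQ_STRIPPED | RX_QINQ |
		       RX_VLAN_STRIPPED | RX_VLAN;
	p->vlan_tci = tci;
	p->vlan_tci_outer = tci_outer;
}

/*
 * Application side: same check order as the rxonly.c hunk. QinQ is tested
 * first because a QinQ packet also carries the VLAN flags.
 * Returns the number of VLAN tags described (2, 1 or 0).
 */
static int
describe_vlan(const struct pkt *p, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (p->ol_flags & RX_QINQ) {
		snprintf(buf, len, "QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
			 p->vlan_tci, p->vlan_tci_outer);
		return 2;
	}
	if (p->ol_flags & RX_VLAN_STRIPPED) {
		snprintf(buf, len, "VLAN tci=0x%x", p->vlan_tci);
		return 1;
	}
	buf[0] = '\0';
	return 0;
}
```

With the checks in the opposite order and no `else`, a QinQ packet would be reported twice, once as VLAN and once as QinQ.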

Re: [dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process

2018-10-02 Thread Thomas Monjalon

02/10/2018 12:34, Raslan Darawsheh:
> @@ -2056,6 +2179,13 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev)
> TAP_LOG(NOTICE, "Initializing pmd_tap for %s as %s",
> name, tap_name);
>  
> +   /* Register IPC feed callback */
> +   ret = rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues);
> +   if (ret < 0 && rte_errno != EEXIST) {
> +   TAP_LOG(ERR, "%s: Failed to register IPC callback: %s",
> +   tuntap_name, strerror(rte_errno));
> +   goto leave;
> +   }
> ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac,
> ETH_TUNTAP_TYPE_TAP);

Is it an issue registering tap_mp_sync_queues at each tap probing?
Should we do it only once?

Re: [dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process

2018-10-02 Thread Thomas Monjalon

02/10/2018 12:34, Raslan Darawsheh:
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -67,6 +67,10 @@ New Features
>SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
>vdev_netvsc, tap, and failsafe drivers combination.
>  
> +* **Added TAP Rx/Tx queues sharing with a secondary process.**
> +
> +  A secondary process can attach a TAP device created in the primary process,
> +  probe the queues, and process Rx/Tx in a secondary process.

A blank line is missing here.

> @@ -2006,9 +2115,23 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev)
>   TAP_LOG(ERR, "Failed to probe %s", name);
>   return -1;
>   }
> - /* TODO: request info from primary to set up Rx and Tx */
>   eth_dev->dev_ops = &ops;
>   eth_dev->device = &dev->device;
> + eth_dev->rx_pkt_burst = pmd_rx_burst;
> + eth_dev->tx_pkt_burst = pmd_tx_burst;
> + if (!rte_eal_primary_proc_alive(NULL)) {
> + TAP_LOG(ERR, "Primary process is missing");
> + return -1;
> + }
> + process_private = (struct pmd_process_private *)
> + rte_zmalloc_socket(name,
> + sizeof(struct pmd_process_private),
> + RTE_CACHE_LINE_SIZE,
> + eth_dev->device->numa_node);
> +
> + ret = tap_mp_attach_queues(name, eth_dev);
> + if (ret != 0)
> + return -1;
>   rte_eth_dev_probing_finish(eth_dev);
>   return 0;
>   }

Should we manage rte_pmd_tun_probe too?

Re: [dpdk-dev] [PATCH v3] ethdev: get rxq interrupt fd

2018-10-02 Thread Ferruh Yigit

On 9/29/2018 3:12 AM, Xiaoyun Li wrote:
> Some users want to use their own epoll instances to control both
> DPDK rxq interrupt fds and their own other fds. So added a function
> to get rxq interrupt fd based on port id and queue id.
> 
> Signed-off-by: Xiaoyun Li 

Reviewed-by: Ferruh Yigit
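
As an illustration of the use case in the commit message, the sketch below puts two fds into one application-owned epoll instance. It is Linux-only and uses an `eventfd` as a stand-in for the rxq interrupt fd; the function added by the patch is not called here, and `watch_fd`/`wait_one` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Add fd to an application-owned epoll instance; tag is returned on wakeup. */
static int
watch_fd(int epfd, int fd, uint32_t tag)
{
	struct epoll_event ev = { .events = EPOLLIN, .data = { .u32 = tag } };

	assert(epfd >= 0 && fd >= 0);
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms; return the tag of one ready fd, or -1 if none. */
static int
wait_one(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	return n == 1 ? (int)ev.data.u32 : -1;
}
```

An application would register the fd obtained for a given port/queue pair next to its own sockets or timers and dispatch on the tag.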

Re: [dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process

2018-10-02 Thread Raslan Darawsheh

It should be done per device, so we should register for each port separately, 
since different ports can have different queues.

Moreover, if the port that holds the registration were closed or unplugged, we 
would not be able to sync queues for other ports.

Kindest regards,
Raslan Darawsheh

-Original Message-
From: Thomas Monjalon  
Sent: Tuesday, October 2, 2018 1:41 PM
To: Raslan Darawsheh 
Cc: dev@dpdk.org; keith.wi...@intel.com; Shahaf Shuler ; 
Ori Kam 
Subject: Re: [dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process

02/10/2018 12:34, Raslan Darawsheh:
> @@ -2056,6 +2179,13 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev)
> TAP_LOG(NOTICE, "Initializing pmd_tap for %s as %s",
> name, tap_name);
>  
> +   /* Register IPC feed callback */
> +   ret = rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues);
> +   if (ret < 0 && rte_errno != EEXIST) {
> +   TAP_LOG(ERR, "%s: Failed to register IPC callback: %s",
> +   tuntap_name, strerror(rte_errno));
> +   goto leave;
> +   }
> ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac,
> ETH_TUNTAP_TYPE_TAP);

Is it an issue registering tap_mp_sync_queues at each tap probing?
Should we do it only once?

[dpdk-dev] [PATCH v2 2/2] mbuf: fix Tx offload mask

2018-10-02 Thread Jerin Jacob

Add the missing PKT_TX_UDP_SEG, PKT_TX_OUTER_IPV6, PKT_TX_OUTER_IPV4,
PKT_TX_IPV6 and PKT_TX_IPV4 values to PKT_TX_OFFLOAD_MASK.

Also sort them in bitwise order so that missing items are easier to spot later.
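
A short sketch of why a flag missing from the mask matters. It assumes the common driver pattern of deriving a "not supported" mask by XOR-ing the full Tx mask with the driver's supported set; that usage is an assumption of this example, not something stated in the patch. The bit values and names are stand-ins, not the real `PKT_TX_*` definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in bits, NOT the real PKT_TX_* values. */
#define TX_IP_CKSUM (1ULL << 0)
#define TX_TCP_SEG  (1ULL << 1)
#define TX_UDP_SEG  (1ULL << 2)
#define TX_IPV4     (1ULL << 3)

/* Complete mask of all known Tx offload flags in this toy model. */
#define TX_OFFLOAD_MASK (TX_IP_CKSUM | TX_TCP_SEG | TX_UDP_SEG | TX_IPV4)

/*
 * Return 1 if ol_flags only requests offloads the driver supports.
 * all_mask is the global mask; supported must be a subset of it.
 */
static int
tx_flags_ok(uint64_t ol_flags, uint64_t all_mask, uint64_t supported)
{
	uint64_t notsup;

	assert((supported & ~all_mask) == 0);
	notsup = all_mask ^ supported; /* known but unsupported flags */
	return (ol_flags & notsup) == 0;
}
```

If `TX_UDP_SEG` is left out of the global mask, it also drops out of `notsup`, so a packet requesting it passes the check even though the driver cannot handle it.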

Fixes: 6d18505efaa6 ("vhost: support UDP Fragmentation Offload")
Fixes: 1c3b7c33e977 ("mbuf: add Tx offloading flags for tunnels")
Fixes: 711ba9e23e68 ("mbuf: remove aliasing of Tx offloading flags with Rx ones")
Cc: sta...@dpdk.org
Cc: jiayu...@intel.com

Signed-off-by: Jerin Jacob 
---
v2:
- Add all missing PKT_TX_ types
- Sort them in bit mask order(Ferruh Yigit)
---
 lib/librte_mbuf/rte_mbuf.h | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index a50b05c64..c8ebc3230 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -334,16 +334,21 @@ extern "C" {
  * which can be set for packet.
  */
 #define PKT_TX_OFFLOAD_MASK (\
+   PKT_TX_OUTER_IPV6 |  \
+   PKT_TX_OUTER_IPV4 |  \
+   PKT_TX_OUTER_IP_CKSUM |  \
+   PKT_TX_VLAN_PKT |\
+   PKT_TX_IPV6 |\
+   PKT_TX_IPV4 |\
PKT_TX_IP_CKSUM |\
PKT_TX_L4_MASK | \
-   PKT_TX_OUTER_IP_CKSUM |  \
-   PKT_TX_TCP_SEG | \
PKT_TX_IEEE1588_TMST |   \
+   PKT_TX_TCP_SEG | \
PKT_TX_QINQ_PKT |\
-   PKT_TX_VLAN_PKT |\
PKT_TX_TUNNEL_MASK | \
PKT_TX_MACSEC |  \
-   PKT_TX_SEC_OFFLOAD)
+   PKT_TX_SEC_OFFLOAD |\
+   PKT_TX_UDP_SEG)
 
 /**
  * Mbuf having an external buffer attached. shinfo in mbuf must be filled.
-- 
2.19.0

[dpdk-dev] [PATCH v2 1/2] ethdev: add SCTP Rx checksum offload support

2018-10-02 Thread Jerin Jacob

Added SCTP Rx checksum offload support

Signed-off-by: Jerin Jacob 
---
v2:
- Fix printf formatting error(Ferruh Yigit)
---
 app/test-pmd/config.c  | 9 +
 doc/guides/nics/features.rst   | 4 ++--
 lib/librte_ethdev/rte_ethdev.c | 1 +
 lib/librte_ethdev/rte_ethdev.h | 1 +
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 794aa5268..1adc9b94b 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -576,6 +576,15 @@ port_offload_cap_display(portid_t port_id)
printf("off\n");
}
 
+   if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_SCTP_CKSUM) {
+   printf("RX SCTP checksum:  ");
+   if (ports[port_id].dev_conf.rxmode.offloads &
+   DEV_RX_OFFLOAD_SCTP_CKSUM)
+   printf("on\n");
+   else
+   printf("off\n");
+   }
+
if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM) {
printf("RX Outer IPv4 checksum:");
if (ports[port_id].dev_conf.rxmode.offloads &
diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index b085bda86..d42489b6d 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -576,7 +576,7 @@ L4 checksum offload
 
 Supports L4 checksum offload.
 
-* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM``.
+* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM,DEV_RX_OFFLOAD_SCTP_CKSUM``.
 * **[uses] rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_UDP_CKSUM,DEV_TX_OFFLOAD_TCP_CKSUM,DEV_TX_OFFLOAD_SCTP_CKSUM``.
 * **[uses] mbuf**: ``mbuf.ol_flags:PKT_TX_IPV4`` | ``PKT_TX_IPV6``,
   ``mbuf.ol_flags:PKT_TX_L4_NO_CKSUM`` | ``PKT_TX_TCP_CKSUM`` |
@@ -584,7 +584,7 @@ Supports L4 checksum offload.
 * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_L4_CKSUM_UNKNOWN`` |
   ``PKT_RX_L4_CKSUM_BAD`` | ``PKT_RX_L4_CKSUM_GOOD`` |
   ``PKT_RX_L4_CKSUM_NONE``.
-* **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM``,
+* **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM,DEV_RX_OFFLOAD_SCTP_CKSUM``,
   ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_UDP_CKSUM,DEV_TX_OFFLOAD_TCP_CKSUM,DEV_TX_OFFLOAD_SCTP_CKSUM``.
 
 .. _nic_features_hw_timestamp:
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index ef99f7068..e9a82fe7f 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -126,6 +126,7 @@ static const struct {
RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
RTE_RX_OFFLOAD_BIT2STR(SECURITY),
RTE_RX_OFFLOAD_BIT2STR(KEEP_CRC),
+   RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
 };
 
 #undef RTE_RX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 012577b0a..d02db14ad 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -888,6 +888,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_TIMESTAMP   0x00004000
 #define DEV_RX_OFFLOAD_SECURITY 0x00008000
 #define DEV_RX_OFFLOAD_KEEP_CRC0x00010000
+#define DEV_RX_OFFLOAD_SCTP_CKSUM  0x00020000
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 DEV_RX_OFFLOAD_UDP_CKSUM | \
-- 
2.19.0

Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option

2018-10-02 Thread Marco Varlese

On Mon, 2018-10-01 at 12:24 +0100, Luca Boccassi wrote:
> On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote:
> > On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote:
> > > On Mon, 01 Oct 2018 10:46:02 +0100
> > > Luca Boccassi  wrote:
> > > 
> > > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson
> > > > > wrote:  
> > > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi
> > > > > > wrote:  
> > > > > > > Allow users and packagers to override the default
> > > > > > > dpdk/drivers
> > > > > > > subdirectory where the PMDs get installed under $lib.
> > > > > > > 
> > > > > > > Signed-off-by: Luca Boccassi 
> > > > > > > ---  
> > > > > > 
> > > > > > I'm ok with this change, but what is the current location
> > > > > > used by
> > > > > > distro's
> > > > > > right now? I mistakenly never checked what was done before I
> > > > > > used
> > > > > > dpdk/drivers as a default value, and I'd like the default to
> > > > > > match
> > > > > > the
> > > > > > common option if possible.
> > > > > > 
> > > > > > /Bruce
> > > > > >   
> > > > > 
> > > > > Replying to my own question, I've just checked on CentOS and
> > > > > Debian,
> > > > > and it
> > > > > appears both are using directory "dpdk-pmds" as the subdir
> > > > > name.
> > > > > Therefore,
> > > > > let's just make that the default. [Does it need to be
> > > > > configurable in
> > > > > that
> > > > > case?]
> > > > > 
> > > > > /Bruce  
> > > > 
> > > > If the default is the one I expect then I'm fine without having
> > > > an
> > > > option (actually happier - less things to configure).
> > > > 
> > > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last
> > > > January :-)
> > > > We changed because using a single directory creates problems when
> > > > multiple different ABI versions are installed, due to the EAL
> > > > autoload
> > > > from that directory. So we need a different subdirectory per ABI
> > > > revision.
> > > > 
> > > > We were actually talking with Timothy a while ago to make this
> > > > consistent across our distros, and perhaps Marco can chip in as
> > > > well.
> > > > 
> > > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm
> > > > not
> > > > too fussy on $something, it can be drivers or pmds or something
> > > > else.
> > > > 
> > > 
> > > LGTM.
> > > If needed, we can just do a compatibility symlink using the current
> > > dpdk-pmds path
> > > 
> > 
> > One suggestion/comment. Would using a unique directory per release
> > not lead
> > to clobbering up the lib directory unnecessarily? How about having a
> > single
> > "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a
> > subdir
> > under that?
> > 
> > E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
> > dpdk-pmds/18.11
> > 
> > [The former of the above would be my preference, since I don't like
> > having
> > hypenated names, and like having "dpdk" alone as a folder name :-)]
> > 
> > /Bruce
> 
> dpdk/pmds-XX.YY/ would work for me. Timothy and Marco?
That would work for us.
However, I would suggest keeping the path configurable (a feature that could be
dropped in, say, the next release), just to make sure the transition can happen
painlessly in the unlikely event that something goes wrong with packaging.
> 
-- 
Marco V

SUSE LINUX GmbH | GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg) Maxfeldstr. 5, D-90409, Nürnberg

Re: [dpdk-dev] [PATCH v8] app/testpmd: add noisy neighbour forwarding mode

2018-10-02 Thread Iremonger, Bernard

Hi Jens,

> -Original Message-
> From: Jens Freimann [mailto:jfreim...@redhat.com]
> Sent: Tuesday, October 2, 2018 8:44 AM
> To: dev@dpdk.org
> Cc: ai...@redhat.com; jan.scheur...@ericsson.com; Richardson, Bruce
> ; tho...@monjalon.net;
> maxime.coque...@redhat.com; Ananyev, Konstantin
> ; Yigit, Ferruh ;
> Iremonger, Bernard ; ktray...@redhat.com
> Subject: [PATCH v8] app/testpmd: add noisy neighbour forwarding mode
> 


> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
> index 9220e1c1b..97e0dfa49 100644
> --- a/app/test-pmd/parameters.c
> +++ b/app/test-pmd/parameters.c

The usage() function needs to be updated with the noisy information after line 192.
 
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index 3a73000a6..99a005a0c 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst


> -Example::
> +* ``noisy``: Noisy neighbour simulation.
> +  Simulate more realistic behavior of a guest machine engaged in receiving
> +  and sending packets performing Virtual Network Function (VNF).
> 
> +Example::
A line has been deleted after the above line; it should be restored to correct 
the formatting in the HTML output.

> testpmd> set fwd rxonly
> 
> Set rxonly packet forwarding mode
> --
> 2.17.1

Regards,

Bernard.

Re: [dpdk-dev] [PATCH v4 2/2] net/tap: add queues when attaching from secondary process

2018-10-02 Thread Thomas Monjalon

02/10/2018 12:50, Raslan Darawsheh:
> From: Thomas Monjalon  
> > 02/10/2018 12:34, Raslan Darawsheh:
> > > @@ -2056,6 +2179,13 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev)
> > > 
> > > TAP_LOG(NOTICE, "Initializing pmd_tap for %s as %s",
> > > 
> > > name, tap_name);
> > > 
> > > +   /* Register IPC feed callback */
> > > +   ret = rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues);
> > > +   if (ret < 0 && rte_errno != EEXIST) {
> > > +   TAP_LOG(ERR, "%s: Failed to register IPC callback: %s",
> > > +   tuntap_name, strerror(rte_errno));
> > > +   goto leave;
> > > +   }
> > > 
> > > ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac,
> > > 
> > > ETH_TUNTAP_TYPE_TAP);
> > 
> > Is it an issue registering tap_mp_sync_queues at each tap probing?
> > Should we do it only once?
> 
> It should be as of per device so we should do it for each port alone since 
> several ports can have different queues.
> 
> Moreover, if the port that has the registration was closed or unplugged we'll 
> not be able to sync queues for other ports. 

I think we should register on the first tap device probing and never unregister.

Ferruh, any opinion?
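
For reference, the behaviour under discussion can be modelled with a small sketch. The registry below is a hypothetical stand-in for the IPC action table (it is not `rte_mp_action_register()` itself); it only mimics the "duplicate key fails with EEXIST" behaviour that the quoted hunk relies on, and uses plain `errno` where the patch checks `rte_errno`.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for an IPC action table holding a single key. */
typedef int (*action_t)(const void *msg);
static const char *registered_key;
static action_t registered_action;

/* Register a callback; a duplicate key fails with errno set to EEXIST. */
static int
action_register(const char *key, action_t action)
{
	assert(key != NULL && action != NULL);
	if (registered_key != NULL && strcmp(registered_key, key) == 0) {
		errno = EEXIST;
		return -1;
	}
	registered_key = key;
	registered_action = action;
	return 0;
}

/* Placeholder callback; a real one would look up the port named in msg. */
static int
sync_queues(const void *msg)
{
	(void)msg;
	return 0;
}

/* Probe path: tolerating EEXIST makes every probe after the first a no-op. */
static int
probe_register(void)
{
	int ret = action_register("tap_mp_sync_queues", sync_queues);

	if (ret < 0 && errno != EEXIST)
		return -1;
	return 0;
}
```

In this model the callback is registered by whichever port probes first and stays in place afterwards. Since the request in the patch carries the port name (`ipc_queues.port_name`), one shared callback could in principle serve several ports.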

Re: [dpdk-dev] [PATCH v6 1/8] net/mvneta: add neta PMD skeleton

2018-10-02 Thread Ferruh Yigit

On 10/1/2018 10:26 AM, Andrzej Ostruszka wrote:
> From: Zyta Szpak 
> 
> Add neta pmd driver skeleton providing base for the further
> development.
> 
> Signed-off-by: Natalie Samsonov 
> Signed-off-by: Yelena Krivosheev 
> Signed-off-by: Dmitri Epshtein 
> Signed-off-by: Zyta Szpak 
> Signed-off-by: Andrzej Ostruszka 
> ---
>  MAINTAINERS   |   8 +
>  config/common_base|   5 +
>  devtools/test-build.sh|   2 +
>  doc/guides/nics/features/mvneta.ini   |  11 +
>  doc/guides/nics/mvneta.rst| 152 +++

dpdk/doc/guides/nics/mvneta.rst: WARNING: document isn't included in any toctree

Please add document to doc/guides/nics/index.rst

<...>

> +Config File Options
> +-------------------
> +
> +The following options can be modified in the ``config`` file.
> +
> +- ``CONFIG_RTE_LIBRTE_MVNETA_PMD`` (default ``n``)
> +
> +Toggle compilation of the librte_pmd_mvneta driver.
> +

It would be good to have another section documenting "Runtime options" (iface).

> +
> +Usage example
> +^^^^^^^^^^^^^
> +
> +.. code-block:: console
> +
> +   ./testpmd --vdev=net_mvneta,iface=eth0,iface=eth1 \
> +-c 3 -- -i --p 3 -a
> +
> +
> +Building DPDK
> +-------------
> +
> +Driver needs precompiled MUSDK library during compilation.
> +
> +.. code-block:: console
> +
> +   export CROSS_COMPILE=/bin/aarch64-linux-gnu-
> +   ./bootstrap
> +   ./configure --host=aarch64-linux-gnu --enable-bpool-dma=64

getting "configure: WARNING: unrecognized options: --enable-bpool-dma"

Is this config option still valid for 18.09?

<...>

> +
> +static int mvneta_dev_num;
> +static int mvneta_lcore_first;
> +static int mvneta_lcore_last;

These static variables seem to be assigned but never used; can you please check?


<...>

> +
> +RTE_PMD_REGISTER_VDEV(net_mvneta, pmd_mvneta_drv);

Need to document supported devargs with RTE_PMD_REGISTER_PARAM_STRING

<...>

> +struct mvneta_priv {
> + /* Hot fields, used in fast path. */
> + struct neta_ppio*ppio;/**< Port handler pointer */
> +
> + uint8_t pp_id;
> + uint8_t ppio_id;/* ppio port id */
> + uint8_t uc_mc_flushed;
> + uint8_t multiseg;
> +
> + struct neta_ppio_params ppio_params;
> + uint16_t nb_rx_queues;

Do you need this private variable? Isn't it a duplicate of 
"dev->data->nb_rx_queues"?
And as far as I can see, "dev->data->nb_rx_queues" is the one used.

Re: [dpdk-dev] [PATCH v6 2/8] net/mvneta: add Rx/Tx support

2018-10-02 Thread Ferruh Yigit

On 10/1/2018 10:26 AM, Andrzej Ostruszka wrote:
> From: Zyta Szpak 
> 
> Add part of PMD for actual reception/transmission.
> 
> Signed-off-by: Yelena Krivosheev 
> Signed-off-by: Dmitri Epshtein 
> Signed-off-by: Zyta Szpak 

<...>

> @@ -0,0 +1,850 @@
> +#include "mvneta_rxtx.h"
> +
> +uint64_t cookie_addr_high = MVNETA_COOKIE_ADDR_INVALID;
> +uint16_t rx_desc_free_thresh = MRVL_NETA_BUF_RELEASE_BURST_SIZE_MIN;

Can these global variables be static?
If not please add a mvneta_ prefix, but please try to make them static.

Re: [dpdk-dev] [PATCH v8] app/testpmd: add noisy neighbour forwarding mode

2018-10-02 Thread Kevin Traynor

On 10/02/2018 08:44 AM, Jens Freimann wrote:
> This adds a new forwarding mode to testpmd to simulate
> more realistic behavior of a guest machine engaged in receiving
> and sending packets performing Virtual Network Function (VNF).
> 



As there's going to be a v9 anyway, you can also fix the below error
messages to be '>= 0'

> + if (!strcmp(lgopts[opt_idx].name,
> + "noisy-lkup-memory")) {
> + n = atoi(optarg);
> + if (n >= 0)
> + noisy_lkup_mem_sz = n;
> + else
> + rte_exit(EXIT_FAILURE,
> +  "noisy-lkup-memory must be > 0\n");
> + }
> + if (!strcmp(lgopts[opt_idx].name,
> + "noisy-lkup-num-writes")) {
> + n = atoi(optarg);
> + if (n >= 0)
> + noisy_lkup_num_writes = n;
> + else
> + rte_exit(EXIT_FAILURE,
> +  "noisy-lkup-num-writes must be > 0\n");
> + }
> + if (!strcmp(lgopts[opt_idx].name,
> + "noisy-lkup-num-reads")) {
> + n = atoi(optarg);
> + if (n >= 0)
> + noisy_lkup_num_reads = n;
> + else
> + rte_exit(EXIT_FAILURE,
> +  "noisy-lkup-num-reads must be > 0\n");
> + }
> + if (!strcmp(lgopts[opt_idx].name,
> + "noisy-lkup-num-reads-writes")) {
> + n = atoi(optarg);
> + if (n >= 0)
> + noisy_lkup_num_reads_writes = n;
> + else
> + rte_exit(EXIT_FAILURE,
> +  "noisy-lkup-num-reads-writes must be > 0\n");
> + }
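
For illustration, a minimal sketch of a check whose message matches its
condition. It assumes a hypothetical helper, parse_nonneg_opt(), that is
not part of testpmd (the patch calls atoi() inline, as quoted above);
strtol() is used here only so that non-numeric input is rejected as well:

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of testpmd: parse a decimal option value
 * that must be >= 0. On success, store the value in *out and return 0.
 * On failure, print a message naming the option, leave *out untouched
 * and return -1.
 */
int
parse_nonneg_opt(const char *name, const char *arg, int *out)
{
	char *end = NULL;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	/* Reject overflow, empty input, trailing junk and negative values. */
	if (errno != 0 || end == arg || *end != '\0' || v < 0 || v > INT_MAX) {
		fprintf(stderr, "%s must be >= 0\n", name);
		return -1;
	}
	*out = (int)v;
	return 0;
}
```

Each of the four options could then be handled by one call such as
parse_nonneg_opt("noisy-lkup-memory", optarg, &n), with the caller
deciding whether a failure ends in rte_exit().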

Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option

2018-10-02 Thread Bruce Richardson

On Tue, Oct 02, 2018 at 01:02:26PM +0200, Marco Varlese wrote:
> On Mon, 2018-10-01 at 12:24 +0100, Luca Boccassi wrote:
> > On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote:
> > > On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote:
> > > > On Mon, 01 Oct 2018 10:46:02 +0100
> > > > Luca Boccassi  wrote:
> > > > 
> > > > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson
> > > > > > wrote:  
> > > > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi
> > > > > > > wrote:  
> > > > > > > > Allow users and packagers to override the default
> > > > > > > > dpdk/drivers
> > > > > > > > subdirectory where the PMDs get installed under $lib.
> > > > > > > > 
> > > > > > > > Signed-off-by: Luca Boccassi 
> > > > > > > > ---  
> > > > > > > 
> > > > > > > I'm ok with this change, but what is the current location
> > > > > > > used by
> > > > > > > distro's
> > > > > > > right now? I mistakenly never checked what was done before I
> > > > > > > used
> > > > > > > dpdk/drivers as a default value, and I'd like the default to
> > > > > > > match
> > > > > > > the
> > > > > > > common option if possible.
> > > > > > > 
> > > > > > > /Bruce
> > > > > > >   
> > > > > > 
> > > > > > Replying to my own question, I've just checked on CentOS and
> > > > > > Debian,
> > > > > > and it
> > > > > > appears both are using directory "dpdk-pmds" as the subdir
> > > > > > name.
> > > > > > Therefore,
> > > > > > let's just make that the default. [Does it need to be
> > > > > > configurable in
> > > > > > that
> > > > > > case?]
> > > > > > 
> > > > > > /Bruce  
> > > > > 
> > > > > If the default is the one I expect then I'm fine without having
> > > > > an
> > > > > option (actually happier - less things to configure).
> > > > > 
> > > > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last
> > > > > January :-)
> > > > > We changed because using a single directory creates problems when
> > > > > multiple different ABI versions are installed, due to the EAL
> > > > > autoload
> > > > > from that directory. So we need a different subdirectory per ABI
> > > > > revision.
> > > > > 
> > > > > We were actually talking with Timothy a while ago to make this
> > > > > consistent across our distros, and perhaps Marco can chip in as
> > > > > well.
> > > > > 
> > > > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm
> > > > > not
> > > > > too fussy on $something, it can be drivers or pmds or something
> > > > > else.
> > > > > 
> > > > 
> > > > LGTM.
> > > > If needed, we can just do a compatibility symlink using the current
> > > > dpdk-pmds path
> > > > 
> > > 
> > > One suggestion/comment. Would using a unique directory per release
> > > not lead
> > > to clobbering up the lib directory unnecessarily? How about having a
> > > single
> > > "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a
> > > subdir
> > > under that?
> > > 
> > > E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
> > > dpdk-pmds/18.11
> > > 
> > > [The former of the above would be my preference, since I don't like
> > > having
> > > hyphenated names, and like having "dpdk" alone as a folder name :-)]
> > > 
> > > /Bruce
> > 
> > dpdk/pmds-XX.YY/ would work for me. Timothy and Marco?
> That would work for us.
> However, I would suggest to have the path to be configurable (feature to be
> dropped in maybe next release). Just to make sure the transition can happen
> without pain in the remote circumstance that something goes wrong with
> packaging...
> > 
> -- 
> Marco V
> 
Yes, I think it needs to be configurable for the foreseeable future. If the
DPDK version is to be put in the path then we either need to always use a
configurable version, since we can't hardcode a version number in the
default, or else we need to put logic in the meson.build file to always
insert a version number.

/Bruce
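
To make the "always insert a version number" option concrete, here is a
small sketch, written in C purely for illustration, of composing such a
default subdirectory. The helper name format_pmd_subdir() and its
parameters are hypothetical; the real logic would live in meson.build:

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the versioned driver subdirectory
 * "dpdk/pmds-YY.MM" into buf. Returns 0 on success, -1 if the result
 * does not fit in len bytes (including the terminating NUL).
 */
int
format_pmd_subdir(char *buf, size_t len, unsigned int year, unsigned int month)
{
	int n = snprintf(buf, len, "dpdk/pmds-%02u.%02u", year, month);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With year 18 and month 11 this yields "dpdk/pmds-18.11", the layout
suggested earlier in the thread.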

[dpdk-dev] [PATCH v12 1/7] bus: add hot-unplug handler

2018-10-02 Thread Jeff Guo

When a device is hot-unplugged while the application is still reading
from or writing to its BARs, which are already invalid but have not yet
been unmapped or released, the access fails and the application can
crash.

This patch introduces bus ops to handle hot-unplug failures. Each
bus can implement its own case-dependent logic to handle the failures.

Signed-off-by: Jeff Guo 
---
v12->v11:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h 
b/lib/librte_eal/common/include/rte_bus.h
index b7b5b08..1bb53dc 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a specific hot-unplug handler, which is responsible for
+ * handle the failure when device be hot-unplugged. When the event of
+ * hot-unplug be detected, it could call this function to handle
+ * the hot-unplug failure and avoid app crash.
+ * @param dev
+ * Pointer of the device structure.
+ *
+ * @return
+ * 0 on success.
+ * !0 on error.
+ */
+typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -212,6 +226,8 @@ struct rte_bus {
struct rte_bus_conf conf;/**< Bus configuration */
rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
rte_dev_iterate_t dev_iterate; /**< Device iterator. */
+   rte_bus_hot_unplug_handler_t hot_unplug_handler;
+   /**< handle hot-unplug failure on the bus */
 };
 
 /**
-- 
2.7.4

[dpdk-dev] [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops

2018-10-02 Thread Jeff Guo

This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it avoids BAR read/write errors by creating new dummy
memory and remapping it over the memory where the failure occurred. For
VFIO or other kernel drivers, a specific function can be implemented to
handle hot-unplug case by case.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v12->v11:
no change.
---
 drivers/bus/pci/pci_common.c | 28 
 drivers/bus/pci/pci_common_uio.c | 33 +
 drivers/bus/pci/private.h| 12 
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7736b3f..d286234 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -406,6 +406,33 @@ pci_find_device(const struct rte_device *start, 
rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hot_unplug_handler(struct rte_device *dev)
+{
+   struct rte_pci_device *pdev = NULL;
+   int ret = 0;
+
+   pdev = RTE_DEV_TO_PCI(dev);
+   if (!pdev)
+   return -1;
+
+   switch (pdev->kdrv) {
+   case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_UIO_GENERIC:
+   case RTE_KDRV_NIC_UIO:
+   /* BARs resource is invalid, remap it to be safe. */
+   ret = pci_uio_remap_resource(pdev);
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL,
+   "Not managed by a supported kernel driver, skipped\n");
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -435,6 +462,7 @@ struct rte_pci_bus rte_pci_bus = {
.unplug = pci_unplug,
.parse = pci_parse,
.get_iommu_class = rte_pci_get_iommu_class,
+   .hot_unplug_handler = pci_hot_unplug_handler,
},
.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+   int i;
+   void *map_address;
+
+   if (dev == NULL)
+   return -1;
+
+   /* Remap all BARs */
+   for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+   /* skip empty BAR */
+   if (dev->mem_resource[i].phys_addr == 0)
+   continue;
+   map_address = mmap(dev->mem_resource[i].addr,
+   (size_t)dev->mem_resource[i].len,
+   PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+   if (map_address == MAP_FAILED) {
+   RTE_LOG(ERR, EAL,
+   "Cannot remap resource for device %s\n",
+   dev->name);
+   return -1;
+   }
+   RTE_LOG(INFO, EAL,
+   "Successful remap resource for device %s\n",
+   dev->name);
+   }
+
+   return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

[dpdk-dev] [PATCH v12 0/7] hot-unplug failure handle mechanism

2018-10-02 Thread Jeff Guo

Hotplug is an important feature for use cases such as datacenter device
fail-safe and SRIOV live migration in SDN/NFV. It could bring higher
flexibility and continuity to networking services in multiple use cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, the failsafe
driver, and a hot plug/unplug API in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we do not yet
have an implementation of “eal event + hotplug handler for pci PMD +
failsafe”, and we need to consider two different solutions for uio pci
and vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified,
detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read from or write to the
device while removal is still in progress, it will cause an MMIO error and
the application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once the hot-unplug occurs, the mechanism will detect the
removal event and then handle the failure accordingly. In order to
do that, the following functionality is required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
   it remaps memory for the unplugged device, based on the failure
   address. For vfio pci, it could be implemented separately, case by case.

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, which is responsible for handling
   the sigbus error, which is either an original memory error or a specific
   memory error caused by a hot unplug. When a sigbus error is
   captured, this function is called to handle it.
 - Implement PCI bus specific ops “pci_sigbus_handler”. It iterates over all
   devices on the PCI bus to find which device encountered the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a
   bus to handle the failure.
 - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
   It monitors the sigbus error with a per-process handler.
   Because of how signals are delivered, either the control path thread or
   the data path thread may receive the sigbus error, but both call the
   common sigbus handling. When a sigbus is captured, it calls the above
   API to find a bus to handle it.
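
The address lookup that the PCI sigbus ops relies on can be sketched as
below. The struct bar_range type and find_range_by_addr() are hypothetical
stand-ins for the per-device BAR table and the bus iteration; the point is
only the half-open range check [addr, addr + len):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped BAR of one device. */
struct bar_range {
	const char *dev_name;
	void *addr;	/* start of the mapping */
	size_t len;	/* length in bytes, 0 means unused */
};

/*
 * Return the range that contains failure_addr, or NULL if the address
 * belongs to none of them (i.e. a generic fault, not a hot-unplug one).
 */
const struct bar_range *
find_range_by_addr(const struct bar_range *ranges, size_t n,
		   const void *failure_addr)
{
	uintptr_t fa = (uintptr_t)failure_addr;
	size_t i;

	for (i = 0; i < n; i++) {
		uintptr_t start = (uintptr_t)ranges[i].addr;

		/* fa - start cannot wrap because fa >= start is tested first. */
		if (fa >= start && fa - start < ranges[i].len)
			return &ranges[i];
	}
	return NULL;
}
```

A NULL result corresponds to the "no bus handles it" case, which lets the
caller fall back to the default sigbus behaviour.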

The mechanism could be used by apps or PMDs. For example, the whole
hotplug process in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insert and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will come in a following patch set.

patchset history:
v12->v11:
add and delete some checking about sigbus recover.

v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move the igb_uio fixing part, since it is a random issue and should be
considered a kernel driver defect rather than part of this failure handler
mechanism.

v10->v9:
modify the API name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler to distinguish handling of
generic sigbus and hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path fai

[dpdk-dev] [PATCH v12 4/7] bus/pci: implement sigbus handler ops

2018-10-02 Thread Jeff Guo

This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v12->v11:
no change.
---
 drivers/bus/pci/pci_common.c | 53 
 1 file changed, 53 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d286234..f313fe9 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, 
rte_dev_cmp_t cmp,
return NULL;
 }
 
+/**
+ * find the device which encounter the failure, by iterate over all device on
+ * PCI bus to check if the memory failure address is located in the range
+ * of the BARs of the device.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+   struct rte_pci_device *pdev = NULL;
+   int i;
+
+   FOREACH_DEVICE_ON_PCIBUS(pdev) {
+   for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+   if ((uint64_t)(uintptr_t)failure_addr >=
+   (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+   (uint64_t)(uintptr_t)failure_addr <
+   (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+   pdev->mem_resource[i].len) {
+   RTE_LOG(INFO, EAL, "Failure address "
+   "%16.16"PRIx64" belongs to "
+   "device %s!\n",
+   (uint64_t)(uintptr_t)failure_addr,
+   pdev->device.name);
+   return pdev;
+   }
+   }
+   }
+   return NULL;
+}
+
 static int
 pci_hot_unplug_handler(struct rte_device *dev)
 {
@@ -433,6 +463,28 @@ pci_hot_unplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+   struct rte_pci_device *pdev = NULL;
+   int ret = 0;
+
+   pdev = pci_find_device_by_addr(failure_addr);
+   if (!pdev) {
+   /* It is a generic sigbus error, no bus would handle it. */
+   ret = 1;
+   } else {
+   /* The sigbus error is caused of hot-unplug. */
+   ret = pci_hot_unplug_handler(&pdev->device);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Failed to handle hot-unplug for "
+   "device %s", pdev->name);
+   ret = -1;
+   }
+   }
+   return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -463,6 +515,7 @@ struct rte_pci_bus rte_pci_bus = {
.parse = pci_parse,
.get_iommu_class = rte_pci_get_iommu_class,
.hot_unplug_handler = pci_hot_unplug_handler,
+   .sigbus_handler = pci_sigbus_handler,
},
.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

[dpdk-dev] [PATCH v12 3/7] bus: add sigbus handler

2018-10-02 Thread Jeff Guo

When a device is hot-unplugged, a sigbus error will occur if the datapath
still reads from or writes to the device. A handler is required here to
capture the sigbus signal and handle it appropriately.

This patch introduces a bus ops to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v12->v11:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h 
b/lib/librte_eal/common/include/rte_bus.h
index 1bb53dc..201454a 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void 
*addr);
 typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a specific sigbus handler, which is responsible for handling
+ * the sigbus error which is either original memory error, or specific memory
+ * error that caused of device be hot-unplugged. When sigbus error be captured,
+ * it could call this function to handle sigbus error.
+ * @param failure_addr
+ * Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ * 0 for success handle the sigbus.
+ * 1 for no bus handle the sigbus.
+ * -1 for failed to handle the sigbus
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -228,6 +243,9 @@ struct rte_bus {
rte_dev_iterate_t dev_iterate; /**< Device iterator. */
rte_bus_hot_unplug_handler_t hot_unplug_handler;
/**< handle hot-unplug failure on the bus */
+   rte_bus_sigbus_handler_t sigbus_handler;
+   /**< handle sigbus error on the bus */
+
 };
 
 /**
-- 
2.7.4

[dpdk-dev] [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug

2018-10-02 Thread Jeff Guo

The mechanism registers the sigbus handler after the device event
monitor is enabled. When a sigbus event is captured, it checks
the failure address and handles the memory failure of the
corresponding device by invoking the hot-unplug handler. This can prevent
the application from crashing when a device is hot-unplugged.

With this patch, users can call the newly added APIs below to enable/disable
the device hotplug handling mechanism. Note that these functions only
implement the hot-unplug handler; other hotplug handlers, such
as a handler for hotplug binding, could be added in the future if needed:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable

Signed-off-by: Jeff Guo 
---
v12->v11:
add and delete some checking about sigbus recover.
---
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c |  14 +++
 lib/librte_eal/common/eal_private.h |  26 +
 lib/librte_eal/common/include/rte_dev.h |  26 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 164 +++-
 lib/librte_eal/rte_eal_version.map  |   2 +
 6 files changed, 236 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst 
b/doc/guides/rel_notes/release_18_08.rst
index 321fa84..fe0e60f 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -117,6 +117,11 @@ New Features
 
   Added support for chained mbufs (input and output).
 
+* **Added hot-unplug handle mechanism.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
+  for enabling or disabling hotplug handle mechanism.
+
 
 API Changes
 ---
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c 
b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..255d611 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
return -1;
 }
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+   RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+   return -1;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+   RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+   return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index a2d1528..637f20d 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_register(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c 
b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..72fc033 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -14,15 +16,32 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;
 
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device hot-unplug failure handling. If it try to access bus or
+ * device, such as handle sigbus on bus or handle memory failure for device
+ * just need to use this lock. It could protect the bus and the device to

[dpdk-dev] [PATCH v12 5/7] bus: add helper to handle sigbus

2018-10-02 Thread Jeff Guo

This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v12->v11:
no change.
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++
 lib/librte_eal/common/eal_private.h| 13 ++
 2 files changed, 56 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c 
b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
}
return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+   const void *failure_addr)
+{
+   int ret;
+
+   if (!bus->sigbus_handler)
+   return -1;
+
+   ret = bus->sigbus_handler(failure_addr);
+
+   /* find bus but handle failed, keep the errno be set. */
+   if (ret < 0 && rte_errno == 0)
+   rte_errno = ENOTSUP;
+
+   return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+   struct rte_bus *bus;
+
+   int ret = 0;
+   int old_errno = rte_errno;
+
+   rte_errno = 0;
+
+   bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+   /* can not find bus. */
+   if (!bus)
+   return 1;
+   /* find bus but handle failed, pass on the new errno. */
+   else if (rte_errno != 0)
+   return -1;
+
+   /* restore the old errno. */
+   rte_errno = old_errno;
+
+   return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 4f809a8..a2d1528 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,17 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 const char *devstr);
 
+/**
+ * Iterate over all buses to find the corresponding bus to handle the sigbus
+ * error.
+ * @param failure_addr
+ * Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *  0 success to handle the sigbus.
+ * -1 failed to handle the sigbus
+ *  1 no bus can handler the sigbus
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4
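
For readers following the return-code convention (0 handled, 1 no bus
claims the address, -1 a bus claimed it but failed), here is a simplified
sketch of the iteration. It uses a hypothetical struct toy_bus array
instead of rte_bus_find(), and returns the result directly rather than
passing the failure through rte_errno as the patch does:

```c
#include <stddef.h>

typedef int (*sigbus_handler_t)(const void *failure_addr);

/* Hypothetical, minimal stand-in for struct rte_bus. */
struct toy_bus {
	const char *name;
	sigbus_handler_t sigbus_handler;	/* may be NULL */
};

/*
 * Ask each bus in turn to handle the fault.
 * Returns 0 if a bus handled it, -1 if a bus claimed the address but
 * failed to handle it, and 1 if no bus claimed the address.
 */
int
dispatch_sigbus(const struct toy_bus *buses, size_t n,
		const void *failure_addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;	/* bus has no handler, skip it */
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret > 0)
			continue;	/* address is not on this bus */
		return ret < 0 ? -1 : 0;
	}
	return 1;
}

/* Example handlers covering the three outcomes. */
int handler_not_mine(const void *addr) { (void)addr; return 1; }
int handler_ok(const void *addr) { (void)addr; return 0; }
int handler_fails(const void *addr) { (void)addr; return -1; }
```

The first bus that claims the address ends the search, whether or not it
manages to recover, which matches the behaviour of the comparison
function passed to rte_bus_find() in the patch above.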

[dpdk-dev] [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism

2018-10-02 Thread Jeff Guo

This patch uses testpmd as an example to show how an app can smoothly
handle the failure when a device is hot-unplugged. Besides enabling the
device event monitor and registering the hotplug event’s callback, the app
also needs to enable the hotplug handling mechanism before running. Once
the app detects the removal event, the hot-unplug callback is called. It
first stops packet forwarding, then stops the port, closes the port, and
finally detaches the port to clean up the device and release the resources.

Signed-off-by: Jeff Guo 
---
v12->v11:
no change.
---
 app/test-pmd/testpmd.c | 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e5..bfef483 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2093,14 +2093,22 @@ pmd_test_exit(void)
 
if (hot_plug) {
ret = rte_dev_event_monitor_stop();
-   if (ret)
+   if (ret) {
RTE_LOG(ERR, EAL,
"fail to stop device event monitor.");
+   return;
+   }
 
ret = eth_dev_event_callback_unregister();
if (ret)
+   return;
+
+   ret = rte_dev_hotplug_handle_disable();
+   if (ret) {
RTE_LOG(ERR, EAL,
-   "fail to unregister all event callbacks.");
+   "fail to disable hotplug handling.");
+   return;
+   }
}
 
printf("\nBye...\n");
@@ -2244,6 +2252,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 __rte_unused void *arg)
 {
+   uint16_t port_id;
+   int ret;
+
if (type >= RTE_DEV_EVENT_MAX) {
fprintf(stderr, "%s called upon invalid event %d\n",
__func__, type);
@@ -2254,9 +2265,12 @@ eth_dev_event_callback(char *device_name, enum 
rte_dev_event_type type,
case RTE_DEV_EVENT_REMOVE:
RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
device_name);
-   /* TODO: After finish failure handle, begin to stop
-* packet forward, stop port, close port, detach port.
-*/
+   ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+   if (ret) {
+   printf("can not get port by device %s!\n", device_name);
+   return;
+   }
+   rmv_event_callback((void *)(intptr_t)port_id);
break;
case RTE_DEV_EVENT_ADD:
RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2779,14 +2793,23 @@ main(int argc, char** argv)
init_config();
 
if (hot_plug) {
-   /* enable hot plug monitoring */
+   ret = rte_dev_hotplug_handle_enable();
+   if (ret) {
+   RTE_LOG(ERR, EAL,
+   "fail to enable hotplug handling.");
+   return -1;
+   }
+
ret = rte_dev_event_monitor_start();
if (ret) {
-   rte_errno = EINVAL;
+   RTE_LOG(ERR, EAL,
+   "fail to start device event monitoring.");
return -1;
}
-   eth_dev_event_callback_register();
 
+   ret = eth_dev_event_callback_register();
+   if (ret)
+   return -1;
}
 
if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4

[dpdk-dev] [PATCH v12 0/7] hot-unplug failure handle mechanism

2018-10-02 Thread Jeff Guo

Hotplug is an important feature for use cases such as fail-safe handling of
datacenter devices and SR-IOV live migration in SDN/NFV. It can bring greater
flexibility and continuity to networking services across multiple use cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, a failsafe driver,
and a hot plug/unplug API in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we still lack an
implementation of “eal event + hotplug handler for pci PMD + failsafe”, and
we need to consider two different solutions, one for uio pci and one for
vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified,
detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read from or write to the
device while the removal is still in progress, it will cause an MMIO error
and the application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as follows:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once a hot-unplug occurs, the mechanism will detect the
removal event and handle the failure accordingly. To do that, the
following functionality is required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement the pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
   it remaps memory for the unplugged device based on the failure address.
   For vfio pci, a separate implementation can be added case by case.

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, which is responsible for handling
   the sigbus error, which is either an original (generic) memory error or a
   specific memory error caused by a hot unplug. When a sigbus error is
   captured, this function is called to handle it.
 - Implement the PCI bus specific ops “pci_sigbus_handler”. It iterates over
   all devices on the PCI bus to find which device encountered the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a
   bus to handle the failure.
 - Add a pair of APIs, “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable”, to enable/disable hotplug handling.
   They monitor the sigbus error with a per-process handler.
   Because of how signals are delivered, either the control path thread or
   the data path thread may receive the sigbus error, but both call the
   common sigbus handling. When a sigbus is captured, it calls the above API
   to find a bus to handle it.

The mechanism can be used by applications or PMDs. For example, the whole process
of hotplug in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insertion and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will come in a following patch set.

patchset history:
v12->v11:
add and remove some checks for sigbus recovery.

v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move the igb_uio fix out of this series, since it is a random issue and should
be considered a kernel driver defect rather than part of this failure handler
mechanism.

v10->v9:
modify the API name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler to distinguish between handling
generic sigbus and hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path fai

[dpdk-dev] [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops

2018-10-02 Thread Jeff Guo

This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it avoids BAR read/write errors by creating new dummy
memory and remapping it over the memory where the failure is. For VFIO
or other kernel drivers, a specific function can be implemented to handle
hot-unplug case by case.
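
The remapping trick can be tried in isolation. The sketch below is only an
illustration under stated assumptions: demo_remap_anonymous is a hypothetical
helper, and an ordinary anonymous mapping stands in for a BAR. It relies on
mmap() with MAP_FIXED replacing whatever was mapped at the given address.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace the mapping at [addr, addr + len) with zero-filled anonymous
 * memory at the same virtual address, so later loads and stores hit plain
 * RAM instead of the vanished device. addr must be page-aligned.
 * Returns 0 on success, -1 on failure.
 */
static int
demo_remap_anonymous(void *addr, size_t len)
{
	void *p;

	assert(addr != NULL && len > 0);
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	/* With MAP_FIXED, a successful mmap() returns exactly addr. */
	return p == addr ? 0 : -1;
}
```

After the call, any pointer the data path still holds into the old region
stays valid, which is the property the handler depends on; the values read
back are of course meaningless (zeros), so the port still has to be stopped.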

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v12->v11:
no change.
---
 drivers/bus/pci/pci_common.c | 28 
 drivers/bus/pci/pci_common_uio.c | 33 +
 drivers/bus/pci/private.h| 12 
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7736b3f..d286234 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -406,6 +406,33 @@ pci_find_device(const struct rte_device *start, 
rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hot_unplug_handler(struct rte_device *dev)
+{
+   struct rte_pci_device *pdev = NULL;
+   int ret = 0;
+
+   pdev = RTE_DEV_TO_PCI(dev);
+   if (!pdev)
+   return -1;
+
+   switch (pdev->kdrv) {
+   case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_UIO_GENERIC:
+   case RTE_KDRV_NIC_UIO:
+   /* BARs resource is invalid, remap it to be safe. */
+   ret = pci_uio_remap_resource(pdev);
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL,
+   "Not managed by a supported kernel driver, skipped\n");
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -435,6 +462,7 @@ struct rte_pci_bus rte_pci_bus = {
.unplug = pci_unplug,
.parse = pci_parse,
.get_iommu_class = rte_pci_get_iommu_class,
+   .hot_unplug_handler = pci_hot_unplug_handler,
},
.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+   int i;
+   void *map_address;
+
+   if (dev == NULL)
+   return -1;
+
+   /* Remap all BARs */
+   for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+   /* skip empty BAR */
+   if (dev->mem_resource[i].phys_addr == 0)
+   continue;
+   map_address = mmap(dev->mem_resource[i].addr,
+   (size_t)dev->mem_resource[i].len,
+   PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+   if (map_address == MAP_FAILED) {
+   RTE_LOG(ERR, EAL,
+   "Cannot remap resource for device %s\n",
+   dev->name);
+   return -1;
+   }
+   RTE_LOG(INFO, EAL,
+   "Successful remap resource for device %s\n",
+   dev->name);
+   }
+
+   return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

[dpdk-dev] [PATCH v12 3/7] bus: add sigbus handler

2018-10-02 Thread Jeff Guo

When a device is hot-unplugged, a sigbus error will occur if the datapath
still reads from or writes to the device. A handler is required here to
capture the sigbus signal and handle it appropriately.

This patch introduces a bus ops to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v12->v11:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h 
b/lib/librte_eal/common/include/rte_bus.h
index 1bb53dc..201454a 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void 
*addr);
 typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a specific sigbus handler, which is responsible for handling
+ * the sigbus error, which is either an original memory error or a specific
+ * memory error caused by a device being hot-unplugged. When a sigbus error
+ * is captured, this function can be called to handle it.
+ * @param failure_addr
+ * Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ * 0 for success handle the sigbus.
+ * 1 for no bus handle the sigbus.
+ * -1 for failed to handle the sigbus
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -228,6 +243,9 @@ struct rte_bus {
rte_dev_iterate_t dev_iterate; /**< Device iterator. */
rte_bus_hot_unplug_handler_t hot_unplug_handler;
/**< handle hot-unplug failure on the bus */
+   rte_bus_sigbus_handler_t sigbus_handler;
+   /**< handle sigbus error on the bus */
+
 };
 
 /**
-- 
2.7.4

[dpdk-dev] [PATCH v12 5/7] bus: add helper to handle sigbus

2018-10-02 Thread Jeff Guo

This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.
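
The lookup can be pictured with a small stand-alone sketch. It uses a plain
array instead of the EAL bus list, and every demo_* name is hypothetical; only
the 0 / 1 / -1 return convention is taken from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-bus handler: 0 = handled, 1 = address not on this bus, -1 = failed. */
typedef int (*demo_sigbus_handler_t)(const void *failure_addr);

struct demo_sigbus_bus {
	const char *name;
	demo_sigbus_handler_t sigbus_handler; /* may be NULL */
};

static char demo_bar[64]; /* pretend device memory owned by the "pci" bus */

static int
demo_pci_sigbus(const void *failure_addr)
{
	uintptr_t fa = (uintptr_t)failure_addr;
	uintptr_t start = (uintptr_t)demo_bar;

	return (fa >= start && fa - start < sizeof(demo_bar)) ? 0 : 1;
}

static int
demo_other_sigbus(const void *failure_addr)
{
	(void)failure_addr;
	return 1; /* this bus never owns the address */
}

/* Ask each bus in turn; stop at the first one that claims the address. */
static int
demo_buses_handle_sigbus(const struct demo_sigbus_bus *buses, size_t n,
			 const void *failure_addr)
{
	size_t i;

	assert(buses != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue; /* bus does not implement the op */
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret == 1)
			continue; /* not this bus, keep looking */
		return ret < 0 ? -1 : 0; /* owner found: report its result */
	}
	return 1; /* no bus claimed the address: generic sigbus */
}
```

Keeping "no bus found" (1) distinct from "bus found but handling failed" (-1)
lets the caller fall back to the previous signal disposition only in the
first case.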

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v12->v11:
no change.
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++
 lib/librte_eal/common/eal_private.h| 13 ++
 2 files changed, 56 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c 
b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
}
return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+   const void *failure_addr)
+{
+   int ret;
+
+   if (!bus->sigbus_handler)
+   return -1;
+
+   ret = bus->sigbus_handler(failure_addr);
+
+   /* find bus but handle failed, keep the errno be set. */
+   if (ret < 0 && rte_errno == 0)
+   rte_errno = ENOTSUP;
+
+   return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+   struct rte_bus *bus;
+
+   int ret = 0;
+   int old_errno = rte_errno;
+
+   rte_errno = 0;
+
+   bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+   /* can not find bus. */
+   if (!bus)
+   return 1;
+   /* find bus but handle failed, pass on the new errno. */
+   else if (rte_errno != 0)
+   return -1;
+
+   /* restore the old errno. */
+   rte_errno = old_errno;
+
+   return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 4f809a8..a2d1528 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,17 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 const char *devstr);
 
+/**
+ * Iterate over all buses to find the corresponding bus to handle the sigbus
+ * error.
+ * @param failure_addr
+ * Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *  0 success to handle the sigbus.
+ * -1 failed to handle the sigbus
+ *  1 no bus can handler the sigbus
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4

[dpdk-dev] [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug

2018-10-02 Thread Jeff Guo

The mechanism registers the sigbus handler after the device
event monitor is enabled. When a sigbus event is captured, it checks
the failure address and handles the memory failure of the
corresponding device by invoking the hot-unplug handler. This can prevent
the application from crashing when a device is hot-unplugged.
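
For readers unfamiliar with how the failure address is obtained, here is a
minimal sketch using plain POSIX sigaction() with SA_SIGINFO. It is not the
code from this patch; the demo_* names are hypothetical, and a real handler
would pass the address on to the bus lookup instead of just recording it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t demo_sigbus_count;
static void *volatile demo_last_fault_addr; /* recorded for illustration only */
static struct sigaction demo_old_action;
static int demo_registered;

/* SA_SIGINFO handler: si_addr carries the faulting address for real faults. */
static void
demo_sigbus_action(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)ctx;
	demo_last_fault_addr = info->si_addr;
	demo_sigbus_count++;
}

/* Install the handler and remember the previous disposition. */
static int
demo_sigbus_handler_register(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = demo_sigbus_action;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &demo_old_action) != 0)
		return -1;
	demo_registered = 1;
	return 0;
}

/* Restore whatever was installed before register() was called. */
static int
demo_sigbus_handler_unregister(void)
{
	assert(demo_registered); /* unregister without register is a caller bug */
	if (sigaction(SIGBUS, &demo_old_action, NULL) != 0)
		return -1;
	demo_registered = 0;
	return 0;
}
```

Saving the old action matters: when the address turns out not to belong to
any device, the generic sigbus still needs to reach the original handler.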

With this patch, users can call the newly added APIs below to enable/disable
the device hotplug handling mechanism. Note that these functions only
implement the hot-unplug handler; other hotplug handlers, such
as a handler for hotplug binding, could be added in the future if needed:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable

Signed-off-by: Jeff Guo 
---
v12->v11:
add and remove some checks for sigbus recovery.
---
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c |  14 +++
 lib/librte_eal/common/eal_private.h |  26 +
 lib/librte_eal/common/include/rte_dev.h |  26 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 164 +++-
 lib/librte_eal/rte_eal_version.map  |   2 +
 6 files changed, 236 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst 
b/doc/guides/rel_notes/release_18_08.rst
index 321fa84..fe0e60f 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -117,6 +117,11 @@ New Features
 
   Added support for chained mbufs (input and output).
 
+* **Added hot-unplug handle mechanism.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
+  for enabling or disabling hotplug handle mechanism.
+
 
 API Changes
 ---
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c 
b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..255d611 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
return -1;
 }
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+   RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+   return -1;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+   RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+   return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index a2d1528..637f20d 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_register(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c 
b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..72fc033 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -14,15 +16,32 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;
 
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device hot-unplug failure handling. If it try to access bus or
+ * device, such as handle sigbus on bus or handle memory failure for device
+ * just need to use this lock. It could protect the bus and the device to

[dpdk-dev] [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism

2018-10-02 Thread Jeff Guo

This patch uses testpmd as an example to show how an application can smoothly
handle the failure when a device is hot-unplugged. Besides enabling the device
event monitor and registering the hotplug event’s callback, the application
also needs to enable the hotplug handling mechanism before running. Once the
application detects the removal event, the hot-unplug callback is called. It
first stops packet forwarding, then stops the port, closes the port, and
finally detaches the port to clean up the device and release the resources.

Signed-off-by: Jeff Guo 
---
v12->v11:
no change.
---
 app/test-pmd/testpmd.c | 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e5..bfef483 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2093,14 +2093,22 @@ pmd_test_exit(void)
 
if (hot_plug) {
ret = rte_dev_event_monitor_stop();
-   if (ret)
+   if (ret) {
RTE_LOG(ERR, EAL,
"fail to stop device event monitor.");
+   return;
+   }
 
ret = eth_dev_event_callback_unregister();
if (ret)
+   return;
+
+   ret = rte_dev_hotplug_handle_disable();
+   if (ret) {
RTE_LOG(ERR, EAL,
-   "fail to unregister all event callbacks.");
+   "fail to disable hotplug handling.\n");
+   return;
+   }
}
 
printf("\nBye...\n");
@@ -2244,6 +2252,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 __rte_unused void *arg)
 {
+   uint16_t port_id;
+   int ret;
+
if (type >= RTE_DEV_EVENT_MAX) {
fprintf(stderr, "%s called upon invalid event %d\n",
__func__, type);
@@ -2254,9 +2265,12 @@ eth_dev_event_callback(char *device_name, enum 
rte_dev_event_type type,
case RTE_DEV_EVENT_REMOVE:
RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
device_name);
-   /* TODO: After finish failure handle, begin to stop
-* packet forward, stop port, close port, detach port.
-*/
+   ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+   if (ret) {
+   printf("can not get port by device %s!\n", device_name);
+   return;
+   }
+   rmv_event_callback((void *)(intptr_t)port_id);
break;
case RTE_DEV_EVENT_ADD:
RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2779,14 +2793,23 @@ main(int argc, char** argv)
init_config();
 
if (hot_plug) {
-   /* enable hot plug monitoring */
+   ret = rte_dev_hotplug_handle_enable();
+   if (ret) {
+   RTE_LOG(ERR, EAL,
+   "fail to enable hotplug handling.\n");
+   return -1;
+   }
+
ret = rte_dev_event_monitor_start();
if (ret) {
-   rte_errno = EINVAL;
+   RTE_LOG(ERR, EAL,
+   "fail to start device event monitoring.\n");
return -1;
}
-   eth_dev_event_callback_register();
 
+   ret = eth_dev_event_callback_register();
+   if (ret)
+   return -1;
}
 
if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4

[dpdk-dev] [PATCH v12 4/7] bus/pci: implement sigbus handler ops

2018-10-02 Thread Jeff Guo

This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.
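
The device lookup boils down to a range check of the failure address against
each mapped BAR. A stand-alone sketch of that check follows; the demo_*
structures are simplified, hypothetical stand-ins for rte_pci_device and its
mem_resource array, and the limit of 6 resources is an assumption of the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_RESOURCE 6

struct demo_mem_resource {
	void *addr;   /* virtual address the BAR is mapped at, NULL if unused */
	uint64_t len; /* length of the mapping in bytes */
};

struct demo_pci_device {
	const char *name;
	struct demo_mem_resource mem_resource[DEMO_MAX_RESOURCE];
};

/* Return the device whose BAR mapping contains failure_addr, or NULL. */
static const struct demo_pci_device *
demo_find_device_by_addr(const struct demo_pci_device *devs, size_t n,
			 const void *failure_addr)
{
	uintptr_t fa = (uintptr_t)failure_addr;
	size_t i, j;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		for (j = 0; j < DEMO_MAX_RESOURCE; j++) {
			const struct demo_mem_resource *r =
				&devs[i].mem_resource[j];
			uintptr_t start = (uintptr_t)r->addr;

			if (r->addr == NULL || r->len == 0)
				continue; /* skip unused BARs */
			/* fa - start cannot wrap once fa >= start holds */
			if (fa >= start && fa - start < r->len)
				return &devs[i];
		}
	}
	return NULL;
}
```

Comparing the offset (fa - start) against the length, rather than computing
start + len, avoids an overflow when a mapping ends near the top of the
address space.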

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v12->v11:
no change.
---
 drivers/bus/pci/pci_common.c | 53 
 1 file changed, 53 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d286234..f313fe9 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, 
rte_dev_cmp_t cmp,
return NULL;
 }
 
+/**
+ * Find the device that encountered the failure by iterating over all devices
+ * on the PCI bus and checking whether the failure address lies within the
+ * range of one of the device's BARs.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+   struct rte_pci_device *pdev = NULL;
+   int i;
+
+   FOREACH_DEVICE_ON_PCIBUS(pdev) {
+   for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+   if ((uint64_t)(uintptr_t)failure_addr >=
+   (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+   (uint64_t)(uintptr_t)failure_addr <
+   (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+   pdev->mem_resource[i].len) {
+   RTE_LOG(INFO, EAL, "Failure address "
+   "%16.16"PRIx64" belongs to "
+   "device %s!\n",
+   (uint64_t)(uintptr_t)failure_addr,
+   pdev->device.name);
+   return pdev;
+   }
+   }
+   }
+   return NULL;
+}
+
 static int
 pci_hot_unplug_handler(struct rte_device *dev)
 {
@@ -433,6 +463,28 @@ pci_hot_unplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+   struct rte_pci_device *pdev = NULL;
+   int ret = 0;
+
+   pdev = pci_find_device_by_addr(failure_addr);
+   if (!pdev) {
+   /* It is a generic sigbus error, no bus would handle it. */
+   ret = 1;
+   } else {
+   /* The sigbus error is caused by hot-unplug. */
+   ret = pci_hot_unplug_handler(&pdev->device);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Failed to handle hot-unplug for "
+   "device %s\n", pdev->name);
+   ret = -1;
+   }
+   }
+   return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -463,6 +515,7 @@ struct rte_pci_bus rte_pci_bus = {
.parse = pci_parse,
.get_iommu_class = rte_pci_get_iommu_class,
.hot_unplug_handler = pci_hot_unplug_handler,
+   .sigbus_handler = pci_sigbus_handler,
},
.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

[dpdk-dev] [PATCH v2 2/4] eal: modify device event callback process func

2018-10-02 Thread Jeff Guo

This patch renames the device event callback process function to be
more explicit and changes its device name argument to const. In addition,
because not only the eal device helper but also the vfio bus will use the
callback to handle hot-unplug, the API is exposed out of the private eal.
Bus drivers and the eal device code can then directly use this API to process
device event callbacks.
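
To make the role of the exposed function concrete, here is a small sketch of
a callback registry with a "process" entry point that any caller (a bus
driver, for instance) can invoke. The demo_* names, the fixed-size table and
the rule that a NULL device name matches every device are assumptions of
this example, not a description of the EAL implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_event_type { DEMO_EVENT_ADD, DEMO_EVENT_REMOVE };

typedef void (*demo_event_cb_fn)(const char *device_name,
				 enum demo_event_type event, void *cb_arg);

#define DEMO_MAX_CALLBACKS 8

struct demo_callback {
	const char *device_name; /* NULL: call for any device (assumption) */
	demo_event_cb_fn fn;
	void *arg;
};

static struct demo_callback demo_callbacks[DEMO_MAX_CALLBACKS];
static size_t demo_callback_count;

static int
demo_event_callback_register(const char *device_name, demo_event_cb_fn fn,
			     void *arg)
{
	if (fn == NULL || demo_callback_count == DEMO_MAX_CALLBACKS)
		return -1;
	demo_callbacks[demo_callback_count++] =
		(struct demo_callback){ device_name, fn, arg };
	return 0;
}

/* Run every registered callback that matches the device name. */
static void
demo_event_callback_process(const char *device_name,
			    enum demo_event_type event)
{
	size_t i;

	assert(device_name != NULL);
	for (i = 0; i < demo_callback_count; i++) {
		const struct demo_callback *cb = &demo_callbacks[i];

		if (cb->device_name != NULL &&
		    strcmp(cb->device_name, device_name) != 0)
			continue;
		cb->fn(device_name, event, cb->arg);
	}
}

/* Example callback: counts removals in the int that cb_arg points to. */
static void
demo_count_removals(const char *device_name, enum demo_event_type event,
		    void *cb_arg)
{
	(void)device_name;
	if (event == DEMO_EVENT_REMOVE)
		(*(int *)cb_arg)++;
}
```

Taking the name as const char * lets callers pass string literals or names
owned by the device structure without a cast.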

Signed-off-by: Jeff Guo 
---
modify commit log to be more clear
---
 app/test-pmd/testpmd.c  |  4 ++--
 lib/librte_eal/bsdapp/eal/eal_dev.c |  8 
 lib/librte_eal/common/eal_common_dev.c  |  5 +++--
 lib/librte_eal/common/eal_private.h | 12 
 lib/librte_eal/common/include/rte_dev.h | 18 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  2 +-
 lib/librte_eal/rte_eal_version.map  |  1 +
 7 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index bfef483..1313100 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -431,7 +431,7 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
  enum rte_eth_event_type type,
  void *param, void *ret_param);
-static void eth_dev_event_callback(char *device_name,
+static void eth_dev_event_callback(const char *device_name,
enum rte_dev_event_type type,
void *param);
 static int eth_dev_event_callback_register(void);
@@ -2249,7 +2249,7 @@ eth_event_callback(portid_t port_id, enum 
rte_eth_event_type type, void *param,
 
 /* This function is used by the interrupt thread */
 static void
-eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+eth_dev_event_callback(const char *device_name, enum rte_dev_event_type type,
 __rte_unused void *arg)
 {
uint16_t port_id;
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c 
b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 255d611..3a3a2a5 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -33,3 +33,11 @@ rte_dev_hotplug_handle_disable(void)
RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
return -1;
 }
+
+void  __rte_experimental
+rte_dev_event_callback_process(const char *device_name,
+  enum rte_dev_event_type event)
+{
+   RTE_LOG(ERR, EAL,
+   "Device event callback process is not supported for 
FreeBSD.\n");
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index 678dbca..2d610a4 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -342,8 +342,9 @@ rte_dev_event_callback_unregister(const char *device_name,
return ret;
 }
 
-void
-dev_callback_process(char *device_name, enum rte_dev_event_type event)
+void __rte_experimental
+rte_dev_event_callback_process(const char *device_name,
+  enum rte_dev_event_type event)
 {
struct dev_event_callback *cb_lst;
 
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 637f20d..47e8a33 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -259,18 +259,6 @@ struct rte_bus *rte_bus_find_by_device_name(const char 
*str);
 int rte_mp_channel_init(void);
 
 /**
- * Internal Executes all the user application registered callbacks for
- * the specific device. It is for DPDK internal user only. User
- * application should not call it directly.
- *
- * @param device_name
- *  The device name.
- * @param event
- *  the device event type.
- */
-void dev_callback_process(char *device_name, enum rte_dev_event_type event);
-
-/**
  * @internal
  * Parse a device string and store its information in an
  * rte_devargs structure.
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index ff580a0..58fab43 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -39,7 +39,7 @@ struct rte_dev_event {
char *devname;  /**< device name */
 };
 
-typedef void (*rte_dev_event_cb_fn)(char *device_name,
+typedef void (*rte_dev_event_cb_fn)(const char *device_name,
enum rte_dev_event_type event,
void *cb_arg);
 
@@ -438,6 +438,22 @@ rte_dev_event_callback_unregister(const char *device_name,
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
+ * Executes all the user application registered callbacks for
+ * the specific device.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type.
+ */
+void  __rte_experimental
+rte_dev_event_callback_process(const char *device_name,
+  enum rte

[dpdk-dev] [PATCH v2 1/4] eal: add a new req notifier to eal interrupt

2018-10-02 Thread Jeff Guo

Add a new req notifier to the eal interrupt handling to enable vfio hotplug.

Signed-off-by: Jeff Guo 
---
v3->v2:
change some code style to make it consistent.
---
 lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 71 ++
 2 files changed, 72 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h 
b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 6eb4932..5204ed4 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -35,6 +35,7 @@ enum rte_intr_handle_type {
RTE_INTR_HANDLE_EXT,  /**< external handler */
RTE_INTR_HANDLE_VDEV, /**< virtual device */
RTE_INTR_HANDLE_DEV_EVENT,/**< device event handle */
+   RTE_INTR_HANDLE_VFIO_REQ, /**< vfio device handle (req) */
RTE_INTR_HANDLE_MAX   /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 4076c6d..7f611b3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -308,6 +308,64 @@ vfio_disable_msix(const struct rte_intr_handle 
*intr_handle) {
 
return ret;
 }
+
+/* enable req notifier */
+static int
+vfio_enable_req(const struct rte_intr_handle *intr_handle)
+{
+   int len, ret;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   struct vfio_irq_set *irq_set;
+   int *fd_ptr;
+
+   len = sizeof(irq_set_buf);
+
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_REQ_IRQ_INDEX;
+   irq_set->start = 0;
+   fd_ptr = (int *) &irq_set->data;
+   *fd_ptr = intr_handle->fd;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error enabling req interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+
+   return 0;
+}
+
+/* disable req notifier */
+static int
+vfio_disable_req(const struct rte_intr_handle *intr_handle)
+{
+   struct vfio_irq_set *irq_set;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   int len, ret;
+
+   len = sizeof(struct vfio_irq_set);
+
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 0;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_REQ_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret)
+   RTE_LOG(ERR, EAL, "Error disabling req interrupts for fd %d\n",
+   intr_handle->fd);
+
+   return ret;
+}
 #endif
 
 static int
@@ -556,6 +614,10 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle)
if (vfio_enable_intx(intr_handle))
return -1;
break;
+   case RTE_INTR_HANDLE_VFIO_REQ:
+   if (vfio_enable_req(intr_handle))
+   return -1;
+   break;
 #endif
/* not used at this moment */
case RTE_INTR_HANDLE_DEV_EVENT:
@@ -606,6 +668,11 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
if (vfio_disable_intx(intr_handle))
return -1;
break;
+   case RTE_INTR_HANDLE_VFIO_REQ:
+   if (vfio_disable_req(intr_handle))
+   return -1;
+   break;
+
 #endif
/* not used at this moment */
case RTE_INTR_HANDLE_DEV_EVENT:
@@ -682,6 +749,10 @@ eal_intr_process_interrupts(struct epoll_event *events, 
int nfds)
bytes_read = 0;
call = true;
break;
+   case RTE_INTR_HANDLE_VFIO_REQ:
+   bytes_read = 0;
+   call = true;
+   break;
default:
bytes_read = 1;
break;
-- 
2.7.4

[dpdk-dev] [PATCH v2 0/4] Enable hotplug in vfio

2018-10-02 Thread Jeff Guo

The hotplug process differs between igb_uio and vfio. For igb_uio,
hot-unplug can rely on uevent notification and the memory failure handling
mechanism. For vfio, when a device is hot-unplugged the uevent cannot be
detected immediately, because the vfio kernel module guarantees that the
pci device is not deleted until user space releases its resources. The
kernel therefore first sends another event, the “req notifier”, to ask
user space to release the resources for hotplug.

This patch set adds a new req notifier interrupt type to the eal interrupt
code, and adds a new interrupt handler to the pci device to handle the req
device event. When the req notifier is detected, it triggers the device
event callback processing for hot-unplug. With this mechanism, hotplug
can be enabled in vfio.
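
As a rough illustration of the notification path described above, the
sketch below uses a plain Linux eventfd to stand in for the req notifier.
In the real flow the vfio kernel module signals the eventfd after user
space has registered it with VFIO_DEVICE_SET_IRQS; here the process signals
the eventfd itself, and the function name req_notifier_roundtrip is
hypothetical, not part of this patch set.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an eventfd, signal it once the way the
 * kernel would for a req notification, then read the counter back.
 * Returns the counter value read (1 on success) or -1 on error.
 */
int
req_notifier_roundtrip(void)
{
	uint64_t val = 1;
	int fd;

	fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Stand-in for the kernel signalling the req notifier. */
	if (write(fd, &val, sizeof(val)) != (ssize_t)sizeof(val)) {
		close(fd);
		return -1;
	}

	/* The fd is now readable; reading returns the accumulated count. */
	val = 0;
	if (read(fd, &val, sizeof(val)) != (ssize_t)sizeof(val)) {
		close(fd);
		return -1;
	}

	close(fd);
	return (int)val;
}
```

An eventfd fits this use because it can be handed to the kernel as a file
descriptor and polled alongside the other interrupt descriptors.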

patchset history:
v3->v2:
update some commit logs, coding style, and typos.

v2->v1:
change rte_dev_event_callback_process from an internal to an external api
for bus or app usage.
change some code logic.

Jeff Guo (4):
  eal: add a new req notifier to eal interrupt
  eal: modify device event callback process func
  pci: add req handler field to generic pci device
  vfio: enable vfio hotplug by req notifier handler

 app/test-pmd/testpmd.c |   4 +-
 drivers/bus/pci/linux/pci_vfio.c   | 111 +
 drivers/bus/pci/pci_common.c   |  10 ++
 drivers/bus/pci/rte_bus_pci.h  |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c|   8 ++
 lib/librte_eal/common/eal_common_dev.c |   5 +-
 lib/librte_eal/common/eal_private.h|  12 ---
 lib/librte_eal/common/include/rte_dev.h|  18 +++-
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c  |   2 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   |  71 +
 lib/librte_eal/rte_eal_version.map |   1 +
 12 files changed, 226 insertions(+), 18 deletions(-)

-- 
2.7.4

[dpdk-dev] [PATCH v2 3/4] pci: add req handler field to generic pci device

2018-10-02 Thread Jeff Guo

Besides the existing interrupts, a vfio pci device has some extended
interrupt types, such as the err and req notifiers, which can be useful for
device error monitoring. The corresponding interrupt handlers differ from
the other interrupt handlers that are registered in PMDs, so a new
interrupt handler is needed. This patch adds a specific req handler to the
generic pci device.

Signed-off-by: Jeff Guo 
---
v3->v2:
no change.
---
 drivers/bus/pci/rte_bus_pci.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 0d1955f..c45a820 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -66,6 +66,7 @@ struct rte_pci_device {
uint16_t max_vfs;   /**< sriov enable if not zero */
enum rte_kernel_driver kdrv;/**< Kernel driver passthrough */
char name[PCI_PRI_STR_SIZE+1];  /**< PCI location (ASCII) */
+   struct rte_intr_handle req_notifier_handler;/**< Req notifier handle */
 };
 
 /**
-- 
2.7.4

[dpdk-dev] [PATCH v2 4/4] vfio: enable vfio hotplug by req notifier handler

2018-10-02 Thread Jeff Guo

When a device is hot-unplugged, the vfio kernel module first sends the req
notifier to request that user space release the allocated resources. After
that, the vfio kernel module detects that the device has disappeared and
deletes the device in the kernel.

This patch aims to add req notifier processing to enable hotplug for vfio.
By enabling req notifier monitoring and registering the notifier callback,
the hot-unplug handler is called to process hotplug for vfio when a device
is hot-unplugged.
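
The sequence in this commit message (monitor the notifier, register a
callback, run the hot-unplug handler when the notifier fires) can be
sketched with epoll and a function pointer. This is only an illustration
under stated assumptions: req_cb_t, count_unplug, wait_and_dispatch and
demo_req_dispatch are hypothetical names, and the code is not the EAL
interrupt thread or the handler added by this patch.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical callback type, loosely modelled on an interrupt callback. */
typedef void (*req_cb_t)(void *arg);

/* Hypothetical stand-in for a bus hot-unplug handler: counts invocations. */
void
count_unplug(void *arg)
{
	int *calls = arg;

	(*calls)++;
}

/*
 * Wait up to timeout_ms for fd to become readable and, if it does,
 * invoke cb(arg) once. Returns 1 if the callback ran, 0 on timeout,
 * -1 on error.
 */
int
wait_and_dispatch(int fd, req_cb_t cb, void *arg, int timeout_ms)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
	struct epoll_event out;
	int epfd, n;

	epfd = epoll_create1(EPOLL_CLOEXEC);
	if (epfd < 0)
		return -1;

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		return -1;
	}

	n = epoll_wait(epfd, &out, 1, timeout_ms);
	close(epfd);
	if (n < 0)
		return -1;
	if (n == 0)
		return 0;

	cb(arg);
	return 1;
}

/*
 * Demo: signal an eventfd (as the kernel would for a req notification)
 * and check that the callback runs. Returns the number of callback
 * invocations, or -1 on error.
 */
int
demo_req_dispatch(void)
{
	uint64_t one = 1;
	int calls = 0;
	int fd, ret;

	fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
	if (fd < 0)
		return -1;

	if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
		close(fd);
		return -1;
	}

	ret = wait_and_dispatch(fd, count_unplug, &calls, 100);
	close(fd);

	return ret == 1 ? calls : -1;
}
```

If the eventfd is never signalled, wait_and_dispatch times out and the
callback does not run, which mirrors a device that is never unplugged.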

Signed-off-by: Jeff Guo 
---
v3->v2:
change some code style and typo
---
 drivers/bus/pci/linux/pci_vfio.c | 111 +++
 drivers/bus/pci/pci_common.c |  10 
 2 files changed, 121 insertions(+)

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 686386d..5d3d026 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "eal_filesystem.h"
 
@@ -277,6 +279,101 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
return -1;
 }
 
+static void
+pci_vfio_req_handler(void *param)
+{
+   struct rte_bus *bus;
+   int ret;
+   struct rte_device *device = (struct rte_device *)param;
+
+   bus = rte_bus_find_by_device(device);
+   if (bus == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot find bus for device (%s)\n",
+   device->name);
+   return;
+   }
+
+   /*
+* The vfio kernel module requests that user space release the allocated
+* resources before the device is deleted in the kernel, so the vfio bus
+* hot-unplug handler can be called directly to process it.
+*/
+   ret = bus->hot_unplug_handler(device);
+   if (ret)
+   RTE_LOG(ERR, EAL,
+   "Can not handle hot-unplug for device (%s)\n",
+   device->name);
+}
+
+/* enable notifier (only enable req now) */
+static int
+pci_vfio_enable_notifier(struct rte_pci_device *dev, int vfio_dev_fd)
+{
+   int ret;
+   int fd = -1;
+
+   /* set up an eventfd for req notifier */
+   fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+   if (fd < 0) {
+   RTE_LOG(ERR, EAL, "Cannot set up eventfd, error %i (%s)\n",
+   errno, strerror(errno));
+   return -1;
+   }
+
+   dev->req_notifier_handler.fd = fd;
+   dev->req_notifier_handler.type = RTE_INTR_HANDLE_VFIO_REQ;
+   dev->req_notifier_handler.vfio_dev_fd = vfio_dev_fd;
+   ret = rte_intr_callback_register(&dev->req_notifier_handler,
+pci_vfio_req_handler,
+(void *)&dev->device);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Fail to register req notifier handler.\n");
+   goto error;
+   }
+
+   ret = rte_intr_enable(&dev->req_notifier_handler);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Fail to enable req notifier.\n");
+   ret = rte_intr_callback_unregister(&dev->req_notifier_handler,
+pci_vfio_req_handler,
+(void *)&dev->device);
+   if (ret < 0)
+   RTE_LOG(ERR, EAL,
+   "Fail to unregister req notifier handler.\n");
+   goto error;
+   }
+
+   return 0;
+error:
+   close(fd);
+   return -1;
+}
+
+/* disable notifier (only disable req now) */
+static int
+pci_vfio_disable_notifier(struct rte_pci_device *dev)
+{
+   int ret;
+
+   ret = rte_intr_disable(&dev->req_notifier_handler);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "fail to disable req notifier.\n");
+   return -1;
+   }
+
+   ret = rte_intr_callback_unregister(&dev->req_notifier_handler,
+  pci_vfio_req_handler,
+  (void *)&dev->device);
+   if (ret < 0) {
+   RTE_LOG(ERR, EAL,
+"fail to unregister req notifier handler.\n");
+   return -1;
+   }
+
+   close(dev->req_notifier_handler.fd);
+   return 0;
+}
+
 static int
 pci_vfio_is_ioport_bar(int vfio_dev_fd, int bar_index)
 {
@@ -430,6 +527,7 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
struct pci_map *maps;
 
dev->intr_handle.fd = -1;
+   dev->req_notifier_handler.fd = -1;
 
/* store PCI address string */
snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
@@ -521,6 +619,11 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
goto err_vfio_res;
}
 
+   if (pci_vfio_enable_notifier(dev, vfio_dev_fd) != 0) {
+   RTE_LOG(ERR, EAL, "Error setting up notifier!\n");
+   goto err_vfio_res;
+   }
+
TAILQ_INSERT_T

Re: [dpdk-dev] [PATCH v2 1/5] net/bnx2x: fix logging to include dev name

2018-10-02 Thread Ferruh Yigit

On 9/29/2018 6:42 AM, Mody, Rasesh wrote:
> Fix PMD logging scheme to include device name in the messages printed.
> 
> Fixes: 540a211084a7 ("bnx2x: driver core")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Rasesh Mody 

Series applied to dpdk-next-net/master, thanks.

[dpdk-dev] DPDK passthrough to container in virtual machine

2018-10-02 Thread Periyasamy Palanisamy

Hi,

I'm trying to pass an OVS DPDK vhost port through to a container that
resides in a VM.

Here are the steps I followed:

1. Add a DPDK port to the OVS integration bridge.
2. Bring up an Ubuntu VM using virsh with vhostuser interface configuration.

  
  
  
  
  
  

3. In the VM, convert this virtio port into a DPDK network device.

How can this DPDK interface be attached to a Docker container?
After the configuration, I would like to run a sample DPDK application 
(example: testpmd) on the container.

Thanks,
Periyasamy
