[dpdk-dev] [PATCH 2/3] version: adjust printing for new version scheme

2015-12-28 Thread Thomas Monjalon
2015-12-21 13:26, Bruce Richardson:
> Since we are now using a year-month numbering scheme, adjust
> the printing of the version to always use 2-digits for YY.MM
> format.

Yes
It must be done for "make showversion" also.

> Also omit the patch version unless there is a patch version present,
> since patches for releases are rare on DPDK. This means that the
> final release of 16.04 will report as 16.04, rather than 16.04.0.

So the numbering of master and maintenance releases will not be consistent:
16.04 and then 16.04.1
It's true that maintenance releases are rare but it has been discussed at
Dublin to have ones in future.
So are we OK to omit the .0 even if not consistent?

> Release candidates for it will similarly report as 16.04-rcX.

Yes, 16.04-rcX looks nicer than 16.04.0-rcX.

Shouldn't we take the opportunity to update RTE_VER_PREFIX from
"RTE" to "DPDK"?

>  /**
>   * Major version number i.e. the x in x.y.z
>   */
> -#define RTE_VER_MAJOR 16
> +#define RTE_REL_YEAR 16
>  
>  /**
>   * Minor version number i.e. the y in x.y.z
>   */
> -#define RTE_VER_MINOR 4
> +#define RTE_REL_MONTH 4

Why renaming from _VER_ to _REL_?
mk/rte.sdkconfig.mk is not updated accordingly
(make showversion is broken)

[...]
>  #define RTE_VERSION RTE_VERSION_NUM( \
> - RTE_VER_MAJOR, \
> - RTE_VER_MINOR, \
> + RTE_REL_YEAR, \
> + RTE_REL_MONTH, \
>   RTE_VER_PATCH_LEVEL, \
>   RTE_VER_PATCH_RELEASE)

Is there a better name for RTE_VER_PATCH_LEVEL and RTE_VER_PATCH_RELEASE?
I think PATCH_LEVEL should be MINOR, i.e. the number increased when
doing some maintenance releases.
The last digit is useful for release candidates and non-official packaging
(downstream consumers like Linux distros or vendors). It should be updated
when delivering a patched DPDK version. RTE_VER_SUFFIX should also be updated
accordingly. So is RTE_VER_PATCH_RELEASE the right name? I guess yes but not
sure.

[...]
> + if (RTE_VER_PATCH_LEVEL > 0)
> + pos += snprintf(version + pos, sizeof(version) - pos, ".%d",
>   RTE_VER_PATCH_LEVEL);

I disagree.
It's important to know that it is the first of the major release (.0).
I think we can remove it elsewhere. Example: PROJECT_NUMBER in doxygen.



[dpdk-dev] [PATCH 2/4] eal: set kdrv to RTE_KDRV_NONE if kernel driver isn't manipulating the device.

2015-12-28 Thread David Marchand
On Thu, Dec 24, 2015 at 7:38 PM, Huawei Xie  wrote:

> Use RTE_KDRV_NONE to indicate that kernel driver isn't manipulating the
> device.
>
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_eal/linuxapp/eal/eal_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c
> b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index bc5b5be..640b190 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -362,7 +362,7 @@ pci_scan_one(const char *dirname, uint16_t domain,
> uint8_t bus,
> else
> dev->kdrv = RTE_KDRV_UNKNOWN;
> } else
> -   dev->kdrv = RTE_KDRV_UNKNOWN;
> +   dev->kdrv = RTE_KDRV_NONE;
>
> /* device is valid, add in list (sorted) */
> if (TAILQ_EMPTY(_device_list)) {
> --
> 1.8.1.4
>
>
lgtm
Acked-by: David Marchand 


-- 
David Marchand


[dpdk-dev] [PATCH v3 3/3] examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd

2015-12-28 Thread Zhihong Wang
Handle SIGINT and SIGTERM in l3fwd.

Signed-off-by: Zhihong Wang 
Acked-by: Michael Qiu 
---
 examples/l3fwd/main.c | 129 +-
 1 file changed, 107 insertions(+), 22 deletions(-)

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 5b0c2dd..c766cf5 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -41,6 +41,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 

 #include 
 #include 
@@ -75,6 +78,10 @@
 #include 
 #include 

+static volatile bool port_started;
+static volatile bool force_quit;
+static volatile int signo_quit;
+
 #define APP_LOOKUP_EXACT_MATCH  0
 #define APP_LOOKUP_LPM  1
 #define DO_RFC_1812_CHECKS
@@ -1553,8 +1560,7 @@ main_loop(__attribute__((unused)) void *dummy)
portid, queueid);
}

-   while (1) {
-
+   while (!force_quit) {
cur_tsc = rte_rdtsc();

/*
@@ -2516,8 +2522,12 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
printf("\nChecking link status");
fflush(stdout);
for (count = 0; count <= MAX_CHECK_TIME; count++) {
+   if (force_quit)
+   return;
all_ports_up = 1;
for (portid = 0; portid < port_num; portid++) {
+   if (force_quit)
+   return;
if ((port_mask & (1 << portid)) == 0)
continue;
memset(, 0, sizeof(link));
@@ -2559,6 +2569,76 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
}
 }

+static uint8_t
+start_ports(void)
+{
+   unsigned portid, nb_ports, avail_ports;
+   int ret;
+
+   nb_ports = rte_eth_dev_count();
+   avail_ports = 0;
+   for (portid = 0; portid < nb_ports; portid++) {
+   if ((enabled_port_mask & (1 << portid)) == 0)
+   continue;
+   avail_ports++;
+   port_started = true;
+   printf("Starting port %d...", portid);
+   ret = rte_eth_dev_start(portid);
+   if (ret < 0)
+   rte_exit(EXIT_FAILURE,
+   "rte_eth_dev_start: err=%d, port=%d\n",
+   ret, portid);
+   /*
+* If enabled, put device in promiscuous mode.
+* This allows IO forwarding mode to forward packets
+* to itself through 2 cross-connected  ports of the
+* target machine.
+*/
+   if (promiscuous_on)
+   rte_eth_promiscuous_enable(portid);
+   printf(" Done\n");
+   }
+
+   return avail_ports;
+}
+
+static void
+stop_ports(void)
+{
+   unsigned portid, nb_ports;
+
+   nb_ports = rte_eth_dev_count();
+   for (portid = 0; portid < nb_ports; portid++) {
+   if ((enabled_port_mask & (1 << portid)) == 0)
+   continue;
+   printf("Stopping port %d...", portid);
+   rte_eth_dev_stop(portid);
+   rte_eth_dev_close(portid);
+   printf(" Done\n");
+   }
+   port_started = false;
+}
+
+static void
+signal_handler(int signum)
+{
+   if (signum == SIGINT || signum == SIGTERM) {
+   printf("\nSignal %d received, preparing to exit...\n",
+   signum);
+   if (port_started) {
+   printf("Ports started already...\n");
+   signo_quit = signum;
+   force_quit = true;
+   } else {
+   printf("Ports not started yet...\n");
+   printf("Bye...\n");
+   /* exit with the expected status */
+   signal(signum, SIG_DFL);
+   kill(getpid(), signum);
+   }
+   }
+}
+
 int
 main(int argc, char **argv)
 {
@@ -2571,6 +2651,12 @@ main(int argc, char **argv)
unsigned lcore_id;
uint32_t n_tx_queue, nb_lcores;
uint8_t portid, nb_rx_queue, queue, socketid;
+   uint8_t avail_ports;
+
+   port_started = false;
+   force_quit = false;
+   signal(SIGINT, signal_handler);
+   signal(SIGTERM, signal_handler);

/* init EAL */
ret = rte_eal_init(argc, argv);
@@ -2711,34 +2797,33 @@ main(int argc, char **argv)
printf("\n");

/* start ports */
-   for (portid = 0; portid < nb_ports; portid++) {
-   if ((enabled_port_mask & (1 << portid)) == 0) {
-   continue;
-   }
-   /* Start device */
-   ret = rte_eth_dev_start(portid);
-   if (ret < 0)
-   rte_exit(EXIT_FAILURE, "rte_eth_dev_start: err=%d, 
port=%d\n",
-   ret, 

[dpdk-dev] [PATCH v3 2/3] examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd

2015-12-28 Thread Zhihong Wang
Handle SIGINT and SIGTERM in l2fwd.

Signed-off-by: Zhihong Wang 
Acked-by: Michael Qiu 
---
 examples/l2fwd/main.c | 123 +-
 1 file changed, 101 insertions(+), 22 deletions(-)

diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index 720fd5a..ecd5d2b 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -44,6 +44,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 

 #include 
 #include 
@@ -69,6 +72,10 @@
 #include 
 #include 

+static volatile bool port_started;
+static volatile bool force_quit;
+static volatile int signo_quit;
+
 #define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1

 #define NB_MBUF   8192
@@ -283,8 +290,7 @@ l2fwd_main_loop(void)
portid);
}

-   while (1) {
-
+   while (!force_quit) {
cur_tsc = rte_rdtsc();

/*
@@ -491,8 +497,12 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
printf("\nChecking link status");
fflush(stdout);
for (count = 0; count <= MAX_CHECK_TIME; count++) {
+   if (force_quit)
+   return;
all_ports_up = 1;
for (portid = 0; portid < port_num; portid++) {
+   if (force_quit)
+   return;
if ((port_mask & (1 << portid)) == 0)
continue;
memset(, 0, sizeof(link));
@@ -534,18 +544,85 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
}
 }

+static uint8_t
+start_ports(void)
+{
+   unsigned portid, nb_ports, avail_ports;
+   int ret;
+
+   nb_ports = rte_eth_dev_count();
+   avail_ports = 0;
+   for (portid = 0; portid < nb_ports; portid++) {
+   if ((l2fwd_enabled_port_mask & (1 << portid)) == 0)
+   continue;
+   avail_ports++;
+   port_started = true;
+   printf("Starting port %d...", portid);
+   ret = rte_eth_dev_start(portid);
+   if (ret < 0)
+   rte_exit(EXIT_FAILURE,
+   "rte_eth_dev_start:err=%d, port=%u\n",
+ ret, (unsigned) portid);
+   rte_eth_promiscuous_enable(portid);
+   printf(" Done\n");
+   }
+
+   return avail_ports;
+}
+
+static void
+stop_ports(void)
+{
+   unsigned portid, nb_ports;
+
+   nb_ports = rte_eth_dev_count();
+   for (portid = 0; portid < nb_ports; portid++) {
+   if ((l2fwd_enabled_port_mask & (1 << portid)) == 0)
+   continue;
+   printf("Stopping port %d...", portid);
+   rte_eth_dev_stop(portid);
+   rte_eth_dev_close(portid);
+   printf(" Done\n");
+   }
+   port_started = false;
+}
+
+static void
+signal_handler(int signum)
+{
+   if (signum == SIGINT || signum == SIGTERM) {
+   printf("\nSignal %d received, preparing to exit...\n",
+   signum);
+   if (port_started) {
+   printf("Ports started already...\n");
+   signo_quit = signum;
+   force_quit = true;
+   } else {
+   printf("Ports not started yet...\n");
+   printf("Bye...\n");
+   /* exit with the expected status */
+   signal(signum, SIG_DFL);
+   kill(getpid(), signum);
+   }
+   }
+}
+
 int
 main(int argc, char **argv)
 {
struct lcore_queue_conf *qconf;
struct rte_eth_dev_info dev_info;
int ret;
-   uint8_t nb_ports;
-   uint8_t nb_ports_available;
+   uint8_t nb_ports, avail_ports;
uint8_t portid, last_port;
unsigned lcore_id, rx_lcore_id;
unsigned nb_ports_in_mask = 0;

+   port_started = false;
+   force_quit = false;
+   signal(SIGINT, signal_handler);
+   signal(SIGTERM, signal_handler);
+
/* init EAL */
ret = rte_eal_init(argc, argv);
if (ret < 0)
@@ -627,14 +704,11 @@ main(int argc, char **argv)
printf("Lcore %u: RX port %u\n", rx_lcore_id, (unsigned) 
portid);
}

-   nb_ports_available = nb_ports;
-
/* Initialise each port */
for (portid = 0; portid < nb_ports; portid++) {
/* skip ports that are not enabled */
if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
printf("Skipping disabled port %u\n", (unsigned) 
portid);
-   nb_ports_available--;
continue;
}
/* init port */
@@ -666,16 +740,6 @@ main(int argc, char **argv)
rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, 
port=%u\n",
   

[dpdk-dev] [PATCH v3 1/3] app/test-pmd: Handle SIGINT and SIGTERM in testpmd

2015-12-28 Thread Zhihong Wang
Handle SIGINT and SIGTERM in testpmd.

Signed-off-by: Zhihong Wang 
Acked-by: Michael Qiu 
---
 app/test-pmd/cmdline.c | 20 +---
 app/test-pmd/testpmd.c | 39 +--
 app/test-pmd/testpmd.h |  1 +
 3 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 73298c9..6d28c1b 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -90,6 +90,8 @@

 #include "testpmd.h"

+static struct cmdline *testpmd_cl;
+
 static void cmd_reconfig_device_queue(portid_t id, uint8_t dev, uint8_t queue);

 #ifdef RTE_NIC_BYPASS
@@ -9778,17 +9780,21 @@ cmdline_parse_ctx_t main_ctx[] = {
 void
 prompt(void)
 {
-   struct cmdline *cl;
-
/* initialize non-constant commands */
cmd_set_fwd_mode_init();

-   cl = cmdline_stdin_new(main_ctx, "testpmd> ");
-   if (cl == NULL) {
+   testpmd_cl = cmdline_stdin_new(main_ctx, "testpmd> ");
+   if (testpmd_cl == NULL)
return;
-   }
-   cmdline_interact(cl);
-   cmdline_stdin_exit(cl);
+   cmdline_interact(testpmd_cl);
+   cmdline_stdin_exit(testpmd_cl);
+}
+
+void
+prompt_exit(void)
+{
+   if (testpmd_cl != NULL)
+   cmdline_quit(testpmd_cl);
 }

 static void
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 98ae46d..1319917 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1570,13 +1570,16 @@ pmd_test_exit(void)
if (test_done == 0)
stop_packet_forwarding();

-   FOREACH_PORT(pt_id, ports) {
-   printf("Stopping port %d...", pt_id);
-   fflush(stdout);
-   rte_eth_dev_close(pt_id);
-   printf("done\n");
+   if (ports != NULL) {
+   no_link_check = 1;
+   FOREACH_PORT(pt_id, ports) {
+   printf("\nShutting down port %d...\n", pt_id);
+   fflush(stdout);
+   stop_port(pt_id);
+   close_port(pt_id);
+   }
}
-   printf("bye...\n");
+   printf("\nBye...\n");
 }

 typedef void (*cmd_func_t)(void);
@@ -1984,12 +1987,35 @@ init_port(void)
ports[pid].enabled = 1;
 }

+static void
+force_quit(void)
+{
+   pmd_test_exit();
+   prompt_exit();
+}
+
+static void
+signal_handler(int signum)
+{
+   if (signum == SIGINT || signum == SIGTERM) {
+   printf("\nSignal %d received, preparing to exit...\n",
+   signum);
+   force_quit();
+   /* exit with the expected status */
+   signal(signum, SIG_DFL);
+   kill(getpid(), signum);
+   }
+}
+
 int
 main(int argc, char** argv)
 {
int  diag;
uint8_t port_id;

+   signal(SIGINT, signal_handler);
+   signal(SIGTERM, signal_handler);
+
diag = rte_eal_init(argc, argv);
if (diag < 0)
rte_panic("Cannot init EAL\n");
@@ -2041,6 +2067,7 @@ main(int argc, char** argv)
start_packet_forwarding(0);
printf("Press enter to exit\n");
rc = read(0, , 1);
+   pmd_test_exit();
if (rc < 0)
return 1;
}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index ee7de98..7ffc17b 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -462,6 +462,7 @@ unsigned int parse_item_list(char* str, const char* 
item_name,
unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void prompt(void);
+void prompt_exit(void);
 void nic_stats_display(portid_t port_id);
 void nic_stats_clear(portid_t port_id);
 void nic_xstats_display(portid_t port_id);
-- 
2.5.0



[dpdk-dev] [PATCH v3 0/3] Handle SIGINT and SIGTERM in DPDK examples

2015-12-28 Thread Zhihong Wang
This patch handles SIGINT and SIGTERM in testpmd, l2fwd, and l3fwd, make sure 
all ports are properly stopped and closed.
For virtual ports, the stop and close function may deal with resource cleanup, 
such as socket files unlinking.

--
Changes in v3:

1. Make sure correct port operations regarding status

2. Small fixes to make the code clearer

--
Changes in v2:

1. Make sure graceful exit for all running phases

2. Make sure program exits with the right status

Zhihong Wang (3):
  app/test-pmd: Handle SIGINT and SIGTERM in testpmd
  examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd
  examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd

 app/test-pmd/cmdline.c |  20 +---
 app/test-pmd/testpmd.c |  39 ---
 app/test-pmd/testpmd.h |   1 +
 examples/l2fwd/main.c  | 123 +-
 examples/l3fwd/main.c  | 129 -
 5 files changed, 255 insertions(+), 57 deletions(-)

-- 
2.5.0



[dpdk-dev] [RFC PATCH 0/2] Virtio-net PMD Extension to work on host.

2015-12-28 Thread Tetsuya Mukawa
On 2015/12/28 14:15, Qiu, Michael wrote:
> Hi, Tetsuya
>
> I have a question about your solution, as I know you plan to run qemu
> and dpdk both in container right?

Hi Michael,

Thanks for your comments.

It depends on use cases.
For example, if we want to use vhost-user, we need to invoke 3 processes
(DPDK app that uses virtio-net PMD, QEMU and DPDK app that uses
vhost-user PMD).
These 3 process can run in both container and host.
Some users may want to run 3 processes in different container.
Some users may want to run QEMU and vhost-user PMD prcess in one container.
Anyway, in some cases, DPDK app and QEMU will run in one container like
you said.

>
> If so, I think it's a bit tricky, DPDK is a lib, and qemu is a App,
> seems it is not suitable to let a lib depends on Apps.

If we implement as Jianfeng did, virtio-net PMD will depends on
vhost-user PMD process.
And if there is no vhost-user process, it doesn't work.
In my case, if there is no QEMU process, it doesn't work.

> Also, till now I don't see any usecase to run qemu inside container.

Described above, if you don't want to run QEMU in container, you don't
need to do.
You can run QEMU in host.

Actually, QEMU will handle file descriptor of virtio-net PMD process
memory, so someone may want to isolate it.
In this case, they can run QEMU in container.

Thanks,
Tetsuya

> Thanks,
> Michael
>
> On 11/19/2015 6:58 PM, Tetsuya Mukawa wrote:
>> THIS IS A PoC IMPLEMENATION.
>>
>> [Abstraction]
>>
>> Normally, virtio-net PMD only works on VM, because there is no virtio-net 
>> device on host.
>> This RFC patch extends virtio-net PMD to be able to work on host as virtual 
>> PMD.
>> But we didn't implement virtio-net device as a part of virtio-net PMD.
>> To prepare virtio-net device for the PMD, start QEMU process with special 
>> QTest mode, then connect it from virtio-net PMD through unix domain socket.
>>
>> The PMD can connect to anywhere QEMU virtio-net device can.
>> For example, the PMD can connects to vhost-net kernel module and vhost-user 
>> backend application.
>> Similar to virtio-net PMD on QEMU, application memory that uses virtio-net 
>> PMD will be shared between vhost backend application.
>> But vhost backend application memory will not be shared.
>>
>> Main target of this PMD is container like docker, rkt, lxc and etc.
>> We can isolate related processes(virtio-net PMD process, QEMU and vhost-user 
>> backend process) by container.
>> But, to communicate through unix domain socket, shared directory will be 
>> needed.
>>
>>
>> [How to use]
>>
>> So far, we need QEMU patch to connect to vhost-user backend.
>> Please check known issue in later section.
>> Because of this, I will describe example of using vhost-net kernel module.
>>
>>  - Compile
>>  Set "CONFIG_RTE_VIRTIO_VDEV=y" in config/common_linux.
>>  Then compile it.
>>
>>  - Start QEMU like below.
>>  $ sudo qemu-system-x86_64 -qtest unix:/tmp/qtest0,server -machine 
>> accel=qtest \
>>-display none -qtest-log /dev/null \
>>-netdev 
>> type=tap,script=/etc/qemu-ifup,id=net0,vhost=on \
>>-device virtio-net-pci,netdev=net0 \
>>-chardev socket,id=chr1,path=/tmp/ivshmem0,server 
>> \
>>-device ivshmem,size=1G,chardev=chr1,vectors=1
>>
>>  - Start DPDK application like below
>>  $ sudo ./testpmd -c f -n 1 -m 1024 --shm \
>>   --vdev="eth_cvio0,qtest=/tmp/qtest0,ivshmem=/tmp/ivshmem0" 
>> -- \
>>   --disable-hw-vlan --txqflags=0xf00 -i
>>
>>  - Check created tap device.
>>
>> (*1) Please Specify same memory size in QEMU and DPDK command line.
>>
>>
>> [Detailed Description]
>>
>>  - virtio-net device implementation
>> The PMD uses QEMU virtio-net device. To do that, QEMU QTest functionality is 
>> used.
>> QTest is a test framework of QEMU devices. It allows us to implement a 
>> device driver outside of QEMU.
>> With QTest, we can implement DPDK application and virtio-net PMD as 
>> standalone process on host.
>> When QEMU is invoked as QTest mode, any guest code will not run.
>> To know more about QTest, see below.
>> http://wiki.qemu.org/Features/QTest
>>
>>  - probing devices
>> QTest provides a unix domain socket. Through this socket, driver process can 
>> access to I/O port and memory of QEMU virtual machine.
>> The PMD will send I/O port accesses to probe pci devices.
>> If we can find virtio-net and ivshmem device, initialize the devices.
>> Also, I/O port accesses of virtio-net PMD will be sent through socket, and 
>> virtio-net PMD can initialize vitio-net device on QEMU correctly.
>>
>>  - ivshmem device to share memory
>> To share memory that virtio-net PMD process uses, ivshmem device will be 
>> used.
>> Because ivshmem device can only handle one file descriptor, shared memory 
>> should be consist of one file.
>> To allocate such a memory, EAL has new option called "--shm".
>> If the option is 

[dpdk-dev] [PATCH v1 0/2] Virtio-net PMD Extension to work on host

2015-12-28 Thread Tetsuya Mukawa
On 2015/12/24 23:05, Tan, Jianfeng wrote:
> Hi Tetsuya,
>
> After several days' studying your patch, I have some questions as follows:
>
> 1. Is physically-contig memory really necessary?
> This is a too strong requirement IMHO. IVSHMEM doesn't require this in its 
> original meaning. So how do you think of
> Huawei Xie's idea of using virtual address for address translation? (In 
> addition, virtual address of mem_table could be
> different in application and QTest, but this can be addressed because 
> SET_MEM_TABLE msg will be intercepted by
> QTest)

Hi Jianfeng,

Thanks for your suggestion.
Huawei's idea may solve contig-mem restriction.
Let me have time to check it more.

> 2. Is root privilege OK in container's case?
> Another reason we'd like to give up physically-contig feature is that it 
> needs root privilege to read /proc/self/pagemap
> file. Container has already been widely criticized for bad security 
> isolation. Enabling root privilege will make it worse.

I haven't checked how to invoke DPDK application in non-privileged
container.
But if we can invoke it, it's great.

I guess if we allocate memory like you did, probably we will not read
"/proc/self/pagemap".
Then we will be able to invoke DPDK application in non-privileged container.
Is this correct?

> On the other hand, it's not easy to remove root privilege too. If we use 
> vhost-net as the backend, kernel will definitely
> require root privilege to create a tap device/raw socket. We tend to pick 
> such work, which requires root, into runtime
> preparation of a container. Do you agree?

Yes, I agree. It's not easy to remove root privilege in some cases.
I guess if we can remove it in vhost-user case, it will be enough for
DPDK users.
What do you think?

> 3.Is one Qtest process per virtio device too heavy?
> Although we can foresee that each container always owns only one virtio 
> device, but take its possible high density
> into consideration, hundreds or even thousands of container requires the same 
> number of QTest processes. As
> you mentioned that port hotplug is supported, is it possible to use just one 
> QTest process for all virtio devices
> emulation?

Yes, we can use pci hotplug for that purpose.
But it may depends on security policy.
The shared QEMU process knows all file descriptors of DPDK application
memories.
Because of this, I guess some users don't want to share QEMU process.

If the vhost-user is used, QEMU process doesn't use CPU resource.
So, I am not sure sleeping QEMU process will be overhead.

BTW, If we use pci hotplug, we need to use (virtual) pci bridge to
cascade pci devices.
So implementation will be more complex.
Honestly, I am not sure I will be able to finish it by next DPDK release.
How about starting from this implementation?
If we really need this feature, then add it.

> As you know, we have another solution according to this (which under heavy 
> internal review). But I think we have lots
> of common problems to be solved, right?

Yes, I think so. And thanks for good suggestion.

Tetsuya,

> Thanks for your great work!
>
> Thanks,
> Jianfeng
>
>> -Original Message-
>> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
>> Sent: Wednesday, December 16, 2015 4:37 PM
>> To: dev at dpdk.org
>> Cc: nakajima.yoshihiro at lab.ntt.co.jp; Tan, Jianfeng; Xie, Huawei;
>> mst at redhat.com; marcandre.lureau at gmail.com; Tetsuya Mukawa
>> Subject: [PATCH v1 0/2] Virtio-net PMD Extension to work on host
>>
>> [Change log]
>>
>> PATCH v1:
>> (Just listing functionality changes and important bug fix)
>> * Support virtio-net interrupt handling.
>>   (It means virtio-net PMD on host and guest have same virtio-net features)
>> * Fix memory allocation method to allocate contiguous memory correctly.
>> * Port Hotplug is supported.
>> * Rebase on DPDK-2.2.
>>
>>
>> [Abstraction]
>>
>> Normally, virtio-net PMD only works on VM, because there is no virtio-net
>> device on host.
>> This RFC patch extends virtio-net PMD to be able to work on host as virtual
>> PMD.
>> But we didn't implement virtio-net device as a part of virtio-net PMD.
>> To prepare virtio-net device for the PMD, start QEMU process with special
>> QTest mode, then connect it from virtio-net PMD through unix domain
>> socket.
>>
>> The virtio-net PMD on host is fully compatible with the PMD on guest.
>> We can use same functionalities, and connect to anywhere QEMU virtio-net
>> device can.
>> For example, the PMD can use virtio-net multi queues function. Also it can
>> connects to vhost-net kernel module and vhost-user backend application.
>> Similar to virtio-net PMD on QEMU, application memory that uses virtio-net
>> PMD will be shared between vhost backend application. But vhost backend
>> application memory will not be shared.
>>
>> Main target of this PMD is container like docker, rkt, lxc and etc.
>> We can isolate related processes(virtio-net PMD process, QEMU and vhost-
>> user backend process) by container.
>> But, to communicate 

[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-28 Thread Rich Lane
On Wed, Dec 23, 2015 at 11:58 PM, Tetsuya Mukawa  wrote:
>
> Hi Rich and Yuanhan,
>
> I guess we have 2 implementations here.
>
> 1. rte_eth_vhost_get_queue_event() returns each event.
> 2. rte_eth_vhost_get_queue_status() returns current status of the queues.
>
> I guess option "2" is more generic manner to handle interrupts from
> device driver.
> In the case of option "1", if DPDK application doesn't call
> rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
> This may exhaust memory.
>

Option 1 can be implemented in constant space by only tracking the latest
state of each
queue. I pushed a rough implementation to https://github.com/rlane/dpdk
vhost-queue-callback.

One more example is current link status interrupt handling.
> Actually ethdev API just returns current status of the port.
> What do you think?
>

Option 2 adds a small burden to the application but I'm fine with this as
long as it's
thread-safe (see my comments below).


> > An issue with having the application dig into struct virtio_net is that
> it
> > can only be safely accessed from
> > a callback on the vhost thread.
>
> Here is one of example how to invoke a callback handler registered by
> DPDK application from the PMD.
>
>   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);
>
> Above function is called by interrupt handling thread of the PMDs.
>
> Please check implementation of above function.
> A callback handler that DPDK application registers is called in
> "interrupt handling context".
> (I mean the interrupt handling thread of the PMD calls the callback
> handler of DPDK application also.)
> Anyway, I guess the callback handler of DPDK application can access to
> "struct virtio_net" safely.


> > A typical application running its control
> > plane on lcore 0 would need to
> > copy all the relevant info from struct virtio_net before sending it over.
>
> Could you please describe it more?
> Sorry, probably I don't understand correctly which restriction make you
> copy data.
> (As described above, the callback handler registered by DPDK application
> can safely access "to struct virtio_net". Does this solve the copy issue?)
>

The ethdev event callback can safely access struct virtio_net, yes. The
problem is
that a real application will likely want to handle queue state changes as
part
of its main event loop running on a separate thread. Because no other thread
can safely access struct virtio_net. the callback would need to copy the
queue
states out of struct virtio_net into another datastructure before sending
it to
the main thread.

Instead of having the callback muck around with struct virtio_net, I would
prefer
an API that I could call from any thread to get the current queue states.
This
also removes struct virtio_net from the PMD's API which I think is a win.


> > As you mentioned, queues for a single vhost port could be located on
> > different NUMA nodes. I think this
> > is an uncommon scenario but if needed you could add an API to retrieve
> the
> > NUMA node for a given port
> > and queue.
> >
>
> I agree this is very specific for vhost, because in the case of generic
> PCI device, all queues of a port are on same NUMA node.
> Anyway, because it's very specific for vhost, I am not sure we should
> add ethdev API to handle this.
>
> If we handle it by vhost PMD API, we probably have 2 options also here.
>
> 1. Extend "struct rte_eth_vhost_queue_event , and use
> rte_eth_vhost_get_queue_event() like you described.
> struct rte_eth_vhost_queue_event
> {
> uint16_t queue_id;
> bool rx;
> bool enable;
> +  int socket_id;
> };
>
> 2. rte_eth_vhost_get_queue_status() returns current socket_ids of all
> queues.
>

Up to you, but I think we can skip this for the time being because it would
be unusual
for a guest to place virtqueues for one PCI device on different NUMA nodes.


[dpdk-dev] [PATCH 4/4] virtio: check if any kernel driver is manipulating the device

2015-12-28 Thread Yuanhan Liu
On Fri, Dec 25, 2015 at 02:38:12AM +0800, Huawei Xie wrote:
> virtio PMD could use IO port to configure the virtio device without
> using uio driver.
> 
> There are two issues with previous implementation:
> 1) virtio PMD will take over each virtio device blindly even if some
> are not intended for DPDK.
> 2) driver conflict between virtio PMD and virtio-net kernel driver.
> 
> This patch checks if there is any kernel driver manipulating the virtio
> device before virtio PMD uses IO port to configure the device.
> 
> Fixes: da978dfdc43b ("virtio: use port IO to get PCI resource")
> 
> Signed-off-by: Huawei Xie 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index 00015ef..504346a 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1138,6 +1138,13 @@ static int virtio_resource_init_by_ioports(struct 
> rte_pci_device *pci_dev)
>   int found = 0;
>   size_t linesz;
>  
> + if (pci_dev->kdrv != RTE_KDRV_NONE) {
> + PMD_INIT_LOG(ERR,
> + "%s(): kernel driver is manipulating this device." \
> + " Please unbind the kernel driver.", __func__);

PMD_INIT_LOG already prints the function, no need to reference __func__
again here.

--yliu
> + return -1;
> + }
> +
>   snprintf(pci_id, sizeof(pci_id), PCI_PRI_FMT,
>pci_dev->addr.domain,
>pci_dev->addr.bus,
> -- 
> 1.8.1.4


[dpdk-dev] [PATCH] mk: fix examples build failure

2015-12-28 Thread Yuanhan Liu
On Thu, Dec 24, 2015 at 08:38:07PM +0800, steeven lee wrote:
> 1. Fix examples build failure

Paste the build error here, so that we know you are acutally fixing a
build error. And if it's a build error specific to some GCC, or Linux
distribution, please also note it in the commit log.

> 2. make build as default output folder name

No, please do not do two things in one patch.

--yliu
> 
> Signed-off-by: steeven 
> ---
>  mk/internal/rte.extvars.mk | 4 ++--
>  mk/rte.extsubdir.mk| 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mk/internal/rte.extvars.mk b/mk/internal/rte.extvars.mk
> index 040d39f..cabef0a 100644
> --- a/mk/internal/rte.extvars.mk
> +++ b/mk/internal/rte.extvars.mk
> @@ -52,9 +52,9 @@ RTE_EXTMK ?= $(RTE_SRCDIR)/Makefile
>  export RTE_EXTMK
> 
>  # RTE_SDK_BIN must point to .config, include/ and lib/.
> -RTE_SDK_BIN := $(RTE_SDK)/$(RTE_TARGET)
> +RTE_SDK_BIN := $(RTE_SDK)/build
>  ifeq ($(wildcard $(RTE_SDK_BIN)/.config),)
> -$(error Cannot find .config in $(RTE_SDK))
> +$(error Cannot find .config in $(RTE_SDK_BIN))
>  endif
> 
>  #
> diff --git a/mk/rte.extsubdir.mk b/mk/rte.extsubdir.mk
> index f50f006..819020a 100644
> --- a/mk/rte.extsubdir.mk
> +++ b/mk/rte.extsubdir.mk
> @@ -46,7 +46,7 @@ $(DIRS-y):
> @echo "== $@"
> $(Q)$(MAKE) -C $(@) \
> M=$(CURDIR)/$(@)/Makefile \
> -   O=$(BASE_OUTPUT)/$(CUR_SUBDIR)/$(@)/$(RTE_TARGET) \
> +   O=$(BASE_OUTPUT)/$(CUR_SUBDIR)/build \
> BASE_OUTPUT=$(BASE_OUTPUT) \
> CUR_SUBDIR=$(CUR_SUBDIR)/$(@) \
> S=$(CURDIR)/$(@) \
> -- 
> 1.9.1


[dpdk-dev] [PATCH 0/4] check if any kernel driver is manipulating the virtio device

2015-12-28 Thread Peter Xu
On Fri, Dec 25, 2015 at 02:38:08AM +0800, Huawei Xie wrote:
> virtio PMD doesn't set RTE_PCI_DRV_NEED_MAPPING in drv_flags of its
> eth_driver. It will try igb_uio and PORT IO in turn to configure
> virtio device. Even user in guest VM doesn't want to use virtio for
> DPDK, virtio PMD will take over the device blindly.
> 
> The more serious problem is kernel driver is still manipulating the
> device, which causes driver conflict.
> 
> This patch checks if there is any kernel driver manipulating the
> virtio device before virtio PMD uses port IO to configure the device.
> 
> Huawei Xie (4):
>   eal: make the comment more accurate
>   eal: set kdrv to RTE_KDRV_NONE if kernel driver isn't manipulating the 
> device.
>   virtio: return 1 to tell the kernel we don't take over this device
>   virtio: check if any kernel driver is manipulating the virtio device

Thanks for the patch. Looks good to me.

Peter

> 
>  drivers/net/virtio/virtio_ethdev.c | 15 +--
>  lib/librte_eal/common/eal_common_pci.c |  8 
>  lib/librte_eal/linuxapp/eal/eal_pci.c  |  2 +-
>  3 files changed, 18 insertions(+), 7 deletions(-)
> 
> -- 
> 1.8.1.4
> 


[dpdk-dev] [PATCH v2 0/3] Handle SIGINT and SIGTERM in DPDK examples

2015-12-28 Thread Wang, Zhihong


> -Original Message-
> From: Qiu, Michael
> Sent: Monday, December 28, 2015 12:18 PM
> To: Wang, Zhihong ; dev at dpdk.org
> Cc: Ananyev, Konstantin ;
> stephen at networkplumber.org
> Subject: Re: [PATCH v2 0/3] Handle SIGINT and SIGTERM in DPDK examples
> 
> On 2015/12/25 17:40, Wang, Zhihong wrote:
> > This patch handles SIGINT and SIGTERM in testpmd, l2fwd, and l3fwd, make
> sure all ports are properly stopped and closed.
> > For virtual ports, the stop and close function may deal with resource 
> > cleanup,
> such as socket files unlinking.
> >
> > --
> > Changes in v2:
> >
> > 1. Make sure graceful exit for all running phases
> >
> > 2. Make sure program exits with the right status
> >
> > Zhihong Wang (3):
> >   app/test-pmd: Handle SIGINT and SIGTERM in testpmd
> >   examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd
> >   examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd
> >
> >  app/test-pmd/cmdline.c |  19 ++---
> >  app/test-pmd/testpmd.c |  38 ++---
> >  app/test-pmd/testpmd.h |   1 +
> >  examples/l2fwd/main.c  |  60 +++
> >  examples/l3fwd/main.c  | 110
> -
> >  5 files changed, 196 insertions(+), 32 deletions(-)
> >
> 
> Next time, you'd better not to top post for V2 :)

Gotcha :)

> 
> Acked-by: Michael Qiu 


[dpdk-dev] [PATCH] remove redundant __func__ in PMD_INIT_LOG and PMD_RX_LOG

2015-12-28 Thread Stephen Hemminger
On Mon, 28 Dec 2015 01:28:28 +0800
Huawei Xie  wrote:

> 
> Signed-off-by: Huawei Xie 
> ---
>  drivers/net/virtio/virtio_ethdev.c   | 12 +---
>  drivers/net/vmxnet3/vmxnet3_ethdev.c |  6 +++---
>  drivers/net/vmxnet3/vmxnet3_rxtx.c   |  2 +-
>  3 files changed, 9 insertions(+), 11 deletions(-)
> 

Looks good, probably worth auditing the code for extra newlines as well at some 
point.

Acked-by: Stephen Hemminger 


[dpdk-dev] [RFC PATCH 0/6] General tunneling APIs

2015-12-28 Thread Liu, Jijiang
Hi Miroslaw,

The partial answer is below.

> -Original Message-
> From: Walukiewicz, Miroslaw
> Sent: Wednesday, December 23, 2015 7:18 PM
> To: Liu, Jijiang; dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
> 
> Hi Jijang,
> 
> I like an idea of tunnel API very much.
> 
> I have a few questions.
> 
> 1. I see that you have only i40e support due to lack of HW tunneling support
> in other NICs.
> I don't see a way how do you want to handle tunneling requests for NICs
> without HW offload.

The flow director offload mechanism is used here, flow director is a common 
feature in current NICs.
Here I don't use special related tunneling HW offload features, the goal is 
that we want to support  all of NICs.

> I think that we should have one common function for sending tunneled
> packets but the initialization should check the NIC capabilities and call some
> registered function making tunneling in SW in case of lack of HW support.
Yes, we should check NIC capabilities.

> I know that making tunnel is very time consuming process, but it makes an
> API more generic. Similar only 3 protocols are supported by i40e by HW and
> we can imagine about 40 or more different tunnels working with this NIC.
> 
> Making the SW implementation we could support missing tunnels even for
> i40e.

In this patch set, I just use VXLAN protocol to demonstrate the framework, 
If the framework is accepted, other tunneling protocol will be added one by one 
in future. 

> 2. I understand that we need RX HW queue defined in struct
> rte_eth_tunnel_conf but why tx_queue is necessary?.
>   As I know i40e HW we can set tunneled packet descriptors in any HW queue
> and receive only on one specific queue.

As for adding tx_queue here, I have already explained here at [1]

[1] http://dpdk.org/ml/archives/dev/2015-December/030509.html

Do you think it makes sense?

> 4. In your implementation you are assuming the there is one tunnel
> configured per DPDK interface
> 
> rte_eth_dev_tunnel_configure(uint8_t port_id,
> +  struct rte_eth_tunnel_conf *tunnel_conf)
> 
No, in terms of i40e,  there will  be up to 8K tunnels  in one DPDK interface,
It depends on number of flow rules on a pair of queues.

struct rte_eth_tunnel_conf {
uint16_t rx_queue;
uint16_t tx_queue;
uint16_t udp_tunnel_port;
uint16_t nb_flow;
uint16_t filter_type;
struct rte_eth_tunnel_flow *tunnel_flow;
};

If the ' nb_flow ' is set 2000, and you can configure 2000 flow rules on one 
queues on a port.

> The sense of tunnel is lack of interfaces in the system because number of
> possible VLANs is too small (4095).
> In the DPDK we have only one tunnel per physical port what is useless even
> with such big acceleration provided with i40e.

> In normal use cases there is a need for 10,000s of tunnels per interface. Even
> for Vxlan we have 24 bits for tunnel definition


We use flow director HW offload here, in terms of i40e, it support up to 8K 
flow rules of exact match.
This is HW limitation, 10,000s of tunnels per interface is not supported by HW.


> 5. I see that you have implementations for VXLAN,TEREDO, and GENEVE
> tunnels in i40e drivers. I could  find the implementation for VXLAN
> encap/decap. Are all files in the patch present?
No, I have not finished all of codes, just VXLAN here.
Other tunneling protocol will be added one by one in future.

> Regards,
> 
> Mirek
> 
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jijiang Liu
> > Sent: Wednesday, December 23, 2015 9:50 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
> >
> > I want to define a set of General tunneling APIs, which are used to
> > accelarate tunneling packet processing in DPDK, In this RFC patch set,
> > I wll explain my idea using some codes.
> >
> > 1. Using flow director offload to define a tunnel flow in a pair of queues.
> >
> > flow rule: src IP + dst IP + src port + dst port + tunnel ID (for
> > VXLAN)
> >
> > For example:
> > struct rte_eth_tunnel_conf{
> > .tunnel_type = VXLAN,
> > .rx_queue = 1,
> > .tx_queue = 1,
> > .filter_type = 'src ip + dst ip + src port + dst port + tunnel id'
> > .flow_tnl {
> > .tunnel_type = VXLAN,
> > .tunnel_id = 100,
> > .remote_mac = 11.22.33.44.55.66,
> >  .ip_type = ipv4,
> >  .outer_ipv4.src_ip = 192.168.10.1
> >  .outer_ipv4.dst_ip = 10.239.129.11
> >  .src_port = 1000,
> >  .dst_port =2000
> > };
> >
> > 2. Configure tunnel flow for a device and for a pair of queues.
> >
> > rte_eth_dev_tunnel_configure(0, _eth_tunnel_conf);
> >
> > In this API, it will call RX decapsulation and TX encapsulation
> > callback function if HW doesn't support encap/decap, and a space will
> > be allocated for tunnel configuration and store a pointer to this new
> > allocated space 

[dpdk-dev] [PATCH 3/4] virtio: return 1 to tell the upper layer we don't take over this device

2015-12-28 Thread Xie, Huawei


> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Monday, December 28, 2015 1:25 PM
> To: Xie, Huawei
> Cc: dev at dpdk.org; Jayakumar, Muthurajan; Troitsky, Nikita;
> peterx at redhat.com; stephen at networkplumber.org;
> Changchun.ouyang at hotmail.com; thomas.monjalon at 6wind.com
> Subject: Re: [PATCH 3/4] virtio: return 1 to tell the upper layer we
> don't take over this device
> 
> On Fri, Dec 25, 2015 at 02:38:11AM +0800, Huawei Xie wrote:
> > if virtio_resource_init fails, cleanup the resource and return 1 to
> > tell the upper layer we don't take over this device.
> > return -1 means error and DPDK will exit.
> >
> > Signed-off-by: Huawei Xie 
> > ---
> >  drivers/net/virtio/virtio_ethdev.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_ethdev.c
> b/drivers/net/virtio/virtio_ethdev.c
> > index d928339..00015ef 100644
> > --- a/drivers/net/virtio/virtio_ethdev.c
> > +++ b/drivers/net/virtio/virtio_ethdev.c
> > @@ -1287,8 +1287,12 @@ eth_virtio_dev_init(struct rte_eth_dev
> *eth_dev)
> >
> > pci_dev = eth_dev->pci_dev;
> >
> > -   if (virtio_resource_init(pci_dev) < 0)
> > -   return -1;
> > +   /* Return 1 to tell the upper layer we don't take over this
> device. */
> > +   if (virtio_resource_init(pci_dev) < 0) {
> > +   rte_free(eth_dev->data->mac_addrs);
> > +   eth_dev->data->mac_addrs = NULL;
> 
> This assignment looks unnecessary to me.
> 

Noted this when write this code. It is ok not to reset as this layer is 
responsible for all the allocation/free.


> 
> And, I think above comment is better to put here, right above the
> return
> statement.
> 
> > +   return 1;
> > +   }
> 
>   --yliu


[dpdk-dev] [PATCH 4/4] virtio: check if any kernel driver is manipulating the device

2015-12-28 Thread Xie, Huawei


> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Monday, December 28, 2015 1:27 PM
> To: Xie, Huawei
> Cc: dev at dpdk.org; Jayakumar, Muthurajan; Troitsky, Nikita;
> peterx at redhat.com; stephen at networkplumber.org;
> Changchun.ouyang at hotmail.com; thomas.monjalon at 6wind.com
> Subject: Re: [PATCH 4/4] virtio: check if any kernel driver is
> manipulating the device
> 
> On Fri, Dec 25, 2015 at 02:38:12AM +0800, Huawei Xie wrote:
> > virtio PMD could use IO port to configure the virtio device without
> > using uio driver.
> >
> > There are two issues with previous implementation:
> > 1) virtio PMD will take over each virtio device blindly even if some
> > are not intended for DPDK.
> > 2) driver conflict between virtio PMD and virtio-net kernel driver.
> >
> > This patch checks if there is any kernel driver manipulating the
> virtio
> > device before virtio PMD uses IO port to configure the device.
> >
> > Fixes: da978dfdc43b ("virtio: use port IO to get PCI resource")
> >
> > Signed-off-by: Huawei Xie 
> > ---
> >  drivers/net/virtio/virtio_ethdev.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/net/virtio/virtio_ethdev.c
> b/drivers/net/virtio/virtio_ethdev.c
> > index 00015ef..504346a 100644
> > --- a/drivers/net/virtio/virtio_ethdev.c
> > +++ b/drivers/net/virtio/virtio_ethdev.c
> > @@ -1138,6 +1138,13 @@ static int
> virtio_resource_init_by_ioports(struct rte_pci_device *pci_dev)
> > int found = 0;
> > size_t linesz;
> >
> > +   if (pci_dev->kdrv != RTE_KDRV_NONE) {
> > +   PMD_INIT_LOG(ERR,
> > +   "%s(): kernel driver is manipulating this device." \
> > +   " Please unbind the kernel driver.", __func__);
> 
> PMD_INIT_LOG already prints the function, no need to reference
> __func__
> again here.

Good. We have to fix the similar issue in virtio_resource_init_xx as well.

> 
>   --yliu
> > +   return -1;
> > +   }
> > +
> > snprintf(pci_id, sizeof(pci_id), PCI_PRI_FMT,
> >  pci_dev->addr.domain,
> >  pci_dev->addr.bus,
> > --
> > 1.8.1.4


[dpdk-dev] [RFC PATCH 0/2] Virtio-net PMD Extension to work on host.

2015-12-28 Thread Qiu, Michael
Hi, Tetsuya

I have a question about your solution, as I know you plan to run qemu
and dpdk both in container right?

If so, I think it's a bit tricky, DPDK is a lib, and qemu is a App,
seems it is not suitable to let a lib depends on Apps.

Also, till now I don't see any usecase to run qemu inside container.

Thanks,
Michael

On 11/19/2015 6:58 PM, Tetsuya Mukawa wrote:
> THIS IS A PoC IMPLEMENATION.
>
> [Abstraction]
>
> Normally, virtio-net PMD only works on VM, because there is no virtio-net 
> device on host.
> This RFC patch extends virtio-net PMD to be able to work on host as virtual 
> PMD.
> But we didn't implement virtio-net device as a part of virtio-net PMD.
> To prepare virtio-net device for the PMD, start QEMU process with special 
> QTest mode, then connect it from virtio-net PMD through unix domain socket.
>
> The PMD can connect to anywhere QEMU virtio-net device can.
> For example, the PMD can connects to vhost-net kernel module and vhost-user 
> backend application.
> Similar to virtio-net PMD on QEMU, application memory that uses virtio-net 
> PMD will be shared between vhost backend application.
> But vhost backend application memory will not be shared.
>
> Main target of this PMD is container like docker, rkt, lxc and etc.
> We can isolate related processes(virtio-net PMD process, QEMU and vhost-user 
> backend process) by container.
> But, to communicate through unix domain socket, shared directory will be 
> needed.
>
>
> [How to use]
>
> So far, we need QEMU patch to connect to vhost-user backend.
> Please check known issue in later section.
> Because of this, I will describe example of using vhost-net kernel module.
>
>  - Compile
>  Set "CONFIG_RTE_VIRTIO_VDEV=y" in config/common_linux.
>  Then compile it.
>
>  - Start QEMU like below.
>  $ sudo qemu-system-x86_64 -qtest unix:/tmp/qtest0,server -machine 
> accel=qtest \
>-display none -qtest-log /dev/null \
>-netdev 
> type=tap,script=/etc/qemu-ifup,id=net0,vhost=on \
>-device virtio-net-pci,netdev=net0 \
>-chardev socket,id=chr1,path=/tmp/ivshmem0,server \
>-device ivshmem,size=1G,chardev=chr1,vectors=1
>
>  - Start DPDK application like below
>  $ sudo ./testpmd -c f -n 1 -m 1024 --shm \
>   --vdev="eth_cvio0,qtest=/tmp/qtest0,ivshmem=/tmp/ivshmem0" 
> -- \
>   --disable-hw-vlan --txqflags=0xf00 -i
>
>  - Check created tap device.
>
> (*1) Please Specify same memory size in QEMU and DPDK command line.
>
>
> [Detailed Description]
>
>  - virtio-net device implementation
> The PMD uses QEMU virtio-net device. To do that, QEMU QTest functionality is 
> used.
> QTest is a test framework of QEMU devices. It allows us to implement a device 
> driver outside of QEMU.
> With QTest, we can implement DPDK application and virtio-net PMD as 
> standalone process on host.
> When QEMU is invoked as QTest mode, any guest code will not run.
> To know more about QTest, see below.
> http://wiki.qemu.org/Features/QTest
>
>  - probing devices
> QTest provides a unix domain socket. Through this socket, driver process can 
> access to I/O port and memory of QEMU virtual machine.
> The PMD will send I/O port accesses to probe pci devices.
> If we can find virtio-net and ivshmem device, initialize the devices.
> Also, I/O port accesses of virtio-net PMD will be sent through socket, and 
> virtio-net PMD can initialize vitio-net device on QEMU correctly.
>
>  - ivshmem device to share memory
> To share memory that virtio-net PMD process uses, ivshmem device will be used.
> Because ivshmem device can only handle one file descriptor, shared memory 
> should be consist of one file.
> To allocate such a memory, EAL has new option called "--shm".
> If the option is specified, EAL will open a file and allocate memory from 
> hugepages.
> While initializing ivshmem device, we can set BAR(Base Address Register).
> It represents which memory QEMU vcpu can access to this shared memory.
> We will specify host physical address of shared memory as this address.
> It is very useful because we don't need to apply patch to QEMU to calculate 
> address offset.
> (For example, if virtio-net PMD process will allocate memory from shared 
> memory, then specify the physical address of it to virtio-net register, QEMU 
> virtio-net device can understand it without calculating address offset.)
>
>  - Known limitation
> So far, the PMD doesn't handle interrupts from QEMU devices.
> Because of this, VIRTIO_NET_F_STATUS functionality is dropped.
> But without it, we can use all virtio-net functions.
>
>  - Known issues
> So far, to use vhost-user, we need to apply vhost-user patch to QEMU and DPDK 
> vhost library.
> This is because, QEMU will not send memory information and file descriptor of 
> ivshmem device to vhost-user backend.
> (Anyway, vhost-net kernel module can receive the information. So 

[dpdk-dev] [PATCH] mk: fix examples build failure

2015-12-28 Thread Qiu, Michael
On 12/24/2015 8:38 PM, steeven lee wrote:
> 1. Fix examples build failure
> 2. make build as default output folder name
>
> Signed-off-by: steeven 
> ---
>  mk/internal/rte.extvars.mk | 4 ++--
>  mk/rte.extsubdir.mk| 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mk/internal/rte.extvars.mk b/mk/internal/rte.extvars.mk
> index 040d39f..cabef0a 100644
> --- a/mk/internal/rte.extvars.mk
> +++ b/mk/internal/rte.extvars.mk
> @@ -52,9 +52,9 @@ RTE_EXTMK ?= $(RTE_SRCDIR)/Makefile
>  export RTE_EXTMK
>
>  # RTE_SDK_BIN must point to .config, include/ and lib/.
> -RTE_SDK_BIN := $(RTE_SDK)/$(RTE_TARGET)
> +RTE_SDK_BIN := $(RTE_SDK)/build
>  ifeq ($(wildcard $(RTE_SDK_BIN)/.config),)
> -$(error Cannot find .config in $(RTE_SDK))
> +$(error Cannot find .config in $(RTE_SDK_BIN))
>  endif
>
>  #
> diff --git a/mk/rte.extsubdir.mk b/mk/rte.extsubdir.mk
> index f50f006..819020a 100644
> --- a/mk/rte.extsubdir.mk
> +++ b/mk/rte.extsubdir.mk
> @@ -46,7 +46,7 @@ $(DIRS-y):
> @echo "== $@"
> $(Q)$(MAKE) -C $(@) \
> M=$(CURDIR)/$(@)/Makefile \
> -   O=$(BASE_OUTPUT)/$(CUR_SUBDIR)/$(@)/$(RTE_TARGET) \
> +   O=$(BASE_OUTPUT)/$(CUR_SUBDIR)/build \
> BASE_OUTPUT=$(BASE_OUTPUT) \
> CUR_SUBDIR=$(CUR_SUBDIR)/$(@) \
> S=$(CURDIR)/$(@) \

Could you show your compile error log? And how to reproduce it?

Thanks,
Michael


[dpdk-dev] [RFC PATCH 0/6] General tunneling APIs

2015-12-28 Thread Liu, Jijiang


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Thursday, December 24, 2015 2:31 AM
> To: Liu, Jijiang
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
> 
> On Wed, 23 Dec 2015 16:49:46 +0800
> Jijiang Liu  wrote:
> 
> > 1)at config phase
> >
> > dev_config(port, ...);
> > tunnel_config(port,...);
> > ...
> > dev_start(port);
> > ...
> > rx_burst(port, rxq,... );
> > tx_burst(port, txq,...);
> 
> What about dynamically adding and deleting multiple tunnels after device
> has started? This would be the more common case in a real world
> environment.
Yes, this makes sense, we will support.


[dpdk-dev] [PATCH v2 1/3] app/test-pmd: Handle SIGINT and SIGTERM in testpmd

2015-12-28 Thread Wang, Zhihong
> > -   cl = cmdline_stdin_new(main_ctx, "testpmd> ");
> > -   if (cl == NULL) {
> > +   testpmd_cl = cmdline_stdin_new(main_ctx, "testpmd> ");
> > +   if (testpmd_cl == NULL) {
> > return;
> > }
> 
> Style nit: don't need {} around single statement.
> 
> > +static void
> > +sigint_handler(__rte_unused int signum) {
> > +   if (signum == SIGINT || signum == SIGTERM) {
> 
> signmum is used, so don't want __rte_unused
> 

Thanks :) Will fix these in the next version.



[dpdk-dev] [PATCH v2 2/3] examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd

2015-12-28 Thread Wang, Zhihong
Hi Stephen,

Really appreciate the detailed review!
Please see comments below.


> > +static int force_quit = -1;
> > +static int signo_quit = -1;
> 
> These need to be volatile otherwise you risk compiler optimizing away your
> checks.

Yes. Don't wanna take chances here.

> 
> Also, don't use -1/0 just use 0/1 for boolean or better yet the definition in
>  of bool and true/false.
> That way the code can read much nicer.

-1 when forwarding not started yet.
Can add a "static bool fwd_started;" to represent this to make it clearer.

> 
> >  #define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1
> >
> >  #define NB_MBUF   8192
> > @@ -284,6 +289,8 @@ l2fwd_main_loop(void)
> > }
> >
> > while (1) {
> > +   if (unlikely(force_quit != 0))
> > +   break;
> 
> Please maske this a proper while loop instead.

Exactly.

> 
> while (!force_quit) {
> 
> >
> > cur_tsc = rte_rdtsc();
> >
> > @@ -534,6 +541,45 @@ check_all_ports_link_status(uint8_t port_num,
> uint32_t port_mask)
> > }
> >  }
> >
> > +static void
> > +stop_ports(void)
> > +{
> > +   unsigned portid, nb_ports;
> > +
> > +   nb_ports = rte_eth_dev_count();
> > +   for (portid = 0; portid < nb_ports; portid++) {
> > +   if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
> > +   continue;
> > +   }
> 
> No need for {} here.
> 
> > +   printf("Stopping port %d...", portid);
> > +   rte_eth_dev_stop(portid);
> > +   rte_eth_dev_close(portid);
> > +   printf(" Done\n");
> > +   }
> > +}
> > +
> > +static void
> > +signal_handler(__rte_unused int signum) {
> > +   if (signum == SIGINT || signum == SIGTERM) {
> 
> signum is used, dont give __rte_unused attribute.
> 
> >
> > /* launch per-lcore init on every lcore */
> > +   force_quit = 0;
> 
> What is gained by having tri-value here. Just initialize it as false.

As stated above.

> 
> 
> > rte_eal_mp_remote_launch(l2fwd_launch_one_lcore, NULL,
> CALL_MASTER);
> > RTE_LCORE_FOREACH_SLAVE(lcore_id) {
> > if (rte_eal_wait_lcore(lcore_id) < 0)
> > return -1;
> > }
> >
> > +   printf("Stopping forwarding... Done\n");
> > +   /* stop ports */
> > +   stop_ports();
> > +   printf("Bye...\n");
> > +   /* inform if there's a caller */
> > +   if (force_quit != 0) {
> > +   signal(signo_quit, SIG_DFL);
> > +   kill(getpid(), signo_quit);
> 
> The kill should not be needed.

The purpose is to make the program exit with the killed status.

> 
> It would be good if examples cleaned up allocations, that way they could be 
> used
> with valgrind for validation of drivers, etc.



[dpdk-dev] [PATCH] remove redundant __func__ in PMD_INIT_LOG and PMD_RX_LOG

2015-12-28 Thread Huawei Xie

Signed-off-by: Huawei Xie 
---
 drivers/net/virtio/virtio_ethdev.c   | 12 +---
 drivers/net/vmxnet3/vmxnet3_ethdev.c |  6 +++---
 drivers/net/vmxnet3/vmxnet3_rxtx.c   |  2 +-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index d928339..f19306f 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -150,9 +150,7 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
ctrl->status = status;

if (!(vq && vq->hw->cvq)) {
-   PMD_INIT_LOG(ERR,
-"%s(): Control queue is not supported.",
-__func__);
+   PMD_INIT_LOG(ERR, "Control queue is not supported.");
return -1;
}
head = vq->vq_desc_head_idx;
@@ -306,12 +304,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
PMD_INIT_LOG(DEBUG, "vq_size: %u nb_desc:%u", vq_size, nb_desc);
if (vq_size == 0) {
-   PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
+   PMD_INIT_LOG(ERR, "virtqueue does not exist");
return -EINVAL;
}

if (!rte_is_power_of_2(vq_size)) {
-   PMD_INIT_LOG(ERR, "%s: virtqueue size is not powerof 2", 
__func__);
+   PMD_INIT_LOG(ERR, "virtqueue size is not powerof 2");
return -EINVAL;
}

@@ -336,7 +334,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
RTE_CACHE_LINE_SIZE);
}
if (vq == NULL) {
-   PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
+   PMD_INIT_LOG(ERR, "Can not allocate virtqueue");
return (-ENOMEM);
}
if (queue_type == VTNET_RQ && vq->sw_ring == NULL) {
@@ -1146,7 +1144,7 @@ static int virtio_resource_init_by_ioports(struct 
rte_pci_device *pci_dev)

fp = fopen("/proc/ioports", "r");
if (fp == NULL) {
-   PMD_INIT_LOG(ERR, "%s(): can't open ioports", __func__);
+   PMD_INIT_LOG(ERR, "can't open ioports");
return -1;
}

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index c363bf6..f5834d6 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -564,7 +564,7 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
status = VMXNET3_READ_BAR1_REG(hw, VMXNET3_REG_CMD);

if (status != 0) {
-   PMD_INIT_LOG(ERR, "Device activation in %s(): UNSUCCESSFUL", 
__func__);
+   PMD_INIT_LOG(ERR, "Device activation: UNSUCCESSFUL");
return -1;
}

@@ -577,7 +577,7 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
 */
ret = vmxnet3_dev_rxtx_init(dev);
if (ret != VMXNET3_SUCCESS) {
-   PMD_INIT_LOG(ERR, "Device receive init in %s: UNSUCCESSFUL", 
__func__);
+   PMD_INIT_LOG(ERR, "Device receive init: UNSUCCESSFUL");
return ret;
}

@@ -882,7 +882,7 @@ vmxnet3_process_events(struct vmxnet3_hw *hw)
uint32_t events = hw->shared->ecr;

if (!events) {
-   PMD_INIT_LOG(ERR, "No events to process in %s()", __func__);
+   PMD_INIT_LOG(ERR, "No events to process");
return;
}

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 4de5d89..e592010 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -462,7 +462,7 @@ vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t *rxq, uint8_t 
ring_id)
/* Allocate blank mbuf for the current Rx Descriptor */
mbuf = rte_rxmbuf_alloc(rxq->mp);
if (unlikely(mbuf == NULL)) {
-   PMD_RX_LOG(ERR, "Error allocating mbuf in %s", 
__func__);
+   PMD_RX_LOG(ERR, "Error allocating mbuf");
rxq->stats.rx_buf_alloc_failure++;
err = ENOMEM;
break;
-- 
1.8.1.4



[dpdk-dev] [PATCH] remove redundant __func__ in PMD_INIT_LOG and PMD_RX_LOG

2015-12-28 Thread Huawei Xie

Signed-off-by: Huawei Xie 
---
 drivers/net/virtio/virtio_ethdev.c   | 12 +---
 drivers/net/vmxnet3/vmxnet3_ethdev.c |  6 +++---
 drivers/net/vmxnet3/vmxnet3_rxtx.c   |  2 +-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index d928339..f19306f 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -150,9 +150,7 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
ctrl->status = status;

if (!(vq && vq->hw->cvq)) {
-   PMD_INIT_LOG(ERR,
-"%s(): Control queue is not supported.",
-__func__);
+   PMD_INIT_LOG(ERR, "Control queue is not supported.");
return -1;
}
head = vq->vq_desc_head_idx;
@@ -306,12 +304,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
PMD_INIT_LOG(DEBUG, "vq_size: %u nb_desc:%u", vq_size, nb_desc);
if (vq_size == 0) {
-   PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
+   PMD_INIT_LOG(ERR, "virtqueue does not exist");
return -EINVAL;
}

if (!rte_is_power_of_2(vq_size)) {
-   PMD_INIT_LOG(ERR, "%s: virtqueue size is not powerof 2", 
__func__);
+   PMD_INIT_LOG(ERR, "virtqueue size is not powerof 2");
return -EINVAL;
}

@@ -336,7 +334,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
RTE_CACHE_LINE_SIZE);
}
if (vq == NULL) {
-   PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
+   PMD_INIT_LOG(ERR, "Can not allocate virtqueue");
return (-ENOMEM);
}
if (queue_type == VTNET_RQ && vq->sw_ring == NULL) {
@@ -1146,7 +1144,7 @@ static int virtio_resource_init_by_ioports(struct 
rte_pci_device *pci_dev)

fp = fopen("/proc/ioports", "r");
if (fp == NULL) {
-   PMD_INIT_LOG(ERR, "%s(): can't open ioports", __func__);
+   PMD_INIT_LOG(ERR, "can't open ioports");
return -1;
}

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index c363bf6..f5834d6 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -564,7 +564,7 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
status = VMXNET3_READ_BAR1_REG(hw, VMXNET3_REG_CMD);

if (status != 0) {
-   PMD_INIT_LOG(ERR, "Device activation in %s(): UNSUCCESSFUL", 
__func__);
+   PMD_INIT_LOG(ERR, "Device activation: UNSUCCESSFUL");
return -1;
}

@@ -577,7 +577,7 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
 */
ret = vmxnet3_dev_rxtx_init(dev);
if (ret != VMXNET3_SUCCESS) {
-   PMD_INIT_LOG(ERR, "Device receive init in %s: UNSUCCESSFUL", 
__func__);
+   PMD_INIT_LOG(ERR, "Device receive init: UNSUCCESSFUL");
return ret;
}

@@ -882,7 +882,7 @@ vmxnet3_process_events(struct vmxnet3_hw *hw)
uint32_t events = hw->shared->ecr;

if (!events) {
-   PMD_INIT_LOG(ERR, "No events to process in %s()", __func__);
+   PMD_INIT_LOG(ERR, "No events to process");
return;
}

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 4de5d89..e592010 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -462,7 +462,7 @@ vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t *rxq, uint8_t 
ring_id)
/* Allocate blank mbuf for the current Rx Descriptor */
mbuf = rte_rxmbuf_alloc(rxq->mp);
if (unlikely(mbuf == NULL)) {
-   PMD_RX_LOG(ERR, "Error allocating mbuf in %s", 
__func__);
+   PMD_RX_LOG(ERR, "Error allocating mbuf");
rxq->stats.rx_buf_alloc_failure++;
err = ENOMEM;
break;
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

2015-12-28 Thread Huawei Xie
v4 changes:
 fix a silly typo in error handling when rte_pktmbuf_alloc fails
reported by haifeng

pre-allocate a bulk of mbufs instead of allocating one mbuf a time on demand

Signed-off-by: Gerald Rogers 
Signed-off-by: Huawei Xie 
Acked-by: Konstantin Ananyev 
Acked-by: Yuanhan Liu 
Tested-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_rxtx.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index bbf3fac..f10d534 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -576,6 +576,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
uint32_t i;
uint16_t free_entries, entry_success = 0;
uint16_t avail_idx;
+   uint8_t alloc_err = 0;
+   uint8_t seg_num;

if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
RTE_LOG(ERR, VHOST_DATA,
@@ -609,6 +611,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,

LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n",
dev->device_fh, free_entries);
+
+   if (unlikely(rte_pktmbuf_alloc_bulk(mbuf_pool,
+   pkts, free_entries)) < 0) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "Failed to bulk allocating %d mbufs\n", free_entries);
+   return 0;
+   }
+
/* Retrieve all of the head indexes first to avoid caching issues. */
for (i = 0; i < free_entries; i++)
head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 
1)];
@@ -621,9 +631,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
uint32_t vb_avail, vb_offset;
uint32_t seg_avail, seg_offset;
uint32_t cpy_len;
-   uint32_t seg_num = 0;
+   seg_num = 0;
struct rte_mbuf *cur;
-   uint8_t alloc_err = 0;
+

desc = >desc[head[entry_success]];

@@ -654,13 +664,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
vq->used->ring[used_idx].id = head[entry_success];
vq->used->ring[used_idx].len = 0;

-   /* Allocate an mbuf and populate the structure. */
-   m = rte_pktmbuf_alloc(mbuf_pool);
-   if (unlikely(m == NULL)) {
-   RTE_LOG(ERR, VHOST_DATA,
-   "Failed to allocate memory for mbuf.\n");
-   break;
-   }
+   prev = cur = m = pkts[entry_success];
seg_offset = 0;
seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
cpy_len = RTE_MIN(vb_avail, seg_avail);
@@ -668,8 +672,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0);

seg_num++;
-   cur = m;
-   prev = m;
while (cpy_len != 0) {
rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, 
seg_offset),
(void *)((uintptr_t)(vb_addr + vb_offset)),
@@ -761,16 +763,23 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
cpy_len = RTE_MIN(vb_avail, seg_avail);
}

-   if (unlikely(alloc_err == 1))
+   if (unlikely(alloc_err))
break;

m->nb_segs = seg_num;

-   pkts[entry_success] = m;
vq->last_used_idx++;
entry_success++;
}

+   if (unlikely(alloc_err)) {
+   uint16_t i = entry_success;
+
+   m->nb_segs = seg_num;
+   for (; i < free_entries; i++)
+   rte_pktmbuf_free(pkts[i]);
+   }
+
rte_compiler_barrier();
vq->used->idx += entry_success;
/* Kick guest if required. */
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2015-12-28 Thread Huawei Xie
v5 changes:
 add comment about duff's device and our variant implementation
 revise the code style a bit

v3 changes:
 move while after case 0
 add context about duff's device and why we use while loop in the commit
message

v2 changes:
 unroll the loop a bit to help the performance

rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.

There is related thread about this bulk API.
http://dpdk.org/dev/patchwork/patch/4718/
Thanks to Konstantin's loop unrolling.

Attached the wiki page about duff's device. It explains the performance
optimization through loop unwinding, and also the most dramatic use of
case label fall-through.
https://en.wikipedia.org/wiki/Duff%27s_device

In our implementation, we use while() loop rather than do{} while() loop
because we could not assume count is strictly positive. Using while()
loop saves one line of check if count is zero.

Signed-off-by: Gerald Rogers 
Signed-off-by: Huawei Xie 
Acked-by: Konstantin Ananyev 
---
 lib/librte_mbuf/rte_mbuf.h | 55 ++
 1 file changed, 55 insertions(+)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..b2ed479 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1336,6 +1336,61 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct 
rte_mempool *mp)
 }

 /**
+ * Allocate a bulk of mbufs, initialize refcnt and reset the fields to default
+ * values.
+ *
+ *  @param pool
+ *The mempool from which mbufs are allocated.
+ *  @param mbufs
+ *Array of pointers to mbufs
+ *  @param count
+ *Array size
+ *  @return
+ *   - 0: Success
+ */
+static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
+struct rte_mbuf **mbufs, unsigned count)
+{
+   unsigned idx = 0;
+   int rc;
+
+   rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
+   if (unlikely(rc))
+   return rc;
+
+   /* To understand duff's device on loop unwinding optimization, see
+* https://en.wikipedia.org/wiki/Duff's_device.
+* Here while() loop is used rather than do() while{} to avoid extra
+* check if count is zero.
+*/
+   switch (count % 4) {
+   case 0:
+   while (idx != count) {
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 3:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 2:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 1:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   }
+   }
+   return 0;
+}
+
+/**
  * Attach packet mbuf to another packet mbuf.
  *
  * After attachment we refer the mbuf we attached as 'indirect',
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 0/2] provide rte_pktmbuf_alloc_bulk API and call it in vhost dequeue

2015-12-28 Thread Huawei Xie
v5 changes:
 add comment about duff's device and our variant implementation

v4 changes:
 fix a silly typo in error handling when rte_pktmbuf_alloc fails

v3 changes:
 move while after case 0
 add context about duff's device and why we use while loop in the commit
message

v2 changes:
 unroll the loop in rte_pktmbuf_alloc_bulk to help the performance

For symmetric rte_pktmbuf_free_bulk, if the app knows in its scenarios
their mbufs are all simple mbufs, i.e meet the following requirements:
 * no multiple segments
 * not indirect mbuf
 * refcnt is 1
 * belong to the same mbuf memory pool,
it could directly call rte_mempool_put to free the bulk of mbufs,
otherwise rte_pktmbuf_free_bulk has to call rte_pktmbuf_free to free
the mbuf one by one.
This patchset will not provide this symmetric implementation.

Huawei Xie (2):
  mbuf: provide rte_pktmbuf_alloc_bulk API
  vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

 lib/librte_mbuf/rte_mbuf.h| 55 +++
 lib/librte_vhost/vhost_rxtx.c | 35 +--
 2 files changed, 77 insertions(+), 13 deletions(-)

-- 
1.8.1.4