[dpdk-dev] Order of system brought up affects throughput with qos_sched app

2015-09-09 Thread Wei Shen
Hi Cristian,
Thanks for your quick response. I did a quick test of your hypothesis and it 
sort of came out as you mentioned. That is, it went back to ~4Gbps after around 
ten minutes with the previous profile I posted.
In another test, I set the pipe token rate to ~20Mbps instead of full line rate 
each. Although I did run into the same order issue, I haven't noticed any slow 
down yet by the time of this email (it's been up for an hour or so).
I am sorry, but I still don't get why. Do you mean the ~10Gbps throughput seen 
in order #2 is made possible by the initial accumulation of credits, and later, 
when the app runs long enough, the old credits run out and throughput gets 
capped by the credit rate? But in the profile I set everything to line rate, so 
I think this is not the bottleneck.
Could you please illustrate it further? Appreciate it. Thank you.
> From: cristian.dumitrescu at intel.com
> To: wshen0123 at outlook.com; dev at dpdk.org
> Subject: RE: [dpdk-dev] Order of system brought up affects throughput with
> qos_sched app
> Date: Wed, 9 Sep 2015 19:54:12 +
> 
> Hi Wei,
> 
> Here is another hypothesis for you to consider: if the size of your token 
> buckets (used to store subport and pipe credits) is big (and it actually is 
> set big in the default config file of the app), then when no packets are 
> received for a long while (which is the case when you start the app first and 
> the traffic gen later), the token buckets are continuously replenished (with 
> nothing consumed) until they become full; when packets start to arrive, the 
> token buckets are full and it can take a long time (might be minutes or even 
> hours, depending on how big your buckets are) until they come down to their 
> normal values (this time can actually be computed/estimated).
> 
> If this is what happens in your case, lowering the size of your buckets will 
> help.
> 
> Regards,
> Cristian
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wei Shen
> > Sent: Wednesday, September 9, 2015 9:39 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Order of system brought up affects throughput with
> > qos_sched app
> > 
> > Hi all,
> > I ran into problems with qos_sched depending on the order in which the
> > system is brought up. I can bring up the system in two ways:
> > 1. Start traffic gen first. Then start qos_sched.
> > 2. Start qos_sched first. Then start traffic gen.
> > With 256K pipes, 64 queue size, and 128B packet size, I got ~4Gbps with
> > order #1, while I got 10Gbps with order #2.
> > qos_sched command stats showed that ~59% packets got dropped in RX
> > (rte_ring_enqueue).
> > Plus, with #1, if I restart the traffic gen later, I would regain 10Gbps
> > throughput, which suggests that this is not an initialization issue but 
> > runtime
> > behavior.
> > I also tried to assign qos_sched on different cores and got the same result.
> > I suspect that there is some rte_ring bug when connecting two cores, where
> > one core starts enqueuing before the other core is ready to dequeue.
> > Have you experienced the same issue? Appreciate your help.
> > 
> > Wei Shen.
> > 
> > My system spec is:
> > Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> > 15 * 1G hugepages
> > 
> > qos_sched argument: ./build/app/qos_sched -c 1c0002 -n 4 -- --pfc
> > "0,1,20,18,19" --cfg profile.cfg
> > profile.cfg:
> > 
> > [port]
> > frame overhead = 20
> > number of subports per port = 1
> > number of pipes per subport = 262144
> > queue sizes = 64 64 64 64
> > 
> > ; Subport configuration
> > [subport 0]
> > tb rate = 125000               ; Bytes per second
> > tb size = 100                  ; Bytes
> > tc 0 rate = 125000             ; Bytes per second
> > tc 1 rate = 125000             ; Bytes per second
> > tc 2 rate = 125000             ; Bytes per second
> > tc 3 rate = 125000             ; Bytes per second
> > tc period = 10                 ; Milliseconds
> > 
> > pipe 0-262143 = 0              ; These pipes are configured with pipe profile 0
> > 
> > ; Pipe configuration
> > [pipe profile 0]
> > tb rate = 125000               ; Bytes per second
> > tb size = 100                  ; Bytes
> > tc 0 rate = 125000             ; Bytes per second
> > tc 1 rate = 125000             ; Bytes per second
> > tc 2 rate = 125000             ; Bytes per second
> > tc 3 rate = 125000             ; Bytes per second
> > tc period = 10                 ; Milliseconds
> > 
> > tc 3 oversubscription weight = 1
> > tc 0 wrr weights = 1 1 1 1
> > tc 1 wrr weights = 1 1 1 1
> > tc 2 wrr weights = 1 1 1 1
> > tc 3 wrr weights = 1 1 1 1



[dpdk-dev] Order of system brought up affects throughput with qos_sched app

2015-09-09 Thread Dumitrescu, Cristian
Hi Wei,

Here is another hypothesis for you to consider: if the size of your token 
buckets (used to store subport and pipe credits) is big (and it actually is set 
big in the default config file of the app), then when no packets are received 
for a long while (which is the case when you start the app first and the 
traffic gen later), the token buckets are continuously replenished (with 
nothing consumed) until they become full; when packets start to arrive, the 
token buckets are full and it can take a long time (might be minutes or even 
hours, depending on how big your buckets are) until they come down to their 
normal values (this time can actually be computed/estimated).

If this is what happens in your case, lowering the size of your buckets will 
help.

Regards,
Cristian

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wei Shen
> Sent: Wednesday, September 9, 2015 9:39 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Order of system brought up affects throughput with
> qos_sched app
> 
> Hi all,
> I ran into problems with qos_sched depending on the order in which the
> system is brought up. I can bring up the system in two ways:
> 1. Start traffic gen first. Then start qos_sched.
> 2. Start qos_sched first. Then start traffic gen.
> With 256K pipes, 64 queue size, and 128B packet size, I got ~4Gbps with
> order #1, while I got 10Gbps with order #2.
> qos_sched command stats showed that ~59% packets got dropped in RX
> (rte_ring_enqueue).
> Plus, with #1, if I restart the traffic gen later, I would regain 10Gbps
> throughput, which suggests that this is not an initialization issue but 
> runtime
> behavior.
> I also tried to assign qos_sched on different cores and got the same result.
> I suspect that there is some rte_ring bug when connecting two cores, where
> one core starts enqueuing before the other core is ready to dequeue.
> Have you experienced the same issue? Appreciate your help.
> 
> Wei Shen.
> 
> My system spec is:
> Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> 15 * 1G hugepages
> 
> qos_sched argument: ./build/app/qos_sched -c 1c0002 -n 4 -- --pfc
> "0,1,20,18,19" --cfg profile.cfg
> profile.cfg:
> 
> [port]
> frame overhead = 20
> number of subports per port = 1
> number of pipes per subport = 262144
> queue sizes = 64 64 64 64
> 
> ; Subport configuration
> [subport 0]
> tb rate = 125000               ; Bytes per second
> tb size = 100                  ; Bytes
> tc 0 rate = 125000             ; Bytes per second
> tc 1 rate = 125000             ; Bytes per second
> tc 2 rate = 125000             ; Bytes per second
> tc 3 rate = 125000             ; Bytes per second
> tc period = 10                 ; Milliseconds
> 
> pipe 0-262143 = 0              ; These pipes are configured with pipe profile 0
> 
> ; Pipe configuration
> [pipe profile 0]
> tb rate = 125000               ; Bytes per second
> tb size = 100                  ; Bytes
> tc 0 rate = 125000             ; Bytes per second
> tc 1 rate = 125000             ; Bytes per second
> tc 2 rate = 125000             ; Bytes per second
> tc 3 rate = 125000             ; Bytes per second
> tc period = 10                 ; Milliseconds
> 
> tc 3 oversubscription weight = 1
> tc 0 wrr weights = 1 1 1 1
> tc 1 wrr weights = 1 1 1 1
> tc 2 wrr weights = 1 1 1 1
> tc 3 wrr weights = 1 1 1 1


[dpdk-dev] Order of system brought up affects throughput with qos_sched app

2015-09-09 Thread Wei Shen
Hi all,
I ran into problems with qos_sched depending on the order in which the system 
is brought up. I can bring up the system in two ways:
1. Start traffic gen first. Then start qos_sched.
2. Start qos_sched first. Then start traffic gen.
With 256K pipes, 64 queue size, and 128B packet size, I got ~4Gbps with order 
#1, while I got 10Gbps with order #2.
qos_sched command stats showed that ~59% packets got dropped in RX 
(rte_ring_enqueue).
Plus, with #1, if I restart the traffic gen later, I would regain 10Gbps 
throughput, which suggests that this is not an initialization issue but runtime 
behavior.
I also tried to assign qos_sched on different cores and got the same result.
I suspect that there is some rte_ring bug when connecting two cores, where one 
core starts enqueuing before the other core is ready to dequeue.
Have you experienced the same issue? Appreciate your help.

Wei Shen.

My system spec is:
Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
15 * 1G hugepages

qos_sched argument: ./build/app/qos_sched -c 1c0002 -n 4 -- --pfc 
"0,1,20,18,19" --cfg profile.cfg
profile.cfg:

[port]
frame overhead = 20
number of subports per port = 1
number of pipes per subport = 262144
queue sizes = 64 64 64 64

; Subport configuration
[subport 0]
tb rate = 125000               ; Bytes per second
tb size = 100                  ; Bytes
tc 0 rate = 125000             ; Bytes per second
tc 1 rate = 125000             ; Bytes per second
tc 2 rate = 125000             ; Bytes per second
tc 3 rate = 125000             ; Bytes per second
tc period = 10                 ; Milliseconds

pipe 0-262143 = 0              ; These pipes are configured with pipe profile 0

; Pipe configuration
[pipe profile 0]
tb rate = 125000               ; Bytes per second
tb size = 100                  ; Bytes
tc 0 rate = 125000             ; Bytes per second
tc 1 rate = 125000             ; Bytes per second
tc 2 rate = 125000             ; Bytes per second
tc 3 rate = 125000             ; Bytes per second
tc period = 10                 ; Milliseconds

tc 3 oversubscription weight = 1
tc 0 wrr weights = 1 1 1 1
tc 1 wrr weights = 1 1 1 1
tc 2 wrr weights = 1 1 1 1
tc 3 wrr weights = 1 1 1 1


[dpdk-dev] [PATCH 0/5] fixup ip pipeline examples

2015-09-09 Thread Dumitrescu, Cristian


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, September 1, 2015 4:59 AM
> To: Dumitrescu, Cristian
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [PATCH 0/5] fixup ip pipeline examples
> 
> Lots of little trivial bugs/typos here.
> Let's not start users off with a bad example.
> 

Thanks very much for doing this work, Steve!

I agree with most of the bug fixes here except a few, which I indicated in my 
reply to each individual patch.

Do you want to resend or do you want us to integrate the fixes in our next 
patches? Whatever works best for you is fine with us.

Regards,
Cristian



[dpdk-dev] [PATCH 5/5] examples_ip_pipeline: fix possible string overrun

2015-09-09 Thread Dumitrescu, Cristian


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, September 1, 2015 4:59 AM
> To: Dumitrescu, Cristian
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [PATCH 5/5] examples_ip_pipeline: fix possible string overrun
> 
> If a long name was passed the code would clobber memory with
> strcpy.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  examples/ip_pipeline/app.h  | 2 +-
>  examples/ip_pipeline/init.c | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
> index 521e3a0..1f6bf0c 100644
> --- a/examples/ip_pipeline/app.h
> +++ b/examples/ip_pipeline/app.h
> @@ -190,7 +190,7 @@ struct app_pktq_out_params {
>  #define APP_MAX_PIPELINE_ARGSPIPELINE_MAX_ARGS
> 
>  struct app_pipeline_params {
> - char *name;
> + const char *name;
>   uint8_t parsed;
> 
>   char type[APP_PIPELINE_TYPE_SIZE];
> diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
> index 75e3767..007af83 100644
> --- a/examples/ip_pipeline/init.c
> +++ b/examples/ip_pipeline/init.c
> @@ -1022,12 +1022,13 @@ app_init_msgq(struct app_params *app)
>  }
> 
>  static void app_pipeline_params_get(struct app_params *app,
> - struct app_pipeline_params *p_in,
> + const struct app_pipeline_params *p_in,
>   struct pipeline_params *p_out)
>  {
>   uint32_t i;
> 
> - strcpy(p_out->name, p_in->name);
> + strncpy(p_out->name, p_in->name, PIPELINE_NAME_SIZE - 1);
> + p_out->name[PIPELINE_NAME_SIZE - 1] = '\0';

Could be done, but not necessary, as the pipeline name string was already 
validated in the parser module; now it is just copied over, and it is safe to 
assume it is of the right size.

I don't mind doing it for extra safety.

> 
>   p_out->socket_id = (int) p_in->socket_id;
> 
> --
> 2.1.4



[dpdk-dev] [PATCH 4/5] examples_ip_pipeline: remove useless code

2015-09-09 Thread Dumitrescu, Cristian


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, September 1, 2015 4:59 AM
> To: Dumitrescu, Cristian
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [PATCH 4/5] examples_ip_pipeline: remove useless code
> 
> Code here checks return from strdup() in only one place in this
> whole module, and then does nothing useful by setting one
> value that is then cleared. Just remove the check.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  examples/ip_pipeline/config_parse.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/examples/ip_pipeline/config_parse.c
> b/examples/ip_pipeline/config_parse.c
> index 6b651a8..81ec33b 100644
> --- a/examples/ip_pipeline/config_parse.c
> +++ b/examples/ip_pipeline/config_parse.c
> @@ -1483,9 +1483,6 @@ parse_tm(struct app_params *app,
>   ret = -ESRCH;
>   if (strcmp(ent->name, "cfg") == 0) {
>   param->file_name = strdup(ent->value);
> - if (param->file_name == NULL)
> - ret = -EINVAL;
> -
>   ret = 0;

The required implementation would actually be:

if (param->file_name == NULL)
        ret = -EINVAL;
else
        ret = 0;

The TM parameter file_name is used by function app_config_parse_tm() in file 
config_parse_tm.c.

Regarding the way error management (including the error messages) is done in 
our parser (currently by repeatedly setting the ret variable), I would like to 
improve (replace) it when I get the time to take another look at this.

>   } else if (strcmp(ent->name, "burst_read") == 0)
>   ret = parser_read_uint32(>burst_read,
> --
> 2.1.4



[dpdk-dev] [PATCH 3/5] example_ip_pipeline: fix sizeof() on memcpy

2015-09-09 Thread Dumitrescu, Cristian


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, September 1, 2015 4:59 AM
> To: Dumitrescu, Cristian
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [PATCH 3/5] example_ip_pipeline: fix sizeof() on memcpy
> 
> Found by Coverity:
>   Sizeof not portable (SIZEOF_MISMATCH)
>   suspicious_sizeof: Passing argument &app->cmds[app->n_cmds] of type
>   cmdline_parse_ctx_t * and argument n_cmds * 8UL /* sizeof
> (cmdline_parse_ctx_t *) */
>   to function memcpy is suspicious.
>In this case, sizeof (cmdline_parse_ctx_t *) is equal to sizeof
> (cmdline_parse_ctx_t),
>but this is not a portable assumption.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  examples/ip_pipeline/init.c  | 2 +-
>  examples/ip_pipeline/pipeline/pipeline_common_fe.c   | 2 +-
>  examples/ip_pipeline/pipeline/pipeline_flow_classification.c | 1 -
>  3 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
> index 3f9c68d..75e3767 100644
> --- a/examples/ip_pipeline/init.c
> +++ b/examples/ip_pipeline/init.c
> @@ -1325,7 +1325,7 @@ app_pipeline_type_cmd_push(struct app_params
> *app,
>   /* Push pipeline commands into the application */
>   memcpy(&app->cmds[app->n_cmds],
>   cmds,
> - n_cmds * sizeof(cmdline_parse_ctx_t *));
> + n_cmds * sizeof(cmdline_parse_ctx_t));

Actually no: both the destination and the source of the memcpy are arrays of 
pointers.

The source is a pipeline type, which is a static global data structure, so 
there are no issues with the lifetime of the data pointed to by the pointers.

> 
>   for (i = 0; i < n_cmds; i++)
>   app->cmds[app->n_cmds + i]->data = app;
> diff --git a/examples/ip_pipeline/pipeline/pipeline_common_fe.c
> b/examples/ip_pipeline/pipeline/pipeline_common_fe.c
> index fcda0ce..4eec66b 100644
> --- a/examples/ip_pipeline/pipeline/pipeline_common_fe.c
> +++ b/examples/ip_pipeline/pipeline/pipeline_common_fe.c
> @@ -1321,7 +1321,7 @@ app_pipeline_common_cmd_push(struct
> app_params *app)
>   /* Push pipeline commands into the application */
>   memcpy(&app->cmds[app->n_cmds],
>   pipeline_common_cmds,
> - n_cmds * sizeof(cmdline_parse_ctx_t *));
> + n_cmds * sizeof(cmdline_parse_ctx_t));

Actually no: both the destination and the source of the memcpy are arrays of 
pointers.

The source is a pipeline type, which is a static global data structure, so 
there are no issues with the lifetime of the data pointed to by the pointers.

> 
>   for (i = 0; i < n_cmds; i++)
>   app->cmds[app->n_cmds + i]->data = app;
> diff --git a/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
> b/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
> index 24cf7dc..e5141b0 100644
> --- a/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
> +++ b/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
> @@ -126,7 +126,6 @@ app_pipeline_fc_key_convert(struct pipeline_fc_key
> *key_in,
>   {
>   struct pkt_key_ipv6_5tuple *ipv6 = key_buffer;
> 
> - memset(ipv6, 0, 64);

Agreed!

>   ipv6->payload_length = 0;
>   ipv6->proto = key_in->key.ipv6_5tuple.proto;
>   ipv6->hop_limit = 0;
> --
> 2.1.4



[dpdk-dev] [PATCH 2/5] example_ip_pipeline: avoid strncpy issue

2015-09-09 Thread Dumitrescu, Cristian


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, September 1, 2015 4:59 AM
> To: Dumitrescu, Cristian
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [PATCH 2/5] example_ip_pipeline: avoid strncpy issue
> 
> If name is so long that it fills buffer, then string would not
> be null terminated.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  examples/ip_pipeline/config_parse_tm.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/examples/ip_pipeline/config_parse_tm.c
> b/examples/ip_pipeline/config_parse_tm.c
> index 84702b0..4a35715 100644
> --- a/examples/ip_pipeline/config_parse_tm.c
> +++ b/examples/ip_pipeline/config_parse_tm.c
> @@ -354,7 +354,9 @@ tm_cfgfile_load_sched_subport(
>   profile = atoi(entries[j].value);
>   strncpy(name,
>   entries[j].name,
> - sizeof(name));
> + CFG_NAME_LEN - 1);
> + name[CFG_NAME_LEN-1] = '\0';
> +
>   n_tokens = rte_strsplit(
>   &name[sizeof("pipe")],
>   strnlen(name,
> CFG_NAME_LEN),
> --
> 2.1.4

Acked-by: Cristian Dumitrescu 




[dpdk-dev] [PATCH 1/5] examples_ip_pipeline: fix typo's

2015-09-09 Thread Dumitrescu, Cristian


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, September 1, 2015 4:59 AM
> To: Dumitrescu, Cristian
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [PATCH 1/5] examples_ip_pipeline: fix typo's
> 
> Coverity found these as dead-code and/or copy-paste bugs.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  examples/ip_pipeline/config_parse.c  | 2 +-
>  examples/ip_pipeline/config_parse_tm.c   | 2 +-
>  examples/ip_pipeline/pipeline/pipeline_flow_classification.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/examples/ip_pipeline/config_parse.c
> b/examples/ip_pipeline/config_parse.c
> index c9b78f9..6b651a8 100644
> --- a/examples/ip_pipeline/config_parse.c
> +++ b/examples/ip_pipeline/config_parse.c
> @@ -2238,7 +2238,7 @@ save_pipeline_params(struct app_params *app,
> FILE *f)
>   }
> 
>   /* msgq_out */
> - if (p->n_msgq_in) {
> + if (p->n_msgq_out) {
>   uint32_t j;
> 
>   fprintf(f, "msgq_out =");
> diff --git a/examples/ip_pipeline/config_parse_tm.c
> b/examples/ip_pipeline/config_parse_tm.c
> index cdebbdc..84702b0 100644
> --- a/examples/ip_pipeline/config_parse_tm.c
> +++ b/examples/ip_pipeline/config_parse_tm.c
> @@ -399,7 +399,7 @@ tm_cfgfile_load(struct app_pktq_tm_params *tm)
> 
>   memset(tm->sched_subport_params, 0, sizeof(tm->sched_subport_params));
>   memset(tm->sched_pipe_profiles, 0, sizeof(tm->sched_pipe_profiles));
> - memset(&tm->sched_port_params, 0, sizeof(tm->sched_pipe_profiles));
> + memset(&tm->sched_port_params, 0, sizeof(tm->sched_port_params));
>   for (i = 0; i < APP_MAX_SCHED_SUBPORTS *
> APP_MAX_SCHED_PIPES; i++)
>   tm->sched_pipe_to_profile[i] = -1;
> 
> diff --git a/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
> b/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
> index 4b82180..24cf7dc 100644
> --- a/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
> +++ b/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
> @@ -442,7 +442,7 @@ app_pipeline_fc_add_bulk(struct app_params *app,
>   flow_rsp = rte_malloc(NULL,
>   n_keys * sizeof(struct pipeline_fc_add_bulk_flow_rsp),
>   RTE_CACHE_LINE_SIZE);
> - if (flow_req == NULL) {
> + if (flow_rsp == NULL) {
>   rte_free(flow_req);
>   rte_free(new_flow);
>   rte_free(signature);
> --
> 2.1.4

Acked-by: Cristian Dumitrescu 





[dpdk-dev] DPDK 2.2 roadmap

2015-09-09 Thread Patel, Rashmin N
There were two line items on the 2.2 roadmap: Xen Driver and Hyper-V Driver. 
Can you provide some more details?

Thanks,
Rashmin

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
Sent: Wednesday, September 09, 2015 1:45 AM
To: dev at dpdk.org
Subject: [dpdk-dev] DPDK 2.2 roadmap

Hello,

The new features for the release 2.2 must be first submitted before 2nd October.
They should be integrated before 23rd October.

In order to ease cooperation and integration, it would be nice to see announces 
or technical details of planned features for 2.2 or 2.3.
Then the roadmap page will be filled accordingly:
http://dpdk.org/dev/roadmap
Generally speaking, it helps to know what you are working on and what is the 
status.

Thanks


[dpdk-dev] Random packet drops with ip_pipeline on R730.

2015-09-09 Thread husainee
hi Cristian

I am using the 2.0 release. I will try with 2.1 and report back.

But for additional information, I tried the same 2.0 ip_pipeline application 
on a desktop system which has a single-socket Intel(R) Core(TM) i5-4440 CPU @ 
3.10GHz, 4 cores. The NIC is the same i350.

On this machine I am sending packets on 4 ports at 1Gbps full duplex and I get 
4Gbps throughput with no drops.

The difference between the two systems is the processor (speed, cores) and the 
number of sockets. Is the processor speed reducing DPDK performance drastically, 
from 4Gbps to something below 0.5Gbps? This is confusing!

regards
husainee


On 09/09/2015 05:09 PM, Dumitrescu, Cristian wrote:
>
> Hi Husainee,
>
>  
>
> Looking at your config file, looks like you are using an old DPDK
> release prior to 2.1, can you please try out same simple test in your
> environment for latest DPDK 2.1 release?
>
>  
>
> We did a lot of work in DPDK release 2.1 for the ip_pipeline
> application, we basically rewrote large parts of it, including the
> parser, checks, run-time, library of pipelines, etc. The format of the
> config file has been improved a lot, you should be able to adapt your
> config file to the latest syntax very quickly.
>
>  
>
> Btw, your config file is not really equivalent to l2fwd, as you are
> using two CPU cores connected through software rings rather than a
> single core, as l2fwd does.
>
>  
>
> Here is an equivalent DPDK 2.1 config file using two cores connected
> through software rings (port 0 -> port 1, port 1-> port 0, port 2 ->
> port 3, port 3 -> port2):
>
>  
>
> [PIPELINE0]
>
> type = MASTER
>
> core = 0
>
>  
>
> [PIPELINE1]
>
> type = PASS-THROUGH
>
> core = 1
>
> pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
>
> pktq_out = SWQ0 SWQ1 SWQ2 SWQ3
>
>  
>
> [PIPELINE2]
>
> type = PASS-THROUGH
>
> core = 2; you can also place PIPELINE2 on same core as PIPELINE1: core = 1
>
> pktq_in = SWQ1 SWQ0 SWQ3 SWQ2
>
> pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0
>
>  
>
> Here is an config file doing similar processing with a single core,
> closer configuration to l2fwd (port 0 -> port 1, port 1-> port 0, port
> 2 -> port 3, port 3 -> port2):
>
>  
>
> [PIPELINE0]
>
> type = MASTER
>
> core = 0
>
>  
>
> [PIPELINE1]
>
> type = PASS-THROUGH
>
> core = 1
>
> pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
>
> pktq_out = TXQ1.0 TXQ0.0 TXQ3.0 TXQ2.0
>
>  
>
> Regards,
>
> Cristian
>
>  
>
> *From:*husainee [mailto:husainee.plumber at nevisnetworks.com]
> *Sent:* Wednesday, September 9, 2015 12:47 PM
> *To:* Dumitrescu, Cristian; dev at dpdk.org
> *Cc:* Cao, Waterman
> *Subject:* Re: [dpdk-dev] Random packet drops with ip_pipeline on R730.
>
>  
>
> Hi Cristian
> PFA the config file.
>
> I am sending packets from port0 and receiving on port1.
>
> By random packet drops I mean that on every run the number of packets
> dropped is not the same. Here are some results:
>
> Frame sent rate 1488095.2 fps, 64Byte packets (100% of 1000Mbps)
> Run1- 0.0098% (20-22 Million Packets)
> Run2- 0.021% (20-22 Million Packets)
> Run3- 0.0091% (20-22 Million Packets)
>
> Frame rate 744047.62 fps, 64 Byte packets, (50% of 1000Mbps)
> Run1- 0.0047% (20-22 Million Packets)
> Run2- 0.0040% (20-22 Million Packets)
> Run3- 0.0040% (20-22 Million Packets)
>
>
> Frame rate 148809.52 fps, 64 Byte packets,(10% of 1000Mbps)
> Run1- 0 (20-22 Million Packets)
> Run2- 0 (20-22 Million Packets)
> Run3- 0 (20-22 Million Packets)
>
>
>
> Following are the hw nic setting differences between the ip_pipeline and
> l2fwd apps:
>
> parameter                 ip_pipeline   l2fwd
> ----------------------    -----------   -----
> jumbo frame               1             0
> hw_ip_checksum            1             0
> rx_conf.wthresh           4             0
> rx_conf.rx_free_thresh    64            32
> tx_conf.pthresh           36            32
> burst size                64            32
>
>
> We tried to make the ip_pipeline settings the same as l2fwd, but there was
> no change in the results.
>
> I have not tried with 10GbE . I do not have 10GbE test equipment.
>
>
>
> regards
> husainee
>
>
>
> On 09/08/2015 06:32 PM, Dumitrescu, Cristian wrote:
>
> Hi Husainee,
>
>  
>
> Can you please explain what you mean by random packet drops? What 
> percentage of the input packets get dropped, does it take place on every run, 
> does the number of dropped packets vary on every run, etc.?
>
>  
>
> Are you also able to reproduce this issue with other NICs, e.g. 10GbE NIC?
>
>  
>
> Can you share your config file?
>
>  
>
> Can you please double check the low level NIC settings between the two 
> applications, i.e. the settings in structures link_params_default, 
> default_hwq_in_params, default_hwq_out_params from ip_pipeline file 
> config_parse.c vs. their equivalents from l2fwd? The only thing I can think 
> of right now is maybe one of the low level threshold values for the Ethernet 
> link is not tuned for your 1GbE NIC.
>
>  
>
> Regards,
>
> Cristian
>
>  
>
>  

[dpdk-dev] [Pktgen] [PATCH] pktgen_setup_packets: fix race for packet header

2015-09-09 Thread Ilya Maximets
In pktgen_setup_packets(), all threads of one port use the same
info->seq_pkt. This leads to constructing packets in the same memory region
(&pkt->hdr). As a result, pktgen_setup_packets() generates random headers.

Fix that by making a local copy of info->seq_pkt and using it for
constructing packets.

Signed-off-by: Ilya Maximets 
---
 app/pktgen-arp.c  |  2 +-
 app/pktgen-cmds.c | 40 
 app/pktgen-ipv4.c |  2 +-
 app/pktgen.c  | 39 +++
 app/pktgen.h  |  4 ++--
 app/t/pktgen.t.c  |  6 +++---
 6 files changed, 54 insertions(+), 39 deletions(-)

diff --git a/app/pktgen-arp.c b/app/pktgen-arp.c
index c378880..b7040d7 100644
--- a/app/pktgen-arp.c
+++ b/app/pktgen-arp.c
@@ -190,7 +190,7 @@ pktgen_process_arp( struct rte_mbuf * m, uint32_t pid, 
uint32_t vlan )

rte_memcpy(>eth_dst_addr, >sha, 6);
for (i = 0; i < info->seqCnt; i++)
-   pktgen_packet_ctor(info, i, -1);
+   pktgen_packet_ctor(info, i, -1, NULL);
}

// Swap the two MAC addresses
diff --git a/app/pktgen-cmds.c b/app/pktgen-cmds.c
index da040e5..a6abb41 100644
--- a/app/pktgen-cmds.c
+++ b/app/pktgen-cmds.c
@@ -931,7 +931,7 @@ pktgen_set_proto(port_info_t * info, char type)
if ( type == 'i' )
info->seq_pkt[SINGLE_PKT].ethType = ETHER_TYPE_IPv4;

-   pktgen_packet_ctor(info, SINGLE_PKT, -1);
+   pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
 }

 /**//**
@@ -1067,7 +1067,7 @@ pktgen_set_pkt_type(port_info_t * info, const char * type)

(type[3] == '6') ? ETHER_TYPE_IPv6 :

/* TODO print error: unknown type */ ETHER_TYPE_IPv4;

-   pktgen_packet_ctor(info, SINGLE_PKT, -1);
+   pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
 }

 /**//**
@@ -1092,7 +1092,7 @@ pktgen_set_vlan(port_info_t * info, uint32_t onOff)
}
else
pktgen_clr_port_flags(info, SEND_VLAN_ID);
-   pktgen_packet_ctor(info, SINGLE_PKT, -1);
+   pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
 }

 /**//**
@@ -1112,7 +1112,7 @@ pktgen_set_vlanid(port_info_t * info, uint16_t vlanid)
 {
info->vlanid = vlanid;
info->seq_pkt[SINGLE_PKT].vlanid = info->vlanid;
-   pktgen_packet_ctor(info, SINGLE_PKT, -1);
+   pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
 }

 /**//**
@@ -1137,7 +1137,7 @@ pktgen_set_mpls(port_info_t * info, uint32_t onOff)
}
else
pktgen_clr_port_flags(info, SEND_MPLS_LABEL);
-   pktgen_packet_ctor(info, SINGLE_PKT, -1);
+   pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
 }

 /**//**
@@ -1157,7 +1157,7 @@ pktgen_set_mpls_entry(port_info_t * info, uint32_t 
mpls_entry)
 {
info->mpls_entry = mpls_entry;
info->seq_pkt[SINGLE_PKT].mpls_entry = info->mpls_entry;
-   pktgen_packet_ctor(info, SINGLE_PKT, -1);
+   pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
 }

 /**//**
@@ -1182,7 +1182,7 @@ pktgen_set_qinq(port_info_t * info, uint32_t onOff)
}
else
pktgen_clr_port_flags(info, SEND_Q_IN_Q_IDS);
-   pktgen_packet_ctor(info, SINGLE_PKT, -1);
+   pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
 }

 /**//**
@@ -1204,7 +1204,7 @@ pktgen_set_qinqids(port_info_t * info, uint16_t outerid, 
uint16_t innerid)
info->seq_pkt[SINGLE_PKT].qinq_outerid = info->qinq_outerid;
info->qinq_innerid = innerid;
info->seq_pkt[SINGLE_PKT].qinq_innerid = info->qinq_innerid;
-   pktgen_packet_ctor(info, SINGLE_PKT, -1);
+   pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
 }

 /**//**
@@ -1228,7 +1228,7 @@ pktgen_set_gre(port_info_t * info, uint32_t onOff)
}
else
pktgen_clr_port_flags(info, SEND_GRE_IPv4_HEADER);
-   pktgen_packet_ctor(info, SINGLE_PKT, -1);
+   pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
 }

 /**//**
@@ -1252,7 +1252,7 @@ pktgen_set_gre_eth(port_info_t * info, uint32_t onOff)
}
else
pktgen_clr_port_flags(info, 

[dpdk-dev] [RFC PATCH 4/8] driver/virtio:enqueue TSO offload

2015-09-09 Thread Liu, Jijiang


> -Original Message-
> From: Ouyang, Changchun
> Sent: Tuesday, September 8, 2015 6:18 PM
> To: Liu, Jijiang; dev at dpdk.org
> Cc: Ouyang, Changchun
> Subject: RE: [dpdk-dev] [RFC PATCH 4/8] driver/virtio:enqueue TSO offload
> 
> 
> 
> > -Original Message-
> > From: Liu, Jijiang
> > Sent: Monday, September 7, 2015 2:11 PM
> > To: Ouyang, Changchun; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 4/8] driver/virtio:enqueue TSO
> > offload
> >
> >
> >
> > > -Original Message-
> > > From: Ouyang, Changchun
> > > Sent: Monday, August 31, 2015 8:29 PM
> > > To: Liu, Jijiang; dev at dpdk.org
> > > Cc: Ouyang, Changchun
> > > Subject: RE: [dpdk-dev] [RFC PATCH 4/8] driver/virtio:enqueue TSO
> > > offload
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jijiang Liu
> > > > Sent: Monday, August 31, 2015 5:42 PM
> > > > To: dev at dpdk.org
> > > > Subject: [dpdk-dev] [RFC PATCH 4/8] driver/virtio:enqueue TSO
> > > > offload
> > > >
> > > > Enqueue TSO4/6 offload.
> > > >
> > > > Signed-off-by: Jijiang Liu 
> > > > ---
> > > >  drivers/net/virtio/virtio_rxtx.c |   23 +++
> > > >  1 files changed, 23 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio/virtio_rxtx.c
> > > > b/drivers/net/virtio/virtio_rxtx.c
> > > > index c5b53bb..4c2d838 100644
> > > > --- a/drivers/net/virtio/virtio_rxtx.c
> > > > +++ b/drivers/net/virtio/virtio_rxtx.c
> > > > @@ -198,6 +198,28 @@ virtqueue_enqueue_recv_refill(struct
> > > > virtqueue *vq, struct rte_mbuf *cookie)
> > > > return 0;
> > > >  }
> > > >
> > > > +static void
> > > > +virtqueue_enqueue_offload(struct virtqueue *txvq, struct rte_mbuf
> > *m,
> > > > +   uint16_t idx, uint16_t hdr_sz) {
> > > > +   struct virtio_net_hdr *hdr = (struct virtio_net_hdr 
> > > > *)(uintptr_t)
> > > > +   (txvq->virtio_net_hdr_addr + idx * 
> > > > hdr_sz);
> > > > +
> > > > +   if (m->tso_segsz != 0 && m->ol_flags & PKT_TX_TCP_SEG) {
> > > > +   if (m->ol_flags & PKT_TX_IPV4) {
> > > > +   if (!vtpci_with_feature(txvq->hw,
> > > > VIRTIO_NET_F_HOST_TSO4))
> > > > +   return;
> > >
> > > Do we need return error if host can't handle tso for the packet?
> > >
> > > > +   hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> > > > +   } else if (m->ol_flags & PKT_TX_IPV6) {
> > > > +   if (!vtpci_with_feature(txvq->hw,
> > > > VIRTIO_NET_F_HOST_TSO6))
> > > > +   return;
> > >
> > > Same as above
> > >
> > > > +   hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> > > > +   }
> > >
> > > Do we need else branch for the case of neither tcpv4 nor tcpv6?
> > >
> > > > +   hdr->gso_size = m->tso_segsz;
> > > > +   hdr->hdr_len = m->l2_len + m->l3_len + m->l4_len;
> > > > +   }
> > > > +}
> > > > +
> > > >  static int
> > > >  virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf
> > > > *cookie) { @@ -221,6 +243,7 @@ virtqueue_enqueue_xmit(struct
> > > > virtqueue *txvq, struct rte_mbuf *cookie)
> > > > dxp->cookie = (void *)cookie;
> > > > dxp->ndescs = needed;
> > > >
> > > > +   virtqueue_enqueue_offload(txvq, cookie, idx, head_size);
> > >
> > > If TSO is not enabled in the feature bit, how to resolve here?
> >
> > The TSO enablement check is done in the function.
> >
> > If TSO is not enabled, we don't need to fill virtio_net_hdr.
> 
> Here I mean if (m->ol_flags & PKT_TX_TCP_SEG) is true, that is to say, the
> virtio-pmd user expects TSO to be done in vhost or virtio, but the host
> feature bit doesn't support it; then this case should be handled, either in
> the virtio PMD or by returning an error to the caller.
> Otherwise a packet with the TSO flag set may not be sent out normally.
> Am I right?
Not exactly. If the host feature bit doesn't support TSO, the TSO flag cannot
be set in this function, and the packet would continue to be processed on the
host.
On reflection, I think it is better to return an error here and not continue
processing this packet.

> >
> > > > start_dp = txvq->vq_ring.desc;
> > > > start_dp[idx].addr =
> > > > txvq->virtio_net_hdr_mem + idx * head_size;
> > > > --
> > > > 1.7.7.6



[dpdk-dev] Random packet drops with ip_pipeline on R730.

2015-09-09 Thread Dumitrescu, Cristian
Hi Husainee,

Yes, please try on release 2.1 and do come back to us with your findings. Based
on your findings so far, though, it looks like this is not a SW issue with the
ip_pipeline application (from the 2.0 release).

The packet I/O rate that you are using is a few Mpps, which is low enough to
be sustained by a 1.6 GHz or a 3.1 GHz CPU, so I don't think the CPU is the
issue, but some other HW things might be: how many PCIe lanes are routed to
each of the NICs, are they PCIe Gen2 or Gen1, are the PCIe slots used by the
NICs on the same CPU socket as the CPU core(s) you're using for packet
forwarding, etc. I think you got it right: the fastest way to debug this issue
is to try out multiple CPUs and NICs.

Regards,
Cristian

From: husainee [mailto:husainee.plum...@nevisnetworks.com]
Sent: Wednesday, September 9, 2015 3:16 PM
To: Dumitrescu, Cristian; dev at dpdk.org
Cc: Cao, Waterman
Subject: Re: [dpdk-dev] Random packet drops with ip_pipeline on R730.

hi Cristian

I am using 2.0 release. I will try with 2.1 and revert.

But for additional information, I tried the same 2.0 ip_pipeline application
on a desktop system with a single-socket Intel(R) Core(TM) i5-4440 CPU
@ 3.10GHz, 4 cores. The NIC is the same i350.

On this machine I am sending packets on 4 ports at 1Gbps full duplex and I get
4Gbps throughput with no drops.

The difference between the two systems is the processor (speed, cores) and the
number of sockets. Is the speed of the processor reducing the performance of
DPDK drastically from 4Gbps to something <0.5Gbps? This is confusing!

regards
husainee

On 09/09/2015 05:09 PM, Dumitrescu, Cristian wrote:
Hi Husainee,

Looking at your config file, looks like you are using an old DPDK release prior 
to 2.1, can you please try out same simple test in your environment for latest 
DPDK 2.1 release?

We did a lot of work in DPDK release 2.1 for the ip_pipeline application, we 
basically rewrote large parts of it, including the parser, checks, run-time, 
library of pipelines, etc. The format of the config file has been improved a 
lot, you should be able to adapt your config file to the latest syntax very 
quickly.

Btw, your config file is not really equivalent to l2fwd, as you are using
two CPU cores connected through software rings rather than a single core, as
l2fwd does.

Here is an equivalent DPDK 2.1 config file using two cores connected through 
software rings (port 0 -> port 1, port 1-> port 0, port 2 -> port 3, port 3 -> 
port2):

[PIPELINE0]
type = MASTER
core = 0

[PIPELINE1]
type = PASS-THROUGH
core = 1
pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
pktq_out = SWQ0 SWQ1 SWQ2 SWQ3

[PIPELINE2]
type = PASS-THROUGH
core = 2; you can also place PIPELINE2 on same core as PIPELINE1: core = 1
pktq_in = SWQ1 SWQ0 SWQ3 SWQ2
pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0

Here is a config file doing similar processing with a single core, a closer
configuration to l2fwd (port 0 -> port 1, port 1 -> port 0, port 2 -> port 3,
port 3 -> port 2):

[PIPELINE0]
type = MASTER
core = 0

[PIPELINE1]
type = PASS-THROUGH
core = 1
pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
pktq_out = TXQ1.0 TXQ0.0 TXQ3.0 TXQ2.0

Regards,
Cristian

From: husainee [mailto:husainee.plum...@nevisnetworks.com]
Sent: Wednesday, September 9, 2015 12:47 PM
To: Dumitrescu, Cristian; dev at dpdk.org
Cc: Cao, Waterman
Subject: Re: [dpdk-dev] Random packet drops with ip_pipeline on R730.

Hi Cristian
PFA the config file.

I am sending packets from port0 and receiving on port1.

By random packet drops I mean, on every run the number of packets dropped is
not the same. Here are some results below.

Frame sent rate 1488095.2 fps, 64Byte packets (100% of 1000Mbps)
Run1- 0.0098% (20-22 Million Packets)
Run2- 0.021% (20-22 Million Packets)
Run3- 0.0091% (20-22 Million Packets)

Frame rate 744047.62 fps, 64 Byte packets, (50% of 1000Mbps)
Run1- 0.0047% (20-22 Million Packets)
Run2- 0.0040% (20-22 Million Packets)
Run3- 0.0040% (20-22 Million Packets)


Frame rate 148809.52 fps, 64 Byte packets,(10% of 1000Mbps)
Run1- 0 (20-22 Million Packets)
Run2- 0 (20-22 Million Packets)
Run3- 0 (20-22 Million Packets)



Following are the hw NIC setting differences between the ip_pipeline and l2fwd apps:

parameter                 ip_pipeline    l2fwd
jumbo frame               1              0
hw_ip_checksum            1              0
rx_conf.wthresh           4              0
rx_conf.rx_free_thresh    64             32
tx_conf.pthresh           36             32
burst size                64             32

We tried to make the ip_pipeline settings same as l2fwd but no change in 
results.

I have not tried with 10GbE . I do not have 10GbE test equipment.



regards
husainee




On 09/08/2015 06:32 PM, Dumitrescu, Cristian wrote:

Hi Husainee,



Can you please explain what you mean by random packet drops? What percentage
of the input packets get dropped, does it take place on every run, does the
number of dropped packets vary on every run, etc.?



Are you also able to reproduce this issue with other NICs, e.g. 10GbE NIC?



Can you share your config file?



Can you please double check the low 

[dpdk-dev] [PATCH v2 4/4] ethdev: check driver support for functions

2015-09-09 Thread Bruce Richardson
The functions rte_eth_rx_queue_count and rte_eth_descriptor_done are
supported by very few PMDs. Therefore, it is best to check for support
for the functions in the ethdev library, so as to avoid crashes at
run-time if an application goes to use those APIs. The performance
impact of this change should be very small as this is a predictable
branch in the function.

Signed-off-by: Bruce Richardson 
---
 lib/librte_ether/rte_ethdev.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1113ed2..0759de4 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2608,16 +2608,18 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
  * @param queue_id
  *  The queue id on the specific port.
  * @return
- *  The number of used descriptors in the specific queue.
+ *  The number of used descriptors in the specific queue, or:
+ * (-EINVAL) if *port_id* is invalid
+ * (-ENOTSUP) if the device does not support this function
  */
-static inline uint32_t
+static inline int
 rte_eth_rx_queue_count(uint8_t port_id, uint16_t queue_id)
 {
 struct rte_eth_dev *dev = _eth_devices[port_id];
 #ifdef RTE_LIBRTE_ETHDEV_DEBUG
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-   RTE_ETH_FPTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, 0);
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 #endif
+   RTE_ETH_FPTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, -ENOTSUP);
 return (*dev->dev_ops->rx_queue_count)(dev, queue_id);
 }

@@ -2634,6 +2636,7 @@ rte_eth_rx_queue_count(uint8_t port_id, uint16_t queue_id)
  *  - (1) if the specific DD bit is set.
  *  - (0) if the specific DD bit is not set.
  *  - (-ENODEV) if *port_id* invalid.
+ *  - (-ENOTSUP) if the device does not support this function
  */
 static inline int
 rte_eth_rx_descriptor_done(uint8_t port_id, uint16_t queue_id, uint16_t offset)
@@ -2641,8 +2644,8 @@ rte_eth_rx_descriptor_done(uint8_t port_id, uint16_t 
queue_id, uint16_t offset)
struct rte_eth_dev *dev = _eth_devices[port_id];
 #ifdef RTE_LIBRTE_ETHDEV_DEBUG
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-   RTE_ETH_FPTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_done, -ENOTSUP);
 #endif
+   RTE_ETH_FPTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_done, -ENOTSUP);
return (*dev->dev_ops->rx_descriptor_done)( \
dev->data->rx_queues[queue_id], offset);
 }
-- 
2.4.3



[dpdk-dev] [PATCH v2 3/4] ethdev: remove duplicated debug functions

2015-09-09 Thread Bruce Richardson
The functions for rx/tx burst, for rx_queue_count and descriptor_done in
the ethdev library all had two copies of the code. One copy in
rte_ethdev.h was inlined for performance, while a second was in
rte_ethdev.c for debugging purposes only. We can eliminate the second
copy of the functions by moving the additional debug checks into the
copies of the functions in the header file. [Any compilation for
debugging at optimization level 0 will not inline the function, so the
result should be the same as when the function was in the .c file.]

Signed-off-by: Bruce Richardson 
---
 lib/librte_ether/rte_ethdev.c | 64 ---
 lib/librte_ether/rte_ethdev.h | 59 ---
 2 files changed, 29 insertions(+), 94 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 01e44c6..4ce59dd 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2827,70 +2827,6 @@ rte_eth_mirror_rule_reset(uint8_t port_id, uint8_t 
rule_id)
return (*dev->dev_ops->mirror_rule_reset)(dev, rule_id);
 }

-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-uint16_t
-rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
-struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
-{
-   struct rte_eth_dev *dev;
-
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-
-   dev = _eth_devices[port_id];
-   RTE_ETH_FPTR_OR_ERR_RET(*dev->rx_pkt_burst, 0);
-   if (queue_id >= dev->data->nb_rx_queues) {
-   RTE_PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", queue_id);
-   return 0;
-   }
-   return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
-   rx_pkts, nb_pkts);
-}
-
-uint16_t
-rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
-struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
-{
-   struct rte_eth_dev *dev;
-
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-
-   dev = _eth_devices[port_id];
-
-   RTE_ETH_FPTR_OR_ERR_RET(*dev->tx_pkt_burst, 0);
-   if (queue_id >= dev->data->nb_tx_queues) {
-   RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
-   return 0;
-   }
-   return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id],
-   tx_pkts, nb_pkts);
-}
-
-uint32_t
-rte_eth_rx_queue_count(uint8_t port_id, uint16_t queue_id)
-{
-   struct rte_eth_dev *dev;
-
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-
-   dev = _eth_devices[port_id];
-   RTE_ETH_FPTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, 0);
-   return (*dev->dev_ops->rx_queue_count)(dev, queue_id);
-}
-
-int
-rte_eth_rx_descriptor_done(uint8_t port_id, uint16_t queue_id, uint16_t offset)
-{
-   struct rte_eth_dev *dev;
-
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-   dev = _eth_devices[port_id];
-   RTE_ETH_FPTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_done, -ENOTSUP);
-   return 
(*dev->dev_ops->rx_descriptor_done)(dev->data->rx_queues[queue_id],
-  offset);
-}
-#endif
-
 int
 rte_eth_dev_callback_register(uint8_t port_id,
enum rte_eth_event_type event,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 5501b04..1113ed2 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2567,18 +2567,21 @@ extern int rte_eth_dev_set_vlan_pvid(uint8_t port_id, 
uint16_t pvid, int on);
  *   of pointers to *rte_mbuf* structures effectively supplied to the
  *   *rx_pkts* array.
  */
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-extern uint16_t rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
-struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
-#else
 static inline uint16_t
 rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
 {
-   struct rte_eth_dev *dev;
+   struct rte_eth_dev *dev = _eth_devices[port_id];

-   dev = _eth_devices[port_id];
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
+   RTE_ETH_FPTR_OR_ERR_RET(*dev->rx_pkt_burst, 0);

+   if (queue_id >= dev->data->nb_rx_queues) {
+   RTE_PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", queue_id);
+   return 0;
+   }
+#endif
int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
rx_pkts, nb_pkts);

@@ -2596,7 +2599,6 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,

return nb_rx;
 }
-#endif

 /**
  * Get the number of used descriptors in a specific queue
@@ -2608,18 +2610,16 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
  * @return
  *  The number of used descriptors in the specific queue.
  */
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-extern uint32_t rte_eth_rx_queue_count(uint8_t port_id, uint16_t queue_id);
-#else
 

[dpdk-dev] [PATCH v2 2/4] ethdev: move error checking macros to header

2015-09-09 Thread Bruce Richardson
Move the function ptr and port id checking macros to the header file, so
that they can be used in the static inline functions there. In doxygen
comments, mark them as for internal use only.

Signed-off-by: Bruce Richardson 
---
 lib/librte_ether/rte_ethdev.c | 38 --
 lib/librte_ether/rte_ethdev.h | 55 +++
 2 files changed, 55 insertions(+), 38 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 09af303..01e44c6 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -69,14 +69,6 @@
 #include "rte_ether.h"
 #include "rte_ethdev.h"

-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-#define RTE_PMD_DEBUG_TRACE(fmt, args...) do {\
-   RTE_LOG(ERR, PMD, "%s: " fmt, __func__, ## args); \
-   } while (0)
-#else
-#define RTE_PMD_DEBUG_TRACE(fmt, args...)
-#endif
-
 /* Macros for checking for restricting functions to primary instance only */
 #define PROC_PRIMARY_OR_ERR_RET(retval) do { \
if (rte_eal_process_type() != RTE_PROC_PRIMARY) { \
@@ -92,36 +84,6 @@
} \
 } while (0)

-/* Macros to check for invalid function pointers in dev_ops structure */
-#define RTE_ETH_FPTR_OR_ERR_RET(func, retval) do { \
-   if ((func) == NULL) { \
-   RTE_PMD_DEBUG_TRACE("Function not supported\n"); \
-   return (retval); \
-   } \
-} while (0)
-
-#define RTE_ETH_FPTR_OR_RET(func) do { \
-   if ((func) == NULL) { \
-   RTE_PMD_DEBUG_TRACE("Function not supported\n"); \
-   return; \
-   } \
-} while (0)
-
-/* Macros to check for valid port */
-#define RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, retval) do {  \
-   if (!rte_eth_dev_is_valid_port(port_id)) {  \
-   RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
-   return retval;  \
-   }   \
-} while (0)
-
-#define RTE_ETH_VALID_PORTID_OR_RET(port_id) do {  \
-   if (!rte_eth_dev_is_valid_port(port_id)) {  \
-   RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
-   return; \
-   }   \
-} while (0)
-
 static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
 struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
 static struct rte_eth_dev_data *rte_eth_dev_data;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index fa06554..5501b04 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -968,6 +968,61 @@ struct rte_eth_dev_callback;
 /** @internal Structure to keep track of registered callbacks */
 TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);

+/**
+ * @internal
+ *  Macro to print a message if in debugging mode
+ */
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+#define RTE_PMD_DEBUG_TRACE(fmt, args...) do {\
+   RTE_LOG(ERR, PMD, "%s: " fmt, __func__, ## args); \
+   } while (0)
+#else
+#define RTE_PMD_DEBUG_TRACE(fmt, args...)
+#endif
+
+/**
+ * @internal
+ *  Macro to check for invalid function pointer in dev_ops structure
+ */
+#define RTE_ETH_FPTR_OR_ERR_RET(func, retval) do { \
+   if ((func) == NULL) { \
+   RTE_PMD_DEBUG_TRACE("Function not supported\n"); \
+   return (retval); \
+   } \
+} while(0)
+/**
+ * @internal
+ *  Macro to check for invalid function pointer in dev_ops structure
+ */
+#define RTE_ETH_FPTR_OR_RET(func) do { \
+   if ((func) == NULL) { \
+   RTE_PMD_DEBUG_TRACE("Function not supported\n"); \
+   return; \
+   } \
+} while(0)
+
+/**
+ * @internal
+ * Macro to check for valid port id
+ */
+#define RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, retval) do {  \
+   if (!rte_eth_dev_is_valid_port(port_id)) {  \
+   RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
+   return retval;  \
+   }   \
+} while (0)
+
+/**
+ * @internal
+ * Macro to check for valid port id
+ */
+#define RTE_ETH_VALID_PORTID_OR_RET(port_id) do {  \
+   if (!rte_eth_dev_is_valid_port(port_id)) {  \
+   RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
+   return; \
+   }   \
+} while (0)
+
 /*
  * Definitions of all functions exported by an Ethernet driver through the
  * the generic structure of type *eth_dev_ops* supplied in the *rte_eth_dev*
-- 
2.4.3



[dpdk-dev] [PATCH v2 1/4] ethdev: rename macros to have RTE_ETH prefix

2015-09-09 Thread Bruce Richardson
The macros to check that the function pointers and port ids are valid
for an ethdev are potentially useful to have in the ethdev.h file.
However, since they would then become externally visible, we apply
the RTE_ETH prefix to them.

Signed-off-by: Bruce Richardson 
---
 lib/librte_ether/rte_ethdev.c | 612 +-
 1 file changed, 306 insertions(+), 306 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index b309309..09af303 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -70,54 +70,54 @@
 #include "rte_ethdev.h"

 #ifdef RTE_LIBRTE_ETHDEV_DEBUG
-#define PMD_DEBUG_TRACE(fmt, args...) do {\
+#define RTE_PMD_DEBUG_TRACE(fmt, args...) do {\
RTE_LOG(ERR, PMD, "%s: " fmt, __func__, ## args); \
} while (0)
 #else
-#define PMD_DEBUG_TRACE(fmt, args...)
+#define RTE_PMD_DEBUG_TRACE(fmt, args...)
 #endif

 /* Macros for checking for restricting functions to primary instance only */
 #define PROC_PRIMARY_OR_ERR_RET(retval) do { \
if (rte_eal_process_type() != RTE_PROC_PRIMARY) { \
-   PMD_DEBUG_TRACE("Cannot run in secondary processes\n"); \
+   RTE_PMD_DEBUG_TRACE("Cannot run in secondary processes\n"); \
return (retval); \
} \
 } while (0)

 #define PROC_PRIMARY_OR_RET() do { \
if (rte_eal_process_type() != RTE_PROC_PRIMARY) { \
-   PMD_DEBUG_TRACE("Cannot run in secondary processes\n"); \
+   RTE_PMD_DEBUG_TRACE("Cannot run in secondary processes\n"); \
return; \
} \
 } while (0)

 /* Macros to check for invalid function pointers in dev_ops structure */
-#define FUNC_PTR_OR_ERR_RET(func, retval) do { \
+#define RTE_ETH_FPTR_OR_ERR_RET(func, retval) do { \
if ((func) == NULL) { \
-   PMD_DEBUG_TRACE("Function not supported\n"); \
+   RTE_PMD_DEBUG_TRACE("Function not supported\n"); \
return (retval); \
} \
 } while (0)

-#define FUNC_PTR_OR_RET(func) do { \
+#define RTE_ETH_FPTR_OR_RET(func) do { \
if ((func) == NULL) { \
-   PMD_DEBUG_TRACE("Function not supported\n"); \
+   RTE_PMD_DEBUG_TRACE("Function not supported\n"); \
return; \
} \
 } while (0)

 /* Macros to check for valid port */
-#define VALID_PORTID_OR_ERR_RET(port_id, retval) do {  \
+#define RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, retval) do {  \
if (!rte_eth_dev_is_valid_port(port_id)) {  \
-   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
+   RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
return retval;  \
}   \
 } while (0)

-#define VALID_PORTID_OR_RET(port_id) do {  \
+#define RTE_ETH_VALID_PORTID_OR_RET(port_id) do {  \
if (!rte_eth_dev_is_valid_port(port_id)) {  \
-   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
+   RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
return; \
}   \
 } while (0)
@@ -241,7 +241,7 @@ rte_eth_dev_allocate(const char *name, enum 
rte_eth_dev_type type)

port_id = rte_eth_dev_find_free_port();
if (port_id == RTE_MAX_ETHPORTS) {
-   PMD_DEBUG_TRACE("Reached maximum number of Ethernet ports\n");
+   RTE_PMD_DEBUG_TRACE("Reached maximum number of Ethernet 
ports\n");
return NULL;
}

@@ -249,7 +249,7 @@ rte_eth_dev_allocate(const char *name, enum 
rte_eth_dev_type type)
rte_eth_dev_data_alloc();

if (rte_eth_dev_allocated(name) != NULL) {
-   PMD_DEBUG_TRACE("Ethernet Device with name %s already 
allocated!\n",
+   RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s already 
allocated!\n",
name);
return NULL;
}
@@ -336,7 +336,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
if (diag == 0)
return 0;

-   PMD_DEBUG_TRACE("driver %s: eth_dev_init(vendor_id=0x%u device_id=0x%x) 
failed\n",
+   RTE_PMD_DEBUG_TRACE("driver %s: eth_dev_init(vendor_id=0x%u 
device_id=0x%x) failed\n",
pci_drv->name,
(unsigned) pci_dev->id.vendor_id,
(unsigned) pci_dev->id.device_id);
@@ -470,10 +470,10 @@ rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, 
uint8_t *port_id)
 static int
 rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr)
 {
-   VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

if (addr 

[dpdk-dev] [PATCH v2 0/4] ethdev: minor cleanup

2015-09-09 Thread Bruce Richardson
This patchset performs two cleanups:
1. Four functions in ethdev.c which were enabled for debug only have been
  merged into their inlined header-file counterparts. This change required that
  a number of macros be renamed and moved to the header file too. The macro
  changes are in patches 1 & 2, and the elimination of the separate debug
  functions is in patch 3.
2. Checks for valid function pointers are added to the API calls for reading
  the descriptor ring count and checking for a valid descriptor. This is
  because these functions are not implemented by most drivers, and so it is
  far safer to have the check.

---

V2 Changes:
* Rebased to latest DPDK codebase
* Changed type from uint32_t to int for the count function, on the basis of
feedback received.

Bruce Richardson (4):
  ethdev: rename macros to have RTE_ETH prefix
  ethdev: move error checking macros to header
  ethdev: remove duplicated debug functions
  ethdev: check driver support for functions

 lib/librte_ether/rte_ethdev.c | 674 ++
 lib/librte_ether/rte_ethdev.h | 121 ++--
 2 files changed, 375 insertions(+), 420 deletions(-)

-- 
2.4.3



[dpdk-dev] [PATCH v3 1/1] ip_frag: fix creating ipv6 fragment extension header

2015-09-09 Thread Piotr Azarewicz
The previous implementation won't work in every environment. The order of
allocation of bit-fields within a unit (high-order to low-order or
low-order to high-order) is implementation-defined.
Solution: use bytes instead of bit-fields.

v2 changes:
- remove useless union
- fix process_ipv6 function (due to remove the union above)

v3 changes:
- add macros to read/set fragment_offset and more_flags values

Signed-off-by: Piotr Azarewicz 
---
 lib/librte_ip_frag/rte_ip_frag.h|   27 ---
 lib/librte_ip_frag/rte_ipv6_fragmentation.c |   12 ++--
 lib/librte_port/rte_port_ras.c  |7 ---
 3 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
index 52f44c9..92cedf2 100644
--- a/lib/librte_ip_frag/rte_ip_frag.h
+++ b/lib/librte_ip_frag/rte_ip_frag.h
@@ -128,19 +128,24 @@ struct rte_ip_frag_tbl {
 };

 /** IPv6 fragment extension header */
+#define RTE_IPV6_EHDR_MF_SHIFT  0
+#define RTE_IPV6_EHDR_MF_MASK   1
+#define RTE_IPV6_EHDR_FO_SHIFT  3
+#define RTE_IPV6_EHDR_FO_MASK   (~((1 << RTE_IPV6_EHDR_FO_SHIFT) - 1))
+
+#define RTE_IPV6_FRAG_USED_MASK \
+   (RTE_IPV6_EHDR_MF_MASK | RTE_IPV6_EHDR_FO_MASK)
+
+#define RTE_IPV6_GET_MF(x)  ((x) & RTE_IPV6_EHDR_MF_MASK)
+#define RTE_IPV6_GET_FO(x)  ((x) >> RTE_IPV6_EHDR_FO_SHIFT)
+
+#define RTE_IPV6_SET_FRAG_DATA(fo, mf)  \
+   (((fo) & RTE_IPV6_EHDR_FO_MASK) | ((mf) & RTE_IPV6_EHDR_MF_MASK))
+
 struct ipv6_extension_fragment {
uint8_t next_header;/**< Next header type */
-   uint8_t reserved1;  /**< Reserved */
-   union {
-   struct {
-   uint16_t frag_offset:13; /**< Offset from the start of the packet */
-   uint16_t reserved2:2; /**< Reserved */
-   uint16_t more_frags:1;
-   /**< 1 if more fragments left, 0 if last fragment */
-   };
-   uint16_t frag_data;
-   /**< union of all fragmentation data */
-   };
+   uint8_t reserved;   /**< Reserved */
+   uint16_t frag_data; /**< All fragmentation data */
uint32_t id;/**< Packet ID */
 } __attribute__((__packed__));

diff --git a/lib/librte_ip_frag/rte_ipv6_fragmentation.c 
b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
index 0e32aa8..251a4b8 100644
--- a/lib/librte_ip_frag/rte_ipv6_fragmentation.c
+++ b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
@@ -46,12 +46,6 @@
  *
  */

-/* Fragment Extension Header */
-#define IPV6_HDR_MF_SHIFT   0
-#define IPV6_HDR_FO_SHIFT   3
-#define IPV6_HDR_MF_MASK    (1 << IPV6_HDR_MF_SHIFT)
-#define IPV6_HDR_FO_MASK    ((1 << IPV6_HDR_FO_SHIFT) - 1)
-
 static inline void
 __fill_ipv6hdr_frag(struct ipv6_hdr *dst,
const struct ipv6_hdr *src, uint16_t len, uint16_t fofs,
@@ -65,10 +59,8 @@ __fill_ipv6hdr_frag(struct ipv6_hdr *dst,

fh = (struct ipv6_extension_fragment *) ++dst;
fh->next_header = src->proto;
-   fh->reserved1   = 0;
-   fh->frag_offset = rte_cpu_to_be_16(fofs);
-   fh->reserved2   = 0;
-   fh->more_frags  = rte_cpu_to_be_16(mf);
+   fh->reserved = 0;
+   fh->frag_data = rte_cpu_to_be_16(RTE_IPV6_SET_FRAG_DATA(fofs, mf));
fh->id = 0;
 }

diff --git a/lib/librte_port/rte_port_ras.c b/lib/librte_port/rte_port_ras.c
index 6bd0f8c..8a2e554 100644
--- a/lib/librte_port/rte_port_ras.c
+++ b/lib/librte_port/rte_port_ras.c
@@ -212,12 +212,13 @@ process_ipv6(struct rte_port_ring_writer_ras *p, struct 
rte_mbuf *pkt)
struct ipv6_hdr *pkt_hdr = rte_pktmbuf_mtod(pkt, struct ipv6_hdr *);

struct ipv6_extension_fragment *frag_hdr;
+   uint16_t frag_data = 0;
frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(pkt_hdr);
-   uint16_t frag_offset = frag_hdr->frag_offset;
-   uint16_t frag_flag = frag_hdr->more_frags;
+   if (frag_hdr != NULL)
+   frag_data = rte_be_to_cpu_16(frag_hdr->frag_data);

/* If it is a fragmented packet, then try to reassemble */
-   if ((frag_flag == 0) && (frag_offset == 0))
+   if ((frag_data & RTE_IPV6_FRAG_USED_MASK) == 0)
p->tx_buf[p->tx_buf_count++] = pkt;
else {
struct rte_mbuf *mo;
-- 
1.7.9.5



[dpdk-dev] [PATCH v4 0/2] ethdev: add port speed capability bitmap

2015-09-09 Thread Thomas Monjalon
2015-09-09 15:10, Nélio Laranjeiro:
> I think V2 is better, maybe you can add a function to convert a single
> bitmap value to the equivalent integer and get rid of ETH_SPEED_XXX macros.
> 
> Thomas what is your opinion?

Your proposal looks good Nelio.
Thanks


[dpdk-dev] [PATCH] i40e: fix base driver allocation when on numa != 0

2015-09-09 Thread Thomas Monjalon
> > Seen by code review.
> > 
> > If dpdk is run with memory only available on socket 0, then i40e pmd 
> > refuses to
> > initialize ports as this pmd requires some memory on socket 0.
> > Fix this by setting socket to SOCKET_ID_ANY, so that allocations happen on 
> > the
> > caller socket.
> > 
> > Signed-off-by: David Marchand 
> Acked-by: Helin Zhang 

Applied, thanks


[dpdk-dev] Random packet drops with ip_pipeline on R730.

2015-09-09 Thread husainee
Hi Cristian
PFA the config file.

I am sending packets from port0 and receiving on port1.

By random packet drops I mean, on every run the number of packets
dropped is not the same. Here are some results below.

Frame sent rate 1488095.2 fps, 64Byte packets (100% of 1000Mbps)
Run1- 0.0098% (20-22 Million Packets)
Run2- 0.021% (20-22 Million Packets)
Run3- 0.0091% (20-22 Million Packets)

Frame rate 744047.62 fps, 64 Byte packets, (50% of 1000Mbps)
Run1- 0.0047% (20-22 Million Packets)
Run2- 0.0040% (20-22 Million Packets)
Run3- 0.0040% (20-22 Million Packets)


Frame rate 148809.52 fps, 64 Byte packets,(10% of 1000Mbps)
Run1- 0 (20-22 Million Packets)
Run2- 0 (20-22 Million Packets)
Run3- 0 (20-22 Million Packets)



Following are the hw NIC setting differences between the ip_pipeline and l2fwd apps:

parameter                 ip_pipeline    l2fwd
jumbo frame               1              0
hw_ip_checksum            1              0
rx_conf.wthresh           4              0
rx_conf.rx_free_thresh    64             32
tx_conf.pthresh           36             32
burst size                64             32


We tried to make the ip_pipeline settings same as l2fwd but no change in
results.

I have not tried with 10GbE . I do not have 10GbE test equipment.



regards
husainee




On 09/08/2015 06:32 PM, Dumitrescu, Cristian wrote:
> Hi Husainee,
>
> Can you please explain what you mean by random packet drops? What 
> percentage of the input packets get dropped, does it take place on every run, 
> does the number of dropped packets vary on every run, etc.?
>
> Are you also able to reproduce this issue with other NICs, e.g. 10GbE NIC?
>
> Can you share your config file?
>
> Can you please double check the low level NIC settings between the two 
> applications, i.e. the settings in structures link_params_default, 
> default_hwq_in_params, default_hwq_out_params from ip_pipeline file 
> config_parse.c vs. their equivalents from l2fwd? The only thing I can think 
> of right now is maybe one of the low level threshold values for the Ethernet 
> link is not tuned for your 1GbE NIC.
>
> Regards,
> Cristian
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of husainee
>> Sent: Tuesday, September 8, 2015 7:56 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] Random packet drops with ip_pipeline on R730.
>>
>> Hi
>>
>> I am using a DELL730 with Dual socket. Processor in each socket is
>> Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz- 6Cores.
>> The CPU layout has socket 0 with 0,2,4,6,8,10 cores and socket 1 with
>> 1,3,5,7,9,11 cores.
>> The NIC card is i350.
>>
>> The Cores 2-11 are isolated using isolcpus kernel parameter. We are
>> running the ip_peipeline application with only Master, RX and TX threads
>> (Flow and Route have been removed from cfg file). The threads are run as
>> follows
>>
>> - Master on CPU core 2
>> - RX on CPU core 4
>> - TX on CPU core 6
>>
>> 64-byte packets are sent from Ixia at different speeds, but we are
>> seeing random packet drops. The same exercise is done on cores 3, 5, 7 and
>> the results are the same.
>>
>> We tried the l2fwd app and it works fine with no packet drops.
>>
>> Hugepages are configured as 1024 x 2M per socket.
>>
>>
>> Can anyone suggest what could be the reason for these random packet
>> drops?
>>
>> regards
>> husainee
>  



-- next part --
[Attached config file: only its BSD license header survived in the archive.]

[dpdk-dev] [PATCH v3] ixgbe: remove vector pmd burst size restriction

2015-09-09 Thread Thomas Monjalon
> > On the receive side, the burst size now floor-aligns to
> > RTE_IXGBE_DESCS_PER_LOOP, which is a power of 2.
> > According to this rule, a burst size of less than 4 still won't receive
> > anything.
> > (Before this change, a burst size of less than 32 couldn't receive
> > anything.)
> > _recv_*_pkts_vec returns no more than 32 (RTE_IXGBE_RXQ_REARM_THRESH)
> > packets.
> > 
> > On the transmit side, the max burst size is no longer bound to a
> > constant, although it still needs to check for crossing the tx_rs_thresh
> > threshold.
> > 
> > There's no obvious performance drop found for either recv_pkts_vec
> > or recv_scattered_pkts_vec at burst size 32.
> > 
> > Signed-off-by: Cunming Liang 
> 
> Acked-by: Konstantin Ananyev 

Applied, thanks


[dpdk-dev] [PATCH v4 0/2] ethdev: add port speed capability bitmap

2015-09-09 Thread Nélio Laranjeiro
Marc,

(making this discussion public again)

On Wed, Sep 09, 2015 at 12:07:01PM +0200, Marc Sune wrote:
> Hi Nelio
> 
> 2015-09-09 11:08 GMT+02:00 Nélio Laranjeiro :
> 
> Marc,
> 
> On Tue, Sep 08, 2015 at 10:24:36PM +0200, Marc Sune wrote:
> > Neilo,
> >
> > 2015-09-08 12:03 GMT+02:00 Nélio Laranjeiro :
> >
> >    On Mon, Sep 07, 2015 at 10:52:53PM +0200, Marc Sune wrote:
> >    > 2015-08-29 2:16 GMT+02:00 Marc Sune :
> >    >
> >    > > The current rte_eth_dev_info abstraction does not provide any
> >    > > mechanism to get the supported speed(s) of an ethdev.
> >    > >
> >    > > For some drivers (e.g. ixgbe), an educated guess can be done
> >    > > based on the driver's name (driver_name in rte_eth_dev_info), see:
> >    > >
> >    > > http://dpdk.org/ml/archives/dev/2013-August/000412.html
> >    > >
> >    > > However, i) doing string comparisons is annoying, and can silently
> >    > > break existing applications if PMDs change their names ii) it does
> >    > > not provide all the supported capabilities of the ethdev iii) for
> >    > > some drivers it is impossible to determine correctly the (max)
> >    > > speed by the application (e.g. in i40e, distinguish between XL710
> >    > > and X710).
> >    > >
> >    > > This small patch adds a speed_capa bitmap in rte_eth_dev_info,
> >    > > which is filled by the PMDs according to the physical device
> >    > > capabilities.
> >    > >
> >    > > v2: rebase, converted speed_capa into 32 bits bitmap, fixed
> >    > > alignment (checkpatch).
> >    > >
> >    > > v3: rebase to v2.1. unified ETH_LINK_SPEED and ETH_SPEED_CAP into
> >    > > ETH_SPEED. Converted field speed in struct rte_eth_conf to speeds,
> >    > > to allow a bitmap for defining the announced speeds, as suggested
> >    > > by M. Brorup. Fixed spelling issues.
> >    > >
> >    > > v4: fixed errata in the documentation of field speeds of
> >    > > rte_eth_conf, and commit 1/2 message. rebased to v2.1.0. v3 was
> >    > > incorrectly based on ~2.1.0-rc1.
> >    > >
> >    >
> >    > Thomas,
> >    >
> >    > Since mostly you were commenting for v1 and v2; any opinion on this
> >    > one?
> >    >
> >    > Regards
> >    > marc
> >
> >    Hi Marc,
> >
> >    I have read your patches, and there are a few mistakes, for instance
> >    mlx4 (ConnectX-3 devices) does not support 100Gbps.
> >
> >
> > When I circulated v1 and v2 I was kindly asking maintainers and
> > reviewers of the drivers to fix any mistakes in SPEED capabilities,
> > since I was taking the speeds from the online websites. Some were fixed,
> > but apparently some were still missing. I will remove 100Gbps. Please
> > circulate any other error you have spotted.
> 
> From the Mellanox website:
> - ConnectX-3 EN: 10/40/56Gb/s
> - ConnectX-3 Pro EN 10GBASE-T: 10Gb/s
> - ConnectX-3 Pro EN: 10/40/56GbE
> - ConnectX-3 Pro Programmable: 10/40Gb/s
>
> This PMD works with any of the ConnectX-3 adapters, so the announced speed
> should be 10/40/56Gb/s.
>
> I will change this.
>
> 
> >    In addition, it seems your new bitmap does not support all kinds of
> >    speeds; take a look at the Ethtool header in the Linux kernel
> >    (include/uapi/linux/ethtool.h), which already consumes 30 bits
> >    without even managing speeds above 56Gbps.
> >
> >
> > The bitmaps you are referring to are SUPPORTED_ and ADVERTISED_. These
> > bitmaps contain not only the speeds but also PHY properties (e.g. BASE
> > for ETH).
> >
> > The intention of this patch was to expose speed capabilities, similar
> > to the bitmap SPEED_ in include/uapi/linux/ethtool.h, which as you see
> > maps closely to the ETH_SPEED_ proposed in this patch.
> >
> > I think the encoding of other things, like the exact model of the
> > interface and its PHY details, should go somewhere else. But I might be
> > wrong here, so I am open to hearing opinions.
> 
> I understand the need to have capability fields, but I don't understand
> why you want to mix speeds and duplex mode in something which was
> previously only handling speeds.
> 
> 
> Please refer to the comments from Thomas. He was arguing that the
> duplication of speeds between link and capabilities was not necessary,
> hence patches v3 and v4 unify them. The reason why there is only 100 and
> 100_HD is because of that and the "solution" Thomas was proposing.
>
> I was originally doing as you suggested, 

[dpdk-dev] [PATCH v4 0/2] ethdev: add port speed capability bitmap

2015-09-09 Thread Marc Sune
Answering

2015-09-09 11:29 GMT+02:00 Nélio Laranjeiro :

>
> Marc,
>
> On Tue, Sep 08, 2015 at 10:24:36PM +0200, Marc Sune wrote:
> > Neilo,
> >
> > 2015-09-08 12:03 GMT+02:00 Nélio Laranjeiro :
> >
> > On Mon, Sep 07, 2015 at 10:52:53PM +0200, Marc Sune wrote:
> > > 2015-08-29 2:16 GMT+02:00 Marc Sune :
> > >
> > > > The current rte_eth_dev_info abstraction does not provide any
> > > > mechanism to get the supported speed(s) of an ethdev.
> > > >
> > > > For some drivers (e.g. ixgbe), an educated guess can be done based
> > > > on the driver's name (driver_name in rte_eth_dev_info), see:
> > > >
> > > > http://dpdk.org/ml/archives/dev/2013-August/000412.html
> > > >
> > > > However, i) doing string comparisons is annoying, and can silently
> > > > break existing applications if PMDs change their names ii) it does
> > > > not provide all the supported capabilities of the ethdev iii) for
> > > > some drivers it is impossible to determine correctly the (max)
> > > > speed by the application (e.g. in i40e, distinguish between XL710
> > > > and X710).
> > > >
> > > > This small patch adds a speed_capa bitmap in rte_eth_dev_info,
> > > > which is filled by the PMDs according to the physical device
> > > > capabilities.
> > > >
> > > > v2: rebase, converted speed_capa into 32 bits bitmap, fixed
> > > > alignment (checkpatch).
> > > >
> > > > v3: rebase to v2.1. unified ETH_LINK_SPEED and ETH_SPEED_CAP into
> > > > ETH_SPEED. Converted field speed in struct rte_eth_conf to speeds,
> > > > to allow a bitmap for defining the announced speeds, as suggested
> > > > by M. Brorup. Fixed spelling issues.
> > > >
> > > > v4: fixed errata in the documentation of field speeds of
> > > > rte_eth_conf, and commit 1/2 message. rebased to v2.1.0. v3 was
> > > > incorrectly based on ~2.1.0-rc1.
> > > >
> > >
> > > Thomas,
> > >
> > > Since mostly you were commenting for v1 and v2; any opinion on this
> > > one?
> > >
> > > Regards
> > > marc
> >
> > Hi Marc,
> >
> > I have read your patches, and there are a few mistakes, for instance
> mlx4
> > (ConnectX-3 devices) does not support 100Gbps.
> >
> >
> > When I circulated v1 and v2 I was kindly asking maintainers and
> > reviewers of the drivers to fix any mistakes in SPEED capabilities,
> > since I was taking the speeds from the online websites. Some were fixed,
> > but apparently some were still missing. I will remove 100Gbps. Please
> > circulate any other error you have spotted.
>
> From the Mellanox website:
>  - ConnectX-3 EN: 10/40/56Gb/s
>  - ConnectX-3 Pro EN 10GBASE-T: 10Gb/s
>  - ConnectX-3 Pro EN: 10/40/56GbE
>  - ConnectX-3 Pro Programmable: 10/40Gb/s
>
> This PMD works with any of the ConnectX-3 adapters, so the announced speed
> should be 10/40/56Gb/s.
>

I will change this, thanks.


>
> > In addition, it seems your new bitmap does not support all kinds of
> > speeds; take a look at the Ethtool header in the Linux kernel
> > (include/uapi/linux/ethtool.h), which already consumes 30 bits without
> > even managing speeds above 56Gbps.
> >
> >
> > The bitmaps you are referring to are SUPPORTED_ and ADVERTISED_. These
> > bitmaps contain not only the speeds but also PHY properties (e.g. BASE
> > for ETH).
> >
> > The intention of this patch was to expose speed capabilities, similar
> > to the bitmap SPEED_ in include/uapi/linux/ethtool.h, which as you see
> > maps closely to the ETH_SPEED_ proposed in this patch.
> >
> > I think the encoding of other things, like the exact model of the
> > interface and its PHY details, should go somewhere else. But I might be
> > wrong here, so I am open to hearing opinions.
>
> I understand the need to have capability fields, but I don't understand
> why you want to mix speeds and duplex mode in something which was
> previously only handling speeds.
>

Please refer to the discussion in this thread for patches v1 and v2,
especially the comments by Thomas. He was arguing that the duplication of
speeds between link and capabilities was not necessary. Patches v3 and v4
try to unify it. The reason why there is only 100 and 100_HD is because of
the use on both link and capabilities.

I was originally doing as you suggested, separating them and not changing
current APIs. There seemed to be a consensus on that, so let's just go back
to that discussion if needed.


>
> We now have redundant information in struct rte_eth_conf, whereas
> that structure has a speed field which embeds the duplex mode and
> a duplex field which does the same, which one 

[dpdk-dev] [PATCH] ixgbe: fix a x550 DCB issue

2015-09-09 Thread Thomas Monjalon
> > There's a DCB issue on x550. For 8 TCs, if a packet with user priority 6
> > or 7 is injected to the NIC, then the NIC will put 3 packets into the
> > queue. There's also a similar issue for 4 TCs.
> > The root cause is that RXPBSIZE is not right. The RXPBSIZE of x550 is
> > 384, which is different from other 10G NICs. We need to set RXPBSIZE
> > according to the NIC type.
> > 
> > Signed-off-by: Wenzhuo Lu 
> Acked-by: Jingjing Wu 

Applied, thanks


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-09 Thread Thomas Monjalon
2015-08-25 20:13, Zhang, Helin:
> Yes, I got the perfect answers. Thank you very much!
> I just wanted to make sure the test case was OK with the limit on the
> maximum number of descriptors, as I heard there is a hang issue on other
> NICs when using more descriptors than the hardware allows.
> OK. I am still waiting for the answers/confirmation from the x540 hardware
> designers. We all need to agree on your patches to avoid risks.

Helin, any news?
Can we apply v4 of this patch?


[dpdk-dev] i40e: problem with rx packet drops not accounted in statistics

2015-09-09 Thread Martin Weiser
Hi Helin,

in one of our test setups involving i40e adapters we are experiencing
packet drops which are not reflected in the interface statistics.
The call to rte_eth_stats_get suggests that all packets were properly
received but the total number of packets received through
rte_eth_rx_burst is less than the ipackets counter.
When for example running the l2fwd application (l2fwd -c 0xfe -n 4 -- -p
0x3) and having driver debug messages enabled the following output is
generated for the interface in question:

...
PMD: i40e_update_vsi_stats(): * VSI[6] stats start
***
PMD: i40e_update_vsi_stats(): rx_bytes:24262434
PMD: i40e_update_vsi_stats(): rx_unicast:  16779
PMD: i40e_update_vsi_stats(): rx_multicast:0
PMD: i40e_update_vsi_stats(): rx_broadcast:0
PMD: i40e_update_vsi_stats(): rx_discards: 1192557
PMD: i40e_update_vsi_stats(): rx_unknown_protocol: 0
PMD: i40e_update_vsi_stats(): tx_bytes:0
PMD: i40e_update_vsi_stats(): tx_unicast:  0
PMD: i40e_update_vsi_stats(): tx_multicast:0
PMD: i40e_update_vsi_stats(): tx_broadcast:0
PMD: i40e_update_vsi_stats(): tx_discards: 0
PMD: i40e_update_vsi_stats(): tx_errors:   0
PMD: i40e_update_vsi_stats(): * VSI[6] stats end
***
PMD: i40e_dev_stats_get(): * PF stats start
***
PMD: i40e_dev_stats_get(): rx_bytes:24262434
PMD: i40e_dev_stats_get(): rx_unicast:  16779
PMD: i40e_dev_stats_get(): rx_multicast:0
PMD: i40e_dev_stats_get(): rx_broadcast:0
PMD: i40e_dev_stats_get(): rx_discards: 0
PMD: i40e_dev_stats_get(): rx_unknown_protocol: 16779
PMD: i40e_dev_stats_get(): tx_bytes:0
PMD: i40e_dev_stats_get(): tx_unicast:  0
PMD: i40e_dev_stats_get(): tx_multicast:0
PMD: i40e_dev_stats_get(): tx_broadcast:0
PMD: i40e_dev_stats_get(): tx_discards: 0
PMD: i40e_dev_stats_get(): tx_errors:   0
PMD: i40e_dev_stats_get(): tx_dropped_link_down: 0
PMD: i40e_dev_stats_get(): crc_errors:   0
PMD: i40e_dev_stats_get(): illegal_bytes:0
PMD: i40e_dev_stats_get(): error_bytes:  0
PMD: i40e_dev_stats_get(): mac_local_faults: 1
PMD: i40e_dev_stats_get(): mac_remote_faults:1
PMD: i40e_dev_stats_get(): rx_length_errors: 0
PMD: i40e_dev_stats_get(): link_xon_rx:  0
PMD: i40e_dev_stats_get(): link_xoff_rx: 0
PMD: i40e_dev_stats_get(): priority_xon_rx[0]:  0
PMD: i40e_dev_stats_get(): priority_xoff_rx[0]: 0
PMD: i40e_dev_stats_get(): priority_xon_rx[1]:  0
PMD: i40e_dev_stats_get(): priority_xoff_rx[1]: 0
PMD: i40e_dev_stats_get(): priority_xon_rx[2]:  0
PMD: i40e_dev_stats_get(): priority_xoff_rx[2]: 0
PMD: i40e_dev_stats_get(): priority_xon_rx[3]:  0
PMD: i40e_dev_stats_get(): priority_xoff_rx[3]: 0
PMD: i40e_dev_stats_get(): priority_xon_rx[4]:  0
PMD: i40e_dev_stats_get(): priority_xoff_rx[4]: 0
PMD: i40e_dev_stats_get(): priority_xon_rx[5]:  0
PMD: i40e_dev_stats_get(): priority_xoff_rx[5]: 0
PMD: i40e_dev_stats_get(): priority_xon_rx[6]:  0
PMD: i40e_dev_stats_get(): priority_xoff_rx[6]: 0
PMD: i40e_dev_stats_get(): priority_xon_rx[7]:  0
PMD: i40e_dev_stats_get(): priority_xoff_rx[7]: 0
PMD: i40e_dev_stats_get(): link_xon_tx:  0
PMD: i40e_dev_stats_get(): link_xoff_tx: 0
PMD: i40e_dev_stats_get(): priority_xon_tx[0]:  0
PMD: i40e_dev_stats_get(): priority_xoff_tx[0]: 0
PMD: i40e_dev_stats_get(): priority_xon_2_xoff[0]:  0
PMD: i40e_dev_stats_get(): priority_xon_tx[1]:  0
PMD: i40e_dev_stats_get(): priority_xoff_tx[1]: 0
PMD: i40e_dev_stats_get(): priority_xon_2_xoff[1]:  0
PMD: i40e_dev_stats_get(): priority_xon_tx[2]:  0
PMD: i40e_dev_stats_get(): priority_xoff_tx[2]: 0
PMD: i40e_dev_stats_get(): priority_xon_2_xoff[2]:  0
PMD: i40e_dev_stats_get(): priority_xon_tx[3]:  0
PMD: i40e_dev_stats_get(): priority_xoff_tx[3]: 0
PMD: i40e_dev_stats_get(): priority_xon_2_xoff[3]:  0
PMD: i40e_dev_stats_get(): priority_xon_tx[4]:  0
PMD: i40e_dev_stats_get(): priority_xoff_tx[4]: 0
PMD: i40e_dev_stats_get(): priority_xon_2_xoff[4]:  0
PMD: i40e_dev_stats_get(): priority_xon_tx[5]:  0
PMD: i40e_dev_stats_get(): priority_xoff_tx[5]: 0
PMD: i40e_dev_stats_get(): priority_xon_2_xoff[5]:  0
PMD: i40e_dev_stats_get(): priority_xon_tx[6]:  0
PMD: i40e_dev_stats_get(): priority_xoff_tx[6]: 0
PMD: i40e_dev_stats_get(): priority_xon_2_xoff[6]:  0
PMD: i40e_dev_stats_get(): priority_xon_tx[7]:  0
PMD: i40e_dev_stats_get(): priority_xoff_tx[7]: 0
PMD: i40e_dev_stats_get(): priority_xon_2_xoff[7]:  0
PMD: i40e_dev_stats_get(): rx_size_64:   0
PMD: i40e_dev_stats_get(): rx_size_127:  0
PMD: i40e_dev_stats_get(): rx_size_255:  0
PMD: 

[dpdk-dev] ixgbe: account more Rx errors Issue

2015-09-09 Thread Kyle Larose
On Mon, Sep 7, 2015 at 7:44 AM, Tahhan, Maryam 
wrote:

> > From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> > Sent: Monday, September 7, 2015 9:30 AM
> > To: Tahhan, Maryam; Andriy Berestovskyy
> > Cc: dev at dpdk.org
> > Subject: Re: ixgbe: account more Rx errors Issue
> >
> > Hi,
> >
> > On 09/06/2015 07:15 PM, Tahhan, Maryam wrote:
> > >> From: Andriy Berestovskyy [mailto:aber at semihalf.com]
> > >> Sent: Friday, September 4, 2015 5:59 PM
> > >> To: Tahhan, Maryam
> > >> Cc: dev at dpdk.org; Olivier MATZ
> > >> Subject: Re: ixgbe: account more Rx errors Issue
> > >>
> > >> Hi Maryam,
> > >> Please see below.
> > >>
> > >>> XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors
> > >>
> > >> Please note than UDP checksum is optional for IPv4, but UDP packets
> > >> with zero checksum hit XEC.
> > >>
> > >
> > > I understand, but this is what the hardware register is picking up and
> what I
> > included previously is the definitions of the registers from the
> datasheet.
> > >
> > >>> And general crc errors counts Counts the number of receive packets
> > >>> with
> > >> CRC errors.
> > >>
> > >> Let me explain you with an example.
> > >>
> > >> DPDK 2.0 behavior:
> > >> host A sends 10M IPv4 UDP packets (no checksum) to host B; host B
> > >> stats: 9M ipackets + 1M ierrors (missed) = 10M
> > >>
> > >> DPDK 2.1 behavior:
> > >> host A sends 10M IPv4 UDP packets (no checksum) to host B; host B
> > >> stats: 9M ipackets + 11M ierrors (1M missed + 10M XEC) = 20M?
> > >
> > > Because it's hitting the 2 error registers. If you had packets with
> multiple
> > errors that are added up as part of ierrors you'll still be getting more
> than
> > 10M errors which is why I asked for feedback on the 3 suggestions below.
> > What I'm saying is the number of errors being > the number of received
> > packets will be seen if you hit multiple error registers on the NIC.
> > >
> > >>
> > >>> So our options are we can:
> > >>> 1. Add only one of these into the error stats.
> > >>> 2. We can introduce some cooking of stats in this scenario, so only
> > >>> add
> > >> either or if they are equal or one is higher than the other.
> > >>> 3. Add them all which means you can have more errors than the number
> > >>> of
> > >> received packets, but TBH this is going to be the case if your
> > >> packets have multiple errors anyway.
> > >>
> > >> 4. ierrors should reflect NIC drops only.
> > >
> > > I may have misinterpreted this, but ierrors in rte_ethdev.h is defined
> > > as the total number of erroneous received packets.
> > > Maybe we need a clear definition or a separate drop counter as I see
> > uint64_t q_errors defined as: Total number of queue packets received that
> > are dropped.
> > >
> > >> XEC does not count drops, so IMO it should be removed from ierrors.
> > >
> > > While it's picking up the 0 checksum as an error (which it shouldn't
> > > necessarily be doing), removing it could mean missing other valid
> > > L3/L4 checksum errors... Let me experiment some more with L3/L4
> > > checksum errors and crcerrs to see if we can cook the stats around
> > > this register in particular. I would hate to remove it and miss
> > > genuine errors
> >
> > For me, the definition that looks the most straightforward is:
> >
> > ipackets = packets successfully received by hardware
> > imissed  = packets dropped by hardware because the software does
> >            not poll fast enough (= queue full)
> > ierrors  = packets dropped by hardware (malformed packets, ...)
> >
> > These 3 stats never count twice the same packet.
> >
> > If we want more statistics, they could go in xstats. For instance, a
> > counter for invalid checksums. The definition of these stats would be
> > PMD-specific.
> >
> > I agree we should clarify and have a consensus on the definitions before
> going
> > further.
> >
> >
> > Regards,
> > Olivier
>
> Hi Olivier
> I think it's important to distinguish between errors and drops and provide
> a statistics API that exposes both. This way people have access to as much
> information as possible when things do go wrong and nothing is missed in
> terms of errors.
>
> My suggestion for the high level registers would be:
> ipackets = Total number of packets successfully received by hardware
> imissed = Total number of packets dropped by hardware because the
> software does not poll fast enough (= queue full)
> idrops = Total number of packets dropped by hardware (malformed packets,
> ...), where the # of drops can ONLY be <= the packets received (without
> overlap between registers).
> ierrors = Total number of erroneous received packets, where the # of
> errors can be >= the packets received (without overlap between registers);
> this is because there may be multiple errors associated with a packet.
>
> This way people can see how many packets were dropped and why at a high
> level as well as through the extended stats API rather than using one API
> or the other. What do you think?
>
> Best Regards
> 

[dpdk-dev] [PATCH 4/4 v2] vhost: fix wrong usage of eventfd_t

2015-09-09 Thread Yuanhan Liu
According to the eventfd man page:

typedef uint64_t eventfd_t;

int eventfd_read(int fd, eventfd_t *value);
int eventfd_write(int fd, eventfd_t value);

eventfd_t is defined for the second argument (value), but not for the fd.

Here I redefine those fd fields to `int' type, which also removes
the redundant (int) casts. And as the man page states, we should
cast 1 to the eventfd_t type for eventfd_write().

v2: cast 1 with `eventfd_t' type.

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.c |  6 ++---
 lib/librte_vhost/rte_virtio_net.h |  4 ++--
 lib/librte_vhost/vhost_rxtx.c |  6 ++---
 lib/librte_vhost/vhost_user/virtio-net-user.c | 16 +++---
 lib/librte_vhost/virtio-net.c | 32 +--
 5 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 1b137b9..9eac2d0 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1433,7 +1433,7 @@ put_desc_to_used_list_zcp(struct vhost_virtqueue *vq, 
uint16_t desc_idx)

/* Kick the guest if necessary. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
-   eventfd_write((int)vq->callfd, 1);
+   eventfd_write(vq->callfd, (eventfd_t)1);
 }

 /*
@@ -1626,7 +1626,7 @@ txmbuf_clean_zcp(struct virtio_net *dev, struct vpool 
*vpool)

/* Kick guest if required. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
-   eventfd_write((int)vq->callfd, 1);
+   eventfd_write(vq->callfd, (eventfd_t)1);

return 0;
 }
@@ -1774,7 +1774,7 @@ virtio_dev_rx_zcp(struct virtio_net *dev, struct rte_mbuf 
**pkts,

/* Kick the guest if necessary. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
-   eventfd_write((int)vq->callfd, 1);
+   eventfd_write(vq->callfd, (eventfd_t)1);

return count;
 }
diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index b9bf320..a037c15 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -87,8 +87,8 @@ struct vhost_virtqueue {
uint16_tvhost_hlen; /**< Vhost header 
length (varies depending on RX merge buffers. */
volatile uint16_t   last_used_idx;  /**< Last index used on 
the available ring */
volatile uint16_t   last_used_idx_res;  /**< Used for multiple 
devices reserving buffers. */
-   eventfd_t   callfd; /**< Used to notify the 
guest (trigger interrupt). */
-   eventfd_t   kickfd; /**< Currently unused 
as polling mode is enabled. */
+   int callfd; /**< Used to notify the 
guest (trigger interrupt). */
+   int kickfd; /**< Currently unused 
as polling mode is enabled. */
struct buf_vector   buf_vec[BUF_VECTOR_MAX];/**< for 
scatter RX. */
 } __rte_cache_aligned;

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index d412293..b2b2bcc 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -230,7 +230,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,

/* Kick the guest if necessary. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
-   eventfd_write((int)vq->callfd, 1);
+   eventfd_write(vq->callfd, (eventfd_t)1);
return count;
 }

@@ -529,7 +529,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t 
queue_id,

/* Kick the guest if necessary. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
-   eventfd_write((int)vq->callfd, 1);
+   eventfd_write(vq->callfd, (eventfd_t)1);
}

return count;
@@ -752,6 +752,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
vq->used->idx += entry_success;
/* Kick guest if required. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
-   eventfd_write((int)vq->callfd, 1);
+   eventfd_write(vq->callfd, (eventfd_t)1);
return entry_success;
 }
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/vhost_user/virtio-net-user.c
index c1ffc38..4689927 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -214,10 +214,10 @@ virtio_is_ready(struct virtio_net *dev)
rvq = dev->virtqueue[VIRTIO_RXQ];
tvq = dev->virtqueue[VIRTIO_TXQ];
if (rvq && tvq && rvq->desc && tvq->desc &&
-   (rvq->kickfd != (eventfd_t)-1) &&
-   (rvq->callfd != (eventfd_t)-1) &&
-   (tvq->kickfd != (eventfd_t)-1) &&
-   (tvq->callfd != (eventfd_t)-1)) {
+   (rvq->kickfd != -1) &&
+   

[dpdk-dev] [PATCH 3/4] vhost: get rid of duplicate code

2015-09-09 Thread Yuanhan Liu
Signed-off-by: Yuanhan Liu 
Acked-by: Changchun Ouyang 
Acked-by: Huawei Xie 
---
 lib/librte_vhost/vhost_user/vhost-net-user.c | 36 
 1 file changed, 10 insertions(+), 26 deletions(-)

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
b/lib/librte_vhost/vhost_user/vhost-net-user.c
index f406a94..d1f8877 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -329,32 +329,16 @@ vserver_message_handler(int connfd, void *dat, int 
*remove)

ctx.fh = cfd_ctx->fh;
ret = read_vhost_message(connfd, );
-   if (ret < 0) {
-   RTE_LOG(ERR, VHOST_CONFIG,
-   "vhost read message failed\n");
-
-   close(connfd);
-   *remove = 1;
-   free(cfd_ctx);
-   user_destroy_device(ctx);
-   ops->destroy_device(ctx);
-
-   return;
-   } else if (ret == 0) {
-   RTE_LOG(INFO, VHOST_CONFIG,
-   "vhost peer closed\n");
-
-   close(connfd);
-   *remove = 1;
-   free(cfd_ctx);
-   user_destroy_device(ctx);
-   ops->destroy_device(ctx);
-
-   return;
-   }
-   if (msg.request > VHOST_USER_MAX) {
-   RTE_LOG(ERR, VHOST_CONFIG,
-   "vhost read incorrect message\n");
+   if (ret <= 0 || msg.request > VHOST_USER_MAX) {
+   if (ret < 0)
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "vhost read message failed\n");
+   else if (ret == 0)
+   RTE_LOG(INFO, VHOST_CONFIG,
+   "vhost peer closed\n");
+   else
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "vhost read incorrect message\n");

close(connfd);
*remove = 1;
-- 
1.9.0



[dpdk-dev] [PATCH 2/4] vhost: fix typo

2015-09-09 Thread Yuanhan Liu
_det => _dev

Signed-off-by: Yuanhan Liu 
Acked-by: Changchun Ouyang 
Acked-by: Huawei Xie 
---
 lib/librte_vhost/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index b520ec5..b670992 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -485,7 +485,7 @@ set_vring_num(struct vhost_device_ctx ctx, struct 
vhost_vring_state *state)
 }

 /*
- * Reallocate virtio_det and vhost_virtqueue data structure to make them on the
+ * Reallocate virtio_dev and vhost_virtqueue data structure to make them on the
  * same numa node as the memory of vring descriptor.
  */
 #ifdef RTE_LIBRTE_VHOST_NUMA
-- 
1.9.0



[dpdk-dev] [PATCH 1/4] vhost: remove redundant ;

2015-09-09 Thread Yuanhan Liu
Signed-off-by: Yuanhan Liu 
Acked-by: Changchun Ouyang 
Acked-by: Huawei Xie 
---
 lib/librte_vhost/vhost_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 0d07338..d412293 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -185,7 +185,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
}
}
len_to_cpy = RTE_MIN(data_len - offset, desc->len - 
vb_offset);
-   };
+   }

/* Update used ring with desc information */
vq->used->ring[res_cur_idx & (vq->size - 1)].id =
-- 
1.9.0



[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-09 Thread Ananyev, Konstantin
Hi Thomas,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Wednesday, September 09, 2015 1:19 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for 
> all NICs but 82598
> 
> 2015-08-25 20:13, Zhang, Helin:
> > Yes, I got the perfect answers. Thank you very much!
> > I just wanted to make sure the test case was OK with the limit on the
> > maximum number of descriptors, as I heard there is a hang issue on
> > other NICs when using more descriptors than the hardware allows.
> > OK. I am still waiting for the answers/confirmation from the x540
> > hardware designers. We all need to agree on your patches to avoid risks.
> 
> Helin, any news?
> Can we apply v4 of this patch?

Unfortunately we are seeing a huge performance drop with that patch:
on my box, bi-directional traffic (64B packets) over one port can't reach
even 11 Mpps.
We are still doing some experiments, and consulting with the ND guys on
how we can overcome this problem and keep performance intact.

Konstantin


[dpdk-dev] DPDK 2.2 roadmap

2015-09-09 Thread O'Driscoll, Tim
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Wednesday, September 9, 2015 9:45 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] DPDK 2.2 roadmap
> 
> Hello,
> 
> The new features for the release 2.2 must be first submitted before 2nd
> October.
> They should be integrated before 23rd October.
> 
> In order to ease cooperation and integration, it would be nice to see
> announcements or technical details of planned features for 2.2 or 2.3.
> Then the roadmap page will be filled accordingly:
>   http://dpdk.org/dev/roadmap
> Generally speaking, it helps to know what you are working on and what the
> status is.

I think it's a great idea to create an overall roadmap for the project for 2.2 
and beyond. To kick this off, below are the features that we're hoping to 
submit for the 2.2 release. You should be seeing RFCs and v1 patch sets for 
these over the next few weeks.

Userspace Ethtool Sample App:  Further enhancements to the userspace ethtool 
implementation that was submitted in 2.1 including: Implement rte_ethtool shim 
layer based on rte_ethdev API, Provide a sample application that 
demonstrates/allows testing of the rte_ethtool API, Implement 
rte_ethtool_get_ringparam/rte_ethtool_set_ringparam, Rework 
rte_eth_dev_get_reg_info() so we can get/set HW registers in a convenient way.

Vector PMD for fm10k:  A vector PMD, similar to the one that currently exists 
for the Intel 10Gbps NICs, will be implemented for fm10k.

DCB for i40e:  DCB support will be extended to the i40e and X550 NICs.

IEEE1588 on X550 & fm10k:  IEEE1588 support will be extended to the X550 and 
fm10k.

IEEE1588 Sample App:  A sample application for an IEEE1588 Client.

Cryptodev Library:  This implements a cryptodev API and PMDs for the Intel 
Quick Assist Technology DH895xxC hardware accelerator and the AES-NI 
multi-buffer software implementation. See 
http://dpdk.org/ml/archives/dev/2015-August/022930.html for further details.

IPsec Sample App:  A sample application will be provided to show how the 
cryptodev library can be used to implement IPsec. This will be based on the 
NetBSD IPsec implementation.

Interrupt Mode for i40e, fm10k & e1000:  Interrupt mode support, which was 
added in the 2.1 release, will be extended to the i40e, fm10k and e1000.

Completion of PCI Hot Plug:  PCI Hot Plug support, which was added in the 2.0 
and 2.1 releases, will be extended to the Xenvirt and Vmxnet3 PMDs.

Increase Number of Next Hops for LPM (IPv4 and IPv6):  The current DPDK LPM 
implementation for IPv4 and IPv6 limits the number of next hops to 256, as the 
next hop ID is an 8-bit field. The size of the field will be increased to allow 
a larger number of next hops.

Performance Thread Sample App:  A sample application will be provided showing 
how different threading models can be used in DPDK. It will be possible to 
configure the application for, and contrast forwarding performance of, 
different threading models including lightweight threads. See 
http://dpdk.org/ml/archives/dev/2015-September/023186.html for further details.

DPDK Keep-Alive:  The purpose is to detect packet processing core failures 
(e.g. infinite loop) and ensure the failure of the core does not result in a 
fault that is not detectable by a management entity.

Vhost Offload Feature Support:  This feature will implement virtio TSO offload 
to help improve performance.

Common Port Statistics:  This feature will extend the exposed NIC statistics, 
improving the method of presentation to make it obvious what their purpose is. 
This functionality is based on the rte_eth_xstats_* extended stats API 
implemented in DPDK 2.1.

NFV Use-cases using Packet Framework (Edge Router):  Enhancements will be made 
to the IP_Pipeline application and the Packet Framework libraries so that they 
can be used to support an Edge Router NFV use case.

Refactor EAL for Non-PCI Devices:  This has been discussed extensively on the 
mailing list. See the RFCs for Refactor eal driver registration code 
(http://dpdk.org/ml/archives/dev/2015-September/023257.html) and Remove pci 
driver from vdevs (http://dpdk.org/ml/archives/dev/2015-August/023016.html).

Vhost Multi-Queue Support:  The vhost-user library will be updated to provide 
multi-queue support in the host similar to the RSS model so that guest driver 
may allocate multiple rx/tx queues which then may be used to load balance 
between cores. See http://dpdk.org/ml/archives/dev/2015-August/022758.html for 
more details.

Virtio Vhost Optimization:  Virtio and vhost performance will be optimized to 
allow efficient high performance packet movement between guest and host.

Config Granularity of RSS for i40e:  All RSS hash and filter configurations for 
IPv4/v6, TCP/UDP, GRE, etc will be implemented for i40e. This includes support 
for QinQ and tunneled packets.

I40e 32-bit GRE Keys:  Both 24 and 32 bit keys for GRE will be supported for 
i40e.

X550 

[dpdk-dev] [PATCH] hash: fix wrong size memory calculation

2015-09-09 Thread Thomas Monjalon
2015-08-31 14:30, Pablo de Lara:
> When calculating the size for the table which allocates
> the keys, size was calculated wrongly from multiplying
> two 32-bit variables, resulting on a 32-bit number,
> before casting to 64-bit, so maximum size was 4G.
> 
> Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation")
> 
> Signed-off-by: Pablo de Lara 

Applied, thanks


[dpdk-dev] [PATCH 3/5] example_ip_pipeline: fix sizeof() on memcpy

2015-09-09 Thread Stephen Hemminger
On Wed, 9 Sep 2015 18:25:53 +
"Dumitrescu, Cristian"  wrote:

> > diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
> > index 3f9c68d..75e3767 100644
> > --- a/examples/ip_pipeline/init.c
> > +++ b/examples/ip_pipeline/init.c
> > @@ -1325,7 +1325,7 @@ app_pipeline_type_cmd_push(struct app_params
> > *app,
> > /* Push pipeline commands into the application */
> > memcpy(&app->cmds[app->n_cmds],
> > cmds,
> > -   n_cmds * sizeof(cmdline_parse_ctx_t *));
> > +   n_cmds * sizeof(cmdline_parse_ctx_t));  
> 
> Actually no, as both the destination and the source of the memcpy are 
> arrays of pointers.
> 
> The source is a pipeline type, which is a static global data structure, so no 
> issues with the life time of the data pointed to by the pointers.

In order to make tools happy, shouldn't the source and target be pointers to 
an array of pointers?
In the current code:
  &app->cmds[app->n_cmds] is of type cmdline_parse_ctx_t *
  cmds is of type cmdline_parse_ctx_t *

And the type cmdline_parse_ctx_t is already a pointer.

Copying them as a set of pointers-to-pointers vs. a set of pointers produces 
the same result, since both element types are the same size, but static 
checking tools flag the mismatch.

This is why kernel developers particularly despise typedefs which hide
pointers like this one.


[dpdk-dev] Random packet drops with ip_pipeline on R730.

2015-09-09 Thread Dumitrescu, Cristian
Hi Husainee,

Looking at your config file, it looks like you are using an old DPDK release 
prior to 2.1. Can you please try out the same simple test in your environment 
with the latest DPDK 2.1 release?

We did a lot of work in DPDK release 2.1 for the ip_pipeline application, we 
basically rewrote large parts of it, including the parser, checks, run-time, 
library of pipelines, etc. The format of the config file has been improved a 
lot, you should be able to adapt your config file to the latest syntax very 
quickly.

Btw, your config file is not really equivalent to l2fwd, as you are using 
two CPU cores connected through software rings rather than a single core, as 
l2fwd does.

Here is an equivalent DPDK 2.1 config file using two cores connected through 
software rings (port 0 -> port 1, port 1 -> port 0, port 2 -> port 3, port 3 -> 
port 2):

[PIPELINE0]
type = MASTER
core = 0

[PIPELINE1]
type = PASS-THROUGH
core = 1
pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
pktq_out = SWQ0 SWQ1 SWQ2 SWQ3

[PIPELINE2]
type = PASS-THROUGH
core = 2; you can also place PIPELINE2 on same core as PIPELINE1: core = 1
pktq_in = SWQ1 SWQ0 SWQ3 SWQ2
pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0

Here is a config file doing similar processing with a single core, a 
configuration closer to l2fwd (port 0 -> port 1, port 1 -> port 0, port 2 -> 
port 3, port 3 -> port 2):

[PIPELINE0]
type = MASTER
core = 0

[PIPELINE1]
type = PASS-THROUGH
core = 1
pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
pktq_out = TXQ1.0 TXQ0.0 TXQ3.0 TXQ2.0

Regards,
Cristian

From: husainee [mailto:husainee.plum...@nevisnetworks.com]
Sent: Wednesday, September 9, 2015 12:47 PM
To: Dumitrescu, Cristian; dev at dpdk.org
Cc: Cao, Waterman
Subject: Re: [dpdk-dev] Random packet drops with ip_pipeline on R730.

Hi Cristian
PFA the config file.

I am sending packets from port0 and receiving on port1.

By random packet drops I mean that on every run the number of packets dropped 
is not the same. Here are some results:

Frame sent rate 1488095.2 fps, 64Byte packets (100% of 1000Mbps)
Run1- 0.0098% (20-22 Million Packets)
Run2- 0.021% (20-22 Million Packets)
Run3- 0.0091% (20-22 Million Packets)

Frame rate 744047.62 fps, 64 Byte packets, (50% of 1000Mbps)
Run1- 0.0047% (20-22 Million Packets)
Run2- 0.0040% (20-22 Million Packets)
Run3- 0.0040% (20-22 Million Packets)


Frame rate 148809.52 fps, 64 Byte packets,(10% of 1000Mbps)
Run1- 0 (20-22 Million Packets)
Run2- 0 (20-22 Million Packets)
Run3- 0 (20-22 Million Packets)



Following are the HW NIC setting differences between the ip_pipeline and l2fwd 
apps:

parameter               | ip_pipeline | l2fwd
------------------------+-------------+------
jumbo frame             | 1           | 0
hw_ip_checksum          | 1           | 0
rx_conf.wthresh         | 4           | 0
rx_conf.rx_free_thresh  | 64          | 32
tx_conf.pthresh         | 36          | 32
burst size              | 64          | 32


We tried to make the ip_pipeline settings the same as l2fwd's, but there was no 
change in the results.

I have not tried with 10GbE; I do not have 10GbE test equipment.



regards
husainee



On 09/08/2015 06:32 PM, Dumitrescu, Cristian wrote:

Hi Husainee,



Can you please explain what you mean by random packet drops? What percentage 
of the input packets get dropped, does it take place on every run, does the 
number of dropped packets vary on every run, etc.?



Are you also able to reproduce this issue with other NICs, e.g. 10GbE NIC?



Can you share your config file?



Can you please double check the low level NIC settings between the two 
applications, i.e. the settings in structures link_params_default, 
default_hwq_in_params, default_hwq_out_params from ip_pipeline file 
config_parse.c vs. their equivalents from l2fwd? The only thing I can think of 
right now is maybe one of the low level threshold values for the Ethernet link 
is not tuned for your 1GbE NIC.



Regards,

Cristian



-Original Message-

From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of husainee

Sent: Tuesday, September 8, 2015 7:56 AM

To: dev at dpdk.org

Subject: [dpdk-dev] Random packet drops with ip_pipeline on R730.



Hi



I am using a DELL730 with Dual socket. Processor in each socket is

Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz- 6Cores.

The CPU layout has socket 0 with 0,2,4,6,8,10 cores and socket 1 with

1,3,5,7,9,11 cores.

The NIC card is i350.



The Cores 2-11 are isolated using isolcpus kernel parameter. We are

running the ip_peipeline application with only Master, RX and TX threads

(Flow and Route have been removed from cfg file). The threads are run as

follows



- Master on CPU core 2

- RX on CPU core 4

- TX on CPU core 6



64 byte packets are sent from ixia at different speeds, but we are

seeing random packet drops.  Same excercise is done on core 3,5,7 and

results are same.



We tried the l2fwd app and it works fine with no packet drops.



Hugepages: 1024 x 2M per socket.





Can anyone suggest what could be the reason for these random packet

drops.



regards

husainee

[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2015-09-09 Thread Gonzalez Monroy, Sergio
Following the conversation in 
http://dpdk.org/ml/archives/dev/2015-September/023230.html:

On 17/12/2014 13:31, rolette at infiniteio.com (Jay Rolette) wrote:
> Signed-off-by: Jay Rolette 
> ---
Update commit title/description, maybe something like:
   eal/linux: use qsort for sorting hugepages
   Replace O(n^2) sort in sort_by_physaddr() with qsort() from the standard 
library
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 59 
> +++-
>   1 file changed, 20 insertions(+), 39 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index bae2507..3656515 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -670,6 +670,25 @@ error:
>   return -1;
>   }
>   
> +static int
> +cmp_physaddr(const void *a, const void *b)
> +{
> +#ifndef RTE_ARCH_PPC_64
> + const struct hugepage_file *p1 = (const struct hugepage_file *)a;
> + const struct hugepage_file *p2 = (const struct hugepage_file *)b;
> +#else
> + // PowerPC needs memory sorted in reverse order from x86
> + const struct hugepage_file *p1 = (const struct hugepage_file *)b;
> + const struct hugepage_file *p2 = (const struct hugepage_file *)a;
> +#endif
> + if (p1->physaddr < p2->physaddr)
> + return -1;
> + else if (p1->physaddr > p2->physaddr)
> + return 1;
> + else
> + return 0;
> +}
> +
There were a couple of comments from Thomas about the comments style and 
#ifdef:
- Comment style should be modified as per 
http://dpdk.org/doc/guides/contributing/coding_style.html#c-comment-style
- Regarding the #ifdef, I agree with Jay that it is the current approach 
in the file.
>   /*
>* Sort the hugepg_tbl by physical address (lower addresses first on x86,
>* higher address first on powerpc). We use a slow algorithm, but we won't
> @@ -678,45 +697,7 @@ error:
>   static int
>   sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info 
> *hpi)
>   {
> - unsigned i, j;
> - int compare_idx;
> - uint64_t compare_addr;
> - struct hugepage_file tmp;
> -
> - for (i = 0; i < hpi->num_pages[0]; i++) {
> - compare_addr = 0;
> - compare_idx = -1;
> -
> - /*
> -  * browse all entries starting at 'i', and find the
> -  * entry with the smallest addr
> -  */
> - for (j=i; j< hpi->num_pages[0]; j++) {
> -
> - if (compare_addr == 0 ||
> -#ifdef RTE_ARCH_PPC_64
> - hugepg_tbl[j].physaddr > compare_addr) {
> -#else
> - hugepg_tbl[j].physaddr < compare_addr) {
> -#endif
> - compare_addr = hugepg_tbl[j].physaddr;
> - compare_idx = j;
> - }
> - }
> -
> - /* should not happen */
> - if (compare_idx == -1) {
> - RTE_LOG(ERR, EAL, "%s(): error in physaddr sorting\n", 
> __func__);
> - return -1;
> - }
> -
> - /* swap the 2 entries in the table */
> - memcpy(, _tbl[compare_idx],
> - sizeof(struct hugepage_file));
> - memcpy(_tbl[compare_idx], _tbl[i],
> - sizeof(struct hugepage_file));
> - memcpy(_tbl[i], , sizeof(struct hugepage_file));
> - }
> + qsort(hugepg_tbl, hpi->num_pages[0], sizeof(struct hugepage_file), 
> cmp_physaddr);
>   return 0;
>   }
Why not just remove sort_by_physaddr() and call qsort() directly?

Sergio


[dpdk-dev] [PATCH v4 0/2] ethdev: add port speed capability bitmap

2015-09-09 Thread Nélio Laranjeiro

Marc,

On Tue, Sep 08, 2015 at 10:24:36PM +0200, Marc Sune wrote:
> Nélio,
> 
2015-09-08 12:03 GMT+02:00 Nélio Laranjeiro:
> 
> On Mon, Sep 07, 2015 at 10:52:53PM +0200, Marc Sune wrote:
> > 2015-08-29 2:16 GMT+02:00 Marc Sune :
> >
> > > The current rte_eth_dev_info abstraction does not provide any 
> mechanism
> to
> > > get the supported speed(s) of an ethdev.
> > >
> > > For some drivers (e.g. ixgbe), an educated guess can be done based on
> the
> > > driver's name (driver_name in rte_eth_dev_info), see:
> > >
> > > http://dpdk.org/ml/archives/dev/2013-August/000412.html
> > >
> > > However, i) doing string comparisons is annoying, and can silently
> > > break existing applications if PMDs change their names ii) it does not
> > > provide all the supported capabilities of the ethdev iii) for some
> drivers
> > > it
> > > is impossible determine correctly the (max) speed by the application
> > > (e.g. in i40, distinguish between XL710 and X710).
> > >
> > > This small patch adds speed_capa bitmap in rte_eth_dev_info, which is
> > > filled
> > > by the PMDs according to the physical device capabilities.
> > >
> > > v2: rebase, converted speed_capa into 32 bits bitmap, fixed alignment
> > > (checkpatch).
> > >
> > > v3: rebase to v2.1. unified ETH_LINK_SPEED and ETH_SPEED_CAP into
> > > ETH_SPEED. Converted field speed in struct rte_eth_conf to speeds,
> > > to allow a bitmap for defining the announced speeds, as suggested
> > > by M. Brorup. Fixed spelling issues.
> > >
> > > v4: fixed errata in the documentation of field speeds of rte_eth_conf,
> > > and commit 1/2 message. rebased to v2.1.0. v3 was incorrectly based on
> > > ~2.1.0-rc1.
> > >
> >
> > Thomas,
> >
> > Since mostly you were commenting for v1 and v2; any opinion on this one?
> >
> > Regards
> > marc
> 
> Hi Marc,
> 
> I have read your patches, and there are a few mistakes, for instance mlx4
> (ConnectX-3 devices) does not support 100Gbps.
> 
> 
> When I circulated v1 and v2 I was kindly asking maintainers and reviewers of
> the drivers to fix any mistakes in SPEED capabilities, since I was taking the
> speeds from the online websites. Some were fixed, but apparently
> some were still missing. I will remove 100Gbps. Please circulate any other
> error you have spotted.

From the Mellanox website:
 - ConnectX-3 EN: 10/40/56Gb/s
 - ConnectX-3 Pro EN 10GBASE-T: 10Gb/s
 - ConnectX-3 Pro EN: 10/40/56GbE
 - ConnectX-3 Pro Programmable: 10/40Gb/s

This PMD works with any of the ConnectX-3 adapters, so the announced speeds
should be 10/40/56Gb/s.

> In addition, it seems your new bitmap does not support all kinds of
> speeds; take a look at the header of ethtool in the Linux kernel
> (include/uapi/linux/ethtool.h), which already consumes 30 bits without even
> managing speeds above 56Gbps.
> 
> 
> The bitmaps you are referring to are SUPPORTED_ and ADVERTISED_. These
> bitmaps not only contain the speeds but also PHY properties (e.g. BASE for ETH).
> 
> The intention of this patch was to expose speed capabilities, similar to the
> bitmap SPEED_ in include/uapi/linux/ethtool.h, which as you see maps closely 
> to
> ETH_SPEED_ proposed in this patch.
> 
> I think the encoding of other things, like the exact model of the interface 
> and
> its PHY details should go somewhere else. But I might be wrong here, so open 
> to
> hear opinions.

I understand the need to have capability fields, but I don't understand
why you want to mix speeds and duplex mode in something which was
previously only handling speeds.

We now have redundant information in struct rte_eth_conf: that structure
has a speed field which embeds the duplex mode and a duplex field which
does the same. Which one should be used?

> It would be nice to keep the field to represent the real speed of the
> link, in case it is not represented by the bitmap, it could be also
> useful for aggregated links (bonding for instance).? The current API
> already works this way, it just needs to be extended from 16 to 32 bit
> to manage speed above 64Gbps.
> 
> 
> This patch does not remove rte_eth_link_get() API. It just changes the 
> encoding
> of speed in struct rte_eth_link, to have an homogeneous set of constants with
> the speed capabilities bitmap, as discussed previously in the thread (see
> Thomas comments). IOW, it returns now a single SPEED_ value in the struct
> rte_eth_link's link_speed field.

You change the coding of the speed field, but applications still expect
an integer, see port_infos_display function in 

[dpdk-dev] DPDK 2.2 roadmap

2015-09-09 Thread Stephen Hemminger
On Wed, 9 Sep 2015 17:48:44 +
"Patel, Rashmin N"  wrote:

> There were two line items on the 2.2 roadmap: Xen Driver and Hyper-V Driver. 
> Can you provide some more details?
> 

Brocade will be resubmitting the Xen netfront and Hyper-V virtual drivers.
I had been holding off until some issues found during QA testing and DPDK 2.1 
was released.
Now that 2.1 is out, and QA tests are looking good will rebase in a couple of 
weeks.


[dpdk-dev] [PATCH 4/4] vhost: define callfd and kickfd as int type

2015-09-09 Thread Yuanhan Liu
On Wed, Sep 09, 2015 at 02:41:37AM +, Ouyang, Changchun wrote:
> 
> 
> > -Original Message-
> > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> > Sent: Monday, August 24, 2015 11:55 AM
> > To: dev at dpdk.org
> > Cc: Xie, Huawei; Ouyang, Changchun; Yuanhan Liu
> > Subject: [PATCH 4/4] vhost: define callfd and kickfd as int type
> > 
> > So that we can remove the redundant (int) cast.
> > 
> > Signed-off-by: Yuanhan Liu 
> > ---
> >  examples/vhost/main.c |  6 ++---
> >  lib/librte_vhost/rte_virtio_net.h |  4 ++--
> >  lib/librte_vhost/vhost_rxtx.c |  6 ++---
> >  lib/librte_vhost/vhost_user/virtio-net-user.c | 16 +++---
> >  lib/librte_vhost/virtio-net.c | 32 
> > +--
> >  5 files changed, 32 insertions(+), 32 deletions(-)
> > 
> > diff --git a/examples/vhost/main.c b/examples/vhost/main.c index
> > 1b137b9..b090b25 100644
> > --- a/examples/vhost/main.c
> > +++ b/examples/vhost/main.c
> > @@ -1433,7 +1433,7 @@ put_desc_to_used_list_zcp(struct vhost_virtqueue
> > *vq, uint16_t desc_idx)
> > 
> > /* Kick the guest if necessary. */
> > if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> > -   eventfd_write((int)vq->callfd, 1);
> > +   eventfd_write(vq->callfd, 1);
> 
> Don't we need type conversion for '1' to eventfd_t here?

Nope. See the eventfd_write man page:

   int eventfd_read(int fd, eventfd_t *value);
   int eventfd_write(int fd, eventfd_t value);

--yliu


[dpdk-dev] FW: [PATCH v2 1/1] ip_frag: fix creating ipv6 fragment extension header

2015-09-09 Thread Ananyev, Konstantin
Hi Piotr,

> -Original Message-
> From: Azarewicz, PiotrX T
> Sent: Tuesday, September 08, 2015 3:08 PM
> To: dev at dpdk.org
> Cc: Dumitrescu, Cristian; Ananyev, Konstantin; Azarewicz, PiotrX T
> Subject: [PATCH v2 1/1] ip_frag: fix creating ipv6 fragment extension header
> 
> Previous implementation won't work on every environment. The order of
> allocation of bit-fields within a unit (high-order to low-order or
> low-order to high-order) is implementation-defined.
> Solution: used bytes instead of bit fields.
> 
> v2 changes:
> - remove useless union
> - fix process_ipv6 function (due to remove the union above)
> 
> Signed-off-by: Piotr Azarewicz 
> ---
>  lib/librte_ip_frag/rte_ip_frag.h|   13 ++---
>  lib/librte_ip_frag/rte_ipv6_fragmentation.c |6 ++
>  lib/librte_port/rte_port_ras.c  |   10 +++---
>  3 files changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/librte_ip_frag/rte_ip_frag.h 
> b/lib/librte_ip_frag/rte_ip_frag.h
> index 52f44c9..f3ca566 100644
> --- a/lib/librte_ip_frag/rte_ip_frag.h
> +++ b/lib/librte_ip_frag/rte_ip_frag.h
> @@ -130,17 +130,8 @@ struct rte_ip_frag_tbl {
>  /** IPv6 fragment extension header */
>  struct ipv6_extension_fragment {
>   uint8_t next_header;/**< Next header type */
> - uint8_t reserved1;  /**< Reserved */
> - union {
> - struct {
> - uint16_t frag_offset:13; /**< Offset from the start of 
> the packet */
> - uint16_t reserved2:2; /**< Reserved */
> - uint16_t more_frags:1;
> - /**< 1 if more fragments left, 0 if last fragment */
> - };
> - uint16_t frag_data;
> - /**< union of all fragmentation data */
> - };
> + uint8_t reserved;   /**< Reserved */
> + uint16_t frag_data; /**< All fragmentation data */
>   uint32_t id;/**< Packet ID */
>  } __attribute__((__packed__));
> 
> diff --git a/lib/librte_ip_frag/rte_ipv6_fragmentation.c 
> b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> index 0e32aa8..ab62efd 100644
> --- a/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> +++ b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> @@ -65,10 +65,8 @@ __fill_ipv6hdr_frag(struct ipv6_hdr *dst,
> 
>   fh = (struct ipv6_extension_fragment *) ++dst;
>   fh->next_header = src->proto;
> - fh->reserved1   = 0;
> - fh->frag_offset = rte_cpu_to_be_16(fofs);
> - fh->reserved2   = 0;
> - fh->more_frags  = rte_cpu_to_be_16(mf);
> + fh->reserved = 0;
> + fh->frag_data = rte_cpu_to_be_16((fofs & ~IPV6_HDR_FO_MASK) | mf);
>   fh->id = 0;
>  }
> 
> diff --git a/lib/librte_port/rte_port_ras.c b/lib/librte_port/rte_port_ras.c
> index 6bd0f8c..3dbd5be 100644
> --- a/lib/librte_port/rte_port_ras.c
> +++ b/lib/librte_port/rte_port_ras.c
> @@ -205,6 +205,9 @@ process_ipv4(struct rte_port_ring_writer_ras *p, struct 
> rte_mbuf *pkt)
>   }
>  }
> 
> +#define MORE_FRAGS(x) ((x) & 0x0001)
> +#define FRAG_OFFSET(x) ((x) >> 3)
> +
>  static void
>  process_ipv6(struct rte_port_ring_writer_ras *p, struct rte_mbuf *pkt)
>  {
> @@ -212,12 +215,13 @@ process_ipv6(struct rte_port_ring_writer_ras *p, struct 
> rte_mbuf *pkt)
>   struct ipv6_hdr *pkt_hdr = rte_pktmbuf_mtod(pkt, struct ipv6_hdr *);
> 
>   struct ipv6_extension_fragment *frag_hdr;
> + uint16_t frag_data = 0;
>   frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(pkt_hdr);
> - uint16_t frag_offset = frag_hdr->frag_offset;
> - uint16_t frag_flag = frag_hdr->more_frags;
> + if (frag_hdr != NULL)
> + frag_data = rte_be_to_cpu_16(frag_hdr->frag_data);
> 
>   /* If it is a fragmented packet, then try to reassemble */
> - if ((frag_flag == 0) && (frag_offset == 0))
> + if ((MORE_FRAGS(frag_data) == 0) && (FRAG_OFFSET(frag_data) == 0))
>   p->tx_buf[p->tx_buf_count++] = pkt;
>   else {
>   struct rte_mbuf *mo;
> --
> 1.7.9.5

When I wrote "provide macros to read/set fragment_offset and more_flags
values", I meant: move the useful IPV6_HDR_* macros from
lib/librte_ip_frag/rte_ipv6_fragmentation.c into rte_ip_frag.h and add new
ones based on them.
It seems strange to define some macros in a .c file, never use most of
them, and then in another .c file use hardcoded values instead of them.

Something like:

#define RTE_IPV6_HDR_MF_MASK    1
#define RTE_IPV6_HDR_FO_SHIFT   3
#define RTE_IPV6_HDR_FO_MASK    ((1 << RTE_IPV6_HDR_FO_SHIFT) - 1)

#define RTE_IPV6_GET_MF(x)      ((x) & RTE_IPV6_HDR_MF_MASK)
#define RTE_IPV6_GET_FO(x)      ((x) >> RTE_IPV6_HDR_FO_SHIFT)

#define RTE_IPV6_FRAG_DATA(fo, mf) \
        (((fo) & ~RTE_IPV6_HDR_FO_MASK) | ((mf) & RTE_IPV6_HDR_MF_MASK))

And then use these macros in both .c files.

Actually another note:
+   if ((MORE_FRAGS(frag_data) 

[dpdk-dev] DPDK 2.2 roadmap

2015-09-09 Thread Thomas Monjalon
Hello,

The new features for the release 2.2 must be first submitted before 2nd October.
They should be integrated before 23rd October.

In order to ease cooperation and integration, it would be nice to see
announces or technical details of planned features for 2.2 or 2.3.
Then the roadmap page will be filled accordingly:
http://dpdk.org/dev/roadmap
Generally speaking, it helps to know what you are working on and what is the
status.

Thanks


[dpdk-dev] [PATCH v1] change hugepage sorting to avoid overlapping memcpy

2015-09-09 Thread Ralf Hoffmann
Hi Sergio,

On 08.09.2015 14:45, Gonzalez Monroy, Sergio wrote:
> Just a few comments/suggestions:
> 
> Add 'eal/linux:'  to the commit title, ie:
>   "eal/linux: change hugepage sorting to avoid overlapping memcpy"
> 

I would modify the patch according to your notes if needed, but if you
consider the other patch from Jay, then I would vote for that instead.
Actually, I thought about using qsort too, but decided against it to
keep the number of changes low and the sorting speed is not a problem
for me. Changing the return value of that function to void might still
be a good idea.

Best Regards,

Ralf

-- 
Ralf Hoffmann
Allegro Packets GmbH
Käthe-Kollwitz-Str. 54
04109 Leipzig
HRB 30535, Amtsgericht Leipzig


[dpdk-dev] [PATCH v2 1/1] ip_frag: fix creating ipv6 fragment extension header

2015-09-09 Thread Dumitrescu, Cristian


> -Original Message-
> From: Azarewicz, PiotrX T
> Sent: Tuesday, September 8, 2015 5:08 PM
> To: dev at dpdk.org
> Cc: Dumitrescu, Cristian; Ananyev, Konstantin; Azarewicz, PiotrX T
> Subject: [PATCH v2 1/1] ip_frag: fix creating ipv6 fragment extension header
> 
> Previous implementation won't work on every environment. The order of
> allocation of bit-fields within a unit (high-order to low-order or
> low-order to high-order) is implementation-defined.
> Solution: used bytes instead of bit fields.
> 
> v2 changes:
> - remove useless union
> - fix process_ipv6 function (due to remove the union above)
> 
> Signed-off-by: Piotr Azarewicz 
> ---

Acked-by: Cristian Dumitrescu 



[dpdk-dev] virtio optimization idea

2015-09-09 Thread Michael S. Tsirkin
On Fri, Sep 04, 2015 at 08:25:05AM +, Xie, Huawei wrote:
> Hi:
> 
> Recently I have done one virtio optimization proof of concept. The
> optimization includes two parts:
> 1) avail ring set with fixed descriptors
> 2) RX vectorization
> With the optimizations, we could have several times of performance boost
> for purely vhost-virtio throughput.

Thanks!
I'm very happy to see people work on the virtio ring format
optimizations.

I think it's best to analyze each optimization separately,
unless you see a reason why they would only give benefit when applied
together.

Also ideally, we'd need a unit test to show the performance impact.
We've been using the tests in tools/virtio/ under linux,
feel free to enhance these to simulate more workloads, or
to suggest something else entirely.


> Here i will only cover the first part, which is the prerequisite for the
> second part.
> Let us first take RX for example. Currently when we fill the avail ring
> with guest mbuf, we need
> a) allocate one descriptor(for non sg mbuf) from free descriptors
> b) set the idx of the desc into the entry of avail ring
> c) set the addr/len field of the descriptor to point to guest blank mbuf
> data area
> 
> Those operations take time, and especially step b) results in a modified (M)
> state of the cache line for the avail ring in the virtio processing
> core. When vhost processes the avail ring, the cache line transfer from
> virtio processing core to vhost processing core takes pretty much CPU
> cycles.
> To solve this problem, this is the arrangement of RX ring for DPDK
> pmd (for the non-mergeable case).
>
> avail idx
>     +
>     |
> +-----+-----+-----+-----+-----+-----+
> |  0  |  1  |  2  | ... | 254 | 255 |   avail ring
> +--+--+--+--+--+--+-----+--+--+--+--+
>    |     |     |           |     |
>    v     v     v           v     v
> +--+--+--+--+--+--+-----+--+--+--+--+
> |  0  |  1  |  2  | ... | 254 | 255 |   desc ring
> +-----+-----+-----+-----+-----+-----+
>     |
>     |
> +-----+-----+-----+-----+-----+-----+
> |  0  |  1  |  2  | ... | 254 | 255 |   used ring
> +-----+-----+-----+-----+-----+-----+
> Avail ring is initialized with fixed descriptor and is never changed,
> i.e, the index value of the nth avail ring entry is always n, which
> means virtio PMD is actually refilling desc ring only, without having to
> change avail ring.
> When vhost fetches avail ring, if not evicted, it is always in its first
> level cache.
> 
> When RX receives packets from used ring, we use the used->idx as the
> desc idx. This requires that vhost processes and returns descs from
> avail ring to used ring in order, which is true for both current dpdk
> vhost and kernel vhost implementation. In my understanding, there is no
> necessity for vhost net to process descriptors OOO. One case could be
> zero copy, for example, if one descriptor doesn't meet the zero copy
> requirement, we could directly return it to the used ring, earlier than the
> descriptors in front of it.
> To enforce this, i want to use a reserved bit to indicate in order
> processing of descriptors.

So what's the point in changing the idx for the used ring?
You need to communicate the length to the guest anyway, don't you?


> For tx ring, the arrangement is like below. Each transmitted mbuf needs
> a desc for virtio_net_hdr, so actually we have only 128 free slots.

Just fix this one. Support ANY_LAYOUT and then you can put data
linearly. And/or support INDIRECT_DESC and then you can
use an indirect descriptor.


> 
>
> +-----+-----+-----+-----++-----+-----+-----+-----+
> |  0  |  1  | ... | 127 || 128 | 129 | ... | 255 |   avail ring with fixed descriptor
> +-----+-----+-----+-----++-----+-----+-----+-----+
>    |     |     |     |      |     |     |     |
>    v     v     v     v      v     v     v     v
> +-----+-----+-----+-----++-----+-----+-----+-----+
> | 127 | 128 | ... | 255 || 127 | 128 | ... | 255 |   desc ring for virtio_net_hdr
> +-----+-----+-----+-----++-----+-----+-----+-----+

[dpdk-dev] [PATCH 4/4] vhost: define callfd and kickfd as int type

2015-09-09 Thread Yuanhan Liu
On Wed, Sep 09, 2015 at 01:43:06AM +, Ouyang, Changchun wrote:
> 
> 
> > -Original Message-
> > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> > Sent: Monday, August 24, 2015 11:55 AM
> > To: dev at dpdk.org
> > Cc: Xie, Huawei; Ouyang, Changchun; Yuanhan Liu
> > Subject: [PATCH 4/4] vhost: define callfd and kickfd as int type
> > 
> > So that we can remove the redundant (int) cast.
> > 
> > Signed-off-by: Yuanhan Liu 
> > ---
> 
> > diff --git a/lib/librte_vhost/rte_virtio_net.h
> > b/lib/librte_vhost/rte_virtio_net.h
> > index b9bf320..a037c15 100644
> > --- a/lib/librte_vhost/rte_virtio_net.h
> > +++ b/lib/librte_vhost/rte_virtio_net.h
> > @@ -87,8 +87,8 @@ struct vhost_virtqueue {
> > uint16_tvhost_hlen; /**< Vhost header
> > length (varies depending on RX merge buffers. */
> > volatile uint16_t   last_used_idx;  /**< Last index used
> > on the available ring */
> > volatile uint16_t   last_used_idx_res;  /**< Used for
> > multiple devices reserving buffers. */
> > -   eventfd_t   callfd; /**< Used to notify
> > the guest (trigger interrupt). */
> > -   eventfd_t   kickfd; /**< Currently
> > unused as polling mode is enabled. */
> > +   int callfd; /**< Used to notify
> > the guest (trigger interrupt). */
> > +   int kickfd; /**< Currently
> > unused as polling mode is enabled. */
> 
> I don't think we have to change it from 8B (eventfd_t is defined as uint64_t)
> to 4B. Any benefit from this change?

As I stated in the commit log, to remove the redundant (int) cast. Casts
like the following are a bit ugly:

if ((int)dev->virtqueue[VIRTIO_RXQ]->callfd >= 0)
close((int)dev->virtqueue[VIRTIO_RXQ]->callfd);

On the other hand, why does it have to be uint64_t? The caller side that
sends the message (more precisely, qemu) actually uses an int type.

--yliu


[dpdk-dev] [PATCH v2] mbuf/ip_frag: Move mbuf chaining to common code

2015-09-09 Thread Ananyev, Konstantin


> -Original Message-
> From: Simon K?gstr?m [mailto:simon.kagstrom at netinsight.net]
> Sent: Tuesday, September 08, 2015 11:41 AM
> To: Ananyev, Konstantin; dev at dpdk.org
> Cc: Olivier MATZ; Zhang, Helin; Gonzalez Monroy, Sergio; Burakov, Anatoly
> Subject: Re: [PATCH v2] mbuf/ip_frag: Move mbuf chaining to common code
> 
> On 2015-09-08 01:21, Ananyev, Konstantin wrote:
> >>
> >> Thanks. I got it wrong anyway, what I wanted was to be able to handle
> >> the day when nb_segs changes to a 16-bit number, but then it should
> >> really be
> >>
> >>   ... >= 1 << (sizeof(head->nb_segs) * 8)
> >>
> >> anyway. I'll fix that and also add a warning that the implementation
> >> will do a linear search to find the tail entry.
> >
> > Probably just me, but I can't foresee a situation when we would need to
> > increase nb_segs to 16 bits.
> > Looks like overkill to me.
> 
> I don't think it will happen either, but with this solution, this
> particular piece of code will work regardless. The value is known at
> compile-time anyway, so it should not be a performance issue.

Ok :)
Konstantin

> 
> // Simon


[dpdk-dev] [PATCH] ring: add function to free a ring

2015-09-09 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, September 07, 2015 9:45 AM
> To: De Lara Guarch, Pablo; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ring: add function to free a ring
> 
> Hi Pablo,
> 
> Please find some comments below.
> 
> On 08/18/2015 04:00 PM, Pablo de Lara wrote:
> > When creating a ring, a memzone is created to allocate it in memory,
> > but the ring could not be freed, as memzones could not be.
> >
> > Since memzones can be freed now, then rings can be as well,
> > taking into account if they were initialized using pre-allocated memory
> > (in which case, memory should be freed externally) or using
> rte_memzone_reserve
> > (with rte_ring_create), freeing the memory with rte_memzone_free.
> >
> > Signed-off-by: Pablo de Lara 
> > ---
> >  lib/librte_ring/rte_ring.c   | 43 
> >  lib/librte_ring/rte_ring.h   |  7 ++
> >  lib/librte_ring/rte_ring_version.map |  7 ++
> >  3 files changed, 57 insertions(+)
> >
> > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > index c9e59d4..83ce6d3 100644
> > --- a/lib/librte_ring/rte_ring.c
> > +++ b/lib/librte_ring/rte_ring.c
> > @@ -208,6 +208,49 @@ rte_ring_create(const char *name, unsigned
> count, int socket_id,
> > return r;
> >  }
> >
> > +/* free the ring */
> > +void
> > +rte_ring_free(struct rte_ring *r)
> > +{
> > +   struct rte_ring_list *ring_list = NULL;
> > +   char mz_name[RTE_MEMZONE_NAMESIZE];
> > +   struct rte_tailq_entry *te;
> > +   const struct rte_memzone *mz;
> > +
> > +   if (r == NULL)
> > +   return;
> > +
> > +   snprintf(mz_name, sizeof(mz_name), "%s%s",
> RTE_RING_MZ_PREFIX, r->name);
> > +   mz = rte_memzone_lookup(mz_name);
> > +
> > +   /*
> > +* Free ring memory if it was allocated with rte_memzone_reserve,
> > +* otherwise it should be freed externally
> > +*/
> > +   if (rte_memzone_free(mz) != 0)
> > +   return;
> 
> Should we have a log here? I think it may hide important
> bugs if we just return silently here.

Agree.

> 
> > +
> > +   ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
> > +   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > +
> > +   /* find out tailq entry */
> > +   TAILQ_FOREACH(te, ring_list, next) {
> > +   if (te->data == (void *) r)
> > +   break;
> > +   }
> > +
> > +   if (te == NULL) {
> > +   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> > +   return;
> > +   }
> 
> If I understand well, a ring is in the tailq only if it was
> created with rte_ring_create(). A ring that is statically created
> in memory is not in the tailq. But we already returned in that
> case. So (te == NULL) should not happen here, right? We could
> also add an error log then.

Yes, (te == NULL) should not happen, and I am not sure of a way it could
happen; but looking at other libraries, like rte_hash or rte_acl, they
check for this and don't add any error log.

> 
> I'm not sure we should handle the case where the ring is not allocated
> with rte_ring_create() in this function. If the ring is allocated with
> another mean (either in a global variable, or with another dynamic
> memory allocator), this function should not be called.

You are right that rte_ring_free() should not be called
if the ring was not created with rte_ring_create(),
but we should have a way to handle it, just in case, right?

> 
> What do you think?

Thanks for the comments!

Pablo
> 
> 
> Regards,
> Olivier
> 
> 
> > +
> > +   TAILQ_REMOVE(ring_list, te, next);
> > +
> > +   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> > +
> > +   rte_free(te);
> > +}
> > +
> >  /*
> >   * change the high water mark. If *count* is 0, water marking is
> >   * disabled
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index af6..e75566f 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -300,6 +300,13 @@ int rte_ring_init(struct rte_ring *r, const char
> *name, unsigned count,
> >   */
> >  struct rte_ring *rte_ring_create(const char *name, unsigned count,
> >  int socket_id, unsigned flags);
> > +/**
> > + * De-allocate all memory used by the ring.
> > + *
> > + * @param r
> > + *   Ring to free
> > + */
> > +void rte_ring_free(struct rte_ring *r);
> >
> >  /**
> >   * Change the high water mark.
> > diff --git a/lib/librte_ring/rte_ring_version.map
> b/lib/librte_ring/rte_ring_version.map
> > index 982fdd1..5474b98 100644
> > --- a/lib/librte_ring/rte_ring_version.map
> > +++ b/lib/librte_ring/rte_ring_version.map
> > @@ -11,3 +11,10 @@ DPDK_2.0 {
> >
> > local: *;
> >  };
> > +
> > +DPDK_2.2 {
> > +   global:
> > +
> > +   rte_ring_free;
> > +
> > +} DPDK_2.0;
> >


[dpdk-dev] [PATCH 4/4 v2] vhost: fix wrong usage of eventfd_t

2015-09-09 Thread Xie, Huawei

Acked-by: Huawei Xie 

Thanks for fixing this.

On 9/9/2015 1:32 PM, Yuanhan Liu wrote:
> According to eventfd man page:
>
> typedef uint64_t eventfd_t;
>
> int eventfd_read(int fd, eventfd_t *value);
> int eventfd_write(int fd, eventfd_t value);
>
> eventfd_t is defined for the second arg(value), but not for fd.
>
> Here I redefine those fd fields to `int' type, which also removes
> the redundant (int) cast. And as the man page stated, we should
> cast 1 to eventfd_t type for eventfd_write().
>
> v2: cast 1 with `eventfd_t' type.
>
> Signed-off-by: Yuanhan Liu 
> ---
>  examples/vhost/main.c |  6 ++---
>  lib/librte_vhost/rte_virtio_net.h |  4 ++--
>  lib/librte_vhost/vhost_rxtx.c |  6 ++---
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 16 +++---
>  lib/librte_vhost/virtio-net.c | 32 
> +--
>  5 files changed, 32 insertions(+), 32 deletions(-)
>


[dpdk-dev] [PATCH 1/4] vhost: remove redundant ;

2015-09-09 Thread Xie, Huawei
On 8/24/2015 11:54 AM, Yuanhan Liu wrote:
> Signed-off-by: Yuanhan Liu 
Acked-by: Huawei Xie 

> ---
>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 0d07338..d412293 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -185,7 +185,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>   }
>   }
>   len_to_cpy = RTE_MIN(data_len - offset, desc->len - 
> vb_offset);
> - };
> + }
>  
>   /* Update used ring with desc information */
>   vq->used->ring[res_cur_idx & (vq->size - 1)].id =



[dpdk-dev] [PATCH 2/4] vhost: fix typo

2015-09-09 Thread Xie, Huawei
On 8/24/2015 11:54 AM, Yuanhan Liu wrote:
> _det => _dev
>
> Signed-off-by: Yuanhan Liu 
Acked-by: Huawei Xie 

> ---
>  lib/librte_vhost/virtio-net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index b520ec5..b670992 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -485,7 +485,7 @@ set_vring_num(struct vhost_device_ctx ctx, struct 
> vhost_vring_state *state)
>  }
>  
>  /*
> - * Reallocate virtio_det and vhost_virtqueue data structure to make them on 
> the
> + * Reallocate virtio_dev and vhost_virtqueue data structure to make them on 
> the
>   * same numa node as the memory of vring descriptor.
>   */
>  #ifdef RTE_LIBRTE_VHOST_NUMA



[dpdk-dev] [PATCH 3/4] vhost: get rid of duplicate code

2015-09-09 Thread Xie, Huawei

On 8/24/2015 11:54 AM, Yuanhan Liu wrote:
> Signed-off-by: Yuanhan Liu 
Acked-by: Huawei Xie 

> ---
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 36 
> 
>  1 file changed, 10 insertions(+), 26 deletions(-)
>
>



[dpdk-dev] virtio_dev_merge_rx send more interrupts than needed

2015-09-09 Thread Xie, Huawei
In virtio_dev_merge_rx, we inject an interrupt to the guest for each packet 
rather than once per burst of packets. Will submit the fix.

/huawei



[dpdk-dev] vring_init bug

2015-09-09 Thread Xie, Huawei


> -Original Message-
> From: Ouyang, Changchun
> Sent: Wednesday, September 09, 2015 11:18 AM
> To: Xie, Huawei; dev at dpdk.org
> Cc: Ouyang, Changchun
> Subject: RE: vring_init bug
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xie, Huawei
> > Sent: Wednesday, September 9, 2015 11:00 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] vring_init bug
> >
> > static inline void
> > vring_init(struct vring *vr, unsigned int num, uint8_t *p,
> > unsigned long align)
> > {
> > vr->num = num;
> > vr->desc = (struct vring_desc *) p;
> > vr->avail = (struct vring_avail *) (p +
> > num * sizeof(struct vring_desc));
> > vr->used = (void *)
> > RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align); }
> >
> > There is a bug in vr->used calculation. 2 bytes of used_event_idx isn't
> > considered. Would submit a fix.
> > __u16 available[num];
> > __u16 used_event_idx;
> 
> For the vring_used ring, it also misses avail_event.
> 
> struct vring_used {
> u16 flags;
> u16 idx;
> struct vring_used_elem ring[qsz];
> u16 avail_event;  // this one missed in dpdk
> };
> 
> It doesn't affect the offset calculation, but it will be great if you can add 
> it
> together.

No need to add this field for the used vring, and you couldn't anyway,
because the preceding array is variable length.


[dpdk-dev] vring_init bug

2015-09-09 Thread Ouyang, Changchun

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xie, Huawei
> Sent: Wednesday, September 9, 2015 11:00 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] vring_init bug
> 
> static inline void
> vring_init(struct vring *vr, unsigned int num, uint8_t *p,
> unsigned long align)
> {
> vr->num = num;
> vr->desc = (struct vring_desc *) p;
> vr->avail = (struct vring_avail *) (p +
> num * sizeof(struct vring_desc));
> vr->used = (void *)
> RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align); }
> 
> There is a bug in vr->used calculation. 2 bytes of used_event_idx isn't
> considered. Would submit a fix.
> __u16 available[num];
> __u16 used_event_idx;

For the vring_used ring, it also misses avail_event.

struct vring_used {
u16 flags;
u16 idx;
struct vring_used_elem ring[qsz];
u16 avail_event;  // this one missed in dpdk
};

It doesn't affect the offset calculation, but it will be great if you can add 
it together.


[dpdk-dev] vring_init bug

2015-09-09 Thread Xie, Huawei
static inline void
vring_init(struct vring *vr, unsigned int num, uint8_t *p,
unsigned long align)
{
vr->num = num;
vr->desc = (struct vring_desc *) p;
vr->avail = (struct vring_avail *) (p +
num * sizeof(struct vring_desc));
vr->used = (void *)
RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
}

There is a bug in the vr->used calculation: the 2 bytes of used_event_idx
aren't considered. Will submit a fix.
__u16 available[num];
__u16 used_event_idx;



[dpdk-dev] [PATCH 4/4] vhost: define callfd and kickfd as int type

2015-09-09 Thread Ouyang, Changchun


> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Monday, August 24, 2015 11:55 AM
> To: dev at dpdk.org
> Cc: Xie, Huawei; Ouyang, Changchun; Yuanhan Liu
> Subject: [PATCH 4/4] vhost: define callfd and kickfd as int type
> 
> So that we can remove the redundant (int) cast.
> 
> Signed-off-by: Yuanhan Liu 
> ---
>  examples/vhost/main.c |  6 ++---
>  lib/librte_vhost/rte_virtio_net.h |  4 ++--
>  lib/librte_vhost/vhost_rxtx.c |  6 ++---
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 16 +++---
>  lib/librte_vhost/virtio-net.c | 32 
> +--
>  5 files changed, 32 insertions(+), 32 deletions(-)
> 
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c index
> 1b137b9..b090b25 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -1433,7 +1433,7 @@ put_desc_to_used_list_zcp(struct vhost_virtqueue
> *vq, uint16_t desc_idx)
> 
>   /* Kick the guest if necessary. */
>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> - eventfd_write((int)vq->callfd, 1);
> + eventfd_write(vq->callfd, 1);

Don't we need type conversion for '1' to eventfd_t here?

>  }
> 
>  /*
> @@ -1626,7 +1626,7 @@ txmbuf_clean_zcp(struct virtio_net *dev, struct
> vpool *vpool)
> 
>   /* Kick guest if required. */
>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> - eventfd_write((int)vq->callfd, 1);
> + eventfd_write(vq->callfd, 1);

Same as above

> 
>   return 0;
>  }
> @@ -1774,7 +1774,7 @@ virtio_dev_rx_zcp(struct virtio_net *dev, struct
> rte_mbuf **pkts,
> 
>   /* Kick the guest if necessary. */
>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> - eventfd_write((int)vq->callfd, 1);
> + eventfd_write(vq->callfd, 1);

Same as above

> 
>   return count;
>  }
> diff --git a/lib/librte_vhost/rte_virtio_net.h
> b/lib/librte_vhost/rte_virtio_net.h
> index b9bf320..a037c15 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -87,8 +87,8 @@ struct vhost_virtqueue {
>   uint16_tvhost_hlen; /**< Vhost header
> length (varies depending on RX merge buffers. */
>   volatile uint16_t   last_used_idx;  /**< Last index used
> on the available ring */
>   volatile uint16_t   last_used_idx_res;  /**< Used for
> multiple devices reserving buffers. */
> - eventfd_t   callfd; /**< Used to notify
> the guest (trigger interrupt). */
> - eventfd_t   kickfd; /**< Currently
> unused as polling mode is enabled. */
> + int callfd; /**< Used to notify
> the guest (trigger interrupt). */
> + int kickfd; /**< Currently
> unused as polling mode is enabled. */
>   struct buf_vector   buf_vec[BUF_VECTOR_MAX];/**< for
> scatter RX. */
>  } __rte_cache_aligned;
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index d412293..887cdb6 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -230,7 +230,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t
> queue_id,
> 
>   /* Kick the guest if necessary. */
>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> - eventfd_write((int)vq->callfd, 1);
> + eventfd_write(vq->callfd, 1);
>   return count;
>  }
> 
> @@ -529,7 +529,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t
> queue_id,
> 
>   /* Kick the guest if necessary. */
>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> - eventfd_write((int)vq->callfd, 1);
> + eventfd_write(vq->callfd, 1);
>   }
> 
>   return count;
> @@ -752,6 +752,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
> uint16_t queue_id,
>   vq->used->idx += entry_success;
>   /* Kick guest if required. */
>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> - eventfd_write((int)vq->callfd, 1);
> + eventfd_write(vq->callfd, 1);
>   return entry_success;
>  }
> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c
> b/lib/librte_vhost/vhost_user/virtio-net-user.c
> index c1ffc38..4689927 100644
> --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
> +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
> @@ -214,10 +214,10 @@ virtio_is_ready(struct virtio_net *dev)
>   rvq = dev->virtqueue[VIRTIO_RXQ];
>   tvq = dev->virtqueue[VIRTIO_TXQ];
>   if (rvq && tvq && rvq->desc && tvq->desc &&
> - (rvq->kickfd != (eventfd_t)-1) &&
> - (rvq->callfd != (eventfd_t)-1) &&
> - (tvq->kickfd != (eventfd_t)-1) &&
> - (tvq->callfd != (eventfd_t)-1)) {
> + 

[dpdk-dev] [PATCH 4/4] vhost: define callfd and kickfd as int type

2015-09-09 Thread Ouyang, Changchun


> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Wednesday, September 9, 2015 9:55 AM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org; Xie, Huawei
> Subject: Re: [PATCH 4/4] vhost: define callfd and kickfd as int type
> 
> On Wed, Sep 09, 2015 at 01:43:06AM +, Ouyang, Changchun wrote:
> >
> >
> > > -Original Message-
> > > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> > > Sent: Monday, August 24, 2015 11:55 AM
> > > To: dev at dpdk.org
> > > Cc: Xie, Huawei; Ouyang, Changchun; Yuanhan Liu
> > > Subject: [PATCH 4/4] vhost: define callfd and kickfd as int type
> > >
> > > So that we can remove the redundant (int) cast.
> > >
> > > Signed-off-by: Yuanhan Liu 
> > > ---
> >
> > > diff --git a/lib/librte_vhost/rte_virtio_net.h
> > > b/lib/librte_vhost/rte_virtio_net.h
> > > index b9bf320..a037c15 100644
> > > --- a/lib/librte_vhost/rte_virtio_net.h
> > > +++ b/lib/librte_vhost/rte_virtio_net.h
> > > @@ -87,8 +87,8 @@ struct vhost_virtqueue {
> > >   uint16_tvhost_hlen; /**< Vhost header
> > > length (varies depending on RX merge buffers. */
> > >   volatile uint16_t   last_used_idx;  /**< Last index used
> > > on the available ring */
> > >   volatile uint16_t   last_used_idx_res;  /**< Used for
> > > multiple devices reserving buffers. */
> > > - eventfd_t   callfd; /**< Used to notify
> > > the guest (trigger interrupt). */
> > > - eventfd_t   kickfd; /**< Currently
> > > unused as polling mode is enabled. */
> > > + int callfd; /**< Used to notify
> > > the guest (trigger interrupt). */
> > > + int kickfd; /**< Currently
> > > unused as polling mode is enabled. */
> >
> > I don't think we have to change it from 8B (eventfd_t is defined as
> > uint64_t) to 4B. Any benefit from this change?
> 
> As I stated in the commit log, to remove the redundant (int) cast. Casts like
> the following are a bit ugly:
> 
> if ((int)dev->virtqueue[VIRTIO_RXQ]->callfd >= 0)
> close((int)dev->virtqueue[VIRTIO_RXQ]->callfd);
> 
> On the other hand, why does it have to be uint64_t? The caller side that sends
> the message (more precisely, qemu) actually uses an int type.
> 

Agree, qemu uses a 32-bit type for the callfd and kickfd, so it could use int.
Well, there is another comment on another place in this patch; I will send it
out soon.


>   --yliu


[dpdk-dev] virtio-net: bind systematically on all non blacklisted virtio-net devices

2015-09-09 Thread Ouyang, Changchun


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Franck Baudin
> Sent: Tuesday, September 8, 2015 4:23 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] virtio-net: bind systematically on all non blacklisted
> virtio-net devices
> 
> Hi,
> 
> The virtio-net driver binds to all virtio-net devices, even if the devices are
> used by the kernel (leading to kernel soft-lockup/panic). One way around this
> is to blacklist the ports in use by Linux. This has been the case since
> v2.0.0, in fact since commit da978dfdc43b59e290a46d7ece5fd19ce79a1162
> and the removal of the RTE_PCI_DRV_NEED_MAPPING driver flag.

It allows the virtio PMD to work without necessarily depending on igb_uio,
a characteristic other PMD drivers don't have.

> 
> Questions:
>  1/ Is it the expected behaviour?
>  2/ Why is it different from the vmxnet3 PMD? In other words, shouldn't we
> re-add RTE_PCI_DRV_NEED_MAPPING to the virtio PMD or remove it from the
> vmxnet3 PMD?
>  3/ If this is the expected behaviour, shouldn't we update the
> dpdk_nic_bind.py tool (binding status is irrelevant for virtio) and the
> documentation (which mentions igb_uio, misleading and useless here)?
> 
> Thanks!
> 
> Best Regards,
> Franck
> 
> 
> 



[dpdk-dev] [PATCH 00/52] update i40e base driver

2015-09-09 Thread Zhang, Helin
Acked-by: Helin Zhang 

> -Original Message-
> From: Wu, Jingjing
> Sent: Sunday, September 6, 2015 3:11 PM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Lu, Wenzhuo; Xu, HuilongX; Zhang, Helin
> Subject: [PATCH 00/52] update i40e base driver
> 
> Here is the update of i40e base driver.
> Main update:
>  - support New X722 Device (FortPark)
>  - extend virtual channel interface
>  - support CEE DCBX
>  - support promiscuous on VLAN
>  - support Tx Scheduling related AQ functions
>  - support RSS AQ related functions
>  - code clean up
> 


[dpdk-dev] [PATCH 1/4] vhost: remove redundant ;

2015-09-09 Thread Ouyang, Changchun


> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Monday, August 24, 2015 11:55 AM
> To: dev at dpdk.org
> Cc: Xie, Huawei; Ouyang, Changchun; Yuanhan Liu
> Subject: [PATCH 1/4] vhost: remove redundant ;
> 
> Signed-off-by: Yuanhan Liu 

Acked-by: Changchun Ouyang 

> ---
>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 0d07338..d412293 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -185,7 +185,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t
> queue_id,
>   }
>   }
>   len_to_cpy = RTE_MIN(data_len - offset, desc->len -
> vb_offset);
> - };
> + }
> 
>   /* Update used ring with desc information */
>   vq->used->ring[res_cur_idx & (vq->size - 1)].id =
> --
> 1.9.0



[dpdk-dev] [PATCH 4/4] vhost: define callfd and kickfd as int type

2015-09-09 Thread Ouyang, Changchun


> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Monday, August 24, 2015 11:55 AM
> To: dev at dpdk.org
> Cc: Xie, Huawei; Ouyang, Changchun; Yuanhan Liu
> Subject: [PATCH 4/4] vhost: define callfd and kickfd as int type
> 
> So that we can remove the redundant (int) cast.
> 
> Signed-off-by: Yuanhan Liu 
> ---

> diff --git a/lib/librte_vhost/rte_virtio_net.h
> b/lib/librte_vhost/rte_virtio_net.h
> index b9bf320..a037c15 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -87,8 +87,8 @@ struct vhost_virtqueue {
>   uint16_tvhost_hlen; /**< Vhost header
> length (varies depending on RX merge buffers. */
>   volatile uint16_t   last_used_idx;  /**< Last index used
> on the available ring */
>   volatile uint16_t   last_used_idx_res;  /**< Used for
> multiple devices reserving buffers. */
> - eventfd_t   callfd; /**< Used to notify
> the guest (trigger interrupt). */
> - eventfd_t   kickfd; /**< Currently
> unused as polling mode is enabled. */
> + int callfd; /**< Used to notify
> the guest (trigger interrupt). */
> + int kickfd; /**< Currently
> unused as polling mode is enabled. */

I don't think we have to change it from 8B (eventfd_t is defined as uint64_t)
to 4B. Any benefit from this change?




[dpdk-dev] [PATCH 3/4] vhost: get rid of duplicate code

2015-09-09 Thread Ouyang, Changchun


> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Monday, August 24, 2015 11:55 AM
> To: dev at dpdk.org
> Cc: Xie, Huawei; Ouyang, Changchun; Yuanhan Liu
> Subject: [PATCH 3/4] vhost: get rid of duplicate code
> 
> Signed-off-by: Yuanhan Liu 

Acked-by: Changchun Ouyang 



[dpdk-dev] [PATCH 2/4] vhost: fix typo

2015-09-09 Thread Ouyang, Changchun


> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Monday, August 24, 2015 11:55 AM
> To: dev at dpdk.org
> Cc: Xie, Huawei; Ouyang, Changchun; Yuanhan Liu
> Subject: [PATCH 2/4] vhost: fix typo
> 
> _det => _dev
> 
> Signed-off-by: Yuanhan Liu 

Acked-by: Changchun Ouyang 


[dpdk-dev] [RFC PATCH 4/8] driver/virtio:enqueue TSO offload

2015-09-09 Thread Ouyang, Changchun


> -Original Message-
> From: Liu, Jijiang
> Sent: Monday, September 7, 2015 2:11 PM
> To: Ouyang, Changchun; dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 4/8] driver/virtio:enqueue TSO offload
> 
> 
> 
> > -Original Message-
> > From: Ouyang, Changchun
> > Sent: Monday, August 31, 2015 8:29 PM
> > To: Liu, Jijiang; dev at dpdk.org
> > Cc: Ouyang, Changchun
> > Subject: RE: [dpdk-dev] [RFC PATCH 4/8] driver/virtio:enqueue TSO
> > offload
> >
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jijiang Liu
> > > Sent: Monday, August 31, 2015 5:42 PM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [RFC PATCH 4/8] driver/virtio:enqueue TSO
> > > offload
> > >
> > > Enqueue TSO4/6 offload.
> > >
> > > Signed-off-by: Jijiang Liu 
> > > ---
> > >  drivers/net/virtio/virtio_rxtx.c |   23 +++
> > >  1 files changed, 23 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio/virtio_rxtx.c
> > > b/drivers/net/virtio/virtio_rxtx.c
> > > index c5b53bb..4c2d838 100644
> > > --- a/drivers/net/virtio/virtio_rxtx.c
> > > +++ b/drivers/net/virtio/virtio_rxtx.c
> > > @@ -198,6 +198,28 @@ virtqueue_enqueue_recv_refill(struct virtqueue
> > > *vq, struct rte_mbuf *cookie)
> > >   return 0;
> > >  }
> > >
> > > +static void
> > > +virtqueue_enqueue_offload(struct virtqueue *txvq, struct rte_mbuf
> *m,
> > > + uint16_t idx, uint16_t hdr_sz)
> > > +{
> > > + struct virtio_net_hdr *hdr = (struct virtio_net_hdr *)(uintptr_t)
> > > + (txvq->virtio_net_hdr_addr + idx * hdr_sz);
> > > +
> > > + if (m->tso_segsz != 0 && m->ol_flags & PKT_TX_TCP_SEG) {
> > > + if (m->ol_flags & PKT_TX_IPV4) {
> > > + if (!vtpci_with_feature(txvq->hw,
> > > VIRTIO_NET_F_HOST_TSO4))
> > > + return;
> >
> > Do we need return error if host can't handle tso for the packet?
> >
> > > + hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> > > + } else if (m->ol_flags & PKT_TX_IPV6) {
> > > + if (!vtpci_with_feature(txvq->hw,
> > > VIRTIO_NET_F_HOST_TSO6))
> > > + return;
> >
> > Same as above
> >
> > > + hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> > > + }
> >
> > Do we need else branch for the case of neither tcpv4 nor tcpv6?
> >
> > > + hdr->gso_size = m->tso_segsz;
> > > + hdr->hdr_len = m->l2_len + m->l3_len + m->l4_len;
> > > + }
> > > +}
> > > +
> > >  static int
> > >  virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf
> > > *cookie) { @@ -221,6 +243,7 @@ virtqueue_enqueue_xmit(struct
> > > virtqueue *txvq, struct rte_mbuf *cookie)
> > >   dxp->cookie = (void *)cookie;
> > >   dxp->ndescs = needed;
> > >
> > > + virtqueue_enqueue_offload(txvq, cookie, idx, head_size);
> >
> > If TSO is not enabled in the feature bit, how to resolve here?
> 
> The TSO enablement  check is in the function.
> 
> If TSO is not enabled, and don't need to fill virtio_net_hdr now.

Here I mean: if (m->ol_flags & PKT_TX_TCP_SEG) is true, that is to say, the 
virtio PMD user expects TSO to be done in vhost or virtio,
but the host feature bit doesn't support it, then we should handle this case:
either handle it in the virtio PMD, or return an error to the caller.
Otherwise a packet flagged for TSO may not be sent out normally.
Am I right?

> 
> > >   start_dp = txvq->vq_ring.desc;
> > >   start_dp[idx].addr =
> > >   txvq->virtio_net_hdr_mem + idx * head_size;
> > > --
> > > 1.7.7.6



[dpdk-dev] Recommended method of using DPDK inside a Vmware ESXi guest ?

2015-09-09 Thread Ale Mansoor
Hi, 
I am trying to use DPDK inside a VMware ESXi 6.0 guest and see two possible 
approaches to achieving this:
use vmxnet3-usermap with the PMD, or use igb_uio with SR-IOV passthrough from 
the ESXi host to the guest.
My guest systems are either Fedora Linux FC20 or Ubuntu 15.04 instances with 8 
vCPUs and 8 GB of memory. 
When using the l2fwd example with igb_uio, the performance numbers I got were 
very low (<100 PPS), and when I tried using vmxnet3-usermap from dpdk.org, it 
does not even seem to compile under a Linux 3.x kernel.
My system is an HP DL380 G8 platform with dual Xeon 2670 CPUs and 64 GB of 
physical memory and an Intel 82599 NIC, running ESXi 6.0.
Do I need to enable any additional kernel build or boot options? Based on some 
posts here, it appears ESXi does not have an emulated IOMMU.
Thanks in advance for your help.
-Ale Mansoor