date:20160120

[dpdk-dev] [PATCH] eal: add function to check if primary proc alive

2016-01-20 Thread Matthew Hall

On 1/20/16 10:14 PM, Qiu, Michael wrote:
> As we could start up many primaries, how does your secondary process
> work with them?

I just worked on this tonight myself. When doing > 1 primary (for 
example pktgen and app), I had to specify:

--no-shconf
--file-prefix pktgen
--file-prefix app

Or you get a panic and RTE fails to init, but the file-prefix seems to 
get applied both to the hugepage mmap() files and also to the lockfiles 
in /var/run:

$ ls -a /var/run | egrep -i '^\.'
. 
..
.pktgen_hugepage_info
.rte_config
.rte_hugepage_info
.sdn_sensor_hugepage_info

So I think you have to keep the different primary-secondary sets 
separate using --file-prefix.
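
As a rough illustration of the naming seen in the listing above, here is a 
minimal sketch that builds the per-prefix hugepage info path. The helper 
name and the format string are assumptions inferred from that directory 
listing, not copied from the EAL source:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats "<run_dir>/.<prefix>_hugepage_info",
 * the pattern suggested by the /var/run listing (e.g. prefix "pktgen").
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int demo_hugepage_info_path(char *buf, size_t len,
                                   const char *run_dir, const char *prefix)
{
    int n;

    assert(buf != NULL && run_dir != NULL && prefix != NULL);
    n = snprintf(buf, len, "%s/.%s_hugepage_info", run_dir, prefix);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Two primaries started with different prefixes would then map to different 
files under this assumed pattern, which is why they stop colliding.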

Matthew.

[dpdk-dev] [PKTGEN] dumb question: how to start packet TX and set the payload

2016-01-20 Thread Matthew Hall

Hello,

I was trying to just use the default PKT file, test/set_seq.pkt, like so:

sudo "./app/app/${RTE_TARGET}/pktgen" \
-l 2,3 \
--master-lcore 2 \
-n 2 \
-m 1024 \
-w 0a:00.1 \
--no-shconf \
--file-prefix pktgen \
-- \
-P \
-m 2.0 \
-f test/set_seq.pkt

After pktgen loaded, the port 0 is marked as UP. So I typed "start all" 
and also tried "str". Sadly, so far, it seems like I could not get this 
to actually begin sending any packets. At least, no counters are 
incrementing in the pktgen UI. So I wasn't sure how to make sure it is 
really sending or not.

The documentation talked about many different commands available, but it 
didn't specifically say how to start transmitting the packets based on 
the content of your *.pkt script file.

I'm just trying to figure out what I messed up so that I can write 
(another) doc patch besides the one I just sent a moment ago.

I was also curious about putting some specific payloads into the packets 
in pktgen. There are many ways of configuring the packet size, but it 
doesn't talk about how and where to set the packet content. This is 
important for my app as its performance will go up and down depending on 
if the L4-L7 data has "interesting" content inside or not.

Sincerely,
Matthew.

[dpdk-dev] [PKTGEN] additional terminal IO question

2016-01-20 Thread Matthew Hall

On 1/20/16 10:00 PM, Arnon Warshavsky wrote:
> Black background gets me to the blind reset as well.
> Pktgen is the only tab I keep with non black background..

Thanks for confirming. Never had this many termio issues before so I was 
wondering if I just went totally crazy!

Matthew.

[dpdk-dev] [PKTGEN] additional terminal IO question

2016-01-20 Thread Matthew Hall

If I try using pktgen theme mode (-T) or unmodified, without commenting 
out some of the stuff I mentioned I disabled for debugging in the 
previous thread, it seems like it sets the pktgen prompt to be invisible 
(black text on black??? or I'm not sure just what) on my TTY which has a 
black background.

If you quit the app it does not reset the colors so my shell is also 
invisible, until I blindly run the reset command.

Did anybody else try it on a black background? Did anybody else see 
these issues with it as well?

Matthew.

[dpdk-dev] [PKTGEN] [PATCH 2/2] usage_pktgen.rst: multiple instances: clarify EAL options needed

2016-01-20 Thread Matthew Hall

Signed-off-by: Matthew Hall 
---
 docs/source/usage_pktgen.rst | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/docs/source/usage_pktgen.rst b/docs/source/usage_pktgen.rst
index efe8aa4..223d033 100644
--- a/docs/source/usage_pktgen.rst
+++ b/docs/source/usage_pktgen.rst
@@ -157,4 +157,19 @@ The -m option then assigns lcores to the ports.
 The information from above is taken from two new files pktgen-master.sh
 and pktgen-slave.sh, have a look at them and adjust as you need.

+The following DPDK / EAL options must be configured correctly as well:
+
+* ``-l lcore_id_list``: non-conflicting list of lcores for each app
+
+* ``--master-lcore lcore_id``: non-conflicting master lcore for each app
+
+* ``-m hugepage_mb / --socket-mem hugepage_mb_list``: non-conflicting amount
+of hugepage memory for each app, or for each app on each CPU socket
+
+* ``--no-shconf``: prevents DPDK from claiming a lockfile that breaks
+concurrent use of multiple apps
+
+* ``--file-prefix``: assigns a unique name to the hugepage mmap() files for
+each app
+
 Pktgen can also be configured using the :ref:`commands`.
-- 
2.5.0

[dpdk-dev] [PKTGEN] [PATCH 1/2] usage_pktgen.rst: multiple instances: clean up section intro

2016-01-20 Thread Matthew Hall

Signed-off-by: Matthew Hall 
---
 docs/source/usage_pktgen.rst | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/source/usage_pktgen.rst b/docs/source/usage_pktgen.rst
index 20bd314..efe8aa4 100644
--- a/docs/source/usage_pktgen.rst
+++ b/docs/source/usage_pktgen.rst
@@ -103,15 +103,15 @@ Multiple Instances of Pktgen or other application
 =

 One possible solution I use and if you have enough ports available to use.
-Lets say you need two ports for your application, but you have 4 ports in
-your system. I physically loop back the cables to have port 0 connect to
-port 2 and port 1 connected to port 3. Now I can give two ports to my
-application and two ports to Pktgen.
-
-Setup if pktgen and your application you have to startup each one a bit
-differently to make sure they share the resources like memory and the
-ports. I will use two Pktgen running on the same machine, which just means
-you have to setup your application as one of the applications.
+Let's say you need two ports for your application, but you have 4 ports in
+your system. I physically loop back the cables to have port 0 connect to port
+2 and port 1 connected to port 3. Now I can give two ports to my application
+and two ports to Pktgen.
+
+If you are running pktgen and your application together, you have to start up
+each one a bit differently to make sure they share the resources like memory
+and the ports. I will use two Pktgens running on the same machine, which just
+means you have to imagine your application as one of the applications.

 In my machine I have 8 10G ports and 72 lcores between 2 sockets. Plus I
 have 1024 hugepages per socket for a total of 2048.
-- 
2.5.0

[dpdk-dev] [PATCH] rte.extvars.mk: allow overriding RTE_SDK_BIN from the environment

2016-01-20 Thread Matthew Hall

On 1/20/16 7:27 AM, Thomas Monjalon wrote:
> Hi Matthew,
>
> RTE_SDK_BIN is an internal variable and should not be overridden.
>
> Have you installed DPDK somewhere? Example:
>   make install O=mybuild DESTDIR=mylocalinstall
>
> Then you should build your app like this:
>   make RTE_SDK=$(readlink -e ../dpdk/mylocalinstall/usr/local/share/dpdk)

Hello Thomas,

Is the way the make install target really works documented somewhere?

This target did not exist when I first used DPDK in 2011, and since then 
I saw various documentation on building DPDK in various places, but not 
much explanation of what make install actually does. I recall various 
list threads about changing its behavior as well.

For example, if I look at this apparently most official document:

http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html

It has build examples such as:

make install T=x86_64-native-linuxapp-gcc

But it does not discuss "O=" or "DESTDIR=" or any other additional 
options. From some experiments on my machine, it looks like maybe I 
could do this:

make install "T=${RTE_TARGET}" "O=build" "DESTDIR=build"

Is that a valid possibility, to keep it all in one easy directory?

Thanks,
Matthew.

[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-20 Thread Amit Tomer

Hello,

> For this case, please use --single-file option because it creates much more
> than 8 fds, which can be handled by vhost-user sendmsg().

Thanks, I'm able to verify it by sending an ARP packet from the container to
the host on arm64. But sometimes I do see the following message while running
l2fwd in the container (as pointed out by Rich).

EAL: Master lcore 0 is ready (tid=8a7a3000;cpuset=[0])
EAL: lcore 1 is ready (tid=89cdf050;cpuset=[1])
Notice: odd number of ports in portmask.
Lcore 0: RX port 0
Initializing port 0... PANIC in kick_all_vq():
TUNSETVNETHDRSZ failed: Inappropriate ioctl for device

How could it be avoided?

Thanks,
Amit.

[dpdk-dev] [PATCH 2/4] i40e: split function for input set change of hash and fdir

2016-01-20 Thread Chilikin, Andrey

Hi Jingjing,

As I can see this patch not only splits fdir functionality from common 
fdir/hash code but also removes compatibility with DPDK 2.2 as it deletes 
I40E_INSET_FLEX_PAYLOAD from valid fdir input set values. Yes, flexible payload 
configuration can be set for fdir separately at the port initialization, but 
this is more legacy from the previous generations of NICs which did not support 
dynamic input set configuration. I believe it would be better to have 
I40E_INSET_FLEX_PAYLOAD valid for fdir input set same as in DPDK 2.2. So in 
legacy mode, when application has to run on an old NIC and on a new one, only 
legacy configuration would be used, but for applications targeting new HW 
single point of configuration would be used instead of mix of two.

Regards,
Andrey

> -----Original Message-----
> From: Wu, Jingjing
> Sent: Friday, December 25, 2015 8:30 AM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Zhang, Helin; Chilikin, Andrey; Pei, Yulong
> Subject: [PATCH 2/4] i40e: split function for input set change of hash and 
> fdir
> 
> This patch split the function for input set change of hash and fdir, and added a
> new function to set the input set to default at initialization.
> 
> Signed-off-by: Jingjing Wu 
> ---
>  drivers/net/i40e/i40e_ethdev.c | 330 
> +
>  drivers/net/i40e/i40e_ethdev.h |  11 +-
>  drivers/net/i40e/i40e_fdir.c   |   5 +-
>  3 files changed, 180 insertions(+), 166 deletions(-)
> 
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index bf6220d..b919aac 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -262,7 +262,8 @@
>  #define I40E_REG_INSET_FLEX_PAYLOAD_WORD7
> 0x0080ULL
>  /* 8th word of flex payload */
>  #define I40E_REG_INSET_FLEX_PAYLOAD_WORD8
> 0x0040ULL
> -
> +/* all 8 words flex payload */
> +#define I40E_REG_INSET_FLEX_PAYLOAD_WORDS
> 0x3FC0ULL
>  #define I40E_REG_INSET_MASK_DEFAULT  0xULL
> 
>  #define I40E_TRANSLATE_INSET 0
> @@ -373,6 +374,7 @@ static int i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,
>   struct rte_eth_udp_tunnel *udp_tunnel);
>  static int i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
>   struct rte_eth_udp_tunnel *udp_tunnel);
> +static void i40e_filter_input_set_init(struct i40e_pf *pf);
>  static int i40e_ethertype_filter_set(struct i40e_pf *pf,
>   struct rte_eth_ethertype_filter *filter,
>   bool add);
> @@ -787,6 +789,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
>* It should be removed once issues are fixed in NVM.
>*/
>   i40e_flex_payload_reg_init(hw);
> + /* Initialize the input set for filters (hash and fd) to default value */
> + i40e_filter_input_set_init(pf);
> 
>   /* Initialize the parameters for adminq */
>   i40e_init_adminq_parameter(hw);
> @@ -6545,43 +6549,32 @@ i40e_get_valid_input_set(enum i40e_filter_pctype pctype,
>*/
>   static const uint64_t valid_fdir_inset_table[] = {
>   [I40E_FILTER_PCTYPE_FRAG_IPV4] =
> - I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST |
> - I40E_INSET_FLEX_PAYLOAD,
> + I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST,
>   [I40E_FILTER_PCTYPE_NONF_IPV4_UDP] =
>   I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST |
> - I40E_INSET_SRC_PORT | I40E_INSET_DST_PORT |
> - I40E_INSET_FLEX_PAYLOAD,
> + I40E_INSET_SRC_PORT | I40E_INSET_DST_PORT,
>   [I40E_FILTER_PCTYPE_NONF_IPV4_TCP] =
> - I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST |
> - I40E_INSET_SRC_PORT | I40E_INSET_DST_PORT |
> - I40E_INSET_FLEX_PAYLOAD,
> + I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST,
>   [I40E_FILTER_PCTYPE_NONF_IPV4_SCTP] =
>   I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST |
>   I40E_INSET_SRC_PORT | I40E_INSET_DST_PORT |
> - I40E_INSET_SCTP_VT | I40E_INSET_FLEX_PAYLOAD,
> + I40E_INSET_SCTP_VT,
>   [I40E_FILTER_PCTYPE_NONF_IPV4_OTHER] =
> - I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST |
> - I40E_INSET_FLEX_PAYLOAD,
> + I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST,
>   [I40E_FILTER_PCTYPE_FRAG_IPV6] =
> - I40E_INSET_IPV6_SRC | I40E_INSET_IPV6_DST |
> - I40E_INSET_FLEX_PAYLOAD,
> + I40E_INSET_IPV6_SRC | I40E_INSET_IPV6_DST,
>   [I40E_FILTER_PCTYPE_NONF_IPV6_UDP] =
> - I40E_INSET_IPV6_SRC | I40E_INSET_IPV6_DST |
> - I40E_INSET_SRC_PORT | I40E_INSET_DST_PORT |
> - I40E_INSET_FLEX_PAYLOAD,
> + I40E_INSET_IPV6_SRC | I40E_INSET_IPV6_DST,
>   [I40E_FILTER_PCTYPE_NONF_IPV6_TCP] =
> - I40E_INSET_IPV6_SRC | I40E_INSET_IPV6_DST |
> - I40E_INS

[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging

2016-01-20 Thread Matthew Hall

On 1/20/16 8:26 AM, Wiles, Keith wrote:
> One problem is a number of people wanted to steal the code and use in a paid 
> application, so the copyright is somewhat a requirement. As you may know I 
> do a lot of debugging on Pktgen and I feel they are a nuisance. I can try to 
> see if we can clean up these messages, but do not hold your breath on getting 
> them to be removed.

Understood, I am just providing some usability feedback from the 
community. Any cleanup, however partial it may be for other reasons, 
will personally aid me in simplicity of debugging and using the pktgen 
to find performance improvements in other community applications and 
DPDK itself, which is my true end goal here. In particular I need it for 
all the changes I posted at various points for librte_lpm so I can test 
all this stuff to make sure it really works.

> IMO most of the information from DPDK is not very useful as why do I need to 
> see every lcore line, plus a lot of more useless information. Most of the 
> information could be reduced to a couple of lines or only report issues not just 
> a bunch of useless information.

DPDK's messages might not be helpful for you, but in my case, the 
temporary hostile modifications I made based on the writeup sent 
previously, in order to make these messages visible again, are what 
allowed me to find and fix the root causes of my inactive port issues. 
I have been working with DPDK's messages since 2011 and am very 
familiar with what they mean inside DPDK itself, so they were the only 
part of the Pktgen UI familiar to me at all; the rest is custom 
stuff I didn't use before.

> The screen init should be scrolling the information off the screen to 
> preserve that info, unless it was changed by mistake.

I found a lot of info is being overwritten or lost due to the complex 
sequence of all these calls. This is what led to my email of questions 
for you.

> Please use tab stop of 4 instead of 8. IMO tab stop of 8 is so 1970's and we 
> should not need tab stop of 8 as any system today will work. :-)

OK. But do note that this convention is different from every other 
project I've coded on before.

Sincerely,
Matthew.

[dpdk-dev] L3 Forwarding performance of DPDK on virtio

2016-01-20 Thread Clarylin L

Sorry. It's L2 forwarding.

I used testpmd with forwarding mode, like
testpmd --pci-blacklist :00:05.0 -c f -n 4  -- --portmask 3 -i
--total-num-mbufs=2 --nb-cores=3 --mbcache=512 --burst=512
--forward-mode=mac --eth-peer=0,90:e2:ba:9f:95:94
--eth-peer=1,90:e2:ba:9f:95:95

On Wed, Jan 20, 2016 at 5:25 PM, Tan, Jianfeng wrote:

>
> Hello!
>
>
> On 1/21/2016 7:51 AM, Clarylin L wrote:
>
>> I am running dpdk within a virtual guest as a L3 forwarder.
>>
>>
>> The VM has two ports connecting to two linux bridges (in turn connecting
>> two physical ports). DPDK is used to forward between these two ports (one
>> port connected to traffic generator and the other connected to sink). I
>> used iperf to test the throughput.
>>
>>
>> If the VM/DPDK is running on passthrough, it can achieve around 10G
>> end-to-end (from traffic generator to sink) throughput. However if the
>> VM/DPDK is running on virtio (virtio-net-pmd), it achieves just 150M
>> throughput, which is a huge degradation.
>>
>>
>> On the virtio, I also measured the throughput between the traffic
>> generator
>> and its connected port on VM, as well as throughput between the sink and
>> its VM port. Both legs show around 7.5G throughput. So I guess forwarding
>> within the VM (from one port to the other) would be a big killer of the
>> performance.
>>
>>
>> Any suggestion on how I can root cause the poor performance issue, or any
>> idea on performance tuning techniques for virtio? thanks a lot!
>>
>
> Is the L3 forwarder you mentioned the l3fwd example in DPDK? If so, I
> doubt it can work well with virtio, see another thread "Add API to get
> packet type info".
>
> Thanks,
> Jianfeng
>

[dpdk-dev] Status of Linux Foundation

2016-01-20 Thread Matthew Hall

Hello,

I was just reading the following blog post about downscaling community 
involvement at the Linux Foundation.

http://mjg59.dreamwidth.org/39546.html

I wondered if any of the issues discussed there might be relevant for 
the governance efforts moving forward on DPDK?

Sincerely,
Matthew.

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-20 Thread David Harton (dharton)

I see that some of the rte_eth_stats have been marked deprecated in 2.2 that 
are returned by rte_eth_stats_get().  Applications that utilize any number of 
device types rely on functions like this one to debug I/O issues.

Is there a reason the stats have been deprecated?  Why not keep the stats in 
line with the standard linux practices such as rtnl_link_stats64?

Note, using rte_eth_xstats_get() does not help for this particular scenario 
because a common binary API is needed to communicate through various layers and 
also provide a consistent view/meaning to users.  The xstats is excellent for 
debugging device specific scenarios but can't help in scenarios where a static 
view is expected.

Thanks,
Dave

[dpdk-dev] where to find ethernet CRC when stripping is off

2016-01-20 Thread Montorsi, Francesco

Hi Ivan,


> -----Original Message-----
> You would be right... if the PMDs did not transparently strip the CRC in
> software when hardware CRC stripping is disabled at port configuration (as
> described above).
> See for instance how the function ixgbe_recv_pkts_lro() in file
> drivers/net/ixgbe/ixgbe_rxtx.c deals with crc_len.

Yeah, I see. However, I wonder what the utility of the hw_strip_crc feature 
is if in the end it is completely hidden from the mbuf user.
To my understanding, looking at that ixgbe code, I think that what I 
wrote before:

   uint32_t crc = *(rte_pktmbuf_mtod_offset(mymbuf, uint32_t *, mymbuf->pkt_len));

should work, since pkt_len and data_len have the "crc_len" removed, but the 
CRC itself should still be there.
I know it is kind of a hack, but at least for ixgbe that sounds like a 
possible (temporary) solution for me.
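
To make the idea concrete, here is a minimal sketch using a plain buffer 
rather than a real mbuf. The struct and function names are hypothetical 
stand-ins (this is not struct rte_mbuf), and it assumes a single-segment 
packet whose buffer still holds the 4 CRC bytes right after the reported 
length:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a single-segment mbuf; not the DPDK struct. */
struct demo_pkt {
    const uint8_t *data;  /* start of the frame data */
    uint32_t pkt_len;     /* length reported to the application, CRC excluded */
    uint32_t buf_avail;   /* bytes actually valid in the buffer from data */
};

/*
 * Copies the 4 bytes that follow the reported packet length into *crc.
 * Returns 0 on success, -1 if the buffer does not hold 4 trailing bytes.
 */
static int demo_trailing_crc(const struct demo_pkt *p, uint32_t *crc)
{
    assert(p != NULL && crc != NULL);
    if (p->buf_avail < p->pkt_len || p->buf_avail - p->pkt_len < 4)
        return -1;
    /* memcpy avoids an unaligned 32-bit load at an arbitrary offset */
    memcpy(crc, p->data + p->pkt_len, sizeof(*crc));
    return 0;
}
```

The value is copied byte for byte, so any byte-order interpretation of the 
CRC is left to the caller. The bounds check is the part the one-line 
version above does not have.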


> Considering your need, I think now that PMDs should keep the CRC that is
> stored in received packets when hardware CRC stripping is disabled by the
> application, so that the application can access it as needed.
> 
Yes, that would be very useful.

> Note that this would impose that the input packet processing of such DPDK
> applications be aware of the CRC presence (+4 in the packet length , for
> instance).
Or perhaps, to maintain backward compatibility, just a flag inside the mbuf 
could be set that informs the user that at the end of the mbuf packet, you can 
find 4 bytes with the CRC.

> 
> Let's see what others, if any, that might care think about such a change into
> the CRC stripping semantics.

Thanks!
Francesco

[dpdk-dev] [PATCH 3/3] examples/vmdq_dcb: extend sample for X710 supporting

2016-01-20 Thread Jingjing Wu

Currently, the example vmdq_dcb only works on Intel® 82599 NICs.
This patch extends this sample to make it work on both Intel® 82599
and X710/XL710 NICs with the following changes:
  1. add VMDQ base queue checking to avoid forwarding on PF queues.
  2. assign a MAC address to each VMDQ pool.
  3. add more arguments (nb-tcs, enable-rss) to change the default
     setting.
  4. extend the max number of queues from 128 to 1024.
This patch also reworks the user guide for the vmdq_dcb sample.

Signed-off-by: Jingjing Wu 
---
 doc/guides/rel_notes/release_2_3.rst |   2 +
 doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst | 169 ++
 examples/vmdq_dcb/main.c | 388 ++-
 3 files changed, 430 insertions(+), 129 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index cd3d391..9637bf1 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -25,6 +25,8 @@ Libraries
 Examples
 

+* **vmdq_dcb: extended to support Intel XL710 series NICs.**
+

 Other
 ~
diff --git a/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst b/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
index 9140a22..fe717fa 100644
--- a/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
+++ b/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
@@ -32,8 +32,8 @@ VMDQ and DCB Forwarding Sample Application
 ==

 The VMDQ and DCB Forwarding sample application is a simple example of packet processing using the DPDK.
-The application performs L2 forwarding using VMDQ and DCB to divide the incoming traffic into 128 queues.
-The traffic splitting is performed in hardware by the VMDQ and DCB features of the Intel® 82599 10 Gigabit Ethernet Controller.
+The application performs L2 forwarding using VMDQ and DCB to divide the incoming traffic into queues.
+The traffic splitting is performed in hardware by the VMDQ and DCB features of the Intel® 82599 and X710/XL710 Ethernet Controller.

 Overview
 
@@ -41,28 +41,27 @@ Overview
 This sample application can be used as a starting point for developing a new application that is based on the DPDK and
 uses VMDQ and DCB for traffic partitioning.

-The VMDQ and DCB filters work on VLAN traffic to divide the traffic into 128 input queues on the basis of the VLAN ID field and
-VLAN user priority field.
-VMDQ filters split the traffic into 16 or 32 groups based on the VLAN ID.
-Then, DCB places each packet into one of either 4 or 8 queues within that group, based upon the VLAN user priority field.
-
-In either case, 16 groups of 8 queues, or 32 groups of 4 queues, the traffic can be split into 128 hardware queues on the NIC,
-each of which can be polled individually by a DPDK application.
+The VMDQ and DCB filters work on MAC and VLAN traffic to divide the traffic into input queues on the basis of the Destination MAC
+address, VLAN ID and VLAN user priority fields.
+VMDQ filters split the traffic into 16 or 32 groups based on the Destination MAC and VLAN ID.
+Then, DCB places each packet into one of queues within that group, based upon the VLAN user priority field.

 All traffic is read from a single incoming port (port 0) and output on port 1, without any processing being performed.
-The traffic is split into 128 queues on input, where each thread of the application reads from multiple queues.
-For example, when run with 8 threads, that is, with the -c FF option, each thread receives and forwards packets from 16 queues.
+Take Intel® 82599 NIC for example, the traffic is split into 128 queues on input, where each thread of the application reads from
+multiple queues. When run with 8 threads, that is, with the -c FF option, each thread receives and forwards packets from 16 queues.

-As supplied, the sample application configures the VMDQ feature to have 16 pools with 8 queues each as indicated in :numref:`figure_vmdq_dcb_example`.
-The Intel® 82599 10 Gigabit Ethernet Controller NIC also supports the splitting of traffic into 32 pools of 4 queues each and
-this can be used by changing the NUM_POOLS parameter in the supplied code.
-The NUM_POOLS parameter can be passed on the command line, after the EAL parameters:
+As supplied, the sample application configures the VMDQ feature to have 32 pools with 4 queues each as indicated in :numref:`figure_vmdq_dcb_example`.
+The Intel® 82599 10 Gigabit Ethernet Controller NIC also supports the splitting of traffic into 16 pools of 8 queues. While the
+Intel® X710 or XL710 Ethernet Controller NICs support any specified VMDQ pools of 4 or 8 queues each. For simplicity, only 16
+or 32 pools is supported in this sample. And queues numbers for each VMDQ pool can be changed by setting CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM
+in config/common_* file.
+The nb-pools, nb-tcs and enable-rss parameters can be passed on the command line, after the EAL parameters:

 .. code-bl

[dpdk-dev] [PATCH 2/3] ixgbe: add more multi queue mode checking

2016-01-20 Thread Jingjing Wu

The multi queue mode ETH_MQ_RX_VMDQ_DCB_RSS is not supported in
ixgbe driver. This patch added the checking.

Signed-off-by: Jingjing Wu 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4c4c6df..24cd30b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1853,6 +1853,11 @@ ixgbe_check_mq_mode(struct rte_eth_dev *dev)
return -EINVAL;
}
} else {
+   if (dev_conf->rxmode.mq_mode == ETH_MQ_RX_VMDQ_DCB_RSS) {
+   PMD_INIT_LOG(ERR, "VMDQ+DCB+RSS mq_mode is"
+ " not supported.");
+   return -EINVAL;
+   }
/* check configuration for vmdb+dcb mode */
if (dev_conf->rxmode.mq_mode == ETH_MQ_RX_VMDQ_DCB) {
const struct rte_eth_vmdq_dcb_conf *conf;
-- 
2.4.0

[dpdk-dev] [PATCH 1/3] i40e: enable DCB in VMDQ vsis

2016-01-20 Thread Jingjing Wu

Previously, DCB was only enabled on the PF; queue mapping and BW
configuration were only done on the PF vsi. This patch enables DCB
for VMDQ vsis with the following steps:
 1. Take BW and ETS configuration on VEB.
 2. Take BW and ETS configuration on VMDQ vsis.
 3. Update TC and queues mapping on VMDQ vsis.
To enable DCB on VMDQ, the number of TCs should not be larger than
the number of queues in VMDQ pools, and the number of queues per
VMDQ pool is specified by CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM
in config/common_* file.
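
The constraint above can be sketched roughly as follows. This is a
simplified, self-contained illustration modeled on the queue split that
this patch adds to i40e_vsi_update_queue_mapping(); the demo_* names and
the parameter list are hypothetical and not driver API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical VSI kinds for this sketch; the driver has its own enum. */
enum demo_vsi_type { DEMO_VSI_MAIN, DEMO_VSI_VMDQ };

/*
 * Returns the number of queues each enabled TC would get, or 0 when the
 * VSI has fewer queues than TCs (the case the text above rules out).
 */
static uint16_t demo_queues_per_tc(enum demo_vsi_type type,
                                   uint16_t nb_rx_queues,
                                   uint16_t nb_vmdq_vsi,
                                   uint16_t queues_per_vm,
                                   uint16_t total_tc)
{
    uint16_t used_queues;

    assert(total_tc > 0);
    if (type == DEMO_VSI_MAIN) {
        /* the main VSI keeps whatever the VMDQ pools do not claim */
        uint32_t vmdq_queues = (uint32_t)nb_vmdq_vsi * queues_per_vm;

        if (vmdq_queues > nb_rx_queues)
            return 0;
        used_queues = (uint16_t)(nb_rx_queues - vmdq_queues);
    } else {
        /* each VMDQ VSI owns a fixed number of queues */
        used_queues = queues_per_vm;
    }
    return (uint16_t)(used_queues / total_tc);
}
```

With, say, 4 queues per VMDQ pool, 4 TCs would get one queue each, while
8 TCs would get none, which is the configuration that has to be rejected.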

Signed-off-by: Jingjing Wu 
---
 doc/guides/rel_notes/release_2_3.rst |   2 +
 drivers/net/i40e/i40e_ethdev.c   | 153 ++-
 drivers/net/i40e/i40e_ethdev.h   |  28 ---
 3 files changed, 151 insertions(+), 32 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..cd3d391 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,8 @@ DPDK Release 2.3
 New Features
 

+* **Added i40e DCB support in VMDQ mode.**
+

 Resolved Issues
 ---
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index bf6220d..fbafcc6 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -8087,6 +8087,8 @@ i40e_vsi_update_queue_mapping(struct i40e_vsi *vsi,
int i, total_tc = 0;
uint16_t qpnum_per_tc, bsf, qp_idx;
struct rte_eth_dev_data *dev_data = I40E_VSI_TO_DEV_DATA(vsi);
+   struct i40e_pf *pf = I40E_VSI_TO_PF(vsi);
+   uint16_t used_queues;

ret = validate_tcmap_parameter(vsi, enabled_tcmap);
if (ret != I40E_SUCCESS)
@@ -8100,7 +8102,18 @@ i40e_vsi_update_queue_mapping(struct i40e_vsi *vsi,
total_tc = 1;
vsi->enabled_tc = enabled_tcmap;

-   qpnum_per_tc = dev_data->nb_rx_queues / total_tc;
+   /* different VSI has different queues assigned */
+   if (vsi->type == I40E_VSI_MAIN)
+   used_queues = dev_data->nb_rx_queues -
+   pf->nb_cfg_vmdq_vsi * RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM;
+   else if (vsi->type == I40E_VSI_VMDQ2)
+   used_queues = RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM;
+   else {
+   PMD_INIT_LOG(ERR, "unsupported VSI type.");
+   return I40E_ERR_NO_AVAILABLE_VSI;
+   }
+
+   qpnum_per_tc = used_queues / total_tc;
/* Number of queues per enabled TC */
if (qpnum_per_tc == 0) {
PMD_INIT_LOG(ERR, " number of queues is less that tcs.");
@@ -8145,6 +8158,93 @@ i40e_vsi_update_queue_mapping(struct i40e_vsi *vsi,
 }

 /*
+ * i40e_config_switch_comp_tc - Configure VEB tc setting for given TC map
+ * @veb: VEB to be configured
+ * @tc_map: enabled TC bitmap
+ *
+ * Returns 0 on success, negative value on failure
+ */
+static enum i40e_status_code
+i40e_config_switch_comp_tc(struct i40e_veb *veb, uint8_t tc_map)
+{
+   struct i40e_aqc_configure_switching_comp_bw_config_data veb_bw;
+   struct i40e_aqc_query_switching_comp_bw_config_resp bw_query;
+   struct i40e_aqc_query_switching_comp_ets_config_resp ets_query;
+   struct i40e_hw *hw = I40E_VSI_TO_HW(veb->associate_vsi);
+   enum i40e_status_code ret = I40E_SUCCESS;
+   int i;
+   uint32_t bw_max;
+
+   /* Check if enabled_tc is same as existing or new TCs */
+   if (veb->enabled_tc == tc_map)
+   return ret;
+
+   /* configure tc bandwidth */
+   memset(&veb_bw, 0, sizeof(veb_bw));
+   veb_bw.tc_valid_bits = tc_map;
+   /* Enable ETS TCs with equal BW Share for now across all VSIs */
+   for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
+   if (tc_map & BIT_ULL(i))
+   veb_bw.tc_bw_share_credits[i] = 1;
+   }
+   ret = i40e_aq_config_switch_comp_bw_config(hw, veb->seid,
+  &veb_bw, NULL);
+   if (ret) {
+   PMD_INIT_LOG(ERR, "AQ command Config switch_comp BW allocation"
+ " per TC failed = %d",
+ hw->aq.asq_last_status);
+   return ret;
+   }
+
+   memset(&ets_query, 0, sizeof(ets_query));
+   ret = i40e_aq_query_switch_comp_ets_config(hw, veb->seid,
+  &ets_query, NULL);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR, "Failed to get switch_comp ETS"
+" configuration %u", hw->aq.asq_last_status);
+   return ret;
+   }
+   memset(&bw_query, 0, sizeof(bw_query));
+   ret = i40e_aq_query_switch_comp_bw_config(hw, veb->seid,
+ &bw_query, NULL);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR, "Failed to get switch_comp bandwidth"
+" configuration %u", hw->aq.asq_last_status);
+

[dpdk-dev] [PATCH 0/3] extend vmdq_dcb sample for X710 supporting

2016-01-20 Thread Jingjing Wu

Currently, the example vmdq_dcb only works on Intel® 82599 NICs.
This patch set extends this sample to make it work on both
Intel® 82599 and X710/XL710 NICs. This patch set also enables
DCB VMDQ mode in the i40e driver and adds unsupported mode checking
in the ixgbe driver.


Jingjing Wu (3):
  i40e: enable DCB in VMDQ vsis
  ixgbe: add more multi queue mode checking
  examples/vmdq_dcb: extend sample for X710 supporting

 doc/guides/rel_notes/release_2_3.rst |   4 +
 doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst | 169 ++
 drivers/net/i40e/i40e_ethdev.c   | 153 +++--
 drivers/net/i40e/i40e_ethdev.h   |  28 +-
 drivers/net/ixgbe/ixgbe_ethdev.c |   5 +
 examples/vmdq_dcb/main.c | 388 ++-
 6 files changed, 586 insertions(+), 161 deletions(-)

-- 
2.4.0

[dpdk-dev] where to find ethernet CRC when stripping is off

2016-01-20 Thread Ivan Boule

On 01/20/2016 04:02 PM, Montorsi, Francesco wrote:
> Hi all,
>
> I need to get access to the Ethernet CRC of received packets.
> To do this, I'm configuring:
>
> port_conf.rxmode.hw_strip_crc = 0;
>
> Now my question is: how am I supposed to access the Ethernet CRC from a DPDK 
> mbuf?
> Is the CRC just the 4 final bytes of the packets?
>
> Is this correct:
>
> uint32_t crc = rte_pktmbuf_mtod_offset (mymbuf, uint32_t*, 
> mymbuf->pkt_len) ;
>
> ?
>
> Thanks,
> Francesco Montorsi
>
Hi Francesco,

You would be right... if the PMDs did not transparently strip the CRC in 
software when hardware CRC stripping is disabled at port configuration 
(as described above).
See for instance how the function ixgbe_recv_pkts_lro() in file 
drivers/net/ixgbe/ixgbe_rxtx.c deals with crc_len.

Considering your need, I think now that PMDs should keep the CRC that 
is stored in received packets when hardware CRC stripping is disabled 
by the application, so that the application can access it as needed.

Note that this would impose that the input packet processing of such 
DPDK applications be aware of the CRC presence (+4 in the packet length, 
for instance).

Let's see what others, if any, that might care think about such a change 
into the CRC stripping semantics.

Ivan
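
For illustration only, here is a minimal sketch in plain C of reading a trailing FCS, assuming a driver that leaves the 4 CRC bytes at the end of the frame data and counts them in the length, and assuming the frame sits in one contiguous buffer. The names demo_crc32() and demo_trailing_fcs() are hypothetical and not part of DPDK. Note also that rte_pktmbuf_mtod_offset() yields a pointer, and that the last four bytes start at offset length - 4, not at the length itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320, the same
 * CRC that Ethernet uses for its frame check sequence (FCS). */
static uint32_t demo_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical helper: read the last 4 bytes of a contiguous frame as
 * the FCS. The FCS is laid out least-significant byte first, so the
 * bytes are assembled explicitly to stay independent of host endianness.
 * Returns 0 on success, -1 if the frame is too short. */
static int demo_trailing_fcs(const uint8_t *frame, size_t len, uint32_t *fcs)
{
    if (frame == NULL || fcs == NULL || len < 4)
        return -1;

    const uint8_t *p = frame + len - 4;

    *fcs = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return 0;
}
```

An application could compare demo_trailing_fcs() against demo_crc32() over the first len - 4 bytes to validate a frame. A multi-segment mbuf would need the last segment located first, which this sketch does not attempt.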

-- 
Ivan Boule
6WIND Development Engineer

[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging

2016-01-20 Thread Wiles, Keith

On 1/20/16, 10:26 AM, "dev on behalf of Wiles, Keith"  wrote:

>On 1/20/16, 12:32 AM, "dev on behalf of Matthew Hall" <mhall at mhcomputing.net> wrote:
>
>>Hello,

Please try modifying pktgen-main.c:main() at the top of the function to this:

wr_scrn_setw(1);        /* Reset the window size, from possible crash run. */
wr_scrn_pos(100, 1);    /* Move the cursor to the bottom of the screen again */

printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by()); fflush(stdout);

/* call before the rte_eal_init() */
(void)rte_set_application_usage_hook(pktgen_usage);

Maybe this will fix up most of your issues with DPDK output. I normally set the 
log-level to 7 to remove most of the DPDK messages.


>>
>>Since the pktgen code is reindented I am finding time to read through it 
>>and experiment and see if I can get it working.
>>
>>I have issues with the init process of pktgen. It is difficult to debug 
>>it because the init code does a lot of very scary stuff to the terminal 
>>control / TTY device at inconvenient times in an inconvenient order, and 
>>in the process damages the debug output and damages the screen of your 
>>GDB without doing weird things to run GDB on a different TTY.
>>
>>Of course I am willing to contribute patches and not just complain, but 
>>first I need some help to follow what is going on.
>>
>>Here is the problematic call-flow with some explanation what went wrong 
>>trying it on some community machines outside of its original environment:
>>
>>1) it calls printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by()); 
>>which dumps tons of weird boilerplate of licenses, copyrights, code 
>>creator, etc.
>>
>>It is open source and everybody that matters already knows who coded it, 
>>so is this stuff really that important? This gets in the way when you 
>>are trying to work on it and I just have to comment it out.
>
>One problem is that a number of people wanted to steal the code and use it in a 
>paid application, so the copyright is somewhat of a requirement. As you may know, 
>I do a lot of debugging on Pktgen and I feel they are a nuisance. I can try to see 
>if we can clean up these messages, but do not hold your breath on getting them 
>to be removed. 
>>
>>2) it calls wr_scrn_setw and tinkers with the windows size very early in 
>>the init which can make your terminal weird
>>
>>3) it calls rte_eal_init which produces a lot of nice debug output, 
>>which is fine
>
>IMO most of the information from DPDK is not very useful: why do I need to 
>see every lcore line, plus a lot more useless information? Most of the 
>information could be reduced to a couple of lines, or DPDK could report only 
>issues rather than a bunch of useless information.
>>
>>4) it calls pktgen_init_screen, which calls wr_scrn_init, which calls 
>>wr_scrn_erase which destroys the valuable debug output just created in 
>>(c) which is a bad thing
>
>The screen init should be scrolling the information off the screen to preserve 
>that info, unless it was changed by mistake.
>>
>>5) it calls wr_print_copyright and dumps more boilerplate I am not sure 
>>is needed
>>
>>6) it logs some helpful messages about the port / descriptor settings 
>>which is fine
>>
>>7) it calls the pktgen_config_ports function which can crash in ways you 
>>need the destroyed debug output to fix.
>>
>>For example in my case that function crashes here:
>>
>> if (pktgen.nb_ports == 0)
>> pktgen_log_panic("*** Did not find any ports to use ***");
>>
>>8) Later it makes a logo and a splash screen (wr_log, wr_splash_screen). 
>>Is this stuff really needed? This is a ton of output for just starting 
>>up some test program.
>>
>>To fix this debug problem I propose some changes which I am happy to 
>>help develop:
>>
>>1) decide what of this output we really need here and greatly simplify 
>>how much gets printed out
>>
>>2) move wr_scrn_setw right before pktgen_init_screen and after 
>>rte_eal_init to prevent damaging that output
>>
>>3) consider how wr_scrn_init is called in pktgen_init_screen, because it 
>>calls wr_scrn_erase which damages output
>
>Again it could be scrolling that information off the screen, just need a large 
>screen scroll buffer.
>>
>>4) I think that pktgen_config_ports should be called before all this 
>>weird screen init stuff, so that if it fails you can actually see what 
>>happened there.
>>
>>One other random topic... on the long lines of code it looks like there 
>>are some gigantic tab-indents pushing things off to the right still. One 
>>example, maybe there are others or another setting which is needed to 
>>fix all of these:
>
>Please use a tab stop of 4 instead of 8. IMO a tab stop of 8 is so 1970s and we 
>should not need a tab stop of 8, as any system today will work. :-)
>>
>> info->seq_pkt = (pkt_seq_t *)rte_zmalloc_socket(buff, 
>>(sizeof(pkt_seq_t) * NUM_TOTAL_PKTS),
>> 
>>  RTE_CACHE_LINE_SIZE, 
>>rte_socket_id());
>>
>

[dpdk-dev] [PATCH] rte.extvars.mk: allow overriding RTE_SDK_BIN from the environment

2016-01-20 Thread Thomas Monjalon

Hi Matthew,

RTE_SDK_BIN is an internal variable and should not be overridden.

2016-01-19 21:30, Matthew Hall:
> Currently pktgen-dpdk and many other external apps will fail to compile
> if the build output directory name is not equal to the target name.
> 
> This causes problems if you used an alternative build output directory.

Have you installed DPDK somewhere? Example:
make install O=mybuild DESTDIR=mylocalinstall

Then you should build your app like this:
make RTE_SDK=$(readlink -e ../dpdk/mylocalinstall/usr/local/share/dpdk)

[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging

2016-01-20 Thread Wiles, Keith

On 1/20/16, 12:32 AM, "dev on behalf of Matthew Hall"  wrote:

>Hello,
>
>Since the pktgen code is reindented I am finding time to read through it 
>and experiment and see if I can get it working.
>
>I have issues with the init process of pktgen. It is difficult to debug 
>it because the init code does a lot of very scary stuff to the terminal 
>control / TTY device at inconvenient times in an inconvenient order, and 
>in the process damages the debug output and damages the screen of your 
>GDB without doing weird things to run GDB on a different TTY.
>
>Of course I am willing to contribute patches and not just complain, but 
>first I need some help to follow what is going on.
>
>Here is the problematic call-flow with some explanation what went wrong 
>trying it on some community machines outside of its original environment:
>
>1) it calls printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by()); 
>which dumps tons of weird boilerplate of licenses, copyrights, code 
>creator, etc.
>
>It is open source and everybody that matters already knows who coded it, 
>so is this stuff really that important? This gets in the way when you 
>are trying to work on it and I just have to comment it out.

One problem is that a number of people wanted to steal the code and use it in a 
paid application, so the copyright is somewhat of a requirement. As you may know, 
I do a lot of debugging on Pktgen and I feel they are a nuisance. I can try to see 
if we can clean up these messages, but do not hold your breath on getting them 
to be removed. 
>
>2) it calls wr_scrn_setw and tinkers with the windows size very early in 
>the init which can make your terminal weird
>
>3) it calls rte_eal_init which produces a lot of nice debug output, 
>which is fine

IMO most of the information from DPDK is not very useful: why do I need to 
see every lcore line, plus a lot more useless information? Most of the 
information could be reduced to a couple of lines, or DPDK could report only 
issues rather than a bunch of useless information.
>
>4) it calls pktgen_init_screen, which calls wr_scrn_init, which calls 
>wr_scrn_erase which destroys the valuable debug output just created in 
>(c) which is a bad thing

The screen init should be scrolling the information off the screen to preserve 
that info, unless it was changed by mistake.
>
>5) it calls wr_print_copyright and dumps more boilerplate I am not sure 
>is needed
>
>6) it logs some helpful messages about the port / descriptor settings 
>which is fine
>
>7) it calls the pktgen_config_ports function which can crash in ways you 
>need the destroyed debug output to fix.
>
>For example in my case that function crashes here:
>
> if (pktgen.nb_ports == 0)
> pktgen_log_panic("*** Did not find any ports to use ***");
>
>8) Later it makes a logo and a splash screen (wr_log, wr_splash_screen). 
>Is this stuff really needed? This is a ton of output for just starting 
>up some test program.
>
>To fix this debug problem I propose some changes which I am happy to 
>help develop:
>
>1) decide what of this output we really need here and greatly simplify 
>how much gets printed out
>
>2) move wr_scrn_setw right before pktgen_init_screen and after 
>rte_eal_init to prevent damaging that output
>
>3) consider how wr_scrn_init is called in pktgen_init_screen, because it 
>calls wr_scrn_erase which damages output

Again it could be scrolling that information off the screen, just need a large 
screen scroll buffer.
>
>4) I think that pktgen_config_ports should be called before all this 
>weird screen init stuff, so that if it fails you can actually see what 
>happened there.
>
>One other random topic... on the long lines of code it looks like there 
>are some gigantic tab-indents pushing things off to the right still. One 
>example, maybe there are others or another setting which is needed to 
>fix all of these:

Please use a tab stop of 4 instead of 8. IMO a tab stop of 8 is so 1970s and we 
should not need a tab stop of 8, as any system today will work. :-)
>
> info->seq_pkt = (pkt_seq_t *)rte_zmalloc_socket(buff, 
>(sizeof(pkt_seq_t) * NUM_TOTAL_PKTS),
> 
>  RTE_CACHE_LINE_SIZE, 
>rte_socket_id());
>
>Thoughts?
>Matthew Hall

Improvements to Pktgen are always welcome, but the copyright info is going to be a 
bit hard to remove, as that was one of the requirements when I open sourced the 
code. I understand it may be a bit much output. I do not think it is really an 
issue that causes users to stop using it, as startup is only done once. In my case I 
may start Pktgen a few times a day for development and it does not seem to slow 
me down much. :-)
>


Regards,
Keith

[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats

2016-01-20 Thread Thomas Monjalon

2016-01-20 10:03, Kyle Larose:
> Hi Harry,
> 
> On Wed, Jan 20, 2016 at 9:45 AM, Van Haaren, Harry
>  wrote:
> > Hi Kyle,
> 
> >
> > In theory we could create a new API for this, but I think the current 
> > xstats API is a good fit for exposing this info, so why create extra APIs? 
> > As a client of the DPDK API, I would prefer more statistics in a single API 
> > than have to research and implement two or more APIs to retrieve the 
> > information to monitor.
> >
> 
> You create new APIs for many reasons: modularity, simplicity within
> the API, consistency, etc. My main concern with this proposed change
> relates to consistency. Previously, each stat had similar semantics.
> It was a number, representing the amount of times something had
> occurred on a chip. This fact allows you to perform operations like
> addition, subtraction/etc and expect that the result will be
> meaningful for every value in the array.
> 
> For example, suppose I wrote a tool to give the "rate" for each of the
> stats. We could sample these stats periodically, then output the
> difference between the two samples divided by the time between samples
> for each stat. A naive implementation, but quite simple.
> 
> However, if we start adding values like link speed and state, which
> are not really numerical, or not monotonic, you can no longer apply
> the same mathematical operations on them and expect them to be
> meaningful. For example, suppose a link went down. The "rate" for that
> stat would be -1. Does that really make sense? Anyone using this API
> would need to explicitly filter out the non-stats, or risk nonsensical
> output.
> 
> Let's also consider how to interpret the value. When I look at a stat,
> there's usually one of two meanings: it's either a number of packets,
> or it's a number of bytes. We're now adding exceptions to that rule.
> Link state is a boolean. Link speed is a value in Mbps. Duplex is
> pretty much an enum.
> 
> We already have the rte_eth_link_get function. Why not let users
> continue to use that? It's well defined, it is simple, and it is
> consistent.

+1

Please also consider this work in progress
about link speed information:
http://dpdk.org/dev/patchwork/patch/7995/
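
Kyle's rate example above can be sketched in a few lines of plain C. The struct and function names here are hypothetical and are not part of the ethdev API; the sketch only assumes that each stat is a name/value pair sampled twice. It shows that a packet counter yields a meaningful rate, while a link-status value (1 = up, 0 = down) yields -1 when the link drops.

```c
#include <stdint.h>

/* Hypothetical name/value pair, loosely modelled on an xstats entry. */
struct demo_stat {
    const char *name;
    uint64_t value;
};

/* Naive per-second rate between two samples of the same stat.
 * Meaningful for monotonically increasing counters only. */
static double demo_rate(const struct demo_stat *prev,
                        const struct demo_stat *cur, double seconds)
{
    return ((double)cur->value - (double)prev->value) / seconds;
}
```

A tool built this way would have to filter entries such as link status by name before computing rates, which is the extra burden the objection describes.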

[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging

2016-01-20 Thread Wiles, Keith

On 1/20/16, 12:32 AM, "dev on behalf of Matthew Hall"  wrote:

Hi Matthew,

I have some comments below, but will address your full email later when I have 
a bit more time.

>Hello,
>
>Since the pktgen code is reindented I am finding time to read through it 
>and experiment and see if I can get it working.
>
>I have issues with the init process of pktgen. It is difficult to debug 
>it because the init code does a lot of very scary stuff to the terminal 
>control / TTY device at inconvenient times in an inconvenient order, and 
>in the process damages the debug output and damages the screen of your 
>GDB without doing weird things to run GDB on a different TTY.
>
>Of course I am willing to contribute patches and not just complain, but 
>first I need some help to follow what is going on.
>
>Here is the problematic call-flow with some explanation what went wrong 
>trying it on some community machines outside of its original environment:
>
>1) it calls printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by()); 
>which dumps tons of weird boilerplate of licenses, copyrights, code 
>creator, etc.
>
>It is open source and everybody that matters already knows who coded it, 
>so is this stuff really that important? This gets in the way when you 
>are trying to work on it and I just have to comment it out.
>
>2) it calls wr_scrn_setw and tinkers with the windows size very early in 
>the init which can make your terminal weird
>
>3) it calls rte_eal_init which produces a lot of nice debug output, 
>which is fine
>
>4) it calls pktgen_init_screen, which calls wr_scrn_init, which calls 
>wr_scrn_erase which destroys the valuable debug output just created in 
>(c) which is a bad thing
>
>5) it calls wr_print_copyright and dumps more boilerplate I am not sure 
>is needed
>
>6) it logs some helpful messages about the port / descriptor settings 
>which is fine
>
>7) it calls the pktgen_config_ports function which can crash in ways you 
>need the destroyed debug output to fix.
>
>For example in my case that function crashes here:
>
> if (pktgen.nb_ports == 0)
> pktgen_log_panic("*** Did not find any ports to use ***");

The problem here is that DPDK did not find any ports for Pktgen to use. Please 
check that the right ports are bound to igb_uio and that they are usable by 
DPDK.
>
>8) Later it makes a logo and a splash screen (wr_log, wr_splash_screen). 
>Is this stuff really needed? This is a ton of output for just starting 
>up some test program.
>
>To fix this debug problem I propose some changes which I am happy to 
>help develop:
>
>1) decide what of this output we really need here and greatly simplify 
>how much gets printed out
>
>2) move wr_scrn_setw right before pktgen_init_screen and after 
>rte_eal_init to prevent damaging that output
>
>3) consider how wr_scrn_init is called in pktgen_init_screen, because it 
>calls wr_scrn_erase which damages output
>
>4) I think that pktgen_config_ports should be called before all this 
>weird screen init stuff, so that if it fails you can actually see what 
>happened there.
>
>One other random topic... on the long lines of code it looks like there 
>are some gigantic tab-indents pushing things off to the right still. One 
>example, maybe there are others or another setting which is needed to 
>fix all of these:

Please use tab stops of 4 instead of 8.
>
> info->seq_pkt = (pkt_seq_t *)rte_zmalloc_socket(buff, 
>(sizeof(pkt_seq_t) * NUM_TOTAL_PKTS),
> 
>  RTE_CACHE_LINE_SIZE, 
>rte_socket_id());
>
>Thoughts?
>Matthew Hall
>


Regards,
Keith

[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats

2016-01-20 Thread Van Haaren, Harry

> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> 2016-01-20 10:03, Kyle Larose:
> > We already have the rte_eth_link_get function. Why not let users
> > continue to use that? It's well defined, it is simple, and it is
> > consistent.
> 
> +1


Ok, no problem. I'll mark the link-status patch rejected in Patchwork.


I've just sent the Keepalive patchset, patch #3 is of interest regarding this 
discussion:
http://dpdk.org/dev/patchwork/patch/10003/

It adds a function to the API for collecting xstats, meaning it doesn't pollute 
the rte_eth_xstats_get() output. I'm interested to hear the community's view of 
this approach.


Regards, -Harry

[dpdk-dev] [PATCH] i40e: fix vlan filtering

2016-01-20 Thread Julien Meunier

Hello,

Yes, you are right. Even if VLAN filtering is configured most of the 
time during initialization, we should manage the case of multiple MAC 
addresses already configured.

I will send you a v2 patch with this modification, use ether_addr_copy 
and add additional debug messages.
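
As a rough sketch of the multiple-MAC case (not the actual v2 patch), the idea is to walk every configured MAC filter instead of only the permanent address. The types and names below are hypothetical stand-ins written in plain C; a real driver change would presumably delete and re-add each entry with i40e_vsi_delete_mac() and i40e_vsi_add_mac() as the quoted patch does for one address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's filter types. */
enum demo_filter_type {
    DEMO_MAC_PERFECT_MATCH,     /* match on MAC only */
    DEMO_MACVLAN_PERFECT_MATCH  /* match on MAC and VLAN */
};

struct demo_mac_filter {
    uint8_t addr[6];
    enum demo_filter_type type;
};

/* Switch every configured MAC filter to the type that matches the
 * requested VLAN filtering state. Returns how many entries changed. */
static size_t demo_set_vlan_filter(struct demo_mac_filter *filters,
                                   size_t count, int on)
{
    enum demo_filter_type want =
        on ? DEMO_MACVLAN_PERFECT_MATCH : DEMO_MAC_PERFECT_MATCH;
    size_t changed = 0;

    for (size_t i = 0; i < count; i++) {
        if (filters[i].type != want) {
            filters[i].type = want;
            changed++;
        }
    }
    return changed;
}
```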

Regards,

On 01/20/2016 06:00 AM, Zhang, Helin wrote:
>> -----Original Message-----
>> From: Julien Meunier [mailto:julien.meunier at 6wind.com]
>> Sent: Tuesday, January 19, 2016 1:19 AM
>> To: Zhang, Helin
>> Cc: dev at dpdk.org
>> Subject: [PATCH] i40e: fix vlan filtering
>>
>> VLAN filtering was always performed, even if hw_vlan_filter was disabled.
>> During device initialization, default filter RTE_MACVLAN_PERFECT_MATCH
>> was applied. In this situation, all incoming VLAN frames were dropped by the
>> card (increase of the register RUPP - Rx Unsupported Protocol).
>>
>> In order to restore default behavior, if HW VLAN filtering is activated, set a
>> filter to match MAC and VLAN. If not, set a filter to only match MAC.
>>
>> Signed-off-by: Julien Meunier
>> Signed-off-by: David Marchand
>> ---
>>   drivers/net/i40e/i40e_ethdev.c | 39
>> ++-
>>   drivers/net/i40e/i40e_ethdev.h |  1 +
>>   2 files changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
>> index bf6220d..ef9d578 100644
>> --- a/drivers/net/i40e/i40e_ethdev.c
>> +++ b/drivers/net/i40e/i40e_ethdev.c
>> @@ -2332,6 +2332,13 @@ i40e_vlan_offload_set(struct rte_eth_dev *dev,
>> int mask)
>>  struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data-
>>> dev_private);
>>  struct i40e_vsi *vsi = pf->main_vsi;
>>
>> +if (mask & ETH_VLAN_FILTER_MASK) {
>> +if (dev->data->dev_conf.rxmode.hw_vlan_filter)
>> +i40e_vsi_config_vlan_filter(vsi, TRUE);
>> +else
>> +i40e_vsi_config_vlan_filter(vsi, FALSE);
>> +}
>> +
>>  if (mask & ETH_VLAN_STRIP_MASK) {
>>  /* Enable or disable VLAN stripping */
>>  if (dev->data->dev_conf.rxmode.hw_vlan_strip)
>> @@ -4156,6 +4163,34 @@ fail_mem:
>>  return NULL;
>>   }
>>
>> +/* Configure vlan filter on or off */
>> +int
>> +i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on) {
>> +struct i40e_hw *hw = I40E_VSI_TO_HW(vsi);
>> +struct i40e_mac_filter_info filter;
>> +int ret;
>> +
>> +rte_memcpy(&filter.mac_addr,
>> +   (struct ether_addr *)(hw->mac.perm_addr),
>> ETH_ADDR_LEN);
>> +ret = i40e_vsi_delete_mac(vsi, &filter.mac_addr);
>> +
>> +if (on) {
>> +/* Filter to match MAC and VLAN */
>> +filter.filter_type = RTE_MACVLAN_PERFECT_MATCH;
>> +} else {
>> +/* Filter to match only MAC */
>> +filter.filter_type = RTE_MAC_PERFECT_MATCH;
>> +}
>> +
>> +ret |= i40e_vsi_add_mac(vsi, &filter);
> How would this behave if multiple MAC addresses have been configured?
> I think this case might be ignored in the code changes, right?
>
> Regards,
> Helin
>
>> +
>> +if (ret)
>> +PMD_DRV_LOG(INFO, "Update VSI failed to %s vlan filter",
>> +on ? "enable" : "disable");
>> +return ret;
>> +}
>> +
>>   /* Configure vlan stripping on or off */  int
>> i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on) @@ -4203,9
>> +4238,11 @@ i40e_dev_init_vlan(struct rte_eth_dev *dev)  {
>>  struct rte_eth_dev_data *data = dev->data;
>>  int ret;
>> +int mask = 0;
>>
>>  /* Apply vlan offload setting */
>> -i40e_vlan_offload_set(dev, ETH_VLAN_STRIP_MASK);
>> +mask = ETH_VLAN_STRIP_MASK | ETH_VLAN_FILTER_MASK;
>> +i40e_vlan_offload_set(dev, mask);
>>
>>  /* Apply double-vlan setting, not implemented yet */
>>
>> diff --git a/drivers/net/i40e/i40e_ethdev.h
>> b/drivers/net/i40e/i40e_ethdev.h index 1f9792b..5505d72 100644
>> --- a/drivers/net/i40e/i40e_ethdev.h
>> +++ b/drivers/net/i40e/i40e_ethdev.h
>> @@ -551,6 +551,7 @@ void i40e_vsi_queues_unbind_intr(struct i40e_vsi
>> *vsi);  int i40e_vsi_vlan_pvid_set(struct i40e_vsi *vsi,
>> struct i40e_vsi_vlan_pvid_info *info);  int
>> i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on);
>> +int i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on);
>>   uint64_t i40e_config_hena(uint64_t flags);  uint64_t
>> i40e_parse_hena(uint64_t flags);  enum i40e_status_code
>> i40e_fdir_setup_tx_resources(struct i40e_pf *pf);
>> --
>> 2.1.4

-- 
Julien MEUNIER
6WIND

[dpdk-dev] [PATCH 3/3] keepalive: add rte_keepalive_xstats() and example

2016-01-20 Thread Harry van Haaren

This patch adds a function that exposes keepalive statistics
re-using the existing rte_eth_xstats struct. The function provides
the client API the opportunity to read last-seen and status of
each core.

Signed-off-by: Harry van Haaren 
---
 doc/guides/rel_notes/release_2_3.rst|  6 
 doc/guides/sample_app_ug/keep_alive.rst | 11 ++
 examples/l2fwd-keepalive/main.c | 22 ++--
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 
 lib/librte_eal/common/include/rte_keepalive.h   | 17 -
 lib/librte_eal/common/rte_keepalive.c   | 48 -
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 
 7 files changed, 113 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..9e33aa2 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,12 @@ DPDK Release 2.3
 New Features
 

+* **Keep Alive xstats**
+
+  A function ``rte_keepalive_xstats()`` has been added to the
+  keepalive header, allowing the retrieval of keepalive statistics
+  such as last-alive-time and the status of each core registered
+  for monitoring. The API reflects that of the existing xstats API.

 Resolved Issues
 ---
diff --git a/doc/guides/sample_app_ug/keep_alive.rst 
b/doc/guides/sample_app_ug/keep_alive.rst
index 1478faf..839e29c 100644
--- a/doc/guides/sample_app_ug/keep_alive.rst
+++ b/doc/guides/sample_app_ug/keep_alive.rst
@@ -190,3 +190,14 @@ The rte_keepalive_mark_alive function simply sets the core 
state to alive.
 {
 keepcfg->state_flags[rte_lcore_id()] = ALIVE;
 }
+
+Keepalive exposes its statistics using an API very similar to the xstats API.
+This allows client code to call the function and retrieve the current status
+of keepalive, providing information like last-alive time and status per-core
+that has keepalive enabled.
+
+.. code-block:: c
+
+nstats = rte_keepalive_xstats(rte_global_keepalive_info, xstats, nstats);
+for (i = 0; i < nstats; i++)
+printf("%s : %lu\n", xstats[i].name, xstats[i].value);
diff --git a/examples/l2fwd-keepalive/main.c b/examples/l2fwd-keepalive/main.c
index f4d52f2..a8f2ba4 100644
--- a/examples/l2fwd-keepalive/main.c
+++ b/examples/l2fwd-keepalive/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -139,7 +140,7 @@ struct l2fwd_port_statistics 
port_statistics[RTE_MAX_ETHPORTS];
 /* A tsc-based timer responsible for triggering statistics printout */
 #define TIMER_MILLISECOND 1
 #define MAX_TIMER_PERIOD 86400 /* 1 day max */
-static int64_t timer_period = 10 * TIMER_MILLISECOND * 1000; /* 10 seconds */
+static int64_t timer_period = 1 * TIMER_MILLISECOND * 1000; /* 1 second */
 static int64_t check_period = 5; /* default check cycle is 5ms */

 /* Keepalive structure */
@@ -189,7 +190,22 @@ print_stats(__attribute__((unused)) struct rte_timer 
*ptr_timer,
   total_packets_tx,
   total_packets_rx,
   total_packets_dropped);
-   printf("\n\n");
+   printf("\nKeep Alive xstats ==\n");
+
+   /* Keepalive Xstats */
+   unsigned nstats = rte_keepalive_xstats(rte_global_keepalive_info, 0, 0);
+   struct rte_eth_xstats *xstats = rte_zmalloc( "RTE_KEEPALIVE_XSTATS",
+   sizeof( struct rte_eth_xstats) * nstats,
+   RTE_CACHE_LINE_SIZE);
+
+   nstats = rte_keepalive_xstats(rte_global_keepalive_info, xstats,
+ nstats);
+   unsigned i;
+   for (i = 0; i < nstats; i++)
+   printf("%s\t%lu\n", xstats[i].name,
+   xstats[i].value);
+   printf("\n");
+   rte_free(xstats);
 }

 /* Send the burst of packets on an output interface */
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9d7adf1..f5e16a7 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -135,3 +135,10 @@ DPDK_2.2 {
rte_xen_dom0_supported;

 } DPDK_2.1;
+
+DPDK_2.3 {
+   global:
+
+   rte_keepalive_xstats;
+
+} DPDK_2.2;
diff --git a/lib/librte_eal/common/include/rte_keepalive.h 
b/lib/librte_eal/common/include/rte_keepalive.h
index 02472c0..352dd17 100644
--- a/lib/librte_eal/common/include/rte_keepalive.h
+++ b/lib/librte_eal/common/include/rte_kee

[dpdk-dev] [PATCH 2/3] eal: add keepalive core register timestamp

2016-01-20 Thread Harry van Haaren

This patch sets a timestamp on each lcore when it is registered
for keepalive. This causes the first values read by the monitor
to show time since the core was registered, instead of the delta
between 0 and the timestamp counter.

Signed-off-by: Harry van Haaren 
---
 lib/librte_eal/common/rte_keepalive.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_keepalive.c 
b/lib/librte_eal/common/rte_keepalive.c
index 736fd0f..5358322 100644
--- a/lib/librte_eal/common/rte_keepalive.c
+++ b/lib/librte_eal/common/rte_keepalive.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 

 static void
 print_trace(const char *msg, struct rte_keepalive *keepcfg, int idx_core)
@@ -108,6 +109,8 @@ rte_keepalive_create(rte_keepalive_failure_callback_t 
callback,
 void
 rte_keepalive_register_core(struct rte_keepalive *keepcfg, const int id_core)
 {
-   if (id_core < RTE_KEEPALIVE_MAXCORES)
+   if (id_core < RTE_KEEPALIVE_MAXCORES) {
keepcfg->active_cores[id_core] = 1;
+   keepcfg->last_alive[id_core] = rte_rdtsc();
+   }
 }
-- 
2.5.0
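
To illustrate the effect described in the commit message above, here is a small plain C model. The struct, the DEMO_MAXCORES constant, and the function names are hypothetical, and the timestamp is passed in as a parameter instead of calling rte_rdtsc() so that the sketch is self-contained. Without a timestamp at registration, the first delta read by a monitor is measured from 0; with it, the delta is the time since the core was registered.

```c
#include <stdint.h>

#define DEMO_MAXCORES 4  /* hypothetical limit for this sketch */

struct demo_keepalive {
    int active_cores[DEMO_MAXCORES];
    uint64_t last_alive[DEMO_MAXCORES];
};

/* Mark a core as monitored and stamp it with the current time,
 * mirroring the two assignments added by the patch. */
static void demo_register_core(struct demo_keepalive *k, int id_core,
                               uint64_t now)
{
    if (id_core >= 0 && id_core < DEMO_MAXCORES) {
        k->active_cores[id_core] = 1;
        k->last_alive[id_core] = now;
    }
}

/* Ticks elapsed since the core was last seen alive. */
static uint64_t demo_since_alive(const struct demo_keepalive *k,
                                 int id_core, uint64_t now)
{
    return now - k->last_alive[id_core];
}
```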

[dpdk-dev] [PATCH 1/3] doc: fix keepalive sample app guide

2016-01-20 Thread Harry van Haaren

This patch fixes some mismatches between the keepalive code
and the docs. Struct names, and descriptions are not in line
with the codebase.

Fixes: e64833f2273a ("examples/l2fwd-keepalive: add sample application")

Signed-off-by: Harry van Haaren 
---
 doc/guides/sample_app_ug/keep_alive.rst | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/doc/guides/sample_app_ug/keep_alive.rst 
b/doc/guides/sample_app_ug/keep_alive.rst
index 080811b..1478faf 100644
--- a/doc/guides/sample_app_ug/keep_alive.rst
+++ b/doc/guides/sample_app_ug/keep_alive.rst
@@ -1,6 +1,6 @@

 ..  BSD LICENSE
-Copyright(c) 2015 Intel Corporation. All rights reserved.
+Copyright(c) 2015-2016 Intel Corporation. All rights reserved.
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
@@ -143,17 +143,17 @@ The Keep-Alive/'Liveliness' conceptual scheme:
 The following sections provide some explanation of the code aspects
 that are specific to the Keep Alive sample application.

-The heartbeat functionality is initialized with a struct
-rte_heartbeat and the callback function to invoke in the
+The keepalive functionality is initialized with a struct
+rte_keepalive and the callback function to invoke in the
 case of a timeout.

 .. code-block:: c

 rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL);
-if (rte_global_hbeat_info == NULL)
+if (rte_global_keepalive_info == NULL)
 rte_exit(EXIT_FAILURE, "keepalive_create() failed");

-The function that issues the pings hbeat_dispatch_pings()
+The function that issues the pings keepalive_dispatch_pings()
 is configured to run every check_period milliseconds.

 .. code-block:: c
@@ -162,7 +162,8 @@ is configured to run every check_period milliseconds.
 (check_period * rte_get_timer_hz()) / 1000,
 PERIODICAL,
 rte_lcore_id(),
-&hbeat_dispatch_pings, rte_global_keepalive_info
+&rte_keepalive_dispatch_pings,
+rte_global_keepalive_info
 ) != 0 )
 rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n");

@@ -173,7 +174,7 @@ functionality and the example random failures.

 .. code-block:: c

-rte_keepalive_mark_alive(&rte_global_hbeat_info);
+rte_keepalive_mark_alive(&rte_global_keepalive_info);
 cur_tsc = rte_rdtsc();

 /* Die randomly within 7 secs for demo purposes.. */
@@ -185,7 +186,7 @@ The rte_keepalive_mark_alive function simply sets the core 
state to alive.
 .. code-block:: c

 static inline void
-rte_keepalive_mark_alive(struct rte_heartbeat *keepcfg)
+rte_keepalive_mark_alive(struct rte_keepalive *keepcfg)
 {
-keepcfg->state_flags[rte_lcore_id()] = 1;
+keepcfg->state_flags[rte_lcore_id()] = ALIVE;
 }
-- 
2.5.0

[dpdk-dev] [PATCH 0/3] Keep-alive stats and doc fixes

2016-01-20 Thread Harry van Haaren

This patchset contains:
1. Fix variable naming consistency in sample guide
2. Set last_seen time on core when it gets registered
3. An xstats implementation for last-seen and current core status

Harry van Haaren (3):
  doc: fix keepalive sample app guide
  eal: add keepalive core register timestamp
  keepalive: add rte_keepalive_xstats() and example

 doc/guides/rel_notes/release_2_3.rst|  6 +++
 doc/guides/sample_app_ug/keep_alive.rst | 30 +-
 examples/l2fwd-keepalive/main.c | 22 --
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 
 lib/librte_eal/common/include/rte_keepalive.h   | 17 +++-
 lib/librte_eal/common/rte_keepalive.c   | 53 -
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 
 7 files changed, 127 insertions(+), 15 deletions(-)

-- 
2.5.0
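
The example code in patch 3/3 calls rte_keepalive_xstats() twice: once with a zero-sized array to learn how many entries exist, then again with an allocated array. The calling pattern can be sketched in plain C as follows. Everything prefixed demo_ is hypothetical, including the stat names; this is not the implementation from the patch, only the same two-call shape.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical name/value entry for this sketch. */
struct demo_kv_stat {
    char name[32];
    uint64_t value;
};

/* Fill up to n entries and return the number of stats available.
 * When the array is missing or too small, only the count is returned. */
static unsigned demo_xstats(const uint64_t *last_alive, unsigned ncores,
                            struct demo_kv_stat *out, unsigned n)
{
    if (out == NULL || n < ncores)
        return ncores;

    for (unsigned i = 0; i < ncores; i++) {
        snprintf(out[i].name, sizeof(out[i].name),
                 "core_%u_last_alive", i);
        out[i].value = last_alive[i];
    }
    return ncores;
}

/* Two-call pattern: query the count, allocate, then fetch.
 * The caller frees the returned array. */
static struct demo_kv_stat *demo_fetch_all(const uint64_t *last_alive,
                                           unsigned ncores, unsigned *count)
{
    unsigned n = demo_xstats(last_alive, ncores, NULL, 0);
    struct demo_kv_stat *stats = calloc(n ? n : 1, sizeof(*stats));

    if (stats == NULL) {
        *count = 0;
        return NULL;
    }
    *count = demo_xstats(last_alive, ncores, stats, n);
    return stats;
}
```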

[dpdk-dev] L3 Forwarding performance of DPDK on virtio

2016-01-20 Thread Clarylin L

I am running DPDK within a virtual guest as an L3 forwarder.


The VM has two ports connecting to two linux bridges (in turn connecting
two physical ports). DPDK is used to forward between these two ports (one
port connected to traffic generator and the other connected to sink). I
used iperf to test the throughput.


If the VM/DPDK is running on passthrough, it can achieve around 10G
end-to-end (from traffic generator to sink) throughput. However, if the
VM/DPDK is running on virtio (virtio-net-pmd), it achieves just 150M
throughput, which is a huge degradation.


On virtio, I also measured the throughput between the traffic generator
and its connected port on the VM, as well as the throughput between the sink
and its VM port. Both legs show around 7.5G throughput. So I guess forwarding
within the VM (from one port to the other) is the big killer of
performance.


Any suggestion on how I can root cause the poor performance issue, or any
idea on performance tuning techniques for virtio? thanks a lot!

[dpdk-dev] where to find ethernet CRC when stripping is off

2016-01-20 Thread Montorsi, Francesco

Hi all,

I need to get access to the Ethernet CRC of received packets.
To do this, I'm configuring:

port_conf.rxmode.hw_strip_crc = 0;

Now my question is: how am I supposed to access the Ethernet CRC from a DPDK 
mbuf? 
Is the CRC just the 4 final bytes of the packets? 

Is this correct:

   uint32_t crc = rte_pktmbuf_mtod_offset (mymbuf, uint32_t*, mymbuf->pkt_len) ;

?

Thanks,
Francesco Montorsi
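
A note on the snippet in the question above: rte_pktmbuf_mtod_offset()
yields a pointer of the requested type rather than a value, and an offset
of pkt_len points one byte past the end of the data. Assuming the PMD
leaves the 4-byte CRC as the trailing bytes of the frame and counts it in
the length (an assumption, not confirmed in this thread), and assuming the
packet sits in a single segment, the idea can be sketched on a plain byte
buffer. The helper name frame_trailing_crc() is hypothetical and is not a
DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_CRC_BYTES 4 /* an Ethernet FCS is 4 bytes long */

/*
 * Hypothetical helper (not part of DPDK): copy the last 4 bytes of a
 * contiguous frame into out[]. Returns 0 on success, -1 if the frame
 * is too short to hold a CRC.
 */
static int
frame_trailing_crc(const uint8_t *frame, size_t frame_len,
		   uint8_t out[ETHER_CRC_BYTES])
{
	assert(frame != NULL && out != NULL);

	if (frame_len < ETHER_CRC_BYTES)
		return -1;

	/* memcpy avoids an unaligned 32-bit load from packet memory */
	memcpy(out, frame + frame_len - ETHER_CRC_BYTES, ETHER_CRC_BYTES);
	return 0;
}
```

With an mbuf, frame would be rte_pktmbuf_mtod(m, const uint8_t *) and
frame_len the data length of that segment. For a chained mbuf the trailing
bytes would live in the last segment.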

[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats

2016-01-20 Thread Van Haaren, Harry

Hi Kyle,

> From: Kyle Larose [mailto:eomereadig at gmail.com]
> On Wed, Jan 20, 2016 at 9:28 AM, Harry van Haaren
>  wrote:
> > This patch exposes link duplex, speed, and status via the
> > existing xstats API.
> 
> I'm slightly confused by this. Why are we exposing operational
> properties of the chip through an API which I thought was primarily
> targeting statistics?

In a fault-detection situation, link state is a good item to monitor - just like
the rest of the statistics on the NIC.

> When I think of statistics and a NIC, I think of
> values which are monotonically increasing. I think of values that are
> derived primarily from the packets flowing through the system. I do not
> think of link state, link speed and duplex, which have nothing to do
> with packets, and are not monotonic.

Link state, and speed seem a good fit to me. I'll admit I'm not sure about 
duplex, and would be happy to respin the patch without duplex if the community 
would prefer that.

> Should we not have a separate API to get this type information? I
> mean, just because we have a generic "string to uint64_t" map doesn't
> mean we should toss in anything that can fit into a uint64_t.

In theory we could create a new API for this, but I think the current xstats 
API is a good fit for exposing this info, so why create extra APIs? As a client 
of the DPDK API, I would prefer more statistics in a single API than have to 
research and implement two or more APIs to retrieve the information to monitor.

I'm working on exposing keep-alive statistics using an xstats style API. I'll 
send the patches later today so we can discuss them too.

Regards, -Harry
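
To illustrate the "single API" argument above, here is a rough sketch of
how a monitoring client might pick one value out of a generic name/value
list. The struct layout and the names stat_entry and stat_lookup() are
assumptions made for illustration; they are not the ethdev xstats API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for one statistics entry: a name plus a 64-bit value.
 * The layout is an assumption for illustration only. */
struct stat_entry {
	char name[64];
	uint64_t value;
};

/* Look up a statistic by name. Returns 0 and stores the value on
 * success, -1 if no entry carries that name. */
static int
stat_lookup(const struct stat_entry *stats, size_t n,
	    const char *name, uint64_t *value)
{
	assert(stats != NULL && name != NULL && value != NULL);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(stats[i].name, name) == 0) {
			*value = stats[i].value;
			return 0;
		}
	}
	return -1;
}
```

A fault-detection client could then treat "link_status" like any counter:
fetch the list once and look up the names it cares about.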

[dpdk-dev] How classification happens in scheduling ?

2016-01-20 Thread Singh, Jasvinder

Hi Uday,


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> ravulakollu.kumar at wipro.com
> Sent: Wednesday, January 20, 2016 12:06 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] How classification happens in scheduling ?
> 
> Hi all,
> 
> Could someone explain to me how this code snippet determines
> subport, pipe, traffic_class, queue and color?
> 
> uint16_t *pdata = rte_pktmbuf_mtod(m, uint16_t *); //points to the
> start of the data in the mbuf
> 
> *subport = (rte_be_to_cpu_16(pdata[SUBPORT_OFFSET]) & 0x0FFF)
> &(port_params.n_subports_per_port - 1); /* Outer VLAN ID*/
> *pipe = (rte_be_to_cpu_16(pdata[PIPE_OFFSET]) & 0x0FFF) &
> (port_params.n_pipes_per_subport - 1); /* Inner VLAN ID */
> *traffic_class = (pdata[QUEUE_OFFSET] & 0x0F) &
> (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE - 1); /* Destination IP */
> *queue = ((pdata[QUEUE_OFFSET] >> 8) & 0x0F) &
> (RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS - 1) ; /* Destination IP */
> *color = pdata[COLOR_OFFSET] & 0x03;/* Destination IP */
> 
> Thanks & Regards,
> Uday

To understand this, please refer to explanation (23.4) at 
http://dpdk.org/doc/guides/sample_app_ug/qos_scheduler.html 

The above code snippet classifies incoming packets based on their QinQ
double VLAN tags and the IP destination address.
The subport ID and pipe ID are determined by reading the 12-bit svlan field
at SUBPORT_OFFSET and the 12-bit cvlan field at PIPE_OFFSET from the packet
header. Traffic class, pipe queue and color are determined by reading
specific fields at offsets QUEUE_OFFSET, QUEUE_OFFSET and COLOR_OFFSET (all
within the destination IP) from the packet's header.
To read these values, the packet header fields first need to be converted
from big endian to CPU order. Since the values should not exceed the
maximums determined from the configuration file, an "&" operation with
parameters such as port_params.n_subports_per_port - 1 and
port_params.n_pipes_per_subport - 1 is performed to cap them.

For these kinds of queries, please use users at dpdk.org.

Thanks,
Jasvinder
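
The masking described above can be sketched in isolation. This is a minimal
illustration on raw bytes, not the sample application's code: read_be16()
and vlan_to_index() are hypothetical names, and the "& (n - 1)" step only
behaves like a modulo when n is a power of two, which the sketch asserts.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 16-bit big-endian field from raw packet bytes; this plays the
 * role of the big-endian to CPU order conversion. */
static uint16_t
read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: take the 12-bit VLAN ID out of a 16-bit tag
 * field and fold it into the range [0, n_entries).
 * n_entries must be a non-zero power of two for the mask to work.
 */
static uint32_t
vlan_to_index(const uint8_t *tci, uint32_t n_entries)
{
	assert(n_entries != 0 && (n_entries & (n_entries - 1)) == 0);

	uint32_t vid = read_be16(tci) & 0x0FFF; /* keep the low 12 bits */

	return vid & (n_entries - 1);           /* cap to configured size */
}
```

For example, a tag field of bytes 0x21 0x05 carries VLAN ID 0x105 (261);
with 8 subports configured the resulting index is 261 & 7 = 5.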

[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats

2016-01-20 Thread Harry van Haaren

This patch exposes link duplex, speed, and status via the
existing xstats API.

Signed-off-by: Harry van Haaren 
---
 doc/guides/rel_notes/release_2_3.rst |  1 +
 lib/librte_ether/rte_ethdev.c| 29 ++---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..c3449dc 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -19,6 +19,7 @@ Drivers
 Libraries
 ~

+* **Link Status added to extended statistics in ethdev**

 Examples
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed971b4..3c35e1b 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -83,6 +83,15 @@ struct rte_eth_xstats_name_off {
unsigned offset;
 };

+/* Link Status display in xstats */
+static const char * const rte_eth_duplex_strings[] = {
+   "link_duplex_autonegotiate",
+   "link_duplex_half",
+   "link_duplex_full"
+};
+
+#define RTE_NB_LINK_STATUS_STATS 3
+
 static const struct rte_eth_xstats_name_off rte_stats_strings[] = {
{"rx_good_packets", offsetof(struct rte_eth_stats, ipackets)},
{"tx_good_packets", offsetof(struct rte_eth_stats, opackets)},
@@ -94,7 +103,10 @@ static const struct rte_eth_xstats_name_off 
rte_stats_strings[] = {
rx_nombuf)},
 };

-#define RTE_NB_STATS (sizeof(rte_stats_strings) / sizeof(rte_stats_strings[0]))
+#define RTE_GENERIC_STATS (sizeof(rte_stats_strings) / \
+   sizeof(rte_stats_strings[0]))
+
+#define RTE_NB_STATS (RTE_NB_LINK_STATUS_STATS + RTE_GENERIC_STATS)

 static const struct rte_eth_xstats_name_off rte_rxq_stats_strings[] = {
{"packets", offsetof(struct rte_eth_stats, q_ipackets)},
@@ -1466,6 +1478,7 @@ rte_eth_xstats_get(uint8_t port_id, struct rte_eth_xstats 
*xstats,
 {
struct rte_eth_stats eth_stats;
struct rte_eth_dev *dev;
+   struct rte_eth_link link;
unsigned count = 0, i, q;
signed xcount = 0;
uint64_t val, *stats_ptr;
@@ -1497,8 +1510,18 @@ rte_eth_xstats_get(uint8_t port_id, struct 
rte_eth_xstats *xstats,
count = 0;
rte_eth_stats_get(port_id, &eth_stats);

+   /* link status */
+   rte_eth_link_get_nowait(port_id, &link);
+   snprintf(xstats[count].name, sizeof(xstats[count].name), "link_status");
+   xstats[count++].value = link.link_status;
+   snprintf(xstats[count].name, sizeof(xstats[count].name), "link_speed");
+   xstats[count++].value = link.link_speed;
+   snprintf(xstats[count].name, sizeof(xstats[count].name),
+"%s", rte_eth_duplex_strings[link.link_duplex]);
+   xstats[count++].value = 1;
+
/* global stats */
-   for (i = 0; i < RTE_NB_STATS; i++) {
+   for (i = 0; i < RTE_GENERIC_STATS; i++) {
stats_ptr = RTE_PTR_ADD(&eth_stats,
rte_stats_strings[i].offset);
val = *stats_ptr;
-- 
2.5.0

[dpdk-dev] [PATCH] eal: add function to check if primary proc alive

2016-01-20 Thread Harry van Haaren

This patch adds a new function to the EAL API:
int rte_eal_primary_proc_alive(const char *path);

The function indicates if a primary process is alive right now.
This is implemented by testing for a write lock on the primary
process's config file.

The use case for this functionality is that a secondary
process can wait until a primary process starts by polling
the function. Once the primary is running, the secondary can
continue to poll to detect whether the primary process has
quit unexpectedly.

The RTE_MAGIC number is written to the shared config by the
primary process; this is the signal to the secondary process
that the EAL is set up and ready to be used. The function
rte_eal_mcfg_complete() writes RTE_MAGIC. This call has been
moved later in the EAL init procedure, as the PCI probing in
the primary process can interfere with the secondary running.

Signed-off-by: Harry van Haaren 
---
 doc/guides/rel_notes/release_2_3.rst|  7 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  8 
 lib/librte_eal/common/include/rte_eal.h | 19 +++
 lib/librte_eal/linuxapp/eal/eal.c   | 18 --
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++
 5 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..14b5b06 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -11,6 +11,13 @@ Resolved Issues
 EAL
 ~~~

+* **Added rte_eal_primary_proc_alive() function**
+
+  A new function ``rte_eal_primary_proc_alive()`` has been added
+  to allow the user to detect if a primary process is running.
+  Use cases for this feature include fault detection, and monitoring
+  using secondary processes.
+

 Drivers
 ~~~
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9d7adf1..0e28017 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -135,3 +135,11 @@ DPDK_2.2 {
rte_xen_dom0_supported;

 } DPDK_2.1;
+
+
+DPDK_2.3 {
+   global:
+
+   rte_eal_primary_proc_alive;
+
+} DPDK_2.2;
diff --git a/lib/librte_eal/common/include/rte_eal.h 
b/lib/librte_eal/common/include/rte_eal.h
index d2816a8..6eb65f9 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -156,6 +156,25 @@ int rte_eal_iopl_init(void);
  *   - On failure, a negative error value.
  */
 int rte_eal_init(int argc, char **argv);
+
+/**
+ * Check if a primary process is currently alive
+ *
+ * This function returns true when a primary process is currently
+ * active.
+ *
+ * @param config_file_path
+ *   The config_file_path argument provided should point at the location
+ *   that the primary process will create its config file. By default,
+ *   /var/run/.rte_config is used. This path can be customized when starting
+ *   a primary process using --file-prefix=custom_path
+ *
+ * @return
+ *  - If alive, returns one.
+ *  - If dead, returns zero.
+ */
+int rte_eal_primary_proc_alive(const char *config_file_path);
+
 /**
  * Usage function typedef used by the application usage function.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 635ec36..b419066 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -818,8 +818,6 @@ rte_eal_init(int argc, char **argv)

eal_check_mem_on_local_socket();

-   rte_eal_mcfg_complete();
-
if (eal_plugins_init() < 0)
rte_panic("Cannot init plugins\n");

@@ -877,9 +875,25 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_pci_probe())
rte_panic("Cannot probe PCI\n");

+   rte_eal_mcfg_complete();
+
return fctret;
 }

+int
+rte_eal_primary_proc_alive(const char *config_file_path)
+{
+   int config_fd;
+   config_fd = open(config_file_path, O_RDONLY);
+   if (config_fd < 0)
+   return 0;
+
+   int ret = lockf(config_fd, F_TEST, 0);
+   close(config_fd);
+
+   return !!ret;
+}
+
 /* get core role */
 enum rte_lcore_role_t
 rte_eal_lcore_role(unsigned lcore_id)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map 
b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index cbe175f..7a8c530 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -138,3 +138,10 @@ DPDK_2.2 {
rte_xen_dom0_supported;

 } DPDK_2.1;
+
+DPDK_2.3 {
+   global:
+
+   rte_eal_primary_proc_alive;
+
+} DPDK_2.2;
-- 
2.5.0

[dpdk-dev] Problem with Intel i40e XL710 dpdk driver

2016-01-20 Thread Karthick, A.R.

I found DMAR errors while bringing up other ports except port 0.
So rebooting the kernel with intel_iommu=off fixes it and dpdk i40e
initializes fine for all ports.

For some reason, the intel qcu64e mode change utility doesn't work if I
want to change back the mode from 4x10 to 2x40.
Tried several times but never reverts back to 2x40 from 4x10 though that is
a different issue.

Regards,
-Karthick

On Thu, Jan 14, 2016 at 6:26 PM, Karthick, A.R.  wrote:

> Hi,
>  I am seeing a "Failed to init adminq: -54" or admin queue timeouts
>  while initializing the admin queue for i40e xl710 intel nic.
>  (Intel server is a E5-2670)
>
>  First things first.
>  I am running the latest firmware.
>  The kernel module is not loaded and yes, it works with the i40e kernel
> driver. (latest or otherwise)
> And this problem comes even with dpdk 2.0/2.1 or the latest stable. So
> there's that.
>
>  I have done a bunch of debugging and here are my findings.
>  With the card configured in 2x40g or 4x10g mode, it _ALWAYS_ works with
>  successfully initializing pci function 0 or port 0.
>  It always fails to subsequently initialize the rest.
>  Even if I unbind the igb uio for port 0 and bind only port 1 or port 2,3,4
> in 4x10g mode,
>  it fails.
>
>  Since it works with the kernel driver, I tried to see if there were
> differences in the way registers are setup for i40e driver in kernel and
> dpdk.
>  They look mostly to be the same but obviously there were subtle
> differences.
>  From what I could fathom, I couldn't see much and whatever little was
> caught, I tried to keep the dpdk code in sync and it still failed.
>
>  While stepping through gdb all the way from eal pci to pci uio map to
> eth_i40e_dev_init,
>  to the failure in obtaining the firmware revision for port1 during
> i40e_init_adminq,
>  I did confirm that the memory map was right for the pci.
>
>  So the hw->hw_addr looks correct for port 1 correlating it to the uio1
> map or the physical address from lspci or kernel driver when using the
> kernel driver which works.
>
>  However the admin queue seems to be not processing any request for port 1.
>  Note that port 0 always works and it's the same code for others with a
> different eal dev/hw instance.
>
> But for other ports like port1, after correctly setting up the adminq
> registers and memory map,
>  it always fails to obtain the firmware revision since the i40e_asq_done
> is returning 0 for the
>  head register at 0x80300 and doesn't match the next_in_use when starting
> at 1.
>  So it always returns pending or false in i40e_asq_done, which is retried a
> certain number of times after resetting the aq by i40e_init_adminq but
> ultimately gives up.
>
>  Thoughts and wondering if you guys have seen this and have a fix or patch
> that is not in upstream yet.
>
>  Failure enclosed below as mentioned above in detail: (with a 4x10g mode
> for the card but same failure with 2x40g mode as well. No difference. Port
> 0 always succeeds but subsequent ports fail.
>  And same result even with port 0 not bound and starting with the
> initialization of port 2,3,4 which always fails.
>
> EAL: lcore 1 is ready (tid=6bd30700;cpuset=[1])
> EAL: PCI device :01:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL:   Not managed by a supported kernel driver, skipped
> EAL: PCI device :01:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL:   Not managed by a supported kernel driver, skipped
> EAL: PCI device :83:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:1583 rte_i40e_pmd
> EAL:   PCI memory mapped at 0x7f2f8000
> EAL:   PCI memory mapped at 0x7f2f8080
> PMD: eth_i40e_dev_init(): FW 4.40 API 1.4 NVM 04.05.03 eetrack 80001dca
> PMD: i40e_pf_parameter_init(): Max supported VSIs:34
> PMD: i40e_pf_parameter_init(): PF queue pairs:64
> PMD: i40e_pf_parameter_init(): Max VMDQ VSI num:34
> PMD: i40e_pf_parameter_init(): VMDQ queue pairs:4
> EAL: PCI device :83:00.1 on NUMA socket 1
> EAL:   probe driver: 8086:1583 rte_i40e_pmd
> EAL:   PCI memory mapped at 0x7f2f80808000
> EAL:   PCI memory mapped at 0x7f2f81008000
> PMD: eth_i40e_dev_init(): Failed to init adminq: -54
> EAL: Error - exiting with code: 1
>  Cause: Requested device :83:00.1 cannot be used
>
> Regards,
> -Karthick
>
>
>
>

[dpdk-dev] How classification happens in scheduling ?

2016-01-20 Thread ravulakollu.ku...@wipro.com

Hi all,

Could someone explain to me how this code snippet determines
subport, pipe, traffic_class, queue and color?

uint16_t *pdata = rte_pktmbuf_mtod(m, uint16_t *); //points to the 
start of the data in the mbuf

*subport = (rte_be_to_cpu_16(pdata[SUBPORT_OFFSET]) & 0x0FFF) 
&(port_params.n_subports_per_port - 1); /* Outer VLAN ID*/
*pipe = (rte_be_to_cpu_16(pdata[PIPE_OFFSET]) & 0x0FFF) & 
(port_params.n_pipes_per_subport - 1); /* Inner VLAN ID */
*traffic_class = (pdata[QUEUE_OFFSET] & 0x0F) & 
(RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE - 1); /* Destination IP */
*queue = ((pdata[QUEUE_OFFSET] >> 8) & 0x0F) & 
(RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS - 1) ; /* Destination IP */
*color = pdata[COLOR_OFFSET] & 0x03;/* Destination IP */

Thanks & Regards,
Uday

[dpdk-dev] [PATCH v2] Patch introducing API to read/write Intel Architecture Model Specific Registers (MSR)...

2016-01-20 Thread Ananyev, Konstantin


Hi Wojciech,
Couple of nits, see below.
Konstantin

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wojciech Andralojc
> Sent: Wednesday, January 20, 2016 10:57 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2] Patch introducing API to read/write Intel 
> Architecture Model Specific Registers (MSR)...
> 
> Patch rework based on feedback, only x86 specific functions left under 
> lib/librte_eal/common/include/arch/x86/.
> 
> Signed-off-by: Wojciech Andralojc 
> ---
>  lib/librte_eal/common/include/arch/x86/rte_msr.h | 158 
> +++
>  1 file changed, 158 insertions(+)
>  create mode 100644 lib/librte_eal/common/include/arch/x86/rte_msr.h
> 
> diff --git a/lib/librte_eal/common/include/arch/x86/rte_msr.h 
> b/lib/librte_eal/common/include/arch/x86/rte_msr.h
> new file mode 100644
> index 000..9d16633
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/x86/rte_msr.h
> +
> +#ifndef _RTE_MSR_X86_64_H_
> +#define _RTE_MSR_X86_64_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include  //O_RDONLY
> +#include  //pread

Pls remove '//' comments here.

> +
> +#include 
> +#include 
> +
> +#define CPU_MSR_PATH "/dev/cpu/%u/msr"
> +#define CPU_MSR_PATH_MAX_LEN 32
> +
> +/**
> + * This function should not be called directly.
> + * Function to open CPU's MSR file
> + */
> +static int
> +__msr_open_file(const unsigned lcore, int flags)
> +{
> + char fname[CPU_MSR_PATH_MAX_LEN] = {0};

Why not just  use PATH_MAX here?

> + int fd = -1;
> +
> + snprintf(fname, sizeof(fname) - 1, CPU_MSR_PATH, lcore);
> +
> + fd = open(fname, flags);
> +
> + if (fd < 0)
> + RTE_LOG(ERR, PQOS, "Error opening file '%s'!\n", fname);
> +
> + return fd;
> +}
> +
> +/**
> + * Function to read CPU's MSR
> + *
> + * @param [in] lcore
> + *  CPU logical core id

Hmm, are you aware that DPDK lcore id != CPU lcore id?
Might be better to use 'cpuid' name here?
Just to avoid confusion.

> + *
> + * @param [in] reg
> + *  MSR reg to read
> + *
> + * @param [out] value
> + *  Read value of MSR reg
> + *
> + * @return
> + *  Operations status
> +*/
> +
> +static inline int
> +rte_msr_read(const unsigned lcore, const uint32_t reg, uint64_t *value)

I don't think there is a need to put the rte_msr_read()/rte_msr_write()
definitions into a header file and make them static inline.
Just a normal external function definition seems sufficient here.

> +{
> + int fd = -1;
> + int ret = -1;
> +
> + RTE_VERIFY(value != NULL);

That's a public API.
No need to coredump if one of the input parameters is invalid.

> + if (value == NULL)
> + return -1;


Might be better -EINVAL;

> +
> + fd = __msr_open_file(lcore, O_RDONLY);
> +
> + if (fd >= 0) {
> + ssize_t read_ret = 0;
> +
> + read_ret = pread(fd, value, sizeof(value[0]), (off_t)reg);
> +
> + if (read_ret != sizeof(value[0])) {
> + RTE_LOG(ERR, PQOS, "RDMSR failed for reg[0x%x] on lcore 
> %u\n",
> + (unsigned)reg, lcore);
> + } else
> + ret = 0;
> +
> + close(fd);
> + }
> +
> + return ret;
> +}

[dpdk-dev] [PATCH] ip_pipeline: fix cpu socket-id error

2016-01-20 Thread Jasvinder Singh

This patch fixes the socket-id error in ip_pipeline sample
application running over uni-processor systems.

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/init.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
index 186ca03..86aa378 100644
--- a/examples/ip_pipeline/init.c
+++ b/examples/ip_pipeline/init.c
@@ -835,6 +835,17 @@ app_init_link_frag_ras(struct app_params *app)
}
 }

+static inline int
+app_get_cpu_socket_id(uint32_t pmd_id)
+{
+   int status = rte_eth_dev_socket_id(pmd_id);
+
+   if (status == -1)
+   return 0;
+
+   return status;
+}
+
 static void
 app_init_link(struct app_params *app)
 {
@@ -890,7 +901,7 @@ app_init_link(struct app_params *app)
p_link->pmd_id,
rxq_queue_id,
p_rxq->size,
-   rte_eth_dev_socket_id(p_link->pmd_id),
+   app_get_cpu_socket_id(p_link->pmd_id),
&p_rxq->conf,
app->mempool[p_rxq->mempool_id]);
if (status < 0)
@@ -917,7 +928,7 @@ app_init_link(struct app_params *app)
p_link->pmd_id,
txq_queue_id,
p_txq->size,
-   rte_eth_dev_socket_id(p_link->pmd_id),
+   app_get_cpu_socket_id(p_link->pmd_id),
&p_txq->conf);
if (status < 0)
rte_panic("%s (%" PRIu32 "): "
@@ -989,7 +1000,7 @@ app_init_tm(struct app_params *app)
/* TM */
p_tm->sched_port_params.name = p_tm->name;
p_tm->sched_port_params.socket =
-   rte_eth_dev_socket_id(p_link->pmd_id);
+   app_get_cpu_socket_id(p_link->pmd_id);
p_tm->sched_port_params.rate =
(uint64_t) link_eth_params.link_speed * 1000 * 1000 / 8;

-- 
2.5.0

[dpdk-dev] [PATCH v2] Patch introducing API to read/write Intel Architecture Model Specific Registers (MSR)...

2016-01-20 Thread Wojciech Andralojc

Patch rework based on feedback, only x86 specific functions left under 
lib/librte_eal/common/include/arch/x86/.

Signed-off-by: Wojciech Andralojc 
---
 lib/librte_eal/common/include/arch/x86/rte_msr.h | 158 +++
 1 file changed, 158 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_msr.h

diff --git a/lib/librte_eal/common/include/arch/x86/rte_msr.h 
b/lib/librte_eal/common/include/arch/x86/rte_msr.h
new file mode 100644
index 000..9d16633
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/x86/rte_msr.h
@@ -0,0 +1,158 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_MSR_X86_64_H_
+#define _RTE_MSR_X86_64_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include  //O_RDONLY
+#include  //pread
+
+#include 
+#include 
+
+#define CPU_MSR_PATH "/dev/cpu/%u/msr"
+#define CPU_MSR_PATH_MAX_LEN 32
+
+/**
+ * This function should not be called directly.
+ * Function to open CPU's MSR file
+ */
+static int
+__msr_open_file(const unsigned lcore, int flags)
+{
+   char fname[CPU_MSR_PATH_MAX_LEN] = {0};
+   int fd = -1;
+
+   snprintf(fname, sizeof(fname) - 1, CPU_MSR_PATH, lcore);
+
+   fd = open(fname, flags);
+
+   if (fd < 0)
+   RTE_LOG(ERR, PQOS, "Error opening file '%s'!\n", fname);
+
+   return fd;
+}
+
+/**
+ * Function to read CPU's MSR
+ *
+ * @param [in] lcore
+ *  CPU logical core id
+ *
+ * @param [in] reg
+ *  MSR reg to read
+ *
+ * @param [out] value
+ *  Read value of MSR reg
+ *
+ * @return
+ *  Operations status
+*/
+
+static inline int
+rte_msr_read(const unsigned lcore, const uint32_t reg, uint64_t *value)
+{
+   int fd = -1;
+   int ret = -1;
+
+   RTE_VERIFY(value != NULL);
+   if (value == NULL)
+   return -1;
+
+   fd = __msr_open_file(lcore, O_RDONLY);
+
+   if (fd >= 0) {
+   ssize_t read_ret = 0;
+
+   read_ret = pread(fd, value, sizeof(value[0]), (off_t)reg);
+
+   if (read_ret != sizeof(value[0])) {
+   RTE_LOG(ERR, PQOS, "RDMSR failed for reg[0x%x] on lcore 
%u\n",
+   (unsigned)reg, lcore);
+   } else
+   ret = 0;
+
+   close(fd);
+   }
+
+   return ret;
+}
+
+/**
+ * Function to write CPU's MSR
+ *
+ * @param [in] lcore
+ *  CPU logical core id
+ *
+ * @param [in] reg
+ *  MSR reg to write
+ *
+ * @param [in] value
+ *  Value to be written to MSR reg
+ *
+ * @return
+ *  Operations status
+*/
+static inline int
+rte_msr_write(const unsigned lcore, const uint32_t reg, const uint64_t value)
+{
+   int fd = -1;
+   int ret = -1;
+
+   fd = __msr_open_file(lcore, O_WRONLY);
+
+   if (fd >= 0) {
+   ssize_t write_ret = 0;
+
+   write_ret = pwrite(fd, &value, sizeof(value), (off_t)reg);
+   if (write_ret != sizeof(value)) {
+   RTE_LOG(ERR, PQOS, "WRMSR failed for reg[0x%x] <- 
value[0x%llx] on "
+   "lcore %u\n", (unsigned)reg, (unsigned 
long long)value, lcore);
+   } else
+   ret = 0;
+
+   close(fd);
+   }
+
+   return ret;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MSR_X86_64_H_ */
-- 
1.9.3

[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging

2016-01-20 Thread Panu Matilainen

On 01/20/2016 08:32 AM, Matthew Hall wrote:
> Hello,
>
> Since the pktgen code is reindented I am finding time to read through it
> and experiment and see if I can get it working.
>
> I have issues with the init process of pktgen. It is difficult to debug
> it because the init code does a lot of very scary stuff to the terminal
> control / TTY device at inconvenient times in an inconvenient order, and
> in the process damages the debug output and damages the screen of your
> GDB without doing weird things to run GDB on a different TTY.
>
> Of course I am willing to contribute patches and not just complain, but
> first I need some help to follow what is going on.
>
> Here is the problematic call-flow with some explanation what went wrong
> trying it on some community machines outside of its original environment:
>
> 1) it calls printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by());
> which dumps tons of weird boilerplate of licenses, copyrights, code
> creator, etc.
>
> It is open source and everybody that matters already knows who coded it,
> so is this stuff really that important? This gets in the way when you
> are trying to work on it and I just have to comment it out.
>
> 2) it calls wr_scrn_setw and tinkers with the window size very early in
> the init which can make your terminal weird
>
> 3) it calls rte_eal_init which produces a lot of nice debug output,
> which is fine
>
> 4) it calls pktgen_init_screen, which calls wr_scrn_init, which calls
> wr_scrn_erase which destroys the valuable debug output just created in
> (3) which is a bad thing
>
> 5) it calls wr_print_copyright and dumps more boilerplate I am not sure
> is needed
>
> 6) it logs some helpful messages about the port / descriptor settings
> which is fine
>
> 7) it calls the pktgen_config_ports function which can crash in ways you
> need the destroyed debug output to fix.
>
> For example in my case that function crashes here:
>
>  if (pktgen.nb_ports == 0)
>  pktgen_log_panic("*** Did not find any ports to use ***");
>
> 8) Later it makes a logo and a splash screen (wr_log, wr_splash_screen).
> Is this stuff really needed? This is a ton of output for just starting
> up some test program.
>
> To fix this debug problem I propose some changes which I am happy to
> help develop:
>
> 1) decide what of this output we really need here and greatly simplify
> how much gets printed out
>
> 2) move wr_scrn_setw right before pktgen_init_screen and after
> rte_eal_init to prevent damaging that output
>
> 3) consider how wr_scrn_init is called in pktgen_init_screen, because it
> calls wr_scrn_erase which damages output
>
> 4) I think that pktgen_config_ports should be called before all this
> weird screen init stuff, so that if it fails you can actually see what
> happened there.
>
> One other random topic... on the long lines of code it looks like there
> are some gigantic tab-indents pushing things off to the right still. One
> example, maybe there are others or another setting which is needed to
> fix all of these:
>
>  info->seq_pkt = (pkt_seq_t *)rte_zmalloc_socket(buff,
> (sizeof(pkt_seq_t) * NUM_TOTAL_PKTS),
>
>   RTE_CACHE_LINE_SIZE,
> rte_socket_id());
>
> Thoughts?

Just that I'm in violent agreement about the splash screens and all.
Unfortunately the license explicitly forbids removal of the copyright 
messages (http://dpdk.org/browse/apps/pktgen-dpdk/tree/LICENSE#n18):

--
# 4) The screens displayed by the application must contain the copyright 
notice as defined
# above and can not be removed without specific prior written permission.
--

Keith, any chance you could work out the details with Wind River to get 
the ridiculous startup messages straightened out? I don't think anybody 
would mind a line or two "copyright by..." kind of printf() in there if 
that's what it takes, but the current screen after screen after screen 
of copyrights and advertisements is obnoxious to the point of driving 
potential users away.

- Panu -

> Matthew Hall

[dpdk-dev] [PATCH v6 2/2] eal/linux: Add support for handling built-in kernel modules

2016-01-20 Thread krytarow...@caviumnetworks.com

From: Kamil Rytarowski 

Currently rte_eal_check_module() detects Linux kernel modules by reading
/proc/modules. Built-in ones aren't listed there and therefore they are not
found by this function.

Add support for checking built-in modules by parsing the sysfs files.

This commit obsoletes the /proc/modules parsing approach.

Signed-off-by: Kamil Rytarowski 
Acked-by: David Marchand 
Acked-by: Yuanhan Liu 
---
 lib/librte_eal/linuxapp/eal/eal.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 635ec36..21a4a32 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -901,27 +901,33 @@ int rte_eal_has_hugepages(void)
 int
 rte_eal_check_module(const char *module_name)
 {
-        char mod_name[30]; /* Any module names can be longer than 30 bytes? */
-        int ret = 0;
+        char sysfs_mod_name[PATH_MAX];
+        struct stat st;
         int n;

         if (NULL == module_name)
                 return -1;

-        FILE *fd = fopen("/proc/modules", "r");
-        if (NULL == fd) {
-                RTE_LOG(ERR, EAL, "Open /proc/modules failed!"
-                        " error %i (%s)\n", errno, strerror(errno));
+        /* Check if there is sysfs mounted */
+        if (stat("/sys/module", &st) != 0) {
+                RTE_LOG(DEBUG, EAL, "sysfs is not mounted! error %i (%s)\n",
+                        errno, strerror(errno));
                 return -1;
         }
-        while (!feof(fd)) {
-                n = fscanf(fd, "%29s %*[^\n]", mod_name);
-                if ((n == 1) && !strcmp(mod_name, module_name)) {
-                        ret = 1;
-                        break;
-                }
+
+        /* A module might be built-in, therefore try sysfs */
+        n = snprintf(sysfs_mod_name, PATH_MAX, "/sys/module/%s", module_name);
+        if (n < 0 || n > PATH_MAX) {
+                RTE_LOG(DEBUG, EAL, "Could not format module path\n");
+                return -1;
         }
-        fclose(fd);

-        return ret;
+        if (stat(sysfs_mod_name, &st) != 0) {
+                RTE_LOG(DEBUG, EAL, "Module %s not found! error %i (%s)\n",
+                        sysfs_mod_name, errno, strerror(errno));
+                return 0;
+        }
+
+        /* Module has been found */
+        return 1;
 }
-- 
1.9.1

[dpdk-dev] [PATCH v6 1/2] tools: Add support for handling built-in kernel modules

2016-01-20 Thread krytarow...@caviumnetworks.com

From: Kamil Rytarowski 

Currently dpdk_nic_bind.py detects Linux kernel modules by reading
/proc/modules. Built-in modules are not listed there, so the script does
not find them.

Add support for checking built-in modules by parsing the sysfs files.

This commit obsoletes the /proc/modules parsing approach.

Signed-off-by: Kamil Rytarowski 
Acked-by: David Marchand 
Acked-by: Yuanhan Liu 
---
 tools/dpdk_nic_bind.py | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/tools/dpdk_nic_bind.py b/tools/dpdk_nic_bind.py
index f02454e..1d16d9f 100755
--- a/tools/dpdk_nic_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -156,22 +156,32 @@ def check_modules():
     '''Checks that igb_uio is loaded'''
     global dpdk_drivers

-    fd = file("/proc/modules")
-    loaded_mods = fd.readlines()
-    fd.close()
-
     # list of supported modules
     mods =  [{"Name" : driver, "Found" : False} for driver in dpdk_drivers]

     # first check if module is loaded
-    for line in loaded_mods:
+    try:
+        # Get list of sysfs modules, some of them might be builtin and merge with mods
+        sysfs_path = '/sys/module/'
+
+        # Get the list of directories in sysfs_path
+        sysfs_mods = [os.path.join(sysfs_path, o) for o
+                      in os.listdir(sysfs_path)
+                      if os.path.isdir(os.path.join(sysfs_path, o))]
+
+        # Extract the last element of '/sys/module/abc' in the array
+        sysfs_mods = [a.split('/')[-1] for a in sysfs_mods]
+
+        # special case for vfio_pci (module is named vfio-pci,
+        # but its .ko is named vfio_pci)
+        sysfs_mods = map(lambda a:
+                         a if a != 'vfio_pci' else 'vfio-pci', sysfs_mods)
+
         for mod in mods:
-            if line.startswith(mod["Name"]):
-                mod["Found"] = True
-            # special case for vfio_pci (module is named vfio-pci,
-            # but its .ko is named vfio_pci)
-            elif line.replace("_", "-").startswith(mod["Name"]):
+            if mod["Found"] == False and (mod["Name"] in sysfs_mods):
                 mod["Found"] = True
+    except:
+        pass

     # check if we have at least one loaded module
     if True not in [mod["Found"] for mod in mods] and b_flag is not None:
-- 
1.9.1
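
As an illustration only (not part of the patch), the sysfs lookup described above can be sketched as a standalone Python 3 function. The function name find_loaded_modules and the sysfs_path parameter are hypothetical; the parameter exists so the sketch can be run against a scratch directory instead of the real /sys/module. It keeps the same vfio_pci to vfio-pci special case as the patch.

```python
import os
import tempfile


def find_loaded_modules(wanted, sysfs_path="/sys/module"):
    """Return the names from `wanted` that have a directory under sysfs_path."""
    try:
        present = {name for name in os.listdir(sysfs_path)
                   if os.path.isdir(os.path.join(sysfs_path, name))}
    except OSError:
        # sysfs missing or unreadable: nothing can be reported as found
        return set()
    # Same special case as the patch: the directory is vfio_pci,
    # while the driver is referred to as vfio-pci.
    present = {"vfio-pci" if name == "vfio_pci" else name for name in present}
    return {name for name in wanted if name in present}


if __name__ == "__main__":
    # Run against a throwaway tree rather than the real /sys/module.
    with tempfile.TemporaryDirectory() as root:
        for name in ("vfio_pci", "uio_pci_generic"):
            os.mkdir(os.path.join(root, name))
        wanted = ["igb_uio", "vfio-pci", "uio_pci_generic"]
        print(sorted(find_loaded_modules(wanted, root)))
```

Built-in modules show up under /sys/module just like loadable ones, which is why a directory test is enough here and /proc/modules is not.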

[dpdk-dev] [PATCH] examples/vhost: fix out of sequence packets

2016-01-20 Thread Yuanhan Liu

On Wed, Jan 20, 2016 at 03:18:11AM +0800, Jianfeng Tan wrote:
> Issue description: when packets go through vhost example to virtio
> device and come back to another virtio device or physical NIC, the
> sequence of packets will be changed.
> 
> Reported-by: Thomas Long 
> Signed-off-by: Jianfeng Tan 

Acked-by: Yuanhan Liu 

--yliu

[dpdk-dev] [PATCH v2 10/10] pci: place all uio pci device ids in a dedicated section

2016-01-20 Thread Neil Horman

On Tue, Jan 19, 2016 at 01:35:14PM -0800, Stephen Hemminger wrote:
> On Tue, 19 Jan 2016 15:56:14 -0500
> Neil Horman  wrote:
> 
> > On Tue, Jan 19, 2016 at 08:10:19AM -0800, Stephen Hemminger wrote:
> > > On Tue, 19 Jan 2016 09:29:31 -0500
> > > Neil Horman  wrote:
> > > 
> > > > On Tue, Jan 19, 2016 at 08:30:40AM +0100, Thomas Monjalon wrote:
> > > > > 2016-01-18 13:30, David Marchand:
> > > > > > We could do something à la modinfo, but let's keep it simple for now.
> > > > > > 
> > > > > > With this, you can extract the devices that need to be bound to uio 
> > > > > > / vfio
> > > > > > with tools like objdump :
> > > > > > 
> > > > > > $ objdump -j rte_pci_id_uio -s build/lib/librte_pmd_fm10k.so
> > > > > > 
> > > > > > Contents of section rte_pci_id_uio:
> > > > > >  15760 8680a415  8680d015   
> > > > > >  15770 8680a515     
> > > > > 
> > > > > Yes we need a modinfo-like tool.
> > > > > Currently, the UIO/VFIO binding can be done after parsing the PCI 
> > > > > device list.
> > > > > It is better to define the device ids locally to their drivers but it 
> > > > > must
> > > > > be integrated with an appropriate parsing tool at the same time.
> > > > > And more importantly than any tool, the format of these ELF data must 
> > > > > be
> > > > > properly defined, documented and extensible.
> > > > > 
> > > > > Has anyone experimented with such a format definition?
> > > > > Stephen, you were asking for this change, what is your opinion?
> > > > > I remember that Neil was also interested in this change:
> > > > >   http://dpdk.org/ml/archives/dev/2015-January/012115.html
> > > > > Panu, Christian, this change could be related to distribution 
> > > > > packaging.
> > > > > Thanks for helping to move this change forward.
> > > > 
> > > > Yes, I would be interested in seeing this.  Is the ask here that someone do it?
> > > > As I recall from the last thread that you reference, I thought David M was
> > > > interested in writing it and soliciting for ideas.  If that's no longer the case,
> > > > I can take a stab at writing it.
> > > > 
> > > > Neil
> > > > 
> > > 
> > > If these are libraries is there a way to have a real entry point
> > > to dump PCI id's. 
> > > 
> > Sure, you could write a method that could be dlsym-ed easily enough to fetch an
> > array of pci ids, or just print stuff to the console.  Not sure that's the best way,
> > but definitely an option
> > Neil
> 
> It is just that reading data with objdump is a kludge likely to get broken.
> 
Not suggesting that we rely on objdump in perpetuity, only that we export the
data, rather than a method to access it, so that it can be reached via libelf.
Using a function to return the information has implicit issues at the moment
(specifically, if you dlopen a dpdk driver, its constructor will attempt to
register it with the core libraries).  While that's not catastrophic, it means
more stuff than you expect gets loaded, which might have weird side effects.
Adding a separate section that you could reach via libelf would be nice, I think.

Neil
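
To make the "export the data" option concrete, here is a hedged Python sketch of what a small reader for such a section could do once the raw bytes have been extracted (by libelf, objdump or any other means). The record layout is an assumption, not something defined in this thread: four little-endian uint16 fields (vendor, device, subsystem vendor, subsystem device) with a zero vendor marking the end. The names PCI_ID_RECORD and decode_pci_ids are hypothetical, and the sample bytes only reuse the 8680a415 and 8680d015 words visible in the objdump output quoted above.

```python
import struct

# Assumed record layout: vendor, device, subsystem vendor, subsystem device,
# each a little-endian uint16. The real section format is not defined here.
PCI_ID_RECORD = struct.Struct("<HHHH")


def decode_pci_ids(section_bytes):
    """Decode a raw section dump into a list of (vendor_id, device_id) tuples."""
    # Ignore any trailing partial record so iter_unpack accepts the buffer.
    usable = len(section_bytes) - len(section_bytes) % PCI_ID_RECORD.size
    ids = []
    for vendor, device, _sub_vendor, _sub_device in PCI_ID_RECORD.iter_unpack(
            section_bytes[:usable]):
        if vendor == 0:  # assumed zero-filled terminator entry
            break
        ids.append((vendor, device))
    return ids


if __name__ == "__main__":
    # Hypothetical sample: two records followed by a zero terminator.
    raw = bytes.fromhex("8680a415ffffffff" "8680d015ffffffff" "0000000000000000")
    for vendor, device in decode_pci_ids(raw):
        print("%04x:%04x" % (vendor, device))
```

A reader like this never loads the driver, so no constructor runs, which is the side effect the data-section approach is meant to avoid. It does depend on the section format being documented and stable, as asked for earlier in the thread.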

[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats

2016-01-20 Thread Stephen Hemminger

On Wed, 20 Jan 2016 16:13:34 +0100
Thomas Monjalon  wrote:

> > We already have the rte_eth_link_get function. Why not let users
> > continue to use that? It's well defined, it is simple, and it is
> > consistent.  

+1

APIs should not duplicate results (DRY).

That said, it would be useful to have some way to get statistics
on the number of link transitions and time since last change.
Ideally this would be part of rte_eth_link_get(), but that wouldn't
be ABI compatible.

[dpdk-dev] [PATCH] ip_pipeline: fix cpu socket-id error

2016-01-20 Thread Stephen Hemminger

On Wed, 20 Jan 2016 11:01:17 +
Jasvinder Singh  wrote:

> +static inline int
> +app_get_cpu_socket_id(uint32_t pmd_id)
> +{
> + int status = rte_eth_dev_socket_id(pmd_id);
> +
> + if (status == -1)
> + return 0;
> +
> + return status;
> +

Why not:
return (status != SOCKET_ID_ANY) ? status : 0;

[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats

2016-01-20 Thread Kyle Larose

Hi Harry,

On Wed, Jan 20, 2016 at 9:45 AM, Van Haaren, Harry
 wrote:
> Hi Kyle,

>
> In theory we could create a new API for this, but I think the current xstats 
> API is a good fit for exposing this info, so why create extra APIs? As a 
> client of the DPDK API, I would prefer more statistics in a single API than 
> have to research and implement two or more APIs to retrieve the information 
> to monitor.
>

You create new APIs for many reasons: modularity, simplicity within
the API, consistency, etc. My main concern with this proposed change
relates to consistency. Previously, each stat had similar semantics.
It was a number, representing the number of times something had
occurred on a chip. This fact allows you to perform operations like
addition, subtraction, etc. and expect that the result will be
meaningful for every value in the array.

For example, suppose I wrote a tool to give the "rate" for each of the
stats. We could sample these stats periodically, then output the
difference between the two samples divided by the time between samples
for each stat. A naive implementation, but quite simple.

However, if we start adding values like link speed and state, which
are not really numerical, or not monotonic, you can no longer apply
the same mathematical operations on them and expect them to be
meaningful. For example, suppose a link went down. The "rate" for that
stat would be -1. Does that really make sense? Anyone using this API
would need to explicitly filter out the non-stats, or risk nonsensical
output.
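
The naive rate tool described above can be sketched in a few lines of Python. This is an illustration only: the function name stat_rates and the stat names and values in the demo are hypothetical, not actual xstats output.

```python
def stat_rates(previous, current, interval_s):
    """Per-second rate of change for every stat present in both samples."""
    return {name: (current[name] - previous[name]) / interval_s
            for name in current if name in previous}


if __name__ == "__main__":
    # Two hypothetical samples taken one second apart; the link goes down
    # between them.
    before = {"rx_packets": 1000, "link_status": 1, "link_speed": 10000}
    after = {"rx_packets": 4000, "link_status": 0, "link_speed": 0}
    for name, rate in sorted(stat_rates(before, after, 1.0).items()):
        print(name, rate)
```

The packet counter yields a meaningful rate, while link_status and link_speed yield negative "rates" that mean nothing, so a caller would have to know which names to filter out.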

Let's also consider how to interpret the value. When I look at a stat,
there's usually one of two meanings: it's either a number of packets,
or it's a number of bytes. We're now adding exceptions to that rule.
Link state is a boolean. Link speed is a value in Mbps. Duplex is
pretty much an enum.

We already have the rte_eth_link_get function. Why not let users
continue to use that? It's well defined, it is simple, and it is
consistent.

Thanks,

Kyle

[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats

2016-01-20 Thread Kyle Larose

On Wed, Jan 20, 2016 at 9:28 AM, Harry van Haaren
 wrote:
> This patch exposes link duplex, speed, and status via the
> existing xstats API.
>

I'm slightly confused by this. Why are we exposing operational
properties of the chip through an API which I thought was primarily
targeting statistics? When I think of statistics and a NIC, I think of
values which are monotonically increasing. I think of values that are
derived primarily from the packets flowing through the system. I do not
think of link state, link speed and duplex, which have nothing to do
with packets, and are not monotonic.

Should we not have a separate API to get this type of information? I
mean, just because we have a generic "string to uint64_t" map doesn't
mean we should toss in anything that can fit into a uint64_t. Would you
want to see the MAC address in here as well? If we put in link
speed/etc. it seems like we may as well!

Thanks,

Kyle

[dpdk-dev] Missing Outstanding Patches (By Me) In Patchwork

2016-01-20 Thread Andriy Berestovskyy

Hi Matthew,
I hope that is what you are looking for:
http://dpdk.org/dev/patchwork/project/dpdk/list/?submitter=37&state=*&archive=both

You just click on Filters and there are a few options...

Andriy


On Wed, Jan 20, 2016 at 6:20 AM, Matthew Hall  wrote:
> I have some outstanding minor patches which do not appear in Patchwork
> anywhere I can see but the interface is also pretty confusing.
>
> Is there a way to find all patches by a person throughout time so I can see
> what happened to them and check why they are not listed and also not merged
> (that I am aware of anyway)?
>
> Sincerely,
> Matthew.



-- 
Andriy Berestovskyy

[dpdk-dev] [PATCH] i40e: fix vlan filtering

2016-01-20 Thread Zhang, Helin



> -Original Message-
> From: Julien Meunier [mailto:julien.meunier at 6wind.com]
> Sent: Tuesday, January 19, 2016 1:19 AM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: [PATCH] i40e: fix vlan filtering
> 
> VLAN filtering was always performed, even if hw_vlan_filter was disabled.
> During device initialization, default filter RTE_MACVLAN_PERFECT_MATCH
> was applied. In this situation, all incoming VLAN frames were dropped by the
> card (increase of the register RUPP - Rx Unsupported Protocol).
> 
> In order to restore default behavior, if HW VLAN filtering is activated, set a
> filter to match MAC and VLAN. If not, set a filter to only match MAC.
> 
> Signed-off-by: Julien Meunier 
> Signed-off-by: David Marchand 
> ---
>  drivers/net/i40e/i40e_ethdev.c | 39
> ++-
>  drivers/net/i40e/i40e_ethdev.h |  1 +
>  2 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index bf6220d..ef9d578 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -2332,6 +2332,13 @@ i40e_vlan_offload_set(struct rte_eth_dev *dev,
> int mask)
>   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data-
> >dev_private);
>   struct i40e_vsi *vsi = pf->main_vsi;
> 
> + if (mask & ETH_VLAN_FILTER_MASK) {
> + if (dev->data->dev_conf.rxmode.hw_vlan_filter)
> + i40e_vsi_config_vlan_filter(vsi, TRUE);
> + else
> + i40e_vsi_config_vlan_filter(vsi, FALSE);
> + }
> +
>   if (mask & ETH_VLAN_STRIP_MASK) {
>   /* Enable or disable VLAN stripping */
>   if (dev->data->dev_conf.rxmode.hw_vlan_strip)
> @@ -4156,6 +4163,34 @@ fail_mem:
>   return NULL;
>  }
> 
> +/* Configure vlan filter on or off */
> +int
> +i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on) {
> + struct i40e_hw *hw = I40E_VSI_TO_HW(vsi);
> + struct i40e_mac_filter_info filter;
> + int ret;
> +
> + rte_memcpy(&filter.mac_addr,
> +(struct ether_addr *)(hw->mac.perm_addr),
> ETH_ADDR_LEN);
> + ret = i40e_vsi_delete_mac(vsi, &filter.mac_addr);
> +
> + if (on) {
> + /* Filter to match MAC and VLAN */
> + filter.filter_type = RTE_MACVLAN_PERFECT_MATCH;
> + } else {
> + /* Filter to match only MAC */
> + filter.filter_type = RTE_MAC_PERFECT_MATCH;
> + }
> +
> + ret |= i40e_vsi_add_mac(vsi, &filter);
What happens if multiple MAC addresses have been configured?
I think that case is not handled by these code changes, right?

Regards,
Helin

> +
> + if (ret)
> + PMD_DRV_LOG(INFO, "Update VSI failed to %s vlan filter",
> + on ? "enable" : "disable");
> + return ret;
> +}
> +
>  /* Configure vlan stripping on or off */  int
> i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on) @@ -4203,9
> +4238,11 @@ i40e_dev_init_vlan(struct rte_eth_dev *dev)  {
>   struct rte_eth_dev_data *data = dev->data;
>   int ret;
> + int mask = 0;
> 
>   /* Apply vlan offload setting */
> - i40e_vlan_offload_set(dev, ETH_VLAN_STRIP_MASK);
> + mask = ETH_VLAN_STRIP_MASK | ETH_VLAN_FILTER_MASK;
> + i40e_vlan_offload_set(dev, mask);
> 
>   /* Apply double-vlan setting, not implemented yet */
> 
> diff --git a/drivers/net/i40e/i40e_ethdev.h
> b/drivers/net/i40e/i40e_ethdev.h index 1f9792b..5505d72 100644
> --- a/drivers/net/i40e/i40e_ethdev.h
> +++ b/drivers/net/i40e/i40e_ethdev.h
> @@ -551,6 +551,7 @@ void i40e_vsi_queues_unbind_intr(struct i40e_vsi
> *vsi);  int i40e_vsi_vlan_pvid_set(struct i40e_vsi *vsi,
>  struct i40e_vsi_vlan_pvid_info *info);  int
> i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on);
> +int i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on);
>  uint64_t i40e_config_hena(uint64_t flags);  uint64_t
> i40e_parse_hena(uint64_t flags);  enum i40e_status_code
> i40e_fdir_setup_tx_resources(struct i40e_pf *pf);
> --
> 2.1.4

[dpdk-dev] [PATCH] i40e: fix vlan filtering

2016-01-20 Thread Pei, Yulong

It works as expected, thanks.

Tested-by: Yulong.pei at intel.com

Best Regards
Yulong Pei

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Julien Meunier
Sent: Tuesday, January 19, 2016 1:19 AM
To: Zhang, Helin 
Cc: dev at dpdk.org
Subject: [dpdk-dev] [PATCH] i40e: fix vlan filtering

VLAN filtering was always performed, even if hw_vlan_filter was disabled. 
During device initialization, default filter RTE_MACVLAN_PERFECT_MATCH was 
applied. In this situation, all incoming VLAN frames were dropped by the card 
(increase of the register RUPP - Rx Unsupported Protocol).

In order to restore default behavior, if HW VLAN filtering is activated, set a 
filter to match MAC and VLAN. If not, set a filter to only match MAC.

Signed-off-by: Julien Meunier 
Signed-off-by: David Marchand 
---
 drivers/net/i40e/i40e_ethdev.c | 39 ++-
 drivers/net/i40e/i40e_ethdev.h |  1 +
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c 
index bf6220d..ef9d578 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -2332,6 +2332,13 @@ i40e_vlan_offload_set(struct rte_eth_dev *dev, int mask)
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
struct i40e_vsi *vsi = pf->main_vsi;

+   if (mask & ETH_VLAN_FILTER_MASK) {
+   if (dev->data->dev_conf.rxmode.hw_vlan_filter)
+   i40e_vsi_config_vlan_filter(vsi, TRUE);
+   else
+   i40e_vsi_config_vlan_filter(vsi, FALSE);
+   }
+
if (mask & ETH_VLAN_STRIP_MASK) {
/* Enable or disable VLAN stripping */
if (dev->data->dev_conf.rxmode.hw_vlan_strip)
@@ -4156,6 +4163,34 @@ fail_mem:
return NULL;
 }

+/* Configure vlan filter on or off */
+int
+i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on) {
+   struct i40e_hw *hw = I40E_VSI_TO_HW(vsi);
+   struct i40e_mac_filter_info filter;
+   int ret;
+
+   rte_memcpy(&filter.mac_addr,
+  (struct ether_addr *)(hw->mac.perm_addr), ETH_ADDR_LEN);
+   ret = i40e_vsi_delete_mac(vsi, &filter.mac_addr);
+
+   if (on) {
+   /* Filter to match MAC and VLAN */
+   filter.filter_type = RTE_MACVLAN_PERFECT_MATCH;
+   } else {
+   /* Filter to match only MAC */
+   filter.filter_type = RTE_MAC_PERFECT_MATCH;
+   }
+
+   ret |= i40e_vsi_add_mac(vsi, &filter);
+
+   if (ret)
+   PMD_DRV_LOG(INFO, "Update VSI failed to %s vlan filter",
+   on ? "enable" : "disable");
+   return ret;
+}
+
 /* Configure vlan stripping on or off */
 int
 i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on)
@@ -4203,9 +4238,11 @@ i40e_dev_init_vlan(struct rte_eth_dev *dev)
 {
struct rte_eth_dev_data *data = dev->data;
int ret;
+   int mask = 0;

/* Apply vlan offload setting */
-   i40e_vlan_offload_set(dev, ETH_VLAN_STRIP_MASK);
+   mask = ETH_VLAN_STRIP_MASK | ETH_VLAN_FILTER_MASK;
+   i40e_vlan_offload_set(dev, mask);

/* Apply double-vlan setting, not implemented yet */

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h 
index 1f9792b..5505d72 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -551,6 +551,7 @@ void i40e_vsi_queues_unbind_intr(struct i40e_vsi *vsi);
 int i40e_vsi_vlan_pvid_set(struct i40e_vsi *vsi,
    struct i40e_vsi_vlan_pvid_info *info);
 int i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on);
+int i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on);
 uint64_t i40e_config_hena(uint64_t flags);
 uint64_t i40e_parse_hena(uint64_t flags);
 enum i40e_status_code i40e_fdir_setup_tx_resources(struct i40e_pf *pf);
--
2.1.4

[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-20 Thread Xie, Huawei

On 1/12/2016 1:37 PM, Tetsuya Mukawa wrote:
> Hi Jianfeng and Xie,
>
> I guess my implementation and yours have a lot of common code, so I will
> try to rebase my patch on yours.
>
> BTW, one thing I need to change in your memory allocation is that the
> mmaped address should be below 44 bits (32 + PAGE_SHIFT) to work with my patch.
> This is because the VIRTIO_PCI_QUEUE_PFN register only accepts such addresses.
> (I may need to add one more EAL parameter like "--mmap-under ")

I believe it is OK to mmap under 44bit, but better check the user space
address space layout.

>
> Thanks,
> Tetsuya
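
For readers following the arithmetic, the 44-bit limit can be sketched as below. This is only an illustration under stated assumptions: PAGE_SHIFT is taken as 12 (4 KiB pages), the register is treated as holding a 32-bit page frame number as described above, and the names fits_queue_pfn and to_pfn are hypothetical.

```python
PAGE_SHIFT = 12          # assumption: 4 KiB pages
PFN_REGISTER_BITS = 32   # the queue PFN register is treated as 32 bits wide

# Highest address (exclusive) whose page frame number fits the register: 2**44.
ADDR_LIMIT = 1 << (PFN_REGISTER_BITS + PAGE_SHIFT)


def fits_queue_pfn(addr, length):
    """True if [addr, addr + length) is page aligned and below the 44-bit limit."""
    page_aligned = addr % (1 << PAGE_SHIFT) == 0
    return page_aligned and addr + length <= ADDR_LIMIT


def to_pfn(addr):
    """Page frame number that would be written to the register."""
    return addr >> PAGE_SHIFT


if __name__ == "__main__":
    for addr in (0x100000000, 0x7f0000000000):
        print(hex(addr), fits_queue_pfn(addr, 4096))
```

An address such as 0x7f0000000000, in the range where mmap often places mappings on 64-bit Linux, is above the limit, which is why checking the user space address layout matters here.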

[dpdk-dev] [PATCH] vhost: remove lockless enqueue to the virtio ring

2016-01-20 Thread Xie, Huawei

On 1/20/2016 2:33 AM, Polehn, Mike A wrote:
> SMP operations can be very expensive, sometimes can impact operations by 100s 
> to 1000s of clock cycles depending on what is the circumstances of the 
> synchronization. It is how you arrange the SMP operations within the tasks at 
> hand across the SMP cores that gives methods for top performance.  Using 
> traditional general purpose SMP methods will result in traditional general 
> purpose performance. Migrating to general libraries (understood by most 
> general purpose programmers) from expert abilities (understood by much 
> smaller group of expert programmers focused on performance) will greatly 
> reduce the value of DPDK since the end result will be lower performance 
> and/or have less predictable operation where rate performance, 
> predictability, and low latency are the primary goals.
>
> The best method to date for having multiple outputs to a single port is to 
> use a DPDK queue with multiple producers and a single consumer, so that one SMP 
> operation lets multiple sources feed a single non-SMP task that outputs to the 
> port (that is why the ports are not SMP protected). Also, when considerable 
> contention from multiple sources occurs often (data feeding at the same time), 
> having a DPDK queue with input and output variables in separate cache lines 
> can give a notable throughput improvement.
>
> Mike 

Mike:
Thanks for the detailed explanation. Do you have any comments on this patch?

>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xie, Huawei
> Sent: Tuesday, January 19, 2016 8:44 AM
> To: Tan, Jianfeng; dev at dpdk.org
> Cc: ann.zhuangyanying at huawei.com
> Subject: Re: [dpdk-dev] [PATCH] vhost: remove lockless enqueue to the virtio 
> ring
>
> On 1/20/2016 12:25 AM, Tan, Jianfeng wrote:
>> Hi Huawei,
>>
>> On 1/4/2016 10:46 PM, Huawei Xie wrote:
>>> This patch removes the internal lockless enqueue implementation.
>>> DPDK doesn't support receiving/transmitting packets from/to the same 
>>> queue. Vhost PMD wraps vhost device as normal DPDK port. DPDK 
>>> applications normally have their own lock implementation when enqueuing 
>>> packets to the same queue of a port.
>>>
>>> The atomic cmpset is a costly operation. This patch should help 
>>> performance a bit.
>>>
>>> Signed-off-by: Huawei Xie 
>>> ---
>>>   lib/librte_vhost/vhost_rxtx.c | 86
>>> +--
>>>   1 file changed, 25 insertions(+), 61 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c 
>>> b/lib/librte_vhost/vhost_rxtx.c index bbf3fac..26a1b9c 100644
>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>> I think vhost example will not work well with this patch when
>> vm2vm=software.
>>
>> Test case:
>> Two virtio ports handled by two pmd threads. Thread 0 polls pkts from
>> physical NIC and sends to virtio0, while thread 1 receives pkts from
>> virtio1 and routes them to virtio0.
> A vhost port will be wrapped as a port by the vhost PMD. A DPDK APP treats all
> physical and virtual ports as ports equally. When two DPDK threads try
> to enqueue to the same port, the APP needs to consider the contention.
> None of the physical PMDs support concurrent enqueuing/dequeuing.
> The vhost PMD should expose the same behavior, and we should expose a
> difference between PMDs only when absolutely necessary.
>
>>> -
>>>   *(volatile uint16_t *)&vq->used->idx += entry_success;
>> Another unrelated question: have we ever tried to move this assignment out
>> of the loop to save cost, since it causes data contention?
> This operation itself is not that costly, but it has a side effect on the
> cache transfer.
> It is outside of the loop for the non-mergeable case. For the mergeable case,
> it is inside the loop.
> Actually there are pros and cons to doing this in a burst or in smaller
> steps. I prefer to move it outside of the loop. Let us address this later.
>
>> Thanks,
>> Jianfeng
>>
>>
>

[dpdk-dev] [PATCH] examples/vhost: fix out of sequence packets

2016-01-20 Thread Jianfeng Tan

Issue description: when packets go through vhost example to virtio
device and come back to another virtio device or physical NIC, the
sequence of packets will be changed.

Reported-by: Thomas Long 
Signed-off-by: Jianfeng Tan 
---
 examples/vhost/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 2dcdacb..aa9aa5a 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1336,8 +1336,8 @@ switch_worker(__attribute__((unused)) void *arg)

rte_pktmbuf_free(pkts_burst[--tx_count]);
}
}
-   while (tx_count)
-       virtio_tx_route(vdev, pkts_burst[--tx_count], (uint16_t)dev->device_fh);
+   for (i = 0; i < tx_count; ++i)
+       virtio_tx_route(vdev, pkts_burst[i], (uint16_t)dev->device_fh);
}

/*move to the next device in the list*/
-- 
2.1.4
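
Why the old loop reordered packets can be shown with a small Python model of the two loops. This is an illustration only: the function names are hypothetical, and a plain list stands in for pkts_burst, with "routing" reduced to recording the order in which entries are handed on.

```python
def route_from_tail(burst):
    """Model of the old loop: while (tx_count) route(pkts_burst[--tx_count])."""
    sent = []
    tx_count = len(burst)
    while tx_count:
        tx_count -= 1
        sent.append(burst[tx_count])
    return sent


def route_in_order(burst):
    """Model of the new loop: for (i = 0; i < tx_count; ++i) route(pkts_burst[i])."""
    sent = []
    for i in range(len(burst)):
        sent.append(burst[i])
    return sent


if __name__ == "__main__":
    burst = ["pkt0", "pkt1", "pkt2"]
    print(route_from_tail(burst))
    print(route_in_order(burst))
```

Draining from the tail hands each burst on in reverse, so packets within a burst leave out of sequence. Indexing from zero keeps the arrival order.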

[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMUi driver mode

2016-01-20 Thread Santosh Shukla

Adding RTE_KDRV_VFIO_NOIOMMU mode in kernel driver. Also including
rte_vfio_is_noiommu() helper function. This function will parse
/sys/bus/pci/device// and make sure that
- vfio noiommu mode set in kernel driver
- pci device attached to vfio-noiommu driver only

If both conditions are satisfied, set drv->kdrv = RTE_KDRV_VFIO_NOIOMMU.

Similar changes were also made in virtio_rd/wr. The changes apply to virtio
spec 0.95 only.

Signed-off-by: Santosh Shukla 
---
v5--> v6:
- Include pci_dev == NULL check in pci_vfio_is_noiommu(), suggested by Anatoly.

v4--> v5:
- Removed virtio_xx_init_by_vfio and added new driver mode.
- Now no need to parse vfio interface in virtio, as the pci_eal module will take
  care of vfio-noiommu driver parsing for virtio or any other future device
  willing to use the vfio-noiommu driver.

 drivers/net/virtio/virtio_pci.c|   12 ++---
 lib/librte_eal/common/include/rte_pci.h|1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c  |   13 +++--
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |   72 
 5 files changed, 90 insertions(+), 9 deletions(-)

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 0c29f1d..537c552 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -60,7 +60,7 @@ virtio_read_reg_1(struct virtio_hw *hw, uint64_t reg_offset)
struct rte_pci_device *dev;

dev = hw->dev;
-   if (dev->kdrv == RTE_KDRV_VFIO)
+   if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU)
ioport_inb(dev, reg_offset, &ret);
else
ret = inb(VIRTIO_PCI_REG_ADDR(hw, reg_offset));
@@ -75,7 +75,7 @@ virtio_read_reg_2(struct virtio_hw *hw, uint64_t reg_offset)
struct rte_pci_device *dev;

dev = hw->dev;
-   if (dev->kdrv == RTE_KDRV_VFIO)
+   if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU)
ioport_inw(dev, reg_offset, &ret);
else
ret = inw(VIRTIO_PCI_REG_ADDR(hw, reg_offset));
@@ -90,7 +90,7 @@ virtio_read_reg_4(struct virtio_hw *hw, uint64_t reg_offset)
struct rte_pci_device *dev;

dev = hw->dev;
-   if (dev->kdrv == RTE_KDRV_VFIO)
+   if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU)
ioport_inl(dev, reg_offset, &ret);
else
ret = inl(VIRTIO_PCI_REG_ADDR(hw, reg_offset));
@@ -104,7 +104,7 @@ virtio_write_reg_1(struct virtio_hw *hw, uint64_t reg_offset, uint8_t value)
struct rte_pci_device *dev;

dev = hw->dev;
-   if (dev->kdrv == RTE_KDRV_VFIO)
+   if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU)
ioport_outb_p(dev, reg_offset, value);
else
outb_p((unsigned char)value,
@@ -117,7 +117,7 @@ virtio_write_reg_2(struct virtio_hw *hw, uint64_t reg_offset, uint16_t value)
struct rte_pci_device *dev;

dev = hw->dev;
-   if (dev->kdrv == RTE_KDRV_VFIO)
+   if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU)
ioport_outw_p(dev, reg_offset, value);
else
outw_p((unsigned short)value,
@@ -130,7 +130,7 @@ virtio_write_reg_4(struct virtio_hw *hw, uint64_t reg_offset, uint32_t value)
struct rte_pci_device *dev;

dev = hw->dev;
-   if (dev->kdrv == RTE_KDRV_VFIO)
+   if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU)
ioport_outl_p(dev, reg_offset, value);
else
outl_p((unsigned int)value,
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 0c667ff..2dbc658 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -149,6 +149,7 @@ enum rte_kernel_driver {
RTE_KDRV_VFIO,
RTE_KDRV_UIO_GENERIC,
RTE_KDRV_NIC_UIO,
+   RTE_KDRV_VFIO_NOIOMMU,
RTE_KDRV_NONE,
 };

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index eb503f0..2936497 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -131,6 +131,7 @@ rte_eal_pci_map_device(struct rte_pci_device *dev)
/* try mapping the NIC resources using VFIO if it exists */
switch (dev->kdrv) {
case RTE_KDRV_VFIO:
+   case RTE_KDRV_VFIO_NOIOMMU:
 #ifdef VFIO_PRESENT
if (pci_vfio_is_enabled())
ret = pci_vfio_map_resource(dev);
@@ -158,6 +159,7 @@ rte_eal_pci_unmap_device(struct rte_pci_device *dev)
/* try unmapping the NIC resources using VFIO if it exists */
switch (dev->kdrv) {
case RTE_KDRV_VFIO:
+   case RTE_KDRV_VFIO_NOIOMMU:
RTE_LOG(ERR, EAL, "Hotplug doesn't support vfio yet\n");
break;
case RTE_KDRV_IGB_UIO:
@@ -353,9 +355,12 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
}

if (!ret) {
-   if (!strcmp(driver, "vfio-pci"))
-

[dpdk-dev] [PATCH] vhost: remove lockless enqueue to the virtio ring

2016-01-20 Thread Tan, Jianfeng

Hi Huawei,

On 1/4/2016 10:46 PM, Huawei Xie wrote:
> This patch removes the internal lockless enqueue implementation.
> DPDK doesn't support receiving/transmitting packets from/to the same
> queue. Vhost PMD wraps vhost device as normal DPDK port. DPDK
> applications normally have their own lock implementation when enqueuing
> packets to the same queue of a port.
>
> The atomic cmpset is a costly operation. This patch should help
> performance a bit.
>
> Signed-off-by: Huawei Xie 
> ---
>   lib/librte_vhost/vhost_rxtx.c | 86 
> +--
>   1 file changed, 25 insertions(+), 61 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index bbf3fac..26a1b9c 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c

I think vhost example will not work well with this patch when 
vm2vm=software.

Test case:
Two virtio ports handled by two pmd threads. Thread 0 polls pkts from 
physical NIC and sends to virtio0, while thread 1 receives pkts from 
virtio1 and routes them to virtio0.

> -
>   *(volatile uint16_t *)&vq->used->idx += entry_success;

Another unrelated question: have we ever tried to move this assignment out 
of the loop to save cost, since it causes data contention?

Thanks,
Jianfeng

[dpdk-dev] [PATCH v5 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMUi driver mode

2016-01-20 Thread Santosh Shukla

On Tue, Jan 19, 2016 at 7:48 PM, Burakov, Anatoly
 wrote:
> Hi Santosh,
>
>> +int
>> +pci_vfio_is_noiommu(struct rte_pci_device *pci_dev) {
>> + FILE *fp;
>> + struct rte_pci_addr *loc;
>> + const char *path =
>> "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode";
>> + char filename[PATH_MAX] = {0};
>> + char buf[PATH_MAX] = {0};
>> +
>> + /*
>> +  * 1. chk vfio-noiommu mode set in kernel driver
>> +  * 2. verify pci device attached to vfio-noiommu driver
>> +  * example:
>> +  * cd /sys/bus/pci/drivers/vfio-pci//iommu_group
>> +  * > cat name
>> +  * > vfio-noiommu --> means virtio_dev attached to vfio-noiommu
>> driver
>> +  */
>> +
>> + fp = fopen(path, "r");
>> + if (fp == NULL) {
>> + RTE_LOG(ERR, EAL, "can't open %s\n", path);
>> + return -1;
>> + }
>> +
>> + if (fread(buf, sizeof(char), 1, fp) != 1) {
>> + RTE_LOG(ERR, EAL, "can't read from file %s\n", path);
>> + fclose(fp);
>> + return -1;
>> + }
>> +
>> + if (strncmp(buf, "Y", 1) != 0) {
>> + RTE_LOG(ERR, EAL, "[%s]: vfio: noiommu mode not set\n",
>> path);
>> + fclose(fp);
>> + return -1;
>> + }
>> +
>> + fclose(fp);
>> +
>> + /* 2. chk whether attached driver is vfio-noiommu or not */
>> + loc = &pci_dev->addr;
>> + snprintf(filename, sizeof(filename),
>> +  SYSFS_PCI_DEVICES "/" PCI_PRI_FMT
>> "/iommu_group/name",
>> +  loc->domain, loc->bus, loc->devid, loc->function);
>> +
>> + /* check for vfio-noiommu */
>> + fp = fopen(filename, "r");
>> + if (fp == NULL) {
>> + RTE_LOG(ERR, EAL, "can't open %s\n", filename);
>> + return -1;
>> + }
>> +
>> + if (fread(buf, sizeof(char), sizeof("vfio-noiommu"), fp) !=
>> +   sizeof("vfio-noiommu")) {
>> + RTE_LOG(ERR, EAL, "can't read from file %s\n", filename);
>> + fclose(fp);
>> + return -1;
>> + }
>> +
>> + if (strncmp(buf, "vfio-noiommu", strlen("vfio-noiommu")) != 0) {
>> + RTE_LOG(ERR, EAL, "not a vfio-noiommu driver\n");
>> + fclose(fp);
>> + return -1;
>> + }
>> +
>> + fclose(fp);
>> +
>> + return 0;
>> +}
>
> Since this is a public non-performance critical API, shouldn't we check if 
> pci_dev is NULL? Otherwise the patch-set seems fine to me as far as VFIO 
> parts are concerned.
>
pci_scan_one() is the only user of this api for now, and it populates pci_dev
before pci_vfio_is_noiommu() uses it, so I didn't think to add a check. But
you are right in case any other module wants to use this api. Sending a
patch now. Thanks.

> Thanks,
> Anatoly
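
The two checks in the quoted function, plus the NULL check requested above, can be sketched in Python as follows. This is an illustration, not the DPDK implementation: the function name is_vfio_noiommu and its path parameters are hypothetical (the defaults mirror the paths named in the patch), and it returns a boolean where the C function returns 0 or -1.

```python
import os
import tempfile


def is_vfio_noiommu(pci_addr,
                    param_path="/sys/module/vfio/parameters/enable_unsafe_noiommu_mode",
                    devices_dir="/sys/bus/pci/devices"):
    """True only if noiommu mode is enabled and the device's IOMMU group
    is named vfio-noiommu."""
    if pci_addr is None:  # counterpart of the pci_dev == NULL check
        return False
    try:
        # 1. Is noiommu mode enabled in the vfio kernel module?
        with open(param_path) as f:
            if not f.read().startswith("Y"):
                return False
        # 2. Is the device attached to a vfio-noiommu group?
        group_name = os.path.join(devices_dir, pci_addr, "iommu_group", "name")
        with open(group_name) as f:
            return f.read().strip() == "vfio-noiommu"
    except OSError:
        return False


if __name__ == "__main__":
    # Build a fake sysfs tree so the sketch runs without VFIO hardware.
    with tempfile.TemporaryDirectory() as root:
        param = os.path.join(root, "enable_unsafe_noiommu_mode")
        with open(param, "w") as f:
            f.write("Y\n")
        group = os.path.join(root, "devices", "0000:00:01.0", "iommu_group")
        os.makedirs(group)
        with open(os.path.join(group, "name"), "w") as f:
            f.write("vfio-noiommu\n")
        print(is_vfio_noiommu("0000:00:01.0", param, os.path.join(root, "devices")))
```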
