[dpdk-dev] How do you setup a VM in Promiscuous Mode using PCI Pass-Through (SR-IOV)?

2015-05-14 Thread Assaad, Sami (Sami)
Hello,

My Hardware consists of the following:
  - DL380 Gen 9 Server supporting two Haswell Processors (Xeon CPU E5-2680 v3 @ 
2.50GHz)
  - An x540 Ethernet Controller Card supporting 2x10G ports.

Software:
  - CentOS 7 (3.10.0-229.1.2.el7.x86_64)
  - DPDK 1.8

I want all the network traffic received on the two 10G ports to be delivered 
to my VM. The issue is that the Virtual Functions / Physical Functions have 
set up the internal virtual switch to forward only Ethernet packets whose 
destination MAC address matches the MAC of the VM's virtual interface. How can 
I configure my virtual environment to deliver all network traffic to the VM, 
i.e. put the virtual functions for both PCI devices into promiscuous mode?

[ If a l2fwd-vf example exists, this would actually solve this problem ... Is 
there a DPDK l2fwd-vf example available? ]


Thanks in advance.

Best Regards,
Sami Assaad.


[dpdk-dev] Technical Steering Committee (TSC)

2015-05-14 Thread O'Driscoll, Tim
At Tuesday's Beyond DPDK 2.0 call, one topic we discussed was decision making 
and whether we need a Technical Steering Committee (TSC). As a follow-up to 
that discussion, I'd like to propose that we create a TSC for DPDK to guide the 
long-term strategic direction of the project.


Justification
-------------

The role of the Maintainer is to be the gate-keeper for the project, and to 
only accept contributions that are properly implemented, properly reviewed, and 
consistent with the agreed project scope/charter. However, it shouldn't be the 
responsibility of the Maintainer to be the final decision maker (after 
community discussion) on issues affecting the strategic direction of the 
project. Instead, this should be determined by a higher level body that's 
representative of the DPDK community.

Having a TSC should help to provide a clear direction/strategy for the project, 
and help to resolve complex issues which don't reach a consensus on the mailing 
list in a timely manner.

There were arguments at the call that a TSC is not required. The alternative 
view though is why would we not put one in place? The TSC could review its own 
progress after 6 months, and if the members don't consider it to be productive, 
then it could be disbanded. I see little effort and zero risk in trying this, 
with the potential gain of a clearer decision making process and a better 
defined project strategy. 


Scope
-----

Issues the TSC should consider include:
- Project scope/charter. What is and isn't within the scope of the project? 
What happens if somebody wants to upstream a new library/capability and it's 
not clear whether it fits within DPDK or not? As a random example, if somebody 
wanted to upstream a DPDK-enabled TCP/IP stack to dpdk.org, should that be 
accepted or rejected?
- Performance vs functionality considerations. If we need to make a change and 
there's an unavoidable performance impact to doing so (maybe something like 
extending the mbuf again), does that change get accepted or not? In most cases 
you can probably work around situations like this by making them optional, but 
that might not always be possible. If it's not, who decides whether performance 
or functionality is more important?
- Replacing existing functionality versus adding new alternatives. An example 
of this might be Cuckoo Hash. Does that replace the existing hash 
implementation, or should it be provided as an alternative? Who decides this? 
That could be more of an operational issue, but it's borderline.
- Competitive Positioning. Monitor the competitive landscape and determine any 
impacts to future DPDK strategy. 
- Unresolved issues. Provide a decision on issues that don't reach a consensus 
on the mailing list in a timely manner.


Composition
-----------

Composition of the TSC should reflect contributions to the project, but be 
balanced so that no single party has an undue influence. It should also be kept 
to a manageable size (maybe 7?).

The TSC should elect its own chair, who would have the deciding vote in the 
event that the TSC was deadlocked. Once in place, the TSC should approve any 
new members.

Specific details on membership can be discussed and agreed later, if we agree 
on the creation of a TSC.


[dpdk-dev] [PATCH v10 1/2] mk:Simplify the ifdefs in the makefile

2015-05-14 Thread Olivier MATZ
Hi Keith,

On 05/14/2015 04:21 PM, Keith Wiles wrote:
> Simplify the ifdefs in rte.app.mk to make the code more
> readable and maintainable by introducing a internal
> _LDLIBS-y variable to build up the LDLIBS variable.
> 
> The new internal variable _LDLIBS-y should not be
> used outside of the rte.app.mk file.
> 
> Signed-off-by: Keith Wiles 

Series
Acked-by: Olivier Matz 

Thanks,
Olivier


[dpdk-dev] Issues with example/vhost with running VM

2015-05-14 Thread Maciej Grochowski
Thank you, Xie, for the reply.

When I ran vhost with -m 2048 it didn't start; with -m 3000 and the host with
1024 it gave me a segfault (I calculated the hugepages and had only 3 free):

VHOST_CONFIG: bind to usvhost
VHOST_CONFIG: new virtio connection is 26
VHOST_CONFIG: new device, handle is 0
VHOST_CONFIG: new virtio connection is 27
VHOST_CONFIG: new device, handle is 1
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:28
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:29
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:30
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:31
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:32
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:28
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
VHOST_CONFIG: mapped region 0 fd:29 to 0x2ac0 sz:0xa off:0x0
VHOST_CONFIG: mapped region 1 fd:33 to 0x sz:0x5000
off:0xc
VHOST_CONFIG: mmap qemu guest failed.
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
Segmentation fault (core dumped)

While testing different configurations I finally arrived at -m 3584:

 ./build/app/vhost-switch -c f -n 4  --huge-dir /mnt/huge --socket-mem 3712
-- -p 0x1 --dev-basename usvhost --vm2vm 1 --stats 9
so I can run vhost-switch with 3584 * 2048k hugepages.
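
The memory budget behind this configuration can be checked with a little
arithmetic. The sketch below assumes that "3584*2048k" means 3584 hugepages of
2048 kB each, and that both guests (started with -m 1024 and -mem-path
/mnt/huge) draw from the same pool as the --socket-mem 3712 reservation. The
function names are made up for illustration.

```c
#include <stdint.h>

/* Total size in MB of `pages` hugepages of `page_kb` kB each. */
static uint64_t hugepage_pool_mb(uint64_t pages, uint64_t page_kb)
{
    return pages * page_kb / 1024;
}

/*
 * MB left in the pool after the vhost-switch reservation (socket_mem_mb)
 * and vm_count guests of vm_mb each. A negative result means the pool
 * is oversubscribed.
 */
static int64_t hugepage_headroom_mb(uint64_t pool_mb, uint64_t socket_mem_mb,
                                    unsigned vm_count, uint64_t vm_mb)
{
    uint64_t used_mb = socket_mem_mb + (uint64_t)vm_count * vm_mb;

    return (int64_t)pool_mb - (int64_t)used_mb;
}
```

With these figures the pool is 7168 MB, and 1408 MB remain after vhost-switch
and two guests. The calculation ignores how the pages are spread across NUMA
nodes.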

With this vhost-user setup I run two KVM machines with the following parameters:

kvm -nographic -boot c -machine pc-i440fx-1.4,accel=kvm -name vm1 -cpu host
-smp 2
-hda /home/ubuntu/qemu/debian_squeeze2_amd64.qcow2 -m 1024 -mem-path
/mnt/huge -mem-prealloc
-chardev socket,id=char1,path=/home/ubuntu/dpdk/examples/vhost/usvhost
-netdev type=vhost-user,id=hostnet1,chardev=char1
-device
virtio-net-pci,netdev=hostnet1,id=net1,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
-chardev socket,id=char2,path=/home/ubuntu/dpdk/examples/vhost/usvhost
-netdev type=vhost-user,id=hostnet2,chardev=char2
-device
virtio-net-pci,netdev=hostnet2,id=net2,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off

With this I got:

...
VHOST_CONFIG: mapped region 0 fd:31 to 0x2aaabae0 sz:0xa off:0x0
VHOST_CONFIG: mapped region 1 fd:37 to 0x2aaabb00 sz:0x1000
off:0xc
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:0 file:38
VHOST_CONFIG: virtio isn't ready for processing.
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:1 file:39
VHOST_CONFIG: virtio is now ready for processing.
VHOST_DATA: (1) Device has been added to data core 2


So everything looks OK. Thank you, Xie.

But I found another issue; maybe it is something trivial. Using the options
--vm2vm 1 (or 2) --stats 9
there seems to be no connectivity between the VMs. I manually set the IPs for
eth0 and eth1:

On VM 1:
ifconfig eth0 192.168.0.100 netmask 255.255.255.0 up
ifconfig eth1 192.168.1.101 netmask 255.255.255.0 up
On VM 2:
ifconfig eth0 192.168.1.200 netmask 255.255.255.0 up
ifconfig eth1 192.168.0.202 netmask 255.255.255.0 up
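
Which of these interfaces can reach each other directly depends on which ones
share a subnet. The following is a minimal sketch of that check using the
POSIX inet_pton() call; the helper name same_subnet is made up for
illustration, and it says nothing about why the vhost counters stay at zero.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Return 1 if both dotted-quad addresses fall in the same subnet under
 * `mask`, 0 if they do not, and -1 if any of the strings fails to parse.
 */
static int same_subnet(const char *a, const char *b, const char *mask)
{
    struct in_addr ia, ib, im;

    if (inet_pton(AF_INET, a, &ia) != 1 ||
        inet_pton(AF_INET, b, &ib) != 1 ||
        inet_pton(AF_INET, mask, &im) != 1)
        return -1;

    /* All three values are in network byte order, so they compare directly. */
    return (ia.s_addr & im.s_addr) == (ib.s_addr & im.s_addr);
}
```

With the addresses above, eth0 on VM 1 shares a /24 with eth1 on VM 2, and
eth1 on VM 1 shares one with eth0 on VM 2; the other combinations are in
different subnets.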

And I can't ping between any of the IPs/interfaces; moreover, the stats show:

Device statistics 
Statistics for device 0 --
TX total:   0
TX dropped: 0
TX successful:  0
RX total:   0
RX dropped: 0
RX successful:  0
Statistics for device 1 --
TX total:   0
TX dropped: 0
TX successful:  0
RX total:   0
RX dropped: 0
RX successful:  0
Statistics for device 2 --
TX total:   0
TX dropped: 0
TX successful:  0
RX total:   0
RX dropped: 0
RX successful:  0
Statistics for device 3 --
TX total:   0
TX dropped: 0
TX successful:  0
RX total:   0
RX dropped: 0
RX successful:  0
==

So it seems that no packet ever left my VMs.
Also, the ARP table is empty on each VM.

Do you have any idea what can be 

[dpdk-dev] Intel fortville not working with multi-segment

2015-05-14 Thread Nissim Nisimov
Hi Helin,

Any news regarding this issue? Do you know if there is a related patch I can 
apply to my application so that it works with multi-segment packets?

Thanks,
Nissim

-----Original Message-----
From: Zhang, Helin [mailto:helin.zh...@intel.com] 
Sent: Tuesday, May 12, 2015 11:51 AM
To: Nissim Nisimov
Cc: 'dev at dpdk.org'
Subject: RE: Intel fortville not working with multi-segment

Hi Nissim

It seems that our validation guys here can reproduce it in our lab. I will 
check it soon and update you later.
Thank you very much for the good finding!

Regards,
Helin

> -Original Message-
> From: Nissim Nisimov [mailto:NissimN at Radware.com]
> Sent: Monday, May 11, 2015 11:44 AM
> To: Zhang, Helin
> Cc: 'dev at dpdk.org'
> Subject: RE: Intel fortville not working with multi-segment
> 
> Hi,
> 
> I am using PF pass-through and it doesn't work even with 2000 bytes of 
> server response page size.
> Looks like the first segment of each session is not received.
> 
> When i am changing the server response size to 1000 bytes, all works 
> as expected.
> 
> I am working with dpdk 1.8 version.
> 
> Any idea why ? Is it related to i40e multi segment support?
> 
> Thx
> Nissim
> 
> On May 11, 2015 5:03 AM, "Zhang, Helin" 
> wrote:
> Hi Nissim
> 
> Are you using PF pass-through or VF pass-through?
> For PF pass-through, you might have already gotten the fix.
> For VF pass-through, there is a bug fix which is needed for supporting 
> jumbo frame and multiple mbuf.
> http://www.dpdk.org/dev/patchwork/patch/4641/
> 
> 
> Regards,
> Helin
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Nissim Nisimov
> > Sent: Monday, May 11, 2015 3:48 AM
> > To: Nissim Nisimov; 'dev at dpdk.org'
> > Subject: Re: [dpdk-dev] Intel fortville not working with 
> > multi-segment
> >
> > Hi,
> >
> > can someone assist regarding this issue?
> >
> > Is it a known limitation in i40e/dpdk (no support for multi-segment)?
> >
> > Thx
> > Nissim
> >
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Nissim Nisimov
> > Sent: Thursday, May 07, 2015 5:44 PM
> > To: 'dev at dpdk.org'
> > Subject: [dpdk-dev] Intel fortville not working with multi-segment
> >
> > Hi,
> >
> >
> >
> > I am trying to work with Intel Fortville (XL710) NICs in Passthrough 
> > mode from a VM running dpdk app.
> >
> >
> > First I didn't have any TX traffic from the VM, I got dpdk patch for 
> > this issue and it fixed it.
> > (http://www.dpdk.org/dev/patchwork/patch/4588/)
> >
> > But now I see that when trying to run multi-segment traffic not all 
> > the packets reaching the VM (I tested it on bare metal as well and 
> > saw the same issue)
> >
> > Is it a known issue? any workaround for it?
> >
> > Thanks,
> > Nissim



[dpdk-dev] [PATCH v9 1/2] mk:Simplify the ifdefs in the makefile

2015-05-14 Thread Wiles, Keith


On 5/14/15, 7:30 AM, "Olivier MATZ"  wrote:

>Hi,
>
>On 05/13/2015 06:35 PM, Keith Wiles wrote:
>> Simplify the ifdefs in rte.app.mk to make the code more
>> readable and maintainable by introducing a internal
>> _LDLIBS-y variable to build up the LDLIBS variable.
>> 
>> The new internal variable _LDLIBS-y should not be
>> used outside of the rte.app.mk file.
>> 
>> Signed-off-by: Keith Wiles 
>> ---
>>  mk/rte.app.mk | 243
>>+++---
>>  1 file changed, 61 insertions(+), 182 deletions(-)
>> 
>> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
>> index af8a1b0..1a2043a 100644
>> --- a/mk/rte.app.mk
>> +++ b/mk/rte.app.mk
>> [...]
>>  
>> -LDLIBS += $(CPU_LDLIBS)
>> +LDLIBS += $(_LDLIBS-y) $(CPU_LDLIBS) $(EXTRA_LDLIBS)
>>  
>>  .PHONY: all
>>  all: install
>
>
>You are still adding EXTRA_LDLIBS in a patch called "simplify the
>ifdefs".

I did have them split correctly; it's my bad that I somehow did not get them
committed correctly.

>
>A good idea before sending a new version of a patch on the mailing
>list is to check the list of the modifications that were discussed.
>Then this list can be added in the cover letter or after the "---"
>marker of your patch, allowing the reviewers to better understand
>what changed in this version.
>
>Regards,
>Olivier



[dpdk-dev] [PATCH v9 1/2] mk:Simplify the ifdefs in the makefile

2015-05-14 Thread Olivier MATZ
Hi,

On 05/13/2015 06:35 PM, Keith Wiles wrote:
> Simplify the ifdefs in rte.app.mk to make the code more
> readable and maintainable by introducing a internal
> _LDLIBS-y variable to build up the LDLIBS variable.
> 
> The new internal variable _LDLIBS-y should not be
> used outside of the rte.app.mk file.
> 
> Signed-off-by: Keith Wiles 
> ---
>  mk/rte.app.mk | 243 
> +++---
>  1 file changed, 61 insertions(+), 182 deletions(-)
> 
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index af8a1b0..1a2043a 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> [...]
>  
> -LDLIBS += $(CPU_LDLIBS)
> +LDLIBS += $(_LDLIBS-y) $(CPU_LDLIBS) $(EXTRA_LDLIBS)
>  
>  .PHONY: all
>  all: install


You are still adding EXTRA_LDLIBS in a patch called "simplify the
ifdefs".

A good idea before sending a new version of a patch on the mailing
list is to check the list of the modifications that were discussed.
Then this list can be added in the cover letter or after the "---"
marker of your patch, allowing the reviewers to better understand
what changed in this version.

Regards,
Olivier


[dpdk-dev] [PATCH 0/2] doc: refactored fig and table nums into references

2015-05-14 Thread Mcnamara, John


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, May 13, 2015 8:08 PM
> To: Mcnamara, John
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/2] doc: refactored fig and table nums
> into references
> 
> 
> The error is removed so it's better.
> With this patch, a figure reference looks like this:
> :numref:`figure_single_port_nic` Virtualization for a Single Port NIC in
> SR-IOV Mode The rst line is:
> :numref:`figure_single_port_nic` :ref:`figure_single_port_nic`
> 
> I was trying to replace the numref output by a working link with "figure"
> as label.
> This is my trial to mimic :ref: as a first step:


Hi Thomas,

I'll take a look at it and see if I can get something working.

John.

[dpdk-dev] [PATCH v10 2/2] mk:Introduce the EXTRA_LDLIBS variable

2015-05-14 Thread Keith Wiles
Signed-off-by: Keith Wiles 
---
 doc/build-sdk-quick.txt  | 1 +
 doc/guides/prog_guide/dev_kit_build_system.rst   | 2 ++
 doc/guides/prog_guide/dev_kit_root_make_help.rst | 2 +-
 mk/rte.app.mk| 2 +-
 4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/doc/build-sdk-quick.txt b/doc/build-sdk-quick.txt
index 041a40e..bf18b48 100644
--- a/doc/build-sdk-quick.txt
+++ b/doc/build-sdk-quick.txt
@@ -13,6 +13,7 @@ Build variables
EXTRA_CPPFLAGS   preprocessor options
EXTRA_CFLAGS compiler options
EXTRA_LDFLAGSlinker options
+   EXTRA_LDLIBS linker library options
RTE_KERNELDIRlinux headers path
CROSS toolchain prefix
V verbose
diff --git a/doc/guides/prog_guide/dev_kit_build_system.rst 
b/doc/guides/prog_guide/dev_kit_build_system.rst
index 04f1d4e..7dc2de6 100644
--- a/doc/guides/prog_guide/dev_kit_build_system.rst
+++ b/doc/guides/prog_guide/dev_kit_build_system.rst
@@ -411,6 +411,8 @@ Variables that Can be Set/Overridden by the User in a 
Makefile or Command Line

 *   EXTRA_LDFLAGS: The content of this variable is appended after LDFLAGS when 
linking.

+*   EXTRA_LDLIBS: The content of this variable is appended after LDLIBS when 
linking.
+
 *   EXTRA_ASFLAGS: The content of this variable is appended after ASFLAGS when 
assembling.

 *   EXTRA_CPPFLAGS: The content of this variable is appended after CPPFLAGS 
when using a C preprocessor on assembly files.
diff --git a/doc/guides/prog_guide/dev_kit_root_make_help.rst 
b/doc/guides/prog_guide/dev_kit_root_make_help.rst
index 333b007..e522c12 100644
--- a/doc/guides/prog_guide/dev_kit_root_make_help.rst
+++ b/doc/guides/prog_guide/dev_kit_root_make_help.rst
@@ -218,7 +218,7 @@ The following variables can be specified on the command 
line:

 Enable dependency debugging. This provides some useful information about 
why a target is built or not.

-*   EXTRA_CFLAGS=, EXTRA_LDFLAGS=, EXTRA_ASFLAGS=, EXTRA_CPPFLAGS=
+*   EXTRA_CFLAGS=, EXTRA_LDFLAGS=, EXTRA_LDLIBS=, EXTRA_ASFLAGS=, 
EXTRA_CPPFLAGS=

 Append specific compilation, link or asm flags.

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 4fc582a..1a2043a 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -141,7 +141,7 @@ _LDLIBS-y += $(EXECENV_LDLIBS)
 _LDLIBS-y += --end-group
 _LDLIBS-y += --no-whole-archive

-LDLIBS += $(_LDLIBS-y) $(CPU_LDLIBS)
+LDLIBS += $(_LDLIBS-y) $(CPU_LDLIBS) $(EXTRA_LDLIBS)

 .PHONY: all
 all: install
-- 
2.3.0



[dpdk-dev] [PATCH v10 1/2] mk:Simplify the ifdefs in the makefile

2015-05-14 Thread Keith Wiles
Simplify the ifdefs in rte.app.mk to make the code more
readable and maintainable by introducing a internal
_LDLIBS-y variable to build up the LDLIBS variable.

The new internal variable _LDLIBS-y should not be
used outside of the rte.app.mk file.

Signed-off-by: Keith Wiles 
---
 mk/rte.app.mk | 243 +++---
 1 file changed, 61 insertions(+), 182 deletions(-)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index af8a1b0..4fc582a 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -1,7 +1,7 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   Copyright(c) 2014 6WIND S.A.
+#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2014-2015 6WIND S.A.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -51,218 +51,97 @@ LDSCRIPT = $(RTE_LDSCRIPT)
 endif

 # default path for libs
-LDLIBS += -L$(RTE_SDK_BIN)/lib
+_LDLIBS-y += -L$(RTE_SDK_BIN)/lib

 #
 # Order is important: from higher level to lower level
 #
-LDLIBS += --whole-archive

-ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),y)
-LDLIBS += -l$(RTE_LIBNAME)
-endif
+_LDLIBS-y += --whole-archive

-ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n)
+_LDLIBS-$(CONFIG_RTE_BUILD_COMBINE_LIBS)+= -l$(RTE_LIBNAME)

-ifeq ($(CONFIG_RTE_LIBRTE_DISTRIBUTOR),y)
-LDLIBS += -lrte_distributor
-endif
+ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n)

-ifeq ($(CONFIG_RTE_LIBRTE_REORDER),y)
-LDLIBS += -lrte_reorder
-endif
+_LDLIBS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)+= -lrte_distributor
+_LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)+= -lrte_reorder

-ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
-LDLIBS += -lrte_kni
-endif
+_LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)+= -lrte_kni
+_LDLIBS-$(CONFIG_RTE_LIBRTE_IVSHMEM)+= -lrte_ivshmem
 endif

-ifeq ($(CONFIG_RTE_LIBRTE_IVSHMEM),y)
-ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
-LDLIBS += -lrte_ivshmem
-endif
-endif
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PIPELINE)   += -lrte_pipeline
+_LDLIBS-$(CONFIG_RTE_LIBRTE_TABLE)  += -lrte_table
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PORT)   += -lrte_port
+_LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)  += -lrte_timer
+_LDLIBS-$(CONFIG_RTE_LIBRTE_HASH)   += -lrte_hash
+_LDLIBS-$(CONFIG_RTE_LIBRTE_JOBSTATS)   += -lrte_jobstats
+_LDLIBS-$(CONFIG_RTE_LIBRTE_LPM)+= -lrte_lpm
+_LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)  += -lrte_power
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ACL)+= -lrte_acl
+_LDLIBS-$(CONFIG_RTE_LIBRTE_METER)  += -lrte_meter

-ifeq ($(CONFIG_RTE_LIBRTE_PIPELINE),y)
-LDLIBS += -lrte_pipeline
-endif
+_LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lrte_sched
+_LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lm
+_LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lrt

-ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
-LDLIBS += -lrte_table
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_PORT),y)
-LDLIBS += -lrte_port
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_TIMER),y)
-LDLIBS += -lrte_timer
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_HASH),y)
-LDLIBS += -lrte_hash
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_JOBSTATS),y)
-LDLIBS += -lrte_jobstats
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_LPM),y)
-LDLIBS += -lrte_lpm
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_POWER),y)
-LDLIBS += -lrte_power
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_ACL),y)
-LDLIBS += -lrte_acl
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_METER),y)
-LDLIBS += -lrte_meter
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
-LDLIBS += -lrte_sched
-LDLIBS += -lm
-LDLIBS += -lrt
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_VHOST), y)
-LDLIBS += -lrte_vhost
-endif
+_LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lrte_vhost

 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS

-ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),y)
-LDLIBS += -lpcap
-endif
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)   += -lpcap

-ifeq ($(CONFIG_RTE_LIBRTE_VHOST)$(CONFIG_RTE_LIBRTE_VHOST_USER),yn)
-LDLIBS += -lfuse
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST_USER),n)
+_LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lfuse
 endif

-ifeq ($(CONFIG_RTE_LIBRTE_MLX4_PMD),y)
-LDLIBS += -libverbs
-endif
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)   += -libverbs

-LDLIBS += --start-group
+_LDLIBS-y += --start-group

 ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n)

-ifeq ($(CONFIG_RTE_LIBRTE_KVARGS),y)
-LDLIBS += -lrte_kvargs
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_MBUF),y)
-LDLIBS += -lrte_mbuf
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_IP_FRAG),y)
-LDLIBS += -lrte_ip_frag
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_ETHER),y)
-LDLIBS += -lethdev
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_MALLOC),y)
-LDLIBS += -lrte_malloc
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_MEMPOOL),y)
-LDLIBS += -lrte_mempool
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_RING),y)
-LDLIBS += -lrte_ring
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_EAL),y)
-LDLIBS += -lrte_eal
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_CMDLINE),y)
-LDLIBS += -lrte_cmdline
-endif
-
-ifeq ($(CONFIG_RTE_LIBRTE_CFGFILE),y)
-LDLIBS += -lrte_cfgfile
-endif
-
-ifeq 

[dpdk-dev] proposal: raw packet send and receive API for PMD driver

2015-05-14 Thread Guojiachun
Hello,

This is our proposal to introduce new PMD APIs that would make it much easier 
to integrate DPDK into various applications.

There is a gap in hardware offload support when porting DPDK to a new platform 
that provides offload features such as a packet accelerator or a buffer 
management unit. If we can supplement DPDK in this area, it will be easier to 
port it to a new NIC/platform.


1.  Packet buffer management

PMD drivers currently use the DPDK software mempool API to get/put packet 
buffers. In other cases, however, the hardware may provide a buffer management 
unit. We could use that hardware unit in place of the DPDK software mempool to 
gain efficiency. To do so, we need to register get/put hook APIs with each 
eth_dev, defined as follows:

/*  include  and  */

typedef int (*rbuf_bulk_get_hook)(void *mempool, void **obj_table, unsigned n);
typedef int (*rbuf_bulk_free_hook)(void *memaddr);

typedef int (*eth_dev_init_t)(struct eth_driver *eth_drv,
                              struct rte_eth_dev *eth_dev,
                              rbuf_bulk_get_hook *rbuf_get,
                              rbuf_bulk_free_hook *rbuf_free);

If there is no hardware buffer unit, we can register the current rte_mempool 
APIs in eth_dev_init(). Each driver then calls these hooks instead of calling 
rte_mempool_get_bulk()/rte_mempool_put() directly.
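
As a rough sketch of how a software fallback could sit behind these hooks,
the code below reuses the two typedefs from this mail and backs them with a
toy pointer stack. Everything prefixed toy_ is hypothetical and only stands in
for a real pool; it is not DPDK code, and the all-or-nothing behaviour of the
get hook is a choice made for this example.

```c
#include <stddef.h>

/* Hook types as proposed in this mail. */
typedef int (*rbuf_bulk_get_hook)(void *mempool, void **obj_table, unsigned n);
typedef int (*rbuf_bulk_free_hook)(void *memaddr);

/* Hypothetical software pool: a stack of free buffer pointers. */
#define TOY_POOL_CAP 8
struct toy_pool {
    void *free_objs[TOY_POOL_CAP];
    unsigned count;
};

static char toy_storage[TOY_POOL_CAP][64];
static struct toy_pool toy_default_pool;

/* Put every buffer of the static storage on the free stack. */
static void toy_pool_init(struct toy_pool *p)
{
    p->count = 0;
    for (unsigned i = 0; i < TOY_POOL_CAP; i++)
        p->free_objs[p->count++] = toy_storage[i];
}

/* All-or-nothing bulk get: 0 on success, -1 if fewer than n are free. */
static int toy_bulk_get(void *mempool, void **obj_table, unsigned n)
{
    struct toy_pool *p = mempool;

    if (n > p->count)
        return -1;
    for (unsigned i = 0; i < n; i++)
        obj_table[i] = p->free_objs[--p->count];
    return 0;
}

/* The proposed free hook gets no pool argument, so use the default pool. */
static int toy_free(void *memaddr)
{
    struct toy_pool *p = &toy_default_pool;

    if (p->count == TOY_POOL_CAP)
        return -1;
    p->free_objs[p->count++] = memaddr;
    return 0;
}

/* Per-device hook table a driver would call through. */
struct toy_dev_hooks {
    rbuf_bulk_get_hook get;
    rbuf_bulk_free_hook release;
};

static void toy_dev_register(struct toy_dev_hooks *h,
                             rbuf_bulk_get_hook get,
                             rbuf_bulk_free_hook release)
{
    h->get = get;
    h->release = release;
}
```

One thing the sketch makes visible: because the free hook receives only the
buffer address, the fallback has to find the owning pool by some other means
(here a file-scope variable).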


2.  Recv/send API and raw_buf

Hardware offload features differ between NICs. The rx/tx offload fields 
currently defined in rte_mbuf cannot cover every NIC, and modifying rte_mbuf 
sometimes means modifying every PMD driver as well. We can define a struct 
rte_rbuf, with a union for the offload data, to resolve this.



struct rte_rbuf {
    void *buf_addr;           /**< Virtual address of segment buffer. */
    phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
    uint16_t buf_len;         /**< Length of segment buffer. */
    uint16_t data_off;
    uint8_t nb_segs;          /**< Number of segments. */

    union {
        struct {
            uint32_t rx_offload_data[8];
            uint32_t tx_offload_data[8];
        } offload_data;

        struct { /* intel nic offload define */
            uint32_t rss;
            uint64_t tx_offload;
            ...
        } intel_offload;

        /* other NIC offload define */
        ...
    }; /* offload define */
};
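
To make the overlay explicit, here is a compilable sketch of the same layout
with the elided fields simply left out. The toy_ names, the named union
member, and the typedef of phys_addr_t as a 64-bit integer are assumptions
made for this example.

```c
#include <stdint.h>

typedef uint64_t phys_addr_t; /* assumption: 64-bit physical addresses */

struct toy_rbuf {
    void *buf_addr;
    phys_addr_t buf_physaddr;
    uint16_t buf_len;
    uint16_t data_off;
    uint8_t nb_segs;

    union {
        /* Generic view: opaque words any driver may use. */
        struct {
            uint32_t rx_offload_data[8];
            uint32_t tx_offload_data[8];
        } offload_data;

        /* Vendor view: only the two fields named in the proposal. */
        struct {
            uint32_t rss;
            uint64_t tx_offload;
        } intel_offload;
    } offload; /* named here so the sketch also builds as C99 */
};

/* Write through the vendor view. */
static void toy_rbuf_set_rss(struct toy_rbuf *b, uint32_t rss)
{
    b->offload.intel_offload.rss = rss;
}

/* Read the first generic RX word, which shares storage with rss. */
static uint32_t toy_rbuf_rx_word0(const struct toy_rbuf *b)
{
    return b->offload.offload_data.rx_offload_data[0];
}
```

Both views share the same storage, so an application needs to know which NIC
produced a buffer before interpreting the vendor fields.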





3.  RTE_PKTMBUF_HEADROOM

Each PMD driver needs to fill rte_mbuf->data_off according to the macro 
RTE_PKTMBUF_HEADROOM. In some cases, however, different applications need 
different values of RTE_PKTMBUF_HEADROOM. Whenever the value of 
RTE_PKTMBUF_HEADROOM changes, all drivers have to be re-compiled. That means 
different applications need different driver libraries and cannot share one.

So we can pass an argument dynamically to eth_dev_init() to replace the macro 
RTE_PKTMBUF_HEADROOM.
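
A minimal sketch of that idea, assuming the driver stores the value it was
given at init time and uses it wherever it would otherwise use the macro. The
toy_ structures and functions are hypothetical and much simpler than the real
eth_dev and mbuf types.

```c
#include <stdint.h>

/* Hypothetical per-device state holding the runtime headroom. */
struct toy_dev {
    uint16_t headroom_offset;
};

/* Hypothetical buffer header with only the fields needed here. */
struct toy_buf {
    uint16_t buf_len;
    uint16_t data_off;
};

/* Record the headroom chosen by the application at init time. */
static int toy_dev_init(struct toy_dev *dev, uint16_t headroom_offset)
{
    dev->headroom_offset = headroom_offset;
    return 0;
}

/*
 * Prepare a buffer for receive: place the data start after the device's
 * headroom. Returns -1 if the headroom does not fit in the buffer.
 */
static int toy_buf_reset(const struct toy_dev *dev, struct toy_buf *b)
{
    if (dev->headroom_offset > b->buf_len)
        return -1;
    b->data_off = dev->headroom_offset;
    return 0;
}
```

Two applications can then initialise the same driver binary with, say, 128 and
256 bytes of headroom without rebuilding it.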




Therefore, we propose adding the following APIs:


struct rte_rbuf {
    void *buf_addr;           /**< Virtual address of segment buffer. */
    phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
    uint16_t buf_len;         /**< Length of segment buffer. */
    uint16_t data_off;
    uint8_t nb_segs;          /**< Number of segments. */

    union {
        struct {
            uint32_t rx_offload_data[8];
            uint32_t tx_offload_data[8];
        } offload_data;

        struct { /* intel nic offload define */
            uint32_t rss;
            uint64_t tx_offload;
            ...
        } intel_offload;

        /* other NIC offload define */
        ...
    }; /* offload define */
};

uint16_t rte_eth_tx_raw_burst(uint8_t port_id, uint16_t queue_id,
                              struct rte_rbuf **tx_pkts, uint16_t nb_pkts);
uint16_t rte_eth_rx_raw_burst(uint8_t port_id, uint16_t queue_id,
                              struct rte_rbuf **rx_pkts, uint16_t nb_pkts);

/*  include  and  */
typedef int (*rbuf_bulk_get_hook)(void *mempool, void **obj_table, unsigned n);
typedef int (*rbuf_bulk_free_hook)(void *memaddr);

/* use 'headroom_offset' to replace the compile-time macro
 * (CONFIG_RTE_PKTMBUF_HEADROOM) */
typedef int (*eth_dev_init_t)(struct eth_driver *eth_drv,
                              struct rte_eth_dev *eth_dev,
                              rbuf_bulk_get_hook *rbuf_get,
                              rbuf_bulk_free_hook *rbuf_free,
                              uint16_t headroom_offset);


These are my ideas; I hope you can help me improve them.
Thank you!

Jiachun