date:20141209

[dpdk-dev] [PATCH v2 3/3] app/testpmd:change tx_checksum command and csum forwarding engine

2014-12-09 Thread Jijiang Liu

The patch enhances the tx_checksum command and reworks csum forwarding engine 
due to the change of tx_checksum command.
The main changes of the tx_checksum command are listed below,

1. add "tx_checksum set tunnel (hw|sw|none) (port-id)" command

2. add "tx_checksum set outer-ip (hw|sw) (port-id)" command

3. remove the "vxlan" option from the "tx_checksum set(ip|udp|tcp|sctp|vxlan) 
(hw|sw) (port-id)" command

Moreover, replace the TESTPMD_TX_OFFLOAD_VXLAN_CKSUM flag with 
TESTPMD_TX_OFFLOAD_TUNNEL_CKSUM flag, and add the 
TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM and TESTPMD_TX_OFFLOAD_NON_TUNNEL_CKSUM flag.


Signed-off-by: Jijiang Liu 
---
 app/test-pmd/cmdline.c  |  209 ---
 app/test-pmd/csumonly.c |   38 ++---
 app/test-pmd/testpmd.h  |   14 +++-
 3 files changed, 234 insertions(+), 27 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f79ea3e..9bfa9ef 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -316,16 +316,30 @@ static void cmd_help_long_parsed(void *parsed_result,
"Disable hardware insertion of a VLAN header in"
" packets sent on a port.\n\n"

-   "tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) 
(port_id)\n"
+   "tx_cksum set (ip|udp|tcp|sctp) (hw|sw) (port_id)\n"
"Select hardware or software calculation of the"
" checksum with when transmitting a packet using the"
" csum forward engine.\n"
-   "ip|udp|tcp|sctp always concern the inner layer.\n"
-   "vxlan concerns the outer IP and UDP layer (in"
-   " case the packet is recognized as a vxlan packet by"
-   " the forward engine)\n"
+   "In the case of tunneling packet, ip|udp|tcp|sctp"
+   " always concern the inner layer.\n\n"
+
+   "tx_cksum set tunnel (hw|sw|none) (port_id)\n"
+   " Select hardware or software calculation of the"
+   " checksum with when transmitting a tunneling packet"
+   " using the csum forward engine.\n"
+   " The none option means treat tunneling packet as 
ordinary
+   " packet when using the csum forward engine.
+   "Tunneling packet concerns the outer IP, inner IP"
+   " and inner L4\n"
"Please check the NIC datasheet for HW limits.\n\n"

+   "tx_cksum set (outer-ip) (hw|sw) (port_id)\n"
+   "Select hardware or software calculation of the"
+   " checksum with when transmitting a packet using the"
+   " csum forward engine.\n"
+   "outer-ip always concern the outer layer of"
+   " tunneling packet.\n\n"
+
"tx_checksum show (port_id)\n"
"Display tx checksum offload configuration\n\n"

@@ -2861,6 +2875,181 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = {
},
 };

+/* ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS FOR TUNNELING */
+struct cmd_tx_cksum_tunnel_result {
+   cmdline_fixed_string_t tx_cksum;
+   cmdline_fixed_string_t mode;
+   cmdline_fixed_string_t type;
+   cmdline_fixed_string_t hwsw;
+   uint8_t port_id;
+};
+
+static void
+cmd_tx_cksum_tunnel_parsed(void *parsed_result,
+  __attribute__((unused)) struct cmdline *cl,
+  __attribute__((unused)) void *data)
+{
+   struct cmd_tx_cksum_tunnel_result *res = parsed_result;
+   int hw = 0;
+   uint16_t ol_flags, mask = 0;
+
+   if (port_id_is_invalid(res->port_id)) {
+   printf("invalid port %d\n", res->port_id);
+   return;
+   }
+
+   if (!strcmp(res->mode, "set")) {
+
+   if (!strcmp(res->hwsw, "hw"))
+   hw = 1;
+   else if (!strcmp(res->hwsw, "none")) {
+   ports[res->port_id].tx_ol_flags &=
+   ~(TESTPMD_TX_OFFLOAD_TUNNEL_CKSUM
+   | TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM);
+   ports[res->port_id].tx_ol_flags |=
+   TESTPMD_TX_OFFLOAD_NON_TUNNEL_CKSUM;
+   return;
+   }
+
+   ports[res->port_id].tx_ol_flags &=
+   ~TESTPMD_TX_OFFLOAD_NON_TUNNEL_CKSUM;
+   mask = TESTPMD_TX_OFFLOAD_TUNNEL_CKSUM;
+
+   if (hw)
+   ports[res->port_id].tx_ol_flags |= mask;
+   else
+   ports[res->port_id].tx_ol_flags &= (~mask);
+   }
+
+   ol_flags = ports[res->port_id].tx_ol_flags;
+

[dpdk-dev] [PATCH v2 0/3] enhance TX checksum command and csum forwarding engine

2014-12-09 Thread Jijiang Liu

In the current codes, the "tx_checksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) 
(port-id)" command is not easy to understand and extend, so the patch set 
enhances the tx_checksum command and reworks csum forwarding engine due to the 
change of tx_checksum command. 
The main changes of the tx_checksum command are listed below,

1> add "tx_checksum set tunnel (hw|sw|none) (port-id)" command

The command is used to set/clear tunnel flag that is used to tell the NIC that 
the packetg is a tunneing packet and application want hardware TX checksum 
offload for outer layer, or inner layer, or both.

The 'none' option means that user treat tunneling packet as ordinary packet 
when using the csum forward engine.
for example, let say we have a tunnel packet: 
eth_hdr_out/ipv4_hdr_out/udp_hdr_out/vxlan_hdr/ehtr_hdr_in/ipv4_hdr_in/tcp_hdr_in.
 one of several scenarios:

1) User requests HW offload for ipv4_hdr_out  checksum, and doesn't care is it 
a tunnelled packet or not. So he sets:

tx_checksum set tunnel none 0

tx_checksum set ip hw 0

So for such case we should set tx_tunnel to 'none'.

2> add "tx_checksum set outer-ip (hw|sw) (port-id)" command

The command is used to set/clear outer IP flag that is used to tell the NIC 
that application want hardware offload for outer layer.

3> remove the 'vxlan' option from the  "tx_checksum set (ip|udp|tcp|sctp|vxlan) 
(hw|sw) (port-id)" command

The command is used to set IP, UDP, TCP and SCTP TX checksum flag. In the case 
of tunneling packet, the IP, UDP, TCP and SCTP flags always concern inner layer.

Moreover, replace the TESTPMD_TX_OFFLOAD_VXLAN_CKSUM flag with 
TESTPMD_TX_OFFLOAD_TUNNEL_CKSUM flag and add the 
TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM and TESTPMD_TX_OFFLOAD_NON_TUNNEL_CKSUM flag 
in test-pmd application.

v2 change:
 redefine the 'none' behaviour for "tx_checksum set tunnel (hw|sw|none) 
(port-id)" command.

Jijiang Liu (3):
  add outer IP offload capability in librte_ether.
  add outer IP checksum capability in i40e PMD
  testpmd command lines of the tx_checksum and csum forwarding rework

 app/test-pmd/cmdline.c|  201 +++--
 app/test-pmd/csumonly.c   |   35 ---
 app/test-pmd/testpmd.h|6 +-
 lib/librte_ether/rte_ethdev.h |1 +
 lib/librte_pmd_i40e/i40e_ethdev.c |3 +-
 5 files changed, 218 insertions(+), 28 deletions(-)

-- 
1.7.7.6

[dpdk-dev] [PATCH] ixgbe: fix a problem with NIC TSO offload

2014-12-09 Thread Walukiewicz, Miroslaw

You are completely right. I did not catch the mask. I will remove the patch.

Mirek

> -Original Message-
> From: Ananyev, Konstantin
> Sent: Tuesday, December 9, 2014 7:41 PM
> To: Walukiewicz, Miroslaw; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] ixgbe: fix a problem with NIC TSO offload
> 
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> miroslaw.walukiewicz at intel.com
> > Sent: Tuesday, December 09, 2014 5:15 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] ixgbe: fix a problem with NIC TSO offload
> >
> > From: Miroslaw Walukiewicz 
> >
> > The patch fixes a minor issue with setting up of TSO feature for
> > ixgbe NICs.
> >
> > The values for l4_len and tso_segsz was chagned first by txoffload mask
> > and next set up in the NIC descriptor.
> >
> > Signed-off-by: Mirek Walukiewicz 
> > ---
> >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c |4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > index 8559ef6..c9c3104 100644
> > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > @@ -390,13 +390,13 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
> > type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 |
> > IXGBE_ADVTXD_TUCMD_L4T_TCP |
> > IXGBE_ADVTXD_DTYP_CTXT |
> IXGBE_ADVTXD_DCMD_DEXT;
> > +   mss_l4len_idx |= tx_offload.tso_segsz <<
> IXGBE_ADVTXD_MSS_SHIFT;
> > +   mss_l4len_idx |= tx_offload.l4_len <<
> IXGBE_ADVTXD_L4LEN_SHIFT;
> 
> Didn't  understand your comment above...
> By whom tx_offload.l4_len will be changed?
> As I can see, below only tx_offload_mask is updated, tx_offload is intact.
> Konstantin
> 
> >
> > tx_offload_mask.l2_len = ~0;
> > tx_offload_mask.l3_len = ~0;
> > tx_offload_mask.l4_len = ~0;
> > tx_offload_mask.tso_segsz = ~0;
> > -   mss_l4len_idx |= tx_offload.tso_segsz <<
> IXGBE_ADVTXD_MSS_SHIFT;
> > -   mss_l4len_idx |= tx_offload.l4_len <<
> IXGBE_ADVTXD_L4LEN_SHIFT;
> > } else { /* no TSO, check if hardware checksum is needed */
> > if (ol_flags & PKT_TX_IP_CKSUM) {
> > type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;

[dpdk-dev] [PATCH] ixgbe: fix a problem with NIC TSO offload

2014-12-09 Thread Ananyev, Konstantin

> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of miroslaw.walukiewicz 
> at intel.com
> Sent: Tuesday, December 09, 2014 5:15 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] ixgbe: fix a problem with NIC TSO offload
> 
> From: Miroslaw Walukiewicz 
> 
> The patch fixes a minor issue with setting up of TSO feature for
> ixgbe NICs.
> 
> The values for l4_len and tso_segsz was chagned first by txoffload mask
> and next set up in the NIC descriptor.
> 
> Signed-off-by: Mirek Walukiewicz 
> ---
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index 8559ef6..c9c3104 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -390,13 +390,13 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
>   type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 |
>   IXGBE_ADVTXD_TUCMD_L4T_TCP |
>   IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
> + mss_l4len_idx |= tx_offload.tso_segsz << IXGBE_ADVTXD_MSS_SHIFT;
> + mss_l4len_idx |= tx_offload.l4_len << IXGBE_ADVTXD_L4LEN_SHIFT;

Didn't  understand your comment above...
By whom tx_offload.l4_len will be changed?
As I can see, below only tx_offload_mask is updated, tx_offload is intact.
Konstantin

> 
>   tx_offload_mask.l2_len = ~0;
>   tx_offload_mask.l3_len = ~0;
>   tx_offload_mask.l4_len = ~0;
>   tx_offload_mask.tso_segsz = ~0;
> - mss_l4len_idx |= tx_offload.tso_segsz << IXGBE_ADVTXD_MSS_SHIFT;
> - mss_l4len_idx |= tx_offload.l4_len << IXGBE_ADVTXD_L4LEN_SHIFT;
>   } else { /* no TSO, check if hardware checksum is needed */
>   if (ol_flags & PKT_TX_IP_CKSUM) {
>   type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;

[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-09 Thread Paolo Bonzini

I had replied to this message, but my reply never got to the list.
Let's try again.

I wonder if this might be papering over a bug in the host cpufreq
driver.  If the guest is not doing much and leaving a lot of idle CPU
time, the host should scale down the frequency of that CPU.  In the case
of pinned VCPUs this should really "just work".  What is the problem
that is being solved?

Paolo

On 22/11/2014 18:17, Vincent JARDIN wrote:
> Tim,
> 
> cc-ing Paolo and qemu-devel@ again in order to get their take on it.
> 
 Did you make any progress in Qemu/KVM community?
 We need to be sync'ed up with them to be sure we share the same goal.
 I want also to avoid using a solution which doesn't fit with their
 plan.
 Remember that we already had this problem with ivshmem which was
 planned to be dropped.

> 
>>> Unfortunately, I have not yet received any feedback:
>>> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html
>>
>> Just to add to what Alan said above, this capability does not exist in
>> qemu at the moment, and based on there having been no feedback on the
>> qemu mailing list so far, I think it's reasonable to assume that it
>> will not be implemented in the immediate future. The VM Power
>> Management feature has also been designed to allow easy migration to a
>> qemu-based solution when this is supported in future. Therefore, I'd
>> be in favour of accepting this feature into DPDK now.
>>
>> It's true that the implementation is a work-around, but there have
>> been similar cases in DPDK in the past. One recent example that comes
>> to mind is userspace vhost. The original implementation could also be
>> considered a work-around, but it met the needs of many in the
>> community. Now, with support for vhost-user in qemu 2.1, that
>> implementation is being improved. I'd see VM Power Management
>> following a similar path when this capability is supported in qemu.
> 
> Best regards,
>   Vincent
>

[dpdk-dev] DDPK use of MAP_FIXED in mmap

2014-12-09 Thread Xie, Huawei



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Tuesday, December 09, 2014 2:55 AM
> To: Karmarkar Suyash
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] DDPK use of MAP_FIXED in mmap
> 
> On Mon, Dec 08, 2014 at 07:02:38PM +, Karmarkar Suyash wrote:
> > Hello,
> >
> > In DPDK when we use mmap why are we passing the MAP_FIXED flag when
> Linux man page itself says that the option is discouraged? Any specific reason
> for passing the MAP_FIXED flag?
> >
> >
> > http://linux.die.net/man/2/mmap
> >
> > MAP_FIXED
> > Don't interpret addr as a hint: place the mapping at exactly that address. 
> > addr
> must be a multiple of the page size. If the memory region specified by addr 
> and
> len overlaps pages of any existing mapping(s), then the overlapped part of the
> existing mapping(s) will be discarded. If the specified address cannot be 
> used,
> mmap() will fail. Because requiring a fixed address for a mapping is less 
> portable,
> the use of this option is discouraged.
> >
> >
> > Regards
> > Suyash Karmarkar
> 
> I won't comment on every occurance of "MAP_FIXED" in DPDK, but it's main use
> is
> when mapping the hugepages into memory inside EAL init. In this case, we are
> ok to
> use it, as we take good care to ensure that our mapping space is free. What we
> do
> is, once we know how many contiguous hugepages we need to map, we request
> a mapping
> from /dev/zero for that particular size. We then record the address of the
> mapping
> we get, and then unmap /dev/zero again - thereby freeing up the entire address
> range. At this point, we then use MAP_FIXED to explicitly mmap in the
> hugepages
> into this region that we have just freed up - thereby guaranteeing contiguous
> hugepage mappings in the correct order. [The reason for doing things this way 
> is
> that we found on some systems - particularly with 32-bit code, the regular
> mmaps
> of pages we being done in reverse order, meaning each page became it's own
> segment].
> 
> On the other hand, it's also good to note where we don't use MAP_FIXED. We
> don't
> use map fixed when initializing a secondary process and are mapping the
> hugepage
> memory into it. In this case, although we know where the memory has to be
> placed,
> we don't know if it is safe to use or not. Instead of using MAP_FIXED, we 
> instead
> hint to the kernel our preferred address and check if the request was 
> satisfied
> at that address.
> 
> Hope this clarifies things a bit,

Don't know if kernel always use the hinted address if the target region is free 
for mmap.
The safe way is if we are absolutely sure the region is free(like through 
checking maps file), use MMAP_FIXED(in secondary process).

> /Bruce

[dpdk-dev] [PATCH] rte_eth_ctrl: rename rte_eth_fdir_flow.ip6_flow

2014-12-09 Thread Mark Kavanagh

The name of the rte_eth_fdir_flow's rte_eth_ipv6_flow attribute,
'ip6_flow', clashes with a macro defined in
/usr/include/netinet/ip6.h, such that when DPDK is linked with an
application that uses the afforementioned header, the macro is
expanded within the DPDK struct, causing a compilation error.

Rename the relevant attribute in DPDK to resolve this.

Signed-off-by: Mark Kavanagh 
---
 app/test-pmd/cmdline.c  | 4 ++--
 lib/librte_ether/rte_eth_ctrl.h | 2 +-
 lib/librte_pmd_i40e/i40e_fdir.c | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f79ea3e..882a5a2 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -8261,9 +8261,9 @@ cmd_flow_director_filter_parsed(void *parsed_result,
case RTE_ETH_FLOW_TYPE_UDPV6:
case RTE_ETH_FLOW_TYPE_TCPV6:
IPV6_ADDR_TO_ARRAY(res->ip_dst,
-   entry.input.flow.ip6_flow.dst_ip);
+   entry.input.flow.ipv6_flow.dst_ip);
IPV6_ADDR_TO_ARRAY(res->ip_src,
-   entry.input.flow.ip6_flow.src_ip);
+   entry.input.flow.ipv6_flow.src_ip);
/* need convert to big endian. */
entry.input.flow.udp6_flow.dst_port =
rte_cpu_to_be_16(res->port_dst);
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 7088d8d..5d9c387 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -275,7 +275,7 @@ union rte_eth_fdir_flow {
struct rte_eth_udpv6_flow  udp6_flow;
struct rte_eth_tcpv6_flow  tcp6_flow;
struct rte_eth_sctpv6_flow sctp6_flow;
-   struct rte_eth_ipv6_flow   ip6_flow;
+   struct rte_eth_ipv6_flow   ipv6_flow;
 };

 /**
diff --git a/lib/librte_pmd_i40e/i40e_fdir.c b/lib/librte_pmd_i40e/i40e_fdir.c
index ad38803..68511c8 100644
--- a/lib/librte_pmd_i40e/i40e_fdir.c
+++ b/lib/librte_pmd_i40e/i40e_fdir.c
@@ -749,10 +749,10 @@ i40e_fdir_fill_eth_ip_head(const struct 
rte_eth_fdir_input *fdir_input,
 * to the expected received packets.
 */
rte_memcpy(&(ip6->src_addr),
-  &(fdir_input->flow.ip6_flow.dst_ip),
+  &(fdir_input->flow.ipv6_flow.dst_ip),
   IPV6_ADDR_LEN);
rte_memcpy(&(ip6->dst_addr),
-  &(fdir_input->flow.ip6_flow.src_ip),
+  &(fdir_input->flow.ipv6_flow.src_ip),
   IPV6_ADDR_LEN);
ip6->proto = next_proto[fdir_input->flow_type];
break;
-- 
1.9.3

[dpdk-dev] [PATCH] rte_eth_ctrl: rename rte_eth_fdir_flow.ip6_flow

2014-12-09 Thread Mark Kavanagh

Hi,

This patch resolves a compilation issue that arises when DPDK1.8-rc3
is linked with OVS on Fedora 20, kernel v3.16.4-200.

Specifically, the name of rte_eth_fdir_flow's
rte_eth_ipv6_flow attribute, 'ip6_flow', clashes with a macro defined
in /usr/include/netinet/ip6.h, such that when DPDK is linked with an
application that uses the afforementioned header, the macro is
expanded within the DPDK struct, causing a compilation error.

Regards,
Mark

Mark Kavanagh (1):
  rte_eth_ctrl: rename rte_eth_fdir_flow.ip6_flow

 app/test-pmd/cmdline.c  | 4 ++--
 lib/librte_ether/rte_eth_ctrl.h | 2 +-
 lib/librte_pmd_i40e/i40e_fdir.c | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

-- 
1.9.3

[dpdk-dev] [PATCH] enic: fix of compile error

2014-12-09 Thread Helin Zhang

Compile warnings/errors was found on gcc 4.7.2 as follows. Variables
was reported of being used but uninitialized. Assigning an initial
value to it is needed.

lib/librte_pmd_enic/vnic/vnic_dev.c: In function vnic_dev_get_mac_addr:
lib/librte_pmd_enic/vnic/vnic_dev.c:393:16: error: a1 may be used uninitialized 
in this function [-Werror=uninitialized]
lib/librte_pmd_enic/vnic/vnic_dev.c:629:10: note: a1 was declared here
lib/librte_pmd_enic/vnic/vnic_dev.c: In function vnic_dev_set_mac_addr:
lib/librte_pmd_enic/vnic/vnic_dev.c:393:16: error: a1 may be used uninitialized 
in this function [-Werror=uninitialized]
lib/librte_pmd_enic/vnic/vnic_dev.c:980:10: note: a1 was declared here

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_enic/vnic/vnic_dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_enic/vnic/vnic_dev.c 
b/lib/librte_pmd_enic/vnic/vnic_dev.c
index b1cd63f..6407994 100644
--- a/lib/librte_pmd_enic/vnic/vnic_dev.c
+++ b/lib/librte_pmd_enic/vnic/vnic_dev.c
@@ -626,7 +626,7 @@ int vnic_dev_soft_reset_done(struct vnic_dev *vdev, int 
*done)

 int vnic_dev_get_mac_addr(struct vnic_dev *vdev, u8 *mac_addr)
 {
-   u64 a0, a1;
+   u64 a0, a1 = 0;
int wait = 1000;
int err, i;

@@ -977,7 +977,7 @@ struct rte_pci_device *vnic_dev_get_pdev(struct vnic_dev 
*vdev)

 int vnic_dev_set_mac_addr(struct vnic_dev *vdev, u8 *mac_addr)
 {
-   u64 a0, a1;
+   u64 a0, a1 = 0;
int wait = 1000;
int i;

-- 
1.8.1.4

[dpdk-dev] A question about hugepage initialization time

2014-12-09 Thread Matt Laswell

Hey Everybody,

Thanks for the feedback.  Yeah, we're pretty sure that the amount of memory
we work with is atypical, and we're hitting something that isn't an issue
for most DPDK users.

To clarify, yes, we're using 1GB hugepages, and we set them up via
hugepagesz and hugepages= in our kernel's grub line.  We find that when we
use four 1GB huge pages, eal memory init takes a couple of seconds, which
is no big deal.  When we use 128 1GB pages, though, memory init can take
several minutes.   The concern is that we will very likely use even more
memory in the future.  Our boot time is mostly just a nuisance now;
nonlinear growth in memory init time may transform it into a larger problem.

We've had to disable transparent hugepages due to latency issues with
in-memory databases.  I'll have to look at the possibility of alternative
memset implementations.  Perhaps some profiler time is in my future.

Again, thanks to everybody for the useful information.

--
Matt Laswell
laswell at infiniteio.com
infinite io, inc.

On Tue, Dec 9, 2014 at 1:06 PM, Matthew Hall  wrote:

> On Tue, Dec 09, 2014 at 10:33:59AM -0600, Matt Laswell wrote:
> > Our DPDK application deals with very large in memory data structures, and
> > can potentially use tens or even hundreds of gigabytes of hugepage
> memory.
>
> What you're doing is an unusual use case and this is open source code where
> nobody might have tested and QA'ed this yet.
>
> So my recommendation would be adding some rte_log statements to measure the
> various steps in the process to see what's going on. Also using the Linux
> Perf
> framework to do low-overhead sampling-based profiling, and making sure
> you've
> got everything compiled with debug symbols so you can see what's consuming
> the
> execution time.
>
> You might find that it makes sense to use some custom allocators like
> jemalloc
> alongside of the DPDK allocators, including perhaps "transparent hugepage
> mode" in your process, and some larger page sizes to reduce the number of
> pages.
>
> You can also use this handy kernel options, hugepagesz= hugepages=N .
> This creates guaranteed-contiguous known-good hugepages during boot which
> initialize much more quickly with less trouble and glitches in my
> experience.
>
> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
> https://www.kernel.org/doc/Documentation/vm/transhuge.txt
>
> There is no one-size-fits-all solution but these are some possibilities.
>
> Good Luck,
> Matthew.
>

[dpdk-dev] DDPK use of MAP_FIXED in mmap

2014-12-09 Thread Bruce Richardson

On Tue, Dec 09, 2014 at 10:26:34AM -0500, Neil Horman wrote:
> On Mon, Dec 08, 2014 at 07:02:38PM +, Karmarkar Suyash wrote:
> > Hello,
> > 
> > In DPDK when we use mmap why are we passing the MAP_FIXED flag when Linux 
> > man page itself says that the option is discouraged? Any specific reason 
> > for passing the MAP_FIXED flag?
> > 
> Because theres nothing wrong with doing so.  The man page says its discouraged
> because it creates less portable applications (due to the fact that not all
> operating systems support MAP_FIXED).  Given that we currently only support 
> BSD
> and Linux however, and given that MAP_FIXED was added to POSIX for
> XSI compliant systems, it seems like a reasonable thing to use, as we will 
> most
> likely never run into a system that won't support it
> Neil
> 

Whatever about portability, I thing the best reason to avoid it is given in the
quote from the map page given below: "If the memory region specified by addr and
len overlaps pages of any existing mapping(s), then the overlapped part of the
existing mapping(s) will be discarded." i.e. an mmap with MAP_FIXED can silently
discard an existing mapping. This can lead to some strange behaviour for the
unwary.

/Bruce

> > 
> > http://linux.die.net/man/2/mmap
> > 
> > MAP_FIXED
> > Don't interpret addr as a hint: place the mapping at exactly that address. 
> > addr must be a multiple of the page size. If the memory region specified by 
> > addr and len overlaps pages of any existing mapping(s), then the overlapped 
> > part of the existing mapping(s) will be discarded. If the specified address 
> > cannot be used, mmap() will fail. Because requiring a fixed address for a 
> > mapping is less portable, the use of this option is discouraged.
> > 
> > 
> > Regards
> > Suyash Karmarkar
> >

[dpdk-dev] [RFC PATCH 00/17] Single virtio implementation

2014-12-09 Thread Qiu, Michael

On 2014/12/9 22:19, Ouyang, Changchun wrote:
> Hi Bruce,
>
>> -Original Message-
>> From: Richardson, Bruce
>> Sent: Tuesday, December 9, 2014 5:47 PM
>> To: Ouyang, Changchun
>> Cc: Thomas Monjalon; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio implementation
>>
>> On Tue, Dec 09, 2014 at 06:40:23AM +, Ouyang, Changchun wrote:
>>>
 -Original Message-
 From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
 Sent: Tuesday, December 9, 2014 2:12 PM
 To: Ouyang, Changchun
 Cc: Qiu, Michael; Stephen Hemminger; dev at dpdk.org
 Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
 implementation

 2014-12-09 05:41, Ouyang, Changchun:
> Hi
>
>> -Original Message-
>> From: Qiu, Michael
>> Sent: Tuesday, December 9, 2014 11:23 AM
>> To: Ouyang, Changchun; Thomas Monjalon; Stephen Hemminger
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
>> implementation
>>
>> On 12/9/2014 9:11 AM, Ouyang, Changchun wrote:
>>> Hi Thomas,
>>>
 -Original Message-
 From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
 Sent: Monday, December 8, 2014 5:31 PM
 To: Ouyang, Changchun
 Cc: dev at dpdk.org
 Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
 implementation

 Hi Changchun,

 2014-12-08 14:21, Ouyang Changchun:
> This patch set bases on two original RFC patch sets from
> Stephen
 Hemminger[stephen at networkplumber.org]
> Refer to
> [http://dpdk.org/ml/archives/dev/2014-August/004845.html ]
> for
 the original one.
> This patch set also resolves some conflict with latest codes
> and removed
 duplicated codes.

 As you sent the patches, you appear as the author.
 But I guess Stephen should be the author for some of them.
 Please check who has contributed the most in each patch to
>> decide.
>>> You are right, most of patches originate from Stephen's
>>> patchset, except for the last one, To be honest, I am ok
>>> whoever is the author of this patch set, :-), We could co-own
>>> the feature of Single virtio if you all agree with it, and I
>>> think we couldn't finish Such a feature without collaboration
>>> among us, this is why I tried to communicate
>> with most of you to collect more feedback, suggestion and
>> comments for this feature.
>>> Very appreciate for all kinds of feedback, suggestion here,
>>> especially for
>> patch set from Stephen.
>>> According to your request, how could we make this patch set
>>> looks more
>> like Stephen as the author?
>>> Currently I add Stephen as Signed-off-by list in each patch(I
>>> got the
>> agreement from Stephen before doing this :-)).
>>
>> Hi Ouyang,
>>
>> "Signed-off-by" should be added by himself, because the one who
>> in the Signed-off-by list should take responsibility for it(like
>> potential
 bugs/issues).
>> Although, lots of patches are originate from Stephen, we still
>> need himself add this line :)
> Hi Thomas,
> It that right? I can't add Stephen into Signed-off-by list even if
> I have gotten the agreement from Stephen, What 's the strict rule here?
 Stephen sent the patches with his Signed-off, then you added yours.
 This is OK.
 Using git am, author would have been Stephen. To change author now,
 you can edit each commit with interactive rebase and "git commit
 --amend -- author=Stephen".
 No need to resend now. Please check it for next version of the patchset.

>>> So I understand correctly, Stephen need care for from patches from 1
>>> to 16, I need care for the 17th patch from next version.
>>> What I mean "caring for" above is:  debug and validate them and send
>>> out patches
>>>
>>> Thanks
>>> Changchun
>>>
>> Just to clarify Thomas point here about use of "git am". If you get a patch
>> from someone to test or work on, use "git am" to apply it, rather than "git
>> apply", since "git am" generates a commit in your local repo and thereby
>> maintains the original authorship of the patch. If you do "git apply" and
>> subsequently commit yourself, you - rather than the original author - will
>> appear as the "author" of the patch, and you need to amend the commit as
>> Thomas suggests to fix this.
>>
>> So in short:
>> * git am == good
>> * git apply == bad
> Thanks very much for the clarification. I will use git am for next version.

BTW, you also can use "git am ./xx/*" to patch a series patch set to
your local git tree.

Thanks,
Michael
> Changchun
>
>

[dpdk-dev] [PATCH v3 15/28] eal/pci: Add probe and close function for virtual drivers

2014-12-09 Thread Qiu, Michael

On 2014/12/9 14:33, Tetsuya Mukawa wrote:
> The patch adds rte_eal_dev_init_one() and rte_eal_dev_close_one().
> These are used for attaching and detaching virtual devices.
>
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/common/eal_common_dev.c  | 66 
> +
>  lib/librte_eal/common/include/rte_dev.h |  6 +++
>  lib/librte_eal/linuxapp/eal/Makefile|  1 +
>  3 files changed, 73 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_dev.c 
> b/lib/librte_eal/common/eal_common_dev.c
> index eae5656..f573a54 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -32,10 +32,13 @@
>   *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>   */
>  
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -107,3 +110,66 @@ rte_eal_dev_init(void)
>   }
>   return 0;
>  }
> +
> +/* So far, linux only supports DPDK hotplug function. */

Sorry, I don't know if I get your point, should be "only linux" right?

> +#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
> +
> +#define INVOKE_PROBE (0)
> +#define INVOKE_CLOSE (1)
> +
> +static void
> +rte_eal_dev_invoke(struct rte_driver *driver,
> + struct rte_devargs *devargs, int type)
> +{
> + if ((driver == NULL) || (devargs == NULL))
> + return;
> +
> + switch (type) {
> + case INVOKE_PROBE:
> + driver->init(devargs->virtual.drv_name, devargs->args);
> + break;
> + case INVOKE_CLOSE:
> + driver->close(devargs->virtual.drv_name, devargs->args);
> + break;
> + }
> +}
> +
> +static int
> +rte_eal_dev_find_and_invoke(const char *name, int type)
> +{
> + struct rte_devargs *devargs;
> + struct rte_driver *driver;
> +
> + if (name == NULL)
> + return -EINVAL;
> +
> + /* call the init function for each virtual device */
> + TAILQ_FOREACH(devargs, _list, next) {
> +
> + if (devargs->type != RTE_DEVTYPE_VIRTUAL)
> + continue;
> +
> + if (strncmp(name, devargs->virtual.drv_name, strlen(name)))
> + continue;
> +
> + TAILQ_FOREACH(driver, _driver_list, next) {
> + if (driver->type != PMD_VDEV)
> + continue;
> +
> + /* search a driver prefix in virtual device name */
> + if (!strncmp(driver->name, devargs->virtual.drv_name,
> + strlen(driver->name))) {
> + rte_eal_dev_invoke(driver, devargs, type);
> + break;
> + }
> + }
> +
> + if (driver == NULL) {
> + RTE_LOG(WARNING, EAL, "no driver found for %s\n",
> +   devargs->virtual.drv_name);
> + }
> + return 0;
> + }
> + return 1;
> +}
> +#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
> diff --git a/lib/librte_eal/common/include/rte_dev.h 
> b/lib/librte_eal/common/include/rte_dev.h
> index f7e3a10..71d40c3 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -57,6 +57,11 @@ TAILQ_HEAD(rte_driver_list, rte_driver);
>  typedef int (rte_dev_init_t)(const char *name, const char *args);
>  
>  /**
> + * Close function called for each device driver once.
> + */
> +typedef int (rte_dev_close_t)(const char *name, const char *args);
> +
> +/**
>   * Driver type enumeration
>   */
>  enum pmd_type {
> @@ -72,6 +77,7 @@ struct rte_driver {
>   enum pmd_type type;/**< PMD Driver type */
>   const char *name;   /**< Driver name. */
>   rte_dev_init_t *init;  /**< Device init. function. */
> + rte_dev_close_t *close;/**< Device close. function. */
>  };
>  
>  /**
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
> b/lib/librte_eal/linuxapp/eal/Makefile
> index 72ecf3a..0ec83b5 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -41,6 +41,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
>  CFLAGS += -I$(RTE_SDK)/lib/librte_ring
>  CFLAGS += -I$(RTE_SDK)/lib/librte_mempool
>  CFLAGS += -I$(RTE_SDK)/lib/librte_malloc
> +CFLAGS += -I$(RTE_SDK)/lib/librte_mbuf
>  CFLAGS += -I$(RTE_SDK)/lib/librte_ether
>  CFLAGS += -I$(RTE_SDK)/lib/librte_ivshmem
>  CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_ring

[dpdk-dev] [PATCH v3 14/28] eal/pci: Add rte_eal_devargs_remove

2014-12-09 Thread Qiu, Michael

I don't know if other reviewers ask you to split so many patches. But I
would like merge some of them, because some are doing same affairs(just
using different args), others should be add/remove affairs. Those could
be merge to one patch, although it is much more easier for review.

Actually, it is much harder to review such long thread

But just one suggestion, merge or not depends you.

Thanks,
Michael
On 2014/12/9 14:33, Tetsuya Mukawa wrote:
> The function removes a specified devargs from devargs_list.
>
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/common/eal_common_devargs.c  | 16 
>  lib/librte_eal/common/include/rte_devargs.h | 18 ++
>  2 files changed, 34 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_devargs.c 
> b/lib/librte_eal/common/eal_common_devargs.c
> index f95a12d..5fd2a2c 100644
> --- a/lib/librte_eal/common/eal_common_devargs.c
> +++ b/lib/librte_eal/common/eal_common_devargs.c
> @@ -140,6 +140,22 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
> *devargs_str)
>   return 0;
>  }
>  
> +/* remove it from the devargs_list */
> +void
> +rte_eal_devargs_remove(enum rte_devtype devtype, void *args)
> +{
> + struct rte_devargs *devargs;
> +
> + if (args == NULL)
> + return;
> +
> + devargs = rte_eal_devargs_find(devtype, args);
> + if (devargs == NULL)
> + return;
> +
> + TAILQ_REMOVE(_list, devargs, next);
> +}
> +
>  /* count the number of devices of a specified type */
>  unsigned int
>  rte_eal_devargs_type_count(enum rte_devtype devtype)
> diff --git a/lib/librte_eal/common/include/rte_devargs.h 
> b/lib/librte_eal/common/include/rte_devargs.h
> index 9f9c98f..1066efd 100644
> --- a/lib/librte_eal/common/include/rte_devargs.h
> +++ b/lib/librte_eal/common/include/rte_devargs.h
> @@ -123,6 +123,24 @@ extern struct rte_devargs_list devargs_list;
>  int rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str);
>  
>  /**
> + * Remove a device from the user device list
> + *
> + * For PCI devices, the format of arguments string is "PCI_ADDR". It 
> shouldn't
> + * involves parameters for the device. Example: "08:00.1".
> + *
> + * For virtual devices, the format of arguments string is "DRIVER_NAME*". It
> + * shouldn't involves parameters for the device. Example: "eth_ring". The
> + * validity of the driver name is not checked by this function, it is done
> + * when closing the drivers.
> + *
> + * @param devtype
> + *   The type of the device.
> + * @param name
> + *   The name of the device.
> + */
> +void rte_eal_devargs_remove(enum rte_devtype devtype, void *args);
> +
> +/**
>   * Count the number of user devices of a specified type
>   *
>   * @param devtype

[dpdk-dev] [PATCH v3] testpmd: Add port hotplug support

2014-12-09 Thread Tetsuya Mukawa

The patch introduces following commands.
- port attach [ident]
- port detach [port_id]
 - attach: attaching a port
 - detach: detaching a port
 - ident: pci address of physical device.
  Or device name and paramerters of virtual device.
 (ex. :02:00.0, eth_pcap0,iface=eth0)
 - port_id: port identifier

Signed-off-by: Tetsuya Mukawa 
---
 app/test-pmd/cmdline.c| 133 +--
 app/test-pmd/config.c | 116 +++
 app/test-pmd/parameters.c |  21 +++--
 app/test-pmd/testpmd.c| 199 +++---
 app/test-pmd/testpmd.h|  18 -
 5 files changed, 357 insertions(+), 130 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f79ea3e..40c2db7 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -572,6 +572,12 @@ static void cmd_help_long_parsed(void *parsed_result,
"port close (port_id|all)\n"
"Close all ports or port_id.\n\n"

+   "port add (p|a) (ident)\n"
+   "Add physical or virtual dev by pci address or 
virtual device name\n\n"
+
+   "port del (p|a) (port_id)\n"
+   "Del physical or virtual dev by port_id\n\n"
+
"port config (port_id|all)"
" speed (10|100|1000|1|4|auto)"
" duplex (half|full|auto)\n"
@@ -853,6 +859,89 @@ cmdline_parse_inst_t cmd_operate_specific_port = {
},
 };

+/* *** attach a specificied port *** */
+struct cmd_operate_attach_port_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   cmdline_fixed_string_t identifier;
+};
+
+static void cmd_operate_attach_port_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_operate_attach_port_result *res = parsed_result;
+
+   if (!strcmp(res->keyword, "attach"))
+   attach_port(res->identifier);
+   else
+   printf("Unknown parameter\n");
+}
+
+cmdline_parse_token_string_t cmd_operate_attach_port_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   port, "port");
+cmdline_parse_token_string_t cmd_operate_attach_port_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   keyword, "attach");
+cmdline_parse_token_string_t cmd_operate_attach_port_identifier =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   identifier, NULL);
+
+cmdline_parse_inst_t cmd_operate_attach_port = {
+   .f = cmd_operate_attach_port_parsed,
+   .data = NULL,
+   .help_str = "port attach identifier, "
+   "identifier: pci address or virtual dev name",
+   .tokens = {
+   (void *)_operate_attach_port_port,
+   (void *)_operate_attach_port_keyword,
+   (void *)_operate_attach_port_identifier,
+   NULL,
+   },
+};
+
+/* *** detach a specificied port *** */
+struct cmd_operate_detach_port_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   uint8_t port_id;
+};
+
+static void cmd_operate_detach_port_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_operate_detach_port_result *res = parsed_result;
+
+   if (!strcmp(res->keyword, "detach"))
+   detach_port(res->port_id);
+   else
+   printf("Unknown parameter\n");
+}
+
+cmdline_parse_token_string_t cmd_operate_detach_port_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result,
+   port, "port");
+cmdline_parse_token_string_t cmd_operate_detach_port_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result,
+   keyword, "detach");
+cmdline_parse_token_num_t cmd_operate_detach_port_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_operate_detach_port_result,
+   port_id, UINT8);
+
+cmdline_parse_inst_t cmd_operate_detach_port = {
+   .f = cmd_operate_detach_port_parsed,
+   .data = NULL,
+   .help_str = "port detach port_id",
+   .tokens = {
+   (void *)_operate_detach_port_port,
+   (void *)_operate_detach_port_keyword,
+   (void *)_operate_detach_port_port_id,
+   NULL,
+   },
+};
+
 /* *** configure speed for all ports *** */
 struct cmd_config_speed_all {
cmdline_fixed_string_t port;
@@ -907,7 +996,7 @@ cmd_config_speed_all_parsed(void *parsed_result,
return;
}

-   for (pid = 0; pid < nb_ports; pid++) {

[dpdk-dev] [PATCH v3 28/28] eal: Enable port hotplug framework in Linux

2014-12-09 Thread Tetsuya Mukawa

The patch enables CONFIG_RTE_LIBRTE_EAL_HOTPLUG in Linux configuration.

Signed-off-by: Tetsuya Mukawa 
---
 config/common_linuxapp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2f9643b..27d05be 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -114,6 +114,11 @@ CONFIG_RTE_PCI_MAX_READ_REQUEST_SIZE=0
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y

 #
+# Compile Environment Abstraction Layer to support hotplug
+#
+CONFIG_RTE_LIBRTE_EAL_HOTPLUG=y
+
+#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
-- 
1.9.1

[dpdk-dev] [PATCH v3 27/28] eal/pci: Remove rte_eal_dev_attach/detach_pdev() and rte_eal_dev_attach/detach_vdev()

2014-12-09 Thread Tetsuya Mukawa

These functions are wrapped by rte_eal_dev_attach/detach(). So delete
these.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_dev.c  |  8 +++---
 lib/librte_eal/common/include/rte_dev.h | 50 -
 2 files changed, 4 insertions(+), 54 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index c9d4894..7481fb7 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -174,7 +174,7 @@ rte_eal_dev_find_and_invoke(const char *name, int type)
 }

 /* attach the new physical device, then store port_id of the device */
-int
+static int
 rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id)
 {
uint8_t new_port_id;
@@ -203,7 +203,7 @@ err:
 }

 /* detach the new physical device, then store pci_addr of the device */
-int
+static int
 rte_eal_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr)
 {
struct rte_pci_addr freed_addr;
@@ -252,7 +252,7 @@ get_vdev_name(char *vdevargs)
 }

 /* attach the new virtual device, then store port_id of the device */
-int
+static int
 rte_eal_dev_attach_vdev(const char *vdevargs, uint8_t *port_id)
 {
char *args;
@@ -294,7 +294,7 @@ err0:
 }

 /* detach the new virtual device, then store the name of the device */
-int
+static int
 rte_eal_dev_detach_vdev(uint8_t port_id, char *vdevname)
 {
char name[RTE_ETH_NAME_MAX_LEN];
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index 1f8f24a..9da518d 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -102,56 +102,6 @@ void rte_eal_driver_unregister(struct rte_driver *driver);
 #if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)

 /**
- * Attach a new physical device.
- *
- * @param addr
- *   A pointer to a pci address structure describing the new
- *   device to be attached.
- * @param port_id
- *  A pointer to a port identifier actually attached.
- * @return
- *  0 on success and port_id is filled, negative on error
- */
-int rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id);
-
-/**
- * Attach a new virtual device.
- *
- * @param vdevargs
- *   A pointer to a strings array describing the new device
- *   to be attached.
- * @param port_id
- *  A pointer to a port identifier actually attached.
- * @return
- *  0 on success and port_id is filled, negative on error
- */
-int rte_eal_dev_attach_vdev(const char *vdevargs, uint8_t *port_id);
-
-/**
- * Detach a physical device.
- *
- * @param port_id
- *   The port identifier of the physical device to detach.
- * @param addr
- *  A pointer to a pci address structure actually detached.
- * @return
- *  0 on success and addr is filled, negative on error
- */
-int rte_eal_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr);
-
-/**
- * Detach a virtual device.
- *
- * @param port_id
- *   The port identifier of the virtual device to detach.
- * @param addr
- *  A pointer to a virtual device name actually detached.
- * @return
- *  0 on success and vdevname is filled, negative on error
- */
-int rte_eal_dev_detach_vdev(uint8_t port_id, char *vdevname);
-
-/**
  * Attach a new device.
  *
  * @param devargs
-- 
1.9.1

[dpdk-dev] [PATCH v3 25/28] eal/pci: Remove pci_probe/close_all_drivers()

2014-12-09 Thread Tetsuya Mukawa

These functions are actually wrappers of pci_invoke_all_drivers().
Just call it directly.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_pci.c | 30 --
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 5ff7b49..5044d8e 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -130,29 +130,7 @@ pci_invoke_all_drivers(struct rte_pci_device *dev, int 
type)
return 1;
 }

-/*
- * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
-static int
-pci_probe_all_drivers(struct rte_pci_device *dev)
-{
-   return pci_invoke_all_drivers(dev, INVOKE_PROBE);
-}
-
 #if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
-/*
- * If vendor/device ID match, call the devclose() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
-static int
-pci_close_all_drivers(struct rte_pci_device *dev)
-{
-   return pci_invoke_all_drivers(dev, INVOKE_CLOSE);
-}
-
 static int
 rte_eal_pci_invoke_one(struct rte_pci_addr *addr, int type)
 {
@@ -165,10 +143,10 @@ rte_eal_pci_invoke_one(struct rte_pci_addr *addr, int 
type)

switch (type) {
case INVOKE_PROBE:
-   ret = pci_probe_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev, INVOKE_PROBE);
break;
case INVOKE_CLOSE:
-   ret = pci_close_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev, INVOKE_CLOSE);
break;
}
if (ret < 0)
@@ -237,10 +215,10 @@ rte_eal_pci_probe(void)

/* probe all or only whitelisted devices */
if (probe_all)
-   ret = pci_probe_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev, INVOKE_PROBE);
else if (devargs != NULL &&
devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
-   ret = pci_probe_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev, INVOKE_PROBE);
if (ret < 0)
rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 " cannot be used\n", dev->addr.domain, 
dev->addr.bus,
-- 
1.9.1

[dpdk-dev] [PATCH v3 24/28] eal/pci: Add port hotplug functions for physical devices.

2014-12-09 Thread Tetsuya Mukawa

The patch adds rte_eal_dev_attach_pdev() and rte_eal_dev_detach_pdev().

rte_eal_dev_attach_pdev() receives a PCI address of the device and
returns an attached port number.
rte_eal_dev_detach_pdev() receives a port number, and returns a PCI
address actually detached.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_dev.c  | 64 +
 lib/librte_eal/common/include/rte_dev.h | 26 ++
 2 files changed, 90 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index 4ea3502..9ff03ed 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -173,6 +173,70 @@ rte_eal_dev_find_and_invoke(const char *name, int type)
return 1;
 }

+/* attach the new physical device, then store port_id of the device */
+int
+rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id)
+{
+   uint8_t new_port_id;
+   struct rte_eth_dev devs[RTE_MAX_ETHPORTS];
+
+   if ((addr == NULL) || (port_id == NULL))
+   goto err;
+
+   /* save current port status */
+   rte_eth_dev_save(devs);
+   /* re-construct pci_device_list */
+   if (rte_eal_pci_scan())
+   goto err;
+   /* invoke probe func of the driver can handle the new device */
+   if (rte_eal_pci_probe_one(addr))
+   goto err;
+   /* get port_id enabled by above procedures */
+   if (rte_eth_dev_get_changed_port(devs, _port_id))
+   goto err;
+
+   *port_id = new_port_id;
+   return 0;
+err:
+   RTE_LOG(ERR, EAL, "Drver, cannot attach the device\n");
+   return -1;
+}
+
+/* detach the new physical device, then store pci_addr of the device */
+int
+rte_eal_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr)
+{
+   struct rte_pci_addr freed_addr;
+   struct rte_pci_addr vp;
+
+   if (addr == NULL)
+   goto err;
+
+   /* check whether the driver supports detach feature, or not */
+   if (rte_eth_dev_check_detachable(port_id))
+   goto err;
+
+   /* get pci address by port id */
+   if (rte_eth_dev_get_addr_by_port(port_id, _addr))
+   goto err;
+
+   /* Zerod pci addr means the port comes from virtual device */
+   vp.domain = vp.bus = vp.devid = vp.function = 0;
+   if (eal_compare_pci_addr(, _addr) == 0)
+   goto err;
+
+   /* invoke close func of the driver,
+* also remove the device from pci_device_list */
+   if (rte_eal_pci_close_one(_addr))
+   goto err;
+
+   *addr = freed_addr;
+   return 0;
+err:
+   RTE_LOG(ERR, EAL, "Drver, cannot detach the device\n");
+   return -1;
+}
+
 static void
 get_vdev_name(char *vdevargs)
 {
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index 159d5a5..f0677cb 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -47,6 +47,7 @@ extern "C" {
 #endif

 #include 
+#include 

 /** Double linked list of device drivers. */
 TAILQ_HEAD(rte_driver_list, rte_driver);
@@ -101,6 +102,19 @@ void rte_eal_driver_unregister(struct rte_driver *driver);
 #if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)

 /**
+ * Attach a new physical device.
+ *
+ * @param addr
+ *   A pointer to a pci address structure describing the new
+ *   device to be attached.
+ * @param port_id
+ *  A pointer to a port identifier actually attached.
+ * @return
+ *  0 on success and port_id is filled, negative on error
+ */
+int rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id);
+
+/**
  * Attach a new virtual device.
  *
  * @param vdevargs
@@ -114,6 +128,18 @@ void rte_eal_driver_unregister(struct rte_driver *driver);
 int rte_eal_dev_attach_vdev(const char *vdevargs, uint8_t *port_id);

 /**
+ * Detach a physical device.
+ *
+ * @param port_id
+ *   The port identifier of the physical device to detach.
+ * @param addr
+ *  A pointer to a pci address structure actually detached.
+ * @return
+ *  0 on success and addr is filled, negative on error
+ */
+int rte_eal_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr);
+
+/**
  * Detach a virtual device.
  *
  * @param port_id
-- 
1.9.1

[dpdk-dev] [PATCH v3 22/28] eal/pci: Add pci_close_all_drivers

2014-12-09 Thread Tetsuya Mukawa

The function tries to find a driver for the specified device, and then
close the driver.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_pci.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 1e3efea..b404ee0 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -100,6 +100,7 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)
 }

 #define INVOKE_PROBE   (0)
+#define INVOKE_CLOSE   (1)

 static int
 pci_invoke_all_drivers(struct rte_pci_device *dev, int type)
@@ -112,6 +113,11 @@ pci_invoke_all_drivers(struct rte_pci_device *dev, int 
type)
case INVOKE_PROBE:
rc = rte_eal_pci_probe_one_driver(dr, dev);
break;
+#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
+   case INVOKE_CLOSE:
+   rc = rte_eal_pci_close_one_driver(dr, dev);
+   break;
+#endif
}
if (rc < 0)
/* negative value is an error */
@@ -135,6 +141,19 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
return pci_invoke_all_drivers(dev, INVOKE_PROBE);
 }

+#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
+/*
+ * If vendor/device ID match, call the devclose() function of all
+ * registered driver for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
+ */
+static int
+pci_close_all_drivers(struct rte_pci_device *dev)
+{
+   return pci_invoke_all_drivers(dev, INVOKE_CLOSE);
+}
+#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
+
 /*
  * Scan the content of the PCI bus, and call the devinit() function for
  * all registered drivers that have a matching entry in its id_table
-- 
1.9.1

[dpdk-dev] [PATCH v3 21/28] eal/pci: Fix pci_probe_all_drivers to share code with closing function

2014-12-09 Thread Tetsuya Mukawa

pci_close_all_drivers() will be implemented after the patch.
To share a part of code between thses 2 functions, The patch fixes
pci_probe_all_drivers() first.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_pci.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index f01f258..1e3efea 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -99,19 +99,20 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)
return NULL;
 }

-/*
- * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
+#define INVOKE_PROBE   (0)
+
 static int
-pci_probe_all_drivers(struct rte_pci_device *dev)
+pci_invoke_all_drivers(struct rte_pci_device *dev, int type)
 {
struct rte_pci_driver *dr = NULL;
-   int rc;
+   int rc = 0;

TAILQ_FOREACH(dr, _driver_list, next) {
-   rc = rte_eal_pci_probe_one_driver(dr, dev);
+   switch (type) {
+   case INVOKE_PROBE:
+   rc = rte_eal_pci_probe_one_driver(dr, dev);
+   break;
+   }
if (rc < 0)
/* negative value is an error */
return -1;
@@ -124,6 +125,17 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 }

 /*
+ * If vendor/device ID match, call the devinit() function of all
+ * registered driver for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
+ */
+static int
+pci_probe_all_drivers(struct rte_pci_device *dev)
+{
+   return pci_invoke_all_drivers(dev, INVOKE_PROBE);
+}
+
+/*
  * Scan the content of the PCI bus, and call the devinit() function for
  * all registered drivers that have a matching entry in its id_table
  * for discovered devices.
-- 
1.9.1

[dpdk-dev] [PATCH v3 19/28] eal/pci: Change scope of rte_eal_pci_scan to global

2014-12-09 Thread Tetsuya Mukawa

The function is called by port hotplug framework, so change scope of the
function to global.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_private.h   | 11 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c |  6 +++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 232fcec..a1127ab 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -154,6 +154,17 @@ struct rte_pci_driver;
 struct rte_pci_device;

 /**
+ * Scan the content of the PCI bus, and the devices in the devices
+ * list
+ *
+ * This function is private to EAL.
+ *
+ * @return
+ *  0 on success, negative on error
+ */
+int rte_eal_pci_scan(void);
+
+/**
  * Mmap memory for single PCI device
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 355c858..8d683f5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -375,8 +375,8 @@ error:
  * Scan the content of the PCI bus, and the devices in the devices
  * list
  */
-static int
-pci_scan(void)
+int
+rte_eal_pci_scan(void)
 {
struct dirent *e;
DIR *dir;
@@ -629,7 +629,7 @@ rte_eal_pci_init(void)
if (internal_config.no_pci)
return 0;

-   if (pci_scan() < 0) {
+   if (rte_eal_pci_scan() < 0) {
RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
return -1;
}
-- 
1.9.1

[dpdk-dev] [PATCH v3 15/28] eal/pci: Add probe and close function for virtual drivers

2014-12-09 Thread Tetsuya Mukawa

The patch adds rte_eal_dev_init_one() and rte_eal_dev_close_one().
These are used for attaching and detaching virtual devices.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_dev.c  | 66 +
 lib/librte_eal/common/include/rte_dev.h |  6 +++
 lib/librte_eal/linuxapp/eal/Makefile|  1 +
 3 files changed, 73 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index eae5656..f573a54 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -32,10 +32,13 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

+#include 
+#include 
 #include 
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 
@@ -107,3 +110,66 @@ rte_eal_dev_init(void)
}
return 0;
 }
+
+/* So far, linux only supports DPDK hotplug function. */
+#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
+
+#define INVOKE_PROBE   (0)
+#define INVOKE_CLOSE   (1)
+
+static void
+rte_eal_dev_invoke(struct rte_driver *driver,
+   struct rte_devargs *devargs, int type)
+{
+   if ((driver == NULL) || (devargs == NULL))
+   return;
+
+   switch (type) {
+   case INVOKE_PROBE:
+   driver->init(devargs->virtual.drv_name, devargs->args);
+   break;
+   case INVOKE_CLOSE:
+   driver->close(devargs->virtual.drv_name, devargs->args);
+   break;
+   }
+}
+
+static int
+rte_eal_dev_find_and_invoke(const char *name, int type)
+{
+   struct rte_devargs *devargs;
+   struct rte_driver *driver;
+
+   if (name == NULL)
+   return -EINVAL;
+
+   /* call the init function for each virtual device */
+   TAILQ_FOREACH(devargs, _list, next) {
+
+   if (devargs->type != RTE_DEVTYPE_VIRTUAL)
+   continue;
+
+   if (strncmp(name, devargs->virtual.drv_name, strlen(name)))
+   continue;
+
+   TAILQ_FOREACH(driver, _driver_list, next) {
+   if (driver->type != PMD_VDEV)
+   continue;
+
+   /* search a driver prefix in virtual device name */
+   if (!strncmp(driver->name, devargs->virtual.drv_name,
+   strlen(driver->name))) {
+   rte_eal_dev_invoke(driver, devargs, type);
+   break;
+   }
+   }
+
+   if (driver == NULL) {
+   RTE_LOG(WARNING, EAL, "no driver found for %s\n",
+ devargs->virtual.drv_name);
+   }
+   return 0;
+   }
+   return 1;
+}
+#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index f7e3a10..71d40c3 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -57,6 +57,11 @@ TAILQ_HEAD(rte_driver_list, rte_driver);
 typedef int (rte_dev_init_t)(const char *name, const char *args);

 /**
+ * Close function called for each device driver once.
+ */
+typedef int (rte_dev_close_t)(const char *name, const char *args);
+
+/**
  * Driver type enumeration
  */
 enum pmd_type {
@@ -72,6 +77,7 @@ struct rte_driver {
enum pmd_type type;/**< PMD Driver type */
const char *name;   /**< Driver name. */
rte_dev_init_t *init;  /**< Device init. function. */
+   rte_dev_close_t *close;/**< Device close. function. */
 };

 /**
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 72ecf3a..0ec83b5 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
 CFLAGS += -I$(RTE_SDK)/lib/librte_ring
 CFLAGS += -I$(RTE_SDK)/lib/librte_mempool
 CFLAGS += -I$(RTE_SDK)/lib/librte_malloc
+CFLAGS += -I$(RTE_SDK)/lib/librte_mbuf
 CFLAGS += -I$(RTE_SDK)/lib/librte_ether
 CFLAGS += -I$(RTE_SDK)/lib/librte_ivshmem
 CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_ring
-- 
1.9.1

[dpdk-dev] [PATCH v3 14/28] eal/pci: Add rte_eal_devargs_remove

2014-12-09 Thread Tetsuya Mukawa

The function removes a specified devargs from devargs_list.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_devargs.c  | 16 
 lib/librte_eal/common/include/rte_devargs.h | 18 ++
 2 files changed, 34 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_devargs.c 
b/lib/librte_eal/common/eal_common_devargs.c
index f95a12d..5fd2a2c 100644
--- a/lib/librte_eal/common/eal_common_devargs.c
+++ b/lib/librte_eal/common/eal_common_devargs.c
@@ -140,6 +140,22 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
*devargs_str)
return 0;
 }

+/* remove it from the devargs_list */
+void
+rte_eal_devargs_remove(enum rte_devtype devtype, void *args)
+{
+   struct rte_devargs *devargs;
+
+   if (args == NULL)
+   return;
+
+   devargs = rte_eal_devargs_find(devtype, args);
+   if (devargs == NULL)
+   return;
+
+   TAILQ_REMOVE(_list, devargs, next);
+}
+
 /* count the number of devices of a specified type */
 unsigned int
 rte_eal_devargs_type_count(enum rte_devtype devtype)
diff --git a/lib/librte_eal/common/include/rte_devargs.h 
b/lib/librte_eal/common/include/rte_devargs.h
index 9f9c98f..1066efd 100644
--- a/lib/librte_eal/common/include/rte_devargs.h
+++ b/lib/librte_eal/common/include/rte_devargs.h
@@ -123,6 +123,24 @@ extern struct rte_devargs_list devargs_list;
 int rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str);

 /**
+ * Remove a device from the user device list
+ *
+ * For PCI devices, the format of arguments string is "PCI_ADDR". It shouldn't
+ * involves parameters for the device. Example: "08:00.1".
+ *
+ * For virtual devices, the format of arguments string is "DRIVER_NAME*". It
+ * shouldn't involves parameters for the device. Example: "eth_ring". The
+ * validity of the driver name is not checked by this function, it is done
+ * when closing the drivers.
+ *
+ * @param devtype
+ *   The type of the device.
+ * @param name
+ *   The name of the device.
+ */
+void rte_eal_devargs_remove(enum rte_devtype devtype, void *args);
+
+/**
  * Count the number of user devices of a specified type
  *
  * @param devtype
-- 
1.9.1

[dpdk-dev] [PATCH v3 13/28] eal/pci: Prevent double registration for devargs_list

2014-12-09 Thread Tetsuya Mukawa

The patch fixes rte_eal_devargs_add() not to register same device twice.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_devargs.c | 35 ++
 1 file changed, 35 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_devargs.c 
b/lib/librte_eal/common/eal_common_devargs.c
index 4c7d11a..f95a12d 100644
--- a/lib/librte_eal/common/eal_common_devargs.c
+++ b/lib/librte_eal/common/eal_common_devargs.c
@@ -44,6 +44,35 @@
 struct rte_devargs_list devargs_list =
TAILQ_HEAD_INITIALIZER(devargs_list);

+
+/* find a entry specified by pci address or device name */
+static struct rte_devargs *
+rte_eal_devargs_find(enum rte_devtype devtype, void *args)
+{
+   struct rte_devargs *devargs;
+
+   if (args == NULL)
+   return NULL;
+
+   TAILQ_FOREACH(devargs, _list, next) {
+   switch (devtype) {
+   case RTE_DEVTYPE_WHITELISTED_PCI:
+   case RTE_DEVTYPE_BLACKLISTED_PCI:
+   if (eal_compare_pci_addr(>pci.addr, args) == 0)
+   goto found;
+   break;
+   case RTE_DEVTYPE_VIRTUAL:
+   if (memcmp(>virtual.drv_name, args,
+   strlen((char *)args)) == 0)
+   goto found;
+   break;
+   }
+   }
+   return NULL;
+found:
+   return devargs;
+}
+
 /* store a whitelist parameter for later parsing */
 int
 rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
@@ -101,6 +130,12 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
*devargs_str)
break;
}

+   /* make sure there is no same entry */
+   if (rte_eal_devargs_find(devtype, >pci.addr)) {
+   RTE_LOG(ERR, EAL, "device already registered: <%s>\n", buf);
+   return -1;
+   }
+
TAILQ_INSERT_TAIL(_list, devargs, next);
return 0;
 }
-- 
1.9.1

[dpdk-dev] [PATCH v3 12/28] ethdev: Change scope of rte_eth_dev_allocated to global

2014-12-09 Thread Tetsuya Mukawa

This function is used by virtual PMDs to support port hotplug framework.
So change scope of the function to global.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c |  2 +-
 lib/librte_ether/rte_ethdev.h | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed53e66..86200e0 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -206,7 +206,7 @@ rte_eth_dev_data_alloc(void)
RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
 }

-static struct rte_eth_dev *
+struct rte_eth_dev *
 rte_eth_dev_allocated(const char *name)
 {
unsigned i;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 404c41f..47622a2 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1712,6 +1712,16 @@ extern int rte_eth_dev_get_name_by_port(uint8_t port_id, 
char *name);
 extern int rte_eth_dev_check_detachable(uint8_t port_id);

 /**
+ * Function for internal use by port hotplug functions.
+ * Returns a ethdev slot specified by the unique identifier name.
+ * @param  name
+ *  The pointer to the Unique identifier name for each Ethernet device
+ * @return
+ *   - The pointer to the ethdev slot, on success. NULL on error
+ */
+extern struct rte_eth_dev *rte_eth_dev_allocated(const char *name);
+
+/**
  * Function for internal use by dummy drivers primarily, e.g. ring-based
  * driver.
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
-- 
1.9.1

[dpdk-dev] [PATCH v3 11/28] ethdev: Add rte_eth_dev_check_detachable

2014-12-09 Thread Tetsuya Mukawa

The function returns whether a PMD supports detach function, or not.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c |  9 +
 lib/librte_ether/rte_ethdev.h | 11 +++
 2 files changed, 20 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 1d82f69..ed53e66 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -498,6 +498,15 @@ rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
return 0;
 }

+int
+rte_eth_dev_check_detachable(uint8_t port_id)
+{
+   uint32_t drv_flags;
+
+   drv_flags = rte_eth_devices[port_id].driver->pci_drv.drv_flags;
+   return !(drv_flags & RTE_PCI_DRV_DETACHABLE);
+}
+
 static int
 rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bd921d0..404c41f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1701,6 +1701,17 @@ extern int rte_eth_dev_get_port_by_addr(
 extern int rte_eth_dev_get_name_by_port(uint8_t port_id, char *name);

 /**
+ * Function for internal use by port hotplug functions.
+ * Check whether or not, a PMD that is handling the ethdev specified by port
+ * identifier can support detach function.
+ * @param  port_id
+ *   The port identifier
+ * @return
+ *   - 0 on supporting detach function, negative on not supporting
+ */
+extern int rte_eth_dev_check_detachable(uint8_t port_id);
+
+/**
  * Function for internal use by dummy drivers primarily, e.g. ring-based
  * driver.
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
-- 
1.9.1

[dpdk-dev] [PATCH v3 08/28] ethdev: Add rte_eth_dev_get_addr_by_port

2014-12-09 Thread Tetsuya Mukawa

The function returns a pci address of a ethdev specified by port
identifier.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c | 17 +
 lib/librte_ether/rte_ethdev.h | 13 +
 2 files changed, 30 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 6a3700e..ddaf14a 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -437,6 +437,23 @@ rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, 
uint8_t *port_id)
return 1;
 }

+int
+rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr)
+{
+   if (rte_eth_dev_validate_port(port_id) == DEV_INVALID) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+
+   if (addr == NULL) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -EINVAL;
+   }
+
+   *addr = rte_eth_devices[port_id].pci_dev->addr;
+   return 0;
+}
+
 static int
 rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 03c8850..30277a2 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1663,6 +1663,19 @@ extern int rte_eth_dev_get_changed_port(
struct rte_eth_dev *devs, uint8_t *port_id);

 /**
+ * Function for internal use by port hotplug functions.
+ * Returns a pci address of a ethdev specified by port identifier.
+ * @param  port_id
+ *   The port identifier of the Ethernet device
+ * @param  addr
+ *   The pointer to the pci address
+ * @return
+ *   - 0 on success, negative on error
+ */
+extern int rte_eth_dev_get_addr_by_port(
+   uint8_t port_id, struct rte_pci_addr *addr);
+
+/**
  * Function for internal use by dummy drivers primarily, e.g. ring-based
  * driver.
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
-- 
1.9.1

[dpdk-dev] [PATCH v3 07/28] ethdev: Add functions to know which port is attached or detached

2014-12-09 Thread Tetsuya Mukawa

The patch adds rte_eth_dev_save() and rte_eth_dev_get_changed_port().
rte_eth_dev_save() is used for saving current rte_eth_dev structures.
rte_eth_dev_get_changed_port() receives the rte_eth_dev structures, then
compare these with current values to know which port is actually
attached or detached.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c | 21 +
 lib/librte_ether/rte_ethdev.h | 21 +
 2 files changed, 42 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 51697e1..6a3700e 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -416,6 +416,27 @@ rte_eth_dev_count(void)
return (nb_ports);
 }

+void
+rte_eth_dev_save(struct rte_eth_dev *devs)
+{
+   if (devs == NULL)
+   return;
+
+   /* save current rte_eth_devices */
+   memcpy(devs, rte_eth_devices,
+   sizeof(struct rte_eth_dev) * RTE_MAX_ETHPORTS);
+}
+
+int
+rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, uint8_t *port_id)
+{
+   /* check which port was attached or detached */
+   for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++, devs++)
+   if (rte_eth_devices[*port_id].attached ^ devs->attached)
+   return 0;
+   return 1;
+}
+
 static int
 rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index b329e11..03c8850 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1642,6 +1642,27 @@ extern struct rte_eth_dev rte_eth_devices[];
 extern uint8_t rte_eth_dev_count(void);

 /**
+ * Function for internal use by port hotplug functions.
+ * Copies current ethdev structures to the specified pointer.
+ *
+ * @param  devsThe pointer to the ethdev structures
+ */
+extern void rte_eth_dev_save(struct rte_eth_dev *devs);
+
+/**
+ * Function for internal use by port hotplug functions.
+ * Compare the specified ethdev structures with currents. Then
+ * if there is a port which status is changed, fill the specified pointer
+ * with the port id of that port.
+ * @param  devsThe pointer to the ethdev structures
+ * @param  port_id The pointer to the port id
+ * @return
+ *   - 0 on success, negative on error
+ */
+extern int rte_eth_dev_get_changed_port(
+   struct rte_eth_dev *devs, uint8_t *port_id);
+
+/**
  * Function for internal use by dummy drivers primarily, e.g. ring-based
  * driver.
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
-- 
1.9.1

[dpdk-dev] [PATCH v3 06/28] ethdev: Add rte_eth_dev_shutdown for closing PCI devices.

2014-12-09 Thread Tetsuya Mukawa

rte_eth_dev_shutdown() is called when PCI device is closed.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index d5fdb03..51697e1 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -332,6 +332,42 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
return diag;
 }

+static int
+rte_eth_dev_shutdown(struct rte_pci_driver *pci_drv,
+struct rte_pci_device *pci_dev)
+{
+   struct eth_driver *eth_drv;
+   struct rte_eth_dev *eth_dev;
+   char ethdev_name[RTE_ETH_NAME_MAX_LEN];
+
+   /* Create unique Ethernet device name using PCI address */
+   snprintf(ethdev_name, RTE_ETH_NAME_MAX_LEN, "%d:%d.%d",
+   pci_dev->addr.bus, pci_dev->addr.devid,
+   pci_dev->addr.function);
+
+   eth_dev = rte_eth_dev_free(ethdev_name);
+   if (eth_dev == NULL)
+   return -ENODEV;
+
+   eth_drv = (struct eth_driver *)pci_drv;
+
+   /* Invoke PMD device shutdown function */
+   if (*eth_drv->eth_dev_shutdown)
+   (*eth_drv->eth_dev_shutdown)(eth_drv, eth_dev);
+
+   /* init user callbacks */
+   TAILQ_INIT(&(eth_dev->callbacks));
+
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   rte_free(eth_dev->data->dev_private);
+
+   eth_dev->pci_dev = NULL;
+   eth_dev->driver = NULL;
+   eth_dev->data = NULL;
+
+   return 0;
+}
+
 /**
  * Register an Ethernet [Poll Mode] driver.
  *
@@ -350,6 +386,7 @@ void
 rte_eth_driver_register(struct eth_driver *eth_drv)
 {
eth_drv->pci_drv.devinit = rte_eth_dev_init;
+   eth_drv->pci_drv.devshutdown = rte_eth_dev_shutdown;
rte_eal_pci_register(_drv->pci_drv);
 }

-- 
1.9.1

[dpdk-dev] [PATCH v3 05/28] eal, ethdev: Add function pointer for closing a device

2014-12-09 Thread Tetsuya Mukawa

The patch adds function pointer to rte_pci_driver and eth_driver
structure. These function pointers are used when ports are detached.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |  7 +++
 lib/librte_ether/rte_ethdev.h   | 24 
 2 files changed, 31 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index fe374a8..74720d1 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -181,12 +181,19 @@ struct rte_pci_driver;
 typedef int (pci_devinit_t)(struct rte_pci_driver *, struct rte_pci_device *);

 /**
+ * Shutdown function for the driver called during hotplugging.
+ */
+typedef int (pci_devshutdown_t)(
+   struct rte_pci_driver *, struct rte_pci_device *);
+
+/**
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
TAILQ_ENTRY(rte_pci_driver) next;   /**< Next in list. */
const char *name;   /**< Driver name. */
pci_devinit_t *devinit; /**< Device init. function. */
+   pci_devshutdown_t *devshutdown; /**< Device shutdown function. 
*/
struct rte_pci_id *id_table;/**< ID table, NULL terminated. 
*/
uint32_t drv_flags; /**< Flags contolling handling 
of device. */
 };
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index fb1caea..b329e11 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1702,6 +1702,27 @@ typedef int (*eth_dev_init_t)(struct eth_driver  
*eth_drv,

 /**
  * @internal
+ * Finalization function of an Ethernet driver invoked for each matching
+ * Ethernet PCI device detected during the PCI closing phase.
+ *
+ * @param eth_drv
+ *   The pointer to the [matching] Ethernet driver structure supplied by
+ *   the PMD when it registered itself.
+ * @param eth_dev
+ *   The *eth_dev* pointer is the address of the *rte_eth_dev* structure
+ *   associated with the matching device and which have been [automatically]
+ *   allocated in the *rte_eth_devices* array.
+ * @return
+ *   - 0: Success, the device is properly finalized by the driver.
+ *In particular, the driver MUST free the *dev_ops* pointer
+ *of the *eth_dev* structure.
+ *   - <0: Error code of the device initialization failure.
+ */
+typedef int (*eth_dev_shutdown_t)(struct eth_driver  *eth_drv,
+ struct rte_eth_dev *eth_dev);
+
+/**
+ * @internal
  * The structure associated with a PMD Ethernet driver.
  *
  * Each Ethernet driver acts as a PCI driver and is represented by a generic
@@ -1711,11 +1732,14 @@ typedef int (*eth_dev_init_t)(struct eth_driver  
*eth_drv,
  *
  * - The *eth_dev_init* function invoked for each matching PCI device.
  *
+ * - The *eth_dev_shutdown* function invoked for each matching PCI device.
+ *
  * - The size of the private data to allocate for each matching device.
  */
 struct eth_driver {
struct rte_pci_driver pci_drv;/**< The PMD is also a PCI driver. */
eth_dev_init_t eth_dev_init;  /**< Device init function. */
+   eth_dev_shutdown_t eth_dev_shutdown;/**< Device shutdown function. */
unsigned int dev_private_size;/**< Size of device private data. */
 };

-- 
1.9.1

[dpdk-dev] [PATCH v3 01/28] eal/pci: Add a new flag indicating a driver can detach devices at runtime.

2014-12-09 Thread Tetsuya Mukawa

This patch adds "RTE_PCI_DRV_DETACHABLE" to drv_flags of rte_pci_driver
structure. The flags indicates the driver can detach devices at runtime.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 66ed793..b819539 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -199,6 +199,8 @@ struct rte_pci_driver {
 #define RTE_PCI_DRV_FORCE_UNBIND 0x0004
 /** Device driver supports link state interrupt */
 #define RTE_PCI_DRV_INTR_LSC   0x0008
+/** Device driver supports detaching capablity */
+#define RTE_PCI_DRV_DETACHABLE 0x0010

 /**< Internal use only - Macro used by pci addr parsing functions **/
 #define GET_PCIADDR_FIELD(in, fd, lim, dlm)   \
-- 
1.9.1

[dpdk-dev] [PATCH v3 00/28] Port Hotplug Framework

2014-12-09 Thread Tetsuya Mukawa

This patch series adds a dynamic port hotplug framework to DPDK.
With the patches, DPDK apps can attach or detach ports at runtime.

The basic concept of the port hotplug is like followings.
- DPDK apps must have responsibility to manage ports.
  DPDK apps only know which ports are attached or detached at the moment.
  The port hotplug framework is implemented to allow DPDK apps to manage ports.
  For example, when DPDK apps call port attach function, attached port number
  will be returned. Also DPDK apps can detach port by port number.
- Kernel support is needed for attaching or detaching physical device ports.
  To attach new device, the device will be recognized by kernel at first and
  controlled by kernel driver. Then user can bind the device to igb_uio
  by 'dpdk_nic_bind.py'. Finally, DPDK apps can call the port hotplug
  functions to attach ports.
  For detaching, steps are vice versa.
- Before detach ports, ports must be stopped and closed.
  DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close() before
  detaching ports. These function will call finalization codes of PMDs.
  But so far, no PMD frees all resources allocated by initialization.
  It means PMDs are needed to be fixed to support the port hotplug.
  'RTE_PCI_DRV_DETACHABLE' is a new flag indicating a PMD supports detaching.
  Without this flag, detaching will be failed.
- Mustn't affect legacy DPDK apps.
  No DPDK EAL behavior is changed, if the port hotplug functions are't called.
  So all legacy DPDK apps can still work without modifications.

And a few limitations.
- The port hotplug functions are not thread safe.
  DPDK apps should handle it.
- Only support Linux and igb_uio so far.
  BSD and VFIO is not supported. I will send VFIO patches at least, but I don't
  have a plan to submit BSD patch so far.


Here is port hotplug APIs.
---
/**
 * Attach a new device.
 *
 * @param devargs
 *   A pointer to a strings array describing the new device
 *   to be attached. The strings should be a pci address like
 *   ':01:00.0' or virtual device name like 'eth_pcap0'.
 * @param port_id
 *  A pointer to a port identifier actually attached.
 * @return
 *  0 on success and port_id is filled, negative on error
 */
int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);

/**
 * Detach a device.
 *
 * @param port_id
 *   The port identifier of the device to detach.
 * @param addr
 *  A pointer to a device name actually detached.
 * @return
 *  0 on success and devname is filled, negative on error
 */
int rte_eal_dev_detach(uint8_t port_id, char *devname);
---

This patch series are for DPDK EAL. To use port hotplug function by DPDK apps,
each PMD should be fixed to support 'RTE_PCI_DRV_DETACHABLE' flag. Please check
a patch for pcap PMD.

Also please check testpmd patch. It will show you how to fix your legacy
applications to support port hotplug feature.


PATCH v3 chages:
 - Fix enum definition used in rte_ethdev.c.
   (Thanks to Zhang, Helin)

PATCH v2 chages:
 - Replace rte_eal_dev_attach_pdev(), rte_eal_dev_detach_pdev,
   rte_eal_dev_attach_vdev() and rte_eal_dev_detach_vdev() to
   rte_eal_dev_attach() and rte_eal_dev_detach().
 - Add parameter values checking.
 - Refashion a few functions.
   (Thanks to Iremonger, Bernard)

PATCH v1 Chages:
 - Fix error checking code of librte_eth APIs.
 - Fix issue that port from pcap PMD cannot be detached correctly.
 - Fix issue that testpmd could hang after forwarding, if attaching and 
detaching
   is repeatedly.
 - Fix if-condition of rte_eth_dev_get_port_by_addr().
   (Thanks to Mark Enright)

RFC PATCH v2 Changes:
- remove 'rte_eth_dev_validate_port()', and cleanup codes.


Tetsuya Mukawa (28):
  eal/pci: Add a new flag indicating a driver can detach devices at
runtime.
  ethdev: Remove assumption that port will not be detached
  eal/pci: Replace pci address comparison code by eal_compare_pci_addr
  ethdev: Add rte_eth_dev_free to free specified device
  eal,ethdev: Add function pointer for closing a device
  ethdev: Add rte_eth_dev_shutdown for closing PCI devices.
  ethdev: Add functions to know which port is attached or detached
  ethdev: Add rte_eth_dev_get_addr_by_port
  ethdev: Add rte_eth_dev_get_port_by_addr
  ethdev: Add rte_eth_dev_get_name_by_port
  ethdev: Add rte_eth_dev_check_detachable
  ethdev: Change scope of rte_eth_dev_allocated to global
  eal/pci: Prevent double registration for devargs_list
  eal/pci: Add rte_eal_devargs_remove
  eal/pci: Add probe and close function for virtual drivers
  eal/pci: Add port hotplug functions for virtual devices.
  eal/linux/pci: Add functions for unmapping igb_uio resources
  eal/pci: Prevent double registrations for pci_device_list
  eal/pci: Change scope of rte_eal_pci_scan to global
  eal/pci: Add rte_eal_pci_close_one_driver
  eal/pci: Fix

[dpdk-dev] [PATCH v2 02/28] ethdev: Remove assumption that port will not be detached

2014-12-09 Thread Tetsuya Mukawa

Hi Zhang,

Thanks to your comment.
I will submit again soon.

Thanks,
Tetsuya

(2014/12/09 14:07), Zhang, Helin wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tetsuya Mukawa
>> Sent: Tuesday, December 9, 2014 11:42 AM
>> To: dev at dpdk.org
>> Cc: nakajima.yoshihiro at lab.ntt.co.jp; menrigh at brocade.com;
>> masutani.hitoshi at lab.ntt.co.jp
>> Subject: [dpdk-dev] [PATCH v2 02/28] ethdev: Remove assumption that port will
>> not be detached
>>
>> To remove assumption, do like followings.
>>
>> - Add 'attached' member to rte_eth_dev structure.
>>   This member is used for indicating the port is attached, or not.
>> - Add rte_eth_dev_allocate_new_port().
>>   This function is used for allocating new port.
>>
>> Signed-off-by: Tetsuya Mukawa 
>> ---
>>  lib/librte_ether/rte_ethdev.c | 248 
>> --
>>  lib/librte_ether/rte_ethdev.h |   5 +
>>  2 files changed, 149 insertions(+), 104 deletions(-)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c 
>> index
>> 95f2ceb..9f713ae 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -175,6 +175,16 @@ enum {
>>  STAT_QMAP_RX
>>  };
>>
>> +enum {
>> +DEV_VALID = 0,
>> +DEV_INVALID
>> +};
> Would it be safer to define DEV_INVALID = 0, then DEV_VALID after it?
> Or even as below?
> Enum {
>   DEV_UNKNOWN = 0,
>   DEV_VALID,
>   DEV_INVALID
> };
>
> Regards,
> Helin
>
>> +
>> +enum {
>> +DEV_DISCONNECTED = 0,
>> +DEV_CONNECTED
>> +};
>> +
>>  static inline void
>>  rte_eth_dev_data_alloc(void)
>>  {
>> @@ -201,19 +211,34 @@ rte_eth_dev_allocated(const char *name)  {
>>  unsigned i;
>>
>> -for (i = 0; i < nb_ports; i++) {
>> -if (strcmp(rte_eth_devices[i].data->name, name) == 0)
>> +for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
>> +if ((rte_eth_devices[i].attached == DEV_CONNECTED)
>> +&& strcmp(rte_eth_devices[i].data->name,
>> +name) == 0)
>>  return _eth_devices[i];
>>  }
>>  return NULL;
>>  }
>>
>> +static uint8_t
>> +rte_eth_dev_allocate_new_port(void)
>> +{
>> +unsigned i;
>> +
>> +for (i = 0; i < RTE_MAX_ETHPORTS; i++)
>> +if (rte_eth_devices[i].attached == DEV_DISCONNECTED)
>> +return i;
>> +return RTE_MAX_ETHPORTS;
>> +}
>> +
>>  struct rte_eth_dev *
>>  rte_eth_dev_allocate(const char *name)
>>  {
>> +uint8_t port_id;
>>  struct rte_eth_dev *eth_dev;
>>
>> -if (nb_ports == RTE_MAX_ETHPORTS) {
>> +port_id = rte_eth_dev_allocate_new_port();
>> +if (port_id == RTE_MAX_ETHPORTS) {
>>  PMD_DEBUG_TRACE("Reached maximum number of Ethernet
>> ports\n");
>>  return NULL;
>>  }
>> @@ -226,10 +251,12 @@ rte_eth_dev_allocate(const char *name)
>>  return NULL;
>>  }
>>
>> -eth_dev = _eth_devices[nb_ports];
>> -eth_dev->data = _eth_dev_data[nb_ports];
>> +eth_dev = _eth_devices[port_id];
>> +eth_dev->data = _eth_dev_data[port_id];
>>  snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s",
>> name);
>> -eth_dev->data->port_id = nb_ports++;
>> +eth_dev->data->port_id = port_id;
>> +eth_dev->attached = DEV_CONNECTED;
>> +nb_ports++;
>>  return eth_dev;
>>  }
>>
>> @@ -283,6 +310,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
>>  (unsigned) pci_dev->id.device_id);
>>  if (rte_eal_process_type() == RTE_PROC_PRIMARY)
>>  rte_free(eth_dev->data->dev_private);
>> +eth_dev->attached = DEV_DISCONNECTED;
>>  nb_ports--;
>>  return diag;
>>  }
>> @@ -308,10 +336,22 @@ rte_eth_driver_register(struct eth_driver *eth_drv)
>>  rte_eal_pci_register(_drv->pci_drv);
>>  }
>>
>> +static int
>> +rte_eth_dev_validate_port(uint8_t port_id) {
>> +if (port_id >= RTE_MAX_ETHPORTS)
>> +return DEV_INVALID;
>> +
>> +if (rte_eth_devices[port_id].attached == DEV_CONNECTED)
>> +return DEV_VALID;
>> +else
>> +return DEV_INVALID;
>> +}
>> +
>>  int
>>  rte_eth_dev_socket_id(uint8_t port_id)
>>  {
>> -if (port_id >= nb_ports)
>> +if (rte_eth_dev_validate_port(port_id) == DEV_INVALID)
>>  return -1;
>>  return rte_eth_devices[port_id].pci_dev->numa_node;
>>  }
>> @@ -369,7 +409,7 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t
>> rx_queue_id)
>>   * in a multi-process setup*/
>>  PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
>>
>> -if (port_id >= nb_ports) {
>> +if (rte_eth_dev_validate_port(port_id) == DEV_INVALID) {
>>  PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
>>  return -EINVAL;
>>  }
>> @@ -395,7 +435,7 @@ rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t
>> rx_queue_id)
>>   * in a multi-process setup*/
>>

[dpdk-dev] [PATCH v3 13/28] eal/pci: Prevent double registration for devargs_list

2014-12-09 Thread Qiu, Michael

On 2014/12/9 14:33, Tetsuya Mukawa wrote:
> The patch fixes rte_eal_devargs_add() not to register same device twice.
>
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/common/eal_common_devargs.c | 35 
> ++
>  1 file changed, 35 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_devargs.c 
> b/lib/librte_eal/common/eal_common_devargs.c
> index 4c7d11a..f95a12d 100644
> --- a/lib/librte_eal/common/eal_common_devargs.c
> +++ b/lib/librte_eal/common/eal_common_devargs.c
> @@ -44,6 +44,35 @@
>  struct rte_devargs_list devargs_list =
>   TAILQ_HEAD_INITIALIZER(devargs_list);
>  
> +
> +/* find a entry specified by pci address or device name */
> +static struct rte_devargs *
> +rte_eal_devargs_find(enum rte_devtype devtype, void *args)
> +{
> + struct rte_devargs *devargs;
> +
> + if (args == NULL)
> + return NULL;
> +
> + TAILQ_FOREACH(devargs, _list, next) {
> + switch (devtype) {
> + case RTE_DEVTYPE_WHITELISTED_PCI:
> + case RTE_DEVTYPE_BLACKLISTED_PCI:
> + if (eal_compare_pci_addr(>pci.addr, args) == 0)
> + goto found;
> + break;
> + case RTE_DEVTYPE_VIRTUAL:
> + if (memcmp(>virtual.drv_name, args,
> + strlen((char *)args)) == 0)
> + goto found;
> + break;
> + }
> + }
> + return NULL;
> +found:
> + return devargs;
> +}
> +
>  /* store a whitelist parameter for later parsing */
>  int
>  rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
> @@ -101,6 +130,12 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
> *devargs_str)
>   break;
>   }
>  
> + /* make sure there is no same entry */
> + if (rte_eal_devargs_find(devtype, >pci.addr)) {

Am I miss something? If devtype == RTE_DEVTYPE_VIRTUAL, what will happen?

Thanks,
Michael
> + RTE_LOG(ERR, EAL, "device already registered: <%s>\n", buf);
> + return -1;
> + }
> +
>   TAILQ_INSERT_TAIL(_list, devargs, next);
>   return 0;
>  }

[dpdk-dev] [PATCH v2 0/3] enhance TX checksum command and csum forwarding engine

2014-12-09 Thread Ananyev, Konstantin

> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jijiang Liu
> Sent: Tuesday, December 09, 2014 1:18 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 0/3] enhance TX checksum command and csum 
> forwarding engine
> 
> In the current codes, the "tx_checksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) 
> (port-id)" command is not easy to understand and
> extend, so the patch set enhances the tx_checksum command and reworks csum 
> forwarding engine due to the change of
> tx_checksum command.
> The main changes of the tx_checksum command are listed below,
> 
> 1> add "tx_checksum set tunnel (hw|sw|none) (port-id)" command
> 
> The command is used to set/clear tunnel flag that is used to tell the NIC 
> that the packetg is a tunneing packet and application want
> hardware TX checksum offload for outer layer, or inner layer, or both.
> 
> The 'none' option means that user treat tunneling packet as ordinary packet 
> when using the csum forward engine.
> for example, let say we have a tunnel packet:
> eth_hdr_out/ipv4_hdr_out/udp_hdr_out/vxlan_hdr/ehtr_hdr_in/ipv4_hdr_in/tcp_hdr_in.
>  one of several scenarios:
> 
> 1) User requests HW offload for ipv4_hdr_out  checksum, and doesn't care is 
> it a tunnelled packet or not. So he sets:
> 
> tx_checksum set tunnel none 0
> 
> tx_checksum set ip hw 0
> 
> So for such case we should set tx_tunnel to 'none'.
> 
> 2> add "tx_checksum set outer-ip (hw|sw) (port-id)" command
> 
> The command is used to set/clear outer IP flag that is used to tell the NIC 
> that application want hardware offload for outer layer.
> 
> 3> remove the 'vxlan' option from the  "tx_checksum set 
> (ip|udp|tcp|sctp|vxlan) (hw|sw) (port-id)" command
> 
> The command is used to set IP, UDP, TCP and SCTP TX checksum flag. In the 
> case of tunneling packet, the IP, UDP, TCP and SCTP flags
> always concern inner layer.
> 
> Moreover, replace the TESTPMD_TX_OFFLOAD_VXLAN_CKSUM flag with 
> TESTPMD_TX_OFFLOAD_TUNNEL_CKSUM flag and add the
> TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM and TESTPMD_TX_OFFLOAD_NON_TUNNEL_CKSUM 
> flag in test-pmd application.
> 
> v2 change:
>  redefine the 'none' behaviour for "tx_checksum set tunnel (hw|sw|none) 
> (port-id)" command.
> 
> Jijiang Liu (3):
>   add outer IP offload capability in librte_ether.
>   add outer IP checksum capability in i40e PMD
>   testpmd command lines of the tx_checksum and csum forwarding rework
> 
>  app/test-pmd/cmdline.c|  201 
> +++--
>  app/test-pmd/csumonly.c   |   35 ---
>  app/test-pmd/testpmd.h|6 +-
>  lib/librte_ether/rte_ethdev.h |1 +
>  lib/librte_pmd_i40e/i40e_ethdev.c |3 +-
>  5 files changed, 218 insertions(+), 28 deletions(-)
> 

Acked-by: Konstantin Ananyev 

> --
> 1.7.7.6

[dpdk-dev] [PATCH v3 07/28] ethdev: Add functions to know which port is attached or detached

2014-12-09 Thread Qiu, Michael

On 2014/12/9 14:32, Tetsuya Mukawa wrote:
> The patch adds rte_eth_dev_save() and rte_eth_dev_get_changed_port().
> rte_eth_dev_save() is used for saving current rte_eth_dev structures.
> rte_eth_dev_get_changed_port() receives the rte_eth_dev structures, then
> compare these with current values to know which port is actually
> attached or detached.
>
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_ether/rte_ethdev.c | 21 +
>  lib/librte_ether/rte_ethdev.h | 21 +
>  2 files changed, 42 insertions(+)
>
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 51697e1..6a3700e 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -416,6 +416,27 @@ rte_eth_dev_count(void)
>   return (nb_ports);
>  }
>  
> +void
> +rte_eth_dev_save(struct rte_eth_dev *devs)
> +{
> + if (devs == NULL)
> + return;
> +
> + /* save current rte_eth_devices */
> + memcpy(devs, rte_eth_devices,
> + sizeof(struct rte_eth_dev) * RTE_MAX_ETHPORTS);
> +}
> +
> +int
> +rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, uint8_t *port_id)
> +{
> + /* check which port was attached or detached */
> + for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++, devs++)
> + if (rte_eth_devices[*port_id].attached ^ devs->attached)
> + return 0;

Can we have more than one port changed?
If so, your logic should do little modify.

Thanks,
Michael
> + return 1;
> +}
> +
>  static int
>  rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
>  {
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index b329e11..03c8850 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1642,6 +1642,27 @@ extern struct rte_eth_dev rte_eth_devices[];
>  extern uint8_t rte_eth_dev_count(void);
>  
>  /**
> + * Function for internal use by port hotplug functions.
> + * Copies current ethdev structures to the specified pointer.
> + *
> + * @paramdevsThe pointer to the ethdev structures
> + */
> +extern void rte_eth_dev_save(struct rte_eth_dev *devs);
> +
> +/**
> + * Function for internal use by port hotplug functions.
> + * Compare the specified ethdev structures with currents. Then
> + * if there is a port which status is changed, fill the specified pointer
> + * with the port id of that port.
> + * @paramdevsThe pointer to the ethdev structures
> + * @paramport_id The pointer to the port id
> + * @return
> + *   - 0 on success, negative on error
> + */
> +extern int rte_eth_dev_get_changed_port(
> + struct rte_eth_dev *devs, uint8_t *port_id);
> +
> +/**
>   * Function for internal use by dummy drivers primarily, e.g. ring-based
>   * driver.
>   * Allocates a new ethdev slot for an ethernet device and returns the pointer

[dpdk-dev] [PATCH v3 03/28] eal/pci: Replace pci address comparison code by eal_compare_pci_addr

2014-12-09 Thread Qiu, Michael

On 2014/12/9 14:32, Tetsuya Mukawa wrote:
> This patch replaces pci_addr_comparison() and memcmp() of pci addresses by
> eal_compare_pci_addr().
>
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/bsdapp/eal/eal_pci.c   | 16 +---
>  lib/librte_eal/common/eal_common_pci.c|  2 +-
>  lib/librte_eal/common/include/rte_pci.h   | 29 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c | 16 +---
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  2 +-
>  5 files changed, 33 insertions(+), 32 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
> b/lib/librte_eal/bsdapp/eal/eal_pci.c
> index 74ecce7..7eda513 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> @@ -270,20 +270,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>   return (0);
>  }
>  
> -/* Compare two PCI device addresses. */
> -static int
> -pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
> -{
> - uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + 
> (addr->devid << 8) + addr->function;
> - uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + 
> (addr2->devid << 8) + addr2->function;
> -
> - if (dev_addr > dev_addr2)
> - return 1;
> - else
> - return 0;
> -}
> -
> -
>  /* Scan one pci sysfs entry, and fill the devices list from it. */
>  static int
>  pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
> @@ -358,7 +344,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
>   struct rte_pci_device *dev2 = NULL;
>  
>   TAILQ_FOREACH(dev2, _device_list, next) {
> - if (pci_addr_comparison(>addr, >addr))
> + if (eal_compare_pci_addr(>addr, >addr))
>   continue;
>   else {
>   TAILQ_INSERT_BEFORE(dev2, dev, next);
> diff --git a/lib/librte_eal/common/eal_common_pci.c 
> b/lib/librte_eal/common/eal_common_pci.c
> index f3c7f71..f01f258 100644
> --- a/lib/librte_eal/common/eal_common_pci.c
> +++ b/lib/librte_eal/common/eal_common_pci.c
> @@ -93,7 +93,7 @@ static struct rte_devargs *pci_devargs_lookup(struct 
> rte_pci_device *dev)
>   if (devargs->type != RTE_DEVTYPE_BLACKLISTED_PCI &&
>   devargs->type != RTE_DEVTYPE_WHITELISTED_PCI)
>   continue;
> - if (!memcmp(>addr, >pci.addr, sizeof(dev->addr)))
> + if (eal_compare_pci_addr(>addr, >pci.addr))
>   return devargs;
>   }
>   return NULL;
> diff --git a/lib/librte_eal/common/include/rte_pci.h 
> b/lib/librte_eal/common/include/rte_pci.h
> index b819539..fe374a8 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -261,6 +261,35 @@ eal_parse_pci_DomBDF(const char *input, struct 
> rte_pci_addr *dev_addr)
>  }
>  #undef GET_PCIADDR_FIELD
>  
> +/* Compare two PCI device addresses. */
> +/**
> + * Utility function to compare two PCI device addresses.
> + *
> + * @param addr
> + *   The PCI Bus-Device-Function address to compare
> + * @param addr2
> + *   The PCI Bus-Device-Function address to compare
> + * @return
> + *  0 on equal PCI address.
> + *  Positive on addr is greater than addr2.
> + *  Negative on addr is less than addr2.
> + */
> +static inline int
> +eal_compare_pci_addr(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
> +{
> + uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) +
> + (addr->devid << 8) + addr->function;
> + uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) +
> + (addr2->devid << 8) + addr2->function;
> +

Here I prefer:

uint64_t dev_addr = (addr->domain << 24) | (addr->bus << 16) | (addr->devid << 
8) | addr->function;

Thanks,
Michael
> + if (dev_addr > dev_addr2)
> + return 1;
> + else if (dev_addr < dev_addr2)
> + return -1;
> + else
> + return 0;
> +}
> +
>  /**
>   * Probe the PCI bus for registered drivers.
>   *
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
> b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index b5f5410..23a69e9 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -200,20 +200,6 @@ error:
>   return -1;
>  }
>  
> -/* Compare two PCI device addresses. */
> -static int
> -pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
> -{
> - uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + 
> (addr->devid << 8) + addr->function;
> - uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + 
> (addr2->devid << 8) + addr2->function;
> -
> - if (dev_addr > dev_addr2)
> - return 1;
> - else
> - return 0;
> -}
> -
> -
>  /* Scan one pci sysfs entry, and fill the

[dpdk-dev] A question about hugepage initialization time

2014-12-09 Thread Stephen Hemminger

On Tue, 9 Dec 2014 11:45:07 -0800
  wrote:

> > Hey Folks,
> >
> > Our DPDK application deals with very large in memory data structures, and
> > can potentially use tens or even hundreds of gigabytes of hugepage memory.
> > During the course of development, we've noticed that as the number of huge
> > pages increases, the memory initialization time during EAL init gets to be
> > quite long, lasting several minutes at present.  The growth in init time
> > doesn't appear to be linear, which is concerning.
> >
> > This is a minor inconvenience for us and our customers, as memory
> > initialization makes our boot times a lot longer than it would otherwise
> > be.  Also, my experience has been that really long operations often are
> > hiding errors - what you think is merely a slow operation is actually a
> > timeout of some sort, often due to misconfiguration. This leads to two
> > questions:
> >
> > 1. Does the long initialization time suggest that there's an error
> > happening under the covers?
> > 2. If not, is there any simple way that we can shorten memory
> > initialization time?
> >
> > Thanks in advance for your insights.
> >
> > --
> > Matt Laswell
> > laswell at infiniteio.com
> > infinite io, inc.
> >
> 
> Hello,
> 
> please find some quick comments on the questions:
> 1.) By our experience long initialization time is normal in case of 
> large amount of memory. However this time depends on some things:
> - number of hugepages (pagefault handled by kernel is pretty expensive)
> - size of hugepages (memset at initialization)
> 
> 2.) Using 1G pages instead of 2M will reduce the initialization time 
> significantly. Using wmemset instead of memset adds an additional 20-30% 
> boost by our measurements. Or, just by touching the pages but not cleaning 
> them you can have still some more speedup. But in this case your layer or 
> the applications above need to do the cleanup at allocation time 
> (e.g. by using rte_zmalloc).
> 
> Cheers,
> 

I wonder if the whole rte_malloc code is even worth it with a modern kernel
with transparent huge pages? rte_malloc adds very little value and is less safe
and slower than glibc or other allocators. Plus you lose the ablilty to get
all the benefit out of valgrind or electric fence.

[dpdk-dev] [RFC PATCH 00/17] Single virtio implementation

2014-12-09 Thread Ouyang, Changchun

Hi Bruce,

> -Original Message-
> From: Richardson, Bruce
> Sent: Tuesday, December 9, 2014 5:47 PM
> To: Ouyang, Changchun
> Cc: Thomas Monjalon; dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio implementation
> 
> On Tue, Dec 09, 2014 at 06:40:23AM +, Ouyang, Changchun wrote:
> >
> >
> > > -Original Message-
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > Sent: Tuesday, December 9, 2014 2:12 PM
> > > To: Ouyang, Changchun
> > > Cc: Qiu, Michael; Stephen Hemminger; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
> > > implementation
> > >
> > > 2014-12-09 05:41, Ouyang, Changchun:
> > > > Hi
> > > >
> > > > > -Original Message-
> > > > > From: Qiu, Michael
> > > > > Sent: Tuesday, December 9, 2014 11:23 AM
> > > > > To: Ouyang, Changchun; Thomas Monjalon; Stephen Hemminger
> > > > > Cc: dev at dpdk.org
> > > > > Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
> > > > > implementation
> > > > >
> > > > > On 12/9/2014 9:11 AM, Ouyang, Changchun wrote:
> > > > > > Hi Thomas,
> > > > > >
> > > > > >> -Original Message-
> > > > > >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > >> Sent: Monday, December 8, 2014 5:31 PM
> > > > > >> To: Ouyang, Changchun
> > > > > >> Cc: dev at dpdk.org
> > > > > >> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
> > > > > >> implementation
> > > > > >>
> > > > > >> Hi Changchun,
> > > > > >>
> > > > > >> 2014-12-08 14:21, Ouyang Changchun:
> > > > > >>> This patch set bases on two original RFC patch sets from
> > > > > >>> Stephen
> > > > > >> Hemminger[stephen at networkplumber.org]
> > > > > >>> Refer to
> > > > > >>> [http://dpdk.org/ml/archives/dev/2014-August/004845.html ]
> > > > > >>> for
> > > > > >> the original one.
> > > > > >>> This patch set also resolves some conflict with latest codes
> > > > > >>> and removed
> > > > > >> duplicated codes.
> > > > > >>
> > > > > >> As you sent the patches, you appear as the author.
> > > > > >> But I guess Stephen should be the author for some of them.
> > > > > >> Please check who has contributed the most in each patch to
> decide.
> > > > > > You are right, most of patches originate from Stephen's
> > > > > > patchset, except for the last one, To be honest, I am ok
> > > > > > whoever is the author of this patch set, :-), We could co-own
> > > > > > the feature of Single virtio if you all agree with it, and I
> > > > > > think we couldn't finish Such a feature without collaboration
> > > > > > among us, this is why I tried to communicate
> > > > > with most of you to collect more feedback, suggestion and
> > > > > comments for this feature.
> > > > > > Very appreciate for all kinds of feedback, suggestion here,
> > > > > > especially for
> > > > > patch set from Stephen.
> > > > > >
> > > > > > According to your request, how could we make this patch set
> > > > > > looks more
> > > > > like Stephen as the author?
> > > > > > Currently I add Stephen as Signed-off-by list in each patch(I
> > > > > > got the
> > > > > agreement from Stephen before doing this :-)).
> > > > >
> > > > > Hi Ouyang,
> > > > >
> > > > > "Signed-off-by" should be added by himself, because the one who
> > > > > in the Signed-off-by list should take responsibility for it(like
> > > > > potential
> > > bugs/issues).
> > > > >
> > > > > Although, lots of patches are originate from Stephen, we still
> > > > > need himself add this line :)
> > > >
> > > > Hi Thomas,
> > > > It that right? I can't add Stephen into Signed-off-by list even if
> > > > I have gotten the agreement from Stephen, What 's the strict rule here?
> > >
> > > Stephen sent the patches with his Signed-off, then you added yours.
> > > This is OK.
> > > Using git am, author would have been Stephen. To change author now,
> > > you can edit each commit with interactive rebase and "git commit
> > > --amend -- author=Stephen".
> > > No need to resend now. Please check it for next version of the patchset.
> > >
> >
> > So I understand correctly, Stephen need care for from patches from 1
> > to 16, I need care for the 17th patch from next version.
> > What I mean "caring for" above is:  debug and validate them and send
> > out patches
> >
> > Thanks
> > Changchun
> >
> Just to clarify Thomas point here about use of "git am". If you get a patch
> from someone to test or work on, use "git am" to apply it, rather than "git
> apply", since "git am" generates a commit in your local repo and thereby
> maintains the original authorship of the patch. If you do "git apply" and
> subsequently commit yourself, you - rather than the original author - will
> appear as the "author" of the patch, and you need to amend the commit as
> Thomas suggests to fix this.
> 
> So in short:
> * git am == good
> * git apply == bad

Thanks very much for the clarification. I will use git am for next version.

Changchun

[dpdk-dev] [PATCH] rte_eth_ctrl: rename rte_eth_fdir_flow.ip6_flow

2014-12-09 Thread Neil Horman

On Tue, Dec 09, 2014 at 05:32:08PM +, Mark Kavanagh wrote:
> The name of the rte_eth_fdir_flow's rte_eth_ipv6_flow attribute,
> 'ip6_flow', clashes with a macro defined in
> /usr/include/netinet/ip6.h, such that when DPDK is linked with an
> application that uses the afforementioned header, the macro is
> expanded within the DPDK struct, causing a compilation error.
> 
> Rename the relevant attribute in DPDK to resolve this.
> 
> Signed-off-by: Mark Kavanagh 
> ---
>  app/test-pmd/cmdline.c  | 4 ++--
>  lib/librte_ether/rte_eth_ctrl.h | 2 +-
>  lib/librte_pmd_i40e/i40e_fdir.c | 4 ++--
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index f79ea3e..882a5a2 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -8261,9 +8261,9 @@ cmd_flow_director_filter_parsed(void *parsed_result,
>   case RTE_ETH_FLOW_TYPE_UDPV6:
>   case RTE_ETH_FLOW_TYPE_TCPV6:
>   IPV6_ADDR_TO_ARRAY(res->ip_dst,
> - entry.input.flow.ip6_flow.dst_ip);
> + entry.input.flow.ipv6_flow.dst_ip);
>   IPV6_ADDR_TO_ARRAY(res->ip_src,
> - entry.input.flow.ip6_flow.src_ip);
> + entry.input.flow.ipv6_flow.src_ip);
>   /* need convert to big endian. */
>   entry.input.flow.udp6_flow.dst_port =
>   rte_cpu_to_be_16(res->port_dst);
> diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
> index 7088d8d..5d9c387 100644
> --- a/lib/librte_ether/rte_eth_ctrl.h
> +++ b/lib/librte_ether/rte_eth_ctrl.h
> @@ -275,7 +275,7 @@ union rte_eth_fdir_flow {
>   struct rte_eth_udpv6_flow  udp6_flow;
>   struct rte_eth_tcpv6_flow  tcp6_flow;
>   struct rte_eth_sctpv6_flow sctp6_flow;
> - struct rte_eth_ipv6_flow   ip6_flow;
> + struct rte_eth_ipv6_flow   ipv6_flow;
>  };
>  
>  /**
> diff --git a/lib/librte_pmd_i40e/i40e_fdir.c b/lib/librte_pmd_i40e/i40e_fdir.c
> index ad38803..68511c8 100644
> --- a/lib/librte_pmd_i40e/i40e_fdir.c
> +++ b/lib/librte_pmd_i40e/i40e_fdir.c
> @@ -749,10 +749,10 @@ i40e_fdir_fill_eth_ip_head(const struct 
> rte_eth_fdir_input *fdir_input,
>* to the expected received packets.
>*/
>   rte_memcpy(&(ip6->src_addr),
> -&(fdir_input->flow.ip6_flow.dst_ip),
> +&(fdir_input->flow.ipv6_flow.dst_ip),
>  IPV6_ADDR_LEN);
>   rte_memcpy(&(ip6->dst_addr),
> -&(fdir_input->flow.ip6_flow.src_ip),
> +&(fdir_input->flow.ipv6_flow.src_ip),
>  IPV6_ADDR_LEN);
>   ip6->proto = next_proto[fdir_input->flow_type];
>   break;
> -- 
> 1.9.3
> 
> 
Acked-by: Neil Horman

[dpdk-dev] DDPK use of MAP_FIXED in mmap

2014-12-09 Thread Neil Horman

On Tue, Dec 09, 2014 at 04:04:11PM +, Bruce Richardson wrote:
> On Tue, Dec 09, 2014 at 10:26:34AM -0500, Neil Horman wrote:
> > On Mon, Dec 08, 2014 at 07:02:38PM +, Karmarkar Suyash wrote:
> > > Hello,
> > > 
> > > In DPDK when we use mmap why are we passing the MAP_FIXED flag when Linux 
> > > man page itself says that the option is discouraged? Any specific reason 
> > > for passing the MAP_FIXED flag?
> > > 
> > Because theres nothing wrong with doing so.  The man page says its 
> > discouraged
> > because it creates less portable applications (due to the fact that not all
> > operating systems support MAP_FIXED).  Given that we currently only support 
> > BSD
> > and Linux however, and given that MAP_FIXED was added to POSIX for
> > XSI compliant systems, it seems like a reasonable thing to use, as we will 
> > most
> > likely never run into a system that won't support it
> > Neil
> > 
> 
> Whatever about portability, I thing the best reason to avoid it is given in 
> the
> quote from the map page given below: "If the memory region specified by addr 
> and
> len overlaps pages of any existing mapping(s), then the overlapped part of the
> existing mapping(s) will be discarded." i.e. an mmap with MAP_FIXED can 
> silently
> discard an existing mapping. This can lead to some strange behaviour for the
> unwary.
> 
> /Bruce
> 
That can definately produce some odd behavior, but thats completely avoidable.
An application can introspect its own address mapping to avoid such overlaps.

Of course, if we can avoid using MAP_FIXED, it just makes things a bit easier :)
Neil

> > > 
> > > http://linux.die.net/man/2/mmap
> > > 
> > > MAP_FIXED
> > > Don't interpret addr as a hint: place the mapping at exactly that 
> > > address. addr must be a multiple of the page size. If the memory region 
> > > specified by addr and len overlaps pages of any existing mapping(s), then 
> > > the overlapped part of the existing mapping(s) will be discarded. If the 
> > > specified address cannot be used, mmap() will fail. Because requiring a 
> > > fixed address for a mapping is less portable, the use of this option is 
> > > discouraged.
> > > 
> > > 
> > > Regards
> > > Suyash Karmarkar
> > > 
>

[dpdk-dev] [PATCH] Fix power_of_2 macro. Avoid branching when calculating RTE_MIN and RTE_MAX.

2014-12-09 Thread r k

Subject: [PATCH] Fix power_of_2 macro. Avoid branching when
calculating RTE_MIN and RTE_MAX.

---
 lib/librte_eal/common/include/rte_common.h | 6 +++---
 lib/librte_pmd_e1000/igb_pf.c  | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_pf.c| 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_common.h
b/lib/librte_eal/common/include/rte_common.h
index 921b91f..21b17ad 100644
--- a/lib/librte_eal/common/include/rte_common.h
+++ b/lib/librte_eal/common/include/rte_common.h
@@ -203,7 +203,7 @@ extern int RTE_BUILD_BUG_ON_detected_error;
static inline int  rte_is_power_of_2(uint32_t n)  {
-   return ((n-1) & n) == 0;
+   return n && !(n & (n - 1));
 }

 /**
@@ -259,7 +259,7 @@ rte_align64pow2(uint64_t v)  #define RTE_MIN(a, b) ({ \
typeof (a) _a = (a); \
typeof (b) _b = (b); \
-   _a < _b ? _a : _b; \
+_b ^ ((_a ^ _b) & -(_a < _b)); \
})

 /**
@@ -268,7 +268,7 @@ rte_align64pow2(uint64_t v)  #define RTE_MAX(a, b) ({ \
typeof (a) _a = (a); \
typeof (b) _b = (b); \
-   _a > _b ? _a : _b; \
+_a ^ ((_a ^ _b) & -(_a < _b)); \
})

 /*** Other general functions / macros / diff --git
a/lib/librte_pmd_e1000/igb_pf.c b/lib/librte_pmd_e1000/igb_pf.c index
bc3816a..546499c 100644
--- a/lib/librte_pmd_e1000/igb_pf.c
+++ b/lib/librte_pmd_e1000/igb_pf.c
@@ -321,11 +321,11 @@ igb_vf_set_mac_addr(struct rte_eth_dev *dev,
uint32_t vf, uint32_t *msgbuf)  static int
igb_vf_set_multicast(struct rte_eth_dev *dev, __rte_unused uint32_t
vf, uint32_t *msgbuf)  {
-   int i;
+   int16_t i;
uint32_t vector_bit;
uint32_t vector_reg;
uint32_t mta_reg;
-   int entries = (msgbuf[0] & E1000_VT_MSGINFO_MASK) >>
+   int32_t entries = (msgbuf[0] & E1000_VT_MSGINFO_MASK) >>
E1000_VT_MSGINFO_SHIFT;
uint16_t *hash_list = (uint16_t *)[1];
struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c
b/lib/librte_pmd_ixgbe/ixgbe_pf.c index 51da1fd..426caf9 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
@@ -390,7 +390,7 @@ ixgbe_vf_set_multicast(struct rte_eth_dev *dev,
__rte_unused uint32_t vf, uint32
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ixgbe_vf_info *vfinfo =
*(IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private));
-   int nb_entries = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >>
+   int32_t nb_entries = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >>
IXGBE_VT_MSGINFO_SHIFT;
uint16_t *hash_list = (uint16_t *)[1];
uint32_t mta_idx;
@@ -399,7 +399,7 @@ ixgbe_vf_set_multicast(struct rte_eth_dev *dev,
__rte_unused uint32_t vf, uint32
const uint32_t IXGBE_MTA_BIT_SHIFT = 5;
const uint32_t IXGBE_MTA_BIT_MASK = (0x1 << IXGBE_MTA_BIT_SHIFT) - 1;
uint32_t reg_val;
-   int i;
+   int16_t i;

/* only so many hash values supported */
nb_entries = RTE_MIN(nb_entries, IXGBE_MAX_VF_MC_ENTRIES);
--
1.9.1

[dpdk-dev] [PATCH] Small fixes in rte_common.h

2014-12-09 Thread r k

Subject: [PATCH] Small fixes in rte_common.h

Fix a bug in power_of_2 since zero is not.
Avoid branching in RTE_MIN and RTE_MAX macros.

Ravi Kerur (1):
  Fix power_of_2 macro.
  Avoid branching when calculating RTE_MIN and RTE_MAX.

 lib/librte_eal/common/include/rte_common.h | 6 +++---
 lib/librte_pmd_e1000/igb_pf.c  | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_pf.c| 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

--
1.9.1

[dpdk-dev] [PATCH v2] testpmd: Add port hotplug support

2014-12-09 Thread Tetsuya Mukawa

The patch introduces following commands.
- port attach [ident]
- port detach [port_id]
 - attach: attaching a port
 - detach: detaching a port
 - ident: pci address of physical device.
  Or device name and paramerters of virtual device.
 (ex. :02:00.0, eth_pcap0,iface=eth0)
 - port_id: port identifier

Signed-off-by: Tetsuya Mukawa 
---
 app/test-pmd/cmdline.c| 133 +--
 app/test-pmd/config.c | 116 +++
 app/test-pmd/parameters.c |  21 +++--
 app/test-pmd/testpmd.c| 199 +++---
 app/test-pmd/testpmd.h|  18 -
 5 files changed, 357 insertions(+), 130 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f79ea3e..40c2db7 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -572,6 +572,12 @@ static void cmd_help_long_parsed(void *parsed_result,
"port close (port_id|all)\n"
"Close all ports or port_id.\n\n"

+   "port add (p|a) (ident)\n"
+   "Add physical or virtual dev by pci address or 
virtual device name\n\n"
+
+   "port del (p|a) (port_id)\n"
+   "Del physical or virtual dev by port_id\n\n"
+
"port config (port_id|all)"
" speed (10|100|1000|1|4|auto)"
" duplex (half|full|auto)\n"
@@ -853,6 +859,89 @@ cmdline_parse_inst_t cmd_operate_specific_port = {
},
 };

+/* *** attach a specificied port *** */
+struct cmd_operate_attach_port_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   cmdline_fixed_string_t identifier;
+};
+
+static void cmd_operate_attach_port_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_operate_attach_port_result *res = parsed_result;
+
+   if (!strcmp(res->keyword, "attach"))
+   attach_port(res->identifier);
+   else
+   printf("Unknown parameter\n");
+}
+
+cmdline_parse_token_string_t cmd_operate_attach_port_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   port, "port");
+cmdline_parse_token_string_t cmd_operate_attach_port_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   keyword, "attach");
+cmdline_parse_token_string_t cmd_operate_attach_port_identifier =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   identifier, NULL);
+
+cmdline_parse_inst_t cmd_operate_attach_port = {
+   .f = cmd_operate_attach_port_parsed,
+   .data = NULL,
+   .help_str = "port attach identifier, "
+   "identifier: pci address or virtual dev name",
+   .tokens = {
+   (void *)_operate_attach_port_port,
+   (void *)_operate_attach_port_keyword,
+   (void *)_operate_attach_port_identifier,
+   NULL,
+   },
+};
+
+/* *** detach a specificied port *** */
+struct cmd_operate_detach_port_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   uint8_t port_id;
+};
+
+static void cmd_operate_detach_port_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_operate_detach_port_result *res = parsed_result;
+
+   if (!strcmp(res->keyword, "detach"))
+   detach_port(res->port_id);
+   else
+   printf("Unknown parameter\n");
+}
+
+cmdline_parse_token_string_t cmd_operate_detach_port_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result,
+   port, "port");
+cmdline_parse_token_string_t cmd_operate_detach_port_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result,
+   keyword, "detach");
+cmdline_parse_token_num_t cmd_operate_detach_port_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_operate_detach_port_result,
+   port_id, UINT8);
+
+cmdline_parse_inst_t cmd_operate_detach_port = {
+   .f = cmd_operate_detach_port_parsed,
+   .data = NULL,
+   .help_str = "port detach port_id",
+   .tokens = {
+   (void *)_operate_detach_port_port,
+   (void *)_operate_detach_port_keyword,
+   (void *)_operate_detach_port_port_id,
+   NULL,
+   },
+};
+
 /* *** configure speed for all ports *** */
 struct cmd_config_speed_all {
cmdline_fixed_string_t port;
@@ -907,7 +996,7 @@ cmd_config_speed_all_parsed(void *parsed_result,
return;
}

-   for (pid = 0; pid < nb_ports; pid++) {

[dpdk-dev] [PATCH v2] librte_pmd_pcap: Add port hotplug support

2014-12-09 Thread Tetsuya Mukawa

This patch adds finalization code to free resources allocated by the
PMD.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_pmd_pcap/rte_eth_pcap.c | 40 ++
 1 file changed, 40 insertions(+)

diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c 
b/lib/librte_pmd_pcap/rte_eth_pcap.c
index 0f34f73..c92ebb7 100644
--- a/lib/librte_pmd_pcap/rte_eth_pcap.c
+++ b/lib/librte_pmd_pcap/rte_eth_pcap.c
@@ -499,6 +499,13 @@ static struct eth_dev_ops ops = {
.stats_reset = eth_stats_reset,
 };

+static struct eth_driver rte_pcap_pmd = {
+   .pci_drv = {
+   .name = "rte_pcap_pmd",
+   .drv_flags = RTE_PCI_DRV_DETACHABLE,
+   },
+};
+
 /*
  * Function handler that opens the pcap file for reading a stores a
  * reference of it for use it later on.
@@ -714,6 +721,10 @@ rte_pmd_init_internals(const char *name, const unsigned 
nb_rx_queues,
if (*eth_dev == NULL)
goto error;

+   /* check length of device name */
+   if ((strlen((*eth_dev)->data->name) + 1) > sizeof(data->name))
+   goto error;
+
/* now put it all together
 * - store queue data in internals,
 * - store numa_node info in pci_driver
@@ -739,10 +750,13 @@ rte_pmd_init_internals(const char *name, const unsigned 
nb_rx_queues,
data->nb_tx_queues = (uint16_t)nb_tx_queues;
data->dev_link = pmd_link;
data->mac_addrs = _addr;
+   strncpy(data->name,
+   (*eth_dev)->data->name, strlen((*eth_dev)->data->name));

(*eth_dev)->data = data;
(*eth_dev)->dev_ops = 
(*eth_dev)->pci_dev = pci_dev;
+   (*eth_dev)->driver = _pcap_pmd;

return 0;

@@ -927,10 +941,36 @@ rte_pmd_pcap_devinit(const char *name, const char *params)

 }

+static int
+rte_pmd_pcap_devclose(const char *name, const char *params __rte_unused)
+{
+   struct rte_eth_dev *eth_dev = NULL;
+
+   RTE_LOG(INFO, PMD, "Closing pcap ethdev on numa socket %u\n",
+   rte_socket_id());
+
+   if (name == NULL)
+   return -1;
+
+   /* reserve an ethdev entry */
+   eth_dev = rte_eth_dev_allocated(name);
+   if (eth_dev == NULL)
+   return -1;
+
+   rte_free(eth_dev->data->dev_private);
+   rte_free(eth_dev->data);
+   rte_free(eth_dev->pci_dev);
+
+   rte_eth_dev_free(name);
+
+   return 0;
+}
+
 static struct rte_driver pmd_pcap_drv = {
.name = "eth_pcap",
.type = PMD_VDEV,
.init = rte_pmd_pcap_devinit,
+   .close = rte_pmd_pcap_devclose,
 };

 PMD_REGISTER_DRIVER(pmd_pcap_drv);
-- 
1.9.1

[dpdk-dev] [PATCH v2 28/28] eal: Enable port hotplug framework in Linux

2014-12-09 Thread Tetsuya Mukawa

The patch enables CONFIG_RTE_LIBRTE_EAL_HOTPLUG in Linux configuration.

Signed-off-by: Tetsuya Mukawa 
---
 config/common_linuxapp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2f9643b..27d05be 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -114,6 +114,11 @@ CONFIG_RTE_PCI_MAX_READ_REQUEST_SIZE=0
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y

 #
+# Compile Environment Abstraction Layer to support hotplug
+#
+CONFIG_RTE_LIBRTE_EAL_HOTPLUG=y
+
+#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
-- 
1.9.1

[dpdk-dev] [PATCH v2 25/28] eal/pci: Remove pci_probe/close_all_drivers()

2014-12-09 Thread Tetsuya Mukawa

These functions are actually wrappers of pci_invoke_all_drivers().
Just call it directly.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_pci.c | 30 --
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 5ff7b49..5044d8e 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -130,29 +130,7 @@ pci_invoke_all_drivers(struct rte_pci_device *dev, int 
type)
return 1;
 }

-/*
- * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
-static int
-pci_probe_all_drivers(struct rte_pci_device *dev)
-{
-   return pci_invoke_all_drivers(dev, INVOKE_PROBE);
-}
-
 #if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
-/*
- * If vendor/device ID match, call the devclose() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
-static int
-pci_close_all_drivers(struct rte_pci_device *dev)
-{
-   return pci_invoke_all_drivers(dev, INVOKE_CLOSE);
-}
-
 static int
 rte_eal_pci_invoke_one(struct rte_pci_addr *addr, int type)
 {
@@ -165,10 +143,10 @@ rte_eal_pci_invoke_one(struct rte_pci_addr *addr, int 
type)

switch (type) {
case INVOKE_PROBE:
-   ret = pci_probe_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev, INVOKE_PROBE);
break;
case INVOKE_CLOSE:
-   ret = pci_close_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev, INVOKE_CLOSE);
break;
}
if (ret < 0)
@@ -237,10 +215,10 @@ rte_eal_pci_probe(void)

/* probe all or only whitelisted devices */
if (probe_all)
-   ret = pci_probe_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev, INVOKE_PROBE);
else if (devargs != NULL &&
devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
-   ret = pci_probe_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev, INVOKE_PROBE);
if (ret < 0)
rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 " cannot be used\n", dev->addr.domain, 
dev->addr.bus,
-- 
1.9.1

[dpdk-dev] [PATCH v2 24/28] eal/pci: Add port hotplug functions for physical devices.

2014-12-09 Thread Tetsuya Mukawa

The patch adds rte_eal_dev_attach_pdev() and rte_eal_dev_detach_pdev().

rte_eal_dev_attach_pdev() receives a PCI address of the device and
returns an attached port number.
rte_eal_dev_detach_pdev() receives a port number, and returns a PCI
address actually detached.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_dev.c  | 64 +
 lib/librte_eal/common/include/rte_dev.h | 26 ++
 2 files changed, 90 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index 4ea3502..9ff03ed 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -173,6 +173,70 @@ rte_eal_dev_find_and_invoke(const char *name, int type)
return 1;
 }

+/* attach the new physical device, then store port_id of the device */
+int
+rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id)
+{
+   uint8_t new_port_id;
+   struct rte_eth_dev devs[RTE_MAX_ETHPORTS];
+
+   if ((addr == NULL) || (port_id == NULL))
+   goto err;
+
+   /* save current port status */
+   rte_eth_dev_save(devs);
+   /* re-construct pci_device_list */
+   if (rte_eal_pci_scan())
+   goto err;
+   /* invoke probe func of the driver can handle the new device */
+   if (rte_eal_pci_probe_one(addr))
+   goto err;
+   /* get port_id enabled by above procedures */
+   if (rte_eth_dev_get_changed_port(devs, _port_id))
+   goto err;
+
+   *port_id = new_port_id;
+   return 0;
+err:
+   RTE_LOG(ERR, EAL, "Drver, cannot attach the device\n");
+   return -1;
+}
+
+/* detach the new physical device, then store pci_addr of the device */
+int
+rte_eal_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr)
+{
+   struct rte_pci_addr freed_addr;
+   struct rte_pci_addr vp;
+
+   if (addr == NULL)
+   goto err;
+
+   /* check whether the driver supports detach feature, or not */
+   if (rte_eth_dev_check_detachable(port_id))
+   goto err;
+
+   /* get pci address by port id */
+   if (rte_eth_dev_get_addr_by_port(port_id, _addr))
+   goto err;
+
+   /* Zerod pci addr means the port comes from virtual device */
+   vp.domain = vp.bus = vp.devid = vp.function = 0;
+   if (eal_compare_pci_addr(, _addr) == 0)
+   goto err;
+
+   /* invoke close func of the driver,
+* also remove the device from pci_device_list */
+   if (rte_eal_pci_close_one(_addr))
+   goto err;
+
+   *addr = freed_addr;
+   return 0;
+err:
+   RTE_LOG(ERR, EAL, "Drver, cannot detach the device\n");
+   return -1;
+}
+
 static void
 get_vdev_name(char *vdevargs)
 {
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index 159d5a5..f0677cb 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -47,6 +47,7 @@ extern "C" {
 #endif

 #include 
+#include 

 /** Double linked list of device drivers. */
 TAILQ_HEAD(rte_driver_list, rte_driver);
@@ -101,6 +102,19 @@ void rte_eal_driver_unregister(struct rte_driver *driver);
 #if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)

 /**
+ * Attach a new physical device.
+ *
+ * @param addr
+ *   A pointer to a pci address structure describing the new
+ *   device to be attached.
+ * @param port_id
+ *  A pointer to a port identifier actually attached.
+ * @return
+ *  0 on success and port_id is filled, negative on error
+ */
+int rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id);
+
+/**
  * Attach a new virtual device.
  *
  * @param vdevargs
@@ -114,6 +128,18 @@ void rte_eal_driver_unregister(struct rte_driver *driver);
 int rte_eal_dev_attach_vdev(const char *vdevargs, uint8_t *port_id);

 /**
+ * Detach a physical device.
+ *
+ * @param port_id
+ *   The port identifier of the physical device to detach.
+ * @param addr
+ *  A pointer to a pci address structure actually detached.
+ * @return
+ *  0 on success and addr is filled, negative on error
+ */
+int rte_eal_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr);
+
+/**
  * Detach a virtual device.
  *
  * @param port_id
-- 
1.9.1

[dpdk-dev] [PATCH v2 21/28] eal/pci: Fix pci_probe_all_drivers to share code with closing function

2014-12-09 Thread Tetsuya Mukawa

pci_close_all_drivers() will be implemented after the patch.
To share a part of code between thses 2 functions, The patch fixes
pci_probe_all_drivers() first.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_pci.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index f01f258..1e3efea 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -99,19 +99,20 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)
return NULL;
 }

-/*
- * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
+#define INVOKE_PROBE   (0)
+
 static int
-pci_probe_all_drivers(struct rte_pci_device *dev)
+pci_invoke_all_drivers(struct rte_pci_device *dev, int type)
 {
struct rte_pci_driver *dr = NULL;
-   int rc;
+   int rc = 0;

TAILQ_FOREACH(dr, _driver_list, next) {
-   rc = rte_eal_pci_probe_one_driver(dr, dev);
+   switch (type) {
+   case INVOKE_PROBE:
+   rc = rte_eal_pci_probe_one_driver(dr, dev);
+   break;
+   }
if (rc < 0)
/* negative value is an error */
return -1;
@@ -124,6 +125,17 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 }

 /*
+ * If vendor/device ID match, call the devinit() function of all
+ * registered driver for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
+ */
+static int
+pci_probe_all_drivers(struct rte_pci_device *dev)
+{
+   return pci_invoke_all_drivers(dev, INVOKE_PROBE);
+}
+
+/*
  * Scan the content of the PCI bus, and call the devinit() function for
  * all registered drivers that have a matching entry in its id_table
  * for discovered devices.
-- 
1.9.1

[dpdk-dev] [PATCH v2 20/28] eal/pci: Add rte_eal_pci_close_one_driver

2014-12-09 Thread Tetsuya Mukawa

The function is used for closing the specified driver and device.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_private.h   | 15 +
 lib/librte_eal/linuxapp/eal/eal_pci.c | 61 +++
 2 files changed, 76 insertions(+)

diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index a1127ab..f892ac4 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -176,6 +176,21 @@ int rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr,
struct rte_pci_device *dev);

 /**
+ * Munmap memory for single PCI device
+ *
+ * This function is private to EAL.
+ *
+ * @param  dr
+ *  The pointer to the pci driver structure
+ * @param  dev
+ *  The pointer to the pci device structure
+ * @return
+ *   0 on success, negative on error
+ */
+int rte_eal_pci_close_one_driver(struct rte_pci_driver *dr,
+   struct rte_pci_device *dev);
+
+/**
  * Init tail queues for non-EAL library structures. This is to allow
  * the rings, mempools, etc. lists to be shared among multiple processes
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 8d683f5..93456f3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -616,6 +616,67 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
return 1;
 }

+#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
+/*
+ * If vendor/device ID match, call the devshutdown() function of the
+ * driver.
+ */
+int
+rte_eal_pci_close_one_driver(struct rte_pci_driver *dr,
+   struct rte_pci_device *dev)
+{
+   struct rte_pci_id *id_table;
+
+   if ((dr == NULL) || (dev == NULL))
+   return -EINVAL;
+
+   for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
+
+   /* check if device's identifiers match the driver's ones */
+   if (id_table->vendor_id != dev->id.vendor_id &&
+   id_table->vendor_id != PCI_ANY_ID)
+   continue;
+   if (id_table->device_id != dev->id.device_id &&
+   id_table->device_id != PCI_ANY_ID)
+   continue;
+   if (id_table->subsystem_vendor_id !=
+   dev->id.subsystem_vendor_id &&
+   id_table->subsystem_vendor_id != PCI_ANY_ID)
+   continue;
+   if (id_table->subsystem_device_id !=
+   dev->id.subsystem_device_id &&
+   id_table->subsystem_device_id != PCI_ANY_ID)
+   continue;
+
+   struct rte_pci_addr *loc = >addr;
+
+   RTE_LOG(DEBUG, EAL,
+   "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
+   loc->domain, loc->bus, loc->devid,
+   loc->function, dev->numa_node);
+
+   RTE_LOG(DEBUG, EAL, "  remove driver: %x:%x %s\n",
+   dev->id.vendor_id, dev->id.device_id,
+   dr->name);
+
+   /* call the driver devshutdown() function */
+   if (dr->devshutdown && (dr->devshutdown(dr, dev) < 0))
+   return -1;  /* negative value is an error */
+
+   /* clear driver structure */
+   dev->driver = NULL;
+
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)
+   /* unmap resources for devices that use igb_uio */
+   pci_unmap_device(dev);
+
+   return 0;
+   }
+   /* return positive value if driver is not found */
+   return 1;
+}
+#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
+
 /* Init the PCI EAL subsystem */
 int
 rte_eal_pci_init(void)
-- 
1.9.1

[dpdk-dev] [PATCH v2 19/28] eal/pci: Change scope of rte_eal_pci_scan to global

2014-12-09 Thread Tetsuya Mukawa

The function is called by port hotplug framework, so change scope of the
function to global.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_private.h   | 11 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c |  6 +++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 232fcec..a1127ab 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -154,6 +154,17 @@ struct rte_pci_driver;
 struct rte_pci_device;

 /**
+ * Scan the content of the PCI bus, and the devices in the devices
+ * list
+ *
+ * This function is private to EAL.
+ *
+ * @return
+ *  0 on success, negative on error
+ */
+int rte_eal_pci_scan(void);
+
+/**
  * Mmap memory for single PCI device
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 355c858..8d683f5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -375,8 +375,8 @@ error:
  * Scan the content of the PCI bus, and the devices in the devices
  * list
  */
-static int
-pci_scan(void)
+int
+rte_eal_pci_scan(void)
 {
struct dirent *e;
DIR *dir;
@@ -629,7 +629,7 @@ rte_eal_pci_init(void)
if (internal_config.no_pci)
return 0;

-   if (pci_scan() < 0) {
+   if (rte_eal_pci_scan() < 0) {
RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
return -1;
}
-- 
1.9.1

[dpdk-dev] [PATCH v2 18/28] eal/pci: Prevent double registrations for pci_device_list

2014-12-09 Thread Tetsuya Mukawa

The patch fixes pci_scan_one() not to register same pci devices twice.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index fe212d1..355c858 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -306,14 +306,17 @@ pci_scan_one(const char *dirname, uint16_t domain, 
uint8_t bus,
}
else {
struct rte_pci_device *dev2 = NULL;
+   int ret;

TAILQ_FOREACH(dev2, _device_list, next) {
-   if (eal_compare_pci_addr(>addr, >addr) != 0)
+   ret = eal_compare_pci_addr(>addr, >addr);
+   if (ret > 0)
continue;
-   else {
+   else if (ret < 0) {
TAILQ_INSERT_BEFORE(dev2, dev, next);
return 0;
-   }
+   } else  /* already registered */
+   return 0;
}
TAILQ_INSERT_TAIL(_device_list, dev, next);
}
-- 
1.9.1

[dpdk-dev] [PATCH v2 16/28] eal/pci: Add port hotplug functions for virtual devices.

2014-12-09 Thread Tetsuya Mukawa

The patch adds rte_eal_dev_attach_vdev() and rte_eal_dev_detach_vdev().

rte_eal_dev_attach_vdev() receives virtual device name and parameters,
and returns an attached port number.
rte_eal_dev_detach_vdev() receives a port number, and returns device
name actually detached.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_dev.c  | 86 +
 lib/librte_eal/common/include/rte_dev.h | 29 +++
 2 files changed, 115 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index f573a54..4ea3502 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -172,4 +172,90 @@ rte_eal_dev_find_and_invoke(const char *name, int type)
}
return 1;
 }
+
+static void
+get_vdev_name(char *vdevargs)
+{
+   char *sep;
+
+   if (vdevargs == NULL)
+   return;
+
+   /* set the first ',' to '\0' to split name and arguments */
+   sep = strchr(vdevargs, ',');
+   if (sep != NULL)
+   sep[0] = '\0';
+}
+
+/* attach the new virtual device, then store port_id of the device */
+int
+rte_eal_dev_attach_vdev(const char *vdevargs, uint8_t *port_id)
+{
+   char *args;
+   uint8_t new_port_id;
+   struct rte_eth_dev devs[RTE_MAX_ETHPORTS];
+
+   if ((vdevargs == NULL) || (port_id == NULL))
+   goto err0;
+
+   args = strdup(vdevargs);
+   if (args == NULL)
+   goto err0;
+
+   /* save current port status */
+   rte_eth_dev_save(devs);
+   /* add the vdevargs to devargs_list */
+   if (rte_eal_devargs_add(RTE_DEVTYPE_VIRTUAL, args))
+   goto err1;
+   /* parse vdevargs, then retrieve device name */
+   get_vdev_name(args);
+   /* walk around dev_driver_list to find the driver of the device,
+* then invoke probe function o the driver */
+   if (rte_eal_dev_find_and_invoke(args, INVOKE_PROBE))
+   goto err2;
+   /* get port_id enabled by above procedures */
+   if (rte_eth_dev_get_changed_port(devs, _port_id))
+   goto err2;
+
+   free(args);
+   *port_id = new_port_id;
+   return 0;
+err2:
+   rte_eal_devargs_remove(RTE_DEVTYPE_VIRTUAL, args);
+err1:
+   free(args);
+err0:
+   RTE_LOG(ERR, EAL, "Drver, cannot detach the device\n");
+   return -1;
+}
+
+/* detach the new virtual device, then store the name of the device */
+int
+rte_eal_dev_detach_vdev(uint8_t port_id, char *vdevname)
+{
+   char name[RTE_ETH_NAME_MAX_LEN];
+
+   if (vdevname == NULL)
+   goto err;
+
+   /* check whether the driver supports detach feature, or not */
+   if (rte_eth_dev_check_detachable(port_id))
+   goto err;
+
+   /* get device name by port id */
+   if (rte_eth_dev_get_name_by_port(port_id, name))
+   goto err;
+   /* walk around dev_driver_list to find the driver of the device,
+* then invoke close function o the driver */
+   if (rte_eal_dev_find_and_invoke(name, INVOKE_CLOSE))
+   goto err;
+   /* remove the vdevname from devargs_list */
+   rte_eal_devargs_remove(RTE_DEVTYPE_VIRTUAL, name);
+
+   strncpy(vdevname, name, sizeof(name));
+   return 0;
+err:
+   RTE_LOG(ERR, EAL, "Drver, cannot detach the device\n");
+   return -1;
+}
 #endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index 71d40c3..159d5a5 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -98,6 +98,35 @@ void rte_eal_driver_register(struct rte_driver *driver);
  */
 void rte_eal_driver_unregister(struct rte_driver *driver);

+#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
+
+/**
+ * Attach a new virtual device.
+ *
+ * @param vdevargs
+ *   A pointer to a strings array describing the new device
+ *   to be attached.
+ * @param port_id
+ *  A pointer to a port identifier actually attached.
+ * @return
+ *  0 on success and port_id is filled, negative on error
+ */
+int rte_eal_dev_attach_vdev(const char *vdevargs, uint8_t *port_id);
+
+/**
+ * Detach a virtual device.
+ *
+ * @param port_id
+ *   The port identifier of the virtual device to detach.
+ * @param addr
+ *  A pointer to a virtual device name actually detached.
+ * @return
+ *  0 on success and vdevname is filled, negative on error
+ */
+int rte_eal_dev_detach_vdev(uint8_t port_id, char *vdevname);
+
+#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
+
 /**
  * Initalize all the registered drivers in this process
  */
-- 
1.9.1

[dpdk-dev] [PATCH v2 15/28] eal/pci: Add probe and close function for virtual drivers

2014-12-09 Thread Tetsuya Mukawa

The patch adds rte_eal_dev_init_one() and rte_eal_dev_close_one().
These are used for attaching and detaching virtual devices.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_dev.c  | 66 +
 lib/librte_eal/common/include/rte_dev.h |  6 +++
 lib/librte_eal/linuxapp/eal/Makefile|  1 +
 3 files changed, 73 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index eae5656..f573a54 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -32,10 +32,13 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

+#include 
+#include 
 #include 
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 
@@ -107,3 +110,66 @@ rte_eal_dev_init(void)
}
return 0;
 }
+
+/* So far, linux only supports DPDK hotplug function. */
+#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
+
+#define INVOKE_PROBE   (0)
+#define INVOKE_CLOSE   (1)
+
+static void
+rte_eal_dev_invoke(struct rte_driver *driver,
+   struct rte_devargs *devargs, int type)
+{
+   if ((driver == NULL) || (devargs == NULL))
+   return;
+
+   switch (type) {
+   case INVOKE_PROBE:
+   driver->init(devargs->virtual.drv_name, devargs->args);
+   break;
+   case INVOKE_CLOSE:
+   driver->close(devargs->virtual.drv_name, devargs->args);
+   break;
+   }
+}
+
+static int
+rte_eal_dev_find_and_invoke(const char *name, int type)
+{
+   struct rte_devargs *devargs;
+   struct rte_driver *driver;
+
+   if (name == NULL)
+   return -EINVAL;
+
+   /* call the init function for each virtual device */
+   TAILQ_FOREACH(devargs, _list, next) {
+
+   if (devargs->type != RTE_DEVTYPE_VIRTUAL)
+   continue;
+
+   if (strncmp(name, devargs->virtual.drv_name, strlen(name)))
+   continue;
+
+   TAILQ_FOREACH(driver, _driver_list, next) {
+   if (driver->type != PMD_VDEV)
+   continue;
+
+   /* search a driver prefix in virtual device name */
+   if (!strncmp(driver->name, devargs->virtual.drv_name,
+   strlen(driver->name))) {
+   rte_eal_dev_invoke(driver, devargs, type);
+   break;
+   }
+   }
+
+   if (driver == NULL) {
+   RTE_LOG(WARNING, EAL, "no driver found for %s\n",
+ devargs->virtual.drv_name);
+   }
+   return 0;
+   }
+   return 1;
+}
+#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index f7e3a10..71d40c3 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -57,6 +57,11 @@ TAILQ_HEAD(rte_driver_list, rte_driver);
 typedef int (rte_dev_init_t)(const char *name, const char *args);

 /**
+ * Close function called for each device driver once.
+ */
+typedef int (rte_dev_close_t)(const char *name, const char *args);
+
+/**
  * Driver type enumeration
  */
 enum pmd_type {
@@ -72,6 +77,7 @@ struct rte_driver {
enum pmd_type type;/**< PMD Driver type */
const char *name;   /**< Driver name. */
rte_dev_init_t *init;  /**< Device init. function. */
+   rte_dev_close_t *close;/**< Device close. function. */
 };

 /**
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 72ecf3a..0ec83b5 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
 CFLAGS += -I$(RTE_SDK)/lib/librte_ring
 CFLAGS += -I$(RTE_SDK)/lib/librte_mempool
 CFLAGS += -I$(RTE_SDK)/lib/librte_malloc
+CFLAGS += -I$(RTE_SDK)/lib/librte_mbuf
 CFLAGS += -I$(RTE_SDK)/lib/librte_ether
 CFLAGS += -I$(RTE_SDK)/lib/librte_ivshmem
 CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_ring
-- 
1.9.1

[dpdk-dev] [PATCH v2 13/28] eal/pci: Prevent double registration for devargs_list

2014-12-09 Thread Tetsuya Mukawa

The patch fixes rte_eal_devargs_add() not to register same device twice.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_devargs.c | 35 ++
 1 file changed, 35 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_devargs.c 
b/lib/librte_eal/common/eal_common_devargs.c
index 4c7d11a..f95a12d 100644
--- a/lib/librte_eal/common/eal_common_devargs.c
+++ b/lib/librte_eal/common/eal_common_devargs.c
@@ -44,6 +44,35 @@
 struct rte_devargs_list devargs_list =
TAILQ_HEAD_INITIALIZER(devargs_list);

+
+/* find a entry specified by pci address or device name */
+static struct rte_devargs *
+rte_eal_devargs_find(enum rte_devtype devtype, void *args)
+{
+   struct rte_devargs *devargs;
+
+   if (args == NULL)
+   return NULL;
+
+   TAILQ_FOREACH(devargs, _list, next) {
+   switch (devtype) {
+   case RTE_DEVTYPE_WHITELISTED_PCI:
+   case RTE_DEVTYPE_BLACKLISTED_PCI:
+   if (eal_compare_pci_addr(>pci.addr, args) == 0)
+   goto found;
+   break;
+   case RTE_DEVTYPE_VIRTUAL:
+   if (memcmp(>virtual.drv_name, args,
+   strlen((char *)args)) == 0)
+   goto found;
+   break;
+   }
+   }
+   return NULL;
+found:
+   return devargs;
+}
+
 /* store a whitelist parameter for later parsing */
 int
 rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
@@ -101,6 +130,12 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
*devargs_str)
break;
}

+   /* make sure there is no same entry */
+   if (rte_eal_devargs_find(devtype, >pci.addr)) {
+   RTE_LOG(ERR, EAL, "device already registered: <%s>\n", buf);
+   return -1;
+   }
+
TAILQ_INSERT_TAIL(_list, devargs, next);
return 0;
 }
-- 
1.9.1

[dpdk-dev] [PATCH v2 12/28] ethdev: Change scope of rte_eth_dev_allocated to global

2014-12-09 Thread Tetsuya Mukawa

This function is used by virtual PMDs to support port hotplug framework.
So change scope of the function to global.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c |  2 +-
 lib/librte_ether/rte_ethdev.h | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed53e66..86200e0 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -206,7 +206,7 @@ rte_eth_dev_data_alloc(void)
RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
 }

-static struct rte_eth_dev *
+struct rte_eth_dev *
 rte_eth_dev_allocated(const char *name)
 {
unsigned i;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 404c41f..47622a2 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1712,6 +1712,16 @@ extern int rte_eth_dev_get_name_by_port(uint8_t port_id, 
char *name);
 extern int rte_eth_dev_check_detachable(uint8_t port_id);

 /**
+ * Function for internal use by port hotplug functions.
+ * Returns a ethdev slot specified by the unique identifier name.
+ * @param  name
+ *  The pointer to the Unique identifier name for each Ethernet device
+ * @return
+ *   - The pointer to the ethdev slot, on success. NULL on error
+ */
+extern struct rte_eth_dev *rte_eth_dev_allocated(const char *name);
+
+/**
  * Function for internal use by dummy drivers primarily, e.g. ring-based
  * driver.
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
-- 
1.9.1

[dpdk-dev] [PATCH v2 11/28] ethdev: Add rte_eth_dev_check_detachable

2014-12-09 Thread Tetsuya Mukawa

The function returns whether a PMD supports detach function, or not.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c |  9 +
 lib/librte_ether/rte_ethdev.h | 11 +++
 2 files changed, 20 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 1d82f69..ed53e66 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -498,6 +498,15 @@ rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
return 0;
 }

+int
+rte_eth_dev_check_detachable(uint8_t port_id)
+{
+   uint32_t drv_flags;
+
+   drv_flags = rte_eth_devices[port_id].driver->pci_drv.drv_flags;
+   return !(drv_flags & RTE_PCI_DRV_DETACHABLE);
+}
+
 static int
 rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bd921d0..404c41f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1701,6 +1701,17 @@ extern int rte_eth_dev_get_port_by_addr(
 extern int rte_eth_dev_get_name_by_port(uint8_t port_id, char *name);

 /**
+ * Function for internal use by port hotplug functions.
+ * Check whether or not, a PMD that is handling the ethdev specified by port
+ * identifier can support detach function.
+ * @param  port_id
+ *   The port identifier
+ * @return
+ *   - 0 on supporting detach function, negative on not supporting
+ */
+extern int rte_eth_dev_check_detachable(uint8_t port_id);
+
+/**
  * Function for internal use by dummy drivers primarily, e.g. ring-based
  * driver.
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
-- 
1.9.1

[dpdk-dev] [PATCH v2 09/28] ethdev: Add rte_eth_dev_get_port_by_addr

2014-12-09 Thread Tetsuya Mukawa

The function returns a port identifier of a ethdev specified by pci
address.

v3:
- Fix if-condition bug while comparing pci addresses.
- Add error checking codes.
Reported-by: Mark Enright 

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c | 22 ++
 lib/librte_ether/rte_ethdev.h | 13 +
 2 files changed, 35 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ddaf14a..4e9f0f7 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -454,6 +454,28 @@ rte_eth_dev_get_addr_by_port(uint8_t port_id, struct 
rte_pci_addr *addr)
return 0;
 }

+int
+rte_eth_dev_get_port_by_addr(struct rte_pci_addr *addr, uint8_t *port_id)
+{
+   struct rte_pci_addr *tmp;
+
+   if ((addr == NULL) || (port_id == NULL)) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -1;
+   }
+
+   for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++) {
+   if (!rte_eth_devices[*port_id].attached)
+   continue;
+   if (!rte_eth_devices[*port_id].pci_dev)
+   continue;
+   tmp = _eth_devices[*port_id].pci_dev->addr;
+   if (eal_compare_pci_addr(tmp, addr) == 0)
+   return 0;
+   }
+   return -1;
+}
+
 static int
 rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 30277a2..8c55c56 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1676,6 +1676,19 @@ extern int rte_eth_dev_get_addr_by_port(
uint8_t port_id, struct rte_pci_addr *addr);

 /**
+ * Function for internal use by port hotplug functions.
+ * Returns a port identifier of a ethdev specified by pci address.
+ * @param  addr
+ *   The pointer to the pci address of the Ethernet device.
+ * @param  port_id
+ *   The pointer to the port identifier
+ * @return
+ *   - 0 on success, negative on error
+ */
+extern int rte_eth_dev_get_port_by_addr(
+   struct rte_pci_addr *addr, uint8_t *port_id);
+
+/**
  * Function for internal use by dummy drivers primarily, e.g. ring-based
  * driver.
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
-- 
1.9.1

[dpdk-dev] [PATCH v2 07/28] ethdev: Add functions to know which port is attached or detached

2014-12-09 Thread Tetsuya Mukawa

The patch adds rte_eth_dev_save() and rte_eth_dev_get_changed_port().
rte_eth_dev_save() is used for saving current rte_eth_dev structures.
rte_eth_dev_get_changed_port() receives the rte_eth_dev structures, then
compare these with current values to know which port is actually
attached or detached.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c | 21 +
 lib/librte_ether/rte_ethdev.h | 21 +
 2 files changed, 42 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 51697e1..6a3700e 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -416,6 +416,27 @@ rte_eth_dev_count(void)
return (nb_ports);
 }

+void
+rte_eth_dev_save(struct rte_eth_dev *devs)
+{
+   if (devs == NULL)
+   return;
+
+   /* save current rte_eth_devices */
+   memcpy(devs, rte_eth_devices,
+   sizeof(struct rte_eth_dev) * RTE_MAX_ETHPORTS);
+}
+
+int
+rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, uint8_t *port_id)
+{
+   /* check which port was attached or detached */
+   for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++, devs++)
+   if (rte_eth_devices[*port_id].attached ^ devs->attached)
+   return 0;
+   return 1;
+}
+
 static int
 rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index b329e11..03c8850 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1642,6 +1642,27 @@ extern struct rte_eth_dev rte_eth_devices[];
 extern uint8_t rte_eth_dev_count(void);

 /**
+ * Function for internal use by port hotplug functions.
+ * Copies current ethdev structures to the specified pointer.
+ *
+ * @param  devsThe pointer to the ethdev structures
+ */
+extern void rte_eth_dev_save(struct rte_eth_dev *devs);
+
+/**
+ * Function for internal use by port hotplug functions.
+ * Compare the specified ethdev structures with currents. Then
+ * if there is a port which status is changed, fill the specified pointer
+ * with the port id of that port.
+ * @param  devsThe pointer to the ethdev structures
+ * @param  port_id The pointer to the port id
+ * @return
+ *   - 0 on success, negative on error
+ */
+extern int rte_eth_dev_get_changed_port(
+   struct rte_eth_dev *devs, uint8_t *port_id);
+
+/**
  * Function for internal use by dummy drivers primarily, e.g. ring-based
  * driver.
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
-- 
1.9.1

[dpdk-dev] [PATCH v2 05/28] eal, ethdev: Add function pointer for closing a device

2014-12-09 Thread Tetsuya Mukawa

The patch adds function pointer to rte_pci_driver and eth_driver
structure. These function pointers are used when ports are detached.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |  7 +++
 lib/librte_ether/rte_ethdev.h   | 24 
 2 files changed, 31 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index fe374a8..74720d1 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -181,12 +181,19 @@ struct rte_pci_driver;
 typedef int (pci_devinit_t)(struct rte_pci_driver *, struct rte_pci_device *);

 /**
+ * Shutdown function for the driver called during hotplugging.
+ */
+typedef int (pci_devshutdown_t)(
+   struct rte_pci_driver *, struct rte_pci_device *);
+
+/**
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
TAILQ_ENTRY(rte_pci_driver) next;   /**< Next in list. */
const char *name;   /**< Driver name. */
pci_devinit_t *devinit; /**< Device init. function. */
+   pci_devshutdown_t *devshutdown; /**< Device shutdown function. 
*/
struct rte_pci_id *id_table;/**< ID table, NULL terminated. 
*/
uint32_t drv_flags; /**< Flags contolling handling 
of device. */
 };
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index fb1caea..b329e11 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1702,6 +1702,27 @@ typedef int (*eth_dev_init_t)(struct eth_driver  
*eth_drv,

 /**
  * @internal
+ * Finalization function of an Ethernet driver invoked for each matching
+ * Ethernet PCI device detected during the PCI closing phase.
+ *
+ * @param eth_drv
+ *   The pointer to the [matching] Ethernet driver structure supplied by
+ *   the PMD when it registered itself.
+ * @param eth_dev
+ *   The *eth_dev* pointer is the address of the *rte_eth_dev* structure
+ *   associated with the matching device and which have been [automatically]
+ *   allocated in the *rte_eth_devices* array.
+ * @return
+ *   - 0: Success, the device is properly finalized by the driver.
+ *In particular, the driver MUST free the *dev_ops* pointer
+ *of the *eth_dev* structure.
+ *   - <0: Error code of the device initialization failure.
+ */
+typedef int (*eth_dev_shutdown_t)(struct eth_driver  *eth_drv,
+ struct rte_eth_dev *eth_dev);
+
+/**
+ * @internal
  * The structure associated with a PMD Ethernet driver.
  *
  * Each Ethernet driver acts as a PCI driver and is represented by a generic
@@ -1711,11 +1732,14 @@ typedef int (*eth_dev_init_t)(struct eth_driver  
*eth_drv,
  *
  * - The *eth_dev_init* function invoked for each matching PCI device.
  *
+ * - The *eth_dev_shutdown* function invoked for each matching PCI device.
+ *
  * - The size of the private data to allocate for each matching device.
  */
 struct eth_driver {
struct rte_pci_driver pci_drv;/**< The PMD is also a PCI driver. */
eth_dev_init_t eth_dev_init;  /**< Device init function. */
+   eth_dev_shutdown_t eth_dev_shutdown;/**< Device shutdown function. */
unsigned int dev_private_size;/**< Size of device private data. */
 };

-- 
1.9.1

[dpdk-dev] [PATCH v2 04/28] ethdev: Add rte_eth_dev_free to free specified device

2014-12-09 Thread Tetsuya Mukawa

This patch adds rte_eth_dev_free(). The function is used for changing a
attached status of the device that has specified name.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c | 17 +
 lib/librte_ether/rte_ethdev.h | 11 +++
 2 files changed, 28 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 9f713ae..d5fdb03 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -260,6 +260,23 @@ rte_eth_dev_allocate(const char *name)
return eth_dev;
 }

+struct rte_eth_dev *
+rte_eth_dev_free(const char *name)
+{
+   struct rte_eth_dev *eth_dev;
+
+   eth_dev = rte_eth_dev_allocated(name);
+   if (eth_dev == NULL) {
+   PMD_DEBUG_TRACE("Ethernet Device with name %s doesn't exist!\n",
+   name);
+   return NULL;
+   }
+
+   eth_dev->attached = 0;
+   nb_ports--;
+   return eth_dev;
+}
+
 static int
 rte_eth_dev_init(struct rte_pci_driver *pci_drv,
 struct rte_pci_device *pci_dev)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 257de86..fb1caea 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1653,6 +1653,17 @@ extern uint8_t rte_eth_dev_count(void);
  */
 struct rte_eth_dev *rte_eth_dev_allocate(const char *name);

+/**
+ * Function for internal use by dummy drivers primarily, e.g. ring-based
+ * driver.
+ * Free the specified ethdev and returns the pointer to that slot.
+ *
+ * @param  nameUnique identifier name for each Ethernet device
+ * @return
+ *   - Slot in the rte_dev_devices array for the freed device;
+ */
+struct rte_eth_dev *rte_eth_dev_free(const char *name);
+
 struct eth_driver;
 /**
  * @internal
-- 
1.9.1

[dpdk-dev] [PATCH v2 03/28] eal/pci: Replace pci address comparison code by eal_compare_pci_addr

2014-12-09 Thread Tetsuya Mukawa

This patch replaces pci_addr_comparison() and memcmp() of pci addresses by
eal_compare_pci_addr().

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 16 +---
 lib/librte_eal/common/eal_common_pci.c|  2 +-
 lib/librte_eal/common/include/rte_pci.h   | 29 +
 lib/librte_eal/linuxapp/eal/eal_pci.c | 16 +---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  2 +-
 5 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 74ecce7..7eda513 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -270,20 +270,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
return (0);
 }

-/* Compare two PCI device addresses. */
-static int
-pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
-{
-   uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + 
(addr->devid << 8) + addr->function;
-   uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + 
(addr2->devid << 8) + addr2->function;
-
-   if (dev_addr > dev_addr2)
-   return 1;
-   else
-   return 0;
-}
-
-
 /* Scan one pci sysfs entry, and fill the devices list from it. */
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
@@ -358,7 +344,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
struct rte_pci_device *dev2 = NULL;

TAILQ_FOREACH(dev2, _device_list, next) {
-   if (pci_addr_comparison(>addr, >addr))
+   if (eal_compare_pci_addr(>addr, >addr))
continue;
else {
TAILQ_INSERT_BEFORE(dev2, dev, next);
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index f3c7f71..f01f258 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -93,7 +93,7 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)
if (devargs->type != RTE_DEVTYPE_BLACKLISTED_PCI &&
devargs->type != RTE_DEVTYPE_WHITELISTED_PCI)
continue;
-   if (!memcmp(>addr, >pci.addr, sizeof(dev->addr)))
+   if (eal_compare_pci_addr(>addr, >pci.addr))
return devargs;
}
return NULL;
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index b819539..fe374a8 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -261,6 +261,35 @@ eal_parse_pci_DomBDF(const char *input, struct 
rte_pci_addr *dev_addr)
 }
 #undef GET_PCIADDR_FIELD

+/* Compare two PCI device addresses. */
+/**
+ * Utility function to compare two PCI device addresses.
+ *
+ * @param addr
+ * The PCI Bus-Device-Function address to compare
+ * @param addr2
+ * The PCI Bus-Device-Function address to compare
+ * @return
+ *  0 on equal PCI address.
+ *  Positive on addr is greater than addr2.
+ *  Negative on addr is less than addr2.
+ */
+static inline int
+eal_compare_pci_addr(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
+{
+   uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) +
+   (addr->devid << 8) + addr->function;
+   uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) +
+   (addr2->devid << 8) + addr2->function;
+
+   if (dev_addr > dev_addr2)
+   return 1;
+   else if (dev_addr < dev_addr2)
+   return -1;
+   else
+   return 0;
+}
+
 /**
  * Probe the PCI bus for registered drivers.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index b5f5410..23a69e9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -200,20 +200,6 @@ error:
return -1;
 }

-/* Compare two PCI device addresses. */
-static int
-pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
-{
-   uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + 
(addr->devid << 8) + addr->function;
-   uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + 
(addr2->devid << 8) + addr2->function;
-
-   if (dev_addr > dev_addr2)
-   return 1;
-   else
-   return 0;
-}
-
-
 /* Scan one pci sysfs entry, and fill the devices list from it. */
 static int
 pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
@@ -306,7 +292,7 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t 
bus,
struct rte_pci_device *dev2 = NULL;

TAILQ_FOREACH(dev2, _device_list, next) {
-   if (pci_addr_comparison(>addr, >addr))

[dpdk-dev] [PATCH v2 02/28] ethdev: Remove assumption that port will not be detached

2014-12-09 Thread Tetsuya Mukawa

To remove assumption, do like followings.

- Add 'attached' member to rte_eth_dev structure.
  This member is used for indicating the port is attached, or not.
- Add rte_eth_dev_allocate_new_port().
  This function is used for allocating new port.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c | 248 --
 lib/librte_ether/rte_ethdev.h |   5 +
 2 files changed, 149 insertions(+), 104 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 95f2ceb..9f713ae 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -175,6 +175,16 @@ enum {
STAT_QMAP_RX
 };

+enum {
+   DEV_VALID = 0,
+   DEV_INVALID
+};
+
+enum {
+   DEV_DISCONNECTED = 0,
+   DEV_CONNECTED
+};
+
 static inline void
 rte_eth_dev_data_alloc(void)
 {
@@ -201,19 +211,34 @@ rte_eth_dev_allocated(const char *name)
 {
unsigned i;

-   for (i = 0; i < nb_ports; i++) {
-   if (strcmp(rte_eth_devices[i].data->name, name) == 0)
+   for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
+   if ((rte_eth_devices[i].attached == DEV_CONNECTED)
+   && strcmp(rte_eth_devices[i].data->name,
+   name) == 0)
return _eth_devices[i];
}
return NULL;
 }

+static uint8_t
+rte_eth_dev_allocate_new_port(void)
+{
+   unsigned i;
+
+   for (i = 0; i < RTE_MAX_ETHPORTS; i++)
+   if (rte_eth_devices[i].attached == DEV_DISCONNECTED)
+   return i;
+   return RTE_MAX_ETHPORTS;
+}
+
 struct rte_eth_dev *
 rte_eth_dev_allocate(const char *name)
 {
+   uint8_t port_id;
struct rte_eth_dev *eth_dev;

-   if (nb_ports == RTE_MAX_ETHPORTS) {
+   port_id = rte_eth_dev_allocate_new_port();
+   if (port_id == RTE_MAX_ETHPORTS) {
PMD_DEBUG_TRACE("Reached maximum number of Ethernet ports\n");
return NULL;
}
@@ -226,10 +251,12 @@ rte_eth_dev_allocate(const char *name)
return NULL;
}

-   eth_dev = _eth_devices[nb_ports];
-   eth_dev->data = _eth_dev_data[nb_ports];
+   eth_dev = _eth_devices[port_id];
+   eth_dev->data = _eth_dev_data[port_id];
snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
-   eth_dev->data->port_id = nb_ports++;
+   eth_dev->data->port_id = port_id;
+   eth_dev->attached = DEV_CONNECTED;
+   nb_ports++;
return eth_dev;
 }

@@ -283,6 +310,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
(unsigned) pci_dev->id.device_id);
if (rte_eal_process_type() == RTE_PROC_PRIMARY)
rte_free(eth_dev->data->dev_private);
+   eth_dev->attached = DEV_DISCONNECTED;
nb_ports--;
return diag;
 }
@@ -308,10 +336,22 @@ rte_eth_driver_register(struct eth_driver *eth_drv)
rte_eal_pci_register(_drv->pci_drv);
 }

+static int
+rte_eth_dev_validate_port(uint8_t port_id)
+{
+   if (port_id >= RTE_MAX_ETHPORTS)
+   return DEV_INVALID;
+
+   if (rte_eth_devices[port_id].attached == DEV_CONNECTED)
+   return DEV_VALID;
+   else
+   return DEV_INVALID;
+}
+
 int
 rte_eth_dev_socket_id(uint8_t port_id)
 {
-   if (port_id >= nb_ports)
+   if (rte_eth_dev_validate_port(port_id) == DEV_INVALID)
return -1;
return rte_eth_devices[port_id].pci_dev->numa_node;
 }
@@ -369,7 +409,7 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t 
rx_queue_id)
 * in a multi-process setup*/
PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);

-   if (port_id >= nb_ports) {
+   if (rte_eth_dev_validate_port(port_id) == DEV_INVALID) {
PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
return -EINVAL;
}
@@ -395,7 +435,7 @@ rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t 
rx_queue_id)
 * in a multi-process setup*/
PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);

-   if (port_id >= nb_ports) {
+   if (rte_eth_dev_validate_port(port_id) == DEV_INVALID) {
PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
return -EINVAL;
}
@@ -421,7 +461,7 @@ rte_eth_dev_tx_queue_start(uint8_t port_id, uint16_t 
tx_queue_id)
 * in a multi-process setup*/
PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);

-   if (port_id >= nb_ports) {
+   if (rte_eth_dev_validate_port(port_id) == DEV_INVALID) {
PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
return -EINVAL;
}
@@ -447,7 +487,7 @@ rte_eth_dev_tx_queue_stop(uint8_t port_id, uint16_t 
tx_queue_id)
 * in a multi-process setup*/
PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);

-   if (port_id >= nb_ports) {
+   if (rte_eth_dev_validate_port(port_id) == DEV_INVALID) {

[dpdk-dev] [PATCH v2 01/28] eal/pci: Add a new flag indicating a driver can detach devices at runtime.

2014-12-09 Thread Tetsuya Mukawa

This patch adds "RTE_PCI_DRV_DETACHABLE" to drv_flags of rte_pci_driver
structure. The flags indicates the driver can detach devices at runtime.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 66ed793..b819539 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -199,6 +199,8 @@ struct rte_pci_driver {
 #define RTE_PCI_DRV_FORCE_UNBIND 0x0004
 /** Device driver supports link state interrupt */
 #define RTE_PCI_DRV_INTR_LSC   0x0008
+/** Device driver supports detaching capablity */
+#define RTE_PCI_DRV_DETACHABLE 0x0010

 /**< Internal use only - Macro used by pci addr parsing functions **/
 #define GET_PCIADDR_FIELD(in, fd, lim, dlm)   \
-- 
1.9.1

[dpdk-dev] [PATCH v2 00/28] Port Hotplug Framework

2014-12-09 Thread Tetsuya Mukawa

This patch series adds a dynamic port hotplug framework to DPDK.
With the patches, DPDK apps can attach or detach ports at runtime.

The basic concept of the port hotplug is like followings.
- DPDK apps must have responsibility to manage ports.
  DPDK apps only know which ports are attached or detached at the moment.
  The port hotplug framework is implemented to allow DPDK apps to manage ports.
  For example, when DPDK apps call port attach function, attached port number
  will be returned. Also DPDK apps can detach port by port number.
- Kernel support is needed for attaching or detaching physical device ports.
  To attach new device, the device will be recognized by kernel at first and
  controlled by kernel driver. Then user can bind the device to igb_uio
  by 'dpdk_nic_bind.py'. Finally, DPDK apps can call the port hotplug
  functions to attach ports.
  For detaching, steps are vice versa.
- Before detach ports, ports must be stopped and closed.
  DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close() before
  detaching ports. These function will call finalization codes of PMDs.
  But so far, no PMD frees all resources allocated by initialization.
  It means PMDs are needed to be fixed to support the port hotplug.
  'RTE_PCI_DRV_DETACHABLE' is a new flag indicating a PMD supports detaching.
  Without this flag, detaching will be failed.
- Mustn't affect legacy DPDK apps.
  No DPDK EAL behavior is changed, if the port hotplug functions are't called.
  So all legacy DPDK apps can still work without modifications.

And few limitations.
- The port hotplug functions are not thread safe.
  DPDK apps should handle it.
- Only support Linux and igb_uio so far.
  BSD and VFIO is not supported. I will send VFIO patches at least, but I don't
  have a plan to submit BSD patch so far.


Here is port hotplug APIs.
---
/**
 * Attach a new device.
 *
 * @param devargs
 *   A pointer to a strings array describing the new device
 *   to be attached. The strings should be a pci address like
 *   ':01:00.0' or virtual device name like 'eth_pcap0'.
 * @param port_id
 *  A pointer to a port identifier actually attached.
 * @return
 *  0 on success and port_id is filled, negative on error
 */
int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);

/**
 * Detach a device.
 *
 * @param port_id
 *   The port identifier of the device to detach.
 * @param addr
 *  A pointer to a device name actually detached.
 * @return
 *  0 on success and devname is filled, negative on error
 */
int rte_eal_dev_detach(uint8_t port_id, char *devname);
---

This patch series are for DPDK EAL. To use port hotplug function by DPDK apps,
each PMD should be fixed to support 'RTE_PCI_DRV_DETACHABLE' flag. Please check
a patch for pcap PMD.

Also please check testpmd patch. It will show you how to fix your legacy
applications to support port hotplug feature.


PATCH v2 chages:
 - Replace rte_eal_dev_attach_pdev(), rte_eal_dev_detach_pdev,
   rte_eal_dev_attach_vdev() and rte_eal_dev_detach_vdev() to
   rte_eal_dev_attach() and rte_eal_dev_detach().
 - Add parameter values checking.
 - Refashion a few functions.
   (Thanks to Iremonger, Bernard)

PATCH v1 Chages:
 - Fix error checking code of librte_eth APIs.
 - Fix issue that port from pcap PMD cannot be detached correctly.
 - Fix issue that testpmd could hang after forwarding, if attaching and 
detaching
   is repeatedly.
 - Fix if-condition of rte_eth_dev_get_port_by_addr().
   (Thanks to Mark Enright)

RFC PATCH v2 Changes:
- remove 'rte_eth_dev_validate_port()', and cleanup codes.


Tetsuya Mukawa (28):
  eal/pci: Add a new flag indicating a driver can detach devices at
runtime.
  ethdev: Remove assumption that port will not be detached
  eal/pci: Replace pci address comparison code by eal_compare_pci_addr
  ethdev: Add rte_eth_dev_free to free specified device
  eal,ethdev: Add function pointer for closing a device
  ethdev: Add rte_eth_dev_shutdown for closing PCI devices.
  ethdev: Add functions to know which port is attached or detached
  ethdev: Add rte_eth_dev_get_addr_by_port
  ethdev: Add rte_eth_dev_get_port_by_addr
  ethdev: Add rte_eth_dev_get_name_by_port
  ethdev: Add rte_eth_dev_check_detachable
  ethdev: Change scope of rte_eth_dev_allocated to global
  eal/pci: Prevent double registration for devargs_list
  eal/pci: Add rte_eal_devargs_remove
  eal/pci: Add probe and close function for virtual drivers
  eal/pci: Add port hotplug functions for virtual devices.
  eal/linux/pci: Add functions for unmapping igb_uio resources
  eal/pci: Prevent double registrations for pci_device_list
  eal/pci: Change scope of rte_eal_pci_scan to global
  eal/pci: Add rte_eal_pci_close_one_driver
  eal/pci: Fix pci_probe_all_drivers to share code with closing function
  eal/pci: Add pci_close_all_drivers
  eal/pci: Add

[dpdk-dev] [PATCH v3 3/3] doc: add VM power mgmt app

2014-12-09 Thread Pablo de Lara

Added new section in sample app UG for
the new VM power management app.

Signed-off-by: Alan Carew 
Signed-off-by: Pablo de Lara 
---
 doc/guides/rel_notes/rel_description.rst |2 +
 doc/guides/sample_app_ug/index.rst   |5 +
 doc/guides/sample_app_ug/vm_power_management.rst |  361 ++
 3 files changed, 368 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/vm_power_management.rst

diff --git a/doc/guides/rel_notes/rel_description.rst 
b/doc/guides/rel_notes/rel_description.rst
index 07c897b..d159b3c 100644
--- a/doc/guides/rel_notes/rel_description.rst
+++ b/doc/guides/rel_notes/rel_description.rst
@@ -149,6 +149,8 @@ The following is a list of Intel? DPDK documents in the 
suggested reading order

 *   Kernel NIC Interface (KNI)

+*   VM Power Management
+
 In addition, there are some other applications that are built when the 
libraries are created.
 The source for these applications is in the DPDK/app directory and are 
called:

diff --git a/doc/guides/sample_app_ug/index.rst 
b/doc/guides/sample_app_ug/index.rst
index db88b0d..c3b50e2 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -101,6 +101,7 @@ Copyright ? 2012 - 2014, Intel Corporation. All rights 
reserved.
 internet_proto_ip_pipeline
 test_pipeline
 dist_app
+vm_power_management

 **Figures**

@@ -152,6 +153,10 @@ Copyright ? 2012 - 2014, Intel Corporation. All rights 
reserved.

 :ref:`Figure 23.Distributor Sample Application Layout `

+:ref:`Figure 24.High level Solution `
+
+:ref:`Figure 25.VM request to scale frequency `
+
 **Tables**

 :ref:`Table 1.Output Traffic Marking `
diff --git a/doc/guides/sample_app_ug/vm_power_management.rst 
b/doc/guides/sample_app_ug/vm_power_management.rst
new file mode 100644
index 000..f5b5200
--- /dev/null
+++ b/doc/guides/sample_app_ug/vm_power_management.rst
@@ -0,0 +1,361 @@
+..  BSD LICENSE
+Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+VM Power Management Application
+===
+
+Introduction
+
+
+Applications running in Virtual Environments have an abstract view of
+the underlying hardware on the Host, in particular applications cannot see
+the binding of virtual to physical hardware.
+When looking at CPU resourcing, the pinning of Virtual CPUs(vCPUs) to
+Host Physical CPUs(pCPUS) is not apparent to an application
+and this pinning may change over time.
+Furthermore, Operating Systems on virtual machines do not have the ability
+to govern their own power policy; the Machine Specific Registers (MSRs)
+for enabling P-State transitions are not exposed to Operating Systems
+running on Virtual Machines(VMs).
+
+The Virtual Machine Power Management solution shows an example of
+how a DPDK application can indicate its processing requirements using VM local
+only information(vCPU/lcore) to a Host based Monitor which is responsible
+for accepting requests for frequency changes for a vCPU, translating the vCPU
+to a pCPU via libvirt and affecting the change in frequency.
+
+The solution is comprised of two high-level components:
+
+#. Example Host Application
+
+   Using a Command Line Interface(CLI) for VM->Host communication channel 
management
+   allows adding channels to the Monitor, setting and querying the vCPU to 
pCPU pinning,
+   inspecting and manually changing

[dpdk-dev] [PATCH v3 1/3] doc: add vm power mgmt overview svg

2014-12-09 Thread Pablo de Lara

Added first of the two figures in the VM power management app UG:
VM power mangament highlevel overview

Signed-off-by: Alan Carew 
Signed-off-by: Pablo de Lara 
---
 .../sample_app_ug/img/vm_power_mgr_highlevel.svg   |  742 
 1 files changed, 742 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/img/vm_power_mgr_highlevel.svg

diff --git a/doc/guides/sample_app_ug/img/vm_power_mgr_highlevel.svg 
b/doc/guides/sample_app_ug/img/vm_power_mgr_highlevel.svg
new file mode 100644
index 000..bc07dfe
--- /dev/null
+++ b/doc/guides/sample_app_ug/img/vm_power_mgr_highlevel.svg
@@ -0,0 +1,742 @@
+
+http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd;>
+
+
+
+
+http://www.w3.org/2000/svg; 
xmlns:xlink="http://www.w3.org/1999/xlink; 
xmlns:ev="http://www.w3.org/2001/xml-events;
+   width="7.96928in" height="6.37479in" viewBox="0 0 573.788 
458.985" xml:space="preserve" color-interpolation-filters="sRGB"
+   class="st28">
+   
+   
+   
+
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   
+   Page-1
+   
+   Box
+

[dpdk-dev] [PATCH v3 0/3] doc: add VM power mgmt app

2014-12-09 Thread Pablo de Lara

This patchset adds a new sample app UG, contaning explanation
of the new two sample apps added in the VM power management
patchset

Changes in v3:

Scaled svg files
Removed trailing whitespaces
Replaced directives for code blocks
Separate svg files in different commits
Fixed numbered lists

Changes in v2:

Corrected svg files

Pablo de Lara (3):
  doc: add vm power mgmt overview svg
  doc: add vm power mgmt request sequence svg
  doc: add VM power mgmt app

 doc/guides/rel_notes/rel_description.rst   |2 +
 .../sample_app_ug/img/vm_power_mgr_highlevel.svg   |  742 
 .../img/vm_power_mgr_vm_request_seq.svg|  927 
 doc/guides/sample_app_ug/index.rst |5 +
 doc/guides/sample_app_ug/vm_power_management.rst   |  361 
 5 files changed, 2037 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/img/vm_power_mgr_highlevel.svg
 create mode 100644 doc/guides/sample_app_ug/img/vm_power_mgr_vm_request_seq.svg
 create mode 100644 doc/guides/sample_app_ug/vm_power_management.rst

-- 
1.7.4.1

[dpdk-dev] A question about hugepage initialization time

2014-12-09 Thread

> Hey Folks,
>
> Our DPDK application deals with very large in memory data structures, and
> can potentially use tens or even hundreds of gigabytes of hugepage memory.
> During the course of development, we've noticed that as the number of huge
> pages increases, the memory initialization time during EAL init gets to be
> quite long, lasting several minutes at present.  The growth in init time
> doesn't appear to be linear, which is concerning.
>
> This is a minor inconvenience for us and our customers, as memory
> initialization makes our boot times a lot longer than it would otherwise
> be.  Also, my experience has been that really long operations often are
> hiding errors - what you think is merely a slow operation is actually a
> timeout of some sort, often due to misconfiguration. This leads to two
> questions:
>
> 1. Does the long initialization time suggest that there's an error
> happening under the covers?
> 2. If not, is there any simple way that we can shorten memory
> initialization time?
>
> Thanks in advance for your insights.
>
> --
> Matt Laswell
> laswell at infiniteio.com
> infinite io, inc.
>

Hello,

please find some quick comments on the questions:
1.) By our experience long initialization time is normal in case of 
large amount of memory. However this time depends on some things:
- number of hugepages (pagefault handled by kernel is pretty expensive)
- size of hugepages (memset at initialization)

2.) Using 1G pages instead of 2M will reduce the initialization time 
significantly. Using wmemset instead of memset adds an additional 20-30% 
boost by our measurements. Or, just by touching the pages but not cleaning 
them you can have still some more speedup. But in this case your layer or 
the applications above need to do the cleanup at allocation time 
(e.g. by using rte_zmalloc).

Cheers,

[dpdk-dev] A question about hugepage initialization time

2014-12-09 Thread Matthew Hall

On Tue, Dec 09, 2014 at 10:33:59AM -0600, Matt Laswell wrote:
> Our DPDK application deals with very large in memory data structures, and
> can potentially use tens or even hundreds of gigabytes of hugepage memory.

What you're doing is an unusual use case and this is open source code where 
nobody might have tested and QA'ed this yet.

So my recommendation would be adding some rte_log statements to measure the 
various steps in the process to see what's going on. Also using the Linux Perf 
framework to do low-overhead sampling-based profiling, and making sure you've 
got everything compiled with debug symbols so you can see what's consuming the 
execution time.

You might find that it makes sense to use some custom allocators like jemalloc 
alongside of the DPDK allocators, including perhaps "transparent hugepage 
mode" in your process, and some larger page sizes to reduce the number of 
pages.

You can also use this handy kernel options, hugepagesz= hugepages=N . 
This creates guaranteed-contiguous known-good hugepages during boot which 
initialize much more quickly with less trouble and glitches in my experience.

https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
https://www.kernel.org/doc/Documentation/vm/transhuge.txt

There is no one-size-fits-all solution but these are some possibilities.

Good Luck,
Matthew.

[dpdk-dev] A question about hugepage initialization time

2014-12-09 Thread Matt Laswell

Hey Folks,

Our DPDK application deals with very large in memory data structures, and
can potentially use tens or even hundreds of gigabytes of hugepage memory.
During the course of development, we've noticed that as the number of huge
pages increases, the memory initialization time during EAL init gets to be
quite long, lasting several minutes at present.  The growth in init time
doesn't appear to be linear, which is concerning.

This is a minor inconvenience for us and our customers, as memory
initialization makes our boot times a lot longer than it would otherwise
be.  Also, my experience has been that really long operations often are
hiding errors - what you think is merely a slow operation is actually a
timeout of some sort, often due to misconfiguration. This leads to two
questions:

1. Does the long initialization time suggest that there's an error
happening under the covers?
2. If not, is there any simple way that we can shorten memory
initialization time?

Thanks in advance for your insights.

--
Matt Laswell
laswell at infiniteio.com
infinite io, inc.

[dpdk-dev] DDPK use of MAP_FIXED in mmap

2014-12-09 Thread Neil Horman

On Mon, Dec 08, 2014 at 07:02:38PM +, Karmarkar Suyash wrote:
> Hello,
> 
> In DPDK when we use mmap why are we passing the MAP_FIXED flag when Linux man 
> page itself says that the option is discouraged? Any specific reason for 
> passing the MAP_FIXED flag?
> 
Because theres nothing wrong with doing so.  The man page says its discouraged
because it creates less portable applications (due to the fact that not all
operating systems support MAP_FIXED).  Given that we currently only support BSD
and Linux however, and given that MAP_FIXED was added to POSIX for
XSI compliant systems, it seems like a reasonable thing to use, as we will most
likely never run into a system that won't support it
Neil

> 
> http://linux.die.net/man/2/mmap
> 
> MAP_FIXED
> Don't interpret addr as a hint: place the mapping at exactly that address. 
> addr must be a multiple of the page size. If the memory region specified by 
> addr and len overlaps pages of any existing mapping(s), then the overlapped 
> part of the existing mapping(s) will be discarded. If the specified address 
> cannot be used, mmap() will fail. Because requiring a fixed address for a 
> mapping is less portable, the use of this option is discouraged.
> 
> 
> Regards
> Suyash Karmarkar
>

[dpdk-dev] lib: include rte_memory.h for __rte_cache_aligned

2014-12-09 Thread Neil Horman

On Tue, Dec 09, 2014 at 09:53:18AM +0100, Olivier MATZ wrote:
> Hi Neil,
> 
> On 12/08/2014 04:04 PM, Neil Horman wrote:
> >On Fri, Nov 07, 2014 at 09:28:09AM -0800, Jia Yu wrote:
> >>Include rte_memory.h for lib files that use __rte_cache_aligned
> >>attribute.
> >>
> >>Signed-off-by: Jia Yu 
> >>
> >Why?  I presume there was a build break or something.  Please repost with a
> >changelog that details what this patch is for.
> >Neil
> 
> I don't know if Yu's issue was the same, but I had a very "fun" issue
> with __rte_cache_aligned in my application. Consider the following code:
> 
>   struct per_core_foo {
>   ...
>   } __rte_cache_aligned;
> 
>   struct global_foo {
>   struct per_core_foo foo[RTE_MAX_CORE];
>   };
> 
> If __rte_cache_aligned is not defined (rte_memory.h is not included),
> the code compiles but the structure is not aligned... it defines the
> structure and creates a global variable called __rte_cache_aligned.
> And this can lead to really bad things if this code is in a .h that
> is included by files that may or may not include rte_memory.h
> 
> I have no idea about how we could prevent this issue, except using
> __attribute__((aligned(CACHE_LINE))) instead of __rte_cache_aligned.
> 
> Anyway this could probably explain the willing to include rte_memory.h
> everywhere.
> 
> Regards,
> Olivier
> 
> 

So, that is a great explination, and would be good to have in the changelog.

Also, to avoid the problem that you describe, while its preferred to have it at
the end of a struct, you can also put the alignment attribute right after the
struct keyword in gcc:
https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html#Attribute-Syntax

That seems like it would solve the problem going forward.

Neil

[dpdk-dev] [PATCH v2] app/test: fix memory needs after RTE_MAX_LCORE was increased to 128

2014-12-09 Thread Pablo de Lara

Since commit b91c67e5a693211862aa7dc3b78630b4e856c2af,
maximum number of cores is 128, which has increase
the total memory necessary for a rte_mempool structure,
as the per-lcore local cache has been doubled in size.
Therefore, eal_flags unit test was broken since it needed
to use more hugepages.

Changes in v2: Increased memory to 18MB, as that is the
actual minimum memory necessary (depending on the physical memory segments,
DPDK may need less memory)

Signed-off-by: Pablo de Lara 
---
 app/test/test_eal_flags.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 6e2a8f2..0a8269c 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -55,7 +55,7 @@
 #ifdef RTE_LIBRTE_XEN_DOM0
 #define DEFAULT_MEM_SIZE "30"
 #else
-#define DEFAULT_MEM_SIZE "8"
+#define DEFAULT_MEM_SIZE "18"
 #endif
 #define mp_flag "--proc-type=secondary"
 #define no_hpet "--no-hpet"
-- 
1.7.4.1

[dpdk-dev] [PATCH 14/15] app/test: turn off cpu flag checks for tile architecture

2014-12-09 Thread Neil Horman

On Mon, Dec 08, 2014 at 04:59:37PM +0800, Zhigang Lu wrote:
> Tile processor doesn't have CPU flag hardware registers, so this patch
> turns off cpu flag checks for tile.
> 
> Signed-off-by: Zhigang Lu 
> Signed-off-by: Cyril Chemparathy 
> ---
>  app/test/test_cpuflags.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c
> index 5aeba5d..da93af5 100644
> --- a/app/test/test_cpuflags.c
> +++ b/app/test/test_cpuflags.c
> @@ -113,7 +113,7 @@ test_cpuflags(void)
>  
>   printf("Check for ICACHE_SNOOP:\t\t");
>   CHECK_FOR_FLAG(RTE_CPUFLAG_ICACHE_SNOOP);
> -#else
> +#elif !defined(RTE_ARCH_TILE)
>   printf("Check for SSE:\t\t");
>   CHECK_FOR_FLAG(RTE_CPUFLAG_SSE);
>  
Please stop this.  It doesn't make sense for a library that supports multiple
arches, we need some way to generically test for flags that doesn't involve
forcing applications to do ton's of ifdeffing.  Perhaps rte_cpu_get_flag_enabled
needs to do a flag table lookup based on the detected arch at run time, and
return the appropriate response.  In the case of tile, it can just be an empty
table, so 0 is always returned.  But making an application responsible for doing
arch checks is a guarantee to write non-portable applications

Neil

> -- 
> 2.1.2
> 
>

[dpdk-dev] DDPK use of MAP_FIXED in mmap

2014-12-09 Thread Bruce Richardson

On Mon, Dec 08, 2014 at 07:02:38PM +, Karmarkar Suyash wrote:
> Hello,
> 
> In DPDK when we use mmap why are we passing the MAP_FIXED flag when Linux man 
> page itself says that the option is discouraged? Any specific reason for 
> passing the MAP_FIXED flag?
> 
> 
> http://linux.die.net/man/2/mmap
> 
> MAP_FIXED
> Don't interpret addr as a hint: place the mapping at exactly that address. 
> addr must be a multiple of the page size. If the memory region specified by 
> addr and len overlaps pages of any existing mapping(s), then the overlapped 
> part of the existing mapping(s) will be discarded. If the specified address 
> cannot be used, mmap() will fail. Because requiring a fixed address for a 
> mapping is less portable, the use of this option is discouraged.
> 
> 
> Regards
> Suyash Karmarkar

I won't comment on every occurance of "MAP_FIXED" in DPDK, but it's main use is
when mapping the hugepages into memory inside EAL init. In this case, we are ok 
to
use it, as we take good care to ensure that our mapping space is free. What we 
do
is, once we know how many contiguous hugepages we need to map, we request a 
mapping
from /dev/zero for that particular size. We then record the address of the 
mapping
we get, and then unmap /dev/zero again - thereby freeing up the entire address
range. At this point, we then use MAP_FIXED to explicitly mmap in the hugepages
into this region that we have just freed up - thereby guaranteeing contiguous
hugepage mappings in the correct order. [The reason for doing things this way is
that we found on some systems - particularly with 32-bit code, the regular mmaps
of pages we being done in reverse order, meaning each page became it's own 
segment].

On the other hand, it's also good to note where we don't use MAP_FIXED. We don't
use map fixed when initializing a secondary process and are mapping the hugepage
memory into it. In this case, although we know where the memory has to be 
placed,
we don't know if it is safe to use or not. Instead of using MAP_FIXED, we 
instead
hint to the kernel our preferred address and check if the request was satisfied
at that address.

Hope this clarifies things a bit,
/Bruce

[dpdk-dev] lib: include rte_memory.h for __rte_cache_aligned

2014-12-09 Thread Olivier MATZ

Hi Neil,

On 12/08/2014 04:04 PM, Neil Horman wrote:
> On Fri, Nov 07, 2014 at 09:28:09AM -0800, Jia Yu wrote:
>> Include rte_memory.h for lib files that use __rte_cache_aligned
>> attribute.
>>
>> Signed-off-by: Jia Yu 
>>
> Why?  I presume there was a build break or something.  Please repost with a
> changelog that details what this patch is for.
> Neil

I don't know if Yu's issue was the same, but I had a very "fun" issue
with __rte_cache_aligned in my application. Consider the following code:

struct per_core_foo {
...
} __rte_cache_aligned;

struct global_foo {
struct per_core_foo foo[RTE_MAX_CORE];
};

If __rte_cache_aligned is not defined (rte_memory.h is not included),
the code compiles but the structure is not aligned... it defines the
structure and creates a global variable called __rte_cache_aligned.
And this can lead to really bad things if this code is in a .h that
is included by files that may or may not include rte_memory.h

I have no idea about how we could prevent this issue, except using
__attribute__((aligned(CACHE_LINE))) instead of __rte_cache_aligned.

Anyway this could probably explain the willing to include rte_memory.h
everywhere.

Regards,
Olivier

[dpdk-dev] [PATCH v2] VFIO: Avoid to enable vfio while the module not loaded

2014-12-09 Thread Burakov, Anatoly

Hi Michael,

> 
> Yes, you are right, strncmp() will check 30 bytes if use sizeof(buffer).
> 
> But any issue of strcmp() ? This rountin cares about exactly match. I think no
> need to convert to strncmp() if it does have other issue.
> 
> > fscanf with fgets would be better too, to make sure we never go over our
> buffer size when dealing with /proc/modules.
> 
> If we use fgets, we need additional efforts to get the modname,  for
> potential overflow issue, we can limit counts of fscanf(). like below:
> 
> fscanf(fd, "%30s %*[^\n]", mod_name);
> 

As  long as it doesn't cause easy buffer overruns, I'm fine with it :-)

Thanks,
Anatoly

[dpdk-dev] [RFC PATCH 00/17] Single virtio implementation

2014-12-09 Thread Bruce Richardson

On Tue, Dec 09, 2014 at 06:40:23AM +, Ouyang, Changchun wrote:
> 
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Tuesday, December 9, 2014 2:12 PM
> > To: Ouyang, Changchun
> > Cc: Qiu, Michael; Stephen Hemminger; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio implementation
> > 
> > 2014-12-09 05:41, Ouyang, Changchun:
> > > Hi
> > >
> > > > -Original Message-
> > > > From: Qiu, Michael
> > > > Sent: Tuesday, December 9, 2014 11:23 AM
> > > > To: Ouyang, Changchun; Thomas Monjalon; Stephen Hemminger
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
> > > > implementation
> > > >
> > > > On 12/9/2014 9:11 AM, Ouyang, Changchun wrote:
> > > > > Hi Thomas,
> > > > >
> > > > >> -Original Message-
> > > > >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > >> Sent: Monday, December 8, 2014 5:31 PM
> > > > >> To: Ouyang, Changchun
> > > > >> Cc: dev at dpdk.org
> > > > >> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
> > > > >> implementation
> > > > >>
> > > > >> Hi Changchun,
> > > > >>
> > > > >> 2014-12-08 14:21, Ouyang Changchun:
> > > > >>> This patch set bases on two original RFC patch sets from Stephen
> > > > >> Hemminger[stephen at networkplumber.org]
> > > > >>> Refer to
> > > > >>> [http://dpdk.org/ml/archives/dev/2014-August/004845.html ] for
> > > > >> the original one.
> > > > >>> This patch set also resolves some conflict with latest codes and
> > > > >>> removed
> > > > >> duplicated codes.
> > > > >>
> > > > >> As you sent the patches, you appear as the author.
> > > > >> But I guess Stephen should be the author for some of them.
> > > > >> Please check who has contributed the most in each patch to decide.
> > > > > You are right, most of patches originate from Stephen's patchset,
> > > > > except for the last one, To be honest, I am ok whoever is the
> > > > > author of this patch set, :-), We could co-own the feature of
> > > > > Single virtio if you all agree with it, and I think we couldn't
> > > > > finish Such a feature without collaboration among us, this is why
> > > > > I tried to communicate
> > > > with most of you to collect more feedback, suggestion and comments
> > > > for this feature.
> > > > > Very appreciate for all kinds of feedback, suggestion here,
> > > > > especially for
> > > > patch set from Stephen.
> > > > >
> > > > > According to your request, how could we make this patch set looks
> > > > > more
> > > > like Stephen as the author?
> > > > > Currently I add Stephen as Signed-off-by list in each patch(I got
> > > > > the
> > > > agreement from Stephen before doing this :-)).
> > > >
> > > > Hi Ouyang,
> > > >
> > > > "Signed-off-by" should be added by himself, because the one who in
> > > > the Signed-off-by list should take responsibility for it(like potential
> > bugs/issues).
> > > >
> > > > Although, lots of patches are originate from Stephen, we still need
> > > > himself add this line :)
> > >
> > > Hi Thomas,
> > > It that right? I can't add Stephen into Signed-off-by list even if I
> > > have gotten the agreement from Stephen, What 's the strict rule here?
> > 
> > Stephen sent the patches with his Signed-off, then you added yours.
> > This is OK.
> > Using git am, author would have been Stephen. To change author now, you
> > can edit each commit with interactive rebase and "git commit --amend --
> > author=Stephen".
> > No need to resend now. Please check it for next version of the patchset.
> > 
> 
> So I understand correctly, Stephen need care for from patches from 1 to 16,
> I need care for the 17th patch from next version.
> What I mean "caring for" above is:  debug and validate them and send out 
> patches
> 
> Thanks
> Changchun
> 
Just to clarify Thomas point here about use of "git am". If you get a patch
from someone to test or work on, use "git am" to apply it, rather than "git 
apply",
since "git am" generates a commit in your local repo and thereby maintains the
original authorship of the patch. If you do "git apply" and subsequently commit
yourself, you - rather than the original author - will appear as the "author" of
the patch, and you need to amend the commit as Thomas suggests to fix this.

So in short:
* git am == good
* git apply == bad

Regards,
/Bruce

[dpdk-dev] error: value computed is not used

2014-12-09 Thread Qiu, Michael

Any other comments on this issue?

Thanks,
Michael

On 12/8/2014 5:07 PM, Qiu, Michael wrote:
> Hi all,
> My platform is:
>
> uname -a
> Linux suse-11-sp3 3.0.77-0.11-xen #1 SMP Tue Mar 11 16:48:56 CST 2014
> x86_64 x86_64 x86_64 GNU/Linux
>
> gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.5/lto-wrapper
> Target: x86_64-suse-linux
> Configured with: ../configure --prefix=/usr --infodir=/usr/share/info
> --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64
> --enable-languages=c,c++,objc,fortran,obj-c++,java,ada
> --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.5
> --enable-ssp --disable-libssp --disable-plugin
> --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux'
> --disable-libgcj --disable-libmudflap --with-slibdir=/lib64
> --with-system-zlib --enable-__cxa_atexit
> --enable-libstdcxx-allocator=new --disable-libstdcxx-pch
> --enable-version-specific-runtime-libs --program-suffix=-4.5
> --enable-linux-futex --without-system-libunwind --enable-gold
> --with-plugin-ld=/usr/bin/gold --with-arch-32=i586 --with-tune=generic
> --build=x86_64-suse-linux
> Thread model: posix
> gcc version 4.5.1 20101208 [gcc-4_5-branch revision 167585] (SUSE Linux)
>
> When I try to compile the source code to x86_64 linuxapp, I got this
> error message:
>
> lib/librte_pmd_enic/enic_main.c: In function ?enic_set_rsskey?:
> lib/librte_pmd_enic/enic_main.c:862:2: error: value computed is not used
>
> I dig out that, it was ome issue of  the macros rte_memcpy()
> #define rte_memcpy(dst, src, n)  \
> ((__builtin_constant_p(n)) ?  \
> memcpy((dst), (src), (n)) :  \
> rte_memcpy_func((dst), (src), (n)))
>
> When I use only (n) instead of (__builtin_constant_p(n), it will pass( I
> know that it was incorrect, just a experiment).
>
> But I try to use inline function instead of macros:
> static inline void * rte_memcpy(void *dst, const void *src, size_t n)
> {
> return __builtin_constant_p(n) ? memcpy(dst, src, n) :
>  rte_memcpy_func(dst, src, n);
> }
>
> It will pass:), and works, this could be one potential workaround fix.
>
> Who knows why? The root cause is what?
>
> I've no idea about this.
>
> Thanks,
> Michael
>

[dpdk-dev] lib: include rte_memory.h for __rte_cache_aligned

2014-12-09 Thread Jia Yu

Yes, Olivier?s observation is consistent with ours. Compilation didn?t
complain
if rte_memory.h is not included.

Currently, the lib files indirectly included rte_mbuf.h or rte_mempool.h.
These two header files already included rte_memory.h. Therefore without
this patch, things still work (as validated by parole).

For good practice, it?s better to explicitly include rte_memory.h to avoid
the problem.

Thanks,
Jia



On 12/9/14, 12:53 AM, "Olivier MATZ"  wrote:

>Hi Neil,
>
>On 12/08/2014 04:04 PM, Neil Horman wrote:
>> On Fri, Nov 07, 2014 at 09:28:09AM -0800, Jia Yu wrote:
>>> Include rte_memory.h for lib files that use __rte_cache_aligned
>>> attribute.
>>>
>>> Signed-off-by: Jia Yu 
>>>
>> Why?  I presume there was a build break or something.  Please repost
>>with a
>> changelog that details what this patch is for.
>> Neil
>
>I don't know if Yu's issue was the same, but I had a very "fun" issue
>with __rte_cache_aligned in my application. Consider the following code:
>
>   struct per_core_foo {
>   ...
>   } __rte_cache_aligned;
>
>   struct global_foo {
>   struct per_core_foo foo[RTE_MAX_CORE];
>   };
>
>If __rte_cache_aligned is not defined (rte_memory.h is not included),
>the code compiles but the structure is not aligned... it defines the
>structure and creates a global variable called __rte_cache_aligned.
>And this can lead to really bad things if this code is in a .h that
>is included by files that may or may not include rte_memory.h
>
>I have no idea about how we could prevent this issue, except using
>__attribute__((aligned(CACHE_LINE))) instead of __rte_cache_aligned.
>
>Anyway this could probably explain the willing to include rte_memory.h
>everywhere.
>
>Regards,
>Olivier
>

[dpdk-dev] [PATCH 3/4] rte_sched: don't clear statistics when read

2014-12-09 Thread Stephen Hemminger

Make rte_sched statistics API work like the ethernet statistics API.
Don't auto-clear statistics when read.

Signed-off-by: Stephen Hemminger 

--- a/lib/librte_sched/rte_sched.c  2014-12-08 09:29:49.014821607 -0800
+++ b/lib/librte_sched/rte_sched.c  2014-12-08 09:35:45.568568267 -0800
@@ -924,9 +924,8 @@ rte_sched_subport_read_stats(struct rte_
}
s = port->subport + subport_id;

-   /* Copy subport stats and clear */
-   memcpy(stats, >stats, sizeof(struct rte_sched_subport_stats));
-   memset(>stats, 0, sizeof(struct rte_sched_subport_stats));
+   /* Copy subport stats */
+   *stats = s->stats;

/* Subport TC ovesubscription status */
*tc_ov = s->tc_ov;
@@ -935,6 +934,21 @@ rte_sched_subport_read_stats(struct rte_
 }

 int
+rte_sched_subport_stats_reset(struct rte_sched_port *port,
+ uint32_t subport_id)
+{
+   struct rte_sched_subport *s;
+
+   /* Check user parameters */
+   if (port == NULL || subport_id >= port->n_subports_per_port)
+   return -1;
+
+   s = port->subport + subport_id;
+   memset(>stats, 0, sizeof(struct rte_sched_subport_stats));
+   return 0;
+}
+
+int
 rte_sched_queue_read_stats(struct rte_sched_port *port,
uint32_t queue_id,
struct rte_sched_queue_stats *stats,
@@ -953,9 +967,8 @@ rte_sched_queue_read_stats(struct rte_sc
q = port->queue + queue_id;
qe = port->queue_extra + queue_id;

-   /* Copy queue stats and clear */
-   memcpy(stats, >stats, sizeof(struct rte_sched_queue_stats));
-   memset(>stats, 0, sizeof(struct rte_sched_queue_stats));
+   /* Copy queue stats */
+   *stats = qe->stats;

/* Queue length */
*qlen = q->qw - q->qr;
@@ -963,6 +976,21 @@ rte_sched_queue_read_stats(struct rte_sc
return 0;
 }

+int
+rte_sched_queue_stats_reset(struct rte_sched_port *port,
+   uint32_t queue_id)
+{
+   struct rte_sched_queue_extra *qe;
+
+   /* Check user parameters */
+   if (port == NULL || queue_id >= rte_sched_port_queues_per_port(port))
+   return -1;
+
+   qe = port->queue_extra + queue_id;
+   memset(>stats, 0, sizeof(struct rte_sched_queue_stats));
+   return 0;
+}
+
 static inline uint32_t
 rte_sched_port_qindex(struct rte_sched_port *port, uint32_t subport, uint32_t 
pipe, uint32_t traffic_class, uint32_t queue)
 {
--- a/lib/librte_sched/rte_sched.h  2014-12-08 09:29:49.014821607 -0800
+++ b/lib/librte_sched/rte_sched.h  2014-12-08 09:30:29.426977482 -0800
@@ -312,6 +312,21 @@ rte_sched_subport_read_stats(struct rte_
struct rte_sched_subport_stats *stats,
uint32_t *tc_ov);

+
+/**
+ * Hierarchical scheduler subport statistics reset
+ *
+ * @param port
+ *   Handle to port scheduler instance
+ * @param subport_id
+ *   Subport ID
+ * @return
+ *   0 upon success, error code otherwise
+ */
+int
+rte_sched_subport_stats_reset(struct rte_sched_port *port,
+ uint32_t subport_id);
+
 /**
  * Hierarchical scheduler queue statistics read
  *
@@ -333,6 +348,20 @@ rte_sched_queue_read_stats(struct rte_sc
struct rte_sched_queue_stats *stats,
uint16_t *qlen);

+/**
+ * Hierarchical scheduler queue statistics reset
+ *
+ * @param port
+ *   Handle to port scheduler instance
+ * @param queue_id
+ *   Queue ID within port scheduler
+ * @return
+ *   0 upon success, error code otherwise
+ */
+int
+rte_sched_queue_stats_reset(struct rte_sched_port *port,
+   uint32_t queue_id);
+
 /*
  * Run-time
  *

[dpdk-dev] [PATCH 2/4] rte_sched: keep track of RED drops

2014-12-09 Thread Stephen Hemminger

Add new statistic to keep track of drops due to RED.

Signed-off-by: Stephen Hemminger 

--- a/lib/librte_sched/rte_sched.c  2014-12-08 09:28:37.810590895 -0800
+++ b/lib/librte_sched/rte_sched.c  2014-12-08 09:28:37.810590895 -0800
@@ -1028,7 +1028,9 @@ rte_sched_port_update_subport_stats(stru
 }

 static inline void
-rte_sched_port_update_subport_stats_on_drop(struct rte_sched_port *port, 
uint32_t qindex, struct rte_mbuf *pkt)
+rte_sched_port_update_subport_stats_on_drop(struct rte_sched_port *port,
+   uint32_t qindex,
+   struct rte_mbuf *pkt, uint32_t red)
 {
struct rte_sched_subport *s = port->subport + (qindex / 
rte_sched_port_queues_per_subport(port));
uint32_t tc_index = (qindex >> 2) & 0x3;
@@ -1036,6 +1038,9 @@ rte_sched_port_update_subport_stats_on_d

s->stats.n_pkts_tc_dropped[tc_index] += 1;
s->stats.n_bytes_tc_dropped[tc_index] += pkt_len;
+#ifdef RTE_SCHED_RED
+   s->stats.n_pkts_red_dropped[tc_index] += red;
+#endif
 }

 static inline void
@@ -1206,12 +1211,20 @@ rte_sched_port_enqueue_qwa(struct rte_sc
qlen = q->qw - q->qr;

/* Drop the packet (and update drop stats) when queue is full */
-   if (unlikely(rte_sched_port_red_drop(port, pkt, qindex, qlen) || (qlen 
>= qsize))) {
+   if (unlikely(rte_sched_port_red_drop(port, pkt, qindex, qlen))) {
+#ifdef RTE_SCHED_COLLECT_STATS
+   rte_sched_port_update_subport_stats_on_drop(port, qindex, pkt, 
1);
+   rte_sched_port_update_queue_stats_on_drop(port, qindex, pkt, 1);
+#endif
rte_pktmbuf_free(pkt);
+   }
+
+   if (qlen >= qsize) {
 #ifdef RTE_SCHED_COLLECT_STATS
-   rte_sched_port_update_subport_stats_on_drop(port, qindex, pkt);
-   rte_sched_port_update_queue_stats_on_drop(port, qindex, pkt);
+   rte_sched_port_update_subport_stats_on_drop(port, qindex, pkt, 
0);
+   rte_sched_port_update_queue_stats_on_drop(port, qindex, pkt, 0);
 #endif
+   rte_pktmbuf_free(pkt);
return 0;
}

--- a/lib/librte_sched/rte_sched.h  2014-12-08 09:28:37.810590895 -0800
+++ b/lib/librte_sched/rte_sched.h  2014-12-08 09:29:11.402692026 -0800
@@ -140,6 +140,9 @@ struct rte_sched_subport_stats {
  subport for each traffic class*/
uint32_t n_bytes_tc_dropped[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE]; /**< 
Number of bytes dropped by the current
   subport for each traffic class due 
to subport queues being full or congested */
+#ifdef RTE_SCHED_RED
+   uint32_t n_pkts_red_dropped[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE]; /**< 
Number of packets dropped by red */
+#endif
 };

 /** Pipe configuration parameters. The period and credits_per_period 
parameters are measured
@@ -168,7 +171,9 @@ struct rte_sched_queue_stats {
/* Packets */
uint32_t n_pkts; /**< Number of packets successfully 
written to current queue */
uint32_t n_pkts_dropped; /**< Number of packets dropped due to 
current queue being full or congested */
-
+#ifdef RTE_SCHED_RED
+   uint32_t n_pkts_red_dropped;
+#endif
/* Bytes */
uint32_t n_bytes;/**< Number of bytes successfully 
written to current queue */
uint32_t n_bytes_dropped;/**< Number of bytes dropped due to 
current queue being full or congested */

[dpdk-dev] [PATCH v3] mbuf: fix of enabling all newly added RX error flags

2014-12-09 Thread Thomas Monjalon

2014-12-09 02:14, Zhang, Helin:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2014-12-06 09:33, Helin Zhang:
> > > Before redefining mbuf structure, there was lack of free bits in 
> > > 'ol_flags'
> > > (32 bits in total) for new RX or TX flags. So it tried to reuse
> > > existant bits as most as possible, or even assigning 0 to some of bit
> > > flags. After new mbuf structure defined, there are quite a lot of free
> > > bits. So those newly added bit flags should be assigned with correct
> > > and valid bit values, and getting their names should be enabled as
> > > well. Note that 'RECIP' should be removed, as nowhere will use it.
> > > 'PKT_RX_ERR' is defined to replace all other error bit flags, e.g. MAC
> > > error, Oversize error, header buffer overflow error.
> > >
> > > Signed-off-by: Helin Zhang 
> > [...]
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -84,11 +84,6 @@ extern "C" {
> > >  #define PKT_RX_FDIR  (1ULL << 2)  /**< RX packet with FDIR
> > match indicate. */
> > >  #define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)  /**< L4 cksum of RX pkt.
> > is
> > > not OK. */  #define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)  /**< IP cksum
> > of
> > > RX pkt. is not OK. */ -#define PKT_RX_EIP_CKSUM_BAD (0ULL << 0)  /**<
> > External IP header checksum error. */
> > > -#define PKT_RX_OVERSIZE  (0ULL << 0)  /**< Num of desc of an RX
> > pkt oversize. */
> > > -#define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer
> > overflow. */
> > > -#define PKT_RX_RECIP_ERR (0ULL << 0)  /**< Hardware processing
> > error. */
> > > -#define PKT_RX_MAC_ERR   (0ULL << 0)  /**< MAC error. */
> > >  #define PKT_RX_IPV4_HDR  (1ULL << 5)  /**< RX packet with IPv4
> > header. */
> > >  #define PKT_RX_IPV4_HDR_EXT  (1ULL << 6)  /**< RX packet with
> > extended IPv4 header. */
> > >  #define PKT_RX_IPV6_HDR  (1ULL << 7)  /**< RX packet with IPv6
> > header. */
> > > @@ -99,6 +94,8 @@ extern "C" {
> > >  #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet
> > with IPv6 header. */
> > >  #define PKT_RX_FDIR_ID   (1ULL << 13) /**< FD id reported if FDIR
> > match. */
> > >  #define PKT_RX_FDIR_FLX  (1ULL << 14) /**< Flexible bytes reported
> > if FDIR match. */
> > > +#define PKT_RX_EIP_CKSUM_BAD (1ULL << 15)  /**< External IP header
> > > +checksum error. */
> > 
> > Could PKT_RX_EIP_CKSUM_BAD be aggregated with
> > PKT_RX_IP_CKSUM_BAD?
> 
> I tend to say no, but I would listen to comments from others.
> For tunneling case (e.g. IP over IP), it is a bit similar to the case of 
> L3/L4 (e.g. UDP over IP).
> For L3/L4 case, we have PKT_RX_IP_CKSUM_BAD and PKT_RX_L4_CKSUM_BAD to
> indicate the checksum error is in L3 or L4.
> So I'd prefer to have PKT_RX_IP_CKSUM_BAD and PKT_RX_EIP_CKSUM_BAD to indicate
> the checksum error is in outer or inner header.

I think OUTER_IP would be more consistent than EIP.

> Otherwise we have no chance to know where the checksum error is, based on 
> mbuf.
> 
> > The conclusion is the same: the packet is corrupted.
> > And some hardwares could not detect the encapsulation and use
> > PKT_RX_IP_CKSUM_BAD.
> 
> If the hardware don't know it is a tunneling packet, it will just treat it as 
> an IP packet. But if
> hardware supports tunneling, it would be better for apps to know that more 
> about the
> packet which can be offloaded by hardware.
> 
> > 
> > Another interesting improvement would be to have PKT_RX_IP_CKSUM_OK.
> > I think we'll have to think about this kind of flag for next version.
> 
> For checksum OK, if no 'BAD' indicated, we can assume it is OK. Any other 
> hints from you?

No, having no BAD can indicate also that it hasn't been checked (i.e. check not
enabled or not supported).

> > Note that this patch is an API change and shouldn't be applied for 1.8.0.
> > But we can do an exception as it has no impact on existing applications and
> > fixes the 0 flags.
> Agree with you!
> 
> Thank you very much for thinking so much about this!
> 
> Regards,
> Helin

-- 
Thomas

[dpdk-dev] [RFC PATCH 00/17] Single virtio implementation

2014-12-09 Thread Thomas Monjalon

2014-12-09 05:41, Ouyang, Changchun:
> Hi
> 
> > -Original Message-
> > From: Qiu, Michael
> > Sent: Tuesday, December 9, 2014 11:23 AM
> > To: Ouyang, Changchun; Thomas Monjalon; Stephen Hemminger
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio implementation
> > 
> > On 12/9/2014 9:11 AM, Ouyang, Changchun wrote:
> > > Hi Thomas,
> > >
> > >> -Original Message-
> > >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > >> Sent: Monday, December 8, 2014 5:31 PM
> > >> To: Ouyang, Changchun
> > >> Cc: dev at dpdk.org
> > >> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
> > >> implementation
> > >>
> > >> Hi Changchun,
> > >>
> > >> 2014-12-08 14:21, Ouyang Changchun:
> > >>> This patch set bases on two original RFC patch sets from Stephen
> > >> Hemminger[stephen at networkplumber.org]
> > >>> Refer to [http://dpdk.org/ml/archives/dev/2014-August/004845.html ]
> > >>> for
> > >> the original one.
> > >>> This patch set also resolves some conflict with latest codes and
> > >>> removed
> > >> duplicated codes.
> > >>
> > >> As you sent the patches, you appear as the author.
> > >> But I guess Stephen should be the author for some of them.
> > >> Please check who has contributed the most in each patch to decide.
> > > You are right, most of patches originate from Stephen's patchset,
> > > except for the last one, To be honest, I am ok whoever is the author
> > > of this patch set, :-), We could co-own the feature of Single virtio
> > > if you all agree with it, and I think we couldn't finish Such a
> > > feature without collaboration among us, this is why I tried to communicate
> > with most of you to collect more feedback, suggestion and comments for this
> > feature.
> > > Very appreciate for all kinds of feedback, suggestion here, especially for
> > patch set from Stephen.
> > >
> > > According to your request, how could we make this patch set looks more
> > like Stephen as the author?
> > > Currently I add Stephen as Signed-off-by list in each patch(I got the
> > agreement from Stephen before doing this :-)).
> > 
> > Hi Ouyang,
> > 
> > "Signed-off-by" should be added by himself, because the one who in the
> > Signed-off-by list should take responsibility for it(like potential 
> > bugs/issues).
> > 
> > Although, lots of patches are originate from Stephen, we still need himself
> > add this line :)
> 
> Hi Thomas, 
> It that right? I can't add Stephen into Signed-off-by list even if I have 
> gotten the agreement from Stephen,
> What 's the strict rule here?

Stephen sent the patches with his Signed-off, then you added yours.
This is OK.
Using git am, author would have been Stephen. To change author now, you can
edit each commit with interactive rebase and "git commit --amend 
--author=Stephen".
No need to resend now. Please check it for next version of the patchset.

-- 
Thomas

[dpdk-dev] [RFC PATCH 00/17] Single virtio implementation

2014-12-09 Thread Ouyang, Changchun



> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, December 9, 2014 2:12 PM
> To: Ouyang, Changchun
> Cc: Qiu, Michael; Stephen Hemminger; dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio implementation
> 
> 2014-12-09 05:41, Ouyang, Changchun:
> > Hi
> >
> > > -Original Message-
> > > From: Qiu, Michael
> > > Sent: Tuesday, December 9, 2014 11:23 AM
> > > To: Ouyang, Changchun; Thomas Monjalon; Stephen Hemminger
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
> > > implementation
> > >
> > > On 12/9/2014 9:11 AM, Ouyang, Changchun wrote:
> > > > Hi Thomas,
> > > >
> > > >> -Original Message-
> > > >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > >> Sent: Monday, December 8, 2014 5:31 PM
> > > >> To: Ouyang, Changchun
> > > >> Cc: dev at dpdk.org
> > > >> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
> > > >> implementation
> > > >>
> > > >> Hi Changchun,
> > > >>
> > > >> 2014-12-08 14:21, Ouyang Changchun:
> > > >>> This patch set bases on two original RFC patch sets from Stephen
> > > >> Hemminger[stephen at networkplumber.org]
> > > >>> Refer to
> > > >>> [http://dpdk.org/ml/archives/dev/2014-August/004845.html ] for
> > > >> the original one.
> > > >>> This patch set also resolves some conflict with latest codes and
> > > >>> removed
> > > >> duplicated codes.
> > > >>
> > > >> As you sent the patches, you appear as the author.
> > > >> But I guess Stephen should be the author for some of them.
> > > >> Please check who has contributed the most in each patch to decide.
> > > > You are right, most of patches originate from Stephen's patchset,
> > > > except for the last one, To be honest, I am ok whoever is the
> > > > author of this patch set, :-), We could co-own the feature of
> > > > Single virtio if you all agree with it, and I think we couldn't
> > > > finish Such a feature without collaboration among us, this is why
> > > > I tried to communicate
> > > with most of you to collect more feedback, suggestion and comments
> > > for this feature.
> > > > Very appreciate for all kinds of feedback, suggestion here,
> > > > especially for
> > > patch set from Stephen.
> > > >
> > > > According to your request, how could we make this patch set looks
> > > > more
> > > like Stephen as the author?
> > > > Currently I add Stephen as Signed-off-by list in each patch(I got
> > > > the
> > > agreement from Stephen before doing this :-)).
> > >
> > > Hi Ouyang,
> > >
> > > "Signed-off-by" should be added by himself, because the one who in
> > > the Signed-off-by list should take responsibility for it(like potential
> bugs/issues).
> > >
> > > Although, lots of patches are originate from Stephen, we still need
> > > himself add this line :)
> >
> > Hi Thomas,
> > It that right? I can't add Stephen into Signed-off-by list even if I
> > have gotten the agreement from Stephen, What 's the strict rule here?
> 
> Stephen sent the patches with his Signed-off, then you added yours.
> This is OK.
> Using git am, author would have been Stephen. To change author now, you
> can edit each commit with interactive rebase and "git commit --amend --
> author=Stephen".
> No need to resend now. Please check it for next version of the patchset.
> 

So I understand correctly, Stephen need care for from patches from 1 to 16,
I need care for the 17th patch from next version.
What I mean "caring for" above is:  debug and validate them and send out patches

Thanks
Changchun

[dpdk-dev] [PATCH 11/15] eal/tile: add EAL support for global mPIPE initialization

2014-12-09 Thread Neil Horman

On Mon, Dec 08, 2014 at 01:32:30PM -0800, Cyril Chemparathy wrote:
> Hi Neil,
> 
> 
> On 12/8/2014 12:03 PM, Neil Horman wrote:
> >On Mon, Dec 08, 2014 at 04:59:34PM +0800, Zhigang Lu wrote:
> >>The TileGx mPIPE hardware provides Ethernet connectivity,
> >>packet classification, and packet load balancing services.
> >>
> >>Signed-off-by: Zhigang Lu 
> >>Signed-off-by: Cyril Chemparathy 
> >>---
> >>  .../common/include/arch/tile/rte_mpipe.h   |  67 ++
> >>  lib/librte_eal/linuxapp/eal/Makefile   |   3 +
> >>  lib/librte_eal/linuxapp/eal/eal.c  |   9 ++
> >>  lib/librte_eal/linuxapp/eal/eal_mpipe_tile.c   | 147 
> >> +
> >>  mk/rte.app.mk  |   4 +
> >>  5 files changed, 230 insertions(+)
> >>  create mode 100644 lib/librte_eal/common/include/arch/tile/rte_mpipe.h
> >>  create mode 100644 lib/librte_eal/linuxapp/eal/eal_mpipe_tile.c
> >>
> >This seems like the wrong way to implement mpip access.  If you want to use 
> >it
> >for networking access, you should create a pmd to talk to it.  If you just 
> >want
> >raw gxio access, you already have a gxio library that applications can 
> >interface
> >to.  Theres no need to create addtional DPDK api services just to wrap it up,
> >especially given that those surfaces won't exist outside of the tile arch 
> >(i.e.
> >this allows for the creation of very non-portable applications).
> 
> Thanks for the taking a look.
> 
> The mPIPE hardware block includes hardware managed buffer pools, which we've
> abstracted under the mempool interface in the very next patch in the series.
> As a result of this mempool dependency, mPIPE needs to be globally
> initialized on TileGX architecture.
> 
Ok, but you already have a mechanism to do that, in that the application can use
the gxio library to do the initialization itself, and you don't need to provide
additional wrapper calls that are arch specific in a common library.

> The alternative is to not include support for the hardware managed buffer
> pool, but that decision incurs a significant performance hit.
> 
No, there are plenty of alternatives to just not doing it.  In fact, you can
create a constructor to do this initialization work, and manage the instance
id's so that the user never has to see it

Neil

> Thanks
> -- Cyril.
>

[dpdk-dev] [RFC PATCH 00/17] Single virtio implementation

2014-12-09 Thread Ouyang, Changchun

Hi

> -Original Message-
> From: Qiu, Michael
> Sent: Tuesday, December 9, 2014 11:23 AM
> To: Ouyang, Changchun; Thomas Monjalon; Stephen Hemminger
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio implementation
> 
> On 12/9/2014 9:11 AM, Ouyang, Changchun wrote:
> > Hi Thomas,
> >
> >> -Original Message-
> >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> >> Sent: Monday, December 8, 2014 5:31 PM
> >> To: Ouyang, Changchun
> >> Cc: dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio
> >> implementation
> >>
> >> Hi Changchun,
> >>
> >> 2014-12-08 14:21, Ouyang Changchun:
> >>> This patch set bases on two original RFC patch sets from Stephen
> >> Hemminger[stephen at networkplumber.org]
> >>> Refer to [http://dpdk.org/ml/archives/dev/2014-August/004845.html ]
> >>> for
> >> the original one.
> >>> This patch set also resolves some conflict with latest codes and
> >>> removed
> >> duplicated codes.
> >>
> >> As you sent the patches, you appear as the author.
> >> But I guess Stephen should be the author for some of them.
> >> Please check who has contributed the most in each patch to decide.
> > You are right, most of patches originate from Stephen's patchset,
> > except for the last one, To be honest, I am ok whoever is the author
> > of this patch set, :-), We could co-own the feature of Single virtio
> > if you all agree with it, and I think we couldn't finish Such a
> > feature without collaboration among us, this is why I tried to communicate
> with most of you to collect more feedback, suggestion and comments for this
> feature.
> > Very appreciate for all kinds of feedback, suggestion here, especially for
> patch set from Stephen.
> >
> > According to your request, how could we make this patch set looks more
> like Stephen as the author?
> > Currently I add Stephen as Signed-off-by list in each patch(I got the
> agreement from Stephen before doing this :-)).
> 
> Hi Ouyang,
> 
> "Signed-off-by" should be added by himself, because the one who in the
> Signed-off-by list should take responsibility for it(like potential 
> bugs/issues).
> 
> Although, lots of patches are originate from Stephen, we still need himself
> add this line :)

Hi Thomas, 
It that right? I can't add Stephen into Signed-off-by list even if I have 
gotten the agreement from Stephen,
What 's the strict rule here?

Thanks
Changchun

[dpdk-dev] [PATCH v2 02/28] ethdev: Remove assumption that port will not be detached

2014-12-09 Thread Zhang, Helin



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tetsuya Mukawa
> Sent: Tuesday, December 9, 2014 11:42 AM
> To: dev at dpdk.org
> Cc: nakajima.yoshihiro at lab.ntt.co.jp; menrigh at brocade.com;
> masutani.hitoshi at lab.ntt.co.jp
> Subject: [dpdk-dev] [PATCH v2 02/28] ethdev: Remove assumption that port will
> not be detached
> 
> To remove assumption, do like followings.
> 
> - Add 'attached' member to rte_eth_dev structure.
>   This member is used for indicating the port is attached, or not.
> - Add rte_eth_dev_allocate_new_port().
>   This function is used for allocating new port.
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_ether/rte_ethdev.c | 248 
> --
>  lib/librte_ether/rte_ethdev.h |   5 +
>  2 files changed, 149 insertions(+), 104 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c 
> index
> 95f2ceb..9f713ae 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -175,6 +175,16 @@ enum {
>   STAT_QMAP_RX
>  };
> 
> +enum {
> + DEV_VALID = 0,
> + DEV_INVALID
> +};
Would it be safer to define DEV_INVALID = 0, then DEV_VALID after it?
Or even as below?
Enum {
DEV_UNKNOWN = 0,
DEV_VALID,
DEV_INVALID
};

Regards,
Helin

> +
> +enum {
> + DEV_DISCONNECTED = 0,
> + DEV_CONNECTED
> +};
> +
>  static inline void
>  rte_eth_dev_data_alloc(void)
>  {
> @@ -201,19 +211,34 @@ rte_eth_dev_allocated(const char *name)  {
>   unsigned i;
> 
> - for (i = 0; i < nb_ports; i++) {
> - if (strcmp(rte_eth_devices[i].data->name, name) == 0)
> + for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> + if ((rte_eth_devices[i].attached == DEV_CONNECTED)
> + && strcmp(rte_eth_devices[i].data->name,
> + name) == 0)
>   return _eth_devices[i];
>   }
>   return NULL;
>  }
> 
> +static uint8_t
> +rte_eth_dev_allocate_new_port(void)
> +{
> + unsigned i;
> +
> + for (i = 0; i < RTE_MAX_ETHPORTS; i++)
> + if (rte_eth_devices[i].attached == DEV_DISCONNECTED)
> + return i;
> + return RTE_MAX_ETHPORTS;
> +}
> +
>  struct rte_eth_dev *
>  rte_eth_dev_allocate(const char *name)
>  {
> + uint8_t port_id;
>   struct rte_eth_dev *eth_dev;
> 
> - if (nb_ports == RTE_MAX_ETHPORTS) {
> + port_id = rte_eth_dev_allocate_new_port();
> + if (port_id == RTE_MAX_ETHPORTS) {
>   PMD_DEBUG_TRACE("Reached maximum number of Ethernet
> ports\n");
>   return NULL;
>   }
> @@ -226,10 +251,12 @@ rte_eth_dev_allocate(const char *name)
>   return NULL;
>   }
> 
> - eth_dev = _eth_devices[nb_ports];
> - eth_dev->data = _eth_dev_data[nb_ports];
> + eth_dev = _eth_devices[port_id];
> + eth_dev->data = _eth_dev_data[port_id];
>   snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s",
> name);
> - eth_dev->data->port_id = nb_ports++;
> + eth_dev->data->port_id = port_id;
> + eth_dev->attached = DEV_CONNECTED;
> + nb_ports++;
>   return eth_dev;
>  }
> 
> @@ -283,6 +310,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
>   (unsigned) pci_dev->id.device_id);
>   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
>   rte_free(eth_dev->data->dev_private);
> + eth_dev->attached = DEV_DISCONNECTED;
>   nb_ports--;
>   return diag;
>  }
> @@ -308,10 +336,22 @@ rte_eth_driver_register(struct eth_driver *eth_drv)
>   rte_eal_pci_register(_drv->pci_drv);
>  }
> 
> +static int
> +rte_eth_dev_validate_port(uint8_t port_id) {
> + if (port_id >= RTE_MAX_ETHPORTS)
> + return DEV_INVALID;
> +
> + if (rte_eth_devices[port_id].attached == DEV_CONNECTED)
> + return DEV_VALID;
> + else
> + return DEV_INVALID;
> +}
> +
>  int
>  rte_eth_dev_socket_id(uint8_t port_id)
>  {
> - if (port_id >= nb_ports)
> + if (rte_eth_dev_validate_port(port_id) == DEV_INVALID)
>   return -1;
>   return rte_eth_devices[port_id].pci_dev->numa_node;
>  }
> @@ -369,7 +409,7 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t
> rx_queue_id)
>* in a multi-process setup*/
>   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
> 
> - if (port_id >= nb_ports) {
> + if (rte_eth_dev_validate_port(port_id) == DEV_INVALID) {
>   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
>   return -EINVAL;
>   }
> @@ -395,7 +435,7 @@ rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t
> rx_queue_id)
>* in a multi-process setup*/
>   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
> 
> - if (port_id >= nb_ports) {
> + if (rte_eth_dev_validate_port(port_id) == DEV_INVALID) {
>   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
>   return

[dpdk-dev] [RFC PATCH 00/17] Single virtio implementation

2014-12-09 Thread Qiu, Michael

On 12/9/2014 9:11 AM, Ouyang, Changchun wrote:
> Hi Thomas,
>
>> -Original Message-
>> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>> Sent: Monday, December 8, 2014 5:31 PM
>> To: Ouyang, Changchun
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio implementation
>>
>> Hi Changchun,
>>
>> 2014-12-08 14:21, Ouyang Changchun:
>>> This patch set bases on two original RFC patch sets from Stephen
>> Hemminger[stephen at networkplumber.org]
>>> Refer to [http://dpdk.org/ml/archives/dev/2014-August/004845.html ] for
>> the original one.
>>> This patch set also resolves some conflict with latest codes and removed
>> duplicated codes.
>>
>> As you sent the patches, you appear as the author.
>> But I guess Stephen should be the author for some of them.
>> Please check who has contributed the most in each patch to decide.
> You are right, most of patches originate from Stephen's patchset, except for 
> the last one,
> To be honest, I am ok whoever is the author of this patch set, :-),
> We could co-own the feature of Single virtio if you all agree with it, and I 
> think we couldn't finish
> Such a feature without collaboration among us, this is why I tried to 
> communicate with most of you 
> to collect more feedback, suggestion and comments for this feature.
> Very appreciate for all kinds of feedback, suggestion here, especially for 
> patch set from Stephen. 
>
> According to your request, how could we make this patch set looks more like 
> Stephen as the author? 
> Currently I add Stephen as Signed-off-by list in each patch(I got the 
> agreement from Stephen before doing this :-)).

Hi Ouyang,

"Signed-off-by" should be added by himself, because the one who in the 
Signed-off-by list should take responsibility for it(like potential 
bugs/issues).

Although, lots of patches are originate from Stephen, we still need himself add 
this line :)

If DPDK community's Signed-off-by" rule is different from linux(qemu, etc.), 
please ignore my comment :)

Thanks,
Michael 

> Need I send all patchset to Stephen and let Stephen send out them to dpdk.org?
> Or any other better solution?
> If you has better suggestion, I assume it works for all subsequent RFC and 
> normal patch set.
>  
> Any other suggestions are welcome.
>
> Thanks
> Changchun
>
>
>
>

[dpdk-dev] [PATCH v2 2/2] mbuf: assign valid bit values for some RX and TX flags

2014-12-09 Thread Zhang, Helin



> -Original Message-
> From: Ananyev, Konstantin
> Sent: Monday, December 8, 2014 6:55 PM
> To: Zhang, Helin; dev at dpdk.org
> Cc: Cao, Waterman; Cao, Min; Wu, Jingjing; Liu, Jijiang;
> olivier.matz at 6wind.com
> Subject: RE: [PATCH v2 2/2] mbuf: assign valid bit values for some RX and TX
> flags
> 
> 
> 
> > -Original Message-
> > From: Zhang, Helin
> > Sent: Saturday, December 06, 2014 1:08 AM
> > To: Ananyev, Konstantin; dev at dpdk.org
> > Cc: Cao, Waterman; Cao, Min; Wu, Jingjing; Liu, Jijiang;
> > olivier.matz at 6wind.com
> > Subject: RE: [PATCH v2 2/2] mbuf: assign valid bit values for some RX
> > and TX flags
> >
> >
> >
> > > -Original Message-
> > > From: Zhang, Helin
> > > Sent: Saturday, December 6, 2014 8:40 AM
> > > To: Ananyev, Konstantin; dev at dpdk.org
> > > Cc: Cao, Waterman; Cao, Min; Wu, Jingjing; Liu, Jijiang;
> > > olivier.matz at 6wind.com
> > > Subject: RE: [PATCH v2 2/2] mbuf: assign valid bit values for some
> > > RX and TX flags
> > >
> > > OK. I will send out another patch according to your comments. Thanks a 
> > > lot!
> > >
> > > Regards,
> > > Helin
> > >
> > > > -Original Message-
> > > > From: Ananyev, Konstantin
> > > > Sent: Friday, December 5, 2014 6:50 PM
> > > > To: Zhang, Helin; dev at dpdk.org
> > > > Cc: Cao, Waterman; Cao, Min; Wu, Jingjing; Liu, Jijiang;
> > > > olivier.matz at 6wind.com
> > > > Subject: RE: [PATCH v2 2/2] mbuf: assign valid bit values for some
> > > > RX and TX flags
> > > >
> > > > Hi Helin,
> > > >
> > > > > -Original Message-
> > > > > From: Zhang, Helin
> > > > > Sent: Friday, December 05, 2014 1:46 AM
> > > > > To: dev at dpdk.org
> > > > > Cc: Cao, Waterman; Cao, Min; Wu, Jingjing; Liu, Jijiang;
> > > > > Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin
> > > > > Subject: [PATCH v2 2/2] mbuf: assign valid bit values for some
> > > > > RX and TX flags
> > > > >
> > > > > Before redefining mbuf structure, there was lack of free bits in
> > > > > 'ol_flags' (32 bits in total) for new RX or TX flags. So it
> > > > > tried to reuse existant bits as most as possible, or even
> > > > > assigning 0 to some of bit flags. After new mbuf structure
> > > > > defined, there are quite a lot of free bits. So those newly
> > > > > added bit flags should be assigned with correct and valid bit
> > > > > values, and getting their names should be enabled as well. Note
> > > > > that 'RECIP' should be removed, as nowhere will use it.
> > > > >
> > > > > Signed-off-by: Helin Zhang 
> > > > > ---
> > > > >  lib/librte_mbuf/rte_mbuf.c |  9 -
> > > > > lib/librte_mbuf/rte_mbuf.h
> > > > > | 19 +--
> > > > >  2 files changed, 13 insertions(+), 15 deletions(-)
> > > > >
> > > > > v2 changes:
> > > > > * Removed error flag of 'ECIPE' processing only in mbuf. All other 
> > > > > error
> flags
> > > > >   were added back.
> > > > > * Assigned error flags with correct and valid values, as their 
> > > > > previous
> values
> > > > >   were invalid.
> > > > > * Enabled getting all error flag names.
> > > > >
> > > > > diff --git a/lib/librte_mbuf/rte_mbuf.c
> > > > > b/lib/librte_mbuf/rte_mbuf.c index 87c2963..3ce7c8d 100644
> > > > > --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > @@ -210,11 +210,10 @@ const char
> > > > > *rte_get_rx_ol_flag_name(uint64_t
> > > > mask)
> > > > >   case PKT_RX_FDIR: return "PKT_RX_FDIR";
> > > > >   case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> > > > >   case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> > > > > - /* case PKT_RX_EIP_CKSUM_BAD: return
> "PKT_RX_EIP_CKSUM_BAD"; */
> > > > > - /* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> > > > > - /* case PKT_RX_HBUF_OVERFLOW: return
> "PKT_RX_HBUF_OVERFLOW";
> > > > */
> > > > > - /* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> > > > > - /* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> > > > > + case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD";
> > > > > + case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE";
> > > > > + case PKT_RX_HBUF_OVERFLOW: return
> > > "PKT_RX_HBUF_OVERFLOW";
> > > > > + case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR";
> > > > >   case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
> > > > >   case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
> > > > >   case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR"; diff --git
> > > > > a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index
> > > > > 2e5fce5..c9591c0 100644
> > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > @@ -84,11 +84,6 @@ extern "C" {
> > > > >  #define PKT_RX_FDIR  (1ULL << 2)  /**< RX packet with
> FDIR
> > > > match indicate. */
> > > > >  #define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)  /**< L4 cksum of RX
> pkt.
> > > > is
> > > > > not OK. */  #define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)  /**< IP
> > > > > cksum
> > > > of
> > >

[dpdk-dev] [PATCH v3] mbuf: fix of enabling all newly added RX error flags

2014-12-09 Thread Zhang, Helin



> -Original Message-
> From: Ananyev, Konstantin
> Sent: Monday, December 8, 2014 6:45 PM
> To: Zhang, Helin; dev at dpdk.org
> Cc: Cao, Waterman; Cao, Min; olivier.matz at 6wind.com
> Subject: RE: [PATCH v3] mbuf: fix of enabling all newly added RX error flags
> 
> Hi Helin,
> 
> > -Original Message-
> > From: Zhang, Helin
> > Sent: Saturday, December 06, 2014 1:34 AM
> > To: dev at dpdk.org
> > Cc: Cao, Waterman; Cao, Min; olivier.matz at 6wind.com; Ananyev,
> > Konstantin; Zhang, Helin
> > Subject: [PATCH v3] mbuf: fix of enabling all newly added RX error
> > flags
> >
> > Before redefining mbuf structure, there was lack of free bits in 'ol_flags'
> > (32 bits in total) for new RX or TX flags. So it tried to reuse
> > existant bits as most as possible, or even assigning 0 to some of bit
> > flags. After new mbuf structure defined, there are quite a lot of free
> > bits. So those newly added bit flags should be assigned with correct
> > and valid bit values, and getting their names should be enabled as
> > well. Note that 'RECIP' should be removed, as nowhere will use it.
> > 'PKT_RX_ERR' is defined to replace all other error bit flags, e.g. MAC
> > error, Oversize error, header buffer overflow error.
> >
> > Signed-off-by: Helin Zhang 
> > ---
> >  lib/librte_mbuf/rte_mbuf.c  |  7 ++-
> >  lib/librte_mbuf/rte_mbuf.h  |  7 ++-
> >  lib/librte_pmd_i40e/i40e_rxtx.c | 15 ---
> >  3 files changed, 8 insertions(+), 21 deletions(-)
> >
> > v2 changes:
> > * Removed error flag of 'ECIPE' processing only in both i40e PMD and mbuf.
> All
> >   other error flags were added back.
> > * Assigned error flags with correct and valid values, as their previous 
> > values
> >   were invalid.
> > * Enabled getting all error flag names.
> >
> > v3 changes:
> > * 'PKT_RX_ERR' is defined to replace error bit flags of MAC error, Oversize
> >   error, header buffer overflow error.
> > * Removed assigning values to PKT_TX_* bit flags, as it has already been
> done
> >   in another packet set.
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > index 1b14e02..5e6b5d0 100644
> > --- a/lib/librte_mbuf/rte_mbuf.c
> > +++ b/lib/librte_mbuf/rte_mbuf.c
> > @@ -210,11 +210,8 @@ const char *rte_get_rx_ol_flag_name(uint64_t
> mask)
> > case PKT_RX_FDIR: return "PKT_RX_FDIR";
> > case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> > case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> > -   /* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
> > -   /* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> > -   /* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW";
> */
> > -   /* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> > -   /* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> > +   case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD";
> > +   case PKT_RX_ERR: return "PKT_RX_ERR";
> > case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
> > case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
> > case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR"; diff --git
> > a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index
> > efdefc4..5e647a9 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -84,11 +84,6 @@ extern "C" {
> >  #define PKT_RX_FDIR  (1ULL << 2)  /**< RX packet with FDIR
> match indicate. */
> >  #define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)  /**< L4 cksum of RX pkt.
> is
> > not OK. */  #define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)  /**< IP cksum
> of
> > RX pkt. is not OK. */ -#define PKT_RX_EIP_CKSUM_BAD (0ULL << 0)  /**<
> External IP header checksum error. */
> > -#define PKT_RX_OVERSIZE  (0ULL << 0)  /**< Num of desc of an RX
> pkt oversize. */
> > -#define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer
> overflow. */
> > -#define PKT_RX_RECIP_ERR (0ULL << 0)  /**< Hardware processing
> error. */
> > -#define PKT_RX_MAC_ERR   (0ULL << 0)  /**< MAC error. */
> >  #define PKT_RX_IPV4_HDR  (1ULL << 5)  /**< RX packet with IPv4
> header. */
> >  #define PKT_RX_IPV4_HDR_EXT  (1ULL << 6)  /**< RX packet with
> extended IPv4 header. */
> >  #define PKT_RX_IPV6_HDR  (1ULL << 7)  /**< RX packet with IPv6
> header. */
> > @@ -99,6 +94,8 @@ extern "C" {
> >  #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet
> with IPv6 header. */
> >  #define PKT_RX_FDIR_ID   (1ULL << 13) /**< FD id reported if FDIR
> match. */
> >  #define PKT_RX_FDIR_FLX  (1ULL << 14) /**< Flexible bytes reported
> if FDIR match. */
> > +#define PKT_RX_EIP_CKSUM_BAD (1ULL << 15)  /**< External IP header
> checksum error. */
> > +#define PKT_RX_ERR   (1ULL << 16)  /**< Other errors, e.g.
> MAC error. */
> >  /* add new RX flags here */
> >
> >  /* add new TX flags here */
> > diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c
> > b/lib/librte_pmd_i40e/i40e_rxtx.c index 2beae3c..5d99bd2 100644
> > --- a/lib/librte_pmd_i40e/i40e_rxtx.c

[dpdk-dev] [PATCH v3] mbuf: fix of enabling all newly added RX error flags

2014-12-09 Thread Zhang, Helin

Hi Thomas

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, December 8, 2014 6:51 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] mbuf: fix of enabling all newly added RX
> error flags
> 
> Hi Helin,
> 
> 2014-12-06 09:33, Helin Zhang:
> > Before redefining mbuf structure, there was lack of free bits in 'ol_flags'
> > (32 bits in total) for new RX or TX flags. So it tried to reuse
> > existant bits as most as possible, or even assigning 0 to some of bit
> > flags. After new mbuf structure defined, there are quite a lot of free
> > bits. So those newly added bit flags should be assigned with correct
> > and valid bit values, and getting their names should be enabled as
> > well. Note that 'RECIP' should be removed, as nowhere will use it.
> > 'PKT_RX_ERR' is defined to replace all other error bit flags, e.g. MAC
> > error, Oversize error, header buffer overflow error.
> >
> > Signed-off-by: Helin Zhang 
> [...]
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -84,11 +84,6 @@ extern "C" {
> >  #define PKT_RX_FDIR  (1ULL << 2)  /**< RX packet with FDIR
> match indicate. */
> >  #define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)  /**< L4 cksum of RX pkt.
> is
> > not OK. */  #define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)  /**< IP cksum
> of
> > RX pkt. is not OK. */ -#define PKT_RX_EIP_CKSUM_BAD (0ULL << 0)  /**<
> External IP header checksum error. */
> > -#define PKT_RX_OVERSIZE  (0ULL << 0)  /**< Num of desc of an RX
> pkt oversize. */
> > -#define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer
> overflow. */
> > -#define PKT_RX_RECIP_ERR (0ULL << 0)  /**< Hardware processing
> error. */
> > -#define PKT_RX_MAC_ERR   (0ULL << 0)  /**< MAC error. */
> >  #define PKT_RX_IPV4_HDR  (1ULL << 5)  /**< RX packet with IPv4
> header. */
> >  #define PKT_RX_IPV4_HDR_EXT  (1ULL << 6)  /**< RX packet with
> extended IPv4 header. */
> >  #define PKT_RX_IPV6_HDR  (1ULL << 7)  /**< RX packet with IPv6
> header. */
> > @@ -99,6 +94,8 @@ extern "C" {
> >  #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet
> with IPv6 header. */
> >  #define PKT_RX_FDIR_ID   (1ULL << 13) /**< FD id reported if FDIR
> match. */
> >  #define PKT_RX_FDIR_FLX  (1ULL << 14) /**< Flexible bytes reported
> if FDIR match. */
> > +#define PKT_RX_EIP_CKSUM_BAD (1ULL << 15)  /**< External IP header
> > +checksum error. */
> 
> Could PKT_RX_EIP_CKSUM_BAD be aggregated with
> PKT_RX_IP_CKSUM_BAD?
I tend to say no, but I would listen to comments from others.
For tunneling case (e.g. IP over IP), it is a bit similar to the case of L3/L4 
(e.g. UDP over IP).
For L3/L4 case, we have PKT_RX_IP_CKSUM_BAD and PKT_RX_L4_CKSUM_BAD to
indicate the checksum error is in L3 or L4.
So I'd prefer to have PKT_RX_IP_CKSUM_BAD and PKT_RX_EIP_CKSUM_BAD to indicate
the checksum error is in outer or inner header.
Otherwise we have no chance to know where the checksum error is, based on mbuf.

> The conclusion is the same: the packet is corrupted.
> And some hardwares could not detect the encapsulation and use
> PKT_RX_IP_CKSUM_BAD.
If the hardware don't know it is a tunneling packet, it will just treat it as 
an IP packet. But if
hardware supports tunneling, it would be better for apps to know that more 
about the
packet which can be offloaded by hardware.

> 
> Another interesting improvement would be to have PKT_RX_IP_CKSUM_OK.
> I think we'll have to think about this kind of flag for next version.
For checksum OK, if no 'BAD' indicated, we can assume it is OK. Any other hints 
from you?

> 
> Note that this patch is an API change and shouldn't be applied for 1.8.0.
> But we can do an exception as it has no impact on existing applications and
> fixes the 0 flags.
Agree with you!

> 
> --
> Thomas

Thank you very much for thinking so much about this!

Regards,
Helin

94 matches

Mail list logo