date:20151019

[dpdk-dev] Question about unsupported transceivers

2015-10-19 Thread Alex Forster

On 10/15/15, 3:53 PM, "Alexander Duyck"  wrote:


>>> It looks like you are probably seeing interfaces be unbound and then
>>> rebound.  As such you are likely pushing things outside of the array
>>> boundary.  One solution might just be to at more ",1"s if you are only
>>> going to be doing this kind of thing at boot up.  The upper limit for
>>> the array is 32 entries so as long as you only are setting this up once
>>> you could probably get away with that.
>>>
>>> An alternative would be to modify the definition of the parameter in
>>> ixgbe_param.c.  If you look through the file you should fine several
>>> likes like below:
>>> struct ixgbe_option opt = {
>>> .type = enable_option,
>>> .name = "allow_unsupported_sfp",
>>> .err  = "defaulting to Disabled",
>>> .def  = OPTION_DISABLED
>>> };
>>>
>>> If you modify the .def value to "OPTION_ENABLED", and then rebuild and
>>> reinstall your driver you should be able have it install without any
>>> issues.
>>>
>>> - Alex
>>>
>> Yeah, I've had roughly the same thought process since you mentioned the
>> args array. My first idea was "maybe the driver can't fit all of my 1's"
>> but I saw it was defined at 32. Then I decided to just patch the whole
>> enable_unsupported_sfp option out
>> https://gist.github.com/AlexForster/112fd822704caf804849 but I'm still
>> failing.
>>
>> I've been digging a bit, and I'm failing here in ixgbe_main.c...
>>
>> /* reset_hw fills in the perm_addr as well */
>> hw->phy.reset_if_overtemp = true;
>> err = hw->mac.ops.reset_hw(hw);
>> hw->phy.reset_if_overtemp = false;
>> if (err == IXGBE_ERR_SFP_NOT_PRESENT) {
>>  err = IXGBE_SUCCESS;
>> } else if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) {
>>  e_dev_err("failed to load because an unsupported SFP+ or QSFP "
>>"module type was detected.\n");
>>  e_dev_err("Reload the driver after installing a supported "
>>"module.\n");
>>  goto err_sw_init;
>> } else if (err) {
>>  e_dev_err("HW Init failed: %d\n", err);
>>  goto err_sw_init;
>> }
>>
>>
>> I've attempted a hand-stacktrace and came up with the following...
>>
>> ixgbe_82599.c at 1016
>>   * ixgbe_reset_hw_82599() is defined
>>   * calls phy->ops.init() which potentially returns
>> IXGBE_ERR_SFP_NOT_SUPPORTED
>>
>> ixgbe_82599.c at 102
>>   * ixgbe_init_phy_ops_82599() is defined
>>   * IXGBE_ERR_SFP_NOT_SUPPORTED is returned after calling
>> phy->ops.identify()
>>
>> ixgbe_82599.c at 2085
>>   * ixgbe_identify_phy_82599() is defined
>>   * calls ixgbe_identify_module_generic()
>>
>> ixgbe_phy.c at 1281
>>   * ixgbe_identify_module_generic() is defined
>>   * calls ixgbe_identify_qsfp_module_generic()
>>
>> ixgbe_phy.c at 1663
>>   * ixgbe_identify_qsfp_module_generic() is defined
>>   * We fail somewhere before the ending call to ixgbe_get_device_caps()
>> which does take allow_unsupported_sfp into account
>>
>>   * Possibility: hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_IDENTIFIER,
>> &identifier) != IXGBE_SFF_IDENTIFIER_QSFP_PLUS
>>   * Possibility: active_cable != true
>>
>> And then I'm over my head. Should I assume from here that the most
>>likely
>> explanation is a bad transceiver or bad fiber?
>>
>> Alex Forster
>
>Are you able to swap transceiver or fiber between the 4 ports that work
>and the 4 that don't?  If you could do that then you should be able to
>tell if the issue is following the NIC ports, or if it is an issue with
>the external connections.  If it is issue is following the transceiver
>or fiber then it is probably what is causing the issue.
>
>The other thing you could try doing is adding a printk to the spots
>where the status is set to SFP_NOT_SUPPORTED so that you could figure
>out exactly which spot is triggering the rejection of the module.
>
>- Alex

I had remote hands swap fibers on the QSFP side and the issue moved to the
first card, so I'm going to have the fibers cleaned and tested. This
appears to be my issue.

I'd like to submit a patch for ixgbe_identify_qsfp_module_generic() to
print more helpful errors in the two cases mentioned above, so that
hopefully nobody ever has to deal with this again. Would I be wasting my
time, or does something like this have a likelihood of being accepted?

Thank you for all of your help! I wouldn't have figured this out nearly as
quickly without it.

Alex Forster

[dpdk-dev] RSS API discussion

2015-10-19 Thread Lu, Wenzhuo

Hi N?lio,

> -Original Message-
> From: N?lio Laranjeiro [mailto:nelio.laranjeiro at 6wind.com]
> Sent: Friday, October 16, 2015 11:28 PM
> To: Lu, Wenzhuo
> Cc: dev at dpdk.org; Adrien Mazarguil; Olga Shern
> Subject: Re: RSS API discussion
> 
> On Fri, Oct 16, 2015 at 01:40:00PM +, Lu, Wenzhuo wrote:
> > Hi N?lio,
> >
> > > Hi Wenzhuo,
> > >
> > > We should discuss about this API for a future release of DPDK
> > > because this one lacks in flexibility.  Some other NICs have
> > > indirection tables with a different/configurable size, and the current 
> > > API does
> not help to manage it.
> > >
> > > For ConnectX-4 I have made a lot of hacks to avoid changing the DPDK
> > > API, "[dpdk-dev] [PATCH 0/3] Add RETA configuration to MLX5".
> > > http://dpdk.org/ml/archives/dev/2015-October/024681.html
> > >
> > > From a user point of view, to update the RETA table, the API expects
> > > the user to know the size of it to update or query.  With your
> > > patchset, Intel have two indirection table sizes now, with Mellanox
> > > ConnectX-4, I fixed to the size of 512 entries because it is not fixed by 
> > > default.
> > >
> > > How about discussing this in a separate thread?
> > Sure, I changed the tittle.
> > But I'm afraid I don't catch up with you. I have 2 questions, 1, Why the 
> > reta
> size cannot be predict on  Mellanox ConnectX-4? 2, I don't understand the
> meaning when you say the reta size is not fixed by default on Mellanox
> ConnectX-4.
> > Would you like to give more details of the Mellanox ConnectX-4's behavior?
> It'll be helpful to understand this problem.
> 
> On Connexct-4 we can configure the indirection table size with any power of
> two up to 512 entries.
> 
> The current API only expects RETA tables size to be a multiple of
> RTE_RETA_GROUP_SIZE i.e. 64.
> 
> To handle any kind of NICs, a simple array of N (RETA table size) elements 
> with
> the data, instead of something in multiple of 64 would be easier to use,
> something like:
> 
>  struct rte_eth_rss_reta_entry {
>   uint16_t idx; /* index to query/update. */
>   uint16_t reta; /* redirection value. */  };
> 
> We should update the API to take an array of such entries plus its size.
> It would also make update and query easier.
Agree. We should not limit the RETA table size to be multiple of 64 in RTE. 
Every NIC should check the RETA table size for itself.
I think the only reason the API is realized like this is that we only support 
Intel NICs before. We'd better change this API in the future release for the 
sake of all kinds of NICs.

> 
> --
> N?lio Laranjeiro
> 6WIND

[dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD

2015-10-19 Thread Tetsuya Mukawa

On 2015/10/16 21:52, Bruce Richardson wrote:
> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The PMD can have 'iface' parameter like below to specify a path to connect
>> to a virtio-net device.
>>
>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>>
>> To connect above testpmd, here is qemu command example.
>>
>> $ qemu-system-x86_64 \
>> 
>> -chardev socket,id=chr0,path=/tmp/sock0 \
>> -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>> -device virtio-net-pci,netdev=net0
>>
>> Signed-off-by: Tetsuya Mukawa 
> With this PMD in place, is there any need to keep the existing vhost library
> around as a separate entity? Can the existing library be subsumed/converted 
> into
> a standard PMD?
>
> /Bruce

Hi Bruce,

I concern about whether the PMD has all features of librte_vhost,
because librte_vhost provides more features and freedom than ethdev API
provides.
In some cases, user needs to choose limited implementation without
librte_vhost.
I am going to eliminate such cases while implementing the PMD.
But I don't have strong belief that we can remove librte_vhost now.

So how about keeping current separation in next DPDK?
I guess people will try to replace librte_vhost to vhost PMD, because
apparently using ethdev APIs will be useful in many cases.
And we will get feedbacks like "vhost PMD needs to support like this usage".
(Or we will not have feedbacks, but it's also OK.)
Then, we will be able to merge librte_vhost and vhost PMD.

Thanks,
Tetsuya

[dpdk-dev] [PATCH v2 0/3] Support for flow director behavior "passthru" on Intel FVL NIC

2015-10-19 Thread Wu, Jingjing



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Andrey Chilikin
> Sent: Wednesday, September 30, 2015 9:43 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 0/3] Support for flow director behavior
> "passthru" on Intel FVL NIC
> 
> This patch set adds new flow director behavior "passthru" on Intel X(L)710
> NIC.
> When this mode is selected flow director will direct packet to LAN while the
> queue is defined by other filters. This can be used to extract flexible 
> payload
> to RX desriptor with the flow director filter while targeted queue will be
> defined by other filters, for example, by hash filter (RSS).
> 
> v2: rename RTE_ETH_FDIR_OTHER to RTE_ETH_FDIR_PASSTHRU
> 
> Andrey Chilikin (3):
>   librte_ether: add RTE_ETH_FDIR_PASSTHRU for flow director behavior
>   i40e: add RTE_ETH_FDIR_PASSTHRU processing for flow director behavior
>   ixgbe: add check for supported flow director behaviors
> 
>  drivers/net/i40e/i40e_fdir.c|   12 ++--
>  drivers/net/ixgbe/ixgbe_fdir.c  |3 ++-
>  lib/librte_ether/rte_eth_ctrl.h |1 +
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> --
Acked-by: Jingjing Wu 

> 1.7.4.1

[dpdk-dev] [PATCH 2/3] vhost: check that a device exists during reset_owner

2015-10-19 Thread Yuanhan Liu

On Fri, Oct 16, 2015 at 11:08:49AM +0200, Jerome Jutteau wrote:
> virtio-net search for it's device in reset_owner.
> The function don't check the return result of get_config_ll_entry
> which can be NULL.
> 
> Signed-off-by: Jerome Jutteau 
> ---
>  lib/librte_vhost/virtio-net.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index 955a29d..ec6a575 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -402,6 +402,8 @@ reset_owner(struct vhost_device_ctx ctx)
>   uint64_t device_fh;
>  
>   ll_dev = get_config_ll_entry(ctx);
> + if (ll_dev == NULL)
> + return -1;

I would call get_device() here, which dumps an error message when
it's NULL so that user will get noticed.

Besides that, the 3 patches look good to me, and,

Acked-by: Yuanhan Liu 

--yliu

>   device_fh = ll_dev->dev.device_fh;
>  
>   cleanup_device(&ll_dev->dev);
> -- 
> jerome

[dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue

2015-10-19 Thread Xie, Huawei

On 10/19/2015 12:20 PM, Stephen Hemminger wrote:

On Sun, 18 Oct 2015 14:28:59 +0800
Huawei Xie  wrote:



+   if (vq->sw_ring)
+   rte_free(vq->sw_ring);
+



Do not need to test for NULL before calling rte_free.
Better to just rely on the fact that rte_free(NULL) is documented
to be ok (no operation).



 ok, btw, in previous commit, just in the same function,
void virtio_dev_queue_release(vq)
[...]

   rte_free(vq);

   vq = NULL;
I think there is no need to set NULL to vq. Will submit a patch to fix it if 
you agree.

[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine

2015-10-19 Thread Xie, Huawei

On 10/19/2015 12:19 PM, Stephen Hemminger wrote:
> +static inline void __attribute__((always_inline))
> +virtio_xmit_cleanup(struct virtqueue *vq)
> +{
>
> Please don't use always inline, frustrating the compiler isn't going
> to help.
always inline is scattered elsewhere in the dpdk code.
What is the negative effect? Should we remove all of them?
> + uint16_t i, desc_idx;
> + int nb_free = 0;
> + struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
> +
> + desc_idx = (uint16_t)(vq->vq_used_cons_idx &
> + ((vq->vq_nentries >> 1) - 1));
> + free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> + nb_free = 1;
> +
> + for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
> + m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> + if (likely(m->pool == free[0]->pool))
> + free[nb_free++] = m;
> + else {
> + rte_mempool_put_bulk(free[0]->pool, (void **)free,
> + nb_free);
> + free[0] = m;
> + nb_free = 1;
> + }
> + }
> +
> + rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
> + vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
> + vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
> +
> + return;
> +}
>
> Don't add return; at end of void functions. It only clutters
> things for no reason.
Agree.
>

[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine

2015-10-19 Thread Xie, Huawei

On 10/19/2015 12:17 PM, Stephen Hemminger wrote:
> On Sun, 18 Oct 2015 14:29:03 +0800
> Huawei Xie  wrote:
>
>> +
>> +for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
>> +m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
>> +if (likely(m->pool == free[0]->pool))
>> +free[nb_free++] = m;
>> +else {
>> +rte_mempool_put_bulk(free[0]->pool, (void **)free,
>> +nb_free);
>> +free[0] = m;
>> +nb_free = 1;
>> +}
>> +}
> This assumes all transmits are from the same pool, which is not necessarily 
> true.

Don't get you. It accumulates all the mbufs from the same pool, free all
of them until it meets one from a different pool.

if (likely(m->pool == free[0]->pool))


>

[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine

2015-10-19 Thread Xie, Huawei

On 10/19/2015 12:17 PM, Stephen Hemminger wrote:
> On Sun, 18 Oct 2015 14:29:03 +0800
> Huawei Xie  wrote:
>
>> bulk free of mbufs when clean used ring.
>> shift operation of idx could be further saved if vq_free_cnt means
>> free slots rather than free descriptors.
>>
>> Signed-off-by: Huawei Xie 
> Did you measure this. I finished my transmit optimizations and gets 25% 
> performance improvement
> without any of these restrictions.
Which restriction do you mean?  For the ring layout optimization, this
is the core idea. For the single segment mbuf, this is what all other
PMDs assume for fastest rx/tx path.
With all vhost and virtio optimizations, it could achieve approximately
3~4 times performance improvement(depending on the workload).
Do you mean the indirect feature support or additional optimization not
submitted? Would review your patch this week.
>

[dpdk-dev] [PATCH v3] i40e: Add a workaround to drop flow control frames from VFs

2015-10-19 Thread Jingjing Wu

This patch adds a workaround to drop flow control frames from being
transmitted from VSIs.
With this patch in place a malicious VF cannot send flow control or PFC
packets out on the wire.

Signed-off-by: Jingjing Wu 
---
v3:
 - fix typo
v2:
 - reword comments

 drivers/net/i40e/i40e_ethdev.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 2dd9fdc..963e501 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -382,6 +382,30 @@ static inline void i40e_flex_payload_reg_init(struct 
i40e_hw *hw)
I40E_WRITE_REG(hw, I40E_GLQF_PIT(17), 0x7440);
 }

+#define I40E_FLOW_CONTROL_ETHERTYPE  0x8808
+
+/*
+ * Add a ethertype filter to drop all flow control frames transmitted
+ * from VSIs.
+*/
+static void
+i40e_add_tx_flow_control_drop_filter(struct i40e_pf *pf)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   uint16_t flags = I40E_AQC_ADD_CONTROL_PACKET_FLAGS_IGNORE_MAC |
+   I40E_AQC_ADD_CONTROL_PACKET_FLAGS_DROP |
+   I40E_AQC_ADD_CONTROL_PACKET_FLAGS_TX;
+   int ret;
+
+   ret = i40e_aq_add_rem_control_packet_filter(hw, NULL,
+   I40E_FLOW_CONTROL_ETHERTYPE, flags,
+   pf->main_vsi_seid, 0,
+   TRUE, NULL, NULL);
+   if (ret)
+   PMD_INIT_LOG(ERR, "Failed to add filter to drop flow control "
+ " frames from VSIs.");
+}
+
 static int
 eth_i40e_dev_init(struct rte_eth_dev *dev)
 {
@@ -584,6 +608,12 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)

/* enable uio intr after callback register */
rte_intr_enable(&(pci_dev->intr_handle));
+   /*
+* Add an ethertype filter to drop all flow control frames transmitted
+* from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
+* frames to wire.
+*/
+   i40e_add_tx_flow_control_drop_filter(pf);

/* initialize mirror rule list */
TAILQ_INIT(&pf->mirror_list);
-- 
2.4.0

[dpdk-dev] [PATCH v4] mbuf/ip_frag: move mbuf chaining to common code

2015-10-19 Thread Olivier MATZ

Hi Simon,

On 10/15/2015 09:01 AM, Simon Kagstrom wrote:
> Chaining/segmenting mbufs can be useful in many places, so make it
> global.
> 
> Signed-off-by: Simon Kagstrom 
> Signed-off-by: Johan Faltstrom 

Acked-by: Olivier Matz 

Thanks!

[dpdk-dev] [Q] l2fwd in examples directory

2015-10-19 Thread Moon-Sang Lee

My NUT has Xeon L5520 that is based on Nehalem microarchitecture.
Does Nehalem supports PCIe interface on chipset?

Anyhow, 'lstopo' shows as below and it seems that my PCI devices are
connected to socket #0.
I'm still wondering why rte_eth_dev_socket_id(portid) always returns -1.

mslee at myhost:~$ lstopo
Machine (31GB)
  NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (8192KB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
  PU L#0 (P#0)
  PU L#1 (P#8)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
  PU L#2 (P#2)
  PU L#3 (P#10)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
  PU L#4 (P#4)
  PU L#5 (P#12)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
  PU L#6 (P#6)
  PU L#7 (P#14)
  NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (8192KB)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
  PU L#8 (P#1)
  PU L#9 (P#9)
L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
  PU L#10 (P#3)
  PU L#11 (P#11)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
  PU L#12 (P#5)
  PU L#13 (P#13)
L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
  PU L#14 (P#7)
  PU L#15 (P#15)
  HostBridge L#0
PCIBridge
  PCI 14e4:163b
Net L#0 "em1"
  PCI 14e4:163b
Net L#1 "em2"
PCIBridge
  PCI 1000:0058
Block L#2 "sda"
Block L#3 "sdb"
PCIBridge
  PCIBridge
PCIBridge
  PCI 8086:10e8
  PCI 8086:10e8
PCIBridge
  PCI 8086:10e8
  PCI 8086:10e8
PCIBridge
  PCI 102b:0532
PCI 8086:3a20
PCI 8086:3a26
  Block L#4 "sr0"
mslee at myhost:~$



On Sun, Oct 18, 2015 at 2:51 PM, Moon-Sang Lee  wrote:

>
> thanks bruce.
>
> I didn't know that PCI slots have direct socket affinity.
> is it static or configurable through PCI configuration space?
> well, my NUT, two node NUMA, seems always returns -1 on calling
> rte_eth_dev_socket_id(portid) whenever portid is 0, 1, or other values.
> I appreciate if you explain more about getting the affinity.
>
> p.s.
> I'm using intel Xeon processor and 1G NIC(82576).
>
>
>
>
> On Fri, Oct 16, 2015 at 10:43 PM, Bruce Richardson <
> bruce.richardson at intel.com> wrote:
>
>> On Thu, Oct 15, 2015 at 11:08:57AM +0900, Moon-Sang Lee wrote:
>> > There is codes as below in examples/l2fwd/main.c and I think
>> > rte_eth_dev_socket_id(portid)
>> > always returns -1(SOCKET_ID_ANY) since there is no association code
>> between
>> > port and
>> > lcore in the example codes.
>>
>> Can you perhaps clarify what you mean here. On modern NUMA systems, such
>> as those
>> from Intel :-), the PCI slots are directly connected to the CPU sockets,
>> so the
>> ethernet ports do indeed have a direct NUMA affinity. It's not something
>> that
>> the app needs to specify.
>>
>> /Bruce
>>
>> > (i.e. I need to find a matching lcore from
>> > lcore_queue_conf[] with portid
>> > and call rte_lcore_to_socket_id(lcore_id).)
>> >
>> > /* init one RX queue */
>> > fflush(stdout);
>> > ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
>> >  rte_eth_dev_socket_id(portid),
>> >  NULL,
>> >  l2fwd_pktmbuf_pool);
>> > if (ret < 0)
>> > rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d,
>> > port=%u\n",
>> >   ret, (unsigned) portid);
>> >
>> > It works fine even though memory is allocated in different NUMA node.
>> But I
>> > wonder there is
>> > a DPDK API that associates inlcore to port internally thus
>> > rte_eth_devices[portid].pci_dev->numa_node
>> > contains proper node.
>> >
>> >
>> > --
>> > Moon-Sang Lee, SW Engineer
>> > Email: sang0627 at gmail.com
>> > Wisdom begins in wonder. *Socrates*
>>
>
>
>
> --
> Moon-Sang Lee, SW Engineer
> Email: sang0627 at gmail.com
> Wisdom begins in wonder. *Socrates*
>



-- 
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*

[dpdk-dev] [Q] l2fwd in examples directory

2015-10-19 Thread Moon-Sang Lee

Let me clarify my mixed stuffs.

My processor is L5520, family 6, model 26 that is based on Nehalem
microarchitecture
according to wikipedia (
https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)),
it does not have PCI interface on chipset.

Therefore, "rte_eth_dev_socket_id(portid) always returns -1"  seems no
problem.
My understanding of the lstopo result might be wrong.

Thanks anyway.




On Mon, Oct 19, 2015 at 4:39 PM, Moon-Sang Lee  wrote:

>
> My NUT has Xeon L5520 that is based on Nehalem microarchitecture.
> Does Nehalem supports PCIe interface on chipset?
>
> Anyhow, 'lstopo' shows as below and it seems that my PCI devices are
> connected to socket #0.
> I'm still wondering why rte_eth_dev_socket_id(portid) always returns -1.
>
> mslee at myhost:~$ lstopo
> Machine (31GB)
>   NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (8192KB)
> L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
>   PU L#0 (P#0)
>   PU L#1 (P#8)
> L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
>   PU L#2 (P#2)
>   PU L#3 (P#10)
> L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
>   PU L#4 (P#4)
>   PU L#5 (P#12)
> L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
>   PU L#6 (P#6)
>   PU L#7 (P#14)
>   NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (8192KB)
> L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
>   PU L#8 (P#1)
>   PU L#9 (P#9)
> L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
>   PU L#10 (P#3)
>   PU L#11 (P#11)
> L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
>   PU L#12 (P#5)
>   PU L#13 (P#13)
> L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
>   PU L#14 (P#7)
>   PU L#15 (P#15)
>   HostBridge L#0
> PCIBridge
>   PCI 14e4:163b
> Net L#0 "em1"
>   PCI 14e4:163b
> Net L#1 "em2"
> PCIBridge
>   PCI 1000:0058
> Block L#2 "sda"
> Block L#3 "sdb"
> PCIBridge
>   PCIBridge
> PCIBridge
>   PCI 8086:10e8
>   PCI 8086:10e8
> PCIBridge
>   PCI 8086:10e8
>   PCI 8086:10e8
> PCIBridge
>   PCI 102b:0532
> PCI 8086:3a20
> PCI 8086:3a26
>   Block L#4 "sr0"
> mslee at myhost:~$
>
>
>
> On Sun, Oct 18, 2015 at 2:51 PM, Moon-Sang Lee  wrote:
>
>>
>> thanks bruce.
>>
>> I didn't know that PCI slots have direct socket affinity.
>> is it static or configurable through PCI configuration space?
>> well, my NUT, two node NUMA, seems always returns -1 on calling
>> rte_eth_dev_socket_id(portid) whenever portid is 0, 1, or other values.
>> I appreciate if you explain more about getting the affinity.
>>
>> p.s.
>> I'm using intel Xeon processor and 1G NIC(82576).
>>
>>
>>
>>
>> On Fri, Oct 16, 2015 at 10:43 PM, Bruce Richardson <
>> bruce.richardson at intel.com> wrote:
>>
>>> On Thu, Oct 15, 2015 at 11:08:57AM +0900, Moon-Sang Lee wrote:
>>> > There is codes as below in examples/l2fwd/main.c and I think
>>> > rte_eth_dev_socket_id(portid)
>>> > always returns -1(SOCKET_ID_ANY) since there is no association code
>>> between
>>> > port and
>>> > lcore in the example codes.
>>>
>>> Can you perhaps clarify what you mean here. On modern NUMA systems, such
>>> as those
>>> from Intel :-), the PCI slots are directly connected to the CPU sockets,
>>> so the
>>> ethernet ports do indeed have a direct NUMA affinity. It's not something
>>> that
>>> the app needs to specify.
>>>
>>> /Bruce
>>>
>>> > (i.e. I need to find a matching lcore from
>>> > lcore_queue_conf[] with portid
>>> > and call rte_lcore_to_socket_id(lcore_id).)
>>> >
>>> > /* init one RX queue */
>>> > fflush(stdout);
>>> > ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
>>> >  rte_eth_dev_socket_id(portid),
>>> >  NULL,
>>> >  l2fwd_pktmbuf_pool);
>>> > if (ret < 0)
>>> > rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d,
>>> > port=%u\n",
>>> >   ret, (unsigned) portid);
>>> >
>>> > It works fine even though memory is allocated in different NUMA node.
>>> But I
>>> > wonder there is
>>> > a DPDK API that associates inlcore to port internally thus
>>> > rte_eth_devices[portid].pci_dev->numa_node
>>> > contains proper node.
>>> >
>>> >
>>> > --
>>> > Moon-Sang Lee, SW Engineer
>>> > Email: sang0627 at gmail.com
>>> > Wisdom begins in wonder. *Socrates*
>>>
>>
>>
>>
>> --
>> Moon-Sang Lee, SW Engineer
>> Email: sang0627 at gmail.com
>> Wisdom begins in wonder. *Socrates*
>>
>
>
>
> --
> Moon-Sang Lee, SW Engineer
> Email: sang0627 at gmail.com
> Wisdom begins in wonder. *Socrates*
>



-- 
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*

[dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit

2015-10-19 Thread Xie, Huawei

On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> The space check for transmit ring only needs a single conditional.
> I.e only need to recheck for space if there was no space in first check.
I see you reorganized the code. It is more clear and readable with your
change, but no check is removed.
We could remove the check after virtio_xmit_cleanup. In current
implementation, we assume the virtio_xmit_cleanup and vq_ring_free_chain
always succeed.
> This can help performance and simplifies loop.
>
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/net/virtio/virtio_rxtx.c | 66 
> 
>  1 file changed, 27 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_rxtx.c 
> b/drivers/net/virtio/virtio_rxtx.c
> index c5b53bb..5b50ed0 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -745,7 +745,6 @@ uint16_t
>  virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  {
>   struct virtqueue *txvq = tx_queue;
> - struct rte_mbuf *txm;
>   uint16_t nb_used, nb_tx;
>   int error;
>  
> @@ -759,57 +758,46 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
> **tx_pkts, uint16_t nb_pkts)
>   if (likely(nb_used > txvq->vq_nentries - txvq->vq_free_thresh))
>   virtio_xmit_cleanup(txvq, nb_used);
>  
> - nb_tx = 0;
> + for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> + struct rte_mbuf *txm = tx_pkts[nb_tx];
> + int need = txm->nb_segs - txvq->vq_free_cnt + 1;
>  
> - while (nb_tx < nb_pkts) {
> - /* Need one more descriptor for virtio header. */
> - int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
> -
> - /*Positive value indicates it need free vring descriptors */
> + /* Positive value indicates it need free vring descriptors */
As you fix the comment, if it is correct, could you do s/need/needs/ as
well, :)?
>   if (unlikely(need > 0)) {
>   nb_used = VIRTQUEUE_NUSED(txvq);
>   virtio_rmb();
>   need = RTE_MIN(need, (int)nb_used);
>  
>   virtio_xmit_cleanup(txvq, need);
> - need = (int)tx_pkts[nb_tx]->nb_segs -
> - txvq->vq_free_cnt + 1;
> - }
> -
> - /*
> -  * Zero or negative value indicates it has enough free
> -  * descriptors to use for transmitting.
> -  */
> - if (likely(need <= 0)) {
> - txm = tx_pkts[nb_tx];
> -
> - /* Do VLAN tag insertion */
> - if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
> - error = rte_vlan_insert(&txm);
> - if (unlikely(error)) {
> - rte_pktmbuf_free(txm);
> - ++nb_tx;
> - continue;
> - }
Still need is rechecked here. It could be removed if virtio_xmit_cleanup
ensures need would be <= 0 after the call.
> + need = txm->nb_segs - txvq->vq_free_cnt + 1;
> + if (unlikely(need > 0)) {
> + PMD_TX_LOG(ERR,
> +"No free tx descriptors to 
> transmit");
> + break;
>   }
> + }
>  
> - /* Enqueue Packet buffers */
> - error = virtqueue_enqueue_xmit(txvq, txm);
> + /* Do VLAN tag insertion */
> + if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
> + error = rte_vlan_insert(&txm);
>   if (unlikely(error)) {
> - if (error == ENOSPC)
> - PMD_TX_LOG(ERR, "virtqueue_enqueue Free 
> count = 0");
> - else if (error == EMSGSIZE)
> - PMD_TX_LOG(ERR, "virtqueue_enqueue Free 
> count < 1");
> - else
> - PMD_TX_LOG(ERR, "virtqueue_enqueue 
> error: %d", error);
> - break;
> + rte_pktmbuf_free(txm);
> + continue;
>   }
> - nb_tx++;
> - txvq->bytes += txm->pkt_len;
> - } else {
> - PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
> + }
> +
> + /* Enqueue Packet buffers */
> + error = virtqueue_enqueue_xmit(txvq, txm);
> + if (unlikely(error)) {
> + if (error == ENOSPC)
> + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 
> 0");
> + else if (error == EMSGSIZE)
> +

[dpdk-dev] [PATCH 2/2] i40e: Enlarge the number of supported queues

2015-10-19 Thread Wu, Jingjing

Hi, helin

Few comments

> a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index
> 4b70588..3bdcaa4 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -2240,113 +2240,88 @@ i40e_pf_parameter_init(struct rte_eth_dev
> *dev)  {
>   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data-
> >dev_private);
>   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
> - uint16_t sum_queues = 0, sum_vsis, left_queues;
> + uint16_t qp_count = 0, vsi_count = 0;
> 
> - /* First check if FW support SRIOV */
>   if (dev->pci_dev->max_vfs && !hw->func_caps.sr_iov_1_1) {
>   PMD_INIT_LOG(ERR, "HW configuration doesn't support
> SRIOV");
>   return -EINVAL;
>   }
> 
>   pf->flags = I40E_FLAG_HEADER_SPLIT_DISABLED;
> - pf->max_num_vsi = RTE_MIN(hw->func_caps.num_vsis,
> I40E_MAX_NUM_VSIS);
> - PMD_INIT_LOG(INFO, "Max supported VSIs:%u", pf->max_num_vsi);
> - /* Allocate queues for pf */
> - if (hw->func_caps.rss) {
> - pf->flags |= I40E_FLAG_RSS;
> - pf->lan_nb_qps = RTE_MIN(hw->func_caps.num_tx_qp,
> - (uint32_t)(1 << hw-
> >func_caps.rss_table_entry_width));
> - pf->lan_nb_qps = i40e_align_floor(pf->lan_nb_qps);
> - } else
> + pf->max_num_vsi = hw->func_caps.num_vsis;
> + pf->lan_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF;
> + pf->vmdq_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM;
> + pf->vf_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF;
> +
Need use the NUM_PER_VF but not NUM_PER_PF
pf->vf_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF? ==> pf->vf_nb_qp_max = 
RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF;

[dpdk-dev] [PATCH v3] i40e: Add a workaround to drop flow control frames from VFs

2015-10-19 Thread Lu, Wenzhuo

Hi,

> -Original Message-
> From: Wu, Jingjing
> Sent: Monday, October 19, 2015 2:32 PM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Lu, Wenzhuo; Pei, Yulong
> Subject: [PATCH v3] i40e: Add a workaround to drop flow control frames
> from VFs
> 
> This patch adds a workaround to drop flow control frames from being
> transmitted from VSIs.
> With this patch in place a malicious VF cannot send flow control or PFC
> packets out on the wire.
> 
> Signed-off-by: Jingjing Wu 
Acked-by: Wenzhuo Lu

[dpdk-dev] [PATCH 2/2] i40e: Enlarge the number of supported queues

2015-10-19 Thread Zhang, Helin



> -Original Message-
> From: Wu, Jingjing
> Sent: Monday, October 19, 2015 4:30 PM
> To: Zhang, Helin; dev at dpdk.org
> Cc: Pei, Yulong; Liu, Yong
> Subject: RE: [PATCH 2/2] i40e: Enlarge the number of supported queues
> 
> Hi, helin
> 
> Few comments
> 
> > a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> > index
> > 4b70588..3bdcaa4 100644
> > --- a/drivers/net/i40e/i40e_ethdev.c
> > +++ b/drivers/net/i40e/i40e_ethdev.c
> > @@ -2240,113 +2240,88 @@ i40e_pf_parameter_init(struct rte_eth_dev
> > *dev)  {
> > struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data-
> > >dev_private);
> > struct i40e_hw *hw = I40E_PF_TO_HW(pf);
> > -   uint16_t sum_queues = 0, sum_vsis, left_queues;
> > +   uint16_t qp_count = 0, vsi_count = 0;
> >
> > -   /* First check if FW support SRIOV */
> > if (dev->pci_dev->max_vfs && !hw->func_caps.sr_iov_1_1) {
> > PMD_INIT_LOG(ERR, "HW configuration doesn't support SRIOV");
> > return -EINVAL;
> > }
> >
> > pf->flags = I40E_FLAG_HEADER_SPLIT_DISABLED;
> > -   pf->max_num_vsi = RTE_MIN(hw->func_caps.num_vsis,
> > I40E_MAX_NUM_VSIS);
> > -   PMD_INIT_LOG(INFO, "Max supported VSIs:%u", pf->max_num_vsi);
> > -   /* Allocate queues for pf */
> > -   if (hw->func_caps.rss) {
> > -   pf->flags |= I40E_FLAG_RSS;
> > -   pf->lan_nb_qps = RTE_MIN(hw->func_caps.num_tx_qp,
> > -   (uint32_t)(1 << hw-
> > >func_caps.rss_table_entry_width));
> > -   pf->lan_nb_qps = i40e_align_floor(pf->lan_nb_qps);
> > -   } else
> > +   pf->max_num_vsi = hw->func_caps.num_vsis;
> > +   pf->lan_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF;
> > +   pf->vmdq_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM;
> > +   pf->vf_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF;
> > +
> Need use the NUM_PER_VF but not NUM_PER_PF
> pf->vf_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF? ==>
> pf->pf->vf_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF;
Yes, you are right. Thank you very much!
I will correct it in the next version.

Helin

>

[dpdk-dev] [PATCH] fm10k: add Intel Boulder Rapid NIC support

2015-10-19 Thread He, Shaopeng


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael Qiu
> Sent: Friday, September 25, 2015 10:31 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] fm10k: add Intel Boulder Rapid NIC support
> 
> Boulder Rapid is Intel new NIC within fm10k family.
> This patch make DPDK driver support this new NIC.
> 
> Signed-off-by: Michael Qiu 

Acked-by : Shaopeng He

[dpdk-dev] [PATCH] tools/setup.sh: let function quit to exit directly (may convenient for use); replace tab by space;

2015-10-19 Thread Zhangjun (Kenny)


>From 49090c56383f1255e3ea1991f51e83b1ed6fb8d6 Mon Sep 17 00:00:00 2001
From: z00182565 
Date: Mon, 19 Oct 2015 16:43:59 +0800
Subject: [PATCH] tools/setup.sh: let function quit to exit directly (may
 convenient for use); replace tab by space;

---
 tools/setup.sh | 679 -
 1 file changed, 338 insertions(+), 341 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index 5a8b2f3..a43d16d 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -50,11 +50,11 @@ echo 
"--
 EAL_PARAMS='-n 4'

 #
-# Sets QUIT variable so script will finish.
+# finish script.
 #
 quit()
 {
-   QUIT=$1
+exit 0
 }

 #
@@ -62,21 +62,21 @@ quit()
 #
 setup_icc()
 {
-   DEFAULT_PATH=/opt/intel/bin/iccvars.sh
-   param=$1
-   shpath=`which iccvars.sh 2> /dev/null`
-   if [ $? -eq 0 ] ; then
-   echo "Loading iccvars.sh from $shpath for $param"
-   source $shpath $param
-   elif [ -f $DEFAULT_PATH ] ; then
-   echo "Loading iccvars.sh from $DEFAULT_PATH for $param"
-   source $DEFAULT_PATH $param
-   else
-   echo "## ERROR: cannot find 'iccvars.sh' script to set up ICC."
-   echo "## To fix, please add the directory that contains"
-   echo "## iccvars.sh  to your 'PATH' environment variable."
-   quit
-   fi
+DEFAULT_PATH=/opt/intel/bin/iccvars.sh
+param=$1
+shpath=`which iccvars.sh 2> /dev/null`
+if [ $? -eq 0 ] ; then
+echo "Loading iccvars.sh from $shpath for $param"
+source $shpath $param
+elif [ -f $DEFAULT_PATH ] ; then
+echo "Loading iccvars.sh from $DEFAULT_PATH for $param"
+source $DEFAULT_PATH $param
+else
+echo "## ERROR: cannot find 'iccvars.sh' script to set up ICC."
+echo "## To fix, please add the directory that contains"
+echo "## iccvars.sh  to your 'PATH' environment variable."
+quit
+fi
 }

 #
@@ -84,24 +84,22 @@ setup_icc()
 #
 setup_target()
 {
-   option=$1
-   export RTE_TARGET=${TARGETS[option]}
-
-   compiler=${RTE_TARGET##*-}
-   if [ "$compiler" == "icc" ] ; then
-   platform=${RTE_TARGET%%-*}
-   if [ "$platform" == "x86_64" ] ; then
-   setup_icc intel64
-   else
-   setup_icc ia32
-   fi
-   fi
-   if [ "$QUIT" == "0" ] ; then
-   make install T=${RTE_TARGET}
-   fi
-   echo 
"--"
-   echo " RTE_TARGET exported as $RTE_TARGET"
-   echo 
"--"
+option=$1
+export RTE_TARGET=${TARGETS[option]}
+
+compiler=${RTE_TARGET##*-}
+if [ "$compiler" == "icc" ] ; then
+platform=${RTE_TARGET%%-*}
+if [ "$platform" == "x86_64" ] ; then
+setup_icc intel64
+else
+setup_icc ia32
+fi
+fi
+make install T=${RTE_TARGET}
+echo 
"--"
+echo " RTE_TARGET exported as $RTE_TARGET"
+echo 
"--"
 }

 #
@@ -109,7 +107,7 @@ setup_target()
 #
 uninstall_targets()
 {
-   make uninstall
+make uninstall
 }

 #
@@ -117,13 +115,13 @@ uninstall_targets()
 #
 create_mnt_huge()
 {
-   echo "Creating /mnt/huge and mounting as hugetlbfs"
-   sudo mkdir -p /mnt/huge
+echo "Creating /mnt/huge and mounting as hugetlbfs"
+sudo mkdir -p /mnt/huge

-   grep -s '/mnt/huge' /proc/mounts > /dev/null
-   if [ $? -ne 0 ] ; then
-   sudo mount -t hugetlbfs nodev /mnt/huge
-   fi
+grep -s '/mnt/huge' /proc/mounts > /dev/null
+if [ $? -ne 0 ] ; then
+sudo mount -t hugetlbfs nodev /mnt/huge
+fi
 }

 #
@@ -131,15 +129,15 @@ create_mnt_huge()
 #
 remove_mnt_huge()
 {
-   echo "Unmounting /mnt/huge and removing directory"
-   grep -s '/mnt/huge' /proc/mounts > /dev/null
-   if [ $? -eq 0 ] ; then
-   sudo umount /mnt/huge
-   fi
-
-   if [ -d /mnt/huge ] ; then
-   sudo rm -R /mnt/huge
-   fi
+echo "Unmounting /mnt/huge and removing directory"
+grep -s '/mnt/huge' /proc/mounts > /dev/null
+if [ $? -eq 0 ] ; then
+sudo umount /mnt/huge
+fi
+
+if [ -d /mnt/huge ] ; then
+sudo rm -R /mnt/huge
+fi
 }

 #
@@ -147,11 +145,11 @@ remove_mnt_huge()
 #
 remove_igb_uio_module()
 {
-   echo "Unloading any existing DPDK UIO module"
-   /sbin/lsmod | grep -s igb_uio > /dev/null
-   if [ $? -eq 0 ] ; then
-   sudo /sbin/rmmod igb_uio
-   fi
+echo "Unloading any existing DPDK UIO module"
+/sbin/lsmod | grep

[dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD

2015-10-19 Thread Loftus, Ciara

> On 2015/10/16 21:52, Bruce Richardson wrote:
> > On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
> >> The patch introduces a new PMD. This PMD is implemented as thin
> wrapper
> >> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> >> The PMD can have 'iface' parameter like below to specify a path to
> connect
> >> to a virtio-net device.
> >>
> >> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
> >>
> >> To connect above testpmd, here is qemu command example.
> >>
> >> $ qemu-system-x86_64 \
> >> 
> >> -chardev socket,id=chr0,path=/tmp/sock0 \
> >> -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
> >> -device virtio-net-pci,netdev=net0
> >>
> >> Signed-off-by: Tetsuya Mukawa 
> > With this PMD in place, is there any need to keep the existing vhost library
> > around as a separate entity? Can the existing library be
> subsumed/converted into
> > a standard PMD?
> >
> > /Bruce
> 
> Hi Bruce,
> 
> I concern about whether the PMD has all features of librte_vhost,
> because librte_vhost provides more features and freedom than ethdev API
> provides.
> In some cases, user needs to choose limited implementation without
> librte_vhost.
> I am going to eliminate such cases while implementing the PMD.
> But I don't have strong belief that we can remove librte_vhost now.
> 
> So how about keeping current separation in next DPDK?
> I guess people will try to replace librte_vhost to vhost PMD, because
> apparently using ethdev APIs will be useful in many cases.
> And we will get feedbacks like "vhost PMD needs to support like this usage".
> (Or we will not have feedbacks, but it's also OK.)
> Then, we will be able to merge librte_vhost and vhost PMD.

I agree with the above. One the concerns I had when reviewing the patch was 
that the PMD removes some freedom that is available with the library. Eg. 
Ability to implement the new_device and destroy_device callbacks. If using the 
PMD you are constrained to the implementations of these in the PMD driver, but 
if using librte_vhost, you can implement your own with whatever functionality 
you like - a good example of this can be seen in the vhost sample app.
On the other hand, the PMD is useful in that it removes a lot of complexity for 
the user and may work for some more general use cases. So I would be in favour 
of having both options available too.

Ciara

> 
> Thanks,
> Tetsuya

[dpdk-dev] [Q] l2fwd in examples directory

2015-10-19 Thread Bruce Richardson

On Mon, Oct 19, 2015 at 04:39:41PM +0900, Moon-Sang Lee wrote:
> My NUT has Xeon L5520 that is based on Nehalem microarchitecture.
> Does Nehalem supports PCIe interface on chipset?

For nehalem, I think having the PCI numa node reported as -1 is normal, as it's
not directly connected to the physical sockets, but connected to the chipset
instead. For later generation CPU Xeon platforms, your PCI slots are physically
going to be connected to one CPU socket or the other.

/Bruce

[dpdk-dev] [PATCH 2/3] vhost: check that a device exists during reset_owner

2015-10-19 Thread Jérôme Jutteau

Hi Yuanhan,

2015-10-19 6:35 GMT+02:00 Yuanhan Liu :
> I would call get_device() here, which dumps an error message when
> it's NULL so that user will get noticed.
>
> Besides that, the 3 patches look good to me, and,
>
> Acked-by: Yuanhan Liu 

Thanks for your review, a v2 is coming soon with get_device().

-- 
J?r?me

[dpdk-dev] [PATCH v2 0/4] vhost: Fix virtio-net on VHOST_USER_RESET_OWNER

2015-10-19 Thread Jerome Jutteau

Hi,

I have a bug when Qemu with two vhost interfaces gently stops (SIGINT).
When stopping, it sends two RESET_OWNER for each interface:
- Before stopping, we have two interfaces identifers: 0 and 1.
- The first reset_owner call resets device 1 (and this id device_fh) to zero,
  the device list now contains two devices with id 0.
- The second call don't find device 1 and segfault as reset_owner don't
  check if the device has been found or not.
- Later, user_get_vring_base can also segfault for the same reason.

This series of patches propose to fix the way reset_owner alter a device and
add more checks when searching for a device.

In this v2, we use get_device instead of get_config_ll_entry to get an error
message when a device is not found.

Jerome Jutteau (4):
  vhost: avoid device identifier to be reset to 0 in reset_owner
  vhost: check that a device exists during reset_owner
  vhost: protect user_get_vring_base from unknown devices
  vhost: change method to get device in reset_owner

 lib/librte_vhost/vhost_user/virtio-net-user.c |  2 ++
 lib/librte_vhost/virtio-net.c | 14 +-
 2 files changed, 11 insertions(+), 5 deletions(-)

-- 
jerome

[dpdk-dev] [PATCH v2 1/4] vhost: avoid device identifier to be reset to 0 in reset_owner

2015-10-19 Thread Jerome Jutteau

virtio-net clean and init device after a VHOST_USER_RESET_OWNER.
This reset device identifier to 0 and break ll_root listing logic.
This patch keep the old device identifier and re-write it on the cleaned
device.

Signed-off-by: Jerome Jutteau 
---
 lib/librte_vhost/virtio-net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index d0f1764..955a29d 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -399,11 +399,14 @@ static int
 reset_owner(struct vhost_device_ctx ctx)
 {
struct virtio_net_config_ll *ll_dev;
+   uint64_t device_fh;

ll_dev = get_config_ll_entry(ctx);
+   device_fh = ll_dev->dev.device_fh;

cleanup_device(&ll_dev->dev);
init_device(&ll_dev->dev);
+   ll_dev->dev.device_fh = device_fh;

return 0;
 }
-- 
jerome

[dpdk-dev] [PATCH v2 2/4] vhost: check that a device exists during reset_owner

2015-10-19 Thread Jerome Jutteau

virtio-net search for it's device in reset_owner.
The function don't check the return result of get_config_ll_entry
which can be NULL.

Signed-off-by: Jerome Jutteau 
---
 lib/librte_vhost/virtio-net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 955a29d..ec6a575 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -402,6 +402,8 @@ reset_owner(struct vhost_device_ctx ctx)
uint64_t device_fh;

ll_dev = get_config_ll_entry(ctx);
+   if (ll_dev == NULL)
+   return -1;
device_fh = ll_dev->dev.device_fh;

cleanup_device(&ll_dev->dev);
-- 
jerome

[dpdk-dev] [PATCH v2 3/4] vhost: protect user_get_vring_base from unknown devices

2015-10-19 Thread Jerome Jutteau

get_device return is not checked and may cause segfault when device is
not found. This patch fix this.

Signed-off-by: Jerome Jutteau 
---
 lib/librte_vhost/vhost_user/virtio-net-user.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 4689927..e0bc2a4 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -276,6 +276,8 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 {
struct virtio_net *dev = get_device(ctx);

+   if (dev == NULL)
+   return -1;
/* We have to stop the queue (virtio) if it is running. */
if (dev->flags & VIRTIO_DEV_RUNNING)
notify_ops->destroy_device(dev);
-- 
jerome

[dpdk-dev] [PATCH v2 4/4] vhost: change method to get device in reset_owner

2015-10-19 Thread Jerome Jutteau

Using get_config_ll_entry in reset_owner don't show any error when the
device is not found. This patch fix this by using get_device instead
instead of get_config_ll_entry.

Signed-off-by: Jerome Jutteau 
---
 lib/librte_vhost/virtio-net.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index ec6a575..a6ab245 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -398,18 +398,17 @@ set_owner(struct vhost_device_ctx ctx)
 static int
 reset_owner(struct vhost_device_ctx ctx)
 {
-   struct virtio_net_config_ll *ll_dev;
+   struct virtio_net *dev;
uint64_t device_fh;

-   ll_dev = get_config_ll_entry(ctx);
-   if (ll_dev == NULL)
+   dev = get_device(ctx);
+   if (dev == NULL)
return -1;
-   device_fh = ll_dev->dev.device_fh;
-
-   cleanup_device(&ll_dev->dev);
-   init_device(&ll_dev->dev);
-   ll_dev->dev.device_fh = device_fh;

+   device_fh = dev->device_fh;
+   cleanup_device(dev);
+   init_device(dev);
+   dev->device_fh = device_fh;
return 0;
 }

-- 
jerome

[dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD

2015-10-19 Thread Bruce Richardson

On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
> > On 2015/10/16 21:52, Bruce Richardson wrote:
> > > On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
> > >> The patch introduces a new PMD. This PMD is implemented as thin
> > wrapper
> > >> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> > >> The PMD can have 'iface' parameter like below to specify a path to
> > connect
> > >> to a virtio-net device.
> > >>
> > >> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
> > >>
> > >> To connect above testpmd, here is qemu command example.
> > >>
> > >> $ qemu-system-x86_64 \
> > >> 
> > >> -chardev socket,id=chr0,path=/tmp/sock0 \
> > >> -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
> > >> -device virtio-net-pci,netdev=net0
> > >>
> > >> Signed-off-by: Tetsuya Mukawa 
> > > With this PMD in place, is there any need to keep the existing vhost 
> > > library
> > > around as a separate entity? Can the existing library be
> > subsumed/converted into
> > > a standard PMD?
> > >
> > > /Bruce
> > 
> > Hi Bruce,
> > 
> > I concern about whether the PMD has all features of librte_vhost,
> > because librte_vhost provides more features and freedom than ethdev API
> > provides.
> > In some cases, user needs to choose limited implementation without
> > librte_vhost.
> > I am going to eliminate such cases while implementing the PMD.
> > But I don't have strong belief that we can remove librte_vhost now.
> > 
> > So how about keeping current separation in next DPDK?
> > I guess people will try to replace librte_vhost to vhost PMD, because
> > apparently using ethdev APIs will be useful in many cases.
> > And we will get feedbacks like "vhost PMD needs to support like this usage".
> > (Or we will not have feedbacks, but it's also OK.)
> > Then, we will be able to merge librte_vhost and vhost PMD.
> 
> I agree with the above. One the concerns I had when reviewing the patch was 
> that the PMD removes some freedom that is available with the library. Eg. 
> Ability to implement the new_device and destroy_device callbacks. If using 
> the PMD you are constrained to the implementations of these in the PMD 
> driver, but if using librte_vhost, you can implement your own with whatever 
> functionality you like - a good example of this can be seen in the vhost 
> sample app.
> On the other hand, the PMD is useful in that it removes a lot of complexity 
> for the user and may work for some more general use cases. So I would be in 
> favour of having both options available too.
> 
> Ciara
>

Thanks.
However, just because the libraries are merged does not mean that you need
be limited by PMD functionality. Many PMDs provide additional library-specific
functions over and above their PMD capabilities. The bonded PMD is a good 
example
here, as it has a whole set of extra functions to create and manipulate bonded
devices - things that are obviously not part of the general ethdev API. Other
vPMDs similarly include functions to allow them to be created on the fly too.

regards,
/Bruce

[dpdk-dev] [PATCH] eal: don't reset getopt lib

2015-10-19 Thread Bruce Richardson

On Thu, Oct 15, 2015 at 07:46:04PM +0800, Tiwei Bie wrote:
> Someone may need to call rte_eal_init() with a fake argc/argv array
> in the middle of using getopt() to parse its own unrelated argc/argv
> parameters. So getopt lib shouldn't be reset by rte_eal_init().
> 
> Now eal will always save optind, optarg and optopt (and optreset on
> FreeBSD) at the beginning, initialize optind (and optreset on FreeBSD)
> to 1 before calling getopt_long(), then restore all values after.

This patch looks good overall. Minor comment inline below.

/Bruce
> 
> Suggested-by: Don Provan 
> Suggested-by: Bruce Richardson 
> Signed-off-by: Tiwei Bie 
> ---
>  lib/librte_eal/bsdapp/eal/eal.c   | 59 +++--
>  lib/librte_eal/linuxapp/eal/eal.c | 69 
> ++-
>  2 files changed, 102 insertions(+), 26 deletions(-)
> 
> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> index 1b6f705..bd09377 100644
> --- a/lib/librte_eal/bsdapp/eal/eal.c
> +++ b/lib/librte_eal/bsdapp/eal/eal.c
> @@ -312,8 +312,20 @@ eal_log_level_parse(int argc, char **argv)
>   int opt;
>   char **argvopt;
>   int option_index;
> + int old_optind;
> + int old_optopt;
> + int old_optreset;
> + char *old_optarg;
> +
> + /* save getopt lib */
> + old_optind = optind;
> + old_optopt = optopt;
> + old_optreset = optreset;
> + old_optarg = optarg;

Rather than adding in 10 lines, you could shorten this, and provide additional
compiler hints by merging these assignments into the variable declarations, and
then making the vars const. e.g.

const int old_optind = optind;

>  
>   argvopt = argv;
> + optind = 1;
> + optreset = 1;

[dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD

2015-10-19 Thread Tetsuya Mukawa

On 2015/10/19 18:45, Bruce Richardson wrote:
> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
>>> On 2015/10/16 21:52, Bruce Richardson wrote:
 On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin
>>> wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The PMD can have 'iface' parameter like below to specify a path to
>>> connect
> to a virtio-net device.
>
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>
> To connect above testpmd, here is qemu command example.
>
> $ qemu-system-x86_64 \
> 
> -chardev socket,id=chr0,path=/tmp/sock0 \
> -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
> -device virtio-net-pci,netdev=net0
>
> Signed-off-by: Tetsuya Mukawa 
 With this PMD in place, is there any need to keep the existing vhost 
 library
 around as a separate entity? Can the existing library be
>>> subsumed/converted into
 a standard PMD?

 /Bruce
>>> Hi Bruce,
>>>
>>> I concern about whether the PMD has all features of librte_vhost,
>>> because librte_vhost provides more features and freedom than ethdev API
>>> provides.
>>> In some cases, user needs to choose limited implementation without
>>> librte_vhost.
>>> I am going to eliminate such cases while implementing the PMD.
>>> But I don't have strong belief that we can remove librte_vhost now.
>>>
>>> So how about keeping current separation in next DPDK?
>>> I guess people will try to replace librte_vhost to vhost PMD, because
>>> apparently using ethdev APIs will be useful in many cases.
>>> And we will get feedbacks like "vhost PMD needs to support like this usage".
>>> (Or we will not have feedbacks, but it's also OK.)
>>> Then, we will be able to merge librte_vhost and vhost PMD.
>> I agree with the above. One the concerns I had when reviewing the patch was 
>> that the PMD removes some freedom that is available with the library. Eg. 
>> Ability to implement the new_device and destroy_device callbacks. If using 
>> the PMD you are constrained to the implementations of these in the PMD 
>> driver, but if using librte_vhost, you can implement your own with whatever 
>> functionality you like - a good example of this can be seen in the vhost 
>> sample app.
>> On the other hand, the PMD is useful in that it removes a lot of complexity 
>> for the user and may work for some more general use cases. So I would be in 
>> favour of having both options available too.
>>
>> Ciara
>>
> Thanks.
> However, just because the libraries are merged does not mean that you need
> be limited by PMD functionality. Many PMDs provide additional library-specific
> functions over and above their PMD capabilities. The bonded PMD is a good 
> example
> here, as it has a whole set of extra functions to create and manipulate bonded
> devices - things that are obviously not part of the general ethdev API. Other
> vPMDs similarly include functions to allow them to be created on the fly too.
>
> regards,
> /Bruce

Hi Bruce,

I appreciate for showing a good example. I haven't noticed the PMD.
I will check the bonding PMD, and try to remove librte_vhost without
losing freedom and features of the library.

Regards,
Tetsuya

[dpdk-dev] [PATCH v3] ip_pipeline: add more functions to routing-pipeline

2015-10-19 Thread Jasvinder Singh

This patch adds following features to the
routing-pipeline to enable it for various NFV
use-cases;

1.Fast-path ARP table enable/disable
2.Double-tagged VLAN (Q-in-Q) packet enacapsulation
for the next-hop
3.MPLS encapsulation for the next-hop
4.Add colour (Traffic-class for QoS) to the MPLS tag
5.Classification action to select the input queue
of the hierarchical scehdular (QoS)

The above proposed features can be enabled
(or disabled) through the parameters specified
in configuration file as below;

[PIPELINE0]
type = ROUTING
core = 1
pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0
n_routes = 4096
n_arp_entries = 1024
ip_hdr_offset = 142
arp_key_offset = 64
l2 = qinq
qinq_sched = no

The LPM table entries might include additional
fields depending upon the packet encapsulation
(Q-in-Q, MPLS)for the next-hop. The CLI
commands for adding or deleting such entries
to LPM table have been implemented. Action
handlers for QinQ and MPLS encapsulation,
classification action to select the input queue
of the hierarchical schedular(QoS) and adding
colour (Traffic-class for QoS) to the MPLS
tag have been implemented.

v2:
*fixed bug in print_route

v3:
*replaced config file "l2 = qinq/mpls" with
"encap = ethernet/ethernet_qinq/ethernet_mpls"

*added config file option "dbg_ah_disable=yes/no"
to disable table action handlers for routing and
arp, which is a quick way to disable packet
encapsulation.

*compacted routing table action handlers

*writing 6 bytes for the macaddr_dst instead of
8 bytes during encapsulation, as the additional
2 bytes are located on previous cache line

Signed-off-by: Jasvinder Singh 

Acked-by: Cristian Dumitrescu 

---
 examples/ip_pipeline/pipeline/pipeline_routing.c   |  806 -
 examples/ip_pipeline/pipeline/pipeline_routing.h   |8 +-
 .../ip_pipeline/pipeline/pipeline_routing_be.c | 1256 ++--
 .../ip_pipeline/pipeline/pipeline_routing_be.h |   72 +-
 4 files changed, 1954 insertions(+), 188 deletions(-)

diff --git a/examples/ip_pipeline/pipeline/pipeline_routing.c 
b/examples/ip_pipeline/pipeline/pipeline_routing.c
index beec982..4f6ff81 100644
--- a/examples/ip_pipeline/pipeline/pipeline_routing.c
+++ b/examples/ip_pipeline/pipeline/pipeline_routing.c
@@ -43,7 +43,7 @@

 struct app_pipeline_routing_route {
struct pipeline_routing_route_key key;
-   struct app_pipeline_routing_route_params params;
+   struct pipeline_routing_route_data data;
void *entry_ptr;

TAILQ_ENTRY(app_pipeline_routing_route) node;
@@ -187,21 +187,55 @@ print_route(const struct app_pipeline_routing_route 
*route)
&route->key.key.ipv4;

printf("IP Prefix = %" PRIu32 ".%" PRIu32
-   ".%" PRIu32 ".%" PRIu32 "/%" PRIu32 " => "
-   "(Port = %" PRIu32 ", Next Hop IP = "
-   "%" PRIu32 ".%" PRIu32 ".%" PRIu32 ".%" PRIu32 ")\n",
+   ".%" PRIu32 ".%" PRIu32 "/%" PRIu32
+   " => (Port = %" PRIu32,
+
(key->ip >> 24) & 0xFF,
(key->ip >> 16) & 0xFF,
(key->ip >> 8) & 0xFF,
key->ip & 0xFF,

key->depth,
-   route->params.port_id,
+   route->data.port_id);
+
+   if (route->data.flags & PIPELINE_ROUTING_ROUTE_ARP)
+   printf(
+   ", Next Hop IP = %" PRIu32 ".%" PRIu32
+   ".%" PRIu32 ".%" PRIu32,
+
+   (route->data.ethernet.ip >> 24) & 0xFF,
+   (route->data.ethernet.ip >> 16) & 0xFF,
+   (route->data.ethernet.ip >> 8) & 0xFF,
+   route->data.ethernet.ip & 0xFF);
+   else
+   printf(
+   ", Next Hop HWaddress = %02" PRIx32
+   ":%02" PRIx32 ":%02" PRIx32
+   ":%02" PRIx32 ":%02" PRIx32
+   ":%02" PRIx32,
+
+   route->data.ethernet.macaddr.addr_bytes[0],
+   route->data.ethernet.macaddr.addr_bytes[1],
+   route->data.ethernet.macaddr.addr_bytes[2],
+   route->data.ethernet.macaddr.addr_bytes[3],
+   route->data.ethernet.macaddr.addr_bytes[4],
+   route->data.ethernet.macaddr.addr_bytes[5]);
+
+   if (route->data.flags & PIPELINE_ROUTING_ROUTE_QINQ)
+   printf(", QinQ SVLAN = %" PRIu32 " CVLAN = %" PRIu32,
+   route->data.l2.qinq.svlan,
+   route->data.l2.qinq.cvlan);
+
+   if (route->data.flags & PIPELINE_ROUTING_ROUTE_MPLS) {
+

[dpdk-dev] [PATCH v5 0/4] bonding corrections and additions

2015-10-19 Thread Declan Doherty



On 30/09/15 18:48, Eric Kinzie wrote:
> This patchset makes a couple of small corrections to the bonding driver
> and introduces the ability to use an external state machine for mode
> 4 operation.
>
> Changes in v2:
>. eliminate external_sm field in 802.3ad configuration
>  (rte_eth_bond_8023ad_conf).
>. stop bonding device before changing the periodic callback function.
>  start again if needed.
>. remove unnecessary calls to valid_bonded_port_id().
>. do not check for NULL tx_ring.
>. return error in rte_eth_bond_8023ad_ext_slowtx() if packet is not LACP.
>. remove check for external sm configuration in periodic callback
>. check for valid LACPDU in test application's rx callback
>. add "Fixes:" tags
>
> Changes in v3:
>. update rte_eth_bond_version.map
>
> Changes in v4:
>. version functions that modify slowrx_cb
>
> Changes in v5:
>. remove "__vsym" from source files and use updated symbol version macros
>
> Eric Kinzie (4):
>bond mode 4: copy entire config structure
>bond mode 4: do not ignore multicast
>bond mode 4: allow external state machine
>bond mode 4: tests for external state machine
>
>   app/test/test_link_bonding_mode4.c|  217 ++-
>   drivers/net/bonding/Makefile  |2 +-
>   drivers/net/bonding/rte_eth_bond_8023ad.c |  241 
> -
>   drivers/net/bonding/rte_eth_bond_8023ad.h |   44 
>   drivers/net/bonding/rte_eth_bond_8023ad_private.h |   28 +++
>   drivers/net/bonding/rte_eth_bond_pmd.c|1 +
>   drivers/net/bonding/rte_eth_bond_version.map  |3 +
>   7 files changed, 522 insertions(+), 14 deletions(-)
>

Hey Eric, the patch set looks good, the only issue is that the 
versioning referrers to 2.1 now instead of 2.2, if you could fix that, 
I'll ack straight away.

Thanks
Declan

[dpdk-dev] Inconsistent statistics counters for pmd_i40e

2015-10-19 Thread Eimear Morrissey



Hi,

I'm having issues measuring packets dropped at the NIC in both the 2.0.0
and 2.1.0 versions of DPDK on an X710 Intel NIC.

In dpdk-2.0.0
Using rte_eth_xstats the rx_packets and rx_bytes counters increase as
expected, however rx_missed_errors is always 0 even if a sleep statement is
added between calls to rte_eth_rx_burst. However changing the coremask so
the application is running on a different socket than the card will cause
rx_missed_errors to increment for a limited amount of time and then stop.
Using rte_eth_stats, ipackets is incremented on packet receipt but the
q_ipackets and q_errors arrays remain zero. Even crossing sockets seems to
have no effect on q_errors.

In dpdk-2.1.0 the behaviour is the same as above, except that the number of
fields returned by rte_eth_xstats_get is reduced (no rx_missed errors at
all) so running on a different socket no longer has any noticeable effect
on the stats.

My understanding from the API manual is that the rte_eth_stats q_errors
array should count the packets missed because software isn't polling fast
enough, but that doesn't seem to be the case? Is there a standard DPDK way
to check this? The application is a forwarding one so there's no other way
to estimate drop except through NIC rx.

Thanks,
Eimear

[dpdk-dev] [PATCH v2 1/1] ethdev: distinguish between drop and error stats

2015-10-19 Thread Maryam Tahhan

Make a distniction between dropped packets and error statistics to allow
a higher level fault management entity to interact with DPDK and take
appropriate measures when errors are detected. It will also provide
valuable information for any applications that collects/extracts DPDK
stats, such applications include Open vSwitch.
After this patch the distinction is:
ierrors = Total number of packets dropped by hardware (malformed
packets, ...) Where the # of drops can ONLY be <=  the packets received
(without overlap between registers).
ipkterrors = Total number of erroneous received packets. Where the #
of errors can be >= the packets received (without overlap between
registers), this is because there may be multiple errors associated with
a packet.

v2:
 - Added detailed descriptions for ierrors and ipkterrors.

Signed-off-by: Maryam Tahhan 
---
 lib/librte_ether/rte_ethdev.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8a8c82b..e27ba31 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -200,8 +200,18 @@ struct rte_eth_stats {
/**< Deprecated; Total of RX packets with CRC error. */
uint64_t ibadlen;
/**< Deprecated; Total of RX packets with bad length. */
-   uint64_t ierrors;   /**< Total number of erroneous received packets. */
+   uint64_t ierrors;
+   /**< Total number of packets dropped by hardware (malformed packets...)
+* Where the # of drops can ONLY be <=  the packets received (without
+* overlap between registers).
+*/
uint64_t oerrors;   /**< Total number of failed transmitted packets. */
+   uint64_t ipkterrors;
+   /**< Total number of erroneous received packets. Where the number of
+* errors can be >= the packets received (without overlap between
+* registers), this is because there may be multiple errors associated 
with
+* a packet.
+*/
uint64_t imcasts;
/**< Deprecated; Total number of multicast received packets. */
uint64_t rx_nombuf; /**< Total number of RX mbuf allocation failures. */
-- 
2.4.3

[dpdk-dev] [PATCH v2] eal:Map rte cfg and uio at the end of hugepage mem

2015-10-19 Thread Bruce Richardson

On Sat, Oct 17, 2015 at 10:48:29AM +0300, Nissim Nisimov wrote:
> * this patch removes unnecessary call to rte_eal_memory_init() introduced in 
> the first version.
> 
> Problem:
> In DPDK Primary/Secondary module we assume mapping same regions of virtual 
> memory addresses for Primary process and Secondary.
> An issue may occur when the Primary and secondary processes are not symmetric 
> in such way that the code is not the same (for example, Primary process is a 
> traffic distributer and secondary is a worker). The result may be that 
> specific virtual address region in the first process won't be available in 
> the second process.
> 
> Changes done at eal init:
> map all related rte configuration and uio sections close to the end of huge 
> pages memory (that mean rte_eal_memory_init() should be called before 
> rte_config_init() in primary process)
> According to our observations there will be more probability to success when 
> allocating rte_config and uio memzones after huge pages sections (actually 
> uio is already allocated after the huge pages area)
> 
> Signed-off-by: Nissim Nisimov 

Hi Nissim,

thanks for the V2. Some review comments below. Also, when applying there were
some whitespace errors reported from git:

git apply dpdk-dev-v2-eal-Map-rte-cfg-and-uio-at-the-end-of-hugepage-mem.patch
dpdk-dev-v2-eal-Map-rte-cfg-and-uio-at-the-end-of-hugepage-mem.patch:64: 
trailing whitespace.

dpdk-dev-v2-eal-Map-rte-cfg-and-uio-at-the-end-of-hugepage-mem.patch:111: space 
before tab in indent.
pci_map_addr = pci_find_max_end_va();
dpdk-dev-v2-eal-Map-rte-cfg-and-uio-at-the-end-of-hugepage-mem.patch:113: space 
before tab in indent.
pci_map_addr = 
(void*)RTE_PTR_ALIGN(((char*)rte_eal_get_configuration()->mem_config->mem_cfg_addr
 + sizeof(struct rte_mem_config)),sysconf(_SC_PAGE_SIZE));
warning: 3 lines add whitespace errors.


> ---
>  lib/librte_eal/linuxapp/eal/eal.c | 28 +---
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 10 +++---
>  2 files changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
> b/lib/librte_eal/linuxapp/eal/eal.c
> index 33e1067..f85eb63 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -87,6 +87,7 @@
>  
>  #define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
>  
> +void *pci_find_max_end_va(void);

Probably want a blank line after the prototype, since it's unrelated to the line
afterward. Maybe mark it extern too. 
[The alternative to this function prototype here is obviously to include
eal_pci_init.h to get the function prototype - not sure which solution is 
better]

>  /* Allow the application to print its usage message too if set */
>  static rte_usage_hook_t  rte_application_usage_hook = NULL;
>  
> @@ -189,12 +190,15 @@ rte_eal_config_create(void)
>   return;
>  
>   /* map the config before hugepage address so that we don't waste a page 
> */
> - if (internal_config.base_virtaddr != 0)
> + if (internal_config.base_virtaddr != 0){
>   rte_mem_cfg_addr = (void *)
>   RTE_ALIGN_FLOOR(internal_config.base_virtaddr -
>   sizeof(struct rte_mem_config), sysconf(_SC_PAGE_SIZE));
> - else
> - rte_mem_cfg_addr = NULL;
> +}
> + else{
> + rte_mem_cfg_addr = pci_find_max_end_va();

Would it work as well putting the config immediately before the hugepages
rather than at the end, as is done in the case of having a cmdline-specified 
virtual
address? If so, it should remove the need to change the eal_pci_uio.c file.

> +RTE_LOG(INFO, EAL, "rte_mem_cfg_addr =  0x%llx 
> PType=%s\n",(unsigned long long)rte_mem_cfg_addr,rte_config.process_type == 
> RTE_PROC_PRIMARY ? "PRIMARY" : "SECONDARY");
> +}
> 

I would suggest making the RTE_LOG statment unconditional. Even having it
printing when the value is specified can be useful to confirm the parameter was
parsed correctly. [It would also reduce the diff as you won't have to add in the
braces to the "if/else" statements].
Also:
* use tabs, not spaces for indentation.
* I would suggest making the LOG statement be at the debug, rather than info
level, as it's not something that normally needs to be printed.

>   if (mem_cfg_fd < 0){
>   mem_cfg_fd = open(pathname, O_RDWR | O_CREAT, 0660);
> @@ -227,7 +231,7 @@ rte_eal_config_create(void)
>   /* store address of the config in the config itself so that secondary
>* processes could later map the config into this exact location */
>   rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
> -
> +
>  }
Remove this whitespace change from diff.

>  
>  /* attach to an existing shared memory config */
> @@ -784,6 +788,13 @@ rte_eal_init(int argc, char **argv)
>  
>   rte_srand(rte_rdtsc());
>  
> +/* Primary process should

[dpdk-dev] [PATCH v2] eal: don't reset getopt lib

2015-10-19 Thread Tiwei Bie

Someone may need to call rte_eal_init() with a fake argc/argv array
in the middle of using getopt() to parse its own unrelated argc/argv
parameters. So getopt lib shouldn't be reset by rte_eal_init().

Now eal will always save optind, optarg and optopt (and optreset on
FreeBSD) at the beginning, initialize optind (and optreset on FreeBSD)
to 1 before calling getopt_long(), then restore all values after.

Suggested-by: Don Provan 
Suggested-by: Bruce Richardson 
Signed-off-by: Tiwei Bie 
Reviewed-by: Don Provan 
Reviewed-by: Bruce Richardson 
---
v2:
 - constify some variables

 lib/librte_eal/bsdapp/eal/eal.c   | 47 ---
 lib/librte_eal/linuxapp/eal/eal.c | 59 ---
 2 files changed, 80 insertions(+), 26 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 1b6f705..b356517 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -312,8 +312,14 @@ eal_log_level_parse(int argc, char **argv)
int opt;
char **argvopt;
int option_index;
+   const int old_optind = optind;
+   const int old_optopt = optopt;
+   const int old_optreset = optreset;
+   char * const old_optarg = optarg;

argvopt = argv;
+   optind = 1;
+   optreset = 1;

eal_reset_internal_config(&internal_config);

@@ -334,7 +340,11 @@ eal_log_level_parse(int argc, char **argv)
break;
}

-   optind = 0; /* reset getopt lib */
+   /* restore getopt lib */
+   optind = old_optind;
+   optopt = old_optopt;
+   optreset = old_optreset;
+   optarg = old_optarg;
 }

 /* Parse the argument given in the command line of the application */
@@ -345,25 +355,31 @@ eal_parse_args(int argc, char **argv)
char **argvopt;
int option_index;
char *prgname = argv[0];
+   const int old_optind = optind;
+   const int old_optopt = optopt;
+   const int old_optreset = optreset;
+   char * const old_optarg = optarg;

argvopt = argv;
+   optind = 1;
+   optreset = 1;

while ((opt = getopt_long(argc, argvopt, eal_short_options,
  eal_long_options, &option_index)) != EOF) {

-   int ret;
-
/* getopt is not happy, stop right now */
if (opt == '?') {
eal_usage(prgname);
-   return -1;
+   ret = -1;
+   goto out;
}

ret = eal_parse_common_option(opt, optarg, &internal_config);
/* common parser is not happy */
if (ret < 0) {
eal_usage(prgname);
-   return -1;
+   ret = -1;
+   goto out;
}
/* common parser handled this option */
if (ret == 0)
@@ -387,23 +403,34 @@ eal_parse_args(int argc, char **argv)
"on FreeBSD\n", opt);
}
eal_usage(prgname);
-   return -1;
+   ret = -1;
+   goto out;
}
}

-   if (eal_adjust_config(&internal_config) != 0)
-   return -1;
+   if (eal_adjust_config(&internal_config) != 0) {
+   ret = -1;
+   goto out;
+   }

/* sanity checks */
if (eal_check_common_options(&internal_config) != 0) {
eal_usage(prgname);
-   return -1;
+   ret = -1;
+   goto out;
}

if (optind >= 0)
argv[optind-1] = prgname;
ret = optind-1;
-   optind = 0; /* reset getopt lib */
+
+out:
+   /* restore getopt lib */
+   optind = old_optind;
+   optopt = old_optopt;
+   optreset = old_optreset;
+   optarg = old_optarg;
+
return ret;
 }

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 33e1067..89a81bd 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -505,8 +505,12 @@ eal_log_level_parse(int argc, char **argv)
int opt;
char **argvopt;
int option_index;
+   const int old_optind = optind;
+   const int old_optopt = optopt;
+   char * const old_optarg = optarg;

argvopt = argv;
+   optind = 1;

eal_reset_internal_config(&internal_config);

@@ -527,7 +531,10 @@ eal_log_level_parse(int argc, char **argv)
break;
}

-   optind = 0; /* reset getopt lib */
+   /* restore getopt lib */
+   optind = old_optind;
+   optopt = old_optopt;
+   optarg = old_optarg;
 }

 /* Parse the argument given in the command line of the application */
@@ -539,25 +546,29 @@ eal_parse_args(int argc, char **argv)

[dpdk-dev] [PATCH] eal: don't reset getopt lib

2015-10-19 Thread Tiwei Bie

On Mon, Oct 19, 2015 at 11:36:41AM +0100, Bruce Richardson wrote:
> On Thu, Oct 15, 2015 at 07:46:04PM +0800, Tiwei Bie wrote:
> > Someone may need to call rte_eal_init() with a fake argc/argv array
> > in the middle of using getopt() to parse its own unrelated argc/argv
> > parameters. So getopt lib shouldn't be reset by rte_eal_init().
> > 
> > Now eal will always save optind, optarg and optopt (and optreset on
> > FreeBSD) at the beginning, initialize optind (and optreset on FreeBSD)
> > to 1 before calling getopt_long(), then restore all values after.
> 
> This patch looks good overall. Minor comment inline below.
> 

Thanks for review! :-)

Best regards,
Tiwei Bie

[dpdk-dev] [PATCH v6 0/9] Dynamic RSS Configuration for Bonding

2015-10-19 Thread Declan Doherty

On 16/10/15 11:00, Tomasz Kulasek wrote:
> OVERVIEW
> 
> 1) Setting .rxmode.mq_mode for bonding device to ETH_MQ_RX_RSS makes bonding
> device fully RSS-capable, so all slaves are synchronized with its 
> configuration.
> This mode is intended to provide RSS configuration as known from "dynamic RSS
> configuration for one port" and made slaves transparent for client application
> implementation.
>
> 2) If .rxmode.mq_mode for bonding device isn't ETH_MQ_RX_RSS, slaves are not
> synchronized. That provides an ability to configure them manually. This mode 
> may
> be useful when application wants to manage RSS in an unusual way and the
> consistency of RSS configuration for slaves isn't required.
>
> Turning on/off RSS mode for slaves when bonding is started is not possible.
> Other RSS configuration is propagated over slaves, when bonding device API is
> used to do it.
>
> v6 changes:
>   - patchset reordered
>   - fixed forward dependency between patch 4/9 and 5/9
>
> v5 changes:
>   - updated to DPDK 2.2
>   - removed copyright change from null device source
>   - removed queue_stats_mapping_set from eth_dev_ops of bonding device
>   - null pmd cleanups (removed unnecessary malloc, replaced memcpy with
> rte_memcpy)
>   - fixed queues number configuration in null pmd
>
> v4 changes:
>   - fixed copy-paste error,
>   - removed example application as too complex and introducing a new
> dependency,
>   - addapted null pmd to be used as testing device for dynamic RSS 
> configuration,
>   - addapted test units to use null pmd instead of ring pmd,
>   - ring pmd is not used and changed in this patchset
>
> v3 changes:
>   - checkpatch cleanups
>
> v2 changes:
>   - added support for keys other than 40 bytes long,
>   - now, if RSS key is not set for bonding, it is not set also for slaves,
>   - fix - full initial RSS configuration before any slave is added was not
> possible due to the initially zeroed flow_type_rss_offloads for bonding,
>   - fix - changed error to warning when slave is synchronizing due to the
> bonding's initial configuration (to allow use slaves' drivers not 
> supporting
> dynamic RSS configuration in bonding),
>   - some code cleanups,
>   - updated documentation,
>
> Tomasz Kulasek (9):
>bonding: rss dynamic configuration
>null: fix segfault when null_pmd added to bonding
>null: extend number of virtual queues
>null: export eth_dev_null_create
>null: virtual dynamic rss configuration
>test: dynamic rss configuration
>bonding: per queue stats
>doc: fixed spellings and typos
>doc: dynamic rss configuration for bonding
>
>   app/test/Makefile  |8 +
>   app/test/test_link_bonding_rssconf.c   |  679 
> 
>   .../prog_guide/link_bonding_poll_mode_drv_lib.rst  |   42 +-
>   drivers/net/bonding/rte_eth_bond_api.c |   28 +
>   drivers/net/bonding/rte_eth_bond_pmd.c |  216 ++-
>   drivers/net/bonding/rte_eth_bond_private.h |   12 +
>   drivers/net/null/Makefile  |2 +-
>   drivers/net/null/rte_eth_null.c|  149 -
>   drivers/net/null/rte_eth_null.h|   40 ++
>   drivers/net/null/rte_pmd_null_version.map  |7 +
>   10 files changed, 1151 insertions(+), 32 deletions(-)
>   create mode 100644 app/test/test_link_bonding_rssconf.c
>   create mode 100644 drivers/net/null/rte_eth_null.h
>

Series Acked-by: Declan Doherty

[dpdk-dev] [PATCH v2] eal: don't reset getopt lib

2015-10-19 Thread Bruce Richardson

On Mon, Oct 19, 2015 at 09:13:10PM +0800, Tiwei Bie wrote:
> Someone may need to call rte_eal_init() with a fake argc/argv array
> in the middle of using getopt() to parse its own unrelated argc/argv
> parameters. So getopt lib shouldn't be reset by rte_eal_init().
> 
> Now eal will always save optind, optarg and optopt (and optreset on
> FreeBSD) at the beginning, initialize optind (and optreset on FreeBSD)
> to 1 before calling getopt_long(), then restore all values after.
> 
> Suggested-by: Don Provan 
> Suggested-by: Bruce Richardson 
> Signed-off-by: Tiwei Bie 
> Reviewed-by: Don Provan 
> Reviewed-by: Bruce Richardson 

Acked-by: Bruce Richardson

[dpdk-dev] [PATCH v6 00/13] vhost-user multiple queues enabling

2015-10-19 Thread Flavio Leitner

On Fri, Oct 09, 2015 at 01:45:59PM +0800, Yuanhan Liu wrote:
> This patch set enables vhost-user multiple queues.
> 
> Overview
> 
> 
> It depends on some QEMU patches that has already been merged to upstream.
> Those qemu patches introduce some new vhost-user messages, for vhost-user
> mq enabling negotiation. Here is the main negotiation steps (Qemu
> as master, and DPDK vhost-user as slave):
> 
> - Master queries features by VHOST_USER_GET_FEATURES from slave
> 
> - Check if VHOST_USER_F_PROTOCOL_FEATURES exist. If not, mq is not
>   supported. (check patch 1 for why VHOST_USER_F_PROTOCOL_FEATURES
>   is introduced)
> 
> - Master then sends another command, VHOST_USER_GET_QUEUE_NUM, for
>   querying how many queues the slave supports.
> 
>   Master will compare the result with the requested queue number.
>   Qemu exits if the former is smaller.
> 
> - Master then tries to initiate all queue pairs by sending some vhost
>   user commands, including VHOST_USER_SET_VRING_CALL, which will
>   trigger the slave to do related vring setup, such as vring allocation.
> 
> 
> Till now, all necessary initiation and negotiation are done. And master
> could send another message, VHOST_USER_SET_VRING_ENABLE, to enable/disable
> a specific queue dynamically later.
> 
> 
> Patchset
> 
> 
> Patch 1-6 are all prepare works for enabling mq; they are all atomic
> changes, with "do not breaking anything" beared in mind while making
> them.
> 
> Patch 7 acutally enables mq feature, by setting two key feature flags.
> 
> Patch 8 handles VHOST_USER_SET_VRING_ENABLE message, which is for enabling
> disabling a specific virt queue pair, and there is only one queue pair is
> enabled by default.
> 
> Patch 9-12 is for demostrating the mq feature.
> 
> Patch 13 udpates the doc release note.
> 
> 
> Testing
> ===
> 
> Host side
> --
> 
> - # Start vhost-switch
> 
>   sudo mount -t hugetlbfs nodev /mnt/huge
>   sudo modprobe uio
>   sudo insmod $RTE_SDK/$RTE_TARGET/kmod/igb_uio.ko
> 
>   sudo $RTE_SDK/tools/dpdk_nic_bind.py --bind igb_uio :08:00.0
> 
>   sudo $RTE_SDK/examples/vhost/build/vhost-switch -c 0xf0 -n 4 \
>--huge-dir /mnt/huge --socket-mem 2048,0 -- -p 1 --vm2vm 0  \
>--dev-basename usvhost --rxq 2
> 
>   # Above common generates a usvhost socket file at PWD. You could also
>   # specify "--stats 1" option to enable stats dumping.
> 
> 
> 
> - # start qemu
> 
> 
>   sudo sudo mount -t hugetlbfs nodev $HOME/hugetlbfs
>   $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -machine accel=kvm -m 4G \
> -object 
> memory-backend-file,id=mem,size=4G,mem-path=$HOME/hugetlbfs,share=on \
>   -numa node,memdev=mem -chardev socket,id=chr0,path=/path/to/usvhost \
>   -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=2 \
>   -device 
> virtio-net-pci,netdev=net0,mq=on,vectors=6,mac=52:54:00:12:34:58,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
>  \
>   -hda $HOME/iso/fc-22-x86_64.img -smp 10 -cpu 
> core2duo,+sse3,+sse4.1,+sse4.2
> 
> 
> Guest side
> --
> 
>modprobe uio
>insmod $RTE_SDK/$RTE_TARGET/kmod/igb_uio.ko
>echo 1024 > 
> /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>./tools/dpdk_nic_bind.py --bind igb_uio 00:03.0
> 
>$RTE_SDK/$RTE_TARGET/app/testpmd -c 1f -n 4 -- --rxq=2 --txq=2 \
> --nb-cores=4 -i --disable-hw-vlan --txqflags 0xf00
> 
>> set fwd mac
>> start tx_first
>  
> 
> After those setups, you then could use packet generator for packet tx/rx 
> testing.
> 
> 
> 
> Test with OVS
> =
> 
> Marcel also created a simple yet quite clear test guide with OVS at:
> 
>http://wiki.qemu.org/Features/vhost-user-ovs-dpdk
> 
> BTW, Marcel, would you please complete the page on mq testing?
> 
> 
> ---

I reviewed the patch and tested with success using Open vSwitch.
The Open vSwitch patch is posted here:
http://openvswitch.org/pipermail/dev/2015-October/061413.html

Acked-by: Flavio Leitner 

Thanks,
fbl

[dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements

2015-10-19 Thread Xie, Huawei

On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> The virtio ring in QEMU/KVM is usually limited to 256 entries
> and the normal way that virtio driver was queuing mbufs required
> nsegs + 1 ring elements. By using the indirect ring element feature
> if available, each packet will take only one ring slot even for
> multi-segment packets.

[...]


>  static int
> -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
> +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
> +int use_indirect)
>  {
>   struct vq_desc_extra *dxp;
>   struct vring_desc *start_dp;
>   uint16_t seg_num = cookie->nb_segs;
> - uint16_t needed = 1 + seg_num;
> + uint16_t needed = use_indirect ? 1 : 1 + seg_num;
>   uint16_t head_idx, idx;
> - uint16_t head_size = txvq->hw->vtnet_hdr_size;
> + unsigned long offs;
>  
>   if (unlikely(txvq->vq_free_cnt == 0))
>   return -ENOSPC;
> @@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
> rte_mbuf *cookie)
>   dxp = &txvq->vq_descx[idx];
>   dxp->cookie = (void *)cookie;
>   dxp->ndescs = needed;
> -
>   start_dp = txvq->vq_ring.desc;
> - start_dp[idx].addr =
> - txvq->virtio_net_hdr_mem + idx * head_size;
> - start_dp[idx].len = (uint32_t)head_size;
> - start_dp[idx].flags = VRING_DESC_F_NEXT;
> +
> + if (use_indirect) {
> + struct virtio_tx_region *txr
> + = txvq->virtio_net_hdr_mz->addr;
> +
> + offs = idx * sizeof(struct virtio_tx_region)
> + + offsetof(struct virtio_tx_region, tx_indir);
> +
> + start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
> + start_dp[idx].len   = (seg_num + 1) * sizeof(struct vring_desc);
a. In indirect mode, as we always use one descriptor, could we allocate
one fixed descriptor as what i did in RX/TX ring layout optimization? :).
b. If not a, we could cache the descriptor, avoid update unless the
fields are different. In current implementation of free desc list, we
could make them always use the same tx desc for the same ring slot. I am
to submit a patch for normal rx path.
c. Could we initialize the length of all tx descriptors to be
VIRTIO_MAX_INDIRECT * sizeof(struct vring_desc)? Is maximum length ok
here? Does the spec require that the length field reflects the length of
real used descs, as we already have the next field to indicate the last
descriptor.


> + start_dp[idx].flags = VRING_DESC_F_INDIRECT;
> +
> + start_dp = txr[idx].tx_indir;
> + idx = 0;
> + } else {
> + offs = idx * sizeof(struct virtio_tx_region)
> + + offsetof(struct virtio_tx_region, tx_hdr);
> +
> + start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
> + start_dp[idx].len   = txvq->hw->vtnet_hdr_size;
> + start_dp[idx].flags = VRING_DESC_F_NEXT;
> + }
>  
>   for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
>   idx = start_dp[idx].next;
This looks weird to me. Why skip the first user provided descriptor?
idx = 0
idx = start_dp[idx].next
start_dp[idx].addr = ...

> @@ -236,7 +254,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
> rte_mbuf *cookie)
>   }
>  
>   start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
> - idx = start_dp[idx].next;
> +
> + if (use_indirect)
> + idx = txvq->vq_ring.desc[head_idx].next;
> + else
> + idx = start_dp[idx].next;
> +
>   txvq->vq_desc_head_idx = idx;
>   if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
>   txvq->vq_desc_tail_idx = idx;
> @@ -261,7 +284,7 @@ static void
>  virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
>  {
>   struct rte_mbuf *m;
> - int i, nbufs, error, size = vq->vq_nentries;
> + int nbufs, error, size = vq->vq_nentries;
>   struct vring *vr = &vq->vq_ring;
>   uint8_t *ring_mem = vq->vq_ring_virt_mem;
>  
> @@ -279,10 +302,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int 
> queue_type)
>   vq->vq_free_cnt = vq->vq_nentries;
>   memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
>  
> - /* Chain all the descriptors in the ring with an END */
> - for (i = 0; i < size - 1; i++)
> - vr->desc[i].next = (uint16_t)(i + 1);
> - vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
> + vring_desc_init(vr->desc, size);
>  
>   /*
>* Disable device(host) interrupting guest
> @@ -760,7 +780,15 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
> **tx_pkts, uint16_t nb_pkts)
>  
>   for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
>   struct rte_mbuf *txm = tx_pkts[nb_tx];
> - int need = txm->nb_segs - txvq->vq_free_cnt + 1;
> + int use_indirect, slots, need;
> +
> + use_indirect = vtpci_with_feature(txvq->hw,
> +

[dpdk-dev] [PATCH] igb: fix IEEE1588 frame identification in i210

2015-10-19 Thread Pablo de Lara

Fixed issue where the flag PKT_RX_IEEE1588_PTP was not being set
in Intel I210 NIC, as EtherType in RX descriptor is in bits 8:10 of
Packet Type and not in the default bits 0:2.

Fixes known issue "IEEE1588 support possibly not working
with an Intel Ethernet Controller I210 NIC"

Signed-off-by: Pablo de Lara 
---
 doc/guides/rel_notes/known_issues.rst | 19 ---
 doc/guides/rel_notes/release_2_2.rst  |  5 +
 drivers/net/e1000/igb_rxtx.c  | 15 +++
 3 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/doc/guides/rel_notes/known_issues.rst 
b/doc/guides/rel_notes/known_issues.rst
index b9a52d0..9f1fcdb 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
@@ -421,25 +421,6 @@ Intel? QuickAssist Technology sample application does not 
work on a 32-bit OS o
All.


-IEEE1588 support possibly not working with an Intel? Ethernet Controller I210 
NIC
--
-
-**Description**:
-   IEEE1588 support is not working with an Intel? Ethernet Controller I210 NIC.
-
-**Implication**:
-   IEEE1588 packets are not forwarded correctly by the Intel? Ethernet 
Controller I210 NIC.
-
-**Resolution/Workaround**:
-   There is no workaround available.
-
-**Affected Environment/Platform**:
-   All.
-
-**Driver/Module**:
-   IGB Poll Mode Driver
-
-
 Differences in how different Intel NICs handle maximum packet length for jumbo 
frame
 


diff --git a/doc/guides/rel_notes/release_2_2.rst 
b/doc/guides/rel_notes/release_2_2.rst
index 5687676..f623fdb 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -39,6 +39,11 @@ Drivers

   Fixed issue with libvirt ``virsh destroy`` not killing the VM.

+* **igb: Fixed IEEE1588 frame identification in I210.**
+
+  Fixed issue where the flag PKT_RX_IEEE1588_PTP was not being set
+  in Intel I210 NIC, as EtherType in RX descriptor is in bits 8:10 of
+  Packet Type and not in the default bits 0:2.

 Libraries
 ~
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 19905fd..ebc57cf 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -668,7 +668,7 @@ igb_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
 }

 static inline uint64_t
-rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
+rx_desc_hlen_type_rss_to_pkt_flags(struct igb_rx_queue *rxq, uint32_t hl_tp_rs)
 {
uint64_t pkt_flags = ((hl_tp_rs & 0x0F) == 0) ?  0 : PKT_RX_RSS_HASH;

@@ -678,7 +678,14 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
0, 0, 0, 0,
};

-   pkt_flags |= ip_pkt_etqf_map[(hl_tp_rs >> 4) & 0x07];
+   struct rte_eth_dev dev = rte_eth_devices[rxq->port_id];
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev.data->dev_private);
+
+   /* EtherType is in bits 8:10 in Packet Type, and not in the default 0:2 
*/
+   if (hw->mac.type == e1000_i210)
+   pkt_flags |= ip_pkt_etqf_map[(hl_tp_rs >> 12) & 0x07];
+   else
+   pkt_flags |= ip_pkt_etqf_map[(hl_tp_rs >> 4) & 0x07];
 #endif

return pkt_flags;
@@ -852,7 +859,7 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
/* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */
rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);

-   pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss);
+   pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(rxq, 
hlen_type_rss);
pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
rxm->ol_flags = pkt_flags;
@@ -1088,7 +1095,7 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct 
rte_mbuf **rx_pkts,
 */
first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
-   pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss);
+   pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(rxq, 
hlen_type_rss);
pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
first_seg->ol_flags = pkt_flags;
-- 
2.1.0

[dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD

2015-10-19 Thread Panu Matilainen

On 10/19/2015 01:50 PM, Tetsuya Mukawa wrote:
> On 2015/10/19 18:45, Bruce Richardson wrote:
>> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
 On 2015/10/16 21:52, Bruce Richardson wrote:
> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin
 wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The PMD can have 'iface' parameter like below to specify a path to
 connect
>> to a virtio-net device.
>>
>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>>
>> To connect above testpmd, here is qemu command example.
>>
>> $ qemu-system-x86_64 \
>>  
>>  -chardev socket,id=chr0,path=/tmp/sock0 \
>>  -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>>  -device virtio-net-pci,netdev=net0
>>
>> Signed-off-by: Tetsuya Mukawa 
> With this PMD in place, is there any need to keep the existing vhost 
> library
> around as a separate entity? Can the existing library be
 subsumed/converted into
> a standard PMD?
>
> /Bruce
 Hi Bruce,

 I concern about whether the PMD has all features of librte_vhost,
 because librte_vhost provides more features and freedom than ethdev API
 provides.
 In some cases, user needs to choose limited implementation without
 librte_vhost.
 I am going to eliminate such cases while implementing the PMD.
 But I don't have strong belief that we can remove librte_vhost now.

 So how about keeping current separation in next DPDK?
 I guess people will try to replace librte_vhost to vhost PMD, because
 apparently using ethdev APIs will be useful in many cases.
 And we will get feedbacks like "vhost PMD needs to support like this 
 usage".
 (Or we will not have feedbacks, but it's also OK.)
 Then, we will be able to merge librte_vhost and vhost PMD.
>>> I agree with the above. One the concerns I had when reviewing the patch was 
>>> that the PMD removes some freedom that is available with the library. Eg. 
>>> Ability to implement the new_device and destroy_device callbacks. If using 
>>> the PMD you are constrained to the implementations of these in the PMD 
>>> driver, but if using librte_vhost, you can implement your own with whatever 
>>> functionality you like - a good example of this can be seen in the vhost 
>>> sample app.
>>> On the other hand, the PMD is useful in that it removes a lot of complexity 
>>> for the user and may work for some more general use cases. So I would be in 
>>> favour of having both options available too.
>>>
>>> Ciara
>>>
>> Thanks.
>> However, just because the libraries are merged does not mean that you need
>> be limited by PMD functionality. Many PMDs provide additional 
>> library-specific
>> functions over and above their PMD capabilities. The bonded PMD is a good 
>> example
>> here, as it has a whole set of extra functions to create and manipulate 
>> bonded
>> devices - things that are obviously not part of the general ethdev API. Other
>> vPMDs similarly include functions to allow them to be created on the fly too.
>>
>> regards,
>> /Bruce
>
> Hi Bruce,
>
> I appreciate for showing a good example. I haven't noticed the PMD.
> I will check the bonding PMD, and try to remove librte_vhost without
> losing freedom and features of the library.

Hi,

Just a gentle reminder - if you consider removing (even if by just 
replacing/renaming) an entire library, it needs to happen the ABI 
deprecation process.

It seems obvious enough. But for all the ABI policing here, somehow we 
all failed to notice the two compatibility breaking rename-elephants in 
the room during 2.1 development:
- libintel_dpdk was renamed to libdpdk
- librte_pmd_virtio_uio was renamed to librte_pmd_virtio

Of course these cases are easy to work around with symlinks, and are 
unrelated to the matter at hand. Just wanting to make sure such things 
dont happen again.

- Panu -

[dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD

2015-10-19 Thread Richardson, Bruce



> -Original Message-
> From: Panu Matilainen [mailto:pmatilai at redhat.com]
> Sent: Monday, October 19, 2015 2:26 PM
> To: Tetsuya Mukawa ; Richardson, Bruce
> ; Loftus, Ciara 
> Cc: dev at dpdk.org; ann.zhuangyanying at huawei.com
> Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
> 
> On 10/19/2015 01:50 PM, Tetsuya Mukawa wrote:
> > On 2015/10/19 18:45, Bruce Richardson wrote:
> >> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
>  On 2015/10/16 21:52, Bruce Richardson wrote:
> > On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
> >> The patch introduces a new PMD. This PMD is implemented as thin
>  wrapper
> >> of librte_vhost. It means librte_vhost is also needed to compile
> the PMD.
> >> The PMD can have 'iface' parameter like below to specify a path
> >> to
>  connect
> >> to a virtio-net device.
> >>
> >> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
> >>
> >> To connect above testpmd, here is qemu command example.
> >>
> >> $ qemu-system-x86_64 \
> >>  
> >>  -chardev socket,id=chr0,path=/tmp/sock0 \
> >>  -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
> >>  -device virtio-net-pci,netdev=net0
> >>
> >> Signed-off-by: Tetsuya Mukawa 
> > With this PMD in place, is there any need to keep the existing
> > vhost library around as a separate entity? Can the existing
> > library be
>  subsumed/converted into
> > a standard PMD?
> >
> > /Bruce
>  Hi Bruce,
> 
>  I concern about whether the PMD has all features of librte_vhost,
>  because librte_vhost provides more features and freedom than ethdev
>  API provides.
>  In some cases, user needs to choose limited implementation without
>  librte_vhost.
>  I am going to eliminate such cases while implementing the PMD.
>  But I don't have strong belief that we can remove librte_vhost now.
> 
>  So how about keeping current separation in next DPDK?
>  I guess people will try to replace librte_vhost to vhost PMD,
>  because apparently using ethdev APIs will be useful in many cases.
>  And we will get feedbacks like "vhost PMD needs to support like this
> usage".
>  (Or we will not have feedbacks, but it's also OK.) Then, we will be
>  able to merge librte_vhost and vhost PMD.
> >>> I agree with the above. One the concerns I had when reviewing the
> patch was that the PMD removes some freedom that is available with the
> library. Eg. Ability to implement the new_device and destroy_device
> callbacks. If using the PMD you are constrained to the implementations of
> these in the PMD driver, but if using librte_vhost, you can implement your
> own with whatever functionality you like - a good example of this can be
> seen in the vhost sample app.
> >>> On the other hand, the PMD is useful in that it removes a lot of
> complexity for the user and may work for some more general use cases. So I
> would be in favour of having both options available too.
> >>>
> >>> Ciara
> >>>
> >> Thanks.
> >> However, just because the libraries are merged does not mean that you
> >> need be limited by PMD functionality. Many PMDs provide additional
> >> library-specific functions over and above their PMD capabilities. The
> >> bonded PMD is a good example here, as it has a whole set of extra
> >> functions to create and manipulate bonded devices - things that are
> >> obviously not part of the general ethdev API. Other vPMDs similarly
> include functions to allow them to be created on the fly too.
> >>
> >> regards,
> >> /Bruce
> >
> > Hi Bruce,
> >
> > I appreciate for showing a good example. I haven't noticed the PMD.
> > I will check the bonding PMD, and try to remove librte_vhost without
> > losing freedom and features of the library.
> 
> Hi,
> 
> Just a gentle reminder - if you consider removing (even if by just
> replacing/renaming) an entire library, it needs to happen the ABI
> deprecation process.
> 
> It seems obvious enough. But for all the ABI policing here, somehow we all
> failed to notice the two compatibility breaking rename-elephants in the
> room during 2.1 development:
> - libintel_dpdk was renamed to libdpdk
> - librte_pmd_virtio_uio was renamed to librte_pmd_virtio
> 
> Of course these cases are easy to work around with symlinks, and are
> unrelated to the matter at hand. Just wanting to make sure such things
> dont happen again.
> 
>   - Panu -

Still doesn't hurt to remind us, Panu! Thanks. :-)

[dpdk-dev] [PATCH 8/8] librte_table: modify release notes and deprecation notice

2015-10-19 Thread Thomas Monjalon

2015-10-12 16:30, Mcnamara, John:
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> > If a v2 is needed, please squash the release notes changes with the API
> > changes (patches 1 and 2).
> 
> Hi Thomas,
> 
> Could you clarify this? In general shouldn't we keep all 
> lib/drivers/examples/doc changes in separate patches in the patchset?

Why should we?
When reviewing a patch, having the doc atomically updated with the code change
will help to understand it.

[dpdk-dev] [PATCH v2 6/8] example/ip_pipeline: add parse_hex_string for internal use

2015-10-19 Thread Thomas Monjalon

2015-10-13 14:57, Jasvinder Singh:
> From: Fan Zhang 
> +static uint32_t
> +get_hex_val(char c)
> +{
> + switch (c) {
> + case '0':
> + case '1':
> + case '2':
> + case '3':
> + case '4':
> + case '5':
> + case '6':
> + case '7':
> + case '8':
> + case '9':
> + return c - '0';
> + case 'A':
> + case 'B':
> + case 'C':
> + case 'D':
> + case 'E':
> + case 'F':
> + return c - 'A' + 10;
> + case 'a':
> + case 'b':
> + case 'c':
> + case 'd':
> + case 'e':
> + case 'f':
> + return c - 'a' + 10;
> + default:
> + return 0;
> + }
> +}
> +
> +int
> +parse_hex_string(char *src, uint8_t *dst, uint32_t *size)
> +{
> + char *c;
> + uint32_t len, i;
> +
> + /* Check input parameters */
> + if ((src == NULL) ||
> + (dst == NULL) ||
> + (size == NULL) ||
> + (*size == 0))
> + return -1;
> +
> + len = strlen(src);
> + if (((len & 3) != 0) ||
> + (len > (*size) * 2))
> + return -1;
> + *size = len / 2;
> +
> + for (c = src; *c != 0; c++) {
> + if *c) >= '0') && ((*c) <= '9')) ||
> + (((*c) >= 'A') && ((*c) <= 'F')) ||
> + (((*c) >= 'a') && ((*c) <= 'f')))
> + continue;
> +
> + return -1;
> + }
> +
> + /* Convert chars to bytes */
> + for (i = 0; i < *size; i++)
> + dst[i] = get_hex_val(src[2 * i]) * 16 +
> + get_hex_val(src[2 * i + 1]);
> +
> + return 0;
> +}

Why not use strtol()?

[dpdk-dev] DPDK User Space: Session onUseability and Ease of Use

2015-10-19 Thread Simon Kågström

On 2015-10-13 18:36, Mcnamara, John wrote:
> 
> * More sample apps
> 
>   - Some more examples of using secondary processes.

One thing I would appreciate is more basic samples for the more advanced
functionality. For example, the qos_sched example is (in my opimion)
quite hard to follow since it has both an integrated command line and a
config file.

I think it would be better to split the example to multiple smaller
samples, which uses static setups to more clearly illustrate some
particular functionality of the QoS framework.

// Simon

[dpdk-dev] [PATCH v2 7/8] example/ip_pipeline/pipeline: update flow_classification pipeline

2015-10-19 Thread Thomas Monjalon

2015-10-13 14:57, Jasvinder Singh:
> From: Fan Zhang 
> + table_params.ops =
> + &rte_table_hash_key16_ext_dosig_ops;
> + }

It doesn't build with shared library.
pipeline_flow_classification_be.c:(.text+0xa84): undefined reference to 
`rte_table_hash_key16_ext_dosig_ops'

Some functions are missing in lib/librte_table/rte_table_version.map

[dpdk-dev] [PATCH v4 6/7] mk: Add rule for installing sdk files

2015-10-19 Thread Panu Matilainen

On 10/16/2015 10:31 PM, Olivier MATZ wrote:
> Hi Panu,
>
> On 10/05/2015 10:20 PM, Mario Carrillo wrote:
>> Add hierarchy-file support to the DPDK makefiles, scripts,
>> examples, tools, config files and headers.
>>
>> When invoking "make install-sdk" makefiles, scripts,
>> examples, tools, config files will be installed in:
>> $(DESTDIR)/$(SDK_DIR)
>> and headers will be installed in:
>> $(DESTDIR)/$(INCLUDE_DIR)
>>
>> Where SDK_DIR=/usr/share/dpdk and INCLUDE_DIR=/usr/include/dpdk
>> by default.
>>
>> You can overrite SDK_DIR and INCLUDE_DIR vars.
>> This hierarchy is based on:
>> http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
>>
>> Signed-off-by: Mario Carrillo 
>
> I don't know if it's feasible, but I think it would be great here
> to be able to install a SDK that is usable to build external
> applications.
>
> I mean, doing something like that:
>
>make install-sdk DESTDIR=/tmp/sdk
>cd /path/to/examples/helloworld
>make RTE_SDK=/tmp/sdk/usr/share/dpdk
>
> Else, what is the purpose of installing the sdk?

Its possible (been there), it just needs some additional symlinks and 
such, at least for lib/ and include/. However at that point the contents 
become arch-dependent (due to the lib symlink) meaning it cannot go into 
/usr/share.

- Panu -

> Regards,
> Olivier
>

[dpdk-dev] Inconsistent statistics counters for pmd_i40e

2015-10-19 Thread Arnon Warshavsky

Hi Eimear,

I just experienced the same problem with firmware versions 4.23 and 4.33
(dpdk 2.0). Did not get to try the latest which is 4.5.
Looking at the code, I don't see that this counter is being read any
differently than its peer counters and I suspect the nic itself.
Can you tell which firmware version you were using?

thanks
/Arnon

On Mon, Oct 19, 2015 at 2:43 PM, Eimear Morrissey <
eimear.morrissey at ie.ibm.com> wrote:

>
>
> Hi,
>
> I'm having issues measuring packets dropped at the NIC in both the 2.0.0
> and 2.1.0 versions of DPDK on an X710 Intel NIC.
>
> In dpdk-2.0.0
> Using rte_eth_xstats the rx_packets and rx_bytes counters increase as
> expected, however rx_missed_errors is always 0 even if a sleep statement is
> added between calls to rte_eth_rx_burst. However changing the coremask so
> the application is running on a different socket than the card will cause
> rx_missed_errors to increment for a limited amount of time and then stop.
> Using rte_eth_stats, ipackets is incremented on packet receipt but the
> q_ipackets and q_errors arrays remain zero. Even crossing sockets seems to
> have no effect on q_errors.
>
> In dpdk-2.1.0 the behaviour is the same as above, except that the number of
> fields returned by rte_eth_xstats_get is reduced (no rx_missed errors at
> all) so running on a different socket no longer has any noticeable effect
> on the stats.
>
> My understanding from the API manual is that the rte_eth_stats q_errors
> array should count the packets missed because software isn't polling fast
> enough, but that doesn't seem to be the case? Is there a standard DPDK way
> to check this? The application is a forwarding one so there's no other way
> to estimate drop except through NIC rx.
>
> Thanks,
> Eimear
>



-- 

*Arnon Warshavsky*
*Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon at qwilt.com
*

[dpdk-dev] [PATCH v3 1/1] ip_pipeline: added dynamic pipeline reconfiguration

2015-10-19 Thread Piotr Azarewicz

Up till now pipeline was bound to thread selected in the initial config.
This patch allows binding pipeline to other threads at runtime using CLI
commands.

v2 changes:
- deleted debug printfs

v3 changes:
- add timer for thread message request
- fix bug that the new functionality can't work
- fix leaking memory
- cleaning up

Signed-off-by: Maciej Gajdzica 
Signed-off-by: Piotr Azarewicz 

Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/Makefile  |1 +
 examples/ip_pipeline/app.h |   12 +
 examples/ip_pipeline/config_parse.c|2 +-
 examples/ip_pipeline/init.c|   24 ++
 examples/ip_pipeline/pipeline.h|6 +
 examples/ip_pipeline/pipeline/pipeline_common_fe.h |3 +
 examples/ip_pipeline/thread.c  |  157 -
 examples/ip_pipeline/thread.h  |   84 +
 examples/ip_pipeline/thread_fe.c   |  344 
 examples/ip_pipeline/thread_fe.h   |   95 ++
 10 files changed, 720 insertions(+), 8 deletions(-)
 create mode 100644 examples/ip_pipeline/thread.h
 create mode 100644 examples/ip_pipeline/thread_fe.c
 create mode 100644 examples/ip_pipeline/thread_fe.h

diff --git a/examples/ip_pipeline/Makefile b/examples/ip_pipeline/Makefile
index f3ff1ec..c8e80b5 100644
--- a/examples/ip_pipeline/Makefile
+++ b/examples/ip_pipeline/Makefile
@@ -54,6 +54,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += config_parse_tm.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += config_check.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += init.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread.c
+SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread_fe.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += cpu_core_map.c

 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_common_be.c
diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index 521e3a0..3a57956 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -220,9 +220,11 @@ struct app_pipeline_data {
void *be;
void *fe;
uint64_t timer_period;
+   uint32_t enabled;
 };

 struct app_thread_pipeline_data {
+   uint32_t pipeline_id;
void *be;
pipeline_be_op_run f_run;
pipeline_be_op_timer f_timer;
@@ -234,6 +236,10 @@ struct app_thread_pipeline_data {
 #define APP_MAX_THREAD_PIPELINES 16
 #endif

+#ifndef APP_THREAD_TIMER_PERIOD
+#define APP_THREAD_TIMER_PERIOD  1
+#endif
+
 struct app_thread_data {
struct app_thread_pipeline_data regular[APP_MAX_THREAD_PIPELINES];
struct app_thread_pipeline_data custom[APP_MAX_THREAD_PIPELINES];
@@ -241,7 +247,13 @@ struct app_thread_data {
uint32_t n_regular;
uint32_t n_custom;

+   uint64_t timer_period;
+   uint64_t thread_req_deadline;
+
uint64_t deadline;
+
+   struct rte_ring *msgq_in;
+   struct rte_ring *msgq_out;
 };

 struct app_eal_params {
diff --git a/examples/ip_pipeline/config_parse.c 
b/examples/ip_pipeline/config_parse.c
index c9b78f9..d2aaadf 100644
--- a/examples/ip_pipeline/config_parse.c
+++ b/examples/ip_pipeline/config_parse.c
@@ -362,7 +362,7 @@ parser_read_uint32(uint32_t *value, const char *p)
return 0;
 }

-static int
+int
 parse_pipeline_core(uint32_t *socket,
uint32_t *core,
uint32_t *ht,
diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
index 3f9c68d..8c63879 100644
--- a/examples/ip_pipeline/init.c
+++ b/examples/ip_pipeline/init.c
@@ -50,6 +50,7 @@
 #include "pipeline_firewall.h"
 #include "pipeline_flow_classification.h"
 #include "pipeline_routing.h"
+#include "thread_fe.h"

 #define APP_NAME_SIZE  32

@@ -1253,6 +1254,25 @@ app_init_threads(struct app_params *app)

t = &app->thread_data[lcore_id];

+   t->timer_period = (rte_get_tsc_hz() * APP_THREAD_TIMER_PERIOD) 
/ 1000;
+   t->thread_req_deadline = time + t->timer_period;
+
+   t->msgq_in = app_thread_msgq_in_get(app,
+   params->socket_id,
+   params->core_id,
+   params->hyper_th_id);
+   if (t->msgq_in == NULL)
+   rte_panic("Init error: Cannot find MSGQ_IN for thread 
%" PRId32,
+   lcore_id);
+
+   t->msgq_out = app_thread_msgq_out_get(app,
+   params->socket_id,
+   params->core_id,
+   params->hyper_th_id);
+   if (t->msgq_out == NULL)
+   rte_panic("Init error: Cannot find MSGQ_OUT for thread 
%" PRId32,
+   lcore_id);
+
ptype = app_pipeline_type_find(app, params->type);
if (ptype == NULL)
rte_panic("Init error: Unknown pipeline "
@@ -1262,12 +1282,15 @@ app_init_thr

[dpdk-dev] [PATCH 4/4] librte_port: modify release notes and deprecation notice

2015-10-19 Thread Thomas Monjalon

2015-09-11 14:35, roy.fan.zhang at intel.com:
> @@ -74,7 +76,7 @@ The libraries prepended with a plus sign were incremented 
> in this version.
>   librte_pipeline.so.1
>   librte_pmd_bond.so.1
> + librte_pmd_ring.so.2
> - librte_port.so.1
> + librte_port.so.2

The updated library must be prepended with a plus sign.
Will be fixed before squashing with path 1 and pushing.

[dpdk-dev] [PATCH] igb: fix IEEE1588 frame identification in i210

2015-10-19 Thread Mcnamara, John

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara
> Sent: Monday, October 19, 2015 2:20 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] igb: fix IEEE1588 frame identification in i210
> 
> Fixed issue where the flag PKT_RX_IEEE1588_PTP was not being set in Intel
> I210 NIC, as EtherType in RX descriptor is in bits 8:10 of Packet Type and
> not in the default bits 0:2.
> 
> Fixes known issue "IEEE1588 support possibly not working with an Intel
> Ethernet Controller I210 NIC"

Acked-by: John McNamara

[dpdk-dev] Inconsistent statistics counters for pmd_i40e

2015-10-19 Thread Eimear Morrissey

Arnon Warshavsky  wrote on 10/19/2015 03:01:46 PM:

> From: Arnon Warshavsky 
> To: Eimear Morrissey/Ireland/IBM at IBMIE
> Cc: dev at dpdk.org
> Date: 10/19/2015 03:01 PM
> Subject: Re: [dpdk-dev] Inconsistent statistics counters for pmd_i40e
>
> Hi Eimear,
>
> I just experienced the same problem with firmware versions 4.23 and
> 4.33 (dpdk 2.0). Did not get to try the latest which is 4.5.
> Looking at the code, I don't see that this counter is being read any
> differently than its peer counters and I suspect the nic itself.
> Can you tell which firmware version you were using?
>
> thanks
> /Arnon
>
> On Mon, Oct 19, 2015 at 2:43 PM, Eimear Morrissey
 > wrote:
>
>
> Hi,
>
> I'm having issues measuring packets dropped at the NIC in both the 2.0.0
> and 2.1.0 versions of DPDK on an X710 Intel NIC.
>
> In dpdk-2.0.0
> Using rte_eth_xstats the rx_packets and rx_bytes counters increase as
> expected, however rx_missed_errors is always 0 even if a sleep statement
is
> added between calls to rte_eth_rx_burst. However changing the coremask so
> the application is running on a different socket than the card will cause
> rx_missed_errors to increment for a limited amount of time and then stop.
> Using rte_eth_stats, ipackets is incremented on packet receipt but the
> q_ipackets and q_errors arrays remain zero. Even crossing sockets seems
to
> have no effect on q_errors.
>
> In dpdk-2.1.0 the behaviour is the same as above, except that the number
of
> fields returned by rte_eth_xstats_get is reduced (no rx_missed errors at
> all) so running on a different socket no longer has any noticeable effect
> on the stats.
>
> My understanding from the API manual is that the rte_eth_stats q_errors
> array should count the packets missed because software isn't polling fast
> enough, but that doesn't seem to be the case? Is there a standard DPDK
way
> to check this? The application is a forwarding one so there's no other
way
> to estimate drop except through NIC rx.
>
> Thanks,
> Eimear
>
>
>
> --
>
> Arnon Warshavsky
> Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon at qwilt.com

Hi Arnon,

The firmware version I'm using is 4.26. Where do you see the latest is 4.5
- I can't find anything obvious in the download centre?

Regards,
Eimear

[dpdk-dev] Inconsistent statistics counters for pmd_i40e

2015-10-19 Thread Arnon Warshavsky

Hi Eimear

This is the link I have.
https://downloadcenter.intel.com/download/24769

I guess that the version seen in the web page comes from a different
parallel universe.
You should see the actual fw version inside the zip file.

Thanks
/Arnon


On Mon, Oct 19, 2015 at 5:30 PM, Eimear Morrissey <
eimear.morrissey at ie.ibm.com> wrote:

> Arnon Warshavsky  wrote on 10/19/2015 03:01:46 PM:
>
> > From: Arnon Warshavsky 
> > To: Eimear Morrissey/Ireland/IBM at IBMIE
> > Cc: dev at dpdk.org
> > Date: 10/19/2015 03:01 PM
> > Subject: Re: [dpdk-dev] Inconsistent statistics counters for pmd_i40e
>
> >
> > Hi Eimear,
> >
> > I just experienced the same problem with firmware versions 4.23 and
> > 4.33 (dpdk 2.0). Did not get to try the latest which is 4.5.
> > Looking at the code, I don't see that this counter is being read any
> > differently than its peer counters and I suspect the nic itself.
> > Can you tell which firmware version you were using?
> >
> > thanks
> > /Arnon
> >
> > On Mon, Oct 19, 2015 at 2:43 PM, Eimear Morrissey <
> eimear.morrissey at ie.ibm.com
> > > wrote:
> >
> >
> > Hi,
> >
> > I'm having issues measuring packets dropped at the NIC in both the 2.0.0
> > and 2.1.0 versions of DPDK on an X710 Intel NIC.
> >
> > In dpdk-2.0.0
> > Using rte_eth_xstats the rx_packets and rx_bytes counters increase as
> > expected, however rx_missed_errors is always 0 even if a sleep statement
> is
> > added between calls to rte_eth_rx_burst. However changing the coremask so
> > the application is running on a different socket than the card will cause
> > rx_missed_errors to increment for a limited amount of time and then stop.
> > Using rte_eth_stats, ipackets is incremented on packet receipt but the
> > q_ipackets and q_errors arrays remain zero. Even crossing sockets seems
> to
> > have no effect on q_errors.
> >
> > In dpdk-2.1.0 the behaviour is the same as above, except that the number
> of
> > fields returned by rte_eth_xstats_get is reduced (no rx_missed errors at
> > all) so running on a different socket no longer has any noticeable effect
> > on the stats.
> >
> > My understanding from the API manual is that the rte_eth_stats q_errors
> > array should count the packets missed because software isn't polling fast
> > enough, but that doesn't seem to be the case? Is there a standard DPDK
> way
> > to check this? The application is a forwarding one so there's no other
> way
> > to estimate drop except through NIC rx.
> >
> > Thanks,
> > Eimear
> >
> >
> >
> > --
> >
> > Arnon Warshavsky
> > Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 |
> arnon at qwilt.com
>
> Hi Arnon,
>
> The firmware version I'm using is 4.26. Where do you see the latest is 4.5
> - I can't find anything obvious in the download centre?
>
> Regards,
> Eimear
>
>


-- 

*Arnon Warshavsky*
*Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon at qwilt.com
*

[dpdk-dev] I have a problem in setting up DPDK 2.1.0 in Fedora OS release 20 (Heisenbug). I cannot r

2015-10-19 Thread 최익성

 Dear DPDK experts.

Thank you very much for your best great efforts and precious answers.

As you suggested, I reinstalled newer version of Fedora 22.

I succeeded to communicate test-pmd in Fedora 22 (linux kernel version 
4.2.3-200.fc22.x86_64, DPDK 2.1.0).

The steps are as follows.


sudo yum install gcc
sudo yum install kernel-devel
sudo yum install libpcap-devel 

And then I patched the source codes as in  
http://dpdk.org/dev/patchwork/patch/6800/   Patchwork [dpdk-dev] kni: fix igb 
build with kernel 4.2.

And then, I followed the quick start instructions from 
http://dpdk.org/doc/quick-start .

The test-pmd example seems to work.

Thank you very much.



I have another problem.

As in the following test-pmd log scripts, most of received packets are RX-error.

The same errors are occurred in the Ubuntu OS computer (Ubuntu 14.04.1 LTS, 
kernel version : 3.13.0-34-generic, DPDK 2.1.0).  

If you have any advices or answers for the above problem, I will really 
appreciate.

Thank you very much.

Sincerely Yours,

Ick-Sung Choi.


--
test-pmd scripts:


$ sudo ./build/app/testpmd -c 0xFF -n 4 -- -i --portmask=0x3 --nb-cores=2


EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 8 on socket 0
EAL: Detected lcore 6 as core 9 on socket 0
EAL: Detected lcore 7 as core 10 on socket 0
EAL: Detected lcore 8 as core 11 on socket 0
EAL: Detected lcore 9 as core 12 on socket 0
EAL: Detected lcore 10 as core 0 on socket 1
EAL: Detected lcore 11 as core 1 on socket 1
EAL: Detected lcore 12 as core 2 on socket 1
EAL: Detected lcore 13 as core 3 on socket 1
EAL: Detected lcore 14 as core 4 on socket 1
EAL: Detected lcore 15 as core 8 on socket 1
EAL: Detected lcore 16 as core 9 on socket 1
EAL: Detected lcore 17 as core 10 on socket 1
EAL: Detected lcore 18 as core 11 on socket 1
EAL: Detected lcore 19 as core 12 on socket 1
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 20 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0x5400 bytes
EAL: Virtual area found at 0x7f678a00 (size = 0x5400)
EAL: Ask a virtual area of 0x60 bytes
EAL: Virtual area found at 0x7f678980 (size = 0x60)
EAL: Ask a virtual area of 0x60 bytes
EAL: Virtual area found at 0x7f678900 (size = 0x60)
EAL: Ask a virtual area of 0x80 bytes
EAL: Virtual area found at 0x7f678860 (size = 0x80)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f678820 (size = 0x20)
EAL: Ask a virtual area of 0x9c40 bytes
EAL: Virtual area found at 0x7f66ebc0 (size = 0x9c40)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f66eb80 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f66eb40 (size = 0x20)
EAL: Ask a virtual area of 0x1c0 bytes
EAL: Virtual area found at 0x7f66e960 (size = 0x1c0)
EAL: Ask a virtual area of 0x120 bytes
EAL: Virtual area found at 0x7f66e820 (size = 0x120)
EAL: Ask a virtual area of 0x40 bytes
EAL: Virtual area found at 0x7f66e7c0 (size = 0x40)
EAL: Ask a virtual area of 0x40 bytes
EAL: Virtual area found at 0x7f66e760 (size = 0x40)
EAL: Ask a virtual area of 0x380 bytes
EAL: Virtual area found at 0x7f66e3c0 (size = 0x380)
EAL: Ask a virtual area of 0x500 bytes
EAL: Virtual area found at 0x7f66dea0 (size = 0x500)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f66de60 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f66de20 (size = 0x20)
EAL: Ask a virtual area of 0xc0 bytes
EAL: Virtual area found at 0x7f66dd40 (size = 0xc0)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f66dd00 (size = 0x20)
EAL: Ask a virtual area of 0xa0 bytes
EAL: Virtual area found at 0x7f66dc40 (size = 0xa0)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f66dc00 (size = 0x20)
EAL: Ask a virtual area of 0x40 bytes
EAL: Virtual area found at 0x7f66dba0 (size = 0x40)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f66db60 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f66db20 (size = 0x20)
EAL: Ask a virtual area of 0xf400 bytes
EAL: Virtual area found at 0x7f65e700 (size = 0xf400)
EAL: Ask a virtual area of 0x40 bytes
EAL: Virtual area found at 0x7f65e6a0 (size = 0x40)
EAL: Ask a virtual area of 0x40 bytes
EAL: Virtual area found at 0x7f65e640 (size = 0x40)
EAL: Ask a virtual area

[dpdk-dev] [PATCH 0/4]librte_port: modify macros to access packet meta-data

2015-10-19 Thread Thomas Monjalon

2015-09-11 14:35, roy.fan.zhang at intel.com:
>  app/test-pipeline/main.h  |  2 ++
>  app/test-pipeline/pipeline_hash.c | 34 ++-
>  app/test-pipeline/pipeline_lpm.c  |  2 +-
>  app/test-pipeline/pipeline_lpm_ipv6.c |  2 +-
>  app/test/test_table.h |  8 +--
>  app/test/test_table_combined.c| 28 +++---
>  app/test/test_table_pipeline.c|  3 ++-
>  app/test/test_table_tables.c  | 44 
> ++-
>  doc/guides/rel_notes/deprecation.rst  |  5 
>  doc/guides/rel_notes/release_2_2.rst  |  4 +++-
>  lib/librte_port/Makefile  |  2 +-
>  lib/librte_port/rte_port.h|  2 +-

examples/ip_pipeline does not need to be updated?

[dpdk-dev] [PATCH 0/4]librte_port: modify macros to access packet meta-data

2015-10-19 Thread Thomas Monjalon

> > From: Fan Zhang 
> > 
> > This patchset links to ABI change announced for librte_port. Macros to
> > access the packet meta-data stored within the packet buffer has been
> > adjusted to cover the packet mbuf structure.
> 
> Acked-by: Cristian Dumitrescu 

Applied in 1 commit (such change must be atomic), thanks

[dpdk-dev] Question about unsupported transceivers

2015-10-19 Thread Alexander Duyck

On 10/18/2015 06:06 PM, Alex Forster wrote:
> On 10/15/15, 3:53 PM, "Alexander Duyck"  wrote:
>
>
 It looks like you are probably seeing interfaces be unbound and then
 rebound.  As such you are likely pushing things outside of the array
 boundary.  One solution might just be to at more ",1"s if you are only
 going to be doing this kind of thing at boot up.  The upper limit for
 the array is 32 entries so as long as you only are setting this up once
 you could probably get away with that.

 An alternative would be to modify the definition of the parameter in
 ixgbe_param.c.  If you look through the file you should fine several
 likes like below:
struct ixgbe_option opt = {
.type = enable_option,
.name = "allow_unsupported_sfp",
.err  = "defaulting to Disabled",
.def  = OPTION_DISABLED
};

 If you modify the .def value to "OPTION_ENABLED", and then rebuild and
 reinstall your driver you should be able have it install without any
 issues.

 - Alex

>>> Yeah, I've had roughly the same thought process since you mentioned the
>>> args array. My first idea was "maybe the driver can't fit all of my 1's"
>>> but I saw it was defined at 32. Then I decided to just patch the whole
>>> enable_unsupported_sfp option out
>>> https://gist.github.com/AlexForster/112fd822704caf804849 but I'm still
>>> failing.
>>>
>>> I've been digging a bit, and I'm failing here in ixgbe_main.c...
>>>
>>> /* reset_hw fills in the perm_addr as well */
>>> hw->phy.reset_if_overtemp = true;
>>> err = hw->mac.ops.reset_hw(hw);
>>> hw->phy.reset_if_overtemp = false;
>>> if (err == IXGBE_ERR_SFP_NOT_PRESENT) {
>>> err = IXGBE_SUCCESS;
>>> } else if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) {
>>> e_dev_err("failed to load because an unsupported SFP+ or QSFP "
>>>   "module type was detected.\n");
>>> e_dev_err("Reload the driver after installing a supported "
>>>   "module.\n");
>>> goto err_sw_init;
>>> } else if (err) {
>>> e_dev_err("HW Init failed: %d\n", err);
>>> goto err_sw_init;
>>> }
>>>
>>>
>>> I've attempted a hand-stacktrace and came up with the following...
>>>
>>> ixgbe_82599.c at 1016
>>>* ixgbe_reset_hw_82599() is defined
>>>* calls phy->ops.init() which potentially returns
>>> IXGBE_ERR_SFP_NOT_SUPPORTED
>>>
>>> ixgbe_82599.c at 102
>>>* ixgbe_init_phy_ops_82599() is defined
>>>* IXGBE_ERR_SFP_NOT_SUPPORTED is returned after calling
>>> phy->ops.identify()
>>>
>>> ixgbe_82599.c at 2085
>>>* ixgbe_identify_phy_82599() is defined
>>>* calls ixgbe_identify_module_generic()
>>>
>>> ixgbe_phy.c at 1281
>>>* ixgbe_identify_module_generic() is defined
>>>* calls ixgbe_identify_qsfp_module_generic()
>>>
>>> ixgbe_phy.c at 1663
>>>* ixgbe_identify_qsfp_module_generic() is defined
>>>* We fail somewhere before the ending call to ixgbe_get_device_caps()
>>> which does take allow_unsupported_sfp into account
>>>
>>>* Possibility: hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_IDENTIFIER,
>>> &identifier) != IXGBE_SFF_IDENTIFIER_QSFP_PLUS
>>>* Possibility: active_cable != true
>>>
>>> And then I'm over my head. Should I assume from here that the most
>>> likely
>>> explanation is a bad transceiver or bad fiber?
>>>
>>> Alex Forster
>>
>> Are you able to swap transceiver or fiber between the 4 ports that work
>> and the 4 that don't?  If you could do that then you should be able to
>> tell if the issue is following the NIC ports, or if it is an issue with
>> the external connections.  If it is issue is following the transceiver
>> or fiber then it is probably what is causing the issue.
>>
>> The other thing you could try doing is adding a printk to the spots
>> where the status is set to SFP_NOT_SUPPORTED so that you could figure
>> out exactly which spot is triggering the rejection of the module.
>>
>> - Alex
>
> I had remote hands swap fibers on the QSFP side and the issue moved to the
> first card, so I'm going to have the fibers cleaned and tested. This
> appears to be my issue.
>
> I'd like to submit a patch for ixgbe_identify_qsfp_module_generic() to
> print more helpful errors in the two cases mentioned above, so that
> hopefully nobody ever has to deal with this again. Would I be wasting my
> time, or does something like this have a likelihood of being accepted?
>
> Thank you for all of your help! I wouldn't have figured this out nearly as
> quickly without it.
>
> Alex Forster

I suspect there would be some value to such a patch, just make sure to 
explain the reason for needing it in the patch description.  My advice 
would be to put such a patch together against what is in Jeff Kirsher's 
next queue 
(https://git.kernel.org/cgit/linux/kernel/git/jkirsher/next-queue.git/) 
and then base your patches off of that.  The email lists for submitting 
patches to is intel

[dpdk-dev] [PATCH v2 2/3] port: fix ras ring ports

2015-10-19 Thread Thomas Monjalon

2015-09-24 11:55, Piotr Azarewicz:
> Bug fixes for ring ports with IPv4/IPv6 reassembly support.
> Previous implementation can't work properly due to incorrect choosing
> process function.
> Also, assuming that, when processing ip packet, ip header is know we can
> set l3_len parameter here.
> 
> Signed-off-by: Piotr Azarewicz 

Please add a "Fixes:" tag.
Thanks

[dpdk-dev] [PATCH v2 1/3] port: add mp/mc ring ports

2015-10-19 Thread Thomas Monjalon

2015-09-24 11:55, Piotr Azarewicz:
> ring_multi_reader input port (on top of multi consumer rte_ring)
> ring_multi_writer output port (on top of multi producer rte_ring)
> 
> Signed-off-by: Piotr Azarewicz 
> ---
>  lib/librte_port/rte_port_ring.c |  311 
> +++
>  lib/librte_port/rte_port_ring.h |   35 -

Piotr,
You forgot the shared library case: the new functions are missing in the .map.

Cristian,
As a maintainer, please check compilation of shared library.

See you for the v3 ;)
Thanks

[dpdk-dev] [PATCH v3 3/5] test_table: added check for bulk add/delete to acl table unit test

2015-10-19 Thread Thomas Monjalon

Hi,
Some changes are needed. Please send a v4.

2015-10-13 09:34, Marcin Kerlin:
> Added to acl table unit test check for bulk add and bulk delete.
> 
> Signed-off-by: Maciej Gajdzica 
> Acked-by: Cristian Dumitrescu 

ERROR:BAD_SIGN_OFF: Unrecognized email address

Build fails:
test_table_acl.c:(.text+0x24e7): undefined reference to 
`rte_pipeline_table_entry_add_bulk'
test_table_acl.c:(.text+0x288a): undefined reference to 
`rte_pipeline_table_entry_delete_bulk'

Probably missing in the .map file?

[dpdk-dev] [PATCH v6 0/4] bonding corrections and additions

2015-10-19 Thread Eric Kinzie

This patchset makes a couple of small corrections to the bonding driver
and introduces the ability to use an external state machine for mode
4 operation.

Changes in v2:
  . eliminate external_sm field in 802.3ad configuration 
(rte_eth_bond_8023ad_conf).
  . stop bonding device before changing the periodic callback function.
start again if needed.
  . remove unnecessary calls to valid_bonded_port_id().
  . do not check for NULL tx_ring.
  . return error in rte_eth_bond_8023ad_ext_slowtx() if packet is not LACP.
  . remove check for external sm configuration in periodic callback
  . check for valid LACPDU in test application's rx callback
  . add "Fixes:" tags

Changes in v3:
  . update rte_eth_bond_version.map

Changes in v4:
  . version functions that modify slowrx_cb

Changes in v5:
  . remove "__vsym" from source files and use updated symbol version macros

Changes in v6:
  . updated symbols have version 2.2
  . specify versioned local symbol names in map so they don't become exported

Eric Kinzie (4):
  bond mode 4: copy entire config structure
  bond mode 4: do not ignore multicast
  bond mode 4: allow external state machine
  bond mode 4: tests for external state machine

 app/test/test_link_bonding_mode4.c|  217 ++-
 drivers/net/bonding/Makefile  |2 +-
 drivers/net/bonding/rte_eth_bond_8023ad.c |  241 -
 drivers/net/bonding/rte_eth_bond_8023ad.h |   44 
 drivers/net/bonding/rte_eth_bond_8023ad_private.h |   28 +++
 drivers/net/bonding/rte_eth_bond_pmd.c|1 +
 drivers/net/bonding/rte_eth_bond_version.map  |   10 +
 7 files changed, 529 insertions(+), 14 deletions(-)

-- 
1.7.10.4

[dpdk-dev] [PATCH v6 1/4] bond mode 4: copy entire config structure

2015-10-19 Thread Eric Kinzie

From: Eric Kinzie 

  Copy all needed fields from the mode8023ad_private structure in
  bond_mode_8023ad_conf_get().  This help ensure that a subsequent call
  to rte_eth_bond_8023ad_setup() is not passed uninitialized data that
  would result in either incorrect behavior or a failed sanity check.

Fixes: 46fb43683679 ("bond: add mode 4")

Signed-off-by: Eric Kinzie 
---
 drivers/net/bonding/rte_eth_bond_8023ad.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c 
b/drivers/net/bonding/rte_eth_bond_8023ad.c
index c0f0b99..08f679f 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -1013,6 +1013,7 @@ bond_mode_8023ad_conf_get(struct rte_eth_dev *dev,
conf->aggregate_wait_timeout_ms = mode4->aggregate_wait_timeout / 
ms_ticks;
conf->tx_period_ms = mode4->tx_period_timeout / ms_ticks;
conf->update_timeout_ms = mode4->update_timeout_us / 1000;
+   conf->rx_marker_period_ms = mode4->rx_marker_timeout / ms_ticks;
 }

 void
-- 
1.7.10.4

[dpdk-dev] [PATCH v6 2/4] bond mode 4: do not ignore multicast

2015-10-19 Thread Eric Kinzie

From: Eric Kinzie 

The bonding PMD in mode 4 puts all enslaved interfaces into promiscuous
mode in order to receive LACPDUs and must filter unwanted packets
after the traffic has been "collected".  Allow broadcast and multicast
through so that ARP and IPv6 neighbor discovery continue to work.

Fixes: 46fb43683679 ("bond: add mode 4")

Signed-off-by: Eric Kinzie 
---
 app/test/test_link_bonding_mode4.c |7 +--
 drivers/net/bonding/rte_eth_bond_pmd.c |1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/app/test/test_link_bonding_mode4.c 
b/app/test/test_link_bonding_mode4.c
index 460539d..d785393 100644
--- a/app/test/test_link_bonding_mode4.c
+++ b/app/test/test_link_bonding_mode4.c
@@ -749,8 +749,11 @@ test_mode4_rx(void)
rte_eth_macaddr_get(test_params.bonded_port_id, &bonded_mac);
ether_addr_copy(&bonded_mac, &dst_mac);

-   /* Assert that dst address is not bonding address */
-   dst_mac.addr_bytes[0]++;
+   /* Assert that dst address is not bonding address.  Do not set the
+* least significant bit of the zero byte as this would create a
+* multicast address.
+*/
+   dst_mac.addr_bytes[0] += 2;

/* First try with promiscuous mode enabled.
 * Add 2 packets to each slave. First with bonding MAC address, second 
with
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 5cc6372..733022b 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -170,6 +170,7 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf 
**bufs,
 * mode and packet address does not match. */
if (unlikely(hdr->ether_type == ether_type_slow_be ||
!collecting || (!promisc &&
+   !is_multicast_ether_addr(&hdr->d_addr) 
&&
!is_same_ether_addr(&bond_mac, 
&hdr->d_addr {

if (hdr->ether_type == ether_type_slow_be) {
-- 
1.7.10.4

[dpdk-dev] [PATCH v6 3/4] bond mode 4: allow external state machine

2015-10-19 Thread Eric Kinzie

From: Eric Kinzie 

  Provide functions to allow an external 802.3ad state machine to transmit
  and recieve LACPDUs and to set the collection/distribution flags on
  slave interfaces.

  Size of struct rte_eth_bond_8023ad_conf changed.  Increment LIBABIVER
  and version bond_mode_8023ad_setup and bond_mode_8023ad_conf_get
  functions.

Signed-off-by: Eric Kinzie 
---
 drivers/net/bonding/Makefile  |2 +-
 drivers/net/bonding/rte_eth_bond_8023ad.c |  240 -
 drivers/net/bonding/rte_eth_bond_8023ad.h |   44 
 drivers/net/bonding/rte_eth_bond_8023ad_private.h |   28 +++
 drivers/net/bonding/rte_eth_bond_version.map  |   10 +
 5 files changed, 321 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/Makefile b/drivers/net/bonding/Makefile
index dee0875..8b892ca 100644
--- a/drivers/net/bonding/Makefile
+++ b/drivers/net/bonding/Makefile
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)

 EXPORT_MAP := rte_eth_bond_version.map

-LIBABIVER := 1
+LIBABIVER := 2

 #
 # all source are stored in SRCS-y
diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c 
b/drivers/net/bonding/rte_eth_bond_8023ad.c
index 08f679f..0461fb5 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -39,9 +39,12 @@
 #include 
 #include 
 #include 
+#include 

 #include "rte_eth_bond_private.h"

+static void bond_mode_8023ad_ext_periodic_cb(void *arg);
+
 #ifdef RTE_LIBRTE_BOND_DEBUG_8023AD
 #define MODE4_DEBUG(fmt, ...) RTE_LOG(DEBUG, PMD, "%6u [Port %u: %s] " fmt, \
bond_dbg_get_time_diff_ms(), slave_id, \
@@ -999,7 +1002,28 @@ bond_mode_8023ad_mac_address_update(struct rte_eth_dev 
*bond_dev)
 }

 void
-bond_mode_8023ad_conf_get(struct rte_eth_dev *dev,
+bond_mode_8023ad_conf_get_v20(struct rte_eth_dev *dev,
+   struct rte_eth_bond_8023ad_conf *conf)
+{
+   struct bond_dev_private *internals = dev->data->dev_private;
+   struct mode8023ad_private *mode4 = &internals->mode4;
+   uint64_t ms_ticks = rte_get_tsc_hz() / 1000;
+
+   conf->fast_periodic_ms = mode4->fast_periodic_timeout / ms_ticks;
+   conf->slow_periodic_ms = mode4->slow_periodic_timeout / ms_ticks;
+   conf->short_timeout_ms = mode4->short_timeout / ms_ticks;
+   conf->long_timeout_ms = mode4->long_timeout / ms_ticks;
+   conf->aggregate_wait_timeout_ms = mode4->aggregate_wait_timeout / 
ms_ticks;
+   conf->tx_period_ms = mode4->tx_period_timeout / ms_ticks;
+   conf->update_timeout_ms = mode4->update_timeout_us / 1000;
+   conf->rx_marker_period_ms = mode4->rx_marker_timeout / ms_ticks;
+   conf->slowrx_cb = mode4->slowrx_cb;
+}
+
+VERSION_SYMBOL(bond_mode_8023ad_conf_get, _v20, 2.0);
+
+void
+bond_mode_8023ad_conf_get_v22(struct rte_eth_dev *dev,
struct rte_eth_bond_8023ad_conf *conf)
 {
struct bond_dev_private *internals = dev->data->dev_private;
@@ -1014,10 +1038,15 @@ bond_mode_8023ad_conf_get(struct rte_eth_dev *dev,
conf->tx_period_ms = mode4->tx_period_timeout / ms_ticks;
conf->update_timeout_ms = mode4->update_timeout_us / 1000;
conf->rx_marker_period_ms = mode4->rx_marker_timeout / ms_ticks;
+   conf->slowrx_cb = mode4->slowrx_cb;
 }

+BIND_DEFAULT_SYMBOL(bond_mode_8023ad_conf_get, _v22, 2.2);
+MAP_STATIC_SYMBOL(void bond_mode_8023ad_conf_get(struct rte_eth_dev *dev, 
struct rte_eth_bond_8023ad_conf *conf), \
+ bond_mode_8023ad_conf_get_v22);
+
 void
-bond_mode_8023ad_setup(struct rte_eth_dev *dev,
+bond_mode_8023ad_setup_v20(struct rte_eth_dev *dev,
struct rte_eth_bond_8023ad_conf *conf)
 {
struct rte_eth_bond_8023ad_conf def_conf;
@@ -1047,6 +1076,50 @@ bond_mode_8023ad_setup(struct rte_eth_dev *dev,
mode4->update_timeout_us = conf->update_timeout_ms * 1000;
 }

+VERSION_SYMBOL(bond_mode_8023ad_setup, _v20, 2.0);
+
+void
+bond_mode_8023ad_setup_v22(struct rte_eth_dev *dev,
+   struct rte_eth_bond_8023ad_conf *conf)
+{
+   struct rte_eth_bond_8023ad_conf def_conf;
+   struct bond_dev_private *internals = dev->data->dev_private;
+   struct mode8023ad_private *mode4 = &internals->mode4;
+   uint64_t ms_ticks = rte_get_tsc_hz() / 1000;
+
+   if (conf == NULL) {
+   conf = &def_conf;
+   conf->fast_periodic_ms = BOND_8023AD_FAST_PERIODIC_MS;
+   conf->slow_periodic_ms = BOND_8023AD_SLOW_PERIODIC_MS;
+   conf->short_timeout_ms = BOND_8023AD_SHORT_TIMEOUT_MS;
+   conf->long_timeout_ms = BOND_8023AD_LONG_TIMEOUT_MS;
+   conf->aggregate_wait_timeout_ms = 
BOND_8023AD_AGGREGATE_WAIT_TIMEOUT_MS;
+   conf->tx_period_ms = BOND_8023AD_TX_MACHINE_PERIOD_MS;
+   conf->rx_marker_period_ms = BOND_8023AD_RX_MARKER_PERIOD_MS;
+   conf->update_timeout_ms = BOND_MODE_8023AX_UPDATE_TIMEOUT_MS;
+   conf->slowrx_cb = NULL;
+   }
+
+

[dpdk-dev] [PATCH v6 4/4] bond mode 4: tests for external state machine

2015-10-19 Thread Eric Kinzie

From: Eric Kinzie 

  This adds test cases for exercising the external state machine API to
  the mode 4 autotest.

Signed-off-by: Eric Kinzie 
---
 app/test/test_link_bonding_mode4.c |  210 ++--
 1 file changed, 201 insertions(+), 9 deletions(-)

diff --git a/app/test/test_link_bonding_mode4.c 
b/app/test/test_link_bonding_mode4.c
index d785393..6a459cd 100644
--- a/app/test/test_link_bonding_mode4.c
+++ b/app/test/test_link_bonding_mode4.c
@@ -151,6 +151,8 @@ static struct rte_eth_conf default_pmd_conf = {
.lpbk_mode = 0,
 };

+static uint8_t lacpdu_rx_count[RTE_MAX_ETHPORTS] = {0, };
+
 #define FOR_EACH(_i, _item, _array, _size) \
for (_i = 0, _item = &_array[0]; _i < _size && (_item = &_array[_i]); 
_i++)

@@ -320,8 +322,26 @@ remove_slave(struct slave_conf *slave)
return 0;
 }

+static void
+lacp_recv_cb(uint8_t slave_id, struct rte_mbuf *lacp_pkt)
+{
+   struct ether_hdr *hdr;
+   struct slow_protocol_frame *slow_hdr;
+
+   RTE_VERIFY(lacp_pkt != NULL);
+
+   hdr = rte_pktmbuf_mtod(lacp_pkt, struct ether_hdr *);
+   RTE_VERIFY(hdr->ether_type == rte_cpu_to_be_16(ETHER_TYPE_SLOW));
+
+   slow_hdr = rte_pktmbuf_mtod(lacp_pkt, struct slow_protocol_frame *);
+   RTE_VERIFY(slow_hdr->slow_protocol.subtype == SLOW_SUBTYPE_LACP);
+
+   lacpdu_rx_count[slave_id]++;
+   rte_pktmbuf_free(lacp_pkt);
+}
+
 static int
-initialize_bonded_device_with_slaves(uint8_t slave_count, uint8_t start)
+initialize_bonded_device_with_slaves(uint8_t slave_count, uint8_t external_sm)
 {
uint8_t i;

@@ -337,9 +357,17 @@ initialize_bonded_device_with_slaves(uint8_t slave_count, 
uint8_t start)
rte_eth_bond_8023ad_setup(test_params.bonded_port_id, NULL);
rte_eth_promiscuous_disable(test_params.bonded_port_id);

-   if (start)
-   
TEST_ASSERT_SUCCESS(rte_eth_dev_start(test_params.bonded_port_id),
-   "Failed to start bonded device");
+   if (external_sm) {
+   struct rte_eth_bond_8023ad_conf conf;
+
+   rte_eth_bond_8023ad_conf_get(test_params.bonded_port_id, &conf);
+   conf.slowrx_cb = lacp_recv_cb;
+   rte_eth_bond_8023ad_setup(test_params.bonded_port_id, &conf);
+
+   }
+
+   TEST_ASSERT_SUCCESS(rte_eth_dev_start(test_params.bonded_port_id),
+   "Failed to start bonded device");

return TEST_SUCCESS;
 }
@@ -642,7 +670,7 @@ test_mode4_lacp(void)
 {
int retval;

-   retval = initialize_bonded_device_with_slaves(TEST_LACP_SLAVE_COUT, 1);
+   retval = initialize_bonded_device_with_slaves(TEST_LACP_SLAVE_COUT, 0);
TEST_ASSERT_SUCCESS(retval, "Failed to initialize bonded device");

/* Test LACP handshake function */
@@ -740,7 +768,7 @@ test_mode4_rx(void)
struct ether_addr dst_mac;
struct ether_addr bonded_mac;

-   retval = initialize_bonded_device_with_slaves(TEST_PROMISC_SLAVE_COUNT, 
1);
+   retval = initialize_bonded_device_with_slaves(TEST_PROMISC_SLAVE_COUNT, 
0);
TEST_ASSERT_SUCCESS(retval, "Failed to initialize bonded device");

retval = bond_handshake();
@@ -917,7 +945,7 @@ test_mode4_tx_burst(void)
struct ether_addr dst_mac = { { 0x00, 0xFF, 0x00, 0xFF, 0x00, 0x00 } };
struct ether_addr bonded_mac;

-   retval = initialize_bonded_device_with_slaves(TEST_TX_SLAVE_COUNT, 1);
+   retval = initialize_bonded_device_with_slaves(TEST_TX_SLAVE_COUNT, 0);
TEST_ASSERT_SUCCESS(retval, "Failed to initialize bonded device");

retval = bond_handshake();
@@ -1101,7 +1129,7 @@ test_mode4_marker(void)
uint8_t i, j;
const uint16_t ethtype_slow_be = rte_be_to_cpu_16(ETHER_TYPE_SLOW);

-   retval = initialize_bonded_device_with_slaves(TEST_MARKER_SLAVE_COUT, 
1);
+   retval = initialize_bonded_device_with_slaves(TEST_MARKER_SLAVE_COUT, 
0);
TEST_ASSERT_SUCCESS(retval, "Failed to initialize bonded device");

/* Test LACP handshake function */
@@ -1186,7 +1214,7 @@ test_mode4_expired(void)

struct rte_eth_bond_8023ad_conf conf;

-   retval = initialize_bonded_device_with_slaves(TEST_EXPIRED_SLAVE_COUNT, 
1);
+   retval = initialize_bonded_device_with_slaves(TEST_EXPIRED_SLAVE_COUNT, 
0);
/* Set custom timeouts to make test last shorter. */
rte_eth_bond_8023ad_conf_get(test_params.bonded_port_id, &conf);
conf.fast_periodic_ms = 100;
@@ -1268,6 +1296,156 @@ test_mode4_expired(void)
 }

 static int
+test_mode4_ext_ctrl(void)
+{
+   /*
+* configure bonded interface without the external sm enabled
+*   . try to transmit lacpdu (should fail)
+*   . try to set collecting and distributing flags (should fail)
+* reconfigure w/external sm
+*   . transmit one lacpdu on each slave using new api
+*   . make sure each slave receives one lacpdu using the callback api
+*   . transmit

[dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements

2015-10-19 Thread Stephen Hemminger

On Mon, 19 Oct 2015 13:19:50 +
"Xie, Huawei"  wrote:

> >  static int
> > -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
> > +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
> > +  int use_indirect)
> >  {
> > struct vq_desc_extra *dxp;
> > struct vring_desc *start_dp;
> > uint16_t seg_num = cookie->nb_segs;
> > -   uint16_t needed = 1 + seg_num;
> > +   uint16_t needed = use_indirect ? 1 : 1 + seg_num;
> > uint16_t head_idx, idx;
> > -   uint16_t head_size = txvq->hw->vtnet_hdr_size;
> > +   unsigned long offs;
> >  
> > if (unlikely(txvq->vq_free_cnt == 0))
> > return -ENOSPC;
> > @@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
> > rte_mbuf *cookie)
> > dxp = &txvq->vq_descx[idx];
> > dxp->cookie = (void *)cookie;
> > dxp->ndescs = needed;
> > -
> > start_dp = txvq->vq_ring.desc;
> > -   start_dp[idx].addr =
> > -   txvq->virtio_net_hdr_mem + idx * head_size;
> > -   start_dp[idx].len = (uint32_t)head_size;
> > -   start_dp[idx].flags = VRING_DESC_F_NEXT;
> > +
> > +   if (use_indirect) {
> > +   struct virtio_tx_region *txr
> > +   = txvq->virtio_net_hdr_mz->addr;
> > +
> > +   offs = idx * sizeof(struct virtio_tx_region)
> > +   + offsetof(struct virtio_tx_region, tx_indir);
> > +
> > +   start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
> > +   start_dp[idx].len   = (seg_num + 1) * sizeof(struct 
> > vring_desc);  
> a. In indirect mode, as we always use one descriptor, could we allocate
> one fixed descriptor as what i did in RX/TX ring layout optimization? :).

The code can not assume all packets will be in indirect mode. If using
any_layout, then some packets will use that. Also if you give a packet
where nb_segs is very large, then it falls back to old mode.
Also some hosts (like vhost) don't support indirect.

> b. If not a, we could cache the descriptor, avoid update unless the
> fields are different. In current implementation of free desc list, we
> could make them always use the same tx desc for the same ring slot. I am
> to submit a patch for normal rx path.

See above

> c. Could we initialize the length of all tx descriptors to be
> VIRTIO_MAX_INDIRECT * sizeof(struct vring_desc)? Is maximum length ok
> here? Does the spec require that the length field reflects the length of
> real used descs, as we already have the next field to indicate the last
> descriptor.

The largest VIRTIO_MAX_INDIRECT possible is very large 4K


> 
> > +   start_dp[idx].flags = VRING_DESC_F_INDIRECT;
> > +
> > +   start_dp = txr[idx].tx_indir;
> > +   idx = 0;
> > +   } else {
> > +   offs = idx * sizeof(struct virtio_tx_region)
> > +   + offsetof(struct virtio_tx_region, tx_hdr);
> > +
> > +   start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
> > +   start_dp[idx].len   = txvq->hw->vtnet_hdr_size;
> > +   start_dp[idx].flags = VRING_DESC_F_NEXT;
> > +   }
> >  
> > for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
> > idx = start_dp[idx].next;  
> This looks weird to me. Why skip the first user provided descriptor?
> idx = 0
> idx = start_dp[idx].next
> start_dp[idx].addr = ...

The first descriptor (0) is initialized once to point to the static
all zeros tx header. Then code skips to second entry to initailize the
first data block.

> 
> > @@ -236,7 +254,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
> > rte_mbuf *cookie)
> > }
> >  
> > start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
> > -   idx = start_dp[idx].next;
> > +
> > +   if (use_indirect)
> > +   idx = txvq->vq_ring.desc[head_idx].next;
> > +   else
> > +   idx = start_dp[idx].next;
> > +
> > txvq->vq_desc_head_idx = idx;
> > if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
> > txvq->vq_desc_tail_idx = idx;
> > @@ -261,7 +284,7 @@ static void
> >  virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
> >  {
> > struct rte_mbuf *m;
> > -   int i, nbufs, error, size = vq->vq_nentries;
> > +   int nbufs, error, size = vq->vq_nentries;
> > struct vring *vr = &vq->vq_ring;
> > uint8_t *ring_mem = vq->vq_ring_virt_mem;
> >  
> > @@ -279,10 +302,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int 
> > queue_type)
> > vq->vq_free_cnt = vq->vq_nentries;
> > memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
> >  
> > -   /* Chain all the descriptors in the ring with an END */
> > -   for (i = 0; i < size - 1; i++)
> > -   vr->desc[i].next = (uint16_t)(i + 1);
> > -   vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
> > +   vring_desc_init(vr->desc, size);
> >  
> > /*
> >  * Disable device(host) interrupting guest
> > @@ -760,7 +780,15 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
> > **tx_pkts, uint16_t nb_pkts)
> >

[dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit

2015-10-19 Thread Stephen Hemminger

On Mon, 19 Oct 2015 08:02:07 +
"Xie, Huawei"  wrote:

> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> > The space check for transmit ring only needs a single conditional.
> > I.e only need to recheck for space if there was no space in first check.
> I see you reorganized the code. It is more clear and readable with your
> change, but no check is removed.
> We could remove the check after virtio_xmit_cleanup. In current
> implementation, we assume the virtio_xmit_cleanup and vq_ring_free_chain
> always succeed.

The only case that matters is if virtio_xmit_cleanup can not
free up any slots.

> > This can help performance and simplifies loop.
> >
> > Signed-off-by: Stephen Hemminger 
> > ---
> >  drivers/net/virtio/virtio_rxtx.c | 66 
> > 
> >  1 file changed, 27 insertions(+), 39 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_rxtx.c 
> > b/drivers/net/virtio/virtio_rxtx.c
> > index c5b53bb..5b50ed0 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -745,7 +745,6 @@ uint16_t
> >  virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t 
> > nb_pkts)
> >  {
> > struct virtqueue *txvq = tx_queue;
> > -   struct rte_mbuf *txm;
> > uint16_t nb_used, nb_tx;
> > int error;
> >  
> > @@ -759,57 +758,46 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
> > **tx_pkts, uint16_t nb_pkts)
> > if (likely(nb_used > txvq->vq_nentries - txvq->vq_free_thresh))
> > virtio_xmit_cleanup(txvq, nb_used);
> >  
> > -   nb_tx = 0;
> > +   for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> > +   struct rte_mbuf *txm = tx_pkts[nb_tx];
> > +   int need = txm->nb_segs - txvq->vq_free_cnt + 1;
> >  
> > -   while (nb_tx < nb_pkts) {
> > -   /* Need one more descriptor for virtio header. */
> > -   int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
> > -
> > -   /*Positive value indicates it need free vring descriptors */
> > +   /* Positive value indicates it need free vring descriptors */
> As you fix the comment, if it is correct, could you do s/need/needs/ as
> well, :)?
> > if (unlikely(need > 0)) {
> > nb_used = VIRTQUEUE_NUSED(txvq);
> > virtio_rmb();
> > need = RTE_MIN(need, (int)nb_used);
> >  
> > virtio_xmit_cleanup(txvq, need);
> > -   need = (int)tx_pkts[nb_tx]->nb_segs -
> > -   txvq->vq_free_cnt + 1;
> > -   }
> > -
> > -   /*
> > -* Zero or negative value indicates it has enough free
> > -* descriptors to use for transmitting.
> > -*/
> > -   if (likely(need <= 0)) {
> > -   txm = tx_pkts[nb_tx];
> > -
> > -   /* Do VLAN tag insertion */
> > -   if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
> > -   error = rte_vlan_insert(&txm);
> > -   if (unlikely(error)) {
> > -   rte_pktmbuf_free(txm);
> > -   ++nb_tx;
> > -   continue;
> > -   }
> Still need is rechecked here. It could be removed if virtio_xmit_cleanup
> ensures need would be <= 0 after the call.
> > +   need = txm->nb_segs - txvq->vq_free_cnt + 1;
> > +   if (unlikely(need > 0)) {
> > +   PMD_TX_LOG(ERR,
> > +  "No free tx descriptors to 
> > transmit");
> > +   break;
> > }
> > +   }
> >  
> > -   /* Enqueue Packet buffers */
> > -   error = virtqueue_enqueue_xmit(txvq, txm);
> > +   /* Do VLAN tag insertion */
> > +   if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
> > +   error = rte_vlan_insert(&txm);
> > if (unlikely(error)) {
> > -   if (error == ENOSPC)
> > -   PMD_TX_LOG(ERR, "virtqueue_enqueue Free 
> > count = 0");
> > -   else if (error == EMSGSIZE)
> > -   PMD_TX_LOG(ERR, "virtqueue_enqueue Free 
> > count < 1");
> > -   else
> > -   PMD_TX_LOG(ERR, "virtqueue_enqueue 
> > error: %d", error);
> > -   break;
> > +   rte_pktmbuf_free(txm);
> > +   continue;
> > }
> > -   nb_tx++;
> > -   txvq->bytes += txm->pkt_len;
> > -   } else {
> > -   PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
> > +   }
> > +
> > +   /* Enqueue Packet buffers */
> > +   error = virtqueue_enqueue_xmit(txvq, txm

[dpdk-dev] [PATCH v3] librte_cfgfile(rte_cfgfile.h): modify the macros values

2015-10-19 Thread Thomas Monjalon

2015-09-04 11:58, Jasvinder Singh:
> -#define CFG_NAME_LEN 32
> -#define CFG_VALUE_LEN 64
> +#ifndef CFG_NAME_LEN
> +#define CFG_NAME_LEN 64
> +#endif

Build fails:
examples/qos_sched/cfg_file.h:40:0: error: "CFG_NAME_LEN" redefined

[dpdk-dev] [PATCH 00/34] update e1000 base driver

2015-10-19 Thread Thomas Monjalon

Hi Wenzhuo,

Isn't it too late to have it included in DPDK 2.2?
For reference: http://dpdk.org/dev/roadmap

If you consider it for 2.3, please use the patchwork status "Deferred".
For older versions of the patchset, please use the status "Superseded".
Thanks

[dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements

2015-10-19 Thread Xie, Huawei

On 10/19/2015 11:47 PM, Stephen Hemminger wrote:
> On Mon, 19 Oct 2015 13:19:50 +
> "Xie, Huawei"  wrote:
>
>>>  static int
>>> -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
>>> +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
>>> +  int use_indirect)
>>>  {
>>> struct vq_desc_extra *dxp;
>>> struct vring_desc *start_dp;
>>> uint16_t seg_num = cookie->nb_segs;
>>> -   uint16_t needed = 1 + seg_num;
>>> +   uint16_t needed = use_indirect ? 1 : 1 + seg_num;
>>> uint16_t head_idx, idx;
>>> -   uint16_t head_size = txvq->hw->vtnet_hdr_size;
>>> +   unsigned long offs;
>>>  
>>> if (unlikely(txvq->vq_free_cnt == 0))
>>> return -ENOSPC;
>>> @@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
>>> rte_mbuf *cookie)
>>> dxp = &txvq->vq_descx[idx];
>>> dxp->cookie = (void *)cookie;
>>> dxp->ndescs = needed;
>>> -
>>> start_dp = txvq->vq_ring.desc;
>>> -   start_dp[idx].addr =
>>> -   txvq->virtio_net_hdr_mem + idx * head_size;
>>> -   start_dp[idx].len = (uint32_t)head_size;
>>> -   start_dp[idx].flags = VRING_DESC_F_NEXT;
>>> +
>>> +   if (use_indirect) {
>>> +   struct virtio_tx_region *txr
>>> +   = txvq->virtio_net_hdr_mz->addr;
>>> +
>>> +   offs = idx * sizeof(struct virtio_tx_region)
>>> +   + offsetof(struct virtio_tx_region, tx_indir);
>>> +
>>> +   start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
>>> +   start_dp[idx].len   = (seg_num + 1) * sizeof(struct 
>>> vring_desc);  
>> a. In indirect mode, as we always use one descriptor, could we allocate
>> one fixed descriptor as what i did in RX/TX ring layout optimization? :).
> The code can not assume all packets will be in indirect mode. If using
> any_layout, then some packets will use that. Also if you give a packet
> where nb_segs is very large, then it falls back to old mode.
> Also some hosts (like vhost) don't support indirect.
Agree.
With always one descriptor, it is ok.
For the packets with more than VIRTIO_MAX_INDIRECT segs, we fall to old
mode.
>
>> b. If not a, we could cache the descriptor, avoid update unless the
>> fields are different. In current implementation of free desc list, we
>> could make them always use the same tx desc for the same ring slot. I am
>> to submit a patch for normal rx path.
> See above
I don't mean using fixed descriptors. Even in general implementation,
the desc allocated for the ring entry would be normally the same.
So we cache the descriptor id(in the case only one desc is used) last
allocated for each avail ring entry , compare, if the same, skip the store.
This introduces some overhead, but considering vhost has to fetch this
L1M cache line from virtio's processing core, it might be worth.
Mst has posted a patch for virtio's testcase.
Maybe it makes more sense for RX as currently we always uses one descriptor.
If you mean the "only one descriptor" again, skip this comment.
>
>> c. Could we initialize the length of all tx descriptors to be
>> VIRTIO_MAX_INDIRECT * sizeof(struct vring_desc)? Is maximum length ok
>> here? Does the spec require that the length field reflects the length of
>> real used descs, as we already have the next field to indicate the last
>> descriptor.
> The largest VIRTIO_MAX_INDIRECT possible is very large 4K
instead of

start_dp[idx].len   = (seg_num + 1) * sizeof(struct vring_desc)
Is it ok to use
start_dp[idx].len   = 4096 * sizeof(struct vring_desc)?

>
>
>>> +   start_dp[idx].flags = VRING_DESC_F_INDIRECT;
>>> +
>>> +   start_dp = txr[idx].tx_indir;
>>> +   idx = 0;
>>> +   } else {
>>> +   offs = idx * sizeof(struct virtio_tx_region)
>>> +   + offsetof(struct virtio_tx_region, tx_hdr);
>>> +
>>> +   start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
>>> +   start_dp[idx].len   = txvq->hw->vtnet_hdr_size;
>>> +   start_dp[idx].flags = VRING_DESC_F_NEXT;
>>> +   }
>>>  
>>> for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
>>> idx = start_dp[idx].next;  
>> This looks weird to me. Why skip the first user provided descriptor?
>> idx = 0
>> idx = start_dp[idx].next
>> start_dp[idx].addr = ...
> The first descriptor (0) is initialized once to point to the static
> all zeros tx header. Then code skips to second entry to initailize the
> first data block.
Ah, forget the header.
>>> @@ -236,7 +254,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
>>> rte_mbuf *cookie)
>>> }
>>>  
>>> start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
>>> -   idx = start_dp[idx].next;
>>> +
>>> +   if (use_indirect)
>>> +   idx = txvq->vq_ring.desc[head_idx].next;
>>> +   else
>>> +   idx = start_dp[idx].next;
>>> +
>>> txvq->vq_desc_head_idx = idx;
>>> if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
>>> txvq->vq_desc_tail_idx = idx;
>>> @@ -2

[dpdk-dev] [PATCH v6 0/4] bonding corrections and additions

2015-10-19 Thread Declan Doherty

On 19/10/15 16:36, Eric Kinzie wrote:
> This patchset makes a couple of small corrections to the bonding driver
> and introduces the ability to use an external state machine for mode
> 4 operation.
>
> Changes in v2:
>. eliminate external_sm field in 802.3ad configuration
>  (rte_eth_bond_8023ad_conf).
>. stop bonding device before changing the periodic callback function.
>  start again if needed.
>. remove unnecessary calls to valid_bonded_port_id().
>. do not check for NULL tx_ring.
>. return error in rte_eth_bond_8023ad_ext_slowtx() if packet is not LACP.
>. remove check for external sm configuration in periodic callback
>. check for valid LACPDU in test application's rx callback
>. add "Fixes:" tags
>
> Changes in v3:
>. update rte_eth_bond_version.map
>
> Changes in v4:
>. version functions that modify slowrx_cb
>
> Changes in v5:
>. remove "__vsym" from source files and use updated symbol version macros
>
> Changes in v6:
>. updated symbols have version 2.2
>. specify versioned local symbol names in map so they don't become exported
>
> Eric Kinzie (4):
>bond mode 4: copy entire config structure
>bond mode 4: do not ignore multicast
>bond mode 4: allow external state machine
>bond mode 4: tests for external state machine
>
>   app/test/test_link_bonding_mode4.c|  217 ++-
>   drivers/net/bonding/Makefile  |2 +-
>   drivers/net/bonding/rte_eth_bond_8023ad.c |  241 
> -
>   drivers/net/bonding/rte_eth_bond_8023ad.h |   44 
>   drivers/net/bonding/rte_eth_bond_8023ad_private.h |   28 +++
>   drivers/net/bonding/rte_eth_bond_pmd.c|1 +
>   drivers/net/bonding/rte_eth_bond_version.map  |   10 +
>   7 files changed, 529 insertions(+), 14 deletions(-)
>

Series Acked-by: Declan Doherty

[dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit

2015-10-19 Thread Xie, Huawei

On 10/19/2015 11:48 PM, Stephen Hemminger wrote:
> On Mon, 19 Oct 2015 08:02:07 +
> "Xie, Huawei"  wrote:
>
>> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
>>> The space check for transmit ring only needs a single conditional.
>>> I.e only need to recheck for space if there was no space in first check.
>> I see you reorganized the code. It is more clear and readable with your
>> change, but no check is removed.
>> We could remove the check after virtio_xmit_cleanup. In current
>> implementation, we assume the virtio_xmit_cleanup and vq_ring_free_chain
>> always succeed.
> The only case that matters is if virtio_xmit_cleanup can not
> free up any slots.
Is there the case in current implementation virtio_xmit_cleanup could
free up any descriptors, i.e, dxp->ndesc is zero?
vq->free_cnt += dxp->ndesc
>
>>> This can help performance and simplifies loop.
>>>
>>> Signed-off-by: Stephen Hemminger 
>>> ---
>>>  drivers/net/virtio/virtio_rxtx.c | 66 
>>> 
>>>  1 file changed, 27 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio/virtio_rxtx.c 
>>> b/drivers/net/virtio/virtio_rxtx.c
>>> index c5b53bb..5b50ed0 100644
>>> --- a/drivers/net/virtio/virtio_rxtx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx.c
>>> @@ -745,7 +745,6 @@ uint16_t
>>>  virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t 
>>> nb_pkts)
>>>  {
>>> struct virtqueue *txvq = tx_queue;
>>> -   struct rte_mbuf *txm;
>>> uint16_t nb_used, nb_tx;
>>> int error;
>>>  
>>> @@ -759,57 +758,46 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
>>> **tx_pkts, uint16_t nb_pkts)
>>> if (likely(nb_used > txvq->vq_nentries - txvq->vq_free_thresh))
>>> virtio_xmit_cleanup(txvq, nb_used);
>>>  
>>> -   nb_tx = 0;
>>> +   for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
>>> +   struct rte_mbuf *txm = tx_pkts[nb_tx];
>>> +   int need = txm->nb_segs - txvq->vq_free_cnt + 1;
>>>  
>>> -   while (nb_tx < nb_pkts) {
>>> -   /* Need one more descriptor for virtio header. */
>>> -   int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
>>> -
>>> -   /*Positive value indicates it need free vring descriptors */
>>> +   /* Positive value indicates it need free vring descriptors */
>> As you fix the comment, if it is correct, could you do s/need/needs/ as
>> well, :)?
>>> if (unlikely(need > 0)) {
>>> nb_used = VIRTQUEUE_NUSED(txvq);
>>> virtio_rmb();
>>> need = RTE_MIN(need, (int)nb_used);
>>>  
>>> virtio_xmit_cleanup(txvq, need);
>>> -   need = (int)tx_pkts[nb_tx]->nb_segs -
>>> -   txvq->vq_free_cnt + 1;
>>> -   }
>>> -
>>> -   /*
>>> -* Zero or negative value indicates it has enough free
>>> -* descriptors to use for transmitting.
>>> -*/
>>> -   if (likely(need <= 0)) {
>>> -   txm = tx_pkts[nb_tx];
>>> -
>>> -   /* Do VLAN tag insertion */
>>> -   if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
>>> -   error = rte_vlan_insert(&txm);
>>> -   if (unlikely(error)) {
>>> -   rte_pktmbuf_free(txm);
>>> -   ++nb_tx;
>>> -   continue;
>>> -   }
>> Still need is rechecked here. It could be removed if virtio_xmit_cleanup
>> ensures need would be <= 0 after the call.
>>> +   need = txm->nb_segs - txvq->vq_free_cnt + 1;
>>> +   if (unlikely(need > 0)) {
>>> +   PMD_TX_LOG(ERR,
>>> +  "No free tx descriptors to 
>>> transmit");
>>> +   break;
>>> }
>>> +   }
>>>  
>>> -   /* Enqueue Packet buffers */
>>> -   error = virtqueue_enqueue_xmit(txvq, txm);
>>> +   /* Do VLAN tag insertion */
>>> +   if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
>>> +   error = rte_vlan_insert(&txm);
>>> if (unlikely(error)) {
>>> -   if (error == ENOSPC)
>>> -   PMD_TX_LOG(ERR, "virtqueue_enqueue Free 
>>> count = 0");
>>> -   else if (error == EMSGSIZE)
>>> -   PMD_TX_LOG(ERR, "virtqueue_enqueue Free 
>>> count < 1");
>>> -   else
>>> -   PMD_TX_LOG(ERR, "virtqueue_enqueue 
>>> error: %d", error);
>>> -   break;
>>> +   rte_pktmbuf_free(txm);
>>> +   continue;
>>> }
>>> -   nb_tx++;
>>> -   txvq->bytes += txm->pkt_len;
>>> -

[dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit

2015-10-19 Thread Xie, Huawei

On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> Virtio supports a feature that allows sender to put transmit
> header prepended to data.  It requires that the mbuf be writeable, correct
> alignment, and the feature has been negotiatied.  If all this works out,
> then it will be the optimum way to transmit a single segment packet.
"When using legacy interfaces, transitional drivers which have not
negotiated VIRTIO_F_ANY_LAYOUT
MUST use a single descriptor for the struct virtio_net_hdr on both
transmit and receive, with the
network data in the following descriptors."

I think we shouldn't assume that virtio header descriptor uses a
separate descriptor. It could be with data. Virtio RX(and dpdk vhost)
actually is implemented like this before, i.e, i thought this should be
inherent but not a feature.
Is the current RX implementation wrong?
[...]

[dpdk-dev] [PATCH] ixgbe: prefetch packet headers in vector PMD receive function

2015-10-19 Thread Zoltan Kiss



On 15/10/15 16:43, Ananyev, Konstantin wrote:
>
>
>> -Original Message-
>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>> Sent: Thursday, October 15, 2015 11:33 AM
>> To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive 
>> function
>>
>>
>>
>> On 15/10/15 00:23, Ananyev, Konstantin wrote:
>>>
>>>
 -Original Message-
 From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
 Sent: Wednesday, October 14, 2015 5:11 PM
 To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
 Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive 
 function



 On 28/09/15 00:19, Ananyev, Konstantin wrote:
>
>
>> -Original Message-
>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>> Sent: Friday, September 25, 2015 7:29 PM
>> To: Richardson, Bruce; dev at dpdk.org
>> Cc: Ananyev, Konstantin
>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD 
>> receive function
>>
>> On 07/09/15 07:41, Richardson, Bruce wrote:
>>>
 -Original Message-
 From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
 Sent: Monday, September 7, 2015 3:15 PM
 To: Richardson, Bruce; dev at dpdk.org
 Cc: Ananyev, Konstantin
 Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD 
 receive
 function



 On 07/09/15 13:57, Richardson, Bruce wrote:
>
>> -Original Message-
>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>> Sent: Monday, September 7, 2015 1:26 PM
>> To: dev at dpdk.org
>> Cc: Ananyev, Konstantin; Richardson, Bruce
>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD
>> receive function
>>
>> Hi,
>>
>> I just realized I've missed the "[PATCH]" tag from the subject. Did
>> anyone had time to review this?
>>
> Hi Zoltan,
>
> the big thing that concerns me with this is the addition of new
> instructions for each packet in the fast path. Ideally, this
> prefetching would be better handled in the application itself, as for
> some apps, e.g. those using pipelining, the core doing the RX from the
> NIC may not touch the packet data at all, and the prefetches will
 instead cause a performance slowdown.
> Is it possible to get the same performance increase - or something
> close to it - by making changes in OVS?
 OVS already does a prefetch when it's processing the previous packet, 
 but
 apparently it's not early enough. At least for my test scenario, where 
 I'm
 forwarding UDP packets with the least possible overhead. I guess in 
 tests
 where OVS does more complex processing it should be fine.
 I'll try to move the prefetch earlier in OVS codebase, but I'm not 
 sure if
 it'll help.
>>> I would suggest trying to prefetch more than one packet ahead. 
>>> Prefetching 4 or
>>> 8 ahead might work better, depending on the processing being done.
>>
>> I've moved the prefetch earlier, and it seems to work:
>>
>> https://patchwork.ozlabs.org/patch/519017/
>>
>> However it raises the question: should we remove header prefetch from
>> all the other drivers, and the CONFIG_RTE_PMD_PACKET_PREFETCH option?
>
> My vote would be for that.
> Konstantin

 After some further thinking I would rather support the
 rte_packet_prefetch() macro (prefetch the header in the driver, and
 configure it through CONFIG_RTE_PMD_PACKET_PREFETCH)

 - the prefetch can happen earlier, so if an application needs the header
 right away, this is the fastest
 - if the application doesn't need header prefetch, it can turn it off
 compile time. Same as if we wouldn't have this option.
 - if the application has mixed needs (sometimes it needs the header
 right away, sometimes it doesn't), it can turn it off and do what it
 needs. Same as if we wouldn't have this option.

 A harder decision would be whether it should be turned on or off by
 default. Currently it's on, and half of the receive functions don't use it.
>>>
>>> Yep,  it is sort of a mess right now.
>>> Another question if we'd like to keep it and standardise it:
>>> at what moment to prefetch: as soon as we realize that HW is done with that 
>>> buffer,
>>> or as late inside rx_burst() as possible?
>>> I suppose there is no clear answer for that.
>> I think if the application needs the driver start the prefetching, it
>> does it because it's already too late when rte_eth_rx_burst() returns.
>> So I think it's best to do it as soon as we learn that the HW is done.
>
>

[dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit

2015-10-19 Thread Stephen Hemminger

On Mon, 19 Oct 2015 16:28:30 +
"Xie, Huawei"  wrote:

> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> > Virtio supports a feature that allows sender to put transmit
> > header prepended to data.  It requires that the mbuf be writeable, correct
> > alignment, and the feature has been negotiatied.  If all this works out,
> > then it will be the optimum way to transmit a single segment packet.  
> "When using legacy interfaces, transitional drivers which have not
> negotiated VIRTIO_F_ANY_LAYOUT
> MUST use a single descriptor for the struct virtio_net_hdr on both
> transmit and receive, with the
> network data in the following descriptors."

The code checks for the any layout feature, what is the problem?

[dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit

2015-10-19 Thread Xie, Huawei

On 10/20/2015 12:43 AM, Stephen Hemminger wrote:
> On Mon, 19 Oct 2015 16:28:30 +
> "Xie, Huawei"  wrote:
>
>> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
>>> Virtio supports a feature that allows sender to put transmit
>>> header prepended to data.  It requires that the mbuf be writeable, correct
>>> alignment, and the feature has been negotiatied.  If all this works out,
>>> then it will be the optimum way to transmit a single segment packet.  
>> "When using legacy interfaces, transitional drivers which have not
>> negotiated VIRTIO_F_ANY_LAYOUT
>> MUST use a single descriptor for the struct virtio_net_hdr on both
>> transmit and receive, with the
>> network data in the following descriptors."
> The code checks for the any layout feature, what is the problem?
My reply is removed. I said virtio RX is already implemented using this
feature by default without negotiation(at the time of implementation, no
idea of this feature), is the RX implementation wrong?

[dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit

2015-10-19 Thread Stephen Hemminger

On Mon, 19 Oct 2015 16:28:30 +
"Xie, Huawei"  wrote:

> "When using legacy interfaces, transitional drivers which have not
> negotiated VIRTIO_F_ANY_LAYOUT
> MUST use a single descriptor for the struct virtio_net_hdr on both
> transmit and receive, with the
> network data in the following descriptors."
> 
> I think we shouldn't assume that virtio header descriptor uses a
> separate descriptor. It could be with data. Virtio RX(and dpdk vhost)
> actually is implemented like this before, i.e, i thought this should be
> inherent but not a feature.
> Is the current RX implementation wrong?

I believe current RX is ok, the any layout refers more to what is
handed to the host on transmit. Rusty said something like
"any sane implementation would work with contiguous buffer"
but the standard couldn't assume sanity!

[dpdk-dev] [PATCH] ixgbe: prefetch packet headers in vector PMD receive function

2015-10-19 Thread Ananyev, Konstantin



> -Original Message-
> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> Sent: Monday, October 19, 2015 5:30 PM
> To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive 
> function
> 
> 
> 
> On 15/10/15 16:43, Ananyev, Konstantin wrote:
> >
> >
> >> -Original Message-
> >> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >> Sent: Thursday, October 15, 2015 11:33 AM
> >> To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
> >> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive 
> >> function
> >>
> >>
> >>
> >> On 15/10/15 00:23, Ananyev, Konstantin wrote:
> >>>
> >>>
>  -Original Message-
>  From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>  Sent: Wednesday, October 14, 2015 5:11 PM
>  To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
>  Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD 
>  receive function
> 
> 
> 
>  On 28/09/15 00:19, Ananyev, Konstantin wrote:
> >
> >
> >> -Original Message-
> >> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >> Sent: Friday, September 25, 2015 7:29 PM
> >> To: Richardson, Bruce; dev at dpdk.org
> >> Cc: Ananyev, Konstantin
> >> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD 
> >> receive function
> >>
> >> On 07/09/15 07:41, Richardson, Bruce wrote:
> >>>
>  -Original Message-
>  From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>  Sent: Monday, September 7, 2015 3:15 PM
>  To: Richardson, Bruce; dev at dpdk.org
>  Cc: Ananyev, Konstantin
>  Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD 
>  receive
>  function
> 
> 
> 
>  On 07/09/15 13:57, Richardson, Bruce wrote:
> >
> >> -Original Message-
> >> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >> Sent: Monday, September 7, 2015 1:26 PM
> >> To: dev at dpdk.org
> >> Cc: Ananyev, Konstantin; Richardson, Bruce
> >> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD
> >> receive function
> >>
> >> Hi,
> >>
> >> I just realized I've missed the "[PATCH]" tag from the subject. Did
> >> anyone had time to review this?
> >>
> > Hi Zoltan,
> >
> > the big thing that concerns me with this is the addition of new
> > instructions for each packet in the fast path. Ideally, this
> > prefetching would be better handled in the application itself, as 
> > for
> > some apps, e.g. those using pipelining, the core doing the RX from 
> > the
> > NIC may not touch the packet data at all, and the prefetches will
>  instead cause a performance slowdown.
> > Is it possible to get the same performance increase - or something
> > close to it - by making changes in OVS?
>  OVS already does a prefetch when it's processing the previous 
>  packet, but
>  apparently it's not early enough. At least for my test scenario, 
>  where I'm
>  forwarding UDP packets with the least possible overhead. I guess in 
>  tests
>  where OVS does more complex processing it should be fine.
>  I'll try to move the prefetch earlier in OVS codebase, but I'm not 
>  sure if
>  it'll help.
> >>> I would suggest trying to prefetch more than one packet ahead. 
> >>> Prefetching 4 or
> >>> 8 ahead might work better, depending on the processing being done.
> >>
> >> I've moved the prefetch earlier, and it seems to work:
> >>
> >> https://patchwork.ozlabs.org/patch/519017/
> >>
> >> However it raises the question: should we remove header prefetch from
> >> all the other drivers, and the CONFIG_RTE_PMD_PACKET_PREFETCH option?
> >
> > My vote would be for that.
> > Konstantin
> 
>  After some further thinking I would rather support the
>  rte_packet_prefetch() macro (prefetch the header in the driver, and
>  configure it through CONFIG_RTE_PMD_PACKET_PREFETCH)
> 
>  - the prefetch can happen earlier, so if an application needs the header
>  right away, this is the fastest
>  - if the application doesn't need header prefetch, it can turn it off
>  compile time. Same as if we wouldn't have this option.
>  - if the application has mixed needs (sometimes it needs the header
>  right away, sometimes it doesn't), it can turn it off and do what it
>  needs. Same as if we wouldn't have this option.
> 
>  A harder decision would be whether it should be turned on or off by
>  default. Currently it's on, and half of the receive functions don't

[dpdk-dev] [PATCH v3 0/4] support for netronome nfp-6xxx card

2015-10-19 Thread Mcnamara, John

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alejandro.Lucero
> Sent: Friday, October 16, 2015 11:45 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3 0/4] support for netronome nfp-6xxx card
> 
> From: "Alejandro.Lucero" 
> 
> This patchset adds a new PMD for Netronome NFP-6xxx card along with a new
> UIO driver, documentation and minor changes to configuration scripts.

I haven't reviewed the PMD itself but the majority of the changes are confined 
to drivers/net/nfp under the Maintainership of the submitter. There is also a 
new uio module called nfp_uio.

The patches apply cleanly and build with gcc and clang, the documentation is 
good and builds cleanly, there are additions to tools/dpdk_nic_bind.py and 
tools/setup.sh which work correctly in minimal testing.

CCing other maintainers in the relevant trees but if there are no other 
objections:

Series Acked-by: John McNamara

[dpdk-dev] [dpdk-dev, PATCHv6 5/6] i40e: enhance eth_(rxq|txq)_info_get to retrieve more queue information

2015-10-19 Thread Stephen Hemminger

On Tue, 20 Oct 2015 00:06:23 +0200
Amine Kherbouche  wrote:

> According to new fields in struct rte_eth_rxq_info, those are filled to
> add additional information about queue descriptors.
> 
> Signed-off-by: Amine Kherbouche 
> ---
>  drivers/net/i40e/i40e_rxtx.c |4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index fa1451e..e958d56 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -3075,6 +3075,8 @@ i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t 
> queue_id,
>   qinfo->mp = rxq->mp;
>   qinfo->scattered_rx = dev->data->scattered_rx;
>   qinfo->nb_desc = rxq->nb_rx_desc;
> + qinfo->used_desc = (uint16_t)i40e_dev_rx_queue_count(dev, queue_id);

Cast here is unnecessary. C does the necessary type conversion.

[dpdk-dev] [dpdk-dev, PATCHv6 1/6] ethdev: enhance rte_eth_(tx|rx)q_info struct

2015-10-19 Thread Stephen Hemminger

On Tue, 20 Oct 2015 00:06:19 +0200
Amine Kherbouche  wrote:

> + uint16_t used_desc; /**< number of used descriptors */
> + uint16_t free_desc; /**< number of free descriptors */

These two fields are obviously not SMP sfe.
Also, they are redundant with the existing queue_count API?

[dpdk-dev] [PATCH v4 1/7] mk: Add rule for installing headers

2015-10-19 Thread Arevalo, Mario Alfredo C

Hi Olivier, thank you for your feedback , more comments below :) 

Thanks.
Mario.

From: Olivier MATZ [olivier.m...@6wind.com]
Sent: Friday, October 16, 2015 12:29 PM
To: Arevalo, Mario Alfredo C; dev at dpdk.org
Cc: Panu Matilainen
Subject: Re: [PATCH v4 1/7] mk: Add rule for installing headers

Hi Mario,

Thank you for this patch series, and thank you Panu for the
good comments on this series.

Please see some comments below.

On 10/05/2015 10:20 PM, Mario Carrillo wrote:
> Add hierarchy-file support to the DPDK headers.
>
> When invoking "make install-headers" headers will
> be installed in: $(DESTDIR)/$(INCLUDE_DIR)
> where INCLUDE_DIR=/usr/include/dpdk by default.
>
> You can override INCLUDE_DIR var.
> This hierarchy is based on:
> http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
>
> Signed-off-by: Mario Carrillo 
> ---
>  mk/rte.sdkinstall.mk | 18 +-
>  mk/rte.sdkroot.mk|  4 ++--
>  2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
> index 86c98a5..f016171 100644
> --- a/mk/rte.sdkinstall.mk
> +++ b/mk/rte.sdkinstall.mk
> @@ -41,6 +41,10 @@ endif
>  # x86_64-native-*-gcc
>  ifndef T
>  T=*
> +ifneq (,$(wildcard $(BUILD_DIR)/build/.config))

What about using $(RTE_OUTPUT) here instead of $(BUILD_DIR)/build ?

With you patch, the following commands work:

  make config T=x86_64-native-linuxapp-gcc
  make -j32
  make install-headers DESTDIR=install

Replacing all the occurences of $(BUILD_DIR)/build by $(RTE_OUTPUT)
would also make the following commands work:

  make config T=x86_64-native-linuxapp-gcc O=build2
  make -j32 O=build2
  make install-headers O=build2 DESTDIR=install2

Note: the default value of RTE_OUTPUT is set to $(RTE_SRCDIR)/build
in mk/rte.sdkroot.mk

We also need to to replace other occurences of $(BUILD_DIR), please
see my comment on patch 2/7.

[Mario]: Yes, you right, I can replace it :) 

> +INCLUDE_DIR ?= /usr/include/dpdk
> +HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
> +endif
>  endif
>
>  #
> @@ -72,7 +76,19 @@ install: $(INSTALL_TARGETS)
>   echo "Using local configuration"; \
>   fi
>   $(Q)$(MAKE) all O=$(BUILD_DIR)/$*
> -
> +#
> +# install headers in /usr/include/dpdk by default
> +# INCLUDE_DIR can be overridden.
> +#
> +.PHONY: install-headers
> +install-headers:
> + @echo == Installing headers;
> + @[ -d $(DESTDIR)/$(INCLUDE_DIR) ] || mkdir -p $(DESTDIR)/$(INCLUDE_DIR)

I think it's useless to do the [ -d $(DESTDIR)/$(INCLUDE_DIR) ] as the
'mkdir -p' will do it as well.

But maybe it could be useful to check:
  [ "$${HSLINKS}" != "" ]
This would solve the issue described by Panu about the directories
created even if they are empty.

[Mario]: Sounds good, that can solve the problem. 

> + @for HSLINK in ${HSLINKS}; do \

Not sure to understand what is the meaning of HSLINK?
HS = headers?

[Mario]: HSLINKS= headers links, in dpdk installation, it creates links to 
headers, however for installation I would like
to get the headers paths and copy them to installation path, not create links  
:) .

Regards,
Olivier

[dpdk-dev] [PATCH v4 5/7] mk: Add rule for installing documentation

2015-10-19 Thread Arevalo, Mario Alfredo C

Hi Olivier,

Ooops I'm going to fix that :)

Thanks.
Mario.

From: Olivier MATZ [olivier.m...@6wind.com]
Sent: Friday, October 16, 2015 12:30 PM
To: Arevalo, Mario Alfredo C; dev at dpdk.org
Subject: Re: [PATCH v4 5/7] mk: Add rule for installing documentation

Hi Mario,

On 10/05/2015 10:20 PM, Mario Carrillo wrote:
> Add hierarchy-file support to the DPDK documentation.
>
> When invoking "make install-doc" documentation files will
> be installed in: $(DESTDIR)/$(DOC_DIR) where
> DOC_DIR=$(DESTDIR)/usr/share/doc/dpdk by default.
>
> You can override DOC_DIR var.
> This hierarchy is based on:
> http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
>
> Signed-off-by: Mario Carrillo 
> ---
>  mk/rte.sdkinstall.mk | 14 ++
>  mk/rte.sdkroot.mk|  6 --
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
> index 628d9fb..d917329 100644
> --- a/mk/rte.sdkinstall.mk
> +++ b/mk/rte.sdkinstall.mk
> @@ -44,12 +44,14 @@ T=*
>  ifneq (,$(wildcard $(BUILD_DIR)/build/.config))
>  INCLUDE_DIR ?= /usr/include/dpdk
>  BIN_DIR ?= /usr/bin
> +DOC_DIR ?= /usr/share/doc/dpdk
>  HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
>  BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
>  NIC_FILES := $(wildcard $(BUILD_DIR)/tools/*.py)
>  BINARY_FILES += $(NIC_FILES)
>  LIBS := $(wildcard $(RTE_OUTPUT)/lib/*)
>  MODULES := $(wildcard $(RTE_OUTPUT)/kmod/*)
> +DOCS := $(wildcard $(BUILD_DIR)/doc/*)
>  include $(BUILD_DIR)/build/.config
>  RTE_ARCH := $(CONFIG_RTE_ARCH:"%"=%)
>  RTE_EXEC_ENV := $(CONFIG_RTE_EXEC_ENV:"%"=%)
> @@ -148,6 +150,18 @@ install-mod:
>   echo installing: $$MOD; \
>   done
>  #
> +# install documentation in /usr/share/doc/dpdk
> +# bu default, DOC_DIR can be overriden.
> +#

small typo here (bu -> by)

Regards,
Olivier

85 matches

Mail list logo