[dpdk-dev] deadline for 2.0 features proposal

2015-01-28 Thread Thomas Monjalon
Hello all,

As previously announced when releasing DPDK 1.8.0
(http://dpdk.org/ml/archives/dev/2014-December/010470.html),
we are going to apply deadlines to schedule the release cycles.

Version 2.0 will integrate only features submitted before the
end of January (the end of this week) and reviewed before 20th February.
More details on this page:
http://dpdk.org/dev/roadmap#dates

During the 2nd phase ("Review Period"), only pending features, fixes
and highly desirable cleanups will be accepted.

In case there are some volunteers to clean the code, I maintain a list
of cleanups which could be interesting:
- use Rx/Tx defaults in testpmd
- use librte_cfgfile in examples/qos_sched (promised deduplication)
- use rte_eth_dev_atomic_read_link_status in PMDs
- use new assert macros for unit tests
- convert all drivers to new filtering API
- move non-ethernet API from ethdev to EAL
- move RTE_MBUF_DATA_DMA_ADDR from all PMDs to a common place
- move rte_rxmbuf_alloc in API
- move queue_stats_mapping_set to ixgbe
- move rte_cache_aligned at beginning of struct declarations
- detect cache line size
- remove old VMDQ API
- remove old filtering API
- remove Xen ifdefs in memory management
- remove doxygen warnings
- choose between RTE_LIBRTE_*_PMD and RTE_LIBRTE_PMD_*

Thank you
-- 
Thomas


[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-28 Thread EDMISON, Kelvin (Kelvin)

On 2015-01-27, 3:22 AM, "Wang, Zhihong"  wrote:

>
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of EDMISON, Kelvin
>> (Kelvin)
>> Sent: Friday, January 23, 2015 2:22 AM
>> To: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
>> 
>> 
>> 
>> On 2015-01-21, 3:54 PM, "Neil Horman"  wrote:
>> 
>> >On Wed, Jan 21, 2015 at 11:49:47AM -0800, Stephen Hemminger wrote:
>> >> On Wed, 21 Jan 2015 13:26:20 +
>> >> Bruce Richardson  wrote:
>> >>
[..trim...]
>> >> One issue I have is that as a vendor we need to ship one binary, not
>> >> different distributions for each Intel chip variant. There is some
>> >> support for multi-chip version functions, but only in the latest GCC,
>> >> which isn't in Debian stable. And the multi-chip version of functions
>> >> is going to be more expensive than inlining. For some cases, I have
>> >> seen that the overhead of fancy instructions looks good but has nasty
>> >> side effects like CPU stalls and/or increased power consumption, which
>> >> turns off turbo boost.
>> >>
>> >> Distros in general have the same problem with special-case
>> >> optimizations.
>> >>
>> >What we really need is to do something like borrow the alternatives
>> >mechanism from the kernel so that we can dynamically replace
>> >instructions at run time based on CPU flags.  That way we could make the
>> >choice at run time, and wouldn't have to do a lot of special-case
>> >jumping about.
>> >Neil
>> 
>> +1.
>> 
>> I think it should be an anti-requirement that the build machine be the
>> exact same chip as the deployment platform.
>> 
>> I like the cpu flag inspection approach.  It would help in the case
>> where DPDK is in a VM and an odd set of CPU flags have been exposed.
>> 
>> If that approach doesn't work though, then perhaps DPDK memcpy could go
>> through a benchmarking at app startup time and select the most
>> performant option out of a set, like mdraid's raid6 implementation
>> does.  To give an example, this is what my systems print out at boot
>> time re: raid6 algorithm selection.
>> raid6: sse2x1    3171 MB/s
>> raid6: sse2x2    3925 MB/s
>> raid6: sse2x4    4523 MB/s
>> raid6: using algorithm sse2x4 (4523 MB/s)
>> 
>> Regards,
>>Kelvin
>> 
>
>Thanks for the proposal!
>
>For DPDK, performance is always the most important concern. We need to
>utilize new architecture features to achieve that, so solution per arch
>is necessary.
>Even a few extra cycles can lead to bad performance if they're in a hot
>loop.
>For instance, let's assume DPDK takes 60 cycles to process a packet on
>average, then 3 more cycles here means 5% performance drop.
>
>The dynamic solution is doable but comes with performance penalties, even if
>they could be small. It may also bring extra complexity, which can lead to
>unpredictable behaviors and side effects.
>For example, the dynamic solution won't have inline unrolling, which can
>bring significant performance benefit for small copies with constant
>length, like eth_addr.
>
>We can investigate the VM scenario more.
>
>Zhihong (John)

John,

  Thanks for taking the time to answer my newbie question. I deeply
appreciate the attention paid to performance in DPDK. I have a follow-up
though.

I'm trying to figure out what requirements this approach creates for the
software build environment.  If we want to build optimized versions for
Haswell, Ivy Bridge, Sandy Bridge, etc, does this mean that we must have
one of each micro-architecture available for running the builds, or is
there a way of cross-compiling for all micro-architectures from just one
build environment?

Thanks,
  Kelvin 
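
For readers following this thread, the run-time CPU flag inspection that Neil and Kelvin describe can be sketched with GCC's `__builtin_cpu_supports`. This is only an illustration under stated assumptions, not DPDK code: `copy_fn`, `copy_generic`, `copy_wide` and `select_copy` are hypothetical names, and both variants simply call memcpy where a real implementation would differ. It also shows the cost Zhihong points out: every copy goes through a function pointer, so the compiler cannot inline or unroll small constant-length copies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical copy routine type; not part of DPDK. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline variant: safe on any CPU. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Stand-in for an AVX2-tuned variant; a real one would use wide loads/stores. */
static void *copy_wide(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Pick an implementation once at startup, based on CPU flags seen at run time. */
static copy_fn select_copy(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		return copy_wide;
#endif
	return copy_generic;
}
```

An application would call `select_copy()` once during initialization and keep the returned pointer. On the build question itself, GCC selects the target instruction set with `-march=`, independently of the CPU in the build host, so one build machine can in principle produce binaries for several micro-architectures.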




[dpdk-dev] Regarding UDP checksum offload

2015-01-28 Thread Prashant Upadhyaya
On Wed, Jan 28, 2015 at 6:32 PM, Olivier MATZ wrote:

> Hi Prashant,
>
>
> On 01/28/2015 12:25 PM, Prashant Upadhyaya wrote:
>
>> Hi,
>>
>> I am aware that this topic has been discussed several times before, but I
>> am somehow still stuck with this.
>>
>> I am using dpdk 1.6r1, intel 82599 NIC.
>> I have an mbuf, I have hand-constructed a UDP packet (IPv4) in the data
>> portion, filled the relevant fields of the headers and I do a tx burst. No
>> problems, the destination gets the packet. I filled UDP checksum as zero
>> and there was no checksum offloaded in ol_flags.
>>
>> Now in the same use case, I want to offload UDP checksum.
>> I am aware that the checksum field in UDP header has to be filled with the
>> pseudo header checksum, I did that, duly added the PKT_TX_UDP_CKSUM flag
>> in ol_flags, did a tx_burst and the packet does not reach the destination.
>>
>> I realized that I have to fill the following fields as well (my packet
>> does not have vlan tag)
>> mbuf->pkt.vlan_macip.f.l2_len
>> mbuf->pkt.vlan_macip.f.l3_len
>>
>> so I filled the l2_len as 14 and l3_len as 20 (IP header with no options)
>> Yet the packet did not reach the destination.
>>
>> So my question is -- am I filling the l2_len and l3_len properly ?
>> Is there anything else to be done before I can get this UDP checksum
>> offload to work properly for me.
>>
>
>
> As far as I remember, this should be working on 1.6r1.
> When you say "did not reach the destination", do you mean that the
> packet is not transmitted at all? Or is it transmitted with a wrong
> checksum?
>

The packet is not transmitted to the destination. I cannot see it in tcpdump
or wireshark.
If I don't do the offload and fill UDP checksum as zero, then the destination
shows the packet in tcpdump.
If I don't do the offload and just fill the pseudo header checksum in the UDP
header (clearly the wrong checksum), then the destination shows the packet
in tcpdump and wireshark decodes it to complain of a wrong UDP checksum, as
expected.

Let me add further, I am _just_ doing the UDP checksum offload and not the
IP hdr checksum offload. I calculate and set the IP header checksum in my own
code. I hope that this is acceptable and does not interfere with the UDP
checksum offload.

>
> I think you should try to reproduce the issue with the latest DPDK
> which is known to work with test-pmd (csum forward engine).
>
> Regards,
> Olivier
>
>
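
The pseudo header checksum discussed in this thread can be sketched in plain C, without any DPDK types. This is a minimal illustration, assuming IPv4 without options; `sum16_be`, `fold16` and `udp4_pseudo_sum` are hypothetical helper names. The sketch returns the folded, non-inverted sum, which is my understanding of what the testpmd csum engine seeds the UDP checksum field with before offload; treat that as an assumption and check it against the DPDK version in use.

```c
#include <stdint.h>
#include <stddef.h>

/* Add big-endian 16-bit words of buf into a running ones' complement sum. */
static uint32_t sum16_be(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8; /* pad odd byte with zero */
	return sum;
}

/* Fold carries back into the low 16 bits. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * IPv4 pseudo-header sum for UDP: source address, destination address,
 * zero byte, protocol (17) and UDP length. The result is NOT inverted.
 * src/dst hold 4 address bytes in network order; udp_len is in host order.
 * The returned value is in host order and must be stored big-endian.
 */
static uint16_t udp4_pseudo_sum(const uint8_t src[4], const uint8_t dst[4],
				uint16_t udp_len)
{
	uint32_t sum = 0;

	sum = sum16_be(src, 4, sum);
	sum = sum16_be(dst, 4, sum);
	sum += 17;      /* zero byte followed by IPPROTO_UDP */
	sum += udp_len; /* UDP header plus payload length */
	return fold16(sum);
}
```

For 192.168.0.1 to 192.168.0.2 with a 12-byte UDP datagram, the words add up to 0x18170, which folds to 0x8171. Note that the length here is the UDP length (header plus payload), not the IP total length.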


[dpdk-dev] [PATCH v1 5/5] L3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch

2015-01-28 Thread Liang, Cunming


> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Danny Zhou
> Sent: Wednesday, January 28, 2015 2:51 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v1 5/5] L3fwd-power: enable one-shot rx interrupt
> and polling/interrupt mode switch
> 
> Signed-off-by: Danny Zhou 
> ---
>  examples/l3fwd-power/main.c | 170
> +---
>  1 file changed, 129 insertions(+), 41 deletions(-)
> 
> diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
> index f6b55b9..e6e4f55 100644
> --- a/examples/l3fwd-power/main.c
> +++ b/examples/l3fwd-power/main.c
> @@ -75,12 +75,13 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #define RTE_LOGTYPE_L3FWD_POWER RTE_LOGTYPE_USER1
> 
>  #define MAX_PKT_BURST 32
> 
> -#define MIN_ZERO_POLL_COUNT 5
> +#define MIN_ZERO_POLL_COUNT 10
> 
>  /* around 100ms at 2 Ghz */
>  #define TIMER_RESOLUTION_CYCLES   200000000ULL
> @@ -188,6 +189,9 @@ struct lcore_rx_queue {
>  #define MAX_TX_QUEUE_PER_PORT RTE_MAX_ETHPORTS
>  #define MAX_RX_QUEUE_PER_PORT 128
> 
> +#define MAX_RX_QUEUE_INTERRUPT_PER_PORT 16
> +
> +
>  #define MAX_LCORE_PARAMS 1024
>  struct lcore_params {
>   uint8_t port_id;
> @@ -214,7 +218,7 @@ static uint16_t nb_lcore_params =
> sizeof(lcore_params_array_default) /
> 
>  static struct rte_eth_conf port_conf = {
>   .rxmode = {
> - .mq_mode= ETH_MQ_RX_RSS,
> + .mq_mode = ETH_MQ_RX_RSS,
>   .max_rx_pkt_len = ETHER_MAX_LEN,
>   .split_hdr_size = 0,
>   .header_split   = 0, /**< Header Split disabled */
> @@ -226,11 +230,14 @@ static struct rte_eth_conf port_conf = {
>   .rx_adv_conf = {
>   .rss_conf = {
>   .rss_key = NULL,
> - .rss_hf = ETH_RSS_IP,
> + .rss_hf = ETH_RSS_UDP,
>   },
>   },
>   .txmode = {
> - .mq_mode = ETH_DCB_NONE,
> + .mq_mode = ETH_MQ_TX_NONE,
> + },
> + .intr_conf = {
> + .rxq = 1, /**< rxq interrupt feature enabled */
>   },
>  };
> 
> @@ -402,19 +409,22 @@ power_timer_cb(__attribute__((unused)) struct
> rte_timer *tim,
>   /* accumulate total execution time in us when callback is invoked */
>   sleep_time_ratio = (float)(stats[lcore_id].sleep_time) /
>   (float)SCALING_PERIOD;
> -
>   /**
>* check whether need to scale down frequency a step if it sleep a lot.
>*/
> - if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD)
> - rte_power_freq_down(lcore_id);
> + if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) {
> + if (rte_power_freq_down)
> + rte_power_freq_down(lcore_id);
> + }
>   else if ( (unsigned)(stats[lcore_id].nb_rx_processed /
> - stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST)
> + stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) {
>   /**
>* scale down a step if average packet per iteration less
>* than expectation.
>*/
> - rte_power_freq_down(lcore_id);
> + if (rte_power_freq_down)
> + rte_power_freq_down(lcore_id);
> + }
> 
>   /**
>* initialize another timer according to current frequency to ensure
> @@ -707,22 +717,20 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t
> portid,
> 
>  }
> 
> -#define SLEEP_GEAR1_THRESHOLD            100
> -#define SLEEP_GEAR2_THRESHOLD            1000
> +#define MINIMUM_SLEEP_TIME 1
> +#define SUSPEND_THRESHOLD  300
> 
>  static inline uint32_t
>  power_idle_heuristic(uint32_t zero_rx_packet_count)
>  {
> - /* If zero count is less than 100, use it as the sleep time in us */
> - if (zero_rx_packet_count < SLEEP_GEAR1_THRESHOLD)
> - return zero_rx_packet_count;
> - /* If zero count is less than 1000, sleep time should be 100 us */
> - else if ((zero_rx_packet_count >= SLEEP_GEAR1_THRESHOLD) &&
> - (zero_rx_packet_count < SLEEP_GEAR2_THRESHOLD))
> - return SLEEP_GEAR1_THRESHOLD;
> - /* If zero count is greater than 1000, sleep time should be 1000 us */
> - else if (zero_rx_packet_count >= SLEEP_GEAR2_THRESHOLD)
> - return SLEEP_GEAR2_THRESHOLD;
> + /* If zero count is less than 100,  sleep 1us */
> + if (zero_rx_packet_count < SUSPEND_THRESHOLD)
> + return MINIMUM_SLEEP_TIME;
> + /* If zero count is less than 1000, sleep 100 us which is the minimum
> latency
> + switching from C3/C6 to C0
> + */
> + else
> + return SUSPEND_THRESHOLD;
> 
>   return 0;
>  }
> @@ -762,6 +770,35 @@ power_freq_scaleup_heuristic(unsigned lcore_id,
>   return FREQ_CURRENT;
>  }
> 
> +/**
> + * force polling thread sleep until one-shot rx interrupt triggers
> + * @param 
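
Read on its own, the reworked power_idle_heuristic() in the hunk above reduces to two outcomes. The following is a standalone restatement of the added lines, using the constants from the patch; it is a sketch for discussion, not a replacement for the diff. The in-patch comments still mention 100 and 1000, while the only threshold left is SUSPEND_THRESHOLD (300). How the caller uses the returned value is cut off in this archive, so it is not shown.

```c
#include <stdint.h>

#define MINIMUM_SLEEP_TIME 1   /* us, value taken from the patch */
#define SUSPEND_THRESHOLD  300 /* value taken from the patch */

/* Map the count of consecutive empty polls to a sleep hint in us. */
static inline uint32_t
power_idle_heuristic(uint32_t zero_rx_packet_count)
{
	/* Few empty polls so far: take the shortest nap. */
	if (zero_rx_packet_count < SUSPEND_THRESHOLD)
		return MINIMUM_SLEEP_TIME;
	/* Many empty polls: return the threshold itself as the hint. */
	return SUSPEND_THRESHOLD;
}
```

The trailing `return 0;` kept by the patch is unreachable once the else branch returns unconditionally, which is why it is omitted here.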

[dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling based on VFIO

2015-01-28 Thread Liang, Cunming


> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Danny Zhou
> Sent: Wednesday, January 28, 2015 2:51 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling
> based on VFIO
> 
> Signed-off-by: Danny Zhou 
> Signed-off-by: Yong Liu 
> ---
>  lib/librte_eal/common/include/rte_eal.h|   9 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 186
> -
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |  11 +-
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
>  4 files changed, 168 insertions(+), 42 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_eal.h
> b/lib/librte_eal/common/include/rte_eal.h
> index f4ecd2e..5f31aa5 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -150,6 +150,15 @@ int rte_eal_iopl_init(void);
>   *   - On failure, a negative error value.
>   */
>  int rte_eal_init(int argc, char **argv);
> +
> +/**
> + * @param port_id
> + *   the port id
> + * @return
> + *   - On success, return 0
[LCM] It can also return -1.
> + */
> +int rte_eal_wait_rx_intr(uint8_t port_id, uint8_t queue_id);
> +
>  /**
>   * Usage function typedef used by the application usage function.
>   *
> diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> index dc2668a..b120303 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> @@ -64,6 +64,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "eal_private.h"
>  #include "eal_vfio.h"
> @@ -127,6 +128,7 @@ static pthread_t intr_thread;
>  #ifdef VFIO_PRESENT
> 
>  #define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
> +#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int) *
> (VFIO_MAX_QUEUE_ID + 1))
> 
>  /* enable legacy (INTx) interrupts */
>  static int
> @@ -221,7 +223,7 @@ vfio_disable_intx(struct rte_intr_handle *intr_handle) {
>  /* enable MSI-X interrupts */
>  static int
>  vfio_enable_msi(struct rte_intr_handle *intr_handle) {
> - int len, ret;
> + int len, ret, max_intr;
>   char irq_set_buf[IRQ_SET_BUF_LEN];
>   struct vfio_irq_set *irq_set;
>   int *fd_ptr;
> @@ -230,12 +232,19 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle)
> {
> 
>   irq_set = (struct vfio_irq_set *) irq_set_buf;
>   irq_set->argsz = len;
> - irq_set->count = 1;
> + if ((!intr_handle->max_intr) ||
> + (intr_handle->max_intr > VFIO_MAX_QUEUE_ID))
> + max_intr = VFIO_MAX_QUEUE_ID + 1;
> + else
> + max_intr = intr_handle->max_intr;
> +
> + irq_set->count = max_intr;
>   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> VFIO_IRQ_SET_ACTION_TRIGGER;
>   irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
>   irq_set->start = 0;
>   fd_ptr = (int *) &irq_set->data;
> - *fd_ptr = intr_handle->fd;
> + memcpy(fd_ptr, intr_handle->queue_fd, sizeof(intr_handle->queue_fd));
> + fd_ptr[max_intr - 1] = intr_handle->fd;
> 
>   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> 
> @@ -244,23 +253,6 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) {
>   intr_handle->fd);
>   return -1;
>   }
> -
> - /* manually trigger interrupt to enable it */
> - memset(irq_set, 0, len);
> - len = sizeof(struct vfio_irq_set);
> - irq_set->argsz = len;
> - irq_set->count = 1;
> - irq_set->flags = VFIO_IRQ_SET_DATA_NONE |
> VFIO_IRQ_SET_ACTION_TRIGGER;
> - irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
> - irq_set->start = 0;
> -
> - ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> -
> - if (ret) {
> - RTE_LOG(ERR, EAL, "Error triggering MSI interrupts for fd %d\n",
> - intr_handle->fd);
> - return -1;
> - }
>   return 0;
>  }
> 
> @@ -292,8 +284,8 @@ vfio_disable_msi(struct rte_intr_handle *intr_handle) {
>  /* enable MSI-X interrupts */
>  static int
>  vfio_enable_msix(struct rte_intr_handle *intr_handle) {
> - int len, ret;
> - char irq_set_buf[IRQ_SET_BUF_LEN];
> + int len, ret, max_intr;
> + char irq_set_buf[MSIX_IRQ_SET_BUF_LEN];
>   struct vfio_irq_set *irq_set;
>   int *fd_ptr;
> 
> @@ -301,12 +293,19 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle)
> {
> 
>   irq_set = (struct vfio_irq_set *) irq_set_buf;
>   irq_set->argsz = len;
> - irq_set->count = 1;
> + if ((!intr_handle->max_intr) ||
> + (intr_handle->max_intr > VFIO_MAX_QUEUE_ID))
> + max_intr = VFIO_MAX_QUEUE_ID + 1;
> + else
> + max_intr = intr_handle->max_intr;
> +
> + irq_set->count = max_intr;
>   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> 
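
The vector-count handling that both the MSI and MSI-X hunks add can be sketched outside of VFIO. This is an illustration only: the value of VFIO_MAX_QUEUE_ID is not visible in this excerpt, so 32 below is an assumption, and `clamp_max_intr` and `fill_irq_fds` are hypothetical helper names. Unlike the patch, which copies the whole queue_fd array, the sketch copies only the slots it will use.

```c
#include <stdint.h>
#include <string.h>

#define VFIO_MAX_QUEUE_ID 32 /* assumed value, for illustration only */

/* Clamp the requested vector count the same way the patch does. */
static uint32_t clamp_max_intr(uint32_t requested)
{
	if (requested == 0 || requested > VFIO_MAX_QUEUE_ID)
		return VFIO_MAX_QUEUE_ID + 1;
	return requested;
}

/*
 * Fill the eventfd array handed to the kernel: per-queue fds first, then
 * the device-level fd in the last used slot.
 * fds must hold VFIO_MAX_QUEUE_ID + 1 entries and queue_fd must hold
 * VFIO_MAX_QUEUE_ID entries. Returns the number of vectors to request.
 */
static uint32_t fill_irq_fds(int *fds, const int *queue_fd, int dev_fd,
			     uint32_t requested)
{
	uint32_t max_intr = clamp_max_intr(requested);

	memcpy(fds, queue_fd, sizeof(int) * (max_intr - 1));
	fds[max_intr - 1] = dev_fd;
	return max_intr;
}
```

With this layout, a device that asks for five vectors gets four rx queue eventfds followed by the original interrupt fd in slot 4.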

[dpdk-dev] [PATCH v1 3/5] igb: enable rx queue interrupts for PF

2015-01-28 Thread Danny Zhou
Signed-off-by: Danny Zhou 
---
 lib/librte_pmd_e1000/e1000/e1000_hw.h |   3 +
 lib/librte_pmd_e1000/e1000_ethdev.h   |   6 +
 lib/librte_pmd_e1000/igb_ethdev.c | 265 ++
 3 files changed, 249 insertions(+), 25 deletions(-)

diff --git a/lib/librte_pmd_e1000/e1000/e1000_hw.h 
b/lib/librte_pmd_e1000/e1000/e1000_hw.h
index 4dd92a3..9b999ec 100644
--- a/lib/librte_pmd_e1000/e1000/e1000_hw.h
+++ b/lib/librte_pmd_e1000/e1000/e1000_hw.h
@@ -780,6 +780,9 @@ struct e1000_mac_info {
u16 mta_reg_count;
u16 uta_reg_count;

+   u32 max_rx_queues;
+   u32 max_tx_queues;
+
/* Maximum size of the MTA register table in all supported adapters */
#define MAX_MTA_REG 128
u32 mta_shadow[MAX_MTA_REG];
diff --git a/lib/librte_pmd_e1000/e1000_ethdev.h 
b/lib/librte_pmd_e1000/e1000_ethdev.h
index d155e77..713ca11 100644
--- a/lib/librte_pmd_e1000/e1000_ethdev.h
+++ b/lib/librte_pmd_e1000/e1000_ethdev.h
@@ -34,6 +34,8 @@
 #ifndef _E1000_ETHDEV_H_
 #define _E1000_ETHDEV_H_

+#include 
+
 /* need update link, bit flag */
 #define E1000_FLAG_NEED_LINK_UPDATE (uint32_t)(1 << 0)
 #define E1000_FLAG_MAILBOX  (uint32_t)(1 << 1)
@@ -105,10 +107,14 @@
 #define E1000_FTQF_QUEUE_SHIFT   16
 #define E1000_FTQF_QUEUE_ENABLE  0x0100

+/* maximum number of other interrupts besides Rx & Tx interrupts */
+#define E1000_MAX_OTHER_INTR   1
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
uint32_t flags;
uint32_t mask;
+   rte_spinlock_t lock;
 };

 /* local vfta copy */
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c 
b/lib/librte_pmd_e1000/igb_ethdev.c
index 2a268b8..2a9bf00 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -97,6 +97,7 @@ static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
 static int  eth_igb_flow_ctrl_set(struct rte_eth_dev *dev,
struct rte_eth_fc_conf *fc_conf);
 static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev);
+static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_action(struct rte_eth_dev *dev);
 static void eth_igb_interrupt_handler(struct rte_intr_handle *handle,
@@ -191,6 +192,12 @@ static int eth_igb_filter_ctrl(struct rte_eth_dev *dev,
 enum rte_filter_op filter_op,
 void *arg);

+static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t 
queue_id);
+static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t 
queue_id);
+static void eth_igb_assign_vector(struct e1000_hw *hw, s8 direction, u8 queue, 
u8 msix_vector);
+static void eth_igb_configure_msix(struct  e1000_hw *hw);
+static void eth_igb_write_ivar(struct e1000_hw *hw, u8 msix_vector, u8 index, 
u8 offset);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -250,6 +257,8 @@ static struct eth_dev_ops eth_igb_ops = {
.vlan_tpid_set= eth_igb_vlan_tpid_set,
.vlan_offload_set = eth_igb_vlan_offload_set,
.rx_queue_setup   = eth_igb_rx_queue_setup,
+   .rx_queue_intr_enable = eth_igb_rx_queue_intr_enable,
+   .rx_queue_intr_disable = eth_igb_rx_queue_intr_disable,
.rx_queue_release = eth_igb_rx_queue_release,
.rx_queue_count   = eth_igb_rx_queue_count,
.rx_descriptor_done   = eth_igb_rx_descriptor_done,
@@ -592,6 +601,16 @@ eth_igb_dev_init(__attribute__((unused)) struct eth_driver 
*eth_drv,
 eth_dev->data->port_id, pci_dev->id.vendor_id,
 pci_dev->id.device_id);

+   /* set max interrupt vfio request */
+   struct rte_eth_dev_info dev_info;
+
+   memset(&dev_info, 0, sizeof(dev_info));
+   eth_igb_infos_get(eth_dev, &dev_info);
+
+   hw->mac.max_rx_queues = dev_info.max_rx_queues;
+
+   pci_dev->intr_handle.max_intr = hw->mac.max_rx_queues + 
E1000_MAX_OTHER_INTR;
+
rte_intr_callback_register(&(pci_dev->intr_handle),
eth_igb_interrupt_handler, (void *)eth_dev);

@@ -754,7 +773,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 {
struct e1000_hw *hw =
E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   int ret, i, mask;
+   int ret, mask;
uint32_t ctrl_ext;

PMD_INIT_FUNC_TRACE();
@@ -794,6 +813,9 @@ eth_igb_start(struct rte_eth_dev *dev)
/* configure PF module if SRIOV enabled */
igb_pf_host_configure(dev);

+   /* configure msix for sleep until rx interrupt */
+   eth_igb_configure_msix(hw);
+
/* Configure for OS presence */
igb_init_manageability(hw);

@@ -821,33 +843,9 @@ eth_igb_start(struct rte_eth_dev *dev)
igb_vmdq_vlan_hw_filter_enable(dev);
}

-   /*
-* Configure the Interrupt Moderation register (EITR) with the maximum
-* 

[dpdk-dev] [PATCH v1 1/5] ethdev: add rx interrupt enable/disable functions

2015-01-28 Thread Danny Zhou
Signed-off-by: Danny Zhou 
---
 lib/librte_ether/rte_ethdev.c | 45 ++
 lib/librte_ether/rte_ethdev.h | 57 +++
 2 files changed, 102 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ea3a1fb..dd66cd9 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2825,6 +2825,51 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
}
rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_queue_intr_enable(uint8_t port_id,
+   uint16_t queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (dev == NULL) {
+   PMD_DEBUG_TRACE("Invalid port device\n");
+   return (-ENODEV);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+   (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+   return 0;
+}
+
+int
+rte_eth_dev_rx_queue_intr_disable(uint8_t port_id,
+   uint16_t queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (dev == NULL) {
+   PMD_DEBUG_TRACE("Invalid port device\n");
+   return (-ENODEV);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+   (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+   return 0;
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1200c1c..c080039 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -848,6 +848,8 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
uint16_t lsc;
+   /** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
+   uint16_t rxq;
 };

 /**
@@ -1108,6 +1110,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev 
*dev,
const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */

+typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
+   uint16_t rx_queue_id);
+/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
+
+typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
+   uint16_t rx_queue_id);
+/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
+
 typedef void (*eth_queue_release_t)(void *queue);
 /**< @internal Release memory resources allocated by given RX/TX queue. */

@@ -1444,6 +1454,8 @@ struct eth_dev_ops {
eth_queue_start_t  tx_queue_start;/**< Start TX for a queue.*/
eth_queue_stop_t   tx_queue_stop;/**< Stop TX for a queue.*/
eth_rx_queue_setup_t   rx_queue_setup;/**< Set up device RX queue.*/
+   eth_rx_enable_intr_t   rx_queue_intr_enable; /**< Enable Rx queue 
interrupt. */
+   eth_rx_disable_intr_t  rx_queue_intr_disable; /**< Disable Rx queue 
interrupt.*/
eth_queue_release_t        rx_queue_release;/**< Release RX queue.*/
eth_rx_queue_count_t   rx_queue_count; /**< Get Rx queue count. */
eth_rx_descriptor_done_t   rx_descriptor_done;  /**< Check rxd DD bit */
@@ -2810,6 +2822,51 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev 
*dev,
enum rte_eth_event_type event);

 /**
+ * When there is no rx packet coming in Rx Queue for a long time, we can
+ * sleep lcore related to RX Queue for power saving, and enable rx interrupt
+ * to be triggered when rx packet arrives.
+ *
+ * The rte_eth_dev_rx_queue_intr_enable() function enables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ * that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_queue_intr_enable(uint8_t port_id,
+   uint16_t queue_id);
+
+/**
+ * When lcore wakes up from rx interrupt indicating packet coming, disable rx
+ * interrupt and returns to polling mode.
+ *
+ * The rte_eth_dev_rx_queue_intr_disable() function disables rx 

[dpdk-dev] [PATCH v1 0/5] Interrupt mode for PMD

2015-01-28 Thread Danny Zhou
This patch series introduces low-latency one-shot rx interrupts into DPDK,
with an example of polling and interrupt mode switch control.

The DPDK userspace interrupt notification and handling mechanism is based on
UIO, with the following limitations:
1) It is designed to handle the LSC interrupt only, with an inefficient
suspended pthread wakeup procedure (e.g. UIO wakes up the LSC interrupt
handling thread, which then wakes up the DPDK polling thread). This introduces
non-deterministic wakeup latency for the DPDK polling thread, as well as packet
latency if it is used to handle rx interrupts.
2) UIO only supports a single interrupt vector, which has to be shared by the
LSC interrupt and the interrupts assigned to dedicated rx queues.

This patchset includes the following features:
1) Enable one-shot rx queue interrupts in the ixgbe PMD (PF & VF) and the igb
PMD (PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it can support
up to 64 interrupt vectors for rx queue interrupts.
3) Have one DPDK polling thread handle each rx queue interrupt with a dedicated
VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
user space.
4) Demonstrate the interrupt control APIs and a userspace NAPI-like
polling/interrupt switch algorithm in the L3fwd-power example.

Known limitations:
1) It does not work for UIO, because a single interrupt eventfd shared by the
LSC and rx queue interrupt handlers causes a mess.
2) The LSC interrupt is not supported by the VF driver, so it is disabled by
default in L3fwd-power for now. Feel free to turn it on if you want to support
both LSC and rx queue interrupts on a PF.


Danny Zhou (5):
  ethdev: add rx interrupt enable/disable functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  eal: add per rx queue interrupt handling based on VFIO
  L3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch

 examples/l3fwd-power/main.c| 170 +++---
 lib/librte_eal/common/include/rte_eal.h|   9 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 186 ---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |  11 +-
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 lib/librte_ether/rte_ethdev.c  |  45 +++
 lib/librte_ether/rte_ethdev.h  |  57 
 lib/librte_pmd_e1000/e1000/e1000_hw.h  |   3 +
 lib/librte_pmd_e1000/e1000_ethdev.h|   6 +
 lib/librte_pmd_e1000/igb_ethdev.c  | 265 +--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c| 371 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h|   9 +
 12 files changed, 1028 insertions(+), 108 deletions(-)

-- 
1.8.1.4
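
The dedicated-eventfd idea in feature 3) can be illustrated with plain Linux primitives. This is a minimal sketch, assuming the kernel signals one eventfd per rx queue vector, as the VFIO_IRQ_SET_DATA_EVENTFD usage in patch 4/5 suggests. `rxq_wait_setup` and `rxq_wait` are hypothetical helper names and are not the series' rte_eal_wait_rx_intr(); in the sketch the eventfd is expected to be created and signalled by the caller rather than by a NIC.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

/* Create an epoll instance watching one eventfd; returns epoll fd or -1. */
static int rxq_wait_setup(int event_fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = event_fd };
	int epfd = epoll_create1(0);

	if (epfd < 0)
		return -1;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, event_fd, &ev) < 0) {
		close(epfd);
		return -1;
	}
	return epfd;
}

/*
 * Block until the eventfd fires or timeout_ms elapses.
 * Returns 1 if an event was consumed, 0 on timeout, -1 on error.
 */
static int rxq_wait(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	uint64_t count;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	if (n <= 0)
		return n;
	/* Reading resets the eventfd counter so the next wait blocks again. */
	if (read(ev.data.fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return 1;
}
```

A polling thread would stay in its rx burst loop while traffic flows and call something like `rxq_wait()` only after a run of empty polls, so the thread that sleeps on the eventfd is the same thread that resumes polling, with no intermediate handler thread to wake it.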



[dpdk-dev] Process question: reviewing older patches

2015-01-28 Thread Thomas Monjalon
2015-01-28 09:52, Jay Rolette:
> There's a fairly old KNI patch (http://dpdk.org/dev/patchwork/patch/84/)
> that I reviewed, but I'm not seeing how to submit my "Reviewed-by" when I
> don't have any of the emails from the patch in my mail client.
> 
> I can copy the text from the 'mbox' link in Patchwork into an email, but
> I'm guessing that may not make the patch toolchain happy.
> 
> What's the right way to do this?

I think you should try to open the mbox file with your mail client and reply.
In my case, I had to rename it into .mbox (was .patch).

Thanks for reviewing
-- 
Thomas


[dpdk-dev] [PATCH v3 00/18] ACL: New AVX2 classify method and several other enhancements.

2015-01-28 Thread Thomas Monjalon
> > v3 changes:
> > Applied review comments from Thomas:
> > - fix spelling errors reported by codespell.
> > - split last patch into two:
> > first to remove unused macros,
> > second to add some comments about ACL internal layout.
> > 
> > v2 changes:
> > - When building with compilers that don't support AVX2 instructions,
> > make rte_acl_classify_avx2() do nothing and return an error.
> > - Remove unneeded 'ifdef __AVX2__' in acl_run_avx2.*.
> > - Reorder the patches in the set, to keep RTE_LIBRTE_ACL_STANDALONE=y
> > always buildable.
> > 
> > This patch series contains several fixes and enhancements for the ACL library.
> > See the complete list below.
> > Two main changes are externally visible:
> > - Introduce a new classify method: RTE_ACL_CLASSIFY_AVX2.
> > It uses AVX2 instructions and 256-bit wide data types
> > to perform internal trie traversal.
> > That helps to increase classify() throughput.
> > This method is selected as the default one on CPUs that support AVX2.
> > - Introduce a new field in the build config structure: max_size.
> > It specifies the maximum size that the internal RT structure for a given
> > context can reach.
> > The purpose of that is to allow the user to decide on the space/performance
> > trade-off (faster classify() vs less space for RT internal structures)
> > for each given set of rules.
> > 
> > Konstantin Ananyev (18):
> >   fix compilation issues with RTE_LIBRTE_ACL_STANDALONE=y
> >   app/test: few small fixes for test_acl.c
> >   librte_acl: make data_indexes long enough to survive idle transitions.
> >   librte_acl: remove build phase heuristic with negative performance
> > effect.
> >   librte_acl: fix a bug at build phase that can cause matches being
> > overwritten.
> >   librte_acl: introduce DFA nodes compression (group64) for identical
> > entries.
> >   librte_acl: build/gen phase - simplify the way match nodes are
> > allocated.
> >   librte_acl: make scalar RT code to be more similar to vector one.
> >   librte_acl: a bit of RT code deduplication.
> >   EAL: introduce rte_ymm and relatives in rte_common_vect.h.
> >   librte_acl: add AVX2 as new rte_acl_classify() method
> >   test-acl: add ability to manually select RT method.
> >   librte_acl: Remove search_sse_2 and relatives.
> >   librte_acl: move lo/hi dwords shuffle out from calc_addr
> >   librte_acl: make calc_addr a define to deduplicate the code.
> >   librte_acl: introduce max_size into rte_acl_config.
> >   librte_acl: remove unused macros.
> >   librte_acl: add some comments about ACL internal layout.
> > 
> For the series
> Acked-by: Neil Horman 

Applied

Thanks for the big work
-- 
Thomas


[dpdk-dev] DPDK 1.7.1 error (PANIC in ovdk_vport_phy_port_init(): Cannot init NIC port '0' (Success))

2015-01-28 Thread sothy shan
Hi!

I have a question about initializing a NIC port in DPDK 1.7.1.

I got the following error:

EAL: PCI device :03:00.0 on NUMA socket -1
EAL:   probe driver: 8086:150e rte_igb_pmd
EAL:   PCI memory mapped at 0x7fa5a5261000
EAL:   PCI memory mapped at 0x7fa5a7538000
EAL: PCI device :03:00.1 on NUMA socket -1
EAL:   probe driver: 8086:150e rte_igb_pmd
EAL:   PCI memory mapped at 0x7fa5a51e1000
EAL:   PCI memory mapped at 0x7fa5a7534000
EAL: PCI device :03:00.2 on NUMA socket -1
EAL:   probe driver: 8086:150e rte_igb_pmd
EAL:   PCI memory mapped at 0x7fa5a5161000
EAL:   PCI memory mapped at 0x7fa5a753
EAL: PCI device :03:00.3 on NUMA socket -1
EAL:   probe driver: 8086:150e rte_igb_pmd
EAL:   PCI memory mapped at 0x7fa5a50e1000
EAL:   PCI memory mapped at 0x7fa5a50dd000
EAL: PCI device :06:00.0 on NUMA socket -1
EAL:   probe driver: 8086:154d rte_ixgbe_pmd
EAL:   :06:00.0 not managed by UIO driver, skipping
EAL: PCI device :06:00.1 on NUMA socket -1
EAL:   probe driver: 8086:154d rte_ixgbe_pmd
EAL:   :06:00.1 not managed by UIO driver, skipping
EAL:   :06:00.0 not managed by UIO driver, skipping
EAL:   :06:00.1 not managed by UIO driver, skipping
PANIC in ovdk_vport_phy_port_init():
Cannot init NIC port '0' (Success)
1: 
[/home/cubiq/sothy/dpdkovs/dpdk-1.7.1/x86_64-ivshmem-linuxapp-gcc/lib/libintel_dpdk.so(rte_dump_stack+0x18)
[0x7fa5a75bb768]]
Abandon (core dumped)

+++

The above error appears when I run
$./datapath/dpdk/ovs-dpdk -c 0x0F -n 4 --proc-type primary --huge-dir
/dev/hugepages -- --stats_core=0 --stats_int=5 -p 0x03

I guess it is a problem with DPDK initializing the PCI probe. Any guesses or
suggestions based on the error log?

I am running Fedora 20, DPDK 1.7.1 and OVS DPDK 1.2.

Best regards
Sothy


[dpdk-dev] Regarding UDP checksum offload

2015-01-28 Thread Prashant Upadhyaya
Hi,

I am aware that this topic has been discussed several times before, but I
am somehow still stuck with this.

I am using dpdk 1.6r1, intel 82599 NIC.
I have an mbuf, I have hand-constructed a UDP packet (IPv4) in the data
portion, filled the relevant fields of the headers and I do a tx burst. No
problems, the destination gets the packet. I filled UDP checksum as zero
and there was no checksum offloaded in ol_flags.

Now in the same usecase, I want to offload UDP checksum.
I am aware that the checksum field in UDP header has to be filled with the
pseudo header checksum, I did that, duly added the PKT_TX_UDP_CKSUM flag in
ol_flags, did a tx_burst and the packet does not reach the destination.

I realized that I have to fill the following fields as well (my packet does
not have vlan tag)
mbuf->pkt.vlan_macip.f.l2_len
mbuf->pkt.vlan_macip.f.l3_len

so I filled the l2_len as 14 and l3_len as 20 (IP header with no options)
Yet the packet did not reach the destination.

So my question is: am I filling the l2_len and l3_len properly?
Is there anything else to be done before I can get this UDP checksum
offload to work properly?

Regards
-Prashant
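
For reference, the pseudo-header sum described in this message can be
sketched in plain C. This is a minimal sketch, not the DPDK implementation:
the helper name udp_ipv4_phdr_sum is hypothetical, the addresses and length
are taken in host byte order, and it assumes the NIC expects the folded,
non-inverted one's-complement sum in the UDP checksum field (which is what
testpmd's csumonly.c appears to compute). udp_len is the UDP header plus
payload length, and the result would be stored in network byte order.

```c
#include <stdint.h>

#define IPPROTO_UDP_NUM 17 /* IANA protocol number for UDP */

/*
 * Hypothetical helper: one's-complement sum over the IPv4 pseudo header
 * (source address, destination address, zero byte + protocol, UDP length).
 * All arguments and the return value are in host byte order.
 * The sum is folded to 16 bits but NOT inverted.
 */
static uint16_t
udp_ipv4_phdr_sum(uint32_t src_ip, uint32_t dst_ip, uint16_t udp_len)
{
	uint32_t sum = 0;

	sum += (src_ip >> 16) & 0xFFFF;
	sum += src_ip & 0xFFFF;
	sum += (dst_ip >> 16) & 0xFFFF;
	sum += dst_ip & 0xFFFF;
	sum += IPPROTO_UDP_NUM;
	sum += udp_len;

	/* fold carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);

	return (uint16_t)sum;
}
```

For 192.168.0.1 -> 192.168.0.2 and an 8-byte UDP datagram (header only),
udp_ipv4_phdr_sum(0xC0A80001, 0xC0A80002, 8) yields 0x816D.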


[dpdk-dev] Process question: reviewing older patches

2015-01-28 Thread Neil Horman
On Wed, Jan 28, 2015 at 02:57:58PM -0600, Jay Rolette wrote:
> Thanks Thomas and Neil. Sadly, no joy. While I generally like gmail for my
> mail, there's not a reasonable way to import the mbox file or to control
> the message id.
> 
Sure there is, you just need to select an appropriate MUA.  You can't use the
web interface for this.  Enable IMAP access to your gmail account, and set up an
MUA like mutt to point to it.  Then the mutt client can open the mbox file, or
you can fill out the In-Reply-To: header manually.

Neil



[dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem

2015-01-28 Thread Alexandre Frigon
Hi Pawel,
Thanks for your reply.

Sadly, assigning 1 core per port didn't change anything.

As for the NUMA nodes, if I understand correctly there is one node per
socket. I'm only using 1 socket with 4 cores, and lscpu shows only 1 node,
so I don't think I can do anything about that. Correct me if I'm wrong on
this one.

I'm definitely going to try an older version and see if it works properly.

Thanks for your help
Alexandre F.

> -Original Message-
> From: Wodkowski, PawelX [mailto:pawelx.wodkowski at intel.com]
> Sent: Wednesday, January 28, 2015 9:15 AM
> To: Alexandre Frigon; dev at dpdk.org; keith.wiles at windriver.com
> Subject: RE: Pktgen-DPDK rate and traffic inconsistency problem
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexandre Frigon
> > Sent: Tuesday, January 27, 2015 8:31 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem
> >
> > Hi all,
> >
> >  I'm using dpdk 1.8 and pktgen-dpdk 2.8 to generate traffic on a
> > back-to-back setup both equipped with 82599EB 10-Gigabit NIC.
> > The problem is when I start it, pktgen indicates 1Mbits/s Tx with
> > 64B packet size,  but I'm receiving  about 15% of it on the other end.
> > This percentage seems to be proportional with the packet size.
> >
> > e.g.
> > Using nload to read Rx traffic
> > Pktgen: Tx: 1Mbits/s ==> Other end: Rx 1660 Mbits/s
> > Rate: 100%
> > Pkt size: 64B
> >
> >
> > e.g 2
> > Pktgen: Tx: 1Mbits/s ==> Other end: Rx 9385 Mbits/s
> > Rate: 100%
> > Pkt size: 1518B
> >
> >
> > Pktgen is started with this command on a Xeon(R) CPU E31270 @ 3.40GHz
> > ./app/pktgen -c 1f -n 3 --proc-type auto --socket-mem 1024
> > --file-prefix pg -- -p
> > 0x3 -P  -N -m "[1:3].0, [2:4].1"
> 
> From past experience I don't assign more than 1 core per port. It had some
> race condition issues, and one core is capable of RX or TX at full 10G.
> Also check that you assign the proper cores/memory for your NICs (the same
> NUMA node).
> 
> >
> > Is there something I'm not configuring correctly or something I have miss?
> >
> > Also, the % rate is  acting strangely since anything above 50% doesn't
> > change the Tx rate and anything below is modifying it
> > e.g Tx:  1Mbits/s   5000Mbits/s
> > %Rate:  >=50%   25%
> >
> >
> 
> Actually I am getting exactly opposite results :) If I set rate to 50% I get
> MBits/s Rx/Tx   : 0/9942 9942/0  9942/9942
> 
> For 10%:
> MBits/s Rx/Tx   : 0/1997 1997/0  1997/1997
> 
> Which is about 2x the set rate :D
> 
> Additionally, I am getting this message when "start 0" -> "stop 0" -> "start 0" is
> issued:
> PMD: ixgbe_dev_rx_init(): forcing scatter mode
> 
> So there is definitely something wrong there, but I don't know where.
> Another issue I encountered is the build system failing when building
> out-of-tree.
> 
> Until this is fixed you can try version 2.7.1, which is working for me.
> 
> Pawel


[dpdk-dev] Regarding UDP checksum offload

2015-01-28 Thread Olivier MATZ
Hi Prashant,

On 01/28/2015 03:57 PM, Prashant Upadhyaya wrote:
>>> I am using dpdk 1.6r1, intel 82599 NIC.
>>> I have an mbuf, I have hand-constructed a UDP packet (IPv4) in
>>> the data
>>> portion, filled the relevant fields of the headers and I do a tx
>>> burst. No
>>> problems, the destination gets the packet. I filled UDP checksum
>>> as zero
>>> and there was no checksum offloaded in ol_flags.
>>>
>>> Now in the same usecase, I want to offload UDP checksum.
>>> I am aware that the checksum field in UDP header has to be
>>> filled with the
>>> pseudo header checksum, I did that, duly added the
>>> PKT_TX_UDP_CKSUM flag in
>>> ol_flags, did a tx_burst and the packet does not reach the
>>> destination.
>>>
>>> I realized that I have to fill the following fields as well (my
>>> packet does
>>> not have vlan tag)
>>> mbuf->pkt.vlan_macip.f.l2_len
>>> mbuf->pkt.vlan_macip.f.l3_len
>>>
>>> so I filled the l2_len as 14 and l3_len as 20 (IP header with no
>>> options)
>>> Yet the packet did not reach the destination.
>>>
>>> So my question is -- am I filling the l2_len and l3_len properly ?
>>> Is there anything else to be done before I can get this UDP checksum
>>> offload to work properly for me.
>>
>>
>>
>> As far as I remember, this should be working on 1.6r1.
>> When you say "did not reach the destination", do you mean that the
>> packet is not transmitted at all? Or is it transmitted with a wrong
>> checksum?
>
>
> The packet is not transmitted to destination. I cannot see it in tcpdump
> at wireshark.
> If I don't do the offload and fill UDP checksum as zero, then
> destination shows the packet in tcpdump
> If I don't do the offload and just fill the pseudo header checksum in
> UDP header (clearly the wrong checksum), then the destination shows the
> packet in tcpdump and wireshark decodes it to complain of wrong UDP
> checksum as expected.

This is strange. I don't see anything obvious in what you are
describing. It looks like the packet is dropped in the driver
or in the hardware. You can check the device statistics.

Another thing you can do is to retry on the latest stable dpdk which
is known to work (see csumonly.c in test-pmd).

> Let me add further, I am _just_ doing the UDP checksum offload and not
> the IP hdr checksum offload. I calculate and set IP header checksum by
> my own code. I hope that this is acceptable and does not interfere with
> UDP checksum offload

This should not be a problem.

Regards,
Olivier



[dpdk-dev] [PATCH] eal/linux: allow to map BARs with MSI-X tables, around them

2015-01-28 Thread Dan Aloni
On Thu, Jan 22, 2015 at 10:36:11AM +0200, Dan Aloni wrote:
> While VFIO doesn't allow us to map complete BARs with MSI-X tables,
> it does allow us to map around them in PAGE_SIZE granularity. There
> might be adapters that provide their registers in the same BAR
> but on a different page. For example, Intel's NVME adapter, though
> not a network adapter, provides only one MMIO BAR that contains
> the MSI-X table.
> 
> Signed-off-by: Dan Aloni 
> CC: Anatoly Burakov 

Has anyone reviewed this yet?

I am asking because I am interested to know whether someone is aiming
to integrate storage controllers support into DPDK, and this patch
could be instrumental.

--
Dan Aloni
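
The "map around the MSI-X table at PAGE_SIZE granularity" idea in the patch
description can be sketched as a small address computation. This is only an
illustration under stated assumptions, not the code from the patch: the
struct and function names are hypothetical, the page size is assumed to be a
power of two, and the MSI-X table is assumed to lie entirely inside the BAR.

```c
#include <stdint.h>

/* Hypothetical description of one mappable window inside a BAR. */
struct bar_window {
	uint64_t offset; /* start of the window within the BAR */
	uint64_t size;   /* length of the window in bytes */
};

/*
 * Split a BAR into the page-aligned windows that do not overlap the
 * MSI-X table. Returns the number of windows written to out[] (0..2).
 */
static int
split_bar_around_msix(uint64_t bar_size, uint64_t table_offset,
		uint64_t table_size, uint64_t page_size,
		struct bar_window out[2])
{
	uint64_t page_mask = ~(page_size - 1);
	/* round the table start down and the table end up to page bounds */
	uint64_t hole_start = table_offset & page_mask;
	uint64_t hole_end = (table_offset + table_size + page_size - 1) & page_mask;
	int n = 0;

	if (hole_start > 0) {
		out[n].offset = 0;
		out[n].size = hole_start;
		n++;
	}
	if (hole_end < bar_size) {
		out[n].offset = hole_end;
		out[n].size = bar_size - hole_end;
		n++;
	}
	return n;
}
```

With a 16 KB BAR, 4 KB pages and a 256-byte table at offset 0x2000, this
yields two windows: [0x0, 0x2000) and [0x3000, 0x4000). Registers that share
a page with the table remain unreachable in this scheme.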


[dpdk-dev] DPDK testpmd forwarding performace degradation

2015-01-28 Thread Alexander Belyakov
On Tue, Jan 27, 2015 at 7:21 PM, De Lara Guarch, Pablo <
pablo.de.lara.guarch at intel.com> wrote:

>
> > On Tue, Jan 27, 2015 at 10:51 AM, Alexander Belyakov
> >  wrote:
> >
> > Hi Pablo,
> >
> > On Mon, Jan 26, 2015 at 5:22 PM, De Lara Guarch, Pablo
> >  wrote:
> > Hi Alexander,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexander Belyakov
> > > Sent: Monday, January 26, 2015 10:18 AM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] DPDK testpmd forwarding performace degradation
> > >
> > > Hello,
> > >
> > > recently I have found a case of significant performance degradation for our
> > > application (built on top of DPDK, of course). Surprisingly, similar issue
> > > is easily reproduced with default testpmd.
> > >
> > > To show the case we need simple IPv4 UDP flood with variable UDP payload
> > > size. Saying "packet length" below I mean: Eth header length (14 bytes) +
> > > IPv4 header length (20 bytes) + UDP header length (8 bytes) + UDP payload
> > > length (variable) + CRC (4 bytes). Source IP addresses and ports are selected
> > > randomly for each packet.
> > >
> > > I have used DPDK with revisions 1.6.0r2 and 1.7.1. Both show the same issue.
> > >
> > > Follow "Quick start" guide (http://dpdk.org/doc/quick-start) to build and
> > > run testpmd. Enable testpmd forwarding ("start" command).
> > >
> > > Table below shows measured forwarding performance depending on packet
> > > length:
> > >
> > > No.  UDP payload  Packet length  Forwarding performance  Expected theoretical
> > >      (bytes)      (bytes)        (Mpps)                  performance (Mpps)
> > > 1.   0            64             14.8                    14.88
> > > 2.   34           80             12.4                    12.5
> > > 3.   35           81             6.2                     12.38 (!)
> > > 4.   40           86             6.6                     11.79
> > > 5.   49           95             7.6                     10.87
> > > 6.   50           96             10.7                    10.78 (!)
> > > 7.   60           106            9.4                     9.92
> > >
> > > At line number 3 we have added 1 byte of UDP payload (comparing to
> > > previous line) and got forwarding performance halved! 6.2 Mpps against
> > > 12.38 Mpps of expected theoretical maximum for this packet size.
> > >
> > > That is the issue.
> > >
> > > Significant performance degradation exists up to 50 bytes of UDP payload
> > > (96 bytes packet length), where it jumps back to theoretical maximum.
> > >
> > > What is happening between 80 and 96 bytes packet length?
> > >
> > > This issue is stable and 100% reproducible. At this point I am not sure if
> > > it is DPDK or NIC issue. These tests have been performed on Intel(R) Eth
> > > Svr Bypass Adapter X520-LR2 (X520LR2BP).
> > >
> > > Is anyone aware of such strange behavior?
> >
> > I cannot reproduce the issue using two ports on two different 82599EB NICs,
> > using 1.7.1 and 1.8.0.
> > I always get either same or better linerate as I increase the packet size.
> >
> > Thank you for trying to reproduce the issue.
> >
> > Actually, have you tried using 1.8.0?
> >
> > I feel 1.8.0 is little bit immature and might require some post-release
> > patching. Even testpmd from this release is not forwarding packets properly
> > on my setup. It is up and running without visible errors/warnings, TX/RX
> > counters are ticking but I can not see any packets at the output.
>
> This is strange. Without changing anything, forwarding works perfectly for me
> (so, RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is enabled).
>
> > Please note, both 1.6.0r2 and 1.7.1 releases work (on the same setup)
> > out-of-the-box just fine with only exception of this mysterious
> > performance drop.
> > So it will take some time to figure out what is wrong with dpdk-1.8.0.
> > Meanwhile we could focus on stable dpdk-1.7.1.
> >
> > Managed to get testpmd from dpdk-1.8.0 to work on my setup.
> > Unfortunately I had to disable RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC,
> > it is new comparing to 1.7.1 and somehow breaks testpmd forwarding. By the
> > way, simply disabling RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC in
> > common_linuxapp config file breaks the build - had to make quick'n'dirty fix
> > in struct igb_rx_queue as well.
> >
> > Anyway, issue is still here.
> >
> > Forwarding 80 bytes packets at 12.4 Mpps.
> > Forwarding 81 bytes packets at 7.2 Mpps.
> >
> > Any ideas?
> > As for X520-LR2 NIC - it is dual port bypass adapter with device id 155d. I
> > believe it should be treated as 82599EB except bypass feature. I put bypass
> > mode to "normal" in those tests.
>
> I have used a 82599EB first, and now a X520-SR2. Same results.
> I assume that X520-SR2 and X520-LR2 should give similar results
> (only thing that is changed is the wavelength, but the controller is the
> same).
>
It seems I found what was wrong, at least got a hint.

My build server 
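
The "expected theoretical performance" figures in the quoted table are
consistent with the usual 10GbE line-rate arithmetic: on the wire each frame
occupies its own length (including CRC) plus 20 bytes of preamble,
start-of-frame delimiter and inter-frame gap. A minimal sketch of that
calculation, with a hypothetical helper name:

```c
#define ETH_WIRE_OVERHEAD 20 /* 7 preamble + 1 SFD + 12 inter-frame gap */

/*
 * Theoretical maximum packet rate, in Mpps, for frames of frame_len bytes
 * (Ethernet header through CRC) on a link of link_bps bits per second.
 */
static double
line_rate_mpps(double link_bps, unsigned frame_len)
{
	double bits_per_frame = (double)(frame_len + ETH_WIRE_OVERHEAD) * 8.0;

	return link_bps / bits_per_frame / 1e6;
}
```

For a 10 Gbit/s link, line_rate_mpps(10e9, 64) is about 14.88 and
line_rate_mpps(10e9, 81) is about 12.38, matching rows 1 and 3 of the table.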

[dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State interrupt

2015-01-28 Thread Stephen Hemminger
On Wed, 28 Jan 2015 03:03:32 +
"Ouyang, Changchun"  wrote:

> Hi Stephen,
> 
> > -Original Message-
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Tuesday, January 27, 2015 6:00 PM
> > To: Xie, Huawei
> > Cc: Ouyang, Changchun; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State
> > interrupt
> > 
> > On Tue, 27 Jan 2015 09:04:07 +
> > "Xie, Huawei"  wrote:
> > 
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang
> > > > Changchun
> > > > Sent: Tuesday, January 27, 2015 10:36 AM
> > > > To: dev at dpdk.org
> > > > Subject: [dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link
> > > > State interrupt
> > > >
> > > > Virtio has link state interrupt which can be used.
> > > >
> > > > Signed-off-by: Stephen Hemminger 
> > > > Signed-off-by: Changchun Ouyang 
> > > > ---
> > > >  lib/librte_pmd_virtio/virtio_ethdev.c | 78
> > > > +++--
> > > > --
> > > >  lib/librte_pmd_virtio/virtio_pci.c| 22 ++
> > > >  lib/librte_pmd_virtio/virtio_pci.h|  4 ++
> > > >  3 files changed, 86 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c
> > > > b/lib/librte_pmd_virtio/virtio_ethdev.c
> > > > index 5df3b54..ef87ff8 100644
> > > > --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> > > > +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> > > > @@ -845,6 +845,34 @@ static int virtio_resource_init(struct
> > > > rte_pci_device *pci_dev __rte_unused)  #endif
> > > >
> > > >  /*
> > > > + * Process Virtio Config changed interrupt and call the callback
> > > > + * if link state changed.
> > > > + */
> > > > +static void
> > > > +virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
> > > > +void *param)
> > > > +{
> > > > +   struct rte_eth_dev *dev = param;
> > > > +   struct virtio_hw *hw =
> > > > +   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > > > +   uint8_t isr;
> > > > +
> > > > +   /* Read interrupt status which clears interrupt */
> > > > +   isr = vtpci_isr(hw);
> > > > +   PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
> > > > +
> > > > +   if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
> > > > +   PMD_DRV_LOG(ERR, "interrupt enable failed");
> > > > +
> > >
> > > Is it better to put rte_intr_enable after we have handled the interrupt.
> > > Is there the possibility of interrupt reentrant in uio intr framework?
> > 
> > The UIO framework handles IRQ's via posix thread that is reading fd, then
> > calling this code. Therefore it is always single threaded.
> 
> Even if it is under the UIO framework and always single threaded,
> how about moving rte_intr_enable after virtio_dev_link_update() and
> _rte_eth_dev_callback_process() are called?
> This makes it more like an interrupt handler in the Linux kernel.
> What do you think?

I ordered the interrupt handling to match what happens in the e1000/igb
handler. My concern is that the interrupt was level (not edge) triggered
and another link transition could occur and be missed.




[dpdk-dev] [PATCH] eal/linux: allow to map BARs with MSI-X tables, around them

2015-01-28 Thread Burakov, Anatoly
Hi Dan

Apologies for not looking at it earlier.

> While VFIO doesn't allow us to map complete BARs with MSI-X tables,
> it does allow us to map around them in PAGE_SIZE granularity. There
> might be adapters that provide their registers in the same BAR
> but on a different page. For example, Intel's NVME adapter, though
> not a network adapter, provides only one MMIO BAR that contains
> the MSI-X table.
> 
> Signed-off-by: Dan Aloni 
> CC: Anatoly Burakov 
> ---
>  lib/librte_eal/linuxapp/eal/eal_pci.c  |  5 +-
>  lib/librte_eal/linuxapp/eal/eal_pci_init.h |  2 +-
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c  |  4 +-
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 99
> +++---
>  lib/librte_eal/linuxapp/eal/eal_vfio.h |  8 ++-
>  5 files changed, 101 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c
> b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index b5f54101e8aa..4a74a9372a15 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -118,13 +118,14 @@ pci_find_max_end_va(void)
> 
>  /* map a particular resource from a file */
>  void *
> -pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
> +pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
> +  int additional_flags)
>  {
>   void *mapaddr;
> 
>   /* Map the PCI memory resource of device */
>   mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
> - MAP_SHARED, fd, offset);
> + MAP_SHARED | additional_flags, fd, offset);
>   if (mapaddr == MAP_FAILED) {
>   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx,
> 0x%lx): %s (%p)\n",
>   __func__, fd, requested_addr,
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
> b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
> index 1070eb88fe0a..0a0853d4c4df 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
> @@ -66,7 +66,7 @@ extern void *pci_map_addr;
>  void *pci_find_max_end_va(void);
> 
>  void *pci_map_resource(void *requested_addr, int fd, off_t offset,
> - size_t size);
> +size_t size, int additional_flags);
> 
>  /* map IGB_UIO resource prototype */
>  int pci_uio_map_resource(struct rte_pci_device *dev);
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> index e53f06b82430..eaa2e36f643e 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> @@ -139,7 +139,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
> 
>   if (pci_map_resource(uio_res->maps[i].addr, fd,
>(off_t)uio_res->maps[i].offset,
> -  (size_t)uio_res->maps[i].size)
> +  (size_t)uio_res->maps[i].size, 0)
>   != uio_res->maps[i].addr) {
>   RTE_LOG(ERR, EAL,
>   "Cannot mmap device resource\n");
> @@ -379,7 +379,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>   pci_map_addr =
> pci_find_max_end_va();
> 
>   mapaddr =
> pci_map_resource(pci_map_addr, fd, (off_t)offset,
> - (size_t)maps[j].size);
> + (size_t)maps[j].size, 0);
>   if (mapaddr == MAP_FAILED)
>   fail = 1;
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> index 20e097727f80..f6542a1f1464 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> @@ -62,6 +62,9 @@
> 
>  #ifdef VFIO_PRESENT
> 
> +#define PAGE_SIZE   (sysconf(_SC_PAGESIZE))
> +#define PAGE_MASK   (~(PAGE_SIZE - 1))
> +
>  #define VFIO_DIR "/dev/vfio"
>  #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
>  #define VFIO_GROUP_FMT "/dev/vfio/%u"
> @@ -72,10 +75,12 @@ static struct vfio_config vfio_cfg;
> 
>  /* get PCI BAR number where MSI-X interrupts are */
>  static int
> -pci_vfio_get_msix_bar(int fd, int *msix_bar)
> +pci_vfio_get_msix_bar(int fd, int *msix_bar, uint32_t *msix_table_offset,
> +   uint32_t *msix_table_size)
>  {
>   int ret;
>   uint32_t reg;
> + uint16_t flags;
>   uint8_t cap_id, cap_offset;
> 
>   /* read PCI capability pointer from config space */
> @@ -134,7 +139,18 @@ pci_vfio_get_msix_bar(int fd, int *msix_bar)
>   return -1;
>   }
> 
> + ret = pread64(fd, &flags, sizeof(flags),
> +
>   VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
> + 

[dpdk-dev] [PATCH v2 15/15] timer: add support to non-EAL thread

2015-01-28 Thread Cunming Liang
Allow timers to be set up only for EAL (lcore) threads (lcore_id < RTE_MAX_LCORE).
E.g. a dynamically created thread will be able to reset/stop a timer for an
lcore thread, but it will not be allowed to set up a timer for itself or
another non-lcore thread.
rte_timer_manage() for a non-lcore thread simply does nothing and returns
straight away.

Signed-off-by: Cunming Liang 
---
 lib/librte_timer/rte_timer.c | 40 +++-
 lib/librte_timer/rte_timer.h |  2 +-
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 269a992..601c159 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -79,9 +79,10 @@ static struct priv_timer priv_timer[RTE_MAX_LCORE];

 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do { \
-   unsigned __lcore_id = rte_lcore_id();   \
-   priv_timer[__lcore_id].stats.name += (n);   \
+#define __TIMER_STAT_ADD(name, n) do { \
+   unsigned __lcore_id = rte_lcore_id();   \
+   if (__lcore_id < RTE_MAX_LCORE) \
+   priv_timer[__lcore_id].stats.name += (n);   \
} while(0)
 #else
 #define __TIMER_STAT_ADD(name, n) do {} while(0)
@@ -127,15 +128,26 @@ timer_set_config_state(struct rte_timer *tim,
unsigned lcore_id;

lcore_id = rte_lcore_id();
+   if (lcore_id >= RTE_MAX_LCORE)
+   lcore_id = LCORE_ID_ANY;

/* wait that the timer is in correct status before update,
 * and mark it as being configured */
while (success == 0) {
prev_status.u32 = tim->status.u32;

+   /*
+* prevent race condition of non-EAL threads
+* to update the timer. When 'owner == LCORE_ID_ANY',
+* it means updated by a non-EAL thread.
+*/
+   if (lcore_id == (unsigned)LCORE_ID_ANY &&
+   (uint16_t)lcore_id == prev_status.owner)
+   return -1;
+
/* timer is running on another core, exit */
if (prev_status.state == RTE_TIMER_RUNNING &&
-   (unsigned)prev_status.owner != lcore_id)
+   prev_status.owner != (uint16_t)lcore_id)
return -1;

/* timer is being configured on another core */
@@ -366,9 +378,13 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,

/* round robin for tim_lcore */
if (tim_lcore == (unsigned)LCORE_ID_ANY) {
-   tim_lcore = rte_get_next_lcore(priv_timer[lcore_id].prev_lcore,
-  0, 1);
-   priv_timer[lcore_id].prev_lcore = tim_lcore;
+   if (lcore_id < RTE_MAX_LCORE) {
+   tim_lcore = rte_get_next_lcore(
+   priv_timer[lcore_id].prev_lcore,
+   0, 1);
+   priv_timer[lcore_id].prev_lcore = tim_lcore;
+   } else
+   tim_lcore = rte_get_next_lcore(LCORE_ID_ANY, 0, 1);
}

/* wait that the timer is in correct status before update,
@@ -378,7 +394,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
return -1;

__TIMER_STAT_ADD(reset, 1);
-   if (prev_status.state == RTE_TIMER_RUNNING) {
+   if (prev_status.state == RTE_TIMER_RUNNING &&
+   lcore_id < RTE_MAX_LCORE) {
priv_timer[lcore_id].updated = 1;
}

@@ -455,7 +472,8 @@ rte_timer_stop(struct rte_timer *tim)
return -1;

__TIMER_STAT_ADD(stop, 1);
-   if (prev_status.state == RTE_TIMER_RUNNING) {
+   if (prev_status.state == RTE_TIMER_RUNNING &&
+   lcore_id < RTE_MAX_LCORE) {
priv_timer[lcore_id].updated = 1;
}

@@ -499,6 +517,10 @@ void rte_timer_manage(void)
uint64_t cur_time;
int i, ret;

+   /* timer manager only runs on EAL thread */
+   if (lcore_id >= RTE_MAX_LCORE)
+   return;
+
__TIMER_STAT_ADD(manage, 1);
/* optimize for the case where per-cpu list is empty */
if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 4907cf5..5c5df91 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -76,7 +76,7 @@ extern "C" {
 #define RTE_TIMER_RUNNING 2 /**< State: timer function is running. */
 #define RTE_TIMER_CONFIG  3 /**< State: timer is being configured. */

-#define RTE_TIMER_NO_OWNER -1 /**< Timer has no owner. */
+#define RTE_TIMER_NO_OWNER -2 /**< Timer has no owner. */

 /**
  * Timer type: Periodic or single (one-shot).
-- 
1.8.1.4
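
The change of RTE_TIMER_NO_OWNER from -1 to -2 is presumably needed because
LCORE_ID_ANY is also -1, so a timer owned by a non-EAL thread would otherwise
be indistinguishable from a timer with no owner. A small sketch of that
collision, assuming a signed 16-bit owner field; the helper name and the
OLD_/NEW_ macro names are illustrative and not taken from the patch:

```c
#include <stdint.h>

#define LCORE_ID_ANY       (-1) /* id reported for non-EAL threads */
#define OLD_TIMER_NO_OWNER (-1) /* value before the patch */
#define NEW_TIMER_NO_OWNER (-2) /* value after the patch */

/*
 * Return 1 if "owned by a non-EAL thread" and "no owner" would be stored
 * as the same value in a 16-bit owner field, 0 otherwise.
 */
static int
owner_is_ambiguous(int no_owner_value)
{
	int16_t any_owner = (int16_t)LCORE_ID_ANY;
	int16_t no_owner = (int16_t)no_owner_value;

	return any_owner == no_owner;
}
```

owner_is_ambiguous(OLD_TIMER_NO_OWNER) returns 1, while
owner_is_ambiguous(NEW_TIMER_NO_OWNER) returns 0.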



[dpdk-dev] [PATCH v2 14/15] ring: add support to non-EAL thread

2015-01-28 Thread Cunming Liang
Ring debug statistics are not collected for non-EAL threads.

Signed-off-by: Cunming Liang 
---
 lib/librte_ring/rte_ring.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 7cd5f2d..39bacdd 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -188,10 +188,12 @@ struct rte_ring {
  *   The number to add to the object-oriented statistics.
  */
 #ifdef RTE_LIBRTE_RING_DEBUG
-#define __RING_STAT_ADD(r, name, n) do {   \
-   unsigned __lcore_id = rte_lcore_id();   \
-   r->stats[__lcore_id].name##_objs += n;  \
-   r->stats[__lcore_id].name##_bulk += 1;  \
+#define __RING_STAT_ADD(r, name, n) do {\
+   unsigned __lcore_id = rte_lcore_id();   \
+   if (__lcore_id < RTE_MAX_LCORE) {   \
+   r->stats[__lcore_id].name##_objs += n;  \
+   r->stats[__lcore_id].name##_bulk += 1;  \
+   }   \
} while(0)
 #else
 #define __RING_STAT_ADD(r, name, n) do {} while(0)
-- 
1.8.1.4
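
The guard added to __RING_STAT_ADD can be shown in isolation. This is a
minimal sketch with stand-in names (MAX_LCORE for RTE_MAX_LCORE, a plain
function instead of the macro, and a hypothetical stats struct): per-lcore
counters are indexed by lcore id, so a thread whose id is LCORE_ID_ANY must
skip the update instead of indexing far past the end of the array.

```c
#include <stdint.h>

#define MAX_LCORE    8             /* stand-in for RTE_MAX_LCORE */
#define LCORE_ID_ANY ((unsigned)-1) /* id seen by non-EAL threads */

/* Hypothetical per-lcore debug counters. */
struct debug_stats {
	uint64_t enq_objs; /* objects enqueued */
	uint64_t enq_bulk; /* enqueue calls */
};

static struct debug_stats stats[MAX_LCORE];

/* Count an enqueue of n objects, but only for valid lcore ids. */
static void
stat_add_enq(unsigned lcore_id, unsigned n)
{
	if (lcore_id < MAX_LCORE) {
		stats[lcore_id].enq_objs += n;
		stats[lcore_id].enq_bulk += 1;
	}
	/* non-EAL threads fall through: nothing is counted */
}
```

The trade-off is that operations performed by non-EAL threads never show up
in the debug statistics.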



[dpdk-dev] [PATCH v2 13/15] mempool: add support to non-EAL thread

2015-01-28 Thread Cunming Liang
For a non-EAL thread, bypass the per-lcore cache and use the ring pool directly.
This allows rte_mempool to be used from either an EAL thread or any user pthread.
A non-EAL thread relies directly on rte_ring, which is non-preemptive.
Running multiple pthreads/CPUs that compete for the same rte_mempool is not
recommended: performance will be poor, and there is a critical risk if the
scheduling policy is RT.

Signed-off-by: Cunming Liang 
---
 lib/librte_mempool/rte_mempool.h | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 3314651..4845f27 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -198,10 +198,12 @@ struct rte_mempool {
  *   Number to add to the object-oriented statistics.
  */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-#define __MEMPOOL_STAT_ADD(mp, name, n) do {   \
-   unsigned __lcore_id = rte_lcore_id();   \
-   mp->stats[__lcore_id].name##_objs += n; \
-   mp->stats[__lcore_id].name##_bulk += 1; \
+#define __MEMPOOL_STAT_ADD(mp, name, n) do {\
+   unsigned __lcore_id = rte_lcore_id();   \
+   if (__lcore_id < RTE_MAX_LCORE) {   \
+   mp->stats[__lcore_id].name##_objs += n; \
+   mp->stats[__lcore_id].name##_bulk += 1; \
+   }   \
} while(0)
 #else
 #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
@@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const 
*obj_table,
__MEMPOOL_STAT_ADD(mp, put, n);

 #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
-   /* cache is not enabled or single producer */
-   if (unlikely(cache_size == 0 || is_mp == 0))
+   /* cache is not enabled or single producer or none EAL thread */
+   if (unlikely(cache_size == 0 || is_mp == 0 ||
+lcore_id >= RTE_MAX_LCORE))
goto ring_enqueue;

/* Go straight to ring if put would overflow mem allocated for cache */
@@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
uint32_t cache_size = mp->cache_size;

/* cache is not enabled or single consumer */
-   if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
+   if (unlikely(cache_size == 0 || is_mc == 0 ||
+n >= cache_size || lcore_id >= RTE_MAX_LCORE))
goto ring_dequeue;

cache = &mp->local_cache[lcore_id];
-- 
1.8.1.4
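
The decision made at the top of __mempool_get_bulk after this patch can be
written as a standalone predicate. This is a sketch only: MAX_LCORE stands in
for RTE_MAX_LCORE, and the enum and function names are hypothetical.

```c
#define MAX_LCORE 8 /* stand-in for RTE_MAX_LCORE */

enum get_path {
	PATH_LCORE_CACHE, /* serve the request from the per-lcore cache */
	PATH_RING         /* go straight to the shared ring */
};

/*
 * Mirror of the patched condition: the per-lcore cache is usable only
 * when it is enabled, the pool is multi-consumer, the request fits in
 * the cache, and the caller has a valid lcore id.
 */
static enum get_path
choose_get_path(unsigned cache_size, int is_mc, unsigned n, unsigned lcore_id)
{
	if (cache_size == 0 || is_mc == 0 ||
	    n >= cache_size || lcore_id >= MAX_LCORE)
		return PATH_RING;

	return PATH_LCORE_CACHE;
}
```

A caller with lcore id (unsigned)-1 always gets PATH_RING, which is why
several non-EAL threads sharing one pool end up contending on the ring.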



[dpdk-dev] [PATCH v2 12/15] eal: fix recursive spinlock in non-EAL thraed

2015-01-28 Thread Cunming Liang
In a non-EAL thread, lcore_id is always LCORE_ID_ANY.
It cannot be used as a unique id for the recursive spinlock.
Use rte_gettid() instead.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/common/include/generic/rte_spinlock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h 
b/lib/librte_eal/common/include/generic/rte_spinlock.h
index dea885c..c7fb0df 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -179,7 +179,7 @@ static inline void 
rte_spinlock_recursive_init(rte_spinlock_recursive_t *slr)
  */
 static inline void rte_spinlock_recursive_lock(rte_spinlock_recursive_t *slr)
 {
-   int id = rte_lcore_id();
+   int id = rte_gettid();

if (slr->user != id) {
rte_spinlock_lock(&slr->sl);
@@ -212,7 +212,7 @@ static inline void 
rte_spinlock_recursive_unlock(rte_spinlock_recursive_t *slr)
  */
 static inline int rte_spinlock_recursive_trylock(rte_spinlock_recursive_t *slr)
 {
-   int id = rte_lcore_id();
+   int id = rte_gettid();

if (slr->user != id) {
if (rte_spinlock_trylock(&slr->sl) == 0)
-- 
1.8.1.4
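
Why lcore_id cannot serve as the owner id can be illustrated with a
stripped-down recursive lock. This is a sketch built on C11 atomics rather
than the DPDK spinlock; the type, macro and function names are hypothetical,
the caller passes its thread id explicitly, and the unowned state is assumed
to be encoded as -1. Since every non-EAL thread reports LCORE_ID_ANY (-1),
such a thread would match the "already owner" test and skip the acquire step.

```c
#include <stdatomic.h>

/* Hypothetical recursive lock: an atomic flag plus owner and depth. */
typedef struct {
	atomic_flag locked;
	volatile int user;  /* id of the owner, -1 when unowned */
	volatile int count; /* recursion depth */
} rec_lock_t;

#define REC_LOCK_INIT { ATOMIC_FLAG_INIT, -1, 0 }

static void
rec_lock(rec_lock_t *l, int id)
{
	if (l->user != id) {
		/* not the owner: spin until the flag is acquired */
		while (atomic_flag_test_and_set_explicit(&l->locked,
				memory_order_acquire))
			;
		l->user = id;
	}
	l->count++;
}

static void
rec_unlock(rec_lock_t *l)
{
	if (--l->count == 0) {
		l->user = -1;
		atomic_flag_clear_explicit(&l->locked, memory_order_release);
	}
}
```

Calling rec_lock(&l, -1) on a fresh lock increments the depth without ever
setting the flag, so two such callers could both enter the critical section.
With a per-thread id (for example the kernel thread id) the flag is taken on
the first call and only the depth changes on nested calls.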



[dpdk-dev] [PATCH v2 11/15] eal: set _lcore_id and _socket_id to (-1) by default

2015-01-28 Thread Cunming Liang
For non-EAL threads, *_lcore_id* shall always be LCORE_ID_ANY.
Libraries that use *_lcore_id* as an index need to take care.
*_socket_id* is always SOCKET_ID_ANY until the thread changes its affinity
with rte_thread_set_affinity().

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_thread.c   | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_thread.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index 5b16302..2b3c9a8 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -56,8 +56,8 @@
 #include "eal_private.h"
 #include "eal_thread.h"

-RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
-RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
+RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
 RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);

 /*
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c 
b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 6eb1525..ab94e20 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -57,8 +57,8 @@
 #include "eal_private.h"
 #include "eal_thread.h"

-RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
-RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
+RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
 RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);

 /*
-- 
1.8.1.4



[dpdk-dev] [PATCH v2 09/15] malloc: fix the issue of SOCKET_ID_ANY

2015-01-28 Thread Cunming Liang
Check the value returned by rte_socket_id() so that an unexpected value such
as SOCKET_ID_ANY (-1) is not used as a socket id; fall back to socket 0.

Signed-off-by: Cunming Liang 
---
 lib/librte_malloc/malloc_heap.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/librte_malloc/malloc_heap.h b/lib/librte_malloc/malloc_heap.h
index b4aec45..a47136d 100644
--- a/lib/librte_malloc/malloc_heap.h
+++ b/lib/librte_malloc/malloc_heap.h
@@ -44,7 +44,12 @@ extern "C" {
 static inline unsigned
 malloc_get_numa_socket(void)
 {
-   return rte_socket_id();
+   unsigned socket_id = rte_socket_id();
+
+   if (socket_id == (unsigned)SOCKET_ID_ANY)
+   return 0;
+
+   return socket_id;
 }

 void *
-- 
1.8.1.4



[dpdk-dev] [PATCH v2 08/15] enic: fix re-define freebsd compile complain

2015-01-28 Thread Cunming Liang
Some macros are already defined by FreeBSD's 'sys/param.h'.

Signed-off-by: Cunming Liang 
---
 lib/librte_pmd_enic/enic.h| 1 +
 lib/librte_pmd_enic/enic_compat.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/lib/librte_pmd_enic/enic.h b/lib/librte_pmd_enic/enic.h
index c43417c..189c3b9 100644
--- a/lib/librte_pmd_enic/enic.h
+++ b/lib/librte_pmd_enic/enic.h
@@ -66,6 +66,7 @@
 #define ENIC_CALC_IP_CKSUM  1
 #define ENIC_CALC_TCP_UDP_CKSUM 2
 #define ENIC_MAX_MTU 9000
+#undef PAGE_SIZE
 #define PAGE_SIZE   4096
 #define PAGE_ROUND_UP(x) \
     ((((unsigned long)(x)) + PAGE_SIZE-1) & (~(PAGE_SIZE-1)))
diff --git a/lib/librte_pmd_enic/enic_compat.h 
b/lib/librte_pmd_enic/enic_compat.h
index b1af838..b84c766 100644
--- a/lib/librte_pmd_enic/enic_compat.h
+++ b/lib/librte_pmd_enic/enic_compat.h
@@ -67,6 +67,7 @@
 #define pr_warn(y, args...) dev_warning(0, y, ##args)
 #define BUG() pr_err("BUG at %s:%d", __func__, __LINE__)

+#undef ALIGN
 #define ALIGN(x, a)  __ALIGN_MASK(x, (typeof(x))(a)-1)
 #define __ALIGN_MASK(x, mask)(((x)+(mask))&~(mask))
 #define udelay usleep
-- 
1.8.1.4



[dpdk-dev] [PATCH v2 07/15] eal: apply affinity of EAL thread by assigned cpuset

2015-01-28 Thread Cunming Liang
EAL threads use their assigned cpuset to set core affinity during startup.
The 1:1 mapping is kept if the '--lcores' option is not used.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal.c  | 13 ---
 lib/librte_eal/bsdapp/eal/eal_thread.c   | 63 +-
 lib/librte_eal/linuxapp/eal/eal.c|  7 +++-
 lib/librte_eal/linuxapp/eal/eal_thread.c | 67 +++-
 4 files changed, 54 insertions(+), 96 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 69f3c03..98c5a83 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -432,6 +432,7 @@ rte_eal_init(int argc, char **argv)
int i, fctret, ret;
pthread_t thread_id;
static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0);
+   char cpuset[CPU_STR_LEN];

if (!rte_atomic32_test_and_set(&run_once))
return -1;
@@ -502,13 +503,17 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_pci_init() < 0)
rte_panic("Cannot init PCI\n");

-   RTE_LOG(DEBUG, EAL, "Master core %u is ready (tid=%p)\n",
-   rte_config.master_lcore, thread_id);
-
eal_check_mem_on_local_socket();

rte_eal_mcfg_complete();

+   eal_thread_init_master(rte_config.master_lcore);
+
+   eal_thread_dump_affinity(cpuset, CPU_STR_LEN);
+
+   RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%p;cpuset=[%s])\n",
+   rte_config.master_lcore, thread_id, cpuset);
+
if (rte_eal_dev_init() < 0)
rte_panic("Cannot init pmd devices\n");

@@ -532,8 +537,6 @@ rte_eal_init(int argc, char **argv)
rte_panic("Cannot create thread\n");
}

-   eal_thread_init_master(rte_config.master_lcore);
-
/*
 * Launch a dummy function on all slave lcores, so that master lcore
 * knows they are all ready when this function returns.
diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index d0c077b..5b16302 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -103,55 +103,27 @@ eal_thread_set_affinity(void)
 {
int s;
pthread_t thread;
-
-/*
- * According to the section VERSIONS of the CPU_ALLOC man page:
- *
- * The CPU_ZERO(), CPU_SET(), CPU_CLR(), and CPU_ISSET() macros were added
- * in glibc 2.3.3.
- *
- * CPU_COUNT() first appeared in glibc 2.6.
- *
- * CPU_AND(), CPU_OR(), CPU_XOR(),CPU_EQUAL(),CPU_ALLOC(),
- * CPU_ALLOC_SIZE(), CPU_FREE(), CPU_ZERO_S(),  CPU_SET_S(),  CPU_CLR_S(),
- * CPU_ISSET_S(),  CPU_AND_S(), CPU_OR_S(), CPU_XOR_S(), and CPU_EQUAL_S()
- * first appeared in glibc 2.7.
- */
-#if defined(CPU_ALLOC)
-   size_t size;
-   cpu_set_t *cpusetp;
-
-   cpusetp = CPU_ALLOC(RTE_MAX_LCORE);
-   if (cpusetp == NULL) {
-   RTE_LOG(ERR, EAL, "CPU_ALLOC failed\n");
-   return -1;
-   }
-
-   size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
-   CPU_ZERO_S(size, cpusetp);
-   CPU_SET_S(rte_lcore_id(), size, cpusetp);
+   unsigned lcore_id = rte_lcore_id();

thread = pthread_self();
-   s = pthread_setaffinity_np(thread, size, cpusetp);
+   s = pthread_setaffinity_np(thread, sizeof(cpuset_t),
+  &lcore_config[lcore_id].cpuset);
if (s != 0) {
RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
-   CPU_FREE(cpusetp);
return -1;
}

-   CPU_FREE(cpusetp);
-#else /* CPU_ALLOC */
-   cpuset_t cpuset;
-   CPU_ZERO( &cpuset );
-   CPU_SET( rte_lcore_id(), &cpuset );
+   /* acquire system unique id  */
+   rte_gettid();
+
+   /* store socket_id in TLS for quick access */
+   RTE_PER_LCORE(_socket_id) =
+   eal_cpuset_socket_id(&lcore_config[lcore_id].cpuset);
+
+   CPU_COPY(&lcore_config[lcore_id].cpuset, &RTE_PER_LCORE(_cpuset));
+
+   lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id);

-   thread = pthread_self();
-   s = pthread_setaffinity_np(thread, sizeof( cpuset ), &cpuset );
-   if (s != 0) {
-   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
-   return -1;
-   }
-#endif
return 0;
 }

@@ -174,6 +146,7 @@ eal_thread_loop(__attribute__((unused)) void *arg)
unsigned lcore_id;
pthread_t thread_id;
int m2s, s2m;
+   char cpuset[CPU_STR_LEN];

thread_id = pthread_self();

@@ -185,9 +158,6 @@ eal_thread_loop(__attribute__((unused)) void *arg)
if (lcore_id == RTE_MAX_LCORE)
rte_panic("cannot retrieve lcore id\n");

-   RTE_LOG(DEBUG, EAL, "Core %u is ready (tid=%p)\n",
-   lcore_id, thread_id);
-
m2s = lcore_config[lcore_id].pipe_master2slave[0];
s2m = lcore_config[lcore_id].pipe_slave2master[1];

@@ -198,6 +168,11 @@ eal_thread_loop(__attribute__((unused)) 

[dpdk-dev] [PATCH v2 06/15] eal: add rte_gettid() to acquire unique system tid

2015-01-28 Thread Cunming Liang
rte_gettid() wraps the Linux gettid() syscall and the FreeBSD thr_self() call.
It provides a persistent, system-unique thread id for the calling thread.
The id is saved in TLS on the first call.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_thread.c   |  9 +
 lib/librte_eal/common/include/rte_eal.h  | 27 +++
 lib/librte_eal/linuxapp/eal/eal_thread.c |  7 +++
 3 files changed, 43 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index 10220c7..d0c077b 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -233,3 +234,11 @@ eal_thread_loop(__attribute__((unused)) void *arg)
/* pthread_exit(NULL); */
/* return NULL; */
 }
+
+/* require calling thread tid by gettid() */
+int rte_sys_gettid(void)
+{
+   long lwpid;
+   thr_self(&lwpid);
+   return (int)lwpid;
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h 
b/lib/librte_eal/common/include/rte_eal.h
index f4ecd2e..8ccdd65 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -41,6 +41,9 @@
  */

 #include 
+#include 
+
+#include 

 #ifdef __cplusplus
 extern "C" {
@@ -262,6 +265,30 @@ rte_set_application_usage_hook( rte_usage_hook_t 
usage_func );
  */
 int rte_eal_has_hugepages(void);

+/**
+ * A wrap API for syscall gettid.
+ *
+ * @return
+ *   On success, returns the thread ID of calling process.
+ *   It always successful.
+ */
+int rte_sys_gettid(void);
+
+/**
+ * Get system unique thread id.
+ *
+ * @return
+ *   On success, returns the thread ID of calling process.
+ *   It always successful.
+ */
+static inline int rte_gettid(void)
+{
+   static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
+   if (RTE_PER_LCORE(_thread_id) == -1)
+   RTE_PER_LCORE(_thread_id) = rte_sys_gettid();
+   return RTE_PER_LCORE(_thread_id);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c 
b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 748a83a..ed20c93 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -233,3 +234,9 @@ eal_thread_loop(__attribute__((unused)) void *arg)
/* pthread_exit(NULL); */
/* return NULL; */
 }
+
+/* require calling thread tid by gettid() */
+int rte_sys_gettid(void)
+{
+   return (int)syscall(SYS_gettid);
+}
-- 
1.8.1.4
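
The caching described above can be sketched outside of DPDK as follows. This
is a Linux-only illustration under stated assumptions: sys_gettid() and
cached_gettid() are hypothetical stand-ins for rte_sys_gettid() and
rte_gettid(), and a plain __thread variable replaces RTE_DEFINE_PER_LCORE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Per-thread cache; -1 means "not fetched yet". */
static __thread int cached_tid = -1;

/* Ask the kernel for the id of the calling thread. */
static int sys_gettid(void)
{
	return (int)syscall(SYS_gettid);
}

/* Return the thread id, issuing the syscall only on the first call. */
static int cached_gettid(void)
{
	if (cached_tid == -1) {
		cached_tid = sys_gettid();
		assert(cached_tid > 0); /* kernel thread ids are positive */
	}
	return cached_tid;
}
```

Every later call in the same thread is a TLS read, which is why the id is
cheap enough to use on a lock fast path.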



[dpdk-dev] [PATCH v2 05/15] eal: add eal_common_thread.c for common thread API

2015-01-28 Thread Cunming Liang
The API works for both EAL threads and non-EAL threads.
When rte_thread_set_affinity() is called, the *_socket_id* and
*_cpuset* of the calling thread are updated if the thread
successfully sets its cpu affinity.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/Makefile|   1 +
 lib/librte_eal/common/eal_common_thread.c | 142 ++
 lib/librte_eal/linuxapp/eal/Makefile  |   2 +
 3 files changed, 145 insertions(+)
 create mode 100644 lib/librte_eal/common/eal_common_thread.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
b/lib/librte_eal/bsdapp/eal/Makefile
index d434882..78406be 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -73,6 +73,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_hexdump.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_devargs.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_options.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_thread.c

 CFLAGS_eal.o := -D_GNU_SOURCE
 #CFLAGS_eal_thread.o := -D_GNU_SOURCE
diff --git a/lib/librte_eal/common/eal_common_thread.c 
b/lib/librte_eal/common/eal_common_thread.c
new file mode 100644
index 0000000..d996690
--- /dev/null
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -0,0 +1,142 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "eal_thread.h"
+
+int
+rte_thread_set_affinity(rte_cpuset_t *cpusetp)
+{
+   int s;
+   unsigned lcore_id;
+   pthread_t tid;
+
+   if (!cpusetp)
+   return -1;
+
+   lcore_id = rte_lcore_id();
+   if (lcore_id != (unsigned)LCORE_ID_ANY) {
+   /* EAL thread */
+   tid = lcore_config[lcore_id].thread_id;
+
+   s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp);
+   if (s != 0) {
+   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
+   return -1;
+   }
+
+   /* store socket_id in TLS for quick access */
+   RTE_PER_LCORE(_socket_id) =
+   eal_cpuset_socket_id(cpusetp);
+
+   /* store cpuset in TLS for quick access */
+   rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp,
+  sizeof(rte_cpuset_t));
+
+   /* update lcore_config */
+   lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id);
+   rte_memcpy(&lcore_config[lcore_id].cpuset, cpusetp,
+  sizeof(rte_cpuset_t));
+   } else {
+   /* none EAL thread */
+   tid = pthread_self();
+
+   s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp);
+   if (s != 0) {
+   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
+   return -1;
+   }
+
+   /* store cpuset in TLS for quick access */
+   rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp,
+  sizeof(rte_cpuset_t));
+
+   /* store socket_id in TLS for quick access */
+   RTE_PER_LCORE(_socket_id) =
+   

[dpdk-dev] [PATCH v2 04/15] eal: new TLS definition and API declaration

2015-01-28 Thread Cunming Liang
1. add two TLS variables, *_socket_id* and *_cpuset*
2. add two external APIs, rte_thread_set_affinity and rte_thread_get_affinity
3. add one internal API, eal_thread_dump_affinity

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_thread.c|  2 ++
 lib/librte_eal/common/eal_thread.h| 14 ++
 lib/librte_eal/common/include/rte_lcore.h | 29 +++--
 lib/librte_eal/linuxapp/eal/eal_thread.c  |  2 ++
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index ab05368..10220c7 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -56,6 +56,8 @@
 #include "eal_thread.h"

 RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
+RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
+RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);

 /*
  * Send a message to a slave lcore identified by slave_id to call a
diff --git a/lib/librte_eal/common/eal_thread.h 
b/lib/librte_eal/common/eal_thread.h
index a25ee86..28edf51 100644
--- a/lib/librte_eal/common/eal_thread.h
+++ b/lib/librte_eal/common/eal_thread.h
@@ -102,4 +102,18 @@ eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
return socket_id;
 }

+/**
+ * Dump the current pthread cpuset.
+ * This function is private to EAL.
+ *
+ * @param str
+ *   The string buffer the cpuset will dump to.
+ * @param size
+ *   The string buffer size.
+ */
+#define CPU_STR_LEN 256
+void
+eal_thread_dump_affinity(char str[], unsigned size);
+
+
 #endif /* EAL_THREAD_H */
diff --git a/lib/librte_eal/common/include/rte_lcore.h 
b/lib/librte_eal/common/include/rte_lcore.h
index 4c7d6bb..facdbdc 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 

 #ifdef __cplusplus
 extern "C" {
@@ -80,7 +81,9 @@ struct lcore_config {
  */
 extern struct lcore_config lcore_config[RTE_MAX_LCORE];

-RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */
+RTE_DECLARE_PER_LCORE(unsigned, _lcore_id);  /**< Per thread "lcore id". */
+RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". */
+RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". */

 /**
  * Return the ID of the execution unit we are running on.
@@ -146,7 +149,7 @@ rte_lcore_index(int lcore_id)
 static inline unsigned
 rte_socket_id(void)
 {
-   return lcore_config[rte_lcore_id()].socket_id;
+   return RTE_PER_LCORE(_socket_id);
 }

 /**
@@ -229,6 +232,28 @@ rte_get_next_lcore(unsigned i, int skip_master, int wrap)
 i

[dpdk-dev] [PATCH v2 03/15] eal: add support parsing socket_id from cpuset

2015-01-28 Thread Cunming Liang
It returns the socket_id if all cpus in the cpuset belong
to the same NUMA node; otherwise it returns SOCKET_ID_ANY.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_lcore.c   |  7 +
 lib/librte_eal/common/eal_thread.h  | 52 +
 lib/librte_eal/linuxapp/eal/eal_lcore.c |  7 +
 3 files changed, 66 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c 
b/lib/librte_eal/bsdapp/eal/eal_lcore.c
index 72f8ac2..162fb4f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_lcore.c
+++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c
@@ -41,6 +41,7 @@
 #include 

 #include "eal_private.h"
+#include "eal_thread.h"

 /* No topology information available on FreeBSD including NUMA info */
 #define cpu_core_id(X) 0
@@ -112,3 +113,9 @@ rte_eal_cpu_init(void)

return 0;
 }
+
+unsigned
+eal_cpu_socket_id(__rte_unused unsigned cpu_id)
+{
+   return cpu_socket_id(cpu_id);
+}
diff --git a/lib/librte_eal/common/eal_thread.h 
b/lib/librte_eal/common/eal_thread.h
index b53b84d..a25ee86 100644
--- a/lib/librte_eal/common/eal_thread.h
+++ b/lib/librte_eal/common/eal_thread.h
@@ -34,6 +34,10 @@
 #ifndef EAL_THREAD_H
 #define EAL_THREAD_H

+#include 
+
+#include 
+
 /**
  * basic loop of thread, called for each thread by eal_init().
  *
@@ -50,4 +54,52 @@ __attribute__((noreturn)) void *eal_thread_loop(void *arg);
  */
 void eal_thread_init_master(unsigned lcore_id);

+/**
+ * Get the NUMA socket id from cpu id.
+ * This function is private to EAL.
+ *
+ * @param cpu_id
+ *   The logical process id.
+ * @return
+ *   socket_id or SOCKET_ID_ANY
+ */
+unsigned eal_cpu_socket_id(unsigned cpu_id);
+
+/**
+ * Get the NUMA socket id from cpuset.
+ * This function is private to EAL.
+ *
+ * @param cpusetp
+ *   The point to a valid cpu set.
+ * @return
+ *   socket_id or SOCKET_ID_ANY
+ */
+static inline int
+eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
+{
+   unsigned cpu = 0;
+   int socket_id = SOCKET_ID_ANY;
+   int sid;
+
+   if (cpusetp == NULL)
+   return SOCKET_ID_ANY;
+
+   do {
+   if (!CPU_ISSET(cpu, cpusetp))
+   continue;
+
+   if (socket_id == SOCKET_ID_ANY)
+   socket_id = eal_cpu_socket_id(cpu);
+
+   sid = eal_cpu_socket_id(cpu);
+   if (socket_id != sid) {
+   socket_id = SOCKET_ID_ANY;
+   break;
+   }
+
+   } while (++cpu < RTE_MAX_LCORE);
+
+   return socket_id;
+}
+
 #endif /* EAL_THREAD_H */
diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c 
b/lib/librte_eal/linuxapp/eal/eal_lcore.c
index 29615f8..922af6d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_lcore.c
+++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c
@@ -45,6 +45,7 @@

 #include "eal_private.h"
 #include "eal_filesystem.h"
+#include "eal_thread.h"

 #define SYS_CPU_DIR "/sys/devices/system/cpu/cpu%u"
 #define CORE_ID_FILE "topology/core_id"
@@ -197,3 +198,9 @@ rte_eal_cpu_init(void)

return 0;
 }
+
+unsigned
+eal_cpu_socket_id(unsigned cpu_id)
+{
+   return cpu_socket_id(cpu_id);
+}
-- 
1.8.1.4
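
The rule "one socket id if all cpus agree, otherwise SOCKET_ID_ANY" can be
tried out in isolation with the sketch below. It makes several assumptions
that are not part of the patch: a 64-bit mask stands in for rte_cpuset_t,
SOCKET_ANY stands in for SOCKET_ID_ANY, and cpu_to_socket() is a made-up
topology (cpus 0-7 on socket 0, the rest on socket 1).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS   64
#define SOCKET_ANY (-1) /* stand-in for SOCKET_ID_ANY */

/* Hypothetical topology lookup: cpus 0-7 on socket 0, 8-63 on socket 1. */
static int cpu_to_socket(unsigned cpu)
{
	return cpu < 8 ? 0 : 1;
}

/* Return the common socket of all cpus in the mask, or SOCKET_ANY. */
static int cpuset_socket(uint64_t cpuset)
{
	int socket = SOCKET_ANY;

	for (unsigned cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (!(cpuset & (UINT64_C(1) << cpu)))
			continue;

		int sid = cpu_to_socket(cpu);
		assert(sid != SOCKET_ANY); /* the lookup must name a socket */

		if (socket == SOCKET_ANY)
			socket = sid;           /* first cpu found in the set */
		else if (socket != sid)
			return SOCKET_ANY;      /* cpus span more than one socket */
	}
	return socket; /* still SOCKET_ANY if the set was empty */
}
```

An empty set and a set that spans sockets give the same answer, so a caller
cannot tell the two cases apart from the return value alone.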



[dpdk-dev] [PATCH v2 02/15] eal: new eal option '--lcores' for cpu assignment

2015-01-28 Thread Cunming Liang
Add a new EAL long option '--lcores' for assigning a cpuset to each EAL thread.

The format pattern:
--lcores='lcores[@cpus]<,lcores[@cpus]>'
lcores and cpus can each be a single number, a range, or a group.
'(' and ')' are required around a group.
If '@cpus' is not supplied, cpus takes the same value as lcores.

e.g. '1,2@(5-7),(3-5)@(0,2),(0,6),7-8' starts 9 EAL threads as below
  lcore 0 runs on cpuset 0x41 (cpu 0,6)
  lcore 1 runs on cpuset 0x2 (cpu 1)
  lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
  lcores 3,4,5 run on cpuset 0x5 (cpu 0,2)
  lcore 6 runs on cpuset 0x41 (cpu 0,6)
  lcore 7 runs on cpuset 0x80 (cpu 7)
  lcore 8 runs on cpuset 0x100 (cpu 8)
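
The hexadecimal cpuset values in the list above are plain bit masks with one
bit per cpu. A small sketch to check them, assuming at most 64 cpus;
cpu_mask() is a hypothetical helper and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Build a bit mask with one bit set for every cpu id in the list. */
static uint64_t cpu_mask(const unsigned *cpus, unsigned n)
{
	uint64_t mask = 0;

	for (unsigned i = 0; i < n; i++) {
		assert(cpus[i] < 64); /* a wider shift would be undefined */
		mask |= UINT64_C(1) << cpus[i];
	}
	return mask;
}
```

For example, cpus 0 and 6 give (1 << 0) | (1 << 6) = 0x41, and cpus 5, 6, 7
give 0x20 | 0x40 | 0x80 = 0xe0.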

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/common/eal_common_launch.c  |   1 -
 lib/librte_eal/common/eal_common_options.c | 300 -
 lib/librte_eal/common/eal_options.h|   2 +
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 4 files changed, 299 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_launch.c 
b/lib/librte_eal/common/eal_common_launch.c
index 599f83b..2d732b1 100644
--- a/lib/librte_eal/common/eal_common_launch.c
+++ b/lib/librte_eal/common/eal_common_launch.c
@@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void)
rte_eal_wait_lcore(lcore_id);
}
 }
-
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 67e02dc..29ebb6f 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "eal_internal_cfg.h"
 #include "eal_options.h"
@@ -85,6 +86,7 @@ eal_long_options[] = {
{OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM},
{OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM},
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
+   {OPT_LCORES, 1, 0, OPT_LCORES_NUM},
{0, 0, 0, 0}
 };

@@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist)
if (min == RTE_MAX_LCORE)
min = idx;
for (idx = min; idx <= max; idx++) {
-   cfg->lcore_role[idx] = ROLE_RTE;
-   lcore_config[idx].core_index = count;
-   count++;
+   if (cfg->lcore_role[idx] != ROLE_RTE) {
+   cfg->lcore_role[idx] = ROLE_RTE;
+   lcore_config[idx].core_index = count;
+   count++;
+   }
}
min = RTE_MAX_LCORE;
} else
@@ -292,6 +296,279 @@ eal_parse_master_lcore(const char *arg)
return 0;
 }

+/*
+ * Parse elem, the elem could be single number/range or '(' ')' group
+ * Within group elem, '-' used for a range seperator;
+ *',' used for a single number.
+ */
+static int
+eal_parse_set(const char *input, uint16_t set[], unsigned num)
+{
+   unsigned idx;
+   const char *str = input;
+   char *end = NULL;
+   unsigned min, max;
+
+   memset(set, 0, num * sizeof(uint16_t));
+
+   while (isblank(*str))
+   str++;
+
+   /* only digit or left bracket is qulify for start point */
+   if ((!isdigit(*str) && *str != '(') || *str == '\0')
+   return -1;
+
+   /* process single number or single range of number */
+   if (*str != '(') {
+   errno = 0;
+   idx = strtoul(str, &end, 10);
+   if (errno || end == NULL || idx >= num)
+   return -1;
+   else {
+   while (isblank(*end))
+   end++;
+
+   min = idx;
+   max = idx;
+   if (*end == '-') {
+   /* proccess single - */
+   end++;
+   while (isblank(*end))
+   end++;
+   if (!isdigit(*end))
+   return -1;
+
+   errno = 0;
+   idx = strtoul(end, &end, 10);
+   if (errno || end == NULL || idx >= num)
+   return -1;
+   max = idx;
+   while (isblank(*end))
+   end++;
+   if (*end != ',' && *end != '\0')
+   return -1;
+   }
+
+   if (*end != ',' && *end != '\0' &&
+   *end != '@')
+   return -1;
+
+   for (idx = RTE_MIN(min, max);
+idx <= 

[dpdk-dev] [PATCH v2 01/15] eal: add cpuset into per EAL thread lcore_config

2015-01-28 Thread Cunming Liang
The patch adds 'cpuset' to the per-lcore configuration 'lcore_config[]',
as an lcore is no longer always pinned 1:1 to a physical cpu.
An lcore now stands for an EAL thread rather than a logical cpu.

The default behavior of 1:1 mapping is unchanged, but an EAL thread
can now be given affinity to multiple cpus.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_lcore.c | 7 +++
 lib/librte_eal/bsdapp/eal/eal_memory.c| 2 ++
 lib/librte_eal/common/include/rte_lcore.h | 8 
 lib/librte_eal/linuxapp/eal/Makefile  | 1 +
 lib/librte_eal/linuxapp/eal/eal_lcore.c   | 8 
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c 
b/lib/librte_eal/bsdapp/eal/eal_lcore.c
index 662f024..72f8ac2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_lcore.c
+++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c
@@ -76,11 +76,18 @@ rte_eal_cpu_init(void)
 * ones and enable them by default.
 */
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+   /* init cpuset for per lcore config */
+   CPU_ZERO(&lcore_config[lcore_id].cpuset);
+
lcore_config[lcore_id].detected = (lcore_id < ncpus);
if (lcore_config[lcore_id].detected == 0) {
config->lcore_role[lcore_id] = ROLE_OFF;
continue;
}
+
+   /* By default, lcore 1:1 map to cpu id */
+   CPU_SET(lcore_id, &lcore_config[lcore_id].cpuset);
+
/* By default, each detected core is enabled */
config->lcore_role[lcore_id] = ROLE_RTE;
lcore_config[lcore_id].core_id = cpu_core_id(lcore_id);
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c 
b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ee87d..a34d500 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -45,6 +45,8 @@
 #include "eal_internal_cfg.h"
 #include "eal_filesystem.h"

+/* avoid re-defined against with freebsd header */
+#undef PAGE_SIZE
 #define PAGE_SIZE (sysconf(_SC_PAGESIZE))

 /*
diff --git a/lib/librte_eal/common/include/rte_lcore.h 
b/lib/librte_eal/common/include/rte_lcore.h
index 49b2c03..4c7d6bb 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -50,6 +50,13 @@ extern "C" {

 #define LCORE_ID_ANY -1    /**< Any lcore. */

+#if defined(__linux__)
+   typedef cpu_set_t rte_cpuset_t;
+#elif defined(__FreeBSD__)
+#include 
+   typedef cpuset_t rte_cpuset_t;
+#endif
+
 /**
  * Structure storing internal configuration (per-lcore)
  */
@@ -65,6 +72,7 @@ struct lcore_config {
unsigned socket_id;/**< physical socket id for this lcore */
unsigned core_id;  /**< core number on socket for this lcore */
int core_index;/**< relative index, starting from 0 */
+   rte_cpuset_t cpuset;   /**< cpu set which the lcore affinity to */
 };

 /**
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 72ecf3a..0e9c447 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -87,6 +87,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_options.c

 CFLAGS_eal.o := -D_GNU_SOURCE
+CFLAGS_eal_lcore.o := -D_GNU_SOURCE
 CFLAGS_eal_thread.o := -D_GNU_SOURCE
 CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c 
b/lib/librte_eal/linuxapp/eal/eal_lcore.c
index c67e0e6..29615f8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_lcore.c
+++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c
@@ -158,11 +158,19 @@ rte_eal_cpu_init(void)
 * ones and enable them by default.
 */
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+   /* init cpuset for per lcore config */
+   CPU_ZERO(&lcore_config[lcore_id].cpuset);
+
+   /* in 1:1 mapping, record related cpu detected state */
lcore_config[lcore_id].detected = cpu_detected(lcore_id);
if (lcore_config[lcore_id].detected == 0) {
config->lcore_role[lcore_id] = ROLE_OFF;
continue;
}
+
+   /* By default, lcore 1:1 map to cpu id */
+   CPU_SET(lcore_id, &lcore_config[lcore_id].cpuset);
+
/* By default, each detected core is enabled */
config->lcore_role[lcore_id] = ROLE_RTE;
lcore_config[lcore_id].core_id = cpu_core_id(lcore_id);
-- 
1.8.1.4



[dpdk-dev] [PATCH v2 00/15] support multi-pthread per core

2015-01-28 Thread Cunming Liang
v2 changes:
  add '-' support for EAL option '--lcores' 

The patch series contains EAL enhancements and library fixes that allow
running multiple pthreads (either EAL or non-EAL threads) per physical core.
The two major changes are:
- Extend the core affinity of each EAL thread to 1:n.
  Each lcore now stands for an EAL thread rather than a logical core.
  A new EAL option allows static lcore-to-cpuset assignment, so an
  lcore (EAL thread) has affinity to a cpuset; the original 1:1 mapping
  is the special case.
- Fix the libraries so that they can run on any non-EAL thread
  (a thread created dynamically by the user).
  Each fixed library takes care of the case rte_lcore_id() >= RTE_MAX_LCORE.

Thanks a million for the comments from Konstantin, Bruce, Mirek and Stephen in 
RFC review.



Cunming Liang (15):
  eal: add cpuset into per EAL thread lcore_config
  eal: new eal option '--lcores' for cpu assignment
  eal: add support parsing socket_id from cpuset
  eal: new TLS definition and API declaration
  eal: add eal_common_thread.c for common thread API
  eal: add rte_gettid() to acquire unique system tid
  eal: apply affinity of EAL thread by assigned cpuset
  enic: fix re-define freebsd compile complain
  malloc: fix the issue of SOCKET_ID_ANY
  log: fix the gap to support non-EAL thread
  eal: set _lcore_id and _socket_id to (-1) by default
  eal: fix recursive spinlock in non-EAL thread
  mempool: add support to non-EAL thread
  ring: add support to non-EAL thread
  timer: add support to non-EAL thread

 lib/librte_eal/bsdapp/eal/Makefile |   1 +
 lib/librte_eal/bsdapp/eal/eal.c|  13 +-
 lib/librte_eal/bsdapp/eal/eal_lcore.c  |  14 +
 lib/librte_eal/bsdapp/eal/eal_memory.c |   2 +
 lib/librte_eal/bsdapp/eal/eal_thread.c |  76 +++---
 lib/librte_eal/common/eal_common_launch.c  |   1 -
 lib/librte_eal/common/eal_common_log.c |  17 +-
 lib/librte_eal/common/eal_common_options.c | 300 -
 lib/librte_eal/common/eal_common_thread.c  | 142 ++
 lib/librte_eal/common/eal_options.h|   2 +
 lib/librte_eal/common/eal_thread.h |  66 +
 .../common/include/generic/rte_spinlock.h  |   4 +-
 lib/librte_eal/common/include/rte_eal.h|  27 ++
 lib/librte_eal/common/include/rte_lcore.h  |  37 ++-
 lib/librte_eal/common/include/rte_log.h|   5 +
 lib/librte_eal/linuxapp/eal/Makefile   |   4 +
 lib/librte_eal/linuxapp/eal/eal.c  |   7 +-
 lib/librte_eal/linuxapp/eal/eal_lcore.c|  15 ++
 lib/librte_eal/linuxapp/eal/eal_thread.c   |  78 +++---
 lib/librte_malloc/malloc_heap.h|   7 +-
 lib/librte_mempool/rte_mempool.h   |  18 +-
 lib/librte_pmd_enic/enic.h |   1 +
 lib/librte_pmd_enic/enic_compat.h  |   1 +
 lib/librte_ring/rte_ring.h |  10 +-
 lib/librte_timer/rte_timer.c   |  40 ++-
 lib/librte_timer/rte_timer.h   |   2 +-
 26 files changed, 759 insertions(+), 131 deletions(-)
 create mode 100644 lib/librte_eal/common/eal_common_thread.c

-- 
1.8.1.4



[dpdk-dev] Process question: reviewing older patches

2015-01-28 Thread Jay Rolette
Thanks Thomas and Neil. Sadly, no joy. While I generally like gmail for my
mail, there's not a reasonable way to import the mbox file or to control
the message id.

If someone else wants to resend the message to the list, I can reply to
that. Otherwise, here are the relevant bits from the original patch email:

>From patchwork Wed Jul 23 06:45:12 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [dpdk-dev] kni: optimizing the rte_kni_rx_burst
From: Hemant Agrawal 
X-Patchwork-Id: 84
Message-Id: <14060979121185-git-send-email-Hemant at freescale.com>
To: 
Date: Wed, 23 Jul 2014 12:15:12 +0530

The current implementation of rte_kni_rx_burst polls the fifo for buffers.
Irrespective of success or failure, it allocates mbufs and tries to put
them into the alloc_q. If the buffers are not added to alloc_q, it frees them.
This wastes a lot of CPU cycles allocating and freeing buffers when
alloc_q is full.

The logic has been changed to:
1. Initially allocate and add buffers (burst size) to alloc_q.
2. Add buffers to alloc_q only when buffers are being pulled out.

Signed-off-by: Hemant Agrawal 

---
lib/librte_kni/rte_kni.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 76feef4..01e85f8 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -263,6 +263,9 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,

  ctx->in_use = 1;

+ /* Allocate mbufs and then put them into alloc_q */
+ kni_allocate_mbufs(ctx);
+
  return ctx;

 fail:
@@ -369,8 +372,9 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num)
 {
  unsigned ret = kni_fifo_get(kni->tx_q, (void **)mbufs, num);

- /* Allocate mbufs and then put them into alloc_q */
- kni_allocate_mbufs(kni);
+ /* If buffers removed, allocate mbufs and then put them into alloc_q */
+ if(ret)
+ kni_allocate_mbufs(kni);

  return ret;
 }

The patch looks good from a DPDK 1.6r2 viewpoint. We saw the same behavior
in our app and ended up avoiding it higher in the stack (in our code).
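
The effect of the change can be illustrated with a toy model. This is only a sketch, not the KNI code: toy_alloc_q, toy_refill, toy_rx_burst, ALLOC_Q_CAP and BURST are hypothetical names, and the queue only counts buffers instead of holding mbufs. It shows that an empty poll no longer pays for an allocation attempt.

```c
#include <assert.h>
#include <stddef.h>

#define ALLOC_Q_CAP 8u   /* hypothetical queue capacity */
#define BURST       4u   /* hypothetical refill burst size */

/* Toy stand-in for the KNI alloc_q: it only tracks how many buffers it holds. */
struct toy_alloc_q {
    unsigned count;        /* buffers currently queued */
    unsigned alloc_calls;  /* number of allocation bursts attempted */
};

/* Try to add up to BURST buffers; returns how many actually fit. */
static unsigned toy_refill(struct toy_alloc_q *q)
{
    assert(q != NULL && q->count <= ALLOC_Q_CAP);

    unsigned room = ALLOC_Q_CAP - q->count;
    unsigned n = room < BURST ? room : BURST;

    q->alloc_calls++;
    q->count += n;
    return n;
}

/* Patched behaviour: refill only when the burst actually returned packets. */
static unsigned toy_rx_burst(struct toy_alloc_q *q, unsigned pkts_received)
{
    if (pkts_received)
        toy_refill(q);
    return pkts_received;
}
```

In this model, calling toy_refill() once at setup corresponds to the call added to rte_kni_alloc(), and every later refill is tied to a non-empty burst.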

Reviewed-by: Jay Rolette 

Jay

On Wed, Jan 28, 2015 at 10:49 AM, Neil Horman  wrote:

> On Wed, Jan 28, 2015 at 09:52:48AM -0600, Jay Rolette wrote:
> > There's a fairly old KNI patch (http://dpdk.org/dev/patchwork/patch/84/)
> > that I reviewed, but I'm not seeing how to submit my "Reviewed-by" when I
> > don't have any of the emails from the patch in my mail client.
> >
> > I can copy the text from the 'mbox' link in Patchwork into an email, but
> > I'm guessing that may not make the patch toolchain happy.
> >
> > What's the right way to do this?
> >
> Just grab the message id from the patchwork site, and list it in the envelope
> headers in-reply-to: field when you respond.  You won't have the rest of the
> conversation field in the thread, but you will respond properly to the thread,
> and patchwork will pick up the ACK
> Neil
>
> > Thanks,
> > Jay
> >
>


[dpdk-dev] [PATCH v2 0/6] new ntuple filter replaces 2tuple and 5tuple filters

2015-01-28 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Wu, Jingjing
> Sent: Thursday, January 22, 2015 7:38 AM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; De Lara Guarch, Pablo; Cao, Min; Xu, HuilongX
> Subject: [PATCH v2 0/6] new ntuple filter replaces 2tuple and 5tuple filters
> 
> v2 changes:
>   - remove the code which is already applied in patch "Integrate ethertype
> filter in igb/ixgbe driver to new API".
>   - modify commands' description in doc testpmd_funcs.rst.
> 
> The patch set uses filter_ctrl API to replace old 2tuple and 5tuple filter 
> APIs.
> It defines ntuple filter to combine 2tuple and 5tuple types.
> It uses new functions and structure to replace old ones in igb/ixgbe driver,
> new commands to replace old ones in testpmd, and removes the old APIs.
> It removes the filter's index parameters from user interface, only the
> filter's key and assigned queue are visible to user.
> 
> Jingjing Wu (6):
>   ethdev: define ntuple filter type and its structure
>   ixgbe: ntuple filter functions replace old ones for 5tuple filter
>   e1000: ntuple filter functions replace old ones for 2tuple and 5tuple
> filter
>   testpmd: new commands for ntuple filter
>   ethdev: remove old APIs and structures of 5tuple and 2tuple filters
>   doc: commands changed in testpmd_funcs for 2tuple and 5tuple filter
> 
>  app/test-pmd/cmdline.c  | 406 ++---
>  app/test-pmd/config.c   |  65 ---
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  99 +---
>  lib/librte_ether/rte_eth_ctrl.h |  57 ++
>  lib/librte_ether/rte_ethdev.c   | 116 
>  lib/librte_ether/rte_ethdev.h   | 192 --
>  lib/librte_pmd_e1000/e1000_ethdev.h |  69 ++-
>  lib/librte_pmd_e1000/igb_ethdev.c   | 869 +++--
> ---
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 468 +++
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  52 +-
>  10 files changed, 1300 insertions(+), 1093 deletions(-)
> 
> --
> 1.9.3

Acked-by: Pablo de Lara 

Just note that the last patch (changing the documentation) does not apply
properly, as there was another patch (from you, I think) that modifies that
document.
Could you send another version of the last patch?
I am not sure whether that is OK or whether it is better to send the full
patchset again.



[dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem

2015-01-28 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexandre Frigon
> Sent: Tuesday, January 27, 2015 8:31 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem
> 
> Hi all,
> 
>  I'm using dpdk 1.8 and pktgen-dpdk 2.8 to generate traffic on a back-to-back
> setup both equipped with 82599EB 10-Gigabit NIC.
> The problem is when I start it, pktgen indicates 1Mbits/s Tx with 64B packet
> size, but I'm receiving about 15% of it on the other end.
> This percentage seems to be proportional with the packet size.
> 
> e.g.
> Using nload to read Rx traffic
> Pktgen:   Tx: 1Mbits/s==> Other end:  Rx 1660 Mbits/s
>   Rate: 100%
>   Pkt size: 64B
> 
> 
> e.g 2
> Pktgen:   Tx: 1Mbits/s==> Other end:  Rx 9385 Mbits/s
>   Rate: 100%
>   Pkt size: 1518B
> 
> 
> Pktgen is started with this command on a Xeon(R) CPU E31270 @ 3.40GHz
> ./app/pktgen -c 1f -n 3 --proc-type auto --socket-mem 1024 --file-prefix pg -- -p 0x3 -P  -N -m "[1:3].0, [2:4].1"

From past experience, I don't assign more than one core per port. It had some
race condition issues, and one core is capable of RX or TX at full 10G.
Also check that you assign the proper cores/memory for your NICs (the same
NUMA node).

> 
> Is there something I'm not configuring correctly or something I have missed?
> 
> Also, the % rate is  acting strangely since anything above 50% doesn't change
> the Tx rate and anything below is modifying it
> e.g   Tx:  1Mbits/s   5000Mbits/s
>   %Rate:  >=50%   25%
> 
> 

Actually I am getting exactly opposite results :)
If I set rate to 50% I get
MBits/s Rx/Tx   : 0/9942 9942/0  9942/9942

For 10%:
MBits/s Rx/Tx   : 0/1997 1997/0  1997/1997

Which is about 2x the rate I set :D

Additionally, I get this message when "start 0" -> "stop 0" -> "start 0" is
issued:
PMD: ixgbe_dev_rx_init(): forcing scatter mode

So there is definitely something wrong there, but I don't know where.
Another issue I encountered is that the build system fails when building
out-of-tree.

Until this is fixed, you can try version 2.7.1, which is working for me.

Pawel


[dpdk-dev] Regarding UDP checksum offload

2015-01-28 Thread Olivier MATZ
Hi Prashant,

On 01/28/2015 12:25 PM, Prashant Upadhyaya wrote:
> Hi,
>
> I am aware that this topic has been discussed several times before, but I
> am somehow still stuck with this.
>
> I am using dpdk 1.6r1, intel 82599 NIC.
> I have an mbuf, I have hand-constructed a UDP packet (IPv4) in the data
> portion, filled the relevant fields of the headers and I do a tx burst. No
> problems, the destination gets the packet. I filled UDP checksum as zero
> and there was no checksum offloaded in ol_flags.
>
> Now in the same usecase, I want to offload UDP checksum.
> I am aware that the checksum field in UDP header has to be filled with the
> pseudo header checksum, I did that, duly added the PKT_TX_UDP_CKSUM flag in
> ol_flags, did a tx_burst and the packet does not reach the destination.
>
> I realized that I have to fill the following fields as well (my packet does
> not have vlan tag)
> mbuf->pkt.vlan_macip.f.l2_len
> mbuf->pkt.vlan_macip.f.l3_len
>
> so I filled the l2_len as 14 and l3_len as 20 (IP header with no options)
> Yet the packet did not reach the destination.
>
> So my question is -- am I filling the l2_len and l3_len properly ?
> Is there anything else to be done before I can get this UDP checksum
> offload to work properly for me.


As far as I remember, this should be working on 1.6r1.
When you say "did not reach the destination", do you mean that the
packet is not transmitted at all? Or is it transmitted with a wrong
checksum?

I think you should try to reproduce the issue with the latest DPDK
which is known to work with test-pmd (csum forward engine).
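
For reference, the pseudo-header checksum mentioned in the question can be sketched in plain C as below. This is a standalone illustration rather than a DPDK API: udp4_pseudo_header_sum is a hypothetical helper, it takes host-order values, and it returns the folded, non-inverted one's-complement sum that is then stored (in network byte order) in the UDP checksum field before transmit. Whether a given driver expects exactly this value, with the length included, is an assumption to verify against the PMD in use.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One's-complement sum of the IPv4 pseudo header for UDP (RFC 768):
 * source address, destination address, zero byte + protocol, UDP length.
 * Arguments and result are in host byte order; the result is NOT inverted.
 */
static uint16_t udp4_pseudo_header_sum(uint32_t src_ip, uint32_t dst_ip,
                                       uint16_t udp_len)
{
    assert(udp_len >= 8); /* the UDP length always includes the 8-byte header */

    uint32_t sum = 0;

    sum += (src_ip >> 16) & 0xffff;
    sum += src_ip & 0xffff;
    sum += (dst_ip >> 16) & 0xffff;
    sum += dst_ip & 0xffff;
    sum += 17;            /* IPPROTO_UDP, preceded by a zero byte */
    sum += udp_len;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}
```

Because the result is in host order, it would need a conversion such as htons() before being written into the UDP header.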

Regards,
Olivier



[dpdk-dev] [PATCH 0/6] Support NVGRE on i40e

2015-01-28 Thread Olivier MATZ
Hi Min,

On 01/27/2015 06:46 AM, Cao, Min wrote:
> Test by: min.cao 
> Patch name:   [dpdk-dev] [PATCH 0/6]  Support NVGRE on i40e
> Test Flag:Tested-by
> Tester name:  min.cao at intel.com
> Result summary:   total 2 cases, 2 passed, 0 failed
>
> Test Case 1:  
> Name: nvgre filter
> Environment:  OS: Fedora20 3.11.10-301.fc20.x86_64
>   gcc (GCC) 4.8.2
>   CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>   NIC: Fortville eagle
> [...]

Just one remark about your test report: it's quite useful to have
such reports, showing that a feature is tested. However, I think
it would be much better if you provided all the means to reproduce
the test (testpmd configuration, scripts to generate packets, ...).

For instance, this could help people wanting to implement the same
on another PMD to validate with the same test plan as yours.

Regards,
Olivier



[dpdk-dev] [PATCH] mk: allow application to override clean

2015-01-28 Thread Olivier MATZ
Hi Stephen,

On 01/23/2015 07:19 AM, stephen at networkplumber.org wrote:
> From: Stephen Hemminger 
>
> In some cases application may want to have additional rules
> for clean. This can be handled by allowing the double colon
> form of rule.
>
>   https://www.gnu.org/software/make/manual/html_node/Double_002dColon.html

There is already a way to do that in dpdk makefiles: you can add
the following code in your application Makefile, before the line
that includes $(RTE_SDK)/mk/rte.app.mk:

POSTCLEAN += my_clean

.PHONY: my_clean
my_clean:
	@echo executed after clean


Regards,
Olivier



[dpdk-dev] [PATCH v2 0/3] PMD ring MAC management, fix initialization, link up/down

2015-01-28 Thread Doherty, Declan
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tomasz Kulasek
> Sent: Monday, January 19, 2015 11:57 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 0/3] PMD ring MAC management, fix initialization,
> link up/down
> 
> Patch split into smaller parts to separate features from previous version.
> 
> Tomasz Kulasek (3):
>   PMD Ring - Add link up/down functions
>   PMD Ring - Add MAC addr add/remove functions
>   PMD Ring - Fix for per device management
> 
>  lib/librte_pmd_ring/rte_eth_ring.c |   62
> +---
>  1 file changed, 57 insertions(+), 5 deletions(-)
> 
> --
> 1.7.9.5

Acked-by: Declan Doherty 


[dpdk-dev] [PATCH v2] maintainers: start a Linux-style file

2015-01-28 Thread Thomas Monjalon
This MAINTAINERS file is inspired by the Linux one.

Almost all files are split into areas in order to identify maintainers of
each DPDK area. Note that a maintainer is not a git tree manager.
Candidates are welcome to send a patch to sign up for one or several areas.

There is a script to check coverage, especially when adding or moving files.

Signed-off-by: Thomas Monjalon 
Acked-by: Neil Horman 
---
Changes in v2:
- add copyright and licence to check-maintainers.sh
- minor improvements in the script
---
 MAINTAINERS  | 388 +++
 scripts/check-maintainers.sh | 117 +
 2 files changed, 505 insertions(+)
 create mode 100644 MAINTAINERS
 create mode 100755 scripts/check-maintainers.sh

diff --git a/MAINTAINERS b/MAINTAINERS
new file mode 100644
index 000..1f7d04a
--- /dev/null
+++ b/MAINTAINERS
@@ -0,0 +1,388 @@
+DPDK Maintainers
+
+
+The intention of this file is to provide a set of names that we can rely on
+for helping in patch reviews and questions.
+These names are additional recipients for emails sent to dev at dpdk.org.
+Please avoid private emails.
+
+Descriptions of section entries:
+
+   M: Maintainer's Full Name 
+   T: Git tree location.
+   F: Files and directories with wildcard patterns.
+  A trailing slash includes all files and subdirectory files.
+  A wildcard includes all files but not subdirectories.
+  One pattern per line. Multiple F: lines acceptable.
+   X: Files and directories exclusion, same rules as F:
+   K: Keyword regex pattern to match content.
+  One regex pattern per line. Multiple K: lines acceptable.
+
+
+General Project Administration
+--
+M: Thomas Monjalon 
+T: git://dpdk.org/dpdk
+F: MAINTAINERS
+F: scripts/check-maintainers.sh
+
+
+Security Issues
+---
+M: maintainers at dpdk.org
+
+
+Documentation (with overlaps)
+-
+F: doc/
+
+
+Build System
+
+F: GNUmakefile
+F: Makefile
+F: config/
+F: mk/
+F: pkg/
+F: scripts/depdirs-rule.sh
+F: scripts/gen-build-mk.sh
+F: scripts/gen-config-h.sh
+F: scripts/relpath.sh
+
+
+Environment Abstraction Layer
+-
+
+EAL API and common code
+M: Thomas Monjalon 
+F: lib/librte_eal/common/*
+F: lib/librte_eal/common/include/*
+F: lib/librte_eal/common/include/generic/
+F: app/test/test_alarm.c
+F: app/test/test_atomic.c
+F: app/test/test_byteorder.c
+F: app/test/test_common.c
+F: app/test/test_cpuflags.c
+F: app/test/test_cycles.c
+F: app/test/test_debug.c
+F: app/test/test_devargs.c
+F: app/test/test_eal*
+F: app/test/test_errno.c
+F: app/test/test_func_reentrancy.c
+F: app/test/test_interrupts.c
+F: app/test/test_logs.c
+F: app/test/test_memcpy*
+F: app/test/test_memory.c
+F: app/test/test_memzone.c
+F: app/test/test_pci.c
+F: app/test/test_per_lcore.c
+F: app/test/test_prefetch.c
+F: app/test/test_rwlock.c
+F: app/test/test_spinlock.c
+F: app/test/test_string_fns.c
+F: app/test/test_tailq.c
+F: app/test/test_version.c
+
+Secondary process
+K: RTE_PROC_
+F: doc/guides/prog_guide/multi_proc_support.rst
+F: app/test/test_mp_secondary.c
+F: examples/multi_process/
+F: doc/guides/sample_app_ug/multi_process.rst
+
+IBM Power
+F: lib/librte_eal/common/include/arch/ppc_64/
+
+Intel x86
+F: lib/librte_eal/common/include/arch/x86/
+
+Linux EAL (with overlaps)
+F: lib/librte_eal/linuxapp/Makefile
+F: lib/librte_eal/linuxapp/eal/
+F: doc/guides/linux_gsg/
+
+Linux UIO
+F: lib/librte_eal/linuxapp/igb_uio/
+F: lib/librte_eal/linuxapp/eal/*uio*
+
+Linux VFIO
+F: lib/librte_eal/linuxapp/eal/*vfio*
+
+Linux Xen
+F: lib/librte_eal/linuxapp/xen_dom0/
+F: lib/librte_eal/linuxapp/eal/*xen*
+F: lib/librte_eal/linuxapp/eal/include/exec-env/rte_dom0_common.h
+F: lib/librte_mempool/rte_dom0_mempool.c
+F: lib/librte_pmd_xenvirt/
+F: app/test-pmd/mempool_*
+F: examples/vhost_xen/
+F: doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
+
+FreeBSD EAL (with overlaps)
+F: lib/librte_eal/bsdapp/Makefile
+F: lib/librte_eal/bsdapp/eal/
+F: doc/guides/freebsd_gsg/
+
+FreeBSD contigmem
+F: lib/librte_eal/bsdapp/contigmem/
+
+FreeBSD UIO
+F: lib/librte_eal/bsdapp/nic_uio/
+
+
+Core Libraries
+--
+
+Memory management
+F: lib/librte_malloc/
+F: doc/guides/prog_guide/malloc_lib.rst
+F: app/test/test_malloc.c
+F: lib/librte_mempool/
+F: doc/guides/prog_guide/mempool_lib.rst
+F: app/test/test_mempool*
+F: app/test/test_func_reentrancy.c
+
+Ring queue
+F: lib/librte_ring/
+F: app/test/test_ring*
+F: app/test/test_func_reentrancy.c
+
+Packet buffer
+F: lib/librte_mbuf/
+F: doc/guides/prog_guide/mbuf_lib.rst
+F: app/test/test_mbuf.c
+
+Ethernet API
+M: Thomas Monjalon 
+F: lib/librte_ether/
+
+
+Drivers
+---
+
+Link bonding
+F: lib/librte_pmd_bond/
+F: doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
+F: app/test/test_link_bonding.c
+
+Linux KNI
+F: lib/librte_eal/linuxapp/kni/
+F: lib/librte_kni/
+F: 

[dpdk-dev] Process question: reviewing older patches

2015-01-28 Thread Neil Horman
On Wed, Jan 28, 2015 at 09:52:48AM -0600, Jay Rolette wrote:
> There's a fairly old KNI patch (http://dpdk.org/dev/patchwork/patch/84/)
> that I reviewed, but I'm not seeing how to submit my "Reviewed-by" when I
> don't have any of the emails from the patch in my mail client.
> 
> I can copy the text from the 'mbox' link in Patchwork into an email, but
> I'm guessing that may not make the patch toolchain happy.
> 
> What's the right way to do this?
> 
Just grab the message id from the patchwork site, and list it in the envelope
headers in-reply-to: field when you respond.  You won't have the rest of the
conversation field in the thread, but you will respond properly to the thread,
and patchwork will pick up the ACK
Neil

> Thanks,
> Jay
> 


[dpdk-dev] ACL trie insertion and search

2015-01-28 Thread Ananyev, Konstantin
Hi

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rapelly, Varun
> Sent: Wednesday, January 28, 2015 9:07 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] ACL trie insertion and search
> 
> Hi,
> 
> We were converting the acl rule data, from host to network byte order,
> [by mistake] while inserting into trie. And while searching we are not
> converting the search data to n/w byte order.
> With the above also rules are matching, except few scenarios.
> 
> After correcting the above mistake, all rules are matching perfectly fine.
> 
> I believe the above[converting while insertion] should also work for all
> rules. Please clarify the above.

Yes, that's correct.
rte_acl_add_rules() expects all fields to be in host byte order, while
rte_acl_classify() expects all fields in input data buffers to be in network
byte order, as mentioned in the comments in rte_acl.h.
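
The asymmetry can be sketched without the ACL library itself. In the snippet below, toy_rule and toy_put_ipv4_key are hypothetical names used only for illustration: the rule keeps the address as a host-order integer, as one would pass to rte_acl_add_rules(), while the lookup buffer gets the same address laid out big-endian, as it would appear in a packet handed to rte_acl_classify().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule field: stored as a plain host-order integer. */
struct toy_rule {
    uint32_t dst_ip;      /* host byte order, e.g. 0xC0A80001 for 192.168.0.1 */
    uint8_t  prefix_len;
};

/* Write an IPv4 address into a lookup buffer in network (big-endian) order. */
static void toy_put_ipv4_key(uint8_t *out, uint32_t ip_host_order)
{
    assert(out != NULL);

    out[0] = (uint8_t)(ip_host_order >> 24);
    out[1] = (uint8_t)(ip_host_order >> 16);
    out[2] = (uint8_t)(ip_host_order >> 8);
    out[3] = (uint8_t)ip_host_order;
}
```

Swapping the rule data at insertion time, as described in the question, would mismatch these two layouts on a little-endian host, which is consistent with only some rules matching.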

Konstantin

> 
> Regards,
> Varun



[dpdk-dev] ACL trie insertion and search

2015-01-28 Thread Rapelly, Varun
Hi,

We were converting the acl rule data, from host to network byte order,
[by mistake] while inserting into trie. And while searching we are not
converting the search data to n/w byte order.
With the above also rules are matching, except few scenarios.

After correcting the above mistake, all rules are matching perfectly fine.

I believe the above[converting while insertion] should also work for all
rules. Please clarify the above.

Regards,
Varun



[dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers

2015-01-28 Thread Ouyang, Changchun

> -Original Message-
> From: Xie, Huawei
> Sent: Wednesday, January 28, 2015 12:16 AM
> To: Stephen Hemminger
> Cc: Ouyang, Changchun; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers
> 
> 
> 
> > -Original Message-
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Tuesday, January 27, 2015 5:59 PM
> > To: Xie, Huawei
> > Cc: Ouyang, Changchun; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers
> >
> >
> > > I recall our original code is virtio_wmb().
> > > Use store fence to ensure all updates to entries before updating the index.
> > > Why do we need virtio_rmb() here and add virtio_wmb after
> > vq_update_avail_idx()?
> >
> > Store fence is unnecessary, Intel CPU's are cache coherent, please
> > read the virtio Linux ring header file for explanation. A full fence
> > WMB is more expensive and causes CPU stall
> >
> 
> 
> I mean virtio_wmb rather than virtio_rmb should be used here, and both of
> them are defined as compiler barrier.
> 
> The following code is linux virtio driver for adding buffer to vring.
>   /* Put entry in available array (but don't update avail->idx until they
>    * do sync). */
>   avail = (vq->vring.avail->idx & (vq->vring.num-1));
>   vq->vring.avail->ring[avail] = head;
> 
>   /* Descriptors and available array need to be set before we expose the
>    * new available array entries. */
>   virtio_wmb(vq->weak_barriers);
>   vq->vring.avail->idx++;
> 

Yes, using virtio_wmb is better here; I will change it in the next version.
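
The ordering being discussed (fill the ring slot, then make the new index visible) can be sketched in portable C11. This is not the virtio PMD code: toy_avail, toy_avail_publish and TOY_RING_SIZE are hypothetical, and a C11 release store stands in for the virtio_wmb() plus index update pair, which is an assumption made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u  /* hypothetical; must be a power of two */

struct toy_avail {
    _Atomic uint16_t idx;          /* free-running index read by the consumer */
    uint16_t ring[TOY_RING_SIZE];  /* descriptor heads */
};

/* Publish one descriptor head: fill the slot first, then expose the new idx. */
static void toy_avail_publish(struct toy_avail *a, uint16_t head)
{
    assert(head < TOY_RING_SIZE);

    uint16_t idx = atomic_load_explicit(&a->idx, memory_order_relaxed);

    a->ring[idx & (TOY_RING_SIZE - 1)] = head;

    /* Release store: the slot write above cannot be reordered after it. */
    atomic_store_explicit(&a->idx, (uint16_t)(idx + 1), memory_order_release);
}
```

A consumer that reads idx with acquire semantics is then guaranteed to see the slot contents written before the index moved.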

Thanks
Changchun




[dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State interrupt

2015-01-28 Thread Ouyang, Changchun
Hi Stephen,

> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, January 27, 2015 6:00 PM
> To: Xie, Huawei
> Cc: Ouyang, Changchun; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State
> interrupt
> 
> On Tue, 27 Jan 2015 09:04:07 +
> "Xie, Huawei"  wrote:
> 
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang
> > > Changchun
> > > Sent: Tuesday, January 27, 2015 10:36 AM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link
> > > State interrupt
> > >
> > > Virtio has link state interrupt which can be used.
> > >
> > > Signed-off-by: Stephen Hemminger 
> > > Signed-off-by: Changchun Ouyang 
> > > ---
> > >  lib/librte_pmd_virtio/virtio_ethdev.c | 78
> > > +++--
> > > --
> > >  lib/librte_pmd_virtio/virtio_pci.c| 22 ++
> > >  lib/librte_pmd_virtio/virtio_pci.h|  4 ++
> > >  3 files changed, 86 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c
> > > b/lib/librte_pmd_virtio/virtio_ethdev.c
> > > index 5df3b54..ef87ff8 100644
> > > --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> > > +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> > > @@ -845,6 +845,34 @@ static int virtio_resource_init(struct
> > > rte_pci_device *pci_dev __rte_unused)  #endif
> > >
> > >  /*
> > > + * Process Virtio Config changed interrupt and call the callback
> > > + * if link state changed.
> > > + */
> > > +static void
> > > +virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
> > > +  void *param)
> > > +{
> > > + struct rte_eth_dev *dev = param;
> > > + struct virtio_hw *hw =
> > > + VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > > + uint8_t isr;
> > > +
> > > + /* Read interrupt status which clears interrupt */
> > > + isr = vtpci_isr(hw);
> > > + PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
> > > +
> > > + if (rte_intr_enable(>pci_dev->intr_handle) < 0)
> > > + PMD_DRV_LOG(ERR, "interrupt enable failed");
> > > +
> >
> > Is it better to put rte_intr_enable after we have handled the interrupt.
> > Is there the possibility of interrupt reentrant in uio intr framework?
> 
> The UIO framework handles IRQ's via posix thread that is reading fd, then
> calling this code. Therefore it is always single threaded.

Even if it is under the UIO framework and always single-threaded,
how about moving rte_intr_enable after virtio_dev_link_update() and
_rte_eth_dev_callback_process() are called?
This makes it more like an interrupt handler in the Linux kernel.
What do you think?
Thanks
Changchun



[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-28 Thread Wang, Zhihong


> -Original Message-
> From: Ananyev, Konstantin
> Sent: Tuesday, January 27, 2015 8:20 PM
> To: Wang, Zhihong; Richardson, Bruce; 'Marc Sune'
> Cc: 'dev at dpdk.org'
> Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> 
> 
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Tuesday, January 27, 2015 11:30 AM
> > To: Wang, Zhihong; Richardson, Bruce; Marc Sune
> > Cc: dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >
> >
> >
> > > -Original Message-
> > > From: Wang, Zhihong
> > > Sent: Tuesday, January 27, 2015 1:42 AM
> > > To: Ananyev, Konstantin; Richardson, Bruce; Marc Sune
> > > Cc: dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Ananyev, Konstantin
> > > > Sent: Tuesday, January 27, 2015 2:29 AM
> > > > To: Wang, Zhihong; Richardson, Bruce; Marc Sune
> > > > Cc: dev at dpdk.org
> > > > Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > >
> > > > Hi Zhihong,
> > > >
> > > > > -Original Message-
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang,
> > > > > Zhihong
> > > > > Sent: Friday, January 23, 2015 6:52 AM
> > > > > To: Richardson, Bruce; Marc Sune
> > > > > Cc: dev at dpdk.org
> > > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > > >
> > > > >
> > > > >
> > > > > > -Original Message-
> > > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > > > > Richardson
> > > > > > Sent: Wednesday, January 21, 2015 9:26 PM
> > > > > > To: Marc Sune
> > > > > > Cc: dev at dpdk.org
> > > > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > > > >
> > > > > > On Wed, Jan 21, 2015 at 02:21:25PM +0100, Marc Sune wrote:
> > > > > > >
> > > > > > > On 21/01/15 14:02, Bruce Richardson wrote:
> > > > > > > >On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
> > > > > > > >>On 21/01/15 04:44, Wang, Zhihong wrote:
> > > > > > > -Original Message-
> > > > > > > From: Richardson, Bruce
> > > > > > > Sent: Wednesday, January 21, 2015 12:15 AM
> > > > > > > To: Neil Horman
> > > > > > > Cc: Wang, Zhihong; dev at dpdk.org
> > > > > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy
> > > > > > > optimization
> > > > > > > 
> > > > > > > On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> > > > > > > >On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong
> wrote:
> > > > > > > >>>-Original Message-
> > > > > > > >>>From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > > > > >>>Sent: Monday, January 19, 2015 9:02 PM
> > > > > > > >>>To: Wang, Zhihong
> > > > > > > >>>Cc: dev at dpdk.org
> > > > > > > >>>Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy
> > > > > > > >>>optimization
> > > > > > > >>>
> > > > > > > >>>On Mon, Jan 19, 2015 at 09:53:30AM +0800,
> > > > > > > >>>zhihong.wang at intel.com
> > > > > > > wrote:
> > > > > > > This patch set optimizes memcpy for DPDK for both
> > > > > > > SSE and AVX
> > > > > > > platforms.
> > > > > > > It also extends memcpy test coverage with unaligned
> > > > > > > cases and more test
> > > > > > > >>>points.
> > > > > > > Optimization techniques are summarized below:
> > > > > > > 
> > > > > > > 1. Utilize full cache bandwidth
> > > > > > > 
> > > > > > > 2. Enforce aligned stores
> > > > > > > 
> > > > > > > 3. Apply load address alignment based on
> > > > > > > architecture features
> > > > > > > 
> > > > > > > 4. Make load/store address available as early as
> > > > > > > possible
> > > > > > > 
> > > > > > > 5. General optimization techniques like inlining,
> > > > > > > branch reducing, prefetch pattern access
> > > > > > > 
> > > > > > > Zhihong Wang (4):
> > > > > > >    Disabled VTA for memcpy test in app/test/Makefile
> > > > > > >    Removed unnecessary test cases in test_memcpy.c
> > > > > > >    Extended test coverage in test_memcpy_perf.c
> > > > > > >    Optimized memcpy in arch/x86/rte_memcpy.h for
> > > > > > > both SSE
> > > > > > and AVX
> > > > > > >  platforms
> > > > > > > 
> > > > > > >   app/test/Makefile  |   
> > > > > > >  6 +
> > > > > > >   app/test/test_memcpy.c |  
> > > > > > >  52 +-
> > > > > > >   app/test/test_memcpy_perf.c| 238
> +---
> > > > > > >   .../common/include/arch/x86/rte_memcpy.h   |
> 664
> > > > > > > >>>+++--
> > > > > > >   4 files changed, 656 insertions(+), 304
> > > > > > >  deletions(-)
> > > > > > > 

[dpdk-dev] [PATCH v2] testpmd check return value of rte_eth_dev_vlan_filter()

2015-01-28 Thread Qiu, Michael
On 1/28/2015 1:20 AM, Michal Jastrzebski wrote:
> This patch modifies testpmd behavior when setting:
> rx_vlan add all vf_port (enabling all vlanids
> to be passed thru rx filter on VF).
> Rx_vlan_all_filter_set() function,
> checks if the next vlanid can be enabled by the driver.
> Number of vlanids is limited by the NIC and thus the NIC
> do not allow to enable more vlanids than it can allocate
> in VFTA table.

But what if the enable fails because of some other issue?


> v2 - fix formatting errors
>
> Signed-off-by: Michal Jastrzebski 
> ---
>  app/test-pmd/config.c |   15 +--
>  app/test-pmd/testpmd.h|2 +-
>  lib/librte_ether/rte_ethdev.c |4 ++--
>  3 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index c40f819..eda737e 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -1643,21 +1643,22 @@ rx_vlan_filter_set(portid_t port_id, int on)
>  "diag=%d\n", port_id, on, diag);
>  }
>  
> -void
> +int
>  rx_vft_set(portid_t port_id, uint16_t vlan_id, int on)
>  {
>   int diag;
>  
>   if (port_id_is_invalid(port_id))
> - return;
> + return 1;
>   if (vlan_id_is_invalid(vlan_id))
> - return;
> + return 1;
>   diag = rte_eth_dev_vlan_filter(port_id, vlan_id, on);
>   if (diag == 0)
> - return;
> + return 0;
>   printf("rte_eth_dev_vlan_filter(port_pi=%d, vlan_id=%d, on=%d) failed "
>  "diag=%d\n",
>  port_id, vlan_id, on, diag);
> + return -1;
>  }
>  
>  void
> @@ -1667,8 +1668,10 @@ rx_vlan_all_filter_set(portid_t port_id, int on)
>  
>   if (port_id_is_invalid(port_id))
>   return;
> - for (vlan_id = 0; vlan_id < 4096; vlan_id++)
> - rx_vft_set(port_id, vlan_id, on);
> + for (vlan_id = 0; vlan_id < 4096; vlan_id++){

Did you use a tab before "{"? A single space is fine.

Thanks,
Michael

> + if (rx_vft_set(port_id, vlan_id, on))
> + break;
> + }
>  }
>  
>  void
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 8f5e6c7..e0186b9 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -492,7 +492,7 @@ void rx_vlan_strip_set_on_queue(portid_t port_id, uint16_t queue_id, int on);
>  
>  void rx_vlan_filter_set(portid_t port_id, int on);
>  void rx_vlan_all_filter_set(portid_t port_id, int on);
> -void rx_vft_set(portid_t port_id, uint16_t vlan_id, int on);
> +int rx_vft_set(portid_t port_id, uint16_t vlan_id, int on);
>  void vlan_extend_set(portid_t port_id, int on);
>  void vlan_tpid_set(portid_t port_id, uint16_t tp_id);
>  void tx_vlan_set(portid_t port_id, uint16_t vlan_id);
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index ea3a1fb..064b5d6 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -1519,8 +1519,8 @@ rte_eth_dev_vlan_filter(uint8_t port_id, uint16_t vlan_id, int on)
>   return (-EINVAL);
>   }
>   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vlan_filter_set, -ENOTSUP);
> - (*dev->dev_ops->vlan_filter_set)(dev, vlan_id, on);
> - return (0);
> +
> + return (*dev->dev_ops->vlan_filter_set)(dev, vlan_id, on);
>  }
>  
>  int