[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-05-31 Thread Jianbo Liu
Change the inline functions to macros that take the prefetch method as a parameter.

Signed-off-by: Jianbo Liu 
---
 drivers/net/fm10k/fm10k_rxtx_vec.c      |  8 ++++----
 drivers/net/i40e/i40e_rxtx_vec.c        |  8 ++++----
 drivers/net/ixgbe/ixgbe_rxtx_vec.c      |  8 ++++----
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 12 ++++++++----
 drivers/net/mlx4/mlx4.c                 |  4 ++--
 drivers/net/mlx5/mlx5_rxtx.c            |  4 ++--
 examples/ipsec-secgw/ipsec-secgw.c      |  2 +-
 lib/librte_mbuf/rte_mbuf.h              | 25 +++++++++++++------------
 8 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index ef256a5..0e4c91c 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -487,10 +487,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
rte_compiler_barrier();

if (split_packet) {
-   rte_mbuf_prefetch_part2(rx_pkts[pos]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 1]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 2]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 3]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 1]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 2]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 3]);
}

/* D.1 pkt 3,4 convert format from desc to pktmbuf */
diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index eef80d9..a5c4847 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -297,10 +297,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);

if (split_packet) {
-   rte_mbuf_prefetch_part2(rx_pkts[pos]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 1]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 2]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 3]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 1]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 2]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 3]);
}

/* avoid compiler reorder optimization */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index 09f4892..55adb56 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -308,10 +308,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);

if (split_packet) {
-   rte_mbuf_prefetch_part2(rx_pkts[pos]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 1]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 2]);
-   rte_mbuf_prefetch_part2(rx_pkts[pos + 3]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 1]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 2]);
+   RTE_MBUF_PREFETCH_PART2(prefetch0, rx_pkts[pos + 3]);
}

/* avoid compiler reorder optimization */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index 9c1d124..941b2d5 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -280,10 +280,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2);

if (split_packet) {
-   rte_prefetch_non_temporal(&rx_pkts[pos]->cacheline1);
-   rte_prefetch_non_temporal(&rx_pkts[pos + 1]->cacheline1);
-   rte_prefetch_non_temporal(&rx_pkts[pos + 2]->cacheline1);
-   rte_prefetch_non_temporal(&rx_pkts[pos + 3]->cacheline1);
+   RTE_MBUF_PREFETCH_PART2(prefetch_non_temporal,
+   rx_pkts[pos]);
+   RTE_MBUF_PREFETCH_PART2(prefetch_non_temporal,
+   rx_pkts[pos + 1]);
+   RTE_MBUF_PREFETCH_PART2(prefetch_non_temporal,
+   rx_pkts[pos + 2]);
+   RTE_MBUF_PREFETCH_PART2(prefetch_non_temporal,
+   rx_pkts[pos + 3]);
  

2016-05-31 Thread Olivier MATZ
Hi Jianbo,

On 05/31/2016 05:06 AM, Jianbo Liu wrote:
> Change the inline function to macro with parameters
> 
> Signed-off-by: Jianbo Liu 
>
> [...]
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -849,14 +849,15 @@ struct rte_mbuf {
>   * in the receive path. If the cache line of the architecture is higher than
>   * 64B, the second part will also be prefetched.
>   *
> + * @param method
> + *   The prefetch method: prefetch0, prefetch1, prefetch2 or
> + *prefetch_non_temporal.
> + *
>   * @param m
>   *   The pointer to the mbuf.
>   */
> -static inline void
> -rte_mbuf_prefetch_part1(struct rte_mbuf *m)
> -{
> - rte_prefetch0(&m->cacheline0);
> -}
> +#define RTE_MBUF_PREFETCH_PART1(method, m)   \
> + rte_##method(&(m)->cacheline0)

I'm not a big fan of this macro, because it allows pasting
really anything:

  RTE_MBUF_PREFETCH_PART1(pktmbuf_free, m)

would expand to:

  rte_pktmbuf_free(&(m)->cacheline0)

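To make the concern concrete, here is a minimal, self-contained sketch of
the token-pasting problem. It assumes GCC or Clang (for
__builtin_prefetch). struct mbuf_stub, the two call counters and both
stub functions are hypothetical stand-ins written for this example; they
borrow DPDK's names only so that the pasted identifiers resolve, and the
stub rte_pktmbuf_free() deliberately takes a void * so that the misuse
compiles without a diagnostic.

```c
#include <assert.h>

/* Hypothetical stand-in for struct rte_mbuf (not the DPDK layout). */
struct mbuf_stub {
	char cacheline0[64];
};

static int prefetch_calls;
static int free_calls;

/* Stub: issues a real prefetch hint and counts the call. */
static inline void rte_prefetch0(const volatile void *p)
{
	__builtin_prefetch((const void *)p, 0, 3);
	prefetch_calls++;
}

/* Stub with a loose signature; it frees nothing, it only counts. */
static inline void rte_pktmbuf_free(void *p)
{
	(void)p;
	free_calls++;
}

/* Same shape as the macro in the patch: 'method' is pasted blindly. */
#define RTE_MBUF_PREFETCH_PART1(method, m) \
	rte_##method(&(m)->cacheline0)

static int demo_macro_misuse(void)
{
	struct mbuf_stub m = { {0} };
	struct mbuf_stub *mp = &m;

	prefetch_calls = 0;
	free_calls = 0;

	RTE_MBUF_PREFETCH_PART1(prefetch0, mp);    /* intended use */
	RTE_MBUF_PREFETCH_PART1(pktmbuf_free, mp); /* also accepted */

	assert(prefetch_calls == 1);
	return free_calls;
}
```

In this sketch demo_macro_misuse() returns 1: a macro that is documented
as a prefetch has called the free stub once, and nothing in the macro
itself could have prevented it.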

I'd prefer to have a switch case like this, almost similar
to what Keith proposed in the initial discussion for my
patch:

enum rte_mbuf_prefetch_type {
	PREFETCH0,
	PREFETCH1,
	...
};

static inline void
rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
	struct rte_mbuf *m)
{
	switch (type) {
	case PREFETCH0:
		rte_prefetch0(&m->cacheline0);
		break;
	case PREFETCH1:
		rte_prefetch1(&m->cacheline0);
		break;
	...
	}
}

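A compilable sketch of the enum-plus-switch idea follows. It is not the
DPDK implementation: struct mbuf_stub, the MBUF_PREFETCH* names and the
int return value are assumptions made for this example, and the hints are
expressed with the GCC/Clang builtin __builtin_prefetch(addr, rw,
locality), where locality 3 asks for the data to be kept in all cache
levels and 0 marks it as having no temporal locality.

```c
#include <assert.h>

/* Hypothetical stand-in for struct rte_mbuf: two 64-byte halves. */
struct mbuf_stub {
	char cacheline0[64];
	char cacheline1[64];
};

enum mbuf_prefetch_type {
	MBUF_PREFETCH0,
	MBUF_PREFETCH1,
	MBUF_PREFETCH2,
	MBUF_PREFETCH_NON_TEMPORAL,
};

/* Returns 0 for a known type and -1 otherwise. The return value is an
 * addition for this sketch so that the dispatch can be observed. */
static inline int
mbuf_prefetch_part1(enum mbuf_prefetch_type type, struct mbuf_stub *m)
{
	switch (type) {
	case MBUF_PREFETCH0:
		__builtin_prefetch(m->cacheline0, 0, 3);
		return 0;
	case MBUF_PREFETCH1:
		__builtin_prefetch(m->cacheline0, 0, 2);
		return 0;
	case MBUF_PREFETCH2:
		__builtin_prefetch(m->cacheline0, 0, 1);
		return 0;
	case MBUF_PREFETCH_NON_TEMPORAL:
		__builtin_prefetch(m->cacheline0, 0, 0);
		return 0;
	}
	return -1;
}

static int demo_switch_dispatch(void)
{
	struct mbuf_stub m = { {0}, {0} };
	int rc = 0;

	rc |= mbuf_prefetch_part1(MBUF_PREFETCH0, &m);
	rc |= mbuf_prefetch_part1(MBUF_PREFETCH_NON_TEMPORAL, &m);

	/* A value outside the enum falls through to the -1 path. */
	assert(mbuf_prefetch_part1((enum mbuf_prefetch_type)99, &m) == -1);
	return rc;
}
```

Because the function is inline and callers would normally pass a constant
type, an optimizing compiler can usually reduce the switch to the single
selected prefetch, so the typed version should not need to cost more than
the macro.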

Some questions: could you give some details about the use
of non-temporal prefetch in ixgbe_vec_neon? What are the
pros and cons, and would it be useful in other drivers?
Currently all drivers are doing prefetch0 when they prefetch
the mbuf structure. Some drivers use prefetch1 for data.


By the way, I did not try to apply the patch, but it looks
like it's on top of dpdk-next-net/rel_16_07, right?

Thanks,
Olivier


2016-05-31 Thread Stephen Hemminger
On Tue, 31 May 2016 08:36:06 +0530
Jianbo Liu  wrote:

> Change the inline function to macro with parameters
> 
> Signed-off-by: Jianbo Liu 

Going from typed (inline) to untyped (macro) is a step backwards
in code safety.

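A small sketch of what is lost on the pointer argument when going from
inline to macro. Everything here is hypothetical and written only for
this example (struct mbuf_stub, struct unrelated, stub_prefetch0() and
the two PART1 variants); none of it is DPDK code.

```c
#include <assert.h>

struct mbuf_stub {
	char cacheline0[64];
};

/* Not an mbuf at all; it merely has a member with the same name. */
struct unrelated {
	int cacheline0;
};

static int calls;

static inline void stub_prefetch0(const volatile void *p)
{
	(void)p;
	calls++;
}

/* Untyped: accepts any pointer whose target has a cacheline0 member. */
#define PREFETCH_PART1_MACRO(m) stub_prefetch0(&(m)->cacheline0)

/* Typed: the compiler checks that m is a struct mbuf_stub *. */
static inline void prefetch_part1_inline(struct mbuf_stub *m)
{
	stub_prefetch0(&m->cacheline0);
}

static int demo_typed_vs_untyped(void)
{
	struct mbuf_stub m = { {0} };
	struct unrelated u = { 0 };

	calls = 0;
	prefetch_part1_inline(&m);
	PREFETCH_PART1_MACRO(&m);
	PREFETCH_PART1_MACRO(&u);	/* compiles silently */
	/* prefetch_part1_inline(&u); would be diagnosed by the compiler
	 * as an incompatible pointer type. */

	assert(calls == 3);
	return calls;
}
```

The macro form is happy with struct unrelated, while the commented-out
inline call is the one a compiler would flag.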

2016-06-21 Thread Olivier Matz
Hi,

On 06/02/2016 11:30 AM, Jerin Jacob wrote:
> On Thu, Jun 02, 2016 at 05:04:13PM +0800, Jianbo Liu wrote:
>> On 1 June 2016 at 14:00, Jerin Jacob wrote:
>>> On Wed, Jun 01, 2016 at 11:29:47AM +0800, Jianbo Liu wrote:
>>>> On 1 June 2016 at 03:28, Olivier MATZ wrote:
>>>>> Hi Jianbo,
>>>>>
>>>>> On 05/31/2016 05:06 AM, Jianbo Liu wrote:
>>>>>> Change the inline function to macro with parameters
>>>>>>
>>>>>> Signed-off-by: Jianbo Liu 
>>>>>>
>>>>>> [...]
>> [...]
>>>> It's for performance consideration, and only on armv8a platform.
>>>
>>> Strictly it is not armv8 specific, IA also implemented this API with
>>> _MM_HINT_NTA hint.
>>
>> I mean this patch is only for ixgbe vector PMD on armv8 platform.
>>
>>>
>>> Do we really need non-temporal/transient version of prefetch for ixgbe?
>>
>> Strictly speaking, we don't have to since we don't know how APPs use
>> the mbuf header.
> 
> Then IMO it makes sense to keep the same behavior as x86 ixgbe driver.
> Then on the upside, We may not need the new macros for part prefetching
> 
> Jerin

Knowing that http://www.dpdk.org/dev/patchwork/patch/13992/ has been
submitted, I think this patch can be marked as closed in patchwork.


2016-06-01 Thread Jianbo Liu
On 1 June 2016 at 03:28, Olivier MATZ  wrote:
> Hi Jianbo,
>
> On 05/31/2016 05:06 AM, Jianbo Liu wrote:
>> Change the inline function to macro with parameters
>>
>> Signed-off-by: Jianbo Liu 
>>
>> [...]
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -849,14 +849,15 @@ struct rte_mbuf {
>>   * in the receive path. If the cache line of the architecture is higher than
>>   * 64B, the second part will also be prefetched.
>>   *
>> + * @param method
>> + *   The prefetch method: prefetch0, prefetch1, prefetch2 or
>> + *prefetch_non_temporal.
>> + *
>>   * @param m
>>   *   The pointer to the mbuf.
>>   */
>> -static inline void
>> -rte_mbuf_prefetch_part1(struct rte_mbuf *m)
>> -{
>> - rte_prefetch0(&m->cacheline0);
>> -}
>> +#define RTE_MBUF_PREFETCH_PART1(method, m)   \
>> + rte_##method(&(m)->cacheline0)
>
> I'm not very fan of this macro, because it allows to
> really do everything):
>
>   RTE_MBUF_PREFETCH_PART1(pktmbuf_free, m)
>
> would expand as:
>
>   rte_pktmbuf_free(m)
>
>
> I'd prefer to have a switch case like this, almost similar
> to what Keith proposed in the initial discussion for my
> patch:
>
> enum rte_mbuf_prefetch_type {
> PREFETCH0,
> PREFETCH1,
> ...
> };
>
> static inline void
> rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
> struct rte_mbuf *m)
> {
> switch (type) {
> case PREFETCH0:
> rte_prefetch0(&m->cacheline0);
> break;
> case PREFETCH1:
> rte_prefetch1(&m->cacheline0);
> break;
> ...
> }
>
How about adding these to forbid the illegal use of this macro?
enum rte_mbuf_prefetch_type {
 ENUM_prefetch0,
 ENUM_prefetch1,
 ...
};

#define RTE_MBUF_PREFETCH_PART1(type, m) \
do { \
	if (ENUM_##type == ENUM_prefetch0) \
		rte_prefetch0(&(m)->cacheline0); \
	else if (ENUM_##type == ENUM_prefetch1) \
		rte_prefetch1(&(m)->cacheline0); \
} while (0)

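A self-contained sketch of this guarded-macro idea, with the if/else
wrapped in do { } while (0) so that the macro behaves as one statement
inside a caller's own if/else. The stub functions, counters and struct
mbuf_stub are hypothetical and exist only for this example. The guard
works because an unknown method name pastes into an undeclared ENUM_
identifier, which the compiler rejects.

```c
#include <assert.h>

/* Hypothetical stand-in for struct rte_mbuf. */
struct mbuf_stub {
	char cacheline0[64];
};

enum mbuf_prefetch_type {
	ENUM_prefetch0,
	ENUM_prefetch1,
};

static int calls0;
static int calls1;

static inline void stub_prefetch0(const volatile void *p)
{
	(void)p;
	calls0++;
}

static inline void stub_prefetch1(const volatile void *p)
{
	(void)p;
	calls1++;
}

/* 'type' must paste into one of the enumerators above to compile. */
#define MBUF_PREFETCH_PART1(type, m) \
	do { \
		if (ENUM_##type == ENUM_prefetch0) \
			stub_prefetch0(&(m)->cacheline0); \
		else if (ENUM_##type == ENUM_prefetch1) \
			stub_prefetch1(&(m)->cacheline0); \
	} while (0)

static int demo_enum_guard(void)
{
	struct mbuf_stub m = { {0} };
	int use_first = 1;

	calls0 = 0;
	calls1 = 0;

	/* Safe inside an unbraced if/else thanks to do { } while (0). */
	if (use_first)
		MBUF_PREFETCH_PART1(prefetch0, &m);
	else
		MBUF_PREFETCH_PART1(prefetch1, &m);

	MBUF_PREFETCH_PART1(prefetch1, &m);

	/* MBUF_PREFETCH_PART1(pktmbuf_free, &m); would not compile:
	 * ENUM_pktmbuf_free is not declared. */

	assert(calls0 == 1 && calls1 == 1);
	return calls0 * 10 + calls1;
}
```

The guard addresses the arbitrary-method problem, but the m argument is
still unchecked, which is the remaining difference from a static inline
function.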

>
> Some questions: could you give some details about the use
> of non-temporal prefetch in ixgbe_vec_neon? What are the
> pros and cons, and would it be useful in other drivers?
> Currently all drivers are doing prefetch0 when they prefetch
> the mbuf structure. Some drivers use prefetch1 for data.
>
It's for performance reasons, and only on the armv8a platform.

>
> By the way, I did not try to apply the patch, but it looks
> it's on top of dpdk-next-net/rel_16_07, right?
>
Yes


2016-06-01 Thread Jerin Jacob
On Wed, Jun 01, 2016 at 11:29:47AM +0800, Jianbo Liu wrote:
> On 1 June 2016 at 03:28, Olivier MATZ  wrote:
> > Hi Jianbo,
> >
> > On 05/31/2016 05:06 AM, Jianbo Liu wrote:
> >> Change the inline function to macro with parameters
> >>
> >> Signed-off-by: Jianbo Liu 
> >>
> >> [...]
> >> --- a/lib/librte_mbuf/rte_mbuf.h
> >> +++ b/lib/librte_mbuf/rte_mbuf.h
> >> @@ -849,14 +849,15 @@ struct rte_mbuf {
> >>   * in the receive path. If the cache line of the architecture is higher than
> >>   * 64B, the second part will also be prefetched.
> >>   *
> >> + * @param method
> >> + *   The prefetch method: prefetch0, prefetch1, prefetch2 or
> >> + *prefetch_non_temporal.
> >> + *
> >>   * @param m
> >>   *   The pointer to the mbuf.
> >>   */
> >> -static inline void
> >> -rte_mbuf_prefetch_part1(struct rte_mbuf *m)
> >> -{
> >> - rte_prefetch0(&m->cacheline0);
> >> -}
> >> +#define RTE_MBUF_PREFETCH_PART1(method, m)   \
> >> + rte_##method(&(m)->cacheline0)
> >
> > I'm not very fan of this macro, because it allows to
> > really do everything):
> >
> >   RTE_MBUF_PREFETCH_PART1(pktmbuf_free, m)
> >
> > would expand as:
> >
> >   rte_pktmbuf_free(m)
> >
> >
> > I'd prefer to have a switch case like this, almost similar
> > to what Keith proposed in the initial discussion for my
> > patch:
> >
> > enum rte_mbuf_prefetch_type {
> > PREFETCH0,
> > PREFETCH1,
> > ...
> > };
> >
> > static inline void
> > rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
> > struct rte_mbuf *m)
> > {
> > switch (type) {
> > case PREFETCH0:
> > rte_prefetch0(&m->cacheline0);
> > break;
> > case PREFETCH1:
> > rte_prefetch1(&m->cacheline0);
> > break;
> > ...
> > }
> >
> How about adding these to forbid the illegal use of this macro?
> enum rte_mbuf_prefetch_type {
>  ENUM_prefetch0,
>  ENUM_prefetch1,
>  ...
> };
> 
> #define RTE_MBUF_PREFETCH_PART1(type, m) \
> if (ENUM_##type == ENUM_prefretch0) \
> rte_prefetch0(&(m)->cacheline0);   \
> else if (ENUM_##type == ENUM_prefetch1) \
> rte_prefetch1(&(m)->cacheline0); \
> 
> 
> >
> > Some questions: could you give some details about the use
> > of non-temporal prefetch in ixgbe_vec_neon? What are the
> > pros and cons, and would it be useful in other drivers?
> > Currently all drivers are doing prefetch0 when they prefetch
> > the mbuf structure. Some drivers use prefetch1 for data.
> >
> It's for performance consideration, and only on armv8a platform.

Strictly speaking, it is not armv8-specific; IA also implements this API,
using the _MM_HINT_NTA hint.

Do we really need the non-temporal/transient version of prefetch for ixgbe?
If so, wouldn't it make sense for x86 as well?

The primary use case for the transient version would be pipeline
mode, where the same CPU won't consume the packet.

/**
 * Prefetch a cache line into all cache levels (non-temporal/transient
 * version)
 *
 * The non-temporal prefetch is intended as a prefetch hint that
 * processor will use the prefetched data only once or short period,
 * unlike the rte_prefetch0() function which imply that prefetched
 * data to use repeatedly.
 *
 * @param p
 *   Address to prefetch
 */
static inline void rte_prefetch_non_temporal(const volatile void *p); 

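As a rough illustration of the difference between the two hints, the
sketch below uses the GCC/Clang builtin __builtin_prefetch(addr, rw,
locality) rather than DPDK's per-architecture implementations, which are
not reproduced here. The sketch_* names, sum_once() and demo_sum_once()
are hypothetical. Locality 3 requests that the line be kept in all cache
levels; locality 0 tells the compiler the data has no temporal locality,
which is the closest portable analogue to a non-temporal/transient hint.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Temporal hint: the data is expected to be reused. */
static inline void sketch_prefetch0(const volatile void *p)
{
	__builtin_prefetch((const void *)p, 0, 3);
}

/* Non-temporal hint: the data is expected to be touched once. */
static inline void sketch_prefetch_non_temporal(const volatile void *p)
{
	__builtin_prefetch((const void *)p, 0, 0);
}

/* Reads every byte exactly once, hinting one 64-byte line ahead. */
static unsigned long sum_once(const unsigned char *buf, size_t len)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if ((i % 64) == 0 && i + 64 < len)
			sketch_prefetch_non_temporal(buf + i + 64);
		sum += buf[i];
	}
	return sum;
}

static unsigned long demo_sum_once(void)
{
	unsigned char buf[256];

	memset(buf, 1, sizeof(buf));
	sketch_prefetch0(buf);	/* first line will be read right away */
	assert(sum_once(buf, sizeof(buf)) == 256);
	return sum_once(buf, sizeof(buf));
}
```

Prefetch hints never change program results, only (possibly) timing, so
the choice between the two variants can only be justified by measurement
on the target platform.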
> 
> >
> > By the way, I did not try to apply the patch, but it looks
> > it's on top of dpdk-next-net/rel_16_07, right?
> >
> Yes


2016-06-02 Thread Olivier MATZ
Hi Jianbo,

On 06/01/2016 05:29 AM, Jianbo Liu wrote:
>> enum rte_mbuf_prefetch_type {
>> > PREFETCH0,
>> > PREFETCH1,
>> > ...
>> > };
>> >
>> > static inline void
>> > rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
>> > struct rte_mbuf *m)
>> > {
>> > switch (type) {
>> > case PREFETCH0:
>> > rte_prefetch0(&m->cacheline0);
>> > break;
>> > case PREFETCH1:
>> > rte_prefetch1(&m->cacheline0);
>> > break;
>> > ...
>> > }
>> >
> How about adding these to forbid the illegal use of this macro?
> enum rte_mbuf_prefetch_type {
>  ENUM_prefetch0,
>  ENUM_prefetch1,
>  ...
> };
> 
> #define RTE_MBUF_PREFETCH_PART1(type, m) \
> if (ENUM_##type == ENUM_prefretch0) \
> rte_prefetch0(&(m)->cacheline0);   \
> else if (ENUM_##type == ENUM_prefetch1) \
> rte_prefetch1(&(m)->cacheline0); \
> 
> 

As Stephen stated, a static inline is better than a macro, mainly
because it is understood by the compiler instead of being a dumb
code replacement.

Any reason why you would prefer a macro in that case?

Regards
Olivier


2016-06-02 Thread Jianbo Liu
On 1 June 2016 at 14:00, Jerin Jacob  wrote:
> On Wed, Jun 01, 2016 at 11:29:47AM +0800, Jianbo Liu wrote:
>> On 1 June 2016 at 03:28, Olivier MATZ  wrote:
>> > Hi Jianbo,
>> >
>> > On 05/31/2016 05:06 AM, Jianbo Liu wrote:
>> >> Change the inline function to macro with parameters
>> >>
>> >> Signed-off-by: Jianbo Liu 
>> >>
>> >> [...]
[...]
>> It's for performance consideration, and only on armv8a platform.
>
> Strictly it is not armv8 specific, IA also implemented this API with
> _MM_HINT_NTA hint.

I mean this patch is only for the ixgbe vector PMD on the armv8 platform.

>
> Do we really need non-temporal/transient version of prefetch for ixgbe?

Strictly speaking, we don't have to, since we don't know how applications
use the mbuf header.
But isn't it highly likely that the second part is used only once, or
only for a short period, given that the prefetch is done only when
split_packet is not NULL?

> If so, for x86 also it makes sense to keep it? Right?
>
> The primary use case for transient version would be use with pipe line
> line mode where the same cpu wont consume the packet.
>
> /**
>  * Prefetch a cache line into all cache levels (non-temporal/transient
>  * version)
>  *
>  * The non-temporal prefetch is intended as a prefetch hint that
>  * processor will
>  * use the prefetched data only once or short period, unlike the
>  * rte_prefetch0() function which imply that prefetched data to use
>  * repeatedly.
>  *
>  * @param p
>  *   Address to prefetch
>  */
> static inline void rte_prefetch_non_temporal(const volatile void *p);
>
>>
>> >
>> > By the way, I did not try to apply the patch, but it looks
>> > it's on top of dpdk-next-net/rel_16_07, right?
>> >
>> Yes


2016-06-02 Thread Jianbo Liu
On 2 June 2016 at 15:10, Olivier MATZ  wrote:
> Hi Jianbo,
>
> On 06/01/2016 05:29 AM, Jianbo Liu wrote:
>>> enum rte_mbuf_prefetch_type {
>>> > PREFETCH0,
>>> > PREFETCH1,
>>> > ...
>>> > };
>>> >
>>> > static inline void
>>> > rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
>>> > struct rte_mbuf *m)
>>> > {
>>> > switch (type) {
>>> > case PREFETCH0:
>>> > rte_prefetch0(&m->cacheline0);
>>> > break;
>>> > case PREFETCH1:
>>> > rte_prefetch1(&m->cacheline0);
>>> > break;
>>> > ...
>>> > }
>>> >
>> How about adding these to forbid the illegal use of this macro?
>> enum rte_mbuf_prefetch_type {
>>  ENUM_prefetch0,
>>  ENUM_prefetch1,
>>  ...
>> };
>>
>> #define RTE_MBUF_PREFETCH_PART1(type, m) \
>> if (ENUM_##type == ENUM_prefretch0) \
>> rte_prefetch0(&(m)->cacheline0);   \
>> else if (ENUM_##type == ENUM_prefetch1) \
>> rte_prefetch1(&(m)->cacheline0); \
>> 
>>
>
> As Stephen stated, a static inline is better than a macro, mainly
> because it is understood by the compiler instead of beeing a dumb
> code replacement.
>
> Any reason why you would prefer a macro in that case?
>
For simplicity. Otherwise, we may have to write several similar
functions for the different prefetch methods.


2016-06-02 Thread Jerin Jacob
On Thu, Jun 02, 2016 at 05:04:13PM +0800, Jianbo Liu wrote:
> On 1 June 2016 at 14:00, Jerin Jacob wrote:
> > On Wed, Jun 01, 2016 at 11:29:47AM +0800, Jianbo Liu wrote:
> >> On 1 June 2016 at 03:28, Olivier MATZ  wrote:
> >> > Hi Jianbo,
> >> >
> >> > On 05/31/2016 05:06 AM, Jianbo Liu wrote:
> >> >> Change the inline function to macro with parameters
> >> >>
> >> >> Signed-off-by: Jianbo Liu 
> >> >>
> >> >> [...]
> [...]
> >> It's for performance consideration, and only on armv8a platform.
> >
> > Strictly it is not armv8 specific, IA also implemented this API with
> > _MM_HINT_NTA hint.
> 
> I mean this patch is only for ixgbe vector PMD on armv8 platform.
> 
> >
> > Do we really need non-temporal/transient version of prefetch for ixgbe?
> 
> Strictly speaking, we don't have to since we don't know how APPs use
> the mbuf header.

Then IMO it makes sense to keep the same behavior as the x86 ixgbe driver.
On the upside, we may then not need the new macros for part prefetching.

Jerin

> But, is it high possibility that the second part is used only once or
> short period because prefetching is done only when split_packet is not
> NULL?
> 
> > If so, for x86 also it makes sense to keep it? Right?
> >
> > The primary use case for transient version would be use with pipe line
> > line mode where the same cpu wont consume the packet.
> >
> > /**
> >  * Prefetch a cache line into all cache levels (non-temporal/transient
> >  * version)
> >  *
> >  * The non-temporal prefetch is intended as a prefetch hint that
> >  * processor will
> >  * use the prefetched data only once or short period, unlike the
> >  * rte_prefetch0() function which imply that prefetched data to use
> >  * repeatedly.
> >  *
> >  * @param p
> >  *   Address to prefetch
> >  */
> > static inline void rte_prefetch_non_temporal(const volatile void *p);
> >
> >>
> >> >
> >> > By the way, I did not try to apply the patch, but it looks
> >> > it's on top of dpdk-next-net/rel_16_07, right?
> >> >
> >> Yes