[dpdk-dev] Missing prefetch in non-vector rx function

2015-09-24 Thread Thomas Monjalon
Hi,

2015-09-24 22:10, Arnon Warshavsky:
> Moving from DPDK 1.5 to 2.0, we observed a PPS performance degradation of
> ~30%.
> After chasing this one for a while, we found the problem:
> 
> A) Between the two versions, rte_mbuf grew in size from one to two cache
> lines.
> B) The standard (non-vector) rx function does not prefetch the second cache
> line of the mbuf (I see this bug exists in 2.1 as well), yet it writes to
> that line when it sets the next pointer to NULL.
> I tested it in ixgbe, but it looks like it exists in all drivers in the
> *_rx_recv_pkts() and *_rx_recv_scattered_pkts() functions.
> Once we added the prefetch for the second line, we were back to our
> previous numbers.
> 
> I believe this one slipped under the radar because vector mode is now the
> default.
> We stumbled onto it because we run in non-vector mode, due to a different
> mempool bug in 2.0 that sometimes crashes the application on port stop.

Big thanks for this double bug report!

> I have two questions:
> 1)
> Could anyone tell me whether the regression tests compare performance only
> with DPDK built using the default set of flags, or whether multiple options
> are examined?

There is no official performance regression test.
Intel is probably monitoring it for its own hardware, though.

By the way, it would be a good improvement to have such a standard benchmark
in DTS or elsewhere.

> 2)
> How are issues like this tracked and later associated with a patch?

In general, it is followed by a discussion and a patch on this mailing list.
The patch must document the fixed issue in the release notes.
To give current bugs better exposure, we could set up a bug tracker.
I think it's time to consider it seriously. Let's discuss the possible
solutions in another thread.

Thanks again to you and the whole Qwilt team.
PS: It would be nice to hear about your DPDK deployment and results.


[dpdk-dev] Missing prefetch in non-vector rx function

2015-09-24 Thread Arnon Warshavsky
Hi all,

Moving from DPDK 1.5 to 2.0, we observed a PPS performance degradation of
~30%.
After chasing this one for a while, we found the problem:

A) Between the two versions, rte_mbuf grew in size from one to two cache
lines.
B) The standard (non-vector) rx function does not prefetch the second cache
line of the mbuf (I see this bug exists in 2.1 as well), yet it writes to
that line when it sets the next pointer to NULL.
I tested it in ixgbe, but it looks like it exists in all drivers in the
*_rx_recv_pkts() and *_rx_recv_scattered_pkts() functions.
Once we added the prefetch for the second line, we were back to our
previous numbers.
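A minimal C sketch of the idea in A) and B), under stated assumptions: `struct mock_mbuf` and `mock_recv_pkts()` are hypothetical stand-ins (not the real rte_mbuf or ixgbe code), 64-byte cache lines are assumed, and the GCC/Clang builtin `__builtin_prefetch` is used in place of DPDK's own prefetch helpers. It only illustrates the point that when `next` lives on the second cache line, a scalar rx loop that stores `m->next = NULL` should prefetch both lines of the upcoming mbuf, not just the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (typical on x86). */
#define MOCK_CACHE_LINE 64

/* Hypothetical stand-in for a two-cache-line mbuf; NOT the real
 * struct rte_mbuf. Fields the rx path fills live in line 0, and the
 * segment-chain pointer `next` starts line 1. */
struct mock_mbuf {
    void *buf_addr;
    uint16_t data_off;
    uint16_t nb_segs;
    uint32_t pkt_len;
    uint16_t data_len;
    /* _Alignas pushes `next` to the start of the second cache line. */
    _Alignas(MOCK_CACHE_LINE) struct mock_mbuf *next;
};

static_assert(offsetof(struct mock_mbuf, next) == MOCK_CACHE_LINE,
              "next must start the second cache line");
static_assert(sizeof(struct mock_mbuf) == 2 * MOCK_CACHE_LINE,
              "mbuf spans exactly two cache lines");

/* Hypothetical scalar rx loop over an array of pre-allocated mbufs.
 * pkt_len stands in for the length a real driver would read from the
 * hardware descriptor. */
static uint16_t
mock_recv_pkts(struct mock_mbuf *ring, struct mock_mbuf **rx_pkts,
               uint16_t nb_pkts, uint16_t pkt_len)
{
    for (uint16_t i = 0; i < nb_pkts; i++) {
        struct mock_mbuf *m = &ring[i];

        if (i + 1 < nb_pkts) {
            const char *nxt = (const char *)&ring[i + 1];
            /* Prefetch BOTH lines of the next mbuf, hinting a write.
             * Prefetching only line 0 would leave the m->next store
             * below to miss on line 1. */
            __builtin_prefetch(nxt, 1, 3);
            __builtin_prefetch(nxt + MOCK_CACHE_LINE, 1, 3);
        }

        m->data_len = pkt_len;   /* cache line 0 */
        m->pkt_len = pkt_len;    /* cache line 0 */
        m->nb_segs = 1;          /* cache line 0 */
        m->next = NULL;          /* cache line 1 */
        rx_pkts[i] = m;
    }
    return nb_pkts;
}
```

In DPDK itself, rte_prefetch0() from rte_prefetch.h and RTE_CACHE_LINE_SIZE would normally replace the builtin and the hard-coded 64; exactly where the prefetch belongs depends on each driver's rx loop.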

I believe this one slipped under the radar because vector mode is now the
default.
We stumbled onto it because we run in non-vector mode, due to a different
mempool bug in 2.0 that sometimes crashes the application on port stop.

I have two questions:
1)
Could anyone tell me whether the regression tests compare performance only
with DPDK built using the default set of flags, or whether multiple options
are examined?

2)
How are issues like this tracked and later associated with a patch?


Thanks
/Arnon

*Arnon Warshavsky*
*Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon at qwilt.com*