[dpdk-dev] Performance regression in DPDK 1.8/2.0
Hi Paul,

> -----Original Message-----
> From: Paul Emmerich [mailto:emmericp at net.in.tum.de]
> Sent: Tuesday, April 28, 2015 12:48 PM
> To: De Lara Guarch, Pablo
> Cc: Pavel Odintsov; dev at dpdk.org
> Subject: Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
>
> Hi,
>
> De Lara Guarch, Pablo:
> > Could you tell me which changes you made here? I see you are using
> > simple tx code path on 1.8.0, but with the default values, you should
> > be using vector tx, unless you have changed anything in the tx
> > configuration.
>
> Sorry, I might have written that down wrong or read the output wrong.
> I did not modify the l2fwd example.
>
> > So, just for clarification, for l2fwd you used E3-1230 v2 (Ivy Bridge),
> > at 1.6 GHz or 3.3 GHz?
>
> At 1.6 GHz, as it is simply too fast at 3.3 GHz ;)
>
> I'll probably write a minimal example that shows my problem with tx only
> sometime next week. I just used the l2fwd example to illustrate my point
> with a 'builtin' example.

Thanks for the clarification. I tested it on Ivy Bridge as well, and I could
not reproduce the issue. Make sure that you use vector rx/tx anyway, to get
best performance (you should be seeing better performance, since l2fwd in
1.8/2.0 uses both vector rx/tx).

Thanks,
Pablo

> Paul
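Whether the vector paths are even compiled in can be checked in the generated
DPDK configuration. A minimal sketch, assuming the default
x86_64-native-linuxapp-gcc target and that the option is named as in the
1.8/2.0 config files:

  grep IXGBE_INC_VECTOR x86_64-native-linuxapp-gcc/.config
  # CONFIG_RTE_IXGBE_INC_VECTOR=y means the ixgbe vector rx/tx code paths
  # are available; the PMD still only selects them if the queue setup
  # allows it.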
[dpdk-dev] Performance regression in DPDK 1.8/2.0
Hi,

De Lara Guarch, Pablo:
> Could you tell me which changes you made here? I see you are using simple tx
> code path on 1.8.0, but with the default values, you should be using vector
> tx, unless you have changed anything in the tx configuration.

Sorry, I might have written that down wrong or read the output wrong.
I did not modify the l2fwd example.

> So, just for clarification, for l2fwd you used E3-1230 v2 (Ivy Bridge),
> at 1.6 GHz or 3.3 GHz?

At 1.6 GHz, as it is simply too fast at 3.3 GHz ;)

I'll probably write a minimal example that shows my problem with tx only
sometime next week. I just used the l2fwd example to illustrate my point
with a 'builtin' example.

Paul
[dpdk-dev] Performance regression in DPDK 1.8/2.0
Hi,

Matthew Hall:
> Not sure if it's relevant or not, but there was another mail claiming PCIe
> MSI-X wasn't necessarily working in DPDK 2.x. Not sure if that could be
> causing slowdowns when there are drastic volumes of 64-byte packets causing
> a lot of PCI activity.

Interrupts should not be relevant here.

> Also, you are mentioning some specific patches were involved... so I have to
> ask if anybody tried git bisect yet or not. Maybe easier than trying to
> guess at the answer.

I have not yet tried to bisect it, but that's the next step on my todo list*.
The mbuf patch was just an educated guess to start a discussion. I hoped that
I was just doing something obvious wrong, and/or that someone could point me
to performance regression tests that were executed to prove that the mbuf
patch does not affect performance. However, there don't seem to be any
'official' performance regression tests, are there?

Paul

* I probably won't be able to get to it until next week, though, as I have to
finish the paper about my packet generator.
[dpdk-dev] Performance regression in DPDK 1.8/2.0
Hi,

sorry, I mixed up the hardware I used for my tests.

Paul Emmerich:
> CPU: Intel(R) Xeon(R) CPU E3-1230 v2
> TurboBoost and HyperThreading disabled.
> Frequency fixed at 3.30 GHz via acpi_cpufreq.

The CPU frequency was fixed at 1.60 GHz to enforce a CPU bottleneck.

My original post said that I used a Xeon E5-2620 v3 at 1.2 GHz; this is
incorrect. The calculation for Cycles/Pkt in the original post used the
correct 1.6 GHz figure, though.

(I used the E5 CPU for the evaluation of my packet generator performance
with 1.7.1/2.0.0, not for the l2fwd test.)

Sorry for the confusion.

Paul
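As a quick sanity check, the Cycles/Pkt column in the original post is indeed
consistent with the corrected 1.6 GHz clock; the arithmetic is simply
frequency divided by throughput:

  echo "scale=2; 1600000000 / 18840000" | bc   # 1.7.1 -> 84.92
  echo "scale=2; 1600000000 / 16780000" | bc   # 1.8.0 -> 95.35
  echo "scale=2; 1600000000 / 16400000" | bc   # 2.0.0 -> 97.56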
[dpdk-dev] Performance regression in DPDK 1.8/2.0
On Tue, Apr 28, 2015 at 12:28:34AM +0200, Paul Emmerich wrote:
> Let me know if you need any additional information.
> I'd also be interested in the configuration that resulted in the 20% speed-
> up that was mentioned in the original mbuf patch
>
> Paul

The speed-up would be for apps that were doing RX of scattered packets, i.e.
packets split across multiple mbufs. Before 1.8, this was handled by a scalar
function which was rather slow compared to the fast-path vector function. In
1.8 we introduced a new vector function which supports scattered packets - it
still isn't as fast as the non-scattered packet RX function, but it was a good
improvement over the older version.

/Bruce
[dpdk-dev] Performance regression in DPDK 1.8/2.0
On Tue, Apr 28, 2015 at 12:43:16PM +0200, Paul Emmerich wrote:
> Hi,
>
> sorry, I mixed up the hardware I used for my tests.
>
> Paul Emmerich:
> > CPU: Intel(R) Xeon(R) CPU E3-1230 v2
> > TurboBoost and HyperThreading disabled.
> > Frequency fixed at 3.30 GHz via acpi_cpufreq.
>
> The CPU frequency was fixed at 1.60 GHz to enforce a CPU bottleneck.
>
> My original post said that I used a Xeon E5-2620 v3 at 1.2 GHz; this is
> incorrect. The calculation for Cycles/Pkt in the original post used the
> correct 1.6 GHz figure, though.
>
> (I used the E5 CPU for the evaluation of my packet generator performance
> with 1.7.1/2.0.0, not for the l2fwd test.)
>
> Sorry for the confusion.
>
> Paul

Thanks for the update - we are investigating.

/Bruce
[dpdk-dev] Performance regression in DPDK 1.8/2.0
> -----Original Message-----
> From: Richardson, Bruce
> Sent: Tuesday, April 28, 2015 11:55 AM
> To: Paul Emmerich
> Cc: De Lara Guarch, Pablo; dev at dpdk.org
> Subject: Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
>
> On Tue, Apr 28, 2015 at 12:43:16PM +0200, Paul Emmerich wrote:
> > Hi,
> >
> > sorry, I mixed up the hardware I used for my tests.
> >
> > Paul Emmerich:
> > > CPU: Intel(R) Xeon(R) CPU E3-1230 v2
> > > TurboBoost and HyperThreading disabled.
> > > Frequency fixed at 3.30 GHz via acpi_cpufreq.
> >
> > The CPU frequency was fixed at 1.60 GHz to enforce a CPU bottleneck.
> >
> > My original post said that I used a Xeon E5-2620 v3 at 1.2 GHz; this is
> > incorrect. The calculation for Cycles/Pkt in the original post used the
> > correct 1.6 GHz figure, though.
> >
> > (I used the E5 CPU for the evaluation of my packet generator performance
> > with 1.7.1/2.0.0, not for the l2fwd test.)

Thanks for the update. So, just for clarification, for l2fwd you used the
E3-1230 v2 (Ivy Bridge), at 1.6 GHz or 3.3 GHz?

Pablo

> >
> > Sorry for the confusion.
> >
> > Paul
>
> Thanks for the update - we are investigating.
>
> /Bruce
[dpdk-dev] Performance regression in DPDK 1.8/2.0
> -----Original Message-----
> From: Paul Emmerich [mailto:emmericp at net.in.tum.de]
> Sent: Monday, April 27, 2015 11:29 PM
> To: De Lara Guarch, Pablo
> Cc: Pavel Odintsov; dev at dpdk.org
> Subject: Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
>
> Hi,
>
> Pablo:
> > Could you tell me how you got the L1 cache miss ratio? Perf?
>
> perf stat -e L1-dcache-loads,L1-dcache-misses l2fwd ...
>
> > Could you provide more information on how you run the l2fwd app,
> > in order to try to reproduce the issue:
> > - L2fwd Command line
>
> ./build/l2fwd -c 3 -n 2 -- -p 3 -q 2
>
> > - L2fwd initialization (to check memory/CPU/NICs)
>
> I unfortunately did not save the output, but I wrote down the important
> parts:
>
> 1.7.1: no output regarding rx/tx code paths as init debug wasn't enabled
> 1.8.0 and 2.0.0: simple tx code path, vector rx
>
> Hardware:
>
> CPU: Intel(R) Xeon(R) CPU E3-1230 v2
> TurboBoost and HyperThreading disabled.
> Frequency fixed at 3.30 GHz via acpi_cpufreq.
>
> NIC: X540-T2
>
> Memory: Dual Channel DDR3 1333 MHz, 4x 4GB
>
> > Did you change the l2fwd app between versions? L2fwd uses simple rx on
> > 1.7.1, whereas it uses vector rx on 2.0 (enable IXGBE_DEBUG_INIT to
> > check it).
>
> Yes, I had to update l2fwd when going from 1.7.1 to 1.8.0. However, the
> changes in the app were minimal.

Could you tell me which changes you made here? I see you are using the simple
tx code path on 1.8.0, but with the default values, you should be using vector
tx, unless you have changed anything in the tx configuration.
Not sure also if you are using the simple tx code path on 1.7.1 then, plus
scattered rx. (Without changing the l2fwd app, I use scattered rx and vector
tx.)

Thanks!
Pablo

> 1.8.0 and 2.0.0 used vector rx. Disabling vector rx via the DPDK .config
> file causes another 30% performance loss, so I kept it enabled.
>
> > Which packet format/size did you use? Does your traffic generator take
> > into account the Inter-packet gap?
>
> 64 Byte packets, full line rate on both ports, i.e. 14.88 Mpps per port.
> The packet's content doesn't matter as l2fwd doesn't look at it, but it was
> just some random stuff: EthType 0x1234.
>
> Let me know if you need any additional information.
> I'd also be interested in the configuration that resulted in the 20% speed-
> up that was mentioned in the original mbuf patch
>
> Paul
[dpdk-dev] Performance regression in DPDK 1.8/2.0
Hi,

Pablo:
> Could you tell me how you got the L1 cache miss ratio? Perf?

perf stat -e L1-dcache-loads,L1-dcache-misses l2fwd ...

> Could you provide more information on how you run the l2fwd app,
> in order to try to reproduce the issue:
> - L2fwd Command line

./build/l2fwd -c 3 -n 2 -- -p 3 -q 2

> - L2fwd initialization (to check memory/CPU/NICs)

I unfortunately did not save the output, but I wrote down the important parts:

1.7.1: no output regarding rx/tx code paths as init debug wasn't enabled
1.8.0 and 2.0.0: simple tx code path, vector rx

Hardware:

CPU: Intel(R) Xeon(R) CPU E3-1230 v2
TurboBoost and HyperThreading disabled.
Frequency fixed at 3.30 GHz via acpi_cpufreq.

NIC: X540-T2

Memory: Dual Channel DDR3 1333 MHz, 4x 4GB

> Did you change the l2fwd app between versions? L2fwd uses simple rx on 1.7.1,
> whereas it uses vector rx on 2.0 (enable IXGBE_DEBUG_INIT to check it).

Yes, I had to update l2fwd when going from 1.7.1 to 1.8.0. However, the
changes in the app were minimal.

1.8.0 and 2.0.0 used vector rx. Disabling vector rx via the DPDK .config file
causes another 30% performance loss, so I kept it enabled.

> Which packet format/size did you use? Does your traffic generator take into
> account the Inter-packet gap?

64 Byte packets, full line rate on both ports, i.e. 14.88 Mpps per port.
The packet's content doesn't matter as l2fwd doesn't look at it, but it was
just some random stuff: EthType 0x1234.

Let me know if you need any additional information.
I'd also be interested in the configuration that resulted in the 20% speed-up
that was mentioned in the original mbuf patch.

Paul
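The miss ratio quoted in the thread is presumably just L1-dcache-misses
divided by L1-dcache-loads from the perf output. A sketch of measuring it over
a fixed window on an already-running forwarder (assuming the process is
actually named l2fwd):

  perf stat -e L1-dcache-loads,L1-dcache-misses -p $(pidof l2fwd) sleep 10
  # miss ratio = L1-dcache-misses / L1-dcache-loads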
[dpdk-dev] Performance regression in DPDK 1.8/2.0
On Apr 27, 2015, at 3:28 PM, Paul Emmerich wrote:
> Let me know if you need any additional information.
> I'd also be interested in the configuration that resulted in the 20% speed-
> up that was mentioned in the original mbuf patch

Not sure if it's relevant or not, but there was another mail claiming PCIe
MSI-X wasn't necessarily working in DPDK 2.x. Not sure if that could be
causing slowdowns when there are drastic volumes of 64-byte packets causing a
lot of PCI activity.

Also, you are mentioning that some specific patches were involved... so I have
to ask if anybody has tried git bisect yet or not. Maybe easier than trying to
guess at the answer.

Matthew.
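For reference, a bisect between the two releases would look roughly like this;
a sketch, assuming a checkout of the DPDK git tree with the release tags
v1.7.1 and v2.0.0, plus a rebuild and rerun of the l2fwd measurement at each
step:

  git bisect start
  git bisect bad v2.0.0    # the slow release
  git bisect good v1.7.1   # the fast release
  # rebuild DPDK + l2fwd, rerun the throughput test, then mark the result:
  git bisect good          # or: git bisect bad
  git bisect reset         # when finished

Note that, as mentioned in the thread, the l2fwd example itself needed small
updates across the 1.7.1 -> 1.8.0 API change, so the test may need per-step
tweaks rather than a fully automated `git bisect run`.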
[dpdk-dev] Performance regression in DPDK 1.8/2.0
Hi,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pavel Odintsov
> Sent: Monday, April 27, 2015 9:07 AM
> To: Paul Emmerich
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
>
> Hello!
>
> I executed a deep test of Paul's toolkit and can confirm the performance
> degradation in 2.0.0.
>
> On Sun, Apr 26, 2015 at 9:50 PM, Paul Emmerich wrote:
> > Hi,
> >
> > I'm working on a DPDK-based packet generator [1] and I recently tried to
> > upgrade from DPDK 1.7.1 to 2.0.0.
> > However, I noticed that DPDK 1.7.1 is about 25% faster than 2.0.0 for my
> > use case.
> >
> > So I ran some basic performance tests on the l2fwd example with DPDK
> > 1.7.1, 1.8.0 and 2.0.0.
> > I used an Intel Xeon E5-2620 v3 CPU clocked down to 1.2 GHz in order to
> > ensure that the CPU and not the network bandwidth is the bottleneck.
> > I configured l2fwd to forward between two interfaces of an X540 NIC using
> > only a single CPU core (-q2) and measured the following throughput under
> > full bidirectional load:
> >
> > Version  TP [Mpps]  Cycles/Pkt
> > 1.7.1    18.84      84.925690021
> > 1.8.0    16.78      95.351609058
> > 2.0.0    16.40      97.56097561
> >
> > DPDK 1.7.1 is about 15% faster in this scenario. The obvious suspect is
> > the new mbuf structure introduced in DPDK 1.8, so I profiled L1 cache
> > misses:
> >
> > Version  L1 miss ratio
> > 1.7.1     6.5%
> > 1.8.0    13.8%
> > 2.0.0    13.4%
> >
> > FWIW the performance results with my packet generator on the same 1.2 GHz
> > CPU core are:
> >
> > Version  TP [Mpps]  L1 cache miss ratio
> > 1.7      11.77      4.3%
> > 2.0       9.5       8.4%

Could you tell me how you got the L1 cache miss ratio? Perf?

> >
> > The discussion about the original patch [2] which introduced the new mbuf
> > structure addresses this potential performance degradation and mentions
> > that it is somehow mitigated.
> > It even claims a 20% *increase* in performance in a specific scenario.
> > However, that doesn't seem to be the case for either l2fwd or my packet
> > generator.
> >
> > Any ideas how to fix this? A 25% loss in throughput prevents me from
> > upgrading to DPDK 2.0.0. I need the new lcore features and the 40 GBit
> > driver updates, so I can't stay on 1.7.1 forever.

Could you provide more information on how you run the l2fwd app, in order to
try to reproduce the issue:
- L2fwd Command line
- L2fwd initialization (to check memory/CPU/NICs)

Did you change the l2fwd app between versions? L2fwd uses simple rx on 1.7.1,
whereas it uses vector rx on 2.0 (enable IXGBE_DEBUG_INIT to check it).

Last question, I assume you use your traffic generator to get all those
numbers. Which packet format/size did you use? Does your traffic generator
take into account the Inter-packet gap?

Thanks!
Pablo

> >
> > Paul
> >
> > [1] https://github.com/emmericp/MoonGen
> > [2] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/5155
>
> --
> Sincerely yours, Pavel Odintsov
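Enabling the init debug output Pablo refers to is a build-time option; a
minimal sketch, assuming the option is still named and located as in the
1.8/2.0 config files:

  sed -i 's/CONFIG_RTE_LIBRTE_IXGBE_DEBUG_INIT=n/CONFIG_RTE_LIBRTE_IXGBE_DEBUG_INIT=y/' \
      config/common_linuxapp
  # rebuild DPDK and l2fwd; the ixgbe PMD then logs at startup which rx/tx
  # function (simple, vector, scattered) it selected for each queue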
[dpdk-dev] Performance regression in DPDK 1.8/2.0
Hi,

I'm working on a DPDK-based packet generator [1] and I recently tried to
upgrade from DPDK 1.7.1 to 2.0.0.
However, I noticed that DPDK 1.7.1 is about 25% faster than 2.0.0 for my use
case.

So I ran some basic performance tests on the l2fwd example with DPDK 1.7.1,
1.8.0 and 2.0.0.
I used an Intel Xeon E5-2620 v3 CPU clocked down to 1.2 GHz in order to ensure
that the CPU and not the network bandwidth is the bottleneck.
I configured l2fwd to forward between two interfaces of an X540 NIC using only
a single CPU core (-q2) and measured the following throughput under full
bidirectional load:

Version  TP [Mpps]  Cycles/Pkt
1.7.1    18.84      84.925690021
1.8.0    16.78      95.351609058
2.0.0    16.40      97.56097561

DPDK 1.7.1 is about 15% faster in this scenario. The obvious suspect is the
new mbuf structure introduced in DPDK 1.8, so I profiled L1 cache misses:

Version  L1 miss ratio
1.7.1     6.5%
1.8.0    13.8%
2.0.0    13.4%

FWIW the performance results with my packet generator on the same 1.2 GHz CPU
core are:

Version  TP [Mpps]  L1 cache miss ratio
1.7      11.77      4.3%
2.0       9.5       8.4%

The discussion about the original patch [2] which introduced the new mbuf
structure addresses this potential performance degradation and mentions that
it is somehow mitigated.
It even claims a 20% *increase* in performance in a specific scenario.
However, that doesn't seem to be the case for either l2fwd or my packet
generator.

Any ideas how to fix this? A 25% loss in throughput prevents me from upgrading
to DPDK 2.0.0. I need the new lcore features and the 40 GBit driver updates,
so I can't stay on 1.7.1 forever.

Paul

[1] https://github.com/emmericp/MoonGen
[2] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/5155
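For anyone trying to reproduce this, the setup boils down to building each
DPDK release and running the bundled l2fwd example on the two X540 ports; a
rough sketch, assuming the classic make-based build of that era and the
command line Paul gives elsewhere in the thread:

  export RTE_SDK=$(pwd)
  export RTE_TARGET=x86_64-native-linuxapp-gcc
  make install T=$RTE_TARGET            # build DPDK 1.7.1 / 1.8.0 / 2.0.0
  cd examples/l2fwd && make             # build the example against it
  ./build/l2fwd -c 3 -n 2 -- -p 3 -q 2  # forward between the two ports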