Re: ixgbe TX desc prefetch

2015-01-19 Thread Luigi Rizzo
On Mon, Jan 19, 2015 at 07:37:03PM +, Zoltan Kiss wrote:
> Hi,
> 
> I'm using netmap on Ubuntu 14.04 (3.13.0-44 kernel, ixgbe 3.15.1-k), and
> I can't max out a 10G link with pkt-gen:
> 
> Sent 1068122462 packets, 64 bytes each, in 84.29 seconds.
> Speed: 12.67 Mpps Bandwidth: 6.49 Gbps (raw 8.92 Gbps)
> 
> The README says "ixgbe tops at about 12.5 Mpps unless the driver
> prefetches tx descriptors". I've checked the actual driver code, and it
> does a prefetch(tx_desc) in the TX completion interrupt handler; is that
> what you mean? Top shows ksoftirqd eating up one core while the pkt-gen
> process is at around 45%.

That comment refers to how the TXDCTL register in the NIC is programmed,
not to the CPU's prefetch instruction, and it applies only to the case
where you use a single queue to transmit. With multiple TX queues you
should be able to reach line rate regardless of that setting.
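For example (the interface name and queue counts below are placeholders,
and I have not checked whether the in-tree ixgbe on 3.13 supports the
channel commands):

    ethtool -l eth2                     # show current/maximum queue counts
    ethtool -L eth2 combined 4          # use 4 combined TX/RX queue pairs
    pkt-gen -i netmap:eth2 -f tx -p 4   # one sender thread per ring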

However, there are other things you might be hitting:
- if you have the IOMMU enabled, that adds overhead to the memory mappings,
  and I seem to remember that it caused a drop in the TX rate;
- try pkt-gen with packet sizes 60 and 64 bytes (before the CRC) to see if
  there is any difference; see the example commands below. Especially on
  the receive side, if the driver strips the CRC, performance with 60-byte
  packets is worse (and I assume you have disabled flow control on both
  the sender and the receiver).
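Something like this (interface names are placeholders, and the pause
command assumes the driver honours ethtool's flow-control settings):

    ethtool -A eth2 rx off tx off        # disable flow control (both ends)
    pkt-gen -i netmap:eth2 -f tx -l 60   # 60-byte frames, before CRC
    pkt-gen -i netmap:eth2 -f tx -l 64   # 64-byte frames, before CRC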

Finally, we have seen degradation on recent Linux kernels (later than 3.5,
I would say), and it seems to be due to the driver disabling interrupt
moderation when the NAPI handler reports that all work has been completed.
Since netmap does almost nothing in the NAPI handler, the OS is fooled into
thinking there is no load, so it might as well optimize for low latency.

A fix is to hardwire interrupt moderation to some 20-50 us (I am not sure
whether you can do it with ethtool; we tweaked the driver's code to prevent
the changes in moderation). That should deal with the high ksoftirqd load.
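If the in-tree driver does expose this through ethtool's coalescing knobs
(I have not verified that on 3.13; the interface name is a placeholder),
it would look roughly like:

    ethtool -c eth2                  # show the current coalescing settings
    ethtool -C eth2 rx-usecs 30      # pin interrupt moderation to ~30 us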

Also, multiple ports on the same NIC contend for PCIe bandwidth, so it is
quite possible that the bus does not have enough capacity for full traffic
on both ports.
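As a rough back-of-the-envelope check (approximate figures): line rate with
64-byte frames is 10^10 bit/s / ((64 + 20) bytes * 8) ~= 14.88 Mpps per
port and direction, while an 82599 sits behind a PCIe 2.0 x8 link, i.e.
about 5 GT/s * 8 lanes * 8/10 ~= 32 Gbit/s raw per direction before TLP
and descriptor overhead. Two ports moving small frames in both directions,
plus the per-packet descriptor DMA, can get close to that limit.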

cheers
luigi

> My problem gets even worse when I want to use the other port on this
> same dual-port card to receive the traffic back (I'm sending my packets
> through a device I want to test for switching performance). The sending
> performance drops down to 9.39 Mpps (6.61 Gbps), and the receive rate
> drops by about as much. I'm trying to bind the threads to cores with
> "-a 3" and so on, but they don't seem to obey, based on top. TX now uses
> ca. 50% CPU while RX is at 20%, but they don't seem to run on their
> assigned CPUs.
> My card is an Intel 82599ES and the CPU is an i5-4570 @ 3.2 GHz (no HT).
> Maybe the fact that it is a workstation CPU contributes to this problem?
> 
> All suggestions welcome!
> 
> Regards,
> 
> Zoltan Kiss


ixgbe TX desc prefetch

2015-01-19 Thread Zoltan Kiss

Hi,

I'm using netmap on Ubuntu 14.04 (3.13.0-44 kernel, ixgbe 3.15.1-k), and
I can't max out a 10G link with pkt-gen:


Sent 1068122462 packets, 64 bytes each, in 84.29 seconds.
Speed: 12.67 Mpps Bandwidth: 6.49 Gbps (raw 8.92 Gbps)

The README says "ixgbe tops at about 12.5 Mpps unless the driver
prefetches tx descriptors". I've checked the actual driver code, and it
does a prefetch(tx_desc) in the TX completion interrupt handler; is that
what you mean? Top shows ksoftirqd eating up one core while the pkt-gen
process is at around 45%.
My problem gets even worse when I want to use the other port on this same
dual-port card to receive the traffic back (I'm sending my packets through
a device I want to test for switching performance). The sending performance
drops down to 9.39 Mpps (6.61 Gbps), and the receive rate drops by about as
much. I'm trying to bind the threads to cores with "-a 3" and so on, but
they don't seem to obey, based on top. TX now uses ca. 50% CPU while RX is
at 20%, but they don't seem to run on their assigned CPUs.
My card is an Intel 82599ES and the CPU is an i5-4570 @ 3.2 GHz (no HT).
Maybe the fact that it is a workstation CPU contributes to this problem?


All suggestions welcome!

Regards,

Zoltan Kiss


[Bug 195859] Reproducible panic with VIMAGE + if_bridge

2015-01-19 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195859

Bjoern A. Zeeb  changed:

           What              |Removed            |Added
   --------------------------+-------------------+---------------------
           Severity          |Affects Only Me    |Affects Some People

--- Comment #10 from Bjoern A. Zeeb  ---
(In reply to Craig Rodrigues from comment #8)

No, it's still used in the same jail.

What seems to happen is:
(a) the bridges get destroyed (all members detached, etc.) and the lock gets
destroyed.
(b) the loopback interface in the same jail gets destroyed
(c) the globally registered eventhandler in if_bridge is called for the
interface (lo) disappearing.
(d) we get to the point where we try to acquire the lock which we previously
destroyed.

Either extra checks need to be implemented in bridge_ifdetach() to catch
that case (and I think that is not possible without adding extra band-aid
information), or proper handling of cloned network interfaces and
startup/teardown ordering needs to be implemented "as a whole".

With all that, the CURVNET_SET/RESTORE question from comment #1 remains:
what happens if, in the normal case, the bridge members reside in different
VNETs (child jails)?



netmap example bridge application performance issues

2015-01-19 Thread Colton Chojnacki
I asked this question about a week ago, but I may have asked it on the
wrong mailing list.

I am trying to compare the performance of the bridge application included
in netmap with that of the Linux kernel bridge. I have noticed that the
performance of the netmap bridge application is significantly lower than
that of the Linux kernel bridge. I am not sure that I am using the netmap
bridge application correctly and am looking to be pointed in the right
direction.

I am running the experiment on a single box using two network namespaces.
Here is an overview of my setup:
- Linux kernel 3.14.28 with the netmap module loaded
- two network namespaces: h0 and h1
- two veth pairs: veth0/veth1 and veth2/veth3
- one bridge implementation at a time: the Linux kernel bridge or the
  netmap bridge
- each veth pair has one end in a network namespace and the other end
  attached to the bridge implementation that I am testing

I am running iperf from network namespace h0 to h1.
With the Linux kernel bridge, iperf reports a bandwidth of 24.1 Gbps.
With the netmap bridge application, iperf reports a bandwidth of 118 Mbps.

The command I am using to set up the netmap bridge application is:
./bridge -i netmap:veth1 -i netmap:veth3
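
(A minimal sketch of a setup matching the description above; the addresses
and the exact interface-to-namespace assignment are assumptions filled in
for illustration, not necessarily the original configuration.)

    ip netns add h0
    ip netns add h1
    ip link add veth0 type veth peer name veth1    # pair 1
    ip link add veth2 type veth peer name veth3    # pair 2
    ip link set veth0 netns h0                     # namespace ends
    ip link set veth2 netns h1
    ip netns exec h0 ip addr add 10.0.0.1/24 dev veth0
    ip netns exec h1 ip addr add 10.0.0.2/24 dev veth2
    ip netns exec h0 ip link set veth0 up
    ip netns exec h1 ip link set veth2 up
    ip link set veth1 up
    ip link set veth3 up
    # the host-side ends veth1/veth3 are then joined either by the kernel
    # bridge or by the netmap bridge:
    ./bridge -i netmap:veth1 -i netmap:veth3
    # and traffic is generated with something like:
    ip netns exec h1 iperf -s &
    ip netns exec h0 iperf -c 10.0.0.2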

The result of the experiment (24.1 Gbps vs. 118 Mbps) seems off, no?

Also, which branch/tag of the repository at https://code.google.com/p/netmap/
should I be using for the latest Linux version of netmap?

Thanks,
Colton


[Bug 195859] Reproducible panic with VIMAGE + if_bridge

2015-01-19 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195859

Bjoern A. Zeeb  changed:

           What              |Removed            |Added
   --------------------------+-------------------+---------------------
           CC                |                   |b...@freebsd.org

--- Comment #9 from Bjoern A. Zeeb  ---
(In reply to Craig Rodrigues from comment #1)

That patch would be bogus, as the CURVNET_SET()/RESTORE() would have to be
placed before/after the locking, since that lock is virtualised as well.
But it is also not the real problem.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"