Re: Performance issues with Intel Fortville (XL710/ixl(4))

2015-05-28 Thread Adrian Chadd
Hi,

I've no plans to MFC the RSS stuff to stable/10 - there's still RSS
work that needs to happen in -HEAD and it will likely change the
ioctls and kernel API a little.

I'd really appreciate help on developing the RSS stuff in -HEAD so we
can call it done and merge it back.

Thanks,



-adrian
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org


Re: Performance issues with Intel Fortville (XL710/ixl(4))

2015-05-28 Thread Lewis, Fred
Hi Adrian,

It is my understanding that you are maintaining the RSS code. The Panasas
folks have this question for you.

We are doing our testing w/ 11-CURRENT, but we will initially ship the Intel
XL710 40G NIC (Fortville) running on 10.1-RELEASE or 10.2-RELEASE. The
presence of RSS - even though it is disabled by default - makes the driver
back-port non-trivial. Is there an estimate on when the 11-CURRENT version of
the ixl driver (1.4.1) and (especially) the supporting RSS code
infrastructure will get MFCed to 10-STABLE?

Thanks,
-Fred




On 5/26/15 5:58 PM, Adrian Chadd adr...@freebsd.org wrote:

hi!

Try enabling RSS and PCBGROUPS on -HEAD. The ixl driver should work.

(I haven't tested it though; I've had other things going on here.)



-adrian



Re: Performance issues with Intel Fortville (XL710/ixl(4))

2015-05-26 Thread Lakshmi Narasimhan Sundararajan
Hi FreeBSD Team!

We seem to have found a problem with Tx performance.

We found that Tx handling is spread across all CPUs, probably causing cache 
thrashing and resulting in poor performance.

But once we used cpuset to bind the interrupt thread and the iperf process to 
the same CPU, performance was close to line rate. I used the userland cpuset 
command to do this manually. I would like this constrained in the kernel 
config/code through some tunables, and I am seeking your help/pointers in that regard.
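
For illustration, the manual pinning was along these lines (the IRQ number,
the CPU and the receiver address 192.0.2.1 below are placeholders; the real
IRQ names come from vmstat -i on the test box):

# find the ixl queue interrupts and their IRQ numbers
vmstat -i | grep ixl0
# pin one queue's IRQ (say irq264) to CPU 2
cpuset -l 2 -x 264
# run the iperf sender pinned to the same CPU
cpuset -l 2 iperf -c 192.0.2.1 -P 4 -t 60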


My followup questions are as follows.

a) How are Tx interrupts steered from the NIC to a CPU on the transmit path? 
Would the tx_complete interrupt for packets transmitted from CPU #x be serviced 
on the same CPU? If not, how can this binding be done?


b) I would like to use a pool of CPUs dedicated to servicing NIC interrupts. 
In particular, on the transmit path I would want the Tx interrupts to be handled 
on the same CPU on which the request was submitted. How can this be done?


I played with the current netisr settings but did not see any difference in how 
interrupts are scheduled across CPUs. Even though the maximum number of interrupt 
threads is set to one, the interrupt thread is scheduled on any CPU, and even 
when I set bindthreads to '1' there is no difference in interrupt thread scheduling.


root@mau-da-27-4-1:~ # sysctl net.isr
net.isr.dispatch: direct
net.isr.maxthreads: 1
net.isr.bindthreads: 0
net.isr.maxqlimit: 10240
net.isr.defaultqlimit: 256
net.isr.maxprot: 16
net.isr.numthreads: 1
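
For completeness, these netisr knobs are boot-time tunables; the sort of
/boot/loader.conf settings being tried looks like this (values are
illustrative, not a recommendation):

net.isr.maxthreads=4      # number of netisr worker threads
net.isr.bindthreads=1     # bind each netisr thread to a CPU
net.isr.dispatch=deferred # queue packets to the netisr threads instead of direct dispatch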


I would sincerely appreciate it if you could provide some pointers on the items 
above.




Thanks

LN







___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org

Re: Performance issues with Intel Fortville (XL710/ixl(4))

2015-05-26 Thread Adrian Chadd
hi!

Try enabling RSS and PCBGROUPS on -HEAD. The ixl driver should work.

(I haven't tested it though; I've had other things going on here.)
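
For anyone following along, a minimal kernel-config sketch for that (option
names as of -HEAD at the time; check sys/conf/NOTES before building, and the
config name here is arbitrary):

include GENERIC
ident   RSS-TEST
options RSS       # receive-side scaling support in the network stack
options PCBGROUP  # per-CPU connection (pcb) group lookup tables

Then build with the usual make buildkernel KERNCONF=RSS-TEST.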



-adrian


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org

Performance issues with Intel Fortville (XL710/ixl(4))

2015-05-19 Thread Pokala, Ravi
Hi folks,

At Panasas, we are working with the Intel XL710 40G NIC (aka Fortville),
and we're seeing some performance issues w/ 11-CURRENT (r282653).

Motherboard: Intel S2600KP (aka Kennedy Pass)
CPU: E5-2660 v3 @ 2.6GHz (aka Haswell Xeon)
(1 socket x 10 physical cores x 2 SMT threads) = 20 logical cores
NIC: Intel XL710, 2x40Gbps QSFP, configured in 4x10Gbps mode
RAM: 4x 16GB DDR4 DIMMs

What we've seen so far:

  - TX performance is pretty consistently lower than RX performance. All
numbers below are for unidirectional tests using `iperf' (example
invocations are sketched after the table):
10Gbps links  threads/link  TX Gbps  RX Gbps  TX/RX
1             1             9.02     9.85     91.57%
1             8             8.49     9.91     85.67%
1             16            7.00     9.91     70.63%
1             32            6.68     9.92     67.40%
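
For reference, the shape of the invocations (the address, duration and
reporting interval here are placeholders, not the exact commands used;
threads/link corresponds to iperf's -P option):

# receiver
iperf -s
# sender, e.g. the 16 threads/link row
iperf -c 192.0.2.1 -P 16 -t 60 -i 10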

  - With multiple active links, both TX and RX performance suffer greatly;
the aggregate bandwidth tops out at about a third of the theoretical
40Gbps implied by 4x 10Gbps.
10Gbps links  threads/link  TX Gbps  RX Gbps  % of 40Gbps
4             1             13.39    13.38    33.4%

  - Multi-link bidirectional throughput is absolutely terrible; the
aggregate is less than a tenth of the theoretical 40Gbps.
10Gbps links  threads/link  TX Gbps  RX Gbps  % of 40Gbps
4             1             3.83     2.96     9.6% / 7.4%

  - Occasional interrupt storm messages are seen from the IRQs associated
with the NICs. Since that can impact performance, those runs were not
included in the data listed above.
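
As an aside, the threshold behind those warnings is a sysctl; one way to
silence them during a benchmark run (0 disables the detection entirely):

sysctl hw.intr_storm_threshold=0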

Our questions:

  - How stable is ixl(4) in -CURRENT? By that, we mean two things: how quickly
is the driver changing, and does the driver cause any system instability?

  - What type of performance have others been getting w/ Fortville? In
40Gbps mode? In 4x10Gbps mode?

  - Does anyone have any tuning parameters they can recommend for this
card?

  - We did our testing w/ 11-CURRENT, but we will initially ship Fortville
running on 10.1-RELEASE or 10.2-RELEASE. The presence of RSS - even though
it is disabled by default - makes the driver back-port non-trivial. Is
there an estimate on when the 11-CURRENT version of the driver (1.4.1)
will get MFCed to 10-STABLE?

My colleagues Lakshmi and Fred (CCed) are working on this; please make
sure to include them if you have any comments.

Thanks,

Ravi

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org