Re: tuning routing using cxgbe and T580-CR cards?

2014-07-14 Thread John Jasem
Dropping LRO on the interfaces decreased interrupt usage on the CPUs by
15-20%, as measured by top -CHIPSu, at least from eyeballing it. It did
not otherwise have an effect on packet rates.
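
For the record, the change itself was just the usual ifconfig capability
flags, roughly along these lines (assuming the forwarding ports are cxl0
and cxl1; adjust names and count to match your setup):

  ifconfig cxl0 -lro -tso
  ifconfig cxl1 -lro -tso

The same "-lro -tso" options should also be able to go on the
ifconfig_cxl* lines in /etc/rc.conf to make the change persistent.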

Thanks!




On 07/12/2014 08:33 AM, Bjoern A. Zeeb wrote:
> On 12 Jul 2014, at 12:17 , Olivier Cochard-Labbé  wrote:
>
>> On Fri, Jul 11, 2014 at 8:03 PM, Bjoern A. Zeeb <
>> bzeeb-li...@lists.zabbadoz.net> wrote:
>>
>>> If you are primarily forwarding packets (you say "routing" multiple times)
>>> the first thing you should do is turn off LRO and TSO on all ports.
>>>
>> Hi Bjoern,
>>
>> I was not aware that LRO and TSO should be disabled when forwarding packets.
>> If I read the Wikipedia page on LRO[1] correctly, disabling LRO is not a
>> performance concern but only a matter of not breaking the end-to-end
>> principle, right?
>> But regarding TSO[2]: it should only improve performance between the TCP
>> and IP layers, and forwarded packets never have to cross the TCP<->IP
>> boundary, so disabling TSO should not impact performance, right?
> For forwarding it means that you re-assemble packets on receive, buffer
> several of them, etc., then hand them up the stack, only to find that you
> are sending them out again and therefore have to break them back into
> multiple packets.  In other words: you do a lot more work and add more
> latency than you need or want.
>
> I seem to remember that we added a knob to automatically disable our
> soft-LRO when forwarding is turned on (but I haven't checked whether I
> really did).  If we did, then at least for soft-LRO you indeed won't
> notice a difference.
>
>
>> - Multiple flows (different UDP ports) of small packets (60 B) at about 10 Mpps
>> …
>> No difference proven at 95.0% confidence
>>
>> => There is no difference, so I can disable LRO to respect the end-to-end
>> principle.  But why disable TSO?
> Try TCP flows.
>
> — 
> Bjoern A. Zeeb "Come on. Learn, goddamn it.", WarGames, 1983
>



Re: tuning routing using cxgbe and T580-CR cards?

2014-07-14 Thread John Jasem
Output under full load (presumably the tail of the requested vmstat 1 run):

 0 0 0   550M   15G   80  0  0  0  0  6  0  0  73508  179 161498   0 92  8
 0 0 0   550M   15G    0  0  0  0  0  6  0  0  72673  125 159449   0 92  8
 0 0 0   550M   15G   80  0  0  0  0  6  0  0  75630  175 164614   0 91  9




On 07/11/2014 03:32 PM, Navdeep Parhar wrote:
> On 07/11/14 10:28, John Jasem wrote:
>> In testing two Chelsio T580-CR dual port cards with FreeBSD 10-STABLE,
>> I've been able to use a collection of clients to generate approximately
>> 1.5-1.6 million TCP packets per second sustained, and routinely hit
>> 10GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the
>> quick read, accepting the loss of granularity).
> When forwarding, the pps rate is often more interesting, and almost
> always the limiting factor, as compared to the total amount of data
> being passed around.  10GB at this pps probably means 9000 MTU.  Try
> with 1500 too if possible.
>
> "netstat -d 1" and "vmstat 1" for a few seconds when your system is
> under maximum load would be useful.  And what kind of CPU is in this system?
>
>> While performance has so far been stellar, and I honestly suspect I will
>> need more CPU depth and horsepower to get much faster, I'm curious whether
>> there is any gain to be had from tweaking performance settings. Under
>> multiple streams, with N targets connecting to N servers, I'm seeing
>> interrupts on all CPUs peg at 99-100%, and I'm curious whether tweaking
>> configs will help, or whether it's a free clue to get more horsepower.
>>
>> So far, except for temporarily turning off pflogd and setting the
>> following sysctl variables, I've not done any performance tuning on the
>> system yet.
>>
>> /etc/sysctl.conf
>> net.inet.ip.fastforwarding=1
>> kern.random.sys.harvest.ethernet=0
>> kern.random.sys.harvest.point_to_point=0
>> kern.random.sys.harvest.interrupt=0
>>
>> a) One of the first things I did in prior testing was to turn
>> hyperthreading off. I presume this is still prudent, as HT doesn't help
>> with interrupt handling?
> It is always worthwhile to try your workload with and without
> hyperthreading.
>
>> b) I briefly experimented with using cpuset(1) to stick interrupts to
>> physical CPUs, but it offered no performance enhancements, and indeed,
>> appeared to decrease performance by 10-20%. Has anyone else tried this?
>> What were your results?
>>
>> c) the defaults for the cxgbe driver appear to be 8 rx queues, and N tx
>> queues, with N being the number of CPUs detected. For a system running
>> multiple cards, routing or firewalling, does this make sense, or would
>> balancing tx and rx be more ideal? And would reducing queues per card
>> based on NUMBER-CPUS and NUM-CHELSIO-PORTS make sense at all?
> The defaults are nrxq = min(8, ncores) and ntxq = min(16, ncores).  The
> man page mentions this.  The reason for 8 vs. 16 is that tx queues are
> "cheaper", as they don't have to be backed by rx buffers.  Each tx queue
> only needs some memory for its descriptor ring and some hardware resources.
>
> It appears that your system has >= 16 cores.  For forwarding it probably
> makes sense to have nrxq = ntxq.  If you're left with 8 or fewer cores
> after disabling hyperthreading you'll automatically get 8 rx and tx
> queues.  Otherwise you'll have to fiddle with the hw.cxgbe.nrxq10g and
> ntxq10g tunables (documented in the man page).
>
>
>> d) dev.cxl.$PORT.qsize_rxq: 1024 and dev.cxl.$PORT.qsize_txq: 1024.
>> These appear to not be writeable when if_cxgbe is loaded, so I speculate
>> they are not to be messed with, or are loader.conf variables? Is there
>> any benefit to messing with them?
> Can't change them after the port has been administratively brought up
> even once.  This is mentioned in the man page.  I don't really recommend
> changing them anyway.
>
>> e) dev.t5nex.$CARD.toe.sndbuf: 262144. These are writeable, but messing
>> with the values did not yield an immediate benefit. Am I barking up the
>> wrong tree by trying?
> The TOE tunables won't make a difference unless you have enabled TOE,
> the TCP endpoints lie on the system, and the connections are being
> handled by the TOE on the chip.  This is not the case on your systems.
> The driver does not enable TOE by default and the only way to use it is
> to switch it on explicitly.  There is no possibility that you're using
> it without knowing that you are.
>
>> f) based on prior experiments with other vendors, I tried tweaks to
>> net.isr.* settings, but did not see any benefits worth discussing. Am I
>> correct in this speculation, based on others' experience?

Re: tuning routing using cxgbe and T580-CR cards?

2014-07-11 Thread John Jasem

On 07/11/2014 03:32 PM, Navdeep Parhar wrote:
> On 07/11/14 10:28, John Jasem wrote:
>> In testing two Chelsio T580-CR dual port cards with FreeBSD 10-STABLE,
>> I've been able to use a collection of clients to generate approximately
>> 1.5-1.6 million TCP packets per second sustained, and routinely hit
>> 10GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the
>> quick read, accepting the loss of granularity).
> When forwarding, the pps rate is often more interesting, and almost
> always the limiting factor, as compared to the total amount of data
> being passed around.  10GB at this pps probably means 9000 MTU.  Try
> with 1500 too if possible.

Yes, I am generally more interested in (and concerned with) the pps. Using
1500-byte packets, I've seen around 2 million pps. I'll have hard numbers
for the list, with netstat and vmstat output, on Monday.
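
Concretely, the plan is to capture a few seconds of each of the commands
already mentioned in this thread while the generators are at full load:

  netstat -d -b -w1 -W
  vmstat 1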



>> a) One of the first things I did in prior testing was to turn
>> hyperthreading off. I presume this is still prudent, as HT doesn't help
>> with interrupt handling?
> It is always worthwhile to try your workload with and without
> hyperthreading.

When testing Mellanox cards, HT was severely detrimental. However, in
almost every case so far, Mellanox and Chelsio have led to opposite
conclusions (cpufreq, net.isr.*).

>> c) the defaults for the cxgbe driver appear to be 8 rx queues, and N tx
>> queues, with N being the number of CPUs detected. For a system running
>> multiple cards, routing or firewalling, does this make sense, or would
>> balancing tx and rx be more ideal? And would reducing queues per card
>> based on NUMBER-CPUS and NUM-CHELSIO-PORTS make sense at all?
> The defaults are nrxq = min(8, ncores) and ntxq = min(16, ncores).  The
> man page mentions this.  The reason for 8 vs. 16 is that tx queues are
> "cheaper", as they don't have to be backed by rx buffers.  Each tx queue
> only needs some memory for its descriptor ring and some hardware resources.
>
> It appears that your system has >= 16 cores.  For forwarding it probably
> makes sense to have nrxq = ntxq.  If you're left with 8 or fewer cores
> after disabling hyperthreading you'll automatically get 8 rx and tx
> queues.  Otherwise you'll have to fiddle with the hw.cxgbe.nrxq10g and
> ntxq10g tunables (documented in the man page).

I promise I did look through the man page before posting. :) This is
actually a 12-core box with HT turned off.

Mining the cxl stat entries in sysctl, it appears that the queues per
port are reasonably well balanced, so I may be concerned over nothing.
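
If I do end up rebalancing the queues, my reading of the man page is that
it would be done with loader tunables along these lines (the value of 12
is only a first guess for this 12-core box, not something I've tested yet):

  # /boot/loader.conf
  hw.cxgbe.nrxq10g="12"
  hw.cxgbe.ntxq10g="12"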



>> g) Are there other settings I should be looking at, that may squeeze out
>> a few more packets?
> The pps rates that you've observed are within the chip's hardware limits
> by at least an order of magnitude.  Tuning the kernel rather than the
> driver may be the best bang for your buck.

If I am missing obvious configurations for kernel tuning in this regard,
it would not be the first time.

Thanks again!

-- John Jasen (jja...@gmail.com)



Re: Network Intel X520-SR2 stopping

2014-07-11 Thread John Jasem
Marcelo;

I recently had a case where an Intel card was flapping, though with LR
transceivers. It turned out that the cable ends needed to be re-polished,
as not enough light was making it through to register transmit power.

You and the networking people may want to spend a few moments exploring
that path.

-- John Jasen (jja...@gmail.com)


tuning routing using cxgbe and T580-CR cards?

2014-07-11 Thread John Jasem
In testing two Chelsio T580-CR dual port cards with FreeBSD 10-STABLE,
I've been able to use a collection of clients to generate approximately
1.5-1.6 million TCP packets per second sustained, and routinely hit
10GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the
quick read, accepting the loss of granularity).

While performance has so far been stellar, and I honestly suspect I will
need more CPU depth and horsepower to get much faster, I'm curious whether
there is any gain to be had from tweaking performance settings. Under
multiple streams, with N targets connecting to N servers, I'm seeing
interrupts on all CPUs peg at 99-100%, and I'm curious whether tweaking
configs will help, or whether it's a free clue to get more horsepower.

So far, except for temporarily turning off pflogd and setting the
following sysctl variables, I've not done any performance tuning on the
system yet.

/etc/sysctl.conf
net.inet.ip.fastforwarding=1
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0
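
These can also be flipped on a live system with sysctl(8) before being
committed to /etc/sysctl.conf, e.g.:

  sysctl net.inet.ip.fastforwarding=1
  sysctl kern.random.sys.harvest.ethernet=0
  sysctl kern.random.sys.harvest.point_to_point=0
  sysctl kern.random.sys.harvest.interrupt=0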

a) One of the first things I did in prior testing was to turn
hyperthreading off. I presume this is still prudent, as HT doesn't help
with interrupt handling?
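
(For what it's worth, I've been toggling HT in the BIOS; I believe the
loader.conf equivalent would be machdep.hyperthreading_allowed="0", though
the BIOS route is the only one I've actually used here.)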

b) I briefly experimented with using cpuset(1) to stick interrupts to
physical CPUs, but it offered no performance enhancements, and indeed,
appeared to decrease performance by 10-20%. Has anyone else tried this?
What were your results?
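
For reference, the pinning was along these lines: find the card's IRQ
numbers with vmstat -i, then bind each one to a core, e.g.

  vmstat -i | grep t5nex
  cpuset -l 2 -x 264

(the IRQ number above is only an illustration; use whatever vmstat reports).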

c) the defaults for the cxgbe driver appear to be 8 rx queues, and N tx
queues, with N being the number of CPUs detected. For a system running
multiple cards, routing or firewalling, does this make sense, or would
balancing tx and rx be more ideal? And would reducing queues per card
based on NUMBER-CPUS and NUM-CHELSIO-PORTS make sense at all?

d) dev.cxl.$PORT.qsize_rxq: 1024 and dev.cxl.$PORT.qsize_txq: 1024.
These appear to not be writeable when if_cxgbe is loaded, so I speculate
they are not to be messed with, or are loader.conf variables? Is there
any benefit to messing with them?

e) dev.t5nex.$CARD.toe.sndbuf: 262144. These are writeable, but messing
with the values did not yield an immediate benefit. Am I barking up the
wrong tree by trying?
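
By "messing with values" I just mean plain sysctl writes along these lines
(the card instance and the value here are picked arbitrarily):

  sysctl dev.t5nex.0.toe.sndbuf=524288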

f) based on prior experiments with other vendors, I tried tweaks to
net.isr.* settings, but did not see any benefits worth discussing. Am I
correct in this speculation, based on others' experience?

g) Are there other settings I should be looking at, that may squeeze out
a few more packets?

Thanks in advance!

-- John Jasen (jja...@gmail.com)



Re: em driver: netif hangs the system if interface is cabled and configured but there is no link

2014-06-12 Thread John Jasem

On 06/12/2014 01:02 PM, Andreas Nilsson wrote:



> If it is a dual port card, shouldn't it be em0 and em1 ?

Yes. I do have two dual-port cards, however.

-- John Jasen (jja...@gmail.com)


em driver: netif hangs the system if interface is cabled and configured but there is no link

2014-06-12 Thread John Jasem
I'm configuring a system that's destined to be a multi-homed server,
using Intel dual port 1GbE cards that rely on the em driver.

em0 has link, and only needed configuration.

In an attempt to be ahead of the game, I pre-configured em2, plugged in
my side of the cable to be ready when the other side plugged theirs in,
and rebooted the box.
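
For reference, the rc.conf entries involved are just ordinary static
configurations along these lines (addresses made up for the list):

  ifconfig_em0="inet 192.0.2.10 netmask 255.255.255.0"
  ifconfig_em2="inet 198.51.100.10 netmask 255.255.255.0"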

In this state, it appears the box will hang while netif works through the
interfaces defined in rc.conf. I'm not sure if the hang is permanent, but
I'm willing to call 20 minutes 'permanent' for the purposes of this exercise.

I eventually narrowed it down to the configuration for em2 combined with
em2 not having link WHILE a cable was plugged in. I was able to reproduce
the expected behaviour by unplugging the cable, and was able to reproduce
the failure on em1 and em3 by moving the cable and configuration over.

Any thoughts? Am I missing something?

-- John Jasen (jja...@gmail.com)
