Re: Chelsio / cxlv(4): strange messages, SR-IOV interface does not work

2024-10-23 Thread Navdeep Parhar
On Tue, Oct 22, 2024 at 11:21 PM Lexi Winter  wrote:

> hello,
>
> i'm trying to configure a cxlv(4) device, which is a VF of a Chelsio
> T540-CR on a host running bhyve.
>
> host: FreeBSD 15.0-CURRENT #3 lf/main-n269068-2cff93ced1d: Wed Oct 23
> 02:48:20 BST 2024
> guest: FreeBSD 15.0-CURRENT #2 lf/main-n269067-56dd459904b: Sat Oct 19
> 18:36:40 BST 2024
>
> the VF appears correctly in the VM:
>
> root@lily:~ # kldload if_cxlv
> t5vf0:  mem
> 0xc000e000-0xc000efff,0xc000-0xc0007fff,0xc0008000-0xc0009fff at
> device 6.0 on pci0
> t5vf0: 1 ports, 2 MSI-X interrupts, 4 eq, 2 iq
> cxlv0:  on t5vf0
> cxlv0: 2 txq, 1 rxq (NIC)
>
> and after bringing the interface 'up' everything seems fine:
>
> root@lily:~ # ifconfig cxlv0
> cxlv0: flags=1008843 metric 0 mtu 1500
>  options=6ec07bb
>  ether 06:44:3f:e7:60:30
>  media: Ethernet 10Gbase-Twinax (10Gbase-Twinax)
>  status: active
>  nd6 options=29
>
> however, trying to assign an IP address causes immediate problems:
>
> root@lily:~ # ifconfig cxlv0 inet6 2001:8b0:aab5:7::10/64
> root@lily:~ # Oct 23 06:16:07 lily kernel: cxlv0: a looped back NS
> message is detected during DAD for fe80:3::444:3fff:fee7:6030.  Another
> DAD probes are being sent.
> root@lily:~ # dmesg|grep loop
> cxlv0: a looped back NS message is detected during DAD for
> fe80:3::444:3fff:fee7:6030.  Another DAD probes are being sent.
> cxlv0: a looped back NS message is detected during DAD for
> fe80:3::444:3fff:fee7:6030.  Another DAD probes are being sent.
> cxlv0: a looped back NS message is detected during DAD for
> fe80:3::444:3fff:fee7:6030.  Another DAD probes are being sent.
> cxlv0: a looped back NS message is detected during DAD for
> fe80:3::444:3fff:fee7:6030.  Another DAD probes are being sent.
> cxlv0: a looped back NS message is detected during DAD for
> fe80:3::444:3fff:fee7:6030.  Another DAD probes are being sent.
> cxlv0: a looped back NS message is detected during DAD for
> fe80:3::444:3fff:fee7:6030.  Another DAD probes are being sent.
> cxlv0: a looped back NS message is detected during DAD for
> fe80:3::444:3fff:fee7:6030.  Another DAD probes are being sent.
>
> i find this strange because the link local IP address in the kernel
> error is not even configured on the interface:
>
> root@lily:~ # ifconfig cxlv0
> cxlv0: flags=1008843 metric 0 mtu 1500
>  options=6ec07bb
>  ether 06:44:3f:e7:60:30
>  inet6 2001:8b0:aab5:7::10/64
>  inet6 fe80::444:3fff:fee7:6030%cxlv0/64 tentative scopeid 0x3
>  media: Ethernet 10Gbase-Twinax (10Gbase-Twinax)
>  status: active
>  nd6 options=21
>
> am i doing something wrong here?
>

You can disable IPv6 DAD as a workaround for this issue.  The problem is
that the VF's multicast tx is getting reflected back to it by the internal
switch when it shouldn't.

# sysctl net.inet6.ip6.dad_count=0
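
To make that persist across reboots, put the same line in
/etc/sysctl.conf.  If you'd rather not turn DAD off globally, a
per-interface alternative should be the no_dad nd6 flag (untested here;
interface name taken from your report):

# ifconfig cxlv0 inet6 no_dad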

Regards,
Navdeep


>
> thanks, lexi.
>
>


Re: Traffic between cxgbe VFs and/or PF on a host

2024-10-11 Thread Navdeep Parhar
On Fri, Oct 11, 2024 at 3:56 PM John Nielsen  wrote:

> Hi-
>
> I’m running a FreeBSD 14-STABLE host with a Chelsio T520. I have a bhyve
> VM (also running 14-STABLE) to which I have assigned a VF of the NIC. That
> is all working as expected; the host can pass traffic using the PF cxl0 and
> the guest can pass traffic using the VF cxlv0. However the host cannot
> communicate with the guest. I am looking into the possibility of enabling
> 802.1qbg / VEPA / reflective relay on the switch port but I’d like to know
> if the T5 can do that switching itself without sending the packets over the
> wire. The marketing material says the card “integrates a high performance
> packet switch” but I don’t know how to configure that functionality on
> FreeBSD or if this use case is supported. Can anyone shed some light on
> that?
>

The PF driver's tx bypasses the internal switch by default and is not
visible to the VFs because of that.  Set this knob to force it to go
through the switch.

 hw.cxgbe.tx_vm_wr
         Setting this to 1 instructs the driver to use VM work requests
         to transmit data.  This lets PF interfaces transmit frames to
         VF interfaces over the internal switch in the ASIC.  Note that
         the cxgbev(4) VF driver always uses VM work requests and is
         not affected by this tunable.  The default value is 0 and
         should be changed only if PF and VF interfaces need to
         communicate with each other.  Different interfaces can be
         assigned different values using the dev..X.tx_vm_wr sysctl
         when the interface is administratively down.
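
For example, to set it globally at boot, add this to /boot/loader.conf:

hw.cxgbe.tx_vm_wr="1"

or set it per interface at runtime (port name and instance below are
only illustrative; the interface must be down at the time):

# ifconfig cxl0 down
# sysctl dev.cxl.0.tx_vm_wr=1
# ifconfig cxl0 up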

Regards,
Navdeep


> The other alternative would be to wire up the second port but if I can get
> away with not needing to use another SFP+ port on the switch for this that
> would be ideal.
>
> Thanks!
>
> JN
>
>
>


Re: Monitoring packet loss

2024-08-07 Thread Navdeep Parhar

On 8/7/24 7:06 AM, Alan Somers wrote:
> I'd like to track the rate of packet loss for outbound packets from
> some production servers.  Obviously, that's impossible.  But I think
> that the rate of TCP retransmissions should be a close proxy for
> packet loss.  Currently I can only observe TCP retransmissions by
> using wireshark, a slow and laborious process.  But it seems to me
> that the network stack should already have that information.


The kernel already maintains a VNET-virtualized tcpstat structure for 
aggregate TCP stats.  netstat and systat grab these using the 
net.inet.tcp.stats sysctl.  This might work for you if you're okay with 
global and not per-interface information.


VNET_PCPUSTAT_DECLARE(struct tcpstat, tcpstat); /* tcp statistics */

$ netstat -sp tcp | grep -iE 'retr|rexm'
$ systat -tcp
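
A crude way to watch the retransmit rate over time with just those two
(exact counter names can vary a little between releases):

$ while sleep 10; do date; netstat -sp tcp | grep -iE 'retr|rexm'; done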

Regards,
Navdeep

> Would it be possible to add a sysctl to expose the total number of
> retransmissions since boot?  This information would be very useful.
> It could reveal for example problems with a model of NIC, or
> congestion on one network segment but not another, or a regression in
> the OS.
>
> -Alan






Re: Chelsio NIC with RSS - Traffic distribution to different Queues

2023-07-11 Thread Navdeep Parhar

On 7/10/23 2:36 AM, josef.zahn...@swisscom.com wrote:
> Ok, we are getting closer to a solution. But I’m a little bit confused;
> I’ll try to explain why.
>
> I’ve removed the following sysctl values, as you stated that they are
> not official FreeBSD values:
>
> -net.inet.rss.bits="2"
> -net.inet.rss.enabled="1"
> -net.isr.bindthreads="1"
> -net.isr.maxthreads="-1"
>
> Without the settings above and without your proposed sysctl values, my
> system was stable and the traffic was distributed over all CPUs (even
> HT cores). So no more drops on CARP when CPU0 had high load. It looks
> like my problem is clearly related to the “Custom” OPNsense RSS values
> above. As the RSS sysctl values are noted in the OPNsense guide
> (https://docs.opnsense.org/troubleshooting/performance.html), my
> assumption was that those values need to be set anyway for RSS. But it
> seems that those 4 values are making things worse!


The defaults are defaults for a reason :-) and it's always a good idea 
to get a baseline measurement for your workload before any tuning.  It's 
not possible to assess the effect of any tuning without a baseline to 
compare with.  Looks like everything would have worked for you even with
the OPNsense defaults, although they are different from FreeBSD's.


> Back to your topic. You are right, queue 0 carries only control traffic
> when I enable your sysctl values. Where can I see which protocols fall
> into control traffic (queue 0)? So your change works as expected,
> thanks a lot!


There is no direct way to observe the traffic on each queue.  Each rx 
queue happens to have its own interrupt so I looked at the interrupt 
counts (as shown by vmstat -i) and the queues' consumer index (sysctl 
dev.cxl..rxq..cidx) to test my patch.
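
For example (nexus, port, and queue numbers below are only
illustrative):

# vmstat -i | grep t5nex
# sysctl dev.cxl.0.rxq.0.cidx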




> As you stated, queue 0 is not necessarily CPU 0, which is important to
> know.
>
> I’m pretty sure that I needed the RSS values above to get RSS working
> on Intel NICs, and with “netstat -Q” I could see the RSS queues. With
> Chelsio it seems that the FreeBSD “netisr” process doesn’t do the RSS
> stuff, and “netstat -Q” shows nothing about RSS. For me this means that
> your driver does the RSS magic directly, is this correct? Please help
> shed light on this topic.
All modern multiqueue NICs do RSS internally whether you have "options
RSS" in the kernel or not.  "options RSS" enables extra code in the
kernel that gives it some control over how RSS is configured on the
NICs and how many queues they should use.  The net.inet.rss.* sysctls
show up only if you have "options RSS" in the kernel.  Note that cxgbe
does support "options RSS" so I'm not sure why the output of "netstat
-Q" wouldn't show that.  Was this the if_cxgbe.ko that you built
yourself?  You must use the same configuration as the kernel to build
the module.


> -net.isr.bindthreads="1"
> -net.isr.maxthreads="-1"

This doesn't show the ISR dispatch policy (net.isr.dispatch), but
whatever the policy is, it applies to all the NICs in the system.  If
you're comparing runs with different NICs, please do it with the exact
same configuration.  All NIC drivers call if_input (ether_input) to
submit work to the kernel, and ether_input uses netisr to dispatch the
work.  The driver doesn't make netisr calls directly.


Regards,
Navdeep


Re: Chelsio NIC with RSS - Traffic distribution to different Queues

2023-07-06 Thread Navdeep Parhar
On Mon, Jul 3, 2023 at 8:48 AM  wrote:
>
> Sorry for the spam, I do see the values with sysctl now. It seems that
> FreeBSD always loads if_cxgbe.ko from /boot/kernel/if_cxgbe.ko. So what
> I’ve done is rename the original file and copy the newly compiled
> if_cxgbe.ko from /boot/modules to /boot/kernel. Is there a cleaner way
> to get it to work? Btw, do I need t5fw_cfg.ko as well? I haven’t found
> any documentation on what exactly it does…
>
>
>
> Quite difficult for someone who isn’t familiar with FreeBSD at all :-P.
>
>
>
> So I’ve retested again. Sadly it still shares the load over all
> configured CPUs (0-3) for RSS (there are 8 cores plus 8 HT cores per
> physical CPU). As I already mentioned, the new sysctl values are now
> visible, so I think the driver should be fine. My expectation with that
> configuration was that CPU0 wouldn’t be used for RSS (as all values are
> smaller than the available CPUs), but it has been used, so the network
> traffic flaps like hell.

Each rx queue gets its own interrupt and I verified with "vmstat -i"
that only non-RSS traffic (ARP) resulted in activity on queue 0, TCP
and UDP traffic were seen on other queues, as expected.  Note that rx
queue 0 does not necessarily mean CPU0.

> hw.cxgbe.cong_drop="1"
> hw.cxgbe.nrxq="3"
> hw.cxgbe.pause_settings="0"
> hw.cxgbe.rsrv_noflowq="1"
> hw.cxgbe.rsrv_norssq="1"
> hw.cxgbe.rx_budget="128"
> if_cxgbe_load="yes"

okay.

> net.inet.rss.bits="2"
> net.inet.rss.enabled="1"
> net.isr.bindthreads="1"
> net.isr.maxthreads="-1"

What are all these for?  Please go back to defaults and retry with
only the driver tunables above.  It looks like OPNsense ships with
"options RSS" in the kernel.  That complicates things a bit because I
tested my patch on the FreeBSD GENERIC kernel, which does not have this
option.  Run "top -SHIPzt" during the test to see exactly what is
hogging CPU0.

Regards,
Navdeep



Re: Chelsio NIC with RSS - Traffic distribution to different Queues

2023-07-01 Thread Navdeep Parhar
Hello,

Please try this patch: https://people.freebsd.org/~np/norssq.diff

It adds these sysctls to the driver.
1) hw.cxgbe.rsrv_norssq.  This is what you originally asked for.
2) hw.cxgbe.rx_budget.  This can be used to force the driver's RX to
yield periodically.

What kind of system (cores, memory, etc.) is this?  Control packets
are either getting dropped or the threads/timers responsible for
sending or processing these packets are starved of CPU.  It would be
useful to monitor interface activity with "netstat -d -I "
during the test.

# sysctl hw.model hw.ncpu hw.physmem
# netstat -dw1 -I cxl0

Try the settings listed below.  nrxq=X might help in case the driver
RX threads are hogging all the cores because all rx queues are heavily
loaded.  Set nrxq to something less than the number of cores in the
system.  rx_budget can be changed any time (try 64, 128, 256) and
might improve the responsiveness of the rest of the system during
load.

(in loader.conf)
hw.cxgbe.nrxq=2 (3 if you've patched the kernel and set norssq)
hw.cxgbe.rsrv_noflowq=1
hw.cxgbe.pause_settings=0
hw.cxgbe.cong_drop=1  (2 would be better but needs a recent driver)
hw.cxgbe.rsrv_norssq=1  (needs patch)
hw.cxgbe.rx_budget=128  (needs patch)

Let us know how it goes.

Regards,
Navdeep

On Thu, Jun 29, 2023 at 5:53 AM  wrote:
>
> Can you tell me which netstat command you have in mind? I tried “netstat
> -Q”; it shows a few drops, but not enough to explain the CARP drops.
> What I can tell you is that CARP on the corresponding server just sends
> out packets as long as it is the master box and CPU0 load is below 100%.
> It doesn’t receive any CARP traffic at all, just normal network traffic.
> What I see is that those CARP packets are no longer sent once CPU0 hits
> 100% load -> if that happens the server switches to standby and the
> traffic on the machine is gone. Because of this behavior we would like
> to have an option which allows us to have control plane traffic (LACP,
> CARP, …?) in RSS RX queue 0 and nothing else. The question is what
> control plane traffic would be. Hopefully CARP/VRRP as well, …
>
> We tried hw.cxgbe.cong_drop=1, but it doesn’t help in our case.
>
> Can you explain a bit what your patch will do? Am I right that you will
> post the link here later on?
>
> Cheers Josef



Re: Chelsio NIC with RSS - Traffic distribution to different Queues

2023-06-27 Thread Navdeep Parhar

On 6/27/23 12:47 AM, josef.zahn...@swisscom.com wrote:
> We are familiar with «hw.cxgbe.rsrv_noflowq», but as you already stated
> it’s only for the TX direction; it doesn’t help us at all.


Okay.

> Our problem is in fact that on FreeBSD only CPU0 seems to do the slow
> protocol (LACP, CARP, …) stuff, and even though the other CPUs are
> completely idle, if CPU0 has 100% load (which is in fact possible to
> achieve with one iperf session) you can create a scenario where
> LACP/CARP stops working due to the load on CPU0. So the idea would be
> to get the RSS load off CPU0 to ensure that the slow protocols always
> work as expected.


Are the LACP/CARP problems due to control packets getting dropped?  Or
did all the control packets make it in but the rx queue handling is so
backlogged that the kernel did not process them in time, resulting in
protocol timeouts?  Look at the netstat counters for packet drops.  If
there are none then it must be slow processing.  If you see rx drops
then try setting hw.cxgbe.cong_drop=1 in loader.conf.


> To sum it up: yes, in theory I’m able to test an RX patch, as this is
> only a test setup. However, I’m completely new to FreeBSD and we are
> using it only because our firewall (OPNsense) uses it. FreeBSD came on
> the ISO with the installation medium. So please explain in detail what
> we have to do…


My patch will apply to the stock FreeBSD kernel.  I'm not familiar with
the OPNsense development workflow so I'm not sure what to do there.  If
cong_drop=1 doesn't fix your problem all by itself then combining it
with this future patch really should fix it.  I'll post the patch in a
day or two.


Regards,
Navdeep



> Cheers Josef





Re: Chelsio NIC with RSS - Traffic distribution to different Queues

2023-06-27 Thread Navdeep Parhar
On Mon, Jun 26, 2023 at 5:58 AM  wrote:
>
> Hi guys
>
>
>
> I’m trying to do something similar to the question here
> (https://lists.freebsd.org/pipermail/freebsd-net/2017-March/047378.html).
> We have Chelsio NICs (T580-LP-CR).
>
>
>
> Our goal is the following:
>
> RSS Queue 0 -> only control plane traffic, e.g. CARP (IP protocol 112,
> multicast), ICMP, LACP, …
> RSS Queue 1-7 -> no change, process everything (probably except
> protocols already processed in queue 0)
>
>
>
> Any hints on how we can achieve something like this with cxgbetool?
>

There is a knob (hw.cxgbe.rsrv_noflowq) that does something similar but in the
Tx direction only.  Will you be able to test a patch that does it for
the Rx side too?

Regards,
Navdeep

>
> Cheers Josef



Re: FreeBSD 12.2 traffic not occurs onVXLAN

2021-08-24 Thread Navdeep Parhar
On Tue, Aug 24, 2021 at 1:12 AM alfadev  wrote:
>
> Thanks for interest
> FreeBSD ifconfig:
>
> igb0: flags=8822 metric 0 mtu 1500
...
> wg0: flags=80c1 metric 0 mtu 1420

igb0 should be UP and RUNNING before it will pass traffic.  You can see
that the wireguard interface is fully operational (has both those flags)
and you are able to use it for VXLAN traffic.  Maybe the difference
between the FreeBSD versions is that igb0 ends up UP after all your
configuration commands on 11.2 but not on 12.2?

Regards,
Navdeep



Re: FreeBSD 12.2 traffic not occurs onVXLAN

2021-08-23 Thread Navdeep Parhar
On Sun, Aug 22, 2021 at 8:30 AM alfadev via freebsd-hackers
 wrote:
>
> Hi, I successfully configured a VXLAN tunnel between amd64 FreeBSD 11.2
> and x64 Linux.
> But on FreeBSD 12.2 the same configuration below does not work.
> So what is the problem with FreeBSD 12.2, is it a bug or something else?
> Any help would be appreciated..
>
> My fully working tested configuration is:
>
> FreeBSD 11.2 side:
> physical interface: igb0
> ifconfig vxlan4095 create vxlanid 4095 vxlanlocal 192.168.99.1 vxlanremote 
> 192.168.99.99 inet 192.168.157.1/24

Can you please provide the ifconfig output for both the vxlan and the
physical interface?  Have you tried running tcpdump -p on the physical
interface to see if there is any VXLAN traffic on the link?
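
For example, something like this should show whether encapsulated
frames ever make it onto the wire (assuming the default VXLAN UDP
port):

# tcpdump -p -ni igb0 udp port 4789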

Regards,
Navdeep

>
> Linux side:
> physical interfaces: eth0,eth1
>
> ip link add name vxlan4095 type vxlan id 4095 remote 192.168.99.1 local 
> 192.168.99.99
> ip link add name vbr0 type bridge
> ip link set eth1 master vbr0
> ip link set vxlan4095 master vbr0
> ip link set vbr0 up
>
> there is a client connected on eth1 which has IP 192.168.157.100
> http, https, icmp .. traffic passes through between client and tunnel,
> everything works well.



Re: RFC: NFS trunking (multiple TCP connections for a mount

2021-06-28 Thread Navdeep Parhar
On Mon, Jun 28, 2021 at 5:23 PM Rick Macklem  wrote:
>
> The Linux NFS client now has a mount option "nconnect",
> which specifies that multiple TCP connections be created
> for an NFS mount, where RPCs are done on the connections,
> in a round robin fashion. (Alternating between the two TCP
> connections for the case of nconnect=2.)
>
> The Linux man page says:
> nconnect=n
>   When using a connection oriented protocol such as TCP, it
>   may sometimes be advantageous to set up multiple
>   connections between the client and server. For instance,
>   if your clients and/or servers are equipped with multiple
>   network interface cards (NICs), using multiple connections
>   to spread the load may improve overall performance.  In
>   such cases, the nconnect option allows the user to specify
>   the number of connections that should be established
>   between the client and server up to a limit of 16.
>
> I don't understand how multiple TCP connections to the same
> server IP address will distribute the load across multiple network
> interfaces?
> I thought that lagg would have handled this?
>
> I could easily implement this, but I only have low end hardware
> to test on, so I doubt that I will see any performance improvement.

Pretty much all modern NICs are multiqueue and multiple connections will
distribute load across CPUs even without any lagg.  I think an nconnect
like option would be quite useful for NFS over high bandwidth links as
it's a lot easier to saturate the pipe using multiple connections than a
single one.
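
For reference, on the Linux side this is just a per-mount option,
something like (server and export path made up):

# mount -t nfs -o nconnect=4 server:/export /mnt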

Regards,
Navdeep



Re: How to Increase TX Queue Priority for LACP Packets

2020-06-16 Thread Navdeep Parhar
No, it just enables some printf's in the LACP code and has no effect on
anything else.

Regards,
Navdeep


On Wed, Jun 17, 2020 at 12:31:33AM +, Saad, Mark wrote:
> Navdeep
>    Thanks for getting back; I’ll do some digging. Back to the
>    question about running with LACP debug on. Does this put the NICs
>    into promiscuous mode?
> ---
> Mark Saad | mark.s...@lucera.com 
> 
> > On Jun 16, 2020, at 8:13 PM, Navdeep Parhar  wrote:
> > 
> > We could have a global knob that tells all NIC drivers to use a reserved
> > queue for non-RSS traffic, but that would be advisory at best because
> > the tx queue selection takes place inside the driver's (or iflib's)
> > transmit routine.  The meat of the change is going to be in iflib and
> > all non-iflib drivers' if_transmit.
> > 
> > Regards,
> > Navdeep
> > 
> >> On Tue, Jun 16, 2020 at 09:48:19PM +, Saad, Mark wrote:
> >> All
> >> Is there any way to make this change on other NICs like Intel ix and
> >> Solarflare sfxge? I have seen similar issues on both with 12.1,
> >> mainly with Solarflare NICs.
> >> 
> >> ---
> >> Mark Saad
> >> mark.s...@lucera.com
> >> 
> >> 
> >> 
> >> From: owner-freebsd-...@freebsd.org  on 
> >> behalf of Foster, Greg 
> >> Sent: Tuesday, June 16, 2020 3:56 PM
> >> To: Navdeep Parhar
> >> Cc: freebsd-net@freebsd.org
> >> Subject: RE: How to Increase TX Queue Priority for LACP Packets
> >> 
> >> HI Navdeep,
> >> 
> >> Thanks for the information!  I've integrated the changes and will be
> >> testing more today.
> >> 
> >> We have seen the LACP port flapping under different scenarios, most we
> >> believe are traffic/load based.
> >> 
> >> I did see the flapping unexpectedly when I just enabled LACP debug
> >> (e.g., sysctl net.link.lagg.lacp.debug=1). Is this a known
> >> problem?
> >> 
> >> Thanks
> >> Greg
> >> 
> >> -Original Message-
> >> From: Navdeep Parhar  On Behalf Of Navdeep Parhar
> >> Sent: Friday, June 12, 2020 7:51 PM
> >> To: Foster, Greg 
> >> Cc: freebsd-net@freebsd.org
> >> Subject: Re: How to Increase TX Queue Priority for LACP Packets
> >> 
> >>> On Fri, Jun 12, 2020 at 11:47:41PM +, Foster, Greg wrote:
> >>> FreeBSD Networkers,
> >>> 
> >>> We are seeing LACP port flapping on our FreeBSD 10.4/12.1 systems
> >>> under different conditions.
> >>> 
> >>> Can someone explain or point me to the information on how to queue
> >>> the LACP packets to a higher priority queue ?
> >>> 
> >>> We are using the Chelsio T580-LP-CR adapter/cxgbe driver.  The
> >>> Cheslio NICs have 8 TX/RX queues each, but I don't know how to
> >>> explicitly put the LACP packets in the higher priority TX queue.
> >>> 
> >>> I've read about PF/ALTQ and think this may be overkill our needs,
> >>> and was wondering if there was a simpler method.
> >> 
> >> This is cxgbe specific but that's what you're using so it'll do.
> >> 
> >> Add "hw.cxgbe.rsrv_noflowq=1" to your /boot/loader.conf.  That
> >> reserves one tx queue for non-RSS traffic (like ARP, LACP).  You might
> >> also want to increase the number of tx queues to compensate for the
> >> one that's now reserved.  Use "hw.cxgbe.ntxq=9" for that.  The ntxq
> >> knob might be different on 10.4 but the man page matching the driver
> >> should have its exact name.
> >> 
> >> Regards,
> >> Navdeep
> >> 


Re: How to Increase TX Queue Priority for LACP Packets

2020-06-16 Thread Navdeep Parhar
We could have a global knob that tells all NIC drivers to use a reserved
queue for non-RSS traffic, but that would be advisory at best because
the tx queue selection takes place inside the driver's (or iflib's)
transmit routine.  The meat of the change is going to be in iflib and
all non-iflib drivers' if_transmit.

Regards,
Navdeep

On Tue, Jun 16, 2020 at 09:48:19PM +, Saad, Mark wrote:
> All
>  Is there any way to make this change on other NICs like Intel ix and
>  Solarflare sfxge? I have seen similar issues on both with 12.1,
>  mainly with Solarflare NICs.
> 
> ---
> Mark Saad
> mark.s...@lucera.com
> 
> 
> 
> From: owner-freebsd-...@freebsd.org  on behalf 
> of Foster, Greg 
> Sent: Tuesday, June 16, 2020 3:56 PM
> To: Navdeep Parhar
> Cc: freebsd-net@freebsd.org
> Subject: RE: How to Increase TX Queue Priority for LACP Packets
> 
> HI Navdeep,
> 
> Thanks for the information!  I've integrated the changes and will be
> testing more today.
> 
> We have seen the LACP port flapping under different scenarios, most we
> believe are traffic/load based.
> 
> I did see the flapping unexpectedly when I just enabled LACP debug
> (e.g., sysctl net.link.lagg.lacp.debug=1). Is this a known
> problem?
> 
> Thanks
> Greg
> 
> -Original Message-
> From: Navdeep Parhar  On Behalf Of Navdeep Parhar
> Sent: Friday, June 12, 2020 7:51 PM
> To: Foster, Greg 
> Cc: freebsd-net@freebsd.org
> Subject: Re: How to Increase TX Queue Priority for LACP Packets
> 
> On Fri, Jun 12, 2020 at 11:47:41PM +, Foster, Greg wrote:
> > FreeBSD Networkers,
> >
> > We are seeing LACP port flapping on our FreeBSD 10.4/12.1 systems
> > under different conditions.
> >
> > Can someone explain or point me to the information on how to queue
> > the LACP packets to a higher priority queue ?
> >
> > We are using the Chelsio T580-LP-CR adapter/cxgbe driver.  The
> > Cheslio NICs have 8 TX/RX queues each, but I don't know how to
> > explicitly put the LACP packets in the higher priority TX queue.
> >
> > I've read about PF/ALTQ and think this may be overkill our needs,
> > and was wondering if there was a simpler method.
> 
> This is cxgbe specific but that's what you're using so it'll do.
> 
> Add "hw.cxgbe.rsrv_noflowq=1" to your /boot/loader.conf.  That
> reserves one tx queue for non-RSS traffic (like ARP, LACP).  You might
> also want to increase the number of tx queues to compensate for the
> one that's now reserved.  Use "hw.cxgbe.ntxq=9" for that.  The ntxq
> knob might be different on 10.4 but the man page matching the driver
> should have its exact name.
> 
> Regards,
> Navdeep
> 


Re: How to Increase TX Queue Priority for LACP Packets

2020-06-16 Thread Navdeep Parhar
On Tue, Jun 16, 2020 at 07:56:19PM +, Foster, Greg wrote:
> HI Navdeep,
> 
> Thanks for the information!  I've integrated the changes and will be
> testing more today.
> 
> We have seen the LACP port flapping under different scenarios, most we
> believe are traffic/load based.
> 
> I did see the flapping unexpectedly when I just enabled LACP debug
> (e.g., sysctl net.link.lagg.lacp.debug=1). Is this a known
> problem?

No, I don't think so.  The debug output goes to the console so it is
relatively slow but it shouldn't cause any flaps directly.

Regards,
Navdeep

> 
> Thanks
> Greg
> 
> -Original Message-
> From: Navdeep Parhar  On Behalf Of Navdeep Parhar
> Sent: Friday, June 12, 2020 7:51 PM
> To: Foster, Greg 
> Cc: freebsd-net@freebsd.org
> Subject: Re: How to Increase TX Queue Priority for LACP Packets
> 
> On Fri, Jun 12, 2020 at 11:47:41PM +, Foster, Greg wrote:
> > FreeBSD Networkers,
> >
> > We are seeing LACP port flapping on our FreeBSD 10.4/12.1 systems
> > under different conditions.
> >
> > Can someone explain or point me to the information on how to queue
> > the LACP packets to a higher priority queue ?
> >
> > We are using the Chelsio T580-LP-CR adapter/cxgbe driver.  The
> > Cheslio NICs have 8 TX/RX queues each, but I don't know how to
> > explicitly put the LACP packets in the higher priority TX queue.
> >
> > I've read about PF/ALTQ and think this may be overkill our needs,
> > and was wondering if there was a simpler method.
> 
> This is cxgbe specific but that's what you're using so it'll do.
> 
> Add "hw.cxgbe.rsrv_noflowq=1" to your /boot/loader.conf.  That
> reserves one tx queue for non-RSS traffic (like ARP, LACP).  You might
> also want to increase the number of tx queues to compensate for the
> one that's now reserved.  Use "hw.cxgbe.ntxq=9" for that.  The ntxq
> knob might be different on 10.4 but the man page matching the driver
> should have its exact name.
> 
> Regards,
> Navdeep
> 


Re: How to Increase TX Queue Priority for LACP Packets

2020-06-12 Thread Navdeep Parhar
On Fri, Jun 12, 2020 at 11:47:41PM +, Foster, Greg wrote:
> FreeBSD Networkers,
> 
> We are seeing LACP port flapping on our FreeBSD 10.4/12.1 systems under
> different conditions.
> 
> Can someone explain or point me to the information on how to queue the
> LACP packets to a higher priority queue ?
> 
> We are using the Chelsio T580-LP-CR adapter/cxgbe driver.  The Cheslio
> NICs have 8 TX/RX queues each, but I don't know how to explicitly put
> the LACP packets in the higher priority TX queue.
>
> I've read about PF/ALTQ and think this may be overkill our needs, and
> was wondering if there was a simpler method.

This is cxgbe specific but that's what you're using so it'll do.

Add "hw.cxgbe.rsrv_noflowq=1" to your /boot/loader.conf.  That reserves
one tx queue for non-RSS traffic (like ARP, LACP).  You might also want
to increase the number of tx queues to compensate for the one that's now
reserved.  Use "hw.cxgbe.ntxq=9" for that.  The ntxq knob might be
different on 10.4 but the man page matching the driver should have its
exact name.
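
Putting both together, that would look something like this in
/boot/loader.conf:

hw.cxgbe.rsrv_noflowq="1"
hw.cxgbe.ntxq="9"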

Regards,
Navdeep


Re: Chelsio NETMAP performance

2020-02-05 Thread Navdeep Parhar
On Wed, Feb 05, 2020 at 02:38:32PM +0300, Slawa Olhovchenkov wrote:
> On Tue, Feb 04, 2020 at 12:37:08PM -0800, Navdeep Parhar wrote:
> 
> > >> nm_holdoff_tmr_idx is a 0-based index into the list above.  So if the
> > >> tmr idx is 0 you are using the 0th (first) value from the list of
> > >> timers.  Try increasing nm_holdoff_tmr_idx and see if that brings down
> > >> the interrupt rate under control.
> > >>
> > >> # sysctl hw.cxgbe.nm_holdoff_tmr_idx=3/4/5
> > > 
> > > OK, the interrupt rate goes down, but the interrupt time is about
> > > the same.  (The interrupt rate for the Intel card is about 0,
> > > compared to 25% for Chelsio.)
> > 
> > I think iflib runs a lot of stuff in taskqueues rather than the driver
> > ithread so the CPU accounting may vary.  Use dtrace to see if
> 
> Don't think this is impact: worker's CPU core w/o any syscalls and
> only w/ bunding workker thread and NIC irq handler show about 100%
> user CPU time.
> 
> May be some cache-miss work performed later, at poll(2) time in case
> of intel driver compared to chelsio (do at interrupt time)?

Could be.  While we are here, is it possible for you to try the patches
in these two?

https://reviews.freebsd.org/D17868
https://reviews.freebsd.org/D17869

> 
> > netmap_rx_irq is being called by an ithread or a taskqueue to figure out
> > what driver does what.
> 
> Can you explain some more?
> I am not sure which dtrace probe to use or how to evaluate the result.

# dtrace -n 'fbt::netmap_rx_irq:entry {stack()}'

Take a look at the stack and see if it's an ithread or one of iflib's
taskqueues that called netmap_rx_irq.

Regards,
Navdeep


Re: Chelsio NETMAP performance

2020-02-04 Thread Navdeep Parhar
On 2/4/20 8:20 AM, Slawa Olhovchenkov wrote:
> On Mon, Feb 03, 2020 at 02:39:03PM -0800, Navdeep Parhar wrote:
> 
>> On 2/3/20 2:23 PM, Slawa Olhovchenkov wrote:
>>> On Mon, Feb 03, 2020 at 01:39:52PM -0800, Navdeep Parhar wrote:
>>>
>>>> On 2/3/20 12:17 PM, Slawa Olhovchenkov wrote:
>>>>> I am trying to use a Chelsio T540-CR in netmap mode and see poor
>>>>> performance (compared to an Intel 82599ES).
>>>>
>>>> What approximate FreeBSD version is this?
>>>
>>> 12.1-STABLE
>>>
>>>>>
>>>>> Same application can receive only about 8.9Mpps, compared to 12.5Mpps
>>>>> at Intel.
>>>>>
>>>>> pmc profile show mostly time spend in:
>>>>>
>>>>> 49.76%  [17802]service_nm_rxq @ /boot/kernel/if_cxgbe.ko
>>>>>  100.0%  [17802] t4_vi_intr
>>>>>   100.0%  [17802]  ithread_loop @ /boot/kernel/kernel
>>>>>100.0%  [17802]   fork_exit
>>>>>
>>>>>
>>>>> to be exact at line
>>>>>
>>>>> while ((d->rsp.u.type_gen & F_RSPD_GEN) == nm_rxq->iq_gen) {
>>>>>
>>>>> Is this maximum limit for this vendor?
>>>>
>>>> No, a T540 should be able to sink full 10Gbps (14.88Mpps) on a single rx
>>>> queue.  Try adding this to your loader.conf:
>>>>
>>>> hw.cxgbe.toecaps_allowed="0"
>>>>
>>>> Then try simple netmap "pkt-gen -f rx" instead of any custom app and see
>>>> how many pps it's able to sink.
>>>
>>> Thanks! `hw.cxgbe.toecaps_allowed="0"` allow recive 14Mpps for may
>>> application too!
>>>
>>> Now I get only 10% less performance compared to Intel, as I can see
>>> by the higher Chelsio interrupt CPU time (top shows about 30% for
>>> every interrupt handler). Is this normal? Is it possible to optimize?
>>
>> Try changing the interrupt holdoff timer for the netmap rx queues.
>>
>> This shows the list of timers available (in microseconds):
>> # sysctl dev.t5nex.0.holdoff_timers
>>
>> nm_holdoff_tmr_idx is a 0-based index into the list above.  So if the
>> tmr idx is 0 you are using the 0th (first) value from the list of
>> timers.  Try increasing nm_holdoff_tmr_idx and see if that brings down
>> the interrupt rate under control.
>>
>> # sysctl hw.cxgbe.nm_holdoff_tmr_idx=3/4/5
> 
> OK, the interrupt rate goes down, but the interrupt time is about the
> same.  (The interrupt rate for the Intel card is about 0, compared to
> 25% for Chelsio.)

I think iflib runs a lot of stuff in taskqueues rather than the driver
ithread so the CPU accounting may vary.  Use dtrace to see if
netmap_rx_irq is being called by an ithread or a taskqueue to figure out
what driver does what.

Are you also transmitting a lot out of this node or is it mostly Rx?
There's no need to worry about Tx updates (and the interrupts they might
generate) if this is an Rx-mostly workload.

> Most time is spent in service_nm_rxq(), in the while() check.
> Is it possible to do some prefetch?
> A trivial `__builtin_prefetch(64+(char*)d);` in the body of the loop
> doesn't change anything.
>
> Is it possible to do a batch prefetch before the loop?

prefetches are not possible here.  That while condition is waiting for
the ownership bit of the rx descriptor to flip, indicating there is
work for the driver to do.

Regards,
Navdeep




Re: Chelsio NETMAP performance

2020-02-03 Thread Navdeep Parhar
On 2/3/20 2:23 PM, Slawa Olhovchenkov wrote:
> On Mon, Feb 03, 2020 at 01:39:52PM -0800, Navdeep Parhar wrote:
> 
>> On 2/3/20 12:17 PM, Slawa Olhovchenkov wrote:
>>> I am trying to use a Chelsio T540-CR in netmap mode and see poor
>>> performance (compared to an Intel 82599ES).
>>
>> What approximate FreeBSD version is this?
> 
> 12.1-STABLE
> 
>>>
>>> Same application can receive only about 8.9Mpps, compared to 12.5Mpps
>>> at Intel.
>>>
>>> pmc profile show mostly time spend in:
>>>
>>> 49.76%  [17802]service_nm_rxq @ /boot/kernel/if_cxgbe.ko
>>>  100.0%  [17802] t4_vi_intr
>>>   100.0%  [17802]  ithread_loop @ /boot/kernel/kernel
>>>100.0%  [17802]   fork_exit
>>>
>>>
>>> to be exact at line
>>>
>>> while ((d->rsp.u.type_gen & F_RSPD_GEN) == nm_rxq->iq_gen) {
>>>
>>> Is this maximum limit for this vendor?
>>
>> No, a T540 should be able to sink full 10Gbps (14.88Mpps) on a single rx
>> queue.  Try adding this to your loader.conf:
>>
>> hw.cxgbe.toecaps_allowed="0"
>>
>> Then try simple netmap "pkt-gen -f rx" instead of any custom app and see
>> how many pps it's able to sink.
> 
> Thanks! `hw.cxgbe.toecaps_allowed="0"` allows receiving 14Mpps for my
> application too!
>
> Now I get only 10% less performance compared to Intel, as I can see by
> the higher Chelsio interrupt CPU time (top shows about 30% for every
> interrupt handler). Is this normal? Is it possible to optimize?

Try changing the interrupt holdoff timer for the netmap rx queues.

This shows the list of timers available (in microseconds):
# sysctl dev.t5nex.0.holdoff_timers

nm_holdoff_tmr_idx is a 0-based index into the list above.  So if the
tmr idx is 0 you are using the 0th (first) value from the list of
timers.  Try increasing nm_holdoff_tmr_idx and see if that brings down
the interrupt rate under control.

# sysctl hw.cxgbe.nm_holdoff_tmr_idx=3/4/5

Regards,
Navdeep



Re: Chelsio NETMAP performance

2020-02-03 Thread Navdeep Parhar
On 2/3/20 12:17 PM, Slawa Olhovchenkov wrote:
> I am trying to use a Chelsio T540-CR in netmap mode and see poor
> performance (compared to an Intel 82599ES).

What approximate FreeBSD version is this?

> 
> Same application can receive only about 8.9Mpps, compared to 12.5Mpps
> at Intel.
> 
> pmc profile show mostly time spend in:
> 
> 49.76%  [17802]service_nm_rxq @ /boot/kernel/if_cxgbe.ko
>  100.0%  [17802] t4_vi_intr
>   100.0%  [17802]  ithread_loop @ /boot/kernel/kernel
>100.0%  [17802]   fork_exit
> 
> 
> to be exact at line
> 
> while ((d->rsp.u.type_gen & F_RSPD_GEN) == nm_rxq->iq_gen) {
> 
> Is this maximum limit for this vendor?

No, a T540 should be able to sink full 10Gbps (14.88Mpps) on a single rx
queue.  Try adding this to your loader.conf:

hw.cxgbe.toecaps_allowed="0"

Then try simple netmap "pkt-gen -f rx" instead of any custom app and see
how many pps it's able to sink.
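
For example (substitute your port name):

# pkt-gen -f rx -i netmap:cxl0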

Regards,
Navdeep


Re: unexplained latency, interrupt spikes and loss of throughput on FreeBSD router/firewall system

2020-01-15 Thread Navdeep Parhar
On 1/15/20 6:55 AM, John Jasen wrote:
> Executive summary:
> 
> Periodically, load will spike on network interrupts on one of our
> firewalls. Latency will quickly climb to the point that things are
> unresponsive, sessions will timeout, and bandwidth will plummet.

Is this with 9000 MTU?  Can you please post "netstat -m" from this
system?  Assuming this is 9000 MTU, try setting this in
/boot/loader.conf and reboot:

hw.cxgbe.largest_rx_cluster=4096

> We do not see increases in ethernet pause frames, drops, errors, or
> anything else like that from the system.

This part is strange.  The incoming frames are either being dropped
(errors or overflows) or getting throttled via pause frames.  I'd have
expected "netstat -dI " to show errors or drops or "sysctl dev.cc
dev.cxl | grep pause" to show some activity.  Can you please double check?
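
Something like this, sampled before and after an episode, should make
any movement obvious (interface name is only an example):

# netstat -dI cxl0
# sysctl dev.cc dev.cxl | grep -i pause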

Regards,
Navdeep

> 
> Usually, the quickest fix is to failover to the backup firewall. At that
> time, the backup firewall behaves normally and interrupt load drops on the
> afflicted firewall device.
> 
> I'm stumped. Networking says it's these systems. I believe it's
> something on the other side.
> 
> Any ideas?
> 
> Background information:
> FreeBSD 11.3-RELEASE-p3
> hw.machine: amd64
> hw.model: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
> hw.ncpu: 24
> hw.machine_arch: amd64
> Firewall: pf
> failover: CARP
> network cards: seen with Chelsio T5-580 and T6 series cards.
> other networking information: VLANs are in use. Occasional LAGG usage as
> well.
> 
>   When this occurs, some of the interrupts dedicated to cxgbe queues spike
> to 100%.  Latency climbs to the point that TCP timeouts start kicking in,
> and users start complaining. Bandwidth drops from 2-3Gbs to ~100-200Mbs
> 
> netstat shows no increase of error or drop packets. sysctl shows no
> increase in pause frames.
> 
> I'm happy to provide further information.


Re: logs/traces

2019-10-15 Thread Navdeep Parhar
Have you looked at siftr(4) or dtrace_tcp(4)?
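
Minimal starting points for both, as a sketch: once enabled, siftr logs
per-connection TCP state to /var/log/siftr.log by default, and the
dtrace one-liner just counts events from the tcp provider.

# kldload siftr && sysctl net.inet.siftr.enabled=1
# dtrace -n 'tcp:::send,tcp:::receive { @[probename] = count(); }'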

Regards,
Navdeep

On Tue, Oct 15, 2019 at 07:15:27PM -0700, vm finance wrote:
> Hi Kevin,
> 
> I am looking to enable traces/log messages (like syslog or
> /var/log/messages) inside the codebase... any pointers for tcp/ip?
> tcpdump shows what is going on the wire - but I would like to trace
> code internals... printk..
> 
> Thanks a lot!
> 
> On Tue, Oct 15, 2019 at 6:11 PM Kevin Oberman  wrote:
> 
> > Use tcpdump(1) and/or net/wireshark(5). See man tcpdump and pcap-filter
> > for usage details. wireshark can analyze files collected by tcpdump and
> > dissect the packets. It can also do packet capture, itself.
> > --
> > Kevin Oberman, Part time kid herder and retired Network Engineer
> > E-mail: rkober...@gmail.com
> > PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
> >
> >
> > On Tue, Oct 15, 2019 at 3:17 AM vm finance  wrote:
> >
> >> Hi,
> >>
> >> Could someone please guide me on how to turn on tracing/log?
> >>
> >> I would like to follow/track how packets go in/out of TCP code block...
> >> Please let me know what knobs are available to achieve this.
> >>
> >> Thanks for any pointers.


Re: Chelsio TOE not working in CURRENT

2019-03-29 Thread Navdeep Parhar
Fixed in r345701.


On 2019-03-28 22:59, Dustin Marquess wrote:
> On a brand new CURRENT Chelsio (cxl) TOE doesn't work, as loading the
> dependent t4_tom module fails:
> 
> If VIMAGE is enabled:
> 
> link_elf_obj: symbol vnet_entry_tcp_autorcvbuf_inc undefined
> linker_load_file: /boot/kernel/t4_tom.ko - unsupported file type
> 
> If VIMAGE isn't enabled:
> 
> link_elf_obj: symbol tcp_autorcvbuf_inc undefined
> linker_load_file: /boot/kernel/t4_tom.ko - unsupported file type
> 
> Any tips?
> 
> Thanks!
> -Dustin


Re: netmap on cxgb (Chelsio T3) — panic on transmit

2018-11-26 Thread Navdeep Parhar
On 11/22/18 7:30 AM, Lev Serebryakov wrote:
> 
>  I've obtained a Chelsio T3 for my "network lab". It works well with
> the cxgb driver, but when I try to use netmap's pkt-gen on it, it
> crashes the system immediately with this message:
> 
> panic: trying to coalesce 9 packets in to one WR
> 
> I've turned all checksums, lro and tso off, but it doesn't help.
> 
> Do I have any chance of getting netmap supported (maybe not very
> efficiently) on this NIC?
> 

The T3 is a very old chip that has been EoL'd for some time and it's not
likely to get native netmap support.

Your panic must be while using netmap's emulation mode on top of cxgb.
Try modifying check_pkt_coalesce() in the driver to always return 0 and
see if that avoids the panic.  Don't expect much performance-wise even
if that works.

Regards,
Navdeep


Re: cxl nic not working after reboot

2018-08-30 Thread Navdeep Parhar
On 8/30/18 3:14 PM, Navdeep Parhar wrote:
> On 8/30/18 2:51 PM, Marius Halden wrote:
>> On Thu, Aug 30, 2018, at 19:27, Navdeep Parhar wrote:
>>> On 8/30/18 4:21 AM, Marius Halden wrote:
>>>> I tried to downgrade to the previous bsdrp version we were running based
>>>> on 11.1-RELEASE-p10, but it did not start working again. ifconfig output
>>>> and t5nex0 devlog follow (non-working state), but it all looks fine to me
>>>> (media has always been reported wrong). I did notice though that
>>>> "hw_mac_init_port[0], …" will always be logged when it works but not
>>>> when it doesn't work. I confirmed this on a box still running the old
>>>> version which functions as intended.
>>>>
>>>> The ones not working has a newer firmware version than the ones working.
>>>>
>>>> Non-working:
>>>> # sysctl dev.t5nex.0.firmware_version
>>>> dev.t5nex.0.firmware_version: 1.19.1.0
>>>>
>>>>
>>>> Working:
>>>> # sysctl dev.t5nex.0.firmware_version
>>>> dev.t5nex.0.firmware_version: 1.16.45.0
>>>>
>>>> Any other ideas?
>>>>
>>>
>>> I'm still looking into it.  The current working theory is that the peer
>>> is trying to autonegotiate when it shouldn't (it's 1Gbps optics).  If
>>> you know how to disable AN on the peer this should be easy to test.
>>
>> Thanks. Unfortunately we don't have access to the peer as that's our
>> ISP's router. According to the sysctl, autonegotiation is not supported
>> on our side.
>>
> 
> I'll get a couple of 1G optics in a day or so and then I'll be able to
> try some things for myself.  Wait for that or try a 1G TWINAX (copper)
> SFP+ cable in case you have one -- that should be able to link up (if

Sorry, this won't work either.  At 1G, AN is done only for -BT (RJ45)
copper cables, not for SFP+ TWINAX copper.

Regards,
Navdeep



Re: cxl nic not working after reboot

2018-08-30 Thread Navdeep Parhar
On 8/30/18 2:51 PM, Marius Halden wrote:
> On Thu, Aug 30, 2018, at 19:27, Navdeep Parhar wrote:
>> On 8/30/18 4:21 AM, Marius Halden wrote:
>>> I tried to downgrade to the previous bsdrp version we were running based
>>> on 11.1-RELEASE-p10, but it did not start working again. ifconfig output
>>> and t5nex0 devlog follow (non-working state), but it all looks fine to me
>>> (media has always been reported wrong). I did notice though that
>>> "hw_mac_init_port[0], …" will always be logged when it works but not
>>> when it doesn't work. I confirmed this on a box still running the old
>>> version which functions as intended.
>>>
>>> The ones not working has a newer firmware version than the ones working.
>>>
>>> Non-working:
>>> # sysctl dev.t5nex.0.firmware_version
>>> dev.t5nex.0.firmware_version: 1.19.1.0
>>>
>>>
>>> Working:
>>> # sysctl dev.t5nex.0.firmware_version
>>> dev.t5nex.0.firmware_version: 1.16.45.0
>>>
>>> Any other ideas?
>>>
>>
>> I'm still looking into it.  The current working theory is that the peer
>> is trying to autonegotiate when it shouldn't (it's 1Gbps optics).  If
>> you know how to disable AN on the peer this should be easy to test.
> 
> Thanks. Unfortunately we don't have access to the peer as that's our
> ISP's router. According to the sysctl, autonegotiation is not supported
> on our side.
> 

I'll get a couple of 1G optics in a day or so and then I'll be able to
try some things for myself.  Wait for that or try a 1G TWINAX (copper)
SFP+ cable in case you have one -- that should be able to link up (if
it's really autonegotiation that's causing problems).

Regards,
Navdeep



Re: cxl nic not working after reboot

2018-08-30 Thread Navdeep Parhar
On 8/30/18 4:21 AM, Marius Halden wrote:
> On Wed, Aug 29, 2018, at 12:22, Marius Halden wrote:
>> On Wed, Aug 29, 2018, at 08:28, Navdeep Parhar wrote:
>>>>>> Provide the output of these commands when the link isn't up:
>>>>>> # ifconfig -mvvv 
>>>>>> # sysctl -n dev.t5nex.0.misc.devlog
>>>
>>> Can you provide the output from when the port is working?
>>
>> Sure, the output from when it works follows.
> 
> I tried to downgrade to the previous bsdrp version we were running based
> on 11.1-RELEASE-p10, but it did not start working again. ifconfig output
> and t5nex0 devlog follow (non-working state), but it all looks fine to me
> (media has always been reported wrong). I did notice though that
> "hw_mac_init_port[0], …" will always be logged when it works but not
> when it doesn't work. I confirmed this on a box still running the old
> version which functions as intended.
> 
> The ones not working has a newer firmware version than the ones working.
> 
> Non-working:
> # sysctl dev.t5nex.0.firmware_version
> dev.t5nex.0.firmware_version: 1.19.1.0
> 
> 
> Working:
> # sysctl dev.t5nex.0.firmware_version
> dev.t5nex.0.firmware_version: 1.16.45.0
> 
> Any other ideas?
> 

I'm still looking into it.  The current working theory is that the peer
is trying to autonegotiate when it shouldn't (it's 1Gbps optics).  If
you know how to disable AN on the peer this should be easy to test.

Regards,
Navdeep



Re: cxl nic not working after reboot

2018-08-28 Thread Navdeep Parhar
On 8/28/18 12:30 PM, Marius Halden wrote:
> On Tue, Aug 28, 2018, at 20:32, Navdeep Parhar wrote:
>> On 8/28/18 11:27 AM, Marius Halden wrote:
>>> On Tue, Aug 28, 2018, at 20:06, Navdeep Parhar wrote:
>>>> On 8/28/18 2:35 AM, Marius Halden wrote:
>>>>> ...
>>>>> media: Ethernet 1000baseSX 
>>>>> status: active
>>>>> supported media:
>>>>> media 1000baseSX mediaopt full-duplex,rxpause,txpause
>>>>> media 1000baseSX mediaopt full-duplex,rxpause
>>>>> media 1000baseSX mediaopt full-duplex,txpause
>>>>> media 1000baseSX mediaopt full-duplex
>>>>> plugged: SFP/SFP+/SFP28 1000BASE-LX (LC)
>>>>
>>>> This shows that the SFP+ is recognized properly and the link is up
>>>> (active).  Can you please provide the outputs of ifconfig and sysctl
>>>> from when the system is in the problem state, when the link doesn't come 
>>>> up?
>>>
>>> This is from when it's in the problem state. According to ifconfig there is 
>>> link and everything looks fine to me, but it doesn't seem to pass any 
>>> traffic through the interface at all.
>>>
>>
>> Try passing some network traffic through the interface and look at
>> before/after state of the MAC counters in sysctl dev.cxl.0.stats.  Do
>> you see the tx_frames/rx_frames counter move at all?
> 
> tx_frames does move, rx_frames is stuck at zero. The following counters
> are non-zero and do increase when traffic is sent through the
> interface, all others are stuck at zero:
> 
> dev.cxl.0.stats.tx_frames_65_127: 26083
> dev.cxl.0.stats.tx_frames_64: 4084
> dev.cxl.0.stats.tx_mcast_frames: 26083
> dev.cxl.0.stats.tx_bcast_frames: 4084
> dev.cxl.0.stats.tx_frames: 30167
> dev.cxl.0.stats.tx_octets: 2608846
> 
> Anything else I should look at?
> 

What is on the other side of the link?  Look at the peer's rx stats and
see if it received the frames that cxl0 claims it has transmitted or not.

Regards,
Navdeep


Re: cxl nic not working after reboot

2018-08-28 Thread Navdeep Parhar
On 8/28/18 11:27 AM, Marius Halden wrote:
> On Tue, Aug 28, 2018, at 20:06, Navdeep Parhar wrote:
>> On 8/28/18 2:35 AM, Marius Halden wrote:
>>> ...
>>> media: Ethernet 1000baseSX 
>>> status: active
>>> supported media:
>>> media 1000baseSX mediaopt full-duplex,rxpause,txpause
>>> media 1000baseSX mediaopt full-duplex,rxpause
>>> media 1000baseSX mediaopt full-duplex,txpause
>>> media 1000baseSX mediaopt full-duplex
>>> plugged: SFP/SFP+/SFP28 1000BASE-LX (LC)
>>
>> This shows that the SFP+ is recognized properly and the link is up
>> (active).  Can you please provide the outputs of ifconfig and sysctl
>> from when the system is in the problem state, when the link doesn't come up?
> 
> This is from when it's in the problem state. According to ifconfig there is 
> link and everything looks fine to me, but it doesn't seem to pass any traffic 
> through the interface at all.
> 

Try passing some network traffic through the interface and look at
before/after state of the MAC counters in sysctl dev.cxl.0.stats.  Do
you see the tx_frames/rx_frames counter move at all?
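
For example:

# sysctl dev.cxl.0.stats | grep -E '(tx|rx)_frames:'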

Regards,
Navdeep


Re: cxl nic not working after reboot

2018-08-28 Thread Navdeep Parhar
On 8/28/18 2:35 AM, Marius Halden wrote:
> ...
> media: Ethernet 1000baseSX 
> status: active
> supported media:
> media 1000baseSX mediaopt full-duplex,rxpause,txpause
> media 1000baseSX mediaopt full-duplex,rxpause
> media 1000baseSX mediaopt full-duplex,txpause
> media 1000baseSX mediaopt full-duplex
> plugged: SFP/SFP+/SFP28 1000BASE-LX (LC)

This shows that the SFP+ is recognized properly and the link is up
(active).  Can you please provide the outputs of ifconfig and sysctl
from when the system is in the problem state, when the link doesn't come up?

Regards,
Navdeep


Re: cxl nic not working after reboot

2018-08-27 Thread Navdeep Parhar
On Mon, Aug 27, 2018 at 04:19:29PM +0200, Marius Halden wrote:
> Hi,
> 
> We have some routers with Chelsio T540-CR NICs using 1Gbps SFPs (1000
> Base-LX IIRC) to connect to our ISPs. After upgrading them to FreeBSD
> 11.2-p2 (BSDRP v1.91) we have run into some issues. When the machines
> are rebooted networking does not function at all through the ports
> connected with SFPs, ports connected with DAC cables work properly.
> According to ifconfig the SFPs are detected properly and they both

What exact ifconfig command do you use to bring the interfaces up?

Provide the output of these commands when the link isn't up:
# ifconfig -mvvv 
# sysctl -n dev.t5nex.0.misc.devlog

> send and receive light as they should. If we remove the SFPs from the
> port and then reinsert them in the same port everything will start
> working as it should.
> 
> Has anyone else observed this problem? Any suggestions for what can be
> done to solve/work around this issue without actually going to the
> datacenter would be great.

Try ifconfig down/up instead of physically removing and then reinserting
the SFP module.  Does that work?
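
i.e., something like (for whichever port has the SFP):

# ifconfig cxl0 down && ifconfig cxl0 up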

Regards,
Navdeep


Re: netmap & chelsio

2018-07-11 Thread Navdeep Parhar
On 07/11/18 07:58, Eggert, Lars wrote:
> Hi,
> 
> I have netmap working with the T6 cards now.
> 
> However, performance is very poor. It seems to take several milliseconds 
> after a NIOCTXSYNC ioctl before the tail is updated?

Try changing lazy_tx_credit_flush to 0 on the running kernel with a
debugger, or compile the driver with it set to 0 -- it's in t4_netmap.c:

int lazy_tx_credit_flush = 1;

I'm surprised I don't have a tunable/sysctl for it.  I'll add one really
soon.
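
For the debugger route, a sketch (assumes a kernel with symbols and a
kgdb that allows writing to live memory):

# kgdb -w /boot/kernel/kernel /dev/mem
(kgdb) set lazy_tx_credit_flush = 0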


Regards,
Navdeep

> 
> In case it matters, here is what is in loader.conf:
> 
> hw.cxgbe.num_vis=2
> hw.cxgbe.fl_pktshift=0
> hw.cxgbe.ntxq=1
> hw.cxgbe.nrxq=1
> hw.cxgbe.qsize_txq=512
> hw.cxgbe.qsize_rxq=512
> hw.cxgbe.cong_drop=1
> hw.cxgbe.pause_settings=1
> hw.cxgbe.autoneg=0
> hw.cxgbe.nm_rx_nframes=1
> hw.cxgbe.nm_rx_ndesc=1
> 
> Lars
> 



Re: fix for some netmap drivers

2018-02-20 Thread Navdeep Parhar
Done in r329675.


On Mon, 2018-02-19 at 16:41 +0100, Vincenzo Maffione wrote:
> Hello,
>   Can anyone please apply the attached patch? It follows up the
> removal of the nkr_slot_flags in the upstream netmap.
> The change fixes compilation issues and has no effect on
> functionality.
> 
> Thanks,
>   Vincenzo
> 


Re: [freebsd-current]Who should reset M_PKTHDR flag in m_buf when IP packets are fragmented. m_unshare panic throw when IPSec is enabled

2017-12-27 Thread Navdeep Parhar

On 12/27/2017 12:59, Andrey V. Elsukov wrote:

On 27.12.2017 23:09, Navdeep Parhar wrote:

It is not clear to me why it helps. The panic happens on outbound path,
where mbuf should be allocated by network stack and should be writeable.
ip_reass() usually used on inbound path. I think the patch just hides
the problem in another place.
Do you mean that cxgbe can produce !WRITEABLE mbuf for received packet
and then pass it to the network stack?


Yes, cxgbe does that.  But I think the real bug here is in ip_reass
because it doesn't properly get rid of the pkthdr of the fragments while
creating the reassembled datagram.  cxgbe happens to trip on this easily
because it often creates !WRITEABLE mbufs.


 From a quick look, I don't see code in netipsec or in crypto that
checks whether an mbuf is WRITEABLE. It is expected that in most cases,
for a received mbuf, the data will be decrypted and copied back into the
given buffer. Can this lead to memory corruption?


This should fix it:
https://people.freebsd.org/~np/ip_reass_demotehdr.diff

It will also fix leaks in configurations where mbuf tags are in use by
default (for example with MAC), ip_reass is involved during rx, and the
mbuf chain never gets m_demote'd elsewhere (meaning ip_reass should have
freed the tags itself).
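
Conceptually, the fix leaves exactly one M_PKTHDR mbuf in the
reassembled chain.  A minimal sketch of the idea (not the linked diff
itself; here m is the head of the reassembled chain):

	/*
	 * Strip pkthdr state (the M_PKTHDR flag and any attached mbuf
	 * tags) from every mbuf after the head, leaving a chain with a
	 * single valid pkthdr.
	 */
	if (m->m_next != NULL)
		m_demote(m->m_next, 1, 0);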


I think such a chain, with several mbufs carrying the M_PKTHDR flag, is
created by m_cat() due to !WRITEABLE mbufs. And when the mbuf chain is
freed, the tags chain will also be destroyed by the mbuf zone destructor.


I see m_freem/m_free will do the right thing but such a chain isn't 
legal.  m_unshare is complaining about it here.  m_sanity on the chain 
will fail too.


m_cat says it will leave the pkthdr alone so it is working as 
advertised.  It's the caller's job to clean up headers etc. to keep the 
mbuf chain valid.




If you think it solves the problem, the IPv6 fragment reassembly
probably needs the same code. But I think that the M_WRITEABLE flag not
being handled properly is a problem too.



I think M_WRITEABLE is being handled properly here.  m_unshare deals 
with the chain just fine apart from this assert about multiple M_PKTHDR.


I'll fix IPv6 reassembly too and post to Phabricator if the change looks OK.

Regards,
Navdeep



Re: [freebsd-current]Who should reset M_PKTHDR flag in m_buf when IP packets are fragmented. m_unshare panic throw when IPSec is enabled

2017-12-27 Thread Navdeep Parhar

On 12/26/2017 03:33, Andrey V. Elsukov wrote:

On 26.12.2017 13:22, Harsh Jain wrote:

panic: m_unshare: m0 0xf80020f82600, m 0xf8005d054100 has M_PKTHDR
cpuid = 15
time = 1495578455
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfe044e9bb890
kdb_backtrace() at kdb_backtrace+0x53/frame 0xfe044e9bb960
vpanic() at vpanic+0x269/frame 0xfe044e9bba30
kassert_panic() at kassert_panic+0xc7/frame 0xfe044e9bbac0
m_unshare() at m_unshare+0x578/frame 0xfe044e9bbbc0
esp_output() at esp_output+0x44c/frame 0xfe044e9bbe40
ipsec4_perform_request() at ipsec4_perform_request+0x5df/frame 
0xfe044e9bbff0

Hi,

it seems unusual that IP reassembly happens on the outbound path.

It can be reproduced with a single ping packet on a Chelsio (cxgbe) NIC. I tried
with an Intel NIC; it seems they produce M_WRITEABLE() buffers (which follow a
different path in m_unshare), which is not true for cxgbe.


In my view, IP fragmentation should occur in ip_output after IPsec
encryption. Something like:

1. rip_output() has mbuf chain where only first mbuf has M_PKTHDR flag
2. ip_output() -> IPSEC_OUTPUT() -> esp_output() -> m_unshare(). We
should still have only one mbuf with M_PKTHDR flag here.
3. esp_output_cb() -> ipsec_process_done() -> ip_output()
4. Now IP fragmentation should occur: ip_fragment() creates chain of
mbufs to send, where M_PKTHDR flag will be set for each fragment.


Do you have some packet normalization using a firewall?

Default FreeBSD -CURRENT installation. No explicit firewall.
Do you think the above patch makes sense?


It is not clear to me why it helps. The panic happens on outbound path,
where mbuf should be allocated by network stack and should be writeable.
ip_reass() usually used on inbound path. I think the patch just hides
the problem in another place.
Do you mean that cxgbe can produce !WRITEABLE mbuf for received packet
and then pass it to the network stack?



Yes, cxgbe does that.  But I think the real bug here is in ip_reass 
because it doesn't properly get rid of the pkthdr of the fragments while 
creating the reassembled datagram.  cxgbe happens to trip on this easily 
because it often creates !WRITEABLE mbufs.


This should fix it:
https://people.freebsd.org/~np/ip_reass_demotehdr.diff

It will also fix leaks in configurations where mbuf tags are in use by 
default (for example with MAC), ip_reass is involved during rx, and the 
mbuf chain never gets m_demote'd elsewhere (meaning ip_reass should have 
freed the tags itself).


Regards,
Navdeep


Re: packets per second routing tests: mismatch between switch (~4M pps) and netstat (~10M pps) scores

2017-09-15 Thread Navdeep Parhar
The tx errors/drops are almost always due to software queue overflow.
There isn't much that can go wrong once the frame has successfully
been submitted to the hw for tx.

Do you have a lot of PAUSE coming in during the tests where you see
just 31Kpps tx?  You can monitor the incoming PAUSE frames with this:
# sysctl dev.cc | grep rx_pause

# sysctl dev.cc.<port>.stats has a lot of stats straight from the
hardware.  The hardware's tx_ stats should be clean even though
netstat reports a lot of errors/drops as long as the drops are in the
software.
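
A quick before/after check during a test run (the greps find the exact
OID names on your system):

# sysctl dev.cc | grep rx_pause      (note the values)
# sleep 10                           (while the test traffic is flowing)
# sysctl dev.cc | grep rx_pause      (a rising count means the peers are pausing you)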

Regards,
Navdeep

On Fri, Sep 15, 2017 at 8:18 AM, Caraballo-Vega, Jordan A.
(GSFC-6062)[COMPUTER SCIENCE CORP] 
wrote:
> Hi all,
>
> We are currently running udp 1500 bytes pps tests through a Dell PE R530
> and two Chelsio T62100-LP-CR NICs acting as a router. At the switch we
> are seeing ~4M pps, while with "netstat -w1 -d -h" we are seeing ~10M
> pps or ~31k in the same test. Any idea of why scores are so different?
> Could it be a misconfiguration of my network interfaces? Where should I
> look?
>
> Scores from the switch are as follow:
>
> 2017-09-15 10:57:57: duration=360, nodes=24, streams=85ea/2040agg,
> frame_size=1500, pps=1499977, Mbps=17999: Zone-1(in:57%, out:0%)
> Zone-2(in:0%, out:14%)
>
> Scores from netstat are as follow:
>
> input(Total)   output
>    packets  errs idrops  bytes    packets  errs  bytes colls drops
>        17M     0      0    23G        10M     0    14G     0     0
>        17M     0      0    23G        10M     0    14G     0     0
>        17M     0      0    23G        10M     0    14G     0     0
>        17M     0      0    23G        10M     0    14G     0     0
>        17M     0      0    24G        31k  7.8M   1.9M     0  7.8M
>        17M     0      0    24G        31k  7.8M   1.9M     0  7.8M
>        17M     0      0    24G        31k  7.8M   1.9M     0  7.8M
>        17M     0      0    24G        31k  7.8M   1.9M     0  7.8M
>
> Interfaces are bonded as follow (/etc/rc.conf):
>
> ifconfig_cc0="-tso -lro mtu 9000 up"
> ifconfig_cc1="-tso -lro mtu 9000 up"
> ifconfig_cc2="-tso -lro mtu 9000 up"
> ifconfig_cc3="-tso -lro mtu 9000 up"
> cloned_interfaces="lagg0 lagg1 vlan0 vlan1"
> ifconfig_lagg0="laggproto lacp laggport cc0 laggport cc1"
> ifconfig_lagg1="laggproto lacp laggport cc2 laggport cc3"
> ifconfig_vlan0="inet 172.16.2.1/24 vlan 20 vlandev lagg0"
> ifconfig_vlan1="inet 172.16.1.1/24 vlan 10 vlandev lagg1"
>
> Thanks in advance,
>
> - Jordan
>
>


Re: Sporadic TCP/RST sent to client

2017-06-26 Thread Navdeep Parhar
On Thu, Jun 22, 2017 at 3:57 PM, Youssef GHORBAL wrote:
> Hello,
>
> I'm having an issue with a FreeBSD 11 based system, which sporadically
> sends a TCP/RST to clients after the initial TCP session is correctly initiated.
> The sequence goes this way :
>
> 1 Client -> Server : SYN
> 2 Server -> Client : SYN/ACK
> 3 Client -> Server : ACK
> 4 Client -> Server : PSH/ACK (upper protocol data sending starts here)
> 5 Server -> Client : RST
>
> - The problem happens sporadically; the same client and same server can
> communicate smoothly on the same service port. But from time to time (hours,
> sometimes days) the previous sequence happens.
> - The service running on server is not responsible for the RST sent. 
> The service was deeply profiled and nothing happens to justify the RST.
> - tcpdump on the server side assures that packet arrives timely 
> ordered.
> - the traffic is very light. Some TCP sessions per day.
> - the server is connected using a lagg enslaving two cxgb interfaces.
>
> In my effort to diagnose the problem (trying to get a reproducible
> test case) I noticed that the issue is most likely triggered when these two
> conditions are met:
> - the ACK (in step 3) and the PSH/ACK (in step 4) arrive on different 
> lagg NICs.
> - the timing between those two packets is sub 10 microseconds.
>
> When searching the interwebs I came across a strangely similar issue 
> reported here 7 years ago :
> 
> https://lists.freebsd.org/pipermail/freebsd-net/2010-August/026029.html
>
> (The OP seemed to have resolved his issue by changing the netisr policy
> from direct to hybrid, but with no reference to laggs being used.)
>
> I'm pretty sure that I'm hitting some race condition: a scenario
> where, due to multithreading, the PSH/ACK is somehow handled before the ACK,
> making the kernel raise a TCP/RST since the initial TCP handshake didn't
> finish yet.
>
> I've read about the netisr work and I was under the impression that even
> if it's SMP enabled, it was designed to keep protocol ordering.
>
> What's the expected behaviour in this scenario on the netisr side ?
> How can I push the investigation further ?

I think you've already figured out the situation here -- the PSH/ACK is likely
being handled before the ACK for the SYN because they arrived on different
interfaces.  There is nothing in netisr dispatch that will maintain protocol
ordering in this case.
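
If you want to experiment along the lines of the older thread you
found, the dispatch policy is a runtime sysctl (a sketch; whether it
helps the lagg case is exactly what you'd be testing):

# sysctl net.isr.dispatch=hybrid     (or deferred; direct is the default)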

Regards,
Navdeep


Re: state of packet forwarding in FreeBSD?

2017-06-15 Thread Navdeep Parhar

On 06/14/2017 10:42, Olivier Cochard-Labbé wrote:
On Wed, Jun 14, 2017 at 7:36 PM, Navdeep Parhar wrote:



I think I fixed this a long time back.  Have you tried recently?  We
moved the netmap functionality to the vcxl interfaces and it should
just work.

It still panics with a -head build from today.



Fixed in r319986.


Re: state of packet forwarding in FreeBSD?

2017-06-14 Thread Navdeep Parhar
Do you mean it works with one pkt-gen but panics when the second one is started?

On Wed, Jun 14, 2017 at 10:42 AM, Olivier Cochard-Labbé
 wrote:
> On Wed, Jun 14, 2017 at 7:36 PM, Navdeep Parhar  wrote:
>>
>>
>> I think I fixed this a long time back.  Have you tried recently?  We
>> moved the netmap functionality to the vcxl interfaces and it should
>> just work.
>>
>
> It still panics with a -head build from today.
>

Re: state of packet forwarding in FreeBSD?

2017-06-14 Thread Navdeep Parhar
On Wed, Jun 14, 2017 at 8:21 AM, Olivier Cochard-Labbé
 wrote:
> On Wed, Jun 14, 2017 at 4:48 PM, John Jasen  wrote:
>
>>
>> b) On the negative side, between the various releases, netmap appeared
>> to be unstable with the Chelsio cards -- sometimes supported, sometimes
>> broken. Also, we're still trying to figure out netmap utilities, such as
>> vale-ctl and bridge, so any advice would be appreciated.
>>
>
> I confirm that mixing netmap and Chelsio has been broken on -current for
> about 6 months.
> We can't start 2 netmap's pkt-gen simultaneously as example.
>
> cf my report:
> https://lists.freebsd.org/pipermail/svn-src-head/2016-December/094418.html

I think I fixed this a long time back.  Have you tried recently?  We
moved the netmap functionality to the vcxl interfaces and it should
just work.

Regards,
Navdeep

Re: tools/netmap/pkt-gen: clang error

2017-05-30 Thread Navdeep Parhar
You can suppress the warning with something like this:
# cd /usr/src/tools/tools/netmap
# WARNS=2 make

Regards,
Navdeep


On Tue, May 30, 2017 at 10:54 AM, Harry Schmalzbauer wrote:
>  Hello,
>
> after merging netmap code from head I can't compile pkt-gen from
> usr/src/tools/tools/netmap:
> cc  -O2 -pipe -Werror -Wall -Wextra -march=ivybridge  -g -std=gnu99
> -fstack-protector-strong -Qunused-arguments -Qunused-arguments
> -std=gnu99 -fstack-protector-strong -Qunused-arguments
> -Qunused-arguments  -c pkt-gen.c -o pkt-gen.o
> pkt-gen.c:650:19: error: taking address of packed member 'ip' of class
> or structure 'pkt' may result in an unaligned pointer value
>   [-Werror,-Waddress-of-packed-member]
> struct ip *ip = &pkt->ip;
>  ^~~
> pkt-gen.c:651:24: error: taking address of packed member 'udp' of class
> or structure 'pkt' may result in an unaligned pointer value
>   [-Werror,-Waddress-of-packed-member]
> struct udphdr *udp = &pkt->udp;
>   ^~~~
> pkt-gen.c:745:8: error: taking address of packed member 'ip' of class or
> structure 'pkt' may result in an unaligned pointer value
>   [-Werror,-Waddress-of-packed-member]
> ip = &pkt->ip;
>   ^~~
> pkt-gen.c:762:9: error: taking address of packed member 'udp' of class
> or structure 'pkt' may result in an unaligned pointer value
>   [-Werror,-Waddress-of-packed-member]
> udp = &pkt->udp;
>
> Quick hints highly appreciated.
>
> Thanks,
>
> -harry


Re: cxgbe netmap promiscuous mode?

2017-04-24 Thread Navdeep Parhar

On 04/19/2017 08:31, Joe Jones wrote:

Hi Navdeep,

I already got rid of the hw.cxgbe.num_vis line in loader.conf when I 
rebooted this morning.


dev.t5nex.0.firmware_version: 1.15.37.0


I tried this exact firmware and was able to reproduce the problem.  This 
appears to be a firmware bug that has already been fixed in the 1.16.x 
firmware available in 10-STABLE.
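
If upgrading the OS isn't convenient, the firmware can also be flashed
in place with cxgbetool (a sketch; the image path is a placeholder, and
the new firmware takes effect on the next reboot or driver reload):

# cxgbetool t5nex0 loadfw /path/to/t5fw-1.16.x.bin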


Regards,
Navdeep




On 19/04/17 15:37, Navdeep Parhar wrote:

What is the firmware version?

# sysctl dev.t5nex.0.firmware_version

I'll try to repeat the experiment with a T520-SO with the firmware that
you have on your card.  Does the card behave this way if the extra VIs
are not created?  Can you please try without hw.cxgbe.num_vis in
loader.conf?

Regards,
Navdeep

On Wed, Apr 19, 2017 at 10:29:06AM +0100, Joe Jones wrote:

uname -a
FreeBSD goose2 11.0-RELEASE-p9 FreeBSD 11.0-RELEASE-p9 #0: Tue Apr 11
08:48:40 UTC 2017
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

The card is a 'T520-SO Unified Wire Ethernet Controller'

I ran the following with dtrace running in a separate window

ifconfig cxl1 promisc up ( only see broadcast)
ifconfig cxl1 -promisc
ifconfig cxl1 promisc (now I see traffic)

dtrace output was

[root@goose2 /usr/home/joe]# dtrace -n 'fbt::t4_set_rxmode:entry
{trace(arg4)}'
dtrace: description 'fbt::t4_set_rxmode:entry ' matched 1 probe
CPU     ID             FUNCTION:NAME
   4  61078  t4_set_rxmode:entry 1
   7  61078  t4_set_rxmode:entry 0
   5  61078  t4_set_rxmode:entry     1


On 19/04/17 01:18, Navdeep Parhar wrote:

On Mon, Apr 17, 2017 at 11:00:38AM +0100, Joe Jones wrote:

Hi Navdeep

running "ifconfig up" and then "ifconfig promisc" works. Running 
"ifconfig
promisc" and then "ifconfig up" does not work. Running "ifconfig up 
promisc"

together does work. Running "ifconfig promisc up" does not work.
What version of FreeBSD is this?  I couldn't reproduce this on head with
a T6 card.  Can you please run this in parallel with your ifconfig
commands, note what dtrace logs in response to what command(s), and send
the output to me?

# dtrace -n 'fbt::t4_set_rxmode:entry {trace(arg4)}'

The combination that does not work leaves the interface in a state where it
reports itself as being in promiscuous mode.
In my experiments the interface did function in promiscuous mode whether
I did "ifconfig cc0 promisc up" or "ifconfig cc0 up promisc".

Regards,
Navdeep







Re: cxgbe netmap promiscuous mode?

2017-04-18 Thread Navdeep Parhar
On Mon, Apr 17, 2017 at 11:00:38AM +0100, Joe Jones wrote:
> Hi Navdeep
> 
> running "ifconfig up" and then "ifconfig promisc" works. Running "ifconfig
> promisc" and then "ifconfig up" does not work. Running "ifconfig up promisc"
> together does work. Running "ifconfig promisc up" does not work.

What version of FreeBSD is this?  I couldn't reproduce this on head with
a T6 card.  Can you please run this in parallel with your ifconfig
commands, note what dtrace logs in response to what command(s), and send
the output to me?

# dtrace -n 'fbt::t4_set_rxmode:entry {trace(arg4)}'

> 
> The combination that does not work leaves the interface in a state where it
> reports itself as being in promiscuous mode.

In my experiments the interface did function in promiscuous mode whether
I did "ifconfig cc0 promisc up" or "ifconfig cc0 up promisc".

Regards,
Navdeep


Re: cxgbe netmap promiscuous mode?

2017-04-14 Thread Navdeep Parhar
On Fri, Apr 14, 2017 at 04:10:59PM +0100, Joe Jones wrote:
> Hi Navdeep,
> 
> I think I have found a driver bug. Earlier today I set up the switch I'm
> using so that two of the ports mirror the traffic on one of the other ports.
> We are planning on using a similar setup to allow packet tracing without
> stressing the boxes our application is running on any more then they are
> already.
> 
> I connected both ports to one of our cxgbe cards, My intention was to use
> tcpdump to check that my switch config was doing what I thought it should. I
> ran
> 
> ifconfig cxl? promisc -vlanhwtag up

Does the problem occur only if you use this form of ifconfig?  Can you
please try "ifconfig up" and "ifconfig cxl? promisc" separately and see
what happens?

Regards,
Navdeep

> 
> on both interfaces, this is what the interfaces looked like
> 
> cxl0: flags=28943
> metric 0 mtu 1500
> options=ec07ab
> ether 00:07:43:33:8a:20
> nd6 options=29
> media: Ethernet 10Gbase-Twinax 
> status: active
> vcxl0: flags=8802 metric 0 mtu 1500
> options=ec07bb
> ether 00:07:43:33:8a:22
> nd6 options=29
> media: Ethernet 10Gbase-Twinax 
> status: active
> cxl1: flags=28943
> metric 0 mtu 1500
> options=ec07ab
> ether 00:07:43:33:8a:28
> nd6 options=29
> media: Ethernet 10Gbase-Twinax 
> status: active
> vcxl1: flags=8802 metric 0 mtu 1500
> options=ec07bb
> ether 00:07:43:33:8a:2a
> nd6 options=29
> media: Ethernet 10Gbase-Twinax 
> status: active
> 
> The interesting thing is, a tcpdump on cxl0 showed all the traffic I
> expected to see, while tcpdump on cxl1 showed only broadcast traffic. After
> playing with the switch config to make sure the difference was not on the
> switch I pulled both patch cables out and into another server with the same
> card. On the second server I could use tcpdump and see all the traffic I
> expected on either interface.
> 
> Then back on the original server, I reloaded the device driver and tried
> again. Now I got only broadcast on cxl0 and cxl1. Then finally I got all the
> traffic to show up by doing
> 
> ifconfig cxl1 -promisc
> ifconfig cxl1 promisc
> 
> It would appear to me that the card can get into a state where ifconfig
> reports that it is in promiscuous mode when it is not.
> 
> 
> Joe Jones


Re: TSO and packets accounting

2017-03-25 Thread Navdeep Parhar
On Sun, Mar 26, 2017 at 01:39:30AM +0300, Slawa Olhovchenkov wrote:
> How to acoount output packets w/ TSO?
> I mean as one large packet. What I see:
> 
> # netstat -nbI lagg0 1
> input  lagg0   output
>    packets  errs idrops      bytes    packets  errs      bytes colls
>1702715 0 0  185606274492 0 9401968581 0
>1623416 0 0  1163063536035291 0 9045680036 0
>1670956 0 0  1199116786107868 0 9153586152 0
>1682365 0 0  1205181126157620 0 9228163875 0
>1575295 0 0  1127861995831604 0 8736670135 0
>1596283 0 0  1144040285910990 0 8852555094 0
>1651946 0 0  1184494786080815 0 9109251501 0
>1661730 0 0  1190015126152532 0 9219357915 0
>1638212 0 0  1175028026114157 0 9160154253 0
>1644270 0 0  1178239306116968 0 9164984649 0
> 
> 9401968581/6274492 = 1498.44299442887169192342
> 
> Is TSO not working?
> Or is this adapter accounting?

The interfaces are cc(4), so this is adapter accounting.  The numbers
you see are coming from hardware MAC statistics that track "on-the-wire"
frames.

If you want to know if TSO is occurring look for the driver stats for
the number of TSO work requests it has sent to the chip:
# sysctl dev.cc | grep tso_wrs

Regards,
Navdeep

> 
> cc0: flags=8843 metric 0 mtu 1500
> 
> options=ec07bb
> ether 00:07:43:04:b3:20
> nd6 options=9
> media: Ethernet 40Gbase-SR4 
> status: active
> cc1: flags=8843 metric 0 mtu 1500
> 
> options=ec07bb
> ether 00:07:43:04:b3:20
> nd6 options=9
> media: Ethernet 40Gbase-SR4 
> status: active
> lagg0: flags=8843 metric 0 mtu 1500
> 
> options=ec07bb
> ether 00:07:43:04:b3:20
> nd6 options=9
> media: Ethernet autoselect
> status: active
> groups: lagg 
> laggproto lacp lagghash l2,l3,l4
> laggport: cc0 flags=1c
> laggport: cc1 flags=1c


Re: bad throughput performance on multiple systems: Re: Fwd: Re: Disappointing packets-per-second performance results on a Dell PE R530

2017-03-24 Thread Navdeep Parhar
On 03/24/2017 16:53, Caraballo-vega, Jordan A. (GSFC-6062)[COMPUTER 
SCIENCE CORP] wrote:

It looks like netmap is there; however, is there a way of figuring out
if netmap is being used?


If you're not running netmap-fwd or some other netmap application, it's 
not being used.  You have just 1 txq/rxq and that would explain the 
difference between cxl and vcxl.


> cxl0: 16 txq, 8 rxq (NIC)
> vcxl0: 1 txq, 1 rxq (NIC); 2 txq, 2 rxq (netmap)


...
And yes, we are using UDP 64 bytes tests.


That's strange then.  The "input packets" counter counts every single 
frame that the chip saw on the wire that matched any of its MAC 
addresses, including frames that the chip drops.  There's no way to 
explain why vcxl sees ~640K pps incoming vs. 2.8M pps for cxl.  That 
number shouldn't depend on your router configuration at all -- it's 
entirely dependent on the traffic generators.  Are you sure you aren't 
getting PAUSE frames out of the chip?  There's nothing else that could 
slow down UDP senders.


# sysctl -a | grep tx_pause

Regards,
Navdeep



On 3/24/17 7:39 PM, Navdeep Parhar wrote:

On 03/24/2017 16:07, Caraballo-vega, Jordan A. (GSFC-6062)[COMPUTER
SCIENCE CORP] wrote:

At the time of implementing the vcxl* interfaces we get very bad
results.


You're probably not using netmap with the vcxl interfaces, and the
number of "normal" tx and rx queues is just 2 for these interfaces.

Even if you _are_ using netmap, the hw.cxgbe.nnmtxq10g/rxq10g tunables
don't work anymore.  Use these to control the number of queues for
netmap:
hw.cxgbe.nnmtxq_vi
hw.cxgbe.nnmrxq_vi

You should see a line like this in dmesg for all cxl/vcxl interfaces
and that tells you exactly how many queues the driver configured:
cxl0: 4 txq, 4 rxq (NIC); 4 txq, 2 rxq (TOE)



packets  errs idrops  bytes    packets  errs  bytes colls drops
   629k  4.5k      0    66M       629k     0    66M     0     0
   701k  5.0k      0    74M       701k     0    74M     0     0
   668k  4.8k      0    70M       668k     0    70M     0     0
   667k  4.8k      0    70M       667k     0    70M     0     0
   645k  4.5k      0    68M       645k     0    68M     0     0
   686k  4.9k      0    72M       686k     0    72M     0     0

And by using just the cxl* interfaces we were getting about

  input(Total)   output
packets  errs idrops  bytes    packets  errs  bytes colls drops
   2.8M     0   1.2M   294M       1.6M     0   171M     0     0
   2.8M     0   1.2M   294M       1.6M     0   171M     0     0
   2.8M     0   1.2M   294M       1.6M     0   171M     0     0
   2.8M     0   1.2M   295M       1.6M     0   172M     0     0
   2.8M     0   1.2M   295M       1.6M     0   171M     0     0

These are our configurations for now. Any advice or suggestion will be
appreciated.


What I don't understand is that you have PAUSE disabled and congestion
drops enabled but still the number of packets coming in (whether they
are dropped eventually or not is irrelevant here) is very low in your
experiments.  It's almost as if the senders are backing off in the
face of packet loss.  Are you using TCP or UDP?  Always use UDP for
pps testing -- the senders need to be relentless.

Regards,
Navdeep



/etc/rc.conf configurations

ifconfig_cxl0="up"
ifconfig_cxl1="up"
ifconfig_vcxl0="inet 172.16.2.1/24 -tso -lro mtu 9000"
ifconfig_vcxl1="inet 172.16.1.1/24 -tso -lro mtu 9000"
gateway_enable="YES"

/boot/loader.conf configurations

# Chelsio Modules
t4fw_cfg_load="YES"
t5fw_cfg_load="YES"
if_cxgbe_load="YES"

# rx and tx size
dev.cxl.0.qsize_txq=8192
dev.cxl.0.qsize_rxq=8192
dev.cxl.1.qsize_txq=8192
dev.cxl.1.qsize_rxq=8192

# drop toecaps to increase queues
dev.t5nex.0.toecaps=0
dev.t5nex.0.rdmacaps=0
dev.t5nex.0.iscsicaps=0
dev.t5nex.0.fcoecaps=0

# Controls the hardware response to congestion.  -1 disables
# congestion feedback and is not recommended.  0 instructs the
# hardware to backpressure its pipeline on congestion.  This
# usually results in the port emitting PAUSE frames.  1 instructs
# the hardware to drop frames destined for congested queues. From cxgbe
dev.t5nex.0.cong_drop=1

# Saw these recommendations in Vincenzo's email thread
hw.cxgbe.num_vis=2
hw.cxgbe.fl_pktshift=0
hw.cxgbe.toecaps_allowed=0
hw.cxgbe.nnmtxq10g=8
hw.cxgbe.nnmrxq10g=8

/etc/sysctl.conf configurations

# Turning off pauses
dev.cxl.0.pause_settings=0
dev.cxl.1.pause_settings=0
# John Jasen suggestion - March 24, 2017
net.isr.bindthreads=0
net.isr.maxthreads=24


On 3/18/17 1:28 AM, Navdeep Parhar wrote:

On Fri, Mar 17, 2017 at 11:43:32PM -0400, John Jasen wrote:

On 03/17/2017 03:32 PM, Navdeep Parhar wrote:


On Fri, Mar 17, 2017 at 12:21 PM, John Jasen 
wrote:

Yes.
We were hopeful, initially, 

Re: bad throughput performance on multiple systems: Re: Fwd: Re: Disappointing packets-per-second performance results on a Dell PE R530

2017-03-24 Thread Navdeep Parhar
On 03/24/2017 16:07, Caraballo-vega, Jordan A. (GSFC-6062)[COMPUTER 
SCIENCE CORP] wrote:

At the time of implementing the vcxl* interfaces we get very bad results.


You're probably not using netmap with the vcxl interfaces, and the 
number of "normal" tx and rx queues is just 2 for these interfaces.


Even if you _are_ using netmap, the hw.cxgbe.nnmtxq10g/rxq10g tunables 
don't work anymore.  Use these to control the number of queues for netmap:

hw.cxgbe.nnmtxq_vi
hw.cxgbe.nnmrxq_vi

You should see a line like this in dmesg for all cxl/vcxl interfaces and 
that tells you exactly how many queues the driver configured:

cxl0: 4 txq, 4 rxq (NIC); 4 txq, 2 rxq (TOE)



packets  errs idrops  bytes    packets  errs  bytes colls drops
   629k  4.5k      0    66M       629k     0    66M     0     0
   701k  5.0k      0    74M       701k     0    74M     0     0
   668k  4.8k      0    70M       668k     0    70M     0     0
   667k  4.8k      0    70M       667k     0    70M     0     0
   645k  4.5k      0    68M       645k     0    68M     0     0
   686k  4.9k      0    72M       686k     0    72M     0     0

And by using just the cxl* interfaces we were getting about

 input(Total)   output
packets  errs idrops  bytes    packets  errs  bytes colls drops
   2.8M 0  1.2M   294M   1.6M 0   171M 0 0
   2.8M 0  1.2M   294M   1.6M 0   171M 0 0
   2.8M 0  1.2M   294M   1.6M 0   171M 0 0
   2.8M 0  1.2M   295M   1.6M 0   172M 0 0
   2.8M 0  1.2M   295M   1.6M 0   171M 0 0

These are our configurations for now. Any advice or suggestion will be
appreciated.


What I don't understand is that you have PAUSE disabled and congestion 
drops enabled but still the number of packets coming in (whether they 
are dropped eventually or not is irrelevant here) is very low in your 
experiments.  It's almost as if the senders are backing off in the face 
of packet loss.  Are you using TCP or UDP?  Always use UDP for pps 
testing -- the senders need to be relentless.
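
For example, iperf3 makes a convenient relentless UDP sender (a sketch;
it assumes iperf3 is installed on the generators -- netsend or netmap's
pkt-gen work just as well):

# iperf3 -c <address behind the router> -u -b 0 -l 64 -t 60 -P 8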


Regards,
Navdeep



/etc/rc.conf configurations

ifconfig_cxl0="up"
ifconfig_cxl1="up"
ifconfig_vcxl0="inet 172.16.2.1/24 -tso -lro mtu 9000"
ifconfig_vcxl1="inet 172.16.1.1/24 -tso -lro mtu 9000"
gateway_enable="YES"

/boot/loader.conf configurations

# Chelsio Modules
t4fw_cfg_load="YES"
t5fw_cfg_load="YES"
if_cxgbe_load="YES"

# rx and tx size
dev.cxl.0.qsize_txq=8192
dev.cxl.0.qsize_rxq=8192
dev.cxl.1.qsize_txq=8192
dev.cxl.1.qsize_rxq=8192

# drop toecaps to increase queues
dev.t5nex.0.toecaps=0
dev.t5nex.0.rdmacaps=0
dev.t5nex.0.iscsicaps=0
dev.t5nex.0.fcoecaps=0

# Controls the hardware response to congestion.  -1 disables
# congestion feedback and is not recommended.  0 instructs the
# hardware to backpressure its pipeline on congestion.  This
# usually results in the port emitting PAUSE frames.  1 instructs
# the hardware to drop frames destined for congested queues. From cxgbe
dev.t5nex.0.cong_drop=1

# Saw these recommendations in Vincenzo's email thread
hw.cxgbe.num_vis=2
hw.cxgbe.fl_pktshift=0
hw.cxgbe.toecaps_allowed=0
hw.cxgbe.nnmtxq10g=8
hw.cxgbe.nnmrxq10g=8

/etc/sysctl.conf configurations

# Turning off pauses
dev.cxl.0.pause_settings=0
dev.cxl.1.pause_settings=0
# John Jasen suggestion - March 24, 2017
net.isr.bindthreads=0
net.isr.maxthreads=24


On 3/18/17 1:28 AM, Navdeep Parhar wrote:

On Fri, Mar 17, 2017 at 11:43:32PM -0400, John Jasen wrote:

On 03/17/2017 03:32 PM, Navdeep Parhar wrote:


On Fri, Mar 17, 2017 at 12:21 PM, John Jasen  wrote:

Yes.
We were hopeful, initially, to be able to achieve higher packet
forwarding rates through either netmap-fwd or due to enhancements based
off https://wiki.freebsd.org/ProjectsRoutingProposal

Have you tried netmap-fwd?  I'd be interested in how that did in your tests.

We have. On this particular box, (11-STABLE, netmap-fwd fresh from git)
it took about 1.7m pps in, dropped 500k, and passed about 800k.

I'm lead to believe that vcxl interfaces may yield better results?

Yes, those are the ones with native netmap support.  Any netmap based
application should use the vcxl interfaces.  If you used them on the
main cxl interfaces you were running netmap in emulated mode.

Regards,
Navdeep






Re: Committing a new 25G/40G/100G Ethernet Driver

2017-03-24 Thread Navdeep Parhar

On 03/23/2017 19:39, Somayajulu, David wrote:

Hi All,
I have a brand new Cavium 25G/40G/100G Ethernet Driver to commit to HEAD.
The patch generated using "svn diff" is about 22 MB. Per gnn's advice I have tried to
submit the patch via Phabricator at https://reviews.freebsd.org/differential/diff/create/ for
review. The file uploads fine but I get the following error: "413 Request Entity Too
Large".  I would appreciate it if someone could help me circumvent this problem. Also, would it be
OK if I break the patch into smaller portions and submit them to Phabricator?



Have you tried the command line tool?  "arc diff --create" from the 
workspace where you ran svn diff.


Regards,
Navdeep


Re: cxgbe netmap promiscuous mode?

2017-03-24 Thread Navdeep Parhar
On Fri, Mar 24, 2017 at 6:40 AM, Joe Jones  wrote:
> Hello Navdeep,
>
> ...
>
> We were using our own MACs; we can fix the problem by using the MAC from the
> vcxl interface. Should we not be able to capture all traffic on the
> interface regardless of what destination MAC it has?

You should, but you'll need to put vcxl in promiscuous mode for
that.  The command that you posted had cxl in promiscuous mode.  As
I said earlier they share the wire but operate as independent network
interfaces otherwise.
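
That is, mirroring your earlier command but on the vcxl interface
(sketch):

# ifconfig vcxl1 promisc -vlanhwtag up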

Regards,
Navdeep


Re: cxgbe netmap promiscuous mode?

2017-03-23 Thread Navdeep Parhar
Your netmap application should be using the 'vcxl' interface, not the
cxl interface.  Even though they share a physical port they have
different MAC addresses and are totally autonomous.  The peer should
use the vcxl interface's MAC if it wants to reach the netmap
application.

Do you have the panic message and stack?  I know of a couple of panics
that have been fixed in -STABLE -- one was one related to emulated
mode and the second one was an illegal lock acquisition.

Regards,
Navdeep

On Thu, Mar 23, 2017 at 6:00 AM, Joe Jones  wrote:
> Hello,
>
> We have a T520-SO and have made a new install of 11.0. To begin with, the box
> would panic every time we tried to switch the card into netmap mode. So we
> recompiled the kernel with netmap removed, then compiled the netmap kernel
> module from github, as this in our experience generally leads to a more
> stable netmap.
>
> we have
>
> uname -a
> FreeBSD goose2 11.0-RELEASE-p1 FreeBSD 11.0-RELEASE-p1 #0: Wed Mar 22
> 16:52:35 UTC 2017 joe@goose2:/usr/obj/usr/src/sys/GENERIC amd64
>
> and the following in /boot/loader.conf
>
> t4fw_cfg_load="YES"
> t5fw_cfg_load="YES"
> if_cxgbe_load="YES"
> hw.cxgbe.fl_pktshift=0
> hw.cxgbe.toecaps_allowed=0
> hw.cxgbe.nnmtxq10g=8
> hw.cxgbe.nnmrxq10g=8
> hw.cxgbe.num_vis=2
>
> Before I run our application I run
>
> ifconfig cxl1 promisc -vlanhwtag up
>
> Now our application can start without panicking the kernel. Here is
> where it gets interesting: our application is able to announce itself via
> ARP, I can see the ethernet switch learning which port it's on, and other
> hosts adding it to their ARP tables. When I try an ICMP ping it goes
> missing. After watching the TX packet graph for the connected port on the
> switch while starting and stopping a flood ping to the application, I'm sure
> the packets are getting sent to the card, however I don't see them in the
> netmap ring. If I kill our application, then use ifconfig to create and
> configure a vlan port I can confirm that the card is working and has
> connectivity.
>
> Here's what I think is happening. ARP requests are received because they are
> sent to the broadcast address. Our application then announces itself.
> However, traffic destined for the application is sent to a MAC address which
> is neither the broadcast nor the MAC programmed into the hardware, and is
> dropped. My understanding of promiscuous is that it informs the card that we
> want these otherwise-dropped packets. It looks to me like, when the card is in
> netmap mode, the promisc flag is being ignored.
>
> I have also tried using freebsd-update to update to p8. As with the p0
> kernel we get a panic when we switch the card into netmap mode.
>
> We did previously have these cards working in netmap mode. We were using a
> pre-11 snapshot of svn head, though.
>
> Many Thanks
>
> Joe Jones
> Stream Technologies
>


Re: bad throughput performance on multiple systems: Re: Fwd: Re: Disappointing packets-per-second performance results on a Dell PE R530

2017-03-17 Thread Navdeep Parhar
On Fri, Mar 17, 2017 at 11:43:32PM -0400, John Jasen wrote:
> On 03/17/2017 03:32 PM, Navdeep Parhar wrote:
> 
> > On Fri, Mar 17, 2017 at 12:21 PM, John Jasen  wrote:
> >> Yes.
> >> We were hopeful, initially, to be able to achieve higher packet
> >> forwarding rates through either netmap-fwd or due to enhancements based
> >> off https://wiki.freebsd.org/ProjectsRoutingProposal
> > Have you tried netmap-fwd?  I'd be interested in how that did in your tests.
> 
> We have. On this particular box, (11-STABLE, netmap-fwd fresh from git)
> it took about 1.7m pps in, dropped 500k, and passed about 800k.
>
> I'm lead to believe that vcxl interfaces may yield better results?

Yes, those are the ones with native netmap support.  Any netmap based
application should use the vcxl interfaces.  If you used them on the
main cxl interfaces you were running netmap in emulated mode.

Regards,
Navdeep


Re: bad throughput performance on multiple systems: Re: Fwd: Re: Disappointing packets-per-second performance results on a Dell PE R530

2017-03-17 Thread Navdeep Parhar
On Fri, Mar 17, 2017 at 12:21 PM, John Jasen  wrote:
> On 03/17/2017 06:08 AM, Slawa Olhovchenkov wrote:
>
>> On Thu, Mar 16, 2017 at 03:50:42PM -0400, John Jasen wrote:
>>
>>> As a few points of note, partial resolution, and curiosity:
>>>
>>> Following down leads that 11-STABLE had tryforward improvements over
>>> 11-RELENG, I upgraded. The same tests (24 client streams over UDP with
>>> small packets), the system went from passing 1.7m pps to about 2.5m.
>>>
>>> Following indications from Navdeep Parhar that UDP queue hashing is not as
>>> efficient as it could be, we started running the tests with various powers
>>> of 2 streams (2,4,8,16,32) -- and were able to push the system up to 5m pps.
>>>
>>> We are currently seeing in the tests approximately 10-11m pps on the
>>> outside interface, around 5-6m dropped, and 5 million passed.
>> You want more?
>
> Yes.
>
> We were hopeful, initially, to be able to achieve higher packet
> forwarding rates through either netmap-fwd or due to enhancements based
> off https://wiki.freebsd.org/ProjectsRoutingProposal

Have you tried netmap-fwd?  I'd be interested in how that did in your tests.

Sadly, projects/routing couldn't make it into 11.  I'm trying to find
out what's keeping it from getting merged into head.

Regards,
Navdeep


Re: bad throughput performance on multiple systems: Re: Fwd: Re: Disappointing packets-per-second performance results on a Dell PE R530

2017-03-13 Thread Navdeep Parhar
On Mon, Mar 13, 2017 at 10:13 AM, John Jasen  wrote:
> On 03/13/2017 01:03 PM, Navdeep Parhar wrote:
>
>> On Sun, Mar 12, 2017 at 5:35 PM, John Jasen  wrote:
>>> UDP traffic. dmesg reports 16 txq, 8 rxq -- which is the default for
>>> Chelsio.
>>>
>> I don't recall offhand, but UDP might be using 2-tuple hashing by
>> default and that might affect the distribution of flows across queues.
>> Are there senders generating IP fragments by any chance (that'll
>> depend on the "send size" that your UDP application is using)?
>
> No, they're not fragmenting.
>
>> Have you tried limiting the adapter's rx ithreads to the CPU that the
>> PCIe slot with the adapter is wired to?
>
> Above and beyond the use of cpuset, you mean?

I meant cpuset.

If possible, try your experiments on a single socket system.

Regards,
Navdeep


Re: bad throughput performance on multiple systems: Re: Fwd: Re: Disappointing packets-per-second performance results on a Dell PE R530

2017-03-13 Thread Navdeep Parhar
On Sun, Mar 12, 2017 at 5:35 PM, John Jasen  wrote:
>
> UDP traffic. dmesg reports 16 txq, 8 rxq -- which is the default for
> Chelsio.
>

I don't recall offhand, but UDP might be using 2-tuple hashing by
default and that might affect the distribution of flows across queues.
Are there senders generating IP fragments by any chance (that'll
depend on the "send size" that your UDP application is using)?

Have you tried limiting the adapter's rx ithreads to the CPU that the
PCIe slot with the adapter is wired to?
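
A sketch of one way to do that (the IRQ numbers here are hypothetical;
take the real ones from vmstat -i, and use the CPU list of the socket
the slot is wired to):

# vmstat -i | grep t5nex
# cpuset -l 0-7 -x 264
# cpuset -l 0-7 -x 265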

Regards,
Navdeep


Re: Chelsio netmap support ? (RELENG_11)

2017-03-09 Thread Navdeep Parhar
On Wed, Mar 8, 2017 at 6:28 AM, Mike Tancsa  wrote:
> On 3/7/2017 9:08 PM, Navdeep Parhar wrote:
>> On Tue, Mar 7, 2017 at 5:46 PM, Mike Tancsa  wrote:
>>
>>>
>>> # dmesg | grep netm
>>> netmap: loaded module
>>> vcxl0: netmap queues/slots: TX 2/1023, RX 2/1024
>>> vcxl0: 1 txq, 1 rxq (NIC); 1 txq, 1 rxq (TOE); 2 txq, 2 rxq (netmap)
>>> vcxl1: netmap queues/slots: TX 2/1023, RX 2/1024
>>> vcxl1: 1 txq, 1 rxq (NIC); 1 txq, 1 rxq (TOE); 2 txq, 2 rxq (netmap)
>>> igb0: netmap queues/slots: TX 4/1024, RX 4/1024
>>> igb1: netmap queues/slots: TX 4/1024, RX 4/1024
>>>
>>> It maxes out at about 800Kpps with and without netmap.  Is there a way
>>
>> Are you actually using a netmap based application that acts as a
>> packet router or is this just the vcxl interface running as a normal
>> ifnet?
>
> the latter, vcxl running as a normal ifnet. I thought there would be a benefit
> to utilizing netmap?  Sorry, this is not clear to me.

The kernel's routing code does not utilize netmap even if it's
available.  You'll need something like netmap-fwd for netmap based
routing.

If you're not using netmap there is no need to create the extra vcxl interfaces.

>
>>
>>> to increase the queues for the Chelsio nic, like the onboard igb ?
>>
>> If you're not running a netmap based router get rid of the num_vis=2
>> and simply try with the cxl0/cxl1 interfaces.  They should each have 4
>> rxq/4 txq on your system.  In case you want to increase the number of
>> queues, use this:
>
> The tests with the regular cxl also show the box topping out at 0.8Mpps
> for forwarding.

I would have expected multiple streams to do better.  There is a lot
of information about forwarding on the bsdrp.net website.  Have you
tried the tips there?  The numbers there are significantly better than
what you observe.  I suspect your router is CPU-bound.

https://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr

https://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4_with_10-gigabit_chelsio_t540-cr

There's a projects/routing branch that does much better than the stock
kernel.  I'm not sure what work remains to be done before it can be
merged into head.
https://github.com/ocochard/netbenches/blob/master/Xeon_E5-2650-8Cores-Chelsio_T540-CR/forwarding-pf-ipfw/results/fbsd11-routing.r287531/README.md

Regards,
Navdeep

>
>>
>> The "NIC" queues are the normal tx/rx queues, the "netmap" queues are
>> active when the interface is in netmap mode.
>>
>> Does netsend generate a single flow or multiple flows?  If it's a
>> single flow it will use a single queue only.
>
> I think its as a single flow. However, I was using a separate box to
> generate a second flow as well. It still topped out at about 800Kpps
> before dropping packets.
>
> ---Mike
>
>
>
> --
> ---
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada   http://www.tancsa.com/


Re: Chelsio netmap support ? (RELENG_11)

2017-03-07 Thread Navdeep Parhar
On Tue, Mar 7, 2017 at 5:46 PM, Mike Tancsa  wrote:

>
> # dmesg | grep netm
> netmap: loaded module
> vcxl0: netmap queues/slots: TX 2/1023, RX 2/1024
> vcxl0: 1 txq, 1 rxq (NIC); 1 txq, 1 rxq (TOE); 2 txq, 2 rxq (netmap)
> vcxl1: netmap queues/slots: TX 2/1023, RX 2/1024
> vcxl1: 1 txq, 1 rxq (NIC); 1 txq, 1 rxq (TOE); 2 txq, 2 rxq (netmap)
> igb0: netmap queues/slots: TX 4/1024, RX 4/1024
> igb1: netmap queues/slots: TX 4/1024, RX 4/1024
>
> It maxes out at about 800Kpps with and without netmap.  Is there a way

Are you actually using a netmap based application that acts as a
packet router or is this just the vcxl interface running as a normal
ifnet?

> to increase the queues for the Chelsio nic, like the onboard igb ?

If you're not running a netmap based router get rid of the num_vis=2
and simply try with the cxl0/cxl1 interfaces.  They should each have 4
rxq/4 txq on your system.  In case you want to increase the number of
queues, use this:

# cxl0/cxl1 etc.
hw.cxgbe.nrxq10g=8
hw.cxgbe.ntxq10g=8

# vcxl0/vcxl1's "normal" queues
hw.cxgbe.nrxq_vi=8
hw.cxgbe.ntxq_vi=8

# vcxl0/vcxl1's netmap queues
hw.cxgbe.nnmrxq_vi=8
hw.cxgbe.nnmtxq_vi=8

Check in dmesg after you reboot with your new settings
> vcxl0: 1 txq, 1 rxq (NIC); 1 txq, 1 rxq (TOE); 2 txq, 2 rxq (netmap)

The "NIC" queues are the normal tx/rx queues, the "netmap" queues are
active when the interface is in netmap mode.

Does netsend generate a single flow or multiple flows?  If it's a
single flow it will use a single queue only.

Regards,
Navdeep


Re: Chelsio netmap support ? (RELENG_11)

2017-03-07 Thread Navdeep Parhar
On Tue, Mar 7, 2017 at 1:53 PM, Mike Tancsa  wrote:
...
>
> Using netsend, I can't seem to blast through a single flow of packets
> greater than 800Kpps without packet loss.  Can you point me to any
> performance tweaks for forwarding / routing ?
>
> I have 3 boxes, with one in the middle
>
> (netsend box) <---> (router with chelsio nics) <---> (netreceive box)

How is the router configured -- is this something netmap based?
Please provide more details of the configuration.  What kind of
CPU/chipset is it?

Do you see any PAUSE frames out of the T5? "sysctl dev.cxl dev.vcxl |
grep _pause"  Is any CPU core pegged at 100% during the test?

Regards,
Navdeep

> # ./netsend 10.151.10.3 500 100 800000 20
> Sending packet of payload size 100 every 0.00000125 s for 20 seconds
> calling time every 100 cycles
>
> start:             1488952041.0
> finish:            1488952061.00837
> send calls:        16000000
> send errors:       0
> approx send rate:  800000 pps
> time/packet:       1250 ns
> approx error rate: 0
> waited:            333086341
> approx waits/sec:  16654317
> approx wait rate:  20
>
>


Re: Chelsio netmap support ? (RELENG_11)

2017-03-07 Thread Navdeep Parhar
On Tue, Mar 7, 2017 at 12:21 PM, Navdeep Parhar  wrote:
> Is it possible for you to run 10-STABLE?

I meant 11-STABLE of course.


Re: Chelsio netmap support ? (RELENG_11)

2017-03-07 Thread Navdeep Parhar
Is it possible for you to run 10-STABLE?  The netmap support in cxgbe
was enhanced and uses the general purpose 'vcxl' interfaces now.  You
can enable them with hw.cxgbe.num_vis=2 in loader.conf.

If -STABLE is not an option then check if you built your kernel with
device NETMAP.  config -x /boot/kernel/kernel | grep -i netmap.  cxgbe
has compile time checks for netmap so loadable netmap module may not
work.

Regards,
Navdeep

On Tue, Mar 7, 2017 at 11:36 AM, Mike Tancsa  wrote:
> I have a Chelsio T520 NIC that I have been trying to get netmap working
> with. I see reference to the ncxl# vs cxl# interface on the lists, but I
> never see ncxl come up.  The man pages dont say anything about having to
> specifically enable anything that I can see.  I am using RELENG11 as of
> today (r314848).
>
> ---Mike
>
> t5nex0@pci0:2:0:4:  class=0x02 card=0x1425 chip=0x54071425
> rev=0x00 hdr=0x00
> vendor = 'Chelsio Communications Inc'
> device = 'T520-SO Unified Wire Ethernet Controller'
> class  = network
> subclass   = ethernet
> bar   [10] = type Memory, range 64, base 0xdd30, size 524288,
> enabled
> bar   [18] = type Memory, range 64, base 0xdc00, size 16777216,
> enabled
> bar   [20] = type Memory, range 64, base 0xdd884000, size 8192, enabled
> cap 01[40] = powerspec 3  supports D0 D3  current D0
> cap 05[50] = MSI supports 32 messages, 64 bit, vector masks
> cap 10[70] = PCI-Express 2 endpoint max data 256(2048) FLR RO NS
>  link x8(x8) speed 8.0(8.0)
> cap 11[b0] = MSI-X supports 256 messages, enabled
>  Table in map 0x20[0x0], PBA in map 0x20[0x1000]
> cap 03[d0] = VPD
> ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
> ecap 0003[170] = Serial 1 
> ecap 000e[190] = ARI 1
> ecap 0019[1a0] = PCIe Sec 1 lane errors 0
> ecap 0010[1c0] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI
> disabled
>  0 VFs configured out of 0 supported
>  First VF RID Offset 0x0008, VF RID Stride 0x0004
>  VF Device ID 0x5807
>  Page Sizes: 4096 (enabled), 8192, 65536, 262144,
> 1048576, 4194304
> ecap 0017[200] = TPH Requester 1
>
> netmap seems to activate on the onboard igb NICs.
> # dmesg | grep -i netma
> netmap: loaded module
> igb0: netmap queues/slots: TX 4/1024, RX 4/1024
> igb1: netmap queues/slots: TX 4/1024, RX 4/1024
>
> # ifconfig cxl1
> cxl1: flags=8843 metric 0 mtu 1500
>
> options=ec07bb
> ether 00:07:43:39:9d:b8
> inet 10.151.10.1 netmask 0xff00 broadcast 10.151.10.255
> nd6 options=29
> media: Ethernet 10Gbase-Twinax 
> status: active
> plugged: SFP/SFP+/SFP28 Unknown (Copper pigtail)
> vendor: CISCO-TYCO PN: 1-2053783-2 SN: TED1605BGWC DATE: 2016-02-01
>
> ---Mike
>
> --
> ---
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada   http://www.tancsa.com/


Re: RSS, cxgbe, netmap and non-TCP traffic

2017-03-06 Thread Navdeep Parhar
On Mon, Mar 06, 2017 at 11:10:21AM +0100, Miłosz Kaniewski wrote:
> Hello,
> 
> I am trying to split my traffic into two flows:
> 1. TCP traffic
> 2. Other traffic (non-TCP traffic)
> 
> On my NIC (Chelsio T540-CR) I have configured 9 queues (these are
> netmap queues configured with the hw.cxgbe.nnmtx/rx sysctls). Now I would
> like to direct TCP traffic to queues 0-7 and non-TCP traffic to queue
> number 8. To achieve this I have configured two filters using cxgbetool:
> 
> cxgbetool t5nex0 filter 0 proto 6 hitcnts 1
> cxgbetool t5nex0 filter 1 hitcnts 1 queue 8 tcbhash 1

hitcnts 1 is the default, tcbhash is unnecessary, and queue#  should be
the cntxt_id of the netmap rxq that you want to steer traffic to.  Use
something like this to get the cntxt_id:
# sysctl dev.cxl | grep rxq | grep cntxt_id | grep -vw fl

# cxgbetool t5nex0 filter 0 proto 6 action pass
# cxgbetool t5nex0 filter 1 action pass queue <cntxt_id>

I'd also recommend that you explicitly provide the physical port# for
the traffic you're trying to steer to avoid surprises.  For example,
with the above rules you'll have _all_ non-TCP traffic on all ports sent
to queue <cntxt_id>.  This is likely not what you want.  You can refine
your rules with "iport port#".

# cxgbetool t5nex0 filter 1 iport 0 action pass queue <cntxt_id>
(this will steer traffic that shows up on port 0 only)

> 
> And it seems ok because with such configuration non-TCP traffic is
> placed only at queue number 8 and TCP traffic is processed by RSS. But
> there is a problem because RSS uses all 9 queues and in result some
> TCP packets are also distibuted to queue number 8.
> 
> My question is how to limit the number of queues that are used by RSS
> to 8 (queues 0-7)? I tried to set net.inet.rss.bits to "3" but it doesn't
> seem to change anything.

You'll need to modify the driver to do this.  Do not put the queue for
non-TCP traffic in the indirection table and traffic subject to RSS will
never arrive on that queue.  In this sample diff, the last queue is not
put in the indirection table and you should use the cntxt_id of this
queue when trying to steer non-TCP traffic to it.

--- a/sys/dev/cxgbe/t4_netmap.c Tue Feb 28 12:58:56 2017 -0800
+++ b/sys/dev/cxgbe/t4_netmap.c Mon Mar 06 21:32:03 2017 -0800
@@ -384,6 +384,8 @@ cxgbe_netmap_on(struct adapter *sc, stru
}
for (i = 0; i < vi->rss_size;) {
for_each_nm_rxq(vi, j, nm_rxq) {
+               if (j == vi->nnmrxq - 1)
+                       break;
vi->nm_rss[i++] = nm_rxq->iq_abs_id;
if (i == vi->rss_size)
break;

Regards,
Navdeep

Re: ifmedia status callback is non-sleepable

2017-01-26 Thread Navdeep Parhar
I think it's a bad idea in general for the kernel to be holding
non-sleepable locks around driver ioctls.

Regards,
Navdeep

On Thu, Jan 26, 2017 at 9:19 AM, Andrew Rybchenko  wrote:
> Hello,
>
> I'd like to double-check that it is intended/known limitation on ifmedia
> status callback to be non-sleepable.
> The limitation is imposed by usage of the ifmedia ioctl to get status from
> lacp/lagg code on port creation (it holds non-sleepable rm_wlock).
>
> Backtrace of the corresponding panic:
>
> Sleeping thread (tid 100578, pid 10653) owns a non-sleepable lock
> KDB: stack backtrace of thread 100578:
> #0 0x80ae46e2 at mi_switch+0xd2
> #1 0x80b31e6a at sleepq_wait+0x3a
> #2 0x80ae34e2 at _sx_xlock_hard+0x592
> #3 0x8222fd7e at sfxge_media_status+0x2e
> #4 0x80be7b90 at ifmedia_ioctl+0x170
> #5 0x8222c3d0 at sfxge_if_ioctl+0x1f0
> #6 0x82277fbe at lagg_port_ioctl+0xde
> #7 0x82278f9b at lacp_linkstate+0x4b
> #8 0x822794c2 at lacp_port_create+0x1e2
> #9 0x82276a73 at lagg_ioctl+0x1243
> #10 0x80bdcbec at ifioctl+0xfbc
> #11 0x80b41ab4 at kern_ioctl+0x2d4
> #12 0x80b41771 at sys_ioctl+0x171
> #13 0x80fa16ae at amd64_syscall+0x4ce
> #14 0x80f8442b at Xfast_syscall+0xfb
> panic: sleeping thread
> cpuid = 23
> KDB: stack backtrace:
> #0 0x80b24077 at kdb_backtrace+0x67
> #1 0x80ad93e2 at vpanic+0x182
> #2 0x80ad9253 at panic+0x43
> #3 0x80b39a99 at propagate_priority+0x299
> #4 0x80b3a59f at turnstile_wait+0x3ef
> #5 0x80ab493d at __mtx_lock_sleep+0x13d
> #6 0x80ad39f2 at _rm_wlock+0xb2
> #7 0x82275479 at lagg_port_setlladdr+0x29
> #8 0x80b363ca at taskqueue_run_locked+0x14a
> #9 0x80b361bf at taskqueue_run+0xbf
> #10 0x80a9340f at intr_event_execute_handlers+0x20f
> #11 0x80a93676 at ithread_loop+0xc6
> #12 0x80a90055 at fork_exit+0x85
> #13 0x80f8467e at fork_trampoline+0xe
>
> I think vnic driver has the problem as well since it grabs sx_lock from
> ifmedia status callback.
>
> Andrew.


Re: cxgbe's native netmap support broken since r307394

2016-12-19 Thread Navdeep Parhar
IFNET_RLOCK will work, thanks.
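
For reference, a minimal sketch of the change being discussed (function
names as used in this thread; this is not the committed diff):

/* sys/dev/netmap/netmap_freebsd.c -- sketch only */
void
nm_os_ifnet_lock(void)
{
	IFNET_RLOCK();		/* shared sx lock; drivers may sleep here */
}

void
nm_os_ifnet_unlock(void)
{
	IFNET_RUNLOCK();
}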

Navdeep

On Mon, Dec 19, 2016 at 3:21 AM, Vincenzo Maffione  wrote:
> Hi Navdeep,
>
>   Indeed, we have reviewed the code, and we think it is ok to
> implement nm_os_ifnet_lock() with IFNET_RLOCK(), instead of using
> IFNET_WLOCK().
> Since IFNET_RLOCK() results into sx_slock(), this should fix the issue.
>
> On FreeBSD, this locking is needed to protect a flag read by nm_iszombie().
> However, on Linux the same lock is also needed to protect the call to
> the nm_hw_register() callback, so we prefer to have an "unified"
> locking scheme, i.e. always calling nm_hw_register under the lock.
>
> Does this make sense to you? Would it be easy for you to make a quick
> test by replacing IFNET_WLOCK with IFNET_RLOCK?
>
> Thanks,
>   Vincenzo
>
> 2016-12-17 23:28 GMT+01:00 Navdeep Parhar :
>> Luigi, Vincenzo,
>>
>> The last major update to netmap (r307394 and followups) broke cxgbe's
>> native netmap support.  The problem is that netmap_hw_reg now holds an
>> rw_lock around the driver's netmap_on/off routines.  It has always been
>> safe for the driver to sleep during these operations but now it panics
>> instead.
>>
>> Why is IFNET_WLOCK needed here?  It seems like a regression to disallow
>> sleep on the control path.
>>
>> Regards,
>> Navdeep
>>
>> begin_synchronized_op with the following non-sleepable locks held:
>> exclusive rw ifnet_rw (ifnet_rw) r = 0 (0x8271d680) locked @
>> /root/ws/head/sys/dev/netmap/netmap_freebsd.c:95
>> stack backtrace:
>> #0 0x810837a5 at witness_debugger+0xe5
>> #1 0x81084d88 at witness_warn+0x3b8
>> #2 0x83ef2bcc at begin_synchronized_op+0x6c
>> #3 0x83f14beb at cxgbe_netmap_reg+0x5b
>> #4 0x809846f1 at netmap_hw_reg+0x81
>> #5 0x809806de at netmap_do_regif+0x19e
>> #6 0x8098121d at netmap_ioctl+0x7ad
>> #7 0x8098682f at freebsd_netmap_ioctl+0x5f
>
>
>
> --
> Vincenzo Maffione


cxgbe's native netmap support broken since r307394

2016-12-17 Thread Navdeep Parhar
Luigi, Vincenzo,

The last major update to netmap (r307394 and followups) broke cxgbe's
native netmap support.  The problem is that netmap_hw_reg now holds an
rw_lock around the driver's netmap_on/off routines.  It has always been
safe for the driver to sleep during these operations but now it panics
instead.

Why is IFNET_WLOCK needed here?  It seems like a regression to disallow
sleep on the control path.

Regards,
Navdeep

begin_synchronized_op with the following non-sleepable locks held:
exclusive rw ifnet_rw (ifnet_rw) r = 0 (0x8271d680) locked @
/root/ws/head/sys/dev/netmap/netmap_freebsd.c:95
stack backtrace:
#0 0x810837a5 at witness_debugger+0xe5
#1 0x81084d88 at witness_warn+0x3b8
#2 0x83ef2bcc at begin_synchronized_op+0x6c
#3 0x83f14beb at cxgbe_netmap_reg+0x5b
#4 0x809846f1 at netmap_hw_reg+0x81
#5 0x809806de at netmap_do_regif+0x19e
#6 0x8098121d at netmap_ioctl+0x7ad
#7 0x8098682f at freebsd_netmap_ioctl+0x5f


Re: netmap, netmap-fwd, and how many M packets-per-second?

2016-12-01 Thread Navdeep Parhar
How have you configured netmap-fwd?  If you provide details on how the
router or firewall is set up, I can try similar experiments here.

On Thu, Dec 1, 2016 at 3:55 PM, Jordan Caraballo
 wrote:
> Feedback and/or tips and tricks more than welcome.
>
> We are trying to process huge amounts of small (64 bytes) pps through a
> router. So far results have not been as we expected. We have tested
> FreeBSD 10.3, 11.0, 11.0-STABLE, and 12.0-CURRENT with and without
> netmap. Based on netmap documentation we were expecting about 5.0M pps;
> along with the routing improvements from the FreeBSD routing
> proposal, a total of 12.0M.
>
> Server Description:
>
> Dell PowerEdge R530 with 2 Intel(R) Xeon(R) E5-2695 CPUs, 18 cores per
> cpu. Equipped with a Chelsio T-580-CR dual port in an 8x slot.
>
> BIOS tweaks:
>
> Hyperthreading (or Logical Processors) is turned off.
>
> Current results are shown below. Additional configurations can be given
> upon request.
>
> Test Environment:
> 5 clients and 5 servers - 4 Dell C6100 and 2 Dell R420; each one
> equipped with 10G NICS (4 intel 8259X and 6 with mellanox connectx2).
>
> Script that execute the following on each host.
>
> #!/usr/local/bin/bash
> # Iterate through ports and start tests
> for ((i=1;i<=PORTS;i++)); do
> PORT=$(($PORT+1))
> iperf3 -c 172.16.2.$IP -u -b1m -i 0 -N -l$PKT -t$TIME -P$STREAMS
> -p$PORT &
> #iperf3 -c 172.16.2.$IP -i0 -N -l$PKT -t$TIME -P$STREAMS -p$PORT &
> done
>
> # FreeBSD 10.3 - 4 streams to 80 ports from each client (5)
>
> input(Total)   output
>packets  errs idrops  bytespackets  errs  bytes colls drops
>   1.9M 0  1.3M   194M   540k 057M 0 0
>   2.1M 0  1.5M   216M   556k 058M 0 0
>   1.8M 0  1.3M   192M   553k 058M 0 0
>   1.7M 0  1.1M   174M   542k 057M 0 0
>   1.9M 0  1.4M   204M   537k 056M 0 0
>   1.6M 0  1.1M   171M   550k 058M 0 0
>   1.6M 0  1.1M   173M   546k 057M 0 0
>   1.7M 0  1.1M   176M   564k 059M 0 0
>   2.0M 0  1.5M   212M   543k 057M 0 0
>   2.1M 0  1.5M   219M   557k 058M 0 0
>   1.9M 0  1.4M   205M   547k 057M 0 0
>   1.7M 0  1.2M   179M   553k 058M 0 0
>
> # FreeBSD 11.0 - 4 streams to 80 ports from each client (5)
>
> input(Total)   output
>packets  errs idrops  bytespackets  errs  bytes colls drops
>   3.1M 0  1.8M   326M   1.3M 0   134M 0 0
>   2.6M 0  1.5M   269M   1.1M 0   116M 0 0
>   2.7M 0  1.5M   285M   1.2M 0   127M 0 0
>   2.4M 0  1.3M   257M   1.1M 0   119M 0 0
>   2.7M 0  1.5M   287M   1.3M 0   134M 0 0
>   2.5M 0  1.3M   262M   1.2M 0   127M 0 0
>   2.1M 0  1.1M   224M   1.0M 0   108M 0 0
>   2.7M 0  1.4M   285M   1.4M 0   143M 0 0
>   2.6M 0  1.3M   272M   1.3M 0   136M 0 0
>   2.5M 0  1.4M   265M   1.1M 0   120M 0 0
> # FreeBSD 11.0-STABLE - 4 streams to 80 ports from each client (5)
>
> input(Total)   output
>packets  errs idrops  bytespackets  errs  bytes colls drops
>   1.9M 0  849k   195M   1.0M 0   107M 0 0
>   1.9M 0  854k   196M   1.0M 0   106M 0 0
>   1.9M 0  851k   196M   1.0M 0   107M 0 0
>   1.9M 0  851k   196M   1.0M 0   107M 0 0
>   1.9M 0  851k   196M   1.0M 0   107M 0 0
>   1.9M 0  852k   196M   1.0M 0   107M 0 0
>   1.9M 0  847k   195M   1.0M 0   107M 0 0
>   1.9M 0  836k   195M   1.0M 0   107M 0 0
>   1.9M 0  843k   195M   1.0M 0   107M 0 0
> # FreeBSD 12.0-CURRENT - 4 streams to 80 ports from each client (5)
>
>input(Total)   output
>packets  errs idrops  bytespackets  errs  bytes colls drops
>   1.1M   259 0   115M   1.1M 0   115M 0 0
>   1.2M   273 0   124M   1.2M 0   124M 0 0
>   1.1M   200 0   112M   1.1M 0   112M 0 0
>   1.2M   290 0   122M   1.2M 0   122M 0 0
>   1.0M   132 0   107M   1.0M 0   107M 0 0
>   1.1M   303 0   118M   1.1M 0   118M 0 0
>   1.1M   278 0   112M   1.1M 0   112M 0 0
>  

Re: unable to use BPF Just-In-Time compiler

2016-09-19 Thread Navdeep Parhar
Use "options BPF_JITTER" in your kernconf.  If this is new code you must
include opt_bpf.h in it.  If it's part of a module then the module's
makefile must have opt_bpf.h in its SRCS.  When in doubt take a look at
all the opt_bpf.h files in your obj tree after the buildkernel and see if they
have "#define BPF_JITTER 1" or not.  Remove all opt_bpf.h files in the
obj tree and retry buildkernel if all else fails.
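
For example, a sketch of the two pieces (MYKERNEL is a placeholder, and
the Makefile line applies to an out-of-tree module):

# sys/amd64/conf/MYKERNEL
include GENERIC
ident	MYKERNEL
options	BPF_JITTER

# in the module's Makefile:
SRCS+=	opt_bpf.h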

Regards,
Navdeep

On Mon, Sep 19, 2016 at 10:41:38AM +, KrishnamRaju ErapaRaju wrote:
> Thanks for your response Bjeorn.
> 
> I added "device bpf_jitter"  line as you suggested, but not luck.
> Still no code under BPF_JITTER macro is executing.
> 
> Please help me  enabling JIT compiler.
> 
> Thanks,
> Krishna.
> -Original Message-
> From: Bjoern A. Zeeb [mailto:bzeeb-li...@lists.zabbadoz.net] 
> Sent: Sunday, September 18, 2016 2:41 AM
> To: KrishnamRaju ErapaRaju 
> Cc: freebsd-net@freebsd.org
> Subject: Re: unable to use BPF Just-In-Time compiler
> 
> On 15 Sep 2016, at 5:32, KrishnamRaju ErapaRaju wrote:
> 
> > Hi,
> >
> > I want to use BPF JIT Kernel APIs in FreeBSD(like: bpf_jitter(), 
> > etc..), for implementing TCP connection packet filtering.
> >
> > I have followed below instructions as specified in: 
> > https://lists.freebsd.org/pipermail/freebsd-current/2005-December/0586
> > 03.html
> >
> > STEPS followed:
> > -
> > cp /usr/src/sys/amd64/conf/GENERIC /usr/src/sys/amd64/conf/MYKERNEL
> >
> > And added below line in MYKERNEL config file.
> > options BPF_JITTER
> 
> I think you want
> 
> device bpf_jitter
> 
> The options line to my understanding only turns it on by default.
> 
> >
> > make buildkernel KERNCONF=MYKERNEL
> > make installkernel KERNCONF=MYKERNEL
> > reboot
> >
> > But after reboot the flag BPF_JITTER is not getting enabled(all the 
> > code inside "#ifdef BPF_JITTER" is not getting executed).
> >
> > Am I missing something?
> >
> > Also it looks like there are not many updates to BPF JIT code since 
> > 2005, is it stable? anyone using it?
> >
> > Thanks,
> > Krishna.


Re: Netmap Checksum Offloading

2016-06-15 Thread Navdeep Parhar
On 06/15/2016 16:15, Andrey Yakovlev wrote:
> ive heard on bsdcan this year that some patches exist to add hwcsum 
> offloading to netmap, hope to see it chelsio at least

cxgbe/cxl is a bit sneaky and will let you override netmap (on tx only).
The ncxl interfaces declare themselves capable of checksumming but all
such capabilities are disabled by default.  Just enable txcsum on the
interface and the hardware will do checksum insertion on tx.  No way to
solve the rx part entirely within the driver -- netmap has to be willing
to accept checksum related flags from the driver.
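
For example (ncxl0 here is just the interface name from this thread):

# ifconfig ncxl0 txcsum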

Regards,
Navdeep

> 
> -- 
> ./andy
> 
> 
> 14.06.2016, 12:15, "Dominik Schoeffmann" :
>> Dear Netmap Developers,
>>
>> during the course of my bachelor's thesis, I modified a packet generator
>> called MoonGen [1] in order to utilize netmap.
>> One key component was to flexibly offload checksums for different kinds
>> of packets (IPv4, UDP, TCP).
>> The ixgbe netmap patch was modified [2] in order to construct context
>> descriptors and suitable data descriptors. This is implemented in less
>> than 250 LoC (including pseudo-header calculations).
>> The man page states, that checksum offloading is available via ethtool,
>> although a solution inside the netmap API might be a cleaner way for
>> applications to actually use these features.
>> Attached is a graph showing the performance implication of using
>> offloading in the current implementation.
>> As can be seen, offloading has only a minor impact.
>> When regarding this data (and comparing it to other frameworks), please
>> keep in mind, that internally a lot of per-packet effort is needed due
>> to the software architecture of the packet generator.
>>
>> The question being:
>> Would it not make sense to include checksum offloading inside of netmap
>> in order to accomodate applications operating on layer 3 and above?
>> As these programs need to calculate the checksums in software, it would
>> be just as fast to move these calculations to the kernel for NICs
>> without checksum offloading support (and the kernel would act as a library).
>> The problem which currently is imposed by the fact, that netmap exports
>> the complete ring, is that context descriptors disrupt the data
>> descriptors, which is unpleasant for the application.
>> But you may find the data interesting nevertheless.
>>
>> Best Regards,
>> Dominik Schoeffmann
>>
>> [1] https://github.com/dschoeffm/MoonGen/tree/netmap
>> [2] https://github.com/dschoeffm/netmap/tree/mg-chksum-offloading



Re: Changing MTU on cxgbe

2016-05-27 Thread Navdeep Parhar
On Fri, May 27, 2016 at 12:23:02AM -0700, K. Macy wrote:
> On Thursday, May 26, 2016, Navdeep Parhar  wrote:
> 
> > On Fri, May 27, 2016 at 12:57:34AM -0400, Garrett Wollman wrote:
> > > In article <
> > cajpshy4vf5ky6guausloorogiquyd2ccrmvxu8x3carqrzx...@mail.gmail.com
> > > you write:
> > >
> > > ># ifconfig -m cxgbe0
> > > >cxgbe0: flags=8943
> > >
> > > ># ifconfig cxgbe0 mtu 9000
> > > >ifconfig: ioctl SIOCSIFMTU (set mtu): Invalid argument
> > >
> > > I believe this device, like many others, does not allow the MTU (or
> > > actually the MRU) to be changed once the receive ring has been set up
> >
> > This is not correct.  You can change the MTU of a cxgbe/cxl interface at
> > any time (whether it's up or down, passing traffic or idle, etc.).
> 
> 
> For some reason the stack needs init to be called when the MTU is changed
> for it to actually change the size of the packets passed to the driver. At
> least cxgb does not do that. I'm not at my computer right now, but cxgbe
> may be the same. If that's the case just up / down the interface. It _will_
> take effect without that if it's passed at module load.

The problem that was reported was that the ioctl that sets the MTU
failed, not that the ioctl succeeded but the MTU change did not take
effect.

Regards,
Navdeep


Re: Changing MTU on cxgbe

2016-05-26 Thread Navdeep Parhar
On Fri, May 27, 2016 at 12:57:34AM -0400, Garrett Wollman wrote:
> In article 
>  you 
> write:
> 
> ># ifconfig -m cxgbe0
> >cxgbe0: flags=8943
> 
> ># ifconfig cxgbe0 mtu 9000
> >ifconfig: ioctl SIOCSIFMTU (set mtu): Invalid argument
> 
> I believe this device, like many others, does not allow the MTU (or
> actually the MRU) to be changed once the receive ring has been set up

This is not correct.  You can change the MTU of a cxgbe/cxl interface at
any time (whether it's up or down, passing traffic or idle, etc.).

Regards,
Navdeep


Re: Changing MTU on cxgbe

2016-05-26 Thread Navdeep Parhar
On Thu, May 26, 2016 at 11:52:45PM -0500, Dustin Marquess wrote:
> After my many issues with ixgbe & ixv, I ended up removing the Intel
> X520s with Chelsio T420-CR and the Intel X710s with Chelsio T520-CR.
> So far so good, except I can't seem to change the MTU away from 1500.
> In fact, ifconfig can't seem to change the mtu at all.  On -CURRENT as
> of yesterday:
> 
> # ifconfig -m cxgbe0
> cxgbe0: flags=8943
> metric 0 mtu 1500
> options=ec07bb
> capabilities=ecc7bb
> ether 00:07:43:11:2b:80
> nd6 options=29
> media: Ethernet 10Gbase-Twinax 
> status: active
> supported media:
> media 10Gbase-Twinax mediaopt full-duplex
> 
> # ifconfig cxgbe0 mtu 9000
> ifconfig: ioctl SIOCSIFMTU (set mtu): Invalid argument
> 
> # ifconfig cxgbe0 mtu 1500
> ifconfig: ioctl SIOCSIFMTU (set mtu): Invalid argument
> 
> Any ideas?

It should have worked.  I tried just now on a system running -CURRENT as
of today (r300785) and didn't encounter any problems.

# ifconfig cxl0 up
# ifconfig cxl0 mtu 500
# ifconfig cxl0 mtu 
# ifconfig cxl0 mtu 7992
...

Is the cxgbe/cxl ifnet part of a lagg or bridge or something like that?
Run this while you try the ifconfig and see what ioctls are failing with
EINVAL.  You may have to do a "kldload dtraceall" before this.

# dtrace -n 'fbt::*_ioctl:return /arg1 == 22/ {}'

> Also, the Chelsios don't seem to use iovctl to do SR-IOV like the
> Intels did (I'm using bhyve).  I assume that's what the
> hw.cxgbe.num_vis loader setting is for?

num_vis does create multiple autonomous ifnets that share the same
physical port but it doesn't involve SR-IOV.

Regards,
Navdeep


Re: Software iWARP on FreeBSD

2016-05-20 Thread Navdeep Parhar
On 05/20/2016 09:36, Vijay Singh wrote:
> Hello. I'm looking to have software iWARP working on FreeBSD (kernel mode
> for now). Bernard and his team have an implementation:
> https://github.com/zrlio/softiwarp/tree/master/kernel. It has been released
> under the BSD license. Currently it is written for Linux, so there will be
> porting effort required, which I am willing to undertake. Would there be
> any opposition to getting this into the tree when it's ready? We can also
> take care of keeping the source updated as and when needed.

Very nice.  I can help with interop testing with hardware based iWARP
implementations.

Regards,
Navdeep


Re: netmap overrun counters

2016-04-28 Thread Navdeep Parhar
On 04/28/2016 11:13, bazzoola wrote:
> Hi!
> 
> Two questions:
> 
> (1) Is there a way to know when netmap rx rings overrun? Most NIC
> drivers provide MPC (missed packet count) and sysctl for rx_overrun.
> 
> I would like to know if my application is not reading as fast, i.e., no
> frames are being dropped.

A NIC's hardware counters don't distinguish between netmap or normal
operation so you can monitor the existing sysctls for rx drops/overruns.
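
For example, on cxgbe-class hardware something like this can be watched
while the netmap application runs (sysctl node names vary by driver):

# sysctl dev.cxl.0.stats | egrep -i 'drop|ovflow|trunc'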

Regards,
Navdeep


Re: Netmap seems to randomly cause Kernel panic on shutdown

2016-04-20 Thread Navdeep Parhar
On 04/05/2016 01:30, Steven Crangle wrote:
> 
> Hi,
> 
> 
> I'm looking for a bit of help to track down the reason behind this
> kernel panic, it seems like netmap works fine for the majority of the
> time, but occasionally it will cause the box to kernel panic.
> 
> 
> The machine is running FreeBSD Current from a few weeks ago (Rev
> 296937)

Doesn't seem to be the case.  The uname -a in the attachment indicates
it's FreeBSD 10.2-R-p10.  The stack also shows netmap_do_regif calling
netmap_do_unregif and only 10.x's netmap_do_regif does that.

As to the actual failure, it appears that netmap_do_regif on the cxl
interface failed and something on the error path caused the panic.

Please try with HEAD, or try to figure out why netmap_do_regif failed.

Regards,
Navdeep


Re: Tracing dropped UDP packets

2016-04-20 Thread Navdeep Parhar
On 04/15/2016 22:38, bazzoola wrote:
> Greetings,
> 
> I would like to know where (in the kernel) UDP packets are dropped.

Have you tried netstat -sp udp ?  If the drops show up in some counter
there then you can look at the kernel code to see where the counter is
incremented.
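
For example (udps_fullsock is the counter behind "dropped due to full
socket buffers"; the grep is just one way to find where it is bumped):

# netstat -sp udp | grep dropped
# grep -rn udps_fullsock /usr/src/sys/netinet/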

Regards,
Navdeep

> 
> I looked at udp_usrreq.c but is overwhelming for the 1st time. Is it
> possible to use DTrace to locate where the packets are being dropped? or
> is there a tool similar to 'dropwatch' which can tell me where in the
> kernel my packets are getting dropped?
> 
> On Linux, I used dropwatch and they were dropped packets were at
> udp_queue_rcv_skb().
> 
> Thanks!



Re: solved: Re: Chelsio T520-SO-CR low performance (netmap tested) for RX

2016-01-23 Thread Navdeep Parhar
On Sat, Jan 23, 2016 at 09:33:32PM -0800, Luigi Rizzo wrote:
> On Sat, Jan 23, 2016 at 8:28 PM, Marcus Cenzatti  wrote:
> >
> >
> > On 1/24/2016 at 1:10 AM, "Luigi Rizzo"  wrote:
> >>
> >>Thanks for re-running the experiments.
> >>
> >>I am changing the subject so that in the archives it is clear
> >>that the chelsio card works fine.
> >>
> >>Overall the tests confirm that whenever you hit the host stack you
> >>are bound
> >>to the poor performance of the latter. The problem does not appear
> >>using intel
> >>as a receiver because on the intel card netmap mode disables the
> >>host stack.
> >>
> >>More comments on the experiments:
> >>
> >>The only meaningful test is the one where you use the DMAC of the
> >>ncxl0 port:
> >>
> >>SENDER: ./pkt-gen -i ix0 -f tx -S 00:07:e9:44:d2:ba -D
> >>00:07:43:33:8d:c1
> >>
> >>in the other experiment you transmit broadcast frames and hit the
> >>network stack.
> >>ARP etc do not matter since tx and rx are directly connected.
> >>
> >>On the receiver you do not need to specify addresses:
> >>
> >>RECEIVER: ./pkt-gen -i ncxl0 -f rx
> >>
> >>The numbers in netstat are clearly rounded, so 15M is probably
> >>14.88M
> >>(line rate), and 3.7M that you see correctly represents the
> >>difference
> >>between incoming and received packets.
> >>
> >>The fact that you see drops may be related to the NIC being unable
> >>to
> >>replenish the queue fast enough, which in turn may be a hardware
> >>or a
> >>software (netmap) issue.
> >>You may try experiment with shorter batches on the receive side
> >>(say, -b 64 or less) and see if you have better results.
> >>
> >>A short batch replenishes the rx queue more frequently, but it is
> >>not a conclusive experiment because there is an optimization in
> >>the netmap poll code which, as an unintended side effect,
> >>replenishes
> >>the queue less often than it should.
> >>For a conclusive experiment you should grab the netmap code from
> >>github.com/luigirizzo/netmap and use pkt-gen-b which
> >>uses busy wait and works around the poll "optimization"
> >>
> >>thanks again for investigating the issue.
> >>
> >>cheers
> >>luigi
> >>
> >
> > so as a summary, with IP test on intel card, netmap disables the host stack 
> > while on chelsio netmap does not disable the host stack and we get things
> > injected to host, so the only reliable test is mac based when using chelsio 
> > cards?
> >
> > yes I am already running github's netmap code, let's try with busy code:
> ...
> > chelsio# ./pkt-gen-b -i ncxl0 -f rx
> > 785.659290 main [1930] interface is ncxl0
> > 785.659337 main [2050] running on 1 cpus (have 4)
> > 785.659477 extract_ip_range [367] range is 10.0.0.1:0 to 10.0.0.1:0
> > 785.659496 extract_ip_range [367] range is 10.1.0.1:0 to 10.1.0.1:0
> > 785.718707 main [2148] mapped 334980KB at 0x80180
> > Receiving from netmap:ncxl0: 2 queues, 1 threads and 1 cpus.
> > 785.718784 main [2235] Wait 2 secs for phy reset
> > 787.729197 main [2237] Ready...
> > 787.729449 receiver_body [1412] reading from netmap:ncxl0 fd 3 main_fd 3
> > 788.730089 main_thread [1720] 11.159 Mpps (11.166 Mpkts 5.360 Gbps in 
> > 1000673 usec) 205.89 avg_batch 0 min_space
> > 789.730588 main_thread [1720] 11.164 Mpps (11.169 Mpkts 5.361 Gbps in 
> > 1000500 usec) 183.54 avg_batch 0 min_space
> > 790.734224 main_thread [1720] 11.172 Mpps (11.213 Mpkts 5.382 Gbps in 
> > 1003636 usec) 198.84 avg_batch 0 min_space
> > ^C791.140853 sigint_h [404] received control-C on thread 0x801406800
> > 791.742841 main_thread [1720] 4.504 Mpps (4.542 Mpkts 2.180 Gbps in 1008617 
> > usec) 179.62 avg_batch 0 min_space
> > Received 38091031 packets 2285461860 bytes 196774 events 60 bytes each in 
> > 3.41 seconds.
> > Speed: 11.166 Mpps Bandwidth: 5.360 Gbps (raw 7.504 Gbps). Average batch: 
> > 193.58 pkts
> >
> > chelsio# ./pkt-gen-b -b 64 -i ncxl0 -f rx
> > 522.430459 main [1930] interface is ncxl0
> > 522.430507 main [2050] running on 1 cpus (have 4)
> > 522.430644 extract_ip_range [367] range is 10.0.0.1:0 to 10.0.0.1:0
> > 522.430662 extract_ip_range [367] range is 10.1.0.1:0 to 10.1.0.1:0
> > 522.677743 main [2148] mapped 334980KB at 0x80180
> > Receiving from netmap:ncxl0: 2 queues, 1 threads and 1 cpus.
> > 522.677822 main [2235] Wait 2 secs for phy reset
> > 524.698114 main [2237] Ready...
> > 524.698373 receiver_body [1412] reading from netmap:ncxl0 fd 3 main_fd 3
> > 525.699118 main_thread [1720] 10.958 Mpps (10.966 Mpkts 5.264 Gbps in 
> > 1000765 usec) 61.84 avg_batch 0 min_space
> > 526.700108 main_thread [1720] 11.086 Mpps (11.097 Mpkts 5.327 Gbps in 
> > 1000991 usec) 61.06 avg_batch 0 min_space
> > 527.705650 main_thread [1720] 11.166 Mpps (11.227 Mpkts 5.389 Gbps in 
> > 1005542 usec) 61.91 avg_batch 0 min_space
> > 528.707113 main_thread [1720] 11.090 Mpps (11.107 Mpkts 5.331 Gbps in 
> > 1001463 usec) 61.34 avg_batch 0 min_space
> > 529.707617 main_thread [1720] 10.847 Mpps (10.853 Mpkts 5.209 Gbps in 
> > 1000504 usec) 62.51 avg_batch 0 min_spa

Re: solved: Re: Chelsio T520-SO-CR low performance (netmap tested) for RX

2016-01-23 Thread Navdeep Parhar
On Sat, Jan 23, 2016 at 08:38:24PM -0800, Adrian Chadd wrote:
> ok, that's a discussion to have with navdeep. That /should/ work.
> Someone may have changed it lately.

Yes this used to work.

> 
> Things should behave very well and predictable once you can disable
> cxl0 but not ncxl0. :-P

The plan is to clean all this up by moving the netmap specific parts to
a driver module of its own.  So when you load if_cxgbe you'll get only
the cxl interfaces.  If you want netmap access to the ports you'll be
able to kldload cxgbe_netmap (or something like that) which will create
the ncxl ports.  These ncxl ports _will_ operate like normal ifnets
hooked to the kernel stack if netmap isn't enabled on them.  And the
cxgbe_netmap driver will attach to PCIe PFs 0-3 so it won't take up
resources (interrupt vectors, etc.) from PF4, which is what the main
if_cxgbe attaches to.  You'll certainly be able to up/down/whatever all
the interfaces independent of each other.  All this will get done in
time for FreeBSD 11.

Regards,
Navdeep


Re: Chelsio T520-SO-CR low performance (netmap tested) for RX

2016-01-23 Thread Navdeep Parhar
On Sat, Jan 23, 2016 at 11:12:28AM -0800, Luigi Rizzo wrote:
> On Sat, Jan 23, 2016 at 10:38 AM, Navdeep Parhar  wrote:
> > On Sat, Jan 23, 2016 at 03:48:39PM -0200, Marcus Cenzatti wrote:
> > ...
> >>
> >> woops, my bad, yes probably we had some drop, with -S and -D now I get 
> >> 1.2Mpps.
> >
> > Run "netstat -hdw1 -i cxl" on the receiver during your test.
> 
> Navdeep, does this give any info on the ncxl port rather
> than the cxl port connected to the host stack ?

You're right, it should have been "netstat -hdw 1 -I ncxl".  In these
kinds of experiments it might even be best to run two netstats in
parallel on cxl and ncxl.

> 
> ...
> > Do you know if the transmitter will pad up so as not to put runts on the
> > wire?  If not then you might want to bump up the size of the frame
> > explicitly (there's some pkt-gen knob for this).
> >
> 
> ix/ixl do automatic padding, and in any case pkt-gen
> by default generates valid packet sizes (and so it does
> with the variable-size tests I suggested).
> 
> Is there any parameter that controls interrupt moderation ?
> 
> In any case we need to know the numbers when sending to the
> ncxl MAC address as opposed to broadcast.
> 
> I suspects one of these problems:
> 
> - the interrupt moderation interval is too long thus limiting
>   the rate to one ring/interval. Unlikely though, even
>   with 1k slots, the measured 1.2 Mpps corresponds to almost
>   1ms which is too long
> 
> - the receiver cannot cope with the input load and somehow
>   takes a long time to recover from congestion. If this is
>   the case, running the sender at a lower rate might reach
>   a peak throughput > 1.2 Mpps when the receiver can still
>   keep up, and then drop to the low rate under congestion.
> 
> - and of course bus errors, when the device is connected on
>   a PCIe slot with only 1-2 data lanes.
>   This actually happens a lot, physical connector sizes
>   do not reflect the number of active PCIe lanes.

There are no drops or PAUSE or any sign of backpressure.  The netstat
counters show 900K incoming and 0 drops/errors, which means 900K packets
on the wire for the port and all were delivered to the driver
successfully.

The mismatch in the transmitter's counter and the incoming counter can
only be explained by
a) Frames whose DMAC address didn't match the local interface's MAC.
This can be tested by switching cxl0 and ncxl0 to promisc mode to see if
that opens the flood gates.
b) Frames mangled badly enough to be discarded.  But these should show
as an error or drop in at least one of these:

sysctl dev.cxl.<port>.stats
sysctl -n dev.t5nex.0.misc.tp_err_stats
netstat -hd -I cxl0
netstat -hd -I ncxl0

The only broken counter in cxgbe that I know of is rx_runts and we've
already verified that the transmitter isn't generating runts.

Regards,
Navdeep


Re: Chelsio T520-SO-CR low performance (netmap tested) for RX

2016-01-23 Thread Navdeep Parhar
On Sat, Jan 23, 2016 at 04:54:52PM -0200, Marcus Cenzatti wrote:
...
> here is the output for netstat when I pkt-gen -f tx un-throttled (14Mpps):
> 
> input(Total)   output
>packets  errs idrops  bytespackets  errs  bytes colls drops
>   900k 0 055M  3 0550 0 0
>   900k 0 055M  3 0422 0 0
>   900k 0 055M  3 0422 0 0
>   900k 0 055M  3 0422 0 0
>   900k 0 055M  9 0   2.4K 0 0
>   900k 0 055M  3 0422 0 0
>   900k 0 055M  3 0422 0 0
>   900k 0 055M  3 0422 0 0
>   900k 0 055M  3 0422 0 0
>   900k 0 055M  3 0422 0 0

This means that the chip really is getting ~900K packets per second from
the wire.  If it was receiving 14.8M and delivering 900K you'd have seen
14.8M in packets and 13.9M (14.8 - 900K) in idrops.

Regards,
Navdeep


Re: Chelsio T520-SO-CR low performance (netmap tested) for RX

2016-01-23 Thread Navdeep Parhar
On Sat, Jan 23, 2016 at 03:12:59PM -0200, Marcus Cenzatti wrote:
...
> intel# ./pkt-gen -i ix0 -f tx -d 00:07:43:33:8d:c1 -s 00:07:e9:44:d2:ba
> 267.767848 main [1715] interface is ix0
> 267.767990 extract_ip_range [291] range is 0.0.0.0:90 to 0.0.0.0:90
> 267.768006 extract_ip_range [291] range is 0.0.0.0:7 to 0.0.0.0:7

Does this mean the packets are being transmitted with source and
destination IP all 0?  Try to provide a more reasonable IP address
range.  The T5 receiver might be throwing away "obviously" bad frames.
(See my other email for how to check if it's dropping frames)
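
For example (MAC addresses as used elsewhere in this thread; the IP
addresses are arbitrary but valid):

# ./pkt-gen -i ix0 -f tx -S 00:07:e9:44:d2:ba -D 00:07:43:33:8d:c1 -s 10.0.0.1 -d 10.1.0.1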

> 267.872796 main [1910] mapped 334980KB at 0x801c0
> Sending on netmap:ix0: 8 queues, 1 threads and 1 cpus.
> 00 -> 00 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)

This you've fixed already.  L2 broadcasts will get replicated by the
receiver and will be delivered to both the cxl and the ncxl interface.
The ncxl interface is set to drop on congestion but the cxl interface is
set to emit PAUSE on congestion.  cxl plugs into the stack, which is
slow at pps workloads, and so L2 broadcasts will result in PAUSE out of
the port and will slow down the transmitter.

Regards,
Navdeep


Re: Chelsio T520-SO-CR low performance (netmap tested) for RX

2016-01-23 Thread Navdeep Parhar
On Sat, Jan 23, 2016 at 03:48:39PM -0200, Marcus Cenzatti wrote:
...
> 
> woops, my bad, yes probably we had some drop, with -S and -D now I get 
> 1.2Mpps.

Run "netstat -hdw1 -i cxl" on the receiver during your test.  Do you
see errs and/or idrops incrementing?  The input "packets" counter should
match what the transmitter is claiming to transmit at.

Also check the output of this:
# sysctl -n dev.t5nex.0.misc.tp_err_stats
It is ok if you see tnlCongDrops, but any of the Errs counter going up
is not good -- it means the incoming frames had errors.

Do you know if the transmitter will pad up so as not to put runts on the
wire?  If not then you might want to bump up the size of the frame
explicitly (there's some pkt-gen knob for this).

Regards,
Navdeep

> 
> curiously, I have always used -s/-d with IP addresses on ix-ix testing this is
> why I never noticed the case, since ix always received 14Mpps, but you
> probably explained it since ix has one single deviceport per wire, hence the
> different behavior
> 
> performance is still very low when compared to TX and to what is expected
> 
> thank you for noticing the case
> 
> 
> 


Re: Chelsio T520-SO-CR low performance (netmap tested) for RX

2016-01-23 Thread Navdeep Parhar
On Sat, Jan 23, 2016 at 03:34:27AM -0200, Marcus Cenzatti wrote:
> hello,
> 
> I am testing a chelsio t520-so-cr connected to an Intel card with the ix(4)
> driver. I can get the ncxl0 interface to transmit at 14Mpps to another
> chelsio or to an Intel card. However I can only get 800Kpps-1Mpps for
> RX tests from both chelsio or Intel.
> 
> I have tested with both FreeBSD 11 and FreeBSD 10.3-PRERELEASE.
> 
> I tested it untuned first and later applied the tuning
> recommendations I found on the BSDRP[1] website. Results still range
> from 800Kpps to 1Mpps for RX.
> 
> Tests are done w/ with pkt-gen in netmap mode on ncxl interface with
> both IP address and MAC address source/dest.

The ncxl interfaces have their own MAC addresses.  Make sure the sender
uses the MAC of the receiver's ncxl interface as the destination MAC.
(netmap's pkt-gen -f tx transmits L2 broadcasts by default).

Check for PAUSE frames coming out of the receiver (sysctl dev.cxl | grep
tx_pause).  If it's receiving frames on netmap interface the tx_pause
counter should not move.

Regards,
Navdeep

> 
> I have tested ix-ix and I can confirm 14Mpps for both RX and TX
> directions. I have tested with two different chelsio T520 and both
> have the very same results.
> 
> What particular loader/sysctl or ifconfig options should I
> investigate?
> 
> I also tested disabling txcsum, rxcsum and TSO. Results are different
> but still on the much lower mentioned 800K-1M pps rate.
> 
> thank you
> 
> [1]http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr
> 


Re: Chelsio cxl and ncxl interface, whats the difference?

2016-01-20 Thread Navdeep Parhar
On Wed, Jan 20, 2016 at 03:58:18PM +, Teleric Team wrote:
> I got a Chelsio T5 520-SO with two ports and I get two interfaces per
> port, cxl and ncxl (cxl0 ncxl0 cxl1 ncxl1). The man page mentions cxl is
> for T5, what about ncxl? Should I get both or is something wrong?
> Which one should I use?  (is there any difference?).

You should use the cxl interfaces.  The 'n' interfaces are for netmap
use and show up only if you have netmap support compiled into the kernel
(which is the default for HEAD but not any stable/release branch).

There is work in progress to move the 'n' interfaces to their own module
so that they'll show up only if you load that extra module to get native
netmap support for cxl/cxgbe.

Regards,
Navdeep


Re: ethernet header size

2016-01-08 Thread Navdeep Parhar

sizeof(struct ether_header)
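
The definition is in <net/ethernet.h>, so for example:

#include <net/ethernet.h>

size_t hlen = sizeof(struct ether_header);	/* 6 + 6 + 2 = 14 bytes */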



On 01/08/2016 15:03, Hadi Rezaee wrote:

Hello there,

In some part of my application I need to have the Ethernet header size
(ideally, using sizeof).
Well I guess 'ethhdr' does not exist on FreeBSD, correct?

According to Linux definition:

struct ethhdr {
	unsigned char	h_dest[ETH_ALEN];
	unsigned char	h_source[ETH_ALEN];
	unsigned short	h_proto;
} __attribute__((packed));

So, assuming the Ethernet header size is equal to 14, is it going to
work? :)
And is there already a definition somewhere in the system header files,
so I don't have to define the size myself?

Thank you



Re: Kernel panics in tcp_twclose

2015-09-23 Thread Navdeep Parhar
On Wed, Sep 23, 2015 at 11:15:03PM +0200, Palle Girgensohn wrote:
...
> > By the way Palle could you also run below Dtrace script to see where
> > this tcp_close() in INP_TIMEWAIT comes from:
> > 
> > $ cat tcp-close-tw.d
> > fbt::tcp_close:entry
> > /args[0]->t_inpcb->inp_flags & 0x0100/
> > {
> >  @s1[stack()] = count()
> > }
> > 
> > tick-1sec {
> > printa(@s1);
> > }
> > $ sudo dtrace -s tcp-close-tw.d
> 
> # dtrace -s tcp-close-tw.d
> dtrace: failed to compile script tcp-close-tw.d: line 2: t_inpcb is not a 
> member of struct e1000_hw
> 
> > 
> 
> on one system...
> 
> and for the other two:
> 
> # dtrace -s tcp-close-tw.d
> dtrace: failed to initialize dtrace: DTrace device not available on system
> 
> I'm adding
> 
> options KDTRACE_HOOKS
> 
> to the kernels, I guess that will help?

Load the DTrace modules ("kldload dtraceall") before trying to run the
DTrace script.

Regards,
Navdeep


Re: netmap custom RSS and custom packet info

2015-06-29 Thread Navdeep Parhar

On 06/29/2015 08:17, Slawa Olhovchenkov wrote:
...

b) custom RSS. Modern NIC have RSS poorly interoperable with packet
analysing: packets from same flow, but different direction placed in
different queue, ...


This is default behavior because the default hash (Toeplitz) is not 
symmetrical.  There are modern NICs that do support other, symmetrical 
hashes.
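
As a toy illustration of "symmetrical" (this is not any NIC's actual
hash), a function like the following maps both directions of a flow to
the same value, because XOR is commutative:

static inline uint32_t
sym_flow_hash(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
{
	return ((saddr ^ daddr) ^ (uint32_t)(sport ^ dport));
}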


Regards,
Navdeep


Re: Intel XL710 40GE NIC, i40 driver and wire speed performance with netmap

2015-06-28 Thread Navdeep Parhar
On Sun, Jun 28, 2015 at 01:18:44PM +0300, Pavel Odintsov wrote:
> Hello, folks!
> 
> I'm looking for solution which could do wire speed (56 mpps with
> 64byte packets) for 40GE.
> 
> We have tested PF_RING/DPDK on Linux and could not achieve more than
> ~42 mpps and it's not enough for us.

Is this transmit-only, receive-only, or a mix of the two (in which case,
is this an aggregate number or total/2)?

Regards,
Navdeep


Re: Frequent hickups on the networking layer

2015-04-28 Thread Navdeep Parhar
On Wed, Apr 29, 2015 at 01:08:00AM -0400, Garrett Wollman wrote:
> <  said:
...
> > As far as I know (just from email discussion, never used them myself),
> > you can either stop using jumbo packets or switch to a different net
> > interface that doesn't allocate 9K jumbo mbufs (doing the receives of
> > jumbo packets into a list of smaller mbuf clusters).
> 
> Or just hack the driver to not use them.  For the Intel drivers this
> is easy, and at least for the hardware I have there's no benefit to
> using 9k clusters over 4k; for Chelsio it's quite a bit harder.

Quite a bit harder, and entirely unnecessary these days.  Recent
versions of the Chelsio driver will fall back to 4K clusters
automatically (and on the fly) if the system is short of 9K clusters.
There are even tunables that will let you set 4K as the only cluster
size that the driver should allocate.
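
For example, in /boot/loader.conf (see cxgbe(4) for the authoritative
list of tunables; this assumes the largest_rx_cluster knob):

hw.cxgbe.largest_rx_cluster=4096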

Regards,
Navdeep


Re: net.inet.ip.forwarding impact on throughput

2015-04-23 Thread Navdeep Parhar
On Tue, Apr 21, 2015 at 12:47:45PM -0700, Scott Larson wrote:
>  We're in the process of migrating our network into the future with 40G
> at the core, including our firewall/traffic routers with 40G interfaces. An
> issue which this exposed and threw me for a week turns out to be directly
> related to net.inet.ip.forwarding and I'm looking to just get some insight
> on what exactly is occurring as a result of using it.

Enabling forwarding disables LRO and TSO and that probably accounts for
a large part of the difference in throughput that you've observed.  The
number of packets passing through the stack (and not the amount of data
passing through) is the dominant bottleneck.

fastforwarding _should_ make a difference, but only if packets actually
take the fast-forward path.  Check the counters available via netstat:
# netstat -sp ip | grep forwarded
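
Illustrative output (counts made up):

	1234567 packets forwarded (1230000 packets fast forwarded)

If the fast-forwarded count stays at zero, packets are not taking the
fast-forward path and the sysctl isn't buying you anything.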

Regards,
Navdeep

>  What I am seeing is when that knob is set to 0, an identical pair of
> what will be PF/relayd servers with direct DAC links between each other
> using Chelsio T580s can sustain around 38Gb/s on iperf runs. However the
> moment I set that knob to 1, that throughput collapses down into the 3 to
> 5Gb/s range. As the old gear this is replacing is all GigE I'd never
> witnessed this. Twiddling net.inet.ip.fastforwarding has no apparent effect.
>  I've not found any docs going in depth on what deeper changes enabling
> forwarding does to the network stack. Does it ultimately put a lower
> priority on traffic where the server functioning as the packet router is
> the final endpoint in exchange for having more resources available to route
> traffic across interfaces as would generally be the case?
> 
> 
> Scott Larson
> Lead Systems Administrator


netstat output on a recent head

2015-02-24 Thread Navdeep Parhar
I see a lot of literal "%s" in netstat's output on head.  This is on a
freshly built system from today:



# netstat -hdw 1
input(Total)   output
   packets  errs idrops  bytespackets  errs  bytes colls drops
%10s 17 %5s 0 %5s 0 %10s   1.3K %10s  6 %5s 0 %10s997 %5s0 %5s  0
%10s 16 %5s 0 %5s 0 %10s   1.1K %10s  5 %5s 0 %10s848 %5s0 %5s  0
%10s 14 %5s 0 %5s 0 %10s902 %10s  4 %5s 0 %10s629 %5s0 %5s  0
%10s 18 %5s 0 %5s 0 %10s   1.4K %10s  6 %5s 0 %10s883 %5s0 %5s  0
%10s 19 %5s 0 %5s 0 %10s   1.7K %10s  6 %5s 0 %10s883 %5s0 %5s  0
%10s 38 %5s 0 %5s 0 %10s   3.2K %10s 11 %5s 0 %10s   1.5K %5s0 %5s  0
%10s 18 %5s 0 %5s 0 %10s   1.4K %10s  5 %5s 0 %10s794 %5s0 %5s  0
%10s 14 %5s 0 %5s 0 %10s   1.2K %10s  5 %5s 0 %10s794 %5s0 %5s  0



[Differential] [Changed Subscribers] D1761: Extend LRO support to accumulate more than 65535 bytes

2015-02-03 Thread np (Navdeep Parhar)
np added a subscriber: np.
np added a reviewer: lstewart.
np added a comment.

LRO affects the kernel TCP code in subtle (and almost always undesirable) ways 
by "compressing" multiple TCP headers into one.  Think TCP timestamps, bursty 
changes in sequence space as seen by the kernel, what happens to pure acks, 
etc.  Our LRO implementation is even willing to combine multiple received TCP 
PSHes into one.  All this is a decent tradeoff when a handful of segments are 
involved but the proposed LRO_PLEN_MAX (1MB) is 700+ MSS sized segments at 1500 
MTU.  I wonder how well the kernel TCP will deal with such big bubbles of data. 
 Please do get the big picture reviewed by one of our TCP protocol experts.

M_HASHTYPE_LRO_TCP isn't really a hash type and will likely confuse the RSS 
code.  There is some value in providing the hash type to the stack but with 
your proposed change the hash type will be clobbered by tcp_lro_flush.  Data 
for a single stream will show up in the stack with either the correct hash type 
or M_HASHTYPE_LRO_TCP.  Not pretty.  

I wonder what one of these gigantic LRO'd packets looks like to bpf consumers?  
If they go by the ip header then they will likely get confused.  A good test 
would be to see if wireshark is able to follow the TCP stream accurately or 
does it lose track when it encounters one of these VLRO (Very Large RO) packets?

At the very least, allow drivers to opt out of this VLRO by
a) making LRO_PLEN_MAX per lro_ctrl, to be set when the LRO structures are 
initialized by the driver.
b) never clobbering the hash type in tcp_lro.c if the total length accumulated 
is <= 65535.

REVISION DETAIL
  https://reviews.freebsd.org/D1761

To: hselasky, rmacklem, rrs, glebius, gnn, emaste, imp, adrian, bz, rwatson, 
lstewart
Cc: np, freebsd-net


Re: cxgbe and netmap

2015-01-02 Thread Navdeep Parhar
On Fri, Jan 02, 2015 at 06:57:50PM +0300, Alexander V. Chernikov wrote:
> Hello list!
> 
> FreeBSD has netmap support for chelsio T5 cards, which is amazing.
> The great thing about implementation is that you can play with
> traffic-generating applications without affecting "main" OS interface,
> which has always been a problem for Intel cards.
> However, this approach (having additional netmap-only ifp) turns to be a
> bit problematic for netmap-based networking elements participating in
> routing.
> 
> In Intel case you can configure all your interfaces, run routing daemon,
> run netmap application and punt all to-host traffic  to kernel via host
> pipes.
> It looks like I can't do this using current implementation: mac
> addresses are different for main/netmap interfaces so I can't run
> routing daemon on main interface (or sub-interfaces).
> I also can't run routing daemon on top of ncxgbe* interface since it
> appears to ignore non-netmap-derived traffic..
> 
> Is it possible to make ncxgbe* interfaces behave more like ordinary ones?
> 

Yes, I need to write a simple transmit and receive handler for the
non-netmap traffic on the ncxgbe/ncxl interfaces.  This is a bit
complicated because the normal rx runs in a mode where 1 rx buffer does
not always equal 1 rx frame.

Now that netmap is in GENERIC, it may be best to carve out a separate
cxgbe_netmap module that can be loaded by those who want to use netmap
on top of cxgbe/cxl hardware.  So no more magic 'n' interfaces by
default (some people were caught by surprise at the sudden appearance of
the 'n' interfaces on HEAD), and fully functional 'n' interfaces as soon
as you load the additional module.

What do you and other netmap users think?  I'm open to taking this
driver's netmap support in whatever direction the users want it to go.

Regards,
Navdeep


Re: OFED support on FreeBSD

2014-08-25 Thread Navdeep Parhar

On 08/25/14 13:39, David Somayajulu wrote:

Hi All,
What is the current support for OFED on FreeBSD? Are there any drivers that
support either RoCE or iWARP?


The iw_cxgbe module in sys/dev/cxgbe/iw_cxgbe is an iWARP driver for 
cxgbe(4) hardware.  The upstream version is kernel verbs only, and is 
probably missing some bugfixes.


Regards,
Navdeep


Thanks
David Somayajulu








Re: [RFC] Add support for hardware transmit rate limiting queues [WAS: Add support for changing the flow ID of TCP connections]

2014-08-20 Thread Navdeep Parhar

On 08/20/14 12:25, Hans Petter Selasky wrote:

On 08/20/14 20:44, Navdeep Parhar wrote:

On 08/20/14 00:34, Hans Petter Selasky wrote:

Hi,

A month has passed since the last e-mail on this topic, and in the
meanwhile some new patches have been created and tested:

Basically the approach has been changed a little bit:

- The creation of hardware transmit rings has been made independent of
the TCP stack. This allows firewall applications to forward traffic into
hardware transmit rings as well, and not only native TCP applications.
This should be one more reason to get the feature into the kernel.

- A hardware transmit ring basically can have two modes: FIXED-RATE or
AUTOMATIC-RATE. In the fixed rate mode all traffic is sent at a fixed
bytes per second rate. In the automatic mode you can configure a time
after which the TX queue must be empty. The hardware driver uses this to
configure the actual rate. In automatic mode you can also set an upper
and lower transmit rate limit.

- The MBUF has got a new field in the packet header: "txringid"

- IOCTLs for TCP v4 and v6 sockets has been updated to allow setting of
the "txringid" field in the mbuf.

The current patch [see attachment] should be much simpler and less
intrusive than the previous one.

Any comments ?



Here are some thoughts.  The first two bullets cover relatively
minor issues, the rest are more important.

- All of the mbuf pkthdr fields today have the same meaning no matter
   what the context.  It is not clear what txringid's global meaning is.
   Is it even possible for driver foo to interpret it the same way as
   driver bar?  What if the number of rings are different, or if the ring
   at the particular index for foo is setup differently than the ring at
   that same index for bar?  You are attempting to influence the driver's
   txq selection and traditionally the mbuf's flowid has been used for
   this purpose.  Have you considered allowing the user to set the flowid
   directly?  And mark it as such via a new rsstype so the kernel will
   leave it alone.


Hi,

At work so to speak, we have tried to make a simple approach that will
not break existing code, without trying to optimise the possibilities
and reduce memory footprint.



- uint32_t -> m_flowid_t is plain gratuitous.  Now we need to include
   mbuf.h in more places just to get this definition.  What's the
   advantage of this?  style(9) isn't too fond of typedefs either.  Also,
   drivers *do* need to know the width of the flowid.  At least lagg(4)
   looks at the high bits of the flowid (see flowid_shift in lagg).  How
   high it can go depends on the width of the flowid.


The flowid should be typedef'ed. Otherwise, how can you know its type when
passing the flowid along function arguments and so on?


It's just a simple 32 bit unsigned int and all drivers know exactly what
it is.  I don't think we need type checking for trivial stuff like this.
We trust code to do the right thing and that's the correct tradeoff
here, in my opinion.  Or else we'd end up with errno_t, fd_t, etc. and
programming in C would not be fun anymore.  Here's a hyperbolic example:

errno_t socket(domain_t domain, socktype_t type, protocol_t protocol);

(oops, it returns an int -1 or 0 so errno_t is not strictly correct, but
you get my point).





- Interfaces can come and go, routes can change, and so the relationship
   between an inpcb and txringid is not stable at all.  What happens when
   the outbound route for an inpcb changes?


This is managed separately by a daemon or such. The problem with using
the "inpcb" approach you are suggesting is that you limit the
rate control feature to traffic that is bound to sockets. Can your way
of doing rate control be useful to non-socket based firewall
applications, for example?

You also assume a 1:1 mapping between "inpcb" and the flowID, right.
What about M:N mappings, where multiple streams should share the same
flowID, because it makes more sense?


You're right that an inpcb based scheme won't work for non-socket based
firewall.

inpcb represents an endpoint, almost always with an associated socket,
and it mostly has a 1:1 relation with an n-tuple (SO_LISTEN and UDP
sockets with no default destination are notable exceptions).  If you're
talking of non-socket based firewalls, then where is the inpcb coming
from?  Firewalls typically keep their own state for the n-tuples that
they are interested in.  It almost seems like you need a n-tuple ->
rate_limit mapping scheme instead of inpcb -> rate_limit.

Regards,
Navdeep





- The in_ratectlreq structure that you propose is inadequate in its
   current form.  For example, cxgbe's hardware can do rate limiting on a
   per-ring as well as per-connection basis, and it allows for pps,
   bandwidth, or min-max limits.  I think this is the critical piece that
   we NIC maintainers must agre

Re: [RFC] Add support for hardware transmit rate limiting queues [WAS: Add support for changing the flow ID of TCP connections]

2014-08-20 Thread Navdeep Parhar

On 08/20/14 00:34, Hans Petter Selasky wrote:

Hi,

A month has passed since the last e-mail on this topic, and in the
meanwhile some new patches have been created and tested:

Basically the approach has been changed a little bit:

- The creation of hardware transmit rings has been made independent of
the TCP stack. This allows firewall applications to forward traffic into
hardware transmit rings as well, and not only native TCP applications.
This should be one more reason to get the feature into the kernel.

- A hardware transmit ring basically has two modes: FIXED-RATE or
AUTOMATIC-RATE (see the sketch below). In fixed-rate mode all traffic is
sent at a fixed bytes-per-second rate. In automatic mode you can
configure a time after which the TX queue must be empty; the hardware
driver uses this to configure the actual rate. In automatic mode you can
also set upper and lower transmit rate limits.

- The MBUF has got a new field in the packet header: "txringid"

- IOCTLs for TCP v4 and v6 sockets have been updated to allow setting of
the "txringid" field in the mbuf.

The current patch [see attachment] should be much simpler and less
intrusive than the previous one.
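
As a sketch of how the two ring modes above might be expressed (names
are illustrative, not taken from the attached patch):

    /* Sketch of the FIXED-RATE and AUTOMATIC-RATE ring modes. */
    enum txring_mode {
            TXRING_FIXED_RATE,      /* send at a fixed byte rate */
            TXRING_AUTO_RATE,       /* driver derives rate from deadline */
    };

    struct txring_conf {
            enum txring_mode mode;
            uint64_t        fixed_bps;      /* FIXED: bytes per second */
            uint32_t        drain_usec;     /* AUTO: TX queue must drain
                                               within this many usec */
            uint64_t        min_bps;        /* AUTO: lower rate bound */
            uint64_t        max_bps;        /* AUTO: upper rate bound */
    };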

Any comments ?



Here are some thoughts.  The first two bullets cover relatively
minor issues, the rest are more important.

- All of the mbuf pkthdr fields today have the same meaning no matter
  what the context.  It is not clear what txringid's global meaning is.
  Is it even possible for driver foo to interpret it the same way as
  driver bar?  What if the number of rings are different, or if the ring
  at the particular index for foo is setup differently than the ring at
  that same index for bar?  You are attempting to influence the driver's
  txq selection and traditionally the mbuf's flowid has been used for
  this purpose.  Have you considered allowing the user to set the flowid
  directly?  And mark it as such via a new rsstype so the kernel will
  leave it alone.  (See the sketch after this list.)

- uint32_t -> m_flowid_t is plain gratuitous.  Now we need to include
  mbuf.h in more places just to get this definition.  What's the
  advantage of this?  style(9) isn't too fond of typedefs either.  Also,
  drivers *do* need to know the width of the flowid.  At least lagg(4)
  looks at the high bits of the flowid (see flowid_shift in lagg).  How
  high it can go depends on the width of the flowid.

- Interfaces can come and go, routes can change, and so the relationship
  between an inpcb and txringid is not stable at all.  What happens when
  the outbound route for an inpcb changes?

- The in_ratectlreq structure that you propose is inadequate in its
  current form.  For example, cxgbe's hardware can do rate limiting on a
  per-ring as well as per-connection basis, and it allows for pps,
  bandwidth, or min-max limits.  I think this is the critical piece that
  we NIC maintainers must agree on before any code hits the core kernel:
  how to express a rate-limit policy in a standard way and allow for
  hardware assistance opportunistically.  ipfw(4)'s dummynet is probably
  interested in this part too, so it's great that Luigi is paying
  attention to this thread.

- The RATECTL ioctls deal with in_ratectlreq so we need to standardize
  the ratectlreq structure before these ioctls can be considered generic
  ifnet ioctls.  This is the reason cxgbetool (and not ifconfig) has a
  private ioctl to frob cxgbe's per-queue rate-limiters.  I did not want
  to add ifnet ioctls that in reality were cxgbe only.  Ditto for i2c
  ioctls.  Now we have multiple drivers with i2c and melifaro@ is doing
  the right thing by promoting these private ioctls to a standard ifnet
  ioctl.  Have you considered a private mlxtool as a stop gap measure?
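
Regarding the sketch promised in the first bullet: if the user were
allowed to set the flowid directly, the kernel side could be as small
as this (illustrative; it reuses the existing opaque hash type so that
nothing tries to reinterpret the value as an RSS hash):

    /* Sketch: stamp a caller-chosen flowid on an outbound mbuf. */
    static inline void
    m_set_user_flowid(struct mbuf *m, uint32_t flowid)
    {
            m->m_pkthdr.flowid = flowid;
            M_HASHTYPE_SET(m, M_HASHTYPE_OPAQUE);
    }

A driver that already picks its txq with something like flowid % ntxq
would then honor the user's choice with no further changes.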

To summarize my take on all of this: we need a standard ratectlreq
structure, a standard way to associate an inpcb with one, and a standard
way to pass on this info to if_transmit.  After all this is in place we
could even have a dummynet-ish software layer that implements rate
limiters when the underlying hardware offers no assistance.
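
As a straw man for that standard structure (field names invented here
purely to anchor the discussion), it might carry a scope plus a set of
optional limits:

    /*
     * Straw-man rate-control request: per-ring or per-connection
     * scope, with pps, bandwidth, and min-max limits as above.
     */
    struct in_ratectlreq {
            enum { RL_SCOPE_RING, RL_SCOPE_CONN } scope;
            uint32_t        ring_index;     /* valid for RL_SCOPE_RING */
            uint32_t        flags;          /* which limits are set */
    #define RL_F_PPS        0x01
    #define RL_F_BPS        0x02
    #define RL_F_MINMAX     0x04
            uint64_t        pps;
            uint64_t        bps;
            uint64_t        min_bps;
            uint64_t        max_bps;
    };

A driver would honor whatever subset its hardware supports and leave
the rest to the software fallback mentioned above.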

Regards,
Navdeep


Re: [patch][lagg] - Set a better granularity and distribution on roundrobin protocol.

2014-07-18 Thread Navdeep Parhar
On 07/18/14 19:06, Marcelo Araujo wrote:
> 
> 
> 
> 2014-07-19 2:18 GMT+08:00 Navdeep Parhar <npar...@gmail.com>:
> 
> On 07/18/14 00:49, Marcelo Araujo wrote:
> > Hello guys,
> >
> > I made few changes on the lagg(4) patch. Also, I made tests using
> igb(4),
> > ixgbe(4) and em(4); seems everything worked pretty well.
> >
> > I'm wondering if anyone else could make a review, and what I need
> to do, to
> > see this patch committed.
> 
> Deliberately putting out-of-order packets on the wire is never a good
> idea.  This would count as a serious regression in lagg(4) imho.
> 
> Regards,
> Navdeep
> 
> 
> 
> I'm wondering if anyone has tested the patch, because as I explained
> in another email, the number of SACKs is much lower with this patch. I
> have put some pcap files
> here: http://people.freebsd.org/~araujo/lagg/
> 
> Also, as far as I know, the current roundrobin implementation has no
> mechanism to control the order of the packets that go to the wire. All
> this patch does is, instead of sending only one packet through one
> interface and then switching to another, send X packets (where X is
> the number of packets defined via sysctl) and then switch to the next
> interface.
> 
> So, could you show me where this patch deliberately puts out-of-order
> packets? Did I miss anything?

Are you saying lagg's roundrobin implementation is already spraying
packets for the same flow across interfaces?  That would make it
unsuitable for anything TCP.  But then your patch isn't making it any
worse so I don't have any objection to it any more.

Looks like loadbalance does the right thing for flows.
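
For reference, the behavior under discussion boils down to round-robin
with a configurable per-port burst, roughly like this sketch (not
Marcelo's actual patch; port_at() stands in for the walk over the port
list, and sc_seq is lagg's existing transmit counter):

    /*
     * Sketch: burst round-robin.  rr_packets == 1 is classic
     * round-robin; larger values keep a flow on one port longer.
     */
    static struct lagg_port *
    rr_pick_port(struct lagg_softc *sc, uint32_t rr_packets)
    {
            uint32_t seq;

            seq = atomic_fetchadd_32(&sc->sc_seq, 1);
            return (port_at(sc, (seq / rr_packets) % sc->sc_count));
    }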

Regards,
Navdeep


Re: [patch][lagg] - Set a better granularity and distribution on roundrobin protocol.

2014-07-18 Thread Navdeep Parhar
On 07/18/14 00:49, Marcelo Araujo wrote:
> Hello guys,
> 
> I made few changes on the lagg(4) patch. Also, I made tests using igb(4),
> ixgbe(4) and em(4); seems everything worked pretty well.
> 
> I'm wondering if anyone else could make a review, and what I need to do, to
> see this patch committed.

Deliberately putting out-of-order packets on the wire is never a good
idea.  This would count as a serious regression in lagg(4) imho.

Regards,
Navdeep


> 
> Best Regards,
> 
> 
> 
> 
> 2014-06-24 10:40 GMT+08:00 Marcelo Araujo :
> 
>>
>>
>> 2014-06-24 6:54 GMT+08:00 Adrian Chadd :
>>
>> Hi,
>>>
>>> No, don't introduce out of order behaviour. Ever.
>>
>>
>> Yes, it has out-of-order behavior; with my patch, much less. I have
>> uploaded two pcap files, and you can see for yourself if you don't
>> believe what I'm saying.
>>
>> Test done using: "iperf -s" and "iperf -c  -i 1 -t 10".
>>
>> 1) Don't change the number of packets(default round robin behavior).
>> http://people.freebsd.org/~araujo/lagg/lagg-nop.cap
>> 8 out of order packets.
>> Several SACKs.
>>
>> 2) Set the number of packets to 50.
>> http://people.freebsd.org/~araujo/lagg/lagg.cap
>> 0 out of order packets.
>> Less SACKs.
>>
>>
>>> You may not think
>>> it's a problem for TCP, but UDP things and VPN things will start
>>> getting very angry. There are VPN configurations out there that will
>>> drop the VPN if frames are out of order.
>>>
>>
>> I don't think it will be a problem for TCP, though in some ways it is:
>> less throughput, as I showed before, and fewer SACKs. About the VPN,
>> please tell me which software, and let me know where I can get a sample
>> to build a testbed.
>>
>> However, to be very honest, I don't believe anyone here builds such an
>> extensive testbed when changing something in the network protocols. It
>> is almost impossible to predict which software will work and which
>> won't, and I don't believe anyone here has all of that at hand.
>>
>>
>>>
>>> The ixgbe driver is setting the flowid to the msix queue ID, rather
>>> than a 32 bit unique flow id hash value for the flow. That makes it
>>> hard to do traffic distribution where the flowid is available.
>>>
>>
>> Thanks for the explanation.
>>
>>
>>>
>>> There's an lagg option to re-hash the mbuf rather than rely on the
>>> flowid for outbound port choice - have you looked at using that? Did
>>> that make any difference?
>>>
>>
>> Yes, I set net.link.lagg.0.use_flowid to 0; it made a little
>> difference compared to the default round robin implementation, but I
>> still can't reach more than 5 Gbit/s. With my patch and the packet
>> count set to 50, it improved a bit too.
>>
>> So, thank you so much for all the review. I don't know if you have the
>> time and a testbed to run a real test as I'm doing, but I would be
>> happy if you or more people could test this patch. Also, I only have
>> ixgbe(4) to test with; I would appreciate it if this patch could be
>> tested with other NICs too.
>>
>> Best Regards,
>>
>> --
>> Marcelo Araujo
>> ara...@freebsd.org | http://www.FreeBSD.org
>> Power To Server.
>>
>>
> 
> 
> 
> 



Re: change netmap global lock to sx?

2014-07-16 Thread Navdeep Parhar

On 05/27/14 17:32, Luigi Rizzo wrote:
> 
> 
> 
> On Wed, May 28, 2014 at 1:49 AM, Navdeep Parhar wrote:
> 
> I'd like to change the netmap global lock from a mutex into a sleepable
> shared/exclusive lock.  This will allow a driver's nm_register hook
> (which is called with the global lock held) to sleep if it has to.  I've
> casually used pkt-gen after this conversion (patch attached) and the
> witness hasn't complained about it.
> 
> 
> no objections, let me give this a try on stable/10 and
> stable/9 to make sure we can use the same code there as well


Any updates?  I'm considering what to have in cxgbe(4) in time for 10.1
and this needs to be sorted out before cxgbe's netmap support gets MFC'd
to any stable branch.

Regards,
Navdeep


> 
> cheers
> luigi
> 
> 
> Thoughts?
> 
> Regards,
> Navdeep
> 
> 
> diff -r 0300d80260f4 sys/dev/netmap/netmap_kern.h
> --- a/sys/dev/netmap/netmap_kern.h  Fri May 23 19:00:56 2014 -0700
> +++ b/sys/dev/netmap/netmap_kern.h  Sat May 24 12:49:15 2014 -0700
> @@ -43,13 +43,13 @@
>  #define unlikely(x)        __builtin_expect((long)!!(x), 0L)
> 
>  #define NM_LOCK_T          struct mtx
> -#define NMG_LOCK_T         struct mtx
> -#define NMG_LOCK_INIT()    mtx_init(&netmap_global_lock, \
> -                               "netmap global lock", NULL, MTX_DEF)
> -#define NMG_LOCK_DESTROY() mtx_destroy(&netmap_global_lock)
> -#define NMG_LOCK()         mtx_lock(&netmap_global_lock)
> -#define NMG_UNLOCK()       mtx_unlock(&netmap_global_lock)
> -#define NMG_LOCK_ASSERT()  mtx_assert(&netmap_global_lock, MA_OWNED)
> +#define NMG_LOCK_T         struct sx
> +#define NMG_LOCK_INIT()    sx_init(&netmap_global_lock, \
> +                               "netmap global lock")
> +#define NMG_LOCK_DESTROY() sx_destroy(&netmap_global_lock)
> +#define NMG_LOCK()         sx_xlock(&netmap_global_lock)
> +#define NMG_UNLOCK()       sx_xunlock(&netmap_global_lock)
> +#define NMG_LOCK_ASSERT()  sx_assert(&netmap_global_lock, SA_XLOCKED)
> 
>  #define NM_SELINFO_T       struct selinfo
>  #define MBUF_LEN(m)        ((m)->m_pkthdr.len)
> 
> 



Re: tuning routing using cxgbe and T580-CR cards?

2014-07-14 Thread Navdeep Parhar
> 0 0 0 574M 15G  82 0 0 0   0 6 0 0 81267 197 176539  0 92  8
> 0 0 0 574M 15G   2 0 0 0   0 6 0 0 82151 150 177434  0 91  9
> 0 0 0 574M 15G  82 0 0 0   0 6 0 0 73904 204 160887  0 91  9
> 0 0 0 574M 15G   2 0 0 0   8 6 0 0 73820 150 161201  0 91  9
> 0 0 0 574M 15G  82 0 0 0   0 6 0 0 73926 196 161850  0 92  8
> 0 0 0 574M 15G   2 0 0 0   0 6 0 0 77215 150 166886  0 91  9
> 0 0 0 574M 15G  82 0 0 0   0 6 0 0 77509 198 169650  0 91  9
> 0 0 0 574M 15G   2 0 0 0   0 6 0 0 69993 156 154783  0 90 10
> 0 0 0 574M 15G  82 0 0 0   0 6 0 0 69722 199 153525  0 91  9
> 0 0 0 574M 15G   2 0 0 0   0 6 0 0 66353 150 147027  0 91  9
> 0 0 0 550M 15G 102 0 0 0 101 6 0 0 67906 259 149365  0 90 10
> 0 0 0 550M 15G   0 0 0 0   0 6 0 0 71837 125 157253  0 92  8
> 0 0 0 550M 15G  80 0 0 0   0 6 0 0 73508 179 161498  0 92  8
> 0 0 0 550M 15G   0 0 0 0   0 6 0 0 72673 125 159449  0 92  8
> 0 0 0 550M 15G  80 0 0 0   0 6 0 0 75630 175 164614  0 91  9
> 
> On 07/11/2014 03:32 PM, Navdeep Parhar wrote:
>> On 07/11/14 10:28, John Jasem wrote:
>>> In testing two Chelsio T580-CR dual port cards with FreeBSD 10-STABLE,
>>> I've been able to use a collection of clients to generate approximately
>>> 1.5-1.6 million TCP packets per second sustained, and routinely hit
>>> 10GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the
>>> quick read, accepting the loss of granularity).
>> When forwarding, the pps rate is often more interesting, and almost
>> always the limiting factor, as compared to the total amount of data
>> being passed around.  10GB at this pps probably means 9000 MTU.  Try
>> with 1500 too if possible.
>>
>> "netstat -d 1" and "vmstat 1" for a few seconds when your system is
>> under maximum load would be useful.  And what kind of CPU is in this system?
>>
>>> While performance has so far been stellar, and I'm honestly speculating
>>> I will need more CPU depth and horsepower to get much faster, I'm
>>> curious if there is any gain to tweaking performance settings. I'm
>>> seeing, under multiple streams, with N targets connecting to N servers,
>>> interrupts on all CPUs peg at 99-100%, and I'm curious if tweaking
>>> configs will help, or it's a free clue to get more horsepower.
>>>
>>> So, far, except for temporarily turning off pflogd, and setting the
>>> following sysctl variables, I've not done any performance tuning on the
>>> system yet.
>>>
>>> /etc/sysctl.conf
>>> net.inet.ip.fastforwarding=1
>>> kern.random.sys.harvest.ethernet=0
>>> kern.random.sys.harvest.point_to_point=0
>>> kern.random.sys.harvest.interrupt=0
>>>
>>> a) One of the first things I did in prior testing was to turn
>>> hyperthreading off. I presume this is still prudent, as HT doesn't help
>>> with interrupt handling?
>> It is always worthwhile to try your workload with and without
>> hyperthreading.
>>
>>> b) I briefly experimented with using cpuset(1) to stick interrupts to
>>> physical CPUs, but it offered no performance enhancements, and indeed,
>>> appeared to decrease performance by 10-20%. Has anyone else tried this?
>>> What were your results?
>>>
>>> c) the defaults for the cxgbe driver appear to be 8 rx queues, and N tx
>>> queues, with N being the number of CPUs detected. For a system running
>>> multiple cards, routing or firewalling, does this make sense, or would
>>> balancing tx and rx be more ideal? And would reducing queues per card
>>> based on NUMBER-CPUS and NUM-CHELSIO-PORTS make sense at all?
>> The defaults are nrxq = min(8, ncores) and ntxq = min(16, ncores).  The
>> man page mentions this.  The reason for 8 vs. 16 is that tx queues are
>> "cheaper" as they don't have to be backed by rx buffers.  It only needs
>> some memory for the tx descriptor ring and some hardware resources.
>>
>> It appears that your system has >= 16 cores.  For forwarding it probably
>> makes sense to have nrxq = ntxq.  If you're left with 8 or fewer cores
>> after disabling hyperthreading you'll automatically get 8 rx and tx
>> queues.  Otherwise you'll have to fiddle with the hw.cxgbe.nrxq10g
>> and ntxq10g tunables (documented in the man page).

Re: tuning routing using cxgbe and T580-CR cards?

2014-07-13 Thread Navdeep Parhar
On Fri, Jul 11, 2014 at 08:58:21PM -0400, John Jasem wrote:
> 
> On 07/11/2014 03:32 PM, Navdeep Parhar wrote:
> > On 07/11/14 10:28, John Jasem wrote:
> >> In testing two Chelsio T580-CR dual port cards with FreeBSD 10-STABLE,
> >> I've been able to use a collection of clients to generate approximately
> >> 1.5-1.6 million TCP packets per second sustained, and routinely hit
> >> 10GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the
> >> quick read, accepting the loss of granularity).
> > When forwarding, the pps rate is often more interesting, and almost
> > always the limiting factor, as compared to the total amount of data
> > being passed around.  10GB at this pps probably means 9000 MTU.  Try
> > with 1500 too if possible.
> 
> Yes, I am generally more interested/concerned with the pps. Using
> 1500-sized packets, I've seen around 2 million pps. I'll have hard
> numbers for the list, with netstat and vmstat output Monday.

Thanks!  If possible, please try with even lower packet sizes (128B,
512B, whatever your clients are good at).  You may have to disable Nagle
(TCP_NODELAY option) on your clients to get small TCP packets out of
them.  Or you could just switch to UDP for pps testing.
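
For example, on the client side something along these lines disables
Nagle (standard sockets API, nothing driver specific; fd is a
connected TCP socket):

    /* Disable Nagle so small writes become small packets on the wire.
     * Needs <sys/socket.h>, <netinet/tcp.h>, and <err.h>. */
    int one = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) == -1)
            err(1, "setsockopt(TCP_NODELAY)");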

If all your incoming traffic is received on a single port then try
setting hw.cxgbe.nrxq10g=12 in /boot/loader.conf.  (You mentioned
elsewhere this is a system with 12 real cores).
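
That is, something like this in /boot/loader.conf (12 matches your 12
physical cores; treat it as a starting point, not gospel):

    hw.cxgbe.nrxq10g="12"
    hw.cxgbe.ntxq10g="12"

These are boot-time tunables, so reboot (or unload and reload
if_cxgbe) for them to take effect.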

Regards,
Navdeep

> 
> 
> 
> >> a) One of the first things I did in prior testing was to turn
> >> hyperthreading off. I presume this is still prudent, as HT doesn't help
> >> with interrupt handling?
> > It is always worthwhile to try your workload with and without
> > hyperthreading.
> 
> Testing Mellanox cards, HT was severely detrimental. However, in almost
> every case so far, Mellanox and Chelsio have resulted in opposite
> conclusions (cpufreq, net.isr.*).
> 
> >> c) the defaults for the cxgbe driver appear to be 8 rx queues, and N tx
> >> queues, with N being the number of CPUs detected. For a system running
> >> multiple cards, routing or firewalling, does this make sense, or would
> >> balancing tx and rx be more ideal? And would reducing queues per card
> >> based on NUMBER-CPUS and NUM-CHELSIO-PORTS make sense at all?
> > The defaults are nrxq = min(8, ncores) and ntxq = min(16, ncores).  The
> > man page mentions this.  The reason for 8 vs. 16 is that tx queues are
> > "cheaper" as they don't have to be backed by rx buffers.  It only needs
> > some memory for the tx descriptor ring and some hardware resources.
> >
> > It appears that your system has >= 16 cores.  For forwarding it probably
> > makes sense to have nrxq = ntxq.  If you're left with 8 or fewer cores
> > after disabling hyperthreading you'll automatically get 8 rx and tx
> > queues.  Otherwise you'll have to fiddle with the hw.cxgbe.nrxq10g and
> > ntxq10g tunables (documented in the man page).
> 
> I promise I did look through the man page before posting. :) This is
> actually a 12 core box with HT turned off.
> 
> Mining the cxl stat entries in sysctl, it appears that the queues per
> port are reasonably well balanced, so I may be concerned over nothing.
> 
> 
> 
> >> g) Are there other settings I should be looking at, that may squeeze out
> >> a few more packets?
> > The pps rates that you've observed are within the chip's hardware limits
> > by at least an order of magnitude.  Tuning the kernel rather than the
> > driver may be the best bang for your buck.
> 
> If I am missing obvious configurations for kernel tuning in this regard,
> it would not be the first time.
> 
> Thanks again!
> 
> -- John Jasen (jja...@gmail.com)
> 


Re: tuning routing using cxgbe and T580-CR cards?

2014-07-11 Thread Navdeep Parhar
On 07/11/14 10:28, John Jasem wrote:
> In testing two Chelsio T580-CR dual port cards with FreeBSD 10-STABLE,
> I've been able to use a collection of clients to generate approximately
> 1.5-1.6 million TCP packets per second sustained, and routinely hit
> 10GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the
> quick read, accepting the loss of granularity).

When forwarding, the pps rate is often more interesting, and almost
always the limiting factor, as compared to the total amount of data
being passed around.  10GB at this pps probably means 9000 MTU.  Try
with 1500 too if possible.

"netstat -d 1" and "vmstat 1" for a few seconds when your system is
under maximum load would be useful.  And what kind of CPU is in this system?

> 
> While performance has so far been stellar, and I'm honestly speculating
> I will need more CPU depth and horsepower to get much faster, I'm
> curious if there is any gain to tweaking performance settings. I'm
> seeing, under multiple streams, with N targets connecting to N servers,
> interrupts on all CPUs peg at 99-100%, and I'm curious if tweaking
> configs will help, or it's a free clue to get more horsepower.
> 
> So, far, except for temporarily turning off pflogd, and setting the
> following sysctl variables, I've not done any performance tuning on the
> system yet.
> 
> /etc/sysctl.conf
> net.inet.ip.fastforwarding=1
> kern.random.sys.harvest.ethernet=0
> kern.random.sys.harvest.point_to_point=0
> kern.random.sys.harvest.interrupt=0
> 
> a) One of the first things I did in prior testing was to turn
> hyperthreading off. I presume this is still prudent, as HT doesn't help
> with interrupt handling?

It is always worthwhile to try your workload with and without
hyperthreading.

> b) I briefly experimented with using cpuset(1) to stick interrupts to
> physical CPUs, but it offered no performance enhancements, and indeed,
> appeared to decrease performance by 10-20%. Has anyone else tried this?
> What were your results?
> 
> c) the defaults for the cxgbe driver appear to be 8 rx queues, and N tx
> queues, with N being the number of CPUs detected. For a system running
> multiple cards, routing or firewalling, does this make sense, or would
> balancing tx and rx be more ideal? And would reducing queues per card
> based on NUMBER-CPUS and NUM-CHELSIO-PORTS make sense at all?

The defaults are nrxq = min(8, ncores) and ntxq = min(16, ncores).  The
man page mentions this.  The reason for 8 vs. 16 is that tx queues are
"cheaper" as they don't have to be backed by rx buffers.  It only needs
some memory for the tx descriptor ring and some hardware resources.

It appears that your system has >= 16 cores.  For forwarding it probably
makes sense to have nrxq = ntxq.  If you're left with 8 or fewer cores
after disabling hyperthreading you'll automatically get 8 rx and tx
queues.  Otherwise you'll have to fiddle with the hw.cxgbe.nrxq10g and
ntxq10g tunables (documented in the man page).


> d) dev.cxl.$PORT.qsize_rxq: 1024 and dev.cxl.$PORT.qsize_txq: 1024.
> These appear to not be writeable when if_cxgbe is loaded, so I speculate
> they are not to be messed with, or are loader.conf variables? Is there
> any benefit to messing with them?

Can't change them after the port has been administratively brought up
even once.  This is mentioned in the man page.  I don't really recommend
changing them anyway.

> 
> e) dev.t5nex.$CARD.toe.sndbuf: 262144. These are writeable, but messing
> with values did not yield an immediate benefit. Am I barking up the
> wrong tree, trying?

The TOE tunables won't make a difference unless you have enabled TOE,
the TCP endpoints lie on the system, and the connections are being
handled by the TOE on the chip.  This is not the case on your systems.
The driver does not enable TOE by default and the only way to use it is
to switch it on explicitly.  There is no possibility that you're using
it without knowing that you are.

> 
> f) based on prior experiments with other vendors, I tried tweaks to
> net.isr.* settings, but did not see any benefits worth discussing. Am I
> correct in this speculation, based on others' experience?
> 
> g) Are there other settings I should be looking at, that may squeeze out
> a few more packets?

The pps rates that you've observed are within the chip's hardware limits
by at least an order of magnitude.  Tuning the kernel rather than the
driver may be the best bang for your buck.

Regards,
Navdeep

> 
> Thanks in advance!
> 
> -- John Jasen (jja...@gmail.com)



Re: tuning routing using cxgbe and T580-CR cards?

2014-07-11 Thread Navdeep Parhar
On 07/11/14 11:03, Bjoern A. Zeeb wrote:
> On 11 Jul 2014, at 17:28 , John Jasem  wrote:
> 
>> c) the defaults for the cxgbe driver appear to be 8 rx queues, and
>> N tx queues, with N being the number of CPUs detected. For a system
>> running multiple cards, routing or firewalling, does this make
>> sense, or would balancing tx and rx be more ideal? And would
>> reducing queues per card based on NUMBER-CPUS and NUM-CHELSIO-PORTS
>> make sense at all? … g) Are there other settings I should be
>> looking at, that may squeeze out a few more packets?
> 
> If you are primarily forwarding packets (you say “routing” multiple
> times) the first thing you should do is turn off LRO and TSO on all
> ports.

LRO, sure.  But TSO shouldn't really matter unless the packets originate
from a local TCP endpoint on the system.
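
For reference, turning those off is a per-port ifconfig operation,
e.g. with cxl0/cxl1 as example port names:

    ifconfig cxl0 -lro -tso
    ifconfig cxl1 -lro -tso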

Navdeep

> 
> — Bjoern A. Zeeb "Come on. Learn, goddamn it.", WarGames,
> 1983
> 




Re: [RFC] Add support for changing the flow ID of TCP connections

2014-07-09 Thread Navdeep Parhar
On Wed, Jul 09, 2014 at 04:36:53PM +0200, Hans Petter Selasky wrote:
> On 07/08/14 21:17, Navdeep Parhar wrote:
...
> >
> >I think we need to design this to be as generic as possible.  I have
> >quite a bit of code that does this stuff but I haven't pushed it
> >upstream or even offered it for review (yet).
> >
> 
> Hi,
> 
> When will the non-hardware-related patches be available for review?
> I understand there are multiple ways to reach the same goal, and I
> think it would be great if we could agree on a common API for
> applications.

Here is the kernel side of the patch:
http://people.freebsd.org/~np/flow_pacing_kern.diff

The registration parameters and the throttling parameters are probably
cxgbe-centric, because that's what it was written for.  We'll need to
tidy up those structs certainly.  And I'd like to add pps constraints to
the throttling parameters (all it does is bandwidth right now).
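
For the pps part, the throttling parameters would only need a second
cap next to the bandwidth one, along these lines (illustrative, not
from the posted diff):

    /* Sketch: throttle parameters with a pps cap next to the bw cap. */
    struct flow_throttle {
            uint64_t        max_bps;        /* today: bandwidth only */
            uint64_t        max_pps;        /* proposed: 0 = no cap */
    };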

Regards,
Navdeep


  1   2   >