RE: [PATCH 1/1] ixgbe: Driver for Intel 82598 based 10GbE PCI Express family of adapters
-----Original Message-----
From: Jeff Garzik [mailto:[EMAIL PROTECTED]]

> Stephen Hemminger wrote:
> > Using module parameters for per-device settings is a bad idea. Please
> > extend existing interfaces like ethtool, etc., rather than committing
> > to a bad, inflexible API.
>
> I agree with Stephen's comments here. In general, net driver policy is
> to use ethtool (per-interface granularity) rather than module options.

Thanks Stephen and Jeff for the feedback. The flow control and InterruptThrottleRate parameters were carried over from e1000, and it was pointed out to me that some distros'/customers' scripts used those parameters. Ixgbe being a new driver, we can remove those. Yes, the LLI parameters can be removed.

> +RxQueues
> +
> +Valid Range: 1, 2, 4, 8
> +Default Value: 8
> +  Number of RX queues.

Ok. The present driver is NAPI-only and supports only one Rx queue, so this parameter needs to be removed. But once DaveM's/Stephen's napi_struct work is done, the driver will support multiple Rx queues, and with the multi-Tx-queue patch already in the kernel, the driver will shortly support multiple Tx queues as well. So, with the driver/device supporting multiple Tx and Rx queues, I think it would be very useful to have an ethtool interface to manage the number of Tx and Rx queues of an interface. The current ethtool interface supports managing the ring sizes, so we need a similar interface for managing the number of Tx and Rx queues.

Ayyappan
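To make the proposal concrete, here is a minimal sketch of what such an ethtool extension might look like, modeled on the existing get/set ring-size callbacks mentioned above. The command names, struct layout, and ixgbe_adapter fields are hypothetical illustrations, not an existing kernel API:

	/*
	 * Hypothetical ethtool extension for per-interface queue counts,
	 * by analogy with ETHTOOL_GRINGPARAM/SRINGPARAM.  All names and
	 * fields here are assumptions for illustration only.
	 */
	struct ethtool_queues {
		u32 cmd;            /* hypothetical ETHTOOL_GQUEUES / ETHTOOL_SQUEUES */
		u32 max_rx_queues;  /* read-only limits reported by the driver */
		u32 max_tx_queues;
		u32 rx_queues;      /* currently configured counts */
		u32 tx_queues;
	};

	static void ixgbe_get_queues(struct net_device *netdev,
				     struct ethtool_queues *q)
	{
		struct ixgbe_adapter *adapter = netdev_priv(netdev);

		q->max_rx_queues = 8;   /* 82598 supports up to 8 RSS Rx queues */
		q->max_tx_queues = 8;
		q->rx_queues = adapter->num_rx_queues;  /* assumed fields */
		q->tx_queues = adapter->num_tx_queues;
	}

A set_queues counterpart would validate the requested counts against the limits and then reinitialize the rings, much as set_ringparam does for ring sizes today.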
RE: [PATCH 1/1] ixgbe: Driver for Intel 82598 based 10GbE PCI Express family of adapters
-----Original Message-----
From: Jeff Garzik [mailto:[EMAIL PROTECTED]]

> Veeraiyan, Ayyappan wrote:
> > So, with the driver/device supporting multiple Tx and Rx queues, I
> > think it would be very useful to have an ethtool interface to manage
> > the number of Tx and Rx queues of an interface.
>
> Absolutely! ethtool patches welcome :)
>
> git://git.kernel.org/pub/scm/network/ethtool/ethtool.git

We discussed this here today and we will try to come up with patches. Also, FWIW, I will be offline for the next 2 months, and someone (Auke? :)) from our team will submit an updated driver shortly..

Ayyappan
RE: [PATCH 0/1] ixgbe: Support for Intel(R) 10GbE PCI Express adapters - Take #2
On 7/23/07, Rick Jones [EMAIL PROTECTED] wrote:
> The bidirectional test looks like a two-concurrent-stream (TCP_STREAM +
> TCP_MAERTS) test, right? If you want a single-stream bidirectional test,
> then with the top-of-trunk netperf you can use: ...

Thanks for the feedback, Rick. We haven't used the netperf trunk. The person who actually collected these numbers will try the netperf trunk a little later, and we will post the results..
RE: [PATCH 0/1] ixgbe: Support for Intel(R) 10GbE PCI Express adapters - Take #2
On 7/10/07, Jeff Garzik [EMAIL PROTECTED] wrote:
> Veeraiyan, Ayyappan wrote:
> > I will post the performance numbers later today..

Sorry for not responding earlier. We faced a couple of issues like setup and false alarms. Anyway, here are the numbers:

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

87380  65536     128    60       2261.34    13.82     4.25    4.006   1.233
87380  65536     256    60       3332.51    14.19     5.67    2.79    1.115
87380  65536     512    60.01    4262.24    14.38     6.9     2.21    1.062
87380  65536    1024    60       4659.18    14.4      7.39    2.026   1.039
87380  65536    2048    60.01    6177.87    14.36    14.99    1.524   1.59
87380  65536    4096    60.01    9410.29    11.58    14.6     0.807   1.017
87380  65536    8192    60.01    9324.62    11.13    14.33    0.782   1.007
87380  65536   16384    60.01    9371.35    11.07    14.28    0.774   0.999
87380  65536   32768    60.02    9385.81    10.83    14.27    0.756   0.997
87380  65536   65536    60.01    9363.5     10.73    14.26    0.751   0.998

TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to n0417 (10.0.4.17) port 0 AF_INET : cpu bind

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

87380  65536   65536    60.02    9399.61     2.22    14.53    0.155   1.013
87380  65536   65536    60.02    9348.01     2.46    14.39    0.173   1.009
87380  65536   65536    60.02    9403.36     2.26    14.37    0.158   1.001
87380  65536   65536    60.01    9332.22     2.23    14.51    0.157   1.019

Bidirectional test.

87380  65536   65536    60.01    7809.57    28.66    30.02    2.405   2.519   TX
87380  65536   65536    60.01    7592.90    28.66    30.02    2.474   2.591   RX
--
87380  65536   65536    60.01    7629.73    28.32    29.64    2.433   2.546   RX
87380  65536   65536    60.01    7926.99    28.32    29.64    2.342   2.450   TX

Single netperf stream between 2 quad-core Xeon based boxes. Tested on 2.6.20 and 2.6.22 kernels. Driver uses NAPI and LRO.

To summarize, we are seeing line rate with NAPI (single Rx queue), and Rx CPU utilization is around 14%. In back-to-back scenarios, NAPI (combined with LRO) performs clearly better. In multiple-client scenarios, non-NAPI with multiple Rx queues performs better. I am continuing to do more benchmarking and will submit a patch to pick one this week. But going forward, if NAPI supports multiple Rx queues natively, I believe that would perform much better in most cases.

Also, did you get a chance to review the driver take #2? I'd like to implement the review comments (if any) as early as possible and submit another version.

Thanks...
Ayyappan
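As a rough consistency check on the table above (assuming netperf normalizes CPU utilization across all 8 cores of the two quad-core boxes), service demand relates utilization and throughput as:

	\mathrm{SD}\ [\mu s/\mathrm{KB}]
	  = \frac{U \cdot N_{\mathrm{cpu}} \cdot 10^{6}}{\text{throughput}\ [\mathrm{KB/s}]}
	  = \frac{0.1428 \times 8 \times 10^{6}}{9371.35 \times 10^{6} / (8 \times 1024)}
	  \approx 0.999

which matches the reported 0.999 us/KB for the 16384-byte Rx row.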
RE: [PATCH 0/1] ixgbe: Support for Intel(R) 10GbE PCI Express adapters - Take #2
On 7/10/07, Jeff Garzik [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] wrote:
> Doing both tends to signal that the author hasn't bothered to measure
> the differences between various approaches, and pick a clear winner.

I did pick NAPI in our previous submission based on various tests. But to get 10Gig line rate we need to use multiple Rx queues, which need fake netdevs. Since fake netdevs weren't acceptable, I added non-NAPI support, which gets 10Gig line rate with multi-Rx. I am ok with removing NAPI support until the work of separating NAPI from netdevs is done..

> I strongly prefer NAPI combined with hardware interrupt mitigation -- it
> helps multiple net interfaces balance load across the system at times of
> high load -- but I'm open to other solutions as well.

In the majority of tests we did here, we saw NAPI is better. But for some specific test cases (especially if we add the SW RSC, i.e. LRO), we saw better throughput and CPU utilization with non-NAPI.

> So... what are your preferences? What is the setup that gets closest to
> wire speed under Linux? :)

With SW LRO, non-NAPI is better, but without LRO, NAPI is better; however, NAPI needs multiple Rx queues. So given the limitations, non-NAPI is my preference now. I will post the performance numbers later today..

> Jeff

Thanks..
Ayyappan
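For context, here is a sketch of what multi-queue Rx looks like once NAPI is separated from net_device (the napi_struct interface that landed in 2.6.24): one NAPI context per Rx ring on a single real netdev, with no fake netdevs. The ixgbe struct fields and ixgbe_* helpers below are assumptions for illustration, not the posted driver code:

	#include <linux/netdevice.h>

	struct ixgbe_q_vector {
		struct ixgbe_adapter *adapter;
		struct napi_struct napi;     /* one NAPI context per Rx ring */
		struct ixgbe_ring *rx_ring;
	};

	static int ixgbe_poll(struct napi_struct *napi, int budget)
	{
		struct ixgbe_q_vector *q =
			container_of(napi, struct ixgbe_q_vector, napi);
		/* ixgbe_clean_rx_irq() stands in for the real Rx work */
		int work_done = ixgbe_clean_rx_irq(q->adapter, q->rx_ring, budget);

		if (work_done < budget) {
			/* ring is drained: exit polling, re-enable this queue's IRQ */
			netif_rx_complete(q->adapter->netdev, napi);
			ixgbe_irq_enable_queue(q->adapter, q->rx_ring); /* assumed helper */
		}
		return work_done;
	}

	/* registration, once per Rx queue, at probe/open time */
	static void ixgbe_napi_add_all(struct ixgbe_adapter *adapter)
	{
		int i;

		for (i = 0; i < adapter->num_rx_queues; i++)
			netif_napi_add(adapter->netdev,
				       &adapter->q_vector[i].napi,
				       ixgbe_poll, 64);
	}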
RE: [PATCH] ixgbe: Introduce new 10GbE driver for Intel 82598 based PCI Express adapters...
From: Neil Horman [mailto:[EMAIL PROTECTED]]

> Replying to myself... I've looked through the driver pretty thoroughly
> with regards to my above concern, and it appears the driver is
> reasonably free of netpoll issues at the moment, at least as far as
> what we found in e1000 was concerned.

Thanks for reviewing the code..

> I do, however, see a concern in the use of the in_netpoll flag within
> the driver. Given that the primary registered net_device and all the
> dummy net_devices in the rx_ring point to the same ixgbe_adapter
> structure, there can be some level of confusion over whether a given
> rx queue is in netpoll mode or not.

The revised driver I am going to post today will not have fake netdevs...

> ... [when the] adapter performs a netpoll, all the individual rx queues
> will follow the in_netpoll path in the receive path (assuming MSI-X
> interrupts are used). The result, I think, is the potential for a large
> amount of packet reordering during a netpoll operation. Perhaps not a
> serious problem, but likely worth looking into.

Multiple Rx queues are used in non-NAPI mode only, and all Rx queues use one netdev (which is associated with the adapter struct). Also, the RSS (receive side scaling, or rx packet steering) feature is used in multiple-Rx-queue mode. In this mode, the HW will always select the same Rx queue for a given flow, and this should prevent any packet reordering issues.

> Neil

Ayyappan
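The reason RSS preserves per-flow ordering is that the queue index is a pure function of the flow tuple, so every packet of a flow lands on the same Rx queue. A minimal sketch of the idea follows; the real 82598 hardware uses a Toeplitz hash with a programmable key, and the stand-in hash below is purely illustrative:

	#include <stdint.h>

	struct flow_tuple {
		uint32_t saddr, daddr;
		uint16_t sport, dport;
	};

	/* Same flow tuple -> same hash -> same queue, always.  This is
	 * what prevents reordering within a flow across Rx queues. */
	static unsigned int rss_queue(const struct flow_tuple *f,
				      unsigned int nqueues)
	{
		uint32_t h = f->saddr ^ f->daddr ^
			     (((uint32_t)f->sport << 16) | f->dport);

		h ^= h >> 16;        /* mix the bits (illustrative, not Toeplitz) */
		h *= 0x45d9f3b;
		h ^= h >> 16;
		return h % nqueues;
	}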
RE: [PATCH] ixgbe: Introduce new 10GbE driver for Intel 82598 based PCI Express adapters...
On 7/2/07, Jeff Garzik [EMAIL PROTECTED] wrote:
> Ayyappan Veeraiyan wrote:
> > +#define IXGBE_TX_FLAGS_VLAN_MASK 0x
> > +#define IXGBE_TX_FLAGS_VLAN_SHIFT 16
>
> defining bits using the form (1 << n) is preferred. Makes it easier to
> read, by eliminating the requirement of the human brain to decode hex
> into bit numbers.

Ok.

> > +	struct net_device netdev;
> > + };
>
> Embedded a struct net_device into your ring? How can I put this?
> Wrong, wrong. Wrong, wrong, wrong. Wrong.

Agreed. Fake netdevs are needed for doing multiple Rx queues in NAPI mode. We are thinking of solving this in one of 2 ways: having a netdev pointer in the ring structure, or having an array of netdev pointers in the ixgbe_adapter struct which would be visible to all rings.

> > +	char name[IFNAMSIZ + 5];
>
> The interface name should not be stored by your ring structure.

Yes, I agree, and also (as pointed out by someone before) this would break if the user changes the interface name.. But having the cause in the MSI-X vector name really helps in debugging, and helps the user too. I think the output below is much better:

[EMAIL PROTECTED] src]# cat /proc/interrupts | grep eth0
214:        0       0   PCI-MSI-edge   eth0-lsc
215:    11763       4   PCI-MSI-edge   eth0-rx7
216:        0       0   PCI-MSI-edge   eth0-rx6
217:    77324       0   PCI-MSI-edge   eth0-rx5
218:        0       0   PCI-MSI-edge   eth0-rx4
219:    52911       0   PCI-MSI-edge   eth0-rx3
220:    80271       0   PCI-MSI-edge   eth0-rx2
221:    80244       6   PCI-MSI-edge   eth0-rx1
222:       12       0   PCI-MSI-edge   eth0-rx0
223:   124870   28543   PCI-MSI-edge   eth0-tx0

compared to:

[EMAIL PROTECTED] src]# cat /proc/interrupts | grep eth0
214:        0       0   PCI-MSI-edge   eth0
215:    11763       4   PCI-MSI-edge   eth0
216:        0       0   PCI-MSI-edge   eth0
217:    77324       0   PCI-MSI-edge   eth0
218:        0       0   PCI-MSI-edge   eth0
219:    52911       0   PCI-MSI-edge   eth0
220:    80271       0   PCI-MSI-edge   eth0
221:    80244       6   PCI-MSI-edge   eth0
222:       12       0   PCI-MSI-edge   eth0
223:   124900   28543   PCI-MSI-edge   eth0

Since we wanted to distinguish the various MSI-X vectors in /proc/interrupts, and since request_irq expects the memory for the name to be allocated somewhere, I added this as part of the ring struct.

> Kill io_base and stop setting netdev->base_addr

In my latest internal version, I already removed the io_base member (and a couple more from ixgbe_adapter) but am still setting netdev->base_addr. I will remove that also..

> > +	struct ixgbe_hw_stats stats;
> > +	char lsc_name[IFNAMSIZ + 5];
>
> delete lsc_name and use the netdev name directly in request_irq()

Please see the response above for the name member of the ring structure.

> Will review more after you fix these problems.

Thanks for the feedback. I will post another version shortly (except for the feature-flags change and the ring-struct name members) which fixes my previous TODO list and also most of Francois' comments..

Ayyappan
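For reference, the (1 << n) style Jeff is asking for looks like this; the flag names and values below are illustrative stand-ins, not the posted driver's defines:

	/* Bit flags written as (1 << n): the bit position is obvious at a
	 * glance, unlike a bare hex constant. */
	#define IXGBE_TX_FLAGS_CSUM	(1 << 0)
	#define IXGBE_TX_FLAGS_VLAN	(1 << 1)
	#define IXGBE_TX_FLAGS_TSO	(1 << 2)
	#define IXGBE_TX_FLAGS_IPV4	(1 << 3)
	/* multi-bit fields still use a mask + shift pair: */
	#define IXGBE_TX_FLAGS_VLAN_SHIFT	16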
RE: [PATCH] ixgbe: Introduce new 10GbE driver for Intel 82598 based PCI Express adapters...
On 7/2/07, Stephen Hemminger [EMAIL PROTECTED] wrote:
> > Fake netdevs are needed for doing multiple Rx queues in NAPI mode. We
> > are thinking of solving this in one of 2 ways: having a netdev pointer
> > in the ring structure, or having an array of netdev pointers in the
> > ixgbe_adapter struct which would be visible to all rings.
>
> Wait until DaveM and I separate NAPI from the network device. The patch
> is close to ready for 2.6.24, which is when this driver will need to
> show up. Since I know Intel will be forced to backport this to older
> distros, you would be best off having a single-receive-queue version
> for when you have to make it work on the older code.

So far all our testing indicates we need multiple Rx queues to get better CPU utilization numbers at 10Gig line rate. So I am thinking of adding non-NAPI support to the driver (like other 10Gig drivers) and restricting to a single Rx queue in the NAPI case. I already have the non-NAPI version coded up and it went through internal testing. I will add this in the next submission. We will add multiple-Rx-queue support in NAPI mode once the separation of NAPI from the network device is done. Does this sound ok?

> You only need to store the name for when you are doing request_irq, so
> it can just be a stack temporary.

request_irq expects allocated memory, not just a stack temporary. I glanced through the kernel source; there are precedents for the way we did it.

linux-2.6/drivers/usb/core/hcd.c:

	/* enable irqs just before we start the controller */
	if (hcd->driver->irq) {
		snprintf(hcd->irq_descr, sizeof(hcd->irq_descr), "%s:usb%d",
				hcd->driver->description, hcd->self.busnum);
		if ((retval = request_irq(irqnum, &usb_hcd_irq, irqflags,
				hcd->irq_descr, hcd)) != 0) {
			dev_err(hcd->self.controller,
					"request interrupt %d failed\n", irqnum);
			goto err_request_irq;
		}
	}

> Stephen Hemminger [EMAIL PROTECTED]

I appreciate the feedback.

Thanks,
Ayyappan
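A minimal sketch of the pattern being defended here: request_irq() keeps the name pointer for the lifetime of the IRQ (it is what /proc/interrupts displays), so the formatted name must live in long-lived driver memory rather than on the stack. The struct layout and helper below are assumptions for illustration, not the actual driver code:

	struct ixgbe_msix_entry {
		int vector;
		char name[IFNAMSIZ + 8];	/* e.g. "eth0-rx3" */
	};

	static int ixgbe_request_rx_irq(struct net_device *netdev,
					struct ixgbe_msix_entry *e, int queue,
					irq_handler_t handler, void *data)
	{
		/* the buffer outlives this call because it sits in the
		 * per-vector struct, which lives as long as the adapter */
		snprintf(e->name, sizeof(e->name), "%s-rx%d",
			 netdev->name, queue);
		return request_irq(e->vector, handler, 0, e->name, data);
	}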
RE: [PATCH] ixgbe: Introduce new 10GbE driver for Intel 82598 based PCI Express adapters...
On 7/2/07, Christoph Hellwig [EMAIL PROTECTED] wrote:
> But that'll require the single-receive-queue version, I guess. The
> netdevice abuse is the only really major issue I see, although I'd of
> course really like to see the driver get rid of the bitfield abuse as
> well.

The submitted driver code supports a single-queue version in case of MSI-X allocation failures... As I said in the other mail, I feel restricting to a single Rx queue in NAPI mode is the better approach until Stephen's and DaveM's work of separating NAPI from the netdevice is done..

> Lots of drivers where the interface name is assigned after request_irq
> just use an internal name, e.g. ixgbeX in the case of this driver.

This sounds ok to me. With this change, this is the output:

[EMAIL PROTECTED] src]# ip link
1: lo: LOOPBACK,UP,1 mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0: NOARP mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
3: eth6: BROADCAST,MULTICAST,UP,1 mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:8b:05:5f:95 brd ff:ff:ff:ff:ff:ff
29: eth0: BROADCAST,MULTICAST,UP,1 mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:1b:21:01:e4:93 brd ff:ff:ff:ff:ff:ff
30: eth1: BROADCAST,MULTICAST mtu 1500 qdisc noop qlen 1000
    link/ether 00:1b:21:01:e4:92 brd ff:ff:ff:ff:ff:ff

[EMAIL PROTECTED] src]# cat /proc/interrupts | grep 29
214:        0        0   PCI-MSI-edge   ixgbe29-lsc
215:    11764    80213   PCI-MSI-edge   ixgbe29-rx7
216:    80257        0   PCI-MSI-edge   ixgbe29-rx6
217:    77331        0   PCI-MSI-edge   ixgbe29-rx5
218:    24201        0   PCI-MSI-edge   ixgbe29-rx4
219:    52911        0   PCI-MSI-edge   ixgbe29-rx3
220:   104591        0   PCI-MSI-edge   ixgbe29-rx2
221:    80249        8   PCI-MSI-edge   ixgbe29-rx1
222:       14        0   PCI-MSI-edge   ixgbe29-rx0
223:   194023   118220   PCI-MSI-edge   ixgbe29-tx0

Ayyappan