RE: [PATCH 1/1]ixgbe: Driver for Intel 82598 based 10GbE PCI Express family of adapters

2007-07-27 Thread Veeraiyan, Ayyappan
-----Original Message-----
From: Jeff Garzik [mailto:[EMAIL PROTECTED]]

> Stephen Hemminger wrote:
>> Using a module parameter for per-device settings is a bad idea.
>> Please extend existing interfaces like ethtool, etc., rather than
>> committing to a bad, inflexible API.
>
> I agreed with Stephen's comments here.
>
> In general, net driver policy is to use ethtool (per-interface
> granularity) rather than module options.


Thanks, Stephen and Jeff, for the feedback. The flow control and
InterruptThrottleRate parameters were carried over from e1000, and it was
pointed out to me that some distros' and customers' scripts use those
parameters. Since ixgbe is a new driver, we can remove them.

Yes, the LLI parameters can be removed.

> +RxQueues
> +
> +Valid Range: 1, 2, 4, 8
> +Default Value: 8
> +Number of RX queues.
> +

OK. The present driver is NAPI only and supports only one Rx queue, so
this parameter needs to be removed.

But once DaveM's and Stephen's NAPI-struct work is done, the driver will
support multiple Rx queues, and with the multi-Tx-queue patch already in
the kernel, the driver will shortly support multiple Tx queues as well.

So, with the driver/device supporting multiple Tx and Rx queues, I think
it would be very useful to have an ethtool interface to manage the number
of Tx and Rx queues of an interface. The current ethtool interface
supports managing the ring sizes, so we need a similar interface for
managing the number of Tx and Rx queues.
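For concreteness, a rough sketch of what such a queue-count interface
could look like, loosely modeled on the existing ring-size commands
(struct ethtool_ringparam, ETHTOOL_GRINGPARAM/ETHTOOL_SRINGPARAM). No
such command exists today; the struct, command, and callback names below
are all hypothetical:

#include <linux/types.h>

/* Hypothetical sketch only -- modeled on the ring-size interface; the
 * struct, the command pair, and the callbacks are assumptions, not an
 * existing ethtool API.
 */
struct ethtool_queueparam {
	__u32	cmd;		/* hypothetical ETHTOOL_GQUEUEPARAM / SQUEUEPARAM */
	__u32	max_rx_queues;	/* read-only limits reported by the driver */
	__u32	max_tx_queues;
	__u32	rx_queues;	/* currently configured queue counts */
	__u32	tx_queues;
};

/* Driver-side callbacks a NIC driver could implement for this: */
struct ethtool_queue_ops {
	void	(*get_queueparam)(struct net_device *dev,
				  struct ethtool_queueparam *qp);
	int	(*set_queueparam)(struct net_device *dev,
				  struct ethtool_queueparam *qp);
};

(Mainline did eventually grow a similar facility, struct ethtool_channels
and "ethtool -L", but that came years after this thread.)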

Ayyappan


RE: [PATCH 1/1]ixgbe: Driver for Intel 82598 based 10GbE PCI Express family of adapters

2007-07-27 Thread Veeraiyan, Ayyappan
-----Original Message-----
From: Jeff Garzik [mailto:[EMAIL PROTECTED]]
> Veeraiyan, Ayyappan wrote:
>> So, with the driver/device supporting multiple Tx and Rx queues, I
>> think it would be very useful to have an ethtool interface to manage
>> the number of Tx and Rx queues of an interface.
>
> Absolutely!  ethtool patches welcome :)
>
> git://git.kernel.org/pub/scm/network/ethtool/ethtool.git


We discussed this here today and we will try to come up with patches.

Also, FWIW, I will be offline for the next 2 months, and someone
(Auke? :)) from our team will submit an updated driver shortly.

Ayyappan


RE: [PATCH 0/1] ixgbe: Support for Intel(R) 10GbE PCI Express adapters - Take #2

2007-07-26 Thread Veeraiyan, Ayyappan
On 7/23/07, Rick Jones <[EMAIL PROTECTED]> wrote:
> The bidirectional looks like a two concurrent stream (TCP_STREAM +
> TCP_MAERTS) test right?
>
> If you want a single-stream bidirectional test, then with the top of
> trunk netperf you can use:

Thanks for the feedback, Rick. We haven't used the netperf trunk. The
person who actually got these numbers will try the netperf trunk a
little later, and we will post the results.


RE: [PATCH 0/1] ixgbe: Support for Intel(R) 10GbE PCI Express adapters - Take #2

2007-07-17 Thread Veeraiyan, Ayyappan
On 7/10/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Veeraiyan, Ayyappan wrote:
>> On 7/10/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
>>> [EMAIL PROTECTED] wrote:
>>
>> I will post the performance numbers later today..

Sorry for not responding earlier. We faced a couple of issues, like setup
problems and false alarms.

Anyway, here are the numbers:

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
Bytes  Bytes   Bytes    sec      10^6bits/s  % S      % S      us/KB   us/KB

87380  65536     128    60       2261.34     13.82     4.25    4.006   1.233
87380  65536     256    60       3332.51     14.19     5.67    2.79    1.115
87380  65536     512    60.01    4262.24     14.38     6.9     2.21    1.062
87380  65536    1024    60       4659.18     14.4      7.39    2.026   1.039
87380  65536    2048    60.01    6177.87     14.36    14.99    1.524   1.59
87380  65536    4096    60.01    9410.29     11.58    14.6     0.807   1.017
87380  65536    8192    60.01    9324.62     11.13    14.33    0.782   1.007
87380  65536   16384    60.01    9371.35     11.07    14.28    0.774   0.999
87380  65536   32768    60.02    9385.81     10.83    14.27    0.756   0.997
87380  65536   65536    60.01    9363.5      10.73    14.26    0.751   0.998

TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to n0417 (10.0.4.17) port 0 AF_INET : cpu bind

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

87380  65536   65536    60.02    9399.61     2.22     14.53    0.155   1.013
87380  65536   65536    60.02    9348.01     2.46     14.39    0.173   1.009
87380  65536   65536    60.02    9403.36     2.26     14.37    0.158   1.001
87380  65536   65536    60.01    9332.22     2.23     14.51    0.157   1.019


Bidirectional test.
87380  65536   65536    60.01    7809.57     28.66    30.02    2.405   2.519   TX
87380  65536   65536    60.01    7592.90     28.66    30.02    2.474   2.591   RX
--
87380  65536   65536    60.01    7629.73     28.32    29.64    2.433   2.546   RX
87380  65536   65536    60.01    7926.99     28.32    29.64    2.342   2.450   TX

Single netperf stream between two quad-core Xeon based boxes. Tested on
2.6.20 and 2.6.22 kernels. The driver uses NAPI and LRO.

To summarize, we are seeing line rate with NAPI (single Rx queue), and Rx
CPU utilization is around 14%. In back-to-back scenarios, NAPI (combined
with LRO) performs clearly better. In multiple-client scenarios, non-NAPI
with multiple Rx queues performs better. I am continuing to do more
benchmarking and will submit a patch to pick one this week.

But going forward, if NAPI supports multiple Rx queues natively, I
believe that would perform much better in most cases.
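To make that concrete, a rough sketch of per-ring NAPI once the NAPI
context is separated from the net_device (this is the napi_struct style
that eventually landed around 2.6.24; ixgbe_q_vector, ixgbe_clean_rx_ring
and the rest are illustrative names, not the real driver code):

#include <linux/interrupt.h>
#include <linux/netdevice.h>

struct ixgbe_ring;		/* Rx descriptor ring, declared elsewhere */
int ixgbe_clean_rx_ring(struct ixgbe_ring *ring, int budget);	/* assumed helper */

/* One NAPI context per Rx queue, all registered against the single
 * real net_device. */
struct ixgbe_q_vector {
	struct napi_struct	napi;
	struct ixgbe_ring	*rx_ring;
};

static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data)
{
	struct ixgbe_q_vector *q = data;

	napi_schedule(&q->napi);	/* poll only this queue */
	return IRQ_HANDLED;
}

static int ixgbe_poll(struct napi_struct *napi, int budget)
{
	struct ixgbe_q_vector *q =
		container_of(napi, struct ixgbe_q_vector, napi);
	int work_done = ixgbe_clean_rx_ring(q->rx_ring, budget);

	if (work_done < budget)
		napi_complete(napi);	/* re-enable this queue's IRQ here */

	return work_done;
}

/* Registered once per queue, e.g. in probe/open:
 *	netif_napi_add(adapter->netdev, &q->napi, ixgbe_poll, 64);
 */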

Also, did you get a chance to review take #2 of the driver? I would like
to implement the review comments (if any) as early as possible and submit
another version.

Thanks...

Ayyappan






RE: [PATCH 0/1] ixgbe: Support for Intel(R) 10GbE PCI Express adapters - Take #2

2007-07-10 Thread Veeraiyan, Ayyappan
On 7/10/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
>
> Doing both tends to signal that the author hasn't bothered to measure
> the differences between various approaches, and pick a clear winner.
>

I did pick NAPI in our previous submission, based on various tests. But
to get the 10Gig line rate we need to use multiple Rx queues, which need
fake netdevs. Since fake netdevs weren't acceptable, I added non-NAPI
support, which gets the 10Gig line rate with multiple Rx queues. I am OK
with removing NAPI support till the work of separating NAPI from the
netdev is done.
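For context, a minimal sketch of why multiple Rx queues implied fake
netdevs at the time: in the pre-napi_struct kernels (around 2.6.22) the
NAPI state (the poll callback, weight, and scheduling bit) lived inside
struct net_device itself, so each independently pollable Rx ring had to
hang off its own dummy net_device. Field and function names here are
illustrative, not the actual driver code:

#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* Pre-napi_struct (~2.6.22) NAPI: the poll context *is* a net_device,
 * so a per-ring poll context means a per-ring dummy netdev. */
struct ixgbe_ring {
	struct net_device *poll_dev;	/* dummy netdev used only for NAPI */
	/* ... descriptor ring fields ... */
};

static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data)
{
	struct ixgbe_ring *rx_ring = data;

	/* old single-netdev NAPI API: schedule the dummy device; its
	 * ->poll callback then cleans just this ring */
	netif_rx_schedule(rx_ring->poll_dev);
	return IRQ_HANDLED;
}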

> I strongly prefer NAPI combined with hardware interrupt mitigation --
> it helps with multiple net interfaces balance load across the system,
> at times of high load -- but I'm open to other solutions as well.
>

In the majority of tests we did here, we saw that NAPI is better. But for
some specific test cases (especially if we add the SW RSC, i.e. LRO), we
saw better throughput and CPU utilization with non-NAPI.

> So...  what are your preferences?  What is the setup that gets closest
> to wire speed under Linux?  :)

With SW LRO, non-NAPI is better; without LRO, NAPI is better, but NAPI
needs multiple Rx queues. So, given the limitations, non-NAPI is my
preference now.

I will post the performance numbers later today..

 
> Jeff

Thanks..

Ayyappan


RE: [PATCH] ixgbe: Introduce new 10GbE driver for Intel 82598 based PCI Express adapters...

2007-07-09 Thread Veeraiyan, Ayyappan
From: Neil Horman [mailto:[EMAIL PROTECTED]]
> Replying to myself...
> I've looked through the driver pretty thoroughly with regards to my
> above concern, and it appears the driver is reasonably free of netpoll
> issues at the moment, at least as far as what we found in e1000 was
> concerned.  I do

Thanks for reviewing the code..

> however, see a concern in the use of the in_netpoll flag within the
> driver.  Given that the primary registered net_device, and all the dummy
> net_devices in the rx_ring, point to the same ixgbe_adapter structure,
> there can be some level of confusion over whether a given rx queue is in
> netpoll mode or not.

The revised driver I am going to post today will not have fake
netdevs...

> adapter performs a netpoll, all the individual rx queues will follow the
> in_netpoll path in the receive path (assuming MSI-X interrupts are
> used).  The result, I think, is the potential for a large amount of
> packet reordering during a netpoll operation.  Perhaps not a serious
> problem, but likely worth looking

Multiple Rx queues are used in non-NAPI mode only, and all Rx queues use
one netdev (which is associated with the adapter struct). Also, the RSS
(receive side scaling, or rx packet steering) feature is used in
multiple-Rx-queue mode. In this mode, the HW will always select the same
Rx queue for a given flow, and this should prevent any packet reordering
issue.
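To illustrate the idea (this is not the 82598 implementation, which uses
a Toeplitz hash and an indirection table programmed by the driver): RSS
hashes the flow tuple and uses the hash to pick the Rx queue, so every
packet of a given flow lands on the same queue. A simplified sketch, with
jhash standing in for the real hash:

#include <linux/jhash.h>
#include <linux/types.h>

/* Same flow tuple -> same hash -> same Rx queue, so one flow's packets
 * are never spread across queues and cannot be reordered against each
 * other.  jhash is only a stand-in for the hardware hash. */
static u16 rss_pick_rx_queue(u32 saddr, u32 daddr, u16 sport, u16 dport,
			     u16 num_rx_queues)
{
	u32 hash = jhash_3words(saddr, daddr,
				((u32)sport << 16) | dport, 0);

	return hash % num_rx_queues;
}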


> Neil

Ayyappan


RE: [PATCH] ixgbe: Introduce new 10GbE driver for Intel 82598 based PCI Express adapters...

2007-07-02 Thread Veeraiyan, Ayyappan
On 7/2/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Ayyappan Veeraiyan wrote:
>> +#define IXGBE_TX_FLAGS_VLAN_MASK	0x
>> +#define IXGBE_TX_FLAGS_VLAN_SHIFT	16
>
> defining bits using the form (1 << n) is preferred.  Makes it easier
> to read, by eliminating the requirement of the human brain to decode
> hex into bit numbers.
>

Ok.

>> +	struct net_device netdev;
>> +};
>
> Embedded a struct net_device into your ring?  How can I put this?
>
> Wrong, wrong.  Wrong, wrong, wrong.  Wrong.
>

Agreed.
Fake netdevs are needed for doing multiple Rx queues in NAPI mode. We are
thinking of solving this in one of two ways: having a netdev pointer in
the ring structure, or having an array of netdev pointers in the
ixgbe_adapter struct which would be visible to all rings.
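Roughly, the two layouts under consideration look like this (a sketch
with illustrative field names; either way the dummy net_device is
allocated separately, never embedded in the ring):

/* Option 1: each Rx ring carries a pointer to its own dummy polling
 * netdev. */
struct ixgbe_ring {
	struct net_device *poll_dev;	/* dummy netdev used only for NAPI */
	/* ... descriptor ring fields ... */
};

/* Option 2: one array of dummy netdev pointers in the adapter, indexed
 * by queue number and visible to all rings. */
#define IXGBE_MAX_RX_QUEUES	8	/* assumed limit for the sketch */

struct ixgbe_adapter {
	struct net_device *netdev;	/* the one real, registered netdev */
	struct net_device *poll_dev[IXGBE_MAX_RX_QUEUES];
	/* ... */
};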

>> +
>> +	char name[IFNAMSIZ + 5];
>
> The interface name should not be stored by your ring structure


Yes, I agree, and also (as pointed out by someone before) this would
break if the user changes the interface name.
But having the cause in the MSI-X vector name really helps in debugging,
and it helps the user too.

I think the output below is much better:

[EMAIL PROTECTED] src]# cat /proc/interrupts | grep eth0
214:  0  0   PCI-MSI-edge  eth0-lsc
215:  11763  4   PCI-MSI-edge  eth0-rx7
216:  0  0   PCI-MSI-edge  eth0-rx6
217:  77324  0   PCI-MSI-edge  eth0-rx5
218:  0  0   PCI-MSI-edge  eth0-rx4
219:  52911  0   PCI-MSI-edge  eth0-rx3
220:  80271  0   PCI-MSI-edge  eth0-rx2
221:  80244  6   PCI-MSI-edge  eth0-rx1
222: 12  0   PCI-MSI-edge  eth0-rx0
223: 124870  28543   PCI-MSI-edge  eth0-tx0

Compared to 

[EMAIL PROTECTED] src]# cat /proc/interrupts | grep eth0
214:  0  0   PCI-MSI-edge  eth0
215:  11763  4   PCI-MSI-edge  eth0
216:  0  0   PCI-MSI-edge  eth0
217:  77324  0   PCI-MSI-edge  eth0
218:  0  0   PCI-MSI-edge  eth0
219:  52911  0   PCI-MSI-edge  eth0
220:  80271  0   PCI-MSI-edge  eth0
221:  80244  6   PCI-MSI-edge  eth0
222: 12  0   PCI-MSI-edge  eth0
223: 124900  28543   PCI-MSI-edge  eth0

Since we wanted to distinguish the various MSI-X vectors in
/proc/interrupts, and since request_irq() expects the memory for the name
to be allocated somewhere, I added this as part of the ring struct.
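As a sketch of that scheme (illustrative helper and field names, not the
driver as submitted): the ring's name buffer is formatted from the netdev
name plus the queue number and passed to request_irq(), which only stores
the pointer it is given, so the buffer has to stay valid for the lifetime
of the IRQ:

#include <linux/interrupt.h>
#include <linux/netdevice.h>

static int ixgbe_request_rx_irq(struct ixgbe_adapter *adapter,
				int vector, int queue)
{
	struct ixgbe_ring *ring = &adapter->rx_ring[queue];

	/* ring->name is the char name[IFNAMSIZ + 5] field quoted above */
	snprintf(ring->name, sizeof(ring->name), "%s-rx%d",
		 adapter->netdev->name, queue);

	/* request_irq() keeps the name pointer for /proc/interrupts, so it
	 * must not be a stack temporary */
	return request_irq(adapter->msix_entries[vector].vector,
			   ixgbe_msix_clean_rx, 0, ring->name, ring);
}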

 
> Kill io_base and stop setting netdev->base_addr

In my latest internal version I have already removed the io_base member
(and a couple more from ixgbe_adapter), but I am still setting
netdev->base_addr. I will remove that also.

>> +	struct ixgbe_hw_stats stats;
>> +	char lsc_name[IFNAMSIZ + 5];
>
> delete lsc_name and use netdev name directly in request_irq()
>

Please see the response above about the name member of the ring
structure.

> Will review more after you fix these problems.

Thanks for the feedback. I will post another version shortly (except for
the feature-flags change and the ring struct name members) which fixes my
previous TODO list and also addresses most of Francois' comments.

Ayyappan


RE: [PATCH] ixgbe: Introduce new 10GbE driver for Intel 82598 based PCI Express adapters...

2007-07-02 Thread Veeraiyan, Ayyappan
On 7/2/07, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
>> Fake netdevs are needed for doing the multiple Rx queues in NAPI mode.
>> We are thinking of solving this in one of two ways: having a netdev
>> pointer in the ring structure, or having an array of netdev pointers
>> in the ixgbe_adapter struct which would be visible to all rings.
>
> Wait until Davem & I separate NAPI from the network device.
> The patch is close to ready for 2.6.24, when this driver will need to
> show up.
>
> Since I know Intel will be forced to backport this to older distros, you
> would be best to have a single receive queue version when you have to
> make it work on the older code.

So far all our testing indicates we need multiple Rx queues to get better
CPU utilization numbers at 10Gig line rate. So I am thinking of adding
non-NAPI support to the driver (like other 10Gig drivers) and restricting
NAPI to a single Rx queue. I already have the non-NAPI version coded up
and it has been through internal testing; I will add it in the next
submission. We will add multiple-Rx-queue support in NAPI mode once the
work of separating NAPI from the network device is done. Does this sound
OK?
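A minimal sketch of that policy, under the assumption of a build-time
NAPI switch (the config symbol, helper name, and queue limit are all made
up for illustration):

#define IXGBE_MAX_RX_QUEUES	8	/* assumed limit for the sketch */

struct ixgbe_adapter {			/* illustrative subset */
	int num_rx_queues;
	int num_tx_queues;
};

static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
{
#ifdef CONFIG_IXGBE_NAPI
	/* one real netdev == one NAPI poll context, so one Rx queue */
	adapter->num_rx_queues = 1;
#else
	/* non-NAPI path cleans several rings from their MSI-X handlers,
	 * with RSS spreading flows across them */
	adapter->num_rx_queues = IXGBE_MAX_RX_QUEUES;
#endif
	adapter->num_tx_queues = 1;
}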

 
> You only need to store the name for when you are doing request_irq, so
> it can just be a stack temporary.


request_irq expects allocated memory not just stack temporary. I glanced
the kernel source.. There are precedents to the way we did.

linux-2.6/source/drivers/usb/core/hcd.c

	/* enable irqs just before we start the controller */
	if (hcd->driver->irq) {
		snprintf(hcd->irq_descr, sizeof(hcd->irq_descr), "%s:usb%d",
				hcd->driver->description, hcd->self.busnum);
		if ((retval = request_irq(irqnum, usb_hcd_irq, irqflags,
				hcd->irq_descr, hcd)) != 0) {
			dev_err(hcd->self.controller,
					"request interrupt %d failed\n",
					irqnum);
			goto err_request_irq;
		}

 
> Stephen Hemminger <[EMAIL PROTECTED]>

I appreciate the feedback.

Thanks,
Ayyappa


RE: [PATCH] ixgbe: Introduce new 10GbE driver for Intel 82598 based PCI Express adapters...

2007-07-02 Thread Veeraiyan, Ayyappan
On 7/2/07, Christoph Hellwig <[EMAIL PROTECTED]> wrote:
>
> But that'll require the single receive queue version I guess.  The
> netdevice abuse is the only really major issue I see, although I'd of
> course really like to see the driver getting rid of the bitfield abuse
> as well.

The submitted driver code falls back to a single-queue version in case of
MSI-X allocation failures. As I said in the other mail, I feel that
restricting NAPI mode to a single Rx queue is the better approach till
Stephen's and DaveM's work of separating NAPI from the netdevice is done.

> Lots of drivers where the interface name is assigned after request_irq
> just use an internal name, e.g. ixgbeX in the case of this driver.
 

This sounds OK to me.
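A rough sketch of that naming scheme (illustrative only; judging from the
output below, the instance number appears to be the netdev's ifindex, but
that is an inference):

#include <linux/netdevice.h>

/* Name the vector from a driver-internal instance number instead of
 * netdev->name, so the string does not embed a stale interface name if
 * the user renames the device after request_irq(). */
static void ixgbe_name_rx_vector(struct ixgbe_ring *ring,
				 const struct net_device *netdev, int queue)
{
	/* ring->name is the per-ring buffer discussed earlier */
	snprintf(ring->name, sizeof(ring->name), "ixgbe%d-rx%d",
		 netdev->ifindex, queue);
}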

With this change, this is the output..

[EMAIL PROTECTED] src]# ip link
1: lo: LOOPBACK,UP,1 mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0: NOARP mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
3: eth6: BROADCAST,MULTICAST,UP,1 mtu 1500 qdisc pfifo_fast qlen
1000
link/ether 00:50:8b:05:5f:95 brd ff:ff:ff:ff:ff:ff
29: eth0: BROADCAST,MULTICAST,UP,1 mtu 1500 qdisc pfifo_fast qlen
1000
link/ether 00:1b:21:01:e4:93 brd ff:ff:ff:ff:ff:ff
30: eth1: BROADCAST,MULTICAST mtu 1500 qdisc noop qlen 1000
link/ether 00:1b:21:01:e4:92 brd ff:ff:ff:ff:ff:ff

[EMAIL PROTECTED] src]# cat /proc/interrupts | grep 29
214:  0  0   PCI-MSI-edge  ixgbe29-lsc
215:  11764  80213   PCI-MSI-edge  ixgbe29-rx7
216:  80257  0   PCI-MSI-edge  ixgbe29-rx6
217:  77331  0   PCI-MSI-edge  ixgbe29-rx5
218:  24201  0   PCI-MSI-edge  ixgbe29-rx4
219:  52911  0   PCI-MSI-edge  ixgbe29-rx3
220: 104591  0   PCI-MSI-edge  ixgbe29-rx2
221:  80249  8   PCI-MSI-edge  ixgbe29-rx1
222: 14  0   PCI-MSI-edge  ixgbe29-rx0
223: 194023 118220   PCI-MSI-edge  ixgbe29-tx0 

Ayyappan