Clarification required about select vs wake_up race condition

2007-03-12 Thread Ravinandan Arakali (rarakali)
Hi,
I am facing the following problem and was wondering if somebody could help
me out.
I reference our char driver below, but the question I really have is
about the sleep/wake_up mechanism, so I thought somebody who is aware of
this can help me. BTW, this is 2.6.10.

Our char driver (pretty much like all other char drivers) does a
poll_wait() and returns status depending on whether data is available to
be read. Even though some data is available to be read (verified using one
of our internal commands), the select() never wakes up, in spite of any
number of messages sent.

To understand this, I was looking at the code of select vs
wake_up_interruptible(). I feel I am misunderstanding some part of the
kernel code but will be glad if somebody can point it out.

My understanding:
do_select() sets the state of the task to TASK_INTERRUPTIBLE and calls
the driver's poll entry point. In our poll(), let's say that immediately
after we determine there's nothing to be read, some data arrives, causing
a wake_up_interruptible() on another CPU. The wake-up happens in the
context of the process sending the data. Since the receiving process was
already added to the list of listeners, from looking at the code of
try_to_wake_up() it looks like it can set the state of the receiving
process to TASK_RUNNING (I don't see any lock preventing this). After
this happens, the receiving process goes to sleep (because of
schedule_timeout() called by do_select()), but its state is still set to
TASK_RUNNING. In this state, when another message arrives, won't
wake_up_interruptible() fail to wake the process because of the following
code in try_to_wake_up()?

old_state = p->state;
if (!(old_state & state))
	goto out;

The above situation seems simplistic, so I'm wondering what I am missing
here.
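
For reference, here is a minimal sketch of the poll()/wake_up pairing being
described, with made-up "mydev_*" names (this is not our actual driver, just
an illustration of where poll_wait() and wake_up_interruptible() sit):

#include <linux/poll.h>
#include <linux/sched.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(mydev_rxq);
static int mydev_data_ready;                 /* set by the producer side */

static unsigned int mydev_poll(struct file *filp, poll_table *wait)
{
        unsigned int mask = 0;

        poll_wait(filp, &mydev_rxq, wait);   /* register on the wait queue */
        if (mydev_data_ready)
                mask |= POLLIN | POLLRDNORM; /* readable */
        return mask;                         /* do_select() owns the sleep */
}

/* Producer path, possibly running on another CPU (e.g. from an ISR): */
static void mydev_rx(void)
{
        mydev_data_ready = 1;
        wake_up_interruptible(&mydev_rxq);   /* may race with do_select() */
}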

Thanks,
Ravi


I/OAT configuration ?

2006-08-17 Thread Ravinandan Arakali
Hi,
I am trying to use I/OAT on one of the newer Woodcrest boxes,
but I am not sure if things are configured properly since there
seems to be no change in performance with I/OAT enabled
or disabled.
Following are the steps followed.
1. MSI (CONFIG_PCI_MSI) is enabled in kernel(2.6.16.21).
2. In kernel DMA configuration, following are enabled.
 Support for DMA Engines
 Network: TCP receive copy offload
 Test DMA Client
 Intel I/OAT DMA support
3. I manually load the ioatdma driver (modprobe ioatdma)

As per some documentation I read, when step #3 is performed
successfully, dma0chanX directories are supposed to be created
under /sys/class/dma, but in my case this directory stays
empty. I don't see any messages in /var/log/messages.
Any idea what is missing?

Thanks,
Ravi




H/W requirements for NETIF_F_HW_CSUM

2006-07-26 Thread Ravinandan Arakali
Hello,
Our current NIC does not provide the actual checksum value on the receive
path. Hence we only claim NETIF_F_IP_CSUM instead of the more general
NETIF_F_HW_CSUM.

To support this in a future adapter, we would like to know what exactly the
requirements are (on both Rx and Tx) to claim NETIF_F_HW_CSUM.

Following are some specific questions:
1. On Tx, our adapter supports checksumming of TCP/UDP over IPv4 and IPv6.
This computation is TCP/UDP specific. Does the checksum calculation need to
be more generic? Also, skbuff.h says that the checksum needs to be placed
at a specific location (skb->h.raw + skb->csum). I guess this means the
adapter needs to pass back the checksum to the host driver after
transmission. What happens in the case of TSO?
2. On Rx, is it sufficient if we place the L4 checksum in skb->csum? What
about the L3 checksum?
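
For reference, a minimal sketch of what "placing the checksum at
skb->h.raw + skb->csum" amounts to when done in software (a hypothetical
helper, not from our driver; 2.6-era skb fields), which is what a
NETIF_F_HW_CSUM device does in hardware:

#include <linux/skbuff.h>
#include <net/checksum.h>

/* Software analogue of the hardware "stuff the checksum" step
 * for a CHECKSUM_HW tx skb. */
static void sw_stuff_csum(struct sk_buff *skb)
{
        unsigned int off = skb->h.raw - skb->data;   /* start of L4 header */
        unsigned int sum;

        sum = skb_checksum(skb, off, skb->len - off, 0);
        *(u16 *)(skb->h.raw + skb->csum) = csum_fold(sum);
}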

Thanks,
Ravi



RE: H/W requirements for NETIF_F_HW_CSUM

2006-07-26 Thread Ravinandan Arakali
Steve,
Thanks for the response.
Pls see my comments below.

 -Original Message-
 From: Stephen Hemminger [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, July 26, 2006 12:16 PM
 To: [EMAIL PROTECTED]
 Cc: netdev@vger.kernel.org; [EMAIL PROTECTED];
 [EMAIL PROTECTED]; [EMAIL PROTECTED];
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; Leonid. Grossman
 (E-mail)
 Subject: Re: H/W requirements for NETIF_F_HW_CSUM
 
 
 On Wed, 26 Jul 2006 10:28:00 -0700
 Ravinandan Arakali [EMAIL PROTECTED] wrote:
 
  Hello,
  Our current NIC does not provide the actual checksum value 
 on receive path.
  Hence we only claim NETIF_F_IP_CSUM instead of the more general
  NETIF_F_HW_CSUM.
  
  To support this in a future adapter, we would like to know 
 what exactly are
  the requirements (on both Rx and Tx )to claim NETIF_F_HW_CSUM ?
 
 If you set NETIF_F_HW_CSUM, on transmit the adapter, if ip_summed is set,
 will be handed an unchecksummed frame with the offset to stuff the checksum
 at. Only difference between NETIF_F_HW_CSUM and NETIF_F_IP_CSUM is that
 IP_CSUM means the device can handle IPV4 only.

Since our adapter does IPv4 and IPv6 checksum, do we then satisfy
the requirements to claim NETIF_F_HW_CSUM on Tx side ?
Also, for non-TSO, we can stuff the checksum at specified offset
in skb. What about TSO frames ?

 
 NETIF_F_HW_CSUM has no impact on receive. The format of receive
 checksumming is up to the device. It can either put the one's complement
 sum in skb->csum and set ip_summed to CHECKSUM_HW, or, if the device only
 reports checksum good, set CHECKSUM_UNNECESSARY.
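
For reference, a sketch of the two receive-side options described above, as
a driver rx routine might apply them (the rx_info structure is made up;
CHECKSUM_HW and CHECKSUM_UNNECESSARY are the 2.6-era constants):

#include <linux/skbuff.h>

/* Hypothetical per-descriptor checksum info, not a real structure. */
struct rx_info {
        int  csum_valid;   /* hw supplied a full one's complement sum */
        int  csum_ok;      /* hw only says "checksum was good"        */
        u16  csum;         /* the raw sum reported by hw              */
};

static void set_rx_csum(struct sk_buff *skb, const struct rx_info *ri)
{
        if (ri->csum_valid) {
                skb->csum = ri->csum;               /* raw sum for the stack */
                skb->ip_summed = CHECKSUM_HW;
        } else if (ri->csum_ok) {
                skb->ip_summed = CHECKSUM_UNNECESSARY;
        } else {
                skb->ip_summed = CHECKSUM_NONE;     /* let software verify  */
        }
}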

The reason for thinking that NETIF_F_HW_CSUM and CHECKSUM_UNNECESSARY
don't go together was a comment from Jeff way back in '04 when our 
driver was initially submitted. Quoting from that mail:

You CANNOT use NETIF_F_HW_CSUM, when your hardware does not provide the 
checksum value.  You must use NETIF_F_IP_CSUM.  Your use of 
NETIF_F_HW_CSUM + CHECKSUM_UNNECESSARY is flat out incorrect.

 
 There are a couple of subtleties to the receive processing:
 * Meaning of ip_summed changes from the tx to the rx path, and that has to
   be handled in code that does forwarding, like bridges.
 * If the device only reports checksum okay vs. bad, the packets marked bad
   might be another protocol, so they should be passed up with CHECKSUM_NONE
   and any checksum errors left to be detected in software.
 * Checksum HW on receive should work better since then IPV6 and nested
   protocols like VLANs can be handled.
 
  Following are some specific questions:
  1. On Tx, our adapter supports checksumming of TCP/UDP over IPv4 and IPv6.
  This computation is TCP/UDP specific. Does the checksum calculation need
  to be more generic? Also, skbuff.h says that the checksum needs to be
  placed at a specific location (skb->h.raw + skb->csum). I guess this means
  the adapter needs to pass back the checksum to the host driver after
  transmission. What happens in the case of TSO?
  2. On Rx, is it sufficient if we place the L4 checksum in skb->csum? What
  about the L3 checksum?
  
 
 The L3 checksum (IP) is always computed. Since the header is in CPU cache
 anyway it is faster that way.


RE: [PATCH]NET: Add ECN support for TSO

2006-07-13 Thread Ravinandan Arakali
Dave/Michael,
Replicating the NS bit (from the super segment) across all segments looks fine.

But one of the issues is the random/pseudo-random generation of
ECT code points on each of these segments. The hardware will need to
be capable of generating this, and I guess should be able to verify this
against the NS bit received as part of the ACK for that packet.

Following are a couple of schemes proposed by our team. Please comment.

Option A)

If we were to permit ourselves to somewhat break the spirit of RFC 3540
without breaking the letter, we could come up with a fairly easy enhancement
to TSO... I think it would be acceptable to set ECT(0) on all packets
except one (I would suggest the last, but an argument could be made for
the first).  That one would have either ECT(0) or ECT(1) set as per a
field in the TxD (for example).

That would give us a method that works with ECN nonces (ECT(0) doesn't
increment the sum).  Unfortunately, it would give us a relative increase
in the number of packets being sent with ECT(0) (the random generation
should see a 50-50 distribution between ECT(0) and ECT(1); we would be
skewing it toward ECT(0) by whatever the proportion of packets to TSO
operations is).  So a connection using ECN nonces and TSO would be less
robust than one not using TSO.

But it wouldn't be broken...



Option B)

The hardware could randomly generate either ECN codepoint on all packets
of a TSO operation except the last.  It would keep a local NS value for
the operation and, in the last packet, set either ECT(0) or ECT(1) as
necessary to generate a NS value equal to that specified in the
descriptor.

That way we would keep a much more equal distribution.  It comes at the
cost of a random value generator in the hardware but we could get by
with something extremely basic (e.g. lsb of the internal clock at the
point the packet is generated) if perfect randomness is not required.

The limitation with this scheme is that the sender can't verify the NS
from any returned ACK that falls inside a TSO operation (it can only be
checked at TSO endpoints and on non-TSO transmissions).
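
As a rough illustration of Option B (pseudo-C only; this is not hardware or
driver code, and random_bit() is a placeholder for whatever randomness
source the hardware uses):

#define ECT_0  0x02                    /* IP ECN codepoint "10" */
#define ECT_1  0x01                    /* IP ECN codepoint "01" */

extern int random_bit(void);           /* placeholder: e.g. lsb of a clock */

/* Mark nsegs segments so the cumulative nonce sum equals want_ns. */
static void mark_tso_segments(unsigned char *ecn, int nsegs, int want_ns)
{
        int i, local_ns = 0;

        for (i = 0; i < nsegs - 1; i++) {
                int nonce = random_bit();
                ecn[i] = nonce ? ECT_1 : ECT_0;   /* ECT(1) contributes 1 */
                local_ns ^= nonce;                /* 1-bit sum (mod 2)    */
        }
        /* Last segment is chosen so the total sum comes out right. */
        ecn[nsegs - 1] = (local_ns ^ want_ns) ? ECT_1 : ECT_0;
}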

Ravi


-Original Message-
From: David Miller [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 08, 2006 1:32 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
netdev@vger.kernel.org
Subject: Re: [PATCH]NET: Add ECN support for TSO


From: Michael Chan [EMAIL PROTECTED]
Date: Fri, 7 Jul 2006 18:01:34 -0700

 However, Large Receive Offload will be a different story.  If
 packets are accumulated in the hardware and presented to the stack
 as one large packet, the stack will not be able to calculate the
 cumulative NS correctly.  Unless the hardware calculates the partial
 NS over the LRO packet and puts it in the SKB when handing over the
 packet.

This is correct, LRO hardware would need to do something to make sure
the nonce parity works out.


RE: [PATCH]NET: Add ECN support for TSO

2006-07-12 Thread Ravinandan Arakali
Thanks.. I will get rid of the per-session check for ECN.

Ravi

-Original Message-
From: David Miller [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 11, 2006 11:12 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
netdev@vger.kernel.org
Subject: Re: [PATCH]NET: Add ECN support for TSO


From: Michael Chan [EMAIL PROTECTED]
Date: Tue, 11 Jul 2006 21:53:42 -0700

 There is no reason to find out if ECN is enabled or not for any
 TCP connections.  Hw just needs to watch the above bits in the
 incoming packets.

Correct, it can be done in a completely stateless manner.
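
For reference, a sketch of such a stateless check, done purely from the
packet headers (illustrative only; the helper name is made up and the
header structs are the 2.6-era ones; a TCP payload is assumed):

#include <linux/ip.h>
#include <linux/tcp.h>

/* True if the frame carries any ECN signalling; no socket lookup needed. */
static int pkt_uses_ecn(const unsigned char *l3hdr)
{
        const struct iphdr *iph = (const struct iphdr *)l3hdr;
        const struct tcphdr *th =
                (const struct tcphdr *)(l3hdr + iph->ihl * 4);

        /* ECT(0)/ECT(1)/CE live in the IP TOS field, ECE/CWR in TCP flags */
        return (iph->tos & 0x03) || th->ece || th->cwr;
}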


RE: [PATCH]NET: Add ECN support for TSO

2006-07-11 Thread Ravinandan Arakali
Michael/David,
Thanks for the comments on LRO. The current LRO code in the S2io driver is
not aware of ECN. While I was trying to fix this, the first thing I needed
was to check, in the driver, whether ECN is enabled for the current session.
To do this, I try to get hold of the socket by doing something like:

tk = tcp_sk(skb->sk);
if (tk->ecn_flags & TCP_ECN_OK)
	/* Check CE, ECE, CWR etc */

I find that skb->sk is NULL. Is this the correct way to check the
per-session ECN capability? Why is skb->sk NULL?

Thanks,
Ravi

-Original Message-
From: David Miller [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 08, 2006 1:32 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
netdev@vger.kernel.org
Subject: Re: [PATCH]NET: Add ECN support for TSO


From: Michael Chan [EMAIL PROTECTED]
Date: Fri, 7 Jul 2006 18:01:34 -0700

 However, Large Receive Offload will be a different story.  If
 packets are accumulated in the hardware and presented to the stack
 as one large packet, the stack will not be able to calculate the
 cumulative NS correctly.  Unless the hardware calculates the partial
 NS over the LRO packet and puts it in the SKB when handing over the
 packet.

This is correct, LRO hardware would need to do something to make sure
the nonce parity works out.



RE: [PATCH]NET: Add ECN support for TSO

2006-07-07 Thread Ravinandan Arakali
Michael,
Are network cards expected to be aware of and implement RFC 3540 (ECN with
nonces)?

Thanks,
Ravi

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Michael Chan
Sent: Tuesday, June 27, 2006 8:07 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Subject: [PATCH]NET: Add ECN support for TSO


In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet.  Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using
GSO if necessary to handle TSO packets with CWR set.  Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the
dev->features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.

It is further assumed that all hardware will handle ECE properly by
replicating the ECE flag in all segments.  If that is not the case, a
simple extension of the logic will be required.
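
As a rough illustration of the software fallback described above (not the
actual patch; the helper and the array-of-headers argument are made up),
the fixup amounts to clearing CWR on every segment except the first while
leaving ECE alone:

#include <linux/tcp.h>

/* seg_th[i] points at the TCP header of segment i after segmentation. */
static void fixup_cwr_on_segments(struct tcphdr **seg_th, int nsegs)
{
        int i;

        for (i = 1; i < nsegs; i++)    /* segment 0 keeps CWR as-is */
                seg_th[i]->cwr = 0;    /* ECE is replicated, so untouched */
}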


Signed-off-by: Michael Chan [EMAIL PROTECTED]





RE: [3/5] [NET]: Add software TSOv4

2006-06-26 Thread Ravinandan Arakali
We are working on it.

Ravi

-Original Message-
From: YOSHIFUJI Hideaki [mailto:[EMAIL PROTECTED]
Sent: Friday, June 23, 2006 6:33 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
netdev@vger.kernel.org; [EMAIL PROTECTED]
Subject: Re: [3/5] [NET]: Add software TSOv4


In article [EMAIL PROTECTED] (at Fri, 23 Jun 2006
17:28:12 -0700), Ravinandan Arakali [EMAIL PROTECTED]
says:

 Neterion's Xframe adapter supports TSO over IPv6.

I remember you posted some patches.
Would you post revised version reflecting Stephen's comment, please?

--yoshfuji



RE: [patch 2.6.17] s2io driver irq fix

2006-06-22 Thread Ravinandan Arakali
Andrew,
My understanding is that MSI-X vectors are not usually shared. We don't want
to spend cycles checking whether the interrupt was indeed from our card or
from another device on the same IRQ.
In fact, the current driver shares the IRQ for the MSI case, which I think is
a bug. That should also be non-shared. Our MSI handler just runs through the
Tx/Rx completions and returns IRQ_HANDLED. In the case of IRQ sharing, we
could be falsely claiming the interrupt as our own.

Ravi

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Andrew Morton
Sent: Wednesday, June 21, 2006 9:16 PM
To: Ananda Raju
Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
linux-fsdevel@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: Re: [patch 2.6.17] s2io driver irq fix


On Wed, 21 Jun 2006 15:50:49 -0400 (EDT)
Ananda Raju [EMAIL PROTECTED] wrote:

 + if (sp->intr_type == MSI_X) {
 + 	int i;

 - 	free_irq(vector, arg);
 + 	for (i=1; (sp->s2io_entries[i].in_use == MSIX_FLG); i++) {
 + 		if (sp->s2io_entries[i].type == MSIX_FIFO_TYPE) {
 + 			sprintf(sp->desc[i], "%s:MSI-X-%d-TX",
 + 				dev->name, i);
 + 			err = request_irq(sp->entries[i].vector,
 + 				s2io_msix_fifo_handle, 0, sp->desc[i],
 + 				sp->s2io_entries[i].arg);

Is it usual to prohibit IRQ sharing with msix?



RE: [PATCH] s2io: netpoll support

2006-06-13 Thread Ravinandan Arakali
I don't think we should disable and enable all interrupts in the
poll_controller entry point. With the current patch, at the end of
the routine _all_ interrupts get enabled, which is not desirable.
Maybe you should just do disable_irq() at the start of the function and
enable_irq() before exiting, the way some of the other drivers do.
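
Something along these lines (a sketch only, not a submitted patch;
s2io_handle_tx_rx() is a placeholder for the existing completion
processing):

#include <linux/interrupt.h>
#include <linux/netdevice.h>

static void s2io_netpoll(struct net_device *dev)
{
        disable_irq(dev->irq);          /* quiesce only our own IRQ line */
        s2io_handle_tx_rx(dev);         /* drain Tx/Rx completions       */
        enable_irq(dev->irq);           /* restore only what we disabled */
}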

Ravi

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Brian Haley
Sent: Thursday, June 08, 2006 9:02 AM
To: netdev@vger.kernel.org
Cc: [EMAIL PROTECTED]
Subject: [PATCH] s2io: netpoll support


This adds netpoll support for things like netconsole/kgdboe to the s2io
10GbE driver.

This duplicates some code from s2io_poll() as I wanted to be
least-invasive, someone from Neterion might have other thoughts?

Signed-off-by: Brian Haley [EMAIL PROTECTED]



[PATCH 2.6.16.18] MSI: Proposed fix for MSI/MSI-X load failure

2006-06-02 Thread Ravinandan Arakali
Hi,
This patch suggests a fix for the MSI/MSI-X load failure.

Please review the patch.

Symptoms:
When a driver is loaded with MSI followed by MSI-X, the load fails indicating 
that the MSI vector is still active. And vice versa.

Suspected rootcause:
This happens in spite of the driver calling free_irq() followed by
pci_disable_msi/pci_disable_msix. This appears to be a kernel bug
wherein the pci_disable_msi and pci_disable_msix calls do not
clear/unpopulate the msi_desc data structure that was populated
by pci_enable_msi/pci_enable_msix.

Proposed fix:
Free the MSI vector in pci_disable_msi and all allocated MSI-X vectors 
in pci_disable_msix.

Testing:
The fix has been tested on IA64 platforms with Neterion's Xframe driver.

Signed-off-by: Ravinandan Arakali [EMAIL PROTECTED]
---

diff -urpN old/drivers/pci/msi.c new/drivers/pci/msi.c
--- old/drivers/pci/msi.c	2006-05-31 19:02:19.0 -0700
+++ new/drivers/pci/msi.c	2006-05-31 19:02:39.0 -0700
@@ -779,6 +779,7 @@ void pci_disable_msi(struct pci_dev* dev
 	nr_released_vectors++;
 	default_vector = entry->msi_attrib.default_vector;
 	spin_unlock_irqrestore(&msi_lock, flags);
+	msi_free_vector(dev, dev->irq, 1);
 	/* Restore dev->irq to its default pin-assertion vector */
 	dev->irq = default_vector;
 	disable_msi_mode(dev, pci_find_capability(dev, PCI_CAP_ID_MSI),
@@ -1046,6 +1047,7 @@ void pci_disable_msix(struct pci_dev* de
 
 	}
 	}
+	msi_remove_pci_irq_vectors(dev);
 }
 
 /**



RE: [PATCH 2.6.16.18] MSI: Proposed fix for MSI/MSI-X load failure

2006-06-02 Thread Ravinandan Arakali
Rajesh,
It's possible that the current behavior is by design, but once the driver is
loaded with MSI, you need a reboot to be able to load MSI-X, and vice versa.
I found this rather restrictive.

I did test the fix multiple times, e.g. multiple load/unload iterations of
MSI, followed by multiple load/unload iterations of MSI-X, followed by
load/unload of MSI. That way both transitions (MSI-to-MSI-X and vice versa)
are tested.

Thanks,
Ravi

-Original Message-
From: Rajesh Shah [mailto:[EMAIL PROTECTED]
Sent: Friday, June 02, 2006 2:55 PM
To: Ravinandan Arakali
Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Leonid
Grossman; Ananda Raju; Sriram Rapuru
Subject: Re: [PATCH 2.6.16.18] MSI: Proposed fix for MSI/MSI-X load
failure


On Fri, Jun 02, 2006 at 03:21:37PM -0400, Ravinandan Arakali wrote:
 
 Symptoms:
 When a driver is loaded with MSI followed by MSI-X, the load fails indicating 
 that the MSI vector is still active. And vice versa.
 
 Suspected rootcause:
 This happens inspite of driver calling free_irq() followed by 
 pci_disable_msi/pci_disable_msix. This appears to be a kernel bug 
 wherein the pci_disable_msi and pci_disable_msix calls do not 
 clear/unpopulate the msi_desc data structure that was populated 
 by pci_enable_msi/pci_enable_msix.
 
The current MSI code actually does this deliberately, not by
accident. It's got a lot of complex code to track devices and
vectors and make sure an enable_msi -> disable -> enable sequence
gives a driver the same vector. It also has policies about
reserving vectors based on potential hotplug activity etc.
Frankly, I've never understood the need for such policies, and
am in the process of removing all of them.

 Proposed fix:
 Free the MSI vector in pci_disable_msi and all allocated MSI-X vectors 
 in pci_disable_msix.
 
This will break the existing MSI policies. Once you take that away,
a whole lot of additional code and complexity can be removed too.
That's what I'm working on right now, but such a change is likely
too big for -stable.

So, I'm ok with this patch if it actually doesn't break MSI/MSI-X.
Did you try to repeatedly load/unload an MSI capable driver with
this patch? Did you repeatedly try to ifdown/ifup an Ethernet
driver that uses MSI? I'm not in a position to test this today, but
will try it out next week.

thanks,
Rajesh



RE: pci_enable_msix throws up error

2006-06-01 Thread Ravinandan Arakali
I have submitted a proposed fix for the below issue.
Will wait for comments.

Ravi

-Original Message-
From: Andi Kleen [mailto:[EMAIL PROTECTED]
Sent: Friday, May 05, 2006 1:44 AM
To: Ayaz Abdulla
Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org;
Ananda. Raju; netdev@vger.kernel.org; Leonid Grossman
Subject: Re: pci_enable_msix throws up error


On Friday 05 May 2006 07:14, Ayaz Abdulla wrote:
 I noticed the same behaviour, i.e. can not use both MSI and MSIX without
 rebooting.
 
 I had sent a message to the maintainer of the MSI/MSIX source a few
 months ago and got a response that they were working on fixing it. Not
 sure what the progress is on it.

The best way to make progress faster would be for someone like you
who needs it to submit a patch to fix it then.

-Andi



pci_enable_msix throws up error

2006-05-04 Thread Ravinandan Arakali
Hi,
I am seeing the following problem with MSI/MSI-X.

Note: I am copying netdev since other network drivers use
this feature and somebody on the list could throw light.

Our 10G network card(Xframe II) supports MSI and MSI-X.
When I load/unload the driver with MSI support followed
by an attempt to load with MSI-X, I get the following
message from pci_enable_msix:

Can't enable MSI-X.  Device already has an MSI vector assigned

I seem to be doing the correct things when unloading the
MSI driver. Basically, I do free_irq() followed by pci_disable_msi().
Any idea what I am missing ?
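
For reference, the unload/reload sequence in question looks roughly like
this (a sketch with placeholder names, not our actual driver code):

#include <linux/interrupt.h>
#include <linux/pci.h>

/* Unload path of the MSI-mode driver. */
static void teardown_msi(struct pci_dev *pdev, void *drvdata)
{
        free_irq(pdev->irq, drvdata);   /* release the handler on the MSI vector */
        pci_disable_msi(pdev);          /* expected to return the device to INTx
                                           and release the vector */
}

/* The next load then tries MSI-X, which is where the failure shows up:
 *     err = pci_enable_msix(pdev, entries, nvec);
 */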

Further analysis:
Looking at the code, the following check(when it finds a match) in
msi_lookup_vector(called by pci_enable_msix) seems to throw up this
message:
if (!msi_desc[vector] || msi_desc[vector]->dev != dev ||
    msi_desc[vector]->msi_attrib.type != type ||
    msi_desc[vector]->msi_attrib.default_vector != dev->irq)

pci_enable_msi, on successful completion, will populate the
fields in msi_desc. But neither pci_disable_msi nor free_irq
seems to undo/unpopulate the msi_desc table.
Could this be the cause of the problem?

Thanks,
Ravi




RE: [PATCH 2.6.16-rc5] S2io: Receive packet classification and steering mechanisms

2006-04-20 Thread Ravinandan Arakali
Andi,
The driver will be polling(listening) to netlink for
any configuration requests. We could release the user
tools but not sure where(in the tree) they would reside.

Thanks,
Ravi

-Original Message-
From: Andi Kleen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 19, 2006 5:51 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org
Subject: Re: [PATCH 2.6.16-rc5] S2io: Receive packet classification and
steering mechanisms


On Thursday 20 April 2006 00:45, Ravinandan Arakali wrote:
 Andi,
 We would like to explain that this patch is tier-1 of a two
 tiered approach. It implements all the steering
 functionality at driver-only level, and it is fairly Neterion-specific.

That's fine for experiments, but probably not something
that should be in tree.


 The second upcoming submission will add a generic netlink-based
 interface for channel data flow and configuration(including receive
steering
 parameters) on per-channel basis, that will utilize the lower level
 implementation from the current patch.

Will the driver itself be listening to netlink?

My feeling is that teaching the stack to use this would require
efficient interfaces, and netlink isn't particularly efficient. But if it's
just a glue module outside the driver, that would be reasonable as a first
step, I guess.

Do you also plan to release user tools to use it?


-Andi



RE: [PATCH 2.6.16-rc5] S2io: Receive packet classification and steering mechanisms

2006-04-19 Thread Ravinandan Arakali
Andi,
We would like to explain that this patch is tier-1 of a two
tiered approach. It implements all the steering
functionality at driver-only level, and it is fairly Neterion-specific.

The second upcoming submission will add a generic netlink-based
interface for channel data flow and configuration(including receive steering
parameters) on per-channel basis, that will utilize the lower level
implementation from the current patch.

Thanks,
Ravi

-Original Message-
From: Andi Kleen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 18, 2006 5:59 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org
Subject: Re: [PATCH 2.6.16-rc5] S2io: Receive packet classification and
steering mechanisms


On Wednesday 19 April 2006 02:38, Ravinandan Arakali wrote:

 configuration: A mask(specified using loadable parameter rth_fn_and_mask)
 can be used to select a subset of TCP/UDP tuple for hash calculation.
 eg. To mask source port for TCP/IPv4 configuration,
 # insmod s2io.ko rx_steering_type=2 rth_fn_and_mask=0x0101
 LSB specifies RTH function type and MSB the mask. A full description
 is provided at the beginning of s2io.c

I don't think it's a good idea to introduce such weird and hard to
understand module parameters for this.  It would be better to define a
generic internal kernel interface between the stack and the driver. Perhaps
starting with a standard netlink interface might be a good start
until the stack learns how to use this on its own.

 3. MAC address-based:
 Done based on destination MAC address of packet. Xframe can be
 configured with multiple unicast MAC addresses.

 configuration: Load-time parameters multi_mac_cnt and multi_macs
 can be used to specify no. of MAC addresses and list of unicast
 addresses.
 eg. insmod s2io.ko rx_steering_type=8 multi_mac_cnt=3
   multi_macs=00:0c:fc:00:00:22, 00:0c:fc:00:01:22, 00:0c:fc:00:02:22
 Packets received with default destination MAC address will be steered to
 ring0. Packets with destination MAC addresses specified by multi_macs are
 steered to ring1, ring2... respectively.

The obvious way to do this nicely would be to allow to define multiple
virtual interfaces where the mac addresses can be set using the usual
ioctls.


-Andi



RE: [PATCH 2.6.16-rc5] S2io: Receive packet classification and steering mechanisms

2006-03-29 Thread Ravinandan Arakali
Hi,
Just wondering if anybody got a chance to review the patch below.

Thanks,
Ravi

-Original Message-
From: Ravinandan Arakali [mailto:[EMAIL PROTECTED]
Sent: Friday, March 10, 2006 12:32 PM
To: [EMAIL PROTECTED]; netdev@vger.kernel.org
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: [PATCH 2.6.16-rc5] S2io: Receive packet classification and
steering mechanisms


Hi,
Attached below is a patch that adds several receive packet classification
and steering mechanisms for Xframe NIC hw channels. The current Xframe ASIC
supports one hardware channel per CPU, up to 8 channels. This number
will increase in the next ASIC release. A channel can be attached to a
specific MSI-X vector (with an independent interrupt moderation scheme),
which in turn can be bound to a CPU.

Follow-up patches will provide some enhancements for the default tcp
workload balancing across hw channels, as well as an optional hw channel
interface. The interface is intended to be very generic (not specific to
Xframe hardware).

The following mechanisms are supported in this patch:
Note: The steering type can be specified at load time with
parameter rx_steering_type. Values supported are 1(port based),
2(RTH), 4(SPDM), 8(MAC addr based).

1. RTH (Receive traffic hashing):
Steering is based on the socket tuple (or a subset), and the popular Jenkins
hash is used for RTH. This lets the receive processing be spread out across
multiple CPUs, thus reducing the single-CPU bottleneck on the Rx path.
Hash-based steering can be used when it is desired to balance an
unlimited number of TCP sessions across multiple CPUs but the exact
mapping between a particular session and a particular CPU is not
important (see the illustrative sketch after this list).

configuration: A mask(specified using loadable parameter rth_fn_and_mask)
can be used to select a subset of TCP/UDP tuple for hash calculation.
eg. To mask source port for TCP/IPv4 configuration,
# insmod s2io.ko rx_steering_type=2 rth_fn_and_mask=0x0101
LSB specifies RTH function type and MSB the mask. A full description
is provided at the beginning of s2io.c

2. port based:
Steering is done based on source/destination TCP/UDP port number.

configuration: Interface used is netlink sockets. Can specify port
number(s), TCP/UDP type, source/destination port.

3. MAC address-based:
Done based on destination MAC address of packet. Xframe can be
configured with multiple unicast MAC addresses.

configuration: Load-time parameters multi_mac_cnt and multi_macs
can be used to specify no. of MAC addresses and list of unicast
addresses.
eg. insmod s2io.ko rx_steering_type=8 multi_mac_cnt=3
multi_macs=00:0c:fc:00:00:22, 00:0c:fc:00:01:22, 00:0c:fc:00:02:22
Packets received with default destination MAC address will be steered to
ring0. Packets with destination MAC addresses specified by multi_macs are
steered to ring1, ring2... respectively.

4. SPDM (Socket Pair Direct Match):
Steering is based on an exact socket tuple (or a subset) match.
SPDM steering can be used when the exact mapping between a particular
session and a particular CPU is desired.

configuration: Interface used is netlink sockets. Can specify
socket tuple values. If any of the values (say source port) needs
to be don't care, specify 0x.
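
As a rough software illustration of what the hash-based (RTH) steering
computes (the real calculation is done by the Xframe hardware; the seed and
field ordering here are made up):

#include <linux/jhash.h>

/* Map a TCP/UDP 4-tuple to one of nrings receive rings. */
static unsigned int pick_ring(u32 saddr, u32 daddr,
                              u16 sport, u16 dport, int nrings)
{
        u32 h = jhash_3words(saddr, daddr,
                             ((u32)sport << 16) | dport, 0);
        return h % nrings;      /* same flow always lands on the same ring */
}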

Signed-off-by: Raghavendra Koushik [EMAIL PROTECTED]
Signed-off-by: Sivakumar Subramani [EMAIL PROTECTED]
Signed-off-by: Ravinandan Arakali [EMAIL PROTECTED]
---

diff -urpN old/drivers/net/rx_cfg.h new/drivers/net/rx_cfg.h
--- old/drivers/net/rx_cfg.h	1969-12-31 16:00:00.0 -0800
+++ new/drivers/net/rx_cfg.h	2006-03-10 02:54:56.0 -0800
@@ -0,0 +1,44 @@
+#ifndef _RX_CFG_H_
+#define _RX_CFG_H_
+
+typedef struct {
+	unsigned short	port;
+	unsigned short	prot_n_type; /* TCP/UDP & Dst/Src port type */
+#define SRC_PRT		0x0
+#define DST_PRT		0x1
+#define TCP_PROT	0x0
+#define UDP_PROT	0x1
+	unsigned short	dst_ring;
+} port_info_t;
+
+/* A Rx steering config structure to pass info the driver by user */
+typedef struct {
+	//SPDM
+	unsigned int	sip; /* Src IP addr */
+	unsigned int	dip; /* Dst IP addr */
+	unsigned short	sprt; /* Src TCP port */
+	unsigned short	dprt; /* Dst TCP port */
+	unsigned int	t_queue; /* Target Rx Queue for the packet */
+	unsigned int	hash; /* the hash as per jenkin's hash algorithm. */
+#define SPDM_NO_DATA			0x1
+#define SPDM_XENA_IF			0x2
+#define SPDM_HW_UNINITIALIZED		0x3
+#define SPDM_INCOMPLETE_SOCKET		0x4
+#define SPDM_TABLE_ACCESS_FAILED	0x5
+#define SPDM_TABLE_FULL			0x6
+#define SPDM_TABLE_UNKNOWN_BAR		0x7
+#define SPDM_TABLE_MALLOC_FAIL		0x8
+#define SPDM_INVALID_DEVICE		0x9
+#define SPDM_CONF_SUCCESS		0x0
+#define SPDM_GET_CFG_DATA		0xAA55
+	int

[PATCH 2.6.16-rc5] S2io: Receive packet classification and steering mechanisms

2006-03-10 Thread Ravinandan Arakali
Hi,
Attached below is a patch that adds several receive packet classification
and steering mechanisms for Xframe NIC hw channels. The current Xframe ASIC
supports one hardware channel per CPU, up to 8 channels. This number
will increase in the next ASIC release. A channel can be attached to a
specific MSI-X vector (with an independent interrupt moderation scheme),
which in turn can be bound to a CPU.

Follow-up patches will provide some enhancements for the default tcp
workload balancing across hw channels, as well as an optional hw channel
interface. The interface is intended to be very generic (not specific to
Xframe hardware).

The following mechanisms are supported in this patch:
Note: The steering type can be specified at load time with
parameter rx_steering_type. Values supported are 1(port based),
2(RTH), 4(SPDM), 8(MAC addr based). 
 
1. RTH (Receive traffic hashing):
Steering is based on the socket tuple (or a subset), and the popular Jenkins
hash is used for RTH. This lets the receive processing be spread out across
multiple CPUs, thus reducing the single-CPU bottleneck on the Rx path.
Hash-based steering can be used when it is desired to balance an
unlimited number of TCP sessions across multiple CPUs but the exact
mapping between a particular session and a particular CPU is not
important.

configuration: A mask(specified using loadable parameter rth_fn_and_mask)
can be used to select a subset of TCP/UDP tuple for hash calculation.
eg. To mask source port for TCP/IPv4 configuration,
# insmod s2io.ko rx_steering_type=2 rth_fn_and_mask=0x0101
LSB specifies RTH function type and MSB the mask. A full description
is provided at the beginning of s2io.c 

2. port based:
Steering is done based on source/destination TCP/UDP port number. 

configuration: Interface used is netlink sockets. Can specify port
number(s), TCP/UDP type, source/destination port.

3. MAC address-based:
Done based on destination MAC address of packet. Xframe can be
configured with multiple unicast MAC addresses.

configuration: Load-time parameters multi_mac_cnt and multi_macs
can be used to specify no. of MAC addresses and list of unicast
addresses.
eg. insmod s2io.ko rx_steering_type=8 multi_mac_cnt=3 
multi_macs=00:0c:fc:00:00:22, 00:0c:fc:00:01:22, 00:0c:fc:00:02:22 
Packets received with default destination MAC address will be steered to 
ring0. Packets with destination MAC addresses specified by multi_macs are 
steered to ring1, ring2... respectively.

4. SPDM (Socket Pair Direct Match).
Steering is based on exact socket tuple (or a subset) match.
SPDM steering can be used when the exact mapping between a particular
session and a particular cpu is desired.

configuration: Interface used is netlink sockets. Can specify 
socket tuple values. If any of the values(say source port) needs
to be don't care, specify 0x.

Signed-off-by: Raghavendra Koushik [EMAIL PROTECTED]
Signed-off-by: Sivakumar Subramani [EMAIL PROTECTED]
Signed-off-by: Ravinandan Arakali [EMAIL PROTECTED]
---

diff -urpN old/drivers/net/rx_cfg.h new/drivers/net/rx_cfg.h
--- old/drivers/net/rx_cfg.h	1969-12-31 16:00:00.0 -0800
+++ new/drivers/net/rx_cfg.h	2006-03-10 02:54:56.0 -0800
@@ -0,0 +1,44 @@
+#ifndef _RX_CFG_H_
+#define _RX_CFG_H_
+
+typedef struct {
+	unsigned short	port;
+	unsigned short	prot_n_type; /* TCP/UDP & Dst/Src port type */
+#define SRC_PRT		0x0
+#define DST_PRT		0x1
+#define TCP_PROT	0x0
+#define UDP_PROT	0x1
+	unsigned short	dst_ring;
+} port_info_t;
+
+/* A Rx steering config structure to pass info the driver by user */
+typedef struct {
+	//SPDM
+	unsigned int	sip; /* Src IP addr */
+	unsigned int	dip; /* Dst IP addr */
+	unsigned short	sprt; /* Src TCP port */
+	unsigned short	dprt; /* Dst TCP port */
+	unsigned int	t_queue; /* Target Rx Queue for the packet */
+	unsigned int	hash; /* the hash as per jenkin's hash algorithm. */
+#define SPDM_NO_DATA			0x1
+#define SPDM_XENA_IF			0x2
+#define SPDM_HW_UNINITIALIZED		0x3
+#define SPDM_INCOMPLETE_SOCKET		0x4
+#define SPDM_TABLE_ACCESS_FAILED	0x5
+#define SPDM_TABLE_FULL			0x6
+#define SPDM_TABLE_UNKNOWN_BAR		0x7
+#define SPDM_TABLE_MALLOC_FAIL		0x8
+#define SPDM_INVALID_DEVICE		0x9
+#define SPDM_CONF_SUCCESS		0x0
+#define SPDM_GET_CFG_DATA		0xAA55
+	int	ret;
+#define MAX_SPDM_ENTRIES_SIZE	(0x100 * 0x40)
+	unsigned char	data[MAX_SPDM_ENTRIES_SIZE];
+	int	data_len; /* Number of entries retrieved */
+	char	dev_name[20];	/* Device name, e.g. eth0, eth1... */
+
+	// Port steering
+	port_info_t	l4_ports;
+} rx_steering_cfg_t;
+
+#endif /*_RX_CFG_H_*/
diff -urpN old/drivers/net/s2io-regs.h new/drivers/net/s2io-regs.h
--- old/drivers/net

RE: [PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO) feature(v2) for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs

2006-02-08 Thread Ravinandan Arakali
Hi,
Just wondering if anybody got a chance to review the below patch.
This version(as per Rick's comment on v1 patch) includes support
for TCP timestamps.

Thanks,
Ravi

-Original Message-
From: Ravinandan Arakali [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 25, 2006 11:53 AM
To: [EMAIL PROTECTED]; netdev@vger.kernel.org
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO)
feature(v2) for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs


Hi,
Below is a patch for the Large Receive Offload feature.
Please review and let us know your comments.

LRO algorithm was described in an OLS 2005 presentation, located at
ftp.s2io.com
user: linuxdocs
password: HALdocs

The same ftp site has Programming Manual for Xframe-I ASIC.
LRO feature is supported on Neterion Xframe-I, Xframe-II and
Xframe-Express 10GbE NICs.

Brief description:
The Large Receive Offload (LRO) feature is a stateless offload
that is complementary to the TSO feature, but on the receive path.
The idea is to combine and collapse (up to a 64K maximum), in the
driver, in-sequence TCP packets belonging to the same session.
It is mainly designed to improve 1500 MTU receive performance,
since Jumbo frame performance is already close to the 10GbE line
rate. Some performance numbers are attached below.

Implementation details:
1. Handle packet chains from multiple sessions (current default
MAX_LRO_SESSSIONS=32).
2. Examine each packet for eligibility to aggregate (see the sketch after
this list). A packet is considered eligible if it meets all the criteria
below:
  a. It is a TCP/IP packet and the L2 type is not LLC or SNAP.
  b. The packet has no checksum errors (L3 and L4).
  c. There are no IP options. The only TCP option supported is timestamps.
  d. Search and locate the LRO object corresponding to this
     socket and ensure the packet is in TCP sequence.
  e. It's not a special packet (SYN, FIN, RST, URG, PSH etc. flags are not
     set).
  f. TCP payload is non-zero (it's not a pure ACK).
  g. It's not an IP-fragmented packet.
3. If a packet is found eligible, the LRO object is updated with
   information such as the next sequence number expected, current length
   of the aggregated packet and so on. If not eligible, or max packets are
   reached, update the IP and TCP headers of the first packet in the chain
   and pass it up to the stack.
4. The frag_list in the skb structure is used to chain packets into one
   large packet.
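
The step-2 eligibility test, written out as a standalone sketch (a
hypothetical helper, not the code in s2io.c; 2.6-era header structs):

#include <linux/in.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <net/ip.h>

/* Criteria 2a, 2c, 2d, 2e and 2g; 2b and 2f are checked separately. */
static int lro_eligible(const struct iphdr *ip, const struct tcphdr *th,
                        u32 expected_seq)
{
        if (ip->protocol != IPPROTO_TCP)                    /* 2a */
                return 0;
        if (ip->ihl != 5)                                   /* 2c: IP options */
                return 0;
        if (ip->frag_off & htons(IP_MF | IP_OFFSET))        /* 2g: fragment   */
                return 0;
        if (th->syn || th->fin || th->rst || th->urg || th->psh)   /* 2e */
                return 0;
        if (ntohl(th->seq) != expected_seq)                 /* 2d: in order   */
                return 0;
        return 1;
}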

Kernel changes required: None

Performance results:
The main focus of the initial testing was on a 1500 MTU receiver, since this
is a bottleneck not covered by the existing stateless offloads.

There are a couple of disclaimers about the performance results below:
1. Your mileage will vary. We initially concentrated on a couple of PCI-X
2.0 platforms that are powerful enough to push a 10 GbE NIC and do not
have bottlenecks other than cpu%; testing on other platforms is still
in progress. On some lower-end systems we are seeing lower gains.

2. The current LRO implementation is still (for the most part) software
based, and therefore the performance potential of the feature is far from
being realized. A full hw implementation of LRO is expected in the next
version of the Xframe ASIC.

Performance delta(with MTU=1500) going from LRO disabled to enabled:
IBM 2-way Xeon (x366) : 3.5 to 7.1 Gbps
2-way Opteron : 4.5 to 6.1 Gbps

Signed-off-by: Ravinandan Arakali [EMAIL PROTECTED]
---

diff -urpN old/drivers/net/s2io.c new_ts/drivers/net/s2io.c
--- old/drivers/net/s2io.c	2006-01-19 04:31:05.0 -0800
+++ new_ts/drivers/net/s2io.c	2006-01-24 08:56:25.0 -0800
@@ -57,6 +57,9 @@
 #include <linux/ethtool.h>
 #include <linux/workqueue.h>
 #include <linux/if_vlan.h>
+#include <linux/ip.h>
+#include <linux/tcp.h>
+#include <net/tcp.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -66,7 +69,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"
 
-#define DRV_VERSION "Version 2.0.9.4"
+#define DRV_VERSION "2.0.11.2"
 
 /* S2io Driver name & version. */
 static char s2io_driver_name[] = "Neterion";
@@ -168,6 +171,11 @@ static char ethtool_stats_keys[][ETH_GST
 	{"\n DRIVER STATISTICS"},
 	{"single_bit_ecc_errs"},
 	{"double_bit_ecc_errs"},
+	("lro_aggregated_pkts"),
+	("lro_flush_both_count"),
+	("lro_out_of_sequence_pkts"),
+	("lro_flush_due_to_max_pkts"),
+	("lro_avg_aggr_pkts"),
 };
 
 #define S2IO_STAT_LEN sizeof(ethtool_stats_keys)/ ETH_GSTRING_LEN
@@ -317,6 +325,12 @@ static unsigned int indicate_max_pkts;
 static unsigned int rxsync_frequency = 3;
 /* Interrupt type. Values can be 0(INTA), 1(MSI), 2(MSI_X) */
 static unsigned int intr_type = 0;
+/* Large receive offload feature */
+static unsigned int lro = 0;
+/* Max pkts to be aggregated by LRO at one time. If not specified,
+ * aggregation happens until we hit max IP pkt size(64K)
+ */
+static unsigned int lro_max_pkts = 0xFFFF;
 
 /*
  * S2IO device table.
@@ -1476,6 +1490,19 @@ static int init_nic(struct s2io_nic *nic
 	writel((u32

[PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO) feature(v2) for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs

2006-01-25 Thread Ravinandan Arakali
Hi,
Below is a patch for the Large Receive Offload feature.
Please review and let us know your comments.

LRO algorithm was described in an OLS 2005 presentation, located at
ftp.s2io.com
user: linuxdocs
password: HALdocs

The same ftp site has Programming Manual for Xframe-I ASIC.
LRO feature is supported on Neterion Xframe-I, Xframe-II and 
Xframe-Express 10GbE NICs.

Brief description:
The Large Receive Offload (LRO) feature is a stateless offload
that is complementary to the TSO feature, but on the receive path.
The idea is to combine and collapse (up to a 64K maximum), in the
driver, in-sequence TCP packets belonging to the same session.
It is mainly designed to improve 1500 MTU receive performance,
since Jumbo frame performance is already close to the 10GbE line
rate. Some performance numbers are attached below.

Implementation details:
1. Handle packet chains from multiple sessions(current default
MAX_LRO_SESSSIONS=32).
2. Examine each packet for eligibility to aggregate. A packet is
considered eligible if it meets all the below criteria.
  a. It is a TCP/IP packet and L2 type is not LLC or SNAP.
  b. The packet has no checksum errors(L3 and L4). 
  c. There are no IP options. The only TCP option supported is timestamps.
  d. Search and locate the LRO object corresponding to this
 socket and ensure packet is in TCP sequence.
  e. It's not a special packet(SYN, FIN, RST, URG, PSH etc. flags are not set).
  f. TCP payload is non-zero(It's not a pure ACK).
  g. It's not an IP-fragmented packet.
3. If a packet is found eligible, the LRO object is updated with 
   information such as next sequence number expected, current length
   of aggregated packet and so on. If not eligible or max packets
   reached, update IP and TCP headers of first packet in the chain
   and pass it up to stack.
4. The frag_list in skb structure is used to chain packets into one
   large packet.
 
Kernel changes required: None

Performance results:
Main focus of the initial testing was on 1500 mtu receiver, since this 
is a bottleneck not covered by the existing stateless offloads.

There are a couple of disclaimers about the performance results below:
1. Your mileage will vary. We initially concentrated on a couple of PCI-X
2.0 platforms that are powerful enough to push a 10 GbE NIC and do not
have bottlenecks other than cpu%; testing on other platforms is still
in progress. On some lower-end systems we are seeing lower gains.

2. The current LRO implementation is still (for the most part) software based,
and therefore the performance potential of the feature is far from being
realized. A full hw implementation of LRO is expected in the next version of
the Xframe ASIC.

Performance delta(with MTU=1500) going from LRO disabled to enabled:
IBM 2-way Xeon (x366) : 3.5 to 7.1 Gbps
2-way Opteron : 4.5 to 6.1 Gbps
 
Signed-off-by: Ravinandan Arakali [EMAIL PROTECTED]
---

diff -urpN old/drivers/net/s2io.c new_ts/drivers/net/s2io.c
--- old/drivers/net/s2io.c	2006-01-19 04:31:05.0 -0800
+++ new_ts/drivers/net/s2io.c	2006-01-24 08:56:25.0 -0800
@@ -57,6 +57,9 @@
 #include <linux/ethtool.h>
 #include <linux/workqueue.h>
 #include <linux/if_vlan.h>
+#include <linux/ip.h>
+#include <linux/tcp.h>
+#include <net/tcp.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -66,7 +69,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"
 
-#define DRV_VERSION "Version 2.0.9.4"
+#define DRV_VERSION "2.0.11.2"
 
 /* S2io Driver name & version. */
 static char s2io_driver_name[] = "Neterion";
@@ -168,6 +171,11 @@ static char ethtool_stats_keys[][ETH_GST
 	{"\n DRIVER STATISTICS"},
 	{"single_bit_ecc_errs"},
 	{"double_bit_ecc_errs"},
+	("lro_aggregated_pkts"),
+	("lro_flush_both_count"),
+	("lro_out_of_sequence_pkts"),
+	("lro_flush_due_to_max_pkts"),
+	("lro_avg_aggr_pkts"),
 };
 
 #define S2IO_STAT_LEN sizeof(ethtool_stats_keys)/ ETH_GSTRING_LEN
@@ -317,6 +325,12 @@ static unsigned int indicate_max_pkts;
 static unsigned int rxsync_frequency = 3;
 /* Interrupt type. Values can be 0(INTA), 1(MSI), 2(MSI_X) */
 static unsigned int intr_type = 0;
+/* Large receive offload feature */
+static unsigned int lro = 0;
+/* Max pkts to be aggregated by LRO at one time. If not specified,
+ * aggregation happens until we hit max IP pkt size(64K)
+ */
+static unsigned int lro_max_pkts = 0xFFFF;
 
 /*
  * S2IO device table.
@@ -1476,6 +1490,19 @@ static int init_nic(struct s2io_nic *nic
 	writel((u32) (val64 >> 32), (add + 4));
 	val64 = readq(&bar0->mac_cfg);
 
+	/* Enable FCS stripping by adapter */
+	add = &bar0->mac_cfg;
+	val64 = readq(&bar0->mac_cfg);
+	val64 |= MAC_CFG_RMAC_STRIP_FCS;
+	if (nic->device_type == XFRAME_II_DEVICE)
+		writeq(val64, &bar0->mac_cfg);
+	else {
+		writeq(RMAC_CFG_KEY(0x4C0D), &bar0->rmac_cfg_key);
+		writel((u32) (val64), add);
+		writeq(RMAC_CFG_KEY(0x4C0D), &bar0->rmac_cfg_key);
+		writel((u32) (val64 >> 32), (add + 4

RE: [PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO) feature for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs

2006-01-23 Thread Ravinandan Arakali
Rick,
This is the basic implementation I submitted. I will try and include support
for the timestamp option and resubmit.
I did not understand your other comments about service demand.

Thanks,
Ravi

-Original Message-
From: Rick Jones [mailto:[EMAIL PROTECTED]
Sent: Friday, January 20, 2006 3:30 PM
To: Ravinandan Arakali
Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org;
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: Re: [PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO)
feature for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs



 Implementation details:
 1. Handle packet chains from multiple sessions(current default
 MAX_LRO_SESSSIONS=32).
 2. Examine each packet for eligiblity to aggregate. A packet is
 considered eligible if it meets all the below criteria.
   a. It is a TCP/IP packet and L2 type is not LLC or SNAP.
   b. The packet has no checksum errors(L3 and L4).
   c. There are no TCP or IP options.

_No_ TCP options?  Not even Timestamps?  Given that one can theoretically
wrap the 32-bit TCP sequence space in something like four seconds, and the
general goodness of timestamps when using window scaling, one might think
that timestamps being enabled if not already common today would become more
common?

   d. Search and locate the LRO object corresponding to this
  socket and ensure packet is in TCP sequence.
   e. It's not a special packet(SYN, FIN, RST, URG, PSH etc. flags are not
set).
   f. TCP payload is non-zero(It's not a pure ACK).
   g. It's not an IP-fragmented packet.
 3. If a packet is found eligible, the LRO object is updated with
information such as next sequence number expected, current length
of aggregated packet and so on. If not eligible or max packets
reached, update IP and TCP headers of first packet in the chain
and pass it up to stack.
 4. The frag_list in skb structure is used to chain packets into one
large packet.

 Kernel changes required: None

 Performance results:
 Main focus of the initial testing was on 1500 mtu receiver, since this
 is a bottleneck not covered by the existing stateless offloads.

 There are couple disclaimers about the performance results below:
 1. Your mileage will vary We initially concentrated on couple pci-x
2.0
 platforms that are powerful enough to push 10 GbE NIC and do not
 have bottlenecks other than cpu%;  testing on other platforms is still
 in progress. On some lower end systems we are seeing lower gains.

You should still see benefits in reported service demand no?

 2. Current LRO implementation is still (for the most part) software based,
 and therefore performance potential of the feature is far from being
realized.
 Full hw implementation of LRO is expected in the next version of Xframe
ASIC.

 Performance delta(with MTU=1500) going from LRO disabled to enabled:
 IBM 2-way Xeon (x366) : 3.5 to 7.1 Gbps
 2-way Opteron : 4.5 to 6.1 Gbps

Service demand changes?

rick jones



RE: [PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO) feature for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs

2006-01-23 Thread Ravinandan Arakali
Rick,
In addition to showing improved throughput, the CPU utilization (service
demand) also went down, but one of the CPUs was running at full utilization.
For example, without LRO, the CPU idle times on the 4 CPUs were 39, 43, 8
and 12 (average 25% idle). With LRO, it was 48/0/46/47 (average 35% idle).

Regards,
Ravi

-Original Message-
From: Rick Jones [mailto:[EMAIL PROTECTED]
Sent: Monday, January 23, 2006 4:08 PM
To: Ravinandan Arakali
Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org;
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: Re: [PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO)
feature for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs


Ravinandan Arakali wrote:
 Rick,
 This is the basic implementation I submitted. I will try and include
support
 for timestamp option and resubmit.
 I did not did understand your other comments about service demand.

Sorry, that's a netperfism - netperf can report the service demand measured
during a test - it is basically the quantity of CPU consumed per unit of
work performed.  Lower is better.

For example:

languid:/opt/netperf2# src/netperf -H 192.168.3.212 -c -C
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.212
(192.168.3.212) port 0 AF_INET
Recv   Send    Send                          Utilization      Service Demand
Socket Socket  Message  Elapsed              Send    Recv     Send    Recv
Size   Size    Size     Time     Throughput  local   remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S     % S      us/KB   us/KB

 87380  16384  16384    10.00      940.96    17.01   47.96    2.962   8.351

In the test above, the sender consumed nearly 3 microseconds of CPU time to
transfer a KB of data, and the receiver consumed nearly 8.4.

rick



[PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO) feature for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs

2006-01-20 Thread Ravinandan Arakali
Hi,
Below is a patch for the Large Receive Offload feature.
Please review and let us know your comments.

LRO algorithm was described in an OLS 2005 presentation, located at
ftp.s2io.com
user: linuxdocs
password: HALdocs

The same ftp site has Programming Manual for Xframe-I ASIC.
LRO feature is supported on Neterion Xframe-I, Xframe-II and 
Xframe-Express 10GbE NICs.

Brief description:
The Large Receive Offload (LRO) feature is a stateless offload
that is complementary to the TSO feature, but on the receive path.
The idea is to combine and collapse (up to a 64K maximum), in the
driver, in-sequence TCP packets belonging to the same session.
It is mainly designed to improve 1500 MTU receive performance,
since Jumbo frame performance is already close to the 10GbE line
rate. Some performance numbers are attached below.

Implementation details:
1. Handle packet chains from multiple sessions(current default
MAX_LRO_SESSSIONS=32).
2. Examine each packet for eligibility to aggregate. A packet is
considered eligible if it meets all the below criteria.
  a. It is a TCP/IP packet and L2 type is not LLC or SNAP.
  b. The packet has no checksum errors(L3 and L4). 
  c. There are no TCP or IP options.
  d. Search and locate the LRO object corresponding to this
 socket and ensure packet is in TCP sequence.
  e. It's not a special packet(SYN, FIN, RST, URG, PSH etc. flags are not set).
  f. TCP payload is non-zero(It's not a pure ACK).
  g. It's not an IP-fragmented packet.
3. If a packet is found eligible, the LRO object is updated with 
   information such as next sequence number expected, current length
   of aggregated packet and so on. If not eligible or max packets
   reached, update IP and TCP headers of first packet in the chain
   and pass it up to stack.
4. The frag_list in skb structure is used to chain packets into one
   large packet.
 
Kernel changes required: None

Performance results:
Main focus of the initial testing was on 1500 mtu receiver, since this 
is a bottleneck not covered by the existing stateless offloads.

There are a couple of disclaimers about the performance results below:
1. Your mileage will vary. We initially concentrated on a couple of PCI-X 2.0
platforms that are powerful enough to push a 10 GbE NIC and do not
have bottlenecks other than cpu%; testing on other platforms is still
in progress. On some lower-end systems we are seeing lower gains.

2. The current LRO implementation is still (for the most part) software based,
and therefore the performance potential of the feature is far from being
realized. A full hw implementation of LRO is expected in the next version of
the Xframe ASIC.

Performance delta(with MTU=1500) going from LRO disabled to enabled:
IBM 2-way Xeon (x366) : 3.5 to 7.1 Gbps
2-way Opteron : 4.5 to 6.1 Gbps
 
Signed-off-by: Ravinandan Arakali [EMAIL PROTECTED]
---

diff -urpN old/drivers/net/s2io.c new/drivers/net/s2io.c
--- old/drivers/net/s2io.c	2006-01-19 04:31:05.0 -0800
+++ new/drivers/net/s2io.c	2006-01-20 04:04:09.0 -0800
@@ -57,6 +57,8 @@
 #include <linux/ethtool.h>
 #include <linux/workqueue.h>
 #include <linux/if_vlan.h>
+#include <linux/ip.h>
+#include <linux/tcp.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -66,7 +68,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"
 
-#define DRV_VERSION "Version 2.0.9.4"
+#define DRV_VERSION "2.0.11.2"
 
 /* S2io Driver name & version. */
 static char s2io_driver_name[] = "Neterion";
@@ -168,6 +170,11 @@ static char ethtool_stats_keys[][ETH_GST
 	{"\n DRIVER STATISTICS"},
 	{"single_bit_ecc_errs"},
 	{"double_bit_ecc_errs"},
+	("lro_aggregated_pkts"),
+	("lro_flush_both_count"),
+	("lro_out_of_sequence_pkts"),
+	("lro_flush_due_to_max_pkts"),
+	("lro_avg_aggr_pkts"),
 };
 
 #define S2IO_STAT_LEN sizeof(ethtool_stats_keys)/ ETH_GSTRING_LEN
@@ -317,6 +324,12 @@ static unsigned int indicate_max_pkts;
 static unsigned int rxsync_frequency = 3;
 /* Interrupt type. Values can be 0(INTA), 1(MSI), 2(MSI_X) */
 static unsigned int intr_type = 0;
+/* Large receive offload feature */
+static unsigned int lro = 0;
+/* Max pkts to be aggregated by LRO at one time. If not specified,
+ * aggregation happens until we hit max IP pkt size(64K)
+ */
+static unsigned int lro_max_pkts = 0xFFFF;
 
 /*
  * S2IO device table.
@@ -1476,6 +1489,19 @@ static int init_nic(struct s2io_nic *nic
 	writel((u32) (val64 >> 32), (add + 4));
 	val64 = readq(&bar0->mac_cfg);
 
+	/* Enable FCS stripping by adapter */
+	add = &bar0->mac_cfg;
+	val64 = readq(&bar0->mac_cfg);
+	val64 |= MAC_CFG_RMAC_STRIP_FCS;
+	if (nic->device_type == XFRAME_II_DEVICE)
+		writeq(val64, &bar0->mac_cfg);
+	else {
+		writeq(RMAC_CFG_KEY(0x4C0D), &bar0->rmac_cfg_key);
+		writel((u32) (val64), add);
+		writeq(RMAC_CFG_KEY(0x4C0D), &bar0->rmac_cfg_key);
+		writel((u32) (val64 >> 32), (add + 4));
+	}
+
 	/*
 	 * Set the time value to be inserted

RE: [PATCH 2.6.12.1 5/12] S2io: Performance improvements

2005-07-08 Thread Ravinandan Arakali
Arthur/David/Jeff,
Thanks for pointing that out. We will wait for any other comments
on our 12 patches. If there are no other, will send out a patch13
to include the mmiowb() change.

Thanks,
Ravi

-Original Message-
From: Arthur Kepner [mailto:[EMAIL PROTECTED]
Sent: Friday, July 08, 2005 8:31 AM
To: Raghavendra Koushik
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; netdev@vger.kernel.org;
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: RE: [PATCH 2.6.12.1 5/12] S2io: Performance improvements


On Thu, 7 Jul 2005, Raghavendra Koushik wrote:

 
 On an Altix machine I believe the readq was necessary to flush
 the PIO writes. How long did you run the tests? I had seen
 in long-duration tests that an occasional write
 (TXDL control word and the address) would be missed and the xmit
 gets stuck.
 

The most recent tests I did used pktgen, and they ran for a total 
time of ~.5 hours (changing pkt_size every 30 seconds or so). The 
pktgen tests and other tests (like nttcp) have been run several times, 
so I've exercised the card for a total of several hours without 
any problems.

 
  
  FWIW, I've done quite a few performance measurements with the patch 
  I posted earlier, and it's worked well. For 1500 byte mtus throughput 
  goes up by ~20%. Is even the mmiowb() unnecessary?
  
 
 Was this on a 2.4 kernel? Because I think the readq would not have a
 significant impact on 2.6 kernels due to TSO
 (with TSO on, the number of packets that actually enter the
 xmit routine would be reduced approximately 40 times).

This was with a 2.6 kernel (with TSO on). PIO reads are pretty 
expensive on Altix, so eliminating them really helps us. 

For big MTUs (>= 4 KBytes) the benefit of replacing the readq()
with mmiowb() in s2io_xmit() is negligible.
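
For reference, the change under discussion amounts to something like the
following (a sketch with placeholder names, not the actual s2io_xmit()
code):

#include <linux/types.h>
#include <asm/io.h>     /* readq/writeq/mmiowb on 64-bit platforms */

/* Post a TxDL doorbell write without the expensive PIO read-back. */
static void post_txdl(void __iomem *doorbell, u64 ctrl)
{
        writeq(ctrl, doorbell);       /* post the TxDL control word        */
        /* old code: (void) readq(status_reg);  -- PIO read just to flush  */
        mmiowb();                     /* orders the MMIO write before the
                                         tx lock is released; much cheaper
                                         than a PIO read on Altix          */
}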

--
Arthur
