Re: [RFT] r8169 changes against 2.6.23-rc3

2007-08-21 Thread Bruce Cole
So I did some experimenting with locking, but eventually found that this 
chunk:


@@ -2677,10 +2681,18 @@ static void rtl8169_tx_interrupt(struct net_device *dev,

 	if (tp->dirty_tx != dirty_tx) {
 		tp->dirty_tx = dirty_tx;
-		smp_wmb();
-		if (netif_queue_stopped(dev) &&
-		    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
-			netif_wake_queue(dev);
+		smp_mb();
+		if (unlikely(netif_queue_stopped(dev))) {
+			netif_tx_lock(dev);
+			if (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)
+				netif_wake_queue(dev);
+			if (dirty_tx != tp->cur_tx)
+				RTL_W8(TxPoll, NPQ);
+			netif_tx_unlock(dev);
+		} else if (dirty_tx != tp->cur_tx) {
+			netif_tx_lock(dev);
+			RTL_W8(TxPoll, NPQ);
+			netif_tx_unlock(dev);
 		}
 	}
 }

from the patch in http://www.spinics.net/lists/netdev/msg33960.html
was sufficient to fix the stuck TX queue bug without the busy-wait.
Actually, just the else portion of the above chunk was sufficient in my
testing, without the barrier change or the if statement change.
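
To make the suspected race concrete, here is a minimal user-space sketch of
why re-writing NPQ from the TX interrupt would unstick the queue. It is only
a model: the struct, the function names and the NIC behaviour (a doorbell
written while the NIC is busy is lost once the NIC clears the bit) are
assumptions for illustration, not the driver's code or documented hardware
behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the TX ring; not the driver's real types. */
struct tx_model {
	uint32_t cur_tx;   /* next descriptor the driver will fill */
	uint32_t dirty_tx; /* next descriptor the driver will reclaim */
	uint32_t hw_tx;    /* next descriptor the modelled NIC will send */
	bool npq;          /* modelled TxPoll.NPQ doorbell bit */
};

/* Driver side: queue one packet and ring the doorbell. */
static void model_xmit(struct tx_model *m)
{
	m->cur_tx++;
	m->npq = true;
}

/* NIC side: start a pass, remembering how far the ring was filled. */
static uint32_t model_hw_latch(const struct tx_model *m)
{
	return m->cur_tx;
}

/*
 * NIC side: finish the pass and clear the doorbell.
 * ASSUMPTION: packets queued after the latch are not noticed, and the
 * doorbell write that came with them is wiped out by this clear.
 */
static void model_hw_complete(struct tx_model *m, uint32_t latched)
{
	assert(m->npq); /* a pass only runs after a doorbell write */
	m->hw_tx = latched;
	m->npq = false;
}

/* Driver side: TX interrupt. With rekick set, mimic the "else" branch. */
static void model_tx_interrupt(struct tx_model *m, bool rekick)
{
	m->dirty_tx = m->hw_tx;
	if (rekick && m->dirty_tx != m->cur_tx)
		m->npq = true; /* stands in for RTL_W8(TxPoll, NPQ) */
}

/* Returns true if a packet is left queued with no doorbell pending. */
static bool model_race_leaves_packet_stuck(bool rekick)
{
	struct tx_model m = { 0 };
	uint32_t latched;

	model_xmit(&m);                 /* first packet, doorbell rung */
	latched = model_hw_latch(&m);   /* NIC starts processing */
	model_xmit(&m);                 /* second packet arrives while NIC is busy */
	model_hw_complete(&m, latched); /* NIC finishes and clears NPQ */
	model_tx_interrupt(&m, rekick);

	return m.cur_tx != m.dirty_tx && !m.npq;
}
```

In this model, model_race_leaves_packet_stuck(false) reports a stuck packet,
while model_race_leaves_packet_stuck(true) does not, which matches what I saw
with and without the else branch.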

David Gundersen pointed me to this potential fix days ago, but I didn't
consider it first since the change had (presumably intentionally) been
dropped from the set of diffs Francois pointed me to.  Given that I had
reported the same problem as David Gundersen (and Dirk, and other Samba
users...), I thought this patch had been ruled out.  Apparently not.
Hopefully this can be dusted off and made into a fairly high priority fix,
as it has been biting Realtek users since last year at least.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT] r8169 changes against 2.6.23-rc3

2007-08-20 Thread Chuck Lever

Francois Romieu wrote:
> The latest series of r8169 changes is available against 2.6.23-rc3 as:
> http://www.fr.zoreil.com/people/francois/misc/20070818-2.6.23-rc3-r8169-test.patch
>
> or (tarball sits one level higher):
>
> http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.23-rc3/r8169-20070818/
>
> or (rebase prone branch)
>
> git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git#r8169
>
> Please do not clone your whole git kernel tree from here, thanks.
>
> Changes (most recent first):
>
> - eeprom read support
> - phy init cleanup
> - PHY init for the 8168
> - make room for more PHY init changes
> - remove dead wood
> - add MAC identifiers
> - MSI support
> - correct phy parameters for the 8110SC
>
> The first patch of the series ("correct phy parameters for the 8110SC") has
> been elaborated with Edward Hsu from Realtek and it should help some owners
> of 8169 chipsets. If there is no report of regression for it on any
> chipset and it is reported to fix someone's problems, I will send it to
> Jeff Garzik for inclusion in 2.6.23 as a bugfix.
>
> Anything else in this series has not been tested on a wide scale nor acked
> by the manufacturer: I consider it post 2.6.23 material. That being said,
> the MSI changes seem fine and the "PHY init for the 8168" patch could make
> a difference for the users of the 8168 whose link is not properly
> negotiated.
>
> Success and failure reports or patches will be welcome. Please Cc: netdev
> and include "r8169" in the Subject.


Tested 2.6.23-rc3 plus your patch on my dual-R8169 mini-ITX Jetway 
J7F4K1G2E mainboard.  No problems to report.



Re: [RFT] r8169 changes against 2.6.23-rc3

2007-08-20 Thread Dirk
On 8/19/07, Bruce Cole <[EMAIL PROTECTED]> wrote:
> So it seems that when the driver tries to queue a packet while the
> controller is busy processing the queue, the newly queued packet does
> not get noticed by the controller (until further packet activity occurs).
> Perhaps there is a problem with the memory barriers when adding to the
> TX queue, but I'm a newbie on linux kernel memory barriers.

One thing I noticed a while ago (in March) is that flood-pinging (ping -f)
the r8169 host from an external system also increases performance
without any code changes.

My original post about the problem:
http://marc.info/?l=linux-netdev&m=117207362010321&w=2

I ended up (until now, perhaps :-) disabling the onboard NIC and
adding an e1000 card.


Kind regards,
Dirk


Re: [RFT] r8169 changes against 2.6.23-rc3

2007-08-18 Thread Bruce Cole

Francois Romieu wrote:
> The latest series of r8169 changes is available against 2.6.23-rc3 as:
> http://www.fr.zoreil.com/people/francois/misc/20070818-2.6.23-rc3-r8169-test.patch
>
> or (tarball sits one level higher):
>
> http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.23-rc3/r8169-20070818/
>
> or (rebase prone branch)
  
I applied these patches (except for the eeprom read support patch) to
2.6.23-rc3, and they did not fix the TX performance problem.  The
busy-wait workaround is still required to dramatically improve
performance.  I experimented some more and found that ndelay(10) is a
sufficient delay within the for loop, as David Gundersen suggested.
With some diagnostic code, I can see that when there is an NPQ problem,
the busy wait calls ndelay(10) 4 or 5 times before the NPQ bit clears,
and meanwhile the TX queue only contains a low number of entries (2 to 5
typically).  This is a bit surprising: I would have guessed the root
problem was related to the queue filling up.
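
For anyone who has not seen the workaround, the busy-wait amounts to polling
the NPQ bit with a short delay between reads. Below is a rough user-space
sketch of that kind of bounded poll loop, not the actual workaround patch.
The callback types, wait_npq_clear(), the fake_nic helper and the 0x40 value
used for the NPQ bit are all assumptions made so the loop can be exercised
outside the kernel; in the driver the read would be a register read of TxPoll
and the delay would be ndelay(10).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPQ_BIT 0x40u /* assumed value of the NPQ flag in TxPoll */

/* Hypothetical callbacks standing in for the register read and ndelay(). */
typedef uint8_t (*read_txpoll_fn)(void *ctx);
typedef void (*delay_ns_fn)(void *ctx, unsigned int ns);

/*
 * Poll until NPQ reads back clear, delaying 10 ns between reads.
 * Returns the number of delays that were needed, or -1 if the bit was
 * still set after max_polls reads.
 */
static int wait_npq_clear(read_txpoll_fn rd, delay_ns_fn dl, void *ctx,
			  int max_polls)
{
	assert(rd != NULL && dl != NULL);

	for (int i = 0; i < max_polls; i++) {
		if (!(rd(ctx) & NPQ_BIT))
			return i;
		dl(ctx, 10);
	}
	return -1;
}

/* Fake NIC for illustration: NPQ clears after a fixed number of delays. */
struct fake_nic {
	int polls_until_clear;
	unsigned int delayed_ns;
};

static uint8_t fake_read_txpoll(void *ctx)
{
	struct fake_nic *nic = ctx;

	return nic->polls_until_clear > 0 ? NPQ_BIT : 0;
}

static void fake_delay_ns(void *ctx, unsigned int ns)
{
	struct fake_nic *nic = ctx;

	if (nic->polls_until_clear > 0)
		nic->polls_until_clear--;
	nic->delayed_ns += ns;
}
```

With a fake_nic whose polls_until_clear is 5, wait_npq_clear() returns 5
after 50 ns of accumulated delay, which is in the range my diagnostic code
reported.  The max_polls bound is only there so the loop cannot spin forever
if the bit never clears.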

I was then suspicious that maybe there was an issue with TX interrupts 
not being serviced in a timely manner due to the NAPI RX support, but 
making the TX processing work without NAPI did not solve things.


I took a look at tcpdump output from both the sender and receiver and
found that when there is a problem, the sender is apparently delaying
transmission of packets.  TCP then retransmits, and duplicate TCP
sequences arrive at the receiver at that later time.


So it seems that when the driver tries to queue a packet while the 
controller is busy processing the queue, the newly queued packet does 
not get noticed by the controller (until further packet activity occurs).
Perhaps there is a problem with the memory barriers when adding to the 
TX queue, but I'm a newbie on linux kernel memory barriers.


tcpdump sample follows, with the Realtek interface on the sending side.
In the sample, notice that the data at sequence #1429358 is
retransmitted by the sender after 0.3 seconds, but the receiver receives
both copies at basically the same time (delayed by approximately 0.3
seconds).
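
The 0.3 second figure comes from subtracting the two sender-side timestamps
for sequence 1429358 (13:42:10.326614 and 13:42:10.652691).  A small helper
along these lines can do that arithmetic; tcpdump_ts_to_us() is a
hypothetical name, and it assumes the default HH:MM:SS.uuuuuu timestamp
format with exactly six fractional digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Convert a tcpdump "HH:MM:SS.uuuuuu" timestamp to microseconds since
 * midnight.  Returns -1 if the string does not parse.
 * ASSUMPTION: the fractional part always has six digits.
 */
static int64_t tcpdump_ts_to_us(const char *ts)
{
	int h, m, s;
	long us;

	assert(ts != NULL);

	if (sscanf(ts, "%d:%d:%d.%6ld", &h, &m, &s, &us) != 4)
		return -1;

	return ((int64_t)h * 3600 + (int64_t)m * 60 + s) * 1000000 + us;
}
```

Subtracting tcpdump_ts_to_us("13:42:10.326614") from
tcpdump_ts_to_us("13:42:10.652691") gives 326077 microseconds between the
original transmission and the retransmission.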



Sender:

13:42:10.325958 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1414878:1416326(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.325979 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1416326:1417774(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.325999 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1417774:1419222(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.326018 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1419222:1420670(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.326036 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1420670:1422118(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.326056 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1422118:1423566(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.326076 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1423566:1425014(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.326094 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1425014:1426462(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.326114 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1426462:1427910(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.326565 IP 192.168.1.12.60994 > 192.168.1.11.netbios-ssn: . ack 1417774 win 2056
13:42:10.326607 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1427910:1429358(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.326614 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1429358:1430806(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.326620 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: P 1430806:1431325(519) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:10.326626 IP 192.168.1.12.60994 > 192.168.1.11.netbios-ssn: . ack 1427910 win 1907
13:42:10.366646 IP 192.168.1.12.60994 > 192.168.1.11.netbios-ssn: . ack 1429358 win 2056
13:42:10.652691 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1429358:1430806(1448) ack 5775 win 54 4346050> NBT Session Packet: Session Message
13:42:10.653636 IP 192.168.1.12.60994 > 192.168.1.11.netbios-ssn: . ack 1430806 win 2056
13:42:10.653671 IP 192.168.1.12.60994 > 192.168.1.11.netbios-ssn: . ack 1431325 win 2056
13:42:10.653698 IP 192.168.1.12.60994 > 192.168.1.11.netbios-ssn: . ack 1431325 win 2056 {1429358:1430806}>


Receiver:
13:42:11.516820 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1414878:1416326(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:11.516841 IP 192.168.1.11.netbios-ssn > 192.168.1.12.60994: . 1416326:1417774(1448) ack 5775 win 54 4346009> NBT Session Packet: Session Message
13:42:11.516856 IP 192.168.1.11.netbi