Re: r8169: slow samba performance

2007-08-22 Thread Bruce Cole

Shane wrote:
> On Wed, Aug 22, 2007 at 09:39:47AM -0700, Bruce Cole wrote:
> > Shane, join the crowd :)  Try the fix I just re-posted over here:
> >
> Bruce, gigabit speeds thanks for the pointer.  This fix
> works well for me though I just added the three or so lines
> in the elseif statement as it rejected with the
> r8169-20070818.  I suppose I could've merged the whole
> thing and if you need that tested, let me know but this is
> looking good.
  
Glad it works for you.  I'm not the maintainer, and also don't have 
adequate specs from Realtek to definitively explain why the NPQ bit 
apparently needs to be re-enabled when some but not all of the TX FIFO 
is dequeued.  It is documented as if it isn't cleared until the FIFO is 
empty.  So I assume an official patch will have to wait until Francois 
is back.




-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: r8169: slow samba performance

2007-08-27 Thread john


On Wed, 22 Aug 2007, Bruce Cole wrote:
> Shane wrote:
> > On Wed, Aug 22, 2007 at 09:39:47AM -0700, Bruce Cole wrote:
> > > Shane, join the crowd :)  Try the fix I just re-posted over here:
> > >
> > Bruce, gigabit speeds thanks for the pointer.  This fix
> > works well for me though I just added the three or so lines
> > in the elseif statement as it rejected with the
> > r8169-20070818.  I suppose I could've merged the whole
> > thing and if you need that tested, let me know but this is
> > looking good.
>
> Glad it works for you.  I'm not the maintainer, and also don't have adequate
> specs from Realtek to definitively explain why the NPQ bit apparently needs
> to be re-enabled when some but not all of the TX FIFO is dequeued.  It is
> documented as if it isn't cleared until the FIFO is empty.  So I assume an
> official patch will have to wait until Francois is back.



I have had abysmal performance trying to remotely run X apps via ssh on a
computer with a RTL8111 NIC.  Saw this message and decided to give this
patch a try --- success!  Much, much better.

Thanks,

John


Re: r8169: slow samba performance

2007-09-03 Thread Francois Romieu
[EMAIL PROTECTED] <[EMAIL PROTECTED]> :
[...]
> I have had abysmal performance trying to remotely run X apps via ssh on a
> computer with a RTL8111 NIC.  Saw this message and decided to give this
> patch a try --- success!  Much, much better.

Can you give a try to:

http://www.fr.zoreil.com/people/francois/misc/20070903-2.6.23-rc5-r8169-test.patch

or just patches #0001 + #0002 at:

http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.23-rc5/r8169-20070903/


-- 
Ueimor


Re: r8169: slow samba performance

2007-09-04 Thread john


On Mon, 3 Sep 2007, Francois Romieu wrote:
> [EMAIL PROTECTED] <[EMAIL PROTECTED]> :
> [...]
> > I have had abysmal performance trying to remotely run X apps via ssh on a
> > computer with a RTL8111 NIC.  Saw this message and decided to give this
> > patch a try --- success!  Much, much better.
>
> Can you give a try to:
>
> http://www.fr.zoreil.com/people/francois/misc/20070903-2.6.23-rc5-r8169-test.patch
>
> or just patches #0001 + #0002 at:
>
> http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.23-rc5/r8169-20070903/



20070903-2.6.23-rc5-r8169-test.patch applied against 2.6.23-rc5 works fine.
Performance is acceptable.

Would you like me to *just* try patches 1 & 2, to help narrow down anything?

Thanks,

John



Re: r8169: slow samba performance

2007-09-04 Thread Francois Romieu
[EMAIL PROTECTED] <[EMAIL PROTECTED]> :
[...]
> 20070903-2.6.23-rc5-r8169-test.patch applied against 2.6.23-rc5 works fine.
> Performance is acceptable.

Does "acceptable" mean that there is a noticeable difference when compared
to the patch based on a busy-waiting loop ?

> Would you like me to *just* try patches 1 & 2, to help narrow down anything?

I expect patch #2 alone to be enough to enhance the performance. If it gets
proven, the patch would be a good candidate for a quick merge upstream.

-- 
Ueimor


Re: r8169: slow samba performance

2007-09-04 Thread john


On Tue, 4 Sep 2007, Francois Romieu wrote:
> [EMAIL PROTECTED] <[EMAIL PROTECTED]> :
> [...]
> > 20070903-2.6.23-rc5-r8169-test.patch applied against 2.6.23-rc5 works fine.
> > Performance is acceptable.
>
> Does "acceptable" mean that there is a noticeable difference when compared
> to the patch based on a busy-waiting loop ?

Without this patch, latency in bringing up emacs, or in display of pages in
firefox is extremely high.  With the patch, latency is pretty much what I
see when using an old tulip based NIC.

Is there a specific test you wish me to try?

> > Would you like me to *just* try patches 1 & 2, to help narrow down anything?
>
> I expect patch #2 alone to be enough to enhance the performance. If it gets
> proven, the patch would be a good candidate for a quick merge upstream.



Okay, I will build another kernel with just #2 applied.


John




Re: r8169: slow samba performance

2007-09-04 Thread Bruce Cole

Francois Romieu wrote:
> Does "acceptable" mean that there is a noticeable difference when compared
> to the patch based on a busy-waiting loop ?
>
> > Would you like me to *just* try patches 1 & 2, to help narrow down anything?
>
> I expect patch #2 alone to be enough to enhance the performance. If it gets
> proven, the patch would be a good candidate for a quick merge upstream.

Patch #0002 looks functionally equivalent to the patch I already pointed folks
to and which I showed as being sufficient to address the TX queue problem.
The fix has also already been confirmed by Shane, that fix being:

diff -c r/r8169.c r3/r8169-out.c
*** r/r8169.c   2007-08-18 11:54:58.0 -0700
--- r3/r8169-out.c  2007-09-04 14:23:49.0 -0700
***************
*** 2646,2651 ****
--- 2646,2655 ----
  		if (netif_queue_stopped(dev) &&
  		    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
  			netif_wake_queue(dev);
+ 		} else if (dirty_tx != tp->cur_tx) {
+ 			netif_tx_lock(dev);
+ 			RTL_W8(TxPoll, NPQ);
+ 			netif_tx_unlock(dev);
  		}
  	}
  }


In any case, I've tried your latest version of the patch,
0002-r8169-workaround-against-ignored-TxPoll-writes-8168.patch, and it alone
works as well.

I'm not sure why you describe this as being an "8168 hack", given that the
problem has been seen with the 8111b chip (I have an 8111b chip on my Gigabyte
motherboard).

Now since this change heals the TX queue stall, it would seem that the real
underlying problem involves a race condition with enqueueing to the TX queue
while the controller is processing the queue.  I bet that either the ultimate
fix is to address locking at TX enqueue time, or there is a controller bug.
Any clarification from Realtek on the necessary processing for the NPQ bit, or
a known controller problem?

PS: I've also received private email that this problem pertains to video
streaming (to a Kiss DVD player), not just samba or X11 traffic.  Basically,
almost all high-level TCP-based protocols are affected, it seems.  This serious
performance problem should be considered to impact a lot more than just samba
users.





Re: r8169: slow samba performance

2007-09-09 Thread David Madsen
>Does "acceptable" mean that there is a noticeable difference when compared
>to the patch based on a busy-waiting loop ?

I noticed a somewhat significant difference between patch #0002 and a
busy wait loop with ndelay(10). Write performance was equivalent in
both cases as should be the case.  Read performance for me maxed out
around 150ish megabit whereas switching to the ndelay(10) loop brought
up average performance around 350ish megabit while reading the same
files over samba.

--David Madsen


Re: r8169: slow samba performance

2007-09-09 Thread Francois Romieu
David Madsen <[EMAIL PROTECTED]> :
> >Does "acceptable" mean that there is a noticeable difference when compared
> >to the patch based on a busy-waiting loop ?
> 
> I noticed a somewhat significant difference between patch #0002 and a
> busy wait loop with ndelay(10). Write performance was equivalent in
> both cases as should be the case.  Read performance for me maxed out

Do you have some (gross) figure for the write performance ?

> around 150ish megabit whereas switching to the ndelay(10) loop brought
> up average performance around 350ish megabit while reading the same
> files over samba.

Hardly ecstatic. :o/

Do you see a difference in the system load too, say a few lines of 'vmstat 1' ?

Can you add the patch below on top of #0002 and see if there is some
benefit from it ?

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b85ab4a..8d8fff3 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2457,6 +2457,7 @@ static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	smp_wmb();
 
 	RTL_W8(TxPoll, NPQ);	/* set polling bit */
+	RTL_R8(TxPoll);
 
 	if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) {
 		netif_stop_queue(dev);


I'd welcome if you could try the patch below on top of #0002 too:

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b85ab4a..840df3b 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2457,6 +2457,17 @@ static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	smp_wmb();
 
 	RTL_W8(TxPoll, NPQ);	/* set polling bit */
+{
+	static unsigned int wait_max = 0;
+	unsigned i;
+
+	for (i = 0; (RTL_R8(TxPoll) & NPQ) && (i < 1000); i++)
+		ndelay(10);
+	if (i > wait_max) {
+		wait_max = i;
+		printk(KERN_INFO "%s: wait_max = %d\n", dev->name, wait_max);
+	}
+}
 
 	if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) {
 		netif_stop_queue(dev);

-- 
Ueimor


Re: r8169: slow samba performance

2007-09-13 Thread David Madsen
> > I noticed a somewhat significant difference between patch #0002 and a
> > busy wait loop with ndelay(10). Write performance was equivalent in
> > both cases as should be the case.  Read performance for me maxed out
>
> Do you have some (gross) figure for the write performance ?

Write performance was quite high, around 650-700 megabit no doubt due
to caching behavior on the server, and it was similar in both cases.
I believe the machine with the r8169 was CPU bound at this point or it
probably would have been even higher.

Sorry for the slow response, your reply ended up buried in a deluge of
email and I just dug it out.  I'll give these other patches a spin in
the next couple days when I get a chance and see if things improve.

--David Madsen


Re: r8169: slow samba performance

2007-09-15 Thread David Madsen
> Do you see a difference in the system load too, say a few lines of 'vmstat 1' ?

This is running on a dual core machine which explains the 50/50
sys/idle in vmstat.

with 8168 hack  (patch #0002):

writes:
isis tmp # dd if=/dev/zero of=test.fil bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 13.5288 s, 77.5 MB/s

procs -----------memory---------- ---swap-- -----io---- ---system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  0    200 714568   2944 227964    0    0     4 74592 44243 12190  1 45 52  2
 1  0    200 637656   3028 305016    0    0     0 79324 45579 12795  1 48 27 24
 1  0    200 556688   3112 383816    0    0     4 82880 48222 13901  1 49 50  1
 1  0    200 475936   3196 462280    0    0     8 78736 47925 13942  1 50 49  1
 0  0    200 394992   3284 540676    0    0    12 74592 47657 13949  1 48 50  1

reads:
isis tmp # dd if=test.fil of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 34.7543 s, 30.2 MB/s

procs -----------memory---------- ---swap-- -----io---- ---system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 0  0    200  18652   3952 913524    0    0 19996     0 18650  2000  1 13 83  4
 0  0    200  19476   3940 912772    0    0 24576     0 15846  1725  0 12 88  0
 0  0    200  22540   3940 910156    0    0 14336     0  9470  1024  0  8 91  0
 0  0    200  25180   3924 907660    0    0 14336     0 10084  1076  0  8 92  0
 1  0    200  17764   3920 915248    0    0 24576     0 16046  1669  1 13 87  0
 0  0    200  18732   3936 914528    0    0 24592     0 17136  1963  0 13 86  1
 0  0    200  29152   3924 904816    0    0 40996     0 25609  2776  0 19 80  1
 0  0    200  35040   3880 899548    0    0 36872     0 24288  2589  1 18 81  1
 0  0    200  15524   3904 919540    0    0 36888     0 24506  2664  0 19 80  0
 0  0    200  23964   3904 911872    0    0 43048    64 27498  2934  0 22 76  2
 0  0    200  15960   3908 920564    0    0 59444     0 38224  4096  0 29 68  3
 0  0    200  14936   3908 921916    0    0 26652     0 18401  1957  1 15 82  3
 1  0    200  30392   3916 906864    0    0 10248     0  7225   863  0  6 94  1
 0  0    200  14836   3896 922768    0    0 32796     0 20830  2313  1 16 80  4
 0  0    200  35152   3896 902788    0    0 30748     0 20679  2340  0 16 79  5

with ndelay(10) loop:

writes:
isis tmp # dd if=/dev/zero of=test.fil bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 13.7967 s, 76.0 MB/s

procs -----------memory---------- ---swap-- -----io---- ---system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  0      0 694080   6448 248252    0    0     0 78736 44235 12588  1 50 50  0
 1  0      0 613448   6524 326400    0    0     0 78736 44215 12477  1 50 50  0
 1  0      0 535612   6628 403796    0    0     8 91668 45789 10741  0 52 23 24
 0  0      0 454132   6704 482848    0    0     0 78736 47082 10795  1 51 49  0
 1  0      0 373804   6784 560780    0    0     4 75008 46826 10418  1 49 51  0
 1  0      0 292216   6860 639976    0    0     0 82880 47279 10544  1 51 49  0

reads:
isis tmp # dd if=test.fil of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 21.894 s, 47.9 MB/s

procs -----------memory---------- ---swap-- -----io---- ---system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  0      0  14968   3736 920920    0    0 44968     0 21886  3068  0 33 60  7
 0  0      0  13120   3672 922908    0    0 32768     0 17486  2326  0 26 74  0
 1  0      0  24796   3640 912136    0    0 51212     4 29417  3637  1 44 56  0
 1  0      0  52104   3600 886072    0    0 49688     0 29329  3602  0 42 57  0
 1  0      0  44548   3564 894164    0    0 43808     0 28066  3418  0 41 57  2
 0  0      0  16676   3608 922400    0    0 47148     0 25292  3313  0 36 59  4
 0  0      0  37392   3604 902580    0    0 51248     0 27554  3512  1 40 59  1
 1  0      0  23988   3648 916468    0    0 49196     0 26621  3393  0 39 61  1
 1  0      0  16232   3696 924692    0    0 51248     0 28792  3536  1 43 56  1
 0  1      0  18548   3732 921472    0    0 39416     0 21470  2736  1 31 66  2
 2  0      0  13620   3760 928296    0    0 40520     0 22081  2818  0 33 65  1
 0  0      0  18828   3732 923900    0    0 53252     0 28577  3611  0 43 57  0
 1  0    160  13308   3736 929712    0    0 43012     0 22924  2920  1 33 66  0
 1  0    176  13316   2668 931348    0    0 40964     0 23122  2899  0 34 66  0
 0  0    176  13764   1900 932416    0    0 53260     0 28571  3601  0 42 57  0
 0  1    176  14076   1744 931300    0    0 51672     0 28600  3845  1 42 58  0
 1  0    176  16380   1620 931164    0    0 52828     0 27832  3518  1 41 58  1

Load is definitely higher with the ndelay(10) loop but throughput on
reads is quite a bit better as well.

> Can you add the patch below on top of #0002 and see if there is some
> benefit from it ?

writes:
isis tmp # dd if=/dev/zero of=test.fil bs=1M count=1000
1000+