Re: Doubts about listen backlog and tcp_max_syn_backlog

2013-01-27 Thread Nivedita Singhvi
On 01/25/2013 02:05 AM, Leandro Lucarella wrote:
> On Thu, Jan 24, 2013 at 10:12:46PM -0800, Nivedita Singhvi wrote:
>>>>> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
>>>>> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
>>>>> this stat anymore, or the name was changed). I still don't know if we
>>>>> are talking about the same thing.
>>>>
>> [snip]
>>>> I will sometimes be tripped-up by netstat's not showing a statistic
>>>> with a zero value...
>>
>> Leandro, you should be able to run nstat -z; it will print all
>> counters even if they are zero. You should see something like this:
>>
>> ipv4]> nstat -z
>> #kernel
>> IpInReceives                    2135               0.0
>> IpInHdrErrors                   0                  0.0
>> IpInAddrErrors                  202                0.0
>> ...
>>
>> You might want to take a look at those (your pkts may not even be
>> making it to tcp) and these in particular:
>>
>> TcpExtSyncookiesSent            0                  0.0
>> TcpExtSyncookiesRecv            0                  0.0
>> TcpExtSyncookiesFailed          0                  0.0
>> TcpExtListenOverflows           0                  0.0
>> TcpExtListenDrops               0                  0.0
>> TcpExtTCPBacklogDrop            0                  0.0
>> TcpExtTCPMinTTLDrop             0                  0.0
>> TcpExtTCPDeferAcceptDrop        0                  0.0
>>
>> If you don't have nstat on that version for some reason, download the
>> latest iproute pkg. Looking at the counter names is a lot more helpful
>> and precise than the netstat conversion for human consumption. 
> 
> Thanks, but what about this?
> 
> pc2 $ nstat -z | grep -i drop
> TcpExtLockDroppedIcmps          0                  0.0
> TcpExtListenDrops               0                  0.0
> TcpExtTCPPrequeueDropped        0                  0.0
> TcpExtTCPBacklogDrop            0                  0.0
> TcpExtTCPMinTTLDrop             0                  0.0
> TcpExtTCPDeferAcceptDrop        0                  0.0

That seems bogus. 


> pc2 $ netstat -s | grep -i drop
> 470 outgoing packets dropped
> 5659740 SYNs to LISTEN sockets dropped
> 
> Is this normal?

That's a lot of connect requests dropped, but it depends on how 
long you've been up and how much traffic you've seen. 

Hmm...you were on an older Ubuntu, right? The netstat source 
was patched to translate it as follows:

+{ "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },

(see the file debian/patches/CVS-20081003-statistics.c_sync.patch 
 in the net-tools src)

i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
TcpExtListenDrops. 

Theoretically, that number should be the same as the one printed by nstat,
as they are getting it from the same kernel stats counter. I have not
looked at the nstat code; I actually almost always dump the counters from
/proc/net/{netstat + snmp} via a simple prettyprint script (will send
you that offline).  
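
For illustration, a minimal sketch along those lines (not the actual script
mentioned above, just the idea; both files come as pairs of header/value
lines that zip together into nstat-style names):

#!/usr/bin/env python
# Prettyprint /proc/net/snmp and /proc/net/netstat: each protocol block is
# a "Proto: name1 name2 ..." line followed by a "Proto: val1 val2 ..." line.
def dump(path):
    with open(path) as f:
        lines = [line.split() for line in f]
    for hdr, vals in zip(lines[0::2], lines[1::2]):
        proto = hdr[0].rstrip(':')
        for name, val in zip(hdr[1:], vals[1:]):
            # proto + name gives the same names nstat prints,
            # e.g. TcpExtListenDrops.
            print("%-32s %16s" % (proto + name, val))

for path in ("/proc/net/snmp", "/proc/net/netstat"):
    dump(path)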

If the nstat and netstat counters don't match, something is fishy.
That nstat output is broken.  

>>> Yes, I already did captures and we are definitely losing packets
>>> (including SYNs), but it looks like the amount of SYNs I'm losing is
>>> lower than the amount of long connect() times I observe. This is not
>>> confirmed yet, I'm still investigating.
>>
>> Where did you narrow down the drop to? There are quite a few places in
>> the networking stack we silently drop packets (such as the one pointed
>> out earlier in this thread), although they should almost all be
>> extremely low probability/NEVER type events. Do you want a patch to
>> gap the most likely scenario? (I'll post that to netdev separately). 
> 
> Even though that would be awesome, unfortunately there is no way I could
> get permission to run a patched kernel (or even restart the servers for
> that matter).
> 
> And I don't know how I could narrow down the drops in any way. What I
> do know is that, capturing traffic with tcpdump, I see some packets leaving one
> server but never arriving at the new one.

Hmm... do you have a switch between your two endpoints dropping pkts? 
Could be. Basically, by looking at the statistics kept by each layer, you 
should be able to narrow it down at least a little. 

It does still sound like some drops are occurring in TCP because the accept 
backlog is full and you're overrunning TCP incoming processing (or at least 
that this is contributing), going by that ListenDrops count. 
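
(As an aside, a quick way to watch ListenOverflows/ListenDrops tick is to
overflow a listener's accept queue on purpose. A toy sketch of my own, not
anything from this thread; exactly which counters move also depends on
syncookies and tcp_abort_on_overflow:)

#!/usr/bin/env python
# Listen with a tiny backlog, never call accept(), then fire more connects
# at the socket than the backlog can hold. Run "nstat" alongside and watch
# TcpExtListenOverflows / TcpExtListenDrops.
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))          # any free port
srv.listen(1)                       # deliberately tiny accept backlog
port = srv.getsockname()[1]

clients = []
for i in range(32):                 # far more connects than the backlog
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.settimeout(2.0)
    try:
        c.connect(("127.0.0.1", port))
    except (socket.timeout, socket.error) as e:
        # timeouts (or RSTs, depending on tcp_abort_on_overflow) once full
        print("connect %d: %s" % (i, e))
    clients.append(c)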

> Also, the hardware is not great either; I'm not sure it is not responsible
> for the loss. There are some errors reported by ethtool, but I don't
> know exactly what they mean:
> 
> # ethtool -S eth0
> NIC statistics:
>      tx_packets: 336978308273
>      rx_packets: 384108075585
>      tx_errors: 0
>      rx_errors: 194
>      rx_missed: 1119
>      align_errors: 31731
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      unicast: 384108023754
>      broadcast: 51825
>      multicast: 6
>      tx_aborted: 0
>      tx_underrun: 0
> 
> Thanks!

You aren't suffering a lot of packet loss at the NIC.  

Sorry, I'm on the road


Re: Doubts about listen backlog and tcp_max_syn_backlog

2013-01-24 Thread Nivedita Singhvi
On 01/24/2013 11:21 AM, Leandro Lucarella wrote:
> On Thu, Jan 24, 2013 at 10:44:32AM -0800, Rick Jones wrote:
>> On 01/24/2013 04:22 AM, Leandro Lucarella wrote:
>>> On Wed, Jan 23, 2013 at 11:28:08AM -0800, Rick Jones wrote:
> Then if syncookies are enabled, the time spent in connect() shouldn't be
> bigger than 3 seconds even if SYNs are being "dropped" by listen, right?

 Do you mean if "ESTABLISHED" connections are dropped because the
 listen queue is full?  I don't think I would put that as "SYNs being
 dropped by listen" - too easy to confuse that with an actual
 dropping of a SYN segment.
>>>
>>> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
>>> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
>>> this stat anymore, or the name was changed). I still don't know if we
>>> are talking about the same thing.
>>
[snip]
>> I will sometimes be tripped-up by netstat's not showing a statistic
>> with a zero value...

Leandro, you should be able to run nstat -z; it will print all counters even 
if they are zero. You should see something like this:

ipv4]> nstat -z
#kernel
IpInReceives                    2135               0.0
IpInHdrErrors                   0                  0.0
IpInAddrErrors                  202                0.0
...

You might want to take a look at those (your pkts may not even be making it to 
tcp) and these in particular:

TcpExtSyncookiesSent            0                  0.0
TcpExtSyncookiesRecv            0                  0.0
TcpExtSyncookiesFailed          0                  0.0
TcpExtListenOverflows           0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPBacklogDrop            0                  0.0
TcpExtTCPMinTTLDrop             0                  0.0
TcpExtTCPDeferAcceptDrop        0                  0.0

If you don't have nstat on that version for some reason, download the latest 
iproute pkg. Looking at the counter names is a lot more helpful and precise 
than the netstat conversion for human consumption. 


> Yes, I already did captures and we are definitely losing packets
> (including SYNs), but it looks like the amount of SYNs I'm losing is
> lower than the amount of long connect() times I observe. This is not
> confirmed yet, I'm still investigating.

Where did you narrow down the drop to? There are quite a few places in the 
networking stack we silently drop packets (such as the one pointed out earlier 
in this thread), although they should almost all be extremely low 
probability/NEVER type events. Do you want a patch to gap the most likely 
scenario? (I'll post that to netdev separately). 
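
(One low-impact way to narrow it down, for what it's worth: snapshot the
per-interface counters kept below TCP before and after a window where the
slow connect()s happen, and see what moved. A rough sketch of my own; the
interface name is an assumption:)

#!/usr/bin/env python
# Diff /sys/class/net/<dev>/statistics before and after the problem window.
# If nothing moves here while ListenDrops keeps climbing, the drops are
# happening inside TCP rather than at the NIC/driver.
import os
import time

DEV = "eth0"                        # adjust to the interface in use
STATS = "/sys/class/net/%s/statistics" % DEV

def snapshot():
    counters = {}
    for name in os.listdir(STATS):
        with open(os.path.join(STATS, name)) as f:
            counters[name] = int(f.read())
    return counters

before = snapshot()
time.sleep(60)                      # reproduce the slow connect()s here
after = snapshot()

for name in sorted(after):
    delta = after[name] - before[name]
    if delta:
        print("%-24s +%d" % (name, delta))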

thanks,
Nivedita

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] MAINTAINERS: Update John Stultz email

2013-01-23 Thread Nivedita Singhvi
John's email switched from IBM to Linaro. One less place for him to update 
now...

Signed-off-by: Nivedita Singhvi <n...@us.ibm.com>
---
 MAINTAINERS |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4e734ed..59e68d8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6596,7 +6596,7 @@ F:drivers/dma/dw_dmac_regs.h
 F: drivers/dma/dw_dmac.c
 
 TIMEKEEPING, NTP
-M: John Stultz <johns...@us.ibm.com>
+M: John Stultz <john.stu...@linaro.org>
 M: Thomas Gleixner <t...@linutronix.de>
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/core
 S: Supported
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Poor UDP performance using 2.6.21-rc5-rt5

2007-04-01 Thread Nivedita Singhvi

Dave Sperry wrote:

Hi


(adding netdev to cc list)

I have a dual core Opteron machine that exhibits poor UDP performance 
(RT consumes more than 2X cpu) with the 2.6.21-rc5-rt5 as compared to 
2.6.21-rc5. Top shows the IRQ handler consuming a lot of CPU.


Dave, any chance you've got oprofile working on the -rt5?
And I'm assuming nothing very different in the stats or errors
through both runs?

thanks,
Nivedita

The mother board is a Supermicro H8DME-2 with one dual core Opteron 
installed. The networking is provided by the on board nVidia MCP55Pro chip.


The RT test is done using netperf 2.4.3 with the server on an IBM LS20 
blade running RHEL4U2 and the Supermicro running netperf under RHEL5 
with 2.6.21-rc5-rt5.
The Non-RT test was done on the exact same setup except 2.6.21-rc5 
was loaded on the SuperMicro board.


Cyclesoak was used to measure CPU utilization in all cases.



Here are the RT results
## 2.6.21-rc5-rt5
###
$ !netper
netperf -l 100 -H 192.168.70.11 -t UDP_STREAM -- -m 1025
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.70.11 (192.168.70.11) port 0 AF_INET

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay      Errors   Throughput
bytes   bytes    secs         #         #        10^6bits/sec

126976    1025   100.00       8676376       0      711.46
135168           100.00       8676376              711.46

## cyclesoak during test
$ ./cyclesoak
using 2 CPUs
System load: -0.1%
System load: 40.5%
System load: 51.6%
System load: 51.5%
System load: 50.9%
System load: 50.7%
System load: 50.8%
System load: 50.7%
System load: 50.6%

 top during test
top - 13:26:48 up 8 min,  4 users,  load average: 1.74, 0.46, 0.15
Tasks: 149 total,   4 running, 145 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us, 16.8%sy, 50.6%ni,  0.0%id,  0.0%wa, 25.6%hi,  6.3%si,  0.0%st

Mem:   2035444k total,   465888k used,  1569556k free,28840k buffers
Swap:  3068372k total,0k used,  3068372k free,   318668k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
3865 eadi  39  19  6804 1164  108 R  100  0.1   0:38.25 cyclesoak
2715 root -51  -5 000 S   51  0.0   0:09.52 IRQ-8406
3867 eadi  25   0  6440  632  480 R   34  0.0   0:06.03 netperf
  19 root -51   0     0    0    0 S   13  0.0   0:02.33 softirq-net-tx/

3866 eadi  39  19  6804 1164  108 R1  0.1   0:20.47 cyclesoak
3167 root  25   0 29888 1180  888 S0  0.1   0:00.93 automount
3861 eadi  15   0 12712 1076  788 R0  0.1   0:00.19 top
   1 root  18   0 10308  668  552 S0  0.0   0:00.67 init 
2 root  RT   0 000 S0  0.0   0:00.00 migration/0   
   3 root  RT   0 000 S0  0.0   0:00.00 posix_cpu_timer

   4 root -51   0 000 S0  0.0   0:00.00 softirq-high/0
   5 root -51   0 000 S0  0.0   0:00.00 softirq-timer/0
   6 root -51   0 000 S0  0.0   0:00.00 softirq-net-tx/
   7 root -51   0 000 S0  0.0   0:00.00 softirq-net-rx/
   8 root -51   0 000 S0  0.0   0:00.00 softirq-block/0


The baseline results:
RHEL5 with 2.6.21-rc5 kernel
##

$  netperf -l 100 -H 192.168.70.11 -t UDP_STREAM -- -m 1025

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.70.11 (192.168.70.11) port 0 AF_INET

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay      Errors   Throughput
bytes   bytes    secs         #         #        10^6bits/sec

126976    1025   100.00      11405485       0      935.24
135168           100.00      11405485             935.24

###
$ ./cyclesoak
using 2 CPUs
System load:  7.6%
System load: 29.6%
System load: 29.6%
System load: 28.9%
System load: 24.9%
System load: 25.0%
System load: 24.8%
System load: 24.9%

###
top:top - 13:52:22 up 10 min,  6 users,  load average: 1.46, 0.43, 0.17
Tasks: 118 total,   4 running, 114 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  9.8%sy, 75.7%ni,  0.0%id,  0.0%wa,  5.8%hi,  8.1%si,  0.0%st

Mem:   2057200k total,   459128k used,  1598072k free,29020k buffers
Swap:  3068372k total,0k used,  3068372k free,   318968k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
3882 eadi  39  19  6804 1164  108 R  100  0.1   0:52.11 cyclesoak
3881 eadi  39  19  6804 1164  108 R   65  0.1   0:38.47 cyclesoak
3883 eadi  15   0  6436  632  480 R   35  0.0   0:18.26 netperf
3879 eadi  15   0 12580 1052  788 R0  0.1   0:00.15 top
   1 root  18   0 10308  664  552 S0  0.0   0:00.48 init 
2 root  RT   0 000 S0  0.0   0:00.00 migration/0

   3 root  34  19 000 S0  0.0   


Re: Client receives TCP packets but does not ACK

2001-07-01 Thread Nivedita Singhvi

> The bad network behavior was due to shared irqs somehow screwing 
> things up. This explained most but not all of the problems. 

ah, that's why your test pgm succeeded on my systems..
 
> When I last posted I had a reproducible test case which spewed a bunch 
> of packets from a server to a client. The behavior is that the client 
> eventually stops ACKing and so the connection stalls indefinitely. 
> packet. I added printk statements for each of these conditions in 
> hopes of detecting why the final packet is not acked. I recompiled 
> the kernel, and reran the test. The result was that the packet was 
> being dropped in tcp_rcv_established() due to an invalid checksum. I 

Ouch!

In the interests of not having it be so painful to identify the
problem (to this point, i.e. TCP drops due to checksum failures) 
the next time around, I'd like to ask:

- Were you seeing any bad csum error messages in /var/log/messages?
  i.e., or was it only TCP?

- Was the stats field /proc/net/snmp/Tcp:InErrs
  reflecting those drops?

- What additional logging/stats gathering would have made this
  (silent drops due to checksum failures by TCP) easier to detect?

  My 2c:

  The stat TcpInErrs is updated for most TCP input failures.
  So it's not obvious (unless you're really familiar with TCP)
  that there are checksum failures happening. It actually 
  includes only these errors:
- checksum failures
- header len problems
- unexpected SYN's
 
  Is this adequate as a diagnostic, or would adding a breakdown
  counter(s) for checksum (and other) failures be useful? 
  At the moment, there is no logging TCP does on a plain vanilla 
  kernel; you have to recompile the kernel with NETDEBUG in order 
  to see logged checksum failures, at least at the TCP level. 

  It would be nice to have people be able to look at a counter or 
  stat on the fly and tell whether they're having packets silently 
  dropped due to checksum failures (and other issues) without needing 
  to recompile the kernel...
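
  (For what it's worth, even without a finer breakdown, TcpInErrs can be
  watched on the fly straight out of /proc; a small sketch of my own,
  nothing more:)

#!/usr/bin/env python
# Poll the Tcp line of /proc/net/snmp and report whenever InErrs moves,
# i.e. whenever TCP discards incoming segments (bad checksums being one of
# the causes lumped into that counter).
import time

def tcp_counter(field):
    with open("/proc/net/snmp") as f:
        lines = [line.split() for line in f if line.startswith("Tcp:")]
    names, values = lines[0][1:], lines[1][1:]
    return int(values[names.index(field)])

last = tcp_counter("InErrs")
while True:
    time.sleep(5)
    now = tcp_counter("InErrs")
    if now != last:
        print("Tcp InErrs +%d" % (now - last))
        last = now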
   
Any thoughts?

thanks,
Nivedita

---
I'd appreciate a cc since I'm not subscribed..
[EMAIL PROTECTED]
[EMAIL PROTECTED] 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Abysmal RECV network performance

2001-05-31 Thread Nivedita Singhvi

> >the Netgear FA311/2 (tulip). Found that the link lost
> >connectivity because of card lockups and transmit timeout
> >failures - and some of these were silent. However, I moved
> >to the 3C905C (3c59x driver) which behaved like a champ, and

> I'm a little confused here - do you mean the FA310TX ("tulip" driver) or the 
> FA311/2 ("natsemi" driver)? I have not had any connection problems with 
> either the FA310 or the FA311 cards. I haven't noticed any speed problems 
> with the FA311 card, but I haven't benchmarked it, either. The FA310 is so 
> horribly slow, I couldn't help but notice. Unfortunately, the same is true 
> of the 3cSOHO.

Sorry, I meant to describe both (natsemi and tulip, but the latter
is on an older DEC chip). 

> I looked at tcpdump to try and figure it out, and it appeared that the P-90 
> was taking a very long time to ACK some packets. I am not a TCP/IP guru by 
> any stretch, but my guess at the time was that the packets that were taking 
> forever to get ACK'ed were the ones causing a framing error on the P-90, but 
> again, I'm not an expert.

> The only unusual stat is the framing errors. There are a lot of them under 
> heavy receive load. The machine will go for weeks without a single framing 
> error, but if I blast some netperf action at it (or FTP send to it, etc.) 
> then I get about 1/3 of the incoming packets (to the P-90) with framing 
> errors. I see no other errors at all except a TX overrun error (maybe 1 in 
> 10 packets).

Tried to reproduce this problem last night on my machines
at home (kernel 2.4.4, 500MHz K7/400MHz K6). Just doing FTP
and netperf tests, I didn't see any significant variation between
rcv and tx sides. Admittedly different machines, and between a
3C905C and a FA310TX (tulip). However, if the problem
was purely kernel protocol under load, it should have shown up. 

Also, I am not seeing significant frame errors - 1 in 10K,
definitely not anything remotely like 30%. If 1/3
of your packets are being dropped with frame errs, you'll see
lots of retransmissions and horrible performance, no question. But
I would expect frame errors to be due to things like the
speed not being negotiated correctly(?), or the board
not sitting quite right (true - that's the only experience
I remember of the recv code path being error prone compared
to tx), but that should affect all the kernel versions you
ran on that host..

I'm pretty clueless about media level issues, but it would
help to identify what's causing the framing errors. 

Not much help, I know..

thanks,
Nivedita

---
Nivedita Singhvi(503) 578-4580
Linux Technology Center [EMAIL PROTECTED]
IBM Beaverton, OR   [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Abysmal RECV network performance

2001-05-29 Thread Nivedita Singhvi

> Can someone please help me troubleshoot this problem - 
> I am getting abysmal (see numbers below) network performance 
> on my system, but the poor performance seems limited to receiving 
> data. Transmission is OK. 

[ snip ]

> What kind of performance should I be seeing with a P-90 
> on a 100Mbps connection? I was expecting something in the 
> range of 40-70 Mbps - certainly not 1-2 Mbps. 
> 
> What can I do to track this problem down? Has anyone else 
> had problems like this? 

While we didn't use 2.2 kernels at all, we did similar tests
on 2.4.0 through 2.4.4 kernels, on UP and SMP. I've used
a similar machine (PII 333MHz) as well as faster (866MHz) 
machines, and got pretty nifty (> 90Mb/s) throughput on 
netperf tests (tcp stream, no disk I/O) over a 100Mb full
duplex link.  (Not sure if there are any P-90 issues.)

Throughput does drop with small MTU, very small packet sizes,
and small socket buffer sizes, but only at extremes; for the most
part throughput was well over 70Mb/s. (This is true for single
connections; you don't mention how many connections you were
scaling to, if any.)

However, we did run into serious performance problems with
the Netgear FA311/2 (tulip). We found that the link lost
connectivity because of card lockups and transmit timeout 
failures - and some of these were silent. However, I moved 
to the 3C905C (3c59x driver), which behaved like a champ, and 
we didn't see the problems any more, so we have stuck to that card.  
This was back in the 2.4.0 time frame, and there have been many 
patches since then to various drivers, so I'm not sure whether the
problem(s) have been resolved or not (likely they have been, as they
were extensively reported). Both your cards might actually be
underperforming..

Are you seeing any errors reported in /var/log/messages?
Are you monitoring your connection via tcpdump, for example?
You might sometimes see long gaps in transmission... Are
there any abnormal numbers in the /proc/net/ stats? I don't remember
seeing frame error counts that high, although there were a few. 
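
(Those per-interface numbers live in /proc/net/dev, by the way; a tiny
sketch of my own for pulling out the receive error/drop/frame columns,
which is where framing errors like the ones described above show up:)

#!/usr/bin/env python
# Print the receive-side error counters from /proc/net/dev. After the two
# header lines, each row is "<dev>: <rx fields> <tx fields>", where the RX
# fields are bytes, packets, errs, drop, fifo, frame, compressed, multicast.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast")

with open("/proc/net/dev") as f:
    for line in f.readlines()[2:]:          # skip the two header lines
        name, data = line.split(":", 1)
        rx = dict(zip(RX_FIELDS, data.split()[:8]))
        print("%-8s rx_errs=%s rx_drop=%s rx_frame=%s"
              % (name.strip(), rx["errs"], rx["drop"], rx["frame"]))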

HW checksumming for the kind of test you are doing (tcp, mostly
fast path) will not buy you any real performance gain; the
checksum is actually consumed by the user-kernel copy routine.

You can also run the tests on a profiling kernel and compare
results... 

Nivedita

---
Nivedita Singhvi(503) 578-4580
Linux Technology Center [EMAIL PROTECTED]
IBM Beaverton, OR   [EMAIL PROTECTED]


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



netperf stream scaling; patches that help?

2001-05-02 Thread Nivedita Singhvi

I'm trying to run a simple test on a pair of Linux 2.4.2 
PCs that starts up simultaneous netperf tcp stream tests, 
and find that I can't invoke more than 800 without running 
into memory allocation failures. This wouldn't be strange 
except that I find that on the same systems, FreeBSD seems to 
do twice as well (1600). I complete 500 concurrent netperf 
tcp stream tests sending 64 byte packets successfully, but 
again, FreeBSD completes 1000 successfully. Also, Linux
appears to hog around 300MB on the server side, whereas
FreeBSD only appears to be using 3MB. Those are the bare
numbers (details available, of course), but what I'd like 
to do is repeat the Linux test with 2.4.4 and include some 
VM patches that might possibly alleviate any memory 
management issues I may be running into.
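
(As a data point on where that server-side memory goes: the TCP line of
/proc/net/sockstat reports socket buffer memory in pages. A rough sketch
of my own for reading it while the streams run, page size taken from
getpagesize:)

#!/usr/bin/env python
# Report TCP socket counts and buffer memory from /proc/net/sockstat while
# the concurrent netperf streams are up. The "mem" field is in pages.
import resource

PAGE = resource.getpagesize()

with open("/proc/net/sockstat") as f:
    for line in f:
        if line.startswith("TCP:"):
            fields = line.split()
            stats = dict(zip(fields[1::2], [int(x) for x in fields[2::2]]))
            print("TCP sockets in use: %d, buffer memory: %.1f MB"
                  % (stats["inuse"], stats["mem"] * PAGE / (1024.0 * 1024.0)))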

This is between a 500MHz PIII Katmai and 333MHz PII
Deschutes both with 512MB memory, over a 100Mb (3C905C)
private nw.
 
I'd appreciate any pointers to patches that might help,
or suggestions in general to improve the Linux numbers.
Especially any insight into whether this is a case of
apples/oranges or whether I'm missing some trivial element
here... 

I know of Ed Tomlinson's patch posted on this list on
4/12; are there any others? I know Jonathan Morton posted
some OOM patches; are those included in 2.4.4?

thanks,
Nivedita 

---
Nivedita Singhvi(503) 578-4580
Linux Technology Center [EMAIL PROTECTED]
IBM Beaverton, OR   [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


