Re: 2.6.29 & network stack strangeness

Finn Thain Fri, 05 Jun 2009 09:37:48 -0700

My only guess would be that the network stack delayed work queues depend 
upon working timer interrupts...


But since I have no knowledge of your hardware, I don't think I'll be a 
lot of help with that.

Finn


On Fri, 5 Jun 2009, Matthew Lear wrote:

> Hi - thanks for your reply.
> 
> The problem doesn't manifest only when the DHCP lease expires, and I can
> still reproduce it with a static IP. With or without DHCP makes no
> difference.
> 
> It seems to affect socket comms quite seriously (and quickly). If I run a
> simple server program on the host that listens on a socket and writes a
> response string to the socket when it receives data, and on the target I
> run a simple client program which writes a string to the socket, then
> reads and prints the response sent by the server, I only have to send
> data from client to server with a delay of 1ms between transmissions for
> a few seconds before the client program hangs in read() on the socket fd.
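
The client/server test described above could be sketched roughly as follows.
This is not the original test program (which is not shown in the thread); the
function names, the payload strings and the use of loopback are assumptions
made for illustration. The client uses a receive timeout so that a stall is
reported instead of blocking forever in the read.

```python
import socket
import threading
import time

def serve_once(listener, reply=b"ack\n"):
    """Accept one client and answer every received chunk with `reply`."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:              # client closed the connection
                break
            conn.sendall(reply)

def run_client(host, port, count=100, delay=0.001, timeout=5.0):
    """Send `count` requests `delay` seconds apart; return replies received."""
    replies = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for _ in range(count):
            sock.sendall(b"hello\n")
            try:
                sock.recv(1024)       # a client without a timeout would hang here
            except socket.timeout:
                break                 # report the stall instead of hanging
            replies += 1
            time.sleep(delay)         # 1 ms gap between transmissions
    return replies

def demo(count=100):
    """Run server and client over loopback; return the number of replies."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    server.start()
    try:
        return run_client("127.0.0.1", port, count=count)
    finally:
        server.join(timeout=5)
        listener.close()
```

On a healthy stack demo() returns the full count. To mirror the setup in the
report, the server half would run on the host and run_client() on the target;
a return value below `count` would then indicate the stall.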
> 
> If I run a simple netcat test, eg
> 
> on target: nc -l -p 3333 > /dev/null
> on host: dd if=/dev/zero | nc <target-ip> 3333
> 
> ...strangely, once activity on the ethernet link as a result of the netcat
> test ceases, running netstat -a on the target hangs for several seconds, eg:
> 
> 
> ~ # nc -l -p 3333 > /dev/null &
> ~ # netstat -a
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> tcp        0      0 *:login                 *:*                     LISTEN
> tcp        0      0 *:shell                 *:*                     LISTEN
> tcp        0      0 *:sunrpc                *:*                     LISTEN
> tcp        0      0 *:finger                *:*                     LISTEN
> tcp        0      0 *:auth                  *:*                     LISTEN
> tcp        0      0 *:ftp                   *:*                     LISTEN
> tcp        0      0 *:telnet                *:*                     LISTEN
> 
> <system hangs for several seconds here>
> 
> tcp        0      0 192.168.0.11:3333       gateway0:45645          ESTABLISHED
> udp        0      0 *:ntalk                 *:*
> udp        0      0 *:sunrpc                *:*
> Active UNIX domain sockets (servers and established)
> Proto RefCnt Flags       Type       State         I-Node Path
> unix  4      [ ]         DGRAM                    111    /dev/log
> unix  3      [ ]         STREAM     CONNECTED     123
> unix  3      [ ]         STREAM     CONNECTED     122
> unix  2      [ ]         DGRAM                    120
> unix  2      [ ]         DGRAM                    114
> ~ #
> 
> I thought this was interesting. Also, after this, I have trouble entering
> characters over the serial port / console. It seems like interrupts may be
> having trouble getting serviced, but this may be a side-effect...
> 
> If you run the same netstat command with strace, you can see that the delay
> occurs in the poll() on the socket that follows the send() call:
> 
> ...
> ...
> gettimeofday({366, 470000}, NULL)       = 0
> poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
> send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43, 0x4000) = 43
> poll(
> 
> 
> <delay is here>
> 
> 
> [{fd=4, events=POLLIN}], 1, 5000)  = 0
> ...
> ...
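
The bytes passed to send() in the trace above have the shape of a DNS query:
a 12-byte header followed by length-prefixed labels. The sketch below decodes
only the bytes that strace actually printed; the helper names are made up for
illustration, and the interpretation (netstat doing a reverse lookup of the
target's own address, then waiting out the 5000 ms poll() timeout because no
reply is delivered) is an inference from the trace, not something confirmed
in the thread.

```python
import struct

# Bytes visible in the strace output above. strace cut the string off after
# "in-ad", so the remainder of the 43-byte packet is deliberately left out.
VISIBLE = (b"lJ\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00"
           b"\x0211\x010\x03168\x03192\x07in-ad")

def parse_header(data):
    """Return (id, flags, qdcount, ancount, nscount, arcount)."""
    return struct.unpack("!6H", data[:12])

def parse_labels(data, pos=12):
    """Return (complete_labels, partial_label) from a possibly truncated name."""
    labels, partial = [], None
    while pos < len(data):
        length = data[pos]
        if length == 0:                 # root label: end of the name
            break
        chunk = data[pos + 1:pos + 1 + length]
        if len(chunk) < length:         # label runs past the truncation point
            partial = chunk.decode("ascii")
            break
        labels.append(chunk.decode("ascii"))
        pos += 1 + length
    return labels, partial

header = parse_header(VISIBLE)
labels, partial = parse_labels(VISIBLE)
address = ".".join(reversed(labels))    # reverse-lookup names list octets backwards
print(hex(header[0]), hex(header[1]), header[2], address, partial)
```

The complete labels decode to 192.168.0.11, the local address in the netstat
output. If that reading is right, running netstat with numeric output (-n,
assuming this netstat build supports it) should avoid the lookup and the delay
without changing the underlying problem.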
> 
> --  Matt
> 
> 
> Finn Thain wrote:
> > Does the problem manifest only when the DHCP lease expires?
> > Can you reproduce the problem with a static IP?
> > 
> > Finn
> > 
> > 
> > On Fri, 5 Jun 2009, Matthew Lear wrote:
> > 
> >> Hello all,
> >>
> > >> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455
> > >> platform and I'm having some throughput problems when running network
> > >> tests.
> >>
> > >> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs,
> > >> obtains a lease from the dhcp server and configures eth0. Network
> > >> connectivity is ok. I can ping the target from the host and vice versa.
> >>
> >> 1/
> > >> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after
> > >> several mins the kernel reports 'unexpected interrupt from 24', which is
> > >> the vector for a spurious interrupt. This message repeats at random
> > >> intervals (it appeared ~ 20 times when running the ping test above for
> > >> 40 mins). The mcf54455 reference manual describes a possible cause for
> > >> spurious interrupts. However, this test very rarely reports any packet
> > >> loss, although the max time to receive a packet can be very large indeed.
> >>
> >> 2/
> > >> If I reboot, start again and run a ping flood test (ping -f) from host pc
> > >> -> target, all icmp requests are acknowledged - for a while. Before the
> > >> target begins to fail to respond to the icmp requests, running top shows
> > >> that ksoftirqd is running at ~ 5% cpu load. This is normal as it is
> > >> involved in the deferred processing of data passed up to the network
> > >> stack. Once the target begins to stop responding to icmp, if I then stop
> > >> the ping flood and try to ping the host from the target, ping reports no
> > >> reply. However, if you do this with a packet sniffer running (eg
> > >> wireshark) you can see that data is still being transmitted from the
> > >> target -> host and you can see the icmp reply. The reply from the host
> > >> appears to be received ok by the fec driver but is not processed by the
> > >> network stack on the target.
> >>
> > >> When in this state, a proc entry that I added to the fec driver shows that
> > >> the last return value from netif_rx() (called in the fec rx interrupt
> > >> handling routine) is 1, indicating that the last packet was dropped by the
> > >> network stack, e.g.
> >>
> >> ~ # cat /proc/driver/fec
> >> total interrupts: 1421619
> >> last interrupt type: 2 [1=tx, 2=rx, 3=mii]
> >> total tx interrupts: 709148
> >> total rx interrupts: 712472
> >> total mii interrupts: 1
> >> last interrupt event: 0x2000000
> >> total eberr interrupts: 0
> >> total hberr interrupts: 0
> >> tx loop current count: 0
> >> tx loop last count: 1
> >> rx loop current count: 0
> >> rx loop last count: 1
> >> rx last cbd ctrl/status: 0x800
> >> rx last cbd len: 346
> >> rx last cbd buff addr: 0x40410000
> >> rx last netif_rx status: 1
> >>
> > >> Strangely, wireshark still shows data being transmitted from the target
> > >> -> host. I can see ARP requests and I can also see DHCP discovery packets
> > >> being sent by the target when its DHCP lease expires. This all looks ok,
> > >> only the reply from host -> target is never processed by the target as the
> > >> network stack is in a state where it is dropping all incoming data
> > >> provided to it by the driver.
> >>
> > >> I believe udhcpc utilises the network device directly, ie it does not
> > >> require an intermediate network protocol to be implemented in the kernel
> > >> (tcpdump is similar).
> >>
> > >> The fec driver still seems to be running ok because I can see the ring
> > >> buffer address changing when data is received. Everything seems to be ok
> > >> apart from the network stack. Very strange indeed.
> >>
> > >> Network throughput tests between host and target with netcat or netperf
> > >> only run for a few seconds before activity ceases.
> >>
> > >> Has anybody experienced anything similar? Why does the network stack
> > >> appear to be stuck and constantly dropping packets?
> >>
> >> Any feedback appreciated.
> >>
> >> Rgds,
> >> --  Matt
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
> >> the body of a message to [email protected]
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> > 
> 
