Bug#649486: Forget about this - was triggered by extensive IPv6 address scanning

2011-11-27 Thread Bjørn Mork
Ben Hutchings b...@decadent.org.uk writes:

> The configuration looks fine to me.

OK.

It does not look like I'm able to reproduce this either.   I am only
able to trigger the expected

  Neighbour table overflow.

and a few additional

  ICMPv6 ND: ndisc_build_skb() failed to allocate an skb, err=-11.

but no TX watchdog.


Bjørn






Bug#649486: Forget about this - was triggered by extensive IPv6 address scanning

2011-11-25 Thread Bjørn Mork
Ben Hutchings b...@decadent.org.uk writes:

> On Mon, 2011-11-21 at 20:13 +0100, Bjørn Mork wrote:
>> Looks like my wife did some external scans of our home network :-)
>>
>> Have to investigate further how she managed to kill the interface, but
>> this is definitely not related to the driver upgrade.  Sorry for my
>> misleading initial report.
>
> So far as I'm aware, if the TX watchdog fires it indicates one of:
>
> 1. A bug in the driver, firmware or hardware caused the hardware
> transmit queue to stop.
> 2. A bug in the driver, firmware or hardware meant that the kernel was
> not notified of link-down or another interruption that is expected to
> stop the hardware transmit queue.
> 3. Transmission is being continually blocked by (full-duplex link) pause
> frames or (half-duplex link) collisions.  This may occur due to a switch
> misconfiguration or inconsistent configuration between switch and host.
>
> High levels of traffic or specific traffic patterns that overload the
> CPU should never cause this to happen.  As the primary maintainer of
> another Linux network driver, I have to treat every 'TX watchdog' report
> as a bug unless it falls into case 3.

This may very well be an example of case 3. The failing interface is
connected to a gigabit port on a Cisco Catalyst C2950G.  Both the switch
port and the host port are configured for both input and output
flow-control.

canardo:/tmp# ethtool -a eth1
Pause parameters for eth1:
Autonegotiate:  on
RX: on
TX: on
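
For anyone wanting to compare pause settings across several hosts, here is a minimal sketch (not part of the original report) that turns `ethtool -a` output like the above into a dictionary. The function name `parse_pause_params` is made up for illustration, and the parser only assumes the "key: on/off" layout shown here.

```python
def parse_pause_params(text):
    """Parse `ethtool -a` style output into {lower-cased key: bool}."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip().lower()
        # Skip the "Pause parameters for eth1:" header and any line
        # whose value is not a plain on/off flag.
        if not sep or value not in ("on", "off"):
            continue
        params[key.strip().lower()] = (value == "on")
    return params


# Sample text copied from the output above.
sample = """Pause parameters for eth1:
Autonegotiate:  on
RX: on
TX: on
"""

print(parse_pause_params(sample))
# {'autonegotiate': True, 'rx': True, 'tx': True}
```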

canardo:/tmp# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x0001 (1)
        Link detected: yes

c2950a#show interfaces gigabitEthernet 0/1
GigabitEthernet0/1 is up, line protocol is up (connected)
  Hardware is Gigabit Ethernet, address is 000d.bc45.b3d9 (bia 000d.bc45.b3d9)
  Description: canardo
  MTU 1500 bytes, BW 100 Kbit, DLY 10 usec, 
 reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is T
  input flow-control is on, output flow-control is on 
  ARP type: ARPA, ARP Timeout 04:00:00
  1000BaseT module in GBIC slot.
  Last input 00:00:03, output 00:00:01, output hang never
  Last clearing of show interface counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 544000 bits/sec, 159 packets/sec
  5 minute output rate 117000 bits/sec, 103 packets/sec
 85269919 packets input, 1110719891 bytes, 756 no buffer
 Received 1673801 broadcasts (1543541 multicast)
 0 runts, 0 giants, 0 throttles
 0 input errors, 0 CRC, 0 frame, 0 overrun, 756 ignored
 0 watchdog, 1543541 multicast, 11987 pause input
 0 input packets with dribble condition detected
 61473019 packets output, 2505206278 bytes, 0 underruns
 0 output errors, 0 collisions, 2 interface resets
 0 babbles, 0 late collision, 0 deferred
 0 lost carrier, 0 no carrier, 0 PAUSE output
 0 output buffer failures, 0 output buffers swapped out


NOTE: switch counters have unfortunately been reset since the event.
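
Since the counters get reset, it may help to sample the pause counters periodically and keep them. The following is only a sketch under assumptions: the function name `pause_counters` is hypothetical, the text is assumed to have been saved from `show interfaces` by some other means, and the regular expression has only been matched against the two counter lines shown above, so other IOS versions may need a different pattern.

```python
import re


def pause_counters(show_interfaces_text):
    """Return (pause_input, pause_output) as ints, or None where not found."""
    def find(direction):
        # Matches e.g. "11987 pause input" and "0 PAUSE output".
        match = re.search(r"\b(\d+)\s+pause\s+" + direction,
                          show_interfaces_text, re.IGNORECASE)
        return int(match.group(1)) if match else None

    return find("input"), find("output")


# The two relevant lines copied from the switch output above.
sample = """ 0 watchdog, 1543541 multicast, 11987 pause input
 0 lost carrier, 0 no carrier, 0 PAUSE output
"""

print(pause_counters(sample))
# (11987, 0)
```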



The host network configuration is rather unusual, and may seem
unnecessarily complex (but I have my reasons for most of this - I've
just forgotten them :-)


The eth1 interface is bridged with a tap interface connected to a VDE
switch running on the host.  Both the physical and virtual switch ports
are configured as trunks and a number of VLAN interfaces are put on top
of the bridge interface:

bjorn@canardo:~$ brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.0015171e5e35       no              eth1
                                                        tap0
canardo:/tmp# cat /proc/net/vlan/config 
VLAN Dev name| VLAN ID
Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD
br0.1  | 1  | br0
br0.7  | 7  | br0
br0.90 | 90  | br0
br0.93 | 93  | br0
br0.666| 666  | br0
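
As a small illustration of the layout (not something from the original setup), the VLAN table above can be read into (device, VLAN ID, parent) tuples. The function name `parse_vlan_config` is hypothetical, and the sketch assumes the "name | id | parent" row format shown here.

```python
def parse_vlan_config(text):
    """Parse /proc/net/vlan/config style text into (dev, vlan_id, parent) tuples."""
    vlans = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Data rows have exactly three fields with a numeric VLAN ID;
        # the two header lines do not and are skipped.
        if len(fields) == 3 and fields[1].isdigit():
            vlans.append((fields[0], int(fields[1]), fields[2]))
    return vlans


# Sample text copied from the output above.
sample = """VLAN Dev name| VLAN ID
Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD
br0.1  | 1  | br0
br0.7  | 7  | br0
br0.90 | 90  | br0
br0.93 | 93  | br0
br0.666| 666  | br0
"""

for dev, vlan_id, parent in parse_vlan_config(sample):
    print(dev, vlan_id, parent)
```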


This way, I can easily connect any combination I want of physical switch
port, virtual switch port and host interface, using only a single cable.

To make this even better, one of the switch ports is connected to an ADSL
modem and I'm running two PPPoE sessions from the same host over 

Bug#649486: Forget about this - was triggered by extensive IPv6 address scanning

2011-11-25 Thread Ben Hutchings
On Fri, 2011-11-25 at 12:33 +0100, Bjørn Mork wrote:
> Ben Hutchings b...@decadent.org.uk writes:
>
>> On Mon, 2011-11-21 at 20:13 +0100, Bjørn Mork wrote:
>>> Looks like my wife did some external scans of our home network :-)
>>>
>>> Have to investigate further how she managed to kill the interface, but
>>> this is definitely not related to the driver upgrade.  Sorry for my
>>> misleading initial report.
>>
>> So far as I'm aware, if the TX watchdog fires it indicates one of:
>>
>> 1. A bug in the driver, firmware or hardware caused the hardware
>> transmit queue to stop.
>> 2. A bug in the driver, firmware or hardware meant that the kernel was
>> not notified of link-down or another interruption that is expected to
>> stop the hardware transmit queue.
>> 3. Transmission is being continually blocked by (full-duplex link) pause
>> frames or (half-duplex link) collisions.  This may occur due to a switch
>> misconfiguration or inconsistent configuration between switch and host.
>>
>> High levels of traffic or specific traffic patterns that overload the
>> CPU should never cause this to happen.  As the primary maintainer of
>> another Linux network driver, I have to treat every 'TX watchdog' report
>> as a bug unless it falls into case 3.
>
> This may very well be an example of case 3. The failing interface is
> connected to a gig port on a Cisco Catalyst C2950G.  Both the switch
> port and the host port is configured for both input and output
> flow-control.
[...]

The configuration looks fine to me.

Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
 - Carolyn Scheppner




Bug#649486: Forget about this - was triggered by extensive IPv6 address scanning

2011-11-24 Thread Ben Hutchings
On Mon, 2011-11-21 at 20:13 +0100, Bjørn Mork wrote:
> Looks like my wife did some external scans of our home network :-)
>
> Have to investigate further how she managed to kill the interface, but
> this is definitely not related to the driver upgrade.  Sorry for my
> misleading initial report.

So far as I'm aware, if the TX watchdog fires it indicates one of:

1. A bug in the driver, firmware or hardware caused the hardware
transmit queue to stop.
2. A bug in the driver, firmware or hardware meant that the kernel was
not notified of link-down or another interruption that is expected to
stop the hardware transmit queue.
3. Transmission is being continually blocked by (full-duplex link) pause
frames or (half-duplex link) collisions.  This may occur due to a switch
misconfiguration or inconsistent configuration between switch and host.

High levels of traffic or specific traffic patterns that overload the
CPU should never cause this to happen.  As the primary maintainer of
another Linux network driver, I have to treat every 'TX watchdog' report
as a bug unless it falls into case 3.

So I don't want to just forget this either.  But if you can't reproduce
it, it may be difficult to track down.

Ben.

-- 
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.

