On Monday 17 April 2006 11:18, Stephen Hemminger wrote:
> I don't know what you are doing different, but my 2 port SysKonnect
> card is working fine.  Running SMP AMD64 and 2.6.17 latest.
>
> Showing full speed on both ports.
I missed that e-mail, sorry.

I just gave it another try, this time with 2.6.16.11. One port works 
fine (so far I have only done very limited testing with ttcp). The second port 
does negotiate an IP address via DHCP, but the packets it receives 
seem to be garbled:

--8<--
       0x0000:  0000 6175 6469 7428 3131 3435 3939 3430  ..audit(11459940
        0x0010:  3031 2e39 3738 3a33 3829 3a20 7573 6572  01.978:38):.user
        0x0020:  2070 6964 3d33 3230 3920 7569 643d       .pid=3209.uid=
12:56:23.725090 00:00:00:00:00:00 > 30:6e:6d:00:00:00 null I (s=32,r=55,P) 
len=42
12:56:24.603274 00:00:21:00:00:00 > 00:00:00:00:00:00 null disc/C len=43
12:56:26.619326 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:28.635346 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:29.734046 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:29.865239 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:30.651371 00:00:00:00:00:00 > a6:00:00:00:4d:04, ethertype Unknown 
(0xe20c), length 60:
        0x0000:  0000 6175 6469 7428 3131 3435 3939 3436  ..audit(11459946
        0x0010:  3031 2e33 3639 3a34 3729 3a20 7573 6572  01.369:47):.user
        0x0020:  2070 6964 3d33 3239 3820 7569 643d       .pid=3298.uid=
12:56:30.916718 00:00:f0:71:61:00 > 28:37:03:5b:3a:00 null I (s=16,r=0,C) len=42
12:56:30.923558 00:00:21:00:00:00 > 00:00:00:00:00:00 null rnr (r=55,C) len=42
12:56:32.667413 00:00:d0:2e:30:42 > 10:60:61:00:00:00, ethertype Unknown 
(0x572b), length 60:
        0x0000:  0000 d675 0d00 0000 0000 0200 0000 0000  ...u............
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 ffff ffff 0000 0000 1300 0000       ..............
12:56:33.296384 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:33.303222 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
[..]
13:00:44.340062 00:00:00:00:00:00 > 5f:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:44.672350 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:44.868724 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:45.340123 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:46.340173 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:46.688433 IP truncated-ip - 1454 bytes missing! 192.168.65.66.40313 > 
192.168.65.65.5001: . 1426488980:1426490428(1448) ack 1790562292 win 1460 
<nop,nop,timestamp[|tcp]>
13:00:48.704431 00:00:21:00:00:00 > 00:00:00:00:00:00 null I (s=17,r=18,C) 
len=42
13:00:48.886426 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:50.720463 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:52.736496 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.752522 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.927556 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.934394 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
-->8--
On a different host connected to the same switch, traffic looks more like:
--8<--
12:01:49.388992 IP 192.168.64.1.ntp > 255.255.255.255.ntp: NTPv3, Broadcast, 
length 48
12:01:50.176550 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 
8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:51.235034 arp reply 192.168.64.32 is-at 00:0a:49:00:5e:8a
12:01:51.241857 arp reply 192.168.64.33 is-at 00:0a:49:00:5e:8b
12:01:51.891193 00:00:01:02:c8:58 > 45:c0:00:1c:00:20, ethertype Unknown 
(0xe000), length 60:
        0x0000:  0001 1164 ee9b 0000 0000 0000 0000 0000  ...d............
        0x0010:  0000 0000 0000 0000 0000 0000 2f6b 8c87  ............/k..
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
12:01:52.192552 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 
8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:52.801392 arp reply 192.168.64.34 is-at 00:0a:49:00:5e:8c
12:01:52.808240 arp reply 192.168.64.35 is-at 00:0a:49:00:5e:8d
12:01:54.208495 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 
8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:56.224453 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 
8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:58.240464 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 
8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:02:00.029320 arp reply 192.168.64.39 is-at 00:0a:49:00:5e:ff
12:02:00.256420 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 
8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
-->8--

I noticed that the interrupt count is very low too (the interrupt count
as shown in /proc/interrupts is much higher):
--8<--
[EMAIL PROTECTED] ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:D1:E1:F2:D8
          inet addr:192.168.65.65  Bcast:192.168.65.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4559786 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4071967 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4680823977 (4.3 GiB)  TX bytes:4332319475 (4.0 GiB)
          Interrupt:169

eth1      Link encap:Ethernet  HWaddr 00:A0:D1:E1:F2:D9
          inet addr:192.168.64.199  Bcast:192.168.64.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2193 errors:0 dropped:0 overruns:0 frame:0
          TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:180137 (175.9 KiB)  TX bytes:1856 (1.8 KiB)
          Interrupt:169
-->8--

I then tried 2.6.17-rc2-git6. At first it looked OK, the second ethernet 
device was configured properly and I got some traffic through. Once 
I started copying large files (some 5GB were successfully copied) over 
NFS using a (very) fast NFS server though, traffic received by eth1 got
corrupted again:

--8<--
 [EMAIL PROTECTED] ~]# tcpdump -n -i eth1 -s 0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
14:23:14.049450 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:14.049519 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:14.745075 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:14.745082 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:14.852108 IP truncated-ip - 1454 bytes missing! 192.168.64.110.nfs > 
192.168.64.199.1021: . 159991419:159992879(1460) ack 3444328765 win 64240
14:23:14.944489 00:00:00:00:00:00 > a3:00:00:00:50:04, ethertype Unknown 
(0x210d), length 98:
        0x0000:  0000 6175 6469 7428 3131 3436 3030 3030  ..audit(11460000
        0x0010:  3032 2e31 3836 3a36 3329 3a20 7573 6572  02.186:63):.user
        0x0020:  2070 6964 3d33 3336 3120 7569 643d 3020  .pid=3361.uid=0.
        0x0030:  6175 6964 3d34 3239 3439 3637 3239 3520  auid=4294967295.
        0x0040:  6d73 673d 2750 414d 2073 6574 6372 6564  msg='PAM.setcred
        0x0050:  3a20 7573                                :.us
14:23:15.944703 arp who-has 192.168.64.253 tell 192.168.79.254
14:23:16.868291 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:16.868301 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:16.944907 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 
192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
14:23:17.945113 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 
192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
14:23:18.884430 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:18.884441 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:18.945318 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 
192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
-->8--

The ".audit ... PAM.setcred" string is interesting. This is most likely 
not traffic from the net, but text from the host's RAM. Did some 
pointer get mangled?
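
To make the point about the payload easier to check, here is a minimal Python sketch that turns the hex words of the 14:23:14.944489 frame back into text. The hex words are copied from the capture above; the function name decode_payload is only an illustrative choice, and spaces are kept as spaces here rather than shown as dots.

```python
# Hex words copied from the 14:23:14.944489 frame in the tcpdump output above.
HEX_DUMP = """
0000 6175 6469 7428 3131 3436 3030 3030
3032 2e31 3836 3a36 3329 3a20 7573 6572
2070 6964 3d33 3336 3120 7569 643d 3020
6175 6964 3d34 3239 3439 3637 3239 3520
6d73 673d 2750 414d 2073 6574 6372 6564
3a20 7573
"""


def decode_payload(hex_dump: str) -> str:
    """Decode whitespace-separated hex words into printable ASCII.

    Non-printable bytes are replaced by '.'; spaces stay spaces.
    """
    raw = bytes.fromhex("".join(hex_dump.split()))
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    print(decode_payload(HEX_DUMP))
```

The decoded text reads like the start of an audit log record (audit(...): user pid=... msg='PAM setcred: ...), which is why it looks like host memory rather than anything that came in over the wire.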
 
I recompiled the kernel, now with RHFC4's gcc32. The result is similar
(the second interface goes bad only after some data has been copied
over NFS):
--8<--
[EMAIL PROTECTED] ~]# tcpdump -n -s 0 -i eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
15:48:02.306927 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 
8801
15:48:02.316088 arp who-has 192.168.64.202 tell 192.168.64.199
15:48:02.316329 arp who-has 192.168.64.199 tell 192.168.79.254
15:48:02.316335 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
15:48:02.316338 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 
8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
15:48:03.307095 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 
8802
15:48:03.307289 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 
192.168.64.199: icmp 64: echo request seq 8803
15:48:03.316166 arp who-has 192.168.64.202 tell 192.168.64.199
15:48:03.316397 arp who-has 192.168.64.199 tell 192.168.79.254
15:48:03.316401 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
15:48:03.316404 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 
8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
15:48:03.784698 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 
192.168.64.199: icmp 64: echo request seq 8804

12 packets captured
12 packets received by filter
0 packets dropped by kernel
-->8--
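
For longer captures it may help to pull out the truncated frames mechanically instead of reading the dump by eye. The following is a rough Python sketch under the assumption that tcpdump keeps printing the "truncated-ip - N bytes missing!" marker seen above; the sample lines are taken from the capture above (re-joined where the mail client wrapped them), and the helper name missing_bytes is hypothetical.

```python
import re

# Marker tcpdump printed for short frames in the captures above.
TRUNCATED = re.compile(r"truncated-ip - (\d+) bytes missing!")


def missing_bytes(capture_text: str) -> list[int]:
    """Return the number of missing bytes for every truncated frame found."""
    return [int(n) for n in TRUNCATED.findall(capture_text)]


# Three lines from the capture above, with the wrapped parts re-joined.
SAMPLE = """\
15:48:03.307095 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8802
15:48:03.307289 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8803
15:48:03.784698 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8804
"""

if __name__ == "__main__":
    sizes = missing_bytes(SAMPLE)
    print(f"{len(sizes)} truncated frames, missing bytes: {sizes}")
```

Because the regular expression is applied to the whole text, it does not matter whether a record was wrapped across lines after the marker.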
No suspect text and no zero-filled packets this time, only truncated ones,
but that's bad enough to stop NFS and cause severe packet loss:
--8<--
64 bytes from 192.168.64.199: icmp_seq=83 ttl=64 time=147073 ms
64 bytes from 192.168.64.199: icmp_seq=84 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=85 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=87 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=88 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=233 ttl=64 time=82023 ms
64 bytes from 192.168.64.199: icmp_seq=236 ttl=64 time=80018 ms
64 bytes from 192.168.64.199: icmp_seq=241 ttl=64 time=81018 ms
64 bytes from 192.168.64.199: icmp_seq=243 ttl=64 time=81018 ms
64 bytes from 192.168.64.199: icmp_seq=253 ttl=64 time=85018 ms
64 bytes from 192.168.64.199: icmp_seq=255 ttl=64 time=85018 ms
64 bytes from 192.168.64.199: icmp_seq=256 ttl=64 time=85629 ms
64 bytes from 192.168.64.199: icmp_seq=257 ttl=64 time=87023 ms

--- 192.168.64.199 ping statistics ---
346 packets transmitted, 63 received, +3 errors, 81% packet loss, time 345136ms
rtt min/avg/max/mdev = 80018.748/119940.275/149073.885/21090.211 ms, pipe 151
-->8--
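
As a quick sanity check on the summary line above, the loss figure can be recomputed from the two packet counts. This is just a small Python sketch; the counts are copied from the ping statistics, and the assumption that ping rounds the percentage down is mine.

```python
# Counts copied from the ping statistics above.
TRANSMITTED = 346
RECEIVED = 63


def packet_loss_percent(sent: int, got: int) -> float:
    """Share of echo requests that never got a reply, in percent."""
    return 100.0 * (sent - got) / sent


if __name__ == "__main__":
    loss = packet_loss_percent(TRANSMITTED, RECEIVED)
    # Truncating to an integer gives the 81% that ping reported.
    print(f"{loss:.2f}% exact, {int(loss)}% truncated")
```

With round-trip times between 80 and 149 seconds, even the replies that did arrive were far too late to be of any use to NFS.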

Considering the recent NFS changes, I tried to get the system into this
state using just ttcp. With some determination, three more hosts and 
a few million packets, I succeeded. This time eth0 truncated packets
and traffic slowed to a crawl (~1 good packet every 2s).

Some progress has been made, but it's not quite solid yet.

best regards
        Guenther