On Monday 17 April 2006 11:18, Stephen Hemminger wrote:
> I don't know what you are doing different, but my 2 port SysKonnect
> card is working fine. Running SMP AMD64 and 2.6.17 latest.
>
> Showing full speed on both ports.

I missed that e-mail, sorry.
I just gave it another try, this time with 2.6.16.11. One port works
fine (so far, I just did very limited testing with ttcp). The second
port does negotiate an IP address via DHCP, but the packets it receives
seem to be garbled:

--8<--
        0x0000:  0000 6175 6469 7428 3131 3435 3939 3430  ..audit(11459940
        0x0010:  3031 2e39 3738 3a33 3829 3a20 7573 6572  01.978:38):.user
        0x0020:  2070 6964 3d33 3230 3920 7569 643d       .pid=3209.uid=
12:56:23.725090 00:00:00:00:00:00 > 30:6e:6d:00:00:00 null I (s=32,r=55,P) len=42
12:56:24.603274 00:00:21:00:00:00 > 00:00:00:00:00:00 null disc/C len=43
12:56:26.619326 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:28.635346 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:29.734046 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:29.865239 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:30.651371 00:00:00:00:00:00 > a6:00:00:00:4d:04, ethertype Unknown (0xe20c), length 60:
        0x0000:  0000 6175 6469 7428 3131 3435 3939 3436  ..audit(11459946
        0x0010:  3031 2e33 3639 3a34 3729 3a20 7573 6572  01.369:47):.user
        0x0020:  2070 6964 3d33 3239 3820 7569 643d       .pid=3298.uid=
12:56:30.916718 00:00:f0:71:61:00 > 28:37:03:5b:3a:00 null I (s=16,r=0,C) len=42
12:56:30.923558 00:00:21:00:00:00 > 00:00:00:00:00:00 null rnr (r=55,C) len=42
12:56:32.667413 00:00:d0:2e:30:42 > 10:60:61:00:00:00, ethertype Unknown (0x572b), length 60:
        0x0000:  0000 d675 0d00 0000 0000 0200 0000 0000  ...u............
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 ffff ffff 0000 0000 1300 0000       ..............
12:56:33.296384 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:33.303222 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
[..]
13:00:44.340062 00:00:00:00:00:00 > 5f:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:44.672350 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:44.868724 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:45.340123 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:46.340173 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:46.688433 IP truncated-ip - 1454 bytes missing! 192.168.65.66.40313 > 192.168.65.65.5001: . 1426488980:1426490428(1448) ack 1790562292 win 1460 <nop,nop,timestamp[|tcp]>
13:00:48.704431 00:00:21:00:00:00 > 00:00:00:00:00:00 null I (s=17,r=18,C) len=42
13:00:48.886426 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:50.720463 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:52.736496 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.752522 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.927556 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.934394 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
-->8--

On a different host connected to the same switch, traffic looks more
like:

--8<--
2:01:49.388992 IP 192.168.64.1.ntp > 255.255.255.255.ntp: NTPv3, Broadcast, length 48
12:01:50.176550 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:51.235034 arp reply 192.168.64.32 is-at 00:0a:49:00:5e:8a
12:01:51.241857 arp reply 192.168.64.33 is-at 00:0a:49:00:5e:8b
12:01:51.891193 00:00:01:02:c8:58 > 45:c0:00:1c:00:20, ethertype Unknown (0xe000), length 60:
        0x0000:  0001 1164 ee9b 0000 0000 0000 0000 0000  ...d............
        0x0010:  0000 0000 0000 0000 0000 0000 2f6b 8c87  ............/k..
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
12:01:52.192552 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:52.801392 arp reply 192.168.64.34 is-at 00:0a:49:00:5e:8c
12:01:52.808240 arp reply 192.168.64.35 is-at 00:0a:49:00:5e:8d
12:01:54.208495 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:56.224453 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:58.240464 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:02:00.029320 arp reply 192.168.64.39 is-at 00:0a:49:00:5e:ff
12:02:00.256420 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
-->8--

I noticed that the interrupt count is very low too (the interrupt count
as shown in /proc/interrupts is much higher):

--8<--
[EMAIL PROTECTED] ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:D1:E1:F2:D8
          inet addr:192.168.65.65  Bcast:192.168.65.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4559786 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4071967 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4680823977 (4.3 GiB)  TX bytes:4332319475 (4.0 GiB)
          Interrupt:169

eth1      Link encap:Ethernet  HWaddr 00:A0:D1:E1:F2:D9
          inet addr:192.168.64.199  Bcast:192.168.64.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2193 errors:0 dropped:0 overruns:0 frame:0
          TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:180137 (175.9 KiB)  TX bytes:1856 (1.8 KiB)
          Interrupt:169
-->8--

I then tried 2.6.17-rc2-git6. At first it looked OK: the second
ethernet device was configured properly and I got some traffic through.
Once I started copying large files over NFS from a (very) fast NFS
server, though, traffic received by eth1 got corrupted again (some 5GB
had been copied successfully before that):

--8<--
[EMAIL PROTECTED] ~]# tcpdump -n -i eth1 -s 0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
14:23:14.049450 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:14.049519 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:14.745075 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:14.745082 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:14.852108 IP truncated-ip - 1454 bytes missing! 192.168.64.110.nfs > 192.168.64.199.1021: . 159991419:159992879(1460) ack 3444328765 win 64240
14:23:14.944489 00:00:00:00:00:00 > a3:00:00:00:50:04, ethertype Unknown (0x210d), length 98:
        0x0000:  0000 6175 6469 7428 3131 3436 3030 3030  ..audit(11460000
        0x0010:  3032 2e31 3836 3a36 3329 3a20 7573 6572  02.186:63):.user
        0x0020:  2070 6964 3d33 3336 3120 7569 643d 3020  .pid=3361.uid=0.
        0x0030:  6175 6964 3d34 3239 3439 3637 3239 3520  auid=4294967295.
        0x0040:  6d73 673d 2750 414d 2073 6574 6372 6564  msg='PAM.setcred
        0x0050:  3a20 7573                                :.us
14:23:15.944703 arp who-has 192.168.64.253 tell 192.168.79.254
14:23:16.868291 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:16.868301 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:16.944907 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
14:23:17.945113 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
14:23:18.884430 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:18.884441 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:18.945318 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
-->8--

The ".audit ... PAM.setcred" string is interesting. This is most likely
not traffic from the net, but text from the host's RAM. Did some pointer
get mangled?

I recompiled the kernel, now with RHFC4's gcc32. The result is similar
(the second interface goes bad only after some data has been copied
using NFS):

--8<--
[EMAIL PROTECTED] ~]# tcpdump -n -s 0 -i eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
15:48:02.306927 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8801
15:48:02.316088 arp who-has 192.168.64.202 tell 192.168.64.199
15:48:02.316329 arp who-has 192.168.64.199 tell 192.168.79.254
15:48:02.316335 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
15:48:02.316338 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
15:48:03.307095 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8802
15:48:03.307289 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8803
15:48:03.316166 arp who-has 192.168.64.202 tell 192.168.64.199
15:48:03.316397 arp who-has 192.168.64.199 tell 192.168.79.254
15:48:03.316401 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
15:48:03.316404 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
15:48:03.784698 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8804

12 packets captured
12 packets received by filter
0 packets dropped by kernel
-->8--

No suspect text and no zero-filled packets this time, only truncated
ones, but that's bad enough to stop NFS and cause bad packet loss:

--8<--
64 bytes from 192.168.64.199: icmp_seq=83 ttl=64 time=147073 ms
64 bytes from 192.168.64.199: icmp_seq=84 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=85 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=87 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=88 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=233 ttl=64 time=82023 ms
64 bytes from 192.168.64.199: icmp_seq=236 ttl=64 time=80018 ms
64 bytes from 192.168.64.199: icmp_seq=241 ttl=64 time=81018 ms
64 bytes from 192.168.64.199: icmp_seq=243 ttl=64 time=81018 ms
64 bytes from 192.168.64.199: icmp_seq=253 ttl=64 time=85018 ms
64 bytes from 192.168.64.199: icmp_seq=255 ttl=64 time=85018 ms
64 bytes from 192.168.64.199: icmp_seq=256 ttl=64 time=85629 ms
64 bytes from 192.168.64.199: icmp_seq=257 ttl=64 time=87023 ms

--- 192.168.64.199 ping statistics ---
346 packets transmitted, 63 received, +3 errors, 81% packet loss, time 345136ms
rtt min/avg/max/mdev = 80018.748/119940.275/149073.885/21090.211 ms, pipe 151
-->8--

Considering the recent NFS changes, I tried to get the system into this
state using just ttcp. With some determination, three more hosts and a
few million packets, I succeeded. This time eth0 truncated packets and
traffic slowed to a crawl (~1 good packet every 2s).

Some progress has been made, but it's not quite solid yet.

best regards
Guenther
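
For anyone who wants to check the hex dumps by hand: below is a minimal
sketch, assuming Python 3 is available. It turns the hex words of the
first garbled frame captured on eth1 back into text, which makes it
easier to see that the payload is an audit log record rather than
anything that came off the wire. The helper name decode_payload is made
up for this illustration and is not part of any tool used above.

```python
import string

# Hex words copied from the first garbled frame in the eth1 capture
# (the three 0x0000/0x0010/0x0020 rows at the top of the first dump).
HEX_WORDS = (
    "0000 6175 6469 7428 3131 3435 3939 3430 "
    "3031 2e39 3738 3a33 3829 3a20 7573 6572 "
    "2070 6964 3d33 3230 3920 7569 643d"
)

# Characters shown as-is; everything else is rendered as '.'.
# Unlike tcpdump's ASCII column, spaces are kept as spaces here.
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")


def decode_payload(hex_words: str) -> str:
    """Convert tcpdump-style hex words to text (hypothetical helper)."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    return "".join(chr(b) if chr(b) in PRINTABLE else "." for b in raw)


if __name__ == "__main__":
    print(decode_payload(HEX_WORDS))
```

The two leading zero bytes come out as dots; the rest reads as the start
of an audit record, which fits the guess that the buffer held host
memory instead of received data.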