Re: CARP and em0 timeout watchdog
On Fri, 2007-04-20 at 14:44 -0400, Sven Willenberger wrote:
> On Fri, 2007-04-20 at 11:27 -0700, Jack Vogel wrote:
>> On 4/20/07, Sven Willenberger [EMAIL PROTECTED] wrote:
>>> On Fri, 2007-04-20 at 10:17 -0700, Jack Vogel wrote:
>>>> On 4/20/07, Jeremy Chadwick [EMAIL PROTECTED] wrote:
>>>>> On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
>>>>>> Having done more diagnostics I have found out it is not CARP related at all. It turns out that the same timeouts will happen when ftp'ing to the physical address IPs as well. There is also an odd situation here depending on which protocol I use. The two boxes are connected to a Dell Powerconnect 2616 gig switch with CAT6.
>>>>>>
>>>>>> If I scp files from the 192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth without a hiccup (I used dd to create various sized testfiles from 32M to 1G in size and just scp testfile* to the other box). On the other hand, if I connect to 192.168.0.19 using ftp (either active or passive) where ftp is being run through inetd, the interface resets (watchdog) within seconds (a few MBs) of traffic. Enabling polling does nothing, nor does changing net.inet.tcp.{recv,send}space. Any ideas why I would be seeing such behavioral differences between scp and ftp?
>>>>>
>>>>> You'll get a much higher throughput rate with FTP than you will with SSH, simply because encryption overhead is quite high (even with the Blowfish cipher). With a very fast processor and on a gigE network you'll probably see 8-9MByte/sec via SSH while 60-70MByte/sec via FTP. That's the only difference I can think of.
>>>>>
>>>>> The watchdog resets I can't explain; Jack Vogel should be able to assist with that. But it sounds like the resets only happen under very high throughput conditions (which is why you'd see it with FTP but not SSH).
>>>>
>>>> What kind of hardware is this interface? Watchdogs mean TX cleanup isn't happening in a reasonable time; without further data it's hard to know what might be going on.
>>>>
>>>> Jack
>>>
>>> from pciconf:
>>>
>>> [EMAIL PROTECTED]:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
>>>     vendor   = 'Intel Corporation'
>>>     device   = 'PRO/1000 PM'
>>>     class    = network
>>>     subclass = ethernet
>>> [EMAIL PROTECTED]:0:0: class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
>>>     vendor   = 'Intel Corporation'
>>>     class    = network
>>>     subclass = ethernet
>>>
>>> em0 is the interface in question.
>>>
>>> from dmesg:
>>>
>>> em0: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0x4000-0x401f mem 0xe030-0xe031 irq 16 at device 0.0 on pci13
>>> em1: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0x5000-0x501f mem 0xe040-0xe041 irq 17 at device 0.0 on pci14
>>
>> OH, this is an 82573, and I've posted a firmware patcher a couple of different times. There is a bit in the MANC register that is incorrectly programmed in some vendors' systems. Can you search email for that patcher? It needs to run from DOS. If you are unable to find it, let me know and I'll resend you a copy.
>>
>> Jack
>
> If you are referring to the dcgdis.ThisIsZip attachment, I found it in earlier threads, thanks. Will work on patching the nics and will keep the list updated. Thanks again.
>
> Sven

I am happy to report that the firmware patch seems to have fixed the issue and I can transfer data across the gigE network without the watchdog timeouts and lockups. Thanks again!!

Sven

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: CARP and em0 timeout watchdog
On Wed, 2007-04-18 at 11:50 -0400, Sven Willenberger wrote:
> I currently have a FreeBSD 6.2-RELEASE-p3 SMP with dual intel PRO/1000PM nics configured as follows:
>
> em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>         options=b<RXCSUM,TXCSUM,VLAN_MTU>
>         inet 192.168.0.18 netmask 0xff00 broadcast 192.168.0.255
>         ether 00:30:48:8d:5c:0a
>         media: Ethernet autoselect (1000baseTX full-duplex)
>         status: active
> em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 4096
>         options=b<RXCSUM,TXCSUM,VLAN_MTU>
>         inet 10.10.0.18 netmask 0xfff8 broadcast 10.10.0.23
>         ether 00:30:48:8d:5c:0b
>         media: Ethernet autoselect (1000baseTX full-duplex)
>         status: active
>
> the em0 interface connects to the LAN while the em1 interface is connected to an identical box via CAT6 crossover cable (for ggate/gmirror). Now, I have also configured a carp interface:
>
> carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
>         inet 192.168.0.20 netmask 0x
>         carp: MASTER vhid 1 advbase 1 advskew 0
>
> There are twin boxes here and I am running Samba. The problem is that with transfers across the carp IP (192.168.0.20) I end up with em0 resetting after a watchdog timeout error. This occurs whether I transfer files from a windows box using a share (samba) or via ftp. This problem does *not* occur if I ftp to the 192.168.0.19 interface (non-virtual). I suspected cabling at first so had all the cabling in question replaced with fresh CAT6 to no avail. Several gigs of data can be transferred to the real interface (em0) without any issue at all; a max of maybe 1 - 2 Gig can be transferred connected to the carp'ed IP before the em0 reset.
>
> Any ideas here?
>
> Sven

Having done more diagnostics I have found out it is not CARP related at all. It turns out that the same timeouts will happen when ftp'ing to the physical address IPs as well. There is also an odd situation here depending on which protocol I use. The two boxes are connected to a Dell Powerconnect 2616 gig switch with CAT6.

If I scp files from the 192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth without a hiccup (I used dd to create various sized testfiles from 32M to 1G in size and just scp testfile* to the other box). On the other hand, if I connect to 192.168.0.19 using ftp (either active or passive) where ftp is being run through inetd, the interface resets (watchdog) within seconds (a few MBs) of traffic. Enabling polling does nothing, nor does changing net.inet.tcp.{recv,send}space. Any ideas why I would be seeing such behavioral differences between scp and ftp?

Sven
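The test setup described in this message (files of various sizes from 32M to 1G, created with dd and then copied to the other box) can be sketched in Python for anyone who wants to reproduce it. This is only an illustration: the doubling steps between 32M and 1G, the file naming, and the use of random data are assumptions, since the message does not say which sizes or which dd input were used.

```python
import os
import tempfile

MIB = 1 << 20


def doubling_sizes(start, stop):
    """Return sizes from start to stop (inclusive), doubling at each step."""
    sizes = []
    size = start
    while size <= stop:
        sizes.append(size)
        size *= 2
    return sizes


def write_test_file(path, size, chunk=MIB):
    """Write `size` bytes of random data to `path`, at most `chunk` bytes at a time."""
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n


def make_test_files(directory, sizes, prefix="testfile"):
    """Create one file per requested size and return the list of paths."""
    paths = []
    for size in sizes:
        path = os.path.join(directory, f"{prefix}_{size}")
        write_test_file(path, size)
        paths.append(path)
    return paths


# 32M .. 1G as in the message; the doubling steps in between are an assumption.
SIZES = doubling_sizes(32 * MIB, 1024 * MIB)

if __name__ == "__main__":
    # Demo with KiB-scale files so it finishes quickly; pass SIZES for full-size files.
    with tempfile.TemporaryDirectory() as d:
        for p in make_test_files(d, doubling_sizes(32 * 1024, 1024 * 1024)):
            print(p, os.path.getsize(p))
```

The resulting files share the `testfile` prefix, so they can be pushed with the same `scp testfile*` pattern mentioned above, or fetched over ftp to compare the two protocols against identical data.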
Re: CARP and em0 timeout watchdog
On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
> [...]
> On the other hand, if I connect to 192.168.0.19 using ftp (either active or passive) where ftp is being run through inetd, the interface resets (watchdog) within seconds (a few MBs) of traffic. Enabling polling does nothing, nor does changing net.inet.tcp.{recv,send}space. Any ideas why I would be seeing such behavioral differences between scp and ftp?

You'll get a much higher throughput rate with FTP than you will with SSH, simply because encryption overhead is quite high (even with the Blowfish cipher). With a very fast processor and on a gigE network you'll probably see 8-9MByte/sec via SSH while 60-70MByte/sec via FTP. That's the only difference I can think of.

The watchdog resets I can't explain; Jack Vogel should be able to assist with that. But it sounds like the resets only happen under very high throughput conditions (which is why you'd see it with FTP but not SSH).

--
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |
Re: CARP and em0 timeout watchdog
On Fri, 2007-04-20 at 09:04 -0700, Jeremy Chadwick wrote:
> [...]
> The watchdog resets I can't explain; Jack Vogel should be able to assist with that. But it sounds like the resets only happen under very high throughput conditions (which is why you'd see it with FTP but not SSH).

I guess it is possible that the traffic from ftp (or smb) is overloading the interface; fwiw, if I increase the {recv,send}space to 131072 I can achieve 32MB+/s using scp (and ftp shows similar values).

The real question is how to avoid these watchdog timeouts during heavy traffic; the whole point here was to replace windows-based fileshare servers with FreeBSD for the local network but at the moment it is proving ineffectual as any samba file transfers stall (much like ftp). I see no other error messages in the logfiles other than the watchdog timeouts plus interface down/up messages.

Sven
Re: CARP and em0 timeout watchdog
----- Original Message -----
From: Sven Willenberger [EMAIL PROTECTED]
To: Jeremy Chadwick [EMAIL PROTECTED]
Cc: freebsd-stable@FreeBSD.org
Sent: Friday, April 20, 2007 6:25 PM
Subject: Re: CARP and em0 timeout watchdog

> [...]
> The real question is how to avoid these watchdog timeouts during heavy traffic; the whole point here was to replace windows-based fileshare servers with FreeBSD for the local network but at the moment it is proving ineffectual as any samba file transfers stall (much like ftp). I see no other error messages in the logfiles other than the watchdog timeouts plus interface down/up messages.
>
> Sven

Sorry for jumping on a thread here. I've had issues with em NICs as well, especially with heavy loads. What helped for me was turning on polling. I recompiled the kernel with polling and turned it on in rc.conf and my problems disappeared. Are you running with polling on?

-Clay
Re: CARP and em0 timeout watchdog
On 4/20/07, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> [...]
> The watchdog resets I can't explain; Jack Vogel should be able to assist with that. But it sounds like the resets only happen under very high throughput conditions (which is why you'd see it with FTP but not SSH).

What kind of hardware is this interface? Watchdogs mean TX cleanup isn't happening in a reasonable time; without further data it's hard to know what might be going on.

Jack
Re: CARP and em0 timeout watchdog
On Fri, 2007-04-20 at 10:17 -0700, Jack Vogel wrote:
> What kind of hardware is this interface?
> [...]

from pciconf:

[EMAIL PROTECTED]:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = 'PRO/1000 PM'
    class    = network
    subclass = ethernet
[EMAIL PROTECTED]:0:0: class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor   = 'Intel Corporation'
    class    = network
    subclass = ethernet

em0 is the interface in question.

from dmesg:

em0: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0x4000-0x401f mem 0xe030-0xe031 irq 16 at device 0.0 on pci13
em1: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0x5000-0x501f mem 0xe040-0xe041 irq 17 at device 0.0 on pci14

Sven
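For readers wondering how the pciconf lines above identify the hardware: the `chip=` and `card=` values each pack two 16-bit PCI IDs into one 32-bit number, with the vendor (or subvendor) in the low half and the device (or subdevice) in the high half. The short Python sketch below decodes them. The function names and the two-entry vendor table are made up for illustration; the table only covers the IDs seen in this thread.

```python
import re

# Lookup limited to the vendor IDs that appear in this thread.
PCI_VENDORS = {0x8086: "Intel Corporation", 0x15D9: "Super Micro Computer"}


def split_id(value):
    """Split a 32-bit pciconf ID into (high 16 bits, low 16 bits)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def parse_pciconf_line(line):
    """Extract vendor/device and subvendor/subdevice IDs from a pciconf -l line."""
    fields = dict(re.findall(r"(\w+)=(0x[0-9a-fA-F]+)", line))
    device, vendor = split_id(int(fields["chip"], 16))
    subdevice, subvendor = split_id(int(fields["card"], 16))
    return {
        "vendor": vendor,
        "device": device,
        "subvendor": subvendor,
        "subdevice": subdevice,
    }


if __name__ == "__main__":
    line = "class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00"
    info = parse_pciconf_line(line)
    print("vendor   ", hex(info["vendor"]), PCI_VENDORS.get(info["vendor"], "unknown"))
    print("device   ", hex(info["device"]))
    print("subvendor", hex(info["subvendor"]), PCI_VENDORS.get(info["subvendor"], "unknown"))
```

For em0 this yields vendor 0x8086 (Intel) with device 0x108c, and subvendor 0x15d9. The device ID is the part a driver maintainer would match against the driver's list of supported chips.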
Re: CARP and em0 timeout watchdog
On Fri, 2007-04-20 at 18:46 +0200, Clayton Milos wrote:
> [...]
> What helped for me was turning on polling. I recompiled the kernel with polling and turned it on in rc.conf and my problems disappeared. Are you running with polling on?

At first I did not have polling compiled in, so no. Then I compiled in polling (and used options HZ=2000) but it didn't change anything. Whether I have polling enabled or disabled on the interface, the outcome is the same.

Sven
Re: CARP and em0 timeout watchdog
On 4/20/07, Sven Willenberger [EMAIL PROTECTED] wrote:
> [...]
> [EMAIL PROTECTED]:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
> [...]
> em0 is the interface in question.

OH, this is an 82573, and I've posted a firmware patcher a couple of different times. There is a bit in the MANC register that is incorrectly programmed in some vendors' systems. Can you search email for that patcher? It needs to run from DOS. If you are unable to find it, let me know and I'll resend you a copy.

Jack
Re: CARP and em0 timeout watchdog
On Fri, 2007-04-20 at 11:27 -0700, Jack Vogel wrote:
> [...]
> Can you search email for that patcher, it needs to run from DOS.
> [...]

If you are referring to the dcgdis.ThisIsZip attachment, I found it in earlier threads, thanks. Will work on patching the nics and will keep the list updated. Thanks again.

Sven
Re: CARP and em0 timeout watchdog
Date: Fri, 20 Apr 2007 09:04:31 -0700
From: Jeremy Chadwick [EMAIL PROTECTED]
Sender: [EMAIL PROTECTED]

> You'll get a much higher throughput rate with FTP than you will with SSH, simply because encryption overhead is quite high (even with the Blowfish cipher). With a very fast processor and on a gigE network you'll probably see 8-9MByte/sec via SSH while 60-70MByte/sec via FTP. That's the only difference I can think of.

OK. Let's put the blame where it belongs. It's probably not the encryption/decryption that slows down scp. It's the OpenSSH code. It is only slightly related to CPU speed on reasonably modern CPUs. My Athlon 64 system goes to 23% CPU while transferring a large (150MB) file using AES128-CBC. My Ethernet runs at over 11 MBytes/sec on a FastEthernet about 5 nanoseconds long. If you have a system slower than about 600 MHz, then it may be the encryption.

At least 3 years ago the folks at the Pittsburgh Supercomputing Center (PSC) were seeing slow scp performance and investigated. The systems they were running on were pretty fast (it is a supercomputing center) and should have been able to run at nearly 1 Gbps without problems, but could not. FTP (which is a VERY inefficient protocol) was much faster. They examined the OpenSSH source code and found the problem. They published patches to OpenSSH and have continued to maintain them, but the OpenBSD people have yet to incorporate them, so ssh is still slow on long paths.

This only applies to transfers over longer distances. Transfers over the LAN should not be impacted by this.

More information and the patch are available at: http://www.psc.edu/networking/projects/hpn-ssh/

--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]  Phone: +1 510 486-8634
Key fingerprint: 059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
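The pattern Kevin describes (fine on the LAN, slow on long paths) is what a fixed-size flow-control window would produce: a sender that may only have one window of data in flight per round trip cannot exceed window / RTT, no matter how fast the link or CPU is. The Python sketch below works through that arithmetic. The 64 KiB window and both round-trip times are hypothetical values chosen for illustration, not figures taken from OpenSSH or from this thread.

```python
def max_throughput_bytes_per_sec(window_bytes, rtt_seconds):
    """Upper bound on throughput when at most one window is in flight per round trip."""
    return window_bytes / rtt_seconds


def window_needed(bandwidth_bits_per_sec, rtt_seconds):
    """Bandwidth-delay product in bytes: the window required to keep the path full."""
    return bandwidth_bits_per_sec / 8 * rtt_seconds


# Hypothetical fixed window; NOT a value taken from the OpenSSH source.
WINDOW = 64 * 1024

if __name__ == "__main__":
    # Hypothetical round-trip times for a switched LAN and a long-distance path.
    for label, rtt in [("LAN", 0.0002), ("long path", 0.070)]:
        rate = max_throughput_bytes_per_sec(WINDOW, rtt)
        print(f"{label}: RTT {rtt * 1000:.1f} ms -> at most {rate / 1e6:.1f} MB/s")
    need = window_needed(1e9, 0.070)
    print(f"window to fill 1 Gbps at 70 ms: {need / 1e6:.2f} MB")
```

Under these assumed numbers the window allows over 300 MB/s at a 0.2 ms round trip, well above gigabit line rate, but only about 0.9 MB/s at 70 ms. That is consistent with the statement that LAN transfers should not be affected.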
Re: CARP and em0 timeout watchdog
On 4/20/07, Brian McCann [EMAIL PROTECTED] wrote:
> FWIW, I've got 82546B cards and it's happening to me as well, but I'm on 6.1. I'm upgrading to 6.2 and trying polling as we speak.
>
> --Brian

This is not the same problem. Until you are running 6.2 RELEASE it's a whole other ballpark; there were locking issues between the driver and the net layer that were fixed.

Jack
Re: CARP and em0 timeout watchdog
On 4/20/07, Jack Vogel [EMAIL PROTECTED] wrote:
> On 4/20/07, Sven Willenberger [EMAIL PROTECTED] wrote:
>> On Fri, 2007-04-20 at 10:17 -0700, Jack Vogel wrote:
>>> On 4/20/07, Jeremy Chadwick [EMAIL PROTECTED] wrote:
>>>> On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
>>>>> Having done more diagnostics, I have found out it is not CARP
>>>>> related at all. It turns out that the same timeouts will happen
>>>>> when ftp'ing to the physical address IPs as well. There is also
>>>>> an odd situation here depending on which protocol I use. The two
>>>>> boxes are connected to a Dell Powerconnect 2616 gig switch with
>>>>> CAT6. If I scp files from the 192.168.0.18 to the 192.168.0.19
>>>>> box I can transfer gigs' worth without a hiccup (I used dd to
>>>>> create various sized testfiles from 32M to 1G in size and just
>>>>> scp testfile* to the other box). On the other hand, if I connect
>>>>> to 192.168.0.19 using ftp (either active or passive) where ftp
>>>>> is being run through inetd, the interface resets (watchdog)
>>>>> within seconds (a few MBs) of traffic. Enabling polling does
>>>>> nothing, nor does changing net.inet.tcp.{recv,send}space. Any
>>>>> ideas why I would be seeing such behavioral differences between
>>>>> scp and ftp?
>>>>
>>>> You'll get a much higher throughput rate with FTP than you will
>>>> with SSH, simply because encryption overhead is quite high (even
>>>> with the Blowfish cipher). With a very fast processor and on a
>>>> gigE network you'll probably see 8-9 MByte/sec via SSH versus
>>>> 60-70 MByte/sec via FTP. That's the only difference I can think
>>>> of. The watchdog resets I can't explain; Jack Vogel should be
>>>> able to assist with that. But it sounds like the resets only
>>>> happen under very high throughput conditions (which is why you'd
>>>> see it with FTP but not SSH).
>>>
>>> What kind of hardware is this interface? Watchdogs mean TX cleanup
>>> isn't happening in a reasonable time; without further data it's
>>> hard to know what might be going on.
>>>
>>> Jack
>>
>> from pciconf:
>>
>> [EMAIL PROTECTED]:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = 'PRO/1000 PM'
>>     class= network
>>     subclass = ethernet
>> [EMAIL PROTECTED]:0:0: class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     class= network
>>     subclass = ethernet
>>
>> em0 is the interface in question.
>>
>> from dmesg:
>>
>> em0: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0x4000-0x401f mem 0xe030-0xe031 irq 16 at device 0.0 on pci13
>> em1: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0x5000-0x501f mem 0xe040-0xe041 irq 17 at device 0.0 on pci14
>
> OH, this is an 82573, and I've posted a firmware patcher a couple of
> different times. There is a bit in the MANC register that is
> incorrectly programmed in some vendors' systems. Can you search
> email for that patcher? It needs to run from DOS. If you are unable
> to find it, let me know and I'll resend you a copy.
>
> Jack

FWIW, I've got 82546B cards and it's happening to me as well, but I'm
on 6.1. I'm upgrading to 6.2 and trying polling as we speak.

--Brian

--
_-=-_-=-_-=-_-=-_-=-_-=-_-=-_-=-_-=-_-=-_-=-_
Brian McCann
Systems Network Administrator, K12USA

I don't have to take this abuse from you -- I've got hundreds of
people waiting to abuse me.
        -- Bill Murray, Ghostbusters
CARP and em0 timeout watchdog
I currently have a FreeBSD 6.2-RELEASE-p3 SMP box with dual Intel
PRO/1000PM NICs configured as follows:

em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet 192.168.0.18 netmask 0xff00 broadcast 192.168.0.255
        ether 00:30:48:8d:5c:0a
        media: Ethernet autoselect (1000baseTX full-duplex)
        status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 4096
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet 10.10.0.18 netmask 0xfff8 broadcast 10.10.0.23
        ether 00:30:48:8d:5c:0b
        media: Ethernet autoselect (1000baseTX full-duplex)
        status: active

The em0 interface connects to the LAN, while the em1 interface is
connected to an identical box via CAT6 crossover cable (for
ggate/gmirror). Now, I have also configured a carp interface:

carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
        inet 192.168.0.20 netmask 0x
        carp: MASTER vhid 1 advbase 1 advskew 0

There are twin boxes here and I am running Samba. The problem is that
with transfers across the carp IP (192.168.0.20) I end up with em0
resetting after a watchdog timeout error. This occurs whether I
transfer files from a Windows box using a share (Samba) or via ftp.
This problem does *not* occur if I ftp to the 192.168.0.19 interface
(non-virtual). I suspected cabling at first, so I had all the cabling
in question replaced with fresh CAT6, to no avail. Several gigs of data
can be transferred to the real interface (em0) without any issue at
all; a max of maybe 1 - 2 Gig can be transferred to the carp'ed IP
before the em0 reset.

Any ideas here?

Sven
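To compare how often em0 resets when transferring to the carp address versus a physical address, it can help to count the watchdog lines in the kernel log for each test run. The following is only a minimal sketch in Python: the message pattern ("emN: watchdog timeout") is an assumption based on the wording used in this thread, and the sample log lines are invented for illustration, so adjust both to whatever your own /var/log/messages actually contains.

```python
import re
from collections import Counter

# Assumed message format: "<ifname>: watchdog timeout ...".
# The thread only mentions a "watchdog timeout"; the exact text may differ.
WATCHDOG_RE = re.compile(r"\b(em\d+): watchdog timeout")


def count_watchdog_resets(log_lines):
    """Return a Counter mapping interface name to the number of watchdog lines."""
    counts = Counter()
    for line in log_lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the poster's machine.
    sample = [
        "Apr 20 11:02:13 box kernel: em0: watchdog timeout -- resetting",
        "Apr 20 11:02:14 box kernel: em0: link state changed to UP",
        "Apr 20 11:09:41 box kernel: em0: watchdog timeout -- resetting",
    ]
    for ifname, n in sorted(count_watchdog_resets(sample).items()):
        print(f"{ifname}: {n} watchdog timeout(s)")
```

Running it once after a transfer to 192.168.0.20 and once after a transfer to the physical address gives a simple side-by-side count rather than an impression from scrolling dmesg.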