Re: Which FreeBSD is the most stable for Dell PowerEdge 2850
I don't have any 2850's, but the 1850 I have has been running 6.0 since BETA1, and last night I just upgraded it to 6.1. No issues. The PERC 4e/Si card is phenomenally fast on this system (running 2-disk RAID1). I'd recommend running 6.1, as it is stable on all of my Dell systems that run it (and I'm migrating the older FreeBSD boxes to 6.1 as time permits). If you already have > 1 CPU, you might as well leave hyperthreading off. There are cases where it degrades performance rather than enhancing it. As for mysql version, "no comment" :-)

Thanks for the reply! I'm in the process of upgrading the 2850 to 6.1 now, and it seems to have gone well so far. Time will tell in the long term whether the stability is what I'm hoping for, but at least it does seem to be up and running okay so far.

As for hyperthreading, I did some benchmarking back with FreeBSD 5.4, using the actual SQL databases I'm serving on the machine and loading the server with lots of simultaneous queries from remote machines similar to those which will be used in production. Back then, there was about a 10% increase in performance. I'll run the same tests again before putting the machine in production to see if anything changed. 10% isn't much, but every bit helps, if hyperthreading doesn't cause the machine to become unstable otherwise.

Thanks again! Dan -- Syzygy Research & Technology Box 83, Legal, AB T0G 1L0 Canada Phone: 780-961-2213

___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
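For anyone repeating Dan's A/B test on FreeBSD of this era: after the 2005 HTT security advisory, the scheduler's use of the logical CPUs is controlled by a sysctl. A minimal sketch -- the knob name and default are from memory of 5.4/6.x, so verify against `sysctl -a` on your release:

```conf
# /etc/sysctl.conf -- whether the scheduler may use HTT logical CPUs
# (0 is the post-advisory default, i.e. hyperthreading effectively off)
machdep.hyperthreading_allowed=0

# For a benchmark run with HTT on, flip it at runtime:
#   sysctl machdep.hyperthreading_allowed=1
```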
Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?
I do not use usb at all on this particular server. The only ones on it are the 2-4 onboard ones. Removing uhci from my kernel fixed this issue. Thank you.

On 7/5/06, Max Laier <[EMAIL PROTECTED]> wrote: On Thursday 06 July 2006 02:17, Vye Wilson wrote:
> # vmstat -i
> interrupt          total    rate
> irq1: atkbd0           5       0
> irq6: fdc0             3       0
> irq10: uhci1   915633230  262810
> irq15: ata1         1306       0
> irq17: fwohci0         1       0
> irq18: fxp0         2876       0
> irq21: twa0          153       0
> cpu0: timer      6964974    1999
> Total          922602548  264811

Are you using usb on that box? If not, get rid of device uhci in your kernel config to see if that fixes it. If you are using usb - I have no idea. A BIOS upgrade might help. -- /"\ Best regards, | [EMAIL PROTECTED] \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | [EMAIL PROTECTED] / \ ASCII Ribbon Campaign | Against HTML Mail and News -- --Vye
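For the archives, the fix looks like this in a custom kernel configuration. MYKERNEL is a placeholder name, and the rebuild commands are the standard FreeBSD 6.x procedure:

```conf
# /usr/src/sys/i386/conf/MYKERNEL  (hypothetical config name)
# On a box that uses no USB at all, comment out the UHCI host controller
# so it can no longer generate the irq10 interrupt storm:
#device uhci    # UHCI PCI->USB interface -- removed

# Then rebuild and install the kernel:
#   cd /usr/src
#   make buildkernel KERNCONF=MYKERNEL
#   make installkernel KERNCONF=MYKERNEL && shutdown -r now
```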
Re: pkg_version confused by architecture in package name
On Thu, Jul 06, 2006 at 02:45:45AM +0200, [LoN]Kamikaze wrote: > I normally run the command > # pkg_version -Iv | grep \< > before running 'portupgrade -a', to see what's going to happen. This time I > got the following output: > > diablo-jdk-freebsd6.i386.1.5.0.07.00 < needs updating (index has > 1.5.0.07.00) > > It seems that the tool is confused by the i386 in the package name. Actually I think it's confused by the fact that the package name is "diablo-jdk" and the version is "freebsd6.i386.1.5.0.07.00". That's just plain bogus. -- Brooks -- Any statement of the form "X is the one, true Y" is FALSE. PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
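Brooks' diagnosis can be illustrated mechanically: the pkg_* tools treat everything after the last '-' in a package name as the version string, so a PKGNAME with an embedded "freebsd6.i386" yields a nonsense version. A toy sketch of just that split (illustration only, not pkg_version's actual code):

```python
def split_pkg(pkgname):
    """Split a FreeBSD package name into (name, version) at the
    last '-', the convention the pkg_* tools rely on."""
    name, _, version = pkgname.rpartition("-")
    return name, version

# The package from the report: the "version" swallows the arch string.
print(split_pkg("diablo-jdk-freebsd6.i386.1.5.0.07.00"))
# -> ('diablo-jdk', 'freebsd6.i386.1.5.0.07.00')
```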
Re: em device hangs on ifconfig alias ...
On Wed, Jul 05, 2006 at 06:29:55PM -0700, Atanas wrote:
> Pyun YongHyeon said the following on 6/30/06 8:54 PM:
> > On Fri, Jun 30, 2006 at 12:28:49PM -0700, Atanas wrote:
> > > User Freebsd said the following on 6/29/06 9:29 PM:
> > > > The other funny thing about the current em driver is that if you move an IP to it from a different server, the appropriate ARP packets aren't sent out to redirect the IP traffic .. recently, someone pointed me to arping, which has solved my problem *external* to the driver ...
> > >
> > > That's the second reason why I (still) avoid em in mass-aliased systems.
> > >
> > > I have a single pool of IP addresses shared by many servers with multiple aliases each. When someone leaves and frees an IP, it gets reused and brought up on a different server. In case it was previously handled by em, the traffic doesn't get redirected to the new server.
> > >
> > > Similar thing happens even with machines with single static IPs. For instance when retiring an old production system, I usually request a new box to be brought up on a different IP, make a fresh install on everything and test, swap IP addresses and reboot. In case of em, after a soft reboot both systems are inaccessible.
> > >
> > > A workaround is to power both of the systems down and then power them up. This however cannot be done remotely, and in case there were IP aliases, they still don't get any traffic.
> >
> > I haven't fully tested it but what about the attached patch? It may fix your ARP issue. The patch also fixes other issues related with ioctls. Now em(4) will send an ARP packet when its IP address is changed even if there is no active link. Since em(4) is not an mii-aware driver I can't be sure this behaviour is correct.
>
> The patch is against if_em.c,v 1.116 2006/06/06, which is 7-CURRENT.
> I tried "merging" the relevant em driver files into a 6-STABLE installation by simply copying sys/dev/em/* and sys/modules/em/Makefile, but it seems that the new revision depends on other -CURRENT things and the module build fails:
>
> # pwd
> /usr/src/sys/modules/em
> # make clean; make
> ...
> /usr/src/sys/modules/em/../../dev/em/if_em.c: In function `em_setup_interface':
> /usr/src/sys/modules/em/../../dev/em/if_em.c:2143: error: `IFCAP_VLAN_HWCSUM' undeclared (first use in this function)
> ...
>
> I don't have a 7-CURRENT based box around. It seems too bleeding edge for me anyway. I was hoping to play with different if_em kernel modules on a semi-production (spare) box and eventually test the proposed em patch, but apparently it's not so easy.
>
> Please let me know if I'm missing something obvious.

My bad. Here is a patch generated against RELENG_6. -- Regards, Pyun YongHyeon

--- if_em.c.orig	Fri May 19 09:19:57 2006
+++ if_em.c	Thu Jul  6 11:10:56 2006
@@ -657,8 +657,9 @@
 	mtx_assert(&adapter->mtx, MA_OWNED);
-	if (!adapter->link_active)
-		return;
+	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) !=
+	    IFF_DRV_RUNNING)
+		return;
 	while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
@@ -719,11 +720,6 @@
 	if (adapter->in_detach)
 		return(error);
 	switch (command) {
-	case SIOCSIFADDR:
-	case SIOCGIFADDR:
-		IOCTL_DEBUGOUT("ioctl rcv'd: SIOCxIFADDR (Get/Set Interface Addr)");
-		ether_ioctl(ifp, command, data);
-		break;
 	case SIOCSIFMTU:
 	    {
 		int max_frame_size;
@@ -760,16 +756,17 @@
 		IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFFLAGS (Set Interface Flags)");
 		EM_LOCK(adapter);
 		if (ifp->if_flags & IFF_UP) {
-			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+			if ((ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+				if ((ifp->if_flags ^ adapter->if_flags) &
+				    IFF_PROMISC) {
+					em_disable_promisc(adapter);
+					em_set_promisc(adapter);
+				}
+			} else
 				em_init_locked(adapter);
-			}
-
-			em_disable_promisc(adapter);
-			em_set_promisc(adapter);
 		} else {
-			if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
+			if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 				em_stop(adapter);
-			}
 		}
 		EM_UNLOCK(adapter);
 		break;
@@ -835,8 +832,8 @@
 		break;
 	    }
 	default:
-		IOCTL_DEBUGOUT1("ioctl received: UNKNOWN (0x%x)", (int)command);
Re: em device hangs on ifconfig alias ...
Pyun YongHyeon said the following on 6/30/06 8:54 PM:
> On Fri, Jun 30, 2006 at 12:28:49PM -0700, Atanas wrote:
> > User Freebsd said the following on 6/29/06 9:29 PM:
> > > The other funny thing about the current em driver is that if you move an IP to it from a different server, the appropriate ARP packets aren't sent out to redirect the IP traffic .. recently, someone pointed me to arping, which has solved my problem *external* to the driver ...
> >
> > That's the second reason why I (still) avoid em in mass-aliased systems.
> >
> > I have a single pool of IP addresses shared by many servers with multiple aliases each. When someone leaves and frees an IP, it gets reused and brought up on a different server. In case it was previously handled by em, the traffic doesn't get redirected to the new server.
> >
> > Similar thing happens even with machines with single static IPs. For instance when retiring an old production system, I usually request a new box to be brought up on a different IP, make a fresh install on everything and test, swap IP addresses and reboot. In case of em, after a soft reboot both systems are inaccessible.
> >
> > A workaround is to power both of the systems down and then power them up. This however cannot be done remotely, and in case there were IP aliases, they still don't get any traffic.
>
> I haven't fully tested it but what about the attached patch? It may fix your ARP issue. The patch also fixes other issues related with ioctls. Now em(4) will send an ARP packet when its IP address is changed even if there is no active link. Since em(4) is not an mii-aware driver I can't be sure this behaviour is correct.

The patch is against if_em.c,v 1.116 2006/06/06, which is 7-CURRENT.
I tried "merging" the relevant em driver files into a 6-STABLE installation by simply copying sys/dev/em/* and sys/modules/em/Makefile, but it seems that the new revision depends on other -CURRENT things and the module build fails:

# pwd
/usr/src/sys/modules/em
# make clean; make
...
/usr/src/sys/modules/em/../../dev/em/if_em.c: In function `em_setup_interface':
/usr/src/sys/modules/em/../../dev/em/if_em.c:2143: error: `IFCAP_VLAN_HWCSUM' undeclared (first use in this function)
...

I don't have a 7-CURRENT based box around. It seems too bleeding edge for me anyway. I was hoping to play with different if_em kernel modules on a semi-production (spare) box and eventually test the proposed em patch, but apparently it's not so easy.

Please let me know if I'm missing something obvious. Thanks, Atanas

Index: if_em.c
===
RCS file: /pool/ncvs/src/sys/dev/em/if_em.c,v
retrieving revision 1.116
diff -u -r1.116 if_em.c
--- if_em.c	6 Jun 2006 08:03:49 -	1.116
+++ if_em.c	1 Jul 2006 03:51:41 -
@@ -692,7 +692,8 @@
 	EM_LOCK_ASSERT(sc);
-	if (!sc->link_active)
+	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) !=
+	    IFF_DRV_RUNNING)
 		return;
 	while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
@@ -751,11 +752,6 @@
 		return (error);
 	switch (command) {
-	case SIOCSIFADDR:
-	case SIOCGIFADDR:
-		IOCTL_DEBUGOUT("ioctl rcv'd: SIOCxIFADDR (Get/Set Interface Addr)");
-		ether_ioctl(ifp, command, data);
-		break;
 	case SIOCSIFMTU:
 	    {
 		int max_frame_size;
@@ -802,17 +798,19 @@
 		IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFFLAGS (Set Interface Flags)");
 		EM_LOCK(sc);
 		if (ifp->if_flags & IFF_UP) {
-			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+			if ((ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+				if ((ifp->if_flags ^ sc->if_flags) &
+				    IFF_PROMISC) {
+					em_disable_promisc(sc);
+					em_set_promisc(sc);
+				}
+			} else
 				em_init_locked(sc);
-			}
-
-			em_disable_promisc(sc);
-			em_set_promisc(sc);
 		} else {
-			if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
+			if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 				em_stop(sc);
-			}
 		}
+		sc->if_flags = ifp->if_flags;
 		EM_UNLOCK(sc);
 		break;
 	case SIOCADDMULTI:
@@ -878,8 +876,8 @@
 		break;
 	    }
 	default:
-		IOCTL_DEBUGOUT1("ioctl received: UNKNOWN (0x%x)", (int)command);
-		error = EINVAL;
+		error = ether_ioctl(ifp, command, data);
+		break;
pkg_version confused by architecture in package name
I normally run the command

# pkg_version -Iv | grep \<

before running 'portupgrade -a', to see what's going to happen. This time I got the following output:

diablo-jdk-freebsd6.i386.1.5.0.07.00 < needs updating (index has 1.5.0.07.00)

It seems that the tool is confused by the i386 in the package name.
Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?
On Thursday 06 July 2006 02:17, Vye Wilson wrote:
> # vmstat -i
> interrupt          total    rate
> irq1: atkbd0           5       0
> irq6: fdc0             3       0
> irq10: uhci1   915633230  262810
> irq15: ata1         1306       0
> irq17: fwohci0         1       0
> irq18: fxp0         2876       0
> irq21: twa0          153       0
> cpu0: timer      6964974    1999
> Total          922602548  264811

Are you using usb on that box? If not, get rid of device uhci in your kernel config to see if that fixes it. If you are using usb - I have no idea. A BIOS upgrade might help. -- /"\ Best regards, | [EMAIL PROTECTED] \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | [EMAIL PROTECTED] / \ ASCII Ribbon Campaign | Against HTML Mail and News
Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?
Is anything plugged into USB? If so, try removing it, as uhci1 is clearly your issue.

Vye Wilson wrote:
> # vmstat -i
> interrupt          total    rate
> irq1: atkbd0           5       0
> irq6: fdc0             3       0
> irq10: uhci1   915633230  262810
> irq15: ata1         1306       0
> irq17: fwohci0         1       0
> irq18: fxp0         2876       0
> irq21: twa0          153       0
> cpu0: timer      6964974    1999

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to [EMAIL PROTECTED]
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Francisco Reyes wrote:
> Scott Long writes:
> > For what it's worth, I recently spent a lot of time putting FreeBSD 6.1 to the test as both an NFS client and server in a mixed OS environment.
>
> I have a few debugging settings/suggestions that have been sent my way and I plan to try them tonight, but this is just another report.. FreeBSD only environment. Today after hours going crazy with horrible performance I brought down nfsd and brought it back up.. that simple process got the vmstat 'b' column down and everything was back to normal. Again this will not help anyone troubleshoot, but just to mention that it happens even with a FreeBSD only environment.

'k, to those out there that know what is useful, and what isn't ... If Francisco had DDB enabled, did a CTL-ALT-ESC when the above happens, and does a 'panic' to crash the server and dump a core ... can anything useful be gleaned from that core dump?

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.org ICQ . 7615664
Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?
# vmstat -i
interrupt          total    rate
irq1: atkbd0           5       0
irq6: fdc0             3       0
irq10: uhci1   915633230  262810
irq15: ata1         1306       0
irq17: fwohci0         1       0
irq18: fxp0         2876       0
irq21: twa0          153       0
cpu0: timer      6964974    1999
Total          922602548  264811

# systat
(systat CPU-usage display; the only busy bars shown were "root irq10: uhc" and "root idle")

So would irq10 be the culprit? If so, where do I go from here?

On 7/5/06, Max Laier <[EMAIL PROTECTED]> wrote: On Thursday 06 July 2006 02:02, Vye Wilson wrote:
> I'm really not sure how to go about troubleshooting this issue. Can someone point me in the right direction?

"vmstat -i" should give a good idea what is causing the interrupt load. -- /"\ Best regards, | [EMAIL PROTECTED] \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | [EMAIL PROTECTED] / \ ASCII Ribbon Campaign | Against HTML Mail and News -- --Vye
Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?
"vmstat -i" and "systat" will be useful for identifying what is causing the interrupts.

Vye Wilson wrote:
> Recently I've had an unusually high amount of 'interrupt' cpu usage. I stopped all my jails so the box is for the most part idle.

Steve
Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?
On Thursday 06 July 2006 02:02, Vye Wilson wrote: > I'm really not sure how to go about troubleshooting this issue. Can someone > point me in the right direction? "vmstat -i" should give a good idea what is causing the interrupt load. -- /"\ Best regards, | [EMAIL PROTECTED] \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | [EMAIL PROTECTED] / \ ASCII Ribbon Campaign | Against HTML Mail and News
0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?
Recently I've had an unusually high amount of 'interrupt' cpu usage. I stopped all my jails so the box is for the most part idle.

Here is my uname: FreeBSD Natsume.wow.com 6.1-STABLE FreeBSD 6.1-STABLE #3: Tue Jul 4 22:14:02 UTC 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NATSUME i386

Here is my top output:

last pid: 674; load averages: 0.00, 0.00, 0.00  up 0+00:32:49  16:51:02
19 processes: 1 running, 18 sleeping
CPU states: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle
Mem: 5332K Active, 3984K Inact, 20M Wired, 9056K Buf, 967M Free
Swap: 2022M Total, 2022M Free

  PID USERNAME  THR PRI NICE  SIZE   RES  STATE   TIME  WCPU COMMAND
  666 root        1   4    0 6116K 3096K sbwait  0:00 0.00% sshd
  674 root        1 -64    0 2288K 1560K RUN     0:00 0.00% top
  670 vye         1   8    0 3188K 1992K wait    0:00 0.00% bash
  672 vye         1   8    0 1684K 1332K wait    0:00 0.00% su
  312 root        1  96    0 1344K  988K select  0:00 0.00% syslogd
  673 root        1   8    0 3184K 2068K wait    0:00 0.00% bash
  669 vye         1  96    0 6100K 3128K select  0:00 0.00% sshd
  463 root        1   8    0 1356K 1116K nanslp  0:00 0.00% cron
  594 root        1   5    0 1312K  944K ttyin   0:00 0.00% getty
  599 root        1   5    0 1312K  944K ttyin   0:00 0.00% getty
  593 root        1   5    0 1312K  944K ttyin   0:00 0.00% getty
  597 root        1   5    0 1312K  944K ttyin   0:00 0.00% getty
  592 root        1   5    0 1312K  944K ttyin   0:00 0.00% getty
  595 root        1   5    0 1312K  944K ttyin   0:00 0.00% getty
  596 root        1   5    0 1312K  944K ttyin   0:00 0.00% getty
  598 root        1   5    0 1312K  944K ttyin   0:00 0.00% getty
  450 root        1  96    0 3400K 2556K select  0:00 0.00% sshd
  390 root        1  96    0 1256K  832K select  0:00 0.00% usbd
  283 root        1 108    0  516K  376K select  0:00 0.00% devd

After taking a look at dmesg I'm not sure if I just now noticed this or if it has recently started doing this:

unknown: can't assign resources (memory)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (irq)

Full dmesg output:

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #3: Tue Jul 4 22:14:02 UTC 2006
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NATSUME
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2392.05-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf27 Stepping = 7
  Features=0xbfebfbff
  Features2=0x400
real memory = 1073479680 (1023 MB)
avail memory = 1041547264 (993 MB)
MPTable:
ioapic0: Assuming intbase of 0
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
cpu0 on motherboard
pcib0: pcibus 0 on motherboard
pci0: on pcib0
pcib0: unable to route slot 31 INTC
agp0: mem 0xf800-0xfbff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pcib2: at device 30.0 on pci0
pci2: on pcib2
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: <3ware 9000 series Storage Controller> port 0xd400-0xd4ff mem 0xfeaffc00-0xfeaffcff,0xf380-0xf3ff irq 21 at device 9.0 on pci2
twa0: [GIANT-LOCKED]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9500S-12, 12 ports, Firmware FE9X 2.06.00.009, BIOS BE9X 2.03.01.051
pcib3: at device 11.0 on pci2
pci3: on pcib3
pci3: at device 8.0 (no driver attached)
fwohci0: mem 0xfc8fe000-0xfc8fefff irq 17 at device 9.0 on pci3
fwohci0: OHCI version 1.0 (ROM=1)
fwohci0: No. of Isochronous channels is 8.
fwohci0: EUI64 00:08:d3:f0:00:00:01:09
fwohci0: Phy 1394a available S400, 3 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: on fwohci0
fwe0: on firewire0
if_fwe0: Fake Ethernet address: 02:08:d3:00:01:09
fwe0: Ethernet address: 02:08:d3:00:01:09
fwe0: if_start running deferred for Giant
sbp0: on firewire0
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
fxp0: port 0xdf00-0xdf3f mem 0xfeacf000-0xfeac,0xfea8-0xfea9 irq 18 at device 12.0 on pci2
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:07:e9:d4:a4:f8
fxp1: port 0xde80-0xdebf mem 0xfeace000-0xfeacefff,0xfea4-0xfea5 irq 19 at device 13.0 on pci2
miibus1: on fxp1
inphy1: on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:07:e9:d4:a4:fa
pci2: at device 15.0 (no driver attached)
isab0: at device 31.0 on pci0
isa0: on i
Re: NFS Locking Issue
User Freebsd writes:
> What are others using for ethernet?

Of our two machines having the problem, 1 has BGE and the other one has EM (Intel). Doesn't seem to make much of a difference. Except for the network cards, these two machines are identical. Same motherboard, same RAID controller, same amount of RAM, same RAID configuration...
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Francisco Reyes wrote:

> > can you trigger it using work on just one client against a server, without client<->client interactions? This makes tracking and reproduction a lot easier
>
> Personally I am experiencing two problems.
>
> 1- NFS clients freeze/hang if the server goes away. We have clients with several mounts so if one of the servers dies then the entire operation of the client is put in jeopardy. This I can reproduce every single time with a 6.X client.. with both a 5.X and a 6.X server. "umount -f" hangs too.

The problems you are experiencing are almost certainly not related to rpc.lockd, rather, bugs in the NFS client. Let's just look at the normal use hang for now, and revisit umount -f after that.

> > as multi-client test cases are really tricky!
>
> The second case only happens under heavy load and restarting nfsd makes it go away. Basically the 'b' column in vmstat goes high and the performance of the machine falls to the floor.
>
> Going to try http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html and reading up on how to debug with DDB. Have another user who volunteered to give me some pointers.. so will try that.. so I am able to actually produce more helpful info.

If you can get into DDB when the hang has occurred, output via serial console for the following commands would be very helpful:

show pcpu
show allpcpu
ps
trace
traceall
show locks
show alllocks
show uma
show malloc
show lockedvnods

Note that the last two will only work if you compile WITNESS in -- WITNESS significantly changes kernel timing, so you may find it closes whatever race you're running into. If you can reproduce the problem with WITNESS and INVARIANTS, that would be very useful. The above output will hopefully tell us the basic state of the system with respect to processes, threads, locking, and so on, and may help us track things down. For the above, you definitely want a serial console as it will be quite a bit of output.

Also, can you send the output of the 'mount' command from the un-hung state? I notice a lot of threads stuck in 'ufs'.

Finally, during the above, if you could disable background file system checking by placing the following in /etc/rc.conf:

background_fsck="NO"

And boot to single user mode, doing a full fsck -p before booting up, in order to make sure the file system is in a good state before beginning.

Robert N M Watson Computer Laboratory University of Cambridge
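For reference, the debugging facilities Robert refers to are compiled in with kernel options along these lines (a sketch for the FreeBSD 6.x era; see the NOTES file in your source tree for the authoritative list):

```conf
# Kernel config additions for post-hang debugging
options KDB                # kernel debugger framework
options DDB                # interactive debugger (gives the 'show' commands)
options INVARIANTS         # extra run-time consistency checks
options INVARIANT_SUPPORT  # support code required by INVARIANTS
options WITNESS            # lock-order tracking, required by 'show locks' and
                           # 'show alllocks'; changes kernel timing noticeably
```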
Re: NFS Locking Issue
User Freebsd writes:
> I believe, in Francisco's case, they are willing to pay someone to fix the NFS issues they are having, which, i'd assume, means easy access to the problematic server(s) to do proper testing in a "real life scenario" ...

Correct. As long as the person is someone "trusted in the community" we could do that. And yes, we are willing to come to some agreement on compensation for the help.

Needless to say, our introduction of new machines will go through a more rigorous test in the future.. especially when jumping to a new Release number in FreeBSD. We lost 1 big customer and after today we likely will lose 2 or 3 more.. of the big ones.. when it's all said and done we are likely to lose several thousand dollars/month due to these 6.X incidents.

We are fairly new to NFS and that's why we were hoping to get someone to help us.. or at least point us in the right direction. I plan to go over the link you sent me and try to prepare at least one machine. As for paying someone, yes we have been actively looking for someone to help us, since we are relatively new to NFS.. and much newer to troubleshooting this type of problem.
Re: NFS Locking Issue
Robert Watson writes:
> It's not impossible. It would be interesting to see if ps axl reports that rpc.lockd is in the kqread state

Found my post in another thread.

0 354 1 0 96 0 1412 1032 select Ss ?? 0:07.06 /usr/sbin/rpcbind

It was not in kqread state.. and that was from a point where the machine was totally locked up.. had to do a physical reset.. could not even kill nfsd that time. I had also more output from several different ps. You need to do "view more" to see them all. http://tinyurl.com/kpejr
Re: NFS Locking Issue
Robert Watson writes:
> It's not impossible. It would be interesting to see if ps axl reports that rpc.lockd is in the kqread state, which would suggest it was blocked in the resolver.

Just tried "ps axl | grep rpc" on the machine giving us the most grief.. Only got one line back:

root 367 0.0 0.0 1368 960 ?? Ss 25Jun06 0:05.52 /usr/sbin/rpcbin 0 1 0 4 0 select

Is that one of the lines I should keep an eye on next time the machine is locked up?
Re: NFS Locking Issue
Robert Watson writes:
> can you trigger it using work on just one client against a server, without client<->client interactions? This makes tracking and reproduction a lot easier

Personally I am experiencing two problems.

1- NFS clients freeze/hang if the server goes away. We have clients with several mounts so if one of the servers dies then the entire operation of the client is put in jeopardy. This I can reproduce every single time with a 6.X client.. with both a 5.X and a 6.X server. "umount -f" hangs too.

> as multi-client test cases are really tricky!

The second case only happens under heavy load and restarting nfsd makes it go away. Basically the 'b' column in vmstat goes high and the performance of the machine falls to the floor.

Going to try http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html and reading up on how to debug with DDB. Have another user who volunteered to give me some pointers.. so will try that.. so I am able to actually produce more helpful info.
Re: NFS Locking Issue
Scott Long writes:
> For what it's worth, I recently spent a lot of time putting FreeBSD 6.1 to the test as both an NFS client and server in a mixed OS environment.

I have a few debugging settings/suggestions that have been sent my way and I plan to try them tonight, but this is just another report.. FreeBSD only environment. Today, after hours going crazy with horrible performance, I brought down nfsd and brought it back up.. that simple process got the vmstat 'b' column down and everything was back to normal. Again this will not help anyone troubleshoot, but just to mention that it happens even with a FreeBSD-only environment.
Re: fetch hangs on AMD64 RELENG_6
On Jul 5, 2006, at 4:22 PM, Justin T. Gibbs wrote:
> Hmm. Seems we close the window unexpectedly and the remote side doesn't retransmit when we open it.

Yes, interesting that. :-) Normally the stack only sets the window size to 0 in the event of severe congestion; it's used to tell the other side to stop sending traffic for an interval, although the other side should retry with zero-data-length ACK-only packets after a delay, or once your side sends a packet opening the window.

> FreeBSD's acks stop once the window is fully open... aren't the acks supposed to be retried longer? If not, shouldn't fetch eventually see a socket close event instead of hanging forever?

RFC-793 says:

"The sending TCP must be prepared to accept from the user and send at least one octet of new data even if the send window is zero. The sending TCP must regularly retransmit to the receiving TCP even when the window is zero. Two minutes is recommended for the retransmission interval when the window is zero. This retransmission is essential to guarantee that when either TCP has a zero window the re-opening of the window will be reliably reported to the other. When the receiving TCP has a zero window and a segment arrives it must still send an acknowledgment showing its next expected sequence number and current window (zero)."

The fact that you aren't seeing any ACKs back from this remote server suggests that perhaps a stateful firewall is involved which is getting confused and/or dropping the state entry once it sees the zero-window-size packet from your machine. There may be something wrong on the FreeBSD side as well, of course -- the fact that it grows the window by sending nearly twenty or more ACK packets in the span of about one millisecond, without waiting for any ACKs from the other side, is pretty wacky in its own right. -- -Chuck
Re: fetch hangs on AMD64 RELENG_6
Hmm. Seems we close the window unexpectedly and the remote side doesn't retransmit when we open it. FreeBSD's acks stop once the window is fully open... aren't the acks supposed to be retried longer? If not, shouldn't fetch eventually see a socket close event instead of hanging forever? A similar failure occurs with SACK disabled. -- Justin
13:31:44.695211 IP manna.mozilla.org.http > databus.avidyne.com.64531: . 9018128:9019496(1368) ack 179 win 1716
13:31:44.695229 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 8957936 win 32832
13:31:44.702704 IP manna.mozilla.org.http > databus.avidyne.com.64531: . 9019496:9020864(1368) ack 179 win 1716
13:31:44.702719 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 8957936 win 32832
13:31:44.710200 IP manna.mozilla.org.http > databus.avidyne.com.64531: . 9020864:9022232(1368) ack 179 win 1716
13:31:44.710215 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 8957936 win 32832
13:31:44.719444 IP manna.mozilla.org.http > databus.avidyne.com.64531: . 9022232:9023600(1368) ack 179 win 1716
13:31:44.719462 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 8957936 win 32832
13:31:44.727065 IP manna.mozilla.org.http > databus.avidyne.com.64531: . 8957936:8959304(1368) ack 179 win 1716
13:31:44.727089 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 0
13:31:44.727146 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 1680
13:31:44.727181 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 3216
13:31:44.727275 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 4752
13:31:44.727295 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 6288
13:31:44.727342 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 7824
13:31:44.727375 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 9360
13:31:44.727492 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 10896
13:31:44.727513 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 12432
13:31:44.727565 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 15504
13:31:44.727632 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 17040
13:31:44.727653 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 18576
13:31:44.727701 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 20112
13:31:44.727780 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 21648
13:31:44.727870 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 23184
13:31:44.727889 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 24720
13:31:44.727920 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 26256
13:31:44.727982 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 27792
13:31:44.728034 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 29328
13:31:44.728053 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 30864
13:31:44.728217 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 32400
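The window re-opening in the trace above can be pulled out mechanically. A short script (the field layout of this tcpdump output format is assumed; the sample lines are the first, second, and last window updates from the trace) shows the advertised window going from 0 to 32400 in about a millisecond:

```python
import re

# A few representative ACK lines from the trace above.
trace = """\
13:31:44.727089 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 0
13:31:44.727146 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 1680
13:31:44.728217 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 32400
"""

def window_updates(lines):
    """Yield (timestamp_seconds, advertised_window) per ACK line."""
    for line in lines:
        m = re.search(r"^(\d+):(\d+):([\d.]+) .* win (\d+)$", line)
        if m:
            h, mnt, s = int(m.group(1)), int(m.group(2)), float(m.group(3))
            yield h * 3600 + mnt * 60 + s, int(m.group(4))

updates = list(window_updates(trace.splitlines()))
span_ms = (updates[-1][0] - updates[0][0]) * 1000
print(f"window {updates[0][1]} -> {updates[-1][1]} in {span_ms:.3f} ms")
```

The point Chuck makes holds up: roughly twenty window-update ACKs are emitted in about a millisecond, with no intervening traffic from the remote side.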
Re: NFS Locking Issue
> with the bge driver ... could we be possibly talking internet vs nfs > issues? Pursuing investigations, I have discovered that for people whose workstations have home directories on an NFS server, and who run Gnome or KDE, there is a program with horrible NFS behavior: gam_server from gamin, which detects alterations to your .kde directory, for example. On my machine, running nfsstat -c -w 1, I see 4000 requests/s due to it. If I move it aside (*) and kill it, this drops to 80 requests/s and KDE works exactly as well, including discovering new files. I think it is not necessary to comment on the performance penalty: if a number of stations send 4000 requests/s to a server, it will soon be overwhelmed. (*) It restarts itself automatically, so it is necessary to move or rename it before killing it. -- Michel TALON
Re: Network Card
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/config-network-setup.html might help you. Regards, cian On 5 Jul 2006, at 18:54, Mihir Sanghavi wrote: Hi, Can someone please tell me how do I activate the network card in FreeBSD 5.5? Thanks. -- What we see depends mainly on what we look for. -MIHIR
Network Card
Hi, Can someone please tell me how do I activate the network card in FreeBSD 5.5? Thanks. -- What we see depends mainly on what we look for. -MIHIR
Re: fetch hangs on AMD64 RELENG_6
Justin T. Gibbs wrote: > Hi, > > I'm seeing fetch hang under AMD64/RELENG_6 when fetching data > from several different sites. An i386 machine sitting next to it > running current from a few weeks back is not showing this problem > when fetching the same files. The failing machine is a Dell 2850 > with an em0 device. We have a T-1 here, so transfer speeds are > usually well over 100KBps. fetch is stuck in sbwait. Restarting > fetch a few times will eventually allow the transfer to complete. > Anyone else seen this? Any hints on how I might help debug the > problem? > Are these fetches for ports installs, and if so are they from the gnu.org site(s)? I noticed a similar issue myself last night when doing some installs from ports, and they were all related to gnu.org FTP sites. Otherwise fetch was working just as expected. -Proto
Re: 5.5-stable network interface rl0 stops working
On Wed, Jul 05, 2006 at 06:40:58PM +0200, Hank Hampel wrote: > Hello everybody, > > I have a very disturbing problem with one of our FreeBSD 5.5-stable > machines. It is a box on which ~10 jail systems run, each with > small to moderate network traffic. > > Now from time to time - sometimes after a few days, sometimes after a > couple of weeks - the network interface rl0 (which is the main > interface on the machine, rl1 is for backups/internal use only) stops > working. Are they physically on the motherboard? Or on PCI cards? In the latter case try reseating the card in the slot. Try switching rl0 and rl1, and see if the problem persists. Also, swapping out the ethernet cable is worth trying. Another thing to check is if rl0 is sharing an interrupt with another device. That can cause problems. > Each jailed system has its own firewall ruleset, permitting only > traffic for the services in that specific jail. The packet filter used > is ipfw. Some of the rules are stateful (keep-state). > > When rl0 stops working ipfw logs lots of denied packets so that it > seems that the dynamic (keep-state) rules don't work any longer. We > checked and increased the buffers for the dynamic rules to no avail - > I doubt they are part of the problem. I'm not even sure ipfw is part > of the problem. Does the problem persist without ipfw? I've got an rl0 card on my workstation (6.1-STABLE, amd64, using PF without problems) > After the stop on the interface occurs there is no other way to get > the interface up and running again than rebooting the whole machine. > Restarting /etc/rc.d/netif, the jails or ipfw doesn't help at all. What does ifconfig say after the interface stops working? > The bad thing is I haven't found any way to trigger this problem so > that I can only check and change things and wait if the situation > improves or not. For example I've already set debug.mpsafenet="0" but > this doesn't help, in contrast it seems to worsen the problem a little > bit.
> Find attached the dmesg output of the machine. If any other > information is needed to hunt down the cause of this problem please > let me know. I checked various list archives but haven't found a clue > yet. Anything in the logs, except the denied packets? Roland -- R.F.Smith http://www.xs4all.nl/~rsmith/ [plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated] pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
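One of the suggestions above, checking whether rl0 shares an interrupt with another device, can be done by eye from `vmstat -i`, or mechanically. A sketch (the sample output and its exact column layout are hypothetical; real vmstat -i output varies by release):

```python
from collections import defaultdict

# Hypothetical `vmstat -i` output; on a real box, capture the
# actual output of `vmstat -i` instead.
VMSTAT_I = """\
irq1: atkbd0                 5          0
irq21: rl0             1532889        120
irq21: fxp0             884231         70
irq22: rl1                9112          1
"""

def shared_irqs(output):
    """Map each IRQ to its devices and return only IRQs that are
    claimed by more than one device."""
    devices = defaultdict(list)
    for line in output.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("irq"):
            devices[parts[0].rstrip(":")].append(parts[1])
    return {irq: devs for irq, devs in devices.items() if len(devs) > 1}

print(shared_irqs(VMSTAT_I))  # -> {'irq21': ['rl0', 'fxp0']}
```

A shared line is not automatically the problem, but on flaky hardware or with buggy drivers it is a cheap thing to rule out by moving one card to another slot.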
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Michel Talon wrote: So it may be relevant to say that I have kernels without IPV6 support. Recall that I have absolutely no problem with the client in FreeBSD-6.1. Tomorrow I will test one of the 6.1 machines as an NFS server and the other as a client, and will let you know if I see something. Well, I have checked between 2 FreeBSD-6.1-RELEASE machines on the network; both have the fxp ethernet driver running at 100 Mb/s, one is the NFS server, the other the NFS client. Both run lockd and statd. I have absolutely no problem exchanging files. For example, if I begin to copy /usr/src through NFS from one machine to the other, which makes a lot of transactions of all sorts, I get:
niobe# mount asmodee:/usr/src /mnt
niobe# cp -R /mnt/src .
... after some time I interrupt the transfer
niobe% du -sh .
131M    .
and during this time I observe the following type of statistics:
asmodee% netstat -w 1 -I fxp0
            input    (fxp0)           output
   packets  errs      bytes    packets  errs      bytes  colls
       542     0      84116       1330     0    1219388      0
       515     0      72806       1290     0    1196330      0
       501     0      95722       1081     0     741048      0
       539     0      90704       1090     0    1228052      0
       645     0      67888        902     0    1451098      0
       405     0      81264       1609     0     604278      0
       503     0      74218        709     0     924422      0
       500     0      98904        973     0     619350      0
       550     0     100122        855     0     836328      0
       615     0      79336       1081     0     862772      0
       577     0      82862        901     0    1005024      0
which looks decent to me. Doing the same with just one big file, no problem either, and I get a transfer speed of 6.60 MB/s, which is perhaps a little less than with Linux, but nothing catastrophic. I get 8.20 MB/s for a FreeBSD client interacting with the Linux server. Now netstat gives:
   packets  errs      bytes    packets  errs      bytes  colls
       785     0     123266       4716     0    6825600      0
       759     0     139898       4530     0    7747276      0
       852     0     124652       5106     0    6902566      0
       863     0     128040       5170     0    7081738      0
       811     0     123760       4862     0    6851498      0
       789     0     123540       4720     0    6834310      0
       840     0     115378       5024     0    6382114      0
So as far as I can see NFS works OK for me on FreeBSD-6.1. So the main difference from other people's cases may be that I have removed IPV6 support from the kernel. What are others using for ethernet?
In your case, you say you are running between fxp cards ... I've heard some reports, in another thread, of problems with the bge driver ... could we possibly be talking internet vs nfs issues? Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.org ICQ . 7615664
Re: fetch hangs on AMD64 RELENG_6
On Jul 5, 2006, at 1:42 PM, Justin T. Gibbs wrote: I'm seeing fetch hang under AMD64/RELENG_6 when fetching data from several different sites. An i386 machine sitting next to it running current from a few weeks back is not showing this problem when fetching the same files. [ ... ] Any hints on how I might help debug the problem? Using tcpdump to look at the traffic would be a useful starting point. :-) -- -Chuck
fetch hangs on AMD64 RELENG_6
Hi, I'm seeing fetch hang under AMD64/RELENG_6 when fetching data from several different sites. An i386 machine sitting next to it running current from a few weeks back is not showing this problem when fetching the same files. The failing machine is a Dell 2850 with an em0 device. We have a T-1 here, so transfer speeds are usually well over 100KBps. fetch is stuck in sbwait. Restarting fetch a few times will eventually allow the transfer to complete. Anyone else seen this? Any hints on how I might help debug the problem? Thanks, Justin
5.5-stable network interface rl0 stops working
Hello everybody, I have a very disturbing problem with one of our FreeBSD 5.5-stable machines. It is a box on which ~10 jail systems run, each with small to moderate network traffic. Now from time to time - sometimes after a few days, sometimes after a couple of weeks - the network interface rl0 (which is the main interface on the machine, rl1 is for backups/internal use only) stops working. Each jailed system has its own firewall ruleset, permitting only traffic for the services in that specific jail. The packet filter used is ipfw. Some of the rules are stateful (keep-state). When rl0 stops working ipfw logs lots of denied packets so that it seems that the dynamic (keep-state) rules don't work any longer. We checked and increased the buffers for the dynamic rules to no avail - I doubt they are part of the problem. I'm not even sure ipfw is part of the problem. After the stop on the interface occurs there is no other way to get the interface up and running again than rebooting the whole machine. Restarting /etc/rc.d/netif, the jails or ipfw doesn't help at all. The bad thing is I haven't found any way to trigger this problem, so I can only check and change things and wait to see if the situation improves or not. For example I've already set debug.mpsafenet="0" but this doesn't help; on the contrary, it seems to worsen the problem a little bit. Find attached the dmesg output of the machine. If any other information is needed to hunt down the cause of this problem please let me know. I checked various list archives but haven't found a clue yet. -[ dmesg ]- Copyright (c) 1992-2006 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 5.5-STABLE #5: Tue May 30 13:51:55 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SHAWSHANK WARNING: MPSAFE network stack disabled, expect reduced performance. 
Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2411.60-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0xf34 Stepping = 4 Features=0xbfebfbff real memory = 2147418112 (2047 MB) avail memory = 2096037888 (1998 MB) ACPI APIC Table: ioapic0 irqs 0-23 on motherboard npx0: on motherboard npx0: INT 16 interface acpi0: on motherboard acpi0: Power Button (fixed) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 cpu0: on acpi0 acpi_button0: on acpi0 pcib0: port 0x1000-0x10bf,0xcf8-0xcff on acpi0 pci0: on pcib0 agp0: mem 0xe800-0xefff at device 0.0 on pci0 pcib1: at device 1.0 on pci0 pci1: on pcib1 pcib2: at device 30.0 on pci0 pci2: on pcib2 pci2: at device 0.0 (no driver attached) rl0: port 0x9000-0x90ff mem 0xf500-0xf5ff irq 21 at device 1.0 on pci2 miibus0: on rl0 rlphy0: on miibus0 rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto rl0: Ethernet address: 00:02:2a:d5:39:74 rl1: port 0x9400-0x94ff mem 0xf5001000-0xf50010ff irq 22 at device 2.0 on pci2 miibus1: on rl1 rlphy1: on miibus1 rlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto rl1: Ethernet address: 00:02:2a:d5:39:53 isab0: at device 31.0 on pci0 isa0: on isab0 atapci0: port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0 ata0: channel #0 on atapci0 ata1: channel #1 on atapci0 pci0: at device 31.3 (no driver attached) acpi_tz0: on acpi0 sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A, console sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0 sio1: type 16550A pmtimer0 on isa0 orm0: at iomem 0xc-0xc7fff on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x100> vga0: at port 0x3c0-0x3df iomem 0xa-0xb on isa0 atkbdc0: at port 0x64,0x60 on isa0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 ppc0: parallel port not found. 
Timecounter "TSC" frequency 2411601876 Hz quality 800 Timecounters tick every 10.000 msec ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to deny, logging disabled ad0: 114497MB [232629/16/63] at ata0-master UDMA100 acd0: DVDROM at ata1-master PIO4 Mounting root from ufs:/dev/ad0s1a -[ dmesg ]- Best regards, Hank
Re: NFS Locking Issue
On Mon, 3 Jul 2006, Michael Collette wrote: - Let's start with the simplest. The scenario here involves 2 machines, mach01 and mach02. Both are running 6-STABLE, and both are running rpcbind, rpc.statd, and rpc.lockd. mach01 has exported /documents and mach02 is mounting that export under /mnt. Simple enough? The /documents directory has multiple subdirectories and files of various sizes. The actual amount of data doesn't really matter to produce a failure. All you need to do at this point is to try to copy files from that mount point to somewhere else on the hard drive.
cp -Rp /mnt/* /tmp/documents/
You may, or may not, see that a couple of subdirectories were created, but no files actually moved over. The cp command is now locked up, and no traffic moves. This usually takes a second or two to show up as a problem. I can repeat this with multiple 6-STABLE boxes. Turn off rpc.lockd on either the server or client before the cp command, and things work. I've tried several times to reproduce this, and have not succeeded in doing so. In principle, cp should not be using advisory locks. Could you try running cp under ktrace, and saving the ktrace file somewhere outside of NFS? Something like the following:
ktrace -f /usr/tmp/localfile cp -Rp /mnt/* /tmp/documents/
If you are able to reproduce the problem with tracing turned on, a copy of the tracefile would be very helpful. Also, when it locks up, are you able to kill cp using Ctrl-C, and if you hit Ctrl-T while it appears locked, what output do you get? Thanks, Robert N M Watson Computer Laboratory University of Cambridge
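When reproducing this kind of hang it helps to put a watchdog around the copy, so an automated test can report "wedged" instead of sitting in sbwait forever. A rough sketch of the idea (Python; the smoke test below uses throwaway local directories, whereas a real reproduction would point `src` at the NFS mount):

```python
import os
import shutil
import tempfile
import threading

def copy_with_watchdog(src, dst, timeout=30.0):
    """Run a recursive copy in a worker thread and wait up to `timeout`
    seconds. Returns True if the copy finished in time, False if it is
    likely hung (the worker thread itself cannot be killed, but the
    caller at least learns that the copy wedged)."""
    done = threading.Event()

    def worker():
        shutil.copytree(src, dst)
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    return done.wait(timeout)

# Local smoke test with throwaway directories:
src = tempfile.mkdtemp()
with open(os.path.join(src, "f"), "w") as f:
    f.write("data")
dst = os.path.join(tempfile.mkdtemp(), "copy")
ok = copy_with_watchdog(src, dst)
print("completed:", ok)
```

Combined with the ktrace suggestion above, a failed watchdog run leaves behind both a timestamped failure and a trace file to inspect.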
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Robert Watson wrote: On Wed, 5 Jul 2006, Danny Braniss wrote: In my case our main servers are NetApp, and the problems are more related to am-utils running into some race condition (need more time to debug this :-) the other problem is related to throughput: freebsd is slower than linux, and while nfs/tcp is faster than udp on FreeBSD, on linux they are the same. So it seems some tuning is needed. our main problem now is samba/rpc.lockd, we are stuck with a server running FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because lockd doesn't work. So, if someone is willing to look into the lockd issue, we would like to help. The most significant problem working with rpc.lockd is creating easy to reproduce test cases. Not least because they can potentially involve multiple clients. If you can help to produce simple test cases to reproduce the bugs you're seeing, that would be invaluable. I'm aware of two general classes of problems with rpc.lockd. First, architectural issues, some derived from architectural problems in the NLM protocol: for example, assumptions that there can be a clean mapping of process lock owners to locks, which fall down as locks are properties of file descriptors that can be inherited. Second, implementation bugs/misfeatures, such as the kernel not knowing how to cancel lock requests, so being unable to implement interruptible waits on locks in the distributed case. Reducing complex failure modes to easily reproduced test cases is tricky also, though. It requires careful analysis, often with ktrace and tcpdump/ethereal to work out what's going on, and not a little luck to perform the reduction of a large trace down to a simple test scenario. The first step is to try and figure out what, if any, specific workload results in a problem. For example, can you trigger it using work on just one client against a server, without client<->client interactions? 
This makes tracking and reproduction a lot easier, as multi-client test cases are really tricky! Once you've established whether it can be reproduced with a single client, you have to track down the behavior that triggers it -- normally, this is done by attempting to narrow down the specific program or sequence of events that causes the bug to trigger, removing things one at a time to see what causes the problem to disappear. This is made more difficult as lock managers are sensitive to timing, so removing a high load item from the list, even if it isn't the source of the problem, might cause it to trigger less frequently. I'm not sure if this is an option for anyone, either developer or user, but in the past, on particularly tricky bugs where I seemed to be the only one able to produce it, I've given a 'trusted developer' access to the machine itself, to minimize the time lag that emails create ... but, also, to let the developer at a machine that has the load required to easily reproduce it ... Not sure if there is anyone out there, on either side of the proverbial fence, that feels comfortable doing this, but figured I'd throw the idea out ... I believe, in Francisco's case, they are willing to pay someone to fix the NFS issues they are having, which, I'd assume, means easy access to the problematic server(s) to do proper testing in a "real life scenario" ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.org ICQ . 7615664
Re: NFS Locking Issue
> So it may be relevant to say that I have kernels without IPV6 support. > Recall that I have absolutely no problem with the client in FreeBSD-6.1. > Tomorrow I will test one of the 6.1 machines as an NFS server and the other as > a client, and will let you know if I see something. Well, I have checked between 2 FreeBSD-6.1-RELEASE machines on the network; both have the fxp ethernet driver running at 100 Mb/s, one is the NFS server, the other the NFS client. Both run lockd and statd. I have absolutely no problem exchanging files. For example, if I begin to copy /usr/src through NFS from one machine to the other, which makes a lot of transactions of all sorts, I get:
niobe# mount asmodee:/usr/src /mnt
niobe# cp -R /mnt/src .
... after some time I interrupt the transfer
niobe% du -sh .
131M    .
and during this time I observe the following type of statistics:
asmodee% netstat -w 1 -I fxp0
            input    (fxp0)           output
   packets  errs      bytes    packets  errs      bytes  colls
       542     0      84116       1330     0    1219388      0
       515     0      72806       1290     0    1196330      0
       501     0      95722       1081     0     741048      0
       539     0      90704       1090     0    1228052      0
       645     0      67888        902     0    1451098      0
       405     0      81264       1609     0     604278      0
       503     0      74218        709     0     924422      0
       500     0      98904        973     0     619350      0
       550     0     100122        855     0     836328      0
       615     0      79336       1081     0     862772      0
       577     0      82862        901     0    1005024      0
which looks decent to me. Doing the same with just one big file, no problem either, and I get a transfer speed of 6.60 MB/s, which is perhaps a little less than with Linux, but nothing catastrophic. I get 8.20 MB/s for a FreeBSD client interacting with the Linux server. Now netstat gives:
   packets  errs      bytes    packets  errs      bytes  colls
       785     0     123266       4716     0    6825600      0
       759     0     139898       4530     0    7747276      0
       852     0     124652       5106     0    6902566      0
       863     0     128040       5170     0    7081738      0
       811     0     123760       4862     0    6851498      0
       789     0     123540       4720     0    6834310      0
       840     0     115378       5024     0    6382114      0
So as far as I can see NFS works OK for me on FreeBSD-6.1. So the main difference from other people's cases may be that I have removed IPV6 support from the kernel. 
-- Michel TALON
Re: NFS Locking Issue
On Wed, Jul 05, 2006 at 02:04:59PM +0100, Robert Watson wrote: > > On Wed, 5 Jul 2006, Kostik Belousov wrote: > > >>Also, both lockd processes now put identification information in the >>proctitle (srv and kern). SIGUSR1 shall be sent to the srv process. > > > >Hmm, after looking at the dump there and some code reading, I have noted > >the following: > > > >1. The NLM lock request contains the field caller_name. It is filled in by (let's > >call it) the kernel rpc.lockd with the result of hostname(3). > > > >2. This caller_name is used by the server rpc.lockd to send a request for host > >monitoring to rpc.statd (see send_granted). The request is made by clnt_call, > >which is a blocking rpc call. > > > >3. rpc.statd does getaddrinfo on caller_name to determine the address of the > >host to monitor. > > > >If the getaddrinfo in step 3 waits for the resolver, then your client machine > >will get the locking process in the "lockd" state. > > > >Could people experiencing the rpc.lockd mystery at least report whether the > >_server_ machine successfully resolves the hostnames of clients as reported by > >hostname? And, if yes, to what family of IP protocols? > > It's not impossible. It would be interesting to see if ps axl reports that > rpc.lockd is in the kqread state, which would suggest it was blocked in the rpc.statd :). > resolver. We probably ought to review rpc.statd and make sure it's > generally sensible. I've noticed that its notification process on start is > a bit poorly structured in terms of how it notifies hosts of its state > change -- if one host is down, it may take a very long time to notify other > hosts.
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Kostik Belousov wrote: Also, both lockd processes now put identification information in the proctitle (srv and kern). SIGUSR1 shall be sent to the srv process. Hmm, after looking at the dump there and some code reading, I have noted the following:
1. The NLM lock request contains the field caller_name. It is filled in by (let's call it) the kernel rpc.lockd with the result of hostname(3).
2. This caller_name is used by the server rpc.lockd to send a request for host monitoring to rpc.statd (see send_granted). The request is made by clnt_call, which is a blocking rpc call.
3. rpc.statd does getaddrinfo on caller_name to determine the address of the host to monitor.
If the getaddrinfo in step 3 waits for the resolver, then your client machine will get the locking process in the "lockd" state. Could people experiencing the rpc.lockd mystery at least report whether the _server_ machine successfully resolves the hostnames of clients as reported by hostname? And, if yes, to what family of IP protocols?
It's not impossible. It would be interesting to see if ps axl reports that rpc.lockd is in the kqread state, which would suggest it was blocked in the resolver. We probably ought to review rpc.statd and make sure it's generally sensible. I've noticed that its notification process on start is a bit poorly structured in terms of how it notifies hosts of its state change -- if one host is down, it may take a very long time to notify other hosts. There are a number of other dubious things about the NLM protocol design (at least, from my reading last night). I've also noticed that our rpc.lockd is particularly sensitive, on the client side, to locks being released by a different process than the process that acquired the lock, which is triggered excessively by our new libpidfile in RELENG_6. Robert N M Watson Computer Laboratory University of Cambridge
Re: NFS Locking Issue
On Wed, Jul 05, 2006 at 02:38:22PM +0300, Kostik Belousov wrote: > On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote: > > The most significant problem working with rpc.lockd is creating easy to > > reproduce test cases. Not least because they can potentially involve > > multiple clients. If you can help to produce simple test cases to > > reproduce the bugs you're seeing, that would be invaluable. > > > > > > > Reducing complex failure modes to easily reproduced test cases is tricky > > also, though. It requires careful analysis, often with ktrace and > > tcpdump/ethereal to work out what's going on, and not a little luck to > > perform the reduction of a large trace down to a simple test scenario. The > > first step is to try and figure out what, if any, specific workload results > > in a problem. For example, can you trigger it using work on just one > > client against a server, without client<->client interactions? This makes > > tracking and reproduction a lot easier, as multi-client test cases are > > really tricky! Once you've established whether it can be reproduced with a > > single client, you have to track down the behavior that triggers it -- > > normally, this is done by attempting to narrow down the specific program or > > sequence of events that causes the bug to trigger, removing things one at a > > time to see what causes the problem to disappear. This is made more > > difficult as lock managers are sensitive to timing, so removing a high load > > item from the list, even if it isn't the source of the problem, might cause > > it to trigger less frequently. > > I made the patch for rpc.lockd that could somewhat ease obtaining > debug information. Patch is available at > http://people.freebsd.org/~kib/rpc.lockd-debug.patch > > No functional changes. Patch only adds dumping of currently held locks > (as perceived by lockd) on receiving of SIGUSR1. You need to specify > debug level 2 or 3 to obtain the dump. 
> > Also, both lockd processes now put identification information > in the proctitle (srv and kern). SIGUSR1 shall be sent to the srv process. Hmm, after looking at the dump there and some code reading, I have noted the following:
1. The NLM lock request contains the field caller_name. It is filled in by (let's call it) the kernel rpc.lockd with the result of hostname(3).
2. This caller_name is used by the server rpc.lockd to send a request for host monitoring to rpc.statd (see send_granted). The request is made by clnt_call, which is a blocking rpc call.
3. rpc.statd does getaddrinfo on caller_name to determine the address of the host to monitor.
If the getaddrinfo in step 3 waits for the resolver, then your client machine will get the locking process in the "lockd" state. Could people experiencing the rpc.lockd mystery at least report whether the _server_ machine successfully resolves the hostnames of clients as reported by hostname? And, if yes, to what family of IP protocols?
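Step 3 above is the suspicious one: getaddrinfo(3) can block for the resolver's full timeout if a client name does not resolve, and while rpc.statd sits in that call, lock traffic backs up behind it. The effect is easy to model; a sketch that runs a lookup with a deadline (this helper is illustrative, not how rpc.statd is actually structured):

```python
import socket
import threading

def resolve_with_deadline(host, deadline=2.0):
    """Run getaddrinfo in a worker thread; return its result, or None
    if the resolver is still blocked when the deadline expires
    (mimicking a statd/lockd request wedged behind slow or dead DNS)."""
    result = []

    def worker():
        try:
            result.append(socket.getaddrinfo(host, None))
        except socket.gaierror:
            result.append(None)  # definite "does not resolve"

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(deadline)
    return result[0] if result else None

# "localhost" resolves immediately; an unregistered name against a
# dead nameserver may still be blocked when the deadline passes and
# would come back as None here.
res = resolve_with_deadline("localhost")
print(res is not None)
```

Which is exactly why the question above matters: if the server cannot resolve the caller_name its clients report, the blocking lookup is on the critical path of every lock grant.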
Re: NFS Locking Issue
On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote: > The most significant problem working with rpc.lockd is creating easy to > reproduce test cases. Not least because they can potentially involve > multiple clients. If you can help to produce simple test cases to > reproduce the bugs you're seeing, that would be invaluable. > > > Reducing complex failure modes to easily reproduced test cases is tricky > also, though. It requires careful analysis, often with ktrace and > tcpdump/ethereal to work out what's going on, and not a little luck to > perform the reduction of a large trace down to a simple test scenario. The > first step is to try and figure out what, if any, specific workload results > in a problem. For example, can you trigger it using work on just one > client against a server, without client<->client interactions? This makes > tracking and reproduction a lot easier, as multi-client test cases are > really tricky! Once you've established whether it can be reproduced with a > single client, you have to track down the behavior that triggers it -- > normally, this is done by attempting to narrow down the specific program or > sequence of events that causes the bug to trigger, removing things one at a > time to see what causes the problem to disappear. This is made more > difficult as lock managers are sensitive to timing, so removing a high load > item from the list, even if it isn't the source of the problem, might cause > it to trigger less frequently. I made a patch for rpc.lockd that could somewhat ease obtaining debug information. The patch is available at http://people.freebsd.org/~kib/rpc.lockd-debug.patch No functional changes. The patch only adds dumping of currently held locks (as perceived by lockd) on receipt of SIGUSR1. You need to specify debug level 2 or 3 to obtain the dump. Also, both lockd processes now put identification information in the proctitle (srv and kern). SIGUSR1 shall be sent to the srv process. 
mountd changed?
something has changed wrt nmount(2)/mountd(8)/exports(5):

> cat /etc/exports
/h -alldirs -network 132.65.0.0 -mask 255.255.0.0

> cat /etc/fstab
/dev/da1s1d /h ufs rw 1 1

and all is fine, the filesystem is exported and accessible.

# /etc/rc.d/mountd reload
Reloading mountd config files.

but /var/log/messages shows:

mountd[473]: can't change attributes for /h
mountd[473]: bad exports list line /h -alldirs -network 132.65.0.0 -mask 255.255.0.0

btw, nothing has changed in the /etc/exports file.

2nd, the root (/) is NFS read-only, and now any attempt to mount is denied.

just in case:
kern.securelevel: -1

danny
___
freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
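As a data point while debugging, exports(5) also accepts prefix-length notation for -network; rewriting the line that way is worth trying in case the separate -mask parsing is what changed (an untested suggestion, same export as above written with a /16):

```
/h -alldirs -network 132.65.0.0/16
```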
Re: NFS Locking Issue
Quoting Michel Talon <[EMAIL PROTECTED]>:

> > So it would appear that you cured the NFS problems inherent with FBSD-6
> > by replacing FBSD with Fedora Linux. Nice to know that NFSd works in
> > Linux. But won't help those on the FBSD list fix their FBSD-6 boxen. :/
>
> First, NFS is designed to make machines of different OSes interact
> properly.

Yes, this is its purpose.

> If a FreeBSD server interacts properly with a FreeBSD client, but not
> other clients, you cannot say that the situation is fine.

Indeed.

> Second, I am not the one to choose the NFS server; there are people
> working in social groups, in the real world. And third, most importantly,
> the OP's message seemed to imply that the FreeBSD-6 NFS client was at
> fault. I pointed out that in my experience my FreeBSD-6.1 client works OK,
> while the 6.0 one doesn't, when interacting with a FC5 server. This is in
> itself a relevant piece of information for the problem at hand. It may be
> that the server side is at fault, or some complex interaction between
> client and server.

Of course. I quite agree. Horrible oversight on my part.

> Anyway, some people claimed here that they had no problem with FreeBSD-5
> clients and servers. My experience is that I had constant problems between
> FreeBSD-5 clients and Fedora Core 3 servers. I cannot provide any other
> data point. I am not particularly sure of the quality of the FC3 or FC5
> NFS server implementation, except that the ~100 workstations running the
> similar Fedora distribution work like a charm with their homes NFS mounted
> on the server. On the other hand, a Debian client machine also has severe
> NFS problems. My only conclusion is that these NFS stories are very
> tricky. The only time everything worked fine was when we were running
> Solaris on the server.
>
> --
> Michel TALON

Useful knowledge, to be sure. Sorry for my oversight. I should probably refrain from responding when I have too many other things percolating in my mind while at work. This has gotten me in trouble once before on this _same_ list. :)

Thank you for your thoughtful response.

--
panic: kernel trap (ignored) - FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Danny Braniss wrote:

> In my case our main servers are NetApp, and the problems are more related
> to am-utils running into some race condition (need more time to debug
> this :-). The other problem is related to throughput: FreeBSD is slower
> than Linux, and while NFS over TCP is faster than UDP on FreeBSD, on
> Linux they perform the same. So it seems some tuning is needed. Our main
> problem now is samba/rpc.lockd: we are stuck with a server running
> FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because lockd
> doesn't work. So, if someone is willing to look into the lockd issue, we
> would like to help.

The most significant problem working with rpc.lockd is creating easy-to-reproduce test cases, not least because they can potentially involve multiple clients. If you can help to produce simple test cases to reproduce the bugs you're seeing, that would be invaluable.

I'm aware of two general classes of problems with rpc.lockd. First, architectural issues, some derived from architectural problems in the NLM protocol: for example, assumptions that there can be a clean mapping of process lock owners to locks, which fall down because locks are properties of file descriptors that can be inherited. Second, implementation bugs/misfeatures, such as the kernel not knowing how to cancel lock requests, and so being unable to implement interruptible waits on locks in the distributed case.

Reducing complex failure modes to easily reproduced test cases is tricky, though. It requires careful analysis, often with ktrace and tcpdump/ethereal, to work out what's going on, and not a little luck to reduce a large trace down to a simple test scenario. The first step is to try to figure out what specific workload, if any, results in a problem. For example, can you trigger it using work on just one client against a server, without client<->client interactions? This makes tracking and reproduction a lot easier, as multi-client test cases are really tricky!
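A single-client starting point along these lines might look like the sketch below. This is an illustration, not a test case from the thread: flock(1) is used here as a stand-in for FreeBSD's lockf(1), and the temp file is a placeholder for a file on the NFS mount under test.

```shell
# Minimal single-client lock exercise. Point F at a file on the NFS
# mount when testing rpc.lockd; a local temp file is used here only so
# the sketch is self-contained.
F=$(mktemp)    # substitute e.g.: F=/path/on/nfs/locktest
flock "$F" -c 'echo "lock acquired"'
# If the lock manager misbehaves, a second locker may hang here
# instead of acquiring the now-released lock and running:
flock "$F" -c true && echo "lock released cleanly"
rm -f "$F"
```

From there, grow the scenario toward the failing workload one element at a time (more processes, more files, then a second client), watching with ktrace and tcpdump as described above.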
Once you've established whether it can be reproduced with a single client, you have to track down the behavior that triggers it. Normally this is done by attempting to narrow down the specific program or sequence of events that causes the bug to trigger, removing things one at a time to see what makes the problem disappear. This is made more difficult by the fact that lock managers are sensitive to timing, so removing a high-load item from the list, even if it isn't the source of the problem, might cause the bug to trigger less frequently.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: NFS Locking Issue
Mornin'

On Tue, Jul 04, 2006 at 09:47:21PM +0100, Robert Watson wrote:
> BTW, I noticed yesterday that that IPv6 support commit to rpc.lockd was
> never backed out. An immediate question for people experiencing new
> rpc.lockd problems with 6.x should be whether or not backing out that
> change helps.

That could be a good pointer. I also started experiencing some problems at home (I did not investigate further, but started using local locking and all was fine), while in our production setup, where lots of machines are running, many of them on a 6-STABLE of not too long ago, I never experienced any problems with NFS. The main difference between these two networks is that at home I have an IPv6 environment, while at work it's IPv4 only.

I can barely find time before the weekend to do tests, but if I don't read any postings saying that this made a difference, I will then start testing at home.

Thanx, Oliver

--
| Oliver Brandmueller | Offenbacher Str. 1 | Germany D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: http://the.addict.de/ |
| I am the Internet. So help me God. |
| Commercial use of any addresses contained herein is not permitted! |
novell mount losing state
Hello,

I'm using FreeBSD 6.1-STABLE and tried to mount a Novell volume (mount_nwfs). Mounting the volume works without problems, but after some time of inactivity on that mount I have to remount the volume to get access again.

Syslog message:

Jul 5 08:51:08 pcmcb3-104 kernel: ncprq: Restoring connection, flags = 101

Output of "ncplist c" for the working mount (yesterday evening):

Active NCP connections:
refid server:user(connid), owner:group(mode), refs,
7 SERVER:USER(483), root:wheel(755), 1,

Output of "ncplist c" for the non-working mount (this morning):

Active NCP connections:
refid server:user(connid), owner:group(mode), refs,
7 SERVER:USER(397), root:wheel(755), 1, <>

If I use a cron job to access the mount periodically, there is no such problem!

Any hints? If this is the wrong list, please let me know. If you need more info, you're welcome.

Thanks in advance,
Maik
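Until the idle timeout itself is understood, the cron workaround mentioned above can be made explicit. A crontab(5) entry like the following (an untested sketch: the /mnt/novell mountpoint and the five-minute interval are placeholders to adjust) touches the mount regularly so the NCP connection never goes idle:

```
*/5 * * * * /bin/ls /mnt/novell > /dev/null 2>&1
```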