Re: ntpd hanging machine
Peter Dufault wrote: > It's probably almost safe to run a program > rtprio or idprio if all you do is compute during that time and > go back to time sharing before doing anything else, but be sure > you're paged in, don't handle signals that way, etc... I ran rc5des idprio'ed for well over a year on 2.2.8, 3.2+ and 4.0 systems and never had a problem. Sounds like it fell just into your parameters here though. Perhaps a stern warning in the man page(s) is in order here? I had no idea I was making land mines for myself here. :) Doug -- "Welcome to the desert of the real." - Laurence Fishburne as Morpheus, "The Matrix" To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: ntpd hanging machine
> On Thu, 2 Mar 2000, Matthew Dillon wrote: > > > :> > merge. Until that point I was using the stock ntp4 from udel with no > > :> > problems. But I tried the one shipping with 4.0 and it locks up completely > > > rtprio (and idprio) is virtually guaranteed to lock up your machine > > eventually. Don't use either. > > Unfortunately, ntpd in -current uses rtprio by default. > > Peter Dufault's recent changes fixed some related things, but not the > priority inversion problems. My guess is that the new ntpd now does something while it is rtprio'd that it didn't use to. As far as I can tell, the rtprio does nothing except maybe resume ntpd in preference to other readied kernel processes. The rtprio should be taken out of ntpd; it also (as I've talked over with Bruce) screws up the kernel priorities. This illustrates the danger of using it at all. With no way to bail out if you get upside down, any mods to the program are dangerous. It's probably almost safe to run a program rtprio or idprio if all you do is compute during that time and go back to time sharing before doing anything else, but be sure you're paged in, don't handle signals that way, etc... My "solution" would be to make rtprio lower priority than regular time sharing and, if it needs to use any time-sharing resources, fault it or temporarily boost it up to time sharing. Peter -- Peter Dufault ([EMAIL PROTECTED]) Realtime development, Machine control, HD Associates, Inc. Fail-Safe systems, Agency approval
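Peter's proposed fix can be tried out in a toy model. One reading of it is assumed here: a process whose class sits below time sharing (where the proposal would put rtprio, and where idprio already is) competes at time-sharing priority for as long as it holds a shared resource. This is a hypothetical discrete-time, single-CPU simulation, not FreeBSD's scheduler; the numeric classes, the names `Task`, `simulate`, `effective_prio` and `idprio_holder`, the `boost` flag, and the single lock `"buf"` standing in for a time-sharing resource are all illustrative assumptions. The boost rule is similar in spirit to priority-ceiling schemes.

```python
from dataclasses import dataclass
from typing import Optional

# Toy scheduling classes (assumption); a larger number means "more urgent".
IDLE, TIMESHARE = 0, 1


@dataclass
class Task:
    name: str
    prio: int
    steps: list                 # ("io", n) | ("acquire", lock) | ("release", lock) | ("spin",)
    pc: int = 0                 # index of the next step to execute
    io_until: int = -1          # tick at which an outstanding I/O completes
    last_run: int = -1          # for round-robin among equal priorities
    done_at: Optional[int] = None


def effective_prio(task, owner, boost):
    # Hypothetical boost rule: a low-class task holding a shared resource
    # temporarily competes at time-sharing priority.
    if boost and task.name in owner.values():
        return max(task.prio, TIMESHARE)
    return task.prio


def runnable(task, t, owner):
    if task.done_at is not None or t < task.io_until:
        return False
    op = task.steps[task.pc]
    if op[0] == "acquire":          # blocked while someone else holds the lock
        return owner.get(op[1]) in (None, task.name)
    return True


def simulate(tasks, boost, ticks=50):
    owner = {}                      # lock name -> name of the holding task
    for t in range(ticks):
        ready = [k for k in tasks if runnable(k, t, owner)]
        if not ready:
            continue
        # Strict priority; the least recently run task wins ties.
        cur = max(ready, key=lambda k: (effective_prio(k, owner, boost), -k.last_run))
        cur.last_run = t
        op = cur.steps[cur.pc]
        if op[0] == "spin":
            continue                # CPU-bound loop that never advances
        if op[0] == "io":
            cur.io_until = t + op[1]    # sleep on a simulated disk read
        elif op[0] == "acquire":
            owner[op[1]] = cur.name
        elif op[0] == "release":
            del owner[op[1]]
        cur.pc += 1
        if cur.pc == len(cur.steps):
            cur.done_at = t
    return {k.name: k.done_at for k in tasks}   # None = never finished


def idprio_holder(boost):
    return simulate([
        # idprio task: takes the resource, then blocks on a disk read holding it.
        Task("holder", IDLE, [("acquire", "buf"), ("io", 2), ("release", "buf")]),
        # Normal task that sleeps briefly, then spins forever.
        Task("hog", TIMESHARE, [("io", 5), ("spin",)]),
        # Normal task that later needs the same resource.
        Task("victim", TIMESHARE, [("io", 6), ("acquire", "buf"), ("release", "buf")]),
    ], boost)


if __name__ == "__main__":
    print("no boost:  ", idprio_holder(boost=False))
    print("with boost:", idprio_holder(boost=True))
```

Without the boost, neither the holder nor the victim finishes within the 50 simulated ticks; with it, the holder gets the CPU back despite the spinning hog, releases `"buf"`, and the victim then completes. The rule is deliberately crude (any held lock lifts the task to exactly TIMESHARE), so it only models the idea, not a kernel design.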
Re: ntpd hanging machine
: :On Thu, Mar 02, 2000 at 02:55:16PM -0800, a little birdie told me :that Matthew Dillon remarked :> :> rtprio (and idprio) is virtually guaranteed to lock up your machine :> eventually. Don't use either. : :Hm. :I've run ntpd rtprio'd to 52 for over a year, under -CURRENT and :RELENG_2_2. Never had a freeze/crash I could attribute to it. : :-- :Matthew Fuller (MF4839) |[EMAIL PROTECTED] idprio is a bigger problem than rtprio, but both suffer from priority inversion issues. If an idprio process blocks on something (like a disk read) and a normal process gets into a cpu-bound loop, the idprio process never gets cpu again, even after the read completes, and the result is that all the resources locked by the idprio process while it was blocked stay locked. This can lock up the kernel. An rtprio process, being higher priority, has a similar issue but in a reverse sense. If a normal process blocks on something like a disk read and an rtprio process gets stuck in a cpu-bound loop, another rtprio process trying to do something that requires the resources locked by the non-rtprio process will block indefinitely. idprio-based priority inversion problems are a bigger issue since it is far more likely for a normal process than for an rtprio process to get stuck in a cpu-bound loop. I run my ntpds normally. I don't use rtprio or idprio. It works just fine, even on systems with heavy loads. If you are worried, you can run it at nice -20 (still as a normal process). It works just fine simply because it does not use very much cpu, so when it *does* need the cpu the scheduler gives it cpu. The ntp protocol takes into account laggy networks and laggy response times; all you lose is a few milliseconds in accuracy (big whoopteedo). -Matt Matthew Dillon <[EMAIL PROTECTED]>
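Matt's two inversion scenarios can be reproduced in a toy model. This is a hypothetical discrete-time, single-CPU simulation, not FreeBSD's scheduler: the numeric classes (IDLE, TIMESHARE, REALTIME, larger meaning more urgent), the names `Task`, `simulate` and `scenario`, and the single lock `"buf"` standing in for "resources locked by the process" are all assumptions for illustration. A "holder" takes the lock and then sleeps on a simulated disk read, a "hog" wakes up and spins, and a "victim" later needs the same lock. Scheduling is strict priority, with round-robin among tasks of equal priority.

```python
from dataclasses import dataclass
from typing import Optional

# Toy scheduling classes (assumption); a larger number means "more urgent".
IDLE, TIMESHARE, REALTIME = 0, 1, 2


@dataclass
class Task:
    name: str
    prio: int
    steps: list                 # ("io", n) | ("acquire", lock) | ("release", lock) | ("spin",)
    pc: int = 0                 # index of the next step to execute
    io_until: int = -1          # tick at which an outstanding I/O completes
    last_run: int = -1          # for round-robin among equal priorities
    done_at: Optional[int] = None


def runnable(task, t, owner):
    if task.done_at is not None or t < task.io_until:
        return False
    op = task.steps[task.pc]
    if op[0] == "acquire":          # blocked while someone else holds the lock
        return owner.get(op[1]) in (None, task.name)
    return True


def simulate(tasks, ticks=50):
    owner = {}                      # lock name -> name of the holding task
    for t in range(ticks):
        ready = [k for k in tasks if runnable(k, t, owner)]
        if not ready:
            continue
        # Strict priority; the least recently run task wins ties.
        cur = max(ready, key=lambda k: (k.prio, -k.last_run))
        cur.last_run = t
        op = cur.steps[cur.pc]
        if op[0] == "spin":
            continue                # CPU-bound loop that never advances
        if op[0] == "io":
            cur.io_until = t + op[1]    # sleep on a simulated disk read
        elif op[0] == "acquire":
            owner[op[1]] = cur.name
        elif op[0] == "release":
            del owner[op[1]]
        cur.pc += 1
        if cur.pc == len(cur.steps):
            cur.done_at = t
    return {k.name: k.done_at for k in tasks}   # None = never finished


def scenario(holder_prio, hog_prio, victim_prio):
    return simulate([
        # Takes the resource, then blocks on a disk read while holding it.
        Task("holder", holder_prio, [("acquire", "buf"), ("io", 2), ("release", "buf")]),
        # Sleeps briefly, then gets stuck in a CPU-bound loop.
        Task("hog", hog_prio, [("io", 5), ("spin",)]),
        # Wakes up later and needs the same resource.
        Task("victim", victim_prio, [("io", 6), ("acquire", "buf"), ("release", "buf")]),
    ])


if __name__ == "__main__":
    print("idprio holder, normal hog:", scenario(IDLE, TIMESHARE, TIMESHARE))
    print("normal holder, rtprio hog:", scenario(TIMESHARE, REALTIME, REALTIME))
    print("everything normal:        ", scenario(TIMESHARE, TIMESHARE, TIMESHARE))
```

In the first two runs (an idprio holder under a spinning normal process, and a normal holder under a spinning rtprio process with an rtprio victim) neither the holder nor the victim ever finishes within the simulated window. With all three at time-sharing priority, round-robin eventually lets the holder release the lock and the victim completes. The model leaves out everything a real kernel adds, such as sleep priorities and interrupts, so it shows only the shape of the inversion, not how likely it is on a real system.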
Re: ntpd hanging machine
On Thu, 2 Mar 2000, Matthew Dillon wrote: > :> > merge. Until that point I was using the stock ntp4 from udel with no > :> > problems. But I tried the one shipping with 4.0 and it locks up completely > rtprio (and idprio) is virtually guaranteed to lock up your machine > eventually. Don't use either. Unfortunately, ntpd in -current uses rtprio by default. Peter Dufault's recent changes fixed some related things, but not the priority inversion problems. Bruce
Re: ntpd hanging machine
On Thu, Mar 02, 2000 at 02:55:16PM -0800, a little birdie told me that Matthew Dillon remarked > > rtprio (and idprio) is virtually guaranteed to lock up your machine > eventually. Don't use either. Hm. I've run ntpd rtprio'd to 52 for over a year, under -CURRENT and RELENG_2_2. Never had a freeze/crash I could attribute to it. -- Matthew Fuller (MF4839) |[EMAIL PROTECTED] Unix Systems Administrator |[EMAIL PROTECTED] Specializing in FreeBSD |http://www.over-yonder.net/ "The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet"
Re: ntpd hanging machine
:> > merge. Until that point I was using the stock ntp4 from udel with no :> > problems. But I tried the one shipping with 4.0 and it locks up completely :> > (looks like a hardware lockup). The ntp4 from udel works completely :> > though. Odd :) :> :> Yes, that's odd. I've never seen that... Can you try to compile a kernel with :> DDB and try to see whether it hits the debugger or not? :> : :Already tried, no go. It's a complete lockup. However, I just updated to :4.0.99g from the udel ftp, and I am seeing the same lockups. Someone :suggested trying it w/o the rtprio calls, which I am rebuilding for as I :write this. rtprio (and idprio) is virtually guaranteed to lock up your machine eventually. Don't use either. -Matt Matthew Dillon <[EMAIL PROTECTED]>
Re: ntpd hanging machine
On Wed, 1 Mar 2000, Ollivier Robert wrote: > According to Wes Morgan: > > merge. Until that point I was using the stock ntp4 from udel with no > > problems. But I tried the one shipping with 4.0 and it locks up completely > > (looks like a hardware lockup). The ntp4 from udel works completely > > though. Odd :) > > Yes, that's odd. I've never seen that... Can you try to compile a kernel with > DDB and try to see whether it hits the debugger or not? > Already tried, no go. It's a complete lockup. However, I just updated to 4.0.99g from the udel ftp, and I am seeing the same lockups. Someone suggested trying it w/o the rtprio calls, which I am rebuilding for as I write this. -- Wesley N Morgan [EMAIL PROTECTED] FreeBSD: The Power To Serve Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!
Re: ntpd hanging machine
According to Wes Morgan: > merge. Until that point I was using the stock ntp4 from udel with no > problems. But I tried the one shipping with 4.0 and it locks up completely > (looks like a hardware lockup). The ntp4 from udel works completely > though. Odd :) Yes, that's odd. I've never seen that... Can you try to compile a kernel with DDB and try to see whether it hits the debugger or not? -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- [EMAIL PROTECTED] FreeBSD keltia.freenix.fr 4.0-CURRENT #77: Thu Dec 30 12:49:51 CET 1999
Re: ntpd hanging machine
> On Mon, 28 Feb 2000, Matthew Frost wrote: > > > I'm experiencing some problems with ntpd. It would appear that a few > > (10-15) minutes after I start it, the machine crashes completely... > > > > Feb 28 14:10:02 egrorian ntpd[153]: ntpd 4.0.99b Mon Feb 28 12:12:17 GMT 2000 (1) > > Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2040 > > Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2041 > > > > And then locked solid.. > > I've been seeing the exact same behavior ever since the big ntp4 > merge. Until that point I was using the stock ntp4 from udel with no > problems. But I tried the one shipping with 4.0 and it locks up completely > (looks like a hardware lockup). The ntp4 from udel works completely > though. Odd :) Try disabling the code where it uses rtprio to set itself realtime, and see if that "fixes" it. Peter -- Peter Dufault ([EMAIL PROTECTED]) Realtime development, Machine control, HD Associates, Inc. Fail-Safe systems, Agency approval
Re: ntpd hanging machine
On Mon, 28 Feb 2000, Matthew Frost wrote: > I'm experiencing some problems with ntpd. It would appear that a few > (10-15) minutes after I start it, the machine crashes completely... > > Feb 28 14:10:02 egrorian ntpd[153]: ntpd 4.0.99b Mon Feb 28 12:12:17 GMT 2000 (1) > Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2040 > Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2041 > > And then locked solid.. I've been seeing the exact same behavior ever since the big ntp4 merge. Until that point I was using the stock ntp4 from udel with no problems. But I tried the one shipping with 4.0 and it locks up completely (looks like a hardware lockup). The ntp4 from udel works completely though. Odd :) -- Wesley N Morgan [EMAIL PROTECTED] FreeBSD: The Power To Serve Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!
ntpd hanging machine
I'm experiencing some problems with ntpd. It would appear that a few (10-15) minutes after I start it, the machine crashes completely... Feb 28 14:10:02 egrorian ntpd[153]: ntpd 4.0.99b Mon Feb 28 12:12:17 GMT 2000 (1) Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2040 Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2041 And then locked solid.. It's -CURRENT as of this morning (28/2/2000) but it's happened before now... Here's the syslogged dmesg (I'll be able to provide any other information anyone wants once I can physically get to the machine to reset it later) I'll note it's a Soyo motherboard SY5SSM. Any help appreciated. Feb 28 14:09:35 egrorian /kernel: Copyright (c) 1992-2000 The FreeBSD Project. Feb 28 14:09:35 egrorian /kernel: Copyright (c) 1982, 1986, 1989, 1991, 1993 Feb 28 14:09:35 egrorian /kernel: The Regents of the University of California. All rights reserved. Feb 28 14:09:35 egrorian /kernel: FreeBSD 4.0-CURRENT #0: Mon Feb 28 13:40:01 GMT 2000 Feb 28 14:09:35 egrorian /kernel: [EMAIL PROTECTED]:/usr/src/sys/compile/NEWBOX Feb 28 14:09:35 egrorian /kernel: Timecounter "i8254" frequency 1193182 Hz Feb 28 14:09:35 egrorian /kernel: CPU: AMD-K6(tm) 3D processor (400.91-MHz 586-class CPU) Feb 28 14:09:35 egrorian /kernel: Origin = "AuthenticAMD" Id = 0x58c Stepping = 12 Feb 28 14:09:35 egrorian /kernel: Features=0x8021bf Feb 28 14:09:35 egrorian /kernel: AMD Features=0x8800 Feb 28 14:09:35 egrorian /kernel: real memory = 65011712 (63488K bytes) Feb 28 14:09:35 egrorian /kernel: avail memory = 60018688 (58612K bytes) Feb 28 14:09:35 egrorian /kernel: Preloaded elf kernel "kernel" at 0xc02bd000. Feb 28 14:09:35 egrorian /kernel: Preloaded userconfig_script "/boot/kernel.conf" at 0xc02bd09c. 
Feb 28 14:09:35 egrorian /kernel: md0: Malloc disk Feb 28 14:09:35 egrorian /kernel: npx0: on motherboard Feb 28 14:09:35 egrorian /kernel: npx0: INT 16 interface Feb 28 14:09:35 egrorian /kernel: pcib0: on motherboard Feb 28 14:09:35 egrorian /kernel: pci0: on pcib0 Feb 28 14:09:35 egrorian /kernel: atapci0: port 0x4000-0x400f,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 irq 14 at device 0.1 on pci0 Feb 28 14:09:35 egrorian /kernel: ata0: at 0x1f0 irq 14 on atapci0 Feb 28 14:09:35 egrorian /kernel: isab0: at device 1.0 on pci0 Feb 28 14:09:35 egrorian /kernel: isa0: on isab0 Feb 28 14:09:35 egrorian /kernel: pci0: (vendor=0x1039, dev=0x0009) at 1.1 Feb 28 14:09:35 egrorian /kernel: ohci0: mem 0xdc90-0xdc900fff irq 12 at device 1.2 on pci0 Feb 28 14:09:35 egrorian /kernel: usb0: OHCI version 1.0, legacy support Feb 28 14:09:35 egrorian /kernel: usb0: SMM does not respond, resetting Feb 28 14:09:35 egrorian /kernel: usb0: on ohci0 Feb 28 14:09:35 egrorian /kernel: usb0: USB revision 1.0 Feb 28 14:09:35 egrorian /kernel: uhub0: (unknown) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 Feb 28 14:09:35 egrorian /kernel: uhub0: 2 ports with 2 removable, self powered Feb 28 14:09:35 egrorian /kernel: pcib2: at device 2.0 on pci0 Feb 28 14:09:35 egrorian /kernel: pci1: on pcib2 Feb 28 14:09:35 egrorian /kernel: pci1: at 0.0 irq 11 Feb 28 14:09:35 egrorian /kernel: ed0: port 0xd000-0xd01f irq 10 at device 12.0 on pci0 Feb 28 14:09:35 egrorian /kernel: ed0: address 00:c0:f0:45:07:03, type NE2000 (16 bit) Feb 28 14:09:35 egrorian /kernel: pci0: (vendor=0x125d, dev=0x1969) at 13.0 irq 10 Feb 28 14:09:35 egrorian /kernel: pcib1: on motherboard Feb 28 14:09:35 egrorian /kernel: pci2: on pcib1 Feb 28 14:09:35 egrorian /kernel: fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 Feb 28 14:09:35 egrorian /kernel: fdc0: FIFO enabled, 8 bytes threshold Feb 28 14:09:35 egrorian /kernel: fd0: <1440-KB 3.5" drive> on fdc0 drive 0 Feb 28 14:09:35 egrorian /kernel: atkbdc0: at port 0x60-0x6f on isa0 Feb 28 14:09:35 egrorian /kernel: atkbd0: irq 1 on atkbdc0 Feb 28 14:09:35 egrorian /kernel: vga0: at port 0x3c0-0x3df iomem 0xa-0xb on isa0 Feb 28 14:09:35 egrorian /kernel: sc0: on isa0 Feb 28 14:09:35 egrorian /kernel: sc0: VGA <16 virtual consoles, flags=0x200> Feb 28 14:09:35 egrorian /kernel: sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0 Feb 28 14:09:35 egrorian /kernel: sio0: type 16550A Feb 28 14:09:35 egrorian /kernel: sio1 at port 0x2f8-0x2ff irq 3 on isa0 Feb 28 14:09:35 egrorian /kernel: sio1: type 16550A Feb 28 14:09:35 egrorian /kernel: ppc0: at port 0x378-0x37f irq 7 flags 0x40 on isa0 Feb 28 14:09:35 egrorian /kernel: ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode Feb 28 14:09:35 egrorian /kernel: ppc0: FIFO with 16/16/16 bytes threshold Feb 28 14:09:35 egrorian /kernel: ppi0: on ppbus0 Feb 28 14:09:35 egrorian /kernel: lpt0: on ppbus0 Feb 28 14:09:35 egrorian /kernel: lpt0: Interrupt-driven port Feb 28 14:09:35 egrorian /kernel: plip0: on ppbus0 Feb 28 14:09:35 egrorian /kernel: pca0 at port 0x40 on isa0 Feb 28 14:09:35 egrorian /kernel: unknown: can't assign resources Feb 28 14:09:35 eg