Re: ntpd hanging machine

2000-03-02 Thread Doug Barton

Peter Dufault wrote:

> It's probably almost safe to run a program
> rtprio or idprio if all you do is compute during that time and
> go back to time sharing before doing anything else, but be sure
> you're paged in, don't handle signals that way, etc...

I ran rc5des idprio'ed for well over a year on 2.2.8, 3.2+ and 4.0
systems and never had a problem. Sounds like it fell just into your
parameters here though. Perhaps a stern warning in the man page(s) is in
order here? I had no idea I was making land mines for myself here. :)

Doug
-- 
"Welcome to the desert of the real." 

- Laurence Fishburne as Morpheus, "The Matrix"


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: ntpd hanging machine

2000-03-02 Thread Peter Dufault

> On Thu, 2 Mar 2000, Matthew Dillon wrote:
> 
> > :> > merge. Until that point I was using the stock ntp4 from udel with no
> > :> > problems. But I tried the one shipping with 4.0 and it locks up completely
> 
> > rtprio (and idprio) is virtually guaranteed to lock up your machine
> > eventually.  Don't use either.
> 
> Unfortunately, ntpd in -current uses rtprio by default.
> 
> Peter Dufault's recent changes fixed some related things, but not the
> priority inversion problems.

My guess is that the new ntpd now does something while it is rtprio'd
that it didn't do before.  As far as I can tell the rtprio does nothing
except maybe resume ntpd in preference to other readied kernel 
processes.  The rtprio should be taken out of ntpd, it also
(as I've talked over with Bruce) screws up the kernel priorities.

This illustrates the danger of using it at all.
With no way to bail out if you get upside down, any mods to the
program are dangerous.  It's probably almost safe to run a program
rtprio or idprio if all you do is compute during that time and
go back to time sharing before doing anything else, but be sure
you're paged in, don't handle signals that way, etc...

My "solution" would be to make rtprio lower priority
than regular time sharing and if it needs to use any time sharing
resources fault it or temporarily boost it up to time sharing.
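The proposal above is only a sentence, so the following is one possible reading of it, not Peter's design: an rtprio task is ranked below time sharing, and when it asks for a time-sharing resource the kernel either refuses ("faults") the request or boosts the task to the time-sharing band until it releases everything it holds. All names (band, policy, acquire_ts_resource, and so on) are hypothetical.

```c
#include <errno.h>

/* Hypothetical scheduling bands: a lower value is scheduled first. */
enum band { BAND_TIMESHARE = 0, BAND_RTPRIO = 1 };

/* What to do when an rtprio task asks for a time-sharing resource. */
enum policy { POLICY_FAULT, POLICY_BOOST };

struct task {
    int is_rtprio;   /* nonzero if the task was started under rtprio */
    int boosted;     /* nonzero while temporarily running as time sharing */
    int held;        /* number of time-sharing resources currently held */
};

/* rtprio tasks rank below time sharing unless currently boosted. */
int effective_band(const struct task *t)
{
    return (t->is_rtprio && !t->boosted) ? BAND_RTPRIO : BAND_TIMESHARE;
}

/* Acquire a time-sharing resource under the chosen policy.
   Returns 0 on success, or EPERM if the request is refused. */
int acquire_ts_resource(struct task *t, enum policy p)
{
    if (t->is_rtprio && !t->boosted) {
        if (p == POLICY_FAULT)
            return EPERM;        /* refuse rather than risk an inversion */
        t->boosted = 1;          /* compete as time sharing while holding it */
    }
    t->held++;
    return 0;
}

/* Release one resource; drop the boost once nothing is held. */
void release_ts_resource(struct task *t)
{
    if (t->held > 0 && --t->held == 0)
        t->boosted = 0;
}
```

Under this reading a boosted task can still be delayed by other time-sharing work, but it can no longer be starved indefinitely while it holds something others need.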

Peter

--
Peter Dufault ([EMAIL PROTECTED])   Realtime development, Machine control,
HD Associates, Inc.   Fail-Safe systems, Agency approval





Re: ntpd hanging machine

2000-03-02 Thread Matthew Dillon


:
:On Thu, Mar 02, 2000 at 02:55:16PM -0800, a little birdie told me
:that Matthew Dillon remarked
:> 
:> rtprio (and idprio) is virtually guaranteed to lock up your machine
:> eventually.  Don't use either.
:
:Hm.
:I've run ntpd rtprio'd to 52 for over a year, under -CURRENT and
:RELENG_2_2.  Never had a freeze/crash I could attribute to it.
:
:-- 
:Matthew Fuller (MF4839) |[EMAIL PROTECTED]

idprio is a bigger problem than rtprio, but both suffer from
priority inversion issues.  If an idprio process blocks on something
(like a disk read) and a normal process gets into a cpu-bound loop,
the idprio process never gets the cpu again, so any resources it locked
before blocking stay locked.  This can lock up the kernel.

An rtprio process, being higher priority, has a similar issue but in
a reverse sense.  If a normal process blocks on something like a disk
read and an rtprio process gets stuck in a cpu-bound loop, another rtprio
process trying to do something that requires the resources locked by the
non-rtprio process will block indefinitely.

idprio based priority inversion problems are a bigger issue since it
is far more likely for a normal process to get stuck in a cpu-bound
loop.
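Both failure modes can be reproduced in a toy model, assuming a strict-priority scheduler that always runs the best ready task and never ages priorities. This is not FreeBSD's actual scheduler; the struct, the PRI_* levels, and simulate() are hypothetical. A lock holder becomes ready when its disk read completes and then needs a tick of cpu to release the resource; a spinner at a better priority never yields.

```c
/* Toy priority levels: a lower value is scheduled first. */
enum { PRI_RT = 0, PRI_NORMAL = 1, PRI_IDLE = 2 };

struct task {
    int prio;        /* PRI_RT, PRI_NORMAL or PRI_IDLE */
    int ready_at;    /* tick at which its blocking disk read completes */
    int spins;       /* 1 if stuck in a cpu-bound loop that never yields */
    int holds_lock;  /* 1 if it holds the shared resource */
    int work_left;   /* cpu ticks needed before releasing it (>= 1) */
};

/*
 * Strict-priority scheduler: every tick the best ready task gets the cpu.
 * Returns the tick at which the shared resource is released, or -1 if it
 * is still held after max_ticks (any task waiting on it is hung).
 */
int simulate(struct task *t, int n, int max_ticks)
{
    for (int tick = 0; tick < max_ticks; tick++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (t[i].ready_at <= tick && (best < 0 || t[i].prio < t[best].prio))
                best = i;
        if (best < 0 || t[best].spins)
            continue;              /* idle tick, or the spinner burns it */
        if (t[best].holds_lock && --t[best].work_left == 0) {
            t[best].holds_lock = 0;
            return tick;
        }
    }
    return -1;
}
```

With an idprio holder `{PRI_IDLE, 5, 0, 1, 1}` alone, the resource is released at tick 5; add a normal spinner `{PRI_NORMAL, 0, 1, 0, 0}` and simulate() returns -1. The rtprio case is the same shape one level up: a normal holder plus an rtprio spinner also returns -1, which is where a second rtprio process waiting on that resource would hang.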

I run my ntpd's normally.  I don't use rtprio or idprio.  It works just
fine, even on systems with heavy loads.  If you are worried you can run
it at nice -20 (still as a normal process).  The reason it will work just
fine is simply because it does not use very much cpu so when it *does*
need the cpu the scheduler gives it cpu.  The ntp protocol takes into
account laggy networks and laggy response times; all you lose is a few
milliseconds in accuracy (big whoopteedo).

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





Re: ntpd hanging machine

2000-03-02 Thread Bruce Evans

On Thu, 2 Mar 2000, Matthew Dillon wrote:

> :> > merge. Until that point I was using the stock ntp4 from udel with no
> :> > problems. But I tried the one shipping with 4.0 and it locks up completely

> rtprio (and idprio) is virtually guaranteed to lock up your machine
> eventually.  Don't use either.

Unfortunately, ntpd in -current uses rtprio by default.

Peter Dufault's recent changes fixed some related things, but not the
priority inversion problems.

Bruce






Re: ntpd hanging machine

2000-03-02 Thread Matthew D. Fuller

On Thu, Mar 02, 2000 at 02:55:16PM -0800, a little birdie told me
that Matthew Dillon remarked
> 
> rtprio (and idprio) is virtually guaranteed to lock up your machine
> eventually.  Don't use either.

Hm.
I've run ntpd rtprio'd to 52 for over a year, under -CURRENT and
RELENG_2_2.  Never had a freeze/crash I could attribute to it.



-- 
Matthew Fuller (MF4839) |[EMAIL PROTECTED]
Unix Systems Administrator  |[EMAIL PROTECTED]
Specializing in FreeBSD |http://www.over-yonder.net/

"The only reason I'm burning my candle at both ends, is because I
  haven't figured out how to light the middle yet"





Re: ntpd hanging machine

2000-03-02 Thread Matthew Dillon


:> > merge. Until that point I was using the stock ntp4 from udel with no
:> > problems. But I tried the one shipping with 4.0 and it locks up completely
:> > (looks like a hardware lockup). The ntp4 from udel works completely
:> > though. Odd :)
:> 
:> Yes, that's odd. I've never seen that... Can you try to compile a kernel with
:> DDB and try to see whether it hits the debugger or not ?
:> 
:
:Already tried, no go. It's a complete lockup. However, I just updated to
:4.0.99g from the udel ftp, and I am seeing the same lockups. Someone
:suggested trying it w/o the rtprio calls, which I am rebuilding for as I
:write this.
:   _ __ ___   ___ ___ ___
:  Wesley N Morgan   _ __ ___ | _ ) __|   \
:  [EMAIL PROTECTED]   _ __ | _ \._ \ |) |
:  FreeBSD: The Power To Serve  _ |___/___/___/
:Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!

rtprio (and idprio) is virtually guaranteed to lock up your machine
eventually.  Don't use either.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





Re: ntpd hanging machine

2000-03-02 Thread Wes Morgan

On Wed, 1 Mar 2000, Ollivier Robert wrote:

> According to Wes Morgan:
> > merge. Until that point I was using the stock ntp4 from udel with no
> > problems. But I tried the one shipping with 4.0 and it locks up completely
> > (looks like a hardware lockup). The ntp4 from udel works completely
> > though. Odd :)
> 
> Yes, that's odd. I've never seen that... Can you try to compile a kernel with
> DDB and try to see whether it hits the debugger or not ?
> 

Already tried, no go. It's a complete lockup. However, I just updated to
4.0.99g from the udel ftp, and I am seeing the same lockups. Someone
suggested trying it w/o the rtprio calls, which I am rebuilding for as I
write this.


-- 
   _ __ ___   ___ ___ ___
  Wesley N Morgan   _ __ ___ | _ ) __|   \
  [EMAIL PROTECTED]   _ __ | _ \._ \ |) |
  FreeBSD: The Power To Serve  _ |___/___/___/
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!






Re: ntpd hanging machine

2000-03-01 Thread Ollivier Robert

According to Wes Morgan:
> merge. Until that point I was using the stock ntp4 from udel with no
> problems. But I tried the one shipping with 4.0 and it locks up completely
> (looks like a hardware lockup). The ntp4 from udel works completely
> though. Odd :)

Yes, that's odd. I've never seen that... Can you try to compile a kernel with
DDB and try to see whether it hits the debugger or not ?
-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- [EMAIL PROTECTED]
FreeBSD keltia.freenix.fr 4.0-CURRENT #77: Thu Dec 30 12:49:51 CET 1999






Re: ntpd hanging machine

2000-03-01 Thread Peter Dufault

> On Mon, 28 Feb 2000, Matthew Frost wrote:
> 
> > I'm experiencing some problems with ntpd.  It would appear that a few
> > (10-15) minutes after I start it, the machine crashes completely...
> > 
> > Feb 28 14:10:02 egrorian ntpd[153]: ntpd 4.0.99b Mon Feb 28 12:12:17 GMT 2000 (1)
> > Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2040
> > Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2041
> > 
> > And then locked solid..
> 
> I've been seeing the exact same behavior ever since the big ntp4
> merge. Until that point I was using the stock ntp4 from udel with no
> problems. But I tried the one shipping with 4.0 and it locks up completely
> (looks like a hardware lockup). The ntp4 from udel works completely
> though. Odd :)

Try disabling the code where it uses rtprio to set itself realtime and
see if that "fixes" it.

Peter

--
Peter Dufault ([EMAIL PROTECTED])   Realtime development, Machine control,
HD Associates, Inc.   Fail-Safe systems, Agency approval





Re: ntpd hanging machine

2000-02-29 Thread Wes Morgan

On Mon, 28 Feb 2000, Matthew Frost wrote:

> I'm experiencing some problems with ntpd.  It would appear that a few
> (10-15) minutes after I start it, the machine crashes completely...
> 
> Feb 28 14:10:02 egrorian ntpd[153]: ntpd 4.0.99b Mon Feb 28 12:12:17 GMT 2000 (1)
> Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2040
> Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2041
> 
> And then locked solid..

I've been seeing the exact same behavior ever since the big ntp4
merge. Until that point I was using the stock ntp4 from udel with no
problems. But I tried the one shipping with 4.0 and it locks up completely
(looks like a hardware lockup). The ntp4 from udel works completely
though. Odd :)

-- 
   _ __ ___   ___ ___ ___
  Wesley N Morgan   _ __ ___ | _ ) __|   \
  [EMAIL PROTECTED]   _ __ | _ \._ \ |) |
  FreeBSD: The Power To Serve  _ |___/___/___/
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!






ntpd hanging machine

2000-02-28 Thread Matthew Frost

I'm experiencing some problems with ntpd.  It would appear that a few
(10-15) minutes after I start it, the machine crashes completely...

Feb 28 14:10:02 egrorian ntpd[153]: ntpd 4.0.99b Mon Feb 28 12:12:17 GMT 2000 (1)
Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2040
Feb 28 14:10:02 egrorian ntpd[153]: using kernel phase-lock loop 2041

And then locked solid..

It's -CURRENT as of this morning (28/2/2000) but it's happened before
now...  Here's the syslogged dmesg (I'll be able to provide any other
information anyone wants once I can physically get to the machine to
reset it later)

I'll note it's a Soyo motherboard SY5SSM.  Any help appreciated.

Feb 28 14:09:35 egrorian /kernel: Copyright (c) 1992-2000 The FreeBSD Project.
Feb 28 14:09:35 egrorian /kernel: Copyright (c) 1982, 1986, 1989, 1991, 1993
Feb 28 14:09:35 egrorian /kernel: The Regents of the University of California. All rights reserved.
Feb 28 14:09:35 egrorian /kernel: FreeBSD 4.0-CURRENT #0: Mon Feb 28 13:40:01 GMT 2000
Feb 28 14:09:35 egrorian /kernel: [EMAIL PROTECTED]:/usr/src/sys/compile/NEWBOX
Feb 28 14:09:35 egrorian /kernel: Timecounter "i8254"  frequency 1193182 Hz
Feb 28 14:09:35 egrorian /kernel: CPU: AMD-K6(tm) 3D processor (400.91-MHz 586-class CPU)
Feb 28 14:09:35 egrorian /kernel: Origin = "AuthenticAMD"  Id = 0x58c  Stepping = 12
Feb 28 14:09:35 egrorian /kernel:   Features=0x8021bf
Feb 28 14:09:35 egrorian /kernel: AMD Features=0x8800
Feb 28 14:09:35 egrorian /kernel: real memory  = 65011712 (63488K bytes)
Feb 28 14:09:35 egrorian /kernel: avail memory = 60018688 (58612K bytes)
Feb 28 14:09:35 egrorian /kernel: Preloaded elf kernel "kernel" at 0xc02bd000.
Feb 28 14:09:35 egrorian /kernel: Preloaded userconfig_script "/boot/kernel.conf" at 0xc02bd09c.
Feb 28 14:09:35 egrorian /kernel: md0: Malloc disk
Feb 28 14:09:35 egrorian /kernel: npx0:  on motherboard
Feb 28 14:09:35 egrorian /kernel: npx0: INT 16 interface
Feb 28 14:09:35 egrorian /kernel: pcib0:  on motherboard
Feb 28 14:09:35 egrorian /kernel: pci0:  on pcib0
Feb 28 14:09:35 egrorian /kernel: atapci0:  port 0x4000-0x400f,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 irq 14 at device 0.1 on pci0
Feb 28 14:09:35 egrorian /kernel: ata0: at 0x1f0 irq 14 on atapci0
Feb 28 14:09:35 egrorian /kernel: isab0:  at device 1.0 on pci0
Feb 28 14:09:35 egrorian /kernel: isa0:  on isab0
Feb 28 14:09:35 egrorian /kernel: pci0:  (vendor=0x1039, dev=0x0009) at 1.1
Feb 28 14:09:35 egrorian /kernel: ohci0:  mem 0xdc90-0xdc900fff irq 12 at device 1.2 on pci0
Feb 28 14:09:35 egrorian /kernel: usb0: OHCI version 1.0, legacy support
Feb 28 14:09:35 egrorian /kernel: usb0: SMM does not respond, resetting
Feb 28 14:09:35 egrorian /kernel: usb0:  on ohci0
Feb 28 14:09:35 egrorian /kernel: usb0: USB revision 1.0
Feb 28 14:09:35 egrorian /kernel: uhub0: (unknown) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
Feb 28 14:09:35 egrorian /kernel: uhub0: 2 ports with 2 removable, self powered
Feb 28 14:09:35 egrorian /kernel: pcib2:  at device 2.0 on pci0
Feb 28 14:09:35 egrorian /kernel: pci1:  on pcib2
Feb 28 14:09:35 egrorian /kernel: pci1:  at 0.0 irq 11
Feb 28 14:09:35 egrorian /kernel: ed0:  port 0xd000-0xd01f irq 10 at device 12.0 on pci0
Feb 28 14:09:35 egrorian /kernel: ed0: address 00:c0:f0:45:07:03, type NE2000 (16 bit) 
Feb 28 14:09:35 egrorian /kernel: pci0:  (vendor=0x125d, dev=0x1969) at 13.0 irq 10
Feb 28 14:09:35 egrorian /kernel: pcib1:  on motherboard
Feb 28 14:09:35 egrorian /kernel: pci2:  on pcib1
Feb 28 14:09:35 egrorian /kernel: fdc0:  at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
Feb 28 14:09:35 egrorian /kernel: fdc0: FIFO enabled, 8 bytes threshold
Feb 28 14:09:35 egrorian /kernel: fd0: <1440-KB 3.5" drive> on fdc0 drive 0
Feb 28 14:09:35 egrorian /kernel: atkbdc0:  at port 0x60-0x6f on isa0
Feb 28 14:09:35 egrorian /kernel: atkbd0:  irq 1 on atkbdc0
Feb 28 14:09:35 egrorian /kernel: vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
Feb 28 14:09:35 egrorian /kernel: sc0:  on isa0
Feb 28 14:09:35 egrorian /kernel: sc0: VGA <16 virtual consoles, flags=0x200>
Feb 28 14:09:35 egrorian /kernel: sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
Feb 28 14:09:35 egrorian /kernel: sio0: type 16550A
Feb 28 14:09:35 egrorian /kernel: sio1 at port 0x2f8-0x2ff irq 3 on isa0
Feb 28 14:09:35 egrorian /kernel: sio1: type 16550A
Feb 28 14:09:35 egrorian /kernel: ppc0:  at port 0x378-0x37f irq 7 flags 0x40 on isa0
Feb 28 14:09:35 egrorian /kernel: ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
Feb 28 14:09:35 egrorian /kernel: ppc0: FIFO with 16/16/16 bytes threshold
Feb 28 14:09:35 egrorian /kernel: ppi0:  on ppbus0
Feb 28 14:09:35 egrorian /kernel: lpt0:  on ppbus0
Feb 28 14:09:35 egrorian /kernel: lpt0: Interrupt-driven port
Feb 28 14:09:35 egrorian /kernel: plip0:  on ppbus0
Feb 28 14:09:35 egrorian /kernel: pca0 at port 0x40 on isa0
Feb 28 14:09:35 egrorian /kernel: unknown:  can't assign resources
Feb 28 14:09:35 eg