Re: 4.0 locked up over the weekend
On Wed, May 09, 2007 at 11:46:13AM -0700, Bruce Bauer wrote:
On 5/8/07, Joachim Schipper [EMAIL PROTECTED] wrote:
On Tue, May 08, 2007 at 09:05:44AM -0700, Bruce Bauer wrote:

Probably a good idea to put some load on the system anyway. Running make in ports/www/kde should keep it busy for a while. Not familiar with bonnie++, I'll check it out.

[snip: bonnie runs fine]

Update: I've experienced 3 more hard lockups. No messages on the console screen. Nothing unusual in any of the log files that I've found. Make running in /usr/ports/x11/kde was interrupted at different tasks each time (downloading, compiling, and running a configure script). System recovered each time with no problems after a power cycle. Are there some system monitoring tools I should be running to keep track of various resources?

No, not really; very few things you could do would cause the system to freeze. Okay, so something is wrong. Troubleshooting it tends to be hard; however, you are experiencing lock-ups, not crashes. Perhaps the box simply gets too hot? Most modern systems have sensors, which can usually be seen via hw.sensors, or at least the BIOS screen. Simply cleaning out the mess tends to help here.

Joachim
--
TFMotD: fpa, fea, fta (4) - DEC FDDI controller device driver
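For anyone wanting to script the hw.sensors check, here is a minimal sketch in Python. It assumes the `name=value unit` layout that `sysctl hw.sensors` prints on later OpenBSD releases; the layout on 4.0 may differ, so treat the pattern as an assumption. The sample readings and the 70 degC limit are invented for illustration and are not from this box.

```python
import re

# Hypothetical sample in the "name=value unit" layout; the readings are invented.
SAMPLE = """\
hw.sensors.it0.temp0=41.00 degC
hw.sensors.it0.temp1=78.00 degC
hw.sensors.it0.fan0=2300 RPM
"""

# Matches temperature lines only; fan and voltage lines are skipped.
TEMP_LINE = re.compile(r"^(hw\.sensors\.\S+?)=([0-9.]+) degC\b")


def hot_sensors(sysctl_output, limit_c=70.0):
    """Return (name, degrees C) for each temperature sensor above limit_c."""
    hot = []
    for line in sysctl_output.splitlines():
        match = TEMP_LINE.match(line)
        if match and float(match.group(2)) > limit_c:
            hot.append((match.group(1), float(match.group(2))))
    return hot


if __name__ == "__main__":
    for name, temp in hot_sensors(SAMPLE):
        print(f"{name}: {temp:.1f} degC")
```

On a live system the text passed to `hot_sensors` would be the captured output of `sysctl hw.sensors` rather than `SAMPLE`. Logging it every minute or so to disk would at least show whether temperatures were climbing before a lockup.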
Re: 4.0 locked up over the weekend
Bruce Bauer wrote: This system has been running flawlessly since mid-March with GENERIC plus the 010 patch. dmesg below This morning I found it totally unresponsive both through network and at the console. Had to use the power switch to recover. Where do I start trying to track this down? The system is running sshd and openvpn only DMESG: OpenBSD 4.0 (GENERICp) #0: Fri Mar 16 19:07:33 MST 2007 [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERICp cpu0: AMD Sempron(tm) Processor 3000+ (AuthenticAMD 686-class, 256KB L2 cache) 1.61 GHz cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16 .. Is this an amd64 capable Sempron? It looks like it is, based on the rest of the dmesg. If so...this could be the i386 on amd64 bug, the symptoms sure seem to fit. (granted, locked hard covers a lot of problems... :) If that's the case, you might want to upgrade to 4.1, which should take care of the problem. Nick.
Re: 4.0 locked up over the weekend
Can you get rid of extraneous hardware? Can you drop some RAM, and the video card? How about any of the AMD-specific processor settings, like HyperTransport? Can you disable apm? Maybe there are some conflicts in the apm...

I mean, these are a few ideas that I thought of...

On 5/10/07, Joachim Schipper [EMAIL PROTECTED] wrote:
On Wed, May 09, 2007 at 11:46:13AM -0700, Bruce Bauer wrote:
On 5/8/07, Joachim Schipper [EMAIL PROTECTED] wrote:
On Tue, May 08, 2007 at 09:05:44AM -0700, Bruce Bauer wrote:

Probably a good idea to put some load on the system anyway.
Re: 4.0 locked up over the weekend
In article [EMAIL PROTECTED], Nick Holland wrote: cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16 .. Is this an amd64 capable Sempron? It looks like it is, based on the rest of the dmesg. Nope, no LONG in that cpu flags... -- [100~Plax]sb16i0A2172656B63616820636420726568746F6E61207473754A[dZ1!=b]salax
Re: 4.0 locked up over the weekend
Tobias Weingartner wrote: In article [EMAIL PROTECTED], Nick Holland wrote: cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16 .. Is this an amd64 capable Sempron? It looks like it is, based on the rest of the dmesg. Nope, no LONG in that cpu flags... And while this part is right, that CPU does not have LONG support, it may still exhibit the PAE bug. -- [100~Plax]sb16i0A2172656B63616820636420726568746F6E61207473754A[dZ1!=b]salax
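The flag check discussed above is easy to do mechanically. A small Python sketch, using the cpu0 flags line exactly as quoted from the dmesg; the helper name `cpu_flags` is made up for this example.

```python
# The cpu0 feature line as quoted from the dmesg in this thread.
FLAGS_LINE = (
    "cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,"
    "CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16"
)


def cpu_flags(dmesg_line):
    """Split a dmesg 'cpuN: FLAG,FLAG,...' line into a set of flag names."""
    _, _, flags = dmesg_line.partition(": ")
    return {flag.strip() for flag in flags.split(",") if flag.strip()}


if __name__ == "__main__":
    flags = cpu_flags(FLAGS_LINE)
    print("LONG present:", "LONG" in flags)
    print("PAE present:", "PAE" in flags)
```

For this line the set contains PAE but not LONG, which matches both replies: no long mode, but PAE is there.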
Re: 4.0 locked up over the weekend
Update: I've experienced 3 more hard lockups. No messages on the console screen. Nothing unusual in any of the log files that I've found. Make running in /usr/ports/x11/kde was interrupted at different tasks each time (downloading, compiling, and running a configure script). System recovered each time with no problems after a power cycle. Are there some system monitoring tools I should be running to keep track of various resources?

On 5/8/07, Bruce Bauer [EMAIL PROTECTED] wrote:

Initial results:
compiled bonnie++ from ports
make is running in ports/x11/kde
2 video streams passing through VPN tunnel at about 32 fps total

output from bonnie++:

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
roadrunner.for 300M 50379  46 49432   6  6322   1 25376  41 34974   4 130.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2542   5     + +++  5113   8  2898   7     + +++  5478   9
roadrunner.fortechsw.com,300M,50379,46,49432,6,6322,1,25376,41,34974,4,130.7,0,16,2542,5,+,+++,5113,8,2898,7,+,+++,5478,9

ran uptime after bonnie++ finished
11:21AM up 1 day, 2:15, 2 users, load averages: 4.08, 3.15, 2.55

Everything seems to be running smoothly

Bruce

On 5/8/07, Joachim Schipper [EMAIL PROTECTED] wrote:
On Tue, May 08, 2007 at 09:05:44AM -0700, Bruce Bauer wrote:

Probably a good idea to put some load on the system anyway. See how the VPN data transfer holds up.
Downloading ports.tar.gz now
Running make in ports/www/kde should keep it busy for a while
Not familiar with bonnie++, I'll check it out

Bonnie++ just generates a lot of I/O. The 'ghetto' version involves running 'tar xzf src.tar.gz; rm -rf src' in a loop.

Let us know how it goes...

Joachim
--
TFMotD: tht, thtc (4) - Tehuti Networks 10Gb Ethernet device
Re: 4.0 locked up over the weekend
Hmmm... Probably a good idea to put some load on the system anyway. See how the VPN data transfer holds up.
Downloading ports.tar.gz now
Running make in ports/www/kde should keep it busy for a while
Not familiar with bonnie++, I'll check it out

Thanks,
Bruce

On 5/7/07, Joachim Schipper [EMAIL PROTECTED] wrote:
On Mon, May 07, 2007 at 12:42:55PM -0700, Bruce Bauer wrote:
On 5/7/07, Jack J. Woehr [EMAIL PROTECTED] wrote:
On May 7, 2007, at 12:20 PM, Bruce Bauer wrote:

This system has been running flawlessly since mid-March with GENERIC plus the 010 patch. dmesg below. This morning I found it totally unresponsive both through network and at the console. Had to use the power switch to recover. Where do I start trying to track this down?

Open the box and check your power supply and blow it out with air if it's full of dust. Number one cause of mysterious lockups in my personal experience. Next, run a memory test. Only then start trying to debug software, e.g., OpenBSD.

Thanks for the response. OK, maybe a little less basic than that. The system is sitting in a restricted access server room. Not a clean room, but very little dust. Nice and cool. The system still looks brand new, inside and out.

The purpose of this system is to receive streaming video data over the VPN from IP webcams. It doesn't do anything with the data except pass it on to a DVR system over the local network. Plans are to add another network card so the VPN and the local network will be on separate channels. But, for now, it all goes through one card. It has worked in this configuration for over a month with video from 2 cameras coming in.

Oops! Message from Joachim Schipper just came in: There were no console messages. The authlog does show that someone is trying to brute force an ssh login. I think I'll turn off sshd for now...

Nah, script kiddies trying to bruteforce SSH logins are so common that I just tuned them out of the log parser altogether. Just use public keys, or good passwords.
That said, Jack might be right to suspect some random hardware failure. If this is the case, how about some proper stress testing (compiling the whole system is fairly good in exercising CPU and memory, something like bonnie++ might help you to test the disk?). If that doesn't work, the software might be problematic... Joachim -- TFMotD: piconv (1) - iconv(1), reinvented in perl
Re: 4.0 locked up over the weekend
On Tue, May 08, 2007 at 09:05:44AM -0700, Bruce Bauer wrote:

Probably a good idea to put some load on the system anyway. See how the VPN data transfer holds up.
Downloading ports.tar.gz now
Running make in ports/www/kde should keep it busy for a while
Not familiar with bonnie++, I'll check it out

Bonnie++ just generates a lot of I/O. The 'ghetto' version involves running 'tar xzf src.tar.gz; rm -rf src' in a loop.

Let us know how it goes...

Joachim
--
TFMotD: tht, thtc (4) - Tehuti Networks 10Gb Ethernet device
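The extract-and-delete loop Joachim describes could be sketched in Python roughly as follows. This is only an illustration: `make_sample_tarball` builds a tiny stand-in archive (its file count and sizes are invented) so the sketch runs anywhere, whereas a real stress run would point `churn` at the actual source tarball and use far more rounds.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def make_sample_tarball(workdir, files=20, size=4096):
    """Build a small stand-in for src.tar.gz (hypothetical contents)."""
    src = workdir / "src"
    src.mkdir()
    for i in range(files):
        (src / f"file{i}.dat").write_bytes(bytes(size))
    tarball = workdir / "src.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(src, arcname="src")
    shutil.rmtree(src)
    return tarball


def churn(tarball, workdir, rounds=3):
    """Extract the tarball and delete the tree again, `rounds` times.

    Returns the total number of files extracted across all rounds.
    """
    extracted = 0
    for _ in range(rounds):
        with tarfile.open(tarball, "r:gz") as tar:
            tar.extractall(workdir)
        extracted += sum(1 for p in (workdir / "src").rglob("*") if p.is_file())
        shutil.rmtree(workdir / "src")
    return extracted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        total = churn(make_sample_tarball(work), work)
        print(total, "files extracted in total")
```

The shell one-liner is simpler and exercises the same disk paths; the Python version only adds a file count so you can tell each round actually did something.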
Re: 4.0 locked up over the weekend
Initial results:
compiled bonnie++ from ports
make is running in ports/x11/kde
2 video streams passing through VPN tunnel at about 32 fps total

output from bonnie++:

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
roadrunner.for 300M 50379  46 49432   6  6322   1 25376  41 34974   4 130.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2542   5     + +++  5113   8  2898   7     + +++  5478   9
roadrunner.fortechsw.com,300M,50379,46,49432,6,6322,1,25376,41,34974,4,130.7,0,16,2542,5,+,+++,5113,8,2898,7,+,+++,5478,9

ran uptime after bonnie++ finished
11:21AM up 1 day, 2:15, 2 users, load averages: 4.08, 3.15, 2.55

Everything seems to be running smoothly

Bruce

On 5/8/07, Joachim Schipper [EMAIL PROTECTED] wrote:
On Tue, May 08, 2007 at 09:05:44AM -0700, Bruce Bauer wrote:

Probably a good idea to put some load on the system anyway. See how the VPN data transfer holds up.
Downloading ports.tar.gz now
Running make in ports/www/kde should keep it busy for a while
Not familiar with bonnie++, I'll check it out

Bonnie++ just generates a lot of I/O. The 'ghetto' version involves running 'tar xzf src.tar.gz; rm -rf src' in a loop.

Let us know how it goes...

Joachim
--
TFMotD: tht, thtc (4) - Tehuti Networks 10Gb Ethernet device
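If you want to compare bonnie++ runs over time, the comma-separated line is the easier thing to keep. A minimal Python sketch that splits the line posted above into named (rate, %CPU) pairs; the key names are made up here and simply follow the column headings of the human-readable table. Non-numeric fields (the `+` runs, which bonnie++ prints when a test finished too quickly to time) come back as None.

```python
# The CSV line exactly as posted above.
CSV_LINE = (
    "roadrunner.fortechsw.com,300M,50379,46,49432,6,6322,1,25376,41,"
    "34974,4,130.7,0,16,2542,5,+,+++,5113,8,2898,7,+,+++,5478,9"
)

# Hypothetical key names, in the column order of the human-readable table.
BLOCK_TESTS = ["seq_out_per_chr", "seq_out_block", "seq_out_rewrite",
               "seq_in_per_chr", "seq_in_block", "random_seeks"]
FILE_TESTS = ["seq_create", "seq_read", "seq_delete",
              "rand_create", "rand_read", "rand_delete"]


def number(text):
    """Convert a field to float; non-numeric fields (the '+' runs) become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_bonnie_csv(line):
    """Map one 27-field bonnie++ 1.03 CSV line to {name: (rate, cpu_percent)}."""
    fields = line.strip().split(",")
    result = {"machine": fields[0], "size": fields[1], "files": int(fields[14])}
    pairs = fields[2:14] + fields[15:27]
    for i, name in enumerate(BLOCK_TESTS + FILE_TESTS):
        result[name] = (number(pairs[2 * i]), number(pairs[2 * i + 1]))
    return result


if __name__ == "__main__":
    for key, value in parse_bonnie_csv(CSV_LINE).items():
        print(key, value)
```

Rates are K/sec for the five sequential block tests and operations per second for seeks and the file create/read/delete tests, as in the table headings.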
4.0 locked up over the weekend
This system has been running flawlessly since mid-March with GENERIC plus the 010 patch. dmesg below

This morning I found it totally unresponsive both through network and at the console. Had to use the power switch to recover. Where do I start trying to track this down?

The system is running sshd and openvpn only

DMESG:
OpenBSD 4.0 (GENERICp) #0: Fri Mar 16 19:07:33 MST 2007
    [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERICp
cpu0: AMD Sempron(tm) Processor 3000+ (AuthenticAMD 686-class, 256KB L2 cache) 1.61 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16
real mem = 501706752 (489948K)
avail mem = 449642496 (439104K)
using 4256 buffers containing 25186304 bytes (24596K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(f0) BIOS, date 02/27/07, BIOS32 rev. 0 @ 0xfa820, SMBIOS rev. 2.4 @ 0xf (41 entries)
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
apm0: flags 70102 dobusy 1 doidle 1
pcibios0 at bios0: rev 3.0 @ 0xf/0xcfd4
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfcee0/240 (13 entries)
pcibios0: bad IRQ table checksum
pcibios0: PCI BIOS has 13 Interrupt Routing table entries
pcibios0: PCI Exclusive IRQs: 5 10 11
pcibios0: no compatible PCI ICU found
pcibios0: Warning, unable to fix up PCI interrupt routing
pcibios0: PCI bus #3 is the last bus
bios0: ROM list: 0xc/0xde00 0xd/0x1800
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
NVIDIA C51 Host rev 0xa2 at pci0 dev 0 function 0 not configured
NVIDIA C51 Memory rev 0xa2 at pci0 dev 0 function 1 not configured
NVIDIA C51 Memory rev 0xa2 at pci0 dev 0 function 2 not configured
NVIDIA C51 Memory rev 0xa2 at pci0 dev 0 function 3 not configured
NVIDIA C51 Memory rev 0xa2 at pci0 dev 0 function 4 not configured
NVIDIA C51 Memory rev 0xa2 at pci0 dev 0 function 5 not configured
NVIDIA C51 Memory rev 0xa2 at pci0 dev 0 function 6 not configured
NVIDIA C51 Memory rev 0xa2 at pci0 dev 0 function 7 not configured
ppb0 at pci0 dev 3 function 0 NVIDIA C51 PCIE rev 0xa1
pci1 at ppb0 bus 1
ppb1 at pci0 dev 4 function 0 NVIDIA C51 PCIE rev 0xa1
pci2 at ppb1 bus 2
vga1 at pci0 dev 5 function 0 NVIDIA GeForce 6100 rev 0xa2
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
NVIDIA MCP51 Host rev 0xa2 at pci0 dev 9 function 0 not configured
pcib0 at pci0 dev 10 function 0 vendor NVIDIA, unknown product 0x0261 rev 0xa3
nviic0 at pci0 dev 10 function 1 NVIDIA MCP51 SMBus rev 0xa3
iic0 at nviic0
iic1 at nviic0
NVIDIA MCP51 Memory rev 0xa3 at pci0 dev 10 function 2 not configured
ohci0 at pci0 dev 11 function 0 NVIDIA MCP51 USB rev 0xa3: irq 10, version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: NVIDIA OHCI root hub, rev 1.00/1.00, addr 1
uhub0: 8 ports with 8 removable, self powered
ehci0 at pci0 dev 11 function 1 NVIDIA MCP51 USB rev 0xa3: irq 11
usb1 at ehci0: USB revision 2.0
uhub1 at usb1
uhub1: NVIDIA EHCI root hub, rev 2.00/1.00, addr 1
uhub1: 8 ports with 8 removable, self powered
pciide0 at pci0 dev 13 function 0 NVIDIA MCP51 IDE rev 0xa1: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: Lite-On, LTN486 48x Max, YD01 SCSI0 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
pciide1 at pci0 dev 14 function 0 NVIDIA MCP51 SATA rev 0xa1: DMA
pciide1: using irq 11 for native-PCI interrupt
wd0 at pciide1 channel 0 drive 0: WDC WD800JD-00MSA1
wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 5
ppb2 at pci0 dev 16 function 0 NVIDIA MCP51 PCI-PCI rev 0xa2
pci3 at ppb2 bus 3
auich0 at pci0 dev 16 function 2 NVIDIA MCP51 AC97 rev 0xa2: irq 11, MCP51 AC97
ac97: codec id 0x414c4760 (Avance Logic ALC655 rev 0)
audio0 at auich0
nfe0 at pci0 dev 20 function 0 NVIDIA MCP51 LAN rev 0xa3: irq 10, address 00:19:21:33:1d:93
ukphy0 at nfe0 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI 0x0050ef, model 0x0007
pchb0 at pci0 dev 24 function 0 AMD AMD64 HyperTransport rev 0x00
pchb1 at pci0 dev 24 function 1 AMD AMD64 Address Map rev 0x00
pchb2 at pci0 dev 24 function 2 AMD AMD64 DRAM Cfg rev 0x00
pchb3 at pci0 dev 24 function 3 AMD AMD64 Misc Cfg rev 0x00
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
it0 at isa0 port 0x290/8: IT87
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
biomask ef6d
Re: 4.0 locked up over the weekend
On May 7, 2007, at 12:20 PM, Bruce Bauer wrote: This system has been running flawlessly since mid-March with GENERIC plus the 010 patch. dmesg below This morning I found it totally unresponsive both through network and at the console. Had to use the power switch to recover. Where do I start trying to track this down? Open the box and check your power supply and blow it out with air if it's full of dust. Number one cause of mysterious lockups in my personal experience. Next, run a memory test. Only then start trying to debug software, e.g., OpenBSD. -- Jack J. Woehr Director of Development Absolute Performance, Inc. [EMAIL PROTECTED] 303-443-7000 ext. 527
Re: 4.0 locked up over the weekend
On Mon, May 07, 2007 at 11:20:00AM -0700, Bruce Bauer wrote: This system has been running flawlessly since mid-March with GENERIC plus the 010 patch. dmesg below This morning I found it totally unresponsive both through network and at the console. Had to use the power switch to recover. Where do I start trying to track this down? If it happens again, try to see if there are any messages on the console. Otherwise, look at what was last written to the log files; that might or might not contain a clue. (The kernel screaming at you about something or other would be a solid clue, for instance.) Joachim
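For the "what was last written to the log files" step, something like the following Python sketch would print the tail of each log after a reboot. The two paths in the main block are assumptions about where this box logs (the thread itself only mentions the authlog), and reading them may need root.

```python
from collections import deque


def last_lines(path, count=20):
    """Return the last `count` lines of a text file, reading it only once."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    # Assumed log locations; adjust to whatever syslog.conf says on your system.
    for log in ("/var/log/messages", "/var/log/authlog"):
        print(f"== {log} ==")
        try:
            for line in last_lines(log, 10):
                print(line)
        except OSError as err:
            print(f"(could not read: {err})")
```

With a hard lockup the interesting part is usually the timestamp of the last entry, since it narrows down when the box stopped responding.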
Re: 4.0 locked up over the weekend
On 5/7/07, Jack J. Woehr [EMAIL PROTECTED] wrote: On May 7, 2007, at 12:20 PM, Bruce Bauer wrote: This system has been running flawlessly since mid-March with GENERIC plus the 010 patch. dmesg below This morning I found it totally unresponsive both through network and at the console. Had to use the power switch to recover. Where do I start trying to track this down? Open the box and check your power supply and blow it out with air if it's full of dust. Number one cause of mysterious lockups in my personal experience. Next, run a memory test. Only then start trying to debug software, e.g., OpenBSD. -- Jack J. Woehr Director of Development Absolute Performance, Inc. [EMAIL PROTECTED] 303-443-7000 ext. 527 Thanks for the response. OK, maybe a little less basic than that. The system is sitting in a restricted access server room. Not a clean room, but very little dust. Nice and cool.. The system still looks brand new, inside and out. The purpose of this system is to receive streaming video data over the VPN from IP webcams. It doesn't do anything with the data except pass it on to a DVR system over the local network. Plans are to add another network card so the VPN and the local network will be on separate channels. But, for now, it all goes through one card. It has worked in this configuration for over a month with video from 2 cameras coming in. Oops! Message from Joachim Schipper just came in: There were no console messages The authlog does show that someone is trying to brute force an ssh login. I think I'll turn off sshd for now...
Re: 4.0 locked up over the weekend
On Mon, May 07, 2007 at 12:42:55PM -0700, Bruce Bauer wrote: On 5/7/07, Jack J. Woehr [EMAIL PROTECTED] wrote: On May 7, 2007, at 12:20 PM, Bruce Bauer wrote: This system has been running flawlessly since mid-March with GENERIC plus the 010 patch. dmesg below This morning I found it totally unresponsive both through network and at the console. Had to use the power switch to recover. Where do I start trying to track this down? Open the box and check your power supply and blow it out with air if it's full of dust. Number one cause of mysterious lockups in my personal experience. Next, run a memory test. Only then start trying to debug software, e.g., OpenBSD. Thanks for the response. OK, maybe a little less basic than that. The system is sitting in a restricted access server room. Not a clean room, but very little dust. Nice and cool.. The system still looks brand new, inside and out. The purpose of this system is to receive streaming video data over the VPN from IP webcams. It doesn't do anything with the data except pass it on to a DVR system over the local network. Plans are to add another network card so the VPN and the local network will be on separate channels. But, for now, it all goes through one card. It has worked in this configuration for over a month with video from 2 cameras coming in. Oops! Message from Joachim Schipper just came in: There were no console messages The authlog does show that someone is trying to brute force an ssh login. I think I'll turn off sshd for now... Nah, script kiddies trying to bruteforce SSH logins are so common that I just tuned them out of the log parser altogether. Just use public keys, or good passwords. That said, Jack might be right to suspect some random hardware failure. If this is the case, how about some proper stress testing (compiling the whole system is fairly good in exercising CPU and memory, something like bonnie++ might help you to test the disk?). If that doesn't work, the software might be problematic... 
Joachim -- TFMotD: piconv (1) - iconv(1), reinvented in perl