Re: Reboot loop
Basically yes. It's also possible to use puc(4) in some cases, but you'll need to find the memory address yourself in pcidump and set it in the bootloader with "machine comaddr". I also think that for some puc devices you might have a hard time setting the port to the baud rate that you want.

--
Sent from a phone, apologies for poor formatting.

On 8 June 2018 02:36:28 IL Ka wrote:
>> OpenBSD doesn't use ACPI to find an isa UART, it only looks in the fixed
>> locations compiled in to the kernel.
> Ok, I see that "com.c" does it by reading register, it even has comment
> "Probe for all known forms of UART"
>> For a system console (with access to DDB etc.) you need a "standard" com port.
> Do you mean I can use "com", but not "ucom(4)", right?
> Thank you, Ilya.
Re: Reboot loop
On 08/06/18 11:36, IL Ka wrote:
>> For a system console (with access to DDB etc.) you need a "standard" com port.
> Do you mean I can use "com", but not "ucom(4)", right?

Using USB serial would require enumeration of the serial bus, then selection of the appropriate protocol (there are at least a dozen competing standards for USB serial) based on the VID/PID. That is not trivial to do in the early boot phase. I don't know of many operating systems that can do this.
--
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.
Re: Reboot loop
> OpenBSD doesn't use ACPI to find an isa UART, it only looks in the fixed
> locations compiled in to the kernel.

Ok, I see that "com.c" does it by reading a register; it even has the comment "Probe for all known forms of UART".

> For a system console (with access to DDB etc.) you need a "standard" com port.

Do you mean I can use "com", but not "ucom(4)", right?

Thank you, Ilya.
Re: Reboot loop
On 2018-06-06, IL Ka wrote:
> There is
>> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> in your dmesg.
>
> So, I assume your box reports com port somehow (via ACPI probably)

OpenBSD doesn't use ACPI to find an isa UART, it only looks in the fixed locations compiled in to the kernel. Seeing ns16550a in the output suggests that it did actually find one.

> Some boxes may have comport built into chipset but no external cable for it.
> I have one, I bought cable separately.

It's also possible that the UART is present (usually as part of a superio chip) but isn't even brought out to a header on the board.

> Another option is to use UART that connects to USB

For a system console (with access to DDB etc.) you need a "standard" com port. A standard DOS-compatible one at the usual com1/com2 address is easy. PCI/PCIe *might* be possible in some cases but is awkward to set up. USB is not possible.
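The distinction drawn above (ISA com ports work as a system console, PCI/PCIe ports only sometimes, USB serial not at all) can be sketched as a small dmesg filter. This is only an illustrative Python sketch: the function name `console_candidate`, the verdict strings, and the sample puc/ucom lines are made up for the example; only the com0 line comes from this thread.

```python
import re

# Matches the start of a dmesg attach line such as
# "com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo".
ATTACH_RE = re.compile(r"^(ucom|com)(\d+) at ([a-z]+)\d+")


def console_candidate(dmesg_line):
    """Return (device, parent, verdict) for a serial attach line, else None."""
    m = ATTACH_RE.match(dmesg_line.strip())
    if m is None:
        return None
    driver, unit, parent = m.groups()
    if driver == "ucom":
        verdict = "no"       # USB serial: not usable as a system console
    elif parent == "isa":
        verdict = "yes"      # standard com port at a fixed ISA address
    elif parent == "puc":
        verdict = "maybe"    # PCI/PCIe: might work, awkward to set up
    else:
        verdict = "unknown"  # anything else is outside this sketch
    return (driver + unit, parent, verdict)


if __name__ == "__main__":
    samples = [
        # Taken from the thread:
        "com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo",
        # Hypothetical lines, for illustration only:
        "com4 at puc0 port 0 apic 2 int 17: ns16550a, 16 byte fifo",
        "ucom0 at uplcom0 portno 0",
    ]
    for line in samples:
        print(console_candidate(line))
```

A "yes" here only means the kernel attached a UART at a standard location; as noted above, the port may still not be wired to a connector or header on the board.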
Re: Reboot loop
IL Ka,

Thanks for pointing it out. It will take a few days before I can capture the output through the com port.

Until then folks,

- Original message -
From: IL Ka
To: francis dos santos
CC: OpenBSD General Misc
Sent: Wed, 06 Jun 2018 19:32:32 -0300 (ART)
Subject: Re: Reboot loop

There is
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
in your dmesg.

So, I assume your box reports com port somehow (via ACPI probably)

Some boxes may have comport built into chipset but no external cable for it. I have one, I bought cable separately.

Another option is to use UART that connects to USB
Re: Reboot loop
Oops, spoke too soon. I'll have to break the box open/read the manual to see if there is a com1 option through a header.

- Original message -
From: IL Ka
To: francis dos santos
CC: OpenBSD General Misc
Sent: Wed, 06 Jun 2018 18:45:07 -0300 (ART)
Subject: Re: Reboot loop

Ok, then try to follow Stuart Longland's advice: use serial console.

Connect your PC using null-modem cable to another pc, and in boot(8) prompt type:

boot> set tty com0

On another PC run cu(1) or minicom or screen (or for Windows you may use PuTTY), connect to OpenBSD and you will see all your console output which you should be able to capture.
Re: Reboot loop
There is no com port on this machine. Thanks for the assistance.

- Original message -
From: IL Ka
To: francis dos santos
CC: OpenBSD General Misc
Sent: Wed, 06 Jun 2018 18:45:07 -0300 (ART)
Subject: Re: Reboot loop

Ok, then try to follow Stuart Longland's advice: use serial console.

Connect your PC using null-modem cable to another pc, and in boot(8) prompt type:

boot> set tty com0

On another PC run cu(1) or minicom or screen (or for Windows you may use PuTTY), connect to OpenBSD and you will see all your console output which you should be able to capture.
Re: Reboot loop
I'll be more specific. I was talking about a 'loop' where the system reboots automatically; there is also a tighter loop that does not cause the system to reboot automatically. The inescapable loop is the tighter one, which causes the boot process to display uvm_fault(...) indefinitely. Needless to say, if something gets displayed before entering the tighter loop, I won't be able to see it. I do not see a kernel panic.

- Original message -
From: IL Ka
To: francis dos santos
CC: OpenBSD General Misc
Sent: Wed, 06 Jun 2018 17:29:55 -0300 (ART)
Subject: Re: Reboot loop

ddb(4): "ddb is invoked upon a kernel panic when the sysctl(8) ddb.panic is set to 1". I believe this value is default. So, kernel should be dropped into ddb on panic. Does it happen?

What exactly do you see on screen along with uvm_fault? Do you see whole stacktrace?

Check https://www.openbsd.org/ddb.html for "Minimum information for kernel problems" section
Re: Reboot loop
There is
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
in your dmesg.

So, I assume your box reports com port somehow (via ACPI probably)

Some boxes may have comport built into chipset but no external cable for it. I have one, I bought cable separately.

Another option is to use UART that connects to USB
Re: Reboot loop
IL Ka,

The problem manifests itself before sysctl.conf is read. CTRL-ALT-ESC won't send a break. Using boot -d will drop me in ddb way too early. This machine doesn't have any means for serial output, no com port. No core is dumped. Can the system panic and still go on in an endless loop? Once in a blue moon the system actually boots.

The inescapable loop happens just before it switches to a higher resolution and displays radeondrm0: ... x ... 32bpp (the actual resolution is irrelevant), after detection of the ring 2 test failure.

- Original message -
From: IL Ka
To: francis dos santos
CC: OpenBSD General Misc
Sent: Wed, 06 Jun 2018 16:01:24 -0300 (ART)
Subject: Re: Reboot loop

https://www.openbsd.org/report.html
See "How to create a problem report" step 5
Re: Reboot loop
Ok, then try to follow Stuart Longland's advice: use serial console.

Connect your PC using null-modem cable to another PC, and in boot(8) prompt type:

boot> set tty com0

On another PC run cu(1) or minicom or screen (or for Windows you may use PuTTY), connect to OpenBSD and you will see all your console output which you should be able to capture.
Re: Reboot loop
On 06/06/18 22:56, francis.dos.san...@ciudad.com.ar wrote:
> About two days ago I upgraded to the version #65 below, just to see if
> the game unknown-horizons would run smooth. Starting the computer anew
> after evaluating the performance of the game I noticed that the machine
> rebooted automatically. Many lines are printed with some weird '-> 0'
> or '0 -> 1'. It scrolls by too fast to see properly.

Can you plug a null modem cable into the machine's serial port (PCI/ISA one, not USB) and tell the kernel to direct its messages to that while another machine watches the output?

Check your motherboard, some do provide "COM1" via a header which can be used exactly for this purpose.
--
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.
Re: Reboot loop
ddb(4): "ddb is invoked upon a kernel panic when the sysctl(8) ddb.panic is set to 1". I believe this value is default. So, kernel should be dropped into ddb on panic. Does it happen?

What exactly do you see on screen along with uvm_fault? Do you see whole stacktrace?

Check https://www.openbsd.org/ddb.html for "Minimum information for kernel problems" section
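For reference when reading the uvm_fault line reported elsewhere in this thread, here is a small Python sketch that splits it into fields. The field meanings are an assumption based on how the amd64 trap handler appears to format this message (map pointer, faulting address, a literal 0, access type, then the error in hex); the function name `decode_uvm_fault` is made up for the example.

```python
import errno
import re

# Assumed layout: uvm_fault(<map>, <faulting address>, 0, <access>) -> <errno, hex>
UVM_FAULT_RE = re.compile(
    r"uvm_fault\((0x[0-9a-fA-F]+),\s*(0x[0-9a-fA-F]+),\s*(\d+),\s*(\d+)\)"
    r"\s*->\s*([0-9a-fA-F]+)"
)

# Protection bits as commonly defined (read=1, write=2, execute=4).
PROT_BITS = {1: "read", 2: "write", 4: "execute"}


def decode_uvm_fault(line):
    """Split a uvm_fault console line into named fields, or return None."""
    m = UVM_FAULT_RE.search(line)
    if m is None:
        return None
    map_ptr, vaddr, _zero, access, err = m.groups()
    access = int(access)
    err = int(err, 16)  # the error code is printed in hexadecimal
    return {
        "map": map_ptr,
        "vaddr": int(vaddr, 16),
        "access": [name for bit, name in PROT_BITS.items() if access & bit],
        "errno": err,
        "errno_name": errno.errorcode.get(err, "unknown"),
    }


if __name__ == "__main__":
    print(decode_uvm_fault("uvm_fault(0x81db7f68, 0x58, 0, 1) -> e"))
```

Under those assumptions the reported line is a read fault at address 0x58 with error 0xe (14, EFAULT). A faulting address that low often points to a NULL pointer dereference at a small structure offset, but that is only a guess without the trace and ps output that the ddb.html page asks for.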
Re: Reboot loop
https://www.openbsd.org/report.html See "How to create a problem report" step 5
Re: Reboot loop
Theo,

>> uvm_fault(0x81db7f68, 0x58, 0, 1) -> e
> Just that one line? No other lines? I find that hard to believe.

I should've stated that the uvm_fault message line gets repeated ad infinitum. What can I do to get more debug info?

>
> Rebooting one more time resulted in an automatic reboot. Starting the
> computer afresh gave me a wsconsole login. The firmware for vmm and
> radeondrm were applied. Rebooting again, went in a loop, after that:
>
> uvm_fault(0x81db7ea8, 0x58, 0, 1) -> e
>
> Which piece of hardware is failing? Prior to the #65 upgrade I've seen
> the ring 2 error CAFEDEAD message, but no automatic reboots before said
> version.
> What boot options are there to get a working stable system?
>
> OpenBSD 6.3-current (GENERIC.MP) #65: Fri Jun 1 17:44:06 MDT 2018
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 7966674944 (7597MB)
> avail mem = 7652007936 (7297MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe95c0 (17 entries)
> bios0: vendor American Megatrends Inc.
version "P1.10" date 04/01/2014 > bios0: ASRock QC5000-ITX/WiFi > acpi0 at bios0: rev 2 > acpi0: sleep states S0 S3 S4 S5 > acpi0: tables DSDT FACP APIC FPDT MCFG HPET AAFT SSDT SSDT CRAT SSDT SSDT > SSDT SSDT > acpi0: wakeup devices GFX_(S4) GPP1(S4) GPP2(S4) GPP3(S4) SBAZ(S4) PS2K(S4) > UAR1(S4) OHC1(S4) EHC1(S4) OHC2(S4) EHC2(S4) OHC3(S4) EHC3(S4) XHC0(S4) > acpitimer0 at acpi0: 3579545 Hz, 32 bits > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat > cpu0 at mainbus0: apid 0 (boot processor) > cpu0: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.37 MHz > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1 > cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line > 16-way L2 cache > cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative > cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative > cpu0: smt 0, core 0, package 0 > mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges > cpu0: apic clock running at 99MHz > cpu0: mwait min=64, max=64, IBE > cpu1 at mainbus0: apid 1 (application processor) > cpu1: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.20 MHz > cpu1: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1 > cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line > 16-way L2 cache > cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative > cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative > cpu1: 
smt 0, core 1, package 0 > cpu2 at mainbus0: apid 2 (application processor) > cpu2: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.20 MHz > cpu2: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1 > cpu2: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line > 16-way L2 cache > cpu2: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative > cpu2: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative > cpu2: smt 0, core 2, package 0 > cpu3 at mainbus0: apid 3 (application processor) > cpu3: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.20 MHz > cpu3: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1 > cpu3: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line > 16-way L2 cache > cpu3: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative > cpu3: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative > cpu3: smt 0, core 3, package 0 > ioapic0 at mainbus0: apid 5 pa 0xfec0, version 21, 24 pins > ioapic1 at mainbus0: apid 6 pa 0xfec01000, version 21, 32 pins > acpimcfg0 at acpi
Re: Reboot loop
francis.dos.san...@ciudad.com.ar wrote:
> Hello,
>
> My apologies if this should've gone to bugs@. There are 3 dmesg.boot
> outputs within this text. The last successful boot of version #65 and
> two outputs of #82 (xenodm enabled and disabled).
>
> About two days ago I upgraded to the version #65 below, just to see if
> the game unknown-horizons would run smooth. Starting the computer anew
> after evaluating the performance of the game I noticed that the machine
> rebooted automatically. Many lines are printed with some weird '-> 0'
> or '0 -> 1'. It scrolls by too fast to see properly.
>
> I cannot tell whether the following is relevant. After shutting down the
> game 2bwm looked like it crashed and switching to a different tty seems
> to weirdly copy tty5 (where xenodm starts). Say switching to tty0
> copies tty5 then switching to tty6 (I didn't know this one existed) I'm
> presented with a wsconsole. Rebooting the machine resulted in it
> starting over automatically. But luckily the second time around it
> proceeded to a xenodm login screen.
>
> Today the system has become unusable. I upgraded to a newer #82 to see
> if my problem would go away, but no luck. It's stuck in an endless
> reboot loop. Commenting out xenodm_flags gave me this result.
>
> uvm_fault(0x81db7f68, 0x58, 0, 1) -> e

Just that one line? No other lines? I find that hard to believe.

>
> Rebooting one more time resulted in an automatic reboot. Starting the
> computer afresh gave me a wsconsole login. The firmware for vmm and
> radeondrm were applied. Rebooting again, went in a loop, after that:
>
> uvm_fault(0x81db7ea8, 0x58, 0, 1) -> e
>
> Which piece of hardware is failing? Prior to the #65 upgrade I've seen
> the ring 2 error CAFEDEAD message, but no automatic reboots before said
> version.
> What boot options are there to get a working stable system?
> > OpenBSD 6.3-current (GENERIC.MP) #65: Fri Jun 1 17:44:06 MDT 2018 > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > real mem = 7966674944 (7597MB) > avail mem = 7652007936 (7297MB) > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root > bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe95c0 (17 entries) > bios0: vendor American Megatrends Inc. version "P1.10" date 04/01/2014 > bios0: ASRock QC5000-ITX/WiFi > acpi0 at bios0: rev 2 > acpi0: sleep states S0 S3 S4 S5 > acpi0: tables DSDT FACP APIC FPDT MCFG HPET AAFT SSDT SSDT CRAT SSDT SSDT > SSDT SSDT > acpi0: wakeup devices GFX_(S4) GPP1(S4) GPP2(S4) GPP3(S4) SBAZ(S4) PS2K(S4) > UAR1(S4) OHC1(S4) EHC1(S4) OHC2(S4) EHC2(S4) OHC3(S4) EHC3(S4) XHC0(S4) > acpitimer0 at acpi0: 3579545 Hz, 32 bits > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat > cpu0 at mainbus0: apid 0 (boot processor) > cpu0: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.37 MHz > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1 > cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line > 16-way L2 cache > cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative > cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative > cpu0: smt 0, core 0, package 0 > mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges > cpu0: apic clock running at 99MHz > cpu0: mwait min=64, max=64, IBE > cpu1 at mainbus0: apid 1 (application processor) > cpu1: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.20 MHz > cpu1: > 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1 > cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line > 16-way L2 cache > cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative > cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative > cpu1: smt 0, core 1, package 0 > cpu2 at mainbus0: apid 2 (application processor) > cpu2: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.20 MHz > cpu2: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BM
Re: reboot loop on -current, one machine of several
On 11/13/17 14:24, Gregory Edigarov wrote:
...
>> scsibus1 at ahci0: 32 targets
>> -sd0 at scsibus1 targ 2 lun 0: SCSI3 0/direct
>> fixed naa.50025388400562d4
>> +sd0 at scsibus1 targ 0 lun 0: SCSI3 0/direct
>> fixed naa.50025388400563fe
>> sd0: 976762MB, 512 bytes/sector, 2000409264 sectors, thin
>> -sd1 at scsibus1 targ 3 lun 0: SCSI3 0/direct
>> fixed naa.5002538c70007b02
>> -sd1: 1953514MB, 512 bytes/sector, 4000797360 sectors, thin
>> +cd0 at scsibus1 targ 1 lun 0: ATAPI 5/cdrom
>> removable
>> ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x04: apic 0
>> int 19
>> iic0 at ichiic0
> My suspicion goes to SSDs. one of them have somehow become bad.

I'm not able to say "no" to that. Been kinda leaning that direction, myself. These have been troublesome little beasts. Got several of the Samsung 850 series in this project, and never had so many problems with storage since I tried some off-brand (JTI?) disks around 20 years ago. Yes, I know, lots of people think these are the best around (Samsung, not JTI). *shrug*

However, I did do a dd read over the first few GB (entire 'a' partition, partition table, mbr, etc.) of the disk to see if there were any read errors -- none. Whatever that's worth.

If all else fails, I'll be moving the function to spare hw and totally rebuild this machine and see if it fixes it.

Nick.
Re: reboot loop on -current, one machine of several
On 12.11.17 21:59, Nick Holland wrote: On 11/12/17 14:13, Otto Moerbeek wrote: On Sun, Nov 12, 2017 at 01:28:39PM -0500, Nick Holland wrote: Help. I was upgrading a few very similar machines to -current today. ONE of the three decided to be unpleasant. The thing has a serial console, and but it is about 370km from me. :-/ Upgrade from Sep 9 current to today's current via bsd.rd, just like the other two. Upon reboot, it does this (from /boot) : booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full (0x9d304+65536) And then reboots the system, as if from power-down/power-up. (already something I haven't seen before) Reboot from "bsd.rd" and "bsd.sp", same results. reboot from "obsd" (Sept 9), same results. Not a kernel problem, it seems. About this point, I'm starting to think how the serial console has let me down. I remember how to bring up a DRAC remote CD image via ssh tunnels to the drac and how to run java in a windows browser, and reboot off the remote CD image, do another upgrade, all goes fine (again), but upon reboot, same results... "heap full" and reboot. Boot from remote CD, at the boot> prompt, enter "boot hd0a:/bsd", and it boots Just Fine from the local hard disk (only boot pulled from the remote CD). Boot loader! Reinstalled boot: # installboot -v sd0 Using / as root installing bootstrap on /dev/rsd0c using first-stage /usr/mdec/biosboot, second-stage /usr/mdec/boot copying /usr/mdec/boot to /boot /boot is 3 blocks x 32768 bytes fs block shift 3; part offset 64; inode block 24, offset 2088 master boot record (MBR) at sector 0 partition 3: type 0xA6 offset 64 size 2000397671 /usr/mdec/biosboot will be written at sector 64 good, right? Reboot off local hard disk, boom. problem is still there. maybe not the boot loader. :-/ Verified /boot on trouble system and good system are the same. I'm not going to cry "bug", since there are two nearly identical systems working just fine. But I can't think of what I did wrong or what to do to fix it. 
Suggestions? You are hitting -DHEAP_LIMIT=0xA in /boot. The code is in libsa/alloa.c No idea why. But something in that system is different. You do have one weird line in your disklabel output: a filesystem mounted on swap? that's an mfs. This application has one directory which has a HUGE benefit to an MFS for tmp files. Though the reboot happens long before the mfs is created. scsibus1 at ahci0: 32 targets -sd0 at scsibus1 targ 2 lun 0: SCSI3 0/direct fixed naa.50025388400562d4 +sd0 at scsibus1 targ 0 lun 0: SCSI3 0/direct fixed naa.50025388400563fe sd0: 976762MB, 512 bytes/sector, 2000409264 sectors, thin -sd1 at scsibus1 targ 3 lun 0: SCSI3 0/direct fixed naa.5002538c70007b02 -sd1: 1953514MB, 512 bytes/sector, 4000797360 sectors, thin +cd0 at scsibus1 targ 1 lun 0: ATAPI 5/cdrom removable ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x04: apic 0 int 19 iic0 at ichiic0 My suspicion goes to SSDs. one of them have somehow become bad. Nick.
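The numbers in the "heap full" boot message quoted in this thread can be checked with a few lines of Python. This is a rough sketch under two assumptions: that the two values in parentheses are the loader's current heap top and the size of the failed allocation, and that the heap limit is 0xA0000 (the 640K conventional-memory boundary); the thread itself only shows the limit as "-DHEAP_LIMIT=0xA". The names `parse_heap_full` and `ASSUMED_HEAP_LIMIT` are invented for the example.

```python
import re

# Assumption: 640K conventional-memory boundary; not confirmed by the thread.
ASSUMED_HEAP_LIMIT = 0xA0000

# Matches e.g. "booting hd0a:/bsd: 1+2+3 [123heap full (0x9d304+65536)".
BOOT_RE = re.compile(
    r"booting\s+(\S+):\s*([0-9+]+)\s*\[(\d+)heap full\s*"
    r"\((0x[0-9a-fA-F]+)\+(\d+)\)"
)


def parse_heap_full(text):
    """Extract sizes from a /boot 'heap full' message, or return None."""
    m = BOOT_RE.search(text)
    if m is None:
        return None
    segments = [int(n) for n in m.group(2).split("+")]
    heap_top = int(m.group(4), 16)   # assumed: current top of the loader heap
    request = int(m.group(5))        # assumed: size of the failed allocation
    return {
        "kernel": m.group(1),
        "loaded_bytes": sum(segments),  # sum of the segment sizes printed so far
        "heap_top": heap_top,
        "request": request,
        # How far past the assumed limit the request would have ended.
        "overshoot": heap_top + request - ASSUMED_HEAP_LIMIT,
    }


if __name__ == "__main__":
    msg = ("booting hd0a:/bsd: 8484712+2429968+244048+0+667648 "
           "[636809heap full (0x9d304+65536)")
    print(parse_heap_full(msg))
```

Under those assumptions the failed request would end 54020 bytes past the limit. That matches the loader running out of heap, but it does not by itself explain why only this one machine does.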
Re: reboot loop on -current, one machine of several
> booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full > (0x9d304+65536) Maybe a corrupted or too big /bsd file? I am curious about the cause.
Re: reboot loop on -current, one machine of several
On 11/12/17 14:13, Otto Moerbeek wrote: > On Sun, Nov 12, 2017 at 01:28:39PM -0500, Nick Holland wrote: > >> Help. >> >> I was upgrading a few very similar machines to -current today. >> ONE of the three decided to be unpleasant. The thing has a >> serial console, and but it is about 370km from me. :-/ >> >> Upgrade from Sep 9 current to today's current via bsd.rd, just >> like the other two. >> >> Upon reboot, it does this (from /boot) : >> >> booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full >> (0x9d304+65536) >> >> And then reboots the system, as if from power-down/power-up. >> (already something I haven't seen before) >> >> Reboot from "bsd.rd" and "bsd.sp", same results. reboot from "obsd" >> (Sept 9), same results. Not a kernel problem, it seems. About this >> point, I'm starting to think how the serial console has let me down. >> >> I remember how to bring up a DRAC remote CD image via ssh tunnels >> to the drac and how to run java in a windows browser, and >> reboot off the remote CD image, do another upgrade, all goes fine >> (again), but upon reboot, same results... "heap full" and reboot. >> >> Boot from remote CD, at the boot> prompt, enter "boot hd0a:/bsd", >> and it boots Just Fine from the local hard disk (only boot pulled >> from the remote CD). Boot loader! Reinstalled boot: >> >> # installboot -v sd0 >> Using / as root >> installing bootstrap on /dev/rsd0c >> using first-stage /usr/mdec/biosboot, second-stage /usr/mdec/boot >> copying /usr/mdec/boot to /boot >> /boot is 3 blocks x 32768 bytes >> fs block shift 3; part offset 64; inode block 24, offset 2088 >> master boot record (MBR) at sector 0 >> partition 3: type 0xA6 offset 64 size 2000397671 >> /usr/mdec/biosboot will be written at sector 64 >> >> good, right? >> >> Reboot off local hard disk, boom. problem is still there. maybe >> not the boot loader. :-/ >> >> Verified /boot on trouble system and good system are the same. 
>> >> I'm not going to cry "bug", since there are two nearly identical >> systems working just fine. But I can't think of what I did wrong >> or what to do to fix it. >> >> Suggestions? > > You are hitting -DHEAP_LIMIT=0xA in /boot. The code is in libsa/alloa.c > > No idea why. But something in that system is different. > > You do have one weird line in your disklabel output: a filesystem > mounted on swap? that's an mfs. This application has one directory which has a HUGE benefit to an MFS for tmp files. Though the reboot happens long before the mfs is created. $ more /etc/fstab cde728ba2c9bbe7.b none swap sw ccde728ba2c9bbe7.a / ffs rw,noatime 1 1 ccde728ba2c9bbe7.h /home ffs rw,noatime,nodev,nosuid 1 2 ccde728ba2c9bbe7.e /tmp ffs rw,noatime,nodev,nosuid 1 2 ccde728ba2c9bbe7.d /usr ffs rw,noatime,nodev 1 2 ccde728ba2c9bbe7.f /var ffs rw,noatime,nodev,nosuid 1 2 ccde728ba2c9bbe7.g /repo ffs rw,noatime,nodev 1 2 ccde728ba2c9bbe7.i /repo/anoncvs/dev ffs rw,noatime,nosuid 1 2 /dev/sd0b /repo/anoncvs/tmp mfs rw,nodev,nosuid,-m=1,-s=3072000,-i=2048 0 0 > Can you boot into single user mode? nope. Considering how fast the reboot happens, I wouldn't have expected it to, unless something is very different very early in the boot process. This is what happened: On the console: Using drive 0, partion 3. Loading... probing: pc0 com0 com1 mem[631K 3038M 2M 68K 72K 176k 64K 13312M a20=on] disk: fd0 hd0+ >> OpenBSD/amd64 BOOT 3.33 switching console to com0 and then on the serial console: >> OpenBSD/amd64 BOOT 3.33 boot> boot -s booting hd0a:/bsd: 8484304+2429960+244080+0+667648 [643739heap full (0x9d4fc+65536) (boom. reboot) here's a dmesg diff between the "good" and "bad" machines... 
$ diff -u dmesg.good dmesg.bad --- dmesg.good Sun Nov 12 14:51:30 2017 +++ dmesg.bad Sun Nov 12 14:51:21 2017 @@ -1,7 +1,7 @@ OpenBSD 6.2-current (GENERIC.MP) #203: Sat Nov 11 19:01:19 MST 2017 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 17131339776 (16337MB) -avail mem = 16605302784 (15836MB) +avail mem = 16605294592 (15836MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root @@ -16,46 +16,46 @@ acpihpet0 at acpi0: 14318179 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) -cpu0: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3193.18 MHz +cpu0: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3193.22 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT cpu0: 256KB 64b/line 8-way L2 cache -acpihpet0: recalibrated T
Re: reboot loop on -current, one machine of several
On Sun, Nov 12, 2017 at 01:28:39PM -0500, Nick Holland wrote: > Help. > > I was upgrading a few very similar machines to -current today. > ONE of the three decided to be unpleasant. The thing has a > serial console, and but it is about 370km from me. :-/ > > Upgrade from Sep 9 current to today's current via bsd.rd, just > like the other two. > > Upon reboot, it does this (from /boot) : > > booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full > (0x9d304+65536) > > And then reboots the system, as if from power-down/power-up. > (already something I haven't seen before) > > Reboot from "bsd.rd" and "bsd.sp", same results. reboot from "obsd" > (Sept 9), same results. Not a kernel problem, it seems. About this > point, I'm starting to think how the serial console has let me down. > > I remember how to bring up a DRAC remote CD image via ssh tunnels > to the drac and how to run java in a windows browser, and > reboot off the remote CD image, do another upgrade, all goes fine > (again), but upon reboot, same results... "heap full" and reboot. > > Boot from remote CD, at the boot> prompt, enter "boot hd0a:/bsd", > and it boots Just Fine from the local hard disk (only boot pulled > from the remote CD). Boot loader! Reinstalled boot: > > # installboot -v sd0 > Using / as root > installing bootstrap on /dev/rsd0c > using first-stage /usr/mdec/biosboot, second-stage /usr/mdec/boot > copying /usr/mdec/boot to /boot > /boot is 3 blocks x 32768 bytes > fs block shift 3; part offset 64; inode block 24, offset 2088 > master boot record (MBR) at sector 0 > partition 3: type 0xA6 offset 64 size 2000397671 > /usr/mdec/biosboot will be written at sector 64 > > good, right? > > Reboot off local hard disk, boom. problem is still there. maybe > not the boot loader. :-/ > > Verified /boot on trouble system and good system are the same. > > I'm not going to cry "bug", since there are two nearly identical > systems working just fine. 
But I can't think of what I did wrong > or what to do to fix it. > > Suggestions? You are hitting -DHEAP_LIMIT=0xA in /boot. The code is in libsa/alloa.c No idea why. But something in that system is different. You do have one weird line in your disklabel output: a filesystem mounted on swap? Can you boot into single user mode? -Otto > > Nick. > > > $ dmesg > OpenBSD 6.2-current (GENERIC.MP) #203: Sat Nov 11 19:01:19 MST 2017 > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > real mem = 17131339776 (16337MB) > avail mem = 16605306880 (15836MB) > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root > bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe6680 (57 entries) > bios0: vendor Dell Inc. version "2.8.0" date 06/24/2014 > bios0: Dell Inc. PowerEdge R210 II > acpi0 at bios0: rev 2 > acpi0: sleep states S0 S4 S5 > acpi0: tables DSDT FACP SPMI ASF! HPET APIC MCFG BOOT SSDT ASPT SSDT SSDT > SPCR HEST ERST BERT EINJ > acpi0: wakeup devices P0P1(S4) GLAN(S0) EHC1(S4) EHC2(S4) XHC_(S4) PXSX(S4) > RP01(S5) PXSX(S4) RP02(S5) PXSX(S4) RP03(S5) PXSX(S4) RP04(S5) PXSX(S4) > RP05(S5) PXSX(S4) [...] 
> acpitimer0 at acpi0: 3579545 Hz, 24 bits > acpihpet0 at acpi0: 14318179 Hz > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat > cpu0 at mainbus0: apid 0 (boot processor) > cpu0: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3193.24 MHz > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT > cpu0: 256KB 64b/line 8-way L2 cache > acpihpet0: recalibrated TSC frequency 3192748207 Hz > cpu0: smt 0, core 0, package 0 > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges > cpu0: apic clock running at 99MHz > cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE > cpu1 at mainbus0: apid 1 (application processor) > cpu1: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz > cpu1: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT > cpu1: 256KB 64b/line 8-way L2 cache > cpu1: smt 1, core 0, package 0 > cpu2 at mainbus0: apid 2 (application processor) > cpu2: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz > cpu2: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT > cpu2: 256KB 64b/line 8-way L2 cache > cpu2: smt 0, core 1, package 0 > cpu3 at mainbus0: apid 3 (application processor) > cpu3: Intel(R) Xeon(R) CP