Re: Reboot loop

2018-06-07 Thread Stuart Henderson
Basically yes. It's also possible to use puc(4) in some cases, but you'll
need to find the memory address yourself with pcidump and set it in the
bootloader with "machine comaddr". I think for some puc devices you
might also have a hard time setting the port to the baud rate that you want.



--
 Sent from a phone, apologies for poor formatting.

On 8 June 2018 02:36:28 IL Ka  wrote:

>> OpenBSD doesn't use ACPI to find an isa UART, it only looks in the fixed
>> locations compiled in to the kernel.
>
> Ok, I see that  "com.c" does it by reading register, it even has comment
> "Probe for all known forms of UART"
>
>> For a system console (with access to DDB etc.) you need a "standard" com port.
>
> Do you mean I can use "com", but not "ucom(4)", right?
>
> Thank you,
>
> Ilya.




Re: Reboot loop

2018-06-07 Thread Stuart Longland
On 08/06/18 11:36, IL Ka wrote:
>> For a system console (with access to DDB etc.) you need a "standard" com port.
> Do you mean I can use "com", but not "ucom(4)", right?

Using USB serial would require enumeration of the serial bus then
selection of the appropriate protocol (there's at least a dozen
competing standards for USB serial) based on the VID/PID.

Not trivial to do in the early boot phase.  I don't know of many
operating systems that can do this.
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.



Re: Reboot loop

2018-06-07 Thread IL Ka
> OpenBSD doesn't use ACPI to find an isa UART, it only looks in the fixed
> locations compiled in to the kernel.

Ok, I see that "com.c" does it by reading a register; it even has the comment
"Probe for all known forms of UART".


> For a system console (with access to DDB etc.) you need a "standard" com port.

Do you mean I can use "com", but not "ucom(4)"?

Thank you,

Ilya.


Re: Reboot loop

2018-06-07 Thread Stuart Henderson
On 2018-06-06, IL Ka  wrote:
> There is
>> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> in your dmesg.
>
> So, I assume your box reports com port somehow (via ACPI probably)

OpenBSD doesn't use ACPI to find an isa UART, it only looks in the fixed
locations compiled in to the kernel. Seeing ns16550a in the output
suggests that it did actually find one.
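
A quick way to check a saved dmesg for lines like the one quoted above is sketched below in Python. It assumes the line layout matches the quoted com0 line exactly and that the input is raw dmesg text without mail quote markers. The name find_uarts is made up for this illustration and is not an OpenBSD tool.

```python
import re

# Pattern modelled on the single com0 line quoted in this thread; other
# attachment formats (puc, acpi, etc.) are deliberately not handled.
COM_RE = re.compile(
    r"^(com\d+) at (\S+) port (0x[0-9a-f]+)/\d+ irq (\d+): (\w+)",
    re.MULTILINE,
)


def find_uarts(dmesg_text):
    """Return one dict per com(4) attachment line found in dmesg_text."""
    return [
        {"dev": dev, "bus": bus, "port": int(port, 16), "irq": int(irq), "chip": chip}
        for dev, bus, port, irq, chip in COM_RE.findall(dmesg_text)
    ]


uarts = find_uarts("com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo\n")
for uart in uarts:
    print(uart["dev"], hex(uart["port"]), uart["chip"])
```

An empty result only means no line of this shape was present in the text; it says nothing about whether a header exists on the board.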

> Some boxes may have comport built into chipset but no external cable for it.
> I have one, I bought cable separately.

It's also possible that the UART is present (usually as part of a superio chip)
but it isn't even brought out to a header on the board.

> Another option is to use UART that connects to USB

For a system console (with access to DDB etc.) you need a "standard" com port.
Standard DOS-compatible ones at the usual com1/com2 addresses are easy. PCI/PCIe
*might* be possible in some cases but is awkward to set up. USB is not possible.




Re: Reboot loop

2018-06-07 Thread francis . dos . santos
IL Ka,

Thanks for pointing it out. It will take a few days
before I can capture the output through the com
port.

Until then folks,

- Original message -
From: IL Ka
To: francis dos santos
CC: OpenBSD General Misc
Sent: Wed, 06 Jun 2018 19:32:32 -0300 (ART)
Subject: Re: Reboot loop

There is
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
in your dmesg.

So, I assume your box reports com port somehow (via ACPI probably)
Some boxes may have comport built into chipset but no external cable for it.
I have one, I bought cable separately.

Another option is to use UART that connects to USB



Re: Reboot loop

2018-06-07 Thread francis . dos . santos
Oops, spoke too soon. I'll have to open the box or read the
manual to see if there is a com1 option through a header.

- Original message -
From: IL Ka
To: francis dos santos
CC: OpenBSD General Misc
Sent: Wed, 06 Jun 2018 18:45:07 -0300 (ART)
Subject: Re: Reboot loop

Ok, then try to follow Stuart Longland's advice: use serial console.
Connect your PC using null-modem cable to another pc, and in boot(8) prompt
type:

boot> set tty com0

On another PC run cu(1) or minicom or screen (or for Windows you may use
PuTTY), connect to OpenBSD and you will
see all your console output which you should be able to capture.



Re: Reboot loop

2018-06-07 Thread francis . dos . santos
There is no com port on this machine.

Thanks for the assistance.

- Original message -
From: IL Ka
To: francis dos santos
CC: OpenBSD General Misc
Sent: Wed, 06 Jun 2018 18:45:07 -0300 (ART)
Subject: Re: Reboot loop

Ok, then try to follow Stuart Longland's advice: use serial console.
Connect your PC using null-modem cable to another pc, and in boot(8) prompt
type:

boot> set tty com0

On another PC run cu(1) or minicom or screen (or for Windows you may use
PuTTY), connect to OpenBSD and you will
see all your console output which you should be able to capture.



Re: Reboot loop

2018-06-07 Thread francis . dos . santos
I'll be more specific. I was talking about a 'loop' where the system
reboots automatically and there is also a tighter loop that does not
cause the system to reboot automatically. The inescapable loop is the
tighter loop which causes the boot process to display uvm_fault(...)
indefinitely. Needless to say, if something gets displayed before
entering the tighter loop, I won't be able to see it.

I do not see a kernel panic.
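
Once the console output can be captured to a file at all, the endlessly repeated line tends to bury whatever was printed just before it. A minimal Python sketch of one way to surface those earlier lines follows. It assumes the capture is plain text, one console line per entry. The helper name collapse_repeats is hypothetical, and apart from the uvm_fault line the sample data is a placeholder, not real output from this machine.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Return (line, count) pairs, one per run of identical consecutive lines."""
    stripped = (line.rstrip("\n") for line in lines)
    return [(text, sum(1 for _ in run)) for text, run in groupby(stripped)]


# Placeholder capture: only the uvm_fault line is taken from the thread.
capture = [
    "(earlier console line, placeholder)\n",
    "uvm_fault(0x81db7f68, 0x58, 0, 1) -> e\n",
    "uvm_fault(0x81db7f68, 0x58, 0, 1) -> e\n",
    "uvm_fault(0x81db7f68, 0x58, 0, 1) -> e\n",
]

for text, count in collapse_repeats(capture):
    print(f"{count:6d}x {text}")
```

With a real capture, the lines with a count of 1 just ahead of the long uvm_fault run are the ones worth including in a report.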

- Original message -
From: IL Ka
To: francis dos santos
CC: OpenBSD General Misc
Sent: Wed, 06 Jun 2018 17:29:55 -0300 (ART)
Subject: Re: Reboot loop

ddb(4):
"ddb is invoked upon a kernel panic when the sysctl(8) ddb.panic is set to
1".

I believe this value is the default, so the kernel should drop into ddb on
panic.
Does it happen?

What exactly do you see on screen along with uvm_fault?

Do you see whole stacktrace?

Check
https://www.openbsd.org/ddb.html
for "Minimum information for kernel problems" section



Re: Reboot loop

2018-06-06 Thread IL Ka
There is
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
in your dmesg.

So, I assume your box reports com port somehow (via ACPI probably)
Some boxes may have comport built into chipset but no external cable for it.
I have one, I bought cable separately.

Another option is to use UART that connects to USB


Re: Reboot loop

2018-06-06 Thread francis . dos . santos
IL Ka,

The problem manifests itself before sysctl.conf is read. CTRL-ALT-ESC
won't send a break. Using boot -d will drop me in ddb way too early.
This machine doesn't have any means for serial output, no com port.
No core is dumped. Can the system panic and still go on in an endless
loop?

Once in a blue moon the system actually boots. The inescapable loop
happens just before it switches to a higher resolution and displays
radeondrm0: ... x ... 32bpp (the actual resolution is irrelevant).
After detection of the ring 2 test failure.

- Original message -
From: IL Ka
To: francis dos santos
CC: OpenBSD General Misc
Sent: Wed, 06 Jun 2018 16:01:24 -0300 (ART)
Subject: Re: Reboot loop

https://www.openbsd.org/report.html
See "How to create a problem report" step 5



Re: Reboot loop

2018-06-06 Thread IL Ka
Ok, then try to follow Stuart Longland's advice: use a serial console.
Connect your PC to another PC with a null-modem cable, and at the boot(8)
prompt type:

boot> set tty com0

On the other PC run cu(1), minicom or screen (on Windows you may use
PuTTY), connect to the OpenBSD machine, and you will see all your console
output, which you should then be able to capture.


Re: Reboot loop

2018-06-06 Thread Stuart Longland
On 06/06/18 22:56, francis.dos.san...@ciudad.com.ar wrote:
> About two days ago I upgraded to the version #65 below, just to see if
> the game unknown-horizons would run smooth. Starting the computer anew
> after evaluating the performance of the game I noticed that the machine
> rebooted automatically. Many lines are printed with some weird '-> 0' 
> or '0 -> 1'. It scrolls by too fast to see properly.

Can you plug a null modem cable into the machine's serial port (PCI/ISA
one, not USB) and tell the kernel to direct its messages to that while
another machine watches the output?

Check your motherboard, some do provide "COM1" via a header which can be
used exactly for this purpose.
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.



Re: Reboot loop

2018-06-06 Thread IL Ka
ddb(4):
"ddb is invoked upon a kernel panic when the sysctl(8) ddb.panic is set to
1".

I believe this value is the default, so the kernel should drop into ddb on
panic.
Does it happen?

What exactly do you see on screen along with uvm_fault?

Do you see whole stacktrace?

Check
https://www.openbsd.org/ddb.html
for "Minimum information for kernel problems" section


Re: Reboot loop

2018-06-06 Thread IL Ka
https://www.openbsd.org/report.html
See "How to create a problem report" step 5


Re: Reboot loop

2018-06-06 Thread francis . dos . santos
Theo,

>> uvm_fault(0x81db7f68, 0x58, 0, 1) -> e

> Just that one line?  No other lines?  I find that hard to believe.

I should've stated that the uvm_fault message line gets repeated ad
infinitum. What can I do to get more debug info?
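
For anyone trying to read the line itself, here is a minimal Python sketch that splits it into fields. The field meanings are an assumption about the kernel's print format (map pointer, faulting address, a constant, access type, then the error code in hex) and are not confirmed anywhere in this thread. The function name decode_uvm_fault is hypothetical, and the errno name is looked up on whatever host runs the script rather than on OpenBSD.

```python
import errno
import re

# Assumed layout: uvm_fault(map, addr, 0, access_type) -> error_in_hex
UVM_FAULT_RE = re.compile(
    r"uvm_fault\((0x[0-9a-f]+), (0x[0-9a-f]+), (\d+), (\d+)\) -> ([0-9a-f]+)"
)


def decode_uvm_fault(line):
    """Split a uvm_fault console line into named fields, or return None."""
    match = UVM_FAULT_RE.search(line)
    if match is None:
        return None
    map_ptr, addr, _unused, access, err = match.groups()
    err_no = int(err, 16)
    return {
        "map": map_ptr,
        "fault_addr": int(addr, 16),
        "access_type": int(access),
        "errno": err_no,
        # Name as known to the host running this script.
        "errno_name": errno.errorcode.get(err_no, "unknown"),
    }


info = decode_uvm_fault("uvm_fault(0x81db7f68, 0x58, 0, 1) -> e")
print(info)
```

Under these assumptions the error is 14 (EFAULT) and the faulting address is 0x58. An address that small would be consistent with a NULL pointer being dereferenced at a structure offset, though the line alone does not say where.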

> 
> Rebooting one more time resulted in an automatic reboot. Starting the
> computer afresh gave me a wsconsole login. The firmware for vmm and
> radeondrm were applied. Rebooting again, went in a loop, after that:
> 
> uvm_fault(0x81db7ea8, 0x58, 0, 1) -> e
> 
> Which piece of hardware is failing? Prior to the #65 upgrade I've seen
> the ring 2 error CAFEDEAD message, but no automatic reboots before said
> version.
> What boot options are there to get a working stable system?
> 
> OpenBSD 6.3-current (GENERIC.MP) #65: Fri Jun  1 17:44:06 MDT 2018
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 7966674944 (7597MB)
> avail mem = 7652007936 (7297MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe95c0 (17 entries)
> bios0: vendor American Megatrends Inc. version "P1.10" date 04/01/2014
> bios0: ASRock QC5000-ITX/WiFi
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT MCFG HPET AAFT SSDT SSDT CRAT SSDT SSDT 
> SSDT SSDT
> acpi0: wakeup devices GFX_(S4) GPP1(S4) GPP2(S4) GPP3(S4) SBAZ(S4) PS2K(S4) 
> UAR1(S4) OHC1(S4) EHC1(S4) OHC2(S4) EHC2(S4) OHC3(S4) EHC3(S4) XHC0(S4)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.37 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1
> cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 
> 16-way L2 cache
> cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.20 MHz
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1
> cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 
> 16-way L2 cache
> cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.20 MHz
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1
> cpu2: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 
> 16-way L2 cache
> cpu2: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu2: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: AMD A4-5000 APU with Radeon(TM) HD Graphics, 1497.20 MHz
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1
> cpu3: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 
> 16-way L2 cache
> cpu3: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu3: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 5 pa 0xfec0, version 21, 24 pins
> ioapic1 at mainbus0: apid 6 pa 0xfec01000, version 21, 32 pins
> acpimcfg0 at acpi

Re: Reboot loop

2018-06-06 Thread Theo de Raadt
francis.dos.san...@ciudad.com.ar wrote:

> Hello,
> 
> My apologies if this should've gone to bugs@. There are 3 dmesg.boot
> outputs within this text. The last successful boot of version #65 and
> two outputs of #82 (xenodm enabled and disabled).
> 
> About two days ago I upgraded to the version #65 below, just to see if
> the game unknown-horizons would run smooth. Starting the computer anew
> after evaluating the performance of the game I noticed that the machine
> rebooted automatically. Many lines are printed with some weird '-> 0' 
> or '0 -> 1'. It scrolls by too fast to see properly.
> 
> I cannot tell wether the following is relevant. After shutting down the
> game 2bwm looked like it crashed and switching to a different tty seems
> to weirdly copy tty5 (where xenodm starts). Say switching to tty0
> copies tty5 then switching to tty6 (I didn't know this one existed) I'm
> presented with a wsconsole. Rebooting the machine resulted in it too
> start over automatically. But luckily the second time around it
> proceeded to a xenodm login screen.
> 
> Today the system has become unusable. I upgraded to a newer #82 to see
> if my problem would go away, but no luck. It's stuck in an endless
> reboot loop. Commenting out xenodm_flags gave me the this result.
> 
> uvm_fault(0x81db7f68, 0x58, 0, 1) -> e

Just that one line?  No other lines?  I find that hard to believe.


Re: reboot loop on -current, one machine of several

2017-11-14 Thread Nick Holland
On 11/13/17 14:24, Gregory Edigarov wrote:
...
>>   scsibus1 at ahci0: 32 targets
>> -sd0 at scsibus1 targ 2 lun 0:  SCSI3 0/direct 
>> fixed naa.50025388400562d4
>> +sd0 at scsibus1 targ 0 lun 0:  SCSI3 0/direct 
>> fixed naa.50025388400563fe
>>   sd0: 976762MB, 512 bytes/sector, 2000409264 sectors, thin
>> -sd1 at scsibus1 targ 3 lun 0:  SCSI3 0/direct 
>> fixed naa.5002538c70007b02
>> -sd1: 1953514MB, 512 bytes/sector, 4000797360 sectors, thin
>> +cd0 at scsibus1 targ 1 lun 0:  ATAPI 5/cdrom 
>> removable
>>   ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x04: apic 0 
>> int 19
>>   iic0 at ichiic0

> My suspicion goes to SSDs. one of them have somehow become bad.

I'm not able to say "no" to that.  Been kinda leaning that direction,
myself.
These have been troublesome little beasts.  Got several of the Samsung
850 series in this project, and never had so many problems with storage
since I tried some off-brand (JTI?) disks around 20 years ago.  Yes, I
know, lots of people think these are the best around (Samsung, not JTI).
 *shrug*

However, I did do a dd read over the first few GB (entire 'a' partition,
partition table, mbr, etc.) of the disk to see if there were any read
errors -- none.  Whatever that's worth.

If all else fails, I'll be moving the function to spare hw and totally
rebuild this machine and see if it fixes it.

Nick.



Re: reboot loop on -current, one machine of several

2017-11-13 Thread Gregory Edigarov



On 12.11.17 21:59, Nick Holland wrote:
> On 11/12/17 14:13, Otto Moerbeek wrote:
>> On Sun, Nov 12, 2017 at 01:28:39PM -0500, Nick Holland wrote:
>>
>>> Help.
>>>
>>> I was upgrading a few very similar machines to -current today.
>>> ONE of the three decided to be unpleasant.  The thing has a
>>> serial console, and but it is about 370km from me. :-/
>>>
>>> Upgrade from Sep 9 current to today's current via bsd.rd, just
>>> like the other two.
>>>
>>> Upon reboot, it does this (from /boot) :
>>>
>>> booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full
>>> (0x9d304+65536)
>>>
>>> And then reboots the system, as if from power-down/power-up.
>>> (already something I haven't seen before)
>>>
>>> Reboot from "bsd.rd" and "bsd.sp", same results.  reboot from "obsd"
>>> (Sept 9), same results.  Not a kernel problem, it seems.  About this
>>> point, I'm starting to think how the serial console has let me down.
>>>
>>> I remember how to bring up a DRAC remote CD image via ssh tunnels
>>> to the drac and how to run java in a windows browser, and
>>> reboot off the remote CD image, do another upgrade, all goes fine
>>> (again), but upon reboot, same results...  "heap full" and reboot.
>>>
>>> Boot from remote CD, at the boot> prompt, enter "boot hd0a:/bsd",
>>> and it boots Just Fine from the local hard disk (only boot pulled
>>> from the remote CD).  Boot loader!  Reinstalled boot:
>>>
>>> # installboot -v sd0
>>> Using / as root
>>> installing bootstrap on /dev/rsd0c
>>> using first-stage /usr/mdec/biosboot, second-stage /usr/mdec/boot
>>> copying /usr/mdec/boot to /boot
>>> /boot is 3 blocks x 32768 bytes
>>> fs block shift 3; part offset 64; inode block 24, offset 2088
>>> master boot record (MBR) at sector 0
>>>  partition 3: type 0xA6 offset 64 size 2000397671
>>> /usr/mdec/biosboot will be written at sector 64
>>>
>>> good, right?
>>>
>>> Reboot off local hard disk, boom.  problem is still there.  maybe
>>> not the boot loader. :-/
>>>
>>> Verified /boot on trouble system and good system are the same.
>>>
>>> I'm not going to cry "bug", since there are two nearly identical
>>> systems working just fine.  But I can't think of what I did wrong
>>> or what to do to fix it.
>>>
>>> Suggestions?
>>
>> You are hitting -DHEAP_LIMIT=0xA in /boot. The code is in libsa/alloc.c
>>
>> No idea why. But something in that system is different.
>>
>> You do have one weird line in your disklabel output: a filesystem
>> mounted on swap?
>
> that's an mfs.  This application has one directory which has a HUGE
> benefit to an MFS for tmp files.  Though the reboot happens long before
> the mfs is created.
>
>
>   scsibus1 at ahci0: 32 targets
> -sd0 at scsibus1 targ 2 lun 0:  SCSI3 0/direct
> fixed naa.50025388400562d4
> +sd0 at scsibus1 targ 0 lun 0:  SCSI3 0/direct
> fixed naa.50025388400563fe
>   sd0: 976762MB, 512 bytes/sector, 2000409264 sectors, thin
> -sd1 at scsibus1 targ 3 lun 0:  SCSI3 0/direct
> fixed naa.5002538c70007b02
> -sd1: 1953514MB, 512 bytes/sector, 4000797360 sectors, thin
> +cd0 at scsibus1 targ 1 lun 0:  ATAPI 5/cdrom
> removable
>   ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x04: apic 0 int
> 19
>   iic0 at ichiic0

My suspicion goes to the SSDs. One of them has somehow gone bad.


> Nick.





Re: reboot loop on -current, one machine of several

2017-11-12 Thread Mihai Popescu
> booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full 
> (0x9d304+65536)

Maybe a corrupted or too big /bsd file?

I am curious about the cause.



Re: reboot loop on -current, one machine of several

2017-11-12 Thread Nick Holland
On 11/12/17 14:13, Otto Moerbeek wrote:
> On Sun, Nov 12, 2017 at 01:28:39PM -0500, Nick Holland wrote:
> 
>> Help.
>> 
>> I was upgrading a few very similar machines to -current today.
>> ONE of the three decided to be unpleasant.  The thing has a
>> serial console, and but it is about 370km from me. :-/
>> 
>> Upgrade from Sep 9 current to today's current via bsd.rd, just
>> like the other two.
>> 
>> Upon reboot, it does this (from /boot) :
>> 
>> booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full 
>> (0x9d304+65536)
>> 
>> And then reboots the system, as if from power-down/power-up.
>> (already something I haven't seen before)
>> 
>> Reboot from "bsd.rd" and "bsd.sp", same results.  reboot from "obsd"
>> (Sept 9), same results.  Not a kernel problem, it seems.  About this
>> point, I'm starting to think how the serial console has let me down.
>> 
>> I remember how to bring up a DRAC remote CD image via ssh tunnels
>> to the drac and how to run java in a windows browser, and
>> reboot off the remote CD image, do another upgrade, all goes fine
>> (again), but upon reboot, same results...  "heap full" and reboot.
>> 
>> Boot from remote CD, at the boot> prompt, enter "boot hd0a:/bsd",
>> and it boots Just Fine from the local hard disk (only boot pulled
>> from the remote CD).  Boot loader!  Reinstalled boot:
>> 
>> # installboot -v sd0
>> Using / as root
>> installing bootstrap on /dev/rsd0c
>> using first-stage /usr/mdec/biosboot, second-stage /usr/mdec/boot
>> copying /usr/mdec/boot to /boot
>> /boot is 3 blocks x 32768 bytes
>> fs block shift 3; part offset 64; inode block 24, offset 2088
>> master boot record (MBR) at sector 0
>> partition 3: type 0xA6 offset 64 size 2000397671
>> /usr/mdec/biosboot will be written at sector 64
>> 
>> good, right?
>> 
>> Reboot off local hard disk, boom.  problem is still there.  maybe
>> not the boot loader. :-/
>> 
>> Verified /boot on trouble system and good system are the same.  
>> 
>> I'm not going to cry "bug", since there are two nearly identical
>> systems working just fine.  But I can't think of what I did wrong
>> or what to do to fix it.
>> 
>> Suggestions?
> 
> You are hitting -DHEAP_LIMIT=0xA in /boot. The code is in libsa/alloc.c
> 
> No idea why. But something in that system is different. 
> 
> You do have one weird line in your disklabel output: a filesystem
> mounted on swap? 

that's an mfs.  This application has one directory which has a HUGE
benefit to an MFS for tmp files.  Though the reboot happens long before
the mfs is created.

$ more /etc/fstab   

cde728ba2c9bbe7.b none swap sw
ccde728ba2c9bbe7.a / ffs rw,noatime 1 1
ccde728ba2c9bbe7.h /home ffs rw,noatime,nodev,nosuid 1 2
ccde728ba2c9bbe7.e /tmp ffs rw,noatime,nodev,nosuid 1 2
ccde728ba2c9bbe7.d /usr ffs rw,noatime,nodev 1 2
ccde728ba2c9bbe7.f /var ffs rw,noatime,nodev,nosuid 1 2
ccde728ba2c9bbe7.g /repo ffs rw,noatime,nodev 1 2
ccde728ba2c9bbe7.i /repo/anoncvs/dev ffs rw,noatime,nosuid 1 2
/dev/sd0b /repo/anoncvs/tmp mfs rw,nodev,nosuid,-m=1,-s=3072000,-i=2048 0 0

> Can you boot into single user mode?

nope.  Considering how fast the reboot happens, I wouldn't have expected
it to, unless something is very different very early in the boot process.
This is what happened:

On the console:
Using drive 0, partion 3.
Loading...
probing: pc0 com0 com1 mem[631K 3038M 2M 68K 72K 176k 64K 13312M a20=on]
disk: fd0 hd0+
>> OpenBSD/amd64 BOOT 3.33
switching console to com0

and then on the serial console:
>> OpenBSD/amd64 BOOT 3.33  
boot> boot -s   
booting hd0a:/bsd: 8484304+2429960+244080+0+667648 [643739heap full 
(0x9d4fc+65536)

(boom. reboot)

here's a dmesg diff between the "good" and "bad" machines...
$ diff -u dmesg.good dmesg.bad   
--- dmesg.good  Sun Nov 12 14:51:30 2017
+++ dmesg.bad   Sun Nov 12 14:51:21 2017
@@ -1,7 +1,7 @@
 OpenBSD 6.2-current (GENERIC.MP) #203: Sat Nov 11 19:01:19 MST 2017
 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
 real mem = 17131339776 (16337MB)
-avail mem = 16605302784 (15836MB)
+avail mem = 16605294592 (15836MB)
 mpath0 at root
 scsibus0 at mpath0: 256 targets
 mainbus0 at root
@@ -16,46 +16,46 @@
 acpihpet0 at acpi0: 14318179 Hz
 acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
-cpu0: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3193.18 MHz
+cpu0: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3193.22 MHz
 cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
 cpu0: 256KB 64b/line 8-way L2 cache
-acpihpet0: recalibrated T

Re: reboot loop on -current, one machine of several

2017-11-12 Thread Otto Moerbeek
On Sun, Nov 12, 2017 at 01:28:39PM -0500, Nick Holland wrote:

> Help.
> 
> I was upgrading a few very similar machines to -current today.
> ONE of the three decided to be unpleasant.  The thing has a
> serial console, and but it is about 370km from me. :-/
> 
> Upgrade from Sep 9 current to today's current via bsd.rd, just
> like the other two.
> 
> Upon reboot, it does this (from /boot) :
> 
> booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full 
> (0x9d304+65536)
> 
> And then reboots the system, as if from power-down/power-up.
> (already something I haven't seen before)
> 
> Reboot from "bsd.rd" and "bsd.sp", same results.  reboot from "obsd"
> (Sept 9), same results.  Not a kernel problem, it seems.  About this
> point, I'm starting to think how the serial console has let me down.
> 
> I remember how to bring up a DRAC remote CD image via ssh tunnels
> to the drac and how to run java in a windows browser, and
> reboot off the remote CD image, do another upgrade, all goes fine
> (again), but upon reboot, same results...  "heap full" and reboot.
> 
> Boot from remote CD, at the boot> prompt, enter "boot hd0a:/bsd",
> and it boots Just Fine from the local hard disk (only boot pulled
> from the remote CD).  Boot loader!  Reinstalled boot:
> 
> # installboot -v sd0
> Using / as root
> installing bootstrap on /dev/rsd0c
> using first-stage /usr/mdec/biosboot, second-stage /usr/mdec/boot
> copying /usr/mdec/boot to /boot
> /boot is 3 blocks x 32768 bytes
> fs block shift 3; part offset 64; inode block 24, offset 2088
> master boot record (MBR) at sector 0
> partition 3: type 0xA6 offset 64 size 2000397671
> /usr/mdec/biosboot will be written at sector 64
> 
> good, right?
> 
> Reboot off local hard disk, boom.  problem is still there.  maybe
> not the boot loader. :-/
> 
> Verified /boot on trouble system and good system are the same.  
> 
> I'm not going to cry "bug", since there are two nearly identical
> systems working just fine.  But I can't think of what I did wrong
> or what to do to fix it.
> 
> Suggestions?

You are hitting -DHEAP_LIMIT=0xA in /boot. The code is in libsa/alloc.c

No idea why. But something in that system is different. 
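
The numbers in the "heap full" message can be checked by hand. The Python sketch below does the arithmetic under two assumptions that are not confirmed in this thread: that the limit is 0xA0000 (the value shown above appears cut short), and that the loader complains once the heap top plus the requested size passes that limit.

```python
# Values copied from the boot message: "heap full (0x9d304+65536)"
heap_top = 0x9D304
request = 65536

# Assumption: the /boot heap limit is 0xA0000; this value is not taken
# from the thread, where the flag is shown only as "0xA".
ASSUMED_HEAP_LIMIT = 0xA0000

new_top = heap_top + request                 # where the heap would end
room_left = ASSUMED_HEAP_LIMIT - heap_top    # bytes free before the request
overshoot = new_top - ASSUMED_HEAP_LIMIT     # bytes past the assumed limit

print(hex(new_top), room_left, overshoot)
```

Under those assumptions only about 11 KB of heap remained when a 64 KB allocation was requested, so the request would overshoot by roughly 53 KB. That does not explain why this one machine uses more heap than its siblings.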

You do have one weird line in your disklabel output: a filesystem
mounted on swap? 

Can you boot into single user mode?

-Otto

> 
> Nick.
> 
> 
> $ dmesg
> OpenBSD 6.2-current (GENERIC.MP) #203: Sat Nov 11 19:01:19 MST 2017
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17131339776 (16337MB)
> avail mem = 16605306880 (15836MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe6680 (57 entries)
> bios0: vendor Dell Inc. version "2.8.0" date 06/24/2014
> bios0: Dell Inc. PowerEdge R210 II
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S4 S5
> acpi0: tables DSDT FACP SPMI ASF! HPET APIC MCFG BOOT SSDT ASPT SSDT SSDT 
> SPCR HEST ERST BERT EINJ
> acpi0: wakeup devices P0P1(S4) GLAN(S0) EHC1(S4) EHC2(S4) XHC_(S4) PXSX(S4) 
> RP01(S5) PXSX(S4) RP02(S5) PXSX(S4) RP03(S5) PXSX(S4) RP04(S5) PXSX(S4) 
> RP05(S5) PXSX(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3193.24 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
> cpu0: 256KB 64b/line 8-way L2 cache
> acpihpet0: recalibrated TSC frequency 3192748207 Hz
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 1, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Xeon(R) CP