Re: debugging a kernel that doesn't start

2022-09-14 Thread Emmanuel Dreyfus
On Mon, Sep 12, 2022 at 10:56:46PM +0200, Edgar Fuß wrote:
> Probably stupid question: I can switch the machine to UEFI. Is it easier 
> to debug things from there than from a BIOS boot?

My experience on a Mac is that it helps a lot. You can printf() during
early boot, until you call UEFI's ExitBootServices. Removing that call
may cause a crash later on, but it will let you printf() a bit
further.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: debugging a kernel that doesn't start

2022-09-13 Thread Edgar Fuß
> I'm trying to run NetBSD on a Dell PowerEdge R6515, and the kernel is being 
> loaded (PXE or USB) but then the machine hangs hard.
I've made a giant step forward: booting the -current install image from a 
USB key /via UEFI/ works.
Maybe it's a bug in the server's CSM.

Thanks for all the helpful comments anyway.


Re: debugging a kernel that doesn't start

2022-09-13 Thread Kengo NAKAHARA

Hi,

On 2022/09/13 4:17, Edgar Fuß wrote:

> I'm trying to run NetBSD on a Dell PowerEdge R6515, and the kernel is being
> loaded (PXE or USB) but then the machine hangs hard.
>
> What's the way to debug a kernel that hangs so early that you can't printf
> or drop into ddb? I guess that's a phenomenon quite common for a new port
> or changes to locore.s (or whatever that's called today), but it's completely
> new to me.
>
> I have virtually no clue about PeCee hardware. At the point the kernel is
> started, are BIOS routines still available?


If you use -current, I think it may hit the following issue.
http://gnats.netbsd.org/54962

Could you try the patch in the PR?


Thanks,

--
//
Internet Initiative Japan Inc.

Device Engineering Section,
Product Division,
Technology Unit

Kengo NAKAHARA 




Re: debugging a kernel that doesn't start

2022-09-13 Thread tlaronde
On Mon, Sep 12, 2022 at 09:17:52PM +0200, Edgar Fuß wrote:
> I'm trying to run NetBSD on a Dell PowerEdge R6515, and the kernel is being 
> loaded (PXE or USB) but then the machine hangs hard.
> 
> What's the way to debug a kernel that hangs so early that you can't printf 
> or drop into ddb? I guess that's a phenomenon quite common for a new port 
> or changes to locore.s (or whatever that's called today), but it's completely 
> new to me.
> 
> I have virtually no clue about PeCee hardware. At the point the kernel is 
> started, are BIOS routines still available?

Start by trying to boot without the KMS. I had the problem of a kernel
not reaching init, on a remote server, without any other access (no
serial, no IPMI). See:

http://notes.kergis.com/netbsd_on_OVH_baremetal.html

-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: debugging a kernel that doesn't start

2022-09-12 Thread Brian Buhrow
Hello.  Another thing to try is to see if you can get to the boot prompt
and boot the kernel with various options, i.e. -a, -c, and possibly -2,
to disable ACPI.  If -c gets you to a driver selection prompt, then you
know the kernel is loaded and ready for you to disable drivers.  If you
don't get that far, then I'd say it's a boot loader issue.  It could be
the way the boot loader interacts with the BIOS, but without more info,
it's hard to know what, exactly, is going wrong.
-Brian



Re: debugging a kernel that doesn't start

2022-09-12 Thread Manuel Bouyer
On Mon, Sep 12, 2022 at 10:04:24PM +0200, Edgar Fuß wrote:
> > If you can set up a serial console, it may make things much easier.
> I do have a serial port on the machine.
> 
> > I almost always use serial consoles on dev machines; I don't remember the
> > details but doing the equivalent of a putchar very early was possible.
> Is the BIOS still available or how does that work?

Basically it just requires an outb to the serial port's TX register
(well, you also need to busy-wait for the transmit to complete,
or characters will be lost). This is simple enough to have it working
early in boot, as early as the consinit() call in init_x86_64().

I'm almost sure I used printf() in init_x86_64() with a serial console
(but after the consinit() call, of course).
I definitely used it in pmap_bootstrap().
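
For illustration only, here is a minimal sketch of that "outb plus
busy-wait" idea. It is not the code NetBSD's com(4) console uses. The
register offsets and the THRE status bit are the standard 16550 ones.
The struct, the function names and the 0x3f8 base address are
assumptions, and the port accessors are function pointers so that the
sketch can be run against a fake UART in user space; in a kernel they
would be the real inb/outb primitives.

```c
#include <stdint.h>

/* Standard 16550 register offsets and status bit. */
#define UART_THR      0     /* transmit holding register (write) */
#define UART_LSR      5     /* line status register (read) */
#define UART_LSR_THRE 0x20  /* transmit holding register empty */
#define FAKE_BASE     0x3f8 /* assumed I/O base; may differ on your box */

/* Hypothetical wrapper: base address plus port accessors. */
struct early_uart {
	uint16_t base;
	uint8_t (*in8)(uint16_t port);
	void    (*out8)(uint16_t port, uint8_t val);
};

/* Busy-wait until the transmitter can take a byte, then write it. */
static void
early_putc(const struct early_uart *u, char c)
{
	while ((u->in8((uint16_t)(u->base + UART_LSR)) & UART_LSR_THRE) == 0)
		continue;
	u->out8((uint16_t)(u->base + UART_THR), (uint8_t)c);
}

/* Send a NUL-terminated string, expanding "\n" to "\r\n". */
static void
early_puts(const struct early_uart *u, const char *s)
{
	for (; *s != '\0'; s++) {
		if (*s == '\n')
			early_putc(u, '\r');
		early_putc(u, *s);
	}
}

/*
 * Fake UART for trying the sketch in user space: it logs every byte
 * written and reports "busy" for two polls after each write.
 */
static char fake_log[64];
static unsigned fake_len;
static int fake_busy;

static uint8_t
fake_in8(uint16_t port)
{
	if (port != FAKE_BASE + UART_LSR)
		return 0;
	if (fake_busy > 0) {
		fake_busy--;
		return 0;
	}
	return UART_LSR_THRE;
}

static void
fake_out8(uint16_t port, uint8_t val)
{
	if (port == FAKE_BASE + UART_THR && fake_len < sizeof(fake_log) - 1)
		fake_log[fake_len++] = (char)val;
	fake_busy = 2;
}

static const struct early_uart fake_uart = { FAKE_BASE, fake_in8, fake_out8 };
```

Calling early_puts(&fake_uart, "hi\n") leaves "hi\r\n" in fake_log. The
sketch assumes something (boot loader or firmware) has already
programmed the baud rate and line settings.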

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: debugging a kernel that doesn't start

2022-09-12 Thread Robert Swindells


>> The simplest way to debug something is using a serial port, do you have
>> access to the one on this machine?
>
>Yes, there is one. It seems to sort-of mirror the on-screen messages up to 
>the point the NetBSD boot runs. I tried
>   consdev com0,9600
>from the boot prompt but that hung the machine.

Why not try booting Linux on the thing to find out how the serial
port is configured?

Are you connecting another computer to the serial port on the back or
using the system management controller?


Re: debugging a kernel that doesn't start

2022-09-12 Thread Manuel Bouyer
On Mon, Sep 12, 2022 at 10:02:33PM +0200, Edgar Fuß wrote:
> > Have you tried booting a custom kernel with some drivers removed?
> No. I wouldn't know which drivers to remove.
> The problem is the Kernel utters absolutely nothing, so it must hang very, 
> very early.
> 
> > have you tried an uncompressed one?
> No, but I guess the official install image (on a USB key) is supposed to 
> work as-is, no?
> 
> > The simplest way to debug something is using a serial port, do you have
> > access to the one on this machine?
> Yes, there is one. It seems to sort-of mirror the on-screen messages up to 
> the point the NetBSD boot runs. I tried
>   consdev com0,9600
> from the boot prompt but that hung the machine.

On some systems I have to set the ioaddr too.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: debugging a kernel that doesn't start

2022-09-12 Thread Edgar Fuß
> then you can bypass all the worries of using BIOS routines or whatnot 
> and just poke the hardware directly.
Probably stupid question: I can switch the machine to UEFI. Is it easier 
to debug things from there than from a BIOS boot?


Re: debugging a kernel that doesn't start

2022-09-12 Thread Edgar Fuß
> That could be a strong clue or it could be unrelated.
OK, just in case that might be another clue: If I want to interrupt the 
boot countdown, the first keystroke gets lost, I need to press  
a second time.


Re: debugging a kernel that doesn't start

2022-09-12 Thread Mouse
>> The simplest way to debug something is using a serial port, do you
>> have access to the one on this machine?
> Yes, there is one.  It seems to sort-of mirror the on-screen messages
> up to the point the NetBSD boot runs. I tried
>   consdev com0,9600
> from the boot prompt but that hung the machine.

That could be a strong clue or it could be unrelated.  In particular, I
consider it depressingly likely that the serial port is not a real
serial port but actually a USB-to-serial chip hanging off an internal
USB hub, in which case com0 may not find it - "consdev com0,9600" may
well have worked fine, in a sense, but be sending its output somewhere
that doesn't actually go out any hardware connector anywhere.  That
would look like hanging the machine, even if it technically isn't.

But if it's a real serial port, then the machine wedging at that point
is a strong clue, and you have a much simpler environment to chase
things down in when you're working in the bootblocks themselves.

If you have a real serial port - or a real parallel port, or even a
GPIO pin you can connect an LED to - then you can bypass all the
worries of using BIOS routines or whatnot and just poke the hardware
directly.  Even if you have to hardwire a bunch of addresses and
bitmasks, that's fine at this point; for this kind of debugging,
anything that exfiltrates data is useful.  In the early days of my
playing with next68k hardware I did a lot of poking data into hardwired
addresses that accessed the framebuffer.  (If you have a GPIO pin,
in theory you could even turn it into an output-only data-pin-only
serial port by using fixed delay-loop constants.)
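
A rough sketch of the GPIO-as-serial idea, assuming ordinary 8N1
framing (start bit low, eight data bits least significant first, stop
bit high). All names here are hypothetical. set_pin() stands for
whatever hardwired register poke drives the pin, and bit_delay() for a
delay loop with a hand-tuned constant; both are function pointers so
the sketch can be checked against a fake pin in user space.

```c
#include <stdint.h>

/* Hypothetical hooks for the pin and for a one-bit-time delay loop. */
struct bitbang_tx {
	void (*set_pin)(int level);   /* 1 = high, 0 = low */
	void (*bit_delay)(void);      /* spin for one bit time */
};

/* Send one byte as 8N1 on an output-only pin. */
static void
bitbang_putc(const struct bitbang_tx *tx, uint8_t c)
{
	int i;

	tx->set_pin(0);                     /* start bit */
	tx->bit_delay();
	for (i = 0; i < 8; i++) {
		tx->set_pin((c >> i) & 1);  /* data bits, LSB first */
		tx->bit_delay();
	}
	tx->set_pin(1);                     /* stop bit; line idles high */
	tx->bit_delay();
}

/* Fake pin for user-space testing: records the level in each bit slot. */
static int trace[32];
static unsigned trace_len;
static int cur_level = 1;

static void
fake_set_pin(int level)
{
	cur_level = level;
}

static void
fake_bit_delay(void)
{
	if (trace_len < sizeof(trace) / sizeof(trace[0]))
		trace[trace_len++] = cur_level;
}

static const struct bitbang_tx fake_tx = { fake_set_pin, fake_bit_delay };
```

With the fake pin, bitbang_putc(&fake_tx, 'A') records ten bit slots:
one low start bit, the bits of 0x41 starting from the least significant
one, and one high stop bit. On real hardware the delay constant has to
be tuned until the receiving end decodes the characters.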

That paradigm fails as soon as you start meddling with the MMU, because
that can cut you off from access to the hardware.  But it can help if
your issues manifest before that, and, even if not, you've narrowed
things down - and with care you may be able to map the hardware again.
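
As one example of the hardwired-address approach on a PC: when the
machine has been booted through the BIOS and the display is still in
legacy VGA colour text mode, character cells live at physical address
0xb8000, 80 columns by 25 rows, two bytes per cell. The sketch below
assumes exactly that (it does not apply to a UEFI framebuffer), and the
function names are made up. The buffer is passed in as a pointer so the
code can be tried against an ordinary array; on real hardware it would
be (volatile uint16_t *)0xb8000, and only for as long as that address
is still reachable.

```c
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25

/*
 * Write one character cell.  Low byte is the character, high byte the
 * attribute (0x07 = light grey on black).  Out-of-range positions are
 * silently ignored.
 */
static void
vga_poke(volatile uint16_t *vram, unsigned row, unsigned col, char c)
{
	if (row < VGA_ROWS && col < VGA_COLS)
		vram[row * VGA_COLS + col] = (uint16_t)(0x0700 | (uint8_t)c);
}

/* Show a byte as two hex digits, e.g. a progress marker or a register. */
static void
vga_poke_hex(volatile uint16_t *vram, unsigned row, unsigned col, uint8_t v)
{
	static const char digits[] = "0123456789abcdef";

	vga_poke(vram, row, col, digits[v >> 4]);
	vga_poke(vram, row, col + 1, digits[v & 0x0f]);
}

/* Stand-in for video memory when running in user space. */
static uint16_t fake_vram[VGA_COLS * VGA_ROWS];
```

Calling vga_poke_hex(fake_vram, 0, 78, 0xa5) puts "a5" in the top right
corner of the stand-in buffer. Dropping a different marker at each
stage of early boot shows how far the kernel got before it hung.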

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: debugging a kernel that doesn't start

2022-09-12 Thread Edgar Fuß
> If you can set up a serial console, it may make things much easier.
I do have a serial port on the machine.

> I almost always use serial consoles on dev machines; I don't remember the
> details but doing the equivalent of a putchar very early was possible.
Is the BIOS still available or how does that work?


Re: debugging a kernel that doesn't start

2022-09-12 Thread Edgar Fuß
> Have you tried booting a custom kernel with some drivers removed?
No. I wouldn't know which drivers to remove.
The problem is the Kernel utters absolutely nothing, so it must hang very, 
very early.

> have you tried an uncompressed one?
No, but I guess the official install image (on a USB key) is supposed to 
work as-is, no?

> The simplest way to debug something is using a serial port, do you have
> access to the one on this machine?
Yes, there is one. It seems to sort-of mirror the on-screen messages up to 
the point the NetBSD boot runs. I tried
  consdev com0,9600
from the boot prompt but that hung the machine.


Re: debugging a kernel that doesn't start

2022-09-12 Thread Manuel Bouyer
On Mon, Sep 12, 2022 at 09:17:52PM +0200, Edgar Fuß wrote:
> I'm trying to run NetBSD on a Dell PowerEdge R6515, and the kernel is being 
> loaded (PXE or USB) but then the machine hangs hard.
> 
> What's the way to debug a kernel that hangs so early that you can't printf 
> or drop into ddb? I guess that's a phenomenon quite common for a new port 
> or changes to locore.s (or whatever that's called today), but it's completely 
> new to me.
> 
> I have virtually no clue about PeCee hardware. At the point the kernel is 
> started, are BIOS routines still available?

If you can set up a serial console, it may make things much easier.
I almost always use serial consoles on dev machines; I don't remember the
details but doing the equivalent of a putchar very early was possible.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: debugging a kernel that doesn't start

2022-09-12 Thread Robert Swindells


Edgar Fuß wrote:
> I'm trying to run NetBSD on a Dell PowerEdge R6515, and the kernel is being 
> loaded (PXE or USB) but then the machine hangs hard.

Have you tried booting a custom kernel with some drivers removed?

I tried PXE booting an i386 machine today using pxeboot_ia32.bin
from -current, worked fine. Just tried amd64 now as well, also
worked.

One thing I didn't try was to pxeboot a gzipped kernel, have you
tried an uncompressed one?

> What's the way to debug a kernel that hangs so early that you can't
> printf or drop into ddb? I guess that's a phenomenon quite common for
> a new port or changes to locore.s (or whatever that's called today),
> but it's completely new to me.

The simplest way to debug something is using a serial port, do you have
access to the one on this machine?