Sorry for replying to myself all the time ;-)
I was just able to capture a "freeze", followed by a successful boot.

The successful boot continues with these lines:

[   62.922169] systemd[1]: Finished Create System Users.
[   62.923633] systemd[1]: Starting Create Static Device Nodes in /dev...
[   62.941753] systemd[1]: Finished Create Static Device Nodes in /dev.
[   62.944691] systemd[1]: Starting Rule-based Manager for Device Events and Files...
[   62.953082] systemd[1]: modprobe@drm.service: Succeeded.
[   62.953539] systemd[1]: Finished Load Kernel Module drm.
[   62.983630] systemd[1]: Started Rule-based Manager for Device Events and Files.
[   62.991307] systemd[1]: Finished Set the console keyboard layout.
[   63.015898] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input5
[   63.016490] systemd[1]: Finished Coldplug All udev Devices.
[   63.018250] systemd[1]: Starting Helper to synchronize boot up for ifupdown...
[   63.020119] power_meter ACPI000D:00: Found ACPI power meter.
[   63.020214] power_meter ACPI000D:00: Ignoring unsafe software power cap!
[   63.020280] power_meter ACPI000D:00: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
[   63.029971] systemd[1]: Finished Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
[   63.030392] systemd[1]: Reached target Local File Systems (Pre).
[   63.031784] IPMI message handler: version 39.2
[   63.035060] ipmi device interface
[   63.036149] ACPI: Power Button [PWRF]
[   63.038539] EDAC MC1: Giving out device to module i7core_edac.c controller i7 core #1: DEV 0000:3e:03.0 (INTERRUPT)
[   63.038670] EDAC PCI0: Giving out device to module i7core_edac controller EDAC PCI controller: DEV 0000:3e:03.0 (POLLED)
[   63.039204] EDAC MC0: Giving out device to module i7core_edac.c controller i7 core #0: DEV 0000:3f:03.0 (INTERRUPT)
[   63.039315] EDAC PCI1: Giving out device to module i7core_edac controller EDAC PCI controller: DEV 0000:3f:03.0 (POLLED)
[   63.039405] EDAC i7core: Driver loaded, 2 memory controller(s) found.
[   63.044910] ipmi_si: IPMI System Interface driver
[   63.044996] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[   63.045059] ipmi_platform: ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0
[   63.045134] ipmi_si: Adding SMBIOS-specified kcs state machine
[   63.045263] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
[   63.045393] ipmi_si IPI0001:00: ipmi_platform: [io  0x0ca2-0x0ca3] regsize 1 spacing 1 irq 0
[   63.045652] iTCO_vendor_support: vendor-support=0
[   63.046504] hpwdt 0000:02:00.0: HPE Watchdog Timer Driver: NMI decoding initialized

This line catches my attention:

[   62.953082] systemd[1]: modprobe@drm.service: Succeeded.

This line does not show up when the freeze happens.
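
To spot such differences more systematically than by eye, the two console
captures could be compared with a small script. This is only a rough sketch:
the function names are made up for illustration, and it assumes each log
message sits on a single line that starts with the usual "[ seconds.micros]"
kernel timestamp.

```python
import re

# Matches a leading kernel timestamp such as "[   62.953082] ".
# Status lines like "[ OK ] ..." do not match and are skipped.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def messages(log_text):
    """Return the timestamped messages of a log, with the timestamp removed."""
    result = []
    for line in log_text.splitlines():
        line = line.strip()
        if TIMESTAMP.match(line):
            result.append(TIMESTAMP.sub("", line))
    return result


def missing_messages(good_log, frozen_log):
    """Messages that appear in the good boot but not in the frozen one."""
    seen = set(messages(frozen_log))
    return [m for m in messages(good_log) if m not in seen]
```

Everything the good boot logs after the point of the freeze will of course be
reported as missing, so only the entries around the last message of the frozen
boot are interesting.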

FYI: in the meantime I also installed firmware-amd-graphics; however, the
behaviour (sometimes freeze, sometimes boot) is still the same.
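
One way to check whether the firmware package changed anything is to see if
the "failed to load" error from the kernel log quoted further down still
appears after a reboot. The following is a minimal sketch; the regular
expression is modelled on that one quoted log line only, and the function name
is made up for illustration.

```python
import errno
import re

# Modelled on the quoted kernel log line:
#   radeon 0000:01:03.0: firmware: failed to load radeon/R100_cp.bin (-2)
FAILED = re.compile(r"firmware: failed to load (\S+) \((-?\d+)\)")


def firmware_failures(log_text):
    """Return (firmware file, errno name) pairs for failed firmware loads."""
    found = []
    for match in FAILED.finditer(log_text):
        path = match.group(1)
        code = abs(int(match.group(2)))
        # The kernel reports negative errno values; -2 is ENOENT (no such file).
        found.append((path, errno.errorcode.get(code, "unknown")))
    return found
```

An empty result on a kernel log captured after installing the package would
suggest the firmware file is now found, which would fit the freeze being
unrelated to the missing firmware.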

I will continue to troubleshoot, but if anyone has experienced something
similar, has some hints, or can point to existing bugs, please let me know.

On Tue, Jun 29, 2021 at 10:04 AM Claudio Kuenzler <c...@claudiokuenzler.com>
wrote:

> Meanwhile I was able to find out more by removing "quiet" from the GRUB
> kernel command line.
> The pcc_cpufreq_init messages do not seem to affect booting; they are just
> warnings.
>
> The following messages appear on the console before the server freezes:
>
> [ OK ] Finished Load Kernel Module fuse.
> [ 62.887855] systemd[1]: Mounting FUSE Control File System...
>    Mounting FUSE Control File System...
> [ 62.891852] systemd[1]: Finished Apply Kernel Variables.
> [ OK ] Finished Apply Kernel Variables.
> [ 62.892237] systemd[1]: Mounted FUSE Control File System.
> [ OK ] Mounted FUSE Control File System.
> [ 62.900668] systemd[1]: Finished Create System Users.
> [ OK ] Finished Create System Users.
> [ 62.902224] systemd[1]: Starting Create Static Device Nodes in /dev...
>   Starting Create Static Device Nodes in /dev...
> [ 62.920767] systemd[1]: modprobe@drm.service: Succeeded.
> [ 62.921202] systemd[1]: Finished Load Kernel Module drm.
> [ OK ] Finished Load Kernel Module drm.
> [ 62.921979] systemd[1]: Finished Create Static Device Nodes in /dev.
> [ OK ] Finished Create Static Device Nodes in /dev.
> [ 62.925007] systemd[1]: Starting Rule-based Manager for Device Events and Files...
>    Starting Rule-based Manager for Device Events and Files...
> [ 62.955322] systemd[1]: Finished Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
> [ OK ] Finished Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
> [ 62.962186] systemd[1]: Started Rule-based Manager for Device Events and Files.
>
> After this there are no further messages and no login prompt, and the
> server no longer reacts to keyboard input. Only a hardware reset helps in
> this case.
> Out of ~10 server reboots this problem occurred 4 or 5 times.
>
> Could it have something to do with drm? I've seen a drm driver error
> in an earlier boot phase.
>
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.182074] [drm] radeon kernel modesetting enabled.
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.182197] radeon 0000:01:03.0: vgaarb: deactivate vga console
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.183720] Console: switching to colour dummy device 80x25
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184088] [drm] initializing kernel modesetting (RV100 0x1002:0x515E 0x103C:0x31FB 0x02).
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184208] radeon 0000:01:03.0: VRAM: 128M 0x00000000E8000000 - 0x00000000EFFFFFFF (64M used)
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184210] radeon 0000:01:03.0: GTT: 512M 0x00000000C8000000 - 0x00000000E7FFFFFF
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184219] [drm] Detected VRAM RAM=128M, BAR=128M
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184220] [drm] RAM width 16bits DDR
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184302] [TTM] Zone  kernel: Available graphics memory: 49487844 KiB
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184304] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184305] [TTM] Initializing pool allocator
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184310] [TTM] Initializing DMA pool allocator
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184333] [drm] radeon: 64M of VRAM memory ready
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184334] [drm] radeon: 512M of GTT memory ready.
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.184371] [drm] GART: num cpu pages 131072, num gpu pages 131072
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.205645] [drm] PCI GART of 512M enabled (table at 0x00000000FFF00000).
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.205890] radeon 0000:01:03.0: WB disabled
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.205894] radeon 0000:01:03.0: fence driver on ring 0 use gpu addr 0x00000000c8000000
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.205967] [drm] radeon: irq initialized.
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.205980] [drm] Loading R100 Microcode
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206233] radeon 0000:01:03.0: firmware: failed to load radeon/R100_cp.bin (-2)
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206241] firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206246] radeon 0000:01:03.0: Direct firmware load for radeon/R100_cp.bin failed with error -2
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206311] [drm:r100_cp_init [radeon]] *ERROR* Failed to load firmware!
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206318] radeon 0000:01:03.0: failed initializing CP (-2).
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206321] radeon 0000:01:03.0: Disabling GPU acceleration
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206329] [drm] radeon: cp finalized
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206961] [drm] No TV DAC info found in BIOS
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206996] [drm] Radeon Display Connectors
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206997] [drm] Connector 0:
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206998] [drm]   VGA-1
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.206999] [drm]   DDC: 0x60 0x60 0x60 0x60 0x60 0x60 0x60 0x60
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.207000] [drm]   Encoders:
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.207001] [drm]     CRT1: INTERNAL_DAC1
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.207002] [drm] Connector 1:
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.207003] [drm]   VGA-2
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.207004] [drm]   DDC: 0x6c 0x6c 0x6c 0x6c 0x6c 0x6c 0x6c 0x6c
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.207004] [drm]   Encoders:
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.207005] [drm]     CRT2: INTERNAL_DAC2
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.236242] kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.245005] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.250269] [drm] fb mappable at 0xE8040000
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.250270] [drm] vram apper at 0xE8000000
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.250271] [drm] size 1572864
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.250271] [drm] fb depth is 16
> Jun 28 16:15:05 irczsrvp08 kernel: [   63.250272] [drm]    pitch is 2048
>
> Maybe related to the known bullseye errata
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989863 ?
>
>
>
> On Mon, Jun 28, 2021 at 8:32 PM Claudio Kuenzler <c...@claudiokuenzler.com>
> wrote:
>
>> Hello!
>>
>> I am currently testing the new Bullseye release (using
>> firmware-bullseye-DI-rc2-amd64-netinst.iso) and see a strange phenomenon on
>> an HP ProLiant DL380 G7 server.
>>
>> During boot, the following messages show up in the console:
>>
>> [63.063844] pcc_cpufreq_init: Too many CPUs, dynamic performance scaling disabled
>> [63.063895] pcc_cpufreq_init: Try to enable another scaling driver through BIOS settings
>> [63.063943] pcc_cpufreq_init: and complain to the system vendor
>>
>> According to
>> https://patchwork.kernel.org/project/linux-pm/patch/5423012.zznfdyd...@aspire.rjw.lan/
>> this is a kernel patch from July 2018.
>> According to Andreas Herrmann, the settings can be defined in the HP
>> server BIOS:
>>
>> Power Management -> Advanced Power Options -> Collaborative Power Control = enabled
>>
>> This is enabled (the default, I believe). The Power Regulator is set to
>> "Dynamic Power Savings Mode".
>>
>> After these messages show up on the console, no login prompt appears and
>> the network is not started. The server seems frozen: it does not even react
>> to CTRL+ALT+DEL on the console anymore. I am not sure whether this is caused
>> by cpufreq or something else, though.
>>
>> This boot problem happened on 2 out of 3 server boots.
>>
>> Is this a bug in Bullseye?
>>
>> Thanks for any hints.
>>
>>
