Re: APIC timer checked before it is set up, boot fails on Connex L1430
On Mon, 7 Jan 2019, Daniel Drake wrote: > I asked the motherboard vendor if they have any idea why the 8254 is > not ticking and they sent me a new BIOS where it is now working. So we > can probably consider this a non issue, although there are a few other > curious points to mention: > > 1. Other platforms with the same Intel N3350 SoC use the HPET timer > instead of PIC during early boot; on this platform there is no HPET > ACPI table though, so it uses PIT instead. Right. PIT is the ultimate fallback. > 2. It does seem a little redundant that the Linux APIC code goes to > these lengths to check that the legacy 8254 timer interrupt is working > before it then gets disabled a bit later during boot (when > setup_APIC_timer happens, the clocksource layer disables the previous > timer source, in this case the PIT or HPET). Although I can see this > is not a trivial patch to write. Well, it'd be trivial if we could rely on the APIC timer being functional on all CPUs and if we could figure out the APIC timer frequency without calibrating it against the PIT/HPET on older CPUs. Plus a gazillion of other issues (e.g. APIC stops in C states ) > 3. Windows 10 boots fine with no HPET & no 8254 timer ticks, so if we > find more platforms that appear in this state, we may want to look for > how to work around this strange situation on the Linux side. Under certain conditions we actually might avoid touching PIT/HPET and solely rely on the CPUID/MSR calibration values. Needs quite some thought though. Thanks, tglx
Re: APIC timer checked before it is set up, boot fails on Connex L1430
On Fri, Dec 28, 2018 at 2:11 PM Daniel Drake wrote: > On the Connex L1430 laptop based on Intel Apollo Lake N3350, Linux > doesn't boot. It hangs early on a blank screen. Reproduced with Linus > git, 4.18 and 4.19 (there is no previous known working kernel > version). EFI earlyprintk shows: > > APIC: switch to symmetric I/O mode setup > x2apic: IRQ remapping doesn't support X2APIC mode > ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 > ..MP-BIOS bug: 8254 timer not connected to IO-APIC > ...tryign to set up timer (IRQ0) through the 8259A ... > . (found apic 0 pin 2) ... > ... failed. > ...trying to set up timer as Virtual Wire IRQ... > . failed. > ...trying to set up timer as ExtINT IRQ... > do_IRQ: 0.55 No irq handler for vector > . failed :(. > Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with > apic=debug and send a report. > > Looking closer, check_timer() is observing that the IOAPIC timer > doesn't tick, so it then tries some other approaches but doesn't > manage to get them working either. > > The strange thing is, I booted with the no_timer_check parameter and > the system works fine! With this parameter it assumes the IOAPIC timer > is ticking and just continues the boot sequence anyway. Here is the > boot log with apic=debug no_timer_check: > https://gist.github.com/dsd/6f40d8ecc7102dd5dcb90c5dedc69214#file-dmesg-txt > > /proc/interrupts shows that APIC Local timer interrupts are working > fine on both CPUs: > https://gist.github.com/dsd/6f40d8ecc7102dd5dcb90c5dedc69214#file-interrupts-txt > > So, check_timer() is incorrectly deducing that the IOAPIC timer isn't > working. The way it checks this is to do a delay loop and then check > if jiffies has advanced. I presume the expectation here is that during > this delay, the hardware IRQ will invoke local_apic_timer_interrupt() > which will then increment jiffies. Indeed, during check_timer() > execution this interrupt does not fire, however by using > no_timer_check and adding a log message I can see that it fires for > the first time quite some time after check_timer() is done: > > ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 > clocksource: tsc-early: mask: 0x max_cycles: > 0xfc66f4fc7c, max_idle_ns: 440795224246 ns > Calibrating delay loop (skipped), value calculated using timer > frequency.. 2188.80 BogoMIPS (lpj=1094400) > pid_max: default: 32768 minimum: 301 > LSM: Security Framework initializing > SELinux: Initializing. > Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) > Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) > Mount-cache hash table entries: 4096 (order: 3, 32768 bytes) > Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes) > mce: CPU supports 7 MCE banks > mce: CPU0: Thermal monitoring enabled (TM1) > Last level iTLB entries: 4KB 48, 2MB 0, 4MB 0 > Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 > Spectre V2 : Spectre mitigation: kernel not compiled with retpoline; > no mitigation available! > Freeing SMP alternatives memory: 44K > TSC deadline timer enabled > smpboot: CPU0: Intel(R) Celeron(R) CPU N3350 @ 1.10GHz (family: 0x6, > model: 0x5c, stepping: 0x9) > Performance Events: PEBS fmt3+, Goldmont events, 32-deep LBR, > full-width counters, Intel PMU driver. > ... version:4 > ... bit width: 48 > ... generic registers: 4 > ... value mask: > ... max period: 7fff > ... fixed-purpose events: 3 > ... event mask: 0007000f > rcu: Hierarchical SRCU implementation. > smp: Bringing up secondary CPUs ... > !!! local_apic_timer_interrupt for the first time cpu0 !!! > > Experimenting further, I used the same approach of adding delays and > checking for the interrupt during the delay to figure out at which > precise point during the boot sequence the timer interrupt starts > working. It's here: > > static void setup_APIC_timer(void) > { > [...] > if (this_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) { > [...] > clockevents_config_and_register(levt, > tsc_khz * (1000 / TSC_DIVISOR), > 0xF, ~0UL); > } > } > > We reach clockevents_register_device() which does: > 1. Take a spinlock and disable IRQs > 2. lapic_set_oneshot() which leads to "TSC deadline timer enabled" message > 3. lapic_next_deadline() > 4. Spin unlock & re-enable IRQs > > At the exact point where IRQs are re-enabled above, which is at the > time of return from clockevents_config_and_register(), timer > interrupts start working. > > > The overall ordering here seems surprising. check_timer() is probing > whether the APIC timer works well before setup_APIC_timer() has been > called. Shouldn't the timer be checked only after it has been set up? > > Or is Linux assuming that the BIOS will boot with the APIC timer > already running? Sorry for the rambling there - having in
APIC timer checked before it is set up, boot fails on Connex L1430
Hi, On the Connex L1430 laptop based on Intel Apollo Lake N3350, Linux doesn't boot. It hangs early on a blank screen. Reproduced with Linus git, 4.18 and 4.19 (there is no previous known working kernel version). EFI earlyprintk shows: APIC: switch to symmetric I/O mode setup x2apic: IRQ remapping doesn't support X2APIC mode ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 ..MP-BIOS bug: 8254 timer not connected to IO-APIC ...tryign to set up timer (IRQ0) through the 8259A ... . (found apic 0 pin 2) ... ... failed. ...trying to set up timer as Virtual Wire IRQ... . failed. ...trying to set up timer as ExtINT IRQ... do_IRQ: 0.55 No irq handler for vector . failed :(. Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Looking closer, check_timer() is observing that the IOAPIC timer doesn't tick, so it then tries some other approaches but doesn't manage to get them working either. The strange thing is, I booted with the no_timer_check parameter and the system works fine! With this parameter it assumes the IOAPIC timer is ticking and just continues the boot sequence anyway. Here is the boot log with apic=debug no_timer_check: https://gist.github.com/dsd/6f40d8ecc7102dd5dcb90c5dedc69214#file-dmesg-txt /proc/interrupts shows that APIC Local timer interrupts are working fine on both CPUs: https://gist.github.com/dsd/6f40d8ecc7102dd5dcb90c5dedc69214#file-interrupts-txt So, check_timer() is incorrectly deducing that the IOAPIC timer isn't working. The way it checks this is to do a delay loop and then check if jiffies has advanced. I presume the expectation here is that during this delay, the hardware IRQ will invoke local_apic_timer_interrupt() which will then increment jiffies. Indeed, during check_timer() execution this interrupt does not fire, however by using no_timer_check and adding a log message I can see that it fires for the first time quite some time after check_timer() is done: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 clocksource: tsc-early: mask: 0x max_cycles: 0xfc66f4fc7c, max_idle_ns: 440795224246 ns Calibrating delay loop (skipped), value calculated using timer frequency.. 2188.80 BogoMIPS (lpj=1094400) pid_max: default: 32768 minimum: 301 LSM: Security Framework initializing SELinux: Initializing. Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) Mount-cache hash table entries: 4096 (order: 3, 32768 bytes) Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes) mce: CPU supports 7 MCE banks mce: CPU0: Thermal monitoring enabled (TM1) Last level iTLB entries: 4KB 48, 2MB 0, 4MB 0 Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 Spectre V2 : Spectre mitigation: kernel not compiled with retpoline; no mitigation available! Freeing SMP alternatives memory: 44K TSC deadline timer enabled smpboot: CPU0: Intel(R) Celeron(R) CPU N3350 @ 1.10GHz (family: 0x6, model: 0x5c, stepping: 0x9) Performance Events: PEBS fmt3+, Goldmont events, 32-deep LBR, full-width counters, Intel PMU driver. ... version:4 ... bit width: 48 ... generic registers: 4 ... value mask: ... max period: 7fff ... fixed-purpose events: 3 ... event mask: 0007000f rcu: Hierarchical SRCU implementation. smp: Bringing up secondary CPUs ... !!! local_apic_timer_interrupt for the first time cpu0 !!! Experimenting further, I used the same approach of adding delays and checking for the interrupt during the delay to figure out at which precise point during the boot sequence the timer interrupt starts working. It's here: static void setup_APIC_timer(void) { [...] if (this_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) { [...] clockevents_config_and_register(levt, tsc_khz * (1000 / TSC_DIVISOR), 0xF, ~0UL); } } We reach clockevents_register_device() which does: 1. Take a spinlock and disable IRQs 2. lapic_set_oneshot() which leads to "TSC deadline timer enabled" message 3. lapic_next_deadline() 4. Spin unlock & re-enable IRQs At the exact point where IRQs are re-enabled above, which is at the time of return from clockevents_config_and_register(), timer interrupts start working. The overall ordering here seems surprising. check_timer() is probing whether the APIC timer works well before setup_APIC_timer() has been called. Shouldn't the timer be checked only after it has been set up? Or is Linux assuming that the BIOS will boot with the APIC timer already running? Any other debugging suggestions much appreciated. Thanks Daniel