On 9/9/25 10:29, Stanislav Kinsburskii wrote:
> On Thu, Sep 04, 2025 at 07:38:53PM -0700, Mukesh R wrote:
>> On 9/4/25 15:37, Stanislav Kinsburskii wrote:
>>> On Wed, Sep 03, 2025 at 07:10:16PM -0700, Mukesh Rathor wrote:
>>>> +
>>>> +/*
>>>> + * Common function for all cpus before devirtualization.
>>>> + *
>>>> + * Hypervisor crash: all cpus get here in nmi context.
>>>> + * Linux crash: the panicing cpu gets here at base level, all others in
>>>> nmi
>>>> + * context. Note, panicing cpu may not be the bsp.
>>>> + *
>>>> + * The function is not inlined so it will show on the stack. It is named
>>>> so
>>>> + * because the crash cmd looks for certain well known function names on
>>>> the
>>>> + * stack before looking into the cpu saved note in the elf section, and
>>>> + * that work is currently incomplete.
>>>> + *
>>>> + * Notes:
>>>> + * Hypervisor crash:
>>>> + * - the hypervisor is in a very restrictive mode at this point and any
>>>> + * vmexit it cannot handle would result in reboot. For example,
>>>> console
>>>> + * output from here would result in synic ipi hcall, which would
>>>> result
>>>> + * in reboot. So, no mumbo jumbo, just get to kexec as quickly as
>>>> possible.
>>>> + *
>>>> + * Devirtualization is supported from the bsp only.
>>>> + */
>>>> +static noinline __noclone void crash_nmi_callback(struct pt_regs *regs)
>>>> +{
>>>> + struct hv_input_disable_hyp_ex *input;
>>>> + u64 status;
>>>> + int msecs = 1000, ccpu = smp_processor_id();
>>>> +
>>>> + if (ccpu == 0) {
>>>> + /* crash_save_cpu() will be done in the kexec path */
>>>> + cpu_emergency_stop_pt(); /* disable performance trace */
>>>> + atomic_inc(&crash_cpus_wait);
>>>> + } else {
>>>> + crash_save_cpu(regs, ccpu);
>>>> + cpu_emergency_stop_pt(); /* disable performance trace */
>>>> + atomic_inc(&crash_cpus_wait);
>>>> + for (;;); /* cause no vmexits */
>>>> + }
>>>> +
>>>> + while (atomic_read(&crash_cpus_wait) < num_online_cpus() && msecs--)
>>>> + mdelay(1);
>>>> +
>>>> + stop_nmi();
>>>> + if (!hv_has_crashed)
>>>> + hv_notify_prepare_hyp();
>>>> +
>>>> + if (crashing_cpu == -1)
>>>> + crashing_cpu = ccpu; /* crash cmd uses this */
>>>> +
>>>> + hv_hvcrash_ctxt_save();
>>>> + hv_mark_tss_not_busy();
>>>> + hv_crash_fixup_kernpt();
>>>> +
>>>> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>>>> + memset(input, 0, sizeof(*input));
>>>> + input->rip = trampoline_pa; /* PA of hv_crash_asm32 */
>>>> + input->arg = devirt_cr3arg; /* PA of trampoline page table L4 */
>>>> +
>>>> + status = hv_do_hypercall(HVCALL_DISABLE_HYP_EX, input, NULL);
>>>> + if (!hv_result_success(status)) {
>>>> + pr_emerg("%s: %s\n", __func__, hv_result_to_string(status));
>>>> + pr_emerg("Hyper-V: disable hyp failed. kexec not possible\n");
>>>
>>> These prints won't ever be printed to any console as prints in NMI
>>> handler are deffered.
>>
>> It's mostly for debug. There are different config options allowing one
>> to build kernel easily dumping to either uart, led, speaker etc... There
>> are no easy ways to debug. kernel debuggers could trap EMERGENCY printks
>> also...
>>
>> Are you 100% sure printk is async even if KERN_EMERG? If yes, I'd like to
>> propose someday to make it bypass all that for pr_emerg.
>>
>
> Yes, I'm quite sure. Right now this looks like is dead code.
>
>>
>>> Also, how are they aligned with the notice in the comment on top of
>>> the function stating that console output would lead to synic ipi call?
>>
>> Comment says "Hypervisor Crash". Please reread the whole block.
>>
>
> The comment states that in case of hypervisor crash "console
> output from here would result in synic ipi hcall, which would result in
> reboot".
> So, why printing anything if it will simply lead to reboot?
>
>>>
>>> Resetting the machine from an NMI handler is sloppy.
>>> There could be another NMI, which triggers the panic, leading to this
>>> handler.
>>> NMI handlers servicing is batched meanining that not only this handler
>>> won't output anything, but also any other prints from any other handlers
>>> executed before the same lock won't be written out to consoles.
>>>
>>> This introduces silent machine resets for the root partition. Can the
>>> intrusive logic me moved to a tasklet?
>>
>> I really don't think you understand what is going on here. I've tried
>> telling you at least once in the past year, there is no return from the nmi
>> handler in case of hyp crash, and that this is panic mode, something
>> really bad has happened! It could be memory corruption, it could be
>> hw failure... The hyp goes in emergency mode that just mostly loops,
>> handling tiny number of hypercalls and msrs for support of dom0/root
>> like windows that implements custom core collection in raw mode.
>>
>
> I wasn't clear.
> I wasn't talking about a hypervisor crash. If it is so intrusive, that an
> attempt to print things to console may lead to reboot, then there should
> be no prints for this case.
The line after the print is reboot!!
Ah, forget it! heck with the prints...
> But this same logic is also used for Linux crashes, when prints can and
> should be printed to console.
check the panic function to figure when/where it prints, then check
where the nmi is called from. that will help.
> Moreover, whe same logic is used for a case when there is no crash
> kernel loaded, which as I said already leads to silent reboot if panic
> has happened in NMI handler.
>
> I believe this needs to be fixed.
>
> Stas
>