On 9/9/25 10:29, Stanislav Kinsburskii wrote:
> On Thu, Sep 04, 2025 at 07:38:53PM -0700, Mukesh R wrote:
>> On 9/4/25 15:37, Stanislav Kinsburskii wrote:
>>> On Wed, Sep 03, 2025 at 07:10:16PM -0700, Mukesh Rathor wrote:
>>>> +
>>>> +/*
>>>> + * Common function for all cpus before devirtualization.
>>>> + *
>>>> + * Hypervisor crash: all cpus get here in nmi context.
>>>> + * Linux crash: the panicing cpu gets here at base level, all others in 
>>>> nmi
>>>> + *                context. Note, panicing cpu may not be the bsp.
>>>> + *
>>>> + * The function is not inlined so it will show on the stack. It is named 
>>>> so
>>>> + * because the crash cmd looks for certain well known function names on 
>>>> the
>>>> + * stack before looking into the cpu saved note in the elf section, and
>>>> + * that work is currently incomplete.
>>>> + *
>>>> + * Notes:
>>>> + *  Hypervisor crash:
>>>> + *    - the hypervisor is in a very restrictive mode at this point and any
>>>> + *        vmexit it cannot handle would result in reboot. For example, 
>>>> console
>>>> + *        output from here would result in synic ipi hcall, which would 
>>>> result
>>>> + *        in reboot. So, no mumbo jumbo, just get to kexec as quickly as 
>>>> possible.
>>>> + *
>>>> + *  Devirtualization is supported from the bsp only.
>>>> + */
>>>> +static noinline __noclone void crash_nmi_callback(struct pt_regs *regs)
>>>> +{
>>>> +  struct hv_input_disable_hyp_ex *input;
>>>> +  u64 status;
>>>> +  int msecs = 1000, ccpu = smp_processor_id();
>>>> +
>>>> +  if (ccpu == 0) {
>>>> +          /* crash_save_cpu() will be done in the kexec path */
>>>> +          cpu_emergency_stop_pt();        /* disable performance trace */
>>>> +          atomic_inc(&crash_cpus_wait);
>>>> +  } else {
>>>> +          crash_save_cpu(regs, ccpu);
>>>> +          cpu_emergency_stop_pt();        /* disable performance trace */
>>>> +          atomic_inc(&crash_cpus_wait);
>>>> +          for (;;);                       /* cause no vmexits */
>>>> +  }
>>>> +
>>>> +  while (atomic_read(&crash_cpus_wait) < num_online_cpus() && msecs--)
>>>> +          mdelay(1);
>>>> +
>>>> +  stop_nmi();
>>>> +  if (!hv_has_crashed)
>>>> +          hv_notify_prepare_hyp();
>>>> +
>>>> +  if (crashing_cpu == -1)
>>>> +          crashing_cpu = ccpu;            /* crash cmd uses this */
>>>> +
>>>> +  hv_hvcrash_ctxt_save();
>>>> +  hv_mark_tss_not_busy();
>>>> +  hv_crash_fixup_kernpt();
>>>> +
>>>> +  input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>>>> +  memset(input, 0, sizeof(*input));
>>>> +  input->rip = trampoline_pa;     /* PA of hv_crash_asm32 */
>>>> +  input->arg = devirt_cr3arg;     /* PA of trampoline page table L4 */
>>>> +
>>>> +  status = hv_do_hypercall(HVCALL_DISABLE_HYP_EX, input, NULL);
>>>> +  if (!hv_result_success(status)) {
>>>> +          pr_emerg("%s: %s\n", __func__, hv_result_to_string(status));
>>>> +          pr_emerg("Hyper-V: disable hyp failed. kexec not possible\n");
>>>
>>> These prints won't ever be printed to any console as prints in NMI
>>> handler are deffered.
>>
>> It's mostly for debug. There are different config options allowing one
>> to build kernel easily dumping to either uart, led, speaker etc... There
>> are no easy ways to debug. kernel debuggers could trap EMERGENCY printks 
>> also...  
>>
>> Are you 100% sure printk is async even if KERN_EMERG? If yes, I'd like to 
>> propose someday to make it bypass all that for pr_emerg.
>>
> 
> Yes, I'm quite sure. Right now this looks like is dead code.
> 
>>
>>> Also, how are they aligned with the notice in the comment on top of
>>> the function stating that console output would lead to synic ipi call?
>>
>> Comment says "Hypervisor Crash". Please reread the whole block.
>>
> 
> The comment states that in case of hypervisor crash "console
> output from here would result in synic ipi hcall, which would result in
> reboot".
> So, why printing anything if it will simply lead to reboot?
> 
>>>
>>> Resetting the machine from an NMI handler is sloppy.
>>> There could be another NMI, which triggers the panic, leading to this 
>>> handler.
>>> NMI handlers servicing is batched meanining that not only this handler
>>> won't output anything, but also any other prints from any other handlers
>>> executed before the same lock won't be written out to consoles.
>>>
>>> This introduces silent machine resets for the root partition. Can the
>>> intrusive logic me moved to a tasklet?
>>
>> I really don't think you understand what is going on here. I've tried
>> telling you at least once in the past year, there is no return from the nmi 
>> handler in case of hyp crash, and that this is panic mode, something 
>> really bad has happened! It could be memory corruption, it could be 
>> hw failure...  The hyp goes in emergency mode that just mostly loops, 
>> handling tiny number of hypercalls and msrs for support of dom0/root 
>> like windows that implements custom core collection in raw mode.
>>
> 
> I wasn't clear.
> I wasn't talking about a hypervisor crash. If it is so intrusive, that an
> attempt to print things to console may lead to reboot, then there should
> be no prints for this case.

The line after the print is reboot!! 
Ah, forget it! heck with the prints... 

> But this same logic is also used for Linux crashes, when prints can and
> should be printed to console.

check the panic function to figure when/where it prints, then check
where the nmi is called from. that will help.

> Moreover, whe same logic is used for a case when there is no crash
> kernel loaded, which as I said already leads to silent reboot if panic
> has happened in NMI handler.
> 
> I believe this needs to be fixed.
> 
> Stas
> 


Reply via email to