From: Michael Kelley <[email protected]> Sent: Thursday, January 22, 2026 10:39 PM
> 
> From: Matthew Ruffell <[email protected]> Sent: Thursday, January 22, 2026 9:39 PM
> >
> > Hi Michael,
> >
> > > > I wonder if commit a41e0ab394e4 broke the initialization of screen_info in the
> > > > kdump kernel. Or perhaps there is now a rev-lock between the kernel with this
> > > > commit and a new version of the user space kexec command.
> >
> > a41e0ab394e4 isn't a mainline commit. Can you please mention the commit subject
> > so I can have a read.
> 
> It's this patch:
> 
> https://lore.kernel.org/lkml/[email protected]/ 
> 
> which is in linux-next, but not yet in mainline. Since you are dealing with older
> kernels, it's not the culprit.
> 
> >
> > > > There's a parameter to the kexec() command that governs whether it uses the
> > > > kexec_file_load() system call or the kexec_load() system call.
> > > > I wonder if that parameter makes a difference in the problem described for this
> > > > patch.
> >
> > Yes, it does indeed make a difference. I have been debugging this the past few
> > days, and my colleague Melissa noticed that the problem reproduces when secure
> > boot is disabled, but it does not reproduce when secure boot is enabled.
> > Additionally, it reproduces on jammy, but not noble. It turns out that
> > kexec-tools on jammy defaults to kexec_load() when secure boot is disabled,
> > and when enabled, it instead uses kexec_file_load(). On noble, it defaults to
> > first trying kexec_file_load() before falling back to kexec_load(), so the
> > issue does not reproduce.
> 
> This is good info, and definitely a clue. So to be clear, the problem repros
> only when kexec_load() is used. With kexec_file_load(), it does not repro. Is that
> right? I saw a similar distinction when working on commit 304386373007,
> though in the opposite direction!
> 
> >
> > > > >       /*
> > > > >        * Set up a region of MMIO space to use for accessing configuration
> > > > > -      * space.
> > > > > +      * space. Use the high MMIO range to not conflict with the hyperv_drm
> > > > > +      * driver (which normally gets MMIO from the low MMIO range) in the
> > > > > +      * kdump kernel of a Gen2 VM, which fails to reserve the framebuffer
> > > > > +      * MMIO range in vmbus_reserve_fb() due to screen_info.lfb_base being
> > > > > +      * zero in the kdump kernel.
> > > > >        */
> > > > > -     ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, 0, -1,
> > > > > +     ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, SZ_4G, -1,
> > > > >                                 PCI_CONFIG_MMIO_LENGTH, 0x1000, false);
> > > > >       if (ret)
> > > > >               return ret;
> > > > > --
> >
> > Thank you for the patch Dexuan.
> >
> > This patch fixes the problem on Ubuntu 5.15, and 6.8 based kernels
> > booting V6 instance types on Azure with Gen 2 images.
> 
> Are you seeing the problem on x86/64 or arm64 instances in Azure?
> "V6 instance types" could be either, I think, but I'm guessing you
> are on x86/64.
> 
> And just to confirm: are you seeing the problem with the
> Hyper-V DRM driver, or the Hyper-V FB driver? This patch mentions
> the DRM driver, so I assume that's the problematic config.
> 
> >
> > Tested-by: Matthew Ruffell <[email protected]>
> 
> While this patch may solve the observed problem, I'm interested in
> understanding the root cause of why vmbus_reserve_fb() is seeing
> screen_info.lfb_base set to zero. It may be next week before I can
> take a look, and I may need to follow up with you on more details of the
> scenario to reproduce the problem.

One more thought here: Is commit 96959283a58d relevant? The
commit message describes a scenario where vmbus_reserve_fb()
doesn't do anything because CONFIG_SYSFB is not set. Looking at
the code for vmbus_reserve_fb(), the fact that it does nothing might
suggest that screen_info.lfb_base is 0. But when CONFIG_SYSFB is not set,
screen_info.lfb_base is simply ignored, with the same result. This behavior
started with the 6.7 kernel due to commit a07b50d80ab6.
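
A minimal userspace sketch of that reasoning, assuming the Gen2 (EFI)
path of vmbus_reserve_fb() reads screen_info.lfb_base only when
CONFIG_SYSFB is enabled and bails out when the resulting start address
is zero. The struct, the function name, and the example address are
hypothetical and are not taken from the kernel source:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for screen_info; only the field discussed here. */
struct fb_info_sketch {
	uint64_t lfb_base;
};

/*
 * Return the framebuffer base that would be reserved, or 0 when nothing
 * would be reserved. sysfb_enabled models IS_ENABLED(CONFIG_SYSFB).
 */
uint64_t fb_base_to_reserve(bool sysfb_enabled,
			    const struct fb_info_sketch *si)
{
	uint64_t start = 0;

	/* Assumption: lfb_base is consulted only when SYSFB is built in. */
	if (sysfb_enabled)
		start = si->lfb_base;

	/* A zero start means the caller skips the reservation. */
	return start;
}
```

In this model, a zero lfb_base with SYSFB enabled and any lfb_base with
SYSFB disabled both return 0, so the two cases cannot be told apart
just by observing that no reservation happened.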

Note that commit 96959283a58d has a follow-on to correct a
problem when CONFIG_EFI is not set.  See commit 7b89a44b2e8c.
If there's a reason to backport 96959283a58d, also get
7b89a44b2e8c.

Michael
