Re: [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
On Tue, Mar 5, 2019 at 8:28 PM Peter Zijlstra wrote: > > On Wed, Feb 27, 2019 at 10:55:46PM +0800, Kairui Song wrote: > > On Wed, Feb 27, 2019 at 8:02 PM Peter Zijlstra wrote: > > > > > > On Tue, Feb 26, 2019 at 11:56:15PM +0800, Kairui Song wrote: > > > > arch/x86/hyperv/hv_init.c | 4 > > > > 1 file changed, 4 insertions(+) > > > > > > > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c > > > > index 7abb09e2eeb8..92291c18d716 100644 > > > > --- a/arch/x86/hyperv/hv_init.c > > > > +++ b/arch/x86/hyperv/hv_init.c > > > > @@ -406,6 +406,10 @@ void hyperv_cleanup(void) > > > > /* Reset our OS id */ > > > > wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0); > > > > > > > > + /* Cleanup page reference before reset the page */ > > > > + hv_hypercall_pg = NULL; > > > > + wmb(); > > > > > > What do we need that SFENCE for? Any why does it lack a comment? > > > > Hi, that's for ensuring the hv_hypercall_pg is reset to NULL before > > the following wrmsr call. The wrmsr call will make the pointer address > > invalid. > > WRMSR is a serializing instruction (except for TSC_DEADLINE and the > X2APIC). > Many thanks for the info, I'm not aware of the exception condition, V2 is sent, will drop the barrier in V3 then. -- Best Regards, Kairui Song ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
On Wed, Feb 27, 2019 at 10:55:46PM +0800, Kairui Song wrote: > On Wed, Feb 27, 2019 at 8:02 PM Peter Zijlstra wrote: > > > > On Tue, Feb 26, 2019 at 11:56:15PM +0800, Kairui Song wrote: > > > arch/x86/hyperv/hv_init.c | 4 > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c > > > index 7abb09e2eeb8..92291c18d716 100644 > > > --- a/arch/x86/hyperv/hv_init.c > > > +++ b/arch/x86/hyperv/hv_init.c > > > @@ -406,6 +406,10 @@ void hyperv_cleanup(void) > > > /* Reset our OS id */ > > > wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0); > > > > > > + /* Cleanup page reference before reset the page */ > > > + hv_hypercall_pg = NULL; > > > + wmb(); > > > > What do we need that SFENCE for? Any why does it lack a comment? > > Hi, that's for ensuring the hv_hypercall_pg is reset to NULL before > the following wrmsr call. The wrmsr call will make the pointer address > invalid. WRMSR is a serializing instruction (except for TSC_DEADLINE and the X2APIC). > I can add some comment in V2 if this is OK. Barriers must always have a comment. ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
On Wed, Feb 27, 2019 at 8:02 PM Peter Zijlstra wrote: > > On Tue, Feb 26, 2019 at 11:56:15PM +0800, Kairui Song wrote: > > arch/x86/hyperv/hv_init.c | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c > > index 7abb09e2eeb8..92291c18d716 100644 > > --- a/arch/x86/hyperv/hv_init.c > > +++ b/arch/x86/hyperv/hv_init.c > > @@ -406,6 +406,10 @@ void hyperv_cleanup(void) > > /* Reset our OS id */ > > wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0); > > > > + /* Cleanup page reference before reset the page */ > > + hv_hypercall_pg = NULL; > > + wmb(); > > What do we need that SFENCE for? Any why does it lack a comment? Hi, that's for ensuring the hv_hypercall_pg is reset to NULL before the following wrmsr call. The wrmsr call will make the pointer address invalid. I can add some comment in V2 if this is OK. -- Best Regards, Kairui Song ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
On Tue, Feb 26, 2019 at 11:56:15PM +0800, Kairui Song wrote: > arch/x86/hyperv/hv_init.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c > index 7abb09e2eeb8..92291c18d716 100644 > --- a/arch/x86/hyperv/hv_init.c > +++ b/arch/x86/hyperv/hv_init.c > @@ -406,6 +406,10 @@ void hyperv_cleanup(void) > /* Reset our OS id */ > wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0); > > + /* Cleanup page reference before reset the page */ > + hv_hypercall_pg = NULL; > + wmb(); What do we need that SFENCE for? Any why does it lack a comment? > + > /* Reset the hypercall page */ > hypercall_msr.as_uint64 = 0; > wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); > -- > 2.20.1 > ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
Kairui Song writes: > When hypercalls is used for sending IPIs, kexec will fail with a kernel > panic like this: > > kexec_core: Starting new kernel > BUG: unable to handle kernel NULL pointer dereference at > PGD 800057995067 P4D 800057995067 PUD 57990067 PMD 0 > Oops: 0002 [#1] SMP PTI > CPU: 0 PID: 1016 Comm: kexec Not tainted 4.18.16-300.fc29.x86_64 #1 > Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS > Hyper-V UEFI Release v3.0 03/02/2018 > RIP: 0010:0xc901d000 > Code: Bad RIP value. > RSP: 0018:c9000495bcf0 EFLAGS: 00010046 > RAX: RBX: c901d000 RCX: 00020015 > RDX: 7f553000 RSI: RDI: c9000495bd28 > RBP: 0002 R08: R09: 8238aaf8 > R10: 8238aae0 R11: R12: 88007f553008 > R13: 0001 R14: 8800ff553000 R15: > FS: 7ff5c0e67b80() GS:880078e0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: c901cfd6 CR3: 4f678006 CR4: 003606f0 > Call Trace: > ? __send_ipi_mask+0x1c6/0x2d0 > ? hv_send_ipi_mask_allbutself+0x6d/0xb0 > ? mp_save_irq+0x70/0x70 > ? __ioapic_read_entry+0x32/0x50 > ? ioapic_read_entry+0x39/0x50 > ? clear_IO_APIC_pin+0xb8/0x110 > ? native_stop_other_cpus+0x6e/0x170 > ? native_machine_shutdown+0x22/0x40 > ? kernel_kexec+0x136/0x156 > ? __do_sys_reboot+0x1be/0x210 > ? kmem_cache_free+0x1b1/0x1e0 > ? __dentry_kill+0x10b/0x160 > ? _cond_resched+0x15/0x30 > ? dentry_kill+0x47/0x170 > ? dput.part.34+0xc6/0x100 > ? __fput+0x147/0x220 > ? _cond_resched+0x15/0x30 > ? task_work_run+0x38/0xa0 > ? do_syscall_64+0x5b/0x160 > ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 > Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack > ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 > nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security > nf_conntrack ip_set nfnetlink ebtable_filter ebtables ip6table_filter > ip6_tables sunrpc vfat fat crct10dif_pclmul crc32_pclmul ghash_clmulni_intel > intel_rapl_perf hv_balloon joydev xfs libcrc32c hv_storvsc serio_raw > scsi_transport_fc hv_netvsc hyperv_keyboard hyperv_fb hid_hyperv crc32c_intel > hv_vmbus > > That's because HyperV's machine_ops.shutdown allow registering a hook to > be called upon shutdown, hv_vmbus will invalidate the hypercall page > using this hook. But hv_hypercall_pg is still pointing to this invalid > page, any hypercall based operation will panic the kernel. And kexec > progress will send IPIs for stopping CPUs. > > This fix this by simply reset hv_hypercall_pg to NULL when the page is > revoked to avoid any misuse. IPI sending will fallback to use non > hypercall based method. This only happens on kexec / kdump so setting to > NULL should be good enough. > > Fixes: 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments") > Signed-off-by: Kairui Song > > --- > > I'm not sure about the details of what happened after the > > wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); > > But this fix should be valid, please let me know if I get anything > wrong, thanks. > > arch/x86/hyperv/hv_init.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c > index 7abb09e2eeb8..92291c18d716 100644 > --- a/arch/x86/hyperv/hv_init.c > +++ b/arch/x86/hyperv/hv_init.c > @@ -406,6 +406,10 @@ void hyperv_cleanup(void) > /* Reset our OS id */ > wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0); > > + /* Cleanup page reference before reset the page */ > + hv_hypercall_pg = NULL; > + wmb(); > + > /* Reset the hypercall page */ > hypercall_msr.as_uint64 = 0; > wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); This all looks correct to me. We need to reset HV_X64_MSR_HYPERCALL as the hypercall page will remain 'special' otherwise so dumping it may have undesired consequences (though, I think that last time I checked it it was possible to read from this page without issues - and this should be enough for kdump. But I'd rather keep things as they are as one additional wrmsr doesn't hurt). -- Vitaly ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel