Hi,

I tested this workaround: I can confirm that it works on the Xen host, but
not on a Xen guest.
If you try to start a VM with the latest kernel, i.e. with these parameters
in the cfg file:

#
#  Kernel + memory size
#
kernel      = '/boot/vmlinuz-4.9.0-7-amd64'
extra       = 'elevator=noop'
ramdisk     = '/boot/initrd.img-4.9.0-7-amd64'

The VM crashes in a loop with this kernel error:

[    0.000000] Linux version 4.9.0-7-amd64 (debian-ker...@lists.debian.org)
(gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian
4.9.110-1 (2018-07-05)
[    0.000000] Command line: root=/dev/xvda2 ro elevator=noop
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point
registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832
bytes, using 'standard' format.
[    0.000000] ACPI in unprivileged domain disabled
[    0.000000] Released 0 page(s)
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[    0.000000] Xen: [mem 0x0000000000100000-0x000000007fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI not present or invalid.
[    0.000000] Hypervisor detected: Xen
[    0.000000] e820: last_pfn = 0x80000 max_arch_pfn = 0x400000000
[    0.000000] MTRR: Disabled
[    0.000000] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[    0.000000] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC
UC
[    0.000000] RAMDISK: [mem 0x02000000-0x05996fff]
[    0.000000] NUMA turned off
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x7fc16000-0x7fc1afff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.000000]   DMA32    [mem 0x0000000001000000-0x000000007fffffff]
[    0.000000]   Normal   empty
[    0.000000]   Device   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009ffff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x000000007fffffff]
[    0.000000] Initmem setup node 0 [mem
0x0000000000001000-0x000000007fffffff]
[    0.000000] p2m virtual area at ffffc90000000000, size is 40000000
[    0.000000] Remapped 0 page(s)
[    0.000000] SFI: Simple Firmware Interface v0.81
http://simplefirmware.org
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[    0.000000] e820: [mem 0x80000000-0xffffffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.8.4-pre (preserve-AD)
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:1
nr_node_ids:1
[    0.000000] percpu: Embedded 35 pages/cpu @ffff88007f600000 s105304
r8192 d29864 u2097152
[    0.000000] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes)
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.
Total pages: 515978
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: root=/dev/xvda2 ro elevator=noop
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 1980804K/2096764K available (6250K kernel code,
1159K rwdata, 2868K rodata, 1420K init, 688K bss, 115960K reserved, 0K
cma-reserved)
[    0.000000] Kernel/User page tables isolation: enabled
[    0.000000] Hierarchical RCU implementation.
[    0.000000]     Build-time adjustment of leaf fanout to 64.
[    0.000000]     RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=1.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=1
[    0.000000] Using NULL legacy PIC
[    0.000000] NR_IRQS:33024 nr_irqs:32 0
[    0.000000] xen:events: Using FIFO-based ABI
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [hvc0] enabled
[    0.000000] clocksource: xen: mask: 0xffffffffffffffff max_cycles:
0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000000] installing Xen timer for CPU 0
[    0.000000] tsc: Unable to calibrate against PIT
[    0.000000] tsc: No reference (HPET/PMTIMER) available
[    0.000000] tsc: Detected 2597.018 MHz processor
[    0.004000] Calibrating delay loop (skipped), value calculated using
timer frequency.. 5194.03 BogoMIPS (lpj=10388072)
[    0.004000] pid_max: default: 32768 minimum: 301
[    0.004000] Security Framework initialized
[    0.004000] Yama: disabled by default; enable with sysctl kernel.yama.*
[    0.004000] AppArmor: AppArmor disabled by boot time parameter
[    0.004000] Dentry cache hash table entries: 262144 (order: 9, 2097152
bytes)
[    0.004000] Inode-cache hash table entries: 131072 (order: 8, 1048576
bytes)
[    0.004000] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.004000] Mountpoint-cache hash table entries: 4096 (order: 3, 32768
bytes)
[    0.004000] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[    0.004000] ENERGY_PERF_BIAS: View and update with
x86_energy_perf_policy(8)
[    0.004000] CPU: Physical Processor ID: 0
[    0.004000] CPU: Processor Core ID: 0
[    0.004000] mce: CPU supports 2 MCE banks
[    0.004000] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
[    0.004000] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[    0.004000] Spectre V2 : Mitigation: Full generic retpoline
[    0.004000] Spectre V2 : Spectre v2 mitigation: Enabling Indirect Branch
Prediction Barrier
[    0.004000] Spectre V2 : Enabling Restricted Speculation for firmware
calls
[    0.004000] Speculative Store Bypass: Vulnerable
[    0.051616] Freeing SMP alternatives memory: 24K
[    0.057710] ftrace: allocating 25269 entries in 99 pages
[    0.072061] cpu 0 spinlock event irq 1
[    0.072071] smpboot: Max logical packages: 1
[    0.072078] VPMU disabled by hypervisor.
[    0.072093] Performance Events: unsupported p6 CPU model 63 no PMU
driver, software events only.
[    0.072602] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.072610] NMI watchdog: Shutting down hard lockup detector on all cpus
[    0.072624] x86: Booted up 1 node, 1 CPUs
[    0.072761] devtmpfs: initialized
[    0.072813] x86/mm: Memory block size: 128MB
[    0.074028] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.074045] futex hash table entries: 256 (order: 2, 16384 bytes)
[    0.074075] pinctrl core: initialized pinctrl subsystem
[    0.074165] NET: Registered protocol family 16
[    0.074176] xen:grant_table: Grant tables using version 1 layout
[    0.074195] Grant table initialized
[    0.074377] PCI: setting up Xen PCI frontend stub
[    0.074377] ACPI: Interpreter disabled.
[    0.074377] xen:balloon: Initialising balloon driver
[    0.076045] xen_balloon: Initialising balloon driver
[    0.076053] vgaarb: loaded
[    0.076068] dmi: Firmware registration failed.
[    0.076106] PCI: System does not support PCI
[    0.076111] PCI: System does not support PCI
[    0.076237] clocksource: Switched to clocksource xen
[    0.081278] VFS: Disk quotas dquot_6.6.0
[    0.081294] VFS: Dquot-cache hash table entries: 512 (order 0, 4096
bytes)
[    0.081315] hugetlbfs: disabling because there are no supported hugepage
sizes
[    0.081343] pnp: PnP ACPI: disabled
[    0.082398] NET: Registered protocol family 2
[    0.082534] TCP established hash table entries: 16384 (order: 5, 131072
bytes)
[    0.082606] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[    0.082654] TCP: Hash tables configured (established 16384 bind 16384)
[    0.082689] UDP hash table entries: 1024 (order: 3, 32768 bytes)
[    0.082708] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
[    0.082750] NET: Registered protocol family 1
[    0.082788] Unpacking initramfs...
[    0.123386] Freeing initrd memory: 58972K
[    0.123786] general protection fault: 0000 [#1] SMP
[    0.123792] Modules linked in:
[    0.123799] CPU: 0 PID: 30 Comm: modprobe Not tainted 4.9.0-7-amd64 #1
Debian 4.9.110-1
[    0.123807] task: ffff880078ad7000 task.stack: ffffc90040498000
[    0.123812] RIP: e030:[<ffffffff81614d4d>]  [<ffffffff81614d4d>]
ret_from_fork+0x2d/0x70
[    0.123824] RSP: e02b:ffffc9004049bf50  EFLAGS: 00010006
[    0.123829] RAX: 0000000493ef5000 RBX: ffffffff8108e9d0 RCX:
ffffea0001ec61df
[    0.123835] RDX: 0000000000000002 RSI: 0000000000000002 RDI:
ffffc9004049bf58
[    0.123841] RBP: 0000000000000000 R08: 0000000000000000 R09:
ffff880078adc000
[    0.124009] R10: 8080808080808080 R11: fefefefefefefeff R12:
ffff88007ceb7a00
[    0.124009] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000000
[    0.124009] FS:  0000000000000000(0000) GS:ffff88007f600000(0000)
knlGS:0000000000000000
[    0.124009] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.124009] CR2: 00007ffd13e9e9b9 CR3: 0000000078af4000 CR4:
0000000000042660
[    0.124009] Stack:
[    0.124009]  0000000000000000 0000000000000000 0000000000000000
0000000000000000
[    0.124009]  0000000000000000 0000000000000000 0000000000000000
0000000000000000
[    0.124009]  0000000000000000 0000000000000000 0000000000000000
0000000000000000
[    0.124009] Call Trace:
[    0.124009] Code: c7 e8 b8 fe a8 ff 48 85 db 75 2f 48 89 e7 e8 5b ed 9e
ff 50 90 0f 20 d8 65 48 0b 04 25 e0 02 01 00 78 08 65 88 04 25 e7 02 01 00
<0f> 22 d8 58 66 0f 1f 44 00 00 e9 c1 07 00 00 4c 89 e7 eb 11 e8
[    0.124009] RIP  [<ffffffff81614d4d>] ret_from_fork+0x2d/0x70
[    0.124009]  RSP <ffffc9004049bf50>
[    0.124009] ---[ end trace e2ff95a7e079b5b5 ]---

Did I miss something?

Thanks for your help.

Best regards.

Benoît

On Mon, 16 Jul 2018 at 19:28, Hans van Kranenburg <h...@knorrie.org>
wrote:

> Reportedly, adding pti=off to the kernel boot parameters will work
> around the issue for now.
>
> Turning off pti in the guest kernel is done in any case for PV. The
> issue between 4.9.107 and 4.9.111 affects the detection and turning off
> of pti, that's why forcing it off helps.
>
> In 4.9.112 it's fixed in commit 1adc34adc3447c34926994b87db5d929f5ab45b5
> "x86/cpu: Re-apply forced caps every time CPU caps are re-read"
>
> Hans
>
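
For reference, a minimal sketch of how the quoted workaround could be passed
to a PV guest, assuming the guest kernel command line is built from the
`extra` line of the cfg file. The boot log above shows
"Command line: root=/dev/xvda2 ro elevator=noop" and
"Kernel/User page tables isolation: enabled", so pti=off does not appear to
have reached this guest kernel. The placement of pti=off in `extra` is an
assumption for illustration, not a confirmed fix:

```
kernel      = '/boot/vmlinuz-4.9.0-7-amd64'
# assumption: pti=off appended to the existing guest kernel parameters
extra       = 'elevator=noop pti=off'
ramdisk     = '/boot/initrd.img-4.9.0-7-amd64'
```

After restarting the guest, the "Command line:" entry in its boot log should
then include pti=off.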
