On Thu, Aug 18, 2016 at 04:35:44PM +0100, One Thousand Gnomes wrote:
> On Thu, 18 Aug 2016 05:12:54 -0600
> "Jan Beulich" <[email protected]> wrote:
>
> > >>> On 18.08.16 at 12:16, <[email protected]> wrote:
> > > On 18/08/16 11:06, Jan Beulich wrote:
> > >>>>> On 17.08.16 at 22:32, <[email protected]> wrote:
> > >>> Looking at the kernel it assumes that WB is ok for 640KB->1MB.
> > >>> The comment says:
> > >>> " /* Low ISA region is always mapped WB in page table. No need to
> > >>> track */"
> > >> As per above it's not clear to me what this comment is backed by.
> > >
> > > This states what is in the pagetables.  Not the combined result with
> > > MTRRs.
> > >
> > > WB in the pagetables and WC/UB in the MTRRs is a legal combination which
> > > functions correctly.
> >
> > True, but then again - haven't I been told multiple times that Linux
> > nowadays prefers to run without using MTRRs?
>
> The BIOS sets up the fixed MTRR registers for the 640K-1MB window. Those
> are separate to the variable range MTRR registers used for main memory
> with specific mappings for segments A000 to BFFF then C000-C7FF /
> C800-CFFF / etc up to FFFF.

OK, so BIOS-inherited.

Looking at the Intel SDM (figure 11-7), if the MTRR is UC for that
range, then pagetable entries of either UC or WB are fine.

Except Linux's use of the quirk (is_untracked_pat_range) ends up always
requesting WB.

And to combat the splat, the patch:

From 5209635f23786fb88cf0ce77719da8acda63bf65 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <[email protected]>
Date: Fri, 19 Aug 2016 11:06:44 -0400
Subject: [PATCH] x86/xen: Add x86_platform.is_untracked_pat_range quirk to
 ignore ISA regions.

On x86 whenever VMAs are set up, the 'is_ISA_range' quirk (which this
patch overrides) is used to figure out whether to ignore the requested
PAT type and always use WB (see 'reserve_memtype'). Specifically it
forces the WB type for any region in the ISA space.

From the Intel SDM, the combination of MTRR (UC, which is set up by the
BIOS) and PAT (UC or WB) for the ISA region ends up with the same
value - UC.

However on Xen, due to XSA 154 we enforce that _ANY_ pagetable entries
mapping an MMIO range MUST have the same cacheability - and in this
case we enforce UC.

Which means that with XSA 154 (and without this patch) any application
that maps /dev/mem to get SMBIOS information (like mcelog), and pokes
in the ISA region, will not have a PTE set. That is due to
reserve_pfn_range returning -EINVAL, which results in the PTE not being
set.

[These are debug entries added in 'reserve_pfn_range']
mcelog:2471 0xf0000->0xf1000, req_type=write-back new_type=write-back
mcelog:2471 0xeb000->0xed000, req_type=write-back new_type=write-back
..

The above are successful ones, but:

mcelog:2471 0xeb000->0xed000, req_type=uncached new_type=uncached
[again, a debug one:]
mcelog:2471 want=uncached got=write-back strict 0x000eb000-0x000ecfff
mcelog:2471 map pfn expected mapping type uncached for [mem 0x000eb000-0x000ecfff], got write-back
 ------------[ cut here ]------------
 [<ffffffff816c66f0>] dump_stack+0x63/0x83
 [<ffffffff81084745>] warn_slowpath_common+0x95/0xe0
 [<ffffffff810847aa>] warn_slowpath_null+0x1a/0x20
 [<ffffffff810725f3>] untrack_pfn+0x93/0xc0
 [<ffffffff811b90f9>] unmap_single_vma+0xa9/0x100
 [<ffffffff811b9644>] unmap_vmas+0x54/0xa0
 [<ffffffff811bf0da>] exit_mmap+0x9a/0x150
 [<ffffffff810825d3>] mmput+0x73/0x110
 [<ffffffff81082775>] dup_mm+0x105/0x110
 [<ffffffff81083b1d>] copy_process+0x11ed/0x1240
 [<ffffffff81084009>] do_fork+0x79/0x280
 [<ffffffff810259d3>] ? syscall_trace_enter_phase1+0x153/0x180
 [<ffffffff81084226>] SyS_clone+0x16/0x20
 [<ffffffff816cb3ee>] system_call_fastpath+0x12/0x71

results in that splat.

The effective result of the function below is for 'reserve_memtype'
to ignore the result from the 'x86_platform.is_untracked_pat_range'
quirk. Which means that the splat above does not happen.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
 arch/x86/xen/enlighten.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 8ffb089..3238d04 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -283,6 +283,27 @@ static void __init xen_banner(void)
 	       version >> 16, version & 0xffff, extra.extraversion,
 	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
 }
+
+/*
+ * On x86 whenever VMAs are set up, the 'is_ISA_range' quirk (which we
+ * override below) is used to figure out whether to ignore the
+ * requested PAT type and always use WB (see 'reserve_memtype').
+ *
+ * The combination of MTRR (UC) and PAT (UC or WB) for the ISA region ends
+ * up with the same value - UC.
+ *
+ * However on Xen, due to XSA 154 we enforce that mappings to _ANY_ MMIO
+ * range MUST have the same cacheability mapping - and in this case
+ * we enforce UC for everything.
+ *
+ * The effective result of the function below is for 'reserve_memtype'
+ * to ignore the result from 'x86_platform.is_untracked_pat_range' quirk.
+ */
+static bool xen_ignore(u64 s, u64 e)
+{
+	return false;
+}
+
 /* Check if running on Xen version (major, minor) or later */
 bool
 xen_running_on_version_or_later(unsigned int major, unsigned int minor)
@@ -1730,6 +1751,8 @@ asmlinkage __visible void __init xen_start_kernel(void)
 		x86_init.mpparse.get_smp_config = x86_init_uint_noop;
 		xen_boot_params_init_edd();
+
+		x86_platform.is_untracked_pat_range = xen_ignore;
 	}
 #ifdef CONFIG_PCI
 	/* PCI BIOS service won't work from a PV guest. */
-- 
2.5.5

>
> Alan
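
For anyone following along, here is a rough userspace model of the
decision the commit message describes. It is only a sketch and not the
kernel code: the names (model_reserve_memtype, is_isa_range_model,
ignore_model, the cache_mode enum) are made up for illustration, and
the only thing taken from the thread is the behaviour: if the
is_untracked_pat_range hook says "untracked", the request is forced to
WB, otherwise the requested type is honoured. The 0xa0000-0x100000
window is assumed to be the ISA range (640K-1MB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's page cache modes. */
enum cache_mode { MODE_WB, MODE_UC };

/* Assumed bounds of the legacy ISA window (640K-1MB). */
#define ISA_START_ADDR 0xa0000ULL
#define ISA_END_ADDR   0x100000ULL

static_assert(ISA_START_ADDR < ISA_END_ADDR, "ISA window must be non-empty");

/* Shape of the x86_platform.is_untracked_pat_range hook: [start, end). */
typedef bool (*untracked_fn)(uint64_t start, uint64_t end);

/* Model of the default quirk: the range lies entirely in the ISA window. */
static bool is_isa_range_model(uint64_t s, uint64_t e)
{
	return s >= ISA_START_ADDR && e <= ISA_END_ADDR;
}

/* Model of the patch's xen_ignore(): nothing is ever untracked. */
static bool ignore_model(uint64_t s, uint64_t e)
{
	(void)s;
	(void)e;
	return false;
}

/*
 * Simplified model of the early exit in reserve_memtype: an untracked
 * range is always handed WB, whatever the caller asked for.  Anything
 * else simply gets the requested type here (the real function also
 * tracks and checks conflicts, which this sketch leaves out).
 */
static enum cache_mode model_reserve_memtype(untracked_fn untracked,
					     uint64_t start, uint64_t end,
					     enum cache_mode req)
{
	if (untracked(start, end))
		return MODE_WB;
	return req;
}
```

With the default hook, a UC request for 0xeb000-0xed000 comes back as
WB, which is the want=uncached got=write-back mismatch in the debug
output. With the always-false hook the same request stays UC.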

