Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.
On Mon, Jul 30, 2012 at 03:58:02PM +0100, Stefano Stabellini wrote:
> On Fri, 27 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jul 27, 2012 at 12:06:27PM +0100, Stefano Stabellini wrote:
> > > On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > > > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > > > gets turned on:
> > > > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > > > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at
> > > > [8800fb43d000-8800ff43cfff]
> > > >
> > > > which is OK if we had PCI devices, but not if we did not. In a PV
> > > > guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> > > > memory - and 64MB of it per guest. On a 32GB machine, this limits the
> > > > number of 4GB guests that can be started due to lowmem exhaustion.
> > > >
> > > > What we do is detect whether the user supplied the e820_hole=1
> > > > parameter, which is used to construct an E820 that is similar to
> > > > the machine - so that the PCI regions do not overlap with RAM regions.
> > > > We check for that by looking at the E820 and seeing if it diverges
> > > > from the standard - and if so (and if iommu=soft was not turned on),
> > > > we disable the pci_swiotlb_detect_4gb check.
> > >
> > > What kind of parameter is it?
> > > Is it a Linux cmdline parameter? Or maybe a Xen toolstack parameter?
> >
> > It's a guest config option.
>
> Is this option turned on by default if the VM config file contains one
> or more PCI devices statically assigned to the VM?

I think we debated it at some point but never came to agreement. I did
show that it would not negatively impact older guests - except that they
would lose some big swaths of memory (they don't release the memory
pages for E820 I/O regions).

>
> If this option is not specified, is it going to be impossible to
> dynamically pass through a PCI device after the VM is booted?

Well, I thought about this over the weekend and cooked up some new
patches that turn Xen-SWIOTLB on (if it hasn't been turned on already)
when Xen PCI detects that there are devices to be passed in. Testing
it now.

> > >
> > > Surely there must be a better way to let Linux know if this parameter has
> > > been turned on than looking for ACPI entries in the E820.
> >
> > I am all open for suggestions. The best way I can think of is to have
> > some early_init variant of XenBus-detect-this-backend-parameter. Can
> > one unhook an "old" XenBus and reset with the full-fledged XenBus
> > init later on?
>
> Assuming that the xen swiotlb is only useful for PCI passthrough devices
> in PV guests, we could write a few wrappers for the current xen_swiotlb
> functions like this:
>
> xen_swiotlb_alloc_coherent_new(..)
> {
> 	if (xen_initial_domain() || (xen_pv_domain() &&
> 	    a_pci_device_is_assigned()))
> 		return xen_swiotlb_alloc_coherent();
> 	else
> 		return __get_free_pages();
> }
>
> Do you think it would work?
> This way it would be far more flexible.

So I had a brain-fart when I wrote these patches. When a PV guest is
booted with more than 4GB, the SWIOTLB that gets turned on is the
*native* one, not the Xen-SWIOTLB. The impact is that we don't do any
of the swizzling of memory below 4GB, but instead just end up wasting
64MB in a PV guest.

The fix for that is actually pretty simple:

From c5846a207249d7c072dccbec6850e5dbf0971c40 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Date: Fri, 27 Jul 2012 20:16:00 -0400
Subject: [PATCH 7/9] xen/swiotlb: With more than 4GB on 64-bit, disable the native SWIOTLB.

If a PV guest is booted the native SWIOTLB should not be turned
on. It does not help us (we don't have any PCI devices) and
it eats 64MB of good memory. In the case of PV guests with PCI
devices we need the Xen-SWIOTLB one.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
 arch/x86/xen/pci-swiotlb-xen.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index b6a5340..2f8cc57 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -8,6 +8,11 @@
 #include <xen/xen.h>
 #include <asm/iommu_table.h>
 
+#ifdef CONFIG_X86_64
+#include <asm/iommu.h>
+#include <asm/dma.h>
+#endif
+
 int xen_swiotlb __read_mostly;
 
 static struct dma_map_ops xen_swiotlb_dma_ops = {
@@ -49,6 +54,14 @@ int __init pci_xen_swiotlb_detect(void)
 	 * the 'swiotlb' flag is the only one turning it on. */
 	swiotlb = 0;
+#ifdef CONFIG_X86_64
+	/* pci_swiotlb_detect_4gb turns native SWIOTLB if no_iommu == 0
+	 * (so no iommu=X command line over-writes). So disable the native
+	 * SWIOTLB. */
+	if (max_pfn > MAX_DMA32_PFN)
+		no_iommu = 1;
+#endif
 	return xen_swiotlb;
 }
-- 
1.7.7.6

The next part is to deal with the user forgetting to pass in 'iommu=soft'
when doing PCI passthrough for a PV guest. This "forgetting" part is quite
annoying since it seems to happen to me all the time, so I think that
users are more likely to forget it too.
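For illustration, a minimal userspace model of the condition the patch adds, plus a check of the bounce-buffer size in the boot log quoted above. This is not kernel code: PAGE_SHIFT and MAX_DMA32_PFN are local stand-ins assuming 4 KiB pages (as on x86-64), and disables_native_swiotlb() and range_bytes() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in: x86-64 uses 4 KiB pages, so PAGE_SHIFT is 12. */
#define PAGE_SHIFT 12
/* Page frame number of the 4 GiB boundary (0x100000 with 4 KiB pages). */
#define MAX_DMA32_PFN ((uint64_t)1 << (32 - PAGE_SHIFT))

/* Same test as the patch: with RAM past 4 GiB, pci_swiotlb_detect_4gb
 * would enable the native SWIOTLB, so the patch sets no_iommu = 1. */
static bool disables_native_swiotlb(uint64_t max_pfn)
{
	return max_pfn > MAX_DMA32_PFN;
}

/* Size in bytes of an inclusive [start, end] physical range, as printed
 * in the "software IO TLB [mem start-end]" boot message. */
static uint64_t range_bytes(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}
```

With 4 KiB pages MAX_DMA32_PFN is 0x100000, so range_bytes(0xfb43d000, 0xff43cfff) comes out at exactly 64 MiB, all of it below the 4GB boundary. A guest whose max_pfn is exactly 0x100000 leaves no_iommu untouched, while any guest with RAM past 4GB has it forced to 1.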
Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.
On Fri, 27 Jul 2012, Konrad Rzeszutek Wilk wrote:
> On Fri, Jul 27, 2012 at 12:06:27PM +0100, Stefano Stabellini wrote:
> > On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > > gets turned on:
> > > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at
> > > [8800fb43d000-8800ff43cfff]
> > >
> > > which is OK if we had PCI devices, but not if we did not. In a PV
> > > guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> > > memory - and 64MB of it per guest. On a 32GB machine, this limits the
> > > number of 4GB guests that can be started due to lowmem exhaustion.
> > >
> > > What we do is detect whether the user supplied the e820_hole=1
> > > parameter, which is used to construct an E820 that is similar to
> > > the machine - so that the PCI regions do not overlap with RAM regions.
> > > We check for that by looking at the E820 and seeing if it diverges
> > > from the standard - and if so (and if iommu=soft was not turned on),
> > > we disable the pci_swiotlb_detect_4gb check.
> >
> > What kind of parameter is it?
> > Is it a Linux cmdline parameter? Or maybe a Xen toolstack parameter?
>
> It's a guest config option.

Is this option turned on by default if the VM config file contains one
or more PCI devices statically assigned to the VM?

If this option is not specified, is it going to be impossible to
dynamically pass through a PCI device after the VM is booted?

> > Surely there must be a better way to let Linux know if this parameter has
> > been turned on than looking for ACPI entries in the E820.
>
> I am all open for suggestions. The best way I can think of is to have
> some early_init variant of XenBus-detect-this-backend-parameter. Can
> one unhook an "old" XenBus and reset with the full-fledged XenBus
> init later on?

Assuming that the xen swiotlb is only useful for PCI passthrough devices
in PV guests, we could write a few wrappers for the current xen_swiotlb
functions like this:

xen_swiotlb_alloc_coherent_new(..)
{
	if (xen_initial_domain() || (xen_pv_domain() &&
	    a_pci_device_is_assigned()))
		return xen_swiotlb_alloc_coherent();
	else
		return __get_free_pages();
}

Do you think it would work?
This way it would be far more flexible.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
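A minimal userspace model of the dispatch in the wrapper proposal, reduced to the decision alone. The struct fields stand in for xen_initial_domain(), xen_pv_domain() and the hypothetical a_pci_device_is_assigned() from the proposal; struct guest_state, the enum, and choose_alloc_path() are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the predicates used in the proposal. */
struct guest_state {
	bool initial_domain; /* xen_initial_domain(): running as dom0 */
	bool pv_domain;      /* xen_pv_domain(): paravirtualized guest */
	bool pci_assigned;   /* a_pci_device_is_assigned() (hypothetical) */
};

enum coherent_alloc_path {
	ALLOC_XEN_SWIOTLB,   /* call xen_swiotlb_alloc_coherent() */
	ALLOC_PLAIN_PAGES,   /* call __get_free_pages() directly */
};

/* Same predicate as the proposed xen_swiotlb_alloc_coherent_new(). */
static enum coherent_alloc_path choose_alloc_path(const struct guest_state *g)
{
	assert(g != NULL);
	if (g->initial_domain || (g->pv_domain && g->pci_assigned))
		return ALLOC_XEN_SWIOTLB;
	return ALLOC_PLAIN_PAGES;
}
```

As written, dom0 always takes the Xen-SWIOTLB path, a PV guest takes it only once a device is assigned, and an HVM guest with a passthrough device falls through to the plain allocation path, which matches the proposal's assumption that the Xen SWIOTLB only matters for PV guests.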
Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.
>>> On 27.07.12 at 19:54, Konrad Rzeszutek Wilk <kon...@darnok.org> wrote:
> On Fri, Jul 27, 2012 at 08:27:39AM +0100, Jan Beulich wrote:
>> >>> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
>> >>> wrote:
>> > +	/* Check if the user supplied the e820_hole parameter
>> > +	 * which would create a machine looking E820 region. */
>> > +	for (i = 0; i < e820.nr_map; i++) {
>> > +		if ((e820.map[i].type == E820_ACPI) ||
>> > +		    (e820.map[i].type == E820_NVS))
>> > +			return true;
>>
>> Tying this decision to the presence of ACPI regions in E820 is
>> problematic for two reasons imo: For one, it precludes cleaning
>> up this (bogus!) construct where it gets produced (PV DomU-s
>> really shouldn't ever see such E820 entries, they should get
>> converted to simple reserved entries, to wipe any notion of
>> ACPI presence). And second it ties you to running on systems
>> that actually have ACPI, whereas it is my rudimentary
>> understanding that systems with e.g. SFI would not have any
>> ACPI).
>
> Right. The other idea was to check the XenBus for the existence
> of a vpci backend. But at this stage it is not up yet.
>
> Perhaps what I should check for is the existence of two E820_RSV
> and two E820_RAM regions - and that would be a normal PV guest.
> Anything that is outside of that scope would be considered
> a PCI PV guest?

I'd limit this to two RAM regions and at least one reserved region
(after all it could happen that all the reserved ones get folded
into one). But beyond this minor detail that's the approach I'd
prefer. All the ones below look more or less fragile.

Jan

> The other thought I had was to skip this check altogether and
> either do:
> 1). initialize SWIOTLB when xen-pcifront starts up and detects
>     that it has devices (so later on initialization - similar to
>     how IA64 does it) - but I am not sure how the PCI-DMA works
>     with these late bloomers (especially as one could just make
>     xen-pcifront be a module).
> 2). If xen-pcifront starts and does not detect any backends
>     it calls swiotlb_free. But that also requires the PCI-DMA
>     to swap in the dma_ops, and I am not entirely sure how
>     that would work out.
> 3). Have an "early_init" xen-pcifront component that does a
>     quick XenBus init (similar to how hvmloader checks for
>     DMI overwrites) and if it finds vpci then declare it's
>     time to turn SWIOTLB on.
> 4). The other thing is to wrap this code with something like
>     this:
>
> #ifdef CONFIG_SWIOTLB
> #ifdef CONFIG_XEN_PCI_FRONTEND
> 	if (.. blah blah) do the check as outlined in 3).
> #else // PCI_FRONTEND is not present, so we won't need SWIOTLB
> 	swiotlb = 0;
> 	iommu = 1;
> #endif
> #endif
>
> That would take care of the built-in issues.
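A userspace sketch of the E820 heuristic settled on above, assuming "two RAM and at least one reserved region" means exactly two RAM entries, one or more reserved entries, and no entries of any other type. struct e820_entry, the enum, and looks_like_plain_pv_guest() are local stand-ins for illustration, not the kernel's e820 definitions; the type codes follow the standard E820 numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard E820 type codes (1 = RAM, 2 = reserved, 3 = ACPI, 4 = NVS). */
enum e820_type { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4 };

/* Local stand-in for one E820 map entry. */
struct e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* A plain PV guest: exactly two RAM regions, at least one reserved
 * region, nothing else. Any other layout is taken as a machine-like
 * E820 (e820_hole=1), i.e. a guest that may get PCI passthrough. */
static bool looks_like_plain_pv_guest(const struct e820_entry *map, size_t nr)
{
	size_t ram = 0, reserved = 0, other = 0;

	assert(map != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		switch (map[i].type) {
		case E820_RAM:
			ram++;
			break;
		case E820_RESERVED:
			reserved++;
			break;
		default:
			other++;
			break;
		}
	}
	return ram == 2 && reserved >= 1 && other == 0;
}
```

Counting reserved entries with `>= 1` rather than `== 2` is what tolerates the reserved ranges being folded into one, and the check does not rely on ACPI or NVS entries being present, which addresses both objections raised against the original patch.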
Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.
On Fri, Jul 27, 2012 at 12:06:27PM +0100, Stefano Stabellini wrote:
> On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > gets turned on:
> > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at
> > [8800fb43d000-8800ff43cfff]
> >
> > which is OK if we had PCI devices, but not if we did not. In a PV
> > guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> > memory - and 64MB of it per guest. On a 32GB machine, this limits the
> > number of 4GB guests that can be started due to lowmem exhaustion.
> >
> > What we do is detect whether the user supplied the e820_hole=1
> > parameter, which is used to construct an E820 that is similar to
> > the machine - so that the PCI regions do not overlap with RAM regions.
> > We check for that by looking at the E820 and seeing if it diverges
> > from the standard - and if so (and if iommu=soft was not turned on),
> > we disable the pci_swiotlb_detect_4gb check.
>
> What kind of parameter is it?
> Is it a Linux cmdline parameter? Or maybe a Xen toolstack parameter?

It's a guest config option.

>
> Surely there must be a better way to let Linux know if this parameter has
> been turned on than looking for ACPI entries in the E820.

I am all open for suggestions. The best way I can think of is to have
some early_init variant of XenBus-detect-this-backend-parameter. Can
one unhook an "old" XenBus and reset with the full-fledged XenBus
init later on?
Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.
On Fri, Jul 27, 2012 at 08:27:39AM +0100, Jan Beulich wrote:
> >>> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> >>> wrote:
> > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > gets turned on:
> > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at
> > [8800fb43d000-8800ff43cfff]
> >
> > which is OK if we had PCI devices, but not if we did not. In a PV
> > guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> > memory - and 64MB of it per guest. On a 32GB machine, this limits the
> > number of 4GB guests that can be started due to lowmem exhaustion.
> >
> > What we do is detect whether the user supplied the e820_hole=1
> > parameter, which is used to construct an E820 that is similar to
> > the machine - so that the PCI regions do not overlap with RAM regions.
> > We check for that by looking at the E820 and seeing if it diverges
> > from the standard - and if so (and if iommu=soft was not turned on),
> > we disable the pci_swiotlb_detect_4gb check.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> > ---
> >  arch/x86/xen/pci-swiotlb-xen.c |   26 ++
> >  1 files changed, 26 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> > index 967633a..56f373e 100644
> > --- a/arch/x86/xen/pci-swiotlb-xen.c
> > +++ b/arch/x86/xen/pci-swiotlb-xen.c
> > @@ -8,6 +8,10 @@
> >  #include <xen/xen.h>
> >  #include <asm/iommu_table.h>
> >
> > +#include <asm/e820.h>
> > +#include <asm/dma.h>
> > +#include <asm/iommu.h>
> > +
> >  int xen_swiotlb __read_mostly;
> >
> >  static struct dma_map_ops xen_swiotlb_dma_ops = {
> > @@ -24,7 +28,19 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
> >  	.unmap_page = xen_swiotlb_unmap_page,
> >  	.dma_supported = xen_swiotlb_dma_supported,
> >  };
> > +bool __init e820_has_acpi(void)
> > +{
> > +	int i;
> >
> > +	/* Check if the user supplied the e820_hole parameter
> > +	 * which would create a machine looking E820 region. */
> > +	for (i = 0; i < e820.nr_map; i++) {
> > +		if ((e820.map[i].type == E820_ACPI) ||
> > +		    (e820.map[i].type == E820_NVS))
> > +			return true;
>
> Tying this decision to the presence of ACPI regions in E820 is
> problematic for two reasons imo: For one, it precludes cleaning
> up this (bogus!) construct where it gets produced (PV DomU-s
> really shouldn't ever see such E820 entries, they should get
> converted to simple reserved entries, to wipe any notion of
> ACPI presence). And second it ties you to running on systems
> that actually have ACPI, whereas it is my rudimentary
> understanding that systems with e.g. SFI would not have any
> ACPI).

Right. The other idea was to check the XenBus for the existence
of a vpci backend. But at this stage it is not up yet.

Perhaps what I should check for is the existence of two E820_RSV
and two E820_RAM regions - and that would be a normal PV guest.
Anything that is outside of that scope would be considered
a PCI PV guest?

The other thought I had was to skip this check altogether and
either do:
1). initialize SWIOTLB when xen-pcifront starts up and detects
    that it has devices (so later on initialization - similar to
    how IA64 does it) - but I am not sure how the PCI-DMA works
    with these late bloomers (especially as one could just make
    xen-pcifront be a module).
2). If xen-pcifront starts and does not detect any backends
    it calls swiotlb_free. But that also requires the PCI-DMA
    to swap in the dma_ops, and I am not entirely sure how
    that would work out.
3). Have an "early_init" xen-pcifront component that does a
    quick XenBus init (similar to how hvmloader checks for
    DMI overwrites) and if it finds vpci then declare it's
    time to turn SWIOTLB on.
4). The other thing is to wrap this code with something like
    this:

#ifdef CONFIG_SWIOTLB
#ifdef CONFIG_XEN_PCI_FRONTEND
	if (.. blah blah) do the check as outlined in 3).
#else // PCI_FRONTEND is not present, so we won't need SWIOTLB
	swiotlb = 0;
	iommu = 1;
#endif
#endif

That would take care of the built-in issues.
Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.
>>> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
> If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> gets turned on:
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at
> [8800fb43d000-8800ff43cfff]
>
> which is OK if we had PCI devices, but not if we did not. In a PV
> guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> memory - and 64MB of it per guest. On a 32GB machine, this limits the
> number of 4GB guests that can be started due to lowmem exhaustion.
>
> What we do is detect whether the user supplied the e820_hole=1
> parameter, which is used to construct an E820 that is similar to
> the machine - so that the PCI regions do not overlap with RAM regions.
> We check for that by looking at the E820 and seeing if it diverges
> from the standard - and if so (and if iommu=soft was not turned on),
> we disable the pci_swiotlb_detect_4gb check.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> ---
>  arch/x86/xen/pci-swiotlb-xen.c |   26 ++
>  1 files changed, 26 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 967633a..56f373e 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -8,6 +8,10 @@
>  #include <xen/xen.h>
>  #include <asm/iommu_table.h>
>
> +#include <asm/e820.h>
> +#include <asm/dma.h>
> +#include <asm/iommu.h>
> +
>  int xen_swiotlb __read_mostly;
>
>  static struct dma_map_ops xen_swiotlb_dma_ops = {
> @@ -24,7 +28,19 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>  	.unmap_page = xen_swiotlb_unmap_page,
>  	.dma_supported = xen_swiotlb_dma_supported,
>  };
> +bool __init e820_has_acpi(void)
> +{
> +	int i;
>
> +	/* Check if the user supplied the e820_hole parameter
> +	 * which would create a machine looking E820 region. */
> +	for (i = 0; i < e820.nr_map; i++) {
> +		if ((e820.map[i].type == E820_ACPI) ||
> +		    (e820.map[i].type == E820_NVS))
> +			return true;

Tying this decision to the presence of ACPI regions in E820 is
problematic for two reasons imo: For one, it precludes cleaning
up this (bogus!) construct where it gets produced (PV DomU-s
really shouldn't ever see such E820 entries, they should get
converted to simple reserved entries, to wipe any notion of
ACPI presence). And second it ties you to running on systems
that actually have ACPI, whereas it is my rudimentary
understanding that systems with e.g. SFI would not have any
ACPI).

Jan

> +	}
> +	return false;
> +}
>  /*
>   * pci_xen_swiotlb_detect - set xen_swiotlb to 1 if necessary
>   *
> @@ -33,7 +49,17 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>   */
>  int __init pci_xen_swiotlb_detect(void)
>  {
> +#ifdef CONFIG_X86_64
>
> +	/* Having more than 4GB triggers the native SWIOTLB to activate.
> +	 * The way to turn it off is to set no_iommu. */
> +	printk(KERN_INFO "swiotlb: %d\n", swiotlb);
> +	if (xen_pv_domain() && !swiotlb && max_pfn > MAX_DMA32_PFN) {
> +		/* Normal PV guests only have E820_RSV and E820_RAM regions */
> +		if (!e820_has_acpi())
> +			no_iommu = 1;
> +	}
> +#endif
>  	/* If running as PV guest, either iommu=soft, or swiotlb=force will
>  	 * activate this IOMMU. If running as PV privileged, activate it
>  	 * irregardless.
> -- 
> 1.7.7.6
>
>
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> http://lists.xen.org/xen-devel
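To make the objection concrete, a userspace model of the check in the patch under review. The E820 table is passed in explicitly instead of using the kernel's global e820, and struct e820_entry and patch_sets_no_iommu() are local stand-ins; PAGE_SHIFT assumes 4 KiB pages and the type codes follow the standard E820 numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard E820 type codes (1 = RAM, 2 = reserved, 3 = ACPI, 4 = NVS). */
enum { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4 };

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define MAX_DMA32_PFN ((uint64_t)1 << (32 - PAGE_SHIFT))

/* Local stand-in for one E820 map entry. */
struct e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* Userspace version of e820_has_acpi() from the patch: any ACPI or NVS
 * entry is taken as a sign that e820_hole=1 produced a machine-like map. */
static bool e820_has_acpi(const struct e820_entry *map, size_t nr)
{
	assert(map != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		if (map[i].type == E820_ACPI || map[i].type == E820_NVS)
			return true;
	return false;
}

/* The condition the patch adds to pci_xen_swiotlb_detect(): returns
 * whether it would set no_iommu = 1. */
static bool patch_sets_no_iommu(bool pv_domain, int swiotlb, uint64_t max_pfn,
				const struct e820_entry *map, size_t nr)
{
	return pv_domain && !swiotlb && max_pfn > MAX_DMA32_PFN &&
	       !e820_has_acpi(map, nr);
}
```

A machine-like E820 from a host without ACPI (the SFI case raised above) contains no E820_ACPI or E820_NVS entries, so this check would treat it like a plain PV guest and set no_iommu even though PCI passthrough may have been intended.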
Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on 4GB, don't turn it on.
On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
> If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> gets turned on:
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at [8800fb43d000-8800ff43cfff]
>
> which is OK if we had PCI devices, but not if we did not. In a PV
> guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> memory - and 64MB of it per guest. On a 32GB machine, this limits the
> amount of guests that are 4GB to start due to lowmem exhaustion.
>
> What we do is detect whether the user supplied e820_hole=1
> parameter, which is used to construct an E820 that is similar to
> the machine - so that the PCI regions do not overlap with RAM regions.
> We check for that by looking at the E820 and seeing if it diverges
> from the standard - and if so (and if iommu=soft was not turned on),
> we disable the check pci_swiotlb_detect_4gb code.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> ---
>  arch/x86/xen/pci-swiotlb-xen.c |   26 ++
>  1 files changed, 26 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 967633a..56f373e 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -8,6 +8,10 @@
>  #include <xen/xen.h>
>  #include <asm/iommu_table.h>
>
> +#include <asm/e820.h>
> +#include <asm/dma.h>
> +#include <asm/iommu.h>
> +
>  int xen_swiotlb __read_mostly;
>
>  static struct dma_map_ops xen_swiotlb_dma_ops = {
> @@ -24,7 +28,19 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>  	.unmap_page = xen_swiotlb_unmap_page,
>  	.dma_supported = xen_swiotlb_dma_supported,
>  };
> +bool __init e820_has_acpi(void)
> +{
> +	int i;
> +	/* Check if the user supplied the e820_hole parameter
> +	 * which would create a machine looking E820 region.
> +	 */
> +	for (i = 0; i < e820.nr_map; i++) {
> +		if ((e820.map[i].type == E820_ACPI) ||
> +			(e820.map[i].type == E820_NVS))
> +			return true;

Tying this decision to the presence of ACPI regions in E820 is
problematic for two reasons imo: For one, it precludes cleaning up
this (bogus!) construct where it gets produced (PV DomU-s really
shouldn't ever see such E820 entries, they should get converted to
simple reserved entries, to wipe any notion of ACPI presence). And
second it ties you to running on systems that actually have ACPI,
whereas it is my rudimentary understanding that systems with e.g.
SFI would not have any ACPI.

Jan

> +	}
> +	return false;
> +}
>  /*
>   * pci_xen_swiotlb_detect - set xen_swiotlb to 1 if necessary
>   *
> @@ -33,7 +49,17 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>   */
>  int __init pci_xen_swiotlb_detect(void)
>  {
> +#ifdef CONFIG_X86_64
> +	/* Having more than 4GB triggers the native SWIOTLB to activate.
> +	 * The way to turn it off is to set no_iommu. */
> +	printk(KERN_INFO "swiotlb: %d\n", swiotlb);
> +	if (xen_pv_domain() && !swiotlb && max_pfn > MAX_DMA32_PFN) {
> +		/* Normal PV guests only have E820_RSV and E820_RAM regions */
> +		if (!e820_has_acpi())
> +			no_iommu = 1;
> +	}
> +#endif
>  	/* If running as PV guest, either iommu=soft, or swiotlb=force will
>  	 * activate this IOMMU. If running as PV privileged, activate it
>  	 * irregardless.
> -- 
> 1.7.7.6
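The decision the patch adds to pci_xen_swiotlb_detect(), and the 64MB figure in the boot log, can be checked with a small model. Assumptions: 4KB pages (PAGE_SHIFT of 12), so MAX_DMA32_PFN is the page frame number of the 4GB boundary; should_suppress_swiotlb() and range_size() are hypothetical helpers, and their boolean parameters stand in for xen_pv_domain(), the swiotlb flag and e820_has_acpi().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes 4KB pages, as on x86. */
#define PAGE_SHIFT 12
/* Page frame number of the 4GB boundary. */
#define MAX_DMA32_PFN ((4ULL << 30) >> PAGE_SHIFT)

/* Hypothetical model of the new test in pci_xen_swiotlb_detect():
 * returns true when the patch would set no_iommu = 1. */
static bool should_suppress_swiotlb(bool pv_domain, bool swiotlb_requested,
				    uint64_t max_pfn, bool has_acpi)
{
	return pv_domain && !swiotlb_requested &&
	       max_pfn > MAX_DMA32_PFN && !has_acpi;
}

/* Size in bytes of an inclusive [start, end] range as printed in the log. */
static uint64_t range_size(uint64_t start, uint64_t end_inclusive)
{
	assert(end_inclusive >= start);
	return end_inclusive - start + 1;
}
```

The logged range 0xfb43d000-0xff43cfff is inclusive, so its size is 0xff43cfff - 0xfb43d000 + 1 = 0x4000000 bytes, i.e. 64MB, matching the log. A guest whose max_pfn is exactly MAX_DMA32_PFN (4GB worth of pages) is not above the threshold, so the new check leaves it alone.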
Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.
On Fri, Jul 27, 2012 at 08:27:39AM +0100, Jan Beulich wrote:
> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
> > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > gets turned on:
> [...]
> > +bool __init e820_has_acpi(void)
> > +{
> > +	int i;
> > +	/* Check if the user supplied the e820_hole parameter
> > +	 * which would create a machine looking E820 region.
> > +	 */
> > +	for (i = 0; i < e820.nr_map; i++) {
> > +		if ((e820.map[i].type == E820_ACPI) ||
> > +			(e820.map[i].type == E820_NVS))
> > +			return true;
>
> Tying this decision to the presence of ACPI regions in E820 is
> problematic for two reasons imo: For one, it precludes cleaning up
> this (bogus!) construct where it gets produced (PV DomU-s really
> shouldn't ever see such E820 entries, they should get converted to
> simple reserved entries, to wipe any notion of ACPI presence). And
> second it ties you to running on systems that actually have ACPI,
> whereas it is my rudimentary understanding that systems with e.g.
> SFI would not have any ACPI.

Right. The other idea was to check the XenBus for the existence of a
vpci backend. But at this stage it is not up yet.

Perhaps what I should check for is the existence of two E820_RSV and
two E820_RAM regions - and that would be a normal PV guest. Anything
that is outside of that scope would be considered a PCI PV guest?

The other thought I had was to skip this check altogether and either do:

 1). Initialize SWIOTLB when xen-pcifront starts up and detects that it
     has devices (so later on initialization - similar to how IA64 does
     it) - but I am not sure how the PCI-DMA works with these late
     bloomers (especially as one could just make xen-pcifront be a
     module).
 2). If xen-pcifront starts and does not detect any backends it calls
     swiotlb_free. But that also requires the PCI-DMA to swap in the
     dma_ops, and I am not entirely sure how that would work out.
 3). Have an early_init xen-pcifront component that does a quick XenBus
     init (similar to how hvmloader checks for DMI overwrites) and if it
     finds vpci then declare it's time to turn SWIOTLB on.
 4). The other thing is to wrap this code with something like this:

#ifdef CONFIG_SWIOTLB
#ifdef CONFIG_XEN_PCI_FRONTEND
	if (.. blah blah)
		do the check as outlined in 3).
#else
	// PCI_FRONTEND is not present, so we won't need SWIOTLB
	swiotlb = 0;
	iommu = 1;
#endif
#endif

That would take care of the built-in issues.
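Konrad's alternative, treating a map with exactly two RAM and two reserved entries as a normal PV guest, can be sketched the same way. The two-and-two layout is his tentative assumption about what the toolstack produces and is not verified here; looks_like_plain_pv() is a hypothetical helper that looks only at the entry types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* E820 type codes, same values as Linux's asm/e820.h. */
#define E820_RAM      1
#define E820_RESERVED 2

/* Hypothetical classifier: a "normal" PV map is assumed to contain
 * exactly two RAM and two reserved entries and nothing else. */
static bool looks_like_plain_pv(const uint32_t *types, size_t nr_map)
{
	size_t ram = 0, reserved = 0;

	assert(types != NULL || nr_map == 0);
	for (size_t i = 0; i < nr_map; i++) {
		if (types[i] == E820_RAM)
			ram++;
		else if (types[i] == E820_RESERVED)
			reserved++;
		else
			return false; /* ACPI, NVS, unusable, ...: not plain PV */
	}
	return ram == 2 && reserved == 2;
}
```

Unlike the ACPI test, this does not depend on ACPI/NVS entries surviving, so a host-like map whose ACPI entries were rewritten as reserved is still told apart, as long as it has more entries than the plain layout. A PV map with a different number of RAM or reserved regions would be misclassified, so the heuristic is only as good as that layout assumption.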
Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.
On Fri, Jul 27, 2012 at 12:06:27PM +0100, Stefano Stabellini wrote:
> On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > [...]
> > What we do is detect whether the user supplied e820_hole=1
> > parameter, which is used to construct an E820 that is similar to
> > the machine - so that the PCI regions do not overlap with RAM regions.
> > We check for that by looking at the E820 and seeing if it diverges
> > from the standard - and if so (and if iommu=soft was not turned on),
> > we disable the check pci_swiotlb_detect_4gb code.
>
> What kind of parameter is it?
> Is it a Linux cmdline parameter? Or maybe a Xen toolstack parameter?

It's a guest config option.

> Surely there must be a better way to let Linux know if this parameter
> has been turned on than looking for ACPI entries in the E820.

I am all open for suggestions. The best way I can think of is to have
some early_init variant of XenBus-detect-this-backend-parameter.

Can one unhook an old XenBus and reset with the full-fledged XenBus
init later on?