Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

2012-07-30 Thread Konrad Rzeszutek Wilk
On Mon, Jul 30, 2012 at 03:58:02PM +0100, Stefano Stabellini wrote:
> On Fri, 27 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jul 27, 2012 at 12:06:27PM +0100, Stefano Stabellini wrote:
> > > On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > > > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > > > gets turned on:
> > > > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > > > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at 
> > > > [8800fb43d000-8800ff43cfff]
> > > > 
> > > > which is OK if we had PCI devices, but not if we did not. In a PV
> > > > guest the SWIOTLB ends up asking the hypervisor for precious lowmem,
> > > > and 64MB of it per guest. On a 32GB machine, this limits the
> > > > number of 4GB guests that can be started, due to lowmem exhaustion.
> > > > 
> > > > What we do is detect whether the user supplied the e820_hole=1
> > > > parameter, which is used to construct an E820 that is similar to
> > > > the machine's, so that the PCI regions do not overlap with RAM regions.
> > > > We check for that by looking at the E820 and seeing if it diverges
> > > > from the standard, and if it does not (and if iommu=soft was not
> > > > turned on), we disable the pci_swiotlb_detect_4gb check.
> > > 
> > > What kind of parameter is it?
> > > Is it a Linux cmdline parameter? Or maybe a Xen toolstack parameter?
> > 
> > It's a guest config option.
> 
> Is this option turned on by default if the VM config file contains one
> or more PCI devices statically assigned to the VM?

I think we debated it at some point but never came to an agreement. I did
show that it would not negatively impact older guests, except that
they would lose some big swaths of memory (they don't release the
memory pages for E820 I/O regions).
> 
> If this option is not specified, is it going to be impossible to
> dynamically pass through a PCI device after the VM is booted?

Well, I thought about this over the weekend and cooked up some new
patches that turn Xen-SWIOTLB on (if it hasn't been turned on already) when
Xen PCI detects that there are devices to be passed in. Testing them now.
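A rough userspace model of that approach, under the assumption (not taken from the patches) that enabling Xen-SWIOTLB late boils down to swapping a single DMA ops pointer the first time the frontend reports a device. Every name below is hypothetical, and the sketch deliberately ignores what would happen to DMA mappings created before the switch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a set of DMA operations. */
struct dma_ops_model {
	const char *name;
};

static const struct dma_ops_model nommu_ops = { "nommu" };
static const struct dma_ops_model xen_swiotlb_ops = { "xen-swiotlb" };

/* Hypothetical global standing in for the active DMA ops pointer;
 * the guest boots without any bounce buffering. */
static const struct dma_ops_model *current_dma_ops = &nommu_ops;
static bool xen_swiotlb_enabled;

/* Called by a (hypothetical) pcifront probe with the number of devices
 * it found. Switches to Xen-SWIOTLB at most once and returns true only
 * for the call that performed the switch. */
static bool pcifront_devices_found(size_t ndevices)
{
	if (ndevices == 0 || xen_swiotlb_enabled)
		return false;
	/* In a real kernel this is where the bounce buffer would be set up. */
	xen_swiotlb_enabled = true;
	current_dma_ops = &xen_swiotlb_ops;
	return true;
}
```

Guests without passed-through devices never pay for the bounce buffer in this model; the cost moves to the moment a device actually shows up.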

> 
> 
> > > Surely there must be a better way to let Linux know if this parameter has
> > > been turned on than looking for ACPI entries in the E820.
> > 
> > I am open to suggestions. The best way I can think of is to have
> > some early_init variant of XenBus-detect-this-backend-parameter. Can
> > one unhook an "old" XenBus and re-initialize it with the full-fledged
> > XenBus init later on?
> 
> Assuming that the xen swiotlb is only useful for PCI passthrough devices
> in PV guests, we could write a few wrappers for the current xen_swiotlb
> functions like this:
> 
> void *xen_swiotlb_alloc_coherent_new(..)
> {
> 	if (xen_initial_domain() ||
> 	    (xen_pv_domain() && a_pci_device_is_assigned()))
> 		return xen_swiotlb_alloc_coherent(..);
> 	else
> 		return (void *)__get_free_pages(..);
> }
> 
> Do you think it would work?
> This way it would be far more flexible.

So I had a brain-fart when I wrote these patches. When a PV guest is booted
with more than 4GB, the SWIOTLB that gets turned on is the *native* one,
not the Xen-SWIOTLB. The impact is that we don't do any of the swizzling of
memory below 4GB, but instead just end up wasting 64MB in a PV guest.
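For reference, the numbers in the boot log quoted at the top of the thread check out: the range 0xfb43d000-0xff43cfff is exactly 64MB and lies entirely below the 4GB (guest-physical) boundary. A trivial sketch; the helper names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size in bytes of an inclusive address range [start, end]. */
static uint64_t range_size(uint64_t start, uint64_t end_inclusive)
{
	return end_inclusive - start + 1;
}

/* True if the last byte of the range is below 4GB (the 32-bit DMA limit). */
static bool ends_below_4gb(uint64_t end_inclusive)
{
	return end_inclusive < (UINT64_C(1) << 32);
}
```

For the logged range, `range_size(0xfb43d000, 0xff43cfff)` is 0x4000000 bytes, i.e. 64 * 1024 * 1024.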

The fix for that is actually pretty simple:

From c5846a207249d7c072dccbec6850e5dbf0971c40 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Date: Fri, 27 Jul 2012 20:16:00 -0400
Subject: [PATCH 7/9] xen/swiotlb: With more than 4GB on 64-bit, disable the
 native SWIOTLB.

If a PV guest is booted, the native SWIOTLB should not be
turned on. It does not help us (we don't have any PCI devices)
and it eats 64MB of good memory. In the case of PV guests
with PCI devices we need the Xen-SWIOTLB one.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
 arch/x86/xen/pci-swiotlb-xen.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index b6a5340..2f8cc57 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -8,6 +8,11 @@
 #include <xen/xen.h>
 #include <asm/iommu_table.h>
 
+#ifdef CONFIG_X86_64
+#include <asm/iommu.h>
+#include <asm/dma.h>
+#endif
+
 int xen_swiotlb __read_mostly;
 
 static struct dma_map_ops xen_swiotlb_dma_ops = {
@@ -49,6 +54,14 @@ int __init pci_xen_swiotlb_detect(void)
 	 * the 'swiotlb' flag is the only one turning it on. */
 	swiotlb = 0;
 
+#ifdef CONFIG_X86_64
+	/* pci_swiotlb_detect_4gb turns on the native SWIOTLB if no_iommu == 0
+	 * (i.e. no iommu=X was given on the command line). So disable the
+	 * native SWIOTLB. */
+	if (max_pfn > MAX_DMA32_PFN)
+		no_iommu = 1;
+#endif
 	return xen_swiotlb;
 }
 
-- 
1.7.7.6
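
The interaction the patch relies on can be modelled in userspace. This is a simplified sketch, not kernel code: it assumes 4KB pages and assumes pci_swiotlb_detect_4gb() reduces to the `!no_iommu && max_pfn > MAX_DMA32_PFN` test described in the patch comment. The two function names are made up for illustration.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12 /* assumption: 4KB pages, as on x86 */
/* Page frame number at the 4GB (32-bit DMA) boundary. */
#define MAX_DMA32_PFN (1UL << (32 - PAGE_SHIFT))

/* Simplified model of the native 4GB check on x86-64: bounce buffering
 * is enabled when there is RAM above 4GB and no_iommu is not set. */
static bool native_swiotlb_enabled(unsigned long max_pfn, bool no_iommu)
{
	return !no_iommu && max_pfn > MAX_DMA32_PFN;
}

/* Model of the patch: a PV guest with RAM above 4GB forces no_iommu,
 * so the native check above never fires. */
static bool xen_pv_sets_no_iommu(unsigned long max_pfn, bool no_iommu)
{
	if (max_pfn > MAX_DMA32_PFN)
		no_iommu = true;
	return no_iommu;
}
```

The patch sets no_iommu rather than relying on the earlier `swiotlb = 0`, since, per its comment, the native 4GB check only looks at no_iommu and max_pfn.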


The next part is to deal with the user forgetting to pass in 'iommu=soft'
when doing PCI passthrough for a PV guest. This "forgetting" part is quite
annoying since it seems to happen to me all the time, so I think that users
are more likely to forget it too.

Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

2012-07-30 Thread Stefano Stabellini
On Fri, 27 Jul 2012, Konrad Rzeszutek Wilk wrote:
> On Fri, Jul 27, 2012 at 12:06:27PM +0100, Stefano Stabellini wrote:
> > On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > > gets turned on:
> > > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at 
> > > [8800fb43d000-8800ff43cfff]
> > > 
> > > which is OK if we had PCI devices, but not if we did not. In a PV
> > > guest the SWIOTLB ends up asking the hypervisor for precious lowmem,
> > > and 64MB of it per guest. On a 32GB machine, this limits the
> > > number of 4GB guests that can be started, due to lowmem exhaustion.
> > > 
> > > What we do is detect whether the user supplied the e820_hole=1
> > > parameter, which is used to construct an E820 that is similar to
> > > the machine's, so that the PCI regions do not overlap with RAM regions.
> > > We check for that by looking at the E820 and seeing if it diverges
> > > from the standard, and if it does not (and if iommu=soft was not
> > > turned on), we disable the pci_swiotlb_detect_4gb check.
> > 
> > What kind of parameter is it?
> > Is it a Linux cmdline parameter? Or maybe a Xen toolstack parameter?
> 
> It's a guest config option.

Is this option turned on by default if the VM config file contains one
or more PCI devices statically assigned to the VM?

If this option is not specified, is it going to be impossible to
dynamically pass through a PCI device after the VM is booted?


> > Surely there must be a better way to let Linux know if this parameter has
> > been turned on than looking for ACPI entries in the E820.
> 
> I am open to suggestions. The best way I can think of is to have
> some early_init variant of XenBus-detect-this-backend-parameter. Can
> one unhook an "old" XenBus and re-initialize it with the full-fledged
> XenBus init later on?

Assuming that the xen swiotlb is only useful for PCI passthrough devices
in PV guests, we could write a few wrappers for the current xen_swiotlb
functions like this:

void *xen_swiotlb_alloc_coherent_new(..)
{
	if (xen_initial_domain() ||
	    (xen_pv_domain() && a_pci_device_is_assigned()))
		return xen_swiotlb_alloc_coherent(..);
	else
		return (void *)__get_free_pages(..);
}

Do you think it would work?
This way it would be far more flexible.
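
The decision in that wrapper is a pure function of three predicates, so it can be checked in isolation. A userspace sketch, where the struct fields are hypothetical stand-ins for xen_initial_domain(), xen_pv_domain() and the proposed a_pci_device_is_assigned() helper; the enum only labels which allocator the wrapper would call.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three predicates used by the wrapper. */
struct guest_state {
	bool initial_domain;      /* xen_initial_domain() */
	bool pv_domain;           /* xen_pv_domain() */
	bool pci_device_assigned; /* proposed a_pci_device_is_assigned() */
};

enum alloc_path {
	ALLOC_XEN_SWIOTLB, /* go through xen_swiotlb_alloc_coherent() */
	ALLOC_PLAIN_PAGES, /* fall back to __get_free_pages() */
};

/* Mirrors the wrapper's condition. Because it is evaluated on every
 * allocation, a device assigned after boot changes the outcome. */
static enum alloc_path choose_alloc_path(const struct guest_state *g)
{
	if (g->initial_domain || (g->pv_domain && g->pci_device_assigned))
		return ALLOC_XEN_SWIOTLB;
	return ALLOC_PLAIN_PAGES;
}
```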
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

2012-07-30 Thread Jan Beulich
>>> On 27.07.12 at 19:54, Konrad Rzeszutek Wilk <kon...@darnok.org> wrote:
> On Fri, Jul 27, 2012 at 08:27:39AM +0100, Jan Beulich wrote:
>> >>> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
>> >>> wrote:
>> > +  /* Check if the user supplied the e820_hole parameter
>> > +   * which would create a machine looking E820 region. */
>> > +  for (i = 0; i < e820.nr_map; i++) {
>> > +  if ((e820.map[i].type == E820_ACPI) ||
>> > +  (e820.map[i].type == E820_NVS))
>> > +  return true;
>> 
>> Tying this decision to the presence of ACPI regions in E820 is
>> problematic for two reasons imo: For one, it precludes cleaning
>> up this (bogus!) construct where it gets produced (PV DomU-s
>> really shouldn't ever see such E820 entries, they should get
>> converted to simple reserved entries, to wipe any notion of
>> ACPI presence). And second, it ties you to running on systems
>> that actually have ACPI, whereas it is my rudimentary
>> understanding that systems with e.g. SFI would not have any
>> ACPI.
> 
> Right. The other idea was to check the XenBus for the existence
> of a vpci backend. But at this stage XenBus is not up yet.
> 
> Perhaps what I should check for is the existence of two E820_RSV
> and two E820_RAM regions, which is what a normal PV guest has.
> Anything that is outside of that scope would be considered
> a PCI PV guest?

I'd limit this to two RAM and at least one reserved region (after
all it could happen that all the reserved ones can be folded into
one). But beyond this minor detail that's the approach I'd prefer.
All the ones below look more or less fragile.
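
A userspace sketch of that heuristic, under the reading that a plain PV guest has exactly two RAM regions, at least one reserved region and no other entry types, and that anything else is treated as a machine-like (e820_hole) map. The function name is hypothetical; E820_RESERVED is the usual name for what the thread calls E820_RSV, and the type values follow the standard E820 numbering.

```c
#include <stdbool.h>
#include <stddef.h>

/* Standard E820 entry type numbering. */
enum e820_type { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4 };

struct e820_entry {
	unsigned long long addr;
	unsigned long long size;
	enum e820_type type;
};

/* Returns true for a "plain" PV guest map: exactly two RAM regions,
 * at least one reserved region, and nothing else. Any other layout
 * (extra RAM chunks, ACPI or NVS entries) suggests an e820_hole guest. */
static bool e820_looks_like_plain_pv(const struct e820_entry *map, size_t n)
{
	size_t ram = 0, reserved = 0;

	for (size_t i = 0; i < n; i++) {
		if (map[i].type == E820_RAM)
			ram++;
		else if (map[i].type == E820_RESERVED)
			reserved++;
		else
			return false; /* ACPI, NVS or any unexpected type */
	}
	return ram == 2 && reserved >= 1;
}
```

Unlike the ACPI-based check, this does not depend on the host having ACPI at all, which addresses the SFI concern.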

Jan

> The other thought I had was to skip this check altogether and
> either do:
> 1). Initialize SWIOTLB when xen-pcifront starts up and detects
> that it has devices (so a later initialization, similar to
> how IA64 does it), but I am not sure how the PCI-DMA works
> with these late bloomers (especially as one could just make
> xen-pcifront a module).
> 2). If xen-pcifront starts and does not detect any backends,
> it calls swiotlb_free. But that also requires the PCI-DMA
> to swap in the dma_ops, and I am not entirely sure how
> that would work out.
> 3). Have an "early_init" xen-pcifront component that does
> a quick XenBus init (similar to how hvmloader checks for
> DMI overwrites) and, if it finds vpci, declares that it is
> time to turn SWIOTLB on.
> 4). The other thing is to wrap this code with something like
> this:
> 
> #ifdef CONFIG_SWIOTLB
> #ifdef CONFIG_XEN_PCI_FRONTEND
> 	/* ... do the early XenBus check as outlined in 3). */
> #else /* PCI_FRONTEND is not present, so we won't need SWIOTLB */
> 	swiotlb = 0;
> 	no_iommu = 1;
> #endif
> #endif
> 
> That would take care of the built-in issues.



Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

2012-07-27 Thread Konrad Rzeszutek Wilk
On Fri, Jul 27, 2012 at 12:06:27PM +0100, Stefano Stabellini wrote:
> On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > gets turned on:
> > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at 
> > [8800fb43d000-8800ff43cfff]
> > 
> > which is OK if we had PCI devices, but not if we did not. In a PV
> > guest the SWIOTLB ends up asking the hypervisor for precious lowmem,
> > and 64MB of it per guest. On a 32GB machine, this limits the
> > number of 4GB guests that can be started, due to lowmem exhaustion.
> > 
> > What we do is detect whether the user supplied the e820_hole=1
> > parameter, which is used to construct an E820 that is similar to
> > the machine's, so that the PCI regions do not overlap with RAM regions.
> > We check for that by looking at the E820 and seeing if it diverges
> > from the standard, and if it does not (and if iommu=soft was not
> > turned on), we disable the pci_swiotlb_detect_4gb check.
> 
> What kind of parameter is it?
> Is it a Linux cmdline parameter? Or maybe a Xen toolstack parameter?

It's a guest config option.

> 
> Surely there must be a better way to let Linux know if this parameter has
> been turned on than looking for ACPI entries in the E820.

I am open to suggestions. The best way I can think of is to have
some early_init variant of XenBus-detect-this-backend-parameter. Can
one unhook an "old" XenBus and re-initialize it with the full-fledged
XenBus init later on?


Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

2012-07-27 Thread Konrad Rzeszutek Wilk
On Fri, Jul 27, 2012 at 08:27:39AM +0100, Jan Beulich wrote:
> >>> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> >>> wrote:
> > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > gets turned on:
> > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at 
> > [8800fb43d000-8800ff43cfff]
> > 
> > which is OK if we had PCI devices, but not if we did not. In a PV
> > guest the SWIOTLB ends up asking the hypervisor for precious lowmem,
> > and 64MB of it per guest. On a 32GB machine, this limits the
> > number of 4GB guests that can be started, due to lowmem exhaustion.
> > 
> > What we do is detect whether the user supplied the e820_hole=1
> > parameter, which is used to construct an E820 that is similar to
> > the machine's, so that the PCI regions do not overlap with RAM regions.
> > We check for that by looking at the E820 and seeing if it diverges
> > from the standard, and if it does not (and if iommu=soft was not
> > turned on), we disable the pci_swiotlb_detect_4gb check.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> > ---
> >  arch/x86/xen/pci-swiotlb-xen.c |   26 ++
> >  1 files changed, 26 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> > index 967633a..56f373e 100644
> > --- a/arch/x86/xen/pci-swiotlb-xen.c
> > +++ b/arch/x86/xen/pci-swiotlb-xen.c
> > @@ -8,6 +8,10 @@
> >  #include <xen/xen.h>
> >  #include <asm/iommu_table.h>
> >  
> > +#include <asm/e820.h>
> > +#include <asm/dma.h>
> > +#include <asm/iommu.h>
> > +
> >  int xen_swiotlb __read_mostly;
> >  
> >  static struct dma_map_ops xen_swiotlb_dma_ops = {
> > @@ -24,7 +28,19 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
> > .unmap_page = xen_swiotlb_unmap_page,
> > .dma_supported = xen_swiotlb_dma_supported,
> >  };
> > +bool __init e820_has_acpi(void)
> > +{
> > +   int i;
> >  
> > +   /* Check if the user supplied the e820_hole parameter
> > +* which would create a machine looking E820 region. */
> > +   for (i = 0; i < e820.nr_map; i++) {
> > +   if ((e820.map[i].type == E820_ACPI) ||
> > +   (e820.map[i].type == E820_NVS))
> > +   return true;
> 
> Tying this decision to the presence of ACPI regions in E820 is
> problematic for two reasons imo: For one, it precludes cleaning
> up this (bogus!) construct where it gets produced (PV DomU-s
> really shouldn't ever see such E820 entries, they should get
> converted to simple reserved entries, to wipe any notion of
> ACPI presence). And second, it ties you to running on systems
> that actually have ACPI, whereas it is my rudimentary
> understanding that systems with e.g. SFI would not have any
> ACPI.

Right. The other idea was to check the XenBus for the existence
of a vpci backend. But at this stage XenBus is not up yet.

Perhaps what I should check for is the existence of two E820_RSV
and two E820_RAM regions, which is what a normal PV guest has.
Anything that is outside of that scope would be considered
a PCI PV guest?

The other thought I had was to skip this check altogether and
either do:
1). Initialize SWIOTLB when xen-pcifront starts up and detects
that it has devices (so a later initialization, similar to
how IA64 does it), but I am not sure how the PCI-DMA works
with these late bloomers (especially as one could just make
xen-pcifront a module).
2). If xen-pcifront starts and does not detect any backends,
it calls swiotlb_free. But that also requires the PCI-DMA
to swap in the dma_ops, and I am not entirely sure how
that would work out.
3). Have an "early_init" xen-pcifront component that does
a quick XenBus init (similar to how hvmloader checks for
DMI overwrites) and, if it finds vpci, declares that it is
time to turn SWIOTLB on.
4). The other thing is to wrap this code with something like
this:

#ifdef CONFIG_SWIOTLB
#ifdef CONFIG_XEN_PCI_FRONTEND
	/* ... do the early XenBus check as outlined in 3). */
#else /* PCI_FRONTEND is not present, so we won't need SWIOTLB */
	swiotlb = 0;
	no_iommu = 1;
#endif
#endif

That would take care of the built-in issues.


Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

2012-07-27 Thread Jan Beulich
>>> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
> If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> gets turned on:
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at 
> [8800fb43d000-8800ff43cfff]
> 
> which is OK if we had PCI devices, but not if we did not. In a PV
> guest the SWIOTLB ends up asking the hypervisor for precious lowmem,
> and 64MB of it per guest. On a 32GB machine, this limits the
> number of 4GB guests that can be started, due to lowmem exhaustion.
> 
> What we do is detect whether the user supplied the e820_hole=1
> parameter, which is used to construct an E820 that is similar to
> the machine's, so that the PCI regions do not overlap with RAM regions.
> We check for that by looking at the E820 and seeing if it diverges
> from the standard, and if it does not (and if iommu=soft was not
> turned on), we disable the pci_swiotlb_detect_4gb check.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> ---
>  arch/x86/xen/pci-swiotlb-xen.c |   26 ++
>  1 files changed, 26 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 967633a..56f373e 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -8,6 +8,10 @@
>  #include <xen/xen.h>
>  #include <asm/iommu_table.h>
>  
> +#include <asm/e820.h>
> +#include <asm/dma.h>
> +#include <asm/iommu.h>
> +
>  int xen_swiotlb __read_mostly;
>  
>  static struct dma_map_ops xen_swiotlb_dma_ops = {
> @@ -24,7 +28,19 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>   .unmap_page = xen_swiotlb_unmap_page,
>   .dma_supported = xen_swiotlb_dma_supported,
>  };
> +bool __init e820_has_acpi(void)
> +{
> + int i;
>  
> + /* Check if the user supplied the e820_hole parameter
> +  * which would create a machine looking E820 region. */
> + for (i = 0; i < e820.nr_map; i++) {
> + if ((e820.map[i].type == E820_ACPI) ||
> + (e820.map[i].type == E820_NVS))
> + return true;

Tying this decision to the presence of ACPI regions in E820 is
problematic for two reasons imo: For one, it precludes cleaning
up this (bogus!) construct where it gets produced (PV DomU-s
really shouldn't ever see such E820 entries, they should get
converted to simple reserved entries, to wipe any notion of
ACPI presence). And second, it ties you to running on systems
that actually have ACPI, whereas it is my rudimentary
understanding that systems with e.g. SFI would not have any
ACPI.

Jan

> + }
> + return false;
> +}
>  /*
>   * pci_xen_swiotlb_detect - set xen_swiotlb to 1 if necessary
>   *
> @@ -33,7 +49,17 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>   */
>  int __init pci_xen_swiotlb_detect(void)
>  {
> +#ifdef CONFIG_X86_64
>  
> + /* Having more than 4GB triggers the native SWIOTLB to activate.
> +  * The way to turn it off is to set no_iommu. */
> + printk(KERN_INFO "swiotlb: %d\n", swiotlb);
> + if (xen_pv_domain() && !swiotlb && max_pfn > MAX_DMA32_PFN) {
> + /* Normal PV guests only have E820_RSV and E820_RAM regions */
> + if (!e820_has_acpi())
> + no_iommu = 1;
> + }
> +#endif
>   /* If running as PV guest, either iommu=soft, or swiotlb=force will
>* activate this IOMMU. If running as PV privileged, activate it
>* irregardless.
> -- 
> 1.7.7.6
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org 
> http://lists.xen.org/xen-devel 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on 4GB, don't turn it on.

2012-07-27 Thread Jan Beulich
>>> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
> If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> gets turned on:
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at
> [8800fb43d000-8800ff43cfff]
>
> which is OK if we had PCI devices, but not if we did not. In a PV
> guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> memory - and 64MB of it per guest. On a 32GB machine, this limits the
> amount of guests that are 4GB to start due to lowmem exhaustion.
>
> What we do is detect whether the user supplied e820_hole=1
> parameter, which is used to construct an E820 that is similar to
> the machine  - so that the PCI regions do not overlap with RAM regions.
> We check for that by looking at the E820 and seeing if it diverges
> from the standard - and if so (and if iommu=soft was not turned on),
> we disable the check pci_swiotlb_detect_4gb code.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> ---
>  arch/x86/xen/pci-swiotlb-xen.c |   26 ++
>  1 files changed, 26 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 967633a..56f373e 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -8,6 +8,10 @@
>  #include <xen/xen.h>
>  #include <asm/iommu_table.h>
>
> +#include <asm/e820.h>
> +#include <asm/dma.h>
> +#include <asm/iommu.h>
> +
>  int xen_swiotlb __read_mostly;
>
>  static struct dma_map_ops xen_swiotlb_dma_ops = {
> @@ -24,7 +28,19 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>   .unmap_page = xen_swiotlb_unmap_page,
>   .dma_supported = xen_swiotlb_dma_supported,
>  };
> +bool __init e820_has_acpi(void)
> +{
> + int i;
>
> + /* Check if the user supplied the e820_hole parameter
> +  * which would create a machine looking E820 region. */
> + for (i = 0; i < e820.nr_map; i++) {
> + if ((e820.map[i].type == E820_ACPI) ||
> + (e820.map[i].type == E820_NVS))
> + return true;

Tying this decision to the presence of ACPI regions in E820 is
problematic for two reasons imo: For one, it precludes cleaning
up this (bogus!) construct where it gets produced (PV DomU-s
really shouldn't ever see such E820 entries, they should get
converted to simple reserved entries, to wipe any notion of
ACPI presence). And second, it ties you to running on systems
that actually have ACPI, whereas it is my rudimentary
understanding that systems with e.g. SFI would not have any
ACPI.

Jan

> + }
> + return false;
> +}
>  /*
>   * pci_xen_swiotlb_detect - set xen_swiotlb to 1 if necessary
>   *
> @@ -33,7 +49,17 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>   */
>  int __init pci_xen_swiotlb_detect(void)
>  {
> +#ifdef CONFIG_X86_64
>
> + /* Having more than 4GB triggers the native SWIOTLB to activate.
> +  * The way to turn it off is to set no_iommu. */
> + printk(KERN_INFO "swiotlb: %d\n", swiotlb);
> + if (xen_pv_domain() && !swiotlb && max_pfn > MAX_DMA32_PFN) {
> + /* Normal PV guests only have E820_RSV and E820_RAM regions */
> + if (!e820_has_acpi())
> + no_iommu = 1;
> + }
> +#endif
>   /* If running as PV guest, either iommu=soft, or swiotlb=force will
>    * activate this IOMMU. If running as PV privileged, activate it
>    * irregardless.
> --
> 1.7.7.6
>
>
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> http://lists.xen.org/xen-devel





Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

2012-07-27 Thread Konrad Rzeszutek Wilk
On Fri, Jul 27, 2012 at 08:27:39AM +0100, Jan Beulich wrote:
> >>> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
> > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > gets turned on:
> > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at
> > [8800fb43d000-8800ff43cfff]
> >
> > which is OK if we had PCI devices, but not if we did not. In a PV
> > guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> > memory - and 64MB of it per guest. On a 32GB machine, this limits the
> > amount of guests that are 4GB to start due to lowmem exhaustion.
> >
> > What we do is detect whether the user supplied e820_hole=1
> > parameter, which is used to construct an E820 that is similar to
> > the machine  - so that the PCI regions do not overlap with RAM regions.
> > We check for that by looking at the E820 and seeing if it diverges
> > from the standard - and if so (and if iommu=soft was not turned on),
> > we disable the check pci_swiotlb_detect_4gb code.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> > ---
> >  arch/x86/xen/pci-swiotlb-xen.c |   26 ++
> >  1 files changed, 26 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> > index 967633a..56f373e 100644
> > --- a/arch/x86/xen/pci-swiotlb-xen.c
> > +++ b/arch/x86/xen/pci-swiotlb-xen.c
> > @@ -8,6 +8,10 @@
> >  #include <xen/xen.h>
> >  #include <asm/iommu_table.h>
> >
> > +#include <asm/e820.h>
> > +#include <asm/dma.h>
> > +#include <asm/iommu.h>
> > +
> >  int xen_swiotlb __read_mostly;
> >
> >  static struct dma_map_ops xen_swiotlb_dma_ops = {
> > @@ -24,7 +28,19 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
> >   .unmap_page = xen_swiotlb_unmap_page,
> >   .dma_supported = xen_swiotlb_dma_supported,
> >  };
> > +bool __init e820_has_acpi(void)
> > +{
> > +   int i;
> >
> > +   /* Check if the user supplied the e820_hole parameter
> > +    * which would create a machine looking E820 region. */
> > +   for (i = 0; i < e820.nr_map; i++) {
> > +   if ((e820.map[i].type == E820_ACPI) ||
> > +   (e820.map[i].type == E820_NVS))
> > +   return true;
>
> Tying this decision to the presence of ACPI regions in E820 is
> problematic for two reasons imo: For one, it precludes cleaning
> up this (bogus!) construct where it gets produced (PV DomU-s
> really shouldn't ever see such E820 entries, they should get
> converted to simple reserved entries, to wipe any notion of
> ACPI presence). And second, it ties you to running on systems
> that actually have ACPI, whereas it is my rudimentary
> understanding that systems with e.g. SFI would not have any
> ACPI.

Right. The other idea was to check XenBus for the existence of a
vpci backend. But at this stage it is not up yet.

Perhaps what I should check for is the existence of two E820_RSV
and two E820_RAM regions - and that would be a normal PV guest.
Anything that is outside of that scope would be considered
a PCI PV guest?

The other thought I had was to skip this check altogether and
do one of:
1). Initialize SWIOTLB when xen-pcifront starts up and detects
that it has devices (i.e. later initialization, similar to
how IA64 does it) - but I am not sure how PCI-DMA works
with these late bloomers (especially as one could just make
xen-pcifront a module).
2). If xen-pcifront starts and does not detect any backends,
it calls swiotlb_free. But that also requires PCI-DMA
to swap in the dma_ops, and I am not entirely sure how
that would work out.
3). Have an early_init xen-pcifront component that does a
quick XenBus init (similar to how hvmloader checks for
DMI overwrites) and, if it finds vpci, declare it is
time to turn SWIOTLB on.
4). The other thing is to wrap this code with something like
this:

#ifdef CONFIG_SWIOTLB
#ifdef CONFIG_XEN_PCI_FRONTEND
	if (...) do the check as outlined in 3).
#else /* PCI_FRONTEND is not present, so we won't need SWIOTLB */
	swiotlb = 0;
	no_iommu = 1;
#endif
#endif

That would take care of the built-in issues.


Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

2012-07-27 Thread Konrad Rzeszutek Wilk
On Fri, Jul 27, 2012 at 12:06:27PM +0100, Stefano Stabellini wrote:
> On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > gets turned on:
> > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at
> > [8800fb43d000-8800ff43cfff]
> >
> > which is OK if we had PCI devices, but not if we did not. In a PV
> > guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> > memory - and 64MB of it per guest. On a 32GB machine, this limits the
> > amount of guests that are 4GB to start due to lowmem exhaustion.
> >
> > What we do is detect whether the user supplied e820_hole=1
> > parameter, which is used to construct an E820 that is similar to
> > the machine  - so that the PCI regions do not overlap with RAM regions.
> > We check for that by looking at the E820 and seeing if it diverges
> > from the standard - and if so (and if iommu=soft was not turned on),
> > we disable the check pci_swiotlb_detect_4gb code.
>
> What kind of parameter is it?
> Is it a Linux cmdline parameter? Or maybe a Xen toolstack parameter?

It's a guest config option.

 
> Surely there must be a better way to let Linux know if this parameter has
> been turned on than looking for ACPI entries in the E820.

I am open to suggestions. The best way I can think of is to have
some early_init variant of XenBus-detect-this-backend-parameter. Can
one unhook that early XenBus and re-initialize with the full-fledged
XenBus init later on?