RE: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2009-05-05 Thread Passera, Pablo R
Andrea,
Thanks for your answers. I have already patched the kernel and kvm 
(including rombios). The host boots and the memory mapping is as described 
in the patch. Now I am trying to launch a VM using the memory mapping, but it 
hangs after opening the SDL window and before showing the BIOS messages. I am 
running the qemu command from a console on a host that is running X, and the 
command line is the following:

qemu-system-x86_64 -hda ./dm.img -cdrom /dev/sr0 -m 32 -reserved-ram -boot d

- Is this command line correct?
- Should I run the VM without X started on the host machine?
- What should I see after starting the VM? Should the VM take ownership of the 
video card?

Thanks,
Pablo

-----Original Message-----
From: Andrea Arcangeli [mailto:aarca...@redhat.com]
Sent: Tuesday, April 28, 2009 3:06 PM
To: Passera, Pablo R
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH] reserved-ram for pci-passthrough without VT-d
capable hardware

On Tue, Apr 28, 2009 at 07:35:26AM -0600, Passera, Pablo R wrote:
> - Against which kernel version was this patch generated?

I don't remember exactly (I was just using an upstream hg checkout and
I didn't record its hash value) but I think you can go back to when
e820.c was still shared and it'll likely apply and work.

> - Did you try this on a 32 or 64 bits system?

I only tested it on 64bit but there's no reason why it shouldn't work
on 32bit too.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2009-04-28 Thread Passera, Pablo R
Hello Andrea,

I have applied the patch to kvm and userland, but when I tried to port the 
host kernel patch I noticed that the changes are against the file e820.c. 
However, kernel 2.6.26 has two e820 files, e820_32.c and e820_64.c, and 
most of the changes map onto e820_64.c. So I have a couple of 
questions, if you don't mind:

- Against which kernel version was this patch generated?
- Did you try this on a 32 or 64 bits system?

Thanks,
Pablo

-----Original Message-----
From: Andrea Arcangeli [mailto:aarca...@redhat.com]
Sent: Monday, April 27, 2009 2:43 PM
To: Passera, Pablo R
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH] reserved-ram for pci-passthrough without VT-d
capable hardware

Hello Pablo,

On Mon, Apr 27, 2009 at 11:00:51AM -0600, Passera, Pablo R wrote:
> Andrea,
> We are working with embedded hardware that does not have
> VT-d and we need 1-1 mapping. I wonder which is the status of this
> patch. Have you continued updating it with the latest KVM version?

Sorry to say but it isn't updated to latest KVM and latest
mainline. Porting normally should be easy. I attached last versions.

> Since you mentioned this ;), I take the opportunity to add that those
> embedded usages are the ones that are totally fine with the compile
> time passthrough-guest-ram decision, instead of a boot time
> decision. Those host kernels will likely have RT patches (KVM works
> great with preempt-RT indeed) and in turn the compile time ram
> selection is the least of their problems as you can imagine ;). So you
> can see my patch as an embedded-build option, similar to "Configure
> standard kernel features (for small systems)", and no distro is
> shipping new kernels with that feature on either.
>
> Then if we decide 1:1 should have a larger userbase instead of only the
> people who know what they're doing (i.e. a 1:1 guest can destroy the
> linux-hypervisor) we can always add a bit of strtol parsing to the 16bit
> kernel loader.

Agreed!


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2009-04-28 Thread Andrea Arcangeli
On Tue, Apr 28, 2009 at 07:35:26AM -0600, Passera, Pablo R wrote:
> - Against which kernel version was this patch generated?

I don't remember exactly (I was just using an upstream hg checkout and
I didn't record its hash value) but I think you can go back to when
e820.c was still shared and it'll likely apply and work.

> - Did you try this on a 32 or 64 bits system?

I only tested it on 64bit but there's no reason why it shouldn't work
on 32bit too.


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2009-04-27 Thread Passera, Pablo R
Andrea,
We are working with embedded hardware that does not have VT-d and we 
need 1-1 mapping. I wonder which is the status of this patch. Have you 
continued updating it with the latest KVM version?

Regards,
Pablo

On Wed, Jul 30, 2008 at 05:16:06PM +0300, Dor Laor wrote:
> In addition KVM is used in embedded too and things are slower there, we
> know of a specific use case (production) that demands
> 1:1 mapping and can't use VT-d

Since you mentioned this ;), I take the opportunity to add that those
embedded usages are the ones that are totally fine with the compile
time passthrough-guest-ram decision, instead of a boot time
decision. Those host kernels will likely have RT patches (KVM works
great with preempt-RT indeed) and in turn the compile time ram
selection is the least of their problems as you can imagine ;). So you
can see my patch as an embedded-build option, similar to "Configure
standard kernel features (for small systems)", and no distro is
shipping new kernels with that feature on either.

Then if we decide 1:1 should have a larger userbase instead of only the
people who know what they're doing (i.e. a 1:1 guest can destroy the
linux-hypervisor) we can always add a bit of strtol parsing to the 16bit
kernel loader.




Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2009-04-27 Thread Andrea Arcangeli
Hello Pablo,

On Mon, Apr 27, 2009 at 11:00:51AM -0600, Passera, Pablo R wrote:
> Andrea,
> We are working with embedded hardware that does not have
> VT-d and we need 1-1 mapping. I wonder which is the status of this
> patch. Have you continued updating it with the latest KVM version?

Sorry to say but it isn't updated to latest KVM and latest
mainline. Porting normally should be easy. I attached last versions.

> Since you mentioned this ;), I take the opportunity to add that those
> embedded usages are the ones that are totally fine with the compile
> time passthrough-guest-ram decision, instead of a boot time
> decision. Those host kernels will likely have RT patches (KVM works
> great with preempt-RT indeed) and in turn the compile time ram
> selection is the least of their problems as you can imagine ;). So you
> can see my patch as an embedded-build option, similar to "Configure
> standard kernel features (for small systems)", and no distro is
> shipping new kernels with that feature on either.
>
> Then if we decide 1:1 should have a larger userbase instead of only the
> people who know what they're doing (i.e. a 1:1 guest can destroy the
> linux-hypervisor) we can always add a bit of strtol parsing to the 16bit
> kernel loader.

Agreed!
From: Andrea Arcangeli aarca...@redhat.com

The reserved RAM can be mapped by virtualization software with
/dev/mem to create a 1:1 mapping between guest physical (bus) addresses
and host physical (bus) addresses. This allows PCI passthrough with
DMA for a guest that uses the RAM with the 1:1 mapping. The only detail
to take care of is the RAM marked "reserved RAM failed". The
virtualization software must create for the guest an e820 map that
only includes the "reserved RAM" regions, but if the guest touches
memory at a guest physical address in the "reserved RAM failed" ranges
(a Linux guest will do that even if the RAM isn't present in the e820
map), it should provide that as RAM and map it with a non-linear
mapping. This should allow any Linux kernel to run fine and hopefully
any other OS too.

svm ~ # cat /proc/iomem |head -n 20
00000000-00000fff : reserved RAM failed
00001000-00005fff : reserved RAM
00006000-00007fff : reserved RAM failed
00008000-0009efff : reserved RAM
0009f000-0009ffff : reserved
000cd600-000cffff : pnp 00:0d
000f0000-000fffff : reserved
00100000-0fffffff : reserved RAM
10000000-3dedffff : System RAM
  10000000-10329ab2 : Kernel code
  10329ab3-104933e7 : Kernel data
  104f5000-10558e67 : Kernel bss
3dee0000-3dee2fff : ACPI Non-volatile Storage
3dee3000-3deeffff : ACPI Tables
3def0000-3defffff : reserved
3dff0000-3ffeffff : pnp 00:0d
e0000000-efffffff : reserved
fa000000-fbffffff : PCI Bus #01
  fa000000-fbffffff : 0000:01:05.0
fda00000-fdbfffff : PCI Bus #01
svm ~ # hexdump /dev/mem | grep -C2 '   '
7e0        
*
0001000        
*
0006000 a5a5 a5a5 8ec8 8ed8 8ec0 66d0 06c7 
--
*
0007ff0     3063 1000  
0008000        
*
009f000 0002       
--
00fffe0 6000 3c03 45e7 0184 0500 0082 01c0 0223
000 5bea 00e0 31f0 2f32 3931 302f 0037 12fc
010        
*
1000 8d48 f92d  48ff ed81  1000 8948
^C
svm ~ #

Signed-off-by: Andrea Arcangeli aarca...@redhat.com
---

This is a port to current linux-2.6.git of the previous reserved-ram
patch. Let me know if there's a chance to get this acked and
included. Anything that isn't at compile time would require much
bigger changes just to parse the command line at 16bit realmode time
to know where to relocate the kernel dynamically. Because 1:1 is a
corner-case feature required only by some users, this is the least
intrusive approach. It also has some limits: it can't reserve more
than 1G (2G with a few more changes), but that is fine for a long time
as the virtualized 1:1 guest doesn't need to be huge, just a desktop.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1276,8 +1276,36 @@ config CRASH_DUMP
  (CONFIG_RELOCATABLE=y).
  For more details see Documentation/kdump/kdump.txt
 
+config RESERVE_PHYSICAL_START
+   bool "Reserve all RAM below PHYSICAL_START (EXPERIMENTAL)"
+   depends on !RELOCATABLE && X86_64
+   help
+ This makes the kernel use only RAM above __PHYSICAL_START.
+ All memory below __PHYSICAL_START will be left unused and
+ marked as reserved RAM in /proc/iomem. The few special
+ pages that can't be relocated at addresses above
+ __PHYSICAL_START and that can't be guaranteed to be unused
+ by the running kernel will be marked reserved RAM failed
+ in /proc/iomem. Those may or may be not used by the kernel
+ (for example SMP trampoline pages would only be used if
+ CPU hotplug is enabled).
+
+ The reserved RAM can be mapped by virtualization software
+  

Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-30 Thread Amit Shah
* On Tuesday 29 July 2008 18:47:35 Andi Kleen wrote:
> > I'm not so interested to go there right now, because while this code
> > is useful right now because the majority of systems out there lacks
> > VT-d/iommu, I suspect this code could be nuked in the long
> > run when all systems will ship with that, which is why I kept it all
>
> Actually at least on Intel platforms and if you exclude the lowest end
> VT-d is shipping universally for quite some time now. If you
> buy a Intel box today or bought it in the last year the chances are pretty
> high that it has VT-d support.

I think you mean VT-x, which is virtualization extensions for the x86 
architecture. VT-d is virtualization extensions for devices (IOMMU).
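
For readers following the VT-x/VT-d distinction: on a Linux host, VT-x (or AMD-V) shows up as the vmx (or svm) flag in /proc/cpuinfo, whereas VT-d is a chipset feature that is normally advertised through the ACPI DMAR table rather than through a CPU flag. The following is a minimal sketch in C of the CPU-flag half of that check. It assumes the caller has already read the "flags" line from /proc/cpuinfo; has_cpu_flag is a hypothetical helper name and is not part of the patch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `flag` appears as a whole word in
 * `flags`, e.g. the "flags : fpu vme ... vmx ..." line of /proc/cpuinfo.
 * A plain substring match is not enough, because one flag name can be a
 * prefix of another.
 */
bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return false;

    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at a word boundary... */
        bool left_ok = (p == flags) || p[-1] == ' ' ||
                       p[-1] == '\t' || p[-1] == ':';
        /* ...and end at one. */
        bool right_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

        if (left_ok && right_ok)
            return true;
        p++;
    }
    return false;
}
```

A host that reports vmx here but exposes no DMAR table is the kind of "VT-x without VT-d" system this thread is about.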

Amit


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-30 Thread Andi Kleen
On Wed, Jul 30, 2008 at 11:50:43AM +0530, Amit Shah wrote:
> * On Tuesday 29 July 2008 18:47:35 Andi Kleen wrote:
> > > I'm not so interested to go there right now, because while this code
> > > is useful right now because the majority of systems out there lacks
> > > VT-d/iommu, I suspect this code could be nuked in the long
> > > run when all systems will ship with that, which is why I kept it all
> >
> > Actually at least on Intel platforms and if you exclude the lowest end
> > VT-d is shipping universally for quite some time now. If you
> > buy a Intel box today or bought it in the last year the chances are pretty
> > high that it has VT-d support.
>
> I think you mean VT-x, which is virtualization extensions for the x86 
> architecture. VT-d is virtualization extensions for devices (IOMMU).

No, I really mean VT-d. The modern, not very low-end Intel IOHubs all have it.

-Andi


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-30 Thread Andrea Arcangeli
On Wed, Jul 30, 2008 at 11:50:43AM +0530, Amit Shah wrote:
> * On Tuesday 29 July 2008 18:47:35 Andi Kleen wrote:
> > > I'm not so interested to go there right now, because while this code
> > > is useful right now because the majority of systems out there lacks
> > > VT-d/iommu, I suspect this code could be nuked in the long
> > > run when all systems will ship with that, which is why I kept it all
> >
> > Actually at least on Intel platforms and if you exclude the lowest end
> > VT-d is shipping universally for quite some time now. If you
> > buy a Intel box today or bought it in the last year the chances are pretty
> > high that it has VT-d support.
>
> I think you mean VT-x, which is virtualization extensions for the x86 
> architecture. VT-d is virtualization extensions for devices (IOMMU).

I think Andi understood VT-d right but even if he was right that every
reader of this email that is buying a new VT-x system today is also
almost guaranteed to get a VT-d motherboard (which I disagree unless
you buy some really expensive toy), there are current large
installations of VT-x systems that lacks VT-d and that with recent
current dual/quadcore cpus are very fast and will be used for the next
couple of years and they will not upgrade just the motherboard to use
pci-passthrough.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-30 Thread Dor Laor

Andrea Arcangeli wrote:
> On Wed, Jul 30, 2008 at 11:50:43AM +0530, Amit Shah wrote:
> > * On Tuesday 29 July 2008 18:47:35 Andi Kleen wrote:
> > > > I'm not so interested to go there right now, because while this code
> > > > is useful right now because the majority of systems out there lacks
> > > > VT-d/iommu, I suspect this code could be nuked in the long
> > > > run when all systems will ship with that, which is why I kept it all
> > >
> > > Actually at least on Intel platforms and if you exclude the lowest end
> > > VT-d is shipping universally for quite some time now. If you
> > > buy a Intel box today or bought it in the last year the chances are pretty
> > > high that it has VT-d support.
> >
> > I think you mean VT-x, which is virtualization extensions for the x86 
> > architecture. VT-d is virtualization extensions for devices (IOMMU).
>
> I think Andi understood VT-d right but even if he was right that every
> reader of this email that is buying a new VT-x system today is also
> almost guaranteed to get a VT-d motherboard (which I disagree unless
> you buy some really expensive toy), there are current large
> installations of VT-x systems that lacks VT-d and that with recent
> current dual/quadcore cpus are very fast and will be used for the next
> couple of years and they will not upgrade just the motherboard to use
> pci-passthrough.

In addition, KVM is used in embedded too and things are slower there; we 
know of a specific use case (production) that demands 1:1 mapping and
can't use VT-d.


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-30 Thread Andrea Arcangeli
On Wed, Jul 30, 2008 at 05:16:06PM +0300, Dor Laor wrote:
> In addition KVM is used in embedded too and things are slower there, we 
> know of a specific use case (production) that demands
> 1:1 mapping and can't use VT-d

Since you mentioned this ;), I take the opportunity to add that those
embedded usages are the ones that are totally fine with the compile
time passthrough-guest-ram decision, instead of a boot time
decision. Those host kernels will likely have RT patches (KVM works
great with preempt-RT indeed) and in turn the compile time ram
selection is the least of their problems as you can imagine ;). So you
can see my patch as an embedded-build option, similar to "Configure
standard kernel features (for small systems)", and no distro is
shipping new kernels with that feature on either.

Then if we decide 1:1 should have a larger userbase instead of only the
people who know what they're doing (i.e. a 1:1 guest can destroy the
linux-hypervisor) we can always add a bit of strtol parsing to the 16bit
kernel loader.


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-30 Thread FUJITA Tomonori
On Wed, 30 Jul 2008 15:58:46 +0200
Andrea Arcangeli [EMAIL PROTECTED] wrote:

> On Wed, Jul 30, 2008 at 11:50:43AM +0530, Amit Shah wrote:
> > * On Tuesday 29 July 2008 18:47:35 Andi Kleen wrote:
> > > > I'm not so interested to go there right now, because while this code
> > > > is useful right now because the majority of systems out there lacks
> > > > VT-d/iommu, I suspect this code could be nuked in the long
> > > > run when all systems will ship with that, which is why I kept it all
> > >
> > > Actually at least on Intel platforms and if you exclude the lowest end
> > > VT-d is shipping universally for quite some time now. If you
> > > buy a Intel box today or bought it in the last year the chances are pretty
> > > high that it has VT-d support.
> >
> > I think you mean VT-x, which is virtualization extensions for the x86 
> > architecture. VT-d is virtualization extensions for devices (IOMMU).
>
> I think Andi understood VT-d right but even if he was right that every
> reader of this email that is buying a new VT-x system today is also
> almost guaranteed to get a VT-d motherboard (which I disagree unless
> you buy some really expensive toy), there are current large
> installations of VT-x systems that lacks VT-d and that with recent
> current dual/quadcore cpus are very fast and will be used for the next
> couple of years and they will not upgrade just the motherboard to use
> pci-passthrough.

Today, very inexpensive desktops (for example, Dell OptiPlex 755) have
VT-d support.


[PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-29 Thread Andrea Arcangeli
From: Andrea Arcangeli [EMAIL PROTECTED]

The reserved RAM can be mapped by virtualization software with
/dev/mem to create a 1:1 mapping between guest physical (bus) addresses
and host physical (bus) addresses. This allows PCI passthrough with
DMA for a guest that uses the RAM with the 1:1 mapping. The only detail
to take care of is the RAM marked "reserved RAM failed". The
virtualization software must create for the guest an e820 map that
only includes the "reserved RAM" regions, but if the guest touches
memory at a guest physical address in the "reserved RAM failed" ranges
(a Linux guest will do that even if the RAM isn't present in the e820
map), it should provide that as RAM and map it with a non-linear
mapping. This should allow any Linux kernel to run fine and hopefully
any other OS too.

svm ~ # cat /proc/iomem |head -n 20
00000000-00000fff : reserved RAM failed
00001000-00005fff : reserved RAM
00006000-00007fff : reserved RAM failed
00008000-0009efff : reserved RAM
0009f000-0009ffff : reserved
000cd600-000cffff : pnp 00:0d
000f0000-000fffff : reserved
00100000-0fffffff : reserved RAM
10000000-3dedffff : System RAM
  10000000-10329ab2 : Kernel code
  10329ab3-104933e7 : Kernel data
  104f5000-10558e67 : Kernel bss
3dee0000-3dee2fff : ACPI Non-volatile Storage
3dee3000-3deeffff : ACPI Tables
3def0000-3defffff : reserved
3dff0000-3ffeffff : pnp 00:0d
e0000000-efffffff : reserved
fa000000-fbffffff : PCI Bus #01
  fa000000-fbffffff : 0000:01:05.0
fda00000-fdbfffff : PCI Bus #01
svm ~ # hexdump /dev/mem | grep -C2 '   '
7e0        
*
0001000        
*
0006000 a5a5 a5a5 8ec8 8ed8 8ec0 66d0 06c7 
--
*
0007ff0     3063 1000  
0008000        
*
009f000 0002       
--
00fffe0 6000 3c03 45e7 0184 0500 0082 01c0 0223
000 5bea 00e0 31f0 2f32 3931 302f 0037 12fc
010        
*
1000 8d48 f92d  48ff ed81  1000 8948
^C
svm ~ #
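
The /dev/mem mapping described above could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the code qemu/kvm uses for -reserved-ram: open_phys_mem and map_reserved_range are hypothetical helper names, the caller is assumed to pass a page-aligned start and length taken from a "reserved RAM" line of /proc/iomem, and the host is assumed to let root mmap that range of /dev/mem.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the physical-memory device for mapping.
 * On a real host `path` would be "/dev/mem" and the caller needs root.
 */
int open_phys_mem(const char *path)
{
    return open(path, O_RDWR | O_SYNC);
}

/*
 * Hypothetical helper: map `len` bytes starting at host physical address
 * `start`. With /dev/mem the file offset *is* the physical address, which
 * is what makes a 1:1 guest-physical to host-physical layout possible.
 * Both `start` and `len` are assumed to be page-aligned.
 * Returns NULL on failure.
 */
void *map_reserved_range(int fd, uint64_t start, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)start);

    return p == MAP_FAILED ? NULL : p;
}
```

The returned pointer would then be handed to the hypervisor as the backing for the guest physical range with the same start address; the "reserved RAM failed" ranges need ordinary anonymous memory instead.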

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
---

This is a port to current linux-2.6.git of the previous reserved-ram
patch. Let me know if there's a chance to get this acked and
included. Anything that isn't at compile time would require much
bigger changes just to parse the command line at 16bit realmode time
to know where to relocate the kernel dynamically. Because 1:1 is a
corner-case feature required only by some users, this is the least
intrusive approach. It also has some limits: it can't reserve more
than 1G (2G with a few more changes), but that is fine for a long time
as the virtualized 1:1 guest doesn't need to be huge, just a desktop.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1276,8 +1276,36 @@ config CRASH_DUMP
  (CONFIG_RELOCATABLE=y).
  For more details see Documentation/kdump/kdump.txt
 
+config RESERVE_PHYSICAL_START
+   bool "Reserve all RAM below PHYSICAL_START (EXPERIMENTAL)"
+   depends on !RELOCATABLE && X86_64
+   help
+ This makes the kernel use only RAM above __PHYSICAL_START.
+ All memory below __PHYSICAL_START will be left unused and
+ marked as reserved RAM in /proc/iomem. The few special
+ pages that can't be relocated at addresses above
+ __PHYSICAL_START and that can't be guaranteed to be unused
+ by the running kernel will be marked reserved RAM failed
+ in /proc/iomem. Those may or may be not used by the kernel
+ (for example SMP trampoline pages would only be used if
+ CPU hotplug is enabled).
+
+ The reserved RAM can be mapped by virtualization software
+ with /dev/mem to create a 1:1 mapping between guest physical
+ (bus) address and host physical (bus) address. This will
+ allow PCI passthrough with DMA for the guest using the RAM
+ with the 1:1 mapping. The only detail to take care of is the
+ RAM marked reserved RAM failed. The virtualization
+ software must create for the guest an e820 map that only
+ includes the reserved RAM regions but if the guest touches
+ memory with guest physical address in the reserved RAM
+ failed ranges (Linux guest will do that even if the RAM
+ isn't present in the e820 map), it should provide that as
+ RAM and map it with a non-linear mapping. This should allow
+ any Linux kernel to run fine and hopefully any other OS too.
+
 config PHYSICAL_START
-   hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
+   hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP || RESERVE_PHYSICAL_START)
    default "0x1000000" if X86_NUMAQ
    default "0x200000" if X86_64
    default "0x100000"
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
--- a/arch/x86/kernel/e820.c
+++ 

Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-29 Thread Andi Kleen
> This is a port to current linux-2.6.git of the previous reserved-ram
> patch. Let me know if there's a chance to get this acked and
> included. Anything that isn't at compile time would require much

I still think runtime would be far better. Nobody really wants
a proliferation of more weird special kernel images.

> bigger changes just to parse the command line at 16bit realmode time

You could always do it with kexec if you think 16bit real mode is
too hard.

-Andi


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-29 Thread Andrea Arcangeli
On Tue, Jul 29, 2008 at 02:43:17PM +0200, Andi Kleen wrote:
> > This is a port to current linux-2.6.git of the previous reserved-ram
> > patch. Let me know if there's a chance to get this acked and
> > included. Anything that isn't at compile time would require much
>
> I still think runtime would be far better. Nobody really wants
> a proliferation of more weird special kernel images.

Not for the usage we're interested in, but surely this would prevent
distros from taking advantage of the feature. The question is whether
distros need to take advantage of the feature in the first place instead of
sticking with VT-d. 1:1 isn't secure virtualization as the guest must
be trusted, so it's not necessarily a good model to deploy to users
that don't know exactly what they're doing.

> > bigger changes just to parse the command line at 16bit realmode time
>
> You could always do it with kexec if you think 16bit real mode is
> too hard.

It's not too hard, but it'll add bloat to the 16 bit part of the boot
in the bzImage. It's likely simpler than kexec and surely more
user-friendly to set up for the end user.

In any case, my patch does the needed bits with regard to the e820
map. An incremental patch can add the parsing of the bootloader and
switch the Kconfig dependency from PHYSICAL_START to RELOCATABLE. The
e820 file will then have to replace the __PHYSICAL_START define with
something else and that's all.

I mean it's not entirely backwards to provide a compile time smaller
and simpler approach initially, and then to go where you want to go
incrementally later if we're sure there's enough userbase needing 1:1.

I'm not so interested to go there right now, because while this code
is useful right now because the majority of systems out there lacks
VT-d/iommu, I suspect this code could be nuked in the long
run when all systems will ship with that, which is why I kept it all
under #ifdef, and the changes to the other files outside ifdef are
bugfixes needed if you want to kexec-relocate above 40m or so that
should be kept.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-29 Thread Andi Kleen
> I'm not so interested to go there right now, because while this code
> is useful right now because the majority of systems out there lacks
> VT-d/iommu, I suspect this code could be nuked in the long
> run when all systems will ship with that, which is why I kept it all

Actually at least on Intel platforms and if you exclude the lowest end
VT-d is shipping universally for quite some time now. If you
buy a Intel box today or bought it in the last year the chances are pretty 
high that it has VT-d support.

> under #ifdef, and the changes to the other files outside ifdef are
> bugfixes needed if you want to kexec-relocate above 40m or so that
> should be kept.

You should split that out then into a separate patch.

-Andi


[PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-06-24 Thread Andrea Arcangeli
From: Andrea Arcangeli [EMAIL PROTECTED]

This has to be applied to the host kernel. For example, specifying a
relocation address of 0x20000000 allows starting kvm guests
capable of pci-passthrough with up to -m 512 by passing the
-reserved-ram parameter on the command line. There's no risk of
user error because the reserved ranges are provided to
the virtualization software through /proc/iomem. The only restriction is
that you shouldn't run more than one -reserved-ram kvm guest per system
at once.
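
The relation between the relocation address and the largest usable -m value is simple arithmetic. As a small sketch in C, assuming the relocation address is 0x20000000 (which is what the "-m 512" limit and the /proc/iomem listing further down imply); max_reserved_guest_mib is a hypothetical name used only for this illustration:

```c
#include <stdint.h>

/*
 * Hypothetical helper: the largest `-m` value (in MiB) whose guest
 * physical range fits entirely below the host kernel's physical start
 * address, i.e. inside the RAM the host leaves unused.
 */
uint64_t max_reserved_guest_mib(uint64_t physical_start)
{
    return physical_start >> 20; /* bytes -> MiB */
}
```

With a physical start of 0x20000000 this gives 512, and with 0x40000000 it gives 1024, which corresponds to the 1G limit mentioned in the later posting of the patch.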

This works by reserving the RAM early in the e820 map, so the initial
pagetables are allocated above the kernel .text relocation. I then
make the sparse code think the reserved RAM is actually available (so
struct pages are allocated), and finally I reserve those pages in
the bootmem allocator immediately after it has been
initialized. They remain PageReserved and unused by Linux, but with
'struct page' backing, so they can still be exported to qemu via a device
driver vma-fault (they can still be the target of emulated
DMA, as not all devices will be passed through).
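
The first step, carving the part of a usable e820 range that lies below the kernel's physical start out as reserved RAM, can be illustrated in user space. This is a simplified sketch in C and not the code from the patch: struct region, enum region_type and split_at_physical_start are hypothetical names, and ranges are half-open [start, end).

```c
#include <stdint.h>

enum region_type { REGION_RAM, REGION_RESERVED_RAM };

struct region {
    uint64_t start;          /* inclusive */
    uint64_t end;            /* exclusive */
    enum region_type type;
};

/*
 * Split one usable-RAM range at phys_start: everything below becomes
 * "reserved RAM" (left to the 1:1 guest), everything at or above stays
 * ordinary RAM for the host. Writes up to two regions to `out` and
 * returns how many were written.
 */
int split_at_physical_start(uint64_t start, uint64_t end,
                            uint64_t phys_start, struct region out[2])
{
    int n = 0;

    if (start >= end)
        return 0;

    if (start < phys_start) {
        out[n].start = start;
        out[n].end = end < phys_start ? end : phys_start;
        out[n].type = REGION_RESERVED_RAM;
        n++;
    }
    if (end > phys_start) {
        out[n].start = start > phys_start ? start : phys_start;
        out[n].end = end;
        out[n].type = REGION_RAM;
        n++;
    }
    return n;
}
```

In the kernel the reserved part additionally has to be given struct pages and then reserved in bootmem, as described above; the sketch covers only the range arithmetic.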

The virtualization software must create for the guest an e820 map that
only includes the "reserved RAM" regions, but if the guest touches
memory at a guest physical address in the "reserved RAM failed" ranges,
it should provide that as RAM and map it with a non-linear
mapping (in practice the only problem is the first page at physical
address 0, which is usually used by the BIOS, and no sane OS does DMA to
it).
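
The rule above amounts to a three-way decision per guest physical address. A minimal sketch in C, assuming the virtualization software has already collected the host ranges into an array; struct host_range, enum backing and classify_gpa are hypothetical names for this illustration:

```c
#include <stddef.h>
#include <stdint.h>

enum backing {
    BACKING_NONE,      /* not guest RAM at all (holes, MMIO, ...) */
    BACKING_IDENTITY,  /* "reserved RAM": map 1:1, safe for device DMA */
    BACKING_REMAPPED   /* "reserved RAM failed": back with other memory */
};

struct host_range {
    uint64_t start;    /* inclusive, as printed in /proc/iomem */
    uint64_t end;      /* inclusive, as printed in /proc/iomem */
    int failed;        /* nonzero for "reserved RAM failed" */
};

/* Decide how a guest physical address has to be backed. */
enum backing classify_gpa(const struct host_range *ranges, size_t n,
                          uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= ranges[i].start && gpa <= ranges[i].end)
            return ranges[i].failed ? BACKING_REMAPPED
                                    : BACKING_IDENTITY;
    }
    return BACKING_NONE;
}
```

Only BACKING_IDENTITY ranges would be listed in the guest's e820 map; BACKING_REMAPPED pages still have to exist for the guest but are not valid DMA targets for a passed-through device.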

vmx ~ # cat /proc/iomem |head -n 20
00000000-00000fff : reserved RAM failed
00001000-0008ffff : reserved RAM
00090000-00091fff : reserved RAM failed
00092000-0009cfff : reserved RAM
0009d000-0009ffff : reserved
000a0000-000ec16f : reserved RAM failed
000ec170-000fffff : reserved
00100000-1fffffff : reserved RAM
20000000-bff9ffff : System RAM
  20000000-20315f65 : Kernel code
  20315f66-204c3767 : Kernel data
  20557000-205c9eff : Kernel bss
bffa0000-bffaffff : ACPI Tables
bffb0000-bffdffff : ACPI Non-volatile Storage
bffe0000-bffedfff : reserved
bfff0000-bfffffff : reserved
d0000000-dfffffff : PCI Bus 0000:02
  d0000000-dfffffff : 0000:02:00.0
e0000000-efffffff : PCI MMCONFIG 0
  e0000000-efffffff : pnp 00:0c
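
Virtualization software that wants to pick up these ranges has to parse /proc/iomem and keep only the lines labelled "reserved RAM" or "reserved RAM failed". A minimal sketch of such a line parser in C follows; it assumes the usual "start-end : name" layout with full-width hexadecimal addresses, and parse_iomem_line and enum iomem_kind are hypothetical names, not taken from the kvm userland patch.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum iomem_kind {
    IOMEM_OTHER,
    IOMEM_RESERVED_RAM,
    IOMEM_RESERVED_RAM_FAILED
};

/*
 * Parse one line of /proc/iomem. On a "reserved RAM" or
 * "reserved RAM failed" line, store the inclusive range in *start/*end
 * and return the matching kind; otherwise return IOMEM_OTHER and leave
 * the outputs untouched.
 */
enum iomem_kind parse_iomem_line(const char *line,
                                 uint64_t *start, uint64_t *end)
{
    static const char ok[] = "reserved RAM";
    static const char failed[] = "reserved RAM failed";
    unsigned long long s, e;
    int name_off = 0;
    enum iomem_kind kind;

    /* %n records where the resource name begins; it stays 0 if the
     * " : " separator was not found. */
    if (sscanf(line, " %llx-%llx : %n", &s, &e, &name_off) < 2 ||
        name_off == 0)
        return IOMEM_OTHER;

    const char *name = line + name_off;
    size_t len = strcspn(name, "\r\n"); /* ignore the trailing newline */

    if (len == strlen(failed) && strncmp(name, failed, len) == 0)
        kind = IOMEM_RESERVED_RAM_FAILED;
    else if (len == strlen(ok) && strncmp(name, ok, len) == 0)
        kind = IOMEM_RESERVED_RAM;
    else
        return IOMEM_OTHER;

    *start = s;
    *end = e;
    return kind;
}
```

The name is compared exactly, so plain "reserved" entries and nested entries such as "Kernel code" are ignored.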

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
---

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1198,8 +1198,36 @@ config CRASH_DUMP
  (CONFIG_RELOCATABLE=y).
  For more details see Documentation/kdump/kdump.txt
 
+config RESERVE_PHYSICAL_START
+   bool "Reserve all RAM below PHYSICAL_START (EXPERIMENTAL)"
+   depends on !RELOCATABLE && X86_64
+   help
+ This makes the kernel use only RAM above __PHYSICAL_START.
+ All memory below __PHYSICAL_START will be left unused and
+ marked as reserved RAM in /proc/iomem. The few special
+ pages that can't be relocated at addresses above
+ __PHYSICAL_START and that can't be guaranteed to be unused
+ by the running kernel will be marked reserved RAM failed
+ in /proc/iomem. Those may or may be not used by the kernel
+ (for example SMP trampoline pages would only be used if
+ CPU hotplug is enabled).
+
+ The reserved RAM can be mapped by virtualization software
+ with /dev/mem to create a 1:1 mapping between guest physical
+ (bus) address and host physical (bus) address. This will
+ allow PCI passthrough with DMA for the guest using the RAM
+ with the 1:1 mapping. The only detail to take care of is the
+ RAM marked reserved RAM failed. The virtualization
+ software must create for the guest an e820 map that only
+ includes the reserved RAM regions but if the guest touches
+ memory with guest physical address in the reserved RAM
+ failed ranges (Linux guest will do that even if the RAM
+ isn't present in the e820 map), it should provide that as
+ RAM and map it with a non-linear mapping. This should allow
+ any Linux kernel to run fine and hopefully any other OS too.
+
 config PHYSICAL_START
-   hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
+   hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP || RESERVE_PHYSICAL_START)
    default "0x1000000" if X86_NUMAQ
    default "0x200000" if X86_64
    default "0x100000"
diff --git a/arch/x86/kernel/e820_64.c b/arch/x86/kernel/e820_64.c
--- a/arch/x86/kernel/e820_64.c
+++ b/arch/x86/kernel/e820_64.c
@@ -119,7 +119,31 @@ void __init early_res_to_bootmem(unsigne
printk(KERN_INFO "  early res: %d [%lx-%lx] %s\n", i,
final_start, final_end - 1, r->name);
reserve_bootmem_generic(final_start, final_end - final_start);
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+   if (r->start < __PHYSICAL_START)
+   add_memory_region(r->start, r->end - r->start,
+