[SeaBIOS PATCH 2/2] hotplug: Add device per func in ACPI DSDT tables

2011-09-19 Thread Amos Kong

Only function 0 is registered with the guest driver (func 0 is the
only entry found in the driver's slot->funcs list), so the other
functions cannot be cleaned up when the whole slot is hot-removed.
This patch adds a device per function in the ACPI DSDT tables.
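For reference, the ACPI specification defines _ADR for a device on a PCI bus as the device (slot) number in the high word and the function number in the low word, which is why each per-function Device object needs its own address. A minimal C sketch of that encoding (the helper name `pci_acpi_adr` is hypothetical, not part of SeaBIOS):

```c
#include <stdint.h>

/* ACPI _ADR for a PCI device: high word = device (slot) number,
 * low word = function number. Hypothetical helper for illustration. */
static uint32_t pci_acpi_adr(uint16_t slot, uint16_t func)
{
    return ((uint32_t)slot << 16) | func;
}
```

For example, slot 1 function 7 corresponds to 0x00010007.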

Tested with Linux, Windows XP and Windows 7, hot-adding and hot-removing
single- and multi-function devices; all cases work fine.

New acpi-dsdt.hex (332K):
http://amos-kong.rhcloud.com/pub/acpi-dsdt.hex

Signed-off-by: Amos Kong 
---
 src/acpi-dsdt.dsl |   31 +--
 1 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 08412e2..d1426ec 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -128,9 +128,9 @@ DefinitionBlock (
 PCRM, 32,
 }
 
-#define hotplug_slot(name, nr) \
-Device (S##name) {\
-   Name (_ADR, nr##)  \
+#define hotplug_func(name, nr, adr, fn) \
+Device (S##name##fn) {\
+   Name (_ADR, adr)\
Method (_EJ0,1) {  \
 Store(ShiftLeft(1, nr), B0EJ) \
 Return (0x0)  \
@@ -138,6 +138,16 @@ DefinitionBlock (
Name (_SUN, name)  \
 }
 
+#define hotplug_slot(name, nr) \
+   hotplug_func(name, nr, nr##, 0)  \
+   hotplug_func(name, nr, nr##0001, 1)  \
+   hotplug_func(name, nr, nr##0002, 2)  \
+   hotplug_func(name, nr, nr##0003, 3)  \
+   hotplug_func(name, nr, nr##0004, 4)  \
+   hotplug_func(name, nr, nr##0005, 5)  \
+   hotplug_func(name, nr, nr##0006, 6)  \
+   hotplug_func(name, nr, nr##0007, 7)
+
hotplug_slot(1, 0x0001)
hotplug_slot(2, 0x0002)
hotplug_slot(3, 0x0003)
@@ -842,13 +852,22 @@ DefinitionBlock (
 Return(0x01)
 }
 
-#define gen_pci_hotplug(nr)   \
+#define gen_pci_hotplug_func(nr, fn)  \
 If (And(\_SB.PCI0.PCIU, ShiftLeft(1, nr))) {  \
-Notify(\_SB.PCI0.S##nr, 1)\
+Notify(\_SB.PCI0.S##nr##fn, 1)\
 } \
 If (And(\_SB.PCI0.PCID, ShiftLeft(1, nr))) {  \
-Notify(\_SB.PCI0.S##nr, 3)\
+Notify(\_SB.PCI0.S##nr##fn, 3)\
 }
+#define gen_pci_hotplug(nr) \
+   gen_pci_hotplug_func(nr, 0)\
+   gen_pci_hotplug_func(nr, 1)\
+   gen_pci_hotplug_func(nr, 2)\
+   gen_pci_hotplug_func(nr, 3)\
+   gen_pci_hotplug_func(nr, 4)\
+   gen_pci_hotplug_func(nr, 5)\
+   gen_pci_hotplug_func(nr, 6)\
+   gen_pci_hotplug_func(nr, 7)
 
 Method(_L01) {
 gen_pci_hotplug(1)
-- 
1.7.6.1
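The expanded _L01 handler is easier to follow as ordinary control flow. A C sketch of what gen_pci_hotplug(nr) expands to after this patch, with PCIU/PCID modeled as plain bitmasks and all names hypothetical: for each of the eight functions it sends Notify code 1 (Device Check) if the slot's bit is set in PCIU, and code 3 (Eject Request) if the bit is set in PCID.

```c
#include <stdint.h>

#define NOTIFY_DEVICE_CHECK  1  /* ACPI Notify value: device check */
#define NOTIFY_EJECT_REQUEST 3  /* ACPI Notify value: eject request */
#define FUNCS_PER_SLOT       8

/* Hypothetical record of one Notify(\_SB.PCI0.S<slot><func>, code). */
struct notify { unsigned slot, func, code; };

/* Models gen_pci_hotplug(nr): one gen_pci_hotplug_func(nr, fn) per
 * function, each checking PCIU first and then PCID. Writes at most
 * 16 entries to out and returns how many were written. */
static int gen_pci_hotplug_model(unsigned nr, uint32_t pciu, uint32_t pcid,
                                 struct notify *out)
{
    int n = 0;
    for (unsigned fn = 0; fn < FUNCS_PER_SLOT; fn++) {
        if (pciu & (1u << nr))
            out[n++] = (struct notify){ nr, fn, NOTIFY_DEVICE_CHECK };
        if (pcid & (1u << nr))
            out[n++] = (struct notify){ nr, fn, NOTIFY_EJECT_REQUEST };
    }
    return n;
}
```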
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Memory API code review

2011-09-19 Thread Peter Maydell
On 14 September 2011 16:07, Avi Kivity  wrote:
> I would like to carry out an online code review of the memory API so that
> more people are familiar with the internals, and perhaps even to catch some
> bugs or deficiency.  I'd like to use the next kvm conference call slot for
> this (Tuesday 1400 UTC) since many people already have it reserved in the
> schedule.
>
> It would be great if people from the wider qemu community be present, rather
> than the usual "x86 is everything" crowd (+Jan) that usually participates in
> the kvm weekly call.

I can dial in if that's useful -- are the dial-in details available somewhere?

thanks
-- PMM


Re: [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap

2011-09-19 Thread Jason Wang

On 09/18/2011 03:17 AM, Michael S. Tsirkin wrote:

> On Sat, Sep 17, 2011 at 02:02:04PM +0800, Jason Wang wrote:
> 
> > A wiki-page was created to narrate the detail design of all parts
> > involved in the multi queue implementation:
> > http://www.linux-kvm.org/page/Multiqueue and some basic tests result
> > could be seen in this page
> > http://www.linux-kvm.org/page/Multiqueue-performance-Sep-13. I would
> > post the detail numbers in attachment as the reply of this thread.
> 
> Does it make sense to test both with and without RPS in guest?


I've tested with RPS in the guest, but didn't see any improvement.


Re: [SeaBIOS PATCH 2/2] hotplug: Add device per func in ACPI DSDT tables

2011-09-19 Thread Michael S. Tsirkin
On Mon, Sep 19, 2011 at 12:36:44PM +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 19, 2011 at 02:53:47PM +0800, Amos Kong wrote:
> > Only func 0 is registered to guest driver (we can
> > only found func 0 in slot->funcs list of driver),
> > the other functions could not be cleaned when
> > hot-removing the whole slot. This patch adds
> > device per function in ACPI DSDT tables.
> > 
> > Have tested with linux/winxp/win7, hot-adding/hot-removing,
> > single/multiple function device, they are all fine.
> > 
> > Signed-off-by: Amos Kong 
> 
> On top of my previous patch, the below saves another 6K by moving the
> method to the correct scope. The code for hotplug handling
> also gets better organized this way which is nice.
> 
> Signed-off-by: Michael S. Tsirkin 

To clarify: both this and the previous patch are only compiled,
not tested.


Re: [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-09-19 Thread Zhi Yong Wu
On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti  wrote:
> On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
>> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti  wrote:
>> > On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
>> >> Note:
>> >>      1.) When bps/iops limits are specified to a small value such as 511 
>> >> bytes/s, this VM will hang up. We are considering how to handle this 
>> >> senario.
>> >
>> > You can increase the length of the slice, if the request is larger than
>> > slice_time * bps_limit.
>> Yeah, but it is a challenge for how to increase it. Do you have some nice 
>> idea?
>
> If the queue is empty, and the request being processed does not fit the
> queue, increase the slice so that the request fits.
Sorry for the late reply. Actually, do you think this scenario is
meaningful for users? Even if we implement this, a VM whose bps is
limited below 512 bytes/second still cannot run every task.
Can you explain why we need to make this effort?

>
> That is, make BLOCK_IO_SLICE_TIME dynamic and adjust it as described
> above (if the bps or io limits change, reset it to the default
> BLOCK_IO_SLICE_TIME).
>
>> >>      2.) When "dd" command is issued in guest, if its option bs is set to 
>> >> a large value such as "bs=1024K", the result speed will slightly bigger 
>> >> than the limits.
>> >
>> > Why?
>> This issue has not existed. I will remove it.
>> When drive bps=100, i did some testings on guest VM.
>> 1.) bs=1024K
>> 18+0 records in
>> 18+0 records out
>> 18874368 bytes (19 MB) copied, 26.6268 s, 709 kB/s
>> 2.) bs=2048K
>> 18+0 records in
>> 18+0 records out
>> 37748736 bytes (38 MB) copied, 46.5336 s, 811 kB/s
>>
>> >
>> > There is lots of debugging leftovers in the patch.
>> sorry, i forgot to remove them.
>> >
>> >
>
>
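Marcelo's suggestion (if the queue is empty and the request does not fit, stretch the slice until it does) can be sketched as follows. This is a minimal C model under stated assumptions: a default slice of 100 ms stands in for BLOCK_IO_SLICE_TIME (the patch's actual value may differ), only the bps limit is modeled (not iops), and the helper name is hypothetical.

```c
#include <stdint.h>

/* Assumed default slice length: 100 ms in nanoseconds. */
#define DEFAULT_SLICE_NS 100000000ULL
#define NS_PER_SEC       1000000000ULL

/* Hypothetical helper: return a slice length (ns) long enough for a
 * request of req_bytes to fit under bps_limit (bytes/second). The
 * slice is only stretched when the queue is empty; otherwise the
 * current slice is kept. Overflow-safe for requests up to a few GB. */
static uint64_t adjust_slice_ns(uint64_t cur_slice_ns, uint64_t bps_limit,
                                uint64_t req_bytes, int queue_empty)
{
    if (bps_limit == 0 || !queue_empty)
        return cur_slice_ns;

    /* Bytes allowed in the current slice: bps_limit * slice / 1 s. */
    uint64_t allowed = bps_limit * cur_slice_ns / NS_PER_SEC;
    if (req_bytes <= allowed)
        return cur_slice_ns;

    /* Smallest slice that admits the request: ceil(req * 1 s / bps). */
    return (req_bytes * NS_PER_SEC + bps_limit - 1) / bps_limit;
}
```

With a 511 bytes/s limit, a 4096-byte request stretches the slice to about 8 s; per Marcelo's note, the slice would be reset to the default whenever the limits change.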



-- 
Regards,

Zhi Yong Wu


Re: [SeaBIOS] [SeaBIOS PATCH 2/2] hotplug: Add device per func in ACPI DSDT tables

2011-09-19 Thread Gleb Natapov
On Mon, Sep 19, 2011 at 03:27:38AM -0400, Amos Kong wrote:
> 
> Only func 0 is registered to guest driver (we can
> only found func 0 in slot->funcs list of driver),
> the other functions could not be cleaned when
> hot-removing the whole slot. This patch adds
> device per function in ACPI DSDT tables.
> 
You can't unplug a single function. Guest surely knows that.

> Have tested with linux/winxp/win7, hot-adding/hot-removing,
> single/multiple function device, they are all fine.
> 
What was not fine before?

Have you looked at the DSDT of real hardware that supports PCI hotplug?
Does it look the same?

> new acpi-dsdt.hex(332K):
> http://amos-kong.rhcloud.com/pub/acpi-dsdt.hex
> 
> Signed-off-by: Amos Kong 
> ---
>  src/acpi-dsdt.dsl |   31 +--
>  1 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
> index 08412e2..d1426ec 100644
> --- a/src/acpi-dsdt.dsl
> +++ b/src/acpi-dsdt.dsl
> @@ -128,9 +128,9 @@ DefinitionBlock (
>  PCRM, 32,
>  }
>  
> -#define hotplug_slot(name, nr) \
> -Device (S##name) {\
> -   Name (_ADR, nr##)  \
> +#define hotplug_func(name, nr, adr, fn) \
> +Device (S##name##fn) {\
> +   Name (_ADR, adr)\
> Method (_EJ0,1) {  \
>  Store(ShiftLeft(1, nr), B0EJ) \
>  Return (0x0)  \
> @@ -138,6 +138,16 @@ DefinitionBlock (
> Name (_SUN, name)  \
>  }
>  
> +#define hotplug_slot(name, nr) \
> + hotplug_func(name, nr, nr##, 0)  \
> + hotplug_func(name, nr, nr##0001, 1)  \
> + hotplug_func(name, nr, nr##0002, 2)  \
> + hotplug_func(name, nr, nr##0003, 3)  \
> + hotplug_func(name, nr, nr##0004, 4)  \
> + hotplug_func(name, nr, nr##0005, 5)  \
> + hotplug_func(name, nr, nr##0006, 6)  \
> + hotplug_func(name, nr, nr##0007, 7)
> +
>   hotplug_slot(1, 0x0001)
>   hotplug_slot(2, 0x0002)
>   hotplug_slot(3, 0x0003)
> @@ -842,13 +852,22 @@ DefinitionBlock (
>  Return(0x01)
>  }
>  
> -#define gen_pci_hotplug(nr)   \
> +#define gen_pci_hotplug_func(nr, fn)  \
>  If (And(\_SB.PCI0.PCIU, ShiftLeft(1, nr))) {  \
> -Notify(\_SB.PCI0.S##nr, 1)\
> +Notify(\_SB.PCI0.S##nr##fn, 1)\
>  } \
>  If (And(\_SB.PCI0.PCID, ShiftLeft(1, nr))) {  \
> -Notify(\_SB.PCI0.S##nr, 3)\
> +Notify(\_SB.PCI0.S##nr##fn, 3)\
>  }
> +#define gen_pci_hotplug(nr) \
> + gen_pci_hotplug_func(nr, 0)\
> + gen_pci_hotplug_func(nr, 1)\
> + gen_pci_hotplug_func(nr, 2)\
> + gen_pci_hotplug_func(nr, 3)\
> + gen_pci_hotplug_func(nr, 4)\
> + gen_pci_hotplug_func(nr, 5)\
> + gen_pci_hotplug_func(nr, 6)\
> + gen_pci_hotplug_func(nr, 7)
>  
>  Method(_L01) {
>  gen_pci_hotplug(1)
> -- 
> 1.7.6.1
> 
> ___
> SeaBIOS mailing list
> seab...@seabios.org
> http://www.seabios.org/mailman/listinfo/seabios

--
Gleb.


Re: [SeaBIOS] [SeaBIOS PATCH 2/2] hotplug: Add device per func in ACPI DSDT tables

2011-09-19 Thread Michael S. Tsirkin
On Mon, Sep 19, 2011 at 12:57:33PM +0300, Gleb Natapov wrote:
> On Mon, Sep 19, 2011 at 03:27:38AM -0400, Amos Kong wrote:
> > 
> > Only func 0 is registered to guest driver (we can
> > only found func 0 in slot->funcs list of driver),
> > the other functions could not be cleaned when
> > hot-removing the whole slot. This patch adds
> > device per function in ACPI DSDT tables.
> > 
> You can't unplug a single function. Guest surely knows that.

Looking at guest code, it's clear that
at least a Linux guest doesn't know that.

> > Have tested with linux/winxp/win7, hot-adding/hot-removing,
> > single/multiple function device, they are all fine.
> > 
> What was not fine before?
> 
> Have you looked at real HW that supports PCI hot plug DSDT? Does it
> looks the same?

I recall I saw some examples like this on the net.


> > new acpi-dsdt.hex(332K):
> > http://amos-kong.rhcloud.com/pub/acpi-dsdt.hex
> > 
> > Signed-off-by: Amos Kong 


Re: [SeaBIOS PATCH 2/2] hotplug: Add device per func in ACPI DSDT tables

2011-09-19 Thread Michael S. Tsirkin
On Mon, Sep 19, 2011 at 12:36:44PM +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 19, 2011 at 02:53:47PM +0800, Amos Kong wrote:
> > Only func 0 is registered to guest driver (we can
> > only found func 0 in slot->funcs list of driver),
> > the other functions could not be cleaned when
> > hot-removing the whole slot. This patch adds
> > device per function in ACPI DSDT tables.
> > 
> > Have tested with linux/winxp/win7, hot-adding/hot-removing,
> > single/multiple function device, they are all fine.
> > 
> > Signed-off-by: Amos Kong 
> 
> On top of my previous patch, the below saves another 6K by moving the
> method to the correct scope. The code for hotplug handling
> also gets better organized this way which is nice.
> 
> Signed-off-by: Michael S. Tsirkin 

Naturally, we should also clean up the old macro, since it's unused now,
even though this does not save space :)


diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 36467ea..646f146 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -523,7 +523,7 @@ DefinitionBlock (
 Notify(S##nr##5, 1)\
 Notify(S##nr##6, 1)\
 Notify(S##nr##7, 1)\
-}\
+}  \
 If (And(PCID, ShiftLeft(1, nr))) { \
 Notify(S##nr##0, 3)\
 Notify(S##nr##1, 3)\
@@ -910,28 +910,6 @@ DefinitionBlock (
 Return(0x01)
 }
 
-#define gen_pci_hotplug(nr) \
-If (And(\_SB.PCI0.PCIU, ShiftLeft(1, nr))) {  \
-Notify(\_SB.PCI0.S##nr##0, 1)\
-Notify(\_SB.PCI0.S##nr##1, 1)\
-Notify(\_SB.PCI0.S##nr##2, 1)\
-Notify(\_SB.PCI0.S##nr##3, 1)\
-Notify(\_SB.PCI0.S##nr##4, 1)\
-Notify(\_SB.PCI0.S##nr##5, 1)\
-Notify(\_SB.PCI0.S##nr##6, 1)\
-Notify(\_SB.PCI0.S##nr##7, 1)\
-}\
-If (And(\_SB.PCI0.PCID, ShiftLeft(1, nr))) { \
-Notify(\_SB.PCI0.S##nr##0, 3)\
-Notify(\_SB.PCI0.S##nr##1, 3)\
-Notify(\_SB.PCI0.S##nr##2, 3)\
-Notify(\_SB.PCI0.S##nr##3, 3)\
-Notify(\_SB.PCI0.S##nr##4, 3)\
-Notify(\_SB.PCI0.S##nr##5, 3)\
-Notify(\_SB.PCI0.S##nr##6, 3)\
-Notify(\_SB.PCI0.S##nr##7, 3)\
-}
-
 Method(_L01) {
\_SB.PCI0.HPLG()
 Return (0x01)


Re: [SeaBIOS] [SeaBIOS PATCH 2/2] hotplug: Add device per func in ACPI DSDT tables

2011-09-19 Thread Gleb Natapov
On Mon, Sep 19, 2011 at 01:02:59PM +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 19, 2011 at 12:57:33PM +0300, Gleb Natapov wrote:
> > On Mon, Sep 19, 2011 at 03:27:38AM -0400, Amos Kong wrote:
> > > 
> > > Only func 0 is registered to guest driver (we can
> > > only found func 0 in slot->funcs list of driver),
> > > the other functions could not be cleaned when
> > > hot-removing the whole slot. This patch adds
> > > device per function in ACPI DSDT tables.
> > > 
> > You can't unplug a single function. Guest surely knows that.
> 
> Looking at guest code, it's clear that
> at least a Linux guest doesn't know that.
> 
Have you asked the relevant maintainers why it is so? Does Windows do the
same? (Obviously you can't check the Windows code, but you can see whether
removing function zero removes the other functions from the Device Manager,
or you can even try to access another function.)

If I am not mistaken, with this new DSDT you will see more than one eject
option in the Windows GUI for each multi-function device. Is that so?

> > > Have tested with linux/winxp/win7, hot-adding/hot-removing,
> > > single/multiple function device, they are all fine.
> > > 
> > What was not fine before?
> > 
> > Have you looked at real HW that supports PCI hot plug DSDT? Does it
> > looks the same?
> 
> I recall I saw some examples like this on the net.
> 
Checking a real hardware DSDT would confirm that we are doing the right thing here.

--
Gleb.


Re: [SeaBIOS] [SeaBIOS PATCH 2/2] hotplug: Add device per func in ACPI DSDT tables

2011-09-19 Thread Gleb Natapov
On Mon, Sep 19, 2011 at 01:12:30PM +0300, Gleb Natapov wrote:
> On Mon, Sep 19, 2011 at 01:02:59PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Sep 19, 2011 at 12:57:33PM +0300, Gleb Natapov wrote:
> > > On Mon, Sep 19, 2011 at 03:27:38AM -0400, Amos Kong wrote:
> > > > 
> > > > Only func 0 is registered to guest driver (we can
> > > > only found func 0 in slot->funcs list of driver),
> > > > the other functions could not be cleaned when
> > > > hot-removing the whole slot. This patch adds
> > > > device per function in ACPI DSDT tables.
> > > > 
> > > You can't unplug a single function. Guest surely knows that.
> > 
> > Looking at guest code, it's clear that
> > at least a Linux guest doesn't know that.
> > 
> Have you asked relevant maintainers why is it so? Does Windows do the
> same? (Obviously you can't check Windows code, but you can see if
> removing function zero removes other functions from the device manager,
> or you can even try to access other function).
> 
> If I am not mistaken, with this new DSDT you will see more then one eject
> options in Windows GUI from each multi-function device. Is it so?
> 
> > > > Have tested with linux/winxp/win7, hot-adding/hot-removing,
> > > > single/multiple function device, they are all fine.
> > > > 
> > > What was not fine before?
> > > 
> > > Have you looked at real HW that supports PCI hot plug DSDT? Does it
> > > looks the same?
> > 
> > I recall I saw some examples like this on the net.
> > 
> Checking real HW DSDT will validate that we are doing a right thing here.
> 
According to Microsoft's own documentation, they want an _EJ0 for each
function:
http://www.microsoft.com/china/whdc/system/pnppwr/hotadd/hotplugpci.mspx

--
Gleb.


Re: .img on nfs, relative on ram, consuming mass ram

2011-09-19 Thread Rickard Lundin

> 
> Hi !
> Here's my config :
> 
> Version : qemu-kvm-0.12.5, qemu-kvm-0.12.4
> Hosts : AMD 64X2, Phenom and Core 2 duo
> OS : Slackware 64 13.0
> Kernel : 2.6.35.4 and many previous versions
> 
> I use a PXE server to boot semi-diskless (swap partitions and some local stuff) stations.
> This server also serves a read-only nfs folder, with plenty of .img on it.
> When clients connects, a relative image is created in /tmp, which is a tmpfs, so hosted in ram.
> 
> And here i go on my 2G stations :
> qemu-system-x86_64 -m 1024 -vga std -usb -usbdevice tablet -localtime -soundhw es1370 /tmp/relimg.img
> qemu-system-x86_64 -m 1024 -vga std -usb -usbdevice tablet -localtime -soundhw es1370 /dev/shm/relimg.img
> 
> I tried both. Always the same result : the ram is consumed quickly, and mass swap occurs.
> On a 4G system, i see kvm uses more than 1024, maybe 1200.
> And everytime a launch a program inside the vm, the amount of the host free ram (not cached) diminishes,
> which is weird, because it should have been reserved.
> 
> So on a 2G system, swap occurs very fast and the machine slow a lot down.
> An on a total diskless system, this leads fast to a freeze.
> 
> I have no problem if i use a relative image on disk :
> qemu-system-x86_64 -m 1024 -vga std -usb -usbdevice tablet -localtime -soundhw es1370 -drive file=/mnt/hd/sda/sda1/tmp/relimg.img,cache=none
> 


I am also looking for a "speed solution", so I put the whole image in /dev/shm.
The host machine is Ubuntu 11.10 beta, and the guest is Ubuntu 11.04.
It works, and I get 500 MB/s using virtio, but I was hoping for a lot
more since it's an i920 host machine with 24 GB.

I will try disabling the cache; I guess it will improve the speed.

My question is: what would be the optimal filesystem for the guest? I'm using
ext4, but it's a bit silly since the host KVM image is in RAM.

I've got 12 GB of /dev/shm, since the host machine has 24 GB. My KVM image is 6 GB.

/Rickard




Re: [SeaBIOS] [SeaBIOS PATCH 2/2] hotplug: Add device per func in ACPI DSDT tables

2011-09-19 Thread Michael S. Tsirkin
On Mon, Sep 19, 2011 at 01:32:48PM +0300, Gleb Natapov wrote:
> > > I recall I saw some examples like this on the net.
> > > 
> > Checking real HW DSDT will validate that we are doing a right thing here.
> > 
> According to Microsoft own documentation they want _EJ0 for each
> function:
> http://www.microsoft.com/china/whdc/system/pnppwr/hotadd/hotplugpci.mspx

Right, this is the link I was thinking of.

-- 
MST


Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Marcelo Tosatti
On Thu, Sep 15, 2011 at 08:48:58PM +0300, Avi Kivity wrote:
> On 09/15/2011 08:25 PM, Jan Kiszka wrote:
> >>
> >>  I think so.  Suppose the vcpu enters just after kvm_make_request(); it
> >>  sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it
> >>  wasn't set set.  Then comes a kick, the guest is reentered with
> >>  nmi_pending set but KVM_REQ_EVENT clear and sails through the check and
> >>  enters the guest.  The NMI is delayed until the next KVM_REQ_EVENT.
> >
> >That makes sense - and the old code looks more strange now.
> 
> I think it dates to the time all NMIs were synchronous.
> 
> 
>   /* try to inject new event if pending */
>    -  if (vcpu->arch.nmi_pending) {
>    +  if (atomic_read(&vcpu->arch.nmi_pending)) {
>   if (kvm_x86_ops->nmi_allowed(vcpu)) {
>    -  vcpu->arch.nmi_pending = false;
>    +  atomic_dec(&vcpu->arch.nmi_pending);
> >>>
> >>>  Here we lost NMIs in the past by overwriting nmi_pending while another
> >>>  one was already queued, right?
> >>
> >>  One place, yes.  The other is kvm_inject_nmi() - if the first nmi didn't
> >>  get picked up by the vcpu by the time the second nmi arrives, we lose
> >>  the second nmi.
> >
> >Thinking this through again, it's actually not yet clear to me what we
> >are modeling here: If two NMI events arrive almost perfectly in
> >parallel, does the real hardware guarantee that they will always cause
> >two NMI events in the CPU? Then this is required.
> 
> It's not 100% clear from the SDM, but this is what I understood from
> it.  And it's needed - the NMI handlers are now being reworked to
> handle just one NMI source (hopefully the cheapest) in the handler,
> and if we detect a back-to-back NMI, handle all possible NMI
> sources.  This optimization is needed in turn so we can use Jeremy's
> paravirt spinlock framework, which requires a sleep primitive and a
> wake-up-even-if-the-sleeper-has-interrupts-disabled primitive.  i
> thought of using HLT and NMIs respectively, but that means we need a
> cheap handler (i.e. don't go reading PMU MSRs).
> 
> >Otherwise I just lost understanding again why we were loosing NMIs here
> >and in kvm_inject_nmi (maybe elsewhere then?).
> 
> Because of that.
> 
> >>
>   if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>   inject_pending_event(vcpu);
> 
>   /* enable NMI/IRQ window open exits if needed */
>    -  if (nmi_pending)
>    +  if (atomic_read(&vcpu->arch.nmi_pending)
>    +  &&   nmi_in_progress(vcpu))
> >>>
> >>>  Is nmi_pending&&   !nmi_in_progress possible at all?
> >>
> >>  Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >And we don't need the window exit notification then? I don't understand
> >what nmi_in_progress is supposed to do here.
> 
> We need the window notification in both cases.  If we're recovering
> from STI, then we don't need to collapse NMIs.  If we're completing
> an NMI handler, then we do need to collapse NMIs (since the queue
> length is two, and we just completed one).

I don't understand the point of nmi_in_progress, or of the above
hunk, either. Can't inject_nmi do:

if (nmi_injected + atomic_read(nmi_pending) < 2)
atomic_inc(nmi_pending)

Instead of collapsing somewhere else? You'd also have to change the
nmi_injected handling in arch code, in complete_interrupts(), so that
its value is not "hidden".

> >>
> >>>  If not, what will happen next?
> >>
> >>  The NMI window will open and we'll inject the NMI.
> >
> >How will we know this? We do not request the exit, that's my worry.
> 
> I think we do? Oh, but this patch breaks it.



Re: [PATCH] kvm, mmu: fix incorrect return of spte

2011-09-19 Thread Marcelo Tosatti
On Mon, Sep 19, 2011 at 12:19:51PM +0800, Zhao Jin wrote:
> __update_clear_spte_slow should return original spte while the
> current code returns low half of original spte combined with high
> half of new spte.
> 
> Signed-off-by: Zhao Jin 
> ---
>  arch/x86/kvm/mmu.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)

Applied, thanks.
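The bug described above is easiest to see with a simplified, single-threaded model of a 64-bit SPTE written as two 32-bit halves. The structure and function names are hypothetical, and the real __update_clear_spte_slow() also uses xchg() and barrier() for atomicity, which this sketch omits. The buggy variant assumes the high half was captured through a chained assignment, one plausible way to produce the described result (low half of the original, high half of the new value):

```c
#include <stdint.h>

/* Hypothetical stand-in for an SPTE accessed as two 32-bit halves. */
struct split_spte { uint32_t low, high; };

/* Buggy pattern: the chained assignment stores the NEW high half into
 * orig, so the caller sees old low + new high. */
static struct split_spte update_buggy(struct split_spte *p,
                                      struct split_spte new_val)
{
    struct split_spte orig;
    orig.low = p->low;
    p->low = new_val.low;
    orig.high = p->high = new_val.high;
    return orig;
}

/* Fixed pattern: read the old high half before overwriting it. */
static struct split_spte update_fixed(struct split_spte *p,
                                      struct split_spte new_val)
{
    struct split_spte orig;
    orig.low = p->low;
    p->low = new_val.low;
    orig.high = p->high;
    p->high = new_val.high;
    return orig;
}
```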



Re: [PATCH] kvm tool: get rid of entire setnet handling

2011-09-19 Thread Pekka Enberg
On Sun, Sep 18, 2011 at 11:47 PM, Hagen Paul Pfeifer  wrote:
> 7eeef124e30ca2123f removes setnet.sh handling through build-in nfs dhcp
> client. This patch get rid of "kvm setup" functionality to copy the now
> non-existing setnet script. Otherwise kvm setup is silently aborted.
>
> Signed-off-by: Hagen Paul Pfeifer 
> Cc: Sasha Levin 
> ---
>  tools/kvm/builtin-setup.c |   13 -
>  1 files changed, 0 insertions(+), 13 deletions(-)
>
> diff --git a/tools/kvm/builtin-setup.c b/tools/kvm/builtin-setup.c
> index c93eec3..6b8eb5b 100644
> --- a/tools/kvm/builtin-setup.c
> +++ b/tools/kvm/builtin-setup.c
> @@ -129,15 +129,6 @@ static int copy_init(const char *guestfs_name)
>        return copy_file("guest/init", path);
>  }
>
> -static int copy_net(const char *guestfs_name)
> -{
> -       char path[PATH_MAX];
> -
> -       snprintf(path, PATH_MAX, "%s%s%s/virt/setnet.sh", HOME_DIR, 
> KVM_PID_FILE_PATH, guestfs_name);
> -
> -       return copy_file("guest/setnet.sh", path);
> -}
> -
>  static int make_guestfs_symlink(const char *guestfs_name, const char *path)
>  {
>        char target[PATH_MAX];
> @@ -195,10 +186,6 @@ static int do_setup(const char *guestfs_name)
>                make_guestfs_symlink(guestfs_name, guestfs_symlinks[i]);
>        }
>
> -       ret = copy_net(guestfs_name);
> -       if (ret < 0)
> -               return ret;
> -
>        return copy_init(guestfs_name);
>  }

Sasha already took care of this with commit
98ee903fd414366fbc1f72ed787b189a51f15c38 ("kvm tools: Don't copy
network autoconfiguration script"). Thanks for the patch, though!

 Pekka


Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Avi Kivity

On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:

>  >>
>  >>   Yes, due to NMI-blocked-by-STI.  A really touchy area.
>  >And we don't need the window exit notification then? I don't understand
>  >what nmi_in_progress is supposed to do here.
>
>  We need the window notification in both cases.  If we're recovering
>  from STI, then we don't need to collapse NMIs.  If we're completing
>  an NMI handler, then we do need to collapse NMIs (since the queue
>  length is two, and we just completed one).

> I don't understand what is the point with nmi_in_progress, and the above
> hunk, either. Can't inject_nmi do:
>
> if (nmi_injected + atomic_read(nmi_pending)<  2)
>  atomic_inc(nmi_pending)
>
> Instead of collapsing somewhere else?


We could.  It's not atomic, though: two threads executing in parallel
could raise the value to three.  We could use a cmpxchg loop that does an
increment bounded to two.  I guess this is a lot clearer, thanks.
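A cmpxchg loop that increments a counter but never past a bound might look like the sketch below. It uses C11 <stdatomic.h> as a stand-in for the kernel's atomic_t and atomic_cmpxchg(); the function and constant names are hypothetical, and the sketch bounds nmi_pending alone rather than the nmi_injected + nmi_pending sum proposed in the thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed queue depth: one NMI being handled plus one latched. */
#define NMI_QUEUE_MAX 2

/* Increment *pending unless it has already reached NMI_QUEUE_MAX.
 * Returns true if the increment happened, false if this NMI was
 * collapsed into an already-pending one. Two racing callers cannot
 * both succeed past the bound, unlike a read-then-increment. */
static bool nmi_pending_inc_bounded(atomic_int *pending)
{
    int old = atomic_load(pending);
    do {
        if (old >= NMI_QUEUE_MAX)
            return false;   /* queue full: collapse */
        /* On failure, old is reloaded with the current value. */
    } while (!atomic_compare_exchange_weak(pending, &old, old + 1));
    return true;
}
```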



> You'd also have to change
> nmi_injected handling in arch code so its value is not "hidden", in
> complete_interrupts().


Or maybe make raising nmi_injected not decrement nmi_pending.  So:

  nmi_pending: total number of interrupts in queue
  nmi_injected: of these, how many are currently being injected

yes?

--
error compiling committee.c: too many arguments to function



Re: [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap

2011-09-19 Thread Ben Hutchings
On Sat, 2011-09-17 at 14:02 +0800, Jason Wang wrote:
[...]
> 2 Current implementation may also get regression for single session
> packet transmission.
> 
> The reason is packets from each flow were not handled by the same
> queue/vhost thread.
> 
> Various method could be done to handle this:
> 
> 2.1 hack the guest driver, and store the queue index into the rxhash and
> use it when choosing tx in guest. This need some hack to store the
> rxhash into sk and pass it in to skb again in
> skb_orphan_try(). sk_rxhash is only used by RPS now, so some more
> clean method is needed.
[...]

I have previously suggested doing this as a general rule.  However, I
now think we can do much better with accelerated RFS and automatic XPS
(but the latter is not yet implemented).  For virtio_net, accelerated
RFS would effectively push the guest's RFS socket map out to the host.
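The property Jason's point 2 is after, keeping every packet of a flow on one queue/vhost thread, can be illustrated with plain hash-based flow-to-queue selection. This is a minimal sketch, not RFS or XPS: the flow key layout, the toy FNV-1a hash (in place of the kernel's rxhash/jhash), and the function names are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical flow key (IPv4 5-tuple). */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Toy 32-bit FNV-1a over the key fields, byte by byte. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t words[4] = { k->saddr, k->daddr,
                          ((uint32_t)k->sport << 16) | k->dport, k->proto };
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (int i = 0; i < 4; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;              /* FNV prime */
        }
    }
    return h;
}

/* Deterministic mapping: the same flow always picks the same queue. */
static unsigned pick_tx_queue(uint32_t hash, unsigned nqueues)
{
    return hash % nqueues;
}
```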

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Marcelo Tosatti
On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> >>  >>
> >>  >>   Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >>  >And we don't need the window exit notification then? I don't understand
> >>  >what nmi_in_progress is supposed to do here.
> >>
> >>  We need the window notification in both cases.  If we're recovering
> >>  from STI, then we don't need to collapse NMIs.  If we're completing
> >>  an NMI handler, then we do need to collapse NMIs (since the queue
> >>  length is two, and we just completed one).
> >
> >I don't understand what is the point with nmi_in_progress, and the above
> >hunk, either. Can't inject_nmi do:
> >
> >if (nmi_injected + atomic_read(nmi_pending)<  2)
> > atomic_inc(nmi_pending)
> >
> >Instead of collapsing somewhere else?
> 
> We could.  It's not atomic though - two threads executing in
> parallel could raise the value to three.  Could do a cmpxchg loop
> does an increment bounded to two.  I guess this is a lot clearer,
> thanks.
> 
> >You'd also have to change
> >nmi_injected handling in arch code so its value is not "hidden", in
> >complete_interrupts().
> 
> Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> 
>   nmi_pending: total number of interrupts in queue
>   nmi_injected: of these, how many are currently being injected
> 
> yes?

Yes, at the expense of decrementing on subarch code (which is fine,
apparently).



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Avi Kivity

On 09/19/2011 05:54 PM, Marcelo Tosatti wrote:

> On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> >  On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> >  >>   >>
> >  >>   >>Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >  >>   >And we don't need the window exit notification then? I don't understand
> >  >>   >what nmi_in_progress is supposed to do here.
> >  >>
> >  >>   We need the window notification in both cases.  If we're recovering
> >  >>   from STI, then we don't need to collapse NMIs.  If we're completing
> >  >>   an NMI handler, then we do need to collapse NMIs (since the queue
> >  >>   length is two, and we just completed one).
> >  >
> >  >I don't understand what is the point with nmi_in_progress, and the above
> >  >hunk, either. Can't inject_nmi do:
> >  >
> >  >if (nmi_injected + atomic_read(nmi_pending) < 2)
> >  >  atomic_inc(nmi_pending)
> >  >
> >  >Instead of collapsing somewhere else?
> >
> >  We could.  It's not atomic though - two threads executing in
> >  parallel could raise the value to three.  Could do a cmpxchg loop
> >  that does an increment bounded to two.  I guess this is a lot clearer,
> >  thanks.
> >
> >  >You'd also have to change
> >  >nmi_injected handling in arch code so its value is not "hidden", in
> >  >complete_interrupts().
> >
> >  Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> >
> >    nmi_pending: total number of interrupts in queue
> >    nmi_injected: of these, how many are currently being injected
> >
> >  yes?
>
> Yes, at the expense of decrementing on subarch code (which is fine,
> apparently).



Hm, we have no place to decrement.  We need to do that when IRET 
executes, but we don't want to request an NMI window exit in the common 
case of nmi_pending = 1.


--
error compiling committee.c: too many arguments to function



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Avi Kivity

On 09/19/2011 06:09 PM, Avi Kivity wrote:

> On 09/19/2011 05:54 PM, Marcelo Tosatti wrote:
>
> > On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> > >  On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> > > >> >>
> > > >> >>Yes, due to NMI-blocked-by-STI.  A really touchy area.
> > > >> >And we don't need the window exit notification then? I don't understand
> > > >> >what nmi_in_progress is supposed to do here.
> > > >>
> > > >>   We need the window notification in both cases.  If we're recovering
> > > >>   from STI, then we don't need to collapse NMIs.  If we're completing
> > > >>   an NMI handler, then we do need to collapse NMIs (since the queue
> > > >>   length is two, and we just completed one).
> > > >
> > > >I don't understand what is the point with nmi_in_progress, and the above
> > > >hunk, either. Can't inject_nmi do:
> > > >
> > > >if (nmi_injected + atomic_read(nmi_pending) < 2)
> > > >  atomic_inc(nmi_pending)
> > > >
> > > >Instead of collapsing somewhere else?
> > >
> > >  We could.  It's not atomic though - two threads executing in
> > >  parallel could raise the value to three.  Could do a cmpxchg loop
> > >  that does an increment bounded to two.  I guess this is a lot clearer,
> > >  thanks.
> > >
> > > >You'd also have to change
> > > >nmi_injected handling in arch code so its value is not "hidden", in
> > > >complete_interrupts().
> > >
> > >  Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> > >
> > >    nmi_pending: total number of interrupts in queue
> > >    nmi_injected: of these, how many are currently being injected
> > >
> > >  yes?
> >
> > Yes, at the expense of decrementing on subarch code (which is fine,
> > apparently).
>
> Hm, we have no place to decrement.  We need to do that when IRET
> executes, but we don't want to request an NMI window exit in the
> common case of nmi_pending = 1.




I guess we have to change kvm_inject_nmi to run in vcpu context, where 
it has access to more stuff.
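One way to read "run in vcpu context" is sketched below under stated assumptions: any thread only bumps an atomic count of arrived NMIs, and the vcpu thread later folds that count into nmi_pending and collapses the excess, so the bounded update never races. This is a userspace model, not KVM code; the struct and function names are hypothetical. Here nmi_pending counts NMIs not yet delivered, so a running handler plus pending NMIs never exceeds two.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only; all names are hypothetical, not KVM's. */
struct vcpu_model {
    atomic_uint nmi_queued;   /* NMIs raised by any thread, not yet folded in */
    unsigned int nmi_pending; /* NMIs awaiting delivery; vcpu thread only */
    bool nmi_running;         /* guest is currently inside an NMI handler */
};

/* Any thread: just record the NMI. No bound is enforced here, so there
 * is nothing to race on. (A real implementation would also kick the vcpu.) */
static void model_inject_nmi(struct vcpu_model *v)
{
    atomic_fetch_add(&v->nmi_queued, 1);
}

/* vcpu thread only: fold in queued NMIs and collapse the excess so that
 * the running handler plus pending NMIs never exceeds two. */
static void model_process_nmi(struct vcpu_model *v)
{
    unsigned int limit = v->nmi_running ? 1 : 2;

    v->nmi_pending += atomic_exchange(&v->nmi_queued, 0);
    if (v->nmi_pending > limit)
        v->nmi_pending = limit;
}
```

Because only the vcpu thread touches nmi_pending, the clamp is a plain comparison rather than a cmpxchg loop.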


--
error compiling committee.c: too many arguments to function



Re: RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-19 Thread Alex Williamson
On Fri, 2011-09-09 at 08:11 -0500, Stuart Yoder wrote:
> Based on the discussions over the last couple of weeks
> I have updated the device fd file layout proposal and
> tried to specify it a bit more formally.
> 
> ===
> 
> 1.  Overview
> 
>   This specification describes the layout of device files
>   used in the context of vfio, which gives user space
>   direct access to I/O devices that have been bound to
>   vfio.
> 
>   When a device fd is opened and read, offset 0x0 contains
>   a fixed-size header followed by a number of variable-length
>   records that describe different characteristics
>   of the device: addressable regions, interrupts, etc.
> 
>   0x0  +-------------------+
>        | magic             | u32  // identifies this as a vfio device file
>        +-------------------+      // and identifies the type of bus
>        | version           | u32  // specifies the version of this
>        +-------------------+
>        | flags             | u32  // encodes any flags
>        +-------------------+
>        | dev info record 0 |
>        |   type            | u32  // type of record
>        |   rec_len         | u32  // length in bytes of record
>        |                   |      // (including record header)
>        |   flags           | u32  // type-specific flags
>        |   ...content...   |      // record content, which could
>        +-------------------+      // include sub-records
>        | dev info record 1 |
>        +-------------------+
>        | dev info record N |
>        +-------------------+
> 
>   The device info records following the file header may have
>   the following record types each with content encoded in
>   a record specific way:
> 
>   Record type        Num   Description
>   ----------------   ---   ------------------------------------------------
>   REGION             1     describes an addressable address range for the device
>   DTPATH             2     describes the device tree path for the device
>   DTINDEX            3     describes the index into the related device tree
>                            property (reg, ranges, interrupts, interrupt-map)
>   INTERRUPT          4     describes an interrupt for the device
>   PCI_CONFIG_SPACE   5     property identifying a region as PCI config space
>   PCI_BAR_INDEX      6     describes the BAR index for a PCI region
>   PHYS_ADDR          7     describes the physical address of the region
> 
> 2. Header
> 
> The header is located at offset 0x0 in the device fd
> and has the following format:
> 
> struct devfd_header {
> __u32 magic;
> __u32 version;
> __u32 flags;
> };
> 
> The 'magic' field contains a magic value that will
> identify the type of bus the device is on.  Valid values
> are:
> 
> 0x70636900   // "pci" - PCI device
> 0x6474   // "dt" - device tree (system bus)
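A reader that walks this layout only needs the common type/rec_len prefix to skip from one record to the next. The sketch below assumes native-endian u32 fields, that every record starts with type followed by rec_len, and that rec_len includes the record header, as the diagram states; devfd_rec_hdr and devfd_count_records are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* File header at offset 0x0, as proposed in section 2. */
struct devfd_header {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
};

/* Assumed common prefix of every record (hypothetical name). */
struct devfd_rec_hdr {
    uint32_t type;
    uint32_t rec_len;   /* bytes, including this prefix */
};

/*
 * Count the top-level records in buf[0..len). Returns -1 on a truncated
 * header or a record length that is too small or runs past the buffer.
 * Sub-records are skipped along with their parent's content.
 */
static int devfd_count_records(const unsigned char *buf, size_t len)
{
    size_t off = sizeof(struct devfd_header);
    int count = 0;

    if (len < off)
        return -1;
    while (off < len) {
        struct devfd_rec_hdr rh;

        if (len - off < sizeof(rh))
            return -1;                      /* truncated record header */
        memcpy(&rh, buf + off, sizeof(rh)); /* avoids unaligned loads */
        if (rh.rec_len < sizeof(rh) || rh.rec_len > len - off)
            return -1;                      /* would never advance, or overruns buf */
        count++;
        off += rh.rec_len;                  /* unknown types are skipped too */
    }
    return count;
}
```

Rejecting a rec_len smaller than the prefix matters: a zero length would otherwise make the loop spin forever on a malformed file.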
> 
> 3. Region
> 
>   A REGION record describes an addressable address region for the device.
> 
> struct devfd_region {
> __u32 type;   // must be 0x1
> __u32 record_len;
> __u32 flags;
> __u64 offset; // seek offset to region from beginning
>   // of file
> __u64 len;    // length of the region
> };
> 
>   The 'flags' field supports one flag:
> 
>   IS_MMAPABLE
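One layout detail worth pinning down: REGION puts three u32 fields before the first u64, so whether 4 bytes of padding precede offset depends on the ABI (64-bit integers are 8-byte aligned inside structs on x86-64 but only 4-byte aligned on i386). A sketch of one fix, assuming an explicit pad field is acceptable; the field name pad is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* REGION as written in the proposal. The offset of 'offset' is 16 on
 * x86-64 (implicit padding) but 12 on i386, so 32- and 64-bit userspace
 * would disagree about the layout. */
struct devfd_region_as_written {
    uint32_t type;
    uint32_t record_len;
    uint32_t flags;
    uint64_t offset;
    uint64_t len;
};

/* Same fields with explicit padding: every u64 starts on an 8-byte
 * boundary regardless of the ABI's alignment rules. */
struct devfd_region_padded {
    uint32_t type;
    uint32_t record_len;
    uint32_t flags;
    uint32_t pad;       /* hypothetical field, must be zero */
    uint64_t offset;
    uint64_t len;
};

/* Compile-time checks that the padded layout is the same everywhere. */
static_assert(offsetof(struct devfd_region_padded, offset) == 16, "offset");
static_assert(offsetof(struct devfd_region_padded, len) == 24, "len");
static_assert(sizeof(struct devfd_region_padded) == 32, "size");
```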
> 
> 4. Device Tree Path (DTPATH)
> 
>   A DTPATH record is a sub-record of a REGION and describes
>   the path to a device tree node for the region

Can we better distinguish sub-records from records?  I assume we're
trying to be as versatile as possible by having a single "type" address
space, but is this going to lead to implementation problems?  A DTPATH
as a record, an INTERRUPT as a sub-record, etc.  Should we instead have
a "subtype" address space per "type" and per device type?  For a "dt"
device, it looks like we really have:

  * REGION (type 0)
    * DTPATH (subtype 0)
    * DTINDEX (subtype 1)
    * PHYS_ADDR (subtype 2)
  * INTERRUPT (type 1)
    * DTPATH (subtype 0)
    * DTINDEX (subtype 1)

While "pci" is:

  * REGION (type 0)
    * PCI_CONFIG_SPACE (subtype 0)
    * PCI_BAR_INDEX (subtype 1)
  * INTERRUPT (type 1)

> struct devfd_dtpath {
> __u32 type;   // must be 0x2
> __u32 record_len;
> char path[];     // device tree path for the region (field name assumed)
> };
> 
> 5. Device Tree Index (DTINDEX)
> 
>   A DTINDEX record is a sub-record of a REGION and specifies
>   the index into the resource list encoded in the associated
>   device tree property-- "reg", "ranges", "interrupts", or

Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Marcelo Tosatti
On Mon, Sep 19, 2011 at 06:09:39PM +0300, Avi Kivity wrote:
> On 09/19/2011 05:54 PM, Marcelo Tosatti wrote:
> >On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> >>  On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> >>  >>   >>
> >>  >>   >>Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >>  >>   >And we don't need the window exit notification then? I don't understand
> >>  >>   >what nmi_in_progress is supposed to do here.
> >>  >>
> >>  >>   We need the window notification in both cases.  If we're recovering
> >>  >>   from STI, then we don't need to collapse NMIs.  If we're completing
> >>  >>   an NMI handler, then we do need to collapse NMIs (since the queue
> >>  >>   length is two, and we just completed one).
> >>  >
> >>  >I don't understand what is the point with nmi_in_progress, and the above
> >>  >hunk, either. Can't inject_nmi do:
> >>  >
> >>  >if (nmi_injected + atomic_read(nmi_pending)<   2)
> >>  >  atomic_inc(nmi_pending)
> >>  >
> >>  >Instead of collapsing somewhere else?
> >>
> >>  We could.  It's not atomic though - two threads executing in
> >>  parallel could raise the value to three.  Could do a cmpxchg loop
> >>  that does an increment bounded to two.  I guess this is a lot clearer,
> >>  thanks.
> >>
> >>  >You'd also have to change
> >>  >nmi_injected handling in arch code so its value is not "hidden", in
> >>  >complete_interrupts().
> >>
> >>  Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> >>
> >>nmi_pending: total number of interrupts in queue
> >>nmi_injected: of these, how many are currently being injected
> >>
> >>  yes?
> >
> >Yes, at the expense of decrementing on subarch code (which is fine,
> >apparently).
> >
> 
> Hm, we have no place to decrement. 

Decrement when setting nmi_injected = false, increment when setting
nmi_injected = true, in vmx/svm.c.

>  We need to do that when IRET
> executes, but we don't want to request an NMI window exit in the
> common case of nmi_pending = 1.

Do not enable nmi window if nmi_injected = true?



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Avi Kivity

On 09/19/2011 06:22 PM, Marcelo Tosatti wrote:

> On Mon, Sep 19, 2011 at 06:09:39PM +0300, Avi Kivity wrote:
> >  On 09/19/2011 05:54 PM, Marcelo Tosatti wrote:
> >  >On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> >  >>   On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> >  >>   >>>>
> >  >>   >>>> Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >  >>   >>>And we don't need the window exit notification then? I don't understand
> >  >>   >>>what nmi_in_progress is supposed to do here.
> >  >>   >>
> >  >>   >>We need the window notification in both cases.  If we're recovering
> >  >>   >>from STI, then we don't need to collapse NMIs.  If we're completing
> >  >>   >>an NMI handler, then we do need to collapse NMIs (since the queue
> >  >>   >>length is two, and we just completed one).
> >  >>   >
> >  >>   >I don't understand what is the point with nmi_in_progress, and the above
> >  >>   >hunk, either. Can't inject_nmi do:
> >  >>   >
> >  >>   >if (nmi_injected + atomic_read(nmi_pending) < 2)
> >  >>   >   atomic_inc(nmi_pending)
> >  >>   >
> >  >>   >Instead of collapsing somewhere else?
> >  >>
> >  >>   We could.  It's not atomic though - two threads executing in
> >  >>   parallel could raise the value to three.  Could do a cmpxchg loop
> >  >>   that does an increment bounded to two.  I guess this is a lot clearer,
> >  >>   thanks.
> >  >>
> >  >>   >You'd also have to change
> >  >>   >nmi_injected handling in arch code so its value is not "hidden", in
> >  >>   >complete_interrupts().
> >  >>
> >  >>   Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> >  >>
> >  >>     nmi_pending: total number of interrupts in queue
> >  >>     nmi_injected: of these, how many are currently being injected
> >  >>
> >  >>   yes?
> >  >
> >  >Yes, at the expense of decrementing on subarch code (which is fine,
> >  >apparently).
> >  >
> >
> >  Hm, we have no place to decrement.
>
> Decrement when setting nmi_injected = false, increment when setting
> nmi_injected = true, in vmx/svm.c.

That gives a queue length of 3: one running nmi and nmi_pending = 2.

> >   We need to do that when IRET
> >  executes, but we don't want to request an NMI window exit in the
> >  common case of nmi_pending = 1.
>
> Do not enable nmi window if nmi_injected = true?



We have to, since we need a back-to-back nmi if the queue length > 1 
(including the running nmi).
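The window rule above can be modelled in a few lines. This is a hypothetical sketch using the counting proposed earlier in the thread (nmi_pending is the total queue, nmi_injected marks the one being handled); it deliberately leaves out where nmi_pending is decremented, which is the open question in this exchange.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not vmx.c/svm.c code. */
struct nmi_queue_model {
    unsigned int nmi_pending; /* total NMIs in the queue, never above 2 */
    bool nmi_injected;        /* one of them is being handled by the guest */
};

/* Request an NMI-window exit only when a handler is running and another
 * NMI must follow it back-to-back once its IRET unblocks NMIs. */
static bool need_nmi_window(const struct nmi_queue_model *q)
{
    return q->nmi_injected && q->nmi_pending > 1;
}
```

With one NMI running and nothing else queued (nmi_pending = 1), no window exit is requested, which is the common case that should stay cheap.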


--
error compiling committee.c: too many arguments to function



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Marcelo Tosatti
On Mon, Sep 19, 2011 at 06:37:35PM +0300, Avi Kivity wrote:
> On 09/19/2011 06:22 PM, Marcelo Tosatti wrote:
> >On Mon, Sep 19, 2011 at 06:09:39PM +0300, Avi Kivity wrote:
> >>  On 09/19/2011 05:54 PM, Marcelo Tosatti wrote:
> >>  >On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> >>  >>   On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> >>  >>   >>>>
> >>  >>   >>>> Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >>  >>   >>>And we don't need the window exit notification then? I don't understand
> >>  >>   >>>what nmi_in_progress is supposed to do here.
> >>  >>   >>
> >>  >>   >>We need the window notification in both cases.  If we're recovering
> >>  >>   >>from STI, then we don't need to collapse NMIs.  If we're completing
> >>  >>   >>an NMI handler, then we do need to collapse NMIs (since the queue
> >>  >>   >>length is two, and we just completed one).
> >>  >>   >
> >>  >>   >I don't understand what is the point with nmi_in_progress, and the above
> >>  >>   >hunk, either. Can't inject_nmi do:
> >>  >>   >
> >>  >>   >if (nmi_injected + atomic_read(nmi_pending) < 2)
> >>  >>   >   atomic_inc(nmi_pending)
> >>  >>   >
> >>  >>   >Instead of collapsing somewhere else?
> >>  >>
> >>  >>   We could.  It's not atomic though - two threads executing in
> >>  >>   parallel could raise the value to three.  Could do a cmpxchg loop
> >>  >>   that does an increment bounded to two.  I guess this is a lot clearer,
> >>  >>   thanks.
> >>  >>
> >>  >>   >You'd also have to change
> >>  >>   >nmi_injected handling in arch code so its value is not "hidden", in
> >>  >>   >complete_interrupts().
> >>  >>
> >>  >>   Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> >>  >>
> >>  >> nmi_pending: total number of interrupts in queue
> >>  >> nmi_injected: of these, how many are currently being injected
> >>  >>
> >>  >>   yes?
> >>  >
> >>  >Yes, at the expense of decrementing on subarch code (which is fine,
> >>  >apparently).
> >>  >
> >>
> >>  Hm, we have no place to decrement.
> >
> >Decrement when setting nmi_injected = false, increment when setting
> >nmi_injected = true, in vmx/svm.c.
> 
> That gives a queue length of 3: one running nmi and nmi_pending = 2.

Increment through the same wrapper that collapses the second and subsequent
NMIs, the same one kvm_inject_nmi would use.

> >>   We need to do that when IRET
> >>  executes, but we don't want to request an NMI window exit in the
> >>  common case of nmi_pending = 1.
> >
> >Do not enable nmi window if nmi_injected = true?
> >
> 
> We have to, since we need a back-to-back nmi if the queue length > 1
> (including the running nmi).
> 
> -- 
> error compiling committee.c: too many arguments to function


Re: [SeaBIOS] [SeaBIOS PATCH 2/2] hotplug: Add device per func in ACPI DSDT tables

2011-09-19 Thread Marcelo Tosatti
On Mon, Sep 19, 2011 at 01:02:59PM +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 19, 2011 at 12:57:33PM +0300, Gleb Natapov wrote:
> > On Mon, Sep 19, 2011 at 03:27:38AM -0400, Amos Kong wrote:
> > > 
> > > Only func 0 is registered to guest driver (we can
> > > only find func 0 in slot->funcs list of driver),
> > > the other functions could not be cleaned when
> > > hot-removing the whole slot. This patch adds
> > > device per function in ACPI DSDT tables.
> > > 
> > You can't unplug a single function. Guest surely knows that.
> 
> Looking at guest code, it's clear that
> at least a Linux guest doesn't know that.

acpiphp_disable_slot function appears to eject all functions.

> > > Have tested with linux/winxp/win7, hot-adding/hot-removing,
> > > single/multiple function device, they are all fine.
> > > 

Does not work for me (FC12 guest). As mentioned previously, the Linux driver
looks for function 0 when injection request is seen (see enable_device
function in acpiphp_glue.c).

> > What was not fine before?
> > 
> > Have you looked at real HW that supports PCI hot plug DSDT? Does it
> > look the same?
> 
> I recall I saw some examples like this on the net.
> 
> 
> > > new acpi-dsdt.hex (332K):
> > > http://amos-kong.rhcloud.com/pub/acpi-dsdt.hex
> > > 
> > > Signed-off-by: Amos Kong 



Re: [RFC PATCH 3/5] VFIO: Base framework for new VFIO driver

2011-09-19 Thread Alex Williamson
Sorry for the delay, just getting back from LPC and some time off...

On Wed, 2011-09-07 at 10:52 -0400, Konrad Rzeszutek Wilk wrote:
> > +static long vfio_iommu_unl_ioctl(struct file *filep,
> > +unsigned int cmd, unsigned long arg)
> > +{
> > +   struct vfio_iommu *viommu = filep->private_data;
> > +   struct vfio_dma_map dm;
> > +   int ret = -ENOSYS;
> > +
> > +   switch (cmd) {
> > +   case VFIO_IOMMU_MAP_DMA:
> > +   if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
> > +   return -EFAULT;
> > +   ret = 0; // XXX - Do something
> 
> 

Truly an RFC ;)

> > +   if (!ret && copy_to_user((void __user *)arg, &dm, sizeof dm))
> > +   ret = -EFAULT;
> > +   break;
> > +
> > +   case VFIO_IOMMU_UNMAP_DMA:
> > +   if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
> > +   return -EFAULT;
> > +   ret = 0; // XXX - Do something
> > +   if (!ret && copy_to_user((void __user *)arg, &dm, sizeof dm))
> > +   ret = -EFAULT;
> > +   break;
> > +   }
> > +   return ret;
> > +}
> > +
> > +#ifdef CONFIG_COMPAT
> > +static long vfio_iommu_compat_ioctl(struct file *filep,
> > +   unsigned int cmd, unsigned long arg)
> > +{
> > +   arg = (unsigned long)compat_ptr(arg);
> > +   return vfio_iommu_unl_ioctl(filep, cmd, arg);
> > +}
> > +#endif /* CONFIG_COMPAT */
> > +
> > +const struct file_operations vfio_iommu_fops = {
> > +   .owner  = THIS_MODULE,
> > +   .release= vfio_iommu_release,
> > +   .unlocked_ioctl = vfio_iommu_unl_ioctl,
> > +#ifdef CONFIG_COMPAT
> > +   .compat_ioctl   = vfio_iommu_compat_ioctl,
> > +#endif
> > +};
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> .. snip..
> > +int vfio_group_add_dev(struct device *dev, void *data)
> > +{
> > +   struct vfio_device_ops *ops = data;
> > +   struct list_head *pos;
> > +   struct vfio_group *vgroup = NULL;
> > +   struct vfio_device *vdev = NULL;
> > +   unsigned int group;
> > +   int ret = 0, new_group = 0;
> 
> 'new_group' should probably be 'bool'.

ok

> > +
> > +   if (iommu_device_group(dev, &group))
> > +   return 0;
> 
> -EEXIST?

I think I made this return 0 because it's called from device add
notifiers and walking devices lists.  It's ok for it to fail, not all
devices have to be backed by an iommu, they just won't show up in vfio.
Maybe I should leave that to the leaf callers though.  EINVAL is
probably more appropriate.

> > +
> > +   mutex_lock(&vfio.group_lock);
> > +
> > +   list_for_each(pos, &vfio.group_list) {
> > +   vgroup = list_entry(pos, struct vfio_group, next);
> > +   if (vgroup->group == group)
> > +   break;
> > +   vgroup = NULL;
> > +   }
> > +
> > +   if (!vgroup) {
> > +   int id;
> > +
> > +   if (unlikely(idr_pre_get(&vfio.idr, GFP_KERNEL) == 0)) {
> > +   ret = -ENOMEM;
> > +   goto out;
> > +   }
> > +   vgroup = kzalloc(sizeof(*vgroup), GFP_KERNEL);
> > +   if (!vgroup) {
> > +   ret = -ENOMEM;
> > +   goto out;
> > +   }
> > +
> > +   vgroup->group = group;
> > +   INIT_LIST_HEAD(&vgroup->device_list);
> > +
> > +   ret = idr_get_new(&vfio.idr, vgroup, &id);
> > +   if (ret == 0 && id > MINORMASK) {
> > +   idr_remove(&vfio.idr, id);
> > +   kfree(vgroup);
> > +   ret = -ENOSPC;
> > +   goto out;
> > +   }
> > +
> > +   vgroup->devt = MKDEV(MAJOR(vfio.devt), id);
> > +   list_add(&vgroup->next, &vfio.group_list);
> > +   device_create(vfio.class, NULL, vgroup->devt,
> > + vgroup, "%u", group);
> > +
> > +   new_group = 1;
> > +   } else {
> > +   list_for_each(pos, &vgroup->device_list) {
> > +   vdev = list_entry(pos, struct vfio_device, next);
> > +   if (vdev->dev == dev)
> > +   break;
> > +   vdev = NULL;
> > +   }
> > +   }
> > +
> > +   if (!vdev) {
> > +   /* Adding a device for a group that's already in use? */
> > +   /* Maybe we should attach to the domain so others can't */
> > +   BUG_ON(vgroup->container &&
> > +  vgroup->container->iommu &&
> > +  vgroup->container->iommu->refcnt);
> > +
> > +   vdev = ops->new(dev);
> > +   if (IS_ERR(vdev)) {
> > +   /* If we just created this vgroup, tear it down */
> > +   if (new_group) {
> > +   device_destroy(vfio.class, vgroup->devt);
> > +   idr_remove(&vfio.idr, MINOR(vgroup->devt));
> > +   list_del(&vgroup->next);
> > +   kfree(vgroup)

Re: [IPXE PATCH] [dhcp] Use random transaction ID to associate messages

2011-09-19 Thread Michael Brown
On Sunday 18 Sep 2011 02:21:04 Amos Kong wrote:
> Users may boot a QEMU guest without a default MAC address. This is easy to
> reproduce: PXE boot then always fails to obtain an IP address.
> The RFC suggests using a random xid, and it is necessary here. This patch
> generates a random ID when DHCP starts and records it in the netdev struct.

I've applied a modified version of this patch, which preserves the separation 
of abstractions between "net device" and "DHCP client":

  http://git.ipxe.org/ipxe.git/commitdiff/12767d2
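The idea behind the patch, giving each DHCP session a random transaction ID and discarding replies that do not echo it (RFC 2131 describes xid as a random number chosen by the client), can be sketched as follows. The struct and functions are hypothetical and do not reflect the code in the commit above; rand() only stands in for a proper entropy source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical DHCP client session state, not iPXE's structures. */
struct dhcp_session {
    uint32_t xid;   /* transaction ID chosen when the session starts */
};

/* Pick a fresh xid per session so clients that share a MAC address are
 * unlikely to accept each other's replies. Two rand() calls are combined
 * because RAND_MAX may be as small as 32767. */
static void dhcp_session_start(struct dhcp_session *s)
{
    s->xid = ((uint32_t)(rand() & 0xffff) << 16) | (uint32_t)(rand() & 0xffff);
}

/* Accept a reply only if it carries this session's transaction ID. */
static bool dhcp_reply_matches(const struct dhcp_session *s, uint32_t reply_xid)
{
    return reply_xid == s->xid;
}
```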

Thanks!

Michael


Re: Temporary kvm and qemu git repositories

2011-09-19 Thread Michael S. Tsirkin
On Wed, Aug 31, 2011 at 06:38:28PM +0300, Avi Kivity wrote:
> Since master.kernel.org is down for maintenance, I've set up
> temporary repositories on github:
> 
>   git://github.com/avikivity/kvm.git
>   git://github.com/avikivity/qemu.git
> 
> Please use these instead of kvm.git and qemu-kvm.git until further notice.

I did hold out for a while, but as kernel.org
doesn't seem to be back, here's my tree:
https://github.com/mstsirkin/qemu/

The git url is
git://github.com/mstsirkin/qemu.git

> -- 
> error compiling committee.c: too many arguments to function
> 


Re: [SeaBIOS] [SeaBIOS PATCH 2/2] hotplug: Add device per func in ACPI DSDT tables

2011-09-19 Thread Michael S. Tsirkin
On Mon, Sep 19, 2011 at 01:27:25PM -0300, Marcelo Tosatti wrote:
> On Mon, Sep 19, 2011 at 01:02:59PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Sep 19, 2011 at 12:57:33PM +0300, Gleb Natapov wrote:
> > > On Mon, Sep 19, 2011 at 03:27:38AM -0400, Amos Kong wrote:
> > > > 
> > > > Only func 0 is registered to guest driver (we can
> > > > only find func 0 in slot->funcs list of driver),
> > > > the other functions could not be cleaned when
> > > > hot-removing the whole slot. This patch adds
> > > > device per function in ACPI DSDT tables.
> > > > 
> > > You can't unplug a single function. Guest surely knows that.
> > 
> > Looking at guest code, it's clear that
> > at least a Linux guest doesn't know that.
> 
> acpiphp_disable_slot function appears to eject all functions.

Yes but the siblings list seems to be populated from the ACPI
tables, not by probing PCI functions.
So we need to, at a minimum, have Device tables for all functions.
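Per the ACPI specification, the _ADR of a PCI device object holds the device (slot) number in the high word and the function number in the low word, so a Device object per function needs eight distinct _ADR values per slot. A small C sketch of that encoding; the helper names are hypothetical, and it models the table contents only, not SeaBIOS's DSDT macros.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_FUNCS_PER_SLOT 8

/* ACPI _ADR for a PCI device: device (slot) number in the high word,
 * function number in the low word. */
static uint32_t pci_acpi_adr(unsigned int slot, unsigned int fn)
{
    return ((uint32_t)slot << 16) | (uint32_t)fn;
}

/* One _ADR per function of a slot, as a per-function Device layout
 * would need. Hypothetical helper. */
static void slot_func_adrs(unsigned int slot, uint32_t out[PCI_FUNCS_PER_SLOT])
{
    for (unsigned int fn = 0; fn < PCI_FUNCS_PER_SLOT; fn++)
        out[fn] = pci_acpi_adr(slot, fn);
}
```

For slot 1 this yields 0x00010000 through 0x00010007.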

> > > > Have tested with linux/winxp/win7, hot-adding/hot-removing,
> > > > single/multiple function device, they are all fine.
> > > > 
> 
> Does not work for me (FC12 guest). As mentioned previously, the Linux driver
> looks for function 0 when injection request is seen (see enable_device
> function in acpiphp_glue.c).

What exactly are you trying to do?
ATM the idea is to add all functions, adding function 0
last.

> > > What was not fine before?
> > > 
> > > Have you looked at real HW that supports PCI hot plug DSDT? Does it
> > > look the same?
> > 
> > I recall I saw some examples like this on the net.
> > 
> > 
> > > > new acpi-dsdt.hex (332K):
> > > > http://amos-kong.rhcloud.com/pub/acpi-dsdt.hex
> > > > 
> > > > Signed-off-by: Amos Kong 


Re: RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-19 Thread Scott Wood
On 09/19/2011 10:16 AM, Alex Williamson wrote:
> On Fri, 2011-09-09 at 08:11 -0500, Stuart Yoder wrote:
>> 2. Header
>>
>> The header is located at offset 0x0 in the device fd
>> and has the following format:
>>
>> struct devfd_header {
>> __u32 magic;
>> __u32 version;
>> __u32 flags;
>> };
>>
>> The 'magic' field contains a magic value that will
>> identify the type of bus the device is on.  Valid values
>> are:
>>
>> 0x70636900   // "pci" - PCI device
>> 0x6474   // "dt" - device tree (system bus)

These look somewhat conflict-prone (even more so than "vfio") -- maybe
not ambiguous in context, but if we're going to use magic numbers we
might as well make them as unique as we can.  Can't we just generate a
couple random numbers?

Also, is the magic number specifically 0x70636900 in native endian, or
"pci" however it would be encoded on the platform?  Are there any
platforms in Linux where multiple endians are supported at once in
userspace (on a per-process basis)?

>> 3. Region
>>
>>   A REGION record describes an addressable address region for the device.
>>
>> struct devfd_region {
>> __u32 type;   // must be 0x1
>> __u32 record_len;
>> __u32 flags;
>> __u64 offset; // seek offset to region from beginning
>>   // of file
>> __u64 len;    // length of the region
>> };
>>
>>   The 'flags' field supports one flag:
>>
>>   IS_MMAPABLE
>>
>> 4. Device Tree Path (DTPATH)
>>
>>   A DTPATH record is a sub-record of a REGION and describes
>>   the path to a device tree node for the region
> 
> Can we better distinguish sub-records from records?  I assume we're
> trying to be as versatile as possible by having a single "type" address
> space, but is this going to lead to implementation problems?

What kind of problems?

> A DTPATH as a record, an INTERRUPT as a sub-record, etc.

Same as any other unrecognized (sub)record type, you ignore it -- but
the kernel should not be generating this.

> Should we instead have
> a "subtype" address space per "type" and per device type?  For a "dt"
> device, it looks like we really have:
> 
>   * REGION (type 0)
>     * DTPATH (subtype 0)
>     * DTINDEX (subtype 1)
>     * PHYS_ADDR (subtype 2)
>   * INTERRUPT (type 1)
>     * DTPATH (subtype 0)
>     * DTINDEX (subtype 1)
> 
> While "pci" is:
> 
>   * REGION (type 0)
>     * PCI_CONFIG_SPACE (subtype 0)
>     * PCI_BAR_INDEX (subtype 1)
>   * INTERRUPT (type 1)

I'd prefer to keep one numberspace, with documented restrictions on what
types/subtypes are allowed in each context.  Makes it easier if we end
up in a situation where a (sub)record type is applicable to multiple
contexts, and easier to detect when an error has been made.
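A sketch of the single-numberspace approach with documented restrictions: type numbers come from the RFC's table, and one function encodes which types may appear at top level and inside each parent. The allowed pairs follow Alex's breakdown quoted above and are an assumption, not an agreed rule; devfd_type_allowed is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Type numbers from the RFC's table: one shared numberspace. */
enum devfd_type {
    DEVFD_REGION           = 1,
    DEVFD_DTPATH           = 2,
    DEVFD_DTINDEX          = 3,
    DEVFD_INTERRUPT        = 4,
    DEVFD_PCI_CONFIG_SPACE = 5,
    DEVFD_PCI_BAR_INDEX    = 6,
    DEVFD_PHYS_ADDR        = 7,
};

/* parent == 0 means "top level, directly after the header". */
static bool devfd_type_allowed(uint32_t parent, uint32_t type)
{
    switch (parent) {
    case 0:
        return type == DEVFD_REGION || type == DEVFD_INTERRUPT;
    case DEVFD_REGION:
        return type == DEVFD_DTPATH || type == DEVFD_DTINDEX ||
               type == DEVFD_PHYS_ADDR || type == DEVFD_PCI_CONFIG_SPACE ||
               type == DEVFD_PCI_BAR_INDEX;
    case DEVFD_INTERRUPT:
        return type == DEVFD_DTPATH || type == DEVFD_DTINDEX;
    default:
        return false;   /* leaf record types carry no sub-records */
    }
}
```

A consumer would still skip unrecognized types, as Scott suggests; a check like this is more useful for catching a kernel that emits, say, a DTPATH at top level.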

>> 8.  PCI Bar Index (PCI_BAR_INDEX)
>>
>> A PCI_BAR_INDEX record is a sub-record of a REGION record
>> and identifies the PCI BAR index for the region.
>>
>> struct devfd_barindex {
>> __u32 type;   // must be 0x6
>> __u32 record_len;
>> __u32 flags;
>> __u32 bar_index;
>> };
> 
> I suppose we're more concerned with easy parsing and alignment than
> compactness, so a u32 to differentiate 6 BARs + 1 ROM is probably ok.

Right.

-Scott



Re: RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-19 Thread Alex Williamson
On Mon, 2011-09-19 at 14:37 -0500, Scott Wood wrote:
> On 09/19/2011 10:16 AM, Alex Williamson wrote:
> > On Fri, 2011-09-09 at 08:11 -0500, Stuart Yoder wrote:
> >> 2. Header
> >>
> >> The header is located at offset 0x0 in the device fd
> >> and has the following format:
> >>
> >> struct devfd_header {
> >> __u32 magic;
> >> __u32 version;
> >> __u32 flags;
> >> };
> >>
> >> The 'magic' field contains a magic value that will
> >> identify the type of bus the device is on.  Valid values
> >> are:
> >>
> >> 0x70636900   // "pci" - PCI device
> >> 0x6474   // "dt" - device tree (system bus)
> 
> These look somewhat conflict-prone (even more so than "vfio") -- maybe
> not ambiguous in context, but if we're going to use magic numbers we
> might as well make them as unique as we can.  Can't we just generate a
> couple random numbers?
> 
> Also, is the magic number specifically 0x70636900 in native endian, or
> "pci" however it would be encoded on the platform?  Are there any
> platforms in Linux where multiple endians are supported at once in
> userspace (on a per-process basis)?

A GUID would be fine w/ me.

> >> 3. Region
> >>
> >>   A REGION record describes an addressable address region for the device.
> >>
> >> struct devfd_region {
> >> __u32 type;   // must be 0x1
> >> __u32 record_len;
> >> __u32 flags;
> >> __u64 offset; // seek offset to region from beginning
> >>   // of file
> >> __u64 len;    // length of the region
> >> };
> >>
> >>   The 'flags' field supports one flag:
> >>
> >>   IS_MMAPABLE
> >>
> >> 4. Device Tree Path (DTPATH)
> >>
> >>   A DTPATH record is a sub-record of a REGION and describes
> >>   the path to a device tree node for the region
> > 
> > Can we better distinguish sub-records from records?  I assume we're
> > trying to be as versatile as possible by having a single "type" address
> > space, but is this going to lead to implementation problems?
> 
> What kind of problems?

vvv Those kind.

> > A DTPATH as a record, an INTERRUPT as a sub-record, etc.
> 
> Same as any other unrecognized (sub)record type, you ignore it -- but
> the kernel should not be generating this.

I'm trying to express that I think the spec is unclear here.  It's easy
to hand wave that the code will do the right thing, but if the next
person comes along and doesn't understand that a DTPATH is only a
sub-type and a DTINDEX needs to be paired with a DTPATH, then we've
failed.

> > Should we instead have
> > a "subtype" address space per "type" and per device type?  For a "dt"
> > device, it looks like we really have:
> > 
> >   * REGION (type 0)
> >     * DTPATH (subtype 0)
> >     * DTINDEX (subtype 1)
> >     * PHYS_ADDR (subtype 2)
> >   * INTERRUPT (type 1)
> >     * DTPATH (subtype 0)
> >     * DTINDEX (subtype 1)
> > 
> > While "pci" is:
> > 
> >   * REGION (type 0)
> >     * PCI_CONFIG_SPACE (subtype 0)
> >     * PCI_BAR_INDEX (subtype 1)
> >   * INTERRUPT (type 1)
> 
> I'd prefer to keep one numberspace, with documented restrictions on what
> types/subtypes are allowed in each context.  Makes it easier if we end
> up in a situation where a (sub)record type is applicable to multiple
> contexts, and easier to detect when an error has been made.

Couldn't we accomplish the same with separate type and subtype number
spaces?

enum types {
    REGION,
    INTERRUPT,
};

enum subtypes {
    DTPATH,
    DTINDEX,
    PHYS_ADDR,
    PCI_CONFIG_SPACE,
    PCI_BAR_INDEX,
};

I just find it confusing that we intermix them when defining them.
Thanks,

Alex
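Whichever numbering is chosen, both positions come down to a documented table of which sub-records may appear in which context. A minimal sketch using the two enums proposed above, with the per-context rules taken from the "dt" and "pci" lists earlier in this mail plus the rule that a DTINDEX must be paired with a DTPATH. The `dev_kind` enum, the `allowed` table and `subrecords_valid()` are hypothetical, and the enum values are simply declaration order, not values assigned by the RFC.

```c
#include <stdbool.h>
#include <stddef.h>

enum types { REGION, INTERRUPT, NR_TYPES };

enum subtypes {
    DTPATH,
    DTINDEX,
    PHYS_ADDR,
    PCI_CONFIG_SPACE,
    PCI_BAR_INDEX,
    NR_SUBTYPES
};

enum dev_kind { DEV_DT, DEV_PCI, NR_DEV_KINDS };   /* hypothetical */

#define BIT(n) (1u << (n))

/* Subtypes allowed under each record type, per device kind, following
 * the "dt" and "pci" lists in this thread. */
static const unsigned int allowed[NR_DEV_KINDS][NR_TYPES] = {
    [DEV_DT] = {
        [REGION]    = BIT(DTPATH) | BIT(DTINDEX) | BIT(PHYS_ADDR),
        [INTERRUPT] = BIT(DTPATH) | BIT(DTINDEX),
    },
    [DEV_PCI] = {
        [REGION]    = BIT(PCI_CONFIG_SPACE) | BIT(PCI_BAR_INDEX),
        [INTERRUPT] = 0,
    },
};

/* Every sub-record must be allowed in its context, and a DTINDEX
 * is only meaningful next to a DTPATH. */
static bool subrecords_valid(enum dev_kind kind, enum types type,
                             const enum subtypes *subs, size_t n)
{
    bool have_path = false, have_index = false;

    if (kind >= NR_DEV_KINDS || type >= NR_TYPES)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (subs[i] >= NR_SUBTYPES || !(allowed[kind][type] & BIT(subs[i])))
            return false;
        if (subs[i] == DTPATH)
            have_path = true;
        if (subs[i] == DTINDEX)
            have_index = true;
    }
    return !have_index || have_path;
}
```

With a single shared numberspace the same table still works; only the index changes from two enums to one type value.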

> >> 8.  PCI Bar Index (PCI_BAR_INDEX)
> >>
> >> A PCI_BAR_INDEX record is a sub-record of a REGION record
> >> and identifies the PCI BAR index for the region.
> >>
> >> struct devfd_barindex {
> >> __u32 type;   // must be 0x6
> >> __u32 record_len;
> >> __u32 flags;
> >> __u32 bar_index;
> >> };
> > 
> > I suppose we're more concerned with easy parsing and alignment than
> > compactness, so a u32 to differentiate 6 BARs + 1 ROM is probably OK.
> 
> Right.
> 
> -Scott
> 
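A u32 leaves plenty of room for six BARs plus the ROM. A minimal sketch of the check a consumer might apply to a PCI_BAR_INDEX record, assuming (the RFC does not say this) that indices 0 to 5 name BAR0 to BAR5 and index 6 names the expansion ROM. The constants and `barindex_valid()` are hypothetical, and `__u32` is written as `uint32_t`.

```c
#include <stdbool.h>
#include <stdint.h>

struct devfd_barindex {
    uint32_t type;        /* must be 0x6 */
    uint32_t record_len;
    uint32_t flags;
    uint32_t bar_index;
};

#define DEVFD_PCI_BAR_INDEX 0x6
#define DEVFD_NR_BARS       6   /* BAR0..BAR5 */
#define DEVFD_ROM_INDEX     6   /* assumed encoding for the expansion ROM */

/* Returns true for a well-formed PCI_BAR_INDEX record and reports via
 * *is_rom whether it refers to the expansion ROM. */
static bool barindex_valid(const struct devfd_barindex *rec, bool *is_rom)
{
    if (rec->type != DEVFD_PCI_BAR_INDEX || rec->record_len < sizeof(*rec))
        return false;
    if (rec->bar_index >= DEVFD_NR_BARS && rec->bar_index != DEVFD_ROM_INDEX)
        return false;
    *is_rom = (rec->bar_index == DEVFD_ROM_INDEX);
    return true;
}
```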



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-19 Thread Scott Wood
On 09/19/2011 04:07 PM, Alex Williamson wrote:
> On Mon, 2011-09-19 at 14:37 -0500, Scott Wood wrote:
>>> A DTPATH as a record, an INTERRUPT as a sub-record, etc.
>>
>> Same as any other unrecognized (sub)record type, you ignore it -- but
>> the kernel should not be generating this.
> 
> I'm trying to express that I think the spec is unclear here.  It's easy
> to hand wave that the code will do the right thing, but if the next
> person comes along and doesn't understand that a DTPATH is only a
> sub-type and a DTINDEX needs to be paired with a DTPATH, then we've
> failed.

Sure, the spec needs to be clear about which types are expected in each
context.

>> I'd prefer to keep one numberspace, with documented restrictions on what
>> types/subtypes are allowed in each context.  Makes it easier if we end
>> up in a situation where a (sub)record type is applicable to multiple
>> contexts, and easier to detect when an error has been made.
> 
> Couldn't we accomplish the same with separate type and subtype number
> spaces?
> 
> enum types {
>   REGION,
>   INTERRUPT,
> };
> 
> enum subtypes {
>   DTPATH,
>   DTINDEX,
>   PHYS_ADDR,
>   PCI_CONFIG_SPACE,
>   PCI_BAR_INDEX,
> };
> 
> I just find it confusing that we intermix them when defining them.
> Thanks,

As long as we don't have separate numberspaces for PCI/DT/future-stuff,
I'm reasonably happy.

-Scott



Re: Memory and CPU Ballooning

2011-09-19 Thread day knight
Does anyone know how to do this? Thank you kindly.

On Sun, Sep 18, 2011 at 10:40 PM, day knight  wrote:
> Is it possible, and if so, how?
> Can we increase the memory on a live guest machine without having to
> shut down or reboot, and also increase and decrease the number of CPUs?
> If it is possible, can someone point me to the documentation? :)
>


[PATCH] e1000: Don't set the Capabilities List bit

2011-09-19 Thread dann frazier
The Capabilities Pointer is NULL, so this bit shouldn't be set. The state of
this bit doesn't appear to change any behavior on Linux/Windows versions we've
tested, but it does cause Windows' PCI/PCI Express Compliance Test to balk.

I happen to have a physical 82540EM controller, and it also sets the
Capabilities Bit, but it actually has items on the capabilities list to go
with it :)

Signed-off-by: dann frazier 
---
 hw/e1000.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index fe3e812..3d92128 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1166,8 +1166,6 @@ static int pci_e1000_init(PCIDevice *pci_dev)
 
     pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
     pci_config_set_device_id(pci_conf, E1000_DEVID);
-    /* TODO: we have no capabilities, so why is this bit set? */
-    pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST);
     pci_conf[PCI_REVISION_ID] = 0x03;
     pci_config_set_class(pci_conf, PCI_CLASS_NETWORK_ETHERNET);
     /* TODO: RST# value should be 0, PCI spec 6.2.4 */
-- 
1.7.6.3
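The inconsistency this patch removes can be stated as a config-space check. When the Capabilities List bit of the Status register is set, the Capabilities Pointer at offset 0x34 must lead to a capability outside the standard 64-byte header. A minimal sketch over a raw 256-byte config space; the offsets and bit value follow the PCI spec (they match `PCI_STATUS`, `PCI_STATUS_CAP_LIST` and `PCI_CAPABILITY_LIST` in Linux's `pci_regs.h`), while `cap_list_consistent()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_STATUS           0x06  /* 16-bit Status register */
#define PCI_STATUS_CAP_LIST  0x10  /* Capabilities List bit */
#define PCI_CAPABILITY_LIST  0x34  /* 8-bit Capabilities Pointer */

/* PCI config space is little-endian. */
static uint16_t cfg_readw(const uint8_t *cfg, unsigned int off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* With the bit clear, software ignores the pointer. With it set, the
 * pointer (low two bits reserved) must be at least 0x40. */
static bool cap_list_consistent(const uint8_t cfg[256])
{
    uint8_t ptr = (uint8_t)(cfg[PCI_CAPABILITY_LIST] & 0xfc);

    if (!(cfg_readw(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
        return true;
    return ptr >= 0x40;
}
```

Before this patch the emulated e1000 set the bit while leaving the pointer at zero, which is the `false` case of this check.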



Re: [PATCH] virtio-net: Read MAC only after initializing MSI-X

2011-09-19 Thread Rusty Russell
On Mon, 19 Sep 2011 09:01:50 +0300, "Michael S. Tsirkin" wrote:
> On Mon, Sep 19, 2011 at 01:05:17PM +0930, Rusty Russell wrote:
> > On Sat, 20 Aug 2011 23:00:44 +0300, "Michael S. Tsirkin" wrote:
> > > On Fri, Aug 19, 2011 at 07:33:07PM +0300, Sasha Levin wrote:
> > > > Maybe this is better solved by copying the way it was done in PCI itself
> > > > with a capability linked list?
> > > 
> > > There are any number of ways to lay out the structure.  I went for what
> > > seemed the simplest one.  For MSI-X the train has left the station.  We
> > > can probably still tweak where the high 32 feature bits go
> > > once features become 64 bits wide.  No idea if it's worth it.
> > 
> > Sorry, this has been in the back of my mind.  I think it's a good idea;
> > can we use the capability linked list for pre-device specific stuff from
> > now on?
> > 
> > Thanks,
> > Rusty.
> 
> Do we even want capability bits then?
> We can give each capability an ack flag ...

We could have, and if I'd known PCI when I designed virtio I might have.

But it's not easy now to map structure offsets to that scheme, and we
can't really force such a change on the non-PCI users.  So I'd say we
should only do it for the non-device-specific options.  That is, we'll still
have the MSI-X case move the device-specific config, but we'll use a
linked list from now on, e.g. for the next 32 feature bits...

Thoughts?
Rusty.
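For reference, the PCI capability linked list discussed here works as follows: each capability starts with an ID byte and a "next" byte holding the config-space offset of the following capability, and a next value of 0 ends the list. A minimal sketch of a walker that finds a capability by ID, with a hop limit so a looping list cannot hang it. How virtio would lay out its own entries is not settled in this thread, so the vendor-specific ID 0x09 (`PCI_CAP_ID_VNDR`) is only an illustration, and `pci_find_cap()` is a hypothetical name.

```c
#include <stdint.h>

#define PCI_CAPABILITY_LIST 0x34   /* offset of the first-capability pointer */
#define PCI_CAP_ID_VNDR     0x09   /* vendor-specific capability ID */

/* Return the config-space offset of the first capability with the given
 * ID, or 0 if none. Layout: [cap_id][next_ptr][payload...]. The caller is
 * assumed to have checked the Status register's Capabilities List bit. */
static uint8_t pci_find_cap(const uint8_t cfg[256], uint8_t cap_id)
{
    /* 48 = (256 - 0x40) / 4, the most dword-aligned capabilities that fit
     * after the standard header, so a cyclic list still terminates. */
    int ttl = 48;
    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;

    while (pos >= 0x40 && ttl-- > 0) {
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + 1] & 0xfc;   /* low two bits are reserved */
    }
    return 0;
}
```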


[PATCH] KVM: PPC: E500: Support hugetlbfs

2011-09-19 Thread Alexander Graf
With hugetlbfs support emerging on e500, we should also support KVM
backing its guest memory with it.

This patch adds support for hugetlbfs into the e500 shadow mmu code.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/e500_tlb.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index ec17148..64f75eb 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -673,13 +674,34 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
pfn &= ~(tsize_pages - 1);
break;
}
+   } else if (vma && hva >= vma->vm_start &&
+   (vma->vm_flags & VM_HUGETLB)) {
+   unsigned long psize = vma_kernel_pagesize(vma);
+   int lz;
+
+   tsize = (gtlbe->mas1 & MAS1_TSIZE_MASK) >>
+   MAS1_TSIZE_SHIFT;
+
+   /*
+* e500 doesn't implement the lowest tsize bit,
+* or 1K pages.
+*/
+   tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1);
+
+   /* take the smallest page size that satisfies host and
+  guest mapping */
+   asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (psize));
+   tsize = min(21 - lz, tsize);
}
 
up_read(&current->mm->mmap_sem);
}
 
if (likely(!pfnmap)) {
+   unsigned long tsize_pages = 1 << (tsize - 2);
pfn = gfn_to_pfn_memslot(vcpu_e500->vcpu.kvm, slot, gfn);
+   pfn &= ~(tsize_pages - 1);
+   gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
if (is_error_pfn(pfn)) {
printk(KERN_ERR "Couldn't get real page for gfn %lx!\n",
(long)gfn);
-- 
1.6.0.2
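The page-size arithmetic in the hunk is compact, so here it is worked through. The TSIZE field encodes a page size of 2^TSIZE KiB (`BOOK3E_PAGESZ_4K` is 2), so a host huge page of psize = 2^n bytes corresponds to TSIZE = n - 10. On 32-bit, where `PPC_CNTLZL` is `cntlzw`, the leading-zero count of 2^n is 31 - n, hence `21 - lz`. For a 16 MiB host page (2^24), `lz` is 7 and 21 - 7 = 14, i.e. 2^14 KiB. The mapping then takes the smaller of the host and guest sizes and aligns the pfn down to that many 4K pages. A minimal user-space sketch of the same arithmetic, assuming 32-bit page sizes; `clz32()` stands in for the inline asm, and all helper names are hypothetical.

```c
#include <stdint.h>

#define BOOK3E_PAGESZ_4K 2   /* TSIZE encoding: size = 2^TSIZE KiB */

/* Portable count-leading-zeros with cntlzw semantics (32 for 0). */
static int clz32(uint32_t x)
{
    int n = 0;

    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Guest TSIZE with its lowest bit dropped (e500 lacks odd sizes) and at
 * least 4K, clamped to the host huge page size psize in bytes. */
static int pick_tsize(int guest_tsize, uint32_t psize)
{
    int tsize = max_int(BOOK3E_PAGESZ_4K, guest_tsize & ~1);
    int lz = clz32(psize);

    /* psize = 2^n -> lz = 31 - n -> TSIZE = n - 10 = 21 - lz */
    return min_int(21 - lz, tsize);
}

/* Number of 4K pages in a TSIZE-sized page. */
static unsigned long tsize_pages(int tsize)
{
    return 1UL << (tsize - 2);
}

/* Align a pfn down to the start of its TSIZE-sized page. */
static unsigned long align_pfn(unsigned long pfn, int tsize)
{
    return pfn & ~(tsize_pages(tsize) - 1);
}
```

For example, a 4 MiB guest mapping (TSIZE 12) backed by 16 MiB host pages stays at TSIZE 12, covering `tsize_pages(12)` = 1024 host pages, and a pfn of 0x12345 aligns down to 0x12000.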



Re: Memory and CPU Ballooning

2011-09-19 Thread Emmanuel Noobadmin
On 9/19/11, day knight  wrote:
> Is it possible, and if so, how?
> Can we increase the memory on a live guest machine without having to
> shut down or reboot, and also increase and decrease the number of CPUs?
> If it is possible, can someone point me to the documentation? :)

Chipping in my 2 cents since nobody's answering, hopefully the sheer
amount of wrong information I put out will generate a meaningful reply
:D

There is/was an option to configure memory ballooning in the domain
xml. However, when I last tried it (on an SL6.0 host), it didn't seem to
work: the domain would use the initial amount of memory and hit swap
instead of getting more memory. I vaguely remember discovering afterwards
that some qemu command was needed for this, though.

Also, I've read that memory ballooning is a bad idea because of the
way the kernel allocates memory resources during boot based on
available memory. Using ballooning causes the allocation calculation
to be inaccurate and become highly inefficient.