Re: [RFC 0/4] KVM in-kernel PM Timer implementation

2010-12-15 Thread Avi Kivity

On 12/14/2010 06:04 PM, Anthony Liguori wrote:

On 12/14/2010 09:38 AM, Avi Kivity wrote:
Fortunately, we have a very good bytecode interpreter that's 
accelerated in the kernel called KVM ;-)


We have exactly the same bytecode interpreter under a different name, 
it's called userspace.


If you can afford to make the transition back to the guest for 
emulation, you might as well transition to userspace.


If you re-entered the guest and setup a stack that had the RIP of the 
source of the exit, then there's no additional need to exit the 
guest.  The handler can just do an iret.  Or am I missing something?


I didn't even consider an iret-to-guest, to be honest.  Let's consider 
the options:


 - iret-to-guest (a la tpr patching) - need to have an executable page 
in the guest virtual address space and some stack space (on 64-bit, can 
rely on iretq switching the stack).  That is probably impossible to do 
in a generic way without guest cooperation.  If we rely on guest 
cooperation, we might as well have the guest patch the IN instruction 
itself (no exits at all).


- architectural SMM - no need to find a virtual mapping, or even a 
physical page, since we're in our own physical address space.  However, 
the RSM instruction will trap, and on Intel, at least the first few 
instructions need to be emulated since SMM starts in big real mode.  
Also needs a tlb flush.


- kvm-specific SMM (probably what you referred to as paravirt SMM, but 
if the guest OS is not involved, it's not really paravirt) - can switch 
to our own cr3 so no problem with finding a virtual mapping; however 
still needs a tlb flush, and on pre-NPT/EPT machines, switching cr3 back 
will involve an exit.




We already have a virtual address space that works for most guests 
thanks to the TPR optimization.


It only works for Windows XP and Windows XP with the /3GB extension.


Is this a fundamental limitation or just a statement of today's 
heuristics?  Does any guest not keep the BIOS in virtual memory in a 
static location?


If you're looking for a fundamental limitation, then yes, a guest need 
not map the BIOS at all.  Practically, I believe all common guest do map 
the BIOS, but IIRC modern guests use non-executable mappings.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state

2010-12-15 Thread Jan Kiszka
Am 15.12.2010 09:05, Thomas Gleixner wrote:
 On Wed, 15 Dec 2010, Jan Kiszka wrote:
 
 Am 14.12.2010 22:46, Thomas Gleixner wrote:
 On Mon, 13 Dec 2010, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
chip_bus_lock(desc);
retval = __setup_irq(irq, desc, action);
chip_bus_sync_unlock(desc);
  
 -  if (retval)
 +  if (retval) {
 +  if (desc-action  !desc-action-next)
 +  desc-irq_data.drv_status = ~IRQS_SHARED;

 This is redundant. IRQS_SHARED gets set in a code path where all
 checks are done already.

 Nope, it's also set before entry of __setup_irq in case we call an
 IRQF_ADAPTIVE handler.

 We need to set it that early as we may race with IRQ events for the
 already registered handler happening between the sharing notification
 and the actual registration of the second handler.
 
 Hmm, ok. Though the MAKE_SHAREABLE flag should be sufficient to do the
 notification.

For notification, yes. But we need SHARED once we reenable the line
after the notification.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError

2010-12-15 Thread Lai Jiangshan

Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt).

changed from v1
Add document.
Add error handling when the cpu index is invalid.

changed from v2
use QERR_INVALID_PARAMETER_VALUE as Markus suggest.

Signed-off-by:  Lai Jiangshan la...@cn.fujitsu.com
---
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 23024ba..f86d9fe 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -724,7 +724,8 @@ ETEXI
 .args_type  = cpu_index:i,
 .params = cpu,
 .help   = inject an NMI on the given CPU,
-.mhandler.cmd = do_inject_nmi,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_inject_nmi,
 },
 #endif
 STEXI
diff --git a/monitor.c b/monitor.c
index ec31eac..3e33a96 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict 
*qdict)
 #endif
 
 #if defined(TARGET_I386)
-static void do_inject_nmi(Monitor *mon, const QDict *qdict)
+static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
 CPUState *env;
 int cpu_index = qdict_get_int(qdict, cpu_index);
@@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict 
*qdict)
 for (env = first_cpu; env != NULL; env = env-next_cpu)
 if (env-cpu_index == cpu_index) {
 cpu_interrupt(env, CPU_INTERRUPT_NMI);
-break;
+return 0;
 }
+
+qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index,
+  a CPU number);
+return -1;
 }
 #endif
 
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e5f157f..fcb6bf2 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -429,6 +429,33 @@ Example:
 
 EQMP
 
+#if defined(TARGET_I386)
+{
+.name   = inject_nmi,
+.args_type  = cpu_index:i,
+.params = cpu,
+.help   = inject an NMI on the given CPU,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_inject_nmi,
+},
+#endif
+SQMP
+inject_nmi
+--
+
+Inject an NMI on the given CPU (x86 only).
+
+Arguments:
+
+- cpu_index: the index of the CPU to be injected NMI (json-int)
+
+Example:
+
+- { execute: inject_nmi, arguments: { cpu_index: 0 } }
+- { return: {} }
+
+EQMP
+
 {
 .name   = migrate,
 .args_type  = detach:-d,blk:-b,inc:-i,uri:s,
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state

2010-12-15 Thread Thomas Gleixner
On Wed, 15 Dec 2010, Jan Kiszka wrote:
 Am 15.12.2010 09:05, Thomas Gleixner wrote:
  On Wed, 15 Dec 2010, Jan Kiszka wrote:
  
  Am 14.12.2010 22:46, Thomas Gleixner wrote:
  On Mon, 13 Dec 2010, Jan Kiszka wrote:
  From: Jan Kiszka jan.kis...@siemens.com
   chip_bus_lock(desc);
   retval = __setup_irq(irq, desc, action);
   chip_bus_sync_unlock(desc);
   
  -if (retval)
  +if (retval) {
  +if (desc-action  !desc-action-next)
  +desc-irq_data.drv_status = ~IRQS_SHARED;
 
  This is redundant. IRQS_SHARED gets set in a code path where all
  checks are done already.
 
  Nope, it's also set before entry of __setup_irq in case we call an
  IRQF_ADAPTIVE handler.
 
  We need to set it that early as we may race with IRQ events for the
  already registered handler happening between the sharing notification
  and the actual registration of the second handler.
  
  Hmm, ok. Though the MAKE_SHAREABLE flag should be sufficient to do the
  notification.
 
 For notification, yes. But we need SHARED once we reenable the line
 after the notification.

Darn. Will think more about it.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] pci-assign: Convert need_emulate_cmd into a bitmask

2010-12-15 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 04:56:29PM -0700, Alex Williamson wrote:
 On Tue, 2010-12-14 at 00:25 +0100, Jan Kiszka wrote:
  From: Jan Kiszka jan.kis...@siemens.com
  
  Define a mask of PCI command register bits that need to be emulated,
  i.e. read back from their shadow state. We will need this for
  selectively emulating the INTx mask bit.
  
  Note: No initialization of emulate_cmd_mask to zero needed, the device
  state is already zero-initialized.
  
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
  ---
   hw/device-assignment.c |   18 ++
   hw/device-assignment.h |2 +-
   2 files changed, 11 insertions(+), 9 deletions(-)
  
  diff --git a/hw/device-assignment.c b/hw/device-assignment.c
  index ef045f4..26d3bd7 100644
  --- a/hw/device-assignment.c
  +++ b/hw/device-assignment.c
  @@ -525,14 +525,17 @@ again:
   DEBUG((%x.%x): address=%04x val=0x%08x len=%d\n,
 (d-devfn  3)  0x1F, (d-devfn  0x7), address, val, len);
   
  -if (pci_dev-need_emulate_cmd 
  +if (pci_dev-emulate_cmd_mask 
   ranges_overlap(address, len, PCI_COMMAND, 2)) {
   if (address == PCI_COMMAND) {
  -val = 0x;
  -val |= pci_default_read_config(d, PCI_COMMAND, 2);
  +val = ~pci_dev-emulate_cmd_mask;
  +val |= pci_default_read_config(d, PCI_COMMAND, 2) 
  +pci_dev-emulate_cmd_mask;
   } else {
   /* high-byte access */
  -val = pci_default_read_config(d, PCI_COMMAND+1, 1);
  +val = ~(pci_dev-emulate_cmd_mask  8);
  +val |= pci_default_read_config(d, PCI_COMMAND+1, 1) 
  +(pci_dev-emulate_cmd_mask  8);
   }
   }
 
 We should definitely be using merge_bits here, this is the sort of thing
 I had in mind for it:
 
 val = merge_bits(val, pci_default_read_config(d, address, len), PCI_COMMAND, 
 pci_dev-emulate_cmd_mask);


Not a comment on this patch at all, rather on
assigned-devices support code generally:

I think code will become cleaner if we add a mask array mapping the
config space, along the lines of wmask and friends in pci.c

I think this will make it possible to mostly get rid of
merge_bits and tricky range check hacks.


  @@ -800,10 +803,9 @@ again:
   
   /* dealing with virtual function device */
   snprintf(name, sizeof(name), %sphysfn/, dir);
  -if (!stat(name, statbuf))
  -   pci_dev-need_emulate_cmd = 1;
  -else
  -   pci_dev-need_emulate_cmd = 0;
  +if (!stat(name, statbuf)) {
  +pci_dev-emulate_cmd_mask = 0x;
  +}
   
   dev-region_number = r;
   return 0;
  diff --git a/hw/device-assignment.h b/hw/device-assignment.h
  index c94a730..9ead022 100644
  --- a/hw/device-assignment.h
  +++ b/hw/device-assignment.h
  @@ -109,7 +109,7 @@ typedef struct AssignedDevice {
   void *msix_table_page;
   target_phys_addr_t msix_table_addr;
   int mmio_index;
  -int need_emulate_cmd;
  +uint32_t emulate_cmd_mask;
   char *configfd_name;
   QLIST_ENTRY(AssignedDevice) next;
   } AssignedDevice;
 
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: AW: Freezing Windows 2008 x64bit guest

2010-12-15 Thread Vadim Rozenfeld
On Wed, 2010-12-15 at 00:57 +0100, Manfred Heubach wrote:
 Vadim Rozenfeld vrozenfe at redhat.com writes:
 
 
  On Mon, 2010-12-13 at 22:12 +0200, Dor Laor wrote:
   On 12/13/2010 09:42 PM, Manfred Heubach wrote:
   
I was running the host with Ubuntu 10.04 but upgraded to 10.10 - mainly
 because
of performance problems which were solved by the upgrade.
   
After the upgrade the system became extremly unstable. It was crashing 
as soon
as disk io and network io load was growing. 100% reproduceable with 
windows
server backup to an iscsi volume.
   
i had virtio drivers for storage and network installed (redhat/fedora 
1.1.11).
  
   Which fedora/rhel release is that?
 
 
 The host is Ubuntu 10.10 x64
 
 The drivers are from
 http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/ 1.1.11-0
 released on 17-Aug-2010 - are there any newer drivers?
Yes, they are the most recent drivers at fedoraproject. Our rhel6
drivers are almost the same, except for a few non-critical, WHQL related
bug fixes.
However, there is one fix related to Large Send Offload (LSO) problem
which was fixed in rhel6, but I don't see the relevant changes in fedora
kvmnet driver sources
http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/src/
 
 
 
   What's the windows virtio driver version?
 
 The virtio storage version shown in Windows is 6.0.0.10
 
  
   Have you tried using virt-manager/virhs instead of raw cmdline?
 
 I'm starting it with libvirt/virsh
 
 cmd-line copied from the log (and some log entries):
 
 LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
 QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 8192 -smp
 4,sockets=4,cores=1,threads=1 -name sbs2008 -uuid
 933c2ef2-e5b0-0b39-db60-016b5d226534 -nodefaults -chardev
 socket,id=monitor,path=/var/lib/libvirt/qemu/sbs2008.monitor,server,nowait 
 -mon
 chardev=monitor,mode=readline -rtc base=localtime -boot c -drive
 file=/var/lib/libvirt/images/olscanner/virtio-win-1.1.11-0.iso,if=none,media=cdrom,
 id=drive-ide0-1-0,readonly=on,format=raw
 -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive
 file=/var/lib/libvirt/images/sbs2008/sbs2008.img,if=none,id=drive-virtio-disk0,
 boot=on,format=qcow2 -device
 virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
 -drive
 file=/dev/volg1/sbsdata,if=none,id=drive-virtio-disk1,format=raw,cache=none
 -device
 virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1
 -drive 
 file=/dev/volg1/wsus,if=none,id=drive-virtio-disk2,format=raw,cache=none
 -device
 virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-virtio-disk2,id=virtio-disk2
 -device e1000,vlan=0,id=net0,mac=52:54:00:8a:bc:c9,bus=pci.0,addr=0x6 -net
 tap,fd=107,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device
 isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k
 de -vga std -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
 07:12:02.715: debug : qemudInitCpuAffinity:2423 : Setting CPU affinity
 07:12:02.717: debug : qemuSecurityDACSetProcessLabel:547 : Dropping privileges
 of VM to 105:114
 char device redirected to /dev/pts/0
 pci_add_option_rom: failed to find romfile pxe-e1000.bin
 
 
   About e1000, some windows comes with buggy driver and an update e1000
   from Intel fixes some issues.
  
 
 I'm running latest drivers from Intel: 8.3.15.0
 
  
At each BSOD I had the following line in the log of the guest:
   
  virtio_ioport_write: unexpected address 0x13 value 0x1
Seems to be the result of calling PapaNdis_OnBugCheck function

   
I changed the network interface back to e1000. What I experience now 
(and
 I had
that a the very beginning before i switched to virtio network) are
 freezes. The
guest doesn't respond anymore (doesn't answer to pings and doesn't
 interact via
mouse/keyboard anymore). Host CPU usage of the kvm process is 100% on 
as many
cores as there are virtual cpus (in this case 4).
 
 I had a crash today but no logentry on the host - but that could be because I
 had to restart syslog (ran out of diskspace after turning on debug logging ob
 libvirtd - didn't think that it would generate 6 GB of logs per day :-)
 
   
  Sounds like an interrupt storm to me. Can you try to ping your VM?
 
 No responds to ping.
 
  Anyway the best way to start debugging a stalled system is just to crash
  it with BSOD. For doing it you will need:
  - enable NMICrashDump (please see http://support.microsoft.com/kb/927069
  for more information
  - enable Kernel Memory Dump (actually Complete is much better, but it
  can be too big)  http://support.microsoft.com/kb/969028
  - you only will need to type nmi 0 in the qemu monitor to crash the
  system, when the system hangs next time.
 
 I prepared this. When the system crashed today I didn't have the complete
 memory dump ready - so I only have a minidump. The intersting point is that
 the system today crashed 

Re: [Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError

2010-12-15 Thread Markus Armbruster
Lai Jiangshan la...@cn.fujitsu.com writes:

 Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt).

 changed from v1
 Add document.
 Add error handling when the cpu index is invalid.

 changed from v2
 use QERR_INVALID_PARAMETER_VALUE as Markus suggest.

 Signed-off-by:  Lai Jiangshan la...@cn.fujitsu.com

A note on commit messages:

The commit message should describe the current version of the patch.
Don't repeat the subject line in the body.

Patch history is very useful for review, but usually uninteresting once
the patch is committed.  Thus, it's best to put it after the ---
separator.

Subsystem tags in the subject line are helpful.  But qemu doesn't
provide any information there :)


Regarding the patch:

The conversion looks good.

The new QMP command is called inject_nmi, while the existing human
monitor command is called nmi.  Luiz asked for this name change.  I
don't mind.  But should we rename the human monitor command for
consistency?

Regardless, the differing command name is worth mentioning in the commit
message.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/4] KVM in-kernel PM Timer implementation

2010-12-15 Thread Ulrich Obergfell

- Anthony Liguori anth...@codemonkey.ws wrote:

 On 12/14/2010 06:09 AM, Ulrich Obergfell wrote:

[...]

  Parts 1 thru 4 of this RFC contain experimental source code which
  I recently used to investigate the performance benefit. In a Linux
  guest, I was running a program that calls gettimeofday() 'n' times
  in a loop (the PM Timer register is read during each call). With
  in-kernel PM Timer, I observed a significant reduction of program
  execution time.
 
 
 I've played with this in the past.  Can you post real numbers, 
 preferably, with a real work load?


Anthony,

I only experimented with a gettimeofday() loop. With this test scenario
I observed that in-kernel PM Timer reduced the program execution time to
roughly half of the execution time that it takes with userspace PM Timer.
Please find some example results below (these results were obtained while
the host was not busy). The relative difference of in-kernel PM Timer
versus userspace PM Timer is high, whereas the absolute difference per
call appears to be low. So, the benefit much depends on how frequently
gettimeofday() is called in a real work load. I don't have any numbers
from a real work load. When I began working on this, I was motivated by
the fact that the Linux kernel itself provides an optimization for the
gettimeofday() call ('vxtime'). So, from this I presumed that there
would be real work loads which would benefit from the optimization of
the gettimeofday() call (otherwise, why would we have 'vxtime' ?).
Of course, 'vxtime' is not related to PM based time keeping. However,
the experimental code shows an approach to optimize gettimeofday() in
KVM virtual machines.


Regards,

Uli


- host:

# grep model name /proc/cpuinfo | sort | uniq -c
  8 model name : Intel(R) Core(TM) i7 CPU   Q 740  @ 1.73GHz

# uname -r
2.6.37-rc4


- guest:

# grep model name /proc/cpuinfo | sort | uniq -c
  4 model name : QEMU Virtual CPU version 0.13.50


- test program ('gtod.c'):

#include sys/time.h
#include stdlib.h

struct timeval tv;

main(int argc, char *argv[])
{
int i = atoi(argv[1]);
while (i--  0)
gettimeofday(tv, NULL);
}


- example results with in-kernel PM Timer:

# for i in 1 2 3
 do
 time ./gtod 2500
 done

real0m44.302s
user0m1.090s
sys 0m43.163s

real0m44.509s
user0m1.100s
sys 0m43.393s

real0m45.290s
user0m1.160s
sys 0m44.123s

# for i in 1000 5000 1
 do
 time ./gtod $i
 done

real0m17.981s
user0m0.810s
sys 0m17.157s

real1m27.253s
user0m1.930s
sys 1m25.307s

real2m51.801s
user0m3.359s
sys 2m48.384s


- example results with userspace PM Timer:

# for i in 1 2 3
 do
 time ./gtod 2500
 done

real1m24.185s
user0m2.000s
sys 1m22.168s

real1m23.508s
user0m1.750s
sys 1m21.738s

real1m24.437s
user0m1.900s
sys 1m22.517s

# for i in 1000 5000 1
 do
 time ./gtod $i
 done

real0m33.479s
user0m0.680s
sys 0m32.785s

real2m50.831s
user0m3.389s
sys 2m47.405s

real5m42.304s
user0m7.319s
sys 5m34.919s
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state

2010-12-15 Thread Thomas Gleixner
On Wed, 15 Dec 2010, Jan Kiszka wrote:
 Am 14.12.2010 21:54, Thomas Gleixner wrote:
  On Mon, 13 Dec 2010, Jan Kiszka wrote:
  @@ -943,6 +950,9 @@ static struct irqaction *__free_irq(unsigned int irq, 
  void *dev_id)
 /* Make sure it's not being used on another CPU: */
 synchronize_irq(irq);
   
  +  if (single_handler)
  +  desc-irq_data.drv_status = ~IRQS_SHARED;
  +
  
  What's the reason to clear this flag outside of the desc-lock held
  region.
 
 We need to synchronize the irq first before clearing the flag.
 
 The problematic scenario behind this: An IRQ started in shared mode,
 this the line was unmasked after the hardirq. Now we clear IRQS_SHARED
 before calling into the threaded handler. And that handler may now think
 that the line is still masked as IRQS_SHARED is set.

That should read not set I guess. Hmm, needs more thoughts :(
 
  I need this status for other purposes as well, where I
  definitely need serialization.
 
 Well, two options: wrap all bit manipulations with desc-lock
 acquisition/release or turn drv_status into an atomic. I don't know what
 your plans with drv_status are, so...

Some bits for irq migration and other stuff, which allows us to avoid
fiddling with irqdesc in the drivers.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/28] nVMX: Add VMX and SVM to list of supported cpuid features

2010-12-15 Thread Nadav Har'El
On Thu, Dec 09, 2010, Joerg Roedel wrote about Re: [PATCH 02/28] nVMX: Add VMX 
and SVM to list of supported cpuid features:
 This patch should be the last one in your series because VMX should be
 fully supported before it is reported to userspace.
 
   Joerg

Thanks, good idea - especially for bisection (where we don't want a guest
to see half the nested VMX feature). 

I also removed the silly reference to SVM in the title of this patch ;-)

Nadav.

-- 
Nadav Har'El| Wednesday, Dec 15 2010, 8 Tevet 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Bore, n.: A person who talks when you
http://nadav.harel.org.il   |wish him to listen.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state

2010-12-15 Thread Jan Kiszka
Am 15.12.2010 14:04, Thomas Gleixner wrote:
 On Wed, 15 Dec 2010, Jan Kiszka wrote:
 Am 14.12.2010 21:54, Thomas Gleixner wrote:
 On Mon, 13 Dec 2010, Jan Kiszka wrote:
 @@ -943,6 +950,9 @@ static struct irqaction *__free_irq(unsigned int irq, 
 void *dev_id)
/* Make sure it's not being used on another CPU: */
synchronize_irq(irq);
  
 +  if (single_handler)
 +  desc-irq_data.drv_status = ~IRQS_SHARED;
 +

 What's the reason to clear this flag outside of the desc-lock held
 region.

 We need to synchronize the irq first before clearing the flag.

 The problematic scenario behind this: An IRQ started in shared mode,
 this the line was unmasked after the hardirq. Now we clear IRQS_SHARED
 before calling into the threaded handler. And that handler may now think
 that the line is still masked as IRQS_SHARED is set.
 
 That should read not set I guess.

Can't remember who wrote this, but that guy might have been too tired
for clear sentences: Yes, of course, we could run into troubles, if
IRQS_SHARED was _not_ set while the IRQ line is unmasked between hard
and threaded handler.

 Hmm, needs more thoughts :(

Be warned, might be painful.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state

2010-12-15 Thread Thomas Gleixner
On Wed, 15 Dec 2010, Jan Kiszka wrote:

 Am 15.12.2010 14:04, Thomas Gleixner wrote:
  On Wed, 15 Dec 2010, Jan Kiszka wrote:
  Am 14.12.2010 21:54, Thomas Gleixner wrote:
  On Mon, 13 Dec 2010, Jan Kiszka wrote:
  @@ -943,6 +950,9 @@ static struct irqaction *__free_irq(unsigned int 
  irq, void *dev_id)
   /* Make sure it's not being used on another CPU: */
   synchronize_irq(irq);
   
  +if (single_handler)
  +desc-irq_data.drv_status = ~IRQS_SHARED;
  +
 
  What's the reason to clear this flag outside of the desc-lock held
  region.
 
  We need to synchronize the irq first before clearing the flag.
 
  The problematic scenario behind this: An IRQ started in shared mode,
  this the line was unmasked after the hardirq. Now we clear IRQS_SHARED
  before calling into the threaded handler. And that handler may now think
  that the line is still masked as IRQS_SHARED is set.
  
  That should read not set I guess.
 
 Can't remember who wrote this, but that guy might have been too tired
 for clear sentences: Yes, of course, we could run into troubles, if
 IRQS_SHARED was _not_ set while the IRQ line is unmasked between hard
 and threaded handler.

Right.

As a side note, the current implementation requires that you lookup
irq_data.drv_status for every invocation of the handler or have a
reference to irq_data.drv_status somewhere locally, which I don't like
either.

I have an evil and nasy idea how to avoid that, need to look how ugly
it gets. Worst case we need to go back to that notification thing
which I wanted really avoid in the first place.
 
Though I like the register_mutex idea which came out of this a lot as
it allows us to reduce desc-lock held and interrupt disabled regions
quite nicely.

/me goes back to stare at the code

  Hmm, needs more thoughts :(
 
 Be warned, might be painful.

Bah, my brain became pain resistant when I started hacking that code.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state

2010-12-15 Thread Thomas Gleixner
On Wed, 15 Dec 2010, Jan Kiszka wrote:

 Am 15.12.2010 14:04, Thomas Gleixner wrote:
  On Wed, 15 Dec 2010, Jan Kiszka wrote:
  Am 14.12.2010 21:54, Thomas Gleixner wrote:
  On Mon, 13 Dec 2010, Jan Kiszka wrote:
  @@ -943,6 +950,9 @@ static struct irqaction *__free_irq(unsigned int 
  irq, void *dev_id)
   /* Make sure it's not being used on another CPU: */
   synchronize_irq(irq);
   
  +if (single_handler)
  +desc-irq_data.drv_status = ~IRQS_SHARED;
  +
 
  What's the reason to clear this flag outside of the desc-lock held
  region.
 
  We need to synchronize the irq first before clearing the flag.
 
  The problematic scenario behind this: An IRQ started in shared mode,
  this the line was unmasked after the hardirq. Now we clear IRQS_SHARED
  before calling into the threaded handler. And that handler may now think
  that the line is still masked as IRQS_SHARED is set.
  
  That should read not set I guess.
 
 Can't remember who wrote this, but that guy might have been too tired
 for clear sentences: Yes, of course, we could run into troubles, if
 IRQS_SHARED was _not_ set while the IRQ line is unmasked between hard
 and threaded handler.
 
  Hmm, needs more thoughts :(
 
 Be warned, might be painful.

Talking about headache. Your solution above does not prevent that
scenario.

 CPU 0  CPU 1

 synchronize_irq();
hard irq comes in sees shared and unmasks
 clear IRQS_SHARED
thread handler runs and sees !SHARED

Same scenario, just moved by a few lines :)

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state

2010-12-15 Thread Jan Kiszka
Am 15.12.2010 16:41, Thomas Gleixner wrote:
 On Wed, 15 Dec 2010, Jan Kiszka wrote:
 
 Am 15.12.2010 14:04, Thomas Gleixner wrote:
 On Wed, 15 Dec 2010, Jan Kiszka wrote:
 Am 14.12.2010 21:54, Thomas Gleixner wrote:
 On Mon, 13 Dec 2010, Jan Kiszka wrote:
 @@ -943,6 +950,9 @@ static struct irqaction *__free_irq(unsigned int 
 irq, void *dev_id)
  /* Make sure it's not being used on another CPU: */
  synchronize_irq(irq);
  
 +if (single_handler)
 +desc-irq_data.drv_status = ~IRQS_SHARED;
 +

 What's the reason to clear this flag outside of the desc-lock held
 region.

 We need to synchronize the irq first before clearing the flag.

 The problematic scenario behind this: An IRQ started in shared mode,
 this the line was unmasked after the hardirq. Now we clear IRQS_SHARED
 before calling into the threaded handler. And that handler may now think
 that the line is still masked as IRQS_SHARED is set.

 That should read not set I guess.

 Can't remember who wrote this, but that guy might have been too tired
 for clear sentences: Yes, of course, we could run into troubles, if
 IRQS_SHARED was _not_ set while the IRQ line is unmasked between hard
 and threaded handler.

 Hmm, needs more thoughts :(

 Be warned, might be painful.
 
 Talking about headache. Your solution above does not prevent that
 scenario.
 
  CPU 0  CPU 1
   
  synchronize_irq();
   hard irq comes in sees shared and unmasks

Nope, IRQ_ONESHOT is already cleared at that point.

  clear IRQS_SHARED
   thread handler runs and sees !SHARED
 
 Same scenario, just moved by a few lines :)

The same, just the other way around - and mostly harmless, I hope. :)

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] qemu-kvm: Switch to upstream -enable-kvm semantics

2010-12-15 Thread Markus Armbruster
We currently enable KVM by default, and when it's not available, we
print a message and fall back to TCG.  Option -enable-kvm is ignored.
Option -no-kvm suppresses KVM.

Upstream works differently: KVM is off by default, -enable-kvm
switches it on.  -enable-kvm terminates the process unsuccessfully if
KVM is not available.

upstream qemu   |  default  |-enable-kvm
+---+---
KVM available   | disabled  |  enabled
KVM unavailable | disabled  |fail

qemu-kvm|  default  |-enable-kvm
+---+---
KVM available   |  enabled* |  enabled
KVM unavailable | disabled  | disabled*

* differs from upstream

Users of qemu and qemu-kvm need to be aware of these differences to
enable / disable use of KVM reliably.  This is bothersome.

Consider -enable-kvm when KVM is unavailable: If the user expects
qemu-kvm behavior (fall back), but qemu fails, he'll likely be
surprised and unhappy.  If the user expects upstream behavior (fail),
but qemu-kvm falls back to TCG, the guest runs slow as molasses, and
the user will likely be confused and unhappy (unless he spots and
understands the disable KVM message).

Switch to upstream semantics: KVM off by default, -enable-kvm switches
it on, and when it can't, it's fatal.

Having to enable KVM explicitly is annoying, but the proper place to
address that is upstream.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 vl.c |   10 +-
 1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/vl.c b/vl.c
index e3c8919..87e88c2 100644
--- a/vl.c
+++ b/vl.c
@@ -247,7 +247,7 @@ static void *boot_set_opaque;
 static NotifierList exit_notifiers =
 NOTIFIER_LIST_INITIALIZER(exit_notifiers);
 
-int kvm_allowed = 1;
+int kvm_allowed = 0;
 uint32_t xen_domid;
 enum xen_mode xen_mode = XEN_EMULATE;
 
@@ -2436,10 +2436,8 @@ int main(int argc, char **argv, char **envp)
 case QEMU_OPTION_smbios:
 do_smbios_option(optarg);
 break;
-#ifdef OBSOLETE_KVM_IMPL
 case QEMU_OPTION_enable_kvm:
 kvm_allowed = 1;
-#endif
 break;
case QEMU_OPTION_no_kvm:
kvm_allowed = 0;
@@ -2789,18 +2787,12 @@ int main(int argc, char **argv, char **envp)
 if (kvm_allowed) {
 int ret = kvm_init(smp_cpus);
 if (ret  0) {
-#if defined(OBSOLETE_KVM_IMPL) || defined(CONFIG_NO_CPU_EMULATION)
 if (!kvm_available()) {
 printf(KVM not supported for this target\n);
 } else {
 fprintf(stderr, failed to initialize KVM: %s\n, 
strerror(-ret));
 }
 exit(1);
-#endif
-#ifdef CONFIG_KVM
-fprintf(stderr, Could not initialize KVM, will disable KVM 
support\n);
-kvm_allowed = 0;
-#endif
 }
 }
 
-- 
1.7.2.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state

2010-12-15 Thread Thomas Gleixner
On Wed, 15 Dec 2010, Jan Kiszka wrote:
 Am 15.12.2010 16:41, Thomas Gleixner wrote:
  Talking about headache. Your solution above does not prevent that
  scenario.
  
   CPU 0  CPU 1
  
   synchronize_irq();
  hard irq comes in sees shared and unmasks
 
 Nope, IRQ_ONESHOT is already cleared at that point.

Errm ? It's set. Could you please stop to increase my confusion ? :)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 00/14] API test framework

2010-12-15 Thread Avi Kivity
This patchset adds an API test framework.  Rather than driving kvm from qemu,
we now have a way of calling the kvm API directly and observing the results.
We can switch to guest mode and back at will and see any micro effects such
as the result of executing particular instructions.

A first test is added, for commit edde99ce0 (KVM: Write protect memory
after slot swap).  This would be pretty hard to test for using the current
qemu-driven tests, but detects 5 million failures in about two minutes on
my machine.

As an experiment, the framework is coded in C++.

Avi Kivity (14):
  Makefile: add support for C++
  Improve autodepend includes
v2: new
  Add exception class for kernel errors (errno)
v2: new
  Add try_main() for running a program under an exception handler
v2: new
  Introduce a C++ wrapper for the kvm APIs
v2: throw std::exception exceptions, not ints, for libc failures
better msr list management
  Add support for calling a function in guest mode
v2: use tr1::function instead of boost::function
  Add sample test using the api test harness
  api: add support for KVM_SET_USER_MEMORY_REGION flags field
v2: new
  api: support KVM_GET_DIRTY_LOG ioctl
v2: new
  api: add memory map management
v2: new
  Build tests with debug information
v2: new
  api: Add support for creating an identity map with a hole
v2: new
  Introduce libapi.a to avoid long Makefile recipes
v2: new
  Add dirty log test
v2: new

 Makefile  |8 +-
 api/api-sample.cc |   30 
 api/dirty-log.cc  |   78 
 api/exception.cc  |   33 
 api/exception.hh  |   19 +
 api/identity.cc   |   95 
 api/identity.hh   |   43 +++
 api/kvmxx.cc  |  194 +
 api/kvmxx.hh  |   85 +
 api/memmap.cc |   76 +++
 api/memmap.hh |   43 +++
 config-x86-common.mak |   15 -
 12 files changed, 715 insertions(+), 4 deletions(-)
 create mode 100644 api/api-sample.cc
 create mode 100644 api/dirty-log.cc
 create mode 100644 api/exception.cc
 create mode 100644 api/exception.hh
 create mode 100644 api/identity.cc
 create mode 100644 api/identity.hh
 create mode 100644 api/kvmxx.cc
 create mode 100644 api/kvmxx.hh
 create mode 100644 api/memmap.cc
 create mode 100644 api/memmap.hh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 01/14] Makefile: add support for C++

2010-12-15 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 Makefile |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index d25e6f2..9b0256d 100644
--- a/Makefile
+++ b/Makefile
@@ -30,11 +30,13 @@ CFLAGS += -O1
 CFLAGS += $(autodepend-flags) -g -fomit-frame-pointer -Wall
 CFLAGS += $(call cc-option, -fno-stack-protector, )
 CFLAGS += $(call cc-option, -fno-stack-protector-all, )
+CFLAGS += -I.
 
-CXXFLAGS = $(autodepend-flags)
+CXXFLAGS += $(CFLAGS)
 
 autodepend-flags = -MMD -MF $(dir $*).$(notdir $*).d
 
+LDFLAGS += $(CFLAGS)
 LDFLAGS += -pthread -lrt
 
 kvmtrace_objs= kvmtrace.o
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 03/14] Add exception class for kernel errors (errno)

2010-12-15 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 api/exception.cc |   20 
 api/exception.hh |   16 
 2 files changed, 36 insertions(+), 0 deletions(-)
 create mode 100644 api/exception.cc
 create mode 100644 api/exception.hh

diff --git a/api/exception.cc b/api/exception.cc
new file mode 100644
index 000..500569a
--- /dev/null
+++ b/api/exception.cc
@@ -0,0 +1,20 @@
+#include exception.hh
+#include cstdio
+#include cstring
+
+errno_exception::errno_exception(int errno)
+: _errno(errno)
+{
+}
+
+int errno_exception::errno() const
+{
+return _errno;
+}
+
+const char *errno_exception::what()
+{
+std::snprintf(_buf, sizeof _buf, error: %s (%d),
+ std::strerror(_errno), _errno);
+return _buf;
+}
diff --git a/api/exception.hh b/api/exception.hh
new file mode 100644
index 000..4672760
--- /dev/null
+++ b/api/exception.hh
@@ -0,0 +1,16 @@
+#ifndef EXCEPTION_HH
+#define EXCEPTION_HH
+
+#include exception
+
+class errno_exception : public std::exception {
+public:
+explicit errno_exception(int err_no);
+int errno() const;
+virtual const char *what();
+private:
+int _errno;
+char _buf[1000];
+};
+
+#endif
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 06/14] Add support for calling a function in guest mode

2010-12-15 Thread Avi Kivity
This patch provides a way to establish an identity guest which has
a 1:1 gva-hva translation.  This allows the host to switch to guest
mode, call a function in the same address space, and return.

Because long mode virtual addresses are 47 bits long, and some hosts
have smaller physical addresses, we target 32-bit mode only.  On
x86_64 the code needs to be run with 'setarch i386 -3' to limit the
address space to 3GB, so the address space occupied by the local
APIC is left unused.

Signed-off-by: Avi Kivity a...@redhat.com
---
 api/identity.cc   |   76 +
 api/identity.hh   |   28 ++
 config-x86-common.mak |1 +
 3 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 api/identity.cc
 create mode 100644 api/identity.hh

diff --git a/api/identity.cc b/api/identity.cc
new file mode 100644
index 000..de52f68
--- /dev/null
+++ b/api/identity.cc
@@ -0,0 +1,76 @@
+
+#include identity.hh
+#include stdio.h
+
+namespace identity {
+
+typedef unsigned long ulong;
+
+void setup_vm(kvm::vm vm)
+{
+vm.set_memory_region(0, NULL, 0, 3UL  30);
+vm.set_tss_addr(3UL  30);
+}
+
+void vcpu::setup_sregs()
+{
+kvm_sregs sregs = { };
+kvm_segment dseg = { };
+dseg.base = 0; dseg.limit = -1U; dseg.type = 3; dseg.present = 1;
+dseg.dpl = 3; dseg.db = 1; dseg.s = 1; dseg.l = 0; dseg.g = 1;
+kvm_segment cseg = dseg;
+cseg.type = 11;
+
+sregs.cs = cseg; asm (mov %%cs, %0 : =rm(sregs.cs.selector));
+sregs.ds = dseg; asm (mov %%ds, %0 : =rm(sregs.ds.selector));
+sregs.es = dseg; asm (mov %%es, %0 : =rm(sregs.es.selector));
+sregs.fs = dseg; asm (mov %%fs, %0 : =rm(sregs.fs.selector));
+sregs.gs = dseg; asm (mov %%gs, %0 : =rm(sregs.gs.selector));
+sregs.ss = dseg; asm (mov %%ss, %0 : =rm(sregs.ss.selector));
+
+uint32_t gsbase;
+asm (mov %%gs:0, %0 : =r(gsbase));
+sregs.gs.base = gsbase;
+
+sregs.tr.base = reinterpret_castulong(*_stack.begin());
+sregs.tr.type = 11;
+sregs.tr.s = 0;
+sregs.tr.present = 1;
+
+sregs.cr0 = 0x11; /* PE, ET, !PG */
+sregs.cr4 = 0;
+sregs.efer = 0;
+sregs.apic_base = 0xfee0;
+_vcpu.set_sregs(sregs);
+}
+
+void vcpu::thunk(vcpu* zis)
+{
+zis-_guest_func();
+asm volatile(outb %%al, %%dx : : a(0), d(0));
+}
+
+void vcpu::setup_regs()
+{
+kvm_regs regs = {};
+regs.rflags = 0x3202;
+regs.rsp = reinterpret_castulong(*_stack.end());
+regs.rsp = ~15UL;
+ulong* sp = reinterpret_castulong *(regs.rsp);
+*--sp = reinterpret_castulong((char*)this);
+*--sp = 0;
+regs.rsp = reinterpret_castulong(sp);
+regs.rip = reinterpret_castulong(vcpu::thunk);
+printf(rip %llx\n, regs.rip);
+_vcpu.set_regs(regs);
+}
+
+vcpu::vcpu(kvm::vcpu vcpu, std::tr1::functionvoid () guest_func,
+   unsigned long stack_size)
+: _vcpu(vcpu), _guest_func(guest_func), _stack(stack_size)
+{
+setup_sregs();
+setup_regs();
+}
+
+}
diff --git a/api/identity.hh b/api/identity.hh
new file mode 100644
index 000..7401826
--- /dev/null
+++ b/api/identity.hh
@@ -0,0 +1,28 @@
+#ifndef API_IDENTITY_HH
+#define API_IDENTITY_HH
+
+#include kvmxx.hh
+#include tr1/functional
+#include vector
+
+namespace identity {
+
+void setup_vm(kvm::vm vm);
+
+class vcpu {
+public:
+vcpu(kvm::vcpu vcpu, std::tr1::functionvoid () guest_func,
+unsigned long stack_size = 256 * 1024);
+private:
+static void thunk(vcpu* vcpu);
+void setup_regs();
+void setup_sregs();
+private:
+kvm::vcpu _vcpu;
+std::tr1::functionvoid () _guest_func;
+std::vectorchar _stack;
+};
+
+}
+
+#endif
diff --git a/config-x86-common.mak b/config-x86-common.mak
index d22df17..dde4f67 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -81,3 +81,4 @@ arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
 
+api/%.o: CFLAGS += -m32
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 11/14] Build tests with debug information

2010-12-15 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 Makefile |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 85ebd37..b6e8759 100644
--- a/Makefile
+++ b/Makefile
@@ -6,7 +6,7 @@ DESTDIR := $(PREFIX)/share/qemu/tests
 .PHONY: arch_clean clean
 
 #make sure env CFLAGS variable is not used
-CFLAGS =
+CFLAGS = -g
 
 libgcc := $(shell $(CC) --print-libgcc-file-name)
 
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 07/14] Add sample test using the api test harness

2010-12-15 Thread Avi Kivity
Call a function setting a global variable.

Signed-off-by: Avi Kivity a...@redhat.com
---
 api/api-sample.cc |   29 +
 config-x86-common.mak |7 +++
 2 files changed, 36 insertions(+), 0 deletions(-)
 create mode 100644 api/api-sample.cc

diff --git a/api/api-sample.cc b/api/api-sample.cc
new file mode 100644
index 000..8d57c09
--- /dev/null
+++ b/api/api-sample.cc
@@ -0,0 +1,29 @@
+
+#include api/kvmxx.hh
+#include api/identity.hh
+#include api/exception.hh
+#include stdio.h
+
+static int global = 0;
+
+static void set_global()
+{
+global = 1;
+}
+
+int test_main(int ac, char** av)
+{
+kvm::system system;
+kvm::vm vm(system);
+identity::setup_vm(vm);
+kvm::vcpu vcpu(vm, 0);
+identity::vcpu thread(vcpu, set_global);
+vcpu.run();
+printf(global %d\n, global);
+return global == 1 ? 0 : 1;
+}
+
+int main(int ac, char** av)
+{
+return try_main(test_main, ac, av);
+}
diff --git a/config-x86-common.mak b/config-x86-common.mak
index dde4f67..3e8e641 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -32,6 +32,8 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
$(TEST_DIR)/hypercall.flat $(TEST_DIR)/sieve.flat \
$(TEST_DIR)/kvmclock_test.flat
 
+tests-common += api/api-sample
+
 tests_and_config = $(TEST_DIR)/*.flat $(TEST_DIR)/unittests.cfg
 
 test_cases: $(tests-common) $(tests)
@@ -82,3 +84,8 @@ arch_clean:
$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
 
 api/%.o: CFLAGS += -m32
+
+api/api-sample: LDLIBS += -lstdc++
+api/api-sample: LDFLAGS += -m32
+
+api/api-sample: api/api-sample.o api/kvmxx.o api/identity.o api/exception.o
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 14/14] Add dirty log test

2010-12-15 Thread Avi Kivity
This checks the failure that was fixed by kernel commit edde99ce0529
(KVM: Write protect memory after slot swap).  Two threads are used;
a guest thread continuously updates a shared variable, which is also
sampled by a host thread that also checks if dirty logging marked it
as dirty.

It detects about 5 million failures with the fix reverted, and 0 failures
with the fix applied.

Signed-off-by: Avi Kivity a...@redhat.com
---
 api/dirty-log.cc  |   78 +
 config-x86-common.mak |7 +++-
 2 files changed, 83 insertions(+), 2 deletions(-)
 create mode 100644 api/dirty-log.cc

diff --git a/api/dirty-log.cc b/api/dirty-log.cc
new file mode 100644
index 000..1e4ef9e
--- /dev/null
+++ b/api/dirty-log.cc
@@ -0,0 +1,78 @@
+#include kvmxx.hh
+#include memmap.hh
+#include identity.hh
+#include boost/thread/thread.hpp
+#include stdlib.h
+#include stdio.h
+
+namespace {
+
+void delay_loop(unsigned n)
+{
+for (unsigned i = 0; i  n; ++i) {
+asm volatile(pause);
+}
+ }
+
+void write_mem(volatile bool running, volatile int* shared_var)
+{
+while (running) {
+++*shared_var;
+delay_loop(1000);
+}
+}
+
+void check_dirty_log(mem_slot slot,
+ volatile bool running,
+ volatile int* shared_var,
+ int nr_fail)
+{
+uint64_t shared_var_gpa = reinterpret_castuint64_t(shared_var);
+slot.set_dirty_logging(true);
+slot.update_dirty_log();
+for (int i = 0; i  1000; ++i) {
+int sample1 = *shared_var;
+delay_loop(600);
+int sample2 = *shared_var;
+slot.update_dirty_log();
+if (!slot.is_dirty(shared_var_gpa)  sample1 != sample2) {
+++nr_fail;
+}
+}
+running = false;
+slot.set_dirty_logging(false);
+}
+
+}
+
+using boost::ref;
+using std::tr1::bind;
+
+int main(int ac, char **av)
+{
+kvm::system sys;
+kvm::vm vm(sys);
+mem_map memmap(vm);
+void* logged_slot_virt;
+posix_memalign(logged_slot_virt, 4096, 4096);
+int* shared_var = static_castint*(logged_slot_virt);
+identity::hole hole(logged_slot_virt, 4096);
+identity::vm ident_vm(vm, memmap, hole);
+kvm::vcpu vcpu(vm, 0);
+bool running = true;
+int nr_fail = 0;
+mem_slot logged_slot(memmap,
+ reinterpret_castuint64_t(logged_slot_virt),
+ 4096, logged_slot_virt);
+boost::thread host_poll_thread(check_dirty_log, ref(logged_slot),
+   ref(running),
+   ref(shared_var), ref(nr_fail));
+identity::vcpu guest_write_thread(vcpu,
+  bind(write_mem,
+   ref(running),
+   ref(shared_var)));
+vcpu.run();
+host_poll_thread.join();
+printf(Dirty bitmap failures: %d\n, nr_fail);
+return nr_fail == 0 ? 0 : 1;
+}
diff --git a/config-x86-common.mak b/config-x86-common.mak
index ce36cde..b5c49f4 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -33,6 +33,7 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
$(TEST_DIR)/kvmclock_test.flat
 
 tests-common += api/api-sample
+tests-common += api/dirty-log
 
 tests_and_config = $(TEST_DIR)/*.flat $(TEST_DIR)/unittests.cfg
 
@@ -85,10 +86,12 @@ arch_clean:
 
 api/%.o: CFLAGS += -m32
 
-api/%: LDLIBS += -lstdc++
+api/%: LDLIBS += -lstdc++ -lboost_thread-mt -lpthread
 api/%: LDFLAGS += -m32
 
 api/libapi.a: api/kvmxx.o api/identity.o api/exception.o api/memmap.o
$(AR) rcs $@ $^
 
-api/api-sample: api/api-sample.o api/libapi.a
\ No newline at end of file
+api/api-sample: api/api-sample.o api/libapi.a
+
+api/dirty-log: api/dirty-log.o api/libapi.a
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 13/14] Introduce libapi.a to avoid long Makefile recipes

2010-12-15 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 config-x86-common.mak |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 436f4bd..ce36cde 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -85,8 +85,10 @@ arch_clean:
 
 api/%.o: CFLAGS += -m32
 
-api/api-sample: LDLIBS += -lstdc++
-api/api-sample: LDFLAGS += -m32
+api/%: LDLIBS += -lstdc++
+api/%: LDFLAGS += -m32
 
-api/api-sample: api/api-sample.o api/kvmxx.o api/identity.o api/exception.o
-api/api-sample: api/memmap.o
\ No newline at end of file
+api/libapi.a: api/kvmxx.o api/identity.o api/exception.o api/memmap.o
+   $(AR) rcs $@ $^
+
+api/api-sample: api/api-sample.o api/libapi.a
\ No newline at end of file
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 10/14] api: add memory map management

2010-12-15 Thread Avi Kivity
Add a class to manage the memory map and a class to represent
a memory slot.

Signed-off-by: Avi Kivity a...@redhat.com
---
 api/memmap.cc |   76 +
 api/memmap.hh |   43 
 2 files changed, 119 insertions(+), 0 deletions(-)
 create mode 100644 api/memmap.cc
 create mode 100644 api/memmap.hh

diff --git a/api/memmap.cc b/api/memmap.cc
new file mode 100644
index 000..c625852
--- /dev/null
+++ b/api/memmap.cc
@@ -0,0 +1,76 @@
+
+#include memmap.hh
+
+mem_slot::mem_slot(mem_map map, uint64_t gpa, uint64_t size, void* hva)
+: _map(map)
+, _slot(map._free_slots.top())
+, _gpa(gpa)
+, _size(size)
+, _hva(hva)
+, _dirty_log_enabled(false)
+, _log()
+{
+map._free_slots.pop();
+update();
+}
+
+mem_slot::~mem_slot()
+{
+_size = 0;
+try {
+update();
+_map._free_slots.push(_slot);
+} catch (...) {
+// can't do much if we can't undo slot registration - leak the slot
+}
+}
+
+void mem_slot::set_dirty_logging(bool enabled)
+{
+if (_dirty_log_enabled != enabled) {
+_dirty_log_enabled = enabled;
+if (enabled) {
+int logsize = ((_size  12) + bits_per_word - 1) / bits_per_word;
+_log.resize(logsize);
+} else {
+_log.resize(0);
+}
+update();
+}
+}
+
+void mem_slot::update()
+{
+uint32_t flags = 0;
+if (_dirty_log_enabled) {
+flags |= KVM_MEM_LOG_DIRTY_PAGES;
+}
+_map._vm.set_memory_region(_slot, _hva, _gpa, _size, flags);
+}
+
+bool mem_slot::dirty_logging() const
+{
+return _dirty_log_enabled;
+}
+
+void mem_slot::update_dirty_log()
+{
+_map._vm.get_dirty_log(_slot, _log[0]);
+}
+
+bool mem_slot::is_dirty(uint64_t gpa) const
+{
+uint64_t pagenr = (gpa - _gpa)  12;
+ulong wordnr = pagenr / bits_per_word;
+ulong bit = 1ULL  (pagenr % bits_per_word);
+return _log[wordnr]  bit;
+}
+
+mem_map::mem_map(kvm::vm vm)
+: _vm(vm)
+{
+int nr_slots = vm.sys().get_extension_int(KVM_CAP_NR_MEMSLOTS);
+for (int i = 0; i  nr_slots; ++i) {
+_free_slots.push(i);
+}
+}
diff --git a/api/memmap.hh b/api/memmap.hh
new file mode 100644
index 000..59ec619
--- /dev/null
+++ b/api/memmap.hh
@@ -0,0 +1,43 @@
+#ifndef MEMMAP_HH
+#define MEMMAP_HH
+
+#include kvmxx.hh
+#include stdint.h
+#include vector
+#include stack
+
+class mem_map;
+class mem_slot;
+
+class mem_slot {
+public:
+mem_slot(mem_map map, uint64_t gpa, uint64_t size, void *hva);
+~mem_slot();
+void set_dirty_logging(bool enabled);
+bool dirty_logging() const;
+void update_dirty_log();
+bool is_dirty(uint64_t gpa) const;
+private:
+void update();
+private:
+typedef unsigned long ulong;
+static const int bits_per_word = sizeof(ulong) * 8;
+mem_map _map;
+int _slot;
+uint64_t _gpa;
+uint64_t _size;
+void *_hva;
+bool _dirty_log_enabled;
+std::vectorulong _log;
+};
+
+class mem_map {
+public:
+mem_map(kvm::vm vm);
+private:
+kvm::vm _vm;
+std::stackint _free_slots;
+friend class mem_slot;
+};
+
+#endif
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 09/14] api: support KVM_GET_DIRTY_LOG ioctl

2010-12-15 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 api/kvmxx.cc |8 
 api/kvmxx.hh |1 +
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/api/kvmxx.cc b/api/kvmxx.cc
index 42e8781..7ebebb5 100644
--- a/api/kvmxx.cc
+++ b/api/kvmxx.cc
@@ -163,6 +163,14 @@ void vm::set_memory_region(int slot, void *addr, uint64_t 
gpa, size_t len,
 _fd.ioctlp(KVM_SET_USER_MEMORY_REGION, umr);
 }
 
+void vm::get_dirty_log(int slot, void *log)
+{
+struct kvm_dirty_log kdl;
+kdl.slot = slot;
+kdl.dirty_bitmap = log;
+_fd.ioctlp(KVM_GET_DIRTY_LOG, kdl);
+}
+
 void vm::set_tss_addr(uint32_t addr)
 {
 _fd.ioctl(KVM_SET_TSS_ADDR, addr);
diff --git a/api/kvmxx.hh b/api/kvmxx.hh
index 958d36f..1dcb41d 100644
--- a/api/kvmxx.hh
+++ b/api/kvmxx.hh
@@ -59,6 +59,7 @@ public:
 explicit vm(system system);
 void set_memory_region(int slot, void *addr, uint64_t gpa, size_t len,
uint32_t flags = 0);
+void get_dirty_log(int slot, void *log);
 void set_tss_addr(uint32_t addr);
 system sys() { return _system; }
 private:
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 12/14] api: Add support for creating an identity map with a hole

2010-12-15 Thread Avi Kivity
The hole may be used for mmio or dirty logging.

Signed-off-by: Avi Kivity a...@redhat.com
---
 api/api-sample.cc |3 ++-
 api/identity.cc   |   23 +--
 api/identity.hh   |   17 -
 config-x86-common.mak |1 +
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/api/api-sample.cc b/api/api-sample.cc
index 8d57c09..524ad7b 100644
--- a/api/api-sample.cc
+++ b/api/api-sample.cc
@@ -15,7 +15,8 @@ int test_main(int ac, char** av)
 {
 kvm::system system;
 kvm::vm vm(system);
-identity::setup_vm(vm);
+mem_map memmap(vm);
+identity::vm ident_vm(vm, memmap);
 kvm::vcpu vcpu(vm, 0);
 identity::vcpu thread(vcpu, set_global);
 vcpu.run();
diff --git a/api/identity.cc b/api/identity.cc
index de52f68..e04231b 100644
--- a/api/identity.cc
+++ b/api/identity.cc
@@ -6,9 +6,28 @@ namespace identity {
 
 typedef unsigned long ulong;
 
-void setup_vm(kvm::vm vm)
+hole::hole()
+: address(), size()
 {
-vm.set_memory_region(0, NULL, 0, 3UL  30);
+}
+
+hole::hole(void* address, size_t size)
+: address(address), size(size)
+{
+}
+
+vm::vm(kvm::vm vm, mem_map mmap, hole h)
+{
+uint64_t hole_gpa = reinterpret_castuint64_t(h.address);
+char* hole_hva = static_castchar*(h.address);
+if (h.address) {
+_slots.push_back(mem_slot_ptr(new mem_slot(mmap, 0, hole_gpa, NULL)));
+}
+uint64_t hole_end = hole_gpa + h.size;
+uint64_t end = 3U  30;
+_slots.push_back(mem_slot_ptr(new mem_slot(mmap, hole_end,
+   end - hole_end,
+   hole_hva + h.size)));
 vm.set_tss_addr(3UL  30);
 }
 
diff --git a/api/identity.hh b/api/identity.hh
index 7401826..4491043 100644
--- a/api/identity.hh
+++ b/api/identity.hh
@@ -2,12 +2,27 @@
 #define API_IDENTITY_HH
 
 #include kvmxx.hh
+#include memmap.hh
 #include tr1/functional
+#include tr1/memory
 #include vector
 
 namespace identity {
 
-void setup_vm(kvm::vm vm);
+struct hole {
+hole();
+hole(void* address, size_t size);
+void* address;
+size_t size;
+};
+
+class vm {
+public:
+vm(kvm::vm vm, mem_map mmap, hole address_space_hole = hole());
+private:
+typedef std::tr1::shared_ptrmem_slot mem_slot_ptr;
+std::vectormem_slot_ptr _slots;
+};
 
 class vcpu {
 public:
diff --git a/config-x86-common.mak b/config-x86-common.mak
index 3e8e641..436f4bd 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -89,3 +89,4 @@ api/api-sample: LDLIBS += -lstdc++
 api/api-sample: LDFLAGS += -m32
 
 api/api-sample: api/api-sample.o api/kvmxx.o api/identity.o api/exception.o
+api/api-sample: api/memmap.o
\ No newline at end of file
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 08/14] api: add support for KVM_SET_USER_MEMORY_REGION flags field

2010-12-15 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 api/kvmxx.cc |5 +++--
 api/kvmxx.hh |3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/api/kvmxx.cc b/api/kvmxx.cc
index ad27907..42e8781 100644
--- a/api/kvmxx.cc
+++ b/api/kvmxx.cc
@@ -150,12 +150,13 @@ vm::vm(system system)
 {
 }
 
-void vm::set_memory_region(int slot, void *addr, uint64_t gpa, size_t len)
+void vm::set_memory_region(int slot, void *addr, uint64_t gpa, size_t len,
+   uint32_t flags)
 {
 struct kvm_userspace_memory_region umr;
 
 umr.slot = slot;
-umr.flags = 0;
+umr.flags = flags;
 umr.guest_phys_addr = gpa;
 umr.memory_size = len;
 umr.userspace_addr = reinterpret_castuint64_t(addr);
diff --git a/api/kvmxx.hh b/api/kvmxx.hh
index 51dbe7a..958d36f 100644
--- a/api/kvmxx.hh
+++ b/api/kvmxx.hh
@@ -57,7 +57,8 @@ private:
 class vm {
 public:
 explicit vm(system system);
-void set_memory_region(int slot, void *addr, uint64_t gpa, size_t len);
+void set_memory_region(int slot, void *addr, uint64_t gpa, size_t len,
+   uint32_t flags = 0);
 void set_tss_addr(uint32_t addr);
 system sys() { return _system; }
 private:
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 05/14] Introduce a C++ wrapper for the kvm APIs

2010-12-15 Thread Avi Kivity
Introduce exception-safe objects for calling system, vm, and vcpu ioctls.

Signed-off-by: Avi Kivity a...@redhat.com
---
 api/kvmxx.cc |  185 ++
 api/kvmxx.hh |   83 ++
 2 files changed, 268 insertions(+), 0 deletions(-)
 create mode 100644 api/kvmxx.cc
 create mode 100644 api/kvmxx.hh

diff --git a/api/kvmxx.cc b/api/kvmxx.cc
new file mode 100644
index 000..ad27907
--- /dev/null
+++ b/api/kvmxx.cc
@@ -0,0 +1,185 @@
+#include kvmxx.hh
+#include exception.hh
+#include fcntl.h
+#include sys/ioctl.h
+#include sys/mman.h
+#include stdlib.h
+#include memory
+#include algorithm
+
+namespace kvm {
+
+static long check_error(long r)
+{
+if (r == -1) {
+   throw errno_exception(errno);
+}
+return r;
+}
+
+fd::fd(int fd)
+: _fd(fd)
+{
+}
+
+fd::fd(const fd other)
+: _fd(::dup(other._fd))
+{
+check_error(_fd);
+}
+
+fd::fd(std::string device_node, int flags)
+: _fd(::open(device_node.c_str(), flags))
+{
+check_error(_fd);
+}
+
+long fd::ioctl(unsigned nr, long arg)
+{
+return check_error(::ioctl(_fd, nr, arg));
+}
+
+vcpu::vcpu(vm vm, int id)
+: _vm(vm), _fd(vm._fd.ioctl(KVM_CREATE_VCPU, id)), _shared(NULL)
+, _mmap_size(_vm._system._fd.ioctl(KVM_GET_VCPU_MMAP_SIZE, 0))
+
+{
+kvm_run *shared = static_castkvm_run*(::mmap(NULL, _mmap_size,
+  PROT_READ | PROT_WRITE,
+  MAP_SHARED,
+  _fd.get(), 0));
+if (shared == MAP_FAILED) {
+   throw errno_exception(errno);
+}
+_shared = shared;
+}
+
+vcpu::~vcpu()
+{
+munmap(_shared, _mmap_size);
+}
+
+void vcpu::run()
+{
+_fd.ioctl(KVM_RUN, 0);
+}
+
+kvm_regs vcpu::regs()
+{
+kvm_regs regs;
+_fd.ioctlp(KVM_GET_REGS, regs);
+return regs;
+}
+
+void vcpu::set_regs(const kvm_regs regs)
+{
+_fd.ioctlp(KVM_SET_REGS, const_castkvm_regs*(regs));
+}
+
+kvm_sregs vcpu::sregs()
+{
+kvm_sregs sregs;
+_fd.ioctlp(KVM_GET_SREGS, sregs);
+return sregs;
+}
+
+void vcpu::set_sregs(const kvm_sregs sregs)
+{
+_fd.ioctlp(KVM_SET_SREGS, const_castkvm_sregs*(sregs));
+}
+
+class vcpu::kvm_msrs_ptr {
+public:
+explicit kvm_msrs_ptr(size_t nmsrs);
+~kvm_msrs_ptr() { ::free(_kvm_msrs); }
+kvm_msrs* operator-() { return _kvm_msrs; }
+kvm_msrs* get() { return _kvm_msrs; }
+private:
+kvm_msrs* _kvm_msrs;
+};
+
+vcpu::kvm_msrs_ptr::kvm_msrs_ptr(size_t nmsrs)
+: _kvm_msrs(0)
+{
+size_t size = sizeof(kvm_msrs) + sizeof(kvm_msr_entry) * nmsrs;
+_kvm_msrs = static_castkvm_msrs*(::malloc(size));
+if (!_kvm_msrs) {
+   throw std::bad_alloc();
+}
+}
+
+std::vectorkvm_msr_entry vcpu::msrs(std::vectoruint32_t indices)
+{
+kvm_msrs_ptr msrs(indices.size());
+msrs-nmsrs = indices.size();
+for (unsigned i = 0; i  msrs-nmsrs; ++i) {
+   msrs-entries[i].index = indices[i];
+}
+_fd.ioctlp(KVM_GET_MSRS, msrs.get());
+return std::vectorkvm_msr_entry(msrs-entries,
+ msrs-entries + msrs-nmsrs);
+}
+
+void vcpu::set_msrs(const std::vectorkvm_msr_entry msrs)
+{
+kvm_msrs_ptr _msrs(msrs.size());
+_msrs-nmsrs = msrs.size();
+std::copy(msrs.begin(), msrs.end(), _msrs-entries);
+_fd.ioctlp(KVM_SET_MSRS, _msrs.get());
+}
+
+void vcpu::set_debug(uint64_t dr[8], bool enabled, bool singlestep)
+{
+kvm_guest_debug gd;
+
+gd.control = 0;
+if (enabled) {
+   gd.control |= KVM_GUESTDBG_ENABLE;
+}
+if (singlestep) {
+   gd.control |= KVM_GUESTDBG_SINGLESTEP;
+}
+for (int i = 0; i  8; ++i) {
+   gd.arch.debugreg[i] = dr[i];
+}
+_fd.ioctlp(KVM_SET_GUEST_DEBUG, gd);
+}
+
+vm::vm(system system)
+: _system(system), _fd(system._fd.ioctl(KVM_CREATE_VM, 0))
+{
+}
+
+void vm::set_memory_region(int slot, void *addr, uint64_t gpa, size_t len)
+{
+struct kvm_userspace_memory_region umr;
+
+umr.slot = slot;
+umr.flags = 0;
+umr.guest_phys_addr = gpa;
+umr.memory_size = len;
+umr.userspace_addr = reinterpret_castuint64_t(addr);
+_fd.ioctlp(KVM_SET_USER_MEMORY_REGION, umr);
+}
+
+void vm::set_tss_addr(uint32_t addr)
+{
+_fd.ioctl(KVM_SET_TSS_ADDR, addr);
+}
+
+system::system(std::string device_node)
+: _fd(device_node, O_RDWR)
+{
+}
+
+bool system::check_extension(int extension)
+{
+return _fd.ioctl(KVM_CHECK_EXTENSION, extension);
+}
+
+int system::get_extension_int(int extension)
+{
+return _fd.ioctl(KVM_CHECK_EXTENSION, extension);
+}
+
+};
diff --git a/api/kvmxx.hh b/api/kvmxx.hh
new file mode 100644
index 000..51dbe7a
--- /dev/null
+++ b/api/kvmxx.hh
@@ -0,0 +1,83 @@
+#ifndef KVMXX_H
+#define KVMXX_H
+
+#include string
+#include signal.h
+#include unistd.h
+#include vector
+#include errno.h
+#include linux/kvm.h
+#include stdint.h
+
+namespace kvm {
+
+class system;
+class vm;
+class vcpu;
+class fd;

[PATCH kvm-unit-tests v2 04/14] Add try_main() for running a program under an exception handler

2010-12-15 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 api/exception.cc |   13 +
 api/exception.hh |3 +++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/api/exception.cc b/api/exception.cc
index 500569a..910bdff 100644
--- a/api/exception.cc
+++ b/api/exception.cc
@@ -18,3 +18,16 @@ const char *errno_exception::what()
  std::strerror(_errno), _errno);
 return _buf;
 }
+
+int try_main(int (*main)(int argc, char** argv), int argc, char** argv,
+int ret_on_exception)
+{
+try {
+return main(argc, argv);
+} catch (std::exception e) {
+std::fprintf(stderr, exception: %s\n, e.what());
+} catch (...) {
+std::fprintf(stderr, unknown exception\n);
+}
+return ret_on_exception;
+}
diff --git a/api/exception.hh b/api/exception.hh
index 4672760..f78d9a1 100644
--- a/api/exception.hh
+++ b/api/exception.hh
@@ -13,4 +13,7 @@ private:
 char _buf[1000];
 };
 
+int try_main(int (*main)(int argc, char** argv), int argc, char** argv,
+int ret_on_exception = 127);
+
 #endif
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH-v2 0/3] KVM: Take missing slots_lock for kvm_io_bus_unregister_dev()

2010-12-15 Thread Takuya Yoshikawa
This patchset consists of three patches:

  [1/3] KVM: Take missing slots_lock for kvm_io_bus_unregister_dev()
  [2/3] KVM: Centralize slots_lock aquisition during KVM_CREATE_IRQCHIP
  [3/3] KVM: Centralize irq_lock aquisition during KVM_CREATE_IRQCHIP

Though patch 1 is a fix, this is only for an error handling path.
Patches 2,3 are just for coding style consistency. So pick up those
whitch match your policy.


Changelog:
  moved slots_lock and irq_lock aquisition to their caller following
  Marcelo's advice.

Thanks,
  Takuya
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] KVM: Take missing slots_lock for kvm_io_bus_unregister_dev()

2010-12-15 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

In KVM_CREATE_IRQCHIP, kvm_io_bus_unregister_dev() is called without taking
slots_lock in the error handling path.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/ia64/kvm/kvm-ia64.c |2 ++
 arch/x86/kvm/x86.c   |4 
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 48a48bd..70d224d 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -951,7 +951,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
goto out;
r = kvm_setup_default_irq_routing(kvm);
if (r) {
+   mutex_lock(kvm-slots_lock);
kvm_ioapic_destroy(kvm);
+   mutex_unlock(kvm-slots_lock);
goto out;
}
break;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8d76150..3113aaf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3308,8 +3308,10 @@ long kvm_arch_vm_ioctl(struct file *filp,
if (vpic) {
r = kvm_ioapic_init(kvm);
if (r) {
+   mutex_lock(kvm-slots_lock);
kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
  vpic-dev);
+   mutex_unlock(kvm-slots_lock);
kfree(vpic);
goto create_irqchip_unlock;
}
@@ -3320,10 +3322,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
smp_wmb();
r = kvm_setup_default_irq_routing(kvm);
if (r) {
+   mutex_lock(kvm-slots_lock);
mutex_lock(kvm-irq_lock);
kvm_ioapic_destroy(kvm);
kvm_destroy_pic(kvm);
mutex_unlock(kvm-irq_lock);
+   mutex_unlock(kvm-slots_lock);
}
create_irqchip_unlock:
mutex_unlock(kvm-lock);
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] KVM: Centralize slots_lock aquisition during KVM_CREATE_IRQCHIP

2010-12-15 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Move slots_lock aquisition from kvm_ioapic_init() and kvm_create_pic()
to their caller.

As a result, x86's KVM_CREATE_IRQCHIP is now covered by a unified slots_lock
section, including kvm_setup_default_irq_routing().

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/ia64/kvm/kvm-ia64.c |2 ++
 arch/x86/kvm/i8259.c |2 --
 arch/x86/kvm/x86.c   |6 ++
 virt/kvm/ioapic.c|2 --
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 70d224d..060c594 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -946,7 +946,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
}
case KVM_CREATE_IRQCHIP:
r = -EFAULT;
+   mutex_lock(kvm-slots_lock);
r = kvm_ioapic_init(kvm);
+   mutex_unlock(kvm-slots_lock);
if (r)
goto out;
r = kvm_setup_default_irq_routing(kvm);
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index f628234..37d24bc 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -580,9 +580,7 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 * Initialize PIO device
 */
kvm_iodevice_init(s-dev, picdev_ops);
-   mutex_lock(kvm-slots_lock);
ret = kvm_io_bus_register_dev(kvm, KVM_PIO_BUS, s-dev);
-   mutex_unlock(kvm-slots_lock);
if (ret  0) {
kfree(s);
return NULL;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3113aaf..736ab93 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3300,6 +3300,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
struct kvm_pic *vpic;
 
mutex_lock(kvm-lock);
+   mutex_lock(kvm-slots_lock);
r = -EEXIST;
if (kvm-arch.vpic)
goto create_irqchip_unlock;
@@ -3308,10 +3309,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
if (vpic) {
r = kvm_ioapic_init(kvm);
if (r) {
-   mutex_lock(kvm-slots_lock);
kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
  vpic-dev);
-   mutex_unlock(kvm-slots_lock);
kfree(vpic);
goto create_irqchip_unlock;
}
@@ -3322,14 +3321,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
smp_wmb();
r = kvm_setup_default_irq_routing(kvm);
if (r) {
-   mutex_lock(kvm-slots_lock);
mutex_lock(kvm-irq_lock);
kvm_ioapic_destroy(kvm);
kvm_destroy_pic(kvm);
mutex_unlock(kvm-irq_lock);
-   mutex_unlock(kvm-slots_lock);
}
create_irqchip_unlock:
+   mutex_unlock(kvm-slots_lock);
mutex_unlock(kvm-lock);
break;
}
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 0b9df83..532bffc 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -393,9 +393,7 @@ int kvm_ioapic_init(struct kvm *kvm)
kvm_ioapic_reset(ioapic);
kvm_iodevice_init(ioapic-dev, ioapic_mmio_ops);
ioapic-kvm = kvm;
-   mutex_lock(kvm-slots_lock);
ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, ioapic-dev);
-   mutex_unlock(kvm-slots_lock);
if (ret  0) {
kvm-arch.vioapic = NULL;
kfree(ioapic);
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] KVM: Centralize irq_lock aquisition during KVM_CREATE_IRQCHIP

2010-12-15 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Move irq_lock aquisition from kvm_setup_default_irq_routing(), inside of
kvm_set_irq_routing(), to its caller.

This makes the lock management clearer, with a bit longer irq_lock section.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/ia64/kvm/kvm-ia64.c |2 ++
 arch/x86/kvm/x86.c   |4 ++--
 virt/kvm/irq_comm.c  |2 --
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 060c594..34a1699 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -951,7 +951,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
mutex_unlock(kvm-slots_lock);
if (r)
goto out;
+   mutex_lock(kvm-irq_lock);
r = kvm_setup_default_irq_routing(kvm);
+   mutex_unlock(kvm-irq_lock);
if (r) {
mutex_lock(kvm-slots_lock);
kvm_ioapic_destroy(kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 736ab93..5c70869 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3319,13 +3319,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
smp_wmb();
kvm-arch.vpic = vpic;
smp_wmb();
+   mutex_lock(kvm-irq_lock);
r = kvm_setup_default_irq_routing(kvm);
if (r) {
-   mutex_lock(kvm-irq_lock);
kvm_ioapic_destroy(kvm);
kvm_destroy_pic(kvm);
-   mutex_unlock(kvm-irq_lock);
}
+   mutex_unlock(kvm-irq_lock);
create_irqchip_unlock:
mutex_unlock(kvm-slots_lock);
mutex_unlock(kvm-lock);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 9f614b4..265fd2f 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -407,10 +407,8 @@ int kvm_set_irq_routing(struct kvm *kvm,
++ue;
}
 
-   mutex_lock(kvm-irq_lock);
old = kvm-irq_routing;
kvm_irq_routing_update(kvm, new);
-   mutex_unlock(kvm-irq_lock);
 
synchronize_rcu();
 
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError

2010-12-15 Thread Luiz Capitulino
On Wed, 15 Dec 2010 17:49:27 +0800
Lai Jiangshan la...@cn.fujitsu.com wrote:

 
 Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt).
 
 changed from v1
 Add document.
 Add error handling when the cpu index is invalid.
 
 changed from v2
 use QERR_INVALID_PARAMETER_VALUE as Markus suggest.
 
 Signed-off-by:  Lai Jiangshan la...@cn.fujitsu.com
 ---
 diff --git a/hmp-commands.hx b/hmp-commands.hx
 index 23024ba..f86d9fe 100644
 --- a/hmp-commands.hx
 +++ b/hmp-commands.hx
 @@ -724,7 +724,8 @@ ETEXI
  .args_type  = cpu_index:i,
  .params = cpu,
  .help   = inject an NMI on the given CPU,
 -.mhandler.cmd = do_inject_nmi,
 +.user_print = monitor_user_noop,
 +.mhandler.cmd_new = do_inject_nmi,
  },
  #endif
  STEXI
 diff --git a/monitor.c b/monitor.c
 index ec31eac..3e33a96 100644
 --- a/monitor.c
 +++ b/monitor.c
 @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict 
 *qdict)
  #endif
  
  #if defined(TARGET_I386)
 -static void do_inject_nmi(Monitor *mon, const QDict *qdict)
 +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
  {
  CPUState *env;
  int cpu_index = qdict_get_int(qdict, cpu_index);
 @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict 
 *qdict)
  for (env = first_cpu; env != NULL; env = env-next_cpu)
  if (env-cpu_index == cpu_index) {
  cpu_interrupt(env, CPU_INTERRUPT_NMI);
 -break;
 +return 0;
  }
 +
 +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index,
 +  a CPU number);
 +return -1;
  }
  #endif
  
 diff --git a/qmp-commands.hx b/qmp-commands.hx
 index e5f157f..fcb6bf2 100644
 --- a/qmp-commands.hx
 +++ b/qmp-commands.hx
 @@ -429,6 +429,33 @@ Example:
  
  EQMP
  
 +#if defined(TARGET_I386)
 +{
 +.name   = inject_nmi,
 +.args_type  = cpu_index:i,
 +.params = cpu,
 +.help   = inject an NMI on the given CPU,
 +.user_print = monitor_user_noop,
 +.mhandler.cmd_new = do_inject_nmi,
 +},
 +#endif
 +SQMP
 +inject_nmi
 +--
 +
 +Inject an NMI on the given CPU (x86 only).
 +
 +Arguments:
 +
 +- cpu_index: the index of the CPU to be injected NMI (json-int)

Please, use cpu-index, that's what we're using for the human-monitor-command.

Avi, Anthony, can you please ACK this new command?

 +
 +Example:
 +
 +- { execute: inject_nmi, arguments: { cpu_index: 0 } }
 +- { return: {} }
 +
 +EQMP
 +
  {
  .name   = migrate,
  .args_type  = detach:-d,blk:-b,inc:-i,uri:s,
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError

2010-12-15 Thread Luiz Capitulino
On Wed, 15 Dec 2010 11:49:23 +0100
Markus Armbruster arm...@redhat.com wrote:

 Lai Jiangshan la...@cn.fujitsu.com writes:
 
  Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt).
 
  changed from v1
  Add document.
  Add error handling when the cpu index is invalid.
 
  changed from v2
  use QERR_INVALID_PARAMETER_VALUE as Markus suggest.
 
  Signed-off-by:  Lai Jiangshan la...@cn.fujitsu.com
 
 A note on commit messages:
 
 The commit message should describe the current version of the patch.
 Don't repeat the subject line in the body.
 
 Patch history is very useful for review, but usually uninteresting once
 the patch is committed.  Thus, it's best to put it after the ---
 separator.
 
 Subsystem tags in the subject line are helpful.  But qemu doesn't
 provide any information there :)
 
 
 Regarding the patch:
 
 The conversion looks good.
 
 The new QMP command is called inject_nmi, while the existing human
 monitor command is called nmi.  Luiz asked for this name change.  I
 don't mind.  But should we rename the human monitor command for
 consistency?

I don't think so, we don't need (and maybe don't even want) naming parity
between QMP and HMP. Remember that one of our mistakes was to couple the two.

Also, Avi asked for more descriptive names in QMP and I agree with him, I
would even be in favor of calling it inject-non-maskable-interrupt.

 
 Regardless, the differing command name is worth mentioning in the commit
 message.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError

2010-12-15 Thread Avi Kivity

On 12/15/2010 07:09 PM, Luiz Capitulino wrote:

On Wed, 15 Dec 2010 17:49:27 +0800
Lai Jiangshanla...@cn.fujitsu.com  wrote:


  Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt).

  changed from v1
  Add document.
  Add error handling when the cpu index is invalid.

  changed from v2
  use QERR_INVALID_PARAMETER_VALUE as Markus suggest.

  Signed-off-by:  Lai Jiangshanla...@cn.fujitsu.com
  ---
  diff --git a/hmp-commands.hx b/hmp-commands.hx
  index 23024ba..f86d9fe 100644
  --- a/hmp-commands.hx
  +++ b/hmp-commands.hx
  @@ -724,7 +724,8 @@ ETEXI
   .args_type  = cpu_index:i,
   .params = cpu,
   .help   = inject an NMI on the given CPU,
  -.mhandler.cmd = do_inject_nmi,
  +.user_print = monitor_user_noop,
  +.mhandler.cmd_new = do_inject_nmi,
   },
   #endif
   STEXI
  diff --git a/monitor.c b/monitor.c
  index ec31eac..3e33a96 100644
  --- a/monitor.c
  +++ b/monitor.c
  @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict 
*qdict)
   #endif

   #if defined(TARGET_I386)
  -static void do_inject_nmi(Monitor *mon, const QDict *qdict)
  +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
   {
   CPUState *env;
   int cpu_index = qdict_get_int(qdict, cpu_index);
  @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict 
*qdict)
   for (env = first_cpu; env != NULL; env = env-next_cpu)
   if (env-cpu_index == cpu_index) {
   cpu_interrupt(env, CPU_INTERRUPT_NMI);
  -break;
  +return 0;
   }
  +
  +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index,
  +  a CPU number);
  +return -1;
   }
   #endif

  diff --git a/qmp-commands.hx b/qmp-commands.hx
  index e5f157f..fcb6bf2 100644
  --- a/qmp-commands.hx
  +++ b/qmp-commands.hx
  @@ -429,6 +429,33 @@ Example:

   EQMP

  +#if defined(TARGET_I386)
  +{
  +.name   = inject_nmi,
  +.args_type  = cpu_index:i,
  +.params = cpu,
  +.help   = inject an NMI on the given CPU,
  +.user_print = monitor_user_noop,
  +.mhandler.cmd_new = do_inject_nmi,
  +},
  +#endif
  +SQMP
  +inject_nmi
  +--
  +
  +Inject an NMI on the given CPU (x86 only).
  +
  +Arguments:
  +
  +- cpu_index: the index of the CPU to be injected NMI (json-int)

Please, use cpu-index, that's what we're using for the human-monitor-command.

Avi, Anthony, can you please ACK this new command?



I'd like to see cpu-index made optional; if not present, nmi all cpus 
(that's what the nmi button on many machines does, or at least I think 
that's what it does).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v4 1/2] Minimal RAM API support

2010-12-15 Thread Paul Brook
 This adds a minimum chunk of Anthony's RAM API support so that we
 can identify actual VM RAM versus all the other things that make
 use of qemu_ram_alloc.

Why do we care? How are you defining actual VM RAM?

Surely the whole point of qemu_ram_alloc is to allocate a chunk of memory that 
can be mapped into the guest physical address space, so all uses of 
qemu_ram_alloc should be using this API.

Paul
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError

2010-12-15 Thread Luiz Capitulino
On Wed, 15 Dec 2010 19:18:32 +0200
Avi Kivity a...@redhat.com wrote:

 On 12/15/2010 07:09 PM, Luiz Capitulino wrote:
  On Wed, 15 Dec 2010 17:49:27 +0800
  Lai Jiangshanla...@cn.fujitsu.com  wrote:
 
  
Convert do_inject_nmi() to QObject, QError, we need to use it(via 
   libvirt).
  
changed from v1
Add document.
Add error handling when the cpu index is invalid.
  
changed from v2
use QERR_INVALID_PARAMETER_VALUE as Markus suggest.
  
Signed-off-by:  Lai Jiangshanla...@cn.fujitsu.com
---
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 23024ba..f86d9fe 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -724,7 +724,8 @@ ETEXI
 .args_type  = cpu_index:i,
 .params = cpu,
 .help   = inject an NMI on the given CPU,
-.mhandler.cmd = do_inject_nmi,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_inject_nmi,
 },
 #endif
 STEXI
diff --git a/monitor.c b/monitor.c
index ec31eac..3e33a96 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const 
   QDict *qdict)
 #endif
  
 #if defined(TARGET_I386)
-static void do_inject_nmi(Monitor *mon, const QDict *qdict)
+static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
   **ret_data)
 {
 CPUState *env;
 int cpu_index = qdict_get_int(qdict, cpu_index);
@@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const 
   QDict *qdict)
 for (env = first_cpu; env != NULL; env = env-next_cpu)
 if (env-cpu_index == cpu_index) {
 cpu_interrupt(env, CPU_INTERRUPT_NMI);
-break;
+return 0;
 }
+
+qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index,
+  a CPU number);
+return -1;
 }
 #endif
  
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e5f157f..fcb6bf2 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -429,6 +429,33 @@ Example:
  
 EQMP
  
+#if defined(TARGET_I386)
+{
+.name   = inject_nmi,
+.args_type  = cpu_index:i,
+.params = cpu,
+.help   = inject an NMI on the given CPU,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_inject_nmi,
+},
+#endif
+SQMP
+inject_nmi
+--
+
+Inject an NMI on the given CPU (x86 only).
+
+Arguments:
+
+- cpu_index: the index of the CPU to be injected NMI (json-int)
 
  Please, use cpu-index, that's what we're using for the 
  human-monitor-command.
 
  Avi, Anthony, can you please ACK this new command?
 
 
 I'd like to see cpu-index made optional; if not present, nmi all cpus 
 (that's what the nmi button on many machines does, or at least I think 
 that's what it does).

Looks like a GUI feature to me, _might_ turn out to be an undesirable
side effect to client writers. I guess I prefer a to-all-cpus argument.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu-kvm: Switch to upstream -enable-kvm semantics

2010-12-15 Thread Anthony Liguori

On 12/15/2010 09:50 AM, Markus Armbruster wrote:

We currently enable KVM by default, and when it's not available, we
print a message and fall back to TCG.  Option -enable-kvm is ignored.
Option -no-kvm suppresses KVM.

Upstream works differently: KVM is off by default, -enable-kvm
switches it on.  -enable-kvm terminates the process unsuccessfully if
KVM is not available.

upstream qemu   |  default  |-enable-kvm
+---+---
KVM available   | disabled  |  enabled
KVM unavailable | disabled  |fail

qemu-kvm|  default  |-enable-kvm
+---+---
KVM available   |  enabled* |  enabled
KVM unavailable | disabled  | disabled*

* differs from upstream

Users of qemu and qemu-kvm need to be aware of these differences to
enable / disable use of KVM reliably.  This is bothersome.

Consider -enable-kvm when KVM is unavailable: If the user expects
qemu-kvm behavior (fall back), but qemu fails, he'll likely be
surprised and unhappy.  If the user expects upstream behavior (fail),
but qemu-kvm falls back to TCG, the guest runs slow as molasses, and
the user will likely be confused and unhappy (unless he spots and
understands the disable KVM message).

Switch to upstream semantics: KVM off by default, -enable-kvm switches
it on, and when it can't, it's fatal.

Having to enable KVM explicitly is annoying, but the proper place to
address that is upstream.

Signed-off-by: Markus Armbrusterarm...@redhat.com
   


Backwards compatibility is going to kill us if we try to make this change.

Current qemu-kvm behavior:

default: -accel kvm,tcg
-no-kvm: -accel tcg
-enable-kvm: -accel kvm,tcg

Current upstream behavior

default: -accel tcg
-enable-kvm: -accel kvm

I think we should tie `-accel' to the machine type.  For qemu-kvm, a 
different default machine type should be used than upstream qemu (it 
really should be a configure switch).


For `pc', the default `-accel' behavior should remain 'tcg'.  For 
`kvmpc', the default `-accel' behavior should be 'kvm,tcg'.


-no-kvm should be deprecated.  -enable-kvm should also be deprecated in 
favor of the `-accel' option.


In the short term, it would be a good idea to modify qemu-kvm to switch 
the -enable-kvm semantics to match upstream (fail if KVM isn't 
available).  Adding an alias for 'kvmpc' upstream and qemu-kvm and 
making qemu-kvm default to 'kvmpc' would be helpful for management tools 
too.


Regards,

Anthony Liguori


---
  vl.c |   10 +-
  1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/vl.c b/vl.c
index e3c8919..87e88c2 100644
--- a/vl.c
+++ b/vl.c
@@ -247,7 +247,7 @@ static void *boot_set_opaque;
  static NotifierList exit_notifiers =
  NOTIFIER_LIST_INITIALIZER(exit_notifiers);

-int kvm_allowed = 1;
+int kvm_allowed = 0;
  uint32_t xen_domid;
  enum xen_mode xen_mode = XEN_EMULATE;

@@ -2436,10 +2436,8 @@ int main(int argc, char **argv, char **envp)
  case QEMU_OPTION_smbios:
  do_smbios_option(optarg);
  break;
-#ifdef OBSOLETE_KVM_IMPL
  case QEMU_OPTION_enable_kvm:
  kvm_allowed = 1;
-#endif
  break;
case QEMU_OPTION_no_kvm:
kvm_allowed = 0;
@@ -2789,18 +2787,12 @@ int main(int argc, char **argv, char **envp)
  if (kvm_allowed) {
  int ret = kvm_init(smp_cpus);
  if (ret  0) {
-#if defined(OBSOLETE_KVM_IMPL) || defined(CONFIG_NO_CPU_EMULATION)
  if (!kvm_available()) {
  printf(KVM not supported for this target\n);
  } else {
  fprintf(stderr, failed to initialize KVM: %s\n, 
strerror(-ret));
  }
  exit(1);
-#endif
-#ifdef CONFIG_KVM
-fprintf(stderr, Could not initialize KVM, will disable KVM 
support\n);
-kvm_allowed = 0;
-#endif
  }
  }

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError

2010-12-15 Thread Markus Armbruster
Luiz Capitulino lcapitul...@redhat.com writes:

 On Wed, 15 Dec 2010 19:18:32 +0200
 Avi Kivity a...@redhat.com wrote:

 On 12/15/2010 07:09 PM, Luiz Capitulino wrote:
  On Wed, 15 Dec 2010 17:49:27 +0800
  Lai Jiangshanla...@cn.fujitsu.com  wrote:
 
  
Convert do_inject_nmi() to QObject, QError, we need to use it(via 
   libvirt).
  
changed from v1
Add document.
Add error handling when the cpu index is invalid.
  
changed from v2
use QERR_INVALID_PARAMETER_VALUE as Markus suggest.
  
Signed-off-by:  Lai Jiangshanla...@cn.fujitsu.com
---
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 23024ba..f86d9fe 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -724,7 +724,8 @@ ETEXI
 .args_type  = cpu_index:i,
 .params = cpu,
 .help   = inject an NMI on the given CPU,
-.mhandler.cmd = do_inject_nmi,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_inject_nmi,
 },
 #endif
 STEXI
diff --git a/monitor.c b/monitor.c
index ec31eac..3e33a96 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const 
   QDict *qdict)
 #endif
  
 #if defined(TARGET_I386)
-static void do_inject_nmi(Monitor *mon, const QDict *qdict)
+static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
   **ret_data)
 {
 CPUState *env;
 int cpu_index = qdict_get_int(qdict, cpu_index);
@@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const 
   QDict *qdict)
 for (env = first_cpu; env != NULL; env = env-next_cpu)
 if (env-cpu_index == cpu_index) {
 cpu_interrupt(env, CPU_INTERRUPT_NMI);
-break;
+return 0;
 }
+
+qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index,
+  a CPU number);
+return -1;
 }
 #endif
  
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e5f157f..fcb6bf2 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -429,6 +429,33 @@ Example:
  
 EQMP
  
+#if defined(TARGET_I386)
+{
+.name   = inject_nmi,
+.args_type  = cpu_index:i,
+.params = cpu,
+.help   = inject an NMI on the given CPU,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_inject_nmi,
+},
+#endif
+SQMP
+inject_nmi
+--
+
+Inject an NMI on the given CPU (x86 only).
+
+Arguments:
+
+- cpu_index: the index of the CPU to be injected NMI (json-int)
 
  Please, use cpu-index, that's what we're using for the 
  human-monitor-command.
 
  Avi, Anthony, can you please ACK this new command?
 
 
 I'd like to see cpu-index made optional; if not present, nmi all cpus 
 (that's what the nmi button on many machines does, or at least I think 
 that's what it does).

 Looks like a GUI feature to me,

Really?  Can't see how you can build NMI to all CPUs from NMI this
CPU.  Or am I misunderstanding you?

 _might_ turn out to be an undesirable
 side effect to client writers.

They seem to be coping fine with optional arguments elsewhere.

I guess I prefer a to-all-cpus argument.

How would that look like?  cpu-index: all?

I find optional json-int a simpler schema than either a json-int or
the json-string all.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError

2010-12-15 Thread Luiz Capitulino
On Wed, 15 Dec 2010 18:39:07 +0100
Markus Armbruster arm...@redhat.com wrote:

 Luiz Capitulino lcapitul...@redhat.com writes:
 
  On Wed, 15 Dec 2010 11:49:23 +0100
  Markus Armbruster arm...@redhat.com wrote:
 
  Lai Jiangshan la...@cn.fujitsu.com writes:
  
   Convert do_inject_nmi() to QObject, QError, we need to use it(via 
   libvirt).
  
   changed from v1
   Add document.
   Add error handling when the cpu index is invalid.
  
   changed from v2
   use QERR_INVALID_PARAMETER_VALUE as Markus suggest.
  
   Signed-off-by:  Lai Jiangshan la...@cn.fujitsu.com
  
  A note on commit messages:
  
  The commit message should describe the current version of the patch.
  Don't repeat the subject line in the body.
  
  Patch history is very useful for review, but usually uninteresting once
  the patch is committed.  Thus, it's best to put it after the ---
  separator.
  
  Subsystem tags in the subject line are helpful.  But qemu doesn't
  provide any information there :)
  
  
  Regarding the patch:
  
  The conversion looks good.
  
  The new QMP command is called inject_nmi, while the existing human
  monitor command is called nmi.  Luiz asked for this name change.  I
  don't mind.  But should we rename the human monitor command for
  consistency?
 
  I don't think so, we don't need (and maybe don't even want) naming parity
  between QMP and HMP. Remember that one of our mistakes was to couple the 
  two.
 
 Human nmi and QMP inject_nmi are identical commands, aren't they?

At this point in time yes, but they might not be in the near future. Assuming
they might be different is the safest thing to do.

That's true for all existing commands.

 Giving the same things the same name isn't coupling :)

Expecting them to be the same in the future is.

 The mistake that matters here was adopting existing human commands for
 QMP uncritically.

That's the protocol visible mistake, yes.

  Also, Avi asked for more descriptive names in QMP and I agree with him, I
  would even be in favor of calling it inject-non-maskable-interrupt.
 
 I like inject_nmi better than nmi.  inject-non-maskable-interrupt is too
 long even for QMP.

It's not supposed to be typed that much, but I'm not that strong about that.

nitpick: I think we should be consistent in the use of _ or -, eg. we
 should pick inject-nmi or inject_nmi?

 
  Regardless, the differing command name is worth mentioning in the commit
  message.
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu-kvm: Switch to upstream -enable-kvm semantics

2010-12-15 Thread Markus Armbruster
Anthony Liguori anth...@codemonkey.ws writes:

 On 12/15/2010 09:50 AM, Markus Armbruster wrote:
 We currently enable KVM by default, and when it's not available, we
 print a message and fall back to TCG.  Option -enable-kvm is ignored.
 Option -no-kvm suppresses KVM.

 Upstream works differently: KVM is off by default, -enable-kvm
 switches it on.  -enable-kvm terminates the process unsuccessfully if
 KVM is not available.

 upstream qemu   |  default  |-enable-kvm
 +---+---
 KVM available   | disabled  |  enabled
 KVM unavailable | disabled  |fail

 qemu-kvm|  default  |-enable-kvm
 +---+---
 KVM available   |  enabled* |  enabled
 KVM unavailable | disabled  | disabled*

 * differs from upstream

 Users of qemu and qemu-kvm need to be aware of these differences to
 enable / disable use of KVM reliably.  This is bothersome.

 Consider -enable-kvm when KVM is unavailable: If the user expects
 qemu-kvm behavior (fall back), but qemu fails, he'll likely be
 surprised and unhappy.  If the user expects upstream behavior (fail),
 but qemu-kvm falls back to TCG, the guest runs slow as molasses, and
 the user will likely be confused and unhappy (unless he spots and
 understands the disable KVM message).

 Switch to upstream semantics: KVM off by default, -enable-kvm switches
 it on, and when it can't, it's fatal.

 Having to enable KVM explicitly is annoying, but the proper place to
 address that is upstream.

 Signed-off-by: Markus Armbrusterarm...@redhat.com


 Backwards compatibility is going to kill us if we try to make this change.

 Current qemu-kvm behavior:

 default: -accel kvm,tcg
 -no-kvm: -accel tcg
 -enable-kvm: -accel kvm,tcg

 Current upstream behavior

 default: -accel tcg
 -enable-kvm: -accel kvm

 I think we should tie `-accel' to the machine type.  For qemu-kvm, a
 different default machine type should be used than upstream qemu (it
 really should be a configure switch).

 For `pc', the default `-accel' behavior should remain 'tcg'.  For
 kvmpc', the default `-accel' behavior should be 'kvm,tcg'.

 -no-kvm should be deprecated.  -enable-kvm should also be deprecated
 in favor of the `-accel' option.

I'm fine with -accel and deprecating the old options.  But until we have
that:

 In the short term, it would be a good idea to modify qemu-kvm to
 switch the -enable-kvm semantics to match upstream (fail if KVM isn't
 available).

That's what my patch does.

Additionally, it changes the default to match upstream: KVM disabled.

What do you want changed in my patch?

  Adding an alias for 'kvmpc' upstream and qemu-kvm and
 making qemu-kvm default to 'kvmpc' would be helpful for management
 tools too.

That would address Having to enable KVM explicitly is annoying.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError

2010-12-15 Thread Luiz Capitulino
On Wed, 15 Dec 2010 18:45:09 +0100
Markus Armbruster arm...@redhat.com wrote:

 Luiz Capitulino lcapitul...@redhat.com writes:
 
  On Wed, 15 Dec 2010 19:18:32 +0200
  Avi Kivity a...@redhat.com wrote:
 
  On 12/15/2010 07:09 PM, Luiz Capitulino wrote:
   On Wed, 15 Dec 2010 17:49:27 +0800
   Lai Jiangshanla...@cn.fujitsu.com  wrote:
  
   
 Convert do_inject_nmi() to QObject, QError, we need to use it(via 
libvirt).
   
 changed from v1
 Add document.
 Add error handling when the cpu index is invalid.
   
 changed from v2
 use QERR_INVALID_PARAMETER_VALUE as Markus suggest.
   
 Signed-off-by:  Lai Jiangshanla...@cn.fujitsu.com
 ---
 diff --git a/hmp-commands.hx b/hmp-commands.hx
 index 23024ba..f86d9fe 100644
 --- a/hmp-commands.hx
 +++ b/hmp-commands.hx
 @@ -724,7 +724,8 @@ ETEXI
  .args_type  = cpu_index:i,
  .params = cpu,
  .help   = inject an NMI on the given CPU,
 -.mhandler.cmd = do_inject_nmi,
 +.user_print = monitor_user_noop,
 +.mhandler.cmd_new = do_inject_nmi,
  },
  #endif
  STEXI
 diff --git a/monitor.c b/monitor.c
 index ec31eac..3e33a96 100644
 --- a/monitor.c
 +++ b/monitor.c
 @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const 
QDict *qdict)
  #endif
   
  #if defined(TARGET_I386)
 -static void do_inject_nmi(Monitor *mon, const QDict *qdict)
 +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
  {
  CPUState *env;
  int cpu_index = qdict_get_int(qdict, cpu_index);
 @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const 
QDict *qdict)
  for (env = first_cpu; env != NULL; env = env-next_cpu)
  if (env-cpu_index == cpu_index) {
  cpu_interrupt(env, CPU_INTERRUPT_NMI);
 -break;
 +return 0;
  }
 +
 +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index,
 +  a CPU number);
 +return -1;
  }
  #endif
   
 diff --git a/qmp-commands.hx b/qmp-commands.hx
 index e5f157f..fcb6bf2 100644
 --- a/qmp-commands.hx
 +++ b/qmp-commands.hx
 @@ -429,6 +429,33 @@ Example:
   
  EQMP
   
 +#if defined(TARGET_I386)
 +{
 +.name   = inject_nmi,
 +.args_type  = cpu_index:i,
 +.params = cpu,
 +.help   = inject an NMI on the given CPU,
 +.user_print = monitor_user_noop,
 +.mhandler.cmd_new = do_inject_nmi,
 +},
 +#endif
 +SQMP
 +inject_nmi
 +--
 +
 +Inject an NMI on the given CPU (x86 only).
 +
 +Arguments:
 +
 +- cpu_index: the index of the CPU to be injected NMI (json-int)
  
   Please, use cpu-index, that's what we're using for the 
   human-monitor-command.
  
   Avi, Anthony, can you please ACK this new command?
  
  
  I'd like to see cpu-index made optional; if not present, nmi all cpus 
  (that's what the nmi button on many machines does, or at least I think 
  that's what it does).
 
  Looks like a GUI feature to me,
 
 Really?  Can't see how you can build NMI to all CPUs from NMI this
 CPU.  Or am I misunderstanding you?

I guess so. Avi referred to 'nmi button on many machines', I assumed he
meant a virtual machine GUI, am I wrong?

  _might_ turn out to be an undesirable
  side effect to client writers.
 
 They seem to be coping fine with optional arguments elsewhere.

Which we might want to review.

 I guess I prefer a to-all-cpus argument.
 
 How would that look like?  cpu-index: all?

Like this:

{ execute: inject-nmi, arguments: { to-all-cpus: true } }

But this looks like an optimization to me, because it's also easy to do:

for cpu in query-cpus; do
  inject-nmi cpu

Unless we want to do this in an atomic way, due to side effects I'm
not aware about.

 I find optional json-int a simpler schema than either a json-int or
 the json-string all.
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v4 1/2] Minimal RAM API support

2010-12-15 Thread Alex Williamson
On Wed, 2010-12-15 at 17:23 +, Paul Brook wrote:
  This adds a minimum chunk of Anthony's RAM API support so that we
  can identify actual VM RAM versus all the other things that make
  use of qemu_ram_alloc.
 
 Why do we care? How are you defining actual VM RAM?
 
 Surely the whole point of qemu_ram_alloc is to allocate a chunk of memory 
 that 
 can be mapped into the guest physical address space, so all uses of 
 qemu_ram_alloc should be using this API.

http://wiki.qemu.org/Features/RamAPI

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v4 1/2] Minimal RAM API support

2010-12-15 Thread Anthony Liguori

On 12/15/2010 11:23 AM, Paul Brook wrote:

This adds a minimum chunk of Anthony's RAM API support so that we
can identify actual VM RAM versus all the other things that make
use of qemu_ram_alloc.
 

Why do we care? How are you defining actual VM RAM?

Surely the whole point of qemu_ram_alloc is to allocate a chunk of memory that
can be mapped into the guest physical address space, so all uses of
qemu_ram_alloc should be using this API.
   


actual VM RAM == the DIMM devices.  This address has exactly a 1-1 
mapping between memory content and an address.  It doesn't change during 
program execution.


It may be mapped in the CPU in weird ways, it may be visibly different 
to devices, but that's a different interface.


Why do we care about differentiating actual VM RAM from things that 
behave like RAM but are not actually RAM (like device ROM)?  Because the 
semantics are different.  ROM is non-volatile and RAM is volatile.  If 
we don't make that distinction in our interfaces, we loose the ability 
to model the behavioral differences.


For things like paravirtual devices, we can take short cuts (to optimize 
performance) by saying the device is directly connecting to RAM (and 
doesn't go through the normal translation hierarchy).


Regards,

Anthony Liguori


Paul
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ConVirt 2.0.1 Open Source released.

2010-12-15 Thread jd
We are pleased to announce availability of ConVirt 2.0.1 open
source. We would like to thank ConVirt user community for their
continuing participation and support. This release incorporates
feedback gathered from the community over last few months. 


To learn more about the release, please visit http://bit.ly/hZbHvW 

Thanks
ConVirt Team
http://www.convirture.com


  
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ConVirt 2.0.1 Open Source released.

2010-12-15 Thread Stefan Hajnoczi
On Wed, Dec 15, 2010 at 8:28 PM, jd jdsw2...@yahoo.com wrote:
 We are pleased to announce availability of ConVirt 2.0.1 open
 source. We would like to thank ConVirt user community for their
 continuing participation and support. This release incorporates
 feedback gathered from the community over last few months.

jd: A description of ConVirt would be nice.

Here's what I've figured out from the links:

It is a management tool for Xen, KVM, and others.  Written in Python
under the GPLv2 but developed as open core software (there's an
open source edition and an enterprise edition).  It talks to KVM
using the QEMU (human) monitor.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble.

2010-12-15 Thread Yoshiaki Tamura
2010/12/3 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp:
 2010/12/2 Michael S. Tsirkin m...@redhat.com:
 On Wed, Dec 01, 2010 at 05:03:43PM +0900, Yoshiaki Tamura wrote:
 2010/11/28 Michael S. Tsirkin m...@redhat.com:
  On Sun, Nov 28, 2010 at 08:27:58PM +0900, Yoshiaki Tamura wrote:
  2010/11/28 Michael S. Tsirkin m...@redhat.com:
   On Thu, Nov 25, 2010 at 03:06:44PM +0900, Yoshiaki Tamura wrote:
   Modify inuse type to uint16_t, let save/load to handle, and revert
   last_avail_idx with inuse if there are outstanding emulation.
  
   Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
  
   This changes migration format, so it will break compatibility with
   existing drivers. More generally, I think migrating internal
   state that is not guest visible is always a mistake
   as it ties migration format to an internal implementation
   (yes, I know we do this sometimes, but we should at least
   try not to add such cases).  I think the right thing to do in this case
   is to flush outstanding
   work when vm is stopped.  Then, we are guaranteed that inuse is 0.
   I sent patches that do this for virtio net and block.
 
  Could you give me the link of your patches?  I'd like to test
  whether they work with Kemari upon failover.  If they do, I'm
  happy to drop this patch.
 
  Yoshi
 
  Look for this:
  stable migration image on a stopped vm
  sent on:
  Wed, 24 Nov 2010 17:52:49 +0200

 Thanks for the info.

 However, The patch series above didn't solve the issue.  In
 case of Kemari, inuse is mostly  0 because it queues the
 output, and while last_avail_idx gets incremented
 immediately, not sending inuse makes the state inconsistent
 between Primary and Secondary.

 Hmm. Can we simply avoid incrementing last_avail_idx?

 I think we can calculate or prepare an internal last_avail_idx,
 and update the external when inuse is decremented.  I'll try
 whether it work w/ w/o Kemari.

Hi Michael,

Could you please take a look at the following patch?

commit 36ee7910059e6b236fe9467a609f5b4aed866912
Author: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Date:   Thu Dec 16 14:50:54 2010 +0900

virtio: update last_avail_idx when inuse is decreased.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp

diff --git a/hw/virtio.c b/hw/virtio.c
index c8a0fc6..6688c02 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -237,6 +237,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 wmb();
 trace_virtqueue_flush(vq, count);
 vring_used_idx_increment(vq, count);
+vq-last_avail_idx += count;
 vq-inuse -= count;
 }

@@ -385,7 +386,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
 unsigned int i, head, max;
 target_phys_addr_t desc_pa = vq-vring.desc;

-if (!virtqueue_num_heads(vq, vq-last_avail_idx))
+if (!virtqueue_num_heads(vq, vq-last_avail_idx + vq-inuse))
 return 0;

 /* When we start there are none of either input nor output. */
@@ -393,7 +394,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)

 max = vq-vring.num;

-i = head = virtqueue_get_head(vq, vq-last_avail_idx++);
+i = head = virtqueue_get_head(vq, vq-last_avail_idx + vq-inuse);

 if (vring_desc_flags(desc_pa, i)  VRING_DESC_F_INDIRECT) {
 if (vring_desc_len(desc_pa, i) % sizeof(VRingDesc)) {





  I'm wondering why
 last_avail_idx is OK to send but not inuse.

 last_avail_idx is at some level a mistake, it exposes part of
 our internal implementation, but it does *also* express
 a guest observable state.

 Here's the problem that it solves: just looking at the rings in virtio
 there is no way to detect that a specific request has already been
 completed. And the protocol forbids completing the same request twice.

 Our implementation always starts processing the requests
 in order, and since we flush outstanding requests
 before save, it works to just tell the remote 'process only requests
 after this place'.

 But there's no such requirement in the virtio protocol,
 so to be really generic we could add a bitmask of valid avail
 ring entries that did not complete yet. This would be
 the exact representation of the guest observable state.
 In practice we have rings of up to 512 entries.
 That's 64 byte per ring, not a lot at all.

 However, if we ever do change the protocol to send the bitmask,
 we would need some code to resubmit requests
 out of order, so it's not trivial.

 Another minor mistake with last_avail_idx is that it has
 some redundancy: the high bits in the index
 ( vq size) are not necessary as they can be
 got from avail idx.  There's a consistency check
 in load but we really should try to use formats
 that are always consistent.

 The following patch does the same thing as original, yet
 keeps the format of the virtio.  It shouldn't break live
 migration either because inuse should be 0.

 Yoshi

 Question is, can you flush to make inuse 0 in kemari too?
 And if not, how do you handle the fact that some requests
 are 

Re: [PATCH 11/21] ioport: insert event_tap_ioport() to ioport_write().

2010-12-15 Thread Yoshiaki Tamura
2010/11/28 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp:
 2010/11/28 Michael S. Tsirkin m...@redhat.com:
 On Thu, Nov 25, 2010 at 03:06:50PM +0900, Yoshiaki Tamura wrote:
 Record ioport event to replay it upon failover.

 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp

 Interesting. This will have to be extended to support ioeventfd.
 Since each eventfd is really just a binary trigger
 it should be enough to read out the fd state.

 Haven't thought about eventfd yet.  Will try doing it in the next
 spin.

Hi Michael,

I looked into eventfd and realized it's only used with vhost now.  However, I
believe vhost bypass the net layer in qemu, and there is no way for Kemari to
detect the outputs.  To me, it doesn't make sense to extend this patch to
support eventfd...

Thanks,

Yoshi


 Yoshi


 ---
  ioport.c |    2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

 diff --git a/ioport.c b/ioport.c
 index aa4188a..74aebf5 100644
 --- a/ioport.c
 +++ b/ioport.c
 @@ -27,6 +27,7 @@

  #include ioport.h
  #include trace.h
 +#include event-tap.h

  /***/
  /* IO Port */
 @@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, 
 uint32_t data)
          default_ioport_writel
      };
      IOPortWriteFunc *func = ioport_write_table[index][address];
 +    event_tap_ioport(index, address, data);
      if (!func)
          func = default_func[index];
      func(ioport_opaque[address], address, data);
 --
 1.7.1.2

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: re-writing on powerpc

2010-12-15 Thread Avi Kivity

On 12/14/2010 07:53 PM, Hollis Blanchard wrote:

On 12/14/2010 12:48 AM, Avi Kivity wrote:

On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
Rewriting is dangerous if the guest is unaware of it.  As soon as 
it is made aware of it, it might as well actually do it in the best 
way that suits it.


Can you list some examples of dangerous scenarios?

Perhaps I should rephrase... any real-world dangerous scenarios? :) 


That's much less fun.

I was hoping you could share some traps you've hit with Linux or 
Windows on x86.


We've hit a lot of issues with the very limited patching we do for 
Windows XP (Linux does its own patching):


- Windows hibernation saves the patched code, but not the payload, so we 
have to set up hooks to re-enable the payload when Windows resumes from 
hibernation
- We need the vcpu id in the payload code, and no easy way to get at 
it.  After several wierd hacks we settled on peeking at the Windows 
processor control block, a guest specific per-cpu data structure.
- Some patched instructions are called before the stack is set up, so 
the return doesn't work very well

- others I'm suppressing


- guest checksums own kernel pages
For runtime intrusion detection? Such guests can simply not ask the 
hypervisor to enable the rewriting feature.


Which is sad.


- clever compiler reuses code for constant pool
Not sure what you mean here. Anyways I think clever compilers are 
irrelevant, since a compiler will not ordinarily emit a 
supervisor-mode instruction. The hypervisor has no need to patch 
normal user-mode instructions.


I meant a really clever compiler.  And by using code for the constant 
pool I using IP-relative addressing to fetch a constant using a small 
offset.  If the constant happens to be a patched instruction, it won't 
be so constant.


- guest patches itself (a la linux alternatives), surprised when it 
sees a different instruction

PowerPC Linux does patch itself, which is a write-only operation.


Other self-patchers might be different; say you use xor to toggle 
between two variants, reducing the amount of data you need to keep for 
patching.


- guest jits own kernel code (like Singularity), gets confused when 
it reads back something it didn't write
This is getting really hypothetical, but why would a JIT need to read 
the generated code?




Any wierd hypothetical idea will be in mission-critical production use 
somewhere, see Andreas reply.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: re-writing on powerpc

2010-12-15 Thread Avi Kivity

On 12/15/2010 01:16 PM, Sethi Varun-B16395 wrote:


  -Original Message-
  From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-
  ow...@vger.kernel.org] On Behalf Of Avi Kivity
  Sent: Tuesday, December 14, 2010 9:18 PM
  To: Yoder Stuart-B08248
  Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org
  Subject: Re: re-writing on powerpc

  On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote:
   -Original Message-
   From: Avi Kivity [mailto:a...@redhat.com]
   Sent: Tuesday, December 14, 2010 2:49 AM
   To: Hollis Blanchard
   Cc: Yoder Stuart-B08248; Alexander Graf; kvm-ppc@vger.kernel.org
   Subject: Re: re-writing on powerpc

   On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
  Rewriting is dangerous if the guest is unaware of it.  As soon
  as
it
  is made aware of it, it might as well actually do it in the
  best
way
  that suits it.
   
  Can you list some examples of dangerous scenarios?
   

   - guest checksums own kernel pages
   - clever compiler reuses code for constant pool
   - guest patches itself (a la linux alternatives), surprised when it
sees a
   different instruction
   - guest jits own kernel code (like Singularity), gets confused when
  it  reads back something it didn't write
  
One possible solution to hiding rewriting from guest if it must be
hidden is to mark patched pages as execute only.  If a guest reads a
patched page, the hypervisor can fix up the read.
  

  Yes.  Something that is common to all the problems above is using code
  as data.

  However, execute only would only affect the page's mapping, not the page
  itself, yes?  So if the page has another mapping, this doesn't work.


But KVM would be aware of guest page mappings, so access permissions for any 
particular mapping
can be controlled by KVM.


kvm isn't aware of all guest mappings (only those that were instantiated 
in shadow tlb/pagetables).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: re-writing on powerpc

2010-12-15 Thread Sethi Varun-B16395


 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Wednesday, December 15, 2010 4:49 PM
 To: Sethi Varun-B16395
 Cc: Yoder Stuart-B08248; Hollis Blanchard; Alexander Graf; kvm-
 p...@vger.kernel.org
 Subject: Re: re-writing on powerpc
 
 On 12/15/2010 01:16 PM, Sethi Varun-B16395 wrote:
 
-Original Message-
From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-
   ow...@vger.kernel.org] On Behalf Of Avi Kivity
Sent: Tuesday, December 14, 2010 9:18 PM
To: Yoder Stuart-B08248
Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org
Subject: Re: re-writing on powerpc
  
On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote:
 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Tuesday, December 14, 2010 2:49 AM
 To: Hollis Blanchard
 Cc: Yoder Stuart-B08248; Alexander Graf; kvm-
 p...@vger.kernel.org
 Subject: Re: re-writing on powerpc
  
 On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
Rewriting is dangerous if the guest is unaware of it.
 As soon
as
  it
is made aware of it, it might as well actually do it in
 the
best
  way
that suits it.
 
Can you list some examples of dangerous scenarios?
 
  
 - guest checksums own kernel pages
 - clever compiler reuses code for constant pool
 - guest patches itself (a la linux alternatives), surprised
 when it
  sees a
 different instruction
 - guest jits own kernel code (like Singularity), gets
 confused when
it  reads back something it didn't write  One possible
   solution to hiding rewriting from guest if it must behidden is
   to mark patched pages as execute only.  If a guest reads a  
   patched page, the hypervisor can fix up the read.

  
Yes.  Something that is common to all the problems above is using
   code  as data.
  
However, execute only would only affect the page's mapping, not the
   page  itself, yes?  So if the page has another mapping, this doesn't
 work.
  
 
  But KVM would be aware of guest page mappings, so access permissions
  for any particular mapping can be controlled by KVM.
 
 kvm isn't aware of all guest mappings (only those that were instantiated
 in shadow tlb/pagetables).
I am not sure if I understand, but guest would have to be instantiate the 
mapping in the tlb (for BookE) before page can be accessed.
That's when we can set the access permissions.

-Varun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: re-writing on powerpc

2010-12-15 Thread Avi Kivity

On 12/15/2010 01:32 PM, Sethi Varun-B16395 wrote:

  
But KVM would be aware of guest page mappings, so access permissions
for any particular mapping can be controlled by KVM.

  kvm isn't aware of all guest mappings (only those that were instantiated
  in shadow tlb/pagetables).
I am not sure if I understand, but guest would have to be instantiate the 
mapping in the tlb (for BookE) before page can be accessed.
That's when we can set the access permissions.


You're right, for a shadow tlb kvm has all guest mappings at all time.

For page table models, it doesn't.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html