Re: KVM call agenda for July 20

2010-07-20 Thread Avi Kivity

On 07/20/2010 12:46 AM, Chris Wright wrote:
> Please send in any agenda items you are interested in covering.

Last week's agenda, minus the item that we started to discuss.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PIT: free irq source id in handling error path

2010-07-20 Thread Xiao Guangrong
Free the irq source id if creating the pit workqueue fails.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/kvm/i8254.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 0fd6378..211716f 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -697,6 +697,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	pit->wq = create_singlethread_workqueue("kvm-pit-wq");
 	if (!pit->wq) {
 		mutex_unlock(&pit->pit_state.lock);
+		kvm_free_irq_source_id(kvm, pit->irq_source_id);
 		kfree(pit);
 		return NULL;
 	}
-- 
1.6.1.2



Re: [PPC64/Power7 - 2.6.35-rc5] Bad relocation warnings while Building a CONFIG_RELOCATABLE kernel with CONFIG_ISERIES enabled

2010-07-20 Thread Alexander Graf

On 20.07.2010, at 09:27, Milton Miller wrote:

> On Mon, 19 Jul 2010 about 14:00:56 +0200, Alexander Graf wrote:
>> Milton Miller wrote:
>>> I wrote:
>>>
>>> Oh yea, and for book-3s, the code copies from 0x100 to __end_interrupts
>>> in arch/powerpc/kernel/exceptions-64s.h down to the real 0, but the rest
>>> of the kernel is at some disjointed address.  The interrupt will go to
>>> the copy at the real zero.  Any references to code outside that region
>>> must be done via a full indirect branch (not a relative one), similar to
>>> the secondary startup (via following the function pointer in a descriptor
>>> set in very low memory), or syscall entry and exception vectors via paca.
>>
>> That would still break on normal PPC boxes, as any address accessed in
>> real mode has to be inside the RMA. And the #include for
>> kvm/book3s_rmhandlers.S happens after __end_interrupts. So I'd end up
>> with code that gets executed outside of the RMA after a relocation, right?
>>
>> Alex
>
> Whether it's outside of the RMA or not, DO_KVM is creating a branch outside
> of code copied to lowmem.
>
> This is BROKEN.
>
> We have a hard limit that we can't extend _end_interrupts past 0x7000, and
> a soft limit that we can't exceed 0x6000.  If there is space, we could
> move the real mode handler extensions inside end_interrupts in
> exceptions-64s.S, and store the full address in a .quad so it gets
> relocated properly.  Don't subtract the start, we have designed the kernel
> to run with start at a VA that can be used as an EA in real mode.

Moving everything to exceptions-64s.S sounds like the best thing to do. All the
code in real mode really is there so it stays inside the RMA. I don't think we
can guarantee that for any code that is not copied, right?

> Otherwise we need to mark KVM_BOOK3S_64 depends on (!RELOCATABLE ||
> BROKEN) for 2.6.35 until we get fixes.

Well - it's only broken when really getting relocated. But I agree, the current
state doesn't cope with Linux's relocation logic.

> I took a read through the book3s code as of 2.6.34.   A few things I noticed:
>
> (1) The code is using slb large to control the segment size.   It should
> be using SLB B field (or just implement 256M segments only).

I'm not sure I understand this part? We only use 256MB segments for now.

> (2) It appears that the mtspr and mfspr code is using the same storage for
> bats 4-7 as 0-3 ... I would have expected a 4 + a few places.

Yes, that one is fixed in more recent versions already.

> (3) It's not clear to me that you clear RI when transitioning to the guest
> but it's obviously required because you place state in srr0 & srr1.

Uh - do I have to clear RI? I'm not prepared to take an interrupt anyways and
RI is just a soft flag for Linux's handlers, right?

> (4) I don't understand why __kvmppc_vcpu_run turns on interrupts so that
> __kvmppc_vcpu_entry can turn them back off.   Something to do with
> irq trace annotations?

__kvmppc_vcpu_run turns on soft interrupts while __kvmppc_vcpu_entry turns them
off in MSR. This is so that when enabling interrupts again on guest exit, we
have the soft enable bit set.


Alex



Re: [PATCH 04/18] Make cpu_tsc_khz updates use local CPU

2010-07-20 Thread Avi Kivity

On 07/19/2010 11:06 PM, Zachary Amsden wrote:
>
>>> +static void tsc_khz_changed(void *data)
>>>   {
>>> -	/* nothing */
>>> +	struct cpufreq_freqs *freq = data;
>>> +	unsigned long khz = 0;
>>> +
>>> +	if (data)
>>> +		khz = freq->new;
>>> +	else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
>>> +		khz = cpufreq_quick_get(raw_smp_processor_id());
>>> +	if (!khz)
>>> +		khz = tsc_khz;
>>> +	__get_cpu_var(cpu_tsc_khz) = khz;
>>>   }
>>
>> Do we really need to cache cpufreq_quick_get()?  If it's really
>> quick, why not just use it everywhere instead of caching it?  Not a
>> comment on this patch.
>
> If cpufreq is compiled in, but disabled, it returns zero, so we need
> some sort of logic.

Maybe it's better to put it into cpufreq_quick_get().  Inconsistent APIs
that appear to work are bad.
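
For readers who want to check the fallback order discussed above, here is a minimal Python model of it. It is only a sketch: select_tsc_khz and its parameters are hypothetical stand-ins for the kernel symbols in the quoted hunk, and quick_get_khz imitates a cpufreq_quick_get() that returns zero when cpufreq is built in but disabled.

```python
def select_tsc_khz(freq_new, constant_tsc, quick_get_khz, boot_tsc_khz):
    """Model the fallback order of the quoted tsc_khz_changed().

    freq_new      -- frequency from a cpufreq notification, or None
    constant_tsc  -- True if the CPU advertises a constant TSC
    quick_get_khz -- callable standing in for cpufreq_quick_get();
                     returns 0 when cpufreq is compiled in but disabled
    boot_tsc_khz  -- stand-in for the global tsc_khz
    """
    khz = 0
    if freq_new is not None:
        khz = freq_new              # a notifier told us the new frequency
    elif not constant_tsc:
        khz = quick_get_khz()       # may legitimately come back as 0
    if not khz:
        khz = boot_tsc_khz          # last resort: the boot-time value
    return khz
```

The zero check is the "some sort of logic" from the reply; moving it into the getter itself would let every caller rely on a non-zero result.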


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm

2010-07-20 Thread Michael Goldish
On 07/20/2010 04:34 AM, Amos Kong wrote:
> The old method uses the mac address in the configuration files, which could
> lead to serious problems when multiple tests run on different hosts.
>
> This patch adds a new macaddress pool algorithm. It generates the mac prefix
> based on the mac address of the host, which could eliminate duplicated mac
> addresses between machines.
>
> When the user has set mac_prefix in the configuration file, we should use it
> instead of the dynamically generated mac prefix.
>
> Other changes:
> . Fix random mac address generation so that it conforms to IEEE802.
> . Update the clone function to decide whether to clone the mac address.
> . Update get_macaddr function.
> . Add set_mac_address function.
>
> New auto mac address pool algorithm:
> If address_index is defined, the VM gets its mac from the config file and
> records it in address_pool. If address_index is not defined, the VM calls
> get_mac_from_pool to create a mac automatically and records it in
> address_pool in the following format:
> {'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}}
>
>   AE:9D:94:6A:9b:f9: mac address
>   20100310-165222-Wt7l : instance attribute of VM
>   0: index of NIC

Why do you use the mac address as a key, instead of the instance string
+ nic index?  When the mac address is used as a key, each key has a list
of values instead of just one value.  This order seems unnatural.  If it
were the other way around (i.e. key = VM instance + nic index, value =
mac address), then each key would have exactly one value, and I think
this patch would be shorter and simpler.
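
A minimal sketch of the reversed layout suggested above, with the VM instance plus NIC index as the key and the mac address as the single value. The helper names assign_mac and release_mac are hypothetical and not part of the patch; a plain dict stands in for the persistent pool.

```python
def assign_mac(pool, vm_instance, nic_index, new_mac):
    """Return the mac recorded for this NIC, recording new_mac first
    if the NIC has no entry yet. Each key maps to exactly one value."""
    key = "%s:%s" % (vm_instance, nic_index)
    return pool.setdefault(key, new_mac)


def release_mac(pool, vm_instance, nic_index):
    """Forget the mac of this NIC; return it, or None if it had none."""
    key = "%s:%s" % (vm_instance, nic_index)
    return pool.pop(key, None)


pool = {}
first = assign_mac(pool, "20100310-165222-Wt7l", 0, "ae:9d:94:6a:9b:f9")
again = assign_mac(pool, "20100310-165222-Wt7l", 0, "ae:9d:94:6a:00:01")
```

The second call returns the address recorded by the first, so a NIC keeps its mac across lookups. The trade-off is that checking whether a candidate mac is already taken becomes a scan over pool.values() rather than a key lookup.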

> Signed-off-by: Jason Wang <jasow...@redhat.com>
> Signed-off-by: Feng Yang <fy...@redhat.com>
> Signed-off-by: Amos Kong <ak...@redhat.com>
> ---
>  0 files changed, 0 insertions(+), 0 deletions(-)
>
> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
> index fb2d1c2..7c0946e 100644
> --- a/client/tests/kvm/kvm_utils.py
> +++ b/client/tests/kvm/kvm_utils.py
> @@ -5,6 +5,7 @@ KVM test utility functions.
>
>
>  import time, string, random, socket, os, signal, re, logging, commands, cPickle
> +import fcntl, shelve
>  from autotest_lib.client.bin import utils
>  from autotest_lib.client.common_lib import error, logging_config
>  import kvm_subprocess
> @@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword):
>
>  # Functions related to MAC/IP addresses
>
> +def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'):

The name of this function is confusing because it does the exact
opposite: it puts a mac address in address_pool.  Maybe the pool you're
referring to in the name isn't address_pool, but still a less confusing
name should probably be used.

> +    """
> +    random generated mac address.
> +
> +    1) First try to generate macaddress based on the mac address prefix.
> +    2) And then try to use total random generated mac address.
> +
> +    @param root_dir: Root dir for kvm
> +    @param vm: Here we use instance of vm
> +    @param nic_index: The index of nic.
> +    @param prefix: Prefix of mac address.
> +    @Return: Return mac address.
> +    """
> +
> +    lock_filename = os.path.join(root_dir, "mac_lock")
> +    lock_file = open(lock_filename, 'w')
> +    fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
> +    mac_filename = os.path.join(root_dir, "address_pool")

Maybe it makes sense to put address_pool and the lock file in /tmp,
where they can be shared by more than a single autotest instance running
on the same host (unlikely, but theoretically possible).
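
A rough sketch of what a host-wide pool could look like, assuming the system temp directory as the shared location and storing keys directly in the shelve object. locked_pool is a hypothetical helper, not code from the patch; the file names mac_lock and address_pool are taken from the quoted hunk.

```python
import contextlib
import fcntl
import os
import shelve
import tempfile


@contextlib.contextmanager
def locked_pool(directory=None):
    """Open the shared address pool while holding an exclusive file lock."""
    directory = directory or tempfile.gettempdir()
    lock_file = open(os.path.join(directory, "mac_lock"), "w")
    try:
        # Blocks until no other process holds the lock.
        fcntl.lockf(lock_file.fileno(), fcntl.LOCK_EX)
        pool = shelve.open(os.path.join(directory, "address_pool"))
        try:
            yield pool
        finally:
            pool.close()      # flush to disk before the lock is dropped
    finally:
        lock_file.close()     # closing the file releases the lock
```

A caller would wrap every read or update in "with locked_pool() as pool:", so the lock is released even if the body raises. Note that fcntl locks are held per process, so this serializes separate autotest processes, not threads inside one process.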

> +    mac_shelve = shelve.open(mac_filename, writeback=False)
> +
> +    mac_pool = mac_shelve.get("macpool")

Why is this 'macpool' needed?  Why not put the keys directly in the
shelve object?

> +
> +    if not mac_pool:
> +        mac_pool = {}
> +    found = False
> +
> +    val = "%s:%s" % (vm, nic_index)
> +    for key in mac_pool.keys():
> +        if val in mac_pool[key]:
> +            mac_pool[key].append(val)

Why append val to mac_pool[key] if val is already in mac_pool[key]?

> +            found = True
> +            mac = key
> +
> +    while not found:
> +        postfix = "%02x:%02x" % (random.randint(0x00,0xfe),
> +                                 random.randint(0x00,0xfe))
> +        mac = prefix + postfix
> +        mac_list = mac.split(":")
> +        # Clear multicast bit
> +        mac_list[0] = int(mac_list[0],16) & 0xfe
> +        # Set local assignment bit (IEEE802)
> +        mac_list[0] = mac_list[0] | 0x02
> +        mac_list[0] = "%02x" % mac_list[0]

Why is this needed?  Most mac addresses begin with 00. If the mac
address is generated from the address of eth0 (using the method in this
patch) it begins with 00, which is fine. If the prefix is set by the
user using mac_prefix, I don't think we should modify it.
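
To make the effect of the quoted bit operations concrete, here is a small standalone sketch. make_local_unicast is a hypothetical name; the masks 0xfe and 0x02 are the ones used in the quoted lines.

```python
def make_local_unicast(mac):
    """Clear the multicast bit and set the locally administered bit
    in the first octet of a colon-separated mac address."""
    octets = mac.split(":")
    first = int(octets[0], 16)
    first &= 0xfe   # bit 0 cleared: unicast (individual) address
    first |= 0x02   # bit 1 set: locally administered address
    octets[0] = "%02x" % first
    return ":".join(octets)


print(make_local_unicast("00:11:22:33:44:55"))   # 02:11:22:33:44:55
```

A prefix that begins with 00 comes out beginning with 02, so a prefix derived from eth0 or set through mac_prefix does not survive unchanged, which is the behaviour questioned above.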

> +        mac = ":".join(mac_list)
> +        if mac not in mac_pool.keys() or 0 == len(mac_pool[mac]):
> +            mac_pool[mac] = ["%s:%s" % (vm,nic_index)]
> +            found = True
> +    mac_shelve["macpool"] = mac_pool
> +    logging.debug("generating mac addr %s " % mac)
 +
 +

Re: [Autotest][RFC PATCH 00/14] Patchset of network related subtests

2010-07-20 Thread Lucas Meneghel Rodrigues
On Tue, 2010-07-20 at 09:34 +0800, Amos Kong wrote:
> The following series contains 11 network related subtests; suggestions
> about correctness, design and enhancements are welcome.

Awesome work, will start to review them today. Thanks!

> Thank you so much!
>
> ---
>
> Amos Kong (14):
>   KVM-test: Add a new macaddress pool algorithm
>   KVM Test: Add a function get_interface_name() to kvm_net_utils.py
>   KVM Test: Add a common ping module for network related tests
>   KVM-test: Add a new subtest ping
>   KVM-test: Add a subtest jumbo
>   KVM-test: Add basic file transfer test
>   KVM-test: Add a subtest of load/unload nic driver
>   KVM-test: Add a subtest of nic promisc
>   KVM-test: Add a subtest of multicast
>   KVM-test: Add a subtest of pxe
>   KVM-test: Add a subtest of changing mac address
>   KVM-test: Add a subtest of netperf
>   KVM-test: Improve vlan subtest
>   KVM-test: Add subtest of testing offload by ethtool
>
>
>  0 files changed, 0 insertions(+), 0 deletions(-)




[PATCH 0/3] Nonatomic interrupt injection

2010-07-20 Thread Avi Kivity
This patchset changes interrupt injection to be done from normal process
context instead of interrupts disabled context.  This is useful for real
mode interrupt injection on Intel without the current hacks (injecting as
a software interrupt of a vm86 task), reducing latencies, and later, for
allowing nested virtualization code to use kvm_read_guest()/kvm_write_guest()
instead of kmap() to access the guest vmcb/vmcs.

Seems to survive a hack that cancels every 16th entry, after injection has
already taken place.
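
The inject-then-cancel idea can be illustrated with a toy model in Python. This is only a sketch of the control flow, not of how KVM tracks state: ToyVcpu, entry_field and run are hypothetical names, and the "cancel every 16th entry" condition mimics the test hack mentioned above.

```python
class ToyVcpu(object):
    """'pending' is the software event queue; 'entry_field' stands in
    for the hardware field that carries an event into the guest."""

    def __init__(self, pending):
        self.pending = list(pending)
        self.entry_field = None
        self.delivered = []

    def inject_pending_event(self):
        if self.entry_field is None and self.pending:
            self.entry_field = self.pending.pop(0)

    def cancel_injection(self):
        # Undo the injection: put the event back at the head of the queue.
        if self.entry_field is not None:
            self.pending.insert(0, self.entry_field)
            self.entry_field = None

    def enter_guest(self, entry_number, cancel_every=16):
        self.inject_pending_event()             # done before the atomic part
        if entry_number % cancel_every == 0:    # simulated aborted entry
            self.cancel_injection()
            return False
        if self.entry_field is not None:        # the guest consumes the event
            self.delivered.append(self.entry_field)
            self.entry_field = None
        return True


def run(events, entries):
    vcpu = ToyVcpu(events)
    for n in range(1, entries + 1):
        vcpu.enter_guest(n)
    return vcpu.delivered
```

In this model run(list(range(40)), 64) delivers all 40 events in their original order even though four entries are aborted after injection, which is the property the hack is meant to exercise.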

TODO: svm support, more complicated due to debug and nsvm handling

Avi Kivity (3):
  KVM: VMX: Split up vmx_complete_interrupts()
  KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and
entry
  KVM: Non-atomic interrupt injection

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/vmx.c  |   64 +-
 arch/x86/kvm/x86.c  |   27 
 3 files changed, 64 insertions(+), 28 deletions(-)



[PATCH 3/3] KVM: Non-atomic interrupt injection

2010-07-20 Thread Avi Kivity
Change the interrupt injection code to work from preemptible, interrupts
enabled context.  This works by adding a ->cancel_injection() operation
that undoes an injection in case we were not able to actually enter the guest
(this condition could never happen with atomic injection).

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/vmx.c  |   10 ++
 arch/x86/kvm/x86.c  |   27 ++-
 3 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 502e53f..5dd797c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -505,6 +505,7 @@ struct kvm_x86_ops {
 	void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
 				bool has_error_code, u32 error_code,
 				bool reinject);
+	void (*cancel_injection)(struct kvm_vcpu *vcpu);
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
 	int (*nmi_allowed)(struct kvm_vcpu *vcpu);
 	bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 53b6fc0..a039af2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3906,6 +3906,15 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 				  IDT_VECTORING_ERROR_CODE);
 }
 
+static void vmx_cancel_injection(struct vcpu_vmx *vmx)
+{
+	__vmx_complete_interrupts(vmx, vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
+				  VM_ENTRY_INSTRUCTION_LEN,
+				  VM_ENTRY_EXCEPTION_ERROR_CODE);
+
+	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
+}
+
 /*
  * Failure to inject an interrupt should give us the information
  * in IDT_VECTORING_INFO_FIELD.  However, if the failure occurs
@@ -4360,6 +4369,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.set_irq = vmx_inject_irq,
 	.set_nmi = vmx_inject_nmi,
 	.queue_exception = vmx_queue_exception,
+	.cancel_injection = vmx_cancel_injection,
 	.interrupt_allowed = vmx_interrupt_allowed,
 	.nmi_allowed = vmx_nmi_allowed,
 	.get_nmi_mask = vmx_get_nmi_mask,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84bfb51..1040d3f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (unlikely(r))
 		goto out;
 
+	inject_pending_event(vcpu);
+
+	/* enable NMI/IRQ window open exits if needed */
+	if (vcpu->arch.nmi_pending)
+		kvm_x86_ops->enable_nmi_window(vcpu);
+	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+		kvm_x86_ops->enable_irq_window(vcpu);
+
+	if (kvm_lapic_enabled(vcpu)) {
+		update_cr8_intercept(vcpu);
+		kvm_lapic_sync_to_vapic(vcpu);
+	}
+
 	preempt_disable();
 
 	kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		smp_wmb();
 		local_irq_enable();
 		preempt_enable();
+		kvm_x86_ops->cancel_injection(vcpu);
 		r = 1;
 		goto out;
 	}
 
-	inject_pending_event(vcpu);
-
-	/* enable NMI/IRQ window open exits if needed */
-	if (vcpu->arch.nmi_pending)
-		kvm_x86_ops->enable_nmi_window(vcpu);
-	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-		kvm_x86_ops->enable_irq_window(vcpu);
-
-	if (kvm_lapic_enabled(vcpu)) {
-		update_cr8_intercept(vcpu);
-		kvm_lapic_sync_to_vapic(vcpu);
-	}
-
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
 	kvm_guest_enter();
-- 
1.7.1



[PATCH 1/3] KVM: VMX: Split up vmx_complete_interrupts()

2010-07-20 Thread Avi Kivity
vmx_complete_interrupts() does too much, split it up:
 - vmx_vcpu_run() gets the "cache important vmcs fields" part
 - a new vmx_complete_atomic_exit() gets the parts that must be done atomically
 - a new vmx_recover_nmi_blocking() does what its name says
 - vmx_complete_interrupts() retains the event injection recovery code

This helps in reducing the work done in atomic context.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/vmx.c |   39 +++
 1 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2fdcc98..1a35964 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -125,6 +125,7 @@ struct vcpu_vmx {
 	unsigned long         host_rsp;
 	int                   launched;
 	u8                    fail;
+	u32                   exit_intr_info;
 	u32                   idt_vectoring_info;
 	struct shared_msr_entry *guest_msrs;
 	int                   nmsrs;
@@ -3792,18 +3793,9 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 	vmcs_write32(TPR_THRESHOLD, irr);
 }
 
-static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
-	u32 exit_intr_info;
-	u32 idt_vectoring_info = vmx->idt_vectoring_info;
-	bool unblock_nmi;
-	u8 vector;
-	int type;
-	bool idtv_info_valid;
-
-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
-	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+	u32 exit_intr_info = vmx->exit_intr_info;
 
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
@@ -3818,8 +3810,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 		asm("int $2");
 		kvm_after_handle_nmi(&vmx->vcpu);
 	}
+}
 
-	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
+static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
+{
+	u32 exit_intr_info = vmx->exit_intr_info;
+	bool unblock_nmi;
+	u8 vector;
+	bool idtv_info_valid;
+
+	idtv_info_valid = vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
 	if (cpu_has_virtual_nmis()) {
 		unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
@@ -3841,6 +3841,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	} else if (unlikely(vmx->soft_vnmi_blocked))
 		vmx->vnmi_blocked_time +=
 			ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
+}
+
+static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+{
+	u32 idt_vectoring_info = vmx->idt_vectoring_info;
+	u8 vector;
+	int type;
+	bool idtv_info_valid;
+
+	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
 	vmx->vcpu.arch.nmi_injected = false;
 	kvm_clear_exception_queue(&vmx->vcpu);
@@ -4051,6 +4061,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	asm("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
 	vmx->launched = 1;
 
+	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+	vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
+	vmx_complete_atomic_exit(vmx);
+	vmx_recover_nmi_blocking(vmx);
 	vmx_complete_interrupts(vmx);
 }
 
-- 
1.7.1



[PATCH 2/3] KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry

2010-07-20 Thread Avi Kivity
Currently vmx_complete_interrupts() can decode event information from vmx
exit fields into the generic kvm event queues.  Make it able to decode
the information from the entry fields as well by parametrizing it.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/vmx.c |   19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1a35964..53b6fc0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3843,9 +3843,11 @@ static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
 			ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 }
 
-static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+static void __vmx_complete_interrupts(struct vcpu_vmx *vmx,
+				      u32 idt_vectoring_info,
+				      int instr_len_field,
+				      int error_code_field)
 {
-	u32 idt_vectoring_info = vmx->idt_vectoring_info;
 	u8 vector;
 	int type;
 	bool idtv_info_valid;
@@ -3875,18 +3877,18 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 		break;
 	case INTR_TYPE_SOFT_EXCEPTION:
 		vmx->vcpu.arch.event_exit_inst_len =
-			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+			vmcs_read32(instr_len_field);
 		/* fall through */
 	case INTR_TYPE_HARD_EXCEPTION:
 		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-			u32 err = vmcs_read32(IDT_VECTORING_ERROR_CODE);
+			u32 err = vmcs_read32(error_code_field);
 			kvm_queue_exception_e(&vmx->vcpu, vector, err);
 		} else
 			kvm_queue_exception(&vmx->vcpu, vector);
 		break;
 	case INTR_TYPE_SOFT_INTR:
 		vmx->vcpu.arch.event_exit_inst_len =
-			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+			vmcs_read32(instr_len_field);
 		/* fall through */
 	case INTR_TYPE_EXT_INTR:
 		kvm_queue_interrupt(&vmx->vcpu, vector,
@@ -3897,6 +3899,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	}
 }
 
+static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+{
+	__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
+				  VM_EXIT_INSTRUCTION_LEN,
+				  IDT_VECTORING_ERROR_CODE);
+}
+
 /*
  * Failure to inject an interrupt should give us the information
  * in IDT_VECTORING_INFO_FIELD.  However, if the failure occurs
-- 
1.7.1



Allow a user to stop and start one guest VM

2010-07-20 Thread Neil Aggarwal
Hello:

One of my customers asked for access to stop and start
their guest VM.  

Right now, I can do that using virsh, but I do not want
to give this customer the ability to stop and start
all VMs running on the host.

Is there a way to give stop and start control of one
VM to someone?

I am using KVM on a CentOS 5.5 host.

Thanks,
Neil

--
Neil Aggarwal, (281)846-8957
FREE trial: Virtualmin VPS with unmetered bandwidth
http://UnmeteredVPS.net/virtualmin



Re: Allow a user to stop and start one guest VM

2010-07-20 Thread Daniel P. Berrange
On Tue, Jul 20, 2010 at 08:01:15AM -0500, Neil Aggarwal wrote:
> Hello:
>
> One of my customers asked for access to stop and start
> their guest VM.
>
> Right now, I can do that using virsh, but I do not want
> to give this customer the ability to stop and start
> all VMs running on the host.
>
> Is there a way to give stop and start control of one
> VM to someone?

Fine-grained, role-based access control is not available at the
libvirt/virsh level. It is currently something that must be
provided by the management layer above libvirt. We intend to
add this capability directly into libvirt in the future, but
there's no firm ETA. So in the immediate term you'd need to
write a small tool using libvirt APIs to delegate stop/start
operations to the users you choose.
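
As a rough illustration of such a tool, the sketch below uses the libvirt Python bindings. The ALLOWED table, the NotPermitted exception and the function names are hypothetical; only libvirt.open(), lookupByName(), create(), shutdown() and close() are calls from the bindings. The bindings are imported lazily so the permission check can be exercised on a machine without libvirt installed.

```python
# Hypothetical mapping: login name -> the one guest that user may control.
ALLOWED = {"customer1": "customer1-vm"}


class NotPermitted(Exception):
    """Raised when a user asks to control a guest they do not own."""


def check_request(user, domain_name, action, allowed=ALLOWED):
    """Raise unless 'user' may perform 'action' on 'domain_name'."""
    if action not in ("start", "stop"):
        raise ValueError("unsupported action: %s" % action)
    if allowed.get(user) != domain_name:
        raise NotPermitted("%s may not %s %s" % (user, action, domain_name))


def control_guest(user, domain_name, action, uri="qemu:///system"):
    """Start or stop a guest on behalf of 'user' after checking the table."""
    check_request(user, domain_name, action)
    import libvirt                  # libvirt Python bindings
    conn = libvirt.open(uri)
    try:
        dom = conn.lookupByName(domain_name)
        if action == "start":
            dom.create()            # boot a defined, inactive guest
        else:
            dom.shutdown()          # ask the guest to shut down cleanly
    finally:
        conn.close()
```

How the tool learns the caller's identity (sudo, an SSH forced command, a web front end) is left open here; the point is only that the privileged libvirt connection stays inside the tool and the per-user check happens before it is used.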


Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: [Qemu-devel] Re: KVM call agenda for July 20

2010-07-20 Thread Luiz Capitulino
On Tue, 20 Jul 2010 09:07:11 +0300
Avi Kivity <a...@redhat.com> wrote:

> On 07/20/2010 12:46 AM, Chris Wright wrote:
>> Please send in any agenda items you are interested in covering.
>
> Last week's agenda, minus the item that we started to discuss.

(includes 0.13)


[PATCH v2 0/3] Nonatomic interrupt injection

2010-07-20 Thread Avi Kivity
This patchset changes interrupt injection to be done from normal process
context instead of interrupts disabled context.  This is useful for real
mode interrupt injection on Intel without the current hacks (injecting as
a software interrupt of a vm86 task), reducing latencies, and later, for
allowing nested virtualization code to use kvm_read_guest()/kvm_write_guest()
instead of kmap() to access the guest vmcb/vmcs.

Seems to survive a hack that cancels every 16th entry, after injection has
already taken place.

v2: svm support (easier than expected)
fix silly vmx warning

Avi Kivity (3):
  KVM: VMX: Split up vmx_complete_interrupts()
  KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and
entry
  KVM: Non-atomic interrupt injection

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/svm.c  |   12 +++
 arch/x86/kvm/vmx.c  |   65 ++-
 arch/x86/kvm/x86.c  |   27 
 4 files changed, 77 insertions(+), 28 deletions(-)



[PATCH v2 2/3] KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry

2010-07-20 Thread Avi Kivity
Currently vmx_complete_interrupts() can decode event information from vmx
exit fields into the generic kvm event queues.  Make it able to decode
the information from the entry fields as well by parametrizing it.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/vmx.c |   19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1a35964..53b6fc0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3843,9 +3843,11 @@ static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
 			ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 }
 
-static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+static void __vmx_complete_interrupts(struct vcpu_vmx *vmx,
+				      u32 idt_vectoring_info,
+				      int instr_len_field,
+				      int error_code_field)
 {
-	u32 idt_vectoring_info = vmx->idt_vectoring_info;
 	u8 vector;
 	int type;
 	bool idtv_info_valid;
@@ -3875,18 +3877,18 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 		break;
 	case INTR_TYPE_SOFT_EXCEPTION:
 		vmx->vcpu.arch.event_exit_inst_len =
-			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+			vmcs_read32(instr_len_field);
 		/* fall through */
 	case INTR_TYPE_HARD_EXCEPTION:
 		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-			u32 err = vmcs_read32(IDT_VECTORING_ERROR_CODE);
+			u32 err = vmcs_read32(error_code_field);
 			kvm_queue_exception_e(&vmx->vcpu, vector, err);
 		} else
 			kvm_queue_exception(&vmx->vcpu, vector);
 		break;
 	case INTR_TYPE_SOFT_INTR:
 		vmx->vcpu.arch.event_exit_inst_len =
-			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+			vmcs_read32(instr_len_field);
 		/* fall through */
 	case INTR_TYPE_EXT_INTR:
 		kvm_queue_interrupt(&vmx->vcpu, vector,
@@ -3897,6 +3899,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	}
 }
 
+static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+{
+	__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
+				  VM_EXIT_INSTRUCTION_LEN,
+				  IDT_VECTORING_ERROR_CODE);
+}
+
 /*
  * Failure to inject an interrupt should give us the information
  * in IDT_VECTORING_INFO_FIELD.  However, if the failure occurs
-- 
1.7.1



[PATCH v2 3/3] KVM: Non-atomic interrupt injection

2010-07-20 Thread Avi Kivity
Change the interrupt injection code to work from preemptible, interrupts
enabled context.  This works by adding a ->cancel_injection() operation
that undoes an injection in case we were not able to actually enter the guest
(this condition could never happen with atomic injection).

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/svm.c  |   12 
 arch/x86/kvm/vmx.c  |   11 +++
 arch/x86/kvm/x86.c  |   27 ++-
 4 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 502e53f..5dd797c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -505,6 +505,7 @@ struct kvm_x86_ops {
void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
bool has_error_code, u32 error_code,
bool reinject);
+   void (*cancel_injection)(struct kvm_vcpu *vcpu);
int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
int (*nmi_allowed)(struct kvm_vcpu *vcpu);
bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 56c9b6b..46d068e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3135,6 +3135,17 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
}
 }
 
+static void svm_cancel_injection(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+   struct vmcb_control_area *control = svm-vmcb-control;
+
+   control-exit_int_info = control-event_inj;
+   control-exit_int_info_err = control-event_inj_err;
+   control-event_inj = 0;
+   svm_complete_interrupts(svm);
+}
+
 #ifdef CONFIG_X86_64
 #define R r
 #else
@@ -3493,6 +3504,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.set_irq = svm_set_irq,
.set_nmi = svm_inject_nmi,
.queue_exception = svm_queue_exception,
+   .cancel_injection = svm_cancel_injection,
.interrupt_allowed = svm_interrupt_allowed,
.nmi_allowed = svm_nmi_allowed,
.get_nmi_mask = svm_get_nmi_mask,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 53b6fc0..72381b7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3906,6 +3906,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
  IDT_VECTORING_ERROR_CODE);
 }
 
+static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
+{
+   __vmx_complete_interrupts(to_vmx(vcpu),
+ vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
+ VM_ENTRY_INSTRUCTION_LEN,
+ VM_ENTRY_EXCEPTION_ERROR_CODE);
+
+   vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
+}
+
 /*
  * Failure to inject an interrupt should give us the information
  * in IDT_VECTORING_INFO_FIELD.  However, if the failure occurs
@@ -4360,6 +4370,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.set_irq = vmx_inject_irq,
.set_nmi = vmx_inject_nmi,
.queue_exception = vmx_queue_exception,
+   .cancel_injection = vmx_cancel_injection,
.interrupt_allowed = vmx_interrupt_allowed,
.nmi_allowed = vmx_nmi_allowed,
.get_nmi_mask = vmx_get_nmi_mask,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84bfb51..1040d3f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(r))
goto out;
 
+   inject_pending_event(vcpu);
+
+   /* enable NMI/IRQ window open exits if needed */
+   if (vcpu-arch.nmi_pending)
+   kvm_x86_ops-enable_nmi_window(vcpu);
+   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+   kvm_x86_ops-enable_irq_window(vcpu);
+
+   if (kvm_lapic_enabled(vcpu)) {
+   update_cr8_intercept(vcpu);
+   kvm_lapic_sync_to_vapic(vcpu);
+   }
+
preempt_disable();
 
kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
smp_wmb();
local_irq_enable();
preempt_enable();
+   kvm_x86_ops->cancel_injection(vcpu);
r = 1;
goto out;
}
 
-   inject_pending_event(vcpu);
-
-   /* enable NMI/IRQ window open exits if needed */
-   if (vcpu->arch.nmi_pending)
-   kvm_x86_ops->enable_nmi_window(vcpu);
-   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-   kvm_x86_ops->enable_irq_window(vcpu);
-
-   if (kvm_lapic_enabled(vcpu)) {
-   update_cr8_intercept(vcpu);
-   kvm_lapic_sync_to_vapic(vcpu);
-   }
-
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
kvm_guest_enter();
-- 
1.7.1


[PATCH v2 1/3] KVM: VMX: Split up vmx_complete_interrupts()

2010-07-20 Thread Avi Kivity
vmx_complete_interrupts() does too much, split it up:
 - vmx_vcpu_run() takes over caching the important vmcs fields
 - a new vmx_complete_atomic_exit() gets the parts that must be done atomically
 - a new vmx_recover_nmi_blocking() does what its name says
 - vmx_complete_interrupts() retains the event injection recovery code

This helps in reducing the work done in atomic context.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/vmx.c |   39 +++
 1 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2fdcc98..1a35964 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -125,6 +125,7 @@ struct vcpu_vmx {
unsigned long host_rsp;
int   launched;
u8fail;
+   u32   exit_intr_info;
u32   idt_vectoring_info;
struct shared_msr_entry *guest_msrs;
int   nmsrs;
@@ -3792,18 +3793,9 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
vmcs_write32(TPR_THRESHOLD, irr);
 }
 
-static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
-   u32 exit_intr_info;
-   u32 idt_vectoring_info = vmx->idt_vectoring_info;
-   bool unblock_nmi;
-   u8 vector;
-   int type;
-   bool idtv_info_valid;
-
-   exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
-   vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+   u32 exit_intr_info = vmx->exit_intr_info;
 
/* Handle machine checks before interrupts are enabled */
if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
@@ -3818,8 +3810,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
asm("int $2");
kvm_after_handle_nmi(&vmx->vcpu);
}
+}
 
-   idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
+static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
+{
+   u32 exit_intr_info = vmx->exit_intr_info;
+   bool unblock_nmi;
+   u8 vector;
+   bool idtv_info_valid;
+
+   idtv_info_valid = vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
if (cpu_has_virtual_nmis()) {
unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
@@ -3841,6 +3841,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
} else if (unlikely(vmx->soft_vnmi_blocked))
vmx->vnmi_blocked_time +=
ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
+}
+
+static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+{
+   u32 idt_vectoring_info = vmx->idt_vectoring_info;
+   u8 vector;
+   int type;
+   bool idtv_info_valid;
+
+   idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
vmx->vcpu.arch.nmi_injected = false;
kvm_clear_exception_queue(&vmx->vcpu);
@@ -4051,6 +4061,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
asm("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
vmx->launched = 1;
 
+   vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+   vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
+   vmx_complete_atomic_exit(vmx);
+   vmx_recover_nmi_blocking(vmx);
vmx_complete_interrupts(vmx);
 }
 
-- 
1.7.1



Re: [Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm

2010-07-20 Thread Amos Kong
On Tue, Jul 20, 2010 at 01:19:39PM +0300, Michael Goldish wrote:


Michael,

Thanks for your comments. Let's simplify this method together.

 On 07/20/2010 04:34 AM, Amos Kong wrote:
  Old method uses the mac address in the configuration files which could
  lead serious problem when multiple tests running in different hosts.
  
  This patch adds a new macaddress pool algorithm, it generates the mac prefix
  based on mac address of the host which could eliminate the duplicated mac
  addresses between machines.
  
  When user have set the mac_prefix in the configuration file, we should use 
  it
  in stead of the dynamic generated mac prefix.
  
  Other change:
  . Fix randomly generating mac address so that it correspond to IEEE802.
  . Update clone function to decide clone mac address or not.
  . Update get_macaddr function.
  . Add set_mac_address function.
  
  New auto mac address pool algorithm:
  If address_index is defined, VM will get mac from config file then record 
  mac
  in to address_pool. If address_index is not defined, VM will call
  get_mac_from_pool to auto create mac then recored mac to address_pool in
  following format:
  {'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}}
  
AE:9D:94:6A:9b:f9: mac address
20100310-165222-Wt7l : instance attribute of VM
0: index of NIC
 
 Why do you use the mac address as a key, instead of the instance string
 + nic index?  When the mac address is used as a key, each key has a list
 of values instead of just one value.  This order seems unnatural.  If it
 were the other way around (i.e. key = VM instance + nic index, value =
 mac address), then each key would have exactly one value, and I think
 this patch would be shorter and simpler.

One MAC address may be used by two VMs, e.g. during migration.
 
  Signed-off-by: Jason Wang jasow...@redhat.com
  Signed-off-by: Feng Yang fy...@redhat.com
  Signed-off-by: Amos Kong ak...@redhat.com
  ---
   0 files changed, 0 insertions(+), 0 deletions(-)
  
  diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
  index fb2d1c2..7c0946e 100644
  --- a/client/tests/kvm/kvm_utils.py
  +++ b/client/tests/kvm/kvm_utils.py
  @@ -5,6 +5,7 @@ KVM test utility functions.
   
   
   import time, string, random, socket, os, signal, re, logging, commands, 
  cPickle
  +import fcntl, shelve
   from autotest_lib.client.bin import utils
   from autotest_lib.client.common_lib import error, logging_config
   import kvm_subprocess
  @@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword):
   
   # Functions related to MAC/IP addresses
   
  +def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'):
 
 The name of this function is confusing because it does the exact
 opposite: it puts a mac address in address_pool.  Maybe the pool you're
 referring to in the name isn't address_pool, but still a less confusing
 name should probably be used.

How about allocate_mac(...)?
address_pool -> address_container

It allocates a MAC address and records it in address_container.

 
  +    """
  +    random generated mac address.
  +
  +    1) First try to generate macaddress based on the mac address prefix.
  +    2) And then try to use total random generated mac address.
  +
  +    @param root_dir: Root dir for kvm
  +    @param vm: Here we use instance of vm
  +    @param nic_index: The index of nic.
  +    @param prefix: Prefix of mac address.
  +    @Return: Return mac address.
  +    """
  +
  +    lock_filename = os.path.join(root_dir, "mac_lock")
  +    lock_file = open(lock_filename, 'w')
  +    fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
  +    mac_filename = os.path.join(root_dir, "address_pool")
 
 Maybe it makes sense to put address_pool and the lock file in /tmp,
 where they can be shared by more than a single autotest instance running
 on the same host (unlikely, but theoretically possible).

good idea.
 
  +    mac_shelve = shelve.open(mac_filename, writeback=False)
  +
  +    mac_pool = mac_shelve.get("macpool")
 
 Why is this 'macpool' needed?  Why not put the keys directly in the
 shelve object?
 
Yes, putting the keys directly in the shelve object is better.

  +    if not mac_pool:
  +        mac_pool = {}
  +    found = False
  +
  +    val = "%s:%s" % (vm, nic_index)
  +    for key in mac_pool.keys():
  +        if val in mac_pool[key]:
  +            mac_pool[key].append(val)
 
 Why append val to mac_pool[key] if val is already in mac_pool[key]?

Right, that needs to be dropped.

  +            found = True
  +            mac = key
  +
  +    while not found:
  +        postfix = "%02x:%02x" % (random.randint(0x00,0xfe),
  +                                 random.randint(0x00,0xfe))
  +        mac = prefix + postfix
  +        mac_list = mac.split(":")
  +        # Clear multicast bit
  +        mac_list[0] = int(mac_list[0],16) & 0xfe
  +        # Set local assignment bit (IEEE802)
  +        mac_list[0] = mac_list[0] | 0x02
  +        mac_list[0] = "%02x" % mac_list[0]
 
 Why is this needed?  Most 

does sidt get correct start address of IDT in guest?

2010-07-20 Thread 吴忠远
In a guest OS, a module executes the sidt instruction to get the
start address of the IDT. Does this return the correct address of the IDT in
the guest OS? Thanks.


Re: does sidt get correct start address of IDT in guest?

2010-07-20 Thread Avi Kivity

On 07/20/2010 05:04 PM, 吴忠远 wrote:

  in guest os , a module with sidt instruction was execution to get
start address of IDT.does this return the correct address of IDT in
guest OS? thanks.
   


Yes.

--
error compiling committee.c: too many arguments to function



KVM call minutes for July 20

2010-07-20 Thread Chris Wright
0.12.stable
- start w/ git tree + pull requests
- release process is separate from commit access
- justin will put up a tree for pull requests
- there's current backlog, what about that?
- anthony's concern with -stable is the testing (upstream tree gets more
  testing than -stable)
- 0.12.5?
  - planning to do next w/ 0.13 release
  - aurelien may cut a release
  - justin will do some sanity testing, most patches are in fedora anyway

0.13
- rc RSN (hopefully this week, top priority for anthony)

kvm testsuite
- was planning to clean up and contribute to qemu
- now thinking perhaps just split it out to its own repo
  - not really qemu code, not really kvm code, not cross compile, etc..
  - could use std serial device
  - could use vga (needs mmio space)
  - 
- would like to add nested svm and (more important) nested vmx
  - small bit to copy l1 to l2 state, to make guest nested
  - need framework, can then require nested patches come w/ regression tests
- current testsuite failing on qemu (shows softmmu issues, any takers?)

fw_cfg issues
- mostly on list
- concerns about dma interface (too close to use case specific hack)
- rep could be optimized in general
  - each byte == function call
- possible pull in 4k (instead of 1k) on each exit
- bar for changes should be no new interfaces


Re: Swap usage with KVM

2010-07-20 Thread Daniel Bareiro
On Sunday, 11 July 2010 19:08:58 -0300,
Daniel Bareiro wrote:

   I have an installation with Debian GNU/Linux 5.0.4 amd64 with
   qemu-kvm 0.12.3 compiled with the source code obtained from the
   official site of KVM and Linux 2.6.32.12 compiled from source code
   of kernel.org. All this is installed on an HP Proliant DL380 G6
   with two Xeon E5530 quadcore processors and 16 GiB of RAM which
   has two VMs with the following configuration of memory:

  Are you using virtio drivers in the VMs?
  
  There was an issue with KVM-72 and virtio that leaks memory in the
  host until all RAM and swap is used (inside the VMs, no swap is
  used). It was supposed to be fixed in KVM-80-something, though.
  
  Perhaps something similar is happening again?  If you switch the
  disks to scsi instead of virtio, does the problem go away?
  
  We are running KVM-72 on Debian 5.0 and have run into this issue.
  We'll be upgrading our hosts this month to fix this.

 Yes, we are using Virtio drivers for networking and storage in both
 VMs with cache=none. Both VMs are running Linux 2.6.32-bpo.5-amd64
 from Lenny Backports repositories. For VMHost, we are using a stable
 version of KVM with Linux 2.6.32.12 compiled from source code of
 kernel.org and qemu-kvm 0.12.3 compiled with the source code obtained
 from the official site of KVM.
 
 This is the syntax I'm using to boot the virtual machines:
 
 
  8587 ?Sl   6515:25 /usr/local/qemu-kvm/bin/qemu-system-x86_64 -drive
 file=/dev/vm/aps4-raiz,cache=none,if=virtio,boot=on -drive
 file=/dev/vm/aps4-cache,cache=none,if=virtio -drive 
 file=/dev/vm/aps4-index,cache=none,if=virtio
 -drive file=/dev/vm/aps4-space,cache=none,if=virtio -m 7168 -smp 4 -net
 nic,model=virtio,macaddr=00:16:3e:00:00:95 -net tap -daemonize -vnc :3 -k es 
 -localtime -monitor
 telnet:localhost:4003,server,nowait -serial 
 telnet:localhost:4043,server,nowait
 
  9769 ?Rl   11968:47 /usr/local/qemu-kvm/bin/qemu-system-x86_64 -drive
 file=/dev/vm/leela-raiz,cache=none,if=virtio,boot=on -drive
 file=/dev/vm/leela-u01,cache=none,if=virtio -drive 
 file=/dev/vm/leela-u02,cache=none,if=virtio
 -drive file=/dev/vm/leela-u03,cache=none,if=virtio -drive
 file=/dev/vm/leela-u04,cache=none,if=virtio -drive 
 file=/dev/vm/leela-u05,cache=none,if=virtio
 -drive file=/dev/vm/leela-u06,cache=none,if=virtio -drive
 file=/dev/vm/leela-u07,cache=none,if=virtio -drive 
 file=/dev/vm/leela-u08,cache=none,if=virtio
 -drive file=/dev/vm/leela-u09,cache=none,if=virtio -drive
 file=/dev/vm/leela-space,cache=none,if=virtio -m 7168 -smp 8 -net
 nic,model=virtio,macaddr=00:16:3e:00:00:96 -net tap -daemonize -vnc :4 -k es 
 -localtime -monitor
 telnet:localhost:4004,server,nowait -serial 
 telnet:localhost:4044,server,nowait

 To make the switch from Virtio to SCSI I would have to shut down the
 hosts, which would not be a good idea whereas are two productive
 systems. At least, before doing so I would be sure of what might be
 the problem.
 
 Taking a current measurement in VMHost with free, I got the following:
 
 
 ss04:~# free
              total       used       free     shared    buffers     cached
 Mem:      16461588   16406504      55084          0       2920      21504
 -/+ buffers/cache:   16382080      79508
 Swap:      2028492     983140    1045352
 
 
 It draws attention to me that thinking about initially leaving a margin
 of 2 GB of RAM for the VMHost, already it has used almost half of swap.

This is a current measurement I've taken in both the VMs and in VMHost:

* VMHost:


ss04:~# free
             total       used       free     shared    buffers     cached
Mem:      16461588   16405140      56448          0       3496      18604
-/+ buffers/cache:   16383040      78548
Swap:      5174220    2401552    2772668


* Aps4:

aps4:~# free
             total       used       free     shared    buffers     cached
Mem:       7164300    7120192      44108          0      23108     239076
-/+ buffers/cache:    6858008     306292
Swap:      2931820      14084    2917736


* Leela:

leela:~# free
             total       used       free     shared    buffers     cached
Mem:       7163836    6905224     258612          0     123380    6282816
-/+ buffers/cache:     499028    6664808
Swap:       979924      35640     944284


As you can see, I added more swap in VMHost for more margin, but
currently only 54% is free.
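
The 54% figure can be checked against the Swap line of the VMHost `free` output. A minimal Python sketch, with the three values copied from that line (the helper name `percent_free` is made up for illustration):

```python
def percent_free(total_kib, free_kib):
    """Return the share of swap that is still free, as a percentage."""
    return 100.0 * free_kib / total_kib

# Swap line of the VMHost `free` output: total, used, free (KiB).
swap_total, swap_used, swap_free = 5174220, 2401552, 2772668

# The three columns are consistent with each other.
assert swap_total - swap_used == swap_free

print("swap free: %.1f%%" % percent_free(swap_total, swap_free))
```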



Thanks in advance for your replies.

Regards,
Daniel
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9  29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598




Re: [Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm

2010-07-20 Thread Michael Goldish
On 07/20/2010 04:44 PM, Amos Kong wrote:
 On Tue, Jul 20, 2010 at 01:19:39PM +0300, Michael Goldish wrote:

 
 Michael,
 
 Thanks for your comments. Let's simplify this method together.
 
 On 07/20/2010 04:34 AM, Amos Kong wrote:
 Old method uses the mac address in the configuration files which could
 lead serious problem when multiple tests running in different hosts.

 This patch adds a new macaddress pool algorithm, it generates the mac prefix
 based on mac address of the host which could eliminate the duplicated mac
 addresses between machines.

 When user have set the mac_prefix in the configuration file, we should use 
 it
 in stead of the dynamic generated mac prefix.

 Other change:
 . Fix randomly generating mac address so that it correspond to IEEE802.
 . Update clone function to decide clone mac address or not.
 . Update get_macaddr function.
 . Add set_mac_address function.

 New auto mac address pool algorithm:
 If address_index is defined, VM will get mac from config file then record 
 mac
 in to address_pool. If address_index is not defined, VM will call
 get_mac_from_pool to auto create mac then recored mac to address_pool in
 following format:
 {'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}}

   AE:9D:94:6A:9b:f9: mac address
   20100310-165222-Wt7l : instance attribute of VM
   0: index of NIC

 Why do you use the mac address as a key, instead of the instance string
 + nic index?  When the mac address is used as a key, each key has a list
 of values instead of just one value.  This order seems unnatural.  If it
 were the other way around (i.e. key = VM instance + nic index, value =
 mac address), then each key would have exactly one value, and I think
 this patch would be shorter and simpler.
 
 One mac address may be used by two VMs, eg. migration.

Sure, that's why I thought the opposite direction would be better: keys
= VMs (nics), values = mac addresses.  That way we have one value per
key, instead of a list of values per key.

To clarify, instead of using:

{'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0',
'20100310-165222-Wt7l:1', '20100310-165222-Wt7l:2']}

I suggest:

{'20100310-165222-Wt7l:0': 'AE:9D:94:6A:9b:f9',
 '20100310-165222-Wt7l:1': 'AE:9D:94:6A:9b:f9',
 '20100310-165222-Wt7l:2': 'AE:9D:94:6A:9b:f9'}
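
A small Python sketch of the suggested layout, to make the difference concrete. The names `macs_by_nic` and `nics_by_mac` are hypothetical and not part of the patch; the point is only that with NIC keys each lookup yields exactly one value, while the list of NICs sharing a MAC (the migration case) can still be derived when it is needed:

```python
def nics_by_mac(macs_by_nic):
    """Invert {"<vm instance>:<nic index>": mac} into {mac: [nic keys]}."""
    result = {}
    for nic_key, mac in sorted(macs_by_nic.items()):
        result.setdefault(mac, []).append(nic_key)
    return result

# Suggested layout: one MAC per (VM instance, NIC index) key.
macs_by_nic = {
    "20100310-165222-Wt7l:0": "AE:9D:94:6A:9b:f9",
    "20100310-165222-Wt7l:1": "AE:9D:94:6A:9b:f9",
    "20100310-165222-Wt7l:2": "AE:9D:94:6A:9b:f9",
}

# A direct lookup returns exactly one value.
print(macs_by_nic["20100310-165222-Wt7l:0"])
# The list-per-MAC view is still available on demand.
print(nics_by_mac(macs_by_nic))
```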

 Signed-off-by: Jason Wang jasow...@redhat.com
 Signed-off-by: Feng Yang fy...@redhat.com
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  0 files changed, 0 insertions(+), 0 deletions(-)

 diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
 index fb2d1c2..7c0946e 100644
 --- a/client/tests/kvm/kvm_utils.py
 +++ b/client/tests/kvm/kvm_utils.py
 @@ -5,6 +5,7 @@ KVM test utility functions.
  
  
  import time, string, random, socket, os, signal, re, logging, commands, 
 cPickle
 +import fcntl, shelve
  from autotest_lib.client.bin import utils
  from autotest_lib.client.common_lib import error, logging_config
  import kvm_subprocess
 @@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword):
  
  # Functions related to MAC/IP addresses
  
 +def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'):

 The name of this function is confusing because it does the exact
 opposite: it puts a mac address in address_pool.  Maybe the pool you're
 referring to in the name isn't address_pool, but still a less confusing
 name should probably be used.
 
 How about allocate_mac(...) ?
 address_pool -> address_container
 
 Allocate mac address and record into address_container.

Yes, something like that, sounds less confusing.
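
For illustration, a rough sketch of what an allocate_mac along these lines might look like. This is not the patch's code: the plain dict stands in for the shelve file, the file locking is left out, and the signature is an assumption. It does keep the two first-octet adjustments from the patch (clear the multicast bit, set the locally administered bit):

```python
import random

def allocate_mac(container, vm, nic_index, prefix="00:11:22:33:"):
    """Return the MAC for vm:nic_index, generating and recording one if needed.

    `container` is a plain dict keyed by "<vm>:<nic_index>" (an assumption;
    the patch under review keeps its data in a shelve file under a lock).
    The loop never ends if every address under `prefix` is already taken.
    """
    key = "%s:%s" % (vm, nic_index)
    if key in container:
        return container[key]
    while True:
        suffix = "%02x:%02x" % (random.randint(0x00, 0xfe),
                                random.randint(0x00, 0xfe))
        octets = (prefix + suffix).split(":")
        first = int(octets[0], 16) & 0xfe   # clear the multicast bit
        first |= 0x02                       # set the locally administered bit
        octets[0] = "%02x" % first
        mac = ":".join(octets)
        if mac not in container.values():   # do not hand out a duplicate
            container[key] = mac
            return mac

pool = {}
mac = allocate_mac(pool, "20100310-165222-Wt7l", 0)
# Asking again for the same NIC returns the recorded address.
assert allocate_mac(pool, "20100310-165222-Wt7l", 0) == mac
```

A MAC that is deliberately shared between two VMs, as in the migration case, would have to be recorded explicitly rather than generated this way.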

 +    """
 +    random generated mac address.
 +
 +    1) First try to generate macaddress based on the mac address prefix.
 +    2) And then try to use total random generated mac address.
 +
 +    @param root_dir: Root dir for kvm
 +    @param vm: Here we use instance of vm
 +    @param nic_index: The index of nic.
 +    @param prefix: Prefix of mac address.
 +    @Return: Return mac address.
 +    """
 +
 +    lock_filename = os.path.join(root_dir, "mac_lock")
 +    lock_file = open(lock_filename, 'w')
 +    fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
 +    mac_filename = os.path.join(root_dir, "address_pool")

 Maybe it makes sense to put address_pool and the lock file in /tmp,
 where they can be shared by more than a single autotest instance running
 on the same host (unlikely, but theoretically possible).
 
 good idea.
  
 +    mac_shelve = shelve.open(mac_filename, writeback=False)
 +
 +    mac_pool = mac_shelve.get("macpool")

 Why is this 'macpool' needed?  Why not put the keys directly in the
 shelve object?
  
 yes, put keys directly in the shelve object is better.
 
 +    if not mac_pool:
 +        mac_pool = {}
 +    found = False
 +
 +    val = "%s:%s" % (vm, nic_index)
 +    for key in mac_pool.keys():
 +        if val in mac_pool[key]:
 +            mac_pool[key].append(val)

 Why append val to mac_pool[key] if val is already in mac_pool[key]?
 
 need drop it.
 
 +            found = True
 + 

Re: [Qemu-devel] KVM call minutes for July 20

2010-07-20 Thread Aurelien Jarno
It's a pity I can't easily attend this conference call, as it seems
a lot of decisions are taken there. Anyway, let me comment on the part
concerning 0.12 stable:

On Tue, Jul 20, 2010 at 07:45:51AM -0700, Chris Wright wrote:
 0.12.stable
 - start w/ git tree + pull requests
 - release process is separate from commit access
 - justin will put up a tree for pull requests
 - there's current backlog, what about that?

I think someone should actively follow the patches committed to HEAD and
backport them when they seem to be stable material. I guess that's what
Justin plans to do.

OTOH, it might be useful if people sending patches to HEAD added a small
comment about cherry-picking the patch to stable if it applies.

 - anthony's concern with -stable is the testing (upstream tree gets more
   testing than -stable)

Debian gets regular uploads with the contents of the -stable tree
between two releases. Also patches from trunk are all cherry-picked from
HEAD.

 - 0.12.5?
   - planning to do next w/ 0.13 release
   - aurelien may cut a release

Following the minutes from last week, I sent a call for release, with a
deadline today. I only got the patch series from Kevin. There are
currently 44 patches waiting in the stable tree, so I guess we can go
for a release. I plan to do that later this week if nobody opposes.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


Re: KVM call minutes for July 20

2010-07-20 Thread David S. Ahern


On 07/20/10 08:45, Chris Wright wrote:
 0.13
 - rc RSN (hopefully this week, top priority for anthony)

Can Cam's inter-vm shared memory device get committed for 0.13? It's
been stagnant on the list for a while now waiting for inclusion (or NAK
comments).

David


Re: [Qemu-devel] KVM call minutes for July 20

2010-07-20 Thread Anthony Liguori

On 07/20/2010 11:29 AM, Aurelien Jarno wrote:

It's a pitty I can't easily attend to this conference call, as it seems
a lot of decisions are taken there. Anyway let me comment the part
concerning 0.12 stable:
   


Is it a matter of time zone or conflict?  The call has historically been 
centered around KVM issues but these days it's hard to make such a clear 
distinction..



On Tue, Jul 20, 2010 at 07:45:51AM -0700, Chris Wright wrote:
   

0.12.stable
- start w/ git tree + pull requests
- release process is separate from commit access
- justin will put up a tree for pull requests
- there's current backlog, what about that?
 

I think someone should actively follow the patches committed to HEAD and
backport them when they seems to be stable material. I guess it's what's
Justin plans to do.

OTOH, it might be useful if people sending patches to HEAD adds a small
comment about cherry-picking the patch to stable if it applies.
   


My big concern with -stable is testing.  For folks interested in helping 
out, what I'd really like to see is people explicitly testing their 
patches on -stable.  IOW, just saying "this is probably stable material"
is not nearly as helpful as saying, "I've verified this cherry picks
cleanly to stable and tested there."



- anthony's concern with -stable is the testing (upstream tree gets more
   testing than -stable)
 

Debian gets regular uploads with the contents of the -stable tree
between to releases. Also patches from trunk are all cherry-picked from
HEAD.
   


That's good to know.  My main point was that proportionately speaking, 
the master branch gets considerably more testing than the stable 
branch.  Considering that there is a higher expectation of stable too, 
the testing requirement for it is pretty high in my opinion.


Regards,

Anthony Liguori


- 0.12.5?
   - planning to do next w/ 0.13 release
   - aurelien may cut a release
 

Following the minutes from last week, I sent a call for release, with a
deadline today. I only got the patch series from Kevin. There are
currently 44 patches waiting in the stable tree, so I guess we can go
for a release. I plan to do that later this week if nobody opposes.

   




Re: Swap usage with KVM

2010-07-20 Thread David Weber

 Yes, we are using Virtio drivers for networking and storage in both VMs
 with cache=none. Both VMs are running Linux 2.6.32-bpo.5-amd64 from
 Lenny Backports repositories. For VMHost, we are using a stable version
 of KVM with Linux 2.6.32.12 compiled from source code of kernel.org and
 qemu-kvm 0.12.3 compiled with the source code obtained from the official
 site of KVM.
 

AFAIK this should be this bug:
http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2989366&group_id=180599

Try upgrading to 0.12.4 or backporting this commit:
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=012d4869c1eb195e83f159ed7b2bced33f37f960

David
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-20 Thread Greg KH
On Sat, Jul 17, 2010 at 10:45:23AM +0200, Piotr Jaroszyński wrote:
 On 16 July 2010 23:58, Tom Lyon p...@cisco.com wrote:
  The VFIO driver is used to allow privileged AND non-privileged processes to
  implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe
  devices.
 
 Thanks for working on that! I wonder whether it's possible to say what
 are the chances of it being merged to mainline and which version we
 might be talking about?

We still have a long way to go before you need to worry about what
kernel version it's going to show up in...

thanks,

greg k-h


Re: [PATCH 04/18] Make cpu_tsc_khz updates use local CPU

2010-07-20 Thread Zachary Amsden

On 07/19/2010 10:53 PM, Avi Kivity wrote:

On 07/19/2010 11:06 PM, Zachary Amsden wrote:

+static void tsc_khz_changed(void *data)
  {
-/* nothing */
+struct cpufreq_freqs *freq = data;
+unsigned long khz = 0;
+
+if (data)
+khz = freq->new;
+else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
+khz = cpufreq_quick_get(raw_smp_processor_id());
+if (!khz)
+khz = tsc_khz;
+__get_cpu_var(cpu_tsc_khz) = khz;
  }


Do we really need to cache cpufreq_quick_get()?  If it's really 
quick, why not just use it everywhere instead of cacheing it?  Not a 
comment on this patch.





If cpufreq is compiled in, but disabled, it returns zero, so we need 
some sort of logic.


Maybe it's better to put it into cpufreq_quick_get().  Inconsistent 
APIs that appear to work are bad.




I don't think it's quite so simple; cpufreq is platform independent and 
tsc_khz is a platform specific export.  It seems cpufreq is designed to 
return zero when disabled and we're the unusual ones for wanting to use it.
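
The fallback order under discussion can be modelled in a few lines. This is a Python sketch of the logic only, not kernel code; the function and parameter names are made up for illustration, and a zero from the cpufreq query stands for "cpufreq compiled in but disabled":

```python
def pick_tsc_khz(notified_khz, constant_tsc, cpufreq_khz, boot_tsc_khz):
    """Mirror the selection order in the quoted tsc_khz_changed() hunk.

    notified_khz: frequency from a cpufreq notification, or None if absent.
    constant_tsc: True if the CPU advertises a constant-rate TSC.
    cpufreq_khz:  result of the quick cpufreq query (0 when unavailable).
    boot_tsc_khz: the platform's calibrated tsc_khz value.
    """
    khz = 0
    if notified_khz is not None:
        khz = notified_khz
    elif not constant_tsc:
        khz = cpufreq_khz
    if not khz:
        # Covers the "cpufreq returns zero" case raised above.
        khz = boot_tsc_khz
    return khz

# cpufreq disabled (returns 0): fall back to the calibrated value.
print(pick_tsc_khz(None, False, 0, 2400000))
```

Moving the zero check into the query itself, as suggested, would remove the last branch from every caller, at the cost of making a platform-independent API depend on a platform-specific value.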


Zach


[PATCH] device-assignment: Use PCI I/O port sysfs resource file when available

2010-07-20 Thread Alex Williamson
When supported by the host kernel, we can use read/write on the
PCI sysfs resource file for I/O port regions.  This allows us to
avoid raw in/out commands and works with deprivileged guests via
libvirt.  For uid 0 callers, we use in/out directly to avoid any
compatibility issues.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Required kernel patch pending here:
 http://www.spinics.net/lists/linux-pci/msg09389.html
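
 The access pattern described in the commit message can be sketched in
 Python 3 for illustration only (the patch itself is C, and every name below
 is hypothetical). An ordinary temporary file stands in for the sysfs
 resource file, and the fallback callback stands in for the raw in/out path:

```python
import os
import tempfile

def ioport_rw(resource_fd, base, addr, data=None, length=1):
    """Read or write at offset (addr - base) through the resource file.

    Returns the bytes read (b"" after a successful write), or None when no
    resource file is available or the transfer was short, so that the
    caller can fall back to another access method.
    """
    if resource_fd == -1:
        return None
    offset = addr - base
    if data is not None:
        if os.pwrite(resource_fd, data, offset) != len(data):
            return None
        return b""
    buf = os.pread(resource_fd, length, offset)
    return buf if len(buf) == length else None

def read_port(resource_fd, base, addr, length, fallback):
    """Prefer the resource file; use fallback(addr, length) otherwise."""
    value = ioport_rw(resource_fd, base, addr, length=length)
    return value if value is not None else fallback(addr, length)

# Stand-in for the sysfs resource file: an 8-byte scratch file.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 8)
base = 0xc000  # hypothetical guest-visible base of the region
ioport_rw(fd, base, base + 2, data=b"\xab")
via_file = read_port(fd, base, base + 2, 1, lambda a, n: b"\x00" * n)
via_fallback = read_port(-1, base, base + 2, 1, lambda a, n: b"\xff" * n)
os.close(fd)
os.remove(path)
```

 Using positioned reads and writes keeps each access independent of the
 file offset, which matches the per-access offset computed from the region
 base in the patch.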

 hw/device-assignment.c |  131 
 hw/device-assignment.h |1 
 2 files changed, 99 insertions(+), 33 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 2bba22f..37c1278 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -67,6 +67,28 @@ static uint32_t guest_to_host_ioport(AssignedDevRegion *region, uint32_t addr)
 return region->u.r_baseport + (addr - region->e_physbase);
 }
 
+static int assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
+  uint32_t addr, int len, uint32_t *val,
+  int write)
+{
+if (dev_region->region->resource_fd == -1)
+return -1;
+
+if (write) {
+if (pwrite(dev_region->region->resource_fd, val, len,
+  (addr - dev_region->e_physbase)) != len) {
+return -1;
+}
+} else {
+if (pread(dev_region->region->resource_fd, val, len,
+  (addr - dev_region->e_physbase)) != len) {
+return -1;
+}
+}
+
+return 0;
+}
+
 static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
uint32_t value)
 {
@@ -77,7 +99,9 @@ static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
  r_pio, (int)r_access->e_physbase,
  (unsigned long)r_access->u.r_baseport, value);
 
-outb(value, r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 1, &value, 1) != 0) {
+outb(value, r_pio);
+}
 }
 
 static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
@@ -90,7 +114,9 @@ static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
   r_pio, (int)r_access->e_physbase,
  (unsigned long)r_access->u.r_baseport, value);
 
-outw(value, r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 2, &value, 1) != 0) {
+outw(value, r_pio);
+}
 }
 
 static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
@@ -103,7 +129,9 @@ static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
  r_pio, (int)r_access->e_physbase,
   (unsigned long)r_access->u.r_baseport, value);
 
-outl(value, r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 4, &value, 1) != 0) {
+outl(value, r_pio);
+}
 }
 
 static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
@@ -112,7 +140,9 @@ static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
 uint32_t r_pio = guest_to_host_ioport(r_access, addr);
 uint32_t value;
 
-value = inb(r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 1, &value, 0) != 0) {
+value = inb(r_pio);
+}
 
 DEBUG("r_pio=%08x e_physbase=%08x r_=%08lx value=%08x\n",
   r_pio, (int)r_access->e_physbase,
@@ -127,7 +157,9 @@ static uint32_t assigned_dev_ioport_readw(void *opaque, uint32_t addr)
 uint32_t r_pio = guest_to_host_ioport(r_access, addr);
 uint32_t value;
 
-value = inw(r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 2, &value, 0) != 0) {
+value = inw(r_pio);
+}
 
 DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
   r_pio, (int)r_access->e_physbase,
@@ -142,7 +174,9 @@ static uint32_t assigned_dev_ioport_readl(void *opaque, uint32_t addr)
 uint32_t r_pio = guest_to_host_ioport(r_access, addr);
 uint32_t value;
 
-value = inl(r_pio);
+if (assigned_dev_ioport_rw(r_access, addr, 4, &value, 0) != 0) {
+value = inl(r_pio);
+}
 
 DEBUG(r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n,
   r_pio, (int)r_access-e_physbase,
@@ -305,7 +339,7 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int 
region_num,
 DEBUG("e_phys=0x%" FMT_PCIBUS " r_baseport=%x type=0x%x len=%" FMT_PCIBUS 
 " region_num=%d \n",
   addr, region->u.r_baseport, type, size, region_num);
 
-if (first_map) {
+if (first_map && region->region->resource_fd < 0) {
struct ioperm_data *data;
 
data = qemu_mallocz(sizeof(struct ioperm_data));
@@ -586,19 +620,46 @@ static int assigned_dev_register_regions(PCIRegion 
*io_regions,
  slow_map ? assigned_dev_iomem_map_slow
   : assigned_dev_iomem_map);
 continue;
+} else {
+/* handle port io regions */
+uint32_t val;
+int ret;
+
+/* Test kernel support for ioport resource read/write.  Old
+ * kernels return EIO.  

Re: [PATCH] device-assignment: Use PCI I/O port sysfs resource file when available

2010-07-20 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
> When supported by the host kernel, we can use read/write on the
> PCI sysfs resource file for I/O port regions.  This allows us to
> avoid raw in/out commands and works with deprivileged guests via
> libvirt.  For uid 0 callers, we use in/out directly to avoid any
> compatibility issues.

Won't the uid 0 test fail if libvirt launches qemu with the user set to
root (capabilities still get dropped)?
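
To make the question concrete, here is a rough sketch (not code from the patch) of a helper that tries the access through an open resource file descriptor and reports failure, so the caller decides on a fallback from the result of the attempt rather than from the uid. The name resource_fd_rw and its exact return convention are made up for illustration; only pread() and pwrite() are real calls, and the little-endian byte order is an assumption that matches x86 port I/O.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: transfer `len` bytes (1, 2 or 4) at `offset` through
 * an already-open file descriptor, e.g. a PCI sysfs resourceN file.
 * Returns 0 on success or a negative errno value, so the caller can fall
 * back to another access method when the kernel or permissions say no.
 */
static int resource_fd_rw(int fd, uint32_t offset, int len,
                          uint32_t *val, int is_write)
{
    uint8_t buf[4] = { 0 };
    ssize_t n;
    int i;

    if (fd < 0)
        return -EBADF;
    if (len != 1 && len != 2 && len != 4)
        return -EINVAL;

    if (is_write) {
        /* Serialize the value least-significant byte first. */
        for (i = 0; i < len; i++)
            buf[i] = (uint8_t)(*val >> (8 * i));
        n = pwrite(fd, buf, (size_t)len, (off_t)offset);
    } else {
        n = pread(fd, buf, (size_t)len, (off_t)offset);
        if (n == len) {
            *val = 0;
            for (i = 0; i < len; i++)
                *val |= (uint32_t)buf[i] << (8 * i);
        }
    }

    if (n < 0)
        return -errno;
    return n == len ? 0 : -EIO;
}
```

A caller would use it the way the hunks above use assigned_dev_ioport_rw(): a nonzero return means "use the other path", whatever the reason for the failure was.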

thanks,
-chris


Re: [PATCH v2 1/6] KVM: MMU: fix forgot reserved bits check in speculative path

2010-07-20 Thread Xiao Guangrong


Xiao Guangrong wrote:
> In the speculative path, we should check guest pte's reserved bits just as
> the real processor does
>

Ping..?


Re: [PATCH v2 3/3] KVM: Non-atomic interrupt injection

2010-07-20 Thread Marcelo Tosatti
On Tue, Jul 20, 2010 at 04:17:07PM +0300, Avi Kivity wrote:
> Change the interrupt injection code to work from preemptible, interrupts
> enabled context.  This works by adding a ->cancel_injection() operation
> that undoes an injection in case we were not able to actually enter the guest
> (this condition could never happen with atomic injection).
>
> Signed-off-by: Avi Kivity a...@redhat.com
> ---
>  arch/x86/include/asm/kvm_host.h |1 +
>  arch/x86/kvm/svm.c  |   12 
>  arch/x86/kvm/vmx.c  |   11 +++
>  arch/x86/kvm/x86.c  |   27 ++-
>  4 files changed, 38 insertions(+), 13 deletions(-)
>

> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	if (unlikely(r))
>  		goto out;
>  
> +	inject_pending_event(vcpu);
> +
> +	/* enable NMI/IRQ window open exits if needed */
> +	if (vcpu->arch.nmi_pending)
> +		kvm_x86_ops->enable_nmi_window(vcpu);
> +	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> +		kvm_x86_ops->enable_irq_window(vcpu);
> +
> +	if (kvm_lapic_enabled(vcpu)) {
> +		update_cr8_intercept(vcpu);
> +		kvm_lapic_sync_to_vapic(vcpu);
> +	}
> +
>  	preempt_disable();
>  
>  	kvm_x86_ops->prepare_guest_switch(vcpu);
> @@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		smp_wmb();
>  		local_irq_enable();
>  		preempt_enable();
> +		kvm_x86_ops->cancel_injection(vcpu);
>  		r = 1;
>  		goto out;
>  	}
>  
> -	inject_pending_event(vcpu);
> -
> -	/* enable NMI/IRQ window open exits if needed */
> -	if (vcpu->arch.nmi_pending)
> -		kvm_x86_ops->enable_nmi_window(vcpu);
> -	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> -		kvm_x86_ops->enable_irq_window(vcpu);
> -
> -	if (kvm_lapic_enabled(vcpu)) {
> -		update_cr8_intercept(vcpu);
> -		kvm_lapic_sync_to_vapic(vcpu);
> -	}
> -
>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>  
>  	kvm_guest_enter();

This breaks

int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
{
        struct kvm_lapic *apic = vcpu->arch.apic;
        int highest_irr;

        /* This may race with setting of irr in __apic_accept_irq() and
         * value returned may be wrong, but kvm_vcpu_kick() in
         * __apic_accept_irq
         * will cause vmexit immediately and the value will be
         * recalculated
         * on the next vmentry.
         */

(also valid for nmi_pending and PIC). Can't simply move
atomic_set(guest_mode, 1) into the preemptible section, as that would make
it possible for kvm_vcpu_kick to IPI a stale vcpu->cpu.

Also should undo vmx.rmode.* ?




Re: [BUG?] vhost assert error with 4GB of RAM

2010-07-20 Thread Michael S. Tsirkin
On Tue, Jul 20, 2010 at 02:42:19PM -0600, Cam Macdonell wrote:
> I think I've found a bug when running a guest with vhost with less
> than 4GB of RAM.
>
> If a guest has less than 4GB of RAM, then above_4g_mem_size is 0 for
> this call to cpu_register_physical_memory() in pc_memory_init() from
> hw/pc.c:922
>
> #if TARGET_PHYS_ADDR_BITS > 32
>     cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
>                                  ram_addr + below_4g_mem_size);
> #endif

Yes, the fix is in qemu already, it's a matter of merging into qemu-kvm.

> this leads to vhost_client_set_memory being called with size == 0
>
> #3  0x004301f3 in vhost_client_set_memory (client=0x113b010,
> start_addr=4294967296, size=0, phys_offset=3221225472)
> at /home/cam/research/KVM/qemu-kvm/hw/vhost.c:312
>
> which trips the assert at hw/vhost.c:312
>
> static void vhost_client_set_memory(CPUPhysMemoryClient *client,
>                                     target_phys_addr_t start_addr,
>                                     ram_addr_t size,
>                                     ram_addr_t phys_offset)
> {
>
> ...snip...
>
>     assert(size);
> ...
>
> something like the following fixes the problem but I'm not sure if
> it's the proper way to handle it.
>
> diff --git a/exec.c b/exec.c
> index 5e9a5b7..991abfc 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2592,7 +2592,9 @@ void
> cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
>      ram_addr_t orig_size = size;
>      subpage_t *subpage;
>
> -    cpu_notify_set_memory(start_addr, size, phys_offset);
> +    if (size > 0) {
> +        cpu_notify_set_memory(start_addr, size, phys_offset);
> +    }
>
>      if (phys_offset == IO_MEM_UNASSIGNED) {
>          region_offset = start_addr;
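
For readers following along, the failure mode and the guard in the quoted diff can be reproduced in a few lines outside of qemu. This is only a sketch under stated assumptions: toy_client_set_memory, toy_register_memory and toy_above_4g_size are made-up stand-ins rather than qemu functions, and the below-4G limit is passed in as a parameter because the value pc.c uses is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_ram_addr_t;   /* stand-in for qemu's ram_addr_t */

static int toy_notify_calls;       /* counts notifications that got through */

/* Stand-in for a memory client that, like vhost, asserts on empty regions. */
static void toy_client_set_memory(uint64_t start_addr, toy_ram_addr_t size,
                                  toy_ram_addr_t phys_offset)
{
    (void)start_addr;
    (void)phys_offset;
    assert(size);
    toy_notify_calls++;
}

/* Registration path with the proposed guard: skip zero-sized regions. */
static void toy_register_memory(uint64_t start_addr, toy_ram_addr_t size,
                                toy_ram_addr_t phys_offset)
{
    if (size > 0)
        toy_client_set_memory(start_addr, size, phys_offset);
}

/* RAM that does not fit below the limit is mapped above 4GB; else 0. */
static toy_ram_addr_t toy_above_4g_size(toy_ram_addr_t ram_size,
                                        toy_ram_addr_t below_4g_limit)
{
    return ram_size > below_4g_limit ? ram_size - below_4g_limit : 0;
}
```

With a 3GB guest, toy_above_4g_size() yields 0, and the guarded registration never reaches the asserting client; without the guard the assert fires, as in the backtrace above.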


Re: [PATCH v2 3/3] KVM: Non-atomic interrupt injection

2010-07-20 Thread Avi Kivity

On 07/21/2010 03:55 AM, Marcelo Tosatti wrote:



>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  	if (unlikely(r))
>>  		goto out;
>>
>> +	inject_pending_event(vcpu);
>> +
>> +	/* enable NMI/IRQ window open exits if needed */
>> +	if (vcpu->arch.nmi_pending)
>> +		kvm_x86_ops->enable_nmi_window(vcpu);
>> +	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>> +		kvm_x86_ops->enable_irq_window(vcpu);
>> +
>> +	if (kvm_lapic_enabled(vcpu)) {
>> +		update_cr8_intercept(vcpu);
>> +		kvm_lapic_sync_to_vapic(vcpu);
>> +	}
>> +
>>  	preempt_disable();
>>
>>  	kvm_x86_ops->prepare_guest_switch(vcpu);
>> @@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  		smp_wmb();
>>  		local_irq_enable();
>>  		preempt_enable();
>> +		kvm_x86_ops->cancel_injection(vcpu);
>>  		r = 1;
>>  		goto out;
>>  	}
>>
>> -	inject_pending_event(vcpu);
>> -
>> -	/* enable NMI/IRQ window open exits if needed */
>> -	if (vcpu->arch.nmi_pending)
>> -		kvm_x86_ops->enable_nmi_window(vcpu);
>> -	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>> -		kvm_x86_ops->enable_irq_window(vcpu);
>> -
>> -	if (kvm_lapic_enabled(vcpu)) {
>> -		update_cr8_intercept(vcpu);
>> -		kvm_lapic_sync_to_vapic(vcpu);
>> -	}
>> -
>>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>>
>>  	kvm_guest_enter();
>
> This breaks
>
> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
> {
>         struct kvm_lapic *apic = vcpu->arch.apic;
>         int highest_irr;
>
>         /* This may race with setting of irr in __apic_accept_irq() and
>          * value returned may be wrong, but kvm_vcpu_kick() in
>          * __apic_accept_irq
>          * will cause vmexit immediately and the value will be
>          * recalculated
>          * on the next vmentry.
>          */
>
> (also valid for nmi_pending and PIC). Can't simply move
> atomic_set(guest_mode, 1) in preemptible section as that would make it
> possible for kvm_vcpu_kick to IPI stale vcpu->cpu.
   


Right.  Can fix by adding a kvm_make_request() to force the retry loop.
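
As a rough illustration of that idea (a single-threaded toy model, not kernel code): the sender raises a request flag together with the interrupt, and the entry path re-checks the flag after injection, cancelling and retrying if it is set. Everything named toy_* is hypothetical; the real code deals with atomics, memory barriers and IPIs that this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the kernel's. */
struct toy_vcpu {
    int  pending_irq;    /* highest pending vector, -1 if none */
    int  injected_irq;   /* vector queued for the next entry, -1 if none */
    bool request_event;  /* raised by senders to force another pass */
    int  retries;        /* how many times an injection was cancelled */
};

/* Sender side: record the vector, then raise the request flag. */
static void toy_accept_irq(struct toy_vcpu *v, int vec)
{
    if (vec > v->pending_irq)
        v->pending_irq = vec;
    v->request_event = true;
}

/* Example of an interrupt that arrives after injection was decided. */
static void toy_late_irq(struct toy_vcpu *v)
{
    toy_accept_irq(v, 0x80);
}

/*
 * Returns the vector delivered on guest entry (-1 if none).  `window`, if
 * non-NULL, runs once between injection and the final check, standing in
 * for whatever may happen while the injection code is preemptible.
 */
static int toy_enter_guest(struct toy_vcpu *v,
                           void (*window)(struct toy_vcpu *))
{
    for (;;) {
        int vec;

        v->request_event = false;
        v->injected_irq = v->pending_irq;   /* "preemptible" injection */

        if (window) {                       /* something races with us */
            window(v);
            window = NULL;
        }

        if (v->request_event) {             /* final check before entry */
            v->injected_irq = -1;           /* cancel the stale injection */
            v->retries++;
            continue;                       /* go around the loop again */
        }

        vec = v->injected_irq;
        v->injected_irq = -1;
        v->pending_irq = -1;
        return vec;
    }
}
```

Calling toy_enter_guest(&v, toy_late_irq) with vector 0x30 pending ends up delivering 0x80 after one retry, which is the behaviour the comment in kvm_lapic_find_highest_irr() relies on.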


> Also should undo vmx.rmode.* ?
   


Elaborate?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PPC64/Power7 - 2.6.35-rc5] Bad relocation warnings whileBuilding a CONFIG_RELOCATABLE kernel with CONFIG_ISERIES enabled

2010-07-20 Thread Alexander Graf

On 20.07.2010, at 09:27, Milton Miller wrote:

> On Mon, 19 Jul 2010 about 14:00:56 +0200, Alexander Graf wrote:
> Milton Miller wrote:
> I wrote:
>
> Oh yea, and for book-3s, the code copies from 0x100 to __end_interrupts
> in arch/powerpc/kernel/exceptions-64s.S down to the real 0, but the rest
> of the kernel is at some disjointed address.  The interrupt will go to
> the copy at the real zero.  Any references to code outside that region
> must be done via a full indirect branch (not a relative one), similar to
> the secondary startup (via following the function pointer in a descriptor
> set in very low memory), or syscall entry and exception vectors via paca.
>
>
> That would still break on normal PPC boxes, as any address accessed in
> real mode has to be inside the RMA. And the #include for
> kvm/book3s_rmhandlers.S happens after __end_interrupts. So I'd end up
> with code that gets executed outside of the RMA after a relocation, right?
>
> Alex
>
>
> Whether it's outside of the RMA or not, DO_KVM is creating a branch outside
> of code copied to lowmem.
>
> This is BROKEN.
>
> We have a hard limit that we can't extend __end_interrupts past 0x7000, and
> a soft limit that we can't exceed 0x6000.  If there is space, we could
> move the real mode handler extensions inside __end_interrupts in
> exceptions-64s.S, and store the full address in a .quad so it gets
> relocated properly.  Don't subtract the start, we have designed the kernel
> to run with start at a VA that can be used as an EA in real mode.

Moving everything to exceptions-64s.S sounds like the best thing to do. All the 
code in real mode really is there so it stays inside the RMA. I don't think we 
can guarantee that for any code that is not copied, right?

> Otherwise we need to mark KVM_BOOK3S_64 depends on (!RELOCATABLE ||
> BROKEN) for 2.6.35 until we get fixes.

Well - it's only broken when really getting relocated. But I agree, the current 
state doesn't cope with Linux's relocation logic.

> I took a read through the book3s code as of 2.6.34.   A few things I noticed:
>
> (1) The code is using slb large to control the segment size.   It should
> be using SLB B field (or just implement 256M segments only).

I'm not sure I understand this part? We only use 256MB segments for now.

> (2) It appears that the mtspr and mfspr code is using the same storage for
> bats 4-7 as 0-3 ... I would have expected a 4 + a few places.

Yes, that one is fixed in more recent versions already.

> (3) It's not clear to me that you clear RI when transitioning to the guest
> but it's obviously required because you place state in srr0 & srr1.

Uh - do I have to clear RI? I'm not prepared to take an interrupt anyways and 
RI is just a soft flag for Linux's handlers, right?

> (4) I don't understand why __kvmppc_vcpu_run turns on interrupts so that
> __kvmppc_vcpu_entry can turn them back off.   Something to do with
> irq trace annotations?

__kvmppc_vcpu_run turns on soft interrupts while __kvmppc_vcpu_entry turns them 
off in MSR. This is so that when enabling interrupts again on guest exit, we 
have the soft enable bit set.
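
A toy model of the two flags may make this easier to follow. It is deliberately simplified and all names are hypothetical: one boolean stands for the software enable flag, the other for the hardware enable bit in the MSR, and an interrupt is treated as deliverable to the handlers only when both are set.

```c
#include <stdbool.h>

/* Toy model only; not the kernel's data structures. */
struct toy_cpu {
    bool soft_enabled;   /* software interrupt-enable flag */
    bool msr_ee;         /* hardware external-interrupt enable bit */
};

static bool toy_irqs_deliverable(const struct toy_cpu *c)
{
    return c->soft_enabled && c->msr_ee;
}

/* Corresponds to the step described for __kvmppc_vcpu_run. */
static void toy_vcpu_run(struct toy_cpu *c)
{
    c->soft_enabled = true;
}

/* Corresponds to the step described for __kvmppc_vcpu_entry. */
static void toy_vcpu_entry(struct toy_cpu *c)
{
    c->msr_ee = false;
}

/* On guest exit only the hardware bit is turned back on. */
static void toy_guest_exit(struct toy_cpu *c)
{
    c->msr_ee = true;
}
```

If toy_vcpu_run() is skipped, toy_guest_exit() leaves the model with the hardware bit set but the software flag still clear, so interrupts stay undeliverable; with it, they are deliverable as soon as the hardware bit comes back.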


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html