[COMMIT master] make-release: misc fixes

2010-06-06 Thread Avi Kivity
From: Michael S. Tsirkin m...@redhat.com

This fixes /tmp usage in make-release script for security.
Also, create output directory if it does not exist.
This also adds a 'tarball' optin to specify output file name.
Finally, remote output file before gzip to avoid prompt
'do you want to overwrite'.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/scripts/make-release b/kvm/scripts/make-release
index 11d9c27..fdc402b 100755
--- a/kvm/scripts/make-release
+++ b/kvm/scripts/make-release
@@ -1,7 +1,7 @@
 #!/bin/bash -e
 
 usage() {
-echo usage: $0 [--upload] [--formal] commit [name]
+echo usage: $0 [--upload] [--formal] commit [name] [tarball]
 exit 1
 }
 
@@ -12,7 +12,7 @@ formal=
 
 releasedir=~/sf-release
 [[ -z $TMP ]]  TMP=/tmp
-tmpdir=$TMP/qemu-kvm-make-release.$$
+tmpdir=`mktemp -d --tmpdir=$TMP qemu-kvm-make-release.XX`
 while [[ $1 = -* ]]; do
 opt=$1
 shift
@@ -40,9 +40,15 @@ if [[ -z $name ]]; then
 name=$commit
 fi
 
-tarball=$releasedir/$name.tar
+tarball=$3
+if [[ -z $tarball ]]; then
+tarball=$releasedir/$name.tar.gz
+fi
+#strip trailing .gz if any
+tarball=${tarball/%.gz/}
 
 cd $(dirname $0)/../..
+mkdir -p $(dirname $tarball)
 git archive --prefix=$name/ --format=tar $commit  $tarball
 
 mkdir -p $tmpdir
@@ -59,6 +65,7 @@ if [[ -n $formal ]]; then
 rm -rf $tmpdir
 fi
 
+rm -f $tarball.gz
 gzip -9 $tarball
 tarball=$tarball.gz
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] Make get_para_features() similar to upstream

2010-06-06 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Accept a CPUState parameter instead of a kvm_context_t.

Signed-off-by: Avi Kivity a...@redhat.com
Acked-by: Glauber Costa glom...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 95b7aa5..0eb4060 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1132,7 +1132,7 @@ struct kvm_para_features {
{ -1, -1 }
 };
 
-static int get_para_features(kvm_context_t kvm_context)
+static int get_para_features(CPUState *env)
 {
int i, features = 0;
 
@@ -1184,7 +1184,7 @@ int kvm_arch_init_vcpu(CPUState *cenv)
 pv_ent = cpuid_ent[cpuid_nent++];
 memset(pv_ent, 0, sizeof(*pv_ent));
 pv_ent-function = KVM_CPUID_FEATURES;
-pv_ent-eax = get_para_features(kvm_context);
+pv_ent-eax = get_para_features(cenv);
 #endif
 
 kvm_trim_features(cenv-cpuid_features,
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] Use get_para_features() from upstream

2010-06-06 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com
Acked-by: Glauber Costa glom...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 0eb4060..3b9be6d 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1116,34 +1116,6 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *e, 
uint32_t function,
 e-edx = env-regs[R_EDX];
 }
 
-struct kvm_para_features {
-   int cap;
-   int feature;
-} para_features[] = {
-#ifdef KVM_CAP_CLOCKSOURCE
-   { KVM_CAP_CLOCKSOURCE, KVM_FEATURE_CLOCKSOURCE },
-#endif
-#ifdef KVM_CAP_NOP_IO_DELAY
-   { KVM_CAP_NOP_IO_DELAY, KVM_FEATURE_NOP_IO_DELAY },
-#endif
-#ifdef KVM_CAP_PV_MMU
-   { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
-#endif
-   { -1, -1 }
-};
-
-static int get_para_features(CPUState *env)
-{
-   int i, features = 0;
-
-   for (i = 0; i  ARRAY_SIZE(para_features)-1; i++) {
-   if (kvm_check_extension(kvm_state, para_features[i].cap))
-   features |= (1  para_features[i].feature);
-   }
-
-   return features;
-}
-
 static void kvm_trim_features(uint32_t *features, uint32_t supported)
 {
 int i;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 87c1133..ac9e17b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -133,8 +133,6 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
uint32_t function, int reg)
 
 #endif
 
-#ifdef KVM_UPSTREAM
-
 #ifdef CONFIG_KVM_PARA
 struct kvm_para_features {
 int cap;
@@ -165,6 +163,8 @@ static int get_para_features(CPUState *env)
 }
 #endif
 
+#ifdef KVM_UPSTREAM
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
 struct {
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] msix: fix msix_set/unset_mask_notifier

2010-06-06 Thread Avi Kivity
From: Michael S. Tsirkin m...@redhat.com

Sridhar Samudrala reported hitting the following assertions
in msix.c when doing a guest reboot or live migration using vhost.
qemu-kvm/hw/msix.c:375: msix_mask_all: Assertion `r = 0' failed.
qemu-kvm/hw/msix.c:640: msix_unset_mask_notifier:
Assertion `dev-msix_mask_notifier_opaque[vector]' failed.

The issue is that we didn't clear/set the opaque pointer
when vector is masked. The following patch fixes this.

Signed-off-by: Sridhar Samudrala s...@us.ibm.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
Tested-by: Sridhar Samudrala s...@us.ibm.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/hw/msix.c b/hw/msix.c
index 59c7407..a52338e 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -601,7 +601,7 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 
 int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 {
-int r;
+int r = 0;
 if (vector = dev-msix_entries_nr || !dev-msix_entry_used[vector])
 return 0;
 
@@ -610,13 +610,11 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned 
vector, void *opaque)
 assert(!dev-msix_mask_notifier_opaque[vector]);
 
 /* Unmask the new notifier unless vector is masked. */
-if (msix_is_masked(dev, vector)) {
-return 0;
-}
-r = dev-msix_mask_notifier(dev, vector, opaque,
-msix_is_masked(dev, vector));
-if (r  0) {
-return r;
+if (!msix_is_masked(dev, vector)) {
+r = dev-msix_mask_notifier(dev, vector, opaque, false);
+if (r  0) {
+return r;
+}
 }
 dev-msix_mask_notifier_opaque[vector] = opaque;
 return r;
@@ -625,21 +623,21 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned 
vector, void *opaque)
 int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
 {
 int r = 0;
+void *opaque;
 if (vector = dev-msix_entries_nr || !dev-msix_entry_used[vector])
 return 0;
 
+opaque = dev-msix_mask_notifier_opaque[vector];
+
 assert(dev-msix_mask_notifier);
-assert(dev-msix_mask_notifier_opaque[vector]);
+assert(opaque);
 
 /* Mask the old notifier unless it is already masked. */
-if (msix_is_masked(dev, vector)) {
-return 0;
-}
-r = dev-msix_mask_notifier(dev, vector,
-dev-msix_mask_notifier_opaque[vector],
-!msix_is_masked(dev, vector));
-if (r  0) {
-return r;
+if (!msix_is_masked(dev, vector)) {
+r = dev-msix_mask_notifier(dev, vector, opaque, true);
+if (r  0) {
+return r;
+}
 }
 dev-msix_mask_notifier_opaque[vector] = NULL;
 return r;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: MMU: Document large pages

2010-06-06 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index 1e7ecdd..8cb42b9 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -317,6 +317,29 @@ on fault type:
 
 (user write faults generate a #PF)
 
+Large pages
+===
+
+The mmu supports all combinations of large and small guest and host pages.
+Supported page sizes include 4k, 2M, 4M, and 1G.  4M pages are treated as
+two separate 2M pages, on both guest and host, since the mmu always uses PAE
+paging.
+
+To instantiate a large spte, four constraints must be satisfied:
+
+- the spte must point to a large host page
+- the guest pte must be a large pte of at least equivalent size (if tdp is
+  enabled, there is no guest pte and this condition is satisified)
+- if the spte will be writeable, the large page frame may not overlap any
+  write-protected pages
+- the guest page must be wholly contained by a single memory slot
+
+To check the last two conditions, the mmu maintains a -write_count set of
+arrays for each memory slot and large page size.  Every write protected page
+causes its write_count to be incremented, thus preventing instantiation of
+a large spte.  The frames at the end of an unaligned memory slot have
+artificically inflated -write_counts so they can never be instantiated.
+
 Further reading
 ===
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: MMU: Don't calculate quadrant if tdp_enabled

2010-06-06 Thread Avi Kivity
From: Gui Jianfeng guijianf...@cn.fujitsu.com

There's no need to calculate quadrant if tdp is enabled.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c16c4ca..0e35106 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1378,7 +1378,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (role.direct)
role.cr4_pae = 0;
role.access = access;
-   if (vcpu-arch.mmu.root_level = PT32_ROOT_LEVEL) {
+   if (!tdp_enabled  vcpu-arch.mmu.root_level = PT32_ROOT_LEVEL) {
quadrant = gaddr  (PAGE_SHIFT + (PT64_PT_BITS * level));
quadrant = (1  ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
role.quadrant = quadrant;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: read apic-irr with ioapic lock held

2010-06-06 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

Read ioapic-irr inside ioapic-lock protected section.

KVM-Stable-Tag
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 3bc4fdb..1149c60 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -193,12 +193,13 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
 
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
 {
-   u32 old_irr = ioapic-irr;
+   u32 old_irr;
u32 mask = 1  irq;
union kvm_ioapic_redirect_entry entry;
int ret = 1;
 
spin_lock(ioapic-lock);
+   old_irr = ioapic-irr;
if (irq = 0  irq  IOAPIC_NUM_PINS) {
entry = ioapic-redirtbl[irq];
level ^= entry.fields.polarity;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: VMX: Enforce EPT pagetable level checking

2010-06-06 Thread Avi Kivity
From: Sheng Yang sh...@linux.intel.com

We only support 4 levels EPT pagetable now.

Signed-off-by: Sheng Yang sh...@linux.intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1b6a3be..3959668 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -340,6 +340,11 @@ static inline bool cpu_has_vmx_ept_1g_page(void)
return vmx_capability.ept  VMX_EPT_1GB_PAGE_BIT;
 }
 
+static inline bool cpu_has_vmx_ept_4levels(void)
+{
+   return vmx_capability.ept  VMX_EPT_PAGE_WALK_4_BIT;
+}
+
 static inline bool cpu_has_vmx_invept_individual_addr(void)
 {
return vmx_capability.ept  VMX_EPT_EXTENT_INDIVIDUAL_BIT;
@@ -1568,7 +1573,8 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_vpid())
enable_vpid = 0;
 
-   if (!cpu_has_vmx_ept()) {
+   if (!cpu_has_vmx_ept() ||
+   !cpu_has_vmx_ept_4levels()) {
enable_ept = 0;
enable_unrestricted_guest = 0;
}
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: Add Documentation/kvm/msr.txt

2010-06-06 Thread Avi Kivity
From: Glauber Costa glom...@redhat.com

This patch adds a file that documents the usage of KVM-specific
MSRs.

Signed-off-by: Glauber Costa glom...@redhat.com
Reviewed-by: Randy Dunlap randy.dun...@oracle.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/Documentation/kvm/msr.txt b/Documentation/kvm/msr.txt
new file mode 100644
index 000..f9f82c3
--- /dev/null
+++ b/Documentation/kvm/msr.txt
@@ -0,0 +1,153 @@
+KVM-specific MSRs.
+Glauber Costa glom...@redhat.com, Red Hat Inc, 2010
+=
+
+KVM makes use of some custom MSRs to service some requests.
+At present, this facility is only used by kvmclock.
+
+Custom MSRs have a range reserved for them, that goes from
+0x4b564d00 to 0x4b564dff. There are MSRs outside this area,
+but they are deprecated and their use is discouraged.
+
+Custom MSR list
+
+
+The current supported Custom MSR list is:
+
+MSR_KVM_WALL_CLOCK_NEW:   0x4b564d00
+
+   data: 4-byte alignment physical address of a memory area which must be
+   in guest RAM. This memory is expected to hold a copy of the following
+   structure:
+
+   struct pvclock_wall_clock {
+   u32   version;
+   u32   sec;
+   u32   nsec;
+   } __attribute__((__packed__));
+
+   whose data will be filled in by the hypervisor. The hypervisor is only
+   guaranteed to update this data at the moment of MSR write.
+   Users that want to reliably query this information more than once have
+   to write more than once to this MSR. Fields have the following meanings:
+
+   version: guest has to check version before and after grabbing
+   time information and check that they are both equal and even.
+   An odd version indicates an in-progress update.
+
+   sec: number of seconds for wallclock.
+
+   nsec: number of nanoseconds for wallclock.
+
+   Note that although MSRs are per-CPU entities, the effect of this
+   particular MSR is global.
+
+   Availability of this MSR must be checked via bit 3 in 0x401 cpuid
+   leaf prior to usage. 
+
+MSR_KVM_SYSTEM_TIME_NEW:  0x4b564d01
+
+   data: 4-byte aligned physical address of a memory area which must be in
+   guest RAM, plus an enable bit in bit 0. This memory is expected to hold
+   a copy of the following structure:
+
+   struct pvclock_vcpu_time_info {
+   u32   version;
+   u32   pad0;
+   u64   tsc_timestamp;
+   u64   system_time;
+   u32   tsc_to_system_mul;
+   s8tsc_shift;
+   u8flags;
+   u8pad[2];
+   } __attribute__((__packed__)); /* 32 bytes */
+
+   whose data will be filled in by the hypervisor periodically. Only one
+   write, or registration, is needed for each VCPU. The interval between
+   updates of this structure is arbitrary and implementation-dependent.
+   The hypervisor may update this structure at any time it sees fit until
+   anything with bit0 == 0 is written to it.
+
+   Fields have the following meanings:
+
+   version: guest has to check version before and after grabbing
+   time information and check that they are both equal and even.
+   An odd version indicates an in-progress update.
+
+   tsc_timestamp: the tsc value at the current VCPU at the time
+   of the update of this structure. Guests can subtract this value
+   from current tsc to derive a notion of elapsed time since the
+   structure update.
+
+   system_time: a host notion of monotonic time, including sleep
+   time at the time this structure was last updated. Unit is
+   nanoseconds.
+
+   tsc_to_system_mul: a function of the tsc frequency. One has
+   to multiply any tsc-related quantity by this value to get
+   a value in nanoseconds, besides dividing by 2^tsc_shift
+
+   tsc_shift: cycle to nanosecond divider, as a power of two, to
+   allow for shift rights. One has to shift right any tsc-related
+   quantity by this value to get a value in nanoseconds, besides
+   multiplying by tsc_to_system_mul.
+
+   With this information, guests can derive per-CPU time by
+   doing:
+
+   time = (current_tsc - tsc_timestamp)
+   time = (time * tsc_to_system_mul)  tsc_shift 
+   time = time + system_time
+
+   flags: bits in this field indicate extended capabilities
+   coordinated between the guest and the hypervisor. Availability
+   of specific flags has to be checked in 0x4001 cpuid leaf.
+   Current flags are:
+
+flag bit   | cpuid bit| meaning
+  

[COMMIT master] KVM: MMU: reduce remote tlb flush in kvm_mmu_pte_write()

2010-06-06 Thread Avi Kivity
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

collect remote tlb flush in kvm_mmu_pte_write() path

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cc5bc25..6b2c644 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2664,11 +2664,15 @@ static bool need_remote_flush(u64 old, u64 new)
return (old  ~new  PT64_PERM_MASK) != 0;
 }
 
-static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, u64 old, u64 new)
+static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page,
+   bool remote_flush, bool local_flush)
 {
-   if (need_remote_flush(old, new))
+   if (zap_page)
+   return;
+
+   if (remote_flush)
kvm_flush_remote_tlbs(vcpu-kvm);
-   else
+   else if (local_flush)
kvm_mmu_flush_tlb(vcpu);
 }
 
@@ -2733,6 +2737,9 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
int npte;
int r;
int invlpg_counter;
+   bool remote_flush, local_flush, zap_page;
+
+   zap_page = remote_flush = local_flush = false;
 
pgprintk(%s: gpa %llx bytes %d\n, __func__, gpa, bytes);
 
@@ -2806,7 +2813,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 */
pgprintk(misaligned: gpa %llx bytes %d role %x\n,
 gpa, bytes, sp-role.word);
-   kvm_mmu_prepare_zap_page(vcpu-kvm, sp,
+   zap_page |= !!kvm_mmu_prepare_zap_page(vcpu-kvm, sp,
 invalid_list);
++vcpu-kvm-stat.mmu_flooded;
continue;
@@ -2831,16 +2838,19 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
if (quadrant != sp-role.quadrant)
continue;
}
+   local_flush = true;
spte = sp-spt[page_offset / sizeof(*spte)];
while (npte--) {
entry = *spte;
mmu_pte_write_zap_pte(vcpu, sp, spte);
if (gentry)
mmu_pte_write_new_pte(vcpu, sp, spte, gentry);
-   mmu_pte_write_flush_tlb(vcpu, entry, *spte);
+   if (!remote_flush  need_remote_flush(entry, *spte))
+   remote_flush = true;
++spte;
}
}
+   mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush);
kvm_mmu_commit_zap_page(vcpu-kvm, invalid_list);
kvm_mmu_audit(vcpu, post pte write);
spin_unlock(vcpu-kvm-mmu_lock);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into next

2010-06-06 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

* 'master' of 
ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6: (1443 
commits)
  Linux 2.6.35-rc2
  drm/i915: Move non-phys cursors into the GTT
  ext4: Fix remaining racy updates of EXT4_I(inode)-i_flags
  module: fix bne2 gave up waiting for init of module libcrc32c
  module: verify_export_symbols under the lock
  module: move find_module check to end
  module: make locking more fine-grained.
  module: Make module sysfs functions private.
  module: move sysfs exposure to end of load_module
  module: fix kdb's illicit use of struct module_use.
  module: Make the 'usage' lists be two-way
  X25: remove duplicated #include
  tcp: use correct net ns in cookie_v4_check()
  rps: tcp: fix rps_sock_flow_table table updates
  ppp_generic: fix multilink fragment sizes
  syncookies: remove Kconfig text line about disabled-by-default
  ixgbe: only check pfc bits in hang logic if pfc is enabled
  net: check for refcount if pop a stacked dst_entry
  omap: remove BUG_ON for disabled interrupts
  vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim 
failure
  ...

Conflicts:
arch/powerpc/kernel/ppc_ksyms.c

Signed-off-by: Avi Kivity a...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: MMU: split the operations of kvm_mmu_zap_page()

2010-06-06 Thread Avi Kivity
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

Using kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() to
split kvm_mmu_zap_page() function, then we can:

- traverse hlist safely
- easily to gather remote tlb flush which occurs during page zapped

Those feature can be used in the later patches

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 99dddad..77eebbc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -916,6 +916,7 @@ static int is_empty_shadow_page(u64 *spt)
 static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
ASSERT(is_empty_shadow_page(sp-spt));
+   hlist_del(sp-hash_link);
list_del(sp-link);
__free_page(virt_to_page(sp-spt));
if (!sp-role.direct)
@@ -1200,6 +1201,10 @@ static void kvm_unlink_unsync_page(struct kvm *kvm, 
struct kvm_mmu_page *sp)
 }
 
 static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+   struct list_head *invalid_list);
+static void kvm_mmu_commit_zap_page(struct kvm *kvm,
+   struct list_head *invalid_list);
 
 #define for_each_gfn_sp(kvm, sp, gfn, pos, n)  \
   hlist_for_each_entry_safe(sp, pos, n,
\
@@ -1530,7 +1535,8 @@ static void kvm_mmu_unlink_parents(struct kvm *kvm, 
struct kvm_mmu_page *sp)
 }
 
 static int mmu_zap_unsync_children(struct kvm *kvm,
-  struct kvm_mmu_page *parent)
+  struct kvm_mmu_page *parent,
+  struct list_head *invalid_list)
 {
int i, zapped = 0;
struct mmu_page_path parents;
@@ -1544,7 +1550,7 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
struct kvm_mmu_page *sp;
 
for_each_sp(pages, sp, parents, i) {
-   kvm_mmu_zap_page(kvm, sp);
+   kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
mmu_pages_clear_parents(parents);
zapped++;
}
@@ -1554,16 +1560,16 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
return zapped;
 }
 
-static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+   struct list_head *invalid_list)
 {
int ret;
 
-   trace_kvm_mmu_zap_page(sp);
+   trace_kvm_mmu_prepare_zap_page(sp);
++kvm-stat.mmu_shadow_zapped;
-   ret = mmu_zap_unsync_children(kvm, sp);
+   ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
kvm_mmu_page_unlink_children(kvm, sp);
kvm_mmu_unlink_parents(kvm, sp);
-   kvm_flush_remote_tlbs(kvm);
if (!sp-role.invalid  !sp-role.direct)
unaccount_shadowed(kvm, sp-gfn);
if (sp-unsync)
@@ -1571,17 +1577,45 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct 
kvm_mmu_page *sp)
if (!sp-root_count) {
/* Count self */
ret++;
-   hlist_del(sp-hash_link);
-   kvm_mmu_free_page(kvm, sp);
+   list_move(sp-link, invalid_list);
} else {
-   sp-role.invalid = 1;
list_move(sp-link, kvm-arch.active_mmu_pages);
kvm_reload_remote_mmus(kvm);
}
+
+   sp-role.invalid = 1;
kvm_mmu_reset_last_pte_updated(kvm);
return ret;
 }
 
+static void kvm_mmu_commit_zap_page(struct kvm *kvm,
+   struct list_head *invalid_list)
+{
+   struct kvm_mmu_page *sp;
+
+   if (list_empty(invalid_list))
+   return;
+
+   kvm_flush_remote_tlbs(kvm);
+
+   do {
+   sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
+   WARN_ON(!sp-role.invalid || sp-root_count);
+   kvm_mmu_free_page(kvm, sp);
+   } while (!list_empty(invalid_list));
+
+}
+
+static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+   LIST_HEAD(invalid_list);
+   int ret;
+
+   ret = kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+   kvm_mmu_commit_zap_page(kvm, invalid_list);
+   return ret;
+}
+
 /*
  * Changing the number of mmu pages allocated to the vm
  * Note: if kvm_nr_mmu_pages is too small, you will get dead lock
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 42f07b1..3aab0f0 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -190,7 +190,7 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_unsync_page,
TP_ARGS(sp)
 );
 
-DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_zap_page,
+DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page,
TP_PROTO(struct 

[COMMIT master] KVM: MMU: gather remote tlb flush which occurs during page zapped

2010-06-06 Thread Avi Kivity
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

Using kvm_mmu_prepare_zap_page() and kvm_mmu_zap_page() instead of
kvm_mmu_zap_page() that can reduce remote tlb flush IPI

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7713660..988c4fc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1200,7 +1200,6 @@ static void kvm_unlink_unsync_page(struct kvm *kvm, 
struct kvm_mmu_page *sp)
--kvm-stat.mmu_unsync;
 }
 
-static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
struct list_head *invalid_list);
 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
@@ -1218,10 +1217,10 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
(sp)-role.invalid) {} else
 
 static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-  bool clear_unsync)
+  struct list_head *invalid_list, bool clear_unsync)
 {
if (sp-role.cr4_pae != !!is_pae(vcpu)) {
-   kvm_mmu_zap_page(vcpu-kvm, sp);
+   kvm_mmu_prepare_zap_page(vcpu-kvm, sp, invalid_list);
return 1;
}
 
@@ -1232,7 +1231,7 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp,
}
 
if (vcpu-arch.mmu.sync_page(vcpu, sp)) {
-   kvm_mmu_zap_page(vcpu-kvm, sp);
+   kvm_mmu_prepare_zap_page(vcpu-kvm, sp, invalid_list);
return 1;
}
 
@@ -1244,17 +1243,22 @@ static void mmu_convert_notrap(struct kvm_mmu_page *sp);
 static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
   struct kvm_mmu_page *sp)
 {
+   LIST_HEAD(invalid_list);
int ret;
 
-   ret = __kvm_sync_page(vcpu, sp, false);
+   ret = __kvm_sync_page(vcpu, sp, invalid_list, false);
if (!ret)
mmu_convert_notrap(sp);
+   else
+   kvm_mmu_commit_zap_page(vcpu-kvm, invalid_list);
+
return ret;
 }
 
-static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+struct list_head *invalid_list)
 {
-   return __kvm_sync_page(vcpu, sp, true);
+   return __kvm_sync_page(vcpu, sp, invalid_list, true);
 }
 
 /* @gfn should be write-protected at the call site */
@@ -1262,6 +1266,7 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t 
gfn)
 {
struct kvm_mmu_page *s;
struct hlist_node *node, *n;
+   LIST_HEAD(invalid_list);
bool flush = false;
 
for_each_gfn_indirect_valid_sp(vcpu-kvm, s, gfn, node, n) {
@@ -1271,13 +1276,14 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  
gfn_t gfn)
WARN_ON(s-role.level != PT_PAGE_TABLE_LEVEL);
if ((s-role.cr4_pae != !!is_pae(vcpu)) ||
(vcpu-arch.mmu.sync_page(vcpu, s))) {
-   kvm_mmu_zap_page(vcpu-kvm, s);
+   kvm_mmu_prepare_zap_page(vcpu-kvm, s, invalid_list);
continue;
}
kvm_unlink_unsync_page(vcpu-kvm, s);
flush = true;
}
 
+   kvm_mmu_commit_zap_page(vcpu-kvm, invalid_list);
if (flush)
kvm_mmu_flush_tlb(vcpu);
 }
@@ -1348,6 +1354,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp;
struct mmu_page_path parents;
struct kvm_mmu_pages pages;
+   LIST_HEAD(invalid_list);
 
kvm_mmu_pages_init(parent, parents, pages);
while (mmu_unsync_walk(parent, pages)) {
@@ -1360,9 +1367,10 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
kvm_flush_remote_tlbs(vcpu-kvm);
 
for_each_sp(pages, sp, parents, i) {
-   kvm_sync_page(vcpu, sp);
+   kvm_sync_page(vcpu, sp, invalid_list);
mmu_pages_clear_parents(parents);
}
+   kvm_mmu_commit_zap_page(vcpu-kvm, invalid_list);
cond_resched_lock(vcpu-kvm-mmu_lock);
kvm_mmu_pages_init(parent, parents, pages);
}
@@ -1606,16 +1614,6 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 
 }
 
-static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-   LIST_HEAD(invalid_list);
-   int ret;
-
-   ret = kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
-   kvm_mmu_commit_zap_page(kvm, invalid_list);
-   return ret;
-}
-
 /*
  * Changing the number of mmu pages allocated to the vm
  * Note: if kvm_nr_mmu_pages is too small, you will get dead lock
@@ -1623,6 +1621,7 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct 
kvm_mmu_page *sp)
 void 

[KVM-AUTOTEST PATCH] KVM test: kvm_vm.py: don't require pci_assignable to be defined

2010-06-06 Thread Michael Goldish
Currently to disable PCI device assignment pci_assignable must be explicitly
set to no.  This patch allows it to remain undefined (and adds a warning
message and a comment).

Signed-off-by: Michael Goldish mgold...@redhat.com
---
 client/tests/kvm/kvm_vm.py |   47 +--
 1 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index af45a81..78cbb16 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -502,49 +502,48 @@ class VM:
 self.uuid = f.read().strip()
 f.close()
 
-if not params.get(pci_assignable) == no:
-pa_type = params.get(pci_assignable)
+# Assign a PCI assignable device
+self.pci_assignable = None
+pa_type = params.get(pci_assignable)
+if pa_type in [vf, pf, mixed]:
 pa_devices_requested = params.get(devices_requested)
 
 # Virtual Functions (VF) assignable devices
 if pa_type == vf:
-pa_driver = params.get(driver)
-pa_driver_option = params.get(driver_option)
-self.pci_assignable = kvm_utils.PciAssignable(type=pa_type,
-driver=pa_driver,
-driver_option=pa_driver_option,
-devices_requested=pa_devices_requested)
+self.pci_assignable = kvm_utils.PciAssignable(
+type=pa_type,
+driver=params.get(driver),
+driver_option=params.get(driver_option),
+devices_requested=pa_devices_requested)
 # Physical NIC (PF) assignable devices
 elif pa_type == pf:
-pa_device_names = params.get(device_names)
-self.pci_assignable = kvm_utils.PciAssignable(type=pa_type,
- names=pa_device_names,
- 
devices_requested=pa_devices_requested)
+self.pci_assignable = kvm_utils.PciAssignable(
+type=pa_type,
+names=params.get(device_names),
+devices_requested=pa_devices_requested)
 # Working with both VF and PF
 elif pa_type == mixed:
-pa_device_names = params.get(device_names)
-pa_driver = params.get(driver)
-pa_driver_option = params.get(driver_option)
-self.pci_assignable = kvm_utils.PciAssignable(type=pa_type,
-driver=pa_driver,
-driver_option=pa_driver_option,
-names=pa_device_names,
-devices_requested=pa_devices_requested)
+self.pci_assignable = kvm_utils.PciAssignable(
+type=pa_type,
+driver=params.get(driver),
+driver_option=params.get(driver_option),
+names=params.get(device_names),
+devices_requested=pa_devices_requested)
 
 self.pa_pci_ids = self.pci_assignable.request_devs()
 
 if self.pa_pci_ids:
-logging.debug(Successfuly assigned devices: %s %
+logging.debug(Successfuly assigned devices: %s,
   self.pa_pci_ids)
 else:
 logging.error(No PCI assignable devices were assigned 
   and 'pci_assignable' is defined to %s 
-  on your config file. Aborting VM creation. 
%
+  on your config file. Aborting VM creation.,
   pa_type)
 return False
 
-else:
-self.pci_assignable = None
+elif pa_type and pa_type != no:
+logging.warn(Unsupported pci_assignable type: %s, pa_type)
 
 # Make qemu command
 qemu_command = self.make_qemu_command()
-- 
1.5.4.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH]: kdump on KVM

2010-06-06 Thread Avi Kivity

On 06/04/2010 11:11 PM, Chris Lalancette wrote:

Hello,
  I've gone back to look at the problem with kdump while running as a KVM
guest.  I'll preface this discussion by noting that all of the problems I
describe below only happen when the HPET emulation is shutoff.  However, since
there still seem to be some lingering issues with the HPET emulation, running
without the HPET is a reasonable mode of operation, and one in which it would
be nice to have kexec/kdump working.
  The basic problem with getting kdump working is that currently KVM
delivers the timer interrupts only to vcpu 0, so kexecing onto another
processor will never see timer interrupts.  In the KVM code, there are a
number of places where we make incorrect assumptions regarding the timer and
which vcpu to deliver it to:

1)  In the i8254.c code when the hrtimer representing the PIT expires.  In
this case, when we get the callback, we kick only the BSP.

2)  In the i8254.c code when a vcpu is migrated from one processor to another.
In this case we only migrate the PIT timer if the vcpu to be migrated is the
BSP.

3)  In the lapic code when deciding whether to accept a PIC interrupt, we only
accept interrupts on the BSP.

4)  In the irq_comm.c code when calling kvm_irq_delivery_to_apic().  The
problem here is that we don't take into account the fact that an LAPIC might
be disabled when trying to deliver an interrupt in DM_LOWEST mode.  Further,
on a kexec, the processor that we are kexec'ing *to* gets it's APIC ID
re-written to the BSP APIC ID.  What it means in the end is that we are
currently still matching against the BSP even though vcpu 1 (where the kexec
is happening) would match if we let it.

The attached patch takes care of 1), 3), and 4), and works in my testing (it
needs to be cleaned up a bit to not be so inefficient, but it should work).
However, problem 2) is pretty sticky.  The reason we are currently migrating
the PIT timer around with the BSP is pretty well explained in commit
2f5997140f22f68f6390c49941150d3fa8a95cb7.  With my new patch, though, we are
no longer guaranteeing that we are going to inject onto CPU 0.  I think we
can do something where when the hrtimer expires, we figure out which
vcpu will get the timer interrupt and IPI to that physical processor to cause
the VMEXIT.  Unfortunately it's racy because the expiration of the hrtimer is
de-coupled from the setting of the IRQ for the interrupt.


It's only racy because of (perhaps not immature) optimization. 
Logically, the flow of events is as follows:


1. the httimer fires
2. irq0 is toggled (perhaps via a workqueue since the hrtimer is in 
interrupt context)
3. the PIC and IOAPIC state machines run and decide which local APIC(s), 
if any, need to be notified

4. the local APICs are IPIed.

Perhaps we should start by deoptimizing the code to work like the above.

In light of the below, I'm not sure the race actually exists.


  That means that
the hrtimer could  expire, we could choose vcpu 2 (say), IPI to cause a
VMEXIT, but by the time it goes to VMENTER the guest has changed something in
the (IO)APIC(s) so that the set_irq logic chooses vcpu 3 to do the injection.
This would result in a delayed injection that 2f5997 is trying to avoid.
   


Not sure that's the race here.  Between the vmexit and vmenter we have 
the following trace:


  kvm_inject_pending_timer_irqs()
  kvm_inject_pit_timer_irqs()
  __inject_pit_timer_intr()
  kvm_set_irq()

So, even if we end up on the wrong vcpu, we will IPI the right one later.


Taking a step back, it seems to me that something along the lines of my
previous patchset (where we do set_irq directly from the hrtimer callback) is
the right way to go.  We would still need to IPI to the appropriate physical
cpu to cause a VMEXIT on the cpu we care about, but we would avoid the race I
describe in the previous paragraph.


I agree.  But we also want the vcpu that accepts the interrupt to pull 
the hrtimer after it (like it does now) to minimize cross-cpu talk.



Unfortunately that patchset is also much
more risky.
   


I'd like a minimal, backportable patchset first, then bigger changes 
later.  We have at least two choices, irq-safe pit and ioapic, and using 
a workqueue for to match impedances.  The latter is a lot safer since we 
already do it for assigned devices.



diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 3c4f8f6..bc40143 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -231,7 +231,13 @@ int pit_has_pending_timer(struct kvm_vcpu *vcpu)
  {
struct kvm_pit *pit = vcpu-kvm-arch.vpit;

-   if (pit  kvm_vcpu_is_bsp(vcpu)  pit-pit_state.irq_ack)
+   /* We only want to inject a single IRQ for each PIT timer that goes
+* off.  If 2 vcpus happen to come in here at the same time, we may
+* erroneously tell one of them that it has a timer pending.  That's
+* OK, though; in kvm_inject_pit_timer_irqs we will make sure to only
+* let one of them 

Re: Perf trace event parse errors for KVM events

2010-06-06 Thread Avi Kivity

On 06/04/2010 12:57 AM, Steven Rostedt wrote:

On Tue, 2010-06-01 at 15:39 +0300, Avi Kivity wrote:
   

On 06/01/2010 02:59 PM, Steven Rostedt wrote:
 
   

I meant that viewing would be slowed down.  It's an important part of
using ftrace!

How long does the Python formatter take to process 100k or 1M events?

 

I finally got around to testing this.

I ran a trace on lock_acquire, and traced 1,253,296 events.

I then created a python plugin to analyze the trace:


def lock_acquire(trace_seq, event):
 t = ''
 r = ''
 if int(event['flags'])  1:
 t = 'try'
 if int(event['flags'])  2:
 r = 'read'
 trace_seq.puts('t %x %s%s%s' % (
 event['lockdep_addr'], t, r,
 event['name']))

def register(pevent):
 pevent.register_event_handler(lock, lock_acquire, lock_acquire)


Disclaimer, I'm not a python expert, and I'm sure the above python code
sucks.
   


I think Python more or less guarantees your code will suck no matter 
what you do.



[r...@ixf9 trace-cmd.git]# time ./trace-cmd report -N/dev/null 21

real0m4.653s
user0m4.234s
sys 0m0.419s

* -N keeps trace-cmd from loading any plugins.


[r...@ixf9 trace-cmd.git]# time PYTHONPATH=`pwd` ./trace-cmd report/dev/null 
21

real0m53.916s
user0m53.047s
sys 0m0.859s


Yes, running a python interpreter is a bit more expensive. It took 4
seconds to read the million events with plain C, but 53 seconds to read
it in python.

That said...  This would only affect you if you were writing this to a
file. I doubt that you would notice this if you were scanning the trace
with less.
   


I'm more worried about searching with less.  But a minute for a million 
events isn't that bad.



Also, I kicked this off in kernelshark, and it made no difference that I
can see. This is because kernelshark only evaluates the viewable area of
the screen.
   


Neat.  Can it also search?  Where can I find it? googles, finds, gawks

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 1/2] virtio: support layout with avail ring before idx

2010-06-06 Thread Michael S. Tsirkin
On Sat, Jun 05, 2010 at 01:40:26PM +0930, Rusty Russell wrote:
 On Fri, 4 Jun 2010 09:12:05 pm Michael S. Tsirkin wrote:
  On Fri, Jun 04, 2010 at 08:46:49PM +0930, Rusty Russell wrote:
   I'm uncomfortable with moving a field.
   
   We haven't done that before and I wonder what will break with old code.
  
  With e.g. my patch, We only do this conditionally when bit is negotitated.
 
 Of course, but see this change:
 
 commit ef688e151c00e5d529703be9a04fd506df8bc54e
 Author: Rusty Russell ru...@rustcorp.com.au
 Date:   Fri Jun 12 22:16:35 2009 -0600
 
 virtio: meet virtio spec by finalizing features before using device
 
 Virtio devices are supposed to negotiate features before they start using
 the device, but the current code doesn't do this.  This is because the
 driver's probe() function invariably has to add buffers to a virtqueue,
 or probe the disk (virtio_blk).
 
 This currently doesn't matter since no existing backend is strict about
 the feature negotiation.  But it's possible to imagine a future feature
 which completely changes how a device operates: in this case, we'd need
 to acknowledge it before using the device.
 
 Signed-off-by: Rusty Russell ru...@rustcorp.com.au
 
 Now, this isn't impossible to overcome: we know that if they use the ring
 before completing feature negotiation then they don't understand the new
 format.
 
 But we have to be aware of that on the qemu side.  Are we?

I think we are ok. virtqueue_init which sets the avail/ysed pointers is
called when we write the base address.  So we only need to be careful
and not change this feature bit after creating the rings.


   Should we instead just abandon the flags field and use last_used only?
   Or, more radically, put flags == last_used when the feature is on?
   
   Thoughts?
   Rusty.
  
  Hmm, e.g. with TX and virtio net, we almost never want interrupts,
  whatever the index value.
 
 Good point.  OK, I give in, I'll take your patch which moves the fields
 to the end.  Is that your preference?

Yes, I think so.
You mean PATCHv3 unchanged with 254 byte padding?

 Please be careful with the qemu side though...
 
 It's not inconceivable that I'll write that virtio cacheline simulator this
 (coming) week, too...
 
 Thanks.
 Rusty.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-06-06 Thread Michael S. Tsirkin
On Thu, Jun 03, 2010 at 02:41:38PM -0700, Tom Lyon wrote:
 OK, in the interest of making progress, I am about to embark on the following:
 
 1. Create a user-iommu-domain driver - opening it will give a new empty 
 domain.
 Ultimately this can also populate sysfs with the state of its world, 
 which would
 also be a good addition to the base iommu stuff.
 If someone closes the fd while in use, the domain stays valid anyway 
 until users
 drop off.
 
 2. Add DOMAIN_SET and DOMAIN_UNSET ioctls to the vfio driver.  Require that
a domain be set before using the VFIO_DMA_MAP_IOVA ioctl

Require domain to be set before you allow any access to the device:
mmap, write, read.  IMO this is the only safe way to make sure userspace
does not corrupt memory, and this removes the need to special-case
MSI memory, play with bus master enable and hope it can be cleared without
reset, etc.

 (this is the one
that KVM wants).

Not sure I understand. I think that MAP should be done on the domain,
not the device, this handles pinning pages correctly and
this way you don't need any special checks.

However, the VFIO_DMA_MAP_ANYWHERE ioctl is the one
which uses the dma_sg interface which has no expicit control of domains. I
intend to keep it the way it is, but expect only non-hypervisor programs 
 would
want to use it.

If we support MAP_IOVA, why is MAP_ANYWHERE useful? Can't
non-hypervisors just pick an address?

 3. Clean up the docs and other nits that folks have found.
 
 Comments? 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/7] KVM: MMU: skip invalid sp when unprotect page

2010-06-06 Thread Avi Kivity

On 06/04/2010 04:52 PM, Xiao Guangrong wrote:

In kvm_mmu_unprotect_page(), the invalid sp can be skipped
   


Applied all, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Print a user-friendly message on failed vmentry

2010-06-06 Thread Mohammed Gamal
This patch address bug report in https://bugs.launchpad.net/qemu/+bug/530077.

Failed vmentries were handled with handle_unhandled() which prints a rather
unfriendly message to the user. This patch separates handling vmentry failures
from unknown exit reasons and prints a friendly message to the user.

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
---
 qemu-kvm.c |   20 +++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 9534b31..1a66f55 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -106,6 +106,24 @@ static int handle_unhandled(uint64_t reason)
 return -EINVAL;
 }
 
+#define VMX_INVALID_GUEST_STATE 0x8021
+
+static int handle_failed_vmentry(uint64_t reason)
+{
+fprintf(stderr, kvm: vm entry failed with error 0x% PRIx64 \n\n, 
reason);
+
+/* Perhaps we will need to check if this machine is intel since exit 
reason 0x21 
+   has a different interpretation on SVM */
+if (reason == VMX_INVALID_GUEST_STATE) {
+fprintf(stderr, If you're runnning a guest on an Intel machine 
without\n);
+fprintf(stderr, unrestricted mode support, the failure can be most 
likely\n);
+fprintf(stderr, due to the guest entering an invalid state for Intel 
VT.\n);
+fprintf(stderr, For example, the guest maybe running in big real 
mode\n);
+fprintf(stderr, which is not supported on less recent Intel 
processors.\n\n);
+}
+
+return -EINVAL;
+}
 
 static inline void set_gsi(kvm_context_t kvm, unsigned int gsi)
 {
@@ -586,7 +604,7 @@ int kvm_run(CPUState *env)
 r = handle_unhandled(run-hw.hardware_exit_reason);
 break;
 case KVM_EXIT_FAIL_ENTRY:
-r = 
handle_unhandled(run-fail_entry.hardware_entry_failure_reason);
+r = 
handle_failed_vmentry(run-fail_entry.hardware_entry_failure_reason);
 break;
 case KVM_EXIT_EXCEPTION:
 fprintf(stderr, exception %d (%x)\n, run-ex.exception,
-- 
1.7.0.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


mmu_notifers, pte_dirty questions

2010-06-06 Thread Avi Kivity

Why no notifer when testing and clearing the dirty bit?

(*clear_flush_dirty)(...).


static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
unsigned long address)
{
struct mm_struct *mm = vma-vm_mm;
pte_t *pte;
spinlock_t *ptl;
int ret = 0;

pte = page_check_address(page, mm, address, ptl, 1);
if (!pte)
goto out;

if (pte_dirty(*pte) || pte_write(*pte)) {
pte_t entry;

flush_cache_page(vma, address, pte_pfn(*pte));
entry = ptep_clear_flush_notify(vma, address, pte);
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);
set_pte_at(mm, address, pte, entry);


set_pte_at_notify()?  without this (or clear_flush_dirty) Linux will 
assume all ptes are now clean; if the guest writes to a page nothing 
will catch it.


- with set_pte_at_notify(), we can drop the spte and mark the page as 
dirty, so the next write will re-instantiate the spte
- with -clear_flush_dirty(), we can track the dirty state without 
dropping the spte.



ret = 1;
}

pte_unmap_unlock(pte, ptl);
out:
return ret;


I'm probably missing something big as I can't see how this works.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] KVM: MMU: Move accessed/dirty bit checks from rmap_remove() to drop_spte()

2010-06-06 Thread Avi Kivity
Since we need to make the check atomic, move it to the place that will
set the new spte.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/mmu.c |   17 +
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 17331c2..f93948d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -620,19 +620,11 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
struct kvm_rmap_desc *desc;
struct kvm_rmap_desc *prev_desc;
struct kvm_mmu_page *sp;
-   pfn_t pfn;
gfn_t gfn;
unsigned long *rmapp;
int i;
 
-   if (!is_rmap_spte(*spte))
-   return;
sp = page_header(__pa(spte));
-   pfn = spte_to_pfn(*spte);
-   if (*spte  shadow_accessed_mask)
-   kvm_set_pfn_accessed(pfn);
-   if (is_writable_pte(*spte))
-   kvm_set_pfn_dirty(pfn);
gfn = kvm_mmu_page_get_gfn(sp, spte - sp-spt);
rmapp = gfn_to_rmap(kvm, gfn, sp-role.level);
if (!*rmapp) {
@@ -668,6 +660,15 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 
 static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
 {
+   pfn_t pfn;
+
+   if (!is_rmap_spte(*sptep))
+   return;
+   pfn = spte_to_pfn(*sptep);
+   if (*sptep  shadow_accessed_mask)
+   kvm_set_pfn_accessed(pfn);
+   if (is_writable_pte(*sptep))
+   kvm_set_pfn_dirty(pfn);
rmap_remove(kvm, sptep);
__set_spte(sptep, new_spte);
 }
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] Fix accessed bit tracking

2010-06-06 Thread Avi Kivity
The kvm mmu synchronizes shadow ptes using the mmu lock, however the cpu
will happily ignore the lock when setting the accessed bit.  This can cause
the accessed bit to be lost.  Luckily this only results in incorrect page
selection for swap.

This patchset fixes the problem by atomically updating the spte when
needed while taking care of the accessed bit.

Avi Kivity (4):
  KVM: MMU: Introduce drop_spte()
  KVM: MMU: Move accessed/dirty bit checks from rmap_remove() to
drop_spte()
  KVM: MMU: Atomically check for accessed bit when dropping an spte
  KVM: MMU: Don't drop accessed bit while updating an spte

 arch/x86/kvm/mmu.c |   91 +++
 arch/x86/kvm/paging_tmpl.h |   13 +++---
 2 files changed, 71 insertions(+), 33 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] KVM: MMU: Don't drop accessed bit while updating an spte

2010-06-06 Thread Avi Kivity
__set_spte() will happily replace an spte with the accessed bit set with
one that has the accessed bit clear.  Add a helper update_spte() which checks
for this condition and updates the page flag if needed.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/mmu.c |   25 +
 1 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b565a14..4c98726 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -305,6 +305,19 @@ static u64 __xchg_spte(u64 *sptep, u64 new_spte)
 #endif
 }
 
+static void update_spte(u64 *sptep, u64 new_spte)
+{
+   u64 old_spte;
+
+   if (!shadow_accessed_mask || (new_spte  shadow_accessed_mask)) {
+   __set_spte(sptep, new_spte);
+   } else {
+   old_spte = __xchg_spte(sptep, new_spte);
+   if (old_spte  shadow_accessed_mask)
+   mark_page_accessed(pfn_to_page(spte_to_pfn(old_spte)));
+   }
+}
+
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
  struct kmem_cache *base_cache, int min)
 {
@@ -730,7 +743,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
BUG_ON(!(*spte  PT_PRESENT_MASK));
rmap_printk(rmap_write_protect: spte %p %llx\n, spte, *spte);
if (is_writable_pte(*spte)) {
-   __set_spte(spte, *spte  ~PT_WRITABLE_MASK);
+   update_spte(spte, *spte  ~PT_WRITABLE_MASK);
write_protected = 1;
}
spte = rmap_next(kvm, rmapp, spte);
@@ -786,7 +799,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
 unsigned long data)
 {
int need_flush = 0;
-   u64 *spte, new_spte;
+   u64 *spte, new_spte, old_spte;
pte_t *ptep = (pte_t *)data;
pfn_t new_pfn;
 
@@ -806,9 +819,13 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned 
long *rmapp,
 
new_spte = ~PT_WRITABLE_MASK;
new_spte = ~SPTE_HOST_WRITEABLE;
+   new_spte = ~shadow_accessed_mask;
if (is_writable_pte(*spte))
kvm_set_pfn_dirty(spte_to_pfn(*spte));
-   __set_spte(spte, new_spte);
+   old_spte = __xchg_spte(spte, new_spte);
+   if (is_shadow_present_pte(old_spte)
+(old_spte  shadow_accessed_mask))
+   
mark_page_accessed(pfn_to_page(spte_to_pfn(old_spte)));
spte = rmap_next(kvm, rmapp, spte);
}
}
@@ -1956,7 +1973,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
mark_page_dirty(vcpu-kvm, gfn);
 
 set_pte:
-   __set_spte(sptep, spte);
+   update_spte(sptep, spte);
 done:
return ret;
 }
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] KVM: MMU: Atomically check for accessed bit when dropping an spte

2010-06-06 Thread Avi Kivity
Currently, in the window between the check for the accessed bit, and actually
dropping the spte, a vcpu can access the page through the spte and set the bit,
which will be ignored by the mmu.

Fix by using an exchange operation to atmoically fetch the spte and drop it.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/mmu.c |   26 +-
 1 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f93948d..b565a14 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -290,6 +290,21 @@ static void __set_spte(u64 *sptep, u64 spte)
 #endif
 }
 
+static u64 __xchg_spte(u64 *sptep, u64 new_spte)
+{
+#ifdef CONFIG_X86_64
+   return xchg(sptep, new_spte);
+#else
+   u64 old_spte;
+
+   do {
+   old_spte = *sptep;
+   } while (cmpxchg64(sptep, old_spte, new_spte) != old_spte);
+
+   return old;
+#endif
+}
+
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
  struct kmem_cache *base_cache, int min)
 {
@@ -661,16 +676,17 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
 {
pfn_t pfn;
+   u64 old_spte;
 
-   if (!is_rmap_spte(*sptep))
+   old_spte = __xchg_spte(sptep, new_spte);
+   if (!is_rmap_spte(old_spte))
return;
-   pfn = spte_to_pfn(*sptep);
-   if (*sptep  shadow_accessed_mask)
+   pfn = spte_to_pfn(old_spte);
+   if (old_spte  shadow_accessed_mask)
kvm_set_pfn_accessed(pfn);
-   if (is_writable_pte(*sptep))
+   if (is_writable_pte(old_spte))
kvm_set_pfn_dirty(pfn);
rmap_remove(kvm, sptep);
-   __set_spte(sptep, new_spte);
 }
 
 static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] KVM: MMU: Introduce drop_spte()

2010-06-06 Thread Avi Kivity
When we call rmap_remove(), we (almost) always immediately follow it by
an __set_spte() to a nonpresent pte.  Since we need to perform the two
operations atomically, to avoid losing the dirty and accessed bits, introduce
a helper drop_spte() and convert all call sites.

The operation is still nonatomic at this point.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/mmu.c |   33 +++--
 arch/x86/kvm/paging_tmpl.h |   13 ++---
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6b2c644..17331c2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -666,6 +666,12 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
}
 }
 
+static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
+{
+   rmap_remove(kvm, sptep);
+   __set_spte(sptep, new_spte);
+}
+
 static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)
 {
struct kvm_rmap_desc *desc;
@@ -731,9 +737,9 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
BUG_ON((*spte  (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != 
(PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
pgprintk(rmap_write_protect(large): spte %p %llx 
%lld\n, spte, *spte, gfn);
if (is_writable_pte(*spte)) {
-   rmap_remove(kvm, spte);
+   drop_spte(kvm, spte,
+ shadow_trap_nonpresent_pte);
--kvm-stat.lpages;
-   __set_spte(spte, shadow_trap_nonpresent_pte);
spte = NULL;
write_protected = 1;
}
@@ -753,8 +759,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
while ((spte = rmap_next(kvm, rmapp, NULL))) {
BUG_ON(!(*spte  PT_PRESENT_MASK));
rmap_printk(kvm_rmap_unmap_hva: spte %p %llx\n, spte, *spte);
-   rmap_remove(kvm, spte);
-   __set_spte(spte, shadow_trap_nonpresent_pte);
+   drop_spte(kvm, spte, shadow_trap_nonpresent_pte);
need_tlb_flush = 1;
}
return need_tlb_flush;
@@ -776,8 +781,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
rmap_printk(kvm_set_pte_rmapp: spte %p %llx\n, spte, *spte);
need_flush = 1;
if (pte_write(*ptep)) {
-   rmap_remove(kvm, spte);
-   __set_spte(spte, shadow_trap_nonpresent_pte);
+   drop_spte(kvm, spte, shadow_trap_nonpresent_pte);
spte = rmap_next(kvm, rmapp, NULL);
} else {
new_spte = *spte ~ (PT64_BASE_ADDR_MASK);
@@ -1498,13 +1502,14 @@ static void kvm_mmu_page_unlink_children(struct kvm 
*kvm,
ent = PT64_BASE_ADDR_MASK;
mmu_page_remove_parent_pte(page_header(ent),
   pt[i]);
+   pt[i] = shadow_trap_nonpresent_pte;
} else {
if (is_large_pte(ent))
--kvm-stat.lpages;
-   rmap_remove(kvm, pt[i]);
+   drop_spte(kvm, pt[i],
+ shadow_trap_nonpresent_pte);
}
}
-   pt[i] = shadow_trap_nonpresent_pte;
}
 }
 
@@ -1902,9 +1907,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (level  PT_PAGE_TABLE_LEVEL 
has_wrprotected_page(vcpu-kvm, gfn, level)) {
ret = 1;
-   rmap_remove(vcpu-kvm, sptep);
-   spte = shadow_trap_nonpresent_pte;
-   goto set_pte;
+   drop_spte(vcpu-kvm, sptep, shadow_trap_nonpresent_pte);
+   goto done;
}
 
spte |= PT_WRITABLE_MASK;
@@ -1936,6 +1940,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 set_pte:
__set_spte(sptep, spte);
+done:
return ret;
 }
 
@@ -1972,7 +1977,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
} else if (pfn != spte_to_pfn(*sptep)) {
pgprintk(hfn old %lx new %lx\n,
 spte_to_pfn(*sptep), pfn);
-   rmap_remove(vcpu-kvm, sptep);
+   drop_spte(vcpu-kvm, sptep, shadow_trap_nonpresent_pte);
} else
was_rmapped = 1;
}
@@ -2623,13 +2628,13 @@ static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
pte = *spte;
if (is_shadow_present_pte(pte)) 

[PATCH qemu-kvm] kvm: rename kvm/test/test/powerpc to kvm/test/powerpc

2010-06-06 Thread Ken CC
To make the directory architecture match the file config file
kvm/test/config-powerpc.mk.

Signed-off-by: Ken CC cc.5...@gmail.com
---
 kvm/test/powerpc/44x/tlbsx.S   |   33 
 kvm/test/powerpc/44x/tlbwe.S   |   27 +++
 kvm/test/powerpc/44x/tlbwe_16KB.S  |   35 +
 kvm/test/powerpc/44x/tlbwe_hole.S  |   27 +++
 kvm/test/powerpc/cstart.S  |   38 
 kvm/test/powerpc/exit.c|   23 +++
 kvm/test/powerpc/helloworld.c  |   27 +++
 kvm/test/powerpc/io.S  |   32 +++
 kvm/test/powerpc/spin.S|4 +++
 kvm/test/powerpc/sprg.S|7 ++
 kvm/test/test/powerpc/44x/tlbsx.S  |   33 
 kvm/test/test/powerpc/44x/tlbwe.S  |   27 ---
 kvm/test/test/powerpc/44x/tlbwe_16KB.S |   35 -
 kvm/test/test/powerpc/44x/tlbwe_hole.S |   27 ---
 kvm/test/test/powerpc/cstart.S |   38 
 kvm/test/test/powerpc/exit.c   |   23 ---
 kvm/test/test/powerpc/helloworld.c |   27 ---
 kvm/test/test/powerpc/io.S |   32 ---
 kvm/test/test/powerpc/spin.S   |4 ---
 kvm/test/test/powerpc/sprg.S   |7 --
 20 files changed, 253 insertions(+), 253 deletions(-)
 create mode 100644 kvm/test/powerpc/44x/tlbsx.S
 create mode 100644 kvm/test/powerpc/44x/tlbwe.S
 create mode 100644 kvm/test/powerpc/44x/tlbwe_16KB.S
 create mode 100644 kvm/test/powerpc/44x/tlbwe_hole.S
 create mode 100644 kvm/test/powerpc/cstart.S
 create mode 100644 kvm/test/powerpc/exit.c
 create mode 100644 kvm/test/powerpc/helloworld.c
 create mode 100644 kvm/test/powerpc/io.S
 create mode 100644 kvm/test/powerpc/spin.S
 create mode 100644 kvm/test/powerpc/sprg.S
 delete mode 100644 kvm/test/test/powerpc/44x/tlbsx.S
 delete mode 100644 kvm/test/test/powerpc/44x/tlbwe.S
 delete mode 100644 kvm/test/test/powerpc/44x/tlbwe_16KB.S
 delete mode 100644 kvm/test/test/powerpc/44x/tlbwe_hole.S
 delete mode 100644 kvm/test/test/powerpc/cstart.S
 delete mode 100644 kvm/test/test/powerpc/exit.c
 delete mode 100644 kvm/test/test/powerpc/helloworld.c
 delete mode 100644 kvm/test/test/powerpc/io.S
 delete mode 100644 kvm/test/test/powerpc/spin.S
 delete mode 100644 kvm/test/test/powerpc/sprg.S

diff --git a/kvm/test/powerpc/44x/tlbsx.S b/kvm/test/powerpc/44x/tlbsx.S
new file mode 100644
index 000..b15874b
--- /dev/null
+++ b/kvm/test/powerpc/44x/tlbsx.S
@@ -0,0 +1,33 @@
+#define SPRN_MMUCR 0x3b2
+
+#define TLBWORD0 0x1210
+#define TLBWORD1 0x1000
+#define TLBWORD2 0x0003
+
+.global _start
+_start:
+   li  r4, 0
+   mtspr   SPRN_MMUCR, r4
+
+   li  r3, 23
+
+   lis r4, tlbwo...@h
+   ori r4, r4, tlbwo...@l
+   tlbwe   r4, r3, 0
+
+   lis r4, tlbwo...@h
+   ori r4, r4, tlbwo...@l
+   tlbwe   r4, r3, 1
+
+   lis r4, tlbwo...@h
+   ori r4, r4, tlbwo...@l
+   tlbwe   r4, r3, 2
+
+   lis r4, 0x1000
+   tlbsx   r5, r4, r0
+   cmpwi   r5, 23
+   beq good
+   trap
+
+good:
+   b   .
diff --git a/kvm/test/powerpc/44x/tlbwe.S b/kvm/test/powerpc/44x/tlbwe.S
new file mode 100644
index 000..ec6ef5c
--- /dev/null
+++ b/kvm/test/powerpc/44x/tlbwe.S
@@ -0,0 +1,27 @@
+#define SPRN_MMUCR 0x3b2
+
+/* Create a mapping at 4MB */
+#define TLBWORD0 0x00400210
+#define TLBWORD1 0x0040
+#define TLBWORD2 0x0003
+
+.global _start
+_start:
+   li  r4, 0
+   mtspr   SPRN_MMUCR, r4
+
+   li  r3, 23
+
+   lis r4, tlbwo...@h
+   ori r4, r4, tlbwo...@l
+   tlbwe   r4, r3, 0
+
+   lis r4, tlbwo...@h
+   ori r4, r4, tlbwo...@l
+   tlbwe   r4, r3, 1
+
+   lis r4, tlbwo...@h
+   ori r4, r4, tlbwo...@l
+   tlbwe   r4, r3, 2
+
+   b   .
diff --git a/kvm/test/powerpc/44x/tlbwe_16KB.S 
b/kvm/test/powerpc/44x/tlbwe_16KB.S
new file mode 100644
index 000..1bd10bf
--- /dev/null
+++ b/kvm/test/powerpc/44x/tlbwe_16KB.S
@@ -0,0 +1,35 @@
+#define SPRN_MMUCR 0x3b2
+
+/* 16KB mapping at 4MB */
+#define TLBWORD0 0x00400220
+#define TLBWORD1 0x0040
+#define TLBWORD2 0x0003
+
+.global _start
+_start:
+   li  r4, 0
+   mtspr   SPRN_MMUCR, r4
+
+   li  r3, 5
+
+   lis r4, tlbwo...@h
+   ori r4, r4, tlbwo...@l
+   tlbwe   r4, r3, 0
+
+   lis r4, tlbwo...@h
+   ori r4, r4, tlbwo...@l
+   tlbwe   r4, r3, 1
+
+   lis r4, tlbwo...@h
+   ori r4, r4, tlbwo...@l
+   tlbwe   r4, r3, 2
+
+   /* load from 4MB */
+   lis r3, 0x0040
+   lwz r4, 0(r3)
+
+   /* load from 4MB+8KB */
+   ori

Re: [PATCH 0/3][STABLE] KVM: Various issues in virtio_net

2010-06-06 Thread Avi Kivity

On 06/03/2010 10:31 PM, Bruce Rogers wrote:

These are patches which we have found useful for our 2.6.32 based SLES 11 SP1 
release.

The first patch is already upstream, but should be included in stable.

The second patch is a subset of another upstream patch. Again, stable material.

The third patch solves the last remaining issue we saw when testing kvm 
configurations with the SUSE certification test suite. Under heavy load, we 
observed rx stalls (first two patches applied), and this third patch was 
crafted to address the issue. Please apply to stable.
I assume this last problem also exists in more recent kernels than 2.6.32, but 
I haven't validated that.

With these 3 patches applied we no longer see any issues with virito networking 
using our certification test suite.

   



Please send virtio patches, including patches for the stable branches, 
to the virtio maintainer (Rusty).  Copying Michael (m...@redhat.com) is 
also a good idea.


Also:

- use git format-patch/git sent-email or equivalents; these ensure the 
correct formatting and group the patches together for threaded mail clients
- indicate the upstream commit (often it's enough to ask the maintainers 
to backport a commit)


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: mmu: Remove unused local variable

2010-06-06 Thread Avi Kivity

On 06/03/2010 10:08 AM, Jan Kiszka wrote:

From: Jan Kiszkajan.kis...@siemens.com

Signed-off-by: Jan Kiszkajan.kis...@siemens.com
---

No one else checks for new build warnings?

   


My fault, sorry, I updated the patch in place to remove a dangling r = 
something; and introduced two new warnings.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH qemu-kvm] kvm: rename kvm/test/test/powerpc to kvm/test/powerpc

2010-06-06 Thread Avi Kivity

On 06/06/2010 04:13 PM, Ken CC wrote:

To make the directory architecture match the file config file
kvm/test/config-powerpc.mk.

   


Can you please resent with 'diff.renames' set in your git 
configuration?  That will make the patch reviewable, as it will convert 
it to a pattern of moves instead of creates/deletes.


Also, isn't a Makefile change needed?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-06-06 Thread Avi Kivity

On 06/02/2010 07:53 PM, Chris Wright wrote:

* Avi Kivity (a...@redhat.com) wrote:
   

The interface would only work for clients which support it: kvm,
vhost, and iommu/devices with restartable dma.
 

BTW, there is no such thing as restartable dma.  There is a provision in
new specs (read: no real hardware) that allows a device to request pages
before using them.  So it's akin to demand paging, but the demand is an
explicit request rather than a page fault.
   


Thanks for the correction.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] KVM: MMU: Introduce drop_spte()

2010-06-06 Thread Avi Kivity

On 06/06/2010 04:06 PM, Avi Kivity wrote:

When we call rmap_remove(), we (almost) always immediately follow it by
an __set_spte() to a nonpresent pte.  Since we need to perform the two
operations atomically, to avoid losing the dirty and accessed bits, introduce
a helper drop_spte() and convert all call sites.

The operation is still nonatomic at this point.


@@ -1498,13 +1502,14 @@ static void kvm_mmu_page_unlink_children(struct kvm 
*kvm,
ent= PT64_BASE_ADDR_MASK;
mmu_page_remove_parent_pte(page_header(ent),
pt[i]);
+   pt[i] = shadow_trap_nonpresent_pte;
} else {
if (is_large_pte(ent))
--kvm-stat.lpages;
-   rmap_remove(kvm,pt[i]);
+   drop_spte(kvm,pt[i],
+ shadow_trap_nonpresent_pte);
}
}
-   pt[i] = shadow_trap_nonpresent_pte;
}
  }
   


Autotest points out that this transformation (and an identical one in 
zap_pte) does not preserve the semantics; if the outer if () fails, the 
new code does not update pt[i].


With the original line after the if () retained, autotest is happier.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2933400 ] virtio-blk io errors / data corruption on raw drives 1 TB

2010-06-06 Thread SourceForge.net
Bugs item #2933400, was opened at 2010-01-16 17:35
Message generated for change (Comment added) made by nevenchannyy
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2933400group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: MaSc82 (masc82)
Assigned to: Nobody/Anonymous (nobody)
Summary: virtio-blk io errors / data corruption on raw drives  1 TB

Initial Comment:
When attaching raw drives  1 TB, buffer io errors will most likely occur, 
filesystems get corrupted. Processes (like mkfs.ext4) might freeze completely 
when filesystems are created on the guest.

Here's a typical log excerpt when using mkfs.ext4 on a 1.5 TB drive on a Ubuntu 
9.10 guest:
(kern.log)
Jan 15 20:40:44 q kernel: [  677.076602] Buffer I/O error on device vde, 
logical block 366283764
Jan 15 20:40:44 q kernel: [  677.076607] Buffer I/O error on device vde, 
logical block 366283765
Jan 15 20:40:44 q kernel: [  677.076611] Buffer I/O error on device vde, 
logical block 366283766
Jan 15 20:40:44 q kernel: [  677.076616] Buffer I/O error on device vde, 
logical block 366283767
Jan 15 20:40:44 q kernel: [  677.076621] Buffer I/O error on device vde, 
logical block 366283768
Jan 15 20:40:44 q kernel: [  677.076626] Buffer I/O error on device vde, 
logical block 366283769
(messages)
Jan 15 20:40:44 q kernel: [  677.076534] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076541] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076546] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076599] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076604] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076609] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076613] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076618] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076623] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076628] lost page write due to I/O error on vde
Jan 15 20:45:55 q Backgrounding to notify hosts...
(The following entries will repeat infinitely, mkfs.ext4 will not exit and 
cannot be killed)
Jan 15 20:49:27 q kernel: [ 1200.520096] mkfs.ext4 D  0 
 1839   1709 0x
Jan 15 20:49:27 q kernel: [ 1200.520101]  88004e157cb8 0082 
88004e157c58 00015880
Jan 15 20:49:27 q kernel: [ 1200.520115]  88004ef6c7c0 00015880 
00015880 00015880
Jan 15 20:49:27 q kernel: [ 1200.520118]  00015880 88004ef6c7c0 
00015880 00015880
Jan 15 20:49:27 q kernel: [ 1200.520123] Call Trace:
Jan 15 20:49:27 q kernel: [ 1200.520157]  [810da0f0] ? 
sync_page+0x0/0x50
Jan 15 20:49:27 q kernel: [ 1200.520178]  [815255f8] 
io_schedule+0x28/0x40
Jan 15 20:49:27 q kernel: [ 1200.520182]  [810da12d] 
sync_page+0x3d/0x50
Jan 15 20:49:27 q kernel: [ 1200.520185]  [81525b17] 
__wait_on_bit+0x57/0x80
Jan 15 20:49:27 q kernel: [ 1200.520192]  [810da29e] 
wait_on_page_bit+0x6e/0x80
Jan 15 20:49:27 q kernel: [ 1200.520205]  [81078650] ? 
wake_bit_function+0x0/0x40
Jan 15 20:49:27 q kernel: [ 1200.520210]  [810e44e0] ? 
pagevec_lookup_tag+0x20/0x30
Jan 15 20:49:27 q kernel: [ 1200.520213]  [810da745] 
wait_on_page_writeback_range+0xf5/0x190
Jan 15 20:49:27 q kernel: [ 1200.520217]  [810da807] 
filemap_fdatawait+0x27/0x30
Jan 15 20:49:27 q kernel: [ 1200.520220]  [810dacb4] 
filemap_write_and_wait+0x44/0x50
Jan 15 20:49:27 q kernel: [ 1200.520235]  [8114ba9f] 
__sync_blockdev+0x1f/0x40
Jan 15 20:49:27 q kernel: [ 1200.520239]  [8114bace] 
sync_blockdev+0xe/0x10
Jan 15 20:49:27 q kernel: [ 1200.520241]  [8114baea] 
block_fsync+0x1a/0x20
Jan 15 20:49:27 q kernel: [ 1200.520249]  [81142f26] 
vfs_fsync+0x86/0xf0
Jan 15 20:49:27 q kernel: [ 1200.520252]  [81142fc9] 
do_fsync+0x39/0x60
Jan 15 20:49:27 q kernel: [ 1200.520255]  [8114301b] 
sys_fsync+0xb/0x10
Jan 15 20:49:27 q kernel: [ 1200.520271]  [81011fc2] 
system_call_fastpath+0x16/0x1b

In my case I was switching to virtio at one point, but the behaviour didn't 
show until there was  1 TB data on the filesystem. very dangerous.

Tested using 2 different SATA controllers, 1.5 TB lvm/mdraid, single 1.5 TB 
drive and 2 TB lvm/mdraid.
The behaviour does not occur with if=scsi or if=ide.

#2914397 might be related: 
https://sourceforge.net/tracker/?func=detailaid=2914397group_id=180599atid=893831
This blog post might also relate: 
http://www.neuhalfen.name/2009/08/05/OpenSolaris_KVM_and_large_IDE_drives/

CPU: 

Re: Perf trace event parse errors for KVM events

2010-06-06 Thread Steven Rostedt
On Sun, 2010-06-06 at 11:08 +0300, Avi Kivity wrote:

  Also, I kicked this off in kernelshark, and it made no difference that I
  can see. This is because kernelshark only evaluates the viewable area of
  the screen.
 
 
 Neat.  Can it also search?  Where can I find it? googles, finds, gawks
 

It's in the same repo as trace-cmd:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git

I'm still working on it. The latest is in the branch kernelshark-devel.

And yes, it does searches.

-- Steve


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v6 5/6] Inter-VM shared memory PCI device

2010-06-06 Thread Avi Kivity

On 06/05/2010 12:44 PM, Blue Swirl wrote:

On Fri, Jun 4, 2010 at 9:45 PM, Cam Macdonellc...@cs.ualberta.ca  wrote:
   

Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest.  This patch also supports interrupts between guest by
communicating over a unix domain socket.  This patch applies to the qemu-kvm
repository.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]

Interrupts are supported between multiple VMs by using a shared memory server
by using a chardev socket.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]
   [,chardev=id][,msi=on][,irqfd=on][,vectors=n][,role=peer|master]
-chardev socket,path=path,id=id

(shared memory server is qemu.git/contrib/ivshmem-server)

Sample programs and init scripts are in a git repo here:

 

Why is this KVM specific BTW, Posix SHM is available on many
platforms? What would happen if kvm_set_foobar functions were not
called when KVM is not being used? Is host eventfd support essential?
   


It's not kvm specific, it's 
atomic-ops-on-shared-memory-are-visible-as-atomic-ops specific, which is 
currently only available with kvm.  When tcg gains true smp support (and 
not just against other tcg threads) this can work with tcg as well.


I guess that needs a host with at least 32/64 bit CAS for 32/64 bit 
targets respectively, and double that if the target has DCAS.  Not sure 
how targets with ll/sc can be implemented, especially if there are 
limits as to what can go in between.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: porting fixes regarding kvm-clock and lost irqs to stable qemu-kvm 0.12.4

2010-06-06 Thread Faidon Liambotis
Peter Lieven wrote:
 i was looking for a generic way to disable it in the hypervisor
 without the need to touch every guests kernel commandline.
 this would be easy revertible once its all working as expected.
 This is a huge hack but for the same reason I've made an LD_PRELOAD
 wrapper that wraps ioctl masking the hsot kernel's kvmclock feature from
 userspace kvm. I'll be happy to share it if you're interested.
 
 i would be grateful if you can sent it to me offlist.
http://people.debian.org/~paravoid/kvm-noclock-3.tar.gz

 however, i came up with this hack,
 that seems to work also:
snip

Yes, that's equivalent. We didn't use that since we wanted to use our
distribution's binaries and be able to upgrade without modifying the source.

Regards,
Faidon
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: rework remove-write-access for a slot

2010-06-06 Thread Avi Kivity

On 06/04/2010 11:14 AM, Lai Jiangshan wrote:




- I thought of a different approach to write protection: write protect
the L4 sptes, on write fault add write permission to the L4 spte and
write protect the L3 sptes that it points to, etc.  This method can use
the slot bitmap to reduce the number of write faults.  However we can
reintroduce the slot bitmap if/when we use the method, this shouldn't
block the patch.
 

It is very a good approach and it is blazing fast.

I have no time to implement it currently,
could you update it into the TODO list?
   


Done.


+static void rmapp_remove_write_access(struct kvm *kvm, unsigned long
*rmapp)
+{
+u64 *spte = rmap_next(kvm, rmapp, NULL);
+
+while (spte) {
+/* avoid RMW */
+if (is_writable_pte(*spte))
+*spte= ~PT_WRITABLE_MASK;
   

Must use an atomic operation here to avoid losing dirty or accessed bit.

 

Atomic operation is too expensive, I retained the comment /* avoid RMW */
and wait someone take a good approach for it.
   


You are right, it is an existing problem.  I just posted a patchset 
which fixes the problem, when it's merged please rebase on top.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] make-release: make mtime, owner, group consistent

2010-06-06 Thread Michael S. Tsirkin
On Fri, Jun 04, 2010 at 01:46:25PM -0300, Marcelo Tosatti wrote:
 On Wed, Jun 02, 2010 at 06:27:20PM +0300, Michael S. Tsirkin wrote:
  Files from git have modification time set to one
  of commit, and owner/group to root.
  Making it so for generated files as well makes
  it easier to generate an identical tarball from git.
  
  Setting owner/group to root is especially important because
  otherwise you must have a user/group with same name
  to generate an identical tarball.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   kvm/scripts/make-release |9 +++--
   1 files changed, 7 insertions(+), 2 deletions(-)
 
 Applied, thanks.

Could you re-generate and upload the qemu-kvm-0.12.4 release
tarball with this applied?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [COMMIT master] Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into next

2010-06-06 Thread Avi Kivity

On 06/06/2010 07:09 PM, Avi Kivity wrote:

From: Avi Kivitya...@redhat.com

* 'master' of 
ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6: (1443 
commits)
   Linux 2.6.35-rc2
   drm/i915: Move non-phys cursors into the GTT
   ext4: Fix remaining racy updates of EXT4_I(inode)-i_flags
   module: fix bne2 gave up waiting for init of module libcrc32c
   module: verify_export_symbols under the lock
   module: move find_module check to end
   module: make locking more fine-grained.
   module: Make module sysfs functions private.
   module: move sysfs exposure to end of load_module
   module: fix kdb's illicit use of struct module_use.
   module: Make the 'usage' lists be two-way
   X25: remove duplicated #include
   tcp: use correct net ns in cookie_v4_check()
   rps: tcp: fix rps_sock_flow_table table updates
   ppp_generic: fix multilink fragment sizes
   syncookies: remove Kconfig text line about disabled-by-default
   ixgbe: only check pfc bits in hang logic if pfc is enabled
   net: check for refcount if pop a stacked dst_entry
   omap: remove BUG_ON for disabled interrupts
   vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim 
failure
   ...

Conflicts:
arch/powerpc/kernel/ppc_ksyms.c
   


Alex, can you verify that I merged this correctly (dropping the export)?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Perf trace event parse errors for KVM events

2010-06-06 Thread Avi Kivity

On 06/06/2010 06:02 PM, Steven Rostedt wrote:

On Sun, 2010-06-06 at 11:08 +0300, Avi Kivity wrote:

   

Also, I kicked this off in kernelshark, and it made no difference that I
can see. This is because kernelshark only evaluates the viewable area of
the screen.

   

Neat.  Can it also search?  Where can I find it?googles, finds, gawks

 

It's in the same repo as trace-cmd:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git

I'm still working on it. The latest is in the branch kernelshark-devel.

And yes, it does searches.
   


I'll be tracking it.

In case you're interested in wishlists, my #1 would be to be able to 
initiate traces from the GUI (so it's easy to explore the available 
trace events and enable them interactively).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: porting fixes regarding kvm-clock and lost irqs to stable qemu-kvm 0.12.4

2010-06-06 Thread Peter Lieven

Am 06.06.2010 um 16:53 schrieb Faidon Liambotis:

 Peter Lieven wrote:
 i was looking for a generic way to disable it in the hypervisor
 without the need to touch every guests kernel commandline.
 this would be easy revertible once its all working as expected.
 This is a huge hack but for the same reason I've made an LD_PRELOAD
 wrapper that wraps ioctl masking the hsot kernel's kvmclock feature from
 userspace kvm. I'll be happy to share it if you're interested.
 
 i would be grateful if you can sent it to me offlist.
 http://people.debian.org/~paravoid/kvm-noclock-3.tar.gz
 
 however, i came up with this hack,
 that seems to work also:
 snip
 
 Yes, that's equivalent. We didn't use that since we wanted to use our
 distribution's binaries and be able to upgrade without modifying the source.

thank you. 

i want to share this patch for stable until -cpu xxx,-kvmclock works in the 
stable
tree.

diff -ur qemu-kvm-0.12.4.orig/qemu-kvm.c qemu-kvm-0.12.4/qemu-kvm.c
--- qemu-kvm-0.12.4.orig/qemu-kvm.c 2010-05-09 13:05:19.0 +0200
+++ qemu-kvm-0.12.4/qemu-kvm.c  2010-06-06 16:58:12.0 +0200
@@ -54,7 +54,7 @@
 int kvm_pit = 1;
 int kvm_pit_reinject = 1;
 int kvm_nested = 0;
-
+int kvm_clock = 1;
 
 KVMState *kvm_state;
 kvm_context_t kvm_context;
diff -ur qemu-kvm-0.12.4.orig/qemu-kvm.h qemu-kvm-0.12.4/qemu-kvm.h
--- qemu-kvm-0.12.4.orig/qemu-kvm.h 2010-05-09 13:05:19.0 +0200
+++ qemu-kvm-0.12.4/qemu-kvm.h  2010-06-06 17:04:02.0 +0200
@@ -1035,6 +1035,7 @@
 extern int kvm_pit;
 extern int kvm_pit_reinject;
 extern int kvm_nested;
+extern int kvm_clock;
 extern kvm_context_t kvm_context;
 
 struct ioperm_data {
diff -ur qemu-kvm-0.12.4.orig/qemu-kvm-x86.c qemu-kvm-0.12.4/qemu-kvm-x86.c
--- qemu-kvm-0.12.4.orig/qemu-kvm-x86.c 2010-05-09 13:05:19.0 +0200
+++ qemu-kvm-0.12.4/qemu-kvm-x86.c  2010-06-06 16:46:08.0 +0200
@@ -1259,6 +1259,7 @@
int i, features = 0;
 
for (i = 0; i  ARRAY_SIZE(para_features)-1; i++) {
+if (para_features[i].cap == KVM_CAP_CLOCKSOURCE  !kvm_clock) 
continue;
if (kvm_check_extension(kvm_state, para_features[i].cap))
features |= (1  para_features[i].feature);
}
diff -ur qemu-kvm-0.12.4.orig/qemu-options.hx qemu-kvm-0.12.4/qemu-options.hx
--- qemu-kvm-0.12.4.orig/qemu-options.hx2010-05-09 13:05:19.0 
+0200
+++ qemu-kvm-0.12.4/qemu-options.hx 2010-06-06 16:38:46.0 +0200
@@ -1959,6 +1959,9 @@
 -no-kvm-pit disable KVM kernel mode PIT\n)
 DEF(no-kvm-pit-reinjection, 0, QEMU_OPTION_no_kvm_pit_reinjection,
 -no-kvm-pit-reinjection disable KVM kernel mode PIT interrupt 
reinjection\n)
+DEF(no-kvm-clock, 0, QEMU_OPTION_no_kvm_clock,
+-no-kvm-clock   disable KVM pvclock\n)
+
 #if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(TARGET_IA64) || 
defined(__linux__)
 DEF(pcidevice, HAS_ARG, QEMU_OPTION_pcidevice,
 -pcidevice host=bus:dev.func[,dma=none][,name=string]\n
diff -ur qemu-kvm-0.12.4.orig/vl.c qemu-kvm-0.12.4/vl.c
--- qemu-kvm-0.12.4.orig/vl.c   2010-05-09 13:05:19.0 +0200
+++ qemu-kvm-0.12.4/vl.c2010-06-06 17:05:53.0 +0200
@@ -5530,6 +5530,10 @@
 kvm_pit_reinject = 0;
 break;
 }
+case QEMU_OPTION_no_kvm_clock: {
+kvm_clock = 0;
+break;
+}
case QEMU_OPTION_enable_nesting: {
kvm_nested = 1;
break;


 
 Regards,
 Faidon
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v4 3/3] block: add sheepdog driver for distributed storage support

2010-06-06 Thread MORITA Kazutaka
At Fri, 04 Jun 2010 13:04:00 +0200,
Kevin Wolf wrote:
 
 Am 03.06.2010 18:23, schrieb MORITA Kazutaka:
  +static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
  +{
  + SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb;
  +
  + acb-canceled = 1;
  +}
 
  Does this provide the right semantics? You haven't really cancelled the
  request, but you pretend to. So you actually complete the request in the
  background and then throw the return code away.
 
  I seem to remember that posix-aio-compat.c waits at this point for
  completion of the requests, calls the callbacks and only afterwards
  returns from aio_cancel when no more requests are in flight.
 
  Or if you can really cancel requests, it would be the best option, of
  course.
 
  
  Sheepdog cannot cancel the requests which are already sent to the
  servers.  So, as you say, we pretend to cancel the requests without
  waiting for completion of them.  However, are there any situation
  where pretending to cancel causes problems in practice?
 
 I'm not sure how often it would happen in practice, but if the guest OS
 thinks the old value is on disk when in fact the new one is, this could
 lead to corruption. I think if it can happen, even without evidence that
 it actually does, it's already relevant enough.
 

I agree.

  To wait for completion of the requests here, we may need to create
  another thread for processing I/O like posix-aio-compat.c.
 
 I don't think you need a thread to get the same behaviour, you just need
 to call the fd handlers like in the main loop. It would probably be the
 first driver doing this, though, and it's not an often used code path,
 so it might be a bad idea.
 
 Maybe it's reasonable to just complete the request with -EIO? This way
 the guest couldn't make any assumption about the data written. On the
 other hand, it could be unhappy about failed requests, but that's
 probably better than corruption.
 

Completing with -EIO looks good to me.  Thanks for the advice.
I'll send an updated patch tomorrow.

Regards,

Kazutaka
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mmu_notifers, pte_dirty questions

2010-06-06 Thread Andrea Arcangeli
On Sun, Jun 06, 2010 at 03:07:27PM +0300, Avi Kivity wrote:
 Why no notifer when testing and clearing the dirty bit?
 
 (*clear_flush_dirty)(...).
 
  static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
  unsigned long address)
  {
  struct mm_struct *mm = vma-vm_mm;
  pte_t *pte;
  spinlock_t *ptl;
  int ret = 0;
 
  pte = page_check_address(page, mm, address, ptl, 1);
  if (!pte)
  goto out;
 
  if (pte_dirty(*pte) || pte_write(*pte)) {
  pte_t entry;
 
  flush_cache_page(vma, address, pte_pfn(*pte));
  entry = ptep_clear_flush_notify(vma, address, pte);
  entry = pte_wrprotect(entry);
  entry = pte_mkclean(entry);
  set_pte_at(mm, address, pte, entry);
 
 set_pte_at_notify()?  without this (or clear_flush_dirty) Linux will 
 assume all ptes are now clean; if the guest writes to a page nothing 
 will catch it.
 
 - with set_pte_at_notify(), we can drop the spte and mark the page as 
 dirty, so the next write will re-instantiate the spte
 - with -clear_flush_dirty(), we can track the dirty state without 
 dropping the spte.
 
  ret = 1;
  }
 
  pte_unmap_unlock(pte, ptl);
  out:
  return ret;
 
 I'm probably missing something big as I can't see how this works.

Under the PT lock it's safe to keep the PTE zero, just the pte must be
non zero again before pte_unmap_unlock.

The sptes are all gone by the time ptep_clear_flush_notify returns
(also gup-fast is stopped with the IPI of the flush) and no spte can
be established again before pte_unmap_unlock runs, so it's all safe as
far as I can tell.

set_pte_at_notify might prevent a minor fault though.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [COMMIT master] Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into next

2010-06-06 Thread Alexander Graf

On 06.06.2010, at 18:13, Avi Kivity wrote:

 On 06/06/2010 07:09 PM, Avi Kivity wrote:
 From: Avi Kivitya...@redhat.com
 
 * 'master' of 
 ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6: (1443 
 commits)
   Linux 2.6.35-rc2
   drm/i915: Move non-phys cursors into the GTT
   ext4: Fix remaining racy updates of EXT4_I(inode)-i_flags
   module: fix bne2 gave up waiting for init of module libcrc32c
   module: verify_export_symbols under the lock
   module: move find_module check to end
   module: make locking more fine-grained.
   module: Make module sysfs functions private.
   module: move sysfs exposure to end of load_module
   module: fix kdb's illicit use of struct module_use.
   module: Make the 'usage' lists be two-way
   X25: remove duplicated #include
   tcp: use correct net ns in cookie_v4_check()
   rps: tcp: fix rps_sock_flow_table table updates
   ppp_generic: fix multilink fragment sizes
   syncookies: remove Kconfig text line about disabled-by-default
   ixgbe: only check pfc bits in hang logic if pfc is enabled
   net: check for refcount if pop a stacked dst_entry
   omap: remove BUG_ON for disabled interrupts
   vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim 
 failure
   ...
 
 Conflicts:
  arch/powerpc/kernel/ppc_ksyms.c
   
 
 Alex, can you verify that I merged this correctly (dropping the export)?

Yes, dropping was the right choice :). Thank you!

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.

2010-06-06 Thread Stephen Hemminger
Still not sure this is a good idea for a couple of reasons:

1. We already have lots of special cases with skb's (frags and fraglist),
   and skb's travel through a lot of different parts of the kernel.  So any
   new change like this creates lots of exposed points for new bugs. Look
   at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
   and ppp and ...

2. SKB's can have infinite lifetime in the kernel. If these buffers come from
   a fixed size pool in an external device, they can easily all get tied up
   if you have a slow listener. What happens then?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH QEMU-KVM v2 diff.renames] kvm: rename kvm/test/test/powerpc to kvm/test/powerpc

2010-06-06 Thread Ken CC
To make the directory architecture match the make config file
kvm/test/config-powerpc.mk.

Signed-off-by: Ken CC cc.5...@gmail.com
---
 kvm/test/{ = }/test/powerpc/44x/tlbsx.S  |0
 kvm/test/{ = }/test/powerpc/44x/tlbwe.S  |0
 kvm/test/{ = }/test/powerpc/44x/tlbwe_16KB.S |0
 kvm/test/{ = }/test/powerpc/44x/tlbwe_hole.S |0
 kvm/test/{ = }/test/powerpc/cstart.S |0
 kvm/test/{ = }/test/powerpc/exit.c   |0
 kvm/test/{ = }/test/powerpc/helloworld.c |0
 kvm/test/{ = }/test/powerpc/io.S |0
 kvm/test/{ = }/test/powerpc/spin.S   |0
 kvm/test/{ = }/test/powerpc/sprg.S   |0
 10 files changed, 0 insertions(+), 0 deletions(-)
 rename kvm/test/{ = }/test/powerpc/44x/tlbsx.S (100%)
 rename kvm/test/{ = }/test/powerpc/44x/tlbwe.S (100%)
 rename kvm/test/{ = }/test/powerpc/44x/tlbwe_16KB.S (100%)
 rename kvm/test/{ = }/test/powerpc/44x/tlbwe_hole.S (100%)
 rename kvm/test/{ = }/test/powerpc/cstart.S (100%)
 rename kvm/test/{ = }/test/powerpc/exit.c (100%)
 rename kvm/test/{ = }/test/powerpc/helloworld.c (100%)
 rename kvm/test/{ = }/test/powerpc/io.S (100%)
 rename kvm/test/{ = }/test/powerpc/spin.S (100%)
 rename kvm/test/{ = }/test/powerpc/sprg.S (100%)

diff --git a/kvm/test/test/powerpc/44x/tlbsx.S b/kvm/test/powerpc/44x/tlbsx.S
similarity index 100%
rename from kvm/test/test/powerpc/44x/tlbsx.S
rename to kvm/test/powerpc/44x/tlbsx.S
diff --git a/kvm/test/test/powerpc/44x/tlbwe.S b/kvm/test/powerpc/44x/tlbwe.S
similarity index 100%
rename from kvm/test/test/powerpc/44x/tlbwe.S
rename to kvm/test/powerpc/44x/tlbwe.S
diff --git a/kvm/test/test/powerpc/44x/tlbwe_16KB.S 
b/kvm/test/powerpc/44x/tlbwe_16KB.S
similarity index 100%
rename from kvm/test/test/powerpc/44x/tlbwe_16KB.S
rename to kvm/test/powerpc/44x/tlbwe_16KB.S
diff --git a/kvm/test/test/powerpc/44x/tlbwe_hole.S 
b/kvm/test/powerpc/44x/tlbwe_hole.S
similarity index 100%
rename from kvm/test/test/powerpc/44x/tlbwe_hole.S
rename to kvm/test/powerpc/44x/tlbwe_hole.S
diff --git a/kvm/test/test/powerpc/cstart.S b/kvm/test/powerpc/cstart.S
similarity index 100%
rename from kvm/test/test/powerpc/cstart.S
rename to kvm/test/powerpc/cstart.S
diff --git a/kvm/test/test/powerpc/exit.c b/kvm/test/powerpc/exit.c
similarity index 100%
rename from kvm/test/test/powerpc/exit.c
rename to kvm/test/powerpc/exit.c
diff --git a/kvm/test/test/powerpc/helloworld.c b/kvm/test/powerpc/helloworld.c
similarity index 100%
rename from kvm/test/test/powerpc/helloworld.c
rename to kvm/test/powerpc/helloworld.c
diff --git a/kvm/test/test/powerpc/io.S b/kvm/test/powerpc/io.S
similarity index 100%
rename from kvm/test/test/powerpc/io.S
rename to kvm/test/powerpc/io.S
diff --git a/kvm/test/test/powerpc/spin.S b/kvm/test/powerpc/spin.S
similarity index 100%
rename from kvm/test/test/powerpc/spin.S
rename to kvm/test/powerpc/spin.S
diff --git a/kvm/test/test/powerpc/sprg.S b/kvm/test/powerpc/sprg.S
similarity index 100%
rename from kvm/test/test/powerpc/sprg.S
rename to kvm/test/powerpc/sprg.S
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH qemu-kvm] kvm: rename kvm/test/test/powerpc to kvm/test/powerpc

2010-06-06 Thread Ken CC
On Sun, Jun 06, 2010 at 04:22:45PM +0300, Avi Kivity wrote:
 On 06/06/2010 04:13 PM, Ken CC wrote:
 To make the directory architecture match the file config file
 kvm/test/config-powerpc.mk.
 
 
 Can you please resent with 'diff.renames' set in your git
 configuration?  That will make the patch reviewable, as it will
 convert it to a pattern of moves instead of creates/deletes.
 

Hi, Avi,

I have resent the patch, please check.

 Also, isn't a Makefile change needed?
 

the content of config-powerpc.mak shares the same directory structure
with that of config-x86-common.mak, so i think it is not needed to
change Makefile while moving powerpc out of the test dir.


Regards,

Ken CC

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM VMX: Add all-context INVVPID type support

2010-06-06 Thread Gui Jianfeng
Add all-context INVVPID type support.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/include/asm/vmx.h |1 +
 arch/x86/kvm/vmx.c |   23 +--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index b4e2840..96a5886 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -377,6 +377,7 @@ enum vmcs_field {
 #define VMX_EPT_EXTENT_GLOBAL_BIT  (1ull  26)
 
 #define VMX_VPID_EXTENT_SINGLE_CONTEXT_BIT  (1ull  9) /* (41 - 32) */
+#define VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT  (1ull  10) /* (42 - 32) */
 
 #define VMX_EPT_DEFAULT_GAW3
 #define VMX_EPT_MAX_GAW0x4
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 240a407..aaf8a28 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -360,6 +360,11 @@ static inline bool cpu_has_vmx_invvpid_single(void)
return vmx_capability.vpid  VMX_VPID_EXTENT_SINGLE_CONTEXT_BIT;
 }
 
+static inline bool cpu_has_vmx_invvpid_global(void)
+{
+   return vmx_capability.vpid  VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT;
+}
+
 static inline bool cpu_has_vmx_ept(void)
 {
return vmcs_config.cpu_based_2nd_exec_ctrl 
@@ -508,6 +513,20 @@ static inline void vpid_sync_vcpu_all(struct vcpu_vmx *vmx)
__invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vmx-vpid, 0);
 }
 
+static inline void vpid_sync_vcpu_global(void)
+{
+   if (cpu_has_vmx_invvpid_global())
+   __invvpid(VMX_VPID_EXTENT_ALL_CONTEXT, 0, 0);
+}
+
+static inline void vpid_sync_context(struct vcpu_vmx *vmx)
+{
+   if (cpu_has_vmx_invvpid_single())
+   vpid_sync_vcpu_all(vmx);
+   else
+   vpid_sync_vcpu_global();
+}
+
 static inline void ept_sync_global(void)
 {
if (cpu_has_vmx_invept_global())
@@ -1797,7 +1816,7 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
 {
-   vpid_sync_vcpu_all(to_vmx(vcpu));
+   vpid_sync_context(to_vmx(vcpu));
if (enable_ept)
ept_sync_context(construct_eptp(vcpu-arch.mmu.root_hpa));
 }
@@ -2753,7 +2772,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmx_fpu_activate(vmx-vcpu);
update_exception_bitmap(vmx-vcpu);
 
-   vpid_sync_vcpu_all(vmx);
+   vpid_sync_context(vmx);
 
ret = 0;
 
-- 
1.6.5.2


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM VMX: rename vpid_sync_vcpu_all() to vpid_sync_vcpu_single()

2010-06-06 Thread Gui Jianfeng
The name pid_sync_vcpu_all isn't appropriate since it just affect
a single vpid, so rename it to vpid_sync_vcpu_single().

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aaf8a28..4d893ac 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -504,7 +504,7 @@ static void vcpu_clear(struct vcpu_vmx *vmx)
smp_call_function_single(vmx-vcpu.cpu, __vcpu_clear, vmx, 1);
 }
 
-static inline void vpid_sync_vcpu_all(struct vcpu_vmx *vmx)
+static inline void vpid_sync_vcpu_single(struct vcpu_vmx *vmx)
 {
if (vmx-vpid == 0)
return;
@@ -522,7 +522,7 @@ static inline void vpid_sync_vcpu_global(void)
 static inline void vpid_sync_context(struct vcpu_vmx *vmx)
 {
if (cpu_has_vmx_invvpid_single())
-   vpid_sync_vcpu_all(vmx);
+   vpid_sync_vcpu_single(vmx);
else
vpid_sync_vcpu_global();
 }
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Reboot Problem

2010-06-06 Thread Chaitra Gorantla
Hi all,

We know that kernel reboots after any panic or crash.

I am using 2.6.21 kernel (HOST machine) and KVM-88 package.

After inserting kvm-intel.ko , if any kernel crash occurs,  the Host halts= .
The kernel never reboots.

Please suggest regarding this.

Thanks in advance,
Chaitra


This Email may contain confidential or privileged information for the intended 
recipient (s) If you are not the intended recipient, please do not use or 
disseminate the information, notify the sender and delete it from your system.

__
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reboot Problem

2010-06-06 Thread Sterling Windmill
The Linux kernel does not reboot on panic by default.

Read here: 
http://www.cyberciti.biz/tips/reboot-linux-box-after-a-kernel-panic.html

- Original Message -
From: Chaitra Gorantla chaitra.goran...@lntinfotech.com
To: kvm@vger.kernel.org
Sent: Sunday, June 6, 2010 10:59:11 PM
Subject: Reboot Problem

Hi all,

We know that kernel reboots after any panic or crash.

I am using 2.6.21 kernel (HOST machine) and KVM-88 package.

After inserting kvm-intel.ko , if any kernel crash occurs, the Host
halts= .
The kernel never reboots.

Please suggest regarding this.

Thanks in advance,
Chaitra


This Email may contain confidential or privileged information for the
intended recipient (s) If you are not the intended recipient, please do
not use or disseminate the information, notify the sender and delete it
from your system.

__
-- To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Reboot Problem

2010-06-06 Thread Chaitra Gorantla
Hi,

Yes, Insipte of adding
kernel.panic = 10
in sysctl.conf,
the kernel does not reboot when kvm-i9ntel.ko is been inserted.

Do you have any idea , because, when the kvm-intel.ko is not inserted, kernel 
reboots.


-Original Message-
From: Sterling Windmill [mailto:sterl...@ampx.net]
Sent: Monday, June 07, 2010 8:49 AM
To: Chaitra Gorantla
Cc: kvm@vger.kernel.org
Subject: Re: Reboot Problem

The Linux kernel does not reboot on panic by default.

Read here: 
http://www.cyberciti.biz/tips/reboot-linux-box-after-a-kernel-panic.html

- Original Message -
From: Chaitra Gorantla chaitra.goran...@lntinfotech.com
To: kvm@vger.kernel.org
Sent: Sunday, June 6, 2010 10:59:11 PM
Subject: Reboot Problem

Hi all,

We know that kernel reboots after any panic or crash.

I am using 2.6.21 kernel (HOST machine) and KVM-88 package.

After inserting kvm-intel.ko , if any kernel crash occurs, the Host
halts= .
The kernel never reboots.

Please suggest regarding this.

Thanks in advance,
Chaitra


This Email may contain confidential or privileged information for the
intended recipient (s) If you are not the intended recipient, please do
not use or disseminate the information, notify the sender and delete it
from your system.

__
-- To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

__

This Email may contain confidential or privileged information for the intended 
recipient (s) If you are not the intended recipient, please do not use or 
disseminate the information, notify the sender and delete it from your system.

__
N�r��yb�X��ǧv�^�)޺{.n�+h����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf

RE: [PATCH] kvm/powerpc: fix a build error in e500_tlb.c

2010-06-06 Thread Liu Yu-B13201
 

 -Original Message-
 From: kvm-ow...@vger.kernel.org 
 [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Alexander Graf
 Sent: Friday, June 04, 2010 10:08 PM
 To: Kevin Hao
 Cc: Marcelo Tosatti; Kumar Gala; Avi Kivity; 
 kvm@vger.kernel.org list; linuxppc-...@ozlabs.org; Liu Yu-B13201
 Subject: Re: [PATCH] kvm/powerpc: fix a build error in e500_tlb.c
 
 
 On 03.06.2010, at 07:52, Kevin Hao wrote:
 
  We use the wrong number arguments when invoking 
 trace_kvm_stlb_inval,
  and cause the following build error.
  arch/powerpc/kvm/e500_tlb.c: In function 
 'kvmppc_e500_stlbe_invalidate':
  arch/powerpc/kvm/e500_tlb.c:230: error: too many arguments 
 to function 'trace_kvm_stlb_inval'
 
 Liu, I'd like to get an ack from you here.
 

Seems commit e43f2f741a49483034bf968841275cfa553a4cb3 has solved this.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mmu_notifers, pte_dirty questions

2010-06-06 Thread Avi Kivity

On 06/06/2010 09:36 PM, Andrea Arcangeli wrote:

On Sun, Jun 06, 2010 at 03:07:27PM +0300, Avi Kivity wrote:
   

Why no notifer when testing and clearing the dirty bit?

(*clear_flush_dirty)(...).

 

static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 unsigned long address)
{
 struct mm_struct *mm = vma-vm_mm;
 pte_t *pte;
 spinlock_t *ptl;
 int ret = 0;

 pte = page_check_address(page, mm, address,ptl, 1);
 if (!pte)
 goto out;

 if (pte_dirty(*pte) || pte_write(*pte)) {
 pte_t entry;

 flush_cache_page(vma, address, pte_pfn(*pte));
 entry = ptep_clear_flush_notify(vma, address, pte);
 entry = pte_wrprotect(entry);
 entry = pte_mkclean(entry);
 set_pte_at(mm, address, pte, entry);
   

set_pte_at_notify()?  without this (or clear_flush_dirty) Linux will
assume all ptes are now clean; if the guest writes to a page nothing
will catch it.

-  with set_pte_at_notify(), we can drop the spte and mark the page as
dirty, so the next write will re-instantiate the spte
-  with -clear_flush_dirty(), we can track the dirty state without
dropping the spte.

 

 ret = 1;
 }

 pte_unmap_unlock(pte, ptl);
out:
 return ret;
   

I'm probably missing something big as I can't see how this works.
 

Under the PT lock it's safe to keep the PTE zero, just the pte must be
non zero again before pte_unmap_unlock.

The sptes are all gone by the time ptep_clear_flush_notify returns
(also gup-fast is stopped with the IPI of the flush) and no spte can
be established again before pte_unmap_unlock runs, so it's all safe as
far as I can tell.

   


Somehow  I missed the ptep_clear_flush_notify()...  so all should be fine.


set_pte_at_notify might prevent a minor fault though.
   


I'm thinking of how to implement speculative write access for kvm: 
consider a read fault for a writeable page.  We could install a 
writeable spte with the dirty bit clear, and examine the dirty bit at 
pte_clear_flush_notify() time and transfer it to the page flags.  
However I can't see where the mm code checks the pte dirty bit for 
anonymous pages?  Does it assume anonymous pages are always dirty? (they 
could have a clean copy in swap, no?)


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: powerpc: fix init/exit annotation

2010-06-06 Thread Jean Delvare
Hi Avi,

On Wed, 19 May 2010 20:16:35 +0300, Avi Kivity wrote:
 On 05/18/2010 10:34 AM, Jean Delvare wrote:
  kvmppc_e500_exit() is a module_exit function, so it should be tagged
  with __exit, not __init. The incorrect annotation was added by commit
  2986b8c72c272ea58edd37903b042c6da985627d.
 
 Applied, thanks.

Still not upstream?

-- 
Jean Delvare
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html