date:20071229

Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far

2007-12-29 Thread Mike Galbraith

(hm, google says i'm not the only one seeing this, so...)

On Sun, 2007-03-18 at 00:32 +0100, Thomas Gleixner wrote:
> Maxim,
> 
> On Sun, 2007-03-18 at 01:00 +0200, Maxim wrote:
> > >Mar 14 00:22:23 MAIN kernel: [2.072931] checking TSC synchronization 
> > >[CPU#0 -> CPU#1]:
> > >Mar 14 00:22:23 MAIN kernel: [2.092922] Measured 72051818872 cycles 
> > >TSC warp between CPUs, turning off
> > 
> > ^ I don't think this one is related to NO_HZ. Maybe it is a hardware
> > problem, but it exists without NO_HZ too.
> 
> The TSC is checked for synchronization between the CPUs. It's nothing to
> worry about. We switch off the TSC and use a different clocksource.
> 
> Is this after resume ? If yes, then something (probably BIOS) is
> fiddling with the TSC of one CPU when the resume happens.

My P4 box has the same "problem", which is remedied by..

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 9125efe..7b74969 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -46,7 +46,7 @@ static __cpuinit void check_tsc_warp(void)
cycles_t start, now, prev, end;
int i;
 
-   start = get_cycles_sync();
+   start = last_tsc = get_cycles_sync();
/*
 * The measurement runs for 20 msecs:
 */

..whacking the ancient last_tsc before entering the test loop.  Question is,
is there a good reason to disable the TSC once it's been stepped upon by
BIOS?  Are there any ill effects to be expected from ignoring this BIOS
artifact?  All seems just fine here.

-Mike
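
The effect of the one-line change above can be modelled in userspace. This is only a sketch under stated assumptions: max_warp() and reset_first are hypothetical names, the samples stand in for get_cycles_sync() readings, and the comparison mirrors the "previous reading is larger than the current one" test that the warp check is built around. It is not the kernel code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared last_tsc variable in the check. */
static uint64_t last_tsc;

/*
 * Run a sequence of TSC readings through a "did time go backwards?"
 * comparison and return the largest backward step that was observed.
 * With reset_first set, last_tsc is re-seeded from the first sample,
 * which models the change in the patch.
 */
static uint64_t max_warp(const uint64_t *samples, int n, bool reset_first)
{
    uint64_t warp = 0;

    if (reset_first && n > 0)
        last_tsc = samples[0];

    for (int i = 0; i < n; i++) {
        uint64_t prev = last_tsc;
        uint64_t now = samples[i];

        last_tsc = now;
        if (prev > now && prev - now > warp)
            warp = prev - now;
    }
    return warp;
}
```

If last_tsc still holds a large value from before the TSC was reset, the first comparison reports a huge warp even though the readings themselves are monotonic. Seeding last_tsc first makes the same readings pass.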

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86: Introduce REX prefix helper for kprobes

2007-12-29 Thread H. Peter Anvin

Masami Hiramatsu wrote:
> Hi Harvey,
> 
> Harvey Harrison wrote:
>> Fold some small ifdefs into a helper function.
>>
>> Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
>> ---
>> Masami, Ingo, I had this left in some unsent kprobes unification
>> work.  Depends on your tastes, but does reduce ifdefs and is a bit
>> better about self-documenting the REX prefix on X86_64.
> 
> Basically, I think it is a good idea.
> Could you use a macro, the same way as the stack_addr() macro, like below?
> 
> #define is_REX_prefix(insn) (((insn) & 0xf0) == 0x40)
> 
> This is just a bit checker, so I think a macro is the better way to do that.
> 

Why is a macro better than an inline, and why the odd mIXed case?

-hpa
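
For comparison, the inline alternative being asked about might look like the sketch below. The name is_rex_prefix() and the uint8_t parameter are assumptions made for illustration, not taken from the patch under discussion. The bit test is the one from the quoted macro: on x86-64, REX prefixes are the opcode bytes 0x40 through 0x4f.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the byte is a REX prefix (0x40..0x4f). Unlike a macro, the
 * argument is type-checked and evaluated exactly once.
 */
static inline bool is_rex_prefix(uint8_t insn)
{
    return (insn & 0xf0) == 0x40;
}
```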

[PATCH 10/52] KVM: Portability: Move kvm_vcpu_ioctl_get_dirty_log to arch-specific file

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Meanwhile, keep the interface common and leave as much logic in common
code as possible.

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |5 +
 drivers/kvm/kvm_main.c |   19 ---
 drivers/kvm/x86.c  |   31 +++
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index bdcc44e..c1aa84f 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -644,6 +644,11 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
 
 int kvm_dev_ioctl_check_extension(long ext);
 
+int kvm_get_dirty_log(struct kvm *kvm,
+   struct kvm_dirty_log *log, int *is_dirty);
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+   struct kvm_dirty_log *log);
+
 int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
   struct
   kvm_userspace_memory_region *mem,
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 1c4e950..e64dfa2 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -389,19 +389,14 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
return kvm_set_memory_region(kvm, mem, user_alloc);
 }
 
-/*
- * Get (and clear) the dirty memory log for a memory slot.
- */
-static int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
- struct kvm_dirty_log *log)
+int kvm_get_dirty_log(struct kvm *kvm,
+   struct kvm_dirty_log *log, int *is_dirty)
 {
struct kvm_memory_slot *memslot;
int r, i;
int n;
unsigned long any = 0;
 
-   mutex_lock(&kvm->lock);
-
r = -EINVAL;
if (log->slot >= KVM_MEMORY_SLOTS)
goto out;
@@ -420,17 +415,11 @@ static int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
goto out;
 
-   /* If nothing is dirty, don't bother messing with page tables. */
-   if (any) {
-   kvm_mmu_slot_remove_write_access(kvm, log->slot);
-   kvm_flush_remote_tlbs(kvm);
-   memset(memslot->dirty_bitmap, 0, n);
-   }
+   if (any)
+   *is_dirty = 1;
 
r = 0;
-
 out:
-   mutex_unlock(&kvm->lock);
return r;
 }
 
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index 9618fcb..935e276 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -937,6 +937,37 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
return r;
 }
 
+/*
+ * Get (and clear) the dirty memory log for a memory slot.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+ struct kvm_dirty_log *log)
+{
+   int r;
+   int n;
+   struct kvm_memory_slot *memslot;
+   int is_dirty = 0;
+
+   mutex_lock(&kvm->lock);
+
+   r = kvm_get_dirty_log(kvm, log, &is_dirty);
+   if (r)
+   goto out;
+
+   /* If nothing is dirty, don't bother messing with page tables. */
+   if (is_dirty) {
+   kvm_mmu_slot_remove_write_access(kvm, log->slot);
+   kvm_flush_remote_tlbs(kvm);
+   memslot = &kvm->memslots[log->slot];
+   n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+   memset(memslot->dirty_bitmap, 0, n);
+   }
+   r = 0;
+out:
+   mutex_unlock(&kvm->lock);
+   return r;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
-- 
1.5.3.7
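
The split made by this patch can be pictured with a small userspace model. It is a sketch only: struct slot, SLOT_BYTES and both function names are hypothetical, and the memset() stands in for the arch-specific work (removing write access, flushing remote TLBs, clearing the bitmap). The common helper copies the bitmap out and reports whether anything was dirty. The arch wrapper decides what to do about it.

```c
#include <string.h>

#define SLOT_BYTES 8  /* hypothetical bitmap size, for the sketch only */

struct slot {
    unsigned char dirty_bitmap[SLOT_BYTES];
};

/* Common part: copy the bitmap out and report whether any bit was set. */
static int get_dirty_log(const struct slot *s, unsigned char *out,
                         int *is_dirty)
{
    unsigned char any = 0;

    for (int i = 0; i < SLOT_BYTES; i++)
        any |= s->dirty_bitmap[i];

    memcpy(out, s->dirty_bitmap, SLOT_BYTES);
    if (any)
        *is_dirty = 1;
    return 0;
}

/* Arch part: call the common helper, then reset state only if needed. */
static int arch_get_dirty_log(struct slot *s, unsigned char *out)
{
    int is_dirty = 0;
    int r = get_dirty_log(s, out, &is_dirty);

    if (r)
        return r;
    if (is_dirty)
        memset(s->dirty_bitmap, 0, SLOT_BYTES); /* arch-specific reset */
    return 0;
}
```

Passing is_dirty back through a pointer keeps the return value free for error codes, which matches the shape of the new kvm_get_dirty_log() prototype in the patch.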


why do we call clear_active_flags in shrink_inactive_list ?

2007-12-29 Thread minchan Kim

In 2.6.23's shrink_inactive_list() function, why do we have to call
clear_active_flags() after the isolate_lru_pages() call?
IMHO, if isolate_lru_pages() is called with "zone->inactive_list", we can
be sure that the isolated pages are not PG_active, so calling
clear_active_flags() looks unnecessary. Nonetheless, the code rechecks the
PG_active flag with clear_active_flags(). Why?

If the recheck is needed, in which case can such a page have PG_active set?

-- 
Thanks,
barrios

[PATCH 38/52] KVM: Portability: Split kvm_set_memory_region() to have an arch callout

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Move the !user_alloc case into kvm_arch code to avoid unnecessary
logic on non-x86 platforms.

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |4 +++
 drivers/kvm/kvm_main.c |   38 ---
 drivers/kvm/x86.c  |   51 
 3 files changed, 60 insertions(+), 33 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index c2acd74..49094a2 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -391,6 +391,10 @@ int kvm_set_memory_region(struct kvm *kvm,
 int __kvm_set_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
int user_alloc);
+int kvm_arch_set_memory_region(struct kvm *kvm,
+   struct kvm_userspace_memory_region *mem,
+   struct kvm_memory_slot old,
+   int user_alloc);
 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
 void kvm_release_page_clean(struct page *page);
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 6a702e1..5f3ef54 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -291,33 +291,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
memset(new.rmap, 0, npages * sizeof(*new.rmap));
 
new.user_alloc = user_alloc;
-   if (user_alloc)
-   new.userspace_addr = mem->userspace_addr;
-   else {
-   down_write(&current->mm->mmap_sem);
-   new.userspace_addr = do_mmap(NULL, 0,
-npages * PAGE_SIZE,
-PROT_READ | PROT_WRITE,
-MAP_SHARED | MAP_ANONYMOUS,
-0);
-   up_write(&current->mm->mmap_sem);
-
-   if (IS_ERR((void *)new.userspace_addr))
-   goto out_free;
-   }
-   } else {
-   if (!old.user_alloc && old.rmap) {
-   int ret;
-
-   down_write(&current->mm->mmap_sem);
-   ret = do_munmap(current->mm, old.userspace_addr,
-   old.npages * PAGE_SIZE);
-   up_write(&current->mm->mmap_sem);
-   if (ret < 0)
-   printk(KERN_WARNING
-  "kvm_vm_ioctl_set_memory_region: "
-  "failed to munmap memory\n");
-   }
+   new.userspace_addr = mem->userspace_addr;
}
 
/* Allocate page dirty bitmap if needed */
@@ -335,14 +309,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
*memslot = new;
 
-   if (!kvm->n_requested_mmu_pages) {
-   unsigned int nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
-   kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
+   r = kvm_arch_set_memory_region(kvm, mem, old, user_alloc);
+   if (r) {
+   *memslot = old;
+   goto out_free;
}
 
-   kvm_mmu_slot_remove_write_access(kvm, mem->slot);
-   kvm_flush_remote_tlbs(kvm);
-
kvm_free_physmem_slot(&old, &new);
return 0;
 
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index 5a54e32..6abb2ed 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -2637,3 +2638,53 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_free_physmem(kvm);
kfree(kvm);
 }
+
+int kvm_arch_set_memory_region(struct kvm *kvm,
+   struct kvm_userspace_memory_region *mem,
+   struct kvm_memory_slot old,
+   int user_alloc)
+{
+   int npages = mem->memory_size >> PAGE_SHIFT;
+   struct kvm_memory_slot *memslot = &kvm->memslots[mem->slot];
+
+   /*To keep backward compatibility with older userspace,
+*x86 needs to handle the !user_alloc case.
+*/
+   if (!user_alloc) {
+   if (npages && !old.rmap) {
+   down_write(&current->mm->mmap_sem);
+   memslot->userspace_addr = do_mmap(NULL, 0,
+npages * PAGE_SIZE,
+PROT_READ | PROT_WRITE,
+MAP_SHARED | MAP_ANONYMOUS,
+0);
+   up_write(&current->mm->mmap_sem);
+
+   if (IS_ERR((void *)memslot->userspace_addr))
+   return PTR_ERR((void

[PATCH 39/52] KVM: Split vcpu creation to avoid vcpu_load() before preemption setup

2007-12-29 Thread Avi Kivity

Split kvm_arch_vcpu_create() into kvm_arch_vcpu_create() and
kvm_arch_vcpu_setup(), enabling preemption notification between the two.
This means that we can now do vcpu_load() within kvm_arch_vcpu_setup().

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |1 +
 drivers/kvm/kvm_main.c |4 
 drivers/kvm/x86.c  |   16 +++-
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 49094a2..b65f5de 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -466,6 +466,7 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id);
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu);
 
 int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu);
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 5f3ef54..d99396b 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -769,6 +769,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
 
preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
 
+   r = kvm_arch_vcpu_setup(vcpu);
+   if (r)
+   goto vcpu_destroy;
+
mutex_lock(&kvm->lock);
if (kvm->vcpus[n]) {
r = -EEXIST;
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index 6abb2ed..b482b6a 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -2478,13 +2478,12 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
unsigned int id)
 {
-   int r;
-   struct kvm_vcpu *vcpu = kvm_x86_ops->vcpu_create(kvm, id);
+   return kvm_x86_ops->vcpu_create(kvm, id);
+}
 
-   if (IS_ERR(vcpu)) {
-   r = -ENOMEM;
-   goto fail;
-   }
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+   int r;
 
/* We do fxsave: this must be aligned. */
BUG_ON((unsigned long)&vcpu->host_fx_image & 0xF);
@@ -2497,11 +2496,10 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
if (r < 0)
goto free_vcpu;
 
-   return vcpu;
+   return 0;
 free_vcpu:
kvm_x86_ops->vcpu_free(vcpu);
-fail:
-   return ERR_PTR(r);
+   return r;
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
-- 
1.5.3.7
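
The ordering this patch establishes (create, register the preemption notifier, then set up) can be modelled in a few lines of userspace C. Everything here is a hypothetical stand-in: the struct fields and function names are invented for the sketch, and setting notifier_ready merely represents the preempt_notifier_init() call in the caller. The point is only that the setup step, which needs the vcpu to be loadable, runs after the notifier exists.

```c
#include <stdlib.h>

/* Hypothetical model of a vcpu; not the kernel's struct kvm_vcpu. */
struct vcpu {
    int notifier_ready; /* set once the preempt notifier is registered */
    int loaded;         /* set once setup has "loaded" the vcpu */
};

/* Phase 1: allocation only, nothing that requires loading the vcpu. */
static struct vcpu *vcpu_create(void)
{
    return calloc(1, sizeof(struct vcpu));
}

/* Phase 2: work that loads the vcpu; refuses to run too early. */
static int vcpu_setup(struct vcpu *v)
{
    if (!v->notifier_ready)
        return -1;
    v->loaded = 1;
    return 0;
}

/* Caller: create, register the notifier, then run setup. */
static struct vcpu *create_and_setup(void)
{
    struct vcpu *v = vcpu_create();

    if (!v)
        return NULL;
    v->notifier_ready = 1; /* stands in for preempt_notifier_init() */
    if (vcpu_setup(v) != 0) {
        free(v);
        return NULL;
    }
    return v;
}
```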


[PATCH 26/52] KVM: Portability: Move x86 pic structures

2007-12-29 Thread Avi Kivity

From: Jerone Young <[EMAIL PROTECTED]>

This patch moves structures:
kvm_pic_state
kvm_ioapic_state

to include/asm-x86/kvm.h.

Signed-off-by: Jerone Young <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/asm-x86/kvm.h |   49 +
 include/linux/kvm.h   |   48 
 2 files changed, 49 insertions(+), 48 deletions(-)

diff --git a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h
index 37cf8e9..80752bc 100644
--- a/include/asm-x86/kvm.h
+++ b/include/asm-x86/kvm.h
@@ -17,4 +17,53 @@ struct kvm_memory_alias {
__u64 target_phys_addr;
 };
 
+/* for KVM_GET_IRQCHIP and KVM_SET_IRQCHIP */
+struct kvm_pic_state {
+   __u8 last_irr;  /* edge detection */
+   __u8 irr;   /* interrupt request register */
+   __u8 imr;   /* interrupt mask register */
+   __u8 isr;   /* interrupt service register */
+   __u8 priority_add;  /* highest irq priority */
+   __u8 irq_base;
+   __u8 read_reg_select;
+   __u8 poll;
+   __u8 special_mask;
+   __u8 init_state;
+   __u8 auto_eoi;
+   __u8 rotate_on_auto_eoi;
+   __u8 special_fully_nested_mode;
+   __u8 init4; /* true if 4 byte init */
+   __u8 elcr;  /* PIIX edge/trigger selection */
+   __u8 elcr_mask;
+};
+
+#define KVM_IOAPIC_NUM_PINS  24
+struct kvm_ioapic_state {
+   __u64 base_address;
+   __u32 ioregsel;
+   __u32 id;
+   __u32 irr;
+   __u32 pad;
+   union {
+   __u64 bits;
+   struct {
+   __u8 vector;
+   __u8 delivery_mode:3;
+   __u8 dest_mode:1;
+   __u8 delivery_status:1;
+   __u8 polarity:1;
+   __u8 remote_irr:1;
+   __u8 trig_mode:1;
+   __u8 mask:1;
+   __u8 reserve:7;
+   __u8 reserved[4];
+   __u8 dest_id;
+   } fields;
+   } redirtbl[KVM_IOAPIC_NUM_PINS];
+};
+
+#define KVM_IRQCHIP_PIC_MASTER   0
+#define KVM_IRQCHIP_PIC_SLAVE1
+#define KVM_IRQCHIP_IOAPIC   2
+
 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d09dd5d..1779c3d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -48,54 +48,6 @@ struct kvm_irq_level {
__u32 level;
 };
 
-/* for KVM_GET_IRQCHIP and KVM_SET_IRQCHIP */
-struct kvm_pic_state {
-   __u8 last_irr;  /* edge detection */
-   __u8 irr;   /* interrupt request register */
-   __u8 imr;   /* interrupt mask register */
-   __u8 isr;   /* interrupt service register */
-   __u8 priority_add;  /* highest irq priority */
-   __u8 irq_base;
-   __u8 read_reg_select;
-   __u8 poll;
-   __u8 special_mask;
-   __u8 init_state;
-   __u8 auto_eoi;
-   __u8 rotate_on_auto_eoi;
-   __u8 special_fully_nested_mode;
-   __u8 init4; /* true if 4 byte init */
-   __u8 elcr;  /* PIIX edge/trigger selection */
-   __u8 elcr_mask;
-};
-
-#define KVM_IOAPIC_NUM_PINS  24
-struct kvm_ioapic_state {
-   __u64 base_address;
-   __u32 ioregsel;
-   __u32 id;
-   __u32 irr;
-   __u32 pad;
-   union {
-   __u64 bits;
-   struct {
-   __u8 vector;
-   __u8 delivery_mode:3;
-   __u8 dest_mode:1;
-   __u8 delivery_status:1;
-   __u8 polarity:1;
-   __u8 remote_irr:1;
-   __u8 trig_mode:1;
-   __u8 mask:1;
-   __u8 reserve:7;
-   __u8 reserved[4];
-   __u8 dest_id;
-   } fields;
-   } redirtbl[KVM_IOAPIC_NUM_PINS];
-};
-
-#define KVM_IRQCHIP_PIC_MASTER   0
-#define KVM_IRQCHIP_PIC_SLAVE1
-#define KVM_IRQCHIP_IOAPIC   2
 
 struct kvm_irqchip {
__u32 chip_id;
-- 
1.5.3.7


[PATCH 31/52] KVM: Portability: Move cpuid structures to

2007-12-29 Thread Avi Kivity

From: Jerone Young <[EMAIL PROTECTED]>

This patch moves structures:
kvm_cpuid_entry
kvm_cpuid

from include/linux/kvm.h to include/asm-x86/kvm.h

Signed-off-by: Jerone Young <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/asm-x86/kvm.h |   17 +
 include/linux/kvm.h   |   16 
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h
index 32c7dda..4837d75 100644
--- a/include/asm-x86/kvm.h
+++ b/include/asm-x86/kvm.h
@@ -135,4 +135,21 @@ struct kvm_msr_list {
 };
 
 
+struct kvm_cpuid_entry {
+   __u32 function;
+   __u32 eax;
+   __u32 ebx;
+   __u32 ecx;
+   __u32 edx;
+   __u32 padding;
+};
+
+/* for KVM_SET_CPUID */
+struct kvm_cpuid {
+   __u32 nent;
+   __u32 padding;
+   struct kvm_cpuid_entry entries[0];
+};
+
+
 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index e6867aa..fd4f900 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -192,22 +192,6 @@ struct kvm_dirty_log {
};
 };
 
-struct kvm_cpuid_entry {
-   __u32 function;
-   __u32 eax;
-   __u32 ebx;
-   __u32 ecx;
-   __u32 edx;
-   __u32 padding;
-};
-
-/* for KVM_SET_CPUID */
-struct kvm_cpuid {
-   __u32 nent;
-   __u32 padding;
-   struct kvm_cpuid_entry entries[0];
-};
-
 /* for KVM_SET_SIGNAL_MASK */
 struct kvm_signal_mask {
__u32 len;
-- 
1.5.3.7


[PATCH 28/52] KVM: Portability: Move structure lapic_state to

2007-12-29 Thread Avi Kivity

From: Jerone Young <[EMAIL PROTECTED]>

This patch moves structure lapic_state from include/linux/kvm.h
to include/asm-x86/kvm.h

Signed-off-by: Jerone Young <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/asm-x86/kvm.h |6 ++
 include/linux/kvm.h   |5 -
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h
index c83a2ff..a2c65b5 100644
--- a/include/asm-x86/kvm.h
+++ b/include/asm-x86/kvm.h
@@ -76,4 +76,10 @@ struct kvm_regs {
__u64 rip, rflags;
 };
 
+/* for KVM_GET_LAPIC and KVM_SET_LAPIC */
+#define KVM_APIC_REG_SIZE 0x400
+struct kvm_lapic_state {
+   char regs[KVM_APIC_REG_SIZE];
+};
+
 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 0d83efc..280ec0d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -150,11 +150,6 @@ struct kvm_fpu {
__u32 pad2;
 };
 
-/* for KVM_GET_LAPIC and KVM_SET_LAPIC */
-#define KVM_APIC_REG_SIZE 0x400
-struct kvm_lapic_state {
-   char regs[KVM_APIC_REG_SIZE];
-};
 
 struct kvm_segment {
__u64 base;
-- 
1.5.3.7


[PATCH 45/52] KVM: MMU: Introduce and use gpte_to_gfn()

2007-12-29 Thread Avi Kivity

Instead of repetitively open-coding this.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/paging_tmpl.h |   28 
 1 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index 6e01301..6f79ae8 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -52,6 +52,9 @@
#error Invalid PTTYPE value
 #endif
 
+#define gpte_to_gfn FNAME(gpte_to_gfn)
+#define gpte_to_gfn_pde FNAME(gpte_to_gfn_pde)
+
 /*
  * The guest_walker structure emulates the behavior of the hardware page
  * table walker.
@@ -65,6 +68,16 @@ struct guest_walker {
u32 error_code;
 };
 
+static gfn_t gpte_to_gfn(pt_element_t gpte)
+{
+   return (gpte & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
+}
+
+static gfn_t gpte_to_gfn_pde(pt_element_t gpte)
+{
+   return (gpte & PT_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
+}
+
 /*
  * Fetch a guest pte for a guest virtual address
  */
@@ -96,7 +109,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
for (;;) {
index = PT_INDEX(addr, walker->level);
 
-   table_gfn = (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
+   table_gfn = gpte_to_gfn(pte);
pte_gpa = table_gfn << PAGE_SHIFT;
pte_gpa += index * sizeof(pt_element_t);
walker->table_gfn[walker->level - 1] = table_gfn;
@@ -127,15 +140,14 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
}
 
if (walker->level == PT_PAGE_TABLE_LEVEL) {
-   walker->gfn = (pte & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
+   walker->gfn = gpte_to_gfn(pte);
break;
}
 
if (walker->level == PT_DIRECTORY_LEVEL
&& (pte & PT_PAGE_SIZE_MASK)
&& (PTTYPE == 64 || is_pse(vcpu))) {
-   walker->gfn = (pte & PT_DIR_BASE_ADDR_MASK)
-   >> PAGE_SHIFT;
+   walker->gfn = gpte_to_gfn_pde(pte);
walker->gfn += PT_INDEX(addr, PT_PAGE_TABLE_LEVEL);
break;
}
@@ -296,8 +308,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
return;
pgprintk("%s: gpte %llx spte %p\n", __FUNCTION__, (u64)gpte, spte);
FNAME(set_pte)(vcpu, gpte, spte, PT_USER_MASK | PT_WRITABLE_MASK, 0,
-  0, NULL, NULL,
-  (gpte & PT_BASE_ADDR_MASK) >> PAGE_SHIFT);
+  0, NULL, NULL, gpte_to_gfn(gpte));
 }
 
 static void FNAME(set_pde)(struct kvm_vcpu *vcpu, pt_element_t gpde,
@@ -370,8 +381,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
hugepage_access >>= PT_WRITABLE_SHIFT;
if (walker->pte & PT64_NX_MASK)
hugepage_access |= (1 << 2);
-   table_gfn = (walker->pte & PT_BASE_ADDR_MASK)
-   >> PAGE_SHIFT;
+   table_gfn = gpte_to_gfn(walker->pte);
} else {
metaphysical = 0;
table_gfn = walker->table_gfn[level - 2];
@@ -519,3 +529,5 @@ static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
 #undef PT_DIR_BASE_ADDR_MASK
 #undef PT_LEVEL_BITS
 #undef PT_MAX_FULL_LEVELS
+#undef gpte_to_gfn
+#undef gpte_to_gfn_pde
-- 
1.5.3.7
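
A standalone sketch of what the new helper computes, for the 64-bit case only. The mask definition below is an assumption for illustration (physical address in bits 12 to 51 of the entry); in the patch the mask comes from the PTTYPE-dependent PT_BASE_ADDR_MASK, and the typedefs here are simplified stand-ins.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
/* Assumed 64-bit layout: frame address lives in bits 12..51. */
#define PT_BASE_ADDR_MASK \
    (((1ULL << 52) - 1) & ~((1ULL << PAGE_SHIFT) - 1))

typedef uint64_t pt_element_t;
typedef uint64_t gfn_t;

/* Strip flag bits from a guest pte and convert the address to a gfn. */
static gfn_t gpte_to_gfn(pt_element_t gpte)
{
    return (gpte & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
}
```

The low twelve bits (present, writable, user and so on) and the high bits above the address field are masked off before the shift, so flag bits never leak into the frame number.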


[PATCH 48/52] KVM: MMU: Remove set_pde()

2007-12-29 Thread Avi Kivity

It is now identical to set_pte().

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/paging_tmpl.h |   25 -
 1 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index cc373ed..062f4f5 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -305,17 +305,6 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
   0, NULL, NULL, gpte_to_gfn(gpte));
 }
 
-static void FNAME(set_pde)(struct kvm_vcpu *vcpu, pt_element_t gpde,
-  u64 *shadow_pte, u64 access_bits,
-  int user_fault, int write_fault, int *ptwrite,
-  struct guest_walker *walker, gfn_t gfn)
-{
-   access_bits &= gpde;
-   FNAME(set_pte_common)(vcpu, shadow_pte,
- gpde, access_bits, user_fault, write_fault,
- ptwrite, walker, gfn);
-}
-
 /*
  * Fetch a shadow pte for a specific level in the paging hierarchy.
  */
@@ -384,16 +373,10 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
prev_shadow_ent = shadow_ent;
}
 
-   if (walker->level == PT_DIRECTORY_LEVEL) {
-   FNAME(set_pde)(vcpu, walker->pte, shadow_ent,
-  walker->inherited_ar, user_fault, write_fault,
-  ptwrite, walker, walker->gfn);
-   } else {
-   ASSERT(walker->level == PT_PAGE_TABLE_LEVEL);
-   FNAME(set_pte)(vcpu, walker->pte, shadow_ent,
-  walker->inherited_ar, user_fault, write_fault,
-  ptwrite, walker, walker->gfn);
-   }
+   FNAME(set_pte)(vcpu, walker->pte, shadow_ent,
+  walker->inherited_ar, user_fault, write_fault,
+  ptwrite, walker, walker->gfn);
+
return shadow_ent;
 }
 
-- 
1.5.3.7


[PATCH 41/52] KVM: Add statistic for remote tlb flushes

2007-12-29 Thread Avi Kivity

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |1 +
 drivers/kvm/kvm_main.c |3 +++
 drivers/kvm/x86.c  |1 +
 3 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index b65f5de..048849d 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -300,6 +300,7 @@ struct kvm_vm_stat {
u32 mmu_pde_zapped;
u32 mmu_flooded;
u32 mmu_recycled;
+   u32 remote_tlb_flush;
 };
 
 struct kvm {
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index d99396b..411b2bd 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -115,6 +115,9 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
if (cpu != -1 && cpu != raw_smp_processor_id())
cpu_set(cpu, cpus);
}
+   if (cpus_empty(cpus))
+   return;
+   ++kvm->stat.remote_tlb_flush;
smp_call_function_mask(cpus, ack_flush, NULL, 1);
 }
 
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index b482b6a..ac09f38 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -73,6 +73,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "mmu_pde_zapped", VM_STAT(mmu_pde_zapped) },
{ "mmu_flooded", VM_STAT(mmu_flooded) },
{ "mmu_recycled", VM_STAT(mmu_recycled) },
+   { "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
{ NULL }
 };
 
-- 
1.5.3.7
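
The counting rule added here (bump the statistic only when at least one remote CPU actually needs a flush) can be sketched as follows. The 64-bit mask and the names vcpu_cpus and self are assumptions for the model. The patch itself works on a cpumask and sends the flush with smp_call_function_mask().

```c
#include <stdint.h>

struct vm_stat {
    uint32_t remote_tlb_flush;
};

/*
 * vcpu_cpus: bit n set means some vcpu of this VM last ran on CPU n.
 * self:      the CPU executing this function (must be below 64).
 * Returns 1 if a remote flush would be issued, 0 if there was no target.
 */
static int flush_remote_tlbs(struct vm_stat *stat, uint64_t vcpu_cpus,
                             int self)
{
    uint64_t targets = vcpu_cpus & ~(1ULL << self);

    if (targets == 0)
        return 0;             /* nothing remote: do not count */
    ++stat->remote_tlb_flush; /* count only flushes that go out */
    /* a real implementation would send the cross-CPU request here */
    return 1;
}
```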


[PATCH 50/52] KVM: MMU: Adjust page_header_update_slot() to accept a gfn instead of a gpa

2007-12-29 Thread Avi Kivity

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |7 ---
 drivers/kvm/paging_tmpl.h |3 +--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index c3362ba..1dcffc4 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -860,9 +860,9 @@ static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
}
 }
 
-static void page_header_update_slot(struct kvm *kvm, void *pte, gpa_t gpa)
+static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn)
 {
-   int slot = memslot_id(kvm, gfn_to_memslot(kvm, gpa >> PAGE_SHIFT));
+   int slot = memslot_id(kvm, gfn_to_memslot(kvm, gfn));
struct kvm_mmu_page *page_head = page_header(__pa(pte));
 
__set_bit(slot, &page_head->slot_bitmap);
@@ -928,7 +928,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, hpa_t p)
return 0;
}
mark_page_dirty(vcpu->kvm, v >> PAGE_SHIFT);
-   page_header_update_slot(vcpu->kvm, table, v);
+   page_header_update_slot(vcpu->kvm, table,
+   v >> PAGE_SHIFT);
table[index] = p | PT_PRESENT_MASK | PT_WRITABLE_MASK |
PT_USER_MASK;
if (!was_rmapped)
diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index 54a6ee8..a3da98b 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -259,8 +259,7 @@ unshadowed:
 
pgprintk("%s: setting spte %llx\n", __FUNCTION__, spte);
set_shadow_pte(shadow_pte, spte);
-   page_header_update_slot(vcpu->kvm, shadow_pte,
-   (gpa_t)gfn << PAGE_SHIFT);
+   page_header_update_slot(vcpu->kvm, shadow_pte, gfn);
if (!was_rmapped) {
rmap_add(vcpu, shadow_pte, gfn);
if (!is_rmap_pte(*shadow_pte))
-- 
1.5.3.7


[PATCH 40/52] KVM: MMU: Implement guest page fault bypass for nonpae

2007-12-29 Thread Avi Kivity

I spent an hour worrying why I see so many guest page faults on FC6 i386.
Turns out bypass wasn't implemented for nonpae.  Implement it so it doesn't
happen again.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/paging_tmpl.h |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index bf15d12..92b9313 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -486,19 +486,22 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr)
 static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
 struct kvm_mmu_page *sp)
 {
-   int i;
+   int i, offset = 0;
pt_element_t *gpt;
struct page *page;
 
-   if (sp->role.metaphysical || PTTYPE == 32) {
+   if (sp->role.metaphysical
+   || (PTTYPE == 32 && sp->role.level > PT_PAGE_TABLE_LEVEL)) {
nonpaging_prefetch_page(vcpu, sp);
return;
}
 
+   if (PTTYPE == 32)
+   offset = sp->role.quadrant << PT64_LEVEL_BITS;
page = gfn_to_page(vcpu->kvm, sp->gfn);
gpt = kmap_atomic(page, KM_USER0);
for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
-   if (is_present_pte(gpt[i]))
+   if (is_present_pte(gpt[offset + i]))
sp->spt[i] = shadow_trap_nonpresent_pte;
else
sp->spt[i] = shadow_notrap_nonpresent_pte;
-- 
1.5.3.7
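
The offset arithmetic in this patch follows from the entry sizes: a 4 KiB non-PAE guest page table holds 1024 four-byte entries, while a shadow page holds 512 eight-byte entries, so one guest table is covered by two shadow pages and the quadrant selects which half. The model below is a sketch with hypothetical names (prefetch(), present_out, PRESENT); it only reproduces the indexing, not the shadow pte values.

```c
#include <stdint.h>

#define PT64_LEVEL_BITS   9
#define PT64_ENT_PER_PAGE (1 << PT64_LEVEL_BITS) /* 512 shadow entries */
#define PT32_ENT_PER_PAGE 1024                   /* non-PAE guest entries */
#define PRESENT           0x1u

/*
 * Scan the half of a 32-bit guest page table selected by quadrant
 * (0 or 1) and record, per shadow slot, whether the guest entry is
 * present. Returns the number of present entries in that half.
 */
static int prefetch(const uint32_t *gpt, int quadrant, uint8_t *present_out)
{
    int offset = quadrant << PT64_LEVEL_BITS; /* 0 or 512 */
    int n = 0;

    for (int i = 0; i < PT64_ENT_PER_PAGE; ++i) {
        present_out[i] = (gpt[offset + i] & PRESENT) != 0;
        n += present_out[i];
    }
    return n;
}
```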


[PATCH 52/52] KVM: MMU: Simplify nonpaging_map()

2007-12-29 Thread Avi Kivity

Instead of passing an hpa, pass a regular struct page.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |   24 ++--
 1 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 1dcffc4..1965185 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -903,13 +903,11 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
 {
 }
 
-static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, hpa_t p)
+static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, struct page *page)
 {
int level = PT32E_ROOT_LEVEL;
hpa_t table_addr = vcpu->mmu.root_hpa;
-   struct page *page;
 
-   page = pfn_to_page(p >> PAGE_SHIFT);
for (; ; level--) {
u32 index = PT64_INDEX(v, level);
u64 *table;
@@ -930,8 +928,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, hpa_t p)
mark_page_dirty(vcpu->kvm, v >> PAGE_SHIFT);
page_header_update_slot(vcpu->kvm, table,
v >> PAGE_SHIFT);
-   table[index] = p | PT_PRESENT_MASK | PT_WRITABLE_MASK |
-   PT_USER_MASK;
+   table[index] = page_to_phys(page)
+   | PT_PRESENT_MASK | PT_WRITABLE_MASK
+   | PT_USER_MASK;
if (!was_rmapped)
rmap_add(vcpu, &table[index], v >> PAGE_SHIFT);
else
@@ -1050,10 +1049,9 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr)
 }
 
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
-  u32 error_code)
+   u32 error_code)
 {
-   gpa_t addr = gva;
-   hpa_t paddr;
+   struct page *page;
int r;
 
r = mmu_topup_memory_caches(vcpu);
@@ -1063,16 +1061,14 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
ASSERT(vcpu);
ASSERT(VALID_PAGE(vcpu->mmu.root_hpa));
 
+   page = gfn_to_page(vcpu->kvm, gva >> PAGE_SHIFT);
 
-   paddr = gpa_to_hpa(vcpu->kvm, addr & PT64_BASE_ADDR_MASK);
-
-   if (is_error_hpa(paddr)) {
-   kvm_release_page_clean(pfn_to_page((paddr & PT64_BASE_ADDR_MASK)
-  >> PAGE_SHIFT));
+   if (is_error_page(page)) {
+   kvm_release_page_clean(page);
return 1;
}
 
-   return nonpaging_map(vcpu, addr & PAGE_MASK, paddr);
+   return nonpaging_map(vcpu, gva & PAGE_MASK, page);
 }
 
 static void nonpaging_free(struct kvm_vcpu *vcpu)
-- 
1.5.3.7


[PATCH 49/52] KVM: MMU: Merge set_pte() and set_pte_common()

2007-12-29 Thread Avi Kivity

Since set_pte() is now the only caller of set_pte_common(), merge the two
functions.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/paging_tmpl.h |   26 ++
 1 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index 062f4f5..54a6ee8 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -186,15 +186,11 @@ err:
return 0;
 }
 
-static void FNAME(set_pte_common)(struct kvm_vcpu *vcpu,
- u64 *shadow_pte,
- pt_element_t gpte,
- u64 access_bits,
- int user_fault,
- int write_fault,
- int *ptwrite,
- struct guest_walker *walker,
- gfn_t gfn)
+static void FNAME(set_pte)(struct kvm_vcpu *vcpu, pt_element_t gpte,
+  u64 *shadow_pte, u64 access_bits,
+  int user_fault, int write_fault,
+  int *ptwrite, struct guest_walker *walker,
+  gfn_t gfn)
 {
int dirty = gpte & PT_DIRTY_MASK;
u64 spte;
@@ -206,6 +202,7 @@ static void FNAME(set_pte_common)(struct kvm_vcpu *vcpu,
 __FUNCTION__, *shadow_pte, (u64)gpte, access_bits,
 write_fault, user_fault, gfn);
 
+   access_bits &= gpte;
/*
 * We don't set the accessed bit, since we sometimes want to see
 * whether the guest actually used the pte (in order to detect
@@ -275,17 +272,6 @@ unshadowed:
vcpu->last_pte_updated = shadow_pte;
 }
 
-static void FNAME(set_pte)(struct kvm_vcpu *vcpu, pt_element_t gpte,
-  u64 *shadow_pte, u64 access_bits,
-  int user_fault, int write_fault, int *ptwrite,
-  struct guest_walker *walker, gfn_t gfn)
-{
-   access_bits &= gpte;
-   FNAME(set_pte_common)(vcpu, shadow_pte,
- gpte, access_bits, user_fault, write_fault,
- ptwrite, walker, gfn);
-}
-
 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
  u64 *spte, const void *pte, int bytes,
  int offset_in_pte)
-- 
1.5.3.7


[PATCH 47/52] KVM: MMU: Remove extra gaddr parameter from set_pte_common()

2007-12-29 Thread Avi Kivity

Similar information is available in the gfn parameter, so use that.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |1 +
 drivers/kvm/paging_tmpl.h |   29 +
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index a9fed59..c3362ba 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -29,6 +29,7 @@
 
 #include 
 #include 
+#include 
 
 #undef MMU_DEBUG
 
diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index dceb4b9..cc373ed 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -188,7 +188,6 @@ err:
 
 static void FNAME(set_pte_common)(struct kvm_vcpu *vcpu,
  u64 *shadow_pte,
- gpa_t gaddr,
  pt_element_t gpte,
  u64 access_bits,
  int user_fault,
@@ -197,7 +196,6 @@ static void FNAME(set_pte_common)(struct kvm_vcpu *vcpu,
  struct guest_walker *walker,
  gfn_t gfn)
 {
-   hpa_t paddr;
int dirty = gpte & PT_DIRTY_MASK;
u64 spte;
int was_rmapped = is_rmap_pte(*shadow_pte);
@@ -218,26 +216,20 @@ static void FNAME(set_pte_common)(struct kvm_vcpu *vcpu,
if (!dirty)
access_bits &= ~PT_WRITABLE_MASK;
 
-   paddr = gpa_to_hpa(vcpu->kvm, gaddr & PT64_BASE_ADDR_MASK);
-
-   /*
-* the reason paddr get mask even that it isnt pte is beacuse the
-* HPA_ERR_MASK bit might be used to signal error
-*/
-   page = pfn_to_page((paddr & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
+   page = gfn_to_page(vcpu->kvm, gfn);
 
spte |= PT_PRESENT_MASK;
if (access_bits & PT_USER_MASK)
spte |= PT_USER_MASK;
 
-   if (is_error_hpa(paddr)) {
+   if (is_error_page(page)) {
set_shadow_pte(shadow_pte,
   shadow_trap_nonpresent_pte | PT_SHADOW_IO_MARK);
kvm_release_page_clean(page);
return;
}
 
-   spte |= paddr;
+   spte |= page_to_phys(page);
 
if ((access_bits & PT_WRITABLE_MASK)
|| (write_fault && !is_write_protection(vcpu) && !user_fault)) {
@@ -266,14 +258,14 @@ static void FNAME(set_pte_common)(struct kvm_vcpu *vcpu,
 unshadowed:
 
if (access_bits & PT_WRITABLE_MASK)
-   mark_page_dirty(vcpu->kvm, gaddr >> PAGE_SHIFT);
+   mark_page_dirty(vcpu->kvm, gfn);
 
pgprintk("%s: setting spte %llx\n", __FUNCTION__, spte);
set_shadow_pte(shadow_pte, spte);
-   page_header_update_slot(vcpu->kvm, shadow_pte, gaddr);
+   page_header_update_slot(vcpu->kvm, shadow_pte,
+   (gpa_t)gfn << PAGE_SHIFT);
if (!was_rmapped) {
-   rmap_add(vcpu, shadow_pte, (gaddr & PT64_BASE_ADDR_MASK)
->> PAGE_SHIFT);
+   rmap_add(vcpu, shadow_pte, gfn);
if (!is_rmap_pte(*shadow_pte))
kvm_release_page_clean(page);
}
@@ -289,7 +281,7 @@ static void FNAME(set_pte)(struct kvm_vcpu *vcpu, pt_element_t gpte,
   struct guest_walker *walker, gfn_t gfn)
 {
access_bits &= gpte;
-   FNAME(set_pte_common)(vcpu, shadow_pte, gpte & PT_BASE_ADDR_MASK,
+   FNAME(set_pte_common)(vcpu, shadow_pte,
  gpte, access_bits, user_fault, write_fault,
  ptwrite, walker, gfn);
 }
@@ -318,11 +310,8 @@ static void FNAME(set_pde)(struct kvm_vcpu *vcpu, pt_element_t gpde,
   int user_fault, int write_fault, int *ptwrite,
   struct guest_walker *walker, gfn_t gfn)
 {
-   gpa_t gaddr;
-
access_bits &= gpde;
-   gaddr = (gpa_t)gfn << PAGE_SHIFT;
-   FNAME(set_pte_common)(vcpu, shadow_pte, gaddr,
+   FNAME(set_pte_common)(vcpu, shadow_pte,
  gpde, access_bits, user_fault, write_fault,
  ptwrite, walker, gfn);
 }
-- 
1.5.3.7


[PATCH 51/52] KVM: MMU: Introduce gfn_to_gpa()

2007-12-29 Thread Avi Kivity

Converting a frame number to an address is tricky since the data type changes
size.  Introduce a function to do it.  This fixes an actual bug when
accessing guest ptes.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |4 
 drivers/kvm/paging_tmpl.h |4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 048849d..eda82cd 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -499,6 +499,10 @@ static inline int memslot_id(struct kvm *kvm, struct kvm_memory_slot *slot)
return slot - kvm->memslots;
 }
 
+static inline gpa_t gfn_to_gpa(gfn_t gfn)
+{
+   return (gpa_t)gfn << PAGE_SHIFT;
+}
 
 enum kvm_stat_kind {
KVM_STAT_VM,
diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index a3da98b..b24bc7c 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -110,7 +110,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
index = PT_INDEX(addr, walker->level);
 
table_gfn = gpte_to_gfn(pte);
-   pte_gpa = table_gfn << PAGE_SHIFT;
+   pte_gpa = gfn_to_gpa(table_gfn);
pte_gpa += index * sizeof(pt_element_t);
walker->table_gfn[walker->level - 1] = table_gfn;
pgprintk("%s: table_gfn[%d] %lx\n", __FUNCTION__,
@@ -442,7 +442,7 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr)
r = FNAME(walk_addr)(&walker, vcpu, vaddr, 0, 0, 0);
 
if (r) {
-   gpa = (gpa_t)walker.gfn << PAGE_SHIFT;
+   gpa = gfn_to_gpa(walker.gfn);
gpa |= vaddr & ~PAGE_MASK;
}
 
-- 
1.5.3.7
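
The size problem described in the commit message can be reproduced outside the
kernel.  The following is a minimal sketch, not the kernel code: it assumes a
32-bit host, so the frame number is modelled with a hypothetical 32-bit type
(gfn32_t), while guest physical addresses stay 64 bits wide.
shift_then_widen() is an illustrative name for the buggy pattern; only
gfn_to_gpa() mirrors the helper added by the patch.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Assumption: on a 32-bit host a frame number is 32 bits wide,
 * while a guest physical address is always 64 bits wide. */
typedef uint32_t gfn32_t;
typedef uint64_t gpa_t;

/* Buggy pattern: the shift is carried out in 32 bits, so the upper
 * bits are already gone when the result is widened to 64 bits. */
static gpa_t shift_then_widen(gfn32_t gfn)
{
	gfn32_t narrow = gfn << PAGE_SHIFT;

	return narrow;
}

/* Pattern used by the patch: widen first, then shift. */
static gpa_t gfn_to_gpa(gfn32_t gfn)
{
	return (gpa_t)gfn << PAGE_SHIFT;
}
```

With gfn 0x100000 (the first frame at the 4 GiB boundary), the narrow shift
yields 0 while gfn_to_gpa() yields 0x100000000, which is why only guests with
memory above 4 GiB on a 32-bit host would notice the difference.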


[PATCH 44/52] KVM: MMU: Code cleanup

2007-12-29 Thread Avi Kivity

From: Izik Eidus <[EMAIL PROTECTED]>

Signed-off-by: Izik Eidus <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/paging_tmpl.h |   20 ++--
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index 92b9313..6e01301 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -187,6 +187,7 @@ static void FNAME(set_pte_common)(struct kvm_vcpu *vcpu,
int dirty = gpte & PT_DIRTY_MASK;
u64 spte;
int was_rmapped = is_rmap_pte(*shadow_pte);
+   struct page *page;
 
pgprintk("%s: spte %llx gpte %llx access %llx write_fault %d"
 " user_fault %d gfn %lx\n",
@@ -205,6 +206,12 @@ static void FNAME(set_pte_common)(struct kvm_vcpu *vcpu,
 
paddr = gpa_to_hpa(vcpu->kvm, gaddr & PT64_BASE_ADDR_MASK);
 
+   /*
+* the reason paddr get mask even that it isnt pte is beacuse the
+* HPA_ERR_MASK bit might be used to signal error
+*/
+   page = pfn_to_page((paddr & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
+
spte |= PT_PRESENT_MASK;
if (access_bits & PT_USER_MASK)
spte |= PT_USER_MASK;
@@ -212,8 +219,7 @@ static void FNAME(set_pte_common)(struct kvm_vcpu *vcpu,
if (is_error_hpa(paddr)) {
set_shadow_pte(shadow_pte,
   shadow_trap_nonpresent_pte | PT_SHADOW_IO_MARK);
-   kvm_release_page_clean(pfn_to_page((paddr & PT64_BASE_ADDR_MASK)
-  >> PAGE_SHIFT));
+   kvm_release_page_clean(page);
return;
}
 
@@ -254,17 +260,11 @@ unshadowed:
if (!was_rmapped) {
rmap_add(vcpu, shadow_pte, (gaddr & PT64_BASE_ADDR_MASK)
 >> PAGE_SHIFT);
-   if (!is_rmap_pte(*shadow_pte)) {
-   struct page *page;
-
-   page = pfn_to_page((paddr & PT64_BASE_ADDR_MASK)
-  >> PAGE_SHIFT);
+   if (!is_rmap_pte(*shadow_pte))
kvm_release_page_clean(page);
-   }
}
else
-   kvm_release_page_clean(pfn_to_page((paddr & PT64_BASE_ADDR_MASK)
-  >> PAGE_SHIFT));
+   kvm_release_page_clean(page);
if (!ptwrite || !*ptwrite)
vcpu->last_pte_updated = shadow_pte;
 }
-- 
1.5.3.7


[PATCH 46/52] KVM: MMU: Move pse36 handling to the guest walker

2007-12-29 Thread Avi Kivity

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |7 +++
 drivers/kvm/paging_tmpl.h |5 ++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 346aa65..a9fed59 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -218,6 +218,13 @@ static int is_rmap_pte(u64 pte)
&& pte != shadow_notrap_nonpresent_pte;
 }
 
+static gfn_t pse36_gfn_delta(u32 gpte)
+{
+   int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
+
+   return (gpte & PT32_DIR_PSE36_MASK) << shift;
+}
+
 static void set_shadow_pte(u64 *sptep, u64 spte)
 {
 #ifdef CONFIG_X86_64
diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index 6f79ae8..dceb4b9 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -149,6 +149,8 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
&& (PTTYPE == 64 || is_pse(vcpu))) {
walker->gfn = gpte_to_gfn_pde(pte);
walker->gfn += PT_INDEX(addr, PT_PAGE_TABLE_LEVEL);
+   if (PTTYPE == 32 && is_cpuid_PSE36())
+   walker->gfn += pse36_gfn_delta(pte);
break;
}
 
@@ -320,9 +322,6 @@ static void FNAME(set_pde)(struct kvm_vcpu *vcpu, pt_element_t gpde,
 
access_bits &= gpde;
gaddr = (gpa_t)gfn << PAGE_SHIFT;
-   if (PTTYPE == 32 && is_cpuid_PSE36())
-   gaddr |= (gpde & PT32_DIR_PSE36_MASK) <<
-   (32 - PT32_DIR_PSE36_SHIFT);
FNAME(set_pte_common)(vcpu, shadow_pte, gaddr,
  gpde, access_bits, user_fault, write_fault,
  ptwrite, walker, gfn);
-- 
1.5.3.7


[PATCH 42/52] KVM: MMU: Avoid unnecessary remote tlb flushes when guest updates a pte

2007-12-29 Thread Avi Kivity

If all we're doing is increasing permissions on a pte (typical for demand
paging), then there's no need to flush remote tlbs.  Worst case they'll
get a spurious page fault.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |   27 ++-
 1 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 101cd53..281dd5f 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -134,6 +134,8 @@ static int dbg = 1;
 #define PT32_DIR_BASE_ADDR_MASK \
(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
 
+#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
+   | PT64_NX_MASK)
 
 #define PFERR_PRESENT_MASK (1U << 0)
 #define PFERR_WRITE_MASK (1U << 1)
@@ -1227,7 +1229,6 @@ static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
}
}
set_shadow_pte(spte, shadow_trap_nonpresent_pte);
-   kvm_flush_remote_tlbs(vcpu->kvm);
 }
 
 static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
@@ -1250,6 +1251,27 @@ static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
offset_in_pte);
 }
 
+static bool need_remote_flush(u64 old, u64 new)
+{
+   if (!is_shadow_present_pte(old))
+   return false;
+   if (!is_shadow_present_pte(new))
+   return true;
+   if ((old ^ new) & PT64_BASE_ADDR_MASK)
+   return true;
+   old ^= PT64_NX_MASK;
+   new ^= PT64_NX_MASK;
+   return (old & ~new & PT64_PERM_MASK) != 0;
+}
+
+static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, u64 old, u64 new)
+{
+   if (need_remote_flush(old, new))
+   kvm_flush_remote_tlbs(vcpu->kvm);
+   else
+   kvm_mmu_flush_tlb(vcpu);
+}
+
 static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
 {
u64 *spte = vcpu->last_pte_updated;
@@ -1265,6 +1287,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
struct hlist_node *node, *n;
struct hlist_head *bucket;
unsigned index;
+   u64 entry;
u64 *spte;
unsigned offset = offset_in_page(gpa);
unsigned pte_size;
@@ -1335,9 +1358,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
}
spte = &page->spt[page_offset / sizeof(*spte)];
while (npte--) {
+   entry = *spte;
mmu_pte_write_zap_pte(vcpu, page, spte);
mmu_pte_write_new_pte(vcpu, page, spte, new, bytes,
  page_offset & (pte_size - 1));
+   mmu_pte_write_flush_tlb(vcpu, entry, *spte);
++spte;
}
}
-- 
1.5.3.7
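
The decision made by need_remote_flush() can be tried in isolation.  This is a
sketch under stated assumptions rather than the kernel code: the bit positions
follow the x86 page-table layout, "shadow present" is simplified to the
present bit alone (the real is_shadow_present_pte() compares against the
nonpresent marker values), and the parameters are renamed old_pte/new_pte.
NX is a negative permission, so it is inverted before the comparison; after
that, every bit in PT64_PERM_MASK means "grants something", and a remote flush
is needed only if the old entry granted something the new one does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, following the x86 page-table format. */
#define PT_PRESENT_MASK		(1ULL << 0)
#define PT_WRITABLE_MASK	(1ULL << 1)
#define PT_USER_MASK		(1ULL << 2)
#define PT64_NX_MASK		(1ULL << 63)
#define PT64_BASE_ADDR_MASK	0x000ffffffffff000ULL
#define PT64_PERM_MASK		(PT_PRESENT_MASK | PT_WRITABLE_MASK | \
				 PT_USER_MASK | PT64_NX_MASK)

/* Simplification: treat "shadow present" as the present bit alone. */
static bool is_present(uint64_t pte)
{
	return (pte & PT_PRESENT_MASK) != 0;
}

static bool need_remote_flush(uint64_t old_pte, uint64_t new_pte)
{
	if (!is_present(old_pte))
		return false;	/* nothing could have been cached */
	if (!is_present(new_pte))
		return true;	/* mapping removed */
	if ((old_pte ^ new_pte) & PT64_BASE_ADDR_MASK)
		return true;	/* points at a different frame */
	/* Invert NX so that a set bit always means "permission granted". */
	old_pte ^= PT64_NX_MASK;
	new_pte ^= PT64_NX_MASK;
	/* Flush only if some permission was taken away. */
	return (old_pte & ~new_pte & PT64_PERM_MASK) != 0;
}
```

For example, turning a read-only entry into a writable one returns false,
while the reverse change, or setting NX on a previously executable entry,
returns true.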


[PATCH 43/52] KVM: Don't bother the mmu if cr3 load doesn't change cr3

2007-12-29 Thread Avi Kivity

If the guest requests just a tlb flush, don't take the vm lock and
drop the mmu context pointlessly.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |2 +-
 drivers/kvm/x86.c |   25 +
 drivers/kvm/x86.h |1 +
 3 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 281dd5f..346aa65 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -1086,7 +1086,7 @@ static int nonpaging_init_context(struct kvm_vcpu *vcpu)
return 0;
 }
 
-static void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
+void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
 {
++vcpu->stat.tlb_flush;
kvm_x86_ops->tlb_flush(vcpu);
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index ac09f38..15e1203 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -166,6 +166,26 @@ out:
return ret;
 }
 
+static bool pdptrs_changed(struct kvm_vcpu *vcpu)
+{
+   u64 pdpte[ARRAY_SIZE(vcpu->pdptrs)];
+   bool changed = true;
+   int r;
+
+   if (is_long_mode(vcpu) || !is_pae(vcpu))
+   return false;
+
+   mutex_lock(&vcpu->kvm->lock);
+   r = kvm_read_guest(vcpu->kvm, vcpu->cr3 & ~31u, pdpte, sizeof(pdpte));
+   if (r < 0)
+   goto out;
+   changed = memcmp(pdpte, vcpu->pdptrs, sizeof(pdpte)) != 0;
+out:
+   mutex_unlock(&vcpu->kvm->lock);
+
+   return changed;
+}
+
 void set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
if (cr0 & CR0_RESERVED_BITS) {
@@ -271,6 +291,11 @@ EXPORT_SYMBOL_GPL(set_cr4);
 
 void set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
+   if (cr3 == vcpu->cr3 && !pdptrs_changed(vcpu)) {
+   kvm_mmu_flush_tlb(vcpu);
+   return;
+   }
+
if (is_long_mode(vcpu)) {
if (cr3 & CR3_L_MODE_RESERVED_BITS) {
printk(KERN_DEBUG "set_cr3: #GP, reserved bits\n");
diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h
index 71f2477..b1528c9 100644
--- a/drivers/kvm/x86.h
+++ b/drivers/kvm/x86.h
@@ -299,6 +299,7 @@ int emulator_write_emulated(unsigned long addr,
 
 unsigned long segment_base(u16 selector);
 
+void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu);
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
   const u8 *new, int bytes);
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
-- 
1.5.3.7


[PATCH 30/52] KVM: Portability: Move kvm_sregs and msr structures to

2007-12-29 Thread Avi Kivity

From: Jerone Young <[EMAIL PROTECTED]>

Move structures:
kvm_sregs
kvm_msr_entry
kvm_msrs
kvm_msr_list

from include/linux/kvm.h to include/asm-x86/kvm.h

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/asm-x86/kvm.h |   36 
 include/linux/kvm.h   |   36 
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h
index 644a325..32c7dda 100644
--- a/include/asm-x86/kvm.h
+++ b/include/asm-x86/kvm.h
@@ -9,6 +9,9 @@
 #include 
 #include 
 
+/* Architectural interrupt line count. */
+#define KVM_NR_INTERRUPTS 256
+
 struct kvm_memory_alias {
__u32 slot;  /* this has a different namespace than memory slots */
__u32 flags;
@@ -99,4 +102,37 @@ struct kvm_dtable {
 };
 
 
+/* for KVM_GET_SREGS and KVM_SET_SREGS */
+struct kvm_sregs {
+   /* out (KVM_GET_SREGS) / in (KVM_SET_SREGS) */
+   struct kvm_segment cs, ds, es, fs, gs, ss;
+   struct kvm_segment tr, ldt;
+   struct kvm_dtable gdt, idt;
+   __u64 cr0, cr2, cr3, cr4, cr8;
+   __u64 efer;
+   __u64 apic_base;
+   __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
+};
+
+struct kvm_msr_entry {
+   __u32 index;
+   __u32 reserved;
+   __u64 data;
+};
+
+/* for KVM_GET_MSRS and KVM_SET_MSRS */
+struct kvm_msrs {
+   __u32 nmsrs; /* number of msrs in entries */
+   __u32 pad;
+
+   struct kvm_msr_entry entries[0];
+};
+
+/* for KVM_GET_MSR_INDEX_LIST */
+struct kvm_msr_list {
+   __u32 nmsrs; /* number of msrs in entries */
+   __u32 indices[0];
+};
+
+
 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 442cb58..e6867aa 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -13,9 +13,6 @@
 
 #define KVM_API_VERSION 12
 
-/* Architectural interrupt line count. */
-#define KVM_NR_INTERRUPTS 256
-
 /* for KVM_CREATE_MEMORY_REGION */
 struct kvm_memory_region {
__u32 slot;
@@ -151,39 +148,6 @@ struct kvm_fpu {
 };
 
 
-
-/* for KVM_GET_SREGS and KVM_SET_SREGS */
-struct kvm_sregs {
-   /* out (KVM_GET_SREGS) / in (KVM_SET_SREGS) */
-   struct kvm_segment cs, ds, es, fs, gs, ss;
-   struct kvm_segment tr, ldt;
-   struct kvm_dtable gdt, idt;
-   __u64 cr0, cr2, cr3, cr4, cr8;
-   __u64 efer;
-   __u64 apic_base;
-   __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
-};
-
-struct kvm_msr_entry {
-   __u32 index;
-   __u32 reserved;
-   __u64 data;
-};
-
-/* for KVM_GET_MSRS and KVM_SET_MSRS */
-struct kvm_msrs {
-   __u32 nmsrs; /* number of msrs in entries */
-   __u32 pad;
-
-   struct kvm_msr_entry entries[0];
-};
-
-/* for KVM_GET_MSR_INDEX_LIST */
-struct kvm_msr_list {
-   __u32 nmsrs; /* number of msrs in entries */
-   __u32 indices[0];
-};
-
 /* for KVM_TRANSLATE */
 struct kvm_translation {
/* in */
-- 
1.5.3.7


[PATCH 37/52] KVM: Recalculate mmu pages needed for every memory region change

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Instead of incrementally changing the mmu cache size for every memory slot
operation, recalculate it from scratch.  This is simpler and safer.

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |   21 -
 drivers/kvm/mmu.c  |   19 +++
 drivers/kvm/x86.h  |1 +
 3 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index f06fa3a..6a702e1 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -333,26 +333,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (mem->slot >= kvm->nmemslots)
kvm->nmemslots = mem->slot + 1;
 
+   *memslot = new;
+
if (!kvm->n_requested_mmu_pages) {
-   unsigned int n_pages;
-
-   if (npages) {
-   n_pages = npages * KVM_PERMILLE_MMU_PAGES / 1000;
-   kvm_mmu_change_mmu_pages(kvm, kvm->n_alloc_mmu_pages +
-n_pages);
-   } else {
-   unsigned int nr_mmu_pages;
-
-   n_pages = old.npages * KVM_PERMILLE_MMU_PAGES / 1000;
-   nr_mmu_pages = kvm->n_alloc_mmu_pages - n_pages;
-   nr_mmu_pages = max(nr_mmu_pages,
-   (unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
-   kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
-   }
+   unsigned int nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
+   kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
}
 
-   *memslot = new;
-
kvm_mmu_slot_remove_write_access(kvm, mem->slot);
kvm_flush_remote_tlbs(kvm);
 
diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 4624f37..101cd53 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -1535,6 +1535,25 @@ nomem:
return -ENOMEM;
 }
 
+/*
+ * Calculate mmu pages needed for kvm.
+ */
+unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
+{
+   int i;
+   unsigned int nr_mmu_pages;
+   unsigned int  nr_pages = 0;
+
+   for (i = 0; i < kvm->nmemslots; i++)
+   nr_pages += kvm->memslots[i].npages;
+
+   nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
+   nr_mmu_pages = max(nr_mmu_pages,
+   (unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
+
+   return nr_mmu_pages;
+}
+
 #ifdef AUDIT
 
 static const char *audit_msg;
diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h
index 90b791b..71f2477 100644
--- a/drivers/kvm/x86.h
+++ b/drivers/kvm/x86.h
@@ -236,6 +236,7 @@ void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte);
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
 void kvm_mmu_zap_all(struct kvm *kvm);
+unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
 
 enum emulation_result {
-- 
1.5.3.7
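
The "recalculate from scratch" approach can be sketched as a standalone
function.  Assumptions: the two constants are given illustrative values (20
and 64) because the patch does not show their definitions, and struct memslot
and calculate_mmu_pages() are simplified stand-ins for the kvm structures
rather than the real ones.

```c
#include <stddef.h>

/* Assumed values for illustration; the patch does not show them. */
#define KVM_PERMILLE_MMU_PAGES	20
#define KVM_MIN_ALLOC_MMU_PAGES	64

/* Hypothetical stand-in for struct kvm_memory_slot. */
struct memslot {
	unsigned long npages;
};

/* Sum every slot, take the per-mille share, and apply the floor. */
static unsigned int calculate_mmu_pages(const struct memslot *slots, size_t n)
{
	unsigned int nr_pages = 0;
	unsigned int nr_mmu_pages;
	size_t i;

	for (i = 0; i < n; i++)
		nr_pages += slots[i].npages;

	nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
	if (nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES)
		nr_mmu_pages = KVM_MIN_ALLOC_MMU_PAGES;

	return nr_mmu_pages;
}
```

Because the result depends only on the current slot table, creating,
resizing, and deleting a slot all go through the same path, and there is no
running total that can drift.  This is also why the patch moves the
"*memslot = new" assignment ahead of the calculation.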


[PATCH 34/52] KVM: MMU: Selectively set PageDirty when releasing guest memory

2007-12-29 Thread Avi Kivity

From: Izik Eidus <[EMAIL PROTECTED]>

Improve dirty bit setting for pages that kvm releases.  Until now, every page
that we released was marked dirty; from now on, only pages that could
have been written to are marked dirty.

Signed-off-by: Izik Eidus <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |3 ++-
 drivers/kvm/kvm_main.c|   12 +---
 drivers/kvm/mmu.c |   23 +++
 drivers/kvm/paging_tmpl.h |   12 ++--
 drivers/kvm/x86.c |2 +-
 5 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 52e8018..c2acd74 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -393,7 +393,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
int user_alloc);
 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
-void kvm_release_page(struct page *page);
+void kvm_release_page_clean(struct page *page);
+void kvm_release_page_dirty(struct page *page);
 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
int len);
 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 9c94491..f06fa3a 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -543,13 +543,19 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
-void kvm_release_page(struct page *page)
+void kvm_release_page_clean(struct page *page)
+{
+   put_page(page);
+}
+EXPORT_SYMBOL_GPL(kvm_release_page_clean);
+
+void kvm_release_page_dirty(struct page *page)
 {
if (!PageReserved(page))
SetPageDirty(page);
put_page(page);
 }
-EXPORT_SYMBOL_GPL(kvm_release_page);
+EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
 
 static int next_segment(unsigned long len, int offset)
 {
@@ -1055,7 +1061,7 @@ static struct page *kvm_vm_nopage(struct vm_area_struct *vma,
/* current->mm->mmap_sem is already held so call lockless version */
page = __gfn_to_page(kvm, pgoff);
if (is_error_page(page)) {
-   kvm_release_page(page);
+   kvm_release_page_clean(page);
return NOPAGE_SIGBUS;
}
if (type != NULL)
diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 8add4d5..4624f37 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -420,14 +420,18 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
struct kvm_rmap_desc *desc;
struct kvm_rmap_desc *prev_desc;
struct kvm_mmu_page *page;
+   struct page *release_page;
unsigned long *rmapp;
int i;
 
if (!is_rmap_pte(*spte))
return;
page = page_header(__pa(spte));
-   kvm_release_page(pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >>
-PAGE_SHIFT));
+   release_page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
+   if (is_writeble_pte(*spte))
+   kvm_release_page_dirty(release_page);
+   else
+   kvm_release_page_clean(release_page);
rmapp = gfn_to_rmap(kvm, page->gfns[spte - page->spt]);
if (!*rmapp) {
printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
@@ -893,7 +897,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, hpa_t p)
 {
int level = PT32E_ROOT_LEVEL;
hpa_t table_addr = vcpu->mmu.root_hpa;
+   struct page *page;
 
+   page = pfn_to_page(p >> PAGE_SHIFT);
for (; ; level--) {
u32 index = PT64_INDEX(v, level);
u64 *table;
@@ -908,7 +914,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, hpa_t p)
pte = table[index];
was_rmapped = is_rmap_pte(pte);
if (is_shadow_present_pte(pte) && is_writeble_pte(pte)) {
-   kvm_release_page(pfn_to_page(p >> PAGE_SHIFT));
+   kvm_release_page_clean(page);
return 0;
}
mark_page_dirty(vcpu->kvm, v >> PAGE_SHIFT);
@@ -918,7 +924,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, hpa_t p)
if (!was_rmapped)
rmap_add(vcpu, &table[index], v >> PAGE_SHIFT);
else
-   kvm_release_page(pfn_to_page(p >> PAGE_SHIFT));
+   kvm_release_page_clean(page);
+
return 0;
}
 
@@ -933,7 +940,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, hpa_t p)
 1, 3, &table[index]);
if (!new_table) {
pgprintk("nonpaging_map: ENOMEM\n");
-

[PATCH 36/52] KVM: x86 emulator: prefetch up to 15 bytes of the instruction executed

2007-12-29 Thread Avi Kivity

Instead of fetching one byte at a time, prefetch 15 bytes (or until the next
page boundary) to avoid guest page table walks.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/x86_emulate.c |   38 --
 drivers/kvm/x86_emulate.h |7 +++
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/x86_emulate.c b/drivers/kvm/x86_emulate.c
index 8e2162f..6e7f774 100644
--- a/drivers/kvm/x86_emulate.c
+++ b/drivers/kvm/x86_emulate.c
@@ -414,8 +414,7 @@ static u16 twobyte_table[256] = {
 /* Fetch next part of the instruction being emulated. */
 #define insn_fetch(_type, _size, _eip)  \
 ({ unsigned long _x;   \
-   rc = ops->read_std((unsigned long)(_eip) + ctxt->cs_base, &_x,  \
-  (_size), ctxt->vcpu);\
+   rc = do_insn_fetch(ctxt, ops, (_eip), &_x, (_size));\
if (rc != 0)\
goto done;  \
(_eip) += (_size);  \
@@ -446,6 +445,41 @@ static u16 twobyte_table[256] = {
register_address_increment(c->eip, rel);\
} while (0)
 
+static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
+ struct x86_emulate_ops *ops,
+ unsigned long linear, u8 *dest)
+{
+   struct fetch_cache *fc = &ctxt->decode.fetch;
+   int rc;
+   int size;
+
+   if (linear < fc->start || linear >= fc->end) {
+   size = min(15UL, PAGE_SIZE - offset_in_page(linear));
+   rc = ops->read_std(linear, fc->data, size, ctxt->vcpu);
+   if (rc)
+   return rc;
+   fc->start = linear;
+   fc->end = linear + size;
+   }
+   *dest = fc->data[linear - fc->start];
+   return 0;
+}
+
+static int do_insn_fetch(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+unsigned long eip, void *dest, unsigned size)
+{
+   int rc = 0;
+
+   eip += ctxt->cs_base;
+   while (size--) {
+   rc = do_fetch_insn_byte(ctxt, ops, eip++, dest++);
+   if (rc)
+   return rc;
+   }
+   return 0;
+}
+
 /*
  * Given the 'reg' portion of a ModRM byte, and a register block, return a
  * pointer into the block that addresses the relevant register.
diff --git a/drivers/kvm/x86_emulate.h b/drivers/kvm/x86_emulate.h
index a62bf14..4603b2b 100644
--- a/drivers/kvm/x86_emulate.h
+++ b/drivers/kvm/x86_emulate.h
@@ -108,6 +108,12 @@ struct operand {
unsigned long val, orig_val, *ptr;
 };
 
+struct fetch_cache {
+   u8 data[15];
+   unsigned long start;
+   unsigned long end;
+};
+
 struct decode_cache {
u8 twobyte;
u8 b;
@@ -130,6 +136,7 @@ struct decode_cache {
u8 use_modrm_ea;
unsigned long modrm_ea;
unsigned long modrm_val;
+   struct fetch_cache fetch;
 };
 
 struct x86_emulate_ctxt {
-- 
1.5.3.7
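
The effect of the prefetch window can be seen with a small user-space model.
This is a sketch, not the emulator code: guest_mem, backing_reads, and
read_std() are hypothetical stand-ins for guest memory and the slow
->read_std() callback, and the names fetch_byte()/insn_fetch() are shortened
from the patch.  The cache logic itself follows do_fetch_insn_byte(): on a
miss, read up to 15 bytes but never past the end of the current page.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE	4096UL
#define FETCH_MAX	15UL	/* longest possible x86 instruction */

static uint8_t guest_mem[2 * PAGE_SIZE];	/* hypothetical guest memory */
static unsigned int backing_reads;		/* counts slow-path reads */

/* Stand-in for ops->read_std(): each call models one page-table walk. */
static int read_std(unsigned long addr, void *dst, unsigned long len)
{
	if (addr + len > sizeof(guest_mem))
		return -1;
	memcpy(dst, guest_mem + addr, len);
	backing_reads++;
	return 0;
}

struct fetch_cache {
	uint8_t data[15];
	unsigned long start;
	unsigned long end;	/* start == end means the cache is empty */
};

static int fetch_byte(struct fetch_cache *fc, unsigned long linear,
		      uint8_t *dest)
{
	if (linear < fc->start || linear >= fc->end) {
		/* Miss: refill, stopping at the page boundary. */
		unsigned long room = PAGE_SIZE - (linear & (PAGE_SIZE - 1));
		unsigned long size = room < FETCH_MAX ? room : FETCH_MAX;
		int rc = read_std(linear, fc->data, size);

		if (rc)
			return rc;
		fc->start = linear;
		fc->end = linear + size;
	}
	*dest = fc->data[linear - fc->start];
	return 0;
}

static int insn_fetch(struct fetch_cache *fc, unsigned long eip,
		      void *dest, unsigned int size)
{
	uint8_t *out = dest;

	while (size--) {
		int rc = fetch_byte(fc, eip++, out++);

		if (rc)
			return rc;
	}
	return 0;
}
```

In this model, fetching twelve consecutive bytes from the middle of a page
costs one backing read instead of twelve.  A fetch that straddles a page
boundary costs one read for the tail of the first page and one more for the
start of the next.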


[PATCH 33/52] KVM: MMU: Fix potential memory leak with smp real-mode

2007-12-29 Thread Avi Kivity

From: Izik Eidus <[EMAIL PROTECTED]>

When we map a page, we check whether some other vcpu mapped it for us and if
so, bail out.  But we should decrease the refcount on the page as we do so.

Signed-off-by: Izik Eidus <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 87d8e70..8add4d5 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -907,8 +907,10 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, hpa_t p)
 
pte = table[index];
was_rmapped = is_rmap_pte(pte);
-   if (is_shadow_present_pte(pte) && is_writeble_pte(pte))
+   if (is_shadow_present_pte(pte) && is_writeble_pte(pte)) {
+   kvm_release_page(pfn_to_page(p >> PAGE_SHIFT));
return 0;
+   }
mark_page_dirty(vcpu->kvm, v >> PAGE_SHIFT);
page_header_update_slot(vcpu->kvm, table, v);
table[index] = p | PT_PRESENT_MASK | PT_WRITABLE_MASK |
-- 
1.5.3.7


[PATCH 32/52] KVM: Export include/asm-x86/kvm.h

2007-12-29 Thread Avi Kivity

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/asm-x86/Kbuild |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/Kbuild b/include/asm-x86/Kbuild
index 12db5a1..da5eb69 100644
--- a/include/asm-x86/Kbuild
+++ b/include/asm-x86/Kbuild
@@ -3,6 +3,7 @@ include include/asm-generic/Kbuild.asm
 header-y += boot.h
 header-y += bootparam.h
 header-y += debugreg.h
+header-y += kvm.h
 header-y += ldt.h
 header-y += msr-index.h
 header-y += prctl.h
-- 
1.5.3.7

[PATCH 35/52] KVM: x86 emulator: retire ->write_std()

2007-12-29 Thread Avi Kivity

Theoretically used to access memory known to be ordinary RAM, it was
never implemented.  It is questionable whether it is possible to implement
it correctly.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/x86.c |   10 --
 drivers/kvm/x86_emulate.h |   11 ---
 2 files changed, 0 insertions(+), 21 deletions(-)

diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index 6212984..5a54e32 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -1162,15 +1162,6 @@ int emulator_read_std(unsigned long addr,
 }
 EXPORT_SYMBOL_GPL(emulator_read_std);
 
-static int emulator_write_std(unsigned long addr,
- const void *val,
- unsigned int bytes,
- struct kvm_vcpu *vcpu)
-{
-   pr_unimpl(vcpu, "emulator_write_std: addr %lx n %d\n", addr, bytes);
-   return X86EMUL_UNHANDLEABLE;
-}
-
 static int emulator_read_emulated(unsigned long addr,
  void *val,
  unsigned int bytes,
@@ -1367,7 +1358,6 @@ EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);
 
 struct x86_emulate_ops emulate_ops = {
.read_std= emulator_read_std,
-   .write_std   = emulator_write_std,
.read_emulated   = emulator_read_emulated,
.write_emulated  = emulator_write_emulated,
.cmpxchg_emulated= emulator_cmpxchg_emulated,
diff --git a/drivers/kvm/x86_emulate.h b/drivers/kvm/x86_emulate.h
index e34868b..a62bf14 100644
--- a/drivers/kvm/x86_emulate.h
+++ b/drivers/kvm/x86_emulate.h
@@ -63,17 +63,6 @@ struct x86_emulate_ops {
unsigned int bytes, struct kvm_vcpu *vcpu);
 
/*
-* write_std: Write bytes of standard (non-emulated/special) memory.
-*Used for stack operations, and others.
-*  @addr:  [IN ] Linear address to which to write.
-*  @val:   [IN ] Value to write to memory (low-order bytes used as
-*required).
-*  @bytes: [IN ] Number of bytes to write to memory.
-*/
-   int (*write_std)(unsigned long addr, const void *val,
-unsigned int bytes, struct kvm_vcpu *vcpu);
-
-   /*
 * read_emulated: Read bytes from emulated/special memory area.
 *  @addr:  [IN ] Linear address from which to read.
 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
-- 
1.5.3.7

[PATCH 27/52] KVM: Portability: Move kvm_regs to

2007-12-29 Thread Avi Kivity

From: Jerone Young <[EMAIL PROTECTED]>

This patch moves structure kvm_regs to include/asm-x86/kvm.h.
Each architecture will need to create its own version of this
structure.

Signed-off-by: Jerone Young <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/asm-x86/kvm.h |   10 ++
 include/linux/kvm.h   |9 -
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h
index 80752bc..c83a2ff 100644
--- a/include/asm-x86/kvm.h
+++ b/include/asm-x86/kvm.h
@@ -66,4 +66,14 @@ struct kvm_ioapic_state {
 #define KVM_IRQCHIP_PIC_SLAVE 1
 #define KVM_IRQCHIP_IOAPIC   2
 
+/* for KVM_GET_REGS and KVM_SET_REGS */
+struct kvm_regs {
+   /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
+   __u64 rax, rbx, rcx, rdx;
+   __u64 rsi, rdi, rsp, rbp;
+   __u64 r8,  r9,  r10, r11;
+   __u64 r12, r13, r14, r15;
+   __u64 rip, rflags;
+};
+
 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 1779c3d..0d83efc 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -134,15 +134,6 @@ struct kvm_run {
};
 };
 
-/* for KVM_GET_REGS and KVM_SET_REGS */
-struct kvm_regs {
-   /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
-   __u64 rax, rbx, rcx, rdx;
-   __u64 rsi, rdi, rsp, rbp;
-   __u64 r8,  r9,  r10, r11;
-   __u64 r12, r13, r14, r15;
-   __u64 rip, rflags;
-};
 
 /* for KVM_GET_FPU and KVM_SET_FPU */
 struct kvm_fpu {
-- 
1.5.3.7
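
Because this structure crosses the user/kernel boundary, its layout matters as much as its contents. The following is a small sketch of that layout, assuming uint64_t as a stand-in for __u64; toy_kvm_regs is a hypothetical mirror written for illustration, not the header itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the x86 register block; uint64_t stands in for __u64. */
struct toy_kvm_regs {
    uint64_t rax, rbx, rcx, rdx;
    uint64_t rsi, rdi, rsp, rbp;
    uint64_t r8,  r9,  r10, r11;
    uint64_t r12, r13, r14, r15;
    uint64_t rip, rflags;
};

/* Eighteen 64-bit fields and no padding, so the size is fixed across compilers. */
static_assert(sizeof(struct toy_kvm_regs) == 18 * sizeof(uint64_t),
              "register block must not contain padding");
static_assert(offsetof(struct toy_kvm_regs, rip) == 16 * sizeof(uint64_t),
              "rip follows the sixteen general-purpose registers");
```

Another architecture would define a different field list under the same structure name, which is why the definition belongs in the per-architecture header.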

[PATCH 29/52] KVM: Portability: Move kvm_segment & kvm_dtable structure to

2007-12-29 Thread Avi Kivity

From: Jerone Young <[EMAIL PROTECTED]>

This patch moves structures:
kvm_segment
kvm_dtable
from include/linux/kvm.h to include/asm-x86/kvm.h

Signed-off-by: Jerone Young <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/asm-x86/kvm.h |   17 +
 include/linux/kvm.h   |   15 ---
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h
index a2c65b5..644a325 100644
--- a/include/asm-x86/kvm.h
+++ b/include/asm-x86/kvm.h
@@ -82,4 +82,21 @@ struct kvm_lapic_state {
char regs[KVM_APIC_REG_SIZE];
 };
 
+struct kvm_segment {
+   __u64 base;
+   __u32 limit;
+   __u16 selector;
+   __u8  type;
+   __u8  present, dpl, db, s, l, g, avl;
+   __u8  unusable;
+   __u8  padding;
+};
+
+struct kvm_dtable {
+   __u64 base;
+   __u16 limit;
+   __u16 padding[3];
+};
+
+
 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 280ec0d..442cb58 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -151,21 +151,6 @@ struct kvm_fpu {
 };
 
 
-struct kvm_segment {
-   __u64 base;
-   __u32 limit;
-   __u16 selector;
-   __u8  type;
-   __u8  present, dpl, db, s, l, g, avl;
-   __u8  unusable;
-   __u8  padding;
-};
-
-struct kvm_dtable {
-   __u64 base;
-   __u16 limit;
-   __u16 padding[3];
-};
 
 /* for KVM_GET_SREGS and KVM_SET_SREGS */
 struct kvm_sregs {
-- 
1.5.3.7

[PATCH 25/52] KVM: Portability: Move kvm_memory_alias to asm/kvm.h

2007-12-29 Thread Avi Kivity

From: Jerone Young <[EMAIL PROTECTED]>

This patch moves struct kvm_memory_alias from include/linux/kvm.h
to include/asm-x86/kvm.h. Also have include/linux/kvm.h include
include/asm/kvm.h.

Signed-off-by: Jerone Young <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/asm-x86/kvm.h |   20 
 include/linux/kvm.h   |8 +---
 2 files changed, 21 insertions(+), 7 deletions(-)
 create mode 100644 include/asm-x86/kvm.h

diff --git a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h
new file mode 100644
index 0000000..37cf8e9
--- /dev/null
+++ b/include/asm-x86/kvm.h
@@ -0,0 +1,20 @@
+#ifndef __LINUX_KVM_X86_H
+#define __LINUX_KVM_X86_H
+
+/*
+ * KVM x86 specific structures and definitions
+ *
+ */
+
+#include 
+#include 
+
+struct kvm_memory_alias {
+   __u32 slot;  /* this has a different namespace than memory slots */
+   __u32 flags;
+   __u64 guest_phys_addr;
+   __u64 memory_size;
+   __u64 target_phys_addr;
+};
+
+#endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 71d33d6..d09dd5d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include <asm/kvm.h>
 
 #define KVM_API_VERSION 12
 
@@ -35,13 +36,6 @@ struct kvm_userspace_memory_region {
 /* for kvm_memory_region::flags */
 #define KVM_MEM_LOG_DIRTY_PAGES  1UL
 
-struct kvm_memory_alias {
-   __u32 slot;  /* this has a different namespace than memory slots */
-   __u32 flags;
-   __u64 guest_phys_addr;
-   __u64 memory_size;
-   __u64 target_phys_addr;
-};
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
-- 
1.5.3.7

[PATCH 24/52] KVM: Move misplaced comment

2007-12-29 Thread Avi Kivity

From: Hollis Blanchard <[EMAIL PROTECTED]>

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index ccba958..52e8018 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -135,15 +135,15 @@ struct kvm_mmu {
 
 #define KVM_NR_MEM_OBJS 40
 
+/*
+ * We don't want allocation failures within the mmu code, so we preallocate
+ * enough memory for a single page fault in a cache.
+ */
 struct kvm_mmu_memory_cache {
int nobjs;
void *objects[KVM_NR_MEM_OBJS];
 };
 
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
 struct kvm_guest_debug {
int enabled;
unsigned long bp[4];
-- 
1.5.3.7

[PATCH 21/52] KVM: MMU: Remove unused variable

2007-12-29 Thread Avi Kivity

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/paging_tmpl.h |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index be66401..77a2b22 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -72,7 +72,6 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
struct kvm_vcpu *vcpu, gva_t addr,
int write_fault, int user_fault, int fetch_fault)
 {
-   pt_element_t *table;
pt_element_t pte;
gfn_t table_gfn;
unsigned index;
-- 
1.5.3.7

[PATCH 20/52] KVM: Add missing #include

2007-12-29 Thread Avi Kivity

Needed for empty_zero_page.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 469e6b4..d6c5191 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
-- 
1.5.3.7

[PATCH 22/52] KVM: Remove unused "rmap_overflow" variable

2007-12-29 Thread Avi Kivity

From: Hollis Blanchard <[EMAIL PROTECTED]>

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 1901456..ba78a45 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -318,7 +318,6 @@ struct kvm {
unsigned int n_alloc_mmu_pages;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
-   unsigned long rmap_overflow;
struct list_head vm_list;
struct file *filp;
struct kvm_io_bus mmio_bus;
-- 
1.5.3.7

[PATCH 23/52] KVM: Correct consistent typo: "destory" -> "destroy"

2007-12-29 Thread Avi Kivity

From: Hollis Blanchard <[EMAIL PROTECTED]>

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |2 +-
 drivers/kvm/kvm_main.c |2 +-
 drivers/kvm/x86.c  |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index ba78a45..ccba958 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -461,7 +461,7 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id);
-void kvm_arch_vcpu_destory(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu);
 
 int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_arch_hardware_enable(void *garbage);
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index d6c5191..9c94491 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -824,7 +824,7 @@ unlink:
kvm->vcpus[n] = NULL;
mutex_unlock(&kvm->lock);
 vcpu_destroy:
-   kvm_arch_vcpu_destory(vcpu);
+   kvm_arch_vcpu_destroy(vcpu);
return r;
 }
 
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index 2257a0a..5a1b72f 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -2513,7 +2513,7 @@ fail:
return ERR_PTR(r);
 }
 
-void kvm_arch_vcpu_destory(struct kvm_vcpu *vcpu)
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
vcpu_load(vcpu);
kvm_mmu_unload(vcpu);
-- 
1.5.3.7

[PATCH 16/52] KVM: Portability: Move some function declarations to x86.h

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |   84 -
 drivers/kvm/x86.h |   84 +
 2 files changed, 84 insertions(+), 84 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 41f6ee2..1901456 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -375,19 +375,6 @@ int kvm_init(void *opaque, unsigned int vcpu_size,
  struct module *module);
 void kvm_exit(void);
 
-int kvm_mmu_module_init(void);
-void kvm_mmu_module_exit(void);
-
-void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
-int kvm_mmu_create(struct kvm_vcpu *vcpu);
-int kvm_mmu_setup(struct kvm_vcpu *vcpu);
-void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte);
-
-int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
-void kvm_mmu_zap_all(struct kvm *kvm);
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
-
 hpa_t gpa_to_hpa(struct kvm *kvm, gpa_t gpa);
 #define HPA_MSB ((sizeof(hpa_t) * 8) - 1)
 #define HPA_ERR_MASK ((hpa_t)1 << HPA_MSB)
@@ -421,83 +408,12 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
-enum emulation_result {
-   EMULATE_DONE,   /* no further processing */
-   EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
-   EMULATE_FAIL, /* can't emulate this instruction */
-};
-
-int emulate_instruction(struct kvm_vcpu *vcpu, struct kvm_run *run,
-   unsigned long cr2, u16 error_code, int no_decode);
-void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context);
-void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-  unsigned long *rflags);
-
-unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value,
-unsigned long *rflags);
-int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
-int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
-
-struct x86_emulate_ctxt;
-
-int kvm_emulate_pio(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
-int size, unsigned port);
-int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
-  int size, unsigned long count, int down,
-   gva_t address, int rep, unsigned port);
-void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
-int kvm_emulate_halt(struct kvm_vcpu *vcpu);
-int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
-int emulate_clts(struct kvm_vcpu *vcpu);
-int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr,
-   unsigned long *dest);
-int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr,
-   unsigned long value);
-
-void set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-void set_cr3(struct kvm_vcpu *vcpu, unsigned long cr0);
-void set_cr4(struct kvm_vcpu *vcpu, unsigned long cr0);
-void set_cr8(struct kvm_vcpu *vcpu, unsigned long cr0);
-unsigned long get_cr8(struct kvm_vcpu *vcpu);
-void lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
-void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
-
-int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
-int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
-
-void fx_init(struct kvm_vcpu *vcpu);
-
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 
-int emulator_read_std(unsigned long addr,
- void *val,
- unsigned int bytes,
- struct kvm_vcpu *vcpu);
-int emulator_write_emulated(unsigned long addr,
-   const void *val,
-   unsigned int bytes,
-   struct kvm_vcpu *vcpu);
-
-unsigned long segment_base(u16 selector);
-
-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-  const u8 *new, int bytes);
-int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
-void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
-int kvm_mmu_load(struct kvm_vcpu *vcpu);
-void kvm_mmu_unload(struct kvm_vcpu *vcpu);
-
-int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
-
-int kvm_fix_hypercall(struct kvm_vcpu *vcpu);
-
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
 long

[PATCH 18/52] KVM: MMU: Change guest pte access to kvm_{read,write}_guest()

2007-12-29 Thread Avi Kivity

From: Izik Eidus <[EMAIL PROTECTED]>

Things are simpler and more regular this way.

Signed-off-by: Izik Eidus <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/paging_tmpl.h |   24 +---
 1 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index 0f0266a..be66401 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -72,7 +72,6 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
struct kvm_vcpu *vcpu, gva_t addr,
int write_fault, int user_fault, int fetch_fault)
 {
-   struct page *page = NULL;
pt_element_t *table;
pt_element_t pte;
gfn_t table_gfn;
@@ -99,16 +98,13 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
index = PT_INDEX(addr, walker->level);
 
table_gfn = (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
+   pte_gpa = table_gfn << PAGE_SHIFT;
+   pte_gpa += index * sizeof(pt_element_t);
walker->table_gfn[walker->level - 1] = table_gfn;
pgprintk("%s: table_gfn[%d] %lx\n", __FUNCTION__,
 walker->level - 1, table_gfn);
 
-   page = gfn_to_page(vcpu->kvm, (pte & PT64_BASE_ADDR_MASK)
-  >> PAGE_SHIFT);
-
-   table = kmap_atomic(page, KM_USER0);
-   pte = table[index];
-   kunmap_atomic(table, KM_USER0);
+   kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
 
if (!is_present_pte(pte))
goto not_present;
@@ -128,9 +124,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
if (!(pte & PT_ACCESSED_MASK)) {
mark_page_dirty(vcpu->kvm, table_gfn);
pte |= PT_ACCESSED_MASK;
-   table = kmap_atomic(page, KM_USER0);
-   table[index] = pte;
-   kunmap_atomic(table, KM_USER0);
+   kvm_write_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
}
 
if (walker->level == PT_PAGE_TABLE_LEVEL) {
@@ -149,21 +143,15 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
 
walker->inherited_ar &= pte;
--walker->level;
-   kvm_release_page(page);
}
 
if (write_fault && !is_dirty_pte(pte)) {
mark_page_dirty(vcpu->kvm, table_gfn);
pte |= PT_DIRTY_MASK;
-   table = kmap_atomic(page, KM_USER0);
-   table[index] = pte;
-   kunmap_atomic(table, KM_USER0);
-   pte_gpa = table_gfn << PAGE_SHIFT;
-   pte_gpa += index * sizeof(pt_element_t);
+   kvm_write_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte));
}
 
-   kvm_release_page(page);
walker->pte = pte;
pgprintk("%s: pte %llx\n", __FUNCTION__, (u64)pte);
return 1;
@@ -182,8 +170,6 @@ err:
walker->error_code |= PFERR_USER_MASK;
if (fetch_fault)
walker->error_code |= PFERR_FETCH_MASK;
-   if (page)
-   kvm_release_page(page);
return 0;
 }
 
-- 
1.5.3.7
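
The address arithmetic in this patch can be tried in isolation. Below is a minimal sketch, assuming guest memory is a flat byte array and 4 KiB pages; toy_pte_gpa, toy_read_guest and toy_write_guest are hypothetical helpers that only imitate the shape of the calls in the diff and are not the kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12       /* assumption: 4 KiB pages */

typedef uint64_t toy_pte_t;     /* assumption: 64-bit page table entries */

/* Guest physical address of entry 'index' in the table at frame 'table_gfn'. */
static uint64_t toy_pte_gpa(uint64_t table_gfn, unsigned int index)
{
    return (table_gfn << TOY_PAGE_SHIFT) + (uint64_t)index * sizeof(toy_pte_t);
}

/* Copy 'len' bytes out of guest memory; returns -1 if the range is out of bounds. */
static int toy_read_guest(const uint8_t *mem, size_t mem_size,
                          uint64_t gpa, void *data, size_t len)
{
    assert(mem != NULL && data != NULL);
    if (gpa > mem_size || len > mem_size - gpa)
        return -1;
    memcpy(data, mem + gpa, len);
    return 0;
}

/* Copy 'len' bytes into guest memory; returns -1 if the range is out of bounds. */
static int toy_write_guest(uint8_t *mem, size_t mem_size,
                           uint64_t gpa, const void *data, size_t len)
{
    assert(mem != NULL && data != NULL);
    if (gpa > mem_size || len > mem_size - gpa)
        return -1;
    memcpy(mem + gpa, data, len);
    return 0;
}
```

Computing the address once and passing it to a read or write helper removes the need to pair every map with an unmap and every page lookup with a release, which is where the diff saves most of its lines.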

[PATCH 13/52] KVM: Portability: Move struct kvm_x86_ops definition to x86.h

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |   69 -
 drivers/kvm/x86.h |   67 +++
 2 files changed, 67 insertions(+), 69 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index aceecf4..e4e1ff7 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -386,75 +386,6 @@ struct descriptor_table {
unsigned long base;
 } __attribute__((packed));
 
-struct kvm_x86_ops {
-   int (*cpu_has_kvm_support)(void);  /* __init */
-   int (*disabled_by_bios)(void); /* __init */
-   void (*hardware_enable)(void *dummy);  /* __init */
-   void (*hardware_disable)(void *dummy);
-   void (*check_processor_compatibility)(void *rtn);
-   int (*hardware_setup)(void);   /* __init */
-   void (*hardware_unsetup)(void);/* __exit */
-
-   /* Create, but do not attach this VCPU */
-   struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id);
-   void (*vcpu_free)(struct kvm_vcpu *vcpu);
-   int (*vcpu_reset)(struct kvm_vcpu *vcpu);
-
-   void (*prepare_guest_switch)(struct kvm_vcpu *vcpu);
-   void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
-   void (*vcpu_put)(struct kvm_vcpu *vcpu);
-   void (*vcpu_decache)(struct kvm_vcpu *vcpu);
-
-   int (*set_guest_debug)(struct kvm_vcpu *vcpu,
-  struct kvm_debug_guest *dbg);
-   void (*guest_debug_pre)(struct kvm_vcpu *vcpu);
-   int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
-   int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
-   u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
-   void (*get_segment)(struct kvm_vcpu *vcpu,
-   struct kvm_segment *var, int seg);
-   void (*set_segment)(struct kvm_vcpu *vcpu,
-   struct kvm_segment *var, int seg);
-   void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
-   void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu);
-   void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
-   void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
-   void (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
-   void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
-   void (*get_idt)(struct kvm_vcpu *vcpu, struct descriptor_table *dt);
-   void (*set_idt)(struct kvm_vcpu *vcpu, struct descriptor_table *dt);
-   void (*get_gdt)(struct kvm_vcpu *vcpu, struct descriptor_table *dt);
-   void (*set_gdt)(struct kvm_vcpu *vcpu, struct descriptor_table *dt);
-   unsigned long (*get_dr)(struct kvm_vcpu *vcpu, int dr);
-   void (*set_dr)(struct kvm_vcpu *vcpu, int dr, unsigned long value,
-  int *exception);
-   void (*cache_regs)(struct kvm_vcpu *vcpu);
-   void (*decache_regs)(struct kvm_vcpu *vcpu);
-   unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
-   void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
-
-   void (*tlb_flush)(struct kvm_vcpu *vcpu);
-   void (*inject_page_fault)(struct kvm_vcpu *vcpu,
- unsigned long addr, u32 err_code);
-
-   void (*inject_gp)(struct kvm_vcpu *vcpu, unsigned err_code);
-
-   void (*run)(struct kvm_vcpu *vcpu, struct kvm_run *run);
-   int (*handle_exit)(struct kvm_run *run, struct kvm_vcpu *vcpu);
-   void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
-   void (*patch_hypercall)(struct kvm_vcpu *vcpu,
-   unsigned char *hypercall_addr);
-   int (*get_irq)(struct kvm_vcpu *vcpu);
-   void (*set_irq)(struct kvm_vcpu *vcpu, int vec);
-   void (*inject_pending_irq)(struct kvm_vcpu *vcpu);
-   void (*inject_pending_vectors)(struct kvm_vcpu *vcpu,
-  struct kvm_run *run);
-
-   int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
-};
-
-extern struct kvm_x86_ops *kvm_x86_ops;
-
 /* The guest did something we don't support. */
 #define pr_unimpl(vcpu, fmt, ...)  \
  do {  \
diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h
index ec1d669..77b4092 100644
--- a/drivers/kvm/x86.h
+++ b/drivers/kvm/x86.h
@@ -121,6 +121,73 @@ struct kvm_vcpu {
struct x86_emulate_ctxt emulate_ctxt;
 };
 
+struct kvm_x86_ops {
+   int (*cpu_has_kvm_support)(void);  /* __init */
+   int (*disabled_by_bios)(void); /* __init */
+   void (*hardware_enable)(void *dummy);  /* __init */
+   void (*hardware_disable)(void *dummy);
+   void (*check_processor_compatibility)(void *rtn);
+   int (*hardware_setup)(void);   /* __init */
+

[PATCH 15/52] KVM: Move some static inline functions out from kvm.h into x86.h

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |  109 
 drivers/kvm/x86.h |  110 +
 2 files changed, 110 insertions(+), 109 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 1c4de50..41f6ee2 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -577,115 +577,6 @@ static inline int memslot_id(struct kvm *kvm, struct kvm_memory_slot *slot)
return slot - kvm->memslots;
 }
 
-static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
-{
-   struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
-
-   return (struct kvm_mmu_page *)page_private(page);
-}
-
-static inline u16 read_fs(void)
-{
-   u16 seg;
-   asm("mov %%fs, %0" : "=g"(seg));
-   return seg;
-}
-
-static inline u16 read_gs(void)
-{
-   u16 seg;
-   asm("mov %%gs, %0" : "=g"(seg));
-   return seg;
-}
-
-static inline u16 read_ldt(void)
-{
-   u16 ldt;
-   asm("sldt %0" : "=g"(ldt));
-   return ldt;
-}
-
-static inline void load_fs(u16 sel)
-{
-   asm("mov %0, %%fs" : : "rm"(sel));
-}
-
-static inline void load_gs(u16 sel)
-{
-   asm("mov %0, %%gs" : : "rm"(sel));
-}
-
-#ifndef load_ldt
-static inline void load_ldt(u16 sel)
-{
-   asm("lldt %0" : : "rm"(sel));
-}
-#endif
-
-static inline void get_idt(struct descriptor_table *table)
-{
-   asm("sidt %0" : "=m"(*table));
-}
-
-static inline void get_gdt(struct descriptor_table *table)
-{
-   asm("sgdt %0" : "=m"(*table));
-}
-
-static inline unsigned long read_tr_base(void)
-{
-   u16 tr;
-   asm("str %0" : "=g"(tr));
-   return segment_base(tr);
-}
-
-#ifdef CONFIG_X86_64
-static inline unsigned long read_msr(unsigned long msr)
-{
-   u64 value;
-
-   rdmsrl(msr, value);
-   return value;
-}
-#endif
-
-static inline void fx_save(struct i387_fxsave_struct *image)
-{
-   asm("fxsave (%0)":: "r" (image));
-}
-
-static inline void fx_restore(struct i387_fxsave_struct *image)
-{
-   asm("fxrstor (%0)":: "r" (image));
-}
-
-static inline void fpu_init(void)
-{
-   asm("finit");
-}
-
-static inline u32 get_rdx_init_val(void)
-{
-   return 0x600; /* P6 family */
-}
-
-#define ASM_VMX_VMCLEAR_RAX       ".byte 0x66, 0x0f, 0xc7, 0x30"
-#define ASM_VMX_VMLAUNCH          ".byte 0x0f, 0x01, 0xc2"
-#define ASM_VMX_VMRESUME          ".byte 0x0f, 0x01, 0xc3"
-#define ASM_VMX_VMPTRLD_RAX       ".byte 0x0f, 0xc7, 0x30"
-#define ASM_VMX_VMREAD_RDX_RAX    ".byte 0x0f, 0x78, 0xd0"
-#define ASM_VMX_VMWRITE_RAX_RDX   ".byte 0x0f, 0x79, 0xd0"
-#define ASM_VMX_VMWRITE_RSP_RDX   ".byte 0x0f, 0x79, 0xd4"
-#define ASM_VMX_VMXOFF            ".byte 0x0f, 0x01, 0xc4"
-#define ASM_VMX_VMXON_RAX         ".byte 0xf3, 0x0f, 0xc7, 0x30"
-
-#define MSR_IA32_TIME_STAMP_COUNTER 0x010
-
-#define TSS_IOPB_BASE_OFFSET 0x66
-#define TSS_BASE_SIZE 0x68
-#define TSS_IOPB_SIZE (65536 / 8)
-#define TSS_REDIRECTION_SIZE (256 / 8)
-#define RMODE_TSS_SIZE (TSS_BASE_SIZE + TSS_REDIRECTION_SIZE + TSS_IOPB_SIZE + 1)
 
 enum kvm_stat_kind {
KVM_STAT_VM,
diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h
index 77a4a4a..f1c43ca 100644
--- a/drivers/kvm/x86.h
+++ b/drivers/kvm/x86.h
@@ -267,4 +267,114 @@ static inline int is_paging(struct kvm_vcpu *vcpu)
 
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 int complete_pio(struct kvm_vcpu *vcpu);
+
+static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
+{
+   struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
+
+   return (struct kvm_mmu_page *)page_private(page);
+}
+
+static inline u16 read_fs(void)
+{
+   u16 seg;
+   asm("mov %%fs, %0" : "=g"(seg));
+   return seg;
+}
+
+static inline u16 read_gs(void)
+{
+   u16 seg;
+   asm("mov %%gs, %0" : "=g"(seg));
+   return seg;
+}
+
+static inline u16 read_ldt(void)
+{
+   u16 ldt;
+   asm("sldt %0" : "=g"(ldt));
+   return ldt;
+}
+
+static inline void load_fs(u16 sel)
+{
+   asm("mov %0, %%fs" : : "rm"(sel));
+}
+
+static inline void load_gs(u16 sel)
+{
+   asm("mov %0, %%gs" : : "rm"(sel));
+}
+
+#ifndef load_ldt
+static inline void load_ldt(u16 sel)
+{
+   asm("lldt %0" : : "rm"(sel));
+}
+#endif
+
+static inline void get_idt(struct descriptor_table *table)
+{
+   asm("sidt %0" : "=m"(*table));
+}
+
+static inline void get_gdt(struct descriptor_table *table)
+{
+   asm("sgdt %0" : "=m"(*table));
+}
+
+static inline unsigned long read_tr_base(void)
+{
+   u16 tr;
+   asm("str %0" : "=g"(tr));
+   return segment_base(tr);
+}
+
+#ifdef CONFIG_X86_64
+static inline unsigned long read_msr(unsigned long msr)
+{
+   u64 value;
+
+   rdmsrl(msr, value);
+   return value;
+}
+#endif
+
+static inline void fx_save(struct i387_fxsave_struct *image)
+{
+   asm("fxsave

[PATCH 17/52] KVM: VMX: Force seg.base == (seg.sel << 4) in real mode

2007-12-29 Thread Avi Kivity

From: Jan Kiszka <[EMAIL PROTECTED]>

Ensure that segment.base == segment.selector << 4 when entering the real
mode on Intel so that the CPU will not bark at us.  This fixes some old
protected mode demo from http://www.x86.org/articles/pmbasics/tspec_a1_doc.htm.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/vmx.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index c23f399..fbe792d 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1165,7 +1165,8 @@ static void fix_rmode_seg(int seg, struct kvm_save_segment *save)
save->base = vmcs_readl(sf->base);
save->limit = vmcs_read32(sf->limit);
save->ar = vmcs_read32(sf->ar_bytes);
-   vmcs_write16(sf->selector, vmcs_readl(sf->base) >> 4);
+   vmcs_write16(sf->selector, save->base >> 4);
+   vmcs_write32(sf->base, save->base & 0xfffff);
vmcs_write32(sf->limit, 0xffff);
vmcs_write32(sf->ar_bytes, 0xf3);
 }
-- 
1.5.3.7
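
The real-mode invariant named in the subject line can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: toy_rmode_seg and its helpers are hypothetical names, and rounding an unrepresentable base down to a 16-byte boundary is a choice made for this illustration rather than a description of what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a real-mode segment register; not a KVM structure. */
struct toy_rmode_seg {
    uint16_t selector;
    uint32_t base;
};

/* In real mode the CPU derives the segment base from the selector. */
static uint32_t toy_base_from_selector(uint16_t selector)
{
    return (uint32_t)selector << 4;
}

/* True when 'base' can be expressed exactly as selector << 4. */
static bool toy_base_is_representable(uint32_t base)
{
    return base <= 0xFFFF0u && (base & 0xFu) == 0;
}

/*
 * Build a segment that satisfies base == selector << 4.
 * A base that is not representable is rounded down to a 16-byte
 * boundary and truncated to 16 selector bits (sketch-only policy).
 */
static struct toy_rmode_seg toy_make_rmode_seg(uint32_t base)
{
    struct toy_rmode_seg seg;

    seg.selector = (uint16_t)(base >> 4);
    seg.base = toy_base_from_selector(seg.selector);
    assert(seg.base == (uint32_t)seg.selector << 4);
    return seg;
}
```

For example, a base of 0xB8000 maps to selector 0xB800, while 0xB8004 has no exact selector because its low four bits are not zero.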

[PATCH 14/52] KVM: Portability: Move vcpu regs enumeration definition to x86.h

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |   35 ---
 drivers/kvm/x86.h |   35 +++
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index e4e1ff7..1c4de50 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -150,41 +150,6 @@ struct kvm_guest_debug {
int singlestep;
 };
 
-enum {
-   VCPU_REGS_RAX = 0,
-   VCPU_REGS_RCX = 1,
-   VCPU_REGS_RDX = 2,
-   VCPU_REGS_RBX = 3,
-   VCPU_REGS_RSP = 4,
-   VCPU_REGS_RBP = 5,
-   VCPU_REGS_RSI = 6,
-   VCPU_REGS_RDI = 7,
-#ifdef CONFIG_X86_64
-   VCPU_REGS_R8 = 8,
-   VCPU_REGS_R9 = 9,
-   VCPU_REGS_R10 = 10,
-   VCPU_REGS_R11 = 11,
-   VCPU_REGS_R12 = 12,
-   VCPU_REGS_R13 = 13,
-   VCPU_REGS_R14 = 14,
-   VCPU_REGS_R15 = 15,
-#endif
-   NR_VCPU_REGS
-};
-
-enum {
-   VCPU_SREG_CS,
-   VCPU_SREG_DS,
-   VCPU_SREG_ES,
-   VCPU_SREG_FS,
-   VCPU_SREG_GS,
-   VCPU_SREG_SS,
-   VCPU_SREG_TR,
-   VCPU_SREG_LDTR,
-};
-
-#include "x86_emulate.h"
-
 struct kvm_pio_request {
unsigned long count;
int cur_count;
diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h
index 77b4092..77a4a4a 100644
--- a/drivers/kvm/x86.h
+++ b/drivers/kvm/x86.h
@@ -55,6 +55,41 @@
 extern spinlock_t kvm_lock;
 extern struct list_head vm_list;
 
+enum {
+   VCPU_REGS_RAX = 0,
+   VCPU_REGS_RCX = 1,
+   VCPU_REGS_RDX = 2,
+   VCPU_REGS_RBX = 3,
+   VCPU_REGS_RSP = 4,
+   VCPU_REGS_RBP = 5,
+   VCPU_REGS_RSI = 6,
+   VCPU_REGS_RDI = 7,
+#ifdef CONFIG_X86_64
+   VCPU_REGS_R8 = 8,
+   VCPU_REGS_R9 = 9,
+   VCPU_REGS_R10 = 10,
+   VCPU_REGS_R11 = 11,
+   VCPU_REGS_R12 = 12,
+   VCPU_REGS_R13 = 13,
+   VCPU_REGS_R14 = 14,
+   VCPU_REGS_R15 = 15,
+#endif
+   NR_VCPU_REGS
+};
+
+enum {
+   VCPU_SREG_CS,
+   VCPU_SREG_DS,
+   VCPU_SREG_ES,
+   VCPU_SREG_FS,
+   VCPU_SREG_GS,
+   VCPU_SREG_SS,
+   VCPU_SREG_TR,
+   VCPU_SREG_LDTR,
+};
+
+#include "x86_emulate.h"
+
 struct kvm_vcpu {
KVM_VCPU_COMM;
u64 host_tsc;
-- 
1.5.3.7


[PATCH 19/52] KVM: Simplify kvm_clear_guest_page()

2007-12-29 Thread Avi Kivity

From: Izik Eidus <[EMAIL PROTECTED]>

Use kvm_write_guest_page() with empty_zero_page, instead of doing
kmap and memset.

Signed-off-by: Izik Eidus <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |   17 +
 1 files changed, 1 insertions(+), 16 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index f9fd865..469e6b4 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -633,22 +633,7 @@ int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
 
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
 {
-   void *page_virt;
-   struct page *page;
-
-   page = gfn_to_page(kvm, gfn);
-   if (is_error_page(page)) {
-   kvm_release_page(page);
-   return -EFAULT;
-   }
-   page_virt = kmap_atomic(page, KM_USER0);
-
-   memset(page_virt + offset, 0, len);
-
-   kunmap_atomic(page_virt, KM_USER0);
-   kvm_release_page(page);
-   mark_page_dirty(kvm, gfn);
-   return 0;
+   return kvm_write_guest_page(kvm, gfn, empty_zero_page, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest_page);
 
-- 
1.5.3.7
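
The idea behind this simplification can be sketched in plain userspace C: once a bounds-checked "write page" primitive exists, clearing is just a write whose source is a page of zeros. Everything below is an assumption made for illustration (the sketch_ names, the 4096-byte page size, the -1 error value); it does not reproduce the kernel's kvm_write_guest_page() or empty_zero_page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Stand-in for the kernel's empty_zero_page: a page that is always zero. */
static const unsigned char sketch_zero_page[SKETCH_PAGE_SIZE];

/*
 * Hypothetical write primitive: copy len bytes into page at offset,
 * rejecting ranges that would run past the end of the page.
 */
static int sketch_write_page(unsigned char *page, const void *data,
			     size_t offset, size_t len)
{
	if (offset > SKETCH_PAGE_SIZE || len > SKETCH_PAGE_SIZE - offset)
		return -1;
	memcpy(page + offset, data, len);
	return 0;
}

/* Clearing a range is a write whose source is the zero page. */
static int sketch_clear_page(unsigned char *page, size_t offset, size_t len)
{
	return sketch_write_page(page, sketch_zero_page, offset, len);
}
```

The benefit mirrored here is that the clear path inherits whatever checks and bookkeeping the write path already does, instead of duplicating them.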


[PATCH 11/52] KVM: Portability: MMU initialization and teardown split

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Move out kvm_mmu init and exit functionality from kvm_main.c

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |8 
 drivers/kvm/x86.c  |   24 +++-
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index e64dfa2..f9fd865 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1383,10 +1383,6 @@ int kvm_init(void *opaque, unsigned int vcpu_size,
int r;
int cpu;
 
-   r = kvm_mmu_module_init();
-   if (r)
-   goto out4;
-
kvm_init_debug();
 
r = kvm_arch_init(opaque);
@@ -1446,8 +1442,6 @@ int kvm_init(void *opaque, unsigned int vcpu_size,
kvm_preempt_ops.sched_in = kvm_sched_in;
kvm_preempt_ops.sched_out = kvm_sched_out;
 
-   kvm_mmu_set_nonpresent_ptes(0ull, 0ull);
-
return 0;
 
 out_free:
@@ -1466,7 +1460,6 @@ out_free_0:
 out:
kvm_arch_exit();
kvm_exit_debug();
-   kvm_mmu_module_exit();
 out4:
return r;
 }
@@ -1485,6 +1478,5 @@ void kvm_exit(void)
kvm_arch_exit();
kvm_exit_debug();
__free_page(bad_page);
-   kvm_mmu_module_exit();
 }
 EXPORT_SYMBOL_GPL(kvm_exit);
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index 935e276..2257a0a 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -1711,33 +1711,47 @@ EXPORT_SYMBOL_GPL(kvm_emulate_pio_string);
 
 int kvm_arch_init(void *opaque)
 {
+   int r;
struct kvm_x86_ops *ops = (struct kvm_x86_ops *)opaque;
 
+   r = kvm_mmu_module_init();
+   if (r)
+   goto out_fail;
+
kvm_init_msr_list();
 
if (kvm_x86_ops) {
printk(KERN_ERR "kvm: already loaded the other module\n");
-   return -EEXIST;
+   r = -EEXIST;
+   goto out;
}
 
if (!ops->cpu_has_kvm_support()) {
printk(KERN_ERR "kvm: no hardware support\n");
-   return -EOPNOTSUPP;
+   r = -EOPNOTSUPP;
+   goto out;
}
if (ops->disabled_by_bios()) {
printk(KERN_ERR "kvm: disabled by bios\n");
-   return -EOPNOTSUPP;
+   r = -EOPNOTSUPP;
+   goto out;
}
 
kvm_x86_ops = ops;
-
+   kvm_mmu_set_nonpresent_ptes(0ull, 0ull);
return 0;
+
+out:
+   kvm_mmu_module_exit();
+out_fail:
+   return r;
 }
 
 void kvm_arch_exit(void)
 {
kvm_x86_ops = NULL;
- }
+   kvm_mmu_module_exit();
+}
 
 int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 {
-- 
1.5.3.7
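
The error handling that kvm_arch_init() gains in this patch follows the usual goto-unwind shape: once the first setup step has succeeded, every later failure leaves through a label that undoes it. A minimal userspace sketch of that shape follows; the sketch_ names and the -1 error value are placeholders, not the kernel's.

```c
#include <assert.h>

static int sketch_mmu_ready;	/* 1 while the stand-in "MMU module" is set up */

static int sketch_mmu_init(void)
{
	sketch_mmu_ready = 1;
	return 0;
}

static void sketch_mmu_exit(void)
{
	sketch_mmu_ready = 0;
}

/*
 * Same shape as the patched init function: failures after the first
 * step jump to "out", which tears that step down again.
 */
static int sketch_arch_init(int hw_supported)
{
	int r;

	r = sketch_mmu_init();
	if (r)
		goto out_fail;	/* nothing to undo yet */

	if (!hw_supported) {
		r = -1;		/* placeholder for -EOPNOTSUPP */
		goto out;
	}
	return 0;

out:
	sketch_mmu_exit();
out_fail:
	return r;
}
```

With this layout a failed init leaves no MMU state behind, which is why the plain "return -EOPNOTSUPP" statements had to become "goto out".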


[PATCH 12/52] KVM: Portability: Move some macro definitions from kvm.h to x86.h

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |   33 -
 drivers/kvm/x86.h |   33 +
 2 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index c1aa84f..aceecf4 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -20,24 +20,6 @@
 #include 
 #include 
 
-#define CR3_PAE_RESERVED_BITS ((X86_CR3_PWT | X86_CR3_PCD) - 1)
-#define CR3_NONPAE_RESERVED_BITS ((PAGE_SIZE-1) & ~(X86_CR3_PWT | X86_CR3_PCD))
-#define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS|0xFFFFFF0000000000ULL)
-
-#define KVM_GUEST_CR0_MASK \
-   (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE \
-| X86_CR0_NW | X86_CR0_CD)
-#define KVM_VM_CR0_ALWAYS_ON \
-   (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE | X86_CR0_TS \
-| X86_CR0_MP)
-#define KVM_GUEST_CR4_MASK \
-   (X86_CR4_VME | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_PGE | X86_CR4_VMXE)
-#define KVM_PMODE_VM_CR4_ALWAYS_ON (X86_CR4_PAE | X86_CR4_VMXE)
-#define KVM_RMODE_VM_CR4_ALWAYS_ON (X86_CR4_VME | X86_CR4_PAE | X86_CR4_VMXE)
-
-#define INVALID_PAGE (~(hpa_t)0)
-#define UNMAPPED_GVA (~(gpa_t)0)
-
 #define KVM_MAX_VCPUS 4
 #define KVM_ALIAS_SLOTS 4
 #define KVM_MEMORY_SLOTS 8
@@ -50,21 +32,6 @@
 #define KVM_REFILL_PAGES 25
 #define KVM_MAX_CPUID_ENTRIES 40
 
-#define DE_VECTOR 0
-#define UD_VECTOR 6
-#define NM_VECTOR 7
-#define DF_VECTOR 8
-#define TS_VECTOR 10
-#define NP_VECTOR 11
-#define SS_VECTOR 12
-#define GP_VECTOR 13
-#define PF_VECTOR 14
-
-#define SELECTOR_TI_MASK (1 << 2)
-#define SELECTOR_RPL_MASK 0x03
-
-#define IOPL_SHIFT 12
-
 #define KVM_PIO_PAGE_OFFSET 1
 
 /*
diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h
index 4df0641..ec1d669 100644
--- a/drivers/kvm/x86.h
+++ b/drivers/kvm/x86.h
@@ -19,6 +19,39 @@
 #include 
 #include 
 
+#define CR3_PAE_RESERVED_BITS ((X86_CR3_PWT | X86_CR3_PCD) - 1)
+#define CR3_NONPAE_RESERVED_BITS ((PAGE_SIZE-1) & ~(X86_CR3_PWT | X86_CR3_PCD))
+#define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS|0xFFFFFF0000000000ULL)
+
+#define KVM_GUEST_CR0_MASK \
+   (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE \
+| X86_CR0_NW | X86_CR0_CD)
+#define KVM_VM_CR0_ALWAYS_ON \
+   (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE | X86_CR0_TS \
+| X86_CR0_MP)
+#define KVM_GUEST_CR4_MASK \
+   (X86_CR4_VME | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_PGE | X86_CR4_VMXE)
+#define KVM_PMODE_VM_CR4_ALWAYS_ON (X86_CR4_PAE | X86_CR4_VMXE)
+#define KVM_RMODE_VM_CR4_ALWAYS_ON (X86_CR4_VME | X86_CR4_PAE | X86_CR4_VMXE)
+
+#define INVALID_PAGE (~(hpa_t)0)
+#define UNMAPPED_GVA (~(gpa_t)0)
+
+#define DE_VECTOR 0
+#define UD_VECTOR 6
+#define NM_VECTOR 7
+#define DF_VECTOR 8
+#define TS_VECTOR 10
+#define NP_VECTOR 11
+#define SS_VECTOR 12
+#define GP_VECTOR 13
+#define PF_VECTOR 14
+
+#define SELECTOR_TI_MASK (1 << 2)
+#define SELECTOR_RPL_MASK 0x03
+
+#define IOPL_SHIFT 12
+
 extern spinlock_t kvm_lock;
 extern struct list_head vm_list;
 
-- 
1.5.3.7


[PATCH 07/52] KVM: Extend stats support for VM stats

2007-12-29 Thread Avi Kivity

This is in addition to the current virtual cpu statistics.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |   14 --
 drivers/kvm/kvm_main.c |   26 +++---
 drivers/kvm/x86.c  |   39 ---
 3 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 5a8a9af..d3171f9 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -231,7 +231,7 @@ struct kvm_pio_request {
int rep;
 };
 
-struct kvm_stat {
+struct kvm_vcpu_stat {
u32 pf_fixed;
u32 pf_guest;
u32 tlb_flush;
@@ -342,7 +342,7 @@ void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
wait_queue_head_t wq;   \
int sigset_active;  \
sigset_t sigset;\
-   struct kvm_stat stat;   \
+   struct kvm_vcpu_stat stat;  \
KVM_VCPU_MMIO
 
 struct kvm_mem_alias {
@@ -361,6 +361,9 @@ struct kvm_memory_slot {
int user_alloc;
 };
 
+struct kvm_vm_stat {
+};
+
 struct kvm {
struct mutex lock; /* protects everything except vcpus */
int naliases;
@@ -387,6 +390,7 @@ struct kvm {
int round_robin_prev_vcpu;
unsigned int tss_addr;
struct page *apic_access_page;
+   struct kvm_vm_stat stat;
 };
 
 static inline struct kvm_pic *pic_irqchip(struct kvm *kvm)
@@ -809,9 +813,15 @@ static inline u32 get_rdx_init_val(void)
 #define TSS_REDIRECTION_SIZE (256 / 8)
 #define RMODE_TSS_SIZE (TSS_BASE_SIZE + TSS_REDIRECTION_SIZE + TSS_IOPB_SIZE + 1)
 
+enum kvm_stat_kind {
+   KVM_STAT_VM,
+   KVM_STAT_VCPU,
+};
+
 struct kvm_stats_debugfs_item {
const char *name;
int offset;
+   enum kvm_stat_kind kind;
struct dentry *dentry;
 };
 extern struct kvm_stats_debugfs_item debugfs_entries[];
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 3aa34de..1c4e950 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1281,7 +1281,22 @@ static struct notifier_block kvm_cpu_notifier = {
.priority = 20, /* must be > scheduler priority */
 };
 
-static u64 stat_get(void *_offset)
+static u64 vm_stat_get(void *_offset)
+{
+   unsigned offset = (long)_offset;
+   u64 total = 0;
+   struct kvm *kvm;
+
+   spin_lock(&kvm_lock);
+   list_for_each_entry(kvm, &vm_list, vm_list)
+   total += *(u32 *)((void *)kvm + offset);
+   spin_unlock(&kvm_lock);
+   return total;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(vm_stat_fops, vm_stat_get, NULL, "%llu\n");
+
+static u64 vcpu_stat_get(void *_offset)
 {
unsigned offset = (long)_offset;
u64 total = 0;
@@ -1300,7 +1315,12 @@ static u64 stat_get(void *_offset)
return total;
 }
 
-DEFINE_SIMPLE_ATTRIBUTE(stat_fops, stat_get, NULL, "%llu\n");
+DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, NULL, "%llu\n");
+
+static struct file_operations *stat_fops[] = {
+   [KVM_STAT_VCPU] = &vcpu_stat_fops,
+   [KVM_STAT_VM]   = &vm_stat_fops,
+};
 
 static void kvm_init_debug(void)
 {
@@ -1310,7 +1330,7 @@ static void kvm_init_debug(void)
for (p = debugfs_entries; p->name; ++p)
p->dentry = debugfs_create_file(p->name, 0444, debugfs_dir,
(void *)(long)p->offset,
-   &stat_fops);
+   stat_fops[p->kind]);
 }
 
 static void kvm_exit_debug(void)
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index a46b95b..016abc3 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -42,29 +42,30 @@
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 #define EFER_RESERVED_BITS 0xfffffffffffff2fe
 
-#define STAT_OFFSET(x) offsetof(struct kvm_vcpu, stat.x)
+#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
+#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
 
 struct kvm_x86_ops *kvm_x86_ops;
 
 struct kvm_stats_debugfs_item debugfs_entries[] = {
-   { "pf_fixed", STAT_OFFSET(pf_fixed) },
-   { "pf_guest", STAT_OFFSET(pf_guest) },
-   { "tlb_flush", STAT_OFFSET(tlb_flush) },
-   { "invlpg", STAT_OFFSET(invlpg) },
-   { "exits", STAT_OFFSET(exits) },
-   { "io_exits", STAT_OFFSET(io_exits) },
-   { "mmio_exits", STAT_OFFSET(mmio_exits) },
-   { "signal_exits", STAT_OFFSET(signal_exits) },
-   { "irq_window", STAT_OFFSET(irq_window_exits) },
-   { "halt_exits", STAT_OFFSET(halt_exits) },
-   { "halt_wakeup", STAT_OFFSET(halt_wakeup) },
-   { "request_irq", STAT_OFFSET(request_irq_exits) },
-   { "irq_exits", STAT_OFFSET(irq_exits) },
-   { "host_state_reload", STAT_OFFSET(host_state_reload) },
-   { "efer_reload", STAT_OFFSET(efer_reload) },
-   { "fpu_reload", STAT_OFFSET(fpu_reload) },
-   { "insn_emulation",

[PATCH 09/52] KVM: Make unloading of FPU state when putting vcpu arch-independent

2007-12-29 Thread Avi Kivity

From: Amit Shah <[EMAIL PROTECTED]>

Instead of having each architecture do it individually, we
do this in the arch-independent code (just x86 as of now).

[avi: add svm to the mix, which was added to mainline during the
 2.6.24-rc process]

Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |1 -
 drivers/kvm/vmx.c |1 -
 drivers/kvm/x86.c |1 +
 3 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 94c51a0..928fb35 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -659,7 +659,6 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 
rdtscll(vcpu->host_tsc);
-   kvm_put_guest_fpu(vcpu);
 }
 
 static void svm_vcpu_decache(struct kvm_vcpu *vcpu)
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 4e60cf9..c23f399 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -541,7 +541,6 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
vmx_load_host_state(to_vmx(vcpu));
-   kvm_put_guest_fpu(vcpu);
 }
 
 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index fdc7632..9618fcb 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -678,6 +678,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
kvm_x86_ops->vcpu_put(vcpu);
+   kvm_put_guest_fpu(vcpu);
 }
 
 static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu)
-- 
1.5.3.7
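
The refactoring here moves a step that every backend repeated into the common wrapper that calls the backend hook. A small userspace sketch of that pattern, with made-up sketch_ types standing in for kvm_vcpu and kvm_x86_ops:

```c
#include <assert.h>

/* Stand-in for a vcpu; the counters only record what was called. */
struct sketch_vcpu {
	int backend_puts;
	int fpu_puts;
};

/* Stand-in for the per-backend ops table (VMX or SVM in the patch). */
struct sketch_ops {
	void (*vcpu_put)(struct sketch_vcpu *vcpu);
};

static void sketch_backend_put(struct sketch_vcpu *vcpu)
{
	vcpu->backend_puts++;	/* backend-specific work only */
}

static void sketch_put_guest_fpu(struct sketch_vcpu *vcpu)
{
	vcpu->fpu_puts++;	/* the step shared by all backends */
}

static const struct sketch_ops sketch_backend_ops = {
	.vcpu_put = sketch_backend_put,
};

/* Common wrapper: run the backend hook, then the shared step once. */
static void sketch_arch_vcpu_put(const struct sketch_ops *ops,
				 struct sketch_vcpu *vcpu)
{
	ops->vcpu_put(vcpu);
	sketch_put_guest_fpu(vcpu);
}
```

A new backend written against this wrapper gets the FPU unload without having to remember to call it.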


[PATCH 05/52] KVM: Add fpu_reload counter

2007-12-29 Thread Avi Kivity

Measure the number of times we switch the fpu state.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |1 +
 drivers/kvm/x86.c |2 ++
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 04efe88..a85c590 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -248,6 +248,7 @@ struct kvm_stat {
u32 irq_exits;
u32 host_state_reload;
u32 efer_reload;
+   u32 fpu_reload;
 };
 
 struct kvm_io_device {
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index 923dfd4..c1211e1 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -62,6 +62,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "irq_exits", STAT_OFFSET(irq_exits) },
{ "host_state_reload", STAT_OFFSET(host_state_reload) },
{ "efer_reload", STAT_OFFSET(efer_reload) },
+   { "fpu_reload", STAT_OFFSET(fpu_reload) },
{ NULL }
 };
 
@@ -2417,6 +2418,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
vcpu->guest_fpu_loaded = 0;
fx_save(&vcpu->guest_fx_image);
fx_restore(&vcpu->host_fx_image);
+   ++vcpu->stat.fpu_reload;
 }
 EXPORT_SYMBOL_GPL(kvm_put_guest_fpu);
 
-- 
1.5.3.7


[PATCH 08/52] KVM: MMU: Add some mmu statistics

2007-12-29 Thread Avi Kivity

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |6 ++
 drivers/kvm/mmu.c |9 -
 drivers/kvm/x86.c |6 ++
 3 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index d3171f9..bdcc44e 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -362,6 +362,12 @@ struct kvm_memory_slot {
 };
 
 struct kvm_vm_stat {
+   u32 mmu_shadow_zapped;
+   u32 mmu_pte_write;
+   u32 mmu_pte_updated;
+   u32 mmu_pde_zapped;
+   u32 mmu_flooded;
+   u32 mmu_recycled;
 };
 
 struct kvm {
diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 9be54a5..87d8e70 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -755,6 +755,7 @@ static void kvm_mmu_zap_page(struct kvm *kvm,
 {
u64 *parent_pte;
 
+   ++kvm->stat.mmu_shadow_zapped;
while (page->multimapped || page->parent_pte) {
if (!page->multimapped)
parent_pte = page->parent_pte;
@@ -1226,9 +1227,12 @@ static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
  const void *new, int bytes,
  int offset_in_pte)
 {
-   if (page->role.level != PT_PAGE_TABLE_LEVEL)
+   if (page->role.level != PT_PAGE_TABLE_LEVEL) {
+   ++vcpu->kvm->stat.mmu_pde_zapped;
return;
+   }
 
+   ++vcpu->kvm->stat.mmu_pte_updated;
if (page->role.glevels == PT32_ROOT_LEVEL)
paging32_update_pte(vcpu, page, spte, new, bytes,
offset_in_pte);
@@ -1263,6 +1267,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
int npte;
 
pgprintk("%s: gpa %llx bytes %d\n", __FUNCTION__, gpa, bytes);
+   ++vcpu->kvm->stat.mmu_pte_write;
kvm_mmu_audit(vcpu, "pre pte write");
if (gfn == vcpu->last_pt_write_gfn
&& !last_updated_pte_accessed(vcpu)) {
@@ -1296,6 +1301,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
pgprintk("misaligned: gpa %llx bytes %d role %x\n",
 gpa, bytes, page->role.word);
kvm_mmu_zap_page(vcpu->kvm, page);
+   ++vcpu->kvm->stat.mmu_flooded;
continue;
}
page_offset = offset;
@@ -1344,6 +1350,7 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
page = container_of(vcpu->kvm->active_mmu_pages.prev,
struct kvm_mmu_page, link);
kvm_mmu_zap_page(vcpu->kvm, page);
+   ++vcpu->kvm->stat.mmu_recycled;
}
 }
 
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index 016abc3..fdc7632 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -66,6 +66,12 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "fpu_reload", VCPU_STAT(fpu_reload) },
{ "insn_emulation", VCPU_STAT(insn_emulation) },
{ "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) },
+   { "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) },
+   { "mmu_pte_write", VM_STAT(mmu_pte_write) },
+   { "mmu_pte_updated", VM_STAT(mmu_pte_updated) },
+   { "mmu_pde_zapped", VM_STAT(mmu_pde_zapped) },
+   { "mmu_flooded", VM_STAT(mmu_flooded) },
+   { "mmu_recycled", VM_STAT(mmu_recycled) },
{ NULL }
 };
 
-- 
1.5.3.7
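
The debugfs table extended above describes each counter by a name, a byte offset into its owning structure, and a kind that says which structure the offset applies to. The following userspace sketch shows that layout and a lookup over it. The sketch_ structures, macros and helper functions are illustrative assumptions, not the kernel definitions, and the sketch reads a single object rather than summing over all VMs or vcpus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sketch_stat_kind { SKETCH_STAT_VM, SKETCH_STAT_VCPU };

struct sketch_vm {
	struct { uint32_t mmu_flooded; uint32_t mmu_recycled; } stat;
};

struct sketch_vcpu {
	struct { uint32_t exits; } stat;
};

struct sketch_stat_item {
	const char *name;
	size_t offset;			/* byte offset of the counter */
	enum sketch_stat_kind kind;	/* which structure it lives in */
};

/* Each macro expands to two initializers: the offset and the kind. */
#define SKETCH_VM_STAT(x)   offsetof(struct sketch_vm, stat.x), SKETCH_STAT_VM
#define SKETCH_VCPU_STAT(x) offsetof(struct sketch_vcpu, stat.x), SKETCH_STAT_VCPU

static const struct sketch_stat_item sketch_entries[] = {
	{ "exits", SKETCH_VCPU_STAT(exits) },
	{ "mmu_flooded", SKETCH_VM_STAT(mmu_flooded) },
	{ "mmu_recycled", SKETCH_VM_STAT(mmu_recycled) },
	{ NULL, 0, SKETCH_STAT_VM }	/* terminator, as in the patch */
};

/* Read one u32 counter at a byte offset inside obj. */
static uint32_t sketch_read_stat(const void *obj, size_t offset)
{
	uint32_t v;

	memcpy(&v, (const char *)obj + offset, sizeof(v));
	return v;
}

/* Find a table entry by name, or NULL if there is none. */
static const struct sketch_stat_item *sketch_find(const char *name)
{
	const struct sketch_stat_item *p;

	for (p = sketch_entries; p->name; ++p)
		if (strcmp(p->name, name) == 0)
			return p;
	return NULL;
}
```

Carrying the kind next to the offset is what lets one table mix per-VM and per-vcpu counters while the reader picks the right accessor for each entry.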


[PATCH 04/52] KVM: Replace 'light_exits' stat with 'host_state_reload'

2007-12-29 Thread Avi Kivity

This is a little more accurate (since it counts actual reloads, not potential
reloads), and reverses the sense of the statistic to measure a bad event like
most of the other stats (e.g. we want to minimize all counters).

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |2 +-
 drivers/kvm/svm.c |1 +
 drivers/kvm/vmx.c |1 +
 drivers/kvm/x86.c |6 ++
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 59e001c..04efe88 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -246,7 +246,7 @@ struct kvm_stat {
u32 halt_wakeup;
u32 request_irq_exits;
u32 irq_exits;
-   u32 light_exits;
+   u32 host_state_reload;
u32 efer_reload;
 };
 
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 762302a..94c51a0 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -654,6 +654,7 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
int i;
 
+   ++vcpu->stat.host_state_reload;
for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 30220ea..4e60cf9 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -463,6 +463,7 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
if (!vmx->host_state.loaded)
return;
 
+   ++vmx->vcpu.stat.host_state_reload;
vmx->host_state.loaded = 0;
if (vmx->host_state.fs_reload_needed)
load_fs(vmx->host_state.fs_sel);
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index b7c72ac..923dfd4 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -60,7 +60,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "halt_wakeup", STAT_OFFSET(halt_wakeup) },
{ "request_irq", STAT_OFFSET(request_irq_exits) },
{ "irq_exits", STAT_OFFSET(irq_exits) },
-   { "light_exits", STAT_OFFSET(light_exits) },
+   { "host_state_reload", STAT_OFFSET(host_state_reload) },
{ "efer_reload", STAT_OFFSET(efer_reload) },
{ NULL }
 };
@@ -1988,10 +1988,8 @@ again:
++vcpu->stat.request_irq_exits;
goto out;
}
-   if (!need_resched()) {
-   ++vcpu->stat.light_exits;
+   if (!need_resched())
goto again;
-   }
}
 
 out:
-- 
1.5.3.7
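
The distinction drawn in the description, counting actual reloads rather than potential ones, comes down to incrementing the counter only after the "is there anything to reload" check has passed. A minimal userspace sketch, loosely modelled on the shape of vmx_load_host_state() in the diff; the sketch_ names are hypothetical:

```c
#include <assert.h>

struct sketch_vcpu {
	int host_state_loaded;		/* 1 while guest state is live */
	unsigned int host_state_reload;	/* number of real reloads */
};

/* Entering guest context marks the host state as needing a reload. */
static void sketch_save_host_state(struct sketch_vcpu *vcpu)
{
	vcpu->host_state_loaded = 1;
}

/* Count only when a reload really happens. */
static void sketch_load_host_state(struct sketch_vcpu *vcpu)
{
	if (!vcpu->host_state_loaded)
		return;			/* nothing to reload: not counted */

	++vcpu->host_state_reload;
	vcpu->host_state_loaded = 0;
}
```

Calling the load function twice in a row bumps the counter once, so the statistic tracks the expensive event itself.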


[PATCH 06/52] KVM: Add instruction emulation statistics

2007-12-29 Thread Avi Kivity

---
 drivers/kvm/kvm.h |2 ++
 drivers/kvm/x86.c |4 
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index a85c590..5a8a9af 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -249,6 +249,8 @@ struct kvm_stat {
u32 host_state_reload;
u32 efer_reload;
u32 fpu_reload;
+   u32 insn_emulation;
+   u32 insn_emulation_fail;
 };
 
 struct kvm_io_device {
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index c1211e1..a46b95b 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -63,6 +63,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "host_state_reload", STAT_OFFSET(host_state_reload) },
{ "efer_reload", STAT_OFFSET(efer_reload) },
{ "fpu_reload", STAT_OFFSET(fpu_reload) },
+   { "insn_emulation", STAT_OFFSET(insn_emulation) },
+   { "insn_emulation_fail", STAT_OFFSET(insn_emulation_fail) },
{ NULL }
 };
 
@@ -1381,7 +1383,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
get_segment_base(vcpu, VCPU_SREG_FS);
 
r = x86_decode_insn(&vcpu->emulate_ctxt, &emulate_ops);
+   ++vcpu->stat.insn_emulation;
if (r)  {
+   ++vcpu->stat.insn_emulation_fail;
if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
return EMULATE_DONE;
return EMULATE_FAIL;
-- 
1.5.3.7


[PATCH 03/52] KVM: Portability: Add two hooks to handle kvm_create and destroy vm

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Add two arch hooks to handle kvm_create_vm and kvm_destroy_vm. For now, just
keep io_bus init and destroy in common code.

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |4 
 drivers/kvm/kvm_main.c |   42 ++
 drivers/kvm/x86.c  |   47 +++
 3 files changed, 57 insertions(+), 36 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index c4ad66b..59e001c 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -674,6 +674,10 @@ int kvm_arch_hardware_setup(void);
 void kvm_arch_hardware_unsetup(void);
 void kvm_arch_check_processor_compat(void *rtn);
 
+void kvm_free_physmem(struct kvm *kvm);
+
+struct  kvm *kvm_arch_create_vm(void);
+void kvm_arch_destroy_vm(struct kvm *kvm);
 
 static inline void kvm_guest_enter(void)
 {
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index acd26cf..3aa34de 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -156,18 +156,18 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_uninit);
 
 static struct kvm *kvm_create_vm(void)
 {
-   struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
+   struct kvm *kvm = kvm_arch_create_vm();
 
-   if (!kvm)
-   return ERR_PTR(-ENOMEM);
+   if (IS_ERR(kvm))
+   goto out;
 
kvm_io_bus_init(&kvm->pio_bus);
mutex_init(&kvm->lock);
-   INIT_LIST_HEAD(&kvm->active_mmu_pages);
kvm_io_bus_init(&kvm->mmio_bus);
spin_lock(&kvm_lock);
list_add(&kvm->vm_list, &vm_list);
spin_unlock(&kvm_lock);
+out:
return kvm;
 }
 
@@ -188,7 +188,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
free->rmap = NULL;
 }
 
-static void kvm_free_physmem(struct kvm *kvm)
+void kvm_free_physmem(struct kvm *kvm)
 {
int i;
 
@@ -196,32 +196,6 @@ static void kvm_free_physmem(struct kvm *kvm)
kvm_free_physmem_slot(&kvm->memslots[i], NULL);
 }
 
-static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
-{
-   vcpu_load(vcpu);
-   kvm_mmu_unload(vcpu);
-   vcpu_put(vcpu);
-}
-
-static void kvm_free_vcpus(struct kvm *kvm)
-{
-   unsigned int i;
-
-   /*
-* Unpin any mmu pages first.
-*/
-   for (i = 0; i < KVM_MAX_VCPUS; ++i)
-   if (kvm->vcpus[i])
-   kvm_unload_vcpu_mmu(kvm->vcpus[i]);
-   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-   if (kvm->vcpus[i]) {
-   kvm_arch_vcpu_free(kvm->vcpus[i]);
-   kvm->vcpus[i] = NULL;
-   }
-   }
-
-}
-
 static void kvm_destroy_vm(struct kvm *kvm)
 {
spin_lock(&kvm_lock);
@@ -229,11 +203,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
spin_unlock(&kvm_lock);
kvm_io_bus_destroy(&kvm->pio_bus);
kvm_io_bus_destroy(&kvm->mmio_bus);
-   kfree(kvm->vpic);
-   kfree(kvm->vioapic);
-   kvm_free_vcpus(kvm);
-   kvm_free_physmem(kvm);
-   kfree(kvm);
+   kvm_arch_destroy_vm(kvm);
 }
 
 static int kvm_vm_release(struct inode *inode, struct file *filp)
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index abb7bee..b7c72ac 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -2543,3 +2543,50 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
kvm_mmu_destroy(vcpu);
free_page((unsigned long)vcpu->pio_data);
 }
+
+struct  kvm *kvm_arch_create_vm(void)
+{
+   struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
+
+   if (!kvm)
+   return ERR_PTR(-ENOMEM);
+
+   INIT_LIST_HEAD(&kvm->active_mmu_pages);
+
+   return kvm;
+}
+
+static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
+{
+   vcpu_load(vcpu);
+   kvm_mmu_unload(vcpu);
+   vcpu_put(vcpu);
+}
+
+static void kvm_free_vcpus(struct kvm *kvm)
+{
+   unsigned int i;
+
+   /*
+* Unpin any mmu pages first.
+*/
+   for (i = 0; i < KVM_MAX_VCPUS; ++i)
+   if (kvm->vcpus[i])
+   kvm_unload_vcpu_mmu(kvm->vcpus[i]);
+   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+   if (kvm->vcpus[i]) {
+   kvm_arch_vcpu_free(kvm->vcpus[i]);
+   kvm->vcpus[i] = NULL;
+   }
+   }
+
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+   kfree(kvm->vpic);
+   kfree(kvm->vioapic);
+   kvm_free_vcpus(kvm);
+   kvm_free_physmem(kvm);
+   kfree(kvm);
+}
-- 
1.5.3.7
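
The split introduced by this patch can be pictured as two layers: an arch hook that allocates the VM object and sets up arch-private state, and a generic layer that adds the common pieces on top and hands teardown back to the arch hook. The userspace sketch below shows that division under simplifying assumptions: the sketch_ names are invented, and failure is reported with NULL instead of the kernel's ERR_PTR() convention.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_vm {
	int arch_ready;	/* set by the arch layer */
	int bus_ready;	/* set by the generic layer */
};

/* Arch hook: allocate the object and initialise arch-only state. */
static struct sketch_vm *sketch_arch_create_vm(void)
{
	struct sketch_vm *vm = calloc(1, sizeof(*vm));

	if (!vm)
		return NULL;	/* the kernel code returns ERR_PTR(-ENOMEM) */
	vm->arch_ready = 1;
	return vm;
}

/* Arch hook: release arch state and the object itself. */
static void sketch_arch_destroy_vm(struct sketch_vm *vm)
{
	free(vm);
}

/* Generic layer: only the common setup lives here. */
static struct sketch_vm *sketch_create_vm(void)
{
	struct sketch_vm *vm = sketch_arch_create_vm();

	if (!vm)
		return NULL;
	vm->bus_ready = 1;
	return vm;
}

/* Generic teardown first, then hand the object back to the arch hook. */
static void sketch_destroy_vm(struct sketch_vm *vm)
{
	vm->bus_ready = 0;
	sketch_arch_destroy_vm(vm);
}
```

Because the arch hook owns the allocation, a port to another architecture could embed extra state in its VM object without touching the generic code.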


[PATCH 02/52] KVM: Remove __init attributes for kvm_init_debug and kvm_init_msr_list

2007-12-29 Thread Avi Kivity

From: Zhang Xiantao <[EMAIL PROTECTED]>

Since their callers are not declared with __init.

Signed-off-by: Zhang Xiantao <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |2 +-
 drivers/kvm/x86.c  |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 7a871e0..acd26cf 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1332,7 +1332,7 @@ static u64 stat_get(void *_offset)
 
 DEFINE_SIMPLE_ATTRIBUTE(stat_fops, stat_get, NULL, "%llu\n");
 
-static __init void kvm_init_debug(void)
+static void kvm_init_debug(void)
 {
struct kvm_stats_debugfs_item *p;
 
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index f1746af..abb7bee 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -1049,7 +1049,7 @@ out:
return r;
 }
 
-static __init void kvm_init_msr_list(void)
+static void kvm_init_msr_list(void)
 {
u32 dummy[2];
unsigned i, j;
-- 
1.5.3.7


[PATCH 00/52] KVM patch queue review for 2.6.25 merge window (part III)

2007-12-29 Thread Avi Kivity

The third installment of the 2.6.25 kvm patch queue, for your reviewing
pleasure.  This time, a diffstat of the files affected by these 53 patches is
appended.

 drivers/kvm/kvm.h |  384 ++---
 drivers/kvm/kvm_main.c|  192 +++---
 drivers/kvm/mmu.c |  113 +++---
 drivers/kvm/paging_tmpl.h |  156 +++
 drivers/kvm/svm.c |4 +-
 drivers/kvm/vmx.c |5 +-
 drivers/kvm/x86.c |  261 +--
 drivers/kvm/x86.h |  331 ++
 drivers/kvm/x86_emulate.c |   38 +-
 drivers/kvm/x86_emulate.h |   18 +--
 include/asm-x86/Kbuild|1 +
 include/asm-x86/kvm.h |  155 ++
 include/linux/kvm.h   |  137 +
 13 files changed, 1003 insertions(+), 792 deletions(-)

[PATCH 01/52] KVM: Remove ptr comparisons to 0

2007-12-29 Thread Avi Kivity

From: Joe Perches <[EMAIL PROTECTED]>

Fix sparse warnings "Using plain integer as NULL pointer"

Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |2 +-
 drivers/kvm/kvm_main.c |3 ++-
 drivers/kvm/svm.c  |2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index e34e246..c4ad66b 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -398,7 +398,7 @@ static inline struct kvm_ioapic *ioapic_irqchip(struct kvm *kvm)
 
 static inline int irqchip_in_kernel(struct kvm *kvm)
 {
-   return pic_irqchip(kvm) != 0;
+   return pic_irqchip(kvm) != NULL;
 }
 
 struct descriptor_table {
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index bce4216..7a871e0 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1449,7 +1449,8 @@ int kvm_init(void *opaque, unsigned int vcpu_size,
 
 	/* A kmem cache lets us meet the alignment requirements of fx_save. */
 	kvm_vcpu_cache = kmem_cache_create("kvm_vcpu", vcpu_size,
-					   __alignof__(struct kvm_vcpu), 0, 0);
+					   __alignof__(struct kvm_vcpu),
+					   0, NULL);
 	if (!kvm_vcpu_cache) {
 		r = -ENOMEM;
 		goto out_free_4;
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 0f0958d..762302a 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1271,7 +1271,7 @@ static int handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		       exit_code);
 
 	if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
-	    || svm_exit_handlers[exit_code] == 0) {
+	    || !svm_exit_handlers[exit_code]) {
 		kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
 		kvm_run->hw.hardware_exit_reason = exit_code;
 		return 0;
-- 
1.5.3.7
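
For readers unfamiliar with the sparse warning, here is a minimal userspace
sketch of the idiom the patch switches to.  The handler table and function
names are invented for illustration and are not KVM code; the only point is
that a pointer is tested with NULL or !ptr rather than with the integer 0.

```c
#include <stddef.h>

/* Hypothetical stand-in for a table of exit handlers. */
typedef int (*handler_fn)(int);

static int handle_ok(int code)
{
	return code + 1;
}

/* Some slots are deliberately left empty (NULL). */
static handler_fn handlers[4] = { handle_ok, NULL, handle_ok, NULL };

/* Returns 1 if a handler exists for code, 0 otherwise. */
static int has_handler(size_t code)
{
	if (code >= sizeof(handlers) / sizeof(handlers[0])
	    || !handlers[code])		/* rather than "== 0" */
		return 0;
	return 1;
}
```

Both spellings should compile to the same test; the difference is only in
what a static checker can conclude about the programmer's intent.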


[PATCH] include/asm-alpha/core_cia.h, kernel 2.6.23.12

2007-12-29 Thread Anders Hammarquist

[please Cc: me on followups]

Trying to compile 2.6.23.12 on alpha (a miata) resulted in this
failure:

cc1: warnings being treated as errors
include/asm/io_trivial.h: In function 'cia_readb':
include/asm/io_trivial.h:75: warning: passing argument 1 of 'cia_ioread8' discards qualifiers from pointer target type

This trivial patch to include/asm-alpha/core_cia.h fixed it:

diff -ur linux-2.6.23.12/include/asm-alpha/core_cia.h ../src/linux-2.6.23.12/include/asm-alpha/core_cia.h
--- linux-2.6.23.12/include/asm-alpha/core_cia.h	2007-12-18 22:55:57.0 +0100
+++ ../src/linux-2.6.23.12/include/asm-alpha/core_cia.h	2007-12-30 04:52:28.956657441 +0100
@@ -341,7 +341,7 @@
 #define vuip	volatile unsigned int __force *
 #define vulp	volatile unsigned long __force *
 
-__EXTERN_INLINE unsigned int cia_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread8(const volatile void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -358,7 +358,7 @@
 	return __kernel_extbl(result, addr & 3);
 }
 
-__EXTERN_INLINE void cia_iowrite8(u8 b, void __iomem *xaddr)
+__EXTERN_INLINE void cia_iowrite8(u8 b, volatile void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long w, base_and_type;
@@ -373,7 +373,7 @@
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread16(const volatile void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -388,7 +388,7 @@
 	return __kernel_extwl(result, addr & 3);
 }
 
-__EXTERN_INLINE void cia_iowrite16(u16 b, void __iomem *xaddr)
+__EXTERN_INLINE void cia_iowrite16(u16 b, volatile void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long w, base_and_type;
@@ -403,7 +403,7 @@
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread32(const volatile void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	if (addr < CIA_DENSE_MEM)
@@ -411,7 +411,7 @@
 	return *(vuip)addr;
 }
 
-__EXTERN_INLINE void cia_iowrite32(u32 b, volatile void __iomem *xaddr)
+__EXTERN_INLINE void cia_iowrite32(u32 b, volatile void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	if (addr < CIA_DENSE_MEM)
Signed-off-by: Anders Hammarquist <[EMAIL PROTECTED]>

-- 
 -- Of course I'm crazy, but that doesn't mean I'm wrong.
Anders Hammarquist  | [EMAIL PROTECTED]
Physics student, Chalmers University of Technology, | Hem: +46 31 88 48 50
Göteborg, Sweden.   RADIO: SM6XMM and N2JGL | Mob: +46 707 27 86 87
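
The warning in this report comes from a caller that holds a const volatile
pointer and passes it to a function whose parameter lacks those qualifiers.
A minimal userspace sketch of the same situation follows; demo_read8() and
demo_readb() are made-up names standing in for cia_ioread8() and cia_readb(),
and plain C pointers stand in for __iomem ones.

```c
#include <stdint.h>

/*
 * Hypothetical reader.  Because the parameter is const volatile, it
 * accepts plain, const, volatile and const volatile pointers alike.
 */
static unsigned int demo_read8(const volatile void *xaddr)
{
	const volatile uint8_t *p = xaddr;

	return *p;
}

/* Caller that only has a const volatile pointer, like cia_readb(). */
static unsigned int demo_readb(const volatile void *addr)
{
	/*
	 * If demo_read8() took a plain "void *", this call would draw
	 * "discards qualifiers from pointer target type".
	 */
	return demo_read8(addr);
}
```

Adding qualifiers to a parameter is always safe for existing callers, which
is presumably why the fix widens the callee instead of casting in the caller.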

Re: [PATCH 01/12] Use mutex instead of semaphore in driver core

2007-12-29 Thread David Brownell

On Saturday 29 December 2007, Alan Stern wrote:
> lockdep warns whenever a task acquires a mutex while holding another
> mutex of the same kind (that is, the same member in another structure
> of the same type).  But there are lots of places where the kernel needs
> to acquire dev->sem for one device while already holding
> dev->parent->sem.

Not just devices.  I've seen the same issue with genirq when
enabling or disabling wakeup:  while holding irq_desc[354].lock
it must also acquire the parent IRQ's irq_desc[37].lock so it
can update that parent IRQ's wake flag ... because the wake
signal goes from the child up to the parent up to the logic
that kicks the clock framework and thence the CPU, and software
must enable at least some of those paths by hand.

And lockdep says "[ INFO: possible recursive locking detected ]".
But the analysis is "ignore that one, it's a false alarm".

> There's no way to remove these, which means there's 
> no way to prevent lockdep from issuing a warning.

There may be no *efficient* way to do that.  If it tracked
every lock individually these false alarms could go away;
but that would increase the overhead to create and destroy
such locks too.

Such tradeoffs are what make it Engineering, not Science.  ;)

- Dave
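
To make the tradeoff concrete, here is a toy userspace model of class-based
tracking.  It is not lockdep's implementation; the struct, the fixed-size
held-lock array and the function names are all invented for this sketch.  It
only shows why two distinct locks of the same class look "recursive" to a
checker that tracks classes rather than instances, and how a nesting level
(in the spirit of the kernel's mutex_lock_nested() annotation) tells them
apart.

```c
#include <stddef.h>

#define MAX_HELD 8

/* Toy lock: class_id plays the role of a lockdep lock class. */
struct toy_lock {
	int class_id;	/* shared by every lock of the same kind */
	int subclass;	/* hypothetical nesting-level annotation */
};

static const struct toy_lock *held[MAX_HELD];
static size_t nr_held;

/*
 * Record an acquisition.  Returns 1 if a lock of the same class and
 * subclass is already held, i.e. where a class-based checker would
 * report possible recursive locking; returns 0 otherwise.
 */
static int toy_acquire(const struct toy_lock *lock)
{
	int warn = 0;
	size_t i;

	for (i = 0; i < nr_held; i++)
		if (held[i]->class_id == lock->class_id &&
		    held[i]->subclass == lock->subclass)
			warn = 1;
	if (nr_held < MAX_HELD)
		held[nr_held++] = lock;
	return warn;
}

/* Drop everything, to start a fresh scenario. */
static void toy_release_all(void)
{
	nr_held = 0;
}
```

With a parent and child that share class and subclass, the second acquire is
flagged even though the two locks are different objects; giving the child a
different subclass silences the report without per-instance bookkeeping.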


Re: [PATCH] x86: kprobes remove fix_riprel #ifdef

2007-12-29 Thread Masami Hiramatsu

Hello Harvey,

A similar idea was already NACKed by Ananth:
http://sources.redhat.com/ml/systemtap/2007-q4/msg00468.html
I agree with his reasoning.

In particular, RIP-relative addressing ("riprel") does not exist on x86_32,
so fix_riprel() is meaningless there.
Thus, I think it is better to keep the #ifdef at the call site.

Harvey Harrison wrote:
> Move #ifdef around function definition into the function and
> unconditionally return on X86_32.  Saves an ifdef from the
> one callsite.
> 
> Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
> ---
> Ingo, Masami, final leftovers from some unsent kprobes unification work.
> 
> Net reduction of one #ifdef section.
> 
>  arch/x86/kernel/kprobes.c |   11 +++
>  1 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
> index b1804e4..1ac532e 100644
> --- a/arch/x86/kernel/kprobes.c
> +++ b/arch/x86/kernel/kprobes.c
> @@ -263,15 +263,16 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn)
>   return 0;
>  }
>  
> -#ifdef CONFIG_X86_64
>  /*
>   * Adjust the displacement if the instruction uses the %rip-relative
>   * addressing mode.
>   * If it does, Return the address of the 32-bit displacement word.
>   * If not, return null.
> + * Only applicable to X86_64
>   */
>  static void __kprobes fix_riprel(struct kprobe *p)
>  {
> +#ifdef CONFIG_X86_64
>   u8 *insn = p->ainsn.insn;
>   s64 disp;
>   int need_modrm;
> @@ -335,15 +336,17 @@ static void __kprobes fix_riprel(struct kprobe *p)
>   *(s32 *)insn = (s32) disp;
>   }
>   }
> -}
> +#else
> + return;
>  #endif
> +}
>  
>  static void __kprobes arch_copy_kprobe(struct kprobe *p)
>  {
>   memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
> -#ifdef CONFIG_X86_64
> +
>   fix_riprel(p);
> -#endif
> +
>   if (can_boost(p->addr))
>   p->ainsn.boostable = 0;
>   else

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [EMAIL PROTECTED], [EMAIL PROTECTED]


Re: [PATCH] x86: Introduce REX prefix helper for kprobes

2007-12-29 Thread Masami Hiramatsu

Hi Harvey,

Harvey Harrison wrote:
> Fold some small ifdefs into a helper function.
> 
> Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
> ---
> Masami, Ingo, I had this left in some unsent kprobes unification
> work.  Depends on your tastes, but does reduce ifdefs and is a bit
> better about self-documenting the REX prefix on X86_64.

Basically, I think it is a good idea.
Could you use a macro instead, in the same way as the stack_addr() macro, as below?

#define is_REX_prefix(insn) (((insn) & 0xf0) == 0x40)

This is just a bit check, so I think a macro is a better fit.
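
A quick userspace check, assuming a fully parenthesized spelling of the macro
and uint8_t in place of kprobe_opcode_t: the mask test and the 0x40..0x4f
range test used elsewhere in kprobes.c should agree on every byte value.
is_rex_range() and count_mismatches() are helper names made up for this
sketch.

```c
#include <stdint.h>

/* Mask form: true for any byte whose high nibble is 4 (0x40..0x4f). */
#define is_REX_prefix(insn)	(((insn) & 0xf0) == 0x40)

/* Range form, as in the open-coded checks the patch replaces. */
static int is_rex_range(uint8_t insn)
{
	return insn >= 0x40 && insn <= 0x4f;
}

/* Count the byte values on which the two forms disagree. */
static int count_mismatches(void)
{
	int b, bad = 0;

	for (b = 0; b < 256; b++)
		if (is_REX_prefix((uint8_t)b) != is_rex_range((uint8_t)b))
			bad++;
	return bad;
}
```

Note that the macro takes the opcode byte itself, while the helper in the
patch takes a pointer and dereferences it, and that a bare macro like this
would still need the CONFIG_X86_64 guard, since 0x40..0x4f are ordinary
inc/dec opcodes on 32-bit.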

> If I find places that could also use this I'll try to find a suitable
> header and stick a static inline there instead.  Otherwise static to
> kprobes.c is probably more appropriate for now.
> 
>  arch/x86/kernel/kprobes.c |   27 +++
>  1 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
> index 4e33329..b1804e4 100644
> --- a/arch/x86/kernel/kprobes.c
> +++ b/arch/x86/kernel/kprobes.c
> @@ -171,6 +171,19 @@ static void __kprobes set_jmp_op(void *from, void *to)
>  }
>  
>  /*
> + * Check for the REX prefix which can only exist on X86_64
> + * X86_32 always returns 0
> + */
> +static int __kprobes is_REX_prefix(kprobe_opcode_t *insn)
> +{
> +#ifdef CONFIG_X86_64
> + if ((*insn & 0xf0) == 0x40)
> + return 1;
> +#endif
> + return 0;
> +}
> +
> +/*
>   * Returns non-zero if opcode is boostable.
>   * RIP relative instructions are adjusted at copying time in 64 bits mode
>   */
> @@ -239,14 +252,14 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn)
>   case 0x9d:  /* popf/popfd */
>   return 1;
>   }
> -#ifdef CONFIG_X86_64
> +
>   /*
> -  * on 64 bit x86, 0x40-0x4f are prefixes so we need to look
> +  * on X86_64, 0x40-0x4f are REX prefixes so we need to look
>* at the next byte instead.. but of course not recurse infinitely
>*/
> - if (*insn  >= 0x40 && *insn <= 0x4f)
> + if (is_REX_prefix(insn))
>   return is_IF_modifier(++insn);
> -#endif
> +
>   return 0;
>  }
>  
> @@ -284,7 +297,7 @@ static void __kprobes fix_riprel(struct kprobe *p)
>   }
>  
>   /* Skip REX instruction prefix.  */
> - if ((*insn & 0xf0) == 0x40)
> + if (is_REX_prefix(insn))
>   ++insn;
>  
>   if (*insn == 0x0f) {
> @@ -748,11 +761,9 @@ static void __kprobes resume_execution(struct kprobe *p,
>   unsigned long orig_ip = (unsigned long)p->addr;
>   kprobe_opcode_t *insn = p->ainsn.insn;
>  
> -#ifdef CONFIG_X86_64
>   /*skip the REX prefix*/
> - if (*insn >= 0x40 && *insn <= 0x4f)
> + if (is_REX_prefix(insn))
>   insn++;
> -#endif
>  
>   regs->flags &= ~X86_EFLAGS_TF;
>   switch (*insn) {

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [EMAIL PROTECTED], [EMAIL PROTECTED]



Re: TOMOYO Linux Security Goal

2007-12-29 Thread Valdis . Kletnieks

On Sun, 30 Dec 2007 14:29:50 +0900, Tetsuo Handa said:

> Use of "learning mode" is independent from "correct policy".

My point *exactly*.

> The "learning mode" merely takes over your duty of appending permissions to policy.
> We can develop and share procedures for how to exercise infrequently used code
> paths, like how to confirm that your SMTP service won't relay spam.
> The problem is simply that "developing and sharing procedures for how to
> exercise infrequently used code paths" has not started yet.

And I don't have an issue with the concept that you let a "learning mode"
develop 95% of the policy for you - as long as you make clear that you *do*
need to do some work to ensure all code paths are tested, or otherwise verify
that the generated policy is in fact correct.  For instance, semantic analysis
of the program source can identify what directories a program *should* create
files in - if some directories aren't in fact listed in the policy, they need
to be added.  If the actual learned policy includes directories that shouldn't
have been learned, you've got a *bigger* problem.  And yes, I have worked with
more than one case where a "benchline" measurement of a system happened to
include an actual intrusion.

> By the way, what is the definition of "correct policy"?

> The definition of "correct policy" depends on the user.
> 
> Some users may think that
> 
>   "A ready-made policy is better than a manually-made policy
>even if the ready-made policy contains unused/unneeded permissions.
>Being unable to handle infrequently used code paths is worse than
>leaving a room for not knowing/understanding what can happen."
> 
> but other users may think that
> 
>   "A manually-made policy is better than a ready-made policy
>even if the manually-made policy lacks permissions for infrequently
>used code paths.
>Leaving a room for not knowing/understanding what can happen is worse than
>being unable to handle infrequently used code paths."

Neither.

Policy *correctness* is a measure of how well the policy allows those
events that should properly happen, and rejects those events that should not
happen.  What you discuss here is the relative impact of various *errors*
in the policy - basically, false-negative versus false-positive identification
of policy-violating activity.  Your first example says that it's preferable
to false-negative and fail to flag a violation, your second that it's
preferable to false-positive and flag something that should have been allowed.

In neither case are you actually talking about *correct* policy - which
would *properly* describe the desired behavior.  Note that how the policy
was *created* does not affect the *actual* correctness of the policy - it's
quite possible to create both correct and incorrect policies via both
manual and ready-made methods.
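
The distinction can be written down as set arithmetic.  In this sketch each
bit of a 32-bit word stands for one hypothetical event; the type and function
names are invented here and have nothing to do with any real policy format.
A false negative is something the policy allows that should not happen; a
false positive is something the policy rejects that should have been allowed.

```c
#include <stdint.h>

/* Each bit is one hypothetical event, e.g. "create a file in /tmp". */
typedef uint32_t event_set;

/* Allowed by the policy although it should not happen (missed violation). */
static event_set false_negatives(event_set policy, event_set desired)
{
	return policy & ~desired;
}

/* Rejected by the policy although it should have been allowed. */
static event_set false_positives(event_set policy, event_set desired)
{
	return desired & ~policy;
}

/* A policy is correct only when both error sets are empty. */
static int policy_is_correct(event_set policy, event_set desired)
{
	return policy == desired;
}
```

How the policy bits got set, by learning or by hand, does not appear anywhere
in these functions, which is the point being made above.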

The *true* security question is:  What method minimizes your *total* cost of
both developing the policy and dealing with *both* the false positives and
negatives of a possibly incorrect policy?

> Since there is no globally agreed definition of "correct policy",
> I think we can't say that "learning mode is unlikely to produce a correct policy".

I'm pretty sure that most of the security community agrees on what "correct"
means - the disagreement is in the most cost-effective way to *create* one.

Notice that I never said "learning mode is *unlikely* to produce correct policy".
What I said was "Learning mode *may* produce a correct policy, but there is a
non-zero probability that needed things will remain unlearned unless care is
taken to ensure they are learned".

And I've been in this industry long enough to have *many* teeth marks where
I've been bitten by very small but non-zero probabilities :)




Re: [RFC/PATCH] e100 driver didn't support any MII-less PHYs...

2007-12-29 Thread Kok, Auke

Andreas Mohr wrote:
> Hi all,
> 
> I was mildly annoyed when rebooting my _headless_ internet gateway after a
> hotplug -> udev migration and witnessing it not coming up again,
> which turned out to be due to an eepro100 / e100 loading conflict
> since eepro100 supported both of my Intel-based network cards,
> whereas e100 only supported the "newer" one and entirely failed on ifup...
> (udev had somehow managed to tweak loading sequence as compared to
> a hotplug setup, which caused the drivers to probe differently)
> 
> After investigating this e100 failure for half an hour it was obvious
> that it was failing in e100_hw_init() -> e100_phy_init() since the driver was
> prepared to handle MII-capable PHYs only, not certain older(?) MII-less
> PHYs such as 80c24 or i82503.
> Investigating some FreeBSD etc. drivers it became terribly clear that there
> are also some MII-less PHYs and that one would have to handle them properly.
> 
> Thus I decided to add support for those:
> - after PHY init failure, try to detect whether the EEPROM lists one of
>   the MII-less PHYs
> - if so, don't fatally fail PHY init function
> - avoid touching MII in various utility functions in case of MII-less
>   PHY (FIXME: this may need review, it was a quick hack in some places)
> - add some proper logging on init failure
> 
> Note that this is an initial, semi-rough patch only, would love to have
> it corrected/improved by the e1000 team.
> (I also added some spelling updates for good measure, these would have
> to be committed separately obviously)
> 
> Frankly I'm quite uncertain as to why one would try to actively deprecate
> a driver which works for many cards with a newer one which fails to work
> for several card types and doesn't seem clearly superior in hindsight
> after going through it...
> Oh, right, that's in order to brute-force people to report any
> nagging problems with the new driver, which is... errm... very
> understandable after all ;)
> (I hope that me "reporting" this problem via a patch is ok ;)
> 
> For reference, I'm using a BNC/AUI/TP PCI combo card
> Intel 82557 645477-004 FCC ID EJMNPDEPR10PCTPCI
> 
> This mail written using a reassuringly stable connection over the newly
> adapted driver...


ok, barely glanced over the patch but it might just be fine. Can you split up
this patch and send a separate patch for the spelling mistakes? I'll then have
some quick testing done on the result and do a bit deeper review after New
Year's.

Cheers,

Auke

Re: 2.6.24-rc6-mm1

2007-12-29 Thread Randy Dunlap

On Sun, 30 Dec 2007 04:34:36 +0100 Torsten Kaiser wrote:

> On Dec 30, 2007 2:30 AM, Herbert Xu <[EMAIL PROTECTED]> wrote:
> > On Sat, Dec 29, 2007 at 05:51:13PM +0100, Torsten Kaiser wrote:
> > >
> > > > > The cause, why I am resending this: I just got a crash with
> > > > > 2.6.24-rc6-mm1, again looking network related:
> > > > >
> > > > > [93436.933356] WARNING: at include/net/dst.h:165 dst_release()
> > > > > [93436.936685] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
> > > > > [93436.939292]
> > > > > [93436.939293] Call Trace:
> > > > > [93436.939304]  [] skb_release_all+0xdd/0x110
> > > > > [93436.939307]  [] __kfree_skb+0x11/0xa0
> > > > > [93436.939309]  [] kfree_skb+0x17/0x30
> > > > > [93436.939312]  [] unix_release_sock+0x128/0x250
> > > > > [93436.939315]  [] unix_release+0x21/0x30
> > > > > [93436.939318]  [] sock_release+0x24/0x90
> > > > > [93436.939320]  [] sock_close+0x26/0x50
> > > > > [93436.939324]  [] __fput+0xc1/0x230
> > > > > [93436.939327]  [] fput+0x16/0x20
> > > > > [93436.939329]  [] filp_close+0x56/0x90
> > > > > [93436.939331]  [] sys_close+0xa6/0x110
> > > > > [93436.939335]  [] system_call_after_swapgs+0x7b/0x80
> > >
> > > From code inspection I would blame the patch "[SKBUFF]: Free old skb
> > > properly in skb_morph" from Herbert Xu. (CC added)
> >
> > I doubt it.  skb_morph is only used on IP fragments so I don't see how
> > you could attribute an error from a Unix domain socket to this patch.
> 
> That's why I wrote that I do not know much about the network core...
> 
> > In any case, Unix socket packets should not have a dst at all so the
> > very fact that you're in that path means that you have some sort of
> > memory corruption.
> 
> ... I did not know about the fact that there should not have been a dst.
> 
> It's just that this warning was the first nice clue about the memory
> corruption related to networking that I see since 2.6.24-rc3-mm2.
> The time of the patch (Mon, 26 Nov 2007 15:11:19) even fits into the
> window between -rc3-mm1 and -rc3-mm2.
> 
> I doubt that the memory corruption is a hardware problem, because the
> system in question is using ECC ram and I did not see any messages
> about corrected/detected errors.
> 
> > Is this the very first OOPS/warning that you see? If not you should
> > ignore all but the very first one as that may have left your system
> > in an inconsistent state which may render all subsequent OOPSes and
> > warnings useless.
> 
> I looked into the log in question and the only other warning was a
> circular locking dependency that lockdep detected around 1.5 hour
> before this warning.
> 
> As reported in my original mail, immediately after the warning the
> system OOPSed and hung:
> [93436.947241] general protection fault:  [1] SMP
> -> first OOPS                              ^
FYI, that's what this counter is... ---------^

> [93436.947243] last sysfs file:
> /sys/devices/pci:00/:00:0f.0/:01:00.1/irq
> [93436.947245] CPU 1
> [93436.947246] Modules linked in: radeon drm nfsd exportfs w83792d
> ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
> tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
> videobuf_core btcx_risc tveeprom usbhid videodev v4l2_common hid
> v4l1_compat pata_amd sg i2c_nforce2
> [93436.947257] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
> -> not tainted by a previous OOPS
> [93436.947259] RIP: 0010:[]  []
> skb_drop_list+0x18/0x30
> [93436.947262] RSP: 0018:810005f4fda8  EFLAGS: 00010286
> [93436.947263] RAX: ab1ed5ca5b74e7de RBX: ab1ed5ca5b74e7de RCX: 
> d135
> [93436.947265] RDX: 81011d089a80 RSI: 0001 RDI: 
> 81011d089a88
> [93436.947266] RBP: 810005f4fdb8 R08: 0001 R09: 
> 0006
> [93436.947268] R10:  R11:  R12: 
> 8100de02c500
> [93436.947269] R13: 81011c188a00 R14: 0001 R15: 
> 81011c189198
> [93436.947271] FS:  7fb5bde0d700() GS:81007ff22000()
> knlGS:
> [93436.947273] CS:  0010 DS:  ES:  CR0: 8005003b
> [93436.947274] CR2: 7fb5bdd76000 CR3: 664d5000 CR4: 
> 06e0
> [93436.947276] DR0:  DR1:  DR2: 
> 
> [93436.947277] DR3:  DR6: 0ff0 DR7: 
> 0400
> [93436.947279] Process konqueror (pid: 8079, threadinfo
> 810005f4e000, task 8100a1dec000)
> [93436.947281] Stack:  810005f4fdd8 810116c86140
> 810005f4fdd8 805314ae
> [93436.947284]  810116c86140 8100de02c500 810005f4fdf8
> 80531cf0
> [93436.947286]  8100de02c500 81011c188b48 810005f4fe18
> 80531311
> [93436.947288] Call Trace:
> [93436.947290]  [] skb_release_data+0x5e/0xa0
> [93436.947293]  [] skb_release_all+0xa0/0x110
> [93436.947295]  [] __kfree_skb+0x11/0xa0
> [93436.947297]  []

Re: TOMOYO Linux Security Goal

2007-12-29 Thread Tetsuo Handa

Hello.

[EMAIL PROTECTED] wrote:
> Please make a *big* notation someplace that "learning mode" is quite likely to
> *not* produce a totally correct policy.  In particular, it won't build rules
> for infrequently used code paths (such as error handling) unless you find a
> way to exercise those paths while in learning mode.
Use of "learning mode" is independent from "correct policy".
The "learning mode" merely takes over your duty of appending permissions to policy.
We can develop and share procedures for how to exercise infrequently used code
paths, like how to confirm that your SMTP service won't relay spam.
The problem is simply that "developing and sharing procedures for how to
exercise infrequently used code paths" has not started yet.

By the way, what is the definition of "correct policy"?
The definition of "correct policy" depends on the user.

Some users may think that

  "A ready-made policy is better than a manually-made policy
   even if the ready-made policy contains unused/unneeded permissions.
   Being unable to handle infrequently used code paths is worse than
   leaving room for not knowing/understanding what can happen."

but other users may think that

  "A manually-made policy is better than a ready-made policy
   even if the manually-made policy lacks permissions for infrequently
   used code paths.
   Leaving room for not knowing/understanding what can happen is worse than
   being unable to handle infrequently used code paths."

You can use "permissive mode" to adjust and confirm your policy
before you use "enforcing mode".
You can also use "delayed enforcing mode", which lets an administrator
handle infrequently used code paths without rejecting them even once.
If the policy is not correct, the fault lies with the person who enforced it
without confirming that it is suitable for his/her system.

Since there is no globally agreed definition of "correct policy",
I think we can't say that "learning mode is unlikely to produce a correct policy".

Thanks.

SATA buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

2007-12-29 Thread Linda Walsh


I needed to get a new hard disk for one of my systems and thought that
it was about time to start going with SATA.

I picked up a Promise 4-Port Sata300-TX4 to go with a 750G
Seagate SATA -- I'd had good luck with a Promise ATA100 (P)ATA
and lower capacity Seagates and thought it would be a good combo.

Unfortunately, the *buffered* read performance is *horrible*!

I timed the new disk against a 400GB PATA and old 80MB/s SCSI-based
18.3G hard disk.  While the raw speed numbers are faster as expected, 
the linux-buffered read numbers are not good.



sda=18.3G on 80MB/s SCSI
sdb=the new 750GB on a 3Gb SATA w/NCQ.
hdf=400GB PATA on an ATA100 Promise card

I used "dd" for my tests, reading 2GB on a quiescent machine
that has 1GB of main memory.  Output was to dev null.  Input
was from the device (not a partition or file), (/dev/sda, /dev/sdb
and /dev/hdf).  BS=1M, Count=2k.  For the direct tests, I used
the "iflag=direct" param.  No RAID or "volumes" are involved.

In each case, I took best run time out of 3 runs.

Direct read speeds (and cpu usage):
dev   speed      cpu/real (s)   cpu %
sda   60MB/s     0.51/35.84      1.44
sdb   80MB/s     0.50/26.72      1.87
hdf   69.4MB/s   0.51/30.92      1.68


Buffered reads show the "bad news":
dev   speed      cpu/real (s)   cpu %
sda   59.9MB/s   20.80/35.86    58.03
sdb   18.7MB/s   16.07/114.73   14.01  <-SATA extra badness
hdf   69.8MB/s   17.37/30.76    56.48
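
For anyone wanting to recompute the columns, this is the arithmetic as I
understand it, assuming dd's decimal megabytes (10^6 bytes) and that the last
column is cpu seconds divided by real seconds.  The function names are made up
for this sketch.

```c
/* bs=1M count=2k: 2048 blocks of 2^20 bytes each. */
#define BYTES_READ (2048.0 * 1024.0 * 1024.0)

/* dd-style rate in decimal MB/s for a run that took real_s seconds. */
static double rate_mb_s(double real_s)
{
	return BYTES_READ / real_s / 1e6;
}

/* Share of wall-clock time spent on the CPU, in percent. */
static double cpu_percent(double cpu_s, double real_s)
{
	return 100.0 * cpu_s / real_s;
}
```

Feeding in the sdb rows (26.72 s direct, 114.73 s buffered with 16.07 s of
cpu) gives roughly 80.4 MB/s, 18.7 MB/s and 14.0 %, in line with the table.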

I assume this isn't expected behavior.

Why would buffered reads be so much slower for SATA?  Shouldn't
it be the same buffering system used by sda and hdf?  I can't
see how it would be the hardware or the driver since both
give "best" read performance with the new SATA disk being
15-20% faster.

But the buffered reads...are 60% *slower*.  I want to ask if this
is even possible, even though the evidence seems to indicate it is.
What I mean to ask is: are the SATA buffered read paths
*so* different from SCSI and PATA that they could cause this?
Isn't the block layer mostly "independent" above the device
layer?  If it isn't evident, I'm using the newer SATA drivers (not
the old ones included with the pata library), and the pata disks
are using the old ATA interface.

I wanted to use the newer pata support in the SATA lib, but
got frustrated "real fast" by the lack of disk-parameter support
in the new pata library (hdparm is mostly broken, and the SCSI
utils aren't really intended for ATA (or SATA?) disks using the
SCSI interface).

Is there some 'gotcha' I'm missing?  Google didn't seem to
throw any answers at me that 'stood out'.

Also, as a side issue -- have the buffered commands always
taken that much cpu vs. direct (machine has 2x1GHz-P-III's)?
Maybe it has and I just haven't noticed it -- but my main
problem right now is with the horrible buffered SATA
performance.

Since SATA disks use ATA-7 (or at least the Seagate disk I
acquired seems to), shouldn't most of the hdparm commands
be functional on the SATA hardware as much as they would
be on PATA?  Or...maybe said a different way, is there
an "sdparm" that is to SATA what hdparm is to PATA?

The Promise controllers involved (PATA and SATA) are:
00:0d.0 Mass storage controller: Promise Technology, Inc. PDC20268 (Ultra100 TX2) (rev 02)

and
02:09.0 Mass storage controller: Promise Technology, Inc. PDC40718 (SATA 300 TX4) (rev 02)


I'd ask about a newer driver, but the hardware seems pretty
fast if I go around the Linux kernel.  Ideas?  What could
slow down the linux-buffer layer when the driver is faster?
Perversely, could it be the faster driver speed just tipping
over some internal "flooding" limit which degrades buffered
performance? 


Very Confused & TIA,
Linda



Re: [PATCH] security: remove security_sb_post_mountroot hook

2007-12-29 Thread James Morris

On Sat, 29 Dec 2007, H. Peter Anvin wrote:

> The security_sb_post_mountroot() hook is long-since obsolete, and is
> fundamentally broken: it is never invoked if someone uses initramfs.
> This is particularly damaging, because the existence of this hook has
> been used as motivation for not using initramfs.
> 
> Stephen Smalley confirmed on 2007-07-19 that this hook was originally
> used by SELinux but can now be safely removed:
> 
>  http://marc.info/?l=linux-kernel&m=118485683612916&w=2

Thanks.

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-akpm


-- 
James Morris
<[EMAIL PROTECTED]>

Re: [RFC] USB driver for talking to the Microchip PIC18 boot loader

2007-12-29 Thread Xiaofan Chen

On Dec 30, 2007 11:53 AM, mgross <[EMAIL PROTECTED]> wrote:
> Yeah, it has been done from user space using a libusb based
> application.  (that didn't work with a usb-hub in the loop) and had
> code that was just too nasty for words, so I made a kernel driver that
> looks nicer to me and enables a nice python FW loader program to work.

http://forum.microchip.com/tm.aspx?m=275422=2
Just a guess why it does not work: this might be due to the auto-suspend.
If you update your kernel, it should be ok. The firmware is also to blame.

> What is the linux-usb policies on new drivers that could be
> implemented in user space?  When does a kernel driver make  sense over
> a libusb one?
>

That would be interesting to know.

Xiaofan

Re: [RFC] USB driver for talking to the Microchip PIC18 boot loader

2007-12-29 Thread Xiaofan Chen

On Dec 30, 2007 12:29 PM, mgross <[EMAIL PROTECTED]> wrote:
> The device IDs are different: 0x000C in ldusb.c vs. 0x000b in the
> driver I just posted.

I know that. 000b is for the demo application. 000c is for the
bootloader application.

I am not a programmer myself but I think the USB communication part
of both libusb-based applications is fine. I have no comments
about the other parts of the fsusb code.
http://forum.microchip.com/tm.aspx?m=106426

> > Please do not add it to the kernel. There are libusb based application
> > for both the bootloader and the demo application and both are working
> > fine under Linux (along with Windows and I am trying to get FreeBSD
> > working).
>
> The libusb based FW loader http://www.internetking.org/fsusb/ program
> is nasty and didn't work on one of my systems, so I refactored it into
> a kernel driver and python program.
>

If it does not work, read my patches and see if it will work.

If you do not like the existing fsusb application, you can rewrite
it in python with pyusb (which is based on libusb) but you do not
need a kernel driver.

pyusb: http://pyusb.berlios.de/

Hex file parsing  in pyk by Mark Rages. He is using the Bitpim
libusb wrapper which IMHO is not as good as pyusb.
http://groups.google.com/group/pickit-devel/msg/35e850832256e890

Xiaofan

Re: [RFC] USB driver for talking to the Microchip PIC18 boot loader

2007-12-29 Thread mgross

On Sun, Dec 30, 2007 at 10:40:45AM +0800, Xiaofan Chen wrote:
> On Dec 30, 2007 6:15 AM, Alan Stern <[EMAIL PROTECTED]> wrote:
> > On Sat, 29 Dec 2007, mgross wrote:
> >
> > > I'm playing around with a PIC based project at home (not an Intel
> > > activity) and found I needed a usb driver to talk to the boot loader
> > > so I can program my USB Bitwhacker with new custom firmware.  The
> > > following adds the pic18bl driver to the kernel.  Its pretty simple
> > > and is somewhat based on bits of a libusb driver that does some of
> > > what this driver does.
> > >
> > > What do you think?
> >
> > Not to detract from your driver, but would it be possible to do the
> > whole thing in userspace using libusb?  Maybe by extending the driver
> > you mentioned?
> >
> 
> The existing libusb based application works fine for PICDEM FS USB
> or those based on it (like the Bitwhacker the OP is using).

The device IDs are different: 0x000C in ldusb.c vs. 0x000b in the
driver I just posted.

Have you read my patch yet?

> 
> Please do not add it to the kernel. There are libusb based application
> for both the bootloader and the demo application and both are working
> fine under Linux (along with Windows and I am trying to get FreeBSD
> working).

The libusb based FW loader http://www.internetking.org/fsusb/ program
is nasty and didn't work on one of my systems, so I refactored it into
a kernel driver and python program.

> 
> Last time the demo application has been added to the ldusb and
> I think it is not a good idea. But since then I've added patches to
> the existing libusb application.
> 
> Relevant discussion in thread
> '[PATCH 70/78] USB: add picdem device to ldusb'
> http://marc.info/?t=11777007643=1=2
> 
> So please do not do this again. It is not a problem for the libusb
> based applications after the patches but it is really not necessary.

Why not?

There are a lot of redundant things in the world.  Linux is not
necessary if you really want to take this argument to its extreme.

> 
> Original libusb based application for the bootloader:
> http://www.internetking.org/fsusb/

Yup, that's the code.  I found it way too complex to read, and a simple
kernel driver plus a simple python program sits much better with my
sensibilities.

We are quickly getting into fuzzy, opinion-based territory on this
thread.  Is there a technical angle we can discuss?  The LOC count of
my kernel driver and boot loader is smaller than that of the fsusb thing.
Also, with a kernel driver and a python lib, a GUI based boot loader
utility can be had with little effort.
 
> Original libusb based application for the Demo which
> also includes my patch for libusb-win32.
> http://www.varxec.net/picdem_fs_usb/
> 
> Updated Patches to detach the kernel driver for both
> the bootloader and Demo application.
> http://forum.microchip.com/tm.aspx?m=106426
> 
> Xiaofan Chen
> http://mcuee.blogspot.com

You blogging about me already?
I won't comment on that.

--mgross



Re: RFC: permit link(2) to work across --bind mounts ?

2007-12-29 Thread dean gaudet

On Sat, 29 Dec 2007, [EMAIL PROTECTED] wrote:

> On Sat, 29 Dec 2007 12:40:47 PST, dean gaudet said:
> 
> > the main worry i have is some user maliciously hardlinks everything
> > under /var/log somewhere else and slowly fills up the file system with
> > old rotated logs.
> 
> "Doctor, it hurts when I do this.." "Well, don't do that then".

actually it doesn't hurt.  i have other mechanisms which would pick this 
up fairly quickly.

-dean

Re: [RFC] USB driver for talking to the Microchip PIC18 boot loader

2007-12-29 Thread mgross

On Sat, Dec 29, 2007 at 05:15:30PM -0500, Alan Stern wrote:
> On Sat, 29 Dec 2007, mgross wrote:
> 
> > I'm playing around with a PIC based project at home (not an Intel
> > activity) and found I needed a usb driver to talk to the boot loader
> > so I can program my USB Bitwhacker with new custom firmware.  The
> > following adds the pic18bl driver to the kernel.  Its pretty simple
> > and is somewhat based on bits of a libusb driver that does some of
> > what this driver does.
> > 
> > What do you think?
> 
> Not to detract from your driver, but would it be possible to do the 
> whole thing in userspace using libusb?  Maybe by extending the driver 
> you mentioned?
>
Yeah, it has been done from user space using a libusb based
application.  That application didn't work with a usb-hub in the loop and
had code that was just too nasty for words, so I made a kernel driver that
looks nicer to me and enables a nice python FW loader program to work.

What is the linux-usb policy on new drivers that could be
implemented in user space?  When does a kernel driver make sense over
a libusb one?

--mgross


[PATCH] x86_64: clear IO_APIC before enabling apic error vector. v2

2007-12-29 Thread Yinghai Lu

please check if you can replace the one in the x86-mm

http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-x86.git;a=commitdiff;h=ffcbdc220a1520d006a837f33589c7c19ffbeb76

the updated one avoids one link warning.

YH

[PATCH] x86_64: clear IO_APIC before enabling apic error vector. v2

Some systems do apic id lifting for the BSP, e.g. 4 socket quad core and
8 socket quad core.

But the io-apic regs for ExtINT still use 0 as dest.

So when we enable the apic error vector on the BSP, we get one APIC error.

CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU 0/4 -> Node 0
CPU: Physical Processor ID: 1
CPU: Processor Core ID: 0
SMP alternatives: switching to UP code
ACPI: Core revision 20070126
enabled ExtINT on CPU#0
ESR value after enabling vector: , after 000c
APIC error on CPU0: 0c(08)
ENABLING IO-APIC IRQs
Synchronizing Arb IDs.

So move enable_IO_APIC from setup_IO_APIC into setup_local_APIC and call it
before enabling apic error vector.

this is the updated version that takes enable_IO_APIC as an extra call for
setup_local_APIC to avoid a linking warning.

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

Index: linux-2.6/arch/x86/kernel/apic_64.c
===
--- linux-2.6.orig/arch/x86/kernel/apic_64.c
+++ linux-2.6/arch/x86/kernel/apic_64.c
@@ -418,7 +418,7 @@ void __init init_bsp_APIC(void)
apic_write(APIC_LVT1, value);
 }
 
-void __cpuinit setup_local_APIC (void)
+void __cpuinit setup_local_APIC (void (*extra_call)(void))
 {
unsigned int value, maxlvt;
int i, j;
@@ -517,6 +517,13 @@ void __cpuinit setup_local_APIC (void)
value = APIC_DM_NMI | APIC_LVT_MASKED;
apic_write(APIC_LVT1, value);
 
+   /*
+* Now enable IO-APICs, actually call clear_IO_APIC
+* We need clear_IO_APIC before enabling vector on BP
+*/
+   if (extra_call)
+   (*extra_call)();
+
{
unsigned oldvalue;
maxlvt = get_maxlvt();
@@ -1198,6 +1205,8 @@ int disable_apic;
  */
 int __init APIC_init_uniprocessor (void)
 {
+   void (*extra_call)(void) = NULL;
+
if (disable_apic) {
printk(KERN_INFO "Apic disabled\n");
return -1;
@@ -1213,7 +1222,10 @@ int __init APIC_init_uniprocessor (void)
phys_cpu_present_map = physid_mask_of_physid(boot_cpu_id);
apic_write(APIC_ID, SET_APIC_ID(boot_cpu_id));
 
-   setup_local_APIC();
+   if (!skip_ioapic_setup && nr_ioapics)
+   extra_call = enable_IO_APIC;
+
+   setup_local_APIC(extra_call);
 
if (smp_found_config && !skip_ioapic_setup && nr_ioapics)
setup_IO_APIC();
Index: linux-2.6/arch/x86/kernel/io_apic_64.c
===
--- linux-2.6.orig/arch/x86/kernel/io_apic_64.c
+++ linux-2.6/arch/x86/kernel/io_apic_64.c
@@ -1171,7 +1171,7 @@ void __apicdebuginit print_PIC(void)
 
 #endif  /*  0  */
 
-static void __init enable_IO_APIC(void)
+void __init enable_IO_APIC(void)
 {
union IO_APIC_reg_01 reg_01;
int i8259_apic, i8259_pin;
@@ -1790,7 +1790,10 @@ __setup("no_timer_check", notimercheck);
 
 void __init setup_IO_APIC(void)
 {
-   enable_IO_APIC();
+
+   /*
+* calling enable_IO_APIC() is moved to setup_local_APIC for BP
+*/
 
if (acpi_ioapic)
io_apic_irqs = ~0;  /* all IRQs go through IOAPIC */
Index: linux-2.6/include/asm-x86/hw_irq_64.h
===
--- linux-2.6.orig/include/asm-x86/hw_irq_64.h
+++ linux-2.6/include/asm-x86/hw_irq_64.h
@@ -135,6 +135,7 @@ extern void init_8259A(int aeoi);
 extern void send_IPI_self(int vector);
 extern void init_VISWS_APIC_irqs(void);
 extern void setup_IO_APIC(void);
+extern void enable_IO_APIC(void);
 extern void disable_IO_APIC(void);
 extern void print_IO_APIC(void);
 extern int IO_APIC_get_PCI_irq_vector(int bus, int slot, int fn);
Index: linux-2.6/arch/x86/kernel/smpboot_64.c
===
--- linux-2.6.orig/arch/x86/kernel/smpboot_64.c
+++ linux-2.6/arch/x86/kernel/smpboot_64.c
@@ -211,7 +211,7 @@ void __cpuinit smp_callin(void)
 */
 
Dprintk("CALLIN, before setup_local_APIC().\n");
-   setup_local_APIC();
+   setup_local_APIC(NULL);
 
/*
 * Get our bogomips.
@@ -879,6 +879,8 @@ static void __init smp_cpu_index_default
  */
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
+   void (*extra_call)(void) = NULL;
+
nmi_watchdog_default();
smp_cpu_index_default();
current_cpu_data = boot_cpu_data;
@@ -896,7 +898,10 @@ void __init smp_prepare_cpus(unsigned in
/*
 * Switch from PIC to APIC mode.
 */
-   setup_local_APIC();
+if (!skip_ioapic_setup && nr_ioapics)
+

Re: RFC: permit link(2) to work across --bind mounts ?

2007-12-29 Thread Valdis . Kletnieks

On Sat, 29 Dec 2007 12:40:47 PST, dean gaudet said:

> > See, this is where you show that you don't understand the system.  I'll
> > explain it, just once.  /var/home contains  home directories.  /var/log and
> > /var/home are on the same filesystem.  So /var/log/* can be linked to
> > /var/home/malicious, and that's just one of your basic misunderstandings.
> 
> yes you are on crack.
> 
> i told you i understand this exactly.  it's right there in the message 
> sent.

So... You understand that if /var/home and /var/log are on one file system,
you can hard-link, and you set your system up knowing that, and then you're
*surprised* that:

> the main worry i have is some user maliciously hardlinks everything
> under /var/log somewhere else and slowly fills up the file system with
> old rotated logs.

"Doctor, it hurts when I do this.." "Well, don't do that then".

I think the first time I saw the recommendation "Put /home on its own
filesystem and don't give users directly writable directories on /var (except
via set-uid helpers) so they can't play hardlink games" was back in 1983 or so.
I know that when SunOS 3.1 came out, that was already well-understood basic
sysadmining.  Sometimes, there's actual good reasons behind 20-year-old
voodoo.. ;)

You sure you don't want to redesign your filesystem layout so you don't
have to worry about your malicious users hardlinking stuff? Might be a lot
easier than trying to get the kernel to do what you want in this case.
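
For the record, what makes the hardlink game work is plain link-count
semantics, nothing exotic. A minimal sketch using only the Python
standard library, run in a scratch directory; the file names and the
function name are invented for illustration. Across filesystems
os.link() fails with EXDEV, which is the whole point of the separate
/home advice.

```python
import os
import tempfile


def hardlink_outlives_rotation(payload=b"x" * 4096):
    """Return (link count, size) seen through a hard link after the
    original name has been removed."""
    with tempfile.TemporaryDirectory() as top:
        log = os.path.join(top, "messages.1")
        stash = os.path.join(top, "stash")
        with open(log, "wb") as f:
            f.write(payload)
        os.link(log, stash)    # second name for the same inode
        os.remove(log)         # "rotation" deletes the old log name
        st = os.stat(stash)    # the data is still there, still using space
        return st.st_nlink, st.st_size


if __name__ == "__main__":
    print(hardlink_outlives_rotation())
```

The blocks are only freed once the last name (and the last open file
descriptor) goes away, so deleting the rotated log on the /var/log side
gives nothing back while the stashed link exists.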




[PATCH] x86: provide a DMI based port 0x80 I/O delay override

2007-12-29 Thread Rene Herman


Hi Linus.

[ resend, forgot the CC to linux-kernel. sorry ]

This fixes "hwclock" triggered boottime hangs for a few HP/Compaq laptops
and might as such be applicable to 2.6.24 still.

The kernel's use of an outb to port 0x80 as an I/O delay disagrees with
these machines (after ACPI is live, that is) and this provides for a DMI
based switch to alternate port 0xed for them.

Complete changelog inside the patch.

An evolved version of this patch that also supplies udelay(2) and 
as I/O delay lives in the x86.git tree as well but Alan Cox suggested those
choices shouldn't yet be provided as he's finding races in drivers on SMP
without the bus-locking outb.

As a minimal version I thought you might perhaps want to take this as a
specific fix for the afflicted laptops for 2.6.24. H. Peter Anvin earlier
agreed it would be minimal enough for that.

It was tested on both of the afflicted machines the DMI strings cover and
doesn't change anything on others by default. It also introduces a bootparam
io_delay= to make (or override) the choice manually.
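
Spelled out, the decision the patch makes looks roughly like the sketch
below. This is an illustration in Python, not the kernel code:
choose_io_delay_port and QUIRK_BOARDS are names made up here, and the
board match is an exact string comparison for simplicity. The board
vendor/name pairs and the io_delay= values are the ones given in the
changelog.

```python
STANDARD_PORT = 0x80    # traditional I/O delay port
ALTERNATE_PORT = 0xED   # "alternate diagnostic port"

# Boards named in the changelog as needing the alternate port.
QUIRK_BOARDS = {("Quanta", "30B7"), ("Quanta", "30B9")}


def choose_io_delay_port(board_vendor, board_name, io_delay_param=None):
    """Pick the delay port: a boot parameter wins over the DMI match."""
    if io_delay_param == "standard":
        return STANDARD_PORT
    if io_delay_param == "alternate":
        return ALTERNATE_PORT
    if io_delay_param is not None:
        raise ValueError("io_delay must be 'standard' or 'alternate'")
    if (board_vendor, board_name) in QUIRK_BOARDS:
        return ALTERNATE_PORT
    return STANDARD_PORT


if __name__ == "__main__":
    print(hex(choose_io_delay_port("Quanta", "30B7")))
```

Any machine that is not in the table and gets no boot parameter keeps
writing to port 0x80 exactly as before.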

  Documentation/kernel-parameters.txt |6 ++
  arch/x86/boot/compressed/misc_32.c  |8 +--
  arch/x86/boot/compressed/misc_64.c  |8 +--
  arch/x86/kernel/Makefile_32 |2
  arch/x86/kernel/Makefile_64 |2
  arch/x86/kernel/io_delay.c  |   77

  arch/x86/kernel/setup_32.c  |2
  arch/x86/kernel/setup_64.c  |2
  include/asm-x86/io_32.h |6 --
  include/asm-x86/io_64.h |   27 +++-
  10 files changed, 115 insertions(+), 25 deletions(-)

Rene.

commit b2a10c0b8e6c1c73b940e60fae4cbe9db9ca9e3b
Author: Rene Herman <[EMAIL PROTECTED]>
Date:   Mon Dec 17 21:23:55 2007 +0100

x86: provide a DMI based port 0x80 I/O delay override.

Certain (HP/Compaq) laptops experience trouble from our port 0x80
I/O delay writes. This patch provides for a DMI based switch to the
"alternate diagnostic port" 0xed (as used by some BIOSes as well)
for these.

David P. Reed confirmed that using port 0xed works and provides a
proper delay on his HP Pavilion dv9000z, Islam Amer confirmed that
it does so on a Compaq Presario V6000. Both are Quanta boards, type
30B9 and 30B7 respectively and are the (only) machines for which
the DMI based switch triggers. HP Pavilion dv6000z is expected to
also need this but its DMI info hasn't been verified yet.

The symptoms of _not_ working are a hanging machine, with "hwclock"
use being a direct trigger and therefore the bootup often hanging
already on these machines.

Earlier versions of this attempted to simply use udelay(2), with the
2 being a value tested to be a nicely conservative upper-bound with
help from many on the linux-kernel mailing list, but that approach has
two problems.

First, pre-loops_per_jiffy calibration (which is post PIT init while
some implementations of the PIT are actually one of the historically
problematic devices that need the delay) udelay() isn't particularly
well-defined. We could initialise loops_per_jiffy conservatively (and
based on CPU family so as to not unduly delay old machines) which
would sort of work, but still leaves:

Second, delaying isn't the only effect that a write to port 0x80 has.
It's also a PCI posting barrier which some devices may be explicitly
or implicitly relying on. Alan Cox did a survey and found evidence
that additionally various drivers are racy on SMP without the bus
locking outb.

Switching to an inb() makes the timing too unpredictable and as such,
this DMI based switch should be the safest approach for now. Any more
invasive changes should get more rigid testing first. It's moreover
only very few machines with the problem and a DMI based hack seems
to fit that situation.

An early boot parameter to make the choice manually (and override any
possible DMI based decision) is also provided:

io_delay=standard|alternate

This does not change the io_delay() in the boot code which is using
the same port 0x80 I/O delay but those do not appear to be a problem
as tested by David P. Reed. He moreover reported that booting with
"acpi=off" also fixed things and seeing as how ACPI isn't touched
until after this DMI based I/O port switch leaving the ones in the
boot code be is safe.

This patch is partly based on earlier patches from Pavel Machek and
David P. Reed.

Signed-off-by: Rene Herman <[EMAIL PROTECTED]>
Tested-by: David P. Reed <[EMAIL PROTECTED]>
Tested-by: Islam Amer <[EMAIL PROTECTED]>

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 33121d6..6948e25 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -785,6 +785,12 @@ and is between 256 and 4096

Re: 2.6.24-rc6-mm1

2007-12-29 Thread Torsten Kaiser

On Dec 30, 2007 2:30 AM, Herbert Xu <[EMAIL PROTECTED]> wrote:
> On Sat, Dec 29, 2007 at 05:51:13PM +0100, Torsten Kaiser wrote:
> >
> > > > The cause, why I am resending this: I just got a crash with
> > > > 2.6.24-rc6-mm1, again looking network related:
> > > >
> > > > [93436.933356] WARNING: at include/net/dst.h:165 dst_release()
> > > > [93436.936685] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
> > > > [93436.939292]
> > > > [93436.939293] Call Trace:
> > > > [93436.939304]  [] skb_release_all+0xdd/0x110
> > > > [93436.939307]  [] __kfree_skb+0x11/0xa0
> > > > [93436.939309]  [] kfree_skb+0x17/0x30
> > > > [93436.939312]  [] unix_release_sock+0x128/0x250
> > > > [93436.939315]  [] unix_release+0x21/0x30
> > > > [93436.939318]  [] sock_release+0x24/0x90
> > > > [93436.939320]  [] sock_close+0x26/0x50
> > > > [93436.939324]  [] __fput+0xc1/0x230
> > > > [93436.939327]  [] fput+0x16/0x20
> > > > [93436.939329]  [] filp_close+0x56/0x90
> > > > [93436.939331]  [] sys_close+0xa6/0x110
> > > > [93436.939335]  [] system_call_after_swapgs+0x7b/0x80
> >
> > From code inspection I would blame the patch "[SKBUFF]: Free old skb
> > properly in skb_morph" from Herbert Xu. (CC added)
>
> I doubt it.  skb_morph is only used on IP fragments so I don't see how
> you could attribute an error from a Unix domain socket to this patch.

That's why I wrote that I do not know much about the network core...

> In any case, Unix socket packets should not have a dst at all so the
> very fact that you're in that path means that you have some sort of
> memory corruption.

... I did not know about the fact that there should not have been a dst.

It's just that this warning was the first nice clue about the memory
corruption related to networking that I have been seeing since 2.6.24-rc3-mm2.
The time of the patch (Mon, 26 Nov 2007 15:11:19) even fits into the
window between -rc3-mm1 and -rc3-mm2.

I doubt that the memory corruption is a hardware problem, because the
system in question is using ECC ram and I did not see any messages
about corrected/detected errors.

> Is this the very first OOPS/warning that you see? If not you should
> ignore all but the very first one as that may have left your system
> in an inconsistent state which may render all subsequent OOPSes and
> warnings useless.

I looked into the log in question and the only other warning was a
circular locking dependency that lockdep detected around 1.5 hours
before this warning.

As reported in my original mail, immediately after the warning the
system OOPSed and hung:
[93436.947241] general protection fault:  [1] SMP
-> first OOPS
[93436.947243] last sysfs file:
/sys/devices/pci:00/:00:0f.0/:01:00.1/irq
[93436.947245] CPU 1
[93436.947246] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom usbhid videodev v4l2_common hid
v4l1_compat pata_amd sg i2c_nforce2
[93436.947257] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
-> not tainted by a previous OOPS
[93436.947259] RIP: 0010:[]  []
skb_drop_list+0x18/0x30
[93436.947262] RSP: 0018:810005f4fda8  EFLAGS: 00010286
[93436.947263] RAX: ab1ed5ca5b74e7de RBX: ab1ed5ca5b74e7de RCX: d135
[93436.947265] RDX: 81011d089a80 RSI: 0001 RDI: 81011d089a88
[93436.947266] RBP: 810005f4fdb8 R08: 0001 R09: 0006
[93436.947268] R10:  R11:  R12: 8100de02c500
[93436.947269] R13: 81011c188a00 R14: 0001 R15: 81011c189198
[93436.947271] FS:  7fb5bde0d700() GS:81007ff22000()
knlGS:
[93436.947273] CS:  0010 DS:  ES:  CR0: 8005003b
[93436.947274] CR2: 7fb5bdd76000 CR3: 664d5000 CR4: 06e0
[93436.947276] DR0:  DR1:  DR2: 
[93436.947277] DR3:  DR6: 0ff0 DR7: 0400
[93436.947279] Process konqueror (pid: 8079, threadinfo
810005f4e000, task 8100a1dec000)
[93436.947281] Stack:  810005f4fdd8 810116c86140
810005f4fdd8 805314ae
[93436.947284]  810116c86140 8100de02c500 810005f4fdf8
80531cf0
[93436.947286]  8100de02c500 81011c188b48 810005f4fe18
80531311
[93436.947288] Call Trace:
[93436.947290]  [] skb_release_data+0x5e/0xa0
[93436.947293]  [] skb_release_all+0xa0/0x110
[93436.947295]  [] __kfree_skb+0x11/0xa0
[93436.947297]  [] kfree_skb+0x17/0x30
[93436.947299]  [] unix_release_sock+0x128/0x250
[93436.947302]  [] unix_release+0x21/0x30
[93436.947304]  [] sock_release+0x24/0x90
[93436.947307]  [] sock_close+0x26/0x50
[93436.947309]  [] __fput+0xc1/0x230
[93436.947312]  [] fput+0x16/0x20
[93436.947314]  [] filp_close+0x56/0x90
[93436.947316]  [] sys_close+0xa6/0x110
[93436.947319]  [] system_call_after_swapgs+0x7b/0x80

Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)

2007-12-29 Thread Rene Herman


On 30-12-07 02:49, Islam Amer wrote:

> Glad I could be of help.
>
> Sure please go ahead, I will keep testing this patch with upcoming git
> kernels, and report any problems.

Thanks. I'll see if Linus wants it for 2.6.24 still. Could be minimal enough.

> So what I understand now is that AMD C1E state saves battery like
> dynticks, so we don't need dynticks ?

That bit I haven't a clue about I'm afraid...

Rene.

Re: [BUG][PATCH] bluetooth: put_device before device_del fix

2007-12-29 Thread David Miller

From: Dave Young <[EMAIL PROTECTED]>
Date: Thu, 27 Dec 2007 13:27:50 +0800

> Because of workqueue delay, the put_device could be called before device_del, 
> so move it to del_conn.
> 
> Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

Applied, thanks Dave.

Please post bluetooth patches, just like any other networking patches,
to [EMAIL PROTECTED] (CC:'ing me) so that other networking
developers can process them during times like these when most
maintainers aren't around and thus not looking at bug fixes like
yours.

Thanks.

Re: 2.6.24-rc6-mm1 - crash in tick_sched_timer/update_process_times

2007-12-29 Thread Valdis . Kletnieks

On Thu, 27 Dec 2007 12:54:34 EST, [EMAIL PROTECTED] said:

> [15345.901919] Unable to handle kernel paging request at 00af008c00cd RIP:
> [15345.901934]  [] scheduler_tick+0xdb/0x1c4
> [15345.901952] PGD 0
> [15345.901959] Oops:  [1] PREEMPT SMP
> [15345.901972] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
> [15345.901978] CPU 1
> [15345.901984] Modules linked in: irnet ppp_generic slhc irtty_sir sir_dev 
> ircomm_tty ircomm irda crc_ccitt coretemp nf_conntrack_ftp xt_pkttype 
> ipt_REJECT ipt_osf nf_conntrack_ipv4 xt_ipisforif ipt_recent ipt_LOG xt_u32 
> iptable_filter ip_tables xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack 
> ip6t_LOG xt_limit ip6table_filter ip6_tables x_tables sha256_generic 
> aes_generic acpi_cpufreq tpm_tis pcmcia gspca(U) iwl3945 firmware_class 
> iTCO_wdt yenta_socket compat_ioctl32 ohci1394 rsrc_nonstatic 
> iTCO_vendor_support mac80211 ieee1394 pcmcia_core nvidia(P)(U) watchdog_core 
> battery videodev ac watchdog_dev v4l2_common snd_hda_intel v4l1_compat 
> thermal power_supply cfg80211 button intel_agp processor rtc
> [15345.902170] Pid: 0, comm: Eterm Tainted: P2.6.24-rc6-mm1 #4
> [15345.902176] RIP: 0010:[]  [] 
> scheduler_tick+0xdb/0x1c4
> [15345.902189] RSP: 0018:81007f8a3eb8  EFLAGS: 00010083
> [15345.902195] RAX: 00af008c005d RBX: 0df4fefac7d9 RCX: 
> 0004
> [15345.902202] RDX: 0004 RSI: 81007e405600 RDI: 
> 810001011180
> [15345.902208] RBP: 81007f8a3ed8 R08: 0010 R09: 
> 0001
> [15345.902214] R10: 8100808f4000 R11: 8071d180 R12: 
> 810001011180
> [15345.902219] R13: 0001 R14: 81007e405600 R15: 
> 0001
> [15345.902226] FS:  () GS:81007f86f9c0(0063) 
> knlGS:f7dab6c0
> [15345.902233] CS:  0010 DS: 002b ES: 002b CR0: 8005003b
> [15345.902238] CR2: 00af008c00cd CR3: 766c8000 CR4: 
> 06e0
> [15345.902244] DR0:  DR1:  DR2: 
> 
> [15345.902249] DR3:  DR6: 0ff0 DR7: 
> 0400
> [15345.902256] Process Eterm (pid: 0, threadinfo 4394404d, task 
> 81007e405600)
> [15345.902261] Stack:  0001  81007e405600 
> 0c71d7bbe7da
> [15345.902279]  81007f8a3f08 8023f913 81007f8a3f08 
> 81000100e060
> [15345.902293]  810077161b08 81000100df60 81007f8a3f38 
> 80251b7d
> [15345.902306] Call Trace:
> [15345.902312][] update_process_times+0x4a/0x5b
> [15345.902334]  [] tick_sched_timer+0x8e/0xcb
> [15345.902345]  [] hrtimer_interrupt+0x111/0x1a1
> [15345.902357]  [] ia32_setup_frame+0xb5/0x1b7
> [15345.902367]  [] smp_apic_timer_interrupt+0x86/0xa6
> [15345.902377]  [] apic_timer_interrupt+0x66/0x70
> [15345.902383]  
> [15345.902389]
> [15345.902390] Code: ff 50 70 4c 89 e7 e8 4a 2d 2f 00 44 89 ef e8 85 9f ff ff 
> 41
> [15345.902445] RIP  [] scheduler_tick+0xdb/0x1c4
> [15345.902455]  RSP 
> [15345.902461] CR2: 00af008c00cd

In case it makes a difference, the Eterm that causes the issue on exit is
a 32-bit binary, with a 64-bit kernel (though I did have one kernel lockup
with xpdf, which is a 64-bit binary, but I can't prove that was/wasn't this
same issue)

Bisection says:

git-ipwireless_cs.patch GOOD
#
git-x86.patch
git-x86-fixup.patch
git-x86-arch-x86-math-emu-errorsc-fix-printk-warnings.patch
git-x86-drivers-pnp-pnpbios-bioscallsc-build-fix.patch
git-x86-fix-doubly-merged-patch.patch
git-x86-export-leave_mm.patch   BAD

and that's where bisection comes to a halt...

Time to bisect through git-x86, or somebody got a better idea?  Looking at
the commits listed in git-x86.patch, I didn't see anything that jumped out,
but I'm pretty sure the problem is in there somewhere...



Re: [PATCH] Fix broken ip= parsing

2007-12-29 Thread David Miller

From: Thomas Bogendoerfer <[EMAIL PROTECTED]>
Date: Sat, 29 Dec 2007 18:08:49 +0100 (CET)

> Commit a6c05c3d064dbb83be88cba3189beb5db9d2dfc3 breaks ip= parsing
> completely, because ic_enable is never set. The patch below puts
> back the way ic_enable was set before.
> 
> Signed-off-by: Thomas Bogendoerfer <[EMAIL PROTECTED]>

I already have this exact patch in my net-2.6 tree from Simon.
Thanks.

Re: [RFC] USB driver for talking to the Microchip PIC18 boot loader

2007-12-29 Thread Xiaofan Chen

On Dec 30, 2007 6:15 AM, Alan Stern <[EMAIL PROTECTED]> wrote:
> On Sat, 29 Dec 2007, mgross wrote:
>
> > I'm playing around with a PIC based project at home (not an Intel
> > activity) and found I needed a usb driver to talk to the boot loader
> > so I can program my USB Bitwhacker with new custom firmware.  The
> > following adds the pic18bl driver to the kernel.  Its pretty simple
> > and is somewhat based on bits of a libusb driver that does some of
> > what this driver does.
> >
> > What do you think?
>
> Not to detract from your driver, but would it be possible to do the
> whole thing in userspace using libusb?  Maybe by extending the driver
> you mentioned?
>

The existing libusb based application works fine for PICDEM FS USB
or those based on it (like the Bitwhacker the OP is using).

Please do not add it to the kernel. There are libusb based application
for both the bootloader and the demo application and both are working
fine under Linux (along with Windows and I am trying to get FreeBSD
working).

Last time the demo application has been added to the ldusb and
I think it is not a good idea. But since then I've added patches to
the existing libusb application.

Relevant discussion in thread
'[PATCH 70/78] USB: add picdem device to ldusb'
http://marc.info/?t=11777007643=1=2

So please do not do this again. It is not a problem for the libusb
based applications after the patches but it is really not necessary.

Original libusb based application for the bootloader:
http://www.internetking.org/fsusb/

Original libusb based application for the Demo which
also includes my patch for libusb-win32.
http://www.varxec.net/picdem_fs_usb/

Updated Patches to detach the kernel driver for both
the bootloader and Demo application.
http://forum.microchip.com/tm.aspx?m=106426

Xiaofan Chen
http://mcuee.blogspot.com

Re: eMMC and MMC plus card supports

2007-12-29 Thread Philip Langdale


Sandeep K wrote:

> Hi all,
>
> Can anybody please let me know, does the current linux
> tree supports MMC plus and eMMC cards.


It certainly supports MMCplus. My understanding is that eMMC
is just a form-factor and is not visibly different to the host
controller, so it should just work. I know that the nokia n810
has an internal eMMC and it, obviously, works. They had some
odd issues with the Samsung part they're using and SET_BLOCK_COUNT
but I have no idea if that's an eMMC thing or specific to the part
they're using.

You also asked about "28bit lba" support in another email. I assume
you are asking about support for block addressed SDHC and MMCplus-HC cards.
The kernel supports both (although good luck trying to get a hold of an MMC
one).

Hope that helps,

--phil

Re: [PATCH] x86: unify x86 Makefile(s)

2007-12-29 Thread Andi Kleen


> Without inlining the maximum stack usage inside foobar() is
> max(stack usage foo(), stack usage bar()). [1]

It's a little more complicated. gcc 4.x (not sure which x, might be 0)
is clever enough to not use max() stack, but only use the stack for the
different scopes as needed, much as when the calls weren't inlined.
But gcc 3 didn't do that.

> With foo() and bar() inlined (-funit-at-a-time also enables 
> -finline-functions-called-once), the maximum stack usage inside 
> foobar() is sum(stack usage foo(), stack usage bar()). And this
> worst case is the area where gcc 4 is much better than gcc 3.4.

Yes exactly.  If the functions weren't inlined the problem wouldn't
occur because the stack sizes do not add up in the same dynamic call chain.
Thus a few strategic noinlines will fix it.
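
To put numbers on it, a minimal sketch of the two cases discussed
above. The function names and frame sizes are invented for
illustration, and real gcc frame layout has more going on (alignment,
spills, slot sharing in gcc 4), so treat this as arithmetic only.

```python
def depth_without_inlining(caller_frame, callee_frames):
    """Deepest stack use when the callees run one after another as real
    calls: only one callee frame is live at a time."""
    return caller_frame + max(callee_frames, default=0)


def depth_with_naive_inlining(caller_frame, callee_frames):
    """Stack use when every inlined callee gets its own slots in the
    caller's frame and nothing is shared."""
    return caller_frame + sum(callee_frames)


if __name__ == "__main__":
    frames = [512, 768]  # hypothetical foo() and bar() frame sizes in bytes
    print(depth_without_inlining(64, frames))
    print(depth_with_naive_inlining(64, frames))
```

With a fixed-size kernel stack the second figure is the one that hurts,
which is why marking the big callees noinline brings the caller back to
the first.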

-Andi

Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)

2007-12-29 Thread Islam Amer

Glad I could be of help.

Sure please go ahead, I will keep testing this patch with upcoming git
kernels, and report any problems.

So what I understand now is that AMD C1E state saves battery like
dynticks, so we don't need dynticks ?

On Sun, 2007-12-30 at 02:38 +0100, Rene Herman wrote:
> On 29-12-07 23:28, Islam Amer wrote:
> 
> > Thanks for the detailed response.
> > 
> > I thought I had gotten to the bottom of my problems when I found that
> > udev workaround, I guess I was naive.
> > 
> > I did the two tests you described and they predictably caused the hard
> > hangs. I needed to run the port80 program only once to get the hard
> > hang.
> > 
> > The output of the dmidecode commands were :
> > 
> > Quanta
> > 30B7
> > 
> > I applied the patch you provided ( luckily I am using 2.6.24-rc6-git3
> > kernel because I need the b43 driver ), added these values and compiled.
> > 
> > 
> > {
> > .callback   = dmi_io_delay_port_alt,
> > .ident  = "Compaq Presario v6000",
> > .matches= {
> > DMI_MATCH(DMI_BOARD_VENDOR, "Quanta"),
> > DMI_MATCH(DMI_BOARD_NAME, "30B7")
> > }
> > },
> > 
> > I was able to boot without the udev workaround and can now use hwclock
> > without hanging the system. In dmesg I can see this new line :
> 
> Thanks much for testing (and David -- thanks for asking). Updated patch 
> attached with this information. Compaq itself seems to spell the type with a 
> capital V so that's the only difference...
> 
> Can I add a "Tested-by: Islam Amer <[EMAIL PROTECTED]>" to this? (and David, 
> same for you with this address you're using in this thread?)
> 
> Rene.
> plain text document attachment (dmi-port80-minimal-bootparam.diff)
> commit 5f27525d3e796ae12e3186afe8ef0ec41af9e160
> Author: Rene Herman <[EMAIL PROTECTED]>
> Date:   Mon Dec 17 21:23:55 2007 +0100
> 
> x86: provide a DMI based port 0x80 I/O delay override.
> 
> Certain (HP/Compaq) laptops experience trouble from our port 0x80
> I/O delay writes. This patch provides for a DMI based switch to the
> "alternate diagnostic port" 0xed (as used by some BIOSes as well)
> for these.
> 
> David P. Reed confirmed that using port 0xed works and provides a
> proper delay on his HP Pavilion dv9000z, Islam Amer confirmed that
> it does so on a Compaq Presario V6000. Both are Quanta boards, type
> 30B9 and 30B7 respectively and are the (only) machines for which
> the DMI based switch triggers. HP Pavilion dv6000z is expected to
> also need this but its DMI info hasn't been verified yet.
> 
> The symptoms of _not_ working are a hanging machine, with "hwclock"
> use being a direct trigger and therefore the bootup often hanging
> already on these machines.
> 
> Earlier versions of this attempted to simply use udelay(2), with the
> 2 being a value tested to be a nicely conservative upper-bound with
> help from many on the linux-kernel mailing list, but that approach has
> two problems.
> 
> First, pre-loops_per_jiffy calibration (which is post PIT init while
> some implementations of the PIT are actually one of the historically
> problematic devices that need the delay) udelay() isn't particularly
> well-defined. We could initialise loops_per_jiffy conservatively (and
> based on CPU family so as to not unduly delay old machines) which
> would sort of work, but still leaves:
> 
> Second, delaying isn't the only effect that a write to port 0x80 has.
> It's also a PCI posting barrier which some devices may be explicitly
> or implicitly relying on. Alan Cox did a survey and found evidence
> that additionally various drivers are racy on SMP without the bus
> locking outb.
> 
> Switching to an inb() makes the timing too unpredictable and as such,
> this DMI based switch should be the safest approach for now. Any more
> invasive changes should get more rigid testing first. It's moreover
> only very few machines with the problem and a DMI based hack seems
> to fit that situation.
> 
> An early boot parameter to make the choice manually (and override any
> possible DMI based decision) is also provided:
> 
>   io_delay=standard|alternate
> 
> This does not change the io_delay() in the boot code which is using
> the same port 0x80 I/O delay but those do not appear to be a problem
> as tested by David P. Reed. He moreover reported that booting with
> "acpi=off" also fixed things and seeing as how ACPI isn't touched
> until after this DMI based I/O port switch leaving the ones in the
> boot code be is safe.
> 
> This patch is partly based on earlier patches from Pavel Machek and
> David P. Reed.
> 
> Signed-off-by: Rene Herman <[EMAIL PROTECTED]>
> 
> diff --git a/Documentation/kernel-parameters.txt 
>

Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)

2007-12-29 Thread Rene Herman


On 29-12-07 23:28, Islam Amer wrote:

> Thanks for the detailed response.
> 
> I thought I had gotten to the bottom of my problems when I found that
> udev workaround, I guess I was naive.
> 
> I did the two tests you described and they predictably caused the hard
> hangs. I needed to run the port80 program only once to get the hard
> hang.
> 
> The output of the dmidecode commands were :
> 
> Quanta
> 30B7
> 
> I applied the patch you provided ( luckily I am using 2.6.24-rc6-git3
> kernel because I need the b43 driver ), added these values and compiled.
> 
> {
> 	.callback	= dmi_io_delay_port_alt,
> 	.ident		= "Compaq Presario v6000",
> 	.matches	= {
> 		DMI_MATCH(DMI_BOARD_VENDOR, "Quanta"),
> 		DMI_MATCH(DMI_BOARD_NAME, "30B7")
> 	}
> },
> 
> I was able to boot without the udev workaround and can now use hwclock
> without hanging the system. In dmesg I can see this new line :

Thanks much for testing (and David -- thanks for asking). Updated patch 
attached with this information. Compaq itself seems to spell the type with a 
capital V so that's the only difference...


Can I add a "Tested-by: Islam Amer <[EMAIL PROTECTED]>" to this? (and David, 
same for you with this address you're using in this thread?)


Rene.
commit 5f27525d3e796ae12e3186afe8ef0ec41af9e160
Author: Rene Herman <[EMAIL PROTECTED]>
Date:   Mon Dec 17 21:23:55 2007 +0100

x86: provide a DMI based port 0x80 I/O delay override.

Certain (HP/Compaq) laptops experience trouble from our port 0x80
I/O delay writes. This patch provides for a DMI based switch to the
"alternate diagnostic port" 0xed (as used by some BIOSes as well)
for these.

David P. Reed confirmed that using port 0xed works and provides a
proper delay on his HP Pavilion dv9000z, Islam Amer confirmed that
it does so on a Compaq Presario V6000. Both are Quanta boards, type
30B9 and 30B7 respectively and are the (only) machines for which
the DMI based switch triggers. HP Pavilion dv6000z is expected to
also need this but its DMI info hasn't been verified yet.
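
A rough userspace sketch of the DMI based decision, for illustration only.
The struct and function names are hypothetical stand-ins for the kernel's
dmi_system_id table and callback; only the two board entries and the two
port numbers come from this thread:

```c
#include <stddef.h>
#include <string.h>

#define IO_DELAY_PORT_STD 0x80
#define IO_DELAY_PORT_ALT 0xed

/* Hypothetical stand-in for a dmi_system_id entry. */
struct board_quirk {
	const char *ident;
	const char *board_vendor;
	const char *board_name;
};

/* The two boards confirmed in this thread. */
static const struct board_quirk alt_port_boards[] = {
	{ "HP Pavilion dv9000z",   "Quanta", "30B9" },
	{ "Compaq Presario V6000", "Quanta", "30B7" },
};

/* Return the I/O delay port to use for the given baseboard strings. */
int io_delay_port_for(const char *vendor, const char *name)
{
	size_t i;
	size_t n = sizeof(alt_port_boards) / sizeof(alt_port_boards[0]);

	for (i = 0; i < n; i++) {
		const struct board_quirk *q = &alt_port_boards[i];

		/* Substring match on both fields, as DMI_MATCH does. */
		if (strstr(vendor, q->board_vendor) &&
		    strstr(name, q->board_name))
			return IO_DELAY_PORT_ALT;
	}
	return IO_DELAY_PORT_STD;
}
```

Any board not in the table keeps the standard port, so the quirk only
affects the machines that are known to hang.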

The symptoms of _not_ working are a hanging machine, with "hwclock"
use being a direct trigger and therefore the bootup often hanging
already on these machines.

Earlier versions of this attempted to simply use udelay(2), with the
2 being a value tested to be a nicely conservative upper-bound with
help from many on the linux-kernel mailing list, but that approach has
two problems.

First, pre-loops_per_jiffy calibration (which is post PIT init while
some implementations of the PIT are actually one of the historically
problematic devices that need the delay) udelay() isn't particularly
well-defined. We could initialise loops_per_jiffy conservatively (and
based on CPU family so as to not unduly delay old machines) which
would sort of work, but still leaves:

Second, delaying isn't the only effect that a write to port 0x80 has.
It's also a PCI posting barrier which some devices may be explicitly
or implicitly relying on. Alan Cox did a survey and found evidence
that additionally various drivers are racy on SMP without the bus
locking outb.

Switching to an inb() makes the timing too unpredictable and as such,
this DMI based switch should be the safest approach for now. Any more
invasive changes should get more rigid testing first. It's moreover
only very few machines with the problem and a DMI based hack seems
to fit that situation.

An early boot parameter to make the choice manually (and override any
possible DMI based decision) is also provided:

io_delay=standard|alternate
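
As a sketch of how the two values could map to ports (this is not the code
from the patch; parse_io_delay() is a hypothetical helper, and returning -1
for unknown input is an assumption made for the example):

```c
#include <string.h>

#define IO_DELAY_PORT_STD 0x80
#define IO_DELAY_PORT_ALT 0xed

/*
 * Map an io_delay= value to a port number.
 * Returns -1 for an unrecognised value so the caller can keep
 * whatever default (or DMI based choice) it already has.
 */
int parse_io_delay(const char *arg)
{
	if (arg == NULL)
		return -1;
	if (strcmp(arg, "standard") == 0)
		return IO_DELAY_PORT_STD;
	if (strcmp(arg, "alternate") == 0)
		return IO_DELAY_PORT_ALT;
	return -1;
}
```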

This does not change the io_delay() in the boot code which is using
the same port 0x80 I/O delay but those do not appear to be a problem
as tested by David P. Reed. He moreover reported that booting with
"acpi=off" also fixed things and seeing as how ACPI isn't touched
until after this DMI based I/O port switch leaving the ones in the
boot code be is safe.

This patch is partly based on earlier patches from Pavel Machek and
David P. Reed.

Signed-off-by: Rene Herman <[EMAIL PROTECTED]>

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 33121d6..6948e25 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -785,6 +785,12 @@ and is between 256 and 4096 characters. It is defined in the file
for translation below 32 bit and if not available
then look in the higher range.
 
+   io_delay=   [X86-32,X86-64] I/O delay port
+   standard
+   Use the 0x80 standard I/O delay port (default)
+   alternate
+   Use the 0xed alternate I/O delay port

Re: 2.6.24-rc6-mm1

2007-12-29 Thread Herbert Xu

On Sat, Dec 29, 2007 at 05:51:13PM +0100, Torsten Kaiser wrote:
>
> > > The cause, why I am resending this: I just got a crash with
> > > 2.6.24-rc6-mm1, again looking network related:
> > >
> > > [93436.933356] WARNING: at include/net/dst.h:165 dst_release()
> > > [93436.936685] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
> > > [93436.939292]
> > > [93436.939293] Call Trace:
> > > [93436.939304]  [] skb_release_all+0xdd/0x110
> > > [93436.939307]  [] __kfree_skb+0x11/0xa0
> > > [93436.939309]  [] kfree_skb+0x17/0x30
> > > [93436.939312]  [] unix_release_sock+0x128/0x250
> > > [93436.939315]  [] unix_release+0x21/0x30
> > > [93436.939318]  [] sock_release+0x24/0x90
> > > [93436.939320]  [] sock_close+0x26/0x50
> > > [93436.939324]  [] __fput+0xc1/0x230
> > > [93436.939327]  [] fput+0x16/0x20
> > > [93436.939329]  [] filp_close+0x56/0x90
> > > [93436.939331]  [] sys_close+0xa6/0x110
> > > [93436.939335]  [] system_call_after_swapgs+0x7b/0x80
> 
> From code inspection I would blame the patch "[SKBUFF]: Free old skb
> properly in skb_morph" from Herbert Xu. (CC added)

I doubt it.  skb_morph is only used on IP fragments so I don't see how
you could attribute an error from a Unix domain socket to this patch.

In any case, Unix socket packets should not have a dst at all so the
very fact that you're in that path means that you have some sort of
memory corruption.

Is this the very first OOPS/warning that you see? If not you should
ignore all but the very first one as that may have left your system
in an inconsistent state which may render all subsequent OOPSes and
warnings useless.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)

2007-12-29 Thread Islam Amer

Sorry, pressed send too soon. Here is the line in dmesg:

Compaq Presario v6000: using alternate I/O delay port


Thanks, please tell me if you need any more info.


Re: [PATCH] security: remove security_sb_post_mountroot hook

2007-12-29 Thread Casey Schaufler


--- "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

> The security_sb_post_mountroot() hook is long-since obsolete, and is
> fundamentally broken: it is never invoked if someone uses initramfs.
> This is particularly damaging, because the existence of this hook has
> been used as motivation for not using initramfs.
> 
> Stephen Smalley confirmed on 2007-07-19 that this hook was originally
> used by SELinux but can now be safely removed:
> 
>  http://marc.info/?l=linux-kernel&m=118485683612916&w=2
> 
> Cc: Stephen Smalley <[EMAIL PROTECTED]>
> Cc: James Morris <[EMAIL PROTECTED]>
> Cc: Eric Paris <[EMAIL PROTECTED]>
> Cc: Chris Wright <[EMAIL PROTECTED]>
> Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>

It is also the case that Smack does not use this hook.
It can be removed as far as I'm concerned.



Casey Schaufler
[EMAIL PROTECTED]

Re: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

2007-12-29 Thread Miguel Botón

On Sunday 30 December 2007 00:04:17 Matthew wrote:
> so I was wrong XD
>
> sorry,
>
> the error was found in the meantime:
>
> see: http://forums.gentoo.org/viewtopic-p-4667858.html#4667858
>
> Don't need to do more testing. The culprit is the unification of the
> x86 i387 code.
>
> The culprit is 57c3da2f5bb3fafedc31284117ae43bc593b65ab or
> f10c1cfd359660c01446807b6c2bc8ce3aee919a
>
> see http://forums.gentoo.org/viewtopic-p-4667906.html#4667906 and next post
>
> Greetings
>
> Mat

These hardlocks start to appear with commit 
f10c1cfd359660c01446807b6c2bc8ce3aee919a

-- 
Miguel Botón

Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)

2007-12-29 Thread Islam Amer

Thanks for the detailed response.

I thought I had gotten to the bottom of my problems when I found that
udev workaround, I guess I was naive.

I did the two tests you described and they predictably caused the hard
hangs. I needed to run the port80 program only once to get the hard
hang.

The output of the dmidecode commands were :

Quanta
30B7

I applied the patch you provided ( luckily I am using 2.6.24-rc6-git3
kernel because I need the b43 driver ), added these values and compiled.


{
	.callback	= dmi_io_delay_port_alt,
	.ident		= "Compaq Presario v6000",
	.matches	= {
		DMI_MATCH(DMI_BOARD_VENDOR, "Quanta"),
		DMI_MATCH(DMI_BOARD_NAME, "30B7")
	}
},

I was able to boot without the udev workaround and can now use hwclock
without hanging the system. In dmesg I can see this new line :



On Sat, 2007-12-29 at 09:43 -0500, David P. Reed wrote: 
> Islam Amer wrote:
> > Hello.
> > I was interested in getting dynticks to work on my compaq presario v6000
> > to help with the 1 hour thirty minutes battery time, but after this
> > discussion I lost interest.
> >
> > I too had the early boot time hang, and found it was udev triggering the
> > bug.
> >   
> This early boot time hang is *almost certainly* due to the in/out port 
> 80 bug, which I discovered a few weeks ago, which affects hwclock and 
> other I/O device drivers on a number of HP/Compaq machines in exactly 
> this way.  The proper fix for this bug is in dispute, and will probably 
> not occur in the 2.6.24 release because it touches code in many, many 
> drivers.  The simplest way to test if you have a problem of this sort is 
> to try this shell line as root, after you boot successfully.  If your 
> machine hangs hard,  you have a problem that really looks like the port 
> 80 problem.
> 
> for ((i = 0; i < 1000; i = i + 1)); do cat /dev/nvram > /dev/null; done
> 
> I have also attached a c program that only touches port 80.  Compile it 
> for 32-bit mode (see comment), run it as root, and after two or three 
> runs, it will hang a system that has the port 80 bug.
> 
> If you then run:
> 
> dmidecode -s baseboard-manufacturer
> dmidecode -s baseboard-product-name
> 
> the outputs are the values you should plug into the .matches field in the
> dmi_system_id struct in the attached patch. It would be great if you 
> could do that, test, and post back with those values so they can be 
> accumulated.  HP/Compaq machines with quanta m/b's are very popular, and 
> very common - so at least a quirk patch for all the broken models would 
> be worth doing in 2.6.25 or downstream in the distros.  The right 
> patches will probably take a long time - there is a dispute as to what 
> the semantics of port 80 writes even mean among the core kernel 
> developers, because the hack is lost in the dim dark days of history, 
> and safe resolution will take time.
> 
> There is also a C1E issue with the BIOS in my machine (an HP Pavilion 
> dv9000z).  I don't know if it is a bug, yet, but that's a different 
> problem - associated with dynticks, perhaps.  I have to say that 
> researching the AMD Kernel/BIOS docs on C1E (a very new feature in the 
> last year on AMD) leaves me puzzled as to whether the dynticks problem 
> exists on my machine at all, but the patch for it turns off dynticks!
> 
> 
> 
> > Changing the /etc/init.d/udev script so that the line containing
> >
> > /sbin/udevtrigger
> >
> > reads
> >
> > /sbin/udevtrigger --subsystem-nomatch="*misc*"
> >
> > seemed to fix things.
> >
> > the hang is triggered specifically by 
> >
> > echo add > /sys/class/misc/rtc/uevent
> > after inserting rtc.ko
> >
> > Also using hwclock to set the rtc , will cause a hard hang, if you are
> > using 64bit linux. Disable the init scripts that set the time, or use
> > the 32bit binary, as suggested here : 
> >
> > http://www.mail-archive.com/[EMAIL PROTECTED]/msg41964.html
> >
> > I hope this helps. But your hardware is slightly different though.
> >   
> 


[PATCH] security: remove security_sb_post_mountroot hook

2007-12-29 Thread H. Peter Anvin

The security_sb_post_mountroot() hook is long-since obsolete, and is
fundamentally broken: it is never invoked if someone uses initramfs.
This is particularly damaging, because the existence of this hook has
been used as motivation for not using initramfs.

Stephen Smalley confirmed on 2007-07-19 that this hook was originally
used by SELinux but can now be safely removed:

 http://marc.info/?l=linux-kernel&m=118485683612916&w=2

Cc: Stephen Smalley <[EMAIL PROTECTED]>
Cc: James Morris <[EMAIL PROTECTED]>
Cc: Eric Paris <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
---
 include/linux/security.h |    8 --------
 init/do_mounts.c         |    1 -
 security/dummy.c         |    6 ------
 security/security.c      |    5 -----
 4 files changed, 0 insertions(+), 20 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index ac05083..21185bc 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -243,9 +243,6 @@ struct request_sock;
  * @mnt contains the mounted file system.
  * @flags contains the new filesystem flags.
  * @data contains the filesystem-specific data.
- * @sb_post_mountroot:
- * Update the security module's state when the root filesystem is mounted.
- * This hook is only called if the mount was successful.
  * @sb_post_addmount:
  * Update the security module's state when a filesystem is mounted.
  * This hook is called any time a mount is successfully grafetd to
@@ -1235,7 +1232,6 @@ struct security_operations {
void (*sb_umount_busy) (struct vfsmount * mnt);
void (*sb_post_remount) (struct vfsmount * mnt,
 unsigned long flags, void *data);
-   void (*sb_post_mountroot) (void);
void (*sb_post_addmount) (struct vfsmount * mnt,
  struct nameidata * mountpoint_nd);
int (*sb_pivotroot) (struct nameidata * old_nd,
@@ -1495,7 +1491,6 @@ int security_sb_umount(struct vfsmount *mnt, int flags);
 void security_sb_umount_close(struct vfsmount *mnt);
 void security_sb_umount_busy(struct vfsmount *mnt);
 void security_sb_post_remount(struct vfsmount *mnt, unsigned long flags, void *data);
-void security_sb_post_mountroot(void);
 void security_sb_post_addmount(struct vfsmount *mnt, struct nameidata *mountpoint_nd);
 int security_sb_pivotroot(struct nameidata *old_nd, struct nameidata *new_nd);
 void security_sb_post_pivotroot(struct nameidata *old_nd, struct nameidata *new_nd);
@@ -1777,9 +1772,6 @@ static inline void security_sb_post_remount (struct vfsmount *mnt,
 unsigned long flags, void *data)
 { }
 
-static inline void security_sb_post_mountroot (void)
-{ }
-
 static inline void security_sb_post_addmount (struct vfsmount *mnt,
  struct nameidata *mountpoint_nd)
 { }
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 4efa1e5..31b2185 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -470,6 +470,5 @@ void __init prepare_namespace(void)
 out:
sys_mount(".", "/", NULL, MS_MOVE, NULL);
sys_chroot(".");
-   security_sb_post_mountroot();
 }
 
diff --git a/security/dummy.c b/security/dummy.c
index 3ccfbbe..1c5ab2b 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -225,11 +225,6 @@ static void dummy_sb_post_remount (struct vfsmount *mnt, unsigned long flags,
 }
 
 
-static void dummy_sb_post_mountroot (void)
-{
-   return;
-}
-
 static void dummy_sb_post_addmount (struct vfsmount *mnt, struct nameidata *nd)
 {
return;
@@ -994,7 +989,6 @@ void security_fixup_ops (struct security_operations *ops)
set_to_dummy_if_null(ops, sb_umount_close);
set_to_dummy_if_null(ops, sb_umount_busy);
set_to_dummy_if_null(ops, sb_post_remount);
-   set_to_dummy_if_null(ops, sb_post_mountroot);
set_to_dummy_if_null(ops, sb_post_addmount);
set_to_dummy_if_null(ops, sb_pivotroot);
set_to_dummy_if_null(ops, sb_post_pivotroot);
diff --git a/security/security.c b/security/security.c
index 0e1f1f1..fb6767b 100644
--- a/security/security.c
+++ b/security/security.c
@@ -288,11 +288,6 @@ void security_sb_post_remount(struct vfsmount *mnt, unsigned long flags, void *d
security_ops->sb_post_remount(mnt, flags, data);
 }
 
-void security_sb_post_mountroot(void)
-{
-   security_ops->sb_post_mountroot();
-}
-
 void security_sb_post_addmount(struct vfsmount *mnt, struct nameidata *mountpoint_nd)
 {
security_ops->sb_post_addmount(mnt, mountpoint_nd);
-- 
1.5.3.6
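
For readers unfamiliar with the hook table, here is a simplified userspace
sketch of the pattern the patch touches. All names prefixed demo_ are
hypothetical, and the hooks return int only so the example can be checked
(the real sb_post_* hooks return void):

```c
#include <stddef.h>

/* Hypothetical two-hook table; the real security_operations is far larger. */
struct demo_ops {
	int (*sb_post_remount)(void);
	int (*sb_post_addmount)(void);
};

/* Default hook: do nothing. */
static int dummy_hook(void)
{
	return 0;
}

/* Same idea as set_to_dummy_if_null(): fill only hooks left NULL. */
#define SET_TO_DUMMY_IF_NULL(ops, hook)			\
	do {						\
		if (!(ops)->hook)			\
			(ops)->hook = dummy_hook;	\
	} while (0)

void demo_fixup_ops(struct demo_ops *ops)
{
	SET_TO_DUMMY_IF_NULL(ops, sb_post_remount);
	SET_TO_DUMMY_IF_NULL(ops, sb_post_addmount);
}

/* A security module that implements only one of the hooks. */
static int module_post_addmount(void)
{
	return 42;
}

int demo_run(void)
{
	struct demo_ops ops = { .sb_post_addmount = module_post_addmount };

	demo_fixup_ops(&ops);
	/* The missing hook was replaced by the dummy, so both calls are safe. */
	return ops.sb_post_remount() + ops.sb_post_addmount();
}
```

Removing a hook therefore means deleting its table member, its dummy, its
fixup line and its wrapper, which is what the four-file diff above does.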


Re: TOMOYO Linux Security Goal

2007-12-29 Thread Pavel Machek

On Fri 2007-12-28 12:23:51, [EMAIL PROTECTED] wrote:
> On Fri, 28 Dec 2007 23:32:09 +0900, Tetsuo Handa said:
> 
> > You can run your system with only policy collected by learning mode.
> > Thus, you basically don't need manual intervention.
> > But since there are randomly named files (i.e. temporary files),
> > you pay a little time to modify policy.
> > 
> > The learning mode is to save time for permitting commonly accessed 
> > resources.
> > Administrator reviews policy collected by learning mode. Thus the 
> > readability
> > of policy is important so that administrator can understand what he/she is
> > going to allow or reject.
> 
> Please make a *big* notation someplace that "learning mode" is quite likely to
> *not* produce a totally correct policy.  In particular, it won't build rules 
> for
> infrequently used code paths (such as error handling) unless you find a way to
> exercise those paths while in learning mode.
> 
> Particularly fun - when learning mode doesn't create an entry for the logfile
> for I/O errors.  Then when one actually happens, you have no idea what it 
> was...

Yes... if you disallow access to /etc/nologin (or do something
similarly stupid) you can even introduce a security hole...
Pavel
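
A toy illustration of the concern, not TOMOYO's actual policy format or
API. The struct, the function names and the file paths are all made up for
the example:

```c
#include <stddef.h>
#include <string.h>

#define MAX_RULES 16

/* Hypothetical allow-list policy keyed on path names. */
struct policy {
	const char *allowed[MAX_RULES];
	size_t count;
};

/* Learning mode: remember every path that was actually accessed. */
int policy_learn(struct policy *p, const char *path)
{
	if (p->count >= MAX_RULES)
		return -1;
	p->allowed[p->count++] = path;
	return 0;
}

/* Enforcing mode: allow only what was seen while learning. */
int policy_allows(const struct policy *p, const char *path)
{
	size_t i;

	for (i = 0; i < p->count; i++)
		if (strcmp(p->allowed[i], path) == 0)
			return 1;
	return 0;
}

/* Returns 1 if the error log, never touched while learning, is denied. */
int error_path_is_denied(void)
{
	struct policy p = { .count = 0 };

	/* Only the normal code path ran during the learning phase. */
	policy_learn(&p, "/etc/app.conf");
	policy_learn(&p, "/var/log/app.log");

	/* Later an I/O error occurs and the error path wants its own log. */
	return !policy_allows(&p, "/var/log/app-errors.log");
}
```

The learned policy is only as complete as the code paths exercised while
learning, so rarely taken paths have to be reviewed by hand.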
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

2007-12-29 Thread Matthew

so I was wrong XD

sorry,

the error was found in the meantime:

see: http://forums.gentoo.org/viewtopic-p-4667858.html#4667858

Don't need to do more testing. The culprit is the unification of the
x86 i387 code.

The culprit is 57c3da2f5bb3fafedc31284117ae43bc593b65ab or
f10c1cfd359660c01446807b6c2bc8ce3aee919a

see http://forums.gentoo.org/viewtopic-p-4667906.html#4667906 and next post

Greetings

Mat

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-29 Thread Hemmann, Volker Armin

Hi,

you guys were right, I was wrong.

It is the hardware. 

I increased ram voltage by 0.15V on the 22nd and hadn't any oopses since then. 
And I did torture the system.

I am deeply sorry that I wasted your time (but still puzzled that the oopses 
started after kernel update - maybe I should buy a new psu... ).

So it is not reiser4  nor the kernel, just the ram needs a little more 'juice' 
than the board delivers on 'auto' settings.

Glück Auf
Volker

Re: [PATCH] Fix broken ip= parsing

2007-12-29 Thread Adrian McMenamin

On 29/12/2007, Thomas Bogendoerfer <[EMAIL PROTECTED]> wrote:
> Commit a6c05c3d064dbb83be88cba3189beb5db9d2dfc3 breaks ip= parsing
> completely, because ic_enable is never set. The patch below puts
> back the way ic_enable was set before.
>
> Signed-off-by: Thomas Bogendoerfer <[EMAIL PROTECTED]>
> ---
>
This patch certainly fixes the problem I was having with NFS root - I
had a working setup that stopped (though at the same time as I updated
my Busybox setup so I thought this was my mistake).

Please apply.

Tested-by: Adrian McMenamin <[EMAIL PROTECTED]>

Re: [RFC] sleepy linux

2007-12-29 Thread Pavel Machek

Hi!

> > ... I also don't need to call any suspend() routines, because all the
> > drivers are already suspended, right?
> 
> Well, you have a number of devices which cannot do runtime pm.
> They can do suspend/resume with the whole system. For them these
> operations mean saving/restoring state.
> So for these devices implementing autosuspend makes no sense.
> They would sensibly do only idle/busy detection.

Yep... Let's call busy/idle detection and save/restore state
"autosuspend" for those devices. It does not save any power, but it
can be viewed as "kind-of-suspend". (No, I do not have this kind of
details ready).

> > And yes, I want device activity to prevent s2ram. If user is burning
> > CD, machine should not sleep. If user is actively typing, machine
> 
> In these cases the devices involved should report themselves busy,
> shouldn't they?

Yes.

> > should not sleep. My vision is: screen saver tells kernel keyboard
> > need not be very responsive, at that point keyboard driver can
> > autosuspend the keyboard, and if that was the last device, whole
> > system sleeps.
> 
> We lack a notion of telling devices that they are opened only for
> detecting wakeups. Currently a driver has to assume that an opened
> device has to be fully functional.

Yes, we'll need to add some userland interfaces. No, this will not be
easy.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [RFC] sleepy linux

2007-12-29 Thread Pavel Machek

Hi!

> > > > Is there an easy way to tell if all the devices are runtime suspended?
> > > 
> > > Do you really want to know whether they are suspended or whether they
> > > could be suspended?
> > 
> > If they are suspended.
> > 
> > My plan is: let the drivers autosuspend on their own. If I see all of
> > them are autosuspended, then it looks like great time to put whole
> > system into s2ram...
> 
> Your calculation of cost/benefit will be wrong. A driver will have timeouts
> based on the cost of a suspend/resume cycle of that device only.
> You'd have to weigh the cost of keeping the whole system awake against that.

Hmm, right. Driver probably should have chance to autosuspend but tell
the core that whole system probably should not sleep... Hmm
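
A rough sketch of that idea, with entirely hypothetical names (there is no
such interface yet): the core sleeps only when every device is
autosuspended and no driver has asked for the system to stay awake.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state as seen by the PM core. */
struct demo_device {
	const char *name;
	bool autosuspended;	/* driver suspended the device on its own */
	bool inhibit_sleep;	/* driver asks the core to keep the system up */
};

/* The whole system may enter s2ram only if no device objects. */
bool system_may_sleep(const struct demo_device *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!devs[i].autosuspended || devs[i].inhibit_sleep)
			return false;
	}
	return true;
}

/* Usage: a CD writer that is still busy keeps the system awake. */
bool demo_burning_cd_blocks_sleep(void)
{
	const struct demo_device devs[] = {
		{ "keyboard",  true,  false },
		{ "cd-writer", false, false },
	};

	return !system_may_sleep(devs, sizeof(devs) / sizeof(devs[0]));
}

/* Usage: a device may be suspended itself and still veto system sleep. */
bool demo_veto_blocks_sleep(void)
{
	const struct demo_device devs[] = {
		{ "keyboard", true, false },
		{ "nic",      true, true  },
	};

	return !system_may_sleep(devs, sizeof(devs) / sizeof(devs[0]));
}
```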

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [RFC] sleepy linux

2007-12-29 Thread Pavel Machek

Hi!

> >NOHZ + C4 + turn off screen + turn off disk + turn off SATA is still
> >~8W on thinkpad x60.
> >
> >S3 is ~1W.
> >
> >That's quite significant difference.
> >
> >(But yes, connected-to-ethernet is not most important use scenario.)
> > Pavel
> 
> Still... if we could get the desktops of the world down anywhere close 
> to that range when not used, it would be a huge win.

Plus, it is probably mandatory if you want EnergyStar logo ...
Pavel
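
To put the two figures side by side, here is a small arithmetic sketch.
The ~8W and ~1W values are the ones quoted above; the 16 idle hours per
day are an assumption picked only for the example:

```c
/* Energy in milliwatt-hours for a constant draw over a number of hours. */
unsigned long energy_mwh(unsigned long power_mw, unsigned long hours)
{
	return power_mw * hours;
}

/* Saving from sitting in S3 instead of idling for the same period. */
unsigned long s3_saving_mwh(unsigned long idle_mw, unsigned long s3_mw,
			    unsigned long hours)
{
	return energy_mwh(idle_mw, hours) - energy_mwh(s3_mw, hours);
}
```

With 8000 mW idle, 1000 mW in S3 and 16 hours, s3_saving_mwh() gives
112000 mWh, i.e. 112 Wh per machine per day.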

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH] Hibernation: Document __save_processor_state() on x86-64

2007-12-29 Thread Pavel Machek

Hi!

> From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> Document the fact that __save_processor_state() has to save all CPU
> registers referred to by the kernel in case a different kernel is
> used to load and restore a hibernation image containing it. 


> Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> ---
>  arch/x86/kernel/suspend_64.c |   20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> Index: linux-2.6/arch/x86/kernel/suspend_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/suspend_64.c
> +++ linux-2.6/arch/x86/kernel/suspend_64.c
> @@ -19,6 +19,21 @@ extern const void __nosave_begin, __nosa
>  
>  struct saved_context saved_context;
>  
> +/**
> + *   __save_processor_state - save CPU registers before creating a
> + *   hibernation image and before restoring the memory state from it
> + *   @ctxt - structure to store the registers contents in
> + *
> + *   NOTE: If there is a CPU register the modification of which by the
> + *   boot kernel (ie. the kernel used for loading the hibernation image)
> + *   might affect the operations of the restored target kernel (ie. the one
> + *   saved in the hibernation image), then its contents must be saved by this
> + *   function.  In other words, if kernel A is hibernated and different
> + *   kernel B is used for loading the hibernation image into memory, the
> + *   kernel A's __save_processor_state() function must save all registers
> + *   needed by kernel A, so that it can operate correctly after the resume
> + *   regardless of what kernel B does in the meantime.
> + */

Maybe this warning should be appended to struct saved_context
definition? Reordering its fields (etc) would be bad news, too, and
documentation near data structures is easier to find...
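
A minimal sketch of what documentation placed next to the data structure
could look like.  The struct name, the field names and their order are
hypothetical and do not reproduce the real struct saved_context; the
compile-time offset checks are an added assumption, shown as one way to
make an accidental reordering fail the build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct saved_context (illustrative fields only).
 *
 * NOTE: every register whose modification by the boot kernel might affect
 * the restored target kernel must have a slot here and must be saved by
 * __save_processor_state().  Do not reorder or drop fields casually.
 */
struct saved_context_sketch {
	uint64_t cr0;
	uint64_t cr2;
	uint64_t cr3;
	uint64_t cr4;
	uint64_t efer;
};

/* Build-time guards: reordering the fields above breaks compilation. */
static_assert(offsetof(struct saved_context_sketch, cr0) == 0, "cr0 moved");
static_assert(offsetof(struct saved_context_sketch, cr2) == 8, "cr2 moved");
static_assert(offsetof(struct saved_context_sketch, cr3) == 16, "cr3 moved");
static_assert(offsetof(struct saved_context_sketch, cr4) == 24, "cr4 moved");
static_assert(offsetof(struct saved_context_sketch, efer) == 32, "efer moved");

/* Runtime mirror of the checks above; returns 1 when the layout is intact. */
int saved_context_sketch_layout_ok(void)
{
	return offsetof(struct saved_context_sketch, cr0) == 0 &&
	       offsetof(struct saved_context_sketch, efer) == 32 &&
	       sizeof(struct saved_context_sketch) == 5 * sizeof(uint64_t);
}
```

Kernel code would more likely use BUILD_BUG_ON() for such checks;
static_assert is used here only so the sketch compiles as ordinary C11.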

Thanks,
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: Top 9 kernel oopses/warnings for the week of December 29th, 2007

2007-12-29 Thread Geert Uytterhoeven

On Sat, 29 Dec 2007, Linus Torvalds wrote:
> On Sat, 29 Dec 2007, Arjan van de Ven wrote:
> > hmmm.. the copy in my Sent folder looks fine, as does the one in the lkml
> > archive:
> > http://lkml.org/lkml/2007/12/29/41
> > 
> > This is distinctly weird.
> 
> Ahh, it seems to be an alpine bug. Probably brought on by alpine trying to 
> highlight the web addresses.
> 
> It doesn't always happen - between Rank 3 and Rank 4, you have an empty 
> line, and alpine reacted correctly to that one, but the "empty" line 
> between Rank 1 and Rank 2 actually contained a single TAB, and that 
> apparently made alpine really confused.
> 
> So never mind. I'll make a bug-report on alpine, it wasn't a bug in your 
> email.

The bug is also present in the original branch, called pine 4.64 ;-)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [RFC] USB driver for talking to the Microchip PIC18 boot loader

2007-12-29 Thread Alan Stern

On Sat, 29 Dec 2007, mgross wrote:

> I'm playing around with a PIC based project at home (not an Intel
> activity) and found I needed a usb driver to talk to the boot loader
> so I can program my USB Bitwhacker with new custom firmware.  The
> following adds the pic18bl driver to the kernel.  It's pretty simple
> and is somewhat based on bits of a libusb driver that does some of
> what this driver does.
> 
> What do you think?

Not to detract from your driver, but would it be possible to do the 
whole thing in userspace using libusb?  Maybe by extending the driver 
you mentioned?

Alan Stern

