[PATCH 10/41] KVM: Remove minor wart from KVM_CREATE_VCPU ioctl

2007-04-01 Thread Avi Kivity
That ioctl does not transfer any data, so it should be an _IO rather than an
_IOW.
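For illustration, here is a small userspace sketch that decodes the old and new encodings with the _IOC_* helpers from the Linux UAPI <linux/ioctl.h> header. OLD_KVM_CREATE_VCPU and NEW_KVM_CREATE_VCPU are local names made up for this comparison. Only the direction and size fields differ. Because those fields are part of the command value, a binary built against the old definition sends a different ioctl number.

```c
#include <assert.h>
#include <linux/ioctl.h> /* _IO, _IOW and the _IOC_* decoding helpers */
#include <stdio.h>

#define KVMIO 0xAE
#define OLD_KVM_CREATE_VCPU _IOW(KVMIO, 11, int) /* before: claims an int is written */
#define NEW_KVM_CREATE_VCPU _IO(KVMIO, 11)       /* after: no data transfer */

/* _IO leaves the direction and size fields empty... */
static_assert(_IOC_DIR(NEW_KVM_CREATE_VCPU) == _IOC_NONE, "no direction");
static_assert(_IOC_SIZE(NEW_KVM_CREATE_VCPU) == 0, "no payload");
/* ...while _IOW records a userspace-to-kernel transfer of sizeof(int) bytes. */
static_assert(_IOC_DIR(OLD_KVM_CREATE_VCPU) == _IOC_WRITE, "write direction");
static_assert(_IOC_SIZE(OLD_KVM_CREATE_VCPU) == sizeof(int), "int payload");

/* Print the decoded fields of an ioctl command number. */
void print_ioctl(const char *name, unsigned int cmd)
{
    printf("%-20s cmd=0x%08x dir=%u size=%u type=0x%02x nr=%u\n", name, cmd,
           _IOC_DIR(cmd), _IOC_SIZE(cmd), _IOC_TYPE(cmd), _IOC_NR(cmd));
}
```

Calling print_ioctl() on both values shows the same type (0xae) and number (11), but different direction and size fields.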

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/linux/kvm.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c6dd4a7..d89189a 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -241,7 +241,7 @@ struct kvm_cpuid {
  * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
  * a vcpu fd.
  */
-#define KVM_CREATE_VCPU   _IOW(KVMIO, 11, int)
+#define KVM_CREATE_VCPU   _IO(KVMIO, 11)
 #define KVM_GET_DIRTY_LOG _IOW(KVMIO, 12, struct kvm_dirty_log)
 
 /*
-- 
1.5.0.5

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 15/41] KVM: Add a special exit reason when exiting due to an interrupt

2007-04-01 Thread Avi Kivity
This is redundant, as we also return -EINTR from the ioctl, but it
allows us to examine the exit_reason field on resume without seeing
old data.
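Here is a minimal sketch of how a userspace run loop might use this, assuming API version 9 semantics. next_action() and enum run_action are hypothetical names. KVM_EXIT_INTR is mirrored locally from this patch rather than taken from a system header. An interrupted KVM_RUN now also sets exit_reason, so the loop can dispatch on exit_reason alone once it has ruled out real errors.

```c
#include <errno.h>
#include <stdint.h>

#define KVM_EXIT_INTR 10 /* value added by this patch */

/* Hypothetical actions a VMM might take after KVM_RUN returns. */
enum run_action {
    RUN_HANDLE_EXIT,    /* dispatch on exit_reason (I/O, MMIO, halt, ...) */
    RUN_SERVICE_SIGNAL, /* handle the pending signal, then re-enter the guest */
    RUN_FATAL,          /* unexpected error from the ioctl */
};

/*
 * ret and err are the return value of ioctl(vcpu_fd, KVM_RUN, 0) and the
 * errno it left behind; exit_reason is read from the mmap()ed struct kvm_run.
 */
enum run_action next_action(int ret, int err, uint32_t exit_reason)
{
    if (ret < 0 && err != EINTR)
        return RUN_FATAL;

    /* An -EINTR return now leaves KVM_EXIT_INTR here, not the previous exit's reason. */
    switch (exit_reason) {
    case KVM_EXIT_INTR:
        return RUN_SERVICE_SIGNAL;
    default:
        return RUN_HANDLE_EXIT;
    }
}
```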

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c   |    2 ++
 drivers/kvm/vmx.c   |    2 ++
 include/linux/kvm.h |    3 ++-
 3 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index b09928f..0311665 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1619,12 +1619,14 @@ again:
if (signal_pending(current)) {
++kvm_stat.signal_exits;
post_kvm_run_save(vcpu, kvm_run);
+   kvm_run->exit_reason = KVM_EXIT_INTR;
return -EINTR;
}
 
if (dm_request_for_irq_injection(vcpu, kvm_run)) {
++kvm_stat.request_irq_exits;
post_kvm_run_save(vcpu, kvm_run);
+   kvm_run->exit_reason = KVM_EXIT_INTR;
return -EINTR;
}
kvm_resched(vcpu);
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index cf9568f..e69bab6 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1941,12 +1941,14 @@ again:
if (signal_pending(current)) {
++kvm_stat.signal_exits;
post_kvm_run_save(vcpu, kvm_run);
+   kvm_run->exit_reason = KVM_EXIT_INTR;
return -EINTR;
}
 
if (dm_request_for_irq_injection(vcpu, kvm_run)) {
++kvm_stat.request_irq_exits;
post_kvm_run_save(vcpu, kvm_run);
+   kvm_run->exit_reason = KVM_EXIT_INTR;
return -EINTR;
}
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 57f47ef..b3af92e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-#define KVM_API_VERSION 8
+#define KVM_API_VERSION 9
 
 /*
  * Architectural interrupt line count, and the size of the bitmap needed
@@ -45,6 +45,7 @@ enum kvm_exit_reason {
KVM_EXIT_IRQ_WINDOW_OPEN  = 7,
KVM_EXIT_SHUTDOWN = 8,
KVM_EXIT_FAIL_ENTRY   = 9,
+   KVM_EXIT_INTR = 10,
 };
 
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
-- 
1.5.0.5



[PATCH 04/41] KVM: Export <linux/kvm.h>

2007-04-01 Thread Avi Kivity
This allows users to actually build programs that use kvm without
the entire source tree.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/linux/Kbuild |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index e81e301..b35b593 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -99,6 +99,7 @@ header-y += iso_fs.h
 header-y += ixjuser.h
 header-y += jffs2.h
 header-y += keyctl.h
+header-y += kvm.h
 header-y += limits.h
 header-y += lock_dlm_plock.h
 header-y += magic.h
-- 
1.5.0.5



[PATCH 12/41] KVM: Add method to check for backwards-compatible API extensions

2007-04-01 Thread Avi Kivity
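Here is a sketch of how userspace might probe for an extension with the KVM_CHECK_EXTENSION ioctl added here. The request value is copied from this patch; later programs would take it from the exported <linux/kvm.h>. kvm_has_extension() is a hypothetical helper. As of this patch the kernel answers 0 for every extension. Kernels that predate the ioctl fall through to the default case and fail it with EINVAL, which the helper also treats as "not available".

```c
#include <linux/ioctl.h>
#include <sys/ioctl.h>

#define KVMIO               0xAE
#define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03) /* from this patch */

/*
 * kvm_fd is an open /dev/kvm descriptor; ext is the extension number.
 * Returns 1 if the extension is reported as available, 0 otherwise.
 * Any ioctl failure (EINVAL from an older kernel, EBADF, ...) counts as "no".
 */
int kvm_has_extension(int kvm_fd, int ext)
{
    return ioctl(kvm_fd, KVM_CHECK_EXTENSION, ext) > 0;
}
```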
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |    6 ++++++
 include/linux/kvm.h    |    5 +++++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 5d24203..39cf8fd 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -2416,6 +2416,12 @@ static long kvm_dev_ioctl(struct file *filp,
r = 0;
break;
}
+   case KVM_CHECK_EXTENSION:
+   /*
+* No extensions defined at present.
+*/
+   r = 0;
+   break;
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 93472da..c93cf53 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -232,6 +232,11 @@ struct kvm_cpuid {
 #define KVM_GET_API_VERSION   _IO(KVMIO,   0x00)
 #define KVM_CREATE_VM _IO(KVMIO,   0x01) /* returns a VM fd */
 #define KVM_GET_MSR_INDEX_LIST  _IOWR(KVMIO, 0x02, struct kvm_msr_list)
+/*
+ * Check if a kvm extension is available.  Argument is extension number,
+ * return is 1 (yes) or 0 (no, sorry).
+ */
+#define KVM_CHECK_EXTENSION   _IO(KVMIO,   0x03)
 
 /*
  * ioctls for VM fds
-- 
1.5.0.5



[PATCH 09/41] KVM: Remove the 'emulated' field from the userspace interface

2007-04-01 Thread Avi Kivity
We no longer emulate single instructions in userspace.  Instead, we service
mmio or pio requests.
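Removing the 4-byte emulated field is why padding1 shrinks from 7 to 3 bytes. The input section stays a multiple of 8 bytes, so the fields after it keep their alignment, but every later offset moves down by 8. The structs below are hypothetical stand-ins for just the input section. They exist only to check that arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Input section of struct kvm_run before this patch (local mirror). */
struct kvm_run_in_before {
    uint32_t emulated;       /* removed by this patch */
    uint32_t io_completed;
    uint8_t  request_interrupt_window;
    uint8_t  padding1[7];
};

/* Input section after this patch (local mirror). */
struct kvm_run_in_after {
    uint32_t io_completed;
    uint8_t  request_interrupt_window;
    uint8_t  padding1[3];    /* pads the section out to 8 bytes */
};

/* Both sections end on an 8-byte boundary; the new one is 8 bytes shorter. */
static_assert(sizeof(struct kvm_run_in_before) == 16, "old input section");
static_assert(sizeof(struct kvm_run_in_after) == 8, "new input section");
static_assert(offsetof(struct kvm_run_in_after, padding1) == 5, "padding follows the u8");
```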

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |    5 -----
 include/linux/kvm.h    |    3 +--
 2 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index caec54f..5d24203 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1588,11 +1588,6 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
/* re-sync apic's tpr */
vcpu->cr8 = kvm_run->cr8;
 
-   if (kvm_run->emulated) {
-   kvm_arch_ops->skip_emulated_instruction(vcpu);
-   kvm_run->emulated = 0;
-   }
-
if (kvm_run->io_completed) {
if (vcpu->pio_pending)
complete_pio(vcpu);
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 15e23bc..c6dd4a7 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -51,10 +51,9 @@ enum kvm_exit_reason {
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
/* in */
-   __u32 emulated;  /* skip current instruction */
__u32 io_completed; /* mmio/pio request completed */
__u8 request_interrupt_window;
-   __u8 padding1[7];
+   __u8 padding1[3];
 
/* out */
__u32 exit_type;
-- 
1.5.0.5



[PATCH 18/41] KVM: Allow kernel to select size of mmap() buffer

2007-04-01 Thread Avi Kivity
This allows us to store offsets in the kernel/user kvm_run area, and be
sure that userspace has them mapped.  As offsets can be outside the
kvm_run struct, userspace has no way of knowing how much to mmap.
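Here is a userspace sketch of the intended sequence: ask /dev/kvm for the size, then mmap() that many bytes of the vcpu fd at offset 0. The request value is copied from this patch and map_vcpu_run() is a hypothetical helper. It passes 0 as the argument because the kernel returns EINVAL for a nonzero one.

```c
#include <linux/ioctl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#define KVMIO                  0xAE
#define KVM_GET_VCPU_MMAP_SIZE _IO(KVMIO, 0x04) /* from this patch */

/*
 * kvm_fd is the /dev/kvm descriptor (the size query is a /dev/kvm ioctl);
 * vcpu_fd is the fd returned by KVM_CREATE_VCPU.  Returns NULL on failure,
 * leaving *len untouched; on success *len holds the length for munmap().
 */
void *map_vcpu_run(int kvm_fd, int vcpu_fd, size_t *len)
{
    int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
    if (size <= 0)
        return NULL;

    void *run = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     vcpu_fd, 0);
    if (run == MAP_FAILED)
        return NULL;

    *len = (size_t)size;
    return run;
}
```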

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |    8 +++++++-
 include/linux/kvm.h    |    4 ++++
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index df85f5f..cba0b87 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -2436,7 +2436,7 @@ static long kvm_dev_ioctl(struct file *filp,
  unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *)arg;
-   int r = -EINVAL;
+   long r = -EINVAL;
 
switch (ioctl) {
case KVM_GET_API_VERSION:
@@ -2478,6 +2478,12 @@ static long kvm_dev_ioctl(struct file *filp,
 */
r = 0;
break;
+   case KVM_GET_VCPU_MMAP_SIZE:
+   r = -EINVAL;
+   if (arg)
+   goto out;
+   r = PAGE_SIZE;
+   break;
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c0d10cd..dad9081 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -253,6 +253,10 @@ struct kvm_signal_mask {
  * return is 1 (yes) or 0 (no, sorry).
  */
 #define KVM_CHECK_EXTENSION   _IO(KVMIO,   0x03)
+/*
+ * Get size for mmap(vcpu_fd)
+ */
+#define KVM_GET_VCPU_MMAP_SIZE  _IO(KVMIO,   0x04) /* in bytes */
 
 /*
  * ioctls for VM fds
-- 
1.5.0.5



[PATCH 13/41] KVM: Allow userspace to process hypercalls which have no kernel handler

2007-04-01 Thread Avi Kivity
This is useful for paravirtualized graphics devices, for example.
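Here is a sketch of the userspace side. It assumes a VMM that has just seen exit_reason == KVM_EXIT_HYPERCALL. The VMM reads the guest's arguments from run->hypercall.args, stores a result in run->hypercall.ret, and calls KVM_RUN again. At that point the kernel copies ret into the guest's RAX. struct hypercall_exit is a local mirror of the union member this patch adds. The "add the first two arguments" hypercall is made up purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the hypercall member added to struct kvm_run. */
struct hypercall_exit {
    uint64_t args[6];  /* guest arguments a0..a5 */
    uint64_t ret;      /* set by userspace; loaded into guest RAX on the next KVM_RUN */
    uint32_t longmode; /* nonzero if the guest was in long mode */
    uint32_t pad;
};

static_assert(sizeof(struct hypercall_exit) == 64, "6*8 + 8 + 4 + 4 bytes");

/*
 * Hypothetical device model: every unhandled hypercall means "add args[0]
 * and args[1]".  A guest outside long mode only consumes the low 32 bits,
 * so this handler truncates the result in that case.
 */
void handle_hypercall(struct hypercall_exit *hc)
{
    uint64_t sum = hc->args[0] + hc->args[1];

    hc->ret = hc->longmode ? sum : (uint32_t)sum;
}
```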

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |   18 +++++++++++++++++-
 include/linux/kvm.h    |   10 +++++++++-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 39cf8fd..de93117 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1203,7 +1203,16 @@ int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run)
}
switch (nr) {
default:
-   ;
+   run->hypercall.args[0] = a0;
+   run->hypercall.args[1] = a1;
+   run->hypercall.args[2] = a2;
+   run->hypercall.args[3] = a3;
+   run->hypercall.args[4] = a4;
+   run->hypercall.args[5] = a5;
+   run->hypercall.ret = ret;
+   run->hypercall.longmode = is_long_mode(vcpu);
+   kvm_arch_ops->decache_regs(vcpu);
+   return 0;
}
vcpu->regs[VCPU_REGS_RAX] = ret;
kvm_arch_ops->decache_regs(vcpu);
@@ -1599,6 +1608,13 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
vcpu->mmio_needed = 0;
 
+   if (kvm_run->exit_type == KVM_EXIT_TYPE_VM_EXIT
+   && kvm_run->exit_reason == KVM_EXIT_HYPERCALL) {
+   kvm_arch_ops->cache_regs(vcpu);
+   vcpu->regs[VCPU_REGS_RAX] = kvm_run->hypercall.ret;
+   kvm_arch_ops->decache_regs(vcpu);
+   }
+
r = kvm_arch_ops->run(vcpu, kvm_run);
 
vcpu_put(vcpu);
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c93cf53..9151ebf 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-#define KVM_API_VERSION 6
+#define KVM_API_VERSION 7
 
 /*
  * Architectural interrupt line count, and the size of the bitmap needed
@@ -41,6 +41,7 @@ enum kvm_exit_reason {
KVM_EXIT_UNKNOWN  = 0,
KVM_EXIT_EXCEPTION= 1,
KVM_EXIT_IO   = 2,
+   KVM_EXIT_HYPERCALL= 3,
KVM_EXIT_DEBUG= 4,
KVM_EXIT_HLT  = 5,
KVM_EXIT_MMIO = 6,
@@ -103,6 +104,13 @@ struct kvm_run {
__u32 len;
__u8  is_write;
} mmio;
+   /* KVM_EXIT_HYPERCALL */
+   struct {
+   __u64 args[6];
+   __u64 ret;
+   __u32 longmode;
+   __u32 pad;
+   } hypercall;
};
 };
 
-- 
1.5.0.5



[PATCH 11/41] KVM: Renumber ioctls

2007-04-01 Thread Avi Kivity
The recent changes have left the ioctl numbers in complete disarray.
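The new values appear to group ioctls by the descriptor they are issued on. /dev/kvm ioctls start at 0x00, VM fd ioctls at 0x40, and vcpu fd ioctls at 0x80. This sketch assumes those blocks are meant to be 0x40 wide. That width is an inference from the numbering; this patch does not state it. Under that assumption, a hypothetical classifier could look like this:

```c
#include <linux/ioctl.h>

/* Which kind of descriptor an ioctl is meant for (hypothetical names). */
enum fd_kind { FD_SYSTEM, FD_VM, FD_VCPU };

/* Classify by the number field only; assumes 0x40-wide blocks. */
enum fd_kind ioctl_fd_kind(unsigned int cmd)
{
    unsigned int nr = _IOC_NR(cmd);

    if (nr < 0x40)
        return FD_SYSTEM; /* KVM_GET_API_VERSION, KVM_CREATE_VM, ... */
    if (nr < 0x80)
        return FD_VM;     /* KVM_SET_MEMORY_REGION, KVM_CREATE_VCPU, ... */
    return FD_VCPU;       /* KVM_RUN, KVM_GET_REGS, ... */
}
```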

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/linux/kvm.h |   34 +++++++++++++++++-----------------
 1 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d89189a..93472da 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -229,34 +229,34 @@ struct kvm_cpuid {
 /*
  * ioctls for /dev/kvm fds:
  */
-#define KVM_GET_API_VERSION   _IO(KVMIO, 1)
-#define KVM_CREATE_VM _IO(KVMIO, 2) /* returns a VM fd */
-#define KVM_GET_MSR_INDEX_LIST  _IOWR(KVMIO, 15, struct kvm_msr_list)
+#define KVM_GET_API_VERSION   _IO(KVMIO,   0x00)
+#define KVM_CREATE_VM _IO(KVMIO,   0x01) /* returns a VM fd */
+#define KVM_GET_MSR_INDEX_LIST  _IOWR(KVMIO, 0x02, struct kvm_msr_list)
 
 /*
  * ioctls for VM fds
  */
-#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 10, struct kvm_memory_region)
+#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region)
 /*
  * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
  * a vcpu fd.
  */
-#define KVM_CREATE_VCPU   _IO(KVMIO, 11)
-#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 12, struct kvm_dirty_log)
+#define KVM_CREATE_VCPU   _IO(KVMIO,  0x41)
+#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log)
 
 /*
  * ioctls for vcpu fds
  */
-#define KVM_RUN   _IO(KVMIO, 16)
-#define KVM_GET_REGS  _IOR(KVMIO, 3, struct kvm_regs)
-#define KVM_SET_REGS  _IOW(KVMIO, 4, struct kvm_regs)
-#define KVM_GET_SREGS _IOR(KVMIO, 5, struct kvm_sregs)
-#define KVM_SET_SREGS _IOW(KVMIO, 6, struct kvm_sregs)
-#define KVM_TRANSLATE _IOWR(KVMIO, 7, struct kvm_translation)
-#define KVM_INTERRUPT _IOW(KVMIO, 8, struct kvm_interrupt)
-#define KVM_DEBUG_GUEST   _IOW(KVMIO, 9, struct kvm_debug_guest)
-#define KVM_GET_MSRS  _IOWR(KVMIO, 13, struct kvm_msrs)
-#define KVM_SET_MSRS  _IOW(KVMIO, 14, struct kvm_msrs)
-#define KVM_SET_CPUID _IOW(KVMIO, 17, struct kvm_cpuid)
+#define KVM_RUN   _IO(KVMIO,   0x80)
+#define KVM_GET_REGS  _IOR(KVMIO,  0x81, struct kvm_regs)
+#define KVM_SET_REGS  _IOW(KVMIO,  0x82, struct kvm_regs)
+#define KVM_GET_SREGS _IOR(KVMIO,  0x83, struct kvm_sregs)
+#define KVM_SET_SREGS _IOW(KVMIO,  0x84, struct kvm_sregs)
+#define KVM_TRANSLATE _IOWR(KVMIO, 0x85, struct kvm_translation)
+#define KVM_INTERRUPT _IOW(KVMIO,  0x86, struct kvm_interrupt)
+#define KVM_DEBUG_GUEST   _IOW(KVMIO,  0x87, struct kvm_debug_guest)
+#define KVM_GET_MSRS  _IOWR(KVMIO, 0x88, struct kvm_msrs)
+#define KVM_SET_MSRS  _IOW(KVMIO,  0x89, struct kvm_msrs)
+#define KVM_SET_CPUID _IOW(KVMIO,  0x8a, struct kvm_cpuid)
 
 #endif
-- 
1.5.0.5



[PATCH 14/41] KVM: Fold kvm_run::exit_type into kvm_run::exit_reason

2007-04-01 Thread Avi Kivity
Currently, userspace is told about the nature of the last exit from the
guest using two fields, exit_type and exit_reason, where exit_type has
just two enumerations (and no need for more).  So fold exit_type into
exit_reason, reducing the complexity of determining what really happened.
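With a single field, exit handling in a VMM becomes one switch, where previously exit_type had to be tested first. Here is a sketch that assumes the values in this patch (KVM_API_VERSION 8). struct run_view is a hypothetical, trimmed mirror that holds only the fields used here. It is not the real layout of struct kvm_run. describe_exit() is illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* Exit reasons from this patch, mirrored locally. */
enum {
    EXIT_UNKNOWN    = 0, /* KVM_EXIT_UNKNOWN */
    EXIT_FAIL_ENTRY = 9, /* KVM_EXIT_FAIL_ENTRY, replaces KVM_EXIT_TYPE_FAIL_ENTRY */
};

/* Trimmed, hypothetical view of struct kvm_run (not the real layout). */
struct run_view {
    uint32_t exit_reason;
    union {
        struct { uint64_t hardware_exit_reason; } hw;
        struct { uint64_t hardware_entry_failure_reason; } fail_entry;
    };
};

/* Format a one-line description of the last exit into buf. */
int describe_exit(const struct run_view *run, char *buf, size_t n)
{
    switch (run->exit_reason) {
    case EXIT_FAIL_ENTRY: /* VMX instruction error or SVM exit code */
        return snprintf(buf, n, "entry failed, hw reason 0x%llx",
                        (unsigned long long)run->fail_entry.hardware_entry_failure_reason);
    case EXIT_UNKNOWN:
        return snprintf(buf, n, "unknown exit, hw reason 0x%llx",
                        (unsigned long long)run->hw.hardware_exit_reason);
    default:
        return snprintf(buf, n, "exit reason %u", (unsigned)run->exit_reason);
    }
}
```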

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |    3 +--
 drivers/kvm/svm.c      |    7 +++----
 drivers/kvm/vmx.c      |    7 +++----
 include/linux/kvm.h    |   15 ++++++++-------
 4 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index de93117..ac44df5 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1608,8 +1608,7 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
vcpu->mmio_needed = 0;
 
-   if (kvm_run->exit_type == KVM_EXIT_TYPE_VM_EXIT
-   && kvm_run->exit_reason == KVM_EXIT_HYPERCALL) {
+   if (kvm_run->exit_reason == KVM_EXIT_HYPERCALL) {
kvm_arch_ops->cache_regs(vcpu);
vcpu->regs[VCPU_REGS_RAX] = kvm_run->hypercall.ret;
kvm_arch_ops->decache_regs(vcpu);
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index d4b2936..b09928f 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1298,8 +1298,6 @@ static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
u32 exit_code = vcpu->svm->vmcb->control.exit_code;
 
-   kvm_run->exit_type = KVM_EXIT_TYPE_VM_EXIT;
-
if (is_external_interrupt(vcpu->svm->vmcb->control.exit_int_info) &&
exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR)
printk(KERN_ERR "%s: unexpected exit_ini_info 0x%x "
@@ -1609,8 +1607,9 @@ again:
vcpu->svm->next_rip = 0;
 
if (vcpu->svm->vmcb->control.exit_code == SVM_EXIT_ERR) {
-   kvm_run->exit_type = KVM_EXIT_TYPE_FAIL_ENTRY;
-   kvm_run->exit_reason = vcpu->svm->vmcb->control.exit_code;
+   kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
+   kvm_run->fail_entry.hardware_entry_failure_reason
+   = vcpu->svm->vmcb->control.exit_code;
post_kvm_run_save(vcpu, kvm_run);
return 0;
}
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 71410a6..cf9568f 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1922,10 +1922,10 @@ again:
 
asm ("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
 
-   kvm_run->exit_type = 0;
if (fail) {
-   kvm_run->exit_type = KVM_EXIT_TYPE_FAIL_ENTRY;
-   kvm_run->exit_reason = vmcs_read32(VM_INSTRUCTION_ERROR);
+   kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
+   kvm_run->fail_entry.hardware_entry_failure_reason
+   = vmcs_read32(VM_INSTRUCTION_ERROR);
r = 0;
} else {
/*
@@ -1935,7 +1935,6 @@ again:
profile_hit(KVM_PROFILING, (void *)vmcs_readl(GUEST_RIP));
 
vcpu->launched = 1;
-   kvm_run->exit_type = KVM_EXIT_TYPE_VM_EXIT;
r = kvm_handle_exit(kvm_run, vcpu);
if (r > 0) {
/* Give scheduler a change to reschedule. */
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 9151ebf..57f47ef 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-#define KVM_API_VERSION 7
+#define KVM_API_VERSION 8
 
 /*
  * Architectural interrupt line count, and the size of the bitmap needed
@@ -34,9 +34,6 @@ struct kvm_memory_region {
 #define KVM_MEM_LOG_DIRTY_PAGES  1UL
 
 
-#define KVM_EXIT_TYPE_FAIL_ENTRY 1
-#define KVM_EXIT_TYPE_VM_EXIT    2
-
 enum kvm_exit_reason {
KVM_EXIT_UNKNOWN  = 0,
KVM_EXIT_EXCEPTION= 1,
@@ -47,6 +44,7 @@ enum kvm_exit_reason {
KVM_EXIT_MMIO = 6,
KVM_EXIT_IRQ_WINDOW_OPEN  = 7,
KVM_EXIT_SHUTDOWN = 8,
+   KVM_EXIT_FAIL_ENTRY   = 9,
 };
 
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
@@ -57,12 +55,11 @@ struct kvm_run {
__u8 padding1[3];
 
/* out */
-   __u32 exit_type;
__u32 exit_reason;
__u32 instruction_length;
__u8 ready_for_interrupt_injection;
__u8 if_flag;
-   __u16 padding2;
+   __u8 padding2[6];
 
/* in (pre_kvm_run), out (post_kvm_run) */
__u64 cr8;
@@ -71,8 +68,12 @@ struct kvm_run {
union {
/* KVM_EXIT_UNKNOWN */
struct {
-   __u32 hardware_exit_reason;
+   __u64 hardware_exit_reason;
} hw;
+   /* KVM_EXIT_FAIL_ENTRY */
+   struct {
+   __u64 hardware_entry_failure_reason;
+   } fail_entry;
/* KVM_EXIT_EXCEPTION */
struct {
__u32 exception;
-- 
1.5.0.5


[PATCH 17/41] KVM: Add guest mode signal mask

2007-04-01 Thread Avi Kivity
Allow a special signal mask to be used while executing in guest mode.  This
allows signals to be used to interrupt a vcpu without requiring signal
delivery to a userspace handler, which is quite expensive.  Userspace still
receives -EINTR and can get the signal via sigwait().

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h      |    3 +++
 drivers/kvm/kvm_main.c |   41 +++++++++++++++++++++++++++++++++++++++++
 include/linux/kvm.h    |    7 +++++++
 3 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index be3a0e7..1c4a581 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -277,6 +277,9 @@ struct kvm_vcpu {
gpa_t mmio_phys_addr;
int pio_pending;
 
+   int sigset_active;
+   sigset_t sigset;
+
struct {
int active;
u8 save_iopl;
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index ac44df5..df85f5f 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1591,9 +1591,13 @@ static void complete_pio(struct kvm_vcpu *vcpu)
 static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
int r;
+   sigset_t sigsaved;
 
vcpu_load(vcpu);
 
+   if (vcpu->sigset_active)
+   sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
+
/* re-sync apic's tpr */
vcpu->cr8 = kvm_run->cr8;
 
@@ -1616,6 +1620,9 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
r = kvm_arch_ops->run(vcpu, kvm_run);
 
+   if (vcpu->sigset_active)
+   sigprocmask(SIG_SETMASK, &sigsaved, NULL);
+
vcpu_put(vcpu);
return r;
 }
@@ -2142,6 +2149,17 @@ out:
return r;
 }
 
+static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
+{
+   if (sigset) {
+   sigdelsetmask(sigset, sigmask(SIGKILL)|sigmask(SIGSTOP));
+   vcpu->sigset_active = 1;
+   vcpu->sigset = *sigset;
+   } else
+   vcpu->sigset_active = 0;
+   return 0;
+}
+
 static long kvm_vcpu_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -2260,6 +2278,29 @@ static long kvm_vcpu_ioctl(struct file *filp,
goto out;
break;
}
+   case KVM_SET_SIGNAL_MASK: {
+   struct kvm_signal_mask __user *sigmask_arg = argp;
+   struct kvm_signal_mask kvm_sigmask;
+   sigset_t sigset, *p;
+
+   p = NULL;
+   if (argp) {
+   r = -EFAULT;
+   if (copy_from_user(&kvm_sigmask, argp,
+  sizeof kvm_sigmask))
+   goto out;
+   r = -EINVAL;
+   if (kvm_sigmask.len != sizeof sigset)
+   goto out;
+   r = -EFAULT;
+   if (copy_from_user(&sigset, sigmask_arg->sigset,
+  sizeof sigset))
+   goto out;
+   p = &sigset;
+   }
+   r = kvm_vcpu_ioctl_set_sigmask(vcpu, p);
+   break;
+   }
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index b3af92e..c0d10cd 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -234,6 +234,12 @@ struct kvm_cpuid {
struct kvm_cpuid_entry entries[0];
 };
 
+/* for KVM_SET_SIGNAL_MASK */
+struct kvm_signal_mask {
+   __u32 len;
+   __u8  sigset[0];
+};
+
 #define KVMIO 0xAE
 
 /*
@@ -273,5 +279,6 @@ struct kvm_cpuid {
 #define KVM_GET_MSRS  _IOWR(KVMIO, 0x88, struct kvm_msrs)
 #define KVM_SET_MSRS  _IOW(KVMIO,  0x89, struct kvm_msrs)
 #define KVM_SET_CPUID _IOW(KVMIO,  0x8a, struct kvm_cpuid)
+#define KVM_SET_SIGNAL_MASK   _IOW(KVMIO,  0x8b, struct kvm_signal_mask)
 
 #endif
-- 
1.5.0.5



[PATCH 16/41] KVM: Initialize the apic_base msr on svm too

2007-04-01 Thread Avi Kivity
Older userspace didn't care, but newer userspace (with the cpuid changes)
does.
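For reference, the value the patch stores decodes as follows. This assumes the architectural IA32_APIC_BASE layout: BSP flag in bit 8, global enable in bit 11, APIC page address from bit 12 up. As written, the BSP bit is set for every vcpu created, although the code's own comment marks it as meant for vcpu 0. The macro and function names are local to this sketch.

```c
#include <stdint.h>

#define APICBASE_BSP    (1ULL << 8)   /* bootstrap processor flag */
#define APICBASE_ENABLE (1ULL << 11)  /* APIC global enable */
#define APICBASE_ADDR   0xfffff000ULL /* APIC page address (bits 12-31 only) */

/* The value assigned to vcpu->apic_base by this patch. */
#define DEFAULT_APIC_BASE (0xfee00000ULL | APICBASE_BSP | APICBASE_ENABLE)

uint64_t apic_mmio_base(uint64_t msr) { return msr & APICBASE_ADDR; }
int apic_is_bsp(uint64_t msr) { return (msr & APICBASE_BSP) != 0; }
int apic_is_enabled(uint64_t msr) { return (msr & APICBASE_ENABLE) != 0; }
```

DEFAULT_APIC_BASE evaluates to 0xfee00900: the standard APIC page at 0xfee00000 with the BSP and enable bits set.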

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 0311665..2396ada 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -582,6 +582,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
init_vmcb(vcpu->svm->vmcb);
 
fx_init(vcpu);
+   vcpu->apic_base = 0xfee00000 |
+   /*for vcpu 0*/ MSR_IA32_APICBASE_BSP |
+   MSR_IA32_APICBASE_ENABLE;
 
return 0;
 
-- 
1.5.0.5



Re: plain 2.6.21-rc5 (1) vs amanda (0)

2007-04-01 Thread Gene Heskett
On Sunday 01 April 2007, Ingo Molnar wrote:
>* Gene Heskett <[EMAIL PROTECTED]> wrote:
>> Hi Ingo;
>>
>> Running 2.6.21-rc5 tonight.
>>
>> It appears that as of 2.6.21-rc5, (actually anything with a 2.6.21 in
>> its version string) amanda is still a loser. [...]
>
>here 'is a loser' means: "tries to back up way too much stuff instead of
>doing a nice incremental backup like it does on 2.6.20.4", correct?

Yes.  As an additional clue, 2.6.20.4-rc1 does this but the later 2.6.20.4
does not, so we have narrowed it down to one or more of the 31 patches in
that gap.

>since it appears to be caused by a kernel change, this is a serious
>regression in v2.6.21-to-be.

Yes, that's why I'm fussing now, hopefully before this gets into the field 
as a final version.

>> Good, 2.6.20.4 was running
>> sendsize.20070331000507.debug:sendsize[762]: time 248.361: getting size via gnutar for /usr/music level 0
>> sendsize.20070331000507.debug:sendsize[762]: estimate time for /usr/music level 0: 1.239
>> sendsize.20070331000507.debug:sendsize[762]: estimate size for /usr/music level 0: 2466050 KB
>> sendsize.20070331000507.debug:sendsize[762]: time 249.605: getting size via gnutar for /usr/music level 1
>> sendsize.20070331000507.debug:sendsize[762]: estimate time for /usr/music level 1: 0.027
>> sendsize.20070331000507.debug:sendsize[762]: estimate size for /usr/music level 1: 80 KB
>>
>> Bad, 2.6.21-rc5 is running
>> sendsize.20070401000504.debug:sendsize[18465]: time 167.371: getting size via gnutar for /usr/music level 0
>> sendsize.20070401000504.debug:sendsize[18465]: estimate time for /usr/music level 0: 0.398
>> sendsize.20070401000504.debug:sendsize[18465]: estimate size for /usr/music level 0: 2466050 KB
>> sendsize.20070401000504.debug:sendsize[18465]: time 167.773: getting size via gnutar for /usr/music level 1
>> sendsize.20070401000504.debug:sendsize[18465]: estimate time for /usr/music level 1: 0.049
>> sendsize.20070401000504.debug:sendsize[18465]: estimate size for /usr/music level 1: 2448290 KB
>>
>> Yesterday's run, dated 20070331000507, was correct, as that directory
>> hasn't been write-accessed in a couple of months.
>>
>> Today's, dated 20070401000504, shows totally bogus figures for exactly
>> the same data.
>
>'totally bogus figures' needs to be analyzed further. What system call
>or library call returns incorrect data?

Tar is run with its output sent to /dev/null to obtain the figures since
the last $level; these are then assigned to $level + 1.  Each disklist
entry gets scanned twice during the estimate phase, once for a level 0
reference and once for a "since $timestamp" value.  I'm not sure whether
the timestamp in the /etc/amandates file is used, or the timestamp on the
index file that amandates names.  Amanda then decides what to do next,
trying to balance tape usage from run to run and, if it can help it,
never letting a needed level 0 fall more than one run behind.  With the
current order I have set up in my amanda.conf, it will drop incrementals
to meet that goal if it has to.

>> This effect I have isolated down to something in the 31 patches from
>> 2.6.20.4 to 2.6.20.5-rc1, but I'm going to need additional guidance in
>> setting up the bisect to find it, if indeed it's a kernel problem.

And I misquoted the above: it's the 31 patches between 2.6.20.3 and
2.6.20.4-rc1.  Then, for some reason I don't grok, 2.6.20.4 is good again.

>> This same effect has been present in any and every 2.6.21.* release.
>
>maybe it's some sort of timestamp problem, on the FS level?
>
>   Ingo

I wasn't aware of a 'timestamp' problem on the FS level ever being
discussed at any length here on lkml.  As for my own checking, I'm limited
to what "ls -lc?" tells me, and those figures seem sane.

Is there additional data present now that could screw up tar, and do it 
without being obvious to ls?  I don't know. :-\

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
We have only two things to worry about:  That things will never get
back to normal, and that they already have.


[PATCH 24/41] KVM: Remove set_cr0_no_modeswitch() arch op

2007-04-01 Thread Avi Kivity
set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers.
As we now cache the protected mode values on entry to real mode, this
isn't an issue anymore, and it interferes with reboot (which usually _is_
a modeswitch).

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h      |    2 --
 drivers/kvm/kvm_main.c |    2 +-
 drivers/kvm/svm.c      |    1 -
 drivers/kvm/vmx.c      |   17 -----------------
 4 files changed, 1 insertions(+), 21 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index a4331da..7361c45 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -383,8 +383,6 @@ struct kvm_arch_ops {
void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
void (*decache_cr0_cr4_guest_bits)(struct kvm_vcpu *vcpu);
void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
-   void (*set_cr0_no_modeswitch)(struct kvm_vcpu *vcpu,
- unsigned long cr0);
void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
void (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 0c30e1b..5ada7aa 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1927,7 +1927,7 @@ static int kvm_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
 
mmu_reset_needed |= vcpu->cr0 != sregs->cr0;
-   kvm_arch_ops->set_cr0_no_modeswitch(vcpu, sregs->cr0);
+   kvm_arch_ops->set_cr0(vcpu, sregs->cr0);
 
mmu_reset_needed |= vcpu->cr4 != sregs->cr4;
kvm_arch_ops->set_cr4(vcpu, sregs->cr4);
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 64afc5c..d3cc115 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1716,7 +1716,6 @@ static struct kvm_arch_ops svm_arch_ops = {
.get_cs_db_l_bits = svm_get_cs_db_l_bits,
.decache_cr0_cr4_guest_bits = svm_decache_cr0_cr4_guest_bits,
.set_cr0 = svm_set_cr0,
-   .set_cr0_no_modeswitch = svm_set_cr0,
.set_cr3 = svm_set_cr3,
.set_cr4 = svm_set_cr4,
.set_efer = svm_set_efer,
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index aa7e2ba..027a962 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -788,22 +788,6 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
vcpu->cr0 = cr0;
 }
 
-/*
- * Used when restoring the VM to avoid corrupting segment registers
- */
-static void vmx_set_cr0_no_modeswitch(struct kvm_vcpu *vcpu, unsigned long cr0)
-{
-   if (!vcpu->rmode.active && !(cr0 & CR0_PE_MASK))
-   enter_rmode(vcpu);
-
-   vcpu->rmode.active = ((cr0 & CR0_PE_MASK) == 0);
-   update_exception_bitmap(vcpu);
-   vmcs_writel(CR0_READ_SHADOW, cr0);
-   vmcs_writel(GUEST_CR0,
-   (cr0 & ~KVM_GUEST_CR0_MASK) | KVM_VM_CR0_ALWAYS_ON);
-   vcpu->cr0 = cr0;
-}
-
 static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
vmcs_writel(GUEST_CR3, cr3);
@@ -2069,7 +2053,6 @@ static struct kvm_arch_ops vmx_arch_ops = {
.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
.decache_cr0_cr4_guest_bits = vmx_decache_cr0_cr4_guest_bits,
.set_cr0 = vmx_set_cr0,
-   .set_cr0_no_modeswitch = vmx_set_cr0_no_modeswitch,
.set_cr3 = vmx_set_cr3,
.set_cr4 = vmx_set_cr4,
 #ifdef CONFIG_X86_64
-- 
1.5.0.5



[PATCH 19/41] KVM: Future-proof argument-less ioctls

2007-04-01 Thread Avi Kivity
Some ioctls ignore their arguments.  By requiring them to be zero now,
we allow a nonzero value to have some special meaning in the future.
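On the userspace side this means passing an explicit 0 for these ioctls. ioctl() is variadic, so a call such as ioctl(fd, KVM_RUN) with no third argument hands the kernel an unspecified value. After this patch, that value may be rejected with EINVAL. Here is a minimal sketch. The request value is the one from the renumbering patch, and kvm_run_vcpu() is a hypothetical wrapper.

```c
#include <linux/ioctl.h>
#include <sys/ioctl.h>

#define KVMIO   0xAE
#define KVM_RUN _IO(KVMIO, 0x80) /* value from the renumbering patch */

/*
 * Enter the guest on vcpu_fd.  The explicit 0 keeps the argument
 * well-defined, so the kernel's "argument must be zero" check passes.
 */
int kvm_run_vcpu(int vcpu_fd)
{
    return ioctl(vcpu_fd, KVM_RUN, 0);
}
```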

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index cba0b87..ba7f43a 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -2169,6 +2169,9 @@ static long kvm_vcpu_ioctl(struct file *filp,
 
switch (ioctl) {
case KVM_RUN:
+   r = -EINVAL;
+   if (arg)
+   goto out;
r = kvm_vcpu_ioctl_run(vcpu, vcpu->run);
break;
case KVM_GET_REGS: {
@@ -2440,9 +2443,15 @@ static long kvm_dev_ioctl(struct file *filp,
 
switch (ioctl) {
case KVM_GET_API_VERSION:
+   r = -EINVAL;
+   if (arg)
+   goto out;
r = KVM_API_VERSION;
break;
case KVM_CREATE_VM:
+   r = -EINVAL;
+   if (arg)
+   goto out;
r = kvm_dev_ioctl_create_vm();
break;
case KVM_GET_MSR_INDEX_LIST: {
-- 
1.5.0.5



[PATCH 27/41] KVM: Don't allow the guest to turn off the cpu cache

2007-04-01 Thread Avi Kivity
The cpu cache is a host resource; the guest should not be able to turn
it off (even for itself).
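The masking done in `svm_set_cr0()` can be pulled out into a standalone helper to see what the hardware actually runs with. The helper name `shadow_cr0()` is hypothetical; the masks are the architectural CR0 bit positions. The unmasked guest value is still what gets stored in `vcpu->cr0`.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define CR0_WP_MASK (1ULL << 16)
#define CR0_NW_MASK (1ULL << 29)
#define CR0_CD_MASK (1ULL << 30)
#define CR0_PG_MASK (1ULL << 31)

/*
 * Value loaded into the VMCB for a given guest-requested CR0: paging and
 * write-protect are forced on (shadow paging relies on them), while
 * cache-disable and not-write-through are forced off so the guest cannot
 * turn off the host's cache.
 */
static uint64_t shadow_cr0(uint64_t guest_cr0)
{
	uint64_t cr0 = guest_cr0;

	cr0 |= CR0_PG_MASK | CR0_WP_MASK;
	cr0 &= ~(CR0_CD_MASK | CR0_NW_MASK);
	return cr0;
}
```

For example, a guest asking for CD|NW|ET|PE (0x60000011) ends up running with PG|WP|ET|PE (0x80010011).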

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index d3cc115..191bc45 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -737,8 +737,10 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
}
 #endif
vcpu->svm->cr0 = cr0;
-   vcpu->svm->vmcb->save.cr0 = cr0 | CR0_PG_MASK | CR0_WP_MASK;
vcpu->cr0 = cr0;
+   cr0 |= CR0_PG_MASK | CR0_WP_MASK;
+   cr0 &= ~(CR0_CD_MASK | CR0_NW_MASK);
+   vcpu->svm->vmcb->save.cr0 = cr0;
 }
 
 static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
-- 
1.5.0.5



[PATCH 25/41] KVM: Modify guest segments after potentially switching modes

2007-04-01 Thread Avi Kivity
The SET_SREGS ioctl modifies both cr0.pe (real mode/protected mode) and
guest segment registers.  Since on Intel processors segment handling
depends on the mode, update the segment registers after the mode switch
has taken place.
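Why the order matters can be shown with a deliberately tiny model. Everything here is hypothetical (`struct toy_vcpu`, `toy_set_cs()`, and the rule that real mode forces base = selector << 4 stand in for the much richer vmx logic): loading a segment while the stale mode is still in effect produces the wrong base.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified vcpu state. */
struct toy_vcpu {
	int pe;            /* cr0.pe: 0 = real mode, 1 = protected mode */
	uint32_t cs_base;
	uint16_t cs_sel;
};

/* Mode-sensitive segment load: in real mode (vm86 on vmx) the base must
 * be selector << 4, so the requested base is overridden. */
static void toy_set_cs(struct toy_vcpu *v, uint16_t sel, uint32_t base)
{
	v->cs_sel = sel;
	v->cs_base = v->pe ? base : (uint32_t)sel << 4;
}

static void toy_set_sregs(struct toy_vcpu *v, int pe, uint16_t sel,
			  uint32_t base, int segments_first)
{
	if (segments_first) {
		toy_set_cs(v, sel, base); /* old order: sees the stale mode */
		v->pe = pe;
	} else {
		v->pe = pe;               /* new order: switch mode first */
		toy_set_cs(v, sel, base);
	}
}
```

Starting in real mode and requesting protected mode with cs = 0x10 and base 0x100000, the old order yields base 0x100, the new order the requested 0x100000.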

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |   20 ++--
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 5ada7aa..12c3388 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1895,16 +1895,6 @@ static int kvm_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 
vcpu_load(vcpu);
 
-   set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
-   set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
-   set_segment(vcpu, &sregs->es, VCPU_SREG_ES);
-   set_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
-   set_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
-   set_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
-
-   set_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
-   set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
-
dt.limit = sregs->idt.limit;
dt.base = sregs->idt.base;
kvm_arch_ops->set_idt(vcpu, &dt);
@@ -1944,6 +1934,16 @@ static int kvm_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
if (vcpu->irq_pending[i])
__set_bit(i, &vcpu->irq_summary);
 
+   set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
+   set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
+   set_segment(vcpu, &sregs->es, VCPU_SREG_ES);
+   set_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
+   set_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
+   set_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
+
+   set_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
+   set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
+
vcpu_put(vcpu);
 
return 0;
-- 
1.5.0.5



[PATCH 23/41] KVM: Workaround vmx inability to virtualize the reset state

2007-04-01 Thread Avi Kivity
The reset state has cs.selector == 0xf000 and cs.base == 0xffff0000,
which is incompatible with vm86 mode (used for real mode virtualization),
since vm86 requires cs.base == cs.selector << 4.

When we create a vcpu, we set cs.base to 0xf0000, but if we get there by
way of a reset, the values are inconsistent and vmx refuses to enter
guest mode.

Work around this by detecting the state and munging it appropriately.
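The core of the workaround, pulled out of `enter_rmode()` into a standalone function, looks roughly like this. `struct toy_cs` and `toy_fixup_rmode_cs()` are hypothetical; the real code reads and writes VMCS fields with `vmcs_readl()`/`vmcs_writel()`. The base is forced into vm86 shape first, then the selector is derived from it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the guest CS state held in the VMCS. */
struct toy_cs {
	uint64_t base;
	uint16_t selector;
	uint32_t limit;
	uint32_t ar;
};

static void toy_fixup_rmode_cs(struct toy_cs *cs)
{
	cs->ar = 0xf3;       /* vm86-compatible access rights */
	cs->limit = 0xffff;  /* 64K real-mode segment */
	/* The architectural reset base is not selector << 4; move it to the
	 * base that selector 0xf000 implies. */
	if (cs->base == 0xffff0000)
		cs->base = 0xf0000;
	cs->selector = (uint16_t)(cs->base >> 4);
}
```

After the fixup, base == selector << 4 holds for the reset state as well as for any real-mode base that was already consistent.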

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/vmx.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 0d9bf0b..aa7e2ba 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -712,6 +712,8 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
 
vmcs_write32(GUEST_CS_AR_BYTES, 0xf3);
vmcs_write32(GUEST_CS_LIMIT, 0xffff);
+   if (vmcs_readl(GUEST_CS_BASE) == 0xffff0000)
+   vmcs_writel(GUEST_CS_BASE, 0xf0000);
vmcs_write16(GUEST_CS_SELECTOR, vmcs_readl(GUEST_CS_BASE) >> 4);
 
fix_rmode_seg(VCPU_SREG_ES, &vcpu->rmode.es);
-- 
1.5.0.5



[PATCH 29/41] KVM: Handle writes to MCG_STATUS msr

2007-04-01 Thread Avi Kivity
From: Sergey Kiselev <[EMAIL PROTECTED]>

Some older (~2.6.7) kernels write the MCG_STATUS register during boot
(in mce_clear_all(), called from mce_init()).  kvm does not currently
handle this MSR, so the write causes it to inject a GPF into the guest.
The following patch adds a "nop" handler for it.
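A simplified model of the write path after this patch follows. The `toy_set_msr()` name and its return convention (0 = handled, nonzero = the caller injects #GP) are assumptions for the sketch; the MSR indices are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural MSR indices. */
#define MSR_IA32_MCG_STATUS 0x17a
#define MSR_IA32_MC0_STATUS 0x401

/* Returns 0 if the write was absorbed, 1 if unhandled (guest gets #GP). */
static int toy_set_msr(uint32_t msr, uint64_t data)
{
	switch (msr) {
	case MSR_IA32_MC0_STATUS:
	case MSR_IA32_MCG_STATUS:
		/* Machine-check status writes are logged and ignored. */
		fprintf(stderr, "msr 0x%x <- 0x%llx, nop\n", (unsigned)msr,
			(unsigned long long)data);
		return 0;
	default:
		return 1;
	}
}
```

An old guest clearing MCG_STATUS at boot now takes the first branch instead of faulting.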

Signed-off-by: Sergey Kiselev <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 12c3388..1ef5e9a 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1467,6 +1467,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
printk(KERN_WARNING "%s: MSR_IA32_MC0_STATUS 0x%llx, nop\n",
   __FUNCTION__, data);
break;
+   case MSR_IA32_MCG_STATUS:
+   printk(KERN_WARNING "%s: MSR_IA32_MCG_STATUS 0x%llx, nop\n",
+   __FUNCTION__, data);
+   break;
case MSR_IA32_UCODE_REV:
case MSR_IA32_UCODE_WRITE:
case 0x200 ... 0x2ff: /* MTRRs */
-- 
1.5.0.5



[PATCH 33/41] KVM: Remove unused function

2007-04-01 Thread Avi Kivity
From: Michal Piotrowski <[EMAIL PROTECTED]>

Remove unused function

CC  drivers/kvm/svm.o
drivers/kvm/svm.c:207: warning: ‘inject_db’ defined but not used

Signed-off-by: Michal Piotrowski <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |7 ---
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index ca2642f..303e959 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -203,13 +203,6 @@ static void inject_ud(struct kvm_vcpu *vcpu)
UD_VECTOR;
 }
 
-static void inject_db(struct kvm_vcpu *vcpu)
-{
-   vcpu->svm->vmcb->control.event_inj =SVM_EVTINJ_VALID |
-   SVM_EVTINJ_TYPE_EXEPT |
-   DB_VECTOR;
-}
-
 static int is_page_fault(uint32_t info)
 {
info &= SVM_EVTINJ_VEC_MASK | SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID;
-- 
1.5.0.5



[PATCH 02/41] KVM: Use the generic skip_emulated_instruction() in hypercall code

2007-04-01 Thread Avi Kivity
From: Dor Laor <[EMAIL PROTECTED]>

Instead of twiddling the rip register directly, use the
skip_emulated_instruction() function to do that for us.
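The idea on the SVM side is to record where the trapping instruction ends and let one shared helper advance rip. The sketch below is a toy model (`struct toy_vcpu` and both function names are hypothetical); it relies only on `vmmcall` being the 3-byte instruction 0f 01 d9.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vcpu state the SVM code keeps. */
struct toy_vcpu {
	uint64_t rip;      /* guest instruction pointer */
	uint64_t next_rip; /* address of the following instruction */
};

/* Single place that knows how to step past an emulated instruction. */
static void toy_skip_emulated_instruction(struct toy_vcpu *v)
{
	v->rip = v->next_rip;
}

static void toy_vmmcall_exit(struct toy_vcpu *v)
{
	v->next_rip = v->rip + 3; /* vmmcall: 0f 01 d9 */
	toy_skip_emulated_instruction(v);
}
```

Routing every rip update through one helper means any extra bookkeeping tied to skipping an instruction only has to be added in one place.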

Signed-off-by: Dor Laor <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |3 ++-
 drivers/kvm/vmx.c |2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 3d8ea7a..6787f11 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1078,7 +1078,8 @@ static int halt_interception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 static int vmmcall_interception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
-   vcpu->svm->vmcb->save.rip += 3;
+   vcpu->svm->next_rip = vcpu->svm->vmcb->save.rip + 3;
+   skip_emulated_instruction(vcpu);
return kvm_hypercall(vcpu, kvm_run);
 }
 
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index fbbf9d6..a721b60 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1658,7 +1658,7 @@ static int handle_halt(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 static int handle_vmcall(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
-   vmcs_writel(GUEST_RIP, vmcs_readl(GUEST_RIP)+3);
+   skip_emulated_instruction(vcpu);
return kvm_hypercall(vcpu, kvm_run);
 }
 
-- 
1.5.0.5



[PATCH 21/41] KVM: MMU: Remove unnecessary check for pdptr access

2007-04-01 Thread Avi Kivity
We already special case the pdptr access, so no need to check it again.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/paging_tmpl.h |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index f3bcee9..17bd440 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -148,8 +148,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
break;
}
 
-   if (walker->level != 3 || is_long_mode(vcpu))
-   walker->inherited_ar &= walker->table[index];
+   walker->inherited_ar &= walker->table[index];
table_gfn = (*ptep & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
paddr = safe_gpa_to_hpa(vcpu, *ptep & PT_BASE_ADDR_MASK);
kunmap_atomic(walker->table, KM_USER0);
-- 
1.5.0.5



[PATCH 20/41] KVM: Avoid guest virtual addresses in string pio userspace interface

2007-04-01 Thread Avi Kivity
The current string pio interface communicates using guest virtual addresses,
relying on userspace to translate addresses and to check permissions.  This
interface cannot fully support guest smp, as the check needs to take into
account two pages at once in case an unaligned string transfer straddles a
page boundary.

Change the interface not to communicate guest addresses at all; instead use
a buffer page (mmapped by userspace) and do transfers there.  The kernel
manages the virtual to physical translation and can perform the checks
atomically by taking the appropriate locks.
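The straddling case is simple arithmetic. This sketch (helper names hypothetical, 4 KiB pages assumed) computes the in-page offset, analogous to `guest_page_offset`, and how many guest pages a transfer touches. It shows why `struct kvm_pio_request` has room for two pages: even a single unaligned element can cross a boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Offset of the first byte within its page. */
static unsigned pio_page_offset(uint64_t gva)
{
	return (unsigned)(gva & (TOY_PAGE_SIZE - 1));
}

/* Number of guest pages touched by @bytes starting at @gva; every one of
 * them must be translated and checked before the copy. */
static unsigned pio_pages_spanned(uint64_t gva, size_t bytes)
{
	uint64_t offset = pio_page_offset(gva);

	assert(bytes > 0);
	return (unsigned)((offset + bytes + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE);
}
```

A 4-byte access at 0x1ffc fits in one page; the same access at 0x1ffe spans two.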

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |   21 ++-
 drivers/kvm/kvm_main.c |  174 +++
 drivers/kvm/mmu.c  |9 +++
 drivers/kvm/svm.c  |   40 +--
 drivers/kvm/vmx.c  |   40 ++--
 include/linux/kvm.h|   11 +---
 6 files changed, 229 insertions(+), 66 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 1c4a581..7866b34 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -74,6 +74,8 @@
 
 #define IOPL_SHIFT 12
 
+#define KVM_PIO_PAGE_OFFSET 1
+
 /*
  * Address types:
  *
@@ -220,6 +222,18 @@ enum {
VCPU_SREG_LDTR,
 };
 
+struct kvm_pio_request {
+   unsigned long count;
+   int cur_count;
+   struct page *guest_pages[2];
+   unsigned guest_page_offset;
+   int in;
+   int size;
+   int string;
+   int down;
+   int rep;
+};
+
 struct kvm_vcpu {
struct kvm *kvm;
union {
@@ -275,7 +289,8 @@ struct kvm_vcpu {
int mmio_size;
unsigned char mmio_data[8];
gpa_t mmio_phys_addr;
-   int pio_pending;
+   struct kvm_pio_request pio;
+   void *pio_data;
 
int sigset_active;
sigset_t sigset;
@@ -421,6 +436,7 @@ hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa);
 #define HPA_ERR_MASK ((hpa_t)1 << HPA_MSB)
 static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }
 hpa_t gva_to_hpa(struct kvm_vcpu *vcpu, gva_t gva);
+struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t gva);
 
 void kvm_emulator_want_group7_invlpg(void);
 
@@ -453,6 +469,9 @@ void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value,
 
 struct x86_emulate_ctxt;
 
+int kvm_setup_pio(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
+ int size, unsigned long count, int string, int down,
+ gva_t address, int rep, unsigned port);
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
 int emulate_clts(struct kvm_vcpu *vcpu);
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index ba7f43a..0c30e1b 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -346,6 +346,17 @@ static void kvm_free_physmem(struct kvm *kvm)
kvm_free_physmem_slot(&kvm->memslots[i], NULL);
 }
 
+static void free_pio_guest_pages(struct kvm_vcpu *vcpu)
+{
+   int i;
+
+   for (i = 0; i < 2; ++i)
+   if (vcpu->pio.guest_pages[i]) {
+   __free_page(vcpu->pio.guest_pages[i]);
+   vcpu->pio.guest_pages[i] = NULL;
+   }
+}
+
 static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
 {
if (!vcpu->vmcs)
@@ -357,6 +368,9 @@ static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
kvm_arch_ops->vcpu_free(vcpu);
free_page((unsigned long)vcpu->run);
vcpu->run = NULL;
+   free_page((unsigned long)vcpu->pio_data);
+   vcpu->pio_data = NULL;
+   free_pio_guest_pages(vcpu);
 }
 
 static void kvm_free_vcpus(struct kvm *kvm)
@@ -1550,44 +1564,159 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
 
-static void complete_pio(struct kvm_vcpu *vcpu)
+static int pio_copy_data(struct kvm_vcpu *vcpu)
+{
+   void *p = vcpu->pio_data;
+   void *q;
+   unsigned bytes;
+   int nr_pages = vcpu->pio.guest_pages[1] ? 2 : 1;
+
+   kvm_arch_ops->vcpu_put(vcpu);
+   q = vmap(vcpu->pio.guest_pages, nr_pages, VM_READ|VM_WRITE,
+PAGE_KERNEL);
+   if (!q) {
+   kvm_arch_ops->vcpu_load(vcpu);
+   return -ENOMEM;
+   }
+   q += vcpu->pio.guest_page_offset;
+   bytes = vcpu->pio.size * vcpu->pio.cur_count;
+   if (vcpu->pio.in)
+   memcpy(q, p, bytes);
+   else
+   memcpy(p, q, bytes);
+   q -= vcpu->pio.guest_page_offset;
+   vunmap(q);
+   kvm_arch_ops->vcpu_load(vcpu);
+   return 0;
+}
+
+static int complete_pio(struct kvm_vcpu *vcpu)
 {
-   struct kvm_io *io = &vcpu->run->io;
+   struct kvm_pio_request *io = &vcpu->pio;
long delta;
+   int r;
 
kvm_arch_ops->cache_regs(vcpu);
 
+   io->count -= io->cur_count;
if (!io->string) {
-   if (io->direction == KVM_EXIT_IO_IN)
-   memcpy(&vcpu->regs[VCPU_REGS_RAX], &io->value,
+   if (io->in)
+ 

[PATCH 35/41] KVM: Remove debug message

2007-04-01 Thread Avi Kivity
No longer interesting.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/vmx.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 578dff5..b64b7b7 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1131,7 +1131,6 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
vcpu->guest_msrs[j] = vcpu->host_msrs[j];
++vcpu->nmsrs;
}
-   printk(KERN_DEBUG "kvm: msrs: %d\n", vcpu->nmsrs);
 
nr_good_msrs = vcpu->nmsrs - NR_BAD_MSRS;
vmcs_writel(VM_ENTRY_MSR_LOAD_ADDR,
-- 
1.5.0.5



[PATCH 22/41] KVM: MMU: Remove global pte tracking

2007-04-01 Thread Avi Kivity
The initial, noncaching, version of the kvm mmu flushed all nonglobal
shadow page table translations (much like a native tlb flush).  The new
implementation flushes translations only when they change, rendering global
pte tracking superfluous.

This removes the unused tracking mechanism and storage space.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |1 -
 drivers/kvm/mmu.c |9 -
 2 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 7866b34..a4331da 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -136,7 +136,6 @@ struct kvm_mmu_page {
unsigned long slot_bitmap; /* One bit set per slot which has memory
* in this shadow page.
*/
-   int global;  /* Set if all ptes in this page are global */
int multimapped; /* More than one parent_pte? */
int root_count;  /* Currently serving as active root */
union {
diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index e8e2750..b181106 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -461,7 +461,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
list_add(&page->link, &vcpu->kvm->active_mmu_pages);
ASSERT(is_empty_shadow_page(page->page_hpa));
page->slot_bitmap = 0;
-   page->global = 1;
page->multimapped = 0;
page->parent_pte = parent_pte;
--vcpu->kvm->n_free_mmu_pages;
@@ -927,11 +926,6 @@ static void paging_new_cr3(struct kvm_vcpu *vcpu)
kvm_arch_ops->set_cr3(vcpu, vcpu->mmu.root_hpa);
 }
 
-static void mark_pagetable_nonglobal(void *shadow_pte)
-{
-   page_header(__pa(shadow_pte))->global = 0;
-}
-
 static inline void set_pte_common(struct kvm_vcpu *vcpu,
 u64 *shadow_pte,
 gpa_t gaddr,
@@ -949,9 +943,6 @@ static inline void set_pte_common(struct kvm_vcpu *vcpu,
 
*shadow_pte |= access_bits;
 
-   if (!(*shadow_pte & PT_GLOBAL_MASK))
-   mark_pagetable_nonglobal(shadow_pte);
-
if (is_error_hpa(paddr)) {
*shadow_pte |= gaddr;
*shadow_pte |= PT_SHADOW_IO_MARK;
-- 
1.5.0.5



[PATCH 00/41] kvm updates for 2.6.22

2007-04-01 Thread Avi Kivity
Following is my current 2.6.22 kvm queue.  It contains userspace interface
updates, improved guest support, cleanups, and plain bugfixes.  It will
likely grow slightly by the time the merge window opens.

Avi Kivity (34):
  KVM: Use own minor number
  KVM: Export 
  KVM: Fix bogus sign extension in mmu mapping audit
  KVM: Use a shared page for kernel/user communication when running a vcpu
  KVM: Do not communicate to userspace through cpu registers during PIO
  KVM: Handle cpuid in the kernel instead of punting to userspace
  KVM: Remove the 'emulated' field from the userspace interface
  KVM: Remove minor wart from KVM_CREATE_VCPU ioctl
  KVM: Renumber ioctls
  KVM: Add method to check for backwards-compatible API extensions
  KVM: Allow userspace to process hypercalls which have no kernel handler
  KVM: Fold kvm_run::exit_type into kvm_run::exit_reason
  KVM: Add a special exit reason when exiting due to an interrupt
  KVM: Initialize the apic_base msr on svm too
  KVM: Add guest mode signal mask
  KVM: Allow kernel to select size of mmap() buffer
  KVM: Future-proof argument-less ioctls
  KVM: Avoid guest virtual addresses in string pio userspace interface
  KVM: MMU: Remove unnecessary check for pdptr access
  KVM: MMU: Remove global pte tracking
  KVM: Workaround vmx inability to virtualize the reset state
  KVM: Remove set_cr0_no_modeswitch() arch op
  KVM: Modify guest segments after potentially switching modes
  KVM: Hack real-mode segments on vmx from KVM_SET_SREGS
  KVM: Don't allow the guest to turn off the cpu cache
  KVM: Remove unused and write-only variables
  KVM: MMU: Fix hugepage pdes mapping same physical address with different access
  KVM: SVM: Ensure timestamp counter monotonicity
  KVM: Use list_move()
  KVM: Remove debug message
  KVM: x86 emulator: fix bit string operations operand size
  KVM: Simplify gfn_to_page()
  KVM: Add physical memory aliasing feature
  KVM: Add fpu get/set operations

Dor Laor (3):
  KVM: Fix guest register corruption on paravirt hypercall
  KVM: Use the generic skip_emulated_instruction() in hypercall code
  KVM: Add mmu cache clear function

Joerg Roedel (2):
  KVM: SVM: forbid guest to execute monitor/mwait
  KVM: SVM: enable LBRV virtualization if available

Michal Piotrowski (1):
  KVM: Remove unused function

Sergey Kiselev (1):
  KVM: Handle writes to MCG_STATUS msr

 drivers/kvm/kvm.h  |   57 +++-
 drivers/kvm/kvm_main.c |  649 
 drivers/kvm/kvm_svm.h  |2 -
 drivers/kvm/mmu.c  |   69 +++--
 drivers/kvm/paging_tmpl.h  |   10 +-
 drivers/kvm/svm.c  |  111 +---
 drivers/kvm/svm.h  |6 +
 drivers/kvm/vmx.c  |   88 +++
 drivers/kvm/x86_emulate.c  |5 +-
 include/linux/Kbuild   |1 +
 include/linux/kvm.h|  135 +++---
 include/linux/miscdevice.h |1 +
 12 files changed, 898 insertions(+), 236 deletions(-)


[PATCH 26/41] KVM: Hack real-mode segments on vmx from KVM_SET_SREGS

2007-04-01 Thread Avi Kivity
As usual, we need to mangle segment registers when emulating real mode,
as vm86 has specific constraints.  We special-case the reset segment base,
and set the "access rights" (or descriptor flags) to vm86-compatible values.

This fixes reboot on vmx.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/vmx.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 027a962..578dff5 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -864,7 +864,14 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
vmcs_writel(sf->base, var->base);
vmcs_write32(sf->limit, var->limit);
vmcs_write16(sf->selector, var->selector);
-   if (var->unusable)
+   if (vcpu->rmode.active && var->s) {
+   /*
+* Hack real-mode segments into vm86 compatibility.
+*/
+   if (var->base == 0xffff0000 && var->selector == 0xf000)
+   vmcs_writel(sf->base, 0xf0000);
+   ar = 0xf3;
+   } else if (var->unusable)
ar = 1 << 16;
else {
ar = var->type & 15;
-- 
1.5.0.5



[PATCH 31/41] KVM: MMU: Fix hugepage pdes mapping same physical address with different access

2007-04-01 Thread Avi Kivity
The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map
the same physical address, they share the same shadow page.  This is a fairly
common case (kernel mappings on i386 nonpae Linux, for example).

However, if the two pdes map the same memory but with different permissions, kvm
will happily use the cached shadow page.  If the access through the more
permissive pde occurs after an access through the stricter pde, the guest
enters an endless page fault loop and makes no progress.

Fix by making the access permissions part of the cache lookup key.
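The effect of widening the key can be checked with a copy of the role union. This is a sketch: the `glevels` and `level` fields are reconstructed from the comment and from `kvm_mmu_get_page()`, the 2-bit access encoding is assumed, and the anonymous struct needs C11. Comparing whole role words makes every field, including the new `hugepage_access`, part of the cache key.

```c
#include <assert.h>

union toy_mmu_page_role {
	unsigned word;
	struct {
		unsigned glevels : 4;
		unsigned level : 4;
		unsigned quadrant : 2;
		unsigned pad_for_nice_hex_output : 6;
		unsigned metaphysical : 1;
		unsigned hugepage_access : 2;
	};
};

/* Role of a shadow page backing a huge-page pde of a 32-bit nonpae guest. */
static union toy_mmu_page_role toy_hugepage_role(unsigned access)
{
	union toy_mmu_page_role role;

	role.word = 0;         /* zero padding so word comparisons are exact */
	role.glevels = 2;      /* two-level guest paging */
	role.level = 1;        /* shadow sits at the page-table level */
	role.metaphysical = 1; /* gfn is not a real guest page table */
	role.hugepage_access = access;
	return role;
}

/* Lookup hit only if every role field matches. */
static int toy_role_match(union toy_mmu_page_role a, union toy_mmu_page_role b)
{
	return a.word == b.word;
}
```

Two pdes mapping the same frame read-only and read/write now produce different keys and hence different shadow pages.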

The fix allows Xen pae to boot on kvm and run guest domains.

Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |2 ++
 drivers/kvm/mmu.c |8 +---
 drivers/kvm/paging_tmpl.h |7 ++-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 7361c45..f5e343c 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -109,6 +109,7 @@ struct kvm_pte_chain {
  *   bits 4:7 - page table level for this shadow (1-4)
  *   bits 8:9 - page table quadrant for 2-level guests
  *   bit   16 - "metaphysical" - gfn is not a real page (huge page/real mode)
+ *   bits 17:18 - "access" - the user and writable bits of a huge page pde
  */
 union kvm_mmu_page_role {
unsigned word;
@@ -118,6 +119,7 @@ union kvm_mmu_page_role {
unsigned quadrant : 2;
unsigned pad_for_nice_hex_output : 6;
unsigned metaphysical : 1;
+   unsigned hugepage_access : 2;
};
 };
 
diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index b181106..0216b77 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -568,6 +568,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 gva_t gaddr,
 unsigned level,
 int metaphysical,
+unsigned hugepage_access,
 u64 *parent_pte)
 {
union kvm_mmu_page_role role;
@@ -581,6 +582,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
role.glevels = vcpu->mmu.root_level;
role.level = level;
role.metaphysical = metaphysical;
+   role.hugepage_access = hugepage_access;
if (vcpu->mmu.root_level <= PT32_ROOT_LEVEL) {
quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
@@ -780,7 +782,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, hpa_t p)
>> PAGE_SHIFT;
new_table = kvm_mmu_get_page(vcpu, pseudo_gfn,
 v, level - 1,
-1, &table[index]);
+1, 0, &table[index]);
if (!new_table) {
pgprintk("nonpaging_map: ENOMEM\n");
return -ENOMEM;
@@ -835,7 +837,7 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu)
 
ASSERT(!VALID_PAGE(root));
page = kvm_mmu_get_page(vcpu, root_gfn, 0,
-   PT64_ROOT_LEVEL, 0, NULL);
+   PT64_ROOT_LEVEL, 0, 0, NULL);
root = page->page_hpa;
++page->root_count;
vcpu->mmu.root_hpa = root;
@@ -852,7 +854,7 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu)
root_gfn = 0;
page = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
PT32_ROOT_LEVEL, !is_paging(vcpu),
-   NULL);
+   0, NULL);
root = page->page_hpa;
++page->root_count;
vcpu->mmu.pae_root[i] = root | PT_PRESENT_MASK;
diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index 17bd440..b94010d 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -247,6 +247,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
u64 shadow_pte;
int metaphysical;
gfn_t table_gfn;
+   unsigned hugepage_access = 0;
 
if (is_present_pte(*shadow_ent) || is_io_pte(*shadow_ent)) {
if (level == PT_PAGE_TABLE_LEVEL)
@@ -276,6 +277,9 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
if (level - 1 == PT_PAGE_TABLE_LEVEL
&& walker->level == PT_DIRECTORY_LEVEL) {
metaphysical = 1;
+   

[PATCH 30/41] KVM: SVM: forbid guest to execute monitor/mwait

2007-04-01 Thread Avi Kivity
From: Joerg Roedel <[EMAIL PROTECTED]>

This patch forbids the guest from executing monitor/mwait instructions on
SVM. This is necessary because the guest can execute these instructions
if the host processor supports them, even if the kvm cpuid doesn't report
their existence.
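Both halves of the patch are needed: the intercept bits make the processor exit instead of running the instruction, and the handler table decides what happens on exit. The sketch below models only the dispatch half. The exit codes are the ones the patch adds to svm.h; the handler names and the `toy_action` enum are hypothetical, with `TOY_INJECT_UD` standing in for what `invalid_op_interception` does.

```c
#include <assert.h>
#include <stddef.h>

/* Exit codes added to svm.h by this patch. */
#define SVM_EXIT_MONITOR 0x08a
#define SVM_EXIT_MWAIT   0x08b

enum toy_action { TOY_UNHANDLED, TOY_INJECT_UD };

static enum toy_action toy_invalid_op(void) { return TOY_INJECT_UD; }

typedef enum toy_action (*toy_handler)(void);

/* Sparse table indexed by exit code, in the style of svm_exit_handlers[]. */
static const toy_handler toy_handlers[] = {
	[SVM_EXIT_MONITOR] = toy_invalid_op,
	[SVM_EXIT_MWAIT]   = toy_invalid_op,
};

static enum toy_action toy_handle_exit(unsigned exit_code)
{
	size_t n = sizeof(toy_handlers) / sizeof(toy_handlers[0]);

	if (exit_code >= n || !toy_handlers[exit_code])
		return TOY_UNHANDLED;
	return toy_handlers[exit_code]();
}
```

With both entries in place, a guest executing monitor or mwait sees an invalid-opcode fault, matching a cpuid that does not advertise the feature.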

Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |6 +-
 drivers/kvm/svm.h |6 ++
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index ddc0505..0542d33 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -511,7 +511,9 @@ static void init_vmcb(struct vmcb *vmcb)
(1ULL << INTERCEPT_VMSAVE) |
(1ULL << INTERCEPT_STGI) |
(1ULL << INTERCEPT_CLGI) |
-   (1ULL << INTERCEPT_SKINIT);
+   (1ULL << INTERCEPT_SKINIT) |
+   (1ULL << INTERCEPT_MONITOR) |
+   (1ULL << INTERCEPT_MWAIT);
 
control->iopm_base_pa = iopm_base;
control->msrpm_base_pa = msrpm_base;
@@ -1292,6 +1294,8 @@ static int (*svm_exit_handlers[])(struct kvm_vcpu *vcpu,
[SVM_EXIT_STGI] = invalid_op_interception,
[SVM_EXIT_CLGI] = invalid_op_interception,
[SVM_EXIT_SKINIT]   = invalid_op_interception,
+   [SVM_EXIT_MONITOR]  = invalid_op_interception,
+   [SVM_EXIT_MWAIT]= invalid_op_interception,
 };
 
 
diff --git a/drivers/kvm/svm.h b/drivers/kvm/svm.h
index df731c3..5e93814 100644
--- a/drivers/kvm/svm.h
+++ b/drivers/kvm/svm.h
@@ -44,6 +44,9 @@ enum {
INTERCEPT_RDTSCP,
INTERCEPT_ICEBP,
INTERCEPT_WBINVD,
+   INTERCEPT_MONITOR,
+   INTERCEPT_MWAIT,
+   INTERCEPT_MWAIT_COND,
 };
 
 
@@ -298,6 +301,9 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_EXIT_RDTSCP0x087
 #define SVM_EXIT_ICEBP 0x088
 #define SVM_EXIT_WBINVD0x089
+#define SVM_EXIT_MONITOR   0x08a
+#define SVM_EXIT_MWAIT 0x08b
+#define SVM_EXIT_MWAIT_COND0x08c
 #define SVM_EXIT_NPF   0x400
 
 #define SVM_EXIT_ERR   -1
-- 
1.5.0.5



[PATCH 37/41] KVM: Add mmu cache clear function

2007-04-01 Thread Avi Kivity
From: Dor Laor <[EMAIL PROTECTED]>

Functions that play around with the physical memory map
need a way to clear mappings to possibly nonexistent or
invalid memory.  Both the mmu cache and the processor tlb
are cleared.
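The core of `kvm_mmu_zap_all()` drains an intrusive list by repeatedly taking its first entry. Zapping removes the entry from the list, so a normal forward iterator would be invalidated; re-reading the head on each pass avoids that. The userspace sketch below has a minimal list modeled on `<linux/list.h>`; `struct toy_mmu_page` is hypothetical, and `list_del()` stands in for `kvm_mmu_zap_page()`.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;
}

/* Hypothetical stand-in for struct kvm_mmu_page. */
struct toy_mmu_page {
	int id;
	struct list_head link;
};

/* Returns how many pages were removed. */
static int toy_zap_all(struct list_head *active)
{
	int zapped = 0;

	while (!list_empty(active)) {
		struct toy_mmu_page *page =
			container_of(active->next, struct toy_mmu_page, link);

		list_del(&page->link);
		++zapped;
	}
	return zapped;
}
```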

Signed-off-by: Dor Laor <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |1 +
 drivers/kvm/mmu.c |   17 +
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 6d0bd7a..59357be 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -430,6 +430,7 @@ int kvm_mmu_setup(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm_vcpu *vcpu, int slot);
+void kvm_mmu_zap_all(struct kvm_vcpu *vcpu);
 
 hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa);
 #define HPA_MSB ((sizeof(hpa_t) * 8) - 1)
diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index c2487b6..707f63c 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -1313,6 +1313,23 @@ void kvm_mmu_slot_remove_write_access(struct kvm_vcpu *vcpu, int slot)
}
 }
 
+void kvm_mmu_zap_all(struct kvm_vcpu *vcpu)
+{
+   destroy_kvm_mmu(vcpu);
+
+   while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
+   struct kvm_mmu_page *page;
+
+   page = container_of(vcpu->kvm->active_mmu_pages.next,
+   struct kvm_mmu_page, link);
+   kvm_mmu_zap_page(vcpu, page);
+   }
+
+   mmu_free_memory_caches(vcpu);
+   kvm_arch_ops->tlb_flush(vcpu);
+   init_kvm_mmu(vcpu);
+}
+
 #ifdef AUDIT
 
 static const char *audit_msg;
-- 
1.5.0.5



[PATCH 28/41] KVM: Remove unused and write-only variables

2007-04-01 Thread Avi Kivity
Trivial cleanup.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_svm.h |2 --
 drivers/kvm/svm.c |2 --
 2 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/kvm/kvm_svm.h b/drivers/kvm/kvm_svm.h
index 624f1ca..a1a9eba 100644
--- a/drivers/kvm/kvm_svm.h
+++ b/drivers/kvm/kvm_svm.h
@@ -28,8 +28,6 @@ struct vcpu_svm {
struct svm_cpu_data *svm_data;
uint64_t asid_generation;
 
-   unsigned long cr0;
-   unsigned long cr4;
unsigned long db_regs[NUM_DB_REGS];
 
u64 next_rip;
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 191bc45..ddc0505 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -576,7 +576,6 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
vcpu->svm->vmcb = page_address(page);
memset(vcpu->svm->vmcb, 0, PAGE_SIZE);
vcpu->svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
-   vcpu->svm->cr0 = 0x0010;
vcpu->svm->asid_generation = 0;
memset(vcpu->svm->db_regs, 0, sizeof(vcpu->svm->db_regs));
init_vmcb(vcpu->svm->vmcb);
@@ -736,7 +735,6 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
}
}
 #endif
-   vcpu->svm->cr0 = cr0;
vcpu->cr0 = cr0;
cr0 |= CR0_PG_MASK | CR0_WP_MASK;
cr0 &= ~(CR0_CD_MASK | CR0_NW_MASK);
-- 
1.5.0.5



[PATCH 40/41] KVM: Add fpu get/set operations

2007-04-01 Thread Avi Kivity
These are really helpful when migrating a floating point app to another
machine.
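A migration tool would GET on the source, ship the structure, and SET on the destination. The two ioctls are field-by-field translations between the kernel's fxsave image and the userspace structure. The userspace-only sketch below mirrors that round trip. Both structs are hypothetical mirrors of the fields this patch touches (the x86_64 fxsave variant and the `kvm_fpu` field names used in the diff), not the real headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct toy_fxsave {
	uint16_t cwd, swd, twd, fop;
	uint64_t rip, rdp;
	uint32_t mxcsr, mxcsr_mask;
	uint32_t st_space[32];  /* 8 x87 registers, 16 bytes each */
	uint32_t xmm_space[64]; /* 16 XMM registers, 16 bytes each */
};

struct toy_kvm_fpu {
	uint8_t fpr[8][16];
	uint16_t fcw, fsw;
	uint8_t ftwx; /* abridged (fxsave-format) tag word */
	uint16_t last_opcode;
	uint64_t last_ip, last_dp;
	uint8_t xmm[16][16];
};

static void toy_get_fpu(const struct toy_fxsave *fx, struct toy_kvm_fpu *fpu)
{
	memcpy(fpu->fpr, fx->st_space, sizeof fpu->fpr);
	fpu->fcw = fx->cwd;
	fpu->fsw = fx->swd;
	fpu->ftwx = (uint8_t)fx->twd;
	fpu->last_opcode = fx->fop;
	fpu->last_ip = fx->rip;
	fpu->last_dp = fx->rdp;
	memcpy(fpu->xmm, fx->xmm_space, sizeof fpu->xmm);
}

static void toy_set_fpu(struct toy_fxsave *fx, const struct toy_kvm_fpu *fpu)
{
	memcpy(fx->st_space, fpu->fpr, sizeof fx->st_space);
	fx->cwd = fpu->fcw;
	fx->swd = fpu->fsw;
	fx->twd = fpu->ftwx;
	fx->fop = fpu->last_opcode;
	fx->rip = fpu->last_ip;
	fx->rdp = fpu->last_dp;
	memcpy(fx->xmm_space, fpu->xmm, sizeof fx->xmm_space);
}
```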

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |   86 
 include/linux/kvm.h|   17 +
 2 files changed, 103 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 56ddbc1..4473174 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -2389,6 +2389,67 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
return 0;
 }
 
+/*
+ * fxsave fpu state.  Taken from x86_64/processor.h.  To be killed when
+ * we have asm/x86/processor.h
+ */
+struct fxsave {
+   u16 cwd;
+   u16 swd;
+   u16 twd;
+   u16 fop;
+   u64 rip;
+   u64 rdp;
+   u32 mxcsr;
+   u32 mxcsr_mask;
+   u32 st_space[32];   /* 8*16 bytes for each FP-reg = 128 bytes */
+#ifdef CONFIG_X86_64
+   u32 xmm_space[64];  /* 16*16 bytes for each XMM-reg = 256 bytes */
+#else
+   u32 xmm_space[32];  /* 8*16 bytes for each XMM-reg = 128 bytes */
+#endif
+};
+
+static int kvm_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+   struct fxsave *fxsave = (struct fxsave *)vcpu->guest_fx_image;
+
+   vcpu_load(vcpu);
+
+   memcpy(fpu->fpr, fxsave->st_space, 128);
+   fpu->fcw = fxsave->cwd;
+   fpu->fsw = fxsave->swd;
+   fpu->ftwx = fxsave->twd;
+   fpu->last_opcode = fxsave->fop;
+   fpu->last_ip = fxsave->rip;
+   fpu->last_dp = fxsave->rdp;
+   memcpy(fpu->xmm, fxsave->xmm_space, sizeof fxsave->xmm_space);
+
+   vcpu_put(vcpu);
+
+   return 0;
+}
+
+static int kvm_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+   struct fxsave *fxsave = (struct fxsave *)vcpu->guest_fx_image;
+
+   vcpu_load(vcpu);
+
+   memcpy(fxsave->st_space, fpu->fpr, 128);
+   fxsave->cwd = fpu->fcw;
+   fxsave->swd = fpu->fsw;
+   fxsave->twd = fpu->ftwx;
+   fxsave->fop = fpu->last_opcode;
+   fxsave->rip = fpu->last_ip;
+   fxsave->rdp = fpu->last_dp;
+   memcpy(fxsave->xmm_space, fpu->xmm, sizeof fxsave->xmm_space);
+
+   vcpu_put(vcpu);
+
+   return 0;
+}
+
 static long kvm_vcpu_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -2533,6 +2594,31 @@ static long kvm_vcpu_ioctl(struct file *filp,
r = kvm_vcpu_ioctl_set_sigmask(vcpu, );
break;
}
+   case KVM_GET_FPU: {
+   struct kvm_fpu fpu;
+
+   memset(&fpu, 0, sizeof fpu);
+   r = kvm_vcpu_ioctl_get_fpu(vcpu, &fpu);
+   if (r)
+   goto out;
+   r = -EFAULT;
+   if (copy_to_user(argp, &fpu, sizeof fpu))
+   goto out;
+   r = 0;
+   break;
+   }
+   case KVM_SET_FPU: {
+   struct kvm_fpu fpu;
+
+   r = -EFAULT;
+   if (copy_from_user(&fpu, argp, sizeof fpu))
+   goto out;
+   r = kvm_vcpu_ioctl_set_fpu(vcpu, &fpu);
+   if (r)
+   goto out;
+   r = 0;
+   break;
+   }
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index da9b23f..07bf353 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -126,6 +126,21 @@ struct kvm_regs {
__u64 rip, rflags;
 };
 
+/* for KVM_GET_FPU and KVM_SET_FPU */
+struct kvm_fpu {
+   __u8  fpr[8][16];
+   __u16 fcw;
+   __u16 fsw;
+   __u8  ftwx;  /* in fxsave format */
+   __u8  pad1;
+   __u16 last_opcode;
+   __u64 last_ip;
+   __u64 last_dp;
+   __u8  xmm[16][16];
+   __u32 mxcsr;
+   __u32 pad2;
+};
+
 struct kvm_segment {
__u64 base;
__u32 limit;
@@ -285,5 +300,7 @@ struct kvm_signal_mask {
 #define KVM_SET_MSRS  _IOW(KVMIO,  0x89, struct kvm_msrs)
 #define KVM_SET_CPUID _IOW(KVMIO,  0x8a, struct kvm_cpuid)
 #define KVM_SET_SIGNAL_MASK   _IOW(KVMIO,  0x8b, struct kvm_signal_mask)
+#define KVM_GET_FPU   _IOR(KVMIO,  0x8c, struct kvm_fpu)
+#define KVM_SET_FPU   _IOW(KVMIO,  0x8d, struct kvm_fpu)
 
 #endif
-- 
1.5.0.5



[PATCH 05/41] KVM: Fix bogus sign extension in mmu mapping audit

2007-04-01 Thread Avi Kivity
When auditing a 32-bit guest on a 64-bit host, sign extension of the page
directory pointer table index caused bogus addresses to be reported in
audit errors.

Fix by declaring the index unsigned.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index e85b4c7..266444a 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -1359,7 +1359,7 @@ static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
 
 static void audit_mappings(struct kvm_vcpu *vcpu)
 {
-   int i;
+   unsigned i;
 
if (vcpu->mmu.root_level == 4)
audit_mappings_page(vcpu, vcpu->mmu.root_hpa, 0, 4);
-- 
1.5.0.5



[PATCH 36/41] KVM: x86 emulator: fix bit string operations operand size

2007-04-01 Thread Avi Kivity
On x86, bit operations operate on a string of bits that can reside in
multiple words.  For example, 'btsl %eax, (blah)' will touch the word
at blah+4 if %eax is between 32 and 63.

The x86 emulator compensates for that by advancing the operand address
by (bit offset / BITS_PER_LONG) and truncating the bit offset to the
range (0..BITS_PER_LONG-1).  This has a side effect of forcing the operand
size to 8 bytes on 64-bit hosts.

Now, a 32-bit guest goes and fork()s a process.  It write protects a stack
page at 0xb000 using the 'btr' instruction, at offset 0xffc in the page
table, with bit offset 1 (for the write permission bit).

The emulator then forces the operand size to 8 bytes as described above,
so an innocent page table update turns into a cross-page-boundary write.
The mmu code assumes such a write cannot be a page table update, so it
doesn't actually clear the corresponding shadow page table entry.  The guest
and host permissions go out of sync and guest memory is corrupted soon
afterwards, leading to guest failure.

Fix by not using BITS_PER_LONG as the word size; instead use the actual
operand size, so we get a 32-bit write in that case.

Note that we still have to teach the mmu to handle cross-page-boundary writes
to guest page tables, but for now this allows Damn Small Linux 0.4 (2.4.20)
to boot.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/x86_emulate.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/x86_emulate.c b/drivers/kvm/x86_emulate.c
index 7513cdd..bcf872b 100644
--- a/drivers/kvm/x86_emulate.c
+++ b/drivers/kvm/x86_emulate.c
@@ -833,8 +833,9 @@ done_prefixes:
dst.ptr = (unsigned long *)cr2;
dst.bytes = (d & ByteOp) ? 1 : op_bytes;
if (d & BitOp) {
-   dst.ptr += src.val / BITS_PER_LONG;
-   dst.bytes = sizeof(long);
+   unsigned long mask = ~(dst.bytes * 8 - 1);
+
+   dst.ptr = (void *)dst.ptr + (src.val & mask) / 8;
}
if (!(d & Mov) && /* optimisation - avoid slow emulated read */
((rc = ops->read_emulated((unsigned long)dst.ptr,
-- 
1.5.0.5



[PATCH 39/41] KVM: Add physical memory aliasing feature

2007-04-01 Thread Avi Kivity
With this, we can specify that accesses to one physical memory range will
be remapped to another.  This is useful for the VGA window at 0xa0000, which
is used as a movable window into the (much larger) framebuffer.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |9 +
 drivers/kvm/kvm_main.c |   89 ++--
 include/linux/kvm.h|   10 +-
 3 files changed, 104 insertions(+), 4 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index d19985a..fceeb84 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -51,6 +51,7 @@
 #define UNMAPPED_GVA (~(gpa_t)0)
 
 #define KVM_MAX_VCPUS 1
+#define KVM_ALIAS_SLOTS 4
 #define KVM_MEMORY_SLOTS 4
 #define KVM_NUM_MMU_PAGES 256
 #define KVM_MIN_FREE_MMU_PAGES 5
@@ -312,6 +313,12 @@ struct kvm_vcpu {
struct kvm_cpuid_entry cpuid_entries[KVM_MAX_CPUID_ENTRIES];
 };
 
+struct kvm_mem_alias {
+   gfn_t base_gfn;
+   unsigned long npages;
+   gfn_t target_gfn;
+};
+
 struct kvm_memory_slot {
gfn_t base_gfn;
unsigned long npages;
@@ -322,6 +329,8 @@ struct kvm_memory_slot {
 
 struct kvm {
spinlock_t lock; /* protects everything except vcpus */
+   int naliases;
+   struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS];
int nmemslots;
struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS];
/*
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index aaaf306..56ddbc1 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -846,7 +846,73 @@ out:
return r;
 }
 
-struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
+/*
+ * Set a new alias region.  Aliases map a portion of physical memory into
+ * another portion.  This is useful for memory windows, for example the PC
+ * VGA region.
+ */
+static int kvm_vm_ioctl_set_memory_alias(struct kvm *kvm,
+struct kvm_memory_alias *alias)
+{
+   int r, n;
+   struct kvm_mem_alias *p;
+
+   r = -EINVAL;
+   /* General sanity checks */
+   if (alias->memory_size & (PAGE_SIZE - 1))
+   goto out;
+   if (alias->guest_phys_addr & (PAGE_SIZE - 1))
+   goto out;
+   if (alias->slot >= KVM_ALIAS_SLOTS)
+   goto out;
+   if (alias->guest_phys_addr + alias->memory_size
+   < alias->guest_phys_addr)
+   goto out;
+   if (alias->target_phys_addr + alias->memory_size
+   < alias->target_phys_addr)
+   goto out;
+
+   spin_lock(&kvm->lock);
+
+   p = &kvm->aliases[alias->slot];
+   p->base_gfn = alias->guest_phys_addr >> PAGE_SHIFT;
+   p->npages = alias->memory_size >> PAGE_SHIFT;
+   p->target_gfn = alias->target_phys_addr >> PAGE_SHIFT;
+
+   for (n = KVM_ALIAS_SLOTS; n > 0; --n)
+   if (kvm->aliases[n - 1].npages)
+   break;
+   kvm->naliases = n;
+
+   spin_unlock(&kvm->lock);
+
+   vcpu_load(&kvm->vcpus[0]);
+   spin_lock(&kvm->lock);
+   kvm_mmu_zap_all(&kvm->vcpus[0]);
+   spin_unlock(&kvm->lock);
+   vcpu_put(&kvm->vcpus[0]);
+
+   return 0;
+
+out:
+   return r;
+}
+
+static gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
+{
+   int i;
+   struct kvm_mem_alias *alias;
+
+   for (i = 0; i < kvm->naliases; ++i) {
+   alias = &kvm->aliases[i];
+   if (gfn >= alias->base_gfn
+   && gfn < alias->base_gfn + alias->npages)
+   return alias->target_gfn + gfn - alias->base_gfn;
+   }
+   return gfn;
+}
+
+static struct kvm_memory_slot *__gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
 {
int i;
 
@@ -859,13 +925,19 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
}
return NULL;
 }
-EXPORT_SYMBOL_GPL(gfn_to_memslot);
+
+struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
+{
+   gfn = unalias_gfn(kvm, gfn);
+   return __gfn_to_memslot(kvm, gfn);
+}
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
struct kvm_memory_slot *slot;
 
-   slot = gfn_to_memslot(kvm, gfn);
+   gfn = unalias_gfn(kvm, gfn);
+   slot = __gfn_to_memslot(kvm, gfn);
if (!slot)
return NULL;
return slot->phys_mem[gfn - slot->base_gfn];
@@ -2503,6 +2575,17 @@ static long kvm_vm_ioctl(struct file *filp,
goto out;
break;
}
+   case KVM_SET_MEMORY_ALIAS: {
+   struct kvm_memory_alias alias;
+
+   r = -EFAULT;
+   if (copy_from_user(&alias, argp, sizeof alias))
+   goto out;
+   r = kvm_vm_ioctl_set_memory_alias(kvm, &alias);
+   if (r)
+   goto out;
+   break;
+   }
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 728b24c..da9b23f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ 

[PATCH 34/41] KVM: Use list_move()

2007-04-01 Thread Avi Kivity
Use list_move() where possible.  Noticed by Dor Laor.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |   12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 0216b77..c2487b6 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -437,9 +437,8 @@ static void kvm_mmu_free_page(struct kvm_vcpu *vcpu, hpa_t page_hpa)
struct kvm_mmu_page *page_head = page_header(page_hpa);
 
ASSERT(is_empty_shadow_page(page_hpa));
-   list_del(&page_head->link);
page_head->page_hpa = page_hpa;
-   list_add(&page_head->link, &vcpu->free_pages);
+   list_move(&page_head->link, &vcpu->free_pages);
++vcpu->kvm->n_free_mmu_pages;
 }
 
@@ -457,8 +456,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
return NULL;
 
page = list_entry(vcpu->free_pages.next, struct kvm_mmu_page, link);
-   list_del(&page->link);
-   list_add(&page->link, &vcpu->kvm->active_mmu_pages);
+   list_move(&page->link, &vcpu->kvm->active_mmu_pages);
ASSERT(is_empty_shadow_page(page->page_hpa));
page->slot_bitmap = 0;
page->multimapped = 0;
@@ -670,10 +668,8 @@ static void kvm_mmu_zap_page(struct kvm_vcpu *vcpu,
if (!page->root_count) {
hlist_del(&page->hash_link);
kvm_mmu_free_page(vcpu, page->page_hpa);
-   } else {
-   list_del(&page->link);
-   list_add(&page->link, &vcpu->kvm->active_mmu_pages);
-   }
+   } else
+   list_move(&page->link, &vcpu->kvm->active_mmu_pages);
 }
 
 static int kvm_mmu_unprotect_page(struct kvm_vcpu *vcpu, gfn_t gfn)
-- 
1.5.0.5



[PATCH 38/41] KVM: Simply gfn_to_page()

2007-04-01 Thread Avi Kivity
Mapping a guest page to a host page is a common operation.  Currently,
one has first to find the memory slot where the page belongs (gfn_to_memslot),
then locate the page itself (gfn_to_page()).

This is clumsy, and also won't work well with memory aliases.  So simplify
gfn_to_page() not to require memory slot translation first, and instead do it
internally.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |   12 +---
 drivers/kvm/kvm_main.c |   45 +
 drivers/kvm/mmu.c  |   12 
 drivers/kvm/vmx.c  |6 +++---
 4 files changed, 33 insertions(+), 42 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 59357be..d19985a 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -443,11 +443,7 @@ void kvm_emulator_want_group7_invlpg(void);
 
 extern hpa_t bad_page_address;
 
-static inline struct page *gfn_to_page(struct kvm_memory_slot *slot, gfn_t gfn)
-{
-   return slot->phys_mem[gfn - slot->base_gfn];
-}
-
+struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
@@ -523,12 +519,6 @@ static inline int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
return vcpu->mmu.page_fault(vcpu, gva, error_code);
 }
 
-static inline struct page *_gfn_to_page(struct kvm *kvm, gfn_t gfn)
-{
-   struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
-   return (slot) ? slot->phys_mem[gfn - slot->base_gfn] : NULL;
-}
-
 static inline int is_long_mode(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_X86_64
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 1ef5e9a..aaaf306 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -420,12 +420,12 @@ static int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
u64 pdpte;
u64 *pdpt;
int ret;
-   struct kvm_memory_slot *memslot;
+   struct page *page;
 
spin_lock(&vcpu->kvm->lock);
-   memslot = gfn_to_memslot(vcpu->kvm, pdpt_gfn);
-   /* FIXME: !memslot - emulate? 0xff? */
-   pdpt = kmap_atomic(gfn_to_page(memslot, pdpt_gfn), KM_USER0);
+   page = gfn_to_page(vcpu->kvm, pdpt_gfn);
+   /* FIXME: !page - emulate? 0xff? */
+   pdpt = kmap_atomic(page, KM_USER0);
 
ret = 1;
for (i = 0; i < 4; ++i) {
@@ -861,6 +861,17 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(gfn_to_memslot);
 
+struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
+{
+   struct kvm_memory_slot *slot;
+
+   slot = gfn_to_memslot(kvm, gfn);
+   if (!slot)
+   return NULL;
+   return slot->phys_mem[gfn - slot->base_gfn];
+}
+EXPORT_SYMBOL_GPL(gfn_to_page);
+
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 {
int i;
@@ -899,20 +910,20 @@ static int emulator_read_std(unsigned long addr,
unsigned offset = addr & (PAGE_SIZE-1);
unsigned tocopy = min(bytes, (unsigned)PAGE_SIZE - offset);
unsigned long pfn;
-   struct kvm_memory_slot *memslot;
-   void *page;
+   struct page *page;
+   void *page_virt;
 
if (gpa == UNMAPPED_GVA)
return X86EMUL_PROPAGATE_FAULT;
pfn = gpa >> PAGE_SHIFT;
-   memslot = gfn_to_memslot(vcpu->kvm, pfn);
-   if (!memslot)
+   page = gfn_to_page(vcpu->kvm, pfn);
+   if (!page)
return X86EMUL_UNHANDLEABLE;
-   page = kmap_atomic(gfn_to_page(memslot, pfn), KM_USER0);
+   page_virt = kmap_atomic(page, KM_USER0);
 
-   memcpy(data, page + offset, tocopy);
+   memcpy(data, page_virt + offset, tocopy);
 
-   kunmap_atomic(page, KM_USER0);
+   kunmap_atomic(page_virt, KM_USER0);
 
bytes -= tocopy;
data += tocopy;
@@ -963,16 +974,14 @@ static int emulator_read_emulated(unsigned long addr,
 static int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
   unsigned long val, int bytes)
 {
-   struct kvm_memory_slot *m;
struct page *page;
void *virt;
 
if (((gpa + bytes - 1) >> PAGE_SHIFT) != (gpa >> PAGE_SHIFT))
return 0;
-   m = gfn_to_memslot(vcpu->kvm, gpa >> PAGE_SHIFT);
-   if (!m)
+   page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+   if (!page)
return 0;
-   page = gfn_to_page(m, gpa >> PAGE_SHIFT);
kvm_mmu_pre_write(vcpu, gpa, bytes);
mark_page_dirty(vcpu->kvm, gpa >> PAGE_SHIFT);
virt = kmap_atomic(page, KM_USER0);
@@ -2507,15 +2516,11 @@ static struct page *kvm_vm_nopage(struct vm_area_struct *vma,
 {
struct kvm *kvm = vma->vm_file->private_data;
unsigned long pgoff;
-   struct 

[PATCH 32/41] KVM: SVM: Ensure timestamp counter monotonicity

2007-04-01 Thread Avi Kivity
When a vcpu is migrated from one cpu to another, its timestamp counter
may lose its monotonic property if the host has unsynced timestamp counters.
This can confuse the guest, sometimes to the point of refusing to boot.

As the rdtsc instruction is rather fast on AMD processors (7-10 cycles),
we can simply record the last host tsc when we drop the cpu, and adjust
the vcpu tsc offset when we detect that we've migrated to a different cpu.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h |1 +
 drivers/kvm/svm.c |   21 +
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index f5e343c..6d0bd7a 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -244,6 +244,7 @@ struct kvm_vcpu {
struct mutex mutex;
int   cpu;
int   launched;
+   u64 host_tsc;
struct kvm_run *run;
int interrupt_window_open;
unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 0542d33..ca2642f 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -459,7 +459,6 @@ static void init_vmcb(struct vmcb *vmcb)
 {
struct vmcb_control_area *control = &vmcb->control;
struct vmcb_save_area *save = &vmcb->save;
-   u64 tsc;
 
control->intercept_cr_read =INTERCEPT_CR0_MASK |
INTERCEPT_CR3_MASK |
@@ -517,8 +516,7 @@ static void init_vmcb(struct vmcb *vmcb)
 
control->iopm_base_pa = iopm_base;
control->msrpm_base_pa = msrpm_base;
-   rdtscll(tsc);
-   control->tsc_offset = -tsc;
+   control->tsc_offset = 0;
control->int_ctl = V_INTR_MASKING_MASK;
 
init_seg(&save->es);
@@ -606,11 +604,26 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu)
 {
-   get_cpu();
+   int cpu;
+
+   cpu = get_cpu();
+   if (unlikely(cpu != vcpu->cpu)) {
+   u64 tsc_this, delta;
+
+   /*
+* Make sure that the guest sees a monotonically
+* increasing TSC.
+*/
+   rdtscll(tsc_this);
+   delta = vcpu->host_tsc - tsc_this;
+   vcpu->svm->vmcb->control.tsc_offset += delta;
+   vcpu->cpu = cpu;
+   }
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   rdtscll(vcpu->host_tsc);
put_cpu();
 }
 
-- 
1.5.0.5



[PATCH 06/41] KVM: Use a shared page for kernel/user communication when runing a vcpu

2007-04-01 Thread Avi Kivity
Instead of passing a 'struct kvm_run' back and forth between the kernel and
userspace, allocate a page and allow the user to mmap() it.  This reduces
needless copying and makes the interface expandable by providing lots of
free space.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |1 +
 drivers/kvm/kvm_main.c |   54 +++
 include/linux/kvm.h|6 ++--
 3 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 0d122bf..901b8d9 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -228,6 +228,7 @@ struct kvm_vcpu {
struct mutex mutex;
int   cpu;
int   launched;
+   struct kvm_run *run;
int interrupt_window_open;
unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */
 #define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 946ed86..42be8a8 100755
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -355,6 +355,8 @@ static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
kvm_mmu_destroy(vcpu);
vcpu_put(vcpu);
kvm_arch_ops->vcpu_free(vcpu);
+   free_page((unsigned long)vcpu->run);
+   vcpu->run = NULL;
 }
 
 static void kvm_free_vcpus(struct kvm *kvm)
@@ -1887,6 +1889,33 @@ static int kvm_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu,
return r;
 }
 
+static struct page *kvm_vcpu_nopage(struct vm_area_struct *vma,
+   unsigned long address,
+   int *type)
+{
+   struct kvm_vcpu *vcpu = vma->vm_file->private_data;
+   unsigned long pgoff;
+   struct page *page;
+
+   *type = VM_FAULT_MINOR;
+   pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+   if (pgoff != 0)
+   return NOPAGE_SIGBUS;
+   page = virt_to_page(vcpu->run);
+   get_page(page);
+   return page;
+}
+
+static struct vm_operations_struct kvm_vcpu_vm_ops = {
+   .nopage = kvm_vcpu_nopage,
+};
+
+static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   vma->vm_ops = &kvm_vcpu_vm_ops;
+   return 0;
+}
+
 static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 {
struct kvm_vcpu *vcpu = filp->private_data;
@@ -1899,6 +1928,7 @@ static struct file_operations kvm_vcpu_fops = {
.release= kvm_vcpu_release,
.unlocked_ioctl = kvm_vcpu_ioctl,
.compat_ioctl   = kvm_vcpu_ioctl,
+   .mmap   = kvm_vcpu_mmap,
 };
 
 /*
@@ -1947,6 +1977,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
 {
int r;
struct kvm_vcpu *vcpu;
+   struct page *page;
 
r = -EINVAL;
if (!valid_vcpu(n))
@@ -1961,6 +1992,12 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
return -EEXIST;
}
 
+   page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+   r = -ENOMEM;
+   if (!page)
+   goto out_unlock;
+   vcpu->run = page_address(page);
+
vcpu->host_fx_image = (char*)ALIGN((hva_t)vcpu->fx_buf,
   FX_IMAGE_ALIGN);
vcpu->guest_fx_image = vcpu->host_fx_image + FX_IMAGE_SIZE;
@@ -1990,6 +2027,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
 
 out_free_vcpus:
kvm_free_vcpu(vcpu);
+out_unlock:
mutex_unlock(&vcpu->mutex);
 out:
return r;
@@ -2003,21 +2041,9 @@ static long kvm_vcpu_ioctl(struct file *filp,
int r = -EINVAL;
 
switch (ioctl) {
-   case KVM_RUN: {
-   struct kvm_run kvm_run;
-
-   r = -EFAULT;
-   if (copy_from_user(&kvm_run, argp, sizeof kvm_run))
-   goto out;
-   r = kvm_vcpu_ioctl_run(vcpu, &kvm_run);
-   if (r < 0 &&  r != -EINTR)
-   goto out;
-   if (copy_to_user(argp, &kvm_run, sizeof kvm_run)) {
-   r = -EFAULT;
-   goto out;
-   }
+   case KVM_RUN:
+   r = kvm_vcpu_ioctl_run(vcpu, vcpu->run);
break;
-   }
case KVM_GET_REGS: {
struct kvm_regs kvm_regs;
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 275354f..d88e750 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-#define KVM_API_VERSION 4
+#define KVM_API_VERSION 5
 
 /*
  * Architectural interrupt line count, and the size of the bitmap needed
@@ -49,7 +49,7 @@ enum kvm_exit_reason {
KVM_EXIT_SHUTDOWN = 8,
 };
 
-/* for KVM_RUN */
+/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
/* in */
__u32 emulated;  /* skip current instruction */
@@ -233,7 +233,7 @@ struct kvm_dirty_log {
 /*
  * ioctls for vcpu fds
  */
-#define KVM_RUN   _IOWR(KVMIO, 2, struct 

[PATCH 41/41] KVM: SVM: enable LBRV virtualization if available

2007-04-01 Thread Avi Kivity
From: Joerg Roedel <[EMAIL PROTECTED]>

This patch enables the virtualization of the last branch record MSRs on
SVM if this feature is available in hardware. It also introduces a small,
simple helper for checking which SVM extensions the hardware supports.

Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 303e959..b7e1410 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -44,6 +44,10 @@ MODULE_LICENSE("GPL");
 #define KVM_EFER_LMA (1 << 10)
 #define KVM_EFER_LME (1 << 8)
 
+#define SVM_FEATURE_NPT  (1 << 0)
+#define SVM_FEATURE_LBRV (1 << 1)
+#define SVM_FEATURE_SVML (1 << 2)
+
 unsigned long iopm_base;
 unsigned long msrpm_base;
 
@@ -68,6 +72,7 @@ struct svm_cpu_data {
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
+static uint32_t svm_features;
 
 struct svm_init_data {
int cpu;
@@ -82,6 +87,11 @@ static u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
 
 #define MAX_INST_SIZE 15
 
+static inline u32 svm_has(u32 feat)
+{
+   return svm_features & feat;
+}
+
 static unsigned get_addr_size(struct kvm_vcpu *vcpu)
 {
struct vmcb_save_area *sa = &vcpu->svm->vmcb->save;
@@ -302,6 +312,7 @@ static void svm_hardware_enable(void *garbage)
svm_data->asid_generation = 1;
svm_data->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
svm_data->next_asid = svm_data->max_asid + 1;
+   svm_features = cpuid_edx(SVM_CPUID_FUNC);
 
asm volatile ( "sgdt %0" : "=m"(gdt_descr) );
gdt = (struct desc_struct *)gdt_descr.address;
@@ -511,6 +522,8 @@ static void init_vmcb(struct vmcb *vmcb)
control->msrpm_base_pa = msrpm_base;
control->tsc_offset = 0;
control->int_ctl = V_INTR_MASKING_MASK;
+   if (svm_has(SVM_FEATURE_LBRV))
+   control->lbr_ctl = 1ULL;
 
init_seg(&save->es);
init_seg(&save->ss);
-- 
1.5.0.5



[PATCH 01/41] KVM: Fix guest register corruption on paravirt hypercall

2007-04-01 Thread Avi Kivity
From: Dor Laor <[EMAIL PROTECTED]>

The hypercall code mixes up the ->cache_regs() and ->decache_regs()
callbacks, resulting in guest register corruption.

Signed-off-by: Dor Laor <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index dc7a8c7..ff7c836 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1177,7 +1177,7 @@ int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
unsigned long nr, a0, a1, a2, a3, a4, a5, ret;
 
-   kvm_arch_ops->decache_regs(vcpu);
+   kvm_arch_ops->cache_regs(vcpu);
ret = -KVM_EINVAL;
 #ifdef CONFIG_X86_64
if (is_long_mode(vcpu)) {
@@ -1204,7 +1204,7 @@ int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run)
;
}
vcpu->regs[VCPU_REGS_RAX] = ret;
-   kvm_arch_ops->cache_regs(vcpu);
+   kvm_arch_ops->decache_regs(vcpu);
return 1;
 }
 EXPORT_SYMBOL_GPL(kvm_hypercall);
-- 
1.5.0.5



[PATCH 03/41] KVM: Use own minor number

2007-04-01 Thread Avi Kivity
Use the minor number (232) allocated to kvm by LANANA.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |2 +-
 include/linux/miscdevice.h |1 +
 2 files changed, 2 insertions(+), 1 deletions(-)
 mode change 100644 => 100755 drivers/kvm/kvm_main.c

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
old mode 100644
new mode 100755
index ff7c836..946ed86
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -2299,7 +2299,7 @@ static struct file_operations kvm_chardev_ops = {
 };
 
 static struct miscdevice kvm_dev = {
-   MISC_DYNAMIC_MINOR,
+   KVM_MINOR,
"kvm",
&kvm_chardev_ops,
 };
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 326da7d..dff9ea3 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -29,6 +29,7 @@
 
 #define TUN_MINOR   200
 #define HPET_MINOR   228
+#define KVM_MINOR    232
 
 struct device;
 
-- 
1.5.0.5



[2.4] (resent) Watchdog wdt83627 (Winbond W83627HF/F/HG/G) driver, 2.6 backport

2007-04-01 Thread Tal Kelrich
(resent due to a combination of mailer stupidity and my own
I had to mangle Padraig's name to make it stop encoding as
Quoted-Printable. hope this is palatable. sorry for the resends)

Hello,

Tested and working on Kontron JREX-PM.
fairly straightforward backport of w83627hf_wdt.c from 2.6.20.1

Changes from 2.6 version
Default timeout set to 120 seconds
Nonstandard read only proc interface (/proc/watchdog)
Always reset timer on driver load
Changed timeout limit to 255
Ignores failure to acquire IO port

Caveats:
Ignores failure to acquire the IO port, since it is always taken; there's
probably a better way around that.
Releases IO port regardless of having acquired it.

-- 
Tal Kelrich
PGP fingerprint: 3EDF FCC5 60BB 4729 AB2F  CAE6 FEC1 9AAC 12B9 AA69
Key Available at: http://www.hasturkun.com/pub.txt

The real reason psychology is hard is that psychologists are trying to
do the impossible.

--- linux-2.4.34.2/drivers/char/Config.in   Sat Mar 24 08:44:54 2007
+++ linux-2.4.34.2-w83627hf/drivers/char/Config.in  Sun Apr  1 16:23:01 2007
@@ -263,6 +263,7 @@
tristate '  W83877F (EMACS) Watchdog Timer' CONFIG_W83877F_WDT
tristate '  WDT Watchdog timer' CONFIG_WDT
tristate '  WDT PCI Watchdog timer' CONFIG_WDTPCI
+   tristate '  W83627HF/F/HG/G Watchdog' CONFIG_WDT_W83627
if [ "$CONFIG_WDT" != "n" ]; then
   bool 'WDT501 features' CONFIG_WDT_501
   if [ "$CONFIG_WDT_501" = "y" ]; then
--- linux-2.4.34.2/drivers/char/Makefile   Sat Mar 24 08:44:54 2007
+++ linux-2.4.34.2-w83627hf/drivers/char/Makefile   Sun Apr  1 16:23:01 2007
@@ -323,6 +323,7 @@
 obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o
 obj-$(CONFIG_INDYDOG) += indydog.o
 obj-$(CONFIG_8xx_WDT) += mpc8xx_wdt.o
+obj-$(CONFIG_WDT_W83627) += wdt83627.o
 
 subdir-$(CONFIG_MWAVE) += mwave
 ifeq ($(CONFIG_MWAVE),y)
--- linux-2.4.34.2/drivers/char/wdt83627.c  Thu Jan  1 02:00:00 1970
+++ linux-2.4.34.2-w83627hf/drivers/char/wdt83627.c Sun Apr  1 16:22:35 2007
@@ -0,0 +1,416 @@
+/*
+ * w83627hf WDT driver
+ *
+ * backported from w83627hf_wdt.c kernel 2.6.20.1
+ * (c) Copyright 2007 Orpak Systems Ltd. (Tal Kelrich <[EMAIL PROTECTED]>)
+ *
+ * (c) Copyright 2003 Padraig Brady <[EMAIL PROTECTED]>
+ *
+ * Based on advantechwdt.c which is based on wdt.c.
+ * Original copyright messages:
+ *
+ * (c) Copyright 2000-2001 Marek Michalkiewicz <[EMAIL PROTECTED]>
+ *
+ * (c) Copyright 1996 Alan Cox <[EMAIL PROTECTED]>, All Rights Reserved.
+ * http://www.redhat.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Neither Alan Cox nor CymruNet Ltd. admit liability nor provide
+ * warranty for any of this software. This material is provided
+ * "AS-IS" and at no charge.
+ *
+ * (c) Copyright 1995 Alan Cox <[EMAIL PROTECTED]>
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define WATCHDOG_NAME "W83627HF WDT"
+#define PFX WATCHDOG_NAME ": "
+#define WATCHDOG_TIMEOUT 120   /* 120 sec default timeout */
+
+#ifdef CONFIG_WATCHDOG_NOWAYOUT
+#define WATCHDOG_NOWAYOUT 1
+#else
+#define WATCHDOG_NOWAYOUT 0
+#endif
+
+static unsigned long wdt_is_open;
+static char expect_close;
+static spinlock_t io_lock;
+
+/* You must set this - there is no sane way to probe for this board. */
+static int wdt_io = 0x2E;
+MODULE_PARM(wdt_io, "i");
+MODULE_PARM_DESC(wdt_io, "w83627hf WDT io port (default 0x2E)");
+
+static int timeout = WATCHDOG_TIMEOUT; /* in seconds */
+MODULE_PARM(timeout, "i");
+MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds. 1<= timeout <=255, 
default=" __MODULE_STRING(WATCHDOG_TIMEOUT) ".");
+
+static int nowayout = WATCHDOG_NOWAYOUT;
+MODULE_PARM(nowayout, "i");
+MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default=" 
__MODULE_STRING(CONFIG_WATCHDOG_NOWAYOUT)")");
+
+/*
+ * Kernel methods.
+ */
+
+#define WDT_EFER (wdt_io+0)   /* Extended Function Enable Registers */
+#define WDT_EFIR (wdt_io+0)   /* Extended Function Index Register (same as 
EFER) */
+#define WDT_EFDR (WDT_EFIR+1) /* Extended Function Data Register */
+
+/* Non standard proc bits, added by request, wanted some feedback */
+
+static int wdt_readproc(char *page, char **start, off_t off, int count,
+   int *eof, void *data)
+{
+   int len;
+   unsigned char remaining;
+   unsigned char fired;
+   spin_lock(_lock);
+   w83627hf_select_wd_register();
+   outb_p(0xF6, WDT_EFIR);/* get current timer val */
+   remaining=inb_p(WDT_EFDR);
+   outb_p(0xF7, WDT_EFIR);
+   fired=inb_p(WDT_EFDR);
+   /* clear that bit (bit 4) */
+   

[2.4] Watchdog wdt83627 (Winbond W83627HF/F/HG/G) driver, 2.6 backport

2007-04-01 Thread Tal Kelrich
Hello,

Tested and working on a Kontron JREX-PM.
This is a fairly straightforward backport of w83627hf_wdt.c from 2.6.20.1.

Changes from the 2.6 version:
Default timeout set to 120 seconds
Nonstandard read-only proc interface (/proc/watchdog)
Always resets the timer on driver load
Timeout limit changed to 255 seconds
Ignores failure to acquire the IO port

Caveats:
Ignores failure to acquire the IO port, since it is always already taken;
there's probably a better way around that.
Releases the IO port regardless of whether it was acquired.
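For context, a userspace daemon would normally drive this driver through the standard watchdog ioctls from <linux/watchdog.h>. The sketch below assumes the backport keeps the 2.6 driver's WDIOC_SETTIMEOUT/WDIOC_KEEPALIVE handling and its magic-close ('V') behaviour, which the expect_close flag in the patch suggests; pet_watchdog(), the device path and the timing are illustrative only. Try it on a test box: with nowayout set, closing the device does not stop the timer and the machine will reboot.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: arm the watchdog at `path` with `timeout` seconds,
 * ping it `pings` times, then attempt a magic close.
 * Returns 0 on success, -1 on failure (e.g. the device does not exist). */
static int pet_watchdog(const char *path, int timeout, int pings)
{
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	/* Request a new timeout; the driver may reject out-of-range values
	 * (this backport documents 1..255 s) and writes back the value in use. */
	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) < 0) {
		perror("WDIOC_SETTIMEOUT");
		close(fd);
		return -1;
	}

	for (int i = 0; i < pings; i++) {
		/* Reload the counter well before it expires. */
		if (ioctl(fd, WDIOC_KEEPALIVE, 0) < 0) {
			perror("WDIOC_KEEPALIVE");
			close(fd);
			return -1;
		}
		sleep(timeout / 2 > 0 ? timeout / 2 : 1);
	}

	/* Magic close: writing 'V' before close() asks the driver to stop
	 * the timer. It has no effect when nowayout is set. */
	if (write(fd, "V", 1) != 1)
		perror("magic close");
	close(fd);
	return 0;
}
```

Calling pet_watchdog("/dev/watchdog", 30, 3) would arm a 30-second timeout, ping it three times at 15-second intervals, and then request a magic close.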

-- 
Tal Kelrich
PGP fingerprint: 3EDF FCC5 60BB 4729 AB2F  CAE6 FEC1 9AAC 12B9 AA69
Key Available at: http://www.hasturkun.com/pub.txt

The real reason psychology is hard is that psychologists are trying to
do the impossible.

--- linux-2.4.34.2/drivers/char/Config.in   Sat Mar 24 08:44:54 2007
+++ linux-2.4.34.2-w83627hf/drivers/char/Config.in  Sun Apr  1 16:23:01 2007
@@ -263,6 +263,7 @@
tristate '  W83877F (EMACS) Watchdog Timer' CONFIG_W83877F_WDT
tristate '  WDT Watchdog timer' CONFIG_WDT
tristate '  WDT PCI Watchdog timer' CONFIG_WDTPCI
+   tristate '  W83627HF/F/HG/G Watchdog' CONFIG_WDT_W83627
if [ "$CONFIG_WDT" != "n" ]; then
   bool 'WDT501 features' CONFIG_WDT_501
   if [ "$CONFIG_WDT_501" = "y" ]; then
--- linux-2.4.34.2/drivers/char/Makefile	Sat Mar 24 08:44:54 2007
+++ linux-2.4.34.2-w83627hf/drivers/char/Makefile   Sun Apr  1 16:23:01 2007
@@ -323,6 +323,7 @@
 obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o
 obj-$(CONFIG_INDYDOG) += indydog.o
 obj-$(CONFIG_8xx_WDT) += mpc8xx_wdt.o
+obj-$(CONFIG_WDT_W83627) += wdt83627.o
 
 subdir-$(CONFIG_MWAVE) += mwave
 ifeq ($(CONFIG_MWAVE),y)
--- linux-2.4.34.2/drivers/char/wdt83627.c  Thu Jan  1 02:00:00 1970
+++ linux-2.4.34.2-w83627hf/drivers/char/wdt83627.c Sun Apr  1 16:22:35 2007
@@ -0,0 +1,416 @@
+/*
+ * w83627hf WDT driver
+ *
+ * backported from w83627hf_wdt.c kernel 2.6.20.1
+ * (c) Copyright 2007 Orpak Systems Ltd. (Tal Kelrich <[EMAIL PROTECTED]>)
+ *
+ * (c) Copyright 2003 Pádraig Brady <[EMAIL PROTECTED]>
+ *
+ * Based on advantechwdt.c which is based on wdt.c.
+ * Original copyright messages:
+ *
+ * (c) Copyright 2000-2001 Marek Michalkiewicz <[EMAIL PROTECTED]>
+ *
+ * (c) Copyright 1996 Alan Cox <[EMAIL PROTECTED]>, All Rights Reserved.
+ * http://www.redhat.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Neither Alan Cox nor CymruNet Ltd. admit liability nor provide
+ * warranty for any of this software. This material is provided
+ * "AS-IS" and at no charge.
+ *
+ * (c) Copyright 1995 Alan Cox <[EMAIL PROTECTED]>
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define WATCHDOG_NAME "W83627HF WDT"
+#define PFX WATCHDOG_NAME ": "
+#define WATCHDOG_TIMEOUT 120   /* 120 sec default timeout */
+
+#ifdef CONFIG_WATCHDOG_NOWAYOUT
+#define WATCHDOG_NOWAYOUT 1
+#else
+#define WATCHDOG_NOWAYOUT 0
+#endif
+
+static unsigned long wdt_is_open;
+static char expect_close;
+static spinlock_t io_lock;
+
+/* You must set this - there is no sane way to probe for this board. */
+static int wdt_io = 0x2E;
+MODULE_PARM(wdt_io, "i");
+MODULE_PARM_DESC(wdt_io, "w83627hf WDT io port (default 0x2E)");
+
+static int timeout = WATCHDOG_TIMEOUT; /* in seconds */
+MODULE_PARM(timeout, "i");
+MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds. 1<= timeout <=255, default=" __MODULE_STRING(WATCHDOG_TIMEOUT) ".");
+
+static int nowayout = WATCHDOG_NOWAYOUT;
+MODULE_PARM(nowayout, "i");
+MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default=" __MODULE_STRING(CONFIG_WATCHDOG_NOWAYOUT)")");
+
+/*
+ * Kernel methods.
+ */
+
+#define WDT_EFER (wdt_io+0)   /* Extended Function Enable Registers */
+#define WDT_EFIR (wdt_io+0)   /* Extended Function Index Register (same as EFER) */
+#define WDT_EFDR (WDT_EFIR+1) /* Extended Function Data Register */
+
+/* Non standard proc bits, added by request, wanted some feedback */
+
+static int wdt_readproc(char *page, char **start, off_t off, int count,
+   int *eof, void *data)
+{
+   int len;
+   unsigned char remaining;
+   unsigned char fired;
+   spin_lock(&io_lock);
+   w83627hf_select_wd_register();
+   outb_p(0xF6, WDT_EFIR);/* get current timer val */
+   remaining=inb_p(WDT_EFDR);
+   outb_p(0xF7, WDT_EFIR);
+   fired=inb_p(WDT_EFDR);
+   /* clear that bit (bit 4) */
+   outb_p(fired&(~0x10),WDT_EFDR);
+   w83627hf_unselect_wd_register();
+   spin_unlock(&io_lock);
+   fired=(fired&0x10)!=0;
+   len=snprintf(page,PAGE_SIZE,
+

[2.4] Watchdog wdt977 (Winbond W83977EF) driver

2007-04-01 Thread Tal Kelrich
Hello,

This is my first submitted kernel patch, please be gentle.

Tested and working on an AAEON GENE-6310B subcompact board
(the defaults are configured for it, but it should work elsewhere).
The patch is against kernel 2.4.34.2.

Changes/Features:

Added ioctl support
Disables watchdog on driver load
Supports timeout in seconds
Timeout defaults to 2 minutes
No longer under NetWinder arch
Configurable output GP (defaults to GP16)
Configurable base IO address
Non standard read only proc interface for status (/proc/watchdog)
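The /proc/watchdog file is a plain list of key=value lines; the field names (active, iobase, gp, nowayout, timeout, remaining, fired) are taken from the wdt977_readproc() format string later in this patch. A monitoring tool could pick out a single field with a sketch like the one below; the helper name is hypothetical and not part of the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find "key=value" in the text of /proc/watchdog and
 * store the value in *out. `base` is 16 for iobase, 10 for the others.
 * Returns 0 if the key was found, -1 otherwise. */
static int wdt_proc_field(const char *text, const char *key, int base, long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the whole key: "time" must not match "timeout=". */
		if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
			*out = strtol(line + klen + 1, NULL, base);
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}
```

A caller would read /proc/watchdog into a NUL-terminated buffer (for example with fopen() and fread()) and pass it in. Since iobase is printed with %0X, i.e. hex without a 0x prefix, it has to be parsed with base 16.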

Caveats:

No idea if this breaks NetWinder, although it really shouldn't.
Only tested with GP16.
Utterly ignores failure to get its IO port, mostly because it's
already taken; I didn't know a way around that.
release_region is called regardless of whether the region was acquired;
this might be trouble.

-- 
Tal Kelrich
PGP fingerprint: 3EDF FCC5 60BB 4729 AB2F  CAE6 FEC1 9AAC 12B9 AA69
Key Available at: http://www.hasturkun.com/pub.txt

To err is human, to forgive is against company policy.


diff -udr linux-2.4.34.2/drivers/char/Config.in linux-2.4.34.2-wdt977/drivers/char/Config.in
--- linux-2.4.34.2/drivers/char/Config.in	Sat Mar 24 08:44:54 2007
+++ linux-2.4.34.2-wdt977/drivers/char/Config.in	Wed Mar 28 10:49:02 2007
@@ -247,10 +247,8 @@
tristate '  Berkshire Products PC Watchdog' CONFIG_PCWATCHDOG
if [ "$CONFIG_FOOTBRIDGE" = "y" ]; then
   tristate '  DC21285 watchdog' CONFIG_21285_WATCHDOG
-  if [ "$CONFIG_ARCH_NETWINDER" = "y" ]; then
- tristate '  NetWinder WB83C977 watchdog' CONFIG_977_WATCHDOG
-  fi
fi
+   tristate '  Winbond W83977EF Watchdog Timer' CONFIG_977_WATCHDOG
tristate '  Eurotech CPU-1220/1410 Watchdog Timer' CONFIG_EUROTECH_WDT
tristate '  IB700 SBC Watchdog Timer' CONFIG_IB700_WDT
tristate '  ICP ELectronics Wafer 5823 Watchdog' CONFIG_WAFER_WDT
--- linux-2.4.34.2/drivers/char/wdt977.c	Sat Mar 24 08:44:54 2007
+++ linux-2.4.34.2-wdt977/drivers/char/wdt977.c	Wed Mar 28 10:52:23 2007
@@ -1,5 +1,7 @@
 /*
- *	Wdt977	0.02:	A Watchdog Device for Netwinder W83977AF chip
+ *	Wdt83977 0.03:	A Watchdog Device for Winbond W83977EF chip
+ *	(c) Copyright 2007 Orpak Systems Ltd. (Tal Kelrich <[EMAIL PROTECTED]>)
+ *	based on wdt977 driver by Woody Suwalski <[EMAIL PROTECTED]>
  *
  *	(c) Copyright 1998 Rebel.com (Woody Suwalski <[EMAIL PROTECTED]>)
  *
@@ -9,10 +11,6 @@
  *	modify it under the terms of the GNU General Public License
  *	as published by the Free Software Foundation; either version
  *	2 of the License, or (at your option) any later version.
- *
- *			---
- *  14-Dec-2001 Matt Domsch <[EMAIL PROTECTED]>
- *   Added nowayout module option to override CONFIG_WATCHDOG_NOWAYOUT
  */
  
 #include 
@@ -23,17 +21,25 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
-#include 
+#include 
+#include 
 
 #define WATCHDOG_MINOR	130
 
-static	int timeout = 3;
+static	int timeout = 120;
 static	int timer_alive;
-static	int testmode;
 static	int expect_close = 0;
+static spinlock_t wdt_lock;
+
+/* port is either 0x370 or 0x3F0; there's probably no way to detect this */
+static int wdt_io = 0x370;
 
 #ifdef CONFIG_WATCHDOG_NOWAYOUT
 static int nowayout = 1;
@@ -41,14 +47,77 @@
 static int nowayout = 0;
 #endif
 
+static int whichgp = 16;
+
+MODULE_PARM(wdt_io,"i");
+MODULE_PARM_DESC(wdt_io,"WDT io port base (0x370/0x3F0)");
+
+MODULE_PARM(whichgp,"i");
+MODULE_PARM_DESC(whichgp,"which gp? (12/13/16)");
+
+MODULE_PARM(timeout,"i");
+
 MODULE_PARM(nowayout,"i");
 MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default=CONFIG_WATCHDOG_NOWAYOUT)");
 
+#define WDT_EFER (wdt_io+0)   /* Extended Function Enable Registers */
+#define WDT_EFIR (wdt_io+0)   /* Extended Function Index Register (same as EFER) */
+#define WDT_EFDR (WDT_EFIR+1) /* Extended Function Data Register */
+
+#define WDT_OUT(reg,data) {outb_p(reg,WDT_EFIR);outb_p(data,WDT_EFDR);}
+#define WDT_IN(reg,out) {outb_p(reg,WDT_EFIR);out=inb_p(WDT_EFDR);}
+#define WDT_DEV(device) {WDT_OUT(0x07,device);}
+#define WDT_ENABLE	{outb_p(0x87,WDT_EFER);outb_p(0x87,WDT_EFER);}
+#define WDT_DISABLE	{outb_p(0xAA,WDT_EFER);}
+
+static int wdt977_readproc(char *page, char **start, off_t off, int count,
+		int *eof, void *data)
+{
+	int len;
+	unsigned char remaining;
+	unsigned char fired;
+	spin_lock(&wdt_lock);
+	WDT_ENABLE;
+	WDT_DEV(0x08);
+	WDT_IN(0xF2,remaining); /* get remaining time */
+	WDT_IN(0xF4,fired); /* and some nice status bits */
+	/* and clear the bit we care about */
+	WDT_OUT(0xF4,fired&(~0x01));
+	WDT_DISABLE;
+	spin_unlock(&wdt_lock);
+	fired=fired & 0x01;
+	len=snprintf(page,PAGE_SIZE,
+			"W83977EF device\n"
+			"active=%d\n"
+			"iobase=%0X\n"
+			"gp=%d\n"
+			"nowayout=%d\n"
+			"timeout=%d\n"
+			"remaining=%d\n"
+			"fired=%d\n",
+			timer_alive,wdt_io,whichgp,nowayout,timeout,remaining,fired);
+	*eof=1;
+	return len;
+}
+
+static void wdt977_ctrl(int timeout)
+{
+	unsigned 

Re: Bug?

2007-04-01 Thread Michal Piotrowski

Hi,

On 01/04/07, Sascha Curth <[EMAIL PROTECTED]> wrote:

Hi,

I have seen this in my logfiles. The server wasn't reachable any more and
a hard reset was needed.


Yes, that's a bug.





Kernel: 2.6.18
Logfile:

Mar 29 22:18:07 seven [ cut here ]
Mar 29 22:18:07 seven kernel BUG at mm/rmap.c:522!
Mar 29 22:18:07 seven invalid opcode:  [#1]
Mar 29 22:18:07 seven SMP
Mar 29 22:18:07 seven Modules linked in: capi capifs st fcdsl
Mar 29 22:18:07 seven CPU:0
Mar 29 22:18:07 seven EIP:0060:[]Tainted: P SVLI


You are using a proprietary module. Please reproduce the problem with an untainted kernel.

Regards,
Michal

--
Michal K. K. Piotrowski
Hurd Testers Group
(http://www.hurdtestersgroup.org/)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [-mm3 PATCH] (Retry) Check the return value of kobject_add and etc.

2007-04-01 Thread Parag Warudkar

On 4/1/07, Parag Warudkar <[EMAIL PROTECTED]> wrote:

>-  kobject_add(>kobj);
>+  if (kobject_add(>kobj)) {
>+  kfree(p);
>+  return;

Please add a printk warning before the return statement to log a
proper warning stating what happened, which file and line etc. That
way people can know why something did not work as expected and
hopefully do something about it.


Never mind - kobject_add() is already verbose enough when it fails.

Parag


Re: Silent corruption on AMD64

2007-04-01 Thread Andi Kleen
Aaron Lehmann <[EMAIL PROTECTED]> writes:

[adding netdev]
[meta-comment: I wish people wouldn't use such unnecessarily broad subjects 
-- how is it the x86-64 port's or AMD's fault when you have broken hardware? 
Would anybody write "Silent corruption on i386" or "Silent corruption 
on Intel" or "Silent corruption on Linux"?]

> On Sat, Mar 31, 2007 at 08:03:16PM -0700, Jim Paris wrote:
> > Since it shows up under heavy load that includes unrelated devices, I
> > think ruling out hardware problems is important.  Some suggestions:
> 
> I've been able to narrow it down to the Realtek Ethernet card. I can't
> reproduce the problem using onboard Ethernet, whereas the Realtek card
> causes trouble in any slot. However, I still don't know whether it's a
> hardware or software issue, or whether it's caused directly or
> indirectly by the Realtek card.

You could disable the hardware checksumming support in the card with
the appended patch. Then hopefully Linux will catch most corruptions
(but perhaps not all, because TCP checksums are not very strong).
You can then watch the failed checksums with netstat -s.
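To illustrate why the TCP checksum is weak: it is the 16-bit ones' complement sum described in RFC 1071, and addition is commutative, so a corruption that swaps two aligned 16-bit words, or adds one to a word while subtracting one from another, leaves the checksum unchanged. A minimal sketch (not the kernel's implementation):

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes, read as big-endian 16-bit
 * words; an odd trailing byte is padded with zero. The 32-bit accumulator
 * is adequate for packet-sized buffers. */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}
	if (len)			/* odd byte: pad on the right with zero */
		sum += (uint32_t)buf[0] << 8;
	while (sum >> 16)		/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

For example, the byte sequences 12 34 ab cd 00 01 and ab cd 12 34 00 01 produce the same checksum, so software verification cannot tell them apart.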

-Andi

Index: linux-2.6.21-rc3-net/drivers/net/r8169.c
===
--- linux-2.6.21-rc3-net.orig/drivers/net/r8169.c
+++ linux-2.6.21-rc3-net/drivers/net/r8169.c
@@ -2477,6 +2477,7 @@ static inline int rtl8169_fragmented_fra
 
 static inline void rtl8169_rx_csum(struct sk_buff *skb, struct RxDesc *desc)
 {
+#if 0
u32 opts1 = le32_to_cpu(desc->opts1);
u32 status = opts1 & RxProtoMask;
 
@@ -2485,6 +2486,7 @@ static inline void rtl8169_rx_csum(struc
((status == RxProtoIP) && !(opts1 & IPFail)))
skb->ip_summed = CHECKSUM_UNNECESSARY;
else
+#endif
skb->ip_summed = CHECKSUM_NONE;
 }
 



[PATCH] fbcon: don't draw cursor when it's disabled

2007-04-01 Thread Michal Januszewski
From: Michal Januszewski <[EMAIL PROTECTED]>

When the cursor and echo are disabled on the current console, pressing a
key will cause a black rectangle to be painted in the cursor's position.
Fix this by not touching the framebuffer in fbcon_cursor() when the
cursor is off.
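A minimal userspace reproducer for the scenario above, assuming it runs on a framebuffer console: it hides the cursor with the DECTCEM sequence ESC [ ?25l (which, as far as I can tell, is what clears vc->vc_deccm in the VT driver), turns off echo with termios, and waits for one key press. Before this fix, that key press would be expected to leave a black rectangle at the cursor position. hidden_cursor_keypress() is a hypothetical name.

```c
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical reproducer: hide the cursor, disable echo, read one byte
 * from `fd`, then restore the terminal. Returns 0 on success, -1 if `fd`
 * is not a terminal or a call fails. */
static int hidden_cursor_keypress(int fd)
{
	struct termios saved, raw;
	char c;
	ssize_t n;

	if (tcgetattr(fd, &saved) < 0)
		return -1;
	raw = saved;
	raw.c_lflag &= ~(ECHO | ICANON);	/* no echo, byte-at-a-time input */
	raw.c_cc[VMIN] = 1;			/* read() returns after one byte */
	raw.c_cc[VTIME] = 0;
	if (tcsetattr(fd, TCSANOW, &raw) < 0)
		return -1;

	fputs("\033[?25l", stdout);		/* DECTCEM off: hide the cursor */
	fflush(stdout);

	n = read(fd, &c, 1);			/* press any key */

	fputs("\033[?25h", stdout);		/* show the cursor again */
	fflush(stdout);
	tcsetattr(fd, TCSANOW, &saved);
	return n == 1 ? 0 : -1;
}
```

Run on a VT as hidden_cursor_keypress(STDIN_FILENO).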

Signed-off-by: Michal Januszewski <[EMAIL PROTECTED]>
---

 drivers/video/console/fbcon.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index f721341..7089aa8 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -1330,7 +1330,7 @@ static void fbcon_cursor(struct vc_data *vc, int mode)
int y;
int c = scr_readw((u16 *) vc->vc_pos);
 
-   if (fbcon_is_inactive(vc, info))
+   if (fbcon_is_inactive(vc, info) || vc->vc_deccm != 1)
return;
 
ops->cursor_flash = (mode == CM_ERASE) ? 0 : 1;



[PATCH resend] vt: fix potential race in VT_WAITACTIVE handler

2007-04-01 Thread Michal Januszewski
From: Michal Januszewski <[EMAIL PROTECTED]>

On a multiprocessor machine the VT_WAITACTIVE ioctl call may return 0
if fg_console has already been updated in redraw_screen() but the
console switch itself hasn't been completed. Fix this by checking
fg_console in vt_waitactive() with the console sem held.

Signed-off-by: Michal Januszewski <[EMAIL PROTECTED]>
---
This is the 2nd version of this patch. It incorporates Andrew's
suggestions, i.e. it calls set_current_state() after down() and adds
a comment explaining why acquiring the console sem is necessary.
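The race and the fix can be modelled in userspace, with a mutex standing in for the console semaphore. In this sketch (all names hypothetical), the switcher updates fg_console early but finishes the switch before dropping the lock, as the description says happens in redraw_screen(). Because the waiter tests fg_console only while holding the same lock, it can never see the new console with the switch half done. A condition variable replaces the kernel wait queue; the -EINTR signal path is not modelled.

```c
#include <pthread.h>

/* Shared state, protected by console_lock (stand-in for the console sem). */
static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t vt_activate = PTHREAD_COND_INITIALIZER;
static int fg_console;
static int switch_complete;

/* Stand-in for a console switch: fg_console changes first, the rest of
 * the switch follows, and the lock is held throughout. */
static void *switch_console(void *arg)
{
	int target = *(int *)arg;

	pthread_mutex_lock(&console_lock);
	switch_complete = 0;
	fg_console = target;		/* updated early, before the switch is done */
	/* ... the remainder of the switch would happen here ... */
	switch_complete = 1;
	pthread_cond_broadcast(&vt_activate);
	pthread_mutex_unlock(&console_lock);
	return NULL;
}

/* Stand-in for vt_waitactive(): returns the value of switch_complete seen
 * when fg_console matched, which is always 1 with this locking. */
static int wait_active(int vt)
{
	int done;

	pthread_mutex_lock(&console_lock);
	while (fg_console != vt)
		pthread_cond_wait(&vt_activate, &console_lock);
	done = switch_complete;
	pthread_mutex_unlock(&console_lock);
	return done;
}
```

Without the lock around the fg_console test, the waiter could return between the two assignments in switch_console(), which is the window the patch closes.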

 drivers/char/vt_ioctl.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/char/vt_ioctl.c b/drivers/char/vt_ioctl.c
index 1fa2da8..0508293 100644
--- a/drivers/char/vt_ioctl.c
+++ b/drivers/char/vt_ioctl.c
@@ -1039,10 +1039,20 @@ int vt_waitactive(int vt)
 
add_wait_queue(_activate_queue, );
for (;;) {
-   set_current_state(TASK_INTERRUPTIBLE);
retval = 0;
-   if (vt == fg_console)
+
+   /* Synchronize with redraw_screen(). By acquiring the console
+* semaphore we make sure that the console switch is completed
+* before we return. If we didn't wait for the semaphore, we
+* could return at a point where fg_console has already been
+* updated, but the console switch hasn't been completed. */
+   acquire_console_sem();
+   set_current_state(TASK_INTERRUPTIBLE);
+   if (vt == fg_console) {
+   release_console_sem();
break;
+   }
+   release_console_sem();
retval = -EINTR;
if (signal_pending(current))
break;



Re: GPL vs non-GPL device drivers

2007-04-01 Thread devzero

Linux and open source are evolution. Go on and create your closed-source drivers
and your own closed-source fork; go on and create your own little Homo
neanderthalensis!


[KJ][PATCH] ROUND_UP cleanup in arch/mips/kernel/sysirix.c

2007-04-01 Thread Milind Arun Choudhary
ROUND_UP(32|64) cleanup, use ALIGN
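For reference, here is a sketch showing that the removed ROUND_UP32 and ALIGN() give identical results, which is what makes this a pure cleanup. The ALIGN() below is a local copy, assumed to match the simple mask-based definition in <linux/kernel.h> of this period; like ROUND_UP32, it only works for power-of-two alignments. reclen32() and its example offsets are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies for illustration; both assume `a` is a power of two. */
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
#define ROUND_UP32(x)	(((x) + sizeof(uint32_t) - 1) & ~(sizeof(uint32_t) - 1))

/* Record length as computed in irix_filldir32(): name offset plus the
 * name and its terminating NUL, rounded up to a 4-byte boundary. */
static unsigned short reclen32(int name_offset, int namlen)
{
	return (unsigned short)ALIGN(name_offset + namlen + 1, sizeof(uint32_t));
}
```

With a name offset of 8, a 4-character name needs 13 bytes and is padded to 16, while a 3-character name needs exactly 12 and stays at 12.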

Signed-off-by: Milind Arun Choudhary <[EMAIL PROTECTED]>

---
 sysirix.c |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)


diff --git a/arch/mips/kernel/sysirix.c b/arch/mips/kernel/sysirix.c
index 93a1484..59c25bc 100644
--- a/arch/mips/kernel/sysirix.c
+++ b/arch/mips/kernel/sysirix.c
@@ -1736,14 +1736,13 @@ struct irix_dirent32_callback {
 };
 
 #define NAME_OFFSET32(de) ((int) ((de)->d_name - (char *) (de)))
-#define ROUND_UP32(x) (((x)+sizeof(u32)-1) & ~(sizeof(u32)-1))
 
 static int irix_filldir32(void *__buf, const char *name,
int namlen, loff_t offset, u64 ino, unsigned int d_type)
 {
struct irix_dirent32 __user *dirent;
struct irix_dirent32_callback *buf = __buf;
-   unsigned short reclen = ROUND_UP32(NAME_OFFSET32(dirent) + namlen + 1);
+   unsigned short reclen = ALIGN(NAME_OFFSET32(dirent) + namlen + 1, sizeof(u32));
int err = 0;
u32 d_ino;
 
@@ -1838,14 +1837,13 @@ struct irix_dirent64_callback {
 };
 
 #define NAME_OFFSET64(de) ((int) ((de)->d_name - (char *) (de)))
-#define ROUND_UP64(x) (((x)+sizeof(u64)-1) & ~(sizeof(u64)-1))
 
 static int irix_filldir64(void *__buf, const char *name,
int namlen, loff_t offset, u64 ino, unsigned int d_type)
 {
struct irix_dirent64 __user *dirent;
struct irix_dirent64_callback * buf = __buf;
-   unsigned short reclen = ROUND_UP64(NAME_OFFSET64(dirent) + namlen + 1);
+   unsigned short reclen = ALIGN(NAME_OFFSET64(dirent) + namlen + 1,sizeof(u64));
int err = 0;
 
if (!access_ok(VERIFY_WRITE, buf, sizeof(*buf)))

-- 
Milind Arun Choudhary


Re: usb hid: reset NumLock

2007-04-01 Thread Pekka Enberg

On 3/30/07, Pete Zaitcev <[EMAIL PROTECTED]> wrote:

Dell people (Stuart and Charles) complained that on some USB keyboards,
if BIOS enables NumLock, it stays on even after Linux has started. Since
we always start with NumLock off, this confuses users. Quick double dab
at NumLock fixes it, but it's not nice.


What I am seeing on my ThinkPad is that when I boot _without_ a USB
keyboard NumLock is enabled. Switching to virtual console and back to
X fixes it which is why I have never bothered to debug it further.
Perhaps this is related? Should I give your patch a spin to see if it
fixes the problem?


Re: cannot add device to partitioned raid6 array

2007-04-01 Thread Florian D.

Neil Brown wrote:

Definitely the cause.  If you really need to add this array, you may
be able to reduce the usage of the array, then reduce the size of the
array, then add the drive.
Depending on how you have partitioned the array, and how you are using
the partitions, you may just need to reduce the filesystem in the last
partition, then use *fdisk to resize the partition.  Then something
like:

   mdadm --grow --size=24300 /dev/md_d4

It is generally safer to reduce the filesystem too much, resize the
device, then grow the filesystem up to the size of the device.  That
way avoids fiddly arithmetic and so reduces the chance of failure.

NeilBrown



Thanks, but I decided to begin from scratch (a backup is available ;)).
Now all partitions have the same size. Creating a raid6 array from 2 drives
and hot-adding another one works now, so this could be regarded as solved.

But when I try to create the array with 3 drives at once, the following
strange error appears:

flockmock ~ # mdadm --create /dev/md_d4 --level=6 -a mdp --chunk=32 -n 4 /dev/sda2 /dev/sdb2 /dev/sdc2 missing

mdadm: RUN_ARRAY failed: Input/output error
mdadm: stopped /dev/md_d4

dmesg shows:
[  484.362525] md: bind
[  484.363429] md: bind
[  484.364337] md: bind
[  484.364397] md: md_d4: raid array is not clean -- starting background reconstruction
[  484.365876] raid5: device sdc2 operational as raid disk 2
[  484.365879] raid5: device sdb2 operational as raid disk 1
[  484.365881] raid5: device sda2 operational as raid disk 0
[  484.365884] raid5: cannot start dirty degraded array for md_d4
[  484.365886] RAID5 conf printout:
[  484.365887]  --- rd:4 wd:3
[  484.365889]  disk 0, o:1, dev:sda2
[  484.365891]  disk 1, o:1, dev:sdb2
[  484.365893]  disk 2, o:1, dev:sdc2
[  484.365895] raid5: failed to run raid set md_d4
[  484.365897] md: pers->run() failed ...
[  484.366271] md: md_d4 stopped.
[  484.366303] md: unbind
[  484.366309] md: export_rdev(sdc2)
[  484.366314] md: unbind
[  484.366318] md: export_rdev(sdb2)
[  484.366321] md: unbind
[  484.366325] md: export_rdev(sda2)


I just wanted to report this FYI; I will take the first route and wait a
little...
cheers, florian


[KJ][PATCH] ROUND_UP macro cleanup in drivers/pci

2007-04-01 Thread Milind Arun Choudhary

ROUND_UP macro cleanup, use ALIGN wherever appropriate
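A worked example of the I/O window sizing in pbus_size_io() as it reads after this change. My reading of the ISA/EISA line, offered tentatively: ISA cards decode only 10 address bits, so only the first 256 bytes of each 1 KB block are usable, and every 256 bytes beyond the first is charged 1 KB. The total is then rounded up to 4 KB with ALIGN(). io_window_size() is a hypothetical restatement; `size` and `size1` are the two running totals the function accumulates (not shown in this hunk), and ALIGN() is a local copy.

```c
#include <stdbool.h>

/* Local copy; `a` must be a power of two. */
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical restatement of the size computation in pbus_size_io();
 * `isa` mirrors defined(CONFIG_ISA) || defined(CONFIG_EISA). */
static unsigned long io_window_size(unsigned long size, unsigned long size1,
				    bool isa)
{
	if (isa)	/* low byte kept; each higher 256-byte chunk costs 1 KB */
		size = (size & 0xff) + ((size & ~0xffUL) << 2);
	return ALIGN(size + size1, 4096UL);
}
```

For example, 0x180 bytes becomes 0x80 + 0x400 = 0x480 on ISA, which ALIGN() then rounds up to 0x1000. A result of 0 corresponds to the `if (!size)` branch that clears b_res->flags.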

Signed-off-by: Milind Arun Choudhary <[EMAIL PROTECTED]>

---
 hotplug/cpci_hotplug_pci.c |2 --
 setup-bus.c|8 +++-
 2 files changed, 3 insertions(+), 7 deletions(-)


diff --git a/drivers/pci/hotplug/cpci_hotplug_pci.c b/drivers/pci/hotplug/cpci_hotplug_pci.c
index 7b1beaa..5e9be44 100644
--- a/drivers/pci/hotplug/cpci_hotplug_pci.c
+++ b/drivers/pci/hotplug/cpci_hotplug_pci.c
@@ -45,8 +45,6 @@ extern int cpci_debug;
 #define info(format, arg...) printk(KERN_INFO "%s: " format "\n", MY_NAME , ## arg)
 #define warn(format, arg...) printk(KERN_WARNING "%s: " format "\n", MY_NAME , ## arg)
 
-#define ROUND_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))
-
 
 u8 cpci_get_attention_status(struct slot* slot)
 {
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 3554f39..e02766f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -34,8 +34,6 @@
 #define DBG(x...)
 #endif
 
-#define ROUND_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))
-
 static void __devinit
 pbus_assign_resources_sorted(struct pci_bus *bus)
 {
@@ -314,7 +312,7 @@ pbus_size_io(struct pci_bus *bus)
 #if defined(CONFIG_ISA) || defined(CONFIG_EISA)
size = (size & 0xff) + ((size & ~0xffUL) << 2);
 #endif
-   size = ROUND_UP(size + size1, 4096);
+   size = ALIGN(size + size1, 4096);
if (!size) {
b_res->flags = 0;
return;
@@ -383,11 +381,11 @@ pbus_size_mem(struct pci_bus *bus, unsigned long mask, unsigned long type)
 
if (!align)
min_align = align1;
-   else if (ROUND_UP(align + min_align, min_align) < align1)
+   else if (ALIGN(align + min_align, min_align) < align1)
min_align = align1 >> 1;
align += aligns[order];
}
-   size = ROUND_UP(size, min_align);
+   size = ALIGN(size, min_align);
if (!size) {
b_res->flags = 0;
return 1;

--
Milind Arun Choudhary


Re: [patch] remove artificial software max_loop limit

2007-04-01 Thread devzero
Not sure if this is a real issue, or whether it's UML- or loop-related, but how
are low-memory situations handled when creating loop devices?

Should losetup or dmesg report "out of memory" if there is not enough memory
left?

I fired up my 2.6.20 UML and tried to create lots of loop devices.

This crashed my UML very soon, at around 200 devices; then I saw my
"mistake": my UML had only 32MB of RAM.

Then I gave my UML mem=256M, and now I can set up many more loop devices, but
it still crashes in the end:

setting up loop-device 1962 with losetup
Kernel panic - not syncing: do_fork failed in kernel_thread, errno = -11

EIP: 0073:[] CPU: 0 Not tainted ESP: 007b:b7e6ffb0 EFLAGS: 0246
Not tainted
EAX:  EBX: 72b3 ECX: 0013 EDX: 72b3
ESI: 72af EDI: 0011 EBP:  DS: 007b ES: 007b
087e7eac:  [<0807ca7b>] notifier_call_chain+0x1d/0x33
087e7ec8:  [<08071416>] panic+0x52/0xdd
087e7ee4:  [<0805cd74>] kernel_thread+0x5d/0x5f
087e7ef4:  [<0808217f>] keventd_create_kthread+0x1a/0x48
087e7ef8:  [<080820cb>] kthread+0x0/0x9a
087e7f10:  [<0807f5fb>] run_workqueue+0x8a/0x11f
087e7f18:  [<08082165>] keventd_create_kthread+0x0/0x48
087e7f1c:  [<08068351>] set_signals+0x1d/0x32
087e7f2c:  [<0807f690>] worker_thread+0x0/0x14e
087e7f30:  [<0807f7a1>] worker_thread+0x111/0x14e
087e7f74:  [<0806e771>] default_wake_function+0x0/0x12
087e7f98:  [<0808213f>] kthread+0x74/0x9a
087e7fbc:  [<080679bf>] run_kernel_thread+0x38/0x41
087e7fd8:  [<080679a2>] run_kernel_thread+0x1b/0x41
087e7fe4:  [<0805f92f>] new_thread_handler+0x53/0x79
087e7fe8:  [<080820cb>] kthread+0x0/0x9a


regards
roland






> -Original Message-
> From: [EMAIL PROTECTED]
> Sent: 01.04.07 11:16:14
> To: linux-kernel@vger.kernel.org
> Subject: Re: [patch] remove artificial software max_loop limit 


> >Remove artificial maximum 256 loop device that can be created due to a
> >legacy device number limit.  Searching through lkml archive, there are
> >several instances where users complained about the artificial limit
> >that the loop driver impose.  There is no reason to have such limit.
> 
> Hey, I was one of those :)
> 
> Nice to see that it's solved now, thanks very much!
> 
> I never expected this to happen and put all my hope into dm-loop instead.
> Did you consider that Bryn M. Reeves from Red Hat will suffer a serious depression 
> now? ( ->  http://sources.redhat.com/lvm2/wiki/DMLoop ) ;)
> 
> regards
> roland
> 




Re: [PATCH 1/4] x86_64: Switch to SPARSE_VIRTUAL

2007-04-01 Thread Andi Kleen
On Sunday 01 April 2007 09:10, Christoph Lameter wrote:
> x86_64 make SPARSE_VIRTUAL the default
> 
> x86_64 is using 2M page table entries to map its 1-1 kernel space.
> We implement the virtual memmap also using 2M page table entries.
> So there is no difference at all to FLATMEM. Both schemes require
> a page table and a TLB.

Hmm, does this mean there is at least 2MB worth of struct page on every node?
Or do you have overlaps with other memory (I think you have)?
In that case you have to handle the overlap in change_page_attr().

Also your "generic" vmemmap code doesn't look very generic, but
rather x86 specific. I didn't think huge pages could be easily
set up this way in many other architectures.  

And when you reserve virtual space somewhere you should 
update Documentation/x86_64/mm.txt. Also you didn't adjust 
the end of the vmalloc area so in theory vmalloc could run
into your vmemmap.

> Thus the SPARSEMEM becomes the most efficient way of handling
> virt_to_page, pfn_to_page and friends for UP, SMP and NUMA.

Do you have any benchmark numbers to prove it? There seem to be a few
benchmarks where the discontig virt_to_page is a problem
(although I know ways to make it more efficient), and sparsemem
is normally slower. Still, some numbers would be good.

-Andi



[PATCH] kbuild: fix dependency generation

2007-04-01 Thread Sam Ravnborg
From: Jan Beulich <[EMAIL PROTECTED]>

Commit 2e3646e51b2d6415549b310655df63e7e0d7a080 changed the way
the split config tree is built, but failed to also adjust fixdep
accordingly - if changing a config option from or to m, files
referencing the respective CONFIG_..._MODULE (but not the
corresponding CONFIG_...) didn't get rebuilt.

The problem is that tristate symbols are represented in three
different ways:
SYMBOL=n => no symbol defined
SYMBOL=y => CONFIG_SYMBOL defined to '1'
SYMBOL=m => CONFIG_SYMBOL_MODULE defined to '1'

But conf_split_config does not distinguish between the =y and =m cases,
so only the =y case is honoured.
This is fixed in fixdep: when a CONFIG symbol with a
_MODULE suffix is found, we strip that suffix and only look
for the CONFIG_SYMBOL version.

Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>
---

The patch can be pulled from:
ssh://master.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fix.git
[Contains only this fix]

Included below for reference and for -stable.

This should be applied to both latest -rc and stable.

Sam


 scripts/basic/fixdep.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/scripts/basic/fixdep.c b/scripts/basic/fixdep.c
index 668a11a..24144a2 100644
--- a/scripts/basic/fixdep.c
+++ b/scripts/basic/fixdep.c
@@ -28,9 +28,11 @@
  * the dependency on linux/autoconf.h by a dependency on every config
  * option which is mentioned in any of the listed prequisites.
  *
- * To be exact, split-include populates a tree in include/config/,
- * e.g. include/config/his/driver.h, which contains the #define/#undef
- * for the CONFIG_HIS_DRIVER option.
+ * kconfig populates a tree in include/config/ with an empty file
+ * for each config symbol and when the configuration is updated
+ * the files representing changed config options are touched
+ * which then lets make pick up the changes, and the files that use
+ * the config symbols are rebuilt.
  *
  * So if the user changes his CONFIG_HIS_DRIVER option, only the objects
  * which depend on "include/linux/config/his/driver.h" will be rebuilt,
@@ -245,6 +247,8 @@ void parse_config_file(char *map, size_t len)
continue;
 
found:
+   if (!memcmp(q - 7, "_MODULE", 7))
+   q -= 7;
use_config(p+7, q-p-7);
}
 }
-- 
1.5.1.rc3.20.gaa453



Re: Fwd: kswapd issues + kernel 2.4.21-32.0.1.ELsmp

2007-04-01 Thread Dan Aloni
On Sat, Mar 31, 2007 at 01:11:29AM -0700, Pedram M wrote:
> Hi,
> 
> I've seen this around, and have heard about it in forums and else-where,
> could somebody enlighten me with more information or with experiences
> they have had.  Looks like kswapd begins to eat CPU dramatically till the
> box eventually locks up.

My experience with kswapd eating a lot of CPU is that this particular 
problem with Linux 2.4 manifests itself in this scenario:

 * No swap is configured (we are blessed with RAM).

_AND_:

 * A lot of memory is allocated (for my case: a lot of anonymous pages 
used for applications).

Looking at this problem with kdb, I noticed that the CPU spends a lot 
of time in swap_out(), even though there are no swap devices...

So...

mm/vmscan.c, replace swap_out with this:

static int swap_out(zone_t * classzone)
{
return 0;
}

Now, this might raise some eyebrows for those MM gurus reading this
mail, but I assure you I've been doing this for a long time (and on
a lot of machines) on 2.4.27, and it took care of my kswapd issues
quite nicely.

I hope it helps...

-- 
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il


Re: mcdx -- do_request(): non-read command to cd!!

2007-04-01 Thread Pekka Enberg

On 4/1/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:

Looks like mcdx_xfer is sleeping while holding q->queue_lock. The
attached (untested) patch should fix it.


You also need to add a spin_lock_irq() before the call to
end_request() to Jens' patch.


Re: mcdx -- do_request(): non-read command to cd!!

2007-04-01 Thread Pekka Enberg

On 3/31/07, Rene Herman <[EMAIL PROTECTED]> wrote:

There's quite a bit of noise in dmesg though. Repeated 5 times:

===BUG: scheduling while atomic: mount/0x0001/1166
  [] __sched_text_start+0x57/0x574
  [] schedule_timeout+0x70/0x8f
  [] process_timeout+0x0/0x5
  [] interruptible_sleep_on_timeout+0x5d/0xa5
  [] default_wake_function+0x0/0xc
  [] mcdx_xfer+0xae/0x2a5 [mcdx]


[snip]


  [] do_mcdx_request+0x9b/0xd2 [mcdx]
  [] __generic_unplug_device+0x1d/0x1f
  [] generic_unplug_device+0x11/0x29


Looks like mcdx_xfer is sleeping while holding q->queue_lock. The
attached (untested) patch should fix it.


[Attachment: mcdx-drop-queue_lock-before-sleeping (binary data)]


Re: cannot add device to partitioned raid6 array

2007-04-01 Thread Neil Brown
On Sunday April 1, [EMAIL PROTECTED] wrote:
> Neil Brown wrote:
> > On Saturday March 31, [EMAIL PROTECTED] wrote:
> >> hi list!
> >>
> >> in short:
> >> I created a partitioned raid6 array with 2 missing drives. Now, I want to 
> >> add a device. It fails with:
> >> flockmock ~ # mdadm -a /dev/md_d4 /dev/sdb2
> >> mdadm: add new device failed for /dev/sdb2 as 4: Invalid argument
> > 
> > Thanks for the detailed problem report.
> > 
> > I think the cause of the error is that /dev/sdb2 is too small.
> > It needs to be at least 490030594 sectors. How big is it?
  ^^^
I should have said 490030594 sectors or 245015297 KB, which would have
made it clearer, sorry.

> 
> but the *device* size should be only ~250GB, so the array size is ~500GB, no?
> 
> flockmock ~ # mdadm --detail /dev/md_d4
> /dev/md_d4:
>  Version : 00.90.03
>Creation Time : Sat Mar 31 19:48:58 2007
>   Raid Level : raid6
>   Array Size : 490030464 (467.33 GiB 501.79 GB)
>Used Dev Size : 245015232 (233.66 GiB 250.90 GB)
>^^
> 
> these disks are all 250 GB harddisks, formatted in the same way -- but they 
> are from different 
> vendors! The block sizes from the partitions vary a little bit (taken from 
> /sbin/fdisk):
> 
> /dev/sda2: 245015347
> /dev/sdc2: 245015347
> /dev/sdb2: 244099642  -> different vendor
> 
> do you think that may be the cause?

Definitely the cause.  If you really need to add this drive to the array,
you may be able to reduce the usage of the array, then reduce the size of
the array, then add the drive.
Depending on how you have partitioned the array, and how you are using
the partitions, you may just need to reduce the filesystem in the last
partition, then use *fdisk to resize the partition.  Then something
like:

   mdadm --grow --size=24300 /dev/md_d4

It is generally safer to shrink the filesystem by more than needed, resize
the device, then grow the filesystem back up to the size of the device.
That avoids fiddly arithmetic and so reduces the chance of failure.

NeilBrown



Re: missing kretprobes support on avr32 and sparc64

2007-04-01 Thread Håvard Skinnemoen
On Sat, March 31, 2007 15:15, Christoph Hellwig wrote:
> Currently all avr32 and sparc64 don't support kretprobes unlike all
> other kprobes supporting architectures.  This is not nice from the
> user interface point of view and from the ugly ifdefs point of view.
> Is there a reason these ports can't support kretprobes or was this
> simply an oversight / lazyness?

Laziness on my part, I guess. It shouldn't be too hard to implement on
avr32; it looks like the generic code sets a breakpoint on function entry,
so all I have to do is set a breakpoint on the return address and
handle the breakpoint somehow. I'll try to implement it when I get back to
work in about a week.

Haavard



Re: cannot add device to partitioned raid6 array

2007-04-01 Thread Florian D.

Neil Brown wrote:

On Saturday March 31, [EMAIL PROTECTED] wrote:

hi list!

in short:
I created a partitioned raid6 array with 2 missing drives. Now, I want to add a 
device. It fails with:
flockmock ~ # mdadm -a /dev/md_d4 /dev/sdb2
mdadm: add new device failed for /dev/sdb2 as 4: Invalid argument


Thanks for the detailed problem report.

I think the cause of the error is that /dev/sdb2 is too small.
It needs to be at least 490030594 sectors. How big is it?


but the *device* size should be only ~250GB, so the array size is ~500GB, no?

flockmock ~ # mdadm --detail /dev/md_d4
/dev/md_d4:
Version : 00.90.03
  Creation Time : Sat Mar 31 19:48:58 2007
 Raid Level : raid6
 Array Size : 490030464 (467.33 GiB 501.79 GB)
  Used Dev Size : 245015232 (233.66 GiB 250.90 GB)
  ^^

these disks are all 250 GB harddisks, formatted in the same way -- but they are from different 
vendors! The block sizes from the partitions vary a little bit (taken from /sbin/fdisk):


/dev/sda2: 245015347
/dev/sdc2: 245015347
/dev/sdb2: 244099642  -> different vendor

do you think that may be the cause?

thanks a lot,
florian



Re: [RFC PATCH 0/5] x86_64: enable clockevents and dynticks

2007-04-01 Thread Thomas Gleixner
On Sat, 2007-03-31 at 01:31 -0700, Chris Wright wrote:
> This series converts x86_64 timers to clockevents drivers
> and then enables dynticks.  There's some minor cleanups along
> the way.  The lapic broadcast mechanism is untested, I'm sure it
> still needs work, there's still some cruft in lapic_setup_timer.
> 
> This is just for comments at this point, now that it's working
> on my test box in both NO_HZ=y and NO_HZ=n configurations (typically
> using hpet).

Have you checked whether we could share the code between i386 and x86_64,
at least for PIT and HPET? I'm not sure about the local APIC, but I think
it might be doable as well.

tglx




Re: [patch] remove artificial software max_loop limit

2007-04-01 Thread devzero
>Remove artificial maximum 256 loop device that can be created due to a
>legacy device number limit.  Searching through lkml archive, there are
>several instances where users complained about the artificial limit
>that the loop driver impose.  There is no reason to have such limit.

Hey, I was one of those :)

Nice to see that it's solved now, thanks very much!

I never expected this to happen and put all my hope into dm-loop instead.
Did you consider that Bryn M. Reeves from Red Hat will suffer a serious depression 
now? ( ->  http://sources.redhat.com/lvm2/wiki/DMLoop ) ;)

regards
roland



Re: [patch 6/13] signal/timer/event fds v9 - timerfd core ...

2007-04-01 Thread Thomas Gleixner
On Sat, 2007-03-31 at 13:09 -0700, Davide Libenzi wrote:
> +/*
> + * This gets called when the timer event triggers. We increment the
> + * tick count and wake the possible waiters. If the timer in a
> + * sequential one (->tintv.tv64 != 0), we re-arm it with hrtimer_forward().
> + */
> +static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
> +{
> + struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
> + enum hrtimer_restart rval = HRTIMER_NORESTART;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&ctx->lock, flags);
> + ctx->ticks++;
> + wake_up_locked(&ctx->wqh);
> + if (ctx->tintv.tv64 != 0) {
> + hrtimer_forward(htmr, hrtimer_cb_get_time(htmr), ctx->tintv);
> + rval = HRTIMER_RESTART;
> + }
> + spin_unlock_irqrestore(&ctx->lock, flags);
> +
> + return rval;
> +}

For periodic timers we probably also want to know about missed ticks,
i.e. when the timer was delayed.

I recently changed the rearm handling code of itimers to prevent DoS
attacks. See commit 8bfd9a7a229b5f3d3eda5d7d45c2eebec5b4ba16. The posix
timer code has a similar mechanism.

We should probably do the same here, which means deferring the
restart of the timer to process context.

tglx

static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
{
	struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
	unsigned long flags;

	spin_lock_irqsave(&ctx->lock, flags);
	ctx->expired = 1;
	wake_up_locked(&ctx->wqh);
	spin_unlock_irqrestore(&ctx->lock, flags);

	return HRTIMER_NORESTART;
}

static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
			    loff_t *ppos)
{
	struct timerfd_ctx *ctx = file->private_data;
	ssize_t res;
	u32 ticks = 0;
	DECLARE_WAITQUEUE(wait, current);

	if (count < sizeof(ticks))
		return -EINVAL;
	spin_lock_irq(&ctx->lock);
	res = -EAGAIN;
	if (!ctx->expired && !(file->f_flags & O_NONBLOCK)) {
		__add_wait_queue(&ctx->wqh, &wait);
		for (res = 0;;) {
			set_current_state(TASK_INTERRUPTIBLE);
			if (ctx->expired) {
				res = 0;
				break;
			}
			if (signal_pending(current)) {
				res = -ERESTARTSYS;
				break;
			}
			spin_unlock_irq(&ctx->lock);
			schedule();
			spin_lock_irq(&ctx->lock);
		}
		__remove_wait_queue(&ctx->wqh, &wait);
		__set_current_state(TASK_RUNNING);
	}
	if (ctx->expired) {
		ctx->expired = 0;
		if (ctx->tintv.tv64 != 0) {
			ticks = hrtimer_forward(&ctx->tmr, ktime_get(),
						ctx->tintv);
			hrtimer_restart(&ctx->tmr);
		} else
			ticks = 1;
	}
	spin_unlock_irq(&ctx->lock);
	if (ticks)
		res = put_user(ticks, buf) ? -EFAULT: sizeof(ticks);
	return res;
}




Re: [PATCH] hid: add two led codes to hid input mapping

2007-04-01 Thread Jiri Kosina
On Sat, 31 Mar 2007, Dan Engel wrote:

> This patch is really being offered because it's what's needed to make the 
> operation
> of the Belkin Flip USB KVM switch avaiable to user-space programs through the 
> HID input
> event interface. The Belkin Flip KVM overloads LED usages to give software 
> control
> over the device, providing options to flip either audio, video or both. 
> However,
> without an input mapping to the Off-hook and Speaker LED usages, this 
> functionality
> isn't available.

Dmitry, would adding these two LED_ constants to input.h be OK with you?
(The corresponding usages are defined in HUT 1.12 on page 62.)

Thanks,

-- 
Jiri Kosina


Re: + clocksource-driver-initialize-list-value.patch added to -mm tree

2007-04-01 Thread Thomas Gleixner
On Sat, 2007-03-31 at 22:23 -0700, [EMAIL PROTECTED] wrote:
> The patch titled
>  clocksource: driver initialize list value
> has been added to the -mm tree.  Its filename is
>  clocksource-driver-initialize-list-value.patch
> 
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
> 
> See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
> out what to do about this
> 
> --
> Subject: clocksource: driver initialize list value
> From: Daniel Walker <[EMAIL PROTECTED]>
> 
> Update drivers/clocksource/ with list initialization.

As others have already pointed out, can we please have a useful
description of why this change is necessary, i.e. that it is a preparatory
patch to simplify the clocksource registration logic?

tglx




[PATCH] x86_64: fix arithmetic in comment

2007-04-01 Thread Avi Kivity
The xmm space on x86_64 is 256 bytes.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>

diff --git a/include/asm-x86_64/processor.h b/include/asm-x86_64/processor.h
index 76552d7..90eb24e 100644
--- a/include/asm-x86_64/processor.h
+++ b/include/asm-x86_64/processor.h
@@ -201,7 +201,7 @@ struct i387_fxsave_struct {
u32 mxcsr;
u32 mxcsr_mask;
u32 st_space[32];   /* 8*16 bytes for each FP-reg = 128 bytes */
-   u32 xmm_space[64];  /* 16*16 bytes for each XMM-reg = 128 bytes */
+   u32 xmm_space[64];  /* 16*16 bytes for each XMM-reg = 256 bytes */
u32 padding[24];
 } __attribute__ ((aligned (16)));
 


RE: [-mm3 PATCH] (Retry) Check the return value of kobject_add and etc.

2007-04-01 Thread Parag Warudkar

-   kobject_add(&p->kobj);
+   if (kobject_add(&p->kobj)) {
+   kfree(p);
+   return;


Please add a printk warning before the return statement to log a
proper warning stating what happened, which file and line etc. That
way people can know why something did not work as expected and
hopefully do something about it.

Parag


Re: [4/4] 2.6.21-rc5: known regressions (v2)

2007-04-01 Thread Michael S. Tsirkin
> Subject: after resume: X hangs after drawing a couple of windows
>  workaround: clocksource=acpi_pm
> References : http://lkml.org/lkml/2007/3/8/117
>  http://lkml.org/lkml/2007/3/25/20
>  http://lkml.org/lkml/2007/3/26/151
> Submitter  : Michael S. Tsirkin <[EMAIL PROTECTED]>
> Handled-By : Thomas Gleixner <[EMAIL PROTECTED]>
> Status : problem is being debugged

Adrian,
The bug was found by Maxim Levitsky, and
the following patch appears to have fixed the problem:
http://lkml.org/lkml/2007/3/28/104

The right way to fix it is still being discussed:
http://lkml.org/lkml/2007/3/28/182


-- 
MST


RE: cifs causes BUG: soft lockup detected on CPU

2007-04-01 Thread Valentin Zaharov
Hi again,

After manually applying the changes from the link that Steven sent to
2.6.20.4, I still get those errors (attached below), but no crash so
far.
I am wondering whether that is OK, or whether these errors will still cause freezes.

Thanks in advance,

Apr  1 04:23:49 UFR2 kernel:  CIFS VFS: cifs_strtoUCS: char2uni returned -22
Apr  1 04:37:14 UFR2 last message repeated 30 times
Apr  1 04:38:53 UFR2 last message repeated 10 times
Apr  1 04:40:33 UFR2 last message repeated 10 times
Apr  1 04:45:31 UFR2 last message repeated 10 times
Apr  1 04:47:11 UFR2 last message repeated 10 times
Apr  1 05:00:32 UFR2 last message repeated 10 times
Apr  1 05:07:21 UFR2 last message repeated 10 times
Apr  1 05:21:50 UFR2 last message repeated 10 times
Apr  1 05:23:29 UFR2 last message repeated 10 times
Apr  1 05:29:07 UFR2 last message repeated 10 times
Apr  1 05:30:58 UFR2 last message repeated 10 times
Apr  1 05:30:58 UFR2 last message repeated 9 times
Apr  1 05:55:33 UFR2 kernel:  CIFS VFS: Error 0xfff3 on cifs_get_inode_info in lookup of \nv322657
Apr  1 05:55:33 UFR2 kernel:  CIFS VFS: Error 0xfff3 on cifs_get_inode_info in lookup of \nv322657
Apr  1 06:28:41 UFR2 kernel:  CIFS VFS: cifs_strtoUCS: char2uni returned -22
Apr  1 07:24:36 UFR2 last message repeated 10 times
Apr  1 07:26:29 UFR2 last message repeated 10 times
Apr  1 07:28:17 UFR2 last message repeated 10 times
Apr  1 07:30:09 UFR2 last message repeated 10 times
Apr  1 07:35:45 UFR2 last message repeated 10 times
Apr  1 07:44:46 UFR2 last message repeated 10 times
Apr  1 08:23:10 UFR2 last message repeated 10 times
Apr  1 08:24:52 UFR2 last message repeated 10 times
Apr  1 10:04:37 UFR2 last message repeated 10 times
Apr  1 10:04:37 UFR2 last message repeated 8 times
Apr  1 10:36:17 UFR2 kernel:  CIFS VFS: Error 0xfff3 on cifs_get_inode_info in lookup of \nv322657
Apr  1 10:36:18 UFR2 last message repeated 4 times
Apr  1 10:38:19 UFR2 kernel:  CIFS VFS: cifs_strtoUCS: char2uni returned -22 


Valentin Zaharov

System Department

NetVision Ltd.

___

Tel 04-8560350 Email  [EMAIL PROTECTED]

NetVision Ltd. Omega Center, Matam Haifa 31905


-Original Message-
From: Steven French [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 29, 2007 5:06 PM
To: Andrew Morton
Cc: Valentin Zaharov; linux-kernel@vger.kernel.org
Subject: Re: cifs causes BUG: soft lockup detected on CPU

This should be fixed in 2.6.21 as of 2/26.  The fix was
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3677db10a635a39f63ea509f8f0056d95589ff90


Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com

Andrew Morton <[EMAIL PROTECTED]> wrote on 03/29/2007 03:54:57 AM:

> On Wed, 28 Mar 2007 20:35:55 +0200 "Valentin Zaharov" 
> <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > We have a continuous problem with server freezes. We are using cifs mounts
> > on apache powered web servers with content located on Win2k3 server.
> > Servers freeze from time to time, producing the following error just before
> > freeze:
> > 
> > Mar 26 21:50:37 UFR2 kernel:  CIFS VFS: cifs_strtoUCS: char2uni returned -22
> > Mar 26 21:51:45 UFR2 last message repeated 55 times
> > Mar 26 21:52:49 UFR2 last message repeated 30 times
> > Mar 26 21:54:16 UFR2 last message repeated 10 times
> > Mar 26 21:56:13 UFR2 last message repeated 20 times
> > Mar 26 21:58:34 UFR2 last message repeated 75 times
> > Mar 26 21:59:43 UFR2 last message repeated 30 times
> > Mar 26 22:01:02 UFR2 last message repeated 30 times
> > Mar 26 22:02:04 UFR2 last message repeated 30 times
> > Mar 26 22:03:08 UFR2 last message repeated 50 times
> > Mar 26 22:04:27 UFR2 last message repeated 10 times
> > Mar 26 22:05:59 UFR2 last message repeated 20 times
> > Mar 26 22:07:10 UFR2 last message repeated 20 times
> > Mar 26 22:29:00 UFR2 last message repeated 64 times
> > Mar 27 00:47:40 UFR2 last message repeated 15 times
> > Mar 27 01:42:41 UFR2 last message repeated 95 times
> > Mar 27 02:15:57 UFR2 last message repeated 90 times
> > Mar 27 02:27:13 UFR2 last message repeated 45 times
> > Mar 27 03:14:08 UFR2 last message repeated 95 times
> > Mar 27 04:26:10 UFR2 last message repeated 2 times
> > Mar 27 06:11:35 UFR2 last message repeated 45 times
> > Mar 27 06:20:20 UFR2 last message repeated 15 times
> > Mar 27 06:20:20 UFR2 last message repeated 12 times
> > Mar 27 06:27:53 UFR2 kernel: BUG: soft lockup detected on CPU#3!
> > Mar 27 06:27:53 UFR2 kernel:  [] softlockup_tick+0x9e/0xac
> > Mar 27 06:27:53 UFR2 kernel:  [] update_process_times+0x3b/0x5e
> > Mar 27 06:27:53 UFR2 kernel:  [] smp_apic_timer_interrupt+0x6c/0x7a
> > Mar 27 06:27:53 UFR2 kernel:  [] apic_timer_interrupt+0x28/0x30
> > Mar 27 06:27:53 UFR2 kernel:  [] generic_fillattr+0x75/0xa8
> > Mar 27 06:27:53 UFR2 kernel:  [] cifs_getattr+0x1e/0x2b [cifs]
> > Mar 27 06:27:53 UFR2 kernel:  [] cifs_getattr+0x0/0x2b 

[KJ][PATCH] ROUND_UP macro cleanup in arch/parisc

2007-04-01 Thread Milind Arun Choudhary
ROUND_UP macro cleanup, use ALIGN wherever appropriate

Signed-off-by: Milind Arun Choudhary <[EMAIL PROTECTED]>

---
 hpux/fs.c |5 ++---
 kernel/sys_parisc32.c |3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)



diff --git a/arch/parisc/hpux/fs.c b/arch/parisc/hpux/fs.c
index 4204cd1..7ff5546 100644
--- a/arch/parisc/hpux/fs.c
+++ b/arch/parisc/hpux/fs.c
@@ -21,6 +21,7 @@
  *Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -70,7 +71,6 @@ struct getdents_callback {
 };
 
 #define NAME_OFFSET(de) ((int) ((de)->d_name - (char *) (de)))
-#define ROUND_UP(x) (((x)+sizeof(long)-1) & ~(sizeof(long)-1))
 
 static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
u64 ino, unsigned d_type)
@@ -78,7 +78,7 @@ static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
struct hpux_dirent * dirent;
struct getdents_callback * buf = (struct getdents_callback *) __buf;
ino_t d_ino;
-   int reclen = ROUND_UP(NAME_OFFSET(dirent) + namlen + 1);
+   int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 1,sizeof(long));
 
buf->error = -EINVAL;   /* only used if we fail.. */
if (reclen > buf->count)
@@ -103,7 +103,6 @@ static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
 }
 
 #undef NAME_OFFSET
-#undef ROUND_UP
 
 int hpux_getdents(unsigned int fd, struct hpux_dirent *dirent, unsigned int count)
 {
diff --git a/arch/parisc/kernel/sys_parisc32.c b/arch/parisc/kernel/sys_parisc32.c
index ce3245f..e590880 100644
--- a/arch/parisc/kernel/sys_parisc32.c
+++ b/arch/parisc/kernel/sys_parisc32.c
@@ -311,14 +311,13 @@ struct readdir32_callback {
int count;
 };
 
-#define ROUND_UP(x,a)  ((__typeof__(x))(((unsigned long)(x) + ((a) - 1)) & ~((a) - 1)))
 #define NAME_OFFSET(de) ((int) ((de)->d_name - (char __user *) (de)))
 static int filldir32 (void *__buf, const char *name, int namlen,
loff_t offset, u64 ino, unsigned int d_type)
 {
struct linux32_dirent __user * dirent;
struct getdents32_callback * buf = (struct getdents32_callback *) __buf;
-   int reclen = ROUND_UP(NAME_OFFSET(dirent) + namlen + 1, 4);
+   int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 1, 4);
u32 d_ino;
 
buf->error = -EINVAL;   /* only used if we fail.. */

-- 
Milind Arun Choudhary


Re: plain 2.6.21-rc5 (1) vs amanda (0)

2007-04-01 Thread Ray Lee

On 3/31/07, Gene Heskett <[EMAIL PROTECTED]> wrote:

This effect I have isolated down to something in the 31 patches from
2.6.20.4 to 2.6.20.5-rc1, but I'm going to need additional guidance in
setting up the bisect to find it.  If indeed its a kernel problem.


First, set up a *small* test case, for your own sanity as well as
ours. (Set up a new backup job that does just part of your home
directory, for example. No, even better, just one file.) Then verify
that the small test case also fails the same way that you noticed your
big one does between 2.6.20.3 and 2.6.20.4.

Then, download everything in http://madrabbit.org/~ray/2.6.20.4 . That
has all the patches that Greg has in git, but your git is ancient so
let's just use the patches, hmm? It also has a control file (called
'series') that lists the order they should be applied in. Save
everything to the root of your 2.6.20.3 source tree. It'll be messy,
but it'll make things easier.

Once you have that, go and apply the first half of the patches, as in:
  head -n 16 series | xargs -n 1 patch -p1 -i
at the base of the tree.

Compile and install that kernel, and run your test case to see if the
problem is there. If it *is*, cut the range in half again: revert those 16
patches in reverse order (head -n 16 series | tac | xargs -n 1 patch -p1 -R -i),
then redo the above command with an 8 instead of a 16. If the problem
isn't there, cut the range [16,31] in half, giving you a 24 for the
next trial. Then repeat. Make sense?

Ray


[-mm3 PATCH] (Retry) Check the return value of kobject_add and etc.

2007-04-01 Thread Cong WANG

Since kobject_add, sysfs_create_link and sysfs_create_file are marked
as '__must_check', we must always check their return values.

Signed-off-by: Cong WANG <[EMAIL PROTECTED]>

---
--- linux-2.6.21-rc5-mm3/fs/partitions/check.c.orig 2007-03-30 21:35:45.0 +0800
+++ linux-2.6.21-rc5-mm3/fs/partitions/check.c  2007-03-30 21:49:53.0 +0800
@@ -385,10 +385,16 @@ void add_partition(struct gendisk *disk,
p->kobj.parent = &disk->kobj;
p->kobj.ktype = &ktype_part;
kobject_init(&p->kobj);
-   kobject_add(&p->kobj);
+   if (kobject_add(&p->kobj)) {
+   kfree(p);
+   return;
+   }
if (!disk->part_uevent_suppress)
kobject_uevent(&p->kobj, KOBJ_ADD);
-   sysfs_create_link(&p->kobj, &block_subsys.kset.kobj, "subsystem");
+   if (sysfs_create_link(&p->kobj, &block_subsys.kset.kobj, "subsystem")) {
+   kfree(p);
+   return;
+   }
if (flags & ADDPART_FLAG_WHOLEDISK) {
static struct attribute addpartattr = {
.name = "whole_disk",
@@ -396,7 +402,10 @@ void add_partition(struct gendisk *disk,
.owner = THIS_MODULE,
};

-   sysfs_create_file(&p->kobj, &addpartattr);
+   if (sysfs_create_file(&p->kobj, &addpartattr)) {
+   kfree(p);
+   return;
+   }
}
partition_sysfs_add_subdir(p);
disk->part[part-1] = p;


Re: [PATCH 3/3] slab: avoid __initdata warning (may be a bogus one)

2007-04-01 Thread Andrew Morton
On Sun, 1 Apr 2007 00:15:06 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Sat, 31 Mar 2007, Andrew Morton wrote:
> 
> > Yes, I think this is a false positive - we'll never touch initkmem_list3[]
> > after free_initmem() because of the transitions of g_cpucache_up.
> 
> Correct.
> 
> > (In which case set_up_list3s() should be __init, too?)
> 
> Correct. It's only called during slab bootstrap.
> 
> > Christoph, I think you looked at this previously?
> 
> If you change set_up_list3s to __init then we have the same issue with
> setup_cpu_cache, right?

yup.

I wonder if there's a general way in which we can suppress such false
positives.  Say, create new sections called, umm, __nowarn and
__nowarndata, and then we can tag functions or data with those tags and
teach the checker tools to ignore them?




Re: [PATCH 3/3] slab: avoid __initdata warning (may be a bogus one)

2007-04-01 Thread Christoph Lameter
On Sat, 31 Mar 2007, Andrew Morton wrote:

> Yes, I think this is a false positive - we'll never touch initkmem_list3[]
> after free_initmem() because of the transitions of g_cpucache_up.

Correct.

> (In which case set_up_list3s() should be __init, too?)

Correct. It's only called during slab bootstrap.

> Christoph, I think you looked at this previously?

If you change set_up_list3s to __init then we have the same issue with
setup_cpu_cache, right?


[PATCH 1/2] Generic Virtual Memmap support for SPARSEMEM

2007-04-01 Thread Christoph Lameter
Sparse Virtual: Virtual Memmap support for SPARSEMEM

SPARSEMEM is a pretty nice framework that unifies quite a bit of
code across all the arches. It would be great if it could be the default
so that we can get rid of various forms of DISCONTIG and other variations
on memory maps. So far what has hindered this are the additional lookups
that SPARSEMEM introduces for virt_to_page and page_address. It goes
so far that the lookup code has to be kept in a separate function
and cannot be inlined.

This patch introduces virtual memmap support for sparsemem. virt_to_page,
page_address and friends become simple shift/add operations. No page flag
fields, no table lookups: nothing involving a memory access is required.

The two key operations pfn_to_page and page_to_pfn become:

#define pfn_to_page(pfn)  (vmemmap + (pfn))
#define page_to_pfn(page) ((page) - vmemmap)

In order for this to work we have to use a virtual mapping.
This usually comes for free since kernel memory is already mapped
via a 1-1 mapping that requires a page table. The virtual mapping must
be big enough to span all of the memory that an arch supports, which may
make a virtual memmap difficult to use on funky 32 bit platforms
that support 36 address bits.

However, if there is enough virtual space available and the arch
already maps its 1-1 kernel space using TLBs (e.g. true of IA64
and x86_64), then this technique makes sparsemem lookups as efficient
as CONFIG_FLATMEM.

Maybe this patch will allow us to make SPARSEMEM the default
configuration that will work on UP, SMP and NUMA on most platforms?
Then we may hopefully be able to remove the various forms of support
for FLATMEM, DISCONTIG etc etc.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm2/include/asm-generic/memory_model.h
===
--- linux-2.6.21-rc5-mm2.orig/include/asm-generic/memory_model.h 2007-03-31 22:47:14.0 -0700
+++ linux-2.6.21-rc5-mm2/include/asm-generic/memory_model.h 2007-03-31 22:59:35.0 -0700
@@ -47,6 +47,13 @@
 })
 
 #elif defined(CONFIG_SPARSEMEM)
+#ifdef CONFIG_SPARSE_VIRTUAL
+/*
+ * We have a virtual memmap that makes lookups very simple
+ */
+#define __pfn_to_page(pfn)  (vmemmap + (pfn))
+#define __page_to_pfn(page) ((page) - vmemmap)
+#else
 /*
  * Note: section's mem_map is encorded to reflect its start_pfn.
  * section[i].section_mem_map == mem_map's address - start_pfn;
@@ -62,6 +69,7 @@
struct mem_section *__sec = __pfn_to_section(__pfn);\
__section_mem_map_addr(__sec) + __pfn;  \
 })
+#endif
 #endif /* CONFIG_FLATMEM/DISCONTIGMEM/SPARSEMEM */
 
 #ifdef CONFIG_OUT_OF_LINE_PFN_TO_PAGE
Index: linux-2.6.21-rc5-mm2/mm/sparse.c
===
--- linux-2.6.21-rc5-mm2.orig/mm/sparse.c 2007-03-31 22:47:14.0 -0700
+++ linux-2.6.21-rc5-mm2/mm/sparse.c 2007-03-31 22:59:35.0 -0700
@@ -9,6 +9,7 @@
 #include <linux/spinlock.h>
 #include <linux/vmalloc.h>
 #include <asm/dma.h>
+#include <asm/pgalloc.h>
 
 /*
  * Permanent SPARSEMEM data:
@@ -101,7 +102,7 @@ static inline int sparse_index_init(unsi
 
 /*
  * Although written for the SPARSEMEM_EXTREME case, this happens
- * to also work for the flat array case becase
+ * to also work for the flat array case because
  * NR_SECTION_ROOTS==NR_MEM_SECTIONS.
  */
 int __section_nr(struct mem_section* ms)
@@ -211,6 +212,90 @@ static int sparse_init_one_section(struc
return 1;
 }
 
+#ifdef CONFIG_SPARSE_VIRTUAL
+
+void *vmemmap_alloc_block(unsigned long size, int node)
+{
+   if (slab_is_available()) {
+   struct page *page =
+   alloc_pages_node(node, GFP_KERNEL,
+   get_order(size));
+
+   BUG_ON(!page);
+   return page_address(page);
+   } else
+   return __alloc_bootmem_node(NODE_DATA(node), size, size,
+   __pa(MAX_DMA_ADDRESS));
+}
+
+
+#ifndef ARCH_POPULATES_VIRTUAL_MEMMAP
+/*
+ * Virtual memmap populate functionality for architectures that support
+ * PMDs for huge pages like i386, x86_64 etc.
+ */
+static void vmemmap_pop_pmd(pud_t *pud, unsigned long addr,
+   unsigned long end, int node)
+{
+   pmd_t *pmd;
+
+   end = pmd_addr_end(addr, end);
+
+   for (pmd = pmd_offset(pud, addr); addr < end;
+   pmd++, addr += PMD_SIZE) {
+   if (pmd_none(*pmd)) {
+   void *block;
+   pte_t pte;
+
+   block = vmemmap_alloc_block(PMD_SIZE, node);
+   pte = pfn_pte(__pa(block) >> PAGE_SHIFT,
+   PAGE_KERNEL);
+   pte = pte_mkdirty(pte);
+   pte = pte_mkwrite(pte);
+   pte = pte_mkyoung(pte);
+   mk_pte_huge(pte);
+   set_pmd(pmd, __pmd(pte_val(pte)));
+

[PATCH 1/4] x86_64: Switch to SPARSE_VIRTUAL

2007-04-01 Thread Christoph Lameter
x86_64: make SPARSE_VIRTUAL the default

x86_64 uses 2M page table entries to map its 1-1 kernel space.
We implement the virtual memmap with 2M page table entries as well,
so there is no difference at all from FLATMEM: both schemes require
a page table and a TLB.

Thus SPARSEMEM becomes the most efficient way of handling
virt_to_page, pfn_to_page and friends for UP, SMP and NUMA.

So change the Kconfig for x86_64 to make SPARSE_VIRTUAL the
default and switch off all other memory models.

Oh, and PFN_TO_PAGE used to be out of line. Since it is now
so simple, switch it back to inline.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.21-rc5-mm2.orig/arch/x86_64/Kconfig 2007-03-31 23:47:58.0 -0700
+++ linux-2.6.21-rc5-mm2/arch/x86_64/Kconfig 2007-03-31 23:48:41.0 -0700
@@ -380,25 +380,29 @@ config NUMA_EMU
  number of nodes. This is only useful for debugging.
 
 config ARCH_DISCONTIGMEM_ENABLE
-   bool
-   depends on NUMA
-   default y
+   def_bool n
 
 config ARCH_DISCONTIGMEM_DEFAULT
-   def_bool y
-   depends on NUMA
+   def_bool n
 
 config ARCH_SPARSEMEM_ENABLE
def_bool y
-   depends on (NUMA || EXPERIMENTAL)
+
+config SPARSEMEM_MANUAL
+   def_bool y
+
+config ARCH_SPARSE_VIRTUAL
+   def_bool y
+
+config SELECT_MEMORY_MODEL
+   def_bool n
 
 config ARCH_MEMORY_PROBE
def_bool y
depends on MEMORY_HOTPLUG
 
 config ARCH_FLATMEM_ENABLE
-   def_bool y
-   depends on !NUMA
+   def_bool n
 
 source "mm/Kconfig"
 
@@ -411,8 +415,7 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
depends on NUMA
 
 config OUT_OF_LINE_PFN_TO_PAGE
-   def_bool y
-   depends on DISCONTIGMEM
+   def_bool n
 
 config NR_CPUS
int "Maximum number of CPUs (2-255)"
Index: linux-2.6.21-rc5-mm2/include/asm-x86_64/page.h
===
--- linux-2.6.21-rc5-mm2.orig/include/asm-x86_64/page.h 2007-03-31 23:47:58.0 -0700
+++ linux-2.6.21-rc5-mm2/include/asm-x86_64/page.h 2007-03-31 23:48:41.0 -0700
@@ -135,6 +135,7 @@ typedef struct { unsigned long pgprot; }
 VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
 #define __HAVE_ARCH_GATE_AREA 1
+#define vmemmap ((struct page *)0xe200UL)
 
 #include <asm-generic/memory_model.h>
 #include <asm-generic/page.h>


Re: [PATCH 1/4] x86_64: Switch to SPARSE_VIRTUAL

2007-04-01 Thread Christoph Lameter
Argh. This should have been Patch 2/2. There is nothing in between.





[KJ][PATCH] ROUND_UP macro cleanup in arch/parisc

2007-04-01 Thread Milind Arun Choudhary
ROUND_UP macro cleanup: use ALIGN wherever appropriate.

Signed-off-by: Milind Arun Choudhary <[EMAIL PROTECTED]>

---
 hpux/fs.c |5 ++---
 kernel/sys_parisc32.c |3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)



diff --git a/arch/parisc/hpux/fs.c b/arch/parisc/hpux/fs.c
index 4204cd1..7ff5546 100644
--- a/arch/parisc/hpux/fs.c
+++ b/arch/parisc/hpux/fs.c
@@ -21,6 +21,7 @@
  *Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
+#include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/sched.h>
 #include <linux/file.h>
@@ -70,7 +71,6 @@ struct getdents_callback {
 };
 
 #define NAME_OFFSET(de) ((int) ((de)->d_name - (char *) (de)))
-#define ROUND_UP(x) (((x)+sizeof(long)-1) & ~(sizeof(long)-1))
 
 static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
u64 ino, unsigned d_type)
@@ -78,7 +78,7 @@ static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
struct hpux_dirent * dirent;
struct getdents_callback * buf = (struct getdents_callback *) __buf;
ino_t d_ino;
-   int reclen = ROUND_UP(NAME_OFFSET(dirent) + namlen + 1);
+   int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 1,sizeof(long));
 
buf->error = -EINVAL;   /* only used if we fail.. */
if (reclen > buf->count)
@@ -103,7 +103,6 @@ static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
 }
 
 #undef NAME_OFFSET
-#undef ROUND_UP
 
 int hpux_getdents(unsigned int fd, struct hpux_dirent *dirent, unsigned int count)
 {
diff --git a/arch/parisc/kernel/sys_parisc32.c b/arch/parisc/kernel/sys_parisc32.c
index ce3245f..e590880 100644
--- a/arch/parisc/kernel/sys_parisc32.c
+++ b/arch/parisc/kernel/sys_parisc32.c
@@ -311,14 +311,13 @@ struct readdir32_callback {
int count;
 };
 
-#define ROUND_UP(x,a)  ((__typeof__(x))(((unsigned long)(x) + ((a) - 1)) & ~((a) - 1)))
 #define NAME_OFFSET(de) ((int) ((de)->d_name - (char __user *) (de)))
 static int filldir32 (void *__buf, const char *name, int namlen,
loff_t offset, u64 ino, unsigned int d_type)
 {
struct linux32_dirent __user * dirent;
struct getdents32_callback * buf = (struct getdents32_callback *) __buf;
-   int reclen = ROUND_UP(NAME_OFFSET(dirent) + namlen + 1, 4);
+   int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 1, 4);
u32 d_ino;
 
buf->error = -EINVAL;   /* only used if we fail.. */

-- 
Milind Arun Choudhary


RE: cifs causes BUG: soft lockup detected on CPU

2007-04-01 Thread Valentin Zaharov
Hi again,

After manually applying the changes from the link Steven sent to
2.6.20.4, I still get those errors (attached below), but no crash so
far.
I am wondering whether that is OK, or whether these errors can still
lead to freezes.

Thanks in advance,

Apr  1 04:23:49 UFR2 kernel:  CIFS VFS: cifs_strtoUCS: char2uni returned -22
Apr  1 04:37:14 UFR2 last message repeated 30 times
Apr  1 04:38:53 UFR2 last message repeated 10 times
Apr  1 04:40:33 UFR2 last message repeated 10 times
Apr  1 04:45:31 UFR2 last message repeated 10 times
Apr  1 04:47:11 UFR2 last message repeated 10 times
Apr  1 05:00:32 UFR2 last message repeated 10 times
Apr  1 05:07:21 UFR2 last message repeated 10 times
Apr  1 05:21:50 UFR2 last message repeated 10 times
Apr  1 05:23:29 UFR2 last message repeated 10 times
Apr  1 05:29:07 UFR2 last message repeated 10 times
Apr  1 05:30:58 UFR2 last message repeated 10 times
Apr  1 05:30:58 UFR2 last message repeated 9 times
Apr  1 05:55:33 UFR2 kernel:  CIFS VFS: Error 0xfff3 on cifs_get_inode_info in lookup of \nv322657
Apr  1 05:55:33 UFR2 kernel:  CIFS VFS: Error 0xfff3 on cifs_get_inode_info in lookup of \nv322657
Apr  1 06:28:41 UFR2 kernel:  CIFS VFS: cifs_strtoUCS: char2uni returned -22
Apr  1 07:24:36 UFR2 last message repeated 10 times
Apr  1 07:26:29 UFR2 last message repeated 10 times
Apr  1 07:28:17 UFR2 last message repeated 10 times
Apr  1 07:30:09 UFR2 last message repeated 10 times
Apr  1 07:35:45 UFR2 last message repeated 10 times
Apr  1 07:44:46 UFR2 last message repeated 10 times
Apr  1 08:23:10 UFR2 last message repeated 10 times
Apr  1 08:24:52 UFR2 last message repeated 10 times
Apr  1 10:04:37 UFR2 last message repeated 10 times
Apr  1 10:04:37 UFR2 last message repeated 8 times
Apr  1 10:36:17 UFR2 kernel:  CIFS VFS: Error 0xfff3 on cifs_get_inode_info in lookup of \nv322657
Apr  1 10:36:18 UFR2 last message repeated 4 times
Apr  1 10:38:19 UFR2 kernel:  CIFS VFS: cifs_strtoUCS: char2uni returned -22


Valentin Zaharov

System Department

NetVision Ltd.

___

Tel 04-8560350 Email  [EMAIL PROTECTED]

NetVision Ltd. Omega Center, Matam Haifa 31905


-Original Message-
From: Steven French [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 29, 2007 5:06 PM
To: Andrew Morton
Cc: Valentin Zaharov; linux-kernel@vger.kernel.org
Subject: Re: cifs causes BUG: soft lockup detected on CPU

This should be fixed in 2.6.21 as of 2/26.  The fix was
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3677db10a635a39f63ea509f8f0056d95589ff90


Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com

Andrew Morton <[EMAIL PROTECTED]> wrote on 03/29/2007 03:54:57 AM:

> On Wed, 28 Mar 2007 20:35:55 +0200 Valentin Zaharov <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > We have a continuous problem with server freezes. We are using cifs mounts
> > on apache powered web servers with content located on a Win2k3 server.
> > Servers freeze from time to time, producing the following error just before
> > the freeze:
> > 
> > Mar 26 21:50:37 UFR2 kernel:  CIFS VFS: cifs_strtoUCS: char2uni returned -22
> > Mar 26 21:51:45 UFR2 last message repeated 55 times
> > Mar 26 21:52:49 UFR2 last message repeated 30 times
> > Mar 26 21:54:16 UFR2 last message repeated 10 times
> > Mar 26 21:56:13 UFR2 last message repeated 20 times
> > Mar 26 21:58:34 UFR2 last message repeated 75 times
> > Mar 26 21:59:43 UFR2 last message repeated 30 times
> > Mar 26 22:01:02 UFR2 last message repeated 30 times
> > Mar 26 22:02:04 UFR2 last message repeated 30 times
> > Mar 26 22:03:08 UFR2 last message repeated 50 times
> > Mar 26 22:04:27 UFR2 last message repeated 10 times
> > Mar 26 22:05:59 UFR2 last message repeated 20 times
> > Mar 26 22:07:10 UFR2 last message repeated 20 times
> > Mar 26 22:29:00 UFR2 last message repeated 64 times
> > Mar 27 00:47:40 UFR2 last message repeated 15 times
> > Mar 27 01:42:41 UFR2 last message repeated 95 times
> > Mar 27 02:15:57 UFR2 last message repeated 90 times
> > Mar 27 02:27:13 UFR2 last message repeated 45 times
> > Mar 27 03:14:08 UFR2 last message repeated 95 times
> > Mar 27 04:26:10 UFR2 last message repeated 2 times
> > Mar 27 06:11:35 UFR2 last message repeated 45 times
> > Mar 27 06:20:20 UFR2 last message repeated 15 times
> > Mar 27 06:20:20 UFR2 last message repeated 12 times
> > Mar 27 06:27:53 UFR2 kernel: BUG: soft lockup detected on CPU#3!
> > Mar 27 06:27:53 UFR2 kernel:  [<c0134b57>] softlockup_tick+0x9e/0xac
> > Mar 27 06:27:53 UFR2 kernel:  [<c0121440>] update_process_times+0x3b/0x5e
> > Mar 27 06:27:53 UFR2 kernel:  [<c010d885>] smp_apic_timer_interrupt+0x6c/0x7a
> > Mar 27 06:27:53 UFR2 kernel:  [<c01032ec>] apic_timer_interrupt+0x28/0x30
> > Mar 27 06:27:53 UFR2 kernel:  [<c0153d75>] generic_fillattr+0x75/0xa8
> > Mar 27 06:27:53 UFR2 kernel:  [<f8e78ed2>] cifs_getattr+0x1e/0x2b [cifs]
> > Mar 27 06:27:53 UFR2 kernel:  [<f8e78eb4>] cifs_getattr+0x0/0x2b [cifs]
> > Mar 27 06:27:53 

Re: [4/4] 2.6.21-rc5: known regressions (v2)

2007-04-01 Thread Michael S. Tsirkin
> Subject    : after resume: X hangs after drawing a couple of windows
>              workaround: clocksource=acpi_pm
> References : http://lkml.org/lkml/2007/3/8/117
>              http://lkml.org/lkml/2007/3/25/20
>              http://lkml.org/lkml/2007/3/26/151
> Submitter  : Michael S. Tsirkin <[EMAIL PROTECTED]>
> Handled-By : Thomas Gleixner <[EMAIL PROTECTED]>
> Status     : problem is being debugged

Adrian,

the bug was found by Maxim Levitsky, and the following patch
appears to have fixed the problem:
http://lkml.org/lkml/2007/3/28/104

The right way to fix it is still being discussed:
http://lkml.org/lkml/2007/3/28/182


-- 
MST
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [-mm3 PATCH] (Retry) Check the return value of kobject_add and etc.

2007-04-01 Thread Parag Warudkar

-	kobject_add(&p->kobj);
+	if (kobject_add(&p->kobj)) {
+		kfree(p);
+		return;


Please add a printk before the return statement that logs a proper
warning stating what happened, in which file, at which line, etc. That
way people can know why something did not work as expected and
hopefully do something about it.
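A minimal userspace sketch of the pattern suggested above, assuming nothing beyond standard C: check the return value, log the failing call together with `__FILE__` and `__LINE__`, then free the allocation and bail out. In the kernel the message would go through `printk(KERN_WARNING ...)` and the memory through `kfree()`; here `fprintf(stderr, ...)` and `free()` stand in. `fake_kobject_add()`, `WARN_ADD_FAILED()` and `add_object()` are hypothetical names, not a real API, and unlike the quoted `void` function the sketch returns the error so it can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for kobject_add(): 0 on success, -errno on failure. */
static int fake_kobject_add(int simulate_failure)
{
	return simulate_failure ? -ENOMEM : 0;
}

/* Report the failing call together with the source file and line. */
#define WARN_ADD_FAILED(what, err)				\
	fprintf(stderr, "%s:%d: %s failed (error %d)\n",	\
		__FILE__, __LINE__, (what), (err))

struct object {
	int id;
};

/* Hypothetical caller: allocate, register, and warn + clean up on failure. */
static int add_object(int simulate_failure)
{
	struct object *p = malloc(sizeof(*p));
	int err;

	if (!p)
		return -ENOMEM;

	err = fake_kobject_add(simulate_failure);
	if (err) {
		WARN_ADD_FAILED("kobject_add", err);
		free(p);
		return err;
	}

	/* Real code would keep p registered; the sketch just releases it. */
	free(p);
	return 0;
}
```

Calling `add_object(1)` prints something like `example.c:35: kobject_add failed (error -12)` on stderr before returning the error.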

Parag


[PATCH] x86_64: fix arithmetic in comment

2007-04-01 Thread Avi Kivity
The xmm space on x86_64 is 256 bytes.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

diff --git a/include/asm-x86_64/processor.h b/include/asm-x86_64/processor.h
index 76552d7..90eb24e 100644
--- a/include/asm-x86_64/processor.h
+++ b/include/asm-x86_64/processor.h
@@ -201,7 +201,7 @@ struct i387_fxsave_struct {
u32 mxcsr;
u32 mxcsr_mask;
u32 st_space[32];   /* 8*16 bytes for each FP-reg = 128 bytes */
-   u32 xmm_space[64];  /* 16*16 bytes for each XMM-reg = 128 bytes */
+   u32 xmm_space[64];  /* 16*16 bytes for each XMM-reg = 256 bytes */
u32 padding[24];
 } __attribute__ ((aligned (16)));
 
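The corrected comment arithmetic can be checked mechanically. The sketch below mirrors the struct with fixed-width types; the fields ahead of `mxcsr` are not part of the hunk and are assumed here from the standard 512-byte FXSAVE image layout (control, status and tag words, last opcode, instruction and data pointers). Under that assumption, `xmm_space` is 16 registers × 16 bytes = 256 bytes (64 `u32` words), and the whole area adds up to 512 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of i387_fxsave_struct; the leading fields are an
 * assumption based on the FXSAVE image layout, not taken from the hunk. */
struct fxsave_mirror {
	uint16_t cwd;             /* x87 control word */
	uint16_t swd;             /* x87 status word */
	uint16_t twd;             /* x87 tag word (abridged) */
	uint16_t fop;             /* last x87 opcode */
	uint64_t rip;             /* last x87 instruction pointer */
	uint64_t rdp;             /* last x87 data pointer */
	uint32_t mxcsr;
	uint32_t mxcsr_mask;
	uint32_t st_space[32];    /* 8 FP regs * 16 bytes = 128 bytes */
	uint32_t xmm_space[64];   /* 16 XMM regs * 16 bytes = 256 bytes */
	uint32_t padding[24];     /* 24 * 4 = 96 bytes */
} __attribute__((aligned(16)));

/* Compile-time checks of the comment arithmetic and the total size. */
static_assert(sizeof(((struct fxsave_mirror *)0)->st_space) == 8 * 16,
	      "st_space holds 8 registers of 16 bytes");
static_assert(sizeof(((struct fxsave_mirror *)0)->xmm_space) == 16 * 16,
	      "xmm_space holds 16 registers of 16 bytes");
static_assert(offsetof(struct fxsave_mirror, xmm_space) == 160,
	      "XMM0 starts at byte 160 of the FXSAVE image");
static_assert(sizeof(struct fxsave_mirror) == 512,
	      "the FXSAVE image is 512 bytes");
```

Tallying it up: 32 bytes of header, 128 bytes of x87 registers, 256 bytes of XMM registers and 96 bytes of padding give 512 bytes; with the old "128 bytes" figure the total would not add up.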


Re: + clocksource-driver-initialize-list-value.patch added to -mm tree

2007-04-01 Thread Thomas Gleixner
On Sat, 2007-03-31 at 22:23 -0700, [EMAIL PROTECTED] wrote:
 The patch titled
  clocksource: driver initialize list value
 has been added to the -mm tree.  Its filename is
  clocksource-driver-initialize-list-value.patch
 
 *** Remember to use Documentation/SubmitChecklist when testing your code ***
 
 See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
 out what to do about this
 
 --
 Subject: clocksource: driver initialize list value
 From: Daniel Walker [EMAIL PROTECTED]
 
 Update drivers/clocksource/ with list initialization.

As others pointed out already, can we please have a useful
description of why this change is necessary? I.e. that it is a preparatory
patch to simplify the clocksource registration logic.
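For context on what "list initialization" means: a kernel `struct list_head` initialized with `LIST_HEAD_INIT(name)`, which expands to `{ &(name), &(name) }`, points at itself, so `list_empty()` on it is well defined before the object is ever linked anywhere. The sketch below re-implements just enough of `<linux/list.h>` in userspace to show this. `struct clocksource_stub` and `clocksource_demo` are hypothetical stand-ins, and the exact hunks of the -mm patch are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace copy of the kernel's doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same shape as the kernel macro: an empty list points at itself. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Insert new right after head. */
static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Hypothetical stand-in for a driver's struct clocksource. */
struct clocksource_stub {
	const char *name;
	struct list_head list;
};

/* Static initialization in the style the patch title describes. */
static struct clocksource_stub clocksource_demo = {
	.name = "demo",
	.list = LIST_HEAD_INIT(clocksource_demo.list),
};

/* Hypothetical global list that registration would link drivers into. */
static struct list_head registered = LIST_HEAD_INIT(registered);
```

Without the initializer, `clocksource_demo.list` would be zero-filled, and `list_empty()` would dereference a NULL `next` pointer rather than report "not on any list".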

tglx




Re: [PATCH] hid: add two led codes to hid input mapping

2007-04-01 Thread Jiri Kosina
On Sat, 31 Mar 2007, Dan Engel wrote:

 This patch is really being offered because it's what's needed to make the
 operation of the Belkin Flip USB KVM switch available to user-space
 programs through the HID input event interface. The Belkin Flip KVM
 overloads LED usages to give software control over the device, providing
 options to flip either audio, video or both. However, without an input
 mapping to the Off-hook and Speaker LED usages, this functionality isn't
 available.

Dmitry, would adding these two LED_ constants to input.h be OK with you?
(The corresponding usages are defined in HUT 1.12 on page 62.)

Thanks,

-- 
Jiri Kosina


Re: [patch 6/13] signal/timer/event fds v9 - timerfd core ...

2007-04-01 Thread Thomas Gleixner
On Sat, 2007-03-31 at 13:09 -0700, Davide Libenzi wrote:
 +/*
 + * This gets called when the timer event triggers. We increment the
 + * tick count and wake the possible waiters. If the timer is a
 + * sequential one (->tintv.tv64 != 0), we re-arm it with hrtimer_forward().
 + */
 +static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
 +{
 +	struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
 +	enum hrtimer_restart rval = HRTIMER_NORESTART;
 +	unsigned long flags;
 +
 +	spin_lock_irqsave(&ctx->lock, flags);
 +	ctx->ticks++;
 +	wake_up_locked(&ctx->wqh);
 +	if (ctx->tintv.tv64 != 0) {
 +		hrtimer_forward(htmr, hrtimer_cb_get_time(htmr), ctx->tintv);
 +		rval = HRTIMER_RESTART;
 +	}
 +	spin_unlock_irqrestore(&ctx->lock, flags);
 +
 +	return rval;
 +}

For periodic timers we probably also want to know about missed ticks,
i.e. when the timer was delayed.

I recently changed the rearm handling code of itimers to prevent DoS
attacks. See commit 8bfd9a7a229b5f3d3eda5d7d45c2eebec5b4ba16. The posix
timer code has a similar mechanism.

We should probably do the same here. That means deferring the
restart of the timer to process context.

tglx

static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
{
	struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
	unsigned long flags;

	spin_lock_irqsave(&ctx->lock, flags);
	ctx->expired = 1;
	wake_up_locked(&ctx->wqh);
	spin_unlock_irqrestore(&ctx->lock, flags);

	return HRTIMER_NORESTART;
}

static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
			    loff_t *ppos)
{
	struct timerfd_ctx *ctx = file->private_data;
	ssize_t res;
	u32 ticks = 0;
	DECLARE_WAITQUEUE(wait, current);

	if (count < sizeof(ticks))
		return -EINVAL;
	spin_lock_irq(&ctx->lock);
	res = -EAGAIN;
	if (!ctx->expired && !(file->f_flags & O_NONBLOCK)) {
		__add_wait_queue(&ctx->wqh, &wait);
		for (res = 0;;) {
			set_current_state(TASK_INTERRUPTIBLE);
			if (ctx->expired) {
				res = 0;
				break;
			}
			if (signal_pending(current)) {
				res = -ERESTARTSYS;
				break;
			}
			spin_unlock_irq(&ctx->lock);
			schedule();
			spin_lock_irq(&ctx->lock);
		}
		__remove_wait_queue(&ctx->wqh, &wait);
		__set_current_state(TASK_RUNNING);
	}
	if (ctx->expired) {
		ctx->expired = 0;
		if (ctx->tintv.tv64 != 0) {
			ticks = hrtimer_forward(&ctx->tmr, ktime_get(),
						ctx->tintv);
			hrtimer_restart(&ctx->tmr);
		} else
			ticks = 1;
	}
	spin_unlock_irq(&ctx->lock);
	if (ticks)
		res = put_user(ticks, buf) ? -EFAULT : sizeof(ticks);
	return res;
}
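To make the "missed ticks" point concrete: in the proposal above, the `ticks` value returned by `read()` comes from `hrtimer_forward()`, which moves the expiry past `now` in whole multiples of the interval and returns how many periods elapsed. The sketch below models that counting on plain nanosecond integers. It is an illustration of the behaviour, not the kernel source, and it omits any clock-resolution handling the real function may do; `forward_expiry()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of forwarding a periodic timer (all times in ns):
 * if the timer has expired, move *expires to the first period boundary
 * strictly after `now` and return how many periods elapsed, i.e. the
 * tick count a read() would report. Returns 0 if not yet expired.
 */
static uint64_t forward_expiry(int64_t *expires, int64_t now, int64_t interval)
{
	int64_t periods;

	assert(interval > 0);

	if (now < *expires)
		return 0;

	/* The expiry at *expires counts once, plus one per whole interval since. */
	periods = (now - *expires) / interval + 1;
	*expires += periods * interval;
	return (uint64_t)periods;
}
```

For a 10 ns period that was due at t=100 but is only serviced at t=135, the model reports 4 ticks (100, 110, 120, 130) and rearms for t=140. A callback that simply increments a counter once per invocation would have reported 1.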




Re: [patch] remove artificial software max_loop limit

2007-04-01 Thread devzero
Remove the artificial maximum of 256 loop devices that can be created due to
a legacy device number limit.  Searching through the lkml archive, there are
several instances where users complained about the artificial limit
that the loop driver imposes.  There is no reason to have such a limit.

Hey, I was one of those :)

Nice to see that it's solved now, thanks very much!

I never expected this to happen and put all my hope into dm-loop instead.
Did you consider that Bryn M. Reeves from Red Hat will now suffer a serious
depression? ( -> http://sources.redhat.com/lvm2/wiki/DMLoop ) ;)

regards
roland



Re: [RFC PATCH 0/5] x86_64: enable clockevents and dynticks

2007-04-01 Thread Thomas Gleixner
On Sat, 2007-03-31 at 01:31 -0700, Chris Wright wrote:
 This series converts x86_64 timers to clockevents drivers
 and then enables dynticks.  There are some minor cleanups along
 the way.  The lapic broadcast mechanism is untested and I'm sure it
 still needs work; there's still some cruft in lapic_setup_timer.
 
 This is just for comments at this point, now that it's working
 on my test box in both NO_HZ=y and NO_HZ=n configurations (typically
 using hpet).

Have you checked whether we could share the code between i386 and x86_64,
at least for the PIT and HPET? I'm not sure about the local APIC, but I think
it might be doable as well.

tglx




Re: cannot add device to partitioned raid6 array

2007-04-01 Thread Florian D.

Neil Brown wrote:

On Saturday March 31, [EMAIL PROTECTED] wrote:

hi list!

in short:
I created a partitioned raid6 array with 2 missing drives. Now, I want to add a 
device. It fails with:
flockmock ~ # mdadm -a /dev/md_d4 /dev/sdb2
mdadm: add new device failed for /dev/sdb2 as 4: Invalid argument


Thanks for the detailed problem report.

I think the cause of the error is that /dev/sdb2 is too small.
It needs to be at least 490030594 sectors. How big is it?


but the *device* size should be only ~250GB, so the array size is ~500GB, no?

flockmock ~ # mdadm --detail /dev/md_d4
/dev/md_d4:
Version : 00.90.03
  Creation Time : Sat Mar 31 19:48:58 2007
 Raid Level : raid6
 Array Size : 490030464 (467.33 GiB 501.79 GB)
  Used Dev Size : 245015232 (233.66 GiB 250.90 GB)
  ^^

These disks are all 250 GB hard disks, formatted in the same way -- but they are from different
vendors! The partition sizes in blocks vary a little (taken from /sbin/fdisk):


/dev/sda2: 245015347
/dev/sdc2: 245015347
/dev/sdb2: 244099642  <- different vendor

do you think that may be the cause?

thanks a lot,
florian
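A short worked check of the numbers in this exchange. It assumes that Neil's figure is in 512-byte sectors, while `mdadm --detail` and classic `fdisk -l` report 1 KiB blocks. On that reading, 490030594 sectors is about 250.9 GB, which is a per-device requirement (matching the "Used Dev Size" line) rather than the array size. `sectors_to_kib()`, `partition_fits()` and `shortfall_kib()` are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_BYTES 512ULL	/* Linux block-layer sector size */

/* Minimum per-device size quoted in the reply, in 512-byte sectors. */
static const uint64_t required_sectors = 490030594ULL;

static uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors * SECTOR_BYTES;
}

/* Two 512-byte sectors per 1 KiB block, rounding up. */
static uint64_t sectors_to_kib(uint64_t sectors)
{
	return (sectors + 1) / 2;
}

/* Partition sizes from fdisk are assumed to be in 1 KiB blocks. */
static bool partition_fits(uint64_t partition_kib)
{
	return partition_kib >= sectors_to_kib(required_sectors);
}

/* How many KiB a partition falls short of the requirement (0 if it fits). */
static uint64_t shortfall_kib(uint64_t partition_kib)
{
	uint64_t need = sectors_to_kib(required_sectors);

	return partition_kib >= need ? 0 : need - partition_kib;
}
```

With these inputs, sda2 and sdc2 (245015347 KiB) clear the 245015297 KiB minimum by 50 KiB, while sdb2 (244099642 KiB) falls short by 915655 KiB, roughly 894 MiB.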



Re: missing kretprobes support on avr32 and sparc64

2007-04-01 Thread Håvard Skinnemoen
On Sat, March 31, 2007 15:15, Christoph Hellwig wrote:
 Currently avr32 and sparc64 don't support kretprobes, unlike all
 other kprobes-supporting architectures.  This is not nice from the
 user interface point of view or from the ugly ifdefs point of view.
 Is there a reason these ports can't support kretprobes, or was this
 simply an oversight / laziness?

Laziness on my part, I guess. It shouldn't be too hard to implement on
avr32; it looks like the generic code sets a breakpoint on function entry,
so all I have to do is set a breakpoint on the return address and
handle that breakpoint somehow. I'll try to implement it when I get back to
work in about a week.

Haavard


