Re: hrtimer possible issue

2013-02-05 Thread Izik Eidus

On 02/05/2013 12:20 PM, Thomas Gleixner wrote:

On Mon, 4 Feb 2013, Leonid Shatz wrote:


I assume the race can also happen between hrtimer cancel and start. In both
cases a timer base switch can happen.

Izik, please check if you can arrange the patch in the standard format (do
we need to do it against the latest kernel version?)

Yes please.


But I already sent it yesterday with git-send-email and checkpatch.pl :-)


Thanks,

tglx




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



[PATCH] fix hrtimer_enqueue_reprogram race

2013-02-04 Thread Izik Eidus
From: Leonid Shatz 

It seems that hrtimer_enqueue_reprogram contains a race which could result in
a timer.base switch during the unlock/lock sequence.

See the code in __hrtimer_start_range_ns where it calls
hrtimer_enqueue_reprogram. The latter releases the lock protecting the timer
base for a short time, and a timer base switch can occur from a different CPU
thread. Later, when __hrtimer_start_range_ns calls unlock_hrtimer_base, a base
switch could have happened, and this causes the bug.

To reproduce, start the same hrtimer from two different kernel threads, each
running on a different CPU. Eventually one of the calls will cause a timer
base switch while the other thread is not expecting it.

This can happen in a virtualized environment, where one thread can be delayed
by the underlying hypervisor; due to the delay, a different CPU takes care of
the missed timer start and runs the timer start logic on its own.

Signed-off-by: Leonid Shatz 
Signed-off-by: Izik Eidus 
---
 kernel/hrtimer.c |   32 ++--
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6db7a5e..0c8c6cd 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -640,21 +640,9 @@ static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base)
  * and expiry check is done in the hrtimer_interrupt or in the softirq.
  */
 static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
-   struct hrtimer_clock_base *base,
-   int wakeup)
+   struct hrtimer_clock_base *base)
 {
-   if (base->cpu_base->hres_active && hrtimer_reprogram(timer, base)) {
-   if (wakeup) {
-   raw_spin_unlock(&base->cpu_base->lock);
-   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
-   raw_spin_lock(&base->cpu_base->lock);
-   } else
-   __raise_softirq_irqoff(HRTIMER_SOFTIRQ);
-
-   return 1;
-   }
-
-   return 0;
+   return base->cpu_base->hres_active && hrtimer_reprogram(timer, base);
 }
 
 static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
@@ -735,8 +723,7 @@ static inline int hrtimer_switch_to_hres(void) { return 0; }
 static inline void
 hrtimer_force_reprogram(struct hrtimer_cpu_base *base, int skip_equal) { }
 static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
-   struct hrtimer_clock_base *base,
-   int wakeup)
+   struct hrtimer_clock_base *base)
 {
return 0;
 }
@@ -995,8 +982,17 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 *
 * XXX send_remote_softirq() ?
 */
-   if (leftmost && new_base->cpu_base == &__get_cpu_var(hrtimer_bases))
-   hrtimer_enqueue_reprogram(timer, new_base, wakeup);
+   if (leftmost && new_base->cpu_base == &__get_cpu_var(hrtimer_bases)
+   && hrtimer_enqueue_reprogram(timer, new_base)) {
+   if (wakeup) {
+   raw_spin_unlock(&new_base->cpu_base->lock);
+   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+   local_irq_restore(flags);
+   return ret;
+   } else {
+   __raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+   }
+   }
 
unlock_hrtimer_base(timer, &flags);
 
-- 
1.7.10.4





Re: [PATCH 0/11] ksm: NUMA trees and page migration

2013-01-28 Thread Izik Eidus

On 01/29/2013 02:49 AM, Izik Eidus wrote:

On 01/29/2013 01:54 AM, Andrew Morton wrote:

On Fri, 25 Jan 2013 17:53:10 -0800 (PST)
Hugh Dickins  wrote:


Here's a KSM series

Sanity check: do you have a feeling for how useful KSM is?
Performance/space improvements for typical (or atypical) workloads?
Are people using it?  Successfully?



BTW, after thinking a bit about the word "people", I wanted to see whether
normal users of Linux, who just download and install it (without using a
special virtualization product), are able to use KSM.

So I googled a little and found some nice results from users:
http://serverascode.com/2012/11/11/ksm-kvm.html

But I do agree that it provides justifying value only for virtualization
users...




Hi,
I think it is mostly used for virtualization. I know at least two
products that use it: RHEV (Red Hat Enterprise Virtualization), and my
current place (Ravello Systems), which uses it to do VM consolidation on
top of cloud environments (running multiple unmodified VMs on top of one
VM you get from EC2 / Rackspace / whatever). For Ravello it is highly
critical in achieving a high consolidation ratio...



IOW, is it justifying itself?






Re: [PATCH 0/11] ksm: NUMA trees and page migration

2013-01-28 Thread Izik Eidus

On 01/29/2013 01:54 AM, Andrew Morton wrote:

On Fri, 25 Jan 2013 17:53:10 -0800 (PST)
Hugh Dickins  wrote:


Here's a KSM series

Sanity check: do you have a feeling for how useful KSM is?
Performance/space improvements for typical (or atypical) workloads?
Are people using it?  Successfully?


Hi,
I think it is mostly used for virtualization. I know at least two
products that use it: RHEV (Red Hat Enterprise Virtualization), and my
current place (Ravello Systems), which uses it to do VM consolidation on
top of cloud environments (running multiple unmodified VMs on top of one
VM you get from EC2 / Rackspace / whatever). For Ravello it is highly
critical in achieving a high consolidation ratio...



IOW, is it justifying itself?








Re: [kvm-devel] [PATCH 7/8] MMU: add TDP support to the KVM MMU

2008-02-07 Thread Izik Eidus

Joerg Roedel wrote:

On Thu, Feb 07, 2008 at 03:27:19PM +0200, Izik Eidus wrote:
  

Joerg Roedel wrote:


This patch contains the changes to the KVM MMU necessary for support of the
Nested Paging feature in AMD Barcelona and Phenom Processors.
 
  

Good patch; it looks like things will be very fixable with it.



Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
---
arch/x86/kvm/mmu.c |   79 ++--
arch/x86/kvm/mmu.h |6 
2 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5e76963..5304d55 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1081,6 +1081,7 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu)
int i;
gfn_t root_gfn;
struct kvm_mmu_page *sp;
+   int metaphysical = 0;
root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
@@ -1089,14 +1090,20 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu)
hpa_t root = vcpu->arch.mmu.root_hpa;
ASSERT(!VALID_PAGE(root));
+   if (tdp_enabled)
+   metaphysical = 1;
sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
- PT64_ROOT_LEVEL, 0, ACC_ALL, NULL, NULL);
+ PT64_ROOT_LEVEL, metaphysical,
+ ACC_ALL, NULL, NULL);
root = __pa(sp->spt);
++sp->root_count;
vcpu->arch.mmu.root_hpa = root;
return;
}
#endif
+   metaphysical = !is_paging(vcpu);
+   if (tdp_enabled)
+   metaphysical = 1;
for (i = 0; i < 4; ++i) {
hpa_t root = vcpu->arch.mmu.pae_root[i];
@@ -1110,7 +1117,7 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu)
} else if (vcpu->arch.mmu.root_level == 0)
root_gfn = 0;
sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
- PT32_ROOT_LEVEL, !is_paging(vcpu),
+ PT32_ROOT_LEVEL, metaphysical,
  ACC_ALL, NULL, NULL);
root = __pa(sp->spt);
++sp->root_count;
@@ -1144,6 +1151,36 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 error_code & PFERR_WRITE_MASK, gfn);
}
+static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
+   u32 error_code)
 
  

you probably mean gpa_t ?



Yes. But the function is assigned to a function pointer. And the type of
that pointer expects gva_t there. So I named the parameter gpa to
describe that a guest physical address is meant there.

  

+{
+   struct page *page;
+   int r;
+
+   ASSERT(vcpu);
+   ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   r = mmu_topup_memory_caches(vcpu);
+   if (r)
+   return r;
+
+   down_read(&current->mm->mmap_sem);
+   page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+   if (is_error_page(page)) {
+   kvm_release_page_clean(page);
+   up_read(&current->mm->mmap_sem);
+   return 1;
+   }
 
  

I don't know if it is worth checking it here;
in the worst case we will map the error page and the host will be safe.



Looking at the nonpaging_map function it is the right place to check for
the error page.
  


Thinking about it again, you are right (for some reason I was thinking
about old KVM code that was replaced already).

The is_error_page check should be here.


Joerg Roedel

  



--
woof.



Re: [kvm-devel] [PATCH 7/8] MMU: add TDP support to the KVM MMU

2008-02-07 Thread Izik Eidus

Joerg Roedel wrote:

This patch contains the changes to the KVM MMU necessary for support of the
Nested Paging feature in AMD Barcelona and Phenom Processors.
  


Good patch; it looks like things will be very fixable with it.


Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
---
 arch/x86/kvm/mmu.c |   79 ++--
 arch/x86/kvm/mmu.h |6 
 2 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5e76963..5304d55 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1081,6 +1081,7 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu)
int i;
gfn_t root_gfn;
struct kvm_mmu_page *sp;
+   int metaphysical = 0;
 
 	root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
 
@@ -1089,14 +1090,20 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu)

hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		ASSERT(!VALID_PAGE(root));

+   if (tdp_enabled)
+   metaphysical = 1;
sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
- PT64_ROOT_LEVEL, 0, ACC_ALL, NULL, NULL);
+ PT64_ROOT_LEVEL, metaphysical,
+ ACC_ALL, NULL, NULL);
root = __pa(sp->spt);
++sp->root_count;
vcpu->arch.mmu.root_hpa = root;
return;
}
 #endif
+   metaphysical = !is_paging(vcpu);
+   if (tdp_enabled)
+   metaphysical = 1;
for (i = 0; i < 4; ++i) {
hpa_t root = vcpu->arch.mmu.pae_root[i];
 
@@ -1110,7 +1117,7 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu)

} else if (vcpu->arch.mmu.root_level == 0)
root_gfn = 0;
sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
- PT32_ROOT_LEVEL, !is_paging(vcpu),
+ PT32_ROOT_LEVEL, metaphysical,
  ACC_ALL, NULL, NULL);
root = __pa(sp->spt);
++sp->root_count;
@@ -1144,6 +1151,36 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 error_code & PFERR_WRITE_MASK, gfn);
 }
 
+static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,

+   u32 error_code)
  


you probably mean gpa_t ?


+{
+   struct page *page;
+   int r;
+
+   ASSERT(vcpu);
+   ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   r = mmu_topup_memory_caches(vcpu);
+   if (r)
+   return r;
+
+   down_read(&current->mm->mmap_sem);
+   page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+   if (is_error_page(page)) {
+   kvm_release_page_clean(page);
+   up_read(&current->mm->mmap_sem);
+   return 1;
+   }
  


I don't know if it is worth checking it here;
in the worst case we will map the error page and the host will be safe.


+   spin_lock(&vcpu->kvm->mmu_lock);
+   kvm_mmu_free_some_pages(vcpu);
+   r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
+gpa >> PAGE_SHIFT, page, TDP_ROOT_LEVEL);
+   spin_unlock(&vcpu->kvm->mmu_lock);
+   up_read(&current->mm->mmap_sem);
+
+   return r;
+}
+
 static void nonpaging_free(struct kvm_vcpu *vcpu)
 {
mmu_free_roots(vcpu);
@@ -1237,7 +1274,35 @@ static int paging32E_init_context(struct kvm_vcpu *vcpu)
return paging64_init_context_common(vcpu, PT32E_ROOT_LEVEL);
 }
 
-static int init_kvm_mmu(struct kvm_vcpu *vcpu)

+static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
+{
+   struct kvm_mmu *context = &vcpu->arch.mmu;
+
+   context->new_cr3 = nonpaging_new_cr3;
+   context->page_fault = tdp_page_fault;
+   context->free = nonpaging_free;
+   context->prefetch_page = nonpaging_prefetch_page;
+   context->shadow_root_level = TDP_ROOT_LEVEL;
+   context->root_hpa = INVALID_PAGE;
+
+   if (!is_paging(vcpu)) {
+   context->gva_to_gpa = nonpaging_gva_to_gpa;
+   context->root_level = 0;
+   } else if (is_long_mode(vcpu)) {
+   context->gva_to_gpa = paging64_gva_to_gpa;
+   context->root_level = PT64_ROOT_LEVEL;
+   } else if (is_pae(vcpu)) {
+   context->gva_to_gpa = paging64_gva_to_gpa;
+   context->root_level = PT32E_ROOT_LEVEL;
+   } else {
+   context->gva_to_gpa = paging32_gva_to_gpa;
+   context->root_level = PT32_ROOT_LEVEL;
+   }
+
+   return 0;
+}
+
+static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
ASSERT(vcpu);
ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
@@ -1252,6 +1317,14 @@ static int init_kvm_mmu(struct kvm_vcpu *vcpu)
return paging32_init_context(vcpu);
 }
 
+static int init_kvm_mmu(struct kvm_vcpu *vcpu)

+{
+   if (tdp_enabled)
+  




Re: [patch 0/4] [RFC] MMU Notifiers V1

2008-01-28 Thread Izik Eidus

Andrea Arcangeli wrote:

On Thu, Jan 24, 2008 at 09:56:06PM -0800, Christoph Lameter wrote:
  

Andrea's mmu_notifier #4 -> RFC V1

- Merge subsystem rmap based with Linux rmap based approach
- Move Linux rmap based notifiers out of macro
- Try to account for what locks are held while the notifiers are
  called.
- Develop a patch sequence that separates out the different types of
  hooks so that it is easier to review their use.
- Avoid adding #include to linux/mm_types.h
- Integrate RCU logic suggested by Peter.



I'm glad you're converging on something a bit saner and much much
closer to my code, plus perfectly usable by KVM optimal rmap design
too. It would have preferred if you would have sent me patches like
Peter did for review and merging etc... that would have made review
especially easier. Anyway I'm used to that on lkml so it's ok, I just
need this patch to be included in mainline, everything else is
irrelevant to me.

On a technical merit this still partially makes me sick and I think
it's the last issue to debate.

@@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int migration)
else

ret = try_to_unmap_file(page, migration);

+   if (unlikely(PageExternalRmap(page)))
+   mmu_rmap_notifier(invalidate_page, page);
+
if (!page_mapped(page))
ret = SWAP_SUCCESS;
return ret;

I find the above hard to accept, because the moment you work with
physical pages and not "mm+address" I think you couldn't possibly care
if page_mapped is true or false, and I think the above notifier should
be called _outside_ try_to_unmap. In fact I'd call
mmu_rmap_notifier(invalidate_page, page); only if page_unmapped is
false and the linux pte is gone already (practically just before the
page_count == 2 check and after try_to_unmap).
  


I don't understand how that is better than notification on TLB flush.
I mean, CPUs have a very similar problem to ours, so why not deal with
the notification at the same place as they do (TLB flush)?

Moreover, notification on TLB flush allows unmodified applications to
work with us.

For example, the memory merging driver that I wrote was able to work with KVM
without any change to its source code.


Re: [patch 0/4] [RFC] MMU Notifiers V1

2008-01-28 Thread Izik Eidus

Andrea Arcangeli wrote:

On Thu, Jan 24, 2008 at 09:56:06PM -0800, Christoph Lameter wrote:
  

Andrea's mmu_notifier #4 - RFC V1

- Merge subsystem rmap based with Linux rmap based approach
- Move Linux rmap based notifiers out of macro
- Try to account for what locks are held while the notifiers are
  called.
- Develop a patch sequence that separates out the different types of
  hooks so that it is easier to review their use.
- Avoid adding #include to linux/mm_types.h
- Integrate RCU logic suggested by Peter.



I'm glad you're converging on something a bit saner and much much
closer to my code, plus perfectly usable by KVM optimal rmap design
too. It would have preferred if you would have sent me patches like
Peter did for review and merging etc... that would have made review
especially easier. Anyway I'm used to that on lkml so it's ok, I just
need this patch to be included in mainline, everything else is
irrelevant to me.

On a technical merit this still partially makes me sick and I think
it's the last issue to debate.

@@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int 
else

ret = try_to_unmap_file(page, migration);

+   if (unlikely(PageExternalRmap(page)))
+   mmu_rmap_notifier(invalidate_page, page);
+
if (!page_mapped(page))
ret = SWAP_SUCCESS;
return ret;

I find the above hard to accept, because the moment you work with
physical pages and not mm+address I think you couldn't possibly care
if page_mapped is true or false, and I think the above notifier should
be called _outside_ try_to_unmap. In fact I'd call
mmu_rmap_notifier(invalidate_page, page); only if page_unmapped is
false and the linux pte is gone already (practically just before the
page_count == 2 check and after try_to_unmap).
  


I don't understand how that is better than notification on TLB flush?
I mean, CPUs have a very similar problem to ours,
so why not deal with the notification at the same place as they do (TLB
flush)?


Moreover, notification on TLB flush allows unmodified applications to work
with us.

For example, the memory merging driver that I wrote was able to work with KVM
without any change to its source code.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mmu notifiers #v2

2008-01-17 Thread Izik Eidus

Andrea Arcangeli wrote:

On Wed, Jan 16, 2008 at 07:48:06PM +0200, Izik Eidus wrote:
  

Rik van Riel wrote:


On Sun, 13 Jan 2008 17:24:18 +0100
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

  
  

In my basic initial patch I only track the tlb flushes which should be
the minimum required to have a nice linux-VM controlled swapping
behavior of the KVM gphysical memory. 


I have a vaguely related question on KVM swapping.

Do page accesses inside KVM guests get propagated to the host
OS, so Linux can choose a reasonable page for eviction, or is
the pageout of KVM guest pages essentially random?
  


Right, selection of the guest OS pages to swap is partly random but
wait: _only_ for the long-cached and hot spte entries. It's certainly
not entirely random.
  
As the shadow-cache is a bit dynamic, every new instantiated spte will

refresh the PG_referenced bit in follow_page already (through minor
faults). Not-present faults of swapped non-present sptes can trigger
minor faults from swapcache too and they'll refresh young regular
ptes.

  
Right now, when KVM removes a pte from the shadow cache, it marks as accessed
the page that the pte pointed to.



Yes: the referenced bit in the mmu-notifier invalidate case isn't
useful because it's set right before freeing the page.

  
It was a good solution until the mmu notifiers, because the pages were
pinned and couldn't be swapped to disk.



It probably still makes sense for sptes removed because of other
reasons (not mmu notifier invalidates).
  

agree
  
So now it will have to do something more sophisticated, or at least mark as
accessed every page pointed to by a pte that gets inserted into the shadow
cache.



I think that should already be the case, see the mark_page_accessed in
follow_page; FOLL_TOUCH is set, isn't it?
  

yes you are right FOLL_TOUCH is set.

The only thing we clearly miss is a logic that refreshes the
PG_referenced bitflag for "hot" sptes that remains instantiated and
cached for a long time. For regular linux ptes this is done by the cpu
through the young bitflag. But note that not all architectures have
the young bitflag support in hardware! So I suppose the swapping of
the KVM task is like the swapping of any other task, but on an alpha
CPU. It works well enough in practice even if we clearly have room for
further optimizations in this area (like there would be on archs w/o
young bit updated in hardware too).

To refresh the PG_referenced bit for long lived hot sptes, I think the
easiest solution is to chain the sptes in an lru, and to start dropping
them when memory pressure starts. We could drop one spte every X pages
collected by the VM. So the "age" time factor depends on the VM
velocity and we totally avoid useless shadow page faults when there's
no VM pressure. When VM pressure increases, the kvm non-present fault
will then take care to refresh the PG_referenced bit. This should
solve the aging-issue for long lived and hot sptes. This should
improve the responsiveness of the guest OS during "initial" swap
pressure (after the initial swap pressure, the working set finds
itself in ram again). So it should avoid some unneeded swapout/swapin
jitter during the initial swap. I see this mostly as a kvm
internal optimization, not strictly related to the mmu notifiers
though.
  
Ohh, I like it, this is a clever solution, and I guess the cost of the
vmexits won't be too high if it is not too aggressive.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mmu notifiers #v2

2008-01-16 Thread Izik Eidus

Rik van Riel wrote:

On Sun, 13 Jan 2008 17:24:18 +0100
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

  

In my basic initial patch I only track the tlb flushes which should be
the minimum required to have a nice linux-VM controlled swapping
behavior of the KVM gphysical memory. 



I have a vaguely related question on KVM swapping.

Do page accesses inside KVM guests get propagated to the host
OS, so Linux can choose a reasonable page for eviction, or is
the pageout of KVM guest pages essentially random?

  
Right now, when KVM removes a pte from the shadow cache, it marks as
accessed the page that the pte pointed to.
It was a good solution until the mmu notifiers, because the pages were
pinned and couldn't be swapped to disk.
So now it will have to do something more sophisticated, or at least mark
as accessed every page pointed to by a pte that gets inserted into the
shadow cache.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

2007-11-13 Thread Izik Eidus

Glauber de Oliveira Costa wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Dong, Eddie escreveu:
  

+static void kvm_write_guest_time(struct kvm_vcpu *vcpu)
+{
+	struct timespec ts;
+	int r;
+
+	if (!vcpu->clock_gpa)
+		return;
+
+	/* Updates version to the next odd number, indicating
+	   we're writing */
+	vcpu->hv_clock.version++;
+	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
+			&vcpu->hv_clock, PAGE_SIZE);
+
+	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
+		    &vcpu->hv_clock.last_tsc);
+
+	ktime_get_ts(&ts);
  

Do we need to disable preemption here?


After thinking for a little while, you are theoretically right.
In the current state, we could even be preempted between all operations
;-) Maybe after avi's suggestion of moving the call, it will end up
in a preempt-safe region, but anyway, it's safer to add the preempt
markers here.
I'll put it in next version, thanks
  
Note that you can't call kvm_write_guest with preemption disabled
(which you do a few lines below).

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

