Re: Fix lapic time counter read for periodic mode

2012-11-12 Thread Christian Ehrhardt

Hi,

thanks for your reply.

On Mon, Nov 12, 2012 at 07:32:37PM -0200, Marcelo Tosatti wrote:
> > there is a bug in the emulation of the lapic time counter. In particular
> > what we are seeing is that the time counter of a periodic lapic timer
> > in the guest reads as zero 99% of the time. The patch below fixes that.
> > 
> > The emulation of the lapic timer is done with the help of a hires
> > timer that expires with the same frequency as the lapic counter.
> > New expiration times for a periodic timer are calculated incrementally
> > based on the last scheduled expiration time. This ensures long term
> > accuracy of the emulated timer close to that of the underlying clock.
> > 
> > The actual value of the lapic time counter is calculated from the
> > real time difference between current time and scheduled expiration time
> > of the hires timer. If this difference is negative, the hires timer
> > expired. For oneshot mode this is correctly translated into a zero value
> > for the time counter. However, in periodic mode we must use the negative
> > difference unmodified.
> > 
> >  regards   Christian
> > 
> > Fix lapic time counter read for periodic mode.
> 
> In periodic mode the hrtimer is rearmed once expired, see
> apic_timer_fn. So _get_remaining should return proper value
> even if the guest is not able to process timer interrupts. 
> 
> Can you describe your specific scenario in more detail?

In my specific case, the host is admittedly somewhat special as it
already is a rehosted version of linux, i.e. not running directly on
native hardware. It is still unclear if the host has sufficiently accurate
timer interrupts. This is most likely part of the problems we are seeing.

However, AFAICS apic_timer_fn is only called once per jiffy (at least in
some configurations). In particular, it is not called by
hrtimer_get_remaining. Thus depending on the frequency of the LAPIC timer
in the guest there might _several_ iterations that are missed. This can
probably be mitigated by a hires timer interrupts. However, I think
the problem is still there even in that case.

Additionally, the behaviour that I want to establish matches that of the
PIT timer (in a not completely obvious way, though).

Having said that the proposed patch in my first mail is incomplete, as
the mod_64 does not work correctly for negative values. A fixed version
is below.

 regards Christian

Signed-off-by: Christian Ehrhardt 

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 43e9fad..ec7242c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -810,11 +810,22 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
if (kvm_apic_get_reg(apic, APIC_TMICT) == 0)
return 0;
 
+   /*
+* hrtimer_get_remaining returns the signed difference between
+* timer expiration time and current time. Keep negative return
+* values iff the the timer is periodic.
+*/
remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
-   if (ktime_to_ns(remaining) < 0)
-   remaining = ktime_set(0, 0);
+   ns = ktime_to_ns(remaining);
+   if (unlikely(ns < 0)) {
+   if (apic_lvtt_period(apic))
+   ns = apic->lapic_timer.period -
+   mod_64(-ns, apic->lapic_timer.period);
+   else
+   ns = 0;
+   }
 
-   ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
+   ns = mod_64(ns, apic->lapic_timer.period);
tmcct = div64_u64(ns,
 (APIC_BUS_CYCLE_NS * apic->divide_count));
 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fix lapic time counter read for periodic mode

2012-11-11 Thread Christian Ehrhardt

Hi,

there is a bug in the emulation of the lapic time counter. In particular
what we are seeing is that the time counter of a periodic lapic timer
in the guest reads as zero 99% of the time. The patch below fixes that.

The emulation of the lapic timer is done with the help of a hires
timer that expires with the same frequency as the lapic counter.
New expiration times for a periodic timer are calculated incrementally
based on the last scheduled expiration time. This ensures long term
accuracy of the emulated timer close to that of the underlying clock.

The actual value of the lapic time counter is calculated from the
real time difference between current time and scheduled expiration time
of the hires timer. If this difference is negative, the hires timer
expired. For oneshot mode this is correctly translated into a zero value
for the time counter. However, in periodic mode we must use the negative
difference unmodified.

 regards   Christian

Fix lapic time counter read for periodic mode.

Signed-off-by: Christian Ehrhardt 

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 43e9fad..eff902d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -810,8 +810,13 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
if (kvm_apic_get_reg(apic, APIC_TMICT) == 0)
return 0;
 
+   /*
+* hrtimer_get_remaining returns the signed difference between
+* timer expiration time and current time. Keep negative return
+* value iff the the timer is periodic.
+*/
remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
-   if (ktime_to_ns(remaining) < 0)
+   if (ktime_to_ns(remaining) < 0 && !apic_lvtt_period(apic))
remaining = ktime_set(0, 0);
 
ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm-s390: streamline memslot handling - rebased

2009-06-15 Thread Christian Ehrhardt

Avi Kivity wrote:

Marcelo Tosatti wrote:
 

(continued below)
   

Anyway, yeah, the set request / wait mechanism you implement here is
quite similar to the idea mentioned earlier that could be used for 
x86.


Just get rid of this explicit KVM_REQ_MMU_RELOAD knowledge in
arch-independent code please (if you want to see this merged).

I agree to lift the wait part to other archs later if needed, but 
as  mentioned above I could move this to arch code to the cost of 
one arch  hook more. But as also mentioned it doesn't really hurt. I 
agree that it  does not need to be KVM_REQ_MMU_RELOAD specific, we 
could just  walk/clear/wake all bits on that vcpu->requests variable.

Would that be generic enough in your opinion ?



Don't know.

Avi?
  


I think I lost the thread here, but I'll try.  Isn't the wake part 
make_all_vcpus_request() in kvm_main.c?  The wait part could be moved 
to a similar generic function.



I'll try to summarize my current thoughts a bit:
The rebased patch series brings several fixes and the wait/wakeup 
mechanism which is in discussion here.
As explained before this keeps the new wait implementation in s390 arch 
code which allows us to experiment with it. Later if we are happy with 
it we might (or not) continue the merge and bring this mechanism to 
make_all_vcpus_request (as on x86 you don't have the issues I try to fix 
here we don't need to hurry bringing that into generic code).


Well now to the wait/wakeup which is here in discussion in detail:
The s390 arch code can kick a guest, but we don't know implicitly (as 
x86 does) that the kick succeeded, it might happen somewhen sooner or later.
Therefore the code uses wait_on_bit to wait until the vcpu->request bit 
is consumed.
To ensure cleanup of these waiting threads in some special cases the 
clear&wake up is also needed at other places than the real bit 
consumption. One of them is the vcpu release code where we should 
clear&wakeup all waiters (Marcelo correctly pointed out that we should 
not be bit specific there, so I just just wake up all in the updated code).


That was the discussion here: "if it would be ok to clear & wake up 
all". As wake_up_bit doesn't hurt if there is no waiter it looks like 
the best solution to to do that in the generic part of vcpu_release. If 
ever someone else waits for this or another bit in vcpu->requests, the 
code ensures all of them are awaken on vcpu release.


I send an updated version of the rebased series in a few minutes, 
containing updates related to what marcelo pointed out.


P.S. in case you think we need much more discussions we might try to 
catch up on irc to save this thread a few cycles :-)


--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm-s390: streamline memslot handling - rebased

2009-06-08 Thread Christian Ehrhardt

Avi Kivity wrote:
Christian Ehrhardt wrote:  


Really need that smp_mb__after_clear_bit ? AFAIK test_and_clear_bit
implies a barrier?
  


Well I agree that practically test_and_clear_bit has a barrier on 
s390, but as far as I read Documentation/atomic_ops.txt at line 
339-360 I think the interface does not imply it so I wanted to add it 
explicitly. I would be happy if someone really knows the in depth 
details here and corrects me :-)


IIUC rmw bitops are full memory barriers.  The non-rmw (from the 
caller's perspective), clear_bit() and set_bit(), are not.



Ok, as the real implementation has one + memory-barriers.txt describing 
it with barrier and finally include/asm-generic/bitops/atomic.h 
descirbes it that way too I think I can drop the explicit smb_wb from my 
patch in the next update (I wait a bit to give the discussion about the 
wati/bits a bit more time).


Hmm ... would that be worth a clarifying patch to atomic_ops.txt that 
confused me in the first place ?


--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm-s390: streamline memslot handling - rebased

2009-06-08 Thread Christian Ehrhardt

Marcelo Tosatti wrote:

On Tue, Jun 02, 2009 at 04:26:11PM +0200, ehrha...@linux.vnet.ibm.com wrote:
  

From: Christian Ehrhardt 


[...]

@@ -706,13 +713,19 @@ int kvm_arch_set_memory_region(struct kv
 
 	/* request update of sie control block for all available vcpus */

for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-   if (kvm->vcpus[i]) {
-   if (test_and_set_bit(KVM_REQ_MMU_RELOAD,
-   &kvm->vcpus[i]->requests))
-   continue;
-   kvm_s390_inject_sigp_stop(kvm->vcpus[i],
- ACTION_VCPUREQUEST_ON_STOP);
-   }
+   vcpu = kvm->vcpus[i];
+   if (!vcpu)
+   continue;
+
+   if (!test_and_set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests))
+   continue;
+
+   if (vcpu->cpu == -1)
+   continue;



What happens if the check for cpu == -1 races with kvm_arch_vcpu_put?
This context will wait until the vcpu_put context is scheduled back in
to clear the bit? Is that OK?
  
It either comes back to clear the bit or it is consumed on deletion of 
the vcpu. Both ways are ok. The question we have to answer is if it 
might stall the mem update ioctl for too long.
Because eventually the check for vcpu->cpu == -1 is just an optimization 
if we would completely ignore remove it we would have the same problem 
-> could it stall the set mem operation too much. That means the "race" 
is not an issue it might just be sub-optimal, but the chance for a long 
stall could become an issue. Unfortunately I have no better approach to 
that (yet), until then this I like this implementation more than what we 
would have without all the corner case fixes in that patch series.



+
+   kvm_s390_inject_sigp_stop(vcpu, ACTION_VCPUREQUEST_ON_STOP);
+   wait_on_bit(&vcpu->requests, KVM_REQ_MMU_RELOAD,
+   wait_bit_schedule, TASK_UNINTERRUPTIBLE);
}



 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   vcpu->cpu = -1;
save_fp_regs(&vcpu->arch.guest_fpregs);
  

[...]

+++ kvm/arch/s390/kvm/kvm-s390.h
@@ -92,6 +92,13 @@ static inline unsigned long kvm_s390_han
if (!vcpu->requests)
return 0;
 
+	/* requests that can be handled at all levels */

+   if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) {
+   smp_mb__after_clear_bit();



Really need that smp_mb__after_clear_bit ? AFAIK test_and_clear_bit
implies a barrier?
  


Well I agree that practically test_and_clear_bit has a barrier on s390, 
but as far as I read Documentation/atomic_ops.txt at line 339-360 I 
think the interface does not imply it so I wanted to add it explicitly. 
I would be happy if someone really knows the in depth details here and 
corrects me :-)



+   wake_up_bit(&vcpu->requests, KVM_REQ_MMU_RELOAD);
+   kvm_s390_vcpu_set_mem(vcpu);
+   }
+
return vcpu->requests;
 }
 
Index: kvm/virt/kvm/kvm_main.c

===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -1682,6 +1682,10 @@ static int kvm_vcpu_release(struct inode
 {
struct kvm_vcpu *vcpu = filp->private_data;
 
+	clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests);

+   smp_mb__after_clear_bit();
+   wake_up_bit(&vcpu->requests, KVM_REQ_MMU_RELOAD);
+



And this should be generic? Say if other architectures want to make use 
of a similar wait infrastructure. Talk is cheap.
  
Clear bit and wake up on release doesn't hurt any architecture, but it 
is at a good place fine for those using the mechanism to ensure cleaning 
up outstanding things when closing a vcpu fd.
I thought its not worth to add kvm_ARCH_vcpu_release for it while I 
could do so if we want it separated.

(continued below)

Anyway, yeah, the set request / wait mechanism you implement here is
quite similar to the idea mentioned earlier that could be used for x86.

Just get rid of this explicit KVM_REQ_MMU_RELOAD knowledge in
arch-independent code please (if you want to see this merged).
  
I agree to lift the wait part to other archs later if needed, but as 
mentioned above I could move this to arch code to the cost of one arch 
hook more. But as also mentioned it doesn't really hurt. I agree that it 
does not need to be KVM_REQ_MMU_RELOAD specific, we could just 
walk/clear/wake all bits on that vcpu->requests variable.

Would that be generic enough in your opinion ?

Later it can all be lifted off to arch independent code.
  
True for the wait part which can evolve in our arch code until it is 
ripe to get cross arch merged.


--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, O

Re: [PATCH 4/4] kvm-s390: streamline memslot handling - v6

2009-06-02 Thread Christian Ehrhardt

Marcelo Tosatti wrote:

On Sun, May 31, 2009 at 11:22:58AM +0300, Avi Kivity wrote:
  

ehrha...@linux.vnet.ibm.com wrote:


From: Christian Ehrhardt 

*updates in v6*
- ensure the wait_on_bit waiter is notified
- move the reset of requests to kvm_vcpu_release to drop them early

*updates in v5*
- ensure dropping vcpu all requests while freeing a vcpu

*updates in v4*
- kickout only scheduled vcpus (its superfluous and wait might hang forever on
  not running vcpus)

  
  
v3 is already in (and pushed so I can't unapply), so please rebase on  
top of current git.



Christian, 


Seems a good step toward further unification. Can you please rebase as
requested? 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
The missing updates are mostly fixes or code merges due to our 
discussions and should be on the list in a few minutes.


--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state

2009-05-28 Thread Christian Ehrhardt

Avi Kivity wrote:

Christian Ehrhardt wrote:

So you _need_ a mechanism to kick all vcpus out of guest mode?
  
I have a mechanism to kick a vcpu, and I use it. Due to the fact that 
smp_call_* don't work as kick for us the kick is an arch specific 
function.

I hop ethat clarified this part :-)



You could still use make_all_vcpus_request(), just change 
smp_call_function_many() to your own kicker.


Yes and I like this idea for further unification, but I don't want it 
mixed too much into the patches in discussion atm.
Because on one hand I have some problems giving my arch specific kick a 
behaviour like "return when the guest WAS kicked" and on the other hand 
I would e.g. also need to streamline the check in make_all_vcpus_request 
which cpu is running etc because vcpu->cpu stays -1 all the time on s390 
(never used).


Therefore I would unify things step by step and this way allow single 
task to went off my task pile here :-)


--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state

2009-05-28 Thread Christian Ehrhardt

Marcelo Tosatti wrote:

On Tue, May 26, 2009 at 10:02:59AM +0200, Christian Ehrhardt wrote:
  

Marcelo Tosatti wrote:


On Mon, May 25, 2009 at 01:40:49PM +0200, ehrha...@linux.vnet.ibm.com wrote:
  
  

From: Christian Ehrhardt 

To ensure vcpu's come out of guest context in certain cases this patch adds a
s390 specific way to kick them out of guest context. Currently it kicks them
out to rerun the vcpu_run path in the s390 code, but the mechanism itself is
expandable and with a new flag we could also add e.g. kicks to userspace etc.

Signed-off-by: Christian Ehrhardt 



"For now I added the optimization to skip kicking vcpus out of guest
that had the request bit already set to the s390 specific loop (sent as
v2 in a few minutes).

We might one day consider standardizing some generic kickout levels e.g.
kick to "inner loop", "arch vcpu run", "generic vcpu run", "userspace",
... whatever levels fit *all* our use cases. And then let that kicks be
implemented in an kvm_arch_* backend as it might be very different how
they behave on different architectures."

That would be ideal, yes. Two things make_all_requests handles: 


1) It disables preemption with get_cpu(), so it can reliably check for
cpu id. Somehow you don't need that for s390 when kicking multiple
vcpus?
  
  
I don't even need the cpuid as make_all_requests does, I just insert a  
special bit in the vcpu arch part and the vcpu will "come out to me 
(host)".
Fortunateley the kick is rare and fast so I can just insert it  
unconditionally (it's even ok to insert it if the vcpu is not in guest  
state). That prevents us from needing vcpu lock or detailed checks which  
would end up where we started (no guarantee that vcpu's come out of  
guest context while trying to aquire all vcpu locks)



Let me see if I get this right: you kick the vcpus out of guest mode by
using a special bit in the vcpu arch part. OK.

What I don't understand is this: 
"would end up where we started (no guarantee that vcpu's come out of

guest context while trying to aquire all vcpu locks)"
  
initially the mechanism looped over vcpu's and just aquired the vcpu 
lock and then updated the vcpu.arch infor directly.
Avi mentioned that we have no guarantee if/when the vcpu will come out 
of guest context to free a lock currently held and suggested the 
mechanism x86 uses via setting vcpu->request and kicking the vcpu. Thats 
the eason behind "end up where we (the discussion) started", if we would 
need the vcpu lock again we would be at the beginnign of the discussion.

So you _need_ a mechanism to kick all vcpus out of guest mode?
  
I have a mechanism to kick a vcpu, and I use it. Due to the fact that 
smp_call_* don't work as kick for us the kick is an arch specific function.

I hop ethat clarified this part :-)

2) It uses smp_call_function_many(wait=1), which guarantees that by the
time make_all_requests returns no vcpus will be using stale data (the
remote vcpus will have executed ack_flush).
  
  
yes this is really a part my s390 implementation doesn't fulfill yet.  
Currently on return vcpus might still use the old memslot information.
As mentioned before letting all interrupts come "too far" out of the hot  
loop would be a performance issue, therefore I think I will need some  
request&confirm mechanism. I'm not sure yet but maybe it could be as  
easy as this pseudo code example:


# in make_all_requests
# remember we have slots_lock write here and the reentry that updates  
the vcpu specific data aquires slots_lock for read.

loop vcpus
 set_bit in vcpu requests
 kick vcpu #arch function
endloop

loop vcpus
 until the requested bit is disappeared #as the reentry path uses  
test_and_clear it will disappear

endloop

That would be a implicit synchronization and should work, as I wrote  
before setting memslots while the guest is running is rare if ever  
existant for s390. On x86 smp_call_many could then work without the wait  
flag being set.



I see, yes. 

  
But I assume that this synchronization approach is slower as it  
serializes all vcpus on reentry (they wait for the slots_lock to get  
dropped), therefore I wanted to ask how often setting memslots on  
runtime will occur on x86 ? Would this approach be acceptable ?



For x86 we need slots_lock for two things:

1) to protect the memslot structures from changing (very rare), ie:
kvm_set_memory.

2) to protect updates to the dirty bitmap (operations on behalf of
guest) which take slots_lock for read versus updates to that dirty
bitmap (an ioctl that retrieves what pages have been dirtied in the
memslots, and clears the dirtyness info).

All you need for S390 is 1), AFAICS.
  

correct

For 1), we can drop the slots_lock usage, but instead create an
explicit synchronization point, where all vcpus are forced to (

Re: [PATCH 3/3] kvm-s390: streamline memslot handling

2009-05-26 Thread Christian Ehrhardt

Avi Kivity wrote:

Christian Bornträger wrote:

Am Dienstag 26 Mai 2009 09:57:58 schrieb Avi Kivity:
  

[...]
In our low-level interrupt handler we do check for signal_pending, 
machine_check_pending and need_resched to leave the sie instruction. 
For anything else a the host sees a cpu bound guest always in the SIE 
instruction.   


Okay, now I understand (and agree with) you multi-level kick thing.  
Maybe we could do it like so:


Interrupt handler (on s390 only) checks vcpu->requests, handles the 
ones it cans.  If bits are still set, it exits to arch loop, which 
handles the bits it cans.  If bits are still set, it exits to the 
generic code loop, which can finally exit to userspace.


Does this fit with s390 hardware?

I like this idea instead of explicitly kicking to an (upper) level to 
use the lowest kick and exit if not able to handle.
I think it should work (no guarantee) and I try to come up with 
something in the next few days - either a updated patch series or 
additional discussion input :-).


--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state

2009-05-26 Thread Christian Ehrhardt

Marcelo Tosatti wrote:

On Mon, May 25, 2009 at 01:40:49PM +0200, ehrha...@linux.vnet.ibm.com wrote:
  

From: Christian Ehrhardt 

To ensure vcpu's come out of guest context in certain cases this patch adds a
s390 specific way to kick them out of guest context. Currently it kicks them
out to rerun the vcpu_run path in the s390 code, but the mechanism itself is
expandable and with a new flag we could also add e.g. kicks to userspace etc.

Signed-off-by: Christian Ehrhardt 



"For now I added the optimization to skip kicking vcpus out of guest
that had the request bit already set to the s390 specific loop (sent as
v2 in a few minutes).

We might one day consider standardizing some generic kickout levels e.g.
kick to "inner loop", "arch vcpu run", "generic vcpu run", "userspace",
... whatever levels fit *all* our use cases. And then let that kicks be
implemented in an kvm_arch_* backend as it might be very different how
they behave on different architectures."

That would be ideal, yes. Two things make_all_requests handles: 


1) It disables preemption with get_cpu(), so it can reliably check for
cpu id. Somehow you don't need that for s390 when kicking multiple
vcpus?
  
I don't even need the cpuid as make_all_requests does, I just insert a 
special bit in the vcpu arch part and the vcpu will "come out to me (host)".
Fortunateley the kick is rare and fast so I can just insert it 
unconditionally (it's even ok to insert it if the vcpu is not in guest 
state). That prevents us from needing vcpu lock or detailed checks which 
would end up where we started (no guarantee that vcpu's come out of 
guest context while trying to aquire all vcpu locks)



2) It uses smp_call_function_many(wait=1), which guarantees that by the
time make_all_requests returns no vcpus will be using stale data (the
remote vcpus will have executed ack_flush).
  
yes this is really a part my s390 implementation doesn't fulfill yet. 
Currently on return vcpus might still use the old memslot information.
As mentioned before letting all interrupts come "too far" out of the hot 
loop would be a performance issue, therefore I think I will need some 
request&confirm mechanism. I'm not sure yet but maybe it could be as 
easy as this pseudo code example:


# in make_all_requests
# remember we have slots_lock write here and the reentry that updates 
the vcpu specific data aquires slots_lock for read.

loop vcpus
 set_bit in vcpu requests
 kick vcpu #arch function
endloop

loop vcpus
 until the requested bit is disappeared #as the reentry path uses 
test_and_clear it will disappear

endloop

That would be a implicit synchronization and should work, as I wrote 
before setting memslots while the guest is running is rare if ever 
existant for s390. On x86 smp_call_many could then work without the wait 
flag being set.
But I assume that this synchronization approach is slower as it 
serializes all vcpus on reentry (they wait for the slots_lock to get 
dropped), therefore I wanted to ask how often setting memslots on 
runtime will occur on x86 ? Would this approach be acceptable ?


If it is too adventurous for now I can implement it that way in the s390 
code and we split the long term discussion (synchronization + generic 
kickout levels + who knows what comes up).

If smp_call_function_many is hidden behind kvm_arch_kick_vcpus, can you
make use of make_all_requests for S390 (without the smp_call_function 
performance impact you mentioned) ?
  
In combination with the request&confirm mechanism desribed above it 
should work if smp_call function and all the cpuid gathering which 
belongs to it is hidden behind kvm_arch_kick_vcpus.

For x86 we can further optimize make_all_requests by checking REQ_KICK,
and kvm_arch_kick_vcpus would be a good place for that.

And the kickout levels idea you mentioned can come later, as an
optimization?

yes I agree splitting that to a later optimization is a good idea.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  



--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm-s390: streamline memslot handling

2009-05-25 Thread Christian Ehrhardt

Christian Ehrhardt wrote:

Avi Kivity wrote:

ehrha...@linux.vnet.ibm.com wrote:

From: Christian Ehrhardt 


[...]

-/* update sie control blocks, and unlock all vcpus */
+/* request update of sie control block for all available vcpus */
 for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 if (kvm->vcpus[i]) {
-kvm->vcpus[i]->arch.sie_block->gmsor =
-kvm->arch.guest_origin;
-kvm->vcpus[i]->arch.sie_block->gmslm =
-kvm->arch.guest_memsize +
-kvm->arch.guest_origin +
-VIRTIODESCSPACE - 1ul;
-mutex_unlock(&kvm->vcpus[i]->mutex);
+set_bit(KVM_REQ_MMU_RELOAD, &kvm->vcpus[i]->requests);
+kvm_s390_inject_sigp_stop(kvm->vcpus[i],
+  ACTION_RELOADVCPU_ON_STOP);
 }
 }
  


There already exists a loop which does this, see 
make_all_cpus_request().  It uses an IPI (Marcelo, can't it use the 
reschedule interrupt?).  It has a couple of optimizations -- if the 
request is already set, it skips the IPI, and it avoids the IPI for 
vcpus out of guest mode.  Maybe it could fit s390 too.
I assume that the IPI on x86 is a implicit consequence of the 
smp_call_function_many function, but I think this doesn't work that 
way for us. The kick implied by that call would be recieved, but not 
reach the code the checke vcpu->request. I could add that behaviour, 
but that could make our normal interrupt handling much slower. 
Therefore I don't want to call that function, but on the other hand I 
like the "skip if the request is already set" functionality and think 
about adding that in my loop.




For now I added the optimization to skip kicking vcpus out of guest that 
had the request bit already set to the s390 specific loop (sent as v2 in 
a few minutes).


We might one day consider standardizing some generic kickout levels e.g. 
kick to "inner loop", "arch vcpu run", "generic vcpu run", "userspace", 
... whatever levels fit *all* our use cases. And then let that kicks be 
implemented in an kvm_arch_* backend as it might be very different how 
they behave on different architectures. In case an architecture cannot 
achive reaching the specified kickout level it has to kick to the next 
available upper level which eventually will reach the desired step on 
the way to re-run the vcpu.
Alltogether this should lead to a much more reliable and transparent 
interface that finally should be used all across the generic code.



--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm-s390: streamline memslot handling

2009-05-25 Thread Christian Ehrhardt

Avi Kivity wrote:

ehrha...@linux.vnet.ibm.com wrote:

From: Christian Ehrhardt 

This patch relocates the variables kvm-s390 uses to track guest mem 
addr/size.
As discussed dropping the variables at struct kvm_arch level allows 
to use the
common vcpu->request based mechanism to reload guest memory if e.g. 
changes

via set_memory_region.
The kick mechanism introduced in this series is used to ensure 
running vcpus

leave guest state to catch the update.


 
 rerun_vcpu:

+if (vcpu->requests)
+if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests))
+kvm_s390_vcpu_set_mem(vcpu);
+
 /* verify, that memory has been registered */
-if (!vcpu->kvm->arch.guest_memsize) {
+if (!vcpu->arch.sie_block->gmslm) {
 vcpu_put(vcpu);
+VCPU_EVENT(vcpu, 3, "%s", "no memory registered to run vcpu");
 return -EINVAL;
 }



x86 uses a double check: first we check vcpu->requests outside atomic 
context, then we enter the critical section and check again for 
signals and vcpu->requests.


This allows us (a) to do naughty things in vcpu->requests handlers, 
(b) keep the critical section short.


Does this apply here?


The patch already keeps the critical inner loop clear of extra code.
The check for vcpu->requests I added is only reached by either a 
heavyweight (userspace) exit/reentry or the explicit kickout of a vcpu 
to this label. Therefore weit fulfills a+b as you mentioned them above. 
Additionally the s390 reload is very rare as well as fast, therefore it 
would not even be an issue.



-/* update sie control blocks, and unlock all vcpus */
+/* request update of sie control block for all available vcpus */
 for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 if (kvm->vcpus[i]) {
-kvm->vcpus[i]->arch.sie_block->gmsor =
-kvm->arch.guest_origin;
-kvm->vcpus[i]->arch.sie_block->gmslm =
-kvm->arch.guest_memsize +
-kvm->arch.guest_origin +
-VIRTIODESCSPACE - 1ul;
-mutex_unlock(&kvm->vcpus[i]->mutex);
+set_bit(KVM_REQ_MMU_RELOAD, &kvm->vcpus[i]->requests);
+kvm_s390_inject_sigp_stop(kvm->vcpus[i],
+  ACTION_RELOADVCPU_ON_STOP);
 }
 }
  


There already exists a loop which does this, see 
make_all_cpus_request().  It uses an IPI (Marcelo, can't it use the 
reschedule interrupt?).  It has a couple of optimizations -- if the 
request is already set, it skips the IPI, and it avoids the IPI for 
vcpus out of guest mode.  Maybe it could fit s390 too.
I assume that the IPI on x86 is a implicit consequence of the 
smp_call_function_many function, but I think this doesn't work that way 
for us. The kick implied by that call would be recieved, but not reach 
the code the checke vcpu->request. I could add that behaviour, but that 
could make our normal interrupt handling much slower. Therefore I don't 
want to call that function, but on the other hand I like the "skip if 
the request is already set" functionality and think about adding that in 
my loop.


--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run

2009-05-20 Thread Christian Ehrhardt

Avi Kivity wrote:

Christian Ehrhardt wrote:
The bad thing on vcpu->request in that case is that I don't want 
the async behaviour of vcpu->requests in that case, I want the 
memory slot updated in all vcpu's when the ioctl is returning.


You mean, the hardware can access the vcpu control block even when 
the vcpu is not running? 
No, hardware only uses it with a running vcpu, but I realised my 
own fault while changing the code to vcpu->request style.
For s390 I need to update the KVM->arch and *all* 
vcpu->arch->sie_block... data synchronously.


Out of interest, can you explain why?

Sure I'll try to give an example.

a) The whole guest has "one" memory slot representing all it's 
memory. Therefore some important values like guest_origin and 
guest_memsize (one slot so it's just addr+size) are kept at VM level 
in kvm->arch.


It should really be kept in kvm->memslots[0]->{userspace_addr, 
npages}.  This is common to all architectures.
As I said wanted to do that, but due to the need to relocate my work 
environment to a new laptop I was a bit stalled the last few days.
A patch series implementing it in a streamlined (storing in memslots 
only, slots_lock, vcpu->request, ...) way will soon appear on the list.

[...]
c) we have other code e.g. all our copy_from/to_guest stuff that uses 
the kvm->arch values


You want to drop these and use kvm_read_guest() / kvm_write_guest().
I put it on my "low-prio-but-very-useful todo list" to take a look at 
that too as one of the next opportunities to streamline our code.


[...]

--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run

2009-05-12 Thread Christian Ehrhardt

Avi Kivity wrote:

Christian Ehrhardt wrote:

Avi Kivity wrote:

Christian Ehrhardt wrote:


The bad thing on vcpu->request in that case is that I don't want 
the async behaviour of vcpu->requests in that case, I want the 
memory slot updated in all vcpu's when the ioctl is returning.


You mean, the hardware can access the vcpu control block even when 
the vcpu is not running? 
No, hardware only uses it with a running vcpu, but I realised my own 
fault while changing the code to vcpu->request style.
For s390 I need to update the KVM->arch and *all* 
vcpu->arch->sie_block... data synchronously.


Out of interest, can you explain why?

Sure I'll try to give an example.

a) The whole guest has "one" memory slot representing all it's memory. 
Therefore some important values like guest_origin and guest_memsize (one 
slot so it's just addr+size) are kept at VM level in kvm->arch.
b) We fortunately have cool hardware support for "nearly everything"(tm) 
:-) In this case for example we set in vcpu->arch.sie_block the values 
for origin and size translated into a "limit" to get memory management 
virtualization support.
c) we have other code e.g. all our copy_from/to_guest stuff that uses 
the kvm->arch values


If we would allow e.g. updates of a memslot (or as the patch supposes to 
harden the set_memory_region code against inconsiderate code changes in 
other sections) it might happen that we set the kvm->arch information 
but the vcpu->arch->sie_block stuff not until next reentry. Now 
concurrently the running vcpu could cause some kind of fault that 
involves a copy_from/to_guest. That way we could end up with potentially 
invalid handling of that fault (fault handling and running guest would 
use different userspace adresses until it is synced on next vcpu 
reentry) - it's theoretical I know, but it might cause some issues that 
would be hard to find.


On the other hand for the long term I wanted to note that all our 
copy_from/to_guest functions is per vcpu, so when we some day implement 
updateable memslots, multiple memslots or even just fill "free time"(tm) 
and streamline our code we could redesign that origin/size storage. This 
could be done multiple ways, either just store it per vcpu or with a 
lock for the kvm->arch level variables - both ways and maybe more could 
then use the vcpu->request based approach, but unfortunately it's 
neither part of that patch nor of the current effort to do that.


The really good thing is, because of our discussion about that I now 
have a really detailed idea how I can improve that code aside from this 
bugfix patch (lets hope not too far in the future).



That makes the "per vcpu resync on next entry" approach not feasible.

On the other hand I realized at the same moment that the livelock 
should be no issue for us, because as I mentioned:

a) only one memslot
b) a vcpu can't run without memslot
So I don't even need to kick out vcpu's, they just should not be 
running.
Until we ever support multiple slots, or updates of the existing 
single slot this should be ok, so is the bugfix patch this should be.
To avoid a theoretical deadlock in case other code is changing 
(badly) it should be fair to aquire the lock with mutex_trylock and 
return -EINVAL if we did not get all locks.


OK.





--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run

2009-05-12 Thread Christian Ehrhardt

Avi Kivity wrote:

Christian Ehrhardt wrote:


The bad thing on vcpu->request in that case is that I don't want the 
async behaviour of vcpu->requests in that case, I want the memory 
slot updated in all vcpu's when the ioctl is returning.


You mean, the hardware can access the vcpu control block even when the 
vcpu is not running? 
No, hardware only uses it with a running vcpu, but I realised my own 
fault while changing the code to vcpu->request style.
For s390 I need to update the KVM->arch and *all* 
vcpu->arch->sie_block... data synchronously.

That makes the "per vcpu resync on next entry" approach not feasible.

On the other hand I realized at the same moment that the livelock should 
be no issue for us, because as I mentioned:

a) only one memslot
b) a vcpu can't run without memslot
So I don't even need to kick out vcpu's, they just should not be running.
Until we ever support multiple slots, or updates of the existing single 
slot this should be ok, so is the bugfix patch this should be.
To avoid a theoretical deadlock in case other code is changing (badly) 
it should be fair to aquire the lock with mutex_trylock and return 
-EINVAL if we did not get all locks.


[...]

If I can change it that way it will definitely require some testing.
... to be continued :-)


I definitely recommend it -- would bring s390 more in line with the 
other ports (I know it's a backward step for you :)


Note our plan is to change slots_lock to RCU, so it's even better if 
you use memslots.
As long as we have the special conditions mentioned above I think its ok 
to implement it the way I do it now.
I agree that if we ever support multiple memslots we should strive for a 
common solution.


p.s. the second patch in the series ensures that a vcpu really never 
runs without a memslot being set as this was another bug we had.


--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run

2009-05-11 Thread Christian Ehrhardt

Avi Kivity wrote:

Christian Ehrhardt wrote:
I thought about implementing it with slots_lock, vcpu->request, etc 
but it really looks like overkill for s390.


We could make (some of) it common code, so it won't look so bad.  
There's value in having all kvm ports do things similarly; though of 
course we shouldn't force the solution when it isn't really needed.


vcpu->requests is useful whenever we modify global VM state that 
needs to be seen by all vcpus in host mode; see  
kvm_reload_remote_mmus().
yeah I read that code after your first hint in that thread, and I 
agree that merging some of this into common code might be good.
But in my opinion not now for this bugfix patch (the intention is 
just to prevent a user being able to crash the host via vcpu 
create,set mem& and vcpu run in that order).
It might be a good point to further streamline this once we use the 
same userspace code, but I think it doesn't make sense yet.


Sure, don't mix bugfixes with infrastructure changes, when possible.

At least today we can assume that we only have one memslot. 
Therefore a set_memslot with already created vcpu's will still not 
interfere with running vcpus (they can't run without memslot and 
since we have only one they won't run).
Anyway I the code is prepared to "meet" running vcpus, because it 
might be different in future. To prevent the livelock issue I 
changed the code using mutex_trylock and in case I can't get the 
lock I explicitly let the vcpu exit from guest.


Why not do it unconditionally?

hmm I might have written that misleading - eventually it's a loop 
until it got the lock

 while !trylock
   kick vcpu out of guest
   schedule

There is no reason to kick out guests where I got the lock cleanly as 
far as I see.
Especially as I expect the vcpus not running in the common case as i 
explained above (can't run without memslot + we only have one => no 
vcpu will run).


Still livelockable, unless you stop the vcpu from entering the guest 
immediately.


That's why vcpu->requests is so powerful.  Not only you kick the vcpu 
out of guest mode, you force it to synchronize when it tries to enter 
again.




The bad thing on vcpu->request in that case is that I don't want the 
async behaviour of vcpu->requests in that case, I want the memory slot 
updated in all vcpu's when the ioctl is returning.
Looking at vcpu->request based solution I don't find the synchronization 
I need. The changes to  vcpu->arch.guest_origin/guest_memsize and the 
changes to vcpu->arch.sie_block->gmsor/gmslm need to happen without the 
vcpu running.
Therefor i want the vcpu lock _before_ I update the both structs, 
otherwise it could be racy (at least on s390).


On the other hand while it is very++ unlikely to happen you are still 
right that it could theoretically livelock there.
I might use vcpu->request in to not enter vcpu run again after such a 
"kick" out of guest state.
It would be checked on vcpu_run enter and could then drop the lock, call 
schedule, relock and check the flag again until it is cleared.
I'm not yet happy with this solution as I expect it to end up in 
something like a reference count which then would not fit into the 
existing vcpu->request flags :-/


As I mentioned above the changes to vcpu->arch and vcpu->arch->sie_block 
have to be exclusive with the vcpu not running.
If I would find something as "transport" for the information I have on 
set_memory_slot (origin/size) until the next vcpu_run entry I could do 
both changes there synchronously.
In that case I could really use your suggested solution with 
vcpu->request, kick out unconditionally and set values on next (re-)entry.


Hmmm .. Maybe I can find all I need on reentry in vcpu->kvm->memslots[].
If I can change it that way it will definitely require some testing.
... to be continued :-)

--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run

2009-05-11 Thread Christian Ehrhardt

Avi Kivity wrote:

Christian Ehrhardt wrote:


On x86, we use slots_lock to protect memory slots.  When we change 
the global memory configuration, we set a bit in vcpu->requests, and 
send an IPI to all cpus that are currently in guest mode for our 
guest.  This forces the cpu back to host mode.  On the next entry, 
vcpu_run notices vcpu->requests has the bit set and reloads the mmu 
configuration.  Of course, all this may be overkill for s390.


I thought about implementing it with slots_lock, vcpu->request, etc 
but it really looks like overkill for s390.


We could make (some of) it common code, so it won't look so bad.  
There's value in having all kvm ports do things similarly; though of 
course we shouldn't force the solution when it isn't really needed.


vcpu->requests is useful whenever we modify global VM state that needs 
to be seen by all vcpus in host mode; see  kvm_reload_remote_mmus().
yeah I read that code after your first hint in that thread, and I agree 
that merging some of this into common code might be good.
But in my opinion not now for this bugfix patch (the intention is just 
to prevent a user being able to crash the host via vcpu create,set mem& 
and vcpu run in that order).
It might be a good point to further streamline this once we use the same 
userspace code, but I think it doesn't make sense yet.


At least today we can assume that we only have one memslot. Therefore 
a set_memslot with already created vcpu's will still not interfere 
with running vcpus (they can't run without memslot and since we have 
only one they won't run).
Anyway I the code is prepared to "meet" running vcpus, because it 
might be different in future. To prevent the livelock issue I changed 
the code using mutex_trylock and in case I can't get the lock I 
explicitly let the vcpu exit from guest.


Why not do it unconditionally?

hmm I might have written that misleading - eventually it's a loop until 
it got the lock

 while !trylock
   kick vcpu out of guest
   schedule

There is no reason to kick out guests where I got the lock cleanly as 
far as I see.
Especially as I expect the vcpus not running in the common case as i 
explained above (can't run without memslot + we only have one => no vcpu 
will run).



--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run

2009-05-11 Thread Christian Ehrhardt

Avi Kivity wrote:

ehrha...@linux.vnet.ibm.com wrote:

From: Carsten Otte 

This patch fixes an incorrectness in the kvm backend for s390.
In case virtual cpus are being created before the corresponding
memory slot is being registered, we need to update the sie
control blocks for the virtual cpus. In order to do that, we
use the vcpu->mutex to lock out kvm_run and friends. This way
we can ensure a consistent update of the memory for the entire
smp configuration.
@@ -657,6 +657,8 @@ int kvm_arch_set_memory_region(struct kv
 struct kvm_memory_slot old,
 int user_alloc)
 {
+int i;
+
 /* A few sanity checks. We can have exactly one memory slot 
which has

to start at guest virtual zero and which has to be located at a
page boundary in userland and which has to end at a page 
boundary.

@@ -676,13 +678,27 @@ int kvm_arch_set_memory_region(struct kv
 if (mem->memory_size & (PAGE_SIZE - 1))
 return -EINVAL;
 
+/* lock all vcpus */

+for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+if (kvm->vcpus[i])
+mutex_lock(&kvm->vcpus[i]->mutex);
+}
+



Can't that livelock?  Nothing requires a vcpu to ever exit, and if the 
cpu on which it's running on has no other load and no interrupts, it 
could remain in guest mode indefinitely, and then the ioctl will hang, 
waiting for something to happen.



Yes it could wait indefinitely - good spot.

On x86, we use slots_lock to protect memory slots.  When we change the 
global memory configuration, we set a bit in vcpu->requests, and send 
an IPI to all cpus that are currently in guest mode for our guest.  
This forces the cpu back to host mode.  On the next entry, vcpu_run 
notices vcpu->requests has the bit set and reloads the mmu 
configuration.  Of course, all this may be overkill for s390.


I thought about implementing it with slots_lock, vcpu->request, etc but 
it really looks like overkill for s390.
At least today we can assume that we only have one memslot. Therefore a 
set_memslot with already created vcpu's will still not interfere with 
running vcpus (they can't run without memslot and since we have only one 
they won't run).
Anyway I the code is prepared to "meet" running vcpus, because it might 
be different in future. To prevent the livelock issue I changed the code 
using mutex_trylock and in case I can't get the lock I explicitly let 
the vcpu exit from guest.


--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] kvm-s390: Unlink vcpu on destroy

2009-05-11 Thread Christian Ehrhardt

Avi Kivity wrote:

ehrha...@linux.vnet.ibm.com wrote:

From: Carsten Otte 

This patch makes sure we do unlink a vcpu's sie control block
from the system control area in kvm_arch_vcpu_destroy. This
prevents illegal accesses to the sie control block from other
virtual cpus after free.

Reported-by: Mijo Safradin 
Signed-off-by: Carsten Otte 
---
 arch/s390/kvm/kvm-s390.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -195,6 +195,9 @@ out_nokvm:
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
 VCPU_EVENT(vcpu, 3, "%s", "free cpu");
+if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
+(__u64) vcpu->arch.sie_block)
+vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
 free_page((unsigned long)(vcpu->arch.sie_block));

  


If this is accessed by hardware on a different cpu, don't you need a 
memory barrier here?




Right, will be in v2

--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu: fix configuring kvm probe when using --kerneldir

2009-01-11 Thread Christian Ehrhardt

Andre Przywara wrote:

ehrha...@linux.vnet.ibm.com wrote:

From: Christian Ehrhardt 

There is already a variable kvm_cflags which gets the path of the kernel
includes when using --kerneldir. But eventually with newer kernels we 
all will
need arch/$arch/include too (my case was a incldue of asm/kvm.h which 
was not

found anymore). Headers in a full kernel source are not flattened to
one arch like they are if e.g. installed kernel headers are used.
I also stumbled over this recently (in kvm-userspace.git), but I had 
problems with the qemu part not including KVM support because in 
qemu/configure the KVM build test failed due to the missing asm/kvm.h.
I saw that --kerneldir gets not propagated to qemu, but 
libkvm_kerneldir instead, which is hardcoded to point to `pwd`/kernel. 
Shouldn't that be fixed, too?
I use kvm-userspace.git and a not-installed kernel from kvm.git for 
compiling, so I say "./configure --kerneldir=/src/kvm.git 
--with-patched-kernel". I eventually hacked KVM's configure to 
propagate --kerneldir to qemu and added arch/x86/include to the 
include path in qemu/configure. This is of course a hack (that's why I 
don't append it here), but it worked ;-)
If someone proposes a clean and easy way to solve this, I'd be happy 
to write a patch.


I know this issue and reported it ~a month ago. I also had issues 
compiling against a --kerneldir kernel because the libkvm_kerneldir was 
propagated. Eventually in the discussion it came up that we don't need 
to fix configure "technically", but maybe we should find a way to better 
inform users/developüers about this (I guess up to 99% that this works 
for you in kvm-userspace too):


(in a clean kvm-userspace)
 cd kernel
 make sync LINUX=path/to/your/kerneldir
 cd ..
 ./configure opt=whateveryouwant

This way your kerneldir is synced and flattened into kvm-userspace and 
propagating libkvm_kerneldir is fine since that are your kerneldir 
headers now.
Maybe a "fix" would be that if --kerneldir is provided to configure it 
has to ensure that THIS kerneldir is synced in before continuing.
You should be aware that the fix I sent on Friday was for plain qemu 
which doesn't have that kernel subdir indirection and therefore works a 
bit different.


To fix that, the includes added to cflags depending on --kerneldir 
should also
contian the arch includes. The patch adds a special check for x86 
because its
source layout recently changed, all others directly use 
arch/$cpu/include if

existent.
This is one problem I also noticed. $cpu is not the same as the Linux' 
arch name, is there a suitable variable or do we have to do a large 
switch/case?


I looked around and there was no real 1:1 matching variable. But 
fortunately $cpu is similar enough to simplify that swicth/case a lot 
like I did in my patch here.



Regards,
Andre.




Signed-off-by: Christian Ehrhardt 
---

[diffstat]
 configure |6 ++
 1 file changed, 6 insertions(+)

[diff]
diff --git a/configure b/configure
--- a/configure
+++ b/configure
@@ -963,6 +963,12 @@ EOF
 EOF
   if test "$kerneldir" != "" ; then
   kvm_cflags=-I"$kerneldir"/include
+  if test \( "$cpu" = "i386" -o "$cpu" = "x86_64" \) \
+ -a -d "$kerneldir/arch/x86/include" ; then
+kvm_cflags="$kvm_cflags -I$kerneldir/arch/x86/include"
+elif test -d "$kerneldir/arch/$cpu/include" ; then
+kvm_cflags="$kvm_cflags -I$kerneldir/arch/$cpu/include"
+  fi
   else
   kvm_cflags=""
   fi



--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu: report issues causing the kvm probe to fail v3

2009-01-08 Thread Christian Ehrhardt

Anthony Liguori wrote:

ehrha...@linux.vnet.ibm.com wrote:

From: Christian Ehrhardt 

The patch applies to upstream qemu as well as kvm-userspace, but 
since it is
the qemu configure script I think it should go to upstream qemu 
(Anthony)
first and with the next merge to kvm-userspace. On the other hand it 
is the kvm

probe so an ack from Avi in case v3 is ok would be reasonable.

*updates*
v2 - it also reports other errors than just #error preprocessor 
statements

 (requested by Avi)
v3 - In case awk or grep is not installed it now gracfully (silently)
 fails still disabling kvm (requested by Anthony)

This patch is about reporting more details of the issue if 
configuring kvm
fails. Therefore this patch keeps the qemu style configure output 
which is a
list of "$Feature $Status", but extend the "no" result like "KVM 
Support no"

with some more information.

There might be a lot of things going wrong with that probe and I 
don't want

to handle all of them, but if it is one of the known checks e.g. for
KVM_API_VERSION then we could grep/awk that out and report it. The patch
reports in case of a known case in the style
"KVM support no - (Missing KVM capability 
KVM_CAP_DESTROY_MEMORY_REGION_WORKS)"


In case more than one #error is triggered it creates a comma 
separated list in
those brackets and in case it is something else than an #error it 
just reports

plain old "no".

Signed-off-by: Christian Ehrhardt 
---

 configure |   27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/qemu/configure b/qemu/configure
--- a/qemu/configure
+++ b/qemu/configure
  


Please send against upstream QEMU.

Regards,

Anthony Liguori

This applies to qemu upstream already when not specifying -p or -p 2 (as 
I already tested yesterday before submission).
Well I removed the leading qemu dir manually and updated the -51 lines 
offset (attached), should I change more than that (add the leading trunk 
maybe, atm I'm just not sure what changes you want) ?



--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

Subject: [PATCH] qemu: report issues causing the kvm probe to fail v3

From: Christian Ehrhardt 

The patch applies to upstream qemu as well as kvm-userspace, but since it is
the qemu configure script I think it should go to upstream qemu (Anthony)
first and with the next merge to kvm-userspace. On the other hand it is the kvm
probe so an ack from Avi in case v3 is ok would be reasonable.

*updates*
v2 - it also reports other errors than just #error preprocessor statements
 (requested by Avi)
v3 - In case awk or grep is not installed it now gracfully (silently)
 fails still disabling kvm (requested by Anthony)

This patch is about reporting more details of the issue if configuring kvm
fails. Therefore this patch keeps the qemu style configure output which is a
list of "$Feature $Status", but extend the "no" result like "KVM Support no"
with some more information.

There might be a lot of things going wrong with that probe and I don't want
to handle all of them, but if it is one of the known checks e.g. for
KVM_API_VERSION then we could grep/awk that out and report it. The patch
reports in case of a known case in the style
"KVM support no - (Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS)"

In case more than one #error is triggered it creates a comma separated list in
those brackets and in case it is something else than an #error it just reports
plain old "no".

Signed-off-by: Christian Ehrhardt 
---

 configure |   27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/configure b/configure
--- a/configure
+++ b/configure
@@ -951,13 +951,17 @@ if test "$kvm" = "yes" ; then
 if test "$kvm" = "yes" ; then
 cat > $TMPC <
-#if !defined(KVM_API_VERSION) || \
-KVM_API_VERSION < 12 || \
-KVM_API_VERSION > 12 || \
-!defined(KVM_CAP_USER_MEMORY) || \
-!defined(KVM_CAP_SET_TSS_ADDR) || \
-!defined(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)
+#if !defined(KVM_API_VERSION) || KVM_API_VERSION < 12 || KVM_API_VERSION > 12
 #error Invalid KVM version
+#endif
+#if !defined(KVM_CAP_USER_MEMORY)
+#error Missing KVM capability KVM_CAP_USER_MEMORY
+#endif
+#if !defined(KVM_CAP_SET_TSS_ADDR)
+#error Missing KVM capability KVM_CAP_SET_TSS_ADDR
+#endif
+#if !defined(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)
+#error Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS
 #endif
 int main(void) { return 0; }
 EOF
@@ -970,7 +974,16 @@ EOF
   > /dev/null 2>/dev/null ; then
 :
   else
-kvm="no"
+kvm="no";
+if [ -x "`which awk 2>/dev/null`" ] && \
+   [ -x "`which grep 2>/dev/null`" ]; then
+  kvmerr=`$cc $ARCH_CFLAGS -o $TMPE ${OS_CFLAGS} $kvm_cflags $TMPC 2>&1 \
+	| grep "error: " \
+	| awk -F "error: " '{if (NR>1) printf(", "); printf("%s",$2);}'`
+  if test "$kvmerr" != "" ; then
+kvm="no - (${kvmerr})"
+  fi
+fi
   fi
 fi
 


Re: [PATCH] kvm: work around inability of older kvm modules to destroy memory regions

2008-12-17 Thread Christian Ehrhardt

Avi Kivity wrote:

Christian Ehrhardt wrote:
Hi, this patch breaks all non x86 architectures as 
libkvm/libkvm-x86.c has the only implementation of the alias 
functionality.
Until now only qemu-kvm-x86 has called that functions, but since this 
patch the generic qemu-kvm.c calls them which leads to unresolved 
symbols for powerpc, s390 and surely ia64 too.


Well we could insert stubs for these call, but when looking on the 
kernel side x86 is also the only implementer of the 
KVM_SET_MEMORY_ALIAS ioctl. Until more arch support that there is no 
reason to create these functions for non-x86 in libkvm. Also the 
assumptions which addresses must be aliased base on hardware specific 
assumptions e.g. vga card -> arch specific too.


For now I hold a no-op stub in my private queue to test powerpc, but 
eventually this mechanism should be arch dependent and this 
implementation x86 only.

Avi could you modify your patch to work for the other arch's too ?


Your band aid should be fine.  Yes, it's ugly, but this will be in 
flux as we merge with upstream qemu.  Please resend it with a signoff.



still ugly, but in a proper patch style now :-)


--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

Subject: [PATCH] kvm-userspace: guard destroy memory regions to x86 only

From: Christian Ehrhardt 

Some parts of the changes made were arch specific e..g vga mem.
This is a quick, but ugly workaround until a real arch split is found.

Signed-off-by: Christian Ehrhardt 
---

[diffstat]
 qemu-kvm.c |   26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

[diff]
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -766,7 +766,9 @@ int kvm_qemu_init()
 return 0;
 }

+#ifdef TARGET_I386
 static int destroy_region_works = 0;
+#endif

 int kvm_qemu_create_context(void)
 {
@@ -784,7 +786,9 @@ int kvm_qemu_create_context(void)
 r = kvm_arch_qemu_create_context();
 if(r <0)
 	kvm_qemu_destroy();
+#ifdef TARGET_I386
 destroy_region_works = kvm_destroy_memory_region_works(kvm_context);
+#endif
 return 0;
 }

@@ -793,6 +797,7 @@ void kvm_qemu_destroy(void)
 kvm_finalize(kvm_context);
 }

+#ifdef TARGET_I386
 static int must_use_aliases_source(target_phys_addr_t addr)
 {
 if (destroy_region_works)
@@ -849,6 +854,7 @@ static void drop_mapping(target_phys_add
 if (p)
 *p = mappings[--nr_mappings];
 }
+#endif

 void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
   unsigned long size,
@@ -856,17 +862,21 @@ void kvm_cpu_register_physical_memory(ta
 {
 int r = 0;
 unsigned long area_flags = phys_offset & ~TARGET_PAGE_MASK;
+#ifdef TARGET_I386
 struct mapping *p;
+#endif

 phys_offset &= ~IO_MEM_ROM;

 if (area_flags == IO_MEM_UNASSIGNED) {
+#ifdef TARGET_I386
 if (must_use_aliases_source(start_addr)) {
 kvm_destroy_memory_alias(kvm_context, start_addr);
 return;
 }
 if (must_use_aliases_target(start_addr))
 return;
+#endif
 kvm_unregister_memory_area(kvm_context, start_addr, size);
 return;
 }
@@ -878,6 +888,7 @@ void kvm_cpu_register_physical_memory(ta
 if (area_flags >= TLB_MMIO)
 return;

+#ifdef TARGET_I386
 if (must_use_aliases_source(start_addr)) {
 p = find_ram_mapping(phys_offset);
 if (p) {
@@ -886,6 +897,7 @@ void kvm_cpu_register_physical_memory(ta
 }
 return;
 }
+#endif

 r = kvm_register_phys_mem(kvm_context, start_addr,
   phys_ram_base + phys_offset,
@@ -895,11 +907,13 @@ void kvm_cpu_register_physical_memory(ta
 exit(1);
 }

+#ifdef TARGET_I386
 drop_mapping(start_addr);
 p = &mappings[nr_mappings++];
 p->phys = start_addr;
 p->ram = phys_offset;
 p->len = size;
+#endif

 return;
 }
@@ -1068,8 +1082,10 @@ void kvm_qemu_log_memory(target_phys_add
 if (log)
 	kvm_dirty_pages_log_enable_slot(kvm_context, start, size);
 else {
+#ifdef TARGET_I386
 if (must_use_aliases_target(start))
 return;
+#endif
 	kvm_dirty_pages_log_disable_slot(kvm_context, start, size);
 }
 }
@@ -1157,8 +1173,10 @@ void kvm_physical_sync_dirty_bitmap(targ
 {
 void *buf;

+#ifdef TARGET_I386
 if (must_use_aliases_source(start_addr))
 return;
+#endif

 buf = qemu_malloc((end_addr - start_addr) / 8 + 2);
 kvm_get_dirty_pages_range(kvm_context, start_addr, end_addr - start_addr,
@@ -1168,21 +1186,21 @@ void kvm_physical_sync_dirty_bitmap(targ

 int kvm_log_start(target_phys_addr_t phys_addr, target_phys_addr_t len)
 {
-#ifndef TARGET_IA64
+#ifdef TARGET_I386
 if (must_use_aliases_source(phys_addr))
 return 0;
+#endif
 kvm_qemu_log_memory(phys_addr, len, 1);
-#endif
 return 0;
 }

 int kvm_log_stop(target_phys_addr_t phys_a

Re: [PATCH] [PATCH] qemu: report issues causing the kvm probe to fail

2008-12-15 Thread Christian Ehrhardt

Avi Kivity wrote:

Christian Ehrhardt wrote:
I ran into the issue of a failign KVM Probe of the qemu configure 
script three
times this week always needing "set -x", inserting an exit, masking 
the cleanup
trap and compiling the c file by hand until I knew what the reason 
is. I think

we could make easier for developers and end users.
Therefore this patch keeps the qemu style configure output which is a 
list of
"$Feature $Status", but extend the "no" result like "KVM Support no" 
with some

more information.

There might be a lot of things going wrong with that probe and I 
don't want

to handle all of them, but if it is one of the known checks e.g. for
KVM_API_VERSION then we could grep/awk that out and report it. The patch
reports in case of a known case in the style
"KVM support no - (Missing KVM capability 
KVM_CAP_DESTROY_MEMORY_REGION_WORKS)"


In case more than one #error is triggered it creates a comma 
separated list in
those brackets and in case it is something else than an #error it 
just reports

plain old "no".

diff --git a/qemu/configure b/qemu/configure
--- a/qemu/configure
+++ b/qemu/configure
@@ -1037,12 +1037,14 @@ if test "$kvm" = "yes" ; then
 if test "$kvm" = "yes" ; then
 cat > $TMPC <
-#if !defined(KVM_API_VERSION) || \
-KVM_API_VERSION < 12 || \
-KVM_API_VERSION > 12 || \
-!defined(KVM_CAP_USER_MEMORY) || \
-!defined(KVM_CAP_SET_TSS_ADDR)
+#if !defined(KVM_API_VERSION) || KVM_API_VERSION < 12 || 
KVM_API_VERSION > 12

 #error Invalid KVM version
  


You might refine this a bit:  if KVM_API_VERSION is not defined, most 
likely linux/kvm.h could not be found, so you might as well report that.



+#endif
  



Updated v2 should appear on the list soon reporting all gcc "error:" 
messages.


--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [PATCH] qemu: report issues causing the kvm probe to fail v2

2008-12-15 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt 
# Date 1229347014 -3600
# Node ID c754b8806d756a19c57fc3b3e317bbe3c147d5ec
# Parent  f7dc67cd9b74c5d7ad322686e58325f879d93468
[PATCH] qemu: report issues causing the kvm probe to fail v2

From: Christian Ehrhardt 

*update to v2*
It now reports all "error:" statements from gcc behind the KVM no status.
As the status line now is a big larger I also found that "kvm support" is
reported twice, so I removed one duplicate. Finally there was also one check
in configure_kvm that did not use "" guards when runnign test on $kvm which
could fail now that it might contain more than just yes/no - I added
apostrophs to prevent that.

I ran into the issue of a failign KVM Probe of the qemu configure script three
times this week always needing "set -x", inserting an exit, masking the cleanup
trap and compiling the c file by hand until I knew what the reason is. I think
we could make easier for developers and end users.
Therefore this patch keeps the qemu style configure output which is a list of
"$Feature $Status", but extend the "no" result like "KVM Support no" with some
more information.

There might be a lot of things going wrong with that probe and I don't want
to handle all of them, but if it is one of the known checks e.g. for
KVM_API_VERSION then we could grep/awk that out and report it. The patch
reports in case of a known case in the style
"KVM support no - (Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS)"

In case more than one #error is triggered it creates a comma separated list in
those brackets and in case it is something else than an #error it just reports
plain old "no".

I sent a similar patch matching qemu upstream version of this file to
qemu-de...@nongnu.org to keep both in sync as much as possible.

Signed-off-by: Christian Ehrhardt 
---

[diffstat]
 configure |   22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

[diff]

diff --git a/qemu/configure b/qemu/configure
--- a/qemu/configure
+++ b/qemu/configure
@@ -1037,12 +1037,14 @@ if test "$kvm" = "yes" ; then
 if test "$kvm" = "yes" ; then
 cat > $TMPC <
-#if !defined(KVM_API_VERSION) || \
-KVM_API_VERSION < 12 || \
-KVM_API_VERSION > 12 || \
-!defined(KVM_CAP_USER_MEMORY) || \
-!defined(KVM_CAP_SET_TSS_ADDR)
+#if !defined(KVM_API_VERSION) || KVM_API_VERSION < 12 || KVM_API_VERSION > 12
 #error Invalid KVM version
+#endif
+#if !defined(KVM_CAP_USER_MEMORY)
+#error Missing KVM capability KVM_CAP_USER_MEMORY
+#endif
+#if !defined(KVM_CAP_SET_TSS_ADDR)
+#error Missing KVM capability KVM_CAP_SET_TSS_ADDR
 #endif
 int main(void) { return 0; }
 EOF
@@ -1055,7 +1057,12 @@ EOF
   2>/dev/null ; then
 :
   else
-kvm="no"
+kvmprobeerr=`$cc $ARCH_CFLAGS -o $TMPE ${OS_CFLAGS} $kvm_cflags $TMPC 2>&1 
| grep "error: " | awk --field-separator "error: " '{if (NR>1) printf(", "); 
printf("%s",$2);}'`
+if test "$kvmprobeerr" != "" ; then
+  kvm="no - (${kvmprobeerr})"
+else
+  kvm="no"
+fi
   fi
 fi
 
@@ -1201,7 +1208,6 @@ if test -n "$sparc_cpu"; then
 echo "Target Sparc Arch $sparc_cpu"
 fi
 echo "kqemu support $kqemu"
-echo "kvm support   $kvm"
 echo "CPU emulation $cpu_emulation"
 if test $cpu = "powerpc"; then
 echo "libfdt support$device_tree_support"
@@ -1638,7 +1644,7 @@ disable_cpu_emulation() {
 }
 
 configure_kvm() {
-  if test $kvm = "yes" -a "$target_softmmu" = "yes" -a \
+  if test "$kvm" = "yes" -a "$target_softmmu" = "yes" -a \
   \( "$cpu" = "i386" -o "$cpu" = "x86_64" -o "$cpu" = "ia64" -o "$cpu" 
= "powerpc" \); then
 echo "#define USE_KVM 1" >> $config_h
 echo "USE_KVM=1" >> $config_mak
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [PATCH] kvm-userspace: ppc: fix compatfd build decision v2

2008-12-15 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt 
# Date 1229345133 -3600
# Node ID b48b9d560f80037ab4e12eae128022622c7ccb34
# Parent  4b0ad05490115e4c6f31d2419c0e5b628040f90b
[PATCH] kvm-userspace: ppc: fix compatfd build decision v2

From: Christian Ehrhardt 

qemu-kvm.c uses qemu_eventfd/qemu_signalfd. The code of compatfd takes care
if CONFIG_eventfd/CONFIG_signalfd is really enabled. But currently compatfd
is not build if --disable-aio is set which leads to undefined references
when linking.

Since compatfd.c takes care of configs being set we can savely build it in
all cases dropping the ifdef completely. This allows powerpc but also all
other archs to build in case --disable-aio is set.

Signed-off-by: Christian Ehrhardt 
---

[diffstat]
 Makefile|5 +
 Makefile.target |6 +-
 2 files changed, 2 insertions(+), 9 deletions(-)

[diff]

diff --git a/qemu/Makefile b/qemu/Makefile
--- a/qemu/Makefile
+++ b/qemu/Makefile
@@ -51,7 +51,7 @@ BLOCK_OBJS+=block-cow.o block-qcow.o aes
 BLOCK_OBJS+=block-cow.o block-qcow.o aes.o block-vmdk.o block-cloop.o
 BLOCK_OBJS+=block-dmg.o block-bochs.o block-vpc.o block-vvfat.o
 BLOCK_OBJS+=block-qcow2.o block-parallels.o block-nbd.o
-BLOCK_OBJS+=nbd.o block.o aio.o
+BLOCK_OBJS+=nbd.o block.o aio.o compatfd.o
 
 ifdef CONFIG_WIN32
 BLOCK_OBJS += block-raw-win32.o
@@ -59,9 +59,6 @@ BLOCK_OBJS += block-raw-posix.o
 BLOCK_OBJS += block-raw-posix.o
 endif
 
-ifdef CONFIG_AIO
-BLOCK_OBJS += compatfd.o
-endif
 
 ##
 # libqemu_common.a: Target independent part of system emulation. The
diff --git a/qemu/Makefile.target b/qemu/Makefile.target
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -646,6 +646,7 @@ OBJS=vl.o osdep.o monitor.o pci.o loader
 OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o
 OBJS+=fw_cfg.o
 OBJS+=net.o
+OBJS+=compatfd.o
 ifdef CONFIG_KVM
 OBJS+=kvm.o kvm-all.o
 endif
@@ -654,11 +655,6 @@ else
 else
 OBJS+=block-raw-posix.o
 endif
-
-ifdef CONFIG_AIO
-OBJS+=compatfd.o
-endif
-
 LIBS+=-lz
 ifdef CONFIG_ALSA
 LIBS += -lasound
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1 of 6] [PATCH] kvm-userspace: ppc: fix compatfd build decision

2008-12-15 Thread Christian Ehrhardt

Avi Kivity wrote:


qemu-kvm.c uses qemu_eventfd/qemu_signalfd. The code of compatfd 
takes care
if CONFIG_eventfd/CONFIG_signalfd is really enabled. But currently 
compatfd is
not build if --disable-aio is set. This patch lets compatfd.c build 
if USE_KVM
is set to allow qemu-kvm to be linked in all cases (with/without 
--disable-aio)
  


This breaks x86, so I dropped it.


On the other Hand x86 it is broken atm too.
If you compile current upstream for x86 with --disable-aio you'll get 
this too:

 ibqemu.a(qemu-kvm.o): In function `kvm_main_loop':
 
/home/paelzer/Desktop/KVM/ppc_port/kvm-userspace-ppc.hg-testbuild/qemu/qemu-kvm.c:565: 
undefined reference to `qemu_eventfd'
 
/home/paelzer/Desktop/KVM/ppc_port/kvm-userspace-ppc.hg-testbuild/qemu/qemu-kvm.c:580: 
undefined reference to `qemu_signalfd'

 collect2: ld returned 1 exit status

Which was exactly what I had with power :-/

I checked for the error you reported Avi, and the problem seems to be 
that  USE_KVM was not set even if KVM support is enabled (weird?).
However looking at this more in detail I realized that I don't have to 
care about USE_KVM in this csae. As I mentioned before compatfd.c takes 
care if CONFIG_signalfd/CONFIG_eventfd are set. Therefore we can savely 
remove the makefile guard completely and just always build compatfd.c.


This updated patch works for x86&powerpc with/without --disable-aio in 
my tests.

It should appear on the list shortly.

--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [PATCH] kvm-userspace: gdb: fix new gdb function types

2008-12-12 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt 
# Date 1229085659 -3600
# Node ID 37967a80a2757505488685aac135681945e6da91
# Parent  f0ed33f14658fe91a14ec02501cb42d26e32f01f
[PATCH] kvm-userspace: gdb: fix new gdb function types

From: Christian Ehrhardt 

The types changed in the header but not in the powerpc and ia64 implementation.
This patch fix that build error by changing the stubs to the right prototype.

Signed-off-by: Christian Ehrhardt 
---

[diffstat]
 qemu-kvm-ia64.c|6 --
 qemu-kvm-powerpc.c |6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

[diff]

diff --git a/qemu/qemu-kvm-ia64.c b/qemu/qemu-kvm-ia64.c
--- a/qemu/qemu-kvm-ia64.c
+++ b/qemu/qemu-kvm-ia64.c
@@ -65,12 +65,14 @@ void kvm_arch_update_regs_for_sipi(CPUSt
 {
 }
 
-int kvm_arch_insert_sw_breakpoint(struct kvm_sw_breakpoint *bp)
+int kvm_arch_insert_sw_breakpoint(CPUState *current_env,
+  struct kvm_sw_breakpoint *bp)
 {
 return -EINVAL;
 }
 
-int kvm_arch_remove_sw_breakpoint(struct kvm_sw_breakpoint *bp)
+int kvm_arch_remove_sw_breakpoint(CPUState *current_env,
+  struct kvm_sw_breakpoint *bp)
 {
 return -EINVAL;
 }
diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c
--- a/qemu/qemu-kvm-powerpc.c
+++ b/qemu/qemu-kvm-powerpc.c
@@ -223,12 +223,14 @@ void kvm_arch_cpu_reset(CPUState *env)
 {
 }
 
-int kvm_arch_insert_sw_breakpoint(struct kvm_sw_breakpoint *bp)
+int kvm_arch_insert_sw_breakpoint(CPUState *current_env,
+  struct kvm_sw_breakpoint *bp)
 {
 return -EINVAL;
 }
 
-int kvm_arch_remove_sw_breakpoint(struct kvm_sw_breakpoint *bp)
+int kvm_arch_remove_sw_breakpoint(CPUState *current_env,
+  struct kvm_sw_breakpoint *bp)
 {
 return -EINVAL;
 }
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [PATCH] qemu: ppc: kvm-userspace: KVM PowerPC support for qemu gdbstub

2008-12-12 Thread Christian Ehrhardt

Hollis Blanchard wrote:

On Thu, 2008-12-11 at 17:05 +0100, Jan Kiszka wrote:
  

Hollis Blanchard wrote:


On Thu, 2008-12-11 at 13:53 +0100, Christian Ehrhardt wrote:
  

This is v2 as version one had a type in it occured when splitting patches.
Mercurial somehow lost my changes to the patch description explaining 
that, but the patch is right this way.


Christian Ehrhardt wrote:


# HG changeset patch
# User Christian Ehrhardt 

# Date 1228999833 -3600
# Node ID dc1466c9077ab162f4637fffee1869f26be02299
# Parent  4c07fe2a56c7653a9113e05bb08c2de9aec210ce
[PATCH] qemu: ppc: kvm-userspace: KVM PowerPC support for qemu gdbstub

From: Hollis Blanchard 

Add basic KVM PowerPC support to qemu's gdbstub introducing a kvm ppc style
mmu implementation that uses the kvm_translate ioctl.
This also requires to save the kvm registers prior to the 'm' gdb operations.

Signed-off-by: Hollis Blanchard 

Signed-off-by: Christian Ehrhardt 

  

Let's *not* apply this to kvm-userspace. We will submit this to qemu,
and once we work out the right solution there it will be merged
naturally.

  

I don't oversee yet what you want to push upstream, but in case it's the
gdbstub support for kvm (including ppc bits): please note that I plan to
push the new interface once it is merged into kvm-userspace, avoiding to
spread the current, limited one as far as possible.

BTW, would be great if you could have a look / provide patches for ppc
to support the new interface already. I am open for feedback,
specifically regarding its suitability beyond x86.



I've been meaning to do this for a while, sorry. We'll take a look soon.

  

Hi Jan,
I saw that you already had that env->s->g_cpu fix, so if you change all that
anyway it might really be better to test/extend your patches for powerpc 
now.


If it is ok for you I would submit my patches that apply on top of yours to
you and cc the kvm list. But as Hollis mentioned I would prefer go for qemu
upstream first and then assist Avi in merging it into kvm-userspace because
this is the natural direction patches flow atm (and if you need to change it
multiple times until you get qemu acceptance you would have to extensivly
patch both projects to match again).

As my code in that case depend on your patches it would be nice if you could
put them into your series once you are happy with it.

--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [PATCH] qemu: report issues causing the kvm probe to fail

2008-12-11 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1229005383 -3600
# Node ID d788f32f8f60f3a0d86ab218459089e5186632ca
# Parent  f7dc67cd9b74c5d7ad322686e58325f879d93468
[PATCH] qemu: report issues causing the kvm probe to fail

From: Christian Ehrhardt <[EMAIL PROTECTED]>

I ran into the issue of a failign KVM Probe of the qemu configure script three
times this week always needing "set -x", inserting an exit, masking the cleanup
trap and compiling the c file by hand until I knew what the reason is. I think
we could make easier for developers and end users.
Therefore this patch keeps the qemu style configure output which is a list of
"$Feature $Status", but extend the "no" result like "KVM Support no" with some
more information.

There might be a lot of things going wrong with that probe and I don't want
to handle all of them, but if it is one of the known checks e.g. for
KVM_API_VERSION then we could grep/awk that out and report it. The patch
reports in case of a known case in the style
"KVM support no - (Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS)"

In case more than one #error is triggered it creates a comma separated list in
those brackets and in case it is something else than an #error it just reports
plain old "no".

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 configure |   24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

[diff]

diff --git a/qemu/configure b/qemu/configure
--- a/qemu/configure
+++ b/qemu/configure
@@ -1037,12 +1037,14 @@ if test "$kvm" = "yes" ; then
 if test "$kvm" = "yes" ; then
 cat > $TMPC <
-#if !defined(KVM_API_VERSION) || \
-KVM_API_VERSION < 12 || \
-KVM_API_VERSION > 12 || \
-!defined(KVM_CAP_USER_MEMORY) || \
-!defined(KVM_CAP_SET_TSS_ADDR)
+#if !defined(KVM_API_VERSION) || KVM_API_VERSION < 12 || KVM_API_VERSION > 12
 #error Invalid KVM version
+#endif
+#if !defined(KVM_CAP_USER_MEMORY)
+#error Missing KVM capability KVM_CAP_USER_MEMORY
+#endif
+#if !defined(KVM_CAP_SET_TSS_ADDR)
+#error Missing KVM capability KVM_CAP_SET_TSS_ADDR
 #endif
 int main(void) { return 0; }
 EOF
@@ -1055,7 +1057,12 @@ EOF
   2>/dev/null ; then
 :
   else
-kvm="no"
+kvmprobeerr=`$cc $ARCH_CFLAGS -o $TMPE ${OS_CFLAGS} $kvm_cflags $TMPC 2>&1 
| grep "error: #error " | awk --field-separator "error: #error " '{if (NR>1) 
printf(", "); printf("%s",$2);}'`
+if test "$kvmprobeerr" != "" ; then
+  kvm="no - (${kvmprobeerr})"
+else
+  kvm="no"
+fi
   fi
 fi
 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [PATCH] qemu: ppc: kvm-userspace: KVM PowerPC support for qemu gdbstub

2008-12-11 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228999833 -3600
# Node ID dc1466c9077ab162f4637fffee1869f26be02299
# Parent  4c07fe2a56c7653a9113e05bb08c2de9aec210ce
[PATCH] qemu: ppc: kvm-userspace: KVM PowerPC support for qemu gdbstub

From: Hollis Blanchard <[EMAIL PROTECTED]>

Add basic KVM PowerPC support to qemu's gdbstub introducing a kvm ppc style
mmu implementation that uses the kvm_translate ioctl.
This also requires to save the kvm registers prior to the 'm' gdb operations.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 gdbstub.c   |2 ++
 hw/ppc440_bamboo.c  |1 +
 qemu-kvm-powerpc.c  |   28 
 target-ppc/cpu.h|2 ++
 target-ppc/helper.c |4 
 target-ppc/translate_init.c |5 +
 6 files changed, 42 insertions(+)

[diff]

diff --git a/qemu/gdbstub.c b/qemu/gdbstub.c
--- a/qemu/gdbstub.c
+++ b/qemu/gdbstub.c
@@ -1374,6 +1374,7 @@ static int gdb_handle_packet(GDBState *s
 if (*p == ',')
 p++;
 len = strtoull(p, NULL, 16);
+kvm_save_registers(s->g_cpu);
 if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 0) != 0) {
 put_packet (s, "E14");
 } else {
@@ -1389,6 +1390,7 @@ static int gdb_handle_packet(GDBState *s
 if (*p == ':')
 p++;
 hextomem(mem_buf, p, len);
+kvm_save_registers(s->g_cpu);
 if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 1) != 0)
 put_packet(s, "E14");
 else
diff --git a/qemu/hw/ppc440_bamboo.c b/qemu/hw/ppc440_bamboo.c
--- a/qemu/hw/ppc440_bamboo.c
+++ b/qemu/hw/ppc440_bamboo.c
@@ -99,6 +99,7 @@ void bamboo_init(ram_addr_t ram_size, in
fprintf(stderr, "Unable to initialize CPU!\n");
exit(1);
}
+   env->mmu_model = POWERPC_MMU_KVM;
 
/* call init */
printf("Calling function ppc440_init\n");
diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c
--- a/qemu/qemu-kvm-powerpc.c
+++ b/qemu/qemu-kvm-powerpc.c
@@ -102,6 +102,7 @@ void kvm_arch_save_regs(CPUState *env)
 
 env->spr[SPR_SRR0] = regs.srr0;
 env->spr[SPR_SRR1] = regs.srr1;
+env->spr[SPR_BOOKE_PID] = regs.pid;
 
 env->spr[SPR_SPRG0] = regs.sprg0;
 env->spr[SPR_SPRG1] = regs.sprg1;
@@ -219,6 +220,33 @@ int handle_powerpc_dcr_write(int vcpu, u
 return 0; /* XXX ignore failed DCR ops */
 }
 
+int mmukvm_get_physical_address(CPUState *env, mmu_ctx_t *ctx,
+target_ulong eaddr, int rw, int access_type)
+{
+struct kvm_translation tr;
+uint64_t pid;
+uint64_t as;
+int r;
+
+pid = env->spr[SPR_BOOKE_PID];
+
+if (access_type == ACCESS_CODE)
+as = env->msr & msr_ir;
+else
+as = env->msr & msr_dr;
+
+tr.linear_address = as << 40 | pid << 32 | eaddr;
+r = kvm_translate(kvm_context, env->cpu_index, &tr);
+if (r == -1)
+return r;
+
+if (!tr.valid)
+return -EFAULT;
+
+ctx->raddr = tr.physical_address;
+return 0;
+}
+
 void kvm_arch_cpu_reset(CPUState *env)
 {
 }
diff --git a/qemu/target-ppc/cpu.h b/qemu/target-ppc/cpu.h
--- a/qemu/target-ppc/cpu.h
+++ b/qemu/target-ppc/cpu.h
@@ -98,6 +98,8 @@ enum powerpc_mmu_t {
 POWERPC_MMU_BOOKE_FSL  = 0x0009,
 /* PowerPC 601 MMU model (specific BATs format)*/
 POWERPC_MMU_601= 0x000A,
+/* KVM managing the MMU state  */
+POWERPC_MMU_KVM= 0x000B,
 #if defined(TARGET_PPC64)
 #define POWERPC_MMU_64   0x0001
 /* 64 bits PowerPC MMU */
diff --git a/qemu/target-ppc/helper.c b/qemu/target-ppc/helper.c
--- a/qemu/target-ppc/helper.c
+++ b/qemu/target-ppc/helper.c
@@ -1429,6 +1429,10 @@ int get_physical_address (CPUState *env,
 fprintf(logfile, "%s\n", __func__);
 }
 #endif
+
+if (env->mmu_model == POWERPC_MMU_KVM)
+return mmukvm_get_physical_address(env, ctx, eaddr, rw, access_type);
+
 if ((access_type == ACCESS_CODE && msr_ir == 0) ||
 (access_type != ACCESS_CODE && msr_dr == 0)) {
 /* No address translation */
diff --git a/qemu/target-ppc/translate_init.c b/qemu/target-ppc/translate_init.c
--- a/qemu/target-ppc/translate_init.c
+++ b/qemu/target-ppc/translate_init.c
@@ -9273,6 +9273,11 @@ int cpu_ppc_register_internal (CPUPPCSta
 case POWERPC_MMU_601:
 mmu_model = "PowerPC 601";
 break;
+#ifdef KVM
+case POWERPC_MMU_KVM:
+mmu_model = "PowerPC KVM";
+break;
+#endif
 #if defined (TARGET_PPC64)
 case POWERPC

Re: [PATCH] [PATCH] qemu: ppc: kvm-userspace: KVM PowerPC support for qemu gdbstub

2008-12-11 Thread Christian Ehrhardt

This is v2 as version one had a type in it occured when splitting patches.
Mercurial somehow lost my changes to the patch description explaining 
that, but the patch is right this way.


Christian Ehrhardt wrote:

# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228999833 -3600
# Node ID dc1466c9077ab162f4637fffee1869f26be02299
# Parent  4c07fe2a56c7653a9113e05bb08c2de9aec210ce
[PATCH] qemu: ppc: kvm-userspace: KVM PowerPC support for qemu gdbstub

From: Hollis Blanchard <[EMAIL PROTECTED]>

Add basic KVM PowerPC support to qemu's gdbstub introducing a kvm ppc style
mmu implementation that uses the kvm_translate ioctl.
This also requires to save the kvm registers prior to the 'm' gdb operations.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 gdbstub.c   |2 ++
 hw/ppc440_bamboo.c  |1 +
 qemu-kvm-powerpc.c  |   28 
 target-ppc/cpu.h|2 ++
 target-ppc/helper.c |4 
 target-ppc/translate_init.c |5 +
 6 files changed, 42 insertions(+)

[diff]

diff --git a/qemu/gdbstub.c b/qemu/gdbstub.c
--- a/qemu/gdbstub.c
+++ b/qemu/gdbstub.c
@@ -1374,6 +1374,7 @@ static int gdb_handle_packet(GDBState *s
 if (*p == ',')
 p++;
 len = strtoull(p, NULL, 16);
+kvm_save_registers(s->g_cpu);
 if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 0) != 0) {
 put_packet (s, "E14");
 } else {
@@ -1389,6 +1390,7 @@ static int gdb_handle_packet(GDBState *s
 if (*p == ':')
 p++;
 hextomem(mem_buf, p, len);
+kvm_save_registers(s->g_cpu);
 if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 1) != 0)
 put_packet(s, "E14");
 else
diff --git a/qemu/hw/ppc440_bamboo.c b/qemu/hw/ppc440_bamboo.c
--- a/qemu/hw/ppc440_bamboo.c
+++ b/qemu/hw/ppc440_bamboo.c
@@ -99,6 +99,7 @@ void bamboo_init(ram_addr_t ram_size, in
fprintf(stderr, "Unable to initialize CPU!\n");
exit(1);
}
+   env->mmu_model = POWERPC_MMU_KVM;

/* call init */
printf("Calling function ppc440_init\n");
diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c
--- a/qemu/qemu-kvm-powerpc.c
+++ b/qemu/qemu-kvm-powerpc.c
@@ -102,6 +102,7 @@ void kvm_arch_save_regs(CPUState *env)

 env->spr[SPR_SRR0] = regs.srr0;
 env->spr[SPR_SRR1] = regs.srr1;
+env->spr[SPR_BOOKE_PID] = regs.pid;

 env->spr[SPR_SPRG0] = regs.sprg0;
 env->spr[SPR_SPRG1] = regs.sprg1;
@@ -219,6 +220,33 @@ int handle_powerpc_dcr_write(int vcpu, u
 return 0; /* XXX ignore failed DCR ops */
 }

+int mmukvm_get_physical_address(CPUState *env, mmu_ctx_t *ctx,
+target_ulong eaddr, int rw, int access_type)
+{
+struct kvm_translation tr;
+uint64_t pid;
+uint64_t as;
+int r;
+
+pid = env->spr[SPR_BOOKE_PID];
+
+if (access_type == ACCESS_CODE)
+as = env->msr & msr_ir;
+else
+as = env->msr & msr_dr;
+
+tr.linear_address = as << 40 | pid << 32 | eaddr;
+r = kvm_translate(kvm_context, env->cpu_index, &tr);
+if (r == -1)
+return r;
+
+if (!tr.valid)
+return -EFAULT;
+
+ctx->raddr = tr.physical_address;
+return 0;
+}
+
 void kvm_arch_cpu_reset(CPUState *env)
 {
 }
diff --git a/qemu/target-ppc/cpu.h b/qemu/target-ppc/cpu.h
--- a/qemu/target-ppc/cpu.h
+++ b/qemu/target-ppc/cpu.h
@@ -98,6 +98,8 @@ enum powerpc_mmu_t {
 POWERPC_MMU_BOOKE_FSL  = 0x0009,
 /* PowerPC 601 MMU model (specific BATs format)*/
 POWERPC_MMU_601= 0x000A,
+/* KVM managing the MMU state  */
+POWERPC_MMU_KVM= 0x000B,
 #if defined(TARGET_PPC64)
 #define POWERPC_MMU_64   0x0001
 /* 64 bits PowerPC MMU */
diff --git a/qemu/target-ppc/helper.c b/qemu/target-ppc/helper.c
--- a/qemu/target-ppc/helper.c
+++ b/qemu/target-ppc/helper.c
@@ -1429,6 +1429,10 @@ int get_physical_address (CPUState *env,
 fprintf(logfile, "%s\n", __func__);
 }
 #endif
+
+if (env->mmu_model == POWERPC_MMU_KVM)
+return mmukvm_get_physical_address(env, ctx, eaddr, rw, access_type);
+
 if ((access_type == ACCESS_CODE && msr_ir == 0) ||
 (access_type != ACCESS_CODE && msr_dr == 0)) {
 /* No address translation */
diff --git a/qemu/target-ppc/translate_init.c b/qemu/target-ppc/translate_init.c
--- a/qemu/target-ppc/translate_init.c
+++ b/qemu/target-ppc/translate_init.c
@@ -9273,6 +9273,11 @@ int cpu_ppc_register_internal (CPUPPCSta
 case POWERPC_MMU_601:
 mmu_model = "

[PATCH 3 of 3] [PATCH] kvm-userspace: fix gdbstub kvm integration

2008-12-11 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228989958 -3600
# Node ID f80fb35de91fe69dae889c70948c9a53212ee444
# Parent  6f228c807ad0b239b7342d2974debfc66418d784
[PATCH] kvm-userspace: fix gdbstub kvm integration

From: Christian Ehrhardt <[EMAIL PROTECTED]>

Some recent qemu upstream merges brought in a new concept to not use "env" as
current cpu in gdb_handle_packet anymore. But the kvm calls still do, this
leads to SIGDEV's as env is not initialized when calling the functions like
kvm_save_registers.

Insted there is now a gdbstate structure holding current cpu for
step/continue and "other" ops splitted.

This patch changes the kvm_save_registers calls to use the right CPUState
variable for the kvm calls in gdb_handle_packet.

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 gdbstub.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

[diff]

diff --git a/qemu/gdbstub.c b/qemu/gdbstub.c
--- a/qemu/gdbstub.c
+++ b/qemu/gdbstub.c
@@ -1348,7 +1348,7 @@ static int gdb_handle_packet(GDBState *s
 }
 break;
 case 'g':
-kvm_save_registers(env);
+kvm_save_registers(s->g_cpu);
 len = 0;
 for (addr = 0; addr < num_g_regs; addr++) {
 reg_size = gdb_read_register(s->g_cpu, mem_buf + len, addr);
@@ -1366,7 +1366,7 @@ static int gdb_handle_packet(GDBState *s
 len -= reg_size;
 registers += reg_size;
 }
-kvm_load_registers(env);
+kvm_load_registers(s->g_cpu);
 put_packet(s, "OK");
 break;
 case 'm':
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1 of 3] [PATCH] kvm-userspace: ppc: Add kvm_translate wrapper

2008-12-11 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228924564 -3600
# Node ID 38846cef16e56c681da1ddc179e248972c8b2ff9
# Parent  705d874ff7a24484eaa15ed75a748c4e1a70c2ef
[PATCH] kvm-userspace: ppc: Add kvm_translate wrapper

From: Hollis Blanchard <[EMAIL PROTECTED]>

Add kvm_translate() wrapper used to get mmu translations from userspace.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 libkvm.c |5 +
 libkvm.h |2 ++
 2 files changed, 7 insertions(+)

[diff]

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -987,6 +987,11 @@ int kvm_guest_debug(kvm_context_t kvm, i
return ioctl(kvm->vcpu_fd[vcpu], KVM_DEBUG_GUEST, dbg);
 }
 
+int kvm_translate(kvm_context_t kvm, int vcpu, struct kvm_translation *tr)
+{
+   return ioctl(kvm->vcpu_fd[vcpu], KVM_TRANSLATE, tr);
+}
+
 int kvm_set_signal_mask(kvm_context_t kvm, int vcpu, const sigset_t *sigset)
 {
struct kvm_signal_mask *sigmask;
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -639,6 +639,8 @@ int kvm_set_pit(kvm_context_t kvm, struc
 int kvm_set_pit(kvm_context_t kvm, struct kvm_pit_state *s);
 #endif
 
+int kvm_translate(kvm_context_t kvm, int vcpu, struct kvm_translation *tr);
+
 #endif
 
 #ifdef KVM_CAP_VAPIC
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2 of 3] [PATCH] qemu: ppc: kvm-userspace: KVM PowerPC support for qemu gdbstub

2008-12-11 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228989956 -3600
# Node ID 6f228c807ad0b239b7342d2974debfc66418d784
# Parent  38846cef16e56c681da1ddc179e248972c8b2ff9
[PATCH] qemu: ppc: kvm-userspace: KVM PowerPC support for qemu gdbstub

From: Hollis Blanchard <[EMAIL PROTECTED]>

Add basic KVM PowerPC support to qemu's gdbstub introducing a kvm ppc style
mmu implementation that uses the kvm_translate ioctl.
This also requires to save the kvm registers prior to the 'm' gdb operations.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 gdbstub.c   |2 ++
 hw/ppc440_bamboo.c  |1 +
 qemu-kvm-powerpc.c  |   28 
 target-ppc/cpu.h|2 ++
 target-ppc/helper.c |4 
 target-ppc/translate_init.c |5 +
 6 files changed, 42 insertions(+)

[diff]

diff --git a/qemu/gdbstub.c b/qemu/gdbstub.c
--- a/qemu/gdbstub.c
+++ b/qemu/gdbstub.c
@@ -1374,6 +1374,7 @@ static int gdb_handle_packet(GDBState *s
 if (*p == ',')
 p++;
 len = strtoull(p, NULL, 16);
+kvm_save_registers(s->g_cpu);
 if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 0) != 0) {
 put_packet (s, "E14");
 } else {
@@ -1389,6 +1390,7 @@ static int gdb_handle_packet(GDBState *s
 if (*p == ':')
 p++;
 hextomem(mem_buf, p, len);
+kvm_save_registers(s->gcpu);
 if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 1) != 0)
 put_packet(s, "E14");
 else
diff --git a/qemu/hw/ppc440_bamboo.c b/qemu/hw/ppc440_bamboo.c
--- a/qemu/hw/ppc440_bamboo.c
+++ b/qemu/hw/ppc440_bamboo.c
@@ -99,6 +99,7 @@ void bamboo_init(ram_addr_t ram_size, in
fprintf(stderr, "Unable to initialize CPU!\n");
exit(1);
}
+   env->mmu_model = POWERPC_MMU_KVM;
 
/* call init */
printf("Calling function ppc440_init\n");
diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c
--- a/qemu/qemu-kvm-powerpc.c
+++ b/qemu/qemu-kvm-powerpc.c
@@ -102,6 +102,7 @@ void kvm_arch_save_regs(CPUState *env)
 
 env->spr[SPR_SRR0] = regs.srr0;
 env->spr[SPR_SRR1] = regs.srr1;
+env->spr[SPR_BOOKE_PID] = regs.pid;
 
 env->spr[SPR_SPRG0] = regs.sprg0;
 env->spr[SPR_SPRG1] = regs.sprg1;
@@ -219,6 +220,33 @@ int handle_powerpc_dcr_write(int vcpu, u
 return 0; /* XXX ignore failed DCR ops */
 }
 
+int mmukvm_get_physical_address(CPUState *env, mmu_ctx_t *ctx,
+target_ulong eaddr, int rw, int access_type)
+{
+struct kvm_translation tr;
+uint64_t pid;
+uint64_t as;
+int r;
+
+pid = env->spr[SPR_BOOKE_PID];
+
+if (access_type == ACCESS_CODE)
+as = env->msr & msr_ir;
+else
+as = env->msr & msr_dr;
+
+tr.linear_address = as << 40 | pid << 32 | eaddr;
+r = kvm_translate(kvm_context, env->cpu_index, &tr);
+if (r == -1)
+return r;
+
+if (!tr.valid)
+return -EFAULT;
+
+ctx->raddr = tr.physical_address;
+return 0;
+}
+
 void kvm_arch_cpu_reset(CPUState *env)
 {
 }
diff --git a/qemu/target-ppc/cpu.h b/qemu/target-ppc/cpu.h
--- a/qemu/target-ppc/cpu.h
+++ b/qemu/target-ppc/cpu.h
@@ -98,6 +98,8 @@ enum powerpc_mmu_t {
 POWERPC_MMU_BOOKE_FSL  = 0x0009,
 /* PowerPC 601 MMU model (specific BATs format)*/
 POWERPC_MMU_601= 0x000A,
+/* KVM managing the MMU state  */
+POWERPC_MMU_KVM= 0x000B,
 #if defined(TARGET_PPC64)
 #define POWERPC_MMU_64   0x0001
 /* 64 bits PowerPC MMU */
diff --git a/qemu/target-ppc/helper.c b/qemu/target-ppc/helper.c
--- a/qemu/target-ppc/helper.c
+++ b/qemu/target-ppc/helper.c
@@ -1429,6 +1429,10 @@ int get_physical_address (CPUState *env,
 fprintf(logfile, "%s\n", __func__);
 }
 #endif
+
+if (env->mmu_model == POWERPC_MMU_KVM)
+return mmukvm_get_physical_address(env, ctx, eaddr, rw, access_type);
+
 if ((access_type == ACCESS_CODE && msr_ir == 0) ||
 (access_type != ACCESS_CODE && msr_dr == 0)) {
 /* No address translation */
diff --git a/qemu/target-ppc/translate_init.c b/qemu/target-ppc/translate_init.c
--- a/qemu/target-ppc/translate_init.c
+++ b/qemu/target-ppc/translate_init.c
@@ -9273,6 +9273,11 @@ int cpu_ppc_register_internal (CPUPPCSta
 case POWERPC_MMU_601:
 mmu_model = "PowerPC 601";
 break;
+#ifdef KVM
+case POWERPC_MMU_KVM:
+mmu_model = "PowerPC KVM";
+break;
+#endif
 #if defined (TARGET_PPC64)
 case POWERPC

[PATCH 0 of 3] update gdbstub support

2008-12-11 Thread Christian Ehrhardt
This patch series updates the gdbstub support for kvm.
Patch 1&2 introduce basic powerpc support while patch 3 fixes gdbstub generic
code that was broken in a qemu merge.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6 of 6] [PATCH] kvm-userspace: ppc: align with upstream qemu - pcibus

2008-12-10 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228922789 -3600
# Node ID 9a7208ca1afab83913ee14c629bf27632ee6326b
# Parent  5adc6fbbd4a3b82e1bc8acbcb233c60e89715b61
[PATCH] kvm-userspace: ppc: align with upstream qemu - pcibus

From: Christian Ehrhardt <[EMAIL PROTECTED]>

ppc initializer now properly use the opaque PCIBus type. This is already
changed in all upstream qemu files. This patch changes kvm ppc440 files
to use the new type too.

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 ppc440.c|   11 ++-
 ppc440.h|3 ++-
 ppc440_bamboo.c |   10 +-
 3 files changed, 13 insertions(+), 11 deletions(-)

[diff]

diff --git a/qemu/hw/ppc440.c b/qemu/hw/ppc440.c
--- a/qemu/hw/ppc440.c
+++ b/qemu/hw/ppc440.c
@@ -13,6 +13,7 @@
 #include "hw.h"
 #include "hw/isa.h"
 #include "ppc440.h"
+#include "pci.h"
 
 #define PPC440EP_PCI_CONFIG 0xeec0
 #define PPC440EP_PCI_INTACK 0xeed0
@@ -29,12 +30,12 @@ void ppc440ep_init(CPUState *env,
target_phys_addr_t ram_sizes[PPC440_MAX_RAM_SLOTS],
int nbanks,
qemu_irq **picp,
-   ppc4xx_pci_t **pcip,
+   PCIBus **pcibusp,
int do_init)
 {
ppc4xx_mmio_t *mmio;
qemu_irq *pic, *irqs;
-   ppc4xx_pci_t *pci;
+   PCIBus *pcibus;
int i;
 
ppc_dcr_init(env, NULL, NULL);
@@ -59,14 +60,14 @@ void ppc440ep_init(CPUState *env,
ppc405_sdram_init(env, pic[14], nbanks, ram_bases, ram_sizes, do_init);
 
/* PCI */
-   pci = ppc4xx_pci_init(env, pic,
+   pcibus = ppc4xx_pci_init(env, pic,
  PPC440EP_PCI_CONFIG,
  PPC440EP_PCI_INTACK,
  PPC440EP_PCI_SPECIAL,
  PPC440EP_PCI_REGS);
-   if (!pci)
+   if (!pcibus)
printf("couldn't create PCI controller!\n");
-   *pcip = pci;
+   *pcibusp = pcibus;
 
isa_mmio_init(PPC440EP_PCI_IO, PPC440EP_PCI_IOLEN);
 
diff --git a/qemu/hw/ppc440.h b/qemu/hw/ppc440.h
--- a/qemu/hw/ppc440.h
+++ b/qemu/hw/ppc440.h
@@ -20,6 +20,7 @@
 #include "sysemu.h"
 #include "exec-all.h"
 #include "boards.h"
+#include "pci.h"
 
 #define PPC440_MAX_RAM_SLOTS 4
 
@@ -28,7 +29,7 @@ void ppc440ep_init(CPUState *env,
target_phys_addr_t ram_sizes[PPC440_MAX_RAM_SLOTS],
int nbanks,
qemu_irq **picp,
-   ppc4xx_pci_t **pcip,
+   PCIBus **pcip,
int do_init);
 
 #endif
diff --git a/qemu/hw/ppc440_bamboo.c b/qemu/hw/ppc440_bamboo.c
--- a/qemu/hw/ppc440_bamboo.c
+++ b/qemu/hw/ppc440_bamboo.c
@@ -37,7 +37,7 @@ void bamboo_init(ram_addr_t ram_size, in
target_phys_addr_t ram_sizes[PPC440_MAX_RAM_SLOTS];
NICInfo *nd;
qemu_irq *pic;
-   ppc4xx_pci_t *pci;
+   PCIBus *pcibus;
CPUState *env;
 uint64_t elf_entry;
 uint64_t elf_lowaddr;
@@ -102,7 +102,7 @@ void bamboo_init(ram_addr_t ram_size, in
 
/* call init */
printf("Calling function ppc440_init\n");
-   ppc440ep_init(env, ram_bases, ram_sizes, nbanks, &pic, &pci, 1);
+   ppc440ep_init(env, ram_bases, ram_sizes, nbanks, &pic, &pcibus, 1);
printf("Done calling ppc440_init\n");
 
/* load kernel with uboot loader */
@@ -197,12 +197,12 @@ void bamboo_init(ram_addr_t ram_size, in
env->nip = entry;
}
 
-   if (pci) {
+   if (pcibus) {
int unit_id = 0;
 
/* Add virtio block devices. */
while ((i = drive_get_index(IF_VIRTIO, 0, unit_id)) != -1) {
-   virtio_blk_init(pci->bus, 0x1AF4, 0x1001,
+   virtio_blk_init(pcibus, 0x1AF4, 0x1001,
drives_table[i].bdrv);
unit_id++;
}
@@ -212,7 +212,7 @@ void bamboo_init(ram_addr_t ram_size, in
nd = &nd_table[i];
if (!nd->model)
nd->model = "virtio";
-   pci_nic_init(pci->bus, nd, -1);
+   pci_nic_init(pcibus, nd, -1);
}
}
 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2 of 6] [PATCH] kvm-userspace: ppc: fix configure enabling kvm for ppc

2008-12-10 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228922788 -3600
# Node ID f100b1bfa5f3d084d68bd2c66244271db1f5d084
# Parent  b41f0d6129f51fb86bf799a5fe7b14a9270eeca4
[PATCH] kvm-userspace: ppc: fix configure enabling kvm for ppc

From: Christian Ehrhardt <[EMAIL PROTECTED]>

The configure script is missing the target/host cpu sync for powerpc kvm.

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 configure |1 +
 1 file changed, 1 insertion(+)

[diff]

diff --git a/qemu/configure b/qemu/configure
--- a/qemu/configure
+++ b/qemu/configure
@@ -1659,6 +1659,7 @@ if [ use_upstream_kvm = yes ]; then
 
 # Make sure the target and host cpus are compatible
 if test "$kvm" = "yes" -a ! \( "$target_cpu" = "$cpu" -o \
+  \( "$target_cpu" = "ppcemb" -a "$cpu" = "powerpc" \) -o \
   \( "$target_cpu" = "x86_64" -a "$cpu" = "i386"   \) -o \
   \( "$target_cpu" = "i386"   -a "$cpu" = "x86_64" \) \) ; then
   kvm="no"
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3 of 6] [PATCH] kvm-userspace: ppc: align with upstream qemu - breakpoint reset

2008-12-10 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228922788 -3600
# Node ID c032d8555c9494f9812e4d4e0b5b511ae597
# Parent  f100b1bfa5f3d084d68bd2c66244271db1f5d084
[PATCH] kvm-userspace: ppc: align with upstream qemu - breakpoint reset

From: Christian Ehrhardt <[EMAIL PROTECTED]>

The way resettign that changed upstream, adopt new style in kvmppc code.

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 qemu-kvm-powerpc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

[diff]

diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c
--- a/qemu/qemu-kvm-powerpc.c
+++ b/qemu/qemu-kvm-powerpc.c
@@ -31,7 +31,7 @@ extern kvm_context_t kvm_context;
 
 void cpu_reset(CPUState *env)
 {
-   memset(env->breakpoints, 0, sizeof(env->breakpoints));
+   memset(env, 0, offsetof(CPUPPCState, breakpoints));
cpu_ppc_reset(env);
 }
 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4 of 6] [PATCH] kvm-userpace: ppc: align with upstream qemu - 4xxdevs

2008-12-10 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228922788 -3600
# Node ID 214485869c303ab81c9da30b6784d641f58585f4
# Parent  c032d8555c9494f9812e4d4e0b5b511ae597
[PATCH] kvm-userpace: ppc: align with upstream qemu - 4xxdevs

From: Christian Ehrhardt <[EMAIL PROTECTED]>

This was mostly moved to qemu/hw/ppc4xx_pci.c and is no more needed
in hw/ppc4xx_devs.c

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 ppc4xx_devs.c |  371 --
 1 file changed, 371 deletions(-)

[diff]

diff --git a/qemu/hw/ppc4xx_devs.c b/qemu/hw/ppc4xx_devs.c
--- a/qemu/hw/ppc4xx_devs.c
+++ b/qemu/hw/ppc4xx_devs.c
@@ -2,9 +2,6 @@
  * QEMU PowerPC 4xx embedded processors shared devices emulation
  *
  * Copyright (c) 2007 Jocelyn Mayer
- *
- * Copyright 2008 IBM Corp.
- * Authors: Hollis Blanchard <[EMAIL PROTECTED]>
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to 
deal
@@ -29,8 +26,6 @@
 #include "ppc4xx.h"
 #include "sysemu.h"
 #include "qemu-log.h"
-#include "pci.h"
-#include "bswap.h"
 
 //#define DEBUG_MMIO
 //#define DEBUG_UNASSIGNED
@@ -537,369 +532,3 @@ qemu_irq *ppcuic_init (CPUState *env, qe
 
 return qemu_allocate_irqs(&ppcuic_set_irq, uic, UIC_MAX_IRQ);
 }
-
-
-
-
-#define PCIC0_CFGADDR   0x0
-#define PCIC0_CFGDATA   0x4
-
-#define PCIL0_PMM0LA0x0
-#define PCIL0_PMM0MA0x4
-#define PCIL0_PMM0PCILA 0x8
-#define PCIL0_PMM0PCIHA 0xc
-#define PCIL0_PMM1LA0x10
-#define PCIL0_PMM1MA0x14
-#define PCIL0_PMM1PCILA 0x18
-#define PCIL0_PMM1PCIHA 0x1c
-#define PCIL0_PMM2LA0x20
-#define PCIL0_PMM2MA0x24
-#define PCIL0_PMM2PCILA 0x28
-#define PCIL0_PMM2PCIHA 0x2c
-#define PCIL0_PTM1MS0x30
-#define PCIL0_PTM1LA0x34
-#define PCIL0_PTM2MS0x38
-#define PCIL0_PTM2LA0x3c
-#define PCI_REG_SIZE0x40
-
-#define PPC44x_PCI_MA_MASK   0xf000
-#define PPC44x_PCI_MA_ENABLE 0x1
-
-
-static uint32_t pci4xx_cfgaddr_read4(void *opaque, target_phys_addr_t addr)
-{
-ppc4xx_pci_t *ppc4xx_pci = opaque;
-return cpu_to_le32(ppc4xx_pci->pcic0_cfgaddr);
-}
-
-static CPUReadMemoryFunc *pci4xx_cfgaddr_read[] = {
-&pci4xx_cfgaddr_read4,
-&pci4xx_cfgaddr_read4,
-&pci4xx_cfgaddr_read4,
-};
-
-static void pci4xx_cfgaddr_write4(void *opaque, target_phys_addr_t addr,
-  uint32_t value)
-{
-ppc4xx_pci_t *ppc4xx_pci = opaque;
-
-value = le32_to_cpu(value);
-
-ppc4xx_pci->pcic0_cfgaddr = value & ~0x3;
-}
-
-static CPUWriteMemoryFunc *pci4xx_cfgaddr_write[] = {
-&pci4xx_cfgaddr_write4,
-&pci4xx_cfgaddr_write4,
-&pci4xx_cfgaddr_write4,
-};
-
-static uint32_t pci4xx_cfgdata_read1(void *opaque, target_phys_addr_t addr)
-{
-ppc4xx_pci_t *ppc4xx_pci = opaque;
-int offset = addr & 0x3;
-uint32_t cfgaddr = ppc4xx_pci->pcic0_cfgaddr;
-uint32_t value;
-
-if (!(cfgaddr & (1<<31)))
-return 0x;
-
-value = pci_data_read(ppc4xx_pci->bus, cfgaddr | offset, 1);
-
-return value;
-}
-
-static uint32_t pci4xx_cfgdata_read2(void *opaque, target_phys_addr_t addr)
-{
-ppc4xx_pci_t *ppc4xx_pci = opaque;
-int offset = addr & 0x3;
-uint32_t cfgaddr = ppc4xx_pci->pcic0_cfgaddr;
-uint32_t value;
-
-if (!(cfgaddr & (1<<31)))
-return 0x;
-
-value = pci_data_read(ppc4xx_pci->bus, cfgaddr | offset, 2);
-
-return cpu_to_le16(value);
-}
-
-static uint32_t pci4xx_cfgdata_read4(void *opaque, target_phys_addr_t addr)
-{
-ppc4xx_pci_t *ppc4xx_pci = opaque;
-int offset = addr & 0x3;
-uint32_t cfgaddr = ppc4xx_pci->pcic0_cfgaddr;
-uint32_t value;
-
-if (!(cfgaddr & (1<<31)))
-return 0x;
-
-value = pci_data_read(ppc4xx_pci->bus, cfgaddr | offset, 4);
-
-return cpu_to_le32(value);
-}
-
-static CPUReadMemoryFunc *pci4xx_cfgdata_read[] = {
-&pci4xx_cfgdata_read1,
-&pci4xx_cfgdata_read2,
-&pci4xx_cfgdata_read4,
-};
-
-static void pci4xx_cfgdata_write1(void *opaque, target_phys_addr_t addr,
-  uint32_t value)
-{
-ppc4xx_pci_t *ppc4xx_pci = opaque;
-int offset = addr & 0x3;
-
-pci_data_write(ppc4xx_pci->bus, ppc4xx_pci->pcic0_cfgaddr | offset,
-   value, 1);
-}
-
-static void pci4xx_cfgdata_write2(void *opaque, target_phys_addr_t addr,
-  uint32_t value)
-{
-ppc4xx_pci_t *ppc4xx_pci = opaque;
-int offset = addr & 0x3;
-
-value = le16_to_cpu(value);
-
-pci_data_write(ppc4xx_pci->bus, ppc4xx_pci->pcic0_cfgaddr | offset,
- 

[PATCH 5 of 6] [PATCH] kvm-userspace: ppc: use virtio-blk header

2008-12-10 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228922789 -3600
# Node ID 5adc6fbbd4a3b82e1bc8acbcb233c60e89715b61
# Parent  214485869c303ab81c9da30b6784d641f58585f4
[PATCH] kvm-userspace: ppc: use virtio-blk header

From: Christian Ehrhardt <[EMAIL PROTECTED]>

virtio_blk_init now is in a separate header

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 ppc440_bamboo.c |1 +
 1 file changed, 1 insertion(+)

[diff]

diff --git a/qemu/hw/ppc440_bamboo.c b/qemu/hw/ppc440_bamboo.c
--- a/qemu/hw/ppc440_bamboo.c
+++ b/qemu/hw/ppc440_bamboo.c
@@ -19,6 +19,7 @@
 #include "ppc440.h"
 #include "qemu-kvm.h"
 #include "device_tree.h"
+#include "virtio-blk.h"
 
 #define BINARY_DEVICE_TREE_FILE "bamboo.dtb"
 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1 of 6] [PATCH] kvm-userspace: ppc: fix compatfd build decision

2008-12-10 Thread Christian Ehrhardt
# HG changeset patch
# User Christian Ehrhardt <[EMAIL PROTECTED]>
# Date 1228922788 -3600
# Node ID b41f0d6129f51fb86bf799a5fe7b14a9270eeca4
# Parent  3af3fa5e009e143e1167979e55d547c453661059
[PATCH] kvm-userspace: ppc: fix compatfd build decision

From: Christian Ehrhardt <[EMAIL PROTECTED]>

qemu-kvm.c uses qemu_eventfd/qemu_signalfd. The code of compatfd takes care
if CONFIG_eventfd/CONFIG_signalfd is really enabled. But currently compatfd is
not build if --disable-aio is set. This patch lets compatfd.c build if USE_KVM
is set to allow qemu-kvm to be linked in all cases (with/without --disable-aio)

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
 Makefile|2 +-
 Makefile.target |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

[diff]

diff --git a/qemu/Makefile b/qemu/Makefile
--- a/qemu/Makefile
+++ b/qemu/Makefile
@@ -59,7 +59,7 @@ BLOCK_OBJS += block-raw-posix.o
 BLOCK_OBJS += block-raw-posix.o
 endif
 
-ifdef CONFIG_AIO
+ifeq ($(USE_KVM), 1)
 BLOCK_OBJS += compatfd.o
 endif
 
diff --git a/qemu/Makefile.target b/qemu/Makefile.target
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -655,7 +655,7 @@ OBJS+=block-raw-posix.o
 OBJS+=block-raw-posix.o
 endif
 
-ifdef CONFIG_AIO
+ifeq ($(USE_KVM), 1)
 OBJS+=compatfd.o
 endif
 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: work around inability of older kvm modules to destroy memory regions

2008-12-10 Thread Christian Ehrhardt
+drop_mapping(start_addr);
+p = &mappings[nr_mappings++];
+p->phys = start_addr;
+p->ram = phys_offset;
+p->len = size;
+
 return;
 }

@@ -984,8 +1067,11 @@ void kvm_qemu_log_memory(target_phys_addr_t start, 
target_phys_addr_t size,
 {
 if (log)
kvm_dirty_pages_log_enable_slot(kvm_context, start, size);
-else
+else {
+if (must_use_aliases_target(start))
+return;
kvm_dirty_pages_log_disable_slot(kvm_context, start, size);
+}
 }

 int kvm_get_phys_ram_page_bitmap(unsigned char *bitmap)
@@ -1071,6 +1157,9 @@ void kvm_physical_sync_dirty_bitmap(target_phys_addr_t 
start_addr, target_phys_a
 {
 void *buf;

+if (must_use_aliases_source(start_addr))
+return;
+
 buf = qemu_malloc((end_addr - start_addr) / 8 + 2);
 kvm_get_dirty_pages_range(kvm_context, start_addr, end_addr - start_addr,
  buf, NULL, kvm_get_dirty_bitmap_cb);
@@ -1080,6 +1169,8 @@ void kvm_physical_sync_dirty_bitmap(target_phys_addr_t 
start_addr, target_phys_a
 int kvm_log_start(target_phys_addr_t phys_addr, target_phys_addr_t len)
 {
 #ifndef TARGET_IA64
+if (must_use_aliases_source(phys_addr))
+return 0;
 kvm_qemu_log_memory(phys_addr, len, 1);
 #endif
 return 0;
@@ -1088,6 +1179,8 @@ int kvm_log_start(target_phys_addr_t phys_addr, 
target_phys_addr_t len)
 int kvm_log_stop(target_phys_addr_t phys_addr, target_phys_addr_t len)
 {
 #ifndef TARGET_IA64
+if (must_use_aliases_source(phys_addr))
+return 0;
 kvm_qemu_log_memory(phys_addr, len, 0);
 #endif
 return 0;
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  



--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -766,7 +766,9 @@ int kvm_qemu_init()
 return 0;
 }
 
+#ifdef TARGET_I386
 static int destroy_region_works = 0;
+#endif
 
 int kvm_qemu_create_context(void)
 {
@@ -784,7 +786,9 @@ int kvm_qemu_create_context(void)
 r = kvm_arch_qemu_create_context();
 if(r <0)
 	kvm_qemu_destroy();
+#ifdef TARGET_I386
 destroy_region_works = kvm_destroy_memory_region_works(kvm_context);
+#endif
 return 0;
 }
 
@@ -793,6 +797,7 @@ void kvm_qemu_destroy(void)
 kvm_finalize(kvm_context);
 }
 
+#ifdef TARGET_I386
 static int must_use_aliases_source(target_phys_addr_t addr)
 {
 if (destroy_region_works)
@@ -849,6 +854,7 @@ static void drop_mapping(target_phys_add
 if (p)
 *p = mappings[--nr_mappings];
 }
+#endif
 
 void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
   unsigned long size,
@@ -856,17 +862,21 @@ void kvm_cpu_register_physical_memory(ta
 {
 int r = 0;
 unsigned long area_flags = phys_offset & ~TARGET_PAGE_MASK;
+#ifdef TARGET_I386
 struct mapping *p;
+#endif
 
 phys_offset &= ~IO_MEM_ROM;
 
 if (area_flags == IO_MEM_UNASSIGNED) {
+#ifdef TARGET_I386
 if (must_use_aliases_source(start_addr)) {
 kvm_destroy_memory_alias(kvm_context, start_addr);
 return;
 }
 if (must_use_aliases_target(start_addr))
 return;
+#endif
 kvm_unregister_memory_area(kvm_context, start_addr, size);
 return;
 }
@@ -878,6 +888,7 @@ void kvm_cpu_register_physical_memory(ta
 if (area_flags >= TLB_MMIO)
 return;
 
+#ifdef TARGET_I386
 if (must_use_aliases_source(start_addr)) {
 p = find_ram_mapping(phys_offset);
 if (p) {
@@ -886,6 +897,7 @@ void kvm_cpu_register_physical_memory(ta
 }
 return;
 }
+#endif
 
 r = kvm_register_phys_mem(kvm_context, start_addr,
   phys_ram_base + phys_offset,
@@ -895,11 +907,13 @@ void kvm_cpu_register_physical_memory(ta
 exit(1);
 }
 
+#ifdef TARGET_I386
 drop_mapping(start_addr);
 p = &mappings[nr_mappings++];
 p->phys = start_addr;
 p->ram = phys_offset;
 p->len = size;
+#endif
 
 return;
 }
@@ -1068,8 +1082,10 @@ void kvm_qemu_log_memory(target_phys_add
 if (log)
 	kvm_dirty_pages_log_enable_slot(kvm_context, start, size);
 else {
+#ifdef TARGET_I386
 if (must_use_aliases_target(start))
 return;
+#endif
 	kvm_dirty_pages_log_disable_slot(kvm_context, start, size);
 }
 }
@@ -1157,8 +1173,10 @@ void kvm_physical_sync_dirty_bitmap(targ
 {
 void *buf;
 
+#ifdef TARGET_I386
 if (must_use_aliases_source(start_addr))
 return;
+#endif
 
 buf = qemu_malloc((end_addr - start_addr) / 8 + 2);
 kvm_get_dirty_pages_range(kvm_context, start_addr, end_addr - start_addr,
@@ -1168,21 +1186,21 @@ void kvm_physical_sync_dirty_bitmap(targ
 
 int kvm_log

kvm-userspace requires kvm capable kernel headers in default search path of the compiler

2008-12-10 Thread Christian Ehrhardt

Hi everyone,
while running a test when updating kvm-userspace for powerpc I found 
that the current kvm userspace requires kvm kernel headers in the 
default search path of the used compilers.
I used to update and build in the same kvm-userspace directory for a 
while and this one had an old stale kernel/include directory which 
fulfilled all the requirements.
While testing my current patch series on a new git clone of 
kvm-userspace I wondered why this doesn't work anymore (it worked in my 
old directory which also had the updated current source).


I switched to x86 to verify that issue there and I found that eventually 
the issue is in the kvm detection "KVM Probe" part of the qemu 
configuration in kvm-userspace. It failed not finding kvm headers.
  gcc -m32 -o /tmp/qemu-conf--21885-  
-I/home/paelzer/Desktop/kvm-userspace/kernel/include 
/tmp/qemu-conf--21885-.c
  /tmp/qemu-conf--21885-.c:1:23: error: linux/kvm.h: No such file or 
directory

  /tmp/qemu-conf--21885-.c:3:2: error: #error Invalid KVM version

Looking at the include paths it is worth to note that a current git 
clone of kvm-userspace has no kernel/include directory anymore. A few 
questions later to some other kvm developers I found that there headers 
can be found, but in the default search path of the compiler e.g. 
/usr/include. In my environment with an older gcc cross compiler for 
powerpc and no up to date linux headers installed for x86 both 
architectures failed.


The Solution to that can be done in several ways:
a) we decide that kvm-userspace needs up-to-date kernel headers 
installed. And modify the KVM Probe at least to tell the user about this 
possible reason when failing instead of silently switching to "KVM 
support no".
b) if the user already provide a --kerneldir option to specify where the 
right includes can be found we should give those configure checks a 
chance to really reach that. Atm it adds the kernel/include path of the 
kvm-userspace tree which doesn't exist anymore (except as stale old dir 
in a lot of source trees out there).
c) I overlooked something and there is an even better or trivial 
approach :-)


Since this could be solved several very different ways I think its worth 
a discussion.
As I prefer b) I attached a simple patch example how this could look 
like :-).


--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

diff --git a/configure b/configure
index 63f956c..f772bae 100755
--- a/configure
+++ b/configure
@@ -3,6 +3,7 @@
 prefix=/usr/local
 kernelsourcedir=
 kerneldir=/lib/modules/$(uname -r)/build
+qemu_kerneldir=
 cc=gcc
 ld=ld
 objcopy=objcopy
@@ -56,6 +57,7 @@ while [[ "$1" = -* ]]; do
 	;;
 	--kerneldir)
 	kerneldir="$arg"
+	qemu_kerneldir="$arg"
 	;;
 	--with-patched-kernel)
 	want_module=
@@ -84,9 +86,12 @@ while [[ "$1" = -* ]]; do
 esac
 done
 
-
-#set kenel directory
+#set libkvm kernel directory
 libkvm_kerneldir=$(readlink -f kernel)
+# use libkvm_kerneldir for qemu if no kerneldir option was set
+if test "$qemu_kerneldir" = "" ; then
+qemu_kerneldir=$libkvm_kerneldir
+fi
 
 case $arch in
 i?86*|x86_64*)
@@ -123,7 +128,7 @@ fi
 --disable-gcc-check \
 --extra-cflags="-I $PWD/../libkvm $qemu_cflags" \
 --extra-ldflags="-L $PWD/../libkvm $qemu_ldflags" \
---kerneldir="$libkvm_kerneldir" \
+--kerneldir="$qemu_kerneldir" \
 --prefix="$prefix" \
 ${cross_prefix:+"--cross-prefix=$cross_prefix"} \
 ${cross_prefix:+"--cpu=$arch"} "[EMAIL PROTECTED]"


Re: [PATCH] [mq]: fix-kvm-init.diff

2008-11-10 Thread Christian Ehrhardt

sorry - this should actually have some description ...
I'll resend it with one as soon as I find the issue why it is missing

Ehrhardt Christian wrote:

diff --git a/qemu/target-ppc/helper.c b/qemu/target-ppc/helper.c
--- a/qemu/target-ppc/helper.c
+++ b/qemu/target-ppc/helper.c
@@ -2959,10 +2959,8 @@
 env->cpu_model_str = cpu_model;
 cpu_ppc_register_internal(env, def);
 cpu_ppc_reset(env);
-#ifdef USE_KVM
 if (kvm_enabled())
-   kvm_init_new_ap(env->cpu_index, env);
-#endif
+   kvm_init_vcpu(env);
 return env;
 }

  



--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0 of 7] kvm-userspace: support multiple processors in the same architecture

2008-10-30 Thread Christian Ehrhardt

Hollis Blanchard wrote:

These patches allow the kvmctl bits (including testcases and libcflat) to be
built for multiple processor types within the same architecture (e.g. 440 and
e500). This is important because PowerPC supervisor mode can contain
significant differences between processors (it's user mode that's more or less
identical).

For example, the data in a TLB entry and how to manipulate the TLB
are a major difference between 440 and e500, which is critical here because
libcflat must create its own mappings and so must know which method to use.

Some of the complexity comes from user/Makefile *not* using the top-level
config.mak, so we have to add some of the same logic to both configure scripts
to generate both config.mak files.

Too much makefile logic depends on ARCH containing only the architecture
name, so it was simpler to create and export a separate PROCESSOR variable.

-Hollis
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
Unfortunately there is too often no "really nice" way to do Makefile 
magic :-)

I know you started with the arch-platform-os-compiler after our discussion,
but I like the $PROCESSOR solution for our *powerpc* Makefiles too.
And a good catch with that AR usage in patch 7.

(full series)
Acked-by: Christian Ehrhardt <[EMAIL PROTECTED]>

--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03 of 10] [PATCH] user: ppc: better error reporting in load_file

2008-10-29 Thread Christian Ehrhardt

Avi Kivity wrote:

Ehrhardt Christian wrote:

From: Hollis Blanchard <[EMAIL PROTECTED]>

Fancy description.
  


Ahem.


Sorry that is my patch template description :-/
A proper description header should be:

Subject: [PATCH] user: ppc: better error reporting in load_file

From: Hollis Blanchard <[EMAIL PROTECTED]>

This patch adds a better error reporting for powerpc testcases.
It prints the bytes read in load_file so far until an error occured.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
main-ppc.c |3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

[diff]

--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] kvm-userspace: ppc: userspace fixes for powerpc

2008-10-28 Thread Christian Ehrhardt
Ok I should have send these two series with some minutes in between to 
not intermix them :-/

Additionally I have the wrong header in this one it should be [0/5] :-/++

Overall it is a patch series of three patches for powerpc kvm-userspace, 
a five patch series for kvm-suerspace/user/ and a single patch I just 
submitted while cleaning our userspace repo to get the missing things 
upstream.


Avi, let me know if I confused you and I'll send them once again with 
some time in between to order them easier.


[EMAIL PROTECTED] wrote:

From: Christian Ehrhardt <[EMAIL PROTECTED]>

This is a set of fixes for the powerpc tests kvm-userspace/user.

Patch 1&2 fix main-ppc.c while patch 3 introduces libcflat for powerpc.
Further on patch 4 provides a timebase accessor for the ppc testcases (not
used yet) and patch 5 finally adds a stub nmi handler to main-ppc.c.

[patches in series]
[PATCH 1/5] user: ppc: fix threading bugs in main-ppc.c
[PATCH 2/5] user: ppc: better error reporting in load_file
[PATCH 3/5] user: ppc: implement PowerPC 44x libcflat
[PATCH 4/5] libcflat: ppc: add timebase accessor
[PATCH 5/5] user: ppc: add stub nmi handler

---
[diffstat]
 b/user/config-powerpc-44x.mak  |   14 +
 b/user/config-powerpc.mak  |   46 -
 b/user/main-ppc.c  |   32 +++-
 b/user/test/lib/powerpc/44x/map.c  |   51 +
 b/user/test/lib/powerpc/44x/timebase.S |   28 ++
 b/user/test/lib/powerpc/44x/timebase.h |   25 
 b/user/test/lib/powerpc/44x/tlbwe.S|   29 ++
 b/user/test/lib/powerpc/io.c   |   35 ++
 b/user/test/powerpc/cstart.S   |   38 
 b/user/test/powerpc/exit.c |   23 ++
 user/config-powerpc-44x.mak|3 +
 user/main-ppc.c|9 +
 12 files changed, 296 insertions(+), 37 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  



--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/3] kvm-userspace: kvmppc: fix hostlonbits detection when cross compiling v2

2008-10-01 Thread Christian Ehrhardt

malc wrote:

On Tue, 30 Sep 2008, [EMAIL PROTECTED] wrote:


From: Christian Ehrhardt <[EMAIL PROTECTED]>

*update*
further debugging according to some requests revealed that 
ARCH_CFLAGS does
not contain all CFLAGS that might be needed, especially those 
supplied via
extra-cflags. Therefore people supplying things via extra-cflags 
instead of an

environment variable might have had issues.


This part i don't get, there are few more checks before/after 
hostlongbits where no CFLAGS are added to the $cc argument list. What

makes hostlongbits selection "special"? Do people specify -m32/-m64 via
--extra-cflags?

it was there to ensure availability of the needed include paths to reach 
wordsize.h.
But Hollis approach is much simpler, better and more reliable so never 
mind :-)




A recent kvm merge with qemu brought code for 64bit power that broke 
cross

compilation. The issue is caused by configure trying to execute target
architecture binaries where configure is executed.


Yes, i never thought about cross-compilation, my bad.

np, now it's fixed - thanks for quickly applying it.



I tried to change that detection so that it works with&without cross
compilation with only a small change and especially without an addtional
configure command line switch. Including the bits/wordsize.h header a 
platform

usually can check its wordsize and by doing that configure can check the
hostlongbits without executing the binary. Instead it now stops after
preprocessing stage which resolved the __WORDSIZE constant and retrieves
that value.

I don't like my new check style, but it is at least less broken than 
before.

Another approach that was suggested was that qemu might end up needing
something like asm-offsets in the kernel to manage architecture sizes 
etc.

Comments and other approaches welcome.



I think Hollis Blanchard's method is sound,

Thank you for bringing this up.




--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0 of 5] PowerPC patches for 2.6.27

2008-07-29 Thread Christian Ehrhardt
On Sunday 27 July 2008 10:50:38 Avi Kivity wrote:
> Hollis Blanchard wrote:
> > Hi Avi, can these patches go upstream for 2.6.27? There's a bug fix,
> > the addition of hardware breakpoint functionality, and three very
> > significant performance improvements.
> >
> > By the way, I will be on vacation for a few weeks starting Monday, but
> > Christian Ehrhardt should be able to take care of any technical issues.
>
> Applied all; thanks.  I prefer to only merge bug fixes at this time for
> 2.6.27.  As far as I can tell, patch 2 is independent of the rest so
> I'll queue that.  Let me know if that works.

Yes these patches are independent - for the type question you can consider
1/5 - feature
2/5 - bug fix
3,4,5/5 - yeah what is it .. neither a pure bug fix nor a feature let's call 
it "performance fix"

We tested most of those performance fixes at least 1.5 Months now, having them 
applied while developing new things on top and I feel very good with 3&4.
It would be nice if at least the performance fixes 3&4 could go in now under 
the bug fix label too, but it's up to you.

-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5 of 5] kvm: powerpc: Map guest userspace with TID=0 mappings

2008-07-29 Thread Christian Ehrhardt
On Monday 28 July 2008 12:33:41 Liu Yu wrote:
> I have a question that I could not think through.
> While multiple qemu/kvm processes are running at the same time, how to
> prevent one guest from using others' TLB? For all the guests have the
> same TID=0 for userspace and TID=1 for kernel.
[...]

Hi Yu Liu, thats a good question.
Afaik thats solved by the fact that the shadow tlb which is used when entering 
guest context is per vcpu. Therefor a guest has always it's own shadow tlb 
active and no mappings to the content of other guests.

This patch just allows us that a single guest userspace process accessing the 
kernel 20 times (and changing privilege level 20 times by doing so) can run 
without tlb flushes.
Guest-userspace context switch (pid is changing) -> tlb flush; and guest 
switches (guest A -> guest B) -> other shadow tlb active; should still be 
working fine.

> >
> > The net is that we don't need to flush the TLB on privilege
> > switches, but we do on guest context switches (which are far
> > more infrequent). Guest boot time performance improvement: about 30%.
> >

-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2/RFC] libkvm-s390

2008-07-17 Thread Christian Ehrhardt

Christian Borntraeger wrote:

This is an update patch for libkvm to build and work on s390

[...]

Index: kvm-userspace/libkvm/libkvm-s390.c
===
--- /dev/null
+++ kvm-userspace/libkvm/libkvm-s390.c
@@ -0,0 +1,137 @@
+/*
+ * This file contains the s390 specific implementation for the
+ * architecture dependent functions defined in kvm-common.h and
+ * libkvm.h
+ *
+ * Copyright (C) 2006 Qumranet
+ * Copyright IBM Corp. 2008
+ *
+ * Authors:
+ * Carsten Otte <[EMAIL PROTECTED]>
+ *  Christian Borntraeger <[EMAIL PROTECTED]>
  
whitespace error - well just in the comment, but if you resubmit it 
anyway this space could be removed ;-)


[...]

+
+int handle_dcr(struct kvm_run *run,  kvm_context_t kvm, int vcpu)
+{
+   fprintf(stderr, "%s: Operation not supported\n", __FUNCTION__);
+   return -1;
+}
+
  


I think Carsten started with the powerpc file here.
This function is powerpc only and even in our code only called later on 
in this libkvm-$arch.c file.

Maybe we should patch our file and make it static to clarify that.
Since you don't have that call its superfluous, you should be able to 
drop it without consequence.


[...]

Index: kvm-userspace/libkvm/libkvm.c
===
--- kvm-userspace.orig/libkvm/libkvm.c
+++ kvm-userspace/libkvm/libkvm.c
@@ -48,6 +48,10 @@
 #include "kvm-powerpc.h"
 #endif

+#if defined(__s390__)
+#include "kvm-s390.h"
+#endif
+
 int kvm_abi = EXPECTED_KVM_API_VERSION;
 int kvm_page_size;

@@ -88,7 +92,11 @@ int get_free_slot(kvm_context_t kvm)
if (tss_ext > 0)
i = 0;
else
+#if !defined(__s390__)
i = 1;
+#else
+   i = 0;
+#endif

  
This looks a bit complicated, but when thinking of the result this just 
means that i is always 0 on s390.
Therefore I think it would look less confusing when the ifdef would not 
be in the else part of that tss_ext if.


Since i will always be zero in s390 case you could do things like
- initialize i with 0 and let the whole block until the for loop run in 
"if !defined(__s390__)"


That way you would additionally stop modifying the tss_ext variable 
which is not used in the s390 case anyway.


[...]

--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmtrace?

2008-07-10 Thread Christian Ehrhardt
On Monday 07 July 2008 17:34:13 Jun Koi wrote:
> On Mon, Jul 7, 2008 at 11:07 PM, Christian Ehrhardt
>
> <[EMAIL PROTECTED]> wrote:
> > Jun Koi wrote:
> >> Hi,
> >>
> >> Could anybody explain a bit about kvmtrace? I cannot find any doc on
> >> it anywhere.
> >>
> >> - Is there any short usage instruction for kvmtrace?
> >
> > I don't know if there is an official doc yet, but you can just start with
> > the simple version like:
> >  ./kvmtrace -o outfile -w 5
> > which traces for 5 seconds to outfile.*
>
> This really surprises me. I expected that we can collect trace for a
> specific VM. But it seems that we always trace all the VMs currently
> running??

usually you want to trace all to ensure that you can interpret sideeffects.
Otherwise you might often urn into wrong interpretation or a good assumption 
and you would "just" need the data for the other guests to ensure your 
theory.
Anyway if you really think we need it you might code it in the probe function
near the head of kvm_add_trace, there is a check for KVM_TRACE_STATE_RUNNING
and there you could easily implement some filter mechanism (you'll want to 
extend the ioctls too to get your mask).
But anyway, as I mentioned you usually want all data to "be sure" what you see 
in your trace.

The only scenario where I think we would really need a mask is in very big 
many guest scenarios, but today I think no one except the s390 guys have 
enough guests to have that need practically.

> With trace feature always ON, I guess we suffer some overhead, right?
> Perhaps the overhead is little, but still we have. If so, is there
> anyway to turn the trace feature OFF?

Well, you can turn it off permanent by disabling CONFIG_KVM_TRACE.
But be aware that kvm_trace uses the markers infrastructure (see 
Documentation/markers.txt) which means that the overhead is really really low 
as long as you don't actively use kvmtrace (invoked via the userspace tool).
And if you use it, well then it should do something ;-)

> Many thanks,
> Jun
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] kvmtrace: make cycle calculation architecture aware

2008-07-09 Thread Christian Ehrhardt

Hollis Blanchard wrote:

On Mon, 2008-07-07 at 15:56 +0200, [EMAIL PROTECTED] wrote:
  

From: Christian Ehrhardt <[EMAIL PROTECTED]>

The current implementation of kvmtrace uses always a 64 bit cycle variable,
but get_cycles() which is used to fill it is "unsigned long" which might be 32
bit.
This reduces the accuracy e.g. on embedded powerpc since we would have a 64bit
value but get_cycle() only returns the low 32 bit.
To solve that this patch introduces kvm_arch_trace_cycles() which allows us
to make this calculation architecture aware. That way every architecture can
insert whatever fits best for their "kvmtrace cycle counter".

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>



"cycles" is a very poor name, because that's not really what we're
talking about at all. (Also, that function name made me wonder what a
"trace cycle" is. :)

I would strongly prefer using "timestamp" instead. It would be nice if
we could rename the data structure too, but I'd settle for only properly
naming the new architecture function hook.
  
In fact, if we want to be rigorous about it, it should really be

something like "nanoseconds" instead, so that userspace wouldn't need to
perform awkward conversions of "cycles" or "timebase ticks" to real
time. It looks like getnstimeofday() would do the trick, and that way we
wouldn't need an arch-specific hook at all
I agree that the naming is poor. I also wondered about it and just 
continued it to the kvm_arch function.
I'll change the naming - timestamp would be fine, but only if it is 
really time and not a cycle counter.
I experimented with the tv_nsec portion and the classic current_time and 
accuracy there was just far to low compared to the time base register 
data (5 to 20 instructions until tv.nssec portion changed).


I checked the getnstimeofday function you mentioned and compared 
accuracy again and yes that is exactly what I would like to use:

here a ittle example
263527418457 (+   36696)ns=0x1c43286a
263527740677 (+  30)   ns=0x1c4a880c (+ 75FA2 = 483234)
263527779757 (+   39080)ns=0x1c4b6cae (+ E4A2   =  58530)
While the unit is not the same (explains the difference in the same line 
e.g. 39080 vs. 58530) the accuracy is the same.
this can be seen due to the fact that it changes by the same ratio all 
over the file.

For the example I had above
cycle = 30/39080 = 8.25
nsec = 483234/58530 = 8.25

That leads me to the point where I completely agree with Hollis that the 
naming should be timestamp as well as that the function should rely on 
getnstimeofday() instead of get_cycles() and that way would remove the 
need kvm_arch_ function.


So the question that is left before changing that is, if the original 
author had something special in mind chosing cycles here.

I added Eric on CC for that.

I wait with my resubmission of the patch series until all architectures 
agree *hope* on using getnstimeofday() - after an ack from all sides I 
would revise my patch series and submit that changes alltogether.


--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] kvmtrace: make cycle calculation architecture aware

2008-07-09 Thread Christian Ehrhardt

Hollis Blanchard wrote:

On Mon, 2008-07-07 at 15:56 +0200, [EMAIL PROTECTED] wrote:
  

From: Christian Ehrhardt <[EMAIL PROTECTED]>

The current implementation of kvmtrace uses always a 64 bit cycle variable,
but get_cycles() which is used to fill it is "unsigned long" which might be 32
bit.
This reduces the accuracy e.g. on embedded powerpc since we would have a 64bit
value but get_cycle() only returns the low 32 bit.
To solve that this patch introduces kvm_arch_trace_cycles() which allows us
to make this calculation architecture aware. That way every architecture can
insert whatever fits best for their "kvmtrace cycle counter".

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>



Just one comment below...

BTW, because this breaks the ia64 and s390 builds, it might be nice to
CC the appropriate list/maintainer directly.

  
you're right - I'll add dummy stubs for ia64&s390 using the classic 
get_cycle() call and resubmit the series.

---

[diffstat]
 arch/powerpc/kvm/powerpc.c |   25 +
 arch/x86/kvm/x86.c |5 +
 include/linux/kvm_host.h   |2 ++
 virt/kvm/kvm_trace.c   |2 +-
 4 files changed, 33 insertions(+), 1 deletion(-)

[diff]

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -521,6 +521,31 @@
return r;
 }

+/*
+ * We need a 64 bit value here and the default get_cycles has only 32bit (tbl)
+ * on 32bit ppc. Since we have a 64bit counter for that we provide it here in
+ * full resolution for the trace records.
+*/
+__u64 kvm_arch_trace_cycles()
+{
+   unsigned long ruval;
+   unsigned long ruval2;
+   unsigned long rlval;
+
+   /* get a consistant pair of upper/lower timebase (no wrap occured) */
+   asm volatile(
+   "loop:\n"
+ "mftbu %0\n"
+ "mftbl %1\n"
+ "mftbu %2\n"
+ "cmpw %0, %2\n"
+ "bne loop"
+   : "=r" (ruval), "=r" (rlval), "=r" (ruval2)
+   );
+
+   return (((__u64)ruval) << 32) | rlval;
+}



You should use get_tb() here (see asm-powerpc/time.h).
  
yep does the same, I didn't see it and coded the asm from the processor 
manual hint at the time base register.

Anyway - I'll change the code to get_tb() - thanks for the hint.

--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmtrace?

2008-07-07 Thread Christian Ehrhardt

Jun Koi wrote:

Hi,

Could anybody explain a bit about kvmtrace? I cannot find any doc on
it anywhere.

- Is there any short usage instruction for kvmtrace?
  
I don't know if there is an official doc yet, but you can just start 
with the simple version like:

  ./kvmtrace -o outfile -w 5
which traces for 5 seconds to outfile.*

After that you can cat that file and pipe it to kvmtrace_format which 
converts it to a readable file:

  cat outfile.kvmtrace.0 | ./kvmtrace_format formats

formats is a simple file defining how to present the trace records.


- Which kind of data kvmtrace can collect?
  
Due to the structure of the "formats" file (lockated in 
kvm-userspace/user/formats) this is also a list of the event types 
currently tracked - it is readable ascii and lists event+additional data 
reported.


Both kvmtrace (c) & kvmtrace_formats (python) are not very big (yet) so 
if you want to go further just look into the code. While I'm sure that 
it will grow in complexity in the future it's clearly arranged and easy 
to read atm.



Many thanks,
Jun
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  



--

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvmtrace: fix "Remove use of bit fields in kvm trace structure" patch

2008-07-03 Thread Christian Ehrhardt


[PATCH] kvmppc: kvmtrace: fix "Remove use of bit fields in kvm trace 
structure"


From: Christian Ehrhardt <[EMAIL PROTECTED]>

The current patch kvm_trace_nobitfields.diff in the kvmppc.hg-dev repo
doesn't compile and has some minor flaws.
This patch is known to the kvm list as "[PATCH] [v2] Remove use of bit
fields in kvm trace structure" from Jerone.

Problem:
code:
  rec.rec_val |= TRACE_REC_EVENT_ID(va_arg(*args, u32));
makro:
  #define TRACE_REC_EVENT_ID (val) \
   (0x0fff & (val))

is extended to:
  rec.rec_val |= (val) (0x0fff & (val))(__builtin_va_arg(*args,u32));

that is failing with:
  kvm_trace.c: In function 'kvm_add_trace':
  kvm_trace.c:66: error: 'val' undeclared (first use in this function)
  kvm_trace.c:66: error: (Each undeclared identifier is reported only once
  kvm_trace.c:66: error: for each function it appears in.)

Obviously caused by the blank between macro definition and (val) which makes
it a non-parameter macro.

Additionally the two macro definitions below don't put () around the 
parameter
which is one of the golden macro rules to keep operator precedence 
independent

to what someone passes as argument to your macro.

Further I found that part of the path added a superfluous assignment in
kvm_trace.c (initialize with =0 and directly or something in with |=).

This patch fixes all three issues. As Hollis pointed out yesterday patch 
#1 of

the initial series of 3 patches should already be applied according to your
(Avi) response while it is not seen on kvm-commits yet.
He offered to collect those patches in his patch queue and send you a batch
soon.
That said you can consider this patch FYI (to collect comments now) and wait
until our consolidated batch arrives including all fixes at once.

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>
---

[diffstat]
include/linux/kvm.h  |   10 +-
virt/kvm/kvm_trace.c |4 +---
2 files changed, 6 insertions(+), 8 deletions(-)

[diff]

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -331,12 +331,12 @@
} u;
} __attribute__((packed));

-#define TRACE_REC_EVENT_ID (val) \
+#define TRACE_REC_EVENT_ID(val) \
(0x0fff & (val))
-#define TRACE_REC_NUM_DATA_ARGS (val) \
-(0x7000 & (val << 28))
-#define TRACE_REC_TCS (val) \
-(0x8000 & (val << 31))
+#define TRACE_REC_NUM_DATA_ARGS(val) \
+(0x7000 & ((val) << 28))
+#define TRACE_REC_TCS(val) \
+(0x8000 & ((val) << 31))

#define KVMIO 0xAE

diff --git a/virt/kvm/kvm_trace.c b/virt/kvm/kvm_trace.c
--- a/virt/kvm/kvm_trace.c
+++ b/virt/kvm/kvm_trace.c
@@ -60,10 +60,8 @@
if (unlikely(kt->trace_state != KVM_TRACE_STATE_RUNNING))
return;

-rec.rec_val = 0;
-   
/* set event id */   
-rec.rec_val|= TRACE_REC_EVENT_ID(va_arg(*args, u32));

+rec.rec_val= TRACE_REC_EVENT_ID(va_arg(*args, u32));

vcpu    = va_arg(*args, struct kvm_vcpu *);
rec.pid= current->tgid;

---

Grüsse / regards, 
Christian Ehrhardt

IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html