Re: lapic & npt

2008-06-30 Thread Gerd Hoffmann
Avi Kivity wrote:
> Gerd Hoffmann wrote:
>>   Hi,
>>
>> I've just noticed that xenner doesn't work for 64bit xen guest kernels
>> on my new shiny barcelona box.  The VM crashes when trying to access the
>> lapic.  lapic setup is done before idt setup, register dump looks like
>> init state, thus it most likely is a triple fault resetting the vcpu.
>> Turning off npt fixes it.
>>
>> xenner maps the lapic to 8200 (64bit guests) or fe90
>> (32bit guests).  32bit works fine even with npt enabled.
>>
>> host kernel is fedora 9 with kvm-69 modules.
>>
>> ideas anyone?
> 
> Turn on logging in lapic.c.  See if something turns up.

It doesn't come that far according to kvmtrace.

> If not, the page tables are probably set up incorrectly, but in a way
> that kvm doesn't notice.

Dunno.  Tried kvmtrace and got the results attached, without and with
NPT.  Cut down to the important sequence, starting with the apic MSR
read.  After the msr access there are some page faults (some page table
pages are allocated and used to map the apic, thus likely triggering
shadow pt updates).  Then the apic access (id register).

With npt kvm doesn't see the apic access at all.  Also the TDP_FAULT
line looks fishy to me.  The "virt = ... " isn't a (guest) virtual address.

btw: the iowrite to 0x00ea is console output.

ideas anyone?
  Gerd

-- 
http://kraxel.fedorapeople.org/xenner/
1673786946412 (+6068)  VMEXIT    vcpu = 0x  pid = 0x102d [ exitcode = 0x007c, rip = 0x8300 a1d2 ]
0 (+   0)  MSR_READ  vcpu = 0x  pid = 0x102d [ MSR# = 0x001b, data = 0x fee00900 ]
1673786950748 (+4336)  VMENTRY   vcpu = 0x  pid = 0x102d
1673786957068 (+6320)  VMEXIT    vcpu = 0x  pid = 0x102d [ exitcode = 0x0003, rip = 0x8300 6ceb ]
0 (+   0)  CR_READ   vcpu = 0x  pid = 0x102d [ CR# = 3, value = 0x 017a3000 ]
1673786966522 (+9454)  VMENTRY   vcpu = 0x  pid = 0x102d
1673786973678 (+7156)  VMEXIT    vcpu = 0x  pid = 0x102d [ exitcode = 0x004e, rip = 0x8300 4023 ]
0 (+   0)  PAGE_FAULT vcpu = 0x  pid = 0x102d [ errorcode = 0x0002, virt = 0x8300 00048000 ]
1673787002404 (+   28726)  VMENTRY   vcpu = 0x  pid = 0x102d
1673787034034 (+   31630)  VMEXIT    vcpu = 0x  pid = 0x102d [ exitcode = 0x004e, rip = 0x8300 4023 ]
0 (+   0)  PAGE_FAULT vcpu = 0x  pid = 0x102d [ errorcode = 0x0002, virt = 0x8300 00049000 ]
1673787049388 (+   15354)  VMENTRY   vcpu = 0x  pid = 0x102d
1673787080890 (+   31502)  VMEXIT    vcpu = 0x  pid = 0x102d [ exitcode = 0x004e, rip = 0x8300 a1ef ]
0 (+   0)  PAGE_FAULT vcpu = 0x  pid = 0x102d [ errorcode = 0x, virt = 0x8200 0020 ]
0 (+   0)  APIC_ACCESS   vcpu = 0x  pid = 0x102d [ offset = 0x0020 ]
1673787096270 (+   15380)  VMENTRY   vcpu = 0x  pid = 0x102d

1834321779170 (+6312)  VMEXIT    vcpu = 0x  pid = 0x1080 [ exitcode = 0x007c, rip = 0x8300 a1d2 ]
0 (+   0)  MSR_READ  vcpu = 0x  pid = 0x1080 [ MSR# = 0x001b, data = 0x fee00900 ]
1834321783872 (+4702)  VMENTRY   vcpu = 0x  pid = 0x1080
1834321793790 (+9918)  VMEXIT    vcpu = 0x  pid = 0x1080 [ exitcode = 0x0400, rip = 0x8300 4023 ]
0 (+   0)  TDP_FAULT vcpu = 0x  pid = 0x1080 [ errorcode = 0x0006, virt = 0x 00048000 ]
1834321818582 (+   24792)  VMENTRY   vcpu = 0x  pid = 0x1080
1834321851480 (+   32898)  VMEXIT    vcpu = 0x  pid = 0x1080 [ exitcode = 0x0400, rip = 0x8300 4023 ]
0 (+   0)  TDP_FAULT vcpu = 0x  pid = 0x1080 [ errorcode = 0x0006, virt = 0x 00049000 ]
1834321870858 (+   19378)  VMENTRY   vcpu = 0x  pid = 0x1080
1834321906650 (+   35792)  VMEXIT    vcpu = 0x  pid = 0x1080 [ exitcode = 0x0400, rip = 0x8300 53b1 ]
0 (+   0)  TDP_FAULT vcpu = 0x  pid = 0x1080 [ errorcode = 0x0004, virt = 0x d000 ]
1834321912818 (+6168)  VMENTRY   vcpu = 0x  pid = 0x1080
1834321931382 (+   18564)  VMEXIT    vcpu = 0x  pid = 0x1080 [ exitcode = 0x007b, rip = 0x8300 b256 ]
0 (+   0)  IO_WRITE  vcpu = 0x  pid = 0x1080 [ port = 0x00ea, size = 1 ]
1834322090822 (+  159440)  VMENTRY   vcpu = 0x  pid = 0x1080


[PATCH] Use register accessor for invalid_guest_state()

2008-06-30 Thread Mohammed Gamal
I am still using Guillaume's real mode patches in my local tree. This
fixes a compilation error I came across after the latest pull. I am
sending it just in case you re-apply the patch.


Signed-off-by: Mohammed Gamal <[EMAIL PROTECTED]>


 arch/x86/kvm/vmx.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1d0087e..8bba4e0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2684,7 +2684,7 @@ static int invalid_guest_state(struct kvm_vcpu *vcpu,
 {
 	u16 ss, cs;
 	u8 opcodes[4];
-	unsigned long rip = vcpu->arch.rip;
+	unsigned long rip = kvm_rip_read(vcpu);
 	unsigned long rip_linear;

 	ss = vmcs_read16(GUEST_SS_SELECTOR);
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2001452 ] Restarted Windows 2003 Server guests have disk corruption

2008-06-30 Thread SourceForge.net
Bugs item #2001452, was opened at 2008-06-24 07:27
Message generated for change (Comment added) made by gerdwachs
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: gwachs (gerdwachs)
Assigned to: Nobody/Anonymous (nobody)
Summary: Restarted Windows 2003 Server guests have disk corruption

Initial Comment:
I have a number of Windows 2003 32Bit guests.

I use them to perform installation and configuration
tests of a large software product.

During these tests, the guests are restarted.

Randomly, the guests produce disk corruption messages
after a restart.

The following are two examples :

---
Windows  Registry Hive Recovered

Registry hive (file): SOFTWARE was corrupted and it has
been recovered. Some data might have been lost.
---
The system cannot log on due to the following error:
Unable to complete the requested operation because of
either a catastrophic media failure or a data structure
corruption on the disk.
---

OS : Ubuntu 8.04 x86_64

Kernel : 2.6.24-18-server #1 SMP x86_64 GNU/Linux

KVM: kvm-70

CPU: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz

flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 
ssse3 cx16 xt

Start Command : sudo /usr/local/kvm/bin/qemu-system-x86_64 -hda asit51ascs.img \
   -m 1024 -std-vga -boot c -k sv -usb -usbdevice tablet -snapshot -vnc :51 \
   -net nic,vlan=0,macaddr=00:16:3e:00:51:00 -net tap,vlan=0,script=/etc/qemu-ifup-br0 \
   -net nic,vlan=1,macaddr=00:16:3e:00:51:01 -net tap,vlan=1,script=/etc/qemu-ifup-br1

no-kvm : Cannot test without kvm because of the loss of performance.
 Test execution time is 7 hours with kvm.




--

>Comment By: gwachs (gerdwachs)
Date: 2008-06-30 13:27

Message:
Logged In: YES 
user_id=2122332
Originator: YES

Sorry, I am not that advanced on the usage of git.

If you would care to send instructions, I will try.

I am currently using the latest snapshots. 

It appears to be working well enough for my requirements, but I have still
been getting the odd corruption message.

P.S. I believe that kvm is an absolute winning concept.


--

Comment By: Brian Jackson (iggy_cav)
Date: 2008-06-27 16:09

Message:
Logged In: YES 
user_id=611130
Originator: NO

For some reason, I thought a virtio patch would help you, when you
obviously aren't using that with windows guests. It could still be
something with the i/o thread though. Is there any way you can do a git
bisect to figure out where exactly it breaks? I know it's hard to do
something like that when it takes so long to trigger the issue. It may be
our only option though.

kvm-69 doesn't have the i/o thread (I think), so it should be safe to use
if you just need something that works.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-27 12:44

Message:
Logged In: YES 
user_id=2122332
Originator: YES

The patch did not fix the problem.
Had one XP guest hang with build 20080626.
Currently running build 20080626 on both hosts
to assess disk corruption occurrences.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-27 10:25

Message:
Logged In: YES 
user_id=2122332
Originator: YES

Regarding the patch. This seemed to fix the problem, 
but will keep re-running for a week before being certain.

Regarding the computer freezing, tried snapshot 20080626, 
whilst one guest SEEMED to hang for seconds/minutes, it kept running.

Thank you very much iggy_cav


--

Comment By: gwachs (gerdwachs)
Date: 2008-06-26 12:18

Message:
Logged In: YES 
user_id=2122332
Originator: YES

I have applied the patch and begun testing.
I will update after testing.


--

Comment By: Brian Jackson (iggy_cav)
Date: 2008-06-25 14:59

Message:
Logged In: YES 
user_id=611130
Originator: NO

Can you try to revert (patch -R) the virtio async feature? Someone else in
the irc channel that was having fs corruption had luck doing that.
http://people.redhat.com/~mtosatti/virtioblk-async.patch

Otherwise, just stick with the kvm-69 userspace until it's fixed.


Re: KVM: VMX: cache exit_intr_info

2008-06-30 Thread Yang, Sheng
On Saturday 28 June 2008 13:35:27 Marcelo Tosatti wrote:
> On Sat, Jun 28, 2008 at 11:20:47AM +0800, Yang, Sheng wrote:
> > On Saturday 28 June 2008 02:05:19 Marcelo Tosatti wrote:
> > > exit_intr_info is read-only in nature, so once read it can be
> > > cached similarly to idtv_vectoring_inf.
> > >
> > > Reduces guest re-entry in about 50 cycles on my machine (the
> > > exception path should be similar, but haven't measured).
> > >
> > > Applies on top of register accessor patch.
> > >
> > > Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
> >
> > Thanks for the patches! :)
> >
> > And I realized there are also too much vmcs_read32
> > (CPU_BASED_VM_EXEC_CONTROL)(though not read only). I'd like to
> > post another patch to optimize it later.
>
> GUEST_INTERRUPTIBILITY_INFO is also a candidate, with significant
> wins (used by skip_emulated_instruction which is often used in the
> exit handlers).
>
> GUEST_RFLAGS is another register read multiple times in the fast
> path, but seems trickier.
>
> Do you have a better suggestion instead of
> vmcs_cache_read32/vmcs_cache_write32 below for this caching
> optimizations?

I think we may include more VMCS fields, though not all of them are on
the critical path. GUEST_INTERRUPTIBILITY_INFO is on the critical path,
as well as VM_ENTRY_INTR_INFO_FIELD. GUEST_RFLAGS and
CPU_BASED_VM_EXEC_CONTROL are also used very frequently. Of course
the latter three fields I mentioned need write cache support; I'd like
to go a similar way as kvm_cache_regs did.

>
> With these three patches applied gettimeofday() microbenchmark is
> 5% faster.

I will test if we include these write cache MSR, how much benefit we 
can get. Can you provide some detail on how can you get the 
performance data? :)
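
The read/write caching discussed in this thread can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the code
from the patches: every sketch_* name is hypothetical, the hw[] array stands
in for the hardware VMCS (real code would call vmcs_read32/vmcs_write32
there), and the counters exist only to make the saved accesses visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field indices; the real fields are VMCS encodings. */
enum sketch_field {
    SKETCH_EXIT_INTR_INFO,
    SKETCH_INTERRUPTIBILITY_INFO,
    SKETCH_NR_FIELDS
};

struct sketch_vcpu {
    uint32_t hw[SKETCH_NR_FIELDS];     /* stand-in for the hardware VMCS */
    uint32_t cache[SKETCH_NR_FIELDS];  /* software copy of each field */
    uint32_t avail;                    /* bit set: cache[] is current */
    uint32_t dirty;                    /* bit set: cache[] must be written back */
    unsigned int hw_reads;             /* number of (simulated) VMCS reads */
    unsigned int hw_writes;            /* number of (simulated) VMCS writes */
};

/* Read a field, touching the "hardware" only on the first read after an exit. */
static uint32_t sketch_cache_read32(struct sketch_vcpu *v, enum sketch_field f)
{
    assert(f < SKETCH_NR_FIELDS);
    if (!(v->avail & (1u << f))) {
        v->cache[f] = v->hw[f];
        v->hw_reads++;
        v->avail |= 1u << f;
    }
    return v->cache[f];
}

/* Write a field into the cache only; the write-back is deferred. */
static void sketch_cache_write32(struct sketch_vcpu *v, enum sketch_field f,
                                 uint32_t val)
{
    assert(f < SKETCH_NR_FIELDS);
    v->cache[f] = val;
    v->avail |= 1u << f;
    v->dirty |= 1u << f;
}

/* Before guest entry: write every dirty field back exactly once. */
static void sketch_flush_before_entry(struct sketch_vcpu *v)
{
    int f;

    for (f = 0; f < SKETCH_NR_FIELDS; f++) {
        if (v->dirty & (1u << f)) {
            v->hw[f] = v->cache[f];
            v->hw_writes++;
        }
    }
    v->dirty = 0;
}

/* After a vmexit: the hardware may have changed fields, so drop the cache. */
static void sketch_invalidate_after_exit(struct sketch_vcpu *v)
{
    assert(v->dirty == 0);  /* dirty fields must have been flushed first */
    v->avail = 0;
}
```

A read-only field such as exit_intr_info only needs the avail half of this;
the dirty half is what the fields needing write cache support would add.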

-- 
Thanks
Yang, Sheng


kvm_mmu doubts

2008-06-30 Thread Sukanto Ghosh
What do these refer to ?
i) kvm_rmap_desc
ii) rmap_pde
iii) kvm_mmu_page-> spt ??? ( i thought kvm_mmu_page itself refers to page
of shadow PT, then what does spt points to ? )


Thanks and Regards
Sukanto Ghosh


Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-30 Thread Soren Hansen
On Sun, Jun 29, 2008 at 11:59:27AM +0300, Avi Kivity wrote:
> Yeah.  To be on the safe side you can have KVM_CHECK_EXTENSION return
> false for KVM_CAP_CLOCKSOURCE.

Oh! That's even better. Would that be sufficient? I mean, do all guest
versions check this before trying to use the paravirt clock?

-- 
Soren Hansen   | 
Virtualisation specialist  | Ubuntu Server Team
Canonical Ltd. | http://www.ubuntu.com/


Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-30 Thread Avi Kivity

Soren Hansen wrote:
> On Sun, Jun 29, 2008 at 11:59:27AM +0300, Avi Kivity wrote:
>> Yeah.  To be on the safe side you can have KVM_CHECK_EXTENSION return
>> false for KVM_CAP_CLOCKSOURCE.
>
> Oh! That's even better. Would that be sufficient? I mean, do all guest
> versions check this before trying to use the paravirt clock?

You need to disable the host side (KVM_CAP_CLOCKSOURCE) and the guest 
side (CONFIG_KVM_GUEST).


--
error compiling committee.c: too many arguments to function


Re: kvm_mmu doubts

2008-06-30 Thread Avi Kivity

Sukanto Ghosh wrote:
> What do these refer to ?
> i) kvm_rmap_desc

It's a reverse mapping listing all shadow ptes pointing to a given guest
page.

> ii) rmap_pde

Same, for large guest pages.

> iii) kvm_mmu_page-> spt ??? ( i thought kvm_mmu_page itself refers to page
> of shadow PT, then what does spt points to ? )

kvm_mmu_page contains information about the guest page table and the 
host shadow page table.  spt is the host shadow page table.
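
To make the reverse-mapping idea concrete, here is a small user-space sketch.
It is a simplified illustration, not the kvm source: all sketch_* names are
hypothetical, the descriptor size of 4 is picked arbitrarily, and the only
hardware fact assumed is that bit 1 of an x86 page table entry is the
writable bit. Each guest page gets a chain of descriptors holding pointers to
the shadow ptes that map it, so all of them can be found and changed at once.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_RMAP_EXT 4                 /* slots per descriptor (illustrative) */
#define SKETCH_PT_WRITABLE (1ULL << 1)    /* R/W bit of an x86 pte */

/* One descriptor: a few pointers to shadow ptes mapping the same guest page. */
struct sketch_rmap_desc {
    uint64_t *shadow_ptes[SKETCH_RMAP_EXT];
    struct sketch_rmap_desc *more;        /* next descriptor in the chain */
};

/* Record that *spte maps the guest page whose chain starts at *head.
 * Returns 0 on success, -1 if no memory is available. */
static int sketch_rmap_add(struct sketch_rmap_desc **head, uint64_t *spte)
{
    struct sketch_rmap_desc *desc;
    int i;

    assert(head != NULL && spte != NULL);

    /* Reuse a free slot in an existing descriptor if there is one. */
    for (desc = *head; desc != NULL; desc = desc->more) {
        for (i = 0; i < SKETCH_RMAP_EXT; i++) {
            if (desc->shadow_ptes[i] == NULL) {
                desc->shadow_ptes[i] = spte;
                return 0;
            }
        }
    }

    /* Otherwise allocate a new descriptor and put it at the head. */
    desc = calloc(1, sizeof(*desc));
    if (desc == NULL)
        return -1;
    desc->shadow_ptes[0] = spte;
    desc->more = *head;
    *head = desc;
    return 0;
}

/* Clear the writable bit in every shadow pte that maps the guest page.
 * Returns the number of shadow ptes visited. */
static int sketch_rmap_write_protect(struct sketch_rmap_desc *head)
{
    int i, count = 0;

    for (; head != NULL; head = head->more) {
        for (i = 0; i < SKETCH_RMAP_EXT; i++) {
            if (head->shadow_ptes[i] != NULL) {
                *head->shadow_ptes[i] &= ~SKETCH_PT_WRITABLE;
                count++;
            }
        }
    }
    return count;
}
```

Write-protecting a guest page is just one example of an operation that needs
to reach every shadow pte for that page; the walk is the same for others.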


--
error compiling committee.c: too many arguments to function


Re: kvm_mmu doubts

2008-06-30 Thread Carsten Otte

Sukanto Ghosh wrote:
> What do these refer to ?
> i) kvm_rmap_desc
> ii) rmap_pde
> iii) kvm_mmu_page-> spt ??? ( i thought kvm_mmu_page itself refers to page
> of shadow PT, then what does spt points to ? )

I liked Avi's presentation about the kvm softmmu at the developer forum
2007. I felt like I understood what's going on after that, but I've
forgotten at least 75% since. Maybe this helps?

http://kvm.qumranet.com/kvmwiki/KvmForum2007?action=AttachFile&do=get&target=shadowy-depths-of-the-kvm-mmu.pdf


Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-30 Thread Soren Hansen
On Mon, Jun 30, 2008 at 04:39:23PM +0300, Avi Kivity wrote:
>>> Yeah.  To be on the safe side you can have KVM_CHECK_EXTENSION
>>> return false for KVM_CAP_CLOCKSOURCE.
>> Oh! That's even better. Would that be sufficient? I mean, do all
>> guest versions check this before trying to use the paravirt clock?
> You need to disable the host side (KVM_CAP_CLOCKSOURCE) and the guest
> side (CONFIG_KVM_GUEST).

Well, we don't have KVM_GUEST in the Ubuntu 8.04 kernels anyway (them
being 2.6.24 based and all), so returning false for KVM_CAP_CLOCKSOURCE
should be sufficient, I presume?

-- 
Soren Hansen   | 
Virtualisation specialist  | Ubuntu Server Team
Canonical Ltd. | http://www.ubuntu.com/


Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-30 Thread Avi Kivity

Soren Hansen wrote:
> On Mon, Jun 30, 2008 at 04:39:23PM +0300, Avi Kivity wrote:
>>>> Yeah.  To be on the safe side you can have KVM_CHECK_EXTENSION
>>>> return false for KVM_CAP_CLOCKSOURCE.
>>> Oh! That's even better. Would that be sufficient? I mean, do all
>>> guest versions check this before trying to use the paravirt clock?
>> You need to disable the host side (KVM_CAP_CLOCKSOURCE) and the guest
>> side (CONFIG_KVM_GUEST).
>
> Well, we don't have KVM_GUEST in the Ubuntu 8.04 kernels anyway (them
> being 2.6.24 based and all), so returning false for KVM_CAP_CLOCKSOURCE
> should be sufficient, I presume?

Yes.

--
error compiling committee.c: too many arguments to function


Re: [PATCH] Add memory clobber to hypercalls (v2)

2008-06-30 Thread Hollis Blanchard
On Sat, 2008-06-28 at 06:43 +0300, Avi Kivity wrote:
> Anthony Liguori wrote:
> > Hypercalls can modify arbitrary regions of memory.  Make sure to indicate 
> > this
> > in the clobber list.  This fixes a hang when using KVM_GUEST kernel built 
> > with
> > GCC 4.3.0.
> >
> > This was originally spotted and analyzed by Marcelo.
> >
> > Since v1, I've also added a "m" constraint for the inputs to the hypercall.
> > This was suggested by Christian since it's not entirely clear whether a 
> > memory
> > clobber will force the data to be in memory before the asm statement.  In 
> > the
> > very least, it helps to be more conservative.
> >
> > Signed-off-by: Anthony Liguori <[EMAIL PROTECTED]>
> >
> > @@ -80,7 +81,9 @@ static inline long kvm_hypercall1(unsigned int nr, 
> > unsigned long p1)
> > long ret;
> > asm volatile(KVM_HYPERCALL
> >  : "=a"(ret)
> > -: "a"(nr), "b"(p1));
> > +: "a"(nr), "b"(p1),
> > +  "m"(*(char *)p1)
> > +: "memory");
> > return ret;
> >  }
> >  
> >   
> 
> Those are physical addresses, not virtual, and on i386 the addresses are 
> split across multiple registers.
> 
> However a small test program shows that the memory clobber does work 
> with gcc 4.3, so I'll pick the earlier patch.

What about gcc 4.4? 4.5? 5.0?

-- 
Hollis Blanchard
IBM Linux Technology Center


Has anyone tried to run Linux 2.2 under KVM?

2008-06-30 Thread Richard W.M. Jones
I've been trying out RHL6.2 [sic] on KVM 70.  This has an ancient
2.2.14 kernel and generally dates from 1999/2000.  However it does run
nicely in 16 MB of RAM which makes it useful for me because I want to
see what happens when we run 100s of KVM instances :-)

A few observations:

(1) IDE DMA seems to cause problems.  I get lost interrupts and
general disk problems unless I boot the kernel with ide=nodma.

(2) The kernel sees the default rtl8139 NIC but cannot seem to get any
packets out of it.  I had to switch to using ne2k_pci instead.

(3) Heavy console activity hangs the virtual machine.  This is a
really weird and very annoying bug.

  (a) Serial console (-serial stdio) _also_ hangs under heavy activity.

  (b) At the point when the kernel hangs, the EIP is always at the
same place, seemingly in the __best_copy_from_user function in
the guest kernel.

  (c) The KVM monitor is still responsive.

  (d) (Now the strange bit ..)  If I switch to the KVM monitor first
and run my load tests - it doesn't hang!

  (e) However, qemu-kvm -nographic hangs under my load tests.

  (f) Plain qemu 0.9.1 doesn't hang.

Rich.

-- 
Richard Jones, Emerging Technologies, Red Hat  http://et.redhat.com/~rjones
Read my OCaml programming blog: http://camltastic.blogspot.com/
Fedora now supports 59 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora


RE: [PATCH] KVM: PCIPT: VT-d: fix context mapping

2008-06-30 Thread Ben-Ami Yassour
On Fri, 2008-06-20 at 14:23 +0800, Han, Weidong wrote:
> Ben-Ami Yassour1 wrote:
> > "Han, Weidong" <[EMAIL PROTECTED]> wrote on 19/06/2008 17:18:00:
> > 
> >> Ben-Ami Yassour wrote:
> >>> On Thu, 2008-06-19 at 16:59 +0800, Han, Weidong wrote:
> >>>> [EMAIL PROTECTED] wrote:
> >>>>> From: Ben-Ami Yassour <[EMAIL PROTECTED]>
> >>>>>
> >>>>> When changing the VT-d context mapping, according to the spec, it
> >>>>> is required to first set the context to not present, flush and
> >>>>> only then apply the new context.
> >>>>>
> >>>>> Signed-off-by: Ben-Ami Yassour <[EMAIL PROTECTED]>
> >>>>> ---
> >>>>>  drivers/pci/intel-iommu.c |   17 +
> >>>>>  1 files changed, 17 insertions(+), 0 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> >>>>> index 930874f..dcdfa97 100644
> >>>>> --- a/drivers/pci/intel-iommu.c
> >>>>> +++ b/drivers/pci/intel-iommu.c
> >>>>> @@ -56,6 +56,7 @@
> >>>>>
> >>>>>
> >>>>>  static void flush_unmaps_timeout(unsigned long data);
> >>>>> +static void detach_domain_for_dev(struct dmar_domain *domain, u8 bus, u8 devfn);
> >>>>>
> >>>>>  DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
> >>>>>
> >>>>> @@ -1264,7 +1265,23 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
> >>>>>  	if (!context)
> >>>>>  		return -ENOMEM;
> >>>>>  	spin_lock_irqsave(&iommu->lock, flags);
> >>>>> +
> >>>>> +	if (context_present(*context) &&
> >>>>> +	    (context_domain_id(*context) == domain->id) &&
> >>>>> +	    (context_address_width(*context) == domain->agaw) &&
> >>>>> +	    (context_address_root(*context) == virt_to_phys(domain->pgd)) &&
> >>>>> +	    (context_translation_type(*context) == CONTEXT_TT_MULTI_LEVEL) &&
> >>>>> +	    (!context_fault_disable(*context))) {
> >>>>> +		spin_unlock_irqrestore(&iommu->lock, flags);
> >>>>> +		return 0;
> >>>>> +	}
> >>>>
> >>>> Only need to check context_present(*context) according to VT-d
> >>>> spec, which says "software must not modify fields other than the
> >>>> Present (P) field of currently present root-entries or
> >>>> context-entries."
> >>>>
> >>>> Randy (Weidong)
> >>> 
> >>> The logic here is that, if no change is made to the context then
> >>> just return ok (0). Otherwise, according to the spec, we need to
> >>> first invalidate the context, flush it, and only then apply the
> >>> changes to the context. 
> >>> 
> >>> The other option is to make sure that before every call to this
> >>> function the context is invalidated, but disabling it inside the
> >>> function seems safer. do you agree with that?
> >>> 
> >> 
> >> After a device can be assigned to guest with VT-d, it needs a
> >> context unmap function. When a device is assigned to a guest, map
> >> context for it, while when it is detached from a guest, should unmap
> >> its context. With the context unmap function, I think we needn't to
> >> implement your logic in domain_context_mapping_one(), instead just
> >> check its present. What's your opinion?
> > 
> > Sure, that's fine, this is the other option I mentioned.
> > But we need to add the context unmap function.
> > Something like:
> > diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
> > index be775cd..4b96fbb 100644
> > --- a/arch/x86/kvm/vtd.c
> > +++ b/arch/x86/kvm/vtd.c
> > @@ -109,11 +109,17 @@ found:
> > kvm_iommu_unmap_memslots(kvm);
> > return -EFAULT;
> > }
> > +
> > +   kvm_intel_iommu_detach_dev(kvm->arch.domain,
> > +  pdev->bus->number, pdev->devfn);
> > +
> > kvm_intel_iommu_context_mapping(kvm->arch.domain, pdev);
> > return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_iommu_map_guest);
> > 
> > Agree?
> > 
> 
> I don't think it's necessary to add kvm_intel_iommu_detach_dev() here.
> If a device can be assigned to a guest, it should not be used by other
> guest (assuming no hotplug support). And also
> kvm_intel_iommu_detach_dev() is already called in
> kvm_iommu_unmap_guest(). Normally context won't be present here.
> Otherwise, there should be some wrong. I attach a patch to change it
> back to original kernel VT-d code. I think it's correct and clean.

The problem is with the following scenario:
1. load the host NIC driver
2. unload the host NIC driver
3. start kvm with passthrough for that NIC

In this case the context is not cleaned, and we get VT-d failures.
I agree with changing the VT-d driver code back to original.
But we do need:

diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
index be775cd..4b96fbb 100644
--- a/arch/x86/kvm/vtd.c
+++ b/arch/x86/kvm/vtd.c
@@ -109,11 +109,17 @@ found:
 		kvm_iommu_unmap_memslots(kvm);
 		return -EFAULT;
 	}
+
+	kvm_intel_iommu_detach_dev(kvm->arch.domain,
+				   pdev->bus->number, pdev->devfn);
+
 	kvm_intel_iommu_context_mapping(kvm->arch.domain, pdev);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm
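
The ordering argued for in this thread (clear the present bit, flush, and
only then write the new context) can be modelled in a few lines. This is a
toy model under assumptions, not the intel-iommu driver: the sketch_* names
and the three-field entry are hypothetical, and the flush is just a counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced model of a VT-d context entry. */
struct sketch_context_entry {
    int present;
    unsigned int domain_id;
    uint64_t address_root;
};

static unsigned int sketch_flush_count;   /* counts simulated cache flushes */

static void sketch_flush_context_cache(void)
{
    sketch_flush_count++;
}

/* Detach: mark the entry not present and flush, if it was present at all. */
static void sketch_detach_dev(struct sketch_context_entry *e)
{
    if (!e->present)
        return;
    e->present = 0;
    sketch_flush_context_cache();
}

/* Map: other fields may only be changed while the entry is not present. */
static void sketch_context_mapping(struct sketch_context_entry *e,
                                   unsigned int domain_id, uint64_t root)
{
    assert(!e->present);
    e->domain_id = domain_id;
    e->address_root = root;
    e->present = 1;
}

/* Assigning a device: always detach first, so a stale context left behind
 * by a previously loaded host driver is cleared before the new mapping. */
static void sketch_map_guest(struct sketch_context_entry *e,
                             unsigned int domain_id, uint64_t root)
{
    sketch_detach_dev(e);
    sketch_context_mapping(e, domain_id, root);
}
```

In the load/unload/passthrough scenario above, the entry would still be
present when the guest mapping is requested, which is the case the detach
call in front handles.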

Re: [PATCH] Add memory clobber to hypercalls (v2)

2008-06-30 Thread Avi Kivity

Hollis Blanchard wrote:
> On Sat, 2008-06-28 at 06:43 +0300, Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> Hypercalls can modify arbitrary regions of memory.  Make sure to indicate this
>>> in the clobber list.  This fixes a hang when using KVM_GUEST kernel built with
>>> GCC 4.3.0.
>>>
>>> This was originally spotted and analyzed by Marcelo.
>>>
>>> Since v1, I've also added a "m" constraint for the inputs to the hypercall.
>>> This was suggested by Christian since it's not entirely clear whether a memory
>>> clobber will force the data to be in memory before the asm statement.  In the
>>> very least, it helps to be more conservative.
>>>
>>> Signed-off-by: Anthony Liguori <[EMAIL PROTECTED]>
>>>
>>> @@ -80,7 +81,9 @@ static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
>>> long ret;
>>> asm volatile(KVM_HYPERCALL
>>>  : "=a"(ret)
>>> -: "a"(nr), "b"(p1));
>>> +: "a"(nr), "b"(p1),
>>> +  "m"(*(char *)p1)
>>> +: "memory");
>>> return ret;
>>>  }
>>
>> Those are physical addresses, not virtual, and on i386 the addresses are
>> split across multiple registers.
>>
>> However a small test program shows that the memory clobber does work
>> with gcc 4.3, so I'll pick the earlier patch.
>
> What about gcc 4.4? 4.5? 5.0?


Alexandre Oliva's idea of adding a constraint in __pa() to tell gcc that 
the memory there must be written seems to be the best option here.  
Though perhaps gcc should consider all memory pointed to by a pointer 
that is cast to an integer as escaped.
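
For readers following along, the pattern under discussion looks roughly like
the sketch below. It is an illustration only, assuming GCC-style extended
asm: the asm body is left empty so it builds on any architecture (a real
guest would emit KVM_HYPERCALL there), and the sketch_* names are
hypothetical. The "memory" clobber tells the compiler that the asm may read
or write memory not named in its operands, so it should not keep such values
only in registers across the statement. Whether that is guaranteed for a
buffer whose address reaches the asm only as an integer is exactly the
question raised in this thread, so take this as showing the idiom, not as
proof that it is sufficient.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a one-argument hypercall. The argument is an address passed
 * as a plain integer, so the operands alone do not tell the compiler that
 * the memory behind it is accessed. */
static inline long sketch_hypercall1(unsigned int nr, uintptr_t p1)
{
    long ret = (long)nr;

    __asm__ __volatile__(""          /* real code: KVM_HYPERCALL */
                         : "+r"(ret)
                         : "r"(p1)
                         : "memory"); /* asm may read/write arbitrary memory */
    return ret;
}

/* Fill a buffer and hand its address to the "hypercall". */
static long sketch_send(unsigned int nr, int value)
{
    int buf[1];
    long ret;

    buf[0] = value;                  /* store the hypervisor would read */
    ret = sketch_hypercall1(nr, (uintptr_t)buf);
    assert(buf[0] == value);         /* reloaded after the asm; unchanged here */
    return ret;
}
```

Without the clobber, the compiler would be free to treat the store to buf as
dead, since nothing it can see reads the buffer before the function returns.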



--
error compiling committee.c: too many arguments to function


Re: kvm_mmu doubts

2008-06-30 Thread Sukanto Ghosh
> Sukanto Ghosh wrote:
>> What do these refer to ?
>> i) kvm_rmap_desc
>
> It's a reverse mapping listing all shadow ptes pointing to a given guest
> page.
>
Then what is the rmap field of the 'struct kvm_memory_slot' ? Is it the
list of kvm_rmap_desc (one list entry for each guest page in that memory
slot) ?


>> iii) kvm_mmu_page-> spt ??? ( i thought kvm_mmu_page itself refers to page
>> of shadow PT, then what does spt points to ? )
>
> kvm_mmu_page contains information about the guest page table and the
> host shadow page table.  spt is the host shadow page table.
>

I got more confused now.
I think it is due to terminology. I am novice here and I try to relate
everything to the OS textbooks.

I am calling the entire tree-like structure (including the page
directories) as a page table. In the above statement are you referring to
the same ? Or is it the last-level table that holds translated physical
addresses (+ dirty  bit, etc ) ?

What about the PGD, PMDs ?

Also, can you explain a line about each of these fields of the
kvm_mmu_page:

i) link (LRU link of what ?)

ii) gfn (guest frame number of the guest page table ?)

iii) parent_pte (in a multi-level page table structure, the PTE in a page
directory that holds the base address of the page table)

iv) root_count ( comment says 'currently serving as an active root; .is
root = PGD? )



Thanks and regards,
Sukanto


Re: kvm_mmu doubts

2008-06-30 Thread Avi Kivity

Sukanto Ghosh wrote:
>> Sukanto Ghosh wrote:
>>> What do these refer to ?
>>> i) kvm_rmap_desc
>>
>> It's a reverse mapping listing all shadow ptes pointing to a given guest
>> page.
>
> Then what is the rmap field of the 'struct kvm_memory_slot' ? Is it the
> list of kvm_rmap_desc (one list entry for each guest page in that memory
> slot) ?

The head of this list.

>>> iii) kvm_mmu_page-> spt ??? ( i thought kvm_mmu_page itself refers to page
>>> of shadow PT, then what does spt points to ? )
>>
>> kvm_mmu_page contains information about the guest page table and the
>> host shadow page table.  spt is the host shadow page table.
>
> I got more confused now.
> I think it is due to terminology. I am novice here and I try to relate
> everything to the OS textbooks.

Well, that won't work as I haven't read those textbooks.

> I am calling the entire tree-like structure (including the page
> directories) as a page table. In the above statement are you referring to
> the same ? Or is it the last-level table that holds translated physical
> addresses (+ dirty  bit, etc ) ?

No, any guest page that is part of the structure. Note the structure is
not a tree, since multiple roots exist and as it may be cyclic.

> What about the PGD, PMDs ?

We try not to use Linux specific names while describing guests.

> Also, can you explain a line about each of these fields of the
> kvm_mmu_page:
>
> i) link (LRU link of what ?)

Yes, the lru link.

> ii) gfn (guest frame number of the guest page table ?)

Yes.

> iii) parent_pte (in a multi-level page table structure, the PTE in a page
> directory that holds the base address of the page table)

Yes (or a list of those pte pointers).

> iv) root_count ( comment says 'currently serving as an active root; .is
> root = PGD? )

Yes.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


[2.6 patch] make kvm_smp_prepare_boot_cpu() static

2008-06-30 Thread Adrian Bunk
This patch makes the needlessly global kvm_smp_prepare_boot_cpu() static.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
a03ee2a21c4e40483712d453a4f803980186c59f 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 87edf1c..d02def0 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -113,7 +113,7 @@ static void kvm_setup_secondary_clock(void)
 #endif
 
 #ifdef CONFIG_SMP
-void __init kvm_smp_prepare_boot_cpu(void)
+static void __init kvm_smp_prepare_boot_cpu(void)
 {
 	WARN_ON(kvm_register_clock("primary cpu clock"));
 	native_smp_prepare_boot_cpu();


[PATCH] Fix block mode during halt emulation

2008-06-30 Thread Dor Laor
>From d85feaae019bc0abc98a2524369e04d521a78aa8 Mon Sep 17 00:00:00 2001
From: Dor Laor <[EMAIL PROTECTED]>
Date: Mon, 30 Jun 2008 18:22:44 -0400
Subject: [PATCH] Fix block mode during halt emulation

There is no need to check for a pending pit/apic timer, nor a
pending virq, since all of them set KVM_MP_STATE_RUNNABLE
and wake up the waitqueue.

It fixes 100% cpu usage when a windows guest is shut down (non acpi HAL)

Signed-off-by: Dor Laor <[EMAIL PROTECTED]>
---
 virt/kvm/kvm_main.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b90da0b..faa0778 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -816,10 +816,6 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	for (;;) {
 		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);

-		if (kvm_cpu_has_interrupt(vcpu))
-			break;
-		if (kvm_cpu_has_pending_timer(vcpu))
-			break;
 		if (kvm_arch_vcpu_runnable(vcpu))
 			break;
 		if (signal_pending(current))
-- 
1.5.4



Re: another kvm-70 compile bug with rhel/centos 5.2

2008-06-30 Thread Erik Bussink
Thanks, Avi, for the rhel-5.2.patch. I was able to apply it successfully
to kvm-70. I had to make one more small modification for RHEL 5.2.

In kernel/external-module-compat.h, at line 670 (as per the emails from
Andrea Arcangeli on June 18th), I had to change
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19) && defined(CONFIG_KALLSYMS)

to

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,17) && defined(CONFIG_KALLSYMS)
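
Why the threshold matters can be checked with a few lines of user-space C.
This is a sketch under assumptions: TOY_KERNEL_VERSION mirrors the encoding
of the kernel's KERNEL_VERSION macro, the helper names are made up for
illustration, and the CONFIG_KALLSYMS half of the condition is left out.
What the guarded block actually contains is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Same encoding as the kernel's KERNEL_VERSION macro. */
#define TOY_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical helper: encode a version, rejecting out-of-range parts. */
static int toy_version(int major, int minor, int patch)
{
    assert(minor >= 0 && minor < 256 && patch >= 0 && patch < 256);
    return TOY_KERNEL_VERSION(major, minor, patch);
}

/* True when a guard of the form "running < threshold" is compiled in. */
static bool toy_compat_block_enabled(int running, int threshold)
{
    return running < threshold;
}
```

With the 2.6.18-based RHEL 5.2 kernel, toy_compat_block_enabled() is true
for a 2.6.19 threshold and false for a 2.6.17 threshold, so the change
above stops the guarded compat code from being compiled on that kernel.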

So now I can successfully build KVM-70 on RHEL 5.2 for AMD-V and load
the modules into the kernel (2.6.18-92.1.6.el5):

[EMAIL PROTECTED] ~]# lsmod | grep kvm
[EMAIL PROTECTED] ~]# modprobe kvm
[EMAIL PROTECTED] ~]# modprobe kvm_amd
[EMAIL PROTECTED] ~]# lsmod | grep kvm
kvm_amd                62888  0
kvm                   178416  1 kvm_amd
[EMAIL PROTECTED] ~]# 

The only thing halting my progress right now is that I cannot find the
qemu-kvm helper program. It doesn't seem to have been compiled and
installed; instead, I now find qemu-system-x86_64.

[EMAIL PROTECTED] ~]# which qemu-kvm
/usr/bin/which: no qemu-kvm in (/usr/lib64/qt-3.3/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/lib64/ccache/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)

[EMAIL PROTECTED] ~]# which qemu-nbd
/usr/local/bin/qemu-nbd

[EMAIL PROTECTED] ~]# ll /usr/local/bin/qem*
-rwxr-xr-x 1 root root  128616 Jul  1 02:54 /usr/local/bin/qemu-img
-rwxr-xr-x 1 root root  130808 Jul  1 02:54 /usr/local/bin/qemu-nbd
-rwxr-xr-x 1 root root 6740783 Jul  1 02:54 /usr/local/bin/qemu-system-x86_64

I'm able to start a virtual machine with qemu-system-x86_64. Should I
stop looking for qemu-kvm?

Thanks for any pointers.
Erik Bussink




RE: [PATCH] KVM: PCIPT: VT-d: fix context mapping

2008-06-30 Thread Han, Weidong
Ben-Ami Yassour wrote:
> On Fri, 2008-06-20 at 14:23 +0800, Han, Weidong wrote:
>> Ben-Ami Yassour1 wrote:
>>> "Han, Weidong" <[EMAIL PROTECTED]> wrote on 19/06/2008 17:18:00:
>>> 
>>>> Ben-Ami Yassour wrote:
>>>>> On Thu, 2008-06-19 at 16:59 +0800, Han, Weidong wrote:
>>>>>> [EMAIL PROTECTED] wrote:
>>>>>>> From: Ben-Ami Yassour <[EMAIL PROTECTED]>
>>>>>>> 
>>>>>>> When changing the VT-d context mapping, according to the spec,
>>>>>>> it is required to first set the context to not present, flush
>>>>>>> and only then apply the new context.
>>>>>>> 
>>>>>>> Signed-off-by: Ben-Ami Yassour <[EMAIL PROTECTED]>
>>>>>>> ---
>>>>>>>  drivers/pci/intel-iommu.c |   17 +
>>>>>>>  1 files changed, 17 insertions(+), 0 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/drivers/pci/intel-iommu.c
>>>>>>> b/drivers/pci/intel-iommu.c index 930874f..dcdfa97 100644 ---
>>>>>>> a/drivers/pci/intel-iommu.c +++ b/drivers/pci/intel-iommu.c
>>>>>>> @@ -56,6 +56,7 @@
>>>>>>> 
>>>>>>> 
>>>>>>>  static void flush_unmaps_timeout(unsigned long data);
>>>>>>> +static void detach_domain_for_dev(struct dmar_domain *domain,
>>>>>>> u8 bus, u8 devfn); 
>>>>>>> 
>>>>>>>  DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
>>>>>>> 
>>>>>>> @@ -1264,7 +1265,23 @@ static int
>>>>>>> domain_context_mapping_one(struct dmar_domain *domain, if
>>>>>>>(!context) return -ENOMEM;
>>>>>>> spin_lock_irqsave(&iommu->lock, flags);
>>>>>>> +
>>>>>>> +   if (context_present(*context) &&
>>>>>>> +   (context_domain_id(*context) == domain->id) &&
>>>>>>> +   (context_address_width(*context) == domain->agaw) &&
>>>>>>> +   (context_address_root(*context) ==
>>>>>>> virt_to_phys(domain->pgd)) && +
>>>>>>> (context_translation_type(*context) == CONTEXT_TT_MULTI_LEVEL)
>>>>>>> && + (!context_fault_disable(*context))) {
>>>>>>> +  spin_unlock_irqrestore(&iommu->lock, flags); + 
>>>>>>> return 0; +   }
>>>>>> 
>>>>>> Only need to check context_present(*context) according to VT-d
>>>>>> spec, which says "software must not modify fields other than the
>>>>>> Present (P) field of currently present root-entries or
>>>>>> context-entries." 
>>>>>> 
>>>>>> Randy (Weidong)
>>>>> 
>>>>> The logic here is that, if no change is made to the context then
>>>>> just return ok (0). Otherwise, according to the spec, we need to
>>>>> first invalidate the context, flush it, and only then apply the
>>>>> changes to the context. 
>>>>> 
>>>>> The other option is to make sure that before every call to this
>>>>> function the context is invalidated, but disabling it inside the
>>>>> function seems safer. do you agree with that?
>>>>> 
>>>> 
>>>> After a device can be assigned to guest with VT-d, it needs a
>>>> context unmap function. When a device is assigned to a guest, map
>>>> context for it, while when it is detached from a guest, should
>>>> unmap its context. With the context unmap function, I think we
>>>> needn't to implement your logic in domain_context_mapping_one(),
>>>> instead just check its present. What's your opinion?
>>> 
>>> Sure, that's fine, this is the other option I mentioned.
>>> But we need to add the context unmap function.
>>> Something like:
>>> diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
>>> index be775cd..4b96fbb 100644
>>> --- a/arch/x86/kvm/vtd.c
>>> +++ b/arch/x86/kvm/vtd.c
>>> @@ -109,11 +109,17 @@ found:
>>> kvm_iommu_unmap_memslots(kvm);
>>> return -EFAULT;
>>> }
>>> +
>>> +   kvm_intel_iommu_detach_dev(kvm->arch.domain,
>>> +  pdev->bus->number, pdev->devfn);
>>> + kvm_intel_iommu_context_mapping(kvm->arch.domain, pdev); 
>>>  return 0; }
>>>  EXPORT_SYMBOL_GPL(kvm_iommu_map_guest);
>>> 
>>> Agree?
>>> 
>> 
>> I don't think it's necessary to add kvm_intel_iommu_detach_dev()
>> here. If a device can be assigned to a guest, it should not be used
>> by other guest (assuming no hotplug support). And also
>> kvm_intel_iommu_detach_dev() is already called in
>> kvm_iommu_unmap_guest(). Normally context won't be present here.
>> Otherwise, there should be some wrong. I attach a patch to change it
>> back to original kernel VT-d code. I think it's correct and clean.
> 
> The problem is with the following scenario:
> 1. load the host NIC driver
> 2. unload the host NIC driver
> 3. start kvm with passthrough for that NIC
> 
> In this case the context is not cleaned, and we get VT-d failures.
> I agree with changing the VT-d driver code back to original.
> But we do need:
> 
> diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
> index be775cd..4b96fbb 100644
> --- a/arch/x86/kvm/vtd.c
> +++ b/arch/x86/kvm/vtd.c
> @@ -109,11 +109,17 @@ found:
> kvm_iommu_unmap_memslots(kvm);
> return -EFAULT;
> }
> +
> +   kvm_intel_iommu_detach_dev(kvm->arch.domain,
> +  pdev->bus->number, pdev->devfn);
> +
> kvm_intel_iommu_context_mapping(kvm->arch.
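
The ordering discussed in this thread (return early if nothing changes;
otherwise clear the present bit, flush, and only then write the new
context) can be sketched with a toy model. Everything below is an
assumption for illustration: the toy_* names, the struct layout, and the
flush counter are hypothetical and do not reflect the real VT-d
context-entry format or the intel-iommu driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy context entry; field names are illustrative, not the VT-d layout. */
struct toy_context {
    bool     present;
    uint16_t domain_id;
    uint64_t root;              /* address of the page-table root */
};

struct toy_iommu {
    struct toy_context entry;
    int flushes;                /* counts simulated context-cache flushes */
};

static void toy_flush(struct toy_iommu *iommu)
{
    iommu->flushes++;
}

/* Tear down a mapping: only the present bit is touched, then flush. */
static void toy_detach(struct toy_iommu *iommu)
{
    iommu->entry.present = false;
    toy_flush(iommu);
}

/* Install a mapping. Returns 0 on success. */
static int toy_map(struct toy_iommu *iommu, uint16_t domain_id, uint64_t root)
{
    struct toy_context *c = &iommu->entry;

    /* Nothing to do if the requested mapping is already in place. */
    if (c->present && c->domain_id == domain_id && c->root == root)
        return 0;

    /* A different mapping is live: clear it and flush before rewriting. */
    if (c->present)
        toy_detach(iommu);

    assert(!c->present);        /* other fields change only while not present */
    c->domain_id = domain_id;
    c->root = root;
    c->present = true;
    toy_flush(iommu);           /* assumption: flush again after installing */
    return 0;
}
```

Calling toy_detach() before toy_map() corresponds to the detach-then-map
order proposed for kvm_iommu_map_guest(); folding the detach into
toy_map() corresponds to handling it inside the mapping function.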

Re: kvm_mmu doubts

2008-06-30 Thread Sukanto Ghosh
> Sukanto Ghosh wrote:
>>
>> I am calling the entire tree-like structure (including the page
>> directories) a page table. In the above statement, are you referring
>> to the same thing? Or is it the last-level table that holds the
>> translated physical addresses (+ dirty bit, etc.)?
>>
>
> No, any guest page that is part of the structure. Note the structure is
> not a tree, since multiple roots exist and as it may be cyclic.
>

I understand that it is not a tree and that, in the case of sharing and
aliasing, multiple roots will exist. But how can it be cyclic? (Are you
ignoring directionality, or is it because the nodes in this structure
have pointers to their parents, or is there something else as well?)
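
One way such a structure can be cyclic even with directed edges is a
self-referencing entry, i.e. a table page with an entry that points back
at that same page (a recursive mapping). This may or may not be the case
meant above. The sketch below models table pages as a small directed
graph and looks for a cycle with a depth-first search; all toy_* names
and sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGES   4           /* number of guest pages used as table pages */
#define TOY_ENTRIES 2           /* entries per toy table page */
#define TOY_NONE    (-1)        /* entry not present */

/* entries[p][i] is the index of the page that entry i of page p points to. */
struct toy_tables {
    int entries[TOY_PAGES][TOY_ENTRIES];
};

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool toy_visit(const struct toy_tables *t, int page, int *state)
{
    assert(page >= 0 && page < TOY_PAGES);
    if (state[page] == 1)
        return true;            /* reached a page already on the path: cycle */
    if (state[page] == 2)
        return false;
    state[page] = 1;
    for (int i = 0; i < TOY_ENTRIES; i++) {
        int next = t->entries[page][i];
        if (next != TOY_NONE && toy_visit(t, next, state))
            return true;
    }
    state[page] = 2;
    return false;
}

/* True if following entries from root leads back to a page on the path. */
static bool toy_has_cycle(const struct toy_tables *t, int root)
{
    int state[TOY_PAGES] = {0};
    return toy_visit(t, root, state);
}
```

A layout where two directories share one lower-level page is not a tree
but has no cycle in this model; a root whose first entry points at the
root itself does.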


Thanks and Regards,
Sukanto Ghosh