date:20120214

[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-14 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #29 from Rosen   2012-02-15 07:49:35 ---
(In reply to comment #28)
> (In reply to comment #27)
> > and there soon will be video capture with 'perf top'
> > 
> > http://vbox7.com/play:199e9ede30
> 
> Run it while the guest is also running.

Good morning!
The video will be available at http://vbox7.com/play:7128f03f1f in a few moments.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

buildbot failure in kvm on i386

2012-02-14 Thread kvm

The Buildbot has detected a new failure on builder i386 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/i386/builds/454

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_master' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot

Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Kevin O'Connor

On Tue, Feb 14, 2012 at 04:13:42PM +0400, Cyrill Gorcunov wrote:
> On Tue, Feb 14, 2012 at 01:10:59PM +0200, Pekka Enberg wrote:
> > On Tue, Feb 14, 2012 at 1:03 PM, Yang Bai  wrote:
> > > Since on X86, bios is always at the end of the address space, so I
> > > have some thought about how to implement the seabios support for kvm
> > > tool.
> > >
> > > 1. using kvm__register_mem to map the end of address space to the
> > > guest then copy the code of seabios to this mem region. Just emulating
> > > the bios chip.
> 
> I think this is what should be done.
> 
> > >
> > > 2. leave the bios code alone and don't touch the guest's address
> > > space. If the guest accesses the address belonging to the bios, it
> > > will be an IO request and we can emulate the IO access to the bios
> > > chip.
> > >
> > > Any ideas about this?
> > 
> > The latter solution doesn't make any sense to me. Cyrill, do we really
> > need to put the BIOS at the end of the address space? Don't we have
> > unused space below 1 MB?
> 
> I don't remember for sure how SeaBIOS works actually. What I remember
> is that it acquires all hw environment might have. So without real look
> into seabios code I fear I can't answer. But reserving end of 4G address
> space for bios copy sounds reasonable if we going to behave as real
> hardware. Maybe we could poke someone from KVM camp for a hint?

SeaBIOS has two ways to be deployed - first is to copy the image to
the top of the first 1MB (eg, 0xe-0xf) and jump to
0xf000:0xfff0 in 16bit mode.  The second way is to use the SeaBIOS elf
and deploy into memory (according to the elf memory map) and jump to
SeaBIOS in 32bit mode (according to the elf entry point).
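The address arithmetic behind the first deployment method can be sketched as follows. This is only an illustration: the function names are hypothetical, and the image size is whatever the caller passes in (nothing here is taken from the SeaBIOS or kvm tool sources). It relies on the usual real-mode rule that a segment:offset pair maps to segment * 16 + offset.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/*
 * Load address that makes an image of image_size bytes end exactly at the
 * 1 MB boundary. The image size is an assumption supplied by the caller.
 */
uint32_t bios_load_addr(uint32_t image_size)
{
    const uint32_t one_mb = 0x100000;

    assert(image_size > 0 && image_size <= one_mb);
    return one_mb - image_size;
}
```

With these helpers, real_mode_linear(0xf000, 0xfff0) gives the entry point 16 bytes below 1 MB, and bios_load_addr() gives the guest-physical address to copy the image to for a given image size.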

SeaBIOS doesn't really need to be in the top 4G of ram.  SeaBIOS does
expect to have normal PC hardware devices (eg, a PIC), though many
hardware devices can be compiled out via its kconfig interface.  The
more interesting challenge will likely be in communicating critical
pieces of information (eg, total memory size) into SeaBIOS.

The SeaBIOS mailing list (seab...@seabios.org) is probably a better
location for technical seabios questions.

-Kevin

Re: AESNI and guest hosts

2012-02-14 Thread Ryan Brown

Thanks for the reply. I was thinking AESNI was supported in the same way
SSE/MMX and other CPU flags are supported? Is this a QEMU or a KVM issue?
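One way to check a guest for the feature is to look for the "aes" token (the name Linux uses for AES-NI) in the flags line of /proc/cpuinfo. The helper below is a minimal sketch with a hypothetical name; it only does whole-token matching on a string the caller has already read, and it says nothing about how to make QEMU or KVM expose the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `flag` appears as a whole whitespace-separated token in
 * `flags`, the text of a /proc/cpuinfo "flags" line.
 */
bool cpu_flags_contain(const char *flags, const char *flag)
{
    size_t len;
    const char *p;

    assert(flags != NULL && flag != NULL);
    len = strlen(flag);
    if (len == 0)
        return false;

    for (p = flags; (p = strstr(p, flag)) != NULL; p += len) {
        bool starts_token = (p == flags) || p[-1] == ' ' ||
                            p[-1] == '\t' || p[-1] == '\n';
        bool ends_token = p[len] == '\0' || p[len] == ' ' ||
                          p[len] == '\t' || p[len] == '\n';

        if (starts_token && ends_token)
            return true;
    }
    return false;
}
```

Applied to the guest flags quoted in this thread, the check for "aes" fails while "sse2" succeeds, which matches the symptom being reported.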

On Wed, Feb 15, 2012 at 7:18 AM, Brian Jackson  wrote:
> On Tuesday, February 14, 2012 03:31:10 AM Ryan Brown wrote:
>> Sorry for being a noob here, Any clues with this?, anyone ...
>>
>> On Mon, Feb 13, 2012 at 2:05 AM, Ryan Brown  wrote:
>> > Host/KVM server is running linux 3.2.4 (Debian wheezy), and guest
>> > kernel is running 3.2.5. The cpu is an E3-1230, but for some reason
>> > its not able to supply the guest with aesni. Is there a config option
>> > or is there something we're missing?
>
>
>
> I don't think it's supported to pass that functionality to the guest.
>
>
>
>> >
>> >    
>> >     x86_64
>> >     Westmere
>> >     Intel
>> >     
>> >     
>> >     
>> >     
>> >     
>> >     
>> >     
>> >     
>> >     
>> >     
>> >     
>> >     
>> >     
>> >     
>> >     
>> >    
>> >  
>> >
>> > Guest:
>> > [root@fanboy:~]# cat /proc/cpuinfo
>> > processor       : 0
>> > vendor_id       : GenuineIntel
>> > cpu family      : 6
>> > model           : 2
>> > model name      : QEMU Virtual CPU version 1.0
>> > stepping        : 3
>> > microcode       : 0x1
>> > cpu MHz         : 3192.748
>> > cache size      : 4096 KB
>> > fdiv_bug        : no
>> > hlt_bug         : no
>> > f00f_bug        : no
>> > coma_bug        : no
>> > fpu             : yes
>> > fpu_exception   : yes
>> > cpuid level     : 4
>> > wp              : yes
>> > flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
>> > cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
>> > hypervisor lahf_lm
>> > bogomips        : 6385.49
>> > clflush size    : 64
>> > cache_alignment : 64
>> > address sizes   : 40 bits physical, 48 bits virtual
>> > power management:
>> >
>> > processor       : 1
>> > vendor_id       : GenuineIntel
>> > cpu family      : 6
>> > model           : 2
>> > model name      : QEMU Virtual CPU version 1.0
>> > stepping        : 3
>> > microcode       : 0x1
>> > cpu MHz         : 3192.748
>> > cache size      : 4096 KB
>> > fdiv_bug        : no
>> > hlt_bug         : no
>> > f00f_bug        : no
>> > coma_bug        : no
>> > fpu             : yes
>> > fpu_exception   : yes
>> > cpuid level     : 4
>> > wp              : yes
>> > flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
>> > cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
>> > hypervisor lahf_lm
>> > bogomips        : 6385.49
>> > clflush size    : 64
>> > cache_alignment : 64
>> > address sizes   : 40 bits physical, 48 bits virtual
>>
>> > power management:

Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-02-14 Thread Eric B Munson

On Tue, 14 Feb 2012, Marcelo Tosatti wrote:

> On Tue, Feb 14, 2012 at 10:50:13AM -0500, Eric B Munson wrote:
> > On Tue, 14 Feb 2012, Marcelo Tosatti wrote:
> > 
> > > On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
> > > > On Wed, 08 Feb 2012, Eric B Munson wrote:
> > > > 
> > > > > 
> > > > > When a guest kernel is stopped by the host hypervisor it can look like a soft
> > > > > lockup to the guest kernel.  This false warning can mask later soft lockup
> > > > > warnings which may be real.  This patch series adds a method for a host
> > > > > hypervisor to communicate to a guest kernel that it is being stopped.  The
> > > > > final patch in the series has the watchdog check this flag when it goes to
> > > > > issue a soft lockup warning and skip the warning if the guest knows it was
> > > > > stopped.
> > > > > 
> > > > > It was attempted to solve this in Qemu, but the side effects of saving and
> > > > > restoring the clock and tsc for each vcpu put the wall clock of the guest behind
> > > > > by the amount of time of the pause.  This forces a guest to have ntp running
> > > > > in order to keep the wall clock accurate.
> > > > 
> > > > Avi,
> > > > 
> > > > Is this set fit for merging or is there something else you want changed?
> > > 
> > > Eric,
> > > 
> > > On Message-ID: <20120210160536.ga23...@amt.cnet>, i asked:
> > > 
> > > How is the stub getting included for other architectures again?
> > > 
> > 
> > Marcelo,
> > 
> > Sorry, I put out V13 to answer that.  There is a stub in asm-generic that was
> > lost in the V11-V12 rebase.  This stub has been included in the V13 set.
> > 
> > Eric
> 
> Eric, 
> 
> I know the stub has been included in the series. But i am asking how 
> it is #include'ed for other architectures? (can't see that).

Marcelo,

kernel/watchdog.c now includes linux/kvm_para.h which includes asm/kvm_para.h.
The check_and_clear function is defined in arch include/asm/kvm_para.h or in
asm-generic/kvm_para.h for any arch lacking the specific header in their asm
include dir.  If I have misunderstood how these headers work, please let me
know and I will fix it.
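The header arrangement described above can be sketched in a single file. This is an illustration only: HYPOTHETICAL_ARCH_HAS_PAUSE_FLAG stands in for "this arch provides its own asm/kvm_para.h", and the function names and the threshold parameter are assumptions, not the identifiers used in the patch series.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * HYPOTHETICAL_ARCH_HAS_PAUSE_FLAG stands in for "this arch ships its own
 * asm/kvm_para.h"; it is not a real kernel config symbol.
 */
#ifdef HYPOTHETICAL_ARCH_HAS_PAUSE_FLAG
static bool guest_paused;   /* would be set on behalf of the host */

static inline bool check_and_clear_guest_paused(void)
{
    bool was_paused = guest_paused;

    guest_paused = false;
    return was_paused;
}
#else
/* Generic stub: an arch without support never reports a pause. */
static inline bool check_and_clear_guest_paused(void)
{
    return false;
}
#endif

/* Watchdog-side decision: warn only if the stall was not a host pause. */
bool should_warn_soft_lockup(unsigned long stalled_secs,
                             unsigned long thresh_secs)
{
    assert(thresh_secs > 0);
    if (stalled_secs < thresh_secs)
        return false;
    if (check_and_clear_guest_paused())
        return false;
    return true;
}
```

Because the caller only ever sees check_and_clear_guest_paused(), the watchdog code needs no per-architecture #ifdef; the choice between the real function and the stub is made by whichever header gets included.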

Eric



Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Marcelo Tosatti

On Tue, Feb 14, 2012 at 07:53:56PM +0100, Andrea Arcangeli wrote:
> On Tue, Feb 14, 2012 at 03:29:47PM -0200, Marcelo Tosatti wrote:
> > The problem the patch is fixing is not related to page freeing, but
> > rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d
> 
> Can't find the commit on kvm.git.

Sorry, we got kvm.git out of sync. But you can see an equivalent below.

> 
> > (replace "A (get_dirty_log)" with "mmu_notifier_invalidate_page"):
> > 
> > 
> > During protecting pages for dirty logging, other threads may also try
> > to protect a page in mmu_sync_children() or kvm_mmu_get_page().
> > 
> > In such a case, because get_dirty_log releases mmu_lock before flushing
> > TLB's, the following race condition can happen:
> > 
> >   A (get_dirty_log)            B (another thread)
> > 
> >   lock(mmu_lock)
> >   clear pte.w
> >   unlock(mmu_lock)
> >                                lock(mmu_lock)
> >                                pte.w is already cleared
> >                                unlock(mmu_lock)
> >                                skip TLB flush
> 
> Not sure which tree it is, but in kvm and upstream I see an
> unconditional tlb flush here, not skip (both
> kvm_mmu_slot_remove_write_access and kvm_mmu_rmap_write_protect). So I
> assume this has been updated in your tree to be conditional.

	if (!direct) {
		if (rmap_write_protect(vcpu->kvm, gfn))
			kvm_flush_remote_tlbs(vcpu->kvm);


> Also note kvm_mmu_rmap_write_protect, flushes outside of the mmu_lock
> in the kvm_mmu_rmap_write_protect case (like in quoted description),
> so two write_protect_slot in parallel against each other may not be
> ok, but that may be enforced by design if qemu won't ever call that
> ioctl from two different userland threads (it doesn't sound security
> related so it should be ok to enforce its safety by userland design).

Yes, here is the fix:

http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=02b48d00d7f1853bdf8a06da19ca5413ebe334c6

This is an equivalent of 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d.

> 
> >                                return
> >   ...
> >   TLB flush
> > 
> > Though thread B assumes the page has already been protected when it
> > returns, the remaining TLB entry will break that assumption.
> 
> Now I get the question of why not running the TLB flush inside the
> mmu_lock only if the spte was writable :).
> 
> kvm_mmu_get_page as long as it only runs in the context of a kvm page
> fault is ok, because the page fault would be inhibited by the mmu
> notifier invalidates, so maybe it's safe.

Ah, perhaps, but this was not taken into account before. Can you confirm
this is the case so we can revert the invalidate_page patch?

> mmu_sync_children seems to have a problem instead, in your tree
> get_dirty_log also has an issue if it has been updated to skip the
> flush on readonly sptes, like I guess.
> 
> Interesting how the spte is already non present, the page is just
> being freed shortly later, but yet we still need to trigger write
> faults synchronously and prevent other CPUs in guest mode to further
> modify the page to avoid losing dirty bits updates or updates on
> pagetables that maps pagetables in the not NPT/EPT case. If the page
> was really only going to be freed it would be ok if the other cpus
> would still write to it for a little longer until the page was freed.
> 
> Like I wrote in previous email, I was thinking if we'd change the mmu
> notifier methods to do an unconditional flush, then every other flush
> could also run outside of the mmu_lock. But then I didn't think enough
> about this to be sure. My guess is we could move all flushes outside
> the mmu_lock if we stop flushing the tlb conditionally to the current
> spte values. It'd clearly be slower for an UP guest though :). Large
> SMP guests might benefit, if that is feasible at all... It depends how
> problematic the mmu_lock is on the large SMP guests and how much we're
> saving by doing conditional TLB flushes.

Also it should not be necessary for these flushes to be inside mmu_lock
on EPT/NPT case (since there is no write protection there). But it would
be awkward to differentiate the unlock position based on EPT/NPT.


Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-14 Thread John Fastabend

On 2/14/2012 11:05 AM, Stephen Hemminger wrote:
> On Tue, 14 Feb 2012 10:57:04 -0800
> John Fastabend  wrote:
> 
>> On 2/14/2012 5:18 AM, jamal wrote:
>>> On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote:
>>>
>>>> The use case here is multiple VFs but the same solution should work with
>>>> multiple PFs as well. FDB controls should be independent of how the ports
>>>> are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc.
>>>
>>> Makes sense.
>>>
>>>> With events and ADD/DEL/GET FDB controls we can solve both cases. This also
>>>> solves Roopa's case with macvlan where she wants to add additional addresses
>>>> to macvlan ports.
>>>
>>> Not familiar with that issue - I'll prowl the list.
>>
>> Roopa was likely on the right track here,
>>
>> http://patchwork.ozlabs.org/patch/123064/
>>
>> But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX
>> netlink messages. And if possible drive this without extending ndo_ops.
>>
>> An ideal user space interaction IMHO would look like,
>>
>> [root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10
>> [root@jf-dev1-dcblab iproute2]# ./br/br fdb
>> port    mac addr            flags
>> veth2   36:a6:35:9b:96:c4   local
>> veth4   aa:54:b0:7b:42:ef   local
>> veth0   2a:e8:5c:95:6c:1b   local
>> veth6   6e:26:d5:43:a3:36   local
>> veth0   f2:c1:39:76:6a:fb
>> veth8   4e:35:16:af:87:13   local
>> veth10  52:e5:62:7b:57:88   static
>> veth10  aa:a9:35:21:15:c4   local
>> [root@jf-dev1-dcblab iproute2]# ./br/br fdb add dev eth3 to 52:e5:62:7b:57:88
>> RTNETLINK answers: Invalid argument
> 
> I am going to put bridge (nameclash with br) tool into iproute2 (soon).

I've been using it on my dev box for awhile now and it works well for
me.

Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-14 Thread Stephen Hemminger

On Tue, 14 Feb 2012 10:57:04 -0800
John Fastabend  wrote:

> On 2/14/2012 5:18 AM, jamal wrote:
> > On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote:
> > 
> >> The use case here is multiple VFs but the same solution should work with
> >> multiple PFs as well. FDB controls should be independent of how the ports
> >> are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc.
> > 
> > Makes sense.
> > 
> >> With events and ADD/DEL/GET FDB controls we can solve both cases. This also
> >> solves Roopa's case with macvlan where she wants to add additional addresses
> >> to macvlan ports.
> > 
> > Not familiar with that issue - I'll prowl the list.
> 
> Roopa was likely on the right track here,
> 
> http://patchwork.ozlabs.org/patch/123064/
> 
> But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX
> netlink messages. And if possible drive this without extending ndo_ops.
> 
> An ideal user space interaction IMHO would look like,
> 
> [root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10
> [root@jf-dev1-dcblab iproute2]# ./br/br fdb
> port    mac addr            flags
> veth2   36:a6:35:9b:96:c4   local
> veth4   aa:54:b0:7b:42:ef   local
> veth0   2a:e8:5c:95:6c:1b   local
> veth6   6e:26:d5:43:a3:36   local
> veth0   f2:c1:39:76:6a:fb
> veth8   4e:35:16:af:87:13   local
> veth10  52:e5:62:7b:57:88   static
> veth10  aa:a9:35:21:15:c4   local
> [root@jf-dev1-dcblab iproute2]# ./br/br fdb add dev eth3 to 52:e5:62:7b:57:88
> RTNETLINK answers: Invalid argument

I am going to put bridge (nameclash with br) tool into iproute2 (soon).

Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-14 Thread John Fastabend

On 2/14/2012 5:18 AM, jamal wrote:
> On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote:
> 
>> The use case here is multiple VFs but the same solution should work with
>> multiple PFs as well. FDB controls should be independent of how the ports
>> are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc.
> 
> Makes sense.
> 
>> With events and ADD/DEL/GET FDB controls we can solve both cases. This also
>> solves Roopa's case with macvlan where she wants to add additional addresses
>> to macvlan ports.
> 
> Not familiar with that issue - I'll prowl the list.

Roopa was likely on the right track here,

http://patchwork.ozlabs.org/patch/123064/

But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX
netlink messages. And if possible drive this without extending ndo_ops.

An ideal user space interaction IMHO would look like,

[root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10
[root@jf-dev1-dcblab iproute2]# ./br/br fdb
port    mac addr            flags
veth2   36:a6:35:9b:96:c4   local
veth4   aa:54:b0:7b:42:ef   local
veth0   2a:e8:5c:95:6c:1b   local
veth6   6e:26:d5:43:a3:36   local
veth0   f2:c1:39:76:6a:fb
veth8   4e:35:16:af:87:13   local
veth10  52:e5:62:7b:57:88   static
veth10  aa:a9:35:21:15:c4   local
[root@jf-dev1-dcblab iproute2]# ./br/br fdb add dev eth3 to 52:e5:62:7b:57:88
RTNETLINK answers: Invalid argument

Using Stephen's br tool. First command adds FDB entry to SW bridge and
if the same tool could be used to add entries to embedded bridge I think
that would be the best case. So no RTNETLINK error on the second cmd. Then
embedded FDB entries could be dumped this way also so I get a complete view
of my FDB setup across multiple sw bridges and embedded bridges.
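The data model behind such a unified view can be sketched in user space as a flat table keyed by MAC address, with each entry remembering its port. This is purely illustrative: it does not use netlink or the br tool, and every name in it (struct fdb, fdb_add, fdb_lookup, FDB_MAX) is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FDB_MAX 16

struct fdb_entry {
    char port[16];
    unsigned char mac[6];
    int is_static;
};

struct fdb {
    struct fdb_entry entries[FDB_MAX];
    int count;
};

/* Parse "aa:bb:cc:dd:ee:ff"; returns 0 on success, -1 on malformed input. */
int parse_mac(const char *text, unsigned char mac[6])
{
    unsigned int b[6];
    char trailing;
    int i;

    if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &trailing) != 6)
        return -1;
    for (i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}

/* Add an entry; returns 0, or -1 if the MAC is bad or the table is full. */
int fdb_add(struct fdb *t, const char *port, const char *mac_text,
            int is_static)
{
    struct fdb_entry *e;

    assert(t != NULL && port != NULL);
    if (t->count >= FDB_MAX)
        return -1;
    e = &t->entries[t->count];
    if (parse_mac(mac_text, e->mac) != 0)
        return -1;
    snprintf(e->port, sizeof(e->port), "%s", port);
    e->is_static = is_static;
    t->count++;
    return 0;
}

/* Return the port holding `mac_text`, or NULL if there is no such entry. */
const char *fdb_lookup(const struct fdb *t, const char *mac_text)
{
    unsigned char mac[6];
    int i;

    if (parse_mac(mac_text, mac) != 0)
        return NULL;
    for (i = 0; i < t->count; i++)
        if (memcmp(t->entries[i].mac, mac, sizeof(mac)) == 0)
            return t->entries[i].port;
    return NULL;
}
```

In this sketch a software bridge port and an embedded-bridge port would simply be different values of the port field, which is the property the single-tool view above depends on.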

I don't think br is part of iproute2 yet; I just pulled it out of an RFC,
but it works reasonably well and is intuitive enough.

> 
>> Yes it should flood here, unless its acting as a 802.1Qbg VEB or VEPA.
> 
> Ok. So there is a toggle somewhere which controls how flooding should
> happen.
> 

Yes. The hardware has a bit to support this which is currently not exposed
to user space. That's a case where we have 'yet another knob' that needs
a clean solution. This causes real bugs today when users try to use the
macvlan devices in VEPA mode on top of SR-IOV. By the way these modes are
all part of the 802.1Qbg spec which people actually want to use with Linux
so a good clean solution is probably needed.

>>
>> Maybe not. But the kernel already has the needed signals with one extra
>> hook we can save running a daemon in user space. Maybe that's not a great
>> argument to add kernel code though.
> 
> You make a reasonable argument to have it in the kernel but i think we
> win more if we separate the control. So while i empathize, I am hoping
> that you'd go with the path that is hard to travel ;->
> 
>> The PF_BRIDGE:RTM_GETNEIGH,RTM_NEWNEIGH,RTM_DELNEIGH are registered in the
>> br_netlink_init() path. 
> 
> Hrm - hadnt paid attention to that before. Nasty.
> The bridge seems to be hard-coding policy on station movement, no? 
> This is a good example of the qualms i have on adding things to the
> kernel;->
> I may not want to auto update a MAC address moving ports as part of
> some policy i have. I can go and add YAK (Yet Another Knob) - but where
> is the line drawn?
> 

I have no problem with drawing the line here and trying to implement something
over PF_BRIDGE:RTM_xxx nlmsgs. I'll work with Roopa and see if we can come
up with something in the next couple days.

w.r.t. VEPA/VEB and flooding behavior we could probably have a bit to indicate
if the port is a flooding port or not. Then users could build any sort of
forwarding table they wanted OR we could just drive it through a notifier
(ndo_ops?) in the macvlan path which does VEPA today.

OK I'll try to write some actual code now that can be critiqued.

> cheers,
> jamal
> 
> 


Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli

On Tue, Feb 14, 2012 at 03:29:47PM -0200, Marcelo Tosatti wrote:
> The problem the patch is fixing is not related to page freeing, but
> rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d

Can't find the commit on kvm.git.

> (replace "A (get_dirty_log)" with "mmu_notifier_invalidate_page"):
> 
> 
> During protecting pages for dirty logging, other threads may also try
> to protect a page in mmu_sync_children() or kvm_mmu_get_page().
> 
> In such a case, because get_dirty_log releases mmu_lock before flushing
> TLB's, the following race condition can happen:
> 
>   A (get_dirty_log)            B (another thread)
> 
>   lock(mmu_lock)
>   clear pte.w
>   unlock(mmu_lock)
>                                lock(mmu_lock)
>                                pte.w is already cleared
>                                unlock(mmu_lock)
>                                skip TLB flush

Not sure which tree it is, but in kvm and upstream I see an
unconditional tlb flush here, not skip (both
kvm_mmu_slot_remove_write_access and kvm_mmu_rmap_write_protect). So I
assume this has been updated in your tree to be conditional.

Also note kvm_mmu_rmap_write_protect, flushes outside of the mmu_lock
in the kvm_mmu_rmap_write_protect case (like in quoted description),
so two write_protect_slot in parallel against each other may not be
ok, but that may be enforced by design if qemu won't ever call that
ioctl from two different userland threads (it doesn't sound security
related so it should be ok to enforce its safety by userland design).

>                                return
>   ...
>   TLB flush
> 
> Though thread B assumes the page has already been protected when it
> returns, the remaining TLB entry will break that assumption.

Now I get the question of why not running the TLB flush inside the
mmu_lock only if the spte was writable :).

kvm_mmu_get_page as long as it only runs in the context of a kvm page
fault is ok, because the page fault would be inhibited by the mmu
notifier invalidates, so maybe it's safe.

mmu_sync_children seems to have a problem instead, in your tree
get_dirty_log also has an issue if it has been updated to skip the
flush on readonly sptes, like I guess.

Interesting how the spte is already non present, the page is just
being freed shortly later, but yet we still need to trigger write
faults synchronously and prevent other CPUs in guest mode to further
modify the page to avoid losing dirty bits updates or updates on
pagetables that maps pagetables in the not NPT/EPT case. If the page
was really only going to be freed it would be ok if the other cpus
would still write to it for a little longer until the page was freed.

Like I wrote in previous email, I was thinking if we'd change the mmu
notifier methods to do an unconditional flush, then every other flush
could also run outside of the mmu_lock. But then I didn't think enough
about this to be sure. My guess is we could move all flushes outside
the mmu_lock if we stop flushing the tlb conditionally to the current
spte values. It'd clearly be slower for an UP guest though :). Large
SMP guests might benefit, if that is feasible at all... It depends how
problematic the mmu_lock is on the large SMP guests and how much we're
saving by doing conditional TLB flushes.

Re: AESNI and guest hosts

2012-02-14 Thread Brian Jackson

On Tuesday, February 14, 2012 03:31:10 AM Ryan Brown wrote:
> Sorry for being a noob here, Any clues with this?, anyone ...
> 
> On Mon, Feb 13, 2012 at 2:05 AM, Ryan Brown  wrote:
> > Host/KVM server is running linux 3.2.4 (Debian wheezy), and guest
> > kernel is running 3.2.5. The cpu is an E3-1230, but for some reason
> > its not able to supply the guest with aesni. Is there a config option
> > or is there something we're missing?



I don't think it's supported to pass that functionality to the guest.



> > 
> >
> > x86_64
> > Westmere
> > Intel
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >
> >  
> > 
> > Guest:
> > [root@fanboy:~]# cat /proc/cpuinfo
> > processor       : 0
> > vendor_id       : GenuineIntel
> > cpu family      : 6
> > model           : 2
> > model name      : QEMU Virtual CPU version 1.0
> > stepping        : 3
> > microcode       : 0x1
> > cpu MHz         : 3192.748
> > cache size      : 4096 KB
> > fdiv_bug        : no
> > hlt_bug         : no
> > f00f_bug        : no
> > coma_bug        : no
> > fpu             : yes
> > fpu_exception   : yes
> > cpuid level     : 4
> > wp              : yes
> > flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
> > cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
> > hypervisor lahf_lm
> > bogomips        : 6385.49
> > clflush size    : 64
> > cache_alignment : 64
> > address sizes   : 40 bits physical, 48 bits virtual
> > power management:
> > 
> > processor       : 1
> > vendor_id       : GenuineIntel
> > cpu family      : 6
> > model           : 2
> > model name      : QEMU Virtual CPU version 1.0
> > stepping        : 3
> > microcode       : 0x1
> > cpu MHz         : 3192.748
> > cache size      : 4096 KB
> > fdiv_bug        : no
> > hlt_bug         : no
> > f00f_bug        : no
> > coma_bug        : no
> > fpu             : yes
> > fpu_exception   : yes
> > cpuid level     : 4
> > wp              : yes
> > flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
> > cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
> > hypervisor lahf_lm
> > bogomips        : 6385.49
> > clflush size    : 64
> > cache_alignment : 64
> > address sizes   : 40 bits physical, 48 bits virtual
> 
> > power management:

Re: [VT-d reboot problems] Re: [PATCH] x86 / reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot

2012-02-14 Thread Bastien ROUCARIES

On Tue, Jan 31, 2012 at 1:15 PM, Ingo Molnar  wrote:
>
> (added KVM folks to the Cc:)
>
> * Bastien ROUCARIES  wrote:
>
>> Ping^2
>>
>> Bastien
>> On Mon, Jan 23, 2012 at 11:28 AM, Bastien ROUCARIES
>>  wrote:
>> > On Mon, Jan 16, 2012 at 8:21 PM, H. Peter Anvin  wrote:
>> >> On 01/16/2012 03:27 AM, Bastien ROUCARIES wrote:
>> 
>> >>>> Does it work if you disable VT-d in the firmware? If so, then adding it
>> >>>> to the reboot method blacklist is the wrong fix - we need to figure out
>> >>>> why VT-d interferes with Dell's reboot code.
>> >>>
>> >>> Yes it work
>> >>>
>> >>
>> >> This is particularly so since we are very close to having a full Dell
>> >> model catalogue in the kernel...
>> >
>> > Ping ? Do you need some dump ? testing ?
>
> So disabling VT-d in the BIOS fixes the reboot problem and
> Matthew Garrett suggests we should figure out why and how VT-d
> on this Dell box interferes with the reboot method.
>
> Thanks,
>
>        Ingo

Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Marcelo Tosatti

On Tue, Feb 14, 2012 at 06:10:44PM +0100, Andrea Arcangeli wrote:
> On Fri, Feb 10, 2012 at 03:28:31PM +0900, Takuya Yoshikawa wrote:
> > Other threads may process the same page in that small window and skip
> > TLB flush and then return before these functions do flush.
> 
> It's correct to flush the shadow MMU TLB without the mmu_lock only in
> the context of mmu notifier methods. So while the below won't hurt,
> it's a performance regression and shouldn't be applied (and
> it obfuscates the code by not being strict anymore).
> 
> To the contrary every other place that does a shadow/secondary MMU smp
> tlb flush _must_ happen inside the mmu_lock, otherwise the
> serialization isn't correct anymore against the very below mmu_lock in
> the below quoted patch taken by
> kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start.
> 
> The explanation is in commit 4539b35881ae9664b0e2953438dd83f5ee02c0b4.
> 
> I'll try to explain it more clearly: the moment you drop mmu_lock,
> pages can be freed. So if you invalidate a spte in any place inside
> the KVM code (except the mmu notifier methods where a reference of the
> page is implicitly held by the caller and so the page can't go away
> under a mmu notifier method by design), then the below
> kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start
> won't get their need_tlb_flush set anymore, and they won't run the tlb
> flush before freeing the page.
> 
> So every other place (except mmu notifier) must flush the secondary
> MMU smp tlb _before_ releasing the mmu_lock.
> 
> Only mmu notifier is safe to flush the secondary MMU TLB _after_
> releasing the mmu_lock.
> 
> If we changed the mmu notifier methods to unconditionally flush the
> shadow TLB (regardless if a spte was present or not), we might not
> need to hold the mmu_lock in every tlb flush outside the context of
> the mmu notifier methods. But then the mmu notifier methods would
> become more expensive, I didn't evaluate fully what would be the side
> effects of such a change. Also note, only the
> kvm_mmu_notifier_invalidate_page and
> kvm_mmu_notifier_invalidate_range_start would need to do that, because
> they're the only two where the page reference gets dropped.
> 
> Even shorter: in the mmu notifier methods an implicit reference on the
> page exists and is held by the caller, so they can flush outside the
> mmu_lock. Every other place in KVM holds an implicit valid
> reference on the page only as long as you hold the mmu_lock, or while
> a spte is still established.
> 
> Well it's not easy logic so it's not surprising it wasn't totally
> clear.
> 
> It's probably not heavily documented, and the fact you changed it
> is still good so we refresh our minds on the exact rules of mmu
> notifier locking, thanks!

The problem the patch is fixing is not related to page freeing, but
rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d
(replace "A (get_dirty_log)" with "mmu_notifier_invalidate_page"):


During protecting pages for dirty logging, other threads may also try
to protect a page in mmu_sync_children() or kvm_mmu_get_page().

In such a case, because get_dirty_log releases mmu_lock before flushing
TLB's, the following race condition can happen:

  A (get_dirty_log)     B (another thread)

  lock(mmu_lock)
  clear pte.w
  unlock(mmu_lock)
                        lock(mmu_lock)
                        pte.w is already cleared
                        unlock(mmu_lock)
                        skip TLB flush
                        return
  ...
  TLB flush

Though thread B assumes the page has already been protected when it
returns, the remaining TLB entry will break that assumption.


> 
> Andrea
> 
> > 
> > Signed-off-by: Takuya Yoshikawa 
> > ---
> >  virt/kvm/kvm_main.c |   19 ++-
> >  1 files changed, 10 insertions(+), 9 deletions(-)
> > 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 470e305..2b4bc77 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -289,15 +289,15 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
> >  */
> > idx = srcu_read_lock(&kvm->srcu);
> > spin_lock(&kvm->mmu_lock);
> > +
> > kvm->mmu_notifier_seq++;
> > need_tlb_flush = kvm_unmap_hva(kvm, address) | kvm->tlbs_dirty;
> > -   spin_unlock(&kvm->mmu_lock);
> > -   srcu_read_unlock(&kvm->srcu, idx);
> > -
> > /* we've to flush the tlb before the pages can be freed */
> > if (need_tlb_flush)
> > kvm_flush_remote_tlbs(kvm);
> >  
> > +   spin_unlock(&kvm->mmu_lock);
> > +   srcu_read_unlock(&kvm->srcu, idx);
> >  }
> >  
> >  static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
> > @@ -335,12 +335,12 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
> > for (; start < end; start += PAGE_SIZE)
> > need_tlb_flush |= kvm_unmap_hva(kvm, start);
> > need_tlb_flush |= kvm->tlbs_dirty;
> > -   spi

Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli

On Fri, Feb 10, 2012 at 03:52:49PM +0800, Xiao Guangrong wrote:
> On 02/10/2012 02:28 PM, Takuya Yoshikawa wrote:
> 
> > Other threads may process the same page in that small window and skip
> > TLB flush and then return before these functions do flush.
> > 
> 
> 
> It is possible that flush tlb in mmu lock only when writeable
> spte is invalided? Sometimes, kvm_flush_remote_tlbs need
> long time to wait.

A readonly spte isn't enough to defer the flush until after mmu_lock is
released... if you flush under the lock only for writable sptes, then the
guest may read random data and crash.

However for this case, the mmu_notifier methods (and only them) are
perfectly safe to flush the shadow MMU TLB after the mmu_lock is
released, because the page reference is guaranteed to be held by the
caller. That is not the case for any other place where a spte gets
dropped in KVM: all other places dropping sptes can only rely on the mmu
notifier blocking on the mmu_lock to guarantee that the page is not
freed under them. So in every other place the shadow MMU TLB flush must
happen before releasing the mmu_lock, so that the mmu_notifier will wait
and prevent the page from being freed until all other CPUs running in
guest mode have stopped accessing it.

Re: [PATCH 2/2] KVM: MMU: Flush TLBs only once in invlpg() before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli

On Tue, Feb 14, 2012 at 01:56:17PM +0900, Takuya Yoshikawa wrote:
> (2012/02/14 13:36), Takuya Yoshikawa wrote:
> 
> > BTW, do you think that "kvm_mmu_flush_tlb()" should be moved inside of the
> > mmu_lock critical section?
> >
> 
> Ah, forget about this.  Trivially no.

Yes, the reason is that it's the local flush, and guest mode isn't
running if we're running that function, so it's ok to run it later.

About the other change you did in this patch 2/2, I can't find the
code you're patching in the 3.2 upstream source. When I added the tlb
flush to invlpg, I immediately used a cumulative need_flush at the end
(before releasing mmu_lock of course).

   if (need_flush)
  kvm_flush_remote_tlbs(vcpu->kvm);
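
To make the "cumulative need_flush" pattern concrete, here is a minimal user-space sketch. It assumes toy stand-ins for the lock and the remote flush (`struct toy_mmu`, `toy_lock()`, `toy_flush_remote_tlbs()` and `toy_invlpg()` are hypothetical names, not the KVM functions): the loop only records that a flush is needed, and a single flush is issued at the end while the lock is still held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mmu_lock and the remote TLB flush. */
struct toy_mmu {
    bool locked;
    int  remote_flushes;     /* number of remote flushes issued */
    int  flushes_unlocked;   /* flushes issued without the lock held */
};

static void toy_lock(struct toy_mmu *m)   { m->locked = true; }
static void toy_unlock(struct toy_mmu *m) { m->locked = false; }

static void toy_flush_remote_tlbs(struct toy_mmu *m)
{
    m->remote_flushes++;
    if (!m->locked)
        m->flushes_unlocked++;
}

/*
 * Drop every present entry in sptes[0..n), remember whether anything
 * was dropped, and flush at most once, before the lock is released.
 * Returns the number of entries dropped.
 */
static int toy_invlpg(struct toy_mmu *m, bool *sptes, size_t n)
{
    bool need_flush = false;
    int zapped = 0;

    toy_lock(m);
    for (size_t i = 0; i < n; i++) {
        if (sptes[i]) {
            sptes[i] = false;
            need_flush = true;
            zapped++;
        }
    }
    if (need_flush)
        toy_flush_remote_tlbs(m);
    toy_unlock(m);

    return zapped;
}
```

The point of the shape is that the cost of the remote flush is paid once per call rather than once per dropped entry, and never after the lock has been dropped.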

Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli

On Fri, Feb 10, 2012 at 03:28:31PM +0900, Takuya Yoshikawa wrote:
> Other threads may process the same page in that small window and skip
> TLB flush and then return before these functions do flush.

It's correct to flush the shadow MMU TLB without the mmu_lock only in
the context of mmu notifier methods. So the below, while it won't hurt,
is a performance regression and shouldn't be applied (and it obfuscates
the code by not being strict anymore).

On the contrary, every other place that does a shadow/secondary MMU smp
tlb flush _must_ do it inside the mmu_lock, otherwise the serialization
isn't correct anymore against the mmu_lock taken by
kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start
in the patch quoted below.

The explanation is in commit 4539b35881ae9664b0e2953438dd83f5ee02c0b4.

I'll try to explain it more clearly: the moment you drop mmu_lock,
pages can be freed. So if you invalidate a spte in any place inside
the KVM code (except the mmu notifier methods, where a reference on the
page is implicitly held by the caller and so the page can't go away
under a mmu notifier method by design), then the below
kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start
won't get their need_tlb_flush set anymore, and they won't run the tlb
flush before freeing the page.

So every other place (except mmu notifier) must flush the secondary
MMU smp tlb _before_ releasing the mmu_lock.

Only mmu notifier is safe to flush the secondary MMU TLB _after_
releasing the mmu_lock.

If we changed the mmu notifier methods to unconditionally flush the
shadow TLB (regardless if a spte was present or not), we might not
need to hold the mmu_lock in every tlb flush outside the context of
the mmu notifier methods. But then the mmu notifier methods would
become more expensive, I didn't evaluate fully what would be the side
effects of such a change. Also note, only the
kvm_mmu_notifier_invalidate_page and
kvm_mmu_notifier_invalidate_range_start would need to do that, because
they're the only two where the page reference gets dropped.

Even shorter: in the mmu notifier methods an implicit reference on the
page exists and is held by the caller, so they can flush outside the
mmu_lock. Every other place in KVM holds an implicit valid reference on
the page only as long as it holds the mmu_lock, or while a spte is
still established.

Well, it's not easy logic, so it's not surprising it wasn't totally
clear.

It's probably not heavily documented, and the fact that you changed it
is still good, because it makes us refresh our minds on the exact rules
of mmu notifier locking, thanks!

Andrea

> 
> Signed-off-by: Takuya Yoshikawa 
> ---
>  virt/kvm/kvm_main.c |   19 ++-
>  1 files changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 470e305..2b4bc77 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -289,15 +289,15 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
>*/
>   idx = srcu_read_lock(&kvm->srcu);
>   spin_lock(&kvm->mmu_lock);
> +
>   kvm->mmu_notifier_seq++;
>   need_tlb_flush = kvm_unmap_hva(kvm, address) | kvm->tlbs_dirty;
> - spin_unlock(&kvm->mmu_lock);
> - srcu_read_unlock(&kvm->srcu, idx);
> -
>   /* we've to flush the tlb before the pages can be freed */
>   if (need_tlb_flush)
>   kvm_flush_remote_tlbs(kvm);
>  
> + spin_unlock(&kvm->mmu_lock);
> + srcu_read_unlock(&kvm->srcu, idx);
>  }
>  
>  static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
> @@ -335,12 +335,12 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>   for (; start < end; start += PAGE_SIZE)
>   need_tlb_flush |= kvm_unmap_hva(kvm, start);
>   need_tlb_flush |= kvm->tlbs_dirty;
> - spin_unlock(&kvm->mmu_lock);
> - srcu_read_unlock(&kvm->srcu, idx);
> -
>   /* we've to flush the tlb before the pages can be freed */
>   if (need_tlb_flush)
>   kvm_flush_remote_tlbs(kvm);
> +
> + spin_unlock(&kvm->mmu_lock);
> + srcu_read_unlock(&kvm->srcu, idx);
>  }
>  
>  static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
> @@ -378,13 +378,14 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
>  
>   idx = srcu_read_lock(&kvm->srcu);
>   spin_lock(&kvm->mmu_lock);
> - young = kvm_age_hva(kvm, address);
> - spin_unlock(&kvm->mmu_lock);
> - srcu_read_unlock(&kvm->srcu, idx);
>  
> + young = kvm_age_hva(kvm, address);
>   if (young)
>   kvm_flush_remote_tlbs(kvm);
>  
> + spin_unlock(&kvm->mmu_lock);
> + srcu_read_unlock(&kvm->srcu, idx);
> +
>   return young;
>  }
>  
> -- 
> 1.7.5.4
> 

[PATCH] virt: Fix migration bg command

2012-02-14 Thread Lucas Meneghel Rodrigues

In migration tests, the command we were using as a
'watchdog' command was tcpdump, but without specifying
which interface it should listen on. As this may fail
depending on the interface ordering, let's change the
command to listen on all interfaces (-i any), which
is safer.

Signed-off-by: Eduardo Habkost 
---
 client/virt/subtests.cfg.sample |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample
index b08a5c4..56043e0 100644
--- a/client/virt/subtests.cfg.sample
+++ b/client/virt/subtests.cfg.sample
@@ -350,7 +350,7 @@ variants:
 - migrate: install setup image_copy unattended_install.cdrom
 type = migration
 migration_test_command = help
-migration_bg_command = "cd /tmp; nohup tcpdump -q -t ip host localhost"
+migration_bg_command = "cd /tmp; nohup tcpdump -q -i any -t ip host localhost"
 migration_bg_check_command = pgrep tcpdump
 migration_bg_kill_command = pkill tcpdump
 kill_vm_on_error = yes
-- 
1.7.7.6


Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-02-14 Thread Marcelo Tosatti

On Tue, Feb 14, 2012 at 10:50:13AM -0500, Eric B Munson wrote:
> On Tue, 14 Feb 2012, Marcelo Tosatti wrote:
> 
> > On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
> > > On Wed, 08 Feb 2012, Eric B Munson wrote:
> > > 
> > > > 
> > > > When a guest kernel is stopped by the host hypervisor it can look like a soft
> > > > lockup to the guest kernel.  This false warning can mask later soft lockup
> > > > warnings which may be real.  This patch series adds a method for a host
> > > > hypervisor to communicate to a guest kernel that it is being stopped.  The
> > > > final patch in the series has the watchdog check this flag when it goes to
> > > > issue a soft lockup warning and skip the warning if the guest knows it was
> > > > stopped.
> > > > 
> > > > It was attempted to solve this in Qemu, but the side effects of saving and
> > > > restoring the clock and tsc for each vcpu put the wall clock of the guest behind
> > > > by the amount of time of the pause.  This forces a guest to have ntp running
> > > > in order to keep the wall clock accurate.
> > > 
> > > Avi,
> > > 
> > > Is this set fit for merging or is there something else you want changed?
> > 
> > Eric,
> > 
> > On Message-ID: <20120210160536.ga23...@amt.cnet>, i asked:
> > 
> > How is the stub getting included for other architectures again?
> > 
> 
> Marcelo,
> 
> Sorry, I put out V13 to answer that.  There is a stub in asm-generic that was
> lost in the V11-V12 rebase.  This stub has be included in the V13 set.
> 
> Eric

Eric, 

I know the stub has been included in the series. But I am asking how
it is #include'ed for other architectures (I can't see that).



Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-02-14 Thread Eric B Munson

On Tue, 14 Feb 2012, Marcelo Tosatti wrote:

> On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
> > On Wed, 08 Feb 2012, Eric B Munson wrote:
> > 
> > > 
> > > When a guest kernel is stopped by the host hypervisor it can look like a soft
> > > lockup to the guest kernel.  This false warning can mask later soft lockup
> > > warnings which may be real.  This patch series adds a method for a host
> > > hypervisor to communicate to a guest kernel that it is being stopped.  The
> > > final patch in the series has the watchdog check this flag when it goes to
> > > issue a soft lockup warning and skip the warning if the guest knows it was
> > > stopped.
> > > 
> > > It was attempted to solve this in Qemu, but the side effects of saving and
> > > restoring the clock and tsc for each vcpu put the wall clock of the guest behind
> > > by the amount of time of the pause.  This forces a guest to have ntp running
> > > in order to keep the wall clock accurate.
> > 
> > Avi,
> > 
> > Is this set fit for merging or is there something else you want changed?
> 
> Eric,
> 
> On Message-ID: <20120210160536.ga23...@amt.cnet>, i asked:
> 
> How is the stub getting included for other architectures again?
> 

Marcelo,

Sorry, I put out V13 to answer that.  There is a stub in asm-generic that was
lost in the V11-V12 rebase.  This stub has been included in the V13 set.
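
For readers following along, the general idea of a generic stub next to the watchdog check might look roughly like the sketch below. Everything here is an assumption for illustration only: `TOY_ARCH_HAS_PAUSE_FLAG`, `toy_check_and_clear_guest_paused()` and `toy_should_warn()` are made-up names and do not claim to match the identifiers or the include mechanism used in the series.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef TOY_ARCH_HAS_PAUSE_FLAG
/* Hypothetical: set when the host tells the guest it was stopped. */
static bool toy_guest_was_paused;

/* Architecture version: report the flag and clear it for the next check. */
static bool toy_check_and_clear_guest_paused(void)
{
    bool paused = toy_guest_was_paused;

    toy_guest_was_paused = false;
    return paused;
}
#else
/* Generic stub: without the feature, a stall is never excused. */
static bool toy_check_and_clear_guest_paused(void)
{
    return false;
}
#endif

/* Watchdog side: returns true if a soft lockup warning should be issued. */
static bool toy_should_warn(bool watchdog_expired)
{
    if (!watchdog_expired)
        return false;
    if (toy_check_and_clear_guest_paused())
        return false;   /* the stall was a host pause, not a lockup */
    return true;
}
```

Built without `TOY_ARCH_HAS_PAUSE_FLAG`, the stub keeps the watchdog behaviour unchanged, which is what one would expect on an architecture that has no way to learn about a host pause.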

Eric



Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-02-14 Thread Marcelo Tosatti

On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
> On Wed, 08 Feb 2012, Eric B Munson wrote:
> 
> > 
> > When a guest kernel is stopped by the host hypervisor it can look like a soft
> > lockup to the guest kernel.  This false warning can mask later soft lockup
> > warnings which may be real.  This patch series adds a method for a host
> > hypervisor to communicate to a guest kernel that it is being stopped.  The
> > final patch in the series has the watchdog check this flag when it goes to
> > issue a soft lockup warning and skip the warning if the guest knows it was
> > stopped.
> > 
> > It was attempted to solve this in Qemu, but the side effects of saving and
> > restoring the clock and tsc for each vcpu put the wall clock of the guest behind
> > by the amount of time of the pause.  This forces a guest to have ntp running
> > in order to keep the wall clock accurate.
> 
> Avi,
> 
> Is this set fit for merging or is there something else you want changed?

Eric,

On Message-ID: <20120210160536.ga23...@amt.cnet>, I asked:

How is the stub getting included for other architectures again?


Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Pekka Enberg

On Tue, Feb 14, 2012 at 5:38 PM, Cyrill Gorcunov  wrote:
>> > Ideally we should get rid of our minibios completely and only have
>> > seabios here instead.
>>
>> No, no, they should co-exist. There's absolutely no reason to force
>> people to use a BIOS to boot Linux.
>
> I meant run-time (ie in memory). I didn't mean substitude our minibios,
> but rather have an ability to either run with compiled-in bios or with
> seabios instead.

Sure.

Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Cyrill Gorcunov

On Tue, Feb 14, 2012 at 05:35:47PM +0200, Pekka Enberg wrote:
> On Tue, Feb 14, 2012 at 09:20:18PM +0800, Yang Bai wrote:
> >> And will seabios replace the present bios implement or co-exsit?
> 
> On Tue, Feb 14, 2012 at 3:32 PM, Cyrill Gorcunov  wrote:
> > Ideally we should get rid of our minibios completely and only have
> > seabios here instead.
> 
> No, no, they should co-exist. There's absolutely no reason to force
> people to use a BIOS to boot Linux.
> 

I meant at run-time (i.e. in memory). I didn't mean to substitute our
minibios, but rather to have the ability to run either with the
compiled-in bios or with seabios.

Cyrill

Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Pekka Enberg

On Tue, Feb 14, 2012 at 09:20:18PM +0800, Yang Bai wrote:
>> And will seabios replace the present bios implement or co-exsit?

On Tue, Feb 14, 2012 at 3:32 PM, Cyrill Gorcunov  wrote:
> Ideally we should get rid of our minibios completely and only have
> seabios here instead.

No, no, they should co-exist. There's absolutely no reason to force
people to use a BIOS to boot Linux.

Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-02-14 Thread Eric B Munson

On Wed, 08 Feb 2012, Eric B Munson wrote:

> 
> When a guest kernel is stopped by the host hypervisor it can look like a soft
> lockup to the guest kernel.  This false warning can mask later soft lockup
> warnings which may be real.  This patch series adds a method for a host
> hypervisor to communicate to a guest kernel that it is being stopped.  The
> final patch in the series has the watchdog check this flag when it goes to
> issue a soft lockup warning and skip the warning if the guest knows it was
> stopped.
> 
> It was attempted to solve this in Qemu, but the side effects of saving and
> restoring the clock and tsc for each vcpu put the wall clock of the guest behind
> by the amount of time of the pause.  This forces a guest to have ntp running
> in order to keep the wall clock accurate.

Avi,

Is this set fit for merging or is there something else you want changed?

Eric

> 
> Cc: mi...@redhat.com
> Cc: h...@zytor.com
> Cc: ry...@linux.vnet.ibm.com
> Cc: aligu...@us.ibm.com
> Cc: mtosa...@redhat.com
> Cc: kvm@vger.kernel.org
> Cc: linux-a...@vger.kernel.org
> Cc: x...@kernel.org
> Cc: linux-ker...@vger.kernel.org
> 
> Eric B Munson (4):
>   Add flag to indicate that a vm was stopped by the host
>   Add functions to check if the host has stopped the vm
>   Add ioctl for KVM_KVMCLOCK_CTRL
>   Add check for suspended vm in softlockup detector
> 
>  Documentation/virtual/kvm/api.txt   |   13 +
>  arch/ia64/include/asm/kvm_para.h|5 +
>  arch/powerpc/include/asm/kvm_para.h |5 +
>  arch/s390/include/asm/kvm_para.h|5 +
>  arch/x86/include/asm/kvm_para.h |8 
>  arch/x86/include/asm/pvclock-abi.h  |1 +
>  arch/x86/kernel/kvmclock.c  |   21 +
>  arch/x86/kvm/x86.c  |   22 ++
>  include/asm-generic/kvm_para.h  |   14 ++
>  include/linux/kvm.h |3 +++
>  kernel/watchdog.c   |   12 
>  11 files changed, 109 insertions(+), 0 deletions(-)
>  create mode 100644 include/asm-generic/kvm_para.h
> 
> -- 
> 1.7.5.4
> 



Re: level in kvm_mmu_page_role

2012-02-14 Thread Avi Kivity

On 02/13/2012 11:30 PM, Sanidhya Kashyap wrote:
> I have been going through the kvm code but didn't get the significance
> of level in kvm_mmu_page_role. So, it would be nice if anyone can
> explain it what is its use?
>
>

It's the page table level.  Level 1 contains page table entries pointing
to 4k pages.  Level 2 contains page directory entries pointing to level
1 page tables, or pointers to 2M pages, and so forth.
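
As a small worked example of the level numbering, the sketch below computes how much memory a single entry covers at each level. It assumes 4k base pages and 9 index bits per level (512 entries per table, as with 64-bit entries); the legacy 32-bit non-PAE format uses different widths. The names `TOY_PAGE_SHIFT`, `TOY_BITS_PER_LEVEL` and `toy_bytes_per_entry()` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT      12  /* 4k base pages */
#define TOY_BITS_PER_LEVEL   9  /* assumed: 512 entries per table */

/* Bytes covered by one entry of a table at the given level (1 = lowest). */
static uint64_t toy_bytes_per_entry(int level)
{
    assert(level >= 1 && level <= 4);
    return (uint64_t)1 << (TOY_PAGE_SHIFT + TOY_BITS_PER_LEVEL * (level - 1));
}
```

Under these assumptions a level 1 entry covers 4k and a level 2 entry covers 2M, which matches the sizes mentioned above; each further level multiplies the coverage by 512.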

-- 
error compiling committee.c: too many arguments to function


[PATCH v3 6/9] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-14 Thread Jan Kiszka

This enables acceleration for MMIO-based TPR register accesses of
32-bit Windows guest systems. It is mostly useful with KVM enabled,
either on older Intel CPUs (without the flexpriority feature, which can
also be manually disabled for testing) or any current AMD processor.

The approach introduced here is derived from the original version of
qemu-kvm. It was refactored, documented, and extended by support for
user space APIC emulation, both with and without KVM acceleration. The
VMState format was kept compatible, so was the ABI to the option ROM
that implements the guest-side para-virtualized driver service. This
enables seamless migration from qemu-kvm to upstream or, one day,
between KVM and TCG mode.

The basic concept goes like this:
 - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
   irqchip) a vmcall hypercall is registered
 - VAPIC option ROM is loaded into guest
 - option ROM activates TPR MMIO access reporting via port 0x7e
 - TPR accesses are trapped and patched in the guest to call into option
   ROM instead, VAPIC support is enabled
 - option ROM TPR helpers track state in memory and invoke hypercall to
   poll for pending IRQs if required

Signed-off-by: Jan Kiszka 
---
 Makefile.target|3 +-
 hw/apic.c  |  126 -
 hw/apic_common.c   |   64 -
 hw/apic_internal.h |   27 ++
 hw/kvm/apic.c  |   32 ++
 hw/kvmvapic.c  |  803 
 6 files changed, 1041 insertions(+), 14 deletions(-)
 create mode 100644 hw/kvmvapic.c

diff --git a/Makefile.target b/Makefile.target
index 68481a3..ec7eff8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -230,7 +230,8 @@ obj-y += device-hotplug.o
 
 # Hardware support
 obj-i386-y += mc146818rtc.o pc.o
-obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
+obj-i386-y += apic_common.o apic.o kvmvapic.o
+obj-i386-y += sga.o ioapic_common.o ioapic.o piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
diff --git a/hw/apic.c b/hw/apic.c
index 086c544..2ebf3ca 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -35,6 +35,10 @@
 #define MSI_ADDR_DEST_ID_SHIFT 12
 #define MSI_ADDR_DEST_ID_MASK   0x000
 
+#define SYNC_FROM_VAPIC 0x1
+#define SYNC_TO_VAPIC   0x2
+#define SYNC_ISR_IRR_TO_VAPIC   0x4
+
 static APICCommonState *local_apics[MAX_APICS + 1];
 
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
@@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
 return !!(tab[i] & mask);
 }
 
+/* return -1 if no bit is set */
+static int get_highest_priority_int(uint32_t *tab)
+{
+int i;
+for (i = 7; i >= 0; i--) {
+if (tab[i] != 0) {
+return i * 32 + fls_bit(tab[i]);
+}
+}
+return -1;
+}
+
+static void apic_sync_vapic(APICCommonState *s, int sync_type)
+{
+VAPICState vapic_state;
+size_t length;
+off_t start;
+int vector;
+
+if (!s->vapic_paddr) {
+return;
+}
+if (sync_type & SYNC_FROM_VAPIC) {
+cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
+   sizeof(vapic_state), 0);
+s->tpr = vapic_state.tpr;
+}
+if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
+start = offsetof(VAPICState, isr);
+length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
+
+if (sync_type & SYNC_TO_VAPIC) {
+assert(qemu_cpu_is_self(s->cpu_env));
+
+vapic_state.tpr = s->tpr;
+vapic_state.enabled = 1;
+start = 0;
+length = sizeof(VAPICState);
+}
+
+vector = get_highest_priority_int(s->isr);
+if (vector < 0) {
+vector = 0;
+}
+vapic_state.isr = vector & 0xf0;
+
+vapic_state.zero = 0;
+
+vector = get_highest_priority_int(s->irr);
+if (vector < 0) {
+vector = 0;
+}
+vapic_state.irr = vector & 0xff;
+
+cpu_physical_memory_write_rom(s->vapic_paddr + start,
+  ((void *)&vapic_state) + start, length);
+}
+}
+
+static void apic_vapic_base_update(APICCommonState *s)
+{
+apic_sync_vapic(s, SYNC_TO_VAPIC);
+}
+
 static void apic_local_deliver(APICCommonState *s, int vector)
 {
 uint32_t lvt = s->lvt[vector];
@@ -239,20 +307,17 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
 {
-s->tpr = (val & 0x0f) << 4;
-apic_update_irq(s);
+/* Updates from cr8 are ignored while the VAPIC is active */
+if (!s->vapic_paddr) {
+s->tpr = val << 4;
+apic_update_irq(s);
+}
 }
 
-/* return -1 if no bit is set */
-static int get_highest_priority_int(uint32_t *tab)
+static uint8_t apic_get_tpr(APICCommonState *s)
 {
-int i;
-f

[PATCH v3 0/9] uq/master: TPR access optimization for Windows guests

2012-02-14 Thread Jan Kiszka

v3 comes with the following changes:
 - clear TPR access report on system reset
   (in case we load a guest without the option ROM)
 - addressed review comments on details in kvmvapic.c
 - streamlined 16-bit VAPIC port handling
 - included cleanup for useless next_cpu casts in cpus.c
   (to avoid conflicts on merge)

The series is also available at

git://git.kiszka.org/qemu-kvm.git queues/kvm-tpr

Please review/apply.

CC: Paolo Bonzini 

Jan Kiszka (9):
  kvm: Set cpu_single_env only once
  Remove useless casts from cpu iterators
  Allow to use pause_all_vcpus from VCPU context
  target-i386: Add infrastructure for reporting TPR MMIO accesses
  kvmvapic: Add option ROM
  kvmvapic: Introduce TPR access optimization for Windows guests
  kvmvapic: Simplify mp/up_set_tpr
  optionsrom: Reserve space for checksum
  kvmvapic: Use optionrom helpers

 .gitignore|1 +
 Makefile  |2 +-
 Makefile.target   |3 +-
 cpu-all.h |3 +-
 cpus.c|   21 +-
 hw/apic.c |  126 ++-
 hw/apic.h |2 +
 hw/apic_common.c  |   68 -
 hw/apic_internal.h|   27 ++
 hw/kvm/apic.c |   32 ++
 hw/kvmvapic.c |  803 +
 kvm-all.c |5 -
 pc-bios/optionrom/Makefile|2 +-
 pc-bios/optionrom/kvmvapic.S  |  335 +
 pc-bios/optionrom/optionrom.h |3 +-
 target-i386/cpu.h |   11 +
 target-i386/helper.c  |   19 +
 target-i386/kvm.c |   24 ++-
 18 files changed, 1458 insertions(+), 29 deletions(-)
 create mode 100644 hw/kvmvapic.c
 create mode 100644 pc-bios/optionrom/kvmvapic.S

-- 
1.7.3.4


[PATCH v3 8/9] optionsrom: Reserve space for checksum

2012-02-14 Thread Jan Kiszka

Always add a byte before the final 512-byte alignment to reserve
space for the ROM checksum.
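
The effect of the extra byte can be illustrated with a short sketch. It assumes the usual option ROM convention that all bytes of the image sum to zero modulo 256, and that the checksum is stored in the final byte of the padded image; the helper names (`toy_rom_padded_size()`, `toy_rom_set_checksum()`, `toy_rom_sum()`) are hypothetical and are not part of the build scripts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ROM_ALIGN 512u

/* Image size after appending one reserved byte and padding to 512 bytes. */
static size_t toy_rom_padded_size(size_t code_len)
{
    return (code_len + 1 + TOY_ROM_ALIGN - 1) / TOY_ROM_ALIGN * TOY_ROM_ALIGN;
}

/* Store in the last byte the value that makes all bytes sum to 0 mod 256. */
static void toy_rom_set_checksum(uint8_t *rom, size_t size)
{
    uint8_t sum = 0;

    assert(size > 0);
    for (size_t i = 0; i + 1 < size; i++)
        sum = (uint8_t)(sum + rom[i]);
    rom[size - 1] = (uint8_t)(0u - sum);
}

/* Sum of all bytes modulo 256; 0 for a correctly checksummed image. */
static uint8_t toy_rom_sum(const uint8_t *rom, size_t size)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < size; i++)
        sum = (uint8_t)(sum + rom[i]);
    return sum;
}
```

Without the reserved byte, code that happens to end exactly on a 512-byte boundary would leave no free byte for the checksum; with it, such an image grows to the next multiple of 512 and the last byte is always padding that can be overwritten.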

Signed-off-by: Jan Kiszka 
---
 pc-bios/optionrom/optionrom.h |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h
index aa783de..3daf7da 100644
--- a/pc-bios/optionrom/optionrom.h
+++ b/pc-bios/optionrom/optionrom.h
@@ -124,7 +124,8 @@
movw%ax, %ds;
 
 #define OPTION_ROM_END \
-.align 512, 0; \
+   .byte   0;  \
+   .align  512, 0; \
 _end:
 
 #define BOOT_ROM_END   \
-- 
1.7.3.4


[PATCH v3 3/9] Allow to use pause_all_vcpus from VCPU context

2012-02-14 Thread Jan Kiszka

In order to perform critical manipulations on the VM state in the
context of a VCPU, specifically code patching, stopping and resuming of
all VCPUs may be necessary. resume_all_vcpus is already compatible, now
enable pause_all_vcpus for this use case by stopping the calling context
before starting to wait for the whole gang.

CC: Paolo Bonzini 
Signed-off-by: Jan Kiszka 
---
 cpus.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index 4e65894..290daa8 100644
--- a/cpus.c
+++ b/cpus.c
@@ -870,6 +870,18 @@ void pause_all_vcpus(void)
 penv = penv->next_cpu;
 }
 
+if (!qemu_thread_is_self(&io_thread)) {
+cpu_stop_current();
+if (!kvm_enabled()) {
+while (penv) {
+penv->stop = 0;
+penv->stopped = 1;
+penv = penv->next_cpu;
+}
+return;
+}
+}
+
 while (!all_vcpus_paused()) {
 qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
 penv = first_cpu;
-- 
1.7.3.4


[PATCH v3 4/9] target-i386: Add infrastructure for reporting TPR MMIO accesses

2012-02-14 Thread Jan Kiszka

This will allow the APIC core to file a TPR access report. Depending on
the accelerator and kernel irqchip mode, it will either be delivered
right away or queued for later reporting.

In TCG mode, we can restart the triggering instruction and can therefore
forward the event directly. KVM does not allow us to restart, so we
postpone the delivery of events recorded in the user space APIC until
the current instruction is completed.

Note that KVM without in-kernel irqchip will report the address after
the instruction that triggered a write access. In contrast, read
accesses will return the precise information.

Signed-off-by: Jan Kiszka 
---
 cpu-all.h|3 ++-
 hw/apic.h|2 ++
 hw/apic_common.c |4 
 target-i386/cpu.h|   11 +++
 target-i386/helper.c |   19 +++
 target-i386/kvm.c|   24 ++--
 6 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index e2c3c49..80e6d42 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -375,8 +375,9 @@ DECLARE_TLS(CPUState *,cpu_single_env);
 #define CPU_INTERRUPT_TGT_INT_0   0x0100
 #define CPU_INTERRUPT_TGT_INT_1   0x0400
 #define CPU_INTERRUPT_TGT_INT_2   0x0800
+#define CPU_INTERRUPT_TGT_INT_3   0x2000
 
-/* First unused bit: 0x2000.  */
+/* First unused bit: 0x4000.  */
 
 /* The set of all bits that should be masked when single-stepping.  */
 #define CPU_INTERRUPT_SSTEP_MASK \
diff --git a/hw/apic.h b/hw/apic.h
index a62d83b..45598bd 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -18,6 +18,8 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
 uint8_t cpu_get_apic_tpr(DeviceState *s);
 void apic_init_reset(DeviceState *s);
 void apic_sipi(DeviceState *s);
+void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip,
+   int access);
 
 /* pc.c */
 int cpu_is_bsp(CPUState *env);
diff --git a/hw/apic_common.c b/hw/apic_common.c
index 8373d79..588531b 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -68,6 +68,10 @@ uint8_t cpu_get_apic_tpr(DeviceState *d)
 return s ? s->tpr >> 4 : 0;
 }
 
+void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, int access)
+{
+}
+
 void apic_report_irq_delivered(int delivered)
 {
 apic_irq_delivered += delivered;
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 37dde79..c2e9ca3 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -482,6 +482,7 @@
 #define CPU_INTERRUPT_VIRQ  CPU_INTERRUPT_TGT_INT_0
 #define CPU_INTERRUPT_INIT  CPU_INTERRUPT_TGT_INT_1
 #define CPU_INTERRUPT_SIPI  CPU_INTERRUPT_TGT_INT_2
+#define CPU_INTERRUPT_TPR   CPU_INTERRUPT_TGT_INT_3
 
 
 enum {
@@ -772,6 +773,9 @@ typedef struct CPUX86State {
 XMMReg ymmh_regs[CPU_NB_REGS];
 
 uint64_t xcr0;
+
+target_ulong tpr_access_ip;
+int tpr_access_type;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
@@ -1064,4 +1068,11 @@ void svm_check_intercept(CPUState *env1, uint32_t type);
 
 uint32_t cpu_cc_compute_all(CPUState *env1, int op);
 
+typedef enum TPRAccess {
+    TPR_ACCESS_READ,
+    TPR_ACCESS_WRITE,
+} TPRAccess;
+
+void cpu_report_tpr_access(CPUState *env, TPRAccess access);
+
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 2586aff..79aeb8f 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1189,6 +1189,25 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
 }
 }
 }
+
+void cpu_report_tpr_access(CPUState *env, TPRAccess access)
+{
+    TranslationBlock *tb;
+
+    if (kvm_enabled()) {
+        cpu_synchronize_state(env);
+
+        env->tpr_access_ip = env->eip;
+        env->tpr_access_type = access;
+
+        cpu_interrupt(env, CPU_INTERRUPT_TPR);
+    } else {
+        tb = tb_find_pc(env->mem_io_pc);
+        cpu_restore_state(tb, env, env->mem_io_pc);
+
+        apic_handle_tpr_access_report(env->apic_state, env->eip, access);
+    }
+}
 #endif /* !CONFIG_USER_ONLY */
 
 static void mce_init(CPUX86State *cenv)
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 981192d..fa77f9d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1635,8 +1635,10 @@ void kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
 }
 
 if (!kvm_irqchip_in_kernel()) {
-/* Force the VCPU out of its inner loop to process the INIT request */
-if (env->interrupt_request & CPU_INTERRUPT_INIT) {
+/* Force the VCPU out of its inner loop to process any INIT requests
+ * or pending TPR access reports. */
+if (env->interrupt_request &
+(CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
 env->exit_request = 1;
 }
 
@@ -1730,6 +1732,11 @@ int kvm_arch_process_async_events(CPUState *env)
 kvm_cpu_synchronize_state(env);
 do_cpu_sipi(env);
 }
+if (env->interrupt_request & CPU_INTERRUPT_TPR) {
+env->interrupt_request &= ~CPU_INTERRUPT_TPR;
+apic_handle_tpr_access_

[PATCH v3 9/9] kvmvapic: Use optionrom helpers

2012-02-14 Thread Jan Kiszka

Use OPTION_ROM_START/END from the common header file, add comment to
init code.

Signed-off-by: Jan Kiszka 
---
 pc-bios/optionrom/kvmvapic.S |   18 --
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
index 856c1e5..aa17a40 100644
--- a/pc-bios/optionrom/kvmvapic.S
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -9,12 +9,10 @@
 # option) any later version.  See the COPYING file in the top-level directory.
 #
 
-   .text 0
-   .code16
-.global _start
-_start:
-   .short 0xaa55
-   .byte (_end - _start) / 512
+#include "optionrom.h"
+
+OPTION_ROM_START
+
# clear vapic area: firmware load using rep insb may cause
# stale tpr/isr/irr data to corrupt the vapic area.
push %es
@@ -26,8 +24,11 @@ _start:
cld
rep stosw
pop %es
+
+   # announce presence to the hypervisor
mov $vapic_base, %ax
out %ax, $0x7e
+
lret
 
.code32
@@ -331,7 +332,4 @@ up_set_tpr_poll_irq:
 vapic:
 . = . + vapic_size
 
-.byte 0  # reserve space for signature
-.align 512, 0
-
-_end:
+OPTION_ROM_END
-- 
1.7.3.4
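
For readers unfamiliar with the header that this hunk moves behind
OPTION_ROM_START/OPTION_ROM_END, the removed lines (.short 0xaa55, then a
size byte counting 512-byte blocks) can be sketched in C roughly as below.
This is only an illustration: the helper names are made up, and storing the
checksum in the final byte of the padded image is an assumption about the
signing step, which the patch itself does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of an option ROM in bytes, read from its header: byte 2 counts
 * 512-byte blocks.  Returns 0 if the 0x55 0xAA signature is missing
 * (".short 0xaa55" is stored little-endian, so 0x55 comes first). */
size_t optrom_size(const uint8_t *rom, size_t len)
{
    if (len < 3 || rom[0] != 0x55 || rom[1] != 0xaa) {
        return 0;
    }
    return (size_t)rom[2] * 512;
}

/* Hypothetical signing helper: patch the last byte so that all bytes of
 * the image sum to 0 modulo 256.  The position of the checksum byte is
 * an assumption made for this sketch. */
void optrom_sign(uint8_t *rom, size_t size)
{
    uint8_t sum = 0;

    assert(size > 0);
    for (size_t i = 0; i + 1 < size; i++) {
        sum += rom[i];
    }
    rom[size - 1] = (uint8_t)(0u - sum);
}

/* Returns 1 if the byte-wise sum of the image is 0 modulo 256. */
int optrom_checksum_ok(const uint8_t *rom, size_t size)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < size; i++) {
        sum += rom[i];
    }
    return sum == 0;
}
```

The reserved ".byte 0" plus ".align 512, 0" in the old code guarantee that
such a spare byte exists and that the size byte is a whole number of blocks.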

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 1/9] kvm: Set cpu_single_env only once

2012-02-14 Thread Jan Kiszka

As we have thread-local cpu_single_env now and KVM uses exactly one
thread per VCPU, we can drop the cpu_single_env updates from the loop
and initialize this variable only once during setup.

Signed-off-by: Jan Kiszka 
---
 cpus.c|1 +
 kvm-all.c |5 -
 2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/cpus.c b/cpus.c
index f45a438..d0c8340 100644
--- a/cpus.c
+++ b/cpus.c
@@ -714,6 +714,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
 qemu_mutex_lock(&qemu_global_mutex);
 qemu_thread_get_self(env->thread);
 env->thread_id = qemu_get_thread_id();
+cpu_single_env = env;
 
 r = kvm_init_vcpu(env);
 if (r < 0) {
diff --git a/kvm-all.c b/kvm-all.c
index c4babda..e2cbc03 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1118,8 +1118,6 @@ int kvm_cpu_exec(CPUState *env)
 return EXCP_HLT;
 }
 
-cpu_single_env = env;
-
 do {
 if (env->kvm_vcpu_dirty) {
 kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
@@ -1136,13 +1134,11 @@ int kvm_cpu_exec(CPUState *env)
  */
 qemu_cpu_kick_self();
 }
-cpu_single_env = NULL;
 qemu_mutex_unlock_iothread();
 
 run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
 
 qemu_mutex_lock_iothread();
-cpu_single_env = env;
 kvm_arch_post_run(env, run);
 
 kvm_flush_coalesced_mmio_buffer();
@@ -1206,7 +1202,6 @@ int kvm_cpu_exec(CPUState *env)
 }
 
 env->exit_request = 0;
-cpu_single_env = NULL;
 return ret;
 }
 
-- 
1.7.3.4
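
The idea behind this patch can be sketched outside of QEMU as follows. It
is a minimal illustration, not the QEMU code: VCPU, current_vcpu and the
two functions are hypothetical stand-ins for CPUState, cpu_single_env,
kvm_cpu_exec() and qemu_kvm_cpu_thread_fn(), and C11 _Thread_local is
assumed in place of DECLARE_TLS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's CPUState. */
typedef struct VCPU {
    int index;
    int runs;
} VCPU;

/* One pointer per thread, playing the role of cpu_single_env. */
static _Thread_local VCPU *current_vcpu;

/* Stand-in for the execution loop: it no longer sets or clears
 * current_vcpu around each run, it only relies on it being valid. */
int vcpu_exec(VCPU *cpu)
{
    cpu->runs++;
    return current_vcpu == cpu;
}

/* Thread entry point (it has the signature pthread_create() expects).
 * With exactly one thread per VCPU, the thread-local pointer can be
 * initialized once here and never touched again. */
void *vcpu_thread_fn(void *arg)
{
    VCPU *cpu = arg;

    assert(cpu != NULL);
    current_vcpu = cpu;            /* set exactly once, during setup */

    for (int i = 0; i < 3; i++) {
        if (!vcpu_exec(cpu)) {
            return NULL;           /* pointer was lost: would be a bug */
        }
    }
    return cpu;
}
```

Because each thread sees its own copy of the variable, no other VCPU thread
can overwrite it, which is what makes the per-iteration updates redundant.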


[PATCH v3 2/9] Remove useless casts from cpu iterators

2012-02-14 Thread Jan Kiszka

CPUState::next_cpu is already CPUState *.

Signed-off-by: Jan Kiszka 
---
 cpus.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/cpus.c b/cpus.c
index d0c8340..4e65894 100644
--- a/cpus.c
+++ b/cpus.c
@@ -853,7 +853,7 @@ static int all_vcpus_paused(void)
 if (!penv->stopped) {
 return 0;
 }
-penv = (CPUState *)penv->next_cpu;
+penv = penv->next_cpu;
 }
 
 return 1;
@@ -867,7 +867,7 @@ void pause_all_vcpus(void)
 while (penv) {
 penv->stop = 1;
 qemu_cpu_kick(penv);
-penv = (CPUState *)penv->next_cpu;
+penv = penv->next_cpu;
 }
 
 while (!all_vcpus_paused()) {
@@ -875,7 +875,7 @@ void pause_all_vcpus(void)
 penv = first_cpu;
 while (penv) {
 qemu_cpu_kick(penv);
-penv = (CPUState *)penv->next_cpu;
+penv = penv->next_cpu;
 }
 }
 }
@@ -889,7 +889,7 @@ void resume_all_vcpus(void)
 penv->stop = 0;
 penv->stopped = 0;
 qemu_cpu_kick(penv);
-penv = (CPUState *)penv->next_cpu;
+penv = penv->next_cpu;
 }
 }
 
-- 
1.7.3.4


[PATCH v3 7/9] kvmvapic: Simplify mp/up_set_tpr

2012-02-14 Thread Jan Kiszka

The CH register is only written, never read. So we can remove these
operations and, in case of up_set_tpr, also the ECX push/pop.

Signed-off-by: Jan Kiszka 
---
 pc-bios/optionrom/kvmvapic.S |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
index e1d8f18..856c1e5 100644
--- a/pc-bios/optionrom/kvmvapic.S
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -202,7 +202,6 @@ mp_isr_is_bigger:
mov %bh, %bl
 mp_tpr_is_bigger:
/* %bl = ppr */
-   mov %bl, %ch   /* ch = ppr */
rol $8, %ebx
/* now: %bl = irr, %bh = ppr */
cmp %bh, %bl
@@ -276,7 +275,6 @@ up_set_tpr_eax:
 up_set_tpr:
pushf
push %eax
-   push %ecx
push %ebx
reenable_vtpr
 
@@ -284,7 +282,7 @@ up_set_tpr_failed:
mov vapic, %eax ; fixup
 
mov %eax, %ebx
-   mov 20(%esp), %bl
+   mov 16(%esp), %bl
 
/* %ebx = new vapic (%bl = tpr, %bh = isr, %b3 = irr) */
 
@@ -298,7 +296,6 @@ up_isr_is_bigger:
mov %bh, %bl
 up_tpr_is_bigger:
/* %bl = ppr */
-   mov %bl, %ch   /* ch = ppr */
rol $8, %ebx
/* now: %bl = irr, %bh = ppr */
cmp %bh, %bl
@@ -306,7 +303,6 @@ up_tpr_is_bigger:
 
 up_set_tpr_out:
pop %ebx
-   pop %ecx
pop %eax
popf
ret $4
-- 
1.7.3.4
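
The change from 20(%esp) to 16(%esp) in up_set_tpr follows from simple
stack arithmetic, sketched below. The function name is made up for this
note; the only assumptions are 32-bit code (each push, including pushf,
takes 4 bytes) and a single stack argument sitting just above the return
address, consistent with the "ret $4" at the end of the routine.

```c
#include <assert.h>

enum { SLOT_BYTES = 4 };   /* size of one 32-bit stack slot */

/* Offset from %esp of the first stack argument, after `pushes` registers
 * have been pushed on top of the caller's return address. */
int stack_arg_offset(int pushes)
{
    assert(pushes >= 0);
    return (pushes + 1) * SLOT_BYTES;   /* +1 for the return address */
}
```

Before the patch there are four pushes (pushf, %eax, %ecx, %ebx), giving
stack_arg_offset(4) == 20; dropping the %ecx push leaves three, giving 16.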


[PATCH v3 5/9] kvmvapic: Add option ROM

2012-02-14 Thread Jan Kiszka

This imports and builds the original VAPIC option ROM of qemu-kvm.
Its interaction with QEMU is described in the commit that introduces the
corresponding device model.

Signed-off-by: Jan Kiszka 
---
 .gitignore   |1 +
 Makefile |2 +-
 pc-bios/optionrom/Makefile   |2 +-
 pc-bios/optionrom/kvmvapic.S |  341 ++
 4 files changed, 344 insertions(+), 2 deletions(-)
 create mode 100644 pc-bios/optionrom/kvmvapic.S

diff --git a/.gitignore b/.gitignore
index f5aab2c..d3b78c3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -75,6 +75,7 @@ pc-bios/vgabios-pq/status
 pc-bios/optionrom/linuxboot.bin
 pc-bios/optionrom/multiboot.bin
 pc-bios/optionrom/multiboot.raw
+pc-bios/optionrom/kvmvapic.bin
 .stgit-*
 cscope.*
 tags
diff --git a/Makefile b/Makefile
index 47acf3d..c2ef135 100644
--- a/Makefile
+++ b/Makefile
@@ -255,7 +255,7 @@ pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \
 pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \
 bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \
 mpc8544ds.dtb \
-multiboot.bin linuxboot.bin \
+multiboot.bin linuxboot.bin kvmvapic.bin \
 s390-zipl.rom \
 spapr-rtas.bin slof.bin \
 palcode-clipper
diff --git a/pc-bios/optionrom/Makefile b/pc-bios/optionrom/Makefile
index 2caf7e6..f6b4027 100644
--- a/pc-bios/optionrom/Makefile
+++ b/pc-bios/optionrom/Makefile
@@ -14,7 +14,7 @@ CFLAGS += -I$(SRC_PATH)
 CFLAGS += $(call cc-option, $(CFLAGS), -fno-stack-protector)
 QEMU_CFLAGS = $(CFLAGS)
 
-build-all: multiboot.bin linuxboot.bin
+build-all: multiboot.bin linuxboot.bin kvmvapic.bin
 
 # suppress auto-removal of intermediate files
 .SECONDARY:
diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
new file mode 100644
index 0000000..e1d8f18
--- /dev/null
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -0,0 +1,341 @@
+#
+# Local APIC acceleration for Windows XP and related guests
+#
+# Copyright 2011 Red Hat, Inc. and/or its affiliates
+#
+# Author: Avi Kivity 
+#
+# This work is licensed under the terms of the GNU GPL, version 2, or (at your
+# option) any later version.  See the COPYING file in the top-level directory.
+#
+
+   .text 0
+   .code16
+.global _start
+_start:
+   .short 0xaa55
+   .byte (_end - _start) / 512
+   # clear vapic area: firmware load using rep insb may cause
+   # stale tpr/isr/irr data to corrupt the vapic area.
+   push %es
+   push %cs
+   pop %es
+   xor %ax, %ax
+   mov $vapic_size/2, %cx
+   lea vapic, %di
+   cld
+   rep stosw
+   pop %es
+   mov $vapic_base, %ax
+   out %ax, $0x7e
+   lret
+
+   .code32
+vapic_size = 2*4096
+
+.macro fixup delta=-4
+777:
+   .text 1
+   .long 777b + \delta  - vapic_base
+   .text 0
+.endm
+
+.macro reenable_vtpr
+   out %al, $0x7e
+.endm
+
+.text 1
+   fixup_start = .
+.text 0
+
+.align 16
+
+vapic_base:
+   .ascii "kvm aPiC"
+
+   /* relocation data */
+   .long vapic_base; fixup
+   .long fixup_start   ; fixup
+   .long fixup_end ; fixup
+
+   .long vapic ; fixup
+   .long vapic_size
+vcpu_shift:
+   .long 0
+real_tpr:
+   .long 0
+   .long up_set_tpr; fixup
+   .long up_set_tpr_eax; fixup
+   .long up_get_tpr_eax; fixup
+   .long up_get_tpr_ecx; fixup
+   .long up_get_tpr_edx; fixup
+   .long up_get_tpr_ebx; fixup
+   .long 0 /* esp. won't work. */
+   .long up_get_tpr_ebp; fixup
+   .long up_get_tpr_esi; fixup
+   .long up_get_tpr_edi; fixup
+   .long up_get_tpr_stack  ; fixup
+   .long mp_set_tpr; fixup
+   .long mp_set_tpr_eax; fixup
+   .long mp_get_tpr_eax; fixup
+   .long mp_get_tpr_ecx; fixup
+   .long mp_get_tpr_edx; fixup
+   .long mp_get_tpr_ebx; fixup
+   .long 0 /* esp. won't work. */
+   .long mp_get_tpr_ebp; fixup
+   .long mp_get_tpr_esi; fixup
+   .long mp_get_tpr_edi; fixup
+   .long mp_get_tpr_stack  ; fixup
+
+.macro kvm_hypercall
+   .byte 0x0f, 0x01, 0xc1
+.endm
+
+kvm_hypercall_vapic_poll_irq = 1
+
+pcr_cpu = 0x51
+
+.align 64
+
+mp_get_tpr_eax:
+   pushf
+   cli
+   reenable_vtpr
+   push %ecx
+
+   fs/movzbl pcr_cpu, %eax
+
+   mov vcpu_shift, %ecx; fixup
+   shl %cl, %eax
+   testb $1, vapic+4(%eax) ; fixup delta=-5
+   jz mp_get_tpr_bad
+   movzbl vapic(%eax), %eax ; fixup
+
+mp_get_tpr_out:
+   pop %ecx
+   popf
+   ret
+
+mp_get_tpr_bad:
+   mov real_tpr, %eax  ; fixup
+   mov (%eax), %eax
+   jmp mp_get_tpr_out
+
+mp_get_tpr_ebx:
+   mov %eax, %ebx
+   call mp_get_tpr_eax
+   xchg %eax, %ebx
+   ret
+
+mp_get_tpr_ecx:
+   mov %eax, %ecx
+   call mp_get_tpr_eax
+   xchg %eax, %ecx
+   ret
+
+mp_get_tpr_edx:
+   mov %eax, %edx
+   call mp_get_tpr_eax
+

Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread cody


On 02/14/2012 07:03 PM, Yang Bai wrote:

Hi all,

Since on X86, bios is always at the end of the address space, so I
have some thought about how to implement the seabios support for kvm
tool.

1. using kvm__register_mem to map the end of address space to the
guest then copy the code of seabios to this mem region. Just emulating
the bios chip.

2. leave the bios code alone and don't touch the guest's address
space. If the guest accesses the address belonging to the bios, it
will be an IO request and we can emulate the IO access to the bios
chip.

Any ideas about this?
   
Can I ask what the purpose of mapping the BIOS code into the guest is? Is
it used for anything? Shouldn't the BIOS's behavior be emulated by the
hypervisor? Thanks.


-cody

And question:  How could I set the first instruction address after we
issue the vmlaunch instruction?

Thanks,
Yang

[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-14 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #28 from Avi Kivity   2012-02-14 14:47:38 ---
(In reply to comment #27)
> and there soon will be video capture with 'perf top'
> 
> http://vbox7.com/play:199e9ede30

Run it while the guest is also running.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.

Re: KVM call agenda for Tuesday 14

2012-02-14 Thread Juan Quintela

Juan Quintela  wrote:
> Hi
>
> Please send in any agenda items you are interested in covering.

As there are no topics, call is cancelled.

Happy hacking,

Juan.

PS. You should use the extra time to draw a qemu mascot O:-)

> Cheers,
>
> Juan.

Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub

2012-02-14 Thread Michael S. Tsirkin

On Tue, Feb 14, 2012 at 01:47:59PM +, Paul Brook wrote:
> > > > Now an OS can have a standard driver and use it
> > > > to activate hotplug functionality. This is OS hotplug (OSHP).
> > > 
> > > So presumably this will work on targets that don't have ACPI?
> > > Assuming a competent guest OS of course.  Have you tested this?
> > 
> > This being the qemu side of things? I run Linux
> > and verified that it calls OSHP and afterwards,
> > runs the native driver and handles hotplug/unplug
> > without invoking ACPI at all.
> 
> I mean using your shiny new hotplug PCI-PCI bridge on arm/ppc/mips targets 
> (i.e anything other than x86 PC).  From your description it sounds like it 
> *should* work.
>  
> > It seems that at least the SHPC driver in linux
> > doesn't work if you don't have an acpi table
> > with the OSHP method - not many people run with acpi=off
> > nowdays, so it's probably just a bug.
> > I'll check how hard it is to fix this.
> 
> Targets other than x86 don't have ACPI to start with.
> 
> Paul

So

#ifdef CONFIG_ACPI
#include 
static inline int get_hp_hw_control_from_firmware(struct pci_dev *dev)
{
	u32 flags = OSC_SHPC_NATIVE_HP_CONTROL;
	return acpi_get_hp_hw_control_from_firmware(dev, flags);
}
#else
#define get_hp_hw_control_from_firmware(dev) (0)
#endif

So if you build your guest without acpi, things should work fine.
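
The pattern in that snippet, reduced to a self-contained sketch: when the
config option is off, the firmware query collapses to a constant 0 and the
native driver proceeds unconditionally. All names below (fake_dev,
FAKE_CONFIG_ACPI, get_hp_hw_control, shpc_probe) are hypothetical and only
mirror the shape of the kernel code; the meaning of the return value is
likewise an assumption.

```c
#include <assert.h>

/* Hypothetical device; the flag models firmware that keeps hotplug. */
struct fake_dev {
    int firmware_owns_hotplug;
};

#ifdef FAKE_CONFIG_ACPI
/* With firmware support built in, ask the firmware for control.
 * Non-zero is assumed to mean "control was not granted". */
static inline int get_hp_hw_control(struct fake_dev *dev)
{
    return dev->firmware_owns_hotplug ? -1 : 0;
}
#else
/* Without it, there is nobody to ask: always report success. */
#define get_hp_hw_control(dev) (0)
#endif

/* Probe succeeds (returns 0) only if control of hotplug was obtained. */
int shpc_probe(struct fake_dev *dev)
{
    assert(dev);
    if (get_hp_hw_control(dev) != 0) {
        return -1;
    }
    return 0;
}
```

Built without FAKE_CONFIG_ACPI, shpc_probe() succeeds regardless of what
the firmware would have said, which matches the claim above.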

-- 
MST

Re: Win 2000 driver for -vga std ?

2012-02-14 Thread Reeted


On 02/14/12 07:25, Michael Tokarev wrote:

On 14.02.2012 05:42, Reeted wrote:

Hello, subject says it all

The driver for windows 2000 for the -vga std should be the Anapa VBE 
Vesa VBEMP if I understand correctly

but I cannot find this executable anywhere
http://navozhdeniye.narod.ru/vbemp.htm
all download links all over the world are dangling!

Has anybody kept a copy of this very important driver?


This "adapter" works in all versions of windows with a built-in
vesa driver just fine, no replacement is necessary or desired.

The only problem is that some versions of windows consider that
driver to be "problematic" somehow and mark the corresponding
device with yellow exclamation sign.  Go ask M$ about this.


I don't think so...

It detects new hardware (I am virtualizing an existing machine) and asks
me where to look for a driver. I point it at the Win2000 installation CD
and at Windows Update online, but it says it cannot find a driver for this
video adapter. It then asks whether I want to disable the device or be
prompted again for installation at the next boot.


And it keeps running at 800x600 with 16 colors (4-bit depth), with very
poor performance when moving windows around.


Re: performance trouble

2012-02-14 Thread Vadim Rozenfeld



- Original Message -
From: "Avi Kivity" 
To: "David Cure" 
Cc: kvm@vger.kernel.org, "Vadim Rozenfeld" 
Sent: Tuesday, February 14, 2012 3:32:16 PM
Subject: Re: performance trouble

On 02/10/2012 12:09 PM, David Cure wrote:
>   hello,
>
> On Sun, Feb 05, 2012 at 11:38:34AM +0200, Avi Kivity wrote:
> > 
> > Please post a trace as documented in http://www.linux-kvm.org/page/Tracing.
>
>   I made the trace: it was started just before launching the slow function
> and stopped just after. I started only one VM with 2 vcpus/16G RAM and only
> one user connected to the VM to launch the test.
>
>   The trace file is too big to post here, so I gzipped it and the file
> is available here: http://www.roullier.net/report.txt.gz
>
>   I hope you can find something strange.
>

It's reading the HPET like crazy.  There are also tons of interrupts. 
Please use the windows performance tools to see which devices trigger
these interrupts.

[VR]
+1 
Try Microsoft Windows Performance Toolkit from Windows SDK 
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=3138
It's really good.


The HPET issue will be fixed by the hyper-V enlightenments, but these
will take some time to cook.

You can also try vhost-net to improve networking latency.

-- 
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub

2012-02-14 Thread Paul Brook

> > > Now an OS can have a standard driver and use it
> > > to activate hotplug functionality. This is OS hotplug (OSHP).
> > 
> > So presumably this will work on targets that don't have ACPI?
> > Assuming a competent guest OS of course.  Have you tested this?
> 
> This being the qemu side of things? I run Linux
> and verified that it calls OSHP and afterwards,
> runs the native driver and handles hotplug/unplug
> without invoking ACPI at all.

I mean using your shiny new hotplug PCI-PCI bridge on arm/ppc/mips targets 
(i.e anything other than x86 PC).  From your description it sounds like it 
*should* work.
 
> It seems that at least the SHPC driver in linux
> doesn't work if you don't have an acpi table
> with the OSHP method - not many people run with acpi=off
> nowdays, so it's probably just a bug.
> I'll check how hard it is to fix this.

Targets other than x86 don't have ACPI to start with.

Paul

Re: performance trouble

2012-02-14 Thread Gleb Natapov

On Tue, Feb 14, 2012 at 03:32:16PM +0200, Avi Kivity wrote:
> On 02/10/2012 12:09 PM, David Cure wrote:
> > hello,
> >
> > On Sun, Feb 05, 2012 at 11:38:34AM +0200, Avi Kivity wrote:
> > > 
> > > Please post a trace as documented in 
> > > http://www.linux-kvm.org/page/Tracing.
> >
> > I made the trace: it was started just before launching the slow function
> > and stopped just after. I started only one VM with 2 vcpus/16G RAM and
> > only one user connected to the VM to launch the test.
> >
> > The trace file is too big to post here, so I gzipped it and the file
> > is available here: http://www.roullier.net/report.txt.gz
> >
> > I hope you can find something strange.
> >
> 
> It's reading the HPET like crazy.  There are also tons of interrupts. 
> Please use the windows performance tools to see which devices trigger
> these interrupts.
> 
> The HPET issue will be fixed by the hyper-V enlightenments, but these
> will take some time to cook.
> 
Try to add -no-hpet to qemu command line and see if it helps.

> You can also try vhost-net to improve networking latency.
> 
> -- 
> error compiling committee.c: too many arguments to function
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Gleb.

Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub

2012-02-14 Thread Michael S. Tsirkin

On Tue, Feb 14, 2012 at 12:49:08PM +, Paul Brook wrote:
> > > In a nutshell, I don't know what a SHPC is (nor OSHP), so I'm looking
> > > for an additional Ack.
> > 
> > No problem, I'll get an Ack :)
> > Meanwhile - here's a summary, as far as I understand it.
> > 
> > Originally PCI SIG only defined the electrical
> > and mechanical requirements from hotplug, no standard
> > software interface. So it needed ACPI to drive device-specific registers
> > to actually do hotplug.
> > At some point PCISIG defined standard interfaces
> > for PCI hotplug. There are two of them: standard
> > hot plug controller (SHPC) for PCI and PCIE hotplug
> > for Express.
> > 
> > Now an OS can have a standard driver and use it
> > to activate hotplug functionality. This is OS hotplug (OSHP).
> 
> So presumably this will work on targets that don't have ACPI?
> Assuming a competent guest OS of course.  Have you tested this?
> 
> Paul

This being the qemu side of things? I run Linux
and verified that it calls OSHP and afterwards,
runs the native driver and handles hotplug/unplug
without invoking ACPI at all.

It seems that at least the SHPC driver in linux
doesn't work if you don't have an acpi table
with the OSHP method - not many people run with acpi=off
nowadays, so it's probably just a bug.
I'll check how hard it is to fix this.

-- 
MST

Re: performance trouble

2012-02-14 Thread Avi Kivity

On 02/10/2012 12:09 PM, David Cure wrote:
>   hello,
>
> On Sun, Feb 05, 2012 at 11:38:34AM +0200, Avi Kivity wrote:
> > 
> > Please post a trace as documented in http://www.linux-kvm.org/page/Tracing.
>
>   I made the trace: it was started just before launching the slow function
> and stopped just after. I started only one VM with 2 vcpus/16G RAM and only
> one user connected to the VM to launch the test.
>
>   The trace file is too big to post here, so I gzipped it and the file
> is available here: http://www.roullier.net/report.txt.gz
>
>   I hope you can find something strange.
>

It's reading the HPET like crazy.  There are also tons of interrupts. 
Please use the windows performance tools to see which devices trigger
these interrupts.

The HPET issue will be fixed by the hyper-V enlightenments, but these
will take some time to cook.

You can also try vhost-net to improve networking latency.

-- 
error compiling committee.c: too many arguments to function


Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Cyrill Gorcunov

On Tue, Feb 14, 2012 at 09:20:18PM +0800, Yang Bai wrote:
> And will seabios replace the present bios implementation or co-exist with it?

Ideally we should get rid of our minibios completely and only have
seabios here instead.

Cyrill

Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub

2012-02-14 Thread Paul Brook

> > In a nutshell, I don't know what a SHPC is (nor OSHP), so I'm looking
> > for an additional Ack.
> 
> No problem, I'll get an Ack :)
> Meanwhile - here's a summary, as far as I understand it.
> 
> Originally PCI SIG only defined the electrical
> and mechanical requirements from hotplug, no standard
> software interface. So it needed ACPI to drive device-specific registers
> to actually do hotplug.
> At some point PCISIG defined standard interfaces
> for PCI hotplug. There are two of them: standard
> hot plug controller (SHPC) for PCI and PCIE hotplug
> for Express.
> 
> Now an OS can have a standard driver and use it
> to activate hotplug functionality. This is OS hotplug (OSHP).

So presumably this will work on targets that don't have ACPI?
Assuming a competent guest OS of course.  Have you tested this?

Paul

Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-14 Thread jamal

On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote:

> The use case here is multiple VFs but the same solution should work with
> multiple PFs as well. FDB controls should be independent of how the ports
> are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc.

Makes sense.

> With events and ADD/DEL/GET FDB controls we can solve both cases. This also
> solves Roopa's case with macvlan where he wants to add additional addresses
> to macvlan ports.

Not familiar with that issue - I'll prowl the list.

> Yes it should flood here, unless its acting as a 802.1Qbg VEB or VEPA.

Ok. So there is a toggle somewhere which controls how flooding should
happen.

> 
> Maybe not. But the kernel already has the needed signals with one extra
> hook we can save running a daemon in user space. Maybe that's not a great
> argument to add kernel code though.

You make a reasonable argument to have it in the kernel but I think we
win more if we separate the control. So while I empathize, I am hoping
that you'd go with the path that is hard to travel ;->

> The PF_BRIDGE:RTM_GETNEIGH,RTM_NEWNEIGH,RTM_DELNEIGH are registered in the
> br_netlink_init() path. 

Hrm - hadn't paid attention to that before. Nasty.
The bridge seems to be hard-coding policy on station movement, no? 
This is a good example of the qualms I have about adding things to the
kernel ;->
I may not want to auto-update a MAC address moving ports as part of
some policy I have. I can go and add YAK (Yet Another Knob) - but where
is the line drawn?

cheers,
jamal


Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Cyrill Gorcunov

On Tue, Feb 14, 2012 at 01:10:59PM +0200, Pekka Enberg wrote:
> On Tue, Feb 14, 2012 at 1:03 PM, Yang Bai  wrote:
> > Since on X86, bios is always at the end of the address space, so I
> > have some thought about how to implement the seabios support for kvm
> > tool.
> >
> > 1. using kvm__register_mem to map the end of address space to the
> > guest then copy the code of seabios to this mem region. Just emulating
> > the bios chip.

I think this is what should be done.

> >
> > 2. leave the bios code alone and don't touch the guest's address
> > space. If the guest accesses the address belonging to the bios, it
> > will be an IO request and we can emulate the IO access to the bios
> > chip.
> >
> > Any ideas about this?
> 
> The latter solution doesn't make any sense to me. Cyrill, do we really
> need to put the BIOS at the end of the address space? Don't we have
> unused space below 1 MB?

I don't remember for sure how SeaBIOS actually works. What I remember
is that it acquires all the hw environment a machine might have. So
without a real look into the seabios code I fear I can't answer. But
reserving the end of the 4G address space for a bios copy sounds
reasonable if we are going to behave like real hardware. Maybe we could
poke someone from the KVM camp for a hint?
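
To make "the end of the 4G address space" concrete, here is a small sketch
of the address arithmetic for option 1. It is an illustration under stated
assumptions, not kvm tool code: the function names are made up, and the
128 KiB legacy window just below 1 MB is an assumption borrowed from how PC
firmware is commonly shadowed, not something settled in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIB       0x100000000ULL
#define ONE_MIB        0x100000ULL
#define LEGACY_WINDOW  0x20000ULL   /* 128 KiB below 1 MB (assumption) */

/* Guest-physical base of a BIOS image that ends exactly at 4 GiB. */
uint64_t bios_high_base(uint64_t size)
{
    assert(size > 0 && size <= FOUR_GIB);
    return FOUR_GIB - size;
}

/* How much of the image tail is also visible below 1 MB. */
uint64_t bios_legacy_size(uint64_t size)
{
    return size < LEGACY_WINDOW ? size : LEGACY_WINDOW;
}

/* Guest-physical base of that legacy copy; it ends exactly at 1 MB. */
uint64_t bios_legacy_base(uint64_t size)
{
    return ONE_MIB - bios_legacy_size(size);
}
```

For a 128 KiB image this gives 0xFFFE0000 for the high mapping and 0xE0000
for the low copy; the two ranges would then be registered as guest memory
(for instance with kvm__register_mem) and filled with the image.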

Cyrill

Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Pekka Enberg

On Tue, Feb 14, 2012 at 1:03 PM, Yang Bai  wrote:
> Since on X86, bios is always at the end of the address space, so I
> have some thought about how to implement the seabios support for kvm
> tool.
>
> 1. using kvm__register_mem to map the end of address space to the
> guest then copy the code of seabios to this mem region. Just emulating
> the bios chip.
>
> 2. leave the bios code alone and don't touch the guest's address
> space. If the guest accesses the address belonging to the bios, it
> will be an IO request and we can emulate the IO access to the bios
> chip.
>
> Any ideas about this?

The latter solution doesn't make any sense to me. Cyrill, do we really
need to put the BIOS at the end of the address space? Don't we have
unused space below 1 MB?

> And question:  How could I set the first instruction address after we
> issue the vmlaunch instruction?

You need to set ->boot_ip and friends. See
tools/kvm/x86/kvm.c::load_bzimage() for an example.
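
As a rough sketch of what such boot registers encode: in real mode the
first instruction is fetched from selector * 16 + ip. The struct and helper
names below are hypothetical and are not the kvm tool's fields; they only
show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of real-mode boot registers. */
struct boot_regs {
    uint16_t selector;
    uint16_t ip;
};

/* Linear address the CPU fetches from in real mode. */
uint32_t real_mode_linear(uint16_t selector, uint16_t ip)
{
    return ((uint32_t)selector << 4) + ip;
}

/* Split a linear entry point into selector:ip for a chosen selector.
 * The entry point must lie within that selector's 64 KiB segment. */
struct boot_regs boot_regs_for(uint32_t linear, uint16_t selector)
{
    struct boot_regs r;
    uint32_t base = (uint32_t)selector << 4;

    assert(linear >= base && linear - base <= 0xffff);
    r.selector = selector;
    r.ip = (uint16_t)(linear - base);
    return r;
}
```

For example, selector 0xf000 with ip 0xfff0 addresses 0xffff0, the last 16
bytes below 1 MB. The values would then be loaded into the VCPU before it
first runs, presumably through the KVM_SET_SREGS and KVM_SET_REGS ioctls.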

Pekka

The way of mapping BIOS into the guest's address space

2012-02-14 Thread Yang Bai

Hi all,

On x86, the BIOS is always at the end of the address space, so I
have some thoughts about how to implement SeaBIOS support for the kvm
tool.

1. using kvm__register_mem to map the end of address space to the
guest then copy the code of seabios to this mem region. Just emulating
the bios chip.

2. leave the bios code alone and don't touch the guest's address
space. If the guest accesses the address belonging to the bios, it
will be an IO request and we can emulate the IO access to the bios
chip.

Any ideas about this?

And a question: how can I set the address of the first instruction that
runs after we issue the vmlaunch instruction?

Thanks,
Yang

Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-14 Thread Paolo Bonzini


On 02/14/2012 02:11 AM, Michael S. Tsirkin wrote:
> On Tue, Feb 14, 2012 at 11:49:55AM +1100, ronnie sahlberg wrote:
>> By just exposing this device to the kernel, the kernel keeps sending,
>> or if not the kernel maybe some other process trying to poll the
>> status?
>>
>> every few seconds :
>> PREVENT_ALLOW_MEDIUM_REMOVAL  prevent removal
>> PREVENT_ALLOW_MEDIUM_REMOVAL  to immediately change it back to allow
>> removal again
>> TEST_UNIT_READY
>>
>> After I run this
>> mount /dev/sdd1 /mnt
>>
>> The kernel sends a single
>> PREVENT_ALLOW_MEDIUM_REMOVAL to prevent removal
>> then every few seconds a
>> TEST_UNIT_READY

Sorry to interrupt you again guys, but: the discussion started with 
virtio-blk hotplug and now we're talking about SCSI commands?  Sure 
somebody switched topic at some point :) and anyway this is irrelevant 
to what virtio-blk can/cannot do.


BTW, for virtio-scsi the spec provides a way to do hotplug and hotunplug 
without any polling, though it's not implemented yet in the driver.


Paolo

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-14 Thread Avi Kivity

On 02/10/2012 07:16 PM, Marcelo Tosatti wrote:
> On Thu, Feb 09, 2012 at 04:25:36PM +0200, Avi Kivity wrote:
> > On 02/08/2012 08:45 PM, Marcelo Tosatti wrote:
> > > > BTW do we really need fast slot creation/destruction?
> > >
> > > At the moment yes. Boot a RHEL/Fedora installation disk (or any other
> > > guest which uses SYSLINUX splash screen) and you will see.
> > 
> > Another workload that suffers is Windows XP clearing the screen during boot.
> > 
> > >  That
> > > particular case is a limitation of cirrus in QEMU, ideally it should be
> > > optimized there.
> > 
> > Why do you say that?
>
> There is no fundamental need to create/destroy the 0xa VGA memory
> slot repeatedly.

If the guest writes to it, then the need exists.

> But you are right that the aim should be decent performance
> nevertheless.


-- 
error compiling committee.c: too many arguments to function


Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-14 Thread Avi Kivity

On 02/10/2012 03:25 PM, Takuya Yoshikawa wrote:
> Avi Kivity  wrote:
>
> > > 2. When we create(and shift?) a memory slot, we call 
> > > kvm_arch_flush_shadow()
> > > to clear all mmio sptes, again not restricted to that slot.
> > >
> > >   /*
> > >    * If the new memory slot is created, we need to clear all
> > >    * mmio sptes.
> > >    */
> > >   if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
> > >           kvm_arch_flush_shadow(kvm);
> > 
> > This is pretty rare outside the previous scenario (memory/pci hotplug).
>
> Is this condition correct?
>
> When npages != 0 and old.npages == 0, the slot is being newly created, do we
> really need to flush shadow pages?
>
> This should be
>   if (npages && old.npages && (old.base_gfn != base_gfn))
>

Your condition is more correct, but in practice there's no difference. 
If old.npages == 0, then old.base_gfn will be 0, and the condition will
fail, except for the first slot created (when the shadow cache is empty
anyway).
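
To make the comparison concrete, here is a minimal sketch of the two conditions as plain predicates. It is a simplified model, not the kernel code: struct old_slot and the function names are hypothetical stand-ins, and base_gfn stands for mem->guest_phys_addr >> PAGE_SHIFT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Simplified stand-in for the old slot; only the fields used here. */
struct old_slot {
	gfn_t base_gfn;
	unsigned long npages;
};

/* Condition as quoted from the existing code. */
static bool flush_current(unsigned long npages, struct old_slot old, gfn_t base_gfn)
{
	return npages && old.base_gfn != base_gfn;
}

/* Condition as proposed above, with the extra old.npages test. */
static bool flush_proposed(unsigned long npages, struct old_slot old, gfn_t base_gfn)
{
	return npages && old.npages && old.base_gfn != base_gfn;
}

/* Self-check: with a non-empty old slot the two conditions always agree. */
static void check_agreement(struct old_slot old, unsigned long npages, gfn_t base_gfn)
{
	assert(old.npages != 0);
	assert(flush_current(npages, old, base_gfn) ==
	       flush_proposed(npages, old, base_gfn));
}
```

In this model the two predicates can only differ when old.npages == 0, and then the result of the current condition depends solely on whether the new base_gfn equals the (zero) old.base_gfn.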

-- 
error compiling committee.c: too many arguments to function


Re: AESNI and guest hosts

2012-02-14 Thread Ryan Brown

Sorry for being a noob here. Any clues on this, anyone?

On Mon, Feb 13, 2012 at 2:05 AM, Ryan Brown  wrote:
> The host/KVM server is running Linux 3.2.4 (Debian wheezy), and the guest
> kernel is 3.2.5. The CPU is an E3-1230, but for some reason
> it's not able to supply the guest with aesni. Is there a config option,
> or is there something we're missing?
>
>    
>     x86_64
>     Westmere
>     Intel
>     
>     
>     
>     
>     
>     
>     
>     
>     
>     
>     
>     
>     
>     
>     
>    
>  
>
> Guest:
> [root@fanboy:~]# cat /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 2
> model name      : QEMU Virtual CPU version 1.0
> stepping        : 3
> microcode       : 0x1
> cpu MHz         : 3192.748
> cache size      : 4096 KB
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 4
> wp              : yes
> flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
> cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
> hypervisor lahf_lm
> bogomips        : 6385.49
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 40 bits physical, 48 bits virtual
> power management:
>
> processor       : 1
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 2
> model name      : QEMU Virtual CPU version 1.0
> stepping        : 3
> microcode       : 0x1
> cpu MHz         : 3192.748
> cache size      : 4096 KB
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 4
> wp              : yes
> flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
> cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
> hypervisor lahf_lm
> bogomips        : 6385.49
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 40 bits physical, 48 bits virtual
> power management:
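
For reference, Linux reports AES-NI as the "aes" token on the flags line of /proc/cpuinfo, and that token is absent from the guest output quoted above. A minimal sketch of a whole-token check follows; the function name has_cpu_flag is hypothetical, and the caller is assumed to pass the text after "flags :".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `flag` occurs as a whole whitespace-separated token in `flags`. */
static bool has_cpu_flag(const char *flags, const char *flag)
{
	assert(flags != NULL && flag != NULL);

	size_t len = strlen(flag);
	if (len == 0)
		return false;

	const char *p = flags;
	while ((p = strstr(p, flag)) != NULL) {
		/* The match must start at the beginning or right after whitespace ... */
		bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == '\n';
		/* ... and must end at the end of the string or right before whitespace. */
		char end = p[len];
		bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';
		if (starts && ends)
			return true;
		p++; /* keep scanning after a partial match such as "sse" inside "sse2" */
	}
	return false;
}
```

Called with the guest flags line above, has_cpu_flag(flags, "aes") returns false, while the same check on the host would be expected to return true for an AES-NI capable CPU.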

Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-14 Thread Gleb Natapov

On Tue, Feb 14, 2012 at 09:55:46AM +0100, Jan Kiszka wrote:
> On 2012-02-14 08:54, Gleb Natapov wrote:
> > On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
> >>>> Unfortunately, this is only an internal structure, not officially
> >>>> documented by MS. However, all supported OS versions are legacy by now, no
> >>>> longer changing its structure.
> >>>
> >>> This and a note about the supported OS versions could be added as comment.
> >>
> >> OK.
> >>
> >> For the folks that developed it in qemu-kvm: This targets Windows XP,
> >> Vista and Server 2003, all 32-bit, right?
> >>
> > Not Vista. Not sure about Server 2003.
> 
> I think I saw some 2003 reference in the qemu-kvm git logs.
> 
Very likely. AFAIK it uses the same kernel as XP.

--
Gleb.

Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-14 Thread Jan Kiszka

On 2012-02-14 08:54, Gleb Natapov wrote:
> On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
>>>> Unfortunately, this is only an internal structure, not officially
>>>> documented by MS. However, all supported OS versions are legacy by now, no
>>>> longer changing its structure.
>>>
>>> This and a note about the supported OS versions could be added as comment.
>>
>> OK.
>>
>> For the folks that developed it in qemu-kvm: This targets Windows XP,
>> Vista and Server 2003, all 32-bit, right?
>>
> Not Vista. Not sure about Server 2003.

I think I saw some 2003 reference in the qemu-kvm git logs.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: Kernel mode VGAs?

2012-02-14 Thread Jan Kiszka

On 2012-02-14 08:12, Gerhard Wiesinger wrote:
> Hello,
> 
> The current QEMU-KVM VGA implementation has the following problem with
> legacy OSes (e.g. DOS with INT 10h calls): performance is low when
> accessing the A000:0 page and doing bank switching on the 64k page.

Do we already understand the mode and access patterns here? Which VGA
adapter? Cirrus, standard, or any? What is the concrete test case (one
that won't require me digging for MS Dose floppy disks in my basement)?

> 
> Would a kernel mode VGA solve these problems?
> How complicated is it?
> Is it possible to have only some parts in kernel mode?
> Any further ideas or suggestions?

Provided we take heavy exits so far, in-kernel acceleration may reduce
the exit overhead by a factor of, hmm, maybe 3-4. Better is to avoid exits
completely, i.e. switch the region to RAM mode. But that depends on the
graphics mode, and I'm afraid we have already covered all that can be
mapped like this.

In any case, before discussing solutions, we need to analyze the problem.

Jan
