date:20100419

Re: [BUG] kvm: dereference srcu-protected pointer without srcu_read_lock() held

2010-04-19 Thread Lai Jiangshan

Marcelo Tosatti wrote:
> On Mon, Apr 19, 2010 at 01:08:29PM +0300, Avi Kivity wrote:
>> On 04/19/2010 12:58 PM, Lai Jiangshan wrote:
>>> Applied the patch I just sent and let CONFIG_PROVE_RCU=y,
>>> we can got the following dmesg. And we found that it is
>>> because some codes in KVM dereferences srcu-protected pointer without
>>> srcu_read_lock() held or update-side lock held.
>>>
>>> It is not hard to fix, the problem is that:
>>> Where is the most proper place to put a srcu_read_lock()?
>>>
>>> I can not determine the answer, so I report this bug
>>> instead of fixing it.
>>>
>> I think the else branch in complete_pio() should work.  Marcelo?
>>
>> Longer term I'd like to see the lock taken at the high levels
>> (ioctls, in virt/kvm) and dropped only for guest entry and when we
>> explicitly sleep (hlt emulation).
>>
>> Note: complete_pio() is gone in the current code.
> 
> Yes, this was fixed by 7fb2ea1e6.
> 
> 

Applied the patch I sent yesterday and let CONFIG_PROVE_RCU=y
I can get the following dmesg.

Under very simple test, these is no complaint from PROVE_RCU
after this patch applied.

More test or reviewing of code are need in future.

--
Subject: [PATCH] kvm: add missing srcu_read_lock()

I got this dmesg due to srcu_read_lock() is missing in
kvm_mmu_notifier_release().

===
[ INFO: suspicious rcu_dereference_check() usage. ]
---
arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/3100:
 #0:  (rcu_read_lock){.+.+..}, at: [] 
__mmu_notifier_release+0x38/0xdf
 #1:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] 
kvm_mmu_zap_all+0x21/0x5e [kvm]

stack backtrace:
Pid: 3100, comm: qemu-system-x86 Not tainted 2.6.34-rc3-22949-gbc8a97a-dirty #2
Call Trace:
 [] lockdep_rcu_dereference+0xaa/0xb3
 [] unalias_gfn+0x56/0xab [kvm]
 [] gfn_to_memslot+0x16/0x25 [kvm]
 [] gfn_to_rmap+0x17/0x6e [kvm]
 [] rmap_remove+0xa0/0x19d [kvm]
 [] kvm_mmu_zap_page+0x109/0x34d [kvm]
 [] kvm_mmu_zap_all+0x35/0x5e [kvm]
 [] kvm_arch_flush_shadow+0x16/0x22 [kvm]
 [] kvm_mmu_notifier_release+0x15/0x17 [kvm]
 [] __mmu_notifier_release+0x88/0xdf
 [] ? __mmu_notifier_release+0x38/0xdf
 [] ? exit_mm+0xe0/0x115
 [] exit_mmap+0x2c/0x17e
 [] mmput+0x2d/0xd4
 [] exit_mm+0x108/0x115
[...]

Signed-off-by: Lai Jiangshan 
---
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a5dfea1..a6d639d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -341,7 +341,11 @@ static void kvm_mmu_notifier_release(struct mmu_notifier 
*mn,
 struct mm_struct *mm)
 {
struct kvm *kvm = mmu_notifier_to_kvm(mn);
+   int idx;
+
+   idx = srcu_read_lock(&kvm->srcu);
kvm_arch_flush_shadow(kvm);
+   srcu_read_unlock(&kvm->srcu, idx);
 }
 
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Zhang, Yanmin

On Tue, 2010-04-20 at 08:09 +0200, Ingo Molnar wrote:
> * Ingo Molnar  wrote:
> 
> > 
> > * Zhang, Yanmin  wrote:
> > 
> > >  unsigned long perf_misc_flags(struct pt_regs *regs)
> > >  {
> > >   int misc = 0;
> > > +
> > >   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> > > + if (perf_guest_cbs->is_user_mode())
> > > + misc |= PERF_RECORD_MISC_GUEST_USER;
> > > + else
> > > + misc |= PERF_RECORD_MISC_GUEST_KERNEL;
> > > + } else if (user_mode(regs))
> > > + misc |= PERF_RECORD_MISC_USER;
> > > + else
> > > + misc |= PERF_RECORD_MISC_KERNEL;
> > > +
> > 
> > We try to use balanced curly braces. I.e.:
> > 
> > if (x) {
> > boo();
> > } else {
> > if (y)
> > foo();
> > else
> > bar();
> > }   
> > 
> > And avoid unbalanced ones:
> > 
> > if (x) {
> > boo();
> > } else
> > if (y)
> > foo();
> > else
> > bar();
> 
> Note, i fixed this in the patch and applied it to perf/core. (the invalid-C 
> problem was causing build failures)
Thanks for your teaching. Originally, I used {} in the 2nd half, but deleted it.

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Ingo Molnar


* Ingo Molnar  wrote:

> 
> * Zhang, Yanmin  wrote:
> 
> >  unsigned long perf_misc_flags(struct pt_regs *regs)
> >  {
> > int misc = 0;
> > +
> > if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> > +   if (perf_guest_cbs->is_user_mode())
> > +   misc |= PERF_RECORD_MISC_GUEST_USER;
> > +   else
> > +   misc |= PERF_RECORD_MISC_GUEST_KERNEL;
> > +   } else if (user_mode(regs))
> > +   misc |= PERF_RECORD_MISC_USER;
> > +   else
> > +   misc |= PERF_RECORD_MISC_KERNEL;
> > +
> 
> We try to use balanced curly braces. I.e.:
> 
>   if (x) {
>   boo();
>   } else {
>   if (y)
>   foo();
>   else
>   bar();
>   }   
> 
> And avoid unbalanced ones:
> 
>   if (x) {
>   boo();
>   } else
>   if (y)
>   foo();
>   else
>   bar();

Note, i fixed this in the patch and applied it to perf/core. (the invalid-C 
problem was causing build failures)

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Ingo Molnar


* Zhang, Yanmin  wrote:

>  unsigned long perf_misc_flags(struct pt_regs *regs)
>  {
>   int misc = 0;
> +
>   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> + if (perf_guest_cbs->is_user_mode())
> + misc |= PERF_RECORD_MISC_GUEST_USER;
> + else
> + misc |= PERF_RECORD_MISC_GUEST_KERNEL;
> + } else if (user_mode(regs))
> + misc |= PERF_RECORD_MISC_USER;
> + else
> + misc |= PERF_RECORD_MISC_KERNEL;
> +

We try to use balanced curly braces. I.e.:

if (x) {
boo();
} else {
if (y)
foo();
else
bar();
}   

And avoid unbalanced ones:

if (x) {
boo();
} else
if (y)
foo();
else
bar();

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM test: Silence screendump thread by default

2010-04-19 Thread Michael Goldish

On 04/16/2010 09:12 PM, Lucas Meneghel Rodrigues wrote:
> The VM screendump thread recently introduced generates
> a lot of output on debug logs. Such output is not needed
> most of the time (we are interested to see if a screenshot
> production attempt failed though) and distracts the user
> from other more important info.
> 
> So let's add an additional parameter on send_monitor_cmd
> that specifies if we actually want the monitor command logged,
> defaulting it to True so the rest of the callers won't have
> any change on their behavior. The screendump thread will
> call send_monitor_cmd with verbose=False unless in the
> configuration file one sets screendump_verbose=yes (defaults
> to no on the sample confifg file).
> 
> Signed-off-by: Lucas Meneghel Rodrigues 
> ---
>  client/tests/kvm/kvm_preprocessing.py  |   10 +-
>  client/tests/kvm/kvm_vm.py |7 +--
>  client/tests/kvm/tests_base.cfg.sample |1 +
>  3 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/client/tests/kvm/kvm_preprocessing.py 
> b/client/tests/kvm/kvm_preprocessing.py
> index 50db65c..4b9290c 100644
> --- a/client/tests/kvm/kvm_preprocessing.py
> +++ b/client/tests/kvm/kvm_preprocessing.py
> @@ -401,6 +401,10 @@ def _take_screendumps(test, params, env):
>   kvm_utils.generate_random_string(6))
>  delay = float(params.get("screendump_delay", 5))
>  quality = int(params.get("screendump_quality", 30))
> +if params.get("screendump_verbose") == 'yes':
> +screendump_verbose = True
> +else:
> +screendump_verbose = False

Why not:

screendump_verbose = params.get("screendump_verbose") == "yes"

>  cache = {}
>  
> @@ -408,7 +412,11 @@ def _take_screendumps(test, params, env):
>  for vm in kvm_utils.env_get_all_vms(env):
>  if vm.is_dead():
>  continue
> -vm.send_monitor_cmd("screendump %s" % temp_filename)
> +if screendump_verbose:
> +vm.send_monitor_cmd("screendump %s" % temp_filename)
> +else:
> +vm.send_monitor_cmd("screendump %s" % temp_filename,
> +verbose=False)

Also, why not:

vm.send_monitor_cmd("screendump %s" % temp_filename,
verbose=screendump_verbose)

>  if not os.path.exists(temp_filename):
>  logging.warn("VM '%s' failed to produce a screendump", 
> vm.name)
>  continue
> diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
> index 047505a..244355e 100755
> --- a/client/tests/kvm/kvm_vm.py
> +++ b/client/tests/kvm/kvm_vm.py
> @@ -498,7 +498,7 @@ class VM:
>  lockfile.close()
>  
>  
> -def send_monitor_cmd(self, command, block=True, timeout=20.0):
> +def send_monitor_cmd(self, command, block=True, timeout=20.0, 
> verbose=True):
>  """
>  Send command to the QEMU monitor.
>  
> @@ -541,8 +541,11 @@ class VM:
>  time.sleep(0.01)
>  return (False, o)
>  
> +# In certain conditions printing this debug output might be too much
> +# Just print it if verbose is enabled (True by default)
> +if verbose:
> +logging.debug("Sending monitor command: %s" % command)
>  # Connect to monitor
> -logging.debug("Sending monitor command: %s" % command)
>  try:
>  s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
>  s.setblocking(False)
> diff --git a/client/tests/kvm/tests_base.cfg.sample 
> b/client/tests/kvm/tests_base.cfg.sample
> index 4baa1dc..e74c7cb 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -23,6 +23,7 @@ keep_screendumps_on_error = yes
>  screendump_delay = 5
>  screendump_quality = 30
>  screendump_temp_dir = /dev/shm
> +screendump_verbose = no
>  
>  # Some default VM params
>  qemu_binary = qemu

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-19 Thread Mohammed Gamal

On Tue, Apr 20, 2010 at 12:54 AM, jvrao  wrote:
> Mohammed Gamal wrote:
>> On Tue, Apr 13, 2010 at 9:08 PM, jvrao  wrote:
>>> jvrao wrote:
 Alexander Graf wrote:
> On 12.04.2010, at 13:58, Jamie Lokier wrote:
>
>> Mohammed Gamal wrote:
>>> On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier  
>>> wrote:
 Javier Guerra Giraldez wrote:
> On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal 
>  wrote:
>> On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier  
>> wrote:
>>> To throw a spanner in, the most widely supported filesystem across
>>> operating systems is probably NFS, version 2 :-)
>> Remember that Windows usage on a VM is not some rare use case, and
>> it'd be a little bit of a pain from a user's perspective to have to
>> install a third party NFS client for every VM they use. Having
>> something supported on the VM out of the box is a better option IMO.
> i don't think virtio-CIFS has any more support out of the box (on any
> system) than virtio-9P.
 It doesn't, but at least network-CIFS tends to work ok and is the
 method of choice for Windows VMs - when you can setup Samba on the
 host (which as previously noted you cannot always do non-disruptively
 with current Sambas).

 -- Jamie

>>> I think having support for both 9p and CIFS would be the best option.
>>> In that case the user will have the option to use either one,
>>> depending on how their guests support these filesystems. In that case
>>> I'd prefer to work on CIFS support while the 9p effort can still go
>>> on. I don't think both efforts are mutually exclusive.
>>>
>>> What do the rest of you guys think?
>> I only noted NFS because most old OSes do not support CIFS or 9P -
>> especially all the old unixes.
>>
>> I don't think old versions of MS-DOS and Windows (95, 98, ME, Nt4?)
>> even support current CIFS.  They need extra server settings to work
>> - such as setting passwords on the server to non-encrypted and other 
>> quirks.
>>
>> Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to
>> properly see symlinks and hard links.
>>
>> So there is no really nice out of the box file service which works
>> easily with all guest OSes.
>>
>> I'm guessing that out of all the filesystems, CIFS is the most widely
>> supported in recent OSes (released in the last 10 years).  But I'm not
>> really sure what the state of CIFS is for non-Windows, non-Linux,
>> non-BSD guests.
> So what? If you want to have direct host fs access, install guest 
> drivers. If you can't, set up networking and use CIFS or NFS or whatever.
>
>> I'm not sure why 9P is being pursued.  Does anything much support it,
>> or do all OSes except quite recent Linux need a custom driver for 9P?
>> Even Linux only got the first commit in the kernel 5 years ago, so
>> probably it was only about 3 years ago that it will have begun
>> appearing in stable distros, if at all.  Filesystem passthrough to
>> Linux guests installed in the last couple of years is a useful
>> feature, and I know that for many people that is their only use of
>> KVM, but compared with CIFS' broad support it seems like quite a
>> narrow goal.
> The goal is to have something simple and fast. We can fine-tune 9P to 
> align with the Linux VFS structures, making it really little overhead 
> (and little headache too). For Windows guests, nothing prevents us to 
> expose yet another 9P flavor. That again would be aligned well with 
> Windows's VFS and be slim and fast there.
>
> The biggest problem I see with CIFS is that it's a huge beast. There are 
> a lot of corner cases where it just doesn't fit in. See my previous mail 
> for more details.
>
 As Alex mentioned, 9P is chosen for its mere simplicity and easy 
 adaptability.
 NFS and CIFS does not give that flexibility. As we mentioned in the patch 
 series, we are
 already seeing better numbers with 9P. Looking ahead 9P can embed KVM/QEMU 
 knowledge
 to share physical resources like memory/cache between the host and the 
 guest.

 I think looking into the windows side of 9P client would be great option 
 too.
 The current patch on the mailing list supports .U (unix) protocol and will 
 be introducing
 .L (Linux) variant as we move forward.
>>> Hi Mohammed,
>>> Please let us know once you decide on where your interest lies.
>>> Will be glad to have you on VirtFS (9P) though. :)
>>>
>>>
>>> - JV
>>>
>>
>> It seems the community is more keen on getting 9P support merged than
>> getting CIFS supported, and they have made good points to support
>> their argument. I'm not sure whether work on this project could fit in
>> as a GSoC project and if th

Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Sheng Yang

On Monday 19 April 2010 16:25:17 Avi Kivity wrote:
> On 04/17/2010 09:12 PM, Avi Kivity wrote:
> > I think you were right the first time around.
> 
> Re-reading again (esp. the part about treatment of indirect NMI
> vmexits), I think this was wrong, and that the code is correct.  I am
> now thoroughly confused.
> 
My fault...

To my understanding now, "If an event causes a VM exit directly, it does not 
update architectural state as it would have if it had it not caused the VM 
exit:", means: in NMI case, NMI would involve the NMI handler, and change the 
"architectural state" to NMI block. In VMX non-root mode, the behavior of 
calling NMI handler changed(determine by some VMCS fields), but not the 
affection to the "architectural state". So the NMI block state would remain 
the same.

-- 
regards
Yang, Sheng
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: MMU: Drop cr4.pge from shadow page role

2010-04-19 Thread Marcelo Tosatti

On Mon, Apr 19, 2010 at 05:31:49PM +0300, Avi Kivity wrote:
> Since commit bf47a760f66ad, we no longer handle ptes with the global bit
> set specially, so there is no reason to distinguish between shadow pages
> created with cr4.gpe set and clear.
> 
> Such tracking is expensive when the guest toggles cr4.pge, so drop it.
> 
> Signed-off-by: Avi Kivity 

ACK

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[ kvm-Bugs-2976863 ] 32PAE Windows guest blue screen when booting with apci on

2010-04-19 Thread SourceForge.net

Bugs item #2976863, was opened at 2010-03-26 14:44
Message generated for change (Comment added) made by haoxudong
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2976863&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Xudong Hao (haoxudong)
Assigned to: Nobody/Anonymous (nobody)
Summary: 32PAE Windows guest blue screen when booting with apci on

Initial Comment:
Environment:

Host OS (ia32/ia32e/IA64):32e and pae
Guest OS (ia32/ia32e/IA64): ia32pae
Guest OS Type (Linux/Windows): windows(xp, vista, and Win7)
kvm.git Commit: b8ed93d4.. 
qemu-kvm Commit: aa4df134.. 
Host Kernel Version:2.6.34-rc2
Hardware: Stoakely


Bug detailed description:
--
when we boot a 32pae Windows guest with acpi on, the guest will become bule 
screen of death.


Reproduce steps:

1. boot into host, loal kvm and kvm_intel module
2. use following command to boot a guest: qemu-system-x86_64  -m 256 -net
nic,macaddr=00:16:3e:27:b6:8a,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup
-hda /share/xvs/var/Windows.img
3. the guest will become bule screen of death.

--

>Comment By: Xudong Hao (haoxudong)
Date: 2010-04-20 10:35

Message:
Now BSOD issue got out on kvm commit:
bc8a97a218d0d2367d125db3e11ec45bb0fa0a3f-a1b705841caa33fca0b95928505c01d5105b706f.

However, boot a 32-bit XP with acpi on will cost about 9 minutes, did you
meet similar issue on your side?

--

Comment By: Marcelo Tosatti (mtosatti)
Date: 2010-04-15 04:51

Message:
1) 32-bit WinXP

qemu-system-x86_64 -drive file=/root/winxp-32-good.qcow2,cache=writeback 
-m 2000 -vnc :1 -usbdevice tablet -net nic,model=e1000 -net
tap,script=/root/ifup

2) acpi on 

Control panes shows "ACPI multiprocessor PC"

3) shadow mode

[r...@localhost ~]# cat /sys/module/kvm_intel/parameters/ept 
N

host:

Linux localhost.localdomain 2.6.34-rc3 #101 SMP Mon Apr 12 18:18:39 BRT
2010 i686 i686 i386 GNU/Linux


Can you send a screenshot of the hang? 

--

Comment By: Xudong Hao (haoxudong)
Date: 2010-04-13 13:45

Message:
Motsatti,

I checked again, guest still booting stop at the loading scroll bar. Let
find what difference between ours.
Can you check if your reproduce steps:
1) 32-bit WinXP
1) acpi on
2) shadow mode

--

Comment By: Marcelo Tosatti (mtosatti)
Date: 2010-04-13 06:28

Message:
Xudong,

Tested WinXP.32 and WinVista.32 on 32-bit host, both work fine.


--

Comment By: Xudong Hao (haoxudong)
Date: 2010-04-12 09:56

Message:
Latest commit: 069c71e7a5e5f968099ca551f68e5dfe5a3f71bc
qemu: 5c781061a45abe1855ad0a95d834336da574703c
Windows guest did not blue screen with latest KVM commit, but guest hang
at the loading scroll bar. 

This issue only happen on shadow mode. Windows guest can boot up with EPT
mode.


--

Comment By: Marcelo Tosatti (mtosatti)
Date: 2010-04-06 02:50

Message:
Xudong,

Can you reproduce the problem with latest git repositories? 

The BSOD screenshot is truncated, the error number is not present.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2976863&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-19 Thread Xin, Xiaohui

Michael,

>> What we have not done yet:
>>  packet split support
>> 
>What does this mean, exactly?
 We can support 1500MTU, but for jumbo frame, since vhost driver before 
 don't 
support mergeable buffer, we cannot try it for multiple sg.
 
>>>I do not see why, vhost currently supports 64K buffers with indirect
>>>descriptors.
>>> 
>> The receive_skb() in guest virtio-net driver will merge the multiple sg to 
>> skb frags, how >>can indirect descriptors to that?

>See add_recvbuf_big.

I don't mean this, it's for buffer submission. I mean when packet is received, 
in receive_buf(), mergeable buffer knows which pages received can be hooked in 
skb frags, it's receive_mergeable() which do this.

When a NIC driver supports packet split mode, then each ring descriptor 
contains a skb and a page. When packet is received, if the status is not EOP, 
then hook the page of the next descriptor to the prev skb. We don't how many 
frags belongs to one skb. So when guest submit buffers, it should submit 
multiple pages, and when receive, the guest should know which pages are belongs 
to one skb and hook them together. I think receive_mergeable() can do this, but 
I don't see how big->packets handle this. May I miss something here?

Thanks
Xiaohui 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/1 V2 RESEND] Add new VM Exit codes

2010-04-19 Thread Manish Regmi

sorry the last one was word wrapped. now i fixed it.

hi, this patch,
 * adds two more EXIT_REASONS 33 and 34 in vmx.h
 * also adds them to exit reason strings.
Signed-off-by: Manish Regmi 

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index fb9a080..9bf84db 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -251,6 +251,8 @@ enum vmcs_field {
 #define EXIT_REASON_IO_INSTRUCTION  30
 #define EXIT_REASON_MSR_READ31
 #define EXIT_REASON_MSR_WRITE   32
+#define EXIT_REASON_INVALID_GUEST_STATE 33
+#define EXIT_REASON_MSR_LOADING 34
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
 #define EXIT_REASON_MONITOR_INSTRUCTION 39
 #define EXIT_REASON_PAUSE_INSTRUCTION   40
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0b896ac..a5c53ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4061,6 +4061,8 @@ static const struct trace_print_flags 
vmx_exit_reasons_str[] = {
_ER(IO_INSTRUCTION),
_ER(MSR_READ),
_ER(MSR_WRITE),
+_ER(INVALID_GUEST_STATE),
+_ER(MSR_LOADING),
_ER(MWAIT_INSTRUCTION),
_ER(MONITOR_INSTRUCTION),
_ER(PAUSE_INSTRUCTION),

regards,
Manish Regmi
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/1 V2] clear leftmost bit when exit failure is vm entry type

2010-04-19 Thread Manish Regmi

hi,
 When the vm exit reason is VM Entry failures it has leftmost bit set.
 This patch
 - clears the leftmost bit when copying to vmx->exit_reason. This will make the 
checks like 
   if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY) valid in 
vmx_complete_interrupts.

Signed-off-by: Manish Regmi 


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0b896ac..e0ca917 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3642,7 +3642,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 
exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
 
-   vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+   vmx->exit_reason = vmcs_read32(VM_EXIT_REASON) & 
~VMX_EXIT_REASONS_FAILED_VMENTRY;
 
/* Handle machine checks before interrupts are enabled */
if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)

regards
Manish Regmi
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Zhang, Yanmin

On Mon, 2010-04-19 at 20:11 +0200, Ingo Molnar wrote:
> FYI, i found a few problems that need fixing:
> 
> > +   unsigned long ip;
> > +   if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
> 
> missing newline.
> 
> > +   int misc = 0;
> > +   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> 
> ditto.
> 
> > +   PERF_RECORD_MISC_GUEST_KERNEL;
> > +   } else
> > +   misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
> > +   PERF_RECORD_MISC_KERNEL;
> 
> - unbalanced curly braces
> - missing curly brace for multi-line statement.
> - unnecessary line-break due to col80 warning from checkpatch
> 
> > +extern struct perf_guest_info_callbacks *perf_guest_cbs;
> > +extern int perf_register_guest_info_callbacks(
> > +   struct perf_guest_info_callbacks *);
> > +extern int perf_unregister_guest_info_callbacks(
> > +   struct perf_guest_info_callbacks *);
> 
> - unnecessary line-break due to col80 warning from checkpatch
> 
> > +static inline int perf_register_guest_info_callbacks
> > +(struct perf_guest_info_callbacks *) {return 0; }
> > +static inline int perf_unregister_guest_info_callbacks
> > +(struct perf_guest_info_callbacks *) {return 0; }
> 
> - invalid C: function parameter needs name even if unused
> - missing space after opening curly brace
> 
> Please provide delta fixes.

Here is the fix on the top of the prior 3 patches of V5.

From: Zhang, Yanmin 

Fix some programming style issues on the top of perf kvm
enhancement V5.

Signed-off-by: Zhang Yanmin 

---

diff -Nraup linux-2.6_tip0417_perfkvm/arch/x86/kernel/cpu/perf_event.c 
linux-2.6_tip0417_perfkvmstyle/arch/x86/kernel/cpu/perf_event.c
--- linux-2.6_tip0417_perfkvm/arch/x86/kernel/cpu/perf_event.c  2010-04-19 
09:53:59.689452915 +0800
+++ linux-2.6_tip0417_perfkvmstyle/arch/x86/kernel/cpu/perf_event.c 
2010-04-20 10:48:18.500999849 +0800
@@ -1752,23 +1752,29 @@ void perf_arch_fetch_caller_regs(struct 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
unsigned long ip;
+
if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
ip = perf_guest_cbs->get_guest_ip();
else
ip = instruction_pointer(regs);
+
return ip;
 }
 
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
int misc = 0;
+
if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-   misc |= perf_guest_cbs->is_user_mode() ?
-   PERF_RECORD_MISC_GUEST_USER :
-   PERF_RECORD_MISC_GUEST_KERNEL;
-   } else
-   misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
-   PERF_RECORD_MISC_KERNEL;
+   if (perf_guest_cbs->is_user_mode())
+   misc |= PERF_RECORD_MISC_GUEST_USER;
+   else
+   misc |= PERF_RECORD_MISC_GUEST_KERNEL;
+   } else if (user_mode(regs))
+   misc |= PERF_RECORD_MISC_USER;
+   else
+   misc |= PERF_RECORD_MISC_KERNEL;
+
if (regs->flags & PERF_EFLAGS_EXACT)
misc |= PERF_RECORD_MISC_EXACT;
 
diff -Nraup linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.c 
linux-2.6_tip0417_perfkvmstyle/arch/x86/kvm/x86.c
--- linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.c2010-04-19 
09:53:59.691378953 +0800
+++ linux-2.6_tip0417_perfkvmstyle/arch/x86/kvm/x86.c   2010-04-20 
10:11:40.507545564 +0800
@@ -3776,16 +3776,20 @@ static int kvm_is_in_guest(void)
 static int kvm_is_user_mode(void)
 {
int user_mode = 3;
+
if (percpu_read(current_vcpu))
user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu));
+
return user_mode != 0;
 }
 
 static unsigned long kvm_get_guest_ip(void)
 {
unsigned long ip = 0;
+
if (percpu_read(current_vcpu))
ip = kvm_rip_read(percpu_read(current_vcpu));
+
return ip;
 }
 
diff -Nraup linux-2.6_tip0417_perfkvm/include/linux/perf_event.h 
linux-2.6_tip0417_perfkvmstyle/include/linux/perf_event.h
--- linux-2.6_tip0417_perfkvm/include/linux/perf_event.h2010-04-19 
09:53:59.691378953 +0800
+++ linux-2.6_tip0417_perfkvmstyle/include/linux/perf_event.h   2010-04-20 
10:08:03.531551890 +0800
@@ -941,10 +941,8 @@ static inline void perf_event_mmap(struc
 }
 
 extern struct perf_guest_info_callbacks *perf_guest_cbs;
-extern int perf_register_guest_info_callbacks(
-   struct perf_guest_info_callbacks *);
-extern int perf_unregister_guest_info_callbacks(
-   struct perf_guest_info_callbacks *);
+extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks 
*callbacks);
+extern int perf_unregister_guest_info_callbacks(struct 
perf_guest_info_callbacks *callbacks);
 
 extern void perf_event_comm(struct task_struct *tsk);
 extern void perf_event_fork(struct task_struct *tsk);
@@ -1016,9 +1014,9 @@ static inline void
 perf_bp_event(struct perf_event *event, void *data){ }
 
 static i

Re: [PATCH] kvm: use the correct RCU API

2010-04-19 Thread Lai Jiangshan

Paul E. McKenney wrote:
> On Mon, Apr 19, 2010 at 12:49:04PM +0300, Avi Kivity wrote:
>> On 04/19/2010 12:41 PM, Lai Jiangshan wrote:
>>> The RCU/SRCU API have already changed for proving RCU usage.
>>>
>>> I got the following dmesg when PROVE_RCU=y because we used incorrect API.
>>> This patch coverts rcu_deference() to srcu_dereference() or family API.
>>>
>>> ===
>>> [ INFO: suspicious rcu_dereference_check() usage. ]
>>> ---
>>> arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!
>>>
>>> other info that might help us debug this:
>>>
>>>
>>> rcu_scheduler_active = 1, debug_locks = 0
>>> 2 locks held by qemu-system-x86/8550:
>>>  #0:  (&kvm->slots_lock){+.+.+.}, at: [] 
>>> kvm_set_memory_region+0x29/0x50 [kvm]
>>>  #1:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] 
>>> kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]
>>>
>>> stack backtrace:
>>> Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 
>>> #27
>>> Call Trace:
>>>  [] lockdep_rcu_dereference+0xaa/0xb3
>>>  [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
>>>  [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
>>>  [] __kvm_set_memory_region+0x636/0x6e2 [kvm]
>>>  [] kvm_set_memory_region+0x37/0x50 [kvm]
>>>  [] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
>>>  [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
>>>  [] ? unlock_page+0x27/0x2c
>>>  [] ? __do_fault+0x3a9/0x3e1
>>>  [] kvm_vm_ioctl+0x364/0x38d [kvm]
>>>  [] ? up_read+0x23/0x3d
>>>  [] vfs_ioctl+0x32/0xa6
>>>  [] do_vfs_ioctl+0x495/0x4db
>>>  [] ? fget_light+0xc2/0x241
>>>  [] ? do_sys_open+0x104/0x116
>>>  [] ? retint_swapgs+0xe/0x13
>>>  [] sys_ioctl+0x47/0x6a
>>>  [] system_call_fastpath+0x16/0x1b
>>>
>>>
>>>
>>> +static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
>>> +{
>>> +   return rcu_dereference_check(kvm->memslots,
>>> +   srcu_read_lock_held(&kvm->srcu)
>>> +   || lockdep_is_held(&kvm->slots_lock));
>>> +}
>>> +
>>
>> This open-codes srcu_dereference().  I guess we need an
>> srcu_dereference_check().  Paul?
> 

rcu_dereference_check() is useful when rcu_dereference(),
rcu_dereference_bh(), rcu_dereference_sched() and srcu_dereference()
are not appropriate.

I think we don't need srcu_dereference_check() nor rcu_dereference_bh_check()
nor rcu_dereference_sched_check().

> One is coming in Arnd's sparse-based patchset.  It is probably best
> to open-code this in the meantime and clean up later, but I will
> double-check with Arnd.
> 
>> btw, perhaps it is possible not to call rcu_dereference from the
>> write paths.
> 
> There is an rcu_dereference_protected() on its way to mainline to handle
> the case where the reference is always protected by a lock.  Why not
> just access it directly?  Because if you do that, the sparse-based checks
> will yell at you.
> 
> There is also an rcu_access_pointer() on its way to mainline for cases
> where you only want to test the pointer itself, not dereference it.
> 
>   Thanx, Paul
> 

I reviewed the code, the functions can be called from the srcu-read-site
or update-site, rcu_dereference_check() can simplify the code.

If we use rcu_dereference_protected(), we may need duplicate the functions.

I think there is very small overhead of using rcu_dereference(), so we can
call it from write paths.

Thanks,
Lai
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Marcelo Tosatti

On Mon, Apr 19, 2010 at 03:25:43PM -0300, Glauber Costa wrote:
> On Mon, Apr 19, 2010 at 09:19:38AM -0700, Jeremy Fitzhardinge wrote:
> > On 04/19/2010 07:26 AM, Glauber Costa wrote:
> > >> Is the problem that the tscs are starting out of sync, or that they're
> > >> drifting relative to each other over time?  Do the problems become worse
> > >> the longer the uptime?  How large are the offsets we're talking about 
> > >> here?
> > >> 
> > > The offsets usually seem pretty small, under a microsecond. So I don't 
> > > think
> > > it has anything to do with tscs starting out of sync. Specially because 
> > > the
> > > delta-based calculation has the exact purpose of handling that case.
> > >   
> > 
> > So you think they're drifting out of sync from an initially synced
> > state?  If so, what would bound the drift?
> I think delta calculation introduces errors.

Yes.

> Marcelo can probably confirm it, but he has a nehalem with an appearently
> very good tsc source. Even this machine warps.
> 
> It stops warping if we only write pvclock data structure once and forget it,
> (which only updated tsc_timestamp once), according to him.

Yes. So its not as if the guest visible TSCs go out of sync (they don't
on this machine Glauber mentioned, or even on a multi-core Core 2 Duo),
but the delta calculation is very hard (if not impossible) to get right.

The timewarps i've seen were in the 0-200ns range, and very rare (once
every 10 minutes or so).

> Obviously, we can't do that everywhere.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/1 V2] Add new VM Exit codes

2010-04-19 Thread Manish Regmi


hi, this patch,
* adds two more EXIT_REASONS 33 and 34 in vmx.h
* also adds them to exit reason strings.

Signed-off-by: Manish Regmi 

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index fb9a080..9bf84db 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -251,6 +251,8 @@ enum vmcs_field {
#define EXIT_REASON_IO_INSTRUCTION  30
#define EXIT_REASON_MSR_READ31
#define EXIT_REASON_MSR_WRITE   32
+#define EXIT_REASON_INVALID_GUEST_STATE 33
+#define EXIT_REASON_MSR_LOADING 34
#define EXIT_REASON_MWAIT_INSTRUCTION   36
#define EXIT_REASON_MONITOR_INSTRUCTION 39
#define EXIT_REASON_PAUSE_INSTRUCTION   40
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0b896ac..a5c53ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4061,6 +4061,8 @@ static const struct trace_print_flags 
vmx_exit_reasons_str[] = {

_ER(IO_INSTRUCTION),
_ER(MSR_READ),
_ER(MSR_WRITE),
+_ER(INVALID_GUEST_STATE),
+_ER(MSR_LOADING),
_ER(MWAIT_INSTRUCTION),
_ER(MONITOR_INSTRUCTION),
_ER(PAUSE_INSTRUCTION),


regards,
Manish Regmi

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] kvm: dereference srcu-protected pointer without srcu_read_lock() held

2010-04-19 Thread Marcelo Tosatti

On Mon, Apr 19, 2010 at 01:08:29PM +0300, Avi Kivity wrote:
> On 04/19/2010 12:58 PM, Lai Jiangshan wrote:
> >Applied the patch I just sent and let CONFIG_PROVE_RCU=y,
> >we can got the following dmesg. And we found that it is
> >because some codes in KVM dereferences srcu-protected pointer without
> >srcu_read_lock() held or update-side lock held.
> >
> >It is not hard to fix, the problem is that:
> >Where is the most proper place to put a srcu_read_lock()?
> >
> >I can not determine the answer, so I report this bug
> >instead of fixing it.
> >
> 
> I think the else branch in complete_pio() should work.  Marcelo?
> 
> Longer term I'd like to see the lock taken at the high levels
> (ioctls, in virt/kvm) and dropped only for guest entry and when we
> explicitly sleep (hlt emulation).
> 
> Note: complete_pio() is gone in the current code.

Yes, this was fixed by 7fb2ea1e6.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH

2010-04-19 Thread Jamie Lokier

Michael S. Tsirkin wrote:
> I took a stub at documenting CMD and FLUSH request types in virtio
> block.  Christoph, could you look over this please?
> 
> I note that the interface seems full of warts to me,
> this might be a first step to cleaning them.
> 
> One issue I struggled with especially is how type
> field mixes bits and non-bit values. I ended up
> simply defining all legal values, so that we have
> CMD = 2, CMD_OUT = 3 and so on.
> 
> I also avoided instroducing inhdr/outhdr structures
> that virtio blk driver in linux uses, I was concerned
> that nesting tables will confuse the reader.
> 
> Comments welcome.
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> --
> 
> diff --git a/virtio-spec.lyx b/virtio-spec.lyx
> index d16104a..ed35893 100644
> --- a/virtio-spec.lyx
> +++ b/virtio-spec.lyx
> @@ -67,7 +67,11 @@ IBM Corporation
>  \end_layout
>  
>  \begin_layout Standard
> +
> +\change_deleted 0 1266531118
>  FIXME: virtio block scsi passthrough section
> +\change_unchanged
> +
>  \end_layout
>  
>  \begin_layout Standard
> @@ -4376,7 +4380,7 @@ struct virtio_net_ctrl_mac {
>  The device can filter incoming packets by any number of destination MAC
>   addresses.
>  \begin_inset Foot
> -status open
> +status collapsed
>  
>  \begin_layout Plain Layout
>  Since there are no guarentees, it can use a hash filter orsilently switch
> @@ -4549,6 +4553,22 @@ blk_size
>  \end_inset
>  
>  .
> +\change_inserted 0 1266444580
> +
> +\end_layout
> +
> +\begin_layout Description
> +
> +\change_inserted 0 1266471229
> +VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
> +\end_layout
> +
> +\begin_layout Description
> +
> +\change_inserted 0 1266444605
> +VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
> +\change_unchanged
> +
>  \end_layout
>  
>  \begin_layout Description
> @@ -4700,17 +4720,25 @@ struct virtio_blk_req {
>  
>  \begin_layout Plain Layout
>  
> +\change_deleted 0 1266472188
> +
>  #define VIRTIO_BLK_T_IN  0
>  \end_layout
>  
>  \begin_layout Plain Layout
>  
> +\change_deleted 0 1266472188
> +
>  #define VIRTIO_BLK_T_OUT 1
>  \end_layout
>  
>  \begin_layout Plain Layout
>  
> +\change_deleted 0 1266472188
> +
>  #define VIRTIO_BLK_T_BARRIER  0x8000
> +\change_unchanged
> +
>  \end_layout
>  
>  \begin_layout Plain Layout
> @@ -4735,11 +4763,15 @@ struct virtio_blk_req {
>  
>  \begin_layout Plain Layout
>  
> +\change_deleted 0 1266472204
> +
>  #define VIRTIO_BLK_S_OK0
>  \end_layout
>  
>  \begin_layout Plain Layout
>  
> +\change_deleted 0 1266472204
> +
>  #define VIRTIO_BLK_S_IOERR 1
>  \end_layout
>  
> @@ -4759,32 +4791,481 @@ struct virtio_blk_req {
>  \end_layout
>  
>  \begin_layout Standard
> -The type of the request is either a read (VIRTIO_BLK_T_IN) or a write 
> (VIRTIO_BL
> -K_T_OUT); the high bit indicates that this request acts as a barrier and
> - that all preceeding requests must be complete before this one, and all
> - following requests must not be started until this is complete.
> +
> +\change_inserted 0 1266472490
> +If the device has VIRTIO_BLK_F_SCSI feature, it can also support scsi packet
> + command requests, each of these requests is of form:
> +\begin_inset listings
> +inline false
> +status open
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266472395
> +
> +struct virtio_scsi_pc_req {
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266472375
> +
> + u32 type;
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266472375
> +
> + u32 ioprio;
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266474298
> +
> + u64 sector;
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266474308
> +
> +char cmd[];
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266505809
> +
> + char data[][512];
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266505825
> +
> +#define SCSI_SENSE_BUFFERSIZE   96
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266505848
> +
> +u8 sense[SCSI_SENSE_BUFFERSIZE];
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266472969
> +
> +u32 errors;
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266472979
> +
> +u32 data_len;
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266472984
> +
> +u32 sense_len;
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266472987
> +
> +u32 residual;
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266472375
> +
> + u8 status;
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 0 1266472375
> +
> +};
> +\end_layout
> +
> +\end_inset
> +
> +
> +\change_unchanged
> +
>  \end_layout
>  
>  \begin_layout Standard
> -The ioprio field is a hint about the relative priorities of requests to
> - the

Re: [PATCH] kvm x86 mmu: simplify kvm_mmu_unlink_parents()

2010-04-19 Thread Marcelo Tosatti

On Sat, Apr 17, 2010 at 04:50:13PM +0800, Lai Jiangshan wrote:
> 
> mmu_page_remove_parent_pte() does much maintenance works,
> but kvm_mmu_unlink_parents() unlink all parents, so
> such maintenance works are not need.
> 
> This patch simplifies the works of kvm_mmu_unlink_parents()
> by unlinking parents without so many maintenance works.
> 
> Signed-off-by: Lai Jiangshan 
> ---
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 90f666e..71faa04 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1453,22 +1453,33 @@ static void kvm_mmu_reset_last_pte_updated(struct kvm 
> *kvm)
>  
>  static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> - u64 *parent_pte;
> + if (!sp->multimapped) {
> + if (!sp->parent_pte)
> + return;
>  
> - while (sp->multimapped || sp->parent_pte) {
> - if (!sp->multimapped)
> - parent_pte = sp->parent_pte;
> - else {
> - struct kvm_pte_chain *chain;
> + __set_spte(sp->parent_pte, shadow_trap_nonpresent_pte);
> + sp->parent_pte = NULL;
> + return;
> + }
>  
> - chain = container_of(sp->parent_ptes.first,
> -  struct kvm_pte_chain, link);
> - parent_pte = chain->parent_ptes[0];
> + while (!hlist_empty(&sp->parent_ptes)) {
> + struct kvm_pte_chain *chain;
> + u64 *parent_pte;
> + int i;
> +
> + chain = hlist_entry(sp->parent_ptes.first,
> + struct kvm_pte_chain, link);
> + for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
> + parent_pte = chain->parent_ptes[i];
> + if (!parent_pte)
> + break;
> + __set_spte(parent_pte, shadow_trap_nonpresent_pte);
>   }
> - BUG_ON(!parent_pte);
> - kvm_mmu_put_page(sp, parent_pte);
> - __set_spte(parent_pte, shadow_trap_nonpresent_pte);
> + hlist_del(&chain->link);
> + mmu_free_pte_chain(chain);
>   }
> + sp->multimapped = 0;
> + sp->parent_pte = NULL;
>  }
>  
>  static int mmu_zap_unsync_children(struct kvm *kvm,
> 

Personally i don't see why this is any better than the previous code.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/9] KVM: PPC: Convert u64 -> ulong

2010-04-19 Thread Alexander Graf

There are some pieces in the code that I overlooked that still use
u64s instead of longs. This slows down 32 bit hosts unnecessarily, so
let's just move them to ulong.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - don't touch vsid - that stays u64!
---
 arch/powerpc/include/asm/kvm_book3s.h |4 ++--
 arch/powerpc/include/asm/kvm_host.h   |6 +++---
 arch/powerpc/kvm/book3s.c |6 +++---
 arch/powerpc/kvm/book3s_32_mmu.c  |6 +++---
 arch/powerpc/kvm/book3s_32_mmu_host.c |8 +++-
 arch/powerpc/kvm/book3s_64_mmu.c  |4 ++--
 arch/powerpc/kvm/book3s_64_mmu_host.c |6 +++---
 7 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 9517b8d..5d3bd0c 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -107,9 +107,9 @@ struct kvmppc_vcpu_book3s {
 #define VSID_BAT   0x7fb0ULL
 #define VSID_PR0x8000ULL
 
-extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
+extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong ea, ulong 
ea_mask);
 extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
-extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 
pa_end);
+extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong 
pa_end);
 extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
 extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
 extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 5a83995..0c9ad86 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -124,9 +124,9 @@ struct kvm_arch {
 };
 
 struct kvmppc_pte {
-   u64 eaddr;
+   ulong eaddr;
u64 vpage;
-   u64 raddr;
+   ulong raddr;
bool may_read   : 1;
bool may_write  : 1;
bool may_execute: 1;
@@ -145,7 +145,7 @@ struct kvmppc_mmu {
int  (*xlate)(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte 
*pte, bool data);
void (*reset_msr)(struct kvm_vcpu *vcpu);
void (*tlbie)(struct kvm_vcpu *vcpu, ulong addr, bool large);
-   int  (*esid_to_vsid)(struct kvm_vcpu *vcpu, u64 esid, u64 *vsid);
+   int  (*esid_to_vsid)(struct kvm_vcpu *vcpu, ulong esid, u64 *vsid);
u64  (*ea_to_vp)(struct kvm_vcpu *vcpu, gva_t eaddr, bool data);
bool (*is_dcbz32)(struct kvm_vcpu *vcpu);
 };
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 5805f99..a7de709 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -812,12 +812,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 * so we can't use the NX bit inside the guest. 
Let's cross our fingers,
 * that no guest that needs the dcbz hack does NX.
 */
-   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), 
~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), 
~0xFFFUL);
r = RESUME_GUEST;
} else {
vcpu->arch.msr |= to_svcpu(vcpu)->shadow_srr1 & 
0x5800;
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), 
~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), 
~0xFFFUL);
r = RESUME_GUEST;
}
break;
@@ -843,7 +843,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
vcpu->arch.dear = dar;
to_book3s(vcpu)->dsisr = to_svcpu(vcpu)->fault_dsisr;
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-   kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFUL);
r = RESUME_GUEST;
}
break;
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 48efb37..33186b7 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -60,7 +60,7 @@ static inline bool check_debug_ip(struct kvm_vcpu *vcpu)
 
 static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
  struct kvmppc_pte *pte, bool data);
-static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
 u64 *vsid);
 
 static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s

[PATCH 7/9] KVM: PPC: Fix Book3S_64 Host MMU debug output

2010-04-19 Thread Alexander Graf

We have some debug output in Book3S_64. Some of that was invalid though,
partially not even compiling because it accessed incorrect variables.

So let's fix that up, making debugging more fun again.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |   20 
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 5545c45..e4b5744 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -48,8 +48,8 @@
 
 static void invalidate_pte(struct hpte_cache *pte)
 {
-   dprintk_mmu("KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n",
-   i, pte->pte.eaddr, pte->pte.vpage, pte->host_va);
+   dprintk_mmu("KVM: Flushing SPT: 0x%lx (0x%llx) -> 0x%llx\n",
+   pte->pte.eaddr, pte->pte.vpage, pte->host_va);
 
ppc_md.hpte_invalidate(pte->slot, pte->host_va,
   MMU_PAGE_4K, MMU_SEGSIZE_256M,
@@ -66,7 +66,7 @@ void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong 
guest_ea, ulong ea_mask)
 {
int i;
 
-   dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%llx & 0x%llx\n",
+   dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%lx & 0x%lx\n",
vcpu->arch.hpte_cache_offset, guest_ea, ea_mask);
BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
 
@@ -114,8 +114,8 @@ void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong 
pa_start, ulong pa_end)
 {
int i;
 
-   dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%llx & 0x%llx\n",
-   vcpu->arch.hpte_cache_offset, guest_pa, pa_mask);
+   dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%lx & 0x%lx\n",
+   vcpu->arch.hpte_cache_offset, pa_start, pa_end);
BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
 
for (i = 0; i < vcpu->arch.hpte_cache_offset; i++) {
@@ -186,7 +186,7 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu 
*vcpu, u64 gvsid)
sid_map_mask = kvmppc_sid_hash(vcpu, gvsid);
map = &to_book3s(vcpu)->sid_map[sid_map_mask];
if (map->guest_vsid == gvsid) {
-   dprintk_slb("SLB: Searching 0x%llx -> 0x%llx\n",
+   dprintk_slb("SLB: Searching: 0x%llx -> 0x%llx\n",
gvsid, map->host_vsid);
return map;
}
@@ -198,7 +198,8 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu 
*vcpu, u64 gvsid)
return map;
}
 
-   dprintk_slb("SLB: Searching 0x%llx -> not found\n", gvsid);
+   dprintk_slb("SLB: Searching %d/%d: 0x%llx -> not found\n",
+   sid_map_mask, SID_MAP_MASK - sid_map_mask, gvsid);
return NULL;
 }
 
@@ -275,7 +276,7 @@ map_again:
int hpte_id = kvmppc_mmu_hpte_cache_next(vcpu);
struct hpte_cache *pte = &vcpu->arch.hpte_cache[hpte_id];
 
-   dprintk_mmu("KVM: %c%c Map 0x%llx: [%lx] 0x%lx (0x%llx) -> 
%lx\n",
+   dprintk_mmu("KVM: %c%c Map 0x%lx: [%lx] 0x%lx (0x%llx) -> 
%lx\n",
((rflags & HPTE_R_PP) == 3) ? '-' : 'w',
(rflags & HPTE_R_N) ? '-' : 'x',
orig_pte->eaddr, hpteg, va, orig_pte->vpage, 
hpaddr);
@@ -331,6 +332,9 @@ static struct kvmppc_sid_map *create_sid_map(struct 
kvm_vcpu *vcpu, u64 gvsid)
map->guest_vsid = gvsid;
map->valid = true;
 
+   dprintk_slb("SLB: New mapping at %d: 0x%llx -> 0x%llx\n",
+   sid_map_mask, gvsid, map->host_vsid);
+
return map;
 }
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/9] KVM: PPC: Be more informative on BUG

2010-04-19 Thread Alexander Graf

We have a condition in the ppc64 host mmu code that should never occur.
Unfortunately, it just did happen to me and I was rather puzzled on why,
because BUG_ON doesn't tell me anything useful.

So let's add some more debug output in case this goes wrong. Also change
BUG to WARN, since I don't want to reboot every time I mess something up.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - make WARN bail out
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 41af12f..5545c45 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -231,10 +231,16 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct 
kvmppc_pte *orig_pte)
vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
map = find_sid_vsid(vcpu, vsid);
if (!map) {
-   kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
+   ret = kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
+   WARN_ON(ret < 0);
map = find_sid_vsid(vcpu, vsid);
}
-   BUG_ON(!map);
+   if (!map) {
+   printk(KERN_ERR "KVM: Segment map for 0x%llx (0x%lx) failed\n",
+   vsid, orig_pte->eaddr);
+   WARN_ON(true);
+   return -EINVAL;
+   }
 
vsid = map->host_vsid;
va = hpt_va(orig_pte->eaddr, vsid, MMU_SEGSIZE_256M);
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 9/9] KVM: PPC: Enable native paired singles

2010-04-19 Thread Alexander Graf

When we're on a paired single capable host, we can just always enable
paired singles and expose them to the guest directly.

This approach breaks when multiple VMs run and access PS concurrently,
but this should suffice until we get a proper framework for it in Linux.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_asm.h |1 +
 arch/powerpc/kvm/book3s.c  |   19 +++
 arch/powerpc/kvm/book3s_emulate.c  |5 -
 3 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h 
b/arch/powerpc/include/asm/kvm_asm.h
index 7238c04..c5ea4cd 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -89,6 +89,7 @@
 #define BOOK3S_HFLAG_DCBZ320x1
 #define BOOK3S_HFLAG_SLB   0x2
 #define BOOK3S_HFLAG_PAIRED_SINGLE 0x4
+#define BOOK3S_HFLAG_NATIVE_PS 0x8
 
 #define RESUME_FLAG_NV  (1<<0)  /* Reload guest nonvolatile state? */
 #define RESUME_FLAG_HOST(1<<1)  /* Resume host? */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 91dc42d..cfc3051 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -344,6 +344,8 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
 
 void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
 {
+   u32 host_pvr;
+
vcpu->arch.hflags &= ~BOOK3S_HFLAG_SLB;
vcpu->arch.pvr = pvr;
 #ifdef CONFIG_PPC_BOOK3S_64
@@ -375,6 +377,23 @@ void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
/* 32 bit Book3S always has 32 byte dcbz */
vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
 #endif
+
+   /* On some CPUs we can execute paired single operations natively */
+   asm ( "mfpvr %0" : "=r"(host_pvr));
+   switch (host_pvr) {
+   case 0x00080200:/* lonestar 2.0 */
+   case 0x00088202:/* lonestar 2.2 */
+   case 0x7100:/* gekko 1.0 */
+   case 0x00080100:/* gekko 2.0 */
+   case 0x00083203:/* gekko 2.3a */
+   case 0x00083213:/* gekko 2.3b */
+   case 0x00083204:/* gekko 2.4 */
+   case 0x00083214:/* gekko 2.4e (8SE) - retail HW2 */
+   case 0x00087200:/* broadway */
+   vcpu->arch.hflags |= BOOK3S_HFLAG_NATIVE_PS;
+   /* Enable HID2.PSE - in case we need it later */
+   mtspr(SPRN_HID2_GEKKO, mfspr(SPRN_HID2_GEKKO) | (1 << 29));
+   }
 }
 
 /* Book3s_32 CPUs always have 32 bytes cache line size, which Linux assumes. To
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 3f7afb5..c85f906 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -365,7 +365,10 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int 
sprn, int rs)
case 0x00083213:/* gekko 2.3b */
case 0x00083204:/* gekko 2.4 */
case 0x00083214:/* gekko 2.4e (8SE) - retail HW2 */
-   if (spr_val & (1 << 29)) { /* HID2.PSE */
+   case 0x00087200:/* broadway */
+   if (vcpu->arch.hflags & BOOK3S_HFLAG_NATIVE_PS) {
+   /* Native paired singles */
+   } else if (spr_val & (1 << 29)) { /* HID2.PSE */
vcpu->arch.hflags |= BOOK3S_HFLAG_PAIRED_SINGLE;
kvmppc_giveup_ext(vcpu, MSR_FP);
} else {
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 8/9] KVM: PPC: Find HTAB ourselves

2010-04-19 Thread Alexander Graf

For KVM we need to find the location of the HTAB. We can either rely
on internal data structures of the kernel or ask the hardware.

Ben issued complaints about the internal data structure method, so
let's switch it to our own inquiry of the HTAB. Now we're fully
independend :-).

CC: Benjamin Herrenschmidt 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kernel/ppc_ksyms.c   |5 -
 arch/powerpc/kvm/book3s_32_mmu_host.c |   21 +
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 2b7c43f..bc9f39d 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -178,11 +178,6 @@ EXPORT_SYMBOL(switch_mmu_context);
 extern long mol_trampoline;
 EXPORT_SYMBOL(mol_trampoline); /* For MOL */
 EXPORT_SYMBOL(flush_hash_pages); /* For MOL */
-
-extern struct hash_pte *Hash;
-extern unsigned long _SDR1;
-EXPORT_SYMBOL_GPL(Hash); /* For KVM */
-EXPORT_SYMBOL_GPL(_SDR1); /* For KVM */
 #ifdef CONFIG_SMP
 extern int mmu_hash_lock;
 EXPORT_SYMBOL(mmu_hash_lock); /* For MOL */
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 2bb67e6..0bb6600 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -54,6 +54,9 @@
 #error Only 32 bit pages are supported for now
 #endif
 
+static ulong htab;
+static u32 htabmask;
+
 static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
 {
volatile u32 *pteg;
@@ -217,14 +220,11 @@ static struct kvmppc_sid_map *find_sid_vsid(struct 
kvm_vcpu *vcpu, u64 gvsid)
return NULL;
 }
 
-extern struct hash_pte *Hash;
-extern unsigned long _SDR1;
-
 static u32 *kvmppc_mmu_get_pteg(struct kvm_vcpu *vcpu, u32 vsid, u32 eaddr,
bool primary)
 {
-   u32 page, hash, htabmask;
-   ulong pteg = (ulong)Hash;
+   u32 page, hash;
+   ulong pteg = htab;
 
page = (eaddr & ~ESID_MASK) >> 12;
 
@@ -232,13 +232,12 @@ static u32 *kvmppc_mmu_get_pteg(struct kvm_vcpu *vcpu, 
u32 vsid, u32 eaddr,
if (!primary)
hash = ~hash;
 
-   htabmask = ((_SDR1 & 0x1FF) << 16) | 0xFFC0;
hash &= htabmask;
 
pteg |= hash;
 
-   dprintk_mmu("htab: %p | hash: %x | htabmask: %x | pteg: %lx\n",
-   Hash, hash, htabmask, pteg);
+   dprintk_mmu("htab: %lx | hash: %x | htabmask: %x | pteg: %lx\n",
+   htab, hash, htabmask, pteg);
 
return (u32*)pteg;
 }
@@ -453,6 +452,7 @@ int kvmppc_mmu_init(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
int err;
+   ulong sdr1;
 
err = __init_new_context();
if (err < 0)
@@ -474,5 +474,10 @@ int kvmppc_mmu_init(struct kvm_vcpu *vcpu)
 
vcpu3s->vsid_next = vcpu3s->vsid_first;
 
+   /* Remember where the HTAB is */
+   asm ( "mfsdr1 %0" : "=r"(sdr1) );
+   htabmask = ((sdr1 & 0x1FF) << 16) | 0xFFC0;
+   htab = (ulong)__va(sdr1 & 0x);
+
return 0;
 }
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/9] KVM: PPC: Improve split mode

2010-04-19 Thread Alexander Graf

When in split mode, instruction relocation and data relocation are not equal.

So far we implemented this mode by reserving a special pseudo-VSID for the
two cases and flushing all PTEs when going into split mode, which is slow.

Unfortunately 32bit Linux and Mac OS X use split mode extensively. So to not
slow down things too much, I came up with a different idea: Mark the split
mode with a bit in the VSID and then treat it like any other segment.

This means we can just flush the shadow segment cache, but keep the PTEs
intact. I verified that this works with ppc32 Linux and Mac OS X 10.4
guests and does speed them up.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h |9 -
 arch/powerpc/kvm/book3s.c |   28 ++--
 arch/powerpc/kvm/book3s_32_mmu.c  |   21 +
 arch/powerpc/kvm/book3s_64_mmu.c  |   27 +++
 4 files changed, 46 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 5d3bd0c..6f74d93 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -100,11 +100,10 @@ struct kvmppc_vcpu_book3s {
 #define CONTEXT_GUEST  1
 #define CONTEXT_GUEST_END  2
 
-#define VSID_REAL_DR   0x7ff0ULL
-#define VSID_REAL_IR   0x7fe0ULL
-#define VSID_SPLIT_MASK0x7fe0ULL
-#define VSID_REAL  0x7fc0ULL
-#define VSID_BAT   0x7fb0ULL
+#define VSID_REAL  0x1fc0ULL
+#define VSID_BAT   0x1fb0ULL
+#define VSID_REAL_DR   0x2000ULL
+#define VSID_REAL_IR   0x4000ULL
 #define VSID_PR0x8000ULL
 
 extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong ea, ulong 
ea_mask);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index a03163b..91dc42d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -147,16 +147,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
}
}
 
-   if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) 
||
-   (vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) {
-   bool dr = (vcpu->arch.msr & MSR_DR) ? true : false;
-   bool ir = (vcpu->arch.msr & MSR_IR) ? true : false;
-
-   /* Flush split mode PTEs */
-   if (dr != ir)
-   kvmppc_mmu_pte_vflush(vcpu, VSID_SPLIT_MASK,
- VSID_SPLIT_MASK);
-
+   if ((vcpu->arch.msr & (MSR_PR|MSR_IR|MSR_DR)) !=
+  (old_msr & (MSR_PR|MSR_IR|MSR_DR))) {
kvmppc_mmu_flush_segments(vcpu);
kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
}
@@ -534,6 +526,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
bool is_mmio = false;
bool dr = (vcpu->arch.msr & MSR_DR) ? true : false;
bool ir = (vcpu->arch.msr & MSR_IR) ? true : false;
+   u64 vsid;
 
relocated = data ? dr : ir;
 
@@ -551,13 +544,20 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 
switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
case 0:
-   pte.vpage |= VSID_REAL;
+   pte.vpage |= ((u64)VSID_REAL << (SID_SHIFT - 12));
break;
case MSR_DR:
-   pte.vpage |= VSID_REAL_DR;
-   break;
case MSR_IR:
-   pte.vpage |= VSID_REAL_IR;
+   vcpu->arch.mmu.esid_to_vsid(vcpu, eaddr >> SID_SHIFT, &vsid);
+
+   if ((vcpu->arch.msr & (MSR_DR|MSR_IR)) == MSR_DR)
+   pte.vpage |= ((u64)VSID_REAL_DR << (SID_SHIFT - 12));
+   else
+   pte.vpage |= ((u64)VSID_REAL_IR << (SID_SHIFT - 12));
+   pte.vpage |= vsid;
+
+   if (vsid == -1)
+   page_found = -EINVAL;
break;
}
 
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 33186b7..0b10503 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -330,30 +330,35 @@ static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu 
*vcpu, ulong ea, bool lar
 static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
 u64 *vsid)
 {
+   ulong ea = esid << SID_SHIFT;
+   struct kvmppc_sr *sr;
+   u64 gvsid = esid;
+
+   if (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+   sr = find_sr(to_book3s(vcpu), ea);
+   if (sr->valid)
+   gvsid = sr->vsid;
+   }
+
/* In case we only have one of MSR_IR or MSR_DR set, let's put
   that in the real-mode context (and hope RM doesn't access
   high memory) */
switch (vcpu

[PATCH 0/9] Post-PPC32 series v2

2010-04-19 Thread Alexander Graf

While working with the PPC32 host target we finally have I stumbled over
several things. Thanks to the now possible performance measurements I also
tracked down split mode as one of the major slowdowns to KVM.

What's left now that slows us down is the normal flushing code that needs
to move to a table based lookup and instruction emulation. On PPC32 guests
we waste about 70% of our time on emulating mfmsr, mtmsr, mfsprg, mtsprg
and friends.

Either way - this patch series deprecates the former performance counter
and u64 patch.

Avi / Marcelo, please apply the former series and this series. Ignore the
two patches in between.

v1 -> v2:

  - add paired single patch
  - move WARN bailing to the correct patch

Alexander Graf (9):
  KVM: PPC: Convert u64 -> ulong
  KVM: PPC: Make Performance Counters work
  KVM: PPC: Improve split mode
  KVM: PPC: Make Alignment interrupts work again
  KVM: PPC: Be more informative on BUG
  KVM: PPC: Set VSID_PR also for Book3S_64
  KVM: PPC: Fix Book3S_64 Host MMU debug output
  KVM: PPC: Find HTAB ourselves
  KVM: PPC: Enable native paired singles

 arch/powerpc/include/asm/kvm_asm.h|1 +
 arch/powerpc/include/asm/kvm_book3s.h |   13 +++
 arch/powerpc/include/asm/kvm_host.h   |6 ++--
 arch/powerpc/kernel/ppc_ksyms.c   |5 ---
 arch/powerpc/kvm/book3s.c |   56 +++--
 arch/powerpc/kvm/book3s_32_mmu.c  |   27 +--
 arch/powerpc/kvm/book3s_32_mmu_host.c |   29 +---
 arch/powerpc/kvm/book3s_64_mmu.c  |   34 
 arch/powerpc/kvm/book3s_64_mmu_host.c |   36 +---
 arch/powerpc/kvm/book3s_emulate.c |5 ++-
 arch/powerpc/kvm/book3s_interrupts.S  |2 +
 arch/powerpc/kvm/book3s_segment.S |2 +
 12 files changed, 132 insertions(+), 84 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/9] KVM: PPC: Set VSID_PR also for Book3S_64

2010-04-19 Thread Alexander Graf

Book3S_64 didn't set VSID_PR when we're in PR=1. This lead to pretty bad
behavior when searching for the shadow segment, as part of the code relied
on VSID_PR being set.

This patch fixes booting Book3S_64 guests.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_mmu.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 612de6e..4025ea2 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -473,6 +473,9 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct 
kvm_vcpu *vcpu, ulong esid,
break;
}
 
+   if (vcpu->arch.msr & MSR_PR)
+   *vsid |= VSID_PR;
+
return 0;
 }
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/9] KVM: PPC: Make Alignment interrupts work again

2010-04-19 Thread Alexander Graf

In the process of merging Book3S_32 and 64 I somehow ended up having the
alignment interrupt handler take last_inst, but the fetching code not
fetching it. So we ended up with stale last_inst values.

Let's just enable last_inst fetching for alignment interrupts too.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_segment.S |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_segment.S 
b/arch/powerpc/kvm/book3s_segment.S
index 778e3fc..ede47fd 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -196,6 +196,8 @@ kvmppc_handler_trampoline_exit:
beq ld_last_inst
cmpwi   r12, BOOK3S_INTERRUPT_PROGRAM
beq ld_last_inst
+   cmpwi   r12, BOOK3S_INTERRUPT_ALIGNMENT
+   beq-ld_last_inst
 
b   no_ld_last_inst
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/9] KVM: PPC: Make Performance Counters work

2010-04-19 Thread Alexander Graf

When we get a performance counter interrupt we need to route it on to the
Linux handler after we got out of the guest context. We also need to tell
our handling code that this particular interrupt doesn't need treatment.

So let's add those two bits in, making perf work while having a KVM guest
running.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c|3 +++
 arch/powerpc/kvm/book3s_interrupts.S |2 ++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index a7de709..a03163b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -872,6 +872,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
vcpu->stat.ext_intr_exits++;
r = RESUME_GUEST;
break;
+   case BOOK3S_INTERRUPT_PERFMON:
+   r = RESUME_GUEST;
+   break;
case BOOK3S_INTERRUPT_PROGRAM:
{
enum emulation_result er;
diff --git a/arch/powerpc/kvm/book3s_interrupts.S 
b/arch/powerpc/kvm/book3s_interrupts.S
index f5b3358..e486193 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -228,6 +228,8 @@ no_dcbz32_off:
beq call_linux_handler
cmpwi   r12, BOOK3S_INTERRUPT_DECREMENTER
beq call_linux_handler
+   cmpwi   r12, BOOK3S_INTERRUPT_PERFMON
+   beq call_linux_handler
 
/* Back to EE=1 */
mtmsr   r6
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] KVM call agenda for Apr 20

2010-04-19 Thread Brian Jackson

On Monday 19 April 2010 18:30:44 Chris Wright wrote:
> Please send in any agenda items you are interested in covering.


0.12.4?


> 
> thanks,
> -chris
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm: use the correct RCU API

2010-04-19 Thread Paul E. McKenney

On Mon, Apr 19, 2010 at 12:49:04PM +0300, Avi Kivity wrote:
> On 04/19/2010 12:41 PM, Lai Jiangshan wrote:
> >The RCU/SRCU API have already changed for proving RCU usage.
> >
> >I got the following dmesg when PROVE_RCU=y because we used incorrect API.
> >This patch coverts rcu_deference() to srcu_dereference() or family API.
> >
> >===
> >[ INFO: suspicious rcu_dereference_check() usage. ]
> >---
> >arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!
> >
> >other info that might help us debug this:
> >
> >
> >rcu_scheduler_active = 1, debug_locks = 0
> >2 locks held by qemu-system-x86/8550:
> >  #0:  (&kvm->slots_lock){+.+.+.}, at: [] 
> > kvm_set_memory_region+0x29/0x50 [kvm]
> >  #1:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] 
> > kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]
> >
> >stack backtrace:
> >Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 
> >#27
> >Call Trace:
> >  [] lockdep_rcu_dereference+0xaa/0xb3
> >  [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
> >  [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
> >  [] __kvm_set_memory_region+0x636/0x6e2 [kvm]
> >  [] kvm_set_memory_region+0x37/0x50 [kvm]
> >  [] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
> >  [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
> >  [] ? unlock_page+0x27/0x2c
> >  [] ? __do_fault+0x3a9/0x3e1
> >  [] kvm_vm_ioctl+0x364/0x38d [kvm]
> >  [] ? up_read+0x23/0x3d
> >  [] vfs_ioctl+0x32/0xa6
> >  [] do_vfs_ioctl+0x495/0x4db
> >  [] ? fget_light+0xc2/0x241
> >  [] ? do_sys_open+0x104/0x116
> >  [] ? retint_swapgs+0xe/0x13
> >  [] sys_ioctl+0x47/0x6a
> >  [] system_call_fastpath+0x16/0x1b
> >
> >
> >
> >+static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
> >+{
> >+return rcu_dereference_check(kvm->memslots,
> >+srcu_read_lock_held(&kvm->srcu)
> >+|| lockdep_is_held(&kvm->slots_lock));
> >+}
> >+
> 
> 
> This open-codes srcu_dereference().  I guess we need an
> srcu_dereference_check().  Paul?

One is coming in Arnd's sparse-based patchset.  It is probably best
to open-code this in the meantime and clean up later, but I will
double-check with Arnd.

> btw, perhaps it is possible not to call rcu_dereference from the
> write paths.

There is an rcu_dereference_protected() on its way to mainline to handle
the case where the reference is always protected by a lock.  Why not
just access it directly?  Because if you do that, the sparse-based checks
will yell at you.

There is also an rcu_access_pointer() on its way to mainline for cases
where you only want to test the pointer itself, not dereference it.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

KVM call agenda for Apr 20

2010-04-19 Thread Chris Wright

Please send in any agenda items you are interested in covering.

thanks,
-chris
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-19 Thread jvrao

Mohammed Gamal wrote:
> On Tue, Apr 13, 2010 at 9:08 PM, jvrao  wrote:
>> jvrao wrote:
>>> Alexander Graf wrote:
 On 12.04.2010, at 13:58, Jamie Lokier wrote:

> Mohammed Gamal wrote:
>> On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier  
>> wrote:
>>> Javier Guerra Giraldez wrote:
 On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal  
 wrote:
> On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier  
> wrote:
>> To throw a spanner in, the most widely supported filesystem across
>> operating systems is probably NFS, version 2 :-)
> Remember that Windows usage on a VM is not some rare use case, and
> it'd be a little bit of a pain from a user's perspective to have to
> install a third party NFS client for every VM they use. Having
> something supported on the VM out of the box is a better option IMO.
 i don't think virtio-CIFS has any more support out of the box (on any
 system) than virtio-9P.
>>> It doesn't, but at least network-CIFS tends to work ok and is the
>>> method of choice for Windows VMs - when you can setup Samba on the
>>> host (which as previously noted you cannot always do non-disruptively
>>> with current Sambas).
>>>
>>> -- Jamie
>>>
>> I think having support for both 9p and CIFS would be the best option.
>> In that case the user will have the option to use either one,
>> depending on how their guests support these filesystems. In that case
>> I'd prefer to work on CIFS support while the 9p effort can still go
>> on. I don't think both efforts are mutually exclusive.
>>
>> What do the rest of you guys think?
> I only noted NFS because most old OSes do not support CIFS or 9P -
> especially all the old unixes.
>
> I don't think old versions of MS-DOS and Windows (95, 98, ME, Nt4?)
> even support current CIFS.  They need extra server settings to work
> - such as setting passwords on the server to non-encrypted and other 
> quirks.
>
> Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to
> properly see symlinks and hard links.
>
> So there is no really nice out of the box file service which works
> easily with all guest OSes.
>
> I'm guessing that out of all the filesystems, CIFS is the most widely
> supported in recent OSes (released in the last 10 years).  But I'm not
> really sure what the state of CIFS is for non-Windows, non-Linux,
> non-BSD guests.
 So what? If you want to have direct host fs access, install guest drivers. 
 If you can't, set up networking and use CIFS or NFS or whatever.

> I'm not sure why 9P is being pursued.  Does anything much support it,
> or do all OSes except quite recent Linux need a custom driver for 9P?
> Even Linux only got the first commit in the kernel 5 years ago, so
> probably it was only about 3 years ago that it will have begun
> appearing in stable distros, if at all.  Filesystem passthrough to
> Linux guests installed in the last couple of years is a useful
> feature, and I know that for many people that is their only use of
> KVM, but compared with CIFS' broad support it seems like quite a
> narrow goal.
 The goal is to have something simple and fast. We can fine-tune 9P to 
 align with the Linux VFS structures, making it really little overhead (and 
 little headache too). For Windows guests, nothing prevents us to expose 
 yet another 9P flavor. That again would be aligned well with Windows's VFS 
 and be slim and fast there.

 The biggest problem I see with CIFS is that it's a huge beast. There are a 
 lot of corner cases where it just doesn't fit in. See my previous mail for 
 more details.

>>> As Alex mentioned, 9P is chosen for its mere simplicity and easy 
>>> adaptability.
>>> NFS and CIFS does not give that flexibility. As we mentioned in the patch 
>>> series, we are
>>> already seeing better numbers with 9P. Looking ahead 9P can embed KVM/QEMU 
>>> knowledge
>>> to share physical resources like memory/cache between the host and the 
>>> guest.
>>>
>>> I think looking into the windows side of 9P client would be great option 
>>> too.
>>> The current patch on the mailing list supports .U (unix) protocol and will 
>>> be introducing
>>> .L (Linux) variant as we move forward.
>> Hi Mohammed,
>> Please let us know once you decide on where your interest lies.
>> Will be glad to have you on VirtFS (9P) though. :)
>>
>>
>> - JV
>>
> 
> It seems the community is more keen on getting 9P support merged than
> getting CIFS supported, and they have made good points to support
> their argument. I'm not sure whether work on this project could fit in
> as a GSoC project and if there is much remaining work that could make
> it fit in that direction. But I'd be glad to volunteer anyway :)

I was thinking over the wk-end what fit

Re: [PATCH 0/1] trace all instructions whose emulation failed

2010-04-19 Thread Manish Regmi

On Mon, Apr 19, 2010 at 4:20 AM, Avi Kivity  wrote:
>>        }
>>
>>
>
> It's better not to trace #UD triggered emulations, since we except these to
> fail, for example if the guest executes the UD2 instruction.
>

ya. that sounds more logical. Thanks for explaining.
---
regards
Manish Regmi

http://manish-cs.blogspot.com
http://ext2read.sf.net
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v4] Add mergeable RX bufs support to vhost

2010-04-19 Thread David L Stevens

This patch adds the mergeable RX buffers feature to vhost.

Signed-off-by: David L Stevens 

diff -ruNp net-next-p0/drivers/vhost/net.c net-next-v4/drivers/vhost/net.c
--- net-next-p0/drivers/vhost/net.c 2010-03-22 12:04:38.0 -0700
+++ net-next-v4/drivers/vhost/net.c 2010-04-19 14:23:38.0 -0700
@@ -108,7 +108,7 @@ static void handle_tx(struct vhost_net *
};
size_t len, total_len = 0;
int err, wmem;
-   size_t hdr_size;
+   size_t vhost_hlen;
struct socket *sock = rcu_dereference(vq->private_data);
if (!sock)
return;
@@ -127,13 +127,13 @@ static void handle_tx(struct vhost_net *
 
if (wmem < sock->sk->sk_sndbuf / 2)
tx_poll_stop(net);
-   hdr_size = vq->hdr_size;
+   vhost_hlen = vq->vhost_hlen;
 
for (;;) {
-   head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-ARRAY_SIZE(vq->iov),
-&out, &in,
-NULL, NULL);
+   head = vhost_get_desc(&net->dev, vq, vq->iov,
+ ARRAY_SIZE(vq->iov),
+ &out, &in,
+ NULL, NULL);
/* Nothing new?  Wait for eventfd to tell us they refilled. */
if (head == vq->num) {
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
@@ -154,20 +154,20 @@ static void handle_tx(struct vhost_net *
break;
}
/* Skip header. TODO: support TSO. */
-   s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
+   s = move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, out);
msg.msg_iovlen = out;
len = iov_length(vq->iov, out);
/* Sanity check */
if (!len) {
vq_err(vq, "Unexpected header len for TX: "
   "%zd expected %zd\n",
-  iov_length(vq->hdr, s), hdr_size);
+  iov_length(vq->hdr, s), vhost_hlen);
break;
}
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(NULL, sock, &msg, len);
if (unlikely(err < 0)) {
-   vhost_discard_vq_desc(vq);
+   vhost_discard_desc(vq, 1);
tx_poll_start(net, sock);
break;
}
@@ -186,12 +186,25 @@ static void handle_tx(struct vhost_net *
unuse_mm(net->dev.mm);
 }
 
+static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk)
+{
+   struct sk_buff *head;
+   int len = 0;
+
+   lock_sock(sk);
+   head = skb_peek(&sk->sk_receive_queue);
+   if (head)
+   len = head->len + vq->sock_hlen;
+   release_sock(sk);
+   return len;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_rx(struct vhost_net *net)
 {
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
-   unsigned head, out, in, log, s;
+   unsigned in, log, s;
struct vhost_log *vq_log;
struct msghdr msg = {
.msg_name = NULL,
@@ -202,14 +215,14 @@ static void handle_rx(struct vhost_net *
.msg_flags = MSG_DONTWAIT,
};
 
-   struct virtio_net_hdr hdr = {
-   .flags = 0,
-   .gso_type = VIRTIO_NET_HDR_GSO_NONE
+   struct virtio_net_hdr_mrg_rxbuf hdr = {
+   .hdr.flags = 0,
+   .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
};
 
size_t len, total_len = 0;
-   int err;
-   size_t hdr_size;
+   int err, headcount, datalen;
+   size_t vhost_hlen;
struct socket *sock = rcu_dereference(vq->private_data);
if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
return;
@@ -217,18 +230,18 @@ static void handle_rx(struct vhost_net *
use_mm(net->dev.mm);
mutex_lock(&vq->mutex);
vhost_disable_notify(vq);
-   hdr_size = vq->hdr_size;
+   vhost_hlen = vq->vhost_hlen;
 
vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
vq->log : NULL;
 
-   for (;;) {
-   head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-ARRAY_SIZE(vq->iov),
-&out, &in,
-vq_log, &log);
+   while ((datalen = vhost_head_len(vq, sock->sk))) {
+   headcount = vhost_get_desc_n(vq, vq->heads, datalen+vhost_hlen,
+&in, vq_log, &log);
+   if (headcount < 0)
+   break;

[PATCH V3] drivers/uio/uio_pci_generic.c: allow access for non-privileged processes

2010-04-19 Thread Tom Lyon


These are changes to uio_pci_generic.c to allow better use of the driver by
non-privileged processes.
1. Add back old code which allowed interrupt re-enablement through uio fd.
2. Translate PCI bards to uio mmap regions, to allow mmap through uio fd.
3. Allow devices which support MSI or MSI-X, but not IRQ.
Signed-off-by: Tom Lyon 
---
checkpatch.pl is happy with this one.

--- linux-2.6.33/drivers/uio/uio_pci_generic.c  2010-02-24 10:52:17.0 
-0800
+++ mylinux-2.6.33/drivers/uio/uio_pci_generic.c2010-04-19 
14:57:21.0 -0700
@@ -14,9 +14,10 @@
  * # ls -l /sys/bus/pci/devices/:00:19.0/driver
  * .../:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic
  *
- * Driver won't bind to devices which do not support the Interrupt Disable Bit
- * in the command register. All devices compliant to PCI 2.3 (circa 2002) and
- * all compliant PCI Express devices should support this bit.
+ * Driver won't bind to devices which do not support MSI, MSI-x, or the
+ * Interrupt Disable Bit in the command register. All devices compliant
+ * to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
+ * support one of these.
  */
 
 #include 
@@ -27,7 +28,7 @@
 
 #define DRIVER_VERSION "0.01.0"
 #define DRIVER_AUTHOR  "Michael S. Tsirkin "
-#define DRIVER_DESC"Generic UIO driver for PCI 2.3 devices"
+#define DRIVER_DESC"Generic UIO driver for PCIe & PCI 2.3 devices"
 
 struct uio_pci_generic_dev {
struct uio_info info;
@@ -41,6 +42,39 @@ to_uio_pci_generic_dev(struct uio_info *
return container_of(info, struct uio_pci_generic_dev, info);
 }
 
+/* Read/modify/write command register to disable interrupts.
+ * Note: we could cache the value and optimize the read if there was a way to
+ * get notified of user changes to command register through sysfs.
+ * */
+static void irqtoggle(struct uio_pci_generic_dev *gdev, int irq_on)
+{
+   struct pci_dev *pdev = gdev->pdev;
+   unsigned long flags;
+   u16 orig, new;
+
+   spin_lock_irqsave(&gdev->lock, flags);
+   pci_block_user_cfg_access(pdev);
+   pci_read_config_word(pdev, PCI_COMMAND, &orig);
+   new = irq_on ? (orig & ~PCI_COMMAND_INTX_DISABLE)
+: (orig | PCI_COMMAND_INTX_DISABLE);
+   if (new != orig)
+   pci_write_config_word(gdev->pdev, PCI_COMMAND, new);
+   pci_unblock_user_cfg_access(pdev);
+   spin_unlock_irqrestore(&gdev->lock, flags);
+}
+
+/* irqcontrol is use by userspace to enable/disable interrupts. */
+/* A privileged app can write the PCI_COMMAND register directly,
+ * but we need this for normal apps
+ */
+static int irqcontrol(struct uio_info *info, s32 irq_on)
+{
+   struct uio_pci_generic_dev *gdev = to_uio_pci_generic_dev(info);
+
+   irqtoggle(gdev, irq_on);
+   return 0;
+}
+
 /* Interrupt handler. Read/modify/write the command register to disable
  * the interrupt. */
 static irqreturn_t irqhandler(int irq, struct uio_info *info)
@@ -89,7 +123,7 @@ done:
 /* Verify that the device supports Interrupt Disable bit in command register,
  * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
  * in PCI 2.2. */
-static int __devinit verify_pci_2_3(struct pci_dev *pdev)
+static int verify_pci_2_3(struct pci_dev *pdev)
 {
u16 orig, new;
int err = 0;
@@ -121,17 +155,52 @@ err:
return err;
 }
 
-static int __devinit probe(struct pci_dev *pdev,
+/* we could've used the generic pci sysfs stuff for mmap,
+ * but this way we can allow non-privileged users as long
+ * as /dev/uio* has the right permissions
+ */
+static void uio_do_maps(struct uio_pci_generic_dev *gdev)
+{
+   struct pci_dev *pdev = gdev->pdev;
+   struct uio_info *info = &gdev->info;
+   int i, j;
+   char *name;
+
+   for (i = 0, j = 0; i < PCI_STD_RESOURCE_END && j < MAX_UIO_MAPS; i++) {
+   if (pci_resource_flags(pdev, i) & IORESOURCE_MEM) {
+   name = kmalloc(8, GFP_KERNEL);
+   if (name == NULL)
+   break;
+   sprintf(name, "membar%d", i);
+   info->mem[j].name = name;
+   info->mem[j].addr = pci_resource_start(pdev, i);
+   info->mem[j].size = pci_resource_len(pdev, i);
+   info->mem[j].memtype = UIO_MEM_PHYS;
+   j++;
+   }
+   }
+   for (i = 0, j = 0; i < PCI_STD_RESOURCE_END &&
+  j < MAX_UIO_PORT_REGIONS; i++) {
+   if (pci_resource_flags(pdev, i) & IORESOURCE_IO) {
+   name = kmalloc(8, GFP_KERNEL);
+   if (name == NULL)
+   break;
+   sprintf(name, "iobar%d", i);
+   info->port[j].name = name;
+   info->port[j].start = pci_resource_start(pdev, i);
+   info->port[j].siz

Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-04-19 Thread Michael S. Tsirkin

On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote:
> I took a stub at documenting CMD and FLUSH request types in virtio
> block.

Any comments?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

About cpu_set, CPU hotplug and related subjects

2010-04-19 Thread Lucas Meneghel Rodrigues

Hi folks, 

I've implemented a functional test for the cpu_set feature some time
ago. I was going through the patch queue and realized the patch needed
some respin, so I did it. I decided to try and see what is the state of
the feature, considering last time I tried it was not working
(segfaulting).

Now, with the latest qemu-kvm things are on a much better shape, cpu_set
is not giving me segfaults, and qemu monitor reports the new cpus added.

10:21:20 INFO | Adding 1 CPUs to guest
10:21:20 DEBUG| Sending monitor command: cpu_set 0 online
10:21:20 DEBUG| Sending monitor command: cpu_set 1 online
10:21:20 DEBUG| Sending monitor command: info cpus
10:21:20 DEBUG| Output of info cpus:
* CPU #0: pc=0x8102e23d thread_id=11035 
  CPU #1: pc=0x thread_id=11065 

The address of the CPU #1 seems a little strange. The guest OS is
completely unaware of any changes to the number of CPUs though.

After doing some reading it seems to me that the reason why that is not
happening is because SeaBIOS still doesn't have code to support CPU hot
plugging as BochsBIOS did. I also looked after some documentation about
how that feature is supposed to work, and didn't find it. So, assuming
my understanding about the current status is correct:

1) Is anybody planning on adding the necessary support to SeaBIOS
anytime soon?
2) How the whole functionality is supposed to work? My mental model goes
like this:

* cpu_set [total number of CPUs you want the system to have] online
* Guest should notice new CPUS added to the system and they appear
under /sys/devices/system/cpu/cpu*, initially offlined
* The online status of each CPU is
in /sys/devices/system/cpu/cpu*/online, and onlining it is just a matter
of writing 1 to this file
* It is not possible to downgrade the number of CPUs of the system
during the lifetime of the VM process

Thanks for your attention,

Lucas


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

QEMU-KVM and video performance

2010-04-19 Thread Gerhard Wiesinger


Hello,

Finally I got QEMU-KVM to work but video performance under DOS is very 
low (QEMU 0.12.3 stable and QEMU GIT master branch is fast, QEMU KVM is 
slow)


I'm measuring 2 performance critical video performance parameters:
1.) INT 10h, function AX=4F05h (set same window/set window/get window)
2.) Memory performance to segment page A000h

So BIOS performance (which might be port performance to VGA index/value 
port) is about factor 5 slower, memory performance is about factor 100 
slower.


QEMU 0.12.3 and QEMU GIT performance is the same (in the measurement 
tolerance) and listed only once, QEMU KVM is much more slower (details see 
below).


Test programs can be provided, source code will be release soon.

Any ideas why KVM is so slow? Any ideas for improvement?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

==
QEMU 0.12.3 and QEMU GIT
==
INT10PER performance information V1.0, (c) 2010 by Gerhard Wiesinger

VESA set window: seconds=2, operations=100, ops/s=50.0
VESA set window (nc): seconds=2, operations=100, ops/s=50.0
VESA get window: seconds=2, operations=100, ops/s=50.0

MEMPERF video performance V1.0, (c) 2010 by Gerhard Wiesinger

BYTE performance, time=10s, bytes=930611200, rate=88.750 MB/s
WORD performance, time=10s, bytes=766771200, rate=73.125 MB/s
DWORD performance, time=10s, bytes=812646400, rate=77.500 MB/s
QWORD performance, time=10s, bytes=806092800, rate=76.875 MB/s
==

==
QEMU-KVM
==
INT10PER performance information V1.0, (c) 2010 by Gerhard Wiesinger

VESA set window: seconds=9, operations=100, ops/s=11.1
VESA set window (nc): seconds=9, operations=100, ops/s=11.1
VESA get window: seconds=5, operations=100, ops/s=20.0

MEMPERF video performance V1.0, (c) 2010 by Gerhard Wiesinger

BYTE performance, time=13s, bytes=13107200, rate=0.962 MB/s
WORD performance, time=13s, bytes=13107200, rate=0.962 MB/s
DWORD performance, time=12s, bytes=13107200, rate=1.042 MB/s
QWORD performance, time=13s, bytes=13107200, rate=0.962 MB/s
==

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Autotest] Autotest: Unattended_install testcase always fail with rhel3.9-32 guest

2010-04-19 Thread Lucas Meneghel Rodrigues

On Sun, 2010-04-18 at 21:32 -0600, David S. Ahern wrote:
> On 04/18/2010 12:26 PM, Lucas Meneghel Rodrigues wrote:
> > On Sat, 2010-04-17 at 22:55 -0600, David S. Ahern wrote:
> >>
> >> On 04/17/2010 10:09 PM, Amos Kong wrote:
> >>> %post --interpreter /usr/bin/python
> >>> import socket, os
> >>> os.system('dhclient')
> >>> os.system('chkconfig sshd on')
> >>> os.system('iptables -F')
> >>> os.system('echo 0 > /selinux/enforce')
> >>> server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> >>> server.bind(('', 12323))
> >>> server.listen(1)
> >>> (client, addr) = server.accept()
> >>> client.send("done")
> >>> client.close()
> >>
> >> So, effectively after the install completes use dhclient to configure a
> >> network address, start a server on a known port and when a client
> >> connects send the message "done". I would expect that to work just fine.
> > 
> > Me too, it has been working for RHEL 4.X, 5.X 32/64 bit and 3.X 64 bit.
> > The problem has been effectively 3.9 32 bit.
> 
> I fired up a 3.9 guest with your ks.cfg. The problem is due to the
> limited functionality in the RHEL3 BOOT kernel for i386. Specifically,
> dhclient is failing at:
> 
> setsockopt(6, SOL_SOCKET, SO_ATTACH_FILTER, "\v\0\6\10\240Y\n\10", 8) =
> -1 ENOPROTOOPT
> 
> So dhclient client is out. But you can still configure and use
> networking via ifconfig if static addressing is an option for you. I was
> able to use that command to configure eth0 and push an strace output
> file for dhclient.
> 
> Also, a couple of comments on this use case:
> - SELinux is not applicable
> - 32 GB of RAM is way beyond what the RHEL3 i386 can detect and use
> - 12 vcpus seems high as well.

Problem resolved :) Yay!

http://autotest.kernel.org/changeset/4432

Thank you very much David,


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM test: Fix RHEL 3.9 32 bit installation

2010-04-19 Thread Lucas Meneghel Rodrigues

Fix a long standing bug where the unattended installs for
RHEL 3.9 32 bit were failing due to some limitations on the
boot install kernel for this specific version. Instead
of using dhclient, resort to static IP configuration and
bring joy to the land. This modification to the kickstart
file does not regress 3.9 64 bit.

Special thanks go to David S. Ahern  for
the time spent on resolving this problem.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/unattended/RHEL-3-series.ks |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/client/tests/kvm/unattended/RHEL-3-series.ks 
b/client/tests/kvm/unattended/RHEL-3-series.ks
index ad748cb..884b386 100644
--- a/client/tests/kvm/unattended/RHEL-3-series.ks
+++ b/client/tests/kvm/unattended/RHEL-3-series.ks
@@ -24,10 +24,9 @@ skipx
 
 %post --interpreter /usr/bin/python
 import socket, os
-os.system('dhclient')
+os.system('/sbin/ifconfig eth0 10.0.2.15 netmask 255.255.255.0 up')
+os.system('/sbin/route add default gw 10.0.2.2')
 os.system('chkconfig sshd on')
-os.system('iptables -F')
-os.system('echo 0 > /selinux/enforce')
 server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
 server.bind(('', 12323))
 server.listen(1)
-- 
1.6.6.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

KVM Forum 2010: Call for Papers

2010-04-19 Thread KVM Forum 2010 Program Committee

=

CALL FOR PAPERS 

KVM Forum 2010

=


DESCRIPTION

 The KVM Forum is back! After a break last year we're proud to present
 this year's gathering around KVM again.  The idea is to have everyone
 involved with KVM development come together to talk about the future
 and current state of KVM, teaching everyone some pieces of the puzzle
 they might be missing without.
 
 So if you're a KVM developer, mark the dates in your calendar! If
 possible, also submit a talk -- we're interested in a wide variety of
 KVM topics, so don't hesitate to propose a talk on your work.

 If you're not a KVM developer, please read on nevertheless (or jump to
 END USER COLLABORATION).
 

DATES / LOCATION

 Conference: August 9 - 10, 2010
 Location: Renaissance Boston Waterfront in Boston, MA
 
 Abstracts due: May 14th, 2010
 Notification: May 28th, 2010
 
 Yes, we're colocated with LinuxCon.  Tickets for the KVM Forum also count
 for LinuxCon.

 
http://events.linuxfoundation.org/component/registrationpro/?func=details&did=34
 
 
PROCESS

 At first check if it's before May 14th.  If you're past that date, you're
 out of luck.  Now try to think hard and come up with a great idea that
 you could talk about.  Once you have that set, we need you to write up
 a short abstract (~150 words) on it.  In your submission please note
 how long your talk will take.  Slots vary in length up to one hour.
 Also include in your proposal the proposal type -- one of: technical talk,
 breakout session, or end-user talk.  Add that information to the abstract
 and submit it at the following URL:
 
 http://events.linuxfoundation.org/cfp/cfp-add
 
 Now, wait until May 24th.  You will receive a notification on whether
 your talk was accepted or not.
 
 
SCOPE OF TALKS

 We have a list of suggested presentation topics below.  These suggestions
 are just for guidance, please feel free to submit a proposal on any
 of these or related topics.  In general, the more it's about backend
 infrastructure, the better.
 
 KVM
   - Scaling and performance
   - Nested virtualization
   - I/O improvements
   - Driver domains
   - Time keeping
   - Memory management (page sharing, swapping, huge pages, etc)
   - Fault tolerance
   - VEPA, vswitch

  Embedded KVM
   - KVM on ARM, PPC, MIPS, ...?
   - Real-time requirements host/guest
   - Device pass-through w/o iommu
   - Custom device/platform models
 
 QEMU
   - Device model improvements
   - New devices
   - Security model
   - Scaling and performance
   - Desktop virtualization
   - Increasing robustness
   - Management interfaces
   - QMP protocol and implementation
   - Live migration
 
 Virtio
   - Speeding up existing devices
   - Vhost
   - Alternatives
   - Using virtio in non-kvm environments
   - Virtio on non-Linux
 
 Management infrastructure
   - Libvirt
   - Kvm autotest
   - Easy networking
   - Qemud


BREAKOUT SESSION

 We will reserve some time each day to break out for working sessions.
 These sessions will be less formal than a presentation and more focused
 on developing a solution to some real development issue.  If you are
 interested in getting developers together to hack on some code, submit
 your proposal and just make it clear it's a breakout session proposal.
 

END-USER COLLABORATION

 One of the big challenges as developers is to know what, where and how
 people actually use our software.  To solve this issue at least a little,
 there will be a few slots reserved for end users talking about their
 deployments, problems and achievements.
 
 So if you have a KVM based deployment running in production or are about
 to roll out one, please also submit a talk (see PROCESS), and simply
 mark it asn an end-user collaboration proposal.  We would love to have
 an open discussion of fields where KVM/Qemu can still improve and you
 would have the unique chance to steer that process!
  
 Keep in mind that most of the Forum will be focused on development though,
 so we suggest you also come with a good portion of technical interest :-).
 And of course, no product marketing please!  The purpose is to engage
 with KVM developers.
 
 
LIGHTNING TALKS

 In addition to submitted talks we will also have some room for lightning
 talks. So if you have something you think might be done until the KVM
 Forum, but you're not sure you could fill 15 minutes with it. Or if you
 don't know if you'll make it until there, just keep in mind that you
 will still get the chance to talk about it.  Lightning talk submissions
 and scheduling will be handled on-site at KVM Forum.


Thank you for your interest in KVM.  We're looking forward to your
submissions and seeing you at the KVM Forum 2010 in August!  Now, start
thinking about that talk you want to give.

Thanks,
your KVM Forum 2010 Program Commitee

Alexander Graf, Novell
Anthony Liguori, IBM
Avi Kivity, Red Hat
Chris Wright, Red Hat
Dor Laor

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Zachary Amsden


On 04/19/2010 12:54 AM, Avi Kivity wrote:

On 04/19/2010 01:51 PM, Peter Zijlstra wrote:



Right, so on x86 we have:

X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
independent, not that it doesn't stop in C states and similar fun 
stuff.


X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is 
constant

and synced between cores.



Sockets and boards too?  (IOW, how reliable is TSC_RELIABLE)?

Not sure, IIRC we clear that when the TSC sync test fails, eg when we
mark the tsc clocksource unusable.


Worrying.  By the time we detect this the guest may already have 
gotten confused by clocks going backwards.


Upstream, we are marking the TSC unstable preemptively when hardware 
which will eventually sync test is detected, so this should be fine.


Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Glauber Costa

On Mon, Apr 19, 2010 at 09:19:38AM -0700, Jeremy Fitzhardinge wrote:
> On 04/19/2010 07:26 AM, Glauber Costa wrote:
> >> Is the problem that the tscs are starting out of sync, or that they're
> >> drifting relative to each other over time?  Do the problems become worse
> >> the longer the uptime?  How large are the offsets we're talking about here?
> >> 
> > The offsets usually seem pretty small, under a microsecond. So I don't think
> > it has anything to do with tscs starting out of sync. Specially because the
> > delta-based calculation has the exact purpose of handling that case.
> >   
> 
> So you think they're drifting out of sync from an initially synced
> state?  If so, what would bound the drift?
I think delta calculation introduces errors.

Marcelo can probably confirm it, but he has a nehalem with an appearently
very good tsc source. Even this machine warps.

It stops warping if we only write pvclock data structure once and forget it,
(which only updated tsc_timestamp once), according to him.

Obviously, we can't do that everywhere.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Ingo Molnar


FYI, i found a few problems that need fixing:

> + unsigned long ip;
> + if (perf_guest_cbs && perf_guest_cbs->is_in_guest())

missing newline.

> + int misc = 0;
> + if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {

ditto.

> + PERF_RECORD_MISC_GUEST_KERNEL;
> + } else
> + misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
> + PERF_RECORD_MISC_KERNEL;

- unbalanced curly braces
- missing curly brace for multi-line statement.
- unnecessary line-break due to col80 warning from checkpatch

> +extern struct perf_guest_info_callbacks *perf_guest_cbs;
> +extern int perf_register_guest_info_callbacks(
> + struct perf_guest_info_callbacks *);
> +extern int perf_unregister_guest_info_callbacks(
> + struct perf_guest_info_callbacks *);

- unnecessary line-break due to col80 warning from checkpatch

> +static inline int perf_register_guest_info_callbacks
> +(struct perf_guest_info_callbacks *) {return 0; }
> +static inline int perf_unregister_guest_info_callbacks
> +(struct perf_guest_info_callbacks *) {return 0; }

- invalid C: function parameter needs name even if unused
- missing space after opening curly brace

Please provide delta fixes.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Jeremy Fitzhardinge

On 04/19/2010 07:26 AM, Glauber Costa wrote:
>> Is the problem that the tscs are starting out of sync, or that they're
>> drifting relative to each other over time?  Do the problems become worse
>> the longer the uptime?  How large are the offsets we're talking about here?
>> 
> The offsets usually seem pretty small, under a microsecond. So I don't think
> it has anything to do with tscs starting out of sync. Specially because the
> delta-based calculation has the exact purpose of handling that case.
>   

So you think they're drifting out of sync from an initially synced
state?  If so, what would bound the drift?

J
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Jeremy Fitzhardinge

On 04/19/2010 07:46 AM, Peter Zijlstra wrote:
> What avi says! :-)
>
> On a 32bit machine a 64bit read are two 32bit reads, so
>
>   last = last_value;
>
> becomes:
>
>   last.high = last_value.high;
>   last.low  = last_vlue.low;
>
> (or the reverse of course)
>
> Now imagine a write getting interleaved with that ;-)
>   

You could explicitly do:

do {
h = last.high;
barrier();
l = last.low;
barrier();
} while (last.high != h);

This works because we expect last to be always increasing, so the only
worry is low wrapping and incrementing high, and is more efficient than
making the read fully atomic (the write is still cmpxchg64).  But it's
pretty ugly to open code just for 32b architectures; its something that
might be useful to turn into a general abstraction (monotonic_read_64
FTW!).  I already have code like this in the Xen time code, so I could
make immediate use of it.

J
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Jeremy Fitzhardinge

On 04/19/2010 07:33 AM, Avi Kivity wrote:
> On 04/19/2010 05:21 PM, Glauber Costa wrote:
>>
>>> Oh yes, just trying to avoid a patch with both atomic64_read() and
>>> ACCESS_ONCE().
>>>  
>> you're mixing the private version of the patch you saw with this one.
>> there isn't any atomic reads in here. I'll use a barrier then
>>
>
> This patch writes last_value atomically, but reads it non-atomically. 
> A barrier is insufficient.

Well, on a 32b system, you can explicitly order the updates of low and
high, then do a high-low-checkhigh read.  That would be much more
efficient than atomic64.  If we really care about 32b.

J

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

buildbot failure in qemu-kvm on default_i386_out_of_tree

2010-04-19 Thread qemu-kvm

The Buildbot has detected a new failure of default_i386_out_of_tree on qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_out_of_tree/builds/299

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Adam Litke ,Alexander Graf ,Amos 
Kong ,Anthony Liguori ,Arnaud Patard 
(Rtp) ,Aurelien Jarno ,Avi 
Kivity ,BjÃ¸rn Mork ,Blue Swirl 
,Chris Webb ,Christoph Hellwig 
,Dmitry Ilyevsky ,Edgar E. Iglesias 
,Gerd Hoffmann ,Jan Kiszka 
,Johan Bengtsson ,Juan Quintela 
,Juergen Lock ,Kevin Wolf 
,Lars Munch ,Marcelo Tosatti 
,Markus Armbruster ,Max Reitz 
,Michael Casadevall ,Michael S. 
Tsirkin ,Michael Tokarev ,Naphtali Sprei 
,Paolo Bonzini ,Paul Bolle
  ,Paul Brook ,Rabin Vincent 
,Richard Henderson ,Riku Voipio 
,Rob Landley ,Ryota Ozaki 
,Shahar Havivi ,Stefan Weil 
,TeLeMan ,malc 

BUILD FAILED: failed git

sincerely,
 -The Buildbot

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/5] export new cpuid KVM_CAP

2010-04-19 Thread Glauber Costa

On Sat, Apr 17, 2010 at 09:58:26PM +0300, Avi Kivity wrote:
> On 04/15/2010 09:37 PM, Glauber Costa wrote:
> >Since we're changing the msrs kvmclock uses, we have to communicate
> >that to the guest, through cpuid. We can add a new KVM_CAP to the
> >hypervisor, and then patch userspace to recognize it.
> >
> >And if we ever add a new cpuid bit in the future, we have to do that again,
> >which create some complexity and delay in feature adoption.
> >
> >Instead, what I'm proposing in this patch is a new capability, called
> >KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the current feature list
> >currently supported by the hypervisor. If we ever want to add or remove
> >some feature, we only need to tweak into the HV, leaving userspace untouched.
> >
> 
> Hm.  We need to update userspace anyway, since we don't like turning
> features on unconditionally (it breaks live migration into an older
> kernel).
Right now, we don't have any mechanism to disable, say, kvmclock cpuid bit
at userspace. But let's suppose we have: What's the difference between disabling
it in the way it is now, and disabling it with the method I am proposing?

All this ioctl say is: "Those are the current supported stuff in this HV".
It does not mandate userspace to expose all of this to the guest. It just saves
us from the job of creating yet another CAP for every bit we plan on including.

If we want to be conservative, we can keep everything but the things we know 
already disabled, in userspace.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

buildbot failure in qemu-kvm on default_x86_64_out_of_tree

2010-04-19 Thread qemu-kvm

The Buildbot has detected a new failure of default_x86_64_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_out_of_tree/builds/300

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch master] HEAD
Blamelist: Adam Litke ,Alexander Graf ,Amos 
Kong ,Anthony Liguori ,Arnaud Patard 
(Rtp) ,Aurelien Jarno ,Avi 
Kivity ,BjÃ¸rn Mork ,Blue Swirl 
,Chris Webb ,Christoph Hellwig 
,Dmitry Ilyevsky ,Edgar E. Iglesias 
,Gerd Hoffmann ,Jan Kiszka 
,Johan Bengtsson ,Juan Quintela 
,Juergen Lock ,Kevin Wolf 
,Lars Munch ,Marcelo Tosatti 
,Markus Armbruster ,Max Reitz 
,Michael Casadevall ,Michael S. 
Tsirkin ,Michael Tokarev ,Naphtali Sprei 
,Paolo Bonzini ,Paul Bolle
  ,Paul Brook ,Rabin Vincent 
,Richard Henderson ,Riku Voipio 
,Rob Landley ,Ryota Ozaki 
,Shahar Havivi ,Stefan Weil 
,TeLeMan ,malc 

BUILD FAILED: failed git

sincerely,
 -The Buildbot

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

buildbot failure in qemu-kvm on default_i386_debian_5_0

2010-04-19 Thread qemu-kvm

The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/361

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch master] HEAD
Blamelist: Adam Litke ,Alexander Graf ,Amos 
Kong ,Anthony Liguori ,Arnaud Patard 
(Rtp) ,Aurelien Jarno ,Avi 
Kivity ,BjÃ¸rn Mork ,Blue Swirl 
,Chris Webb ,Christoph Hellwig 
,Dmitry Ilyevsky ,Edgar E. Iglesias 
,Gerd Hoffmann ,Jan Kiszka 
,Johan Bengtsson ,Juan Quintela 
,Juergen Lock ,Kevin Wolf 
,Lars Munch ,Marcelo Tosatti 
,Markus Armbruster ,Max Reitz 
,Michael Casadevall ,Michael S. 
Tsirkin ,Michael Tokarev ,Naphtali Sprei 
,Paolo Bonzini ,Paul Bolle
  ,Paul Brook ,Rabin Vincent 
,Richard Henderson ,Riku Voipio 
,Rob Landley ,Ryota Ozaki 
,Shahar Havivi ,Stefan Weil 
,TeLeMan ,malc 

BUILD FAILED: failed git

sincerely,
 -The Buildbot

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Peter Zijlstra

On Mon, 2010-04-19 at 17:33 +0300, Avi Kivity wrote:
> On 04/19/2010 05:21 PM, Glauber Costa wrote:
> >
> >> Oh yes, just trying to avoid a patch with both atomic64_read() and
> >> ACCESS_ONCE().
> >>  
> > you're mixing the private version of the patch you saw with this one.
> > there isn't any atomic reads in here. I'll use a barrier then
> >
> 
> This patch writes last_value atomically, but reads it non-atomically.  A 
> barrier is insufficient.

What avi says! :-)

On a 32bit machine a 64bit read are two 32bit reads, so

  last = last_value;

becomes:

  last.high = last_value.high;
  last.low  = last_vlue.low;

(or the reverse of course)

Now imagine a write getting interleaved with that ;-)



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 05:32 PM, Glauber Costa wrote:



Right, another option is to put the initial read outside of the loop,
that way you'll have the best of all cases, a single LOCK'ed op in the
loop, and only a single LOCK'ed op for the fast path on sensible
architectures ;-)

 last = atomic64_read(&last_value);
 

isn't a barrier enough here?

   


No.  On i386, the statement

   last = last_value;

will be split by the compiler into two 32-bit loads.  If a write 
(atomic, using cmpxchg) on another cpu happens between those two loads, 
then the variable last will have a corrupted value.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 05:21 PM, Glauber Costa wrote:



Oh yes, just trying to avoid a patch with both atomic64_read() and
ACCESS_ONCE().
 

you're mixing the private version of the patch you saw with this one.
there isn't any atomic reads in here. I'll use a barrier then
   


This patch writes last_value atomically, but reads it non-atomically.  A 
barrier is insufficient.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Glauber Costa

On Mon, Apr 19, 2010 at 01:19:43PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-04-19 at 14:13 +0300, Avi Kivity wrote:
> > On 04/19/2010 01:56 PM, Peter Zijlstra wrote:
> > >
> > >
> > >>> Right, do bear in mind that the x86 implementation of atomic64_read() is
> > >>> terrifyingly expensive, it is better to not do that read and simply use
> > >>> the result of the cmpxchg.
> > >>>
> > >>>
> > >>>
> > >> atomic64_read() _is_ cmpxchg64b.  Are you thinking of some clever
> > >> implementation for smp i386?
> > >>  
> > >
> > > No, what I was suggesting was to rewrite that loop no to need the
> > > initial read but use the cmpxchg result of the previous iteration.
> > >
> > > Something like:
> > >
> > >u64 last = 0;
> > >
> > >/* more stuff */
> > >
> > >do {
> > >  if (ret<  last)
> > >return last;
> > >  last = cmpxchg64(&last_value, last, ret);
> > >} while (last != ret);
> > >
> > > That only has a single cmpxchg8 in there per loop instead of two
> > > (avoiding the atomic64_read() one).
> > >
> > 
> > Still have two cmpxchgs in the common case.  The first iteration will 
> > fail, fetching last_value, the second will work.
> > 
> > It will be better when we have contention, though, so it's worthwhile.
> 
> Right, another option is to put the initial read outside of the loop,
> that way you'll have the best of all cases, a single LOCK'ed op in the
> loop, and only a single LOCK'ed op for the fast path on sensible
> architectures ;-)
> 
> last = atomic64_read(&last_value);
isn't a barrier enough here?


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: MMU: Drop cr4.pge from shadow page role

2010-04-19 Thread Avi Kivity

Since commit bf47a760f66ad, we no longer handle ptes with the global bit
set specially, so there is no reason to distinguish between shadow pages
created with cr4.gpe set and clear.

Such tracking is expensive when the guest toggles cr4.pge, so drop it.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |1 -
 arch/x86/kvm/mmutrace.h |3 +--
 arch/x86/kvm/x86.c  |1 -
 3 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3c31c5a..d47d087 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -178,7 +178,6 @@ union kvm_mmu_page_role {
unsigned direct:1;
unsigned access:3;
unsigned invalid:1;
-   unsigned cr4_pge:1;
unsigned nxe:1;
};
 };
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 3851f1f..bc4f7f0 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -28,7 +28,7 @@
\
role.word = __entry->role;  \
\
-   trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s %spge" \
+   trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s"   \
 " %snxe root %u %s%c", \
 __entry->gfn, role.level,  \
 role.cr4_pae ? " pae" : "",\
@@ -36,7 +36,6 @@
 role.direct ? " direct" : "",  \
 access_str[role.access],   \
 role.invalid ? " invalid" : "",\
-role.cr4_pge ? "" : "!",   \
 role.nxe ? "" : "!",   \
 __entry->root_count,   \
 __entry->unsync ? "unsync" : "sync", 0);   \
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58a96e6..d7d58ce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -488,7 +488,6 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
}
kvm_x86_ops->set_cr4(vcpu, cr4);
vcpu->arch.cr4 = cr4;
-   vcpu->arch.mmu.base_role.cr4_pge = (cr4 & X86_CR4_PGE) && !tdp_enabled;
kvm_mmu_reset_context(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_set_cr4);
-- 
1.6.6.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Glauber Costa

On Fri, Apr 16, 2010 at 01:36:34PM -0700, Jeremy Fitzhardinge wrote:
> On 04/15/2010 11:37 AM, Glauber Costa wrote:
> > In recent stress tests, it was found that pvclock-based systems
> > could seriously warp in smp systems. Using ingo's time-warp-test.c,
> > I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
> >   
> 
> Is that "1.5 million"?
Yes it is.

But as I said, this seem to be a very deep worst case scenario. Most of boxes
are not even close to being that bad.

> 
> > (to be fair, it wasn't that bad in most of them). Investigating further, I
> > found out that such warps were caused by the very offset-based calculation
> > pvclock is based on.
> >   
> 
> Is the problem that the tscs are starting out of sync, or that they're
> drifting relative to each other over time?  Do the problems become worse
> the longer the uptime?  How large are the offsets we're talking about here?
The offsets usually seem pretty small, under a microsecond. So I don't think
it has anything to do with tscs starting out of sync. Specially because the
delta-based calculation has the exact purpose of handling that case.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Glauber Costa

On Mon, Apr 19, 2010 at 02:10:54PM +0300, Avi Kivity wrote:
> On 04/19/2010 02:05 PM, Peter Zijlstra wrote:
> >>
> >>>ACCESS_ONCE() is your friend.
> >>>
> >>I think it's implied with atomic64_read().
> >Yes it would be. I was merely trying to point out that
> >
> >   last = ACCESS_ONCE(last_value);
> >
> >Is a narrower way of writing:
> >
> >   last = last_value;
> >   barrier();
> >
> >In that it need not clobber all memory locations and makes it instantly
> >clear what we want the barrier for.
> 
> Oh yes, just trying to avoid a patch with both atomic64_read() and
> ACCESS_ONCE().
you're mixing the private version of the patch you saw with this one.
there isn't any atomic reads in here. I'll use a barrier then

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[ kvm-Bugs-2989366 ] huge memory leak (qemu-kvm 0.12.3)

2010-04-19 Thread SourceForge.net

Bugs item #2989366, was opened at 2010-04-19 15:47
Message generated for change (Tracker Item Submitted) made by tgr1
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2989366&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: tgr (tgr1)
Assigned to: Nobody/Anonymous (nobody)
Summary: huge memory leak (qemu-kvm 0.12.3)

Initial Comment:
CPU: Xeon E5450
KVM: qemu-kvm 0.12.3
Host: Debian Squeeze amd64, kernel 2.6.32
Guest: Debian Lenny amd64, kernel 2.6.26
qemu command line below (no helpers like libvirt used)
whether the problem goes away if using the -no-kvm-irqchip or -no-kvm-pit 
switch: seems unrelated
whether the problem also appears with the -no-kvm switch: load profile of the 
guest makes running without KVM infeasible

The problem: after 4 days, the kvm process of a guest run with -mem 3072 takes 
12 GB VSZ / 9 GB RSS - grows until OOM ("kvm invoked oom-killer" in dmesg).
At that point the guest shows more than 2 GB of free memory.

qemu command line:

kvm -smp 4 -m 3072 -net nic,vlan=0,model=virtio,name=eth0,macaddr=x:x:x:x:x:x 
-net tap,vlan=0,ifname=tap6 -net 
nic,vlan=1,model=virtio,name=eth1,macaddr=x:x:x:x:x:x -net 
tap,vlan=1,ifname=tap7 -drive if=virtio,file=/dev/x/x-rootfs,index=0 -drive 
if=virtio,file=/dev/x/x-swap,index=1 -name x -nographic -kernel 
/boot/vmlinuz-2.6.26-2-amd64 -initrd /boot/initrd.img-2.6.26-2-amd64 -append 
root=/dev/vda console=ttyS0 -echr 2 -serial 
mon:telnet:localhost:12344,server,nowait -monitor 
unix:/var/run/kvm-x,server,nowait -daemonize -watchdog i6300esb 
-watchdog-action reset

(the monitor on both telnet and unix socket is intentional, the one on the 
socket is only used for sending system_reset and system_powerdown qemu 
commands).


--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2989366&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Fix GFP flags passed from the virtio balloon driver

2010-04-19 Thread Balbir Singh

The virtio balloon driver can dig into the reservation pools
of the OS to satisfy a balloon request. This is not advisable
and other balloon drivers (drivers/xen/balloon.c) avoid this
as well. The patch also adds changes to avoid printing a warning
if allocation fails, since we retry after sometime anyway.

To: kvm 
Cc: Rusty Russell 

Signed-off-by: Balbir Singh 
---
 drivers/virtio/virtio_balloon.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 369f2ee..f8ffe8c 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -102,7 +102,8 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
num = min(num, ARRAY_SIZE(vb->pfns));
 
for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns++) {
-   struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY);
+   struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
+   __GFP_NOMEMALLOC | __GFP_NOWARN);
if (!page) {
if (printk_ratelimit())
dev_printk(KERN_INFO, &vb->vdev->dev,
-- 
1.6.6.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM test: Add cpu_set subtest

2010-04-19 Thread Lucas Meneghel Rodrigues

Tests the ability of adding virtual cpus on the fly to qemu using
the monitor command cpu_set, then after everything is OK, run the
cpu_hotplug testsuite on the guest through autotest.

Updates: The cpu_set feature has been worked out and is in
better shape, however it is my understanding that SeaBIOS code
has to be extended to support cpu_hotplug, like bochs bios did.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/tests/cpu_set.py  |   95 
 client/tests/kvm/tests_base.cfg.sample |7 ++
 2 files changed, 102 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/cpu_set.py

diff --git a/client/tests/kvm/tests/cpu_set.py 
b/client/tests/kvm/tests/cpu_set.py
new file mode 100644
index 000..e79fc95
--- /dev/null
+++ b/client/tests/kvm/tests/cpu_set.py
@@ -0,0 +1,95 @@
+import os, logging, re
+from autotest_lib.client.common_lib import error
+import kvm_test_utils
+
+def run_cpu_set(test, params, env):
+"""
+Runs CPU set test:
+
+1) Pick up a living guest
+2) Send the monitor command cpu_set [n cpus]
+3) Verify if guest has the additional CPU showing up under
+/sys/devices/system/cpu
+4) Try to bring it online by writing 1 to the 'online' file inside that dir
+5) Run the CPU Hotplug test suite shipped with autotest inside guest
+
+@param test: kvm test object.
+@param params: Dictionary with test parameters.
+@param env: Dictionary with the test environment.
+"""
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+session = kvm_test_utils.wait_for_login(vm)
+
+n_cpus_add = int(params.get("n_cpus_add", 1))
+current_cpus = int(params.get("smp", 1))
+total_cpus = current_cpus + n_cpus_add
+# Let's define some boundaries for this test
+if total_cpus > 100:
+raise error.TestError("Unsupported number of total CPUs: %s" %
+  total_cpus)
+
+dmesg_before = session.get_command_output("dmesg -c")
+
+logging.info("Adding %d CPUs to guest", n_cpus_add)
+for i in range(total_cpus):
+status, output = vm.send_monitor_cmd("cpu_set %s online" % i)
+if status != 0:
+raise error.TestFail("Failed to add CPU %s to guest: %s" %
+ (i, output))
+
+status, output = vm.send_monitor_cmd("info cpus")
+logging.debug("Output of info cpus:\n%s", output)
+cpu_regexp = re.compile("CPU #(\d+)")
+total_cpus_monitor = len(cpu_regexp.findall(output))
+if total_cpus_monitor != total_cpus:
+raise error.TestFail("Monitor reports %s CPUs, when VM should have %s" 
%
+ (total_cpus_monitor, total_cpus))
+
+dmesg_after = session.get_command_output("dmesg -c")
+logging.debug("Guest dmesg output after CPU add:\n%s" % dmesg_after)
+
+# Verify whether the new cpus are showing up on /sys
+n_cmd = 'find /sys/devices/system/cpu/cpu[0-99] -maxdepth 0 -type d | wc 
-l'
+output = session.get_command_output(n_cmd)
+try:
+cpus_after_addition = int(output)
+except ValueError:
+logging.error("Verify command output: %s", output)
+raise error.TestFail("Unable to get CPU count after CPU addition")
+
+if cpus_after_addition != total_cpus:
+raise error.TestFail("%s CPUs are showing up under "
+ "/sys/devices/system/cpu, was expecting %s" %
+ (cpus_after_addition, total_cpus))
+
+r_cmd = 'find /sys/devices/system/cpu/cpu[0-99]/online -maxdepth 0 -type f'
+online_files = session.get_command_output(r_cmd).split().sort()
+
+if not online_files:
+raise error.TestFail("Could not find CPUs that can be "
+ "enabled/disabled on guest")
+
+for online_file in online_files:
+cpu_regexp = re.compile("cpu(\d+)", re.IGNORECASE)
+cpu_id = cpu_regexp.findall(online_file)[0]
+try:
+check_online_status = int(session.get_command_output("cat %s" %
+ online_file))
+except ValueError:
+raise error.TestFail("Unable to get online status from CPU %s" %
+ cpu_id)
+if check_online_status == 0:
+logging.debug("CPU %s offline, bringing it online" % cpu_id)
+bring_online_status = session.get_command_status("echo 1 > %s" %
+ online_file)
+if bring_online_status != 0:
+raise error.TestError("Error bringing guest CPU %s online" %
+  cpu_id)
+
+# Now that all CPUs were onlined, let's execute the
+# autotest CPU Hotplug test
+control_path = os.path.join(test.bindir, "autotest_control",
+"cpu_hotplug.control")
+kvm_test_utils.run_autotest(vm, session, control_path,

Re: does qmp supports usb_add?

2010-04-19 Thread Daniel P. Berrange

On Mon, Apr 19, 2010 at 08:55:37PM +0800, chunhui zhao wrote:
> Thanks Dan.
> However,  as I checked this command "device_add", I did not found QMP
> support it.
> I use the command:
> {"QMP": {"query-command"}}
> In the return list, I did not found the "device_add". The list is as below:
> -
> {"return": [{"name": "quit"}, {"name": "eject"}, {"name": "change"},
> {"name": "stop"}, {"name": "cont"}, {"name": "system_reset"}, {"name":
> "system_powerdown"}, {"name": "memsave"}, {"name": "pmemsave"},
> {"name": "migrate"}, {"name": "migrate_cancel"}, {"name": "pci_add"},
> {"name": "pci_del"}, {"name": "balloon"}, {"name": "getfd"}, {"name":
> "closefd"}, {"name": "block_passwd"}, {"name": "query-version"},
> {"name": "query-commands"}, {"name": "query-chardev"}, {"name":
> "query-block"}, {"name": "query-blockstats"}, {"name": "query-cpus"},
> {"name": "query-hpet"}, {"name": "query-kvm"}, {"name":
> "query-status"}, {"name": "query-mice"}, {"name": "query-vnc"},
> {"name": "query-name"}, {"name": "query-uuid"}, {"name":
> "query-migrate"}, {"name": "query-balloon"}]}
> -
> 
> And on the page : http://www.linux-kvm.org/page/MonitorProtocol.  I
> notice that :
> -
> TODO
> High Priority
> do_device_add()/do_device_del() conversions (markus)
> do_netdev_add()/do_netdev_del() conversions (markus)
> do_blockdev_add()/do_blockdev_del() conversions (markus)
> Events Grouping (luiz)
> Make qmp-shell work again (luiz)
> Self-description & High-level protocol documentation
> High-level internal documentation
> Better QObjects and QMP debug support
> -
> 
> Cause the do_device_add is in the to_do list, does this mean it is not
> finished yet?
> 
> BTW: my qemu version is 0.12.3 which I believe is the latest version.

It is not practical to use QMP with any 0.12.x version of QEMU. The
0.13.x release series will be the first where it is complete enough
to use. Current GIT code is getting reasonably complete - waiting for
netdev_add/del to be merged, and then blockdev_add/del will complete
the functionality for device hotplug use cases.

Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: does qmp supports usb_add?

2010-04-19 Thread chunhui zhao

Thanks Dan.
However,  as I checked this command "device_add", I did not found QMP
support it.
I use the command:
{"QMP": {"query-command"}}
In the return list, I did not found the "device_add". The list is as below:
-
{"return": [{"name": "quit"}, {"name": "eject"}, {"name": "change"},
{"name": "stop"}, {"name": "cont"}, {"name": "system_reset"}, {"name":
"system_powerdown"}, {"name": "memsave"}, {"name": "pmemsave"},
{"name": "migrate"}, {"name": "migrate_cancel"}, {"name": "pci_add"},
{"name": "pci_del"}, {"name": "balloon"}, {"name": "getfd"}, {"name":
"closefd"}, {"name": "block_passwd"}, {"name": "query-version"},
{"name": "query-commands"}, {"name": "query-chardev"}, {"name":
"query-block"}, {"name": "query-blockstats"}, {"name": "query-cpus"},
{"name": "query-hpet"}, {"name": "query-kvm"}, {"name":
"query-status"}, {"name": "query-mice"}, {"name": "query-vnc"},
{"name": "query-name"}, {"name": "query-uuid"}, {"name":
"query-migrate"}, {"name": "query-balloon"}]}
-

And on the page : http://www.linux-kvm.org/page/MonitorProtocol.  I
notice that :
-
TODO
High Priority
do_device_add()/do_device_del() conversions (markus)
do_netdev_add()/do_netdev_del() conversions (markus)
do_blockdev_add()/do_blockdev_del() conversions (markus)
Events Grouping (luiz)
Make qmp-shell work again (luiz)
Self-description & High-level protocol documentation
High-level internal documentation
Better QObjects and QMP debug support
-

Cause the do_device_add is in the to_do list, does this mean it is not
finished yet?

BTW: my qemu version is 0.12.3 which I believe is the latest version.
My linux kernel is 2.6.33

Good Luck!

On Fri, Apr 16, 2010 at 9:28 PM, Daniel P. Berrange  wrote:
> On Fri, Apr 16, 2010 at 09:18:12PM +0800, chunhui zhao wrote:
>> Hi guys.
>>
>>     As the kvm does not support auto connection of USB device, unless
>> someone inputs the command "usb_add" in the monitor.  I wonder if
>> there is a better way, say, like some linux, auto mount the USB device
>> without any command input. And I know there is also qmp(qemu monitor
>> protocol) to use to communicate with the monitor. So I wanna write a
>> program which will capture the hotplug signal and communicate with the
>> monitor to auto mount the USB device.
>>
>> The problem is, the qmp does not support the command "usb_add".
>> (http://www.linux-kvm.org/page/MonitorProtocol)
>> Is there any method to solve this?
>
> usb_add/del & pci_add/del are all deprecated and not going to be ported
> to QMP. Instead use the device_add command eg in plain monitor format:
>
>  device_add usb-host,hostbus=BUS,hostaddr=ADDR,vendorid=VID,productid=PRID
>
> Or QMP format
>
>    { "execute": "device_add", "arguments": { "driver": "usb-host",
>                                              "hostbus": "BUS",
>                                              "hostaddr": "ADDR",
>                                              "vendorid": "VID",
>                                              "productid": "PRID" }}
>
> This accepts identical param names as the -device command line arg which is
> documented in docs/qdev-device-use.txt
>
> Regards,
> Daniel
> --
> |: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
> |: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
> |: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Timedrift in KVM guests after livemigration.

2010-04-19 Thread Dor Laor


On 04/19/2010 12:29 PM, Gleb Natapov wrote:

On Mon, Apr 19, 2010 at 11:21:47AM +0200, Espen Berg wrote:

Den 18.04.2010 11:56, skrev Gleb Natapov:


That's two different things here:
The issue that Espen is reporting is that the hosts have different
frequency and guests that relay on the tsc as a source clock will
notice that post migration. The is indeed a problem that -tdf does
not solve. -tdf only adds compensation for the RTC clock emulation.


It's -rtc-td-hack. -tdf does pit compensation, but since usually kernel
pit is used it does nothing.


So this "hack" will not solve our problem?


As I also stated, in the past the kvmclock MSRs were not sync upon live 
migration and it was fixed in 1a03675db146dfc760b3b48b3448075189f142cc ,

better check with the code.




If your guest uses RTC for time keeping it may help. Otherwise it does
nothing.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 02:19 PM, Peter Zijlstra wrote:



Still have two cmpxchgs in the common case.  The first iteration will
fail, fetching last_value, the second will work.

It will be better when we have contention, though, so it's worthwhile.
 

Right, another option is to put the initial read outside of the loop,
that way you'll have the best of all cases, a single LOCK'ed op in the
loop, and only a single LOCK'ed op for the fast path on sensible
architectures ;-)

 last = atomic64_read(&last_value);
 do {
   if (ret<  last)
 return last;
   last = atomic64_cmpxchg(&last_value, last, ret);
 } while (unlikely(last != ret));

Or so.

   


Yes, looks optimal when !NONSTOP_TSC.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 01:59 PM, Peter Zijlstra wrote:



So what do we need?  test for both TSC_RELIABLE and NONSTOP_TSC?  IMO
TSC_RELIABLE should imply NONSTOP_TSC.
 

Yeah, I think RELIABLE does imply NONSTOP and CONSTANT, but NONSTOP&&
CONSTANT does not make RELIABLE.
   


The manual says:


16.11.1 Invariant TSC

The time stamp counter in newer processors may support an enhancement, 
referred

to as invariant TSC. Processor’s support for invariant TSC is indicated by
CPUID.8007H:EDX[8].
The invariant TSC will run at a constant rate in all ACPI P-, C-. and 
T-states. This is
the architectural behavior moving forward. On processors with 
invariant TSC
support, the OS may use the TSC for wall clock timer services (instead 
of ACPI or
HPET timers). TSC reads are much more efficient and do not incur the 
overhead

associated with a ring transition or access to a platform resource.


and this maps to NONSTOP, so I think we're fine.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Peter Zijlstra

On Mon, 2010-04-19 at 14:13 +0300, Avi Kivity wrote:
> On 04/19/2010 01:56 PM, Peter Zijlstra wrote:
> >
> >
> >>> Right, do bear in mind that the x86 implementation of atomic64_read() is
> >>> terrifyingly expensive, it is better to not do that read and simply use
> >>> the result of the cmpxchg.
> >>>
> >>>
> >>>
> >> atomic64_read() _is_ cmpxchg64b.  Are you thinking of some clever
> >> implementation for smp i386?
> >>  
> >
> > No, what I was suggesting was to rewrite that loop no to need the
> > initial read but use the cmpxchg result of the previous iteration.
> >
> > Something like:
> >
> >u64 last = 0;
> >
> >/* more stuff */
> >
> >do {
> >  if (ret<  last)
> >return last;
> >  last = cmpxchg64(&last_value, last, ret);
> >} while (last != ret);
> >
> > That only has a single cmpxchg8 in there per loop instead of two
> > (avoiding the atomic64_read() one).
> >
> 
> Still have two cmpxchgs in the common case.  The first iteration will 
> fail, fetching last_value, the second will work.
> 
> It will be better when we have contention, though, so it's worthwhile.

Right, another option is to put the initial read outside of the loop,
that way you'll have the best of all cases, a single LOCK'ed op in the
loop, and only a single LOCK'ed op for the fast path on sensible
architectures ;-)

last = atomic64_read(&last_value);
do {
  if (ret < last)
return last;
  last = atomic64_cmpxchg(&last_value, last, ret);
} while (unlikely(last != ret));

Or so.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 01:56 PM, Peter Zijlstra wrote:


   

Right, do bear in mind that the x86 implementation of atomic64_read() is
terrifyingly expensive, it is better to not do that read and simply use
the result of the cmpxchg.


   

atomic64_read() _is_ cmpxchg64b.  Are you thinking of some clever
implementation for smp i386?
 


No, what I was suggesting was to rewrite that loop no to need the
initial read but use the cmpxchg result of the previous iteration.

Something like:

   u64 last = 0;

   /* more stuff */

   do {
 if (ret<  last)
   return last;
 last = cmpxchg64(&last_value, last, ret);
   } while (last != ret);

That only has a single cmpxchg8 in there per loop instead of two
(avoiding the atomic64_read() one).
   


Still have two cmpxchgs in the common case.  The first iteration will 
fail, fetching last_value, the second will work.


It will be better when we have contention, though, so it's worthwhile.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 02:05 PM, Peter Zijlstra wrote:



ACCESS_ONCE() is your friend.

   

I think it's implied with atomic64_read().
 

Yes it would be. I was merely trying to point out that

   last = ACCESS_ONCE(last_value);

Is a narrower way of writing:

   last = last_value;
   barrier();

In that it need not clobber all memory locations and makes it instantly
clear what we want the barrier for.
   


Oh yes, just trying to avoid a patch with both atomic64_read() and 
ACCESS_ONCE().




--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM Test: Add KVM unit test (kvmctl) v2

2010-04-19 Thread Lucas Meneghel Rodrigues

From: sshang 

The test use kvm test harness kvmctl load binary test case file
to test various functions of the kvm kernel module.

This test is for unit testing on older KVM branches, after some
consideration we decided to keep the modules for upstream and
branches that use kvmctl separated.

Changes from v1:

 * Fixed some bugs
 * Renamed the test file
 * Made code shorter

Signed-off-by: Shuxi Shang 
---
 client/tests/kvm/tests/unit_test_kvmctl.py |   28 
 client/tests/kvm/tests_base.cfg.sample |   32 
 2 files changed, 60 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/unit_test_kvmctl.py

diff --git a/client/tests/kvm/tests/unit_test_kvmctl.py 
b/client/tests/kvm/tests/unit_test_kvmctl.py
new file mode 100644
index 000..a5b88b6
--- /dev/null
+++ b/client/tests/kvm/tests/unit_test_kvmctl.py
@@ -0,0 +1,28 @@
+import os
+from autotest_lib.client.bin import utils
+from autotest_lib.client.common_lib import error
+
+
+def run_unit_test_kvmctl(test, params, env):
+"""
+This is kvm userspace unit test, use kvm test harness kvmctl load binary
+test case file to test various functions of the kvm kernel module.
+The output of all unit tests can be found in the test result dir.
+
+@param test: KVM test object.
+@param params: Dictionary with the test parameters.
+@param env: Dictionary with test environment.
+"""
+case = params.get("case")
+srcdir = params.get("srcdir", test.srcdir)
+unit_dir = os.path.join(srcdir, "kvm_userspace", "kvm", "user")
+os.chdir(unit_dir)
+
+cmd = "./kvmctl test/x86/bootstrap test/x86/%s.flat" % case
+try:
+results = utils.system_output(cmd)
+except error.CmdError, e:
+raise error.TestFail("Unit test %s failed" % case)
+
+result_file = os.path.join(test.resultsdir, case)
+utils.open_write_close(result_file, results)
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index e73ba44..9f82ffb 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -306,6 +306,38 @@ variants:
 - ksm_parallel:
 ksm_mode = "parallel"
 
+# This unit test module is for older branches of KVM that use the
+# kvmctl test harness (such as the code shipped with RHEL 5.x)
+- unit_test_kvmctl:
+type = unit_test
+vms = ''
+profilers = ''
+variants:
+- access:
+case = access
+- apic:
+case = apic
+- emulator:
+case = emulator
+- hypercall:
+case = hypercall
+- msr:
+case = msr
+- port80:
+case = port80
+- realmode:
+case = realmode
+- sieve:
+case = sieve
+- smptest:
+case = smptest
+- tsc:
+case = tsc
+- stringio:
+case = stringio
+- vmexit:
+case = vmexit
+
 - qemu_img:
 type = qemu_img
 vms = ''
-- 
1.6.6.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Peter Zijlstra

On Mon, 2010-04-19 at 13:50 +0300, Avi Kivity wrote:
> On 04/19/2010 01:39 PM, Peter Zijlstra wrote:
> > On Fri, 2010-04-16 at 13:36 -0700, Jeremy Fitzhardinge wrote:
> >
> >>> + do {
> >>> + last = last_value;
> >>>
> >>>
> >> Does this need a barrier() to prevent the compiler from re-reading
> >> last_value for the subsequent lines?  Otherwise "(ret<  last)" and
> >> "return last" could execute with different values for "last".
> 
> > ACCESS_ONCE() is your friend.
> >
> 
> I think it's implied with atomic64_read().

Yes it would be. I was merely trying to point out that

  last = ACCESS_ONCE(last_value);

Is a narrower way of writing:

  last = last_value;
  barrier();

In that it need not clobber all memory locations and makes it instantly
clear what we want the barrier for.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Peter Zijlstra

On Mon, 2010-04-19 at 13:53 +0300, Avi Kivity wrote:
> On 04/19/2010 01:49 PM, Peter Zijlstra wrote:
> >
> >> Right, so on x86 we have:
> >>
> >> X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
> >> independent, not that it doesn't stop in C states and similar fun stuff.
> >>
> >> X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant
> >> and synced between cores.
> >>  
> > Fun, we also have:
> >
> > X86_FEATURE_NONSTOP_TSC, which states the thing doesn't stop in C
> > states.
> >
> 
> All of them? I though tsc stops in some mwait deep REM sleep thing.

The idea is that TSC will not stop ever (except s2r like stuff), not
sure about the actual implementation of the NONSTOP_TSC bit though.

> So what do we need?  test for both TSC_RELIABLE and NONSTOP_TSC?  IMO 
> TSC_RELIABLE should imply NONSTOP_TSC.

Yeah, I think RELIABLE does imply NONSTOP and CONSTANT, but NONSTOP &&
CONSTANT does not make RELIABLE.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Peter Zijlstra

On Mon, 2010-04-19 at 13:47 +0300, Avi Kivity wrote:
> On 04/19/2010 01:43 PM, Peter Zijlstra wrote:
> >
> >>>
>  +
> cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
> {
> struct pvclock_shadow_time shadow;
> unsigned version;
> cycle_t ret, offset;
>  +u64 last;
> 
> 
>  +do {
>  +last = last_value;
>   
> >>> Otherwise, this assignment can see a partial update.
> >>>
> >> On a 32-bit guest, that is.
> >>  
> > Right, do bear in mind that the x86 implementation of atomic64_read() is
> > terrifyingly expensive, it is better to not do that read and simply use
> > the result of the cmpxchg.
> >
> >
> 
> atomic64_read() _is_ cmpxchg64b.  Are you thinking of some clever 
> implementation for smp i386?


No, what I was suggesting was to rewrite that loop no to need the
initial read but use the cmpxchg result of the previous iteration.

Something like:

  u64 last = 0;

  /* more stuff */
 
  do {
if (ret < last)
  return last;
last = cmpxchg64(&last_value, last, ret);
  } while (last != ret);

That only has a single cmpxchg8 in there per loop instead of two
(avoiding the atomic64_read() one).

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Make Performance Counters work

2010-04-19 Thread Alexander Graf


On 19.04.2010, at 12:34, Avi Kivity wrote:

> On 04/17/2010 01:22 AM, Alexander Graf wrote:
>> When we get a performance counter interrupt we need to route it on to the
>> Linux handler after we got out of the guest context. We also need to tell
>> our handling code that this particular interrupt doesn't need treatment.
>> 
>> So let's add those two bits in, making perf work while having a KVM guest
>> running.
>> 
>>   
> 
> Doesn't apply - does it depend on another patchset?

This patch is deprecated by the "Post-PPC32" patchset. I only noted that in the 
cover letter of the Post-PPC32 patchset, not in a reply to this patch. Sorry.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 01:51 PM, Peter Zijlstra wrote:


   

Right, so on x86 we have:

X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
independent, not that it doesn't stop in C states and similar fun stuff.

X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant
and synced between cores.


   

Sockets and boards too?  (IOW, how reliable is TSC_RELIABLE)?
 

Not sure, IIRC we clear that when the TSC sync test fails, eg when we
mark the tsc clocksource unusable.
   


Worrying.  By the time we detect this the guest may already have gotten 
confused by clocks going backwards.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 01:49 PM, Peter Zijlstra wrote:



Right, so on x86 we have:

X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
independent, not that it doesn't stop in C states and similar fun stuff.

X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant
and synced between cores.
 

Fun, we also have:

X86_FEATURE_NONSTOP_TSC, which states the thing doesn't stop in C
states.
   


All of them? I though tsc stops in some mwait deep REM sleep thing.

So what do we need?  test for both TSC_RELIABLE and NONSTOP_TSC?  IMO 
TSC_RELIABLE should imply NONSTOP_TSC.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Peter Zijlstra

On Mon, 2010-04-19 at 13:49 +0300, Avi Kivity wrote:
> On 04/19/2010 01:46 PM, Peter Zijlstra wrote:
> > On Sat, 2010-04-17 at 21:48 +0300, Avi Kivity wrote:
> >
> >>> After this patch is applied, I don't see a single warp in time during 5 
> >>> days
> >>> of execution, in any of the machines I saw them before.
> >>>
> >>>
> >>>
> >> Please define a cpuid bit that makes this optional.  When we eventually
> >> enable it in the future, it will allow a wider range of guests to enjoy it.
> >>  
> > Right, so on x86 we have:
> >
> > X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
> > independent, not that it doesn't stop in C states and similar fun stuff.
> >
> > X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant
> > and synced between cores.
> >
> >
> 
> Sockets and boards too?  (IOW, how reliable is TSC_RELIABLE)?

Not sure, IIRC we clear that when the TSC sync test fails, eg when we
mark the tsc clocksource unusable.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 01:39 PM, Peter Zijlstra wrote:

On Fri, 2010-04-16 at 13:36 -0700, Jeremy Fitzhardinge wrote:
   

+ do {
+ last = last_value;

   

Does this need a barrier() to prevent the compiler from re-reading
last_value for the subsequent lines?  Otherwise "(ret<  last)" and
"return last" could execute with different values for "last".

 

ACCESS_ONCE() is your friend.
   


I think it's implied with atomic64_read().

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Peter Zijlstra

On Mon, 2010-04-19 at 12:46 +0200, Peter Zijlstra wrote:
> On Sat, 2010-04-17 at 21:48 +0300, Avi Kivity wrote:
> > > After this patch is applied, I don't see a single warp in time during 5 
> > > days
> > > of execution, in any of the machines I saw them before.
> > >
> > >
> > 
> > Please define a cpuid bit that makes this optional.  When we eventually 
> > enable it in the future, it will allow a wider range of guests to enjoy it.
> 
> Right, so on x86 we have:
> 
> X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
> independent, not that it doesn't stop in C states and similar fun stuff.
> 
> X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant
> and synced between cores.

Fun, we also have:

X86_FEATURE_NONSTOP_TSC, which states the thing doesn't stop in C
states.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 01:46 PM, Peter Zijlstra wrote:

On Sat, 2010-04-17 at 21:48 +0300, Avi Kivity wrote:
   

After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.


   

Please define a cpuid bit that makes this optional.  When we eventually
enable it in the future, it will allow a wider range of guests to enjoy it.
 

Right, so on x86 we have:

X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
independent, not that it doesn't stop in C states and similar fun stuff.

X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant
and synced between cores.

   


Sockets and boards too?  (IOW, how reliable is TSC_RELIABLE)?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity


On 04/19/2010 01:43 PM, Peter Zijlstra wrote:





+
   cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
   {
   struct pvclock_shadow_time shadow;
   unsigned version;
   cycle_t ret, offset;
+u64 last;


+do {
+last = last_value;
 

Otherwise, this assignment can see a partial update.
   

On a 32-bit guest, that is.
 

Right, do bear in mind that the x86 implementation of atomic64_read() is
terrifyingly expensive, it is better to not do that read and simply use
the result of the cmpxchg.

   


atomic64_read() _is_ cmpxchg64b.  Are you thinking of some clever 
implementation for smp i386?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Peter Zijlstra

On Sat, 2010-04-17 at 21:48 +0300, Avi Kivity wrote:
> > After this patch is applied, I don't see a single warp in time during 5 days
> > of execution, in any of the machines I saw them before.
> >
> >
> 
> Please define a cpuid bit that makes this optional.  When we eventually 
> enable it in the future, it will allow a wider range of guests to enjoy it.

Right, so on x86 we have:

X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
independent, not that it doesn't stop in C states and similar fun stuff.

X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant
and synced between cores.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Peter Zijlstra

On Sat, 2010-04-17 at 21:49 +0300, Avi Kivity wrote:
> On 04/17/2010 09:48 PM, Avi Kivity wrote:
> >
> >>
> >> +static u64 last_value = 0;
> >
> > Needs to be atomic64_t.
> >
> >> +
> >>   cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
> >>   {
> >>   struct pvclock_shadow_time shadow;
> >>   unsigned version;
> >>   cycle_t ret, offset;
> >> +u64 last;
> >>
> >>
> >> +do {
> >> +last = last_value;
> >
> > Otherwise, this assignment can see a partial update.
> 
> On a 32-bit guest, that is.

Right, do bear in mind that the x86 implementation of atomic64_read() is
terrifyingly expensive, it is better to not do that read and simply use
the result of the cmpxchg.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Peter Zijlstra

On Fri, 2010-04-16 at 13:36 -0700, Jeremy Fitzhardinge wrote:
> > + do {
> > + last = last_value;
> >   
> Does this need a barrier() to prevent the compiler from re-reading
> last_value for the subsequent lines?  Otherwise "(ret < last)" and
> "return last" could execute with different values for "last".
> 
ACCESS_ONCE() is your friend.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Make Performance Counters work

2010-04-19 Thread Avi Kivity


On 04/17/2010 01:22 AM, Alexander Graf wrote:

When we get a performance counter interrupt we need to route it on to the
Linux handler after we got out of the guest context. We also need to tell
our handling code that this particular interrupt doesn't need treatment.

So let's add those two bits in, making perf work while having a KVM guest
running.

   


Doesn't apply - does it depend on another patchset?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-19 Thread Michael S. Tsirkin

On Mon, Apr 19, 2010 at 06:05:17PM +0800, Xin, Xiaohui wrote:
> > Michael,
> > >>> The idea is simple, just to pin the guest VM user space and then
> > >>> let host NIC driver has the chance to directly DMA to it. 
> > >>> The patches are based on vhost-net backend driver. We add a device
> > >>> which provides proto_ops as sendmsg/recvmsg to vhost-net to
> > >>> send/recv directly to/from the NIC driver. KVM guest who use the
> > >>> vhost-net backend may bind any ethX interface in the host side to
> > >>> get copyless data transfer thru guest virtio-net frontend.
> > >>> 
> > >>> The scenario is like this:
> > >>> 
> > >>> The guest virtio-net driver submits multiple requests thru vhost-net
> > >>> backend driver to the kernel. And the requests are queued and then
> > >>> completed after corresponding actions in h/w are done.
> > >>> 
> > >>> For read, user space buffers are dispensed to NIC driver for rx when
> > >>> a page constructor API is invoked. Means NICs can allocate user buffers
> > >>> from a page constructor. We add a hook in netif_receive_skb() function
> > >>> to intercept the incoming packets, and notify the zero-copy device.
> > >>> 
> > >>> For write, the zero-copy deivce may allocates a new host skb and puts
> > >>> payload on the skb_shinfo(skb)->frags, and copied the header to 
> > >>> skb->data.
> > >>> The request remains pending until the skb is transmitted by h/w.
> > >>> 
> > >>> Here, we have ever considered 2 ways to utilize the page constructor
> > >>> API to dispense the user buffers.
> > >>> 
> > >>> One:Modify __alloc_skb() function a bit, it can only allocate a 
> > >>> structure of sk_buff, and the data pointer is pointing to a 
> > >>> user buffer which is coming from a page constructor API.
> > >>> Then the shinfo of the skb is also from guest.
> > >>> When packet is received from hardware, the skb->data is filled
> > >>> directly by h/w. What we have done is in this way.
> > >>> 
> > >>> Pros:   We can avoid any copy here.
> > >>> Cons:   Guest virtio-net driver needs to allocate skb as almost
> > >>> the same method with the host NIC drivers, say the size
> > >>> of netdev_alloc_skb() and the same reserved space in the
> > >>> head of skb. Many NIC drivers are the same with guest 
> > >>> and
> > >>> ok for this. But some lastest NIC drivers reserves 
> > >>> special
> > >>> room in skb head. To deal with it, we suggest to provide
> > >>> a method in guest virtio-net driver to ask for parameter
> > >>> we interest from the NIC driver when we know which 
> > >>> device 
> > >>> we have bind to do zero-copy. Then we ask guest to do 
> > >>> so.
> > >>> Is that reasonable?
> > >>Unfortunately, this would break compatibility with existing virtio.
> > >>This also complicates migration.  
> >> You mean any modification to the guest virtio-net driver will break the
> >> compatibility? We tried to enlarge the virtio_net_config to contains the
> >> 2 parameter, and add one VIRTIO_NET_F_PASSTHRU flag, virtionet_probe()
> >> will check the feature flag, and get the parameters, then virtio-net 
> >> driver use
> >> it to allocate buffers. How about this?
> 
> >This means that we can't, for example, live-migrate between different systems
> >without flushing outstanding buffers.
> 
> Ok. What we have thought about now is to do something with skb_reserve().
> If the device is binded by mp, then skb_reserve() will do nothing with it.
> 
> > >>What is the room in skb head used for?
> > >I'm not sure, but the latest ixgbe driver does this, it reserves 32 bytes 
> > >compared to
> >> NET_IP_ALIGN.
> 
> >Looking at code, this seems to do with alignment - could just be
> >a performance optimization.
> 
> > >>> Two:Modify driver to get user buffer allocated from a page 
> > >>> constructor
> > >>> API(to substitute alloc_page()), the user buffer are used as 
> > >>> payload
> > >>> buffers and filled by h/w directly when packet is received. 
> > >>> Driver
> > >>> should associate the pages with skb (skb_shinfo(skb)->frags). 
> > >>> For 
> > >>> the head buffer side, let host allocates skb, and h/w fills it. 
> > >>> After that, the data filled in host skb header will be copied 
> > >>> into
> > >>> guest header buffer which is submitted together with the 
> > >>> payload buffer.
> > >>> 
> > >>> Pros:   We could less care the way how guest or host allocates 
> > >>> their
> > >>> buffers.
> > >>> Cons:   We still need a bit copy here for the skb header.
> > >>> 
> > >>> We are not sure which way is the better here. 
> > >>The obvious question would be whether you see any speed difference
> > >>with the two approaches. If no, then the second approach would be
> > >>better.
> > 
> >> I remember the second a

Re: [PATCH] KVM-Test: Add KVM unit test (kvmctl)

2010-04-19 Thread Jan Kiszka

Lucas Meneghel Rodrigues wrote:
> On Mon, 2010-04-19 at 10:50 +0200, Jan Kiszka wrote:
>> Lucas Meneghel Rodrigues wrote:
>>> From: sshang 
>>>
>>> The test use kvm test harness kvmctl load binary test case file
>>> to test various functions of the kvm kernel module.
>> I thought that tool is deprecated (or even broken), and I was about to
>> suggest dropping it from the tree in favor of a simple script like this:
>>
>> $QEMU -chardev stdio,id=cons -device testdev,chardev=cons -vga none \
>>  -vnc none -kernel $TESTCASE
> 
> The purpose of the this specific test module is to do the unit testing
> in older branches of the code (such as RHEL 5.x branch). For upstream we
> have another module that will use the -kernel option being worked. I
> plan on calling it unit_test_kernel.

Ah, see. Thanks for the clarification.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM-Test: Add KVM unit test (kvmctl)

2010-04-19 Thread Lucas Meneghel Rodrigues

On Mon, 2010-04-19 at 10:50 +0200, Jan Kiszka wrote:
> Lucas Meneghel Rodrigues wrote:
> > From: sshang 
> > 
> > The test use kvm test harness kvmctl load binary test case file
> > to test various functions of the kvm kernel module.
> 
> I thought that tool is deprecated (or even broken), and I was about to
> suggest dropping it from the tree in favor of a simple script like this:
> 
> $QEMU -chardev stdio,id=cons -device testdev,chardev=cons -vga none \
>   -vnc none -kernel $TESTCASE

The purpose of the this specific test module is to do the unit testing
in older branches of the code (such as RHEL 5.x branch). For upstream we
have another module that will use the -kernel option being worked. I
plan on calling it unit_test_kernel.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] kvm: dereference srcu-protected pointer without srcu_read_lock() held

2010-04-19 Thread Avi Kivity


On 04/19/2010 12:58 PM, Lai Jiangshan wrote:

Applied the patch I just sent and let CONFIG_PROVE_RCU=y,
we can got the following dmesg. And we found that it is
because some codes in KVM dereferences srcu-protected pointer without
srcu_read_lock() held or update-side lock held.

It is not hard to fix, the problem is that:
Where is the most proper place to put a srcu_read_lock()?

I can not determine the answer, so I report this bug
instead of fixing it.

   


I think the else branch in complete_pio() should work.  Marcelo?

Longer term I'd like to see the lock taken at the high levels (ioctls, 
in virt/kvm) and dropped only for guest entry and when we explicitly 
sleep (hlt emulation).


Note: complete_pio() is gone in the current code.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-19 Thread Xin, Xiaohui

> Michael,
> >>> The idea is simple, just to pin the guest VM user space and then
> >>> let host NIC driver has the chance to directly DMA to it. 
> >>> The patches are based on vhost-net backend driver. We add a device
> >>> which provides proto_ops as sendmsg/recvmsg to vhost-net to
> >>> send/recv directly to/from the NIC driver. KVM guest who use the
> >>> vhost-net backend may bind any ethX interface in the host side to
> >>> get copyless data transfer thru guest virtio-net frontend.
> >>> 
> >>> The scenario is like this:
> >>> 
> >>> The guest virtio-net driver submits multiple requests thru vhost-net
> >>> backend driver to the kernel. And the requests are queued and then
> >>> completed after corresponding actions in h/w are done.
> >>> 
> >>> For read, user space buffers are dispensed to NIC driver for rx when
> >>> a page constructor API is invoked. Means NICs can allocate user buffers
> >>> from a page constructor. We add a hook in netif_receive_skb() function
> >>> to intercept the incoming packets, and notify the zero-copy device.
> >>> 
> >>> For write, the zero-copy deivce may allocates a new host skb and puts
> >>> payload on the skb_shinfo(skb)->frags, and copied the header to skb->data.
> >>> The request remains pending until the skb is transmitted by h/w.
> >>> 
> >>> Here, we have ever considered 2 ways to utilize the page constructor
> >>> API to dispense the user buffers.
> >>> 
> >>> One:  Modify __alloc_skb() function a bit, it can only allocate a 
> >>>   structure of sk_buff, and the data pointer is pointing to a 
> >>>   user buffer which is coming from a page constructor API.
> >>>   Then the shinfo of the skb is also from guest.
> >>>   When packet is received from hardware, the skb->data is filled
> >>>   directly by h/w. What we have done is in this way.
> >>> 
> >>>   Pros:   We can avoid any copy here.
> >>>   Cons:   Guest virtio-net driver needs to allocate skb as almost
> >>>   the same method with the host NIC drivers, say the size
> >>>   of netdev_alloc_skb() and the same reserved space in the
> >>>   head of skb. Many NIC drivers are the same with guest and
> >>>   ok for this. But some lastest NIC drivers reserves special
> >>>   room in skb head. To deal with it, we suggest to provide
> >>>   a method in guest virtio-net driver to ask for parameter
> >>>   we interest from the NIC driver when we know which device 
> >>>   we have bind to do zero-copy. Then we ask guest to do so.
> >>>   Is that reasonable?
> >>Unfortunately, this would break compatibility with existing virtio.
> >>This also complicates migration.  
>> You mean any modification to the guest virtio-net driver will break the
>> compatibility? We tried to enlarge the virtio_net_config to contains the
>> 2 parameter, and add one VIRTIO_NET_F_PASSTHRU flag, virtionet_probe()
>> will check the feature flag, and get the parameters, then virtio-net driver 
>> use
>> it to allocate buffers. How about this?

>This means that we can't, for example, live-migrate between different systems
>without flushing outstanding buffers.

Ok. What we have thought about now is to do something with skb_reserve().
If the device is binded by mp, then skb_reserve() will do nothing with it.

> >>What is the room in skb head used for?
> >I'm not sure, but the latest ixgbe driver does this, it reserves 32 bytes 
> >compared to
>> NET_IP_ALIGN.

>Looking at code, this seems to do with alignment - could just be
>a performance optimization.

> >>> Two:  Modify driver to get user buffer allocated from a page 
> >>> constructor
> >>>   API(to substitute alloc_page()), the user buffer are used as payload
> >>>   buffers and filled by h/w directly when packet is received. Driver
> >>>   should associate the pages with skb (skb_shinfo(skb)->frags). For 
> >>>   the head buffer side, let host allocates skb, and h/w fills it. 
> >>>   After that, the data filled in host skb header will be copied into
> >>>   guest header buffer which is submitted together with the payload buffer.
> >>> 
> >>>   Pros:   We could less care the way how guest or host allocates their
> >>>   buffers.
> >>>   Cons:   We still need a bit copy here for the skb header.
> >>> 
> >>> We are not sure which way is the better here. 
> >>The obvious question would be whether you see any speed difference
> >>with the two approaches. If no, then the second approach would be
> >>better.
> 
>> I remember the second approach is a bit slower in 1500MTU. 
>> But we did not tested too much.

>Well, that's an important datapoint. By the way, you'll need
>header copy to activate LRO in host, so that's a good
>reason to go with option 2 as well.


> >>> This is the first thing we want
> >>> to get comments from the community. We wish the modification to the 
> >>> network
> >>> part will be generic which not used by vhost-net backend only, but a user
> >>> application may use it as well when the

Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Avi Kivity


On 04/19/2010 12:22 PM, Zhang, Yanmin wrote:



I don't want to merge tip/master... does tip/perf/core contain the
needed updates?
 

I think so. A moment ago, I checked out to b5a80b7e9 of tip/perf/core. All 3
patches could be applied cleanly and compilation is ok. A quick testing shows
tip/perf/core kernel does work.

   


Thanks.  I applied all three patches to the 'perf' branch in kvm.git and 
merged it to master.


Ingo, please pull

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git perf

into tip's perf/core to receive those three patches.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[BUG] kvm: dereference srcu-protected pointer without srcu_read_lock() held

2010-04-19 Thread Lai Jiangshan

Applied the patch I just sent and let CONFIG_PROVE_RCU=y,
we can got the following dmesg. And we found that it is
because some codes in KVM dereferences srcu-protected pointer without
srcu_read_lock() held or update-side lock held.

It is not hard to fix, the problem is that:
Where is the most proper place to put a srcu_read_lock()?

I can not determine the answer, so I report this bug
instead of fixing it.

Thanks.
Lai.

Reported-by: Lai Jiangshan 


===
[ INFO: suspicious rcu_dereference_check() usage. ]
---
arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/3027:
 #0:  (&vcpu->mutex){+.+.+.}, at: [] vcpu_load+0x1a/0x66 [kvm]

stack backtrace:
Pid: 3027, comm: qemu-system-x86 Not tainted 
2.6.34-rc4-tip-01028-g939eab1-dirty #28
Call Trace:
 [] lockdep_rcu_dereference+0xaa/0xb3
 [] unalias_gfn_instantiation+0x56/0xaf [kvm]
 [] gfn_to_hva+0x14/0x4c [kvm]
 [] kvm_write_guest_page+0x2a/0x7f [kvm]
 [] kvm_write_guest+0x41/0x83 [kvm]
 [] kvm_write_guest_virt+0x78/0xa1 [kvm]
 [] pio_copy_data+0x46/0x75 [kvm]
 [] ? sub_preempt_count+0x9/0x83
 [] complete_pio+0x91/0x1b9 [kvm]
 [] kvm_arch_vcpu_ioctl_run+0x93/0xd2b [kvm]
 [] ? kvm_arch_vcpu_ioctl_run+0x8e5/0xd2b [kvm]
 [] ? __lock_acquire+0x7b4/0x16d5
 [] kvm_vcpu_ioctl+0x103/0x97b [kvm]
 [] ? kvm_vm_ioctl+0x364/0x38d [kvm]
 [] ? fget_light+0xf1/0x241
 [] vfs_ioctl+0x32/0xa6
 [] do_vfs_ioctl+0x495/0x4db
 [] ? fget_light+0x231/0x241
 [] ? fget_light+0xf1/0x241
 [] sys_ioctl+0x47/0x6a
 [] system_call_fastpath+0x16/0x1b
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm: use the correct RCU API

2010-04-19 Thread Avi Kivity


On 04/19/2010 12:41 PM, Lai Jiangshan wrote:

The RCU/SRCU API have already changed for proving RCU usage.

I got the following dmesg when PROVE_RCU=y because we used incorrect API.
This patch coverts rcu_deference() to srcu_dereference() or family API.

===
[ INFO: suspicious rcu_dereference_check() usage. ]
---
arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/8550:
  #0:  (&kvm->slots_lock){+.+.+.}, at: [] 
kvm_set_memory_region+0x29/0x50 [kvm]
  #1:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] 
kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

stack backtrace:
Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
Call Trace:
  [] lockdep_rcu_dereference+0xaa/0xb3
  [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
  [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
  [] __kvm_set_memory_region+0x636/0x6e2 [kvm]
  [] kvm_set_memory_region+0x37/0x50 [kvm]
  [] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
  [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
  [] ? unlock_page+0x27/0x2c
  [] ? __do_fault+0x3a9/0x3e1
  [] kvm_vm_ioctl+0x364/0x38d [kvm]
  [] ? up_read+0x23/0x3d
  [] vfs_ioctl+0x32/0xa6
  [] do_vfs_ioctl+0x495/0x4db
  [] ? fget_light+0xc2/0x241
  [] ? do_sys_open+0x104/0x116
  [] ? retint_swapgs+0xe/0x13
  [] sys_ioctl+0x47/0x6a
  [] system_call_fastpath+0x16/0x1b



+static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
+{
+   return rcu_dereference_check(kvm->memslots,
+   srcu_read_lock_held(&kvm->srcu)
+   || lockdep_is_held(&kvm->slots_lock));
+}
+
   



This open-codes srcu_dereference().  I guess we need an 
srcu_dereference_check().  Paul?


btw, perhaps it is possible not to call rcu_dereference from the write 
paths.



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm: use the correct RCU API

2010-04-19 Thread Lai Jiangshan

The RCU/SRCU API have already changed for proving RCU usage.

I got the following dmesg when PROVE_RCU=y because we used incorrect API.
This patch coverts rcu_deference() to srcu_dereference() or family API.

===
[ INFO: suspicious rcu_dereference_check() usage. ]
---
arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/8550:
 #0:  (&kvm->slots_lock){+.+.+.}, at: [] 
kvm_set_memory_region+0x29/0x50 [kvm]
 #1:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] 
kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

stack backtrace:
Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
Call Trace:
 [] lockdep_rcu_dereference+0xaa/0xb3
 [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
 [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
 [] __kvm_set_memory_region+0x636/0x6e2 [kvm]
 [] kvm_set_memory_region+0x37/0x50 [kvm]
 [] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
 [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
 [] ? unlock_page+0x27/0x2c
 [] ? __do_fault+0x3a9/0x3e1
 [] kvm_vm_ioctl+0x364/0x38d [kvm]
 [] ? up_read+0x23/0x3d
 [] vfs_ioctl+0x32/0xa6
 [] do_vfs_ioctl+0x495/0x4db
 [] ? fget_light+0xc2/0x241
 [] ? do_sys_open+0x104/0x116
 [] ? retint_swapgs+0xe/0x13
 [] sys_ioctl+0x47/0x6a
 [] system_call_fastpath+0x16/0x1b

Signed-off-by: Lai Jiangshan 
---
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 73c5c2b..52b8ece 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1379,7 +1379,7 @@ static void kvm_release_vm_pages(struct kvm *kvm)
int i, j;
unsigned long base_gfn;
 
-   slots = rcu_dereference(kvm->memslots);
+   slots = kvm_memslots(kvm);
for (i = 0; i < slots->nmemslots; i++) {
memslot = &slots->memslots[i];
base_gfn = memslot->base_gfn;
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 60f09ab..cfa9d17 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -72,7 +72,7 @@ static inline void kvm_s390_vcpu_set_mem(struct kvm_vcpu 
*vcpu)
struct kvm_memslots *memslots;
 
idx = srcu_read_lock(&vcpu->kvm->srcu);
-   memslots = rcu_dereference(vcpu->kvm->memslots);
+   memslots = kvm_memslots(vcpu->kvm);
 
mem = &memslots->memslots[0];
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 71faa04..bb61881 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -792,7 +792,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long 
hva,
int retval = 0;
struct kvm_memslots *slots;
 
-   slots = rcu_dereference(kvm->memslots);
+   slots = kvm_memslots(kvm);
 
for (i = 0; i < slots->nmemslots; i++) {
struct kvm_memory_slot *memslot = &slots->memslots[i];
@@ -3017,7 +3017,8 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
unsigned int  nr_pages = 0;
struct kvm_memslots *slots;
 
-   slots = rcu_dereference(kvm->memslots);
+   slots = kvm_memslots(kvm);
+
for (i = 0; i < slots->nmemslots; i++)
nr_pages += slots->memslots[i].npages;
 
@@ -3292,7 +3293,7 @@ static int count_rmaps(struct kvm_vcpu *vcpu)
int i, j, k, idx;
 
idx = srcu_read_lock(&kvm->srcu);
-   slots = rcu_dereference(kvm->memslots);
+   slots = kvm_memslots(kvm);
for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
struct kvm_memory_slot *m = &slots->memslots[i];
struct kvm_rmap_desc *d;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 077cac5..725e7b6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1514,7 +1514,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
struct kvm_memslots *slots;
gfn_t base_gfn;
 
-   slots = rcu_dereference(kvm->memslots);
+   slots = kvm_memslots(kvm);
base_gfn = kvm->memslots->memslots[0].base_gfn +
 kvm->memslots->memslots[0].npages - 3;
return base_gfn << PAGE_SHIFT;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6120e33..4dcd62c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2408,7 +2408,7 @@ gfn_t unalias_gfn_instantiation(struct kvm *kvm, gfn_t 
gfn)
struct kvm_mem_alias *alias;
struct kvm_mem_aliases *aliases;
 
-   aliases = rcu_dereference(kvm->arch.aliases);
+   aliases = kvm_aliases(kvm);
 
for (i = 0; i < aliases->naliases; ++i) {
alias = &aliases->aliases[i];
@@ -2427,7 +2427,7 @@ gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
struct kvm_mem_alias *alias;
struct kvm_mem_aliases *aliases;
 
-   aliases = rcu_dereference(kvm->arch.aliases);
+   aliases = kvm_aliases(kvm);
 
for (i = 0; i < al

Re: Timedrift in KVM guests after livemigration.

2010-04-19 Thread Gleb Natapov

On Mon, Apr 19, 2010 at 11:21:47AM +0200, Espen Berg wrote:
> Den 18.04.2010 11:56, skrev Gleb Natapov:
> 
> >>That's two different things here:
> >>The issue that Espen is reporting is that the hosts have different
> >>frequency and guests that relay on the tsc as a source clock will
> >>notice that post migration. The is indeed a problem that -tdf does
> >>not solve. -tdf only adds compensation for the RTC clock emulation.
> >>
> >It's -rtc-td-hack. -tdf does pit compensation, but since usually kernel
> >pit is used it does nothing.
> 
> So this "hack" will not solve our problem?
> 
If your guest uses RTC for time keeping it may help. Otherwise it does
nothing.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] correctly handle VM Entry Exit reasons and also show them in trace.

2010-04-19 Thread Avi Kivity


On 04/18/2010 09:35 AM, Manish Regmi wrote:

Hi,
  When the vm exit reason is VM Entry failures it has leftmost bit set.
This patch
  - clears the leftmost bit when copying to vmx->exit_reason. This will
make the checks like if ((vmx->exit_reason ==
EXIT_REASON_MCE_DURING_VMENTRY) valid in vmx_complete_interrupts.
  - adds two more EXIT_REASONS 33 and 34 in vmx.h
  - also adds them to exit reason strings.

Please let me know if there is anything missing or wrong. Thank you.

   


The patch is wordwrapped.  Please use git send-email or equivalent.


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7e2f8d5..e93be6f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3641,7 +3641,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)

exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);

-   vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+   vmx->exit_reason = vmcs_read32(VM_EXIT_REASON)&
~VMX_EXIT_REASONS_FAILED_VMENTRY;
   


(here for example)



/* Handle machine checks before interrupts are enabled */
if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
@@ -4057,6 +4057,8 @@ static const struct trace_print_flags
vmx_exit_reasons_str[] = {
_ER(IO_INSTRUCTION),
_ER(MSR_READ),
_ER(MSR_WRITE),
+   _ER(INVALID_GUEST_STATE),
+   _ER(MSR_LOADING),
_ER(MWAIT_INSTRUCTION),
_ER(MONITOR_INSTRUCTION),
_ER(PAUSE_INSTRUCTION),
   


Please split the new exit codes into a separate patch.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Zhang, Yanmin

On Mon, 2010-04-19 at 11:59 +0300, Avi Kivity wrote:
> On 04/19/2010 11:55 AM, Zhang, Yanmin wrote:
> > On Mon, 2010-04-19 at 11:37 +0300, Avi Kivity wrote:
> >
> >> On 04/19/2010 08:32 AM, Zhang, Yanmin wrote:
> >>  
> >>> Below patch introduces perf_guest_info_callbacks and related 
> >>> register/unregister
> >>> functions. Add more PERF_RECORD_MISC_XXX bits meaning guest kernel and 
> >>> guest user
> >>> space.
> >>>
> >>>
> >> This doesn't apply against upstream.
> >>  
> > What does upstream mean here? The vanilla 2.6.34-rc4?
> >
> 
> Yes, sorry for being unclear.
> 
> >>What branch was this generated
> >> against?
> >>
> >>  
> > It's against the latest tip/master. I checked out to 19b26586090 as the 
> > latest
> > tip/master has some updates on perf.
> >
> 
> I don't want to merge tip/master... does tip/perf/core contain the 
> needed updates?
I think so. A moment ago, I checked out to b5a80b7e9 of tip/perf/core. All 3
patches could be applied cleanly and compilation is ok. A quick testing shows
tip/perf/core kernel does work.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Timedrift in KVM guests after livemigration.

2010-04-19 Thread Espen Berg


Den 18.04.2010 11:56, skrev Gleb Natapov:


That's two different things here:
The issue that Espen is reporting is that the hosts have different
frequency and guests that relay on the tsc as a source clock will
notice that post migration. The is indeed a problem that -tdf does
not solve. -tdf only adds compensation for the RTC clock emulation.


It's -rtc-td-hack. -tdf does pit compensation, but since usually kernel
pit is used it does nothing.


So this "hack" will not solve our problem?

Espen


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/1] trace all instructions whose emulation failed

2010-04-19 Thread Avi Kivity


On 04/18/2010 09:33 AM, Manish Regmi wrote:

Hi,
   The following patch makes sure all code path of failed emulation
runs trace_kvm_emulate_insn_failed().
Please let me know if there is anything missing or wrong.
Thank you.

Signed-off-by: Manish Regmi

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b6e7535..fd1e875 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3784,36 +3784,35 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
c =&vcpu->arch.emulate_ctxt.decode;
if (emulation_type&  EMULTYPE_TRAP_UD) {
if (!c->twobyte)
-   return EMULATE_FAIL;
+   goto emulate_failed;
switch (c->b) {
case 0x01: /* VMMCALL */
if (c->modrm_mod != 3 || c->modrm_rm != 1)
-   return EMULATE_FAIL;
+   goto emulate_failed;
break;
case 0x34: /* sysenter */
case 0x35: /* sysexit */
if (c->modrm_mod != 0 || c->modrm_rm != 0)
-   return EMULATE_FAIL;
+   goto emulate_failed;
break;
case 0x05: /* syscall */
if (c->modrm_mod != 0 || c->modrm_rm != 0)
-   return EMULATE_FAIL;
+   goto emulate_failed;;
break;
default:
-   return EMULATE_FAIL;
+   goto emulate_failed;
}

if (!(c->modrm_reg == 0 || c->modrm_reg == 3))
-   return EMULATE_FAIL;
+   goto emulate_failed;
}

++vcpu->stat.insn_emulation;
if (r)  {
++vcpu->stat.insn_emulation_fail;
-   trace_kvm_emulate_insn_failed(vcpu);
if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
return EMULATE_DONE;
-   return EMULATE_FAIL;
+   goto emulate_failed;
}
}

   


It's better not to trace #UD triggered emulations, since we except these 
to fail, for example if the guest executes the UD2 instruction.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 107 matches

Mail list logo