Re: [PATCH][KVM-AUTOTEST] Check exit status of custom install script and fail if script failed.

2009-05-28 Thread Lucas Meneghel Rodrigues
On Sun, 2009-05-24 at 17:48 +0300, Avi Kivity wrote:
 Mike Burns wrote:
  Signed-off-by: Mike Burns mbu...@redhat.com
  ---
   client/tests/kvm_runtest_2/kvm_install.py |7 ++-
   1 files changed, 6 insertions(+), 1 deletions(-)
 
  diff --git a/client/tests/kvm_runtest_2/kvm_install.py 
  b/client/tests/kvm_runtest_2/kvm_install.py
  index ebd8b7d..392ef0c 100755
  --- a/client/tests/kvm_runtest_2/kvm_install.py
  +++ b/client/tests/kvm_runtest_2/kvm_install.py
  @@ -90,7 +90,12 @@ def run_kvm_install(test, params, env):
 	kvm_log.info("Adding KVM_INSTALL_%s to Environment" % (k))
 	os.putenv("KVM_INSTALL_%s" % (k), str(params[k]))
 	kvm_log.info("Running " + script + " to install kvm")
-	os.system("cd %s; %s" % (test.bindir, script))
+	install_result = os.system("cd %s; %s" % (test.bindir, script))
+	if os.WEXITSTATUS(install_result) != 0:
+		message = "Custom Script encountered an error"
+		kvm_log.error(message)
+		raise error.TestError, message
+

 
 How about a helper that does os.system()  (or rather, 
 commands.getstatusoutput()) and throws an exception on failure?  I 
 imagine it could be used in many places.

utils.system() does that. If we have exit code != 0, it throws an
error.CmdError exception.

-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies



Fw: [KVM] soft lockup with RHEL 5.3 guest remote migration

2009-05-28 Thread Pradeep K Surisetty

Find the complete dmesg for the RHEL 5.3 guest:

http://pastebin.com/f7e22fd1a

Regards
Pradeep


----- Forwarded by Pradeep K Surisetty/India/IBM on 05/28/2009 12:13 PM -----

From:    Pradeep K Surisetty/India/IBM
To:      kvm@vger.kernel.org
Date:    05/27/2009 10:07 AM
Cc:      Pavan Naregundi/India/i...@ibmin, Sachin P Sant/India/i...@ibmin
Subject: [KVM] soft lockup with RHEL 5.3 guest remote migration


Tried to migrate a RHEL 5.3 guest to a remote machine. The migration fails
with the soft lockup message below on the guest.

I haven't faced this issue with qemu-kvm-10.1. Remote migration fails with
qemu-kvm-0.10.4.


=
BUG: soft lockup - CPU#0 stuck for 10s! [init:1]

Pid: 1, comm: init
EIP: 0060:[c044d1e9] CPU: 0
EIP is at handle_IRQ_event+0x39/0x8c
 EFLAGS: 0246   Not tainted  (2.6.18-125.el5 #1)
EAX: 000c EBX: c06e7480 ECX: c79a8da0 EDX: c0734fb4
ESI: c79a8da0 EDI: 000c EBP:  DS: 007b ES: 007b
CR0: 8005003b CR2: 08198f00 CR3: 079c7000 CR4: 06d0
 [c044d2c0] __do_IRQ+0x84/0xd6
 [c044d23c] __do_IRQ+0x0/0xd6
 [c04074ce] do_IRQ+0x99/0xc3
 [c0405946] common_interrupt+0x1a/0x20
 [c0428b6f] __do_softirq+0x57/0x114
 [c04073eb] do_softirq+0x52/0x9c
 [c04059d7] apic_timer_interrupt+0x1f/0x24
 [c053a6ae] add_softcursor+0x13/0xa2
 [c053ab36] set_cursor+0x3a/0x5c
 [c053ab9e] con_flush_chars+0x27/0x2f
 [c0533a31] write_chan+0x1c5/0x298
 [c041e3d7] default_wake_function+0x0/0xc
 [c05315ea] tty_write+0x147/0x1d8
 [c053386c] write_chan+0x0/0x298
 [c0531ffd] redirected_tty_write+0x1c/0x6c
 [c0531fe1] redirected_tty_write+0x0/0x6c
 [c0472cff] vfs_write+0xa1/0x143
 [c04732f1] sys_write+0x3c/0x63
 [c0404f17] syscall_call+0x7/0xb
 ===

Source machine:

Machine: x3850
Kernel: 2.6.30-rc6-git4
qemu-kvm-0.10.4


Destination machine:

Machine: LS21
Kernel: 2.6.30-rc6-git3
qemu-kvm-0.10.4


Steps to Reproduce:

1. Install a RHEL 5.3 guest on the x3850 with the above-mentioned kernel and
qemu version
2. NFS mount the dir containing the guest image on the destination (LS21)
3. Boot the rhel guest
4. Wait for the guest migration on LS21 with the following command:
qemu-system-x86_64 -boot c rhel5.3.raw -incoming tcp:0:

5. Start the migration on the source (x3850)
  a. On the guest press Alt+Ctrl+2 to switch to the qemu prompt
  b. Run migrate -d tcp:'Destination IP':
6. The above command hangs the guest for around 3 or 4 minutes and gives the
call trace in dmesg showing the soft lockup on the guest.


Few other details:

Tried to migrate from one LS21 to another LS21 and faced the same failure.
I haven't faced this issue when I was using qemu-10.1.
Migration of a RHEL 5.3 guest from one LS21 to another succeeded with
qemu-10.1.


Regards
Pradeep




Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state

2009-05-28 Thread Christian Ehrhardt

Marcelo Tosatti wrote:

On Tue, May 26, 2009 at 10:02:59AM +0200, Christian Ehrhardt wrote:
  

Marcelo Tosatti wrote:


On Mon, May 25, 2009 at 01:40:49PM +0200, ehrha...@linux.vnet.ibm.com wrote:
  
  

From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com

To ensure vcpus come out of guest context in certain cases this patch adds a
s390 specific way to kick them out of guest context. Currently it kicks them
out to rerun the vcpu_run path in the s390 code, but the mechanism itself is
expandable and with a new flag we could also add e.g. kicks to userspace etc.

Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com



For now I added the optimization to skip kicking vcpus out of the guest
that had the request bit already set to the s390 specific loop (sent as
v2 in a few minutes).

We might one day consider standardizing some generic kickout levels, e.g.
kick to inner loop, arch vcpu run, generic vcpu run, userspace,
... whatever levels fit *all* our use cases. And then let those kicks be
implemented in a kvm_arch_* backend, as it might be very different how
they behave on different architectures.

That would be ideal, yes. Two things make_all_requests handles: 


1) It disables preemption with get_cpu(), so it can reliably check for
cpu id. Somehow you don't need that for s390 when kicking multiple
vcpus?
  
  
I don't even need the cpuid as make_all_requests does, I just insert a
special bit in the vcpu arch part and the vcpu will come out to me
(host).
Fortunately the kick is rare and fast, so I can just insert it
unconditionally (it's even ok to insert it if the vcpu is not in guest
state). That prevents us from needing the vcpu lock or detailed checks which
would end up where we started (no guarantee that vcpus come out of
guest context while trying to acquire all vcpu locks).



Let me see if I get this right: you kick the vcpus out of guest mode by
using a special bit in the vcpu arch part. OK.

What I don't understand is this: 
would end up where we started (no guarantee that vcpus come out of

guest context while trying to acquire all vcpu locks)
  
Initially the mechanism looped over vcpus and just acquired the vcpu
lock and then updated the vcpu.arch info directly.
Avi mentioned that we have no guarantee if/when the vcpu will come out
of guest context to free a lock currently held, and suggested the
mechanism x86 uses via setting vcpu->requests and kicking the vcpu. That's
the reason behind "end up where we (the discussion) started"; if we would
need the vcpu lock again we would be at the beginning of the discussion.

So you _need_ a mechanism to kick all vcpus out of guest mode?
  
I have a mechanism to kick a vcpu, and I use it. Due to the fact that 
smp_call_* don't work as kick for us the kick is an arch specific function.

I hope that clarified this part :-)

2) It uses smp_call_function_many(wait=1), which guarantees that by the
time make_all_requests returns no vcpus will be using stale data (the
remote vcpus will have executed ack_flush).
  
  
Yes, this is really a part my s390 implementation doesn't fulfill yet.
Currently on return vcpus might still use the old memslot information.
As mentioned before, letting all interrupts come too far out of the hot
loop would be a performance issue, therefore I think I will need some
request/confirm mechanism. I'm not sure yet, but maybe it could be as
easy as this pseudo code example:


# in make_all_requests
# remember we have slots_lock for write here, and the reentry path that
# updates the vcpu specific data acquires slots_lock for read

loop vcpus
    set_bit in vcpu requests
    kick vcpu                 # arch function
endloop

loop vcpus
    wait until the request bit has disappeared   # as the reentry path uses
                                                 # test_and_clear it will disappear
endloop

That would be an implicit synchronization and should work; as I wrote
before, setting memslots while the guest is running is rare if ever
existent for s390. On x86 smp_call_function_many could then work without
the wait flag being set.
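
In C, that pseudo code could look roughly like the sketch below (illustrative
only, not part of the patch series; kvm_arch_kick_vcpu() stands in for the
s390-specific kick and the request bit is just called "req" here):

static void make_all_requests_sync(struct kvm *kvm, unsigned int req)
{
	struct kvm_vcpu *vcpu;
	int i;

	/* pass 1: mark the request and kick every vcpu out of guest context */
	for (i = 0; i < KVM_MAX_VCPUS; i++) {
		vcpu = kvm->vcpus[i];
		if (!vcpu)
			continue;
		set_bit(req, &vcpu->requests);
		kvm_arch_kick_vcpu(vcpu);	/* arch-specific kick, placeholder name */
	}

	/* pass 2: the reentry path test_and_clear()s the bit while holding
	 * slots_lock for read, so once the bit is gone that vcpu has synced */
	for (i = 0; i < KVM_MAX_VCPUS; i++) {
		vcpu = kvm->vcpus[i];
		if (!vcpu)
			continue;
		while (test_bit(req, &vcpu->requests))
			cpu_relax();
	}
}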



I see, yes. 

  
But I assume that this synchronization approach is slower as it
serializes all vcpus on reentry (they wait for the slots_lock to get
dropped), therefore I wanted to ask how often setting memslots at
runtime will occur on x86? Would this approach be acceptable?



For x86 we need slots_lock for two things:

1) to protect the memslot structures from changing (very rare), ie:
kvm_set_memory.

2) to protect updates to the dirty bitmap (operations on behalf of the
guest) which take slots_lock for read, versus updates to that dirty
bitmap (an ioctl that retrieves what pages have been dirtied in the
memslots, and clears the dirtiness info).

All you need for S390 is 1), AFAICS.
  

correct

For 1), we can drop the slots_lock usage, but instead create an
explicit synchronization point, where all vcpus are forced to a (say
kvm_vcpu_block) paused state. qemu-kvm has such a notion.

Same language?
  

Yes, I think I got your point :-)
But I 

[ kvm-Bugs-2796640 ] KVM Regression from 2.6.28-11 to 2.6.30-rc5

2009-05-28 Thread SourceForge.net
Bugs item #2796640, was opened at 2009-05-26 03:00
Message generated for change (Comment added) made by kelu6
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2796640group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Bob Manners (bobmanners)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM Regression from 2.6.28-11 to 2.6.30-rc5

Initial Comment:
Changing from the Ubuntu 09.04 stock kernel 2.6.28-11 to the current 'alpha' 
kernel in the Ubuntu development repository for 'Karmic' 2.6.30-rc5, some (but 
not all) of my KVM guests stop booting.

I have the following guest OSs installed as KVM virtual machines: Ubuntu 09.04, 
FreeBSD 7.1, Minix 3, OpenSolaris 08.11, Windows XP and Windows 7RC

Under 2.6.28-11 these all boot and run OK.  With kernel 2.6.30-rc5, Ubuntu, 
FreeBSD and Minix boot and run OK, but OpenSolaris 08.11, Windows XP and 
Windows 7RC fail to boot.  They appear to freeze very early in the boot 
process, after the boot loader is done and the OS kernel itself starts loading. 
 There are no strange messages in the host system logs that I am aware of.  
When the guest OS freezes, it pins one of my CPUs at 100% utilization (I have 
KVM set to only give one CPU to the guest).

This looks like a regression under 2.6.30-rc5.  It is possible that it is 
caused by a Ubuntu patch, and I have filed a bug in Launchpad, but I suspect 
that this is an upstream problem, and so I am reporting it here also.

Please let me know what I can do to assist in debugging this.


--

Comment By: Kenni (kelu6)
Date: 2009-05-28 10:26

Message:
Your CPU still has the NX flag, which apparently also was my issue:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/31206/focus=31774

--

Comment By: Bob Manners (bobmanners)
Date: 2009-05-28 02:22

Message:
In response to kelu6's question, actually I am running 32 bit host / 32 bit
guest.  My CPU is Intel Core Duo (not Core2 Duo) which is a 32 bit CPU:

b...@gecko2:~$ cat /proc/cpuinfo 
processor   : 0 
vendor_id   : GenuineIntel  
cpu family  : 6 
model   : 14
model name  : Genuine Intel(R) CPU   T2400  @ 1.83GHz
stepping: 8  
cpu MHz : 1000.000   
cache size  : 2048 KB
physical id : 0  
siblings: 2  
core id : 0  
cpu cores   : 2  
apicid  : 0  
initial apicid  : 0  
fdiv_bug: no 
hlt_bug : no 
f00f_bug: no 
coma_bug: no 
fpu : yes
fpu_exception   : yes
cpuid level : 10 
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc
arch_perfmon bts pni monitor vmx est tm2 xtpr pdcm 

bogomips: 3658.94 
 
clflush size: 64  
 
power management: 
 

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 14
model name  : Genuine Intel(R) CPU   T2400  @ 1.83GHz
stepping: 8
cpu MHz : 1000.000
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
apicid  : 1
initial apicid  : 1
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc
arch_perfmon bts pni monitor vmx est tm2 xtpr pdcm
bogomips: 3658.93
clflush size: 64
power management:



Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state

2009-05-28 Thread Avi Kivity

Christian Ehrhardt wrote:

So you _need_ a mechanism to kick all vcpus out of guest mode?
  
I have a mechanism to kick a vcpu, and I use it. Due to the fact that 
smp_call_* don't work as kick for us the kick is an arch specific 
function.

I hope that clarified this part :-)



You could still use make_all_vcpus_request(), just change 
smp_call_function_many() to your own kicker.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



kvm + windows XP, failure to connect to samba server on host

2009-05-28 Thread hoefle marco
Hello,
I use kvm on debian with Windows XP as the guest OS. To have XP in the same
network I use a tap network device. I start kvm with

sudo kvm -m 512 -smp 2 -hda /mnt/ext3_data/Qemu/winxp.img -net
nic,macaddr=52:54:00:00:00:01 -net tap,ifname=tap0,script=no -localtime

from a script.
I have a fat32 partition mounted on the host which I want to share with
XP. I use a samba server, as the Qemu built-in samba server did not
work.
In Windows I see the samba server, but when trying to connect to the
fat32 partition a login is required. First, no password works, and
secondly I don't want a login; I prefer to be able to connect straight
away. This is the smb.conf:

[global]
workgroup = NANOTRONIC
server string = Linux SAMBA server3
log file = /var/log/samba/log.%m
max log size = 50


[vfat_data]
comment = fat32 drive
path = /mnt/vfat_data
guest ok = yes

Does anyone have an idea what I am missing?



Re: Userspace MSR handling

2009-05-28 Thread Avi Kivity

Gerd Hoffmann wrote:


- what about connecting the guest driver to xen netback one day?  we 
don't

want to go through userspace for that.


You can't without emulating tons of xen stuff in-kernel.

Current situation:
 * Guest does xen hypercalls.  We can handle that just fine.
 * Host userspace (backends) calls libxen*, where the xen hypercall
   calls are hidden.  We can redirect the library calls via LD_PRELOAD
   (standalone xenner) or function pointers (qemuified xenner) and do
   something else instead.

Trying to use in-kernel xen netback driver adds this problem:
 * Host kernel does xen hypercalls.  Ouch.  We have to emulate them
   in-kernel (otherwise using in-kernel netback would be a quite
   pointless exercise).


Or do the standard function pointer trick.  Event channel notifications 
change to eventfd_signal, grant table ops change to copy_to_user().





One way or another, the MSR somehow has to map in a chunk of data
supplied by userspace. Are you suggesting an alternative to the PIO
hack?


Well, the chunk of data is on disk anyway:
$libdir/xenner/hvm{32,64}.bin

So a possible plan of attack could be: ln -s $libdir/xenner
/lib/firmware, let kvm.ko grab it if needed using
request_firmware("xenner/hvm${bits}.bin"), and a few lines of kernel
code handling the wrmsr.  Logic is just this:


void xenner_wrmsr(uint64_t val, int longmode)
{
    uint32_t page = val & ~PAGE_MASK;
    uint64_t paddr = val & PAGE_MASK;
    uint8_t *blob = longmode ? hvm64 : hvm32;
    cpu_physical_memory_write(paddr, blob + page * PAGE_SIZE,
                              PAGE_SIZE);
}

Well, you'll have to sprinkle in blob loading and caching and some 
error checking.  But even with that it is probably hard to beat in 
actual code size.  


This ties all guests to one hypercall page implementation installed in 
one root-only place.



Additional plus is we get away without a new ioctl then.



Minimizing the amount of ioctls is an important non-goal.  If you 
replace request_firmware with an ioctl that defines the location and 
size of the hypercall page in host userspace, this would work well.
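
As a purely hypothetical illustration of what such an ioctl could look like
(nothing like this exists in the KVM ABI at this point; the name, ioctl number
and layout are made up):

/* hypothetical sketch, for illustration only */
struct kvm_hypercall_blob {
	__u64 userspace_addr;	/* where userspace mapped the hypercall blob */
	__u32 size;		/* blob size in bytes */
	__u32 pad;
};

#define KVM_SET_HYPERCALL_BLOB	_IOW(KVMIO, 0x7f, struct kvm_hypercall_blob)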



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



[PATCH] kvm-kmod: Add missing line continuation

2009-05-28 Thread Jan Kiszka
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 Makefile |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index db44772..f51b491 100644
--- a/Makefile
+++ b/Makefile
@@ -30,7 +30,7 @@ all:: prerequisite
$(if $(KERNELSOURCEDIR),-Iinclude2 
-I$(KERNELSOURCEDIR)/include) \
-Iarch/${ARCH_DIR}/include -I`pwd`/include-compat \
-include include/linux/autoconf.h \
-   -include `pwd`/$(ARCH_DIR)/external-module-compat.h 
$(module_defines)
+   -include `pwd`/$(ARCH_DIR)/external-module-compat.h 
$(module_defines) \
$$@
 
 include $(MAKEFILE_PRE)


Re: 2D Graphics Performance

2009-05-28 Thread FinnTux
 No, it works on windows too, it is just (quite) a bit of a pain to
 find and setup the
 correct driver.

Any pointers where to look for them?


SR


Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support

2009-05-28 Thread Mark McLoughlin
On Wed, 2009-05-27 at 16:45 -0400, Gregory Haskins wrote:
 Mark McLoughlin wrote:
  The virtio ABI is fixed, so we couldn't e.g. have the guest use a cookie
  to identify a queue - it's just going to continue using a per-device
  queue number. 
 
 Actually, I was originally thinking this would be exposed as a virtio
 FEATURE bit anyway, so there were no backwards-compat constraints.  That
 said, we can possibly make it work in a backwards compat way, too. 
 IIRC, today virtio does a PIO cycle to a specific register with the
queue-id when it wants to signal guest->host, right?  What is the width
 of the write?

It's a 16-bit write.

/* A 16-bit r/w queue notifier */
#define VIRTIO_PCI_QUEUE_NOTIFY 16
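
On the guest side the kick is literally a single 16-bit port write of the
queue index; roughly (a sketch, with ioaddr standing for the base of the
device's virtio-pci I/O region):

#include <asm/io.h>

/* sketch of the guest->host notification described above */
static void notify_queue(unsigned long ioaddr, u16 queue_index)
{
	/* one 16-bit PIO write carrying the queue number signals the host */
	outw(queue_index, ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
}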

  So, if the cookie was also the trigger, we'd need an
  eventfd per device.

 
 I'm having trouble parsing this one.  The cookie namespace is controlled
 by the userspace component that owns the corresponding IO address, so
 there's no reason you can't make queue-id = 0 use cookie = 0, or
 whatever.  That said, I still think a separation of the cookie and
 trigger as suggested above is a good idea, so its probably moot to
 discuss this point further.

Ah, my mistake - I thought the cookie was returned to userspace when the
eventfd was signalled, but no ... userspace only gets an event counter
value and the cookie is used during de-assignment to distinguish between
iosignalfds.

Okay, so suppose you do assign multiple times at a given address -
you're presumably going to use a different eventfd for each assignment?
If so, can't we match using both the address and eventfd at
de-assignment and drop the cookie from the interface altogether?

i.e. to replace the virtio queue notify with this, we'd:

  1) create an eventfd per queue

  2) assign each of those eventfds to the QUEUE_NOTIFY address

  3) have only one of the eventfds be triggered, depending on what 
 value is written by the guest

  4) de-assign using the address/eventfd pair to distinguish between 
 assignments

Cheers,
Mark.



Re: Remove qemu_alloc_physram()

2009-05-28 Thread nicolas prochazka
After some tests:
with kvm-85, all seems to be ok for me; multiple qemu instances run well with hugepages.
with kvm-86 + patch: after some time, the vm reboots with strange disk
corruption (windows reports that some files are missing) and the cpu load is
very high.

Note :
KVM 85 : ( run one qemu with -m 512 )
HugePages_Total: 13676
HugePages_Free:  13409
HugePages_Rsvd:  0
HugePages_Surp:  0

KVM 86 + patch
HugePages_Total: 13676
HugePages_Free:  13412
HugePages_Rsvd:  0
HugePages_Surp:  0

I do not know if this difference is the cause of the bug.

Regards
Nicolas

2009/5/27 Avi Kivity a...@redhat.com:
 Avi Kivity wrote:

 nicolas prochazka wrote:

 without -mem-prealloc
 HugePages_Total:  2560
 HugePages_Free:   2296
 HugePages_Rsvd:      0

 so after minimal testing, I can say that your patch seems to correct
 this problem.


 It isn't correct, it doesn't generate the right alignment.

 Better patch attached.

 --
 error compiling committee.c: too many arguments to function




[PATCH 1/2] use explicit 64bit storage for sysenter values

2009-05-28 Thread Andre Przywara
Since AMD does not support sysenter in 64bit mode, the VMCB fields storing
the MSRs are truncated to 32bit upon VMRUN/#VMEXIT. So store the values
in a separate 64bit storage to avoid truncation.

Signed-off-by: Christoph Egger christoph.eg...@amd.com
---
 arch/x86/kvm/kvm_svm.h |4 
 arch/x86/kvm/svm.c |   12 ++--
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
index ed66e4c..4129dc1 100644
--- a/arch/x86/kvm/kvm_svm.h
+++ b/arch/x86/kvm/kvm_svm.h
@@ -27,6 +27,10 @@ struct vcpu_svm {
unsigned long vmcb_pa;
struct svm_cpu_data *svm_data;
uint64_t asid_generation;
+   uint64_t sysenter_cs;
+   uint64_t sysenter_esp;
+   uint64_t sysenter_eip;
+   struct kvm_segment user_cs; /* used in sysenter/sysexit emulation */
 
u64 next_rip;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index dd667dd..f0f2885 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1978,13 +1978,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned 
ecx, u64 *data)
break;
 #endif
case MSR_IA32_SYSENTER_CS:
-		*data = svm->vmcb->save.sysenter_cs;
+		*data = svm->sysenter_cs;
		break;
	case MSR_IA32_SYSENTER_EIP:
-		*data = svm->vmcb->save.sysenter_eip;
+		*data = svm->sysenter_eip;
		break;
	case MSR_IA32_SYSENTER_ESP:
-		*data = svm->vmcb->save.sysenter_esp;
+		*data = svm->sysenter_esp;
break;
/* Nobody will change the following 5 values in the VMCB so
   we can safely return them on rdmsr. They will always be 0
@@ -2068,13 +2068,13 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned 
ecx, u64 data)
break;
 #endif
case MSR_IA32_SYSENTER_CS:
-		svm->vmcb->save.sysenter_cs = data;
+		svm->sysenter_cs = data;
		break;
	case MSR_IA32_SYSENTER_EIP:
-		svm->vmcb->save.sysenter_eip = data;
+		svm->sysenter_eip = data;
		break;
	case MSR_IA32_SYSENTER_ESP:
-		svm->vmcb->save.sysenter_esp = data;
+		svm->sysenter_esp = data;
break;
case MSR_IA32_DEBUGCTLMSR:
if (!svm_has(SVM_FEATURE_LBRV)) {
-- 
1.6.1.3




[PATCH RFC v3 3/4] Break dependency between vcpu index in vcpus array and vcpu_id.

2009-05-28 Thread Gleb Natapov
Archs are free to use vcpu_id as they see fit. For x86 it is used as the
vcpu's apic id. A new ioctl is added to configure the boot vcpu id, which
was assumed to be 0 until now.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 include/linux/kvm.h  |2 +
 include/linux/kvm_host.h |2 +
 virt/kvm/kvm_main.c  |   50 ++---
 3 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 632a856..d10ab5d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -430,6 +430,7 @@ struct kvm_trace_rec {
 #ifdef __KVM_HAVE_PIT
 #define KVM_CAP_PIT2 33
 #endif
+#define KVM_CAP_SET_BOOT_CPU_ID 34
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -537,6 +538,7 @@ struct kvm_irqfd {
 #define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
 #define KVM_IRQFD  _IOW(KVMIO, 0x76, struct kvm_irqfd)
 #define KVM_CREATE_PIT2   _IOW(KVMIO, 0x77, struct 
kvm_pit_config)
+#define KVM_SET_BOOT_CPU_ID_IO(KVMIO, 0x78)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e9e0cd8..e368a14 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -130,8 +130,10 @@ struct kvm {
int nmemslots;
struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS +
KVM_PRIVATE_MEM_SLOTS];
+   u32 bsp_vcpu_id;
struct kvm_vcpu *bsp_vcpu;
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+   atomic_t online_vcpus;
struct list_head vm_list;
struct kvm_io_bus mmio_bus;
struct kvm_io_bus pio_bus;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5a55fe0..d65c637 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -693,11 +693,6 @@ out:
 }
 #endif
 
-static inline int valid_vcpu(int n)
-{
-	return likely(n >= 0 && n < KVM_MAX_VCPUS);
-}
-
 inline int kvm_is_mmio_pfn(pfn_t pfn)
 {
if (pfn_valid(pfn)) {
@@ -1713,15 +1708,12 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 /*
  * Creates some virtual cpus.  Good luck creating more than one.
  */
-static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
+static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 {
int r;
struct kvm_vcpu *vcpu;
 
-   if (!valid_vcpu(n))
-   return -EINVAL;
-
-   vcpu = kvm_arch_vcpu_create(kvm, n);
+   vcpu = kvm_arch_vcpu_create(kvm, id);
if (IS_ERR(vcpu))
return PTR_ERR(vcpu);
 
@@ -1732,25 +1724,36 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, 
int n)
return r;
 
	mutex_lock(&kvm->lock);
-	if (kvm->vcpus[n]) {
-		r = -EEXIST;
+	if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {
+		r = -EINVAL;
		goto vcpu_destroy;
	}
-	kvm->vcpus[n] = vcpu;
-	if (n == 0)
-		kvm->bsp_vcpu = vcpu;
-	mutex_unlock(&kvm->lock);
+
+	for (r = 0; r < atomic_read(&kvm->online_vcpus); r++)
+		if (kvm->vcpus[r]->vcpu_id == id) {
+			r = -EEXIST;
+			goto vcpu_destroy;
+		}
+
+	BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]);
 
	/* Now it's all set up, let userspace reach it */
	kvm_get_kvm(kvm);
	r = create_vcpu_fd(vcpu);
-	if (r < 0)
-		goto unlink;
+	if (r < 0) {
+		kvm_put_kvm(kvm);
+		goto vcpu_destroy;
+	}
+
+	kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
+	smp_wmb();
+	atomic_inc(&kvm->online_vcpus);
+
+	if (kvm->bsp_vcpu_id == id)
+		kvm->bsp_vcpu = vcpu;
+	mutex_unlock(&kvm->lock);
	return r;
 
-unlink:
-	mutex_lock(&kvm->lock);
-	kvm->vcpus[n] = NULL;
 vcpu_destroy:
	mutex_unlock(&kvm->lock);
kvm_arch_vcpu_destroy(vcpu);
@@ -2223,6 +2226,10 @@ static long kvm_vm_ioctl(struct file *filp,
r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags);
break;
}
+   case KVM_SET_BOOT_CPU_ID:
+		kvm->bsp_vcpu_id = arg;
+   r = 0;
+   break;
default:
r = kvm_arch_vm_ioctl(filp, ioctl, arg);
}
@@ -2289,6 +2296,7 @@ static long kvm_dev_ioctl_check_extension_generic(long 
arg)
case KVM_CAP_USER_MEMORY:
case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS:
+   case KVM_CAP_SET_BOOT_CPU_ID:
return 1;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
case KVM_CAP_IRQ_ROUTING:
-- 
1.6.2.1



[PATCH RFC v3 0/4] decouple vcpu index from apic id

2009-05-28 Thread Gleb Natapov
Currently vcpu_id is used as an index into the vcpus array and as the apic id
on x86.  This is incorrect since apic ids do not have to be contiguous
(they can also encode cpu hierarchy information) and may have values
bigger than the vcpu array in the case of x2apic. This series removes use of
vcpu_id as the vcpus array index. Each architecture may use it how it sees
fit. x86 uses it as the apic id.

v2:
In this version vcpus[] is managed by generic code.

v3:
A new ioctl is added to specify the vcpu_id of the BSP vcpu. To maintain
backwards compatibility, vcpu_id == 0 is the BSP by default.
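
A rough sketch of how userspace could use this (assuming headers matching the
patched kernel; apic_id_for() is a made-up helper for whatever numbering the
VMM wants):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* sketch: pick the BSP id first, then create vcpus with arbitrary ids */
static int create_vcpus(int vm_fd, int nr_vcpus, unsigned long bsp_id)
{
	int i;

	if (ioctl(vm_fd, KVM_SET_BOOT_CPU_ID, bsp_id) < 0)
		return -1;	/* older kernels: vcpu_id 0 remains the BSP */

	for (i = 0; i < nr_vcpus; i++)
		if (ioctl(vm_fd, KVM_CREATE_VCPU, apic_id_for(i)) < 0)
			return -1;	/* argument is now a free-form vcpu_id/apic id */
	return 0;
}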

Gleb Natapov (4):
  Introduce kvm_vcpu_is_bsp() function.
  Use pointer to vcpu instead of vcpu_id in timer code.
  Break dependency between vcpu index in vcpus array and vcpu_id.
  Use macro to iterate over vcpus.

 arch/ia64/kvm/kvm-ia64.c   |   35 ++--
 arch/ia64/kvm/vcpu.c   |2 +-
 arch/powerpc/kvm/powerpc.c |   16 +++
 arch/s390/kvm/kvm-s390.c   |   33 +++
 arch/x86/kvm/i8254.c   |   13 +++-
 arch/x86/kvm/i8259.c   |6 ++--
 arch/x86/kvm/kvm_timer.h   |2 +-
 arch/x86/kvm/lapic.c   |9 +++---
 arch/x86/kvm/mmu.c |6 ++--
 arch/x86/kvm/svm.c |4 +-
 arch/x86/kvm/timer.c   |2 +-
 arch/x86/kvm/vmx.c |6 ++--
 arch/x86/kvm/x86.c |   29 ++--
 include/linux/kvm.h|2 +
 include/linux/kvm_host.h   |   18 
 virt/kvm/ioapic.c  |4 ++-
 virt/kvm/irq_comm.c|6 +---
 virt/kvm/kvm_main.c|   63 +++
 18 files changed, 138 insertions(+), 118 deletions(-)



[PATCH RFC v3 2/4] Use pointer to vcpu instead of vcpu_id in timer code.

2009-05-28 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/i8254.c |2 +-
 arch/x86/kvm/kvm_timer.h |2 +-
 arch/x86/kvm/lapic.c |2 +-
 arch/x86/kvm/timer.c |2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index de4785a..049cea2 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -291,7 +291,7 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 
val, int is_period)
	pt->timer.function = kvm_timer_fn;
	pt->t_ops = &kpit_ops;
	pt->kvm = ps->pit->kvm;
-	pt->vcpu_id = 0;
+	pt->vcpu = pt->kvm->bsp_vcpu;
 
	atomic_set(&pt->pending, 0);
	ps->irq_ack = 1;
diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h
index 26bd6ba..55c7524 100644
--- a/arch/x86/kvm/kvm_timer.h
+++ b/arch/x86/kvm/kvm_timer.h
@@ -6,7 +6,7 @@ struct kvm_timer {
bool reinject;
struct kvm_timer_ops *t_ops;
struct kvm *kvm;
-   int vcpu_id;
+   struct kvm_vcpu *vcpu;
 };
 
 struct kvm_timer_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 260d4fa..9d0608c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -946,7 +946,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
	apic->lapic_timer.timer.function = kvm_timer_fn;
	apic->lapic_timer.t_ops = &lapic_timer_ops;
	apic->lapic_timer.kvm = vcpu->kvm;
-	apic->lapic_timer.vcpu_id = vcpu->vcpu_id;
+	apic->lapic_timer.vcpu = vcpu;
 
	apic->base_address = APIC_DEFAULT_PHYS_BASE;
	vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE;
diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index 86dbac0..85cc743 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -33,7 +33,7 @@ enum hrtimer_restart kvm_timer_fn(struct hrtimer *data)
struct kvm_vcpu *vcpu;
struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
 
-	vcpu = ktimer->kvm->vcpus[ktimer->vcpu_id];
+	vcpu = ktimer->vcpu;
if (!vcpu)
return HRTIMER_NORESTART;
 
-- 
1.6.2.1



[PATCH RFC v3 1/4] Introduce kvm_vcpu_is_bsp() function.

2009-05-28 Thread Gleb Natapov
Use it instead of open-coding the "vcpu_id zero is BSP" assumption.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/ia64/kvm/kvm-ia64.c |2 +-
 arch/ia64/kvm/vcpu.c |2 +-
 arch/x86/kvm/i8254.c |4 ++--
 arch/x86/kvm/i8259.c |6 +++---
 arch/x86/kvm/lapic.c |7 ---
 arch/x86/kvm/svm.c   |4 ++--
 arch/x86/kvm/vmx.c   |6 +++---
 arch/x86/kvm/x86.c   |4 ++--
 include/linux/kvm_host.h |5 +
 virt/kvm/ioapic.c|4 +++-
 virt/kvm/kvm_main.c  |2 ++
 11 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 3199221..3924591 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1216,7 +1216,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
if (IS_ERR(vmm_vcpu))
return PTR_ERR(vmm_vcpu);
 
-	if (vcpu->vcpu_id == 0) {
+	if (kvm_vcpu_is_bsp(vcpu)) {
		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
/*Set entry address for first run.*/
diff --git a/arch/ia64/kvm/vcpu.c b/arch/ia64/kvm/vcpu.c
index a2c6c15..7e7391d 100644
--- a/arch/ia64/kvm/vcpu.c
+++ b/arch/ia64/kvm/vcpu.c
@@ -830,7 +830,7 @@ static void vcpu_set_itc(struct kvm_vcpu *vcpu, u64 val)
 
kvm = (struct kvm *)KVM_VM_BASE;
 
-	if (vcpu->vcpu_id == 0) {
+	if (kvm_vcpu_is_bsp(vcpu)) {
		for (i = 0; i < kvm->arch.online_vcpus; i++) {
v = (struct kvm_vcpu *)((char *)vcpu +
sizeof(struct kvm_vcpu_data) * i);
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 584e3d3..de4785a 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -228,7 +228,7 @@ int pit_has_pending_timer(struct kvm_vcpu *vcpu)
 {
	struct kvm_pit *pit = vcpu->kvm->arch.vpit;
 
-	if (pit && vcpu->vcpu_id == 0 && pit->pit_state.irq_ack)
+	if (pit && kvm_vcpu_is_bsp(vcpu) && pit->pit_state.irq_ack)
		return atomic_read(&pit->pit_state.pit_timer.pending);
return 0;
 }
@@ -249,7 +249,7 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
	struct kvm_pit *pit = vcpu->kvm->arch.vpit;
	struct hrtimer *timer;
 
-	if (vcpu->vcpu_id != 0 || !pit)
+	if (!kvm_vcpu_is_bsp(vcpu) || !pit)
		return;
 
	timer = &pit->pit_state.pit_timer.timer;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 1ccb50c..6befa98 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -57,7 +57,7 @@ static void pic_unlock(struct kvm_pic *s)
}
 
if (wakeup) {
-		vcpu = s->kvm->vcpus[0];
+		vcpu = s->kvm->bsp_vcpu;
		if (vcpu)
			kvm_vcpu_kick(vcpu);
	}
@@ -252,7 +252,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s)
 {
	int irq, irqbase, n;
	struct kvm *kvm = s->pics_state->irq_request_opaque;
-	struct kvm_vcpu *vcpu0 = kvm->vcpus[0];
+	struct kvm_vcpu *vcpu0 = kvm->bsp_vcpu;
 
	if (s == &s->pics_state->pics[0])
irqbase = 0;
@@ -505,7 +505,7 @@ static void picdev_read(struct kvm_io_device *this,
 static void pic_irq_request(void *opaque, int level)
 {
struct kvm *kvm = opaque;
-	struct kvm_vcpu *vcpu = kvm->vcpus[0];
+	struct kvm_vcpu *vcpu = kvm->bsp_vcpu;
	struct kvm_pic *s = pic_irqchip(kvm);
	int irq = pic_get_irq(&s->pics[0]);
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ae99d83..260d4fa 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -787,7 +787,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
		vcpu->arch.apic_base = value;
		return;
	}
-	if (apic->vcpu->vcpu_id)
+
+	if (!kvm_vcpu_is_bsp(apic->vcpu))
		value &= ~MSR_IA32_APICBASE_BSP;
 
	vcpu->arch.apic_base = value;
@@ -844,7 +845,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
}
update_divide_count(apic);
	atomic_set(&apic->lapic_timer.pending, 0);
-	if (vcpu->vcpu_id == 0)
+	if (kvm_vcpu_is_bsp(vcpu))
		vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
apic_update_ppr(apic);
 
@@ -985,7 +986,7 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
	u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
	int r = 0;
 
-	if (vcpu->vcpu_id == 0) {
+	if (kvm_vcpu_is_bsp(vcpu)) {
		if (!apic_hw_enabled(vcpu->arch.apic))
			r = 1;
		if ((lvt0 & APIC_LVT_MASKED) == 0 &&
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index dd667dd..8eede3f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -605,7 +605,7 @@ static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
 
init_vmcb(svm);
 
-	if (vcpu->vcpu_id != 0) {
+	if (!kvm_vcpu_is_bsp(vcpu)) {
		kvm_rip_write(vcpu, 0);
		svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
 

[PATCH RFC v3 4/4] Use macro to iterate over vcpus.

2009-05-28 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/ia64/kvm/kvm-ia64.c   |   33 ++---
 arch/powerpc/kvm/powerpc.c |   16 ++--
 arch/s390/kvm/kvm-s390.c   |   33 -
 arch/x86/kvm/i8254.c   |7 ++-
 arch/x86/kvm/mmu.c |6 +++---
 arch/x86/kvm/x86.c |   25 -
 include/linux/kvm_host.h   |   11 +++
 virt/kvm/irq_comm.c|6 ++
 virt/kvm/kvm_main.c|   19 +++
 9 files changed, 77 insertions(+), 79 deletions(-)
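
The iterator itself is defined in the include/linux/kvm_host.h hunk, which is
not reproduced in full below; as a rough sketch only, such a macro can be
expected to look something like:

/* sketch only; the real definition lives in include/linux/kvm_host.h */
#define kvm_for_each_vcpu(idx, vcpup, kvm) \
	for (idx = 0; \
	     idx < atomic_read(&(kvm)->online_vcpus) && \
	     ((vcpup) = (kvm)->vcpus[idx]) != NULL; \
	     idx++)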

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 3924591..c1c5cb6 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -337,13 +337,12 @@ static struct kvm_vcpu *lid_to_vcpu(struct kvm *kvm, 
unsigned long id,
 {
union ia64_lid lid;
int i;
+   struct kvm_vcpu *vcpu;
 
-	for (i = 0; i < kvm->arch.online_vcpus; i++) {
-		if (kvm->vcpus[i]) {
-			lid.val = VCPU_LID(kvm->vcpus[i]);
-			if (lid.id == id && lid.eid == eid)
-				return kvm->vcpus[i];
-		}
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		lid.val = VCPU_LID(vcpu);
+		if (lid.id == id && lid.eid == eid)
+			return vcpu;
}
 
return NULL;
@@ -409,21 +408,21 @@ static int handle_global_purge(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
	struct kvm *kvm = vcpu->kvm;
	struct call_data call_data;
	int i;
+	struct kvm_vcpu *vcpui;
 
	call_data.ptc_g_data = p->u.ptc_g_data;
 
-	for (i = 0; i < kvm->arch.online_vcpus; i++) {
-		if (!kvm->vcpus[i] || kvm->vcpus[i]->arch.mp_state ==
-				KVM_MP_STATE_UNINITIALIZED ||
-				vcpu == kvm->vcpus[i])
+	kvm_for_each_vcpu(i, vcpui, kvm) {
+		if (vcpui->arch.mp_state == KVM_MP_STATE_UNINITIALIZED ||
+				vcpu == vcpui)
			continue;
 
-		if (waitqueue_active(&kvm->vcpus[i]->wq))
-			wake_up_interruptible(&kvm->vcpus[i]->wq);
+		if (waitqueue_active(&vcpui->wq))
+			wake_up_interruptible(&vcpui->wq);
 
-		if (kvm->vcpus[i]->cpu != -1) {
-			call_data.vcpu = kvm->vcpus[i];
-			smp_call_function_single(kvm->vcpus[i]->cpu,
+		if (vcpui->cpu != -1) {
+			call_data.vcpu = vcpui;
+			smp_call_function_single(vcpui->cpu,
					vcpu_global_purge, &call_data, 1);
		} else
			printk(KERN_WARNING"kvm: Uninit vcpu received ipi!\n");
@@ -852,8 +851,6 @@ struct  kvm *kvm_arch_create_vm(void)
 
kvm_init_vm(kvm);
 
-	kvm->arch.online_vcpus = 0;
-
return kvm;
 
 }
@@ -1356,8 +1353,6 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
goto fail;
}
 
-	kvm->arch.online_vcpus++;
-
return vcpu;
 fail:
return ERR_PTR(r);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2cf915e..7ad30e0 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -122,13 +122,17 @@ struct kvm *kvm_arch_create_vm(void)
 static void kvmppc_free_vcpus(struct kvm *kvm)
 {
unsigned int i;
+   struct kvm_vcpu *vcpu;
 
-	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-		if (kvm->vcpus[i]) {
-			kvm_arch_vcpu_free(kvm->vcpus[i]);
-			kvm->vcpus[i] = NULL;
-		}
-	}
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvm_arch_vcpu_free(vcpu);
+
+	mutex_lock(&kvm->lock);
+	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
+		kvm->vcpus[i] = NULL;
+
+	atomic_set(&kvm->online_vcpus, 0);
+	mutex_unlock(&kvm->lock);
 }
 
 void kvm_arch_sync_events(struct kvm *kvm)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 981ab04..217d3d3 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -209,13 +209,17 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 static void kvm_free_vcpus(struct kvm *kvm)
 {
unsigned int i;
+   struct kvm_vcpu *vcpu;
 
-	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-		if (kvm->vcpus[i]) {
-			kvm_arch_vcpu_destroy(kvm->vcpus[i]);
-			kvm->vcpus[i] = NULL;
-		}
-	}
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvm_arch_vcpu_destroy(vcpu);
+
+	mutex_lock(&kvm->lock);
+	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
+		kvm->vcpus[i] = NULL;
+
+	atomic_set(&kvm->online_vcpus, 0);
+	mutex_unlock(&kvm->lock);
 }
 
 void kvm_arch_sync_events(struct kvm *kvm)
@@ -311,8 +315,6 @@ struct kvm_vcpu 

[PATCH 2/2] add sysenter/syscall emulation for 32bit compat mode

2009-05-28 Thread Andre Przywara
sysenter/sysexit are not supported in AMD's 32bit compat mode, whereas
syscall is not supported in Intel's 32bit compat mode. To allow cross
vendor migration we emulate the missing instructions by setting up the
processor state according to the other call.
The sysenter code was originally sketched by Amit Shah; it was completed,
debugged, had syscall support added and was made to work by Christoph Egger,
and was polished up by Andre Przywara.
Please note that sysret does not need to be emulated, because it will be
executed in 64bit mode and returning to 32bit compat mode works on Intel.

Signed-off-by: Amit Shah amit.s...@redhat.com
Signed-off-by: Christoph Egger christoph.eg...@amd.com
Signed-off-by: Andre Przywara andre.przyw...@amd.com
---
 arch/x86/kvm/x86.c |   37 -
 arch/x86/kvm/x86_emulate.c |  349 +++-
 2 files changed, 380 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6d44dd5..dae7726 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2593,11 +2593,38 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
/* Reject the instructions other than VMCALL/VMMCALL when
 * try to emulate invalid opcode */
	c = &vcpu->arch.emulate_ctxt.decode;
-	if ((emulation_type & EMULTYPE_TRAP_UD) &&
-			(!(c->twobyte && c->b == 0x01 &&
-			  (c->modrm_reg == 0 || c->modrm_reg == 3) &&
-			   c->modrm_mod == 3 && c->modrm_rm == 1)))
-		return EMULATE_FAIL;
+
+	if (emulation_type & EMULTYPE_TRAP_UD) {
+		if (!c->twobyte)
+			return EMULATE_FAIL;
+		switch (c->b) {
+		case 0x01: /* VMMCALL */
+			if (c->modrm_mod != 3)
+				return EMULATE_FAIL;
+			if (c->modrm_rm != 1)
+				return EMULATE_FAIL;
+			break;
+		case 0x34: /* sysenter */
+		case 0x35: /* sysexit */
+			if (c->modrm_mod != 0)
+				return EMULATE_FAIL;
+			if (c->modrm_rm != 0)
+				return EMULATE_FAIL;
+			break;
+		case 0x05: /* syscall */
+			r = 0;
+			if (c->modrm_mod != 0)
+				return EMULATE_FAIL;
+			if (c->modrm_rm != 0)
+				return EMULATE_FAIL;
+			break;
+		default:
+			return EMULATE_FAIL;
+		}
+
+		if (!(c->modrm_reg == 0 || c->modrm_reg == 3))
+   return EMULATE_FAIL;
+   }
 
	++vcpu->stat.insn_emulation;
if (r)  {
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 22c765d..41b78fa 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -32,6 +32,8 @@
 #include linux/module.h
 #include asm/kvm_x86_emulate.h
 
+#include mmu.h
+
 /*
  * Opcode effective-address decode tables.
  * Note that we only emulate instructions that have at least one memory
@@ -217,7 +219,9 @@ static u32 twobyte_table[256] = {
ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
/* 0x30 - 0x3F */
-   ImplicitOps, 0, ImplicitOps, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+   ImplicitOps, 0, ImplicitOps, 0,
+   ImplicitOps, ImplicitOps, 0, 0,
+   0, 0, 0, 0, 0, 0, 0, 0,
/* 0x40 - 0x47 */
DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -320,8 +324,11 @@ static u32 group2_table[] = {
 };
 
 /* EFLAGS bit definitions. */
+#define EFLG_VM (1<<17)
+#define EFLG_RF (1<<16)
 #define EFLG_OF (1<<11)
 #define EFLG_DF (1<<10)
+#define EFLG_IF (1<<9)
 #define EFLG_SF (1<<7)
 #define EFLG_ZF (1<<6)
 #define EFLG_AF (1<<4)
@@ -1985,10 +1992,114 @@ twobyte_insn:
goto cannot_emulate;
}
break;
+   case 0x05: { /* syscall */
+		unsigned long cr0 = ctxt->vcpu->arch.cr0;
+		struct kvm_segment cs, ss;
+
+		memset(&cs, 0, sizeof(struct kvm_segment));
+		memset(&ss, 0, sizeof(struct kvm_segment));
+
+		/* inject #UD if
+		 * 1. we are in real mode
+		 * 2. protected mode is not enabled
+		 * 3. LOCK prefix is used
+		 */
+		if ((ctxt->mode == X86EMUL_MODE_REAL)
+			|| (!(cr0 & X86_CR0_PE))
+

Fw: [KVM] soft lockup with RHEL 5.3 guest remote migration

2009-05-28 Thread Pradeep K Surisetty

Tried local migration of a RHEL 5.3 guest on the x3850.
After local migration, both source and destination are in the active state,
and the guest leaves a soft lockup oops.

Thanks
Pradeep

----- Forwarded by Pradeep K Surisetty/India/IBM on 05/28/2009 04:01 PM -----

From:    Pradeep K Surisetty/India/IBM
To:      kvm@vger.kernel.org
Date:    05/28/2009 12:13 PM
Cc:      Pavan Naregundi/India/i...@ibmin, Sachin P Sant/India/i...@ibmin
Subject: Fw: [KVM] soft lockup with RHEL 5.3 guest remote migration


Find the total dmesg for RHEL5.3 guest

http://pastebin.com/f7e22fd1a

Regards
Pradeep


Re: [KVM PATCH v2 2/3] kvm: cleanup io_device code

2009-05-28 Thread Gregory Haskins
Chris Wright wrote:
 * Gregory Haskins (ghask...@novell.com) wrote:
   
 We modernize the io_device code so that we use container_of() instead of
 dev->private, and move the vtable to a separate ops structure
 (theoretically allows better caching for multiple instances of the same
 ops structure)
 

 Looks like a nice cleanup.  Couple minor nits.

   
 +static struct kvm_io_device_ops pit_dev_ops = {
 +.read = pit_ioport_read,
 +.write= pit_ioport_write,
 +.in_range = pit_in_range,
 +};
 +
 +static struct kvm_io_device_ops speaker_dev_ops = {
 +.read = speaker_ioport_read,
 +.write= speaker_ioport_write,
 +.in_range = speaker_in_range,
 +};
 

 kvm_io_device_ops instances could be made const.
   

Ack

   
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2227,7 +2227,7 @@ static struct kvm_io_device 
 *vcpu_find_pervcpu_dev(struct kvm_vcpu *vcpu,
  
 	if (vcpu->arch.apic) {
 		dev = &vcpu->arch.apic->dev;
-		if (dev->in_range(dev, addr, len, is_write))
+		if (dev->ops->in_range(dev, addr, len, is_write))
 			return dev;
 

   
 --- a/virt/kvm/iodev.h
 +++ b/virt/kvm/iodev.h
 @@ -18,7 +18,9 @@
  
  #include linux/kvm_types.h
  
 -struct kvm_io_device {
 +struct kvm_io_device;
 +
 +struct kvm_io_device_ops {
  void (*read)(struct kvm_io_device *this,
   gpa_t addr,
   int len,
 @@ -30,16 +32,25 @@ struct kvm_io_device {
  int (*in_range)(struct kvm_io_device *this, gpa_t addr, int len,
  int is_write);
  void (*destructor)(struct kvm_io_device *this);
 +};
 +
  
 -void *private;
 +struct kvm_io_device {
 +struct kvm_io_device_ops *ops;
  };
 

 Did you plan to extend kvm_io_device struct?

   
 +static inline void kvm_iodevice_init(struct kvm_io_device *dev,
 + struct kvm_io_device_ops *ops)
 +{
+	dev->ops = ops;
 +}
 

 And similarly, did you have a plan to do more with kvm_iodevice_init()?
 Otherwise looking a bit like overkill to me.
   

Yeah.  As of right now my plan is to wait for Marcelo's lock cleanup to
go in and integrate with that, and then convert the MMIO/PIO code to use
RCU to acquire a reference to the io_device (so we run as fine-grained
and lockless as possible).  When that happens, you will see an atomic_t
in the struct/init as well.

Even if that doesn't make the cut after review, I am thinking that we
may be making the structure more complex in the future (for instance, to
use an rbtree/hlist instead of the array, or to do tricks with caching
the MRU device, etc.) and this will simplify that effort by already
having all users call the abstracted init.

That said, we could just defer these hunks until needed.  I just figured
I'd do it while I'm in here, but it's no big deal either way.

   
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -2456,7 +2456,7 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct 
 kvm_io_bus *bus,
 	for (i = 0; i < bus->dev_count; i++) {
 		struct kvm_io_device *pos = bus->devs[i];
 
-		if (pos->in_range(pos, addr, len, is_write))
+		if (kvm_iodevice_inrange(pos, addr, len, is_write))
  return pos;
  }
 

 You converted this to the helper but not vcpu_find_pervcpu_dev() (not
 convinced it actually helps readability, but consistency is good).

Oops..oversight.  Will fix.

   BTW,
 while there, s/kvm_iodevice_inrange/kvm_iodevice_in_range/ would be nice.
   

Yeah, good idea.  Will fix.

Thanks Chris,
-Greg
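
For reference, a minimal sketch of the container_of()/ops pattern the patch
moves to (the device name and its handler are made up for illustration):

struct my_speaker {
	struct kvm_io_device dev;	/* embedded; replaces dev->private */
	u32 last_value;
};

static void my_speaker_read(struct kvm_io_device *this,
			    gpa_t addr, int len, void *val)
{
	/* recover the containing device from the embedded member */
	struct my_speaker *s = container_of(this, struct my_speaker, dev);

	memcpy(val, &s->last_value, len);
}

static struct kvm_io_device_ops my_speaker_ops = {
	.read = my_speaker_read,
};

/* at init time:  kvm_iodevice_init(&s->dev, &my_speaker_ops);  */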






Re: [KVM-AUTOTEST PATCH] Use new function VM.get_name() to get the VM's name, instead of VM.name

2009-05-28 Thread Lucas Meneghel Rodrigues
On Sun, 2009-05-24 at 18:46 +0300, Michael Goldish wrote:
 kvm_vm.py: add function VM.get_name().
 kvm_preprocessing.py: use VM.get_name() instead of directly accessing the 
 .name
 attribute.

Are there any advantages of creating this method over directly accessing
the attribute?

 Signed-off-by: Michael Goldish mgold...@redhat.com
 ---
  client/tests/kvm_runtest_2/kvm_preprocessing.py |6 +++---
  client/tests/kvm_runtest_2/kvm_vm.py|4 
  2 files changed, 7 insertions(+), 3 deletions(-)
 
 diff --git a/client/tests/kvm_runtest_2/kvm_preprocessing.py 
 b/client/tests/kvm_runtest_2/kvm_preprocessing.py
 index c9eb35d..bcabf5a 100644
 --- a/client/tests/kvm_runtest_2/kvm_preprocessing.py
 +++ b/client/tests/kvm_runtest_2/kvm_preprocessing.py
 @@ -178,7 +178,7 @@ def preprocess(test, params, env):
  if vm.is_dead():
  continue
  if not vm.verify_process_identity():
-        kvm_log.debug("VM '%s' seems to have been replaced by another process" % vm.name)
+        kvm_log.debug("VM '%s' seems to have been replaced by another process" % vm.get_name())
  vm.pid = None
  
  # Destroy and remove VMs that are no longer needed in the environment
 @@ -187,8 +187,8 @@ def preprocess(test, params, env):
  vm = env[key]
  if not kvm_utils.is_vm(vm):
  continue
-        if not vm.name in requested_vms:
-            kvm_log.debug("VM '%s' found in environment but not required for test; removing it..." % vm.name)
+        if not vm.get_name() in requested_vms:
+            kvm_log.debug("VM '%s' found in environment but not required for test; removing it..." % vm.get_name())
  vm.destroy()
  del env[key]
  
 diff --git a/client/tests/kvm_runtest_2/kvm_vm.py 
 b/client/tests/kvm_runtest_2/kvm_vm.py
 index fab839f..df99859 100644
 --- a/client/tests/kvm_runtest_2/kvm_vm.py
 +++ b/client/tests/kvm_runtest_2/kvm_vm.py
 @@ -454,6 +454,10 @@ class VM:
         """Return True iff the VM's PID does not exist."""
         return not kvm_utils.pid_exists(self.pid)
 
+    def get_name(self):
+        """Return the VM's name."""
+        return self.name
+
     def get_params(self):
         """Return the VM's params dict."""
  
-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies



Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state

2009-05-28 Thread Christian Ehrhardt

Avi Kivity wrote:

Christian Ehrhardt wrote:

So you _need_ a mechanism to kick all vcpus out of guest mode?
  
I have a mechanism to kick a vcpu, and I use it. Due to the fact that 
smp_call_* don't work as kick for us the kick is an arch specific 
function.

I hope that clarified this part :-)



You could still use make_all_vcpus_request(), just change 
smp_call_function_many() to your own kicker.


Yes, and I like this idea for further unification, but I don't want it
mixed too much into the patches in discussion atm.
Because on one hand I have some problems giving my arch specific kick a
behaviour like "return when the guest WAS kicked", and on the other hand
I would e.g. also need to streamline the check in make_all_vcpus_request
of which cpu is running etc., because vcpu->cpu stays -1 all the time on s390
(never used).


Therefore I would unify things step by step and this way allow single
tasks to go off my task pile here :-)


--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization 




Re: [KVM-AUTOTEST PATCH] kvm_runtest_2.py: use pickle instead of shelve when loading/saving env

2009-05-28 Thread Lucas Meneghel Rodrigues
On Sun, 2009-05-24 at 18:46 +0300, Michael Goldish wrote:
 pickle allows more control over the load/save process. Specifically, it 
 enables
 us to dump the contents of an object to disk without having to unpickle it.
 shelve, which uses pickle, seems to pickle and unpickle every time sync() is
 called. This is bad for classes that need to be unpickled only once per test
 (such a class will be introduced in a future patch).

Looks good to me.

 Signed-off-by: Michael Goldish mgold...@redhat.com
 ---
  client/tests/kvm_runtest_2/kvm_runtest_2.py |   29 --
  1 files changed, 22 insertions(+), 7 deletions(-)
 
 diff --git a/client/tests/kvm_runtest_2/kvm_runtest_2.py 
 b/client/tests/kvm_runtest_2/kvm_runtest_2.py
 index a69951b..5f7f6ad 100644
 --- a/client/tests/kvm_runtest_2/kvm_runtest_2.py
 +++ b/client/tests/kvm_runtest_2/kvm_runtest_2.py
 @@ -3,9 +3,9 @@
  import sys
  import os
  import time
 -import shelve
  import random
  import resource
 +import cPickle
  
  from autotest_lib.client.bin import test
  from autotest_lib.client.common_lib import error
 @@ -18,6 +18,22 @@ class test_routine:
  self.routine = None
  
 
 +def dump_env(obj, filename):
 +    file = open(filename, "w")
 +    cPickle.dump(obj, file)
 +    file.close()
 +
 +
 +def load_env(filename, default=None):
 +    try:
 +        file = open(filename, "r")
 +    except:
 +        return default
 +    obj = cPickle.load(file)
 +    file.close()
 +    return obj
 +
 +
  class kvm_runtest_2(test.test):
  version = 1
  
 @@ -65,7 +81,7 @@ class kvm_runtest_2(test.test):
  
  # Open the environment file
         env_filename = os.path.join(self.bindir, params.get("env", "env"))
 -        env = shelve.open(env_filename, writeback=True)
 +        env = load_env(env_filename, {})
         kvm_log.debug("Contents of environment: %s" % str(env))
  
  try:
 @@ -87,24 +103,23 @@ class kvm_runtest_2(test.test):
  
  # Preprocess
  kvm_preprocessing.preprocess(self, params, env)
 -env.sync()
 +dump_env(env, env_filename)
  # Run the test function
  routine_obj.routine(self, params, env)
 -env.sync()
 +dump_env(env, env_filename)
  
  except Exception, e:
              kvm_log.error("Test failed: %s" % e)
              kvm_log.debug("Postprocessing on error...")
  kvm_preprocessing.postprocess_on_error(self, params, env)
 -env.sync()
 +dump_env(env, env_filename)
  raise
  
  finally:
  # Postprocess
  kvm_preprocessing.postprocess(self, params, env)
              kvm_log.debug("Contents of environment: %s" % str(env))
 -env.sync()
 -env.close()
 +dump_env(env, env_filename)
  
  def postprocess(self):
  pass
-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies



Re: [KVM-AUTOTEST PATCH] kvm_vm.py: add new VM parameter 'x11_display' that controls $DISPLAY

2009-05-28 Thread Lucas Meneghel Rodrigues
On Sun, 2009-05-24 at 18:46 +0300, Michael Goldish wrote:
 If x11_display is specified, the DISPLAY environment variable is set to this
 value for the QEMU process. This may be useful for SDL rendering.

Looks good to me!

 Also add some comments.
 
 Signed-off-by: Michael Goldish mgold...@redhat.com
 ---
  client/tests/kvm_runtest_2/kvm_vm.py |   12 +++-
  1 files changed, 11 insertions(+), 1 deletions(-)
 
 diff --git a/client/tests/kvm_runtest_2/kvm_vm.py 
 b/client/tests/kvm_runtest_2/kvm_vm.py
 index 9571a3b..7a4ce4a 100644
 --- a/client/tests/kvm_runtest_2/kvm_vm.py
 +++ b/client/tests/kvm_runtest_2/kvm_vm.py
 @@ -173,6 +173,8 @@ class VM:
  (iso_dir is pre-pended to the ISO filename)
  extra_params -- a string to append to the qemu command
  ssh_port -- should be 22 for SSH, 23 for Telnet
 +x11_display -- if specified, the DISPLAY environment variable 
 will be be set
 +to this value for the qemu process (useful for SDL rendering)
  images -- a list of image object names, separated by spaces
  nics -- a list of NIC object names, separated by spaces
  
 @@ -198,8 +200,16 @@ class VM:
  if iso_dir == None:
  iso_dir = self.iso_dir
  
 -        qemu_cmd = qemu_path
 +        # Start constructing the qemu command
 +        qemu_cmd = ""
 +        # Set the X11 display parameter if requested
 +        if params.get("x11_display"):
 +            qemu_cmd += "DISPLAY=%s " % params.get("x11_display")
 +        # Add the qemu binary
 +        qemu_cmd += qemu_path
 +        # Add the VM's name
          qemu_cmd += " -name '%s'" % name
 +        # Add the monitor socket parameter
          qemu_cmd += " -monitor unix:%s,server,nowait" % self.monitor_file_name
 
          for image_name in kvm_utils.get_sub_dict_names(params, "images"):
-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies



Re: [KVM-AUTOTEST PATCH] kvm_runtest_2.py: use environment filename specified by the 'env' parameter

2009-05-28 Thread Lucas Meneghel Rodrigues
On Sun, 2009-05-24 at 18:46 +0300, Michael Goldish wrote:
 Do not use hardcoded environment filename 'env'. Instead use the value
 specified by the 'env' parameter. If unspecified, use 'env' as the filename.

Looks good to me!

 This is important for parallel execution; it may be necessary to use a 
 separate
 environment file for each process.
 
 Signed-off-by: Michael Goldish mgold...@redhat.com
 ---
  client/tests/kvm_runtest_2/kvm_runtest_2.py |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/client/tests/kvm_runtest_2/kvm_runtest_2.py 
 b/client/tests/kvm_runtest_2/kvm_runtest_2.py
 index fda7282..a69951b 100644
 --- a/client/tests/kvm_runtest_2/kvm_runtest_2.py
 +++ b/client/tests/kvm_runtest_2/kvm_runtest_2.py
 @@ -64,7 +64,7 @@ class kvm_runtest_2(test.test):
  self.write_test_keyval({key: params[key]})
  
  # Open the environment file
 -        env_filename = os.path.join(self.bindir, "env")
 +        env_filename = os.path.join(self.bindir, params.get("env", "env"))
          env = shelve.open(env_filename, writeback=True)
          kvm_log.debug("Contents of environment: %s" % str(env))
  
-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies



Re: just a dump

2009-05-28 Thread Lucas Meneghel Rodrigues
On Wed, 2009-05-27 at 09:43 +0200, Hans de Bruin wrote:
 
 [09:09:47 INFO ] Test finished after 1 iterations.
 Memory test passed.
 [09:09:48 DEBUG] Running 'grep MemTotal /proc/meminfo'
 [09:09:48 DEBUG] Running 'rpm -qa'
 [09:09:49 INFO ]GOODdma_memtest dma_memtest 
 timestamp=1243408189localtime=May 27 09:09:49   completed 
 successfully
 [09:09:49 DEBUG] Persistent state variable __group_level now set to 1
 [09:09:49 INFO ]END GOODdma_memtest dma_memtest 
 timestamp=1243408189localtime=May 27 09:09:49
 [09:09:49 DEBUG] Dropping caches
 [09:09:49 DEBUG] Running 'sync'
 [09:09:51 DEBUG] Running 'sync'
 [09:09:51 DEBUG] Running 'echo 3 > /proc/sys/vm/drop_caches'
 [09:09:52 DEBUG] Persistent state variable __group_level now set to 0
 [09:09:52 INFO ] END GOOD   timestamp=1243408192 
 localtime=May 27 09:09:52
 
 Well, that looks good. The web page talks about forcing the system to
 swap. That never happened; swap usage is still 0 bytes. I installed
 autotest on the same lv (3-disk stripe) as the vmdisks.

Interesting. About that, I ran some tests and just realized that in some
cases both my implementation and the original shell implementation fail to
force the system to go to swap (the tests I made when the test was first
implemented did manage to make the machines swap, but under conditions
where the system had a significantly higher initial memory usage). I will
work on a better heuristic to force swap.
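
Something along these lines is what I'm thinking of (rough, untested sketch,
not the actual test code):

    def force_swap(chunk_mb=64, max_chunks=2048):
        def meminfo(field):
            # value of a /proc/meminfo field, in kB
            for line in open('/proc/meminfo'):
                if line.startswith(field + ':'):
                    return int(line.split()[1])
            return 0

        hog = []
        for i in range(max_chunks):
            # allocate and touch chunk_mb of anonymous memory
            hog.append('x' * (chunk_mb * 1024 * 1024))
            if meminfo('SwapTotal') - meminfo('SwapFree') > 0:
                return True     # the system started swapping
        return False            # gave up before any swap was used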

Thanks for pointing this out,

-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies



Re: [KVM-AUTOTEST PATCH] VM.create(): always destroy() the VM before attempting to start it

2009-05-28 Thread Lucas Meneghel Rodrigues
On Sun, 2009-05-24 at 18:46 +0300, Michael Goldish wrote:
 Also, don't do it in kvm_preprocessing.py since it's now done in kvm_vm.py.

Looks good to me!

 Signed-off-by: Michael Goldish mgold...@redhat.com
 ---
  client/tests/kvm_runtest_2/kvm_preprocessing.py |1 -
  client/tests/kvm_runtest_2/kvm_vm.py|2 ++
  2 files changed, 2 insertions(+), 1 deletions(-)
 
 diff --git a/client/tests/kvm_runtest_2/kvm_preprocessing.py 
 b/client/tests/kvm_runtest_2/kvm_preprocessing.py
 index bcabf5a..9ccaf78 100644
 --- a/client/tests/kvm_runtest_2/kvm_preprocessing.py
 +++ b/client/tests/kvm_runtest_2/kvm_preprocessing.py
 @@ -84,7 +84,6 @@ def preprocess_vm(test, params, env, name):
  start_vm = True
  
  if start_vm:
 -vm.destroy()
  if not vm.create(name, params, qemu_path, image_dir, iso_dir, 
 for_migration):
              message = "Could not start VM"
  kvm_log.error(message)
 diff --git a/client/tests/kvm_runtest_2/kvm_vm.py 
 b/client/tests/kvm_runtest_2/kvm_vm.py
 index df99859..a1462c6 100644
 --- a/client/tests/kvm_runtest_2/kvm_vm.py
 +++ b/client/tests/kvm_runtest_2/kvm_vm.py
 @@ -238,6 +238,8 @@ class VM:
  stored in the class attributes is used, and if it is supplied, it is 
 stored
  for later use.
  
 +self.destroy()
 +
  if name != None:
  self.name = name
  if params != None:
-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies



Re: strange guest slowness after some time

2009-05-28 Thread Tomasz Chmielewski

Avi Kivity wrote:

Tomasz Chmielewski wrote:

Maybe virtio is racy and a loaded host exposes the race.


I see it happening with virtio on 2.6.29.x guests as well.

So, what would you do if you saw it on your systems as well? ;)

Add some debug routines into virtio_* modules?



I'm no virtio expert.  Maybe I'd insert tracepoints to record interrupts 
and kicks.


By accident, I made an interesting discovery.

This ~2 MB video shows a kvm-86 guest being rebooted and GRUB started:

http://syneticon.net/kvm/kvm-slowness.ogg


GRUB has its timeout set to 50 seconds, and is supposed to show it on 
the screen by decreasing the number of seconds shown, every second.


Here, GRUB decreases the second counter very fast by 2 seconds, then
waits 2 seconds, then again decreases the number of seconds by 2 very
fast, and so on.


Perhaps my wording does not describe it very well though, so just try to
download the video and open it, e.g. in mplayer.



Comments?


--
Tomasz Chmielewski
http://wpkg.org


[PATCH 3/5] kvm-s390: update vcpu->cpu

2009-05-28 Thread ehrhardt
From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com

kvm on s390 formerly ignored vcpu->cpu.
This patch sets/unsets vcpu->cpu in kvm_arch_vcpu_load/put to allow
further architecture unification, e.g. so generic code does not find -1 on
currently scheduled vcpus.

Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com
---

[diffstat]
 kvm-s390.c |2 ++
 1 file changed, 2 insertions(+)

[diff]
Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -243,6 +243,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcp
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+	vcpu->cpu = cpu;
save_fp_regs(vcpu-arch.host_fpregs);
save_access_regs(vcpu-arch.host_acrs);
vcpu-arch.guest_fpregs.fpc = FPC_VALID_MASK;
@@ -252,6 +253,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu 
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	vcpu->cpu = -1;
save_fp_regs(vcpu-arch.guest_fpregs);
save_access_regs(vcpu-arch.guest_acrs);
restore_fp_regs(vcpu-arch.host_fpregs);


[PATCH 5/5] kvm-s390: streamline memslot handling - v4

2009-05-28 Thread ehrhardt
From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com

*updates in v4*
- kick out only scheduled vcpus (it is superfluous, and the wait might hang
  forever on vcpus that are not running)

*updates in v3*
- handling the mmu reload vcpu request can now be done inside the sigp
  handling, avoiding an additional exit
- kvm_arch_set_memory_region now waits for kicked vcpus to consume the request
  bit it set, to ensure that after the kvm_arch_set_memory_region call all vcpus
  use the updated memory information

*updates in v2*
- added an optimization to skip the (additional) kickout of vcpus that had the
  request bit already set.

This patch relocates the variables kvm-s390 uses to track guest mem addr/size.
As discussed, dropping the variables at struct kvm_arch level allows using the
common vcpu->requests based mechanism to reload guest memory if it is changed,
e.g. via set_memory_region.
The kick mechanism introduced in this series is used to ensure running vcpus
leave guest state to catch the update.


Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com
---

[diffstat]
 include/asm/kvm_host.h |4 ---
 kvm/gaccess.h  |   23 ++-
 kvm/intercept.c|6 ++---
 kvm/kvm-s390.c |   57 -
 kvm/kvm-s390.h |   33 +++-
 kvm/sigp.c |4 +--
 6 files changed, 74 insertions(+), 53 deletions(-)

[diff]
Index: kvm/arch/s390/kvm/gaccess.h
===
--- kvm.orig/arch/s390/kvm/gaccess.h
+++ kvm/arch/s390/kvm/gaccess.h
@@ -1,7 +1,7 @@
 /*
  * gaccess.h -  access guest memory
  *
- * Copyright IBM Corp. 2008
+ * Copyright IBM Corp. 2008,2009
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License (version 2 only)
@@ -16,13 +16,14 @@
 #include <linux/compiler.h>
 #include <linux/kvm_host.h>
 #include <asm/uaccess.h>
+#include "kvm-s390.h"
 
 static inline void __user *__guestaddr_to_user(struct kvm_vcpu *vcpu,
   unsigned long guestaddr)
 {
	unsigned long prefix  = vcpu->arch.sie_block->prefix;
-	unsigned long origin  = vcpu->kvm->arch.guest_origin;
-	unsigned long memsize = vcpu->kvm->arch.guest_memsize;
+	unsigned long origin  = vcpu->arch.sie_block->gmsor;
+	unsigned long memsize = kvm_s390_vcpu_get_memsize(vcpu);
 
	if (guestaddr < 2 * PAGE_SIZE)
		guestaddr += prefix;
@@ -158,8 +159,8 @@ static inline int copy_to_guest(struct k
const void *from, unsigned long n)
 {
	unsigned long prefix  = vcpu->arch.sie_block->prefix;
-	unsigned long origin  = vcpu->kvm->arch.guest_origin;
-	unsigned long memsize = vcpu->kvm->arch.guest_memsize;
+	unsigned long origin  = vcpu->arch.sie_block->gmsor;
+	unsigned long memsize = kvm_s390_vcpu_get_memsize(vcpu);
 
	if ((guestdest < 2 * PAGE_SIZE) && (guestdest + n > 2 * PAGE_SIZE))
goto slowpath;
@@ -209,8 +210,8 @@ static inline int copy_from_guest(struct
  unsigned long guestsrc, unsigned long n)
 {
	unsigned long prefix  = vcpu->arch.sie_block->prefix;
-	unsigned long origin  = vcpu->kvm->arch.guest_origin;
-	unsigned long memsize = vcpu->kvm->arch.guest_memsize;
+	unsigned long origin  = vcpu->arch.sie_block->gmsor;
+	unsigned long memsize = kvm_s390_vcpu_get_memsize(vcpu);
 
	if ((guestsrc < 2 * PAGE_SIZE) && (guestsrc + n > 2 * PAGE_SIZE))
goto slowpath;
@@ -244,8 +245,8 @@ static inline int copy_to_guest_absolute
 unsigned long guestdest,
 const void *from, unsigned long n)
 {
-	unsigned long origin  = vcpu->kvm->arch.guest_origin;
-	unsigned long memsize = vcpu->kvm->arch.guest_memsize;
+	unsigned long origin  = vcpu->arch.sie_block->gmsor;
+	unsigned long memsize = kvm_s390_vcpu_get_memsize(vcpu);
 
	if (guestdest + n > memsize)
return -EFAULT;
@@ -262,8 +263,8 @@ static inline int copy_from_guest_absolu
   unsigned long guestsrc,
   unsigned long n)
 {
-	unsigned long origin  = vcpu->kvm->arch.guest_origin;
-	unsigned long memsize = vcpu->kvm->arch.guest_memsize;
+	unsigned long origin  = vcpu->arch.sie_block->gmsor;
+	unsigned long memsize = kvm_s390_vcpu_get_memsize(vcpu);
 
	if (guestsrc + n > memsize)
return -EFAULT;
Index: kvm/arch/s390/kvm/intercept.c
===
--- kvm.orig/arch/s390/kvm/intercept.c
+++ kvm/arch/s390/kvm/intercept.c
@@ -1,7 +1,7 @@
 /*
  * intercept.c - in-kernel handling for sie intercepts
  *
- * Copyright IBM Corp. 2008
+ * Copyright IBM Corp. 2008,2009
  *
  * This program is free software; you can 

[PATCH 1/5] kvm-s390: infrastructure to kick vcpus out of guest state - v3

2009-05-28 Thread ehrhardt
From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com

*updates in v3*
- ensure allocations (might_sleep) are out of atomic context

*updates in v2*
Instead of a kick-to-level behaviour, the patch now implements a kick to the
lowest level. The check there bails out to upper levels if not all outstanding
vcpu->requests could be handled internally (it could still support an explicit
kick to a given level if ever needed).

To ensure vcpus come out of guest context in certain cases, this patch adds an
s390-specific way to kick them out of guest context. Currently it kicks them
out to rerun the vcpu_run path in the s390 code, but the mechanism itself is
expandable, and with a new flag we could also add e.g. kicks to userspace etc.

Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com
---

[diffstat]
 include/asm/kvm_host.h |5 ++--
 kvm/intercept.c|   14 +--
 kvm/kvm-s390.c |6 
 kvm/kvm-s390.h |   17 ++
 kvm/sigp.c |   59 ++---
 5 files changed, 79 insertions(+), 22 deletions(-)

[diff]
Index: kvm/arch/s390/kvm/intercept.c
===
--- kvm.orig/arch/s390/kvm/intercept.c
+++ kvm/arch/s390/kvm/intercept.c
@@ -128,7 +128,7 @@ static int handle_noop(struct kvm_vcpu *
 
 static int handle_stop(struct kvm_vcpu *vcpu)
 {
-   int rc;
+   int rc = 0;
 
	vcpu->stat.exit_stop_request++;
	atomic_clear_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
@@ -141,12 +141,20 @@ static int handle_stop(struct kvm_vcpu *
rc = -ENOTSUPP;
}
 
+	if (vcpu->arch.local_int.action_bits & ACTION_VCPUREQUEST_ON_STOP) {
+		vcpu->arch.local_int.action_bits &= ~ACTION_VCPUREQUEST_ON_STOP;
+		if (kvm_s390_handle_vcpu_requests(vcpu, VCPUREQUESTLVL_SIGP)) {
+			rc = SIE_INTERCEPT_CHECKREQUESTS;
+			vcpu->run->exit_reason = KVM_EXIT_INTR;
+   }
+   }
+
	if (vcpu->arch.local_int.action_bits & ACTION_STOP_ON_STOP) {
		vcpu->arch.local_int.action_bits &= ~ACTION_STOP_ON_STOP;
		VCPU_EVENT(vcpu, 3, "%s", "cpu stopped");
rc = -ENOTSUPP;
-   } else
-   rc = 0;
+   }
+
	spin_unlock_bh(&vcpu->arch.local_int.lock);
return rc;
 }
Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -487,6 +487,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
 
vcpu_load(vcpu);
 
+rerun_vcpu:
+   kvm_s390_handle_vcpu_requests(vcpu, VCPUREQUESTLVL_VCPURUN);
+
/* verify, that memory has been registered */
	if (!vcpu->kvm->arch.guest_memsize) {
vcpu_put(vcpu);
@@ -519,6 +522,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
rc = kvm_handle_sie_intercept(vcpu);
	} while (!signal_pending(current) && !rc);
 
+   if (rc == SIE_INTERCEPT_CHECKREQUESTS)
+   goto rerun_vcpu;
+
	if (signal_pending(current) && !rc)
rc = -EINTR;
 
Index: kvm/arch/s390/kvm/kvm-s390.h
===
--- kvm.orig/arch/s390/kvm/kvm-s390.h
+++ kvm/arch/s390/kvm/kvm-s390.h
@@ -20,6 +20,8 @@
 
 typedef int (*intercept_handler_t)(struct kvm_vcpu *vcpu);
 
+/* negative values are error codes, positive values for internal conditions */
+#define SIE_INTERCEPT_CHECKREQUESTS	(1<<0)
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu);
 
 #define VM_EVENT(d_kvm, d_loglevel, d_string, d_args...)\
@@ -50,6 +52,21 @@ int kvm_s390_inject_vm(struct kvm *kvm,
 int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
struct kvm_s390_interrupt *s390int);
 int kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, u16 code);
+int kvm_s390_inject_sigp_stop(struct kvm_vcpu *vcpu, int action);
+
+/* interception levels from which handle vcpu requests can be called */
+#define VCPUREQUESTLVL_SIGP	1
+#define VCPUREQUESTLVL_VCPURUN	2
+static inline unsigned long kvm_s390_handle_vcpu_requests(struct kvm_vcpu 
*vcpu,
+  int level)
+{
+   BUG_ON(!level);
+
+	if (!vcpu->requests)
+		return 0;
+
+	return vcpu->requests;
+}
 
 /* implemented in priv.c */
 int kvm_s390_handle_b2(struct kvm_vcpu *vcpu);
Index: kvm/arch/s390/include/asm/kvm_host.h
===
--- kvm.orig/arch/s390/include/asm/kvm_host.h
+++ kvm/arch/s390/include/asm/kvm_host.h
@@ -180,8 +180,9 @@ struct kvm_s390_interrupt_info {
 };
 
 /* for local_interrupt.action_flags */
-#define ACTION_STORE_ON_STOP 1
-#define ACTION_STOP_ON_STOP  2
+#define ACTION_STORE_ON_STOP       (1<<0)
+#define ACTION_STOP_ON_STOP        (1<<1)
+#define ACTION_VCPUREQUEST_ON_STOP (1<<2)
 
 

[PATCH 4/5] kvm: remove redundant declarations

2009-05-28 Thread ehrhardt
From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com

While changing the s390 code in kvm_arch_vcpu_load/put I came across these
header declarations. They are complete duplicates, not even useful forward
declarations, as nothing using them is in between (according to git blame,
the s390 and ia64 contributions introducing the first arch independence
overlapped and added them twice).

This patch removes the two dispensable lines.

Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com
---

[diffstat]
 kvm_host.h |2 --
 1 file changed, 2 deletions(-)

[diff]
Index: kvm/include/linux/kvm_host.h
===
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -241,8 +241,6 @@ long kvm_arch_dev_ioctl(struct file *fil
unsigned int ioctl, unsigned long arg);
 long kvm_arch_vcpu_ioctl(struct file *filp,
 unsigned int ioctl, unsigned long arg);
-void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
-void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
 
 int kvm_dev_ioctl_check_extension(long ext);
 


[PATCH 2/5] kvm-s390: fix interruption caused by signal - v2

2009-05-28 Thread ehrhardt
From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com

*updates in v2*
merged a small piece of code from patch 1/1 that belongs here thematically

If signal pending is true we exit without updating kvm_run, userspace
currently just does nothing and jumps to kvm_run again.
Since we did not set an exit_reason we might end up with a random one
(whatever was the last exit). Therefore it was possible to e.g. jump to
the psw position the last real interruption set.
Setting the INTR exit reason ensures that no old psw data is swapped
in on reentry.

Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com
---

[diffstat]
 kvm-s390.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

[diff]
Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -509,6 +509,7 @@ rerun_vcpu:
	vcpu->arch.sie_block->gpsw.addr = kvm_run->s390_sieic.addr;
break;
case KVM_EXIT_UNKNOWN:
+   case KVM_EXIT_INTR:
case KVM_EXIT_S390_RESET:
break;
default:
@@ -525,8 +526,10 @@ rerun_vcpu:
if (rc == SIE_INTERCEPT_CHECKREQUESTS)
goto rerun_vcpu;
 
-	if (signal_pending(current) && !rc)
+	if (signal_pending(current) && !rc) {
+		kvm_run->exit_reason = KVM_EXIT_INTR;
rc = -EINTR;
+   }
 
if (rc == -ENOTSUPP) {
/* intercept cannot be handled in-kernel, prepare kvm-run */


[PATCH 0/5] kvm-s390: revised version of kvm-s390 guest memory handling - v4

2009-05-28 Thread ehrhardt
From: Christian Ehrhardt ehrha...@de.ibm.com

*updates in v3*
- ensure kick allocations (might_sleep) are out of atomic context
- replaced the kick-to-level behaviour with a kick to the lowest level plus bail-out
- updates on running vcpus can now be handled without the need to rerun the vcpu
- kvm_arch_set_memory_region waits until the update is consumed by the vcpu
- kick out only scheduled vcpus (the wait might hang forever on non-scheduled vcpus)
- moved some code between patches 1 & 2 thematically
- update vcpu->cpu in the kvm-s390 arch handlers for load/put
- remove a redundant declaration in kvm_host.h related to load/put

Note: further unification of make_all_cpu_request and the kick mechanism is
planned, but it might be good to split it from this step towards commonality.

*updates in v2*
added an optimization to patch 3/3 to skip the (additional) kickout of vcpus
that had the request already set.

This patch series results from our discussions about handling memslots and vcpu
mmu reloads. It streamlines kvm-s390 a bit by using slots_lock, vcpu->requests
(KVM_REQ_MMU_RELOAD) and a kick mechanism to ensure vcpus come out of guest
context to catch the update.

I tested the reworked code for a while with multiple smp guests and some extra
code that periodically injects kicks and/or mmu reload requests, but I'm happy
about any additional review feedback.

Patches included:
Subject: [PATCH 1/5] kvm-s390: infrastructure to kick vcpus out of guest state 
- v3
Subject: [PATCH 2/5] kvm-s390: fix interruption caused by signal - v2
Subject: [PATCH 3/5] kvm-s390: update vcpu->cpu
Subject: [PATCH 4/5] kvm: remove redundant declarations
Subject: [PATCH 5/5] kvm-s390: streamline memslot handling - v4

Overall-Diffstat:
 arch/s390/include/asm/kvm_host.h |9 ++---
 arch/s390/kvm/gaccess.h  |   23 ++--
 arch/s390/kvm/intercept.c|   20 +++
 arch/s390/kvm/kvm-s390.c |   70 ---
 arch/s390/kvm/kvm-s390.h |   50 +++
 arch/s390/kvm/sigp.c |   63 ---
 include/linux/kvm_host.h |2 -
 7 files changed, 159 insertions(+), 78 deletions(-)


Re: Remove qemu_alloc_physram()

2009-05-28 Thread Ryan Harper
* Avi Kivity a...@redhat.com [2009-05-27 11:23]:
 Avi Kivity wrote:
 nicolas prochazka wrote:
 without -mem-prealloc
 HugePages_Total:  2560
 HugePages_Free:   2296
 HugePages_Rsvd:  0
 
 so after minimal testing, I can say that your patch seems to correct
 this problem.
   
 
 It isn't correct, it doesn't generate the right alignment.
 
 Better patch attached.

This patch restores -mempath working for me on latest qemu-kvm.git.

btw, why'd we go from -mem-path (kvm-84/stable) to -mempath (kvm-85 and
newer)? Changing these breaks existing scripts.  Not a big deal, but
wondering what was the motivation for the change.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com


[patch] VMX Unrestricted mode support

2009-05-28 Thread Nitin A Kamble
Avi,

A new VMX feature, the Unrestricted Guest feature, has been added to the VMX
specification. You can look at the latest Intel processor manual for
details of the feature here:

 http://www.intel.com/products/processor/manuals

It allows kvm guests to run real mode and unpaged mode
code natively in VMX mode when EPT is turned on. With the
unrestricted guest feature there is no need to emulate the guest real mode code
in the vm86 container or in the emulator. Guest big real mode
code also runs natively.

  The attached patch enhances KVM to use the unrestricted guest feature
if it is available on the processor. It also adds a new kernel/module
parameter to disable the unrestricted guest feature at boot time.
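
With the patch applied, the feature can be disabled at module load time with
the usual modprobe syntax, e.g.:

    modprobe kvm-intel unrestricted_guest=0

Since the parameter is declared with S_IRUGO, its current value can also be
read back from /sys/module/kvm_intel/parameters/unrestricted_guest.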

Signed-Off-By: Nitin A Kamble nitin.a.kam...@intel.com

Thanks  Regards,
Nitin


diff --git a/arch/x86/include/asm/kvm_host.h
b/arch/x86/include/asm/kvm_host.h
index d2b082d..7832599 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -40,9 +40,13 @@
 #define KVM_GUEST_CR0_MASK\
(X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE \
 | X86_CR0_NW | X86_CR0_CD)
+#define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST
\
+   (X86_CR0_WP | X86_CR0_NE | X86_CR0_TS | X86_CR0_MP)
+#define KVM_VM_CR0_ALWAYS_ON_RESTRICTED_GUEST  \
+   (KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST | X86_CR0_PG | X86_CR0_PE)
 #define KVM_VM_CR0_ALWAYS_ON   \
-   (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE | X86_CR0_TS \
-| X86_CR0_MP)
+   (enable_unrestricted_guest ? KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST \
+   : KVM_VM_CR0_ALWAYS_ON_RESTRICTED_GUEST)
 #define KVM_GUEST_CR4_MASK \
(X86_CR4_VME | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_PGE | X86_CR4_VMXE)
 #define KVM_PMODE_VM_CR4_ALWAYS_ON (X86_CR4_PAE | X86_CR4_VMXE)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 498f944..c73da02 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -55,6 +55,7 @@
 #define SECONDARY_EXEC_ENABLE_EPT   0x0002
 #define SECONDARY_EXEC_ENABLE_VPID  0x0020
 #define SECONDARY_EXEC_WBINVD_EXITING  0x0040
+#define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x0080
 
 
 #define PIN_BASED_EXT_INTR_MASK 0x0001
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 25f1239..703d2c4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -50,6 +50,10 @@ module_param_named(flexpriority,
flexpriority_enabled, bool, S_IRUGO);
 static int __read_mostly enable_ept = 1;
 module_param_named(ept, enable_ept, bool, S_IRUGO);
 
+static int __read_mostly enable_unrestricted_guest = 1;
+module_param_named(unrestricted_guest,
+   enable_unrestricted_guest, bool, S_IRUGO);
+
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
@@ -268,6 +272,12 @@ static inline int cpu_has_vmx_ept(void)
SECONDARY_EXEC_ENABLE_EPT;
 }
 
+static inline int cpu_has_vmx_unrestricted_guest(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_UNRESTRICTED_GUEST;
+}
+
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
return flexpriority_enabled 
@@ -731,7 +741,7 @@ static unsigned long vmx_get_rflags(struct kvm_vcpu
*vcpu)
 
 static void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
-	if (vcpu->arch.rmode.active)
+	if (vcpu->arch.rmode.active && !enable_unrestricted_guest)
rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
vmcs_writel(GUEST_RFLAGS, rflags);
 }
@@ -1195,7 +1205,8 @@ static __init int setup_vmcs_config(struct
vmcs_config *vmcs_conf)
opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
SECONDARY_EXEC_WBINVD_EXITING |
SECONDARY_EXEC_ENABLE_VPID |
-   SECONDARY_EXEC_ENABLE_EPT;
+   SECONDARY_EXEC_ENABLE_EPT |
+   SECONDARY_EXEC_UNRESTRICTED_GUEST;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
				&_cpu_based_2nd_exec_control) < 0)
@@ -1325,8 +1336,13 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_vpid())
enable_vpid = 0;
 
-   if (!cpu_has_vmx_ept())
+   if (!cpu_has_vmx_ept()) {
enable_ept = 0;
+   enable_unrestricted_guest = 0;
+   }
+
+   if (!cpu_has_vmx_unrestricted_guest())
+   enable_unrestricted_guest = 0;
 
if (!cpu_has_vmx_flexpriority())
flexpriority_enabled = 0;
@@ -1363,9 +1379,17 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
unsigned long flags;
 

Re: [patch] VMX Unrestricted mode support

2009-05-28 Thread Alexey Eremenko
VMX Unrestricted mode support -- this looks like a very interesting (and
useful!) feature.

Which CPUs support it ?

Core i7 900-series (Nehalem) ?

-- 
-Alexey Eromenko Technologov


[PATCH -tip v8 6/7] tracing: ftrace dynamic ftrace_event_call support

2009-05-28 Thread Masami Hiramatsu
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new
ftrace_event_calls to ftrace on the fly. Each operator function of a call
takes a ftrace_event_call data structure as an argument, because these
functions may be shared among several ftrace_event_calls.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Frederic Weisbecker fweis...@gmail.com
---

 include/linux/ftrace_event.h |   13 ++
 include/trace/ftrace.h   |   22 +
 kernel/trace/trace_events.c  |   54 +-
 kernel/trace/trace_export.c  |   27 ++---
 4 files changed, 69 insertions(+), 47 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index bbf40f6..e25f3a4 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -108,12 +108,13 @@ struct ftrace_event_call {
struct dentry   *dir;
struct trace_event  *event;
int enabled;
-   int (*regfunc)(void);
-   void(*unregfunc)(void);
+   int (*regfunc)(struct ftrace_event_call *);
+   void(*unregfunc)(struct ftrace_event_call *);
int id;
-   int (*raw_init)(void);
-   int (*show_format)(struct trace_seq *s);
-   int (*define_fields)(void);
+   int (*raw_init)(struct ftrace_event_call *);
+   int (*show_format)(struct ftrace_event_call *,
+  struct trace_seq *);
+   int (*define_fields)(struct ftrace_event_call *);
struct list_headfields;
int filter_active;
void*filter;
@@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct 
ftrace_event_call *call,
 
 extern int trace_define_field(struct ftrace_event_call *call, char *type,
  char *name, int offset, int size, int is_signed);
+extern int trace_add_event_call(struct ftrace_event_call *call);
+extern void trace_remove_event_call(struct ftrace_event_call *call);
 
 #define is_signed_type(type)	(((type)(-1)) < 0)
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index b4ec83a..de3ee7c 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -229,7 +229,8 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int 
flags)\
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 static int \
-ftrace_format_##call(struct trace_seq *s)  \
+ftrace_format_##call(struct ftrace_event_call *event_call, \
+struct trace_seq *s)   \
 {  \
struct ftrace_raw_##call field __attribute__((unused)); \
int ret = 0;\
@@ -269,10 +270,9 @@ ftrace_format_##call(struct trace_seq *s)  
\
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 int\
-ftrace_define_fields_##call(void)  \
+ftrace_define_fields_##call(struct ftrace_event_call *event_call)  \
 {  \
struct ftrace_raw_##call field; \
-   struct ftrace_event_call *event_call = event_##call;   \
int ret;\
\
__common_field(int, type, 1);   \
@@ -298,7 +298,7 @@ ftrace_define_fields_##call(void)   
\
  * event_trace_printk(_RET_IP_, call:  fmt);
  * }
  *
- * static int ftrace_reg_event_call(void)
+ * static int ftrace_reg_event_call(struct ftrace_event_call *dummy)
  * {
  * int ret;
  *
@@ -309,7 +309,7 @@ ftrace_define_fields_##call(void)   
\
  * return ret;
  * }
  *
- * static void ftrace_unreg_event_call(void)
+ * static void ftrace_unreg_event_call(struct ftrace_event_call *dummy)
  * {
  * unregister_trace_call(ftrace_event_call);
  * }
@@ -342,7 +342,7 @@ ftrace_define_fields_##call(void)   
\
  * trace_current_buffer_unlock_commit(event, irq_flags, pc);
  * }
  *
- * static int ftrace_raw_reg_event_call(void)
+ * static int 

[PATCH -tip v8 3/7] kprobes: checks probe address is instruction boudary on x86

2009-05-28 Thread Masami Hiramatsu
Ensure the safety of inserting kprobes by checking whether the specified
address is at the first byte of an instruction on x86.
This is done by decoding the probed function from its head to the probe point.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 7b5169d..41d524f 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,12 +48,14 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>
 
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
+#include <asm/insn.h>
 
 void jprobe_return_end(void);
 
@@ -244,6 +246,71 @@ retry:
}
 }
 
+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+   struct kprobe *kp;
+   kp = get_kprobe((void *)addr);
+   if (!kp)
+   return -EINVAL;
+
+	/*
+	 *  Basically, kp->ainsn.insn has an original instruction.
+	 *  However, RIP-relative instruction can not do single-stepping
+	 *  at different place, fix_riprel() tweaks the displacement of
+	 *  that instruction. In that case, we can't recover the instruction
+	 *  from the kp->ainsn.insn.
+	 *
+	 *  On the other hand, kp->opcode has a copy of the first byte of
+	 *  the probed instruction, which is overwritten by int3. And
+	 *  the instruction at kp->addr is not modified by kprobes except
+	 *  for the first byte, we can recover the original instruction
+	 *  from it and kp->opcode.
+	 */
+	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+	buf[0] = kp->opcode;
+	return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+   int ret;
+   unsigned long addr, offset = 0;
+   struct insn insn;
+   kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+   return 0;
+
+   /* Decode instructions */
+   addr = paddr - offset;
+	while (addr < paddr) {
+		kernel_insn_init(&insn, (void *)addr);
+		insn_get_opcode(&insn);
+
+   /* Check if the instruction has been modified. */
+   if (OPCODE1(insn) == BREAKPOINT_INSTRUCTION) {
+   ret = recover_probed_instruction(buf, addr);
+   if (ret)
+   /*
+* Another debugging subsystem might insert
+* this breakpoint. In that case, we can't
+* recover it.
+*/
+   return 0;
+			kernel_insn_init(&insn, buf);
+		}
+		insn_get_length(&insn);
+		addr += insn.length;
+   }
+
+   return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -359,6 +426,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
 
 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+	if (!can_probe((unsigned long)p->addr))
+   return -EILSEQ;
/* insn: must be on special executable page on x86. */
	p->ainsn.insn = get_insn_slot();
	if (!p->ainsn.insn)


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip v8 5/7] x86: add pt_regs register and stack access APIs

2009-05-28 Thread Masami Hiramatsu
Add the following APIs for accessing registers and stack entries from pt_regs.

- query_register_offset(const char *name)
   Query the offset of the named register.

- query_register_name(unsigned offset)
   Query the name of a register by its offset.

- get_register(struct pt_regs *regs, unsigned offset)
   Get the value of a register by its offset.

- within_kernel_stack(struct pt_regs *regs, unsigned long addr)
   Check whether the address is in the kernel stack.

- get_kernel_stack_nth(struct pt_regs *reg, unsigned nth)
   Get the Nth entry of the kernel stack. (N >= 0)

- get_argument_nth(struct pt_regs *reg, unsigned nth)
   Get the Nth argument at a function call. (N >= 0)
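
A hypothetical usage sketch (not part of the patch) of how a kprobe handler
could use these APIs; everything except the new APIs is made up:

	static int sample_pre_handler(struct kprobe *p, struct pt_regs *regs)
	{
		int off = query_register_offset("ax");		/* name -> offset */
		unsigned long ax = get_register(regs, off);	/* offset -> value */
		unsigned long arg0 = get_argument_nth(regs, 0);	/* 1st argument */
		unsigned long top = get_kernel_stack_nth(regs, 0); /* stack[0] */

		printk(KERN_INFO "ax=%lx arg0=%lx stack0=%lx\n", ax, arg0, top);
		return 0;
	}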

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Roland McGrath rol...@redhat.com
---

 arch/x86/include/asm/ptrace.h |   67 +
 arch/x86/kernel/ptrace.c  |   60 +
 2 files changed, 127 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 0f0d908..577d625 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@
 
 #ifdef __KERNEL__
 #include asm/segment.h
+#include asm/page_types.h
 #endif
 
 #ifndef __ASSEMBLY__
@@ -216,6 +217,72 @@ static inline unsigned long user_stack_pointer(struct 
pt_regs *regs)
	return regs->sp;
 }
 
+/* Query offset/name of register from its name/offset */
+extern int query_register_offset(const char *name);
+extern const char *query_register_name(unsigned offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/* Get register value from its offset */
+static inline unsigned long get_register(struct pt_regs *regs, unsigned offset)
+{
+	if (unlikely(offset > MAX_REG_OFFSET))
+   return 0;
+   return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/* Check the address in the stack */
+static inline int within_kernel_stack(struct pt_regs *regs, unsigned long addr)
+{
+	return ((addr & ~(THREAD_SIZE - 1)) ==
+		(kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/* Get Nth entry of the stack */
+static inline unsigned long get_kernel_stack_nth(struct pt_regs *regs,
+unsigned n)
+{
+   unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs);
+   addr += n;
+   if (within_kernel_stack(regs, (unsigned long)addr))
+   return *addr;
+   else
+   return 0;
+}
+
+/* Get Nth argument at function call */
+static inline unsigned long get_argument_nth(struct pt_regs *regs, unsigned n)
+{
+#ifdef CONFIG_X86_32
+#define NR_REGPARMS 3
+	if (n < NR_REGPARMS) {
+		switch (n) {
+		case 0: return regs->ax;
+		case 1: return regs->dx;
+		case 2: return regs->cx;
+		}
+		return 0;
+#else /* CONFIG_X86_64 */
+#define NR_REGPARMS 6
+	if (n < NR_REGPARMS) {
+		switch (n) {
+		case 0: return regs->di;
+		case 1: return regs->si;
+		case 2: return regs->dx;
+		case 3: return regs->cx;
+		case 4: return regs->r8;
+		case 5: return regs->r9;
+		}
+		return 0;
+#endif
+   } else {
+   /*
+* The typical case: arg n is on the stack.
+* (Note: stack[0] = return address, so skip it)
+*/
+   return get_kernel_stack_nth(regs, 1 + n - NR_REGPARMS);
+   }
+}
+
 /*
  * These are defined as per linux/ptrace.h, which see.
  */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 09ecbde..00eb9d7 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -48,6 +48,66 @@ enum x86_regset {
REGSET_IOPERM32,
 };
 
+struct pt_regs_offset {
+   const char *name;
+   int offset;
+};
+
+#define REG_OFFSET(r) offsetof(struct pt_regs, r)
+#define REG_OFFSET_NAME(r) {.name = #r, .offset = REG_OFFSET(r)}
+#define REG_OFFSET_END {.name = NULL, .offset = 0}
+
+static const struct pt_regs_offset regoffset_table[] = {
+#ifdef CONFIG_X86_64
+   REG_OFFSET_NAME(r15),
+   REG_OFFSET_NAME(r14),
+   REG_OFFSET_NAME(r13),
+   REG_OFFSET_NAME(r12),
+   REG_OFFSET_NAME(r11),
+   REG_OFFSET_NAME(r10),
+   REG_OFFSET_NAME(r9),
+   REG_OFFSET_NAME(r8),
+#endif
+   REG_OFFSET_NAME(bx),
+   REG_OFFSET_NAME(cx),
+   REG_OFFSET_NAME(dx),
+   REG_OFFSET_NAME(si),
+   REG_OFFSET_NAME(di),
+   REG_OFFSET_NAME(bp),
+   REG_OFFSET_NAME(ax),
+#ifdef CONFIG_X86_32
+   REG_OFFSET_NAME(ds),
+   REG_OFFSET_NAME(es),
+   REG_OFFSET_NAME(fs),
+   REG_OFFSET_NAME(gs),
+#endif
+   REG_OFFSET_NAME(orig_ax),
+   

[PATCH -tip v8 2/7] x86: x86 instruction decoder build-time selftest

2009-05-28 Thread Masami Hiramatsu
Add a user-space selftest of the x86 instruction decoder at kernel build time.
When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness of the x86
instruction decoder and runs it after building vmlinux.
The test compares the results of objdump and the x86 instruction decoder
code and checks that there are no differences.

Changes from v7:
- Add data, addr, rep, lock prefixes to skip instructions list.
- Add license comments.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Vegard Nossum vegard.nos...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Sam Ravnborg s...@ravnborg.org
---

 arch/x86/Kconfig.debug  |9 
 arch/x86/Makefile   |3 +
 arch/x86/include/asm/inat.h |2 +
 arch/x86/include/asm/insn.h |2 +
 arch/x86/lib/inat.c |2 +
 arch/x86/lib/insn.c |2 +
 arch/x86/scripts/Makefile   |   19 +++
 arch/x86/scripts/distill.awk|   42 +
 arch/x86/scripts/test_get_len.c |   99 +++
 arch/x86/scripts/user_include.h |   49 +++
 10 files changed, 229 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/scripts/Makefile
 create mode 100644 arch/x86/scripts/distill.awk
 create mode 100644 arch/x86/scripts/test_get_len.c
 create mode 100644 arch/x86/scripts/user_include.h

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 9a88937..430aab4 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -179,6 +179,15 @@ config X86_DS_SELFTEST
 config HAVE_MMIOTRACE_SUPPORT
def_bool y
 
+config X86_DECODER_SELFTEST
+ bool x86 instruction decoder selftest
+ depends on DEBUG_KERNEL
+   ---help---
+Perform x86 instruction decoder selftests at build time.
+This option is useful for checking the sanity of x86 instruction
+decoder code.
+If unsure, say N.
+
 #
 # IO delay types:
 #
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1b68659..7046556 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -154,6 +154,9 @@ all: bzImage
 KBUILD_IMAGE := $(boot)/bzImage
 
 bzImage: vmlinux
+ifeq ($(CONFIG_X86_DECODER_SELFTEST),y)
+   $(Q)$(MAKE) $(build)=arch/x86/scripts posttest
+endif
$(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE)
$(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot
$(Q)ln -fsn ../../x86/boot/bzImage 
$(objtree)/arch/$(UTS_MACHINE)/boot/$@
diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 01e079a..9090665 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -20,7 +20,9 @@
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
  */
+#ifdef __KERNEL__
 #include linux/types.h
+#endif
 
 /* Instruction attributes */
 typedef u32 insn_attr_t;
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5b50fa3..5736404 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -20,7 +20,9 @@
  * Copyright (C) IBM Corporation, 2009
  */
 
+#ifdef __KERNEL__
 #include linux/types.h
+#endif
 /* insn_attr_t is defined in inat.h */
 #include asm/inat.h
 
diff --git a/arch/x86/lib/inat.c b/arch/x86/lib/inat.c
index d6a34be..564ecbd 100644
--- a/arch/x86/lib/inat.c
+++ b/arch/x86/lib/inat.c
@@ -18,7 +18,9 @@
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
  */
+#ifdef __KERNEL__
 #include linux/module.h
+#endif
 #include asm/insn.h
 
 /* Attribute tables are generated from opcode map */
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 254c848..3b9451a 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -18,8 +18,10 @@
  * Copyright (C) IBM Corporation, 2002, 2004, 2009
  */
 
+#ifdef __KERNEL__
 #include linux/string.h
 #include linux/module.h
+#endif
 #include asm/inat.h
 #include asm/insn.h
 
diff --git a/arch/x86/scripts/Makefile b/arch/x86/scripts/Makefile
new file mode 100644
index 000..f08859e
--- /dev/null
+++ b/arch/x86/scripts/Makefile
@@ -0,0 +1,19 @@
+PHONY += posttest
+quiet_cmd_posttest = TEST$@
+  cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f 
$(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len
+
+posttest: $(obj)/test_get_len vmlinux
+   $(call cmd,posttest)
+
+test_get_len_SRC = $(srctree)/arch/x86/scripts/test_get_len.c 
$(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c
+test_get_len_INC = $(srctree)/arch/x86/include/asm/inat.h 
$(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c
+
+quiet_cmd_test_get_len = CC  $@
+  cmd_test_get_len = $(CC) -Wall $(test_get_len_SRC) 

[PATCH -tip v8 4/7] kprobes: cleanup fix_riprel() using insn decoder on x86

2009-05-28 Thread Masami Hiramatsu
Clean up fix_riprel() in arch/x86/kernel/kprobes.c by using the x86 instruction
decoder.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |  128 -
 1 files changed, 23 insertions(+), 105 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 41d524f..ebac470 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -108,50 +108,6 @@ static const u32 twobyte_is_boostable[256 / 32] = {
/*  --- */
/*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
 };
-static const u32 onebyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */
-   W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */
-   W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */
-   W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */
-   W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
-   W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */
-   W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */
-   W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */
-   W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
-   W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */
-   W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */
-   W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */
-   W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */
-   W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
-   W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */
-   W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)   /* f0 */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
-static const u32 twobyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */
-   W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */
-   W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */
-   W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */
-   W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */
-   W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */
-   W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */
-   W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */
-   W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */
-   W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */
-   W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */
-   W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */
-   W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */
-   W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */
-   W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */
-   W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)   /* ff */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
 #undef W
 
 struct kretprobe_blackpoint kretprobe_blacklist[] = {
@@ -344,68 +300,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn)
 static void __kprobes fix_riprel(struct kprobe *p)
 {
 #ifdef CONFIG_X86_64
-	u8 *insn = p->ainsn.insn;
-   s64 disp;
-   int need_modrm;
-
-   /* Skip legacy instruction prefixes.  */
-   while (1) {
-   switch (*insn) {
-   case 0x66:
-   case 0x67:
-   case 0x2e:
-   case 0x3e:
-   case 0x26:
-   case 0x64:
-   case 0x65:
-   case 0x36:
-   case 0xf0:
-   case 0xf3:
-   case 0xf2:
-   ++insn;
-   continue;
-   }
-   break;
-   }
+	struct insn insn;
+	kernel_insn_init(&insn, p->ainsn.insn);
 
-   /* Skip REX instruction prefix.  */
-   if (is_REX_prefix(insn))
-   ++insn;
-
-   if (*insn == 0x0f) {
-   /* Two-byte opcode.  */
-   ++insn;
-   

[PATCH -tip v8 1/7] x86: instruction decoder API

2009-05-28 Thread Masami Hiramatsu
Add an x86 instruction decoder to the arch-specific libraries. This decoder
can decode the x86 instructions used in the kernel into prefix, opcode, modrm,
sib, displacement and immediates. It can also report the length of
instructions.

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script (gen-insn-attr-x86.awk).

Currently, the opcode maps are based on the opcode maps in the Intel(R) 64 and
IA-32 Architectures Software Developer's Manual Vol.2: Appendix A,
and consist of the two types of opcode tables below.

1-byte/2-byte/3-byte opcode tables, which have 256 elements each, are
written as below;

 Table: table-name
 Referrer: escaped-name
 opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 
2nd-mnemonic ...]
  (or)
 opcode: escape # escaped-name
 EndTable

Group opcode tables, which have 8 elements each, are written as below;

 GrpTable: GrpXXX
 reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 
2nd-mnemonic ...]
 EndTable
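
For illustration, a made-up, abbreviated table in this format could look like
(the entries below are only examples, not an excerpt of the real map file):

 Table: one byte opcode
 Referrer:
 00: ADD Eb,Gb
 04: ADD AL,Ib
 0f: escape # 2-byte escape
 80: Grp1 Eb,Ib (1A)
 EndTable

 GrpTable: Grp1
 0: ADD
 1: OR
 7: CMP
 EndTable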

These opcode maps do NOT include most of SSE and FP opcodes, because
those opcodes are not used in the kernel.

Changes from v6.1:
- fix patch title.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Vegard Nossum vegard.nos...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
---

 arch/x86/include/asm/inat.h|  125 ++
 arch/x86/include/asm/insn.h|  134 ++
 arch/x86/lib/Makefile  |   13 +
 arch/x86/lib/inat.c|   80 
 arch/x86/lib/insn.c|  471 +
 arch/x86/lib/x86-opcode-map.txt|  711 
 arch/x86/scripts/gen-insn-attr-x86.awk |  314 ++
 7 files changed, 1848 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
new file mode 100644
index 000..01e079a
--- /dev/null
+++ b/arch/x86/include/asm/inat.h
@@ -0,0 +1,125 @@
+#ifndef _ASM_INAT_INAT_H
+#define _ASM_INAT_INAT_H
+/*
+ * x86 instruction attributes
+ *
+ * Written by Masami Hiramatsu mhira...@redhat.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include linux/types.h
+
+/* Instruction attributes */
+typedef u32 insn_attr_t;
+
+/*
+ * Internal bits. Don't use bitmasks directly, because these bits are
+ * unstable. You should add checking macros and use that macro in
+ * your code.
+ */
+
+#define INAT_OPCODE_TABLE_SIZE 256
+#define INAT_GROUP_TABLE_SIZE 8
+
+/* Legacy instruction prefixes */
+#define INAT_PFX_OPNDSZ1   /* 0x66 */ /* LPFX1 */
+#define INAT_PFX_REPNE 2   /* 0xF2 */ /* LPFX2 */
+#define INAT_PFX_REPE  3   /* 0xF3 */ /* LPFX3 */
+#define INAT_PFX_LOCK  4   /* 0xF0 */
+#define INAT_PFX_CS5   /* 0x2E */
+#define INAT_PFX_DS6   /* 0x3E */
+#define INAT_PFX_ES7   /* 0x26 */
+#define INAT_PFX_FS8   /* 0x64 */
+#define INAT_PFX_GS9   /* 0x65 */
+#define INAT_PFX_SS10  /* 0x36 */
+#define INAT_PFX_ADDRSZ11  /* 0x67 */
+
+#define INAT_LPREFIX_MAX   3
+
+/* Immediate size */
+#define INAT_IMM_BYTE  1
+#define INAT_IMM_WORD  2
+#define INAT_IMM_DWORD 3
+#define INAT_IMM_QWORD 4
+#define INAT_IMM_PTR   5
+#define INAT_IMM_VWORD32   6
+#define INAT_IMM_VWORD 7
+
+/* Legacy prefix */
+#define INAT_PFX_OFFS  0
+#define INAT_PFX_BITS  4
+#define INAT_PFX_MAX((1  INAT_PFX_BITS) - 1)
+#define INAT_PFX_MASK  (INAT_PFX_MAX  INAT_PFX_OFFS)
+/* Escape opcodes */
+#define INAT_ESC_OFFS  (INAT_PFX_OFFS + INAT_PFX_BITS)
+#define INAT_ESC_BITS  2
+#define INAT_ESC_MAX   ((1  

[PATCH -tip v8 0/7] tracing: kprobe-based event tracer and x86 instruction decoder

2009-05-28 Thread Masami Hiramatsu
Hi,

Here are the patches of the kprobe-based event tracer for x86, version 8,
which allows you to probe various kernel events through the ftrace interface.

In this version I added per-probe filtering support, which allows you to set
filters on each probe and shows the format of each probe. I think this is a
more generic integration with ftrace, especially the event tracer.

This patchset also includes an x86(-64) instruction decoder which
supports non-SSE/FP opcodes and includes an x86 opcode map. The decoder
is used for finding the instruction boundaries when inserting new
kprobes. I think it will be possible to share this opcode map
with KVM's decoder.
The decoder is tested when building the kernel; the test compares the
results of objdump and the decoder right after building vmlinux.
You can enable that test with CONFIG_X86_DECODER_SELFTEST=y.

This series can be applied on the latest linux-2.6-tip tree.

This supports only x86(-32/-64) (but porting it to another arch
just needs kprobes/kretprobes and register and stack access APIs).

This patchset includes following changes:
- Add x86 instruction decoder [1/7] (FIXED)
- Add x86 instruction decoder selftest [2/7] (FIXED)
- Check insertion point safety in kprobe [3/7]
- Cleanup fix_riprel() with insn decoder [4/7]
- Add arch-dep register and stack fetching functions [5/7]
- Add dynamic event_call support to ftrace [6/7] (NEW)
- Add kprobe-based event tracer [7/7] (UPDATED)


Enhancement ideas will be added after merging:
- .init function tracing support.
- Support primitive types (long, ulong, int, uint, etc.) for args.


Kprobe-based Event Tracer
=

Overview

This tracer is similar to the events tracer which is based on Tracepoint
infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe
and kretprobe). It can probe anywhere kprobes can probe (this means all
function bodies except for __kprobes functions).

Unlike the function tracer, this tracer can probe instructions inside of
kernel functions. It allows you to check which instruction has been executed.

Unlike the Tracepoint based events tracer, this tracer can add new probe points
on the fly.

Similar to the events tracer, this tracer doesn't need to be activated via
current_tracer; instead, just set probe points via
/debug/tracing/kprobe_events.


Synopsis of kprobe_events
-
  p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS] : set a probe
  r[:EVENT] SYMBOL[+0] [FETCHARGS]  : set a return probe

 EVENT  : Event name
 SYMBOL[+offs|-offs]: Symbol+offset where the probe is inserted
 MEMADDR: Address where the probe is inserted

 FETCHARGS  : Arguments
  %REG  : Fetch register REG
  sN: Fetch Nth entry of stack (N >= 0)
  @ADDR : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  aN: Fetch function argument. (N >= 0)(*)
  rv: Fetch return value.(**)
  ra: Fetch return address.(**)
  +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***)

  (*) aN may not be correct for asmlinkage'd functions or in the middle of a
  function body.
  (**) only for return probe.
  (***) this is useful for fetching a field of data structures.


Per-Probe Event Filtering
-
 The per-probe event filtering feature allows you to set a different filter on
each probe and to choose which arguments will be shown in the trace buffer. If
an event name is specified right after 'p:' or 'r:' in kprobe_events, the
tracer adds an event under tracing/events/kprobes/EVENT; in that directory you
can see 'id', 'enabled', 'format' and 'filter'.

enabled:
  You can enable/disable the probe by writing 1 or 0 to it.

format:
  It shows the format of this probe event. It also shows aliases of arguments
 which you specified to kprobe_events.

filter:
  You can write filtering rules for this event. You can use both alias
 names and field names when describing filters.


Usage examples
--
To add a probe as a new event, write a new definition to kprobe_events
as below.

  echo p:myprobe do_sys_open a0 a1 a2 a3 > /debug/tracing/kprobe_events

 This sets a kprobe at the top of the do_sys_open() function, recording the
1st to 4th arguments as the myprobe event.

  echo r:myretprobe do_sys_open rv ra > /debug/tracing/kprobe_events

 This sets a kretprobe on the return point of the do_sys_open() function,
recording the return value and return address as the myretprobe event.
 You can see the format of these events via
tracing/events/kprobes/EVENT/format.

  cat /debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 23
format:
field:unsigned short common_type;   offset:0;   size:2;
field:unsigned char common_flags;   offset:2;   size:1;
field:unsigned char common_preempt_count;   offset:3;   size:1;
field:int common_pid;   offset:4;   size:4;

[PATCH -tip v8 7/7] tracing: add kprobe-based event tracer

2009-05-28 Thread Masami Hiramatsu
Add kprobes-based event tracer on ftrace.

This tracer is similar to the events tracer which is based on Tracepoint
infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe
and kretprobe). It can probe anywhere kprobes can probe (this means all
function bodies except for __kprobes functions).

Similar to the events tracer, this tracer doesn't need to be activated via
current_tracer; instead, just set probe points via
/debug/tracing/kprobe_events. You can set filters on each probe event
via /debug/tracing/events/kprobes/EVENT/filter.

This tracer supports following probe arguments for each probe.

  %REG  : Fetch register REG
  sN: Fetch Nth entry of stack (N >= 0)
  @ADDR : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  aN: Fetch function argument. (N >= 0)
  rv: Fetch return value.
  ra: Fetch return address.
  +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.

See Documentation/trace/kprobes.txt for details.

Changes from v7:
 - Fix document example.
 - Remove solved TODO.
 - Support per-probe event filtering.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobes.txt  |  138 
 kernel/trace/Kconfig |   12 
 kernel/trace/Makefile|1 
 kernel/trace/trace.h |   22 +
 kernel/trace/trace_event_types.h |   20 +
 kernel/trace/trace_kprobe.c  | 1174 ++
 6 files changed, 1367 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/trace/kprobes.txt
 create mode 100644 kernel/trace/trace_kprobe.c

diff --git a/Documentation/trace/kprobes.txt b/Documentation/trace/kprobes.txt
new file mode 100644
index 000..f6b4587
--- /dev/null
+++ b/Documentation/trace/kprobes.txt
@@ -0,0 +1,138 @@
+ Kprobe-based Event Tracer
+ =
+
+ Documentation is written by Masami Hiramatsu
+
+
+Overview
+
+This tracer is similar to the events tracer which is based on Tracepoint
+infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe
+and kretprobe). It can probe anywhere kprobes can probe (this means all
+function bodies except for __kprobes functions).
+
+Unlike the function tracer, this tracer can probe instructions inside of
+kernel functions. It allows you to check which instruction has been executed.
+
+Unlike the Tracepoint based events tracer, this tracer can add and remove
+probe points on the fly.
+
+Similar to the events tracer, this tracer doesn't need to be activated via
+current_tracer; instead, just set probe points via
+/debug/tracing/kprobe_events. You can set filters on each probe event
+via /debug/tracing/events/kprobes/EVENT/filter.
+
+
+Synopsis of kprobe_events
+-
+  p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: set a probe
+  r[:EVENT] SYMBOL[+0] [FETCHARGS] : set a return probe
+
+ EVENT : Event name
+ SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted
+ MEMADDR   : Address where the probe is inserted
+
+ FETCHARGS : Arguments
+  %REG : Fetch register REG
+  sN   : Fetch Nth entry of stack (N >= 0)
+  @ADDR: Fetch memory at ADDR (ADDR should be in kernel)
+  @SYM[+|-offs]: Fetch memory at SYM +|- offs (SYM should be a data 
symbol)
+  aN   : Fetch function argument. (N >= 0)(*)
+  rv   : Fetch return value.(**)
+  ra   : Fetch return address.(**)
+  +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***)
+
+  (*) aN may not be correct for asmlinkage'd functions or in the middle of a
+  function body.
+  (**) only for return probe.
+  (***) this is useful for fetching a field of data structures.
+
+
+Per-Probe Event Filtering
+-
+ The per-probe event filtering feature allows you to set a different filter on
+each probe and to choose which arguments will be shown in the trace buffer. If
+an event name is specified right after 'p:' or 'r:' in kprobe_events, the
+tracer adds an event under tracing/events/kprobes/EVENT; in that directory you
+can see 'id', 'enabled', 'format' and 'filter'.
+
+enabled:
+  You can enable/disable the probe by writing 1 or 0 to it.
+
+format:
+  It shows the format of this probe event. It also shows aliases of arguments
+ which you specified to kprobe_events.
+
+filter:
+  You can write filtering rules for this event. You can use both alias
+ names and field names when describing filters.
+
+
+Usage examples
+--
+To add a probe as a new event, write a new definition to kprobe_events
+as below.
+
+  echo 

Re: [patch] VMX Unrestricted guest mode support

2009-05-28 Thread Nitin A Kamble
The unrestricted guest feature is introduced in the Westmere processor.
Westmere is the successor to Nehalem, and all following processors will
support it.  Westmere will be the first processor built on the 32nm
process.

Thanks,
Nitin

On Thu, 2009-05-28 at 16:39 -0700, Alexey Eremenko wrote:
 VMX Unrestricted mode support -- looks like very interesting (and
 useful !) feature.
 
 Which CPUs support it ?
 
 Core i7 900-series (Nehalem) ?
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] Add serial number support for virtio_blk, V3

2009-05-28 Thread john cooper

 +if (!(id = kzalloc(ATA_ID_WORDS, GFP_KERNEL)))
 +rv = -ENOMEM;
 
 Doesn't ATA_ID_WORDS seem like a strange name for a number of bytes?

Yes I caught that bug in the rework as well.
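
For reference, the rework sizes the buffer in bytes -- patch 2/2 below uses
VIRTIO_BLK_ID_BYTES (VIRTIO_BLK_ID_LEN * sizeof(u16)). A minimal sketch of the
corrected allocation, in the style of the quoted snippet, would be:

	/* size the identify buffer in bytes: ATA_ID_WORDS counts 16-bit words */
	id = kzalloc(ATA_ID_WORDS * sizeof(u16), GFP_KERNEL);
	if (!id)
		rv = -ENOMEM;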

 What's this *for* BTW?

Sorry -- I assumed you were on either list.
Please see patch to follow.

-john

-- 
john.coo...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] Add serial number support for virtio_blk, V3

2009-05-28 Thread john cooper
Christoph Hellwig wrote:
 On Wed, May 27, 2009 at 09:49:19AM +0200, Christoph Hellwig wrote:
 /*
  * IDE-compatible identify ioctl.
  *
  * Currently only returns the serial number and leaves all other fields
  * zero.
  */
 
 Btw, thinking about it the rest of the information in the ioctl should
 probably be filled up with faked data, similar to how we do it for
 the ide emulation inside qemu.

Agreed.  I've done so to the extent it makes sense
in the case of the more generic fields.  A fair
amount of the identify information being generated
by hw/ide.c appears obsolete.

-john

-- 
john.coo...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] Add serial number support for virtio_blk, V4

2009-05-28 Thread john cooper
[Rework of the earlier patch to provide additional
information in the response to an ATA identify
request -- virtio_blk treats the data as opaque
content created by qemu's virtio-blk.  Comments
from Christoph are also incorporated.]

This patch allows passing of a virtio_blk drive
serial number from qemu into a guest's virtio_blk
driver, and provides a means to access the serial
number from a guest's userspace.

Equivalent functionality currently exists for IDE
and SCSI, however it is not yet implemented for
virtio.  Scenarios exist where guest code relies
on a unique drive serial number to correctly
identify the machine environment in which it
exists.

The following two patches implement the above:

  qemu-vblk-serial-4.patch

which provides the qemu missing bits to interpret
a '-drive .. serial=XYZ ..' flag, and:

  virtio_blk-serial-4.patch

which extracts this information and makes it
available to guest userspace via an HDIO_GET_IDENTITY
ioctl, eg: 'hdparm -i /dev/vda'.
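
As a rough illustration of the guest-side path (this small program is not part
of the patches; the device node /dev/vda is just an example), the serial number
can be read through the same ioctl that hdparm uses:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>

int main(void)
{
	struct hd_driveid id;
	int fd = open("/dev/vda", O_RDONLY | O_NONBLOCK);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* HDIO_GET_IDENTITY fills in the (simulated) ATA identify data;
	 * the serial number is returned in id.serial_no (20 bytes). */
	if (ioctl(fd, HDIO_GET_IDENTITY, &id) < 0) {
		perror("HDIO_GET_IDENTITY");
		close(fd);
		return 1;
	}
	printf("serial: %.20s\n", id.serial_no);
	close(fd);
	return 0;
}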

The above patches are relative to qemu-kvm.git and
2.6.29.3 respectively.

-john

-- 
john.coo...@redhat.com



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Add serial number support for virtio_blk, V4

2009-05-28 Thread john cooper
qemu-vblk-serial-4.patch



diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 8dd3c7a..0b7ebe9 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -25,6 +25,7 @@ typedef struct VirtIOBlock
 BlockDriverState *bs;
 VirtQueue *vq;
 void *rq;
+char serial_str[BLOCK_SERIAL_STRLEN + 1];
 } VirtIOBlock;
 
 static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
@@ -285,6 +286,50 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 qemu_aio_flush();
 }
 
+/* store identify data in little endian format
+ */
+static inline void put_le16(uint16_t *p, unsigned int v)
+{
+*p = cpu_to_le16(v);
+}
+
+/* copy to *dst from *src, nul pad dst tail as needed to len bytes
+ */
+static inline void padstr(char *dst, const char *src, int len)
+{
+while (len--)
+*dst++ = *src ? *src++ : '\0';
+}
+
+/* setup simulated identify data as appropriate for virtio block device
+ *
+ * ref: AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS)
+ */
+static inline void virtio_identify_template(struct virtio_blk_config *bc)
+{
+uint16_t *p = &bc->identify[0];
+uint64_t lba_sectors = bc->capacity;
+
+memset(p, 0, sizeof(bc->identify));
+put_le16(p + 0, 0x0);/* ATA device */
+padstr((char *)(p + 23), QEMU_VERSION, 8);   /* firmware revision */
+padstr((char *)(p + 27), "QEMU VIRT_BLK", 40);   /* model# */
+put_le16(p + 47, 0x80ff);/* max xfer 255 sectors */
+put_le16(p + 49, 0x0b00);/* support IORDY/LBA/DMA */
+put_le16(p + 59, 0x1ff); /* cur xfer 255 sectors */
+put_le16(p + 80, 0x1f0); /* support ATA8/7/6/5/4 */
+put_le16(p + 81, 0x16);
+put_le16(p + 82, 0x400);
+put_le16(p + 83, 0x400);
+put_le16(p + 100, lba_sectors);
+put_le16(p + 101, lba_sectors >> 16);
+put_le16(p + 102, lba_sectors >> 32);
+put_le16(p + 103, lba_sectors >> 48);
+}
+
+/* coalesce internal state, copy to pci i/o region 0
+ */
+
 static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
 {
 VirtIOBlock *s = to_virtio_blk(vdev);
@@ -299,11 +344,15 @@ static void virtio_blk_update_config(VirtIODevice *vdev, 
uint8_t *config)
 stw_raw(&blkcfg.cylinders, cylinders);
 blkcfg.heads = heads;
 blkcfg.sectors = secs;
+virtio_identify_template(&blkcfg);
+memcpy(&blkcfg.identify[VIRTIO_BLK_ID_SN], s->serial_str,
+VIRTIO_BLK_ID_SN_BYTES);
 memcpy(config, &blkcfg, sizeof(blkcfg));
 }
 
 static uint32_t virtio_blk_get_features(VirtIODevice *vdev)
 {
+VirtIOBlock *s = to_virtio_blk(vdev);
 uint32_t features = 0;
 
 features |= (1 << VIRTIO_BLK_F_SEG_MAX);
@@ -311,6 +360,8 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev)
 #ifdef __linux__
 features |= (1 << VIRTIO_BLK_F_SCSI);
 #endif
+if (strcmp(s->serial_str, "0"))
+features |= 1 << VIRTIO_BLK_F_IDENTIFY;
 
 return features;
 }
@@ -354,6 +405,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev)
 int cylinders, heads, secs;
 static int virtio_blk_id;
 BlockDriverState *bs;
+char *ps;
 
 s = (VirtIOBlock *)virtio_common_init("virtio-blk", VIRTIO_ID_BLOCK,
   sizeof(struct virtio_blk_config),
@@ -365,6 +417,10 @@ VirtIODevice *virtio_blk_init(DeviceState *dev)
 s->vdev.reset = virtio_blk_reset;
 s->bs = bs;
 s->rq = NULL;
+if (strlen(ps = (char *)drive_get_serial(bs)))
+strncpy(s->serial_str, ps, sizeof(s->serial_str));
+else
+snprintf(s->serial_str, sizeof(s->serial_str), "0");
 bs->private = dev;
 bdrv_guess_geometry(s->bs, &cylinders, &heads, &secs);
 bdrv_set_geometry_hint(s->bs, cylinders, heads, secs);
diff --git a/hw/virtio-blk.h b/hw/virtio-blk.h
index dff3e0c..1be4342 100644
--- a/hw/virtio-blk.h
+++ b/hw/virtio-blk.h
@@ -30,6 +30,11 @@
 #define VIRTIO_BLK_F_RO 5   /* Disk is read-only */
 #define VIRTIO_BLK_F_BLK_SIZE   6   /* Block size of disk is available*/
 #define VIRTIO_BLK_F_SCSI   7   /* Supports scsi command passthru */
+#define VIRTIO_BLK_F_IDENTIFY   8   /* ATA IDENTIFY supported */
+
+#define VIRTIO_BLK_ID_LEN   256 /* length of identify u16 array */
+#define VIRTIO_BLK_ID_SN10  /* start of char * serial# */
+#define VIRTIO_BLK_ID_SN_BYTES  20  /* length in bytes of serial# */
 
 struct virtio_blk_config
 {
@@ -39,6 +44,8 @@ struct virtio_blk_config
 uint16_t cylinders;
 uint8_t heads;
 uint8_t sectors;
+uint32_t _blk_size;/* structure pad, currently unused */
+uint16_t identify[VIRTIO_BLK_ID_LEN];
 } __attribute__((packed));
 
 /* These two define direction. */
diff --git a/sysemu.h b/sysemu.h
index 47d001e..d3df19f 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -152,6 +152,8 @@ typedef enum {
 BLOCK_ERR_STOP_ANY
 } BlockInterfaceErrorAction;
 
+#define BLOCK_SERIAL_STRLEN 20
+
 typedef struct DriveInfo {
 BlockDriverState *bdrv;
 

[PATCH 2/2] Add serial number support for virtio_blk, V4

2009-05-28 Thread john cooper
virtio_blk-serial-4.patch



 drivers/block/virtio_blk.c |   41 ++---
 include/linux/virtio_blk.h |7 +++
 2 files changed, 45 insertions(+), 3 deletions(-)
=
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -146,12 +146,46 @@ static void do_virtblk_request(struct re
	vblk->vq->vq_ops->kick(vblk->vq);
 }
 
+/* return ATA identify data
+ */
+static int virtblk_identify(struct gendisk *disk, void *argp)
+{
+   struct virtio_blk *vblk = disk->private_data;
+   u16 *id;
+   int err = -ENOMEM;
+
+   id = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL);
+   if (!id)
+   goto out;
+
+   err = virtio_config_buf(vblk->vdev, VIRTIO_BLK_F_IDENTIFY,
+   offsetof(struct virtio_blk_config, identify), id,
+   VIRTIO_BLK_ID_BYTES);
+
+   if (err)
+   goto out_kfree;
+
+   if (copy_to_user(argp, id, VIRTIO_BLK_ID_BYTES))
+   err = -EFAULT;
+
+out_kfree:
+   kfree(id);
+out:
+   return err;
+}
+
 static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
 unsigned cmd, unsigned long data)
 {
-   return scsi_cmd_ioctl(bdev->bd_disk->queue,
- bdev->bd_disk, mode, cmd,
- (void __user *)data);
+   struct gendisk *disk = bdev->bd_disk;
+   void __user *argp = (void __user *)data;
+
+   switch (cmd) {
+   case HDIO_GET_IDENTITY:
+   return virtblk_identify(disk, argp);
+   default:
+   return scsi_cmd_ioctl(disk->queue, disk, mode, cmd, argp);
+   }
 }
 
 /* We provide getgeo only to please some old bootloader/partitioning tools */
@@ -356,6 +390,7 @@ static struct virtio_device_id id_table[
 static unsigned int features[] = {
VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX,
VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
+   VIRTIO_BLK_F_IDENTIFY
 };
 
 static struct virtio_driver virtio_blk = {
=
--- a/include/linux/virtio_blk.h
+++ b/include/linux/virtio_blk.h
@@ -15,7 +15,13 @@
 #define VIRTIO_BLK_F_GEOMETRY  4   /* Legacy geometry available  */
 #define VIRTIO_BLK_F_RO5   /* Disk is read-only */
 #define VIRTIO_BLK_F_BLK_SIZE  6   /* Block size of disk is available*/
+#define VIRTIO_BLK_F_IDENTIFY  8   /* ATA IDENTIFY supported */
 
+#define VIRTIO_BLK_ID_LEN 256
+#define VIRTIO_BLK_ID_BYTES (VIRTIO_BLK_ID_LEN * sizeof (u16))
+
+/* mapped into pci i/o region 0
+ */
 struct virtio_blk_config
 {
/* The capacity (in 512-byte sectors). */
@@ -32,6 +38,7 @@ struct virtio_blk_config
} geometry;
/* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */
__u32 blk_size;
+   __u16 identify[VIRTIO_BLK_ID_LEN];
 } __attribute__((packed));
 
 /* These two define direction. */
-- 
john.coo...@redhat.com



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html